Updates on 2022/11/26

Idea: Obscura

Synopsis:

  • Instagram circa early 2010s, but instead of filters that apply color changes from a LUT or something, each filter is a different direction in the Stable Diffusion latent space.
  • The main technical challenge here would be editing real images while preserving fidelity to the original, especially on things like human faces, but I think that will be a solved problem within a few months. Imagic achieves this, sort of, but takes a long time (~5-6 minutes on a top-of-the-line datacenter GPU).
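
To make "each filter is a different direction in latent space" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the 4-dimensional toy latent stands in for a real image latent, and the `FILTERS` names and vectors are made up, not directions actually found in Stable Diffusion. The idea is just that applying a filter means nudging the latent along a unit direction by some strength, the way a slider might.

```python
import math


def normalize(vector):
    """Return `vector` scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("filter direction must be non-zero")
    return [x / norm for x in vector]


def apply_filter(latent, direction, strength):
    """Move `latent` along the unit-normalized `direction` by `strength`."""
    if len(latent) != len(direction):
        raise ValueError("latent and direction must have the same length")
    unit = normalize(direction)
    return [z + strength * d for z, d in zip(latent, unit)]


# Hypothetical filter palette: names and directions are invented for this sketch.
FILTERS = {
    "watercolor": [1.0, 0.0, 0.0, 0.0],
    "noir": [0.0, 3.0, 4.0, 0.0],
}

# Toy stand-in for an encoded image; a real latent would be a large tensor.
latent = [0.5, -0.2, 0.1, 0.9]
filtered = apply_filter(latent, FILTERS["noir"], strength=2.0)
print(filtered)
```

Normalizing the direction means `strength` has the same scale for every filter, which is what you would want if all filters shared one intensity slider. The hard part noted above (staying faithful to the original image) is not addressed here at all.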

I’ve been thinking a lot this week about Paper (the classic iPad drawing app). They gave you a palette of about 15 colors, the simplest color mixer in the world (and no color picker otherwise), and about 5 "brushes" like pencil, pen, and watercolor. Super simple. As simple as possible. But it was very hard to make something ugly with those tools. What if you could build images by capturing the basics of an idea from a napkin sketch or something in front of you, and then use 5-10 "style filters" in the Stable Diffusion latent space to "artsify" it in a way that was hard to mess up?

Inspirations from Paper by FiftyThree

  • The experience of reading, ideation, craft, creation deserves a beautiful interface, not just a functional one. The interface should inspire creation, not intimidate with a blank canvas.
  • Paper is great because it offers a simple, minimal set of tools that each work beautifully without needing or affording customization, but combine to form a cohesive toolkit. Attribute vectors in a latent space UI should feel similar: a small default set of abstractions that need no customization, inspire beautiful creation, and provide a huge expressive range. This isn't so much about PCA or anything quantitatively computed as it is a matter of taste -- What minimal set of thoughtfully designed tools makes people unafraid to create, because an unassuming brushstroke can yield something beautiful that they can be proud of?
  • Paper's color mixing wheel is a beautiful interface. It lets users traverse a latent space without explicitly constructing or visualizing the space or attribute vectors, by simply picking from a palette or sampling colors from their canvas. We can imagine implementing something similar for images or language using generative models, moving around in latent space by swirling and remixing images and texts we come across in life or across the web.
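
One way to picture the "mixing wheel" idea for latents is interpolation between two sampled points. The sketch below uses spherical interpolation (slerp), which is a common choice for blending diffusion-model latents but is my assumption here, not something these notes commit to. The function name `mix` and the two-dimensional toy vectors are hypothetical; real latents would come from encoding the images or texts being remixed.

```python
import math


def mix(a, b, t):
    """Spherically interpolate from latent `a` (t=0) to latent `b` (t=1)."""
    if len(a) != len(b):
        raise ValueError("latents must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("latents must be non-zero")

    # Angle between the two latents, clamped to guard against rounding error.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)

    # Nearly parallel or opposite vectors: fall back to a straight linear blend.
    if abs(sin_omega) < 1e-6:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Toy stand-ins for two "colors" picked up from the canvas.
sampled_a = [1.0, 0.0]
sampled_b = [0.0, 1.0]
halfway = mix(sampled_a, sampled_b, 0.5)
print(halfway)
```

In a UI, `t` could be driven by how long the user swirls between two samples, so the gesture itself never exposes the vectors or the space.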

Good food for thought in a time of explosive experimentation and exploration for creative interfaces using generative AI models.

The degree to which just scaling up compute (read: money) in language models just solves problems that architectural tweaks can't... is truly annoying sometimes.

What minimal set of thoughtfully designed tools makes people unafraid to create, because an unassuming brushstroke can yield something beautiful that they can be proud of?