Linus's stream

What would a software interface for writing look like if we hadn't started with the deeply entrenched prior of the typewriter?

Building toys and worlds > building tools and processes.

We need to pull computation as a capability out of computers and their keyboards and mice and glass screens. Computation as a power deserves better than to be locked into such bland compromised media.

A random idea I've been thinking about: heterogeneous-media shared documents as a collaboration medium between human orchestrators and LLM-style agents.

I sort of mentioned this in passing in one of my past blog posts. I think it would be interesting to build an assistant experience where tasks are framed as the model filling out a worksheet or modifying some shared, persistent environment alongside the user.

So instead of "Can you book me a flight for X at Y?" you're just co-authoring a "trip planning" doc with the assistant, and e.g. you may tag the assistant under a "flights" section, and the assistant gets to work, using appropriate context about trip schedules and goals elsewhere in the doc. It can obviously put flight info into the doc in response, but if it needs to clarify things or put additional info/metadata/"show work" or share progress updates as it does stuff, a shared doc provides a natural place for those things without having to do sync communication with the user over the rather constrained text chat interface.

A "document" vs "message thread" is also more versatile because the model can use it as a human-readable place to put its long-term knowledge. e.g. keeping track of a list of reminders to check every hour, a place for it to write down things it learns about the user's preferences such that it's user-editable/auditable, etc. Microsoft's defunct Cortana assistant had a "Notebook" feature that worked this way, and I thought still think it was a good idea.

The heterogeneous media part is where I think it would be extra useful if this weren't just a Google Doc, but accommodated "rich action cards" like what you see in actionable mobile notifications or in MercuryOS. The model could do things like embed a Linear ticket or create an interactive map route preview, such that those "cards" are fully interactive embeds but remain legible to the language model under the hood.
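One way to picture the "legible to the model" part: each card carries structured data for its interactive UI, plus a plain-text rendering the model can read inline with the rest of the doc. A minimal sketch, with entirely hypothetical names (`ActionCard`, `to_model_text`) standing in for whatever a real system would use:

```python
from dataclasses import dataclass, field

@dataclass
class ActionCard:
    kind: str                     # e.g. "linear_ticket", "map_route"
    data: dict = field(default_factory=dict)

    def to_model_text(self) -> str:
        # Flatten the card into a single line the language model can
        # read in context, alongside the doc's ordinary text.
        fields = ", ".join(f"{k}: {v}" for k, v in sorted(self.data.items()))
        return f"[{self.kind} | {fields}]"

card = ActionCard("linear_ticket", {"id": "ENG-42", "status": "In Progress"})
print(card.to_model_text())
# → [linear_ticket | id: ENG-42, status: In Progress]
```

The UI renders `data` as a full interactive embed; the model only ever sees the flattened line, so the same object serves both audiences.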

To make good tools, think of yourself as a toy maker.

Idea: Obscura

Synopsis:

  • Instagram circa early 2010s, but instead of filters that apply color changes from a LUT or something, each filter is a different direction in the Stable Diffusion latent space.
  • Main technical challenge here would be editing real images while preserving fidelity to the original, especially on things like human faces, but I think that will be a solved problem within a few months. Imagic achieves this, sort of, but takes a long time (~5–6 minutes on a top-of-the-line datacenter GPU).
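The "filter as a latent direction" idea above reduces to simple vector arithmetic. A hedged sketch, with NumPy arrays standing in for real encoded latents and learned style directions (the names `apply_filter`, `watercolor`, and the encode/decode steps are all assumptions, not an actual Stable Diffusion API):

```python
import numpy as np

def apply_filter(latent: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    # Move the image's latent along a unit-normalized style direction;
    # `strength` plays the role of a filter intensity slider.
    unit = direction / np.linalg.norm(direction)
    return latent + strength * unit

rng = np.random.default_rng(0)
latent = rng.normal(size=64)       # stand-in for an encoded image
watercolor = rng.normal(size=64)   # stand-in for a learned "watercolor" direction
styled = apply_filter(latent, watercolor, strength=2.0)
```

Each "filter" in the Obscura sense would just be a different `direction` vector, decoded back to pixels after the shift.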

I’ve been thinking a lot this week about Paper (the classic iPad drawing app). They gave you like a color palette of 15 colors, the most simple color mixer in the world (and no color picker otherwise), and like 5 "brushes" that were like pencil, pen, watercolor. Super simple. As simple as possible. But it was very hard using those tools to make something that looked ugly. What if you could build images by capturing the basics of an idea from a napkin sketch or something in front of you, and then use like 5-10 "style filters" in the Stable Diffusion latent space to "artsify" it in a way that was hard to mess up?

Inspirations from Paper by Fiftythree

  • The experience of reading, ideation, craft, creation deserves a beautiful interface, not just a functional one. The interface should inspire creation, not intimidate with a blank canvas.
  • Paper is great because it offers a simple, minimal set of tools that each work beautifully without needing or affording customization, but combine together to form a cohesive toolkit. Attribute vectors in a latent space UI should feel similarly: a small default set of abstractions not needing customization, that inspire beautiful creation and provide a huge expressive range. This isn't so much about PCA or anything quantitatively computed, as much as a matter of taste -- What minimal set of thoughtfully designed tools makes people unafraid to create, because an unassuming brushstroke can yield something beautiful that they can be proud of?
  • Paper's color mixing wheel is a beautiful interface. It lets users traverse a latent space without explicitly constructing or visualizing the space or attribute vectors, by simply picking from a palette or sampling colors from their canvas. We can imagine implementing something similar for images or language using generative models, moving around in latent space by swirling and remixing images and texts we come across in life or across the web.
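The "swirling and remixing" interaction in the last bullet can be sketched as interpolation between latents, the way Paper's wheel blends colors without ever exposing a color space. A minimal sketch under the same stand-in assumptions as above (random vectors for real image latents; `mix` is a hypothetical name):

```python
import numpy as np

def mix(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    # Linear interpolation between two latents; spherical interpolation
    # (slerp) often behaves better for Gaussian latents, but lerp shows
    # the idea: the user drags t, never sees the space itself.
    return (1.0 - t) * a + t * b

rng = np.random.default_rng(1)
sunset = rng.normal(size=64)   # latent of one image the user sampled
sketch = rng.normal(size=64)   # latent of another
blended = mix(sunset, sketch, t=0.3)  # 30% of the way toward `sketch`
```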

Good food for thought in a time of explosive experimentation and exploration for creative interfaces using generative AI models.

The degree to which just scaling up compute (read: money) in language models just solves problems that architectural tweaks can't... is truly annoying sometimes.


Oral tradition is the more canonical form of language — written came later. Maybe that means Glyph should primarily be situated in context and oral/linear-first?