Updates on 2022/11/29

Technology rarely (never?) removes scarcity, even when it appears to be doing so, because scarcity, like energy, is conserved. The scarcity simply goes somewhere else, up and down the value chain or elsewhere in the social fabric.

An idea Grex shared with me once that has stuck in my mind:

Conversations, magazines, and great cities provide a kind of "scoped serendipity" that people seek in productive creative processes.

Scoped serendipity is something in between the wild lawlessness of complete randomness and the predictable mundanity of rigid structure. Sometimes, it's editorial curation (as in magazines), sometimes it's "the algorithm" (Twitter at its best), and sometimes it's just being in a place where the right people or ideas are flying about frequently enough that you bump into them sooner or later (conversations, cities).

How do we create scoped serendipity in creative workflows/tools?

A powerful pattern in designing creative tools seems to be perspective transformations on input.

I first had this thought while playing with Looom, an iPad app for creating animations. Looom is special because rather than sketching an animation frame-by-frame, as in a traditional animation process, you sketch the time axis first. You draw a few motion-tracked strokes that set the "motion timing curves" (not sure what else to call them) for your animation, then attach illustrations in every frame to those curves. It feels like a totally different way to produce animations, and I've been thinking about it a lot since I first saw the app.

Looom's brilliance is that it lets creation happen from a different perspective: painting "in time" first, then "filling in" the details in space. My phrase for this shift is a perspective transformation, because it reminds me of coordinate transformations.
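
To make that inversion concrete, here's a minimal sketch of what a "time-first" animation model might look like. The names and structure here are my own guesses for illustration, not Looom's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class MotionCurve:
    """A stroke drawn through time: (t, x, y) samples captured as you drag."""
    samples: list[tuple[float, float, float]]

    def position_at(self, t: float) -> tuple[float, float]:
        # Nearest-sample lookup; a real tool would interpolate smoothly.
        nearest = min(self.samples, key=lambda s: abs(s[0] - t))
        return (nearest[1], nearest[2])

@dataclass
class TimeFirstAnimation:
    """The authoring order is inverted: timing curves exist before any drawing."""
    curves: list[MotionCurve] = field(default_factory=list)
    artwork: dict[int, str] = field(default_factory=dict)  # curve index -> drawing id

    def render_frame(self, t: float) -> list[tuple[tuple[float, float], str]]:
        # Each drawing "rides" its curve: spatial detail fills in motion
        # that was already authored on the time axis.
        return [(curve.position_at(t), self.artwork.get(i, "<not drawn yet>"))
                for i, curve in enumerate(self.curves)]
```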

I can imagine other kinds of interesting perspective transformations:

  • A Fourier transform when creating music, so that you create the rhythm/beats first, then add tone and timbre later.
  • A "color space first" video editor, where you start by defining how the color palette should evolve over the story, and later find clips that fit it to flesh out the video.
  • An "emotional arc" based writing tool that lets you first sketch the ups and downs of tone and emotion across your story, and then later transforms your sentences as you type to fit that narrative arc.
  • A songwriting interface that lets you "sculpt" a verse by humming the rough ups and downs of the track first, then iteratively refining the exact notes and rhythms, adding more precision at each step (a toy sketch of this follows the list).
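
The songwriting idea is the easiest to sketch. Here's a toy version of the coarse-to-fine loop, with made-up names throughout; a real tool would do pitch tracking and far smarter quantization:

```python
# One octave of C major as MIDI note numbers.
C_MAJOR_MIDI = [60, 62, 64, 65, 67, 69, 71, 72]

def refine_contour(contour: list[float], scale: list[int]) -> list[int]:
    """Snap each rough pitch in the hummed contour to the nearest scale note."""
    return [min(scale, key=lambda note: abs(note - pitch)) for pitch in contour]

# Pass 1: the hum, captured as approximate (fractional) MIDI pitches.
hum = [60.4, 63.1, 66.8, 64.9, 61.2]

# Pass 2: snapped to C major -- exact notes emerge from the rough gesture.
print(refine_contour(hum, C_MAJOR_MIDI))  # [60, 64, 67, 65, 62]
```

Later passes would work the same way on rhythm, velocity, and so on, so precision accretes at each step without ever forcing you to start from a blank grid.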

What would a software interface for writing look like if we hadn't started with the deeply entrenched prior of the typewriter?

Building toys and worlds > building tools and processes.

We need to pull computation as a capability out of computers and their keyboards and mice and glass screens. Computation as a power deserves better than to be locked into such bland compromised media.

A random idea I've been thinking about: heterogeneous-media shared documents as a collaboration medium between human orchestrators and LLM-style agents.

I sort of mentioned this in passing in a past blog post. I think it would be interesting to build an assistant experience where tasks are framed as the model filling out a worksheet or modifying some shared, persistent environment alongside the user.

So instead of "Can you book me a flight for X at Y?", you're just co-authoring a "trip planning" doc with the assistant: you might tag the assistant under a "flights" section, and it gets to work, drawing on context about trip schedules and goals from elsewhere in the doc. It can obviously put flight info into the doc in response, but if it needs to ask clarifying questions, attach extra info or metadata, "show its work", or share progress updates as it goes, a shared doc provides a natural home for all of that without forcing synchronous communication over the rather constrained text chat interface.
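
A loose sketch of that interaction model, with the LLM call stubbed out and all names hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    author: str                                    # "user" or "assistant"
    section: str                                   # e.g. "flights", "hotels"
    text: str
    mentions: list[str] = field(default_factory=list)

@dataclass
class TripDoc:
    blocks: list[Block] = field(default_factory=list)

    def full_context(self) -> str:
        # The assistant reads the whole doc, not just the tagged section,
        # so schedules and goals written elsewhere become usable context.
        return "\n".join(f"[{b.section}] {b.text}" for b in self.blocks)

    def add(self, block: Block) -> None:
        self.blocks.append(block)
        if "assistant" in block.mentions:
            self.run_assistant(block.section)

    def run_assistant(self, section: str) -> None:
        # Stand-in for an LLM call. The point is that answers, clarifying
        # questions, and progress updates all land back in the doc itself.
        context = self.full_context()
        self.blocks.append(Block("assistant", section,
                                 f"(working on '{section}' with {len(context)} chars of context)"))

doc = TripDoc()
doc.add(Block("user", "overview", "Tokyo, March 3-10; prefer morning flights"))
doc.add(Block("user", "flights", "@assistant find options", mentions=["assistant"]))
```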

A "document", versus a "message thread", is also more versatile because the model can use it as a human-readable place to keep its long-term knowledge: e.g. a list of reminders to check every hour, or a record of things it learns about the user's preferences, kept user-editable and auditable. Microsoft's defunct Cortana assistant had a "Notebook" feature that worked this way, and I thought, and still think, it was a good idea.

The heterogeneous-media part is where I think it would be extra useful if this weren't just a Google Doc, but accommodated "rich action cards" like what you see in actionable mobile notifications or in MercuryOS. The model could do things like embed a Linear ticket or create an interactive map route preview, such that those "cards" are fully interactive embeds but still legible to the language model under the hood.
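
One way to get that dual legibility: every card derives two renderings from a single payload, an interactive embed for the human and a plain-text serialization for the model. A rough sketch, with invented field names (the ticket is made up):

```python
from dataclasses import dataclass

@dataclass
class ActionCard:
    """An embed that is interactive for the human and textual for the model."""
    kind: str      # e.g. "linear_ticket", "map_route"
    payload: dict  # whatever the interactive renderer needs

    def render_embed(self) -> str:
        # A real client would mount an interactive widget here;
        # this placeholder just marks where it would go.
        return f"<{self.kind} widget>"

    def to_model_text(self) -> str:
        # The same card, serialized so the model can read everything
        # the human sees, and reason about it as ordinary doc text.
        fields = ", ".join(f"{k}={v}" for k, v in self.payload.items())
        return f"[{self.kind}: {fields}]"

ticket = ActionCard("linear_ticket",
                    {"id": "ENG-123", "title": "Fix login bug", "status": "In Progress"})
print(ticket.to_model_text())
# [linear_ticket: id=ENG-123, title=Fix login bug, status=In Progress]
```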