A random idea I've been thinking about: heterogeneous-media shared documents as a collaboration medium between human orchestrators and LLM-style agents.

I sort of mentioned this in passing in a past blog post. I think it would be interesting to build an assistant experience where tasks are framed as the model filling out a worksheet or modifying some shared, persistent environment alongside the user.

So instead of asking "Can you book me a flight for X at Y?", you're co-authoring a "trip planning" doc with the assistant: you might tag the assistant under a "flights" section, and it gets to work, drawing on context about schedules and goals from elsewhere in the doc. It can obviously put flight info into the doc in response, but if it needs to ask clarifying questions, attach extra info/metadata/"shown work", or share progress updates as it goes, a shared doc gives all of that a natural home without requiring synchronous back-and-forth over the rather constrained text chat interface.
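To make that concrete, here's a minimal sketch of the doc-as-channel shape I have in mind. Everything here is invented for illustration; none of it is a real product or API.

```typescript
// A doc is just a list of blocks; tagging the agent is itself a block.
type Block =
  | { kind: "heading"; text: string }
  | { kind: "paragraph"; text: string }
  | { kind: "mention"; agent: string; request: string }; // "@assistant: find flights"

interface TripDoc {
  title: string;
  blocks: Block[];
}

// When the user tags the agent, the whole doc (not just the mention)
// becomes the task context, so the agent can read dates and goals from
// other sections before acting.
function buildAgentTask(doc: TripDoc, mentionIndex: number) {
  const mention = doc.blocks[mentionIndex];
  if (mention.kind !== "mention") throw new Error("not a mention block");
  return {
    instruction: mention.request,
    // Serialize the surrounding doc as context for the model.
    context: doc.blocks
      .map((b) => (b.kind === "mention" ? `@${b.agent}: ${b.request}` : b.text))
      .join("\n"),
    // The agent replies by inserting blocks after the mention rather
    // than sending a chat message, so clarifications and progress
    // updates live in the doc too.
    respondAt: mentionIndex + 1,
  };
}
```

The point of `respondAt` is that the agent's output, questions, and work-in-progress all land in the same persistent place the request came from.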

A "document" vs "message thread" is also more versatile because the model can use it as a human-readable place to put its long-term knowledge. e.g. keeping track of a list of reminders to check every hour, a place for it to write down things it learns about the user's preferences such that it's user-editable/auditable, etc. Microsoft's defunct Cortana assistant had a "Notebook" feature that worked this way, and I thought still think it was a good idea.

The heterogeneous-media part is where I think this gets extra useful: rather than just being a Google Doc, it would accommodate "rich action cards" like the ones in actionable mobile notifications or in MercuryOS. The model could embed a Linear ticket or create an interactive map route preview, such that those "cards" are fully interactive embeds yet still legible to the language model under the hood.
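The key trick is that each card has two faces: a structured payload the UI renders as an interactive embed, and a plain-text projection the model reads and writes. A minimal sketch, with the Linear fields invented for illustration (this is not Linear's actual API shape):

```typescript
// A card pairs UI-renderable data with a model-legible serialization.
interface Card<T> {
  type: string;
  payload: T;            // structured data driving the interactive embed
  toModelText(): string; // plain-text projection the model sees
}

// Hypothetical ticket shape, not Linear's real schema.
interface LinearTicketPayload {
  id: string;
  title: string;
  status: "todo" | "in_progress" | "done";
  assignee?: string;
}

function linearTicketCard(payload: LinearTicketPayload): Card<LinearTicketPayload> {
  return {
    type: "linear-ticket",
    payload,
    // Under the hood the model sees something like:
    //   [linear-ticket ABC-123] "Fix login bug" (in_progress, assigned: sam)
    toModelText: () =>
      `[linear-ticket ${payload.id}] "${payload.title}" (${payload.status}` +
      (payload.assignee ? `, assigned: ${payload.assignee})` : `)`),
  };
}
```

The same dual representation would work for the map preview or anything else: the human gets a widget, the model gets text it can reason over and update.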