As AI systems become more capable (and comparatively less understandable), more and more leverage will rest on the interfaces through which humans work with those capabilities. It seems like a question at least as important to study deeply as the AGI question.
A mathematician without the interface of modern notation is powerless. A human without the right interfaces to superhuman intelligence will be no better. Interfaces and notations form the vocabulary humans and machines use to stay mutually aligned. Interface design, then, is an AI alignment problem.
Chat isn't it. Prompt engineering isn't it. Algorithmic feeds are definitely not it. The answer will be more multimodal and lean on a balance of human abilities in language and other senses.
At the base level, the fundamental question in this space is: what is the right representation for thought? For experience? For questions? What are the right boundary objects through which both AI systems and humans will be able to speak of the same ideas? What are their rules of physics in software interfaces?
What happens if we drag-to-select a thought? Can we pinch-to-zoom on questions? Double-click on answers? Can I drag-and-drop an idea between me and you? In the physical world, humans annotate their language with an elaborate, organic dance of gestures, tone, pace, and glances. How, then, do we shrug at a computer or get excited at a chatbot? How might computers give us knowing glances about ideas they've stumbled upon in our work?
Natural language is the Schelling Point of intelligence. To try to bypass language (e.g. Neuralink) may be a misguided mission, because it overestimates the extent to which language is a communication channel, and underestimates the extent to which it's a world model.
Yet another Oak-based CLI lifts off! This time, it's something that's probably useful to more than just me. It's called Rush, and it helps run a single command on many files according to a template string.
For example, I can batch-rename many image files in a short and readable one-liner:
rush mv *.jpeg '{{name}}.jpg'
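For anyone without Rush, the one-liner above is roughly equivalent to this Python sketch (assuming `{{name}}` expands to each file's name without its extension):

```python
# Rough equivalent of `rush mv *.jpeg '{{name}}.jpg'`,
# assuming {{name}} is the filename minus its extension.
from pathlib import Path

for path in Path(".").glob("*.jpeg"):
    path.rename(path.with_suffix(".jpg"))
```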
The pitfall of conversational UIs
A lot of tasks involve keeping track of state throughout, and conversations are terrible interfaces in which to keep track of state.
Tasks that involve keeping track of state:
- Travel planning (what you've seen, which places/bookings you made)
- Project management (what have I done? what's on my plate?)
- Researching a topic (why do people keep all those tabs open if not to keep state?)
- Decision making (what choices do I have? which is better, and why?)
- Following instructions (what have I done? did I miss a step? how much is left?)
- Editing [podcasts, videos, papers]
- Understanding a complex system, like reading a map
Tasks that don't involve state, and are good for CUIs:
- Querying specific trivia (weather, calendar events, adding todos)
- Fire-and-forget tasks (Send X a message, play music)
- AI as conversational partner (e.g. brainstorming, but then you'd need to "keep state" in another place like meeting notes)
If the user has to keep track of state in a conversation, they either have to hold it all in working memory (hard for no reason) or keep asking the interlocutor ("what was step one again?").
So what's the solution? I think interactive, itemized workspaces. Instead of saying "book me a flight" and then going through a dozen back-and-forths to settle all the details, present all the choices to the user and let them drag and drop the right flight onto their calendar. Much faster, more intuitive, and there's a clear visual analogue of what's happening.
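A minimal sketch of what I mean, with hypothetical types (not any real product's API): the assistant's findings live in a structured, user-visible workspace rather than in ephemeral chat turns, so choosing a flight is a single visible state change instead of another round of dialogue.

```python
# Hypothetical sketch: assistant output as a structured workspace,
# not a transcript. All names and fields here are made up.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FlightOption:
    airline: str
    departs: str      # ISO 8601 timestamp
    arrives: str
    price_usd: float

@dataclass
class TripWorkspace:
    query: str
    options: list[FlightOption] = field(default_factory=list)
    chosen: Optional[FlightOption] = None

    def choose(self, index: int) -> None:
        # The "drag onto calendar" gesture reduces to one visible
        # state change the user can see and undo.
        self.chosen = self.options[index]
```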
Another solution may be documents you can talk to. Instead of holding a conversation with a bot, you and the bot collaborate together to write a document and build up a record of the salient points and ideas. Think GitHub Copilot for everything.
Just referenced a conversation I had with an AI in a real conversation with a human for the first time... feels like an uncanny inflection point?
It's very satisfying and remarkable that Monocle has almost completely eliminated the "I saw this somewhere, but I can't remember where, and Google can't find it" problem from my life. Almost completely. Online eidetic memory for the web.
I've been working on a conversational bot that I can brainstorm with and ask for advice. I've been tinkering with it for a few days, but today, I finally had a conversation that I thought was helpful enough for me to feel that it was adding value to my working process. Here's a raw (except for fixing some of my typos) transcript of the conversation (minus the prompt). The conversation is generated using GPT-J-6B running under my Calamity deployment.
Here's an excerpt:
Me: What if we present browser history as a map to explore instead? how do we lay out a map?
AI: One option would be to put the first node of the map near the top of the map, and have a node for each event. And then the second map layer might be a map to explore the details for a particular event in the first map. And so on.
Me: How do you think this compares to a geographic map? Can we draw some analogies to this "online history map"?
AI: There's a good analogy here that is actually very similar to what a geographic map is. A geographic map is basically a map that is made by plotting points of latitude and longitude on a flat plane, so they have a two dimensional layout. You can then put different data on top of that map, like cities, roads, rivers, mountains, etc.
Me: How does that analogy extend to showing more abstract browsing histories on a map?
AI: I think you'd have to start with the idea of mapping history to points of latitude and longitude. Then the second layer of information could be a set of different things that go along with that point. You could map history to a collection of objects, such as locations, objects, and/or actions.
Me: I like that. It's like there's a pre-existing "map of the internet" with different topics in different neighborhoods, and browsing the web is like taking a stroll through those neighborhoods. Your paths then show up on the map as paths you took.
AI: That's a good analogy.
It's still... not perfect (I ask it how to be productive, and it tells me I can sit on the bed and watch TV). But baby steps!
I figured out a much more succinct way to communicate what I was trying to say in my last stream update about nonlinear reading:
Linear reading is a depth-first search through the knowledge in a text. Nonlinear reading allows breadth-first searches through the same text, treating it as a densely connected graph rather than a linear sequence.
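To make the analogy concrete, here's an illustrative sketch (not the heatmap prototype itself): treat a text as a graph of passages linked by shared ideas. Linear reading follows one thread to its end before backtracking; nonlinear reading fans out across every thread a layer at a time.

```python
# Illustrative sketch of the reading-as-search analogy.
# Nodes are passages; edges link passages that share an idea.
from collections import deque

graph = {
    "intro": ["claim", "example"],
    "claim": ["evidence", "caveat"],
    "example": ["caveat"],
    "evidence": [],
    "caveat": [],
}

def depth_first(start):          # "linear" reading: follow one thread deeply
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(reversed(graph[node]))
    return order

def breadth_first(start):        # "nonlinear" reading: skim each layer first
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

print(depth_first("intro"))    # ['intro', 'claim', 'evidence', 'caveat', 'example']
print(breadth_first("intro"))  # ['intro', 'claim', 'example', 'evidence', 'caveat']
```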
Nonlinear reading
Free the written word from the tyranny of linearity!
One interesting consequence of nonlinear reading (using the heatmap prototype I'm building) is that the reward curve for reading becomes much smoother.
With traditional prose writing, you may need to read a substantial portion of a text to get "into the meat" and begin to reap any value from it. With nonlinear reading, you start learning and picking up new information immediately, and the closer you read, the more you pick up, in an almost linear correspondence between effort and value.
This makes me want to read almost everything I come across to at least some level of scrutiny, and makes me at least skim-read texts that I otherwise wouldn't even have clicked on, because even 3-5 seconds of reading can teach me something new or give me good information about whether a closer read is worth my time.