There are no original ideas; only infinitely varied remixes.
This itself isn't a new idea. All thinking is creative recombination. But today's language model-based creative tools all require us to start each idea from scratch, at an empty text box.
When I think within my own mind, I build upon snippets of conversations and bits of quotes from stories I've heard, and recombine them to produce something new. The ingredients with which I think are not words or tokens, but pieces of ideas, abstract blobs of concepts and quotes from my memory. It seems obvious that we should be able to work with language models similarly, at least for creative use cases, by recombining pieces of our experience rather than typing into a mostly empty rectangle.
Instead of instructing with words or prompting with keywords, I want to bring in a passage from my favorite author and say "what does this make you think of?" I want to smash two different paragraphs about creativity from my notes together inside a neural network and see what ideas fall out. I want to paint over a model-generated image with a brush I conjured from the color palette of my favorite photograph. I want to control these models with ideas and experiences plucked from my memory, not tokens and words.
Creation is iterated refinement in an infinite option space
(an excerpt from a text conversation)
I was talking earlier today about how you can view a creation process not as additive (starting with a sentence, and then adding another and another) but as iterated refinement and filtering through an infinite option space (there are infinitely many continuations of your first sentence. How do you choose the right one?) Because LLMs can explicitly compute a distribution over possible continuations, this is an interesting way to look at writing with an AI, and AI-augmented creation in general.
In the case of writing, an interesting UI for this could be: each paragraph you write becomes a "card", and the AI can place cards underneath a card as suggestions for alternative wordings, newly discovered images, links, etc. Cards kind of "peeking out" from under a paragraph might be a neat way to signal "there's something new here you should look at" without intruding on the writer's space when they haven't explicitly summoned help. It also makes possible another interface I've thought about, where you click on any paragraph's "stack of cards" and it unfurls to show you 3-4 alternatives to explore, with different stylistic treatments or clearer wordings.
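A minimal sketch of what that card model might look like, with a hypothetical `generate_variations` standing in for whatever language model call produces the alternative wordings (none of these names come from a real product or API):

```python
from dataclasses import dataclass, field

@dataclass
class Card:
    text: str
    # Alternative wordings, links, images, etc. "peeking out" beneath this card.
    suggestions: list["Card"] = field(default_factory=list)

def generate_variations(paragraph: str, n: int = 4) -> list[str]:
    """Hypothetical stand-in for a language model call that samples n
    alternative wordings of `paragraph`; swap in whatever model you use."""
    raise NotImplementedError

def refresh_suggestions(card: Card, n: int = 4) -> None:
    # Quietly refill the stack under a paragraph without touching the writer's text.
    card.suggestions = [Card(text=v) for v in generate_variations(card.text, n)]

def unfurl(card: Card) -> list[str]:
    # What the UI shows when the writer clicks a paragraph's stack of cards.
    return [s.text for s in card.suggestions]
```

Because each suggestion is itself a full Card, a variation can grow its own stack, which is the "iterated refinement" framing made literal.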
One of the critical things that Midjourney got right, as I learned from talking to David, was targeting prosumers and "enthusiastic non-professional" users first, over "pro" users. Pros may appear more lucrative in the beginning for a creative product, but they have ingrained workflows that they don't want to change. This early in the lifecycle of the technology and product, when rapid iteration is critical, Midjourney benefits enormously from a less "pro" user base whose workflows can evolve as quickly as the team can ship and experiment.
I imagine similar tailwinds benefit Replit, who seem to ship interesting future-facing ideas about how software is built faster than almost any other organization of any size.
An insight about recommender systems from someone whose name I regretfully can't recall at the moment:
Many recommender systems model interest as a precise, static region in representation space, so the algorithm's job becomes zooming in forever to ever-higher-resolution patches of that space to find exactly what the user wants. In reality, interests shift, and the recommender may influence the user's shifting interests in addition to being informed by them. So it makes more sense to model a recommendation algorithm as something that traverses a linked list or a branching tree evolving over time, rather than as a "zooming in forever" into some perfectly interest-aligned patch of the topic space.
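A toy sketch of that traversal framing, with hypothetical `InterestNode` and `recommend_step` names: the recommender takes a step through a graph of adjacent interests each session, rather than narrowing forever inside one fixed region.

```python
from dataclasses import dataclass, field

@dataclass
class InterestNode:
    """One neighborhood of the topic space, linked to adjacent interests."""
    topic: str
    neighbors: list["InterestNode"] = field(default_factory=list)

def recommend_step(current: InterestNode, engagement: dict[str, float]) -> InterestNode:
    """One traversal step: rather than zooming further into `current`,
    move toward whichever adjacent interest recent engagement points at.
    `engagement` maps topic -> recent signal strength (an assumed input)."""
    candidates = [current] + current.neighbors
    return max(candidates, key=lambda node: engagement.get(node.topic, 0.0))
```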
Otherworlds
I noted once that the most interesting potential for virtual/mixed reality wasn't to put yourself in a virtual office or on the ocean floor; it was that you could experience entirely different worlds with different physics, where time flows differently, where acoustics mutate as sound waves fly through the air. In VR, you could move through scales of experience, from nanometers to miles, as easily as you move a few feet through space in the real world.
I feel a similar sense of loss, of underexploration, around large generative models for images and text. We can use these models to render anything at all, tell any story at all, invent any language, create any soundscape ... and we use these dream machines mostly to render simulacra of reality with the details swapped around.
Endowed with the magic to immerse ourselves in worlds of our own making and languages of our own creation, we are so eager to rebuild worlds that already constrain us, speaking languages just as familiar as our own. We are given the power to imagine anything, and we imagine the here and now. Why?
All around us, there are other worlds blooming, if only we look a bit closer.
Ink & Switch and Anthropic produce far and away the best-written and best-produced research reports in computer science-related fields (as far as I can tell), and everyone else should strive to reach for the same level of presentation, accessibility, clarity, and depth.
Underrated fact about training in the very large regime: you don't have to worry about overfitting or early stopping, because single-epoch training is the default, and it turns out it's No Big Deal at all if you do a single-digit number of epochs on these huge AF overparameterized models!
Academic benchmark datasets on the order of tens of thousands of samples are annoying in this way.
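A toy sketch of what the single-pass regime looks like in code, assuming a PyTorch-style model whose forward pass returns a scalar loss for a tensor batch: one shuffled pass over the data, with no validation patience, no early-stopping counter, no "best checkpoint" bookkeeping.

```python
import torch
from torch.utils.data import DataLoader

def train_one_pass(model, dataset, lr=3e-4, batch_size=32, device="cpu"):
    """Single-epoch training: every example is seen exactly once.
    Assumes `model(batch)` returns a scalar loss; swap in your own forward/loss."""
    model.to(device)
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for batch in DataLoader(dataset, batch_size=batch_size, shuffle=True):
        loss = model(batch.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```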