Life update: I'm joining Notion!
My "please email, don't use Twitter DM" shirt is raising a lot of questions that are answered by my "please email, don't use Twitter DM" shirt.
The best Taylor Swift song is still Love Story, and the best performance of it is still from the 1989 World Tour.
There’s a story I’ve been wanting to write, but currently don’t have time for. It’s still mostly a scattered cloud of ideas and motifs, so I’m going to write them here in case I want to come back to it when I have time to write some fiction.
I'm fascinated by dreams:
- Dreams as something that makes us human, but as something that can also be transhumanist: technology letting us relive dreams, engineer dreams, and interpret dreams as if prophetic and magical
- Chasing Dreams (a kind of goal-directed behavior) as key to agenthood and intelligence
I'm also interested in exploring these other ideas:
- Escape, independence, freedom as fundamental agentic desires, even beyond humanity
- Cloning and identity. What becomes of identity in a world where the marginal cost of cloning (biological clones, content clones) goes to zero?
- One speculation is to think of the self as a kind of autonomous cloud of distributed agents working together (consciously or not) to accomplish shared goals, rather than a single body.
This idea came to me, ironically, in a dream about an emancipated humanoid robot learning to contend with the concept of identity in a world full of clones of itself. A bit like The Bicentennial Man, but more cyberpunk.
A wise mentor (sadly can't recall who) once told me that good research, above all, asks an interesting question and teaches us something new and meaningful about the world.
I always found that last bit a useful guiding principle: no matter the results or the benchmarks or the numbers and feats in the paper, if the work doesn't reveal something new and interesting about the ideas or mechanisms it studies, it feels less like research and more like engineering.
Latent representations are just semantically sensitive, potentially reversible hash functions.
Impossible to overstate the technical complexity involved in building a simple rich text editor that works well. I'm constantly surprised by how deceptively simple it seems from the outside, and how even extremely well-resourced teams with very talented people often discover latent bugs or misalignments between "what should make sense" and "what people seem to expect the thing to do in the real world."
A key tension when engineering with deep learning systems is balancing composability with performance. In general, when not bottlenecked by compute or data, DL systems perform best when trained end-to-end on a suitable objective. But intermediate features of the learned model are hard to interpret, and because these intermediate features are not intended to be consumed by humans, it's hard to integrate many such systems into a larger whole in a way that lets humans reason about the behavior of the integrated whole as reliably as they can about “classic” software systems with well-defined interfaces between modular parts.
For example, Tesla's autonomous driving system has hundreds of subsystems trained semi-independently (same model backbone shared for efficiency, but different training objectives for downstream models). These subsystems perform specific, human-legible tasks like “identify drivable areas” or “locate pedestrians in the scene” which feed into subsequent tasks. This makes debugging and maintenance easier, but might cap system performance for tasks that require passing ambiguity or other more nuanced state between models.
Language model cascades are a notable exception: using chains of thought to force the model to “think” through human-interpretable intermediate states (generated text) seems to improve performance on many tasks compared to a baseline of direct prompting (though such chains still lose to direct optimization against a large supervised dataset). LLM chains also improve debugging and interpretability, and in general make “engineering with NLP models” more tractable.
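To make the "human-interpretable intermediate states" idea concrete, here is a minimal sketch of a chain in Python. Everything in it is an assumption for illustration: `run_chain`, the step templates, and `toy_llm` (a stand-in that just echoes its prompt) are hypothetical names, not any real library's API. The only point is that each step hands plain text to the next, so the whole trace can be read and debugged by a person.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model call: any function from prompt text to output text.
LLM = Callable[[str], str]


def run_chain(llm: LLM, question: str, step_templates: List[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Run each step in order, feeding the previous step's text output into the next prompt.

    Returns the final output plus a trace of (prompt, output) pairs, one per step.
    """
    trace: List[Tuple[str, str]] = []
    state = question
    for template in step_templates:
        # Each template may refer to the original question and the previous step's text.
        prompt = template.format(question=question, previous=state)
        state = llm(prompt)
        trace.append((prompt, state))
    return state, trace


def toy_llm(prompt: str) -> str:
    # Placeholder "model" (an assumption): wraps the prompt so the data flow is visible.
    return f"[model output for: {prompt}]"


if __name__ == "__main__":
    steps = [
        "List the facts needed to answer: {question}",
        "Using these notes, answer the question. Notes: {previous}",
    ]
    final, trace = run_chain(toy_llm, "Why is the sky blue?", steps)
    for prompt, output in trace:
        print("PROMPT:", prompt)
        print("OUTPUT:", output)
```

With a real model swapped in for `toy_llm`, the trace is where the debugging happens: if the final answer is wrong, you can read the intermediate text and see which step went off the rails.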
Composition and modularity are key to how we maintain large classic software systems, and it seems noteworthy that (1) deep learning systems in general push against this practice and (2) LLM programs can be composed from modular pieces without sacrificing power. This is not to diminish the importance of work in AI interpretability, though, and I think there are many valuable advances ahead in how we convert opaque learned features in end-to-end trained systems to human-legible ideas.
The latter is also the subject of some of my current work: How can we take intermediate features of generative models, render them legible to humans, and then let us use them to further control and refine existing models?
What is the self-driving car of NLP?
Autonomous driving is a landmark problem in computer vision, perhaps the real-world problem, as @geohot from Comma says often:
Self-driving cars are still the coolest applied AI problem today.
I think it’s worth thinking about what such an applied, real-world machine learning problem would be for natural language understanding and text generation. My hypothesis is that a good candidate for this “self-driving car of NLP” is open-domain, abstractive question answering, wherein a human uses an AI system to synthesize a natural language answer to some knowledge-based question, based on a large corpus of diverse documents, only some of which contain information relevant to answering the question.
Natural language web search is the most ambitious form of this problem, but a more tractable target might be to solve a similar problem for organizations with lots of private knowledge — chat histories, emails, planning documents, feedback surveys, paperwork, contacts, and on and on and on — synthesizing answers to questions like "Do I know any recruiters working at a biotech company?" or "Was there any update about our deal with X from last week's board meeting?" or, even higher level, "What are the most common customer complaints we have about Y feature?"
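As a rough sketch of the shape of this problem (not a real solution to it), the usual framing is two stages: retrieve the few documents that seem relevant, then synthesize an answer from them. The Python below assumes a toy word-overlap retriever and a caller-supplied `synthesize` function standing in for a generative model; `tokenize`, `retrieve`, `answer`, and the sample corpus are all hypothetical.

```python
import re
from typing import Callable, List


def tokenize(text: str) -> set:
    # Lowercased alphanumeric word set; a crude stand-in for real text understanding.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return up to k documents sharing at least one word with the question, best overlap first."""
    q = tokenize(question)
    scored = [(len(q & tokenize(doc)), i, doc) for i, doc in enumerate(corpus)]
    relevant = [s for s in scored if s[0] > 0]
    # Sort by overlap (descending), breaking ties by original position.
    relevant.sort(key=lambda s: (-s[0], s[1]))
    return [doc for _, _, doc in relevant[:k]]


def answer(question: str, corpus: List[str],
           synthesize: Callable[[str, List[str]], str], k: int = 2) -> str:
    """Retrieve candidate passages, then hand them to a (hypothetical) answer synthesizer."""
    passages = retrieve(question, corpus, k)
    return synthesize(question, passages)


if __name__ == "__main__":
    corpus = [
        "Board meeting notes: the deal with X is delayed.",
        "Lunch menu: pasta and salad.",
        "Customer complaints about Y feature: export is slow.",
    ]
    question = "Was there any update about our deal with X from last week's board meeting?"
    # Stub synthesizer (an assumption): a real system would generate an abstractive answer here.
    print(answer(question, corpus, lambda q, ps: " | ".join(ps), k=1))
```

Nearly all of the hard, unsolved part lives inside the two pieces this sketch fakes: retrieval that understands meaning rather than word overlap, and synthesis that produces a faithful answer rather than stitching passages together.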
Both autonomous driving and ODQA:
- are challenging and unsolved technical problems, where solving them perfectly requires fully general intelligence and would advance the state of the art in many related research domains
- meet clear and lucrative market needs with willingness to pay (== willingness for the market to finance R&D necessary to push on the problem)
- are hot problem spaces with many initial players, but probably few real winners in the end, mostly driven by superiority in data, compute ($$), and research capabilities
- automate real-world activities that most humans perform often in their daily lives, so that solving them would dramatically improve most people’s lives and save humans lots of labor
In particular, drawing on my personal experience as well as the opinions of experts I’ve spoken to, I believe the market is dramatically underestimating the unsolved technical challenges standing between us today and a long-term solution to this problem.