At the risk of over-anthropomorphizing LMs, I think of the context window as field of view and retrieval into context as memory. You can keep expanding the field of view, but past a certain point you don't really need to if you just have good memory.
Something so charming and quaint about the feeling of joining a new social network early, like everybody's in the same room all kind of standing around awkwardly, trying to figure out why we're all here and what this is going to be.
I was thinking about how there isn't really an "executable" version of knowledge.
In software, we have declarative data (data notations like JSON and Edn, wire formats like Protobuf) and executable code (functions, expressions, procedures that operate on data). Both data and code are first-class materials we work with to describe software systems.
In the realm of less structured, more general knowledge work, though, it's not obvious that there is such a thing as "executable knowledge". Knowledge feels static and declarative -- a collection of statements of facts about the world.
|           | Declarative | Executable |
|-----------|-------------|------------|
| Software  | Data        | Programs   |
| Knowledge | Facts       | ???        |
One way to fill in this blank is to notice that programs are transformations on data: a program takes in some information and transforms it, modifying it or producing new information. By that logic, whatever belongs in the "executable knowledge" blank should enable transformations on knowledge: take in some statements about the world, and yield new ones. Another word for this is inference: we begin with some base of knowledge and infer what we didn't know before, expanding into the frontiers of knowledge.
It used to be that the only way to automate inference was using logic programming systems like Prolog or theorem provers, which required careful manual specification of known facts about the world, as well as an explicit enumeration of all the rules the system was allowed to use to derive conclusions from facts. Today, we have tools that feel more natural for humans to use, because they learn these inference rules of the world implicitly through observation and training.
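To make the contrast concrete, here is a toy sketch of that older style of inference: a hand-specified fact base plus an explicitly enumerated rule, applied until no new conclusions appear. This is plain Python rather than Prolog, and the facts and rule are invented for illustration, but it shows why manual specification gets laborious fast.

```python
# Hand-specified facts, as (relation, subject, object) tuples.
facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}

def rule_grandparent(facts):
    """If X is a parent of Y and Y is a parent of Z, then X is a grandparent of Z."""
    derived = set()
    for (r1, x, y) in facts:
        for (r2, y2, z) in facts:
            if r1 == "parent" and r2 == "parent" and y == y2:
                derived.add(("grandparent", x, z))
    return derived

# Forward chaining: apply the rule until a fixed point (no new facts).
while True:
    new = rule_grandparent(facts) - facts
    if not new:
        break
    facts |= new

print(("grandparent", "alice", "carol") in facts)  # → True
```

Every relation and every derivation rule has to be written out by hand; a language model, by contrast, absorbs a vast number of such rules implicitly from training data.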
One such tool is the GPT-style language model. These models can be plugged into conversational interfaces, where humans can give the model access to some base of knowledge and draw conclusions from it by asking questions, or by directly instructing the model to produce, say, a summary or an analysis of existing data.
But to me, GPT-style conversational models don't feel like the robust, well-structured runtime for thought that programs can be. They're a little too squishy and probabilistic (though there's ongoing work on more structured ways of prompting GPTs). When I want more structure and composability, I still think there's interesting potential in expressing inference steps as movements in the latent space of these models. Latent-space movements are just vector arithmetic, which opens up the possibility of expressing and manipulating complex chains of thought as crisp mathematical objects rather than soft human instructions. There are also hints of Lisp-like homoiconicity in the idea of expressing both ideas and transformations on ideas using the same type of data -- vectors in latent space.
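The classic example of an inference step as vector arithmetic is the word-analogy trick. A minimal sketch, with made-up 4-dimensional embeddings standing in for a real model's latent space (real embeddings have hundreds or thousands of dimensions):

```python
import math

# Toy "latent space": these vectors are invented for illustration only.
embed = {
    "king":  [0.9, 0.8, 0.1, 0.3],
    "queen": [0.9, 0.1, 0.8, 0.3],
    "man":   [0.1, 0.9, 0.1, 0.2],
    "woman": [0.1, 0.2, 0.9, 0.2],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# An inference step as arithmetic: king - man + woman ≈ queen.
thought = add(sub(embed["king"], embed["man"]), embed["woman"])
nearest = max(embed, key=lambda w: cosine(embed[w], thought))
print(nearest)  # → queen
```

The appeal is that `thought` is itself a vector in the same space as the ideas it was derived from -- the same homoiconicity noted above -- so chains of such steps compose by ordinary arithmetic.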
Natural language is a UI for navigating high-dimensional spaces.
— David Holz, apparently, via Gordon Brander
Knowledge is conversations integrated over time.
A conversation is the instantaneous rate of change of our collective knowledge.
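Read together, the two aphorisms above make a fundamental-theorem-of-calculus pair. As a playful formalization (the symbols are my own: K for collective knowledge, c for conversational activity):

```latex
% K(t): collective knowledge at time t; c(t): conversational activity.
K(t) = \int_{-\infty}^{t} c(\tau)\, d\tau
\qquad\Longleftrightarrow\qquad
c(t) = \frac{dK}{dt}
```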
Maybe the lesson is that no matter where you are, you have to fight against disorder. You should choose a place where the fight makes sense.
Optimizing for generative chat
One attribute of human conversations that makes them really creative and surprisingly generative is their inherent nondeterminism. Interesting conversations don't simply connect point A to point B or solve a single problem start to finish; they go through twists and turns from disturbances in the environment — people passing by, things that inexplicably pop into someone's mind from earlier that week. Some of my most generative conversations with friends and coworkers happen when we're not conversing to solve a problem or arrive at some conclusion together, but just going back and forth, bouncing ideas around, perturbed by randomness in the ambient environment.
This conversation style seems antithetical to the advanced chatbots proliferating today, which are RLHF-optimized for the task of "being helpful". Helpfulness is a useful trait if we want a conversational interface to problem-solving tools, but conversations are useful for so much more. Anecdotally, good conversations are the most creative, generative activity in my personal work. Nearly all of what I'd consider my "best ideas" come from conversations with other people, many of them unplanned epiphanies or connections we stumble into in the winding course of "just catching up".
Maybe in addition to building helpful chatbots, we should also try to build creative ones: less pristine and directed, a bit more here-and-there, capable of picking random things up from memory and having serendipitous epiphanies.
My personal chatbot is built on the Cosmo-XL language model, which was trained on natural human dialogue rather than optimized for helpfulness, and it often feels closer to the natural, generative conversations I have with friends than ChatGPT or Claude do; those feel like overly obedient but hopelessly unimaginative assistants. That's not to say I don't find ChatGPT and Claude useful (obviously, they are — they're optimized to be!). But sometimes, what I want isn't the One True Answer, but, you know, just bouncing ideas around.