Optimizing for generative chat

One attribute of human conversations that makes them really creative and surprisingly generative is that there's an inherent kind of nondeterminism to them. Interesting conversations don't simply connect point A to point B or solve a single problem start to finish, but go through twists and turns from disturbances in the environment: people passing by, things that inexplicably pop into someone's mind from earlier that week... Some of my most generative conversations with friends and coworkers happen when we're not conversing to solve a problem or arrive at some conclusion together, but just going back and forth, bouncing ideas around, perturbed by randomness in the ambient environment.

This kind of conversation style seems antithetical to the kinds of advanced chatbots we see proliferating today, which are RLHF-optimized for the task of "being helpful". Helpfulness is a useful trait if we want a conversational interface to problem-solving tools. But conversations are useful for so much more. Anecdotally, good conversations are the most creative, generative activity in my personal work. Nearly all of what I'd consider my "best ideas" come from conversations with other people, and many of them are unplanned epiphanies or connections we stumble into in the winding course of "just catching up".

Maybe in addition to building helpful chatbots, we should also try to build creative ones that are less pristine and directed, and a bit more here-and-there, capable of picking random things up from their memory and having serendipitous epiphanies.

My personal chatbot is built on the Cosmo-XL language model, which was trained on natural human dialogue rather than optimized for helpfulness. Talking to it often feels closer to a natural, generative conversation I would have with friends than talking to ChatGPT and Claude does; those feel like overly obedient but hopelessly unimaginative assistants. That's not to say that I don't find ChatGPT and Claude useful (obviously, they are; they're optimized to be!). But sometimes, what I want isn't the One True Answer, but, you know, to just bounce ideas around.