Tools for the million-call language model chain future

There is no shortage of tools and startups that want to help people write "language model chains" or "cascades". These tools stitch multiple calls to large language models together to perform complex high-level tasks. Beyond adding capability, making reasoning steps explicit can also make language model programs more interpretable and easier to debug.
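To make the idea concrete, here is a minimal sketch of a two-call chain in Python. `call_llm`, the prompts, and the task are placeholders of my own, not any particular library's API; the point is that each intermediate output is a plain value the programmer can log or inspect, which is what makes the reasoning steps explicit.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an HTTP request to a provider)."""
    raise NotImplementedError("wire this up to your model provider")

def summarize_and_critique(document: str) -> str:
    # Step 1: ask the model to summarize the input document.
    summary = call_llm(f"Summarize the following document:\n\n{document}")
    # Step 2: feed the first output back into a second call, so the
    # intermediate reasoning step is explicit and independently debuggable.
    critique = call_llm(
        "List factual claims in this summary that should be "
        f"double-checked:\n\n{summary}"
    )
    return critique
```

Even at two calls, the structure is visible; the tools discussed below are about what happens when this pattern scales by several orders of magnitude.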

LLM chaining tools today accommodate on the order of 10-100 calls to language models, and toward the higher end of that range, their interfaces already become unwieldy. But it seems likely to me that, as LLM inference cost and latency fall, the average number of LLM calls in a language model cascade is going to grow superlinearly, perhaps exponentially, bottlenecked only by inference compute cost and the quality of tooling.

Tools that work for 100-call model cascades are going to look very different from those designed for 1M-call model cascades, analogous to how programming languages and environments for MHz-range computers look very different from the languages and tools built for modern multi-core GHz-range machines. I think this is a forward-looking problem worth thinking about: What kinds of tools do we need to enable future language model programs with millions of calls to multi-modal, large generative models?