Composable reasoning

A key tension when engineering with deep learning systems is balancing composability with performance. In general, when not bottlenecked by compute or data, DL systems perform best when trained end-to-end on a suitable objective. But the intermediate features of the learned model are hard to interpret, and because those features were never intended to be consumed by humans, it's hard to integrate many such systems into a larger whole in a way that lets humans reason about the integrated system's behavior as reliably as they can about “classic” software built from modular parts with well-defined interfaces.

For example, Tesla's autonomous driving system has hundreds of subsystems trained semi-independently (a shared model backbone for efficiency, but different training objectives for the downstream models). These subsystems perform specific, human-legible tasks like “identify drivable areas” or “locate pedestrians in the scene,” whose outputs feed into subsequent tasks. This makes debugging and maintenance easier, but might cap system performance on tasks that require passing ambiguity or other more nuanced state between models.
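
To make that pattern concrete, here is a minimal sketch in PyTorch (not Tesla's actual architecture; the head names and shapes are illustrative) of a shared backbone feeding independently supervised, human-legible task heads:

```python
import torch
import torch.nn as nn

class PerceptionStack(nn.Module):
    def __init__(self):
        super().__init__()
        # One backbone shared across all tasks for efficiency.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Separate heads, each trained semi-independently on its own objective.
        self.drivable_area = nn.Conv2d(64, 1, 1)  # per-pixel "drivable" score
        self.pedestrians = nn.Conv2d(64, 1, 1)    # per-pixel pedestrian score

    def forward(self, image):
        features = self.backbone(image)
        # Each head emits a human-legible intermediate output that downstream
        # planning code consumes, rather than raw learned features.
        return {
            "drivable_area": torch.sigmoid(self.drivable_area(features)),
            "pedestrians": torch.sigmoid(self.pedestrians(features)),
        }

# Example: one 128x128 RGB frame in, two named, interpretable maps out.
outputs = PerceptionStack()(torch.randn(1, 3, 128, 128))
```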

Language model cascades are a notable exception: with chains of thought, forcing the model to “think” through human-interpretable intermediate states (generated text) seems to improve performance on many tasks compared to a baseline of direct prompting (though these cascades still lose to direct optimization against a large supervised dataset). LLM chains also improve debugging and interpretability, and in general make “engineering with NLP models” more tractable.
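
As a rough illustration (not any particular library's API; `generate` below is a hypothetical stand-in for whatever completion call you use), a two-step chain might look like this:

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in: call an LLM and return its text completion."""
    raise NotImplementedError  # swap in your model or API client of choice

def answer_with_reasoning(question: str) -> str:
    # Step 1: ask the model to reason in plain text. This intermediate state
    # is human-readable, so it can be logged, inspected, and debugged.
    reasoning = generate(f"Think step by step about the question: {question}")

    # Step 2: a second call consumes that text and produces the final answer.
    # The two steps compose through a legible interface: plain text.
    return generate(
        f"Question: {question}\nReasoning: {reasoning}\nFinal answer:"
    )
```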

Composition and modularity are key to how we maintain large classic software systems, and it seems noteworthy that (1) deep learning systems in general push against this practice and (2) LLM programs can be composed from modular pieces without sacrificing power. This is not to diminish the importance of work in AI interpretability, though; I think there are many valuable advances ahead in how we convert opaque learned features in end-to-end trained systems into human-legible ideas.

The latter is also the subject of some of my current work: how can we take intermediate features of generative models, render them legible to humans, and then use them to further control and refine existing models?