Media-native programming languages

Modern programming languages are very good at handling strings. Not only do they have a built-in representation of strings in the common string "type", they also have built-in support (in the language itself or in standard libraries) for searching within strings, comparing them, slicing them, combining them, and various other useful operations. As a result, most software we use today expects us to enter text data. It speaks the language of "text".
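
To make that concrete, here is a small sketch of those operations in Python, using only the language and its standard library. The sample sentence and variable names are made up for illustration; any modern language with a native string type would look much the same.

```python
# Everyday string operations, all available without third-party libraries.
import re

note = "Meet Ada at 10:30, then call Grace at 14:00."

# Searching: substring test, position lookup, and pattern matching.
has_ada = "Ada" in note
grace_at = note.find("Grace")
times = re.findall(r"\d{2}:\d{2}", note)

# Comparing: case-insensitive equality.
same_name = "ADA".casefold() == "ada".casefold()

# Slicing: take the first four characters.
first_word = note[:4]

# Combining: join the extracted pieces into a new string.
summary = " and ".join(times)

print(has_ada, same_name, first_word, summary)
```

None of this needs setup, a model download, or a separate service. It is simply there.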

By contrast, modern tools handle images and audio only reluctantly. Images and audio are the native I/O types of the human mind, if you will -- they're much higher-bandwidth, and much closer to "the organs" even if they're farther from "the metal" of the computer.

What if we could build into programming languages the same capabilities for working with rich media as we have for strings? What if OCR and speech-to-text, seeking and searching for objects or strings within video, photos, and audio, were all as easy as photo.findAll(:car) or audio.transcribe({ lang: 'en_us' }), built into your compiler? I think it would usher in a whole new age of software tools that let us interact with them in richer, more organic ways. If reading text from an image were as easy as reading text out of a binary buffer, how many more tools would let us take pictures to capture information?
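
Here is a rough sketch of what the calling side of something like photo.findAll(:car) might feel like, written in Python. Everything in it is hypothetical: the Photo and Region types and the find_all method are invented for illustration and do not exist in Python or in any library I know of. The recognized regions are supplied by hand, standing in for the work a media-native runtime would do on the actual pixels.

```python
# Hypothetical sketch: none of these types exist in Python or any real library.
from dataclasses import dataclass, field


@dataclass
class Region:
    label: str   # what was recognized, e.g. "car"
    box: tuple   # (left, top, right, bottom) in pixels


@dataclass
class Photo:
    # In a media-native language the runtime would compute these from pixels;
    # here they are supplied by hand so the sketch stays runnable.
    regions: list = field(default_factory=list)

    def find_all(self, label):
        """Return every recognized region carrying the given label."""
        return [r for r in self.regions if r.label == label]


photo = Photo(regions=[
    Region("car", (10, 40, 120, 90)),
    Region("tree", (130, 0, 180, 100)),
    Region("car", (200, 45, 310, 95)),
])

cars = photo.find_all("car")
print(len(cars))  # prints 2
```

The point is the shape of the call, not the implementation: searching a photo reads just like searching a string.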

You might say, "this sounds like a huge amount of complexity, Linus! No sane PL would ever do this!" But we've done this for text, because the tradeoffs are worth it -- Go ships out of the box with rich support for full UTF-8 text. Languages weren't always like this. C, for example, has no native string type -- it works with bytes and characters, in the same way that current programming languages work with pixels and audio file buffers.

I submit to you: it doesn't have to be this way! We can create a world where we can program with rich visual and sonic information with the same ease with which we work with text. That day can't come soon enough.