David Holz on Midjourney and the burgeoning space of AI art
Right now, it feels like the invention of an engine: like, you’re making like a bunch of images every minute, and you’re churning along a road of imagination, and it feels good. But if you take one more step into the future, where instead of making four images at a time, you’re making 1,000 or 10,000, it’s different. And one day, I did that: I made 40,000 pictures in a few minutes, and all of a sudden, I had this huge breadth of nature in front of me — all these different creatures and environments — and it took me four hours just to get through it all, and in that process, I felt like I was drowning. I felt like I was a tiny child, looking into the deep end of a pool, like, knowing I couldn’t swim and having this sense of the depth of the water. And all of a sudden, [Midjourney] didn’t feel like an engine but like a torrent of water. And it took me a few weeks to process, and I thought about it and thought about it, and I realized that — you know what? — this is actually water.
Right now, people totally misunderstand what AI is. They see it as a tiger. A tiger is dangerous. It might eat me. It’s an adversary. And there’s danger in water, too — you can drown in it — but the danger of a flowing river of water is very different to the danger of a tiger. Water is dangerous, yes, but you can also swim in it, you can make boats, you can dam it and make electricity. Water is dangerous, but it’s also a driver of civilization, and we are better off as humans who know how to live with and work with water. It’s an opportunity. It has no will, it has no spite, and yes, you can drown in it, but that doesn’t mean we should ban water. And when you discover a new source of water, it’s a really good thing.
I think we, collectively as a species, have discovered a new source of water, and what Midjourney is trying to figure out is, okay, how do we use this for people? How do we teach people to swim? How do we make boats? How do we dam it up? How do we go from people who are scared of drowning to kids in the future who are surfing the wave? We’re making surfboards rather than making water. And I think there’s something profound about that.
feeling very fragile this morning. myself. the world.
State of the art of text-to-speech (TTS) systems
- Current best FOSS solution is eSpeak
- Grapheme-to-phoneme using the T5 language model: T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion
- May be useful combined with Tacotron 2 for phoneme-to-speech.
Yet another prime DALL-E 2 prompt:
brilliant cloud of knowledge anaphors, books, tools suspended in a chaotic virtual reality space in the style of tristan eaton, victo ngai, artgerm, rhads, ross draws, cinematic evening golden hour light, wide anamorphic shot
Two life hacks:
- Being really fucking good at something can get you very, very far -- with no other tricks.
- Introduce variance into your life. High variance life with good downside protection == more luck.
I think a profound and underrated property of whatever AGI we'll create is that, at least in the bootstrap phase, it will have been built atop the human experience: our writing, our visions, our soundscapes. It will be a beacon that shines our existence into the distance of time.
The billion-dollar idea:
I think large generative models can become much more controllable/predictable with the right interfaces. A generative model is essentially a large database plus a really effective search algorithm over it, so the right interface is a good search/navigation interface over its latent space.
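One concrete primitive such a navigation interface would need is a way to move smoothly between two points in latent space. A minimal sketch in Python, using only NumPy — the `decode` function here is a hypothetical placeholder standing in for a real generative model's decoder, and the dimensions/names are illustrative, not from any particular model:

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two latent vectors.
    Sweeping t from 0 to 1 traces a smooth path through latent
    space -- one basic move a navigation interface could expose."""
    z0n = z0 / np.linalg.norm(z0)
    z1n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return z0  # vectors are (nearly) parallel; nothing to interpolate
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

# Hypothetical stand-in for a generative model's decoder: any function
# mapping a latent vector to an output (image, audio, text) slots in here.
def decode(z):
    return z.sum()  # placeholder

rng = np.random.default_rng(0)
z_a = rng.normal(size=512)  # two "search results" in latent space
z_b = rng.normal(size=512)

# A path of intermediate latents the user could scrub through,
# decoding each step to browse the space between two outputs.
path = [slerp(z_a, z_b, t) for t in np.linspace(0, 1, 8)]
outputs = [decode(z) for z in path]
```

Spherical rather than linear interpolation is the usual choice here because samples from high-dimensional Gaussian latents concentrate near a sphere, so straight lines cut through low-density regions.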
Sometimes I feel bottlenecked by I/O (how much I'm reading/writing) and sometimes by data (knowing what to do), but right now I'm feeling severely bottlenecked by compute (just being able to execute on even 10% of the things I want to try prototyping/researching/executing).
I think that's a good thing? But man, an extra brain would be really useful right now! Too many thoughts, not nearly enough brain cells.
The world is adrift between two worldviews:
- To engineer scarcity into everything
- To engineer scarcity out of everything
I think the latter is a much more optimistic mission -- abundance over efficiency.
Downloaded OpenWebText today for some from-scratch language model (pre)training experiments! It's a bunch of small .xz files that unzip to more small .xz files, so I ended up writing a little script to automate all the folder-creating and unzipping, with a nice in-place-updating progress meter in the terminal:
std := import('std')
str := import('str')
fs := import('fs')
fmt := import('fmt')

// every OpenWebText shard in this directory ends in .xz
xzFiles := fs.listFiles('.') |> std.filter(fn(f) f.name |> str.endsWith?('.xz'))

xzFiles |> with std.each() fn(f, i) {
    name := f.name
    dirname := name |> str.trimEnd('_data.xz')

    // redraw the progress meter in place: move the cursor up a line,
    // clear it, and return to column 0 before printing the next update
    print('\x1b[0F\x1b[2K\x1b[0G')
    fmt.format('Unzipping {{0}}/{{1}} {{2}}', i + 1, len(xzFiles), f.name) |> print()

    mkdir(dirname) // assume infallible
    evt := exec('tar', ['-xf', name, '-C', dirname], '')
    if evt.status != 0 -> {
        fmt.printf('Error: {{0}}', evt.stderr)
        exit(evt.status)
    }
}