QR codes for (sending and receiving) latent codes/coordinates in a generative model latent space?
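One way it could work, sketched in Python with the qrcode and numpy packages; the float16 quantization and the 512-dim latent are illustrative assumptions, not anything tied to a particular model:

```python
import base64
import numpy as np
import qrcode  # pip install qrcode[pil]

def latent_to_qr(z: np.ndarray, path: str = "latent.png") -> None:
    # Quantize to float16: a 512-dim latent is 1KB raw, ~1.4KB as base64,
    # comfortably under QR byte-mode capacity (~2.9KB at version 40).
    payload = base64.b64encode(z.astype(np.float16).tobytes()).decode("ascii")
    qrcode.make(payload).save(path)

def payload_to_latent(payload: str) -> np.ndarray:
    # The receiving end scans the QR back into the base64 string.
    return np.frombuffer(base64.b64decode(payload), dtype=np.float16)

z = np.random.randn(512)  # e.g. a StyleGAN-style w vector, as an example
latent_to_qr(z)
```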
A pattern language for expressive environments (an opinionated summary from an interface design perspective)
What are the signifiers that make a space recognizable? How can you “communicate” a living room, a church, or a casino with the fewest details?
- Decision points and choice-making, especially explicit, discrete decision trees to explore with obvious ways to backtrack, reified as "obstacles and passages."
- Readable affordances for the environment -- which doors can you open? Which obstacles can you interact with? Which roads can you actually take? Affordances that are legible from a distance can ease navigation in moments of urgency and keep the user/player in flow.
- Landmark/global signposting, wherein you orient the player globally in space with an always-visible large landmark that almost becomes a part of the geography, like a mountain or a tower or a light source.
- A brightly illuminated area or light source is a particularly effective global landmark, because humans avoid darkness and chase the light.
- Grand vistas and viewpoints can serve as effective points of tension release/milestones, as well as a way for the user to understand the global geography of a space and an invitation to explore.
- In-narrative boundaries, limits, and walls can be natural ways for the interface/game to signal invalid action spaces or off-limits areas. These usually take the form of topographic features like the shoreline of an island, the unscalable walls of a valley, etc.
- Alternatively, the infinity/unboundedness of procedurally generated worlds can be emphasized in-narrative for greater effect.
- Feel of embodied movement -- "Consider the game feel of your control system and how your environment accommodates for it. A first person perspective can be grounded with heavy footsteps and head bobbing, or disembodied and ghostlike."
- Walking, gliding, falling, floating upwards...
- Cozy, "home" environments/sites vs. the outdoors
Another DALL-E 2 prompt that has a high hit rate for that SoHo loft studio vibe:
high-ceilinged futuristic minimalist attic loft in SoHo, Manhattan studio, an artist painting a giant canvas that's a portal to another ethereal magical pastel universe, golden hour, digital art, dreamlike from Studio Ghibli, color-grading
Trivial for loop inner loop timings:
- Oak std.loop: 1.7µs
- Python 3.10 for i in range(...): 0.12µs
Python can iterate through all 32-bit ints in around 8 minutes; Oak can do it in around 2 hours. (A well-optimized compiled program should take no more than a few seconds.)
Seems right — the Oak-Python gap narrows with more complex programs.
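For reference, a quick way to reproduce the Python-side number with timeit from the standard library; the iteration count is arbitrary:

```python
import timeit

# Time a bare for loop. ~0.12µs/iteration on CPython 3.10 implies all
# 2**32 ints take 0.12e-6 * 2**32 ≈ 515s ≈ 8.6 minutes; Oak's 1.7µs/iter
# works out the same way to ~2 hours.
N = 10_000_000
total = timeit.timeit("for i in range(N): pass", globals=globals(), number=1)
print(f"{total / N * 1e6:.3f} µs per iteration")
```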
A few more prompts that I liked, for DALL-E 2:
Wide blurry shot on film, dreamlike motion blur, pristinely lit hands manipulating ethereal transparent retro computer displays, color graded like the film "Interstellar", warm-glow basement studio vibes, my favorite photograph from last night
Close-up shaky blurry shot on polaroid, dreamlike motion blur and light leaks, pristinely lit warehouse with friends conversing and thinking together on a humongous ethereal whiteboard, color graded like the film "Interstellar", warm-glow modern basement studio vibes
The function of the artist is thus quite clear: he must open a workshop, and there take in the world for repair, in fragments, as it comes to him.
- Francis Ponge
Keep seeing these narratives casting "traditional" software in some kind of battle with AI systems. The war is never between AI and software. The war is always humanity against entropy, and always will be. AI is just a better weapon to beat entropy.
Humanity against entropy. Consciousness against the void.
David Holz on Midjourney and the burgeoning space of AI art
Right now, it feels like the invention of an engine: like, you’re making like a bunch of images every minute, and you’re churning along a road of imagination, and it feels good. But if you take one more step into the future, where instead of making four images at a time, you’re making 1,000 or 10,000, it’s different. And one day, I did that: I made 40,000 pictures in a few minutes, and all of a sudden, I had this huge breadth of nature in front of me — all these different creatures and environments — and it took me four hours just to get through it all, and in that process, I felt like I was drowning. I felt like I was a tiny child, looking into the deep end of a pool, like, knowing I couldn’t swim and having this sense of the depth of the water. And all of a sudden, [Midjourney] didn’t feel like an engine but like a torrent of water. And it took me a few weeks to process, and I thought about it and thought about it, and I realized that — you know what? — this is actually water.
Right now, people totally misunderstand what AI is. They see it as a tiger. A tiger is dangerous. It might eat me. It’s an adversary. And there’s danger in water, too — you can drown in it — but the danger of a flowing river of water is very different to the danger of a tiger. Water is dangerous, yes, but you can also swim in it, you can make boats, you can dam it and make electricity. Water is dangerous, but it’s also a driver of civilization, and we are better off as humans who know how to live with and work with water. It’s an opportunity. It has no will, it has no spite, and yes, you can drown in it, but that doesn’t mean we should ban water. And when you discover a new source of water, it’s a really good thing.
I think we, collectively as a species, have discovered a new source of water, and what Midjourney is trying to figure out is, okay, how do we use this for people? How do we teach people to swim? How do we make boats? How do we dam it up? How do we go from people who are scared of drowning to kids in the future who are surfing the wave? We’re making surfboards rather than making water. And I think there’s something profound about that.
feeling very fragile this morning. myself. the world.
State of the art of text-to-speech (TTS) systems
- Current best FOSS solution is espeak
- Grapheme-to-phoneme using the T5 language model: T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion
- May be useful combined with Tacotron 2 for phoneme-to-speech.
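A minimal sketch of what that T5-based G2P step might look like with Hugging Face transformers; the checkpoint name is a placeholder, since I'm not aware of an official T5G2P release on the Hub:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL = "your-org/t5-g2p"  # hypothetical checkpoint; substitute real weights

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = T5ForConditionalGeneration.from_pretrained(MODEL)

def g2p(text: str) -> str:
    # Graphemes in, phonemes out (IPA or ARPAbet, depending on training data).
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# The phoneme string could then feed a phoneme-to-mel model like Tacotron 2,
# plus a vocoder, to get all the way to speech.
print(g2p("expressive environments"))
```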