Linus's stream

The billion-dollar idea:

I think large generative models can become much more controllable/predictable with the right interfaces. Generative models are essentially large databases + really effective search algorithms over them, so the right interface is a good search/navigation interface over the model's latent space.
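
To make that a little more concrete, here's a rough sketch of what "navigating latent space" could look like: embed a small corpus, then walk between two anchor concepts and surface the nearest stored entry at each waypoint. The encoder and the toy sentences here are just placeholders for the idea, not anything I've actually built:

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer('all-MiniLM-L6-v2')

corpus = [
    'Generative models memorize and remix what they were trained on.',
    'A model is only as useful as the interface we have into it.',
    'Search is the primitive underneath most knowledge tools.',
    'Interpolating between latent vectors blends two ideas.',
]
corpus_vecs = encoder.encode(corpus, normalize_embeddings=True)

start = encoder.encode('a database of human knowledge', normalize_embeddings=True)
end = encoder.encode('an interface for navigating ideas', normalize_embeddings=True)

# Walk from one concept toward the other; at each waypoint, "search" the
# corpus by cosine similarity (vectors are unit-normalized, so dot = cosine).
for t in np.linspace(0, 1, 5):
    waypoint = (1 - t) * start + t * end
    waypoint = waypoint / np.linalg.norm(waypoint)
    best = int(np.argmax(corpus_vecs @ waypoint))
    print(f'{t:.2f} -> {corpus[best]}')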

Sometimes I feel bottlenecked by I/O (how much I'm reading/writing) and sometimes by data (knowing what to do), but right now I'm feeling severely bottlenecked by compute (just being able to execute on even 10% of the things I want to try prototyping/researching/executing).

I think that's a good thing? But man, an extra brain would be really useful right now! Too many thoughts, not nearly enough brain cells.

The world is adrift between two worldviews:

  • To engineer scarcity into everything
  • To engineer scarcity out of everything

I think the latter is a much more optimistic mission -- abundance over efficiency.

Downloaded OpenWebText today for some from-scratch language model (pre)training experiments! It's a bunch of small .xz files that unzip to more small .xz files, so I ended up writing a little script to automate all the folder-creating and unzipping, with a nice in-place-updating progress meter in the terminal:

std := import('std')
str := import('str')
fs := import('fs')
fmt := import('fmt')
debug := import('debug')

xzFiles := fs.listFiles('.') |> std.filter(fn(f) f.name |> str.endsWith?('.xz'))

xzFiles |> with std.each() fn(f, i) {
	name := f.name
	dirname := name |> str.trimEnd('_data.xz')
	print('\x1b[0F\x1b[2K\x1b[0G') // erase previous line
	fmt.format('Unzipping {{0}}/{{1}} {{2}}', i, len(xzFiles), f.name) |> print()

	mkdir(dirname) // assume infallible
	evt := exec('tar', ['-xf', name, '-C', dirname], '')
	if evt.status != 0 -> {
		fmt.printf('Error: {{0}}', evt.stderr)
		exit(evt.status)
	}
}

Somewhere between 1B and 5B parameters, transformer-based language models go from interesting to intelligent to insightful. Currently training a 3B model (t5-3b) after having worked for a while with a sub-1B one (t5-large) -- the difference is palpable.
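
For scale reference, a quick way to feel the jump between those two checkpoints is just to load the Hugging Face weights and count parameters (a sketch, not part of my training setup; t5-3b is a hefty download):

from transformers import AutoModelForSeq2SeqLM

# t5-large sits a bit under 1B parameters, t5-3b a bit under 3B.
for name in ('t5-large', 't5-3b'):
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f'{name}: {n_params / 1e9:.2f}B parameters')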

A good DALL-E 2 prompt, I promise:

Soft, warm-glow holographic reality: a cloud of small lines of neatly organized luminous text filling the space around him like speech bubbles, connecting alternate possibilities in words, floating around a student's head as he stands thinking with hands extended out in a busy but cozy candlelit workshop. Wide shot on Hasselblad Mark II, photographed from behind. Firefly swarm vibes.

Thinking like playing with clay; incrementally molding a form rather than micro-assembling.

I really like this framing of a "keyboard" for latent space navigation, vs. the generic "input method" term I've been using to think about this problem.

I’m wondering what a keyboard would look like where text is manipulated on the dimension of ideas instead of characters. @thesephist's demo at the @betaworks tools-for-thought conference felt like an evolution of writing.

- @JohannesMutter

Are yet-unknown ideas and yet-undiscovered facts in between the known or outside of the known?

What should those even mean, conceptually?

Should we look closer, or should we look farther?

I need to get as fast at prototyping and validating deep learning models as I am at prototyping and validating web apps. This requires investment in:

  • Knowing/understanding a few tools deeply and intimately
  • Understanding core concepts deeply
  • Investing in custom tooling and infrastructure where it makes sense