Linus's stream

Today I learned while digging through my /usr/include that my local Fedora install comes with a whopping 1.2M lines of C headers in /usr/include. That seems like a lot? Yeah?

$ cat /usr/include/**/*.h | wc
1208974 4871654 44110606

44MB of source text!

Sorting in Lisp

As I've been porting Klisp to Oak for fun this week, I wrote a couple of sorting algorithms that I think came out very elegant in Lisp:

(defn quicksort (xs)
      (if (< (size xs) 2)
        xs
        (let (pivot (nth xs (int (/ (dec (size xs)) 2)))) ; heuristic: pick midpoint
          (-> (quicksort (filter xs (partial (< ? pivot))))
              (append (filter xs (is pivot)))
              (append (quicksort (filter xs (partial (> ? pivot)))))))))
(defn merge (left right)
      (cond ((empty? left) right)
            ((empty? right) left)
            ((do
               (def l (first left))
               (def r (first right))
               (if (<= l r)
                 (conj l (merge (rest left) right))
                 (conj r (merge left (rest right))))))))

(defn mergesort (xs)
      (if (< (size xs) 2)
        xs
        (let (split-at (int (/ (size xs) 2)))
          (merge (mergesort (take xs split-at))
                 (mergesort (drop xs split-at))))))

Code is a poor interface for interactivity

As I was building Oak Notebook, I thought a lot about interactive documents and what makes me excited about improving their state of the art.

An often-cited category of interactive documents is programming notebooks like Jupyter, Observable, and Nextjournal. They're very, very cool, and they've inspired me in my own work. But I think they share one big flaw in their underlying document model: code is a poor interface for interactivity.

What I mean is that programming notebooks like those use writing and executing code in a general-purpose language as the main way for the reader to control visualizations and data displays. This excludes non-programmers from being able to really understand and work with these documents, and more important, even for programmers, reading and understanding code takes effort. Code is poor notation for things that aren't software, and code as interface hides what's really going on.

Oak Notebook, despite its many flaws, uses well-known and simple interactive input widgets and avoids this pitfall, at least for the simplest 20% of needs that cover 80% of use cases. Kevin Kwok's Carbide is another workaround for this pitfall that I'm excited about — making code as notation easier to interact with by augmenting it with interactive widgets and autodiff.

Cute little command line one-liner to count the number of decimal digits in a number:

echo "9,223,372,036,854,775,807" | \
    oak eval "stdin |> std.filter(str.digit?) |> len()"

(That number is INT64_MAX, if you are curious.)

I had to write a DOM TreeWalker today for a personal project that was written in Oak, and it's surprising how pleasant it was to write. I'd say it was the same or better than using the same DOM API from JavaScript.

// headingsList takes some container <div> and returns a list of all headings
// within that section of the page, in order
fn headingsList(div) {
	walker := document.createTreeWalker(
		div
		NodeFilter.SHOW_ELEMENT
		{
			acceptNode: fn(el) if el.tagName {
				'H1', 'H2', 'H3', 'H4', 'H5', 'H6' -> NodeFilter.FILTER_ACCEPT
				_ -> NodeFilter.FILTER_SKIP
			}
		}
	)

	headings := []
	with loop() fn(_, break) if el := walker.nextNode() {
		? -> break(headings)
		_ -> headings << {
			level: int(el.tagName.1)
			text: el.textContent
		}
	}
}

This is mostly due to two things, I think:

  1. The recently finalized std.loop API is really nice for generic unbounded loops.
  2. This semi-recent change to oak build --web codegen made JS methods very nice to call from Oak.

Oak now has a blog! Huzzah!

There is always too much to write. And too much to read. And too much to make. And too much to feel.

Everything demands to be heard now. And felt now. And thought about now. And it feels as if everything that I can't fit into this hopelessly infinitesimal slice of time called now screams at me of injustice, of being ignored, of being stolen from their deserved chance at skewing ever so slightly who I am in the moment after now.

Benchmarking fib(n) in Oak

I'm in a benchmarking mood today. So I implemented the (naive, not-memoized) Fibonacci function in both Oak and Python (the language it's probably most comparable to in performance) and ran them through some measurements.

Here's the Oak version:

fn fib(n) if n {
	0, 1 -> 1
	_ -> fib(n - 1) + fib(n - 2)
}

print(string(fib(34)) + '\n')

And here's the Python version, a direct port of the Oak one:

def fib(n):
    if n == 0 or n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

print(fib(34))

I also made a Node.js version compiled from the Oak implementation, with oak build --entry fib.oak -o out.js --web.

With the magic of Hyperfine, we see ...

Benchmark 1: node out.js
  Time (mean ± σ):     259.4 ms ±   1.9 ms    [User: 252.6 ms, System: 11.7 ms]
Benchmark 2: python3 fib.py
  Time (mean ± σ):      2.441 s ±  0.042 s    [User: 2.421 s, System: 0.011 s]
Benchmark 3: oak fib.oak
  Time (mean ± σ):     13.536 s ±  0.047 s    [User: 14.767 s, System: 0.729 s]

Summary
  'node out.js' ran
    9.41 ± 0.18 times faster than 'python3 fib.py'
   52.19 ± 0.43 times faster than 'oak fib.oak'

The gap between Python and native Oak isn't too surprising -- Oak is generally 4-5 times slower than Python on numerical code, so this looks right. But I was quite surprised to see just how much faster Node.js runs. V8 is very, very good at optimizing simple number-crunching code.

It's very encouraging to see documentation of failed approaches to problems in technical write-ups, even if brief — for example, in How to speed up the Rust compiler in 2022.

Anecdotally, at least half of the big interface/architectural design ideas I'm trying these days (and many more small approaches to performance debugging, etc.) don't end up working out, and the important ones always get a postmortem in my notes for future reference. They've proven invaluable when discussing these issues with other people or when attempting later solutions, because I can look beyond my imperfect memory to see what issues I must design against.

Random fact I learned while making some measurements:

The overhead of an Oak loop (std.loop or any other general iteration construct) in the native Go interpreter is around 2µs per loop. std.loop(10000000, fn {}) takes just under 18s.