Linus's stream

In an ideal world, discovering new thoughts and ideas from your own notes would be as addictive and engaging as discovering new videos on YouTube or TikTok, or new people on Twitter.

One way to measure the progress of web search technology is to look at the set of knowledge the average person doesn't bother learning until they need it. The better the commodity search engine, the less effort people expend to "pre-learn" things before they really need them, because they can depend on the knowledge always being quickly accessible.

These days nobody bothers memorizing the population of a country or when the seasons start, or their friends' addresses and phone numbers. But I still find myself wanting to learn more abstract, long-form topics, because those can't simply be looked up "just in time" ... yet.

My Mac has been having increasingly frequent issues where the corespotlightd process starts consuming all the memory on the system, logging me out of iCloud and freezing the entire machine to the point where I can't even reboot.

Since I don't want to debug a macOS built-in process, and I can't turn it off in settings, the only reasonable solution seems to be to continually monitor the process's resident memory usage and kill it when it grows too large. I wrote up a little bash script to do just that.

  1. We need to ensure there's only one copy of this script running on the system at any given time, so I use a lockfile in /tmp.
  2. We filter ps aux to get the PID of corespotlightd, and if it's running, get its resident set size (real memory usage, more or less).
  3. If it's using more memory than $MAXRSS (1GB for now), kill -9 it. Repeat every 30 seconds.
#!/bin/bash

LOCKFILE=/tmp/limit_corespotlightd.lock
if [ -f "$LOCKFILE" ]; then
    exit
else
    touch "$LOCKFILE"
fi

# remove the lockfile on exit, so a stale lockfile
# doesn't block future runs
trap 'rm -f "$LOCKFILE"' EXIT

MAXRSS=1000000 # in kB, as reported by ps; ~1GB

while true; do
    PID=$(ps aux | grep '/corespotlightd$' | awk '{ print $2 }')

    if [ -n "$PID" ]; then
        # -p selects by PID; "rss=" suppresses the header line,
        # and tr strips the padding ps puts around the number
        RSS=$(ps -p "$PID" -o rss= | tr -d ' ')

        if [ "$RSS" -gt "$MAXRSS" ]; then
            echo 'corespotlightd is using' "$RSS" 'kB; killing pid' "$PID"
            kill -9 "$PID"
        fi
    fi

    sleep 30
done

People who were bullish on flowcharts in the 1960s and people who are bullish on "no-code" tools in the 2010s seem mistaken in the same kind of way.

Pseudocode as a tool of thought.

Its unique properties include:

  • it's a programming notation explicitly designed for thinking and communicating
  • a good balance between expressivity and non-ambiguity
  • at every point, the writer can choose between natural language and programming notation, balancing expressivity against precision at a granular level
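To make that last point concrete, here's a bit of pseudocode (my own example, not from any particular source) that mixes the two freely — the set operations are in precise notation, while the output step stays in natural language because its details don't matter to the idea:

```
seen := empty set
for each item in the input list:
    if item is not in seen:
        add item to seen
        print the item, one per line
```

Where precision matters (membership in seen), notation carries the weight; where it doesn't (exact output formatting), a loose phrase is enough.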

Posit — Self-driving as a skill requires natural language understanding as a sub-skill.

Natural language understanding is not just a skill in itself, but might also be a notation that adds to an intelligence's ability to abstract and generalize broadly about the world. This ability to abstract and generalize is key to many kinds of performed intelligence, and self-driving might be a complex enough activity that it requires a level of generalization power that is a superset of natural-language understanding and reasoning.

A related question is, could a very, very smart but non-linguistic animal drive? I'm not so sure.

There's a subtle but firm distinction between augmenting human productivity and augmenting human intelligence. The first is primarily an economic, capitalistic endeavor (though productivity contributes indirectly to collective intelligence); the second is a purer pursuit, I think.

Noteworthy features of Starlark

Starlark is Bazel's configuration language, and is not designed to be general-purpose. Nonetheless, it has some features that seem useful even for a dynamic general-purpose language.

  1. Single, final assignment at the top level. Module-global functions and variables cannot be re-bound. This makes code easier to read and simplifies tooling.
  2. Deterministic iteration order for dictionaries, and in general determinism (a program run twice always produces the same outcome, modulo things like time). Determinism seems like a generally desirable property, for things like testing/reproducible builds.
  3. No mutation of a collection during iteration. Mutating a collection (like a list) while iterating over it fails the program, avoiding iterator-invalidation errors.
  4. No exceptions. Failing the program on any unanticipated error might seem problematic, but it "makes the language simpler and reduces the number of concepts." Exceptions would also become API surface for the language, so omitting them helps the language evolve.
  5. Strings are not iterable. This avoids bugs from passing in a string instead of a length-1 list of strings to APIs expecting a list.
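Several of these properties are visible in a short Starlark fragment (a sketch of the semantics described above; the names are mine, and this is Starlark, not Python):

```
GREETING = "hello"   # top-level bindings are final;
# GREETING = "hi"    # re-binding here would be a static error

def shout(words):
    d = {"a": 1, "b": 2}
    keys = [k for k in d]   # deterministic iteration: always ["a", "b"]

    for w in words:
        # words.append(w) here would fail the program:
        # no mutating a list while iterating it
        print(GREETING + ", " + w + "!")

    # for c in GREETING: ...  # error: strings are not iterable

shout(["world"])   # any unhandled error simply fails the whole program
```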

Starlark, a configuration language for the Bazel build tool, has a tree-walking interpreter implemented in pure Go at google/starlark-go.

It seems like the canonical implementation of Starlark is the Java implementation in the Bazel source tree, but the Go version is used "in production" in web playgrounds, debuggers, and so on.

starlark-go is notable because it's one of a vanishingly small number of production language implementations that are tree-walking interpreters rather than bytecode VMs. The implementation guide says:

The evaluator uses a simple recursive tree walk, returning a value or an error for each expression. We have experimented with just-in-time compilation of syntax trees to bytecode, but two limitations in the current Go compiler prevent this strategy from outperforming the tree-walking evaluator.

The details of why exactly that's the case are interesting, and documented further in the link, but they seem inherent to Go's current compiler design and philosophy. It also supports Oak's current (tree-walking) evaluator design, which is nice.
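As a toy illustration of the strategy (not starlark-go's actual AST or API — the node types here are invented), a tree-walking evaluator is just a recursive function over the syntax tree that returns a value or an error for each node:

```go
package main

import "fmt"

// Expr is any node in the (toy) syntax tree.
type Expr interface{}

// Lit is an integer literal.
type Lit struct{ Val int }

// BinOp is a binary arithmetic expression.
type BinOp struct {
	Op   byte // '+' or '*'
	L, R Expr
}

// eval walks the tree recursively, returning a value or an error
// for each expression — the same shape as the starlark-go doc describes.
func eval(e Expr) (int, error) {
	switch e := e.(type) {
	case Lit:
		return e.Val, nil
	case BinOp:
		l, err := eval(e.L)
		if err != nil {
			return 0, err
		}
		r, err := eval(e.R)
		if err != nil {
			return 0, err
		}
		switch e.Op {
		case '+':
			return l + r, nil
		case '*':
			return l * r, nil
		}
		return 0, fmt.Errorf("unknown operator %q", e.Op)
	}
	return 0, fmt.Errorf("unknown node %T", e)
}

func main() {
	// (2 + 3) * 4
	tree := BinOp{'*', BinOp{'+', Lit{2}, Lit{3}}, Lit{4}}
	v, err := eval(tree)
	fmt.Println(v, err) // 20 <nil>
}
```

The appeal is how little machinery this takes — no bytecode format, no VM loop, no compilation pass — at the cost of interpretation overhead on every visit, which is the tradeoff the starlark-go authors benchmarked.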

Typing on a typewriter for a while and then going back to a shallow laptop keyboard is a trippy experience.