fawx by Erik Frey

Hi, Kiddo.

I’m not sure when you’ll receive this. Time machines, even the ones that only travel forward, are tricky to aim. The message enclosed is kinda subdued carrier wave, so it’s probably best delivered after you’re done tripping on hormones in your teens. Maybe it will arrive when you’re deciding to become a parent yourself. Or maybe it’s a hedge against a future where we’ve grown distant, a thin silver thread connecting us.

Either way, you’re reading it now. Hi, kiddo. This is a snapshot of your dad before he was your dad:

Current Snapshot Dad

Right now, you’re the size of a blueberry inside your mom’s belly. It’s difficult to describe what I feel about you, because today you’re more the idea of you than the you yourself you are becoming. I can imagine a future in which you grow so large that you become the sun, the new origin point in me and your mom’s universe. But right now? Blueberry.

Your mom might claim otherwise but I think she’s always wanted you. Me, not so much. For a long time, I held some medium-fierce beliefs about the world that made a compelling case against you:

  1. People are already an awful problem, don’t make it worse.
  2. Ideas out-live and out-multiply people, so invest in those instead.
  3. And anyway, raising a kid looks like a pain in the ass.

Bullets one and two took years to come to terms with in my own strange way, so I admit I doubt whether this message will help you. They might not even be relevant to you. Who knows what crazy stuff your generation is gonna do with their genitals. So maybe ignore all this if you’re siring a tamagotchi.

And bullet three? Let’s punt. Current Snapshot Dad is still calling that one a jump ball :)

But the other two! Oh boy!

People are already an awful problem, don’t make it worse.

So, what did you grow up watching and reading? Did I let you watch zombie movies? Or Charlton Heston falling to his knees in front of a semi-submerged Statue of Liberty? Did you learn what soylent green is made of? I wonder if pop culture has instilled in you a deep, abiding sense of inevitable entropy, zero sum.

Scarcity. Exponential growth ain’t easy to grok. At some point, Malthus will teach you that the number of hungry hippos is growing exponentially, but the number of marbles is growing arithmetically. That’s a problem. But stating the problem so simply is misleading.

First, almost every exponential curve is a sigmoid in disguise, and population is no exception. Sure, we’re adding to the problem with all this neat technology that makes our farms more productive and advances our public health, letting us live longer. But that same neat technology has also brought us urbanization, family planning… people doin’ it just for fun. Lower mortality rates but also lower fertility rates. Lucky you, made it just under the cut.

Second, many arithmetical curves are an exponential in disguise! Take the production of solar power, for example. Would you have bet on that curve in the 90’s? But just two decades later solar is nearly cheaper than coal. We’re learning to eat the Sun instead of the Earth, and the Sun is a whole lot more nourishing. But it’s a harrowingly slow learning curve, sailing slowly over spikier, more urgently rising curves.

So how do you bet on which way the curves will bend? Look for discontinuities: global warming, renewable energy, food security, techno-solipsist singularities. Maybe for bedtime we’ll read the Copenhagen Consensus reports - I’m sure you’ll love that. You could spend years running the odds, weighing the factors. I have. And on the whole, stakes look good. I truly believe your mom and I are bringing you into a world that isn’t doomed.

So, Earth probably not doomed, but is it a better place with you in it? This brings us to bullet two.

Ideas out-live and out-multiply people, so invest in those instead

So, kiddo, who wins in a battle between a horse-sized duck and 100 duck-sized horses?

For years, I’ve belonged to a cohort that bets on the 100 duck-sized horses, actually trillions of duck-sized horses - every minute of every person’s day that my colleagues and I try to improve. And we congratulate ourselves for each new technological knicknack and foofaraw that snowballs across continents, each one an inch deep, a mile wide. For years I’ve helped lay down these thin veneers, but the surface area betrays the volume.

Not just the volume, but the interrelatedness of our duck-sized horses, means that you can never really be sure that your impact has anything to do with you. Stigler’s law, first discovered by sociologist Robert K. Merton, says that no scientific discovery is named after its original discoverer. This is convergent evolution: the solutions are inevitable, no matter the solvers. I will never really know whether someone else in my role would have made the world a better place, better than I have.

But I can be sure about you. You are my horse-sized duck. My new enterprise. And to judge the success of any new enterprise you’ve got to take a close look at the leadership team. And here, kiddo, is where you have the ultimate secret weapon.

This is a snapshot of your mom before she was your mom:

Current Snapshot Mom

As you know, your mom is going to be like the Wilt Chamberlain of moms, just superbly busting out mom 3-pointers and mom-dunks. Little blueberry, you are about to get mom’d so hard, good luck trying to keep up.

I love your mom. And I love the world. I deeply believe that the best way for me to make the world a better place is to gift it someone who was raised by her and loved by her.

So that’s why you exist. Message over. If you were hoping this was one of those more mundane time machines, what kind of music are we listening to blabla, well just come over and ask! If you could unplug from your holographic interfaces for even just an hour to come visit your parents in real-space, I swear you kids these days.

An Ode to Set Intersection, Part 1

Norman Casagrande This here is Norman Casagrande, last.fm’s head researcher and my former mentor. The facial expression seen here is one of my favorites. It’s usually accompanied by some questionably-translated Italian aphorism, ”You know, Erik, that is like the ox saying ‘horned’ to the donkey.”

This smarmy smile also usually means I’m about to learn something new. I saw it my very first day at last.fm. Norman was describing to me various gears and nozzles that power last.fm’s recommendations, and when he showed me the code that compares two musical items, I stopped him. I saw my chance to strike.

Set Intersection

Set intersection is a delightful little haiku problem in computer science. It’s simple, elegant, and is ubiquitous in search engine tech, collaborative filtering mechanisms, various types of matrix math… probably other places, too! Its most basic implementation is linear in complexity, and most people are happy with linear performance here. But as I studied the guts of Norman’s recommendation engine, I saw my first opportunity to unleash unspeakable power!

“Norman, what if I told you that there are set intersections much more efficient than this? In fact, I know of one that uses a search algorithm that is…”, I whispered almost breathlessly, “log-log!”

“Ehhh, you can try but I don’t think it will be faster. But you throw the spaghetti at the ceiling propeller and we will see what stays there.” And out came that infuriating smile.

So I set about to wipe that smile off Norman’s face. I was going to write the Rolls-Royce jet engine of all set intersection algorithms.

Interpolation Search and Dr. Baeza-Yates

To write my code I drew from the experience I’d had at a job interview with Google some time before. A smart scientist had quizzed me about interpolation search, a way of finding elements in sorted sets by making educated guesses as to their location. He claimed it had contributed greatly to the performance of operations at Google, so I was sure it could do the same for me.

In turn I’d found a really excellent paper on set intersection, A Fast Set Intersection Algorithm for Sorted Sequences, by a celebrated, highly-respected dude in the field of data mining, Ricardo Baeza-Yates. In this paper he proposes the following algorithm:

  1. Pick the median element, A, in the smaller set.
  2. Search for its insertion-position element, B, in the larger set.
  3. If A and B are equal, append the element to the result.
  4. Repeat steps 1-4 on non-empty subsets on either side of elements A and B.

Dr. Baeza-Yates showed that this algorithm does particularly well when one set is smaller than the other. Just to be sure, I wrote code to count the number of comparisons for three kinds of set intersection: plain old linear set intersection, Dr. Baeza-Yates’ special set intersection with binary search, and his set intersection with interpolation search.

View/download the source for this experiment from my themas github repository. You can profile the algorithms on uniform random data by building the project and running:

hexdump -e '1/4 " %u"' /dev/urandom | ./set_intersection_profile binary compares 100000 0.1 1.0 0.1 20

The results were exactly what I’d hoped, looking at the number of comparisons as I varied the ratio between the two set sizes:

Set Intersection Comparisons

Not only did the interpolation intersection use far fewer comparisons in the case where |M| << |N|, but it performed better than linear everywhere else, too! Armed with this knowledge, I was ready to take Norman down.

To be continued!

p.s. For the observant programmer that is curious as to why the linear intersection’s line isn’t flat in the graph above, see this commit and this tangentially-related article about _builtinexpect. You’ll figure it out!