Remembyte

I built an experiment I called remembyte.rs (originally remembyte).

It explores different ways of remembering small amounts of binary data (like an SSH host key fingerprint), and it served as a starting point for exploring existing research.

The C version is no longer under active development. The Rust version is sort of on the back burner; I’m sure I’ll get back to it eventually.

Aside from the research I discuss below, building the original version meant writing more C than I ever had before, and that experience still informs how I think about programming today.

Background

I find myself with more servers and workstations than really makes sense. This means I have a lot of SSH host keys to keep track of somehow.

I had seen a clever Perl script to convert SSH host key fingerprints to emoji. I wanted to do something similar.
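Any byte mapping starts from the fingerprint’s raw bytes rather than its printed form. Here is a minimal sketch of that first step, assuming the `SHA256:<base64>` fingerprint format that current OpenSSH prints (standard base64 alphabet, trailing `=` padding removed) and using only the Rust standard library. The names `fingerprint_bytes` and `b64_value` are hypothetical, not remembyte’s actual API.

```rust
/// Map one standard-base64 character to its 6-bit value.
fn b64_value(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Hypothetical helper: decode an OpenSSH-style "SHA256:..." fingerprint into
/// raw digest bytes. Assumes unpadded standard base64; returns None on a
/// missing prefix or an invalid character.
fn fingerprint_bytes(fp: &str) -> Option<Vec<u8>> {
    let b64 = fp.strip_prefix("SHA256:")?.trim_end_matches('=');
    let mut out = Vec::new();
    let mut acc: u32 = 0; // bit accumulator
    let mut bits = 0; // number of not-yet-emitted bits in `acc`
    for &c in b64.as_bytes() {
        acc = (acc << 6) | b64_value(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the unconsumed low bits
        }
    }
    Some(out)
}

fn main() {
    // 43 unpadded base64 characters carry a 32-byte SHA-256 digest.
    let fp = format!("SHA256:{}", "A".repeat(43));
    let bytes = fingerprint_bytes(&fp).expect("valid fingerprint");
    println!("{} bytes", bytes.len()); // prints "32 bytes"
}
```

Leftover bits after the last full byte are simply dropped, which is what unpadded base64 implies.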

Actually, my initial plan doesn’t hold up well in hindsight. I wanted a general-purpose tool I could easily install everywhere, including on Windows, Linux, and (what was at the time called) Mac OS X, usable without installing an interpreter or any dependencies.

Today, I’d write something like that in Go. At the time, I knew hardly anything about Go, and certainly not how nice its tooling for cross-compilation and dependency-free binaries is.

Also, today, I’d just use an SSH certificate authority, which I did not know was possible at the time.

That’s OK, though. After implementing the initial functionality (talking to SSH servers and mapping fingerprints to emoji), I got distracted by the user experience angle.

Questions about the UX of byte mapping

The most naive approach to the emoji mapping is to reuse the mapping from the script that inspired me, so obviously I did that. However, it raised some questions. Is there a best emoji mapping?

Are there emoji that are more or less likely to be confused for one another? The mapping I found used all of 🐏, 🐐, and 🐑; I’m not sure I would remember which was which.

Are there accessibility issues? The mapping I found used both 🍎 and 🍏; could this be an issue for color-blind people? Are there other issues that I wasn’t aware of?
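To make these questions concrete, here is a hypothetical naive mapping (not the one from the Perl script or from remembyte) that simply offsets each byte into the Unicode range starting at U+1F400. Because Unicode encodes related pictographs next to each other, adjacent byte values land on look-alikes: 0x0F, 0x10 and 0x11 become 🐏, 🐐 and 🐑.

```rust
/// Hypothetical naive mapping: byte b -> code point U+1F400 + b.
/// U+1F400..=U+1F4FF lies in the Miscellaneous Symbols and Pictographs block.
fn naive_emoji(b: u8) -> char {
    char::from_u32(0x1F400 + b as u32).expect("valid Unicode scalar value")
}

/// Map every byte of a fingerprint to one character.
fn to_emoji(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| naive_emoji(b)).collect()
}

fn main() {
    // Three consecutive byte values give ram, goat, sheep.
    println!("{}", to_emoji(&[0x0F, 0x10, 0x11])); // prints 🐏🐐🐑
}
```

A contiguous range is the easiest table to build, and that is exactly why it tends to cluster similar-looking symbols.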

Existing research

I was excited to find existing UX research on byte mappings from another context: the PGP word list.

Wikipedia describes how it was made:

“The PGP Word List was designed in 1995 by Patrick Juola, a computational linguist, and Philip Zimmermann, creator of PGP. The words were carefully chosen for their phonetic distinctiveness, using genetic algorithms to select lists of words that had optimum separations in phoneme space. The candidate word lists were randomly drawn from Grady Ward’s Moby Pronunciator list as raw material for the search, successively refined by the genetic algorithms. The automated search converged to an optimized solution in about 40 hours on a DEC Alpha, a particularly fast machine in that era.”

Cool.

I was especially impressed that the word list was designed to be robust against both mishearing and lost words.

When Wikipedia says “The words were carefully chosen for their phonetic distinctiveness, using genetic algorithms to select lists of words that had optimum separations in phoneme space”, it means the words were chosen to be hard to mistake for one another. In effect, words were graded on how similar they sound, and words too close in sound to other words were dropped from the set.
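The PGP authors measured distance in phoneme space and searched with a genetic algorithm. The sketch below swaps in something much cruder, purely to show the idea of rejecting near neighbours: plain Levenshtein edit distance on spelling, plus a greedy pass that keeps a candidate only if it is at least `min_dist` edits away from every word already kept. The function names and the threshold are illustrative assumptions, and spelling distance is only a stand-in for how words actually sound.

```rust
/// Levenshtein edit distance between two strings (by Unicode scalar value).
/// A crude, spelling-based stand-in for phonetic distance.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between a[..i-1] and b[..j] (previous DP row).
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1]; // cur[0] = i deletions
        for j in 1..=b.len() {
            let sub = prev[j - 1] + if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = sub.min(prev[j] + 1).min(cur[j - 1] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Greedily keep candidates that are at least `min_dist` edits away
/// from every word kept so far (hypothetical selection rule).
fn select_distinct<'a>(candidates: &[&'a str], min_dist: usize) -> Vec<&'a str> {
    let mut kept: Vec<&'a str> = Vec::new();
    for &w in candidates {
        if kept.iter().all(|&k| edit_distance(k, w) >= min_dist) {
            kept.push(w);
        }
    }
    kept
}

fn main() {
    let candidates = ["goat", "boat", "coat", "sheep", "ship", "ram"];
    // "boat"/"coat" are too close to "goat", "ship" too close to "sheep".
    println!("{:?}", select_distinct(&candidates, 3)); // ["goat", "sheep", "ram"]
}
```

A greedy pass depends on input order, which is one reason a global search like the genetic algorithm described above can do better.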

Further, the PGP word list is actually two word lists: one of two-syllable words and one of three-syllable words. Each list has a word for every byte value, and which list you use depends on whether the byte is at an even or odd position in the sequence. This alerts the people on both ends of a conversation when a word has been dropped entirely, which is especially helpful on a phone call that may silently cut out for a moment.
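A minimal sketch of the even/odd alternation, assuming two 256-entry lists supplied by the caller (`even` for two-syllable words, `odd` for three-syllable words) and that position 0 counts as even. The real PGP lists are not reproduced here, so the example fills both with placeholder words. Decoding checks that each word comes from the list its position expects; a word from the other list is the signal that something was lost.

```rust
/// Encode bytes, choosing the even list at even positions and the odd list
/// at odd positions (position 0 counts as even). Lists must have 256 entries.
fn encode<'a>(bytes: &[u8], even: &[&'a str], odd: &[&'a str]) -> Vec<&'a str> {
    bytes
        .iter()
        .enumerate()
        .map(|(i, &b)| if i % 2 == 0 { even[b as usize] } else { odd[b as usize] })
        .collect()
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    /// The word exists, but in the wrong list for its position:
    /// a word was probably dropped somewhere before it.
    WrongList { position: usize },
    /// The word is in neither list.
    Unknown { position: usize },
}

/// Decode words back to bytes, verifying the even/odd pattern.
fn decode(words: &[&str], even: &[&str], odd: &[&str]) -> Result<Vec<u8>, DecodeError> {
    let mut out = Vec::new();
    for (i, w) in words.iter().enumerate() {
        let (expected, other) = if i % 2 == 0 { (even, odd) } else { (odd, even) };
        if let Some(b) = expected.iter().position(|e| e == w) {
            out.push(b as u8);
        } else if other.contains(w) {
            return Err(DecodeError::WrongList { position: i });
        } else {
            return Err(DecodeError::Unknown { position: i });
        }
    }
    Ok(out)
}

fn main() {
    // Placeholder lists standing in for the real PGP lists (not reproduced here).
    let even_owned: Vec<String> = (0..256).map(|i| format!("even{:02x}", i)).collect();
    let odd_owned: Vec<String> = (0..256).map(|i| format!("odd{:02x}", i)).collect();
    let even: Vec<&str> = even_owned.iter().map(String::as_str).collect();
    let odd: Vec<&str> = odd_owned.iter().map(String::as_str).collect();

    let words = encode(&[0xE5, 0x82, 0x01], &even, &odd);
    println!("{:?}", words); // ["evene5", "odd82", "even01"]

    // Drop the middle word: "even01" now arrives at an odd position.
    let heard = [words[0], words[2]];
    println!("{:?}", decode(&heard, &even, &odd)); // Err(WrongList { position: 1 })
}
```

With only two lists, losing two consecutive words restores the pattern, so the check catches a single dropout rather than every possible loss.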

Future research

Inspired by the ingenuity of the PGP word list, I wanted to try other alternating sets of word lists. Sadly, I never implemented this, but I think it would still be worth trying.

Specifically, I wanted to try sentences constructed from four different word maps:

A noun in an actor role
An adverb
A verb that takes an object
A noun in an object role

That’s pretty abstract; I think it makes a lot more sense with an example. You could make sentences like “The firefighter (actor) greedily (adverb) saw (verb) the cloud (object)”, or “John (actor) sadly (adverb) ate (verb) apples (object)”. These sentences would be easy to remember, and based on the PGP research, I think the word lists could be chosen to keep them unambiguous. (Note that there is no requirement for the sentences to make sense, which I think is OK for this use case.)
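Since this scheme was never implemented, the following is only a hypothetical sketch of the encoding side: each group of four bytes picks one word from each of four 256-word lists, in the fixed order actor, adverb, verb, object. The lists are placeholders (`actor00` through `objectff`); choosing real, phonetically distinct words would be the actual research. `encode_sentences` and the role order are illustrative assumptions, and the sketch leaves out articles like “the”.

```rust
/// Hypothetical encoder: each group of four bytes picks an actor, an adverb,
/// a verb and an object from four separate 256-word lists, in that order.
/// A trailing group of fewer than four bytes yields a shorter sentence.
fn encode_sentences(bytes: &[u8], lists: [&[&str]; 4]) -> Vec<String> {
    bytes
        .chunks(4)
        .map(|chunk| {
            chunk
                .iter()
                .zip(lists.iter())
                .map(|(&b, list)| list[b as usize])
                .collect::<Vec<&str>>()
                .join(" ")
        })
        .collect()
}

fn main() {
    // Placeholder word lists; real ones would need careful selection.
    let names = ["actor", "adverb", "verb", "object"];
    let owned: Vec<Vec<String>> = names
        .iter()
        .map(|n| (0..256).map(|i| format!("{}{:02x}", n, i)).collect::<Vec<String>>())
        .collect();
    let slices: Vec<Vec<&str>> = owned
        .iter()
        .map(|l| l.iter().map(String::as_str).collect::<Vec<&str>>())
        .collect();
    let lists = [&slices[0][..], &slices[1][..], &slices[2][..], &slices[3][..]];

    for s in encode_sentences(&[1, 2, 3, 4, 5], lists) {
        println!("{}", s); // "actor01 adverb02 verb03 object04", then "actor05"
    }
}
```

A 32-byte SHA-256 fingerprint would come out as eight such four-word sentences.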

This retains the main benefit of the PGP lists, since a dropped word breaks the expected pattern, and it would also make the resulting sequences of words easier to remember.
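Here is a hypothetical sketch of how the decoding side could keep that benefit, assuming the four lists share no words. Word `i` must come from role `i % 4`, so if the adverb is lost, the verb shows up where an adverb was expected and the decoder can say so. This is the PGP even/odd check with a period of four instead of two. The names are illustrative, and the lists are the same kind of placeholders as before.

```rust
#[derive(Debug, PartialEq)]
enum SentenceError {
    /// The word belongs to a different role's list: a word was likely dropped.
    WrongRole { position: usize, expected: usize, found: usize },
    /// The word is in none of the four lists.
    Unknown { position: usize },
}

/// Decode a flat sequence of words back into bytes. Word `i` must come from
/// role `i % 4` (0 actor, 1 adverb, 2 verb, 3 object). Assumes disjoint lists.
fn decode_words(words: &[&str], lists: [&[&str]; 4]) -> Result<Vec<u8>, SentenceError> {
    let mut out = Vec::with_capacity(words.len());
    for (i, w) in words.iter().enumerate() {
        let expected = i % 4;
        if let Some(b) = lists[expected].iter().position(|x| x == w) {
            out.push(b as u8);
        } else if let Some(found) = lists.iter().position(|l| l.contains(w)) {
            return Err(SentenceError::WrongRole { position: i, expected, found });
        } else {
            return Err(SentenceError::Unknown { position: i });
        }
    }
    Ok(out)
}

fn main() {
    // Placeholder word lists, one per role.
    let names = ["actor", "adverb", "verb", "object"];
    let owned: Vec<Vec<String>> = names
        .iter()
        .map(|n| (0..256).map(|i| format!("{}{:02x}", n, i)).collect::<Vec<String>>())
        .collect();
    let slices: Vec<Vec<&str>> = owned
        .iter()
        .map(|l| l.iter().map(String::as_str).collect::<Vec<&str>>())
        .collect();
    let lists = [&slices[0][..], &slices[1][..], &slices[2][..], &slices[3][..]];

    // The adverb was lost: "verb03" arrives where an adverb is expected.
    let heard = ["actor01", "verb03", "object04"];
    println!("{:?}", decode_words(&heard, lists));
    // Err(WrongRole { position: 1, expected: 1, found: 2 })
}
```

With four roles, a loss goes unnoticed only if a whole multiple of four consecutive words disappears, which should be rarer than the two-word case for the PGP lists.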

If you ever have the opportunity to try this, please do it and let me know how it goes. I don’t care about idea “ownership” or “property”, and I’m sure someone else has had this thought before anyway, so if this sparks any idea at all, please run with it. I would love to hear about any other research like this, or any implementation of it.