Avatars, identicons, and hash visualization

Oftentimes we have a situation where we have a situation where we want an easy way to distinguish people of which we only know the name or a nickname. Probably the most common examples of these are distinguishing different people from each other on various discussion platforms, like chat rooms, forums, wikis, or issue tracking systems. These very often provide the possibility to use handles, often real names or nicknames, and profile images as an alternative. But as very often people don’t provide any profile image, many services use a programmatically generated images to give some uniqueness to default profile images.

In this article I’ll take a look at a few approaches to generate default profile images and other object identifiers. I also take a look at the process of creating some specific hash visualization algorithms that you may have encountered on the web. The general name for such programmatically generated images is hash visualization.

Use cases and background

A starting point to visualize arbitrary data is often hashing. Hashing in this context is to make a fixed length representation of an arbitrary value. This fixed length representation is just a big number that can be then used as a starting point for distinguishing different things from each other.

Hash visualization aims to generate a visual identifier for easy differentiation of different objects that takes an advantage of preattentive processing of human visual system. There are various categories that human visual system uses to do preattentive processing, like form, color, motion, and spatial position. Different hash visualization schemes knowingly or unknowingly take advantage of these.

Simple methods use specific colors as a visualization method, but at the same time limit the values that can be preattentively distinguished to maybe around 10-20 (around 4 bits of information) per colored area. Using additional graphical elements provides a possibility to create more distinct unique elements that are easy to separate from each other. One thesis tries hash visualization with 60 bits of data with varying success. So the amount of visualized data that can be easily distinguished from each other by graphical means is somewhere between 4 and 60 bits.

Avatars

Perhaps the most common way to distinguish different users of an Internet based service is to use an user name and an avatar image. Avatars often go with custom avatar images that users themselves upload to the service. In case the service itself does not want to host avatars, it can link to services like Gravatar that provides avatar hosting as a service. This way user can use the same avatar in multiple services just by providing their email address.

But what about a situation where the user has not uploaded or does not want to upload an avatar image to the service? There usually is a placeholder image that indicates that the user has not set their custom avatar.

There seem to be three major approaches to generate this placeholder image:

A generic dummy placeholder image. These often come in a shape of a generic human profile.
An image from predefined collection of images. These are usually related to a theme that the site wants to present.
A partially or fully algorithmically generated image. These usually make it possible to get the most variety for placeholders with the least amount of work.

Algorithmically generated images vary greatly and usually come in three varieties:

Simple images that mainly use a color and a letter to distinguish users.
Images that consist of a set of a predefined parts and color variations.
More complex shapes usually created fully algorithmically.

I have taken a closer look at different algorithmic avatar creation methods in form of WordPress identicon, GitHub identicon, and MonsterID. These provide an overview on how algorithmic avatar generation can work. There is also a chapter of using a character on a colored background to give an example of a simple algorithmic method for default avatar generation.

I want to focus here on visualizing the few methods that I have encountered rather than having a all-encompassing overview of avatar generation. For that, there is Avatar article on Wikipedia that gives a better textual overview of these and many more methods than I could give.

Character on a colored background

Figure 1: Avatar examples having a colored background with the first letter of an user name.

Services like Google Hangouts, Telegram and Signal, and software with people collaboration functionality like Jira generate a default avatar for an user by using the first letter of user’s name with a colored background. This is simple method where you need to select a font, a shape that you want to use to surround the letter, and size of the avatar. Some example avatars generated by this method are visible from figure 1.

WordPress identicon

WordPress identicon has been a well known algorithmically generated avatar for quite long time on web forums. It has been used to distinguish between users based on their IP addresses and email addresses. Example WordPress identicons can be seen from figure 2 where you should be able to notice that there is some symmetry in these images.

Figure 3: 44 parts from which a WordPress identicon is generated.

Looking at the code how these avatars are generated reveals 44 patterns, shown in figure 3 that the generators uses to generate these images. They are symmetrically placed around the center and rotated 90 degrees when they reach the next quadrant. This process is illustrated in figure 4 that shows how these parts are placed and rotated around the center. In case the requested identicon has an even number of parts, these parts will get duplicated. So 4x4 WordPress identicon will only have 3 distinct parts in it. This is the same number as for 3x3 WordPress identicon.

Figure 4: 5x5 WordPress identicon generation animated.

GitHub identicon

Figure 5: Example GitHub style identicons.

GitHub is a great collaborative code repository sharing service where users can have identifying images. If user, however, does not have any profile image, GitHub will generate an identicon that forms a 5x5 pixel sprite. Figure 5 examples how such identicons can look like. The exact algorithm that GitHub uses to generate these identicons is not public, so these example images just imitating the style of GitHub identicons.

Figure 6: GitHub style identicon creation process animated. Visible pixels are horizontally mirrored.

Figure 6 shows the process of creating such identicon. Basically the given data is hashed and from that hash a specific color is selected as the value for a visible pixel. Then other part of the hash (15 bits) is used to form an image that is horizontally mirrored. I have used a modified ruby_identicon to generate these images and animations. This library also enables the generation of other sizes than 5x5 sprites.

MonsterID

Example artistic MosterIDs. — Figure 7: Example artistic MonsterIDs.

WordPress MonsterID is a hash visualization method where hashes are converted to various types of monsters. It’s a one type of hash visualization method where lifelike creature is created from predefined body parts. WordPress MonsterID actually offers two types of monsters, the default ones and artistic ones. Here I’m showing some example artistic monsters in figure 7 just because they look better than the default ones.

Figure 8: Creating an artistic MonsterID image by selecting and colorizing an appropriate part from each category.

Figure 8 shows the MonsterID creation process. First there is a lightly colored background on top of which we start forming the monster. We select a body part in order of legs, hair, arms, body, eyes, and mouth. These parts are show in figure 9. Then assign a color to the selected part, unless it’s one of the special cases. Then just overlay it on top of the image formed from previous parts. There are special cases that what parts should have a predefined color or be left uncolored, but that does not change the general monster creation process.

Figure 9: Parts from which a MonsterID is created divided into categories (hair, eyes, mouth, arms, body, and legs).

In addition to WordPress MonsterID plugin, Gravatar also supports old style MonsterIDs. These unfortunately do not as good as the artistic monsters in the WordPress MonsterID plugin and there is no option to select an alternative form for these avatars.

OpenSSH visual host key

Hash visualization has also its place in cryptography. OpenSSH client has an option to display so called visual host keys.

When a SSH client connects to a remote server that it has not ever seen before, it needs to accept the public key of this server so that it can communicate with it securely. To prevent man-in-the-middle attacks, user initiating the connection is supposed to verify that the host key indeed matches. An example message about host key verification request is shown in figure 10. The idea is, that either you have seen the this key beforehand or you have some other means to get this key fingerprint and verify that it indeed belongs to the server you are trying to connect to. How this works in practice or not, is a completely different discussion.

$ ssh example.com
The authenticity of host 'example.com (127.1.2.3)' can't be established.
RSA key fingerprint is SHA256:Cy86563vH6bGaY/jcsIsikpYOHvxZe/MVLJTcEQA3IU.
Are you sure you want to continue connecting (yes/no)?

Figure 10: The default SSH server key verification request on the first connection.

As you can see, the key fingerprint is not easy to remember and also the comparison of two fingerprints can be quite hairy. When you enable visual host key support, as show in figure 11, the idea is that you can quickly glance to figure out if the key of a server is different than expected. This should not be treated as a method to see if two keys are equal, as different keys can produce the same image.

$ ssh -o VisualHostKey=true example.com
The authenticity of host 'example.com (127.1.2.3)' can't be established.
RSA key fingerprint is SHA256:Cy86563vH6bGaY/jcsIsikpYOHvxZe/MVLJTcEQA3IU.
+---[RSA 8192]----+
|     ..o.=+      |
|      . E.       |
|        . .      |
| .       o       |
|o o   + S o      |
|.+ o o + *       |
|o.. .o..B.o      |
|.o  o.*BB= .     |
|+ ...=o@@+o      |
+----[SHA-256]----+
Are you sure you want to continue connecting (yes/no)?

Figure 11: Visual host key enabled SSH server key verification request on the first connection.

The algorithm to generate these visual host keys is visualized at figure 12. It’s also described in detail in an article that analyzes the algorithmic security of this visual host key generation: The drunken bishop: An analysis of the OpenSSH fingerprint visualization algorithm.

OpenSSH’s visual host key is formed in such way that the formation starts from the center of the image. Then 2 bits of the host key fingerprint are selected and the location is moved in a diagonal direction. The visual element on the plane where this movement happens is changed based on how many times a certain location is visited. This can be better seen from the animation where each visual host key generation step is visualized as its own frame and where the bits related to the movement are highlighted.

Figure 12: OpenSSH visual host key generation animated.

The exact details how the key is generated can be seen from OpenSSH’s source code. This source code also refers to a paper called Hash Visualization: a New Technique to improve Real-World Security. But the fact that OpenSSH is usually limited to terminals with text interface makes it quite hard to use the more advanced Random Art methods described in the paper. But the idea for host key verification by using images has at least taken one form in the OpenSSH world.

Conclusions

I presented here some methods that I have encountered over the years on various sites and programs. I can not possibly cover all of them, as there likely are as many algorithms to create avatars as there are people inventing them. The biggest question with any of these algorithms is that how many bits they should try to visualize and what types of graphical elements they can use in the visualization without just creating indistinguishable clutter.