Bitmasks and Binary: What Does That Weird Pipe Character Do?
by James Smith (Golang Project Structure Admin)
We all know what piping looks like in the real world, but when we talk about pipes in computing, we’re not necessarily all talking about the same thing.
I’ve Seen Pipes in Linux and Mac OS
In UNIX-based operating systems, the pipe character, also known as the vertical bar, is used to send the output text from one program to the input of another. It allows simple, specialized programs to be chained together to form a sequence. This is based on an important principle among early UNIX developers, represented by the acronym DOTADIW, meaning that each program should only try to “do one thing and do it well”!
Each individual program should be as simple as possible, so that it’s easy to maintain and use. More complex functionality can be built up by combining different programs together. That’s the idea.
For example, the command below, when entered into a Linux terminal, would list the names of the first four files in the current directory when sorted alphabetically:
ls | sort | head -n 4
The ls command lists all the files, the sort command takes that output and orders it alphabetically and, finally, the head command prints only the first four lines. The three commands are separated by the pipe character.
So Do We Have Pipes in Golang Too?
You may have seen the pipe character, also known as the vertical bar, in Go code, and wondered exactly how it’s used and what it’s for. As an example, look at the following code, which is used to write to the operating system’s log:
package main

import (
	"fmt"
	"log/syslog"
)

func main() {
	writer, err := syslog.New(syslog.LOG_WARNING|syslog.LOG_USER, "example")
	if err != nil {
		panic(err)
	}

	fmt.Fprintf(writer, "This is an example warning.")
}
A new writer is created with a given priority and a string tag, which acts as an identifier (in this case "example"). This writer is then used to send messages to the system log. But look at the priority parameters in the initial call to the syslog.New function: two constants are separated by the pipe character.
In this case, the symbol simply acts to tie the two constants together, so they can both be passed as a single parameter. But how does this happen?
Internally, the two separate constants are defined in the Go standard library as numbers (they both share the syslog.Priority type, whose underlying type is int). For example, syslog.LOG_WARNING equals the number 4 and syslog.LOG_USER equals the number 8.
Thinking in Binary
In Golang, the pipe character acts as a bitwise inclusive OR operator. This sounds complicated, but it just means that it does some simple maths using the binary representation of each of the numbers on either side of the symbol. So let’s think about how numbers are represented in the binary number system:
1) 00001
2) 00010
4) 00100
8) 01000
16) 10000
You can see above that every time the decimal number doubles, the 1 in its binary representation moves one place to the left. This is because binary numbers are built around powers of 2: if the decimal number is an exact power of 2, then its binary representation will contain a single one surrounded by zeros (for example, 2 to the power of 4 is 16, shown in the last row above).
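If you'd like to reproduce that table yourself, here is a minimal sketch. The powersOfTwo helper is a name I've made up for illustration; it just doubles a number repeatedly, and the %05b verb prints each value in binary, padded with zeros to five digits.

```go
package main

import "fmt"

// powersOfTwo returns the first n powers of two, built by repeated doubling.
// (Hypothetical helper, used only for this illustration.)
func powersOfTwo(n int) []uint {
	out := make([]uint, 0, n)
	for v := uint(1); len(out) < n; v *= 2 {
		out = append(out, v)
	}
	return out
}

func main() {
	for _, v := range powersOfTwo(5) {
		// %05b prints v in binary, zero-padded to five digits.
		fmt.Printf("%2d) %05b\n", v, v)
	}
}
```

Each row printed should contain exactly one 1, sitting one place further left than in the row before.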
Each of the 1s and 0s in the binary representations above is known as a binary digit, or bit for short. When the vertical bar is used with two numbers, it compares them position by position: a bit in the result is 1 if that bit is 1 in either number (or in both), and 0 otherwise. So, for example, 4 | 8 would equal 12, because that’s the value of the result when all the bits are combined:
4) 00100
8) 01000
12) 01100
In that case, the inclusive OR operator simply seemed to add the two numbers together, because 4 + 8 == 12. However, the operation only appears to be the same as an addition because the two operands have no set bits in common (they are different powers of two). If, for example, we do 3 | 6 (which equals 7), we get a different result than if we had calculated 3 + 6 (which, of course, equals 9), because both numbers have the bit worth 2 set and OR counts it only once, as seen below:
3) 00011
6) 00110
7) 00111
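The two calculations above can be checked with a short program. This is only a sketch: the orAndSum helper is a hypothetical name of mine, and it simply returns both results side by side so they can be compared.

```go
package main

import "fmt"

// orAndSum returns a|b and a+b so the two results can be compared.
// (Hypothetical helper, used only for this illustration.)
func orAndSum(a, b uint) (ored, sum uint) {
	return a | b, a + b
}

func main() {
	pairs := [][2]uint{{4, 8}, {3, 6}}
	for _, p := range pairs {
		ored, sum := orAndSum(p[0], p[1])
		fmt.Printf("%d | %d = %d (%05b), %d + %d = %d\n",
			p[0], p[1], ored, ored, p[0], p[1], sum)
	}
}
```

The OR and the sum agree for 4 and 8, which share no set bits, and differ for 3 and 6, which both have the bit worth 2 set.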
Bitmasks and Option Constants
In the early history of computing, programmers noticed that, because each power of two resulted in a single bit being set, you could combine these numbers (using the bitwise OR operator) to produce a result where specified bits would be set. So option constants — which are called bitmasks — could be stored as powers of two and a variable could combine these options in order to show which ones are currently set. In other words, it’s like having multiple boolean values stored in a single number. Let’s look at an example:
package main

import "fmt"

type emotion uint32

const (
	HappyEmotion     emotion = 1 << iota // 1 (2⁰) (0b000001)
	SadEmotion                           // 2 (2¹) (0b000010)
	AngryEmotion                         // 4 (2²) (0b000100)
	AnnoyedEmotion                       // 8 (2³) (0b001000)
	ConfidentEmotion                     // 16 (2⁴) (0b010000)
	WistfulEmotion                       // 32 (2⁵) (0b100000)
)

const (
	// 17 (1 + 16) (2⁰ + 2⁴) (0b000001 | 0b010000) (0b010001)
	CurrentEmotion = HappyEmotion | ConfidentEmotion
)

func main() {
	fmt.Println("I am feeling happy and confident.")
	fmt.Printf(
		"My emotion's value is %d (%b in binary).\n",
		CurrentEmotion, CurrentEmotion,
	)
}
You can see in the code above that we start by setting our bitmasks: each constant represents a separate emotion that a person can be feeling. We use 1 << iota to set each of the emotions to a different power of two: iota equals 0 for the first constant, 1 for the second, 2 for the third and so on.
The << is called the left-shift operator, and it moves all of the bits in the number on its left to the left by the number of places given on its right: in this case, because it's moving only a single bit to the left an increasing number of places, it's giving us increasing powers of two as we declare each constant. My comments next to each constant show the decimal, exponential and binary representations of each value.
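To see the left-shift operator on its own, outside a const block, here is a small sketch. The shiftLeft wrapper is a hypothetical name I've added purely so the behaviour is easy to call and inspect. The last line of main shifts the number 3, which has two bits set, to show that every set bit moves, not just one.

```go
package main

import "fmt"

// shiftLeft returns v with all of its bits moved n places to the left.
// (Hypothetical wrapper around the built-in << operator.)
func shiftLeft(v, n uint) uint {
	return v << n
}

func main() {
	// Shifting a single bit gives increasing powers of two.
	for n := uint(0); n < 4; n++ {
		r := shiftLeft(1, n)
		fmt.Printf("1 << %d = %d (%06b)\n", n, r, r)
	}

	// Both set bits of 3 (0b11) move two places to the left.
	r := shiftLeft(3, 2)
	fmt.Printf("3 << 2 = %d (%06b)\n", r, r)
}
```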
Hopefully you can see how the binary digits of the happy and confident emotions get combined to produce a value that can only represent a state of both happiness and confidence.
Can You Quantify My Emotions?
Looking at the previous example, can you work out what the binary and decimal representations of a value that combined angry and wistful emotions would be? Try to think it through in your head and come up with an answer. Then rewrite my code to change the value of CurrentEmotion and see if you're right or where you went wrong.
What emotion would be represented by the decimal value 42? Try converting the number to hexadecimal or binary, if it helps, and then think about the individual emotion constants that could be combined to produce that value.
Remember that more than two constants can be combined, each separated by the vertical bar — you can combine as many or as few of them as you have, as long as the integer type you're using has sufficient capacity to store all of the bits.
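If you want to check your answers to these exercises, one possible approach is sketched below. It goes slightly beyond what we've covered so far, because it uses the bitwise AND operator (&) to test whether a particular bit is set. The has and describe methods and the emotionNames table are hypothetical helpers of my own, not anything from the standard library, and the example deliberately uses the happy-and-confident value from earlier so as not to spoil the puzzle.

```go
package main

import (
	"fmt"
	"strings"
)

type emotion uint32

const (
	HappyEmotion emotion = 1 << iota
	SadEmotion
	AngryEmotion
	AnnoyedEmotion
	ConfidentEmotion
	WistfulEmotion
)

// emotionNames lists a label for each flag, in bit order (hypothetical table).
var emotionNames = []string{"happy", "sad", "angry", "annoyed", "confident", "wistful"}

// has reports whether every bit of flag is also set in e.
func (e emotion) has(flag emotion) bool {
	return e&flag == flag
}

// describe returns the labels of all the flags that are set in e.
func (e emotion) describe() string {
	var set []string
	for i, name := range emotionNames {
		if e.has(emotion(1) << i) {
			set = append(set, name)
		}
	}
	return strings.Join(set, ", ")
}

func main() {
	current := HappyEmotion | ConfidentEmotion
	fmt.Printf("%d (%06b): %s\n", current, current, current.describe())
}
```

Swapping in a different combination, or a plain number such as emotion(42), lets you compare the program's answer with your own.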
Conclusion
You should now be able to understand better the very first example of Go code we looked at from the "log/syslog" package, which used the inclusive OR operator to combine option constants. If we create a writer with a priority of syslog.LOG_WARNING|syslog.LOG_USER, then our messages will simply be user-level warnings (as opposed to, say, kernel-level alerts, another priority value we could have used). We also now know that the binary value of this combined parameter will be 0b1100 (12 in decimal), which shows us the two binary digits that have been set.
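A quick way to confirm that value is to print the combined priority without opening a connection to the system log at all. This is a minimal sketch: userWarning is a hypothetical helper name, and note that the log/syslog package isn't available on Windows or Plan 9, so the program only builds on other systems such as Linux and macOS.

```go
package main

import (
	"fmt"
	"log/syslog"
)

// userWarning combines a severity and a facility into one priority value.
// (Hypothetical helper, used only for this illustration.)
func userWarning() syslog.Priority {
	return syslog.LOG_WARNING | syslog.LOG_USER
}

func main() {
	p := userWarning()
	// %#b prints the value in binary with a leading 0b.
	fmt.Printf("%d (%#b)\n", p, p)
}
```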
This is a very efficient way of setting options, since we could, for example, store 64 different options in a single uint64, one on each of its bits, whereas we would need to use 64 different variables if we had a separate bool for each one.
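To get a rough feel for the saving, the sketch below sets the lowest and highest bits of a uint64 and compares its size in memory with an array of 64 bools. The setBit helper is a hypothetical name of mine, and the sizes printed assume the standard Go compiler, where a bool occupies one byte.

```go
package main

import (
	"fmt"
	"unsafe"
)

// setBit returns flags with bit n (counting from 0 up to 63) switched on.
// (Hypothetical helper, used only for this illustration.)
func setBit(flags uint64, n uint) uint64 {
	return flags | uint64(1)<<n
}

func main() {
	var flags uint64
	flags = setBit(flags, 0)  // the first option
	flags = setBit(flags, 63) // the sixty-fourth option

	var separate [64]bool

	fmt.Printf("%064b\n", flags)
	fmt.Println("bytes used by uint64:  ", unsafe.Sizeof(flags))
	fmt.Println("bytes used by [64]bool:", unsafe.Sizeof(separate))
}
```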