Golang Project Structure

Tutorials, tips and tricks for writing and structuring code in Go (with additional content for other programming languages)

How to Hide a Secret Message in a Unicode Text File


Imagine sending a secret message that’s hidden in plain sight — embedded within an innocent-looking text file. No encryption and no suspicious attachments, just pure Unicode trickery.

It sounds like something out of a spy novel, doesn’t it? Well, with a little bit of clever manipulation, we can use Unicode characters to conceal data in a way that’s nearly undetectable to the casual observer, and I’ll show you how to do just that in this blog post.

We will begin by discussing the meaning of the word steganography and how it relates to secret messages. Then we will look at some Go code that can be used to hide secret messages of our own.

What Is Steganography?

Steganography is the technical term for hiding something secret inside something else that looks normal, so that it doesn’t raise any suspicions. This can be a message that’s hidden in an ordinary-looking letter or text file, but it can also refer to secret messages contained on physical objects or embedded within audio/video files.

The word steganography comes from two ancient Greek words: the first part comes from στεγανός [steganós], which is an adjective meaning “concealed”, and the suffix comes from γραφή [graphé], which means “writing”.

So it may seem like a complicated and technical term, but the word really just means “secret writing”.

How Was Steganography Employed in Real-World Situations?

An early example of employing steganographic techniques would be using invisible ink to write a secret message on a piece of paper.

Substances like lemon juice, milk or vinegar could be used to write a message that remained invisible until exposed to heat or ultraviolet light, so the secret text would be rendered invisible to most observers.

Another practical example of steganography is the use of microdots: these are tiny photographs of documents that are shrunk down to the size of a period on a printed page.

Microdots were heavily employed during the Cold War by intelligence agencies seeking to smuggle critical information across borders without arousing suspicion.

Similar techniques involved embedding messages within seemingly harmless letters or documents by using slight alterations in the text’s spacing, alignment or handwriting, allowing an informed recipient to reconstruct the otherwise hidden content.

Another more trivial and whimsical example would be the act of hiding a metal file inside a cake in order to help a prisoner escape incarceration, as seen in countless movies and TV shows. While this is more of a smuggling technique than true steganography, the underlying principle remains the same: concealing something in plain sight to avoid detection.

Of course, in our digital age, the practice of steganography has evolved to use more sophisticated methods, such as embedding messages within image files, audio files and even network traffic.

Cybercriminals have been known to hide malicious code inside seemingly benign executables, in order to evade detection by security software, while whistleblowers and journalists have used digital steganography to communicate securely when they’re working within oppressive regimes.

Do Prisoners Use Steganography?

In fact, the example of the prisoner and the cake was particularly apt, because steganography has long been associated with prisoners (and other people who are being held captive, such as POWs).

When communication is restricted and monitored, concealing messages becomes not just a matter of secrecy but of survival.

The history books are replete with heroic tales of prisoners inventing ingenious ways to hide information within seemingly innocent correspondence, artwork, food — and even their own bodies.

One of the most famous examples of prisoners using steganography comes from World War II, when the British intelligence agency MI9 developed specialized playing cards, Monopoly boards and other seemingly innocent items that contained hidden escape maps.

These were sent to the POW camps under the guise of humanitarian aid, reportedly from fictitious charities rather than the Red Cross itself, providing captured soldiers with a way to navigate their way to freedom. Since the Geneva Convention allowed POWs to receive aid packages, the deception often went undetected.

Aleksandr Solzhenitsyn, the Soviet dissident and author of The Gulag Archipelago, reportedly wrote parts of his manuscript on tiny strips of paper and hid them inside everyday objects while imprisoned in a Soviet labour camp.

Some incarcerated individuals have been known to communicate by marking specific words in legal paperwork, arranging messages within the first letter of each paragraph, or even modifying the layout of chess moves in correspondence games to spell out hidden words.

The Prisoners’ Problem

These historical examples naturally bring us to a more general formulation.

The Prisoners’ Problem, which was first formulated by cryptographer Gustavus J. Simmons in 1983, presents a scenario where two inmates, traditionally named Alice and Bob, wish to coordinate an escape plan.

Their communications are monitored by a warden, named Eve, who reads every message and will block any that she suspects of containing hidden information.

Solving this problem underscores the need for covert communication techniques, such as steganography, to embed secret data within ordinary media, making the existence of the hidden message undetectable.

A view of the cells inside Alcatraz Federal Penitentiary, historically a maximum security prison, now a museum.

In this context, the effectiveness of a steganographic method hinges on its undetectability. If Eve can detect the hidden communication, the method is compromised — and the prisoners could be punished, isolated or, at worst, killed.

What Is Steganalysis?

On the other hand, Eve has a toolkit she can use against steganographic attempts at communication between the two prisoners: steganalysis is the name given to the set of tools and techniques that can be used to detect or extract hidden information, primarily by analyzing inconsistencies and patterns.

One of the most fundamental approaches used in steganalysis involves statistical analysis. A method that’s widely used today is histogram analysis, where the frequency distribution of pixel values in an original image is compared to that of a suspected “stego-image”. If the embedding process alters the natural distribution of pixel values, the histogram may reveal subtle yet detectable changes.

Modern steganalysis tools also increasingly rely on machine learning and deep learning to improve their accuracy of detection.
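To make the idea of histogram analysis a little more concrete, here is a minimal sketch in Go. It assumes that an image has already been reduced to a slice of 8-bit pixel values, and the pixel data, along with the function names histogram and histogramDistance, are made up for this illustration. Real steganalysis tools use far more refined statistics than a simple sum of differences.

```go
package main

import "fmt"

// histogram counts how often each 8-bit value occurs in data.
func histogram(data []byte) [256]int {
	var h [256]int
	for _, v := range data {
		h[v]++
	}
	return h
}

// histogramDistance sums the absolute differences between two histograms.
// A result of zero means the two value distributions are identical.
func histogramDistance(a, b [256]int) int {
	total := 0
	for i := range a {
		d := a[i] - b[i]
		if d < 0 {
			d = -d
		}
		total += d
	}
	return total
}

func main() {
	// Made-up greyscale pixel values standing in for real image data.
	original := []byte{10, 10, 10, 12, 12, 200, 200, 200}
	// The same pixels after a hypothetical embedding step changed some low bits.
	suspect := []byte{11, 10, 11, 12, 13, 200, 201, 200}

	d := histogramDistance(histogram(original), histogram(suspect))
	fmt.Println("histogram distance:", d)
}
```

The larger the distance, the more the suspect data deviates from the original, although Eve would rarely have the untouched original to compare against in practice.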

How Could Someone Encode a Secret Message?

Now that we understand the fundamental ideas behind steganography and steganalysis, let’s do a little roleplay and think through a real-world example.

Let’s assume that you’re an accomplice of a bank robber and you want to pass the code to the bank’s safe on to him without anyone else suspecting what you’re talking about.

Hello, just a quick message to let you know that
I'm planning to see you at my sister's wedding.
She's holding it at Snodgrass Hall on March 31st.
The ceremony starts at 16:55. You can bring up to 8 guests,
so long as you notify us in advance.

See you there!

If you find all of the digits in the message above and concatenate them, you get the secret code 3116558. It doesn’t matter whether there actually is a wedding ceremony or not: that’s just filler to hide the secret code!
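As a rough sketch of how the recipient could recover the code without picking the digits out by hand, the Go program below walks through the message and keeps only the ASCII digits. The function name extractDigits is a hypothetical helper introduced here purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// extractDigits is a hypothetical helper: it returns every ASCII digit
// found in text, concatenated in the order in which the digits appear.
func extractDigits(text string) string {
	var b strings.Builder
	for _, r := range text {
		if r >= '0' && r <= '9' {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	const message = `Hello, just a quick message to let you know that
I'm planning to see you at my sister's wedding.
She's holding it at Snodgrass Hall on March 31st.
The ceremony starts at 16:55. You can bring up to 8 guests,
so long as you notify us in advance.

See you there!`

	fmt.Println(extractDigits(message))
}
```

Notice that the scheme only works if both parties have agreed beforehand that the digits are the part of the message that matters.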

Of course, it would be better if your sister actually was holding her wedding at the venue listed, because then you have a greater degree of plausible deniability.

If the plaintext message makes sense, then no one has any need to search for a secret one. On the other hand, if the plaintext message sounds nonsensical, then suspicions are more likely to be raised.

Using Invisible Unicode Characters to Hide a Secret Message

There are a small number of characters in the Unicode character set that aren’t visible when rendered, even though they are still present in the underlying text.

When we go on to write some Go code, we are going to use the two characters below, which are normally invisible, in order to encode a hidden message:

Character Name           Unicode Value   Binary Value
Zero-width non-joiner    0x200c          0
Zero-width joiner        0x200d          1

It’s important to note that the binary values in the table above have been assigned by us as part of our secret code. There’s nothing about a zero-width joiner that means one and there’s nothing about a zero-width non-joiner that means zero.

We’ve chosen to equate those two binary numbers to the Unicode characters, so that we can represent long strings of binary digits simply by using these two characters, which won’t be visible in most text-rendering applications.

We can’t use the ASCII codes for 0 and 1 themselves, because then those numbers would be visible, and they’d alert the viewer that there’s something strange hidden within their message.
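Before we get to the full encoder, here is a small sketch of the mapping on its own. It assumes the assignment from the table above (non-joiner for 0, joiner for 1), and the helper name bitsToInvisible is invented for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

const (
	zeroWidthNonJoiner = '\u200c' // stands for binary 0 in our scheme
	zeroWidthJoiner    = '\u200d' // stands for binary 1 in our scheme
)

// bitsToInvisible is a hypothetical helper that replaces each '0' or '1'
// in bits with the corresponding zero-width character. Any other
// character in bits is ignored.
func bitsToInvisible(bits string) string {
	var b strings.Builder
	for _, bit := range bits {
		switch bit {
		case '0':
			b.WriteRune(zeroWidthNonJoiner)
		case '1':
			b.WriteRune(zeroWidthJoiner)
		}
	}
	return b.String()
}

func main() {
	visible := "Hi"
	hidden := visible + bitsToInvisible("101")

	// Most renderers should display both strings identically, but the
	// second one carries three extra runes.
	fmt.Println(utf8.RuneCountInString(visible)) // 2
	fmt.Println(utf8.RuneCountInString(hidden))  // 5

	// The %+q verb escapes every non-ASCII character, which exposes the
	// hidden runes as \u200c and \u200d escape sequences.
	fmt.Printf("%+q\n", hidden)
}
```

Printing with %+q is also a handy way to check whether a piece of text you have received contains any invisible characters.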

Encoding the Secret Message

Now we’re actually going to do some encoding.

The code below will steganographically encode a hidden message within ordinary text, using the invisible-character method we’ve just discussed.

Have a look at the code and then we’ll discuss it in more detail:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	zeroWidthNonJoiner = rune(0x200c) // used as binary 0
	zeroWidthJoiner    = rune(0x200d) // used as binary 1
)

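func encode(secretMessage, plainMessage string) (encodedMessage string) {
	var encodedMessageBuilder strings.Builder

	// Convert both messages to rune slices, so that they are indexed by
	// character rather than by byte offset. This matters as soon as
	// either message contains a non-ASCII character.
	secretRunes := []rune(secretMessage)

	for i, r := range []rune(plainMessage) {
		encodedMessageBuilder.WriteRune(r)

		if i < len(secretRunes) {
			r2 := secretRunes[i]
			r2Binary := strconv.FormatInt(int64(r2), 2)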

			for _, b := range r2Binary {
				switch b {
				case '1':
					encodedMessageBuilder.WriteRune(zeroWidthJoiner)
				case '0':
					encodedMessageBuilder.WriteRune(zeroWidthNonJoiner)
				}
			}
		}
	}

	encodedMessage = encodedMessageBuilder.String()

	return
}

func main() {
	const secretMessage = "I HATE YOU SO MUCH"
	const plainMessage = `Hello again, sweetheart.
		I just wanted to tell you how much I've been thinking of you.
		I know that we've had our difficulties, but I'll always love you.`

	encodedMessage := encode(secretMessage, plainMessage)

	fmt.Println(encodedMessage)
	fmt.Println(len(encodedMessage), len(plainMessage))
}

Within the main function, we first declare two constants: secretMessage holds a string of text that expresses my true feelings, while plainMessage is the text that I’m happy to share openly.

We’re going to hide the content of secretMessage within plainMessage, so that an unwitting observer wouldn’t know my true feelings, but anyone who can successfully decode the message will understand them.

In order to hide secretMessage, we use the encode function. This iterates through every rune in plainMessage, adding that rune to the strings.Builder variable that is used to create our output. Before moving on to the next rune in plainMessage, however, we get a rune using the same index from secretMessage (if one exists).

We encode r2, which is the rune from secretMessage, as a binary-digit string, which just contains a long row of zeros and ones. The length of r2Binary will depend on the highest bit that’s set in the number representing the rune.
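To see what these binary-digit strings look like, the short sketch below prints them for the first three characters of our secret message. The wrapper name binaryDigits is hypothetical, but it makes the same strconv.FormatInt call as the encode function.

```go
package main

import (
	"fmt"
	"strconv"
)

// binaryDigits returns the base-2 digits of r without leading zeros,
// using the same strconv.FormatInt call as the encode function.
func binaryDigits(r rune) string {
	return strconv.FormatInt(int64(r), 2)
}

func main() {
	for _, r := range "I H" {
		fmt.Printf("%q = %d = %s\n", r, r, binaryDigits(r))
	}
	// Output:
	// 'I' = 73 = 1001001
	// ' ' = 32 = 100000
	// 'H' = 72 = 1001000
}
```

The space character needs only six digits, while the two letters need seven, which is why the number of invisible characters that follow each visible one can vary.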

We then iterate through r2Binary, writing a zero-width joiner rune to our output if we encounter a one, and writing a zero-width non-joiner if we encounter a zero. This is the arbitrary encoding system that we discussed earlier.

We have chosen to associate a specific meaning to these Unicode characters that they don’t inherently have. No one who comes across them in our text would necessarily assume that they stand for zeros and ones (and that each string of binary digits stands for a human-readable character), unless they already knew something about our encoding system.

Note that it’s necessary for secretMessage to be less than or equal in length to plainMessage, because of the way our algorithm works. However, there are various ways we could modify it to allow a small plainMessage to hide a big secretMessage.

For example, we could use another type of invisible character to act as an end-of-rune marker, so that multiple runes of secretMessage could be placed after a single rune of plainMessage. Alternatively, we could simply use a fixed number of binary digits for every rune (32 would be a reasonable choice, since a rune is an alias for int32 in Go, even though ASCII characters only need seven bits), so we know that if we have twice that many digits, then we must have two runes.
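The fixed-length idea could be sketched as follows. This is only one possible design, not a drop-in replacement for the code above: it assumes that the whole secret is hidden after the first rune of the plain message, and the names encodeFixed, decodeFixed and bitsPerRune are invented for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	zeroWidthNonJoiner = rune(0x200c) // used as binary 0
	zeroWidthJoiner    = rune(0x200d) // used as binary 1
	bitsPerRune        = 32           // fixed width assumed by this sketch
)

// encodeFixed hides the whole secret after the first rune of plain,
// writing every secret rune as exactly bitsPerRune invisible characters.
// If plain is empty, there is nowhere to hide anything and the result is empty.
func encodeFixed(secret, plain string) string {
	var b strings.Builder
	for i, r := range []rune(plain) {
		b.WriteRune(r)
		if i != 0 {
			continue
		}
		for _, s := range secret {
			// %0*b pads the binary digits with leading zeros to bitsPerRune.
			for _, bit := range fmt.Sprintf("%0*b", bitsPerRune, s) {
				if bit == '1' {
					b.WriteRune(zeroWidthJoiner)
				} else {
					b.WriteRune(zeroWidthNonJoiner)
				}
			}
		}
	}
	return b.String()
}

// decodeFixed gathers every hidden bit and splits the result into
// groups of bitsPerRune digits, one group per secret rune.
func decodeFixed(encoded string) (string, error) {
	var bits strings.Builder
	for _, r := range encoded {
		switch r {
		case zeroWidthJoiner:
			bits.WriteByte('1')
		case zeroWidthNonJoiner:
			bits.WriteByte('0')
		}
	}
	all := bits.String()
	if len(all)%bitsPerRune != 0 {
		return "", fmt.Errorf("found %d hidden bits, which is not a multiple of %d", len(all), bitsPerRune)
	}
	var out strings.Builder
	for i := 0; i < len(all); i += bitsPerRune {
		v, err := strconv.ParseInt(all[i:i+bitsPerRune], 2, 64)
		if err != nil {
			return "", err
		}
		out.WriteRune(rune(v))
	}
	return out.String(), nil
}

func main() {
	encoded := encodeFixed("I HATE YOU SO MUCH", "Hi")
	decoded, err := decodeFixed(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded)
}
```

The trade-off is size: every hidden character now costs 32 invisible runes, so a long secret makes the file noticeably larger than its visible text would suggest.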

Decoding a Secret Message

Now that we’ve done the encoding, let’s write some code to decode the message that we’ve just hidden:

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

const (
	zeroWidthNonJoiner = rune(0x200c)
	zeroWidthJoiner    = rune(0x200d)
)

// The encode function is the same one shown in the previous listing.
// Copy its definition here, since a declaration without a body won't compile.

func decode(encodedMessage string) (decodedMessage string, err error) {
	var decodedMessageBuilder strings.Builder
	var binary string

	for _, r := range encodedMessage {
		switch r {
		case zeroWidthJoiner:
			binary += "1"
		case zeroWidthNonJoiner:
			binary += "0"
		default:
			if len(binary) > 0 {
				runeValue, errLocal := strconv.ParseInt(binary, 2, 0)
				if errLocal != nil {
					err = errLocal
					return
				}

				decodedMessageBuilder.WriteRune(rune(runeValue))
			}

			binary = ""
		}
	}

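	// Flush any bits that are left over when the text ends with
	// zero-width characters rather than a visible one.
	if len(binary) > 0 {
		runeValue, errLocal := strconv.ParseInt(binary, 2, 0)
		if errLocal != nil {
			err = errLocal
			return
		}

		decodedMessageBuilder.WriteRune(rune(runeValue))
	}

	decodedMessage = decodedMessageBuilder.String()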

	return
}

func main() {
	const secretMessage = "I HATE YOU SO MUCH"
	const plainMessage = `Hello again, sweetheart.
		I just wanted to tell you how much I've been thinking of you.
		I know that we've had our difficulties, but I'll always love you.`

	encodedMessage := encode(secretMessage, plainMessage)
	decodedMessage, err := decode(encodedMessage)
	if err != nil {
		panic(err)
	}

	if decodedMessage == secretMessage {
		fmt.Println("MESSAGE SUCCESSFULLY DECODED")
	} else {
		fmt.Fprintln(os.Stderr, "DECODED MESSAGE DOES NOT MATCH THE ORIGINAL SECRET")
	}
}

The decode function is responsible for extracting the hidden message from an encoded text by detecting and interpreting the zero-width characters embedded within it. As it iterates through the runes in encodedMessage, it looks for occurrences of the zero-width joiner and zero-width non-joiner characters, which were used to encode binary data in the encode function.

Whenever it encounters a zero-width joiner, it appends a "1" to a string variable named binary, and whenever it encounters a zero-width non-joiner, it appends a "0". This process gradually reconstructs the binary representation of each hidden rune, one bit at a time.

The function continues this process until it encounters a non-zero-width character. The appearance of any visible character in the text signals that a complete sequence of binary digits has been collected.

At this point, binary is converted into an integer using strconv.ParseInt, which interprets the string as a base-2 number. This integer corresponds to a Unicode code point, which is then cast into a rune, allowing it to be appended to the decodedMessageBuilder, which is responsible for constructing the final output string.

Once the character has been successfully extracted, the binary string is reset, ensuring that the next sequence of zero-width characters can be processed correctly.

One potential issue that could arise during this process is the possibility of encountering an invalid binary sequence, which could happen if the encoded text is malformed or contains extraneous zero-width characters. To handle such situations, the decode function checks for errors when parsing the binary string. If strconv.ParseInt returns an error, which it does when the sequence has too many digits to fit in an int, the function immediately propagates it. Note that ParseInt doesn’t check whether the number is a valid Unicode code point, so a shorter stray sequence will still be decoded into some character rather than rejected.
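As a small illustration of the parsing step, the sketch below feeds two binary strings to strconv.ParseInt using the same arguments as decode. The wrapper name parseBits is hypothetical and exists only to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseBits mirrors the strconv.ParseInt call made inside decode:
// base 2, with a bit size of 0, which means the result must fit in an int.
func parseBits(binary string) (int64, error) {
	return strconv.ParseInt(binary, 2, 0)
}

func main() {
	// A well-formed sequence: 1001001 is 73, the code point of 'I'.
	v, err := parseBits("1001001")
	fmt.Println(v, string(rune(v)), err)

	// Far more digits than an int can hold, so ParseInt reports a range error.
	_, err = parseBits(strings.Repeat("1", 100))
	fmt.Println(errors.Is(err, strconv.ErrRange))
}
```

If you wanted stricter validation, one option would be to reject any parsed value for which utf8.ValidRune returns false before writing it to the output.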

Finally, the main function demonstrates the complete encoding and decoding process. It first encodes the secret message within the plain message and then attempts to extract it again.

After calling decode, it compares the extracted message with the original secretMessage to verify that the process was successful.

If the two strings match, it prints a success message to indicate that the hidden message was correctly recovered. Otherwise, it logs an error message to stderr, signalling that something went wrong.

This simple verification step confirms that the encoding and decoding functions work as intended for this particular input, which gives us some confidence in the integrity of our hidden communications.
