Introduction

Steganography

Steganography, like Cryptography, is a science whose objective is the transmission of "secrets" from a sender to a recipient. There is a substantial difference between the two: the former allows a message to be transmitted as cleartext, but hidden inside other, apparently harmless messages (cover messages); with the latter, the transmission of a secret message is evident, but the message itself is encrypted, and secrecy lies in how difficult it is for anyone other than the sender or the intended recipient to obtain the original cleartext. The two techniques may of course be combined by first encrypting the message to send (the embedded message or payload) with a cryptographic algorithm, then hiding it inside another message (text, image, or anything else) that will not raise suspicion. This gives the benefits of both strategies: even if the payload were discovered inside the message that is actually sent (the stego message), it would still be hard to decipher.

Throughout the centuries Steganography has been employed in a wide variety of ways. Herodotus tells of the Greek Demaratus at the Persian court of Xerxes, who managed to warn his fellow countrymen in Sparta of an impending Persian invasion simply by etching a message on the wood underneath the wax of some writing tablets and then covering it again with wax. The tablets then looked unused, and raised no suspicion when carried out of the country. Similarly, Histiaeus shaved the head of one of his slaves, tattooed a secret message on it and waited until the hair grew back before sending the slave to Aristagoras in Miletus, where the message would help start a rebellion against the Persians. Amazingly enough, the latter method is reported to have been used by German spies in the 20th century as well.

Spies and soldiers have used steganographic techniques in many other ways to transmit instructions and reports, for example with inks that remain invisible until heated or, as the Germans did during the Second World War, with microdots: with this technique the text or image to transmit is shrunk so much that it fits inside a dot printed in a harmless cover text, such as a period at the end of a sentence or the dot over the letter 'i'.

When the original cover message is available, comparing it with a suspect message makes it possible to discover the payload, or at least its existence. In most cases this comparison is not possible, so steganalysis techniques are used instead: statistical analysis that, starting from a set of suspect stego messages, estimates the probability that messages in the set contain alterations (it can hardly ever determine this with certainty). This kind of analysis relies on the predictability of the processes that lead to the construction of stego messages (identifying specific noise patterns, hints left behind by data compression, and so on). In many cases it is complicated by the deliberate addition of meaningless data or by encryption of the payload, which tends to distribute data more evenly and therefore erases easily recognizable patterns.
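The direct comparison described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `changed_positions` and the sample byte values are hypothetical, and cover and stego data are taken to be raw byte sequences of equal length.

```python
def changed_positions(cover: bytes, stego: bytes) -> list[int]:
    """Return the indices at which the suspect data differs from the original."""
    if len(cover) != len(stego):
        raise ValueError("cover and stego must have the same length")
    return [i for i, (c, s) in enumerate(zip(cover, stego)) if c != s]


# Made-up sample data: two bytes of the "stego" copy differ by one unit.
cover = bytes([200, 17, 64, 33, 90, 128])
stego = bytes([201, 17, 65, 33, 90, 128])

positions = changed_positions(cover, stego)
print(positions)  # [0, 2]
```

Any non-empty result shows that the file was altered, although it does not by itself reveal what the payload says.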

Steganography in software

To use Steganography to hide a payload inside a cover file, the cover file must contain a sufficient amount of redundant data. Good Steganography is obtained by modifying this data in such a way that the alteration is not easily noticeable. The amount of redundant data thus determines both the "quality" of a cover file and its "capacity"; this is why a text file is generally not a good cover file, as little information can be hidden in it. Some steganographic techniques based on text formatting do exist, such as Line-shift Encoding and Word-shift Encoding, but they are quite vulnerable to changes in the formatting of the file contents or in the file format (for example when converting plain text to PDF).

Digital images as cover files

Using digital images rather than text as cover files gives better results. Steganography based on this kind of cover generally relies on the limited ability of the human eye to tell apart colors that are different from one another but still very similar. For this reason 24-bit images (which in RGB may use about 16.7 million colors, far more than a human can distinguish) are better suited than, for example, 8-bit ones (256 colors, usually shades of grey to mitigate the differences). The main problem with 24-bit RGB images is their large potential size: while it provides a large quantity of redundant data among which to hide information, it can cause problems during transmission by common means, for example via e-mail. Moreover, in some cases the size itself could raise suspicion, as steganalysts are expected to know very well that large images make better covers. Excessive size may be reduced through compression: lossy techniques (like JPEG) compress very well in practice, but pose a high risk of damaging the hidden message; lossless formats (like GIF, or BMP, which is usually stored uncompressed) preserve all the information but cannot achieve the same level of compression.
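The figures above follow from simple arithmetic, worked through here in Python. The helper name `raw_image_size_bytes` and the 1024x768 resolution are assumptions chosen only for illustration; the result is the size of the uncompressed pixel data, without any file header.

```python
def raw_image_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of the uncompressed pixel data, ignoring file headers."""
    return width * height * bits_per_pixel // 8


colors_24bit = 2 ** 24  # 16,777,216 distinct RGB colors
colors_8bit = 2 ** 8    # 256 colors

size_24bit = raw_image_size_bytes(1024, 768, 24)  # 2,359,296 bytes
size_8bit = raw_image_size_bytes(1024, 768, 8)    # 786,432 bytes

print(colors_24bit, colors_8bit, size_24bit, size_8bit)
```

At the same resolution the 24-bit image carries three times as many bytes as the 8-bit one, which is both the source of its extra capacity and the reason for its inconvenient size.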

Among the most common techniques used with digital images is Least Significant Bit (LSB) substitution. LSB works by replacing the least significant bit of every byte that makes up the cover image with a bit of the payload. The color information encoded in those bytes is changed, but not enough to make the alteration visible to the human eye. With a color depth of 8 bits per pixel, one bit per pixel is available for hidden data; with 24 bits per pixel there is three times as much room in which to hide information. With 24-bit images it even becomes possible to use the two least significant bits of every byte, since the alteration will still be hardly noticeable. Unfortunately this technique is vulnerable to lossy compression and format conversion, which often alter the least significant bits themselves.
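A minimal sketch of LSB substitution in Python may help make the mechanism concrete. It is an illustration only, not the method of any particular program: the names `embed_lsb` and `extract_lsb` are hypothetical, the cover is assumed to be a plain sequence of raw color bytes (real image files also have headers that must be left untouched), and the recipient is assumed to know the payload length in advance.

```python
def embed_lsb(cover: bytes, payload: bytes) -> bytes:
    """Replace the least significant bit of successive cover bytes with payload bits."""
    # Payload bits, most significant bit of each byte first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("payload too large for this cover")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the payload bit
    return bytes(stego)


def extract_lsb(stego: bytes, payload_len: int) -> bytes:
    """Rebuild payload_len bytes from the least significant bits of the stego data."""
    if payload_len * 8 > len(stego):
        raise ValueError("stego data too short for the requested payload length")
    out = bytearray()
    for i in range(payload_len):
        value = 0
        for b in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(256)) * 2      # 512 made-up bytes standing in for pixel data
capacity_bytes = len(cover) // 8   # one payload bit per cover byte
secret = b"hi"

stego = embed_lsb(cover, secret)
recovered = extract_lsb(stego, len(secret))
max_change = max(abs(c - s) for c, s in zip(cover, stego))
```

Each cover byte changes by at most one unit, which is why the alteration is hard to see, and the capacity with one bit per byte is one eighth of the cover size.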

Other techniques are based on manipulating the brightness and contrast data of images, for example to add a watermark (Digital Watermarking) to digital content, making it part of the content itself. These techniques are beyond the scope of this project and will not be considered here.

Digital audio as cover files

Digital audio files may also be used as cover files. The most widely used audio formats are characterized by their Sample Quantization (as in WAVE and AIFF) and by their Temporal Sampling Rate. In these cases the "usable" space in the file is generally larger the higher the sampling rate is. A third widely used form of encoding is called Perceptual Sampling and is based on encoding only those parts of a sound that can be perceived by the human ear (as in MP3 encoding, where frequencies that cannot be heard are eliminated to improve compression). Just as digital image covers take advantage of the difficulty the human eye has in distinguishing colors, digital audio covers can count on the different sensitivity of the human ear to sounds of low and high intensity: louder sounds tend to mask quieter ones, so alterations kept at low intensity are unlikely to be noticed.

Several steganographic techniques can be used with these forms of encoding. Again, one may use a Least Significant Bit (LSB), or Low-bit Encoding, method to embed data in the least significant bit of each sample. In this case the capacity of the transmission channel is usually 1 kbps per kilohertz of sampling rate (for example, about 44 kbps for audio sampled at 44.1 kHz), but information is easily lost because of noise and re-quantization. A more robust technique is called Spread Spectrum, in which the message is encoded across the entire frequency spectrum. The audio is then transmitted on various frequencies, which change according to the method used. One possible method is called Direct Sequence Spread Spectrum; it multiplies the signal by a pseudo-random sequence, called the chip sequence, before it is transmitted. The drawback of this technique is that it can introduce noise into the sound, with the possibility of data loss.
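The idea behind Direct Sequence Spread Spectrum can be sketched as follows. This is a simplified illustration, not a complete audio embedder: the function names, the seed, the 32 chips per bit and the noise level are all assumptions, and the sketch spreads only the payload bits themselves, whereas a real system would add the spread signal at low amplitude to the audio samples.

```python
import random


def make_chips(n: int, seed: int) -> list[int]:
    """Pseudo-random chip sequence of +1/-1 values, reproducible from a shared seed."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n)]


def spread(bits: list[int], chips_per_bit: int, seed: int) -> list[int]:
    """Map each bit to +1/-1, repeat it chips_per_bit times, multiply by the chips."""
    chips = make_chips(len(bits) * chips_per_bit, seed)
    symbols = [1 if b else -1 for b in bits for _ in range(chips_per_bit)]
    return [s * c for s, c in zip(symbols, chips)]


def despread(signal: list[float], chips_per_bit: int, seed: int) -> list[int]:
    """Multiply by the same chips again and decide each bit from the sign of the sum."""
    if len(signal) % chips_per_bit != 0:
        raise ValueError("signal length must be a multiple of chips_per_bit")
    chips = make_chips(len(signal), seed)
    bits = []
    for start in range(0, len(signal), chips_per_bit):
        corr = sum(signal[i] * chips[i] for i in range(start, start + chips_per_bit))
        bits.append(1 if corr > 0 else 0)
    return bits


payload_bits = [1, 0, 1, 1, 0]
transmitted = spread(payload_bits, chips_per_bit=32, seed=42)

# Simulate a noisy channel with bounded noise on every chip.
noise = random.Random(7)
received = [x + noise.uniform(-0.5, 0.5) for x in transmitted]

recovered_bits = despread(received, chips_per_bit=32, seed=42)
```

Because every bit is spread over many chips, moderate noise on individual chips averages out during despreading; only a recipient who knows the seed can regenerate the chip sequence and recover the bits.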

Related software

Several software utilities let users experiment with Steganography, including JPHide and JSteg, Outguess, Steghide, Stegdetect and Stegbreak. These programs deal mostly with the JPEG image format for the cover file, although programs that accept the BMP and PPM formats are also common. The embedding tools in this list allow data to be embedded and extracted using one or more predefined algorithms; Stegdetect and Stegbreak are instead used for steganalysis, that is to say to detect the presence of a hidden payload in apparently harmless files. Stegbreak is also able to attempt brute-force attacks and attacks based on dictionaries of common words to try to access the hidden data.

For a detailed discussion of the techniques employed by each of the listed programs, please see their respective documentation and web sites; in the next chapter we shall instead examine Steghide, the program for which the SteGUI front-end was written, which supports both digital images and digital audio as cover files.

© 2005-2008 Nicola Cocchiaro