Some thoughts about steganography
Let’s say you have plaintext A to hide. You have something to hide it in, which we’ll call medium B (preferably lossy, but hiding data in WAV audio files works just fine, while JPG images might often be preferred).
You generate key C to protect it. You do this by picking a strong password and running it through a one-way hash function. SHA-256 is a good choice if you’re going to use AES with a 256-bit key, since its output is exactly 256 bits long.
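A minimal sketch of that key step, using only Python's standard hashlib module. The function name derive_key and the example password are made up for illustration. A single SHA-256 pass is fast to brute-force if the password is weak, so a deliberately slow function such as hashlib.pbkdf2_hmac might be a better fit in practice.

```python
import hashlib

def derive_key(password: str) -> bytes:
    """Hash a password into a 32-byte key, the size AES-256 expects."""
    return hashlib.sha256(password.encode("utf-8")).digest()

# Hypothetical password, just to show the key size.
key_c = derive_key("correct horse battery staple")
print(len(key_c) * 8)  # 256
```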
If A is a text file, you should compress it. Bzip2 is a good choice, IMHO. Then you encrypt it with key C and a symmetric encryption algorithm like AES, giving you the ciphertext.
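The compression half of this step can be sketched with Python's built-in bz2 module. The helper names and the sample text are assumptions for illustration. The AES step is left out here because the standard library has no AES implementation, so it would need a third-party package.

```python
import bz2

def compress_plaintext(text: str) -> bytes:
    """Compress text with bzip2 before it gets encrypted."""
    return bz2.compress(text.encode("utf-8"))

def decompress_plaintext(blob: bytes) -> str:
    """Reverse of compress_plaintext, used after decryption."""
    return bz2.decompress(blob).decode("utf-8")

# Hypothetical, deliberately repetitive sample text.
plaintext_a = "meet me at the old bridge at midnight. " * 20
packed = compress_plaintext(plaintext_a)
print(len(plaintext_a.encode("utf-8")), "->", len(packed))
```

Compression has to come before encryption, since good ciphertext looks random and does not compress.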
Then you generate an error correction code for the encrypted data, because it makes the data a bit more resistant to modifications.
Then you encrypt the error correction code with the same key C. (Yes, this means that if there’s damage to the error correction code in the image you have lost the ability to get easy verification.)
Now you append the encrypted error correction code to the ciphertext.
This is then hidden in your medium B, once again using key C as the key. Yes, this means that if you use a poor steganography algorithm, one that lets the key be extracted when all an attacker has is the medium, then your encryption is broken too.
When extracting the data you use the same key C to get the ciphertext and encrypted error correction code. Then you decrypt the error correction code and verify the ciphertext. Then you decrypt the ciphertext and get the hidden data.
If you used an encryption algorithm that does not have an “avalanche effect” (where a small change in the input scrambles the whole output) on the error correction data, you would not lose the entire error correction data due to a small error in it.
(This would mean not using AES, but potentially just XORing the error correction data with the key. Beware of any encryption method that is weak to cryptanalysis! Also, beware of steganography methods that let attackers calculate your key!)
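Here is a rough sketch of why XOR keeps damage local. Stretching the key into a keystream by hashing the key plus a counter with SHA-256 is my own assumption, not a vetted cipher, and all names here are made up. Note that reusing the same keystream for two different messages leaks the XOR of those messages, which is exactly the kind of weakness to beware of.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` bytes by hashing key + counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

def bit_errors(a: bytes, b: bytes) -> int:
    """Count the bit positions where a and b differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

key_c = hashlib.sha256(b"some strong password").digest()
ecc = b"pretend this is error correction data"
protected = xor_crypt(key_c, ecc)

# Simulate damage in the medium: flip a single bit of the protected data.
damaged = bytearray(protected)
damaged[5] ^= 0b00000100
recovered = xor_crypt(key_c, bytes(damaged))
print(bit_errors(ecc, recovered))  # 1
```

One flipped bit going in gives one flipped bit coming out. With a block cipher like AES, the same flip would garble at least a whole 16-byte block.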
If using an encryption method where bits are encrypted one by one, or only in small chunks that don’t affect the rest of the encrypted data, it could allow a JPG image to be recompressed and the data would still be recoverable, despite being encrypted and seemingly random to begin with. Depending on the size of the image and the data, the data could survive almost unimaginable alterations. With 1000% error correction data (10 bits of redundant data for every bit of actual data; for comparison, QR codes at their highest error correction level can recover from roughly 30% damage) and a 20 megapixel image, a couple of lines of text could easily be hidden and survive many recompressions and alterations. The data could potentially be recoverable if you printed the picture, took a photo of it, and then tried to recover the data from that.
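To get a feel for the 1000% figure, here is a toy sketch using the simplest possible error correction: store every bit 11 times (1 original plus 10 redundant copies) and take a majority vote when reading. A plain repetition code is my assumption for illustration; a real implementation would more likely use something like Reed-Solomon and spread the copies around the image. All function names are made up.

```python
def bytes_to_bits(data: bytes) -> list[int]:
    """Split bytes into a flat list of bits, most significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

def bits_to_bytes(bits: list[int]) -> bytes:
    """Pack a flat list of bits (length divisible by 8) back into bytes."""
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )

def encode_repetition(bits: list[int], copies: int = 11) -> list[int]:
    """Repeat every bit `copies` times (1 original + 10 redundant)."""
    return [b for b in bits for _ in range(copies)]

def decode_repetition(coded: list[int], copies: int = 11) -> list[int]:
    """Majority vote over each group of `copies` bits."""
    out = []
    for i in range(0, len(coded), copies):
        group = coded[i:i + copies]
        out.append(1 if sum(group) * 2 > len(group) else 0)
    return out

message = b"hello"
coded = encode_repetition(bytes_to_bits(message))

# Simulate heavy damage: flip 5 of the 11 copies of every single bit.
for i in range(0, len(coded), 11):
    for j in range(5):
        coded[i + j] ^= 1

print(bits_to_bytes(decode_repetition(coded)))  # b'hello'
```

Even with 5 out of every 11 stored bits wrong, the majority vote still recovers the message. It fails once 6 or more copies of the same bit are damaged.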
Encrypting the error correction code prevents an attacker from being able to easily confirm if he has found the hidden ciphertext or not in an image.
So what’s the point of all this? Nothing, really. It’s just interesting to me. I’d like to try implementing this myself some day (with existing algorithms of course, since I’m lazy ;).
It would be fun to print an image and take a photo of it and still be able to recover the hidden data.