Codes and secret languages are great fun if your playing spies or detective games, but they're also a really important part of keeping yourself safe online. When you access a website you don't want the American (and British) government recording everything you do! While once upon a time if you wanted to be a spy, then you'd need a degree in history, or philosophy from Oxbridge, now days the best qualification is a good maths degree.
Lets start with some basic terminology so you don't sound like an amateur: "Codes" are things that represent text in a different way, so when you say "the fox has left the henhouse", the person you're talking to knows what you really mean. When we use an algorithm to mess around the letters, then we should call that a cipher.
The other thing you need to know if you want to send messages as a cryptographer, is that your new name is Alice. Cryptographers are ALWAYS called Alice. Alice (as best I can figure out - she's pretty secretive about his), seems to have some kind of crush on Bob. I know there's something going on between them (implied by the metadata) - Alice is always sending secret messages to Bob. I don't know what they say because they always encrypt what they're saying.
Alice sends messages to Bob. Others taking an interest in their behaviour include Carol, Dan, Eve.
Shared Secret Key Encryption
If you've played with ciphers before then you'll know that the basic method is to scramble the letters up. For example A gets switched with Z, B gets switched with Y, C with W and so on. You take your message, switch the letters, and then send the scrambled message. At the other end Bob unscrambles the letters and can read the original message.
But what happens if Carol gets to see the encrypted message? What's to stop her unscrambling it? Well if we switch the letters in a really obvious fashion (like we just did), then pretty much nothing. But we could scramble them in all sorts of different ways. We could write down a table of letters explaining how to scramble/unscramble the message. If Bob knows the exact method of scrambling and Carol doesn't then only he will be able to unscramble it (we hope).
Because the table of letters can "unlock" the message we call it a "key". Encryption on a computer uses the same basic technique.
make cipher xxtea device
make key string
make plaintext string
make cyphertext string
when start
.forever
..wait pick random 1 to 100 microsecs
when start
.ask "Generate some Entropy?" and wait
.tell cipher to "generateKey"
.say join "shared secret key " key
.
.set plaintext to "This is secret"
.tell cipher to "encrypt"
.set plaintext to ""
.
.say "Encrypted:"
.say cyphertext
.
.tell cipher to "decrypt"
.say plaintext
Sniff has a device called xxtea which implements a reasonable strong but simple encryption method. It's a bit more advanced than simple swapping letters, but not much different in principle. We start by generating a key, which creates a very larger number representing how we're going to scramble the message. Normally we'd generate this separately, and both Alice and Bob already know it.
Then we take our message, which we call the "plaintext", and encrypt it. What we get back is the "ciphertext" - the mangled version of our message. Getting our original message back is easy - just call decrypt, and ciphertext is decrypted into plaintext again. Provided of course you know the key!
If you don't know the key then you'll have to try and crack the cipher. In fact there are know attacks against XXTEA so if you happen to have data protected with it, and a professional cryptographer wants to get they probably can. On the other hand if professional cryptographers are trying to get your data you've already got big problems. If you just want some basic encryption then it should be pretty safe to use.
Entropy and Randomness
On important step in making this work is generating good keys. These should be random - otherwise they can be calculated! If we can calculate them, so can someone else. But it turns out computers are really bad at generating random numbers, because they essentially do have to calculate them. To get round this we need a source of "Entropy"- something external to the program than is essentially unpredictable, and outside the code's control.
when start
.forever
..wait pick random 1 to 100 microsecs
when start
.ask "Generate some Entropy?" and wait
Before we generate any keys we ask the user to "generate some Entropy". This is just a fancy way introducing an unpredictable delay into the program. You run the code, and after a few seconds (hours/days!) you press return to continue. In the mean time a loop has been running that is burning through random numbers. We throw away random numbers for a completly undetermined length of time. Even if we tried, there's no way to reproduce the exact same delay, so we get genuinely randomised keys each time.
Public Key Encryption
However there is a more signifigant problem with this whole approach to encryption. Alice wants to send a message to Bob. She knows Carol is listening in, so decides to encrypt the message. She creates a key, and uses it to encrypt the message, and then sends the message to Bob. But of course Bob before he can decrypt the message he needs to get the key from Alice. Of course Alice could send it to him... but Carol is listening. If Alice had a way to send the key to Bob, then she could just that to send her message!
In practise the way round this is for Alice and Bob to meet up somewhere in private and agree on a shared, secret key. They can then use that key for future communications. Once upon a time secure couriers were used to carry brief cases with crypographic keys around the world. Once they key had been securely transported, those keys could then be used for secure communication without having to worry about eavesdropping.
That sort of works if you're a large business and need to communicate securely between two of your offices on different sides of the world, but it breaks down when millions of people want to communicate with websites. How can you securly communicate with a website when you don't have a shared secret key, and making/getting one is impractical?
In the early 70's GCHQ figured out how to do this, and in typical helpful government agency fashion they immediately classified it, and made sure no one found out about it. A few years latter a bunch of guys at MIT figured out exactly the same solution and published it as what we now call RSA (and went on to get rich and famous as a result, though recently their company did some bad things).
The trick they figured out was how to make a cipher where you could tell everyone the key, and it wouldn't matter. More specifically they broke the key into two parts: knowing how to scramble the letters (encrypt) wouldn't help you unscramble them again (decrypt). You could tell everyone your "public key" and then they could encrpyt messages. However the "private key" for decrypting you kept secret.
Now if Alice wants to talk to Bob she can just say "hey Bob whats your public key", and Bob can shout it back as loud as he likes. Carol might hear it but it doesn't matter. Alice encrypts a message with that key, and sends it to Bob. Carol might hear that message too, but it can only be decrypted using Bob's private key, so only Bob can read it. This is exactly what happens when you connect to a "secure" website.
make cipher rsa16 device
make publicKey string
make privateKey string
make key string
make plaintext string
make cyphertext string
when start
.forever
..wait pick random 1 to 100 microsecs
when start
.ask "Generate some Entropy?" and wait
.tell cipher to "generateKey"
.say join "Public key is " publicKey
.
.set key to publicKey
.set plaintext to "This is secret"
.tell cipher to "encrypt"
.set plaintext to ""
.
.say cyphertext
.
.set key to privateKey
.tell cipher to "decrypt"
.say plaintext
Here's some code to do that in Sniff. We've made an rsa16 device, and when we ask it go make us a key it makes two. On is public and the other is private. Just as before we encrypt and then decrypt the message, but this time we use a different key each time. It's like a lock where one key can lock it, and the other can unlock it.
Lets break that out into its two parts. Alice knows Bobs public key and used it to encrypt:
when start
.set bobsPublicKey to "12827:349"
.
.ask "What's the message?" and wait
.set plaintext to answer
.set key to bobsPublicKey
.tell cipher to "encrypt"
.say cyphertext
Then Bob can decode it:
when start
.set bobsPrivateKey to "12827:4549"
.
.ask "What's the cipherd message?" and wait
.set cyphertext to answer
.set key to bobsPrivateKey
.tell cipher to "decrypt"
.say plaintext
Remember that only Bob knows his private key so only he can do this!
But wait a minute... What if Carol is playing a trick on Bob. Everyone now knows Bob's public key (it is after all - public!), so Carol decides to send Bob a message and pretend its from Alice. Being able to prove who a message is from is actually even more useful than being able to keep it secret. It's effectively a "digital signature", and RSA can do that too!
What if Alice encrypted the message with her private key? Only she can do that because only she knows it. Then we could decrypt it with her public key. That doesn't help keep it secret as anyone can now decrypt the message, but it does means anyone can prove that the message came from Alice.
If we want both, then we can both sign and encrypt the message:
when start
.set bobsPublicKey to "12827:349"
.set alicesPrivateKey to "19493:5609"
.set alicesPublicKey to "19493:89"
.
.ask "What's the message?" and wait
.set plaintext to answer
.
.set key to alicesPrivateKey
.tell cypher to "encrypt"
.
.set plaintext to cyphertext
.
.set key to bobsPublicKey
.tell cypher to "encrypt"
.say cyphertext
Alice first signs the message using her private key to prove its from her. Then encrypts with Bobs public key, so only he'll be able to read it. Note we do it this way round, so that only Bob can see that the message is from Alice (darn metadata again).
Then do decode and check:
when start
.set bobsPrivateKey to "12827:4549"
.set bobsPublicKey to "12827:349"
.set alicesPublicKey to "19493:89"
.
.ask "What's the encrypted message?" and wait
.set cyphertext to answer
.set key to bobsPrivateKey
.tell cipher to "decrypt"
.
.set cyphertext to plaintext
.
.set key to alicesPublicKey
.tell cipher to "decrypt"
.say plaintext
If we get a valid message at the end, then it must have been sent by Alice and can only be read by Bob. Rather than having "shared secrets" (as the saying goes, three people can keep a secret if two of them are dead), everything is either public (we assume everyone knows it) or private (only one person knows it).
When you connect to a secure website you use their public keys, so only they can read what you send them, and you send them your public key so they can send you data securely.
But is it really secure?
How can you give instructions for scrambling without someone smart being able to work those same steps backwards? It turns out its all based on prime numbers and clock arithmetic (or modulo arithmetic to give it its proper name). That maths is complex, but its based on a basic ideas.
Let's look at Bob's keys again:
.set bobsPrivateKey to "12827:4549"
.set bobsPublicKey to "12827:349"
The keys consist of two parts: the first is known as n, and its common to both bits. But its not just any number: its the product of two prime numbers. To generate a pair of keys we first generate two prime numbers, multiply them to get n, then use all three numbers to create the second parts of the keys.
Here's the trick: finding two prime numbers isn't too difficult, pick any two (7 and 17?). Multiplying them is easy (119). But what if I asked you what the factors of 119 were? Turns out that's a "hard" problem - or more accurately its believed to be a hard problem. Mathematicians and Computer Scientists have a very specific definition of "hard problem", which means something more than difficult - it basically comes down to saying the only solution is to try potential answers and see if they work. No one can prove that factoring is "hard" (that a way of doing it doesn't exist), but no one has shown it can be done efficiently either - RSA depends on this assumption.
If we know the original prime numbers then we can generate the keys. We then throw away those primes, knowing that without them we don't have enough information to calculate the private key from the public key, and there's no easy way to find them even though we know n.
So the security of RSA is dependant on how hard it is to factorise a number. If we want to crack Bobs messages we need to find two numbers that multiply to give 12827. That is (probably) "hard", but unfortunately its not actually that difficult if you've got a computer. A quick bit of googling turns up a websites that can do it, and if you've got a Pi, then Mathematica can solve it almost instantly.
We've done nothing wrong in theory but 12827 is just too small. A computer can try every possible factor almost instantly. We need a bigger number! The RSA16 device is a real implementation of RSA, and works great to demonstrate public key in action but it uses values of n that are up to 16bits in length. That's not enough to keep anyone out. You can try the RSA32 device which uses 32bit keys, but that's massively slower, and still way to small to be considered the slightest bit secure. Currently 2048 bits are required for even basic levels of security - computers are just really fast! 4096bits is a good idea if you're serious about this, and even 8192bits might be needed some time soon. Currently we don't have a Sniff RSA2048 device, so the RSA implemention is just "for fun".
The xxtea, rsa16 and rsa32 devices will be in Sniff Release 19, which should be available in a few weeks.