Cryptography is only one small part of network security, but it can be a particularly fun part. In this lab, you will decrypt a message, without having been given the secret key used to perform the encryption.
Our textbook describes polyalphabetic substitution ciphers, in which each character of the plain text is mapped into a corresponding character of ciphertext, but the mapping used varies depending on the position within the texts. That is, the first plaintext character gets mapped to the first ciphertext character in one way, the second plaintext character to the second ciphertext character in a second way, etc.
For implementation in a computer, a handy version of polyalphabetic substitution uses one pseudo-randomly chosen byte for each position in the texts, and computes a byte of ciphertext by computing the bitwise exclusive-or of the corresponding byte of plaintext with the pseudo-random byte. The nice feature of this particular version is that each mapping is its own inverse. Thus, you don't need two programs, one for encryption and one for decryption. Instead, if you just run the ciphertext through the same program a second time, with the same key (i.e., sequence of pseudo-random bytes), you will get the plaintext back.
The cryptographic strength of this approach hinges entirely on how good the pseudo-random number generator is. If the bytes are truly random, and used only once, then the system is called a "one-time pad" and is theoretically unbreakable. Any plaintext of the same length would be equally likely to map into the ciphertext. The problem is that the sender and receiver now have to agree upon a key the size of the text to be sent, namely the full sequence of random bytes. It is tempting to agree upon a smaller key and use it as a "seed" to generate an arbitrarily long sequence of pseudo-random bytes by algorithmic means. The problem with this approach is that the cryptosystem may no longer be secure, especially if the pseudo-random number generator is poorly designed, as you will find in this lab.
The file DumbCrypt.java contains a Java
program that can transform a file using this sort of pseudo-random
exclusive-or polyalphabetic substitution cipher. As described above,
the program can be used identically both for encryption and for decryption.
Assuming you have compiled this Java class, if you type a message into
a file called plain.text
, and then run
java DumbCrypt plain.text cipher.text 987654321 java DumbCrypt cipher.text deciphered.text 987654321
you will find that all three files have the same length in bytes, and
that cipher.text
contains gibberish, whereas
deciphered.text
contains the exact same contents as
plain.text
. The program was used both times with a key of 987654321;
no difference was needed between encryption and decryption, since the
exclusive-or substitutions are their own inverses.
If you leave the key off of the command line (e.g., omit
987654321
from the prior examples), you will find the program
reports "Key recovery feature not yet written." This is because one
way you might want to achieve the lab's goal would be to write
this feature, so that when the program is not given a key, it figures
one out on its own, so as to transform the input file into an output
file that looks like it might plausibly be plaintext. That is,
continuing the above example,
java DumbCrypt cipher.text recovered.text
would result in a file recovered.text
that is likely to
be identical to plain.text
. (The cryptosystem used in
DumbCrypt is unlikely to have more that one key that produces
plausible plaintext.) Note however that you need not program this particular
feature in the Java program. You could use some other programming
language or means, so long as you succeed in deciphering the challenge
ciphertext.
Each of you has a personal file in the directory
~max/MCS-377/2006-Fall/ciphertexts
on our fileserver that
contains a ciphertext produced using DumbCrypt. Your file is has as
its name the first letter of your last name (as an upper-case
letter). Your goal is to find
the corresponding plaintext. I don't want to
say too much in advance about how you might go about this. However,
as usual, feel free to ask me for help, either in person or by
email.
byte
and char
in JavaThe DumbCrypt program uses the Java data type byte
for
each character it processes. This type treats each eight-bit quantity
as an integer in the range from -128 to 127 (i.e., -27 to
27-1). Depending on how you attack the problem, you might
find yourself wanting to convert between these bytes and the
char
datatype used in Java for characters. For some
purposes, you can mix the two types without problems. For example,
you can compare a byte
to a char
using
comparison operators like ==
, >
,
<
, etc. For other purposes, you may need to
explicitly convert. You can convert a byte into a
char by prefixing it with the "cast" operation
(char)
, and can do the conversion in the opposite
direction by writing (byte)
in front of a char.
The particular kind of pseudo-random number generator used in DumbCrypt is called a linear congruential generator. Starting with some initial value n0 (which in our case is the key), this class of generators produces a sequence of values by a rule of the following form:
nk+1 = ank + b (mod m)
The DumbCrypt program uses the coefficient a = 314159, the constant term b = 2718, and the modulus m = 232.
In the Java code, the variable generator
holds the
current element of the sequence, nk.
To step it to the next value, it is multiplied by our value of a and
then increased by our value of b:
generator *= 314159; generator += 2718;
The reduction modulo m is implicit, because all arithmetic
in Java on the data type int
is done modulo
232. (The range runs from -231 to
231-1, rather than from 0 to 232-1, but that is
immaterial. It just affects which element of each congruence class is
used to represent the class, not the fundamentals of the modular
arithmetic.)
Because each element of this sequence is a full 32-bit integer, and we only want an 8-bit byte, a cast operation is used to throw away all but one byte:
output[i] = (byte) (generator >> 8);
As you can see, before the cast is done, the value is shifted 8 bits to the right. This is because the rightmost bits aren't very random, so it is good to discard them and use the bits to their left.
If you succeed in producing recognizable English text from your ciphertext, all you need to turn in is the deciphered text, along with the standard signed honor pledge, and you will receive full credit.
If you do not succeed, turn in a description of the approach you were trying, a copy of any program code you were using, and information on where you got stuck.