MCS-377 Lab 4: Cryptography (Fall 2006)

Due: December 8, 2006

Objective

Cryptography is only one small part of network security, but it can be a particularly fun part. In this lab, you will decrypt a message, without having been given the secret key used to perform the encryption.

Background

Our textbook describes polyalphabetic substitution ciphers, in which each character of the plain text is mapped into a corresponding character of ciphertext, but the mapping used varies depending on the position within the texts. That is, the first plaintext character gets mapped to the first ciphertext character in one way, the second plaintext character to the second ciphertext character in a second way, etc.

For implementation in a computer, a handy version of polyalphabetic substitution uses one pseudo-randomly chosen byte for each position in the texts, and computes a byte of ciphertext by computing the bitwise exclusive-or of the corresponding byte of plaintext with the pseudo-random byte. The nice feature of this particular version is that each mapping is its own inverse. Thus, you don't need two programs, one for encryption and one for decryption. Instead, if you just run the ciphertext through the same program a second time, with the same key (i.e., sequence of pseudo-random bytes), you will get the plaintext back.

The cryptographic strength of this approach hinges entirely on how good the pseudo-random number generator is. If the bytes are truly random, and used only once, then the system is called a "one-time pad" and is theoretically unbreakable. Any plaintext of the same length would be equally likely to map into the ciphertext. The problem is that the sender and receiver now have to agree upon a key the size of the text to be sent, namely the full sequence of random bytes. It is tempting to agree upon a smaller key and use it as a "seed" to generate an arbitrarily long sequence of pseudo-random bytes by algorithmic means. The problem with this approach is that the cryptosystem may no longer be secure, especially if the pseudo-random number generator is poorly designed, as you will find in this lab.

The file DumbCrypt.java contains a Java program that can transform a file using this sort of pseudo-random exclusive-or polyalphabetic substitution cipher. As described above, the program can be used identically both for encryption and for decryption. Assuming you have compiled this Java class, if you type a message into a file called plain.text, and then run

  java DumbCrypt plain.text cipher.text 987654321
  java DumbCrypt cipher.text deciphered.text 987654321

you will find that all three files have the same length in bytes, and that cipher.text contains gibberish, whereas deciphered.text contains the exact same contents as plain.text. The program was used both times with a key of 987654321; no difference was needed between encryption and decryption, since the exclusive-or substitutions are their own inverses.

If you leave the key off of the command line (e.g., omit 987654321 from the prior examples), you will find the program reports "Key recovery feature not yet written." This is because one way you might want to achieve the lab's goal would be to write this feature, so that when the program is not given a key, it figures one out on its own, so as to transform the input file into an output file that looks like it might plausibly be plaintext. That is, continuing the above example,

  java DumbCrypt cipher.text recovered.text

would result in a file recovered.text that is likely to be identical to plain.text. (The cryptosystem used in DumbCrypt is unlikely to have more that one key that produces plausible plaintext.) Note however that you need not program this particular feature in the Java program. You could use some other programming language or means, so long as you succeed in deciphering the challenge ciphertext.

The challenge

Each of you has a personal file in the directory ~max/MCS-377/2006-Fall/ciphertexts on our fileserver that contains a ciphertext produced using DumbCrypt. Your file is has as its name the first letter of your last name (as an upper-case letter). Your goal is to find the corresponding plaintext. I don't want to say too much in advance about how you might go about this. However, as usual, feel free to ask me for help, either in person or by email.

Technical notes

The types byte and char in Java

The DumbCrypt program uses the Java data type byte for each character it processes. This type treats each eight-bit quantity as an integer in the range from -128 to 127 (i.e., -27 to 27-1). Depending on how you attack the problem, you might find yourself wanting to convert between these bytes and the char datatype used in Java for characters. For some purposes, you can mix the two types without problems. For example, you can compare a byte to a char using comparison operators like ==, >, <, etc. For other purposes, you may need to explicitly convert. You can convert a byte into a char by prefixing it with the "cast" operation (char), and can do the conversion in the opposite direction by writing (byte) in front of a char.

The pseudo-random number generator

The particular kind of pseudo-random number generator used in DumbCrypt is called a linear congruential generator. Starting with some initial value n0 (which in our case is the key), this class of generators produces a sequence of values by a rule of the following form:

nk+1 = ank + b (mod m)

The DumbCrypt program uses the coefficient a = 314159, the constant term b = 2718, and the modulus m = 232.

In the Java code, the variable generator holds the current element of the sequence, nk. To step it to the next value, it is multiplied by our value of a and then increased by our value of b:

  generator *= 314159;
  generator += 2718;

The reduction modulo m is implicit, because all arithmetic in Java on the data type int is done modulo 232. (The range runs from -231 to 231-1, rather than from 0 to 232-1, but that is immaterial. It just affects which element of each congruence class is used to represent the class, not the fundamentals of the modular arithmetic.)

Because each element of this sequence is a full 32-bit integer, and we only want an 8-bit byte, a cast operation is used to throw away all but one byte:

  output[i] = (byte) (generator >> 8);

As you can see, before the cast is done, the value is shifted 8 bits to the right. This is because the rightmost bits aren't very random, so it is good to discard them and use the bits to their left.

What to turn in

If you succeed in producing recognizable English text from your ciphertext, all you need to turn in is the deciphered text, along with the standard signed honor pledge, and you will receive full credit.

If you do not succeed, turn in a description of the approach you were trying, a copy of any program code you were using, and information on where you got stuck.


Course web site: http://www.gac.edu/~max/courses/F2006/MCS-377/
Instructor: Max Hailperin <max@gac.edu>