MC78 Lab 4: Deadlock Detection and Recovery (Fall 1996)

Due: December 12, 1996

Goals of the lab

In this lab your team has the freedom to either do the normal lab assignment, on deadlock detection and recovery, which is described in this handout, or can instead design an alternative project of its own. See the section at the end of this handout on alternative projects if you are interested in this possibility.

Assuming you go with the mainstream lab, you will be building on your work from lab 2, in which you implemented locks and condition variables in Nachos. In this lab you will add the ability to detect deadlocks and even recover from them. Your system only needs to detect deadlocks formed using just locks, you needn't worry about deadlocks involving semaphores or a mix of semaphores and locks.

The first thing to do is to add another operation to the Lock class, namely TryToAcquire. It should be like Acquire except that rather than returning void, i.e., no value, it returns a bool, which should be true if the lock was successfully acquired (possibly after waiting for another thread to release it), and false if the lock was not acquired because waiting for it would have formed a deadlock. The Acquire method should be rewritten to simply do the following:

ASSERT(TryToAcquire());

I.e., it should leave all the real work to TryToAcquire, and if it receives the bad news that a deadlock was detected, then it should halt Nachos's execution completely with the assertion failure. Note that to implement this deadlock detection, you may need to make other changes, for example keeping track in the Thread objects for blocked threads of which Lock they are waiting for.

At this point you should have a system that does deadlock detection, but has no recovery mechanism other than causing the whole Nachos system to bomb out. The first test you do should be that a non-deadlocking system still works as before; the prime number sieve from lab 2 would be a fine test for this. Next you should modify what the ThreadTest procedure (in sieve.cc) does such that a deadlock results, and verify that the deadlock is detected and ASSERT in Acquire duly aborts Nachos. (Your modified code should still be using Acquire, not (directly) TryToAcquire.) In order to make the deadlock actually happen, you may need to insert currentThread->Yield(); into your threads at appropriate points (as well as the more substantive changes to make the deadlock possible), so that the two threads executions are appropriately interleaved. The debugging output may help you make sure that you understand what is happening.

Now you are ready to make a more sophisticated test program that uses TryToAcquire to detect a deadlock, but then recovers itself from it in a more sophisticated manner. In particular, set up two threads that each need two locks, let's say A and B. One thread should acquire A, then try to acquire B, while the other acquires B, then tries to acquire A. However, in each thread if the attempt to acquire the second lock fails (due to deadlock detection), the thread should recover by releasing the lock it already holds, yielding, and then starting the process over. Use debugging output to ensure that indeed the deadlock situation is being detected and recovered from. As before, you may need to put in some strategically placed yields to make the deadlock actually occur.

Origin of this lab

This code for this lab come from the Nachos simulated operating system developed by Tom Anderson at Berkeley for educational use. The assignment, however, is original.

What to turn in

Turn in a jointly-authored lab report containing your changes to the files you needed to change, the test results showing your program's behavior, and a brief description of the logic behind your program.

Alternative projects

If you would rather, you are welcome to propose an alternative project to me. I will entertain any suggestion related to the course. It can be in the context of Nachos, Java, Linux, or any other system. It should involve concurrency and/or networked distributed systems in some way. Possibilities include, but are in no way limited to

extending the Nachos file system from the prior lab in some interesting way
writing some interesting Java distributed system, perhaps another game
writing a program that uses features of Linux (such as the fsync system call) to do write-ahead logging in order to provide atomic transactions

Whatever you choose to do, please run the idea by me first for approval before you start work.

Instructor: Max Hailperin