MCS-378 Lab 2: De-tuning Linux's Scheduler, Fall 2001

Due: October 17, 2001

In this lab, you will come to better understand Linux's scheduling algorithm through "de-tuning" experiments. That is, you will modify the scheduler to be worse than it normally is and measure the undesirable effects of your changes. This should help you appreciate the benefits conferred by the normal (unmodified) design. This lab doesn't involve much programming, but requires non-trivial understanding; don't be embarrassed to ask questions.

For this entire lab, we will concentrate exclusively on Linux's normal scheduling algorithm, used for processes with the scheduling policy SCHED_OTHER. (All normal processes use this policy.) We will ignore the real-time scheduling policies SCHED_FIFO and SCHED_RR.

The scheduler has two primary objectives:

When there are multiple CPU-bound processes, they should share the processor in accordance with their priorities. For example, if there are two normal-priority processes and one low-priority process, the two normal-priority processes might each get 2/5 of the CPU's time and the low-priority process 1/5. (Other ratios than 2:1 are also possible.)
When a process that hasn't been runnable becomes runnable, it should soon thereafter run, especially if its priority isn't low relative to other runnable processes. (This keeps interactive processes responsive, streaming-media delivery relatively jitter-free, etc.)

In this lab, we will focus on understanding the mechanisms that achieve objective 2, making changes that impact the "responsiveness" of the system, i.e., the extra latency (delay) from when a process becomes runnable to when it runs. With respect to objective 1 (CPU sharing), you will only be asked to show that the scheduler changes have no significant impact on it.

To observe the scheduler's behavior, you will need two test programs to run. One is just a do-nothing infinite loop, to provide a CPU-bound process. The other is the process we use to measure responsiveness. It would be difficult to accurately measure how quickly a program responds to input, such as your typing a character on the keyboard. Therefore, we will instead measure how quickly a program starts running again after it has requested that the operating system put it to sleep for a specified time period.

Your measurement program will have a loop that uses two operating system calls. One is nanosleep, which asks the operating system to put the current process to sleep for a specified number of seconds and nanoseconds. You will use values large enough to sleep for several seconds. This is done each time around the loop, with essentially nothing else. Thus the process is mostly asleep (not runnable), waking up (running) only very briefly every few seconds. The other system call the program uses, gettimeofday, does what it says. (You should be familiar with it from the prior lab.) The program uses gettimeofday twice each time around the loop - once before and once after the nanosleep. By subtraction it computes the extra delay (latency) beyond the requested sleep time. The program then prints out this latency for you to see. The programs from the prior lab should help you write the test programs you need for this lab, but if you have trouble, please ask: this is not intended to be the focus of the lab, so you shouldn't get stuck here. To read the on-line documentation of the system calls, you can use a command like

man nanosleep

To run one of the programs at a specified "niceness" (priority) level, you can use the system program called nice. (See the on-line documentation.) For example, to run a program called ./runner at approximately half the normal priority, you would do

nice -10 ./runner

Do some experiments as in the previous lab, using top, to see how multiple CPU-bound processes share the CPU when they have different priority levels (as well as when they have the same priority). When you change the scheduler, verify that this CPU sharing behavior remains essentially unchanged.

In experimenting with responsiveness, it is probably easiest to understand what is happening if you have tens of CPU-bound runners going, and then start one "sleeper," that is, the measurement program that repeatedly sleeps and measures how much extra latency there is upon awakening. In this case, there are three relatively distinct time scales that you may observe:

If the sleeper upon awakening preempts the currently running runner, and then is chosen as the next process to run, it can run immediately. However, because the Linux timer has a 10 ms granularity, you may still see a delay of this magnitude.
If the awakening sleeper does not preempt the current runner, the current runner will finish its current timeslice before the next process is selected. For a normal priority process, the timeslice is 60 ms. Thus, if there is no preemption, but the awakening sleeper is chosen to run next, you may see a latency of tens of milliseconds, depending on when in the runner's time slice the sleeper wakes up.
If the awakened sleeper is not selected as the next process to run, but rather has to wait for all the queued up runners to run first, then you may see latencies on the order of a second, because you have tens of runners each allotted tens of milliseconds.

First, by understanding Linux's normal scheduler, predict what combinations of "niceness" (for the runners and the sleeper) will exhibit each of these three behaviors. You may find it easiest to stick with normal priority runners and only consider variations in the priority (niceness) of the sleeper. Test your predictions, and if they prove false, investigate until you have a consistent theoretical and empirical understanding.

Now modify the scheduler and, for each modified kernel, repeat the above process of prediction and experiment. See below for more on how to make a modified kernel. To get full credit for this lab assignment, you need to do both of the following:

Make a kernel in which the only change is that processes no longer keep half of their unused counter when counter values are recalculated.
Make a kernel that has the change described above and also adds newly woken processes to the tail of the run queue rather than the head.

Don't overlook: You should also show that neither kernel change has much impact on how CPU-bound processes share the CPU.

Lab facilities

Each team will have an older PC in the OHS 329 lab which will be for the exclusive use of that team, and for which they will have the root password. This will allow you to experiment with kernel modifications. We've got a special CD-ROM and boot disks which we can use to bring the machine back to a clean state; I'll show you this. So long as you don't make too much trouble for the rest of us, you can feel free to do whatever other experiments you want on these machines.

I will issue each team three diskettes. Each diskette is MS-DOS formatted. You can put one of these disks in an already booted Linux machine, or equally well in one of the normal lab machines, and mount it using the command

mount /mnt/floppy

At this point, the diskette is available to you as the /mnt/floppy directory. When you are done reading or writing it, before you physically eject the diskette you should give the command

umount /mnt/floppy

You will need to mount this MS-DOS diskette in order to transport files back and forth between our experimental machines and the normal lab machines. You will need to do this in order to print files, maintain safe backup copies, etc. (Don't treat the hard drive of your experimental machine as a safe place to leave your work.) The experimental machines are not networked.

The other two diskettes for each team happen to also be MS-DOS formatted, but you won't use this feature. Instead, they are for you to make experimental, modified, kernel boot disks as described below.

If you ever want to turn your computer off, first give the halt command (while logged in as root).

Tip: automating running a lot of processes

Suppose you want have a program called runner that does nothing but run. If you give the command

./runner &

you now have one runner process. If you repeat this, you can get two, or three, or whatever. But suppose you want 20? This would be a pain to type in by hand each time you wanted to try the experiment. So I present a shell script that will do the job for you. Put the six lines below into a file called doruns:

#!/bin/bash
n=${1:?You need to specify how many.}
while [ $n -gt 0 ]; do
  ./runner &
  n=`expr $n - 1`
done

Now change the settings on that file so that it is executable, by giving the command

chmod +x doruns

Now you can give a command like

./doruns 20

to get 20 runner processes going. (Of course, you can also use a number other than 20.)

When you want to get rid of all the runner processes, you can use the command

killall runner

(This will work whether you started the processes manually or using the doruns script.)

The scheduler - and modifying it

I've provided you with an annotated copy of the file /usr/src/linux-2.4/kernel/sched.c, which is the scheduler. (That annotated copy is linked here as PostScript and as text.) Read it with an eye to the specific predictions and de-tuning changes listed above. Your changes should be very small: just a handful of characters. This is mostly a read-and-understand lab (with experimental testing of your understanding, possibly resulting in revised understanding).

Remember when reading the code that we are only concerned with the scheduling policy SCHED_OTHER. Also, you can ignore all code that is specifically for SMP (symmetric multi-processor) systems. In the annotated version of the code, I've indicated how to tell which code is for SMP systems.

Note that the normal networked lab machines are running an older version of the Linux kernel than we are using on our experimental machines. Thus you will need to do your editing starting from the sched.c that is on the experimental machine, not the version on the normal networked lab machines.

To rebuild the kernel after editing the sched.c file, put a blank disk in the floppy drive and, with your current directory being /usr/src/linux-2.4, give the command

make bzdisk

Important note: as stated above, your current directory should be /usr/src/linux-2.4, not /usr/src/linux-2.4.2, or making the kernel will take much longer.

To boot your new kernel, leave that disk in the drive and give the command

reboot

Remember that kernel hacking is difficult, and debugging especially so. So don't feel bad if your modifications don't work at first. Be sure to consult with me.

Thoughts for your lab report

Some things you may want to do in your lab report are:

Explain the significance of relevant portions of the normal Linux scheduler: how do they contribute to good responsiveness?
Explain the changes you made to the Linux scheduler. What changes did you make? Why would these changes hurt responsiveness? Under what circumstances?
Explain your experiments to measure responsiveness. What did you measure? Under what circumstances?
Present the results of your experiment, for example in clear tables or graphs.
Interpret your results. How do your measured data fit with your understanding of how the variant schedulers would behave?

Instructor: Max Hailperin