For this entire lab, we will concentrate exclusively on Linux's normal
scheduling algorithm, used for processes with the scheduling policy
SCHED_OTHER
. (All normal processes use this policy.) We
will ignore the real-time scheduling policies SCHED_FIFO
and SCHED_RR
.
The scheduler has two primary objectives:
To observe the scheduler's behavior, you will need two test programs to run. One is just a do-nothing infinite loop, to provide a CPU-bound process. The other is the process we use to measure responsiveness. It would be difficult to accurately measure how quickly a program responds to input, such as your typing a character on the keyboard. Therefore, we will instead measure how quickly a program starts running again after it has requested that the operating system put it to sleep for a specified time period.
Your measurement program will have a loop that uses two operating system
calls. One is nanosleep
, which asks the operating system
to put the current process to sleep for a specified number of seconds
and nanoseconds. You will use values large enough to sleep for
several seconds. This is done each time around the loop, with
essentially nothing else. Thus the process is mostly asleep (not
runnable), waking up (running) only very briefly every few seconds.
The other system call the program uses, gettimeofday
,
does what it says. (You should be familiar with it from the prior
lab.) The program uses gettimeofday
twice each time
around the loop - once before and once after the
nanosleep
. By subtraction it computes the extra delay
(latency) beyond the requested sleep time. The program then prints
out this latency for you to see. The programs from the prior lab
should help you write the test programs you need for this lab, but if
you have trouble, please ask: this is not intended to be the focus of
the lab, so you shouldn't get stuck here. To read the on-line
documentation of the system calls, you can use a command like
man nanosleep
To run one of the programs at a specified "niceness" (priority) level,
you can use the system program called nice
. (See the
on-line documentation.) For example, to run a program called
./runner
at approximately half the normal priority, you
would do
nice -10 ./runner
Do some experiments as in the previous lab, using top
, to
see how multiple CPU-bound processes share the CPU when they have
different priority levels (as well as when they have the same
priority). When you change the scheduler, verify that this CPU
sharing behavior remains essentially unchanged.
In experimenting with responsiveness, it is probably easiest to understand what is happening if you have tens of CPU-bound runners going, and then start one "sleeper," that is, the measurement program that repeatedly sleeps and measures how much extra latency there is upon awakening. In this case, there are three relatively distinct time scales that you may observe:
First, by understanding Linux's normal scheduler, predict what combinations of "niceness" (for the runners and the sleeper) will exhibit each of these three behaviors. You may find it easiest to stick with normal priority runners and only consider variations in the priority (niceness) of the sleeper. Test your predictions, and if they prove false, investigate until you have a consistent theoretical and empirical understanding.
Now modify the scheduler and, for each modified kernel, repeat the above process of prediction and experiment. See below for more on how to make a modified kernel. To get full credit for this lab assignment, you need to do both of the following:
counter
when
counter
values are recalculated.
I will issue each team three diskettes. Each diskette is MS-DOS formatted. You can put one of these disks in an already booted Linux machine, or equally well in one of the normal lab machines, and mount it using the command
mount /mnt/floppyAt this point, the diskette is available to you as the /mnt/floppy directory. When you are done reading or writing it, before you physically eject the diskette you should give the command
umount /mnt/floppyYou will need to mount this MS-DOS diskette in order to transport files back and forth between our experimental machines and the normal lab machines. You will need to do this in order to print files, maintain safe backup copies, etc. (Don't treat the hard drive of your experimental machine as a safe place to leave your work.) The experimental machines are not networked.
The other two diskettes for each team happen to also be MS-DOS formatted, but you won't use this feature. Instead, they are for you to make experimental, modified, kernel boot disks as described below.
If you ever want to turn your computer off, first give the
halt
command (while logged in as root).
./runner &you now have one runner process. If you repeat this, you can get two, or three, or whatever. But suppose you want 20? This would be a pain to type in by hand each time you wanted to try the experiment. So I present a shell script that will do the job for you. Put the six lines below into a file called doruns:
#!/bin/bash n=${1:?You need to specify how many.} while [ $n -gt 0 ]; do ./runner & n=`expr $n - 1` doneNow change the settings on that file so that it is executable, by giving the command
chmod +x dorunsNow you can give a command like
./doruns 20to get 20 runner processes going. (Of course, you can also use a number other than 20.)
When you want to get rid of all the runner processes, you can use the command
killall runner(This will work whether you started the processes manually or using the doruns script.)
/usr/src/linux-2.4/kernel/sched.c
, which is the
scheduler. (That annotated copy is linked here as PostScript and as text.)
Read it with an eye to the specific predictions and de-tuning changes
listed above. Your changes should be very small: just a handful of
characters. This is mostly a read-and-understand lab (with
experimental testing of your understanding, possibly resulting in
revised understanding).
Remember when reading the code that we are only concerned with
the scheduling policy SCHED_OTHER
. Also, you can ignore
all code that is specifically for SMP (symmetric multi-processor)
systems. In the annotated version of the code, I've indicated how to
tell which code is for SMP systems.
Note that the normal networked lab machines are running an older version of the Linux kernel than we are using on our experimental machines. Thus you will need to do your editing starting from the sched.c that is on the experimental machine, not the version on the normal networked lab machines.
To rebuild the kernel after editing the sched.c
file,
put a blank
disk in the floppy drive and, with your current directory being
/usr/src/linux-2.4
, give the command
make bzdiskImportant note: as stated above, your current directory should be
/usr/src/linux-2.4
, not
/usr/src/linux-2.4.2
, or making the kernel will take much longer.
To boot your new kernel, leave that disk in the drive and give the command
rebootRemember that kernel hacking is difficult, and debugging especially so. So don't feel bad if your modifications don't work at first. Be sure to consult with me.
Instructor: Max Hailperin