MCS-378 Lab 3: File Allocation, Fall 2015

Due: November 24, 2015

In this lab, you will experimentally determine how performance-critical the Linux kernel's choice of block groups is when creating new files and directories.

You should carry out Exploration Project 8.6 from the textbook (pages 390–391), except as modified below.

The book discusses the older ext2 filesystem and the then-current ext3 filesystem. I would suggest you work with the now-current ext4 filesystem. (In particular, your computer presumably has the main Ubuntu installation in an ext4 filesystem. Although your main working filesystem doesn't provide the most controlled, repeatable environment in which to experiment, it is adequate for the purposes of our course.)

The book calls for making two versions of the kernel, one of which is unmodified and one of which spreads newly created files more uniformly. Conducting the experiment would require rebooting repeatedly, switching back and forth between the two kernels. Instead, you can use the same techniques as in the previous lab to conduct your experiments under a single kernel that has selectable behavior. This time it would make sense to put your addition to kernel/sysctl.c into the fs_table array instead of vm_table, so that your control will show up in /proc/sys/fs/ instead of /proc/sys/vm/.
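As a rough sketch of how your experimental script might flip such a control between runs, consider the bash fragment below. The control name scatter_inodes and the meaning of its values (1 for scattered allocation, 0 for normal) are assumptions made only for illustration; use whatever name and values you actually chose in your fs_table entry. The set_mode helper is likewise hypothetical.

```shell
#!/bin/bash
# Assumed control name; replace with the procname you actually registered.
CONTROL=${CONTROL:-/proc/sys/fs/scatter_inodes}

# set_mode VALUE: write VALUE to the control, then read it back to confirm.
set_mode() {
    echo "$1" > "$CONTROL" || return 1
    [ "$(cat "$CONTROL")" = "$1" ]
}

# Only touch the control if it exists and is writable
# (that is, you are running the modified kernel as root).
if [ -w "$CONTROL" ]; then
    set_mode 1 && echo "scattered allocation selected"
    set_mode 0 && echo "normal allocation restored"
fi
```

Reading the value back after writing it is a cheap way to catch a misspelled control name or a missing sudo before you spend time collecting data under the wrong mode.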

The organization of the inode allocation procedures in fs/ext4/ialloc.c is somewhat different from what the book describes for the older fs/ext2/ialloc.c. The ext4_new_inode procedure contains an if statement that calls find_group_orlov for directories and find_group_other for other files. The find_group_orlov procedure in turn has another if statement that distinguishes top-level directories from other cases. Top-level directories are spread out into different block groups, whereas other directories and (even more so) other files are generally allocated near their parent directory, providing locality of disk access. Therefore, to see how much worse performance would be without this locality, it makes sense to introduce a special mode of operation that influences both of these if statements, so that all inode allocations are treated the way top-level directories ordinarily are treated.

You should use the same statistical techniques as in the previous lab and write a report of the same kind.

The Exploration Project asks you to explore whether extra time (assuming it exists) is spent running code or waiting for the disk drive. To do this, you will want to include among the changes you make in your experimental script a different setting for TIMEFORMAT. Changing this from %R to %R,%S,%U would cause the script to record not only the real time but also the system and user CPU times.
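The following minimal bash sketch shows the effect of that TIMEFORMAT setting. The time_command helper and the use of sleep as the timed command are placeholders of my own, not part of the script from the previous lab; substitute the actual workload you are timing.

```shell
#!/bin/bash
# Report real, system, and user times as one comma-separated line.
TIMEFORMAT="%R,%S,%U"

# time_command CMD...: hypothetical helper that runs CMD and prints only
# the timing line. bash's time keyword reports on the shell's stderr, so
# the command's own output is discarded and the group's stderr is captured.
time_command() {
    { time "$@" > /dev/null 2>&1 ; } 2>&1
}

line=$(time_command sleep 0.2)

# Split the three fields so they can be examined separately.
IFS=, read -r real sys user <<< "$line"
echo "real=$real sys=$sys user=$user"
```

If the real time grows between your two modes while the system and user times stay roughly the same, the extra time is being spent waiting for the disk rather than running code.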

Extra opportunities

You can earn extra grade points by investigating either of the following two questions, or feel free to suggest another of your own devising. I'd be glad to discuss how you might approach these.

Presumably you've found that ext4's ordinary allocation strategy provides better performance than scattering all files and directories. How much of this performance advantage can be realized by just placing ordinary files near their containing directories, while scattering the directories around? (This alternative is similar to the situation prior to the introduction of the Orlov allocator.)

The laptop I issued you includes a solid state drive (SSD). I didn't install Ubuntu onto it because its capacity is too limited to support our full range of course activities. However, you could install Ubuntu and your modified kernel onto it, boot off of it, and repeat your experiments. Locality of allocation seems unlikely to have any impact on SSD performance. Still, if you avoid experiments expected to show no effect, you'll never allow reality to surprise you with an unexpected effect. Is this one of those surprising learning opportunities, or are your statistics consistent with the expectation that locality doesn't matter on an SSD?

Submitting your report

Your "clear, concise scientific report" should be formatted into a single PDF file that incorporates such elements as a boxplot and an appendix of kernel changes. You should upload that file to Moodle.


Instructor: Max Hailperin