MCS-378 Lab 3: File Allocation, Fall 2013

Due: November 26, 2013

In this lab, you will experimentally determine how performance-critical the Linux kernel's choice of block groups is when creating new files and directories.

You should carry out Exploration Project 8.6 from the textbook (pages 386-387), except as modified below.

The book discusses the older ext2 filesystem and the then-current ext3 filesystem. I suggest that you work with the now-current ext4 filesystem instead.

The book calls for building two versions of the kernel, one unmodified and one that spreads newly created files more uniformly. Conducting the experiment that way would require rebooting repeatedly to switch back and forth between the two kernels. Instead, you can use the same techniques as in the previous lab to conduct your experiments under a single kernel with selectable behavior. This time, it would make sense to put your addition to kernel/sysctl.c into the fs_table array instead of vm_table, so that your control shows up in /proc/sys/fs/ instead of /proc/sys/vm/.

The organization of the inode allocation procedures in fs/ext4/ialloc.c is somewhat different from what the book describes for the older fs/ext2/ialloc.c. The ext4_new_inode procedure contains an if statement that calls find_group_orlov for directories and find_group_other for other files. The find_group_orlov procedure in turn contains another if statement that distinguishes top-level directories from other directories. Top-level directories are spread out across different block groups, whereas other directories and (even more so) other files are generally allocated near their parent directory, providing locality of disk access. Therefore, to see how much worse performance would be without this locality, it makes sense to introduce a special mode of operation that influences both of these if statements, so that all inode allocations are treated the way top-level directories ordinarily are.

You should use the same statistical techniques as in the previous lab and write a report of the same kind.

You can decide whether you want to conduct your experiments in your main file system (which is partially full and has gone through a complicated history of having files created and deleted), or in a separate file system in another partition. If you use a separate file system, you can decide whether you want to use a freshly created empty file system or some standardized but non-freshly-created file system contents of your devising. Rather than choosing between a fresh file system and a used one, you could also compare the two.

The Exploration Project asks you to explore whether the extra time (assuming it exists) is spent running your code or waiting for the disk drive. To do this, you will likely want to change your experimental script's setting of TIMEFORMAT. For example, changing it from %R to %R,%S,%U would cause you to record not only the real (elapsed) time but also the system and user CPU times.


Instructor: Max Hailperin