First of all, happy new year to everybody!

I recently got a MacBook Pro and, while this little machine is great overall, its 5400 RPM hard disk is a noticeable performance bottleneck. Many people I've talked to say that the difference between 5400 and 7200 RPM should not be noticeable because:
  • These 2.5-inch drives use perpendicular recording and hence store data at a higher bit density. This means that, theoretically, they can read and write data more quickly, achieving speeds similar to those of 7200 RPM drives.
  • Modern file systems prevent fragmentation, as described here for HFS+.
To me, these two reasons are valid as long as you manage large files: the file system will try to keep them physically close and the disk will be able to transfer sequential data fairly quickly.

But unfortunately, these ideas break down when you have to deal with thousands of tiny files (or when you flood the drive with requests from different applications, but this is not what I want to talk about today). The easiest way to demonstrate this is to use CVS to manage a copy of pkgsrc on such drives.
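To put a number on why tiny files hurt, here is a small sketch of the one cost that bit density cannot hide: the average rotational latency, that is, the wait for half a revolution before the wanted sector passes under the head. The function name is my own, and the calculation deliberately ignores seek time and caching, so take it as a lower bound per scattered access rather than a full model.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: the time of half a revolution."""
    seconds_per_revolution = 60.0 / rpm
    return seconds_per_revolution / 2.0 * 1000.0


for rpm in (5400, 7200):
    latency = avg_rotational_latency_ms(rpm)
    print(f"{rpm} RPM: {latency:.2f} ms average rotational latency")
```

A higher bit density speeds up the transfer once the head is in place, but it does nothing for this wait, which is paid again for every file that is not stored next to the previous one.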

Let's start by checking out a fresh copy of pkgsrc from the CVS repository. As long as the file system has a lot of free space (and has not been "polluted" by erased files), this will run quite fast because all new files will be stored physically close together (theoretically, in consecutive cylinders). Hence, we take advantage of both the higher bit density and the file system's allocation policy. Just after the checkout (or after unpacking a tarball of the tree), run an update (cvs -z3 -q update -dP) and write down how long it takes. In my tests, the update took around 5 minutes, which is a good figure; in fact, it is almost the same as what I got on my desktop machine with a 7200 RPM disk.
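If you want to repeat the measurement, a minimal sketch like the following can do the timing for you. The helper name time_command is made up for this example; it simply runs whatever command you give it and reports the wall-clock time, so nothing in it is specific to CVS.

```python
import subprocess
import time


def time_command(argv, cwd=None):
    """Run argv to completion and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    completed = subprocess.run(
        argv,
        cwd=cwd,
        stdout=subprocess.DEVNULL,  # discard the output; only the timing matters
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode, time.monotonic() - start
```

For the update described above, the call would look like time_command(["cvs", "-z3", "-q", "update", "-dP"], cwd="/path/to/pkgsrc"), where the path is a placeholder for wherever your tree lives. Run it once right after the checkout and once again after a few days of use to compare.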

Now start using pkgsrc by building a "big" package; I've been doing tests with mencoder, which has a bunch of dependencies, and boost, which installs a ton of files. The object files generated during the builds, as well as the installed files, will be physically stored "after" pkgsrc. It is likely that "holes" will appear on the disk, because you'll be removing the work directories but not the installed files, and later writes will land in those holes, leaving a lot of files stored non-contiguously. To make things worse, keep using your machine for a couple of days.

Then, do another update of the whole tree. In my tests, the process now takes around 10 minutes. Yes, that is double the original measurement. This problem was also present with faster disks, but it was not as noticeable. But do we have to blame the drive for such a slowdown or maybe, just maybe, is it CVS's fault?

The pkgsrc repository contains lots of empty directories that were once populated, and CVS does not handle such entries very well. During an update, CVS recreates these empty directories locally and, at the end of the process, erases them provided that you passed the -P (prune) option. Furthermore, every such directory ends up consuming at least 5 inodes on the local disk: one for the directory itself, one for the CVS control directory inside it, and one for each of the 3 tiny files that the control directory typically stores. This continuous creation and deletion of directories and files fragments the original tree by spreading the updated files all around.
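To get a feeling for how much churn this causes, the following sketch counts the directories in a working copy that hold nothing but a CVS control directory, and adds up the inodes they occupy. It assumes a tree that was updated without -P, so that the empty directories are still on disk; the function name and the counting rule (2 inodes plus one per control file) are my own reading of the description above, not something CVS reports.

```python
import os


def count_cvs_only_dirs(root):
    """Return (directories, inodes) for directories that contain only a CVS/ subdirectory."""
    dirs = 0
    inodes = 0
    for path, subdirs, files in os.walk(root):
        if os.path.basename(path) == "CVS":
            subdirs[:] = []  # never descend below a control directory
            continue
        if subdirs == ["CVS"] and not files:
            control = os.path.join(path, "CVS")
            dirs += 1
            # The directory itself, its CVS/ directory, and each control file.
            inodes += 2 + len(os.listdir(control))
    return dirs, inodes
```

Calling count_cvs_only_dirs on the top of the tree gives a rough idea of how many short-lived directories and files every pruning update creates and then removes again.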

Honestly, I don't know why CVS works like this (anyone?), but I bet that switching to a superior VCS could mitigate this problem. A temporary solution is to use disk images, each holding a single source tree and kept as small as possible. This way one can expect each image to stay in a contiguous area of the disk.

Oh, and by the way: Boot Camp really suffers from the slow drive because it creates the Windows partition at the end of the disk; that is, its inner part, which typically has slower access times. (Well, I'm not sure if it'd make any difference if the partition was created at the beginning.) Launching a game such as Half-Life 2 takes forever; fortunately, when it is up it is fast enough.

Update (January 9th): As "r." kindly points out, the slower part of the disk is the inner one, not the outer one as I had previously written (a slip on my part, because CDs are written the other way around). And the reason is this: current disks use Zone Bit Recording (ZBR), a technique that fits a different number of sectors on each track depending on the track's length. Hence, outer (longer) tracks have more sectors allocated to them and can transfer more data in a single disk rotation.
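The effect of ZBR on throughput is simple arithmetic: the data read in one revolution times the number of revolutions per second. Here is a sketch of it; the sector counts used in the example are invented for illustration and do not come from any real drive's data sheet, and the 512-byte sector size is an assumption too.

```python
def track_transfer_rate(sectors_per_track, rpm, bytes_per_sector=512):
    """Peak sequential transfer rate of one track, in bytes per second."""
    revolutions_per_second = rpm / 60.0
    return sectors_per_track * bytes_per_sector * revolutions_per_second


# Hypothetical zone layout: these sector counts are made up for the example.
outer = track_transfer_rate(sectors_per_track=900, rpm=5400)
inner = track_transfer_rate(sectors_per_track=500, rpm=5400)
print(f"outer zone: {outer / 1e6:.1f} MB/s, inner zone: {inner / 1e6:.1f} MB/s")
```

Whatever the real numbers are for a given drive, the ratio between the two zones is just the ratio of their sector counts, because the rotation speed is the same everywhere on the platter.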


Comments from the original Blogger-hosted post: