A file-system directory is an object that maps file names to nodes (i-nodes in UFS terminology). When given a file name, the directory must be able to tell whether it has that name or not and return the node number attached to it. File names are not stored in the nodes themselves as this allows for hard link creation flawlessly: you can have multiple directory entries pointing to the same file with no extra cost.

Let's see a very simple directory implementation, coming from NetBSD's tmpfs (see tmpfs.h). This file system implements directories by using a linked list (struct tmpfs_dir) of directory entries (struct tmpfs_dirent). These structures look like:
struct tmpfs_dirent {
TAILQ_ENTRY(tmpfs_dirent) td_entries;
uint16_t td_namelen;
char *td_name;
struct tmpfs_node *td_node;
};
TAILQ_HEAD(tmpfs_dir, tmpfs_dirent);
Of special interest are the td_name field, which holds the entry name (file name), and the td_node pointer, which points to the file system node related to this directory entry (UFS could use i-node numbers instead).

This implementation is really simple as it is completely backed by virtual memory; adding and removing entries is as easy as allocating a memory block and modifying the linked list accordingly. It could, of course, be more complex if it used a B+ tree or some other structure instead.

However, on-disk file systems do extra tasks to optimize directory accesses. For example, when an entry is removed, it is marked as deleted using a special flag but it is not really removed from disk, because shrinking the directory could be expensive. Similarly, new entries overwrite previously deleted entries, if possible.

In the next post, we will outline how some directory operations (such as lookup and readdir) work.

Post suggested by Pavel Cahyna.