
3. Implementation

Each of the five primary data structures used in the Linux VFS contains an operations vector describing all of the functions that can be applied to an instance of that data structure. We describe the implementation of these operations grouped not by the data structure they belong to, but by five implementation categories:

1. mounting and unmounting a file system
2. functions creating new objects
3. data manipulation functions
4. functions that use file names
5. miscellaneous functions

For conciseness, we describe the implementation of only the one or two most important functions in each category. Readers are referred to other documentation[2] for a description of the rest of the file system operations in Linux, and to the Wrapfs sources for their implementation.

There are two important auxiliary functions in Wrapfs. The first, interpose, takes a lower-level dentry and a Wrapfs dentry, and creates the links between them and their inodes. When done, the Wrapfs dentry is said to be interposed on top of the dentry for the lower-level file system. The interpose function also allocates a new Wrapfs inode, initializes it, and increases the reference counts of the dentries in use. The second, hidden_dentry, is the inverse of interpose: it retrieves the lower-level (hidden) dentry from a Wrapfs dentry. The hidden dentry is stored in the private data field of struct dentry.
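The relationship between the two helpers can be sketched in user-space C. This is only an illustrative model, not the Wrapfs source: the simplified structures and the field names d_private, i_private, and d_count are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified user-space stand-ins for the kernel structures. The field
 * names d_private, i_private and d_count are assumptions for this sketch. */
struct inode  { void *i_private; };
struct dentry { struct inode *d_inode; void *d_private; int d_count; };

/* Return the lower-level (hidden) dentry kept in the private data field. */
static struct dentry *hidden_dentry(const struct dentry *upper)
{
    return upper->d_private;
}

/* Link a Wrapfs dentry to a lower-level dentry. If the lower dentry has an
 * inode, also allocate a Wrapfs inode that points at it.
 * Returns 0 on success, -1 if the allocation fails. */
static int interpose(struct dentry *lower, struct dentry *upper)
{
    struct inode *wi = NULL;

    assert(lower != NULL && upper != NULL);

    if (lower->d_inode != NULL) {
        wi = calloc(1, sizeof(*wi));
        if (wi == NULL)
            return -1;
        wi->i_private = lower->d_inode;  /* Wrapfs inode -> hidden inode */
    }
    upper->d_inode = wi;                 /* stays NULL for a negative dentry */
    upper->d_private = lower;            /* Wrapfs dentry -> hidden dentry */
    lower->d_count++;                    /* both dentries are now in use */
    upper->d_count++;
    return 0;
}
```

After a successful interpose(lower, upper), hidden_dentry(upper) returns lower, which is the sense in which the two functions are inverses.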

3.1 Mounting and Unmounting

The function read_super performs all of the important actions that occur when mounting Wrapfs. It sets the operations vector of the superblock to Wrapfs's own, allocates a new root dentry (the root of the mounted file system), and finally calls interpose to link the root dentry to that of the mount point. This is vital for lookups since they are relative to a given directory (see Section 3.2). From that point on, every lookup within the Wrapfs file system will use Wrapfs's own operations.
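A minimal user-space sketch of these three mount-time steps follows. The structures are heavily simplified, the function name wrapfs_read_super and its signature are assumptions, and a single pointer assignment stands in for the call to interpose.

```c
#include <stdlib.h>

/* Simplified stand-ins for kernel structures (illustrative only). */
struct dentry { void *d_private; };
struct super_operations { int placeholder; };
struct super_block {
    const struct super_operations *s_op;
    struct dentry *s_root;
};

/* Wrapfs's superblock operations vector (contents omitted in this sketch). */
static const struct super_operations wrapfs_sops = { 0 };

/* Hypothetical mount routine: returns 0 on success, -1 on allocation failure. */
static int wrapfs_read_super(struct super_block *sb, struct dentry *mount_point)
{
    struct dentry *root = calloc(1, sizeof(*root));
    if (root == NULL)
        return -1;

    sb->s_op = &wrapfs_sops;        /* 1. install Wrapfs's operations vector */
    sb->s_root = root;              /* 2. new root dentry for this mount */
    root->d_private = mount_point;  /* 3. stands in for interpose() */
    return 0;
}
```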

3.2 Creating New Objects

Several inode functions result in the creation of new inodes and dentries: lookup, link, symlink, mkdir, and mknod. The lookup function is the most complex in this group because it also has to handle negative dentries (ones that do not yet contain valid inodes). Lookup is given a directory inode to look in, and a dentry (containing the pathname) to look for. It proceeds as follows:

1. encode the file name it was given using encode_filename and get a new one.
2. find the lower level (hidden) dentry from the Wrapfs dentry.
3. call Linux's primary lookup function, called lookup_dentry, to locate the encoded file name in the hidden dentry. Return a new dentry (or one found in the directory cache, dcache) upon success.
4. if the new dentry is negative, interpose it on top of the hidden dentry and return.
5. if the new dentry is not negative, interpose it and the inodes it refers to, as seen in Figure 4.
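The five steps above can be modeled in a short user-space sketch. Everything here is a simplification for illustration: rot13 is used as a placeholder for encode_filename, lower_lookup is a hypothetical stand-in for lookup_dentry that searches an in-memory array, and the caller supplies storage for the negative dentry and the new Wrapfs inode so that no allocator is needed.

```c
#include <stddef.h>
#include <string.h>

#define NAME_LEN 64

struct inode  { void *i_private; };
struct dentry {
    char name[NAME_LEN];
    struct inode *d_inode;     /* NULL marks a negative dentry */
    void *d_private;           /* hidden dentry (Wrapfs dentries only) */
    struct dentry *children;   /* lower directories: array of entries */
    size_t nchildren;
};

/* Placeholder encoding (rot13); the real transform is file-system specific. */
static void encode_filename(const char *in, char *out, size_t outlen)
{
    size_t i;
    for (i = 0; in[i] != '\0' && i + 1 < outlen; i++) {
        char c = in[i];
        if (c >= 'a' && c <= 'z')      c = (char)('a' + (c - 'a' + 13) % 26);
        else if (c >= 'A' && c <= 'Z') c = (char)('A' + (c - 'A' + 13) % 26);
        out[i] = c;
    }
    out[i] = '\0';
}

/* Hypothetical stand-in for lookup_dentry: search a lower directory by name.
 * On a miss, fill in and return the caller's negative dentry. */
static struct dentry *lower_lookup(struct dentry *dir, const char *name,
                                   struct dentry *negative)
{
    size_t i;
    for (i = 0; i < dir->nchildren; i++)
        if (strcmp(dir->children[i].name, name) == 0)
            return &dir->children[i];
    strncpy(negative->name, name, NAME_LEN - 1);
    negative->name[NAME_LEN - 1] = '\0';
    negative->d_inode = NULL;
    return negative;
}

/* Returns 1 if the name resolved to a positive dentry, 0 if negative. */
static int wrapfs_lookup(struct dentry *dir, struct dentry *upper,
                         struct dentry *negative, struct inode *new_inode)
{
    char encoded[NAME_LEN];
    struct dentry *hidden_dir, *lower;

    encode_filename(upper->name, encoded, sizeof(encoded)); /* step 1 */
    hidden_dir = dir->d_private;                            /* step 2 */
    lower = lower_lookup(hidden_dir, encoded, negative);    /* step 3 */

    upper->d_private = lower;          /* steps 4-5: interpose the dentries */
    if (lower->d_inode == NULL) {      /* step 4: negative, no inode to link */
        upper->d_inode = NULL;
        return 0;
    }
    new_inode->i_private = lower->d_inode;   /* step 5: link the inodes too */
    upper->d_inode = new_inode;
    return 1;
}
```

In this model a negative result still leaves the Wrapfs dentry linked to a lower dentry, which mirrors step 4: only the inode link is skipped.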

3.3 Data Manipulation

File data can be manipulated in one of two ways: (1) the traditional read and write interface can be used to read or write any number of bytes starting at any given offset in a file, and (2) the MMAP interface can be used to map pages of files into a process that can use them as normal data buffers. The MMAP interface can manipulate only whole pages and on page boundaries. Since MMAP support is vital for executing binaries, we decided to manipulate data in Wrapfs in whole pages.

Reading data turned out to be easy. We set the file read function to the general purpose generic_file_read function, and were subsequently required to implement only our version of the readpage inode operation. Readpage is asked to retrieve one page in a given opened file. Our implementation looks for a page with the same offset in the hidden file. If it cannot find one, Wrapfs's readpage allocates a new one. It proceeds by calling the lower file system's readpage function to get the page's data, and then it decodes the data from the hidden page into the Wrapfs page. Finally, Wrapfs's readpage function mimics some of the functionality that generic_file_read performs: it unlocks the page, marks it as referenced, and wakes up anyone who might be waiting for that page.
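The readpage flow described above can be sketched as follows. This is a user-space model under stated assumptions: the lower file is a flat byte array, decode_page is a placeholder XOR transform rather than any real Wrapfs decoding, the page-cache search for an existing hidden page is omitted (a temporary hidden page is always used), and waking up waiters is left out.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096
#define DECODE_KEY 0x5A   /* placeholder key for the toy decoding below */

struct page {
    unsigned char data[PAGE_BYTES];
    int locked;
    int referenced;
};

/* Hypothetical lower file: a flat byte array read one page at a time. */
struct lower_file { const unsigned char *bytes; size_t size; };

/* Stand-in for the lower file system's readpage: copy one page of the
 * file, zero-filling anything past end of file. */
static void lower_readpage(const struct lower_file *f, size_t index,
                           struct page *p)
{
    size_t off = index * PAGE_BYTES;
    memset(p->data, 0, PAGE_BYTES);
    if (off < f->size) {
        size_t n = f->size - off;
        if (n > PAGE_BYTES)
            n = PAGE_BYTES;
        memcpy(p->data, f->bytes + off, n);
    }
}

/* Placeholder decoding: XOR every byte with a fixed key. */
static void decode_page(const struct page *hidden, struct page *out)
{
    size_t i;
    for (i = 0; i < PAGE_BYTES; i++)
        out->data[i] = hidden->data[i] ^ DECODE_KEY;
}

static void wrapfs_readpage(const struct lower_file *f, size_t index,
                            struct page *p)
{
    struct page hidden;               /* temporary hidden page */

    p->locked = 1;                    /* page is locked while being filled */
    lower_readpage(f, index, &hidden);    /* fetch data from the lower level */
    decode_page(&hidden, p);              /* decode into the Wrapfs page */
    p->locked = 0;                    /* unlock the page */
    p->referenced = 1;                /* mark it as referenced */
}
```

Working on whole pages keeps the same code path usable for both the read interface and MMAP, which is the design choice the section motivates.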

3.4 File Name Manipulation

As mentioned in Section 3.2, we call encode_filename in every file system function that is given a file name and has to pass it to the lower level file system, such as rmdir. There are only two places where file names are decoded: readlink needs to decode the target of a symlink after having read it from the lower level file system, and readdir needs to decode each file name read from a directory. Readdir is implemented in a similar fashion to other Linux file systems, by using a callback function called "filldir" that processes one file name at a time.
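One way to decode names during readdir is to wrap the caller's filldir callback in one that decodes each name before passing it on. The sketch below illustrates that idea only; whether Wrapfs wraps the callback in exactly this way is an assumption. The filldir_t signature is simplified, rot13 is a placeholder for the real decoding, and lower_readdir and collect_name are hypothetical helpers.

```c
#include <stddef.h>
#include <string.h>

/* Simplified callback type: receives one file name at a time and returns
 * nonzero to stop the directory walk. */
typedef int (*filldir_t)(void *ctx, const char *name);

/* Placeholder decoding (rot13 is its own inverse). */
static void decode_filename(const char *in, char *out, size_t outlen)
{
    size_t i;
    for (i = 0; in[i] != '\0' && i + 1 < outlen; i++) {
        char c = in[i];
        if (c >= 'a' && c <= 'z')      c = (char)('a' + (c - 'a' + 13) % 26);
        else if (c >= 'A' && c <= 'Z') c = (char)('A' + (c - 'A' + 13) % 26);
        out[i] = c;
    }
    out[i] = '\0';
}

/* Hypothetical lower readdir: feed each stored name to the callback. */
static int lower_readdir(const char *const *names, size_t n,
                         filldir_t fill, void *ctx)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (fill(ctx, names[i]) != 0)
            return 1;
    return 0;
}

/* Context that remembers the caller's callback while we interpose ours. */
struct wrapfs_fill_ctx { filldir_t user_fill; void *user_ctx; };

static int wrapfs_filldir(void *ctx, const char *encoded)
{
    struct wrapfs_fill_ctx *w = ctx;
    char decoded[256];
    decode_filename(encoded, decoded, sizeof(decoded));
    return w->user_fill(w->user_ctx, decoded);
}

static int wrapfs_readdir(const char *const *names, size_t n,
                          filldir_t user_fill, void *user_ctx)
{
    struct wrapfs_fill_ctx w = { user_fill, user_ctx };
    return lower_readdir(names, n, wrapfs_filldir, &w);
}

/* Example caller-side callback: collect up to 8 decoded names. */
struct name_list { char names[8][64]; size_t count; };

static int collect_name(void *ctx, const char *name)
{
    struct name_list *l = ctx;
    if (l->count >= 8)
        return 1;
    strncpy(l->names[l->count], name, 63);
    l->names[l->count][63] = '\0';
    l->count++;
    return 0;
}
```

A caller passes collect_name and a name_list to wrapfs_readdir and receives decoded names, never seeing the encoded ones stored by the lower level.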

3.5 Miscellaneous Functions

In Section 3.3 we described some MMAP functions that handle file data. Other than those, we had to implement three MMAP-related functions that are part of the vm_area_struct, but only for shared memory-mapped pages: vm_open, vm_close, and vm_shared_unmap. We implemented them to properly support multiple (shared) mappings to the same page. Shared pages have increased reference counts and they must be handled carefully (see Figure 4 and Section 2.2.2). The rest of the vm_area_struct functions were left implemented or unimplemented as defined by the generic operations vectors of this structure.

This implementation underscored the only change, albeit a crucial one, that we had to make to the Linux kernel. The data structure vm_area_struct is the only one (as of kernel 2.1.129) that does not contain a private data field in which we can store a link from our Wrapfs vm_area object to the hidden one of the lower level file system. Adding such a field was necessary to support stacking.¹
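To illustrate why a private data field matters for stacking, the sketch below models a vm_area object whose private field links to the hidden object, so that open and close can be forwarded to the lower level. It is a user-space toy: the field name vm_private_data, the mappings counter (standing in for page reference counts), and all function names are assumptions, and vm_shared_unmap is not modeled.

```c
#include <stddef.h>

struct vm_area_struct;

struct vm_operations_struct {
    void (*open)(struct vm_area_struct *);
    void (*close)(struct vm_area_struct *);
};

struct vm_area_struct {
    const struct vm_operations_struct *vm_ops;
    int mappings;            /* toy counter standing in for reference counts */
    void *vm_private_data;   /* private data field (name assumed) */
};

/* Lower-level file system's handlers in this toy model. */
static void lower_vm_open(struct vm_area_struct *v)  { v->mappings++; }
static void lower_vm_close(struct vm_area_struct *v) { v->mappings--; }

static const struct vm_operations_struct lower_vm_ops = {
    lower_vm_open, lower_vm_close
};

/* Wrapfs handlers: update our own count, then follow the private link to
 * the hidden vm_area and repeat the operation there. */
static void wrapfs_vm_open(struct vm_area_struct *v)
{
    struct vm_area_struct *hidden = v->vm_private_data;
    v->mappings++;
    if (hidden != NULL && hidden->vm_ops != NULL && hidden->vm_ops->open != NULL)
        hidden->vm_ops->open(hidden);
}

static void wrapfs_vm_close(struct vm_area_struct *v)
{
    struct vm_area_struct *hidden = v->vm_private_data;
    v->mappings--;
    if (hidden != NULL && hidden->vm_ops != NULL && hidden->vm_ops->close != NULL)
        hidden->vm_ops->close(hidden);
}

static const struct vm_operations_struct wrapfs_vm_ops = {
    wrapfs_vm_open, wrapfs_vm_close
};
```

Without the private link there would be no way to get from the Wrapfs object to the hidden one inside these handlers.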

All other functions that had to be implemented reproduce the functionality of the generic (upper) level vnode code (see Section 2.2) and follow a similar procedure: for each object passed to the function, they find the corresponding object in the lower level file system, and repeat the same operation on the lower level objects.
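The general pass-through procedure can be illustrated with a single operation. The sketch below uses a permission check as the example; the simplified structures, the i_private link, and the function names are assumptions for illustration, not the Wrapfs source.

```c
#include <stddef.h>

struct inode;

/* A one-entry operations vector, enough to show the pattern. */
struct inode_operations {
    int (*permission)(struct inode *inode, int mask);
};

struct inode {
    const struct inode_operations *i_op;
    int i_mode;         /* toy permission bits */
    void *i_private;    /* link to the hidden inode (Wrapfs inodes only) */
};

/* Lower-level implementation: grant access only if every requested bit
 * is set in the inode's mode. Returns 0 on success, -1 on denial. */
static int lower_permission(struct inode *inode, int mask)
{
    return (inode->i_mode & mask) == mask ? 0 : -1;
}

static const struct inode_operations lower_iops = { lower_permission };

/* Wrapfs version: find the corresponding lower-level object and repeat
 * the same operation on it. */
static int wrapfs_permission(struct inode *inode, int mask)
{
    struct inode *hidden = inode->i_private;
    if (hidden == NULL || hidden->i_op == NULL ||
        hidden->i_op->permission == NULL)
        return -1;
    return hidden->i_op->permission(hidden, mask);
}

static const struct inode_operations wrapfs_iops = { wrapfs_permission };
```

A function taking several objects would repeat the first step for each argument before making the lower-level call.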


Erez Zadok
1999-03-29