Next: 3. Implementation Up: A Stackable File System Previous: 1. Introduction

2. Design

The design of Wrapfs concentrated on the following:

1. Simplifying the developer API so that it addresses most of the needs of users developing file systems using Wrapfs.
2. Adding a stackable vnode interface to Linux with minimal changes to the kernel, and with no changes to other file systems.
3. Keeping the performance overhead of Wrapfs as low as possible.

The first two points are discussed below. Performance is addressed in Section 5.

2.1 Developer API

There are three parts of a file system that developers wish to manipulate: file data, file names, and file attributes. Of those, data and names are the most important and also the hardest to handle. File data is difficult to manipulate because many different functions use it, such as read, write, and the memory-mapping (MMAP) functions; these functions manipulate files of different sizes at different offsets. File names are complicated to use not just because many functions use them, but also because the directory reading function, readdir, is a restartable function.

We created four functions that Wrapfs developers can use. These four functions address the manipulation of file data and file names:

1. encode_data: takes a buffer of 4KB or 8KB size (typical page size), and returns another buffer. The returned buffer has the encoded data of the incoming buffer. For example, an encryption file system can encrypt the incoming data into the outgoing data buffer. This function also returns a status code indicating any possible error (negative integer) or the number of bytes successfully encoded.
2. decode_data: is the inverse function of encode_data and otherwise has the same behavior. An encryption file system, for example, can use this to decrypt a block of data.
3. encode_filename: takes a file name string as input and returns a newly allocated and encoded file name of any length. It also returns a status code indicating either an error (negative integer) or the number of bytes in the new string. For example, a file system that converts between Unix and MS-DOS file names can use this function to encode long mixed-case Unix file names into short 8.3-format upper-case names used in MS-DOS.
4. decode_filename: is the inverse function of encode_filename and otherwise has the same behavior.

With the above functions available, file system developers that use Wrapfs as a template can implement most of the desired functionality of their file system in a few places and not have to worry about the rest.
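As an illustration of the four functions described above, the following user-space C sketch shows one possible shape for them. The signatures, the SKETCH_KEY constant, and the choice of XOR and ROT13 as encodings are assumptions made for this example only; the text specifies just the inputs, the outputs, and the status-code convention (negative on error, byte count on success).

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_KEY 0x5A /* hypothetical fixed key: a toy, not real encryption */

/* Encode one page-sized buffer. Returns the number of bytes encoded,
 * or a negative errno value on error. */
int encode_data(const unsigned char *in, unsigned char *out, int len)
{
    if (in == NULL || out == NULL || len < 0)
        return -EINVAL;
    for (int i = 0; i < len; i++)
        out[i] = in[i] ^ SKETCH_KEY;
    return len;
}

/* XOR with a fixed key is its own inverse, so decoding reuses the encoder. */
int decode_data(const unsigned char *in, unsigned char *out, int len)
{
    return encode_data(in, out, len);
}

/* Rotate an ASCII letter by 13 places; leave other characters unchanged. */
static char rot13(char c)
{
    if (isalpha((unsigned char)c)) {
        char base = isupper((unsigned char)c) ? 'A' : 'a';
        return (char)(base + (c - base + 13) % 26);
    }
    return c;
}

/* Encode a file name into a newly allocated string stored in *out.
 * Returns the length of the new name, or a negative errno value.
 * The caller is responsible for freeing *out. */
int encode_filename(const char *name, char **out)
{
    if (name == NULL || out == NULL)
        return -EINVAL;
    size_t len = strlen(name);
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return -ENOMEM;
    for (size_t i = 0; i < len; i++)
        buf[i] = rot13(name[i]);
    buf[len] = '\0';
    *out = buf;
    return (int)len;
}

/* ROT13 is also self-inverse, so decoding reuses the encoder. */
int decode_filename(const char *name, char **out)
{
    return encode_filename(name, out);
}
```

In this sketch the encoded name happens to have the same length as the input; the API as described allows the encoded name to be of any length, which is why the new length is returned separately.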

File system developers may also manipulate file attributes such as ownership and modes. For example, a simple intrusion avoidance file system can prevent setting the setuid bit on any root-owned executables. Such a file system can declare certain important and seldom changing binaries (such as /bin/login) as immutable, to prevent a potential attacker from replacing them with trojans, and may even require an authentication key to modify them. Inspecting or changing file attributes in Linux is easy, as they are trivially available by dereferencing the inode structure's fields. Therefore, we decided not to create a special API for manipulating attributes, so as not to hinder performance for something that is easily accessible.
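The setuid policy mentioned above could be expressed as a direct check on inode fields. The sketch below is a user-space approximation: struct sketch_inode and sketch_check_chmod are hypothetical names, and only the two fields needed for the check are modeled (the field names i_mode and i_uid follow the Linux inode, but nothing else about the kernel structure is assumed).

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for the inode fields this policy inspects. */
struct sketch_inode {
    mode_t i_mode; /* file type and permission bits */
    uid_t  i_uid;  /* owner */
};

/* Refuse a mode change that would turn on the setuid bit for a
 * root-owned file that is executable by anyone.
 * Returns 0 if the change is allowed, -EPERM otherwise. */
int sketch_check_chmod(const struct sketch_inode *inode, mode_t new_mode)
{
    int root_owned = (inode->i_uid == 0);
    int executable = (new_mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;
    int adds_setuid = (new_mode & S_ISUID) && !(inode->i_mode & S_ISUID);

    if (root_owned && executable && adds_setuid)
        return -EPERM;
    return 0;
}
```

Because the check reads the fields directly, no extra API layer sits between the policy and the attributes, which is the trade-off the paragraph above argues for.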

2.2 Kernel Issues

Without stackable file system support, the divisions between file system specific code and the more general (upper) code are relatively clear, as depicted in Figure 2.

**Figure 2:** Normal File System Boundaries

When a stackable file system such as Wrapfs is added to the kernel, these boundaries are obscured, as seen in Figure 3.

**Figure 3:** File System Boundaries with Wrapfs

Wrapfs assumes a dual responsibility: it must appear to the layer above it (upper-1) as a native file system (lower-2), and at the same time it must treat the lower level native file system (lower-1) as a generic vnode layer (upper-2).

This dual role presents a serious challenge to the design of Wrapfs. The file system boundary as depicted in Figure 2 does not divide the file system code into two completely independent sections. A lot of state is exchanged and assumed by both the generic (upper) code and native (lower) file systems. These two parts must agree on who allocates and frees memory buffers, who creates and releases locks, who increases and decreases reference counts of various objects, and so on. This coordinated effort between the upper and lower halves of the file system must be perfectly maintained by Wrapfs in its interaction with them.

2.2.1 Call Sequence and Existence

The Linux vnode interface contains several classes of functions:

mandatory: these are functions that must be implemented by each file system. An example is the read_inode superblock operation, which initializes a newly created inode by reading its fields from the mounted file system.
semi-optional: functions that must either be implemented specifically by the file system, or set to use a generic version offered for all common file systems. For example, the read file operation can be implemented by the specific file system, or it can be set to a general purpose read function called generic_file_read which offers read functionality for file systems that use the page cache.
optional: functions that can be safely left unimplemented. For example, the inode readlink function is necessary only for file systems that support symbolic links.
dependent: these are functions whose implementation or existence depends on other functions. For example, if the file operation read is implemented using generic_file_read, then the inode operation readpage must also be implemented. In this case, all reading in that file system is performed using the MMAP interface.

Wrapfs was designed to accurately reproduce the aforementioned call sequence and existence checking of the various classes of file system functions.
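One way to picture existence checking is a wrapper that tests whether the lower file system supplies an operation before calling it. The sketch below is a simplified user-space model: the sketch_file and sketch_file_ops types, the wrap_ and lower_ function names, and the choice of -EINVAL for a missing operation are all assumptions of this example, not the kernel's definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct sketch_file;

/* A reduced operations table; a NULL entry means "not implemented". */
struct sketch_file_ops {
    long (*read)(struct sketch_file *f, char *buf, size_t len);
    int  (*readlink)(struct sketch_file *f, char *buf, size_t len);
};

struct sketch_file {
    const struct sketch_file_ops *ops;
    void *private_data; /* wrapper: the lower file; lower: its contents */
};

/* Lower file system: implements read, leaves readlink unimplemented. */
long lower_read(struct sketch_file *f, char *buf, size_t len)
{
    const char *src = f->private_data;
    size_t n = strlen(src);
    if (n > len)
        n = len;
    memcpy(buf, src, n);
    return (long)n;
}

const struct sketch_file_ops lower_ops = {
    .read = lower_read,
    .readlink = NULL,
};

/* Wrapper: call the lower read only if the lower layer provides one. */
long wrap_read(struct sketch_file *f, char *buf, size_t len)
{
    struct sketch_file *lower = f->private_data;
    if (lower->ops == NULL || lower->ops->read == NULL)
        return -EINVAL;
    return lower->ops->read(lower, buf, len);
}

/* Wrapper: an optional operation missing below is reported, not called. */
int wrap_readlink(struct sketch_file *f, char *buf, size_t len)
{
    struct sketch_file *lower = f->private_data;
    if (lower->ops == NULL || lower->ops->readlink == NULL)
        return -EINVAL;
    return lower->ops->readlink(lower, buf, len);
}
```

The same pattern would apply to the other classes listed above; a dependent operation would additionally check for the operation it relies on.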

2.2.2 Data Structures

There are five primary data structures that are used in Linux file systems:

1. super_block: represents an instance of a mounted file system (also known as struct vfs in BSD).
2. inode: represents a file object in memory (also known as struct vnode in BSD).
3. dentry: represents an inode that is cached in the Directory Cache (dcache) and also includes its name. This structure is extended in Linux 2.1, and combines several older facilities that existed in Linux 2.0. A dentry is an abstraction that is higher than an inode. A negative dentry is one which does not (yet) contain a valid inode; otherwise, the dentry contains a pointer to its corresponding inode.
4. file: represents an open file or directory object that is in use by a process. A file is an abstraction that is one level higher than the dentry. The file structure contains a valid pointer to a dentry.
5. vm_area_struct: represents custom per-process virtual memory manager page-fault handlers.

The key point that enables stacking is that each of the major data structures used in the file system contains a field into which file system specific data can be stored. Wrapfs uses that private field to store several pieces of information, especially a pointer to the corresponding lower level file system's object.

**Figure 4:** Connections Between Wrapfs and the Stacked-on File System

Figure 4 shows the connections between some objects in Wrapfs and their corresponding objects in the stacked-on file system, as well as the regular connections between the objects within the same layer. When a file system operation in Wrapfs is called, it finds the corresponding lower level's object from the current one, and repeats the same operation on the lower object.
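The two kinds of links can be modeled with a few small structures. In the sketch below, the sk_ types and functions are hypothetical simplifications: each structure carries only its within-layer pointer and a private_data field, which the wrapper layer is assumed to use solely for the pointer to the lower object (the text notes that Wrapfs stores other information there as well).

```c
#include <stddef.h>

/* Reduced models of four of the structures listed above. */
struct sk_super  { void *private_data; };
struct sk_inode  { struct sk_super *sb; void *private_data; };
struct sk_dentry { const char *name; struct sk_inode *inode; void *private_data; };
struct sk_file   { struct sk_dentry *dentry; void *private_data; };

/* Links between layers: from a wrapper object to its lower object. */
struct sk_file *sk_lower_file(const struct sk_file *f)
{
    return f->private_data;
}

struct sk_dentry *sk_lower_dentry(const struct sk_dentry *d)
{
    return d->private_data;
}

/* An ordinary operation that follows a within-layer link:
 * the name recorded in the file's dentry. */
const char *sk_file_name(const struct sk_file *f)
{
    return f->dentry->name;
}

/* The wrapper's version of the operation: find the lower object
 * through the private field, then repeat the operation on it. */
const char *sk_wrap_file_name(const struct sk_file *upper)
{
    return sk_file_name(sk_lower_file(upper));
}
```

Reaching the lower dentry either from the upper dentry's private field or through the lower file should give the same object; keeping those paths consistent is part of what the wrapper has to maintain.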

Figure 4 also suggests one additional complication that Wrapfs must deal with carefully: reference counts. Whenever more than one file system object refers to a single instance of another object, Linux employs a traditional reference counter in the referred-to object (possibly with a corresponding mutex lock variable to guarantee atomic updates to the reference counter). Within a single file system layer, each of the file, dentry, and inode objects for the same file will have a reference count of one. With Wrapfs in place, however, the dentry and inode objects of the lower level file system must have a reference count of two, since there are two distinct objects referring to each. These additional pointers between objects are ironically necessary to keep Wrapfs as independent from other layers as possible. The horizontal arrows in Figure 4 represent links that are part of the Linux file system interface and cannot be avoided. The vertical arrows represent those that are necessary for stacking. The higher reference counts ensure that the lower level file system and its objects cannot disappear and leave Wrapfs's objects pointing to invalid objects.
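The effect on reference counts can be sketched in a few lines. The code below is a single-threaded user-space model, so it omits the locking mentioned above; struct sketch_inode and the sketch_interpose, sketch_lower, and sketch_release functions are hypothetical names for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced inode model: a reference count and a private field. */
struct sketch_inode {
    int   i_count;   /* number of objects referring to this inode */
    void *i_private; /* in the wrapper layer: the lower inode */
};

/* Create a wrapper inode on top of 'lower'. The wrapper's pointer is a
 * new reference to 'lower', so the lower count is incremented. */
struct sketch_inode *sketch_interpose(struct sketch_inode *lower)
{
    struct sketch_inode *upper = malloc(sizeof(*upper));
    if (upper == NULL)
        return NULL;
    lower->i_count++;
    upper->i_count = 1;
    upper->i_private = lower;
    return upper;
}

/* Follow the vertical link from a wrapper inode to the lower inode. */
struct sketch_inode *sketch_lower(const struct sketch_inode *upper)
{
    return upper->i_private;
}

/* Drop one reference to the wrapper inode. When the last reference
 * goes away, release the wrapper's reference on the lower inode too. */
void sketch_release(struct sketch_inode *upper)
{
    struct sketch_inode *lower = upper->i_private;
    assert(upper->i_count > 0 && lower->i_count > 0);
    if (--upper->i_count == 0) {
        lower->i_count--;
        free(upper);
    }
}
```

While the wrapper inode exists, the lower inode's count stays at two in this model, so the lower layer never sees it drop to zero underneath the wrapper.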

2.2.3 Caching

Wrapfs keeps independent copies of its own data structures and objects. For example, each dentry contains the component name of the file it represents. (In an encryption file system, for example, the upper dentry will contain the cleartext name while the lower dentry contains the ciphertext name.) We pursued this independence and designed Wrapfs to be as separate as possible from the file system layers above and below it. This means that Wrapfs keeps its own copies of cached objects, reference counts, and memory mapped pages, allocating and freeing these as necessary.

Such a design not only promotes greater independence, but also improves performance, as data is served from a cache at the top of the stack. However, cache incoherency could result if pages at different layers are modified independently[7]. We therefore decided that higher layers would be more authoritative. For example, when writing to disk, cached pages for the same file in Wrapfs overwrite their EXT2 counterparts. This policy correlates with the most common case of cache access, through the uppermost layer.
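The "higher layer is authoritative" policy can be illustrated with a toy write-back routine. This is a sketch under simplifying assumptions: struct sk_page, SK_PAGE_SIZE, and sk_writeback are hypothetical, pages are plain byte arrays with a dirty flag, and the copy stands in for whatever encoding the stacked layer would apply on the way down.

```c
#include <string.h>

#define SK_PAGE_SIZE 4096 /* assumed page size for this sketch */

/* Reduced model of a cached page. */
struct sk_page {
    unsigned char data[SK_PAGE_SIZE];
    int dirty; /* nonzero if modified since the last write-back */
};

/* Write a dirty upper-layer page down to the lower layer's cached page.
 * The upper copy always wins: whatever the lower page held, even if it
 * was itself dirty, is overwritten.
 * Returns 1 if data was written down, 0 if the upper page was clean. */
int sk_writeback(struct sk_page *upper, struct sk_page *lower)
{
    if (!upper->dirty)
        return 0;
    /* A stacked layer that transforms data would encode here. */
    memcpy(lower->data, upper->data, SK_PAGE_SIZE);
    lower->dirty = 1; /* the lower layer must now write it to disk */
    upper->dirty = 0;
    return 1;
}
```

A lower page modified independently of the upper layer is simply lost in this model, which is the incoherency case the policy resolves in favor of the uppermost layer.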

Erez Zadok
1999-03-29