art with code

2008-09-29

Basics of I/O

Files and pipes


The kernel wraps your hardware devices into two basic kinds: pipes and files. A pipe is a lazy FIFO list, a file is a lazy array (well, more like a rope; a list of arrays.) A file can be split into two arrays, a read-only array, and a write-only array. Pipes are either read-only or write-only. A socket consists of a read-only pipe and a write-only pipe.

Things like input devices and network devices are modeled as pipes, while things like RAM and hard drives are modeled as files. Graphics hardware can be either thought of as a pipe or a file, depending on whether it's treated as a computing node that executes drawing instructions or as an editable frame buffer.

The decision on whether to represent a device as a pipe or a file lies in the access patterns possible for the device. If you can access the nth byte of the device without accessing all the preceding bytes, it's a file. If you need to access the bytes from 0 to n-1 in order to access the nth byte, it's a pipe.

If you only do sequential access, you can treat files as pipes. If you want to do random access, you need to treat files as arrays. The easiest way to achieve lazy array semantics for a file is to mmap it.

Data structures


Every file and pipe is a serialized data structure. To write a file, you need to convert your data structure into an array of bytes. To read a file, you need to convert an array of bytes into the corresponding data structure. With pipes, you have a FIFO list of bytes instead of an array.

Most file and especially pipe data structures are lists of structs with a possible header. For example, /etc/passwd is a headerless newline-separated list of structs. MP3 files are lists of MP3 frames, with an optional ID3 metadata header. PNG files have a PNG header, followed by a list of PNG chunks. JPEG files are lists of JPEG markers. UTF-encoded text files are lists of UTF characters with an optional BOM header.

Some file formats encode tree data structures. HTML, XML and JSON are examples of unindexed trees. XML has a header (the "<?xml..."-bit), HTML has an optional one ("<!DOCTYPE..."), and JSON doesn't. SQLite files are indexed trees, the index enabling random access to the tree.

An array-like (i.e. every struct is the same size) data structure also makes it possible to do random access to a file. Examples of array-like file formats are ASCII encoded text, UTF-16 encoded text, PNM images and uncompressed TGAs. Array-like files also make it very easy to do in-place editing.

Serializing data structures


As every data structure is manifest in memory, it is possible to turn a data structure into a continuous array of bytes with no external references. This conversion is called serialization.

To prove that it is possible for any data structure, consider a program with a data structure in memory. Now reallocate the memory used by the program so that the parts not associated with the data structure are in a block of memory separate from the data structure. Then move all data structure blocks after each other and substract the address of the start of the data structure from all the pointers in the data structure, making them relative pointers. You now have a serialized version of the data structure.

Converting the serialized data structure into one that the program can work with is called deserialization. For the above serialization scheme, deserialization is easy: load the serialized data into memory and add the address of the start of the data to every pointer in the data structure.

To make the serialized data structure more portable, you can write a header that describes the type signature of the data structure, e.g. "array of uint32_le". To make the data structure detect errors, you can add metadata for detecting errors: the length of the data and CRC32 or a cryptographic hash (e.g. MD5.)

Using a compression format that already does error checking is likely the best way: not only do you save yourself a headache, compression also reduces the amount of IO required. You can do around 50 cycles of "free" computation per every byte streamed from disk. For a random access at 10ms, the number of cycles is around 50 million. If the compressor takes less than 50 cycles per byte, it is possible to write a compressed file faster than an uncompressed file, and similarly for decompression and reading.

On file access costs


As you need at least one random access to read a file, any file smaller than 1MB takes more time for the seek than the read. Flash drives have 0.1ms seek times, which makes the random access cost "only" a half million cycles, making the 50:50 point lie at around 10kB.

In other words, with a disk, if your file is smaller than 1MB, it's cheaper to read the whole file than to do seeking reads. With flash, reading the whole file is cheaper only when the file is smaller than 10kB.

No comments:

Blog Archive