Content Warning: this post contains blatant self-promotion.
Contributions to engineering fields can only reasonably be assessed in hindsight, by looking at how they survived exposure to the real world over the long term. Four of my contributions to various systems have stood the test of time. Below the fold, I blow my own horn four times.
Four Decades
X11R1 on a Sun/1
All this while I was also working on a competitor, Sun's NeWS — which didn't survive the test of time.
Nearly Three-and-a-Half Decades
One of the things I really enjoyed about working on NeWS was that the PostScript environment it implemented was object-oriented, a legacy of PostScript's origins at Xerox PARC. Owen Densmore and I developed A User‐Interface Toolkit in Object‐Oriented PostScript that made developing NeWS applications very easy, provided you were comfortable with an object-oriented programming paradigm.

I think it was sometime in 1988 while working on the SunOS 4.0 kernel that I realized that the BSD Vnode interface was in a loose sense object-oriented. It defines the interface between the file system and the rest of the kernel. An instance of BSD's type vnode consisted of some instance data and a pointer to an "ops vector" that defined its class via an array of methods (function pointers). But it wasn't object-oriented enough to, for example, implement inheritance properly.
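In today's C the analogy looks something like the sketch below. The declarations are a simplified illustration of the idea, not the actual SunOS or BSD ones:

```c
/* A minimal sketch, loosely modeled on the vnode interface.  Names and
 * fields are simplified illustrations, not the real SunOS/BSD declarations. */
#include <stddef.h>
#include <sys/types.h>

struct vnode;

struct vnodeops {                 /* the "class": a vector of methods */
    int (*vop_open)(struct vnode *vp, int flags);
    int (*vop_read)(struct vnode *vp, void *buf, size_t len, off_t off);
    int (*vop_write)(struct vnode *vp, const void *buf, size_t len, off_t off);
    /* ...lookup, getattr and the rest of the interface... */
};

struct vnode {                    /* the "instance" */
    const struct vnodeops *v_op;  /* pointer to the ops vector */
    void *v_data;                 /* file-system-private instance data */
};

/* The rest of the kernel invokes a method through the ops vector without
 * knowing which file system is behind it. */
static inline int VOP_READ(struct vnode *vp, void *buf, size_t len, off_t off)
{
    return vp->v_op->vop_read(vp, buf, len, off);
}
```

What the interface lacked was any way for one file system's ops vector to be derived from, or defer to, another's.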
This flaw had led to some inelegancies as the interface evolved over time, but what interested me more was the potential applications that would be unleashed if the interface could be made properly object-oriented. Instead of being implemented from scratch, file systems could be implemented by sub-classing other file systems. For example, a read-only file system such as a CD-ROM could be made writable by "stacking" a cache file system on top, as shown in Figure 11. I immediately saw the possibility of significant improvements in system administration that could flow from stacking file systems.
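To make the stacking idea concrete, here is a guess at what the read path of such a caching layer might look like, reusing the illustrative declarations from the sketch above. This is not the code from the paper; in particular, a cache miss is crudely modeled as an error return from the front file system:

```c
/* Sketch of a stacking cache layer's read method, built on the illustrative
 * struct vnode/vnodeops declarations above.  Not the paper's actual code. */
struct cnode {                 /* the cache layer's private instance data */
    struct vnode *front;       /* writable cache, e.g. a local ufs file */
    struct vnode *back;        /* read-only source, e.g. an NFS or CD-ROM file */
};

static int cache_read(struct vnode *vp, void *buf, size_t len, off_t off)
{
    struct cnode *cp = vp->v_data;

    /* Try the cache first.  The layer sees both file systems only through
     * their opaque vnodes and ops vectors. */
    if (cp->front->v_op->vop_read(cp->front, buf, len, off) == 0)
        return 0;

    /* Miss: read from the backing file system and populate the cache. */
    int err = cp->back->v_op->vop_read(cp->back, buf, len, off);
    if (err == 0)
        (void)cp->front->v_op->vop_write(cp->front, buf, len, off);
    return err;
}
```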
Evolving the Vnode Interface: Fig. 11
This simple module can use any file system as a file-level cache for any other (read-only) file system. It has no knowledge of the file systems it is using; it sees them only via their opaque vnodes. Figure 11 shows it using a local writable ufs file system to cache a remote read-only NFS file system, thereby reducing the load on the server. Another possible configuration would be to use a local writable ufs file system to cache a CD-ROM, obscuring the speed penalty of CD.

Over the next quarter-century the idea of stacking vnodes and the related idea of "union mounts" from Rob Pike and Plan 9 churned around until, in October 2014, Linus Torvalds added overlayfs to the 3.18 kernel. I covered the details of this history in 2015's It takes longer than it takes. In it I quoted from Valerie Aurora's excellent series of articles about the architectural and implementation difficulties involved in adding union mounts to the Linux kernel. I concurred with her statement that:
The consensus at the 2009 Linux file systems workshop was that stackable file systems are conceptually elegant, but difficult or impossible to implement in a maintainable manner with the current VFS structure. My own experience writing a stacked file system (an in-kernel chunkfs prototype) leads me to agree with these criticisms.

I wrote:
Note that my original paper was only incidentally about union mounts, it was a critique of the then-current VFS structure, and a suggestion that stackable vnodes might be a better way to go. It was such a seductive suggestion that it took nearly two decades to refute it!

Nevertheless, the example I used in Evolving the Vnode Interface of a use for stacking vnodes was what persisted. It took a while for the fact that overlayfs was an official part of the Linux kernel to percolate through the ecosystem, but after six years I was able to write Blatant Self-Promotion about the transformation it wrought on Linux's packaging and software distribution, inspired by Liam Proven's NixOS and the changing face of Linux operating systems. He writes about less radical ideas than NixOS:
So, instead of re-architecting the way distros are built, vendors are reimplementing similar functionality using simpler tools inherited from the server world: containers, squashfs filesystems inside single files, and, for distros that have them, copy-on-write filesystems to provide rollback functionality.

Since then this model has become universal. Distros ship as a bootable ISO image, which uses overlayfs to mount a writable temporary file system on top. This is precisely how my 1989 prototype was intended to ship SunOS 4.1. The technology has spread to individual applications with systems such as Snaps and Flatpak.
The goal is to build operating systems as robust as mobile OSes: periodically, the vendor ships a thoroughly tested and integrated image which end users can't change and don't need to. In normal use, the root filesystem is mounted read-only, and there's no package manager.
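To see the mechanics, here is a minimal sketch of the kind of stack such a distro assembles at boot, written against mount(2). The paths are hypothetical; a real system does this from its initramfs and needs root to run:

```c
/* Minimal sketch: a writable tmpfs layered over a read-only system image
 * with overlayfs.  All paths are hypothetical, and /run/rw and /merged are
 * assumed to exist already.  Must run as root. */
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>

int main(void)
{
    /* Writable scratch space: overlayfs needs an upperdir and a workdir
     * on the same writable file system. */
    if (mount("tmpfs", "/run/rw", "tmpfs", 0, NULL) != 0) {
        perror("mount tmpfs");
        return 1;
    }
    mkdir("/run/rw/upper", 0755);
    mkdir("/run/rw/work", 0755);

    /* /ro is the read-only system image, e.g. a mounted squashfs.  Writes
     * to /merged land in the tmpfs; the image itself is never modified. */
    if (mount("overlay", "/merged", "overlay", 0,
              "lowerdir=/ro,upperdir=/run/rw/upper,workdir=/run/rw/work") != 0) {
        perror("mount overlay");
        return 1;
    }
    return 0;
}
```

Reboot and the tmpfs evaporates, which is exactly the property the immutable-image model wants.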
Three Decades
The opportunity we saw when we started Nvidia was that the PC was transitioning from the ISA bus to version 1 of the PCI bus. The ISA bus' bandwidth was completely inadequate for 3D games, but the PCI bus had considerably more. Whether it was enough was an open question. We clearly needed to make the best possible use of the limited bandwidth we could get.

Nvidia's first chip had three key innovations:
- Rendering objects with quadric patches, not triangles. A realistic model using quadric patches needed perhaps a fifth of the data of an equivalent triangle model.
- I/O virtualization, with applications using a write-mostly, object-oriented interface. Read operations are necessarily synchronous, whereas write operations are asynchronous. Thus the more writes per read across the bus, the better the utilization of the available bus bandwidth.
- A modular internal architecture based on an on-chip token-ring network. The goal was that each functional unit be simple enough to be designed and tested by a three-person team.
SEGA's Virtua Fighter on NV1
- I/O virtualization allowed multiple processes direct access to the graphics hardware, with no need to pass operations through the operating system. I explained the importance of this a decade ago in Hardware I/O Virtualization, using the example of Amazon building their own network interface cards. The first chip appeared on the bus as having 128 wide FIFOs. The operating system could map one of them into each process wanting access to the chip, allowing applications direct access to the hardware but under the control of the operating system.
- The interface was write-mostly because the application could read from the FIFO the number of free slots, that is, the number of writes it could issue before the bus would stall (a sketch of this pattern follows this list).
- The interface was object-oriented because the data and the offset in the FIFO formed an invocation of a method on an instance of a (virtual) class. Some classes were implemented in hardware, others trapped into the kernel and were implemented by the driver, but the application just created and used instances of the available classes without knowing which was which. The classes were arranged in a hierarchy starting with class CLASS. Enumerating the instances of class CLASS told the application which classes it could use. Enumerating the instances of each of those classes told the application how many of each type of resource it could use.
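Here is the sketch promised above: a rough illustration of driving a write-mostly, FIFO-based interface of this kind. The register layout, names and sizes below are invented for the illustration; they are not the actual NV1 programming model:

```c
/* Rough sketch of a write-mostly FIFO interface.  The layout, names and
 * sizes are invented for illustration; this is not the real NV1 model. */
#include <stdint.h>

#define FIFO_WORDS 1024                   /* hypothetical size of one mapped FIFO */

struct fifo {                             /* one per-process FIFO, mapped in by the OS */
    volatile uint32_t free_count;         /* read: writes left before the bus stalls */
    volatile uint32_t method[FIFO_WORDS]; /* write: offset = method, value = argument */
};

static uint32_t cached_free;              /* free slots we already know about */

/* Invoke a method on the object this FIFO is bound to, passing one word.
 * Reads are synchronous and expensive, so free_count is re-read only when
 * the cached count runs out; writes are asynchronous and cheap. */
static void invoke(struct fifo *f, unsigned method, uint32_t arg)
{
    while (cached_free == 0)
        cached_free = f->free_count;      /* the rare, synchronous read */
    f->method[method] = arg;              /* the common, asynchronous write */
    cached_free--;
}
```

The point is the ratio: many cheap writes are issued for each expensive read of free_count, which is what kept the narrow PCI bus doing useful work.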
Comments

Cachefs plus immutable distributions.... that was Pravda, right?

Right, that was my name for the project. The point being to establish the official "party line" and identify any deviations from it.

Andreas Spies has a useful video explaining how you can greatly extend the life of SD cards with the Raspbian OS for the Pi with two clicks. Raspbian implements a setup option to layer overlayfs over the SD card, thus avoiding writes to it.

Enable and disable overlayfs from the command line with
sudo raspi-config nonint enable_overlayfs or sudo raspi-config nonint disable_overlayfs