From cf75bd0d98d91debae974f213700ad77d21e267b Mon Sep 17 00:00:00 2001
From: phk <phk@FreeBSD.org>
Date: Mon, 18 Feb 2002 09:48:59 +0000
Subject: [PATCH] The DEVFS paper presented at BSDcon-euro 2001 and
 BSDcon-2002.

---
 share/doc/papers/Makefile       |    2 +-
 share/doc/papers/devfs/Makefile |    8 +
 share/doc/papers/devfs/paper.me | 1277 +++++++++++++++++++++++++++++++
 3 files changed, 1286 insertions(+), 1 deletion(-)
 create mode 100644 share/doc/papers/devfs/Makefile
 create mode 100644 share/doc/papers/devfs/paper.me

diff --git a/share/doc/papers/Makefile b/share/doc/papers/Makefile
index f8538316ddfa..1be030cb4710 100644
--- a/share/doc/papers/Makefile
+++ b/share/doc/papers/Makefile
@@ -1,7 +1,7 @@
 # $FreeBSD$
 
 SUBDIR=	beyond4.3 bufbio diskperf fsinterface jail kernmalloc kerntune \
-	malloc newvm nqnfs px relengr sysperf \
+	malloc newvm nqnfs px relengr sysperf devfs \
 	contents
 
 .include <bsd.subdir.mk>
diff --git a/share/doc/papers/devfs/Makefile b/share/doc/papers/devfs/Makefile
new file mode 100644
index 000000000000..6b0e6fed8071
--- /dev/null
+++ b/share/doc/papers/devfs/Makefile
@@ -0,0 +1,8 @@
+# $FreeBSD$
+
+VOLUME=	papers
+DOC=	devfs
+SRCS=	paper.me
+MACROS=	-me
+
+.include <bsd.doc.mk>
diff --git a/share/doc/papers/devfs/paper.me b/share/doc/papers/devfs/paper.me
new file mode 100644
index 000000000000..34ffd1712e7f
--- /dev/null
+++ b/share/doc/papers/devfs/paper.me
@@ -0,0 +1,1277 @@
+.\" format with ditroff -me
+.\" $FreeBSD$
+.\" format made to look as a paper for the proceedings is to look
+.\" (as specified in the text)
+.if n \{ .po 0
+.	ll 78n
+.	na
+.\}
+.if t \{ .po 1.0i
+.	ll 6.5i
+.	nr pp 10		\" text point size
+.	nr sp \n(pp+2		\" section heading point size
+.	nr ss 1.5v		\" spacing before section headings
+.\}
+.nr tm 1i
+.nr bm 1i
+.nr fm 2v
+.he ''''
+.de bu
+.ip \0\s-2\(bu\s+2
+..
+.lp
+.rs
+.ce 5
+.sp
+.sz 14
+.b "Rethinking /dev and devices in the UNIX kernel"
+.sz 12
+.sp
+.i "Poul-Henning Kamp"
+.sp .1
+.i "<phk@FreeBSD.org>"
+.i "The FreeBSD Project"
+.i
+.sp 1.5
+.b Abstract
+.lp
+An outstanding novelty in UNIX at its introduction was the notion
+of ``a file is a file is a file and even a device is a file.''
+Going from ``hardware only changes when the DEC Field engineer is here''
+to ``my toaster has USB'' has put serious strain on the rather crude
+implementation of the ``devices as files'' concept, an implementation which
+has survived practically unchanged for 30 years in most UNIX variants.
+Starting from a high-level view of devices and the semantics that
+have grown around them over the years, this paper takes the audience on a
+grand tour of the redesigned FreeBSD device-I/O system, 
+to convey an overview of how it all fits together, and to explain why
+things ended up as they did, how to use the new features and 
+in particular how not to.
+.sp
+.if t \{ 
+.2c
+.\}
+.\" end boilerplate... paper starts here.
+.sh 1 "Introduction"
+.sp
+There are really only two fundamental ways to conceptualise
+I/O devices in an operating system:
+The usual way and the UNIX way.
+.lp
+The usual way is to treat I/O devices as their own class of things,
+possibly several classes of things, and provide APIs tailored
+to the semantics of the devices.
+In practice this means that a program must know what it is dealing
+with, it has to interact with disks one way, tapes another and
+rodents yet a third way, all of which are different from how it
+interacts with a plain disk file.
+.lp
+The UNIX way has never been described better than in the very first
+paper 
+published on UNIX by Ritchie and Thompson [Ritchie74]:
+.(q
+Special files constitute the most unusual feature of the UNIX filesystem.
+Each supported I/O device is associated with at least one such file.
+Special files are read and written just like ordinary disk files,
+but requests to read or write result in activation of the associated device.
+An entry for each special file resides in directory /dev,
+although a link may be made to one of these files just as it may to an
+ordinary file.
+Thus, for example, to write on a magnetic tape one may write on the file /dev/mt.
+
+Special files exist for each communication line, each disk, each tape drive,
+and for physical main memory.
+Of course, the active disks and the memory special files are protected from indiscriminate access.
+
+There is a threefold advantage in treating I/O devices this way:
+file and device I/O are as similar as possible;
+file and device names have the same syntax and meaning,
+so that a program expecting a file name as a parameter can be passed a device name;
+finally, special files are subject to the same protection mechanism as regular files.
+.)q
+.lp
+.\" (Why was this so special at the time?)
+At the time, this was quite a strange concept; it was totally accepted
+for instance, that neither the system administrator nor the users were
+able to interact with a disk as a disk.
+Operating systems simply
+did not provide access to disk other than as a filesystem.
+Most vendors did not even release a program to initialise a
+disk-pack with a filesystem: selling pre-initialised and ``quality
+tested'' disk-packs was quite a profitable business.
+.lp
+In many cases some kind of API for reading and
+writing individual sectors on a disk pack
+did exist in the operating system,
+but more often than not
+it was not listed in the public documentation.
+.sh 2 "The traditional implementation"
+.lp
+.\" (Explain how opening /dev/lpt0 lands you in the right device driver)
+The initial implementation used hardcoded inode numbers [Ritchie98].
+The console
+device would be inode number 5, the paper-tape-punch number 6 and so on,
+even if those inodes were also actual regular files in the filesystem.
+.lp
+For reasons one can only too vividly imagine, this was changed and 
+Thompson
+[Thompson78]
+describes how the implementation now used ``major and minor''
+device numbers to index though the devsw array to the correct device driver.
+.lp
+For all intents and purposes, this is the implementation which survives
+in most UNIX-like systems even to this day.
+Apart from the access control and timestamp information which is
+found in all inodes, the special inodes in the filesystem contain only
+one piece of information: the major and minor device numbers, often
+logically OR'ed to one field.
+.lp
+When a program opens a special file, the kernel uses the major number
+to find the entry points in the device driver, and passes the combined
+major and minor numbers as a parameter to the device driver.
+.sh 1 "The challenge"
+.lp
+Now, we did not talk much about where the special inodes came from
+to begin with.
+They were created by hand, using the
+mknod(2) system call, usually through the mknod(8) program.
+.lp
+In those days a
+computer had a very static hardware configuration\**
+.(f
+\** Unless your assigned field engineer was present on site.
+.)f
+and it certainly did not
+change while the system was up and running, so creating device nodes
+by hand was certainly an acceptable solution.
+.lp
+The first sign that this would not hold up as a solution came with
+the advent of TCP/IP and the telnet(1) program, or more precisely 
+with the telnetd(8) daemon.
+In order to support remote login a ``pseudo-tty'' device driver was implemented,
+basically as tty driver which instead of hardware had another device which
+would allow a process to ``act as hardware'' for the tty.
+The telnetd(8) daemon would read and write data on the ``master'' side of
+the pseudo-tty and the user would be running on the ``slave'' side,
+which would act just like any other tty: you could change the erase 
+character if you wanted to and all the signals and all that stuff worked.
+.lp
+Obviously with a device requiring no hardware, you can compile as many
+instances into the kernel as you like, as long as you do not use
+too much memory.
+As system after system was connected
+to the ARPANet, ``increasing number of ptys'' became a regular task
+for system administrators, and part of this task was to create
+more special nodes in the filesystem.
+.lp
+Several UNIX vendors also noticed an issue when they sold minicomputers
+in many different configurations: explaining to system administrators
+just which special nodes they would need and how to create them were
+a significant documentation hassle.  Some opted for the simple solution
+and pre-populated /dev with every conceivable device node, resulting
+in a predictable slowdown on access to filenames in /dev.
+.lp
+System V UNIX provided a band-aid solution:
+a special boot sequence would take effect if the kernel or
+the hardware had changed since last reboot.
+This boot procedure would
+amongst other things create the necessary special files in the filesystem,
+based on an intricate system of per device driver configuration files.
+.lp
+In the recent years, we have become used to hardware which changes
+configuration at any time: people plug USB, Firewire and PCCard
+devices into their computers.
+These devices can be anything from modems and disks to GPS receivers 
+and fingerprint authentication hardware.
+Suddenly maintaining the
+correct set of special devices in ``/dev'' became a major headache.
+.lp
+Along the way, UNIX kernels had learned to deal with multiple filesystem
+types [Heidemann91a] and a ``device-pseudo-filesystem'' was a pretty
+obvious idea.
+The device drivers have a pretty good idea which
+devices they have found in the configuration, so all that is needed is
+to present this information as a filesystem filled with just the right
+special files.
+Experience has shown that this like most other ``pseudo
+filesystems'' sound a lot simpler in theory than in practice.
+.sh 1 "Truly understanding devices"
+.lp
+Before we continue, we need to fully understand the
+``device special file'' in UNIX.
+.lp
+First we need to realize that a special file has the nature of
+a pointer from the filesystem into a different namespace;
+a little understood fact with far reaching consequences.
+.lp
+One implication of this is that several special files can
+exist in the filename namespace all pointing to the same device
+but each having their own access and timestamp attributes:
+.lp
+.(b M
+.vs -3
+\fC\s-3guest# ls -l /dev/fd0 /tmp/fd0
+crw-r----- 1 root operator 9, 0 Sep 27 19:21 /dev/fd0
+crw-rw-rw- 1 root wheel    9, 0 Sep 27 19:24 /tmp/fd0\fP\s+3
+.vs +3
+.)b
+Obviously, the administrator needs to be on top of this:
+one popular way to exploit an unguarded root prompt is
+to create a replica of the special file /dev/kmem 
+in a location where it will not be noticed.
+Since /dev/kmem gives access to the kernel memory, 
+gaining any particular
+privilege can be arranged by suitably modifying the kernel's
+data structures through the illicit special file.
+.lp
+When NFS appeared it opened a new avenue for this attack:
+People may have root privilege on one machine but not another.
+Since device nodes are not interpreted on the NFS server
+but rather on the local computer,
+a user with root privilege on a NFS client
+computer can create a device node to his liking on a filesystem
+mounted from an NFS server.
+This device node can in turn be used to 
+circumvent the security of other computers which mount that filesystem,
+including the server, unless they protect themselves by not
+trusting any device entries on untrusted filesystem by mounting such
+filesystems with the \fCnodev\fP mount-option.
+.lp
+The fact that the device itself does not actually exist inside the
+filesystem which holds the special file makes it possible
+to perform boot-strapping stunts in the spirit 
+of Baron Von Münchausen [raspe1785],
+where a filesystem is (re)mounted using one of its own
+device vnodes:
+.(b M
+.vs -3
+\fC\s-2guest# mount -o ro /dev/fd0 /mnt
+guest# fsck /mnt/dev/fd0
+guest# mount -u -o rw /mnt/dev/fd0 /mnt\fP\s+2
+.vs +3
+.)b
+.lp
+Other interesting details are chroot(2) and jail(2) [Kamp2000] which
+provide filesystem isolation for process-trees.
+Whereas chroot(2) was not implemented as a security tool [Mckusick1999]
+(although it has been widely used as such), the jail(2) security
+facility in FreeBSD provides a pretty convincing ``virtual machine''
+where even the root privilege is isolated and restricted to the designated
+area of the machine.
+Obviously chroot(2) and jail(2) may require access to a well-defined
+subset of devices like /dev/null, /dev/zero and /dev/tty,
+whereas access to other devices such as /dev/kmem
+or any disks could be used to compromise the integrity of the jail(2)
+confinement.
+.lp
+For a long time FreeBSD, like almost all UNIX-like systems had two kinds
+of devices, ``block'' and
+``character'' special files, the difference being that ``block''
+devices would provide caching and alignment for disk device access.
+This was one of those minor architectural mistakes which took
+forever to correct.
+.lp
+The argument that block devices were a mistake is really very
+very simple:  Many devices other than disks have multiple modes
+of access which you select by choosing which special file to use.
+.lp
+Pick any old timer and he will be able to recite painful
+sagas about the crucial difference between the /dev/rmt 
+and /dev/nrmt devices for tape access.\**
+.(f
+\** Make absolutely sure you know the difference before you take
+important data on a multi-file 9-track tape to remote locations.
+.)f
+.lp
+Tapes, asynchronous ports, line printer ports and many other devices
+have implemented submodes, selectable by the user
+at a special filename level, but that has not earned them their
+own special file types.
+Only disks\**
+.(f
+\** Well, OK: and some 9-track tapes.
+.)f
+have enjoyed the privilege of getting an entire file type dedicated to a
+a minor device mode.
+.lp
+Caching and alignment modes should have been enabled by setting
+some bit in the minor device number on the disk special file,
+not by polluting the filesystem code with another file type.
+.lp
+In FreeBSD block devices were not even implemented in a fashion
+which would be of any use, since any write errors would never be
+reported to the writing process.  For this reason, and since no
+applications 
+were found to be in existence which relied on block devices
+and since historical usage was indeed historical [Mckusick2000],
+block devices were removed from the FreeBSD system.
+This greatly simlified the task of keeping track of open(2) 
+reference counts for disks and
+removed much magic special-case code throughout.
+.lp
+.sh 1 "Files, sockets, pipes, SVID IPC and devices"
+.sp
+It is an instructive lesson in inconsistency to look at the
+various types of ``things'' a process can access in UNIX-like
+systems today.
+.lp
+First there are normal files, which are our reference yardstick here:
+they are accessed with open(2), read(2), write(2), mmap(2), close(2)
+and various other auxiliary system calls.
+.lp
+Sockets and pipes are also accessed via file handles but each has
+its own namespace.  That means you cannot open(2) a socket,\**
+.(f
+\** This is particularly bizarre in the case of UNIX domain sockets
+which use the filesystem as their namespace and appear in directory
+listings.
+.)f
+but you can read(2) and write(2) to it.
+Sockets and pipes vector off at the file descriptor level and do
+not get in touch with the vnode based part of the kernel at all.
+.lp
+Devices land somewhere in the middle between pipes and sockets on
+one side and normal files on the other.
+They use the filesystem 
+namespace, are implemented with vnodes, and can be operated
+on like normal files, but don't actually live in the filesystem.
+.lp
+Devices are in fact special-cased all the way through the vnode system.
+For one thing devices break the ``one file-one vnode''
+rule, making it necessary to chain all vnodes for the same
+device together in
+order to be able to find ``the canonical vnode for this device node'',
+but more importantly, many operations have to be specifically denied
+on special file vnodes since they do not make any sense.
+.lp
+For true inconsistency, consider the SVID IPC mechanisms - not
+only do they not operate via file handles,
+but they also sport a singularly
+illconceived 32 bit numeric namespace and a dedicated set of
+system calls for access.
+.lp
+Several people have convincingly argued that this is an inconsistent
+mess, and have proposed and implemented more consistent operating systems
+like the Plan9 from Bell Labs [Pike90a] [Pike92a].
+Unfortunately reality is that people are not interested in learning a new
+operating system when the one they have is pretty darn good, and
+consequently research into better and more consistent ways is
+a pretty frustrating [Pike2000] but by no means irrelevant topic.
+.sh 1 "Solving the /dev maintenance problem"
+.lp
+There are a number of obvious, simple but wrong ways one could
+go about solving the ``/dev'' maintenance problem.
+.lp
+The very straightforward way is to hack the namei() kernel function
+responsible for filename translation and lookup.
+It is only a minor matter of programming to
+add code to special-case any lookup which ends up in ``/dev''.
+But this leads to problems:  in the case of chroot(2) or jail(2), the
+administrator will want to present only a subset of the available
+devices in ``/dev'', so some kind of state will have to be kept per
+chroot(2)/jail(2) about which devices are visible and
+which devices are hidden, but no obvious location for this information
+is available in the absence of a mount data structure.
+.lp
+It also leads to some unpleasant issues
+because of the fact that ``/dev/foo'' is a synthesised directory
+entry which may or may not actually be present on the filesystem 
+which seems to provide ``/dev''.
+The vnodes either have to belong to a filesystem or they
+must be special-cased throughout the vnode layer of the kernel.
+.lp
+Finally there is the simple matter of generality:
+hardcoding the string "/dev" in the kernel is very general.
+.lp
+A cruder solution is to leave it to a daemon: make a special
+device driver, have a daemon read messages from it and create and
+destroy nodes in ``/dev'' in response to these messages.
+.lp
+The main drawback to this idea is that now we have added IPC
+to the mix introducing new and interesting race conditions.
+.lp
+Otherwise this solution is a surprisingly effective,
+but chroot(2)/jail(2) requirements prevents a simple implementation 
+and running a daemon per jail would become an administrative
+nightmare.
+.lp
+Another pitfall of
+this approach is that we are not able to remount the root filesystem
+read-write at boot until we have a device node for the root device,
+but if this node is missing we cannot create it with a daemon since
+the root filesystem (and hence /dev) is read-only.
+Adding a read-write memory-filesystem mount /dev to solve this problem
+does not improve
+the architectural qualities further and certainly the KISS principle has
+been violated by now.
+.lp
+The final and in the end only satisfactory solution is to write a ``DEVFS''
+which mounts on ``/dev''.
+.lp
+The good news is that it does solve the problem with chroot(2) and jail(2):
+just mount a DEVFS instance on the ``dev'' directory inside the filesystem
+subtree where the chroot or jail lives.  Having a mountpoint gives us
+a convenient place to keep track of the local state of this DEVFS mount.
+.lp
+The bad news is that it takes a lot of cleanup and care to implement
+a DEVFS into a UNIX kernel.
+.sh 1 "DEVFS architectural decisions"
+.lp
+Before implementing a DEVFS, it is necessary to decide on a range
+of corner cases in behaviour, and some of these choices have proved
+surprisingly hard to settle for the FreeBSD project.
+.sh 2 "The ``persistence'' issue"
+.lp
+When DEVFS in FreeBSD was initially presented at a BoF at the 1995
+USENIX Technical Conference in New Orleans,
+a group of people demanded that it provide ``persistence''
+for administrative changes.
+.lp
+When trying to get a definition of ``persistence'', people can generally
+agree that if the administrator changes the access control bits of
+a device node, they want that mode to survive across reboots.
+.lp
+Once more tricky examples of the sort of manipulations one can do
+on special files are proposed, people rapidly disagree about what
+should be supported and what should not.
+.lp
+For instance, imagine a
+system with one floppy drive which appears in DEVFS as ``/dev/fd0''.
+Now the administrator, in order to get some badly written software
+to run, links this to ``/dev/fd1'':
+.(b M
+\fC\s-2ln /dev/fd0 /dev/fd1\fP\s+2
+.)b
+This works as expected and with persistence in DEVFS, the link is
+still there after a reboot.
+But what if after a reboot another floppy drive has been connected
+to the system?
+This drive would naturally have the name ``/dev/fd1'',
+but this name is now occupied by the administrators hard link.
+Should the link be broken?
+Should the new floppy drive be called
+``/dev/fd2''?  Nobody can agree on anything but the ugliness of the
+situation.
+.lp
+Given that we are no longer dependent on DEC Field engineers to
+change all four wheels to see which one is flat, the basic assumption
+that the machine has a constant hardware configuration is simply no
+longer true.
+The new assumption one should start from when analysing this
+issue is that when the system boots, we cannot know what devices we
+will find, and we can not know if the devices we do find
+are the same ones we had when the system was last shut down.
+.lp
+And in fact, this is very much the case with laptops today:  if I attach
+my IOmega Zip drive to my laptop it appears like a SCSI disk named
+``/dev/da0'', but so does the RAID-5 array attached to the PCI SCSI controller
+installed in my laptop's docking station.  If I change mode to ``a+rw''
+on the Zip drive, do I want that mode to apply to the RAID-5 as well?
+Unlikely.
+.lp
+And what if we have persistent information about the mode of
+device ``/dev/sio0'', but we boot and do not find any sio devices?
+Do we keep the information in our device-persistence registry?
+How long do we keep it?  If I borrow a a modem card,
+set the permissions to some non-standard value like 0666,
+and then attach some other serial device a year from now - do I
+want some old permissions changes to come back and haunt me,
+just because they both happened to be ``/dev/sio0''?
+Unlikely.
+.lp
+The fact that more people have laptop computers today than
+five years ago, and the fact that nobody has been able to credibly
+propose where a persistent DEVFS would actually store the 
+information about these things in the first place has settled the issue.
+.lp
+Persistence may be the right answer, but to the
+wrong question: persistence is not a desirable property for a DEVFS
+when the hardware configuration may change literally at any time.
+.sh 2 "Who decides on the names?"
+.lp
+In a DEVFS-enabled system, the responsibility for creating nodes in
+/dev shifts to the device drivers, and consequently the device
+drivers get to choose the names of the device files.
+In addition an initial value for owner, group and mode bits are
+provided by the device driver.
+.lp
+But should it be possible to rename ``/dev/lpt0'' to ``/dev/myprinter''?
+While the obvious affirmative answer is easy to arrive at, it leaves
+a lot to be desired once the implications are unmasked.
+.lp
+Most device drivers know their own name and use it purposefully in
+their debug and log messages to identify themselves.
+Furthermore, the ``NewBus'' [NewBus] infrastructure facility,
+which ties hardware to device drivers, identifies things by name 
+and unit numbers.
+.lp
+A very common way to report errors in fact:
+.(b M
+.vs -3
+\fC\s-2#define LPT_NAME "lpt" /* our official name */
+[...]
+printf(LPT_NAME
+    ": cannot alloc ppbus (%d)!", error);\fP\s+2
+.vs +3
+.)b
+.lp
+So despite the user renaming the device node pointing to the printer
+to ``myprinter'', this has absolutely no effect in the kernel and can
+be considered a userland aliasing operation.
+.lp
+The decision was therefore made that it should not be possible to rename
+device nodes since it would only lead to confusion and because the desired
+effect could be attained by giving the user the ability to create
+symlinks in DEVFS.
+.sh 2 "On-demand device creation"
+.lp
+Pseudo-devices like pty, tun and bpf,
+but also some real devices, may not pre-emptively create entries for all
+possible device nodes.  It would be a pointless waste of resources
+to always create 1000 ptys just in case they are needed,
+and in the worst case more than 1800 device nodes would be needed per 
+physical disk to represent all possible slices and partitions.
+.lp
+For pseudo-devices the task at hand is to make a magic device node,
+``/dev/pty'', which when opened will magically transmogrify into the
+first available pty subdevice, maybe ``/dev/pty123''.
+.lp
+Device submodes, on the other hand, work by having multiple
+entries in /dev, each with a different minor number, as a way to instruct
+the device driver in aspects of its operation.  The most widespread
+example is probably ``/dev/mt0'' and ``/dev/nmt0'', where the node
+with the extra ``n''
+instructs the tape device driver to not rewind on close.\**
+.(f
+\** This is the answer to the question in footnote number 2.
+.)f
+.lp
+Some UNIX systems have solved the problem for pseudo-devices by
+creating magic cloning devices like ``/dev/tcp''.
+When a cloning device is opened,
+it finds a free instance and through vnode and file descriptor mangling
+return this new device to the opening process.
+.lp
+This scheme has two disadvantages: the complexity of switching vnodes
+in midstream is non-trivial, but even worse is the fact that it 
+does not work for
+submodes for a device because it only reacts to one particular /dev entry.
+.lp
+The solution for both needs is a more flexible on-demand device
+creation, implemented in FreeBSD as a two-level lookup.
+When a
+filename is looked up in DEVFS, a match in the existing device nodes is
+sought first and if found, returned.
+If no match is found, device drivers are polled in turn to ask if
+they would be able to synthesise a device node of the given name.
+.lp
+The device driver gets a chance to modify the name
+and create a device with make_dev().
+If one of the drivers succeeds in this, the lookup is started over and
+the newly found device node is returned:
+.(b M
+.vs -3
+\fC\s-2pty_clone()
+   if (name != "pty")
+      return(NULL); /* no luck */
+   n = find_next_unit();
+   dev = make_dev(...,n,"pty%d",n);
+   name = dev->name;
+   return(dev);\fP\s+2
+.vs +3
+.)b
+.lp
+An interesting mixed use of this mechanism is with the sound device drivers.
+Modern sound devices have multiple channels, presumably to allow the
+user to listen to CNN, Napstered MP3 files and Quake sound effects at
+the same time.
+The only problem is that all applications attempt to open ``/dev/dsp''
+since they have no concept of multiple sound devices.
+The sound device drivers use the cloning facility to direct ``/dev/dsp''
+to the first available sound channel completely transparently to the
+process.
+.lp
+There are very few drawbacks to this mechanism, the major one being
+that ``ls /dev'' now errs on the sparse side instead of the rich when used
+as a system device inventory, a practice which has always been 
+of dubious precision at best.
+.sh 2 "Deleting and recreating devices"
+.lp
+Deleting device nodes is no problem to implement, but as likely as not,
+some people will want a method to get them back.
+Since only the device driver know how to create a given device,
+recreation cannot be performed solely on the basis of the parameters 
+provided by a process in userland.
+.lp
+In order to not complicate the code which updates the directory
+structure for a mountpoint to reflect changes in the DEVFS inode list,
+a deleted entry is merely marked with DE_WHITEOUT instead of being
+removed entirely.
+Otherwise a separate list would be needed for inodes which we had
+deleted so that they would not be mistaken for new inodes.
+.lp
+The obvious way to recreate deleted devices is to let mknod(2) do it
+by matching the name and disregarding the major/minor arguments.
+Recreating the device with mknod(2) will simply remove the DE_WHITEOUT
+flag.
+.sh 2 "Jail(2), chroot(2) and DEVFS"
+.lp
+The primary requirement from facilities like jail(2) and chroot(2)
+is that it must be possible to control the contents of a DEVFS mount
+point.
+.lp
+Obviously, it would not be desirable for dynamic devices to pop
+into existence in the carefully pruned /dev of jails so it must be
+possible to mark a DEVFS mountpoint as ``no new devices''.
+And in the same way, the jailed root should not be able to recreate
+device nodes which the real root has removed.
+.lp
+These behaviours will be controlled with mount options, but these have not
+yet been implemented because FreeBSD has run out of bitmap flags for
+mount options, and a new unlimited mount option implementation is
+still not in place at the time of writing.
+.lp
+One mount option ``jaildevfs'', will restrict the contents of the
+DEVFS mountpoint to the ``normal set'' of devices for a jail and
+automatically hide all future devices and make it impossible
+for a jailed root to un-hide hidden entries while letting an un-jailed
+root do so.
+.lp
+Mounting or remounting read-only, will prevent all future
+devices from appearing and will make it impossible to
+hide or un-hide entries in the mountpoint.
+This is probably only useful for chroots or jails where no tty
+access is intended since cloning will not work either.
+.lp
+More mount options may be needed as more experience is gained.
+.sh 2 "Default mode, owner & group"
+.lp
+When a device driver creates a device node, and a DEVFS mount adds it
+to its directory tree, it needs to have some values for the access
+control fields: mode, owner and group.
+.lp
+Currently, the device driver specifies the initial values in the
+make_dev() call, but this is far from optimal.
+For one thing, embedding magic UIDs and GIDs in the kernel is simply
+bad style unless they are numerically zero.
+More seriously, they represent compile-time defaults which in these
+enlightened days is rather old-fashioned.
+.lp
+.sh 1 "Cleaning up before we build: struct specinfo and dev_t"
+.lp
+Most of the rest of the paper will be about the various challenges
+and issues in the implementation of DEVFS in FreeBSD.
+All of this should be applicable to other systems derived from
+4.4BSD-Lite as well.
+.lp
+POSIX has defined a type called ``dev_t'' which is the identity of a device.
+This is mainly for use in the few system calls which knows about devices:
+stat(2), fstat(2) and mknod(2).
+A dev_t is constructed by logically OR'ing
+the major# and minor# for the device.
+Since those have been defined
+as having no overlapping bits, the major# and minor#
+can be retrieved from the dev_t by a simple masking operation.
+.lp
+Although the kernel had a well-defined concept of any particular
+device it did not have a data structure to represent "a device".
+The device driver has such a structure, traditionally called ``softc''
+but the high kernel does not (and should not!) have access to the
+device driver's private data structures.
+.lp
+It is an interesting tale how things got to be this way,\**
+.(f
+\** Basically, devices should have been moved up with sockets and
+pipes at the file descriptor level when the VFS layering was introduced,
+rather than have all the special casing throughout the vnode system.
+.)f
+but for now just record for
+a fact how the actual relationship between the data structures was
+in the 4.4BSD release (Fig. 1). [44BSDBook]
+.(z
+.PS 3
+F: box "file" "handle"
+arrow down from F.s
+V: box "vnode"
+arrow right from V.e
+S: box "specinfo"
+arrow down from V.s
+I: box "inode"
+arrow right from I.e
+C: box invis "devsw[]" "[major#]"
+arrow down from C.s
+D: box "device" "driver"
+line right from D.e
+box invis "softc[]" "[minor#]"
+F2: box "file" "handle" at F + (2.5,0)
+arrow down from F2.s
+V2: box "vnode"
+arrow right from V2.e
+S2: box "specinfo"
+arrow down from V2.s
+I2: box "inode"
+arrow left from I2.w
+.PE
+.ce 1
+Fig. 1 - Data structures in 4.4BSD
+.)z
+.lp
+As for all other files, a vnode references a filesystem inode, but
+in addition it points to a ``specinfo'' structure.  In the inode
+we find the dev_t which is used to reference the device driver.
+.lp
+Access to the device driver happens by extracting the major# from
+the dev_t, indexing through the global devsw[] array to locate
+the device driver's entry point.
+.lp
+The device driver will extract the minor# from the dev_t and use
+that as the index into the softc array of private data per device.
+.lp
+The ``specinfo'' structure is a little sidekick vnodes grew underway,
+and is used to find all vnodes which reference the same device (i.e.
+they have the same  major# and minor#).
+This linkage is used to determine
+which vnode is the ``chosen one'' for this device, and to keep track of
+open(2)/close(2) against this device.
+The actual implementation was an inefficient hash implementation,
+which depending on the vnode reclamation rate and /dev directory lookup
+traffic, may become a measurable performance liability.
+.sh 2 "The new vnode/inode/dev_t layout"
+.lp
+In the new layout (Fig. 2) the specinfo structure takes a central
+role.  There is only one instanace of struct specinfo per 
+device (i.e. unique major#
+and minor# combination) and all vnodes referencing this device point
+to this structure directly.
+.(z
+.PS 2.25
+F: box "file" "handle"
+arrow down from F.s
+V: box "vnode"
+arrow right from V.e
+S: box "specinfo"
+arrow down from V.s
+I: box "inode"
+F2: box "file" "handle" at F + (2.5,0)
+arrow down from F2.s
+V2: box "vnode"
+arrow left from V2.w
+arrow down from V2.s
+I2: box "inode"
+arrow down from S.s
+D: box "device" "driver"
+.PE
+.ce 1
+Fig. 2 - The new FreeBSD data structures.
+.)z
+.lp
+In userland, a dev_t is still the logical OR of the major# and
+minor#, but this entity is now called a udev_t in the kernel.
+In the kernel a dev_t is now a pointer to a struct specinfo.
+.lp
+All vnodes referencing a device are linked to a list hanging
+directly off the specinfo structure, removing the need for the
+hash table and  consequently simplifying and speeding up a lot
+of code dealing with vnode instantiation, retirement and
+name-caching.
+.lp
+The entry points to the device driver are stored in the specinfo
+structure, removing the need for the devsw[] array and allowing
+device drivers to use separate entrypoints for various minor numbers.
+.lp
+This is is very convenient for devices which have a ``control''
+device for management and tuning.  The control device, almost always
+have entirely separate open/close/ioctl implementations [MD.C].
+.lp
+In addition to this, two data elements are included in the specinfo
+structure but ``owned'' by the device driver.  Typically the
+device driver will store a pointer to the softc structure in
+one of these, and unit number or mode information in the other.
+.lp
+This removes the need for drivers to find the softc using array
+indexing based on the minor#, and at the same time has obliviated
+the need for the compiled-in ``NFOO'' constants which traditionally
+determined how many softc structures and therefore devices
+the driver could support.\**
+.(f
+\** Not to mention all the drivers which implemented panic(2) 
+because they forgot to perform bounds checking on the index before
+using it on their softc arrays.
+.)f
+.lp
+There are some trivial technical issues relating to allocating
+the storage for specinfo early in the boot sequence and how to
+find a specinfo from the udev_t/major#+minor#, but they will
+not be discussed here.
+.sh 2 "Creating and destroying devices"
+.lp
+Ideally, devices should only be created and
+destroyed by the device drivers which know what devices are present.
+This is accomplished with the make_dev() and destroy_dev()
+function calls.
+.lp
+Life is seldom quite that simple.  The operating system might be called
+on to act as a NFS server for a diskless workstation, possibly even
+of a different architecture, so we still need to be able to represent
+device nodes with no device driver backing in the filesystems and
+consequently we need to be able to create a specinfo from
+the major#+minor# in these inodes when we encounter them.
+In practice this is quite trivial, but in a few places in the code
+one needs to be aware of the existence
+of both ``named'' and ``anonymous'' specinfo structures.
+.lp
+The make_dev() call creates a specinfo structure and populates
+it with driver entry points, major#, minor#, device node name
+(for instance ``lpt0''), UID, GID and access mode bits.  The return
+value is a dev_t (i.e.,  a pointer to struct specinfo).
+If the device driver determines that the device is no longer
+present, it calls destroy_dev(), giving a dev_t as argument
+and the dev_t will be cleaned and converted to an anonymous dev_t.
+.lp
+Once created with make_dev() a named dev_t exists until destroy_dev()
+is called by the driver.  The driver can rely on this and keep state
+in the fields in dev_t which is reserved for driver use.
+.sh 1 "DEVFS"
+.lp
+By now we have all the relevant information about each device node
+collected in struct specinfo but we still have one problem to
+solve before we can add the DEVFS filesystem on top of it.
+.sh 2 "The interrupt problem"
+.lp
+Some device drivers, notably the CAM/SCSI subsystem in FreeBSD
+will discover changes in the device configuration inside an interrupt
+routine.
+.lp
+This imposes some limitations on what can and should do be done:
+first one should minimise the amount
+of work done in an interrupt routine for performance reasons;
+second, to avoid deadlocks, vnodes and mountpoints should not be
+accessed from an interrupt routine.
+.lp
+Also, in addition to the locking issue,
+a machine can have many instances of DEVFS mounted:
+for a jail(8) based virtual-machine web-server several hundred instances
+is not unheard of, making it far too expensive to update all of them
+in an interrupt routine.
+.lp
+The solution to this problem is to do all the filesystem work on
+the filesystem side of DEVFS and use atomically manipulated integer indices
+(``inode numbers'') as the barrier between the two sides.
+.lp
+The functions called from the device drivers, make_dev(), destroy_dev()
+&c. only manipulate the DEVFS inode number of the dev_t in
+question and do not even get near any mountpoints or vnodes.
+.lp
+For make_dev() the task is to assign a unique inode number to the
+dev_t and store the dev_t in the DEVFS-global inode-to-dev_t array.
+.(b M
+.vs -3
+\fC\s-2make_dev(...)
+    store argument values in dev_t
+    assign unique inode number to dev_t
+    atomically insert dev_t into inode_array\fP\s+2
+.vs +3
+.)b
+.lp
+For destroy_dev() the task is the opposite: clear the inode number
+in the dev_t and NULL the pointer in the devfs-global inode-to-dev_t
+array.
+.(b M
+.vs -3
+\fC\s-2destroy_dev(...)
+    clear fields in dev_t
+    zero dev_t inode number.
+    atomically clear entry in inode_array\fP\s+2
+.vs +3
+.)b
+.lp
+Both functions conclude by atomically incrementing a global variable
+\fCdevfs_generation\fP to leave an indication to the filesystem
+side that something has changed.
+.lp
+By modifying the global state only with atomic instructions, locks
+have been entirely avoided in this part of the code which means that
+the make_dev() and destroy_dev() functions can be called from practically
+anywhere in the kernel at any time.
+.lp
+On the filesystem side of DEVFS, the only two vnode methods which examine
+or rely on the directory structure, VOP_LOOKUP and VOP_READDIR,
+call the function devfs_populate() to update their mountpoint's view
+of the device hierarchy to match current reality before doing any work.
+.(b M
+.vs -3
+\fC\s-2devfs_readdir(...)
+    devfs_populate(...)
+    ...\fP\s+2
+.)b
+.vs +3
+.lp
+The devfs_populate() function, compares the current \fCdevfs_generation\fP
+to the value saved in the mountpoint last time devfs_populate() completed
+and if (actually: while) they differ a linear run is made through the
+devfs-global inode-array and the directory tree of the mountpoint is
+brought up to date.
+.lp
+The actual code is slightly more complicated than shown in the pseudo-code
+here because it has to deal with subdirectories and hidden entries.
+.(b M
+.vs -3
+\fC\s-2devfs_populate(...)
+  while (mount->generation != devfs_generation)
+    for i in all inodes
+      if inode created)
+        create directory entry
+      else if inode destroyed
+        remove directory entry
+.vs +3
+.)b
+.lp
+Access to the global DEVFS inode table is again implemented
+with atomic instructions and failsafe retries to avoid the
+need for locking.
+.lp
+From a performance point of view this scheme also means that a particular
+DEVFS mountpoint is not updated until it needs to be, and then always by
+a process belonging to the jail in question thus minimising and 
+distributing the CPU load.
+.sh 1 "Device-driver impact"
+.lp
+All these changes have had a significant impact on how device drivers
+interact with the rest of the kernel regarding registration of
+devices.
+.lp
+If we look first at the ``before'' image in Fig. 3, we notice first
+the NFOO define which imposes a firm upper limit on the number of
+devices the kernel can deal with.
+Also notice that the softc structure for all of them is allocated
+at compile time.
+This is because most device drivers (and texts on writing device
+drivers) are from before the general
+kernel malloc facility [Mckusick1988] was introduced into the BSD kernel.
+.lp
+.(b M
+.vs -3
+\fC\s-2
+#ifndef NFOO
+#	define NFOO	4
+#endif
+
+struct foo_softc {
+	...
+} foo_softc[NFOO];
+
+int nfoo = 0;
+
+foo_open(dev, ...)
+{
+	int unit = minor(dev);
+	struct foo_softc *sc;
+
+	if (unit >= NFOO || unit >= nfoo)
+		return (ENXIO);
+	
+	sc = &foo_softc[unit]
+
+	...
+}
+
+foo_attach(...)
+{
+	struct foo_softc *sc;
+	static int once;
+
+	...
+	if (nfoo >= NFOO) {
+		/* Have hardware, can't handle */
+		return (-1);
+	}
+	sc = &foo_softc[nfoo++];
+	if (!once) {
+		cdevsw_add(&cdevsw);
+		once++;
+	}
+	...
+}
+\fP\s+2
+Fig. 3 - Device-driver, old style.
+.vs +3
+.)b
+.lp
+Also notice how range checking is needed to make sure that the
+minor# is inside range.  This code gets more complex if device-numbering
+is sparse.  Code equivalent to that shown in the foo_open() routine
+would also be needed in foo_read(), foo_write(), foo_ioctl() &c.
+.lp
+Finally notice how the attach routine needs to remember to register
+the cdevsw structure (not shown) when the first device is found.
+.lp
+Now, compare this to our ``after'' image in Fig. 4.
+NFOO is totally gone and so is the compile time allocation
+of space for softc structures.
+.lp
+The foo_open (and foo_close, foo_ioctl &c) functions can now
+derive the softc pointer directly from the dev_t they receive
+as an argument.
+.lp
+.(b M
+.vs -3
+\fC\s-2
+struct foo_softc {
+	....
+};
+
+int nfoo;
+
+foo_open(dev, ...)
+{
+	struct foo_softc *sc = dev->si_drv1;
+
+	...
+}
+
+foo_attach(...)
+{
+	struct foo_softc *sc;
+
+	...
+	sc = MALLOC(..., M_ZERO);
+	if (sc == NULL) {
+		/* Have hardware, can't handle */
+		return (-1);
+	}
+	sc->dev = make_dev(&cdevsw, nfoo,
+	    UID_ROOT, GID_WHEEL, 0644,
+	    "foo%d", nfoo);
+	nfoo++;
+	sc->dev->si_drv1 = sc;
+	...
+}
+\fP\s+2
+Fig. 4 - Device-driver, new style.
+.vs +3
+.)b
+.lp
+In foo_attach() we can now attach to all the devices we can
+allocate memory for and we register the cdevsw structure per
+dev_t rather than globally.
+.lp
+This last trick is what allows us to discard all bounds checking
+in the foo_open() &c. routines, because they can only be
+called through the cdevsw, and the cdevsw is only attached to
+dev_t's which foo_attach() has created.
+There is no way to end
+up in foo_open() with a dev_t not created by foo_attach().
+.lp
+In the two examples here, the difference is only 10 lines of source
+code, primarily because only one of the worker functions of the
+device driver is shown.
+In real device drivers it is not uncommon to save 50 or more lines
+of source code which typically is about a percent or two of the
+entire driver.
+.sh 1 "Future work"
+.lp
+Apart from some minor issues to be cleaned up, DEVFS is now a reality
+and future work therefore is likely concentrate on applying the
+facilities and functionality of DEVFS to FreeBSD.
+.sh 2 "devd"
+.lp
+It would be logical to complement DEVFS with a ``device-daemon'' which
+could configure and de-configure devices as they come and go.
+When a disk appears, mount it.
+When a network interface appears, configure it.
+And in some configurable way allow the user to customise the action,
+so that for instance images will automatically be copied off the
+flash-based media from a camera, &c.
+.lp
+In this context it is good to question how we view dynamic devices.
+If for instance a printer is removed in the middle of a print job
+and another printer arrives a moment later, should the system
+automatically continue the print job on this new printer?
+When a disk-like device arrives, should we always mount it?  Should
+we have a database of known disk-like devices to tell us where to
+mount it, what permissions to give the mountpoint?
+Some computers come in multiple configurations, for instance laptops
+with and without their docking station.  How do we want to present
+this to the users and what behaviour do the users expect?
+.sh 2 "Pathname length limitations"
+.lp
+In order to simplify memory management in the early stages of boot,
+the pathname relative to the mountpoint is presently stored in a 
+small fixed size buffer inside struct specinfo.
+It should be possible to use filenames as long as the system otherwise
+permits, so some kind of extension mechanism is called for.
+.lp
+Since it cannot be guaranteed that memory can be allocated in
+all the possible scenarios where make_dev() can be called, it may
+be necessary to mandate that the caller allocates the buffer if
+the content will not fit inside the default buffer size.
+.sh 2 "Initial access parameter selection"
+.lp
+As it is now, device drivers propose the initial mode, owner and group
+for the device nodes, but it would be more flexible if it were possible
+to give the kernel a set of rules, much like packet filtering rules,
+which allow the user to set the wanted policy for new devices.
+Such a mechanism could also be used to filter new devices for mount
+points in jails and to determine other behaviour.
+.lp
+Doing these things from userland results in some awkward race conditions
+and software bloat for embedded systems, so a kernel approach may be more
+suitable.
+.sh 2 "Applications of on-demand device creation"
+.lp
+The facility for on-demand creation of devices has some very interesting
+possibilities.
+.lp
+One planned use is to enable user-controlled labelling
+of disks.
+Today disks have names like /dev/da0, /dev/ad4, but since
+this numbering is topological any change in the hardware configuration
+may rename the disks, causing /etc/fstab and backup procedures
+to get out of sync with the hardware.
+.lp
+The current idea is to store on the media of the disk a user-chosen
+disk name and allow access through this name, so that for instance 
+/dev/mydisk0
+would be a symlink to whatever topological name the disk might have
+at any given time.
+.lp
+To simplify this and to avoid a forest of symlinks, it will probably
+be decided to move all the sub-divisions of a disk into one subdirectory
+per disk so just a single symlink can do the job.
+In practice that means that the current /dev/ad0s2f will become
+something like /dev/ad0/s2f and so on.
+Obviously, in the same way, disks could also be accessed by their
+topological address, down to the specific path in a SAN environment.
+.lp
+Another potential use could be for automated offline data media libraries.
+It would be quite trivial to make it possible to access all the media
+in the library using /dev/lib/$LABEL which would be a remarkable
+simplification compared with most current automated retrieval facilities.
+.lp
+Another use could be to access devices by parameter rather than by
+name.  One could imagine sending a printjob to /dev/printer/color/A2
+and behind the scenes a search would be made for a device with the
+correct properties and paper-handling facilities.
+.sh 1 "Conclusion"
+.lp
+DEVFS has been successfully implemented in FreeBSD,
+including a powerful, simple and flexible solution supporting 
+pseudo-devices and on-demand device node creation.
+.lp
+Contrary to the trend, the implementation added functionality
+with a net decrease in source lines,
+primarily because of the improved API seen from device drivers point of view.
+.lp
+Even if DEVFS is not desired, other 4.4BSD derived UNIX variants
+would probably benefit from adopting the dev_t/specinfo related
+cleanup.
+.sh 1 "Acknowledgements"
+.lp
+I first got started on DEVFS in 1989 because the abysmal performance
+of the Olivetti M250 computer forced me to implement a network-disk-device
+for Minix in order to retain my sanity.
+That initial work led to a
+crude but working DEVFS for Minix, so obviously both Andrew Tannenbaum
+and Olivetti deserve credit for inspiration.
+.lp
+Julian Elischer implemented a DEVFS for FreeBSD around 1994 which never
+quite made it to maturity and subsequently was abandoned.
+.lp
+Bruce Evans deserves special credit not only for his keen eye for detail,
+and his competent criticism but also for his enthusiastic resistance to the
+very concept of DEVFS.
+.lp
+Many thanks to the people who took time to help me stamp out ``Danglish''
+through their reviews and comments:  Chris Demetriou, Paul Richards,
+Brian Somers, Nik Clayton, and Hanne Munkholm.
+Any remaining insults to proper use of english language are my own fault.
+.\" (list & why)
+.sh 1 "References"
+.lp
+[44BSDBook]
+Mckusick, Bostic, Karels & Quarterman:
+``The Design and Implementation of 4.4 BSD Operating System.''
+Addison Wesley, 1996, ISBN 0-201-54979-4.
+.lp
+[Heidemann91a]
+John S. Heidemann:
+``Stackable layers: an architecture for filesystem development.''
+Master's thesis, University of California, Los Angeles, July 1991.
+Available as UCLA technical report CSD-910056.
+.lp
+[Kamp2000]
+Poul-Henning Kamp and Robert N. M. Watson:
+``Confining the Omnipotent root.''
+Proceedings of the SANE 2000 Conference.
+Available in FreeBSD distributions in \fC/usr/share/papers\fP.
+.lp
+[MD.C]
+Poul-Henning Kamp et al:
+FreeBSD memory disk driver:
+\fCsrc/sys/dev/md/md.c\fP
+.lp
+[Mckusick1988]
+Marshall Kirk Mckusick, Mike J. Karels:
+``Design of a General Purpose Memory Allocator for the 4.3BSD UNIX-Kernel''
+Proceedings of the San Francisco USENIX Conference, pp. 295-303, June 1988.
+.lp
+[Mckusick1999]
+Dr. Marshall Kirk Mckusick:
+Private email communication.
+\fI``According to the SCCS logs, the chroot call was added by Bill Joy
+on March 18, 1982 approximately 1.5 years before 4.2BSD was released.
+That was well before we had ftp servers of any sort (ftp did not
+show up in the source tree until January 1983).  My best guess as
+to its purpose was to allow Bill to chroot into the /4.2BSD build
+directory and build a system using only the files, include files,
+etc contained in that tree.  That was the only use of chroot that
+I remember from the early days.''\fP
+.lp
+[Mckusick2000]
+Dr. Marshall Kirk Mckusick:
+Private communication at BSDcon2000 conference.
+\fI``I have not used block devices since I wrote FFS and that
+was \fPmany\fI years ago.''\fP
+.lp
+[NewBus]
+NewBus is a subsystem which provides most of the glue between
+hardware and device drivers.  Despite the importance of this
+there has never been published any good overview documentation
+for it.
+The following article by Alexander Langer in ``Dæmonnews'' is 
+the best reference I can come up with:
+\fC\s-2http://www.daemonnews.org/200007/newbus-intro.html\fP\s+2
+.lp
+[Pike2000]
+Rob Pike:
+``Systems Software Research is Irrelevant.''
+\fC\s-2http://www.cs.bell\-labs.com/who/rob/utah2000.pdf\fP\s+2
+.lp
+[Pike90a]
+Rob Pike, Dave Presotto, Ken Thompson and Howard Trickey:
+``Plan 9 from Bell Labs.''
+Proceedings of the Summer 1990 UKUUG Conference.
+.lp
+[Pike92a]
+Rob Pike, Dave Presotto, Ken Thompson, Howard Trickey and Phil Winterbottom:
+``The Use of Name Spaces in Plan 9.''
+Proceedings of the 5th ACM SIGOPS Workshop.
+.lp
+[Raspe1785]
+Rudolf Erich Raspe:
+``Baron Münchhausen's Narrative of his marvellous Travels and Campaigns in Russia.''
+Kearsley, 1785.
+.lp
+[Ritchie74]
+D.M. Ritchie and K. Thompson:
+``The UNIX Time-Sharing System''
+Communications of the ACM, Vol. 17, No. 7, July 1974.
+.lp
+[Ritchie98]
+Dennis Ritchie: private conversation at USENIX Annual Technical Conference
+New Orleans, 1998.
+.lp
+[Thompson78]
+Ken Thompson:
+``UNIX Implementation''
+The Bell System Technical Journal, vol 57, 1978, number 6 (part 2) p. 1931ff.