The Subversion Project: Building a Better CVS
=============================================

Ben Collins-Sussman <sussman@collab.net>

Written in August 2001
Published in Linux Journal, January 2002

Abstract
--------

This article discusses the history, goals, features and design of
Subversion (http://subversion.tigris.org), an open-source project that
aims to produce a compelling replacement for CVS.


Introduction
------------

If you work on any kind of open-source project, you've probably worked
with CVS. You probably remember the first time you learned to do an
anonymous checkout of a source tree over the net -- or your first
commit, or learning how to look at CVS diffs. And then the fateful
day came: you asked your friend how to rename a file.

"You can't," was the reply.

What? What do you mean?

"Well, you can delete the file from the repository and then re-add it
under a new name."

Yes, but then nobody would know it had been renamed...

"Let's call the CVS administrator. She can hand-edit the repository's
RCS files for us and possibly make things work."

What?

"And by the way, don't try to delete a directory either."

You rolled your eyes and groaned. How could such simple tasks be
difficult?


The Legacy of CVS
-----------------

No doubt about it, CVS has evolved into the standard Software
Configuration Management (SCM) system of the open-source community.
And rightly so! CVS itself is Free software, and its wonderful "non
locking" development model -- whereby dozens of far-flung programmers
collaborate -- fits the open-source world very well. In fact, one
might argue that without CVS, it's doubtful whether sites like
Freshmeat or SourceForge would ever have flourished as they do now.
CVS and its semi-chaotic development model have become an essential
part of open-source culture.

So what's wrong with CVS?

Because it uses the RCS storage system under the hood, CVS can only
track file contents, not tree structures. As a result, the user has
no way to copy, move, or rename items without losing history. Tree
rearrangements are always ugly server-side tweaks.

The RCS back-end cannot store binary files efficiently, and branching
and tagging operations can grow to be very slow. CVS also uses the
network inefficiently; many users are annoyed by long waits, because
file differences are sent in only one direction (from server to client,
but not from client to server), and binary files are always
transmitted in their entirety.

From a developer's standpoint, the CVS codebase is the result of
layers upon layers of historical "hacks". (Remember that CVS began
life as a collection of shell scripts to drive RCS.) This makes the
code difficult to understand, maintain, or extend. For example, CVS's
networking ability was essentially "stapled on". It was never
designed to be a native client-server system.

Rectifying CVS's problems is a huge task -- and we've listed only a
few of the many common complaints here.


Enter Subversion
----------------

In 1995, Karl Fogel and Jim Blandy founded Cyclic Software, a company
for commercially supporting and improving CVS. Cyclic made the first
public release of a network-enabled CVS (contributed by Cygnus
Software). In 1999, Karl Fogel published a book about CVS and the
open-source development model it enables (cvsbook.red-bean.com). Karl
and Jim had long talked about writing a replacement for CVS; Jim had
even drafted a new, theoretical repository design. Finally, in
February of 2000, Brian Behlendorf of CollabNet (www.collab.net)
offered Karl a full-time job to write a CVS replacement. Karl
gathered a team together and work began in May.

The team settled on a few simple goals: Subversion would be designed
as a functional replacement for CVS. It would do everything that CVS
does -- preserving the same development model while fixing the flaws
in CVS's (lack of) design. Existing CVS users would be the target
audience: any CVS user should be able to start using Subversion with
little effort. Any other SCM "bonus features" were deemed to be of
secondary importance (at least before a 1.0 release).

At the time of writing, the original team has been coding for a little
over a year, and we have a number of excellent volunteer contributors.
(Subversion, like CVS, is an open-source project!)


Subversion's Features
---------------------

Here's a quick run-down of some of the reasons you should be excited
about Subversion:

  * Real copies and renames. The Subversion repository doesn't use
    RCS files at all; instead, it implements a 'virtual' versioned
    filesystem that tracks tree-structures over time (described
    below). Files *and* directories are versioned. At last, there
    are real client-side `mv' and `cp' commands that behave just as
    you think.

  * Atomic commits. A commit either goes into the repository
    completely, or not at all.

  * Advanced network layer. The Subversion network server is Apache,
    and client and server speak WebDAV to one another. (See the
    'design' section below.)

  * Faster network access. A binary diffing algorithm is used to
    store and transmit deltas in *both* directions, regardless of
    whether a file is of text or binary type.

  * Filesystem "properties". Each file or directory has an invisible
    hashtable attached. You can invent and store any arbitrary
    key/value pairs you wish: owner, perms, icons, app-creator,
    mime-type, personal notes, etc. This is a general-purpose feature
    for users. Properties are versioned, just like file contents.
    And some properties are auto-detected, like the mime-type of a
    file (no more remembering to use the '-kb' switch!).

  * Extensible and hackable. Subversion has no historical baggage; it
    was designed and then implemented as a collection of shared C
    libraries with well-defined APIs. This makes Subversion extremely
    maintainable and usable by other applications and languages.

  * Easy migration. The Subversion command-line client is very
    similar to CVS; the development model is the same, so CVS users
    should have little trouble making the switch. Development of a
    'cvs2svn' repository converter is in progress.

  * It's Free. Subversion is released under an Apache/BSD-style
    open-source license.


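To make the "deltas in both directions" item above concrete, here is a
minimal sketch in Python of a delta that works on raw bytes, so text
and binary files are treated alike. It is an illustration only, not
the algorithm Subversion uses: the function names and the instruction
format are hypothetical, and the matching is borrowed from Python's
standard difflib module.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as 'copy a range of old' and 'insert literal' steps."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The receiver already holds these bytes; send only the range.
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            # 'replace' or 'insert': the new bytes must travel literally.
            ops.append(("insert", new[j1:j2]))
        # 'delete' needs no instruction: those old bytes are never copied.
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_size(ops: list) -> int:
    """Number of bytes that actually have to be sent as literal data."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")
```

Either side can run make_delta() against the version both sides
already share, which is what lets a commit (client to server) be as
cheap as an update (server to client).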
Subversion's Design
-------------------

Subversion has a modular design; it's implemented as a collection of C
libraries. Each layer has a well-defined purpose and interface. In
general, code flow begins at the top of the diagram and flows
"downward" -- each layer provides an interface to the layer above it.

   <<insert diagram here: svn.tiff>>


Let's take a short tour of these layers, starting at the bottom.


--> The Subversion filesystem.

The Subversion Filesystem is *not* a kernel-level filesystem that one
would install in an operating system (like the Linux ext2 fs).
Instead, it refers to the design of Subversion's repository. The
repository is built on top of a database -- currently Berkeley DB --
and thus is a collection of .db files. However, a library accesses
these files and exports a C API that simulates a filesystem --
specifically, a "versioned" filesystem.

This means that writing a program to access the repository is like
writing against other filesystem APIs: you can open files and
directories for reading and writing as usual. The main difference is
that this particular filesystem never loses data when written to; old
versions of files and directories are always saved as historical
artifacts.

Whereas CVS's backend (RCS) stores revision numbers on a per-file
basis, Subversion numbers entire trees. Each atomic 'commit' to the
repository creates a completely new filesystem tree, and is
individually labeled with a single, global revision number. Files and
directories which have changed are rewritten (and older versions are
backed up and stored as differences against the latest version), while
unchanged entries are pointed to via a shared-storage mechanism. This
is how the repository is able to version tree structures, not just
file contents.

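The tree-per-revision idea can be sketched in a few lines of Python.
This is a toy model, not Subversion's filesystem code: the Repository
class and its methods are hypothetical, directories are plain
dictionaries, and nothing is stored as differences. It shows only
that each commit yields a new root under one global revision number
while unchanged subtrees are shared rather than copied.

```python
class Repository:
    """Toy versioned filesystem: one root directory per revision."""

    def __init__(self):
        self.roots = [{}]  # revision 0 is an empty tree

    def commit(self, changes):
        """Apply {path: bytes, or None to delete} as one new revision."""
        root = self.roots[-1]
        for path, content in changes.items():
            root = _rewrite(root, path.split("/"), content)
        # The new tree becomes visible only here, in a single step.
        self.roots.append(root)
        return len(self.roots) - 1

    def read(self, revision, path):
        """Return the file contents at `path` as of `revision`."""
        node = self.roots[revision]
        for part in path.split("/"):
            node = node[part]
        return node


def _rewrite(directory, parts, content):
    """Return a new directory, copying only the nodes along the path."""
    new_dir = dict(directory)  # shallow copy: untouched children are shared
    name, rest = parts[0], parts[1:]
    if rest:
        new_dir[name] = _rewrite(directory.get(name, {}), rest, content)
    elif content is None:
        new_dir.pop(name, None)
    else:
        new_dir[name] = content
    return new_dir
```

After two commits, where the second touches only trunk/README, the
directory object for trunk/src is the very same object in both
revisions, while trunk itself and the root have been rewritten.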
Finally, it should be mentioned that using a database like Berkeley DB
immediately provides other nice features that Subversion needs: data
integrity, atomic writes, recoverability, and hot backups. (See
www.sleepycat.com for more information.)


--> The network layer.

Subversion has the mark of Apache all over it. At its very core, the
client uses the Apache Portable Runtime (APR) library. (In fact, this
means that the Subversion client should compile and run anywhere Apache
httpd does -- right now, this list includes all flavors of Unix,
Win32, BeOS, OS/2, Mac OS X, and possibly NetWare.)

However, Subversion depends on more than just APR -- the Subversion
"server" is Apache httpd itself.

Why was Apache chosen? Ultimately, the decision was about not
reinventing the wheel. Apache is a time-tested, open-source server
process that is ready for serious use, yet is still extensible. It can
sustain a high network load. It runs on many platforms and can
operate through firewalls. It's able to use a number of different
authentication protocols. It can do network pipelining and caching.
By using Apache as a server, Subversion gets all these features for
free. Why start from scratch?

Subversion uses WebDAV as its network protocol. DAV (Distributed
Authoring and Versioning) is a whole discussion in itself (see
www.webdav.org) -- but in short, it's an extension to HTTP that allows
reads/writes and "versioning" of files over the web. The Subversion
project is hoping to ride a slowly rising tide of support for this
protocol: all of the latest file-browsers for Win32, MacOS, and GNOME
speak this protocol already. Interoperability will (hopefully) become
more and more of a bonus over time.

For users who simply wish to access Subversion repositories on local
disk, the client can do this too; no network is required. The
"Repository Access" layer (RA) is an abstract API implemented by both
the DAV and local-access RA libraries. This is a specific benefit of
writing a "librarized" version control system; it's a big win over
CVS, which has two very different, difficult-to-maintain codepaths for
local vs. network repository-access. Feel like writing a new network
protocol for Subversion? Just write a new library that implements the
RA API!


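The shape of the Repository Access idea can be sketched as an
abstract interface with interchangeable implementations. The Python
below is an illustration under stated assumptions: every class and
method name is hypothetical and does not correspond to the real C RA
API, and the "network" implementation merely records the requests it
would have sent.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class RepositoryAccess(ABC):
    """Hypothetical RA interface: all that the client layer ever sees."""

    @abstractmethod
    def latest_revision(self) -> int: ...

    @abstractmethod
    def get_file(self, path: str, revision: int) -> bytes: ...


class LocalAccess(RepositoryAccess):
    """Reads from an in-memory stand-in for a repository on local disk."""

    def __init__(self, revisions: List[Dict[str, bytes]]):
        self._revisions = revisions  # list index = revision number

    def latest_revision(self) -> int:
        return len(self._revisions) - 1

    def get_file(self, path: str, revision: int) -> bytes:
        return self._revisions[revision][path]


class LoggingAccess(RepositoryAccess):
    """Stand-in for a network implementation: forwards each call to
    another RA object and records it as if it were a wire request."""

    def __init__(self, backend: RepositoryAccess):
        self._backend = backend
        self.requests: List[str] = []

    def latest_revision(self) -> int:
        self.requests.append("latest_revision")
        return self._backend.latest_revision()

    def get_file(self, path: str, revision: int) -> bytes:
        self.requests.append(f"get_file {path}@{revision}")
        return self._backend.get_file(path, revision)


def cat_head(ra: RepositoryAccess, path: str) -> bytes:
    """Client-side code: identical for every RA implementation."""
    return ra.get_file(path, ra.latest_revision())
```

Because cat_head() is written against the interface alone, adding a
new access method means adding one more subclass; nothing above the RA
layer has to change.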
--> The client libraries.

On the client side, the Subversion "working copy" library maintains
administrative information within special SVN/ subdirectories, similar
in purpose to the CVS/ administrative directories found in CVS working
copies.

A glance inside the typical SVN/ directory turns up a bit more than
usual, however. The `entries' file contains XML which describes the
current state of the working copy directory (and which basically
serves the purposes of CVS's Entries, Root, and Repository files
combined). But other items present (and not found in CVS/) include
storage locations for the versioned "properties" (the metadata
mentioned in "Subversion's Features" above) and private caches of
pristine versions of each file. This latter feature provides the
ability to report local modifications -- and do reversions --
*without* network access. Authentication data is also stored within
SVN/, rather than in a single .cvspass-like file.

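Why a pristine cache makes offline status checks and reverts possible
can be shown with a small Python sketch. The directory layout used
here (a "pristine" folder inside SVN/) and the function names are
assumptions made for illustration; the article does not spell out the
real layout of the administrative area.

```python
import shutil
from pathlib import Path

ADMIN = "SVN"  # admin directory name from the article; layout is assumed


def checkout_file(wc: Path, name: str, content: bytes) -> None:
    """Simulate what a checkout leaves behind for one top-level file:
    the working file plus an untouched pristine copy."""
    pristine_dir = wc / ADMIN / "pristine"
    pristine_dir.mkdir(parents=True, exist_ok=True)
    (wc / name).write_bytes(content)
    (pristine_dir / name).write_bytes(content)


def is_modified(wc: Path, name: str) -> bool:
    """Report a local modification using only files on local disk."""
    pristine = wc / ADMIN / "pristine" / name
    return (wc / name).read_bytes() != pristine.read_bytes()


def revert(wc: Path, name: str) -> None:
    """Throw away local edits by restoring the pristine copy."""
    shutil.copyfile(wc / ADMIN / "pristine" / name, wc / name)
```

Neither is_modified() nor revert() ever contacts a repository, which
is the point of keeping the pristine copies; the cost is extra disk
space in every working copy.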
The Subversion "client" library has the broadest responsibility; its
job is to mingle the functionality of the working-copy library with
that of the repository-access library, and then to provide a
top-level API to any application that wishes to perform general
version control actions.

For example: the C routine `svn_client_checkout()' takes a URL as an
argument. It passes this URL to the repository-access library and
opens an authenticated session with a particular repository. It then
asks the repository for a certain tree, and sends this tree into the
working-copy library, which then writes a full working copy to disk
(SVN/ directories and all).

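The checkout flow just described can be mimicked in Python to show how
the client layer glues the other two together. This is a sketch, not
the real svn_client_checkout(): the two helper functions stand in for
the repository-access and working-copy libraries, the URL and file
contents are made up, and authentication is left out.

```python
from pathlib import Path
from typing import Dict

# Made-up repository contents, keyed by a made-up URL.
_FAKE_REPOSITORIES = {
    "http://svn.example.org/repos/demo": {
        "README": b"demo project\n",
        "src/main.c": b"int main(void) { return 0; }\n",
    },
}


def fetch_tree(url: str) -> Dict[str, bytes]:
    """Stand-in for the RA library: open a session, ask for a tree."""
    return _FAKE_REPOSITORIES[url]


def write_working_copy(tree: Dict[str, bytes], dest: Path) -> None:
    """Stand-in for the working-copy library: write the files and an
    SVN/ administrative directory beside them in every directory."""
    for path, content in tree.items():
        target = dest / path
        target.parent.mkdir(parents=True, exist_ok=True)
        (target.parent / "SVN").mkdir(exist_ok=True)
        target.write_bytes(content)


def checkout(url: str, dest: str) -> None:
    """Client layer: hand the URL to RA, hand the resulting tree to WC."""
    tree = fetch_tree(url)
    write_working_copy(tree, Path(dest))
```

A command-line tool or a GUI would call only checkout(); neither needs
to know how the tree was fetched or how the working copy is laid out.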
The client library is designed to be used by any application. While
the Subversion source code includes a standard command-line client, it
should be very easy to write any number of GUI clients on top of the
client library. Hopefully, these GUIs will someday prove to be much
better than the current crop of CVS GUI applications (the majority of
which are no more than fragile "wrappers" around the CVS command-line
client).

In addition, proper SWIG bindings (www.swig.org) should make
the Subversion API available to any number of languages: Java, Perl,
Python, Guile, and so on. In order to Subvert CVS, it helps to be
ubiquitous!


Subversion's Future
-------------------

The release of Subversion 1.0 is currently planned for early 2002.
After the release of 1.0, Subversion is slated for additions such as
i18n support, "intelligent" merging, better "changeset" manipulation,
client-side plugins, and improved features for server administration.
(Also on the wishlist is an eclectic collection of ideas, such as
distributed, replicating repositories.)

A final thought from Subversion's FAQ:

   "We aren't (yet) attempting to break new ground in SCM systems, nor
   are we attempting to imitate all the best features of every SCM
   system out there. We're trying to replace CVS."

If, in three years, Subversion is widely presumed to be the "standard"
SCM system in the open-source community, then the project will have
succeeded. But the future is still hazy: ultimately, Subversion
will have to win this position on its own technical merits.

Patches are welcome.


For More Information
--------------------

Please visit the Subversion project website at
http://subversion.tigris.org. There are discussion lists to join, and
the source code is available via anonymous CVS -- and soon through
Subversion itself.