Description of Dynamic Update and T_UNSPEC Code

		     Added by Mike Schwartz
	University of Washington Computer Science Department
			    11/86
		  schwartz@cs.washington.edu

I have incorporated 2 new features into BIND:

1. Code to allow (unauthenticated) dynamic updates: surrounded by
   #ifdef ALLOW_UPDATES

2. Code to allow data of unspecified type: surrounded by
   #ifdef ALLOW_T_UNSPEC

Note that you can have one or the other or both (or neither) of these
modifications running, by appropriately modifying the makefiles. Also,
the external interface isn't changed (other than being extended), i.e.,
a BIND server that allows dynamic updates and/or T_UNSPEC data can
still talk to a 'vanilla' server using the 'vanilla' operations.

The description that follows is broken into 3 parts: a functional
description of the dynamic update facility, a functional description of
the T_UNSPEC facility, and a discussion of the implementation of
dynamic updates. The implementation description is mostly intended for
those who want to make future enhancements (especially the addition of
a good authentication mechanism). If you make enhancements, I would be
interested in hearing about them.


1. Dynamic Update Facility

I added this code in conjunction with my research into naming in large
heterogeneous systems. For the purposes of this research, I ignored
security issues. In other words, no authentication/authorization
mechanism exists to control updates. Authentication will hopefully be
addressed at some future point (although probably not by me). In the
meantime, BIND Internet name servers (as opposed to "private" name
server networks operating with their own port numbers, as I use in my
research) should be compiled *without* -DALLOW_UPDATES, so that the
integrity of the Internet name database won't be compromised by this
code.

There are 5 different dynamic update interfaces:

	UPDATEA  - add a resource record
	UPDATED  - delete a specific resource record
	UPDATEDA - delete all named resource records
	UPDATEM  - modify a specific resource record
	UPDATEMA - modify all named resource records

These all work through the normal resolver interface, i.e., these
interfaces are opcodes, and the data in the buffers passed to
res_mkquery must conform to what is expected for the particular
operation (see the #ifdef ALLOW_UPDATES extensions to nstest.c for
example usage).
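
As a rough illustration (a hypothetical sketch, not the actual
nstest.c code; the wrapper name and the assumption that "data" is
already laid out as the server expects are mine), an add request can
be built with the standard res_mkquery()/res_send() calls:

	#include <sys/types.h>
	#include <netinet/in.h>
	#include <arpa/nameser.h>
	#include <resolv.h>

	static char msg[PACKETSZ], answer[PACKETSZ];

	/*
	 * Build and send an UPDATEA request for "dname".  The update
	 * opcode goes where QUERY normally would; "data"/"size" must
	 * conform to what the server's update code expects for this
	 * operation (nstest.c is the authoritative example).
	 */
	int
	add_rr(dname, type, data, size)
		char *dname, *data;
		int type, size;
	{
		int n;

		n = res_mkquery(UPDATEA, dname, C_IN, type, data, size,
		    (struct rrec *)0, msg, sizeof(msg));
		if (n < 0)
			return (-1);
		return (res_send(msg, n, answer, sizeof(answer)));
	}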

UPDATEM is logically equivalent to an UPDATED followed by an UPDATEA,
except that the updates occur atomically at the primary server (as
usual with Domain servers, secondaries may become temporarily
inconsistent). The difference between UPDATED and UPDATEDA is that the
latter allows you to delete all RRs associated with a name; similarly
for UPDATEM and UPDATEMA. The reason for the UPDATE{D,M}A interfaces
is two-fold:

1. Sometimes you want to delete/modify some data, but you know you'll
   only have a single RR for that data; in such a case, it's more
   convenient to delete/modify the RR by just giving the name;
   otherwise, you would have to first look it up and then
   delete/modify it.

2. It is sometimes useful to be able to delete/modify multiple RRs
   this way, since one can then perform the operation atomically.
   Otherwise, one would have to delete/modify the RRs one by one.

One additional point to note about UPDATEMA is that it will return a
success status if there were *zero* or more RRs associated with the
given name (and the RR add succeeds), whereas UPDATEM, UPDATED, and
UPDATEDA return a success status only if there were *one* or more RRs
associated with the given name. The reason for the difference is to
handle the (probably common) case where what you want is to set a
particular name to contain a single RR, irrespective of whether or not
it was already set.
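
To make that "set" usage concrete (same hypothetical framing, reusing
the msg/answer buffers declared for add_rr() above):

	/*
	 * Set "dname" to hold exactly one RR, whether or not any RRs
	 * existed before.  UPDATEMA succeeds on zero or more existing
	 * RRs, which is precisely the "set" semantics; add_rr()'s
	 * UPDATEA would merely add alongside whatever is there.
	 */
	int
	set_rr(dname, type, data, size)
		char *dname, *data;
		int type, size;
	{
		int n;

		n = res_mkquery(UPDATEMA, dname, C_IN, type, data, size,
		    (struct rrec *)0, msg, sizeof(msg));
		if (n < 0)
			return (-1);
		return (res_send(msg, n, answer, sizeof(answer)));
	}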


2. T_UNSPEC Facility

Type T_UNSPEC allows you to store data whose layout BIND doesn't
understand. Data of this type is not marshalled by BIND (i.e., not
converted between host and network representation, as is done, for
example, with Internet addresses), so it is up to the client to make
sure things work out w.r.t. heterogeneous data representations. The
way I use this type is to have the client marshal data, store it,
retrieve it, and demarshal it. This way I can store arbitrary data in
BIND without having to add new code for each specific type.
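
For example (a hypothetical sketch; the struct and its encoding are
mine, not part of BIND), a client might marshal a record into network
byte order before storing it under T_UNSPEC:

	#include <sys/types.h>
	#include <netinet/in.h>
	#include <string.h>

	struct mydata {			/* hypothetical client record */
		u_long	version;
		u_long	value;
	};

	/*
	 * Marshal "d" into "buf" in network byte order.  BIND never
	 * looks inside the buffer, so the client alone defines the
	 * encoding.  Returns the number of bytes written.
	 */
	int
	marshal_mydata(d, buf)
		struct mydata *d;
		char *buf;
	{
		u_long v;

		v = htonl(d->version);
		memcpy(buf, (char *)&v, sizeof(v));
		v = htonl(d->value);
		memcpy(buf + sizeof(v), (char *)&v, sizeof(v));
		return (2 * sizeof(u_long));
	}

The resulting buffer could then be stored with, e.g., the add_rr()
sketch from section 1 using type T_UNSPEC, and demarshalled with
ntohl() after retrieval.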

T_UNSPEC data is dumped in an ASCII-encoded, checksummed format so
that, although it's not human-readable, it at least doesn't fill the
dump file with unprintable characters.

Type T_UNSPEC is important for my research environment, where
potentially lots of people want to store data in the name service, and
each person's data looks different. Instead of having BIND understand
the format of each of their data types, the clients define marshalling
routines and pass buffers of marshalled data to BIND. BIND never tries
to demarshal the data; it just holds on to it and gives it back when
the client requests it, at which point the client demarshals it.

The Xerox Network System's name service (the Clearinghouse) works this
way. The reason 'vanilla' BIND understands the format of all the data
it holds is probably that BIND is tailored for a very specific
application and wants to make sure the data it holds makes sense (and,
for some types, BIND needs to take additional action depending on the
data's semantics). For more general-purpose name services (like the
Clearinghouse and my usage of BIND), this approach is less tractable.

See the #ifdef ALLOW_T_UNSPEC extensions to nstest.c for example usage
of this type.


3. Dynamic Update Implementation Description

This section is divided into 3 subsections: General Discussion,
Miscellaneous Points, and Known Defects.


3.1 General Discussion

The basic scheme is this: When an update message arrives, a call is
made to InitDynUpdate, which first looks up the SOA record for the zone
the update affects. If this is the primary server for that zone, we do
the update and then update the zone serial number (so that secondaries
will refresh later). If this is a secondary server, we forward the
update to the primary, and if that's successful, we update our copy
afterwards. If it's neither, we refuse the update. (One might think
to propagate the update to some other authoritative server; I figured
that updates will most likely occur within an administrative domain
anyway; this could be changed if someone has strong feelings about
it.)
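
In outline (a paraphrase of the scheme above, not the actual source;
the types and helper names are invented for illustration):

	enum role { PRIMARY, SECONDARY, NEITHER };

	extern enum role zone_role();		/* our role for the zone */
	extern int apply_update();		/* perform the update locally */
	extern int forward_to_primary();	/* ship update to the primary */
	extern void bump_serial();		/* so secondaries refresh later */

	int
	dyn_update(zone, rr, len)
		char *zone, *rr;
		int len;
	{
		switch (zone_role(zone)) {
		case PRIMARY:
			if (apply_update(zone, rr, len) < 0)
				return (-1);
			bump_serial(zone);
			return (0);
		case SECONDARY:
			/* forward first; update our copy only on success */
			if (forward_to_primary(zone, rr, len) < 0)
				return (-1);
			return (apply_update(zone, rr, len));
		default:
			return (-1);	/* not authoritative: refuse */
		}
	}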

Note that this mechanism disallows updates when the primary is down,
preserving the Domain scheme's consistency requirements, but making
the primary a critical point for updates. This seemed reasonable to me
because:

1. Alternative schemes must deal with potentially complex situations
   involving merging of inconsistent secondary updates.

2. Updates are presumed to be rare relative to read accesses, so this
   increased restrictiveness for updates over reads is probably not
   critical.

I have placed comments throughout the code, so it shouldn't be too
hard to see what I did. The majority of the processing is in
doupdate() and InitDynUpdate(). Also, I added a field to the zone
struct, to keep track of when zones get updated, so that only changed
zones get checkpointed.


3.2 Miscellaneous Points

I use ns_maint to call zonedump() if the database changes, to provide
a checkpointing mechanism. I use the zone refresh times to set up
ns_maint interrupts if there are either secondaries or primaries.
Hence, if there is a secondary, this interrupt can cause zoneref (as
before), and if there is a primary, this interrupt can cause doadump.
I also checkpoint if needed before shutting down.
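
In outline (again with invented names, not the actual routines):

	struct zone { int secondary; int changed; };

	extern void zoneref();	/* refresh zone from its primary */
	extern void zonedump();	/* checkpoint zone to its boot file */

	/*
	 * Hypothetical sketch of a maint interrupt: secondary zones
	 * get refreshed as before; primary zones get checkpointed,
	 * but only if the dynamic-update code has changed them.
	 */
	void
	maint_sketch(zones, nzones)
		struct zone *zones;
		int nzones;
	{
		int i;

		for (i = 0; i < nzones; i++) {
			if (zones[i].secondary)
				zoneref(&zones[i]);
			else if (zones[i].changed)
				zonedump(&zones[i]);
		}
	}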

You can force a server to checkpoint any changed zones by sending the
maint signal (SIGALRM) to the process. Otherwise, it just checkpoints
during maint interrupts, or when being shut down (with SIGTERM).
Sending it the dump signal causes the database to be dumped into the
(single) dump file, but doesn't checkpoint (i.e., update the boot
files). Note that the boot files will be overwritten with checkpoint
files, so if you want to preserve the comments, you should keep copies
of the original boot files separate from the versions that are
actually used.
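
From another process, forcing a checkpoint looks like this (a trivial
sketch; how you obtain named's process id is up to you):

	#include <sys/types.h>
	#include <signal.h>

	/* Force named (process "pid") to checkpoint changed zones. */
	int
	force_checkpoint(pid)
		int pid;
	{
		return (kill(pid, SIGALRM));
	}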

I disallow T_SOA updates, for several reasons:

- T_SOA deletes at the primary won't be discovered by the secondaries
  until they try to request them at maint time, which will cause a
  failure.

- The corresponding NS record would have to be deleted at the same
  time (atomically) to avoid various problems.

- T_SOA updates would have to be done in the right order, or else the
  primary and secondaries would be out of sync for that zone.

My feeling is that changing the zone topology is a weighty enough
thing to do that it should involve changing the load file and
reloading all affected servers.

There are a lot of places where BIND exits due to catastrophic
failures (mainly malloc failures). I don't try to dump the database in
these places because it's probably inconsistent anyway. It's probably
better to depend on the most recent dump.


3.3 Known Defects

1. I put the following comment in nlookup (db_lookup.c):

	Note: at this point, if np->n_data is NULL, we could be in one
	of two situations: Either we have come across a name for which
	all the RRs have been (dynamically) deleted, or else we have
	come across a name which has no RRs associated with it because
	it is just a place holder (e.g., EDU). In the former case, we
	would like to delete the namebuf, since it is no longer of
	use, but in the latter case we need to hold on to it, so
	future lookups that depend on it don't fail. The only way I
	can see of doing this is to always leave the namebufs around
	(although then the memory usage continues to grow whenever
	names are added, and can never shrink back down completely
	when all their associated RRs are deleted).

   Thus, there is a problem that memory usage will keep growing in the
   situation described. You might just choose to ignore this problem
   (since I don't see any good way out), since things probably won't
   grow fast anyway (how many names are created and then deleted
   during a single server incarnation, after all?).

   The problem is that one can't delete old namebufs: one would want
   to do it from db_update, but db_update calls nlookup to do the
   actual work, and the deletion can't happen there, since we need to
   maintain place holders. One could make db_update not call nlookup,
   so that we know it's ok to delete the namebuf (since we know the
   call is part of a delete operation); but then the 2 routines would
   contain a lot of overlapping functionality.

   This also causes another problem: If you create a name and then do
   an UPDATEDA, all its RRs get deleted, but the name remains; then,
   if you do a lookup on that name later, the name is found in the
   hash table, but no RRs are found for it. The server then forwards
   the query to itself (for some reason), somehow decides there is no
   such domain, and returns (with the correct answer, but after going
   through extra work). But the name remains, and each time it is
   looked up, we go through these same steps. This should be fixed,
   but I don't have time right now (and the right answer seems to come
   back anyway, so it's good enough for now).

2. There are 2 problems that crop up when you store data (other than
   T_SOA and T_NS records) in the root:

   a. Can't get the primary to doaxfr RRs other than SOA and NS to a
      secondary.

   b. Upon checkpoint (zonedump), this data sometimes comes out after
      other data in the root, so that (since the SOA and NS records
      have null names) they will get interpreted as records under the
      other names upon the next boot. For example, if you have a T_A
      record called ABC, the checkpoint may look like:

	$ORIGIN .
	ABC	IN	A	128.95.1.3
		99999999	IN	NS	UW-BORNEO.
			IN	SOA	UW-BORNEO. SCHWARTZ.CS.WASHINGTON.EDU.
			( 50 3600 300 3600000 3600 )

      Then, when booting up the next time, the SOA and NS records get
      interpreted as being called "ABC" rather than the null root
      name.

3. The secondary server caches the T_A RR for the primary, and hence,
   when it tries to ns_forw an update, it won't find the address of
   the primary using nslookup unless that T_A RR is *also* stored in
   the main hashtable (by putting it in a named.db file as well as the
   named.ca file).