Description of Dynamic Update and T_UNSPEC Code

		     Added by Mike Schwartz
	University of Washington Computer Science Department
			    11/86
		  schwartz@cs.washington.edu

I have incorporated 2 new features into BIND:

1. Code to allow (unauthenticated) dynamic updates: surrounded by
   #ifdef ALLOW_UPDATES

2. Code to allow data of unspecified type: surrounded by
   #ifdef ALLOW_T_UNSPEC

Note that you can have one or the other or both (or neither) of these
modifications running, by appropriately modifying the makefiles. Also,
the external interface isn't changed (other than being extended), i.e.,
a BIND server that allows dynamic updates and/or T_UNSPEC data can
still talk to a 'vanilla' server using the 'vanilla' operations.

The description that follows is broken into 3 parts: a functional
description of the dynamic update facility, a functional description of
the T_UNSPEC facility, and a discussion of the implementation of
dynamic updates. The implementation description is mostly intended for
those who want to make future enhancements (especially the addition of
a good authentication mechanism). If you make enhancements, I would be
interested in hearing about them.


1. Dynamic Update Facility

I added this code in conjunction with my research into naming in large
heterogeneous systems. For the purposes of this research, I ignored
security issues. In other words, no authentication/authorization
mechanism exists to control updates. Authentication will hopefully be
addressed at some future point (although probably not by me). In the
meantime, BIND Internet name servers (as opposed to "private" name
server networks operating with their own port numbers, as I use in my
research) should be compiled *without* -DALLOW_UPDATES, so that the
integrity of the Internet name database won't be compromised by this
code.

There are 5 different dynamic update interfaces:

	UPDATEA  - add a resource record
	UPDATED  - delete a specific resource record
	UPDATEDA - delete all named resource records
	UPDATEM  - modify a specific resource record
	UPDATEMA - modify all named resource records

These all work through the normal resolver interface, i.e., these
interfaces are opcodes, and the data in the buffers passed to
res_mkquery must conform to what is expected for the particular
operation (see the #ifdef ALLOW_UPDATES extensions to nstest.c for
example usage).
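
As a rough illustration (a hypothetical sketch, not the actual
nstest.c code; the wrapper name and the assumption that "data" is
already laid out as the server expects are mine), an add request can
be built with the standard res_mkquery()/res_send() calls:

	#include <sys/types.h>
	#include <netinet/in.h>
	#include <arpa/nameser.h>
	#include <resolv.h>

	static char msg[PACKETSZ], answer[PACKETSZ];

	/*
	 * Build and send an UPDATEA request for "dname".  The update
	 * opcode goes where QUERY normally would; "data"/"size" must
	 * conform to what the server's update code expects for this
	 * operation (nstest.c is the authoritative example).
	 */
	int
	add_rr(dname, type, data, size)
		char *dname, *data;
		int type, size;
	{
		int n;

		n = res_mkquery(UPDATEA, dname, C_IN, type, data, size,
		    (struct rrec *)0, msg, sizeof(msg));
		if (n < 0)
			return (-1);
		return (res_send(msg, n, answer, sizeof(answer)));
	}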

UPDATEM is logically equivalent to an UPDATED followed by an UPDATEA,
except that the updates occur atomically at the primary server (as
usual with Domain servers, secondaries may become temporarily
inconsistent). The difference between UPDATED and UPDATEDA is that the
latter allows you to delete all RRs associated with a name; similarly
for UPDATEM and UPDATEMA. The reason for the UPDATE{D,M}A interfaces
is two-fold:

1. Sometimes you want to delete/modify some data, but you know you'll
   only have a single RR for that data; in such a case, it's more
   convenient to delete/modify the RR by just giving the name;
   otherwise, you would have to first look it up and then
   delete/modify it.

2. It is sometimes useful to be able to delete/modify multiple RRs
   this way, since one can then perform the operation atomically.
   Otherwise, one would have to delete/modify the RRs one by one.

One additional point to note about UPDATEMA is that it will return a
success status if there were *zero* or more RRs associated with the
given name (and the RR add succeeds), whereas UPDATEM, UPDATED, and
UPDATEDA return a success status only if there were *one* or more RRs
associated with the given name. The reason for the difference is to
handle the (probably common) case where what you want is to set a
particular name to contain a single RR, irrespective of whether or not
it was already set.
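
To make that "set" usage concrete (same hypothetical framing, reusing
the msg/answer buffers declared for add_rr() above):

	/*
	 * Set "dname" to hold exactly one RR, whether or not any RRs
	 * existed before.  UPDATEMA succeeds on zero or more existing
	 * RRs, which is precisely the "set" semantics; add_rr()'s
	 * UPDATEA would merely add alongside whatever is there.
	 */
	int
	set_rr(dname, type, data, size)
		char *dname, *data;
		int type, size;
	{
		int n;

		n = res_mkquery(UPDATEMA, dname, C_IN, type, data, size,
		    (struct rrec *)0, msg, sizeof(msg));
		if (n < 0)
			return (-1);
		return (res_send(msg, n, answer, sizeof(answer)));
	}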


2. T_UNSPEC Facility

Type T_UNSPEC allows you to store data whose layout BIND doesn't
understand. Data of this type is not marshalled by BIND (i.e., not
converted between host and network representation, as is done, for
example, with Internet addresses), so it is up to the client to make
sure things work out w.r.t. heterogeneous data representations. The
way I use this type is to have the client marshal data, store it,
retrieve it, and demarshal it. This way I can store arbitrary data in
BIND without having to add new code for each specific type.
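
For example (a hypothetical sketch; the struct and its encoding are
mine, not part of BIND), a client might marshal a record into network
byte order before storing it under T_UNSPEC:

	#include <sys/types.h>
	#include <netinet/in.h>
	#include <string.h>

	struct mydata {			/* hypothetical client record */
		u_long	version;
		u_long	value;
	};

	/*
	 * Marshal "d" into "buf" in network byte order.  BIND never
	 * looks inside the buffer, so the client alone defines the
	 * encoding.  Returns the number of bytes written.
	 */
	int
	marshal_mydata(d, buf)
		struct mydata *d;
		char *buf;
	{
		u_long v;

		v = htonl(d->version);
		memcpy(buf, (char *)&v, sizeof(v));
		v = htonl(d->value);
		memcpy(buf + sizeof(v), (char *)&v, sizeof(v));
		return (2 * sizeof(u_long));
	}

The resulting buffer could then be stored with, e.g., the add_rr()
sketch from section 1 using type T_UNSPEC, and demarshalled with
ntohl() after retrieval.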

T_UNSPEC data is dumped in an ASCII-encoded, checksummed format so
that, although it's not human-readable, it at least doesn't fill the
dump file with unprintable characters.

Type T_UNSPEC is important for my research environment, where
potentially lots of people want to store data in the name service, and
each person's data looks different. Instead of having BIND understand
the format of each of their data types, the clients define marshalling
routines and pass buffers of marshalled data to BIND. BIND never tries
to demarshal the data; it just holds on to it and gives it back when
the client requests it, at which point the client demarshals it.

The Xerox Network System's name service (the Clearinghouse) works this
way. The reason 'vanilla' BIND understands the format of all the data
it holds is probably that BIND is tailored for a very specific
application and wants to make sure the data it holds makes sense (and,
for some types, BIND needs to take additional action depending on the
data's semantics). For more general-purpose name services (like the
Clearinghouse and my usage of BIND), this approach is less tractable.

See the #ifdef ALLOW_T_UNSPEC extensions to nstest.c for example usage
of this type.


3. Dynamic Update Implementation Description

This section is divided into 3 subsections: General Discussion,
Miscellaneous Points, and Known Defects.


3.1 General Discussion

The basic scheme is this: When an update message arrives, a call is
made to InitDynUpdate, which first looks up the SOA record for the zone
the update affects. If this is the primary server for that zone, we do
the update and then update the zone serial number (so that secondaries
will refresh later). If this is a secondary server, we forward the
update to the primary, and if that's successful, we update our copy
afterwards. If it's neither, we refuse the update. (One might think
to propagate the update to some other authoritative server; I figured
that updates will most likely occur within an administrative domain
anyway; this could be changed if someone has strong feelings about
it.)
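
In outline (a paraphrase of the scheme above, not the actual source;
the types and helper names are invented for illustration):

	enum role { PRIMARY, SECONDARY, NEITHER };

	extern enum role zone_role();		/* our role for the zone */
	extern int apply_update();		/* perform the update locally */
	extern int forward_to_primary();	/* ship update to the primary */
	extern void bump_serial();		/* so secondaries refresh later */

	int
	dyn_update(zone, rr, len)
		char *zone, *rr;
		int len;
	{
		switch (zone_role(zone)) {
		case PRIMARY:
			if (apply_update(zone, rr, len) < 0)
				return (-1);
			bump_serial(zone);
			return (0);
		case SECONDARY:
			/* forward first; update our copy only on success */
			if (forward_to_primary(zone, rr, len) < 0)
				return (-1);
			return (apply_update(zone, rr, len));
		default:
			return (-1);	/* not authoritative: refuse */
		}
	}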

Note that this mechanism disallows updates when the primary is down,
preserving the Domain scheme's consistency requirements, but making
the primary a critical point for updates. This seemed reasonable to me
because:

1. Alternative schemes must deal with potentially complex situations
   involving merging of inconsistent secondary updates.

2. Updates are presumed to be rare relative to read accesses, so this
   increased restrictiveness for updates over reads is probably not
   critical.

I have placed comments throughout the code, so it shouldn't be too
hard to see what I did. The majority of the processing is in
doupdate() and InitDynUpdate(). Also, I added a field to the zone
struct, to keep track of when zones get updated, so that only changed
zones get checkpointed.


3.2 Miscellaneous Points

I use ns_maint to call zonedump() if the database changes, to provide
a checkpointing mechanism. I use the zone refresh times to set up
ns_maint interrupts if there are either secondaries or primaries.
Hence, if there is a secondary, this interrupt can cause zoneref (as
before), and if there is a primary, this interrupt can cause doadump.
I also checkpoint if needed before shutting down.
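
In outline (again with invented names, not the actual routines):

	struct zone { int secondary; int changed; };

	extern void zoneref();	/* refresh zone from its primary */
	extern void zonedump();	/* checkpoint zone to its boot file */

	/*
	 * Hypothetical sketch of a maint interrupt: secondary zones
	 * get refreshed as before; primary zones get checkpointed,
	 * but only if the dynamic-update code has changed them.
	 */
	void
	maint_sketch(zones, nzones)
		struct zone *zones;
		int nzones;
	{
		int i;

		for (i = 0; i < nzones; i++) {
			if (zones[i].secondary)
				zoneref(&zones[i]);
			else if (zones[i].changed)
				zonedump(&zones[i]);
		}
	}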

You can force a server to checkpoint any changed zones by sending the
maint signal (SIGALRM) to the process. Otherwise, it just checkpoints
during maint interrupts, or when being shut down (with SIGTERM).
Sending it the dump signal causes the database to be dumped into the
(single) dump file, but doesn't checkpoint (i.e., update the boot
files). Note that the boot files will be overwritten with checkpoint
files, so if you want to preserve the comments, you should keep copies
of the original boot files separate from the versions that are
actually used.
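
From another process, forcing a checkpoint looks like this (a trivial
sketch; how you obtain named's process id is up to you):

	#include <sys/types.h>
	#include <signal.h>

	/* Force named (process "pid") to checkpoint changed zones. */
	int
	force_checkpoint(pid)
		int pid;
	{
		return (kill(pid, SIGALRM));
	}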

I disallow T_SOA updates, for several reasons:

- T_SOA deletes at the primary won't be discovered by the secondaries
  until they try to request them at maint time, which will cause a
  failure.

- The corresponding NS record would have to be deleted at the same
  time (atomically) to avoid various problems.

- T_SOA updates would have to be done in the right order, or else the
  primary and secondaries would be out of sync for that zone.

My feeling is that changing the zone topology is a weighty enough
thing to do that it should involve changing the load file and
reloading all affected servers.

There are a lot of places where BIND exits due to catastrophic
failures (mainly malloc failures). I don't try to dump the database in
these places because it's probably inconsistent anyway. It's probably
better to depend on the most recent dump.


3.3 Known Defects

1. I put the following comment in nlookup (db_lookup.c):

	Note: at this point, if np->n_data is NULL, we could be in one
	of two situations: Either we have come across a name for which
	all the RRs have been (dynamically) deleted, or else we have
	come across a name which has no RRs associated with it because
	it is just a place holder (e.g., EDU). In the former case, we
	would like to delete the namebuf, since it is no longer of
	use, but in the latter case we need to hold on to it, so
	future lookups that depend on it don't fail. The only way I
	can see of doing this is to always leave the namebufs around
	(although then the memory usage continues to grow whenever
	names are added, and can never shrink back down completely
	when all their associated RRs are deleted).

   Thus, there is a problem that memory usage will keep growing in the
   situation described. You might just choose to ignore this problem
   (since I don't see any good way out), since things probably won't
   grow fast anyway (how many names are created and then deleted
   during a single server incarnation, after all?).

   The problem is that one can't delete old namebufs: one would want
   to do it from db_update, but db_update calls nlookup to do the
   actual work, and the deletion can't happen there, since we need to
   maintain place holders. One could make db_update not call nlookup,
   so that we know it's ok to delete the namebuf (since we know the
   call is part of a delete operation); but then the 2 routines would
   contain a lot of overlapping functionality.

   This also causes another problem: If you create a name and then do
   an UPDATEDA, all its RRs get deleted, but the name remains; then,
   if you do a lookup on that name later, the name is found in the
   hash table, but no RRs are found for it. The server then forwards
   the query to itself (for some reason), somehow decides there is no
   such domain, and returns (with the correct answer, but after going
   through extra work). But the name remains, and each time it is
   looked up, we go through these same steps. This should be fixed,
   but I don't have time right now (and the right answer seems to come
   back anyway, so it's good enough for now).

2. There are 2 problems that crop up when you store data (other than
   T_SOA and T_NS records) in the root:

   a. Can't get the primary to doaxfr RRs other than SOA and NS to a
      secondary.

   b. Upon checkpoint (zonedump), this data sometimes comes out after
      other data in the root, so that (since the SOA and NS records
      have null names) they will get interpreted as records under the
      other names upon the next boot. For example, if you have a T_A
      record called ABC, the checkpoint may look like:

	$ORIGIN .
	ABC	IN	A	128.95.1.3
		99999999	IN	NS	UW-BORNEO.
			IN	SOA	UW-BORNEO. SCHWARTZ.CS.WASHINGTON.EDU.
			( 50 3600 300 3600000 3600 )

      Then, when booting up the next time, the SOA and NS records get
      interpreted as being called "ABC" rather than the null root
      name.

3. The secondary server caches the T_A RR for the primary, and hence,
   when it tries to ns_forw an update, it won't find the address of
   the primary using nslookup unless that T_A RR is *also* stored in
   the main hashtable (by putting it in a named.db file as well as the
   named.ca file).