<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="GENERATOR" CONTENT="Mozilla/4.01 [en] (Win95; I) [Netscape]">
<TITLE>Mitigation Rules and the ``prefer'' Keyword
</TITLE>
</HEAD>
<BODY>
<H3>
Mitigation Rules and the <TT>prefer</TT> Keyword</H3>
<HR>
<H4>
Introduction</H4>
The mechanics of the NTP algorithms which select the best data sample from
each available peer and the best subset of the peer population have been
finely crafted to resist network jitter, faults in the network or peer
operations, and to deliver the best possible accuracy. Most of the time
these algorithms do a good job without requiring explicit manual tailoring
of the configuration file. However, there are times when the accuracy can
be improved by some careful tailoring. The following sections explain how
to do this using explicit configuration items and special signals, when
available, that are generated by some radio clocks.
<P>In order to provide robust backup sources, primary (stratum-1) servers
are usually operated in a diversity configuration, in which the server
operates with a number of remote peers in addition to one or more radio
or modem clocks operating as local peers. In these configurations the suite
of algorithms used in NTP to refine the data from each peer separately
and to select and weight the data from a number of peers are used with
the entire ensemble of remote peers and local peers. As the result of these
algorithms, a set of <I>survivors</I> is identified which can presumably
provide the most reliable and accurate time. Ordinarily, the individual
clock offsets of the survivors are combined on a weighted average basis
to produce an offset used to control the system clock.
<P>However, because of small but significant systematic time offsets between
the survivors, it is in general not possible to achieve the lowest jitter
and highest stability in these configurations. This happens because the
selection algorithm tends to <I>clockhop</I> between survivors of substantially
the same quality, but showing small systematic offsets between them. In
addition, there are a number of configurations involving pulse-per-second
(PPS) signals, modem backup services and other special cases, so that a
set of mitigation rules becomes necessary to select a single peer from
among the survivors. These rules are based on a set of special characteristics
of the various peers and reference clock drivers specified in the configuration
file.
<H4>
The <TT>prefer</TT> Peer</H4>
The mitigation rules are designed to provide an intelligent selection between
various peers of substantially the same statistical quality. They are designed
to provide the best quality time without compromising the normal operation
of the NTP algorithms. The mitigation scheme in its present form is not
an integral component of the NTP Version 3 specification RFC-1305, but
is to be included in the version 4 specification when it is published.
The scheme is based on the concept of <I>prefer peer</I>, which is specified
by including the <TT>prefer</TT> keyword with the associated <TT>server</TT>
or <TT>peer</TT> command in the configuration file. This keyword can be
used with any peer or server, but is most commonly used with a radio clock.
While the scheme does not forbid it, it does not seem useful to designate
more than one peer as preferred, since the additional complexities to mitigate
among them do not seem justified from on-air experience.
<P>The prefer scheme works on the set of peers that have survived the sanity
checks and intersection algorithms of the clock selection procedures. Ordinarily,
the members of this set can be considered <I>truechimers</I> and any one
of them could in principle provide correct time; however, due to various
error contributions, not all can provide the most accurate and stable time.
The job of the clustering algorithm, which is invoked at this point, is
to select the best subset of the survivors providing the least variance
in the combined ensemble average, compared to the variance in each member
of the subset separately. The detailed operation of the clustering algorithm,
which is given in the specification, is not important here, other than
to point out it operates in rounds, where a survivor, presumably the worst
of the lot, is discarded in each round until one of several termination
conditions is met.
<P>In the prefer scheme the clustering algorithm is modified so that the
prefer peer is never discarded; on the contrary, its potential removal
becomes a termination condition. If the original algorithm were about to
toss out the prefer peer, the algorithm terminates right there. The prefer
peer can still be discarded by the sanity checks and intersection algorithms,
of course, but it will always survive the clustering algorithm. If it does
not survive or for some reason it fails to provide updates, it will eventually
become unreachable and the clock selection will remitigate to select the
next best source.
<P>Along with this behavior, the clock selection procedures are modified
so that the combining algorithm is not used when a prefer peer is present.
Instead, the offset of the prefer peer is used exclusively as the synchronization
source. In the usual case involving a radio clock and a flock of remote
stratum-1 peers, and with the radio clock designated a prefer peer, the
result is that the high quality radio time disciplines the server clock
as long as the radio itself remains operational and with valid time, as
determined from the remote peers, sanity checks and intersection algorithm.
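<P>The effect on the clock offset can be sketched in a few lines of Python.
This is an illustration only, not code from the NTP daemon. The <TT>Survivor</TT>
record, the <TT>system_offset</TT> function and the inverse-distance weights
are assumptions made here, since the text says only that the survivor offsets
are combined on a weighted average basis.
<PRE>
```python
from dataclasses import dataclass


@dataclass
class Survivor:
    name: str
    offset: float      # clock offset in seconds
    distance: float    # synchronization distance in seconds, must be positive
    prefer: bool = False


def system_offset(survivors):
    """Return the offset used to discipline the system clock.

    Assumes a non-empty list of survivors.
    """
    # A prefer peer among the survivors bypasses the combining algorithm.
    for s in survivors:
        if s.prefer:
            return s.offset
    # Otherwise combine all survivors. Inverse-distance weights are an
    # assumption of this sketch, not taken from the specification.
    total_weight = sum(1.0 / s.distance for s in survivors)
    return sum(s.offset / s.distance for s in survivors) / total_weight
```
</PRE>
With a radio clock marked as prefer in the survivor list, the function returns
the radio offset unchanged; without one, it returns the weighted average.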
<H4>
Peer Classification</H4>
In order to understand the effects of the various intricate schemes involved,
it is necessary to understand some arcane details on how the algorithms
decide on a synchronization source, when more than one source is available.
This is done on the basis of a set of explicit mitigation rules, which
define special classes of remote and local peers as a function of configuration
declarations and reference clock driver type:
<OL>
<LI>
The prefer peer is designated using the <TT>prefer</TT> keyword with the
<TT>server</TT> or <TT>peer</TT> commands. All other things being equal,
this peer will be selected for synchronization over all other survivors
of the clock selection procedures.</LI>
<BR>&nbsp;
<LI>
When a PPS signal is connected via the PPS Clock Discipline driver (type
22), this is called the <I>PPS peer</I>. This driver provides precision
clock corrections only within one second, so is always operated in conjunction
with another peer or reference clock driver, which provides the seconds
numbering. The PPS peer is active only under conditions explained below.</LI>
<BR>&nbsp;
<LI>
When the Undisciplined Local Clock driver (type 1) is configured, this
is called the <I>local clock peer</I>. This is used either as a backup
reference source (stratum greater than zero), should all other synchronization
sources fail, or as the primary reference source (stratum zero) in cases
where the kernel time is disciplined by some other means of synchronization,
such as the NIST <TT>lockclock</TT> scheme, or another synchronization
protocol, such as the Digital Time Synchronization Service (DTSS).</LI>
<BR>&nbsp;
<LI>
When a modem driver such as the Automated Computer Time Service driver
(type 18) is configured, this is called the <I>modem peer</I>. This is
used either as a backup reference source, should all other primary sources
fail, or as the (only) primary reference source.</LI>
<BR>&nbsp;
<LI>
Where support is available, the PPS signal may be processed directly by
the kernel, as described in the <A HREF="kern.htm">A Kernel Model for Precision
Timekeeping</A> page. This is called the <I>kernel discipline</I>. The
PPS signal can discipline the kernel in both frequency and time. The frequency
discipline is active as long as the PPS interface device and signal itself
are operating correctly, as determined by the kernel algorithms. The time
discipline is active only under conditions explained below.</LI>
</OL>
Reference clock drivers operate in the manner described in the <A HREF="refclock.htm">Reference
Clock Drivers</A> page and its dependencies. The drivers are ordinarily
operated at stratum zero, so that as the result of ordinary NTP operations,
the server itself operates at stratum one, as required by the NTP specification.
In some cases described below, the driver is intentionally operated at
an elevated stratum, so that it will be selected only if no other survivor
is present with a lower stratum. In the case of the PPS peer or kernel
time discipline, these sources appear active only if the prefer peer has
survived the intersection and clustering algorithms, as described below,
and its clock offset relative to the current local clock is less than a
specified value, currently 128 ms.
<P>The modem clock drivers are a special case. Ordinarily, the update interval
between modem calls to synchronize the system clock is many times longer
than the interval between polls of either the remote or local peers. In
order to provide the best stability, the operation of the clock discipline
algorithm changes gradually from a phase-lock mode at the shorter update
intervals to a frequency-lock mode at the longer update intervals. If remote
or local peers are operated together with a modem peer in the same
configuration, the clock selection algorithm may first select one or more
remote/local peers, and the clock discipline algorithm
will optimize for the shorter update intervals. Then, the selection algorithm
can select the modem peer, which requires a much different optimization.
The intent of the design is for the modem peer to control the system
clock either when no other source is available or, if the modem peer is
marked as prefer, at all times, as long as it
passes the sanity checks and intersection algorithm. There still is room
for suboptimal operation in this scheme, since a noise spike can still
cause a clockhop either way. Nevertheless, the optimization function is
slow to adapt, so that a clockhop or two does not cause much harm.
<P>The local clock driver is another special case. Normally, this driver
is eligible for selection only if no other source is available. When selected,
vernier adjustments introduced via the configuration file or remotely using
the <TT><A HREF="ntpdc.htm">ntpdc</A> </TT>program can be used to trim
the local clock frequency and time. However, if the local clock driver
is designated the prefer peer, this driver is always selected and all other
sources are ignored. This behavior is intended for use when the kernel
time is controlled by some means external to NTP, such as the NIST <TT>lockclock</TT>
algorithm or another time synchronization protocol such as DTSS.
In this case the only way to disable the local clock driver is to mark
it unsynchronized using the leap indicator bits. In the case of modified
kernels with the <TT>ntp_adjtime()</TT> system call, this can be done automatically
if the external synchronization protocol uses it to discipline the kernel
time.
<H4>
Mitigation Rules</H4>
The mitigation rules apply in the intersection and clustering algorithms
described in the NTP specification. The intersection algorithm first scans
all peers with a persistent association and includes only those that satisfy
specified sanity checks. In addition to the checks required by the specification,
the mitigation rules require either the local-clock peer or modem peer
to be included only if marked as the prefer peer. The intersection algorithm
operates on the included population to select only those peers believed
to represent the correct time. If one or more peers survive the operation,
processing continues in the clustering algorithm. Otherwise, if there is
a modem peer, it is declared the only survivor; otherwise, if there is
a local-clock peer, it is declared the only survivor. Processing then continues
in the clustering algorithm.
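<P>The inclusion rule and the fallback order described above can be sketched
as follows. This is a simplified illustration under stated assumptions: the
<TT>Peer</TT> record, the <TT>kind</TT> labels and the <TT>intersect</TT>
callback, which stands in for the real intersection algorithm, are all hypothetical
names introduced here.
<PRE>
```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    kind: str = "remote"   # "remote", "radio", "modem" or "local" (labels assumed)
    prefer: bool = False
    sane: bool = True      # outcome of the specification's sanity checks


def included(peer):
    """Apply the sanity checks plus the extra mitigation requirement."""
    if not peer.sane:
        return False
    # Modem and local-clock peers take part only when marked prefer.
    if peer.kind in ("modem", "local"):
        return peer.prefer
    return True


def survivors_after_intersection(peers, intersect):
    """Return the population handed to the clustering algorithm.

    The intersect argument is a stand-in for the real intersection algorithm.
    """
    survivors = intersect([p for p in peers if included(p)])
    if survivors:
        return survivors
    # No survivors: fall back to the modem peer, then the local-clock peer.
    # The text does not say whether further checks apply at this point.
    for kind in ("modem", "local"):
        for p in peers:
            if p.kind == kind:
                return [p]
    return []
```
</PRE>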
<P>The clustering algorithm repeatedly discards outliers in order to reduce
the residual jitter in the survivor population. As required by the NTP
specification, these operations continue until either a specified minimum
number of survivors remain or the minimum select dispersion of the population
is greater than the maximum peer dispersion of any member. The mitigation
rules require an additional terminating condition which stops these operations
at the point where the prefer peer is about to be discarded.
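<P>A minimal sketch of the added terminating condition is shown here. It is
not the daemon's implementation: the <TT>cluster</TT> function and the <TT>worst_of</TT>
callback are hypothetical, and the dispersion-based termination test of the
specification is deliberately left out to keep the example short.
<PRE>
```python
def cluster(survivors, min_survivors, worst_of):
    """Discard outliers until a termination condition is met.

    survivors: list of dicts, each with at least a "prefer" flag.
    worst_of: hypothetical callback returning the candidate that the
    unmodified algorithm would discard next.
    The dispersion-based termination test is omitted from this sketch.
    """
    remaining = list(survivors)
    while len(remaining) > min_survivors:
        victim = worst_of(remaining)
        if victim["prefer"]:
            break  # added rule: stop rather than discard the prefer peer
        remaining.remove(victim)
    return remaining
```
</PRE>
In this sketch the prefer peer always remains in the returned list, along
with any candidates that had not yet been discarded when the loop stopped.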
<P>The mitigation rules establish the choice of <I>system peer</I>, which
determines the stratum, reference identifier and several other system variables
which are visible to clients of the local server. In addition, they establish
which source or combination of sources controls the local clock.
<OL>
<LI>
If there is a prefer peer and it is the local-clock peer or the modem peer;
or, if there is a prefer peer and the kernel time discipline is active,
choose the prefer peer as the system peer and its offset as the system
clock offset. If the prefer peer is the local-clock peer, an offset can
be calculated by the driver to produce a frequency offset in order to correct
for systematic frequency errors. In case a source other than NTP is controlling
the system clock, corrections determined by NTP can be ignored by using
the <TT>disable pll</TT> command in the configuration file. If the prefer peer
is the modem peer, it must be the primary source for the reasons noted
above. If the kernel time discipline is active, the system clock offset
is ignored and the corrections handled directly by the kernel.</LI>
<LI>
If the above is not the case and there is a PPS peer, then choose it as
the system peer and its offset as the system clock offset.</LI>
<LI>
If the above is not the case and there is a prefer peer (not the local-clock
or modem peer in this case), then choose it as the system peer and its
offset as the system clock offset.</LI>
<LI>
If the above is not the case and the peer previously chosen as the system
peer is in the surviving population, then choose it as the system peer
and average its offset along with the other survivors to determine the
system clock offset. This behavior is designed to avoid excess jitter due
to clockhopping, when switching the system peer would not materially improve
the time accuracy.</LI>
<LI>
If the above is not the case, then choose the first candidate in the list
of survivors ranked in order of synchronization distance and average its
offset along with the other survivors to determine the system clock offset.
This is the default case and the only case considered in the current NTP
specification.</LI>
</OL>
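<P>The five rules above can be read as an ordered decision chain. The following
Python sketch shows that ordering only. The <TT>Candidate</TT> record, the
<TT>kind</TT> labels and the <TT>choose_system_peer</TT> function are names
assumed for illustration and do not correspond to identifiers in the NTP daemon.
<PRE>
```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    offset: float
    distance: float        # synchronization distance
    kind: str = "remote"   # "remote", "radio", "pps", "modem" or "local" (assumed)
    prefer: bool = False


def choose_system_peer(survivors, previous_name=None, kernel_pps_active=False):
    """Return (system peer, rule number, True if survivor offsets are averaged).

    Assumes a non-empty list of survivors.
    """
    prefer = next((c for c in survivors if c.prefer), None)
    pps = next((c for c in survivors if c.kind == "pps"), None)
    previous = next((c for c in survivors if c.name == previous_name), None)
    # Rule 1: prefer peer is local-clock or modem, or kernel discipline active.
    if prefer is not None and (prefer.kind in ("local", "modem") or kernel_pps_active):
        return prefer, 1, False
    # Rule 2: a PPS peer is among the survivors.
    if pps is not None:
        return pps, 2, False
    # Rule 3: any other prefer peer.
    if prefer is not None:
        return prefer, 3, False
    # Rule 4: keep the previous system peer to avoid clockhopping.
    if previous is not None:
        return previous, 4, True
    # Rule 5: default, the survivor with the smallest synchronization distance.
    return min(survivors, key=lambda c: c.distance), 5, True
```
</PRE>
The third element of the result indicates whether the system clock offset
is the average over the survivors (rules 4 and 5) or the offset of the chosen
peer alone (rules 1 through 3).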
<H4>
Using the Pulse-per-Second (PPS) Signal</H4>
Most radio clocks are connected using a serial port operating at speeds
of 9600 bps or higher. The accuracy using typical timecode formats, where
the on-time epoch is indicated by a designated ASCII character, like carriage-return
<TT>&lt;cr&gt;</TT>, is limited to a millisecond at best and a few milliseconds
in typical cases. However, some radios produce a PPS signal which can be
used to improve the accuracy with typical workstation servers to the order
of a few tens of microseconds. The details of how this can be accomplished
are discussed in the <A HREF="pps.htm">Pulse-per-second (PPS) Signal Interfacing</A>
page. The following paragraphs discuss how the PPS signal is affected by
the mitigation rules.
<P>First, it should be pointed out that the PPS signal is inherently ambiguous,
in that it provides a precise seconds epoch, but does not provide a way
to number the seconds. In principle and most commonly, another source of
synchronization, either the timecode from an associated radio clock, or
even one or more remote NTP servers, is available to perform that function.
In all cases, a specific, configured peer or server must be designated
as associated with the PPS signal. This is done using the <TT>prefer</TT>
keyword as described previously. The PPS signal can be associated in this
way with any peer, but is most commonly used with the radio clock generating
the PPS signal.
<P>The PPS signal can be used in two ways to discipline the local clock,
one using a special PPS driver described in the <A HREF="driver22.htm">PPS
Clock Discipline</A> page, the other using PPS signal support in the kernel,
as described in the <A HREF="kern.htm">A Kernel Model for Precision Timekeeping</A>
page. In either case, the signal must be present and within nominal jitter
and wander error tolerances. In addition, two conditions must hold. First, the
associated prefer peer must have survived the sanity checks and intersection
algorithms, and its dispersion must have settled below 1 s. This ensures that
the radio clock hardware is operating
correctly and that, presumably, the PPS signal is operating correctly as
well. Second, the absolute offset of the local clock from that peer must
be less than 128 ms, or well within the 0.5-s unambiguous range of the
PPS signal itself. In the case of the PPS driver, the time offsets generated
from the PPS signal are propagated via the clock filter to the clock selection
procedures just like any other peer. Should these pass the sanity checks
and intersection algorithms, they will show up along with the offsets of
the prefer peer itself. Note that, unlike the prefer peer, the PPS peer
samples are not protected from discard by the clustering algorithm. These
complicated procedures ensure that the PPS offsets developed in this way
are the most accurate and reliable available for synchronization.
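<P>The conditions under which the PPS signal may be used can be collected
into a single predicate, sketched here in Python. The function name <TT>pps_usable</TT>
and its parameters are hypothetical; only the 1-s dispersion limit and the
128-ms offset limit come from the text above.
<PRE>
```python
MAX_PREFER_OFFSET = 0.128      # seconds; the 128 ms limit given in the text
MAX_PREFER_DISPERSION = 1.0    # seconds; dispersion must settle below this


def pps_usable(signal_ok, prefer_survived, prefer_dispersion, prefer_offset):
    """Return True when the PPS signal may discipline the local clock.

    signal_ok: signal present and within jitter and wander tolerances.
    prefer_survived: prefer peer passed sanity checks and intersection.
    prefer_dispersion, prefer_offset: values for the prefer peer, in seconds.
    """
    if not (signal_ok and prefer_survived):
        return False
    if prefer_dispersion >= MAX_PREFER_DISPERSION:
        return False
    # The absolute offset must stay under the limit.
    return MAX_PREFER_OFFSET > abs(prefer_offset)
```
</PRE>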
<P>The PPS peer remains active as long as it survives the intersection
algorithm and the prefer peer is reachable; however, like any other clock
driver, it runs a reachability algorithm on the PPS signal itself. If for
some reason the signal fails or displays gross errors, the PPS peer will
either become unreachable or stray out of the survivor population. In this
case the clock selection remitigates as described above.
<P>When kernel support for the PPS signal is available, the PPS signal
is interfaced to the kernel serial driver code via a modem control lead.
As the PPS signal is derived from external equipment, cables, etc., which
sometimes fail, a good deal of error checking is done in the kernel to
detect signal failure and excessive noise. The way in which the mitigation
rules affect the kernel discipline is as follows.
<P>In order to operate, the kernel support must be enabled by the <TT>enable
pll </TT>command in the configuration file and the signal must be present
and within nominal jitter and wander error tolerances. In the NTP daemon,
the PPS discipline is active only when the prefer peer is among the survivors
of the clustering algorithm, and its absolute offset is within 128 ms,
as in the PPS driver. Under these conditions the kernel disregards updates
produced by the NTP daemon and uses its internal PPS source instead. The
kernel maintains a watchdog timer for the PPS signal; if the signal has
not been heard or is out of tolerance for more than some interval, currently
two minutes, the kernel discipline is declared inoperable and operation
continues as if it were not present.
<HR>
<ADDRESS>
David L. Mills (mills@udel.edu)</ADDRESS>
</BODY>
</HTML>