1999-12-09 13:01:21 +00:00

293 lines
16 KiB
HTML

<html><head><title>
Executive Summary - Computer Network Time Synchronization
</title></head><body><H3>
Executive Summary - Computer Network Time Synchronization
</h3>
<img align=left src=pic/alice12.gif>
from <i>Alice's Adventures in Wonderland</i>, by Lewis Carroll,
illustrations by Sir John Tenniel
<p>The executive is the one on the left.
<br clear=left><hr>
<h4>Introduction</h4>
<p>The standard timescale used by most nations of the world is Universal
Coordinated Time (UTC), which is based on the Earth's rotation about its
axis, and the Gregorian Calendar, which is based on the Earth's rotation
about the Sun. The UTC timescale is disciplined with respect to
International Atomic Time (TAI) by inserting leap seconds at intervals
of about 18 months. UTC time is disseminated by various means, including
radio and satellite navigation systems, telephone modems and portable
clocks.
<p>Special purpose receivers are available for many time-dissemination
services, including the Global Position System (GPS) and other services
operated by various national governments. For reasons of cost and
convenience, it is not possible to equip every computer with one of
these receivers. However, it is possible to equip some number of
computers acting as primary time servers to synchronize a much larger
number of secondary servers and clients connected by a common network.
In order to do this, a distributed network clock synchronization
protocol is required which can read a server clock, transmit the reading
to one or more clients and adjust each client clock as required.
Protocols that do this include the Network Time Protocol (NTP), Digital
Time Synchronization Protocol (DTSS) and others found in the literature
(See "Further Reading" at the end of this article.)
<h4>Protocol Design Issues</h4>
<p>The synchronization protocol determines the time offset of the server
clock relative to the client clock. The various synchronization
protocols in use today provide different means to do this, but they all
follow the same general model. On request, the server sends a message
including its current clock value or <i>timestamp</i> and the client
records its own timestamp upon arrival of the message. For the best
accuracy, the client needs to measure the server-client propagation
delay to determine its clock offset relative to the server. Since it is
not possible to determine the one-way delays, unless the actual clock
offset is known, the protocol measures the total roundtrip delay and
assumes the propagation times are statistically equal in each direction.
In general, this is a useful approximation; however, in the Internet of
today, network paths and the associated delays can differ significantly
due to the individual service providers.
<p>The community served by the synchronization protocol can be very
large. For instance, the NTP community in the Internet of 1998 includes
over 230 primary time servers, synchronized by radio, satellite and
modem, and well over 100,000 secondary servers and clients. In addition,
there are many thousands of private communities in large government,
corporate and institution networks. Each community is organized as a
tree graph or <i>subnet</i>, with the primary servers at the root and
secondary servers and clients at increasing hop count, or stratum level,
in corporate, department and desktop networks. It is usually necessary
at each stratum level to employ redundant servers and diverse network
paths in order to protect against broken software, hardware and network
links.
<p>Synchronization protocols work in one or more association modes,
depending on the protocol design. Client/server mode, also called
master/slave mode, is supported in both DTSS and NTP. In this mode, a
client synchronizes to a stateless server as in the conventional RPC
model. NTP also supports symmetric mode, which allows either of two peer
servers to synchronize to the other, in order to provide mutual backup.
DTSS and NTP support a broadcast mode which allows many clients to
synchronize to one or a few servers, reducing network traffic when large
numbers of clients are involved. In NTP, IP multicast can be used when
the subnet spans multiple networks.
<p>Configuration management can be a serious problem in large subnets.
Various schemes which index public databases and network directory
services are used in DTSS and NTP to discover servers. Both protocols
use broadcast modes to support large client populations; but, since
listen-only clients cannot calibrate the delay, accuracy can suffer. In
NTP, clients determine the delay at the time a server is first
discovered by polling the server in client/server mode and then
reverting to listen-only mode. In addition, NTP clients can broadcast a
special "manycast" message to solicit responses from nearby servers and
continue in client/server mode with the respondents.
<h4>Computer Clock Modelling and Error Analysis</h4>
Most computers include a quartz resonator-stabilized oscillator and
hardware counter that interrupts the processor at intervals of a few
milliseconds. At each interrupt, a quantity called <i>tick</i> is added
to a system variable representing the clock time. The clock can be read
by system and application programs and set on occasion to an external
reference. Once set, the clock readings increment at a nominal rate,
depending on the value of <i>tick</i>. Typical Unix system kernels
provide a programmable mechanism to increase or decrease the value of
<i>tick</i> by a small, fixed amount in order to amortize a given time
adjustment smoothly over multiple <i>tick</i> intervals.
<p>Clock errors are due to variations in network delay and latencies in
computer hardware and software (jitter), as well as clock oscillator
instability (wander). The time of a client relative to its server can be
expressed
<p><center><i>T</i>(<i>t</i>) = <i>T</i>(<i>t</i><sub>0</sub>) +
<i>R</i>(<i>t - t</i><sub>0</sub>) + 1/2 <i>D</i>(<i>t -
T</i><sub>0</sub>)<sup>2</sup>,</center>
<p>where <i>t</i> is the current time, <i>T</i> is the time offset at
the last measurement update <i>t</i><sub>0</sub>, <i>R</i> is the
frequency offset and <i>D</i> is the drift due to resonator ageing. All
three terms include systematic offsets that can be corrected and random
variations that cannot. Some protocols, including DTSS, estimate only
the first term in this expression, while others, including NTP, estimate
the first two terms. Errors due to the third term, while important to
model resonator aging in precision applications, are neglected, since
they are usually dominated by errors in the first two terms.
<p>The synchronization protocol estimates <i>T</i>(<i>t</i><sub>0</sub>)
(and <i>R</i>(<i>t</i><sub>0</sub>), where relevant) at regular
intervals <font face="symbol">t</font> and adjusts the clock to minimize
<i>T</i>(<i>t</i>) in future. In common cases, <i>R</i> can have
systematic offsets of several hundred parts-per-million (PPM) with
random variations of several PPM due to ambient temperature changes. If
not corrected, the resulting errors can accumulate to seconds per day.
In order that these errors do not exceed a nominal specification, the
protocol must periodically re-estimate <i>T</i> and <i>R</i> and
compensate for variations by adjusting the clock at regular intervals.
As a practical matter, for nominal accuracies of tens of milliseconds,
this requires clients to exchange messages with servers at intervals in
the order of tens of minutes.
<p>Analysis of quartz-resonator stabilized oscillators show that errors
are a function of the averaging time, which in turn depends on the
interval between corrections. At correction intervals less than a few
hundred seconds, errors are dominated by jitter, while, at intervals
greater than this, errors are dominated by wander. As explained later,
the characteristics of each regime determine the algorithm used to
discipline the clock. These errors accumulate at each stratum level from
the root to the leaves of the subnet tree. It is possible to quantify
these errors by statistical means, as in NTP. This allows real-time
applications to adjust audio or video playout delay, for example.
However, the required statistics may be different for various classes of
applications. Some applications need absolute error bounds guaranteed
never to exceeded, as provided by the following correctness principles.
<h4>Correctness Principles</h4>
<p>Applications requiring reliable time synchronization such as air
traffic control must have confidence that the local clock is correct
within some bound relative to a given timescale such as UTC. There is a
considerable body of literature that studies these issues with respect
to various failure models such as fail-stop and Byzantine disagreement.
While these models inspire much confidence in a theoretical setting,
most require multiple message rounds for each measurement and would be
impractical in a large computer network such as the Internet. However,
it can be shown that the worst-case error in reading a remote server
clock cannot exceed one-half the roundtrip delay measured by the client.
This is a valuable insight, since it permits strong statements about the
correctness of the timekeeping system.
<p>In the Probabilistic Clock Synchronization (PCS) scheme devised by
Cristian, a maximum error tolerance is established in advance and time
value samples associated with roundtrip delays that exceed twice this
value are discarded. By the above argument, the remaining samples must
represent time values within the specified tolerance. As the tolerance
is decreased, more samples fail the test until a point where no samples
survive. The tolerance can be adjusted for the best compromise between
the highest accuracy consistent with acceptable sample survival rate.
<p>In a scheme devised by Marzullo and exploited in NTP and DTSS, the
worst-case error determined for each server determines a correctness
interval. If each of a number of servers are in fact synchronized to a
common timescale, the actual time must be contained in the intersection
of their correctness intervals. If some intervals do not intersect, then
the clique containing the maximum number of intersections is assumed
correct <i>truechimers</i> and the others assumed incorrect
<i>false<i>tick</i>ers</i>. Only the truechimers are used to adjust the
system
clock.
<h4>Data Grooming Algorithms</h4>
By its very nature, clock synchronization is a continuous process,
resulting in a sequence of measurements with each of possibly several
servers and resulting in a clock adjustment. In some protocols, crafted
algorithms are used to improve the time and frequency estimates and
refine the clock adjustment. Algorithms described in the literature are
based on trimmed-mean and median filter methods. The clock filter
algorithm used in NTP is based on the above observation that the
correctness interval depends on the roundtrip delay. The algorithm
accumulates offset/delay samples in a window of several samples and
selects the offset sample associated with the minimum delay. In general,
larger window sizes provide better estimates; however, stability
considerations limit the window size to about eight.
<p>The same principle could be used when selecting the best subset of
servers and combining their offsets to determine the clock adjustment.
However, different servers often show different systematic offsets, so
the best statistic for the central tendency of the server population may
not be obvious. Various kinds of clustering algorithms have been found
useful for this purpose. The one used in NTP sorts the offsets by a
quality metric, then calculates the variance of all servers relative to
each server separately. The algorithm repeatedly discards the outlyer
with the largest variance until further discards will not improve the
residual variance or until a minimum number of servers remain. The final
clock adjustment is computed as a weighted average of the survivors.
<p>At the heart of the synchronization protocol is the algorithm used to
adjust the system clock in accordance with the final adjustment
determined by the above algorithms. This is called the clock discipline
algorithm or simply the discipline. Such algorithms can be classed
according to whether they minimize the time offset or frequency offset
or both. For instance, the discipline used in DTSS minimizes only the
time offset, while the one used in NTP minimizes both time and frequency
offsets. While the DTSS algorithm cannot remove residual errors due to
systematic frequency errors, the NTP algorithm is more complicated and
less forgiving of design and implementation mistakes.
<p>All clock disciplines function as a feedback loop, with measured
offsets used to adjust the clock oscillator phase and frequency to match
the external synchronization source. The behavior of feedback loops is
well understood and modelled by mathematical analysis. The significant
design parameter is the time constant, or responsiveness to external or
internal variations in time or frequency. Optimum selection of time
constant depends on the interval between update messages. In general,
the longer these intervals, the larger the time constant and vice versa.
In practice and with typical network configurations the optimal poll
intervals vary between one and twenty minutes for network paths to some
thousands of minutes for modem paths.
<h4>Further Reading</h4>
<ol>
<p><li>Cristian, F. Probabilistic clock synchronization. In Distributed
Computing 3, Springer Verlag, 1989, 146-158.</li>
<p><li>Digital Time Service Functional Specification Version T.1.0.5.
DigitalEquipment Corporation, 1989.</li>
<p><li>Gusella, R., and S. Zatti. TEMPO - A network time controller for
a distributed Berkeley UNIX system. IEEE Distributed Processing
Technical Committee Newsletter 6, NoSI-2 (June 1984), 7-15. Also in:
Proc. Summer 1984 USENIX (Salt Lake City, June 1984).</li>
<p><li>Kopetz, H., and W. Ochsenreiter. Clock synchronization in
distributed real-time systems. IEEE Trans. Computers C-36, 8 (August
1987), 933-939.</li>
<p><li>Lamport, L., and P.M. Melliar-Smith. Synchronizing clocks in the
presence of faults. JACM 32, 1 (January 1985), 52-78.</li>
<p><li>Marzullo, K., and S. Owicki. Maintaining the time in a
distributed system. ACM Operating Systems Review 19, 3 (July 1985), 44-
54.</li>
<p><li>Mills, D.L. Internet time synchronization: the Network Time
Protocol. IEEE Trans. Communications COM-39, 10 (October 1991), 1482-
1493. Also in: Yang, Z., and T.A. Marsland (Eds.). Global States and
Time in Distributed Systems, IEEE Press, Los Alamitos, CA, 91-102.</li>
<p><li>Mills, D.L. Modelling and analysis of computer network clocks.
Electrical Engineering Department Report 92-5-2, University of Delaware,
May 1992, 29 pp.</li>
<p><li>NIST Time and Frequency Dissemination Services. NBS Special
Publication432 (Revised 1990), National Institute of Science and
Technology, U.S. Department of Commerce, 1990.</li>
<p><li>Schneider, F.B. A paradigm for reliable clock synchronization.
Department of Computer Science Technical Report TR 86-735, Cornell
University, February 1986.</li>
<p><li>Srikanth, T.K., and S. Toueg. Optimal clock synchronization. JACM
34, 3 (July 1987), 626-645.</li>
<p><li>Stein, S.R. Frequency and time - their measurement and
characterization (Chapter 12). In: E.A. Gerber and A. Ballato (Eds.).
Precision Frequency Control, Vol. 2, Academic Press, New York 1985, 191-
232, 399-416. Also in: Sullivan, D.B., D.W. Allan, D.A. Howe and F.L.
Walls (Eds.). Characterization of Clocks and Oscillators. National
Institute of Standards and Technology Technical Note 1337, U.S.
Government Printing Office (January, 1990), TN61-TN119.</li>
</ol>
<hr><a href=index.htm><img align=left src=pic/home.gif></a><address><a
href=mailto:mills@udel.edu> David L. Mills &lt;mills@udel.edu&gt;</a>
</address></a></body></html>