Give the HZ/overflow check a 10% margin.
Eliminate bogus newline.
If timecounters have equal quality, prefer higher frequency.
Some inspiration from: bde
represents the pruely stylistic changes and should have no net impact
on the rest of the code.
bde's more substantive changes will follow in a separate commit once
we've come to closure on them.
Submitted by: bde
ntp_update_second twice when we have a large step in case that step
goes across a scheduled leap second. The only way this could happen
would be if we didn't call tc_windup over the end of day on the day of
a leap second, which would only happen if timeouts were delayed for
seconds. While it is an edge case, it is an important one to get
right for my employer.
Sponsored by: Timing Solutions Corporation
A timecounter will be selected when registered if its quality is
not negative and no less than the current timecounters.
Add a sysctl to report all available timecounters and their qualities.
Give the dummy timecounter a solid negative quality of minus a million.
Give the i8254 zero and the ACPI 1000.
The TSC gets 800, unless APM or SMP forces it negative.
Other timecounters default to zero quality and thereby retain current
selection behaviour.
Before, we would add/subtract the leap second when the system had been
up for an even multiple of days, rather than at the end of the day, as
a leap second is defined (at least wrt ntp). We do this by
calculating the notion of UTC earlier in the loop, and passing that to
get it adjusted. Any adjustments that ntp_update_second makes to this
time are then transferred to boot time. We can't pass it either the
boot time or the uptime because their sum is what determines when a
leap second is needed. This code adds an extra assignment and two
extra compare in the typical case, which is as cheap as I could made
it.
I have confirmed with this code the kernel time does the correct thing
for both positive and negative leap seconds. Since the ntp interface
doesn't allow for +2 or -2, those cases can't be tested (and the folks
in the know here say there will never be a +2s or -2s leap event, but
rather two +1s or -1s leap events).
There will very likely be no leap seconds for a while, given how the
earth is speeding up and slowing down, so there will be plenty of time
for this fix to propigate. UT1-UTC is currently at "about -0.4s" and
decrementing by .1s every 8 months or so. 6 * 8 is 48 months, or 4
years.
-stable has different code, but a similar bug that was introduced
about the time of the last leap second, which is why nobody has
noticed until now.
MFC After: 3 weeks
Reviewed by: phk
"Furthermore, leap seconds must die." -- Cato the Elder
potential discontinuities in our UTC timescale.
Applications can monitor this variable if they want to be informed
about steps in the timescale. Slews (ntp and adjtime(2)) and
frequency adjustments (ntp) will not increment this counter, only
operations which set the clock. No attempt is made to classify
size or direction of the step.
called. Otherwise (depending on a non-deterministic sort), the timecounter
code can be initialized before the clock rate has been set (on ia64) and it
assumes hz = 100, rather than the real value of 1024. I'm not sure how much
gets upset by this.
Glanced at by: phk
functions which run for several milliseconds at a time and getting
in queue behind one or more of those makes us miss our rewind.
Instead call it from hardclock() like we used to do, but retain the
prescaler so we still cope with high HZ values.
by other bits of code, split struct timecounter into two.
struct timecounter contains just the bits which pertains to the hardware
counter and the reading of it.
struct timehands (as in "the hands on a clock") contains all the ugly bit
fidling stuff. Statically compile ten timehands.
This commit is the functional part. A later cosmetic patch will rename
various variables and fieldnames.
timeout loop.
Limit the rate at which we wind the timecounters to approx 1000 Hz.
This limits the precision of the get{bin,nano,micro}[up]time(9)
functions to roughly a millisecond.
timecounter will be used starting at the next second, which is
good enough for sysctl purposes. If better adjustment is needed
the NTP PLL should be used.
Apply the change as a continuous slew rather than as a series of
discrete steps and make it possible to adjust arbitraryly huge
amounts of time in either direction.
In practice this is done by hooking into the same once-per-second
loop as the NTP PLL and setting a suitable frequency offset deducting
the amount slewed from the remainder. If the remaining delta is
larger than 1 second we slew at 5000PPM (5msec/sec), for a delta
less than a second we slew at 500PPM (500usec/sec) and for the last
one second period we will slew at whatever rate (less than 500PPM)
it takes to eliminate the delta entirely.
The old implementation stepped the clock a number of microseconds
every HZ to acheive the same effect, using the same rates of change.
Eliminate the global variables tickadj, tickdelta and timedelta and
their various use and initializations.
This removes the most significant obstacle to running timecounter and
NTP housekeeping from a timeout rather than hardclock.
our feet when we look inside timecounter structures.
Make the "sync_other" code more robust by never overwriting the
tc_next field.
Add counters for the bin[up]time functions.
Call tc_windup() in tc_init() and switch_timecounter() to make sure
we all the fields set right.
The binary format "bintime" is a 32.64 format, it will go to 64.64
when time_t does.
The bintime format is available to consumers of time in the kernel,
and is preferable where timeintervals needs to be accumulated.
This change simplifies much of the magic math inside the timecounters
and improves the frequency and time precision by a couple of bits.
I have not been able to measure a performance difference which was not
a tiny fraction of the standard deviation on the measurements.
HZ=BIGNUM will strain the assumptions behind timecounters to the
point where they break.
This may or may not help people seeing microuptime() backwards messages.
Make the global timecounter variable volatile, it makes no difference in
the code GCC generates, but it makes represents the intent correctly.
Thanks to: jdp
MFC after: 2 weeks
include:
* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)
* Per-CPU idle processes.
* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).
Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
Make the public interface more systematically named.
Remove the alternate method, it doesn't do any good, only ruins performance.
Add counters to profile the usage of the 8 access functions.
Apply the beer-ware to my code.
The weird +/- counts are caused by two repocopies behind the scenes:
kern/kern_clock.c -> kern/kern_tc.c
sys/time.h -> sys/timetc.h
(thanks peter!)
and extend. The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).
Agreed with in principle by: dufault
NOTE: This will break building ntpd until ntpd has been upgraded to also
support draft 05. People that want to build ntpd in the meantime can
get patches from me.
used for timecounting. The possible values are the names of the
physically present harware timecounters ("i8254" and "TSC" on i386's).
Fixed some nearby bitrot in comments in <sys/time.h>.
Reviewed by: phk
This code is backwards compatible with the older "microkernel" PLL, but
allows ntpd v4 to use nanosecond resolution. Many other improvements.
PPS_SYNC and hardpps() are NOT supported yet.
is the preparation step for moving pmap storage out of vmspace proper.
Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillion <dillon@apollo.backplane.com>
can set if your hw/sw produces the "calcru negative..." message.
Setting the alternate method (sysctl -w kern.timecounter.method=1)
makes the the get{nano|micro}*() functions call the real thing at
resulting in a measurable but minor overhead.
I decided to NOT have the "calcru" change the method automatically
because you should be aware of this problem if you have it.
The problems currently seen, related to usleep and a few other corners
are fixed for both methods.
out interrupts for too long. If you still see the "calcru: negative
time..." message you can increase NTIMECOUNTER (see LINT).
Sideeffect is that a timecounter is required to not wrap around in
less than (1 + delta) seconds instead of the (1/hz + delta) required
until now.
Many thanks to: msmith, wpaul, wosch & bde
If you have problems with the "calcru" messages and processes being
killed for excessive cpu time, try to increase the NTIMECOUNTER
#define and report your findings.
Clean up (or if antipodic: down) some of the msgbuf stuff.
Use an inline function rather than a macro for timecounter delta.
Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.
Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()
This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.
WARNING: Programs which muck about with struct proc in userland
will have to be fixed.
Reviewed, but found imperfect by: bde
* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.
* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.
* kill mono_time.
* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)
* nanosleep, select & poll takes long sleeps one day at a time
Reviewed by: bde
Tested by: ache and others