freebsd-dev

Author	SHA1	Message	Date
Ed Schouten	fd0f59709d	Add labels to sysctls related to clocks. Sysctls like kern.eventtimer.et.*.quality currently embed the name of the clock device. This is problematic for the Prometheus metrics exporter for two reasons: - Some of those clocks have dashes in their names, which Prometheus doesn't allow to be used in metric names. - It doesn't allow for extracting the same property of all clocks on the system from within a single query. Attach these nodes to have a label, so that the Prometheus metrics exporter gives these metric a uniform name with the name of the clock attached as a label. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D8775	2016-12-14 12:56:58 +00:00
Konstantin Belousov	1680854946	Implement userspace gettimeofday(2) with HPET timecounter. Right now, userspace (fast) gettimeofday(2) on x86 only works for RDTSC. For older machines, like Core2, where RDTSC is not C2/C3 invariant, and which fall to HPET hardware, this means that the call has both the penalty of the syscall and of the uncached hw behind the QPI or PCIe connection to the sought bridge. Nothing can me done against the access latency, but the syscall overhead can be removed. System already provides mappable /dev/hpetX devices, which gives straight access to the HPET registers page. Add yet another algorithm to the x86 'vdso' timehands. Libc is updated to handle both RDTSC and HPET. For HPET, the index of the hpet device to mmap is passed from kernel to userspace, index might be changed and libc invalidates its mapping as needed. Remove cpu_fill_vdso_timehands() KPI, instead require that timecounters which can be used from userspace, to provide tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64 libc/<arch>/sys/__vdso_gettc.c into one source file in the new libc/x86/sys location. __vdso_gettc() internal interface is changed to move timecounter algorithm detection into the MD code. Measurements show that RDTSC even with the syscall overhead is faster than userspace HPET access. But still, userspace HPET is three-four times faster than syscall HPET on several Core2 and SandyBridge machines. Tested by: Howard Su <howard0su@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7473	2016-08-17 09:52:09 +00:00
Konstantin Belousov	50c22263bb	Cache getbintime(9) answer in timehands, similarly to getnanotime(9) and getmicrotime(9). Suggested and reviewed by: bde (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 month	2016-07-30 09:25:57 +00:00
Konstantin Belousov	5760b029ee	Prevent parallel tc_windup() calls, both parallel top-level calls from setclock() and from simultaneous top-level and interrupt. For this, tc_windup() is protected with a tc_setclock_mtx spinlock, in the try mode when called from hardclock interrupt. If spinlock cannot be obtained without spinning from the interrupt context, this means that top-level executes tc_windup() on other core and our try may be avoided. The boottimebin and boottime variables should be adjusted from tc_windup(). To be correct, they must be part of the timehands and read using lockless protocol. Remove the globals and reimplement the getboottime(9)/getboottimebin(9) KPI using the timehands read protocol. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed wit: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:49:41 +00:00
Konstantin Belousov	4d29106e55	Style. Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:33:33 +00:00
Konstantin Belousov	a83c016f00	Reduce number of timehands to just two. This is useful because consumers can now be only one tc_windup() call late. Use C99 initialization. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:27:52 +00:00
Konstantin Belousov	584b675ed6	Hide the boottime and bootimebin globals, provide the getboottime(9) and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name. Issue is that boottime* should be adjusted from tc_windup(), which requires them to be members of the timehands structure. As a preparation, this commit only introduces the interface. Some uses of boottime were found doubtful, e.g. NLM uses boottime to identify the system boot instance. Arguably the identity should not change on the leap second adjustment, but the commit is about the timekeeping code and the consumers were kept bug-to-bug compatible. Tested by: pho (as part of the bigger patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:08:59 +00:00
Pedro F. Giffuni	e3043798aa	sys/kern: spelling fixes in comments. No functional change.	2016-04-29 22:15:33 +00:00
Enji Cooper	aaca704590	Define `fhard` in pps_event(..) only when PPS_SYNC is defined to mute an -Wunused-but-set-variable warning Reported by: FreeBSD_HEAD_amd64_gcc4.9 jenkins job Sponsored by: EMC / Isilon Storage Division	2015-11-02 03:14:37 +00:00
Konstantin Belousov	b2557db607	Use per-cpu values for base and last in tc_cpu_ticks(). The values are updated lockess, different CPUs write its own view of timecounter state. The critical section is done for safety, callers of tc_cpu_ticks() are supposed to already enter critical section, or to own a spinlock. The change fixes sporadical reports of too high values reported for the (W)CPU on platforms that do not provide cpu ticker and use tc_cpu_ticks(), in particular, arm*. Diagnosed and reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-09-25 13:03:57 +00:00
Ian Lepore	e8bac3f240	If a specific timecounter has been chosen via sysctl, and a new timecounter with higher quality registers (presumably in a module that has just been loaded), do not undo the user's choice by switching to the new timecounter. Document that behavior, and also the fact that there is no way to unregister a timecounter (and thus no way to unload a module containing one).	2015-08-12 20:50:20 +00:00
Ian Lepore	721b581722	Only process the PPS event types currently enabled in pps_params.mode. This makes the PPS API behave correctly, but isn't ideal -- we still end up capturing PPS data for non-enabled edges, we just don't process the data into an event that becomes visible outside of kern_tc. That's because the event type isn't passed to pps_capture(), so it can't do the filtering. Any solution for capture filtering is going to require touching every driver.	2015-08-07 23:31:31 +00:00
Ian Lepore	6f7a9f7c8d	RFC 2783 requires a status of ETIMEDOUT, not EWOULDBLOCK, on a timeout.	2015-08-07 21:14:19 +00:00
Konstantin Belousov	f4b5a9725a	Reimplement the ordering requirements for the timehands updates, and for timehands consumers, by using fences. Ensure that the timehands->th_generation reset to zero is visible before the data update is visible []. tc_setget() allowed data update writes to become visible before generation (but not on TSO architectures). Remove tc_setgen(), tc_getgen() helpers, use atomics inline []. Noted by: alc [] Requested by: bde [**] Reviewed by: alc, bde Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-07-08 18:42:08 +00:00
Konstantin Belousov	529c97886b	Tweaks for r284178: Do not include machine/atomic.h explicitely, the header is already included by sys/systm.h. Force inlining of tc_getgen() and tc_setgen(). The functions are used more than once, which causes compilers with non-aggressive inlining policies to generate calls. Suggested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-06-11 04:41:54 +00:00
Konstantin Belousov	2c6946dca2	When updating/accessing the timehands, barriers are needed to ensure that: - th_generation update is visible after the parameters update is visible; - the read of parameters is not reordered before initial read of th_generation. On UP kernels, compiler barriers are enough. For SMP machines, CPU barriers must be used too, as was confirmed by submitter by testing on the Freescale T4240 platform with 24 PowerPC processors. Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de> MFC after: 1 week	2015-06-09 11:49:56 +00:00
Ian Lepore	28315e27a7	Implement a mechanism for making changes in the kernel<->driver PPS interface without breaking ABI or API compatibility with existing drivers. The existing data structures used to communicate between the kernel and driver portions of PPS processing contain no spare/padding fields and no flags field or other straightforward mechanism for communicating changes in the structures or behaviors of the code. This makes it difficult to MFC new features added to the PPS facility. ABI compatibility is important; out-of-tree drivers in module form are known to exist. (Note that the existing api_version field in the pps_params structure must contain the value mandated by RFC 2783 and any RFCs that come along after.) These changes introduce a pair of abi-version fields which are filled in by the driver and the kernel respectively to indicate the interface version. The driver sets its version field before calling the new pps_init_abi() function. That lets the kernel know how much of the pps_state structure is understood by the driver and it can avoid using newer fields at the end of the structure that it knows about if the driver is a lower version. The kernel fills in its version field during the init call, letting the driver know what features and data the kernel supports. To implement the new version information in a way that is backwards compatible with code from before these changes, the high bit of the lightly-used 'kcmode' field is repurposed as a flag bit that indicates the driver is aware of the abi versioning scheme. Basically if this bit is clear that indicates a "version 0" driver and if it is set the driver_abi field indicates the version. These changes also move the recently-added 'mtx' field of pps_state from the middle to the end of the structure, and make the kernel code that uses this field conditional on the driver being abi version 1 or higher. It changes the only driver currently supplying the mtx field, usb_serial, to use pps_init_abi(). Reviewed by: hselasky@	2015-05-04 17:59:39 +00:00
Ian Lepore	91d9eda200	Use sbuf_printf() for sysctl strings instead of stack buffers and snprintf().	2015-03-14 23:16:12 +00:00
Hans Petter Selasky	35ee8a4a59	Add mutex support to the pps_ioctl() API in the kernel. Bump kernel version to reflect structure change. PR: 196897 MFC after: 1 week	2015-03-07 18:23:32 +00:00
Neel Natu	d1b1b60065	Update the vdso timehands only via tc_windup(). Prior to this change CLOCK_MONOTONIC could go backwards when the timecounter hardware was changed via 'sysctl kern.timecounter.hardware'. This happened because the vdso timehands update was missing the special treatment in tc_windup() when changing timecounters. Reviewed by: kib	2015-01-20 03:54:30 +00:00
John Baldwin	92597e064b	On some Intel CPUs with a P-state but not C-state invariant TSC the TSC may also halt in C2 and not just C3 (it seems that in some cases the BIOS advertises its C3 state as a C2 state in _CST). Just play it safe and disable both C2 and C3 states if a user forces the use of the TSC as the timecounter on such CPUs. PR: 192316 Differential Revision: https://reviews.freebsd.org/D1441 No objection from: jkim MFC after: 1 week	2015-01-05 20:44:44 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Davide Italiano	5b999a6be0	- Make callout(9) tickless, relying on eventtimers(4) as backend for precise time event generation. This greatly improves granularity of callouts which are not anymore constrained to wait next tick to be scheduled. - Extend the callout KPI introducing a set of callout_reset_sbt* functions, which take a sbintime_t as timeout argument. The new KPI also offers a way for consumers to specify precision tolerance they allow, so that callout can coalesce events and reduce number of interrupts as well as potentially avoid scheduling a SWI thread. - Introduce support for dispatching callouts directly from hardware interrupt context, specifying an additional flag. This feature should be used carefully, as long as interrupt context has some limitations (e.g. no sleeping locks can be held). - Enhance mechanisms to gather informations about callwheel, introducing a new sysctl to obtain stats. This change breaks the KBI. struct callout fields has been changed, in particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t' (8 bytes) and another 'sbintime_t' field was added for precision. Together with: mav Reviewed by: attilio, bde, luigi, phk Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm), markj (amd64), mav, Fabian Keil	2013-03-04 11:09:56 +00:00
Ian Lepore	a1137de941	Add PPS_CANWAIT support for time_pps_fetch(). This adds support for all three blocking modes described in section 3.4.3 of RFC 2783, allowing the caller to retrieve the most recent values without blocking, to block for a specified time, or to block forever. Reviewed by: discussion on hackers@	2013-02-15 18:30:32 +00:00
John Baldwin	a8df530ddc	Mark 'ticks', 'time_second', and 'time_uptime' as volatile to prevent the compiler from caching their values in tight loops. Reviewed by: bde MFC after: 1 week	2013-01-28 19:38:13 +00:00
George V. Neville-Neil	57d025c338	Add support for walltimestamp in DTrace. Submitted by: Fabian Keil MFC after: 2 weeks	2012-07-16 20:17:19 +00:00
Konstantin Belousov	21c295ef88	Stop updating the struct vdso_timehands from even handler executed in the scheduled task from tc_windup(). Do it directly from tc_windup in interrupt context [1]. Establish the permanent mapping of the shared page into the kernel address space, avoiding the potential need to sleep waiting for allocation of sf buffer during vdso_timehands update. As a consequence, shared_page_write_start() and shared_page_write_end() functions are not needed anymore. Guess and memorize the pointers to native host and compat32 sysentvec during initialization, to avoid the need to get shared_page_alloc_sx lock during the update. In tc_fill_vdso_timehands(), do not loop waiting for timehands generation to stabilize, since vdso_timehands is written in the same interrupt context which wrote timehands. Requested by: mav [1] MFC after: 29 days	2012-06-23 09:33:06 +00:00
Konstantin Belousov	aea810386d	Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs neccessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month	2012-06-22 07:06:40 +00:00
Juli Mallett	9624d94701	o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands using the o32 ABI. This mostly follows nwhitehorn's lead in implementing COMPAT_FREEBSD32 on powerpc64. o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the 32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit ABIs use a 64-bit time_t. o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS with 32-bit compatibility, then, disable some code which assumes otherwise wrongly when built for MIPS. A more general macro to check in this case would seem like a good idea eventually. If someone adds support for using n32 userland with n64 kernels on MIPS, then they will have to add a variety of flags related to each piece of the ABI that can vary. That's probably the right time to generalize further. o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the freebsd32 compat code. Probably this should be generalized at some point. Reviewed by: gonzo	2012-03-03 08:19:18 +00:00
Kevin Lo	de02885a7b	Add a missing break. This bug was introduced in r228856.	2012-02-10 06:30:52 +00:00
Lawrence Stewart	6cedd609b7	Introduce the sysclock_getsnapshot() and sysclock_snap2bintime() KPIs. The sysclock_getsnapshot() function allows the caller to obtain a snapshot of all the system clock and timecounter state required to create time stamps at a later point. The sysclock_snap2bintime() function converts a previously obtained snapshot into a bintime time stamp according to the specified flags e.g. which system clock, uptime vs absolute time, etc. These KPIs enable useful functionality, including direct comparison of the feedback and feed-forward system clocks and generation of multiple time stamps with different formats from a single timecounter read. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ In collaboration with: Julien Ridoux (jridoux at unimelb edu au)	2011-12-24 01:32:01 +00:00
Lawrence Stewart	88394fe42c	Do away with the somewhat clunky sysclock_ops structure and associated code, reimplementing the [get]{bin,nano,micro}[up]time() wrapper functions in terms of the new "fromclock" API instead. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Discussed with: Julien Ridoux (jridoux at unimelb edu au) Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-29 08:33:40 +00:00
Lawrence Stewart	e977bac333	Make the fbclock_[get]{bin,nano,micro}[up]time() function prototypes public so that new APIs with some performance sensitivity can be built on top of them. These functions should not be called directly except in special circumstances. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Discussed with: Julien Ridoux (jridoux at unimelb edu au) Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-29 06:53:36 +00:00
Lawrence Stewart	c2a4ee9906	Fix an oversight in r227747 by calling fbclock_bin{up}time() directly from the fbclock_{nanouptime\|microuptime\|bintime\|nanotime\|microtime}() functions to avoid indirecting through a sysclock_ops wrapper function. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-29 06:12:19 +00:00
Lawrence Stewart	65e359a15c	- Add Pulse-Per-Second timestamping using raw ffcounter and corresponding ffclock time in seconds. - Add IOCTL to retrieve ffclock timestamps from userland. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-21 13:34:29 +00:00
Lawrence Stewart	9bce0f05fe	- Provide a sysctl interface to change the active system clock at runtime. - Wrap [get]{bin,nano,micro}[up]time() functions of sys/time.h to allow requesting time from either the feedback or the feed-forward clock. If a feedback (e.g. ntpd) and feed-forward (e.g. radclock) daemon are both running on the system, both kernel clocks are updated but only one serves time. - Add similar wrappers for the feed-forward difference clock. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-20 05:32:12 +00:00
Lawrence Stewart	b0fdc83713	Core structure and functions to support a feed-forward clock within the kernel. Implement ffcounter, a monotonically increasing cumulative counter on top of the active timecounter. Provide low-level functions to read the ffcounter and convert it to absolute time or a time interval in seconds using the current ffclock estimates, which track the drift of the oscillator. Add a ring of fftimehands to track passing of time on each kernel tick and pick up updates of ffclock estimates. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-19 14:10:16 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Jung-uk Kim	08e1b4f4a9	If TSC stops ticking in C3, disable deep sleep when the user forcefully select TSC as timecounter hardware. Tested by: Fabian Keil (freebsd-listen at fabiankeil dot de)	2011-07-14 21:00:26 +00:00
Matthew D Fleming	cbc134ad03	Introduce signed and unsigned version of CTLTYPE_QUAD, renaming existing uses. Rename sysctl_handle_quad() to sysctl_handle_64().	2011-01-19 23:00:25 +00:00
Colin Percival	772d1e42a2	Add parentheses for clarity. The parentheses around the two terms of the && are unnecessary but I'm leaving them in for the sake of avoiding confusion (I confuse easily). Submitted by: bde	2010-11-23 04:50:01 +00:00
Colin Percival	aa519c0a64	In tc_windup, handle the case where the previous call to tc_windup was more than 1s earlier. Prior to this commit, the computation of th_scale * delta (which produces a 64-bit value equal to the time since the last tc_windup call in units of 2^(-64) seconds) would overflow and any complete seconds would be lost. We fix this by repeatedly converting tc_frequency units of timecounter to one seconds; this is not exactly correct, since it loses the NTP adjustment, but if we find ourselves going more than 1s at a time between clock interrupts, losing a few seconds worth of NTP adjustments is the least of our problems...	2010-11-22 09:13:25 +00:00
Rebecca Cran	8d065a3914	Fix some more style(9) issues.	2010-11-14 16:10:15 +00:00
Rebecca Cran	b389be97db	Fix style(9) issues from r215281 and r215282. MFC after: 1 week	2010-11-14 08:06:29 +00:00
Rebecca Cran	2baa5cddb6	Add some descriptions to sys/kern sysctls. PR: kern/148710 Tested by: Chip Camden <sterling at camdensoftware.com> MFC after: 1 week	2010-11-14 06:09:50 +00:00
Alexander Motin	95d23438dd	Until hardclock() and respectively tc_windup() called first time, system is running on "dummy" time counter. But to function properly in one-shot mode, event timer management code requires working time counter. Slow moving "dummy" time counter delays first hardclock() call by few seconds on my systems, even though timer interrupts were correctly kicking kernel. That causes few seconds delay during boot with one-shot mode enabled. To break this loop, explicitly call tc_windup() first time during initialization process to let it switch to some real time counter.	2010-09-21 08:02:02 +00:00
Alexander Motin	0e18987383	Make kern_tc.c provide minimum frequency of tc_ticktock() calls, required to handle current timecounter wraps. Make kern_clocksource.c to honor that requirement, scheduling sleeps on first CPU for no more then specified period. Allow other CPUs to sleep up to 1/4 second (for any case).	2010-09-14 08:48:06 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy interrupts are generating at full rate of hz + stathz to fullfill scheduler and timekeeping requirements. But when CPU is idle, only minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts is executed. This allows significantly increase idle CPU sleep time, increasing effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle. There is set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there is up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends of chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how much times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU to receive every timer interrupt independently of whether they busy or not. By default this options is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generating. As soon as this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00

1 2 3 4 5

238 Commits