59 Commits

Author SHA1 Message Date
jhb
bcc5b0c55d Add an EARLY_AP_STARTUP option to start APs earlier during boot.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed).  This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP.  It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot.  Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system.  In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU.  Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code.  This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP).  This will allow the option to be turned off
if need be during initial testing.  I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0.  Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86.  Other platform maintainers
are encouraged to port their architectures over as well.  The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR:		kern/199321
Reviewed by:	markj, gnn, kib
Sponsored by:	Netflix
2016-05-14 18:22:52 +00:00
bz
10195fa2e4 Allow orm(4) to be disabled from probing/attaching by a hints entry:
hint.orm.0.disabled=1

Suggested by:	jhb
Reviewed by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6307
2016-05-10 22:28:06 +00:00
royger
390486acbc atrtc: export function to set RTC
This is going to be used by the Xen clock on Dom0 in order to set the RTC of
the host. The current logic in atrtc_settime is moved to atrtc_set and the
unused device_t parameter is removed from the atrtc_set function call so it
can be safely used by other callers.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib, jhb
Differential revision:	https://reviews.freebsd.org/D6067
2016-05-02 16:14:55 +00:00
pfg
be4082c832 X86: use our nitems() macro when it is avaliable through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:41:46 +00:00
jkim
b36627870b Silence PVS-Studio warning (V595). It can never be NULL here. 2016-02-23 23:57:24 +00:00
jhibbits
f8385663ee Introduce a RMAN_IS_DEFAULT_RANGE() macro, and use it.
This simplifies checking for default resource range for bus_alloc_resource(),
and improves readability.

This is part of, and related to, the migration of rman_res_t from u_long to
uintmax_t.

Discussed with:	jhb
Suggested by:	marcel
2016-02-20 01:32:58 +00:00
jhibbits
31bb8ee5bd Convert rman to use rman_res_t instead of u_long
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources.  For
now, this is still compatible with u_long.

This is step one in migrating rman to use uintmax_t for resources instead of
u_long.

Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.

This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.

Reviewed By: jhb
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
2016-01-27 02:23:54 +00:00
brueffer
406a68d5cf Set the initial system time to a sane (as in: not end of 21st century) value when
booting on a PC with CMOS clock set to a year before 2000.

This uses 1980 (instead of 1970 as in the initial patch) as pivot year as
suggested by imp in the PR followup.

PR:		195703
Submitted by:	cs@soi.spb.ru
Reviewed by:	imp
MFC after:	1 weeks
2015-06-29 17:02:09 +00:00
imp
5dd3f4fde2 Include mca_machdep.h. 2015-01-18 03:43:47 +00:00
imp
503404cbf7 Need to include opt_mca.h to test for DEV_MCA. 2015-01-17 02:17:59 +00:00
marcel
48e5a4e056 Virtual machines can easily have more than 16 option ROMs and
when that happens, we happily access our resource array out of
bounds. Make sure we stay within the MAX_ROMS limit.
While here, bump MAX_ROMS from 16 to 32 to minimize the chance
of leaving option ROMs unaccounted for.

Obtained from:	Juniper Networks, Inc.
2014-10-22 01:37:32 +00:00
royger
c934f5fd28 atpic: make sure atpic_init is called after IO APIC initialization
After r269510 the IO APIC and ATPIC initialization is done at the same
order, which means atpic_init can be called before the IO APIC has
been initalized. In that case the ATPIC will take over the interrupt
sources, preventing the IO APIC from registering them.

Reported by: David Wolfskill <david@catwhisker.org>
Tested by: David Wolfskill <david@catwhisker.org>,
           Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
Sponsored by: Citrix Systems R&D
2014-08-07 17:00:50 +00:00
royger
152e3229be isa: allow ISA bus to attach to xenpv bus
This is needed because syscons depends on ISA.

Sponsored by: Citrix Systems R&D
Approved by: gibbs

x86/isa/isa.c:
 - Allow the ISA bus to attach to xenpv.
2014-06-16 08:49:16 +00:00
imp
9f008568e7 Remove vestiges of knowing the ISA bus, which we gave up on around 20
years ago. Remove redunant copy of isaregs.h.
2014-03-19 21:03:04 +00:00
royger
467e743960 xen: implement an early timer for Xen PVH
When running as a PVH guest, there's no emulated i8254, so we need to
use the Xen PV timer as the early source for DELAY. This change allows
for different implementations of the early DELAY function and
implements a Xen variant for it.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

dev/xen/timer/timer.c:
dev/xen/timer/timer.h:
 - Implement Xen early delay functions using the PV timer and declare
   them.

x86/include/init.h:
 - Add hooks for early clock source initialization and early delay
   functions.

i386/i386/machdep.c:
pc98/pc98/machdep.c:
amd64/amd64/machdep.c:
 - Set early delay hooks to use the i8254 on bare metal.
 - Use clock_init (that will in turn make use of init_ops) to
   initialize the early clock source.

amd64/include/clock.h:
i386/include/clock.h:
 - Declare i8254_delay and clock_init.

i386/xen/clock.c:
 - Rename DELAY to i8254_delay.

x86/isa/clock.c:
 - Introduce clock_init that will take care of initializing the early
   clock by making use of the init_ops hooks.
 - Move non ISA related delay functions to the newly introduced delay
   file.

x86/x86/delay.c:
 - Add moved delay related functions.
 - Implement generic DELAY function that will use the init_ops hooks.

x86/xen/pv.c:
 - Set PVH hooks for the early delay related functions in init_ops.

conf/files.amd64:
conf/files.i386:
conf/files.pc98:
 - Add delay.c to the kernel build.
2014-03-11 10:20:42 +00:00
jhb
94d685456e Drop the 3rd clause from all 3 clause BSD licenses where I am the sole
holder to convert them to 2 clause BSD licenses.

MFC after:	1 week
2014-02-05 18:13:27 +00:00
gibbs
a9c07a6f67 Add support for suspend/resume/migration operations when running as a
Xen PVHVM guest.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
MFC after:	2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	- Make sure that are no MMU related IPIs pending on migration.
	- Reset pending IPI_BITMAP on resume.
	- Init vcpu_info on resume.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
	- Add a "suspend_cancelled" parameter to pic_resume().  For the
	  Xen PIC, restoration of interrupt services differs between
	  the aborted suspend and normal resume cases, so we must provide
	  this information.

sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
	- Don't swap out "suspend safe" timers across a suspend/resume
	  cycle.  This includes the Xen PV and ACPI timers.

sys/dev/xen/control/control.c:
	- Perform proper suspend/resume process for PVHVM:
		- Suspend all APs before going into suspension, this allows us
		  to reset the vcpu_info on resume for each AP.
		- Reset shared info page and callback on resume.

sys/dev/xen/timer/timer.c:
	- Implement suspend/resume support for the PV timer. Since FreeBSD
	  doesn't perform a per-cpu resume of the timer, we need to call
	  smp_rendezvous in order to correctly resume the timer on each CPU.

sys/dev/xen/xenpci/xenpci.c:
	- Don't reset the PCI interrupt on each suspend/resume.

sys/kern/subr_smp.c:
	- When suspending a PVHVM domain make sure there are no MMU IPIs
	  in-flight, or we will get a lockup on resume due to the fact that
	  pending event channels are not carried over on migration.
	- Implement a generic version of restart_cpus that can be used by
	  suspended and stopped cpus.

sys/x86/xen/hvm.c:
	- Implement resume support for the hypercall page and shared info.
	- Clear vcpu_info so it can be reset by APs when resuming from
	  suspension.

sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
	- Support UP kernel configurations.

sys/x86/xen/xen_intr.c:
	- Properly rebind per-cpus VIRQs and IPIs on resume.
2013-09-20 05:06:03 +00:00
brooks
861668a16b Call set_i8254_freq with MODE_STOP (0) rather than a magic number of 0. 2013-08-15 17:21:06 +00:00
mav
6cf7cc6e4d MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
2013-02-28 13:46:03 +00:00
imp
0e93b8cbe6 Use critical_enter/critical_exit around the time sensitive part of
this code to depessimize the worst case we've lived with silently and
uneventfully for the past 12 years. Add a comment about a refinement
for those needing more assurance of accuracy.

Fix ddb's show rtc command deadlock potential when debugging rtc code
by not taking the lock if we're in the debugger. If you need a thumb
to count the number of people that have encountered this, I'd be
surprised.

Submitted by:	bde
2013-02-21 15:35:48 +00:00
imp
e7a3528a80 Correct comment about use of pmtimer, and the real reason it isn't
used or desirable for amd64.
2013-02-21 06:38:24 +00:00
imp
c21cb04a9d Fix broken usage of splhigh() by removing it. 2013-02-21 00:40:08 +00:00
davide
da217eece1 Fixup r246916 in case gcc is used to build.
Reported by:	attilio, simon
2013-02-19 16:43:48 +00:00
mav
a8b029c7cf MFcalloutng:
Microoptimize i8254 one-shot operation mode (disabled by default to allow
timecounter functionality) by not writing to mode and MSB registers when
it is not required.  This saves several microseconds of CPU time per call,
reducing minimal measured interrupts interval to 19.5us.
2013-02-17 18:42:30 +00:00
eadler
92f340b6e7 This isn't functionally identical. In some cases a hint to disable
unit 0 would in fact disable all units.

This reverts r241856

Approved by: cperciva (implicit)
2012-10-22 13:06:09 +00:00
eadler
bc26a2b3b0 Now that device disabling is generic, remove extraneous code from the
device drivers that used to provide this feature.

Reviewed by:	des
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:41:14 +00:00
jhb
38fb55f1d4 Restore proper use of bounce buffers for ISA DMA. When locking was
added, the call to pmap_kextract() was moved up, and as a result the
code never updated the physical address to use for DMA if a bounce
buffer was used.  Restore the earlier location of pmap_kextract() so
it takes bounce buffers into account.

Tested by:	kargl
MFC after:	1 week
2012-03-29 18:58:02 +00:00
nyan
fbebe4e777 - Fix to build a native i386 kernel without the SMP and atpic.
- Merge r232744 changes to pc98.
  (Allow a kernel to be built with 'nodevice atpic'.)
- Move ICU related defines from x86/isa/atpic.c to x86/isa/icu.h and
  use them in x86/x86/intr_machdep.c.

Reviewed by:	jhb
2012-03-16 12:13:44 +00:00
jkim
6d3172737b Implement boot-time TSC synchronization test for SMP. This test is executed
when the user has indicated that the system has synchronized TSCs or it has
P-state invariant TSCs.  For the former case, we may clear the tunable if it
fails the test to prevent accidental foot-shooting.  For the latter case, we
may set it if it passes the test to notify the user that it may be usable.
2011-05-09 17:34:00 +00:00
jhb
5512bf549d Retire isa_setup_intr() and isa_teardown_intr() and use the generic bus
versions instead.  They were never needed as bus_generic_intr() and
bus_teardown_intr() had been changed to pass the original child device up
in 42734, but the ISA bus was not converted to new-bus until 45720.
2011-05-06 13:48:53 +00:00
jkim
adbae7d2c9 Use newly added rdtsc32() for DELAY(9) as well. 2011-04-14 19:11:45 +00:00
jkim
5b73ac45d1 Add some tunable descriptions about x86 timers.
Requested by:	arundel
2011-04-14 00:07:08 +00:00
jkim
2092a06579 Do not use TSC for DELAY(9) if it not P-state invariant to avoid possible
foot-shooting.  DELAY() becomes unreliable when TSC frequency varies wildly,
especially cpufreq(4) and powerd(8) are used at the same time.
2011-04-12 22:41:52 +00:00
jkim
61582b7c03 Merge two similar functions to reduce duplication. 2011-04-11 19:27:44 +00:00
jkim
096c7a804f Refactor DELAYDEBUG as it is only useful for correcting i8254 frequency. 2011-04-08 19:54:29 +00:00
jkim
95c723445e Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
jkim
36e15e1609 When TSC is unavailable, broken or disabled and the current timecounter has
better quality than i8254 timer, use it for DELAY(9).
2011-03-14 22:05:59 +00:00
jkim
98d68ca741 Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.
2011-03-10 20:02:58 +00:00
brucec
4a353c54fd Fix typos - remove duplicate "is".
PR:		docs/154934
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after:	3 days
2011-02-23 09:22:33 +00:00
jhb
f6949632bc Small style fixes:
- Avoid side-effect assignments in if statements when possible.
- Don't use ! to check for NULL pointers, explicitly check against NULL.
- Explicitly check error return values against 0.
- Don't use INTR_MPSAFE for interrupt handlers with only filters as it is
  meaningless.
- Remove unneeded function casts.
2010-12-16 17:05:28 +00:00
avg
21647a4834 atrtc: remove (pre-)historic check of RTC NVRAM at address 0x0e
Old scrolls tell that once upon a time IBM AT BIOS was known to put some
useful system diagnostic information into RTC NVRAM.  It is not really
known if and for how long PC BIOSes followed that convention, but I
believe that many, if not all, modern BIOSes do not do that any more
(not mentioning other types of x86 firmware).
Some diagnostic bits don't even make any sense any longer.
The check results in confusing messages upon boot on some systems.
So I am removing it.

Discussed with:	bde, jhb, mav
MFC after:	3 weeks
2010-10-16 10:45:36 +00:00
mav
bbf7bbb468 Restore pre-r212778 optimization, skipping timer reprogramming when it is
not neccessary. It allows to avoid time counter jump of up to 1/18s, when
base frequency slightly tuned via machdep.i8254_freq sysctl.
Fix few style things.

Suggested by:	bde
2010-09-18 07:36:43 +00:00
mav
0f4b390682 Add one-shot mode support to attimer (i8254) event timer.
Unluckily, using one-shot mode is impossible, when same hardware used for
time counting. Introduce new tunable hint.attimer.0.timecounter, setting
which to 0 disables i8254 time counter and allows one-shot mode. Note,
that on some systems there may be no other reliable enough time counters,
so this tunable should be used with understanding.
2010-09-17 04:48:50 +00:00
mav
0ea74c96a2 Fix several un-/signedness bugs of r210290 and r210293. Add one more check. 2010-07-20 15:48:29 +00:00
mav
1021ed9c1f Extend timer driver API to report also minimal and maximal supported period
lengths. Make MI wrapper code to validate periods in request. Make kernel
clock management code to honor these hardware limitations while choosing hz,
stathz and profhz values.
2010-07-20 10:58:56 +00:00
mav
b076092fdd Rise knowledge about curthread->td_intr_frame by one step. Make timer
callback argument really opaque. Not repeat interrupt handler's problem
in case somebody will ever need to have both argument and frame.
2010-07-13 12:46:06 +00:00
mav
f723107552 Unify pc98 event timer code with the rest of x86.
Reviewed by:	nyan@
2010-07-13 06:57:27 +00:00
mav
55884c8757 Instead of deleting existing IRQ resource, which is not really working for
ACPI bus, find wanted IRQ rid or spare one. This should fix panic during
boot on systems reporting fancy IRQ numbers for attimer and atrtc.
2010-07-12 06:46:17 +00:00
mav
234db8607d Allow attimer to be hinted at ISA if not reported by ISA PNP or ACPI.
Rephrase respective atrtc code same way to be more readable.
2010-07-01 18:59:05 +00:00
mav
0966180de4 Rework r209456:
Instead of using fake rid (which ISA doesn't like), delete untrusted
IRQ resource and let it be recreated.
2010-07-01 18:51:18 +00:00