11875 Commits

Author SHA1 Message Date
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When the CPU is busy, interrupts are generated at the full
rate of hz + stathz to fulfill scheduler and timekeeping requirements. But
when the CPU is idle, only the minimum set of interrupts (down to 8 interrupts
per second per CPU now) needed to handle scheduled callouts is generated.
This allows a significant increase in idle CPU sleep time, increasing the
effect of static power-saving technologies. It should also reduce host CPU
load on virtualized systems when the guest system is idle.

There is a set of tunables, also available as writable sysctls, that control
the behavior of the event timer subsystem:
  kern.eventtimer.timer - chooses the event timer hardware to use.
On x86 there are up to 4 different kinds of timers. Depending on whether
the chosen timer is per-CPU, the behavior of the other options differs slightly.
  kern.eventtimer.periodic - chooses between periodic and one-shot
operation mode. In periodic mode, the current timer hardware is taken as the
only source of time for time events. This mode is quite similar to the previous
kernel behavior. One-shot mode instead uses the currently selected time counter
hardware to schedule all needed events one by one and programs the timer to
generate an interrupt exactly at the specified time. The default value depends
on the chosen timer's capabilities, but one-shot mode is preferred unless the
other is forced by the user or hardware.
  kern.eventtimer.singlemul - in periodic mode, specifies how many times
higher the timer frequency should be, so as not to strictly alias hardclock()
and statclock() events. Default values are 2 and 4, but this could be reduced
to 1 if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU receive every timer interrupt
regardless of whether it is busy or not. By default this option is
disabled. If the chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generated.

Since this patch modifies cpu_idle() on some platforms, I have also
refactored the one on x86. It now makes use of MONITOR/MWAIT instructions
(if supported) under a high sleep/wakeup rate, as a fast alternative to other
methods. This allows the SMP scheduler to wake up sleeping CPUs much faster
without using an IPI, significantly increasing performance on some loads with
heavy task switching.

Tested by:	many (on i386, amd64, sparc64 and powerpc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Andriy Gapon
3d844eddb7 bus_add_child: change type of order parameter to u_int
This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to:	r212213
MFC after:	10 days
2010-09-10 11:19:03 +00:00
Roman Divacky
27d4fea6c5 Change the parameter passed to the inline assembly to u_short
as we are dealing with 16bit segment registers. Change mov
to movw.

Approved by:    rpaulo (mentor)
Reviewed by:    kib, rink
2010-09-03 14:25:17 +00:00
Rui Paulo
cba3269417 Register an interrupt vector for DTrace return probes. There is some
code missing in lapic to make sure that we don't overwrite this entry,
but this will be done in a subsequent commit.

Sponsored by:	The FreeBSD Foundation
2010-08-28 08:03:29 +00:00
Rui Paulo
6bf9fb35e5 Sync DTrace bits with amd64 and fix the build.
Sponsored by:	The FreeBSD Foundation
2010-08-26 11:22:12 +00:00
Jung-uk Kim
db1cea00ad Increase maximum number of page table entries per VM86 context from 8 to 24
pages, yet again.  Now we can allocate a whole segment, which is required
for shadowing option ROM images, for example.
2010-08-25 21:13:23 +00:00
Rui Paulo
0bc1991a4a Call the necessary DTrace function pointers when we have different kinds
of traps.

Sponsored by:	The FreeBSD Foundation
2010-08-25 09:10:32 +00:00
Rui Paulo
8a8d8fa3d1 Add two DTrace trap type values. Used by fasttrap.
Sponsored by:	The FreeBSD Foundation
2010-08-24 13:13:24 +00:00
Attilio Rao
67a94de261 Revert part of r211149, as I erroneously ported logical_cpus from the
Yahoo! patchset as a mask (manipulating variables accordingly) while
it is actually a CPU count.

Submitted by:	neel
MFC after:	1 month
X-MFC:		211149
2010-08-19 22:37:43 +00:00
John Baldwin
8c7a92bd4a Remove unused KTRACE includes.
2010-08-19 16:41:27 +00:00
Rui Paulo
187278cadc For every instance of '.if ${CC} == "foo"' or '.if ${CC} != "foo"' in
Makefiles or *.mk files, use ${CC:T:Mfoo} instead, so only the basename
of the compiler command (excluding any arguments) is considered.

This allows you to use, for example, CC="/nondefault/path/clang -xxx",
and still have the various tests in bsd.*.mk identify your compiler as
clang correctly.

ICC if cases were also changed.

Submitted by:	Dimitry Andric <dimitry at andric.com>
2010-08-17 20:39:28 +00:00
Pietro Cerutti
e0e08e6a60 - The iMac9,1 needs the PAT workaround as well
Approved by:	cognet
2010-08-17 12:17:24 +00:00
Konstantin Belousov
ee235befcb Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by:	marius (sparc64)
MFC after:	1 month
2010-08-17 08:55:45 +00:00
Attilio Rao
3742bd96fe Revert r211176:
As long as interrupts are disabled and there is no explicit call to
sched_add() there can't be any preemption there, thus the calls may be
consistent.

Reported by:	kib, jhb
2010-08-12 13:46:43 +00:00
John Baldwin
60c7b36b7a Update various places that store or manipulate CPU masks to use cpumask_t
instead of int or u_int.  Since cpumask_t is currently u_int on all
platforms this should just be a cosmetic change.
2010-08-11 23:22:53 +00:00
Attilio Rao
807ef45666 IPI handlers may run generally with interrupts disabled because they
are served via an interrupt gate.

However, that doesn't explicitly prevent preemption and thread
migration thus scheduler pinning may be necessary in some handlers.
Fix that.

Tested by:	gianni
MFC after:	1 month
2010-08-11 10:51:27 +00:00
Attilio Rao
7cd8b4cd42 Fix a typo due to a stale version of the patch.
Reported by:	gianni, rdivacky
MFC after:	1 month
X-MFC:		211149
2010-08-10 18:29:39 +00:00
Attilio Rao
4c967b618d Fix some places that may use cpumask_t while they still use 'int' types.
While there, also fix some places assuming cpu type is 'int' while
u_int is really meant.

Note: this will also fix some possible races in per-cpu data accesses
to be addressed in further commits.

In collaboration with:	Yahoo! Incorporated (via sbruno and peter)
Tested by:	gianni
MFC after:	1 month
2010-08-10 16:14:10 +00:00
Attilio Rao
d35534bf42 Simplify the logic for handling ipi_selected() and ipi_cpu() in the
amd64/i386 case.

Reviewed by:	jhb
Tested by:	gianni
MFC after:	1 month
X-MFC:		210939
2010-08-09 20:25:06 +00:00
David Malone
ee04083c8a Don't pass sizeof(u_int) to an argument of SYSCTL_PROC that ends up not
being used.
2010-08-08 20:34:53 +00:00
Bernhard Schmidt
5ec432ed82 Fix whitespace nits.
PR:		conf/148989
Submitted by:	pluknet <pluknet at gmail.com>
MFC after:	3 days
2010-08-06 18:46:27 +00:00
John Baldwin
d9d8d1449d Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid.  Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead.  This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.
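
The difference between the two calls can be sketched in user space as follows. This is a minimal illustration, not the kernel code: only the names ipi_selected(), ipi_cpu() and cpumask_t come from the message above, while the signatures, MAXCPU and the ipi_count[] delivery log are assumptions made for the example.

```c
#include <assert.h>

/* Stand-in for the kernel type; the log above says it is currently u_int. */
typedef unsigned int cpumask_t;

#define MAXCPU 32   /* assumption: one bit per CPU fits in cpumask_t */

/* Hypothetical delivery log so the sketch can be checked without hardware. */
static unsigned int ipi_count[MAXCPU];

/* Mask-based call: walk the mask and deliver to every CPU whose bit is set. */
static void
ipi_selected(cpumask_t cpus, unsigned int ipi)
{
    (void)ipi;
    for (unsigned int cpu = 0; cpu < MAXCPU; cpu++)
        if (cpus & ((cpumask_t)1 << cpu))
            ipi_count[cpu]++;
}

/* Single-CPU call: the target is named by id, so no mask is built at all. */
static void
ipi_cpu(unsigned int cpu, unsigned int ipi)
{
    (void)ipi;
    assert(cpu < MAXCPU);
    ipi_count[cpu]++;
}
```

A caller that used to write ipi_selected((cpumask_t)1 << cpu, ipi) would call ipi_cpu(cpu, ipi) instead, which avoids constructing a mask once masks become a larger set type.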

Submitted by:	peter, sbruno
Reviewed by:	rookie
Obtained from:	Yahoo! (x86)
MFC after:	1 month
2010-08-06 15:36:59 +00:00
Jung-uk Kim
439f3d8b81 Implement a simple native VM86 backend for X86BIOS. Now i386 uses native
VM86 calls instead of the real mode emulator as a backend.  VM86 has been
proven reliable for a very long time and it is actually a few times faster than
emulation.  Increase maximum number of page table entries per VM86 context
from 3 to 8 pages.  It was (ridiculously) low and insufficient for the new VM86
backend, which shares one context globally.  Slightly rearrange and clean up
the emulator backend to accommodate the new code.  The only visible change here
is the stack size, which is decreased from 64K to 4K bytes to sync with VM86.
Actually, it seems there is no need for a big stack in real mode.

MFC after:	1 month
2010-08-05 18:48:30 +00:00
John Baldwin
e2865ebbc2 Change the MPTable and $PIR PCI-PCI bridge drivers to inherit from the
generic PCI-PCI bridge driver and only override specific methods.  This
should fix suspend/resume of PCI-PCI bridges using these drivers.
2010-08-05 17:48:37 +00:00
John Baldwin
7134e39042 Tweak the logic to disable CLFLUSH in virtual environments to work around
problems with flushing the local APIC register range so that it checks
vm_guest directly.

Reviewed by:	kib, alc
MFC after:	2 weeks
2010-08-02 17:01:23 +00:00
Xin LI
a3bc0a4e5c Improve cputemp(4) driver wrt newer Intel processors, especially
Xeon 5500/5600 series:

 - Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place
   of Tj(max) when a sane value is available, as documented
   in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane
   value we mean 70C - 100C for now);
 - Print the probe results when booting verbose;
 - Replace cpu_mask with cpu_stepping;
 - Use CPUID_* macros instead of rolling our own.
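
The sanity check on Tj(target) described in the first item can be sketched as below. This is an illustration under stated assumptions, not the driver's code: the function name pick_tjmax, the fallback value TJMAX_DEFAULT, and the placement of the target in bits 23:16 of the raw MSR value are all assumptions of the example; only the 70C - 100C window comes from the message.

```c
#include <assert.h>
#include <stdint.h>

#define TJMAX_DEFAULT 100   /* assumption: fallback Tj(max) in degrees C */

/*
 * Pick the reference temperature from a raw IA32_TEMPERATURE_TARGET value.
 * Assumption: the target temperature occupies bits 23:16 of the register.
 */
static int
pick_tjmax(uint64_t msr)
{
    int target = (int)((msr >> 16) & 0xff);

    /* Accept the reported target only if it falls in the sane window. */
    if (target >= 70 && target <= 100)
        return target;
    return TJMAX_DEFAULT;
}
```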

Approved by:	rpaulo
MFC after:	1 month
2010-07-29 19:08:22 +00:00
John Baldwin
536af0d751 Mark the __curthread() functions as __pure2 and remove the volatile keyword
from the inline assembly.  This allows the compiler to cache invocations of
curthread since its value does not change within a thread context.

Submitted by:	zec (i386)
MFC after:	1 week
2010-07-29 18:44:10 +00:00
Jung-uk Kim
994ce54d01 MFamd64: r210615
Fix another fallout from r208833.  savectx() is used to save CPU context
for crash dump (dumppcb) and kdb (stoppcbs).  For both cases, we cannot
have a valid pointer in pcb_save.  This should restore the previous
behaviour.
2010-07-29 17:00:41 +00:00
John Baldwin
a955c461ad The corrected error count field is dependent on CMCI, not TES.
MFC after:	1 week
2010-07-28 21:52:09 +00:00
Matthew D Fleming
d7854da193 Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size.  The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class.  This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused.  At this point inspection or memguard(9)
can be used to catch the offending code.
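
The narrowing procedure described above can be sketched as follows. This is a user-space illustration only: the hash function, the seed parameter and the names type_class and narrow are hypothetical and are not taken from the kernel implementation. Each debugging run uses a different seed, observes which hash class was corrupted, and discards every malloc type that did not fall into that class.

```c
#include <assert.h>
#include <stddef.h>

#define NZONES 8   /* mirrors MALLOC_DEBUG_MAXZONES=8 from the message */

/* Hypothetical seeded string hash mapping a malloc type name to a class. */
static unsigned int
type_class(const char *name, unsigned int seed)
{
    unsigned int mult = 2u * seed + 31u;   /* odd multiplier, varies per seed */
    unsigned int h = seed;

    for (; *name != '\0'; name++)
        h = h * mult + (unsigned char)*name;
    return (h ^ (h >> 16)) % NZONES;
}

/*
 * One narrowing step: drop every candidate type whose class under this seed
 * differs from the class that was observed to be corrupted.
 * Returns the number of candidates still standing.
 */
static size_t
narrow(const char *const *types, int *candidate, size_t n,
    unsigned int seed, unsigned int corrupted_class)
{
    size_t left = 0;

    for (size_t i = 0; i < n; i++) {
        if (candidate[i] && type_class(types[i], seed) != corrupted_class)
            candidate[i] = 0;
        if (candidate[i])
            left++;
    }
    return left;
}
```

The misused type always survives a step, because it is by definition in the corrupted class; the other types are eliminated only as fast as the chosen hash functions separate them.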

Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.

This code is based on work by Ron Steinke at Isilon Systems.

Reviewed by:    -arch (mostly silence)
Reviewed by:    zml
Approved by:    zml (mentor)
2010-07-28 15:36:12 +00:00
Alan Cox
a14a949872 The interpreter name should no longer be treated as a buffer that can be
overwritten.  (This change should have been included in r210545.)

Submitted by:	kib
2010-07-28 04:47:40 +00:00
John Baldwin
a3870a1826 Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy.  This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
  via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
  a CPU belongs to.  Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
  (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
  The MD code is required to populate an array of mem_affinity structures.
  Each entry in the array defines a range of memory (start and end) and a
  domain for the range.  Multiple entries may be present for a single
  domain.  The list is terminated by an entry where all fields are zero.
  This array of structures is used to split up phys_avail[] regions that
  fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
  used when fulfilling a physical memory allocation.  Right now the
  per-domain freelists are listed in a round-robin order for each domain.
  In the future a table such as the ACPI SLIT table may be used to order
  the per-domain lookup lists based on the penalty for each memory domain
  relative to a specific domain.  The lookup lists may be examined via a
  new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
  pick a lookup list when allocating memory.
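
The data layout described in the list above can be sketched like this. It is a simplified model, not the committed code: the field names, the half-open [start, end) range, the value of VM_NDOMAIN, and the reading of "round-robin" as "own domain first, then the others in rotating order" are assumptions of the example.

```c
#include <assert.h>
#include <stdint.h>

#define VM_NDOMAIN 2   /* assumption: a two-domain platform */

/* One entry per memory range; field names are hypothetical. */
struct mem_affinity {
    uint64_t start;    /* first physical address of the range */
    uint64_t end;      /* one past the last address (assumed exclusive) */
    int      domain;   /* dense domain id, numbered from 0 */
};

/*
 * Return the domain that owns physical address pa, or -1 if no entry
 * covers it.  The table ends at the first entry whose fields are all zero.
 */
static int
pa_to_domain(const struct mem_affinity *tbl, uint64_t pa)
{
    for (; tbl->start != 0 || tbl->end != 0 || tbl->domain != 0; tbl++)
        if (pa >= tbl->start && pa < tbl->end)
            return tbl->domain;
    return -1;
}

/* Build the freelist lookup order for one domain: itself, then the rest. */
static void
build_lookup(int domain, int order[VM_NDOMAIN])
{
    assert(domain >= 0 && domain < VM_NDOMAIN);
    for (int i = 0; i < VM_NDOMAIN; i++)
        order[i] = (domain + i) % VM_NDOMAIN;
}
```

Under a first-touch policy, an allocation on a CPU in domain d would walk the order built by build_lookup(d, ...), trying its own domain's freelist before any other.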

Reviewed by:	alc
2010-07-27 20:33:50 +00:00
Jung-uk Kim
172754036a Simplify fldcw() macro. There is no reason to use pointer here. No object
file change after this commit (verified with md5).
2010-07-26 23:20:55 +00:00
Jung-uk Kim
8b019a8887 Remove an unused macro since r189418.
2010-07-26 22:55:14 +00:00
Jung-uk Kim
30402401a7 Reduce diff against fenv.h:
Mark all inline asms as volatile for safety.  No object file change after
this commit (verified with md5).
2010-07-26 22:16:36 +00:00
Jung-uk Kim
2e50fa36a5 FNSTSW instruction can use AX register as an operand.
Obtained from:	fenv.h
2010-07-26 21:24:52 +00:00
Rui Paulo
daef39e7ae Remove the acpi_aiboost driver. It has been replaced by aibs(4).
2010-07-25 17:55:57 +00:00
Rui Paulo
2b95672852 MFamd64:
Add USD_GETBASE(), USD_SETBASE(), USD_GETLIMIT() and USD_SETLIMIT().
2010-07-21 18:47:52 +00:00
Tijl Coosemans
3245ecbe92 Store fsbase and gsbase in the right fields of the mcontext. They were
switched.

PR:		i386/148344
Approved by:	kib (mentor)
MFC after:	1 week
2010-07-20 12:36:36 +00:00
Alexander Motin
060d7431b5 Add hints for i8254 timer on i386 and amd64. Some people report
systems with PnP/ACPI not reporting the i8254 timer. In some cases this can be
fatal, as the i8254 can be the only available time counter hardware. On the
other hand, we now depend heavily on the i8254 timer, and until recently its
init/usage was completely hardcoded. So this change just restores the previous
behavior in a more regular fashion.
2010-07-16 23:21:46 +00:00
Alexander Motin
fcc06be1b2 Move functions declaration to MI code, following implementation.
2010-07-15 17:49:35 +00:00
Bernhard Schmidt
774f94f14c - Update 6000 firmware to 9.221.4.1
- Add 6050 firmware

MFC after:	2 weeks
2010-07-15 11:26:07 +00:00
Warner Losh
1003cfe94d Remove obsolete undef of COPY_SIGCODE. It appears to have not been
used in FreeBSD in quite some time (maybe since before 4.4-lite :)

Submitted by:	bde
2010-07-13 15:06:13 +00:00
Alan Cox
8155e5d561 Reduce the number of global TLB shootdowns generated by pmap_qenter().
Specifically, teach pmap_qenter() to recognize the case when it is being
asked to replace a mapping with the very same mapping and not generate
a shootdown.  Unfortunately, the buffer cache commonly passes an entire
buffer to pmap_qenter() when only a subset of the mappings are changing.
For the extension of buffers in allocbuf() this was resulting in
unnecessary shootdowns.  The addition of new pages to the end of the
buffer need not and did not trigger a shootdown, but overwriting the
initial mappings with the very same mappings was seen as a change that
necessitated a shootdown.  With this change, that is no longer so.
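
The check described above can be sketched as follows. This is a simplified model rather than the pmap code: pte_t, the function name qenter_needs_shootdown, and the convention that a zero entry means "no mapping" are assumptions of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;   /* stand-in for a page table entry; 0 = unmapped */

/*
 * Install count new entries over the existing ones and report whether a
 * TLB shootdown is needed.  Rewriting an entry with the very same value
 * and filling a previously empty slot both leave no stale translation.
 */
static int
qenter_needs_shootdown(pte_t *ptes, const pte_t *newptes, size_t count)
{
    int changed = 0;

    for (size_t i = 0; i < count; i++) {
        if (ptes[i] == newptes[i])
            continue;            /* same mapping: nothing to invalidate */
        if (ptes[i] != 0)
            changed = 1;         /* a live mapping was replaced */
        ptes[i] = newptes[i];
    }
    return changed;
}
```

In the allocbuf() case from the message, the leading entries are identical and the trailing ones are new, so the function returns 0 and no shootdown is issued.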

For a "buildworld" on amd64, this change eliminates 14-15% of the
pmap_invalidate_range() shootdowns, and about 4% of the overall
shootdowns.

MFC after:	3 weeks
2010-07-10 18:22:44 +00:00
Konstantin Belousov
b543e91ba5 Fix spacing.
Noted by:	pgollucci
MFC after:	3 weeks
2010-07-09 21:27:42 +00:00
Konstantin Belousov
2680dac9e1 For both i386 and amd64 pmap,
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
  dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.

Submitted by:	alc [1]
Reviewed by:	alc
MFC after:	3 weeks
2010-07-09 20:05:56 +00:00
Alexander Motin
af565edaaa Revert r209638. After the commit, more people turned out to like the
previous name of the stray interrupt counters than had responded on the list.
2010-07-02 17:22:15 +00:00
Alexander Motin
b7bc6aa726 Make stray irq counters have a format similar to other counters. A unified
format makes string processing (for example by `systat -vm`) easier.
2010-07-01 21:58:46 +00:00
John Baldwin
fc0de8f0b6 Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
2010-06-30 18:03:42 +00:00
Konstantin Belousov
13cedde2cb Regenerate
2010-06-28 18:17:21 +00:00