Commit Graph

5859 Commits

Author SHA1 Message Date
Konstantin Belousov
78ae4338a2 Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.

Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.

Discussed with:	bz, rwatson
Approved by:	re (bz, kensmith)
MFC after:	2 weeks
2010-10-12 09:18:17 +00:00
Konstantin Belousov
b3b4bec7e6 Regen. 2010-10-08 07:19:05 +00:00
Konstantin Belousov
5d2a6a61b4 Fix typo.
Submitted by:	arundel
MFC after:	3 days
2010-10-08 07:18:44 +00:00
Konstantin Belousov
3f506a78ce Display PCID capability of CPU and add CPUID define for it.
MFC after:	1 week
2010-10-05 15:31:56 +00:00
Konstantin Belousov
2d5db3709b The makectx() function, used by kdb_trap() to reconstruct pcb from
trap frame when trap initiated kdb entry, incorrectly calculated the
value of %rsp for trapped thread.

According to Intel(R) 64 and IA-32 Architectures Software Developer's Manual
Volume 3A: System Programming Guide, Part 1, rev. 035, 6.14.2 64-Bit Mode
Stack Frame, "64-bit mode ... pushes SS:RSP unconditionally, rather than
only on a CPL change."
Even assuming the conditional push of the %ss:%rsp, the calculation
was still wrong because sizeof(tf_ss) + sizeof(tf_rsp) == 16 on amd64.

Always use the tf_rsp from trap frame. The change supposedly fixes
stepping when using kgdb backend for kdb.

Submitted by:	Zhouyi Zhou <zhouzhouyi gmail com>
PR:	amd64/151167
Reviewed by:	avg
MFC after:	1 week
2010-10-03 13:52:17 +00:00
Andriy Gapon
d443a96ffb i386 and amd64 mp_machdep: improve topology detection for Intel CPUs
This patch is significantly based on previous work by jkim.
List of changes:
- added comments that describe topology uniformity assumption
- added reference to Intel Processor Topology Enumeration article
- documented a few global variables that describe topology
- retired weirdly set and used logical_cpus variable
- changed fallback code for mp_ncpus > 0 case, so that CPUs are treated
  as being different packages rather than cores in a single package
- moved AMD-specific code to topo_probe_amd [jkim]
- in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT
  and core masks and match APIC IDs against those masks [started by
  jkim]
- in topo_probe_0x4() drop code for double-checking topology parameters
  by looking at L1 cache properties [jkim]
- in topo_probe_0xb() add fallback path to topo_probe_0x4() as
  prescribed by Intel [jkim]

Still to do:
- prepare for upcoming AMD CPUs by using new mechanism of uniform
  topology description [pointed by jkim]
- probe cache topology in addition to CPU topology and probably use that
  for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2 have
  the same CPU topology, but Athlon cores do not share L2 cache while
  Core2's do (no L3 cache in both cases)
- think of supporting non-uniform topologies if they are ever
  implemented for platforms in question
- think how to better described old HTT vs new HTT distinction, HTT vs
  SMT can be confusing as SMT is a generic term
- more robust code for marking CPUs as "logical" and/or "hyperthreaded",
  use HTT mask instead of modulo operation
- correct support for halting logical and/or hyperthreaded CPUs, let
  scheduler know that it shouldn't schedule any threads on those CPUs

PR:			kern/145385 (related)
In collaboration with:	jkim
Tested by:		Sergey Kandaurov <pluknet@gmail.com>,
			Jeremy Chadwick <freebsd@jdc.parodius.com>,
			Chip Camden <sterling@camdensoftware.com>,
			Steve Wills <steve@mouf.net>,
			Olivier Smedts <olivier@gid0.org>,
			Florian Smeets <flo@smeets.im>
MFC after:		1 month
2010-10-01 10:32:54 +00:00
Neel Natu
5c1a8dc028 Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.
The check for alignment should be made against the physical address and not
the virtual address that maps it.

Sponsored by:	NetApp
Submitted by:	Will McGovern (will at netapp dot com)
Reviewed by:	mjacob, jhb
2010-09-29 21:53:11 +00:00
David Xu
295fbd498e Now userland POSIX semaphore is based on umtx. The kernel module
is only used to support binary compatible, if want to run old
binary, you need to kldload the module.
2010-09-24 09:04:16 +00:00
Norikatsu Shigemura
cbf4dac64f Add support 'device tpm' for amd64.
Add tpm(4)'s default setting to /boot/defaults/loader.conf.
Add 'device tpm' to NOTES for amd64 and i386.

Discussed with:	takawata
Approved by:	imp (mentor)
2010-09-19 14:40:37 +00:00
Andriy Gapon
0b750af1b1 amd64: reduce VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory
KVA space is abundant on amd64, so there is no reason to limit kernel
map size to a fraction of available physical memory.  In fact, it could
be larger than physical memory.

This should help with memory auto-tuning for ZFS and shouldn't affect
other workloads.
This should reduce number of circumstances for "kmem_map too small"
panics, but probably won't eliminate them entirely due to potential kmem
fragmentation.

In fact, you might want/need to limit maximum ARC size after this commit
if you need to resrve more memory for applications.

This change was discussed on arch@ and nobody said "don't do it".

MFC after:	6 weeks
2010-09-17 07:36:32 +00:00
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Kenneth D. Merry
d3c7b9a08a MFp4 (//depot/projects/mps/...)
Bring in a driver for the LSI Logic MPT2 6Gb SAS controllers.

This driver supports basic I/O, and works with SAS and SATA drives and
expanders.

Basic error recovery works (i.e. timeouts and aborts) as well.

Integrated RAID isn't supported yet, and there are some known bugs.

So this isn't ready for production use, but is certainly ready for
testing and additional development.  For the moment, new commits to this
driver should go into the FreeBSD Perforce repository first
(//depot/projects/mps/...) and then get merged into -current once
they've been vetted.

This has only been added to the amd64 GENERIC, since that is the only
architecture I have tested this driver with.

Submitted by:	scottl
Discussed with:	imp, gibbs, will
Sponsored by:	Yahoo, Spectra Logic Corporation
2010-09-10 15:03:56 +00:00
Andriy Gapon
3d844eddb7 bus_add_child: change type of order parameter to u_int
This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to:	r212213
MFC after:	10 days
2010-09-10 11:19:03 +00:00
Roman Divacky
27d4fea6c5 Change the parameter passed to the inline assembly to u_short
as we are dealing with 16bit segment registers. Change mov
to movw.

Approved by:    rpaulo (mentor)
Reviewed by:    kib, rink
2010-09-03 14:25:17 +00:00
Jung-uk Kim
305c5c0acb Save MSR_FSBASE, MSR_GSBASE and MSR_KGSBASE directly to PCB as we do not use
these values in the function.
2010-08-30 21:19:42 +00:00
Rui Paulo
cba3269417 Register an interrupt vector for DTrace return probes. There is some
code missing in lapic to make sure that we don't overwrite this entry,
but this will be done on a sequent commit.

Sponsored by:	The FreeBSD Foundation
2010-08-28 08:03:29 +00:00
Rui Paulo
0bc1991a4a Call the necessary DTrace function pointers when we have different kinds
of traps.

Sponsored by:	The FreeBSD Foundation
2010-08-25 09:10:32 +00:00
Rui Paulo
8a8d8fa3d1 Add two DTrace trap type values. Used by fasttrap.
Sponsored by:	The FreeBSD Foundation
2010-08-24 13:13:24 +00:00
Attilio Rao
67a94de261 Revert part of the r211149 as I erroneously ported the logical_cpus from
Yahoo! patchset as a mask (and according manipulating variables) while
it is actually a CPU count.

Submitted by:	neel
MFC after:	1 month
X-MFC:		211149
2010-08-19 22:37:43 +00:00
John Baldwin
8c7a92bd4a Remove unused KTRACE includes. 2010-08-19 16:41:27 +00:00
Pietro Cerutti
e0e08e6a60 - The iMac9,1 needs the PAT workaround as well
Approved by:	cognet
2010-08-17 12:17:24 +00:00
Konstantin Belousov
ee235befcb Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by:	marius (sparc64)
MFC after:	1 month
2010-08-17 08:55:45 +00:00
Jung-uk Kim
0405a5efe7 Reset switchtime to zero rather than the current CPU ticker (TSC) value.
It is more appropriate in this context because TSC MSR is reset to zero
when the CPU is restarted from S3 and above.  Move acpi_resync_clock() back
to where it was before r211202.  It does not make a difference any more.
2010-08-13 22:08:42 +00:00
Attilio Rao
3742bd96fe Revert r211176:
As long as interrupts are disabled and there is not explicit call to
sched_add() there can't be any preemption there, thus the calls may be
consistent.

Reported by:	kib, jhb
2010-08-12 13:46:43 +00:00
Jung-uk Kim
a1004d0abf Reset switchtime and switchticks after resynchronizing the system clock.
This should fix weird runtime problem after resume on amd64.  It also fixes
"calcru: runtime went backwards" warnings with bootverbose.
2010-08-12 00:20:46 +00:00
John Baldwin
60c7b36b7a Update various places that store or manipulate CPU masks to use cpumask_t
instead of int or u_int.  Since cpumask_t is currently u_int on all
platforms this should just be a cosmetic change.
2010-08-11 23:22:53 +00:00
Attilio Rao
807ef45666 IPI handlers may run generally with interrupts disabled because they
are served via an interrupt gate.

However, that doesn't explicitly prevent preemption and thread
migration thus scheduler pinning may be necessary in some handlers.
Fix that.

Tested by:	gianni
MFC after:	1 month
2010-08-11 10:51:27 +00:00
Attilio Rao
7cd8b4cd42 Fix a typo due to a stale version of the patch.
Reported by:	gianni, rdivacky
MFC after:	1 month
X-MFC:		211149
2010-08-10 18:29:39 +00:00
Attilio Rao
4c967b618d Fix some places that may use cpumask_t while they still use 'int' types.
While there, also fix some places assuming cpu type is 'int' while
u_int is really meant.

Note: this will also fix some possible races in per-cpu data accessings
to be addressed in further commits.

In collabouration with:	Yahoo! Incorporated (via sbruno and peter)
Tested by:	gianni
MFC after:	1 month
2010-08-10 16:14:10 +00:00
Attilio Rao
d35534bf42 Simplify the logic for handling ipi_selected() and ipi_cpu() in the
amd64/i386 case.

Reviewed by:	jhb
Tested by:	gianni
MFC after:	1 month
X-MFC:		210939
2010-08-09 20:25:06 +00:00
David Malone
ee04083c8a Don't pass sizeof(u_int) to an argument of SYSCLT_PROC that ends up not
being used.
2010-08-08 20:34:53 +00:00
Konstantin Belousov
1757d9699d Prefer struct sysentvec sv_psstrings to hardcoding FREEBSD32_PS_STRINGS
in the compat32 code. Use sv_usrstack instead of FREEBSD32_USRSTACK as well.

MFC after:	1 week
2010-08-07 11:57:13 +00:00
Bernhard Schmidt
5ec432ed82 Fix whitespace nits.
PR:		conf/148989
Submitted by:	pluknet <pluknet at gmail.com>
MFC after:	3 days
2010-08-06 18:46:27 +00:00
Jung-uk Kim
64299552b9 Remove unnecessary casting and simplify code. We are not there yet. ;-) 2010-08-06 17:21:32 +00:00
Jung-uk Kim
05db09e056 Correct argument order of acpi_restorecpu(), which was forgotten in r210804. 2010-08-06 15:59:00 +00:00
John Baldwin
d9d8d1449d Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid.  Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead.  This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by:	peter, sbruno
Reviewed by:	rookie
Obtained from:	Yahoo! (x86)
MFC after:	1 month
2010-08-06 15:36:59 +00:00
John Baldwin
e2865ebbc2 Change the MPTable and $PIR PCI-PCI bridge drivers to inherit from the
generic PCI-PCI bridge driver and only override specific methods.  This
should fix suspend/resume of PCI-PCI bridges using these drivers.
2010-08-05 17:48:37 +00:00
Jung-uk Kim
aa9928df7a Remove an unnecessary register load. 2010-08-03 16:08:58 +00:00
Jung-uk Kim
3ab42a25a9 savectx() has not been used for fork(2) for about 15 years. [1]
Do not clobber FPU thread's PCB as it is more harmful.  When we resume CPU,
unconditionally reload FPU state.

Pointed out by:	bde [1]
2010-08-03 15:32:08 +00:00
Jung-uk Kim
6305bb243c Rearrange struct pcb. r177532 (CVS r1.64 of pcb.h) moved pcb_flags to make
better use of cache lines by placing it before pcb_save (now pcb_user_save),
which is moved to the end of pcb since r210777.
2010-08-02 18:12:30 +00:00
Jung-uk Kim
a2d2c83668 - Merge savectx2() with savectx() and struct xpcb with struct pcb. [1]
savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs).  Thus,
saving additional information does not hurt and it may be even beneficial.
Unfortunately, struct pcb has grown larger to accommodate more data.
Move 512-byte long pcb_user_save to the end of struct pcb while I am here.
- savectx() now saves FPU state unconditionally and copy it to the PCB of
FPU thread if necessary.  This gives panic dump and kdb a chance to take
a look at the current FPU state even if the FPU is "supposedly" not used.
- Resuming CPU now unconditionally reinitializes FPU.  If the saved FPU
state was irrelevant, it could be in an unknown state.

Suggested by:	bde [1]
2010-08-02 17:35:00 +00:00
John Baldwin
7134e39042 Tweak the logic to disable CLFLUSH in virtual environments to work around
problems with flushing the local APIC register range so that it checks
vm_guest directly.

Reviewed by:	kib, alc
MFC after:	2 weeks
2010-08-02 17:01:23 +00:00
Xin LI
16430b12a3 In rdmsr_safe, use zero extend (by doing a 32-bit movl over
eax to itself) instead of a sign extend.

Discussed with:	stas
MFC after:	1 month
2010-07-30 21:39:28 +00:00
Xin LI
a3bc0a4e5c Improve cputemp(4) driver wrt newer Intel processors, especially
Xeon 5500/5600 series:

 - Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place
   of Tj(max) when a sane value is available, as documented
   in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane
   value we mean 70C - 100C for now);
 - Print the probe results when booting verbose;
 - Replace cpu_mask with cpu_stepping;
 - Use CPUID_* macros instead of rolling our own.

Approved by:	rpaulo
MFC after:	1 month
2010-07-29 19:08:22 +00:00
John Baldwin
536af0d751 Mark the __curthread() functions as __pure2 and remove the volatile keyword
from the inline assembly.  This allows the compiler to cache invocations of
curthread since it's value does not change within a thread context.

Submitted by:	zec (i386)
MFC after:	1 week
2010-07-29 18:44:10 +00:00
Jung-uk Kim
9727ca6a77 Fix another fallout from r208833. savectx() is used to save CPU context
for crash dump (dumppcb) and kdb (stoppcbs).  For both cases, there cannot
have a valid pointer in pcb_save.  This should restore the previous
behaviour.
2010-07-29 16:49:20 +00:00
Jung-uk Kim
39381048f0 Rename PCB_USER_FPU to PCB_USERFPU not to clash with a macro from fpu.h. 2010-07-29 16:41:21 +00:00
John Baldwin
a955c461ad The corrected error count field is dependent on CMCI, not TES.
MFC after:	1 week
2010-07-28 21:52:09 +00:00
Matthew D Fleming
d7854da193 Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size.  The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class.  This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused.  At this point inspection or memguard(9)
can be used to catch the offending code.

Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.

This code is based on work by Ron Steinke at Isilon Systems.

Reviewed by:    -arch (mostly silence)
Reviewed by:    zml
Approved by:    zml (mentor)
2010-07-28 15:36:12 +00:00
Alan Cox
a14a949872 The interpreter name should no longer be treated as a buffer that can be
overwritten.  (This change should have been included in r210545.)

Submitted by:	kib
2010-07-28 04:47:40 +00:00
John Baldwin
a3870a1826 Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy.  This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
  via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
  a CPU belongs to.  Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
  (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
  The MD code is required to populate an array of mem_affinity structures.
  Each entry in the array defines a range of memory (start and end) and a
  domain for the range.  Multiple entries may be present for a single
  domain.  The list is terminated by an entry where all fields are zero.
  This array of structures is used to split up phys_avail[] regions that
  fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
  used when fulfulling a physical memory allocation.  Right now the
  per-domain freelists are listed in a round-robin order for each domain.
  In the future a table such as the ACPI SLIT table may be used to order
  the per-domain lookup lists based on the penalty for each memory domain
  relative to a specific domain.  The lookup lists may be examined via a
  new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
  pick a lookup list when allocating memory.

Reviewed by:	alc
2010-07-27 20:33:50 +00:00
Jung-uk Kim
172754036a Simplify fldcw() macro. There is no reason to use pointer here. No object
file change after this commit (verified with md5).
2010-07-26 23:20:55 +00:00
Jung-uk Kim
07c86dcf54 Add missing ldmxcsr() prototype for lint case. 2010-07-26 23:02:18 +00:00
Jung-uk Kim
30402401a7 Reduce diff against fenv.h:
Mark all inline asms as volatile for safety.  No object file change after
this commit (verified with md5).
2010-07-26 22:16:36 +00:00
Jung-uk Kim
2e50fa36a5 FNSTSW instruction can use AX register as an operand.
Obtained from:	fenv.h
2010-07-26 21:24:52 +00:00
Jung-uk Kim
9bfb10b154 Re-implement FPU suspend/resume for amd64. This removes superfluous uses
of critical_enter(9) and critical_exit(9) by fpugetregs() and fpusetregs().
Also, we do not touch PCB flags any more.

MFC after:	1 month
2010-07-26 19:53:09 +00:00
Konstantin Belousov
970eba46d5 Remove unneeded includes.
Submitted by:	alc
MFC after:	1 week
2010-07-26 14:38:51 +00:00
Konstantin Belousov
d48dda1244 Regen 2010-07-23 21:31:03 +00:00
Konstantin Belousov
0b53d1569e Remove the linux_exec_copyin_args(), freebsd32_exec_copyin_args() may
server as well. COMPAT_FREEBSD32 is a prerequisite for COMPAT_LINUX32.

Reviewed by:	alc
MFC after:	3 weeks
2010-07-23 21:30:33 +00:00
Alan Cox
69a8f9e3d1 Eliminate a little bit of duplicated code. 2010-07-23 18:58:27 +00:00
Konstantin Belousov
87d45a0392 When compat32 binary asks for the value of hw.machine_arch, report the
name of 32bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.

Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.

Reviewed by:	jhb, nwhitehorn
MFC after:	2 weeks
2010-07-22 09:13:49 +00:00
Alexander Motin
060d7431b5 Add hints for i8254 timer on i386 and amd64. Some people report about
systems with PnP/ACPI not reporting i8254 timer. In some cases it can be
fatal, as i8254 can be the only available time counter hardware. From other
side we are now heavily depend on i8254 timer and till the last time it's
init/usage was completely hardcoded. So this change just restores previous
behavior in more regular fashion.
2010-07-16 23:21:46 +00:00
Alexander Motin
fcc06be1b2 Move functions declaration to MI code, following implementation. 2010-07-15 17:49:35 +00:00
Alan Cox
7db39bcb05 Optimize pmap_remove()'s handling of PG_G mappings. Specifically,
instead of calling pmap_invalidate_page() for each PG_G mapping, call
pmap_invalidate_range() for each range of PG_G mappings.  In addition,
eliminate a redundant call to pmap_invalidate_page().  Both
pmap_remove_pte() and pmap_remove_page() called pmap_invalidate_page()
when the mapping had the PG_G attribute.  Now, only pmap_remove_page()
calls pmap_invalidate_page().  Altogether, these changes eliminate 53%
of the TLB shootdowns for a "buildworld" on a ZFS file system.  On
FFS, the reduction is 3%.

MFC after:	6 weeks
2010-07-15 16:25:51 +00:00
Bernhard Schmidt
774f94f14c - Update 6000 firmware to 9.221.4.1
- Add 6050 firmware

MFC after:	2 weeks
2010-07-15 11:26:07 +00:00
Warner Losh
1003cfe94d Remove obsolete undef of COPY_SIGCODE. It appears to have not been
used in FreeBSD in quite some time (maybe since before 4.4-lite :)

Submitted by:	bde
2010-07-13 15:06:13 +00:00
Jung-uk Kim
588697d478 Move i386-inherited logic of building ACPI headers for acpi_wakeup.c into
better places and remove intermediate makefile and shell scripts.  This
makes parallel kernel build little bit safer for amd64.
2010-07-12 21:08:35 +00:00
John Baldwin
05f81eb67d Remove a dead test. We already exclude NMI traps from this code in an
earlier condition.

MFC after:	1 week
2010-07-12 20:45:37 +00:00
Konstantin Belousov
aaa95ccb50 When switching the thread from the processor, store %dr7 content
into the pcb before disabling watchpoints. Otherwise, when the
thread is restored on a processor, watchpoints are still disabled.

Submitted by:	Tijl Coosemans <tijl coosemans org>
	(I would be much happier if Tijl commited this himself)
MFC after:	1 week
2010-07-12 19:59:15 +00:00
Alan Cox
8155e5d561 Reduce the number of global TLB shootdowns generated by pmap_qenter().
Specifically, teach pmap_qenter() to recognize the case when it is being
asked to replace a mapping with the very same mapping and not generate
a shootdown.  Unfortunately, the buffer cache commonly passes an entire
buffer to pmap_qenter() when only a subset of the mappings are changing.
For the extension of buffers in allocbuf() this was resulting in
unnecessary shootdowns.  The addition of new pages to the end of the
buffer need not and did not trigger a shootdown, but overwriting the
initial mappings with the very same mappings was seen as a change that
necessitated a shootdown.  With this change, that is no longer so.

For a "buildworld" on amd64, this change eliminates 14-15% of the
pmap_invalidate_range() shootdowns, and about 4% of the overall
shootdowns.

MFC after:	3 weeks
2010-07-10 18:22:44 +00:00
Konstantin Belousov
2680dac9e1 For both i386 and amd64 pmap,
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
  dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.

Submitted by:	alc [1]
Reviewed by:	alc
MFC after:	3 weeks
2010-07-09 20:05:56 +00:00
Alan Cox
77cb6e6f8d Correctly maintain the per-cpu field "curpmap" on amd64 just like we
do on i386.  The consequences of not doing so on amd64 became apparent
with the introduction of the COUNT_IPIS and COUNT_XINVLTLB_HITS
options.  Specifically, single-threaded applications were generating
unnecessary IPIs to shoot-down the TLB on other processors.  However,
this is clearly nonsensical because a single-threaded application is
only running on the current processor.  The reason that this happens
is that pmap_activate() is unable to properly update the old pmap's
field "pm_active" without the correct "curpmap".  So, in effect, stale
bits in "pm_active" were leading pmap_protect(), pmap_remove(),
pmap_remove_pages(), etc. to flush the TLB contents on some arbitrary
processor that wasn't even running the same application.

Reviewed by:	kib
MFC after:	3 weeks
2010-07-08 03:35:00 +00:00
Rui Paulo
80599de862 Fix style issues with the previous commit, namely
use-tab-instead-of-space and don't use underscores in macro variables.

Pointed out by:	bde
2010-07-07 12:08:58 +00:00
Kevin Lo
054d87d697 Add the u3g(4) driver. I can't find any reason why it's not here. 2010-07-07 09:23:46 +00:00
Rui Paulo
8923c96ee1 Introduce USD_{SET,GET}{BASE,LIMIT}. These help setting up the user
segment descriptor hi and lo values. Idea from Solaris.

Reviewed by:	kib
2010-07-06 16:56:27 +00:00
Alexander Motin
af565edaaa Revert r209638. After commit, there appeared to be more people who liked
previous name of stray interrupt counters, then responded to the list.
2010-07-02 17:22:15 +00:00
Alexander Motin
b7bc6aa726 Make stray irq counters have format alike to other counters. Unified format
makes string processing (for example by `systat -vm`) easier.
2010-07-01 21:58:46 +00:00
John Baldwin
fc0de8f0b6 Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
2010-06-30 18:03:42 +00:00
Konstantin Belousov
13cedde2cb Regenerate 2010-06-28 18:17:21 +00:00
Konstantin Belousov
595473a587 Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64
ABI specifies the DF should be zero, and newer compilers do not clear
DF before using DF-sensitive instructions.

The DF clearing for signal handlers was done some time ago.

MFC after:	1 week
2010-06-23 20:44:07 +00:00
Konstantin Belousov
692add74d8 Fix bugs on pc98, use npxgetuserregs() instead of npxgetregs() for
get_fpcontext(), and npxsetuserregs() for set_fpcontext). Also,
note that usercontext is not initialized anymore in fpstate_drop().

Systematically replace references to npxgetregs() and npxsetregs()
by npxgetuserregs() and npxsetuserregs() in comments.

Noted by:	bde
2010-06-23 12:17:13 +00:00
Alexander Motin
25eb1b8c15 Some style fixes for r209371.
Submitted by:	jhb@
2010-06-22 16:20:10 +00:00
Alexander Motin
875b8844be Implement new event timers infrastructure. It provides unified APIs for
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.

Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.

For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.

This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
 LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
 HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
 i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
 RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.

Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.
2010-06-20 21:33:29 +00:00
Konstantin Belousov
9e3e64e797 Only enable kdtrace hook in the LINT on the architectures that implement it. 2010-06-18 18:51:09 +00:00
Konstantin Belousov
b376ebac6b In the ia32_{get,set}_fpcontext(), use fpu{get,set}userregs instead
of fpu{get,set}regs.

Noted by:	bde
MFC after:	1 month
2010-06-17 12:35:17 +00:00
Alexander Motin
d364638110 Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64.
This information can be very valuable for CPU sleep-time (and respectively
idle power consumption) optimization.

Add counters for timer-related IPIs.

Reviewed by:	jhb@ (previous version)
2010-06-17 11:54:49 +00:00
John Baldwin
61d3f0bab2 Restore the machine check register banks on resume. For banks being
monitored via CMCI, reset the interrupt threshold to 1 on resume.

Reviewed by:	jkim
MFC after:	2 weeks
2010-06-15 18:51:41 +00:00
Konstantin Belousov
07c809237a Remove two obsoleted comments, add a note about 32bit compatibility.
MFC after:	1 month
2010-06-15 18:16:04 +00:00
Konstantin Belousov
4a23ecc77e Rename CRITSECT_ASSERT to CRITICAL_ASSERT.
Suggested by:	jhb
MFC after:	1 month
2010-06-15 14:59:35 +00:00
Konstantin Belousov
997534958e Use critical sections instead of disabling local interrupts to ensure
the consistency between PCPU fpcurthread and the state of the FPU.

Explicitely assert that the calling conventions for fpudrop() are
adhered too. In cpu_thread_exit(), add missed critical section entrance.

Reviewed by:	bde
Tested by:	pho
MFC after:	1 month
2010-06-15 09:19:33 +00:00
Jung-uk Kim
c977e11721 Fix ACPI suspend/resume on amd64, which was broken since r208833.
We need actual storage for FPU state to save and restore.
2010-06-14 20:08:26 +00:00
Alexander Motin
2f9fc3899b Fix bug introduced in SVN rev 194985. When calling pic_assign_cpu()
for pre-bound IRQs during boot, submit there LAPIC ID, same as in other
places, not CPU ID.
2010-06-14 07:38:53 +00:00
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
Alan Cox
9124d0d6a3 Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set.  Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
2010-06-11 15:49:39 +00:00
Alexander Kabaev
60743cbd22 Do not require pos parameter to be zero in MAP_ANONYMOUS mmap requests
in Linux emulation layer. Linux seems to only require that pos is
page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter
checking is too strict to allow some Linux binaries to run. tsMuxeR is
one example of such a binary.

Discussed with:	jhb
MFC after:	1 week
2010-06-10 17:59:47 +00:00
Alan Cox
ce18658792 Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines.  Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked.  Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick().  (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page.  Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete.  Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used.  Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by:	kib@
2010-06-10 16:56:35 +00:00
John Baldwin
b9cd2f771a Move the MD support for PCI message signalled interrupts to the x86 tree
as it is identical for i386 and amd64.
2010-06-08 18:36:03 +00:00
John Baldwin
2465e30f0c Move the machine check support code to the x86 tree since it is identical
on i386 and amd64.

Requested by:	alc
2010-06-08 18:04:07 +00:00
John Baldwin
53a908cb07 Move the I/O APIC code to the x86 tree since it is identical on i386 and
amd64.
2010-06-08 17:51:21 +00:00
John Baldwin
bfc7a4fc48 - Use a bit more care when moving I/O APIC interrupts between CPUs. Mask
the interrupt followed by a brief delay if it is not currently masked
  before moving the interrupt.
- Move the icu_lock out of ioapic_program_intpin() and into callers.  This
  closes a race where ioapic_program_intpin() could use a stale value of
  the masked state to compute the masked bit in the register.

Reviewed by:	mav
MFC after:	2 weeks
2010-06-08 17:08:13 +00:00
Konstantin Belousov
4f24f88ebb Style-compilant order of declarations.
Noted by:	bde
MFC after:	1 month
2010-06-06 16:13:50 +00:00
Konstantin Belousov
6cf9a08d2c Introduce the x86 kernel interfaces to allow kernel code to use
FPU/SSE hardware. Caller should provide a save area that is chained
into the stack of the areas; pcb save_area for usermode FPU state is
on top. The pcb now contains a pointer to the current FPU saved area,
used during FPUDNA handling and context switches.  There is also a
facility to allow the kernel thread to use pcb save_area.

Change the dreaded warnings "npxdna in kernel mode!" into the panics
when FPU usage is not registered.

KPI discussed with:	fabient
Tested by:    pho, fabient
Hardware provided by:	Sentex Communications
MFC after:    1 month
2010-06-05 15:59:59 +00:00
Alan Cox
b2830a9649 Eliminate a stale comment. 2010-05-31 06:06:10 +00:00
Alan Cox
72dc3eb65b Simplify the inner loop of pmap_collect(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty.
Instead, wait until the loop completes.
2010-05-30 18:48:41 +00:00
Alan Cox
8f0d5d3b9f When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first.  Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.

When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter().  In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page.  (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.)  This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.

Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping.  Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.

There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.

Tidy up the variable declarations at the start of pmap_enter().
2010-05-29 17:10:45 +00:00
John Baldwin
0c86af8162 Defer initializing machine checks for the boot CPU until the local APIC is
fully configured.

MFC after:	1 month
2010-05-28 17:50:24 +00:00
Alan Cox
52d8ba372e Defer freeing any page table pages in pmap_remove_all() until after the
page queues lock is released.  This may reduce the amount of time that the
page queues lock is held by pmap_remove_all().
2010-05-28 06:49:57 +00:00
Alan Cox
c46b90e90a Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced().  Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively.  In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting.  This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages().  This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition.  Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by:	kib
2010-05-26 18:00:44 +00:00
John Baldwin
58ccad7ddc Add support for corrected machine check interrupts. CMCI is a new local
APIC interrupt that fires when a threshold of corrected machine check
events is reached.  CMCI also includes a count of events when reporting
corrected errors in the bank's status register.  Note that individual
banks may or may not support CMCI.  If they do, each bank includes its own
threshold register that determines when the interrupt fires.  Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl).  The threshold
is also adjusted on each hourly poll which will lower the threshold once
events stop occurring.

Tested by:	Sailaja Bangaru  sbappana at yahoo com
MFC after:	1 month
2010-05-24 15:45:05 +00:00
Alan Cox
567e51e18c Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
Alexander Motin
dbd55f3ff0 - Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
2010-05-24 11:40:49 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Alexander Motin
fa1ed4bd1a Unify local_apic.c for x86 archs, 2010-05-23 17:45:01 +00:00
John Baldwin
e826ef1ec4 - Adjust the whitespace for the lines that output fields in 'show pcpu' in
DDB so that all the fields line up.
- Print out the tid of the per-CPU idlethread instead of the pid since
  the idle process is now shared across all idle threads.

MFC after:	1 month
2010-05-21 17:17:56 +00:00
Poul-Henning Kamp
065b12a703 Rename an argument from "exp" to "expect" since the former makes FlexeLint
uneasy, in case anybody think it might be exp(3) in libm.

This also makes it consistent with other archs.
2010-05-20 06:18:03 +00:00
John Baldwin
3b642a049b Add constants for the optional EOI suppression support in local APICs and
EOI registers in I/O APICs.
2010-05-19 19:52:41 +00:00
Alan Cox
9ab6032f73 On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy.  Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup().  We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try.  Previously, the call to pmap_remove_write()
would have failed silently.
2010-05-16 23:45:10 +00:00
Konstantin Belousov
7565f3e837 Do not use .extern, it is not strictly needed with gas and it is custom
to omit it.

Requested by:	bde
MFC after:	6 days
2010-05-13 09:59:10 +00:00
Konstantin Belousov
6a440fc3e0 Route all returns from the interrupts and faults through the doreti_iret
labeled iretq instruction.

Suppose that multithreaded process executes two threads, currently
scheduled on different processors. Let assume that thread A executes
using %cs or %ss pointing into the descriptor from LDT. If IPI comes
which handler does not return by jump to doreti, and meantime thread B
invalidates descriptor pointed to by %cs or %ss, then iretq from IPI
handler could fault.

Routing the return by doreti_iret allows kernel to catch the situation
and recover from it by sending signal to the usermode.

Tested by:	pho
MFC after:	1 week
2010-05-12 10:29:35 +00:00
Konstantin Belousov
99b1ff2ac4 Remove unneeded overrides of the segment registers in the inner trap
frame upon segment register load fault. The doreti procedure does not
load segment registers when returning to the kernel frame, and current
values in the segment descriptor cache already allow the kernel mode
to run, not modified by faulted loaded.

Suggested by:	bde
Tested by:	pho
MFC after:	1 week
2010-05-12 10:29:06 +00:00
Alan Cox
3c4a24406b Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Alan Cox
7024db1d40 Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free().  Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.
2010-05-06 16:39:43 +00:00
Konstantin Belousov
db8fd40e9f Add definitions for Intel AESNI CPUID bits and print the capabilities
on boot.

Hardware provided by:	Sentex Communications
MFC after:	1 week
2010-05-05 21:07:47 +00:00
Joel Dahl
8e0ad55abb Switch to our preferred 2-clause BSD license.
Approved by:	kmacy
2010-05-05 20:39:02 +00:00
Konstantin Belousov
bf6f1b56c5 Style and comment adjustements.
Suggested and reviewed by:	bde
MFC after:	3 days
2010-05-03 14:30:49 +00:00
Konstantin Belousov
6a5baa54fd Remove debugging code that was not used once since commit.
Suggested by:	bde
MFC after:	1 week
2010-05-01 13:15:35 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Attilio Rao
d8b878873e - Extract the IODEV_PIO interface from ia64 and make it MI.
In the end, it does help fixing /dev/io usage from multithreaded
  processes.
- On i386 and amd64 the old behaviour is kept but multithreaded
  processes must use the new interface in order to work well.
- Support for the other architectures is greatly improved, where
  necessary, by the necessity to define very small things now.

Manpage update will happen shortly.

Sponsored by:	Sandvine Incorporated
PR:		threads/116181
Reviewed by:	emaste, marcel
MFC after:	3 weeks
2010-04-28 15:38:01 +00:00
Konstantin Belousov
8bac98182a Style: use #define<TAB> instead of #define<SPACE>.
Noted by:	bde, pluknet gmail com
MFC after:	11 days
2010-04-27 09:48:43 +00:00
Kip Macy
cd4d97b439 missed pv access before pmap lock 2010-04-25 23:51:05 +00:00
Kip Macy
c28d264aa0 Incremental reduction of delta with head_page_lock_2 branch
- replace modification of pmap resident_count with pmap_resident_count_{inc,dec}
- the pv list is protected by the pmap lock, but in several cases we are relying
  on the vm page queue mutex, move pv_va read under the pmap lock
2010-04-25 23:18:02 +00:00
Andrew Thompson
f6a9d95ba3 Set USB_DEBUG like the other platforms, I had turned it off to test the build
before committing r207077.

Spotted by:	marius
2010-04-25 22:01:32 +00:00
Alan Cox
0d2e1c3e39 Clearing a page table entry's accessed bit (PG_A) and setting the
page's PG_REFERENCED flag in pmap_protect() can't really be justified.
In contrast to pmap_remove() or pmap_remove_all(), the mapping is not
being destroyed, so the notion that the page was accessed is not lost.
Moreover, clearing the page table entry's accessed bit and setting the
page's PG_REFERENCED flag can throw off the page daemon's activity
count calculation.  Finally, in my tests, I found that 15% of the
atomic memory operations being performed by pmap_protect() were only
to clear PG_A, and not change protection.  This could, by itself, be
fixed, but I don't see the point given the above argument.

Remove a comment from pmap_protect_pde() that is no longer meaningful
after the above change.
2010-04-25 20:40:45 +00:00
Kip Macy
81b47f28bc apply style(9) changes applied to head_page_lock_2
requested by: kib@
2010-04-24 21:17:07 +00:00
Alan Cox
7b85f59183 Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters.  For example, in mincore(), clearing the reference
bits has two negative consequences.  First, it throws off the activity
count calculations performed by the page daemon.  Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should.  Consequently, the page could be
deactivated prematurely by the page daemon.  Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page.  However, there is a second problem for which
that is not a solution.  In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping.  Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
2010-04-24 17:32:52 +00:00
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Jung-uk Kim
b834123032 If a conditional jump instruction has the same jt and jf, do not perform
the test and jump unconditionally.
2010-04-22 23:47:19 +00:00
Andrew Thompson
b850ecc180 Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had
the illusion of a tunable setting but was always turned on regardless.

MFC after:	1 week
2010-04-22 21:31:34 +00:00
Konstantin Belousov
94c6c6ba67 As was done in r155238 for i386 and in r155239 for amd64, clear the carry
flag for ia32 binary executed on amd64 host in get_mcontext().

PR:	kern/92110 (one more time)
Reported by:	stas
MFC after:	1 week
2010-04-21 11:17:16 +00:00
Rui Paulo
ff569d8436 Rename the cyclic global variable lapic_cyclic_clock_func to just
cyclic_clock_func. This will make more sense when we start developing non
x86 cyclic version.
2010-04-20 17:03:30 +00:00
Pyun YongHyeon
d193ed0bed Add driver for Silicon Integrated Systems SiS190/191 Fast/Gigabit Ethernet.
This driver was written by Alexander Pohoyda and greatly enhanced
by Nikolay Denev. I don't have these hardwares but this driver was
tested by Nikolay Denev and xclin.

Because SiS didn't release data sheet for this controller, programming
information came from Linux driver and OpenSolaris. Unlike other open
source driver for SiS190/191, sge(4) takes full advantage of TX/RX
checksum offloading and does not require additional copy operation in
RX handler.
The controller seems to have advanced offloading features like VLAN
hardware tag insertion/stripping, TCP segmentation offload(TSO) as
well as jumbo frame support but these features are not available
yet. Special thanks to xclin <xclin<> cs dot nctu dot edu dot tw>
who sent fix for receiving VLAN oversized frames.
2010-04-14 20:45:33 +00:00
Konstantin Belousov
b71e04d3a8 ld_gs_base is executing with stack containing only the frame,
temporary pushed %rflags has been popped already.

Pointy hat to:	kib
MFC after:	3 days
2010-04-14 20:04:55 +00:00
Konstantin Belousov
5f82d16eb1 Change printf() calls to uprintf() for sigreturn() and trap() complaints
about inacessible or wrong mcontext, and for dreaded "kernel trap with
interrupts disabled" situation. The later is changed when trap is
generated from user mode (shall never be ?).

Normalize the messages to include both pid and thread name.

MFC after:	1 week
2010-04-13 10:12:58 +00:00
Konstantin Belousov
a35d07a831 Handle a case when non-canonical address is loaded into the fsbase or
gsbase MSR.

MFC after:	3 days
2010-04-10 18:38:11 +00:00
Fabien Thomas
1fa7f10bac - Support for uncore counting events: one fixed PMC with the uncore
domain clock, 8 programmable PMC.
- Westmere based CPU (Xeon 5600, Corei7 980X) support.
- New man pages with events list for core and uncore.
- Updated Corei7 events with Intel 253669-033US December 2009 doc.
  There is some removed events in the documentation, they have been
  kept in the code but documented in the man page as obsolete.
- Offcore response events can be setup with rsp token.

Sponsored by: NETASQ
2010-04-02 13:23:49 +00:00
John Baldwin
90dfe31955 Add a handler for the local APIC error interrupt. For now it just prints
out the current value of the local APIC error register when the interrupt
fires.

MFC after:	1 week
2010-03-29 19:13:34 +00:00
John Baldwin
6fb5da5092 Cosmetic tweak to use a type suffix instead of a cast to force a constant
to be a long.
2010-03-29 18:47:04 +00:00
Ed Schouten
510ea843ba Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.
A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.
2010-03-28 13:13:22 +00:00
Alan Cox
3792de2e87 Correctly handle preemption of pmap_update_pde_invalidate().
X-MFC after:	r205573
2010-03-27 23:53:47 +00:00
Nathan Whitehorn
a107d8aac9 Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
John Baldwin
121b3af9f2 Remove unneeded type specifiers from 64-bit constants. The compiler
infers their natural type from the constants' values.

Submitted by:	bde
MFC after:	3 days
2010-03-22 15:08:26 +00:00
Alan Cox
d088493ba0 Eliminate a pointless TLB invalidation from pmap_bootstrap(). No mappings
whatsoever are changed between the earlier load_cr3() and this invalidation.
2010-03-21 00:21:59 +00:00
Alan Cox
cea8f9dfaf I am told by AMD that the machine check hardware on the instruction TLB
won't generate bogus exceptions.  Therefore, the implementation of the
"unofficial" workaround needn't mask L1TP errors by the instruction cache
unit.
2010-03-21 00:13:11 +00:00
Andriy Gapon
9344361b66 pmap amd64/i386: fix a typo in a comment
MFC after:	3 days
2010-03-19 14:48:32 +00:00
John Baldwin
42c93b8d31 Use the same policy for rejecting / not-reject ACPI tables with incorrect
checksums as the base acpi(4) driver.  This fixes a problem where the MADT
parser would reject the MADT table during early boot causing the MP Table
to be, but then the acpi(4) driver would attach and use non-SMP interrupt
routing.

Tested by:	Alastair Hogge  agh of coolrhaug com
MFC after:	1 week
2010-03-19 12:43:18 +00:00
John Baldwin
a311ca2f45 - Extend the machine check record structure to include several fields useful
for parsing model-specific and other fields in machine check events
  including the global machine check capabilities and status registers,
  CPU identification, and the FreeBSD CPU ID.
- Report these added fields in the console log of a machine check so that
  a record structure can be reconstituted from the console messages.
- Parse new architectural errors including memory controller errors.

MFC after:	1 week
2010-03-16 16:01:19 +00:00
Ed Schouten
338f1debcd Remove COMPAT_43TTY from stock kernel configuration files.
COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.
2010-03-13 09:21:00 +00:00
John Baldwin
55c4e01602 Fix the previous attempt to fix kernel builds of HEAD on 7.x. Use the
__gnu_inline__ attribute for PMAP_INLINE when using the 7.x compiler to
match what 7.x uses for PMAP_INLINE.
2010-03-12 03:08:47 +00:00
Nathan Whitehorn
841c0c7ec7 Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by:	kib, jhb
2010-03-11 14:49:06 +00:00
John Baldwin
343803ad83 Print out the family and model from the cpu_id. This is especially useful
given the advent of the extended family and extended model fields.  The
values are printed in hex to match their common usage in documentation.

Submitted by:	Alexander Best
MFC after:	1 week
2010-03-11 14:17:37 +00:00
Konstantin Belousov
2a595a404f Fall back to wbinvd when region for CLFLUSH is >= 2MB.
Submitted by:	Kevin Day <toasty dragondata com>
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-10 15:50:38 +00:00
John Baldwin
f126fa5fee Now that the workaround for the AMD 10h CPUs is in place, re-enable machine
checks by default on amd64.

Discussed with:	alc
2010-03-09 15:12:53 +00:00
Alan Cox
102c07edb3 Implement AMD's recommended workaround for Erratum 383 on Family 10h
processors.  With this workaround, superpage promotion can be re-enabled
under virtualization.  Moreover, machine check exceptions can safely be
enabled when FreeBSD is running natively on Family 10h processors.

Most of the credit should go to Andriy Gapon for diagnosing the error and
working with Borislav Petkov at AMD to document it.  Andriy also reviewed
and tested my patches.

Discussed with:	jhb
MFC after:	3 weeks
2010-03-09 03:30:31 +00:00
Joel Dahl
1edcf74de7 The NetBSD Foundation has granted permission to remove clause 3 and 4 from
the software.

Obtained from:	NetBSD
2010-03-03 17:55:51 +00:00
Attilio Rao
306c0c6ea0 Improving the clocks auto-tunning by firstly checking if the atrtc may be
correctly initialized and just then assign to softclock/profclock.
Right now, some atrtc seems reporting strange diagnostic error* making the
current pattern bogus.

In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64,
now accepts as arguments the desired sources to handle, and returns the
actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no
meaning in calling such function).
This allows to bring out into commont x86 code the handling part for
machdep.lapic_allclocks tunable, which is retained.

Sponsored by:	Sandvine Incorporated
Tested by:	yongari, Richard Todd
		<rmtodd at ichotolot dot servalan dot com>
MFC:		3 weeks
X-MFC:		r202387, 204309
2010-03-03 17:13:29 +00:00
John Baldwin
977cb83962 Print the contents of the miscellaneous (MISC) register to the console if
it is valid along with the other register values when a machine check is
encountered.

MFC after:	1 week
2010-03-01 13:56:15 +00:00
Alan Cox
0b993ee5fd When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs.  Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.

Reviewed by:	jhb, kib
MFC after:	3 days
2010-02-27 18:00:57 +00:00
Attilio Rao
3258030144 Introduce the new kernel sub-tree x86 which should contain all the code
shared and generalized between our current amd64, i386 and pc98.

This is just an initial step that should lead to a more complete effort.
For the moment, a very simple porting of cpufreq modules, BIOS calls and
the whole MD specific ISA bus part is added to the sub-tree but ideally
a lot of code might be added and more shared support should grow.

Sponsored by:	Sandvine Incorporated
Reviewed by:	emaste, kib, jhb, imp
Discussed on:	arch
MFC:		3 weeks
2010-02-25 14:13:39 +00:00
Justin T. Gibbs
daf6545e61 Enforce stronger semantics for bus-dma alignment (currently only on amd64).
Now all contiguous regions returned from bus-dma will be aligned to the
alignment constraint and all but the last region are guaranteed to be
a multiple of the alignment in length.  This also means that the relative
alignment of two adjacent bytes in the I/O stream have a difference of 1
even if they are not physically contiguous.

The old code, when needing to perform a copy in order to align data, only
copied the amount of data needed to reach the next page boundary.  This
often left an unaligned end to the segment.  Drivers such as Xen's blkfront
can't deal with such segments.

The downside to this approach is that, once an unaligned region is encountered,
the remainder of the I/O will be bounced.  However, bouncing should be rare.
It is typically caused by non-performance critical userland programs that
don't bother to align their I/O buffers (e.g. bsdlabel).  In-kernel I/O
buffers are always aligned to at least a page boundary.

Reviewed by:	scottl
MFC after:      2 weeks
2010-02-22 17:03:45 +00:00
Alan Cox
cc611daaf4 Since create_pagetables() zeroes the page tables, pmap_bootstrap() needn't
zero *CMAP1.
2010-02-21 03:49:39 +00:00
Ed Schouten
0b918ea7a9 Remove redundant inclusion of <sys/cdefs.h>.
In my previous commit I should have moved the inclusion to the top,
instead of adding a second one.
2010-02-20 14:13:47 +00:00
Ed Schouten
d502d4503a Add <sys/cdefs.h>.
This header file uses __packed, without including <sys/cdefs.h>. This
means it cannot be used in the way described in sysarch(3) by only
including <machine/sysarch.h>.
2010-02-20 13:33:50 +00:00
Ed Schouten
ddc534916d Allow the pmap code to be built with GCC from FreeBSD 7 again.
This patch basically gives us the best of both worlds. Instead of
forcing the compiler to emulate GNU-style inline semantics even though
we're using ISO C99, it will only use GNU-style inlining when the
compiler is configured that way (__GNUC_GNU_INLINE__).

Tested by:	jhb
2010-02-18 14:28:38 +00:00
Attilio Rao
c1210a7d97 Adjust style (following the already existing rules) for the newly
introduced option DEADLKRES.

Reported by:	danfe, julian, avg
2010-02-15 23:44:48 +00:00
Attilio Rao
88cbfa852e Add the options DEADLKRES (introducing the deadlock resolver thread) in
the 'debugging' section of any HEAD kernel and enable for the mainstream
ones, excluding the embedded architectures.
It may, of course, enabled on a case-by-case basis.

Sponsored by:	Sandvine Incorporated
Requested by:	emaste
Discussed with:	kib
2010-02-10 16:30:04 +00:00
Rebecca Cran
c7ea7c4618 Update documentation for the iwn and iwnfw drivers: they support the 1000, 5150, 6000 and 6050 devices too, with firmware modules for the 4965, 1000, 5000, 5150 and 6000.
Add documentation for mwl and all the wireless firmware drivers.

Approved by:	rrs (mentor)
2010-02-08 21:38:42 +00:00
Robert Noland
7b59c0c5f5 Enable MTRR on all VIA CPUs that claim support (amd64).
This is the amd64 part of r203289.

Noticed by:	jhb
MFC after:	2 weeks
2010-02-02 01:20:33 +00:00
Robert Noland
b1ba33ffbe Welcome drm support for VIA unichrome chips.
MFC after:	2 weeks
2010-01-31 14:30:39 +00:00
Andriy Gapon
c4d16d268f add static qualifier to definition of a function already declared static
This is for improving code readibility only.

MFC after:	1 week
2010-01-29 10:20:11 +00:00
Edward Tomasz Napierala
48cd7df296 Fix array overflow. This routine is only called from procfs,
which is not mounted by default, and I've been unable to trigger
a panic without this fix applied anyway.

Reviewed by:	kib, cperciva
2010-01-24 12:13:38 +00:00
Alan Cox
040a1eeab2 Simplify the mapping of the system message buffer. Use the direct map just
like ia64 does.
2010-01-23 20:28:37 +00:00
Konstantin Belousov
5b1162b964 For PT_TO_SCE stop that stops the ptraced process upon syscall entry,
syscall arguments are collected before ptracestop() is called. As a
consequence, debugger cannot modify syscall or its arguments.

For i386, amd64 and ia32 on amd64 MD syscall(), reread syscall number
and arguments after ptracestop(), if debugger modified anything in the
process environment. Since procfs stopeven requires number of syscall
arguments in p_xstat, this cannot be solved by moving stop/trace point
before argument fetching.

Move the code to read arguments into separate function
fetch_syscall_args() to avoid code duplication. Note that ktrace point
for modified syscall is intentionally recorded twice, once with original
arguments, and second time with the arguments set by debugger.

PT_TO_SCX stop is executed after cpu_syscall_set_retval() already.

Reported by:	Ali Polatel <alip exherbo org>
Briefly discussed with:	jhb
MFC after:	3 weeks
2010-01-23 11:45:35 +00:00
John Baldwin
13c18821fa Move the examples for the 'hints' and 'env' keywords from various GENERIC
kernel configs into NOTES.

Reviewed by:	imp
2010-01-19 17:20:34 +00:00
Ed Schouten
91bfd816f2 Recommit r193732:
Remove __gnu89_inline.

  Now that we use C99 almost everywhere, just use C99-style in the pmap
  code. Since the pmap code is the only consumer of __gnu89_inline, remove
  it from cdefs.h as well. Because the flag was only introduced 17 months
  ago, I don't expect any problems.

  Reviewed by:    alc

It was backed out, because it prevented us from building kernels using a
7.x compiler. Now that most people use 8.x, there is nothing that holds
us back. Even if people run 7.x, they should be able to build a kernel
if they run `make kernel-toolchain' or `make buildworld' first.
2010-01-19 15:31:18 +00:00
Attilio Rao
a26cb6d547 Handling all the three clocks (hardclock, softclock, profclock) with the
LAPIC may lead to aliasing for softclock and profclock because frequencies
are sized in order to fit mainly hardclock.
atrtc used to take care of the softclock and profclock and it does still
do, if the LAPIC can't handle the clocks properly.

Revert the change when the LAPIC started taking charge of all three of
them and let atrtc handle softclock and profclock if not explicitly
requested. Such request can be made setting != 0 the new tunable
machdep.lapic_allclocks or if the new device ATPIC is not present
within the i386 kernel config (atrtc is linked to atpic presence).

Diagnosed by:	Sandvine Incorporated
Reviewed by:	jhb, emaste
Sponsored by:	Sandvine Incorporated
MFC:		3 weeks
2010-01-15 16:04:30 +00:00
John Baldwin
ca6c375670 Update the ident for the XENHVM kernel config to match the filename.
MFC after:	1 week
2010-01-14 15:07:18 +00:00
Gavin Atkinson
7964930201 Spell "Hz" correctly wherever it is user-visible.
PR:		bin/142566
Submitted by:	N.J. Mann   njm njm.me.uk
Approved by:	ed (mentor)
MFC after:	2 weeks
2010-01-12 17:59:58 +00:00
Marcel Moolenaar
409a390c33 Use io(4) for I/O port access on ia64, rather than through sysarch(2).
I/O port access is implemented on Itanium by reading and writing to a
special region in memory. To hide details and avoid misaligned memory
accesses, a process did I/O port reads and writes by making a MD system
call. There's one fatal problem with this approach: unprivileged access
was not being prevented. /dev/io serves that purpose on amd64/i386, so
employ it on ia64 as well. Use an ioctl for doing the actual I/O and
remove the sysarch(2) interface.

Backward compatibility is not being considered. The sysarch(2) approach
was added to support X11, but support for FreeBSD/ia64 was never fully
implemented in X11. Thus, nothing gets broken that didn't need more work
to begin with.

MFC after:	1 week
2010-01-11 18:10:13 +00:00
Alan Cox
ac24a8ea24 Simplify pmap_init(). Additionally, correct a harmless misbehavior on i386.
Specifically, where locore had created large page mappings for the kernel,
the wrong vm page array entries were being initialized.  The vm page array
entries for the pages containing the kernel were being initialized instead
of the vm page array entries for page table pages.

MFC after:	1 week
2010-01-11 16:01:20 +00:00
Alan Cox
92697f16aa Eliminate unused declarations. 2010-01-10 21:00:52 +00:00
Warner Losh
87948dfdf2 Add INCLUDE_CONFIG_FILE in GENERIC on all non-embedded platforms.
# This is the resolution of removing it from DEFAULTS...

MFC after:	5 days
2010-01-10 17:44:22 +00:00
Konstantin Belousov
293409233b Set md_ldt (pointer to the LDT) after md_ldt_sd (system segment
descriptor for the LDT) is populated. md_ldt is used by context-switch
code as indicator that LDT segment register shall be loaded with
GUSERLDT segment instead of 0, so context switch at the wrong time may
cause attempt to load non-populated descriptor.

Use store with the barrier to prevent other CPUs from seeing updated
md_ldt but not seeing updated md_ldt_sd. Multithreaded process may
context-switch to another thread of the process on another CPU and read
md_ldt.

MFC after:	1 week
2010-01-09 11:28:01 +00:00
Bjoern A. Zeeb
193171b7f5 In sys/<arch>/conf/Makefile set TARGET to <arch>. That allows
sys/conf/makeLINT.mk to only do certain things for certain
architectures.

Note that neither arm nor mips have the Makefile there, thus
essentially not (yet) supporting LINT.  This would enable them
do add special treatment to sys/conf/makeLINT.mk as well chosing
one of the many configurations as LINT.

This is a hack of doing this and keeping it in a separate commit
will allow us to more easily identify and back it out.

Discussed on/with:	arch, jhb (as part of the LINT-VIMAGE thread)
MFC after:		1 month
2010-01-08 18:57:31 +00:00
Warner Losh
56eff2143f Revert 200594. This file isn't intended for these sorts of things. 2010-01-04 21:30:04 +00:00
Brooks Davis
9efde58392 Add vlan(4) to all GENERIC kernels.
MFC after:	1 week
2010-01-03 20:40:54 +00:00
David E. O'Brien
93d8be03d9 Quiet variable "shadows" warning:
sys/vmmeter.h: warning: shadowed declaration is here
  machine/cpufunc.h: In function 'insw':
  machine/cpufunc.h: warning: declaration of 'cnt' shadows a global declaration
  ..snip..
2010-01-01 20:55:11 +00:00
Robert Noland
cfd7bacef2 Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.
This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by:	jhb@
MFC after:	Not in this lifetime...
2009-12-29 21:51:28 +00:00
John Baldwin
390cee8729 - Create a separate section in in the MI NOTES file for PCI wireless NIC
drivers and move bwi(4) there from the PCI Ethernet NIC section.
- Move ath(4) and ral(4) to the MI NOTES file.

Reviewed by:	rpaulo
2009-12-18 16:13:21 +00:00
Doug Barton
f1bdf073c1 Add INCLUDE_CONFIG_FILE, and a note in comments about how to also
include the comments with CONFIGARGS
2009-12-16 02:17:43 +00:00
Konstantin Belousov
1173b9a2d0 For ia32 syscall(), call cpu_set_syscall_retval(). Update comment inside
cpu_set_syscall_retval() accordingly.

MFC after:	1 week
2009-12-12 20:11:31 +00:00
Jung-uk Kim
93eba8807c Simplify a macro not to generate unncessary symbols. 2009-12-08 22:38:42 +00:00
Andriy Gapon
e72b7e5bba mca: small enhancements related to cpu quirks
- use utility macros for CPU family/model checking
- limit Intel P6 quirk to pre-Nehalem models (taken from OpenSolaris)
- add AMD GartTblWkEn quirk for families 0Fh and 10h; I haven't experienced
  any problems without the quirk but both Linux and OpenSolaris do this
- slightly re-arrange quirk code to provide for the future generalization
  and separation of vendor-specific quirk functions

Reviewed by:	jhb
MFC after:	1 week
2009-12-03 16:10:21 +00:00
Andriy Gapon
d5e341a956 mca: improve status checking, recording and reporting
- directly print mca information in case we fail to allocate memory
  for a record
- include bank number into mca record
- print raw mca status value for extended information

Reviewed by:	jhb
MFC after:	10 days
2009-12-02 15:45:55 +00:00
Andriy Gapon
5022f21bd9 amdsbwd: new driver for AMD SB600/SB7xx watchdog timer
The hardware is compliant with WDRT specification, so I originally
considered including generic WDRT watchdog support, but decided
against it, because I couldn't find anyone to the code for me.
WDRT seems to be not very popular.
Besides, generic WDRT porbably requires a slightly different driver
approach.

Reviewed by:	des, gavin, rpaulo
MFC after:	3 weeks
2009-11-30 11:44:03 +00:00
Andriy Gapon
71224c78d4 x86 cpu features: add MOVBE reporting and flag
The check is glimpsed from Linux and OpenSolaris.
MOVBE instruction is found in Intel Atom processors.
2009-11-30 11:11:08 +00:00
Alan Cox
e2997fea72 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
Jung-uk Kim
26b8a1c94f - Add more aggressive BPF JIT optimization. This is in more favor of i386
while the previous commit was more amd64-centric.
- Use calloc(3) instead of malloc(3)/memset(3) in user land[1].

Submitted by:	ed[1]
2009-11-23 22:23:19 +00:00
Jung-uk Kim
35012a1e69 Add an experimental and rudimentary JIT optimizer to reduce unncessary
overhead from short BPF filter programs such as "get the first 96 bytes".
2009-11-21 00:19:09 +00:00
Jung-uk Kim
c12b965f99 General style cleanup, no functional change. 2009-11-20 21:12:40 +00:00
Jung-uk Kim
5ecf77367c - Allocate scratch memory on stack instead of pre-allocating it with
the filter as we do from bpf_filter()[1].
- Revert experimental use of contigmalloc(9)/contigfree(9).  It has no
performance benefit over malloc(9)/free(9)[2].

Requested by:	rwatson[1]
Pointed out by:	rwatson, jhb, alc[2]
2009-11-20 18:49:20 +00:00
Jung-uk Kim
986689c263 Fix tinderbox build for i386 and sync amd64 with it. 2009-11-19 15:45:24 +00:00
Jung-uk Kim
ae4fdab8a8 - Change internal function bpf_jit_compile() to return allocated size of
the generated binary and remove page size limitation for userland.
- Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make
sure the generated binary aligns properly and make it physically contiguous.
2009-11-18 23:40:19 +00:00
Jung-uk Kim
366652f987 - Make BPF JIT compiler working again in userland. We are limiting size of
generated native binary to page size for now.
- Update copyright date and fix some style nits.
2009-11-18 19:26:17 +00:00
Poul-Henning Kamp
8c0099aed3 Uppercase the UL suffix on a constant, so Flexelint doesn't worry that
'u1' might have been intended.  No, that does not make sense and yes
I have told them.
2009-11-16 10:53:04 +00:00
Konstantin Belousov
ec24e8d42e Amd64 init_secondary() calls initializecpu() while curthread is still
not properly set up. r199067 added the call to TUNABLE_INT_FETCH() to
initializecpu() that results in hang because AP are started when kernel
environment is already dynamic and thus needs to acquire mutex, that is
too early in AP start sequence to work.

Extract the code that should be executed only once, because it sets
up global variables, from initializecpu() to initializecpucache(),
and call the later only from hammer_time() executed on BSP. Now,
TUNABLE_INT_FETCH() is done only once at BSP at the early boot stage.

In collaboration with:	Mykola Dzham <freebsd levsha org ua>
Reviewed by:	jhb
Tested by:	ed, battlez
2009-11-13 13:07:01 +00:00
Jun Kuriyama
bb830eceaa - Style nits.
- Remove unneeded TUNABLE_INT().

Suggested by:	avg, kib
2009-11-12 03:31:19 +00:00
Andriy Gapon
6cc16fcb4e reflect that pg_ps_enabled is a tunable, not just a read-only sysctl
Nod from:	jhb
2009-11-11 14:21:31 +00:00
Konstantin Belousov
a7b890448c Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by:	marcel
Reviewed by:	marcel, davidxu
PowerPC, ARM, ia64 changes:	marcel
Sparc64 tested and reviewed by:	marius, also sunv reviewed
MIPS tested by:	gonzo
MFC after:	1 month
2009-11-10 11:43:07 +00:00
Roman Divacky
68c4dfdf0c Make isa_dma functions MPSAFE by introducing its own private lock. These
functions are selfcontained (ie. they touch only isa_dma.c static variables
and hardware) so a private lock is sufficient to prevent races. This changes
only i386/amd64 while there are also isa_dma functions for ia64/sparc64.
Sparc64 are ones empty stubs and ia64 ones are unused as ia64 does not
have isa (says marcel).

This patch removes explicit locking of Giant from a few drivers (there
are some that requires this but lack ones - this patch fixes this) and
also removes the need for implicit locking of Giant from attach routines
where it's provided by newbus.

Approved by:	ed (mentor, implicit)
Reviewed by:	jhb, attilio (glanced by)
Tested by:	Giovanni Trematerra <giovanni.trematerra gmail com>
IA64 clue:	marcel
2009-11-09 20:29:10 +00:00
Jun Kuriyama
6f5c96c41d - Add hw.clflush_disable loader tunable to avoid panic (trap 9) at
map_invalidate_cache_range() even if CPU is not Intel.
- This tunable can be set to -1 (default), 0 and 1.  -1 is same as
  current behavior, which automatically disable CLFLUSH on Intel CPUs
  without CPUID_SS (should be occured on Xen only).  You can specify 1
  when this panic happened on non-Intel CPUs (such as AMD's).  Because
  disabling CLFLUSH may reduce performance, you can try with setting 0
  on Intel CPUs without SS to use CLFLUSH feature.

Reviewed by:	kib
Reported by:	karl, kuriyama
Related to:	kern/138863
2009-11-09 02:54:16 +00:00
Attilio Rao
f1c892a33c Strip from messages for users external URLs the project cannot directly
control.

Requested by:	kib, rwatson
2009-11-05 14:34:38 +00:00
Jung-uk Kim
8fa0490a2e Tweak memory allocation for amd64 suspend/resume CPU context. 2009-11-04 22:39:18 +00:00
Attilio Rao
06db609d4a Opteron rev E family of processor expose a bug where, in very rare
ocassions, memory barriers semantic is not honoured by the hardware
itself. As a result, some random breakage can happen in uninvestigable
ways (for further explanation see at the content of the commit itself).

As long as just a specific familly is bugged of an entire architecture
is broken, a complete fix-up is impratical without harming to some
extents the other correct cases.
Considering that (and considering the frequency of the bug exposure)
just print out a warning message if the affected machine is identified.

Pointed out by:	Samy Al Bahra <sbahra at repnop dot org>
Help on wordings by:	jeff
MFC:	3 days
2009-11-04 01:32:59 +00:00
John Baldwin
f12c034874 Fix some problems with effective mmap() offsets > 32 bits. This was
partially fixed on amd64 earlier.  Rather than forcing linux_mmap_common()
to use a 32-bit offset, have it accept a 64-bit file offset.  This offset
is then passed to the real mmap() call.  Rather than inventing a structure
to hold the normal linux_mmap args that has a 64-bit offset, just pass
each of the arguments individually to linux_mmap_common() since that more
closes matches the existing style of various kern_foo() functions.

Submitted by:	Christian Zander @ Nvidia
MFC after:	1 week
2009-10-28 20:17:54 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Jung-uk Kim
1d9fd1477c Try hiding annoying text cursor after the video controller is reset. 2009-10-23 18:57:52 +00:00
Marcel Moolenaar
1a4fcaebe3 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
Konstantin Belousov
051f6f8a7a Move intr_describe() out of #ifdef SMP; the function is always required.
Reviewed by:	jhb
2009-10-16 12:00:59 +00:00
John Baldwin
37b8ef16cd Add a facility for associating optional descriptions with active interrupt
handlers.  This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
  a description with an active interrupt handler setup by BUS_SETUP_INTR.
  It has a default method (bus_generic_describe_intr()) which simply passes
  the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
  printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
  an interrupt handler and copy the name passed to intr_event_add_handler()
  into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
  to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
  the nexus(4) driver supply a custom bus_describe_intr method that invokes
  a new intr_describe() MD routine which in turn looks up the associated
  interrupt event and invokes intr_event_describe_handler().

Requested by:	many
Reviewed by:	scottl
MFC after:	2 weeks
2009-10-15 14:54:35 +00:00
John Baldwin
55b6a401ef Move the USB wireless drivers down into their own section next to the USB
ethernet drivers.

Submitted by:	Glen Barber  glen.j.barber @ gmail
MFC after:	1 month
2009-10-13 19:02:03 +00:00
Konstantin Belousov
023063938a Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:31:24 +00:00
Attilio Rao
8448afced8 atomic_cmpset_barr_* was added in order to cope with compilers willing to
specify their own version of atomic_cmpset_* which could have been
different than the membar version.

Right now, however, FreeBSD is bound mostly to GCC-like compilers and
it is desired to add new support and compat shim mostly when there is
a real necessity, in order to avoid too much compatibility bloats.

In this optic, bring back atomic_cmpset_{acq, rel}_* to be the same as
atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction.

Requested by:	jhb
Reviewed by:	jhb
Tested by:	Giovanni Trematerra <giovanni dot trematerra at
		gmail dot com>
2009-10-09 15:51:40 +00:00
Jung-uk Kim
a7e2341e20 Clean up amd64 suspend/resume code.
- Allocate memory for wakeup code after ACPI bus is attached.  The early
memory allocation hack was inherited from i386 but amd64 does not need it.
- Exclude real mode IVT and BDA explicitly.  Improve comments about memory
allocation and reason for the exclusions.  It is a no-op in reality, though.
- Remove an unnecessary CLD from wakeup code and re-align.
2009-10-08 17:41:53 +00:00
Attilio Rao
d9492a4483 - All the functions in atomic.h needs to be in "physical" form (like
not defined through macros or similar) in order to be later compiled in
  the kernel and offer this way the support for modules (and
  compatibility among the UP case and SMP case).
  Fix this for the newly introduced atomic_cmpset_barr_* cases by defining
  and specifying a template.  Note that the new DEFINE_CMPSET_GEN()
  template save more typing on amd64 than the current code. [1]
- Fix the style for memory barriers on amd64.

[1] Reported by:	Paul B. Mahol <onemda at gmail dot com>
2009-10-06 23:48:28 +00:00
Attilio Rao
86d2e48c22 Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used.  GCC, however, does that aggressively, even in
presence of volatile operands.  The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory" even if that is
theoretically an heavy-weight operation, flushing the content of all
the registers and forcing reload of them (We could rely, however, on
gcc DTRT by just understanding the purpose as this is a well-known
pattern for many modern operating-systems).

Not all our memory barriers, right now, clobber memory for GCC-like
compilers. The most notable cases are IA32 and amd64 where the memory
barrier are treacted the same as normal atomic instructions.
Fix this by offering the possibility to implement atomic instructions
with memory barriers separately from the normal version and implement
the GCC-like specific one using memory clobbering.
Thanks to Chris Lattner (@apple) for his discussion on llvm specifics.

Reported by:	jhb
Reviewed by:	jhb
Tested by:	rdivacky, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
2009-10-06 13:45:49 +00:00
Bjoern A. Zeeb
52bf2041ac Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by:	kib
MFC after:	1 month
2009-10-03 11:57:21 +00:00
Konstantin Belousov
b02395c64d As a workaround, for Intel CPUs, do not use CLFLUSH in
pmap_invalidate_cache_range() when self-snoop is apparently not reported
in cpu features. We get a reserved trap when clflushing APIC registers
window.

XEN in full system virtualization mode removes self-snoop from CPU
features, making this a problem.

Tested by:	csjp
Reviewed by:	alc
MFC after:	3 days
2009-10-01 12:52:48 +00:00
Rui Paulo
c16c6b65da Improve 802.11s comment.
Spotted by:	dougb
MFC after:	1 day
2009-10-01 02:08:42 +00:00
Andriy Gapon
beb2c1f3e9 cpufunc.h: unify/correct style of c extension names
i386 and amd64 archs only.
inline => __inline. [1]
__asm__ => __asm. [2]

Reviewed by:	kib, jhb [1]
Suggested by:	kib [2]
MFC after:	1 week
2009-09-30 16:34:50 +00:00
Alan Cox
1eaff126fa Temporarily disable the use of 1GB page mappings by the direct map. There
are currently two problems with the use of 1GB page mappings by the direct
map.  First, at least one device driver uses pmap_extract() rather than
DMAP_TO_PHYS() to translate a direct map address to a physical address.
Unfortunately, neither pmap_extract() nor pmap_kextract() yet support 1GB
page mappings.  Second, pmap_bootstrap() needs to interrogate the MTRRs to
ensure that a 1GB page mapping doesn't span two MTRRs of different types.

Reported and tested by: Daniel O'Connor
MFC after:	3 days
2009-09-28 17:10:27 +00:00
Jung-uk Kim
71f99e637a Copy apm(4) emulation from sys/i386/acpica/acpi_machdep.c and
install apm(8) and apm_bios.h on amd64.
2009-09-27 14:00:16 +00:00
Bjoern A. Zeeb
4507f02e0e lindev(4) [1] is supposed to be a collection of linux-specific pseudo
devices that we also support, just not by default (thus only LINT or
module builds by default).

While currently there is only "/dev/full" [2], we are planning to see more
in the future.  We may decide to change the module/dependency logic in the
future should the list grow too long.

This is not part of linux.ko as also non-linux binaries like kFreeBSD
userland or ports can make use of this as well.

Suggested by:	rwatson [1] (name)
Submitted by:	ed [2]
Discussed with:	markm, ed, rwatson, kib (weeks ago)
Reviewed by:	rwatson, brueffer (prev. version)
PR:		kern/68961
MFC after:	6 weeks
2009-09-26 12:45:28 +00:00
Ed Maste
43721b6e84 Add a backtrace to the "fpudna in kernel mode!" case, to help track down
where this comes from.

Reviewed by:	bde
2009-09-24 14:26:42 +00:00
Andriy Gapon
1e908511f8 number of cleanups in i386 and amd64 pci md code
o introduce PCIE_REGMAX and use it instead of ad-hoc constant
o where 'reg' parameter/variable is not already unsigned, cast it to
  unsigned before comparison with maximum value to cut off negative
  values
o use PCI_SLOTMAX in several places where 31 or 32 were explicitly used
o drop redundant check of 'bytes' in i386 pciereg_cfgread() - valid
  values are already checked in the subsequent switch

Reviewed by:	jhb
MFC after:	1 week
2009-09-24 07:11:23 +00:00
John Baldwin
d95e7f5a7a Extract the code to find and map the MADT ACPI table during early kernel
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
  APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
  mapping the RSDT or XSDT and searching for a table with a given signature
  out into acpica_machdep.c for both amd64 and i386.
2009-09-23 15:42:35 +00:00
John Baldwin
07ee969179 - Split the logic to parse an SMAP entry out into a separate function on
amd64 similar to i386.  This fixes a bug on amd64 where overlapping
  entries would not cause the SMAP parsing to stop.
- Change the SMAP parsing code to do a sorted insertion into physmap[]
  instead of an append to support systems with out-of-order SMAP entries.

PR:		amd64/138220
Reported by:	James R. Van Artsdalen  james of jrv org
MFC after:	3 days
2009-09-22 16:51:00 +00:00
Xin LI
a57707e712 Build x86bios only for i386/amd64 for now. More work is required
to make these functional on other architectures, and the current
code breaks sparc64 and powerpc.

Spotted by:	tinderbox via des
2009-09-21 23:58:29 +00:00
Konstantin Belousov
a1bfaca761 If CPU happens to be in usermode when a T_RESERVED trap occured,
then trapsignal is called with ksi.ksi_signo = 0. For debugging kernels,
that should end up in panic, for non-debugging kernels behaviour is
undefined.

Do panic regardeless of execution mode at the moment of trap.

Reviewed by:	jhb
MFC after:	1 month
2009-09-21 09:41:51 +00:00
Xin LI
6abad12dfe Automatically depend on x86emu when vesa or dpms is being built into
kernel.  With this change the user no longer need to remember building
this option.

Submitted by:	swell.k at gmail.com
2009-09-21 07:08:20 +00:00
Xin LI
372c733759 Enable s3pci on amd64 which works on top of VESA, and allow
static building it into kernel on i386 and amd64.

Submitted by:	swell.k at gmail.com
2009-09-21 07:05:48 +00:00