Commit Graph

5651 Commits

Author SHA1 Message Date
Jung-uk Kim
c12b965f99 General style cleanup, no functional change. 2009-11-20 21:12:40 +00:00
Jung-uk Kim
5ecf77367c - Allocate scratch memory on stack instead of pre-allocating it with
the filter as we do from bpf_filter()[1].
- Revert experimental use of contigmalloc(9)/contigfree(9).  It has no
performance benefit over malloc(9)/free(9)[2].

Requested by:	rwatson[1]
Pointed out by:	rwatson, jhb, alc[2]
2009-11-20 18:49:20 +00:00
Jung-uk Kim
986689c263 Fix tinderbox build for i386 and sync amd64 with it. 2009-11-19 15:45:24 +00:00
Jung-uk Kim
ae4fdab8a8 - Change internal function bpf_jit_compile() to return allocated size of
the generated binary and remove page size limitation for userland.
- Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make
sure the generated binary aligns properly and make it physically contiguous.
2009-11-18 23:40:19 +00:00
Jung-uk Kim
366652f987 - Make BPF JIT compiler working again in userland. We are limiting size of
generated native binary to page size for now.
- Update copyright date and fix some style nits.
2009-11-18 19:26:17 +00:00
Poul-Henning Kamp
8c0099aed3 Uppercase the UL suffix on a constant, so Flexelint doesn't worry that
'u1' might have been intended.  No, that does not make sense and yes
I have told them.
2009-11-16 10:53:04 +00:00
Konstantin Belousov
ec24e8d42e Amd64 init_secondary() calls initializecpu() while curthread is still
not properly set up. r199067 added the call to TUNABLE_INT_FETCH() to
initializecpu() that results in hang because AP are started when kernel
environment is already dynamic and thus needs to acquire mutex, that is
too early in AP start sequence to work.

Extract the code that should be executed only once, because it sets
up global variables, from initializecpu() to initializecpucache(),
and call the later only from hammer_time() executed on BSP. Now,
TUNABLE_INT_FETCH() is done only once at BSP at the early boot stage.

In collaboration with:	Mykola Dzham <freebsd levsha org ua>
Reviewed by:	jhb
Tested by:	ed, battlez
2009-11-13 13:07:01 +00:00
Jun Kuriyama
bb830eceaa - Style nits.
- Remove unneeded TUNABLE_INT().

Suggested by:	avg, kib
2009-11-12 03:31:19 +00:00
Andriy Gapon
6cc16fcb4e reflect that pg_ps_enabled is a tunable, not just a read-only sysctl
Nod from:	jhb
2009-11-11 14:21:31 +00:00
Konstantin Belousov
a7b890448c Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by:	marcel
Reviewed by:	marcel, davidxu
PowerPC, ARM, ia64 changes:	marcel
Sparc64 tested and reviewed by:	marius, also sunv reviewed
MIPS tested by:	gonzo
MFC after:	1 month
2009-11-10 11:43:07 +00:00
Roman Divacky
68c4dfdf0c Make isa_dma functions MPSAFE by introducing its own private lock. These
functions are selfcontained (ie. they touch only isa_dma.c static variables
and hardware) so a private lock is sufficient to prevent races. This changes
only i386/amd64 while there are also isa_dma functions for ia64/sparc64.
Sparc64 are ones empty stubs and ia64 ones are unused as ia64 does not
have isa (says marcel).

This patch removes explicit locking of Giant from a few drivers (there
are some that requires this but lack ones - this patch fixes this) and
also removes the need for implicit locking of Giant from attach routines
where it's provided by newbus.

Approved by:	ed (mentor, implicit)
Reviewed by:	jhb, attilio (glanced by)
Tested by:	Giovanni Trematerra <giovanni.trematerra gmail com>
IA64 clue:	marcel
2009-11-09 20:29:10 +00:00
Jun Kuriyama
6f5c96c41d - Add hw.clflush_disable loader tunable to avoid panic (trap 9) at
map_invalidate_cache_range() even if CPU is not Intel.
- This tunable can be set to -1 (default), 0 and 1.  -1 is same as
  current behavior, which automatically disable CLFLUSH on Intel CPUs
  without CPUID_SS (should be occured on Xen only).  You can specify 1
  when this panic happened on non-Intel CPUs (such as AMD's).  Because
  disabling CLFLUSH may reduce performance, you can try with setting 0
  on Intel CPUs without SS to use CLFLUSH feature.

Reviewed by:	kib
Reported by:	karl, kuriyama
Related to:	kern/138863
2009-11-09 02:54:16 +00:00
Attilio Rao
f1c892a33c Strip from messages for users external URLs the project cannot directly
control.

Requested by:	kib, rwatson
2009-11-05 14:34:38 +00:00
Jung-uk Kim
8fa0490a2e Tweak memory allocation for amd64 suspend/resume CPU context. 2009-11-04 22:39:18 +00:00
Attilio Rao
06db609d4a Opteron rev E family of processor expose a bug where, in very rare
ocassions, memory barriers semantic is not honoured by the hardware
itself. As a result, some random breakage can happen in uninvestigable
ways (for further explanation see at the content of the commit itself).

As long as just a specific familly is bugged of an entire architecture
is broken, a complete fix-up is impratical without harming to some
extents the other correct cases.
Considering that (and considering the frequency of the bug exposure)
just print out a warning message if the affected machine is identified.

Pointed out by:	Samy Al Bahra <sbahra at repnop dot org>
Help on wordings by:	jeff
MFC:	3 days
2009-11-04 01:32:59 +00:00
John Baldwin
f12c034874 Fix some problems with effective mmap() offsets > 32 bits. This was
partially fixed on amd64 earlier.  Rather than forcing linux_mmap_common()
to use a 32-bit offset, have it accept a 64-bit file offset.  This offset
is then passed to the real mmap() call.  Rather than inventing a structure
to hold the normal linux_mmap args that has a 64-bit offset, just pass
each of the arguments individually to linux_mmap_common() since that more
closes matches the existing style of various kern_foo() functions.

Submitted by:	Christian Zander @ Nvidia
MFC after:	1 week
2009-10-28 20:17:54 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Jung-uk Kim
1d9fd1477c Try hiding annoying text cursor after the video controller is reset. 2009-10-23 18:57:52 +00:00
Marcel Moolenaar
1a4fcaebe3 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
Konstantin Belousov
051f6f8a7a Move intr_describe() out of #ifdef SMP; the function is always required.
Reviewed by:	jhb
2009-10-16 12:00:59 +00:00
John Baldwin
37b8ef16cd Add a facility for associating optional descriptions with active interrupt
handlers.  This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
  a description with an active interrupt handler setup by BUS_SETUP_INTR.
  It has a default method (bus_generic_describe_intr()) which simply passes
  the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
  printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
  an interrupt handler and copy the name passed to intr_event_add_handler()
  into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
  to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
  the nexus(4) driver supply a custom bus_describe_intr method that invokes
  a new intr_describe() MD routine which in turn looks up the associated
  interrupt event and invokes intr_event_describe_handler().

Requested by:	many
Reviewed by:	scottl
MFC after:	2 weeks
2009-10-15 14:54:35 +00:00
John Baldwin
55b6a401ef Move the USB wireless drivers down into their own section next to the USB
ethernet drivers.

Submitted by:	Glen Barber  glen.j.barber @ gmail
MFC after:	1 month
2009-10-13 19:02:03 +00:00
Konstantin Belousov
023063938a Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:31:24 +00:00
Attilio Rao
8448afced8 atomic_cmpset_barr_* was added in order to cope with compilers willing to
specify their own version of atomic_cmpset_* which could have been
different than the membar version.

Right now, however, FreeBSD is bound mostly to GCC-like compilers and
it is desired to add new support and compat shim mostly when there is
a real necessity, in order to avoid too much compatibility bloats.

In this optic, bring back atomic_cmpset_{acq, rel}_* to be the same as
atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction.

Requested by:	jhb
Reviewed by:	jhb
Tested by:	Giovanni Trematerra <giovanni dot trematerra at
		gmail dot com>
2009-10-09 15:51:40 +00:00
Jung-uk Kim
a7e2341e20 Clean up amd64 suspend/resume code.
- Allocate memory for wakeup code after ACPI bus is attached.  The early
memory allocation hack was inherited from i386 but amd64 does not need it.
- Exclude real mode IVT and BDA explicitly.  Improve comments about memory
allocation and reason for the exclusions.  It is a no-op in reality, though.
- Remove an unnecessary CLD from wakeup code and re-align.
2009-10-08 17:41:53 +00:00
Attilio Rao
d9492a4483 - All the functions in atomic.h needs to be in "physical" form (like
not defined through macros or similar) in order to be later compiled in
  the kernel and offer this way the support for modules (and
  compatibility among the UP case and SMP case).
  Fix this for the newly introduced atomic_cmpset_barr_* cases by defining
  and specifying a template.  Note that the new DEFINE_CMPSET_GEN()
  template save more typing on amd64 than the current code. [1]
- Fix the style for memory barriers on amd64.

[1] Reported by:	Paul B. Mahol <onemda at gmail dot com>
2009-10-06 23:48:28 +00:00
Attilio Rao
86d2e48c22 Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used.  GCC, however, does that aggressively, even in
presence of volatile operands.  The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory" even if that is
theoretically an heavy-weight operation, flushing the content of all
the registers and forcing reload of them (We could rely, however, on
gcc DTRT by just understanding the purpose as this is a well-known
pattern for many modern operating-systems).

Not all our memory barriers, right now, clobber memory for GCC-like
compilers. The most notable cases are IA32 and amd64 where the memory
barrier are treacted the same as normal atomic instructions.
Fix this by offering the possibility to implement atomic instructions
with memory barriers separately from the normal version and implement
the GCC-like specific one using memory clobbering.
Thanks to Chris Lattner (@apple) for his discussion on llvm specifics.

Reported by:	jhb
Reviewed by:	jhb
Tested by:	rdivacky, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
2009-10-06 13:45:49 +00:00
Bjoern A. Zeeb
52bf2041ac Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by:	kib
MFC after:	1 month
2009-10-03 11:57:21 +00:00
Konstantin Belousov
b02395c64d As a workaround, for Intel CPUs, do not use CLFLUSH in
pmap_invalidate_cache_range() when self-snoop is apparently not reported
in cpu features. We get a reserved trap when clflushing APIC registers
window.

XEN in full system virtualization mode removes self-snoop from CPU
features, making this a problem.

Tested by:	csjp
Reviewed by:	alc
MFC after:	3 days
2009-10-01 12:52:48 +00:00
Rui Paulo
c16c6b65da Improve 802.11s comment.
Spotted by:	dougb
MFC after:	1 day
2009-10-01 02:08:42 +00:00
Andriy Gapon
beb2c1f3e9 cpufunc.h: unify/correct style of c extension names
i386 and amd64 archs only.
inline => __inline. [1]
__asm__ => __asm. [2]

Reviewed by:	kib, jhb [1]
Suggested by:	kib [2]
MFC after:	1 week
2009-09-30 16:34:50 +00:00
Alan Cox
1eaff126fa Temporarily disable the use of 1GB page mappings by the direct map. There
are currently two problems with the use of 1GB page mappings by the direct
map.  First, at least one device driver uses pmap_extract() rather than
DMAP_TO_PHYS() to translate a direct map address to a physical address.
Unfortunately, neither pmap_extract() nor pmap_kextract() yet support 1GB
page mappings.  Second, pmap_bootstrap() needs to interrogate the MTRRs to
ensure that a 1GB page mapping doesn't span two MTRRs of different types.

Reported and tested by: Daniel O'Connor
MFC after:	3 days
2009-09-28 17:10:27 +00:00
Jung-uk Kim
71f99e637a Copy apm(4) emulation from sys/i386/acpica/acpi_machdep.c and
install apm(8) and apm_bios.h on amd64.
2009-09-27 14:00:16 +00:00
Bjoern A. Zeeb
4507f02e0e lindev(4) [1] is supposed to be a collection of linux-specific pseudo
devices that we also support, just not by default (thus only LINT or
module builds by default).

While currently there is only "/dev/full" [2], we are planning to see more
in the future.  We may decide to change the module/dependency logic in the
future should the list grow too long.

This is not part of linux.ko as also non-linux binaries like kFreeBSD
userland or ports can make use of this as well.

Suggested by:	rwatson [1] (name)
Submitted by:	ed [2]
Discussed with:	markm, ed, rwatson, kib (weeks ago)
Reviewed by:	rwatson, brueffer (prev. version)
PR:		kern/68961
MFC after:	6 weeks
2009-09-26 12:45:28 +00:00
Ed Maste
43721b6e84 Add a backtrace to the "fpudna in kernel mode!" case, to help track down
where this comes from.

Reviewed by:	bde
2009-09-24 14:26:42 +00:00
Andriy Gapon
1e908511f8 number of cleanups in i386 and amd64 pci md code
o introduce PCIE_REGMAX and use it instead of ad-hoc constant
o where 'reg' parameter/variable is not already unsigned, cast it to
  unsigned before comparison with maximum value to cut off negative
  values
o use PCI_SLOTMAX in several places where 31 or 32 were explicitly used
o drop redundant check of 'bytes' in i386 pciereg_cfgread() - valid
  values are already checked in the subsequent switch

Reviewed by:	jhb
MFC after:	1 week
2009-09-24 07:11:23 +00:00
John Baldwin
d95e7f5a7a Extract the code to find and map the MADT ACPI table during early kernel
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
  APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
  mapping the RSDT or XSDT and searching for a table with a given signature
  out into acpica_machdep.c for both amd64 and i386.
2009-09-23 15:42:35 +00:00
John Baldwin
07ee969179 - Split the logic to parse an SMAP entry out into a separate function on
amd64 similar to i386.  This fixes a bug on amd64 where overlapping
  entries would not cause the SMAP parsing to stop.
- Change the SMAP parsing code to do a sorted insertion into physmap[]
  instead of an append to support systems with out-of-order SMAP entries.

PR:		amd64/138220
Reported by:	James R. Van Artsdalen  james of jrv org
MFC after:	3 days
2009-09-22 16:51:00 +00:00
Xin LI
a57707e712 Build x86bios only for i386/amd64 for now. More work is required
to make these functional on other architectures, and the current
code breaks sparc64 and powerpc.

Spotted by:	tinderbox via des
2009-09-21 23:58:29 +00:00
Konstantin Belousov
a1bfaca761 If CPU happens to be in usermode when a T_RESERVED trap occured,
then trapsignal is called with ksi.ksi_signo = 0. For debugging kernels,
that should end up in panic, for non-debugging kernels behaviour is
undefined.

Do panic regardeless of execution mode at the moment of trap.

Reviewed by:	jhb
MFC after:	1 month
2009-09-21 09:41:51 +00:00
Xin LI
6abad12dfe Automatically depend on x86emu when vesa or dpms is being built into
kernel.  With this change the user no longer need to remember building
this option.

Submitted by:	swell.k at gmail.com
2009-09-21 07:08:20 +00:00
Xin LI
372c733759 Enable s3pci on amd64 which works on top of VESA, and allow
static building it into kernel on i386 and amd64.

Submitted by:	swell.k at gmail.com
2009-09-21 07:05:48 +00:00
Alan Cox
d6dbb0dba0 When superpages are enabled, add the 2 or 4MB page size to the array of
supported page sizes.

Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:09:33 +00:00
Alan Cox
fe105d45a2 Add a new sysctl for reporting all of the supported page sizes.
Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:04:57 +00:00
Jung-uk Kim
3bcdfb9bf8 Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.
2009-09-10 17:27:36 +00:00
Dag-Erling Smørgrav
80c03b8eee As jhb@ pointed out to me, r197057 was incorrect, not least because these
are generated files.
2009-09-10 13:20:27 +00:00
Xin LI
ee5e90dab2 - Teach vesa(4) and dpms(4) about x86emu. [1]
- Add vesa kernel options for amd64.
 - Connect libvgl library and splash kernel modules to amd64 build.
 - Connect manual page dpms(4) to amd64 build.
 - Remove old vesa/dpms files.

Submitted by:	paradox <ddkprog yahoo com> [1], swell k at gmail.com
		(with some minor tweaks)
2009-09-09 09:50:31 +00:00
Poul-Henning Kamp
a254d1f16d Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.
2009-09-08 20:45:40 +00:00
Poul-Henning Kamp
a330ed7cd1 Move multi-include protection back up to the top of the file and
name after the physical file rather than the aliased name.
2009-09-08 12:59:56 +00:00
Jung-uk Kim
c8e648e167 Fix confusing comments about default PAT entries. 2009-09-02 16:47:10 +00:00
Jung-uk Kim
c9e8817902 - Work around ACPI mode transition problem for recent NVIDIA 9400M chipset
based Intel Macs.  Since r189055, these platforms started freezing when
ACPI is being initialized for unknown reason.  For these platforms, we just
use the old PAT layout.  Note this change is not enough to boot fully on
these platforms because of other problems but it makes debugging possible.
Note MacBook5,2 may be affected as well but it was not added here because
of lack of hardware to test.
- Initialize PAT MSR fully instead of reading and modifying it for safety.

Reported by:	rpaulo, hps, Eygene Ryabinkin (rea-fbsd at codelabs dot ru)
Reviewed by:	jhb
2009-09-02 16:02:48 +00:00
John Baldwin
a01e019a26 Don't attempt to bind the current thread to the CPU an IRQ is bound to
when removing an interrupt handler from an IRQ during shutdown.  During
shutdown we are already bound to CPU 0 and this was triggering a panic.

MFC after:	3 days
2009-09-02 00:39:59 +00:00
John Baldwin
8101afb656 Simplify pmap_change_attr() a bit:
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.

Discussed with:	alc
MFC after:	3 days
2009-08-31 18:41:13 +00:00
Bjoern A. Zeeb
ecc2fda872 Make sure FreeBSD binaries without .note.ABI-tag section work
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statements.
For this mark the Debian GNU/kFreeBSD brandinfo that it must have
an .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.

Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
to force native FreeBSD to come first as well.

The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo  was matched first,  as the fallback to ld-elf32.so.1 does
not exist in that case.

Reported and tested by:	ticso
In collaboration with:	kib
MFC after:		3 days
2009-08-30 14:38:17 +00:00
Robert Noland
cbc3c1f687 Swap the start/end virtual addresses in pmap_invalidate_cache_range().
This fixes the functionality on non SelfSnoop hardware.

Found by:	rnoland
Submitted by:	alc
Reviewed by:	kib
MFC after:	3 days
2009-08-29 16:01:21 +00:00
Bjoern A. Zeeb
89ffc202d6 Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR:		kern/135468
Submitted by:	dchagin [1] (initial patch)
Suggested by:	kib [2]
Tested by:	Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by:	kib
MFC after:	3 days
2009-08-24 16:19:47 +00:00
Jung-uk Kim
66406b8f25 Check whether the SMBIOS reports reasonable amount of memory. If it is
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementation reported only half of actual memory.

Tested by:	bz
Approved by:	re (kib)
2009-08-20 22:58:05 +00:00
Ed Schouten
12f27c4e64 Make the MacBookPro3,1 hardware boot again.
Tested by:	Patrick Lamaiziere <patfbsd davenulle org>
Approved by:	re (kib)
2009-08-19 20:39:33 +00:00
Konstantin Belousov
faccac2d45 Correct a critical accounting error in pmap_demote_pde(). Specifically,
when pmap_demote_pde() allocates a page table page to implement a
user-space demotion, it must increment the pmap's resident page count.
Not doing so, can lead to an underflow during address space termination
that causes pmap_remove() to exit prematurely, before it has destroyed
all of the mappings within the specified range.  The ultimate effect or
symptom of this error is an assertion failure in vm_page_free_toq()
because the page being freed is still mapped.

This error is only possible when superpage promotion is enabled.  Thus,
it only affects FreeBSD  versions greater than 7.2.

Tested by:	pho, alc
Reviewed by:	alc
Approved by:	re (rwatson)
MFC after:	1 week
2009-08-17 13:27:55 +00:00
John Baldwin
21157ad3b1 Adjust the handling of the local APIC PMC interrupt vector:
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
  routines in the local APIC code that the hwpmc(4) driver can use to
  manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
  HWPMC_HOOKS is enabled.  Instead, the hwpmc(4) driver explicitly
  enables the interrupt when it is succesfully initialized and disables
  the interrupt when it is unloaded.  This avoids enabling the interrupt
  on unsupported CPUs which may result in spurious NMIs.

Reported by:	rnoland
Reviewed by:	jkoshy
Approved by:	re (kib)
MFC after:	2 weeks
2009-08-14 21:05:08 +00:00
Attilio Rao
dc6fbf6545 * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
Ed Schouten
61fb73de41 Make the MacBook3,1 boot again.
Approved by:	re (kib)
2009-08-02 11:26:23 +00:00
Rui Paulo
1f93ae9453 Refine the MacBook hack to only match early models that have Intel ICH.
Discussed with:	kjim
Approved by:	re (kib)
2009-07-27 13:51:55 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Konstantin Belousov
206a336872 When the page caching attributes are changed, after new mapping is
established, OS shall flush the caches on all processors that may have
used the mapping previously. This operation is not needed if processors
support self-snooping. If not, but clflush instruction is implemented
on the CPU, series of the clflush can be used on the mapping region.
Otherwise, we have to flush the whole cache. The later operation is very
expensive, and AMD-made CPUs do not have self-snooping.

Implement cache flush for remapped region by using clflush for amd64,
when supported by CPU.

Proposed and reviewed by:	alc
Approved by:	re (kensmith)
2009-07-22 14:32:38 +00:00
Alan Cox
9861cbc6ca Change the handling of fictitious pages by pmap_page_set_memattr() on
amd64 and i386.  Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages.  Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache.  Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address.  For example, these actions are already performed by
pmap_mapdev().

The device pager needn't restore the memory attributes on a fictitious
page before releasing it.  It's now pointless.

Add pmap_page_set_memattr() to the Xen pmap.

Approved by:	re (kib)
2009-07-19 21:40:19 +00:00
Alan Cox
13de722155 An addendum to r195649, "Add support to the virtual memory system for
configuring machine-dependent memory attributes...":

Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc().  It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.

Eliminate pointless code from pmap_cache_bits() on amd64.

Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.

Approved by:	re (kib)
2009-07-18 01:50:05 +00:00
Jung-uk Kim
e8c4d3e407 Match PCI Express root bridge _HID directly instead of
relying on _CID.

Reviewed by:	jhb
Approved by:	re (kib)
2009-07-13 21:36:31 +00:00
Alan Cox
3153e878dd Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
Rui Paulo
59aa14a91d Implementation of the upcoming Wireless Mesh standard, 802.11s, on the
net80211 wireless stack. This work is based on the March 2009 D3.0 draft
standard. This standard is expected to become final next year.
This includes two main net80211 modules, ieee80211_mesh.c
which deals with peer link management, link metric calculation,
routing table control and mesh configuration and ieee80211_hwmp.c
which deals with the actually routing process on the mesh network.
HWMP is the mandatory routing protocol on by the mesh standard, but
others, such as RA-OLSR, can be implemented.

Authentication and encryption are not implemented.

There are several scripts under tools/tools/net80211/scripts that can be
used to test different mesh network topologies and they also teach you
how to setup a mesh vap (for the impatient: ifconfig wlan0 create
wlandev ... wlanmode mesh).

A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled
by default on GENERIC kernels for i386, amd64, sparc64 and pc98.

Drivers that support mesh networks right now are: ath, ral and mwl.

More information at: http://wiki.freebsd.org/WifiMesh

Please note that this work is experimental. Also, please note that
bridging a mesh vap with another network interface is not yet supported.

Many thanks to the FreeBSD Foundation for sponsoring this project and to
Sam Leffler for his support.
Also, I would like to thank Gateworks Corporation for sending me a
Cambria board which was used during the development of this project.

Reviewed by:	sam
Approved by:	re (kensmith)
Obtained from:	projects/mesh11s
2009-07-11 15:02:45 +00:00
Konstantin Belousov
d77e2734a1 When amd64 CPU cannot load segment descriptor during trap return to
usermode, it generates GPF, that is mirrored to user mode as SIGSEGV.
The offending register in mcontext should contain the value loading of
which generated the GPF, and it is so on i386. On amd64, we currently
report segment descriptor in tf_err, while segment register contains the
corrected value loaded by trap handler.

Fix the issue by behaving like i386, reloading segment register in trap
frame after signal frame is pushed onto user stack.

Noted and tested by:	pho
Approved by:	re (kensmith)
2009-07-10 10:29:16 +00:00
Konstantin Belousov
a2622e5dc2 Restore the segment registers and segment base MSRs for amd64 syscall
return path only when neither thread was context switched while
executing syscall code nor syscall explicitely modified LDT or MSRs.

Save segment registers in trap handlers before interrupts are enabled,
to not allow context switches to happen before registers are saved.
Use separated byte in pcb for indication of fast/full return, since
pcb_flags are not synchronized with context switches.

The change puts back syscall microbenchmark numbers that were slowed
down after commit of the support for LDT on amd64.

Reviewed by:	jeff
Tested (and tested, and tested ...) by:	pho
Approved by:	re (kensmith)
2009-07-09 09:34:11 +00:00
Alan Cox
133898afd5 When pmap_change_attr() changes the PAT setting on a kernel mapping, it has
to simultaneously change the PAT setting for the same pages within the
direct map region.  This may require the demotion of a 2MB page mapping and
the allocation of a page table page.  This revision gives the highest
possible priority (VM_ALLOC_INTERRUPT) to this page allocation, so that
pmap_change_attr() is less likely to fail.  (In general, kernel page table
page allocations have the highest priority, so this is not creating a new
precedent.)

(Demotion of 1GB page mappings within the direct map already specifies
VM_ALLOC_INTERRUPT to vm_page_alloc(), so only pmap_demote_pde() must be
changed.)

Approved by:	re (kib)
2009-07-06 18:43:42 +00:00
John Baldwin
0d0a6650d7 After the per-CPU IDT changes, the IDT vector of an interrupt could change
when the interrupt was moved from one CPU to another.  If the interrupt was
enabled, then the old IDT vector needs to be disabled and the new IDT vector
needs to be enabled.  This was mostly masked prior to the recent MSI changes
since in the older code almost all allocated IDT vectors were already enabled
and the enabled vectors on the BSP during boot covered enough of the IDT
range.  However, after the MSI changes, MSI interrupts that were allocated
but not enabled (e.g. DRM with MSI) during boot could result in an allocated
IDT vector that wasn't enabled.  The round-robin at the end of boot could
place another interrupt at the same IDT vector without enabling the IDT
vector causing trap 30 faults.

Fix this by explicitly disabling/enabling the old and new IDT vectors for
enabled interrupt sources when moving an interrupt between CPUs via the
pic_assign_cpu() method.  While here, fix a bug in my earlier changes so
that an I/O APIC interrupt pin is left unchanged if ioapic_assign_cpu()
fails to allocate a new IDT vector and returns ENOSPC.

Approved by:	re (kensmith)
2009-07-06 18:23:00 +00:00
John Baldwin
f7d7cd0c76 MFi386: Add a 'show idt' command to DDB to display the non-default function
pointers in the interrupt descriptor table.

Approved by:	re (kensmith)
2009-07-06 18:10:27 +00:00
Sam Leffler
8c393fd1f0 Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by:	bde, imp, marcel
Approved by:	re (kensmith)
2009-07-05 17:45:48 +00:00
Ed Schouten
89fe4c0a2b Enable POSIX semaphores on all non-embedded architectures by default.
More applications (including Firefox) seem to depend on this nowadays,
so not having this enabled by default is a bad idea.

Proposed by:	miwi
Patch by:	Florian Smeets <flo kasimir com>
Approved by:	re (kib)
2009-07-02 18:24:37 +00:00
John Baldwin
cebc7fb16c Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
  to a specific CPU to return an error value instead of void, thus allowing
  it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
  destination CPU, fail the request with ENOSPC rather than panicing.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
  on the first interrupt in a group.  Moving the first interrupt in a group
  moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
  intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
  interrupts.  Previously, binding an interrupt to a CPU only performed a
  privilege check if the interrupt had an interrupt thread.  Interrupts
  without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
  cpuset mask for the associated interrupt thread.

Approved by:	re (kib)
2009-07-01 17:20:07 +00:00
Doug Rabson
259d14ed88 Don't include rpcv2.h - it has been removed.
Submitted by: ed@
Approved by: re
2009-07-01 07:34:28 +00:00
Andriy Gapon
462fab84b8 remove unused/unneeded extern declarations
This should result in no changes to compiled code.

Reviewed by:	alc
Approved by:	re (kib)
MFC after:	1 day
2009-06-30 11:16:32 +00:00
Robert Watson
ad8dacbb91 Catch missed AUDIT_ARG() -> AUDIT_ARG_CMD() on amd64.
Submitted by:	Florian Smeets <flo at kasimir.com>
Approved by:	re (kib) (implicit)
MFC after:	1 week
2009-06-27 15:03:50 +00:00
Robert Watson
14961ba789 Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type.  This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by:	brooks
Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 week
2009-06-27 13:58:44 +00:00
Alan Cox
5797795f5a Correct the #endif comment.
Noticed by:	jmallett
Approved by:	re (kib)
2009-06-26 16:22:24 +00:00
Alan Cox
e999111ae7 This change is the next step in implementing the cache control functionality
required by video card drivers.  Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures.  In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig().  These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with:	jhb
2009-06-26 04:47:43 +00:00
John Baldwin
4e9dba6322 Fix kernels compiled without SMP support. Make intr_next_cpu() available
for UP kernels but as a stub that always returns the single CPU's local
APIC ID.

Reported by:	kib
2009-06-25 20:35:46 +00:00
John Baldwin
b4805f449c - Restore the behavior of pre-allocating IDT vectors for MSI interrupts.
This is mostly important for the multiple MSI message case where the
  IDT vectors for the entire group need to be allocated together.  This
  also restores the assumptions made by the PCI bus code that it could
  invoke PCIB_MAP_MSI() once MSI vectors were allocated.
- To avoid whiplash with CPU assignments, change the way that CPUs are
  assigned to interrupt sources on activation.  Instead of assigning the
  CPU via pic_assign_cpu() before calling enable_intr(), allow the
  different interrupt source drivers to ask the MD interrupt code which
  CPU to use when they allocate an IDT vector.  I/O APIC interrupt pins
  do this in their pic_enable_intr() routines giving the same behavior as
  before.  MSI sources do it when the IDT vectors are allocated during
  msi_alloc() and msix_alloc().
- Change the intr_table_lock from an sx lock to a mutex.

Tested by:	rnoland
2009-06-25 18:13:46 +00:00
John Baldwin
7af55bd450 Whitespace fix. 2009-06-24 19:16:48 +00:00
Alexander Motin
9f23a6caa4 Make algorithm a bit more bulletproof. 2009-06-23 23:16:37 +00:00
Jeff Roberson
50c202c592 Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
   DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
   PCPU_*.  Requires only one extra instruction more than PCPU_* and is
   virtually the same as __thread for builtin and much faster for shared
   objects.  DPCPU variables can be initialized when defined.
 - Modules are supported by relocating the module's per-cpu linker set
   over space reserved in the kernel.  Modules may fail to load if there
   is insufficient space available.
 - Track space available for modules with a one-off extent allocator.
   Free may block for memory to allocate space for an extent.

Reviewed by:    jhb, rwatson, kan, sam, grehan, marius, marcel, stas
2009-06-23 22:42:39 +00:00
Alexander Motin
9e9be26906 Fix variable name. 2009-06-23 22:08:25 +00:00
Alexander Motin
96c5d068d8 Rework r193814:
While general idea of patch was good, it was not working properly due the way
it was implemented. When we are using same timer interrupt for several of
hard/prof/stat purposes we should not send several IPIs same time to other
CPUs. Sending several IPIs same time leads to terrible accounting/profiling
results due to strong synchronization effect, when the second interrupt
handler accounts processing of the first one.
Interlink timer events in a such way, that no more then one IPI is sent for
any original timer interrupt.
2009-06-23 21:45:33 +00:00
Alan Cox
0f6766f3da Eliminate dead code. These definitions should have been deleted with the
introduction of i686_mem.c in r45405.

Merge adjacent #ifdef _KERNEL/#endif blocks.
2009-06-22 04:21:02 +00:00
Paul Saab
c2b6d60d26 I have several machines where the following warning is printed:
warning: no time-of-day clock registered, system time will not be set accurately

Provide hints to atrtc on amd64 since it's not being described in
ACPI on some systems.

Reviewed by:	jhb
2009-06-15 21:55:29 +00:00
Alexander Motin
bb74c2db4d Forbid multi-vector MSI interrupt vectors migration to another CPU once
allocated. MSI have strict vectors allocation requirements, which are not
satisfied now during reallocation. This is not the best possible solution,
but better then just broken, as it was.

No objections: current@, arch@, jhb@
2009-06-15 13:47:49 +00:00
Alan Cox
387aabc513 Long, long ago in r27464 special case code for mapping device-backed
memory with 4MB pages was added to pmap_object_init_pt().  This code
assumes that the pages of a OBJT_DEVICE object are always physically
contiguous.  Unfortunately, this is not always the case.  For example,
jhb@ informs me that the recently introduced /dev/ksyms driver creates
a OBJT_DEVICE object that violates this assumption.  Thus, this
revision modifies pmap_object_init_pt() to abort the mapping if the
OBJT_DEVICE object's pages are not physically contiguous.  This
revision also changes some inconsistent if not buggy behavior.  For
example, the i386 version aborts if the first 4MB virtual page that
would be mapped is already valid.  However, it incorrectly replaces
any subsequent 4MB virtual page mappings that it encounters,
potentially leaking a page table page.  The amd64 version has a bug of
my own creation.  It potentially busies the wrong page and always an
insufficent number of pages if it blocks allocating a page table page.

To my knowledge, there have been no reports of these bugs, hence,
their persistance.  I suspect that the existing restrictions that
pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would
choose to map, for example, that the first page must be aligned on a 2
or 4MB physical boundary and that the size of the mapping must be a
multiple of the large page size, were enough to avoid triggering the
bug for drivers like ksyms.  However, one side effect of testing the
OBJT_DEVICE object's pages for physical contiguity is that a dubious
difference between pmap_object_init_pt() and the standard path for
mapping devices pages, i.e., vm_fault(), has been eliminated.
Previously, pmap_object_init_pt() would only instantiate the first
PG_FICTITOUS page being mapped because it never examined the rest.
Now, however, pmap_object_init_pt() uses the new function
vm_object_populate() to instantiate them all (in order to support
testing their physical contiguity).  These pages need to be
instantiated for the mechanism that I have prototyped for
automatically maintaining the consistency of the PAT settings across
multiple mappings, particularly, amd64's direct mapping, to work.
(Translation: This change is also being made to support jhb@'s work on
the Nvidia feature requests.)

Discussed with:	jhb@
2009-06-14 19:51:43 +00:00
Ed Schouten
c2919fd8ff Enable PRINTF_BUFR_SIZE on i386 and amd64 by default.
In the past there have been some reports of PRINTF_BUFR_SIZE not
functioning correctly. Instead of having garbled console messages, we
should just see whether the issues are still there and analyze them.

Approved by:	re
2009-06-14 18:01:35 +00:00
Pyun YongHyeon
d68875eb7e Add alc(4), a driver for Atheros AR8131/AR8132 PCIe ethernet
controller. These controllers are also known as L1C(AR8131) and
L2C(AR8132) respectively. These controllers resembles the first
generation controller L1 but usage of different descriptor format
and new register mappings over L1 register space requires a new
driver. There are a couple of registers I still don't understand
but the driver seems to have no critical issues for performance and
stability. Currently alc(4) supports the following hardware
features.
  o MSI
  o TCP Segmentation offload
  o Hardware VLAN tag insertion/stripping
  o Tx/Rx interrupt moderation
  o Hardware statistics counters(dev.alc.%d.stats)
  o Jumbo frame
  o WOL
AR8131/AR8132 also supports Tx checksum offloading but I disabled
it due to stability issues. I'm not sure this comes from broken
sample boards or hardware bugs. If you know your controller works
without problems you can still enable it. The controller has a
silicon bug for Rx checksum offloading, so the feature was not
implemented.
I'd like to say big thanks to Atheros. Atheros kindly sent sample
boards to me and answered several questions I had.

HW donated by:	Atheros Communications, Inc.
2009-06-10 02:07:58 +00:00
Kip Macy
98fda6ac58 opt in to flowtable on i386/amd64 2009-06-09 21:58:14 +00:00
Kip Macy
c804d618eb remove flowtable from DEFAULTS 2009-06-09 20:26:52 +00:00
Bjoern A. Zeeb
78c7c6adbc Unbreak the build for amd64 after r193814 using correct variable names. 2009-06-09 09:47:02 +00:00
Ariff Abdullah
b65cb1db3c When using i8254 as the only kernel timer source:
- Interpolate stat/prof clock using clkintr() in a similar fashion to
  local APIC timer, since statclock usually run slower.

- Liberate hardclockintr() from taking the burden of handling both stat
  and prof clock interrupt. Instead, send IPIs within clkintr() to handle
  those.
2009-06-09 07:26:52 +00:00
Ariff Abdullah
867cdecd40 Move C1E workaround into its own idle function. Previous workaround works
only during initial booting process, while there are laptops/BIOSes that
tend to act 'smarter' by force enabling C1E if the main power adapter
being pulled out, rendering previous workaround ineffective. Given the
fact that we still rely on local APIC to drive timer interrupt, this
workaround should keep all Turion (probably Phenom too) X\d+ alive whether
its on battery power or not.

URL:		http://lists.freebsd.org/pipermail/freebsd-acpi/2008-April/004858.html
    		http://lists.freebsd.org/pipermail/freebsd-acpi/2008-May/004888.html

Tested by:	Peter Jeremy <peterjeremy at optushome d com d au>
2009-06-09 04:17:36 +00:00
Jung-uk Kim
230bb4d90d Rewrite OsdSynch.c to reflect the latest ACPICA more closely:
- Implement ACPI semaphore (ACPI_SEMAPHORE) with condvar(9) and mutex(9).
- Implement ACPI mutex (ACPI_MUTEX) with mutex(9).
- Implement ACPI lock (ACPI_SPINLOCK) with spin mutex(9).
2009-06-08 20:07:16 +00:00
Ed Schouten
5942207fb4 Revert my change; reintroduce __gnu89_inline.
It turns out our compiler in stable/7 can't build this code anymore.
Even though my opinion is that those people should just run `make
kernel-toolchain' before building a kernel, I am willing to wait and
commit this after we've branched stable/8.

Requested by:	rwatson
2009-06-08 18:23:43 +00:00
Ed Schouten
032e3d1d19 Remove __gnu89_inline.
Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by:	alc
2009-06-08 17:27:25 +00:00
Alan Cox
3cfc28b0a0 Now that amd64's kernel map is 512GB (SVN rev 192216), there is no reason
to cap its buffer map at 1GB.

MFC after:	6 weeks
2009-06-08 16:43:40 +00:00
Konstantin Belousov
12be48a216 Put intrcnt, eintrcnt, intrnames and eintrnames into the .data section.
Noted by:	"Tseng, Kuo-Lang" <kuo-lang.tseng intel com>, bde
MFC after:	3 days
2009-06-05 20:23:29 +00:00
Jung-uk Kim
129d3046ef Import ACPICA 20090521. 2009-06-05 18:44:36 +00:00
Robert Watson
bd875f5f13 Remove MAC kernel config files and add "options MAC" to GENERIC, with the
goal of shipping 8.0 with MAC support in the default kernel.  No policies
will be compiled in or enabled by default, but it will now be possible to
load them at boot or runtime without a kernel recompile.

While the framework is not believed to impose measurable overhead when no
policies are loaded (a result of optimization over the past few months in
HEAD), we'll continue to benchmark and optimize as the release approaches.
Please keep an eye out for performance or functionality regressions that
could be a result of this change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2009-06-02 18:31:08 +00:00
Dmitry Chagin
f8cd0af232 Implement accept4 syscall.
Approved by:	kib (mentor)
MFC after:	1 month
2009-06-01 20:48:39 +00:00
Robert Watson
33dd50646e Regenerate generated syscall files following changes to struct sysent in
r193234.
2009-06-01 16:14:38 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
John Baldwin
515c5b1ede Don't bother reading the initial value of the machine check banks during
startup on Pentium 4 CPUs.  This wasn't safe to do on APs during AP startup,
was of limited value, and won't be used for future processors.
2009-05-20 16:11:22 +00:00
John Baldwin
dfc77ef51f - Add a tunable 'hw.mca.enabled' that can be used to enable/disable the
machine check code.  Disable it by default for now.
- When computing the mask of bits that determines a non-restartable event
  during a machine check exception, or-in the overflow flag rather than
  replacing the other flags.

PR:		i386/134586 [2]
Submitted by:	Andi Kleen  andi-fbsd firstfloor.org
2009-05-18 21:50:06 +00:00
John Baldwin
d3da228f37 Add a read-only sysctl hw.pci.mcfg to mirror the tunable by the same name.
MFC after:	1 week
2009-05-18 21:47:32 +00:00
John Baldwin
8aba835b8e Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend
using 128 byte alignment for locks.  (See IA-32 SDM Vol 3A 7.11.6.7)
2009-05-18 19:33:59 +00:00
Marcel Moolenaar
dbb95048da Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
2009-05-18 18:37:18 +00:00
Kip Macy
b522d2c99b correct range in comment
pointed out by alc
2009-05-16 22:08:00 +00:00
Kip Macy
e127902229 update vm map comment
pointed out by Larry Rosenman
2009-05-16 22:00:13 +00:00
Kip Macy
b6d82b1ae9 Increase default kernel map to 512GB
I briefly discussed this with alc. It could lead to problems for greater than 64GB.
However, that seems unlikely in practice.
2009-05-16 20:57:08 +00:00
Dmitry Chagin
3933bde22e Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and
SOCK_NONBLOCK flags, that allow to save fcntl() calls.

Implement a variation of the socket() syscall which takes a flags
in addition to the type argument.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-16 18:48:41 +00:00
John Baldwin
76dae09449 Trim the default set of device hints on i386 and amd64:
- Remove vga0 and the disabled uart2/uart3 hints from both platforms.
- Remove hints for ISA adv0, bt0, aha0, aic0, ed0, cs0, sn0, ie0, fe0, and
  le0 from i386.  All these hints were marked 'disabled' and thus already
  did not work "out of the box".

Discussed with:	imp
2009-05-14 21:53:35 +00:00
Attilio Rao
120b18d86f FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now).  Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by:	jhb, kmacy
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2009-05-14 17:43:00 +00:00
John Baldwin
9dc0b3d54f Implement simple machine check support for amd64 and i386.
- For CPUs that only support MCE (the machine check exception) but not MCA
  (i.e. Pentium), all this does is print out the value of the machine check
  registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
  a bit more involved.
  - First, there is limited support for decoding the CPU-independent MCA
    error codes in the kernel, and the kernel uses this to output a short
    description of any machine check events that occur.
  - When a machine check exception occurs, all of the MCx banks on the
    current CPU are scanned and any events are reported to the console
    before panic'ing.
  - To catch events for correctable errors, a periodic timer kicks off a
    task which scans the MCx banks on all CPUs.  The frequency of these
    checks is controlled via the "hw.mca.interval" sysctl.
  - Userland can request an immediate scan of the MCx banks by writing
    a non-zero value to "hw.mca.force_scan".
  - If any correctable events are encountered, the appropriate details
    are stored in a 'struct mca_record' (defined in <machine/mca.h>).
    The "hw.mca.count" is a count of such records and each record may
    be queried via the "hw.mca.records" tree by specifying the record
    index (0 .. count - 1) as the next name in the MIB similar to using
    PIDs with the kern.proc.* sysctls.  The idea is to export machine
    check events to userland for more detailed processing.
  - The periodic timer and hw.mca sysctls are only present if the CPU
    supports MCA.

Discussed with:	emaste (briefly)
MFC after:	1 month
2009-05-13 17:53:04 +00:00
Alan Cox
07a7b85e94 Correct a rare use-after-free error in pmap_copy(). This error was
introduced in amd64 revision 1.540 and i386 revision 1.547.  However, it
had no harmful effects until after a recent change, r189698, on amd64.
(In other words, the error is harmless in RELENG_7.)

The error is triggered by the failure to allocate a pv entry for the one
and only mapping in a page table page.  I am addressing the error by
changing pmap_copy() to abort if either pv entry allocation or page
table page allocation fails.  This is appropriate because the creation of
mappings by pmap_copy() is optional.  They are a (possible) optimization,
and not a requirement.

Correct a nearby whitespace error in the i386 pmap_copy().

Crash reported by: jeff@
MFC after:	6 weeks
2009-05-13 07:42:53 +00:00
Dmitry Chagin
03cc95d21a Translate l_timeval arg to native struct timeval in
linux_setsockopt()/linux_getsockopt() for SO_RCVTIMEO,
SO_SNDTIMEO opts as l_timeval has MD members.

Remove bogus __packed attribute from l_timeval struct on __amd64__.

PR:		kern/134276
Submitted by:	Thomas Mueller <tmueller sysgo com>
Approved by:	kib (mentor)
MFC after:	2 weeks
2009-05-11 13:50:42 +00:00
Dmitry Chagin
8d30f381ef Do not export AT_CLKTCK when emulating Linux kernel prior
to 2.4.0, as it has appeared in the 2.4.0-rc7 first time.
Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK),
glibc falls back to the hard-coded CLK_TCK value when aux entry
is not present.

Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.

For older applications/libc's which depends on hard-coded CLK_TCK
value user should set compat.linux.osrelease less than 2.4.0.

Approved by:	kib (mentor)
2009-05-10 18:43:43 +00:00
Dmitry Chagin
1ca16454b3 Rework r189362, r191883.
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.

Pointed out by:	bde

Approved by:	kib (mentor)
2009-05-10 18:16:07 +00:00
Jun Kuriyama
b3b17597ea - Use "device\t" and "options \t" for consistency. 2009-05-10 00:00:25 +00:00
Jamie Gritton
7ae27ff49f Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions.  This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-07 18:36:47 +00:00
Dmitry Chagin
13f20d7e86 To avoid excessive code duplication move MI definitions to the MI
header file. As it is defined in Linux.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-07 09:39:20 +00:00
Doug Rabson
ad5c667f35 Disable adaptive mutexes and rwlocks for XENHVM. 2009-05-06 17:52:38 +00:00
Doug Rabson
8480241102 Fix XENHVM build. 2009-05-06 17:48:39 +00:00
Alexander Motin
614dd4f83c Do not try to initialize LAPIC timer if we are not going to use it.
It solves assertion, when kernel built with INVARIANTS configured
to use i8254 timer.
2009-05-05 01:13:20 +00:00
Jung-uk Kim
4ef853cc7f Unlock the largest standard CPUID on Intel CPUs for both amd64 and i386 and
fix SMP topology detection.  On i386, we extend it to cover Core, Core 2,
and Core i7 processors, not just Pentium 4 family, and move it to better
place.  On amd64, all supported Intel CPUs should have this MSR.
2009-05-04 18:05:27 +00:00
Alexander Motin
1703f2b424 Rename statclock_disable variable to atrtcclock_disable that it actually is,
and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock
controlling it. Setting it to 0 disables using RTC clock as stat-/
profclock sources.

Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254
hardclock, when LAPIC and RTC clocks are disabled.

This allows to reduce global interrupt rate of idle system down to about
100 interrupts per core, permitting C3 and deeper C-states provide maximum
CPU power efficiency.
2009-05-03 17:47:21 +00:00
Alexander Motin
6a3a164d6e Add support for using i8254 and rtc timers as event sources for amd64 SMP
system. Redistribute hard-/stat-/profclock events to other CPUs using IPIs.
2009-05-02 12:20:43 +00:00
Dmitry Chagin
d789bfd562 Move extern variable definitions to the header file.
Approved by:	kib (mentor)
MFC after:	1 month
2009-05-02 10:06:49 +00:00
Alexander Motin
58a2bb4996 Add resume methods to i8254 and atrtc devices. 2009-05-01 21:43:04 +00:00
Alexander Motin
2f369c9496 Small addition to r191720.
Restore previous behaviour for the case of unknown interrupt. Invocation
of IRQ -1 crashes my system on resume. Returning 0, as it was, is not
perfect also, but at least not so dangerous.
2009-05-01 20:53:37 +00:00
Sam Leffler
dcad868984 o add uath
o sort usb wireless drivers
2009-05-01 17:20:16 +00:00
Alexander Motin
1ecff35a6b Use value -1 instead of 0 for marking unused APIC vectors. This fixes
IRQ0 routing on LAPIC-enabled systems.

Add hint.apic.0.clock tunable. Setting it 0 disables using LAPIC timers
as hard-/stat-/profclock sources falling back to using i8254 and rtc timers.

On modern CPUs LAPIC is a part of CPU core which is shutting down when CPU
enters C3 or deeper power state. It makes no problems for interrupt
processing, as chipset wakes up CPU on interrupt triggering. But entering
C3 state kills LAPIC timer and freezes system time, making C3 and deeper
states practically unusable. Using i8254 timer allows to avoid this
problem.

By using i8254 timer my T7700 C2D CPU with UP kernel successfully enters
C3 state, saving more then a Watt of total idle power (>10%) in addition to
all other power-saving techniques.

This technique is not working for SMP yet, as only one CPU receives
timer interrupts. But I think that problem could be fixed by forwarding
interrupts to other CPUs with IPI.
2009-05-01 17:05:49 +00:00
Dmitry Chagin
79262bf1f0 Reimplement futexes.
Old implemention used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), that could cause the
calling thread to sleep and lost Giant protection. User-visible
result was the missed wakeup.

New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.

Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation
is requested and either caller specified futexes are equial or
second futex already exists. This is acceptable since the situation
can only occur from the application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-01 15:36:02 +00:00
Jung-uk Kim
788399cbd9 - Fix divide-by-zero panic when SMP kernel is used on UP system[1].
- Avoid possible divide-by-zero panic on SMP system when the CPUID is
disabled, unsupported, or buggy.

Submitted by:	pluknet (pluknet at gmail dot com)[1]
2009-04-30 22:10:04 +00:00
Jeff Roberson
82fcb0f192 - Add support for cpuid leaf 0xb. This allows us to determine the
topology of nehalem/corei7 based systems.
 - Remove the cpu_cores/cpu_logical detection from identcpu.
 - Describe the layout of the system in cpu_mp_announce().

Sponsored by:   Nokia
2009-04-29 06:54:40 +00:00
John Baldwin
10395e0714 Reduce the number of bounce zones (and thus the number of bounce pages
used in some cases):
- Ignore DMA tag boundaries when allocating bounce pages.  The boundaries
  don't determine whether or not parts of a DMA request bounce.  Instead,
  they are just used to carve up segments.
- Allow tags with sub-page alignment to share bounce pages since bounce
  pages are always page aligned.

Reviewed by:	scottl (amd64)
MFC after:	1 month
2009-04-23 20:24:19 +00:00
John Baldwin
125f11d360 Adjust the way we number CPUs on x86 so that we attempt to "group" all
logical CPUs in a package.  We do this by numbering the non-boot CPUs
by starting with the first CPU whose APIC ID is after the boot CPU and
wrapping back around to APIC ID 0 if needed rather than always starting
at APIC ID 0.  While here, adjust the cpu_mp_announce() routine to list
CPUs based on the mapping established by assign_cpu_ids() rather than
making assumptions about the algorithm assign_cpu_ids() uses.

MFC after:	1 month
2009-04-22 21:40:37 +00:00
Robert Watson
9725389e1e Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by:	bde [1], jhb [2]
MFC after:	2 weeks
2009-04-20 12:59:23 +00:00
Robert Watson
22037b2d2c Add description and cautionary note regarding CACHE_LINE_SIZE.
MFC after:	2 weeks
Suggested by:	alc
2009-04-19 21:26:36 +00:00
Robert Watson
a93fa8f2bb For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant.  These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).

MFC after:	2 weeks
Discussed on:   arch@
2009-04-19 20:19:13 +00:00
Kip Macy
34b07340ff - Import infrastructure for caching flows as a means of accelerating L3 and L2 lookups
as well as providing stateful load balancing when used with RADIX_MPATH.
- Currently compiled in to i386 and amd64 but disabled by default, it can be enabled at
  runtime with 'sysctl net.inet.flowtable.enable=1'.

- Embedded users can remove it entirely from the kernel by adding 'nooption FLOWTABLE' to
  their kernel config files.

- A minimal hookup will be added to ip_output in a subsequent commit. I would like to see
  more review before bringing in changes that require more churn.

Supported by: Bitgravity Inc.
2009-04-19 00:16:04 +00:00
John Baldwin
842f11bef6 Restore bus DMA bounce pages to an offset of 0 when they are released by
a tag that has BUS_DMA_KEEP_PG_OFFSET set.  Otherwise the page could be
reused with a non-zero offset by a tag that doesn't have
BUS_DMA_KEEP_PG_OFFSET leading to data corruption.

Sleuthing by:	avg
Reviewed by:	scottl
2009-04-17 13:22:18 +00:00
Marcel Moolenaar
6ad9a99f21 Add a compat option to the EBR scheme that controls the
naming of the partitions (GEOM_PART_EBR_COMPAT).  When
compatibility is enabled, changes to the partitioning are
disallowed.

Remove the device name aliasing added previously to provide
backward compatibility, but which in practice doesn't give
us anything.

Enable compatibility on amd64 and i386.
2009-04-15 22:38:22 +00:00
Jung-uk Kim
cebe9dc98a A simple rewrite of biossmap.c:
- Do not iterate int 15h, function e820h twice.  Instead, we use STAILQ to
store each return buffer and copy all at once.
- Export optional extended attributes defined in ACPI 3.0 as separate
metadata.  Currently, there are only two bits defined in the specification.
For example, if the descriptor has extended attributes and it is not
enabled, it has to be ignored by OS.  We may implement it in the kernel
later if it is necessary and proven correct in reality.
- Check return buffer size strictly as suggested in ACPI 3.0.

Reviewed by:	jhb
2009-04-15 17:31:22 +00:00
Konstantin Belousov
3feb57a0a8 The bus_dmamap_load_uio(9) shall use pmap of the thread recorded in the
uio_td to extract pages from, instead of unconditionally use kernel
pmap.

Submitted by:	Jason Harmening <jason.harmening gmail com> (amd64 version)
PR:	amd64/133592
Reviewed by:	scottl (original patch), jhb
MFC after:	2 weeks
2009-04-13 19:20:32 +00:00
Ed Schouten
e1048f7678 Simplify in/out functions (for i386 and AMD64).
Remove a hack to generate more efficient code for port numbers below
0x100, which has been obsolete for at least ten years, because GCC has
an asm constraint to specify that.

Submitted by:	Christoph Mallon <christoph mallon gmx de>
2009-04-11 14:01:01 +00:00
Jack F Vogel
b698ab40c8 Add ixgbe to the GENERIC amd64 kernel in place of the
older ixgb driver. I will add to other architectures
after this one proves trouble free.

MFC after:	2 weeks
2009-04-10 00:40:48 +00:00
Ed Schouten
2c97d32a81 Also remove the unused __word_swap_int*() macros.
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-08 19:10:20 +00:00
Ed Schouten
17cfde3df4 Implement __bswap16() without using inline assembly.
Most compilers nowadays (including GCC) are smart enough to know what's
going on and generate more efficient code anyway.

Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-08 19:06:47 +00:00
Ed Schouten
db26a6714a Don't explicitly force ecx to be used for MSR_FSBASE/MSR_GSBASE.
Because the "c" input constaint is used, the compiler will already place
the MSR_FSBASE/MSR_GSBASE constants in ecx. Using __asm("ecx") makes
LLVM crash. Even though this is also an LLVM bug, we'd better remove the
unnecessary GCCism as well.

Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-07 19:31:36 +00:00
Dmitry Chagin
cd899aad76 Fix KBI breakage by r190520 which affects older linux.ko binaries:
1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
   is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
   modules won't have the flag set, so the new field brand_note would be
   ignored.

Suggested by:	jhb
Reviewed by:	jhb
Approved by:	kib (mentor)
MFC after:	6 days
2009-04-05 09:27:19 +00:00
Jung-uk Kim
6d6dd74911 Reduce code duplcations from r190620. While I am here, tweak a comment. 2009-04-02 01:46:57 +00:00
Jung-uk Kim
c8b7d7f4bf Chase GDT layout changes and unbreak suspend/resume on amd64. 2009-04-02 00:23:56 +00:00
Jung-uk Kim
4ec2a29f12 Garbage collect unused MSR_GSBASE since r190620.
The only consumer was exception.S and specialreg.h is directly included now.
Note no md5 changes were observed for all assym.s consumers with this.
2009-04-01 18:36:34 +00:00
Jung-uk Kim
4a608e44b5 Garbage collect unused stack segment since r190620. 2009-04-01 16:24:24 +00:00
Konstantin Belousov
7496ce7d74 Sync definitions for struct sigcontext for i386 and amd64 architectures
to struct mcontext.
2009-04-01 13:44:28 +00:00
Konstantin Belousov
2c66cccab7 Save and restore segment registers on amd64 when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-depended ptrace command is added to set segment
registers for debugged process.

In collaboration with:	pho
Discussed with:	peter
Reviewed by:	jhb
Linuxolator tested by:	dchagin
2009-04-01 13:09:26 +00:00
Konstantin Belousov
c11d6143ca Add separate gdt descriptors for %fs and %gs on amd64.
Reorder amd64 gdt descriptors so that user-accessible selectors are the
same as on i386. At least Wine hard-codes this into the binary.

In collaboration with:	pho
Reviewed by:	jhb
2009-04-01 12:53:01 +00:00
Konstantin Belousov
59aff0f894 Fully enumerate all i386 sysarch commands an amd64 include file.
Provides i386/freebsd API-compatible definitions for the argument
structures of the above sysarch commands. struct i386_ioperm_args
definition is ABI-compatible.

In collaboration with:	pho
Reviewed by:	jhb
2009-04-01 12:48:17 +00:00
Konstantin Belousov
0cdf4ffabc Add all segment registers for the amd64 CPU to struct reg and mcontext.
To keep these structures ABI-compatible, half the size of r_trapno,
r_err, mc_trapno, mc_flags.

Add fsbase and gsbase to mcontext on both amd64 and i386.
Add flags to amd64 mcontext to indicate that it contains valid segments
or bases.

In collaboration with:	pho
Discussed with:	peter
Reviewed by:	jhb
2009-04-01 12:44:17 +00:00
Konstantin Belousov
49c9cff881 Provide convenient definition of the union descriptor, similar to the
i386 one. Fully enumerate system segments and gate types.

In collaboration with:	pho
Reviewed by:	jhb
2009-04-01 12:31:04 +00:00
Jung-uk Kim
4e4ce82e0a Fix an uninitialized variable from the previous commit. 2009-03-31 21:14:05 +00:00
Jung-uk Kim
938608cb45 Probe size of installed memory modules from loader and display it
as 'real memory' instead of Maxmem if the value is available.
Note amd64 displayed physmem as 'usable memory' since machdep.c r1.640
to unconfuse users.  Now it is consistent across amd64 and i386 again.
While I am here, clean up smbios.c a bit and update copyright date.

Reviewed by:	jhb
2009-03-31 21:02:55 +00:00
Doug Ambrisko
0af7103533 Revert 190445 change to this file restoring:
typedef l_long          l_off_t;
Change l_mmap_argv's to l_ulong for pgoff.  This restores prior behaviour
to consumers of l_off_t but allows mmap to mmap a 32bit position which a
Linux application requires to access SMBIOS data via /dev/mem.

Reviewed by:	dchagin
Prompted by:	rdivacky
2009-03-27 17:00:49 +00:00
Konstantin Belousov
49d008d916 Convert gdt_segs and ldt_segs initialization to C99 style.
Reviewed by:	jhb
2009-03-26 18:07:13 +00:00
Doug Ambrisko
d2b2128a28 Add stuff to support upcoming BMC/IPMI flashing of newer Dell machine
via the Linux tool.
     -  Add Linux shim to ipmi(4)
     -  Create a partitions file to linprocfs to make Linux fdisk see
        disks.  This file is dynamic so we can see disks come and go.
     -  Convert msdosfs to vfat in mtab since Linux uses that for
        msdosfs.
     -  In the Linux mount path convert vfat passed in to msdosfs
        so Linux mount works on FreeBSD.  Note that tasting works
        so that if da0 is a msdos file system
                /compat/linux/bin/mount /dev/da0 /mnt
        works.
     -  fix a 64it bug for l_off_t.
Grabing sh, mount, fdisk, df from Linux, creating a symlink of mtab to
/compat/linux/etc/mtab and then some careful unpacking of the Linux bmc
update tool and hacking makes it work on newer Dell boxes.  Note, probably
if you can't figure out how to do this, then you probably shouldn't be
doing it :-)
2009-03-26 17:14:22 +00:00
John Baldwin
b9dda9d6fe Fix a few nits in the earlier changes to prevent local information leakage
in AMD FPUs:
- Do not clear the affected state in the case that the FPU registers for
  the thread that already owns the FPU are changed via fpu_setregs().  The
  only local information the thread would see is its own state in that
  case.
- Fix a type mismatch for the dummy variable used in a "fld".  It accepts
  a float, not a double.

Reviewed by:	bde
Approved by:	so (cperciva)
MFC after:	1 month
2009-03-25 22:08:30 +00:00
John Baldwin
63de9515b7 Rename (fpu|npx)_cleanstate to (fpu|npx)_initialstate to better reflect
their purpose.

Inspired by:	bde
MFC after:	1 month
2009-03-25 14:17:08 +00:00
John Baldwin
6cad8eb41d Fall back to using configuration type 1 accesses for PCI config requests if
the requested PCI bus falls outside of the bus range given in the ACPI
MCFG table.  Several BIOSes seem to not include all of the PCI busses in
systems in their MCFG tables.  It maybe that the BIOS is simply buggy and
does support all the busses, but it is more conservative to just fall back
to the old method unless it is certain that memory accesses will work.
2009-03-24 18:10:22 +00:00
Jung-uk Kim
d2b227cd49 - Clean up suspend/resume code for amd64.
- Call acpi_resync_clock() to reset system time before hardclock is ready
to tick.  Note we assume the current timecounter hardware and RTC are
already available for read operation.

Tested by:	mav
2009-03-23 22:35:30 +00:00
Alan Cox
b4862e19af Update stale comments. The alternate address space mapping was eliminated
when PAE support was added to i386.  The direct mapping exists on amd64.
2009-03-22 18:56:26 +00:00
Alan Cox
0c645b7267 In general, the kernel virtual address of the pml4 page table page that is
stored in the pmap is from the direct map region.  The two exceptions have
been the kernel pmap and the swapper's pmap.  These pmaps have used a
kernel virtual address established by pmap_bootstrap() for their shared
pml4 page table page.  However, there is no reason not to use the direct
map for these pmaps as well.
2009-03-22 04:32:05 +00:00
Alan Cox
9624b51a0e Eliminate the recomputation of pcb_cr3 from cpu_set_upcall(). The
bcopy()ed value from the old thread is the correct value because the new
thread and the old thread will share a page table.
2009-03-22 02:33:48 +00:00
Andrew Thompson
2b78d30630 Remove the uscanner(4) driver, this follows the removal of the kernel scanner
driver in Linux 2.6. uscanner was just a simple wrapper around a fifo and
contained no logic, the default interface is now libusb (supported by sane).

Reviewed by:	HPS
2009-03-19 20:33:26 +00:00
Konstantin Belousov
a4f2b2b0c6 Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer
to the full path of the image that is being executed.
Increase AT_COUNT.

Remove no longer true comment about types used in Linux ELF binaries,
listed types contain FreeBSD-specific entries.

Reviewed by:	kan
2009-03-17 12:50:16 +00:00
Jung-uk Kim
c66d2b38c8 Initial suspend/resume support for amd64.
This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.
2009-03-17 00:48:11 +00:00
Dmitry Chagin
6465d2d9d2 Chase the k8temp->amdtemp rename in NOTES and loader.conf.
Approved by:	kib (mentor)
2009-03-16 10:36:24 +00:00
Alan Cox
3b2dc2ac52 Update the pmap's resident page count when a page table page is freed in
pmap_remove_pde() and pmap_remove_pages().

MFC after:	6 weeks
2009-03-14 08:28:02 +00:00
Alan Cox
957939b503 Correct accounting errors in _pmap_allocpte(). Specifically, the pmap's
resident page count and the global wired page count were not correctly
maintained when page table page allocation failed.

MFC after:	6 weeks
2009-03-14 05:33:09 +00:00
Dmitry Chagin
32c01de21c Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR:		118473
Approved by:	kib (mentor)
2009-03-13 16:40:51 +00:00
Doug Rabson
1267802438 Merge in support for Xen HVM on amd64 architecture. 2009-03-11 15:30:12 +00:00
Alan Cox
802e54dc1f Optimize the inner loop of pmap_copy().
MFC after:	6 weeks
2009-03-11 14:55:04 +00:00
Alan Cox
280db2a6f5 Eliminate the last use of the recursive mapping to access user-space page
table pages.  Now, all accesses to user-space page table pages are
performed through the direct map.  (The recursive mapping is only used
to access kernel-space page table pages.)

Eliminate the TLB invalidation on the recursive mapping when a user-space
page table page is removed from the page table and when a user-space
superpage is demoted.
2009-03-10 02:12:03 +00:00
Robert Watson
6f63bf4edf Trim comments about the MP-safety of various bits of the amd64/i386
system call entry path and i386 IP checksum generation: we now assume
all code is MPSAFE unless explicitly marked otherwise.  Remove XXX
Giant comments along similar lines: the code by the comments either
doesn't need or doesn't want Giant (especially the NMI handler).

MFC after:	3 days
2009-03-09 13:11:16 +00:00
Alan Cox
6ec7df4a08 Change pmap_enter_quick_locked() so that it uses the kernel's direct map
instead of the pmap's recursive mapping to access the lowest level of the
page table when it maps a user-space virtual address.
2009-03-09 03:35:25 +00:00
Maxim Sobolev
feb593d215 Small comment nit: "run time" -> "run-time".
Submitted by:	rwatson
2009-03-08 05:01:39 +00:00
Andrew Thompson
663963b1d2 Reenable ndis in the LINT build now that it has been updated for USB. Thanks to
HPS and Weongyo.
2009-03-07 19:54:30 +00:00
Alan Cox
767a6e258b If the PDE is known, then use the direct mapping instead of the recursive
mapping to access the PTE.
2009-03-06 17:40:58 +00:00
John Baldwin
2ee8325f42 A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
  the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
  FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
  thread hasn't used the FPU yet to reflect the real initial control
  word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
  word instead of trashing whatever the current state of the FPU is.

Reviewed by:	bde
2009-03-05 19:42:11 +00:00
Alan Cox
cc7d8aabd1 Make pmap_copy() more TLB friendly. Specifically, make it use the kernel's
direct map instead of the pmap's recursive mapping to access the lowest
level in the page table.

MFC after:	6 weeks
2009-03-05 18:11:26 +00:00
John Baldwin
a8346a9865 A few cleanups to the FPU code on amd64:
- fpudna() always returned 1 since amd64 CPUs always have FPUs.  Change
  the function to return void and adjust the calling code in trap() to
  assume the return 1 case is the only case.
- Remove fpu_cleanstate_ready as it is always true when it is tested.
  Also, only initialize fpu_cleanstate when fpuinit() is called on the BSP.

Reviewed by:	bde
2009-03-05 16:56:16 +00:00
John Baldwin
9edc34f864 Move the PCB flag macros up next to the 'pcb_flags' member in the struct. 2009-03-05 16:52:50 +00:00
John Baldwin
b25fc07f53 At least one BIOS bogusly includes duplicate entries for I/O APICs. The
bogus entries have a starting IRQ that is invalid (> 255, so won't fit
into a PCI intline config register).  It had the side effect of breaking
MSI by "claiming" several IRQs in the MSI range.  Fix this by ignoring such
I/O APICs.

MFC after:	2 weeks
2009-03-05 16:03:44 +00:00
Dmitry Chagin
4d7c2e8a48 Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which
are used by glibc. This silents the message "2.4+ kernel w/o ELF notes?"
from some programs at start, among them are top and pkill.

Do the assignment of the vector entries in elf_linux_fixup()
as it is done in glibc.

Fix some minor style issues.

Submitted by:	Marcin Cieslak <saper at SYSTEM PL>
Approved by:	kib (mentor)
MFC after:	1 week
2009-03-04 12:14:33 +00:00
Konstantin Belousov
2883703e00 Use the p_sysent->sv_flags flag SV_ILP32 to detect 32bit process
executing on 64bit kernel. This eliminates the direct comparisions
of p_sysent with &ia32_freebsd_sysvec, that were left intact after
r185169.
2009-03-02 18:43:50 +00:00
Maxim Sobolev
df86dcaf67 Fix typo in comments in r189023. 2009-02-25 22:24:56 +00:00
Jung-uk Kim
a4079bfb74 Enable support for PAT_WRITE_PROTECTED and PAT_UNCACHED cache modes
unconditionally on amd64.  On i386, we assume PAT is usable if the CPU
vendor is not Intel or CPU model is newer than Pentium IV.

Reviewed by:	alc, jhb
2009-02-25 20:26:48 +00:00
Maxim Sobolev
e0bc0fad3d Make machdep.hyperthreading_enabled tunable working with the SCHED_ULE.
Unlike with SCHED_BSD, however, it can only be set to 0 at boot time,
it's not possible to change it at runtime.

Reviewed by:	jhb
MFC after:	1 month
2009-02-25 01:49:01 +00:00
Andrew Thompson
1f4963cb16 These are no longer needed. 2009-02-24 23:27:59 +00:00
Andrew Thompson
211211de83 Exclude ndis from the LINT build as it currently breaks the build, patches to
move to the new usb stack are in progress.
2009-02-24 00:39:48 +00:00
Andrew Thompson
c89d41e5ff Change over the usb kernel options to the new stack (retaining existing
naming). The old usb stack can be compiled in my prefixing the name with 'o'.
2009-02-23 18:34:56 +00:00
John Baldwin
0b7dc0a7c6 Some whitespace and style fixes.
Submitted by:	bde (partly)
2009-02-23 15:39:24 +00:00
Alan Cox
6d65f2fa57 Optimize free_pv_entry(); specifically, avoid repeated TAILQ_REMOVE()s.
MFC after:	1 week
2009-02-23 06:00:24 +00:00
Jeff Roberson
35d8de82c4 - Resolve an issue where we may clear an idt while an interrupt on a
different cpu is still assigned to that vector by never clearing idt
   entries.  This was only provided as a debugging feature and the bugs
   are caught by other means.
 - Drop the sched lock when rebinding to reassign an interrupt vector
   to a new cpu so that pending interrupts have a chance to be delivered
   before removing the old vector.

Discussed with:	tegge, jhb
2009-02-21 23:15:34 +00:00
Konstantin Belousov
99b7f1a10b Adapt linux emulation to use cv for vfork wait.
Submitted by:	Takahiro Kurosawa <takahiro.kurosawa gmail com>
PR:	kern/131506
2009-02-18 16:11:39 +00:00
Andrew Thompson
e31a070263 Add uslcom to the build too.
Reminded by:	Michael Butler
2009-02-15 23:40:29 +00:00
Andrew Thompson
e4edc14efd Switch over GENERIC kernels to USB2 by default.
Tested by:	make universe
2009-02-15 22:33:44 +00:00
Alan Cox
6be00eca3f Remove unnecessary page queues locking around vm_page_busy() and
vm_page_wakeup().  (This change is applicable to RELENG_7 but not
RELENG_6.)

MFC after:	1 week
2009-02-14 18:23:52 +00:00
Marcel Moolenaar
91e1be8baf Add option GEOM_PART_EBR by default on amd64 and i386. 2009-02-10 00:08:39 +00:00
Olivier Houchard
96c7367b9e The bounce zone sees its page number increased if multiple dma maps use it in
the same dma tag. However, it can happen multiple dma tags share the same
bounce zone too, so add a per-bounce zone map counter, and check it instead of
the dma tag map counter, to know if we have to alloc more pages.

Reported by:	miwi
Reviewed by:	scottl
2009-02-09 18:03:31 +00:00
Warner Losh
047e5fdabc When bouncing pages, allow a new option to preserve the intra-page
offset.  This is needed for the ehci hardware buffer rings that assume
this behavior.

This is an interim solution, and a more general one is being worked
on.  This solution doesn't break anything that doesn't ask for it
directly.  The mbuf and uio variants with this flag likely don't work
and haven't been tested.

Universe builds with these changes.  I don't have a huge-memory
machine to test these changes with, but will be happy to work with
folks that do and hps if this changes turns out not to be sufficient.

Submitted by:	alfred@ from Hans Peter Selasky's original
2009-02-08 22:54:58 +00:00
Warner Losh
3282e64ac0 Companion for r188301: fix the prototypes. 2009-02-08 07:03:34 +00:00
Warner Losh
d9d53b2a54 Correct parameter types for pcib_{read,write}_config by fixing the
protptyoes for the legacy_* impelemtnations of these kobj methods.
2009-02-08 07:02:42 +00:00
Wojciech A. Koszek
c4ce3ea6eb Tidy NOTES a bit:
- remove misleading nve/nfe comments, which make it hard to
  distinguish those two at a first glance
- bring pbio documentation to the block comment together with
  other drivers

I also brought commented out line responsible for si(4), since it
seems to compile and already has respective comment in this file.
2009-02-07 00:01:10 +00:00
Wojciech A. Koszek
bc4c1ddaf9 ural(4) is already present in global NOTES, thus there is no
need to explicitly list it here once again. This removes:

	WARNING: duplicate option `DEV_URAL' encountered.
	WARNING: duplicate device `ural' encountered.

Warnings when compiling LINT on amd64.
2009-02-06 21:56:55 +00:00
Wojciech A. Koszek
c353491ad3 Fix AGP debugging code:
- correct format strings
- fill opt_agp.h if AGP_DEBUG is defined
- bring AGP_DEBUG to LINT by mentioning it in NOTES

This should hopefully fix a warning that was...

Found by:	Coverity Prevent(tm)
CID:		3676
Tested on:	amd64, i386
2009-02-06 20:57:10 +00:00
Joseph Koshy
bb471e3315 Improve robustness of NMI handling, for NMIs recognized in kernel
mode.

- Make the NMI handler run on its own stack (TSS_IST2).
- Store the GSBASE value for each CPU just before the start of
  each NMI stack, permitting efficient retrieval using %rsp-relative
  addressing.
- For NMIs taken from kernel mode, program MSR_GSBASE explicitly
  since one or both of MSR_GSBASE and MSR_KGSBASE can be potentially
  invalid.  The current contents of MSR_GSBASE are saved and restored
  at exit.
- For NMIs handled from user mode, continue to use 'swapgs' to
  load the per-CPU GSBASE.

Reviewed by:	jeff
Debugging help:	jeff
Tested by:	gnn, Artem Belevich <artemb at gmail dot com>
2009-02-03 09:01:45 +00:00
David E. O'Brien
d065e13dc2 Fix the inconsistent tabbing.
Noticed by:	bde
2009-01-31 20:46:01 +00:00
David E. O'Brien
e6493bbebf Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by:	jhb
2009-01-31 11:37:21 +00:00
Jeff Roberson
9c8e8e3aa7 - Allocate apic vectors on a per-cpu basis. This allows us to allocate
more irqs as we have more cpus.  This is principally useful on systems
   with msi devices which may want many irqs per-cpu.

Discussed with:	jhb
Sponsored by:	Nokia
2009-01-29 09:22:56 +00:00
John Baldwin
de43ac6044 Use a different value for the initial control word for the FPU state for
32-bit processes.  The value matches the initial setting used by
FreeBSD/i386.  Otherwise, 32-bit binaries using floating point would use
a slightly different initial state when run on FreeBSD/amd64.

MFC after:	1 week
2009-01-28 20:35:16 +00:00
Jung-uk Kim
b11e7979ac VIA Nano processor has a special MSR (CENT_HARDWARECTRL3) bit 32 to determine
whether TSC is P-state invariant or not.  In fact, this MSR is writable but
we just leave it at the BIOS default for now.
2009-01-22 21:04:46 +00:00
Konstantin Belousov
5c0c22e92e The context switch to the 32bit binary does not properly restore
the fsbase value. The switch loads the fs segment register, that
invalidates the value in fsbase msr, thus value in %r9 can not be
considered the current value for fsbase anymore.

Unconditionally reload fsbase when switching to 32bit binary.

PR:	130526
MFC after:	3 weeks
2009-01-20 12:07:49 +00:00
Maxim Sobolev
9cfe40d9fa Take NTFS option out to match i386 GENERIC.
Suggested by:	phk, luigi
2009-01-19 15:33:06 +00:00
Maxim Sobolev
4d598bc1de asr(4) is not amd64-clean, not amr(4).
Pointy hat to:	myself
Submitted by:	scottl
2009-01-19 08:51:20 +00:00
Maxim Sobolev
69b2984e2e Comment amr(4) out - according to scottl it's not 64-bit clean. 2009-01-19 08:25:41 +00:00
Maxim Sobolev
68ce278eea Whitespace-only: reduce diff to the i386 GENERIC. 2009-01-19 07:18:32 +00:00
Maxim Sobolev
579aaaa4ea Add asr(4) and stge(4) from i386 GENERIC. Both drivers compile on amd64 and
there is no particular reason for them to be i386-only.

MFC after:	2 weeks
2009-01-19 07:10:11 +00:00
Konstantin Belousov
a353a3455e Disable interrupts, if they were enabled, before doing swapgs.
Otherwise, interrupt may happen while we run with kernel CS and usermode
gsbase.

Reviewed by:	jeff
MFC after:	1 week
2009-01-14 14:20:08 +00:00
Andrew Thompson
ea8f960c13 MFp4: //depot/projects/usb@155990
Add USB scanner support to USB2 config files.

Submitted by: Hans Petter Selasky
2009-01-13 19:05:10 +00:00
Luigi Rizzo
650ea0d62e Documentation-only change:
- add a reference to the config(5) manpage;
- hopefully clarify the format of the 'env FILENAME' directive.

I am putting these notes in sys/${arch}/conf/GENERIC and not
in sys/conf/NOTES because:

1. i386/GENERIC already had reference to a similar option (hints..)
   and to documentation (handbook)

2. GENERIC is what most users look at when they have to modify or
   create a new kernel config, so having the suggestion there is
   more effective.

I am only touching i386 and amd64 because the other GENERIC files
are already out of sync, and I am not sure what is the overall plan.

MFC after:	3 days
2009-01-13 12:35:33 +00:00
Jung-uk Kim
92df0bda99 Add basic amd64 support for VIA Nano processors. 2009-01-12 19:17:35 +00:00
Jung-uk Kim
6811e5d474 Add Centaur/IDT/VIA vendor ID for Nano family, which has long mode support. 2009-01-05 21:51:49 +00:00
Robert Watson
b581c4975b Add commented out options KDTRACE_HOOKS and, for amd64, KDRACE_FRAME,
to GENERIC configuration files.  This brings what's in 8.x in sync
with what is in 7.x, but does not change any current defaults.

Possibly they should now be enabled in head by default?
2009-01-05 14:21:49 +00:00
Rui Paulo
e287cc5d31 Disable USB bluetooth (needs netgraph built in) and USB audio (doesn't
compile).
2008-12-30 20:13:20 +00:00
Rui Paulo
0b8454a9a0 Add a kernel config file so that users have less difficulty testing
USBng.

If it makes sense, it could be done for arm/mips too.
2008-12-30 19:46:06 +00:00
Marcel Moolenaar
05002c354b Make gpart the default partitioning class on all platforms.
Both ia64 and powerpc were using gpart exclusively already
so there's no change for those two.

Discussed on: arch@
2008-12-17 17:43:22 +00:00
Warner Losh
db3cd725a5 AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them.
Reviewed by:	peter
2008-12-17 06:56:58 +00:00
Warner Losh
0e7faf3934 Remove obsolete AT_DEBUG stuff. It never should have been committed
in the first place, let alone migrated to linux emulation.

Reviewed by:	peter, rdivacky
2008-12-17 06:11:42 +00:00
Joseph Koshy
4e706fe392 Bug fix: %ebx needs to be preserved in the user callchain capture
path.
2008-12-14 09:06:28 +00:00