Commit Graph

12124 Commits

Author SHA1 Message Date
Konstantin Belousov
595473a587 Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64
ABI specifies the DF should be zero, and newer compilers do not clear
DF before using DF-sensitive instructions.

The DF clearing for signal handlers was done some time ago.

MFC after:	1 week
2010-06-23 20:44:07 +00:00
Konstantin Belousov
692add74d8 Fix bugs on pc98, use npxgetuserregs() instead of npxgetregs() for
get_fpcontext(), and npxsetuserregs() for set_fpcontext). Also,
note that usercontext is not initialized anymore in fpstate_drop().

Systematically replace references to npxgetregs() and npxsetregs()
by npxgetuserregs() and npxsetuserregs() in comments.

Noted by:	bde
2010-06-23 12:17:13 +00:00
Konstantin Belousov
1060a94fb5 After the FPU use requires #MF working due to INT13 FPU exception handling
removal, MFi386 r209198:
    Use critical sections instead of disabling local interrupts to ensure
    the consistency between PCPU fpcurthread and the state of FPU.

Reviewed by:	bde
Tested by:	pho
2010-06-23 11:21:19 +00:00
Konstantin Belousov
699d648aab Remove the support for int13 FPU exception reporting on i386. It is
believed that all 486-class CPUs FreeBSD is capable to run on, either
have no FPU and cannot use external coprocessor, or have FPU on the
package and can use #MF.

Reviewed by:	bde
Tested by:	pho (previous version)
2010-06-23 11:12:58 +00:00
Konstantin Belousov
95882b9865 Remove unused i586 optimized bcopy/bzero/etc implementations that utilize
FPU registers for copying. Remove the switch table and jumps from
bcopy/bzero/... to the actual implementation.
As a side-effect, i486-optimized bzero is removed.

Reviewed by:	bde
Tested by:	pho (previous version)
2010-06-23 10:40:28 +00:00
Alexander Motin
25eb1b8c15 Some style fixes for r209371.
Submitted by:	jhb@
2010-06-22 16:20:10 +00:00
Alexander Motin
875b8844be Implement new event timers infrastructure. It provides unified APIs for
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.

Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.

For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.

This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
 LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
 HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
 i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
 RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.

Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.
2010-06-20 21:33:29 +00:00
Konstantin Belousov
9e3e64e797 Only enable kdtrace hook in the LINT on the architectures that implement it. 2010-06-18 18:51:09 +00:00
Alexander Motin
d364638110 Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64.
This information can be very valuable for CPU sleep-time (and respectively
idle power consumption) optimization.

Add counters for timer-related IPIs.

Reviewed by:	jhb@ (previous version)
2010-06-17 11:54:49 +00:00
John Baldwin
61d3f0bab2 Restore the machine check register banks on resume. For banks being
monitored via CMCI, reset the interrupt threshold to 1 on resume.

Reviewed by:	jkim
MFC after:	2 weeks
2010-06-15 18:51:41 +00:00
Alexander Motin
2f9fc3899b Fix bug introduced in SVN rev 194985. When calling pic_assign_cpu()
for pre-bound IRQs during boot, submit there LAPIC ID, same as in other
places, not CPU ID.
2010-06-14 07:38:53 +00:00
Alexander Motin
1440de203f Check general TSC presence before doing more specific checks and printfs. 2010-06-12 13:10:03 +00:00
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
Alan Cox
9124d0d6a3 Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set.  Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
2010-06-11 15:49:39 +00:00
Alexander Kabaev
60743cbd22 Do not require pos parameter to be zero in MAP_ANONYMOUS mmap requests
in Linux emulation layer. Linux seems to only require that pos is
page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter
checking is too strict to allow some Linux binaries to run. tsMuxeR is
one example of such a binary.

Discussed with:	jhb
MFC after:	1 week
2010-06-10 17:59:47 +00:00
Alan Cox
ce18658792 Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines.  Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked.  Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick().  (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page.  Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete.  Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used.  Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by:	kib@
2010-06-10 16:56:35 +00:00
John Baldwin
b9cd2f771a Move the MD support for PCI message signalled interrupts to the x86 tree
as it is identical for i386 and amd64.
2010-06-08 18:36:03 +00:00
John Baldwin
2465e30f0c Move the machine check support code to the x86 tree since it is identical
on i386 and amd64.

Requested by:	alc
2010-06-08 18:04:07 +00:00
John Baldwin
53a908cb07 Move the I/O APIC code to the x86 tree since it is identical on i386 and
amd64.
2010-06-08 17:51:21 +00:00
John Baldwin
bfc7a4fc48 - Use a bit more care when moving I/O APIC interrupts between CPUs. Mask
the interrupt followed by a brief delay if it is not currently masked
  before moving the interrupt.
- Move the icu_lock out of ioapic_program_intpin() and into callers.  This
  closes a race where ioapic_program_intpin() could use a stale value of
  the masked state to compute the masked bit in the register.

Reviewed by:	mav
MFC after:	2 weeks
2010-06-08 17:08:13 +00:00
Konstantin Belousov
6cf9a08d2c Introduce the x86 kernel interfaces to allow kernel code to use
FPU/SSE hardware. Caller should provide a save area that is chained
into the stack of the areas; pcb save_area for usermode FPU state is
on top. The pcb now contains a pointer to the current FPU saved area,
used during FPUDNA handling and context switches.  There is also a
facility to allow the kernel thread to use pcb save_area.

Change the dreaded warnings "npxdna in kernel mode!" into the panics
when FPU usage is not registered.

KPI discussed with:	fabient
Tested by:    pho, fabient
Hardware provided by:	Sentex Communications
MFC after:    1 month
2010-06-05 15:59:59 +00:00
Alan Cox
966898be68 In the unlikely event that pmap_ts_referenced() demoted five superpage
mappings to the same underlying physical page, the calling thread would be
left forever pinned to the same processor.

MFC after:	3 days
2010-06-03 03:55:22 +00:00
John Baldwin
9c72429312 MFamd64: Add a new macro PCPU_XEN_FIELDS to hold XEN-specific per-CPU
fields that is always included in PCPU_MD_FIELDS.  The macro is empty for
non-XEN kernels.  This avoids duplicating non-XEN per-CPU fields in two
places.  While here, remove several unused fields from the XEN-specific
structure.

Reviewed by:	kmacy, gibbs
MFC after:	1 month
2010-06-02 15:09:36 +00:00
Alan Cox
b2830a9649 Eliminate a stale comment. 2010-05-31 06:06:10 +00:00
Alan Cox
72dc3eb65b Simplify the inner loop of pmap_collect(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty.
Instead, wait until the loop completes.
2010-05-30 18:48:41 +00:00
Alan Cox
a1192299b3 Merge various changes from i386/i386/pmap.c:
The remaining, unmerged portions of r175404
  Retire PMAP_DIAGNOSTIC.  Any useful diagnostics that were conditionally
  compiled under PMAP_DIAGNOSTIC are now KASSERT()s.  (Note: The kernel
  option DIAGNOSTIC still disables inlining of certain pmap functions.)

  Eliminate dead code from pmap_enter().  This code implemented an assertion.
  On i386, an equivalent check is already implemented.  However, on amd64,
  a small change is required to implement an equivalent check.

  Eliminate \n from a nearby panic string.

  Use KASSERT() to reimplement pmap_copy()'s two assertions.

Merge portions of r177659
  To date, we have assumed that the TLB will only set the PG_M bit in a
  PTE if that PTE has the PG_RW bit set.  However, this assumption does
  not hold on recent processors from Intel.  For example, consider a PTE
  that has the PG_RW bit set but the PG_M bit clear.  Suppose this PTE
  is cached in the TLB and later the PG_RW bit is cleared in the PTE,
  but the corresponding TLB entry is not (yet) invalidated.
  Historically, upon a write access using this (stale) TLB entry, the
  TLB would observe that the PG_RW bit had been cleared and initiate a
  page fault, aborting the setting of the PG_M bit in the PTE.  Now,
  however, P4- and Core2-family processors will set the PG_M bit before
  observing that the PG_RW bit is clear and initiating a page fault.  In
  other words, the write does not occur but the PG_M bit is still set.

  The real impact of this difference is not that great.  Specifically,
  we should no longer assert that any PTE with the PG_M bit set must
  also have the PG_RW bit set, and we should ignore the state of the
  PG_M bit unless the PG_RW bit is set.

r208609
  Defer freeing any page table pages in pmap_remove_all() until after the
  page queues lock is released.  This may reduce the amount of time that the
  page queues lock is held by pmap_remove_all().

r208645
  When I pushed down the page queues lock into pmap_is_modified(), I created
  an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
  vm_page_dirty() must perform the call first.  Otherwise, pmap_is_modified()
  could return FALSE without acquiring the page queues lock because the page
  is not (currently) writeable, and the caller to pmap_is_modified() might
  believe that the page's dirty field is clear because it has not seen the
  effect of the vm_page_dirty() call.

  When I pushed down the page queues lock into pmap_is_modified(), I
  overlooked one place where this ordering dependence is violated:
  pmap_enter().  In a rare situation pmap_enter() can be called to replace a
  dirty mapping to one page with a mapping to another page.  (I say rare
  because replacements generally occur as a result of a copy-on-write fault,
  and so the old page is not dirty.)  This change delays clearing PG_WRITEABLE
  until after vm_page_dirty() has been called.

  Fixing the ordering dependency also makes it easy to introduce a small
  optimization: When pmap_enter() used to replace a mapping to one page with a
  mapping to another page, it freed the pv entry for the first mapping and
  later called the pv entry allocator for the new mapping.  Now, pmap_enter()
  attempts to recycle the old pv entry, saving two calls to the pv entry
  allocator.

  There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
  Update a comment to reflect this.

  Tidy up the variable declarations at the start of pmap_enter().
2010-05-30 04:44:32 +00:00
Alan Cox
8f0d5d3b9f When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first.  Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.

When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter().  In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page.  (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.)  This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.

Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping.  Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.

There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.

Tidy up the variable declarations at the start of pmap_enter().
2010-05-29 17:10:45 +00:00
John Baldwin
0c86af8162 Defer initializing machine checks for the boot CPU until the local APIC is
fully configured.

MFC after:	1 month
2010-05-28 17:50:24 +00:00
Alan Cox
52d8ba372e Defer freeing any page table pages in pmap_remove_all() until after the
page queues lock is released.  This may reduce the amount of time that the
page queues lock is held by pmap_remove_all().
2010-05-28 06:49:57 +00:00
Konstantin Belousov
eee6151f46 Clarify a potential issue in get_fpcontext() use.
MFC after:	1 week
2010-05-27 18:33:00 +00:00
Alan Cox
c46b90e90a Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced().  Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively.  In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting.  This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages().  This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition.  Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by:	kib
2010-05-26 18:00:44 +00:00
John Baldwin
835f163a20 Only enable CMCI on i386 if 'device apic' is enabled in the kernel since
it requires the local APIC to work.
2010-05-25 21:39:30 +00:00
John Baldwin
58ccad7ddc Add support for corrected machine check interrupts. CMCI is a new local
APIC interrupt that fires when a threshold of corrected machine check
events is reached.  CMCI also includes a count of events when reporting
corrected errors in the bank's status register.  Note that individual
banks may or may not support CMCI.  If they do, each bank includes its own
threshold register that determines when the interrupt fires.  Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl).  The threshold
is also adjusted on each hourly poll which will lower the threshold once
events stop occurring.

Tested by:	Sailaja Bangaru  sbappana at yahoo com
MFC after:	1 month
2010-05-24 15:45:05 +00:00
Alan Cox
567e51e18c Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
Alexander Motin
dbd55f3ff0 - Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
2010-05-24 11:40:49 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Alexander Motin
fa1ed4bd1a Unify local_apic.c for x86 archs, 2010-05-23 17:45:01 +00:00
Poul-Henning Kamp
065b12a703 Rename an argument from "exp" to "expect" since the former makes FlexeLint
uneasy, in case anybody think it might be exp(3) in libm.

This also makes it consistent with other archs.
2010-05-20 06:18:03 +00:00
John Baldwin
3b642a049b Add constants for the optional EOI suppression support in local APICs and
EOI registers in I/O APICs.
2010-05-19 19:52:41 +00:00
Alan Cox
9ab6032f73 On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy.  Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup().  We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try.  Previously, the call to pmap_remove_write()
would have failed silently.
2010-05-16 23:45:10 +00:00
Poul-Henning Kamp
965df046e5 Apply a patch that has been lingering in my inbox for far too long:
On a soekris Net5501, if you do a watchdog -t 16, followed by a watchdog
-t 0 to disable the watchdog, and then after some time (16s) re-enable
the watchdog the box reboots immediatly. This prevents also to stop and
restart watchdogd(8).

This is because when you stop the watchdog, the timer is not stoped,
only the hard reset is disabled. So when the timer has elapsed, the C2
event of the timer is set.

But when the hard reset is re-enabled, the event is not cleared and the
box reboots.

The attached patch stops and resets the counter when the watchdog is
disabled and do not disable the hard reset of the timer (if the timer
has elapsed it's too late).

Submitted by:	 Patrick Lamaizière
2010-05-15 10:31:11 +00:00
Alan Cox
3c4a24406b Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Alan Cox
7024db1d40 Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free().  Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.
2010-05-06 16:39:43 +00:00
Konstantin Belousov
db8fd40e9f Add definitions for Intel AESNI CPUID bits and print the capabilities
on boot.

Hardware provided by:	Sentex Communications
MFC after:	1 week
2010-05-05 21:07:47 +00:00
Joel Dahl
8e0ad55abb Switch to our preferred 2-clause BSD license.
Approved by:	kmacy
2010-05-05 20:39:02 +00:00
Kip Macy
958d87cd86 merge 194209 in to the i386/xen pmap
requested by: alc@
2010-04-30 03:26:12 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Attilio Rao
d8b878873e - Extract the IODEV_PIO interface from ia64 and make it MI.
In the end, it does help fixing /dev/io usage from multithreaded
  processes.
- On i386 and amd64 the old behaviour is kept but multithreaded
  processes must use the new interface in order to work well.
- Support for the other architectures is greatly improved, where
  necessary, by the necessity to define very small things now.

Manpage update will happen shortly.

Sponsored by:	Sandvine Incorporated
PR:		threads/116181
Reviewed by:	emaste, marcel
MFC after:	3 weeks
2010-04-28 15:38:01 +00:00
Konstantin Belousov
8bac98182a Style: use #define<TAB> instead of #define<SPACE>.
Noted by:	bde, pluknet gmail com
MFC after:	11 days
2010-04-27 09:48:43 +00:00
Alan Cox
14dd3a29ea MFi386 r207205
Clearing a page table entry's accessed bit (PG_A) and setting the
  page's PG_REFERENCED flag in pmap_protect() can't really be justified,
  so don't do it.
2010-04-27 05:35:35 +00:00
Alan Cox
0d2e1c3e39 Clearing a page table entry's accessed bit (PG_A) and setting the
page's PG_REFERENCED flag in pmap_protect() can't really be justified.
In contrast to pmap_remove() or pmap_remove_all(), the mapping is not
being destroyed, so the notion that the page was accessed is not lost.
Moreover, clearing the page table entry's accessed bit and setting the
page's PG_REFERENCED flag can throw off the page daemon's activity
count calculation.  Finally, in my tests, I found that 15% of the
atomic memory operations being performed by pmap_protect() were only
to clear PG_A, and not change protection.  This could, by itself, be
fixed, but I don't see the point given the above argument.

Remove a comment from pmap_protect_pde() that is no longer meaningful
after the above change.
2010-04-25 20:40:45 +00:00
Kip Macy
c5cc832f32 - fix style issues on i386 as well
requested by: alc@
2010-04-24 21:36:52 +00:00
Alan Cox
7b85f59183 Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters.  For example, in mincore(), clearing the reference
bits has two negative consequences.  First, it throws off the activity
count calculations performed by the page daemon.  Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should.  Consequently, the page could be
deactivated prematurely by the page daemon.  Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page.  However, there is a second problem for which
that is not a solution.  In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping.  Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
2010-04-24 17:32:52 +00:00
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Jung-uk Kim
b834123032 If a conditional jump instruction has the same jt and jf, do not perform
the test and jump unconditionally.
2010-04-22 23:47:19 +00:00
Andrew Thompson
b850ecc180 Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had
the illusion of a tunable setting but was always turned on regardless.

MFC after:	1 week
2010-04-22 21:31:34 +00:00
Rui Paulo
ff569d8436 Rename the cyclic global variable lapic_cyclic_clock_func to just
cyclic_clock_func. This will make more sense when we start developing non
x86 cyclic version.
2010-04-20 17:03:30 +00:00
Pyun YongHyeon
d193ed0bed Add driver for Silicon Integrated Systems SiS190/191 Fast/Gigabit Ethernet.
This driver was written by Alexander Pohoyda and greatly enhanced
by Nikolay Denev. I don't have these hardwares but this driver was
tested by Nikolay Denev and xclin.

Because SiS didn't release data sheet for this controller, programming
information came from Linux driver and OpenSolaris. Unlike other open
source driver for SiS190/191, sge(4) takes full advantage of TX/RX
checksum offloading and does not require additional copy operation in
RX handler.
The controller seems to have advanced offloading features like VLAN
hardware tag insertion/stripping, TCP segmentation offload(TSO) as
well as jumbo frame support but these features are not available
yet. Special thanks to xclin <xclin<> cs dot nctu dot edu dot tw>
who sent fix for receiving VLAN oversized frames.
2010-04-14 20:45:33 +00:00
Konstantin Belousov
5f82d16eb1 Change printf() calls to uprintf() for sigreturn() and trap() complaints
about inacessible or wrong mcontext, and for dreaded "kernel trap with
interrupts disabled" situation. The later is changed when trap is
generated from user mode (shall never be ?).

Normalize the messages to include both pid and thread name.

MFC after:	1 week
2010-04-13 10:12:58 +00:00
Rui Paulo
05c100d21f Add EFI boot info fields. 2010-04-07 18:52:51 +00:00
Joel Dahl
8c14c16020 Switch to our preferred 2-clause BSD license.
Approved by:	jfv
2010-04-07 18:26:13 +00:00
Fabien Thomas
1fa7f10bac - Support for uncore counting events: one fixed PMC with the uncore
domain clock, 8 programmable PMC.
- Westmere based CPU (Xeon 5600, Corei7 980X) support.
- New man pages with events list for core and uncore.
- Updated Corei7 events with Intel 253669-033US December 2009 doc.
  There is some removed events in the documentation, they have been
  kept in the code but documented in the man page as obsolete.
- Offcore response events can be setup with rsp token.

Sponsored by: NETASQ
2010-04-02 13:23:49 +00:00
John Baldwin
90dfe31955 Add a handler for the local APIC error interrupt. For now it just prints
out the current value of the local APIC error register when the interrupt
fires.

MFC after:	1 week
2010-03-29 19:13:34 +00:00
Ed Schouten
510ea843ba Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.
A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.
2010-03-28 13:13:22 +00:00
Alan Cox
3792de2e87 Correctly handle preemption of pmap_update_pde_invalidate().
X-MFC after:	r205573
2010-03-27 23:53:47 +00:00
Alan Cox
a57d0d8e1d Simplify pmap_growkernel(), making the i386 version more like the amd64
version.

MFC after:	3 weeks
2010-03-27 18:24:27 +00:00
Alan Cox
09fcdf114e A ptrace(2) by one processor may trigger a promotion in the address space
of another process.  Modify pmap_promote_pde() to handle this.  (This is
not a problem on amd64 due to implementation differences.)

Reported by:	jh@
MFC after:	1 week
2010-03-25 17:24:03 +00:00
Nathan Whitehorn
a107d8aac9 Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
Alan Cox
e1990590e3 Adapt r204907 and r205402, the amd64 implementation of the workaround for
AMD Family 10h Erratum 383, to i386.

Enable machine check exceptions by default, just like r204913 for amd64.

Enable superpage promotion only if the processor actually supports large
pages, i.e., PG_PS.

MFC after:	2 weeks
2010-03-24 03:07:35 +00:00
John Baldwin
121b3af9f2 Remove unneeded type specifiers from 64-bit constants. The compiler
infers their natural type from the constants' values.

Submitted by:	bde
MFC after:	3 days
2010-03-22 15:08:26 +00:00
Ed Maste
d02e85a681 Merge r197455 from amd64:
Add a backtrace to the "fpudna in kernel mode!" case, to help track down
  where this comes from.

  Reviewed by:	bde
2010-03-22 11:52:53 +00:00
Xin LI
0d0284bc52 Back out revision 205307.
For the record:

CPU_ENABLE_SSE enables some code that dynamically enables SSE support
but not necessarily enforce execution of SSE instructions.
2010-03-19 16:09:57 +00:00
Andriy Gapon
9344361b66 pmap amd64/i386: fix a typo in a comment
MFC after:	3 days
2010-03-19 14:48:32 +00:00
John Baldwin
42c93b8d31 Use the same policy for rejecting / not-reject ACPI tables with incorrect
checksums as the base acpi(4) driver.  This fixes a problem where the MADT
parser would reject the MADT table during early boot causing the MP Table
to be, but then the acpi(4) driver would attach and use non-SMP interrupt
routing.

Tested by:	Alastair Hogge  agh of coolrhaug com
MFC after:	1 week
2010-03-19 12:43:18 +00:00
Xin LI
01af4cc1cc SSE is enabled by default about 5 years ago so there is no point pretending
that we support I486 and I586 CPUs in the GENERIC kernel, users wants these
support would have to build a custom kernel to explicitly disable SSE
anyways.

MFC after:	1 month
2010-03-19 01:16:53 +00:00
John Baldwin
a311ca2f45 - Extend the machine check record structure to include several fields useful
for parsing model-specific and other fields in machine check events
  including the global machine check capabilities and status registers,
  CPU identification, and the FreeBSD CPU ID.
- Report these added fields in the console log of a machine check so that
  a record structure can be reconstituted from the console messages.
- Parse new architectural errors including memory controller errors.

MFC after:	1 week
2010-03-16 16:01:19 +00:00
John Baldwin
c998036d71 Use unsigned long long constants for fields in 64-bit machine check
registers instead of unsigned long constants.

MFC after:	3 days
2010-03-16 15:27:58 +00:00
Ed Schouten
338f1debcd Remove COMPAT_43TTY from stock kernel configuration files.
COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.
2010-03-13 09:21:00 +00:00
John Baldwin
55c4e01602 Fix the previous attempt to fix kernel builds of HEAD on 7.x. Use the
__gnu_inline__ attribute for PMAP_INLINE when using the 7.x compiler to
match what 7.x uses for PMAP_INLINE.
2010-03-12 03:08:47 +00:00
John Baldwin
343803ad83 Print out the family and model from the cpu_id. This is especially useful
given the advent of the extended family and extended model fields.  The
values are printed in hex to match their common usage in documentation.

Submitted by:	Alexander Best
MFC after:	1 week
2010-03-11 14:17:37 +00:00
John Baldwin
cf684ede27 Make NKPT a kernel option on i386 so that it can be set to a non-default
value from kernel config files.

Tested by:	Charles Sprickman  spork of bway net
MFC after:	2 weeks
2010-03-10 19:50:52 +00:00
Konstantin Belousov
2a595a404f Fall back to wbinvd when region for CLFLUSH is >= 2MB.
Submitted by:	Kevin Day <toasty dragondata com>
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-10 15:50:38 +00:00
Joel Dahl
1edcf74de7 The NetBSD Foundation has granted permission to remove clause 3 and 4 from
the software.

Obtained from:	NetBSD
2010-03-03 17:55:51 +00:00
Attilio Rao
306c0c6ea0 Improving the clocks auto-tunning by firstly checking if the atrtc may be
correctly initialized and just then assign to softclock/profclock.
Right now, some atrtc seems reporting strange diagnostic error* making the
current pattern bogus.

In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64,
now accepts as arguments the desired sources to handle, and returns the
actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no
meaning in calling such function).
This allows to bring out into commont x86 code the handling part for
machdep.lapic_allclocks tunable, which is retained.

Sponsored by:	Sandvine Incorporated
Tested by:	yongari, Richard Todd
		<rmtodd at ichotolot dot servalan dot com>
MFC:		3 weeks
X-MFC:		r202387, 204309
2010-03-03 17:13:29 +00:00
John Baldwin
977cb83962 Print the contents of the miscellaneous (MISC) register to the console if
it is valid along with the other register values when a machine check is
encountered.

MFC after:	1 week
2010-03-01 13:56:15 +00:00
Alan Cox
0b993ee5fd When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs.  Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.

Reviewed by:	jhb, kib
MFC after:	3 days
2010-02-27 18:00:57 +00:00
Attilio Rao
3258030144 Introduce the new kernel sub-tree x86 which should contain all the code
shared and generalized between our current amd64, i386 and pc98.

This is just an initial step that should lead to a more complete effort.
For the moment, a very simple porting of cpufreq modules, BIOS calls and
the whole MD specific ISA bus part is added to the sub-tree but ideally
a lot of code might be added and more shared support should grow.

Sponsored by:	Sandvine Incorporated
Reviewed by:	emaste, kib, jhb, imp
Discussed on:	arch
MFC:		3 weeks
2010-02-25 14:13:39 +00:00
Kip Macy
20f72e6f2f - fix bootstrap for variable KVA_PAGES
- remove unused CADDR1
- hold lock across page table update

MFC after:	3 days
2010-02-21 01:13:34 +00:00
Kip Macy
6f1b404676 remove atkbd from default config to avoid pulling in real-mode bios emulation 2010-02-21 01:06:07 +00:00
Ed Schouten
ddc534916d Allow the pmap code to be built with GCC from FreeBSD 7 again.
This patch basically gives us the best of both worlds. Instead of
forcing the compiler to emulate GNU-style inline semantics even though
we're using ISO C99, it will only use GNU-style inlining when the
compiler is configured that way (__GNUC_GNU_INLINE__).

Tested by:	jhb
2010-02-18 14:28:38 +00:00
Attilio Rao
c1210a7d97 Adjust style (following the already existing rules) for the newly
introduced option DEADLKRES.

Reported by:	danfe, julian, avg
2010-02-15 23:44:48 +00:00
Attilio Rao
88cbfa852e Add the options DEADLKRES (introducing the deadlock resolver thread) in
the 'debugging' section of any HEAD kernel and enable for the mainstream
ones, excluding the embedded architectures.
It may, of course, enabled on a case-by-case basis.

Sponsored by:	Sandvine Incorporated
Requested by:	emaste
Discussed with:	kib
2010-02-10 16:30:04 +00:00
Rebecca Cran
c7ea7c4618 Update documentation for the iwn and iwnfw drivers: they support the 1000, 5150, 6000 and 6050 devices too, with firmware modules for the 4965, 1000, 5000, 5150 and 6000.
Add documentation for mwl and all the wireless firmware drivers.

Approved by:	rrs (mentor)
2010-02-08 21:38:42 +00:00
Ed Schouten
790f66db55 Remove unused LIBCOMPAT keyword from syscalls.master. 2010-02-08 10:02:01 +00:00
Alan Cox
bda39c37f1 Change the default value for the flag enabling superpage mapping and
promotion to "on".
2010-02-01 17:36:48 +00:00
Robert Noland
b3e7ca23e7 Enable MTRR on all VIA CPUs that claim support.
This may not be entirely correct either, but the existing check is
bogus.  I have both a C3 and a C7 that fail this check, but work fine.

MFC after:	2 weeks
2010-01-31 14:35:49 +00:00
Robert Noland
b1ba33ffbe Welcome drm support for VIA unichrome chips.
MFC after:	2 weeks
2010-01-31 14:30:39 +00:00
Andriy Gapon
c4d16d268f add static qualifier to definition of a function already declared static
This is for improving code readibility only.

MFC after:	1 week
2010-01-29 10:20:11 +00:00
Alan Cox
b3021b932a Optimize pmap_demote_pde() by using the new KPTmap to access a kernel
page table page instead of creating a temporary mapping to it.

Set the PG_G bit on the page table entries that implement the KPTmap.

Locore initializes the unused portions of the NKPT kernel page table
pages that it allocates to zero.  So, pmap_bootstrap() needn't zero
the page table entries referenced by CMAP1 and CMAP3.

Simplify pmap_set_pg().

MFC after:	10 days
2010-01-27 18:33:22 +00:00
Alan Cox
cf3508519c Handle a race between pmap_kextract() and pmap_promote_pde(). This race is
known to cause a kernel crash in ZFS on i386 when superpage promotion is
enabled.

Tested by:	netchild
MFC after:	1 week
2010-01-23 18:42:28 +00:00
Konstantin Belousov
5b1162b964 For PT_TO_SCE stop that stops the ptraced process upon syscall entry,
syscall arguments are collected before ptracestop() is called. As a
consequence, debugger cannot modify syscall or its arguments.

For i386, amd64 and ia32 on amd64 MD syscall(), reread syscall number
and arguments after ptracestop(), if debugger modified anything in the
process environment. Since procfs stopeven requires number of syscall
arguments in p_xstat, this cannot be solved by moving stop/trace point
before argument fetching.

Move the code to read arguments into separate function
fetch_syscall_args() to avoid code duplication. Note that ktrace point
for modified syscall is intentionally recorded twice, once with original
arguments, and second time with the arguments set by debugger.

PT_TO_SCX stop is executed after cpu_syscall_set_retval() already.

Reported by:	Ali Polatel <alip exherbo org>
Briefly discussed with:	jhb
MFC after:	3 weeks
2010-01-23 11:45:35 +00:00
John Baldwin
13c18821fa Move the examples for the 'hints' and 'env' keywords from various GENERIC
kernel configs into NOTES.

Reviewed by:	imp
2010-01-19 17:20:34 +00:00
Ed Schouten
91bfd816f2 Recommit r193732:
Remove __gnu89_inline.

  Now that we use C99 almost everywhere, just use C99-style in the pmap
  code. Since the pmap code is the only consumer of __gnu89_inline, remove
  it from cdefs.h as well. Because the flag was only introduced 17 months
  ago, I don't expect any problems.

  Reviewed by:    alc

It was backed out, because it prevented us from building kernels using a
7.x compiler. Now that most people use 8.x, there is nothing that holds
us back. Even if people run 7.x, they should be able to build a kernel
if they run `make kernel-toolchain' or `make buildworld' first.
2010-01-19 15:31:18 +00:00
Attilio Rao
a7ccec946b - Allow clock subsystem to be compiled without the apic support [0]
- ATPIC, on pc98 is never defined somewhere, differently from i386.
  Turn its compilation to be conditional as i386 does. [1]

[0] Reported by:	nyan
[1] Submitted by:	nyan
2010-01-17 23:23:35 +00:00
Attilio Rao
a26cb6d547 Handling all the three clocks (hardclock, softclock, profclock) with the
LAPIC may lead to aliasing for softclock and profclock because frequencies
are sized in order to fit mainly hardclock.
atrtc used to take care of the softclock and profclock and it does still
do, if the LAPIC can't handle the clocks properly.

Revert the change when the LAPIC started taking charge of all three of
them and let atrtc handle softclock and profclock if not explicitly
requested. Such request can be made setting != 0 the new tunable
machdep.lapic_allclocks or if the new device ATPIC is not present
within the i386 kernel config (atrtc is linked to atpic presence).

Diagnosed by:	Sandvine Incorporated
Reviewed by:	jhb, emaste
Sponsored by:	Sandvine Incorporated
MFC:		3 weeks
2010-01-15 16:04:30 +00:00
Brooks Davis
9126964cdb Only allocate the space we need before calling kern_getgroups instead
of allocating what ever the user asks for up to "ngroups_max + 1".  On
systems with large values of kern.ngroups this will be more efficient.

The now redundant check that the array is large enough in
kern_getgroups() is deliberate to allow this change to be merged to
stable/8 without breaking potential third party consumers of the API.

Reported by:	bde
MFC after:	28 days
2010-01-15 07:18:46 +00:00
Gavin Atkinson
7964930201 Spell "Hz" correctly wherever it is user-visible.
PR:		bin/142566
Submitted by:	N.J. Mann   njm njm.me.uk
Approved by:	ed (mentor)
MFC after:	2 weeks
2010-01-12 17:59:58 +00:00
Brooks Davis
412f9500e2 Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1.  kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1.  Given that the Windows group limit is 1024, this range
should be sufficient for most applications.

MFC after:	1 month
2010-01-12 07:49:34 +00:00
Marcel Moolenaar
409a390c33 Use io(4) for I/O port access on ia64, rather than through sysarch(2).
I/O port access is implemented on Itanium by reading and writing to a
special region in memory. To hide details and avoid misaligned memory
accesses, a process did I/O port reads and writes by making a MD system
call. There's one fatal problem with this approach: unprivileged access
was not being prevented. /dev/io serves that purpose on amd64/i386, so
employ it on ia64 as well. Use an ioctl for doing the actual I/O and
remove the sysarch(2) interface.

Backward compatibility is not being considered. The sysarch(2) approach
was added to support X11, but support for FreeBSD/ia64 was never fully
implemented in X11. Thus, nothing gets broken that didn't need more work
to begin with.

MFC after:	1 week
2010-01-11 18:10:13 +00:00
Alan Cox
ac24a8ea24 Simplify pmap_init(). Additionally, correct a harmless misbehavior on i386.
Specifically, where locore had created large page mappings for the kernel,
the wrong vm page array entries were being initialized.  The vm page array
entries for the pages containing the kernel were being initialized instead
of the vm page array entries for page table pages.

MFC after:	1 week
2010-01-11 16:01:20 +00:00
Alan Cox
418f8af3a9 Eliminate an unused declaration. 2010-01-11 15:51:13 +00:00
Alan Cox
92697f16aa Eliminate unused declarations. 2010-01-10 21:00:52 +00:00
Warner Losh
87948dfdf2 Add INCLUDE_CONFIG_FILE in GENERIC on all non-embedded platforms.
# This is the resolution of removing it from DEFAULTS...

MFC after:	5 days
2010-01-10 17:44:22 +00:00
Alan Cox
0f59b74f76 Long ago, in r120654, the rounding of KERNend and physfree in locore
was changed from a small page boundary to a large page boundary.  As
a consequence pmap_kmem_choose() became a pointless waste of address
space.  Eliminate it.
2010-01-09 22:09:10 +00:00
Bjoern A. Zeeb
193171b7f5 In sys/<arch>/conf/Makefile set TARGET to <arch>. That allows
sys/conf/makeLINT.mk to only do certain things for certain
architectures.

Note that neither arm nor mips have the Makefile there, thus
essentially not (yet) supporting LINT.  This would enable them
do add special treatment to sys/conf/makeLINT.mk as well chosing
one of the many configurations as LINT.

This is a hack of doing this and keeping it in a separate commit
will allow us to more easily identify and back it out.

Discussed on/with:	arch, jhb (as part of the LINT-VIMAGE thread)
MFC after:		1 month
2010-01-08 18:57:31 +00:00
Bjoern A. Zeeb
23b6a4482b Unbreak the XEN build after r201751. 2010-01-08 16:56:11 +00:00
Alan Cox
cb7667d0f6 Catch up with r183101 that added "device acpi" to GENERIC. 2010-01-08 09:16:37 +00:00
Alan Cox
28a5e2a5d7 Make pmap_set_pg() static. 2010-01-07 17:34:45 +00:00
Alan Cox
0d7e26de54 Eliminate unused variables (see r137912). 2010-01-07 04:47:09 +00:00
Warner Losh
56eff2143f Revert 200594. This file isn't intended for these sorts of things. 2010-01-04 21:30:04 +00:00
Brooks Davis
9efde58392 Add vlan(4) to all GENERIC kernels.
MFC after:	1 week
2010-01-03 20:40:54 +00:00
David E. O'Brien
93d8be03d9 Quiet variable "shadows" warning:
sys/vmmeter.h: warning: shadowed declaration is here
  machine/cpufunc.h: In function 'insw':
  machine/cpufunc.h: warning: declaration of 'cnt' shadows a global declaration
  ..snip..
2010-01-01 20:55:11 +00:00
Robert Noland
cfd7bacef2 Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.
This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by:	jhb@
MFC after:	Not in this lifetime...
2009-12-29 21:51:28 +00:00
John Baldwin
390cee8729 - Create a separate section in in the MI NOTES file for PCI wireless NIC
drivers and move bwi(4) there from the PCI Ethernet NIC section.
- Move ath(4) and ral(4) to the MI NOTES file.

Reviewed by:	rpaulo
2009-12-18 16:13:21 +00:00
Doug Barton
f1bdf073c1 Add INCLUDE_CONFIG_FILE, and a note in comments about how to also
include the comments with CONFIGARGS
2009-12-16 02:17:43 +00:00
John Baldwin
36f1a2c073 Remove comment claiming that building acpi into the kernel is deprecated.
PR:		docs/141353
Submitted by:	Bruce Cran
MFC after:	1 week
2009-12-14 15:32:32 +00:00
Kip Macy
126e2c082b for PV XEN translate page table entries from machine (real) to physical (logical) addresses so that kgdb can
translate them to the correct coredump offsets
2009-12-10 07:48:47 +00:00
Kip Macy
72bc4ff74d - revert pmap_kenter_temporary to taking a physical address
- make minidump work
2009-12-10 03:09:35 +00:00
Kip Macy
367bfeea1c make PV core dump actually dump memory - still need to fix program header initialization 2009-12-09 08:09:25 +00:00
Andriy Gapon
e72b7e5bba mca: small enhancements related to cpu quirks
- use utility macros for CPU family/model checking
- limit Intel P6 quirk to pre-Nehalem models (taken from OpenSolaris)
- add AMD GartTblWkEn quirk for families 0Fh and 10h; I haven't experienced
  any problems without the quirk but both Linux and OpenSolaris do this
- slightly re-arrange quirk code to provide for the future generalization
  and separation of vendor-specific quirk functions

Reviewed by:	jhb
MFC after:	1 week
2009-12-03 16:10:21 +00:00
Andrew Thompson
3da0423e1c Fix cut'n paste on the AR9280 entry.
Submitted by:	pluknet
2009-12-02 21:22:10 +00:00
Andriy Gapon
d5e341a956 mca: improve status checking, recording and reporting
- directly print mca information in case we fail to allocate memory
  for a record
- include bank number into mca record
- print raw mca status value for extended information

Reviewed by:	jhb
MFC after:	10 days
2009-12-02 15:45:55 +00:00
Andrew Thompson
1813b93dc1 Add missing ath_ar9* ath hal entries. 2009-12-02 00:38:11 +00:00
Andriy Gapon
5022f21bd9 amdsbwd: new driver for AMD SB600/SB7xx watchdog timer
The hardware is compliant with WDRT specification, so I originally
considered including generic WDRT watchdog support, but decided
against it, because I couldn't find anyone to the code for me.
WDRT seems to be not very popular.
Besides, generic WDRT porbably requires a slightly different driver
approach.

Reviewed by:	des, gavin, rpaulo
MFC after:	3 weeks
2009-11-30 11:44:03 +00:00
Andriy Gapon
71224c78d4 x86 cpu features: add MOVBE reporting and flag
The check is glimpsed from Linux and OpenSolaris.
MOVBE instruction is found in Intel Atom processors.
2009-11-30 11:11:08 +00:00
Alan Cox
e2997fea72 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
Attilio Rao
385e11118c i386 has not (yet) any DEV_ATPIC conditional than axe it out from Xen
version.

No objections by:	kmacy
2009-11-27 01:02:17 +00:00
Kip Macy
be7747b449 fixup kernel core dumps on paravirtual guests 2009-11-24 07:17:51 +00:00
Jung-uk Kim
26b8a1c94f - Add more aggressive BPF JIT optimization. This is in more favor of i386
while the previous commit was more amd64-centric.
- Use calloc(3) instead of malloc(3)/memset(3) in user land[1].

Submitted by:	ed[1]
2009-11-23 22:23:19 +00:00
Jung-uk Kim
35012a1e69 Add an experimental and rudimentary JIT optimizer to reduce unncessary
overhead from short BPF filter programs such as "get the first 96 bytes".
2009-11-21 00:19:09 +00:00
Jung-uk Kim
c12b965f99 General style cleanup, no functional change. 2009-11-20 21:12:40 +00:00
Jung-uk Kim
5ecf77367c - Allocate scratch memory on stack instead of pre-allocating it with
the filter as we do from bpf_filter()[1].
- Revert experimental use of contigmalloc(9)/contigfree(9).  It has no
performance benefit over malloc(9)/free(9)[2].

Requested by:	rwatson[1]
Pointed out by:	rwatson, jhb, alc[2]
2009-11-20 18:49:20 +00:00
Jung-uk Kim
986689c263 Fix tinderbox build for i386 and sync amd64 with it. 2009-11-19 15:45:24 +00:00
Jung-uk Kim
ae4fdab8a8 - Change internal function bpf_jit_compile() to return allocated size of
the generated binary and remove page size limitation for userland.
- Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make
sure the generated binary aligns properly and make it physically contiguous.
2009-11-18 23:40:19 +00:00
Jung-uk Kim
366652f987 - Make BPF JIT compiler working again in userland. We are limiting size of
generated native binary to page size for now.
- Update copyright date and fix some style nits.
2009-11-18 19:26:17 +00:00
Alexander Motin
87122077a7 Previous solution appeared to be unsufficient. After additional testing
I have found that it is not only desktop CPUs problem. but mobile also.
Probably AP on laptops just started initially at lower frequency, hiding
the problem.

Disable frequency validation by default, for systems with more then one CPU,
until we can implement it properly. It looks like making more harm now then
benefits. Add 'hw.est.strict' loader tunable to control it.

Now my iXsystems Invincibook is able to run at 800MHz lowest frequency,
instead of 1200MHz before, when 800MHz was incorrectly reported invalid.
2009-11-14 16:20:07 +00:00
Alexander Motin
69e19c5eaf Retry only once, if BIOS is completely broken and gives zero freqs. 2009-11-14 14:29:18 +00:00
Alexander Motin
9b7b3d4cbd Desktop Core2Duo/Core2Quad CPUs are unable to control frequency of single
CPU core, only pair of them. As result, both cores are running on highest
one of requested frequencies, and that is reported by status register.
Such behavior confuses frequency validation logic, as it runs on only
one core, as SMP is not yet launched, making EIST completely unusable.

To workaround this, add check for validation result. If we haven't found
at least two usable frequencies, then probably we are looking bad and have
to trust data provided by BIOS as-is.
2009-11-14 14:16:02 +00:00
Yoshihiro Takahashi
8956df833c Fix cpu model for PODP5V83. It is P24T, not P54T.
Also remove redundant 'Overdrive' word.

Pointed out by:	SATOU Tomokazu (tomo1770 at maple ocn ne jp)
MFC after:	1 week
2009-11-12 10:59:00 +00:00
Jun Kuriyama
bb830eceaa - Style nits.
- Remove unneeded TUNABLE_INT().

Suggested by:	avg, kib
2009-11-12 03:31:19 +00:00
Andriy Gapon
6cc16fcb4e reflect that pg_ps_enabled is a tunable, not just a read-only sysctl
Nod from:	jhb
2009-11-11 14:21:31 +00:00
Konstantin Belousov
a7b890448c Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by:	marcel
Reviewed by:	marcel, davidxu
PowerPC, ARM, ia64 changes:	marcel
Sparc64 tested and reviewed by:	marius, also sunv reviewed
MIPS tested by:	gonzo
MFC after:	1 month
2009-11-10 11:43:07 +00:00
Roman Divacky
68c4dfdf0c Make isa_dma functions MPSAFE by introducing its own private lock. These
functions are selfcontained (ie. they touch only isa_dma.c static variables
and hardware) so a private lock is sufficient to prevent races. This changes
only i386/amd64 while there are also isa_dma functions for ia64/sparc64.
Sparc64 are ones empty stubs and ia64 ones are unused as ia64 does not
have isa (says marcel).

This patch removes explicit locking of Giant from a few drivers (there
are some that requires this but lack ones - this patch fixes this) and
also removes the need for implicit locking of Giant from attach routines
where it's provided by newbus.

Approved by:	ed (mentor, implicit)
Reviewed by:	jhb, attilio (glanced by)
Tested by:	Giovanni Trematerra <giovanni.trematerra gmail com>
IA64 clue:	marcel
2009-11-09 20:29:10 +00:00
Jun Kuriyama
6f5c96c41d - Add hw.clflush_disable loader tunable to avoid panic (trap 9) at
map_invalidate_cache_range() even if CPU is not Intel.
- This tunable can be set to -1 (default), 0 and 1.  -1 is same as
  current behavior, which automatically disable CLFLUSH on Intel CPUs
  without CPUID_SS (should be occured on Xen only).  You can specify 1
  when this panic happened on non-Intel CPUs (such as AMD's).  Because
  disabling CLFLUSH may reduce performance, you can try with setting 0
  on Intel CPUs without SS to use CLFLUSH feature.

Reviewed by:	kib
Reported by:	karl, kuriyama
Related to:	kern/138863
2009-11-09 02:54:16 +00:00
Attilio Rao
f1c892a33c Strip from messages for users external URLs the project cannot directly
control.

Requested by:	kib, rwatson
2009-11-05 14:34:38 +00:00
Attilio Rao
06db609d4a Opteron rev E family of processor expose a bug where, in very rare
ocassions, memory barriers semantic is not honoured by the hardware
itself. As a result, some random breakage can happen in uninvestigable
ways (for further explanation see at the content of the commit itself).

As long as just a specific familly is bugged of an entire architecture
is broken, a complete fix-up is impratical without harming to some
extents the other correct cases.
Considering that (and considering the frequency of the bug exposure)
just print out a warning message if the affected machine is identified.

Pointed out by:	Samy Al Bahra <sbahra at repnop dot org>
Help on wordings by:	jeff
MFC:	3 days
2009-11-04 01:32:59 +00:00
Ed Schouten
4c88ba2d70 Unobfuscate unit number handling in apm(4).
There is no need to use the lower 4 bits of the unit number to store the
device type number. Just use 0 and 1 to distinguish them. devfs also
guarantees that there can never be an open call on a device that has a
unit number different to 0 and 1, so there is no need to check for this
in open().
2009-10-31 10:38:30 +00:00
John Baldwin
f12c034874 Fix some problems with effective mmap() offsets > 32 bits. This was
partially fixed on amd64 earlier.  Rather than forcing linux_mmap_common()
to use a 32-bit offset, have it accept a 64-bit file offset.  This offset
is then passed to the real mmap() call.  Rather than inventing a structure
to hold the normal linux_mmap args that has a 64-bit offset, just pass
each of the arguments individually to linux_mmap_common() since that more
closes matches the existing style of various kern_foo() functions.

Submitted by:	Christian Zander @ Nvidia
MFC after:	1 week
2009-10-28 20:17:54 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Marcel Moolenaar
1a4fcaebe3 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
Andriy Gapon
df0151b780 add amdtemp to i386 NOTES
essentially this is a MFamd64

Nod from:	rpaulo
2009-10-20 09:31:57 +00:00
Konstantin Belousov
051f6f8a7a Move intr_describe() out of #ifdef SMP; the function is always required.
Reviewed by:	jhb
2009-10-16 12:00:59 +00:00
John Baldwin
37b8ef16cd Add a facility for associating optional descriptions with active interrupt
handlers.  This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
  a description with an active interrupt handler setup by BUS_SETUP_INTR.
  It has a default method (bus_generic_describe_intr()) which simply passes
  the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
  printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
  an interrupt handler and copy the name passed to intr_event_add_handler()
  into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
  to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
  the nexus(4) driver supply a custom bus_describe_intr method that invokes
  a new intr_describe() MD routine which in turn looks up the associated
  interrupt event and invokes intr_event_describe_handler().

Requested by:	many
Reviewed by:	scottl
MFC after:	2 weeks
2009-10-15 14:54:35 +00:00
John Baldwin
55b6a401ef Move the USB wireless drivers down into their own section next to the USB
ethernet drivers.

Submitted by:	Glen Barber  glen.j.barber @ gmail
MFC after:	1 month
2009-10-13 19:02:03 +00:00
Konstantin Belousov
023063938a Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:31:24 +00:00
Attilio Rao
8448afced8 atomic_cmpset_barr_* was added in order to cope with compilers willing to
specify their own version of atomic_cmpset_* which could have been
different than the membar version.

Right now, however, FreeBSD is bound mostly to GCC-like compilers and
it is desired to add new support and compat shim mostly when there is
a real necessity, in order to avoid too much compatibility bloats.

In this optic, bring back atomic_cmpset_{acq, rel}_* to be the same as
atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction.

Requested by:	jhb
Reviewed by:	jhb
Tested by:	Giovanni Trematerra <giovanni dot trematerra at
		gmail dot com>
2009-10-09 15:51:40 +00:00
Attilio Rao
d9492a4483 - All the functions in atomic.h needs to be in "physical" form (like
not defined through macros or similar) in order to be later compiled in
  the kernel and offer this way the support for modules (and
  compatibility among the UP case and SMP case).
  Fix this for the newly introduced atomic_cmpset_barr_* cases by defining
  and specifying a template.  Note that the new DEFINE_CMPSET_GEN()
  template save more typing on amd64 than the current code. [1]
- Fix the style for memory barriers on amd64.

[1] Reported by:	Paul B. Mahol <onemda at gmail dot com>
2009-10-06 23:48:28 +00:00
Attilio Rao
86d2e48c22 Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used.  GCC, however, does that aggressively, even in
presence of volatile operands.  The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory" even if that is
theoretically an heavy-weight operation, flushing the content of all
the registers and forcing reload of them (We could rely, however, on
gcc DTRT by just understanding the purpose as this is a well-known
pattern for many modern operating-systems).

Not all our memory barriers, right now, clobber memory for GCC-like
compilers. The most notable cases are IA32 and amd64 where the memory
barrier are treacted the same as normal atomic instructions.
Fix this by offering the possibility to implement atomic instructions
with memory barriers separately from the normal version and implement
the GCC-like specific one using memory clobbering.
Thanks to Chris Lattner (@apple) for his discussion on llvm specifics.

Reported by:	jhb
Reviewed by:	jhb
Tested by:	rdivacky, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
2009-10-06 13:45:49 +00:00
Bjoern A. Zeeb
52bf2041ac Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by:	kib
MFC after:	1 month
2009-10-03 11:57:21 +00:00
Kip Macy
46aba52a50 make read_eflags and write_eflags accomplish the same effect on PVM as native,
simplifying interrupt handling
2009-10-01 22:05:38 +00:00
Konstantin Belousov
b02395c64d As a workaround, for Intel CPUs, do not use CLFLUSH in
pmap_invalidate_cache_range() when self-snoop is apparently not reported
in cpu features. We get a reserved trap when clflushing APIC registers
window.

XEN in full system virtualization mode removes self-snoop from CPU
features, making this a problem.

Tested by:	csjp
Reviewed by:	alc
MFC after:	3 days
2009-10-01 12:52:48 +00:00
Rui Paulo
c16c6b65da Improve 802.11s comment.
Spotted by:	dougb
MFC after:	1 day
2009-10-01 02:08:42 +00:00
Andriy Gapon
beb2c1f3e9 cpufunc.h: unify/correct style of c extension names
i386 and amd64 archs only.
inline => __inline. [1]
__asm__ => __asm. [2]

Reviewed by:	kib, jhb [1]
Suggested by:	kib [2]
MFC after:	1 week
2009-09-30 16:34:50 +00:00
Jung-uk Kim
71f99e637a Copy apm(4) emulation from sys/i386/acpica/acpi_machdep.c and
install apm(8) and apm_bios.h on amd64.
2009-09-27 14:00:16 +00:00
Bjoern A. Zeeb
4507f02e0e lindev(4) [1] is supposed to be a collection of linux-specific pseudo
devices that we also support, just not by default (thus only LINT or
module builds by default).

While currently there is only "/dev/full" [2], we are planning to see more
in the future.  We may decide to change the module/dependency logic in the
future should the list grow too long.

This is not part of linux.ko as also non-linux binaries like kFreeBSD
userland or ports can make use of this as well.

Suggested by:	rwatson [1] (name)
Submitted by:	ed [2]
Discussed with:	markm, ed, rwatson, kib (weeks ago)
Reviewed by:	rwatson, brueffer (prev. version)
PR:		kern/68961
MFC after:	6 weeks
2009-09-26 12:45:28 +00:00
Andriy Gapon
1e908511f8 number of cleanups in i386 and amd64 pci md code
o introduce PCIE_REGMAX and use it instead of ad-hoc constant
o where 'reg' parameter/variable is not already unsigned, cast it to
  unsigned before comparison with maximum value to cut off negative
  values
o use PCI_SLOTMAX in several places where 31 or 32 were explicitly used
o drop redundant check of 'bytes' in i386 pciereg_cfgread() - valid
  values are already checked in the subsequent switch

Reviewed by:	jhb
MFC after:	1 week
2009-09-24 07:11:23 +00:00
John Baldwin
d95e7f5a7a Extract the code to find and map the MADT ACPI table during early kernel
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
  APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
  mapping the RSDT or XSDT and searching for a table with a given signature
  out into acpica_machdep.c for both amd64 and i386.
2009-09-23 15:42:35 +00:00
John Baldwin
07ee969179 - Split the logic to parse an SMAP entry out into a separate function on
amd64 similar to i386.  This fixes a bug on amd64 where overlapping
  entries would not cause the SMAP parsing to stop.
- Change the SMAP parsing code to do a sorted insertion into physmap[]
  instead of an append to support systems with out-of-order SMAP entries.

PR:		amd64/138220
Reported by:	James R. Van Artsdalen  james of jrv org
MFC after:	3 days
2009-09-22 16:51:00 +00:00
Xin LI
a57707e712 Build x86bios only for i386/amd64 for now. More work is required
to make these functional on other architectures, and the current
code breaks sparc64 and powerpc.

Spotted by:	tinderbox via des
2009-09-21 23:58:29 +00:00
Konstantin Belousov
a1bfaca761 If CPU happens to be in usermode when a T_RESERVED trap occured,
then trapsignal is called with ksi.ksi_signo = 0. For debugging kernels,
that should end up in panic, for non-debugging kernels behaviour is
undefined.

Do panic regardeless of execution mode at the moment of trap.

Reviewed by:	jhb
MFC after:	1 month
2009-09-21 09:41:51 +00:00
Xin LI
6abad12dfe Automatically depend on x86emu when vesa or dpms is being built into
kernel.  With this change the user no longer need to remember building
this option.

Submitted by:	swell.k at gmail.com
2009-09-21 07:08:20 +00:00
Xin LI
372c733759 Enable s3pci on amd64 which works on top of VESA, and allow
static building it into kernel on i386 and amd64.

Submitted by:	swell.k at gmail.com
2009-09-21 07:05:48 +00:00
Alan Cox
d6dbb0dba0 When superpages are enabled, add the 2 or 4MB page size to the array of
supported page sizes.

Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:09:33 +00:00
Alan Cox
fe105d45a2 Add a new sysctl for reporting all of the supported page sizes.
Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:04:57 +00:00
Robert Watson
e76d823b81 Use C99 initialization for struct filterops.
Obtained from:	Mac OS X
Sponsored by:	Apple Inc.
MFC after:	3 weeks
2009-09-12 20:03:45 +00:00
Kip Macy
7d05808361 fix UP compilation 2009-09-11 23:41:11 +00:00
Jung-uk Kim
3bcdfb9bf8 Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.
2009-09-10 17:27:36 +00:00
Dag-Erling Smørgrav
80c03b8eee As jhb@ pointed out to me, r197057 was incorrect, not least because these
are generated files.
2009-09-10 13:20:27 +00:00
Konstantin Belousov
9411b7675e As was done in r196643 for i386 and amd64, swap the start/end virtual
addresses in pmap_invalidate_cache_range().

Reported by:	Vincent Hoffman <vince unsane co uk>
Reviewed by:	jhb
MFC after:	3 days
2009-09-09 19:40:54 +00:00
Xin LI
ee5e90dab2 - Teach vesa(4) and dpms(4) about x86emu. [1]
- Add vesa kernel options for amd64.
 - Connect libvgl library and splash kernel modules to amd64 build.
 - Connect manual page dpms(4) to amd64 build.
 - Remove old vesa/dpms files.

Submitted by:	paradox <ddkprog yahoo com> [1], swell k at gmail.com
		(with some minor tweaks)
2009-09-09 09:50:31 +00:00
Poul-Henning Kamp
a254d1f16d Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.
2009-09-08 20:45:40 +00:00
Konstantin Belousov
4dc50fc92d Add missing ';'. 2009-09-04 14:53:12 +00:00
Julian Elischer
d0ac01f1f6 whitespace commit
Submitted by:	bde@
2009-09-04 07:29:24 +00:00
Julian Elischer
02c41ee985 Bring i386 up to date with amd64 and others.
The macros for PCPU can be slightly simplified, which makes the
resulting tangle qa lot easier to understand when trying to read them.

MFC after:	4 weeks
2009-09-04 05:40:06 +00:00
Jung-uk Kim
c8e648e167 Fix confusing comments about default PAT entries. 2009-09-02 16:47:10 +00:00
Jung-uk Kim
c9e8817902 - Work around ACPI mode transition problem for recent NVIDIA 9400M chipset
based Intel Macs.  Since r189055, these platforms started freezing when
ACPI is being initialized for unknown reason.  For these platforms, we just
use the old PAT layout.  Note this change is not enough to boot fully on
these platforms because of other problems but it makes debugging possible.
Note MacBook5,2 may be affected as well but it was not added here because
of lack of hardware to test.
- Initialize PAT MSR fully instead of reading and modifying it for safety.

Reported by:	rpaulo, hps, Eygene Ryabinkin (rea-fbsd at codelabs dot ru)
Reviewed by:	jhb
2009-09-02 16:02:48 +00:00
John Baldwin
a01e019a26 Don't attempt to bind the current thread to the CPU an IRQ is bound to
when removing an interrupt handler from an IRQ during shutdown.  During
shutdown we are already bound to CPU 0 and this was triggering a panic.

MFC after:	3 days
2009-09-02 00:39:59 +00:00
Adrian Chadd
56af96debe Delete whitespace not in i386/pmap.c 2009-09-01 12:17:47 +00:00
Adrian Chadd
c0bb2b8058 Migrate to use cpuset_t. 2009-09-01 06:15:50 +00:00
Adrian Chadd
fa52c6106b Merge in the pat_works work from sys/i386/i386/pmap.c - primarily to reduce
diff size.
2009-09-01 05:15:45 +00:00
Adrian Chadd
f8d44dbb74 Fix broken build. 2009-09-01 03:44:25 +00:00
Adrian Chadd
82fe1cca67 Revert previous commit; that was left-over junk in the tree. 2009-08-31 23:35:59 +00:00
Adrian Chadd
0ad6375395 Shuffle pagezero() into the same location as in sys/i386/i386/pmap.c. 2009-08-31 23:30:39 +00:00
John Baldwin
8101afb656 Simplify pmap_change_attr() a bit:
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.

Discussed with:	alc
MFC after:	3 days
2009-08-31 18:41:13 +00:00
John Baldwin
75e66e421a Improve pmap_change_attr() so that it is able to demote a large (2/4MB)
page into 4KB pages as needed.  This should be fairly rare in practice
on i386.  This includes merging the following changes from the amd64 pmap:
180430, 180485, 180845, 181043, 181077, and 196318.
- Add basic support for changing attributes on PDEs to pmap_change_attr()
  similar to the support in the initial version of pmap_change_attr() on
  amd64 including inlines for pmap_pde_attr() and pmap_pte_attr().
- Extend pmap_demote_pde() to include the ability to instantiate a new page
  table page where none existed before.
- Enhance pmap_change_attr().  Use pmap_demote_pde() to demote a 2/4MB page
  mapping to 4KB page mappings when the specified attribute change only
  applies to a portion of the 2/4MB page.  Previously, in such cases,
  pmap_change_attr() gave up and returned an error.
- Correct a critical accounting error in pmap_demote_pde().

Reviewed by:	alc
MFC after:	3 days
2009-08-31 17:42:52 +00:00
Xin LI
f9d38c281c Partially revert 196524: this part of change should not be committed as
part of the changeset - it's an unrelated one.

Reported by:	danfe
2009-08-31 17:34:11 +00:00
Bjoern A. Zeeb
ecc2fda872 Make sure FreeBSD binaries without .note.ABI-tag section work
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statements.
For this mark the Debian GNU/kFreeBSD brandinfo that it must have
an .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.

Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
to force native FreeBSD to come first as well.

The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo  was matched first,  as the fallback to ld-elf32.so.1 does
not exist in that case.

Reported and tested by:	ticso
In collaboration with:	kib
MFC after:		3 days
2009-08-30 14:38:17 +00:00
Robert Noland
cbc3c1f687 Swap the start/end virtual addresses in pmap_invalidate_cache_range().
This fixes the functionality on non SelfSnoop hardware.

Found by:	rnoland
Submitted by:	alc
Reviewed by:	kib
MFC after:	3 days
2009-08-29 16:01:21 +00:00
Gleb Smirnoff
08cc4f3542 Fix build broken in r196524. 2009-08-25 14:08:33 +00:00
Xin LI
c5bebef869 Fix VESA modes and allow 8bit depth modes.
PR:		i386/124902
Submitted by:	paradox <ddkprog yahoo com>
MFC after:	2 months
2009-08-24 22:35:53 +00:00
Bjoern A. Zeeb
89ffc202d6 Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR:		kern/135468
Submitted by:	dchagin [1] (initial patch)
Suggested by:	kib [2]
Tested by:	Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by:	kib
MFC after:	3 days
2009-08-24 16:19:47 +00:00
Jung-uk Kim
66406b8f25 Check whether the SMBIOS reports reasonable amount of memory. If it is
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementation reported only half of actual memory.

Tested by:	bz
Approved by:	re (kib)
2009-08-20 22:58:05 +00:00
John Baldwin
a56fe095f0 Temporarily revert the new-bus locking for 8.0 release. It will be
reintroduced after HEAD is reopened for commits by re@.

Approved by:	re (kib), attilio
2009-08-20 19:17:53 +00:00
Ed Schouten
12f27c4e64 Make the MacBookPro3,1 hardware boot again.
Tested by:	Patrick Lamaiziere <patfbsd davenulle org>
Approved by:	re (kib)
2009-08-19 20:39:33 +00:00
Attilio Rao
68a5590836 Port recent IPI enhachements to en:
* Introduce the ipi_nmi_handler() function for the Xen infrastructure
* Fixup adeguately the ipi sender functions

Approved by:	re (kib)
2009-08-15 18:37:06 +00:00
John Baldwin
21157ad3b1 Adjust the handling of the local APIC PMC interrupt vector:
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
  routines in the local APIC code that the hwpmc(4) driver can use to
  manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
  HWPMC_HOOKS is enabled.  Instead, the hwpmc(4) driver explicitly
  enables the interrupt when it is succesfully initialized and disables
  the interrupt when it is unloaded.  This avoids enabling the interrupt
  on unsupported CPUs which may result in spurious NMIs.

Reported by:	rnoland
Reviewed by:	jkoshy
Approved by:	re (kib)
MFC after:	2 weeks
2009-08-14 21:05:08 +00:00
Attilio Rao
dc6fbf6545 * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
Attilio Rao
444b91868b Make the newbus subsystem Giant free by adding the new newbus sxlock.
The newbus lock is responsible for protecting newbus internIal structures,
device states and devclass flags. It is necessary to hold it when all
such datas are accessed. For the other operations, softc locking should
ensure enough protection to avoid races.

Newbus lock is automatically held when virtual operations on the device
and bus are invoked when loading the driver or when the suspend/resume
take place. For other 'spourious' operations trying to access/modify
the newbus topology, newbus lock needs to be automatically acquired and
dropped.

For the moment Giant is also acquired in some key point (modules subsystem)
in order to avoid problems before the 8.0 release as module handlers could
make assumptions about it. This Giant locking should go just after
the release happens.

Please keep in mind that the public interface can be expanded in order
to provide more support, if there are really necessities at some point
and also some bugs could arise as long as the patch needs a bit of
further testing.

Bump __FreeBSD_version in order to reflect the newbus lock introduction.

Reviewed by:    ed, hps, jhb, imp, mav, scottl
No answer by:   ariff, thompsa, yongari
Tested by:      pho,
                G. Trematerra <giovanni dot trematerra at gmail dot com>,
                Brandon Gooch <jamesbrandongooch at gmail dot com>
Sponsored by:   Yahoo! Incorporated
Approved by:	re (ksmith)
2009-08-02 14:28:40 +00:00
Ed Schouten
61fb73de41 Make the MacBook3,1 boot again.
Approved by:	re (kib)
2009-08-02 11:26:23 +00:00
Konstantin Belousov
ed190ad06a Fix XEN build breakage, by implementing pmap_invalidate_cache_range()
and using it when appropriate. Merge analogue of the r195836
optimization to XEN.

Approved by:	re (kensmith)
2009-07-29 19:38:33 +00:00
Konstantin Belousov
8a5ac5d56f As was done in r195820 for amd64, use clflush for flushing cache lines
when memory page caching attributes changed, and CPU does not support
self-snoop, but implemented clflush, for i386.

Take care of possible mappings of the page by sf buffer by utilizing
the mapping for clflush, otherwise map the page transiently. Amd64
used direct map.

Proposed and reviewed by:  alc
Approved by:   re (kensmith)
2009-07-29 08:49:58 +00:00
Rui Paulo
1f93ae9453 Refine the MacBook hack to only match early models that have Intel ICH.
Discussed with:	kjim
Approved by:	re (kib)
2009-07-27 13:51:55 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Alan Cox
3313145243 Eliminate unnecessary cache and TLB flushes by pmap_change_attr(). (This
optimization was implemented in the amd64 version roughly 1 year ago.)

Approved by:	re (kensmith)
2009-07-23 19:43:23 +00:00
Alan Cox
9861cbc6ca Change the handling of fictitious pages by pmap_page_set_memattr() on
amd64 and i386.  Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages.  Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache.  Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address.  For example, these actions are already performed by
pmap_mapdev().

The device pager needn't restore the memory attributes on a fictitious
page before releasing it.  It's now pointless.

Add pmap_page_set_memattr() to the Xen pmap.

Approved by:	re (kib)
2009-07-19 21:40:19 +00:00
Alan Cox
13de722155 An addendum to r195649, "Add support to the virtual memory system for
configuring machine-dependent memory attributes...":

Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc().  It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.

Eliminate pointless code from pmap_cache_bits() on amd64.

Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.

Approved by:	re (kib)
2009-07-18 01:50:05 +00:00
Jung-uk Kim
e8c4d3e407 Match PCI Express root bridge _HID directly instead of
relying on _CID.

Reviewed by:	jhb
Approved by:	re (kib)
2009-07-13 21:36:31 +00:00
Alan Cox
3153e878dd Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
Rui Paulo
59aa14a91d Implementation of the upcoming Wireless Mesh standard, 802.11s, on the
net80211 wireless stack. This work is based on the March 2009 D3.0 draft
standard. This standard is expected to become final next year.
This includes two main net80211 modules, ieee80211_mesh.c
which deals with peer link management, link metric calculation,
routing table control and mesh configuration and ieee80211_hwmp.c
which deals with the actually routing process on the mesh network.
HWMP is the mandatory routing protocol on by the mesh standard, but
others, such as RA-OLSR, can be implemented.

Authentication and encryption are not implemented.

There are several scripts under tools/tools/net80211/scripts that can be
used to test different mesh network topologies and they also teach you
how to setup a mesh vap (for the impatient: ifconfig wlan0 create
wlandev ... wlanmode mesh).

A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled
by default on GENERIC kernels for i386, amd64, sparc64 and pc98.

Drivers that support mesh networks right now are: ath, ral and mwl.

More information at: http://wiki.freebsd.org/WifiMesh

Please note that this work is experimental. Also, please note that
bridging a mesh vap with another network interface is not yet supported.

Many thanks to the FreeBSD Foundation for sponsoring this project and to
Sam Leffler for his support.
Also, I would like to thank Gateworks Corporation for sending me a
Cambria board which was used during the development of this project.

Reviewed by:	sam
Approved by:	re (kensmith)
Obtained from:	projects/mesh11s
2009-07-11 15:02:45 +00:00
Edward Tomasz Napierala
c38898116a There is an optimization in chmod(1), that makes it not to call chmod(2)
if the new file mode is the same as it was before; however, this
optimization must be disabled for filesystems that support NFSv4 ACLs.
Chmod uses pathconf(2) to determine whether this is the case - however,
pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't.

This change adds lpathconf(3) to make it possible to solve that problem
in a clean way.

Reviewed by:	rwatson (earlier version)
Approved by:	re (kib)
2009-07-08 15:23:18 +00:00
John Baldwin
0d0a6650d7 After the per-CPU IDT changes, the IDT vector of an interrupt could change
when the interrupt was moved from one CPU to another.  If the interrupt was
enabled, then the old IDT vector needs to be disabled and the new IDT vector
needs to be enabled.  This was mostly masked prior to the recent MSI changes
since in the older code almost all allocated IDT vectors were already enabled
and the enabled vectors on the BSP during boot covered enough of the IDT
range.  However, after the MSI changes, MSI interrupts that were allocated
but not enabled (e.g. DRM with MSI) during boot could result in an allocated
IDT vector that wasn't enabled.  The round-robin at the end of boot could
place another interrupt at the same IDT vector without enabling the IDT
vector causing trap 30 faults.

Fix this by explicitly disabling/enabling the old and new IDT vectors for
enabled interrupt sources when moving an interrupt between CPUs via the
pic_assign_cpu() method.  While here, fix a bug in my earlier changes so
that an I/O APIC interrupt pin is left unchanged if ioapic_assign_cpu()
fails to allocate a new IDT vector and returns ENOSPC.

Approved by:	re (kensmith)
2009-07-06 18:23:00 +00:00
Alan Cox
0e18ab26d0 PAE adds another level to the i386 page table. This level is a small
4-entry table that must be located within the first 4GB of RAM.  This
requirement is met by defining an UMA zone with a custom back-end
allocator function.  This revision makes two changes to this back-end
allocator function: (1) It replaces the use of contigmalloc() with the
use of kmem_alloc_contig().  This eliminates "double accounting", i.e.,
accounting by both the UMA zone and malloc tags.  (I made the same
change for the same reason to the zones supporting jumbo frames a week
ago.) (2) It passes through the "wait" parameter, i.e., M_WAITOK,
M_ZERO, etc. to kmem_alloc_contig() rather than ignoring it.
pmap_init() calls uma_zalloc() with both M_WAITOK and M_ZERO.  At the
moment, this is harmless only because the default behavior of
contigmalloc()/kmem_alloc_contig() is to wait and because pmap_init()
doesn't really depend on the memory being zeroed.

The back-end allocator function in the Xen pmap is dead code.  I am
changing it nonetheless because I don't want to leave any "bad examples"
in the source tree for someone to copy at a later date.

Approved by:	re (kib)
2009-07-05 21:40:21 +00:00
Sam Leffler
8c393fd1f0 Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by:	bde, imp, marcel
Approved by:	re (kensmith)
2009-07-05 17:45:48 +00:00
Ed Schouten
89fe4c0a2b Enable POSIX semaphores on all non-embedded architectures by default.
More applications (including Firefox) seem to depend on this nowadays,
so not having this enabled by default is a bad idea.

Proposed by:	miwi
Patch by:	Florian Smeets <flo kasimir com>
Approved by:	re (kib)
2009-07-02 18:24:37 +00:00
John Baldwin
cebc7fb16c Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
  to a specific CPU to return an error value instead of void, thus allowing
  it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
  destination CPU, fail the request with ENOSPC rather than panicing.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
  on the first interrupt in a group.  Moving the first interrupt in a group
  moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
  intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
  interrupts.  Previously, binding an interrupt to a CPU only performed a
  privilege check if the interrupt had an interrupt thread.  Interrupts
  without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
  cpuset mask for the associated interrupt thread.

Approved by:	re (kib)
2009-07-01 17:20:07 +00:00
Doug Rabson
b49a2b39fd Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.
Approved by: re
2009-06-30 19:03:27 +00:00
Robert Watson
14961ba789 Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type.  This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by:	brooks
Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 week
2009-06-27 13:58:44 +00:00
John Baldwin
7c020cbbe5 Return ENOSYS instead of EINVAL for invalid function codes to match the
behavior of Linux.

Reported by:	Alexander Best  alexbestms of math.uni-muenster.de
Approved by:	re (kib)
2009-06-26 19:39:33 +00:00
Alan Cox
5797795f5a Correct the #endif comment.
Noticed by:	jmallett
Approved by:	re (kib)
2009-06-26 16:22:24 +00:00
Alan Cox
e999111ae7 This change is the next step in implementing the cache control functionality
required by video card drivers.  Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures.  In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig().  These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with:	jhb
2009-06-26 04:47:43 +00:00
John Baldwin
4e9dba6322 Fix kernels compiled without SMP support. Make intr_next_cpu() available
for UP kernels but as a stub that always returns the single CPU's local
APIC ID.

Reported by:	kib
2009-06-25 20:35:46 +00:00
John Baldwin
b4805f449c - Restore the behavior of pre-allocating IDT vectors for MSI interrupts.
This is mostly important for the multiple MSI message case where the
  IDT vectors for the entire group need to be allocated together.  This
  also restores the assumptions made by the PCI bus code that it could
  invoke PCIB_MAP_MSI() once MSI vectors were allocated.
- To avoid whiplash with CPU assignments, change the way that CPUs are
  assigned to interrupt sources on activation.  Instead of assigning the
  CPU via pic_assign_cpu() before calling enable_intr(), allow the
  different interrupt source drivers to ask the MD interrupt code which
  CPU to use when they allocate an IDT vector.  I/O APIC interrupt pins
  do this in their pic_enable_intr() routines giving the same behavior as
  before.  MSI sources do it when the IDT vectors are allocated during
  msi_alloc() and msix_alloc().
- Change the intr_table_lock from an sx lock to a mutex.

Tested by:	rnoland
2009-06-25 18:13:46 +00:00
Robert Watson
d4b9a71c71 Fix ibcs2_ipc.c build by adding missing limits.h include.
Submitted by:	keramida
2009-06-25 07:25:39 +00:00
John Baldwin
b648d4806b Change the ABI of some of the structures used by the SYSV IPC API:
- The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned
  short.
- The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned
  short.
- The mode member of struct ipc_perm is now mode_t instead of unsigned short
  (this is merely a style bug).
- The rather dubious padding fields for ABI compat with SV/I386 have been
  removed from struct msqid_ds and struct semid_ds.
- The shm_segsz member of struct shmid_ds is now a size_t instead of an
  int.  This removes the need for the shm_bsegsz member in struct
  shmid_kernel and should allow for complete support of SYSV SHM regions
  >= 2GB.
- The shm_nattch member of struct shmid_ds is now an int instead of a
  short.
- The shm_internal member of struct shmid_ds is now gone.  The internal
  VM object pointer for SHM regions has been moved into struct
  shmid_kernel.
- The existing __semctl(), msgctl(), and shmctl() system call entries are
  now marked COMPAT7 and new versions of those system calls which support
  the new ABI are now present.
- The new system calls are assigned to the FBSD-1.1 version in libc.  The
  FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls.
- A simplistic framework for tagging system calls with compatibility
  symbol versions has been added to libc.  Version tags are added to
  system calls by adding an appropriate __sym_compat() entry to
  src/lib/libc/incldue/compat.h. [1]

PR:		kern/16195 kern/113218 bin/129855
Reviewed by:	arch@, rwatson
Discussed with:	kan, kib [1]
2009-06-24 21:10:52 +00:00
John Baldwin
7af55bd450 Whitespace fix. 2009-06-24 19:16:48 +00:00
Alexander Motin
9f23a6caa4 Make algorithm a bit more bulletproof. 2009-06-23 23:16:37 +00:00
Jeff Roberson
50c202c592 Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
   DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
   PCPU_*.  Requires only one extra instruction more than PCPU_* and is
   virtually the same as __thread for builtin and much faster for shared
   objects.  DPCPU variables can be initialized when defined.
 - Modules are supported by relocating the module's per-cpu linker set
   over space reserved in the kernel.  Modules may fail to load if there
   is insufficient space available.
 - Track space available for modules with a one-off extent allocator.
   Free may block for memory to allocate space for an extent.

Reviewed by:    jhb, rwatson, kan, sam, grehan, marius, marcel, stas
2009-06-23 22:42:39 +00:00
Alexander Motin
96c5d068d8 Rework r193814:
While general idea of patch was good, it was not working properly due the way
it was implemented. When we are using same timer interrupt for several of
hard/prof/stat purposes we should not send several IPIs same time to other
CPUs. Sending several IPIs same time leads to terrible accounting/profiling
results due to strong synchronization effect, when the second interrupt
handler accounts processing of the first one.
Interlink timer events in a such way, that no more then one IPI is sent for
any original timer interrupt.
2009-06-23 21:45:33 +00:00
Rui Paulo
df849145b5 * Driver for ACPI WMI (Windows Management Instrumentation)
* Driver for ACPI HP extra functionations, which required
  ACPI WMI driver.

Submitted by:	Michael <freebsdusb at bindone.de>
Approved by:	re
MFC after:	2 weeks
2009-06-23 13:17:25 +00:00
Alan Cox
0f6766f3da Eliminate dead code. These definitions should have been deleted with the
introduction of i686_mem.c in r45405.

Merge adjacent #ifdef _KERNEL/#endif blocks.
2009-06-22 04:21:02 +00:00
Brooks Davis
87ecedf4f8 Use NGROUPS instead of NGROUPS_MAX as the limits on setgroups and
getgroups for ibcs emulation.  It seems vanishingly likely any
programs will actually be affected since they probably assume a much
lower value and use a static array size.
2009-06-20 18:52:02 +00:00
Brooks Davis
838d985825 Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively.  (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer.  Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively.  Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary.  In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups.  When feasible, truncate
the group list rather than generating an error.

Minor changes:
  - Reduce the number of hand rolled versions of groupmember().
  - Do not assign to both cr_gid and cr_groups[0].
  - Modify ipfw to cache ucreds instead of part of their contents since
    they are immutable once referenced by more than one entity.

Submitted by:	Isilon Systems (initial implementation)
X-MFC after:	never
PR:		bin/113398 kern/133867
2009-06-19 17:10:35 +00:00
John Baldwin
7228812fd2 Regen for added flags field. 2009-06-17 19:53:20 +00:00
John Baldwin
38a9df71f9 Move (read|write)_cyrix_reg() inlines from specialreg.h to cpufunc.h.
specialreg.h now consists solely of register-related macros.
2009-06-16 15:13:18 +00:00
Alexander Motin
bb74c2db4d Forbid multi-vector MSI interrupt vectors migration to another CPU once
allocated. MSI have strict vectors allocation requirements, which are not
satisfied now during reallocation. This is not the best possible solution,
but better then just broken, as it was.

No objections: current@, arch@, jhb@
2009-06-15 13:47:49 +00:00
Alan Cox
387aabc513 Long, long ago in r27464 special case code for mapping device-backed
memory with 4MB pages was added to pmap_object_init_pt().  This code
assumes that the pages of a OBJT_DEVICE object are always physically
contiguous.  Unfortunately, this is not always the case.  For example,
jhb@ informs me that the recently introduced /dev/ksyms driver creates
a OBJT_DEVICE object that violates this assumption.  Thus, this
revision modifies pmap_object_init_pt() to abort the mapping if the
OBJT_DEVICE object's pages are not physically contiguous.  This
revision also changes some inconsistent if not buggy behavior.  For
example, the i386 version aborts if the first 4MB virtual page that
would be mapped is already valid.  However, it incorrectly replaces
any subsequent 4MB virtual page mappings that it encounters,
potentially leaking a page table page.  The amd64 version has a bug of
my own creation.  It potentially busies the wrong page and always an
insufficent number of pages if it blocks allocating a page table page.

To my knowledge, there have been no reports of these bugs, hence,
their persistance.  I suspect that the existing restrictions that
pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would
choose to map, for example, that the first page must be aligned on a 2
or 4MB physical boundary and that the size of the mapping must be a
multiple of the large page size, were enough to avoid triggering the
bug for drivers like ksyms.  However, one side effect of testing the
OBJT_DEVICE object's pages for physical contiguity is that a dubious
difference between pmap_object_init_pt() and the standard path for
mapping devices pages, i.e., vm_fault(), has been eliminated.
Previously, pmap_object_init_pt() would only instantiate the first
PG_FICTITOUS page being mapped because it never examined the rest.
Now, however, pmap_object_init_pt() uses the new function
vm_object_populate() to instantiate them all (in order to support
testing their physical contiguity).  These pages need to be
instantiated for the mechanism that I have prototyped for
automatically maintaining the consistency of the PAT settings across
multiple mappings, particularly, amd64's direct mapping, to work.
(Translation: This change is also being made to support jhb@'s work on
the Nvidia feature requests.)

Discussed with:	jhb@
2009-06-14 19:51:43 +00:00
Ed Schouten
c2919fd8ff Enable PRINTF_BUFR_SIZE on i386 and amd64 by default.
In the past there have been some reports of PRINTF_BUFR_SIZE not
functioning correctly. Instead of having garbled console messages, we
should just see whether the issues are still there and analyze them.

Approved by:	re
2009-06-14 18:01:35 +00:00
Ed Schouten
2b7ceeb0b3 Clobber "cc" instead of using volatile.
Submitted by:	Christoph Mallon
2009-06-13 14:30:08 +00:00
Ed Schouten
4ec1748cfe Clobber "cc" instead of using volatile; remove obsolete register keyword.
Submitted by:	Christoph Mallon
2009-06-13 14:00:10 +00:00
Ed Schouten
3c7553cc20 Simplify the inline assembler (and correct potential error) of pte_load_store().
Submitted by:	Christoph Mallon
2009-06-13 13:56:06 +00:00
Andriy Gapon
ebb6aed490 strict kobj signatures: fix legacy i386 pcib_write_config impl
Reviewed by:	imp, current@
Approved by:	jhb (mentor)
2009-06-11 17:06:31 +00:00
Konstantin Belousov
d8b0556c6d Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by:	jhb [1]
Noted by:	jhb [2]
Reviewed by:	jhb
Tested by:	rnoland
2009-06-10 20:59:32 +00:00
Pyun YongHyeon
d68875eb7e Add alc(4), a driver for Atheros AR8131/AR8132 PCIe ethernet
controller. These controllers are also known as L1C(AR8131) and
L2C(AR8132) respectively. These controllers resembles the first
generation controller L1 but usage of different descriptor format
and new register mappings over L1 register space requires a new
driver. There are a couple of registers I still don't understand
but the driver seems to have no critical issues for performance and
stability. Currently alc(4) supports the following hardware
features.
  o MSI
  o TCP Segmentation offload
  o Hardware VLAN tag insertion/stripping
  o Tx/Rx interrupt moderation
  o Hardware statistics counters(dev.alc.%d.stats)
  o Jumbo frame
  o WOL
AR8131/AR8132 also supports Tx checksum offloading but I disabled
it due to stability issues. I'm not sure this comes from broken
sample boards or hardware bugs. If you know your controller works
without problems you can still enable it. The controller has a
silicon bug for Rx checksum offloading, so the feature was not
implemented.
I'd like to say big thanks to Atheros. Atheros kindly sent sample
boards to me and answered several questions I had.

HW donated by:	Atheros Communications, Inc.
2009-06-10 02:07:58 +00:00
Kip Macy
98fda6ac58 opt in to flowtable on i386/amd64 2009-06-09 21:58:14 +00:00
Kip Macy
c804d618eb remove flowtable from DEFAULTS 2009-06-09 20:26:52 +00:00
Ariff Abdullah
b65cb1db3c When using i8254 as the only kernel timer source:
- Interpolate stat/prof clock using clkintr() in a similar fashion to
  local APIC timer, since statclock usually run slower.

- Liberate hardclockintr() from taking the burden of handling both stat
  and prof clock interrupt. Instead, send IPIs within clkintr() to handle
  those.
2009-06-09 07:26:52 +00:00
Ariff Abdullah
867cdecd40 Move C1E workaround into its own idle function. Previous workaround works
only during initial booting process, while there are laptops/BIOSes that
tend to act 'smarter' by force enabling C1E if the main power adapter
being pulled out, rendering previous workaround ineffective. Given the
fact that we still rely on local APIC to drive timer interrupt, this
workaround should keep all Turion (probably Phenom too) X\d+ alive whether
its on battery power or not.

URL:		http://lists.freebsd.org/pipermail/freebsd-acpi/2008-April/004858.html
    		http://lists.freebsd.org/pipermail/freebsd-acpi/2008-May/004888.html

Tested by:	Peter Jeremy <peterjeremy at optushome d com d au>
2009-06-09 04:17:36 +00:00
Xin LI
38676b52dd Add line width calculations for 15/16 and 24/32 bit modes in case
the "Get Scan Line Length" function fails, as it does in Parallels
(in Version 2.2, Build 2112 at least).

PR:		i386/127367
Obtained from:	DragonFly
Submitted by:	Pedro Giffuni
MFC after:	1 month
2009-06-09 00:54:57 +00:00
Jung-uk Kim
230bb4d90d Rewrite OsdSynch.c to reflect the latest ACPICA more closely:
- Implement ACPI semaphore (ACPI_SEMAPHORE) with condvar(9) and mutex(9).
- Implement ACPI mutex (ACPI_MUTEX) with mutex(9).
- Implement ACPI lock (ACPI_SPINLOCK) with spin mutex(9).
2009-06-08 20:07:16 +00:00
Ed Schouten
5942207fb4 Revert my change; reintroduce __gnu89_inline.
It turns out our compiler in stable/7 can't build this code anymore.
Even though my opinion is that those people should just run `make
kernel-toolchain' before building a kernel, I am willing to wait and
commit this after we've branched stable/8.

Requested by:	rwatson
2009-06-08 18:23:43 +00:00
Ed Schouten
032e3d1d19 Remove __gnu89_inline.
Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by:	alc
2009-06-08 17:27:25 +00:00
Adrian Chadd
385432acf8 Decouple the i386 native and i386 Xen APIC definitions a little further.
I'm experimenting locally with xen APIC emulation a bit and this
makes it easier to migrate APIC entries between being bitmapped and
not being bitmapped.
2009-06-07 22:52:48 +00:00
Jung-uk Kim
129d3046ef Import ACPICA 20090521. 2009-06-05 18:44:36 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Robert Watson
bd875f5f13 Remove MAC kernel config files and add "options MAC" to GENERIC, with the
goal of shipping 8.0 with MAC support in the default kernel.  No policies
will be compiled in or enabled by default, but it will now be possible to
load them at boot or runtime without a kernel recompile.

While the framework is not believed to impose measurable overhead when no
policies are loaded (a result of optimization over the past few months in
HEAD), we'll continue to benchmark and optimize as the release approaches.
Please keep an eye out for performance or functionality regressions that
could be a result of this change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2009-06-02 18:31:08 +00:00
Dmitry Chagin
f8cd0af232 Implement accept4 syscall.
Approved by:	kib (mentor)
MFC after:	1 month
2009-06-01 20:48:39 +00:00
Robert Watson
33dd50646e Regenerate generated syscall files following changes to struct sysent in
r193234.
2009-06-01 16:14:38 +00:00
Adrian Chadd
c22ca7f04f Fix the MP IPI code to differentiate between bitmapped IPIs and function IPIs.
This attempts to fix the IPI handling code to correctly differentiate
between bitmapped IPIs and function IPIs. The Xen IPIs were on low numbers
which clashed with the bitmapped IPIs.

This commit bumps those IPI numbers up to 240 and above (just like in the i386
code) and fiddles with the ipi_vectors[] logic to call the correct function.

This still isn't "right". Specifically, the IPI code may work fine for TLB
shootdown events but the rendezvous/lazypmap IPIs are thrown by calling ipi_*()
routines which don't set the call_func stuff (function id, addr1, addr2) that
the TLB shootdown events are. So the Xen SMP support is still broken.

PR:		135069
2009-05-31 08:11:39 +00:00
Adrian Chadd
23ca223bfa Remove some unused code in ipi_selected() .
The code path this was copied from (sys/i386/i386/mp_machdep.c:ipi_selected())
handles bitmap'ed IPIs and normal IPIs via separate notification paths. Xen
SMP handles them the same way.
2009-05-31 07:25:24 +00:00
Adrian Chadd
eee45687e9 Even though I'm not quite sure that the call_func stuff will work properly
in all the places/cases IPI messages will be generated, at least be consistent
with how the call_data pointer is assigned and cleared (ie, all done inside
the spinlock.

Ensure that its NULL before continuing, just to try and identify situations
where things are going horribly wrong.
2009-05-30 15:20:25 +00:00
Adrian Chadd
d68bbb81fb Don't schedule a CALL_FUNCTION_VECTOR software IPI if the IPI was signaled
via the bitmap (and thus sent via RESCHEDULE_VECTOR.)
2009-05-30 14:59:08 +00:00
Adrian Chadd
a235aceb1a Correctly report the IPI IRQs being created; make it clear what vectors they are for. 2009-05-30 06:37:03 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Adrian Chadd
f3ba9cc983 Revert to 2-clause. 2009-05-29 13:48:42 +00:00
Adrian Chadd
fd27593538 Fix the Xen TOD update when the hypervisor wall clock is nudged.
The "wall clock" in the current code is actually the hypervisor start time.
The time of day is the "start time" plus the hypervisor "uptime".

Large enough bumps in the dom0 clock lead to a hypervisor "bump" which is
implemented as a bump in the start time, not the uptime. The clock.c routines
were reading in the hypervisor start time and then using this as the TOD.
This meant that any hypervisor time bump would cause the FreeBSD DomU to
set its TOD to the hypervisor start time, rather than the actual TOD.

This fix is a bit hacky and some reshuffling should be done later on
to clarify what is going on. I've left the wall clock code alone.
(The code which updates shadow_tv and shadow_tv_version.)
A new routine adds the uptime to the shadow_tv, which is then used to
update the TOD.

I've included some debugging so it is obvious when the clock is nudged.

PR:	135008
2009-05-29 13:43:21 +00:00
Adrian Chadd
7d18ff9a2b Migrate the Xen hypervisor clock reading routines into something
sharable.
2009-05-29 13:36:06 +00:00
Adrian Chadd
58e3a119e5 Say hello to a very basic, read-only, Xen Hypervisor RTC.
The hypervisor doesn't provide a single "TOD" - it instead provides a
"start time" and a "running time". These are added together to form
the current TOD. The TOD is in UTC.

This RTC is only (initially) designed to be read at startup. There's
some further poking that needs to happen to pick up hypervisor time
changes (ie, by the Dom0 time being adjusted by something). This
time adjustment currently can cause "weird stuff" in the DomU clock;
I'll begin investigating and repairing that in subsequent commits.

PR:		135008
2009-05-28 04:17:05 +00:00
Warner Losh
1de9b53249 We don't need d_thread_t for cross-branch portability here anymore.
Move do struct thread * instead.
2009-05-20 16:47:40 +00:00
Warner Losh
09605c1806 Some minor style changes:
o Convert K&R function definitions to ANSI
o Eliminate spaces/tabs that should have been deleted as part of the de__P
  efforts
o Use struct thread * in preference to d_thread_t *.
2009-05-20 16:29:22 +00:00
John Baldwin
515c5b1ede Don't bother reading the initial value of the machine check banks during
startup on Pentium 4 CPUs.  This wasn't safe to do on APs during AP startup,
was of limited value, and won't be used for future processors.
2009-05-20 16:11:22 +00:00
John Baldwin
dfc77ef51f - Add a tunable 'hw.mca.enabled' that can be used to enable/disable the
machine check code.  Disable it by default for now.
- When computing the mask of bits that determines a non-restartable event
  during a machine check exception, or-in the overflow flag rather than
  replacing the other flags.

PR:		i386/134586 [2]
Submitted by:	Andi Kleen  andi-fbsd firstfloor.org
2009-05-18 21:50:06 +00:00
John Baldwin
d3da228f37 Add a read-only sysctl hw.pci.mcfg to mirror the tunable by the same name.
MFC after:	1 week
2009-05-18 21:47:32 +00:00
John Baldwin
8aba835b8e Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend
using 128 byte alignment for locks.  (See IA-32 SDM Vol 3A 7.11.6.7)
2009-05-18 19:33:59 +00:00
Marcel Moolenaar
dbb95048da Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
2009-05-18 18:37:18 +00:00
Dmitry Chagin
3933bde22e Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and
SOCK_NONBLOCK flags, that allow to save fcntl() calls.

Implement a variation of the socket() syscall which takes a flags
in addition to the type argument.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-16 18:48:41 +00:00
John Baldwin
76dae09449 Trim the default set of device hints on i386 and amd64:
- Remove vga0 and the disabled uart2/uart3 hints from both platforms.
- Remove hints for ISA adv0, bt0, aha0, aic0, ed0, cs0, sn0, ie0, fe0, and
  le0 from i386.  All these hints were marked 'disabled' and thus already
  did not work "out of the box".

Discussed with:	imp
2009-05-14 21:53:35 +00:00
Attilio Rao
120b18d86f FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now).  Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by:	jhb, kmacy
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2009-05-14 17:43:00 +00:00
John Baldwin
9dc0b3d54f Implement simple machine check support for amd64 and i386.
- For CPUs that only support MCE (the machine check exception) but not MCA
  (i.e. Pentium), all this does is print out the value of the machine check
  registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
  a bit more involved.
  - First, there is limited support for decoding the CPU-independent MCA
    error codes in the kernel, and the kernel uses this to output a short
    description of any machine check events that occur.
  - When a machine check exception occurs, all of the MCx banks on the
    current CPU are scanned and any events are reported to the console
    before panic'ing.
  - To catch events for correctable errors, a periodic timer kicks off a
    task which scans the MCx banks on all CPUs.  The frequency of these
    checks is controlled via the "hw.mca.interval" sysctl.
  - Userland can request an immediate scan of the MCx banks by writing
    a non-zero value to "hw.mca.force_scan".
  - If any correctable events are encountered, the appropriate details
    are stored in a 'struct mca_record' (defined in <machine/mca.h>).
    The "hw.mca.count" is a count of such records and each record may
    be queried via the "hw.mca.records" tree by specifying the record
    index (0 .. count - 1) as the next name in the MIB similar to using
    PIDs with the kern.proc.* sysctls.  The idea is to export machine
    check events to userland for more detailed processing.
  - The periodic timer and hw.mca sysctls are only present if the CPU
    supports MCA.

Discussed with:	emaste (briefly)
MFC after:	1 month
2009-05-13 17:53:04 +00:00
Alan Cox
07a7b85e94 Correct a rare use-after-free error in pmap_copy(). This error was
introduced in amd64 revision 1.540 and i386 revision 1.547.  However, it
had no harmful effects until after a recent change, r189698, on amd64.
(In other words, the error is harmless in RELENG_7.)

The error is triggered by the failure to allocate a pv entry for the one
and only mapping in a page table page.  I am addressing the error by
changing pmap_copy() to abort if either pv entry allocation or page
table page allocation fails.  This is appropriate because the creation of
mappings by pmap_copy() is optional.  They are a (possible) optimization,
and not a requirement.

Correct a nearby whitespace error in the i386 pmap_copy().

Crash reported by: jeff@
MFC after:	6 weeks
2009-05-13 07:42:53 +00:00
Christian Brueffer
d1e015bfee Remove unused variables.
Found with:	Coverity Prevent(tm)
CID:		4285, 4286
2009-05-12 22:11:02 +00:00
Dmitry Chagin
8d30f381ef Do not export AT_CLKTCK when emulating Linux kernel prior
to 2.4.0, as it has appeared in the 2.4.0-rc7 first time.
Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK),
glibc falls back to the hard-coded CLK_TCK value when aux entry
is not present.

Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.

For older applications/libc's which depends on hard-coded CLK_TCK
value user should set compat.linux.osrelease less than 2.4.0.

Approved by:	kib (mentor)
2009-05-10 18:43:43 +00:00
Dmitry Chagin
1ca16454b3 Rework r189362, r191883.
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.

Pointed out by:	bde

Approved by:	kib (mentor)
2009-05-10 18:16:07 +00:00
Jun Kuriyama
b3b17597ea - Use "device\t" and "options \t" for consistency. 2009-05-10 00:00:25 +00:00
Ed Schouten
e5a34e5582 Regenerate system call tables to use SVN ids. 2009-05-08 20:16:04 +00:00
Ed Schouten
a1c45f054a Regenerate ibcs2 system call table. 2009-05-08 20:08:43 +00:00
Ed Schouten
51e6ba0f00 Burn TTY ioctl bridges in compat layers.
I really don't want any pieces of code to include ioctl_compat.h, so let
the ibcs2 and svr4 compat leave sgtty alone. If they want to support
sgtty, they should emulate it on top of termios, not sgtty.

The code has been marked with BURN_BRIDGES for a long time. ibcs2 and
svr4 are not really popular pieces of code anyway.
2009-05-08 20:06:37 +00:00
Marko Zec
29b02909eb Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg.  Struct vprocg is likely to become replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-05-08 14:11:06 +00:00
Jamie Gritton
7ae27ff49f Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions.  This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-07 18:36:47 +00:00
Dmitry Chagin
13f20d7e86 To avoid excessive code duplication move MI definitions to the MI
header file. As it is defined in Linux.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-07 09:39:20 +00:00
Alexander Motin
614dd4f83c Do not try to initialize LAPIC timer if we are not going to use it.
It solves assertion, when kernel built with INVARIANTS configured
to use i8254 timer.
2009-05-05 01:13:20 +00:00
Jung-uk Kim
4ef853cc7f Unlock the largest standard CPUID on Intel CPUs for both amd64 and i386 and
fix SMP topology detection.  On i386, we extend it to cover Core, Core 2,
and Core i7 processors, not just Pentium 4 family, and move it to better
place.  On amd64, all supported Intel CPUs should have this MSR.
2009-05-04 18:05:27 +00:00
Alexander Motin
64886da216 Oops, sorry. Fix for fix. 2009-05-04 08:41:54 +00:00
Alexander Motin
b5f9da0f8d There is no atrtc driver in pc98, so hide atrtcclock_disable variable usage
in APM driver for this platform. This should fix pc98 build.
2009-05-04 08:36:47 +00:00
Alexander Motin
1703f2b424 Rename statclock_disable variable to atrtcclock_disable that it actually is,
and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock
controlling it. Setting it to 0 disables using RTC clock as stat-/
profclock sources.

Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254
hardclock, when LAPIC and RTC clocks are disabled.

This allows to reduce global interrupt rate of idle system down to about
100 interrupts per core, permitting C3 and deeper C-states provide maximum
CPU power efficiency.
2009-05-03 17:47:21 +00:00
Kip Macy
51a70df2db fix XEN compilation 2009-05-02 22:22:00 +00:00
Alexander Motin
a40d9024df Add support for using i8254 and rtc timers as event sources for i386 SMP
system. Redistribute hard-/stat-/profclock events to other CPUs using IPI.
2009-05-02 12:59:47 +00:00
Dmitry Chagin
d789bfd562 Move extern variable definitions to the header file.
Approved by:	kib (mentor)
MFC after:	1 month
2009-05-02 10:06:49 +00:00
Alexander Motin
2f369c9496 Small addition to r191720.
Restore previous behaviour for the case of unknown interrupt. Invocation
of IRQ -1 crashes my system on resume. Returning 0, as it was, is not
perfect also, but at least not so dangerous.
2009-05-01 20:53:37 +00:00
Sam Leffler
dcad868984 o add uath
o sort usb wireless drivers
2009-05-01 17:20:16 +00:00
Alexander Motin
1ecff35a6b Use value -1 instead of 0 for marking unused APIC vectors. This fixes
IRQ0 routing on LAPIC-enabled systems.

Add hint.apic.0.clock tunable. Setting it 0 disables using LAPIC timers
as hard-/stat-/profclock sources falling back to using i8254 and rtc timers.

On modern CPUs LAPIC is a part of CPU core which is shutting down when CPU
enters C3 or deeper power state. It makes no problems for interrupt
processing, as chipset wakes up CPU on interrupt triggering. But entering
C3 state kills LAPIC timer and freezes system time, making C3 and deeper
states practically unusable. Using i8254 timer allows to avoid this
problem.

By using i8254 timer my T7700 C2D CPU with UP kernel successfully enters
C3 state, saving more then a Watt of total idle power (>10%) in addition to
all other power-saving techniques.

This technique is not working for SMP yet, as only one CPU receives
timer interrupts. But I think that problem could be fixed by forwarding
interrupts to other CPUs with IPI.
2009-05-01 17:05:49 +00:00
Dmitry Chagin
79262bf1f0 Reimplement futexes.
Old implemention used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), that could cause the
calling thread to sleep and lost Giant protection. User-visible
result was the missed wakeup.

New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.

Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation
is requested and either caller specified futexes are equial or
second futex already exists. This is acceptable since the situation
can only occur from the application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-01 15:36:02 +00:00
Jung-uk Kim
788399cbd9 - Fix divide-by-zero panic when SMP kernel is used on UP system[1].
- Avoid possible divide-by-zero panic on SMP system when the CPUID is
disabled, unsupported, or buggy.

Submitted by:	pluknet (pluknet at gmail dot com)[1]
2009-04-30 22:10:04 +00:00
Jeff Roberson
82fcb0f192 - Add support for cpuid leaf 0xb. This allows us to determine the
topology of nehalem/corei7 based systems.
 - Remove the cpu_cores/cpu_logical detection from identcpu.
 - Describe the layout of the system in cpu_mp_announce().

Sponsored by:   Nokia
2009-04-29 06:54:40 +00:00
John Baldwin
10395e0714 Reduce the number of bounce zones (and thus the number of bounce pages
used in some cases):
- Ignore DMA tag boundaries when allocating bounce pages.  The boundaries
  don't determine whether or not parts of a DMA request bounce.  Instead,
  they are just used to carve up segments.
- Allow tags with sub-page alignment to share bounce pages since bounce
  pages are always page aligned.

Reviewed by:	scottl (amd64)
MFC after:	1 month
2009-04-23 20:24:19 +00:00
John Baldwin
125f11d360 Adjust the way we number CPUs on x86 so that we attempt to "group" all
logical CPUs in a package.  We do this by numbering the non-boot CPUs
by starting with the first CPU whose APIC ID is after the boot CPU and
wrapping back around to APIC ID 0 if needed rather than always starting
at APIC ID 0.  While here, adjust the cpu_mp_announce() routine to list
CPUs based on the mapping established by assign_cpu_ids() rather than
making assumptions about the algorithm assign_cpu_ids() uses.

MFC after:	1 month
2009-04-22 21:40:37 +00:00
Robert Watson
9725389e1e Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by:	bde [1], jhb [2]
MFC after:	2 weeks
2009-04-20 12:59:23 +00:00
Robert Watson
22037b2d2c Add description and cautionary note regarding CACHE_LINE_SIZE.
MFC after:	2 weeks
Suggested by:	alc
2009-04-19 21:26:36 +00:00
Robert Watson
a93fa8f2bb For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant.  These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).

MFC after:	2 weeks
Discussed on:   arch@
2009-04-19 20:19:13 +00:00
Kip Macy
34b07340ff - Import infrastructure for caching flows as a means of accelerating L3 and L2 lookups
as well as providing stateful load balancing when used with RADIX_MPATH.
- Currently compiled in to i386 and amd64 but disabled by default, it can be enabled at
  runtime with 'sysctl net.inet.flowtable.enable=1'.

- Embedded users can remove it entirely from the kernel by adding 'nooption FLOWTABLE' to
  their kernel config files.

- A minimal hookup will be added to ip_output in a subsequent commit. I would like to see
  more review before bringing in changes that require more churn.

Supported by: Bitgravity Inc.
2009-04-19 00:16:04 +00:00
John Baldwin
842f11bef6 Restore bus DMA bounce pages to an offset of 0 when they are released by
a tag that has BUS_DMA_KEEP_PG_OFFSET set.  Otherwise the page could be
reused with a non-zero offset by a tag that doesn't have
BUS_DMA_KEEP_PG_OFFSET leading to data corruption.

Sleuthing by:	avg
Reviewed by:	scottl
2009-04-17 13:22:18 +00:00
Marcel Moolenaar
6ad9a99f21 Add a compat option to the EBR scheme that controls the
naming of the partitions (GEOM_PART_EBR_COMPAT).  When
compatibility is enabled, changes to the partitioning are
disallowed.

Remove the device name aliasing added previously to provide
backward compatibility, but which in practice doesn't give
us anything.

Enable compatibility on amd64 and i386.
2009-04-15 22:38:22 +00:00
Jung-uk Kim
cebe9dc98a A simple rewrite of biossmap.c:
- Do not iterate int 15h, function e820h twice.  Instead, we use STAILQ to
store each return buffer and copy all at once.
- Export optional extended attributes defined in ACPI 3.0 as separate
metadata.  Currently, there are only two bits defined in the specification.
For example, if the descriptor has extended attributes and it is not
enabled, it has to be ignored by OS.  We may implement it in the kernel
later if it is necessary and proven correct in reality.
- Check return buffer size strictly as suggested in ACPI 3.0.

Reviewed by:	jhb
2009-04-15 17:31:22 +00:00
Konstantin Belousov
3feb57a0a8 The bus_dmamap_load_uio(9) shall use pmap of the thread recorded in the
uio_td to extract pages from, instead of unconditionally use kernel
pmap.

Submitted by:	Jason Harmening <jason.harmening gmail com> (amd64 version)
PR:	amd64/133592
Reviewed by:	scottl (original patch), jhb
MFC after:	2 weeks
2009-04-13 19:20:32 +00:00
Ed Schouten
e1048f7678 Simplify in/out functions (for i386 and AMD64).
Remove a hack to generate more efficient code for port numbers below
0x100, which has been obsolete for at least ten years, because GCC has
an asm constraint to specify that.

Submitted by:	Christoph Mallon <christoph mallon gmx de>
2009-04-11 14:01:01 +00:00
Ed Schouten
2c97d32a81 Also remove the unused __word_swap_int*() macros.
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-08 19:10:20 +00:00
Ed Schouten
17cfde3df4 Implement __bswap16() without using inline assembly.
Most compilers nowadays (including GCC) are smart enough to know what's
going on and generate more efficient code anyway.

Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-08 19:06:47 +00:00
Dmitry Chagin
cd899aad76 Fix KBI breakage by r190520 which affects older linux.ko binaries:
1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
   is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
   modules won't have the flag set, so the new field brand_note would be
   ignored.

Suggested by:	jhb
Reviewed by:	jhb
Approved by:	kib (mentor)
MFC after:	6 days
2009-04-05 09:27:19 +00:00
Alan Cox
beb3c3a9c5 Retire VM_PROT_READ_IS_EXEC. It was intended to be a micro-optimization,
but I see no benefit from it today.

VM_PROT_READ_IS_EXEC was only intended for use on processors that do not
distinguish between read and execute permission.  On an mmap(2) or
mprotect(2), it automatically added execute permission if the caller
specified permissions included read permission.  The hope was that this
would reduce the number of vm map entries needed to implement an address
space because there would be fewer neighboring vm map entries that differed
only in the presence or absence of VM_PROT_EXECUTE.  (See vm/vm_mmap.c
revision 1.56.)

Today, I don't see any real applications that benefit from
VM_PROT_READ_IS_EXEC.  In any case, vm map entries are now organized
as a self-adjusting binary search tree instead of an ordered list.  So,
the need for coalescing vm map entries is not as great as it once was.
2009-04-04 23:12:14 +00:00
Doug Rabson
3e33218d77 Fix the Xen build for i386 PV mode. 2009-04-01 17:06:28 +00:00
Konstantin Belousov
7496ce7d74 Sync definitions for struct sigcontext for i386 and amd64 architectures
to struct mcontext.
2009-04-01 13:44:28 +00:00
Konstantin Belousov
6ce76ba376 Fill the fsbase and gsbase fields of the mcontext structure on i386.
In collaboration with:	pho
Discussed with:	peter
Reviewed by:	jhb
2009-04-01 12:46:05 +00:00
Konstantin Belousov
0cdf4ffabc Add all segment registers for the amd64 CPU to struct reg and mcontext.
To keep these structures ABI-compatible, half the size of r_trapno,
r_err, mc_trapno, mc_flags.

Add fsbase and gsbase to mcontext on both amd64 and i386.
Add flags to amd64 mcontext to indicate that it contains valid segments
or bases.

In collaboration with:	pho
Discussed with:	peter
Reviewed by:	jhb
2009-04-01 12:44:17 +00:00
Jung-uk Kim
4e4ce82e0a Fix an uninitialized variable from the previous commit. 2009-03-31 21:14:05 +00:00
Jung-uk Kim
938608cb45 Probe size of installed memory modules from loader and display it
as 'real memory' instead of Maxmem if the value is available.
Note amd64 displayed physmem as 'usable memory' since machdep.c r1.640
to unconfuse users.  Now it is consistent across amd64 and i386 again.
While I am here, clean up smbios.c a bit and update copyright date.

Reviewed by:	jhb
2009-03-31 21:02:55 +00:00
Michael Reifenberger
b1a0a22d44 Extend comment in copyright notice as requested by author.
Submitted by:	G.Otsuji
2009-03-29 13:35:20 +00:00
Michael Reifenberger
24cd37102c Add support for Phenom (Family 10h) to cpufreq.
Its a newer version provided by the author than in the PR.

PR:		kern/128575
Submitted by:	Gen Otsuji annona2 [at] gmail.com
2009-03-28 08:54:47 +00:00
Konstantin Belousov
49d008d916 Convert gdt_segs and ldt_segs initialization to C99 style.
Reviewed by:	jhb
2009-03-26 18:07:13 +00:00
John Baldwin
b9dda9d6fe Fix a few nits in the earlier changes to prevent local information leakage
in AMD FPUs:
- Do not clear the affected state in the case that the FPU registers for
  the thread that already owns the FPU are changed via fpu_setregs().  The
  only local information the thread would see is its own state in that
  case.
- Fix a type mismatch for the dummy variable used in a "fld".  It accepts
  a float, not a double.

Reviewed by:	bde
Approved by:	so (cperciva)
MFC after:	1 month
2009-03-25 22:08:30 +00:00
John Baldwin
63de9515b7 Rename (fpu|npx)_cleanstate to (fpu|npx)_initialstate to better reflect
their purpose.

Inspired by:	bde
MFC after:	1 month
2009-03-25 14:17:08 +00:00
John Baldwin
6cad8eb41d Fall back to using configuration type 1 accesses for PCI config requests if
the requested PCI bus falls outside of the bus range given in the ACPI
MCFG table.  Several BIOSes seem to not include all of the PCI busses in
systems in their MCFG tables.  It maybe that the BIOS is simply buggy and
does support all the busses, but it is more conservative to just fall back
to the old method unless it is certain that memory accesses will work.
2009-03-24 18:10:22 +00:00