Commit Graph

5556 Commits

Author SHA1 Message Date
Alan Cox
72dc3eb65b Simplify the inner loop of pmap_collect(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty.
Instead, wait until the loop completes.
2010-05-30 18:48:41 +00:00
Alan Cox
8f0d5d3b9f When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first.  Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.

When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter().  In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page.  (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.)  This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.

Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping.  Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.

There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.

Tidy up the variable declarations at the start of pmap_enter().
2010-05-29 17:10:45 +00:00
John Baldwin
0c86af8162 Defer initializing machine checks for the boot CPU until the local APIC is
fully configured.

MFC after:	1 month
2010-05-28 17:50:24 +00:00
Alan Cox
52d8ba372e Defer freeing any page table pages in pmap_remove_all() until after the
page queues lock is released.  This may reduce the amount of time that the
page queues lock is held by pmap_remove_all().
2010-05-28 06:49:57 +00:00
Alan Cox
c46b90e90a Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced().  Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively.  In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting.  This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages().  This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition.  Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by:	kib
2010-05-26 18:00:44 +00:00
John Baldwin
58ccad7ddc Add support for corrected machine check interrupts. CMCI is a new local
APIC interrupt that fires when a threshold of corrected machine check
events is reached.  CMCI also includes a count of events when reporting
corrected errors in the bank's status register.  Note that individual
banks may or may not support CMCI.  If they do, each bank includes its own
threshold register that determines when the interrupt fires.  Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl).  The threshold
is also adjusted on each hourly poll which will lower the threshold once
events stop occurring.

Tested by:	Sailaja Bangaru  sbappana at yahoo com
MFC after:	1 month
2010-05-24 15:45:05 +00:00
Alan Cox
567e51e18c Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
Alexander Motin
dbd55f3ff0 - Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
2010-05-24 11:40:49 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Alexander Motin
fa1ed4bd1a Unify local_apic.c for x86 archs, 2010-05-23 17:45:01 +00:00
John Baldwin
e826ef1ec4 - Adjust the whitespace for the lines that output fields in 'show pcpu' in
DDB so that all the fields line up.
- Print out the tid of the per-CPU idlethread instead of the pid since
  the idle process is now shared across all idle threads.

MFC after:	1 month
2010-05-21 17:17:56 +00:00
Poul-Henning Kamp
065b12a703 Rename an argument from "exp" to "expect" since the former makes FlexeLint
uneasy, in case anybody think it might be exp(3) in libm.

This also makes it consistent with other archs.
2010-05-20 06:18:03 +00:00
John Baldwin
3b642a049b Add constants for the optional EOI suppression support in local APICs and
EOI registers in I/O APICs.
2010-05-19 19:52:41 +00:00
Alan Cox
9ab6032f73 On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy.  Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup().  We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try.  Previously, the call to pmap_remove_write()
would have failed silently.
2010-05-16 23:45:10 +00:00
Konstantin Belousov
7565f3e837 Do not use .extern, it is not strictly needed with gas and it is custom
to omit it.

Requested by:	bde
MFC after:	6 days
2010-05-13 09:59:10 +00:00
Konstantin Belousov
6a440fc3e0 Route all returns from the interrupts and faults through the doreti_iret
labeled iretq instruction.

Suppose that multithreaded process executes two threads, currently
scheduled on different processors. Let assume that thread A executes
using %cs or %ss pointing into the descriptor from LDT. If IPI comes
which handler does not return by jump to doreti, and meantime thread B
invalidates descriptor pointed to by %cs or %ss, then iretq from IPI
handler could fault.

Routing the return by doreti_iret allows kernel to catch the situation
and recover from it by sending signal to the usermode.

Tested by:	pho
MFC after:	1 week
2010-05-12 10:29:35 +00:00
Konstantin Belousov
99b1ff2ac4 Remove unneeded overrides of the segment registers in the inner trap
frame upon segment register load fault. The doreti procedure does not
load segment registers when returning to the kernel frame, and current
values in the segment descriptor cache already allow the kernel mode
to run, not modified by faulted loaded.

Suggested by:	bde
Tested by:	pho
MFC after:	1 week
2010-05-12 10:29:06 +00:00
Alan Cox
3c4a24406b Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Alan Cox
7024db1d40 Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free().  Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.
2010-05-06 16:39:43 +00:00
Konstantin Belousov
db8fd40e9f Add definitions for Intel AESNI CPUID bits and print the capabilities
on boot.

Hardware provided by:	Sentex Communications
MFC after:	1 week
2010-05-05 21:07:47 +00:00
Joel Dahl
8e0ad55abb Switch to our preferred 2-clause BSD license.
Approved by:	kmacy
2010-05-05 20:39:02 +00:00
Konstantin Belousov
bf6f1b56c5 Style and comment adjustements.
Suggested and reviewed by:	bde
MFC after:	3 days
2010-05-03 14:30:49 +00:00
Konstantin Belousov
6a5baa54fd Remove debugging code that was not used once since commit.
Suggested by:	bde
MFC after:	1 week
2010-05-01 13:15:35 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Attilio Rao
d8b878873e - Extract the IODEV_PIO interface from ia64 and make it MI.
In the end, it does help fixing /dev/io usage from multithreaded
  processes.
- On i386 and amd64 the old behaviour is kept but multithreaded
  processes must use the new interface in order to work well.
- Support for the other architectures is greatly improved, where
  necessary, by the necessity to define very small things now.

Manpage update will happen shortly.

Sponsored by:	Sandvine Incorporated
PR:		threads/116181
Reviewed by:	emaste, marcel
MFC after:	3 weeks
2010-04-28 15:38:01 +00:00
Konstantin Belousov
8bac98182a Style: use #define<TAB> instead of #define<SPACE>.
Noted by:	bde, pluknet gmail com
MFC after:	11 days
2010-04-27 09:48:43 +00:00
Kip Macy
cd4d97b439 missed pv access before pmap lock 2010-04-25 23:51:05 +00:00
Kip Macy
c28d264aa0 Incremental reduction of delta with head_page_lock_2 branch
- replace modification of pmap resident_count with pmap_resident_count_{inc,dec}
- the pv list is protected by the pmap lock, but in several cases we are relying
  on the vm page queue mutex, move pv_va read under the pmap lock
2010-04-25 23:18:02 +00:00
Andrew Thompson
f6a9d95ba3 Set USB_DEBUG like the other platforms, I had turned it off to test the build
before committing r207077.

Spotted by:	marius
2010-04-25 22:01:32 +00:00
Alan Cox
0d2e1c3e39 Clearing a page table entry's accessed bit (PG_A) and setting the
page's PG_REFERENCED flag in pmap_protect() can't really be justified.
In contrast to pmap_remove() or pmap_remove_all(), the mapping is not
being destroyed, so the notion that the page was accessed is not lost.
Moreover, clearing the page table entry's accessed bit and setting the
page's PG_REFERENCED flag can throw off the page daemon's activity
count calculation.  Finally, in my tests, I found that 15% of the
atomic memory operations being performed by pmap_protect() were only
to clear PG_A, and not change protection.  This could, by itself, be
fixed, but I don't see the point given the above argument.

Remove a comment from pmap_protect_pde() that is no longer meaningful
after the above change.
2010-04-25 20:40:45 +00:00
Kip Macy
81b47f28bc apply style(9) changes applied to head_page_lock_2
requested by: kib@
2010-04-24 21:17:07 +00:00
Alan Cox
7b85f59183 Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters.  For example, in mincore(), clearing the reference
bits has two negative consequences.  First, it throws off the activity
count calculations performed by the page daemon.  Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should.  Consequently, the page could be
deactivated prematurely by the page daemon.  Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page.  However, there is a second problem for which
that is not a solution.  In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping.  Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
2010-04-24 17:32:52 +00:00
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Jung-uk Kim
b834123032 If a conditional jump instruction has the same jt and jf, do not perform
the test and jump unconditionally.
2010-04-22 23:47:19 +00:00
Andrew Thompson
b850ecc180 Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had
the illusion of a tunable setting but was always turned on regardless.

MFC after:	1 week
2010-04-22 21:31:34 +00:00
Konstantin Belousov
94c6c6ba67 As was done in r155238 for i386 and in r155239 for amd64, clear the carry
flag for ia32 binary executed on amd64 host in get_mcontext().

PR:	kern/92110 (one more time)
Reported by:	stas
MFC after:	1 week
2010-04-21 11:17:16 +00:00
Rui Paulo
ff569d8436 Rename the cyclic global variable lapic_cyclic_clock_func to just
cyclic_clock_func. This will make more sense when we start developing non
x86 cyclic version.
2010-04-20 17:03:30 +00:00
Pyun YongHyeon
d193ed0bed Add driver for Silicon Integrated Systems SiS190/191 Fast/Gigabit Ethernet.
This driver was written by Alexander Pohoyda and greatly enhanced
by Nikolay Denev. I don't have these hardwares but this driver was
tested by Nikolay Denev and xclin.

Because SiS didn't release data sheet for this controller, programming
information came from Linux driver and OpenSolaris. Unlike other open
source driver for SiS190/191, sge(4) takes full advantage of TX/RX
checksum offloading and does not require additional copy operation in
RX handler.
The controller seems to have advanced offloading features like VLAN
hardware tag insertion/stripping, TCP segmentation offload(TSO) as
well as jumbo frame support but these features are not available
yet. Special thanks to xclin <xclin<> cs dot nctu dot edu dot tw>
who sent fix for receiving VLAN oversized frames.
2010-04-14 20:45:33 +00:00
Konstantin Belousov
b71e04d3a8 ld_gs_base is executing with stack containing only the frame,
temporary pushed %rflags has been popped already.

Pointy hat to:	kib
MFC after:	3 days
2010-04-14 20:04:55 +00:00
Konstantin Belousov
5f82d16eb1 Change printf() calls to uprintf() for sigreturn() and trap() complaints
about inacessible or wrong mcontext, and for dreaded "kernel trap with
interrupts disabled" situation. The later is changed when trap is
generated from user mode (shall never be ?).

Normalize the messages to include both pid and thread name.

MFC after:	1 week
2010-04-13 10:12:58 +00:00
Konstantin Belousov
a35d07a831 Handle a case when non-canonical address is loaded into the fsbase or
gsbase MSR.

MFC after:	3 days
2010-04-10 18:38:11 +00:00
Fabien Thomas
1fa7f10bac - Support for uncore counting events: one fixed PMC with the uncore
domain clock, 8 programmable PMC.
- Westmere based CPU (Xeon 5600, Corei7 980X) support.
- New man pages with events list for core and uncore.
- Updated Corei7 events with Intel 253669-033US December 2009 doc.
  There is some removed events in the documentation, they have been
  kept in the code but documented in the man page as obsolete.
- Offcore response events can be setup with rsp token.

Sponsored by: NETASQ
2010-04-02 13:23:49 +00:00
John Baldwin
90dfe31955 Add a handler for the local APIC error interrupt. For now it just prints
out the current value of the local APIC error register when the interrupt
fires.

MFC after:	1 week
2010-03-29 19:13:34 +00:00
John Baldwin
6fb5da5092 Cosmetic tweak to use a type suffix instead of a cast to force a constant
to be a long.
2010-03-29 18:47:04 +00:00
Ed Schouten
510ea843ba Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.
A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.
2010-03-28 13:13:22 +00:00
Alan Cox
3792de2e87 Correctly handle preemption of pmap_update_pde_invalidate().
X-MFC after:	r205573
2010-03-27 23:53:47 +00:00
Nathan Whitehorn
a107d8aac9 Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
John Baldwin
121b3af9f2 Remove unneeded type specifiers from 64-bit constants. The compiler
infers their natural type from the constants' values.

Submitted by:	bde
MFC after:	3 days
2010-03-22 15:08:26 +00:00
Alan Cox
d088493ba0 Eliminate a pointless TLB invalidation from pmap_bootstrap(). No mappings
whatsoever are changed between the earlier load_cr3() and this invalidation.
2010-03-21 00:21:59 +00:00
Alan Cox
cea8f9dfaf I am told by AMD that the machine check hardware on the instruction TLB
won't generate bogus exceptions.  Therefore, the implementation of the
"unofficial" workaround needn't mask L1TP errors by the instruction cache
unit.
2010-03-21 00:13:11 +00:00