Commit Graph

1768 Commits

Author SHA1 Message Date
Marcel Moolenaar
b1b6c03e3d Provide more examples for error injection. 2010-07-06 23:13:21 +00:00
Marcel Moolenaar
e987ee58d9 Allocate and setup an interrupt vector for corrected machine checks.
For now, just print when we get the interrupt, but eventually we need
to collect the details and provide a more useful report.
2010-07-03 20:19:20 +00:00
Marcel Moolenaar
57764700bc When compiling with profiling, we define PROF for userspace and GPROF
for the kernel.
2010-07-01 00:30:35 +00:00
Marcel Moolenaar
2c9459d167 While functions are ideally aligned to a 32-byte boundary, don't
assume this to be the case.
2010-06-30 22:29:02 +00:00
John Baldwin
fc0de8f0b6 Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
2010-06-30 18:03:42 +00:00
Marcel Moolenaar
d87d5bbf82 The ptc.g operation for the McKinley and Madison processors has the
side-effect of purging more than the requested translation. While
this is not a problem in general, it invalidates the assumption made
while constructing the trapframe on entry into the kernel in SMP
configurations. The assumption is that only the first store to the
stack will possibly cause a TLB miss. Since the ptc.g purges the
translation caches of all CPUs in the coherency domain, a ptc.g
executed on one CPU can cause a purge on another CPU that is
currently running the critical code that saves the state to the
trapframe. This can cause an unexpected TLB miss and with interrupt
collection disabled this means an unexpected data nested TLB fault.

A data nested TLB fault will not save any context, nor provide a
way for software to determine what caused the TLB miss nor where
it occurred. Careful construction of the kernel entry and exit code
allows us to handle a TLB miss at precisely orchestrated points
and thereby avoid the need to wire the kernel stack, but the
unexpected TLB miss caused by the ptc.g instruction resulted in
an unrecoverable condition and, ultimately, in machine checks.

The solution to this problem is to synchronize the kernel entry
on all CPUs with the use of the ptc.g instruction on a single CPU
by implementing a bare-bones readers-writer lock that allows N
readers (= N CPUs entering the kernel) and 1 writer (= execution
of the ptc.g instruction on some CPU). This solution wins over
a rendez-vous approach by not interrupting CPUs with an IPI.

This problem has not been observed on the Montecito.

PR:		ia64/147772
MFC after:	6 days
2010-06-12 01:45:29 +00:00
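
The readers-writer scheme described in d87d5bbf82 can be illustrated with a minimal user-space sketch. This is not the ia64 kernel code: the lock word layout, the `ptcg_*` names, and the use of C11 atomics are all assumptions made for illustration. Readers stand in for CPUs entering the kernel; the single writer stands in for the CPU about to execute ptc.g.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: bit 31 = writer active, low bits = reader count. */
#define PTCG_WRITER 0x80000000u

typedef struct {
	_Atomic uint32_t word;
} ptcg_lock_t;

/* Reader side (kernel entry): fails while a writer holds the lock. */
bool
ptcg_try_read_enter(ptcg_lock_t *l)
{
	uint32_t old = atomic_load_explicit(&l->word, memory_order_relaxed);

	while ((old & PTCG_WRITER) == 0) {
		/* On failure, 'old' is reloaded and the writer bit rechecked. */
		if (atomic_compare_exchange_weak_explicit(&l->word, &old,
		    old + 1, memory_order_acquire, memory_order_relaxed))
			return (true);
	}
	return (false);
}

/* Reader side: the critical trapframe stores are done. */
void
ptcg_read_exit(ptcg_lock_t *l)
{
	uint32_t old = atomic_fetch_sub_explicit(&l->word, 1,
	    memory_order_release);

	assert((old & ~PTCG_WRITER) != 0);
	(void)old;
}

/* Writer side (ptc.g): succeeds only with no readers and no writer. */
bool
ptcg_try_write_enter(ptcg_lock_t *l)
{
	uint32_t expected = 0;

	return (atomic_compare_exchange_strong_explicit(&l->word, &expected,
	    PTCG_WRITER, memory_order_acquire, memory_order_relaxed));
}

void
ptcg_write_exit(ptcg_lock_t *l)
{
	atomic_store_explicit(&l->word, 0, memory_order_release);
}
```

A real implementation would spin on the try functions instead of returning false. No IPI is needed in either direction, which is the advantage the commit message claims over a rendez-vous.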
Alan Cox
9124d0d6a3 Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set.  Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
2010-06-11 15:49:39 +00:00
Marcel Moolenaar
f635c047c5 Bump MAX_BPAGES from 256 to 1024. It seems that a few drivers, bge(4)
in particular, do not handle deferred DMA map load operations at all.
Any error, and especially EINPROGRESS, is treated as a hard error and
typically aborts the current operation. The fact that the busdma code
queues the load operation for when resources (i.e. bounce buffers in
this particular case) are available makes this especially problematic.
Bounce buffering, unlike what the PR synopsis would suggest, works
fine.

While on the subject, properly implement swi_vm().

PR:		147502
MFC after:	1 week
2010-06-11 03:00:32 +00:00
Alan Cox
ce18658792 Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines.  Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked.  Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick().  (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page.  Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exists_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete.  Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used.  Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by:	kib@
2010-06-10 16:56:35 +00:00
Alan Cox
c68c71f9b8 Simplify the inner loop of get_pv_entry(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty.
Instead, wait until the loop completes.
2010-05-30 20:31:12 +00:00
Alan Cox
ff8ffaf43a Don't set PG_WRITEABLE in pmap_enter() unless the page is managed. 2010-05-29 18:26:44 +00:00
Alan Cox
c46b90e90a Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced().  Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively.  In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting.  This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages().  This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition.  Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by:	kib
2010-05-26 18:00:44 +00:00
Konstantin Belousov
3341592139 Change ia64's struct syscall_args definition so that args is a pointer to
the arguments array instead of the array itself. ia64 syscall arguments are
readily available in the frame, so point args to it and avoid an unnecessary
bcopy. Still reserve the array in syscall_args for ia32 emulation.

Suggested and reviewed by:	marcel
MFC after:	1 month
2010-05-24 17:24:14 +00:00
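
The pointer-versus-array change in 3341592139 can be sketched as follows. The structure layout and every `toy_*` name are hypothetical and chosen only to show the idea: native arguments are referenced in place in the frame, while the embedded array is used only when 32-bit arguments have to be widened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAXARGS 8

/* Hypothetical trap frame that already holds the native arguments. */
struct toy_trapframe {
	uint64_t scratch_args[TOY_MAXARGS];
};

struct toy_syscall_args {
	int narg;
	uint64_t *args;			/* points at the arguments in use */
	uint64_t args32[TOY_MAXARGS];	/* backing store for 32-bit emulation */
};

/* Native path: no copy, just point into the frame. */
void
toy_fetch_native(struct toy_trapframe *tf, int narg,
    struct toy_syscall_args *sa)
{
	assert(narg >= 0 && narg <= TOY_MAXARGS);
	sa->narg = narg;
	sa->args = tf->scratch_args;
}

/* Emulation path: widen each 32-bit argument into the reserved array. */
void
toy_fetch_ia32(const uint32_t *user_args, int narg,
    struct toy_syscall_args *sa)
{
	assert(narg >= 0 && narg <= TOY_MAXARGS);
	for (int i = 0; i < narg; i++)
		sa->args32[i] = user_args[i];
	sa->narg = narg;
	sa->args = sa->args32;
}
```

Callers read `sa->args[i]` in both cases, so the syscall dispatch code does not need to know which path filled the structure.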
Alan Cox
567e51e18c Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-dependent
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls the syscall implementation from
the ABI sysent table, and sets up the return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of the single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
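
The division of labour introduced by afe1a68827 can be sketched in user space. Only the member names `sv_fetch_syscall_args` and `sv_set_syscall_retval` come from the commit message; the signatures, the `toy_*` types, and the sample ABI below are assumptions and do not match the real struct sysvec.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAXARGS 8

struct toy_thread {
	unsigned frame_code;		/* syscall number found in the frame */
	long frame_args[TOY_MAXARGS];	/* arguments found in the frame */
	long retval;
	int error;
};

struct toy_sa {
	unsigned code;
	long *args;
};

typedef int (*toy_sy_call_t)(struct toy_thread *td, long *args);

/* Per-ABI hooks, loosely modelled on the new sysvec members. */
struct toy_sysvec {
	int (*sv_fetch_syscall_args)(struct toy_thread *, struct toy_sa *);
	void (*sv_set_syscall_retval)(struct toy_thread *, int);
	const toy_sy_call_t *sv_table;
	size_t sv_size;
};

/* Machine-independent entry: fetch, dispatch, set the return value. */
int
toy_syscallenter(const struct toy_sysvec *sv, struct toy_thread *td)
{
	struct toy_sa sa;
	int error;

	assert(sv->sv_fetch_syscall_args != NULL);
	assert(sv->sv_set_syscall_retval != NULL);
	error = sv->sv_fetch_syscall_args(td, &sa);
	if (error == 0) {
		if (sa.code < sv->sv_size)
			error = sv->sv_table[sa.code](td, sa.args);
		else
			error = ENOSYS;
	}
	sv->sv_set_syscall_retval(td, error);
	return (error);
}

/* A sample "native" ABI with a single syscall that adds two numbers. */
int
toy_native_fetch(struct toy_thread *td, struct toy_sa *sa)
{
	sa->code = td->frame_code;
	sa->args = td->frame_args;
	return (0);
}

void
toy_native_set_retval(struct toy_thread *td, int error)
{
	td->error = error;
}

int
toy_sys_add(struct toy_thread *td, long *args)
{
	td->retval = args[0] + args[1];
	return (0);
}

const toy_sy_call_t toy_native_sysent[] = { toy_sys_add };

const struct toy_sysvec toy_native_sysvec = {
	.sv_fetch_syscall_args = toy_native_fetch,
	.sv_set_syscall_retval = toy_native_set_retval,
	.sv_table = toy_native_sysent,
	.sv_size = 1,
};
```

A second ABI would supply its own fetch and set-retval hooks and its own table, and reuse `toy_syscallenter()` unchanged.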
John Baldwin
e826ef1ec4 - Adjust the whitespace for the lines that output fields in 'show pcpu' in
DDB so that all the fields line up.
- Print out the tid of the per-CPU idlethread instead of the pid since
  the idle process is now shared across all idle threads.

MFC after:	1 month
2010-05-21 17:17:56 +00:00
Marcel Moolenaar
3753228779 Switch to C99 exact-width types. 2010-05-19 00:23:10 +00:00
Alan Cox
9ab6032f73 On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy.  Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup().  We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try.  Previously, the call to pmap_remove_write()
would have failed silently.
2010-05-16 23:45:10 +00:00
Alan Cox
3c4a24406b Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Alan Cox
1332aaf9ed MFamd64/i386 r207205
Clearing a page table entry's accessed bit and setting the page's
  PG_REFERENCED flag in pmap_protect() can't really be justified, so
  don't do it.  Moreover, on ia64, don't set the page's dirty field
  unless pmap_protect() is removing write access.
2010-04-29 15:47:31 +00:00
Attilio Rao
d8b878873e - Extract the IODEV_PIO interface from ia64 and make it MI.
This helps fix /dev/io usage from multithreaded
  processes.
- On i386 and amd64 the old behaviour is kept but multithreaded
  processes must use the new interface in order to work well.
- Supporting the other architectures is now much easier: where
  support is needed, only a few small definitions are required.

Manpage update will happen shortly.

Sponsored by:	Sandvine Incorporated
PR:		threads/116181
Reviewed by:	emaste, marcel
MFC after:	3 weeks
2010-04-28 15:38:01 +00:00
Konstantin Belousov
8bac98182a Style: use #define<TAB> instead of #define<SPACE>.
Noted by:	bde, pluknet gmail com
MFC after:	11 days
2010-04-27 09:48:43 +00:00
Alan Cox
7b85f59183 Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters.  For example, in mincore(), clearing the reference
bits has two negative consequences.  First, it throws off the activity
count calculations performed by the page daemon.  Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should.  Consequently, the page could be
deactivated prematurely by the page daemon.  Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page.  However, there is a second problem for which
that is not a solution.  In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping.  Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
2010-04-24 17:32:52 +00:00
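
The difference that 7b85f59183 relies on, test-and-clear versus test-only, can be shown with a toy model. The page structure, the accessed-bit value, and the `toy_*` names are hypothetical; real pmaps walk pv lists and hardware page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_MAPPINGS 4
#define TOY_PTE_ACCESSED 0x1u

/* Hypothetical page with one toy PTE per mapping. */
struct toy_page {
	unsigned pte[TOY_MAX_MAPPINGS];
	size_t nmappings;
};

/* Like pmap_ts_referenced(): count the reference bits and clear them. */
int
toy_ts_referenced(struct toy_page *m)
{
	int count = 0;

	assert(m->nmappings <= TOY_MAX_MAPPINGS);
	for (size_t i = 0; i < m->nmappings; i++) {
		if (m->pte[i] & TOY_PTE_ACCESSED) {
			m->pte[i] &= ~TOY_PTE_ACCESSED;
			count++;
		}
	}
	return (count);
}

/* Like pmap_is_referenced(): report only, leave every bit untouched. */
bool
toy_is_referenced(const struct toy_page *m)
{
	assert(m->nmappings <= TOY_MAX_MAPPINGS);
	for (size_t i = 0; i < m->nmappings; i++) {
		if (m->pte[i] & TOY_PTE_ACCESSED)
			return (true);
	}
	return (false);
}
```

Calling the test-only variant any number of times leaves the state seen by a later test-and-clear pass intact, which is why it suits a query such as mincore().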
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Andrew Thompson
b850ecc180 Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had
the illusion of a tunable setting but was always turned on regardless.

MFC after:	1 week
2010-04-22 21:31:34 +00:00
Marcel Moolenaar
4658933f3a Populate the sysctl tree with any MCA records we collected.
The sequence number is used as the name of a sysctl node,
under which we add the MCA records using the CPU id as the
leaf name.

Add the hw.mca.inject sysctl to provide a way to inject
MC errors and trigger machine checks.

PR:		ia64/113102
2010-04-13 22:20:12 +00:00
Marcel Moolenaar
40c46ad800 Change the (generic) argument to ia64_store_mca_state() from the
cpuid to the struct pcpu of the CPU. That way we only cast between
pointer types.
2010-04-13 15:55:18 +00:00
Marcel Moolenaar
cfa78e8115 o s/u_int64_t/uint64_t/g
o   style(9) fixes.
2010-04-13 15:51:25 +00:00
Marcel Moolenaar
d572b057de Sync up to SDM 2.2. 2010-04-13 03:10:38 +00:00
Marcel Moolenaar
8d02363b0c Bring up-to-date:
o   Switch to ITANIUM2 as the cpu. This has absolutely no effect
    on the code, but makes for a better example.
o   Drop COMPAT_FREEBSD6. We're tier 2, so you're supposed to run
    8-stable or newer.
o   Add PREEMPTION. It works now.
o   Remove HWPMC_HOOKS. We don't have support for hwpmc yet.

o   Add a bunch of new devices: atapist, hptiop, amr, ips, twa, igb,
    ixgbe, ae, age, alc, ale, bce, bfe, et, jme, msk, nge, sk, ste,
    stge, tx, vge, axe, rue, udav, fwip, and all USB serial.
o   Remove "legacy" devices: le, vx, dc, pcn, rl, sis.

Make sure the module list is a superset of what goes into GENERIC.
2010-03-27 06:53:11 +00:00
Marcel Moolenaar
9280895b48 Implement interrupt to CPU binding. Assign interrupts to CPUs in a
round-robin fashion, starting with the highest priority interrupt
on the highest-numbered CPU and cycling downwards.
2010-03-27 05:40:50 +00:00
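
The assignment order described in 9280895b48 can be sketched as below. The function name and the array-based interface are hypothetical, and the caller is assumed to have sorted the interrupts by descending priority.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assign nirqs interrupts, listed from highest to lowest priority, to
 * ncpus CPUs. The first interrupt goes to the highest-numbered CPU and
 * each following one to the next lower CPU, wrapping around at CPU 0.
 */
void
toy_bind_round_robin(size_t nirqs, unsigned ncpus, unsigned *cpu_out)
{
	unsigned cpu;

	assert(ncpus > 0);
	cpu = ncpus - 1;
	for (size_t i = 0; i < nirqs; i++) {
		cpu_out[i] = cpu;
		cpu = (cpu == 0) ? ncpus - 1 : cpu - 1;
	}
}
```

With five interrupts and three CPUs the resulting CPU sequence is 2, 1, 0, 2, 1.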
Marcel Moolenaar
1adf3cbdb7 Remove nx_pcibus from the nexus resource. Nexus is not involved
with PCI busses. Remove nexus_read_ivar() and nexus_write_ivar()
to give default behaviour. Remove <machine/nexusvar.h> as well,
because there's nothing in it that's being used.
2010-03-27 03:15:34 +00:00
Marcel Moolenaar
1764e57174 Rename disable_intr() to ia64_disable_intr() and rename enable_intr()
to ia64_enable_intr(). This reduces confusion with intr_disable() and
intr_restore().

Have configure_final() call ia64_finalize_intr() instead of enable_intr()
in preparation of adding support for binding interrupts to all CPUs.
2010-03-26 21:22:02 +00:00
Marcel Moolenaar
f4926eabc2 Only use the interval timer for clock interrupts on the BSP and
have the BSP use IPIs to trigger clock interrupts on the APs.
This allows us to run on hardware configurations for which the
ITC has non-uniform frequencies across CPUs.

While here, change the clock XIV to type IPI so as to protect
the interrupt delivery against CPU re-balancing once that's
implemented.
2010-03-26 02:29:15 +00:00
Nathan Whitehorn
d4425a31a5 Fix the ia64 build.
Pointy hat to: me
2010-03-26 00:53:13 +00:00
Nathan Whitehorn
a107d8aac9 Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
Marcel Moolenaar
55bd918aab o Remove the pmap argument to pmap_invalidate_all() as it's not used
other than in a potentially dangerous KASSERT.
o   Hand-inline pmap_remove_page() as it's only called from 1 place and
    the abstraction that pmap_remove_page() provides is not enough to
    warrant the obfuscation. Eliminate the dangerous KASSERT in the
    process.
o   In pmap_remove_pte(), remove the KASSERT for pmap being the current
    one as it's not safe in the face of CPU migration.
2010-03-22 18:24:42 +00:00
Marcel Moolenaar
f73ddcd50b Drop the pmap argument to pmap_invalidate_page(). It's not used other
than in a KASSERT. The KASSERT is broken in that it's done outside the
critical section and as such isn't protected against CPU migration.
Improve pmap_invalidate_page() as follows:
o   calculate vhpt_ofs inside the critical region for exactly the same
    reason.
o   calculate the tag outside the FOREACH loop, as it's loop-invariant.
    This is more efficient.
o   Replace the test and set with an atomic cmpset operation because we
    are changing other CPU's VHPT tables and this avoids invalidating
    after the entry got modified. Not necessarily a problem, but better
    safe than sorry.
2010-03-22 04:24:19 +00:00
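
The last bullet of f73ddcd50b replaces a separate test and store with a single compare-and-set. A minimal sketch with C11 atomics follows; the invalid-tag encoding and the function name are assumptions, and a single 64-bit word stands in for a VHPT entry's tag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: the top bit marks a tag as invalid. */
#define TOY_TAG_INVALID (UINT64_C(1) << 63)

/*
 * Invalidate the entry only if it still holds 'tag'. If another CPU has
 * already replaced the entry, the compare fails and the new contents are
 * left alone, which a separate test followed by a store cannot guarantee.
 */
bool
toy_vhpt_invalidate(_Atomic uint64_t *entry_tag, uint64_t tag)
{
	uint64_t expected = tag;

	assert((tag & TOY_TAG_INVALID) == 0);
	return (atomic_compare_exchange_strong(entry_tag, &expected,
	    TOY_TAG_INVALID));
}
```

Because the tag depends only on the address being invalidated, it can be computed once before looping over the per-CPU tables, as the commit message notes.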
Marcel Moolenaar
7bc8a5971b With preemption, the high FP registers may get enabled by cpu_switch()
before we grab the mutex. Don't assert that they must be disabled at
that point. We pretty much bypass all logic in that case anyway and
leave immediately, so there's no harm.
2010-03-22 04:01:45 +00:00
Marcel Moolenaar
95b11053b3 Fix interrupt handling by extending the critical region so that
preemption doesn't happen until after all pending interrupts have
been serviced.
While here again, simplify the EOI handling by doing it after we
call the XIV-specific handlers, rather than in each of them. The
original thought was that we may want to do an EOI first and the
actual IPI handling next, but that's mostly a micro-optimization.
2010-03-22 03:55:18 +00:00
Marcel Moolenaar
cc7a041c2b Disable interrupts when calling into SAL for PCI configuration
cycles. This serves 2 purposes:
1.  It prevents preemption and CPU migration while running SAL code.
2.  It reduces the chance of stack overflows: we're supposed to enter
    SAL with at least 16KB of either memory- or register stack space,
    which we can't do without switching to a different stack.
2010-03-22 03:06:11 +00:00
Marcel Moolenaar
c56153c577 Define curthread as an inline function that loads the thread pointer
directly from r13, the pcpu pointer. This guarantees correct behaviour
when the thread migrates to a different CPU.
2010-03-22 02:01:33 +00:00
Marcel Moolenaar
a5d64faeca Print MD fields in the pcpu to aid debugging. 2010-03-21 22:39:11 +00:00
Marcel Moolenaar
c50679660e Don't include <machine/_regset.h> when _MACHINE_REGSET_H_ is defined.
This is not for multiple inclusion purposes, because _regset.h already
handles this, but to enable inclusion of the MD header by cross-tools
on non-ia64 installations. The cross-tool can include _regset.h itself
before including MD headers that depend on it.
2010-03-21 22:33:09 +00:00
Marcel Moolenaar
a5cef7a1ce Don't check for boot_verbose in the environment. The loader does
that already and sets RB_VERBOSE. The loader has always done it.
2010-03-20 04:22:22 +00:00
Marcel Moolenaar
3804454ac0 Revamp the interrupt code based on the previous commit:
o   Introduce XIV, eXternal Interrupt Vector, to differentiate from
    the interrupt vectors that are offsets in the IVT (Interrupt
    Vector Table). There's a vector for external interrupts, which
    are based on the XIVs.

o   Keep track of allocated and reserved XIVs so that we can assign
    XIVs without hardcoding anything. When XIVs are allocated, an
    interrupt handler and a class is specified for the XIV. Classes
    are:
    1.  architecture-defined: XIV 15 is returned when no external
	interrupts are pending,
    2.  platform-defined: SAL reports which XIV is used to wake up
	an AP (typically 0xFF, but it's 0x12 for the Altix 350).
    3.  inter-processor interrupts: allocated for SMP support and
	non-redirectable.
    4.  device interrupts (i.e. IRQs): allocated when devices are
	discovered and are redirectable.

o   Rewrite the central interrupt handler to call the per-XIV
    interrupt handler and rename it to ia64_handle_intr(). Move
    the per-XIV handler implementation to the file where we have
    the XIV allocation/reservation. Clock interrupt handling is
    moved to clock.c. IPI handling is moved to mp_machdep.c.

o   Drop support for the Intel 8259A because it was broken. When
    XIV 0 is received, the CPU should initiate an INTA cycle to
    obtain the interrupt vector of the 8259-based interrupt. In
    these cases the interrupt controller we should be talking to
    WRT masking or signalling EOI is the 8259 and not the I/O
    SAPIC. This requires a driver for the Intel 8259A which isn't
    available for ia64. Thus stop pretending to support ExtINTs
    and instead panic() so that if we come across hardware that
    has an Intel 8259A, we have something real to work with.

o   With XIVs for IPIs dynamically allocated and also based on
    priority, define the IPI_* symbols as variables rather than
    constants. The variable holds the XIV allocated for the IPI.

o   IPI_STOP_HARD delivers a NMI if possible. Otherwise the XIV
    assigned to IPI_STOP is delivered.
2010-03-17 00:37:15 +00:00
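
The XIV bookkeeping described in 3804454ac0 can be sketched as a small table. Everything here is hypothetical: the `toy_*` names, the class enum, the handler signature, and the policy of handing out the highest free XIV in the range 16..255 are assumptions, not the ia64 implementation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NXIVS	256
#define TOY_XIV_FIRST	16	/* assumed lowest dynamically assignable XIV */

enum toy_xiv_class {
	TOY_XIV_FREE = 0,
	TOY_XIV_ARCH,	/* architecture-defined */
	TOY_XIV_PLAT,	/* platform-defined */
	TOY_XIV_IPI,	/* inter-processor interrupt */
	TOY_XIV_IRQ	/* device interrupt */
};

typedef int (*toy_xiv_handler_t)(unsigned xiv);

static struct {
	enum toy_xiv_class cls;
	toy_xiv_handler_t handler;
} toy_xiv_table[TOY_NXIVS];

/* Reserve one specific XIV; returns 0 on success, -1 if unavailable. */
int
toy_xiv_reserve(unsigned xiv, enum toy_xiv_class cls, toy_xiv_handler_t h)
{
	assert(cls != TOY_XIV_FREE && h != NULL);
	if (xiv >= TOY_NXIVS || toy_xiv_table[xiv].cls != TOY_XIV_FREE)
		return (-1);
	toy_xiv_table[xiv].cls = cls;
	toy_xiv_table[xiv].handler = h;
	return (0);
}

/* Allocate the highest free XIV; returns the XIV, or -1 if none is left. */
int
toy_xiv_alloc(enum toy_xiv_class cls, toy_xiv_handler_t h)
{
	for (int xiv = TOY_NXIVS - 1; xiv >= TOY_XIV_FIRST; xiv--) {
		if (toy_xiv_reserve((unsigned)xiv, cls, h) == 0)
			return (xiv);
	}
	return (-1);
}

/* Central handler: call the per-XIV handler, or -1 if none is set. */
int
toy_xiv_dispatch(unsigned xiv)
{
	if (xiv >= TOY_NXIVS || toy_xiv_table[xiv].cls == TOY_XIV_FREE)
		return (-1);
	return (toy_xiv_table[xiv].handler(xiv));
}

/* Sample handler that reports which XIV it was called for. */
int
toy_xiv_echo(unsigned xiv)
{
	return ((int)xiv);
}
```

Storing the value returned by `toy_xiv_alloc()` in a variable mirrors the commit's point that the IPI_* symbols become variables holding the allocated XIV.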
Marcel Moolenaar
510e1af7cb Have cpu_throw() loop on blocked_lock as well. This bug has existed
a long time and has gone unnoticed just as long, because I kept
using sched_4bsd (due to sched_ule not working with preemption),
but GENERIC had sched_ule by default -- including SMP.

While here, remove unused inclusion of <machine/clock.h>, remove
totally bogus inclusion of <i386/include/specialreg.h>.
2010-03-15 16:53:09 +00:00
Ed Schouten
338f1debcd Remove COMPAT_43TTY from stock kernel configuration files.
COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.
2010-03-13 09:21:00 +00:00
Nathan Whitehorn
da4e34909f Accidentally committed test code. Remove it.
Big pointy hat:	me
2010-03-11 14:54:54 +00:00