- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).
Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.
Update stack(9) man page.
Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)
We used to allocate the domains 0-14 for userland, and leave the domain 15
for the kernel. Now supersections requires the use of domain 0, so we
switched the kernel domain to 0, and use 1-15 for userland.
How it's done currently, the kernel domain could be allocated for a
userland process.
So switch back to the previous way we did things, set the first available
domain to 0, and just add 1 to get the real domain number in the struct pmap.
Reported by: Mark Tinguely <tinguely AT casselton DOT net>
MFC After: 3 days
The RAS implementation would set the end address, then the start
address. These were used by the kernel to restart a RAS sequence if
it was interrupted. When the thread switching code ran, it would
check these values and adjust the PC and clear them if it did.
However, there's a small flaw in this scheme. Thread T1, sets the end
address and gets preempted. Thread T2 runs and also does a RAS
operation. This resets end to zero. Thread T1 now runs again and
sets start and then begins the RAS sequence, but is preempted before
the RAS sequence executes its last instruction. The kernel code that
would ordinarily restart the RAS sequence doesn't because the PC isn't
between start and 0, so the PC isn't set to the start of the sequence.
So when T1 is resumed again, it is at the wrong location for RAS to
produce the correct results. This causes the wrong results for the
atomic sequence.
The window for the first race is 3 instructions. The window for the
second race is 5-10 instructions depending on the atomic operation.
This makes this failure fairly rare and hard to reproduce.
Mutexs are implemented in libthr using atomic operations. When the
above race would occur, a lock could get stuck locked, causing many
downstream problems, as you might expect.
Also, make sure to reset the start and end address when doing a syscall, or
a malicious process could set them before doing a syscall.
Reviewed by: imp, ups (thanks guys)
Pointy hat to: cognet
MFC After: 3 days
Call uma_sel_align() there at well.
Set CPU_CONTROL_VECRELOC if we're using the high vectors page.
Submitted by: Rafal Jaworowski <raj AT semihalf DOT com>
MFC After: 1 week
It should just contain the value we want to add, as if we're interrupted
between the add and the str, we will restart from the beginning. Just use
a register we can scratch instead.
MFC After: 1 week
routine. It is not needed as the existing tests for segment coalescing
already handle bounced addresses and it prevents legal segment coalescing
in certain edge cases.
MFC after: 1 week
Reviewed by: scottl
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().
Reviewed by: tegge
MFC after: 6 weeks
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().
i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().
PR: ia64/118024
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.
As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.
The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).
In collaboration with: Peter Holm
Reviewed by: jhb
in the same order as it's set in ate_set_mac.
I remember a discussion about this on -arm, but apparently nothing was done.
Warner, is this wrong ?
X-MFC After: proper review
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.
Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)
After discussion with Sam, switch back to use firmware(9) instead of
having the firmware in hex format.
Put the binary firmware uuencoded into sys/contrib/dev/npe, and slap a
LICENSE file, as found on the Intel website.
Approved by: re (blanket), mux (mentor)
MFC After: 1 week
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)