range operations like pmap_remove() and pmap_protect() as well as allowing
simple operations like pmap_extract() not to involve any global state.
This substantially reduces lock coverages for the global table lock and
improves concurrency.
sync performs a strict superset of the functions of eieio, so using both
is redundant. While here, expand bus barriers to all bus_space operations,
since many drivers do not correctly use bus_space_barrier().
In principle, we can also replace sync just with eieio, for a significant
performance increase, but it remains to be seen whether any poorly-written
drivers currently depend on the side effects of sync to properly function.
MFC after: 1 week
usermode context switches (long jumps and ucontext operations). If these
are used across threads, multiple threads can end up with the same TLS base.
Madness will then result.
This makes behavior on PPC match that on x86 systems and on Linux.
MFC after: 10 days
long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
constraints.
These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.
Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.
Reviewed by: scottl
profiling and kernel profiling. To enable kernel profiling one has to build
kgmon(8). I will enable the build once I managed to build and test powerpc
(32-bit) kernels with profiling support.
- add a powerpc64 PROF_PROLOGUE for _mcount.
- add macros to avoid adding the PROF_PROLOGUE in certain assembly entries.
- apply these macros where needed.
- add size information to the MCOUNT function.
MFC after: 3 weeks, together with r230291
possible, and double faults within an SLB trap handler are not. The result
is that it possible to take an SLB fault at any time, on any address, for
any reason, at any point in the kernel.
This lets us do two important things. First, it removes the (soft) 16 GB RAM
ceiling on PPC64 as well as any architectural limitations on KVA space.
Second, it lets the kernel tolerate poorly designed hypervisors that
have a tendency to fail to restore the SLB properly after a hypervisor
context switch.
MFC after: 6 weeks
curthread-accessing part of mtx_{,un}lock(9) when using a r210623-style
curthread implementation on sparc64, crashing the kernel in its early
cycles as PCPU isn't set up, yet (and can't be set up as OFW is one of the
things we need for that, which leads to a chicken-and-egg problem). What
happens is that due to the fact that the idea of r210623 actually is to
allow the compiler to cache invocations of curthread, it factors out
obtaining curthread needed for both mtx_lock(9) and mtx_unlock(9) to
before the branch based on kobj_mutex_inited when compiling the kernel
without the debugging options. So change kobj_class_compile_static(9)
to just never acquire kobj_mtx, effectively restricting it to its
documented use, and add a kobj_init_static(9) for initializing objects
using a class compiled with the former and that also avoids using mutex(9)
(and malloc(9)). Also assert in both of these functions that they are
used in their intended way only.
While at it, inline kobj_register_method() and kobj_unregister_method()
as there wasn't much point for factoring them out in the first place
and so that a reader of the code has to figure out the locking for
fewer functions missing a KOBJ_ASSERT.
Tested on powerpc{,64} by andreast.
Reviewed by: nwhitehorn (earlier version), jhb
MFC after: 3 days
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
It is reported that on some chips (e.g. the 970MP) behavior of POW bit set
simultaneously with modifying other bits is undefined and may cause hangs.
The race should be handled in some other way, but for now just get back.
Reported by: nwitehorn
into sleep after receiving interrupt, delaying interrupt thread execution
indefinitely until the next interrupt arrive.
Reviewed by: nwhitehorn
MFC after: 3 days
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.
Reviewed by: rwatson
Approved by: re (bz)
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.
Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)
instead of a PCPU field for curthread. This averts a race on SMP systems
with a high interrupt rate where the thread looking up the value of
curthread could be preempted and migrated between obtaining the PCPU
pointer and reading the value of pc_curthread, resulting in curthread being
observed to be the current thread on the thread's original CPU. This played
merry havoc with the system, in particular with mutexes. Many thanks to
jhb for helping me work this one out.
Note that Book-E is in principle susceptible to the same problem, but has
not been modified yet due to lack of Book-E hardware.
MFC after: 2 weeks
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.
Reviewed by: jhb
CPU. This fixes a panic observed on Heathrow-based systems without
SMP-capable PICs when the kernel had both options SMP and INVARIANTS.
MFC after: 5 days
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.
I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).
Sponsored by: Sandvine Incorporated
Discussed with: emaste, des
MFC after: 2 weeks
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.
Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.
While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.
Discussed with: kib
MFC after: 2 Week
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().
Change several checks for a magic number of iterations to use
should_yield() instead.
MFC after: 1 week
already supported nested PICs, but was limited to having a nested
AT-PIC only. With G5 support the need for nested OpenPIC controllers
needed to be added. This was done the wrong way and broke the MPC8555
eval system in the process.
OFW, as well as FDT, describe the interrupt routing in terms of a
controller and an interrupt pin on it. This needs to be mapped to a
flat and global resource: the IRQ. The IRQ is the same as the PCI
intline and as such needs to be representable in 8 bits. Secondly,
ISA support pretty much dictates that IRQ 0-15 should be reserved
for ISA interrupts, because of the internal workins of south bridges.
Both were broken.
This change reverts revision 209298 for a big part and re-implements
it simpler. In particular:
o The id() method of the PIC I/F is removed again. It's not needed.
o The openpic_attach() function has been changed to take the OFW
or FDT phandle of the controller as a second argument. All bus
attachments that previously used openpic_attach() as the attach
method of the device I/F now implement as bus-specific method
and pass the phandle_t to the renamed openpic_attach().
o Change powerpc_register_pic() to take a few more arguments. In
particular:
- Pass the number of IPIs specificly. The number of IRQs carved
out for a PIC is the sum of the number of int. pins and IPIs.
- Pass a flag indicating whether the PIC is an AT-PIC or not.
This tells the interrupt framework whether to assign IRQ 0-15
or some other range.
o Until we implement proper multi-pass bus enumeration, we have to
handle the case where we need to map from PIC+pin to IRQ *before*
the PIC gets registered. This is done in a similar way as before,
but rather than carving out 256 IRQs per PIC, we carve out 128
IRQs (124 pins + 4 IPIs). This is supposed to handle the G5 case,
but should really be fixed properly using multiple passes.
o Have the interrupt framework set root_pic in most cases and not
put that burden in PIC drivers (for the most part).
o Remove powerpc_ign_lookup() and replace it with powerpc_get_irq().
Remove IGN_SHIFT, INTR_INTLINE and INTR_IGN.
Related to the above, fix the Freescale PCI controller driver, broken
by the FDT code. Besides not attaching properly, bus numbers were
assigned improperly and enumeration was broken in general. This
prevented the AT PIC from being discovered and interrupt routing to
work properly. Consequently, the ata(4) controller stopped functioning.
Fix the driver, and FDT PCI support, enough to get the MPC8555CDS
going again. The FDT PCI code needs a whole lot more work.
No breakages are expected, but lackiong G5 hardware, it's possible
that there are unpleasant side-effects. At least MPC85xx support is
back to where it was 7 months ago -- it's amazing how badly support
can be broken in just 7 months...
Sponsored by: Juniper Networks
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init(). Consistently define mem_range_softc from
mem.c for all platforms. Add missing #include guards for machine/memdev.h
and sys/memrange.h. Clean up some nearby style(9) nits.
MFC after: 1 month
hypervisor infrastructure support:
- Fix coexistence of multiple platform modules in the same kernel
- Allow platform modules to provide an SMP topology
- PowerPC hypervisors limit the amount of memory accessible in real mode.
Allow the platform modules to specify the maximum real-mode address,
and modify the bits of the kernel that need to allocate
real-mode-accessible buffers to respect this limits.
MSR[DEEPNAP] be set, not just MSR[DEEPNAP]. Fixing this reduces the idle
temperature of my CPUs from 57 to 38 degrees and makes one-shot timer
mode work properly.
Hint from: mav
MFC after: 4 days
concurrency bug. Since all SLB/SR entries were invalidated during an
exception, a decrementer exception could cause the user segment to be
invalidated during a copyin()/copyout() without a thread switch that
would cause it to be restored from the PCB, potentially causing the
operation to continue on invalid memory. This is now handled by explicit
restoration of segment 12 from the PCB on 32-bit systems and a check in
the Data Segment Exception handler on 64-bit.
While here, cause copyin()/copyout() to check whether the requested
user segment is already installed, saving some pipeline flushes, and
fix the synchronization primitives around the mtsr and slbmte
instructions to prevent accessing stale segments.
MFC after: 2 weeks