kld_unload event handler which gets invoked after a linker file has been
successfully unloaded. The kld_unload and kld_load event handlers are now
invoked with the shared linker lock held, while kld_unload_try is invoked
with the lock exclusively held.
Convert hwpmc(4) to use these event handlers instead of having
kern_kldload() and kern_kldunload() invoke hwpmc(4) hooks whenever files are
loaded or unloaded. This has no functional effect, but simplifies the linker
code somewhat.
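For illustration, a minimal sketch of a consumer of the new handlers; the
handler signatures and the int *error veto mechanism shown here are
assumptions, not a verbatim copy of the KPI:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/eventhandler.h>
#include <sys/linker.h>

static eventhandler_tag load_tag, unload_try_tag;

/* Invoked with the shared linker lock held. */
static void
example_kld_load(void *arg __unused, linker_file_t lf __unused)
{
        printf("example: a linker file was loaded\n");
}

/* Invoked with the linker lock exclusively held; may veto the unload. */
static void
example_kld_unload_try(void *arg __unused, linker_file_t lf __unused,
    int *error)
{
        if (0)          /* placeholder: "we still depend on this file" */
                *error = EBUSY;
}

static void
example_register(void)
{
        load_tag = EVENTHANDLER_REGISTER(kld_load, example_kld_load,
            NULL, EVENTHANDLER_PRI_ANY);
        unload_try_tag = EVENTHANDLER_REGISTER(kld_unload_try,
            example_kld_unload_try, NULL, EVENTHANDLER_PRI_ANY);
}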
Reviewed by: jhb
with rmlocks. This works only with non-sleepable rmlocks because handlers run
in SWI context. While here, document the new KPI in the timeout(9)
manpage.
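A minimal sketch of the new KPI; the callout_init_rm() name and argument
order here mirror callout_init_mtx()/callout_init_rw() and are assumptions:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/callout.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

static struct rmlock example_lock;
static struct callout example_co;

/* Runs from SWI context with example_lock held around the handler. */
static void
example_timer(void *arg __unused)
{
        /* ... work protected by example_lock ... */
        callout_schedule(&example_co, hz);
}

static void
example_start(void)
{
        rm_init(&example_lock, "example");      /* non-sleepable rmlock only */
        callout_init_rm(&example_co, &example_lock, 0);
        callout_reset(&example_co, hz, example_timer, NULL);
}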
Requested by: adrian, scottl
Reviewed by: mav, remko(manpage)
existing examples to not pass an mbuf as a probe argument. There's no
obvious reason to have it there, and it doesn't really jibe with the example
added in this revision.
MFC after: 1 week
called after the module has been loaded, and the unload handlers are called
before the module is unloaded. Moreover, the module unload handlers may
return an error to prevent the unload from proceeding.
Reviewed by: avg
MFC after: 2 weeks
Now the MTX_RECURSE flag can be passed to the mtx_*_flags() calls.
This helps in cases where we want to narrow the possibility of recursion
on some locks down to specific call sites.
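A minimal sketch of the intent, assuming the flag is honoured per
acquisition for a mutex that was not initialized with MTX_RECURSE:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx example_mtx;

static void
example(void)
{
        mtx_init(&example_mtx, "example", NULL, MTX_DEF);

        mtx_lock(&example_mtx);
        /* Only this call site is allowed to recurse on the lock. */
        mtx_lock_flags(&example_mtx, MTX_RECURSE);
        mtx_unlock(&example_mtx);
        mtx_unlock(&example_mtx);
}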
Sponsored by: EMC / Isilon storage division
Reviewed by: jeff, alc
Tested by: pho
Unify the two concepts into a real, minimal sxlock where the shared
acquisition represents the soft busy and the exclusive acquisition
represents the hard busy.
The old VPO_WANTED mechanism becomes the hard path for this new lock,
and it becomes per-page rather than per-object.
The vm_object lock becomes an interlock for this functionality:
it can be held in either read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumptions about the busy state unless it is also busying the page.
Also:
- Add a new flag to directly share-busy pages while vm_page_alloc()
and vm_page_grab() are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag
The KPI is heavily changed, which is why the version is bumped.
It is very likely that some users of the VM KPI will need to change
their own code.
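A minimal sketch of the shared (soft) busy path; the
vm_page_sbusy()/vm_page_sunbusy() names are assumptions for the shared
acquisition and release, and real code would also handle a page that is
already hard-busied:

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_object.h>
#include <vm/vm_page.h>

static void
example_soft_busy(vm_object_t object, vm_pindex_t pindex)
{
        vm_page_t m;

        VM_OBJECT_RLOCK(object);  /* the object lock acts as the interlock */
        m = vm_page_lookup(object, pindex);
        if (m != NULL)
                vm_page_sbusy(m); /* shared acquisition == soft busy */
        VM_OBJECT_RUNLOCK(object);

        if (m == NULL)
                return;
        /* ... work that only needs the page soft-busied ... */
        vm_page_sunbusy(m);
}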
Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl
some general word-smithing.
- Don't claim that adaptive mutexes have a timeout (they don't).
- Don't treat pool mutexes as a separate primitive in a few places.
- Describe sleepable read-mostly locks as a separate lock type and add
them to the various tables.
- Don't claim that sx locks are less efficient. That hasn't been true for
a few years now.
- Describe lockmanager locks next to sx locks since they are very similar
in terms of rules, etc., and so that all the lock primitives are
grouped together before the non-lock primitives.
- Similarly, move the section on Giant after the description of all the
non-lock primitives to preserve grouping.
- Condition variables work on several types of locks, not just mutexes.
- Add a bit of language to compare/contrast condition variables with
sleep/wakeup.
- Add a note about why pause(9) is unique.
- Add some language to define bounded vs unbounded sleeps and explain
why they are treated separately (bounded sleeps only need CPU time
to make forward progress).
- Don't state that using mtx_sleep() is a bad idea. It is in fact rather
necessary.
- Rework the interaction table a bit. First, it did not really include
sleepable rmlocks, and it left out lockmgr entirely. To get
things to fit, combine similar lock types into the same column / row,
and explicitly state what "sleep" means. The notes about recursion
and lock order were also a bit banal (lock order is always important,
not just in the few places annotated here), so remove them. In
particular, the lock order note would need to be on just about every
cell. If we want to document recursion, I think a better approach
would be a separate table summarizing the recursion rules for each
lock, as having too many notes clutters the table.
- Tweak the tables to use less indentation so everything still fits with
the added columns.
- Correct a few cells in the context mode table.
- Use mdoc markup instead of explicit markup in a few places.
Requested by: julian
MFC after: 2 weeks
provided by Isilon.
- Add an rm_assert() supporting various lock assertions similar to other
locking primitives. Because rmlocks track readers, the assertions are
always fully accurate, unlike rw_assert() and sx_assert().
- Flesh out the lock class methods for rmlocks to support sleeping via
condvars and rm_sleep() (but only while holding write locks), rmlock
details in 'show lock' in DDB, and the lc_owner method used by
dtrace.
- Add an internal destroyed cookie so that API functions can assert
that an rmlock is not destroyed.
- Make use of rm_assert() to add various assertions to the API (e.g.
to assert locks are held when an unlock routine is called).
- Give RM_SLEEPABLE locks their own lock class and always use the
rmlock's own lock_object with WITNESS.
- Use THREAD_NO_SLEEPING() / THREAD_SLEEPING_OK() to disallow sleeping
while holding a read lock on an rmlock.
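Putting the pieces together, a minimal sketch; the RA_WLOCKED argument and
the rm_sleep() argument order (analogous to mtx_sleep()) are assumptions:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

static struct rmlock example_rm;
static int example_ready;

static void
example_wait(void)
{
        rm_wlock(&example_rm);
        rm_assert(&example_rm, RA_WLOCKED);
        /* Sleeping is only permitted with the write lock held; readers
           run under THREAD_NO_SLEEPING(). */
        while (!example_ready)
                rm_sleep(&example_ready, &example_rm, 0, "examp", 0);
        rm_wunlock(&example_rm);
}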
Submitted by: andre
Obtained from: EMC/Isilon
Introduce the counter(9) API, which implements fast and raceless counters,
provided for (but not limited to) gathering statistical data.
See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.
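A minimal usage sketch (the counter name and surrounding functions are made
up; the counter_u64_* calls are the API being introduced):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/counter.h>
#include <sys/malloc.h>

static counter_u64_t example_packets;

static void
example_init(void)
{
        example_packets = counter_u64_alloc(M_WAITOK);
}

static void
example_hot_path(void)
{
        /* Raceless per-CPU update on the fast path. */
        counter_u64_add(example_packets, 1);
}

static uint64_t
example_report(void)
{
        /* Aggregate the per-CPU values for reporting. */
        return (counter_u64_fetch(example_packets));
}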
In collaboration with: kib
Reviewed by: luigi
Tested by: ae, ray
Sponsored by: Nginx, Inc.
These zones have slab size == sizeof(struct pcpu), but request from VM
enough pages to fit (uk_slabsize * mp_ncpus). An item allocated from such a
zone has a separate twin for each CPU in the system, and these twins
are at a distance of sizeof(struct pcpu) from each other. This magic
distance value will allow us to make some optimizations later.
To address a private item for a given CPU, simple arithmetic should be used:
item = (type *)((char *)base + sizeof(struct pcpu) * curcpu)
This arithmetic is available as the zpcpu_get() macro in pcpu.h.
To introduce non-page-size slabs, a new field, uk_slabsize, has been added
to uma_keg. This shifted some frequently used fields of uma_keg to the
fourth cache line on amd64. To mitigate this pessimization, the uma_keg
fields were rearranged a bit, and the least frequently used uk_name and
uk_link were moved down to the fourth cache line. All other frequently
dereferenced fields fit into the first three cache lines.
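For illustration, a sketch of consuming such a zone via zpcpu_get(); the
zone and item here (a per-CPU 64-bit counter) are made up, and UMA_ZONE_PCPU
is assumed to be the flag requesting this layout:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <sys/pcpu.h>
#include <vm/uma.h>

static uma_zone_t pcpu64_zone;
static uint64_t *stats_base;

static void
example_init(void)
{
        pcpu64_zone = uma_zcreate("example pcpu", sizeof(uint64_t),
            NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_PCPU);
        stats_base = uma_zalloc(pcpu64_zone, M_WAITOK | M_ZERO);
}

static void
example_increment(void)
{
        uint64_t *p;

        critical_enter();
        p = zpcpu_get(stats_base);      /* this CPU's private twin */
        (*p)++;
        critical_exit();
}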
Sponsored by: Nginx, Inc.
The scope of these callbacks is primarily to support actions that affect the
taskqueue's thread environments. They are entirely optional, and
consequently are introduced as a new API: taskqueue_set_callback().
This interface allows the caller to specify that a taskqueue requires a
callback and optional context pointer for a given callback type.
The callback types included in this commit can be used to register a
constructor and destructor for thread-local storage using osd(9). This
allows a particular taskqueue to define that its threads require a specific
type of TLS, without the need for a specially-orchestrated task-based
mechanism for startup and shutdown in order to accomplish it.
Two callback types are supported at this point:
- TASKQUEUE_CALLBACK_TYPE_INIT, called by every thread when it starts, prior
to processing any tasks.
- TASKQUEUE_CALLBACK_TYPE_SHUTDOWN, called by every thread when it exits,
after it has processed its last task but before the taskqueue is
reclaimed.
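A minimal sketch of registering the callbacks on a private queue; the
callback prototype (a single context argument) is an assumption, and the
queue setup otherwise follows the usual taskqueue(9) pattern:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>

static struct taskqueue *example_tq;

/* Runs in each worker thread before it processes any task. */
static void
example_thread_init(void *ctx __unused)
{
        /* e.g. attach thread-local storage via osd(9) */
}

/* Runs in each worker thread after its last task, before teardown. */
static void
example_thread_shutdown(void *ctx __unused)
{
        /* e.g. release the thread-local storage */
}

static void
example_setup(void)
{
        example_tq = taskqueue_create("example", M_WAITOK,
            taskqueue_thread_enqueue, &example_tq);
        taskqueue_set_callback(example_tq, TASKQUEUE_CALLBACK_TYPE_INIT,
            example_thread_init, NULL);
        taskqueue_set_callback(example_tq, TASKQUEUE_CALLBACK_TYPE_SHUTDOWN,
            example_thread_shutdown, NULL);
        taskqueue_start_threads(&example_tq, 2, PWAIT, "example taskq");
}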
While I'm here:
- Add two new macros, TQ_ASSERT_LOCKED and TQ_ASSERT_UNLOCKED, and use them
in appropriate locations.
- Fix taskqueue.9 to mention taskqueue_start_threads(), which is a required
interface for all consumers of taskqueue(9).
Reviewed by: kib (all), eadler (taskqueue.9), brd (taskqueue.9)
Approved by: ken (mentor)
Sponsored by: Spectra Logic
MFC after: 1 month