Improve descriptions, remove turnstiles (since, from what I understand,
they are only used to implement other synchronization primitives), tweak
formatting.
MFC r203127:
Add description of bounded sleep vs unbounded sleep (aka blocking). Move
rules into their own section.
MFC r203131:
Cosmetic fixes.
MFC r203759:
Improve description for Giant and mention blocking inside interrupt threads.
MFC r203762:
Start sentences with a new line.
Submitted by: brueffer
MFC r203825:
Remove list of locking primitives, which is kind of redundant, move
information about witness(9) to the section about interactions, and
expand 'contexts' table.
MFC r203929:
Some rewording and language fixes.
PR: docs/136918, docs/134074
Submitted by: Ben Kaduk <kaduk at mit dot edu>, Haven Hash <havenster at gmail dot com>
Introduce the new kernel thread called "deadlock resolver".
It is used in order to seek within the threads state and heuristically
understand if there is any deadlock happening.
In order to implement it, the sq_type in sleepqueues is mandatory and not
only compiled along with INVARIANTS option. Additively, a new sleepqueue
function, sleepq_type() is added, returning the type of the sleepqueue
linked to a wchan.
Three new sysctls are added in order to configure the thread:
debug.deadlkres.slptime_threshold
debug.deadlkres.blktime_threshold
debug.deadlkres.sleepfreq
rappresenting the thresholds for sleep and block time that will lead to
a deadlock matching (when exceeded), while the sleepfreq rappresents the
number of seconds between 2 consecutive thread runnings.
In order to enable the deadlock resolver thread recompile your kernel
with the option DEADLKRES.
Sponsored by: Sandvine Incorporated
Add a facility for associating optional descriptions with active interrupt
handlers. This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
a description with an active interrupt handler setup by BUS_SETUP_INTR.
It has a default method (bus_generic_describe_intr()) which simply passes
the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
an interrupt handler and copy the name passed to intr_event_add_handler()
into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64, i386, and sparc64 by
having the nexus(4) driver supply a custom bus_describe_intr method that
invokes a new intr_describe() MD routine which in turn looks up the
associated interrupt event and invokes intr_event_describe_handler().
- Note that if_xname, if_dname, and if_dunit are usually initialized via
if_initname().
- Document if_drv_flags and replace references to IFF_(RUNNING|OACTIVE)
with references to IFF_DRV_(RUNNING|OACTIVE).
- Complete truncated sentence in the description of if_transmit by copying
from the description in if_qflush.
- Add missing line breaks for translators.
- Update required headers for namei() to add <sys/fcntl.h> and remove
<sys/proc.h>.
- Add RETURN VALUES and ERROR sections for namei()'s error return values.
- Add a missing link to NDHASGIANT.9.
In current code, threads performing an interruptible sleep
will leave the waiters flag on forcing the owner to do a wakeup even
when the waiter queue is empty.
That operation may lead to a deadlock in the case of doing a fake wakeup
on the "preferred" queue while the other queue has real waiters on it,
because nobody is going to wakeup the 2nd queue waiters and they will
sleep indefinitively.
A similar bug, is present, for lockmgr in the case the waiters are
sleeping with LK_SLEEPFAIL on.
Add a sleepqueue interface which does report the actual number of waiters
on a specified queue of a waitchannel and track if at least one sleepfail
waiter is present or not. In presence of this or empty "preferred" queue,
wakeup both waiters queues.
Discussed with: kib
Tested by: Pete French <petefrench at ticketswitch dot com>,
Justin Head <justin at encarnate dot com>
Add clarifications to the kproc and kthread manpages and link
the kthread_create(9) man page to the kproc(9) page as it has migrated and
people looking for it may need a hand to find its new name.
Approved by: re (kib)
This patch fixes two bugs in sglist(9) and improves robustness of the API via
better semantics if a request to append an address range to an existing list
fails.
- When cloning an sglist, properly set the length in the new sglist instead of
leaving the new list empty.
- Properly compute the amount of data added to an sglist via
_sglist_append_buf(). This allows sglist_consume_uio() to properly update
uio_resid.
- When a request to append an address range to a scatter/gather list fails,
restore the sglist to the state it had at the start of the function call
instead of resetting it to an empty list.
Approved by: re (kib)
Change the 'resid' parameter to sglist_consume_uio() from an int to a
size_t to match the recent type change of the uio_resid member of struct
uio.
Approved by: re (kib)
Remove OpenSolaris taskq port (it performs very poorly in our kernel) and
replace it with wrappers around our taskqueue(9).
To make it possible implement taskqueue_member() function which returns 1
if the given thread was created by the given taskqueue.
Approved by: re (kib)
First (early) draft of net80211 documentation. Note this is
focused on driver writers (as opposed to folks adding to net80211).
Reviewed by: wkoszek
Approved by: re (rwatson)
things a bit:
- use dpcpu data to track the ifps with packets queued up,
- per-cpu locking and driver flags
- along with .nh_drainedcpu and NETISR_POLICY_CPU.
- Put the mbufs in flight reference count, preventing interfaces
from going away, under INVARIANTS as this is a general problem
of the stack and should be solved in if.c/netisr but still good
to verify the internal queuing logic.
- Permit changing the MTU to virtually everythinkg like we do for loopback.
Hook epair(4) up to the build.
Approved by: re (kib)
- update for getrlimit(2) manpage;
- support for setting RLIMIT_SWAP in login class;
- addition to the limits(1) and sh and csh limit-setting builtins;
- tuning(7) documentation on the sysctls controlling overcommit.
In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)
Actually, as it did receive few tuning, the support is disabled by
default, but it can opt-in with the option ADAPTIVE_LOCKMGRS.
Due to the nature of lockmgrs, adaptive spinning needs to be
selectively enabled for any interested lockmgr.
The support is bi-directional, or, in other ways, it will work in both
cases if the lock is held in read or write way. In particular, the
read path is passible of further tunning using the sysctls
debug.lockmgr.retries and debug.lockmgr.loops . Ideally, such sysctls
should be axed or compiled out before release.
Addictionally note that adaptive spinning doesn't cope well with
LK_SLEEPFAIL. The reason is that many (and probabilly all) consumers
of LK_SLEEPFAIL are mainly interested in knowing if the interlock was
dropped or not in order to reacquire it and re-test initial conditions.
This directly interacts with adaptive spinning because lockmgr needs
to drop the interlock while spinning in order to avoid a deadlock
(further details in the comments inside the patch).
Final note: finding someone willing to help on tuning this with
relevant workloads would be either very important and appreciated.
Tested by: jeff, pho
Requested by: many
queue was drained. It will never fire for a directly dispatched packet.
You will most likely never want to use this for any ordinary netisr usage
and you will never blame netisr in case you try to use it and it does
not work as expected.
Reviewed by: rwatson
probe. The current device order is unchanged. This commit just adds the
infrastructure and ABI changes so that it is easier to merge later changes
into 8.x.
- Driver attachments now have an associated pass level. Attachments are
not allowed to probe or attach to drivers until the system-wide pass level
is >= the attachment's pass level. By default driver attachments use the
"last" pass level (BUS_PASS_DEFAULT). Driver's that wish to probe during
an earlier pass use EARLY_DRIVER_MODULE() instead of DRIVER_MODULE() which
accepts the pass level as an additional parameter.
- A new method BUS_NEW_PASS has been added to the bus interface. This
method is invoked when the system-wide pass level is changed to kick off
a rescan of the device tree so that drivers that have just been made
"eligible" can probe and attach.
- The bus_generic_new_pass() function provides a default implementation of
BUS_NEW_PASS(). It first allows drivers that were just made eligible for
this pass to identify new child devices. Then it propogates the rescan to
child devices that already have an attached driver by invoking their
BUS_NEW_PASS() method. It also reprobes devices without a driver.
- BUS_PROBE_NOMATCH() is only invoked for devices that do not have
an attached driver after being scanned during the final pass.
- The bus_set_pass() function is used during boot to raise the pass level.
Currently it is only called once during root_bus_configure() to raise
the pass level to BUS_PASS_DEFAULT. This has the effect of probing all
devices in a single pass identical to previous behavior.
Reviewed by: imp
Approved by: re (kib)
Each list describes a logical memory object that is backed by one or more
physical address ranges. To minimize locking, the sglist objects
themselves are immutable once they are shared.
These objects may be used in the future to facilitate I/O requests using
physically-addressed buffers. For the immediate future I plan to use them
to implement a new type of VM object and pager.
Reviewed by: jeff, scottl
MFC after: 1 month
permissions, such as VWRITE_ACL. For a filsystems that don't
implement it, there is a default implementation, which works
as a wrapper around VOP_ACCESS.
Reviewed by: rwatson@