The p_can(...) construct was a premature (and, it turns out,
awkward) abstraction. The individual calls to p_canxxx() better
reflect differences between the inter-process authorization checks,
such as differing checks based on the type of signal. This has
a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
invocation of p_candebug(), while maintaining the special case
check of KTR_ROOT. This allows ktrace() to "play more nicely"
with new mandatory access control schemes, as well as making its
authorization checks consistent with other "debugging class"
checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
caller to determine if privilege was required for successful
evaluation of the access control check. This primitive is currently
unused, and as such, serves only to complicate the API.
Approved by: ({procfs,linprocfs} changes) des
Obtained from: TrustedBSD Project
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
These take an additional mutex argument, which is dropped before any
processes are made runnable. This can avoid contention on the mutex
if the processes would immediately acquire it, and is done in such a
way that wakeups will not be lost.
Reviewed by: jhb
We already did this in the SMP case, and it is now maintained in the UP
case as well, and makes the code slightly more readable. Note that
curproc is always executing, thus the p != curproc test does not need to
be performed if the p_oncpu check is made.
We don't actually need to force a context switch of the current process.
The act of firing the event triggers a context switch to softclock() and
then switching back out again which is equivalent to a preemption, thus
no further work is needed on the local CPU.
allow call-specific authorization.
o Modify the authorization model so that p_can() is used to check
scheduling get/set events, using P_CAN_SEE for gets, and P_CAN_SCHED
for sets. This brings the checks in line with get/setpriority().
Obtained from: TrustedBSD Project
- The sx assertions don't actually need the internal sx mutex lock, so
don't bother doing so.
- Add a new assertion SX_ASSERT_LOCKED() that asserts that either a
shared or exclusive lock should be held. This assertion should be used
instead of SX_ASSERT_SLOCKED() in almost all cases.
- Adjust some KASSERT()'s to include file and line information.
- Use the new witness_assert() function in the WITNESS case for sx slock
asserts to verify that the current thread actually owns a slock.
- Clean up the KTR tracepoints to be slighlty more consistent and useful
- Fix a bug in WITNESS where we would recurse indefinitely and blow the
stack when acquiring Giant after sleeping with a sleepable lock held.
Reported by: tanimura (3)
processes.
- Don't construct fake call args and then call kill(). psignal is not
anymore complicated and is quicker and not prone to locking problems.
Calling psignal() avoids having to do a pfind() since we already have a
proc pointer and also allows us to keep the task leader locked while we
kill all the peer processes so the list is kept coherent.
- When a kthread exits, do a wakeup() on its proc pointers. This can be
used by kernel modules that have kthreads and want to ensure they have
safely exited before completely the MOD_UNLOAD event.
Connectivity provided by: Usenix wireless
may need the clock lock for nanotime().
- Add KTR trace events for lock list manipulations and other witness
operations.
- Use a temporary variable instead of setting the lock list head directly
and then setting up the links to add a new lock list entry to the lock
list. This small race could result in witness "forgetting" about all
the locks held by this process temporarily during an interrupt.
- Close a more fatal race condition when removing a lock from a list.
Removing a lock from the list entails both decrementing the count of
items in this bucket as well as shuffling items in the current bucket up
a notch to replace the gap left by the removed item. Wrap these
operations in a critical section.
class to trace witness events.
- Make the ktr_cpu field of ktr_entry be a standard field rather than one
present only in the KTR_EXTEND case.
- Move the default definition of KTR_ENTRIES from sys/ktr.h to
kern/kern_ktr.c. It has not been needed in the header file since KTR
was un-inlined.
- Minor include cleanup in kern/kern_ktr.c.
- Fiddle with the ktr_cpumask in ktr_tracepoint() to disable KTR events
on the current CPU while we are processing an event.
- Set the current CPU inside of the critical section to ensure we don't
migrate CPU's after the critical section but before we set the CPU.
switch. Count the context switch when preempting the current thread to let
a higher priority thread blocked on a mutex we just released run as an
involuntary context switch.
Reported by: bde
people are on track with the cause and effect of this, and although
fixing this severely degenerate case appears to violate the letter of
POSIX.1-200x, Bruce and I (and enough others) agree that it should be
comitted.
So, this patch generates an ENOENT error for any attempt to do a path lookup
through an empty symlink (e.g. open(), stat()).
Submitted by: "Andrey A. Chernov" <ache@nagual.pp.ru>
Reviewed by: bde
Discussed exhaustively on: freebsd-current
Previously committed to: NetBSD 4 years ago
- Grab Giant around ktrace points.
- Clean up KTR_PROC tracepoints to not display the value of
sched_lock.mtx_lock as it isn't really needed anymore and just obfuscates
the messages.
- Add a few if conditions to replace gotos.
- Ensure that every msleep KTR event ends up with a matching msleep resume
KTR event (this was broken when we didn't do a mi_switch()).
- Only note via ktrace that we resumed from a switch once rather than twice
in several places in msleep().
- Remove spl's rom asleep and await as the proc lock and sched_lock provide
all the needed locking.
- In mawait() add in a needed ktrace point for noting that we are about to
switch out.
lock until after grabbing the sched_lock to avoid CURSIG racing with
psignal.
- Don't grab Giant for addupc_task() as it isn't needed.
Reported by: tegge (signal race), bde (addupc_task a while back)
rather than grabbing it and releasing it themselves. This allows callers
of these functions to get the lock to close race conditions.
- Grab Giant around ktrace in postsig.
- Count the switches performed on SIGSTOP's as involuntary context switches
in the resource usage stats.
Reported by: tegge (signal race), bde (missing csw stats)
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:
o Reduce contention for SMP by offering per-CPU pools and locks.
o Better use of data cache due to per-CPU pools.
o Much less code cache pollution due to excessively large allocation macros.
o Framework for `grouping' objects from same page together so as to be able
to possibly free wired-down pages back to the system if they are no longer
needed by the network stacks.
Additional things changed with this addition:
- Moved some mbuf specific declarations and initializations from
sys/conf/param.c into mbuf-specific code where they belong.
- m_getclr() has been renamed to m_get_clrd() because the old name is really
confusing. m_getclr() HAS been preserved though and is defined to the new
name. No tree sweep has been done "to change the interface," as the old
name will continue to be supported and is not depracated. The change was
merely done because m_getclr() sounds too much like "m_get a cluster."
- TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
systat(1) (see TODO below).
- Fixed systat(1) to display number of "free mbufs" based on new per-CPU
stat structures.
- Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
per-CPU stat structures. All infos are fetched via sysctl.
TODO (in order of priority):
- Re-enable mbtypes statistics in both netstat(1) and systat(1) after
introducing an SMP friendly way to collect the mbtypes stats under the
already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
seems too costly for a mere stat update, especially when other locks are
already present).
- Optionally have systat(1) display not only "total free mbufs" but also
"total free mbufs per CPU pool."
- Fix minor length-fetching issues in netstat(1) related to recently
re-enabled option to read mbuf stats from a core file.
- Move reference counters at least for mbuf clusters into an unused portion
of the cluster itself, to save space and need to allocate a counter.
- Look into introducing resource freeing possibly from a kproc.
Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/
lock. We now use temporary variables to save the process argument pointer
and just update the pointer while holding the lock. We then perform the
free on the cached pointer after releasing the lock.
something: offset into the first mbuf of the target chain before copying
the source data over.
Make drivers using m_devget() with a first argument "data - ETHER_ALIGN"
to use the offset argument to pass ETHER_ALIGN in. The way it was previously
done is potentially dangerous if the source data was at the top of a page
and the offset caused the previous page to be copied (if the
previous page has not yet been appropriately mapped).
The old `offset' argument in m_devget() is not used anywhere (it's always
0) and dates back to ~1995 (and earlier?) when support for ethernet trailers
existed. With that support gone, it was merely collecting dust.
Tested on alpha by: jlemon
Partially submitted by: jlemon
Reviewed by: jlemon
MFC after: 3 weeks