don't try to convert the argument flags to malloc flags, or we risk
implicitly requesting blocking and generating witness warnings.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
When enabled, this causes m_defrag to randomly return NULL (following
its normal failure case, so that extra memory leaks are not introduced).
Code similar to this was used to find and fix a few bugs last week.
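A minimal userspace sketch of the failure-injection idea. Everything here is hypothetical: `struct buf_chain` stands in for an mbuf chain, `toy_defrag()` for m_defrag, and `defrag_random_fail` for whatever knob enables the feature. The point it shows is that an injected failure takes exactly the same path as a real one (return NULL, leave the original chain alone), so callers need no extra cleanup code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an mbuf chain (hypothetical, not struct mbuf). */
struct buf_chain {
	struct buf_chain *next;
	size_t len;
	char data[64];
};

/* Hypothetical knob: when non-zero, fail roughly 1 in N calls. */
static int defrag_random_fail = 0;

static void
toy_free_chain(struct buf_chain *c)
{
	struct buf_chain *next;

	for (; c != NULL; c = next) {
		next = c->next;
		free(c);
	}
}

/*
 * Coalesce a chain into one buffer.  Every failure, injected or real,
 * returns NULL and leaves the original chain untouched, so a caller
 * that already frees the chain on NULL handles both cases.
 */
static struct buf_chain *
toy_defrag(struct buf_chain *c)
{
	struct buf_chain *n, *p;
	size_t total = 0;

	if (defrag_random_fail != 0 && rand() % defrag_random_fail == 0)
		return (NULL);			/* injected failure */

	for (p = c; p != NULL; p = p->next)
		total += p->len;
	if (total > sizeof(n->data))
		return (NULL);			/* does not fit: normal failure */
	n = calloc(1, sizeof(*n));
	if (n == NULL)
		return (NULL);			/* allocation failure */
	for (p = c; p != NULL; p = p->next) {
		memcpy(n->data + n->len, p->data, p->len);
		n->len += p->len;
	}
	toy_free_chain(c);			/* success: old chain released */
	return (n);
}
```

Setting the knob to 1 makes every call fail (`rand() % 1` is always 0), which exercises a caller's error path deterministically.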
returning some additional room in the first mbuf in a chain, and
avoiding feature-specific contents in the mbuf header. To do this:
- Modify mbuf_to_label() to extract the tag, returning NULL if not
found.
- Introduce mac_init_mbuf_tag() which does most of the work
mac_init_mbuf() used to do, except on an m_tag rather than an
mbuf.
- Scale back mac_init_mbuf() to perform m_tag allocation and invoke
mac_init_mbuf_tag().
- Replace mac_destroy_mbuf() with mac_destroy_mbuf_tag(), since
m_tags are now garbage-collected deep in the m_tag/mbuf code rather
than at a higher level when mbufs are directly free()'d.
- Add mac_copy_mbuf_tag() to support m_copy_pkthdr() and related
notions.
- Generally change all references to mbuf labels so that they use
mbuf_to_label() rather than &mbuf->m_pkthdr.label. This
required no changes in the MAC policies (yay!).
- Tweak mbuf release routines to not call mac_destroy_mbuf();
tag destruction takes care of it for us now.
- Remove MAC magic from m_copy_pkthdr() and m_move_pkthdr() --
the existing m_tag support does all this for us. Note that
we can no longer just zero the m_tag list on the target mbuf;
rather, we have to delete the chain because m_tags will
already be hung off freshly allocated mbufs.
- Tweak m_tag copying routines so that if we're copying a MAC
m_tag, we don't do a binary copy; rather, we initialize the
new storage and do a deep copy of the label.
- Remove use of MAC_FLAG_INITIALIZED in a few bizarre places
that previously dealt with mbuf header copies.
- When an mbuf is copied in ip_input(), we no longer need to
explicitly copy the label because it will get handled by the
m_tag code now.
- There is no longer any weird handling of MAC labels in if_loop.c
during header copies.
- Add MPC_LOADTIME_FLAG_LABELMBUFS flag to Biba, MLS, mac_test.
In mac_test, handle the label==NULL case, since it can be
dynamically loaded.
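To illustrate the shape of the tag-based design, here is a minimal sketch with hypothetical types throughout (`toy_tag`, `toy_pkthdr`, `toy_label`, `MAC_TAG_ID`). It models a packet header carrying a linked list of tags: the label accessor returns NULL when no MAC tag is present, and copying a MAC tag allocates fresh storage and deep-copies the label instead of doing a binary copy. It does not reproduce the kernel's actual m_tag or MAC functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAC_TAG_ID 1	/* hypothetical tag type for MAC labels */

/* Hypothetical label: owns heap storage, so a binary copy would alias it. */
struct toy_label {
	char *policy_data;
};

struct toy_tag {
	struct toy_tag *next;
	int id;
	struct toy_label label;	/* payload; only MAC tags use it here */
};

struct toy_pkthdr {
	struct toy_tag *tags;
};

/* Find the MAC tag; NULL if the packet carries no label storage. */
static struct toy_label *
toy_mbuf_to_label(struct toy_pkthdr *h)
{
	struct toy_tag *t;

	for (t = h->tags; t != NULL; t = t->next)
		if (t->id == MAC_TAG_ID)
			return (&t->label);
	return (NULL);
}

/* Copy one tag; a MAC tag gets new storage plus a deep label copy. */
static struct toy_tag *
toy_tag_copy(const struct toy_tag *src)
{
	struct toy_tag *dst;
	size_t len;

	dst = calloc(1, sizeof(*dst));
	if (dst == NULL)
		return (NULL);
	dst->id = src->id;
	if (src->id == MAC_TAG_ID && src->label.policy_data != NULL) {
		len = strlen(src->label.policy_data) + 1;
		dst->label.policy_data = malloc(len);
		if (dst->label.policy_data == NULL) {
			free(dst);
			return (NULL);
		}
		memcpy(dst->label.policy_data, src->label.policy_data, len);
	}
	return (dst);
}
```

Because callers only go through the accessor, they never need to know whether the label lives in the header or in a tag.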
In order to improve performance with this change, introduce the notion
of "lazy MAC label allocation" -- only allocate m_tag storage for MAC
labels if we're running with a policy that uses MAC labels on mbufs.
Policies declare this intent by setting the MPC_LOADTIME_FLAG_LABELMBUFS
flag in their load-time flags field during declaration. Note: this
opens up the possibility of post-boot policy modules getting back NULL
slot entries even though they have policy invariants of non-NULL slot
entries, as the policy might have been loaded after the mbuf was
allocated, leaving the mbuf without label storage. Policies that cannot
handle this case must be declared as NOTLATE, or must be modified.
- mac_labelmbufs holds the current cumulative status as to whether
any policies require mbuf labeling or not. This is updated whenever
the active policy set changes by the function mac_policy_updateflags().
The function iterates over the list and checks whether any have the
flag set. Write access to this variable is protected by the policy
list; read access is currently not protected for performance reasons.
This might change if it causes problems.
- Add MAC_POLICY_LIST_ASSERT_EXCLUSIVE() to permit the flags update
function to assert appropriate locks.
- This makes allocation in mac_init_mbuf() conditional on the flag.
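The flag recomputation and the conditional allocation can be sketched as follows. This is a hypothetical userspace model: the policy array, `toy_policy_updateflags()`, `toy_init_mbuf_label()`, the flag value and the label size are all stand-ins, and the locking described above (writes under the policy list lock, unlocked reads) appears only as comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_LOADTIME_FLAG_LABELMBUFS 0x1	/* hypothetical value */

struct toy_policy {
	const char *name;
	int loadtime_flags;
};

/* Cumulative status: does any active policy want mbuf labels? */
static int toy_labelmbufs;

/*
 * Recompute toy_labelmbufs from the active policy set.  In the kernel
 * this would run with the policy list held exclusively; readers of the
 * flag are deliberately unlocked.
 */
static void
toy_policy_updateflags(const struct toy_policy *pols, size_t n)
{
	size_t i;
	int labelmbufs = 0;

	for (i = 0; i < n; i++)
		if (pols[i].loadtime_flags & TOY_LOADTIME_FLAG_LABELMBUFS)
			labelmbufs = 1;
	toy_labelmbufs = labelmbufs;
}

/* Lazy allocation: only create label storage if some policy needs it. */
static void *
toy_init_mbuf_label(void)
{
	if (!toy_labelmbufs)
		return (NULL);	/* no tag, so the label lookup yields NULL */
	return (calloc(1, 32));	/* hypothetical label size */
}
```

An mbuf allocated while the flag is clear keeps a NULL label even after a labeling policy loads, which is exactly the case late-loaded policies must tolerate or avoid via NOTLATE.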
Reviewed by: sam
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
mbuf_to_label(). This permits the vast majority of entry point code
to be unaware that labels are stored in m->m_pkthdr.label, such that
we can experiment with storing labels elsewhere (such as in m_tags).
Reviewed by: sam
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
the current queue if its priority is really elevated. This needs more work,
as there are cases where a KSE on the next queue could be holding up what
would be a current-queue KSE, thus hurting interactivity. Also, when a thread
with an elevated priority has its priority lowered, it should be placed
back on the next queue.
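A minimal sketch of the queue-selection rule, assuming a hypothetical threshold `TOY_PRI_ELEVATED` and a toy `struct toy_runqs` that only counts entries on each queue. As in the BSD schedulers, a numerically lower priority value means a higher priority. This is not ULE's actual code.

```c
#include <assert.h>

#define TOY_PRI_ELEVATED 160	/* hypothetical: below this counts as elevated */

enum toy_queue { TOY_CURR, TOY_NEXT };

struct toy_runqs {
	int curr_count;	/* KSEs on the current queue */
	int next_count;	/* KSEs on the next queue */
};

/* Elevated (numerically low) priorities go straight to the current queue. */
static enum toy_queue
toy_choose_queue(int priority)
{
	return (priority < TOY_PRI_ELEVATED ? TOY_CURR : TOY_NEXT);
}

static enum toy_queue
toy_runq_add(struct toy_runqs *rq, int priority)
{
	enum toy_queue q = toy_choose_queue(priority);

	if (q == TOY_CURR)
		rq->curr_count++;
	else
		rq->next_count++;
	return (q);
}
```

As the message notes, a KSE whose priority later drops is not moved back to the next queue; this sketch does not model that either.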
the target process exiting which causes attempts to register the kevent
to randomly fail depending on whether the target runs to completion before
the parent can call kevent(2). The bug actually affects EVFILT_PROC
events on any zombie process, but the most common manifestation is with
parents trying to monitor child processes.
MFC after: 2 weeks
Sponsored by: NTT Multimedia Communications Labs
the second kseq's run queue so that it is referenced by the kse when
it is switched out.
- Spell ksq_rslices properly.
Reported by: Ian Freislich <ianf@za.uu.net>
- Allow user adjustable min and max time slices (suggested by hiten).
- Change SLP_RUN_MAX from 2 seconds to 100ms so that we learn much more
quickly whether a process is interactive.
- Place a process on the current run queue if it is interactive or if it is
running at an interrupt thread priority due to priority propagation.
- Use the 'current' timeshare queue for interrupt threads, realtime threads,
and idle threads that are running at higher priority due to priority propagation.
This fixes problems where priorities would have been elevated but we would
not check the timeshare run queue until other lower priority tasks were
no longer runnable.
- Keep an array of loads indexed by the priority class as well as a global
load.
- Keep a bucket of nice values with a count of the number of KSEs currently
runnable with each nice value.
- Keep track of the minimum nice value of any running thread.
- Remove the unused short term sleep accounting. I was attempting to use
this for load balancing but it didn't work out.
- Define a kseq_print() for use with debugging.
- Add KTR debugging at useful places so we can easily debug slice and
priority assignment.
- Decouple the runq assignment from the kseq assignment. kseq_add now keeps
track of statistics. This is done so that the nice and load are still
tracked for the currently running process. Previously, if a niced process
was added while a non-niced process was running, the niced process would
still get a slice, since the scheduler was not aware of the un-niced process.
- Make adjustments for the sched api changes.
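The per-queue bookkeeping in the list above (a global load, a load per priority class, and a bucket of nice values from which the minimum is found) can be sketched like this. The structure, the class count and the function names are hypothetical; the nice range [-20, 20] is the traditional BSD one. Statistics are updated independently of which run queue a KSE is placed on.

```c
#include <assert.h>

#define TOY_PRIO_MIN	(-20)
#define TOY_PRIO_MAX	20
#define TOY_NCLASSES	4	/* hypothetical: ithd, realtime, timeshare, idle */

struct toy_kseq {
	int load;					/* global load */
	int load_class[TOY_NCLASSES];			/* load per class */
	int nice[TOY_PRIO_MAX - TOY_PRIO_MIN + 1];	/* runnable KSEs per nice */
};

/* Account for a runnable KSE, regardless of which run queue it is on. */
static void
toy_kseq_add(struct toy_kseq *kq, int class, int nice)
{
	kq->load++;
	kq->load_class[class]++;
	kq->nice[nice - TOY_PRIO_MIN]++;
}

static void
toy_kseq_rem(struct toy_kseq *kq, int class, int nice)
{
	assert(kq->nice[nice - TOY_PRIO_MIN] > 0);
	kq->load--;
	kq->load_class[class]--;
	kq->nice[nice - TOY_PRIO_MIN]--;
}

/* Smallest nice value among runnable KSEs; TOY_PRIO_MAX + 1 if none. */
static int
toy_kseq_nice_min(const struct toy_kseq *kq)
{
	int i;

	for (i = 0; i <= TOY_PRIO_MAX - TOY_PRIO_MIN; i++)
		if (kq->nice[i] != 0)
			return (i + TOY_PRIO_MIN);
	return (TOY_PRIO_MAX + 1);
}
```

Keeping counts per nice value, rather than a single minimum, means the minimum can be recovered correctly after the KSE that held it is removed.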
of ksegs since they primarily operate on processes.
- KSEs take ticks so pass the kse through sched_clock().
- Add a sched_class() routine that adjusts a ksegrp pri class.
- Define a sched_fork_{kse,thread,ksegrp} and sched_exit_{kse,thread,ksegrp}
that will be used to tell the scheduler about new instances of these
structures within the same process. These will be used by THR and KSE.
- Change sched_4bsd to reflect this API update.
by allprison_mtx), a unique prison/jail identifier field, two path
fields (pr_path for reporting and pr_root vnode instance) to store
the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
struct xprison instances that represent a snapshot of active jails.
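A userspace model of the bookkeeping above, assuming pthreads in place of the kernel mutex: a global prison list protected by a lock (standing in for allprison_mtx), a monotonically assigned unique identifier, and a snapshot routine that copies entries into a caller-supplied array of fixed-size records, roughly what a list sysctl would export. All `toy_` names and the record layout are hypothetical and are not the real struct xprison fields.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_prison {
	struct toy_prison *next;
	int id;			/* unique identifier */
	char path[64];		/* chroot point, for reporting */
};

/* Fixed-size export record (hypothetical layout). */
struct toy_xprison {
	int id;
	char path[64];
};

static pthread_mutex_t toy_allprison_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct toy_prison *toy_allprison;	/* protected by the mutex */
static int toy_lastprid;			/* protected by the mutex */

/* Create a prison and link it onto the global list; returns its id. */
static int
toy_prison_create(const char *path)
{
	struct toy_prison *pr = calloc(1, sizeof(*pr));
	int id;

	if (pr == NULL)
		return (-1);
	strncpy(pr->path, path, sizeof(pr->path) - 1);	/* stays NUL-terminated */
	pthread_mutex_lock(&toy_allprison_mtx);
	id = pr->id = ++toy_lastprid;
	pr->next = toy_allprison;
	toy_allprison = pr;
	pthread_mutex_unlock(&toy_allprison_mtx);
	return (id);
}

/* Snapshot up to max entries into out; returns the number copied. */
static size_t
toy_prison_list(struct toy_xprison *out, size_t max)
{
	struct toy_prison *pr;
	size_t n = 0;

	pthread_mutex_lock(&toy_allprison_mtx);
	for (pr = toy_allprison; pr != NULL && n < max; pr = pr->next) {
		out[n].id = pr->id;
		memcpy(out[n].path, pr->path, sizeof(out[n].path));
		n++;
	}
	pthread_mutex_unlock(&toy_allprison_mtx);
	return (n);
}
```

Copying into flat records under the lock gives a consistent snapshot without exposing kernel pointers to the consumer.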
Reviewed by: rwatson, tjr
of asserting that an mbuf has a packet header. Use it instead of hand-
rolled versions wherever applicable.
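A minimal sketch of such a shared assertion, using standard `assert()` in place of the kernel's KASSERT() and a hypothetical `TOY_M_PKTHDR` flag bit; the macro name `TOY_ASSERTPKTHDR` and `struct toy_mbuf` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_M_PKTHDR	0x0002	/* hypothetical: "mbuf has a packet header" */

struct toy_mbuf {
	int m_flags;
	int pkt_len;	/* only meaningful when TOY_M_PKTHDR is set */
};

/* One shared check instead of hand-rolled variants at each call site. */
#define TOY_ASSERTPKTHDR(m)						\
	assert((m) != NULL && ((m)->m_flags & TOY_M_PKTHDR) != 0)

static int
toy_pkt_len(const struct toy_mbuf *m)
{
	TOY_ASSERTPKTHDR(m);
	return (m->pkt_len);
}
```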
Submitted by: Hiten Pandya <hiten@unixdaemons.com>
- Treat each class specially in kseq_{choose,add,rem}. Let the rest of the
code be less aware of scheduling classes.
- Skip the interactivity calculation for non TIMESHARE ksegrps.
- Move slice and runq selection into kseq_add(). Uninline it now that it's
big.
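One way to picture confining class awareness to the kseq routines: a single switch decides where each class goes, so the rest of the scheduler just hands over a KSE. The class names mirror the usual BSD scheduling classes, but the three-queue layout and every `toy_` identifier are assumptions for illustration. The interactivity flag is consulted only for timeshare KSEs, in line with skipping that calculation for other classes.

```c
#include <assert.h>

enum toy_class { TOY_ITHD, TOY_REALTIME, TOY_TIMESHARE, TOY_IDLE };
enum toy_queue { TOY_Q_CURR, TOY_Q_NEXT, TOY_Q_IDLE };

struct toy_kse {
	enum toy_class class;
	int interactive;	/* consulted only for TIMESHARE */
};

/* All class-specific placement lives here; callers stay class-agnostic. */
static enum toy_queue
toy_kseq_queue_for(const struct toy_kse *ke)
{
	switch (ke->class) {
	case TOY_ITHD:
	case TOY_REALTIME:
		return (TOY_Q_CURR);
	case TOY_TIMESHARE:
		return (ke->interactive ? TOY_Q_CURR : TOY_Q_NEXT);
	case TOY_IDLE:
	default:
		return (TOY_Q_IDLE);
	}
}
```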
as it could be and can do with some more cleanup. Currently it's under
options LAZY_SWITCH. What this does is avoid %cr3 reloads for short
context switches that do not involve another user process, i.e. we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the TLB. However, this isn't as exciting as it could
be: the interrupt overhead is still high and too much still blocks on
Giant. There are some debug sysctls, for stats and for an on/off switch.
The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.
It's not compiled in unless you add the LAZY_SWITCH option. I want to fix a
few more things and get some more feedback before turning it on by default.
This is NOT a replacement for Bosko's lazy interrupt stuff. This was more
meant for the kthread case, while his was for interrupts. Mine helps a
little for interrupts, but his helps a lot more.
The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.
One non-trivial change was to select a new thread before calling
cpu_switch() in the first place. This allows us to catch the silly
case of doing a cpu_switch() to the current process. This happens
uncomfortably often. This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle). This has been
implemented on i386 and (thanks to jake) sparc64. The others will come
soon. This is actually separate from the lazy switch stuff.
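A minimal model of both ideas, with hypothetical types: the next thread is chosen before switching so the switch-to-self case can be skipped, and the address-space reload (standing in for a %cr3 write and the implied TLB flush) is skipped when the incoming thread is a kernel thread that can borrow whatever address space is already loaded. This is not the actual cpu_switch() or pmap code, and the IPI used to evict a borrowed pmap when its process exits is not modeled.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vmspace { int id; };

struct toy_thread {
	struct toy_vmspace *vm;	/* NULL for a kernel thread */
};

struct toy_cpu {
	struct toy_thread *curthread;
	struct toy_vmspace *active_vm;	/* address space currently loaded */
	int switches;
	int vm_reloads;
};

/* The next thread is already chosen; switch only if it differs. */
static void
toy_mi_switch(struct toy_cpu *cpu, struct toy_thread *next)
{
	if (next == cpu->curthread)
		return;			/* the silly "switch to self" case */
	cpu->switches++;
	/* Lazy switch: a kernel thread keeps the loaded address space. */
	if (next->vm != NULL && next->vm != cpu->active_vm) {
		cpu->active_vm = next->vm;
		cpu->vm_reloads++;	/* stands in for a %cr3 reload */
	}
	cpu->curthread = next;
}
```

In the usage pattern from the message (user process, interrupt to a kthread, back to the same user process), this model performs two switches and no reloads.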
Glanced at by: jake, jhb
to select a KSE with a slice of 0 we will update its slice and insert it
onto the next queue.
- Pass the KSE instead of the ksegrp into sched_slice(). This more
accurately reflects the behavior of the code. Slices are granted to kses.
- Add a function kseq_nice_min() which finds the smallest nice value
assigned to the kseg of any KSE on the queue.
- Rewrite the logic in sched_slice(). Add a large comment describing the
new slice selection scheme. To summarize, slices are assigned based on
the nice value. Priorities are still calculated based on the nice and
interactivity of a process. Slice sizes of 0 may be granted for KSEs
whose nice is 20 or further away from the lowest nice on the run queue.
Other nice values are scaled across the range [min, min+20]. This fixes
ULE's bad behavior with positively niced processes.
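The slice rule summarized above can be sketched as a linear scale on the distance from the queue's minimum nice value. `TOY_SLICE_MIN`, `TOY_SLICE_MAX` and the exact linear interpolation are assumptions for illustration; only the "0 at a distance of 20 or more, scaled within [min, min+20]" shape comes from the message.

```c
#include <assert.h>

#define TOY_SLICE_MIN	10	/* hypothetical, in ticks */
#define TOY_SLICE_MAX	140	/* hypothetical, in ticks */
#define TOY_NICE_RANGE	20	/* at this distance or more the slice is 0 */

/*
 * Slice for a KSE with nice value `nice` on a queue whose lowest nice
 * is `nice_min`.  The least-niced KSE gets the maximum slice; slices
 * shrink linearly across [nice_min, nice_min + 20) and are 0 beyond.
 */
static int
toy_sched_slice(int nice, int nice_min)
{
	int d = nice - nice_min;

	if (d < 0)
		d = 0;		/* defensive: nice_min should be the minimum */
	if (d >= TOY_NICE_RANGE)
		return (0);
	return (TOY_SLICE_MAX -
	    d * (TOY_SLICE_MAX - TOY_SLICE_MIN) / TOY_NICE_RANGE);
}
```

Because the scale is relative to the lowest nice on the queue, a lone +20 process still gets a full slice, while the same process next to a nice 0 process gets none.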