interrupts.
Protect usage of the per processor switchtime variable against
interrupts in calcru().
This seem to eliminate the "microuptime() went backwards" warnings.
the the original trapframe of the syscall, trap, or interrupt that entered
the kernel. Before SMPng, ast's were handled via a psuedo trap at the
end of doerti. With the SMPng commit, ast's were broken out into a
separate ast() function that was called from doreti to match the behavior
of other architectures. Unfortunately, when this was done, the
p_md.md_regs member of curproc was not updateda in ast(), thus when
signals are handled by userret() after an interrupt that returns to
userland, we end up using a stale trapframe that will result in the
registers from the old trapframe overwriting the real trapframe and
smashing all the registers right before we return to usermode. The saved
%cs:%eip from where we were in usermode are saved in the trapframe for
example.
- Don't use an atomic operation to update cnt.v_soft in ast(). This is
the only place the variable is written to, and sched_lock is always
held when it is written, so it is already protected and the mutex release
of sched_lock asserts a memory barrier that ensures the value will be
updated in a timely fashion.
- Don't hold sched_lock around addupc_task() as this apparently breaks
profiling badly due to sched_lock being held across copyin().
Reported by: bde (2)
an interrupt thread while the interrupt thread is blocked on Giant waiting
to execute the interrupt handler being removed. The result was that the
intrhand structure would be free'd, and we would call 0xdeadc0de. The work
around is to check to see if the interrupt thread is idle when removing a
handler. If not, then we mark the interrupt handler as being dead using
the new IH_DEAD flag and don't remove it from the interrupt threads' list
of handlers. When the interrupt thread resumes, it will see a dead handler
while traversing the list of handlers and will remove the handler then.
work because opt_preemption.h wasn't #include'd. Instead, make use of the
do_switch parameter to ithread_schedule() and do the check in the alpha
interrupt code.
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.
Notes:
o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.
Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project
filename insteada of copying the first 32 characters of it.
- Add in const modifiers for the passed in format strings and filenames
and their respective members in the ktr_entry struct.
scheduling an interrupt thread to run when needed. This has the side
effect of enabling support for entropy gathering from interrupts on
all architectures.
- Change the software interrupt and x86 and alpha hardware interrupt code
to use ithread_schedule() for most of their processing when scheduling
an interrupt to run.
- Remove the pesky Warning message about interrupt threads having entropy
enabled. I'm not sure why I put that in there in the first place.
- Add more error checking for parameters and change some cases that
returned EINVAL to panic on failure instead via KASSERT().
- Instead of doing a documented evil hack of setting the P_NOLOAD flag
on every interrupt thread whose pri was SWI_CLOCK, set the flag
explicity for clk_ithd's proc during start_softintr().
- Add pager capability to the 'show ktr' command. It functions much like
'ps': Enter at the prompt displays one more entry, Space displays
another page, and any other key quits.
This is useful when doing copies of packet where some leading
space has been preallocated to insert protocol headers.
Note that there are in fact almost no users of m_copypacket.
MFC candidate.
in mi_switch() just before calling cpu_switch() so that the first switch
after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
proc in mi_switch(), and handle the sched_lock state change across a
context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
have to setup an initial state for sched_lock in fork_exit() before we
release it.
is sent to a process, psignal() needs to schedule an AST for the
process if the process is runnable, not just if it is current, so that
pending signals get checked for on the next return of the process to
user mode. This wasn't practical until recently because the AST flag
was per-cpu so setting it for a non-current process would usually just
cause a bogus AST for the current process.
For non-current processes looping in user mode, it took accidental
(?) magic to deliver signals at all. Signals were usually delivered
late as a side effect of rescheduling (need_resched() sets astpending,
etc.). In pre-SMPng, delivery was delayed by at most 1 quantum (the
need_resched() call in roundrobin() is certain to occur within 1
quantum for looping processes). In -current, things are complicated
by normal interrupt handlers being threads. Missing handling of the
complications makes roundrobin() a bogus no-op, but preemptive
scheduling sort of works anyway due to even larger bogons elsewhere.
always on curproc. This is needed to implement signal delivery properly
(see a future log message for kern_sig.c).
Debogotified the definition of aston(). aston() was defined in terms
of signotify() (perhaps because only the latter already operated on
a specified process), but aston() is the primitive.
Similar changes are needed in the ia64 versions of cpu.h and trap.c.
I didn't make them because the ia64 is missing the prerequisite changes
to make astpending and need_resched per-process and those changes are
too large to make without testing.
actually in the kernel. This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there. As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).
This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size. This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.
Reviewed by: bde
lookup vop so that it defaulted to using vop_eopnotsupp for strange
lookups like the ones for open("/dev/null/", ...) and stat("/dev/null/",
...). This mainly caused the wrong errno to be returned by vfs syscalls
(EOPNOTSUPP is not in POSIX, and is not documented in connection with
specfs in open.2 and is not documented in stat.2 at all). Also, lookup
vops are apparently required to set *ap->a_vpp to NULL on error, but
vop_eopnotsupp is too broken to do this.
allocation, as required.
If m_getm() receives NULL as a first argument, then it allocates `len'
(second argument) bytes worth of mbufs + clusters and returns the chain
only if it was able to allocate everything.
If the first argument is non-NULL, then it should be an existing mbuf
chain (e.g. pre-allocated mbuf sitting on a ring, on some list, etc.) and
so it will allocate `len' bytes worth of clusters and mbufs, as needed,
and append them to the tail of the passed in chain, only if it was able
to allocate everything requested.
If allocation fails, only what was allocated by the routine will be freed,
and NULL will be returned.
Also, get rid of existing m_getm() in netncp code and replace calls to it
to calls to this new generic code.
Heavily Reviewed by: bp
one the number of variables needed for top and other setgid kmem
utilities that could only be accessed via /dev/kmem previously.
Submitted by: Thomas Moestl <tmoestl@gmx.net>
Reviewed by: freebsd-audit
- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.
Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb
and function argument declarations. Make sure that functions that are
supposed to return a pointer return NULL in case of failure. Don't cast
NULL. Finally, get rid of annoying `register' uses.
tracing in order to avoid duplication.
- Insert some tracepoints back into the mutex acq/rel code, thus ensuring
that we can trace all lock acq/rel's again.
- All CURPROC != NULL checks are MPASS()es (under MUTEX_DEBUG) because they
signify a serious mutex corruption.
- Change up some KASSERT()s to MPASS()es, and vice-versa, depending on the
type of problem we're debugging (INVARIANTS is used here to check that
the API is being used properly whereas MUTEX_DEBUG is used to ensure that
something general isn't happening that will have bad impact on mutex
locks).
Reminded by: jhb, jake, asmodai
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accesssed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.
- I can't seem to reproduce the warning I got from WITNESS anymore.
- The fix was wrong. Since a uidinfo struct is a member of proc, it
makes sense for the locking order to be such that you are allowed to
hold proc and then grab the uidinfo lock.
- Use swi_* function names.
- Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
lock to minimize diffs with cy(4).
- Add a set of MI helper functions for interrupt threads:
- ithread_create() creates a new interrupt thread
- ithread_destroy() destroys an interrupt thread
- ithread_add_handler() attaches a new handler to an interrupt thread
- ithread_remove_handler() detaches a handler from an interrupt thread
- Rename sinthand_add() and sched_swi() to swi_add() and swi_sched()
respectively so that they live in a consistent namespace.
- struct intrhand is no longer a public type. It would be private to
kern_intr.c but the current implementation of fast interrupts on the
alpha requires the type to be exported. However, all handlers should
be treated as void * cookies in the way that new-bus treats them. This
includes references to software interrupt handlers.
will only display sleep mutexes held by the current process.
- Clean up some nits in the witness_display() function and add a ddb
command 'show witness' that dumps the hierarchy and order lists to the
console.
- Use queue(3) macros where appropriate.
- Resort the spin lock order list so that "com" is before "sched_lock".
Also, add appropriate #ifdef's around SMP and i386-specific mutexes.
- Add two new mutexes used to protect the ithread lists and tables to the
order list.
Requested by: bde (1)
follows:
- show ktr_first display the first entry
- show ktr_next display the next entry
- show ktr display the entire buffer
The /v modifiers continue to work as described previously.
Requested by: bde
only the boot processor should be running in the comments.
- Initialize curproc to point to each CPU's respective idleproc if their
curproc is NULL.
- Keep track of the number of context switches performed by idleproc.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
kmem_free() for now. Kmem_malloc() and kmem_free() now have appropriate
assertions in place, and these checks aren't feasible until more of the
networking code is locked down. Also, the extra assertions here should
already be caught by the WITNESS code as lock order violations should
mutex operations on Giant be reintroduced here later.
the index of the pollfd array to the number of fd's currently open, not
the maximum number of fd's. ie: if you had 0,1,2 open, you could not
use pollfd slots higher than 20. The specs say we only have to support
OPEN_MAX [64] entries but we allow way more than that.
by myself. It solves a serious vm_map corruption problem that can occur
with the buffer cache when block sizes > 64K are used. This code has been
heavily tested in -stable but only tested somewhat on -current. An MFC
will occur in a few days. My additions include the vm_map_simplify_entry()
and minor buffer cache boundry case fix.
Make the buffer cache use a system map for buffer cache KVM rather then a
normal map.
Ensure that VM objects are not allocated for system maps. There were cases
where a buffer map could wind up with a backing VM object -- normally
harmless, but this could also result in the buffer cache blocking in places
where it assumes no blocking will occur, possibly resulting in corrupted
maps.
Fix a minor boundry case in the buffer cache size limit is reached that
could result in non-optimal code.
Add vm_map_simplify_entry() calls to prevent 'creeping proliferation'
of vm_map_entry's in the buffer cache's vm_map. Previously only a simple
linear optimization was made. (The buffer vm_map typically has only a
handful of vm_map_entry's. This stabilizes it at that level permanently).
PR: 20609
Submitted by: (Tor Egge) tegge
With this flag set malloc() will panic if memory allocation failed.
This usable only in critical places where failed allocation is fatal.
Reviewed by: peter
machines (duh!). This was one reason why this script broke on
i386. The other being that on i386 sections did not have the
proper alignment. This has been fixed in sys/sys/linker_set.h.
o Use objdump instead of gensetdefs(1) to build the linker sets.
o Allow overriding of nm and objdump in resp. genassym.sh and
gensetdefs.pl for non-native toolchains.
Reviewed by: arch
Perl improvements: Jos Backus <josb@cncdsl.com>, benno
problem is that a mutex lock, prior to this change, is acquired before
the curproc is set to idleproc, so we mess ourselves up by calling
the mutex lock routine with curproc == NULL.
Moving it up after the aps_ready spin-wait has us hopefully setting it
after idleproc is setup.
Solved by: jake (the allmighty) :-)
Move the helper macros from sbuf.h to sbuf.c
Use ints instead of size_ts.
Relax the requirements for sbuf_finish(): it is now possible to finish an
overflowed buffer.
Make sbuf_len() return -1 instead of 0 if the sbuf overflowed.
Requested by: gibbs
instead of a trapframe directly. (Requested by bde.)
- Convert the alpha switch_trampoline to call fork_exit() and use the MI
fork_return() instead of child_return().
- Axe child_return().
- Update stopevent() to assert that the proc lock is held when it is
held and is not recursed. Note that the STOPEVENT() macro obtains
the proc lock when calling this function.