in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated
file descriptors which is used to locate available descriptors when a new
one is needed. It also moves the task of growing the file descriptor table
out of fdalloc(), reducing complexity in both fdalloc() and do_dup().
Debts of gratitude are owed to tjr@ (who provided the original patch on
which this work is based), grog@ (for the gdb(4) man page) and rwatson@
(for assistance with pxeboot(8)).
ithread_remove_handler() may fail to remove the interrupt handler if
it decides to let the ithread do the removal. The problem is that during
boot "cold" is set, which causes msleep() to return immediately. This
will cause ithread_remove_handler() to fail to wait for the ithread
to do the removal from the handler TAILQ before freeing the handler
back to the heap. Bad things will happen when some other user of the
TAILQ, such as ithread_add_handler() or the actual ithread attempts to use
the freed handler. Fix the problem by forcing ithread_remove_handler()
to do the actual removal itself if the "cold" flag is set.
Reviewed by: jhb
a maximum dump size of 0, return a size-related error, rather
than returning success. Otherwise, waitpid() will incorrectly
return a status indicating that a core dump was created. Note
that the specific error doesn't actually matter, since it's lost.
MFC after: 2 weeks
PR: 60367
Submitted by: Valentin Nechayev <netch@netch.kiev.ua>
avoid relying on the minimum memory allocation size to avoid problems.
The check is somewhat redundant because the consumers of the returned
structure will check that sa_len is a protocol-specific larger size.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: nectar
MFC after: 30 days
setting the new process' p_pgrp again before inserting it in the p_pglist.
Without it we can get the new process to be inserted in a different p_pglist
than the one p2->p_pgrp points to, and this is not something we want to happen.
This is not a fix, merely a bandaid, but it will work until someone finds a
better way to do it.
Discussed with: jhb (a long time ago)
in slightly less usual states:
If the thread is on a run queue, display "running" if the thread is
actually running, otherwise, "runnable".
If the thread is sleeping, and it's on a sleep queue, display the
name of the queue, otherwise "unknown" -- previously, in this situation
we would display "iowait".
If the thread is waiting on a lock, display *lockname.
If the thread is suspended, display "suspended" -- previously, in
this situation we would display "iowait".
If the thread is waiting for an interrupt, display "intrwait" --
previously, in this situation we would display "iowait".
If the thread is in a state not handled by the above, display
"unknown" -- previously, we would print "iowait".
Among other things, this avoids displaying "iowait" when the foreground
process turns out to be suspended waiting for a debugger to properly
attach.
holding the mutex. Because the sigacts pointer can't change while
the process is "live" (proc locking (x)), we know our pointer is still
valid.
In communication with: truckman
Reviewed by: jhb
on a non-recursive mutex will fail but will not trigger any assertions.
- Add an assertion to mtx_lock() that one never recurses on a non-recursive
mutex. This is mostly useful for the non-WITNESS case.
Requested by: deischen, julian, others (1)
Add empty line before first code line in functions with no local
variables.
Properly terminate comment sentences.
Indent lines which are longer that 80 characters.
Move v_addpollinfo closer to the rest of poll-related functions.
Move DEBUG_VFS_LOCKS ifdefed block to the end of file.
Obtained from: bde (partly)
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.
Reviewed by: deischen, dfr
o promote several m_tag_* routines to inline
o add an m_tag_setup inline to set the fixed fields in a packet tag
o add an m_tag_free method pointer to each mtag to support, for example,
allocating tags from zones
o have m_tag_find check if the tag list is not empty before calling
m_tag_locate to search
Reviewed by: brooks, silence from others
with one of std{in,out,err} open. This helps with the file descriptor
leaks reported on -current. This should probably be merged into 5.2.
Reviewed by: ru
Tested by: Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net>
function back to near the beginning of the file. Rev.1.194 moved it into
the middle of auxiliary functions following kern_execve(). Moved the
__mac_execve() syscall function up together with execve(). It was new in
rev1.1.196 and perfectly misplaced after execve().
sched_cpu() locks an sx lock (allproc_lock) which can sleep if it fails to
acquire the lock, it is not safe to execute this in a callout handler from
softclock().
use it, if we ever did. They have been been VERY poorly maintained for
some time, possibly because they were a NOP. FWIW, This brings our table
formats back closer to the other *BSD's.
cpu could have been bogged down with non-transferable load and still not
migrated a new thread to an idle cpu. This required some benchmarking and
tuning to get right as the comment above it suggests.
- In sched_add(), do the idle check prior to the transfer check so that we
don't try to transfer load from an idle cpu. This fixes panics caused by
IPIs on UP machines running SMP kernels.
Reported/Debugged by: seanc
reassigning their v_ops field to specfs, detaching from the mountpoint, etc.
However, this is not sufficient. If we vclean() the vnode the pages owned
by the vnode are lost, potentially while buffers reference them. Implement
parts of vclean() seperately in vgonechrl() so that the pages and bufs
associated with a device vnode are not destroyed while in use.
- The new sched_balance_groups() function does intra-group balancing while
sched_balance() balances the available groups.
- Pick a random time between 0 ticks and hz * 2 ticks to restart each
balancing process. Each balancer has its own timeout.
- Pick a random place in the list of groups to start the search for lowest
and highest group loads. This prevents us from prefering a group based on
numeric position.
- Use a nasty hack to stop us from preferring cpu 0. The problem is that
softclock always runs on cpu 0, so it always has a little extra load. We
ignore this load in the balancer for now. In the future softclock should
run on a random cpu and these hacks can go away.
cpu are added to a group.
- Don't place a cpu into the kseq_idle bitmask until all cpus in that group
have idled.
- Prefer idle groups over idle group members in the new kseq_transfer()
function. In this way we will prefer to balance load across full cores
rather than add further load a partial core.
- Before a cpu goes idle, check the other group members for threads. Since
SMT cpus may freely share threads, this is cheap.
- SMT cores may be individually pinned and bound to now. This contrasts the
old mechanism where binding or pinning would have allowed a thread to run
on any available cpu.
- Remove some unnecessary logic from sched_switch(). Priority propagation
should be properly taken care of in sched_prio() now.
turnstile_unpend(). A racing thread that does not have TDI_LOCK set may
either be running on another CPU or it may be sitting on a run queue if it
was preempted during the very small window in turnstile_wait() between
unlocking the turnstile chain lock and locking sched_lock.
case of a turnstile having no threads is just one instance of the more
general case where the thread we are examining has been partially awakened
already in that it has been removed from the turnstile's blocked list but
still has TDI_LOCK set. We detect that case by checking to see if the
thread has already had a turnstile reassigned to it.
functions less noisy: We printf if a new function took longer than
the previous record holder, or of the previous record holder took
more than twice as long as the current record.
to have the kernel switch to a new thread, instead of doing it in
userland. It is in fact needed on ia64 where syscall restarts do not
return to userland first. It's completely handled inside the kernel.
As such, any context created by the kernel as part of an upcall and
caused by some syscall needs to be restored by the kernel.
Be sure to shift (long)1 << 33 and higher, not (int)1. Otherwise bad
things happen(TM). This is why beast.freebsd.org paniced with ULE.
Reviewed by: jeff
and the mpo_create_cred() MAC policy entry point to
mpo_copy_cred_label(). This is more consistent with similar entry
points for creation and label copying, as mac_create_cred() was
called from crdup() as opposed to during process creation. For
a number of policies, this removes the requirement for special
handling when copying credential labels, and improves consistency.
Approved by: re (scottl)
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.
Approved by: re (scottl)
Tested on: i386, amd64, alpha
aid other kernel code, especially code which can be in a module such as
the acpi_cpu(4) driver, to work properly with both SMP and UP kernels.
The exported symbols include mp_ncpus, all_cpus, mp_maxid, smp_started, and
the smp_rendezvous() function. This also means that CPU_ABSENT() is now
always implemented the same on all kernels.
Approved by: re (scottl)
to sendfile(2) being erroneously automatically restarted after a signal
is delivered. Fixed by converting ERESTART to EINTR prior to exiting.
Updated manual page to indicate the potential EINTR error, its cause
and consequences.
Approved by: re@freebsd.org
forced unmount case. Otherwise, a file system that is referenced
only by process fd_cdir/fd_rdir references to the file system root
vnode will be successfully unmounted without the MNT_FORCE flag.
The previous behaviour was not compatible with the unmount semantics
required by amd(8), so file systems could be unexpectedly unmounted
while there were still references to the file system root directory.
Reported by: Erez Zadok <ezk@cs.sunysb.edu>
Approved by: re (scottl)
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since modules
sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.
Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)
happen in interrupt context; 1) sleep locks, and 2) malloc/free
calls.
1) is fixed by using spin locks instead.
2) is fixed by preallocating a FIFO (implemented with a STAILQ)
and using elements from this FIFO instead. This turns out
to be rather fast.
OK'ed by: re (scottl)
Thanks to: peter, jhb, rwatson, jake
Apologies to: *