not blocking the signal, signal is placed on the thread sigqueue. If
the selected thread is in kernel executing thr_exit() or sigprocmask()
syscalls, then signal might be not delivered to usermode for arbitrary
amount of time, and for exiting thread it is lost.
Put process-directed signals to the process queue unconditionally,
selecting the thread to deliver the signal only by the thread returning
to usermode, since only then the thread can handle delivery of signal
reliably. For exiting thread or thread that has blocked some signals,
check whether the newly blocked signal is queued for the process, and
try to find a thread to wakeup for delivery, in reschedule_signal(). For
exiting thread, assume that all signals are blocked.
Change cursig() and postsig() to look both into the thread and process
signal queues. When there is a signal that thread returning to usermode
could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do
unlocked read of p_siglist and p_pendingcnt to check for queued signals.
Note that thread that has a signal unblocked might get spurious wakeup
and EINTR from the interruptible system call now, due to the possibility
of being selected by reschedule_signals(), while other thread returned
to usermode earlier and removed the signal from process queue. This
should not cause compliance issues, since the thread has not blocked a
signal and thus should be ready to receive it anyway.
Reported by: Justin Teller <justin.teller gmail com>
Reviewed by: davidxu, jilles
MFC after: 1 month
set quite late in the revocation path, properly verify that vnode is
not doomed before calling VOP.
Reported and tested by: Harald Schmalzbauer <h.schmalzbauer omnilan de>
MFC after: 3 days
unlocked. fdrop() closes file descriptor when reference count goes to
zero. Close method for vnodes locks the vnode, resulting in "sleepable
after non-sleepable". For pipes, pipe mutex is before kqueue lock,
causing LOR.
Reported and tested by: pho
MFC after: 2 weeks
instead of sizeof(int), and on sparc64 that resulted in fetching wrong
value for acl_maxcnt, which in turn caused __acl_get_link(2) to fail
with EINVAL.
PR: sparc64/139304
Submitted by: Dmitry Afanasiev <KOT at MATPOCKuH.Ru>
sockets. This allows for reliable bi-directional datagram communication
over UNIX domain sockets, in contrast to SOCK_DGRAM (M:N, unreliable) or
SOCK_STERAM (bi-directional bytestream). Largely, this reuses existing
UNIX domain socket code. This allows applications requiring record-
oriented semantics to do so reliably via local IPC.
Some implementation notes (also present in XXX comments):
- Currently we lack an sbappend variant able to do datagrams and control
data without doing addresses, so we mark SOCK_SEQPACKET as PR_ADDR.
Adding a new variant will solve this problem.
- UNIX domain sockets on FreeBSD provide back-pressure/flow control
notification for stream sockets by manipulating the send socket
buffer's size during pru_send and pru_rcvd. This trick works less well
for SOCK_SEQPACKET as sosend_generic() uses sb_hiwat not just to
manage blocking, but also to determine maximum datagram size. Fixing
this requires rethinking how back-pressure is done for SOCK_SEQPACKET;
in the mean time, it's possible to get EMSGSIZE when buffers fill,
instead of blocking.
Discussed with: benl
Reviewed by: bz, rpaulo
MFC after: 3 months
Sponsored by: Google
virtual address 0, limiting the ability to convert a kernel
NULL pointer dereference into a privilege escalation attack.
If the sysctl is set to 0 a newly started process will not be able
to map anything in the address range of the first page (0 to PAGE_SIZE).
This is the default. Already running processes are not affected by this.
You can either change the sysctl or the tunable from loader in case
you need to map at a virtual address of 0, for example when running
any of the extinct species of a set of a.out binaries, vm86 emulation, ..
In that case set security.bsd.map_at_zero="1".
Superseeds: r197537
In collaboration with: jhb, kib, alc
if it is empty. Otherwise the previous thread's name would remain in the
struct and then be reported for this thread.
Submitted by: Ryan Stone
MFC after: 1 week
want to provide VOP_ACCESSX(9) don't have to implement both. Note that
this commit makes implementation of either of these two mandatory.
Reviewed by: kib
in order to avoid, on architectures which doesn't have strong ordered
writes, CPU instructions reordering.
Diagnosed by: fabio
Reviewed by: jhb
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
- F_READAHEAD: specify the amount for sequential access. The amount is
specified in bytes and is rounded up to nearest block size.
- F_RDAHEAD: Darwin compatible version that use 128KB as the sequential
access size.
A third argument of zero disables the read-ahead behavior.
Please note that the read-ahead amount is also constrainted by sysctl
variable, vfs.read_max, which may need to be raised in order to better
utilize this feature.
Thanks Igor Sysoev for proposing the feature and submitting the original
version, and kib@ for his valuable comments.
Submitted by: Igor Sysoev <is rambler-co ru>
Reviewed by: kib@
MFC after: 1 month
contained only SLIST_HEAD as its member, thus sizeof(struct klist) would
equal to sizeof(struct klist *), so this change makes the code more
correct in terms of semantics, but should be a no-op to compiler at this
time.
Reported by: MQ <antinvidia at gmail com>
check if there are readers blocked by us via URWLOCK_WRITE_WAITERS flag,
and resume the readers. The error must be EAGAIN, otherwise there must
have memory problem, and nobody can rescue the buggy application.
The revision 197445 might be reverted.
driver load. This fixes crash on atapicam module load on systems,
where some ata channels (usually ata1) was probed, but failed to attach.
Reviewed by: jhb, imp
Tested by: many
run for re-acuiring the lock, but recheck if new pages are allocatable
from the pool and free the previously allocated ones.
Tested by: pho, Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
EV_RECEIPT is useful to disambiguating error conditions when multiple
events structures are passed to kevent(2). The error code is returned
in the data field and EV_ERROR is set.
Approved by: rwatson (co-mentor)
When the EV_DISPATCH flag is used the event source will be disabled
immediately after the delivery of an event. This is similar to the
EV_ONESHOT flag but it doesn't delete the event.
Approved by: rwatson (co-mentor)
Add user events support to kernel events which are not associated with any
kernel mechanism but are triggered by user level code. This is useful for
adding user level events to an event handler that may also be monitoring
kernel events.
Approved by: rwatson (co-mentor)
The touch event filter is called when a kernel event data is possibly
updated. There are two hook points. First, during a kevent() system
call. Second, when an event has been triggered.
Approved by: rwatson (co-mentor)
TCP_SORECEIVE_STREAM for the time being.
Requested by: brooks
Once compiled in make it easily switchable for testers by using a tuneable
net.inet.tcp.soreceive_stream
and a corresponding read-only sysctl to report the current state.
Suggested by: rwatson
MFC after: 2 days
- In 8.x and above the run-queue locks are nomore shared even in the
HTT case, so remove the special case.
- The deadlock explained in the removed comment here is still possible
even with different locks, with the contribution of tdq_lock_pair().
An explanation is here:
(hypotesis: a thread needs to migrate on another CPU, thread1 is doing
sched_switch_migrate() and thread2 is the one handling the sched_switch()
request or in other words, thread1 is the thread that needs to migrate
and thread2 is a thread that is going to be preempted, most likely an
idle thread. Also, 'old' is referred to the context (in terms of
run-queue and CPU) thread1 is leaving and 'new' is referred to the
context thread1 is going into. Finally, thread3 is doing tdq_idletd()
or sched_balance() and definitively doing tdq_lock_pair())
* thread1 blocks its td_lock. Now td_lock is 'blocked'
* thread1 drops its old runqueue lock
* thread1 acquires the new runqueue lock
* thread1 adds itself to the new runqueue and sends an IPI_PREEMPT
through tdq_notify() to the new CPU
* thread1 drops the new lock
* thread3, scanning the runqueues, locks the old lock
* thread2 received the IPI_PREEMPT and does thread_lock() with td_lock
pointing to the new runqueue
* thread3 wants to acquire the new runqueue lock, but it can't because
it is held by thread2 so it spins
* thread1 wants to acquire old lock, but as long as it is held by
thread3 it can't
* thread2 going further, at some point wants to switchin in thread1,
but it will wait forever because thread1->td_lock is in blocked state
This deadlock has been manifested mostly on 7.x and reported several time
on mailing lists under the voice 'spinlock held too long'.
Many thanks to des@ for having worked hard on producing suitable textdumps
and Jeff for help on the comment wording.
Reviewed by: jeff
Reported by: des, others
Tested by: des, Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
(STABLE_7 based version)
The problem was introduced in SVN 180608/ rev 1.114 and affects
all users of callout_reset() (including select, usleep, setitimer).
A better fix probably involves replicating 'ticks' in the
struct callout_cpu; this commit is just a temporary thing so that
we can MFC it after a suitable test time and RE approval.
MFC after: 3 days
longs. Since 32bit processes longs are 4 bytes, 64bit kernel may copy in
or out 4 bytes more then the process expected.
Calculate the amount of bytes to copy taking into account size of fd_set
for the current process ABI.
Diagnosed and tested by: Peter Jeremy <peterjeremy acm org>
Reviewed by: jhb
MFC after: 1 week
The hook calls vn_fullpath(9), that should not be executed with a vnode
lock held.
Reported by: Bruce Cran <bruce cran org uk>
Tested by: pho
MFC after: 3 days