That revision is introducing a bug which is more visible than problems
it is trying to fix.
As long as my time is very limited in this period I am going to
commit back this patch just once it is fully fixed.
Reported by: dim, Nicholas Esborn
PT_GNU_STACK program header, if present and enabled. Two new sysctls
are provided, kern.elf32.nxstack and kern.elf64.nxstack, that allow to
enable PT_GNU_STACK for ABIs of specified bitsize, if ABI decided to
support shared page.
Inform rtld about access mode of the stack initial mapping by
AT_STACKPROT aux vector.
At the moment, the default is disabled, waiting for the usermode
support bits.
setting SV_SHP flag and providing pointer to the vm object and mapping
address. Provide simple allocator to carve space in the page, tailored
to put the code with alignment restrictions.
Enable shared page use for amd64, both native and 32bit FreeBSD
binaries. Page is private mapped at the top of the user address
space, moving a start of the stack one page down. Move signal
trampoline code from the top of the stack to the shared page.
Reviewed by: alc
to match the desired priority in td_priority. Otherwise the first time
thread0 used a borrowed priority it would drop down to PUSER instead of
PVM.
- Explicitly initialize the starting priority of new kprocs to PVM to
avoid inheriting some random priority from thread0.
MFC after: 2 weeks
thread and proc have been copied and zeroed from the old thread and
proc. Otherwise attempts to modify thread or process data in sched_fork()
could be undone.
- Don't copy td_{base,}_user_pri from the old thread to the new thread in
sched_fork_thread() in ULE. This is already done courtesy the bcopy()
of the thread copy region.
- Always initialize the real priority (td_priority) of new threads to the
new thread's base priority (td_base_pri) to avoid bogusly inheriting a
borrowed priority from the parent thread.
MFC after: 2 weeks
This was lost when it was converted to using a condition variable instead
of lbolt.
- Drop the priority of flowtable down to PPAUSE when it is idle as well
since it is a similar background task.
MFC after: 2 weeks
the dependency of which was preloaded, but failed to initialize. Previously,
kernel dereferenced NULL pointer returned by modlist_lookup2(); now, when this
happens, we unload the dependent module. Since the depended_files list is
sorted in dependency order, this properly propagates, unloading modules that
depend on failed ones.
From the user point of view, this prevents the kernel from panicing when
trying to boot kernel compiled without KDTRACE_HOOKS with dtraceall_load="YES"
in /boot/loader.conf.
Reviewed by: kib
Add and export constants of array sizes of jail parameters as compiled into
the kernel.
This is the least intrusive way to allow kvm to read the (sparse) arrays
independent of the options the kernel was compiled with.
Reviewed by: jhb (originally)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY
was cleared early in vm_object_page_clean, before the cleaning pass
was done. This is no longer true after r216799.
Moreover, since OBJ_CLEANING is a flag, and not the counter, it could
be reset too prematurely when parallel vm_object_page_clean() are
performed.
Reviewed by: alc (as a part of the bigger patch)
MFC after: 1 month (after r216799 is merged)
- Problem1:
Hypothesis: thread1 is doing a callout_reset_on(), within his
callout handler, willing to implicitly or explicitly migrate the
callout. thread2 is draining the callout.
Thesys:
* thread1 calls callout_lock() and locks the old callout cpu
* thread1 performs the checks in the first path of the
callout_reset_on()
* thread1 hits this codepiece:
/*
* If the lock must migrate we have to check the state again as
* we can't hold both the new and old locks simultaneously.
*/
if (c->c_cpu != cpu) {
c->c_cpu = cpu;
CC_UNLOCK(cc);
goto retry;
}
which means it will drop the lock and 'retry'
* thread2 will callout_lock() and locks the new callout cpu.
thread1 spins on the new lock and will not keep going for the
moment.
* thread2 checks that the callout is not pending (as callout is
currently running) and that it is not on cc->cc_curr (because cc
now refers to the new callout and the callout is running on the
old callout cpu) thus it thinks it is done and returns.
* thread1 will now acquire the lock and then adds the callout
to the new callout cpu queue
That seems an obvious race as callout_stop() falsely reports
the callout stopped or worse, callout_drain() falsely returns
while the callout is still in use.
- Solution1:
Fixing this problem would require, in general, to lock both
callout cpus at once while switching the c_cpu field and avoid
cyclic deadlocks between callout cpus locks.
The concept of CPUBLOCK is then introduced (working more or less
like the blocked_lock for thread_lock() function) meaning:
"in callout_lock(), spin until the c->c_cpu is not different from
CPUBLOCK". That way the "original" callout cpu, referred to the
above mentioned code snippet, will remain blocked until the lock
handover is over critical path will remain covered.
- Problem2:
Having the callout currently executed on a specific callout cpu
and contemporary pending on another callout cpu (as it can happen
with current code) breaks, at least, the assumption callout_drain()
returns just once the callout cannot be referenced anymore.
- Solution2:
Callout migration is deferred if the current callout is already
under execution.
The best place to do that is in softclock() and new members are
added to the callout cpu structure in order to specify a pending
migration is requested. That is necessary because the callout
cannot be trusted (not freed) the 100% of times after the execution
of the callout handler.
CPUBLOCK will prevent, in the "deferred migration" case, that the
callout gets freed in this case, stopping any callout_stop() and
callout_drain() possible activity until the migration is
actually performed.
- Problem3:
There is a further race in callout_drain().
In order to avoid a race between sleepqueue lock and callout cpu
spinlock, in _callout_stop_safe(), the callout cpu lock is dropped,
the sleepqueue lock is acquired and a new callout cpu lookup is
performed. Note that the channel used for locking the sleepqueue is
obtained from the "current" callout cpu (&cc->cc_waiting).
If the callout migrated in the meanwhile, callout_drain() will end up
using the wrong wchan for the sleepqueue (the locked one will be the
older, while the new one will not really be locked) leading to a
lock leak and a race access to sleepqueue.
- Solution3:
It is enough to check if a migration happened between the operation
of acquiring the sleepqueue lock and the new callout cpu lock and
eventually unwind all those and try again.
This problems can lead to deathly races on moderate (4-ways) SMP
environment, leading to easy panic or deadlocks.
The 24-ways of the reporter, could easilly panic, with completely
normal workload, almost daily.
gianni@ kindly wrote the following prof-of-concept which can
panic a FreeBSD machine in less than one hour, in smaller SMP:
http://www.freebsd.org/~attilio/callout/test.c
Reported by: Nicholas Esborn <nick at desert dot net>, DesertNet
In collabouration with: gianni, pho, Nicholas Esborn
Reviewed by: jhb
MFC after: 1 week (*)
* Usually, I would aim for a larger MFC timeout, but I really want this
in before 8.2-RELEASE, thus re@ accepted a shorter timeout as a special
case for this patch
use sched_lend_user_prio to set lent priority.
- Improve pthread priority-inherit mutex, when a contender's priority is
lowered, repropagete priorities, this may cause mutex owner's priority
to be lowerd, in old code, mutex owner's priority is rise-only.
- Add flags CVWAIT_ABSTIME and CVWAIT_CLOCKID for umtx kernel based
condition variable, this should eliminate an extra system call to get
current time.
- Add sub-function UMTX_OP_NWAKE_PRIVATE to wake up N channels in single
system call. Create userland sleep queue for condition variable, in most
cases, thread will wait in the queue, the pthread_cond_signal will defer
thread wakeup until the mutex is unlocked, it tries to avoid an extra
system call and a extra context switch in time window of pthread_cond_signal
and pthread_mutex_unlock.
The changes are part of process-shared mutex project.
the parentheses around the location for simple fail points into the
location string. This makes the print on fail point set more
consistent between the two versions.
Also fix up fail.h a little for style(9): only use one of sys/param.h
and sys/types.h, and use the existing __XSTRING() macro instead of
rolling our own. Also fix up a few tabs on changed and nearby lines.
Lastly, since KFAIL_POINT_{BEGIN,END} are not meant for use outside
this file, just eliminate the macros entirely.
MFC after: 1 week
and in many respects can be thought of as a more generic superset of pfil.
Hhook provides a way for kernel subsystems to export hook points that Khelp
modules can hook to provide enhanced or new functionality to the kernel. The
KPI has been designed to ensure hook points pose no noticeable overhead when
no hook functions are registered.
- Introduce the Khelp (Kernel Helpers) KPI. Khelp provides a framework for
managing Khelp modules, which indirectly use the Hhook KPI to register their
hook functions with hook points of interest within the kernel. Khelp modules
aim to provide a structured way to dynamically extend the kernel at runtime in
an ABI preserving manner. Depending on the subsystem providing hook points, a
Khelp module may be able to associate per-object data for maintaining relevant
state between hook calls.
- pjd's Object Specific Data (OSD) KPI is used to manage the per-object data
allocated to Khelp modules. Create a new "OSD_KHELP" OSD type for use by the
Khelp framework.
- Bump __FreeBSD_version to 900028 to mark the introduction of the new KPIs.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz, others along the way
MFC after: 3 months
condition in proc_rwmem() and to (2) simplify the implementation of the
cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem()
the requested read or write could fail because the targeted page could be
reclaimed between the calls to vm_fault() and vm_page_hold().
In collaboration with: kib@
MFC after: 6 weeks
earlier commit. While here, move the thread lock down in rtp_to_pri().
It is not needed for all of the priority value checks and the computation
of newpri.
Reported by: swell.k @ gmail
MFC after: 3 days
a 32-bit one. This can cause weird timeout issues, as the copying reads
garbage from the user.
Code by: Deepak Veliath <deepak dot veliath at isilon dot com>
MFC after: 1 week
to PSARC/2010/029. In short, the semantics is simplified - "weird stuff"
no longer happens after chmod, entries don't get duplicated during
inheritance, and trivial ACLs no longer contain three "DENY" entries,
which is also more friendly to MS Windows.
By default, UFS keeps using old semantics. To change it, set sysctl
vfs.acl_nfs4_old_semantics to 0. I'll flip the switch when ZFSv28
hits the tree, to keep these two in sync - ZFS v28 uses PSARC semantics,
and ZFS v15 uses the old one.
It is possible a lower priority thread lending priority to higher priority
thread, in old code, it is ignored, however the lending should always be
recorded, add field td_lend_user_pri to fix the problem, if a thread does
not have borrowed priority, its value is PRI_MAX.
MFC after: 1 week
and set *procp to NULL in all cases. Previously, it was not being set
in the ERESTART case. This is effectively no-op, since its value is
ignored by callers in the error case.
Reviewed by: kib@
proper log message for r216150.
MFC after: 1 week
If unix socket has a unix socket attached as the rights that has a
unix socket attached as the rights that has a unix socket attached as
the rights ... Kernel may overflow the stack on attempt to close such
socket.
Only close the rights file in the context of the current close if the
file is not unix domain socket. Otherwise, postpone the work to
taskqueue, preventing unlimited recursion.
The pass of the unix domain sockets over the SCM_RIGHTS message
control is not widely used, and more, the close of the socket with
still attached rights is mostly an application failure. The change
should not affect the performance of typical users of SCM_RIGHTS.
Reviewed by: jeff, rwatson
in "struct vm_object". This is required to make it possible to account
for per-jail swap usage.
Reviewed by: kib@
Tested by: pho@
Sponsored by: FreeBSD Foundation