- Move the realtime priority range up above kernel sleep priorities and
just below interrupt thread priorities.
- Contract the interrupt and kernel sleep priority ranges a bit so that
the timesharing priority band can be increased. The new timeshare range
is now slightly larger than the old realtime + timeshare ranges.
- Change the ULE scheduler to no longer use realtime priorities for
interactive threads. Instead, the larger timeshare range is now split
into separate subranges for interactive and non-interactive ("batch")
threads. The end result is that interactive threads and non-interactive
threads still use the same priority ranges as before, but realtime
threads now have a separate, dedicated priority range.
- Do not modify the priority of non-timeshare threads in sched_sleep()
or via cv_broadcastpri(). Realtime and idle priority threads will
no longer have their priorities affected by sleeping in the kernel.
Reviewed by: jeff
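
The sched_sleep() rule in the last item above can be sketched as follows.
This is a minimal illustration, not the kernel code: the ex_* names, the
class enum, and the thread structure are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scheduling classes, standing in for the kernel's PRI_* classes. */
enum ex_class { EX_REALTIME, EX_TIMESHARE, EX_IDLE };

/* Hypothetical, heavily reduced thread structure. */
struct ex_thread {
	enum ex_class pri_class;	/* scheduling class of the thread */
	int priority;			/* current priority; lower value = more urgent */
};

/*
 * Apply a sleep priority only to timeshare threads.  Realtime and idle
 * threads keep whatever priority they already had.
 */
static void
ex_sched_sleep(struct ex_thread *td, int sleep_pri)
{
	assert(td != NULL);
	if (td->pri_class != EX_TIMESHARE)
		return;
	td->priority = sleep_pri;
}
```

With this shape, a realtime thread that sleeps in the kernel wakes up with
the priority it went to sleep with.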
interactive timeshare threads (PRI_*_INTERACTIVE) and non-interactive
timeshare threads (PRI_*_BATCH) and use these instead of PRI_*_REALTIME
and PRI_*_TIMESHARE. No functional change.
Reviewed by: jeff
PI_DISKLOW. While here, rename PI_TTYLOW to PI_TTY.
- Add a macro PI_SWI() that takes a SWI_* constant as an argument and
returns the suitable thread priority.
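
A macro of this kind could look roughly like the sketch below. The base
priority, the spacing, and the EX_SWI_* indices are assumptions made up
for illustration; the real values live in the kernel headers.

```c
#include <assert.h>

/* Hypothetical values, not the kernel's. */
#define EX_PI_SOFT	24	/* assumed base priority of software interrupt threads */
#define EX_PPQ		4	/* assumed spacing between adjacent priority levels */

/* Hypothetical software interrupt indices, standing in for SWI_* constants. */
enum { EX_SWI_TTY = 0, EX_SWI_NET = 1, EX_SWI_CLOCK = 4 };

/* Map a software interrupt index to a thread priority. */
#define EX_PI_SWI(swi)	(EX_PI_SOFT + (swi) * EX_PPQ)

/* Index 0 maps to the base of the software interrupt range. */
static_assert(EX_PI_SWI(EX_SWI_TTY) == EX_PI_SOFT, "index 0 maps to the base");
```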
That revision introduces a bug which is more visible than the problems
it is trying to fix.
Since my time is very limited at the moment, I will recommit this patch
only once it is fully fixed.
Reported by: dim, Nicholas Esborn
PT_GNU_STACK program header, if present and enabled. Two new sysctls
are provided, kern.elf32.nxstack and kern.elf64.nxstack, that allow
enabling PT_GNU_STACK for ABIs of the specified bitsize, if the ABI
supports the shared page.
Inform rtld about the access mode of the initial stack mapping via the
AT_STACKPROT aux vector.
At the moment, it is disabled by default, pending the usermode
support bits.
setting the SV_SHP flag and providing a pointer to the vm object and the
mapping address. Provide a simple allocator to carve space in the page,
tailored to placing code with alignment restrictions.
Enable shared page use for amd64, both native and 32bit FreeBSD
binaries. The page is mapped privately at the top of the user address
space, moving the start of the stack one page down. Move the signal
trampoline code from the top of the stack to the shared page.
Reviewed by: alc
to match the desired priority in td_priority. Otherwise the first time
thread0 used a borrowed priority it would drop down to PUSER instead of
PVM.
- Explicitly initialize the starting priority of new kprocs to PVM to
avoid inheriting some random priority from thread0.
MFC after: 2 weeks
thread and proc have been copied and zeroed from the old thread and
proc. Otherwise attempts to modify thread or process data in sched_fork()
could be undone.
- Don't copy td_{base_,}user_pri from the old thread to the new thread in
sched_fork_thread() in ULE. This is already done courtesy of the bcopy()
of the thread copy region.
- Always initialize the real priority (td_priority) of new threads to the
new thread's base priority (td_base_pri) to avoid bogusly inheriting a
borrowed priority from the parent thread.
MFC after: 2 weeks
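
The last item above can be illustrated with a small sketch. The structure
and function names are hypothetical; a plain struct assignment stands in
for the bcopy() of the thread copy region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced thread structure. */
struct ex_thread {
	int base_pri;	/* priority the thread owns */
	int priority;	/* effective priority, may be temporarily borrowed */
};

/*
 * Set up a child thread from its parent.  The copy brings over both
 * fields, so the effective priority is then reset to the base priority
 * to avoid inheriting a priority the parent had only borrowed.
 */
static void
ex_fork_thread(const struct ex_thread *parent, struct ex_thread *child)
{
	assert(parent != NULL && child != NULL);
	*child = *parent;			/* stands in for the bcopy() */
	child->priority = child->base_pri;	/* drop any borrowed priority */
}
```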
This was lost when it was converted to using a condition variable instead
of lbolt.
- Drop the priority of flowtable down to PPAUSE when it is idle as well
since it is a similar background task.
MFC after: 2 weeks
the dependency of which was preloaded, but failed to initialize. Previously,
the kernel dereferenced the NULL pointer returned by modlist_lookup2(); now,
when this happens, we unload the dependent module. Since the depended_files
list is sorted in dependency order, this properly propagates, unloading
modules that depend on failed ones.
From the user's point of view, this prevents the kernel from panicking when
trying to boot a kernel compiled without KDTRACE_HOOKS with
dtraceall_load="YES" in /boot/loader.conf.
Reviewed by: kib
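
The propagation argument can be sketched as below. This is an illustration
only: the module structure is hypothetical and each module has at most one
dependency, which is simpler than the linker's real bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module record. */
struct ex_module {
	const char *name;
	int dep;	/* index of the module this one depends on, or -1 */
	bool init_ok;	/* did the module's own initialization succeed? */
	bool loaded;	/* result: is the module kept loaded? */
};

/*
 * Walk modules in dependency order (a dependency always precedes its
 * dependents).  A module is unloaded if it failed to initialize or if
 * its dependency has already been unloaded, so a single pass is enough
 * for the failure to propagate down the chain.
 */
static void
ex_resolve(struct ex_module *mods, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct ex_module *m = &mods[i];

		assert(m->dep < (int)i);	/* dependency order is required */
		m->loaded = m->init_ok &&
		    (m->dep < 0 || mods[m->dep].loaded);
	}
}
```

Because the list is sorted, a module whose dependency failed is never
examined before that dependency has been marked as unloaded.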
Add and export constants of array sizes of jail parameters as compiled into
the kernel.
This is the least intrusive way to allow kvm to read the (sparse) arrays
independent of the options the kernel was compiled with.
Reviewed by: jhb (originally)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY
was cleared early in vm_object_page_clean, before the cleaning pass
was done. This is no longer true after r216799.
Moreover, since OBJ_CLEANING is a flag, and not a counter, it could
be reset prematurely when several vm_object_page_clean() calls run in
parallel.
Reviewed by: alc (as a part of the bigger patch)
MFC after: 1 month (after r216799 is merged)
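
The flag-versus-counter point can be shown with a tiny sketch. The object
structure and the ex_* helpers are hypothetical; they only model two
overlapping cleaning passes on the same object.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object carrying both styles of "cleaning in progress" marker. */
struct ex_object {
	bool cleaning_flag;	/* flag-style marker */
	int cleaning_count;	/* counter-style marker */
};

/* A cleaning pass starts. */
static void
ex_clean_begin(struct ex_object *o)
{
	o->cleaning_flag = true;
	o->cleaning_count++;
}

/* A cleaning pass ends. */
static void
ex_clean_end(struct ex_object *o)
{
	assert(o->cleaning_count > 0);
	o->cleaning_flag = false;	/* cleared even if another pass still runs */
	o->cleaning_count--;
}
```

After two passes begin and one ends, the flag already reads false while
the counter still reports one pass in progress.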
- Problem1:
Hypothesis: thread1 calls callout_reset_on() from within its
callout handler, intending to migrate the callout, implicitly or
explicitly. thread2 is draining the callout.
Thesis:
* thread1 calls callout_lock() and locks the old callout cpu
* thread1 performs the checks in the first part of
callout_reset_on()
* thread1 hits this piece of code:
    /*
     * If the lock must migrate we have to check the state again as
     * we can't hold both the new and old locks simultaneously.
     */
    if (c->c_cpu != cpu) {
            c->c_cpu = cpu;
            CC_UNLOCK(cc);
            goto retry;
    }
which means it will drop the lock and 'retry'
* thread2 calls callout_lock() and locks the new callout cpu.
thread1 spins on the new lock and does not make progress for the
moment.
* thread2 checks that the callout is not pending (as the callout is
currently running) and that it is not on cc->cc_curr (because cc
now refers to the new callout cpu and the callout is running on the
old callout cpu), thus it thinks it is done and returns.
* thread1 now acquires the lock and adds the callout to the new
callout cpu queue
This is an obvious race: callout_stop() falsely reports the
callout stopped or, worse, callout_drain() falsely returns
while the callout is still in use.
- Solution1:
Fixing this problem would require, in general, locking both
callout cpus at once while switching the c_cpu field, and avoiding
cyclic deadlocks between callout cpu locks.
The concept of CPUBLOCK is therefore introduced (working more or less
like the blocked_lock for the thread_lock() function), meaning:
"in callout_lock(), spin until c->c_cpu is different from
CPUBLOCK". That way the "original" callout cpu, referred to in the
above-mentioned code snippet, remains blocked until the lock
handover is over, and the critical path remains covered.
- Problem2:
Having the callout currently executing on a specific callout cpu
and simultaneously pending on another callout cpu (as can happen
with the current code) breaks, at least, the assumption that
callout_drain() returns only once the callout can no longer be
referenced.
- Solution2:
Callout migration is deferred if the current callout is already
executing.
The best place to do that is in softclock(), and new members are
added to the callout cpu structure to record that a migration is
pending. That is necessary because the callout cannot be trusted
to still exist (it may have been freed) after the callout handler
has executed.
In the "deferred migration" case, CPUBLOCK prevents the callout
from being freed by stalling any callout_stop() and
callout_drain() activity until the migration is actually
performed.
- Problem3:
There is a further race in callout_drain().
In order to avoid a race between sleepqueue lock and callout cpu
spinlock, in _callout_stop_safe(), the callout cpu lock is dropped,
the sleepqueue lock is acquired and a new callout cpu lookup is
performed. Note that the channel used for locking the sleepqueue is
obtained from the "current" callout cpu (&cc->cc_waiting).
If the callout migrated in the meantime, callout_drain() will end up
using the wrong wchan for the sleepqueue (the locked one will be the
older, while the new one will not really be locked), leading to a
lock leak and racy access to the sleepqueue.
- Solution3:
It is enough to check whether a migration happened between acquiring
the sleepqueue lock and acquiring the new callout cpu lock and, if so,
to unwind both and try again.
These problems can lead to deadly races on moderate (4-way) SMP
environments, easily leading to panics or deadlocks.
The reporter's 24-way machine could easily panic almost daily under a
completely normal workload.
gianni@ kindly wrote the following proof-of-concept, which can
panic a FreeBSD machine in less than one hour on smaller SMP systems:
http://www.freebsd.org/~attilio/callout/test.c
Reported by: Nicholas Esborn <nick at desert dot net>, DesertNet
In collaboration with: gianni, pho, Nicholas Esborn
Reviewed by: jhb
MFC after: 1 week (*)
* Usually, I would aim for a larger MFC timeout, but I really want this
in before 8.2-RELEASE, thus re@ accepted a shorter timeout as a special
case for this patch
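
The CPUBLOCK handover described in Solution1 can be sketched with C11
atomics. This is a simplified model under stated assumptions, not the
kernel code: the ex_* names are hypothetical, atomic_flag spinlocks stand
in for the callout cpu locks, and the callout carries nothing but its
owning cpu.

```c
#include <assert.h>
#include <stdatomic.h>

#define EX_CPUBLOCK	(-1)	/* hypothetical sentinel: "migration in progress" */
#define EX_NCPU		4

struct ex_callout {
	_Atomic int c_cpu;	/* owning callout cpu, or EX_CPUBLOCK */
};

/* One spinlock per callout cpu. */
static atomic_flag ex_cc_lock[EX_NCPU] = {
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Lock the callout cpu that currently owns the callout and return its index. */
static int
ex_callout_lock(struct ex_callout *c)
{
	for (;;) {
		int cpu = atomic_load(&c->c_cpu);

		if (cpu == EX_CPUBLOCK)
			continue;	/* handover in progress: keep spinning */
		while (atomic_flag_test_and_set(&ex_cc_lock[cpu]))
			;		/* spin on that cpu's lock */
		if (atomic_load(&c->c_cpu) == cpu)
			return (cpu);	/* still the owner: lock is held */
		atomic_flag_clear(&ex_cc_lock[cpu]);	/* migrated meanwhile: retry */
	}
}

static void
ex_callout_unlock(int cpu)
{
	atomic_flag_clear(&ex_cc_lock[cpu]);
}

/*
 * Migrate the callout.  Called with the old cpu's lock held; returns with
 * the new cpu's lock held.  Between the two locks c_cpu reads EX_CPUBLOCK,
 * so no other thread can lock either side on behalf of this callout.
 */
static void
ex_callout_migrate(struct ex_callout *c, int old_cpu, int new_cpu)
{
	assert(new_cpu >= 0 && new_cpu < EX_NCPU);
	atomic_store(&c->c_cpu, EX_CPUBLOCK);
	atomic_flag_clear(&ex_cc_lock[old_cpu]);
	while (atomic_flag_test_and_set(&ex_cc_lock[new_cpu]))
		;
	atomic_store(&c->c_cpu, new_cpu);
}
```

The point of the sentinel is that the two locks are never held at once,
yet a concurrent ex_callout_lock() cannot slip in between them.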
use sched_lend_user_prio to set lent priority.
- Improve the pthread priority-inherit mutex: when a contender's priority
is lowered, repropagate priorities. This may cause the mutex owner's
priority to be lowered; in the old code, the mutex owner's priority was
rise-only.
- Add flags CVWAIT_ABSTIME and CVWAIT_CLOCKID for the umtx kernel-based
condition variable; this should eliminate an extra system call to get
the current time.
- Add sub-function UMTX_OP_NWAKE_PRIVATE to wake up N channels in a single
system call. Create a userland sleep queue for condition variables. In
most cases a thread will wait in the queue, and pthread_cond_signal will
defer the thread wakeup until the mutex is unlocked. This tries to avoid
an extra system call and an extra context switch in the time window
between pthread_cond_signal and pthread_mutex_unlock.
The changes are part of the process-shared mutex project.
the parentheses around the location for simple fail points into the
location string. This makes the print on fail point set more
consistent between the two versions.
Also fix up fail.h a little for style(9): only use one of sys/param.h
and sys/types.h, and use the existing __XSTRING() macro instead of
rolling our own. Also fix up a few tabs on changed and nearby lines.
Lastly, since KFAIL_POINT_{BEGIN,END} are not meant for use outside
this file, just eliminate the macros entirely.
MFC after: 1 week
and in many respects can be thought of as a more generic superset of pfil.
Hhook provides a way for kernel subsystems to export hook points that Khelp
modules can hook to provide enhanced or new functionality to the kernel. The
KPI has been designed to ensure hook points pose no noticeable overhead when
no hook functions are registered.
- Introduce the Khelp (Kernel Helpers) KPI. Khelp provides a framework for
managing Khelp modules, which indirectly use the Hhook KPI to register their
hook functions with hook points of interest within the kernel. Khelp modules
aim to provide a structured way to dynamically extend the kernel at runtime in
an ABI preserving manner. Depending on the subsystem providing hook points, a
Khelp module may be able to associate per-object data for maintaining relevant
state between hook calls.
- pjd's Object Specific Data (OSD) KPI is used to manage the per-object data
allocated to Khelp modules. Create a new "OSD_KHELP" OSD type for use by the
Khelp framework.
- Bump __FreeBSD_version to 900028 to mark the introduction of the new KPIs.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz, others along the way
MFC after: 3 months
condition in proc_rwmem() and to (2) simplify the implementation of the
cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem()
the requested read or write could fail because the targeted page could be
reclaimed between the calls to vm_fault() and vm_page_hold().
In collaboration with: kib@
MFC after: 6 weeks
earlier commit. While here, move the thread lock down in rtp_to_pri().
It is not needed for all of the priority value checks and the computation
of newpri.
Reported by: swell.k @ gmail
MFC after: 3 days
a 32-bit one. This can cause weird timeout issues, as the copying reads
garbage from the user.
Code by: Deepak Veliath <deepak dot veliath at isilon dot com>
MFC after: 1 week
to PSARC/2010/029. In short, the semantics is simplified - "weird stuff"
no longer happens after chmod, entries don't get duplicated during
inheritance, and trivial ACLs no longer contain three "DENY" entries,
which is also more friendly to MS Windows.
By default, UFS keeps using old semantics. To change it, set sysctl
vfs.acl_nfs4_old_semantics to 0. I'll flip the switch when ZFSv28
hits the tree, to keep these two in sync - ZFS v28 uses PSARC semantics,
and ZFS v15 uses the old one.
It is possible for a lower priority thread to lend its priority to a
higher priority thread. The old code ignored this case, but the lending
should always be recorded. Add the field td_lend_user_pri to fix the
problem; if a thread does not have a borrowed priority, its value is
PRI_MAX.
MFC after: 1 week
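
A minimal sketch of the bookkeeping described above, assuming that a
lower numeric value means a more urgent priority. The structure, the ex_*
names, and the EX_PRI_MAX value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PRI_MAX	255	/* hypothetical "nothing lent" value */

/* Hypothetical, reduced thread structure. */
struct ex_thread {
	int base_user_pri;	/* the thread's own user priority */
	int lend_user_pri;	/* priority lent by others, or EX_PRI_MAX */
	int user_pri;		/* effective user priority */
};

/*
 * Record a lent priority unconditionally, then use the more urgent of
 * the lent and the base priority as the effective one.
 */
static void
ex_lend_user_prio(struct ex_thread *td, int prio)
{
	assert(td != NULL);
	td->lend_user_pri = prio;
	td->user_pri = prio < td->base_user_pri ? prio : td->base_user_pri;
}
```

Lending a less urgent priority leaves the effective priority unchanged
but is still recorded, which is the case the message says was ignored
before.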
and set *procp to NULL in all cases. Previously, it was not being set
in the ERESTART case. This is effectively a no-op, since its value is
ignored by callers in the error case.
Reviewed by: kib@
proper log message for r216150.
MFC after: 1 week
If a unix socket has a unix socket attached as the rights that has a
unix socket attached as the rights that has a unix socket attached as
the rights ... the kernel may overflow the stack when attempting to
close such a socket.
Only close the rights file in the context of the current close if the
file is not a unix domain socket. Otherwise, postpone the work to a
taskqueue, preventing unlimited recursion.
Passing unix domain sockets in SCM_RIGHTS control messages is not
widely used, and moreover, closing a socket that still has rights
attached is mostly an application failure. The change should not
affect the performance of typical users of SCM_RIGHTS.
Reviewed by: jeff, rwatson
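
The idea of postponing nested closes can be sketched as follows. This is
a user-space model with hypothetical names: a singly linked list stands
in for the taskqueue, and each file carries at most one attached right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced file structure. */
struct ex_file {
	bool is_unix_socket;
	bool closed;
	struct ex_file *rights;		/* file in flight attached to this socket, if any */
	struct ex_file *defer_next;	/* link in the deferred-close list */
};

static struct ex_file *ex_deferred;	/* stand-in for the taskqueue */

/* Close a file; nested unix sockets are queued instead of closed in place. */
static void
ex_close(struct ex_file *fp)
{
	struct ex_file *r = fp->rights;

	assert(!fp->closed);
	fp->closed = true;
	fp->rights = NULL;
	if (r == NULL)
		return;
	if (!r->is_unix_socket) {
		ex_close(r);			/* carries no rights itself: depth stays bounded */
	} else {
		r->defer_next = ex_deferred;	/* postpone: no recursion */
		ex_deferred = r;
	}
}

/* Drain the deferred list; each close may queue at most one more entry. */
static void
ex_run_deferred(void)
{
	while (ex_deferred != NULL) {
		struct ex_file *fp = ex_deferred;

		ex_deferred = fp->defer_next;
		ex_close(fp);
	}
}
```

However long the chain of sockets is, the call depth never exceeds two
frames of ex_close().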
in "struct vm_object". This is required to make it possible to account
for per-jail swap usage.
Reviewed by: kib@
Tested by: pho@
Sponsored by: FreeBSD Foundation
general LOR issue where the sysctl lock had no good place in the
hierarchy. One specific instance is #284 on
http://sources.zabbadoz.net/freebsd/lor.html .
Reviewed by: jhb
MFC after: 1 month
X-MFC-note: split oid_refcnt field for oid_running to preserve KBI
When a shared-locked vnode is supplied as an argument to vunref(9) and
the resulting usecount is 0, set VI_OWEINACT and do not try to upgrade
the vnode lock. The latter could cause a vnode unlock, allowing the vnode
to be reclaimed in the meantime.
Tested by: pho
MFC after: 1 week
tq_name was used write-only and besides it was just a pointer, so it
could point to some garbage in a temporary buffer that's gone.
This change shouldn't change KPI/KBI as struct taskqueue is private to
subr_taskqueue.c.
If we find a need for tq_name it can be resurrected at any moment.
taskqueue_create() interface is preserved for this purpose.
Suggested by: jhb
MFC after: 10 days
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.
Changes reverted:
------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines
Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.
------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines
Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
thread specific information.
In order to do that, and in order to avoid KBI breakage with existing
infrastructure, the following semantics are implemented:
- For live programs, a new member is added to PT_LWPINFO (pl_tdname)
- For cores, a new ELF note is added (NT_THRMISC) that can be used for
storing miscellaneous thread specific information. Right now it is
just populated with a thread name.
GDB then retrieves the correct information from the corefile via the
BFD interface, as it groks the ELF notes and creates the appropriate
pseudo-sections.
Sponsored by: Sandvine Incorporated
Tested by: gianni
Discussed with: dim, kan, kib
MFC after: 2 weeks
more than 1s earlier. Prior to this commit, the computation of
th_scale * delta (which produces a 64-bit value equal to the time since
the last tc_windup call in units of 2^(-64) seconds) would overflow and
any complete seconds would be lost.
We fix this by repeatedly converting tc_frequency units of the timecounter
to one second; this is not exactly correct, since it loses the NTP
adjustment, but if we find ourselves going more than 1s at a time between
clock interrupts, losing a few seconds worth of NTP adjustments is the
least of our problems...
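
The overflow and the fix can be sketched in a few lines. The ex_* names
and the reduced bintime structure are illustrative assumptions; the scale
is computed as roughly 2^64 / frequency, and NTP adjustment is left out
entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced binary time: whole seconds plus a 64-bit binary fraction. */
struct ex_bintime {
	uint64_t sec;
	uint64_t frac;	/* fraction of a second in units of 2^-64 s */
};

/* Approximately 2^64 / freq, computed without overflowing 64 bits. */
static uint64_t
ex_scale(uint64_t freq)
{
	assert(freq > 0);
	return (((UINT64_C(1) << 63) / freq) * 2);
}

/* Old behaviour: scale * delta wraps once delta covers more than 1s. */
static struct ex_bintime
ex_delta_naive(uint64_t delta, uint64_t freq)
{
	struct ex_bintime bt = { 0, ex_scale(freq) * delta };

	return (bt);
}

/* Fixed behaviour: peel off whole seconds before scaling the remainder. */
static struct ex_bintime
ex_delta_fixed(uint64_t delta, uint64_t freq)
{
	struct ex_bintime bt = { 0, 0 };

	while (delta > freq) {
		delta -= freq;
		bt.sec++;
	}
	bt.frac = ex_scale(freq) * delta;
	return (bt);
}
```

For a 1000 Hz counter and a delta of 2500 ticks, the fixed version yields
2 seconds plus a fraction of about one half, while the naive version
reports no whole seconds at all.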
on FreeBSD (amd64), invocations of "javac" (or "java") eventually
end with the output of "Killed" and exit code 137.
This is caused by:
1. After exec() is called in a multithreaded linux program, the threads
are not destroyed and continue running. They get killed after the
program being executed finishes.
2. linux_exit_group doesn't return the correct exit code when it is not
called from the group leader, which happens regularly with the sun jvm.
The submitters fix this in a similar way to how NetBSD handles this.
I took the PRs away from dchagin, who seems to have been out of touch
with this for a while (no response from him).
The patches committed here are from [2], with some small style
modifications from me.
PR: 141439 [1], 144194 [2]
Submitted by: Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk
Reviewed by: rdivacky (in April 2010)
MFC after: 5 days
and vop_reclaim() methods. They seem to be unused, and the reported
situation is normal for a forced unmount.
MFC after: 1 week
X-MFC-note: keep prtactive symbol in vfs_subr.c
This way, when there is a dependency between two modules, the handler of the
one probed later runs first.
This is similar to the way modules in the same linkerfile are
unloaded.
Sponsored by: Sandvine Incorporated
Submitted by: Nima Misaghian <nmisaghian at sandvine dot com>
MFC after: 1 week
supported rather than 1. They are supposed to return a suitable value
for sysconf(3). While here, make the fsync sysctl match <unistd.h>.
MFC after: 1 week
during boot.
Change the last argument of gets() to indicate a visibility flag and add
definitions for the numerical constants. Except for the value 2, gets()
will behave exactly the same, so existing consumers shouldn't break. We
only use it in two places, though.
Submitted by: lme (older version)
you tag a socket with a uint32_t value. The cookie can then be
used by the kernel for various purposes, e.g. setting the skipto
rule or pipe number in ipfw (this is the reason SO_USER_COOKIE has
been implemented; however there is nothing ipfw-specific in its
implementation).
The ipfw-related code that uses the option will be committed separately.
This change adds a field to 'struct socket', but the struct is not
part of any driver or userland-visible ABI so the change should be
harmless.
See the discussion at
http://lists.freebsd.org/pipermail/freebsd-ipfw/2009-October/004001.html
Idea and code from Paul Joe, small modifications and manpage
changes by myself.
Submitted by: Paul Joe
MFC after: 1 week
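
From userland, tagging a socket would look roughly like the sketch below.
The ex_tag_socket() wrapper is hypothetical; the fallback branch is an
assumption added so the sketch also builds on systems whose headers do
not define SO_USER_COOKIE.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Tag socket s with a 32-bit cookie.  Returns 0 on success and -1 with
 * errno set on failure, like setsockopt(2) itself.
 */
static int
ex_tag_socket(int s, uint32_t cookie)
{
	static_assert(sizeof(cookie) == 4, "the cookie is a 32-bit value");
#ifdef SO_USER_COOKIE
	return (setsockopt(s, SOL_SOCKET, SO_USER_COOKIE, &cookie,
	    sizeof(cookie)));
#else
	/* Option not available on this system. */
	(void)s;
	(void)cookie;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

A caller would create the socket as usual, call ex_tag_socket(s, value)
once, and leave it to kernel consumers to interpret the value.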
contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
in the _mtx_* or __mtx_* namespaces. While here, change the names to more
closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.
Suggested by: bde (1, 2)