The previous commit also included changes to all the system call lists,
but it is a tradition to update these lists in a second commit, so rerun
make sysent to update the $FreeBSD$ tags inside these files to refer to
the latest version of syscalls.master.
Requested by: rwatson
For the last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:
- Improved driver model:
The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.
If a PPP implementation were built on top of the new TTY layer
(it still needs a hooks layer, though), it could hand the data
directly to the TTY driver.
- Improved hotplugging:
With the old TTY layer, it isn't entirely safe to destroy TTYs from
the system. This implementation has a two-step destruction design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc); see the sketch after
this list.
The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTYs that are created on the fly.
- Improved performance:
One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is used
on both TTY device nodes and PTY masters.
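A minimal sketch of the two-step destruction mentioned above. The
foo_* driver names, softc layout, unit number allocator and malloc
type are hypothetical; tty_rel_gone() and the tsw_free callback are
the interfaces the new layer provides:

    static void
    foo_detach(struct foo_softc *sc)
    {

            /* Step 1: abandon the TTY; no new threads can enter it. */
            tty_rel_gone(sc->sc_tty);
    }

    static void
    foo_tsw_free(void *softc)
    {
            struct foo_softc *sc = softc;

            /* Step 2: runs once the last thread has left the TTY. */
            free_unr(foo_unrhdr, sc->sc_unit);
            free(sc, M_FOO);
    }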
Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.
Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
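As a hedged illustration of what the mapping macros look like (the
variable chosen here is only an example):

    /* The global itself is untouched for now... */
    extern int      ipforwarding;
    /* ...and a macro maps the virtualized name back onto it. */
    #define V_ipforwarding  ipforwarding

    /* Consumers are rewritten to spell the virtualized name: */
    if (V_ipforwarding == 0)
            return;         /* host is not acting as a router */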
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
- Speed up the lock-ordering lookup by turning the witness graph from a
linked tree into a matrix. A lookup table caches the lock orderings so
that they can be accessed in O(1). Every witness object has a unique
index within this lookup cache table (see the sketch after this list).
- Reduce the lock contention on w_mtx by acquiring it only when a LOR
actually happens and not in the sane case. In order to do this, don't
totally flush the lock lists (the per-CPU spinlock list and the
per-thread sleeplock list), but check ll_count whenever we need to
verify allocation sanity.
- Introduce the function witness_thread_exit() in the witness namespace,
which verifies that a thread doesn't hold any witness occurrence while
exiting.
- Rename the sysctl debug.witness.graphs to debug.witness.fullgraph and
add debug.witness.badstacks, which prints out the stacks for revealed
LORs. This is implemented using the stack(9) support, which makes
WITNESS depend on the STACK option or on the DDB option (which
includes STACK).
- Fix style(9) for src/sys/kern/subr_witness.c
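An illustrative sketch of the O(1) lookup mentioned in the first item;
the table and field names here are placeholders, not necessarily the
ones used in subr_witness.c:

    /* Every witness is assigned a unique index at initialization. */
    static unsigned char    w_order_cache[WITNESS_COUNT][WITNESS_COUNT];

    static int
    witness_order_cached(struct witness *parent, struct witness *child)
    {

            /* One table lookup instead of walking a linked tree. */
            return (w_order_cache[parent->w_index][child->w_index] != 0);
    }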
The hash table approach was developed by Ilya Maykov on behalf of
Isilon Systems, which kindly released the patch.
Jeff Roberson ported the patch to -CURRENT and fixed the w_mtx
contention on behalf of Nokia.
Submitted by: Ilya Maykov <ivmaykov at gmail dot com> (Isilon Systems), jeff
Sponsored by: Nokia
the various copyouts associated with initializing the process's
argv/env data in userspace. It is possible that these copyout
operations can fault under memory pressure, possibly resulting
in deadlocks. This is believed to be safe since none of the
copyout_strings() operations need to interact with the vnode here.
Submitted by: Zhouyi Zhou
PR: kern/111260
Discussed with: kib
MFC after: 3 weeks
There is no reason the fdopen() routine needs Giant. It only sets
curthread->td_dupfd, based on the device unit number of the cdev.
I guess we won't get massive performance improvements here, but still, I
assume we eventually want to get rid of Giant.
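For reference, the routine boils down to roughly this (a sketch, not
the verbatim source):

    static int
    fdopen(struct cdev *dev, int mode, int type, struct thread *td)
    {

            /* Stash the descriptor number encoded in the unit... */
            td->td_dupfd = dev2unit(dev);
            /* ...and let the open path perform the actual dup. */
            return (ENODEV);
    }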
msleep/mtx_sleep or the various cv_*wait*() routines. Currently, the
"unlock" behavior of PDROP and cv_wait_unlock() with Giant is not
permitted, as it would be confusing since Giant is fully unrecursed and
unlocked during a thread sleep.
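For example, a Giant-locked consumer can now do something like this
(sketch; the wait channel, wmesg and timeout are made up):

    int error;

    mtx_assert(&Giant, MA_OWNED);
    /*
     * Giant is fully unrecursed and released across the sleep and
     * reacquired before mtx_sleep() returns; PDROP is not allowed.
     */
    error = mtx_sleep(sc, &Giant, PRIBIO, "foowait", hz);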
This is handy for subsystems which wish to allow unlocked drivers to
continue to use Giant, such as CAM, the new TTY layer, and the new USB
stack. CAM currently uses a hack that I told Scott to use because I
really didn't want to permit this behavior, and the TTY and USB patches
both carry various workarounds to permit it.
MFC after: 2 weeks
routine wakes up proc0 so that proc0 can swap the thread back in.
Historically, this has been done by waking up proc0 directly from
setrunnable() itself via a wakeup(). When waking up a sleeping thread
that was swapped out (the usual case when waking proc0 since only sleeping
threads are eligible to be swapped out), this resulted in a bit of
recursion (e.g. wakeup() -> setrunnable() -> wakeup()).
With sleep queues having separate locks in 6.x and later, this caused a
spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).
An attempt was made to fix this in 7.0 by making the proc0 wakeup use
the ithread mechanism for doing the wakeup. However, this required
grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep
elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated
into the same LOR since the thread lock would be some other sleepq lock.
Fix this by deferring the wakeup of the swapper until after the sleepq
lock held by the upper layer has been locked. The setrunnable() routine
now returns a boolean value to indicate whether or not proc0 needs to be
woken up. The end result is that consumers of the sleepq API, such as
*sleep/wakeup, condition variables, sx locks, and lockmgr, have to wake up
proc0 if they get a non-zero return value from sleepq_abort(),
sleepq_broadcast(), or sleepq_signal().
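The consumer-side pattern then looks roughly like this (a sketch; it
assumes a small kick_proc0()-style helper that performs the actual
proc0 wakeup):

    int wakeup_swapper;

    sleepq_lock(wchan);
    wakeup_swapper = sleepq_broadcast(wchan, SLEEPQ_SLEEP, 0, 0);
    sleepq_release(wchan);
    if (wakeup_swapper)
            kick_proc0();   /* helper that wakes the swapper (proc0) */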
Discussed with: jeff
Glanced at by: sam
Tested by: Jurgen Weber jurgen - ish com au
MFC after: 2 weeks
lstat(2) is called on symlinks -- this code appears never to have
worked. The PR this addresses suggests that the intended
original behavior is the right one, but as bde points out in the
PR comments, we do actually support storing a mode on symlinks,
so returning it seems reasonable.
This is consistent with Mac OS X, which, despite documentation to
the contrary, does return the mode set on a symlink, but not with
some other platforms. The Single Unix Spec requires only that the
returned bits be "meaningful", which seems at best unhelpful as
advice goes.
PR: 25018
MFC after: 3 days
vnode lock may cause a LOR between kld_sx lock and vnode lock.
linker_load_dependencies() drops kld_sx, and another thread may attempt
to load the same kld.
Reported and tested by: pjd
MFC after: 1 week
processes are not producing absolute pathname tokens. It is required
that audited pathnames are generated relative to the global root mount
point. This modification changes our implementation of audit_canon_path(9)
and introduces a new function: vn_fullpath_global(9) which performs a
vnode -> pathname translation relative to the global mount point based
on the contents of the name cache. Much like vn_fullpath,
vn_fullpath_global is a wrapper function which calls vn_fullpath1.
Further, the string parsing routines have been converted to use the
sbuf(9) framework. This change also removes the conditional acquisition
of Giant, since the vn_fullpath1 method will not dip into file system
dependent code.
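Condensed, the wrapper looks roughly like this (a sketch; NULL checks
and some error handling are trimmed):

    int
    vn_fullpath_global(struct thread *td, struct vnode *vn,
        char **retbuf, char **freebuf)
    {
            char *buf;
            int error;

            buf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK);
            /* Resolve against the global root, not the caller's root. */
            error = vn_fullpath1(td, vn, rootvnode, buf, retbuf, MAXPATHLEN);
            if (error == 0)
                    *freebuf = buf;
            else
                    free(buf, M_TEMP);
            return (error);
    }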
The vnode locking was modified to use vhold()/vdrop() instead of
vref() and vrele(). This modifies the hold count instead of the user
count, which makes more sense since it's the kernel that requires the
reference to the vnode. This also makes sure that the vnode does not
get recycled while we hold the reference to it. [1]
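In other words, the change in [1] boils down to:

    vhold(vp);      /* was: vref(vp)  -- take a hold reference   */
    /* ... the vnode cannot be recycled while audit uses it ...  */
    vdrop(vp);      /* was: vrele(vp) -- release the hold        */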
Discussed with: rwatson
Reviewed by: kib [1]
MFC after: 2 weeks
It seems we only use `lbolt' inside the VFS syncer and the TTY layer
now. Because I'm planning to replace the TTY layer next month, there's
no reason to keep `lbolt' if it's only used in a single thread inside
the kernel.
Because the syncer code wanted to wake up the syncer thread before the
timeout, it called sleepq_remove(). Because we now just use a condvar(9)
with a timeout value of `hz', we can wake it up using cv_broadcast()
without waking up any unrelated threads.
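Roughly, the syncer's wait now becomes (a sketch; the condvar name is
illustrative):

    /* Sleep for at most one second, or until someone kicks us. */
    cv_timedwait(&sync_wakeup, &sync_mtx, hz);

    /* ...and speeding up the syncer is simply: */
    cv_broadcast(&sync_wakeup);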
Reviewed by: phk
After the import of the new TTY layer, the TTY_QUOTE definition will not
be present anymore. To make sure clists will still work as expected,
introduce an internal definition called QUOTEMASK.
Maybe we can decide to remove the quote bits entirely, but we still have
to look into this. There may be drivers that still use the quote bits.
Obtained from: //depot/projects/mpsafetty
- When a cpuset is applied to a thread, walk the cpuset to see if it is a
"full" cpuset (includes all available CPUs). If not, set a new
TDS_AFFINITY flag to indicate that this thread can't run on all CPUs.
When inheriting a cpuset from another thread during thread creation, the
new thread also inherits this flag. It is in a new ts_flags field in
td_sched rather than using one of the TDF_SCHEDx flags because fork()
clears td_flags after invoking sched_fork().
- When placing a thread on a runqueue via sched_add(), if the thread is not
pinned or bound but has the TDS_AFFINITY flag set, then invoke a new
routine (sched_pickcpu()) to pick a CPU for the thread to run on next.
sched_pickcpu() walks the cpuset and picks the CPU with the shortest
per-CPU runqueue length. Note that the reason for the TDS_AFFINITY flag
is to avoid having to walk the cpuset and examine runq lengths in the
common case (see the sketch after this list).
- To avoid walking the per-CPU runqueues in sched_pickcpu(), add an array
of counters to hold the length of the per-CPU runqueues and update them
when adding and removing threads to per-CPU runqueues.
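A simplified sketch of the selection done by sched_pickcpu(); the
runqueue-length array name is illustrative:

    static int
    sched_pickcpu(struct thread *td)
    {
            int best, cpu;

            best = -1;
            for (cpu = 0; cpu <= mp_maxid; cpu++) {
                    if (CPU_ABSENT(cpu) ||
                        !CPU_ISSET(cpu, &td->td_cpuset->cs_mask))
                            continue;
                    /* Prefer the allowed CPU with the shortest runqueue. */
                    if (best == -1 ||
                        runq_length[cpu] < runq_length[best])
                            best = cpu;
            }
            return (best);
    }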
MFC after: 2 weeks
- Check whether panicstr is set; if it is, ignore the lock. This helps to
avoid confusion, because lockmgr is a no-op when panicstr isn't NULL, so
asserting anything at this point doesn't make sense and can just race
with another panic.
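In other words, the assertion path now starts with a check along these
lines:

    if (panicstr != NULL)
            return;         /* lockmgr is a no-op after a panic */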
Discussed with: kib
The ttyinfo() routine generates the fancy output when pressing ^T. Right
now it is stored in tty.c. In the MPSAFE TTY code it is already stored
in tty_info.c. To make integration of the MPSAFE TTY code a little
easier, take the same approach.
This makes the TTY code a little bit more readable, because having the
proc_*/thread_* routines in tty.c is very distracting.
Approved by: philip (mentor)
child process immediately after bulk bcopy() without dropping the
process lock.
Since the process is not single-threaded when forking, dropping and
reacquiring the lock allows another thread to change the process title
of the parent in between, which results in the hold being taken on an
invalid pointer. The problem manifested itself as a double free of the
old p_args.
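The fix keeps the hold inside the same locked region, roughly (a
simplified sketch of the fork path; the copy length is elided):

    PROC_LOCK(p1);
    /* bulk copy of the inherited portion of struct proc */
    bcopy(&p1->p_startcopy, &p2->p_startcopy, copylen);
    /* take the p_args reference before the lock is dropped */
    pargs_hold(p2->p_args);
    PROC_UNLOCK(p1);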
Reported by: kris
Reviewed by: jhb
MFC after: 1 week
and there is no need to maintain it.
- Fix vn_get() so that it calls vget(9) with a valid locking request.
vget(9) returns the vnode locked in order to prevent recycling, but in
this case internal XFS locks already prevent that from happening, so
it is safe to drop the vnode lock before returning from vn_get().
- Add a VNASSERT() in vget(9) in order to catch malformed locking requests.
Discussed with: kan, kib
Tested by: Lothar Braun <lothar at lobraun dot de>
interrupt-driven configuration handlers to complete, print out a
diagnostic message every 60 seconds indicating which handlers are
still running. Do this at most 5 times per run so as to avoid
scrolling out any useful information from the kernel message
buffer.
The interval of 60 seconds was selected based on a best guess as
to the nature of "long enough" and may want to be tuned higher
or lower depending on real-world tolerances.
MFC after: 3 days
Discussed with: scottl
for completion in run_interrupt_driven_config_hooks(). This is
helpful when trying to figure out which device drivers have gone
into la-la land during boot-time autoconfiguration.
MFC after: 3 days
- When a tick occurs on a CPU, iterate from cc_softticks until ticks.
The per-cpu tick processing happens asynchronously with the actual
adjustment of the 'ticks' variable. Sometimes the results may
be visible before the local call and sometimes after. Previously this
could cause a one tick window where we didn't evaluate the bucket.
- In softclock fetch curticks before incrementing cc_softticks so we
don't skip insertions which were made for the current time.
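A rough sketch of the second point (not the verbatim code):

    curticks = ticks;               /* snapshot the global first */
    while (cc->cc_softticks != curticks) {
            cc->cc_softticks++;
            bucket = &cc->cc_callwheel[cc->cc_softticks & callwheelmask];
            /* run or requeue the callouts hashed into this bucket */
    }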
Sponsored by: Nokia
sched_tick() to prevent multiple increments for one tick. This pushes
the value out of range and breaks priority calculation.
Reviewed by: kib
Found by: pho/nokia
Sponsored by: Nokia
MFC after: 3 days