called.
- vfs_getvfs has to return a reference to prevent the returned mountpoint
from changing identities.
- Release references acquired via vfs_getvfs.
Discussed with: tegge
Tested by: kris
Sponsored by: Isilon Systems, Inc.
mount memory from being reclaimed. This resolves a number of race
conditions described in vfs_default.c and introduced with the
VFS_LOCK_GIANT macros.
- Let the mtx and lock remain valid after the mount structure has been
freed by using init and fini calls. Technically fini will never be
called but is included for completeness.
- Consistently use lockmgr directly rather than lockmgr to lock and
vfs_unbusy to unlock.
Discussed with: tegge
Tested by: kris
Sponsored by: Isilon Systems, Inc.
- Move the vn_lock of the dvp until after we've unbusied the filesystem
to avoid a LOR with the mount point lock.
- In the v_mountedhere while loop we acquire a new instance of giant each
time through without releasing the first. This would cause us to leak
Giant.
Sponsored by: Isilon Systems, Inc.
requires Giant. It is set in bgetvp and cleared in brelvp.
- Create QUEUE_DIRTY_GIANT for dirty buffers that require giant.
- In the buf daemon, only grab giant when processing QUEUE_DIRTY_GIANT and
only if we think there are buffers in that queue.
Sponsored by: Isilon Systems, Inc.
failing, print a message when we fail for some reason as most callers do
not check the return value (e.g. 'cuz they're called from SYSINIT)
Reviewed by: scottl
MFC after: 1 week
controllers typically have multiple channels and support a number
of serial communications protocols. The scc(4) driver is itself
an umbrella driver that delegates the control over each channel
and mode to a subordinate driver (like uart(4)).
The scc(4) driver supports the Siemens SAB 82532 and the Zilog
Z8530 and replaces puc(4) for these devices.
a lock's priority to a sleeping thread. When we panic, dump a stack
trace of the thread that is asleep if DDB is compiled into the kernel
just before calling panic(). This is much more informative and useful
for debugging than the current behavior of getting a page fault and not
having an easy way of determining which thread caused the original problem.
MFC after: 1 week
a race where data could come in before we clear the INFLUX flag, and get
skipped over by knote (and hence never be activated, though it should of
been)...
Found by: glebius & co.
Reviewed by: glebius
MFC after: 3 days
generating a coredump as the result of a signal.
- Fix a bug where we could leak a Giant lock if vn_start_write() failed
in coredump().
Reported by: jmg (2)
and use that instead of testing fdidx against -1 to determine if it should
release Giant if Giant was locked due to the requested file residing on a
non-MPSAFE VFS.
Discussed with: jeff
arguments. The first one is never used (all callers pass in 0); the
second is sometimes used to pass in a struct timespec * which is used as
a timeout and never modified. Constify that argument so callers can pass
a const struct timespec * without jumping through hoops.
acquiring Giant in kern_sendfile().
Guard against the forced reclamation of a vnode in kern_sendfile().
Discussed with: jeff
Reviewed by: tegge
MFC after: 3 weeks
REGRESSION is enabled, allows user space to dictate that sonewconn()
should skip it's "skip the hard work" check to see if the listen
queue is full, and instead proceed with allocation of a socket and
trimming of the overflowed queue. This makes it easier to test the
queue overflow logic.
MFC after: 1 month
Kernel changes:
Inform hwpmc of executable objects brought into the system by
kldload() and mmap(), and of their removal by kldunload() and
munmap(). A helper function linker_hwpmc_list_objects() has been
added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve
the list of currently loaded kernel modules.
The unused `MAPPINGCHANGE' event has been deprecated in favour
of separate `MAP_IN' and `MAP_OUT' events; this change reduces
space wastage in the log.
Bump the hwpmc's ABI version to "2.0.00". Teach hwpmc(4) to
handle the map change callbacks.
Change the default per-cpu sample buffer size to hold
32 samples (up from 16).
Increment __FreeBSD_version.
libpmc(3) changes:
Update libpmc(3) to deal with the new events in the log file; bring
the pmclog(3) manual page in sync with the code.
pmcstat(8) changes:
Introduce new options to pmcstat(8): "-r" (root fs path), "-M"
(mapfile name), "-q"/"-v" (verbosity control). Option "-k" now
takes a kernel directory as its argument but will also work with
the older invocation syntax.
Rework string handling in pmcstat(8) to use an opaque type for
interned strings. Clean up ELF parsing code and add support for
tracking dynamic object mappings reported by a v2.0.00 hwpmc(4).
Report statistics at the end of a log conversion run depending
on the requested verbosity level.
Reviewed by: jhb, dds (kernel parts of an earlier patch)
Tested by: gallatin (earlier patch)
VFS_LOCK_GIANT/VFS_UNLOCK_GIANT calls. This completely removes Giant
acquisition in the syscall path for ffs.
Bug fix to kern_fhstatfs from: Todd Miller <Todd.Miller@sparta.com>
Sponsored by: Isilon Systems, Inc.
"fdinit() fails to initialize newfdp->fd_fd.fd_lastfile to -1. This breaks
fdcopy() which will incorrectly set newfdp->fd_freefile to 1 if no files are
open and the last file descriptor marked as unused for fdp was 0. This later
causes descriptor 0 to be unavailable in newfdp when the optimization is
enabled.
When the last file descriptor previously marked as used is nonzero and marked
as unused, fdunused() incorrectly sets fdp->fd_lastfile to fd - 1 due to
fd_last_used() returning (size - 1). This hides the problem that breaks the
optimization."
This allows us to keep the optimization, while un-breaking it.
This is a RELENG_6 candidate.
PR: kern/87208
MFC after: 1 week
Submitted by: tegge
the target directory or file. This case should fail in the filesystem
anyway and perhaps kern_rename() should catch it.
Sponsored by: Isilon Systems, Inc.
really breaking things. Simple "close(0); dup(fd)" does not return descriptor
"0" in some cases. Further, this change also breaks some MAC interactions with
mac_execve_will_transition(). Under certain circumstances, fdcheckstd() can
be called in execve(2) causing an assertion that checks to make sure that
stdin, stdout and stderr reside at indexes 0, 1 and 2 in the process fd table
to fail, resulting in a kernel panic when INVARIANTS is on.
This should also kill the "dup(2) regression on 6.x" show stopper item on the
6.1-RELEASE TODO list.
This is a RELENG_6 candidate.
PR: kern/87208
Silence from: des
MFC after: 1 week
defined for an in-use socket. This allows us to eliminate countless tests
of whether so_pcb is non-NULL, eliminating dozens of error cases. For
now, retain the call to sotryfree() in the uipc_abort() path, but this
will eventually move to soabort().
These new assumptions should be largely correct, and will become more so
as the socket/pcb reference model is fixed. Removing the notion that
so_pcb can be non-NULL is a critical step towards further fine-graining
of the UNIX domain socket locking, as the so_pcb reference no longer
needs to be protected using locks, instead it is a property of the socket
life cycle.
consumers ignore the return value, soabort() is required to succeed,
and protocols produce errors here to report multiple freeing of the
pcb, which we hope to eliminate.
specified, the rightmost option takes effect." Fix code to obey
this. This makes e.g. "mount -r /usr" or "mount -ar" actually
mount file systems read-only.
Fix detection of active unlinked files by checking VI_OWEINACT and
VI_DOINGINACT in addition to v_usecount.
Defer inactive handling for unlinked files if the file system is mostly
suspended (secondary writes being blocked).
Perform deferred inactive handling after the file system is resumed.
triggers.
This should eliminate all the trivial messages which result from minor
increases in cpu_tick frequency.
Machines which don't du cpu clock fiddling shouldn't issue "backwards"
messages now.
Laptops and other machines where the initial estimate of cputicks may be
waaaay off will still issue warnings.
replacement for vn_write_suspend_wait() to better account for secondary write
processing.
Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.
Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
whether or not to allocate a full mbuf cluster rather than just a plain
mbuf when adding on additional mbufs in m_getm(). In practice, there wasn't
any resulting mem trashing since m_getm() doesn't ever allocate an mbuf with
a packet header, and MINCLSIZE is the available payload in an mbuf with a
header rather than the available payload in a plain mbuf.
Discussed with: andre (lightly)
Releasing items from the mt_zone can not be done by a simple
uma_zfree() call since mt_zone is allocated with the UMA_ZONE_MALLOC
flag. Use uma_zfree_arg instead and supply the slab.
This bug caused panics in low memory situations on unloading kernel
modules containing MALLOC_DEFINE(..) statements.
Submitted by: ups
be called without any vnode locks held. Remove calls to vn_start_write() and
vn_finished_write() in vnode_pager_putpages() and add these calls before the
vnode lock is obtained to most of the callers that don't already have them.
has many positive effects including improved smp locking, reducing
interdependencies between mounts that can lead to deadlocks, etc.
- Add the softdep worklist and various counters to the ufsmnt structure.
- Add a mount pointer to the workitem and remove mount pointers from the
various structures derived from the workitem as they are now redundant.
- Remove the poor-man's semaphore protecting softdep_process_worklist and
softdep_flushworklist. Several threads may now process the list
simultaneously.
- Add softdep_waitidle() to block the thread until all pending
dependencies being operated on by other threads have been flushed.
- Use softdep_waitidle() in unmount and snapshots to block either
operation until the fs is stable.
- Remove softdep worklist processing from the syncer and move it into the
softdep_flush() thread. This thread processes all softdep mounts
once each second and when it is called via the new softdep_speedup()
when there is a resource shortage. This removes the softdep hook
from the kernel and various hacks in header files to support it.
Reviewed by/Discussed with: tegge, truckman, mckusick
Tested by: kris
so other threads can not see it if we unlock the proc
lock (this can happen in knlist_delete). Don't do wakeup,
it is not necessary.
2. Decrease kaio_buffer_count in biohelper rather than
doing it in aio_bio_done_notify.
3. In aio_bio_done_notify, don't send notification if KAIO_RUNDOWN
was set, because the process is already in single thread mode.
4. Use assignment to initialize aiothreadflags.
5. AIOCBLIST_RUNDOWN is not useful, axe the code using it.
6. use LIO_NOP instead of zero.
callout_drain() logic. We no longer need a separate non-spin mutex to
do sleep/wakeup with, instead we can now just use the one spin mutex to
manage all the callout functionality.
the last reference is dropped. I forgot that vnodes can stick around
for a very long time until processes discover that they are dead. This
means that a vnode reference is not sufficient to keep the mount
referenced and even more code will be required to ref mount points.
Discovered by: kris
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
stop event earlier. After we have signalled that, we set P_WEXIT and
then wait for any processes with a hold on the vmspace via PHOLD to
release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is
invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
to zero.
- Change proc_rwmem() to require that the processing read from has its
vmspace held via PHOLD by the caller and get rid of all the junk to
screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
to clear an earlier single-step simualted via a breakpoint). We only
do one to avoid races. Also, by making the EINVAL error for unknown
requests be part of the default: case in the switch, the various
switch cases can now just break out to return which removes a _lot_ of
duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug
where a LWP ptrace command could return EINVAL with the proc lock still
held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
ptrace_clear_single_step() to always be called with the proc lock
held (it was a mixed bag previously). Alpha and arm have to drop
the lock while the mess around with breakpoints, but other archs
avoid extra lock release/acquires in ptrace(). I did have to fix a
couple of other consumers in kern_kse and a few other places to
hold the proc lock and PHOLD.
Tested by: ps (1 mostly, but some bits of 2-4 as well)
MFC after: 1 week
modules prior to looking up the directory which we will cover to avoid
this problem in mount.
- We must hold the coveredvp locked before we can busy the mountpoint to
prevent a lock order reversal with the vfs_busy() in lookup which holds
the directory lock prior to doing a vfs_busy(). The directory lock is
required to safely clear the v_mountedhere field on the directory.
MFC After: 1 week
prevent the mount point from going away while we're waiting on the lock.
The ref does not need to persist once we have the lock because the
lock prevents the mount point from being unmounted.
MFC After: 1 week
the VFS_STATFS call to prevent the mount from disappearing while we're
stating.
- Convert these routines to use MPSAFE namei semantics.
MFC After: 1 week
a calcru() wrapper that passes a local rusage_ext on the stack that is
a snapshot to do the calculations on. Now we can pass p->p_crux to
calcru1() in calccru() again which fixes the issues with runtime going
backwards messages when dead processes are harvested by init.
Reviewed by: phk
Tested by: Stefan Ehmann shoesoft at gmx dot net