used to ensure that we weren't exiting the syscall with a lock still
held. This wasn't safe, however, because we'd already executed a vput()
and on a loaded system the vnode may have been freed by the time we
assert. This functionality is also handled by the td_locks assert in
userret, which doesn't tell you what the syscall was, but will at least
panic before you deadlock.
Sponsored by: Isilon Systems, Inc.
Discovered by: Peter Holm
Approved by: re (blanket vfs)
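The idea behind the td_locks check can be sketched as a per-thread counter that is bumped on every lock acquire and checked on return to user mode. Because the counter lives in the thread, checking it never dereferences a vnode that a prior vput() may already have freed. This is a minimal sketch with hypothetical names; where the kernel would panic, it reports an error so the behavior can be tested.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the per-thread lock counter. */
struct thread {
	int	td_locks;	/* number of locks currently held */
};

static void
lock_acquire(struct thread *td)
{
	td->td_locks++;
}

static void
lock_release(struct thread *td)
{
	assert(td->td_locks > 0);	/* releasing a lock we never took */
	td->td_locks--;
}

/*
 * Checked on the way back to user mode.  It cannot name the syscall that
 * leaked the lock, but it fires before the leak can cause a deadlock.
 * Returns 0 if no locks are held, -1 otherwise (the kernel would panic).
 */
static int
userret_check(const struct thread *td)
{
	if (td->td_locks != 0) {
		fprintf(stderr, "userret: returning with %d locks held\n",
		    td->td_locks);
		return (-1);
	}
	return (0);
}
```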
anyway and it's not used outside of vfs_subr.c.
- Change vgonel() to accept a parameter which determines whether or not
we'll put the vnode on the free list when we're done.
- Use the new vgonel() parameter rather than VI_DOOMED to signal our
intentions in vtryrecycle().
- In vgonel(), return if VI_DOOMED is already set; this vnode has already
been reclaimed.
Sponsored by: Isilon Systems, Inc.
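The two vgonel() changes can be sketched together: the caller's intent travels in an explicit parameter rather than in VI_DOOMED, and VI_DOOMED only means "already reclaimed", which makes a second call a no-op. The struct fields, the flag value, and the function name below are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define VI_DOOMED	0x0001	/* illustrative value: vnode was reclaimed */

struct vnode {
	int	v_iflag;
	bool	v_onfreelist;	/* hypothetical: models free-list membership */
	int	v_reclaims;	/* hypothetical: counts reclaim work performed */
};

/*
 * 'putonfree' says whether the caller wants the vnode placed on the free
 * list afterwards, instead of encoding that intent in VI_DOOMED.
 */
static void
vgonel_sketch(struct vnode *vp, bool putonfree)
{
	/* Already reclaimed: nothing left to do. */
	if (vp->v_iflag & VI_DOOMED)
		return;
	vp->v_iflag |= VI_DOOMED;
	vp->v_reclaims++;	/* stands in for the filesystem's reclaim */
	if (putonfree)
		vp->v_onfreelist = true;
}
```

Calling vgonel_sketch() twice on the same vnode performs the reclaim work only once, and the second call's putonfree argument is ignored.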
most of the code to deal with them has been dead for some time. Simplify
the code by doing an insert sort hinted by the current head position.
Met with apathy by: arch@
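The message does not name the queue, so the following treats it as a generic one-way elevator queue keyed by block offset. Requests at or past the current head position are served first in ascending order, then the scan wraps to the requests behind the head. Each new request is placed with one insertion-sort step. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct req {
	uint64_t	 offset;	/* block offset of the request */
	struct req	*next;
};

/*
 * Does 'a' sort before 'b' given the head position?  Offsets at or past
 * the head rank first; offsets behind the head rank after them.
 */
static int
before(uint64_t a, uint64_t b, uint64_t head)
{
	int ra = a < head, rb = b < head;

	if (ra != rb)
		return (ra < rb);
	return (a < b);
}

/* Insertion sort step: place 'r' into the already-ordered queue '*qp'. */
static void
queue_insert(struct req **qp, struct req *r, uint64_t head)
{
	/* Skip entries that sort before or equal to 'r' (keeps FIFO on ties). */
	while (*qp != NULL && !before(r->offset, (*qp)->offset, head))
		qp = &(*qp)->next;
	r->next = *qp;
	*qp = r;
}
```

With the head at 50, inserting offsets 10, 70, 60, 20 and 55 yields the service order 55, 60, 70, 10, 20.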
I introduce a very small race here (some file system can be mounted or
unmounted between 'count' calculation and file systems list creation),
but it is harmless.
Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by: Peter Holm <peter@holm.cc>
It can be used to panic the kernel by passing a value that is too large.
Fix it by moving allocation and size verification into kern_getfsstat().
This even simplifies kern_getfsstat() consumers, but destroys symmetry -
memory is allocated inside kern_getfsstat(), but has to be freed by the
caller.
Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by: Peter Holm <peter@holm.cc>
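Moving allocation and size verification into the callee can be sketched as follows. The entry count is computed first, the caller-supplied size is clamped to it, and only then is memory allocated, so a huge size argument can no longer drive a huge allocation. Since at most the counted number of entries is written, a mount appearing between the count and the fill is simply not reported, which is why the race mentioned earlier is harmless. The types and the function name are hypothetical stand-ins for struct statfs and kern_getfsstat().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct statfs_sketch {		/* hypothetical stand-in for struct statfs */
	char	f_mntonname[32];
};

/*
 * Returns the number of entries filled in, or -1 if allocation fails.
 * Memory is allocated here, but the caller must free *bufp.
 */
static int
getfsstat_sketch(const char *const *mounts, size_t nmounts, size_t bufsize,
    struct statfs_sketch **bufp)
{
	size_t count, maxcount, i;

	*bufp = NULL;
	count = nmounts;			/* 'count' calculation */
	maxcount = bufsize / sizeof(struct statfs_sketch);
	if (maxcount > count)
		maxcount = count;		/* size verification */
	if (maxcount == 0)
		return (0);
	*bufp = calloc(maxcount, sizeof(**bufp));
	if (*bufp == NULL)
		return (-1);
	/* Fill at most maxcount entries; calloc leaves names NUL-terminated. */
	for (i = 0; i < maxcount; i++)
		strncpy((*bufp)[i].f_mntonname, mounts[i],
		    sizeof((*bufp)[i].f_mntonname) - 1);
	return ((int)i);
}
```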
o getsockopt(SO_ACCEPTFILTER) always returns success on a listening socket,
even if we didn't install an accept filter on the socket.
o Fix these bugs and add regression tests for them.
Submitted by: Igor Sysoev [1]
Reviewed by: alfred
MFC after: 2 weeks
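The corrected getsockopt(SO_ACCEPTFILTER) behavior amounts to succeeding only when the socket is listening and a filter is actually installed. This is a sketch over a hypothetical socket struct; returning EINVAL for the error cases is an assumption about the error value, and "httpready" in the usage test is only an example filter name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified socket state. */
struct sock_sketch {
	int		 listening;	/* listen(2) has been called */
	const char	*accept_filter;	/* installed filter name, or NULL */
};

/* Copy the installed filter's name into 'name', or fail. */
static int
get_acceptfilter(const struct sock_sketch *so, char *name, size_t namelen)
{
	if (!so->listening || so->accept_filter == NULL)
		return (EINVAL);	/* no filter: do not report success */
	if (namelen == 0)
		return (EINVAL);
	strncpy(name, so->accept_filter, namelen - 1);
	name[namelen - 1] = '\0';
	return (0);
}
```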
events could be added to cover other interesting details.
- Add some VNASSERTs to discover places where we access vnodes after
they have been uma_zfree'd before we try to free them again.
- Add a few more VNASSERTs to vdestroy() to be certain that the vnode is
really unused.
Sponsored by: Isilon Systems, Inc.
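One common way to make such assertions effective is to stamp an object with a "live" marker at initialization and overwrite it at destruction, so a second destroy or a stale access trips the check. The marker values, struct, and function names below are assumptions for illustration, not the VNASSERT implementation. The memory is deliberately not released, so reading the marker after destruction stays well-defined in this sketch.

```c
#include <assert.h>

#define VSTATE_LIVE	0x11ce11ceU	/* hypothetical marker: live vnode */
#define VSTATE_FREED	0xdeadc0deU	/* hypothetical marker: destroyed */

struct vnode_sketch {
	unsigned	v_state;
	int		v_usecount;
	int		v_holdcnt;
};

static void
vnode_init(struct vnode_sketch *vp)
{
	vp->v_state = VSTATE_LIVE;
	vp->v_usecount = 0;
	vp->v_holdcnt = 0;
}

/* Returns 1 if the vnode looks safe to destroy, 0 otherwise. */
static int
vdestroy_ok(const struct vnode_sketch *vp)
{
	if (vp->v_state != VSTATE_LIVE)
		return (0);		/* already destroyed: double free */
	if (vp->v_usecount != 0 || vp->v_holdcnt != 0)
		return (0);		/* still referenced: not really unused */
	return (1);
}

static void
vdestroy_sketch(struct vnode_sketch *vp)
{
	assert(vdestroy_ok(vp));
	vp->v_state = VSTATE_FREED;
	/* A real allocator would now return the memory to its zone. */
}
```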
atomic write request, it can fill the buffer cache with the entirety
of that write in order to handle retries. However, it never drops
the vnode lock, or else it wouldn't be atomic, so it ends up waiting
indefinitely for more buf memory that cannot be gotten as it has it
all, and it waits in an uncancellable state.
To fix this, hibufspace is exported and scaled to a reasonable
fraction. This is used as the limit of how much of an atomic write
request by the NFS client will be handled asynchronously. If the
request is larger than this, it will be turned into a synchronous
request which won't deadlock the system. It's possible this value is
far off from what is required by some, so it shall be tunable as soon
as mount_nfs(8) learns of the new field.
The slowdown between an asynchronous and a synchronous write on NFS
appears to be on the order of 2x-4x.
General nod by: gad
MFC after: 2 weeks
More testing: wes
PR: kern/79208
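The async/sync decision described above reduces to comparing the request size against a fraction of hibufspace. The commit only says the value is scaled "to a reasonable fraction", so the divisor below is a hypothetical placeholder, as is the function name; a mount_nfs(8) tunable would presumably replace the constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical divisor; the real fraction is not given in the commit. */
#define ASYNC_FRACTION	10

/*
 * Atomic NFS client writes larger than the limit are issued synchronously,
 * so the writer never waits for buffer space that only it can release.
 */
static bool
write_should_be_sync(size_t request_bytes, size_t hibufspace)
{
	size_t limit = hibufspace / ASYNC_FRACTION;

	return (request_bytes > limit);
}
```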
well worth the bloat.
- Change the formatting of 'show ktr' slightly to accommodate the
additional field. Remove a tab from the verbose output and place the
actual trace data after a ':' so it is easier to understand which
part is the event and which part is the record.
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.
This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.
Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the MAC address
from driver private storage rather than sometimes being ac_enaddr.
Reviewed by: sobomax, sam
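The pattern can be sketched as an allocator that returns a pointer to the interface and hangs a type-specific layer 2 structure off it, with a macro as the only accessor for the Ethernet address. Callers only ever hold pointers, so the structures' sizes no longer leak into their binaries. All names with a _sketch suffix, and the type codes, are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define IFT_ETHER_SKETCH	1	/* hypothetical interface-type codes */
#define IFT_OTHER_SKETCH	2

struct arpcom_sketch {			/* layer 2 common structure */
	unsigned char	_ac_enaddr[6];	/* renamed so drivers stop using it */
};

struct ifnet_sketch {
	int	 if_type;
	void	*if_l2com;		/* type-specific layer 2 state */
};

/* Only the macro knows where the Ethernet address lives. */
#define IFP2ENADDR_SKETCH(ifp) \
	(((struct arpcom_sketch *)(ifp)->if_l2com)->_ac_enaddr)

static struct ifnet_sketch *
if_alloc_sketch(int type)
{
	struct ifnet_sketch *ifp;

	ifp = calloc(1, sizeof(*ifp));
	if (ifp == NULL)
		return (NULL);
	ifp->if_type = type;
	/* Allocate the layer 2 common structure based on interface type. */
	if (type == IFT_ETHER_SKETCH) {
		ifp->if_l2com = calloc(1, sizeof(struct arpcom_sketch));
		if (ifp->if_l2com == NULL) {
			free(ifp);
			return (NULL);
		}
	}
	return (ifp);
}

static void
if_free_sketch(struct ifnet_sketch *ifp)
{
	free(ifp->if_l2com);		/* free(NULL) is a no-op */
	free(ifp);
}
```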
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()
My benchmarks have not revealed any performance degradation.
Reviewed by: jeff, bde
Approved by: rwatson, jmg (kqueue changes), grehan (mentor)
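Moving KNOTE activations into the generic VOP hooks, with a per-mount opt-out, can be sketched like this: the post-hook posts the event on behalf of every filesystem unless the mount carries the opt-out flag. The flag and event values, structs, and function names are hypothetical stand-ins for MNTK_NOKNOTE, NOTE_WRITE, and the real hook machinery.

```c
#include <assert.h>

#define MNTK_NOKNOTE_SKETCH	0x1	/* hypothetical flag value */
#define NOTE_WRITE_SKETCH	0x2	/* hypothetical event bit */

struct mount_sketch {
	int	mnt_kern_flag;
};

struct kvnode_sketch {
	struct mount_sketch	*v_mount;
	int			 v_events;	/* events posted to knotes */
};

/* Stand-in for KNOTE(): record the event for any attached knotes. */
static void
knote_sketch(struct kvnode_sketch *vp, int hint)
{
	vp->v_events |= hint;
}

/*
 * Post-hook run by the generic VOP layer after a write, so individual
 * filesystems need no KNOTE calls of their own.  A filesystem disables
 * this behavior by setting the flag on its mount.
 */
static void
vop_write_post_sketch(struct kvnode_sketch *vp)
{
	if (vp->v_mount->mnt_kern_flag & MNTK_NOKNOTE_SKETCH)
		return;
	knote_sketch(vp, NOTE_WRITE_SKETCH);
}
```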
- Implement sampling modes and logging support in hwpmc(4).
- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EM64T) style PMCs to the amd64 code.
- New pmcstat(8) options: -E (exit time counts), -W (counts
every context switch), -R (print log file).
- pmc(3) API changes that improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.
- bug fixes & documentation.
and extend its functionality:
value  policy
0      show all mount-points without any restrictions
1      show only mount-points below the jail's chroot and show only part of
       the mount-point's path (if the jail's chroot directory is /jails/foo
       and the mount-point is /jails/foo/usr/home, only /usr/home will be
       shown)
2      show only the mount-point where the jail's chroot directory is placed
Default value is 2.
Discussed with: rwatson
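Policies 0 and 1 from the table can be sketched as a function that returns the path a jailed process would see, or NULL if the mount is hidden. Policy 2 is left out because it requires knowing which filesystem holds the chroot directory. The function name is hypothetical, and showing "/" for a mount placed exactly on the chroot directory is an assumption that follows from trimming the prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Path shown for mount point 'mntpt' when the jail's chroot directory is
 * 'root' (no trailing slash), or NULL if hidden.  Policies 0 and 1 only.
 */
static const char *
jail_mntpath(int policy, const char *root, const char *mntpt)
{
	size_t len = strlen(root);

	assert(policy == 0 || policy == 1);
	if (policy == 0)
		return (mntpt);		/* no restrictions */
	if (strncmp(mntpt, root, len) != 0)
		return (NULL);		/* outside the chroot */
	if (mntpt[len] == '\0')
		return ("/");		/* assumption: mount on the chroot itself */
	if (mntpt[len] != '/')
		return (NULL);		/* /jails/foobar is not below /jails/foo */
	return (mntpt + len);		/* /jails/foo/usr/home -> /usr/home */
}
```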
security.bsd.see_other_uids is set to 0, etc.
One can check whether an invisible process is alive by doing:
# ktrace -p <pid>
If ktrace returns 'Operation not permitted', the process is alive; if it
returns 'No such process', there is no such process.
MFC after: 1 week
milliseconds due to what is essentially O(n^2) algorithmic complexity. This
heavy processing manifested itself as skipping in audio and video playback
due to the long scheduling latencies and contention on Giant by pcm. This
change makes the algorithm 2*n instead: one pass over the list for bufs
without deps and one for bufs with deps.
- flushbufqueues() is now responsible for flushing multiple buffers
rather than one at a time. This allows us to save our progress in the
list by using a sentinel. We must do the numdirtywakeup() and
waitrunningbufspace() here now rather than in buf_daemon().
- Also add a uio_yield() after we have processed the list once for bufs
without deps and again for bufs with deps. This is to release Giant
and allow any other Giant-locked code to proceed.
Tested by: Many users on current@
Revealed by: schedgraph traces sent by Emil Mikulic & Anthony Ginepro
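Saving progress with a sentinel can be sketched as follows: a marker node is kept just after the buf being processed, so the walk resumes from the marker even if that buf leaves the list while the lock is dropped. Each pass visits every buf once, and two passes (without deps, then with deps) give the 2*n behavior. This single-threaded sketch only models the list bookkeeping, not locking, flushing, or uio_yield(); all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct buf_sketch {
	struct buf_sketch	*prev, *next;
	bool			 marker;	/* true only for sentinels */
	bool			 has_deps;	/* buf has dependencies */
	int			 id;
};

/* Circular doubly linked list with a permanent head node. */
static void
list_init(struct buf_sketch *head)
{
	head->prev = head->next = head;
}

static void
list_insert_after(struct buf_sketch *pos, struct buf_sketch *b)
{
	b->prev = pos;
	b->next = pos->next;
	pos->next->prev = b;
	pos->next = b;
}

static void
list_remove(struct buf_sketch *b)
{
	b->prev->next = b->next;
	b->next->prev = b->prev;
}

/*
 * One pass over the dirty queue, "flushing" (removing) every buf whose
 * has_deps equals 'want_deps' and appending its id to 'order'.
 * Returns the updated count of flushed bufs.
 */
static int
flush_pass(struct buf_sketch *head, bool want_deps, int *order, int n)
{
	struct buf_sketch sentinel = { .marker = true };
	struct buf_sketch *b;

	list_insert_after(head, &sentinel);
	while ((b = sentinel.next) != head) {
		/* Move the sentinel past 'b' to remember our place. */
		list_remove(&sentinel);
		list_insert_after(b, &sentinel);
		if (b->marker || b->has_deps != want_deps)
			continue;	/* another walker's marker, or wrong pass */
		list_remove(b);		/* flushed: no longer dirty */
		order[n++] = b->id;
	}
	list_remove(&sentinel);
	return (n);
}
```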
list on fork() if the process doesn't actually have references to any
semaphores. This avoids extra work, as well as potentially asking to
allocate storage for 0 references.
Found by: avatar
MFC after: 1 week
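The fork() short-circuit can be sketched as: if the parent references no semaphores, return immediately without allocating, which also avoids asking for a zero-sized allocation. The undo structure, its fields, and the function name are hypothetical simplifications of the SysV semaphore undo list.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct undo_sketch {
	int	 un_cnt;	/* number of semaphore adjustments recorded */
	int	*un_ent;	/* hypothetical per-semaphore adjustments */
};

/*
 * Give the child its own copy of the parent's undo list on fork().
 * Returns 0 or ENOMEM; *childp is NULL when there is nothing to copy.
 */
static int
undo_fork_sketch(const struct undo_sketch *parent, struct undo_sketch **childp)
{
	struct undo_sketch *child;
	size_t bytes;

	*childp = NULL;
	if (parent == NULL || parent->un_cnt == 0)
		return (0);		/* no references: skip the work */
	bytes = (size_t)parent->un_cnt * sizeof(*parent->un_ent);
	child = malloc(sizeof(*child));
	if (child == NULL)
		return (ENOMEM);
	child->un_ent = malloc(bytes);
	if (child->un_ent == NULL) {
		free(child);
		return (ENOMEM);
	}
	memcpy(child->un_ent, parent->un_ent, bytes);
	child->un_cnt = parent->un_cnt;
	*childp = child;
	return (0);
}
```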
points to convert _sema() to _sem() for consistency purposes with
respect to the other semaphore-related entry points:
mac_init_sysv_sema() -> mac_init_sysv_sem()
mac_destroy_sysv_sema() -> mac_destroy_sysv_sem()
mac_create_sysv_sema() -> mac_create_sysv_sem()
mac_cleanup_sysv_sema() -> mac_cleanup_sysv_sem()
Congruent changes are made to the policy interface to support this.
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR, SPARTA
as this happens via thread_switchout(). I don't particularly like the
structure of the code here. We twice call out to thread code when
a thread is voluntarily switching. Once to thread_switchout() and once
to slot_fill(), while sched_4BSD does even more redundant work
to select another thread to use our remaining slice. This should be
simplified in the future, but for now I'm only going to fix the bug not
the bad design.
mutex instead of a MTX_DEF one in order to defer preemption while
reading the date and time registers. If we don't manage to read them
within the time slot where we are guaranteed that no updates occur we
might actually read them during an update in which case the output is
undefined.
aio_write(2) completion through kevent(2). This method does not work on
64-bit architectures. It was deprecated in FreeBSD 4.4. See revisions
1.87 and 1.70.2.7.
Change aio_physwakeup() to call psignal(9) directly rather than indirectly
through a timeout(9). Discussed with: bde
Correct a bug introduced in revision 1.65 that could result in premature
delivery of a signal if an lio_listio(2) consisted of a mixture of
direct/raw and queued I/O operations. Observed by: tegge
Eliminate a field from struct kaioinfo that is now unused.
Reviewed by: tegge
slot for us. Previously, we would take two slots on every preempt, and
setrunqueue() would fix it up for us in the non threaded case. The
threaded case was simply broken.
- Clean up flags, prototypes, comments.