- Allow setting format, resolution and accuracy of BPF time stamps per
listener. Previously, we were only able to use microtime(9). Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command. Document all supported options in bpf(4) and their uses.
- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts. bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp. The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries. On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long. On 32-bit platforms, the size may
increase by 8 bytes. For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested. However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.
- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.
Reviewed by: net@
flags to specify M_WAITOK/M_NOWAIT. M_WAITOK allows devctl to sleep for
the memory allocation.
As Warner noted, allowing the functions to sleep might cause
reordering of the queued notifications.
Reviewed by: imp, jh
MFC after: 3 weeks
via %s
Most of the cases looked harmless, but this is done for the sake of
correctness. In one case it even allowed to drop an intermediate buffer.
Found by: clang
MFC after: 2 week
per-buf flag to catch if a buf is double-counted in the free count.
This code was useful to debug an instance where a local patch at Isilon
was incorrectly managing numfreebufs for a new buf state.
Reviewed by: jeff
Approved by: zml (mentor)
shared resources defaults beyond absolute minimums.
The new values are chosen mostly by magic. They are still fairly
small and will need increasing for large installations (especially
SHMMAX). However, they are now enough to e.g. start PostgreSQL
installations with ~~300 users and nearly 512 MB of shared buffers.
Reviewed by: A short discussion on hackers@
to obtain both trap frame and opaque argument submitted on registrction.
After kernel and all drivers get used to it, legacy hack can be removed.
Reviewed by: jhb@
from the buffer pages to buffer. Combine the code to set buffer
dirty range (previously in vfs_setdirty()) and to clean the pages
(vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now
the vm object lock is acquired only once.
Drain the VPO_BUSY bit of the buffer pages before setting valid
and clean bits in vfs_clean_pages_dirty_buf() with new helper
vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not
busy.
In vfs_busy_pages(), move the wait for draining of VPO_BUSY before
the dirtyness handling, to follow the structure of
vfs_clean_pages_dirty_buf().
Reported and tested by: pho
Suggested and reviewed by: alc
MFC after: 2 weeks
different mount points, e.g. the nullfs vnode and the covered vnode
from the lower filesystem. In this case, existing assertion in
vop_rename_pre() may be triggered.
Check for vnode locks equiality instead of the vnodes itself to
not trip over the situation.
Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
Tested by: pho
MFC after: 2 weeks
Use it to allow to tune sem_nsem_max at runtime, only when sem.ko
module is present in kernel.
Requested and tested by: amdmi3
Reviewed by: jhb
MFC after: 3 days
ta_func may free the task structure, so no references to its members
are valid after the handler has been called. Using a per-queue member
and having waits longer than strictly necessary was suggested by jhb.
Submitted by: Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by: zml, jhb
set but be cleared before the call to sodisconnect(). In this case,
ENOTCONN is returned: suppress this error rather than returning it to
userspace so that close() doesn't report an error improperly.
PR: kern/144061
Reported by: Matt Reimer <mreimer at vpop.net>,
Nikolay Denev <ndenev at gmail.com>,
Mikolaj Golub <to.my.trociny at gmail.com>
MFC after: 3 days
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().
Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.
Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.
Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.
Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().
Reviewed by: kib (an earlier version)
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
on exit, that is done once in thread_exit() and the second time in
proc_reap(), by clearing td_incruntime.
Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg()
instead of ruxagg_locked() and use it from thread_exit().
Diagnosed and tested by: neel
MFC after: 3 days
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
DDB so that all the fields line up.
- Print out the tid of the per-CPU idlethread instead of the pid since
the idle process is now shared across all idle threads.
MFC after: 1 month
fd_set bits in select(2). It seems that historical behaviour is to not
reporting exception on EOF, and several applications are broken.
Reported by: Yoshihiko Sarumaru <ysarumaru gmail com>
Discussed with: bde
PR: ports/140934
MFC after: 2 weeks
eliminate it.
Assert that the object containing the page is locked in
vm_page_test_dirty(). Perform some style clean up while I'm here.
Reviewed by: kib
am now able to run 32 cores ok.. but I still will hang
on buildworld with a NFS problem. I suspect I am missing
a patch for the netlogic rge driver.
JC check and see if I am missing anything except your
core-mask changes
Obtained from: JC
We cannot expect that modspace is the last entry in the linker
set and thus that modspace + possible extra space up to PAGE_SIZE
would be contiguous. For the moment do not support more than
*_MODMIN space and ignore the extra space (*).
(*) We know how to get it back but it'll need testing.
Discussed with: jeff, rwatson (briefly)
Reviewed by: jeff
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 4 days
(TOCONS | TOLOG) mask even when called from DDB points.
That breaks several output, where the most notable is textdump output.
Fix this by having configurable callbacks passed to witness_list_locks()
and witness_display_spinlock() for printing out datas.
Reported by: several broken textdump outputs
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
MFC after: 7 days
X-MFC: r207922
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().
Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)
Switch to a per-processor counter for the total number of pages cached.