- add rm_try_rlock().
- add RM_SLEEPABLE to use sx(9) as the back-end lock in order to sleep while
holding the write lock.
- change rm_noreadtoken to a cpu bitmask to indicate which CPUs need to go
through the lock/unlock in order to synchronize. As a side effect, this
also avoids IPI to CPUs without any readers during rm_wlock.
Discussed with: ups@, rwatson@ on arch@
Sponsored by: Isilon Systems, Inc.
signals, because it is managed by debugger, however a normal signal sent to
a interruptibly sleeping thread wakes up the thread so it will handle the
signal when the process leaves the stopped state.
PR: 150138
MFC after: 1 week
least one execute bit set, otherwise execve(2) will return EACCES even
for an user with PRIV_VFS_EXEC privilege.
Add the check also to vaccess(9), vaccess_acl_nfs4(9) and
vaccess_acl_posix1e(9). This makes access(2) to better agree with
execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check
to zfs_freebsd_access() too. There may be other file systems which are
not using vaccess*() functions and need to be handled separately.
PR: kern/125009
Reviewed by: bde, trasz
Approved by: pjd (ZFS part)
for socket, when specified POLLIN|POLLOUT in events, you would have one
selfd registered for receiving socket buffer, and one for sending. Now,
if both events are not ready to fire at the time of the initial scan,
but are simultaneously ready after the sleep, pollrescan() would iterate
over the pollfd struct twice. Since both times revents is not zero,
returned value would be off by one.
Fix this by recalculating the return value in pollout().
PR: kern/143029
MFC after: 2 weeks
Actually it is hard to properly handle such a failure, especially in MNT_UPDATE
case. The only reason for the vfs_allocate_syncvnode() function to fail is
getnewvnode() failure. Fortunately it is impossible for current implementation
of getnewvnode() to fail, so we can assert this and make
vfs_allocate_syncvnode() void. This in turn free us from handling its failures
in the mount code.
Reviewed by: kib
MFC after: 1 month
rather than forging ahead and interpreting garbage buffer content
and dirent structures.
This change backs out r211684 which was essentially a no-op.
MFC after: 1 week
standard kill(). On other systems, SI_LWP is generated by lwp_kill().
This will allow conforming applications to differentiate between
signals generated by standard events and those generated by other
implementation events in a manner compatible with existing practice.
- Bump __FreeBSD_version
the uio_offset adjustment instead to calculate a correct *len.
Without this change, we run off the end of the directory data
we're reading and panic horribly for nfs filesystems.
MFC after: 1 week
use '-' in probe names, matching the probe names in Solaris.[1]
Add userland SDT probes definitions to sys/sdt.h.
Sponsored by: The FreeBSD Foundation
Discussed with: rwaston [1]
LK_CANRECURSE after a lock is created. Use them to implement macros that
otherwise manipulated the flags directly. Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe. This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.
Reviewed by: kib
MFC after: 3 days
of p_traceflag that is stored in the kinfo_proc structure. It is still
racey even with the lock and the code will read a consistent snapshot of
the flag without the lock.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.
Tested by: marius (sparc64)
MFC after: 1 month
Interrupt driven configuration hooks serve two purposes: they are a
mechanism for registering for a callback that is invoked once interrupt
services are available, and they hold off root device selection so long
as any configuration hooks are still active. Before this change, it was
not possible to safely register additional hooks from the context of a
configuration hook callback. The need for this feature arises when
interrupts are required to discover new devices (e.g. access to the XenStore
to find para-virtualized devices) which in turn also require the ability
to hold off root device selection until some lengthy, interrupt driven,
configuration task has completed (e.g. Xen front/back device driver
negotiation).
More specifically, the mutex protecting the list of active configuration
hooks is never held during a callback, and static information is used
to ensure proper ordering and only a single callback to each hook even
when faced with registration or removal of a hook during an active run.
Sponsored by: Spectra Logic Corporation
MFC after: 1 week.
time-of-day clock or vice versa. For x86 systems, RTC resolution is one
second and we used to lose up to one second whenever we initialize system
time from RTC or write system time back to RTC. With this change, margin
of error per conversion is roughly between -0.5 and +0.5 second rather
than between -1 and 0 second. Note that it does not take care of errors
from getnanotime(9) (which is up to 1/hz second) or CLOCK_GETTIME() latency.
These are just too expensive to correct and it is not worthy of the cost.
bufobj lock. If b_bufobj is not NULL, then bufobj lock should be
held when manipulating the flags. Not doing this sometimes leaves
BV_BKGRDINPROG to be erronously set, causing softdep' getdirtybuf() to
stuck indefinitely in "getbuf" sleep, waiting for background write to
finish which is not actually performed.
Add BO_LOCK() in the cases where it was missed.
In collaboration with: pho
Tested by: bz
Reviewed by: jeff
MFC after: 1 month
use-after-free over a longer time. Also release the backing pages of
a guarded allocation at free(9) time to reduce the overhead of using
memguard(9). Allow setting and varying the malloc type at run-time.
Add knobs to allow:
- randomly guarding memory
- adding un-backed KVA guard pages to detect underflow and overflow
- a lower limit on the size of allocations that are guarded
Reviewed by: alc
Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page)
Silence from: -arch
Approved by: zml (mentor)
MFC after: 1 month
most system, based on benchmark results on a low-end fibre channel SAN
under VMWare:
vfs.read_max read performance
8 (historical default) 83 MB/s
16 (recent bump) 131 MB/s
32 (this version) 152 MB/s
64 157 MB/s
(results are +/- 3 MB/s)
As read-ahead is heuristic, based on past IO requests, it shouldn't be
problematic. The new default is still smaller then in other OSes.
number of CPUs detection.
However, that was not mention at all, the problem was not reported, the
patch has not been MFCed and the fix is mostly improper.
Fix the original overflow (caused when 32 CPUs must be detected) by
just using a different mathematical computation (it also makes more
explicit the size of operands involved, which is good in the moment
waiting for a more complete support for a large number of CPUs).
PR: kern/148698
Submitted by: Joe Landers <jlanders at vmware dot com>
Tested by: gianni
MFC after: 10 days
the vfs.read_max default. For most systems this means going from 128 KiB
to 256 KiB, which is still very conservative and lower than what most
other operating systems use, but as a sane default should not
interfere much with existing systems.
For systems with RAID volumes and/or virtualization envirnments, where
read performance is very important, increasing this sysctl tunable to 32
or even more will demonstratively yield additional performance benefits.
If MAXPHYS ever gets bumped up, it will probably be a good idea to slave
read_max to it.
SOCK_DGRAM socket. MSG_TRUNC was only returned when some mbufs could
not be copied to the application. If some data was left in the last
mbuf, it was correctly discarded, but MSG_TRUNC was not set.
Reviewed by: bz
MFC after: 3 weeks
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.
Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month
the virtualization detection successfully disabling the clflush instruction.
This fixes insta-panics for XEN hvm users when the hw.clflush_disable
tunable is -1 or 0 (-1 by default).
Discussed with: jhb
cdev will never be destroyed. Propagate the flag to devfs vnodes as
VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a
thread reference on such nodes.
In collaboration with: pho
MFC after: 1 month