Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Do not use macros in the -width of a .Bl, since mandoc does not support them.
Fix issues reported by igor and mandoc -Tlint.
Use a .Bl for list of clock IDs instead of a comma list.
MFC after: 3 days
Sponsored by: Dell EMC
As of r325320 posix_fallocate returns EINVAL on ZFS to indicate that
the underlying filesystem does not support this operation, per
POSIX.1-2008. Document this case in the man page.
MFC after: 20 days
MFC with: r325320
Sponsored by: The FreeBSD Foundation
RB_POWERCYCLE instructs the platform to power off and then power back
on a short time later, if that's possible. Otherwise, degrade to the
RB_POWEROFF behavior.
Sponsored by: Netflix
In r322258 I made p1003_1b.aio_listio_max a tunable. However, further
investigation shows that there was never any good reason for that limit to
exist in the first place. It's used in two completely different ways:
* To size a UMA zone, which globally limits the number of concurrent
aio_suspend calls.
* To artifically limit the number of operations in a single lio_listio call.
There doesn't seem to be any memory allocation associated with this limit.
This change does two things:
* Properly names aio_suspend's UMA zone, and sizes it based on a new constant.
* Eliminates the artifical restriction on lio_listio. Instead, lio_listio
calls will now be limited by the more generous max_aio_queue_per_proc. The
old p1003_1b.aio_listio_max is now an alias for
vfs.aio.max_aio_queue_per_proc, so sysconf(3) will still work with
_SC_AIO_LISTIO_MAX.
Reported by: bde
Reviewed by: jhb
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D12120
In FreeBSD 11 and later debug.iosize_max_clamp defaults to 0, and the
maximum nbytes count for write(2) is SSIZE_MAX. Update the man page to
document this, and mention the sysctl that can be set to obtain the
previous behaviour.
PR: 196666
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
posix_fallocate is logically equivalent to writing zero blocks to the
desired file size and there is no reason to prevent calling it in
capability mode. posix_fallocate already checked for the CAP_WRITE
right, so we merely need to list it in capabilities.conf.
Reviewed by: allanjude
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12640
Make armv7 as a new MACHINE_ARCH.
Copy all the places we do armv6 and add armv7 as basically an
alias. clang appears to generate code for armv7 by default. armv7 hard
float isn't supported by the the in-tree gcc, so it hasn't been
updated to have a new default.
Support armv7 as a new valid MACHINE_ARCH (and by extension
TARGET_ARCH).
Add armv7 to the universe build.
Differential Revision: https://reviews.freebsd.org/D12010
After r308212 Capsicum permits .. lookups in capability mode, as long as
path component traversal does not escape the directory corresponding to
the provided file descriptor.
We should add a description of the vfs.lookup_cap_dotdot and
vfs.lookup_cap_dotdot_nonlocal sysctls, perhaps as a cross-reference to
capsicum(4). I intend to look at that soon.
Reviewed by: bjk, cem, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12343
S_IRUSR is defined in sys/stat.h
PR: 209229
Submitted by: <mt AT markoturk DOT info>
Approved by: bcr (mentor)
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D12007
Private functions like __aio_read and _aio_read were exposed in
FBSDprivate_1.0 by r169090, even though they've never been used outside of
librt. Also, remove some weak references from r156136 that have never
resolved.
Reviewed by: kib
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D11649
Guard, requested by the MAP_GUARD mmap(2) flag, prevents the reuse of
the allocated address space, but does not allow instantiation of the
pages in the range. It is useful for more explicit support for usual
two-stage reserve then commit allocators, since it prevents accidental
instantiation of the mapping, e.g. by mprotect(2).
Use guards to reimplement stack grow code. Explicitely track stack
grow area with the guard, including the stack guard page. On stack
grow, trivial shift of the guard map entry and stack map entry limits
makes the stack expansion. Move the code to detect stack grow and
call vm_map_growstack(), from vm_fault() into vm_map_lookup().
As result, it is impossible to get random mapping to occur in the
stack grow area, or to overlap the stack guard page.
Enable stack guard page by default.
Reviewed by: alc, markj
Man page update reviewed by: alc, bjk, emaste, markj, pho
Tested by: pho, Qualys
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D11306 (man pages)
The flag is not implemented, all FreeBSD architectures correctly
handle locks on normal cacheable mappings. On the other hand, the
flag was specified by some software, so it is kept in the header as
nop. Removal from the man page should discourage its use.
Reviewed by: alc, bjk, emaste, markj, pho
MFC after: 3 days
X-Differential revision: https://reviews.freebsd.org/D11306
Add forward compatibility so that new binaries can run on old
kernels. If the new system call from ino64 isn't available on your
system, then the old one will be used and the results translated. The
stat and statfs families of functions are fully emulated. While not
required by policy, in this case it is helpful to our users to provide
this compatibility. In this case, it allows rollback of the kernel
after installing a new userland should a problem be discovered. It
also prevents foot-shooting if a user does an install before rebooting
with the new kernel. Finally, it allows the use case where one needs
to run new binaries on an old kernel as part of an upgrade process.
The getdirentries family uses tricks that may not work on remote
filesystems. Specifically, it uses a buffer 1/4 the size requested to
get the data from he old syscall.
The code carefully uses direct syscalls for old system calls to avoid
referencing freebsd11_* symbols, which contaminate ld-elf.so.1's
export table due to its use of stat functions, which causes errno to
be incorrect in client programs due to the wrong *stat* function being
resolved in some cases.
This code should removed sometime after 12 is branched.
Tested on: 12-current binaries on a 10.3-beta kernel run and return
consistent results. 12-current kernel and userland with
packages from before ino64 was committed also work.
Differential Revision: https://reviews.freebsd.org/D11185
Reviewed by: kib@, emaste@
This syscall has never existed and is not at risk of existing any time soon.
Remove documentation referencing it, which has been wrong since FreeBSD 9.
Reported by: allanjude@
This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.
To make this useful, data member of the struct kevent must be extended
to 64bit. Using the opportunity, I also added ext members. This
changes struct kevent almost to Apple struct kevent64, except I did
not changed type of ident and udata, the later would cause serious API
incompatibilities.
The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).
Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2). Compat shims
are provided for both host native and compat32.
Requested by: bapt
Reviewed by: bapt, brooks, ngie (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D11025
The futimens() and utimensat() compat stubs allowed using these functions on
kernels that did not have the system calls yet (10.2, old 11-current).
Also remove the documentation of the [ENOTSUP] error that could occur with
an old kernel.
A -DNO_CLEAN build may fail because the depend files refer to the deleted
files.
The documentation moved to section 3 several years ago, but
'man cap_rights_get' pulls up cap_rights_limit(2) (which is
MLINKed to cap_rights_get.2) instead of cap_rights_get(3).
MFC after: 1 week
bhyve was recently sandboxed with capsicum, and needs to be able to
control the CPU sets of its vcpu threads
Reviewed by: emaste, oshogbo, rwatson
MFC after: 2 weeks
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D10170
Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.
ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.
Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.
Struct xvnode changed layout, no compat shims are provided.
For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.
Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.
Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).
Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439
- Sort .Xr entries in SEE ALSO section.
- Sort SEE ALSO and STANDARDS sections properly, in terms of the
entire document.
Reported by: make manlint
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
The previous misuse of sys_sigqueue() was sending random register or
stack garbage to 64-bit targets. The freebsd32 implementation preserves
the sival_int member of value when signaling a 64-bit process.
Document the mixed ABI implementation of union sigval and the
incompability of sival_ptr with pointer integrity schemes.
Reviewed by: kib, wblock
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D10605
Add the CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clock_id
values to the clock_gettime(2) man page. Reformat the excessively
long paragraph (sentence!) into a tag list.
Reported by: jilles in https://reviews.freebsd.org/D10020
MFC after: 3 days
Sponsored by: Dell EMC
Add a clock_nanosleep() syscall, as specified by POSIX.
Make nanosleep() a wrapper around it.
Attach the clock_nanosleep test from NetBSD. Adjust it for the
FreeBSD behavior of updating rmtp only when interrupted by a signal.
I believe this to be POSIX-compliant, since POSIX mentions the rmtp
parameter only in the paragraph about EINTR. This is also what
Linux does. (NetBSD updates rmtp unconditionally.)
Copy the whole nanosleep.2 man page from NetBSD because it is complete
and closely resembles the POSIX description. Edit, polish, and reword it
a bit, being sure to keep any relevant text from the FreeBSD page.
Reviewed by: kib, ngie, jilles
MFC after: 3 weeks
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10020
INHERIT_ZERO is an OpenBSD feature.
When a page is marked as such, it would be zeroed
upon fork().
This would be used in new arc4random(3) functions.
PR: 182610
Reviewed by: kib (earlier version)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D427
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
Now that <sys/event.h> can be included on its own, adjust the manual
page accordingly. Remove both unnecessary #include statements from the
synopsis and the example code.
While there, also add a note to the BUGS section to mention that
previous versions of this header file still depend on <sys/types.h>.
Reviewed by: ngie, vangyzen
Differential Revision: https://reviews.freebsd.org/D9605
For regular files and posix shared memory, POSIX requires that
[offset, offset + size) range is legitimate. At the maping time,
check that offset is not negative. Allowing negative offsets might
expose the data that filesystem put into vm_object for internal use,
esp. due to OFF_TO_IDX() signess treatment. Fault handler verifies
that the mapped range is valid, assuming that mmap(2) checked that
arithmetic gives no undefined results.
For device mappings, leave the semantic of negative offsets to the
driver. Correct object page index calculation to not erronously
propagate sign.
In either case, disallow overflow of offset + size.
Update mmap(2) man page to explain the requirement of the range
validity, and behaviour when the range becomes invalid after mapping.
Reported and tested by: royger (previous version)
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Document AF_UNIX control messages in unix(4) only, not split between unix(4)
and recv(2).
Also, warn about LOCAL_CREDS effective uid/gid fields, since the write could
be from a setuid or setgid program (with the explicit SCM_CREDS and
LOCAL_PEERCRED, the credentials are read at such a time that it can be
assumed that the process intends for them to be used in this context).
Reviewed by: wblock
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D9298
This reduces build output, need for recalculating paths, and makes it clearer
which paths are relative to what areas in the source tree. The change in
performance over a locally mounted UFS filesystem was negligible in my testing,
but this may more positively impact other filesystems like NFS.
LIBC_SRCTOP was left alone so Juniper (and other users) can continue to
manipulate lib/libc/Makefile (and other Makefile.inc's under lib/libc) as
include Makefiles with custom options.
Discussed with: marcel, sjg
MFC after: 1 week
Reviewed by: emaste
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D9207
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.
- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API support rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.
- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().
- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.
- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.
- How rate limiting works:
1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.
2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.
3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. Upon next ip_output() a new "snd_tag" will be tried allocated.
4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.
Reviewed by: wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision: https://reviews.freebsd.org/D3687
Sponsored by: Mellanox Technologies
MFC after: 3 months
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:
o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).
In addition to this, this option provides unified interface to get bintime
(equivalent of using SO_BINTIME), except it also supported with IPv6 where
SO_BINTIME has never been supported. The long term plan is to depreciate
SO_BINTIME and move everything to using SO_TS_CLOCK.
Idea for this enhancement has been briefly discussed on the Net session
during dev summit in Ottawa last June and the general input was positive.
This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.
There are two regression test cases as part of this commit: one extends unix
domain test code (unix_cmsg) to test new SCM_XXX types and another one
implementis totally new test case which exchanges UDP packets between two
processes using both conventional methods (i.e. calling clock_gettime(2)
before recv(2) and after send(2)), as well as using setsockopt()+recv() in
receive path. The resulting delays are checked for sanity for all supported
clock types.
Reviewed by: adrian, gnn
Differential Revision: https://reviews.freebsd.org/D9171
This argument is not a bitmask of flags, but only accepts a single value.
Fail with EINVAL if an invalid value is passed to 'flag'. Rename the
'flags' argument to getmntinfo(3) to 'mode' as well to match.
This is a followup to r308088.
Reviewed by: kib
MFC after: 1 month
Instead of failing with ENAMETOOLONG, which is swallowed by
pthread_set_name_np() anyway, truncate the given name to MAXCOMLEN+1
bytes. This is more likely what the user wants, and saves the
caller from truncating it before the call (which was the only
recourse).
Polish pthread_set_name_np(3) and add a .Xr to thr_set_name(2)
so the user might find the documentation for this behavior.
Reviewed by: jilles
MFC after: 3 days
Sponsored by: Dell EMC
As of r234483, vnode deactivation causes non-VPO_NOSYNC pages to be
laundered. This behaviour has two problems:
1. Dirty VPO_NOSYNC pages must be laundered before the vnode can be
reclaimed, and this work may be unfairly deferred to the vnlru process
or an unrelated application when the system is under vnode pressure.
2. Deactivation of a vnode with dirty VPO_NOSYNC pages requires a scan of
the corresponding VM object's memq for non-VPO_NOSYNC dirty pages; if
the laundry thread needs to launder pages from an unreferenced such
vnode, it will reactivate and deactivate the vnode with each laundering,
potentially resulting in a large number of expensive scans.
Therefore, ensure that all dirty pages are laundered upon deactivation,
i.e., when all maps of the vnode are removed and all references are
released.
Reviewed by: alc, kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D8641
We return [EMLINK] instead of [ELOOP] when trying to open a symlink with
O_NOFOLLOW, so that the original case of [ELOOP] can be distinguished. Code
like cmp -h and xz takes advantage of this.
PR: 214633
Reviewed by: kib, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D8586
do any speculations about readahead, and use exactly the amount of readahead
specified by user. E.g. setting SF_FLAGS(0, SF_USER_READAHEAD) will guarantee
that no readahead at all will be performed.
Previously the flag returned by cap_getmode was not described explicitly
in the man page.
Reviewed by: wblock
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7822
The initial value of NOASM is nearly the same in all cases and the
initial value of PSEUDO is the same in all cases so reduce duplication
(and hopefully, future merge conflicts) by machine independent defaults.
Also document the PSEUDO variable.
Reviewed by: jhb, kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D7820
Besides removing hand-translation to assembler, this also adds missing
wrappers for arm64 and risc-v.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D7694
Since ptrace(2) syscall can return -1 for non-error situations, libc
wrappers set errno to 0 before performing the syscall, as the service
to the caller. On both i386 and amd64, the errno symbol was directly
referenced, which only works correctly in single-threaded process.
Change assembler wrappers for ptrace(2) to get current thread errno
location by calling __error(). Allow __error interposing, as
currently allowed in cerror().
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
- Avoid double use of "request" in a single sentence. Instead, describe
aio_sigevent as being used to request notification of the associated
operation's completion. This matches the language used to describe
aio_sigevent in aio(4).
- Simplify the prohibition on modifying buffers while requests are in
flight.
- Fix case mismatch.
- Drop note about not using stack variables. C programmers should be able
to figure out if a stack variable is safe based on the later warning
about the life cycle requirements of control blocks.
- Remove prohibition on modifying the I/O buffer for aio_fsync() since
it does not use an I/O buffer. For aio_mlock(), prohibit modifications
to the mapping (e.g. due to mprotect, munmap, mmap, etc.) but do not
prohibit modifications to the memory backing the buffer (stores into
the pages backing the buffer).
Requested by: wblock (1,2), kib (4)
Reviewed by: kib, rpokala, wblock
MFC after: 3 days
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7462
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.
Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.
Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.
Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.
Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473
The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing
code with fsync(2). For all filesystems, this commit provides the
implementation which delegates the work of VOP_FDATASYNC() to
VOP_FSYNC(). This is functionally correct but not efficient.
This is not yet POSIX-compliant implementation, because it does not
ensure that queued AIO requests are completed before returning.
Reviewed by: mckusick
Discussed with: avg (ZFS), jhb (AIO part)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7471
Our mprotect() function seems to take a "const void *" address to the
pages whose permissions need to be adjusted. POSIX uses "void *". Simply
stick to the POSIX one to prevent us from writing unportable code.
PR: 211423 (exp-run)
Tested by: antoine@ (Thanks!)
It looks like the msgrcv() system call is already written in such a way
that the size is internally computed as a size_t and written into all of
td_retval[0]. This means that it is effectively already returning
ssize_t. It's just that the userspace prototype doesn't match up.
The asynchronous I/O changes made previously result in different
behavior out of the box. Previously all AIO requests failed with
ENOSYS / SIGSYS unless aio.ko was explicitly loaded. Now, some AIO
requests complete and others ("unsafe" requests) fail with EOPNOTSUPP.
Reword the introductory paragraph in aio(4) to add a general
description of AIO before describing the vfs.aio.enable_unsafe sysctl.
Remove the ENOSYS error description from aio_fsync(2), aio_read(2),
and aio_write(2) and replace it with a description of EOPNOTSUPP.
Remove the ENOSYS error description from aio_mlock(2).
Log a message to the system log the first time a process requests an
"unsafe" AIO request that fails with EOPNOTSUPP. This is modeled on
the log message used for processes using the legacy pty devices.
Reviewed by: kib (earlier version)
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7151
First, PL_FLAG_FORKED events now also set a PL_FLAG_VFORKED flag when
the new child was created via vfork() rather than fork(). Second, a
new PL_FLAG_VFORK_DONE event can now be enabled via the PTRACE_VFORK
event mask. This new stop is reported after the vfork parent resumes
due to the child calling exit or exec. Debuggers can use this stop to
reinsert breakpoints in the vfork parent process before it resumes.
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7045
ptrace() now stores a mask of optional events in p_ptevents. Currently
this mask is a single integer, but it can be expanded into an array of
integers in the future.
Two new ptrace requests can be used to manipulate the event mask:
PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK
sets the current event mask.
The current set of events include:
- PTRACE_EXEC: trace calls to execve().
- PTRACE_SCE: trace system call entries.
- PTRACE_SCX: trace syscam call exits.
- PTRACE_FORK: trace forks and auto-attach to new child processes.
- PTRACE_LWP: trace LWP events.
The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have
been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces
P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS.
The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for
compatibility but now simply toggle corresponding flags in the
event mask.
While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both
modify the event mask and continue the traced process.
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7044
- Add a sigevent(3) manpage to give a general overview of the sigevent
structure and the available notification mechanisms.
- Document that AIO requests contain a nested sigevent structure that can
be used to request completion notification.
- Expand the sigevent details in other manuals to note details such as
the extra values stored in a queued signal's information or in a posted
kevent.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7122
Since r302216, thread suspension causes advisory file locks to restart
(instead of continuing to wait) and for a long time SA_RESTART has
affected advisory file locks. These are both not compliant to POSIX.1.
To clarify that restarting means something, add a paragraph about fair
queuing. Note that the network lock manager does not implement fair
queuing.
Reviewed by: kib (previous version)
Approved by: re (gjb)
value.
This eliminates the need for machine dependant assembly wrappers for
pipe(2).
It also make passing an invalid address to pipe(2) return EFAULT rather
than triggering a segfault. Document this behavior (which was already
true for pipe2(2), but undocumented).
Reviewed by: andrew
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D6815
Setting time by seconds or microseconds may cause unexpected effects
especially if sysctl vfs.timestamp_precision=3 (not default).
Calling the obsolete functions with NULL timestamps is acceptable.
Add some missing errno values to thr_new(2) and pthread_create(3).
In particular, EDEADLK was not documented in the latter.
While I'm here, improve some English and cross-references.
Reviewed by: kib
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D6663
Add text to thr_exit(2) and thr_new(2) discouraging their use in
applications since calling these in a process with libthr loaded will
confuse libthr and is likely to cause hangs or crashes.
The thr_kill2(2) call is not used by libthr and may be useful in special
applications.
The other calls can be used in applications but it should not be necessary.
While there, order EVFILT_VNODE notes descriptions alphabetically.
Based on submission, and tested by: Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after: 2 weeks
the monitored directory as the result of rename(2) operation. The
renames staying in the directory are not reported.
Submitted by: Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after: 2 weeks
a basic usage example. Although it is an
untypical example for the use of kqueue, it is
better than nothing and should get people started.
PR: 196844
Submitted by: fernando.apesteguia@gmail.com
Reviewed by: kib
Approved by: kib
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D6082
First, update the return types of aio_return() and aio_waitcomplete() to
ssize_t.
POSIX requires aio_return() to return a ssize_t so that it can represent
all return values from read() and write(). aio_waitcomplete() should use
ssize_t for the same reason.
aio_return() has used ssize_t in <aio.h> since r31620 but the manpage and
system call entry were not updated. aio_waitcomplete() has always
returned int.
Note that this does not require new system call stubs as this is
effectively only an API change in how the compiler interprets the return
value.
Second, allow aio_nbytes values up to IOSIZE_MAX instead of just INT_MAX.
aio_read/write should now honor the same length limits as normal read/write.
Third, use longs instead of ints in the aio_return() and aio_waitcomplete()
system call functions so that the 64-bit size_t in the in-kernel aiocb
isn't truncated to 32-bits before being copied out to userland or
being returned.
Finally, a simple test has been added to verify the bounds checking on the
maximum read size from a file.
These entries should have never been present since they only exist for
compat with FreeBSD 6.x (and older) binaries. This was missed in r296572.
Technically this breaks the ABI by removing versioned symbols. However,
no binaries should be linked against these symbols. No release has
shipped with a header that contained a prototype for these functions.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5615
documented and easy to miss.
At the same time, it's pretty important for anyone who is trying to use
SEEK_HOLE/SEEK_DATA in real app. Try to bridge that gap by making that
description more pronounced and also document how it affects failure codes.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5162
do not participate in the global symbols namespace, but rtld locks are
still replaced and functions are interposed. In particular,
__pthread_map_stacks_exec is resolved to the libc version. If a
library is loaded later, which requires adjustment of the stack
protection mode, rtld calls into libc __pthread_map_stacks_exec due to
the symbols scope. The libc version might recurse into binder and
recursively acquire rtld bind lock, causing the hang.
Make libc __pthread_map_stacks_exec() interposed, which synchronizes
rtld locks and version of the stack exec hook when libthr loaded,
regardless of the symbol scope control or symbol resolution order.
The __pthread_map_stacks_exec() symbol is removed from the private
version in libthr since libc symbol now operates correctly in presence
of libthr.
Reported and tested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks