freebsd-skq

Author	SHA1	Message	Date
kevans	0b09e6adb7	Add SPDX tags to recently added files Reported by: Pawel Biernacki	2019-09-25 22:53:30 +00:00
kevans	ca07d63db5	posix_spawn(3): handle potential signal issues with vfork Described in [1], signal handlers running in a vfork child have opportunities to corrupt the parent's state. Address this by adding a new rfork(2) flag, RFSPAWN, that has vfork(2) semantics but also resets signal handlers in the child during creation. x86 uses rfork_thread(3) instead of a direct rfork(2) because rfork with RFMEM/RFSPAWN cannot work when the return address is stored on the stack -- further information about this problem is described under RFMEM in the rfork(2) man page. Addressing this has been identified as a prerequisite to using posix_spawn in subprocess on FreeBSD [2]. [1] https://ewontfix.com/7/ [2] https://bugs.python.org/issue35823 Reviewed by: jilles, kib Differential Revision: https://reviews.freebsd.org/D19058	2019-09-25 19:22:03 +00:00
kevans	245d8426fc	rfork(2): add RFSPAWN flag When RFSPAWN is passed, rfork exhibits vfork(2) semantics but also resets signal handlers in the child during creation to avoid a point of corruption of parent state from the child. This flag will be used by posix_spawn(3) to handle potential signal issues. Reviewed by: jilles, kib Differential Revision: https://reviews.freebsd.org/D19058	2019-09-25 19:20:41 +00:00
kevans	1d9983b221	sysent: regenerate after r352705 This also implements it, fixes kdump, and removes no longer needed bits from lib/libc/sys/shm_open.c for the interim.	2019-09-25 18:09:19 +00:00
kevans	575e351fdd	Add linux-compatible memfd_create memfd_create is effectively a SHM_ANON shm_open(2) mapping with optional CLOEXEC and file sealing support. This is used by some mesa parts, some linux libs, and qemu can also take advantage of it and uses the sealing to prevent resizing the region. This reimplements shm_open in terms of shm_open2(2) at the same time. shm_open(2) will be moved to COMPAT12 shortly. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D21393	2019-09-25 18:03:18 +00:00
kevans	7013010007	Update fcntl(2) after r352695	2019-09-25 17:33:12 +00:00
sef	f0e5ce5f10	Add two options to allow mount to avoid covering up existing mount points. The two options are * nocover/cover: Prevent/allow mounting over an existing root mountpoint. E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local is already a mountpoint. * emptydir/noemptydir: Prevent/allow mounting on a non-empty directory. E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail. Neither of these options is intended to be a default, for historical and compatibility reasons. Reviewed by: allanjude, kib Differential Revision: https://reviews.freebsd.org/D21458	2019-09-23 04:28:07 +00:00
kib	c4ddd57b85	Return EISDIR when directory is opened with O_CREAT without O_DIRECTORY. Reviewed by: bcr (man page), emaste (previous version) PR: 240452 Sponsored by: The FreeBSD Foundation MFC after: 1 week DIfferential revision: https://reviews.freebsd.org/D21634	2019-09-17 18:32:18 +00:00
asomers	af1c569e96	getsockopt.2: clarify that SO_TIMESTAMP is not 100% reliable When SO_TIMESTAMP is set, the kernel will attempt to attach a timestamp as ancillary data to each IP datagram that is received on the socket. However, it may fail, for example due to insufficient memory. In that case the packet will still be received but not timestamp will be attached. Reviewed by: kib MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D21607	2019-09-11 19:48:32 +00:00
mhorne	cac0e53d56	Fix cpuwhich_t column width Not bumping .Dd since this is purely a format change. Approved by: markj (mentor)	2019-09-08 21:37:52 +00:00
kib	a788c25f19	Add procctl(PROC_STACKGAP_CTL) It allows a process to request that stack gap was not applied to its stacks, retroactively. Also it is possible to control the gaps in the process after exec. PR: 239894 Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21352	2019-09-03 18:56:25 +00:00
mjg	286ae5bd6b	Add sysctlbyname system call Previously userspace would issue one syscall to resolve the sysctl and then another one to actually use it. Do it all in one trip. Fallback is provided in case newer libc happens to be running on an older kernel. Submitted by: Pawel Biernacki Reported by: kib, brooks Differential Revision: https://reviews.freebsd.org/D17282	2019-09-03 04:16:30 +00:00
emaste	ac28fa5277	Add @generated tag to libc syscall asm wrappers Although libc syscall wrappers do not get checked in this can aid in finding the source of generated files when spelunking in the objdir. Multiple tools use @generated to identify generated files (for example, in a review Phabricator will by default hide diffs in generated files). For consistency use the @generated tag in makesyscalls.sh as we've done for other generated files, even though these wrappers aren't checked in to the tree.	2019-08-16 14:14:57 +00:00
kib	75eca26976	wait(2): clarify reparenting of children of the exiting process. Point to the existence of reapers and mention that init is the default reaper. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-08-11 15:47:48 +00:00
kib	334dd0a5ca	wait(2): split long line by using .Fo/.Fa instead of .Ft. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-08-11 15:44:36 +00:00
bjk	c1b91fd7f9	Fix grammar nit in copy_file_range docs Bytes are countable, so we have fewer of them, not less of them.	2019-07-25 15:43:15 +00:00
rmacklem	cd932f85db	Add libc support for the copy_file_range(2) syscall added by r350315. copy_file_range.2 is a new man page (content change). Reviewed by: kib, asomers Relnotes: yes Differential Revision: https://reviews.freebsd.org/D20584	2019-07-25 06:05:49 +00:00
jhb	bd67f2ec6b	Add ptrace op PT_GET_SC_RET. This ptrace operation returns a structure containing the error and return values from the current system call. It is only valid when a thread is stopped during a system call exit (PL_FLAG_SCX is set). The sr_error member holds the error value from the system call. Note that this error value is the native FreeBSD error value that has _not_ been translated to an ABI-specific error value similar to the values logged to ktrace. If sr_error is zero, then the return values of the system call will be set in sr_retval[0] and sr_retval[1]. Reviewed by: kib MFC after: 1 month Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D20901	2019-07-15 21:48:02 +00:00
kib	dd4da0248c	Document atomicity for read(2) and write(2). Take part of the text from POSIX 2018 edition and describe the atomicity requirements for read and write syscalls. See p1003.1-2018, Vol.2, 2.9.7 Threads interaction with Regular File Operations. Reviewed by: asomers Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential revision: https://reviews.freebsd.org/D20867	2019-07-06 20:31:37 +00:00
kib	5144f6086b	Control implicit PROT_MAX() using procctl(2) and the FreeBSD note feature bit. In particular, allocate the bit to opt-out the image from implicit PROTMAX enablement. Provide procctl(2) verbs to set and query implicit PROTMAX handling. The knobs mimic the same per-image flag and per-process controls for ASLR. Reviewed by: emaste, markj (previous version) Discussed with: brooks Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D20795	2019-07-02 19:07:17 +00:00
kib	e9829937e1	Typo. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-06-28 16:42:44 +00:00
brooks	57acd8427d	Add PROT_MAX to the HISTORY section. In the case of mmap(), add a HISTORY section. Mention that mmap() and mprotect()'s documentation predates an implementation. The implementation first saw wide use in 4.3-Reno, but there seems to be no easy way to express that in mdoc so stick with 4.4BSD. Reviewed by: emaste Requested by: cem Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D20713	2019-06-20 21:52:30 +00:00
brooks	0165ccc850	Extend mmap/mprotect API to specify the max page protections. A new macro PROT_MAX() alters a protection value so it can be OR'd with a regular protection value to specify the maximum permissions. If present, these flags specify the maximum permissions. While these flags are non-portable, they can be used in portable code with simple ifdefs to expand PROT_MAX() to 0. This change allows (e.g.) a region that must be writable during run-time linking or JIT code generation to be made permanently read+execute after writes are complete. This complements W^X protections allowing more precise control by the programmer. This change alters mprotect argument checking and returns an error when unhandled protection flags are set. This differs from POSIX (in that POSIX only specifies an error), but is the documented behavior on Linux and more closely matches historical mmap behavior. In addition to explicit setting of the maximum permissions, an experimental sysctl vm.imply_prot_max causes mmap to assume that the initial permissions requested should be the maximum when the sysctl is set to 1. PROT_NONE mappings are excluded from this for compatibility with rtld and other consumers that use such mappings to reserve address space before mapping contents into part of the reservation. A final version this is expected to provide per-binary and per-process opt-in/out options and this sysctl will go away in its current form. As such it is undocumented. Reviewed by: emaste, kib (prior version), markj Additional suggestions from: alc Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D18880	2019-06-20 18:24:16 +00:00
asomers	ba20317d83	open(2): fix the description of O_FSYNC The man page claims that with O_FSYNC (aka O_SYNC) the kernel will not cache written data. However, that's not true. Nor does POSIX require it. Perhaps it was true when that section of the man page was written in r69336 (I haven't checked). But it's not true now. Now the effect is simply that writes are sent to disk immediately and synchronously, but they're still cached. See also: https://pubs.opengroup.org/onlinepubs/9699919799/ See also: ffs_write in sys/ufs/ffs/ffs_vnops.c Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20641	2019-06-14 20:35:37 +00:00
oshogbo	86dc1571ec	unlink: add missing function to unlink.2 man page	2019-06-05 22:36:19 +00:00
asomers	ae332e926a	Link fhlinkat(2) man page Reviewed by: kib MFC after: 3 days MFC-With: r341689 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20339	2019-05-22 01:11:21 +00:00
markj	094736f08f	Provide separate accounting for user-wired pages. Historically we have not distinguished between kernel wirings and user wirings for accounting purposes. User wirings (via mlock(2)) were subject to a global limit on the number of wired pages, so if large swaths of physical memory were wired by the kernel, as happens with the ZFS ARC among other things, the limit could be exceeded, causing user wirings to fail. The change adds a new counter, v_user_wire_count, which counts the number of virtual pages wired by user processes via mlock(2) and mlockall(2). Only user-wired pages are subject to the system-wide limit which helps provide some safety against deadlocks. In particular, while sources of kernel wirings typically support some backpressure mechanism, there is no way to reclaim user-wired pages shorting of killing the wiring process. The limit is exported as vm.max_user_wired, renamed from vm.max_wired, and changed from u_int to u_long. The choice to count virtual user-wired pages rather than physical pages was done for simplicity. There are mechanisms that can cause user-wired mappings to be destroyed while maintaining a wiring of the backing physical page; these make it difficult to accurately track user wirings at the physical page layer. The change also closes some holes which allowed user wirings to succeed even when they would cause the system limit to be exceeded. For instance, mmap() may now fail with ENOMEM in a process that has called mlockall(MCL_FUTURE) if the new mapping would cause the user wiring limit to be exceeded. Note that bhyve -S is subject to the user wiring limit, which defaults to 1/3 of physical RAM. Users that wish to exceed the limit must tune vm.max_user_wired. Reviewed by: kib, ngie (mlock() test changes) Tested by: pho (earlier version) MFC after: 45 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19908	2019-05-13 16:38:48 +00:00
trasz	2dfe603ea5	.Xr protect(1) and proccontrol(1) from procctl(2). MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-04-09 10:09:59 +00:00
oshogbo	20d273b44b	Introduce funlinkat syscall that always us to check if we are removing the file associated with the given file descriptor. Reviewed by: kib, asomers Reviewed by: cem, jilles, brooks (they reviewed previous version) Discussed with: pjd, and many others Differential Revision: https://reviews.freebsd.org/D14567	2019-04-06 09:34:26 +00:00
kib	9638d3e2e6	Fix initial exec TLS mode for dynamically loaded shared objects. If dso uses initial exec TLS mode, rtld tries to allocate TLS in static space. If there is no space left, the dlopen(3) fails. If space if allocated, initial content from PT_TLS segment is distributed to all threads' pcbs, which was missed and caused un-initialized TLS segment for such dso after dlopen(3). The mode is auto-detected either due to the relocation used, or if the DF_STATIC_TLS dynamic flag is set. In the later case, the TLS segment is tried to allocate earlier, which increases chance of the dlopen(3) to succeed. LLD was recently fixed to properly emit the flag, ld.bdf did it always. Initial test by: dumbbell Tested by: emaste (amd64), ian (arm) Tested by: Gerald Aryeetey <aryeeteygerald_rogers.com> (arm64) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D19072	2019-03-29 17:52:57 +00:00
emaste	c2bece9ede	Use consistent struct stat arg name in stat man page stat, lstat, and fstat use `sb` as the stat struct pointer arg name, while fstatat previously used `buf`. MFC after: 1 week	2019-03-13 15:18:14 +00:00
emaste	47733e8a6a	poll.2: POLLNVAL is returned also for insufficient rights Reported by: "Bora Özarslan" <borako.ozarslan@gmail.com> MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-02-27 17:52:22 +00:00
kib	069606f4d2	procctl(2): document ASLR knobs. Reviewed by: 0mp Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D19308	2019-02-26 17:41:41 +00:00
kib	7b81db9851	procctl(2): fix -width parameter to .Bl. According to 0mp, macros are not expanded in the argument provided to -width. Use plain identifiers for width specification. Noted and reviewed by: 0mp Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential revision: https://reviews.freebsd.org/D19308	2019-02-26 17:35:06 +00:00
glebius	8a60540f9e	Imaginary cat jumped my keyboard!	2019-02-15 23:46:34 +00:00
glebius	d889424078	For 32-bit machines rollback the default number of vnode pager pbufs back to the lever before r343030. For 64-bit machines reduce it slightly, too. Together with r343030 I bumped the limit up to the value we use at Netflix to serve 100 Gbit/s of sendfile traffic, and it probably isn't a good default. Provide a loader tunable to change vnode pager pbufs count. Document it.	2019-02-15 23:36:22 +00:00
pluknet	16482aa8e6	Document the ENOBUFS errno in setsockopt(2). In particular, it is the case if SO_SNDBUF/SO_RCVBUF would exceed sb_max_adj. PR: 200649 MFC after: 1 week	2019-02-09 21:33:32 +00:00
ngie	d1a640f492	Document that `sendfile` will return an invalid value for `sbytes` if provided an invalid address This is meant to clarify the fact that the system call will not fail with -1/EFAULT, as one might expect, when reading the sendfile(2) manpage today. While here, pet the mandoc linter, when dealing with the section that describes valid values for `flags`. PR: 232210 MFC after: 2 weeks Approved by: emaste (mentor) Reviewed by: glebius, 0mp Differential Revision: https://reviews.freebsd.org/D18949	2019-01-25 19:56:02 +00:00
mckusick	72a21ba0f6	Create new EINTEGRITY error with message "Integrity check failed". An integrity check such as a check-hash or a cross-correlation failed. The integrity error falls between EINVAL that identifies errors in parameters to a system call and EIO that identifies errors with the underlying storage media. EINTEGRITY is typically raised by intermediate kernel layers such as a filesystem or an in-kernel GEOM subsystem when they detect inconsistencies. Uses include allowing the mount(8) command to return a different exit value to automate the running of fsck(8) during a system boot. These changes make no use of the new error, they just add it. Later commits will be made for the use of the new error number and it will be added to additional manual pages as appropriate. Reviewed by: gnn, dim, brueffer, imp Discussed with: kib, cem, emaste, ed, jilles Differential Revision: https://reviews.freebsd.org/D18765	2019-01-17 06:35:45 +00:00
kib	f34dbcf7d0	Implement shmat(2) flag SHM_REMAP. Based on the description in Linux man page. Reviewed by: markj, ngie (previous version) Sponsored by: Mellanox Technologies MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18837	2019-01-16 05:15:57 +00:00
kib	54e59fff7b	Add a tunable which changes mincore(2) algorithm to only report data from the local mapping. Enable the setting by default. The article behind the change: https://arxiv.org/abs/1901.01161 Reviewed by: markj Discussed with: emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18764	2019-01-07 22:10:48 +00:00
jilles	bbe7a81f56	thr_wake(2): Minor mdoc fixes MFC after: 1 week	2019-01-06 21:34:05 +00:00
markj	85037a09f3	Support MSG_DONTWAIT in send(2). As it does for recv(2), MSG_DONTWAIT indicates that the call should not block, returning EAGAIN instead. Linux and OpenBSD both implement this, so the change makes porting easier, especially since we do not return EINVAL or so when unrecognized flags are specified. Submitted by: Greg V <greg@unrelenting.technology> Reviewed by: tuexen MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18728	2019-01-04 17:31:50 +00:00
kib	4045451f9e	Remove special case handling for getfhat(fd, NULL, handle). There is no reason for it to behave differently from openat(fd, NULL). Also the handling did not worked because the substituted path was from the system address space, causing EFAULT. Submitted by: Jack Halford <jack@gandi.net> MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18501	2018-12-11 02:48:49 +00:00
kib	48d91fd889	Add new file handle system calls. Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2). The syscalls are provided for a NFS userspace server (nfs-ganesha). Submitted by: Jack Halford <jack@gandi.net> Sponsored by: Gandi.net Tested by: pho Feedback from: brooks, markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18359	2018-12-07 15:17:29 +00:00
asomers	1d5dbbadec	stat(2): clarify which syscalls modify file timestamps The list of syscalls that modify st_atim, st_mtim, and st_ctim was quite out of date and probably not accurate to begin with. Update it, and make it clear that the list is open-ended. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18410	2018-12-05 17:28:40 +00:00
asomers	0b42fa2478	fcntl.2: document an additional error condition MFC after: 2 weeks	2018-11-15 16:13:25 +00:00
kib	7d33ec3750	Add d_off support for multiple filesystems. The d_off field has been added to the dirent structure recently. Currently filesystems don't support this feature. Support has been added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs. A stub implementation is available for cd9660, nandfs, udf and pseudofs but hasn't been tested. Motivation for this feature: our usecase is for a userspace nfs server (nfs-ganesha) with zfs. At the moment we cache direntry offsets by calling lseek once per entry, with this patch we can get the offset directly from getdirentries(2) calls which provides a significant speedup. Submitted by: Jack Halford <jack@gandi.net> Reviewed by: mckusick, pfg, rmacklem (previous versions) Sponsored by: Gandi.net MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17917	2018-11-14 14:18:35 +00:00
kib	b2b4bc844b	First draft of documentation for AT/O_BENEATH handling of the absolute paths. It was decided that committing the code and drafting of the man page update is better than allowing the code to rot until wordsmithing happens. Reviewed by: jilles (previous version) Discussed with: brooks, jilles, emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17714	2018-11-11 01:46:48 +00:00
cem	a559f5f5a0	kern_poll: Restore explanatory comment removed in r177374 The comment isn't stale. The check is bogus in the sense that poll(2) does not require pollfd entries to be unique in fd space, so there is no reason there cannot be more pollfd entries than open or even allowed fds. The check is mostly a seatbelt against accidental misuse or abuse. FD_SETSIZE, while usually unrelated to poll, is used as an arbitrary floor for systems with very low kern.maxfilesperproc. Additionally, document this possible EINVAL condition in the poll.2 manual. No functional change. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D17671	2018-11-01 23:46:23 +00:00

1 2 3 4 5 ...

1876 Commits