freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	fd53298799	unix: add myself to the copyright notice for the new implementation of PF_UNIX/SOCK_DGRAM	2023-02-01 09:39:28 -08:00
Justin Hibbits	9507d03bfe	IfAPI: Use the ifnet APIs in kern_poll() The only API used is if_name(). Sponsored by: Juniper Networks, Inc.	2023-01-31 15:02:16 -05:00
Sebastian Huber	c7c53e3ca6	Clarify hardpps() parameter name and comment Since `32c203577a` by phk in 1999 (Make even more of the PPSAPI implementations generic), the "nsec" parameter of hardpps() is a time difference and no longer a time point. Change the name to "delta_nsec" and adjust the comment. Remove comment about a clock tick adjustment which is no longer in the code. Pull Request: https://github.com/freebsd/freebsd-src/pull/640 Reviewed by: imp	2023-01-30 11:07:40 -07:00
Jose Luis Duran	df949e762c	kern_environment: Partially apply style(9) Sort include files, remove duplicates and remove trailing whitespace. Pull Request: https://github.com/freebsd/freebsd-src/pull/589 Reviewed by: imp	2023-01-30 10:47:56 -07:00
Dmitry Chagin	2058f075b4	cpuset: Handle CPU_WHICH_TIDPID wherever cpuset_which() is called. cpuset_which() resolves the argument pair which and id and returns references to an appropriate resources. To avoid leaking resources or accessing unresolved references to a resources handle new which CPU_WHICH_TIDPID wherever cpuset_which() is called. To avoid code duplication cpuset_which2() has been added. Reported by: syzbot+331e8402e0f7347f0f2a@syzkaller.appspotmail.com Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D38272 MFC after: 2 weeks	2023-01-30 19:28:54 +03:00
Dmitry Chagin	e4754c8036	subr_smp: Trim trailing whitespaces. MFC after: 1 week	2023-01-29 16:18:17 +03:00
Dmitry Chagin	c21b080f3d	cpuset: Fix sched_[g|s]etaffinity() for better compatibility with Linux. Under Linux to sched_[g|s]etaffinity() functions the value returned from a call to gettid(2) (thread id) can be passed in the argument pid. Specifying pid as 0 will set the attribute for the calling thread, and passing the value returned from a call to getpid(2) (process id) will set the attribute for the main thread of the thread group. Native cpuset(2) family of system calls has "which" argument to determine how the value of id argument is interpreted, i.e., CPU_WHICH_TID is used to pass a thread id and CPU_WHICH_PID - to pass a process id. For now native sched_[g|s]etaffinity() implementation is wrong as uses "which" CPU_WHICH_PID to pass both (process and thread id) to the kernel. To fix this adding a new "which" CPU_WHICH_TIDPID intended to handle both id's. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D38209 MFC after: 1 week	2023-01-29 16:17:33 +03:00
Dmitry Chagin	01f74ccd5a	libthr: Fix pthread_attr_[g|s]etaffinity_np to match its manual and the kernel. Since `f35093f8` semantics of a thread affinity functions is changed to be a compatible with Linux: In case of getaffinity(), the minimum cpuset_t size that the kernel permits is the maximum CPU id, present in the system, / NBBY bytes, the maximum size is not limited. In case of setaffinity(), the kernel does not limit the size of the user-provided cpuset_t, internally using only the meaningful part of the set, where the upper bound is the maximum CPU id, present in the system, no larger than the size of the kernel cpuset_t. To match pthread_attr_[g|s]etaffinity_np checks of the user-provided cpusets to the kernel behavior export the minimum cpuset_t size allowed by running kernel via new sysctl kern.sched.cpusetsizemin and use it in checks. Reviewed by: Differential Revision: https://reviews.freebsd.org/D38112 MFC after: 1 week	2023-01-29 15:35:18 +03:00
Allan Jude	5ff13fbc19	MFV: zstd 1.5.2 Merge commit 'b3392d84da5bf2162baf937c77e0557f3fd8a52b' into zstd_1.5.2 full changelog: https://github.com/facebook/zstd/compare/v1.4.8...v1.5.2 Updated sys/kern/subr_compressor.c to new API MFC after: 3 days Relnotes: yes Sponsored by: Klara, Inc.	2023-01-27 17:22:31 +00:00
Gleb Smirnoff	f394d9c0a4	sysctl: use correct types and names in sysctl_*sec_to_sbintime The functions are intended to report kernel variables that are stored as sbintime_t (pointed to by arg1) as human readable nanoseconds or milliseconds (reported via sysctl_handle_64). The variable types and names were reversed. I guess there is no functional change here, as all types flipped around were signed 64. Note that these function aren't used yet anywhere in the kernel. Reviewed by: mav Differential revision: https://reviews.freebsd.org/D38217	2023-01-27 07:09:22 -08:00
Mitchell Horne	627ca221c3	kern_reboot: unconditionally call shutdown_reset() Currently shutdown_reset() is registered as the final entry of the shutdown_final event handler. However, if a panic occurs early in boot before the event is registered (SI_SUB_INTRINSIC), we may end up spinning in the subsequent infinite for loop and failing to reset altogether. Instead we can simply call this function unconditionally. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D37981	2023-01-23 15:10:24 -04:00
Jiajie Chen	dec7db4960	Add kf_file_nlink field to kf_file and populate it This will allow user-space programs (e.g. lsof) to locate deleted files whose nlink equals zero. Prior to this commit, programs had to use stat(kf_path) to get nlink, but that will fail if the file is deleted. [mjg: s/fail/file in the commit message] Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D38169	2023-01-23 17:09:52 +00:00
Konstantin Belousov	456f05756b	Handle int rank issues in vn_getsize_locked() and vn_seek() In vn_getsize_locked(), when storing vattr.va_size of type u_quad_t into off_t size, we must avoid overflow. Then, the check for fsize < 0, introduced in the commit `f45feecfb2` 'vfs: add vn_getsize', is nop [1]. Reported and reviewed by: jhb Coverity CID: 1502346 Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D38133	2023-01-20 23:56:29 +02:00
Konstantin Belousov	5657f49ef3	kern_umtx.c do_wait(): correct confusing indent Sponsored by: The FreeBSD Foundation MFC after: 3 days	2023-01-20 23:33:11 +02:00
Brooks Davis	fa1d803c0f	epoch: replace hand coded assertion The assertion is equivalent to kstack_contains() so use that rather than spelling it out. Suggested by: jhb Reviewed by: jhb MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D38107	2023-01-20 18:04:40 +00:00
John Baldwin	846e4a206f	ktls_disable_ifnet_help: Set curvnet around sorele(). This is required in kernels with VIMAGE such as GENERIC. MFC after: 1 week Sponsored by: Chelsio Communications	2023-01-18 15:39:04 -08:00
Konstantin Belousov	0f80d5ebc8	Require INVARIANTS and WITNESS if DEBUG_VFS_LOCKS is set Reported by: pho Reviewed by: markj, mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D38070	2023-01-16 05:55:47 +02:00
Zhenlei Huang	8bce8d28ab	jail: Avoid multipurpose return value of function prison_ip_restrict() Currently function prison_ip_restrict() returns true if the replacement buffer was used, or no buffer provided and allocation fails and should redo. The logic is confusing and cause possibly infinite loop from `eb8dcdeac2` . Reviewed by: jamie, glebius Approved by: kp (mentor) Differential Revision: https://reviews.freebsd.org/D37918	2023-01-13 18:45:14 +08:00
Zhenlei Huang	89ddfbbac8	jail: Fix regression panic from `eb8dcdeac2` And possibly infinite loop calling prison_ip_restrict() in kern_jail_set() [2]. [1] It is possible that prisons do not have any IPv4 or IPv6 addresses. [2] If prison_ip_restrict() is not provided with prison_ip, when it allocates prison_ip successfully, then it should return false to indicate not redo prison_ip_restrict() later. Reviewed by: glebius Approved by: kp (mentor) Fixes: `eb8dcdeac2` jail: network epoch protection for IP address lists Differential Revision: https://reviews.freebsd.org/D37906	2023-01-13 18:45:14 +08:00
Zhenlei Huang	ddbf879d79	jail: Correctly access IPv[46] addresses of prison_ip * Fix wrong IPv[46] addresses inherited from parent jail * Properly restrict the child jail's IPv[46] addresses Reviewed by: melifaro, glebius Approved by: kp (mentor) Fixes: `eb8dcdeac2` jail: network epoch protection for IP address lists Differential Revision: https://reviews.freebsd.org/D37871 Differential Revision: https://reviews.freebsd.org/D37872	2023-01-13 18:45:14 +08:00
Konstantin Belousov	37b9fb1696	Add descrip_check_write_mp() helper ... which verifies that given file table does not have file descriptors referencing vnodes on the specified mount point. It is up to the caller to ensure that the check is not racy. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37896	2022-12-29 22:55:39 +02:00
Mateusz Guzik	f45feecfb2	vfs: add vn_getsize getattr is very expensive and in important cases only gets called to get the size. This can be optimized with a dedicated routine which obtains that statistic. As a step towards that goal make size-only consumers use a dedicated routine. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D37885	2022-12-28 22:43:49 +00:00
John Baldwin	07be751727	ktls: Post receive errors on partially closed sockets. If an error such as an invalid record or one whose decryption fails is detected on a socket that has received a RST then ktls_drop() could ignore the error since INP_DROPPED could already be set. In this case soreceive_generic hangs since it does not return from a KTLS socket with pending encrypted data unless there is an error (so_error) (this behavior is to ensure that soreceive_generic doesn't return a premature EOF when there is pending data still being decrypted). Note that this was a bug prior to `69542f2682` as tcp_usr_abort would also have ignored the error in this case. Reviewed by: gallatin Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D37775	2022-12-27 16:00:17 -08:00
Mateusz Guzik	829f0bcb5f	vfs: add the concept of vnode state transitions To quote from a comment above vput_final: <quote> * XXX Some filesystems pass in an exclusively locked vnode and strongly depend * on the lock being held all the way until VOP_INACTIVE. This in particular * happens with UFS which adds half-constructed vnodes to the hash, where they * can be found by other code. </quote> As is there is no mechanism which allows filesystems to denote that a vnode is fully initialized, consequently problems like the above are only found the hard way(tm). Add rudimentary support for state transitions, which in particular allow to assert the vnode is not legally unlocked until its fate is decided (either construction finishes or vgone is called to abort it). The new field lands in a 1-byte hole, thus it does not grow the struct. Bump __FreeBSD_version to 1400077 Reviewed by: kib (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D37759	2022-12-26 17:35:12 +00:00
Mateusz Guzik	94267fc907	vfs: use designated initializers for the typename array While here prefix with v for better consistency with the vnode stuff. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D37759	2022-12-26 17:34:41 +00:00
Konstantin Belousov	974be51b3f	Fixes for ptrace_syscallreq() Re-assign the sc local (syscall number) before moving args for SYS_syscall. Correct the audit and kdtrace hooks invocations. Fixes: `140ceb5d95` Sponsored by: The FreeBSD Foundation MFC after: 1 week	2022-12-23 01:53:41 +02:00
Konstantin Belousov	140ceb5d95	ptrace(2): add PT_SC_REMOTE remote syscall request Reviewed by: markj Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37590	2022-12-22 23:11:35 +02:00
Konstantin Belousov	f0592b3c8d	Add a thread debugging flag TDB_BOUNDARY It indicates to a debugger that the thread is stopped at the kernel->user exit path. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37590	2022-12-22 23:11:35 +02:00
Konstantin Belousov	e6feeae2f9	sys: rename td_coredump to td_remotereq and TDB_COREDUMPRQ to TDB_COREDUMPREQ Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37590	2022-12-22 23:11:35 +02:00
Zhenlei Huang	21ad3e27fa	jail: Fix output of IPv[46] addresses of DDB `show prison` Reviewed by: melifaro, jamie Approved by: kp (mentor) Fixes: `eb8dcdeac2` jail: network epoch protection for IP address lists Differential Revision: https://reviews.freebsd.org/D37732	2022-12-21 09:53:28 +08:00
Alfredo Dal'Ava Junior	b13110e9f3	ufs/ffs: detect endian mismatch between machine and filesystem Mount on a LE machine a filesystem formatted for BE is not supported currently. This adds a check for the superblock magic number using swapped bytes to guess and warn the user that it may be a valid superblock but endian is incompatible. MFC after: 2 weeks Reviewed by: mckusick Obtained from: mckusick, alfredo Differential Revision: https://reviews.freebsd.org/D37675	2022-12-20 00:20:11 -03:00
Doug Rabson	71e9be1bd5	Don't allow stacking of file mounts Reviewed by: mjg, kib Tested by: pho	2022-12-19 16:46:27 +00:00
Doug Rabson	a1d74b2dab	Allow realpath to work for file mounts For file mounts, the directory vnode is not available from namei and this prevents the use of vn_fullpath_hardlink. In this case, we can use the vnode which was covered by the file mount with vn_fullpath. This also disallows file mounts over files with link counts greater than one to ensure a deterministic path to the mount point. Reviewed by: mjg, kib Tested by: pho	2022-12-19 16:46:27 +00:00
Doug Rabson	521fbb722c	Add support for mounting single files in nullfs The main use-case for this is to support mounting config files and secrets into OCI containers. My current workaround copies the files into the container which is messy and risks secrets leaking into container images if the cleanup fails. This adds a VFCF flag to indicate whether the filesystem supports file mounts and allows fspath to be either a directory or a file if the flag is set. Test Plan: $ sudo mkdir -p /mnt $ sudo touch /mnt/foo $ sudo mount -t nullfs /COPYRIGHT /mnt/foo Reviewed by: mjg, kib Tested by: pho	2022-12-19 16:46:13 +00:00
Doug Rabson	78d35459a2	Add vn_path_to_global_path_hardlink This is similar to vn_path_to_global_path but allows for regular files which may not be present in the cache. Reviewed by: mjg, kib Tested by: pho	2022-12-19 16:44:59 +00:00
Mateusz Guzik	8f7859e800	vfs: retire the now unused SAVESTART flag Bump __FreeBSD_version to 1400075 Tested by: pho	2022-12-19 08:11:08 +00:00
Mateusz Guzik	56da4aa554	vfs: stop using SAVESTART for rename ni_startdir has never reached rename routines anyway Reviewed by: mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D34468	2022-12-19 08:09:37 +00:00
Mateusz Guzik	8f874e92eb	vfs: make relookup take an additional argument instead of looking at SAVESTART This is a step towards removing the flag. Reviewed by: mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D34468	2022-12-19 08:09:00 +00:00
Mateusz Guzik	269c564b90	vfs: retire NDFREE There are no consumers anymore. Interested parties can NDFREE_PNBUF and vput or vrele relevant vnodes. Tested by: pho	2022-12-19 08:07:54 +00:00
Mateusz Guzik	85dac03e30	vfs: stop using NDFREE It provides nothing but a branchfest and next to no consumers want it anyway. Tested by: pho	2022-12-19 08:07:23 +00:00
Rick Macklem	bba7a2e896	kern_jail.c: Allow mountd/nfsd to optionally run in a jail This patch adds "allow.nfsd" to the jail code based on a new kernel build option VNET_NFSD. This will not work until future patches fix nmount(2) to allow mountd to run in a vnet prison and the NFS server code is patched so that global variables are in a vnet. The jail(8) man page will be patched in a future commit. Reviewed by: jamie MFC after: 4 months Differential Revision: https://reviews.freebsd.org/D37637	2022-12-17 13:43:49 -08:00
Rick Macklem	195f1b124d	vfs_mount.c: fix vfs_domount() for PRIV_VFS_MOUNT_EXPORTED It appears that, prior to r158857 vfs_domount() checked suser() when MNT_EXPORTED was specified. r158857 appears to have broken this, since MNT_EXPORTED was no longer set when mountd.c was converted to use nmount(2). r164033 replaced the suser() check with priv_check(td, PRIV_VFS_MOUNT_EXPORTED), which does the same thing (ie. checks for effective uid == 0 assuming suser_enabled is set). This patch restores this check by setting MNT_EXPORTED when the "export" mount option is specified to nmount(). I think this is reasonable since only mountd(8) should be setting exports and I doubt any non-root mounted file system would be setting its own exports. Reviewed by: kib, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D37718	2022-12-16 13:01:23 -08:00
John Baldwin	69542f2682	ktls: Close a race with setting so_error when dropping a connection. pr_abort calls tcp_usr_abort which calls tcp_drop with ECONNABORTED. After pr_abort returns, the so_error is then set to a more specific error. However, a reader can observe and return the ECONNABORTED error before so_error is set to the desired error value. This is resulting in spurious test failures of recently added tests for invalid conditions such as invalid headers. To fix, refactor the code to abort a connection to call tcp_drop directly with the desired error value. ktls_reset_send_tag already calls tcp_drop directly when it aborts a connection due to an error. Reviewed by: gallatin Reported by: CI (jenkins), gallatin, olivier Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D37692	2022-12-15 12:06:26 -08:00
Andrew Gallatin	ac4e3a27ab	Unbreak the build when MAC is not defined `7a2c93b86e` removed the use of "error" when MAC was not defined, resulting in an unused variable error. Sponsored by: Netflix Reviewed by: jhb	2022-12-14 17:39:25 -05:00
Gleb Smirnoff	7a2c93b86e	sockets: provide sousrsend() that does socket specific error handling Sockets have special handling for EPIPE on a write, that was spread out into several places. Treating transient errors is also special - if protocol is atomic, then we should ignore any changes to uio_resid, a transient error means the write had completely failed (see `d2b3a0ed31`). - Provide sousrsend() that expects a valid uio, and leave sosend() for kernel consumers only. Do all special error handling right here. - In dofilewrite() don't do special handling of error for DTYPE_SOCKET. - For send(2), write(2) and aio_write(2) call into sousrsend() and remove error handling for kern_sendit(), soo_write() and soaio_process_job(). PR: 265087 Reported by: rz-rpi03 at h-ka.de Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35863	2022-12-14 10:02:44 -08:00
Jason A. Harmening	42442d7a6e	Generalize the VV_CROSSLOCK logic in vfs_lookup() When VV_CROSSLOCK is present, the lock for the vnode at the current stage of lookup must be held across the VFS_ROOT() call for the filesystem mounted at the vnode. Since VV_CROSSLOCK implies that the root vnode reuses the already-held lock, the possibility for recursion should be made clear in the flags passed to VFS_ROOT(). For cases in which the lock is held exclusive, this means passing LK_CANRECURSE. For cases in which the lock is held shared, it means clearing LK_NODDLKTREAT to allow VFS_ROOT() to potentially recurse on the shared lock even in the presence of an exclusive waiter. That the existing code works for unionfs is due to a coincidence of the current unionfs implementation. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D37458	2022-12-10 22:02:38 -06:00
Mateusz Guzik	ebdf27b6f3	uipc: remove accept_mtx It is unused since `779f106aa1` ("Listening sockets improvements.") Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-12-11 02:47:07 +00:00
Konstantin Belousov	0919f29d91	shmfd: account for the actually allocated pages Return the value as stat(2) st_blocks. Suggested and reviewed by: markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37097	2022-12-09 14:17:12 +02:00
Konstantin Belousov	37aea2649f	tmpfs: for used pages, account really allocated pages, instead of file sizes This makes tmpfs size accounting correct for the sparse files. Also correct report st_blocks/va_bytes. Previously the reported value did not account for the swapped out pages. PR: 223015 Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37097	2022-12-09 14:17:12 +02:00
Konstantin Belousov	7ec4b29b08	uiomove_object: hide diagnostic under bootverbose Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37097	2022-12-09 14:15:37 +02:00

19455 Commits