freebsd-dev

Author	SHA1	Message	Date
John Baldwin	cca6d6160f	aio_biowakeup: Various style fixes.	2023-02-15 10:57:08 -08:00
Keith Reynolds	40734fc57e	aio: Fix a test and set race in aio_biowakeup. Use atomic_fetchadd in place of separate atomic_subtract / atomic_load. Reviewed by: markj Sponsored by: HPE TidalScale Differential Revision: https://reviews.freebsd.org/D38559	2023-02-15 10:56:39 -08:00
Mitchell Horne	28137bdb19	intrng: track counter allocation with a bitmap Crucially, this allows releasing counters, and interrupt sources by extension. Where before we were incrementing intrcnt_index with atomics, now we protect the bitmap using the existing isrc_table_lock mutex. Reviewed by: mmel MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D38437	2023-02-14 14:06:00 -04:00
Mitchell Horne	82e846df5b	intrng: sort includes MFC after: 3 days	2023-02-14 14:06:00 -04:00
Mark Johnston	636b19ead4	tcp: Disallow re-connection of a connected socket soconnectat() tries to ensure that one cannot connect a connected socket. However, the check is racy and does not really prevent two threads from attempting to connect the same TCP socket. Modify tcp_connect() and tcp6_connect() to perform the check again, this time synchronized by the inpcb lock, under which we call soisconnecting(). Reported by: syzkaller Reviewed by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D38507	2023-02-14 10:07:19 -05:00
Konstantin Belousov	020e8a4d06	allocbuf(): convert direct panic() calls to KASSERT()s Also do minor style adjustments. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D38549	2023-02-14 00:28:42 +02:00
Mateusz Guzik	a066bba2da	ntptime: ansify Sponsored by: Rubicon Communications, LLC ("Netgate")	2023-02-13 18:24:13 +00:00
Mateusz Guzik	00343b4adc	uipc: ansify Sponsored by: Rubicon Communications, LLC ("Netgate")	2023-02-13 18:20:29 +00:00
Mitchell Horne	78919798e7	kern_poll: include sys/sched.h For sched_relinquish(). This fixes the build for some kernel configs. Reported by: Jenkins Fixes: `1029dab634` ("mi_switch(): clean up switch types and their usage")	2023-02-09 17:13:02 -04:00
Andrew Gallatin	d24b032bec	ktls: Fix comments & whitespace issues with `c0e4090e3d` Address some last minute review feedback on `c0e4090e3d` by fixing spacing around comments, and clarifying that the newly added destroy_task is not related to tls 1.0. No functional change intended. Pointed out by: jhb Sponsored by: Netflix	2023-02-09 14:11:24 -05:00
Andrew Gallatin	c0e4090e3d	ktls: Accurately track if ifnet ktls is enabled This allows us to avoid spurious calls to ktls_disable_ifnet(). When we implemented ifnet kTLS, we set a flag in the tx socket buffer (SB_TLS_IFNET) to indicate ifnet kTLS. This flag meant that now, or in the past, ifnet ktls was active on a socket. Later, I added code to switch ifnet ktls sessions to software in the case of lossy TCP connections that have a high retransmit rate. Because TCP was using SB_TLS_IFNET to know if it needed to do math to calculate the retransmit ratio and potentially call into ktls_disable_ifnet(), it was doing unneeded work long after a session was moved to software. This patch carefully tracks whether or not ifnet ktls is still enabled on a TCP connection. Because the inp is now embedded in the tcpcb, and because TCP is the most frequent accessor of this state, it made sense to move this from the socket buffer flags to the tcpcb. Because we now need reliable access to the tcpcb, we take a ref on the inp when creating a tx ktls session. While here, I noticed that rack/bbr were incorrectly implementing tfb_hwtls_change(), and applying the change to all pending sends, when it should apply only to future sends. This change reduces spurious calls to ktls_disable_ifnet() by 95% or so in a Netflix CDN environment. Reviewed by: markj, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D38380	2023-02-09 12:44:44 -05:00
Mitchell Horne	1029dab634	mi_switch(): clean up switch types and their usage Overall, this is a non-functional change, except for kernels built with SCHED_STATS. However, the switch types are useful for communicating the intent of the caller. 1. Ensure that every caller provides a type. In most cases, we upgrade the basic yield to sched_relinquish() aka SWT_RELINQUISH. 2. The case of sched_bind() is distinct, so add a new switch type SWT_BIND. 3. Remove the two unused types, SWT_PREEMPT and SWT_SLEEPQTIMO. 4. Remove SWT_NONE altogether and assert that callers always provide a type flag. 5. Reference the mi_switch(9) man page in the comments, as these flags will be documented there. Reviewed by: kib, markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38184	2023-02-09 12:01:32 -04:00
Mitchell Horne	bff02948ed	sched_4bsd: use the same switch flags as ULE ULE uses the more specific SWT_REMOTEPREEMPT and SWT_REMOTEWAKEIDLE switch types, let's do that here as well. SWT_PREEMPT is somewhat redundant when we also have the SW_PREEMPT flag. This only has an effect for kernels built with SCHED_STATS. Reviewed by: kib, markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38183	2023-02-09 12:01:32 -04:00
Mitchell Horne	dc9b13736f	Use maybe_yield() in a few more places Reviewed by: kib, markj MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38186	2023-02-09 11:58:06 -04:00
Mitchell Horne	d570418bd8	Boolify should_yield() Do this ahead of adding a man page that describes the function. No functional change. Reviewed by: kib, markj MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38181	2023-02-09 11:58:06 -04:00
Mitchell Horne	a7a452fedc	Update comments referencing create_thread() The equivalent function is now named thread_create(). Mention kthread_add() where it is also relevant. Reviewed by: kib, markj MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38180	2023-02-09 11:58:06 -04:00
Mitchell Horne	e6cf1a0826	physmem: add ram0 pseudo-driver Its purpose is to reserve all I/O space belonging to physical memory from nexus, preventing it from being handed out by bus_alloc_resource() to callers such as xenpv_alloc_physmem(), which looks for the first available free range it can get. This mimics the existing pseudo-driver on x86. If needed, the device can be disabled with hint.ram.0.disabled="1" in /boot/device.hints. Reviewed by: imp MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32343	2023-02-08 16:50:46 -04:00
Mateusz Guzik	08d357287b	sysv: ansify Reported by: clang 15 Sponsored by: Rubicon Communications, LLC ("Netgate")	2023-02-08 00:11:10 +00:00
Mateusz Guzik	8377575772	vfs: ansify Reported by: clang 15 Sponsored by: Rubicon Communications, LLC ("Netgate")	2023-02-07 23:03:20 +00:00
Mark Johnston	27202b98dc	jail: Use atomic(9) instead of CK atomics There's no reason to use one over the other here, let's prefer the interface that's used elsewhere in the kernel. No functional change intended. Reviewed by: mjg Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38360	2023-02-07 15:10:24 -05:00
Val Packett	4a1c4de232	Allow sysctl hw.machine/hw.machine_arch in capability mode There's no harm in reading strings like 'amd64'. Reviewed by: emaste, manu Sponsored by: https://www.patreon.com/valpackett Differential Revision: https://reviews.freebsd.org/D28703	2023-02-06 14:00:52 -05:00
Justin Hibbits	6472761966	IfAPI: use IfAPI in mbuf Sponsored by: Juniper Networks, Inc.	2023-02-06 12:32:04 -05:00
Justin Hibbits	1e6131bad6	IfAPI: Add needed APIs for mbuf support Summary: Add 2 new APIs for supporting recent mbuf changes: * `36e0a362ac` added the m_snd_tag_alloc() wrapper around if_snd_tag_alloc(). Push this down to the ifnet level. * `4d7a1361ef` adds the m_rcvif_serialize()/m_rcvif_restore() KPIs to serialize and restore an ifnet pointer. Add the necessary wrapper to get the index generation for this. Reviewed By: jhb Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38340	2023-02-06 12:32:04 -05:00
Rick Macklem	db5655124c	vfs_mount.c: Free exports structures in vfs_destroy_mount() During testing of exporting file systems in jails, I noticed that the export structures on a mount were not being freed when the mount is dismounted. This bug appears to have been in the system for a very long time. It would have resulted in a slow memory leak when exported file systems were dismounted. Prior to r362158, freeing the structures during dismount would not have been safe, since VFS_CHECKEXP() returned a pointer into an export structure, which might still have been used by the NFS server for an in-progress RPC when the file system is dismounted. r362158 fixed this, so it should now be safe to free the structures in vfs_mount_destroy(), which is what this patch does. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D38385	2023-02-04 14:45:23 -08:00
Rick Macklem	d94e0bdc14	Revert "vfs_export: Add checks for correct prison when updating exports" This reverts commit `7926a01ed7`. A new patch in D38371 is being considered for doing this.	2023-02-04 14:38:32 -08:00
Konstantin Belousov	3b6056204d	FIOSEEKHOLE/FIOSEEKDATA: correct consistency for bmap-based implementation Writes on UFS through a mapped region do not allocate disk blocks in holes immediately. The blocks are allocated when the pages are paged out for the first time. This breaks the algorithm in vn_bmap_seekhole() and ufs_bmap_seekdata(), because VOP_BMAP() reports a hole for a place which already contains valid data. Clean the pages before doing VOP_BMAP() in the affected functions. In principle, we could clean less by only requesting clean starting from the offset, but it is probably not very important. PR: 269261 Reported by: asomers Reviewed by: asomers, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D38379	2023-02-04 20:32:07 +02:00
Pawel Jakub Dawidek	c54d240eb1	kern_prot.c p_candebug(): Remove single-use variable. Reviewed by: allanjude, oshogbo Approved by: allanjude, oshogbo Differential Revision: https://reviews.freebsd.org/D38288	2023-02-02 17:00:24 -08:00
Brooks Davis	5c274b3622	whitespace: rewrap to match case directly above It's easier to visually diff the two case blocks if there aren't gratuitous whitespace differences. Sponsored by: DARPA	2023-02-03 00:37:31 +00:00
Rick Macklem	7926a01ed7	vfs_export: Add checks for correct prison when updating exports mountd(8) basically does the following: getmntinfo() for each mount delete_exports using nmount(2) to do the creation/deletion of individual exports. For prison0 (and for other prisons if enforce_statfs == 0) getmntinfo() returns all mount points, including ones being used within other prisons. This can cause confusion if the same file system is specified in the exports(5) file for multiple prisons. This patch adds a permanent identifier to each prison and marks which prison did the exports in a field of the mount structure called mnt_exjail. This field can then be compared to the permanent identifier for the prison that the thread's credentials are in. Also required was a new function called prison_isalive_permid() which returns whether the prison is alive, so that the check can be ignored for prisons that have been removed. This prepares the system to allow mountd(8) to run in multiple prisons, including prison0. Future commits will complete the modifications to allow mountd(8) to run in vnet prisons. Until then, these changes should not affect semantics. Reviewed by: markj MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D38144	2023-02-02 16:20:58 -08:00
Dag-Erling Smørgrav	69d94f4c76	Add tarfs, a filesystem backed by tarballs. Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Reviewed by: pauamma, imp Differential Revision: https://reviews.freebsd.org/D37753	2023-02-02 18:19:29 +01:00
Rick Macklem	99187c3a44	prison_check_nfsd: Add check for enforce_statfs != 0 Since mountd(8) will not be able to do exports when running in a vnet prison if enforce_statfs is set to 0, add a check for this to prison_check_nfsd(). Reviewed by: jamie, markj MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D38189	2023-02-01 16:02:20 -08:00
Konstantin Belousov	2555f175b3	Move kstack_contains() and GET_STACK_USAGE() to MD machine/stack.h Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D38320	2023-02-02 00:59:26 +02:00
Gleb Smirnoff	a0102dee34	sockets: in sousrsend() pass down the error to aio(4) This somewhat undermines the initial goal of sousrsend() to have all the special error handling for a write on a socket in a single place. The aio(4) needs to see EWOULDBLOCK to re-schedule the job. Because aio(4) handles return from soreceive() and sousrsend() with the same code, we can't check for (error == 0 && done < job_nbytes). Keeping this exclusion for aio(4) seems a lesser evil. Fixes: `7a2c93b86e`	2023-02-01 13:03:10 -08:00
Gleb Smirnoff	fd53298799	unix: add myself to the copyright notice for the new implementation of PF_UNIX/SOCK_DGRAM	2023-02-01 09:39:28 -08:00
Justin Hibbits	9507d03bfe	IfAPI: Use the ifnet APIs in kern_poll() The only API used is if_name(). Sponsored by: Juniper Networks, Inc.	2023-01-31 15:02:16 -05:00
Sebastian Huber	c7c53e3ca6	Clarify hardpps() parameter name and comment Since `32c203577a` by phk in 1999 (Make even more of the PPSAPI implementations generic), the "nsec" parameter of hardpps() is a time difference and no longer a time point. Change the name to "delta_nsec" and adjust the comment. Remove comment about a clock tick adjustment which is no longer in the code. Pull Request: https://github.com/freebsd/freebsd-src/pull/640 Reviewed by: imp	2023-01-30 11:07:40 -07:00
Jose Luis Duran	df949e762c	kern_environment: Partially apply style(9) Sort include files, remove duplicates and remove trailing whitespace. Pull Request: https://github.com/freebsd/freebsd-src/pull/589 Reviewed by: imp	2023-01-30 10:47:56 -07:00
Dmitry Chagin	2058f075b4	cpuset: Handle CPU_WHICH_TIDPID wherever cpuset_which() is called. cpuset_which() resolves the argument pair which and id and returns references to the appropriate resources. To avoid leaking resources or accessing unresolved references, handle the new which CPU_WHICH_TIDPID wherever cpuset_which() is called. To avoid code duplication cpuset_which2() has been added. Reported by: syzbot+331e8402e0f7347f0f2a@syzkaller.appspotmail.com Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D38272 MFC after: 2 weeks	2023-01-30 19:28:54 +03:00
Dmitry Chagin	e4754c8036	subr_smp: Trim trailing whitespaces. MFC after: 1 week	2023-01-29 16:18:17 +03:00
Dmitry Chagin	c21b080f3d	cpuset: Fix sched_[g|s]etaffinity() for better compatibility with Linux. Under Linux, the sched_[g|s]etaffinity() functions accept the value returned from a call to gettid(2) (thread id) in the pid argument. Specifying pid as 0 will set the attribute for the calling thread, and passing the value returned from a call to getpid(2) (process id) will set the attribute for the main thread of the thread group. The native cpuset(2) family of system calls has a "which" argument to determine how the value of the id argument is interpreted, i.e., CPU_WHICH_TID is used to pass a thread id and CPU_WHICH_PID to pass a process id. The current native sched_[g|s]etaffinity() implementation is wrong, as it uses "which" CPU_WHICH_PID to pass both (process and thread id) to the kernel. To fix this, add a new "which" CPU_WHICH_TIDPID intended to handle both ids. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D38209 MFC after: 1 week	2023-01-29 16:17:33 +03:00
Dmitry Chagin	01f74ccd5a	libthr: Fix pthread_attr_[g|s]etaffinity_np to match its manual and the kernel. Since `f35093f8` the semantics of the thread affinity functions have changed to be compatible with Linux: In case of getaffinity(), the minimum cpuset_t size that the kernel permits is the maximum CPU id present in the system / NBBY bytes; the maximum size is not limited. In case of setaffinity(), the kernel does not limit the size of the user-provided cpuset_t, internally using only the meaningful part of the set, where the upper bound is the maximum CPU id present in the system, no larger than the size of the kernel cpuset_t. To match the pthread_attr_[g|s]etaffinity_np checks of the user-provided cpusets to the kernel behavior, export the minimum cpuset_t size allowed by the running kernel via the new sysctl kern.sched.cpusetsizemin and use it in the checks. Reviewed by: Differential Revision: https://reviews.freebsd.org/D38112 MFC after: 1 week	2023-01-29 15:35:18 +03:00
Allan Jude	5ff13fbc19	MFV: zstd 1.5.2 Merge commit 'b3392d84da5bf2162baf937c77e0557f3fd8a52b' into zstd_1.5.2 full changelog: https://github.com/facebook/zstd/compare/v1.4.8...v1.5.2 Updated sys/kern/subr_compressor.c to new API MFC after: 3 days Relnotes: yes Sponsored by: Klara, Inc.	2023-01-27 17:22:31 +00:00
Gleb Smirnoff	f394d9c0a4	sysctl: use correct types and names in sysctl_*sec_to_sbintime The functions are intended to report kernel variables that are stored as sbintime_t (pointed to by arg1) as human readable nanoseconds or milliseconds (reported via sysctl_handle_64). The variable types and names were reversed. I guess there is no functional change here, as all types flipped around were signed 64. Note that these functions aren't used yet anywhere in the kernel. Reviewed by: mav Differential revision: https://reviews.freebsd.org/D38217	2023-01-27 07:09:22 -08:00
Mitchell Horne	627ca221c3	kern_reboot: unconditionally call shutdown_reset() Currently shutdown_reset() is registered as the final entry of the shutdown_final event handler. However, if a panic occurs early in boot before the event is registered (SI_SUB_INTRINSIC), we may end up spinning in the subsequent infinite for loop and failing to reset altogether. Instead we can simply call this function unconditionally. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D37981	2023-01-23 15:10:24 -04:00
Jiajie Chen	dec7db4960	Add kf_file_nlink field to kf_file and populate it This will allow user-space programs (e.g. lsof) to locate deleted files whose nlink equals zero. Prior to this commit, programs had to use stat(kf_path) to get nlink, but that will fail if the file is deleted. [mjg: s/fail/file in the commit message] Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D38169	2023-01-23 17:09:52 +00:00
Konstantin Belousov	456f05756b	Handle int rank issues in vn_getsize_locked() and vn_seek() In vn_getsize_locked(), when storing vattr.va_size of type u_quad_t into off_t size, we must avoid overflow. Then, the check for fsize < 0, introduced in the commit `f45feecfb2` 'vfs: add vn_getsize', is a nop [1]. Reported and reviewed by: jhb Coverity CID: 1502346 Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D38133	2023-01-20 23:56:29 +02:00
Konstantin Belousov	5657f49ef3	kern_umtx.c do_wait(): correct confusing indent Sponsored by: The FreeBSD Foundation MFC after: 3 days	2023-01-20 23:33:11 +02:00
Brooks Davis	fa1d803c0f	epoch: replace hand coded assertion The assertion is equivalent to kstack_contains() so use that rather than spelling it out. Suggested by: jhb Reviewed by: jhb MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D38107	2023-01-20 18:04:40 +00:00
John Baldwin	846e4a206f	ktls_disable_ifnet_help: Set curvnet around sorele(). This is required in kernels with VIMAGE such as GENERIC. MFC after: 1 week Sponsored by: Chelsio Communications	2023-01-18 15:39:04 -08:00
Konstantin Belousov	0f80d5ebc8	Require INVARIANTS and WITNESS if DEBUG_VFS_LOCKS is set Reported by: pho Reviewed by: markj, mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D38070	2023-01-16 05:55:47 +02:00
Zhenlei Huang	8bce8d28ab	jail: Avoid multipurpose return value of function prison_ip_restrict() Currently function prison_ip_restrict() returns true if the replacement buffer was used, or if no buffer was provided, the allocation failed, and the call should be redone. The logic is confusing and can cause a possibly infinite loop since `eb8dcdeac2`. Reviewed by: jamie, glebius Approved by: kp (mentor) Differential Revision: https://reviews.freebsd.org/D37918	2023-01-13 18:45:14 +08:00
Zhenlei Huang	89ddfbbac8	jail: Fix regression panic from `eb8dcdeac2` And possibly infinite loop calling prison_ip_restrict() in kern_jail_set() [2]. [1] It is possible that prisons do not have any IPv4 or IPv6 addresses. [2] If prison_ip_restrict() is not provided with prison_ip, when it allocates prison_ip successfully, then it should return false to indicate not redo prison_ip_restrict() later. Reviewed by: glebius Approved by: kp (mentor) Fixes: `eb8dcdeac2` jail: network epoch protection for IP address lists Differential Revision: https://reviews.freebsd.org/D37906	2023-01-13 18:45:14 +08:00
Zhenlei Huang	ddbf879d79	jail: Correctly access IPv[46] addresses of prison_ip * Fix wrong IPv[46] addresses inherited from parent jail * Properly restrict the child jail's IPv[46] addresses Reviewed by: melifaro, glebius Approved by: kp (mentor) Fixes: `eb8dcdeac2` jail: network epoch protection for IP address lists Differential Revision: https://reviews.freebsd.org/D37871 Differential Revision: https://reviews.freebsd.org/D37872	2023-01-13 18:45:14 +08:00
Konstantin Belousov	37b9fb1696	Add descrip_check_write_mp() helper ... which verifies that given file table does not have file descriptors referencing vnodes on the specified mount point. It is up to the caller to ensure that the check is not racy. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37896	2022-12-29 22:55:39 +02:00
Mateusz Guzik	f45feecfb2	vfs: add vn_getsize getattr is very expensive and in important cases only gets called to get the size. This can be optimized with a dedicated routine which obtains that statistic. As a step towards that goal make size-only consumers use a dedicated routine. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D37885	2022-12-28 22:43:49 +00:00
John Baldwin	07be751727	ktls: Post receive errors on partially closed sockets. If an error such as an invalid record or one whose decryption fails is detected on a socket that has received a RST then ktls_drop() could ignore the error since INP_DROPPED could already be set. In this case soreceive_generic hangs since it does not return from a KTLS socket with pending encrypted data unless there is an error (so_error) (this behavior is to ensure that soreceive_generic doesn't return a premature EOF when there is pending data still being decrypted). Note that this was a bug prior to `69542f2682` as tcp_usr_abort would also have ignored the error in this case. Reviewed by: gallatin Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D37775	2022-12-27 16:00:17 -08:00
Mateusz Guzik	829f0bcb5f	vfs: add the concept of vnode state transitions To quote from a comment above vput_final: <quote> * XXX Some filesystems pass in an exclusively locked vnode and strongly depend * on the lock being held all the way until VOP_INACTIVE. This in particular * happens with UFS which adds half-constructed vnodes to the hash, where they * can be found by other code. </quote> As is there is no mechanism which allows filesystems to denote that a vnode is fully initialized, consequently problems like the above are only found the hard way(tm). Add rudimentary support for state transitions, which in particular allow to assert the vnode is not legally unlocked until its fate is decided (either construction finishes or vgone is called to abort it). The new field lands in a 1-byte hole, thus it does not grow the struct. Bump __FreeBSD_version to 1400077 Reviewed by: kib (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D37759	2022-12-26 17:35:12 +00:00
Mateusz Guzik	94267fc907	vfs: use designated initializers for the typename array While here prefix with v for better consistency with the vnode stuff. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D37759	2022-12-26 17:34:41 +00:00
Konstantin Belousov	974be51b3f	Fixes for ptrace_syscallreq() Re-assign the sc local (syscall number) before moving args for SYS_syscall. Correct the audit and kdtrace hooks invocations. Fixes: `140ceb5d95` Sponsored by: The FreeBSD Foundation MFC after: 1 week	2022-12-23 01:53:41 +02:00
Konstantin Belousov	140ceb5d95	ptrace(2): add PT_SC_REMOTE remote syscall request Reviewed by: markj Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37590	2022-12-22 23:11:35 +02:00
Konstantin Belousov	f0592b3c8d	Add a thread debugging flag TDB_BOUNDARY It indicates to a debugger that the thread is stopped at the kernel->user exit path. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37590	2022-12-22 23:11:35 +02:00
Konstantin Belousov	e6feeae2f9	sys: rename td_coredump to td_remotereq and TDB_COREDUMPRQ to TDB_COREDUMPREQ Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37590	2022-12-22 23:11:35 +02:00
Zhenlei Huang	21ad3e27fa	jail: Fix output of IPv[46] addresses of DDB `show prison` Reviewed by: melifaro, jamie Approved by: kp (mentor) Fixes: `eb8dcdeac2` jail: network epoch protection for IP address lists Differential Revision: https://reviews.freebsd.org/D37732	2022-12-21 09:53:28 +08:00
Alfredo Dal'Ava Junior	b13110e9f3	ufs/ffs: detect endian mismatch between machine and filesystem Mounting a filesystem formatted for BE on a LE machine is currently not supported. This adds a check for the superblock magic number using swapped bytes to guess and warn the user that it may be a valid superblock but the endianness is incompatible. MFC after: 2 weeks Reviewed by: mckusick Obtained from: mckusick, alfredo Differential Revision: https://reviews.freebsd.org/D37675	2022-12-20 00:20:11 -03:00
Doug Rabson	71e9be1bd5	Don't allow stacking of file mounts Reviewed by: mjg, kib Tested by: pho	2022-12-19 16:46:27 +00:00
Doug Rabson	a1d74b2dab	Allow realpath to work for file mounts For file mounts, the directory vnode is not available from namei and this prevents the use of vn_fullpath_hardlink. In this case, we can use the vnode which was covered by the file mount with vn_fullpath. This also disallows file mounts over files with link counts greater than one to ensure a deterministic path to the mount point. Reviewed by: mjg, kib Tested by: pho	2022-12-19 16:46:27 +00:00
Doug Rabson	521fbb722c	Add support for mounting single files in nullfs The main use-case for this is to support mounting config files and secrets into OCI containers. My current workaround copies the files into the container which is messy and risks secrets leaking into container images if the cleanup fails. This adds a VFCF flag to indicate whether the filesystem supports file mounts and allows fspath to be either a directory or a file if the flag is set. Test Plan: $ sudo mkdir -p /mnt $ sudo touch /mnt/foo $ sudo mount -t nullfs /COPYRIGHT /mnt/foo Reviewed by: mjg, kib Tested by: pho	2022-12-19 16:46:13 +00:00
Doug Rabson	78d35459a2	Add vn_path_to_global_path_hardlink This is similar to vn_path_to_global_path but allows for regular files which may not be present in the cache. Reviewed by: mjg, kib Tested by: pho	2022-12-19 16:44:59 +00:00
Mateusz Guzik	8f7859e800	vfs: retire the now unused SAVESTART flag Bump __FreeBSD_version to 1400075 Tested by: pho	2022-12-19 08:11:08 +00:00
Mateusz Guzik	56da4aa554	vfs: stop using SAVESTART for rename ni_startdir has never reached rename routines anyway Reviewed by: mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D34468	2022-12-19 08:09:37 +00:00
Mateusz Guzik	8f874e92eb	vfs: make relookup take an additional argument instead of looking at SAVESTART This is a step towards removing the flag. Reviewed by: mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D34468	2022-12-19 08:09:00 +00:00
Mateusz Guzik	269c564b90	vfs: retire NDFREE There are no consumers anymore. Interested parties can NDFREE_PNBUF and vput or vrele relevant vnodes. Tested by: pho	2022-12-19 08:07:54 +00:00
Mateusz Guzik	85dac03e30	vfs: stop using NDFREE It provides nothing but a branchfest and next to no consumers want it anyway. Tested by: pho	2022-12-19 08:07:23 +00:00
Rick Macklem	bba7a2e896	kern_jail.c: Allow mountd/nfsd to optionally run in a jail This patch adds "allow.nfsd" to the jail code based on a new kernel build option VNET_NFSD. This will not work until future patches fix nmount(2) to allow mountd to run in a vnet prison and the NFS server code is patched so that global variables are in a vnet. The jail(8) man page will be patched in a future commit. Reviewed by: jamie MFC after: 4 months Differential Revision: https://reviews.freebsd.org/D37637	2022-12-17 13:43:49 -08:00
Rick Macklem	195f1b124d	vfs_mount.c: fix vfs_domount() for PRIV_VFS_MOUNT_EXPORTED It appears that, prior to r158857, vfs_domount() checked suser() when MNT_EXPORTED was specified. r158857 appears to have broken this, since MNT_EXPORTED was no longer set when mountd.c was converted to use nmount(2). r164033 replaced the suser() check with priv_check(td, PRIV_VFS_MOUNT_EXPORTED), which does the same thing (i.e. checks for effective uid == 0 assuming suser_enabled is set). This patch restores this check by setting MNT_EXPORTED when the "export" mount option is specified to nmount(). I think this is reasonable since only mountd(8) should be setting exports and I doubt any non-root mounted file system would be setting its own exports. Reviewed by: kib, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D37718	2022-12-16 13:01:23 -08:00
John Baldwin	69542f2682	ktls: Close a race with setting so_error when dropping a connection. pr_abort calls tcp_usr_abort which calls tcp_drop with ECONNABORTED. After pr_abort returns, the so_error is then set to a more specific error. However, a reader can observe and return the ECONNABORTED error before so_error is set to the desired error value. This is resulting in spurious test failures of recently added tests for invalid conditions such as invalid headers. To fix, refactor the code to abort a connection to call tcp_drop directly with the desired error value. ktls_reset_send_tag already calls tcp_drop directly when it aborts a connection due to an error. Reviewed by: gallatin Reported by: CI (jenkins), gallatin, olivier Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D37692	2022-12-15 12:06:26 -08:00
Andrew Gallatin	ac4e3a27ab	Unbreak the build when MAC is not defined `7a2c93b86e` removed the use of "error" when MAC was not defined, resulting in an unused variable error. Sponsored by: Netflix Reviewed by: jhb	2022-12-14 17:39:25 -05:00
Gleb Smirnoff	7a2c93b86e	sockets: provide sousrsend() that does socket specific error handling Sockets have special handling for EPIPE on a write, that was spread out into several places. Treating transient errors is also special - if the protocol is atomic, then we should ignore any changes to uio_resid, a transient error means the write had completely failed (see `d2b3a0ed31`). - Provide sousrsend() that expects a valid uio, and leave sosend() for kernel consumers only. Do all special error handling right here. - In dofilewrite() don't do special handling of error for DTYPE_SOCKET. - For send(2), write(2) and aio_write(2) call into sousrsend() and remove error handling for kern_sendit(), soo_write() and soaio_process_job(). PR: 265087 Reported by: rz-rpi03 at h-ka.de Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35863	2022-12-14 10:02:44 -08:00
Jason A. Harmening	42442d7a6e	Generalize the VV_CROSSLOCK logic in vfs_lookup() When VV_CROSSLOCK is present, the lock for the vnode at the current stage of lookup must be held across the VFS_ROOT() call for the filesystem mounted at the vnode. Since VV_CROSSLOCK implies that the root vnode reuses the already-held lock, the possibility for recursion should be made clear in the flags passed to VFS_ROOT(). For cases in which the lock is held exclusive, this means passing LK_CANRECURSE. For cases in which the lock is held shared, it means clearing LK_NODDLKTREAT to allow VFS_ROOT() to potentially recurse on the shared lock even in the presence of an exclusive waiter. That the existing code works for unionfs is due to a coincidence of the current unionfs implementation. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D37458	2022-12-10 22:02:38 -06:00
Mateusz Guzik	ebdf27b6f3	uipc: remove accept_mtx It is unused since `779f106aa1` ("Listening sockets improvements.") Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-12-11 02:47:07 +00:00
Konstantin Belousov	0919f29d91	shmfd: account for the actually allocated pages Return the value as stat(2) st_blocks. Suggested and reviewed by: markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37097	2022-12-09 14:17:12 +02:00
Konstantin Belousov	37aea2649f	tmpfs: for used pages, account really allocated pages, instead of file sizes This makes tmpfs size accounting correct for sparse files. Also correctly report st_blocks/va_bytes. Previously the reported value did not account for the swapped out pages. PR: 223015 Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37097	2022-12-09 14:17:12 +02:00
Konstantin Belousov	7ec4b29b08	uiomove_object: hide diagnostic under bootverbose Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37097	2022-12-09 14:15:37 +02:00
Doug Rabson	5eeb4f737f	imgact_binmisc: Optionally pre-open the interpreter vnode This allows the use of chroot and/or jail environments which depend on interpreters registered with imgact_binmisc to use emulator binaries from the host to emulate programs inside the chroot. Reviewed by: imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D37432	2022-12-08 14:32:03 +00:00
Warner Losh	7014e78fb7	boot: Remove stray free() Early versions of this code had a free, but this one doesn't need it. Remove the forgotten free(vv); from earlier versions. Fixes: `ed56dcfc6b` Noticed by: Michael Butler Sponsored by: Netflix	2022-12-07 11:30:04 -07:00
Warner Losh	ed56dcfc6b	boot: pass in args as const Copy the arg that sets a variable to maximize the reuse of this routine. There are places we call it from that are const char * and it might not be safe to cast that away. Sponsored by: Netflix	2022-12-07 11:00:54 -07:00
Warner Losh	3cf97e91fa	Revert "newbus: Change attach failure behavior" This reverts commit `68c3f03021`. There are some weird crashes when KVMs switch caused by this, so revert this commit until they are sorted out. Reported by: cy@ Sponsored by: Netflix	2022-12-05 17:00:26 -07:00
Warner Losh	68c3f03021	newbus: Change attach failure behavior In the rare case that we succeed in probing, but fail to attach, flip the default to be to disable the device. hw.bus.disable_failed_devices=false is now required to restore the old behavior. The old behavior dates from a time when dynamic control of devices wasn't yet present (devctl didn't exist). Now that one can retry probe/attach the device with devctl, the default doesn't make sense: The more desirable behavior is to have stable device numbers when one has several instances of the same device in a system (common for NICs or HBAs). Reviewed by: jhb (verbal) Sponsored by: Netflix	2022-12-04 16:29:03 -07:00
Warner Losh	aa52c6bdd7	newbus: Create a knob to disable devices that fail to attach. Normally, when a device fails to attach, we tear down the newbus state for that device so that future driver loads can try again (maybe with a different driver, or maybe with a re-loaded and fixed kld). Sometimes, however, it is desirable to have the device fail permanently. We do this by calling device_disable() on a failed attach, as well as keeping the device in DS_ATTACHING forever. This prevents retries on that device. This is enabled via hw.bus.disable_failed_devices=1 in either a hint via the loader, or at runtime with a sysctl setting. Setting from 1 -> 0 at runtime will not affect previously disabled devices, however: they remain disabled. They can be re-enabled manually with devctl enable, however. Sponsored by: Netflix Reviewed by: gallatin, hselasky, jhb Differential Revision: https://reviews.freebsd.org/D37517	2022-12-04 16:20:24 -07:00
Warner Losh	7652743540	devd: Warn for deprecated 'kern' system type One year ago, I deprecated 'kern' in favor of 'kernel' for the system name for some power events. I'm about to remove it from the kernel, but realized there's been no warning generated for users. Preserve POLA by converting on the fly here and issuing a warning for 14.x, and a fatal error after we branch 15. Make compiling it an error on 16 to remove the gross hack after we branch. Sponsored by: Netflix Reviewed by: bapt Differential Revision: https://reviews.freebsd.org/D37584	2022-12-02 10:48:02 -07:00
Warner Losh	8d147537bf	newbus: Remove deprecated "kern" system name for resume events. The new "kernel" system name is the one that's documented and has been generated for a year now. Remove the old one now that 14.0 is getting close. Sponsored by: Netflix Reviewed by: bapt Differential Revision: https://reviews.freebsd.org/D37582	2022-12-02 10:48:02 -07:00
Warner Losh	e59fa9b2e7	newbus: Comment style nit Sponsored by: Netflix	2022-11-29 13:11:24 -07:00
Warner Losh	1ea42a2880	md5: Use c89 function definitions Use the c89 function definitions rather than the old K&R definitions. Sponsored by: Netflix	2022-11-27 13:22:31 -07:00
Eric van Gyzen	a134a12b14	Mark the debug.vnlru_nowhere sysctl as CTLFLAG_STATS The kernel doesn't read it. It's only writable so it can be cleared. Sponsored by: Dell EMC Isilon	2022-11-17 10:44:58 -06:00
Rick Macklem	4ee16246f9	vfs_vnops.c: Fix blksize for ZFS Since ZFS reports _PC_MIN_HOLE_SIZE as 512 (although it appears that an unwritten region must be at least f_iosize to remain unallocated), vn_generic_copy_file_range() uses 4096 for the copy blksize for ZFS, resulting in slow copies. For most other file systems, _PC_MIN_HOLE_SIZE and f_iosize are the same value, so this patch modifies the code to use f_iosize for most cases. It also documents in comments why the blksize is being set a certain way, so that the code does not appear to be doing "magic math". Reported by: allanjude Reviewed by: allanjude, asomers MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D37076	2022-11-16 17:37:22 -08:00
John Baldwin	9a673b7158	ktls: Add software support for AES-CBC decryption for TLS 1.1+. This is mainly intended to provide a fallback for TOE TLS which may need to use software decryption for an initial record at the start of a connection. Reviewed by: markj Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D37370	2022-11-15 12:02:03 -08:00
Mateusz Guzik	c3f1a13902	Retire broken GPROF support from the kernel The option is not even recognized and with that patched it does not compile. Even if it did work, it would be prohibitively expensive to use. Interested parties can use pmcstat or dtrace instead.	2022-11-15 14:17:10 +00:00
John Baldwin	5920f99d21	ktls: Inline ktls_cleanup() into ktls_destroy(). Reviewed by: gallatin, markj Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D37353	2022-11-11 16:01:02 -08:00
John Baldwin	d01db2b837	ktls: Don't leak ktls session objects for certain errors. ktls_cleanup() does not free ktls session objects, it merely cleans (and frees) members of the object. Change callers to use ktls_free() instead. Reviewed by: gallatin, markj Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D37352	2022-11-11 16:00:37 -08:00
Mateusz Guzik	83286682f8	vfs: whack mips remnant This reverts commit `8ffa01a061`.	2022-11-09 00:31:50 +00:00
Gleb Smirnoff	8840ae2288	tcp: don't store VNET in every tcpcb, take it from the inpcbinfo Reviewed by: rscheff Differential revision: https://reviews.freebsd.org/D37125	2022-11-08 10:24:40 -08:00
Gleb Smirnoff	9eb0e8326d	tcp: provide macros to access inpcb and socket from a tcpcb There should be no functional changes with this commit. Reviewed by: rscheff Differential revision: https://reviews.freebsd.org/D37123	2022-11-08 10:24:40 -08:00
Mark Johnston	2c10be9e06	arm64: Handle translation faults for thread structures The break-before-make requirement poses a problem when promoting or demoting mappings containing thread structures: a CPU may raise a translation fault while accessing curthread, and data_abort() accesses the thread again before pmap_fault() can translate the address and return. Normally this isn't a problem because we have a hack to ensure that slabs used by the thread zone are always accessed via the direct map, where promotions and demotions are rare. However, this hack doesn't work properly with UMA_MD_SMALL_ALLOC disabled, as is the case with KASAN configured (since our KASAN implementation does not shadow the direct map and so tries to force the use of the kernel map wherever possible). Fix the problem by modifying data_abort() to handle translation faults in the kernel map without dereferencing "td", i.e., curthread, and without enabling interrupts. pmap_klookup() has special handling for translation faults which makes it safe to call in this context. Then, revert the aforementioned hack. Reviewed by: kevans, alc, kib, andrew MFC after: 1 month Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D37231	2022-11-02 13:46:25 -04:00
Andrew Gallatin	8b19898a78	Fix a panic on boot introduced by `555a861d68` First, an sbuf_new() in device_get_path() shadows the sb passed in by dev_wired_cache_add(), leaving its sb in an unfinished state, leading to a failed KASSERT(). Fixing this is as simple as removing the sbuf_new() from device_get_path() Second, we cannot simply take a pointer to the sbuf memory and store it in the device location cache, because that sbuf is freed immediately after we add data to the cache, leading to a use-after-free and eventually a double-free. Fixing this requires allocating memory for the path. After a discussion with jhb, we decided that one malloc was better than two in dev_wired_cache_add, which is why it changed so much. Reviewed by: jhb Sponsored by: Netflix MFC after: 14 days	2022-11-01 13:44:39 -04:00
Mark Johnston	1f6b6cf177	atomic: Intercept atomic_(load\|store)_bool for kernel sanitizers Fixes: `2bed73739a` ("atomic: Add plain atomic_load/store_bool()")	2022-10-29 11:10:58 -04:00
Konstantin Belousov	6b69465efb	vfs_domount(): ensure that v_mountedhere and VIRF_MOUNTPOINT are set under the vnode lock Fixes: `f7833196bd` Reported and tested by: pho Reviewed by: jah, markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37198	2022-10-29 14:29:55 +03:00
John Baldwin	744bfb2131	Import the WireGuard driver from zx2c4.com. This commit brings back the driver from FreeBSD commit `f187d6dfbf` plus subsequent fixes from upstream. Relative to upstream this commit includes a few other small fixes such as additional INET and INET6 #ifdef's, #include cleanups, and updates for recent API changes in main. Reviewed by: pauamma, gbe, kevans, emaste Obtained from: git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D36909	2022-10-28 13:36:12 -07:00
Jason A. Harmening	f7833196bd	vfs_lookup(): Minor performance optimizations Refactor the symlink and mountpoint traversal logic to avoid repeatedly checking the vnode type; a symlink cannot be a mountpoint and vice versa. Avoid repeatedly checking cn_flags for NOCROSSMOUNT and simplify the check which determines whether the vnode is a mountpoint. Suggested by: mjg Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D35054	2022-10-26 19:33:33 -05:00
Jason A. Harmening	4390622c8d	vfs_busy(): fix wording in comment Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35054	2022-10-26 19:33:30 -05:00
Jason A. Harmening	706f15c5fa	Remove witness directives from crossmp locking VOPs These are of limited use since the crossmp vnode locking ops have not actually used a lock since commit `a2d3554542`. We in fact require that these operations are always issued with LK_SHARED. Additionally, these directives can produce a false positive in certain VV_CROSSLOCK cases which require upgrading of the covered vnode lock from shared to exclusive. While here, replace the runtime check of LK_SHARED with a KASSERT and expand the check to include LK_NOWAIT, which all callers pass. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D35054	2022-10-26 19:33:18 -05:00
Jason A. Harmening	080ef8a418	Add VV_CROSSLOCK vnode flag to avoid cross-mount lookup LOR When a lookup operation crosses into a new mountpoint, the mountpoint must first be busied before the root vnode can be locked. When a filesystem is unmounted, the vnode covered by the mountpoint must first be locked, and then the busy count for the mountpoint drained. Ordinarily, these two operations work fine if executed concurrently, but with a stacked filesystem the root vnode may in fact use the same lock as the covered vnode. By design, this will always be the case for unionfs (with either the upper or lower root vnode depending on mount options), and can also be the case for nullfs if the target and mount point are the same (which admittedly is very unlikely in practice). In this case, we have LOR. The lookup path holds the mountpoint busy while waiting on what is effectively the covered vnode lock, while a concurrent unmount holds the covered vnode lock and waits for the mountpoint's busy count to drain. Attempt to resolve this LOR by allowing the stacked filesystem to specify a new flag, VV_CROSSLOCK, on a covered vnode as necessary. Upon observing this flag, the vfs_lookup() will leave the covered vnode lock held while crossing into the mountpoint. Employ this flag for unionfs with the caveat that it can't be used for '-o below' mounts until other unionfs locking issues are resolved. Reported by: pho Tested by: pho Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35054	2022-10-26 19:33:03 -05:00
Mateusz Guzik	d346e3ac33	vfs: use cache_assert_no_entries instead of open-coding it	2022-10-26 15:54:19 +00:00
Warner Losh	deb1e3b719	physmem: Add physmem_excluded to query if a region is excluded In order to safely reuse excluded memory when it's reserved for special purpose, we need to test whether or not the memory has been reserved early in boot. physmem_excluded will return true when the entire range is excluded, false otherwise. Sponsored by: Netflix	2022-10-25 09:32:49 -06:00
Mateusz Guzik	d653aaec7a	cache: add cache_assert_no_entries	2022-10-24 15:37:43 +00:00
Hans Petter Selasky	fdd9548333	time(3): Fix spelling. Noted by: Gary Jennejohn <garyj@gmx.de> MFC after: 1 week Sponsored by: NVIDIA Networking	2022-10-23 18:42:11 +02:00
Hans Petter Selasky	35a33d14b5	time(3): Optimize tvtohz() function. List of changes: - Use integer multiplication instead of long multiplication, because the result is an integer. - Remove multiple if-statements and predict new if-statements. - Rename local variable name, "ticks" into "retval" to avoid shadowing the system "ticks" global variable. Reviewed by: kib@ and imp@ MFC after: 1 week Sponsored by: NVIDIA Networking Differential Revision: https://reviews.freebsd.org/D36859	2022-10-23 10:04:50 +02:00
Hans Petter Selasky	ee29897fc3	time(3): Declare the minimum and maximum hz values supported. Reviewed by: kib@ and imp@ MFC after: 1 week Sponsored by: NVIDIA Networking Differential Revision: https://reviews.freebsd.org/D37072	2022-10-23 10:04:50 +02:00
Konstantin Belousov	33ce178835	vn_bmap_seekhole: check that passed offset is non-negative Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37024	2022-10-19 20:24:07 +03:00
Konstantin Belousov	555a861d68	device_get_path(): take sbuf directly This allows to fix a bug where sbuf allocation done in the context of dev_wired_cache_match() must use non-sleepable allocations. Suggested by: jhb Reviewed by: jhb, takawata Discussed with: imp Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D36899	2022-10-19 19:39:40 +03:00
Konstantin Belousov	8cf783bde3	device_get_path(): handle case when dev is root PR: 266862 Based on submission by: takawata Reviewed by: jhb, takawata Discussed with: imp Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D36899	2022-10-19 19:39:33 +03:00
Konstantin Belousov	d9c5a9ea49	device_get_path(): do not drop the error from BUS_GET_DEVICE_PATH() Later it would always be silently converted to ENOMEM, because any error was reported as NULL return path. Reviewed by: jhb, takawata Discussed with: imp Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D36899	2022-10-19 19:39:26 +03:00
Konstantin Belousov	23d2fcfbb2	subr_bus.c: some style Wrap long lines in devctl2_ioctl DEV_GET_PATH and dev_wired_cache_match() Reviewed by: jhb, takawata Discussed with: imp Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D36899	2022-10-19 19:39:17 +03:00
Colin Percival	c32bd97641	kern: Support duplicate variables in early kenv Some virtual machines pass virtio MMIO device parameters via the kernel command line as a series of virtio_mmio.device=<parameters> options. These get translated into FreeBSD kernel environment variables; but unfortunately they all use the same variable name, which resulted in all but the first such parameter being ignored when the dynamic kernel environment is set up from the initial environment buffers. With this commit, duplicate environment settings will instead be stored as ${name}_1, ${name}_2... ${name}_9999. In the unlikely event that the same variable is set over 10000 times before the dynamic kernel environment is set up, we panic. Variable settings after the dynamic environment is initialized continue to override the previously-set value; the change is limited to the very early kernel boot (prior to SI_SUB_KMEM + 1) and changes behaviour from "ignore" to "store with a different name" only. Reviewed by: imp Feedback from: kevans Sponsored by: https://patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36187	2022-10-17 23:02:20 -07:00
Ali Abdallah	ba4782022a	ksched: correct return code for invalid priority By convention, EINVAL is returned when validating arguments, not EPERM. This matches the documented behaviour of sched_setscheduler(3), and that of SCHED_OTHER. PR: 227735 MFC after: 1 week Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D37021	2022-10-17 15:12:13 -03:00
Mitchell Horne	39888ed7a3	kern_intr: Check for NULL event in intr_destroy() It likely won't happen, but is consistent with the other functions of this KPI. Reviewed by: imp, jhb MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D33479	2022-10-15 15:51:44 -03:00
Zhenlei Huang	43f8c763cd	if_me: Use dedicated network privilege Separate if_me privileges from if_gif. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D36691	2022-10-15 17:05:36 +02:00
Mitchell Horne	05b727fee5	Downgrade tty_intr_event from a global It can be static within uart_tty.c. It is an open question whether there remains any real benefit to having uart instances share a swi thread. Reviewed by: imp, markj, jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D36938	2022-10-12 13:46:12 -03:00
Michael Tuexen	bc0d407676	Revert "listen(): improve POSIX compliance" This reverts commit `76e6e4d72f`. Several programs in the tree use -1 instead of INT_MAX to use the maximum value. Thanks to Eugene Grosbein for pointing this out.	2022-10-12 04:33:00 +02:00
Michael Tuexen	76e6e4d72f	listen(): improve POSIX compliance Ensure that a negative backlog argument is handled as if it was 0. Reviewed by: markj@, glebius@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D31821	2022-10-11 22:46:51 +02:00
Bjoern A. Zeeb	99e6980fcf	device_get_property: add a HANDLE case This will resolve a reference and return the appropriate handle, a node on the simplebus or an ACPI_HANDLE for ACPI. For now we do not try to further abstract the return type. MFC after: 2 weeks Reviewed by: mw Differential Revision: https://reviews.freebsd.org/D36793	2022-10-09 21:51:25 +00:00
Mateusz Guzik	143942f992	unr: remove UNR64_LOCKED All platforms support 64-bit atomics now.	2022-10-08 10:41:21 +00:00
Gleb Smirnoff	53af690381	tcp: remove INP_TIMEWAIT flag Mechanically cleanup INP_TIMEWAIT from the kernel sources. After `0d7445193a`, this commit shall not cause any functional changes. Note: this flag was very often checked together with INP_DROPPED. If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we will be able to remove most of these checks and turn them to assertions. Some of them can be turned into assertions right now, but that should be carefully done on a case by case basis. Differential revision: https://reviews.freebsd.org/D36400	2022-10-06 19:24:37 -07:00
Andrew Turner	9d4cff787e	Remove pre-armv6 support from devmap Remove an old code path that was used by Armv4/5 and so is unused now. Sponsored by: The FreeBSD Foundation	2022-10-05 09:56:17 +01:00
Hans Petter Selasky	0def80f1a5	time(3): Align fast clock times to avoid firing multiple timers. In non-periodic mode absolute timers fire at exactly the time given. When specifying a fast clock, align the firing time so that fewer timer interrupt events are needed. Reviewed by: rrs@ Differential Revision: https://reviews.freebsd.org/D36858 MFC after: 1 week Sponsored by: NVIDIA Networking	2022-10-03 17:53:17 +02:00
Alfredo Dal'Ava Junior	db79bf75ac	powerpc: cpuset: add local functions for copyin/copyout Add local functions to work around an instruction segment trap (panic) when the indirect functions copyin and copyout are called by an external loadable kernel module (i.e. pfsync, zfs and linuxulator). The crash was triggered by change `47a57144af`, but kernel binary linked with LLD 9 works fine. LLVM bisect points that LLD behavior changed after dc06b0bc9ad055d06535462d91bfc2a744b2f589. This is known to affect powerpc targets only and the final fix is still being discussed with the LLVM community. PR: 266730 Reviewed by: luporl, jhibbits (on IRC, previous version) MFC after: 2 days Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br) Differential Revision: https://reviews.freebsd.org/D36234	2022-10-03 12:03:09 -03:00
Doug Moore	e5f93d1078	show_sysctl_all: reduce copying, please coverity Modify db_show_sysctl_all so that it does not copy more than once the data of the input oid, and so that what it passes to db_show_oid does not alarm coverity. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D36847	2022-10-01 12:20:04 -05:00
Gleb Smirnoff	636420bde3	unix/dgram: don't leak file descriptors when socket write failed	2022-09-30 13:43:08 -07:00
Alexander V. Chernikov	7b660faa9e	sockbufs: add sbreserve_locked_limit() with custom maxsockbuf limit. Protocols such as netlink may need a large socket receive buffer, measured in tens of megabytes. This change allows netlink to set larger socket buffers (given the privs are in place), without requiring the user to manually bump maxsockbuf. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D36747	2022-09-28 10:20:09 +00:00
Alexander V. Chernikov	f66968564d	protocols: make socket buffers ioctl handler changeable Allow to set custom per-protocol handlers for the socket buffers ioctls by introducing pr_setsbopt callback with the default value set to the currently-used sbsetopt(). Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D36746	2022-09-28 10:20:09 +00:00
Doug Moore	5294bfa751	sysctl_search_oid: remove all-NULL precondition The implementation of sysctl_search_oid no longer relies on the initial value of nodes to be all NULL, so remove the comment that demands it and let the caller stop enforcing it. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D36768	2022-09-28 04:30:11 -05:00
Doug Moore	9f6f9007b9	name2oid: use find_oidname In name2oid, use sysctl_find_oidname instead of re-implementing it. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D36765	2022-09-27 16:17:55 -05:00
Doug Moore	e96ae5cb05	sysctl_search_oid: remove useless tests sysctl_search_oid makes several tests in a loop that can be removed. The first test in the loop is only ever true on the first loop iteration, and is always true on that iteration, so its work can be done before the loop begins. The upper and lower bounds on the loop variable 'indx' are each tested on each iteration, but 'indx' is changed in one direction or the other only once within the loop, so only one bound needs to be checked. Two ways remain in the loop that nodes[indx] can change (after one of them is put before the loop start), and one of them applies exactly when indx has been incremented, so no separate test for that case is required. Restructure and add comments that make it clearer that this is a basic depth-first search. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D36741	2022-09-27 13:30:31 -05:00
Doug Moore	ed5183455e	register_oid: fix duplicate oid after `d3f96f6610` sysctl_register_oid must check the uniqueness of any newly computed oid_number in sysctl_register_oid. Reviewed by: asomers MFC with: `d3f96f6610` Differential Revision: https://reviews.freebsd.org/D36743	2022-09-27 12:24:01 -05:00
Hans Petter Selasky	c075ea46bc	sysctl(3): Implement SYSCTL_FOREACH() to iterate all OIDs in a sysctl list. To avoid using the sysctl list macros directly in external kernel modules. Reviewed by: asomers, manu and asiciliano Differential Revision: https://reviews.freebsd.org/D36748 MFC after: 1 week Sponsored by: NVIDIA Networking	2022-09-27 19:21:21 +02:00
Mitchell Horne	f2963b530e	kasan: disable kasan_mark() after a violation Specifically, when we receive a violation and we're configured to panic, kasan_enabled gets unset before we descend into panic(). At this point, there's no longer any reason to allow marking as kasan_shadow_check() is disabled -- we have some inherent risk of faulting or panicking if the system's in a bad enough state with no benefit. Reviewed by: markj Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D36742	2022-09-27 11:01:21 -05:00
Alan Somers	6622e299ac	Fix the build with SCHED_STATS after `d3f96f6610` MFC with: `d3f96f6610` Sponsored by: Axcient	2022-09-26 20:20:46 -06:00
Alan Somers	d3f96f6610	Fix O(n^2) behavior in sysctl Sysctl OIDs were internally stored in linked lists, triggering O(n^2) behavior when userland iterates over many of them. The slowdown is noticeable for MIBs that have > 100 children (for example, vm.uma). But it's unignorable for kstat.zfs when a pool has > 1000 datasets. Convert the linked lists into RB trees. This produces a ~25x speedup for listing kstat.zfs with 4100 datasets, and no measurable penalty for small dataset counts. Bump __FreeBSD_version for the KPI change. Sponsored by: Axcient Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D36500	2022-09-26 18:03:34 -06:00
Alan Somers	52360ca32f	copy_file_range: truncate write if it would exceed RLIMIT_FSIZE PR: 266611 MFC after: 2 weeks Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D36706	2022-09-26 15:22:29 -06:00
Mitchell Horne	818cae0ff7	kasan: provide bus peek/poke definitions Reviewed by: andrew, markj Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D36700	2022-09-26 14:25:05 -05:00
Konstantin Belousov	1b4b75171e	Add vn_rlimit_fsizex() and vn_rlimit_fsizex_res() The vn_rlimit_fsizex() function: - checks that the write does not exceed RLIMIT_FSIZE limit and fs maximum supported file size - truncates write length if it exceeds the RLIMIT_FSIZE or max file size, but there are some bytes to write - sends SIGXFSZ if RLIMIT_FSIZE would be exceeded otherwise POSIX mandates the truncated write in case when some bytes can be written but whole write request fails the RLIMIT_FSIZE check. The function is supposed to be used from VOP_WRITE()s. Due to a peculiarity in the VFS generic write syscall layer, uio_resid must correctly reflect the written amount (noted by markj). Provide the dual vn_rlimit_fsizex_res() function to correct uio_resid after the clamp done in vn_rlimit_fsizex() on VOP_WRITE() return. PR: 164793 Reviewed by: asomers, jah, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D36625	2022-09-24 19:41:33 +03:00
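
The clamp-and-restore behavior described in commit 1b4b75171e can be modeled in userland C. The following is a minimal sketch, not the kernel implementation: `struct write_req`, `clamp_to_limit()`, `model_rlimit_fsizex()` and `model_rlimit_fsizex_res()` are hypothetical names, the signal is reported through a flag rather than delivered, and the choice to fail without a signal when only the file system maximum is exceeded is an assumption drawn from the commit message mentioning SIGXFSZ only for RLIMIT_FSIZE.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct uio fields this sketch needs. */
struct write_req {
	int64_t offset;	/* starting file offset of the write */
	int64_t resid;	/* bytes still to be written */
};

/*
 * Clamp req->resid so the write ends at or before `limit`.
 * Returns true if the write fits, fully or partially; the number of
 * bytes cut off is added to *clipped.  Returns false if no byte fits.
 */
bool
clamp_to_limit(struct write_req *req, int64_t limit, int64_t *clipped)
{
	int64_t room = limit - req->offset;	/* may be <= 0 */

	if (req->resid <= room)
		return (true);			/* whole write fits */
	if (room <= 0)
		return (false);			/* nothing can be written */
	*clipped += req->resid - room;
	req->resid = room;
	return (true);
}

/*
 * Model of the check: returns 0 and possibly shortens the request, or
 * returns EFBIG and leaves the request untouched.  *want_sigxfsz is set
 * when the model would send SIGXFSZ (assumption: RLIMIT_FSIZE case only).
 */
int
model_rlimit_fsizex(struct write_req *req, int64_t rlim_fsize,
    int64_t fs_max, int64_t *clipped, bool *want_sigxfsz)
{
	int64_t orig_resid = req->resid;

	assert(req->offset >= 0 && req->resid >= 0);
	assert(rlim_fsize >= 0 && fs_max >= 0);
	*clipped = 0;
	*want_sigxfsz = false;

	/* File system maximum: fail quietly if nothing fits. */
	if (!clamp_to_limit(req, fs_max, clipped))
		goto fail;
	/* RLIMIT_FSIZE: a write that cannot start at all flags SIGXFSZ. */
	if (!clamp_to_limit(req, rlim_fsize, clipped)) {
		*want_sigxfsz = true;
		goto fail;
	}
	return (0);
fail:
	req->resid = orig_resid;
	*clipped = 0;
	return (EFBIG);
}

/* Dual step: give the clipped bytes back so resid reflects what was written. */
void
model_rlimit_fsizex_res(struct write_req *req, int64_t clipped)
{
	req->resid += clipped;
}
```

In this model, a 20-byte write at offset 90 under a 100-byte RLIMIT_FSIZE is shortened to 10 bytes. Once the write path has consumed those 10 bytes, the restore step puts the 10 clipped bytes back into `resid`, so the caller sees a short write instead of an error.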

19588 Commits