When a device driver's probe method returns 0, i.e. absolute priority, do
not remove its class from the device just to set it back a few lines
later; that may change the device unit number, etc., after which we would
have to call the probe again.
If during the search we found a driver with absolute priority, we do
not need to set the device driver and class, since we did not remove
them earlier.
It should not happen, but if the second probe method call fails, remove
the driver and possibly the class from the device, restoring the state
we started with.
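A hedged sketch of the resulting flow in the probe loop (illustrative
names, not the literal subr_bus.c code):
    if (pri == 0) {
            /* Absolute priority: the winning driver and its class are
             * still set on the device, so keep them and skip the
             * second probe. */
            return (0);
    }
    /* Otherwise restore the best candidate and probe it again. */
    device_set_devclass(child, dc->name);
    device_set_driver(child, best->driver);
    result = DEVICE_PROBE(child);
    if (result != pri) {
            /* Should not happen; put the device back the way we
             * found it. */
            device_set_driver(child, NULL);
    }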
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D32125
(cherry picked from commit f73c2bbf81)
Fix 0389e9be63 for the LINT build. An argument was removed only from code
under BUS_DEBUG without rebuilding LINT...
Sponsored by: Netflix
Fixes: 0389e9be63
(cherry picked from commit 67a9e76da6)
I added DF_REBID to allow for 'hoover' drivers that would attach to
otherwise unattached devices in the tree. The notion didn't catch on, as
it was tricky to make work well and it was easier to just publish a /dev
node of some flavor from the parent device. It has been nothing but dead
weight for a long time.
Reviewed by: mav
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D32056
(cherry picked from commit 0389e9be63)
In particular, we need to initialize efbuf->flags, since
export_vnode_to_sb() loads that field. This was mostly harmless since
the flag only determines whether the output kinfo_file is packed, and
KERN_PROC_CWD only ever emits a single kinfo_file anyway.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 327060bd77)
syslog(3) was recently changed to support larger messages, up to 8KB.
Our syslogd handles this fine, as it adjusts /dev/log's recv buffer to a
large size. rsyslog, however, uses the system default of 4KB. This
leads to problems since our syslog(3) retries indefinitely when a send()
returns ENOBUFS, but if the message is large enough this will never
succeed.
Increase the default recv buffer size for datagram sockets to support
8KB syslog messages without requiring the logging daemon to adjust its
buffers.
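For reference, a hedged userland sketch of how a logging daemon could
itself request a larger receive buffer on its /dev/log socket fd (not
part of this kernel change):
    #include <sys/socket.h>
    #include <err.h>

    int sz = 8 * 1024;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz)) == -1)
            warn("setsockopt(SO_RCVBUF)");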
PR: 260126
Reviewed by: asomers
Sponsored by: The FreeBSD Foundation
(cherry picked from commit d157f2627b)
With various firmware files used by graphics and wireless drivers
we are exceeding the current 32-character module name (file path
in kldxref) length.
To overcome this, bump it to the maximum path length for the next
version.
To be able to MFC, provide backward-compatibility support for another
version of the struct, as the offsets of its second half change due to
the array size increase.
MAXMODNAME being defined to MAXPATHLEN requires param.h to be
included first. With only 7 modules (or the LinuxKPI module.h) not
doing that, adjust them rather than including param.h in module.h [1].
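For the handful of affected modules the include order now matters
(minimal sketch):
    #include <sys/param.h>      /* provides MAXPATHLEN, used by MAXMODNAME */
    #include <sys/module.h>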
Reported by: Greg V (greg unrelenting.technology)
Sponsored by: The FreeBSD Foundation
Suggested by: imp [1]
Reviewed by: imp (and others to different level)
Differential Revision: https://reviews.freebsd.org/D32383
(cherry picked from commit df38ada293)
With the stack gap enabled, the top of the stack is moved down by a
random number of bytes. Because of that, some multithreaded applications
that use the kern.usrstack sysctl to calculate the addresses of stacks
for their threads can fail. Add a kern.stacktop sysctl, which can be
used to retrieve the address of the top of the stack after the stack gap
has been applied to it. It returns a value identical to kern.usrstack
for processes that have no stack gap.
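A hedged userland sketch of how a threading library might consume the
new sysctl, falling back to kern.usrstack on kernels that lack it:
    #include <sys/types.h>
    #include <sys/sysctl.h>

    u_long stacktop;
    size_t len = sizeof(stacktop);
    if (sysctlbyname("kern.stacktop", &stacktop, &len, NULL, 0) != 0)
            (void)sysctlbyname("kern.usrstack", &stacktop, &len, NULL, 0);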
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31897
(cherry picked from commit a97d697122)
Calling setrlimit with the stack gap enabled and with low values of the
stack resource limit often caused the program to abort immediately after
exiting the syscall. This happened because the resource limit was
calculated assuming that the stack started at sv_usrstack, while with
the stack gap enabled the stack is moved down by a random number of
bytes.
Save information about the stack size in struct vmspace and adjust the
rlim_cur value. If rlim_cur plus the stack gap is bigger than rlim_max,
the value is truncated to rlim_max.
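A minimal sketch of the clamping described above (illustrative names,
not the literal kernel code):
    new_stack_limit = req->rlim_cur + stack_gap_size;   /* grow by the gap */
    if (new_stack_limit > req->rlim_max)
            new_stack_limit = req->rlim_max;            /* clamp to hard limit */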
PR: 253208
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31516
(cherry picked from commit 889b56c8cd)
Most of the nvme initialization time in my tests is being spent here
(via pause_sbt).
Sponsored by: https://www.patreon.com/cperciva
(cherry picked from commit bd11e253a9)
stand/common: Add file_addbuf()
libsa: Add support for timestamp logging (tslog)
stand/common: Add support for timestamp logging (tslog)
i386/loader: Call tslog_init
efi/loader: Call tslog_init (+ bugfix)
stand/common command_boot: Pass tslog to kernel
kern_tslog: Include tslog data from loader
loader: Use tslog to instrument some functions
Add userland boot profiling to TSLOG (+ bugfix)
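The instrumentation added in the commits above follows the usual TSLOG
pattern (sketch; the macros compile to nothing when TSLOG is not
configured, and the header location differs between the kernel and the
loader):
    TSENTER();              /* timestamp: entering this function */
    /* ... work being profiled ... */
    TSEXIT();               /* timestamp: leaving this function */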
Sponsored by: https://www.patreon.com/cperciva
(cherry picked from commit 60a978bec9)
(cherry picked from commit e193d3ba33)
(cherry picked from commit c8dfc327db)
(cherry picked from commit c4b65e954f)
(cherry picked from commit f49381ccb6)
(cherry picked from commit 537a44bf28)
(cherry picked from commit fe51b5a76d)
(cherry picked from commit 313724bab9)
(cherry picked from commit 46dd801acb)
(cherry picked from commit 52e125c2bd)
(cherry picked from commit 19e4f2f289)
When the NFSv4.2 server does a VOP_ALLOCATE(), it needs
the operation to be done for the RPC's credential and not
td_ucred. It also needs the writing to be done synchronously.
This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE()
and modifies vop_stdallocate() to use these arguments.
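The interface after this change (hedged sketch of the vnode_if.src
entry):
    int VOP_ALLOCATE(struct vnode *vp, off_t *offset, off_t *len,
        int ioflag, struct ucred *cred);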
The VOP_ALLOCATE.9 man page will be patched separately.
(cherry picked from commit f0c9847a6c)
Introduce a new MSGBUF_WRAP flag, indicating that the buffer has wrapped
at least once and no longer keeps zeroes from the last msgbuf_clear().
It allows msgbuf_peekbytes() to return only real data, so consumers no
longer have to trim the leading zeroes after doing a pointless copy.
The most visible effect is that the kern.msgbuf sysctl now always
returns a properly zero-terminated string, not only after the first
buffer wrap.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
(cherry picked from commit 81dc00331d)
There are two places where we convert from a timecounter delta to
a bintime delta: tc_windup and bintime_off.
Both functions use the same calculations when the timecounter delta is
small. But for a large delta (greater than approximately an equivalent
of 1 second) the calculations were different. Both functions use
approximate calculations based on th_scale that avoid division. Both
produce values slightly greater than the true value that division by
tc_frequency would give. tc_windup is slightly more accurate, so its
result is closer to the true value and thus smaller than bintime_off's
result.
As a consequence there can be a jump back in time when time hands are
switched after a long period of time (a large delta). Just before the
switch the time would be calculated with a large delta from
th_offset_count in bintime_off. tc_windup does the switch using its own
calculations of a new th_offset using the large delta. As explained
earlier, the new th_offset may end up being less than the previously
produced binuptime. So, for a period of time new binuptime values may
be "back in time" comparing to values just before the switch.
Such a jump must never happen. All the code assumes that the uptime is
monotonically nondecreasing and some code works incorrectly when that
assumption is broken. For example, we have observed sleepq_timeout()
ignoring a timeout when the sbinuptime value obtained by the callout
code was greater than the expiration value, but the sbinuptime obtained
in sleepq_timeout() was less than it. In that case the target thread
would never get woken up.
The unified calculations should ensure the monotonic property of the
uptime.
The problem is quite rare as normally tc_windup should be called HZ
times per second (typically 1000 or 100). But it may happen in VMs on
very busy hypervisors where a VM's virtual CPU may not get an execution
time slot for a second or more.
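A hedged sketch of the now-shared computation (names approximate; the
real code lives in sys/kern/kern_tc.c). th_scale is a 0.64 fixed-point
fraction of a second per counter tick, and large_delta is roughly
2^64 / scale, precomputed:
    static void
    bintime_add_tc_delta(struct bintime *bt, uint64_t scale,
        uint64_t large_delta, uint64_t delta)
    {
            uint64_t x;

            if (__predict_false(delta >= large_delta)) {
                    /* scale * delta would overflow; split the product. */
                    x = (scale >> 32) * delta;
                    bt->sec += x >> 32;
                    bintime_addx(bt, x << 32);
                    bintime_addx(bt, (scale & 0xffffffff) * delta);
            } else {
                    bintime_addx(bt, scale * delta);
            }
    }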
Reviewed by: kib
Sponsored by: Panzura LLC
(cherry picked from commit 3d9d64aa18)
Add a boolean parameter to minidumpsys(), to indicate a live dump. When
requested, take a snapshot of important global state, and pass this to
the machine-dependent minidump function. For now this includes the
kernel message buffer, and the bitset of pages to be dumped. Beyond
this, we don't take much action to protect the integrity of the dump
from changes in the running system.
A new function msgbuf_duplicate() is added for snapshotting the message
buffer. msgbuf_copy() is insufficient for this purpose since it marks
any new characters it finds as read.
For now, nothing can actually trigger a live minidump. A future patch
will add the mechanism for this. For simplicity and safety, live dumps
are disallowed for mips.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31993
(cherry picked from commit 588ab3c774)
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
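A hedged sketch of the new argument (the real definition may differ in
detail):
    struct minidumpstate {
            struct msgbuf   *msgbufp;       /* snapshot of the kernel message buffer */
            struct bitset   *dump_bitset;   /* snapshot of vm_page_dump */
    };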
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
(cherry picked from commit 1adebe3cd6)
This is needed to ensure that resolvers that reference global symbols
return correct results.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit b11e6fd75b)
Some upcoming changes will modify software checksum routines like
in_cksum() to operate using m_apply(), which uses the direct map to
access packet data for unmapped mbufs. This approach of course does not
work on platforms without a direct map, so we have to disallow the use
of unmapped mbufs on such platforms.
I believe this is the right tradeoff: we only configure KTLS on amd64
and arm64 today (and one KTLS consumer, NFS TLS, requires a direct map
already), and the use of unmapped mbufs with plain sendfile is a recent
optimization. If need be, m_apply() could be modified to create
CPU-private mappings of extpg mbuf pages as a fallback.
So, change mb_use_ext_pgs to be hard-wired to zero on systems without a
direct map. Note that PMAP_HAS_DMAP is not a compile-time constant on
some systems, so the default value of mb_use_ext_pgs has to be
determined during boot.
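A minimal sketch of deciding the default at boot, assuming a SYSINIT
hook (the real code in kern_mbuf.c may differ):
    static void
    mb_ext_pgs_init(void *arg __unused)
    {
            /* PMAP_HAS_DMAP can be a runtime check on some platforms. */
            if (!PMAP_HAS_DMAP)
                    mb_use_ext_pgs = 0;
    }
    SYSINIT(mb_ext_pgs, SI_SUB_MBUF, SI_ORDER_MIDDLE, mb_ext_pgs_init, NULL);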
Reviewed by: jhb
Discussed with: gallatin
Sponsored by: The FreeBSD Foundation
(cherry picked from commit fcaa890c44)
This will be used to break a deadlock in ZFS between the per-mountpoint
teardown lock and page busy locks. In particular, when purging data
from the page cache during dataset rollback, we want to avoid blocking
on the busy state of invalid pages since the busying thread may be
blocked on the teardown lock in zfs_getpages().
Add a helper, vn_pages_remove_valid(), for use by filesystems. Bump
__FreeBSD_version so that the OpenZFS port can make use of the new
helper.
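The new helper (hedged sketch): it removes only valid pages in the given
range, skipping invalid pages whose busy state we would otherwise block
on.
    void vn_pages_remove_valid(struct vnode *vp, vm_pindex_t start,
        vm_pindex_t end);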
PR: 258208
Reviewed by: avg, kib, sef
Tested by: pho (part of a larger patch)
Sponsored by: The FreeBSD Foundation
(cherry picked from commit d28af1abf0)
A core segment is bounded in size only by memory size. On 64-bit
architectures this means a segment can be much larger than 4GB.
However, compress_chunk() takes only a u_int, clamping segment size to
4GB-1, resulting in a truncated core. Everything else, including the
compressor internally, uses size_t, so use size_t at the boundary here.
This dates back to the original refactoring in 2015 (r279801 /
aa14e9b7).
PR: 260006
Sponsored by: Juniper Networks, Inc.
(cherry picked from commit 63cb9308a7)
TLS 1.0 records are encrypted as one continuous CBC chain where the
last block of the previous record is used as the IV for the next
record. As a result, TLS 1.0 records cannot be encrypted out of order
but must be encrypted as a FIFO.
If the later pages of a sendfile(2) request complete before the first
pages, then TLS records can be encrypted out of order. For TLS 1.1
and later this is fine, but this can break for TLS 1.0.
To cope, add a queue in each TLS session to hold TLS records that
contain valid unencrypted data but are waiting for an earlier TLS
record to be encrypted first.
- In ktls_enqueue(), check if a TLS record being queued is the next
record expected for a TLS 1.0 session. If not, it is placed in
sorted order in the pending_records queue in the TLS session.
If it is the next expected record, queue it for SW encryption like
normal. In addition, check if this new record (really a potential
batch of records) was holding up any previously queued records in
the pending_records queue. Any of those records that are now in
order are also placed on the queue for SW encryption (see the sketch
after this list).
- In ktls_destroy(), free any TLS records on the pending_records
queue. These mbufs are marked M_NOTREADY so were not freed when the
socket buffer was purged in sbdestroy(). Instead, they must be
freed explicitly.
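A hedged sketch of the ktls_enqueue() ordering check described above
(field and helper names are approximate):
    if (needs_fifo_encryption(tls) && m->m_epg_seqno != tls->next_seqno) {
            /* Out of order for TLS 1.0: park the record, sorted by
             * sequence number, until earlier records are encrypted. */
            pending_records_insert(tls, m);
            return;
    }
    /* In order: queue for SW encryption, then also move any
     * now-contiguous records from pending_records to the work queue. */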
Reviewed by: gallatin, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D32381
(cherry picked from commit 9f03d2c001)
I missed updating this counter when rebasing the changes in
9c64fc4029 after the switch to
COUNTER_U64_DEFINE_EARLY in 1755b2b989.
Fixes: 9c64fc4029 Add Chacha20-Poly1305 as a KTLS cipher suite.
Sponsored by: Netflix
(cherry picked from commit 90972f0402)
Chacha20-Poly1305 for TLS is an AEAD cipher suite for both TLS 1.2 and
TLS 1.3 (RFCs 7905 and 8446). For both versions, Chacha20 uses the
server and client IVs as implicit nonces XORed with the record
sequence number to generate the per-record nonce, matching the
construction used with AES-GCM for TLS 1.3.
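A hedged sketch of the per-record nonce construction (the 64-bit record
sequence number, big-endian, is XORed into the last 8 bytes of the
12-byte implicit IV; names approximate):
    uint8_t nonce[12];
    int i;

    memcpy(nonce, tls->params.iv, sizeof(nonce));
    for (i = 0; i < 8; i++)
            nonce[4 + i] ^= (uint8_t)(seqno >> (56 - 8 * i));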
Reviewed by: gallatin
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27839
(cherry picked from commit 9c64fc4029)
Create the initial pool of kprocs on demand when the first socket AIO
request is submitted instead. The pool of kprocs used for other AIO
requests is similarly created on first use.
Reviewed by: asomers
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D32468
(cherry picked from commit d1b6fef075)
This is how most SYSINITs are defined. Also annotate the dummy
parameter with __unused. No functional change intended.
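The conventional shape being switched to (sketch with a hypothetical
name):
    static void
    foo_init(void *dummy __unused)
    {
            /* ... initialization ... */
    }
    SYSINIT(foo_init, SI_SUB_DRIVERS, SI_ORDER_ANY, foo_init, NULL);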
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 2287ced2f5)
Hyper-V wants to register its MSR-based timecounter during
SI_SUB_HYPERVISOR, before SI_SUB_LOCK, since an emulated 8254 may not be
available for DELAY(). So we cannot use MTX_SYSINIT to initialize the
timecounter lock.
PR: 259878
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 3339950117)
Change the 'period' argument to 'duration' and change its type to
sbintime_t so we can more easily express different durations.
Reviewed by: tsoome, glebius
Differential Revision: https://reviews.freebsd.org/D32619
(cherry picked from commit 072d5b98c4)
This define will later be used by upcoming TLS RX hardware offload patches.
No functional change intended.
Reviewed by: jhb@
Sponsored by: NVIDIA Networking
(cherry picked from commit dd31400c3c)
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
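A hedged sketch of the typical conversion:
    /* Before: */
    m = vm_page_alloc(NULL, 0, VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO);
    /* After: VM_ALLOC_ZERO now guarantees a zeroed page, and the caller
     * assigns the pindex if anything relies on it. */
    m = vm_page_alloc_noobj(VM_ALLOC_WIRED | VM_ALLOC_ZERO);
    m->pindex = 0;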
Reviewed by: alc, hselasky, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit a4667e09e6)
The callout's c_time is always greater than or equal to the scheduled
time. It is also smaller than sbinuptime() and can't change while the
callback is running, so we can reliably use it instead of sbinuptime()
here. If there was a race and the callout was rescheduled to a later
time, the callback will be called again.
According to profiles, this saves ~5% of the timer interrupt time even
with a fast TSC timecounter.
MFC after: 1 month
(cherry picked from commit 6df1359e55)
Similar to commit 3ead60236f ("Generalize bus_space(9) and atomic(9)
sanitizer interceptors"), use a more generic scheme for interposing
sanitizer implementations of routines like memcpy().
No functional change intended.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit ec8f1ea8d5)
Make it easy to define interceptors for new sanitizer runtimes, rather
than assuming KCSAN. Lay a bit of groundwork for KASAN and KMSAN.
When a sanitizer is compiled in, atomic(9) and bus_space(9) definitions
in atomic_san.h are used by default instead of the inline
implementations in the platform's atomic.h. These definitions are
implemented in the sanitizer runtime, which includes
machine/{atomic,bus}.h with SAN_RUNTIME defined to pull in the actual
implementations.
No functional change intended.
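A hedged sketch of the include-time dispatch this creates in
machine/atomic.h (guard macro name approximate):
    #if defined(SAN_NEEDS_INTERCEPTORS) && !defined(SAN_RUNTIME)
    #include <sys/atomic_san.h>  /* prototypes satisfied by the sanitizer runtime */
    #else
    /* ... the usual inline atomic(9) implementations ... */
    #endif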
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 3ead60236f)
KASAN hooks will not generate reports if panicstr != NULL, but then
there is a window after the initial panic() call where another report
may be raised. This can happen if a false positive occurs; to simplify
debugging of such problems, avoid recursing.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit ea3fbe0707)
It will be called during KLD unload to unpoison the redzones following
global variables. Otherwise, virtual address ranges previously used for
a KLD may be left tainted, triggering false positives when they are
recycled.
Reported by: pho
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 588c7a06df)
When copying from the old buffer to the new buffer, we don't know the
requested size of the old allocation, but only the size of the
allocation provided by UMA. This value is "alloc". Because the copy
may access bytes in the old allocation's red zone, we must mark the full
allocation valid in the shadow map. Do so using the correct size.
Reported by: kp
Tested by: kp
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 9a7c2de364)
- Reuse some REDZONE bits to keep track of the requested and allocated
sizes, and use that to provide red zones.
- As in UMA, disable memory trashing to avoid unnecessary CPU overhead.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 06a53ecf24)
We cache mapped execve argument buffers to avoid the overhead of TLB
shootdowns. Mark them invalid when they are freed to the cache.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit f1c3adefd9)
vnodes are a bit special in that they may exist on per-CPU lists even
while free. Add a KASAN-only destructor that poisons regions of each
vnode that are not expected to be accessed after a free.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit b261bb4057)
The idea behind KASAN is to use a region of memory to track the validity
of buffers in the kernel map. This region is the shadow map. The
compiler inserts calls to the KASAN runtime for every emitted load
and store, and the runtime uses the shadow map to decide whether the
access is valid. Various kernel allocators call kasan_mark() to update
the shadow map.
Since the shadow map tracks only accesses to the kernel map, accesses to
other kernel maps are not validated by KASAN. UMA_MD_SMALL_ALLOC is
disabled when KASAN is configured to reduce usage of the direct map.
Currently we have no mechanism to completely eliminate uses of the
direct map, so KASAN's coverage is not comprehensive.
The shadow map uses one byte per eight bytes in the kernel map. In
pmap_bootstrap() we create an initial set of page tables for the kernel
and preloaded data.
When pmap_growkernel() is called, we call kasan_shadow_map() to extend
the shadow map. kasan_shadow_map() uses pmap_kasan_enter() to allocate
memory for the shadow region and map it.
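A hedged sketch of the address-to-shadow translation (one shadow byte
covers eight bytes of the kernel map; constants as on amd64):
    static inline vm_offset_t
    kasan_md_addr_to_shad(vm_offset_t addr)
    {
            return (((addr - VM_MIN_KERNEL_ADDRESS) >>
                KASAN_SHADOW_SCALE_SHIFT) + KASAN_MIN_ADDRESS);
    }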
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29417
(cherry picked from commit 6faf45b34b)
KASAN enables the use of LLVM's AddressSanitizer in the kernel. This
feature makes use of compiler instrumentation to validate memory
accesses in the kernel and detect several types of bugs, including
use-after-frees and out-of-bounds accesses. It is particularly
effective when combined with test suites or syzkaller. KASAN has high
CPU and memory usage overhead and so is not suited for production
environments.
The runtime and pmap maintain a shadow of the kernel map to store
information about the validity of memory mapped at a given kernel
address.
The runtime implements a number of functions defined by the compiler
ABI. These are prefixed by __asan. The compiler emits calls to
__asan_load*() and __asan_store*() around memory accesses, and the
runtime consults the shadow map to determine whether a given access is
valid.
kasan_mark() is called by various kernel allocators to update state in
the shadow map. Updates to those allocators will come in subsequent
commits.
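A hedged sketch of an allocator's calls: kasan_mark(addr, valid_size,
total_size, code) marks [addr, addr + valid_size) valid and poisons the
remainder of [addr, addr + total_size) with the given code (the code
constants here are illustrative):
    kasan_mark(buf, usable, total, KASAN_GENERIC_REDZONE); /* after allocation */
    kasan_mark(buf, 0, total, KASAN_GENERIC_FREED);        /* after free */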
The runtime also defines various interceptors. Some low-level routines
are implemented in assembly and are thus not amenable to compiler
instrumentation. To handle this, the runtime implements these routines
on behalf of the rest of the kernel. The sanitizer implementation
validates memory accesses manually before handing off to the real
implementation.
The sanitizer in a KASAN-configured kernel can be disabled by setting
the loader tunable debug.kasan.disable=1.
Obtained from: NetBSD
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 38da497a4d)