freebsd-dev

Author	SHA1	Message	Date
Kyle Evans	59dafcde62	kqueue: fix conversion of timer data to sbintime This unbreaks the i386 kqueue timer tests after a recent change switched NOTE_ABSTIME over to using microseconds. Notably, the data argument (which holds useconds) is an int64_t, but we were passing it to timer2sbintime which takes an intptr_t. Perhaps in a previous incarnation, intptr_t would have made sense, but now it just leads to the timestamp getting truncated and subsequently rejected when it no longer fits in an intptr_t. PR: 245768 Reported by: lwhsu / CI MFC after: 1 week	2020-04-21 03:57:30 +00:00
Mitchell Horne	49439183ce	Convert arm's physmem interface to MI code The arm_physmem interface found in arm's MD code provides a convenient set of routines for adding/excluding physical memory regions and initializing important kernel globals such as Maxmem, realmem, phys_avail[], and dump_avail[]. It is especially convenient for FDT systems, since we can use FDT parsing functions and pass the result directly to one of these physmem routines. This interface is already in use on arm and arm64, and can be used to simplify this early initialization on RISC-V as well. This requires only a couple trivial changes: - Move arm_physmem_kernel_addr to arm/machdep.c. It is unused on arm64, and manipulated entirely in arm MD code. - Convert arm32_btop/arm64_btop to atop. This is equivalently defined on all architectures. - Drop the "arm" prefix. Reviewed by: manu, emaste ("looks reasonable") MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24153	2020-04-19 00:12:30 +00:00
Kristof Provost	3f8bc99c4c	ethersubr: Make the mac address generation more robust If we create two (vnet) jails and create a bridge interface in each we end up with the same mac address on both bridge interfaces. These very often conflicts, resulting in same mac address in both jails. Mitigate this problem by including the jail name in the mac address. Reviewed by: kevans, melifaro MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24383	2020-04-18 07:50:30 +00:00
Konstantin Belousov	acf65eef23	sendfile: When all io finished, assert that sfio->pa[] is in expected state. It must contain fully restored contigous run of the wired pages from the object, except possible trimmed tail. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2020-04-18 03:09:25 +00:00
Konstantin Belousov	e795a04083	The pa argument for sendfile_iodone() is not necessary a slice of sfio->pa. It is true for zfs, but it is not for e.g. vnode or buffer pagers. When fixing bogus pages, fix them in both places. Rely on the fact that pa[0] must have been invalid so it cannot be bogus. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2020-04-18 03:07:18 +00:00
Kyle Evans	23d5326823	tty: convert tty_lock_assert to tty_assert_locked to hide lock type A later change, currently being iterated on in D24459, will in-fact change the lock type to an sx so that TTY drivers can sleep on it if they need to. Committing this ahead of time to make the review in question a little more palatable. tty_lock_assert() is unfortunately still needed for now in two places to make sure that the tty lock has not been recursed upon, for those scenarios where it's supplied by the TTY driver and possibly a mutex that is allowed to recurse. Suggested by: markj	2020-04-17 18:34:49 +00:00
Brooks Davis	b24e6ac8b7	Convert canary, execpathp, and pagesizes to pointers. Use AUXARGS_ENTRY_PTR to export these pointers. This is a followup to r359987 and r359988. Reviewed by: jhb Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24446	2020-04-16 21:53:17 +00:00
Brooks Davis	19112958cf	style(9): end continued line with operator.	2020-04-16 17:24:13 +00:00
Kyle Evans	c318828929	Preload hostuuid for early-boot use prison0's hostuuid will get set by the hostid rc script, either after generating it and saving it to /etc/hostid or by simply reading /etc/hostid. Some things (e.g. arbitrary MAC address generation) may use the hostuuid as a factor in early boot, so providing a way to read /etc/hostid (if it's available) and using it before userland starts up is desirable. The code is written such that the preload doesn't have to be /etc/hostid, thus not assuming that there will be newline at the end of the buffer or even the exact shape of the newline. White trailing whitespace/non-printables trimmed, the result will be validated as a valid uuid before it's used for early boot purposes. The preload can be turned off with hostuuid_load="NO" in /boot/loader.conf, just as other preloads; it's worth noting that this is a 37-byte file, the overhead is believed to be generally minimal. It doesn't seem necessary at this time to be concerned with kern.hostid. One does wonder if we should consider validating hostuuids coming in via jail_set(2); some bits seem to care about uuid form and we bother validating format of smbios-provided uuid and in-fact whatever uuid comes from /etc/hostid. Reviewed by: karels, delphij, jamie MFC after: 1 week (don't preload by default, probably) Differential Revision: https://reviews.freebsd.org/D24288	2020-04-16 00:54:06 +00:00
Brooks Davis	9df1c38bbc	Export argc, argv, envc, envv, and ps_strings in auxargs. This simplifies discovery of these values, potentially with reducing the number of syscalls we need to make at runtime. Longer term, we wish to convert the startup process to pass an auxargs pointer to _start() and use that rather than walking off the end of envv. This is cleaner, more C-friendly, and for systems with strong bounds (e.g. CHERI) necessary. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24407	2020-04-15 20:23:55 +00:00
Brooks Davis	397df744f9	Make ps_strings in struct image_params into a pointer. This is a prepratory commit for D24407. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA	2020-04-15 20:21:30 +00:00
Kyle Evans	3fb92d4cb1	validate_uuid: absorb the rest of parse_uuid with a flags arg This makes the naming annoyance (validate_uuid vs. parse_uuid) less of an issue and centralizes all of the functionality into the new KPI while still making the extra validation optional. The end-result is all the same as far as hostuuid validation-only goes.	2020-04-15 18:39:12 +00:00
Pawel Biernacki	f65eac0fc0	sysctl_handle_string: Put logical or in parentheses. Reported by: rdivacky Approved by: kib (mentor) Pointy-hat to: kaktus	2020-04-15 16:55:38 +00:00
Pawel Biernacki	1627b1fd9d	sysctl(9): fix handling string tunables. r357614 changed internals of handling string sysctls, and inadvertently broke setting string tunables. Take them into account. PR: 245463 Reported by: jhb, np Reviewed by: imp, jhb, kib Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D24429	2020-04-15 16:33:55 +00:00
Hans Petter Selasky	a90fb6cf3c	Cast all ioctl command arguments through uint32_t internally. Hide debug print showing use of sign extended ioctl command argument under INVARIANTS. The print is available to all and can easily fill up the logs. No functional change intended. MFC after: 1 week Sponsored by: Mellanox Technologies	2020-04-15 13:20:51 +00:00
Kyle Evans	142ffb8bdc	kern uuid: break format validation out into a separate KPI This new KPI, validate_uuid, strictly validates the formatting of the input UUID and, optionally, populates a given struct uuid. As noted in the header, the key differences are that the new KPI won't recognize an empty string as a nil UUID and it won't do any kind of semantic validation on it. Also key is that populating a struct uuid is optional, so the caller doesn't necessarily need to allocate a bogus one on the stack just to validate the string. This KPI has specifically been broken out in support of D24288, which will preload /etc/hostid in loader so that early boot hostuuid users (e.g. anything that calls ether_gen_addr) can have a valid hostuuid to work with once it's been stashed in /etc/hostid.	2020-04-15 03:59:26 +00:00
Brooks Davis	618a20d4f9	Remove bogus use of useracc() in (clock_)nanosleep. There's no point in pre-checking that we can access the user's rmtp pointer before we do it in copyout(). While here, improve style(9) compliance. Reviewed by: imp MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24409	2020-04-14 20:53:12 +00:00
Brooks Davis	562894f0dc	Centralize compatability translation macros. Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h and replace existing definitation with includes where required. This eliminates duplicate code and allows Linux and FreeBSD compatability headers to be included in the same files. Input from: cem, jhb Obtained from: CheriBSD MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24275	2020-04-14 20:30:48 +00:00
Kyle Evans	e19b97f7a0	sysent: re-roll after r359930	2020-04-14 18:11:26 +00:00
Kyle Evans	7d03e08112	Mark closefrom(2) COMPAT12, reimplement in libc to wrap close_range Include a temporarily compatibility shim as well for kernels predating close_range, since closefrom is used in some critical areas. Reviewed by: markj (previous version), kib Differential Revision: https://reviews.freebsd.org/D24399	2020-04-14 18:07:42 +00:00
Jonathan T. Looney	fb401f1bba	Make sonewconn() overflow messages have per-socket rate-limits and values. sonewconn() emits debug-level messages when a listen socket's queue overflows. Currently, sonewconn() tracks overflows on a global basis. It will only log one message every 60 seconds, regardless of how many sockets experience overflows. And, when it next logs at the end of the 60 seconds, it records a single message referencing a single PCB with the total number of overflows across all sockets. This commit changes to per-socket overflow tracking. The code will now log one message every 60 seconds per socket. And, the code will provide per-socket queue length and overflow counts. It also provides a way to change the period between log messages using a sysctl. Reviewed by: jhb (previous version), bcr (manpages) MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24316	2020-04-14 15:38:18 +00:00
Jonathan T. Looney	f6ab9795d4	Print more detail as part of the sonewconn() overflow message. When a socket's listen queue overflows, sonewconn() emits a debug-level log message. These messages are sometimes useful to systems administrators in highlighting a process which is not keeping up with its listen queue. This commit attempts to enhance the usefulness of this message by printing more details about the socket's address. If all else fails, it will at least print the domain name of the socket. Reviewed by: bz, jhb, kbowling MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24272	2020-04-14 15:30:34 +00:00
Andrew Gallatin	23feb56348	KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to carry more than a single page for sendfile, they are rather cache unfriendly when accessing the ext_pgs metadata and data. This is because the ext_pgs part of the mbuf is allocated separately, and almost guaranteed to be cold in cache. This change takes advantage of the fact that unmapped mbufs are never used at the same time as pkthdr mbufs. Given this fact, we can overlap the ext_pgs metadata with the mbuf pkthdr, and carry the ext_pgs meta directly in the mbuf itself. Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array of pages) directly after the existing m_ext. In order to be able to carry 5 pages (which is the minimum required for a 16K TLS record which is not perfectly aligned) on LP64, I've had to steal ext_arg2. The only user of this in the xmit path is sendfile, and I've adjusted it to use arg1 when using unmapped mbufs. This change is almost entirely mechanical, except that we change mb_alloc_ext_pgs() to no longer allow allocating pkthdrs, the change to avoid ext_arg2 as mentioned above, and the removal of the ext_pgs zone, This change saves roughly 2% "raw" CPU (~59% -> 57%), or over 3% "scaled" CPU on a Netflix 100% software kTLS workload at 90+ Gb/s on Broadwell Xeons. In a follow-on commit, I plan to remove some hacks to avoid access ext_pgs fields of mbufs, since they will now be in cache. Many thanks to glebius for helping to make this better in the Netflix tree. Reviewed by: hselasky, jhb, rrs, glebius (early version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24213	2020-04-14 14:46:06 +00:00
Kyle Evans	51a16c8412	posixshm: fix counting of writable mappings Similar to mmap'ing vnodes, posixshm should count any mapping where maxprot contains VM_PROT_WRITE (i.e. fd opened r/w with no write-seal applied) as writable and thus blocking of any write-seal. The memfd tests have been amended to reflect the fixes here, which notably includes: 1. Fix for error return bug; EPERM is not a documented failure mode for mmap 2. Fix rejection of write-seal with active mappings that can be upgraded via mprotect(2). Reported by: markj Discussed with: markj, kib	2020-04-14 13:32:03 +00:00
Mark Johnston	b36871af6d	Fix sendto() on unconnected SOCK_STREAM/SEQPACKET unix sockets. Previously the unpcb pointer of the newly connected remote socket was not initialized correctly, so attempting to lock it would result in a null pointer dereference. Reported by: syzkaller MFC after: 1 week Sponsored by: The FreeBSD Foundation	2020-04-13 19:22:05 +00:00
Mark Johnston	c7841c6b8e	Relax restrictions on private mappings of POSIX shm objects. When creating a private mapping of a POSIX shared memory object, VM_PROT_WRITE should always be included in maxprot regardless of permissions on the underlying FD. Otherwise it is possible to open a shm object read-only, map it with MAP_PRIVATE and PROT_WRITE, and violate the invariant in vm_map_insert() that (prot & maxprot) == prot. Reported by: syzkaller Reviewed by: kevans, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24398	2020-04-13 19:20:39 +00:00
Kyle Evans	605c4cda2f	close_range/closefrom: fix regression from close_range introduction close_range will clamp the range between [0, fdp->fd_lastfile], but failed to take into account that fdp->fd_lastfile can become -1 if all fds are closed. =-( In this scenario, just return because there's nothing further we can do at the moment. Add a test case for this, fork() and simply closefrom(0) twice in the child; on the second invocation, fdp->fd_lastfile == -1 and will trigger a panic before this change. X-MFC-With: r359836	2020-04-13 17:55:31 +00:00
Kyle Evans	3d224fc909	sysent: re-roll after introduction of close_range in r359836	2020-04-12 21:23:51 +00:00
Kyle Evans	472ced39ef	Implement a close_range(2) syscall close_range(min, max, flags) allows for a range of descriptors to be closed. The Python folk have indicated that they would much prefer this interface to closefrom(2), as the case may be that they/someone have special fds dup'd to higher in the range and they can't necessarily closefrom(min) because they don't want to hit the upper range, but relocating them to lower isn't necessarily feasible. sys_closefrom has been rewritten to use kern_close_range() using ~0U to indicate closing to the end of the range. This was chosen rather than requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the call to kern_close_range for simplicity. The flags argument of close_range(2) is currently unused, so any flags set is currently EINVAL. It was added to the interface in Linux so that future flags could be added for, e.g., "halt on first error" and things of this nature. This patch is based on a syscall of the same design that is expected to be merged into Linux. Reviewed by: kib, markj, vangyzen (all slightly earlier revisions) Differential Revision: https://reviews.freebsd.org/D21627	2020-04-12 21:23:19 +00:00
Konstantin Belousov	d25f1b21c9	sendfile_iodone: correct calculation of the page index for relookup. This is yet another bug in r359473. Reported and tested by: delphij Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2020-04-12 05:10:48 +00:00
Mark Johnston	25f4ddfb2b	sbappendcontrol() needs to avoid clearing M_NOTREADY on data mbufs. If LOCAL_CREDS is set on a unix socket and sendfile() is called, sendfile will call uipc_send(PRUS_NOTREADY), prepending a control message to the M_NOTREADY mbufs. uipc_send() then calls sbappendcontrol() instead of sbappend(), and sbappendcontrol() would erroneously clear M_NOTREADY. Pass send flags to sbappendcontrol(), like we do for sbappend(), to preserve M_READY when necessary. Reported by: syzkaller MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24333	2020-04-10 20:42:11 +00:00
Mark Johnston	a50b1900a0	Properly handle disconnected sockets in uipc_ready(). When transmitting over a unix socket, data is placed directly into the receiving socket's receive buffer, instead of the transmitting socket's send buffer. This means that when pru_ready is called during sendfile(), the passed socket does not contain M_NOTREADY mbufs in its buffers; uipc_ready() must locate the linked socket. Currently uipc_ready() frees the mbufs if the socket is disconnected, but this is wrong since the mbufs may still be present in the receiving socket's buffer after a disconnect. This can result in a use-after-free and potentially a double free if the receive buffer is flushed after uipc_ready() frees the mbufs. Fix the problem by trying harder to locate the correct socket buffer and calling sbready(): use the global list of SOCK_STREAM unix sockets to search for a sockbuf containing the now-ready mbufs. Only free the mbufs if we fail this search. Reviewed by: jah, kib Reported and tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24332	2020-04-10 20:41:59 +00:00
Konstantin Belousov	f709eee61a	Do not pass bogus page to mbufs. This is a bug in r359473. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2020-04-10 01:28:47 +00:00
Kirk McKusick	9a79b99003	When running with a kernel compiled with DEBUG_LOCKS, before panic'ing for recusing on a non-recursive lock, print out the kernel stack where the lock was originally acquired.	2020-04-09 23:42:13 +00:00
Konstantin Belousov	90f29198c3	Remove extra call to vfs_op_exit() from vfs_write_suspend() when VFS_SYNC() fails. The vfs_write_resume() handler already does vfs_op_exit() for us. Reported by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation	2020-04-09 18:38:00 +00:00
Rick Macklem	8de97f394e	Remove the old NFS lock device driver that uses Giant. This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and has not normally been used since then. To use it, the kernel had to be built without "options NFSLOCKD" and the nfslockd.ko had to be deleted as well. Since it uses Giant and is no longer used, this patch removes it. With this device driver removed, there is now a lot of unused code in the userland rpc.lockd. That will be removed on a future commit. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22933	2020-04-09 14:44:46 +00:00
Conrad Meyer	1e9ee2b596	ddb(4): show lockchain: Don't dereference LK_KERNPROC Also, print a little more information for otherwise unhandled inhibited states. Finally, improve the grammar of some prints. Some of the print statements missing verb. Sponsored by: Dell EMC Isilon	2020-04-02 20:47:51 +00:00
John Baldwin	59838c1a19	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a supserset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837	2020-04-01 19:22:09 +00:00
Jason A. Harmening	8abaf6a7c8	deadlkres: include thread name in panic messages Reviewed by: markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24235	2020-04-01 04:51:39 +00:00
Andrew Gallatin	c4ad247b7a	KTLS: Coalesce adjacent TLS trailers & headers to improve PCIe bus efficiency KTLS uses the embedded header and trailer fields of unmapped mbufs. This can lead to "silly" buffer lengths, where we have an mbuf chain that will create a scatter/gather lists with a regular pattern of 13 bytes followed by 16 bytes between each adjacent TLS record. For software ktls we typically wind up with a pattern where we have several TLS records encrypted, and made ready at once. When these records are made ready, we can coalesce these silly buffers in sbready_compress by copying 13b TLS header of the next record into the 16b TLS trailer of the current record. After doing so, we now have a small 29 byte chunk between each TLS record. This marginally increases PCIe bus efficiency. We've seen an almost 1Gb/s increase in peak throughput on Broadwell based Xeons running a 100% software TLS workload with Mellanox ConnectX-4 NICs. Note that this change is ifdef'ed for KTLS, as KTLS is currently the only user of the hdr/trailer feature of unmapped mbufs, and peeking into them is expensive, since the ext_pgs struct lives in separately allocated memory, and may be cold in cache. This optimization is not applicable to HW ("NIC") TLS, as that depends on having the entire TLS record described by a single unmapped mbuf, so we cannot shift parts of the record between mbufs for HW TLS. Reviewed by: jhb, hselasky, scottl Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24204	2020-03-30 23:29:53 +00:00
Konstantin Belousov	c506a6386f	kern_sendfile.c: fix bugs with handling of busy page states. - Do not call into a vnode pager while leaving some pages from the same block as the current run, xbusy. This immediately deadlocks if pager needs to instantiate the buffer. - Only relookup bogus pages after io finished, otherwise we might obliterate the valid pages by out of date disk content. While there, expand the comment explaining this pecularity. - Do not double-unbusy on error. Split unbusy for error case, which is left in the sendfile_swapin(), from the more properly coded normal case in sendfile_iodone(). - Add an XXXKIB comment explaining the serious bug in the validation algorithm, not fixed by this patch series. PR: 244713 Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 22:13:32 +00:00
Konstantin Belousov	d866353677	kern_sendfile.c: do not release sfio reference on error. It is already done by sendfile_iodone(), now consistently for all errors. This de-facto reverts r358597, after r359466. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 22:01:36 +00:00
Konstantin Belousov	59e1ac9d79	kern_sendfile.c: wait for all in-flight ios completion before unwiring pages. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:57:28 +00:00
Konstantin Belousov	8f0a223c78	kern_sendfile.c: add specific malloc type. Now sfio leaks are more easily seen in the malloc statistics than e.g. just wired or busy pages leak. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:50:51 +00:00
Konstantin Belousov	abfdf76791	VOP_GETPAGES_ASYNC(): consistently call iodone() callback in case of error. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:44:30 +00:00
Konstantin Belousov	bbca4bd7cd	buffer pager: skip bogus pages. We cannot validate bogus page by reading a buffer. PR: 244713 Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:42:46 +00:00
Konstantin Belousov	0ac8511aef	kern_sendfile.c style: order headers alphabetically. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:40:35 +00:00
Ed Maste	fa9ce9ac29	capabilities.conf: provide information about capmode permitted syscalls Reviewed by: jhb (earlier) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24118	2020-03-30 18:24:07 +00:00
Mark Johnston	9b1d850be8	Remove the "config" taskqgroup and its KPIs. Equivalent functionality is already provided by taskqueue(9), just use that instead. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-30 14:24:03 +00:00
Mark Johnston	59d50fe5ef	Simplify taskqgroup inititialization. taskqgroup initialization was broken into two steps: 1. allocate the taskqgroup structure, at SI_SUB_TASKQ; 2. initialize taskqueues, start taskqueue threads, enqueue "binder" tasks to bind threads to specific CPUs, at SI_SUB_SMP. Step 2 tries to handle the case where tasks have already been attached to a queue, by migrating them to their intended queue. In particular, tasks can't be enqueued before step 2 has completed. This breaks NFS mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP is not defined, since mountroot happens before SI_SUB_SMP in this case. Simplify initialization: do all initialization except for CPU binding at SI_SUB_TASKQ. This means that until CPU binding is completed, group tasks may be executed on a CPU other than that to which they were bound, but this should not be a problem for existing users of the taskqgroup KPIs. Reported by: sbruno Tested by: bdragon, sbruno MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24188	2020-03-30 14:22:52 +00:00
John Baldwin	c034143269	Refactor driver and consumer interfaces for OCF (in-kernel crypto). - The linked list of cryptoini structures used in session initialization is replaced with a new flat structure: struct crypto_session_params. This session includes a new mode to define how the other fields should be interpreted. Available modes include: - COMPRESS (for compression/decompression) - CIPHER (for simply encryption/decryption) - DIGEST (computing and verifying digests) - AEAD (combined auth and encryption such as AES-GCM and AES-CCM) - ETA (combined auth and encryption using encrypt-then-authenticate) Additional modes could be added in the future (e.g. if we wanted to support TLS MtE for AES-CBC in the kernel we could add a new mode for that. TLS modes might also affect how AAD is interpreted, etc.) The flat structure also includes the key lengths and algorithms as before. However, code doesn't have to walk the linked list and switch on the algorithm to determine which key is the auth key vs encryption key. The 'csp_auth_' fields are always used for auth keys and settings and 'csp_cipher_' for cipher. (Compression algorithms are stored in csp_cipher_alg.) - Drivers no longer register a list of supported algorithms. This doesn't quite work when you factor in modes (e.g. a driver might support both AES-CBC and SHA2-256-HMAC separately but not combined for ETA). Instead, a new 'crypto_probesession' method has been added to the kobj interface for symmteric crypto drivers. This method returns a negative value on success (similar to how device_probe works) and the crypto framework uses this value to pick the "best" driver. There are three constants for hardware (e.g. ccr), accelerated software (e.g. aesni), and plain software (cryptosoft) that give preference in that order. One effect of this is that if you request only hardware when creating a new session, you will no longer get a session using accelerated software. Another effect is that the default setting to disallow software crypto via /dev/crypto now disables accelerated software. Once a driver is chosen, 'crypto_newsession' is invoked as before. - Crypto operations are now solely described by the flat 'cryptop' structure. The linked list of descriptors has been removed. A separate enum has been added to describe the type of data buffer in use instead of using CRYPTO_F_* flags to make it easier to add more types in the future if needed (e.g. wired userspace buffers for zero-copy). It will also make it easier to re-introduce separate input and output buffers (in-kernel TLS would benefit from this). Try to make the flags related to IV handling less insane: - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv' member of the operation structure. If this flag is not set, the IV is stored in the data buffer at the 'crp_iv_start' offset. - CRYPTO_F_IV_GENERATE means that a random IV should be generated and stored into the data buffer. This cannot be used with CRYPTO_F_IV_SEPARATE. If a consumer wants to deal with explicit vs implicit IVs, etc. it can always generate the IV however it needs and store partial IVs in the buffer and the full IV/nonce in crp_iv and set CRYPTO_F_IV_SEPARATE. The layout of the buffer is now described via fields in cryptop. crp_aad_start and crp_aad_length define the boundaries of any AAD. Previously with GCM and CCM you defined an auth crd with this range, but for ETA your auth crd had to span both the AAD and plaintext (and they had to be adjacent). crp_payload_start and crp_payload_length define the boundaries of the plaintext/ciphertext. Modes that only do a single operation (COMPRESS, CIPHER, DIGEST) should only use this region and leave the AAD region empty. If a digest is present (or should be generated), it's starting location is marked by crp_digest_start. Instead of using the CRD_F_ENCRYPT flag to determine the direction of the operation, cryptop now includes an 'op' field defining the operation to perform. For digests I've added a new VERIFY digest mode which assumes a digest is present in the input and fails the request with EBADMSG if it doesn't match the internally-computed digest. GCM and CCM already assumed this, and the new AEAD mode requires this for decryption. The new ETA mode now also requires this for decryption, so IPsec and GELI no longer do their own authentication verification. Simple DIGEST operations can also do this, though there are no in-tree consumers. To eventually support some refcounting to close races, the session cookie is now passed to crypto_getop() and clients should no longer set crp_sesssion directly. - Assymteric crypto operation structures should be allocated via crypto_getkreq() and freed via crypto_freekreq(). This permits the crypto layer to track open asym requests and close races with a driver trying to unregister while asym requests are in flight. - crypto_copyback, crypto_copydata, crypto_apply, and crypto_contiguous_subsegment now accept the 'crp' object as the first parameter instead of individual members. This makes it easier to deal with different buffer types in the future as well as separate input and output buffers. It's also simpler for driver writers to use. - bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer. This understands the various types of buffers so that drivers that use DMA do not have to be aware of different buffer types. - Helper routines now exist to build an auth context for HMAC IPAD and OPAD. This reduces some duplicated work among drivers. - Key buffers are now treated as const throughout the framework and in device drivers. However, session key buffers provided when a session is created are expected to remain alive for the duration of the session. - GCM and CCM sessions now only specify a cipher algorithm and a cipher key. The redundant auth information is not needed or used. - For cryptosoft, split up the code a bit such that the 'process' callback now invokes a function pointer in the session. This function pointer is set based on the mode (in effect) though it simplifies a few edge cases that would otherwise be in the switch in 'process'. It does split up GCM vs CCM which I think is more readable even if there is some duplication. - I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC as an auth algorithm and updated cryptocheck to work with it. - Combined cipher and auth sessions via /dev/crypto now always use ETA mode. The COP_F_CIPHER_FIRST flag is now a no-op that is ignored. This was actually documented as being true in crypto(4) before, but the code had not implemented this before I added the CIPHER_FIRST flag. - I have not yet updated /dev/crypto to be aware of explicit modes for sessions. I will probably do that at some point in the future as well as teach it about IV/nonce and tag lengths for AEAD so we can support all of the NIST KAT tests for GCM and CCM. - I've split up the exising crypto.9 manpage into several pages of which many are written from scratch. - I have converted all drivers and consumers in the tree and verified that they compile, but I have not tested all of them. I have tested the following drivers: - cryptosoft - aesni (AES only) - blake2 - ccr and the following consumers: - cryptodev - IPsec - ktls_ocf - GELI (lightly) I have not tested the following: - ccp - aesni with sha - hifn - kgssapi_krb5 - ubsec - padlock - safe - armv8_crypto (aarch64) - glxsb (i386) - sec (ppc) - cesa (armv7) - cryptocteon (mips64) - nlmsec (mips64) Discussed with: cem Relnotes: yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D23677	2020-03-27 18:25:23 +00:00
Mark Johnston	b7e3a3b6e1	Remove unused SYSINIT macros for capability rights. Static rights are initialized in cap_rights_sysinit(). MFC after: 1 week	2020-03-26 15:02:37 +00:00
Conrad Meyer	ca0ec73c11	Expand generic subword atomic primitives The goal of this change is to make the atomic_load_acq_{8,16}, atomic_testandset{,_acq}_long, and atomic_testandclear_long primitives available in MI-namespace. The second goal is to get this draft out of my local tree, as anything that requires a full tinderbox is a big burden out of tree. MD specifics can be refined individually afterwards. The generic implementations may not be ideal for your architecture; feel free to implement better versions. If no subword_atomic definitions are needed, the include can be removed from your arch's machine/atomic.h. Generic definitions are guarded by defined macros of the same name. To avoid picking up conflicting generic definitions, some macro defines are added to various MD machine/atomic.h to register an existing implementation. Include _atomic_subword.h in arm and arm64 machine/atomic.h. For some odd reason, KCSAN only generates some versions of primitives. Generate the _acq variants of atomic_load._8, atomic_load._16, and atomic_testandset.*_long. There are other questionably disabled primitives, but I didn't run into them, so I left them alone. KCSAN is only built for amd64 in tinderbox for now. Add atomic_subword implementations of atomic_load_acq_{8,16} implemented using masking and atomic_load_acq_32. Add generic atomic_subword implementations of atomic_testandset_long(), atomic_testandclear_long(), and atomic_testandset_acq_long(), using atomic_fcmpset_long() and atomic_fcmpset_acq_long(). On x86, add atomic_testandset_acq_long as an alias for atomic_testandset_long. Reviewed by: kevans, rlibby (previous versions both) Differential Revision: https://reviews.freebsd.org/D22963	2020-03-25 23:12:43 +00:00
Konstantin Belousov	8cf8c2f65a	kern_copy_file_range(): check the file type. The syscall can only operate on valid vnode types. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2020-03-24 17:16:52 +00:00
Rick Macklem	f9122b6488	Fix an NFS mount attempt where VFS_STATFS() fails. r353150 added mnt_rootvnode and this seems to have broken NFS mounts when the VFS_STATFS() called just after VFS_MOUNT() returns an error. Then the code calls VFS_UNMOUNT(), which calls vflush(), which returns EBUSY. Then the thread get stuck sleeping on "mntref" in vfs_mount_destroy(). This patch fixes this problem. Reviewed by: kib, mjg Differential Revision: https://reviews.freebsd.org/D24022	2020-03-22 18:18:30 +00:00
Mark Johnston	99258935eb	Lock the socket in soo_stat(). Otherwise nothing synchronizes with a concurrent conversion of the socket to a listening socket. Only the PF_LOCAL protocols implement pru_sense, and it is safe to hold the socket lock there, so do so for now. Reported by: syzbot+4801f1b79ea40953ca8e@syzkaller.appspotmail.com MFC after: 1 week Sponsored by: The FreeBSD Foundation	2020-03-20 20:09:00 +00:00
Mark Johnston	db38b699b4	Simplify uipc_detach() slightly. Remove a goto and an unneeded local variable, and fix style. No functional change intended. Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-20 16:18:54 +00:00
Mark Johnston	569a186d02	Remove UNP_NASCENT, reverting r303855. unp_connectat() no longer holds the link lock across calls to sonewconn(), so the recursion described in r303855 can no longer occur. No functional change intended. Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-20 16:17:54 +00:00
Mark Johnston	429537caeb	kern_dup(): Call filecaps_free_prep() in a write section. filecaps_free_prep() bzeros the capabilities structure and we need to be careful to synchronize with unlocked readers, which expect a consistent rights structure. Reviewed by: kib, mjg Reported by: syzbot+5f30b507f91ddedded21@syzkaller.appspotmail.com MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24120	2020-03-19 15:40:05 +00:00
Mark Johnston	2d896b816b	Enter a write sequence when updating rights. The Capsicum system calls modify file descriptor table entries. To ensure that readers observe a consistent snapshot of descriptor writes, the system calls need to signal to unlocked readers that an update is pending. Note that ioctl rights are always checked with the descriptor table lock held, so it is not strictly necessary to signal unlocked readers. However, we probably want to enable lockless ioctl checks eventually, so use seqc_write_begin() in kern_cap_ioctls_limit() too. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24119	2020-03-19 15:39:45 +00:00
Brandon Bergren	3069380898	[PowerPC][Book-E] Fix missing load base in elf_cpu_parse_dynamic(). When I implemented MD DYNAMIC parsing, I was originally passing a linker_file_t so that the MD code could relocate pointers. However, it turns out this isn't even filled in until later, so it was always 0. Just pass the load base (ef->address) directly, as that's really the only thing we were interested in in the first place. This fixes a crash on RB800 where it was trying to write to an unmapped address when updating the GOT. Reviewed by: jhibbits Sponsored by: Tag1 Consulting, Inc. Differential Revision: https://reviews.freebsd.org/D24105	2020-03-18 02:58:18 +00:00
Conrad Meyer	34086d5bda	Implement sysctl kern.boot_id Boot IDs are random, opaque 128-bit identifiers that distinguish distinct system boots. A new ID is generated each time the system boots. Unlike kern.boottime, the value is not modified by NTP adjustments. It remains fixed until the machine is restarted. PR: 244867 Reported by: Ricardo Fraile <rfraile AT rfraile.eu> MFC after: I do not intend to, but feel free	2020-03-17 22:27:16 +00:00
Conrad Meyer	a99c321802	Remove misleading / redundant bzero in callout_callwheel_init The intent seems to be zeroing all of the cc_cpu array, or its singleton on such platforms. The assumption made is that the BSP is always zero. The code smell was introduced in r326218, which changed the prior explicit zero to 'curcpu'. The change is only valid if curcpu continues to be zero, contrary to the aim expressed in that commit message. So, more succinctly, the expression could be: memset(cc_cpu,0,sizeof(cc_cpu)). However, there's no point. cc_cpu lives in the data section and has a zero initial value already. So this revision just removes the problematic statement. No functional change. Appeases a (false positive, ish) Coverity CID. CID: 1383567 Reported by: Puneeth Jothaiah <puneethkumar.jothaia AT dell.com> Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D24089	2020-03-16 22:25:25 +00:00
Bjoern A. Zeeb	1b786d0191	kern_jail: missing \0 termination check on osrelease parameter If a user spplies a non-\0 terminated osrelease parameter reading it back may disclose kernel memory. This is a problem in case of nested jails (children.max > 0, which is not the default). Otherwise root outside the jail has access to kernel memory by other means and root inside a jail cannot create a child jail. Add the proper \0 check at the end of a supplied osrelease parameter and make sure any copies of the field will be \0-terminated. Submitted by: Hans Christian Woithe (chwoithe yahoo.com) MFC after: 3 days	2020-03-14 14:04:55 +00:00
Michael Tuexen	db4493f7b6	sendfile() does currently not support SCTP sockets. Therefore, fail the call. Reviewed by: markj@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24059	2020-03-13 18:38:28 +00:00
Conrad Meyer	7a119578a4	kern_shutdown: Add missing EKCD ifdef Submitted by: Puneeth Jothaiah <puneethkumar.jothaia AT dell.com> Reviewed by: bdrewery Sponsored by: Dell EMC Isilon	2020-03-12 21:26:36 +00:00
Konstantin Belousov	00ebd80972	Fix signal delivery might be on sigfastblock clearing. When clearing sigfastblock, either by sigfastblock(UNSETPTR) call or implicitly on execve(2), kernel must check for pending signals and reschedule them if needed. E.g. on execve, all other threads are terminated, and current thread fast block pointer is cleaned. If any signal was left pending, it can now be delivered to the current thread, and we should prepare for ast() on return to userspace to notice the signals. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2020-03-10 20:25:03 +00:00
Konstantin Belousov	0bc52b0bdb	Return reschedule_signals() to being static again. It was used after sigfastblock_setpend() call in in ast() when current thread fast-blocks signals. Add a flag to sigfastblock_setpend() to request reschedule, and remove the direct use of the function from subr_trap.c Tested by: pho Sponsored by: The FreeBSD Foundation	2020-03-10 20:04:38 +00:00
Konstantin Belousov	2d3c083fd7	pipe: explain why not deallocating inode number is fine. Suggested and reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24009	2020-03-09 23:40:25 +00:00
Konstantin Belousov	c6d3d601c9	Preallocate pipe buffers on pipe creation. Return ENOMEM if one of the buffer cannot be created even with the minimal size. This should avoid subsequent spurious ENOMEM errors from write(2) when buffer cannot be allocated on the fly, after we reported that the pipe was create succesfully. Reported by: Keno Fischer <keno@juliacomputing.com> Reviewed by: markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23993	2020-03-09 21:55:26 +00:00
Konstantin Belousov	1213de28f8	Style. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23993	2020-03-09 19:46:28 +00:00
Andrew Gallatin	98085bae8c	make lacp's use_numa hashing aware of send tags When I did the use_numa support, I missed the fact that there is a separate hash function for send tag nic selection. So when use_numa is enabled, ktls offload does not work properly, as it does not reliably allocate a send tag on the proper egress nic since different egress nics are selected for send-tag allocation and packet transmit. To fix this, this change: - refectors lacp_select_tx_port_by_hash() and lacp_select_tx_port() to make lacp_select_tx_port_by_hash() always called by lacp_select_tx_port() - pre-shifts flowids to convert them to hashes when calling lacp_select_tx_port_by_hash() - adds a numa_domain field to if_snd_tag_alloc_params - plumbs the numa domain into places where we allocate send tags In testing with NIC TLS setup on a NUMA machine, I see thousands of output errors before the change when enabling kern.ipc.tls.ifnet.permitted=1. After the change, I see no errors, and I see the NIC sysctl counters showing active TLS offload sessions. Reviewed by: rrs, hselasky, jhb Sponsored by: Netflix	2020-03-09 13:44:51 +00:00
Mateusz Guzik	d2222aa0e9	fd: use smr for managing struct pwd This has a side effect of eliminating filedesc slock/sunlock during path lookup, which in turn removes contention vs concurrent modifications to the fd table. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D23889	2020-03-08 00:23:36 +00:00
Mark Johnston	d869a17e62	Use COUNTER_U64_DEFINE_EARLY() in places where it simplifies things. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23978	2020-03-06 19:10:00 +00:00
Mark Johnston	fffcb56f7a	Add COUNTER_U64_SYSINIT() and COUNTER_U64_DEFINE_EARLY(). The aim is to reduce the boilerplate needed today to define and initialize global counters. Also add SI_SUB_COUNTER to the sysinit ordering. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23977	2020-03-06 19:09:01 +00:00
Chuck Silvers	f15ccf8836	Add a new "mntfs" pseudo file system which provides private device vnodes for file systems to safely access their disk devices, and adapt FFS to use it. Also add a new BO_NOBUFS flag to allow enforcing that file systems using mntfs vnodes do not accidentally use the original devfs vnode to create buffers. Reviewed by: kib, mckusick Approved by: imp (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23787	2020-03-06 18:41:37 +00:00
Konstantin Belousov	695e0701a0	buffer pager: deref ucred immediately after read. Ucred is passed to bread(9) so that non-local filesystems use proper credentials. But, since clean buffer might be cached unless buf_pager_relbuf is not enabled, it makes credentials to have extra reference until buffer is reclaimed. Ucred reference would prevent jail from destroying if creds are jailed. Dereferencing the read credentials on the valid buffer avoid that, and should be fine because the buffer is valid and does not need re-read. PR: 238032 Reported by: bz Reproduced and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23775	2020-03-05 15:52:34 +00:00
Mateusz Guzik	8d4d271e92	execve: use LOCKSHARED when looking up the interpreter Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23956	2020-03-04 19:52:34 +00:00
Chuck Silvers	37bf88e790	if vm_pager_get_pages_async() returns an error, release the sfio->nios refcount that we took earlier that represents the I/O that ended up not being started. Reviewed by: glebius Approved by: imp (mentor) Sponsored by: Netflix	2020-03-04 00:22:50 +00:00
Bjoern A. Zeeb	a2fba2a700	upic_ktrls: make RSS compile again here The results of ktls_get_cpu() are stored in u_int and NETISR_CPUID_NONE requires u_int. Adjust uint16_t to uint_t in order to make RSS kernels compile some more again. HPTS still has to be fixed, which is a bit more complicated. Reviewed by: jhb, gallatin, rrs Differential Revision: https://reviews.freebsd.org/D23726	2020-03-03 14:07:44 +00:00
Mark Johnston	4cf919edb9	Fix the malloc type used in sys_shm_unlink() after r354808. PR: 244563 Reported by: swills	2020-03-03 00:28:37 +00:00
Pawel Biernacki	b05ca4290c	sys/: Document few more sysctls. Submitted by: Antranig Vartanian <antranigv@freebsd.am> Reviewed by: kaktus Commented by: jhb Approved by: kib (mentor) Sponsored by: illuria security Differential Revision: https://reviews.freebsd.org/D23759	2020-03-02 15:30:52 +00:00
Mateusz Guzik	2f423bce54	vfs: stop taking additional refs on root vnode during lookup They are spurious since introduction of struct pwd, which provides them implicitly. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23885	2020-03-01 21:54:28 +00:00
Mateusz Guzik	8d03b99b9d	fd: move vnodes out of filedesc into a dedicated structure The new structure is copy-on-write. With the assumption that path lookups are significantly more frequent than chdirs and chrooting this is a win. This provides stable root and jail root vnodes without the need to reference them on lookup, which in turn means less work on globally shared structures. Note this also happens to fix a bug where jail vnode was never referenced, meaning subsequent access on lookup could run into use-after-free. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23884	2020-03-01 21:53:46 +00:00
Mateusz Guzik	8243063f9b	fd: make fgetvp_rights work without the filedesc lock Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23883	2020-03-01 21:50:13 +00:00
Mark Johnston	5aa5420ff2	Ensure that arm64 thread structures are allocated from the direct map. Otherwise we can fail to handle translation faults on curthread, leading to a panic. Reviewed by: alc, rlibby Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23895	2020-02-29 18:41:48 +00:00
Jeff Roberson	6be21eb778	Provide a lock free alternative to resolve bogus pages. This is not likely to be much of a perf win, just a nice code simplification. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D23866	2020-02-28 21:42:48 +00:00
Jeff Roberson	7aaf252c96	Convert a few triviail consumers to the new unlocked grab API. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23847	2020-02-28 20:34:30 +00:00
Jeff Roberson	f72eaaeb03	Use unlocked grab for uipc_shm/tmpfs. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23865	2020-02-28 20:33:28 +00:00
Mark Johnston	46994ec2b1	Fix standalone builds of systrace.ko after r357912. Sponsored by: The FreeBSD Foundation	2020-02-28 17:05:04 +00:00
Mark Johnston	c99d0c5801	Add a blocking counter KPI. refcount(9) was recently extended to support waiting on a refcount to drop to zero, as this was needed for a lockless VM object paging-in-progress counter. However, this adds overhead to all uses of refcount(9) and doesn't really match traditional refcounting semantics: once a counter has dropped to zero, the protected object may be freed at any point and it is not safe to dereference the counter. This change removes that extension and instead adds a new set of KPIs, blockcount_*, for use by VM object PIP and busy. Reviewed by: jeff, kib, mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23723	2020-02-28 16:05:18 +00:00
Jeff Roberson	561af25fa7	Simplify lazy advance with a 64bit atomic cmpset. This provides the potential to force a lazy (tick based) SMR to advance when there are blocking waiters by decoupling the wr_seq value from the ticks value. Add some missing compiler barriers. Reviewed by: rlibby Differential Revision: https://reviews.freebsd.org/D23825	2020-02-27 19:05:26 +00:00
Warner Losh	729ea680be	Remove trailing white space.	2020-02-26 16:22:28 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Gleb Smirnoff	6bc27f086a	Generalize resources freeing in sendfile with different scenarios. Now we execute sendfile_iodone() in all possible cases, which guarantees that vm_object_pip_wakeup() is called and sfio structure is freed. At the beginning of sendfile initialize sfio->m to NULL, that would indicate that the mbuf chain either doesn't exist, or belongs to the syscall (not to I/O completion). Fill sfio->m only at a point when we are positive that there are I/Os ongoing and before releasing syscall's reference on sfio. In sendfile_iodone() perform vm_object_pip_wakeup() once last reference is released, then check for sfio->m. NULL pointer indicates that we need only to free the memory. Reviewed by: jtl, gallatin	2020-02-25 19:29:05 +00:00
Gleb Smirnoff	f85e1a806b	Make ktls_frame() never fail. Caller must supply correct mbufs. This makes sendfile code a bit simplier.	2020-02-25 19:26:40 +00:00
Gleb Smirnoff	69302907d6	When sendfile_swapin() sweeps through pages in search for a bogus page skip first and last pages. This is a micro optimisation.	2020-02-25 19:11:20 +00:00
Ryan Libby	fe20aaec0a	sys/kern: quiet -Wwrite-strings Quiet a variety of Wwrite-strings warnings in sys/kern at low-impact sites. This patch avoids addressing certain others which would need to plumb const through structure definitions. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23798	2020-02-23 03:32:16 +00:00
Ryan Libby	2782c00c04	vfs: quiet -Wwrite-strings Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23797	2020-02-23 03:32:11 +00:00
Ryan Libby	eaa17d4291	sys/vm: quiet -Wwrite-strings Discussed with: kib Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23796	2020-02-23 03:32:04 +00:00

1 2 3 4 5 ...

17398 Commits