freebsd-dev

Author	SHA1	Message	Date
Kyle Evans	3d224fc909	sysent: re-roll after introduction of close_range in r359836	2020-04-12 21:23:51 +00:00
Kyle Evans	472ced39ef	Implement a close_range(2) syscall close_range(min, max, flags) allows for a range of descriptors to be closed. The Python folk have indicated that they would much prefer this interface to closefrom(2), as the case may be that they/someone have special fds dup'd to higher in the range and they can't necessarily closefrom(min) because they don't want to hit the upper range, but relocating them to lower isn't necessarily feasible. sys_closefrom has been rewritten to use kern_close_range() using ~0U to indicate closing to the end of the range. This was chosen rather than requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the call to kern_close_range for simplicity. The flags argument of close_range(2) is currently unused, so any flags set is currently EINVAL. It was added to the interface in Linux so that future flags could be added for, e.g., "halt on first error" and things of this nature. This patch is based on a syscall of the same design that is expected to be merged into Linux. Reviewed by: kib, markj, vangyzen (all slightly earlier revisions) Differential Revision: https://reviews.freebsd.org/D21627	2020-04-12 21:23:19 +00:00
Konstantin Belousov	d25f1b21c9	sendfile_iodone: correct calculation of the page index for relookup. This is yet another bug in r359473. Reported and tested by: delphij Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2020-04-12 05:10:48 +00:00
Mark Johnston	25f4ddfb2b	sbappendcontrol() needs to avoid clearing M_NOTREADY on data mbufs. If LOCAL_CREDS is set on a unix socket and sendfile() is called, sendfile will call uipc_send(PRUS_NOTREADY), prepending a control message to the M_NOTREADY mbufs. uipc_send() then calls sbappendcontrol() instead of sbappend(), and sbappendcontrol() would erroneously clear M_NOTREADY. Pass send flags to sbappendcontrol(), like we do for sbappend(), to preserve M_READY when necessary. Reported by: syzkaller MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24333	2020-04-10 20:42:11 +00:00
Mark Johnston	a50b1900a0	Properly handle disconnected sockets in uipc_ready(). When transmitting over a unix socket, data is placed directly into the receiving socket's receive buffer, instead of the transmitting socket's send buffer. This means that when pru_ready is called during sendfile(), the passed socket does not contain M_NOTREADY mbufs in its buffers; uipc_ready() must locate the linked socket. Currently uipc_ready() frees the mbufs if the socket is disconnected, but this is wrong since the mbufs may still be present in the receiving socket's buffer after a disconnect. This can result in a use-after-free and potentially a double free if the receive buffer is flushed after uipc_ready() frees the mbufs. Fix the problem by trying harder to locate the correct socket buffer and calling sbready(): use the global list of SOCK_STREAM unix sockets to search for a sockbuf containing the now-ready mbufs. Only free the mbufs if we fail this search. Reviewed by: jah, kib Reported and tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24332	2020-04-10 20:41:59 +00:00
Konstantin Belousov	f709eee61a	Do not pass bogus page to mbufs. This is a bug in r359473. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2020-04-10 01:28:47 +00:00
Kirk McKusick	9a79b99003	When running with a kernel compiled with DEBUG_LOCKS, before panic'ing for recusing on a non-recursive lock, print out the kernel stack where the lock was originally acquired.	2020-04-09 23:42:13 +00:00
Konstantin Belousov	90f29198c3	Remove extra call to vfs_op_exit() from vfs_write_suspend() when VFS_SYNC() fails. The vfs_write_resume() handler already does vfs_op_exit() for us. Reported by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation	2020-04-09 18:38:00 +00:00
Rick Macklem	8de97f394e	Remove the old NFS lock device driver that uses Giant. This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and has not normally been used since then. To use it, the kernel had to be built without "options NFSLOCKD" and the nfslockd.ko had to be deleted as well. Since it uses Giant and is no longer used, this patch removes it. With this device driver removed, there is now a lot of unused code in the userland rpc.lockd. That will be removed on a future commit. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22933	2020-04-09 14:44:46 +00:00
Conrad Meyer	1e9ee2b596	ddb(4): show lockchain: Don't dereference LK_KERNPROC Also, print a little more information for otherwise unhandled inhibited states. Finally, improve the grammar of some prints. Some of the print statements missing verb. Sponsored by: Dell EMC Isilon	2020-04-02 20:47:51 +00:00
John Baldwin	59838c1a19	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a supserset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837	2020-04-01 19:22:09 +00:00
Jason A. Harmening	8abaf6a7c8	deadlkres: include thread name in panic messages Reviewed by: markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24235	2020-04-01 04:51:39 +00:00
Andrew Gallatin	c4ad247b7a	KTLS: Coalesce adjacent TLS trailers & headers to improve PCIe bus efficiency KTLS uses the embedded header and trailer fields of unmapped mbufs. This can lead to "silly" buffer lengths, where we have an mbuf chain that will create a scatter/gather lists with a regular pattern of 13 bytes followed by 16 bytes between each adjacent TLS record. For software ktls we typically wind up with a pattern where we have several TLS records encrypted, and made ready at once. When these records are made ready, we can coalesce these silly buffers in sbready_compress by copying 13b TLS header of the next record into the 16b TLS trailer of the current record. After doing so, we now have a small 29 byte chunk between each TLS record. This marginally increases PCIe bus efficiency. We've seen an almost 1Gb/s increase in peak throughput on Broadwell based Xeons running a 100% software TLS workload with Mellanox ConnectX-4 NICs. Note that this change is ifdef'ed for KTLS, as KTLS is currently the only user of the hdr/trailer feature of unmapped mbufs, and peeking into them is expensive, since the ext_pgs struct lives in separately allocated memory, and may be cold in cache. This optimization is not applicable to HW ("NIC") TLS, as that depends on having the entire TLS record described by a single unmapped mbuf, so we cannot shift parts of the record between mbufs for HW TLS. Reviewed by: jhb, hselasky, scottl Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24204	2020-03-30 23:29:53 +00:00
Konstantin Belousov	c506a6386f	kern_sendfile.c: fix bugs with handling of busy page states. - Do not call into a vnode pager while leaving some pages from the same block as the current run, xbusy. This immediately deadlocks if pager needs to instantiate the buffer. - Only relookup bogus pages after io finished, otherwise we might obliterate the valid pages by out of date disk content. While there, expand the comment explaining this pecularity. - Do not double-unbusy on error. Split unbusy for error case, which is left in the sendfile_swapin(), from the more properly coded normal case in sendfile_iodone(). - Add an XXXKIB comment explaining the serious bug in the validation algorithm, not fixed by this patch series. PR: 244713 Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 22:13:32 +00:00
Konstantin Belousov	d866353677	kern_sendfile.c: do not release sfio reference on error. It is already done by sendfile_iodone(), now consistently for all errors. This de-facto reverts r358597, after r359466. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 22:01:36 +00:00
Konstantin Belousov	59e1ac9d79	kern_sendfile.c: wait for all in-flight ios completion before unwiring pages. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:57:28 +00:00
Konstantin Belousov	8f0a223c78	kern_sendfile.c: add specific malloc type. Now sfio leaks are more easily seen in the malloc statistics than e.g. just wired or busy pages leak. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:50:51 +00:00
Konstantin Belousov	abfdf76791	VOP_GETPAGES_ASYNC(): consistently call iodone() callback in case of error. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:44:30 +00:00
Konstantin Belousov	bbca4bd7cd	buffer pager: skip bogus pages. We cannot validate bogus page by reading a buffer. PR: 244713 Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:42:46 +00:00
Konstantin Belousov	0ac8511aef	kern_sendfile.c style: order headers alphabetically. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:40:35 +00:00
Ed Maste	fa9ce9ac29	capabilities.conf: provide information about capmode permitted syscalls Reviewed by: jhb (earlier) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24118	2020-03-30 18:24:07 +00:00
Mark Johnston	9b1d850be8	Remove the "config" taskqgroup and its KPIs. Equivalent functionality is already provided by taskqueue(9), just use that instead. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-30 14:24:03 +00:00
Mark Johnston	59d50fe5ef	Simplify taskqgroup inititialization. taskqgroup initialization was broken into two steps: 1. allocate the taskqgroup structure, at SI_SUB_TASKQ; 2. initialize taskqueues, start taskqueue threads, enqueue "binder" tasks to bind threads to specific CPUs, at SI_SUB_SMP. Step 2 tries to handle the case where tasks have already been attached to a queue, by migrating them to their intended queue. In particular, tasks can't be enqueued before step 2 has completed. This breaks NFS mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP is not defined, since mountroot happens before SI_SUB_SMP in this case. Simplify initialization: do all initialization except for CPU binding at SI_SUB_TASKQ. This means that until CPU binding is completed, group tasks may be executed on a CPU other than that to which they were bound, but this should not be a problem for existing users of the taskqgroup KPIs. Reported by: sbruno Tested by: bdragon, sbruno MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24188	2020-03-30 14:22:52 +00:00
John Baldwin	c034143269	Refactor driver and consumer interfaces for OCF (in-kernel crypto). - The linked list of cryptoini structures used in session initialization is replaced with a new flat structure: struct crypto_session_params. This session includes a new mode to define how the other fields should be interpreted. Available modes include: - COMPRESS (for compression/decompression) - CIPHER (for simply encryption/decryption) - DIGEST (computing and verifying digests) - AEAD (combined auth and encryption such as AES-GCM and AES-CCM) - ETA (combined auth and encryption using encrypt-then-authenticate) Additional modes could be added in the future (e.g. if we wanted to support TLS MtE for AES-CBC in the kernel we could add a new mode for that. TLS modes might also affect how AAD is interpreted, etc.) The flat structure also includes the key lengths and algorithms as before. However, code doesn't have to walk the linked list and switch on the algorithm to determine which key is the auth key vs encryption key. The 'csp_auth_' fields are always used for auth keys and settings and 'csp_cipher_' for cipher. (Compression algorithms are stored in csp_cipher_alg.) - Drivers no longer register a list of supported algorithms. This doesn't quite work when you factor in modes (e.g. a driver might support both AES-CBC and SHA2-256-HMAC separately but not combined for ETA). Instead, a new 'crypto_probesession' method has been added to the kobj interface for symmteric crypto drivers. This method returns a negative value on success (similar to how device_probe works) and the crypto framework uses this value to pick the "best" driver. There are three constants for hardware (e.g. ccr), accelerated software (e.g. aesni), and plain software (cryptosoft) that give preference in that order. One effect of this is that if you request only hardware when creating a new session, you will no longer get a session using accelerated software. Another effect is that the default setting to disallow software crypto via /dev/crypto now disables accelerated software. Once a driver is chosen, 'crypto_newsession' is invoked as before. - Crypto operations are now solely described by the flat 'cryptop' structure. The linked list of descriptors has been removed. A separate enum has been added to describe the type of data buffer in use instead of using CRYPTO_F_* flags to make it easier to add more types in the future if needed (e.g. wired userspace buffers for zero-copy). It will also make it easier to re-introduce separate input and output buffers (in-kernel TLS would benefit from this). Try to make the flags related to IV handling less insane: - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv' member of the operation structure. If this flag is not set, the IV is stored in the data buffer at the 'crp_iv_start' offset. - CRYPTO_F_IV_GENERATE means that a random IV should be generated and stored into the data buffer. This cannot be used with CRYPTO_F_IV_SEPARATE. If a consumer wants to deal with explicit vs implicit IVs, etc. it can always generate the IV however it needs and store partial IVs in the buffer and the full IV/nonce in crp_iv and set CRYPTO_F_IV_SEPARATE. The layout of the buffer is now described via fields in cryptop. crp_aad_start and crp_aad_length define the boundaries of any AAD. Previously with GCM and CCM you defined an auth crd with this range, but for ETA your auth crd had to span both the AAD and plaintext (and they had to be adjacent). crp_payload_start and crp_payload_length define the boundaries of the plaintext/ciphertext. Modes that only do a single operation (COMPRESS, CIPHER, DIGEST) should only use this region and leave the AAD region empty. If a digest is present (or should be generated), it's starting location is marked by crp_digest_start. Instead of using the CRD_F_ENCRYPT flag to determine the direction of the operation, cryptop now includes an 'op' field defining the operation to perform. For digests I've added a new VERIFY digest mode which assumes a digest is present in the input and fails the request with EBADMSG if it doesn't match the internally-computed digest. GCM and CCM already assumed this, and the new AEAD mode requires this for decryption. The new ETA mode now also requires this for decryption, so IPsec and GELI no longer do their own authentication verification. Simple DIGEST operations can also do this, though there are no in-tree consumers. To eventually support some refcounting to close races, the session cookie is now passed to crypto_getop() and clients should no longer set crp_sesssion directly. - Assymteric crypto operation structures should be allocated via crypto_getkreq() and freed via crypto_freekreq(). This permits the crypto layer to track open asym requests and close races with a driver trying to unregister while asym requests are in flight. - crypto_copyback, crypto_copydata, crypto_apply, and crypto_contiguous_subsegment now accept the 'crp' object as the first parameter instead of individual members. This makes it easier to deal with different buffer types in the future as well as separate input and output buffers. It's also simpler for driver writers to use. - bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer. This understands the various types of buffers so that drivers that use DMA do not have to be aware of different buffer types. - Helper routines now exist to build an auth context for HMAC IPAD and OPAD. This reduces some duplicated work among drivers. - Key buffers are now treated as const throughout the framework and in device drivers. However, session key buffers provided when a session is created are expected to remain alive for the duration of the session. - GCM and CCM sessions now only specify a cipher algorithm and a cipher key. The redundant auth information is not needed or used. - For cryptosoft, split up the code a bit such that the 'process' callback now invokes a function pointer in the session. This function pointer is set based on the mode (in effect) though it simplifies a few edge cases that would otherwise be in the switch in 'process'. It does split up GCM vs CCM which I think is more readable even if there is some duplication. - I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC as an auth algorithm and updated cryptocheck to work with it. - Combined cipher and auth sessions via /dev/crypto now always use ETA mode. The COP_F_CIPHER_FIRST flag is now a no-op that is ignored. This was actually documented as being true in crypto(4) before, but the code had not implemented this before I added the CIPHER_FIRST flag. - I have not yet updated /dev/crypto to be aware of explicit modes for sessions. I will probably do that at some point in the future as well as teach it about IV/nonce and tag lengths for AEAD so we can support all of the NIST KAT tests for GCM and CCM. - I've split up the exising crypto.9 manpage into several pages of which many are written from scratch. - I have converted all drivers and consumers in the tree and verified that they compile, but I have not tested all of them. I have tested the following drivers: - cryptosoft - aesni (AES only) - blake2 - ccr and the following consumers: - cryptodev - IPsec - ktls_ocf - GELI (lightly) I have not tested the following: - ccp - aesni with sha - hifn - kgssapi_krb5 - ubsec - padlock - safe - armv8_crypto (aarch64) - glxsb (i386) - sec (ppc) - cesa (armv7) - cryptocteon (mips64) - nlmsec (mips64) Discussed with: cem Relnotes: yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D23677	2020-03-27 18:25:23 +00:00
Mark Johnston	b7e3a3b6e1	Remove unused SYSINIT macros for capability rights. Static rights are initialized in cap_rights_sysinit(). MFC after: 1 week	2020-03-26 15:02:37 +00:00
Conrad Meyer	ca0ec73c11	Expand generic subword atomic primitives The goal of this change is to make the atomic_load_acq_{8,16}, atomic_testandset{,_acq}_long, and atomic_testandclear_long primitives available in MI-namespace. The second goal is to get this draft out of my local tree, as anything that requires a full tinderbox is a big burden out of tree. MD specifics can be refined individually afterwards. The generic implementations may not be ideal for your architecture; feel free to implement better versions. If no subword_atomic definitions are needed, the include can be removed from your arch's machine/atomic.h. Generic definitions are guarded by defined macros of the same name. To avoid picking up conflicting generic definitions, some macro defines are added to various MD machine/atomic.h to register an existing implementation. Include _atomic_subword.h in arm and arm64 machine/atomic.h. For some odd reason, KCSAN only generates some versions of primitives. Generate the _acq variants of atomic_load._8, atomic_load._16, and atomic_testandset.*_long. There are other questionably disabled primitives, but I didn't run into them, so I left them alone. KCSAN is only built for amd64 in tinderbox for now. Add atomic_subword implementations of atomic_load_acq_{8,16} implemented using masking and atomic_load_acq_32. Add generic atomic_subword implementations of atomic_testandset_long(), atomic_testandclear_long(), and atomic_testandset_acq_long(), using atomic_fcmpset_long() and atomic_fcmpset_acq_long(). On x86, add atomic_testandset_acq_long as an alias for atomic_testandset_long. Reviewed by: kevans, rlibby (previous versions both) Differential Revision: https://reviews.freebsd.org/D22963	2020-03-25 23:12:43 +00:00
Konstantin Belousov	8cf8c2f65a	kern_copy_file_range(): check the file type. The syscall can only operate on valid vnode types. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2020-03-24 17:16:52 +00:00
Rick Macklem	f9122b6488	Fix an NFS mount attempt where VFS_STATFS() fails. r353150 added mnt_rootvnode and this seems to have broken NFS mounts when the VFS_STATFS() called just after VFS_MOUNT() returns an error. Then the code calls VFS_UNMOUNT(), which calls vflush(), which returns EBUSY. Then the thread get stuck sleeping on "mntref" in vfs_mount_destroy(). This patch fixes this problem. Reviewed by: kib, mjg Differential Revision: https://reviews.freebsd.org/D24022	2020-03-22 18:18:30 +00:00
Mark Johnston	99258935eb	Lock the socket in soo_stat(). Otherwise nothing synchronizes with a concurrent conversion of the socket to a listening socket. Only the PF_LOCAL protocols implement pru_sense, and it is safe to hold the socket lock there, so do so for now. Reported by: syzbot+4801f1b79ea40953ca8e@syzkaller.appspotmail.com MFC after: 1 week Sponsored by: The FreeBSD Foundation	2020-03-20 20:09:00 +00:00
Mark Johnston	db38b699b4	Simplify uipc_detach() slightly. Remove a goto and an unneeded local variable, and fix style. No functional change intended. Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-20 16:18:54 +00:00
Mark Johnston	569a186d02	Remove UNP_NASCENT, reverting r303855. unp_connectat() no longer holds the link lock across calls to sonewconn(), so the recursion described in r303855 can no longer occur. No functional change intended. Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-20 16:17:54 +00:00
Mark Johnston	429537caeb	kern_dup(): Call filecaps_free_prep() in a write section. filecaps_free_prep() bzeros the capabilities structure and we need to be careful to synchronize with unlocked readers, which expect a consistent rights structure. Reviewed by: kib, mjg Reported by: syzbot+5f30b507f91ddedded21@syzkaller.appspotmail.com MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24120	2020-03-19 15:40:05 +00:00
Mark Johnston	2d896b816b	Enter a write sequence when updating rights. The Capsicum system calls modify file descriptor table entries. To ensure that readers observe a consistent snapshot of descriptor writes, the system calls need to signal to unlocked readers that an update is pending. Note that ioctl rights are always checked with the descriptor table lock held, so it is not strictly necessary to signal unlocked readers. However, we probably want to enable lockless ioctl checks eventually, so use seqc_write_begin() in kern_cap_ioctls_limit() too. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24119	2020-03-19 15:39:45 +00:00
Brandon Bergren	3069380898	[PowerPC][Book-E] Fix missing load base in elf_cpu_parse_dynamic(). When I implemented MD DYNAMIC parsing, I was originally passing a linker_file_t so that the MD code could relocate pointers. However, it turns out this isn't even filled in until later, so it was always 0. Just pass the load base (ef->address) directly, as that's really the only thing we were interested in in the first place. This fixes a crash on RB800 where it was trying to write to an unmapped address when updating the GOT. Reviewed by: jhibbits Sponsored by: Tag1 Consulting, Inc. Differential Revision: https://reviews.freebsd.org/D24105	2020-03-18 02:58:18 +00:00
Conrad Meyer	34086d5bda	Implement sysctl kern.boot_id Boot IDs are random, opaque 128-bit identifiers that distinguish distinct system boots. A new ID is generated each time the system boots. Unlike kern.boottime, the value is not modified by NTP adjustments. It remains fixed until the machine is restarted. PR: 244867 Reported by: Ricardo Fraile <rfraile AT rfraile.eu> MFC after: I do not intend to, but feel free	2020-03-17 22:27:16 +00:00
Conrad Meyer	a99c321802	Remove misleading / redundant bzero in callout_callwheel_init The intent seems to be zeroing all of the cc_cpu array, or its singleton on such platforms. The assumption made is that the BSP is always zero. The code smell was introduced in r326218, which changed the prior explicit zero to 'curcpu'. The change is only valid if curcpu continues to be zero, contrary to the aim expressed in that commit message. So, more succinctly, the expression could be: memset(cc_cpu,0,sizeof(cc_cpu)). However, there's no point. cc_cpu lives in the data section and has a zero initial value already. So this revision just removes the problematic statement. No functional change. Appeases a (false positive, ish) Coverity CID. CID: 1383567 Reported by: Puneeth Jothaiah <puneethkumar.jothaia AT dell.com> Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D24089	2020-03-16 22:25:25 +00:00
Bjoern A. Zeeb	1b786d0191	kern_jail: missing \0 termination check on osrelease parameter If a user spplies a non-\0 terminated osrelease parameter reading it back may disclose kernel memory. This is a problem in case of nested jails (children.max > 0, which is not the default). Otherwise root outside the jail has access to kernel memory by other means and root inside a jail cannot create a child jail. Add the proper \0 check at the end of a supplied osrelease parameter and make sure any copies of the field will be \0-terminated. Submitted by: Hans Christian Woithe (chwoithe yahoo.com) MFC after: 3 days	2020-03-14 14:04:55 +00:00
Michael Tuexen	db4493f7b6	sendfile() does currently not support SCTP sockets. Therefore, fail the call. Reviewed by: markj@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24059	2020-03-13 18:38:28 +00:00
Conrad Meyer	7a119578a4	kern_shutdown: Add missing EKCD ifdef Submitted by: Puneeth Jothaiah <puneethkumar.jothaia AT dell.com> Reviewed by: bdrewery Sponsored by: Dell EMC Isilon	2020-03-12 21:26:36 +00:00
Konstantin Belousov	00ebd80972	Fix signal delivery might be on sigfastblock clearing. When clearing sigfastblock, either by sigfastblock(UNSETPTR) call or implicitly on execve(2), kernel must check for pending signals and reschedule them if needed. E.g. on execve, all other threads are terminated, and current thread fast block pointer is cleaned. If any signal was left pending, it can now be delivered to the current thread, and we should prepare for ast() on return to userspace to notice the signals. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2020-03-10 20:25:03 +00:00
Konstantin Belousov	0bc52b0bdb	Return reschedule_signals() to being static again. It was used after sigfastblock_setpend() call in in ast() when current thread fast-blocks signals. Add a flag to sigfastblock_setpend() to request reschedule, and remove the direct use of the function from subr_trap.c Tested by: pho Sponsored by: The FreeBSD Foundation	2020-03-10 20:04:38 +00:00
Konstantin Belousov	2d3c083fd7	pipe: explain why not deallocating inode number is fine. Suggested and reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24009	2020-03-09 23:40:25 +00:00
Konstantin Belousov	c6d3d601c9	Preallocate pipe buffers on pipe creation. Return ENOMEM if one of the buffer cannot be created even with the minimal size. This should avoid subsequent spurious ENOMEM errors from write(2) when buffer cannot be allocated on the fly, after we reported that the pipe was create succesfully. Reported by: Keno Fischer <keno@juliacomputing.com> Reviewed by: markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23993	2020-03-09 21:55:26 +00:00
Konstantin Belousov	1213de28f8	Style. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23993	2020-03-09 19:46:28 +00:00
Andrew Gallatin	98085bae8c	make lacp's use_numa hashing aware of send tags When I did the use_numa support, I missed the fact that there is a separate hash function for send tag nic selection. So when use_numa is enabled, ktls offload does not work properly, as it does not reliably allocate a send tag on the proper egress nic since different egress nics are selected for send-tag allocation and packet transmit. To fix this, this change: - refectors lacp_select_tx_port_by_hash() and lacp_select_tx_port() to make lacp_select_tx_port_by_hash() always called by lacp_select_tx_port() - pre-shifts flowids to convert them to hashes when calling lacp_select_tx_port_by_hash() - adds a numa_domain field to if_snd_tag_alloc_params - plumbs the numa domain into places where we allocate send tags In testing with NIC TLS setup on a NUMA machine, I see thousands of output errors before the change when enabling kern.ipc.tls.ifnet.permitted=1. After the change, I see no errors, and I see the NIC sysctl counters showing active TLS offload sessions. Reviewed by: rrs, hselasky, jhb Sponsored by: Netflix	2020-03-09 13:44:51 +00:00
Mateusz Guzik	d2222aa0e9	fd: use smr for managing struct pwd This has a side effect of eliminating filedesc slock/sunlock during path lookup, which in turn removes contention vs concurrent modifications to the fd table. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D23889	2020-03-08 00:23:36 +00:00
Mark Johnston	d869a17e62	Use COUNTER_U64_DEFINE_EARLY() in places where it simplifies things. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23978	2020-03-06 19:10:00 +00:00
Mark Johnston	fffcb56f7a	Add COUNTER_U64_SYSINIT() and COUNTER_U64_DEFINE_EARLY(). The aim is to reduce the boilerplate needed today to define and initialize global counters. Also add SI_SUB_COUNTER to the sysinit ordering. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23977	2020-03-06 19:09:01 +00:00
Chuck Silvers	f15ccf8836	Add a new "mntfs" pseudo file system which provides private device vnodes for file systems to safely access their disk devices, and adapt FFS to use it. Also add a new BO_NOBUFS flag to allow enforcing that file systems using mntfs vnodes do not accidentally use the original devfs vnode to create buffers. Reviewed by: kib, mckusick Approved by: imp (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23787	2020-03-06 18:41:37 +00:00
Konstantin Belousov	695e0701a0	buffer pager: deref ucred immediately after read. Ucred is passed to bread(9) so that non-local filesystems use proper credentials. But, since clean buffer might be cached unless buf_pager_relbuf is not enabled, it makes credentials to have extra reference until buffer is reclaimed. Ucred reference would prevent jail from destroying if creds are jailed. Dereferencing the read credentials on the valid buffer avoid that, and should be fine because the buffer is valid and does not need re-read. PR: 238032 Reported by: bz Reproduced and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23775	2020-03-05 15:52:34 +00:00

1 2 3 4 5 ...

17321 Commits