freebsd-skq

Author	SHA1	Message	Date
kevans	98761e7d3e	Preload hostuuid for early-boot use prison0's hostuuid will get set by the hostid rc script, either after generating it and saving it to /etc/hostid or by simply reading /etc/hostid. Some things (e.g. arbitrary MAC address generation) may use the hostuuid as a factor in early boot, so providing a way to read /etc/hostid (if it's available) and using it before userland starts up is desirable. The code is written such that the preload doesn't have to be /etc/hostid, thus not assuming that there will be a newline at the end of the buffer or even the exact shape of the newline. With trailing whitespace/non-printables trimmed, the result will be validated as a valid uuid before it's used for early boot purposes. The preload can be turned off with hostuuid_load="NO" in /boot/loader.conf, just as other preloads; it's worth noting that this is a 37-byte file, so the overhead is believed to be generally minimal. It doesn't seem necessary at this time to be concerned with kern.hostid. One does wonder if we should consider validating hostuuids coming in via jail_set(2); some bits seem to care about uuid form and we bother validating format of smbios-provided uuid and in fact whatever uuid comes from /etc/hostid. Reviewed by: karels, delphij, jamie MFC after: 1 week (don't preload by default, probably) Differential Revision: https://reviews.freebsd.org/D24288	2020-04-16 00:54:06 +00:00
brooks	3bc86c9ae7	Export argc, argv, envc, envv, and ps_strings in auxargs. This simplifies discovery of these values, potentially with reducing the number of syscalls we need to make at runtime. Longer term, we wish to convert the startup process to pass an auxargs pointer to _start() and use that rather than walking off the end of envv. This is cleaner, more C-friendly, and for systems with strong bounds (e.g. CHERI) necessary. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24407	2020-04-15 20:23:55 +00:00
brooks	27c07b76c4	Make ps_strings in struct image_params into a pointer. This is a preparatory commit for D24407. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA	2020-04-15 20:21:30 +00:00
kevans	9b77dcbf1d	validate_uuid: absorb the rest of parse_uuid with a flags arg This makes the naming annoyance (validate_uuid vs. parse_uuid) less of an issue and centralizes all of the functionality into the new KPI while still making the extra validation optional. The end-result is all the same as far as hostuuid validation-only goes.	2020-04-15 18:39:12 +00:00
kaktus	d91b3a25d2	sysctl_handle_string: Put logical or in parentheses. Reported by: rdivacky Approved by: kib (mentor) Pointy-hat to: kaktus	2020-04-15 16:55:38 +00:00
kaktus	aa6f926f8c	sysctl(9): fix handling string tunables. r357614 changed internals of handling string sysctls, and inadvertently broke setting string tunables. Take them into account. PR: 245463 Reported by: jhb, np Reviewed by: imp, jhb, kib Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D24429	2020-04-15 16:33:55 +00:00
hselasky	e86c4d6425	Cast all ioctl command arguments through uint32_t internally. Hide debug print showing use of sign extended ioctl command argument under INVARIANTS. The print is available to all and can easily fill up the logs. No functional change intended. MFC after: 1 week Sponsored by: Mellanox Technologies	2020-04-15 13:20:51 +00:00
kevans	9da994e5cc	kern uuid: break format validation out into a separate KPI This new KPI, validate_uuid, strictly validates the formatting of the input UUID and, optionally, populates a given struct uuid. As noted in the header, the key differences are that the new KPI won't recognize an empty string as a nil UUID and it won't do any kind of semantic validation on it. Also key is that populating a struct uuid is optional, so the caller doesn't necessarily need to allocate a bogus one on the stack just to validate the string. This KPI has specifically been broken out in support of D24288, which will preload /etc/hostid in loader so that early boot hostuuid users (e.g. anything that calls ether_gen_addr) can have a valid hostuuid to work with once it's been stashed in /etc/hostid.	2020-04-15 03:59:26 +00:00
brooks	d11edfe286	Remove bogus use of useracc() in (clock_)nanosleep. There's no point in pre-checking that we can access the user's rmtp pointer before we do it in copyout(). While here, improve style(9) compliance. Reviewed by: imp MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24409	2020-04-14 20:53:12 +00:00
brooks	cfb2be0cff	Centralize compatibility translation macros. Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h and replace existing definitions with includes where required. This eliminates duplicate code and allows Linux and FreeBSD compatibility headers to be included in the same files. Input from: cem, jhb Obtained from: CheriBSD MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24275	2020-04-14 20:30:48 +00:00
kevans	ee46db7e3b	sysent: re-roll after r359930	2020-04-14 18:11:26 +00:00
kevans	79165c9642	Mark closefrom(2) COMPAT12, reimplement in libc to wrap close_range Include a temporary compatibility shim as well for kernels predating close_range, since closefrom is used in some critical areas. Reviewed by: markj (previous version), kib Differential Revision: https://reviews.freebsd.org/D24399	2020-04-14 18:07:42 +00:00
jtl	5c3de7e0d4	Make sonewconn() overflow messages have per-socket rate-limits and values. sonewconn() emits debug-level messages when a listen socket's queue overflows. Currently, sonewconn() tracks overflows on a global basis. It will only log one message every 60 seconds, regardless of how many sockets experience overflows. And, when it next logs at the end of the 60 seconds, it records a single message referencing a single PCB with the total number of overflows across all sockets. This commit changes to per-socket overflow tracking. The code will now log one message every 60 seconds per socket. And, the code will provide per-socket queue length and overflow counts. It also provides a way to change the period between log messages using a sysctl. Reviewed by: jhb (previous version), bcr (manpages) MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24316	2020-04-14 15:38:18 +00:00
jtl	7f82c3af09	Print more detail as part of the sonewconn() overflow message. When a socket's listen queue overflows, sonewconn() emits a debug-level log message. These messages are sometimes useful to systems administrators in highlighting a process which is not keeping up with its listen queue. This commit attempts to enhance the usefulness of this message by printing more details about the socket's address. If all else fails, it will at least print the domain name of the socket. Reviewed by: bz, jhb, kbowling MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24272	2020-04-14 15:30:34 +00:00
gallatin	2de7a790ba	KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to carry more than a single page for sendfile, they are rather cache unfriendly when accessing the ext_pgs metadata and data. This is because the ext_pgs part of the mbuf is allocated separately, and almost guaranteed to be cold in cache. This change takes advantage of the fact that unmapped mbufs are never used at the same time as pkthdr mbufs. Given this fact, we can overlap the ext_pgs metadata with the mbuf pkthdr, and carry the ext_pgs meta directly in the mbuf itself. Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array of pages) directly after the existing m_ext. In order to be able to carry 5 pages (which is the minimum required for a 16K TLS record which is not perfectly aligned) on LP64, I've had to steal ext_arg2. The only user of this in the xmit path is sendfile, and I've adjusted it to use arg1 when using unmapped mbufs. This change is almost entirely mechanical, except that we change mb_alloc_ext_pgs() to no longer allow allocating pkthdrs, the change to avoid ext_arg2 as mentioned above, and the removal of the ext_pgs zone. This change saves roughly 2% "raw" CPU (~59% -> 57%), or over 3% "scaled" CPU on a Netflix 100% software kTLS workload at 90+ Gb/s on Broadwell Xeons. In a follow-on commit, I plan to remove some hacks to avoid accessing ext_pgs fields of mbufs, since they will now be in cache. Many thanks to glebius for helping to make this better in the Netflix tree. Reviewed by: hselasky, jhb, rrs, glebius (early version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24213	2020-04-14 14:46:06 +00:00
kevans	8c5ad1b4c0	posixshm: fix counting of writable mappings Similar to mmap'ing vnodes, posixshm should count any mapping where maxprot contains VM_PROT_WRITE (i.e. fd opened r/w with no write-seal applied) as writable and thus blocking of any write-seal. The memfd tests have been amended to reflect the fixes here, which notably includes: 1. Fix for error return bug; EPERM is not a documented failure mode for mmap 2. Fix rejection of write-seal with active mappings that can be upgraded via mprotect(2). Reported by: markj Discussed with: markj, kib	2020-04-14 13:32:03 +00:00
markj	6f2d4014b9	Fix sendto() on unconnected SOCK_STREAM/SEQPACKET unix sockets. Previously the unpcb pointer of the newly connected remote socket was not initialized correctly, so attempting to lock it would result in a null pointer dereference. Reported by: syzkaller MFC after: 1 week Sponsored by: The FreeBSD Foundation	2020-04-13 19:22:05 +00:00
markj	87b05c6566	Relax restrictions on private mappings of POSIX shm objects. When creating a private mapping of a POSIX shared memory object, VM_PROT_WRITE should always be included in maxprot regardless of permissions on the underlying FD. Otherwise it is possible to open a shm object read-only, map it with MAP_PRIVATE and PROT_WRITE, and violate the invariant in vm_map_insert() that (prot & maxprot) == prot. Reported by: syzkaller Reviewed by: kevans, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24398	2020-04-13 19:20:39 +00:00
kevans	4045a67bf3	close_range/closefrom: fix regression from close_range introduction close_range will clamp the range between [0, fdp->fd_lastfile], but failed to take into account that fdp->fd_lastfile can become -1 if all fds are closed. =-( In this scenario, just return because there's nothing further we can do at the moment. Add a test case for this, fork() and simply closefrom(0) twice in the child; on the second invocation, fdp->fd_lastfile == -1 and will trigger a panic before this change. X-MFC-With: r359836	2020-04-13 17:55:31 +00:00
kevans	a9a4eb7720	sysent: re-roll after introduction of close_range in r359836	2020-04-12 21:23:51 +00:00
kevans	6371039d47	Implement a close_range(2) syscall close_range(min, max, flags) allows for a range of descriptors to be closed. The Python folk have indicated that they would much prefer this interface to closefrom(2), as the case may be that they/someone have special fds dup'd to higher in the range and they can't necessarily closefrom(min) because they don't want to hit the upper range, but relocating them to lower isn't necessarily feasible. sys_closefrom has been rewritten to use kern_close_range() using ~0U to indicate closing to the end of the range. This was chosen rather than requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the call to kern_close_range for simplicity. The flags argument of close_range(2) is currently unused, so any flags set is currently EINVAL. It was added to the interface in Linux so that future flags could be added for, e.g., "halt on first error" and things of this nature. This patch is based on a syscall of the same design that is expected to be merged into Linux. Reviewed by: kib, markj, vangyzen (all slightly earlier revisions) Differential Revision: https://reviews.freebsd.org/D21627	2020-04-12 21:23:19 +00:00
kib	aae6ffd7c3	sendfile_iodone: correct calculation of the page index for relookup. This is yet another bug in r359473. Reported and tested by: delphij Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2020-04-12 05:10:48 +00:00
markj	3d36b367cf	sbappendcontrol() needs to avoid clearing M_NOTREADY on data mbufs. If LOCAL_CREDS is set on a unix socket and sendfile() is called, sendfile will call uipc_send(PRUS_NOTREADY), prepending a control message to the M_NOTREADY mbufs. uipc_send() then calls sbappendcontrol() instead of sbappend(), and sbappendcontrol() would erroneously clear M_NOTREADY. Pass send flags to sbappendcontrol(), like we do for sbappend(), to preserve M_NOTREADY when necessary. Reported by: syzkaller MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24333	2020-04-10 20:42:11 +00:00
markj	dde1b5985f	Properly handle disconnected sockets in uipc_ready(). When transmitting over a unix socket, data is placed directly into the receiving socket's receive buffer, instead of the transmitting socket's send buffer. This means that when pru_ready is called during sendfile(), the passed socket does not contain M_NOTREADY mbufs in its buffers; uipc_ready() must locate the linked socket. Currently uipc_ready() frees the mbufs if the socket is disconnected, but this is wrong since the mbufs may still be present in the receiving socket's buffer after a disconnect. This can result in a use-after-free and potentially a double free if the receive buffer is flushed after uipc_ready() frees the mbufs. Fix the problem by trying harder to locate the correct socket buffer and calling sbready(): use the global list of SOCK_STREAM unix sockets to search for a sockbuf containing the now-ready mbufs. Only free the mbufs if we fail this search. Reviewed by: jah, kib Reported and tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24332	2020-04-10 20:41:59 +00:00
kib	3248084b37	Do not pass bogus page to mbufs. This is a bug in r359473. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2020-04-10 01:28:47 +00:00
mckusick	4ef00b0116	When running with a kernel compiled with DEBUG_LOCKS, before panic'ing for recursing on a non-recursive lock, print out the kernel stack where the lock was originally acquired.	2020-04-09 23:42:13 +00:00
kib	1001d5d0e7	Remove extra call to vfs_op_exit() from vfs_write_suspend() when VFS_SYNC() fails. The vfs_write_resume() handler already does vfs_op_exit() for us. Reported by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation	2020-04-09 18:38:00 +00:00
rmacklem	9f795af55b	Remove the old NFS lock device driver that uses Giant. This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and has not normally been used since then. To use it, the kernel had to be built without "options NFSLOCKD" and the nfslockd.ko had to be deleted as well. Since it uses Giant and is no longer used, this patch removes it. With this device driver removed, there is now a lot of unused code in the userland rpc.lockd. That will be removed on a future commit. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22933	2020-04-09 14:44:46 +00:00
cem	16862b1b9c	ddb(4): show lockchain: Don't dereference LK_KERNPROC Also, print a little more information for otherwise unhandled inhibited states. Finally, improve the grammar of some prints. Some of the print statements were missing a verb. Sponsored by: Dell EMC Isilon	2020-04-02 20:47:51 +00:00
jhb	efd93357ab	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a superset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837	2020-04-01 19:22:09 +00:00
jah	b6a165463a	deadlkres: include thread name in panic messages Reviewed by: markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24235	2020-04-01 04:51:39 +00:00
gallatin	9a5d50a84c	KTLS: Coalesce adjacent TLS trailers & headers to improve PCIe bus efficiency KTLS uses the embedded header and trailer fields of unmapped mbufs. This can lead to "silly" buffer lengths, where we have an mbuf chain that will create a scatter/gather list with a regular pattern of 13 bytes followed by 16 bytes between each adjacent TLS record. For software ktls we typically wind up with a pattern where we have several TLS records encrypted, and made ready at once. When these records are made ready, we can coalesce these silly buffers in sbready_compress by copying 13b TLS header of the next record into the 16b TLS trailer of the current record. After doing so, we now have a small 29 byte chunk between each TLS record. This marginally increases PCIe bus efficiency. We've seen an almost 1Gb/s increase in peak throughput on Broadwell based Xeons running a 100% software TLS workload with Mellanox ConnectX-4 NICs. Note that this change is ifdef'ed for KTLS, as KTLS is currently the only user of the hdr/trailer feature of unmapped mbufs, and peeking into them is expensive, since the ext_pgs struct lives in separately allocated memory, and may be cold in cache. This optimization is not applicable to HW ("NIC") TLS, as that depends on having the entire TLS record described by a single unmapped mbuf, so we cannot shift parts of the record between mbufs for HW TLS. Reviewed by: jhb, hselasky, scottl Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24204	2020-03-30 23:29:53 +00:00
kib	a4796af6c5	kern_sendfile.c: fix bugs with handling of busy page states. - Do not call into a vnode pager while leaving some pages from the same block as the current run, xbusy. This immediately deadlocks if pager needs to instantiate the buffer. - Only relookup bogus pages after io finished, otherwise we might obliterate the valid pages by out of date disk content. While there, expand the comment explaining this peculiarity. - Do not double-unbusy on error. Split unbusy for error case, which is left in the sendfile_swapin(), from the more properly coded normal case in sendfile_iodone(). - Add an XXXKIB comment explaining the serious bug in the validation algorithm, not fixed by this patch series. PR: 244713 Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 22:13:32 +00:00
kib	dd37986813	kern_sendfile.c: do not release sfio reference on error. It is already done by sendfile_iodone(), now consistently for all errors. This de-facto reverts r358597, after r359466. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 22:01:36 +00:00
kib	b4e528613a	kern_sendfile.c: wait for all in-flight ios completion before unwiring pages. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:57:28 +00:00
kib	7f44c6c00e	kern_sendfile.c: add specific malloc type. Now sfio leaks are more easily seen in the malloc statistics than e.g. just wired or busy pages leak. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:50:51 +00:00
kib	1bd0720f0c	VOP_GETPAGES_ASYNC(): consistently call iodone() callback in case of error. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:44:30 +00:00
kib	62fe3d773b	buffer pager: skip bogus pages. We cannot validate bogus page by reading a buffer. PR: 244713 Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:42:46 +00:00
kib	bb1271846a	kern_sendfile.c style: order headers alphabetically. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:40:35 +00:00
emaste	4f6e9aa72d	capabilities.conf: provide information about capmode permitted syscalls Reviewed by: jhb (earlier) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24118	2020-03-30 18:24:07 +00:00
markj	011cb14c5d	Remove the "config" taskqgroup and its KPIs. Equivalent functionality is already provided by taskqueue(9), just use that instead. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-30 14:24:03 +00:00
markj	307559af97	Simplify taskqgroup initialization. taskqgroup initialization was broken into two steps: 1. allocate the taskqgroup structure, at SI_SUB_TASKQ; 2. initialize taskqueues, start taskqueue threads, enqueue "binder" tasks to bind threads to specific CPUs, at SI_SUB_SMP. Step 2 tries to handle the case where tasks have already been attached to a queue, by migrating them to their intended queue. In particular, tasks can't be enqueued before step 2 has completed. This breaks NFS mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP is not defined, since mountroot happens before SI_SUB_SMP in this case. Simplify initialization: do all initialization except for CPU binding at SI_SUB_TASKQ. This means that until CPU binding is completed, group tasks may be executed on a CPU other than that to which they were bound, but this should not be a problem for existing users of the taskqgroup KPIs. Reported by: sbruno Tested by: bdragon, sbruno MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24188	2020-03-30 14:22:52 +00:00
jhb	ddcef18974	Refactor driver and consumer interfaces for OCF (in-kernel crypto). - The linked list of cryptoini structures used in session initialization is replaced with a new flat structure: struct crypto_session_params. This session includes a new mode to define how the other fields should be interpreted. Available modes include: - COMPRESS (for compression/decompression) - CIPHER (for simple encryption/decryption) - DIGEST (computing and verifying digests) - AEAD (combined auth and encryption such as AES-GCM and AES-CCM) - ETA (combined auth and encryption using encrypt-then-authenticate) Additional modes could be added in the future (e.g. if we wanted to support TLS MtE for AES-CBC in the kernel we could add a new mode for that. TLS modes might also affect how AAD is interpreted, etc.) The flat structure also includes the key lengths and algorithms as before. However, code doesn't have to walk the linked list and switch on the algorithm to determine which key is the auth key vs encryption key. The 'csp_auth_' fields are always used for auth keys and settings and 'csp_cipher_' for cipher. (Compression algorithms are stored in csp_cipher_alg.) - Drivers no longer register a list of supported algorithms. This doesn't quite work when you factor in modes (e.g. a driver might support both AES-CBC and SHA2-256-HMAC separately but not combined for ETA). Instead, a new 'crypto_probesession' method has been added to the kobj interface for symmetric crypto drivers. This method returns a negative value on success (similar to how device_probe works) and the crypto framework uses this value to pick the "best" driver. There are three constants for hardware (e.g. ccr), accelerated software (e.g. aesni), and plain software (cryptosoft) that give preference in that order. One effect of this is that if you request only hardware when creating a new session, you will no longer get a session using accelerated software. Another effect is that the default setting to disallow software crypto via /dev/crypto now disables accelerated software. Once a driver is chosen, 'crypto_newsession' is invoked as before. - Crypto operations are now solely described by the flat 'cryptop' structure. The linked list of descriptors has been removed. A separate enum has been added to describe the type of data buffer in use instead of using CRYPTO_F_* flags to make it easier to add more types in the future if needed (e.g. wired userspace buffers for zero-copy). It will also make it easier to re-introduce separate input and output buffers (in-kernel TLS would benefit from this). Try to make the flags related to IV handling less insane: - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv' member of the operation structure. If this flag is not set, the IV is stored in the data buffer at the 'crp_iv_start' offset. - CRYPTO_F_IV_GENERATE means that a random IV should be generated and stored into the data buffer. This cannot be used with CRYPTO_F_IV_SEPARATE. If a consumer wants to deal with explicit vs implicit IVs, etc. it can always generate the IV however it needs and store partial IVs in the buffer and the full IV/nonce in crp_iv and set CRYPTO_F_IV_SEPARATE. The layout of the buffer is now described via fields in cryptop. crp_aad_start and crp_aad_length define the boundaries of any AAD. Previously with GCM and CCM you defined an auth crd with this range, but for ETA your auth crd had to span both the AAD and plaintext (and they had to be adjacent). crp_payload_start and crp_payload_length define the boundaries of the plaintext/ciphertext. Modes that only do a single operation (COMPRESS, CIPHER, DIGEST) should only use this region and leave the AAD region empty. If a digest is present (or should be generated), its starting location is marked by crp_digest_start. Instead of using the CRD_F_ENCRYPT flag to determine the direction of the operation, cryptop now includes an 'op' field defining the operation to perform. For digests I've added a new VERIFY digest mode which assumes a digest is present in the input and fails the request with EBADMSG if it doesn't match the internally-computed digest. GCM and CCM already assumed this, and the new AEAD mode requires this for decryption. The new ETA mode now also requires this for decryption, so IPsec and GELI no longer do their own authentication verification. Simple DIGEST operations can also do this, though there are no in-tree consumers. To eventually support some refcounting to close races, the session cookie is now passed to crypto_getop() and clients should no longer set crp_session directly. - Asymmetric crypto operation structures should be allocated via crypto_getkreq() and freed via crypto_freekreq(). This permits the crypto layer to track open asym requests and close races with a driver trying to unregister while asym requests are in flight. - crypto_copyback, crypto_copydata, crypto_apply, and crypto_contiguous_subsegment now accept the 'crp' object as the first parameter instead of individual members. This makes it easier to deal with different buffer types in the future as well as separate input and output buffers. It's also simpler for driver writers to use. - bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer. This understands the various types of buffers so that drivers that use DMA do not have to be aware of different buffer types. - Helper routines now exist to build an auth context for HMAC IPAD and OPAD. This reduces some duplicated work among drivers. - Key buffers are now treated as const throughout the framework and in device drivers. However, session key buffers provided when a session is created are expected to remain alive for the duration of the session. - GCM and CCM sessions now only specify a cipher algorithm and a cipher key. The redundant auth information is not needed or used. - For cryptosoft, split up the code a bit such that the 'process' callback now invokes a function pointer in the session. This function pointer is set based on the mode (in effect) though it simplifies a few edge cases that would otherwise be in the switch in 'process'. It does split up GCM vs CCM which I think is more readable even if there is some duplication. - I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC as an auth algorithm and updated cryptocheck to work with it. - Combined cipher and auth sessions via /dev/crypto now always use ETA mode. The COP_F_CIPHER_FIRST flag is now a no-op that is ignored. This was actually documented as being true in crypto(4) before, but the code had not implemented this before I added the CIPHER_FIRST flag. - I have not yet updated /dev/crypto to be aware of explicit modes for sessions. I will probably do that at some point in the future as well as teach it about IV/nonce and tag lengths for AEAD so we can support all of the NIST KAT tests for GCM and CCM. - I've split up the existing crypto.9 manpage into several pages of which many are written from scratch. - I have converted all drivers and consumers in the tree and verified that they compile, but I have not tested all of them. I have tested the following drivers: - cryptosoft - aesni (AES only) - blake2 - ccr and the following consumers: - cryptodev - IPsec - ktls_ocf - GELI (lightly) I have not tested the following: - ccp - aesni with sha - hifn - kgssapi_krb5 - ubsec - padlock - safe - armv8_crypto (aarch64) - glxsb (i386) - sec (ppc) - cesa (armv7) - cryptocteon (mips64) - nlmsec (mips64) Discussed with: cem Relnotes: yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D23677	2020-03-27 18:25:23 +00:00
markj	0d53cb1dc6	Remove unused SYSINIT macros for capability rights. Static rights are initialized in cap_rights_sysinit(). MFC after: 1 week	2020-03-26 15:02:37 +00:00
cem	b4fc99e12c	Expand generic subword atomic primitives The goal of this change is to make the atomic_load_acq_{8,16}, atomic_testandset{,_acq}_long, and atomic_testandclear_long primitives available in MI-namespace. The second goal is to get this draft out of my local tree, as anything that requires a full tinderbox is a big burden out of tree. MD specifics can be refined individually afterwards. The generic implementations may not be ideal for your architecture; feel free to implement better versions. If no subword_atomic definitions are needed, the include can be removed from your arch's machine/atomic.h. Generic definitions are guarded by defined macros of the same name. To avoid picking up conflicting generic definitions, some macro defines are added to various MD machine/atomic.h to register an existing implementation. Include _atomic_subword.h in arm and arm64 machine/atomic.h. For some odd reason, KCSAN only generates some versions of primitives. Generate the _acq variants of atomic_load._8, atomic_load._16, and atomic_testandset.*_long. There are other questionably disabled primitives, but I didn't run into them, so I left them alone. KCSAN is only built for amd64 in tinderbox for now. Add atomic_subword implementations of atomic_load_acq_{8,16} implemented using masking and atomic_load_acq_32. Add generic atomic_subword implementations of atomic_testandset_long(), atomic_testandclear_long(), and atomic_testandset_acq_long(), using atomic_fcmpset_long() and atomic_fcmpset_acq_long(). On x86, add atomic_testandset_acq_long as an alias for atomic_testandset_long. Reviewed by: kevans, rlibby (previous versions both) Differential Revision: https://reviews.freebsd.org/D22963	2020-03-25 23:12:43 +00:00
kib	1299dbec8a	kern_copy_file_range(): check the file type. The syscall can only operate on valid vnode types. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2020-03-24 17:16:52 +00:00
rmacklem	9482ee93e3	Fix an NFS mount attempt where VFS_STATFS() fails. r353150 added mnt_rootvnode and this seems to have broken NFS mounts when the VFS_STATFS() called just after VFS_MOUNT() returns an error. Then the code calls VFS_UNMOUNT(), which calls vflush(), which returns EBUSY. Then the thread gets stuck sleeping on "mntref" in vfs_mount_destroy(). This patch fixes this problem. Reviewed by: kib, mjg Differential Revision: https://reviews.freebsd.org/D24022	2020-03-22 18:18:30 +00:00
markj	bfee51521a	Lock the socket in soo_stat(). Otherwise nothing synchronizes with a concurrent conversion of the socket to a listening socket. Only the PF_LOCAL protocols implement pru_sense, and it is safe to hold the socket lock there, so do so for now. Reported by: syzbot+4801f1b79ea40953ca8e@syzkaller.appspotmail.com MFC after: 1 week Sponsored by: The FreeBSD Foundation	2020-03-20 20:09:00 +00:00
markj	e4b2ce8ddc	Simplify uipc_detach() slightly. Remove a goto and an unneeded local variable, and fix style. No functional change intended. Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-20 16:18:54 +00:00
markj	dc1e534e9f	Remove UNP_NASCENT, reverting r303855. unp_connectat() no longer holds the link lock across calls to sonewconn(), so the recursion described in r303855 can no longer occur. No functional change intended. Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-20 16:17:54 +00:00
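
The hostuuid commits in this log (9da994e5cc, 9b77dcbf1d, 98761e7d3e) describe two steps: trim trailing whitespace/non-printables from a preloaded buffer, then strictly check the UUID format, with the empty string rejected. The following is a minimal userland sketch of that idea only. It is not the kernel's validate_uuid KPI: the function names, signatures, and the choice to accept both upper- and lower-case hex digits are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* 8-4-4-4-12 hex digits plus four hyphens. */
#define	SKETCH_UUID_STRLEN	36

/*
 * Hypothetical helper: return the length of buf once trailing whitespace
 * and non-printable bytes ("\n", "\r\n", a stray NUL, ...) are dropped.
 */
static size_t
sketch_trimmed_len(const char *buf, size_t len)
{

	assert(buf != NULL);
	while (len > 0 && !isgraph((unsigned char)buf[len - 1]))
		len--;
	return (len);
}

/*
 * Hypothetical strict format check: exactly 36 bytes, hyphens at offsets
 * 8, 13, 18 and 23, hex digits everywhere else.  No semantic validation.
 */
static bool
sketch_uuid_format_ok(const char *buf, size_t len)
{
	size_t i;

	if (len != SKETCH_UUID_STRLEN)
		return (false);		/* also rejects the empty string */
	for (i = 0; i < len; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (buf[i] != '-')
				return (false);
		} else if (!isxdigit((unsigned char)buf[i])) {
			return (false);
		}
	}
	return (true);
}

/* Trim first, then validate, so the newline's shape does not matter. */
static bool
sketch_preload_uuid_ok(const char *buf, size_t len)
{

	return (sketch_uuid_format_ok(buf, sketch_trimmed_len(buf, len)));
}
```

With these definitions, a 37-byte buffer holding a well-formed UUID followed by "\n" passes sketch_preload_uuid_ok(), as does the same UUID followed by "\r\n" or by nothing at all; an empty or newline-only buffer fails.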

17354 Commits