freebsd-skq

Author	SHA1	Message	Date
Andrew Gallatin	b2dba6634b	kTLS: Fix a bug where we would not encrypt anon data inplace. Software Kernel TLS needs to allocate a new destination crypto buffer when encrypting data from the page cache, so as to avoid overwriting shared clear-text file data with encrypted data specific to a single socket. When the data is anonymous, e.g., not tied to a file, then we can encrypt in place and avoid allocating a new page. This fixes a bug where the existing code always assumes the data is private, and never encrypts in place. This results in unneeded page allocations and potentially more memory bandwidth consumption when doing socket writes. When the code was written at Netflix, ktls_encrypt() looked at private sendfile flags to determine if the pages being encrypted were part of the page cache (coming from sendfile) or anonymous (coming from sosend). This was broken internally at Netflix when the sendfile flags were made private, and the M_WRITABLE() check was added. Unfortunately, M_WRITABLE() will always be false for M_NOMAP mbufs, since one cannot just mtod() them. This change introduces a new flags field to the mbuf_ext_pgs struct by stealing a byte from the tls hdr. Note that the current header is still 2 bytes larger than the largest header we support: AES-CBC with explicit IV. We set MBUF_PEXT_FLAG_ANON when creating an unmapped mbuf in m_uiotombuf_nomap() (which is the path that socket writes take), and we check for that flag in ktls_encrypt() when looking for anon pages. Reviewed by: jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21796	2019-09-27 20:08:19 +00:00
Ed Maste	f84ed82834	controlelf: update man page Some minor corrections, clarifications or rewording.	2019-09-27 19:26:52 +00:00
Andrew Gallatin	6554362c66	kTLS support for TLS 1.3 TLS 1.3 requires a few changes because 1.3 pretends to be 1.2 with a record type of application data. The "real" record type is then included at the end of the user-supplied plaintext data. This required adding a field to the mbuf_ext_pgs struct to save the record type, and passing the real record type to the sw_encrypt() ktls backend functions. Reviewed by: jhb, hselasky Sponsored by: Netflix Differential Revision: D21801	2019-09-27 19:17:40 +00:00
Mateusz Guzik	708cf7eb6c	cache: decrease ncnegfactor to 5 The current mechanism is bogus in several ways: - the limit is a percentage of total entries added, which means negative entries get evicted all the time even if there are plenty of resources - evicting code is almost not concurrent, which makes it unable to remove entries fast enough when doing something as simple as -j 104 buildworld - there is no support for performing mass removal if necessary Vast majority of negative entries never get any hits. Only evicting them when the filesystem demands it results in a significant growth of the namecache with almost no improvement in the hit ratio. Sample result after about 90 minutes of poudriere -j 104: current no evict % of the original numneg 219737 2013157 916 numneghits 266711906 263544562 98 [1] [1] this may look funny but there is a certain dose of variation to the build The number was chosen as something which mostly eliminates spurious evictions during lighter workloads but still keeps the total at bay. Sponsored by: The FreeBSD Foundation	2019-09-27 19:14:03 +00:00
Mateusz Guzik	e643141838	cache: stop requeuing negative entries on the hot list Turns out it does not improve hit ratio, but it does come with a cost stemming from dirtying hit entries. Sample result: hit counts of evicted entries after 2 buildworlds before: value ------------- Distribution ------------- count -1 \| 0 0 \|@@@@@@@@@@@@@@@@@@@@@@@@@ 180865 1 \|@@@@@@@ 49150 2 \|@@@ 19067 4 \|@ 9825 8 \|@ 7340 16 \|@ 5952 32 \|@ 5243 64 \|@ 4446 128 \| 3556 256 \| 3035 512 \| 1705 1024 \| 1078 2048 \| 365 4096 \| 95 8192 \| 34 16384 \| 26 32768 \| 23 65536 \| 8 131072 \| 6 262144 \| 0 after: value ------------- Distribution ------------- count -1 \| 0 0 \|@@@@@@@@@@@@@@@@@@@@@@@@@ 184004 1 \|@@@@@@ 47577 2 \|@@@ 19446 4 \|@ 10093 8 \|@ 7470 16 \|@ 5544 32 \|@ 5475 64 \|@ 5011 128 \| 3451 256 \| 3002 512 \| 1729 1024 \| 1086 2048 \| 363 4096 \| 86 8192 \| 26 16384 \| 25 32768 \| 24 65536 \| 7 131072 \| 5 262144 \| 0 Sponsored by: The FreeBSD Foundation	2019-09-27 19:13:22 +00:00
Mateusz Guzik	312196df0f	cache: make negative list shrinking a little bit concurrent Continue protecting demotion from the hotlist and selection of the target list with the ncneg_shrink_lock lock, but drop it before relocking to zap the node. While here count how many times we skipped shrinking due to the lock being already taken. Sponsored by: The FreeBSD Foundation	2019-09-27 19:12:43 +00:00
Mateusz Guzik	95c6dd890a	cache: stop recalculating upper limit each time a new entry is added Sponsored by: The FreeBSD Foundation	2019-09-27 19:12:20 +00:00
Ed Maste	43b40779db	controlelf: exit with error if file endianness does not match host We need to add support for cross-endian operation, but until that's done just exit with an error rather than misbehaving.	2019-09-27 19:07:11 +00:00
Ed Maste	6abfc627d6	controlelf: simplify feature string parsing Also add error handling on failure to seek/write updated value.	2019-09-27 18:49:13 +00:00
Konstantin Belousov	df08823d07	Improve MD page fault handlers. Centralize calculation of signal and ucode delivered on unhandled page fault in new function vm_fault_trap(). MD trap_pfault() now almost always uses the signal numbers and error codes calculated in consistent MI way. This introduces the protection fault compatibility sysctls to all non-x86 architectures which did not have that bug, but apparently they were already much more wrong in selecting delivered signals on protection violations. Change the delivered signal for accesses to mapped area after the backing object was truncated. According to POSIX description for mmap(2): The system shall always zero-fill any partial page at the end of an object. Further, the system shall never write out any modified portions of the last page of an object which are beyond its end. References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal. An implementation may generate SIGBUS signals when a reference would cause an error in the mapped object, such as out-of-space condition. Adjust according to the description, keeping the existing compatibility code for SIGSEGV/SIGBUS on protection failures. For situations where kernel cannot handle page fault due to resource limit enforcement, SIGBUS with a new error code BUS_OBJERR is delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on amd64 due to protection key access violation. vm_fault_hold() is renamed to vm_fault(). Fixed some nits in trap_pfault()s like mis-interpreting Mach errors as errnos. Removed unneeded truncations of the fault addresses reported by hardware. PR: 211924 Reviewed by: alc Discussed with: jilles, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21566	2019-09-27 18:43:36 +00:00
Ed Maste	3801c66a97	controlelf: tidy up option parsing Sponsored by: The FreeBSD Foundation	2019-09-27 18:39:05 +00:00
Ed Maste	d70d327f0e	controlelf: add protmax control Sponsored by: The FreeBSD Foundation	2019-09-27 17:28:25 +00:00
Dimitry Andric	1a444441d8	Correct the final argument name in the top(1) manpage. The description talks about 'number', while the final argument was 'count'. Since 'count' is already used for the count of displays, change the final argument name to 'number'. MFC after: 3 days	2019-09-27 17:11:21 +00:00
Ed Maste	33ac844050	controlelf: some style(9) cleanup Submitted by: clang-format	2019-09-27 16:57:32 +00:00
Mark Johnston	7cc833c598	Fix a race in vm_page_swapqueue(). vm_page_swapqueue() atomically transitions a page between queues. To do so, it must hold the page queue lock for the old queue. However, once the queue index has been updated, the queue lock no longer protects the page's queue state. Thus, we must speculatively remove the page from the old queue before committing the queue state update, and roll back if the update fails. Reported and tested by: pho Reviewed by: kib Sponsored by: Intel, Netflix Differential Revision: https://reviews.freebsd.org/D21791	2019-09-27 16:46:08 +00:00
Ed Maste	02ebf42d80	controlelf: install standard BSD 2 clause license Reported by: kaktus Sponsored by: The FreeBSD Foundation	2019-09-27 16:44:29 +00:00
Mark Johnston	87e93ea6c3	Fix object locking in vm_object_unwire() after r352174. Now, vm_page_busy_sleep() expects the page's object to be locked. vm_object_unwire() does some unusual lazy locking of the object chain and keeps objects locked until a busy page is encountered or the loop terminates. When a busy page is encountered, rather than unlocking all but the "bottom-level" object, we must instead skip the object to which "tm" belongs. Reported and tested by: pho Reviewed by: kib Discussed with: jeff Sponsored by: Intel, Netflix Differential Revision: https://reviews.freebsd.org/D21790	2019-09-27 16:41:34 +00:00
Ed Maste	d6f95d83b4	controlelf: clean up warnings - use explicit ELF note name when not found - no trailing . on warnings - no \n Sponsored by: The FreeBSD Foundation	2019-09-27 16:35:08 +00:00
Conrad Meyer	963c89ff4e	nvdimm(4): Extract ACPI root bus driver No functional change intended. The intent is to add a "legacy" e820 pmem newbus bus for nvdimm device in a subsequent revision, and it's a little more clear if the parent buses get independent source files. Quite a lot of ACPI-specific logic is left in nvdimm.c; disentangling that is a much larger change (and probably not especially useful). Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21813	2019-09-27 16:32:44 +00:00
Ed Maste	d5cd80c811	Add tool to modify ELF binary feature control bits This will allow feature control bits (e.g. for ASLR, PROT_MAX) to be inspected or modified. Some clean-up and additional work is likely still required, but we can iterate on this in the tree. Submitted by: Bora Özarslan <borako.ozarslan@gmail.com> Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19290	2019-09-27 16:27:52 +00:00
Andrew Turner	50bb04b750	Check the vfs option length is valid before accessing through When a VFS option passed to nmount is present but NULL the kernel will place an empty option in its internal list. This will have a NULL pointer and a length of 0. When we come to read one of these the kernel will try to load from the last address of virtual memory. This is normally invalid so will fault resulting in a kernel panic. Fix this by checking if the length is valid before dereferencing. MFC after: 3 days Sponsored by: DARPA, AFRL	2019-09-27 16:22:28 +00:00
Warner Losh	4470d73996	Document variadic args as int, since you can't have short variadic args (they are promoted to ints). - `mode_t` is `uint16_t` (`sys/sys/_types.h`) - `openat` takes variadic args - variadic args cannot be 16-bit, and indeed the code uses int - the manpage currently kinda implies the argument is 16-bit by saying `mode_t` Prompted by Rust things: https://github.com/tailhook/openat/issues/21 Submitted by: Greg V at unrelenting Differential Revision: https://reviews.freebsd.org/D21816	2019-09-27 16:11:47 +00:00
Ed Maste	cdb42801d0	compiler-rt: correct RISC-V struct_kernel_stat64_sz The value of struct_kernel_stat64_sz introduced by review D5021 for RISC-V was incorrect. Also add a __riscv_xlen == 64 conditional as the 32-bit ABI is not yet finalized. Submitted by: Luís Marques Differential Revision: https://reviews.freebsd.org/D21684	2019-09-27 13:14:36 +00:00
Pawel Biernacki	944e67b9cb	Add myself (kaktus) as a src committer. Reviewed by: kib (mentor) Approved by: kib (mentor), mjg (mentor) Differential Revision: https://reviews.freebsd.org/D21811	2019-09-27 10:19:28 +00:00
Alexander Motin	630d9800a1	Replace argument checks with assertions. Those functions are used by kernel, and we can't check all possible argument errors in production kernel. Plus according to docs many of those errors are checked by hardware. Assertions should just help with code debugging. MFC after: 2 weeks	2019-09-27 02:09:20 +00:00
Cy Schubert	a97e8d2fe4	Implement the dynamic add (-A) and removal (-R) of ippool pools from the command line. Prior to this the functionality was mostly there however since the pool type (-t) was not recognized by the -A and -R command options -- not recognized by getopt(). Additionally the code to implement the dynamic add and removal of pools didn't work. When dynamically adding (-A) a pool a type (-t) to specify if the pool is a tree or hash pool must be specified. When dynamically removing (-R) a pool, omitting -t will cause a search-and-destroy which will remove both types of pools matching the name given (-m). PR: 218433 MFC after: 1 week	2019-09-27 00:29:12 +00:00
Cy Schubert	e7257e1499	The no resolve (OPT_NORESOLVE) does nothing. Additionally, it (-R) conflicts with the command option of the same name (also -R). Remove the superfluous and confusing non-global non-command -R option. PR: 218433 MFC after: 1 week	2019-09-27 00:29:09 +00:00
Cy Schubert	80aa6435f0	Sync with source: Only a role of "ipf" is currently supported as the other documented (and undocumented) roles are #ifdef'd out. The plan is to complete ippool(8) as it is even in its current state a powerful feature/tool. PR: 218433 MFC after: 1 month	2019-09-27 00:29:06 +00:00
Cy Schubert	a263199455	Fix a typo. MFC after: 3 days	2019-09-27 00:29:03 +00:00
Gleb Smirnoff	05ee75efe7	Move EPOCH_TRACE to opt_global.h, so that any external modules that use epoch don't need Makefile tweaks. The downside is that any developer who wants EPOCH_TRACE needs to rebuild kernel in full, but that's fine. Reviewed by: imp	2019-09-26 21:12:47 +00:00
Oleksandr Tymoshenko	17b984a638	snd_hda: Add Intel Cannon Lake support Add missing header change omitted in r352775 MFC after: 2 weeks X-MFC-with: 352775	2019-09-26 21:04:36 +00:00
Oleksandr Tymoshenko	c314e2aff2	snd_hda: Add Intel Cannon Lake support Add PCI ids for Intel Cannon Lake PCH Tested on: HP Spectre x360 13-p0043dx PR: 240574 Submitted by: Neel Chauhan <neel@neelc.org> Reviewed by: imp, mizhka, ray MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D21789	2019-09-26 21:02:21 +00:00
Kyle Evans	e12ff89136	Further normalize copyright notices - s/C/c/ where I've been inconsistent about it - +SPDX tags - Remove "All rights reserved" where possible Requested by: rgrimes (all rights reserved)	2019-09-26 16:19:22 +00:00
David Bright	d4f4430503	Correct mistake in MLINKS introduced in r352747 Messed up a merge conflict resolution and didn't catch that before commit. Sponsored by: Dell EMC Isilon	2019-09-26 16:13:17 +00:00
David Bright	c4571256af	sysent: regenerate after r352747. Sponsored by: Dell EMC Isilon	2019-09-26 15:41:10 +00:00
Mark Johnston	55248d32f2	Fix handling of invalid pages in exec_map_first_page(). exec_map_first_page() would unconditionally free an unbacked, invalid page from the executable image. However, it is possible that the page is wired, in which case it is incorrect to free the page, so check for additional wirings first. Reported by: syzkaller Tested by: pho Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21767	2019-09-26 15:35:35 +00:00
David Bright	9afb12bab4	Add an shm_rename syscall Add an atomic shm rename operation, similar in spirit to a file rename. Atomically unlink an shm from a source path and link it to a destination path. If an existing shm is linked at the destination path, unlink it as part of the same atomic operation. The caller needs the same permissions as shm_unlink to the shm being renamed, and the same permissions for the shm at the destination which is being unlinked, if it exists. If those fail, EACCES is returned, as with the other shm_* syscalls. truss support is included; audit support will come later. This commit includes only the implementation; the sysent-generated bits will come in a follow-on commit. Submitted by: Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: jilles (earlier revision) Reviewed by: brueffer (manpages, earlier revision) Relnotes: yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21423	2019-09-26 15:32:28 +00:00
Jonathan T. Looney	0b18fb0798	Add new functionality to switch to using cookies exclusively when the syn cache overflows. Whether this is due to an attack or due to the system having more legitimate connections than the syn cache can hold, this situation can quickly impact performance. To make the system perform better during these periods, the code will now switch to exclusively using cookies until the syn cache stops overflowing. In order for this to occur, the system must be configured to use the syn cache with syn cookie fallback. If syn cookies are completely disabled, this change should have no functional impact. When the system is exclusively using syn cookies (either due to configuration or the overflow detection enabled by this change), the code will now skip acquiring a lock on the syn cache bucket. Additionally, the code will now skip lookups in several places (such as when the system receives a RST in response to a SYN\|ACK frame). Reviewed by: rrs, gallatin (previous version) Discussed with: tuexen Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21644	2019-09-26 15:18:57 +00:00
Jonathan T. Looney	0bee4d631a	Access the syncache secret directly from the V_tcp_syncache variable, rather than indirectly through the backpointer to the tcp_syncache structure stored in the hashtable bucket. This also allows us to remove the requirement in syncookie_generate() and syncookie_lookup() that the syncache hashtable bucket must be locked. Reviewed by: gallatin, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21644	2019-09-26 15:06:46 +00:00
Jonathan T. Looney	867e98f8ee	Remove the unused sch parameter to the syncache_respond() function. The use of this parameter was removed in r313330. This commit now removes passing this now-unused parameter. Reviewed by: gallatin, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21644	2019-09-26 15:02:34 +00:00
Alexander Motin	34a5c41c43	Add kern.cam.da.X.quirks tunable, similar to the existing one for ada. Submitted by: Michael Lass MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20677	2019-09-26 14:48:39 +00:00
Ed Maste	20bd59416d	bspatch: add integer overflow checks Introduce a new add_off_t static function that exits with an error message if there's an overflow, otherwise returns their sum. Use this when adding values obtained from the input patch. Reviewed by: delphij, allanjude (earlier) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D7897	2019-09-26 13:27:25 +00:00
Toomas Soome	11fc80a098	kernel terminal should initialize fg and bg variables before calling TUNABLE_INT_FETCH We have two ways to check if a kenv variable exists - either we check the return value from TUNABLE_INT_FETCH, or we pre-initialize the variable and check if this value did change. In terminal_init() it is more convenient to use pre-initialized variables. Problem was revealed by older loader.efi, which did not set teken.* variables. Reported by: tuexen	2019-09-26 07:19:26 +00:00
Toomas Soome	29f7096df9	vt: use proper return value check with TUNABLE_INT_FETCH The TUNABLE_INT_FETCH is macro around getenv_int() and we will get return value 0 or 1 for failure or success, we can use it to decide which background color to use.	2019-09-26 07:14:54 +00:00
Cy Schubert	4fcb870612	Teach the ippool parser about address families. This is a precursor to implementing IPv6 support within ippool which requires reworking radix_ipf.c. MFC after: 1 month	2019-09-26 03:09:45 +00:00
Cy Schubert	d096bd7911	ipf mistakenly regards UDP packets with a checksum of 0xffff as bad. Obtained from: NetBSD fil.c r1.30, NetBSD PR/54443 MFC after: 3 days	2019-09-26 03:09:42 +00:00
Rick Macklem	ee7201a725	Replace all mtx_assert() calls for n_mtx and ncl_iod_mutex with macros. To be consistent with replacing the mtx_lock()/mtx_unlock() calls on the NFS node mutex (n_mtx) and ncl_iod_mutex, this patch replaces all mtx_assert() calls on these mutexes with macros as well. This will simplify changing these locks to sx locks in a future commit. However, this change may be delayed indefinitely, since it appears there is a deadlock when vnode_pager_setsize() is called to shrink the size and the NFS node lock is held. There is no semantic change as a result of this commit. Suggested by: kib MFC after: 1 week	2019-09-26 02:54:45 +00:00
Conrad Meyer	407c48f060	amd64 pmap: Clarify largemap bootverbose message units A PML4 covers 512 gigabytes, not gigabits. Use the typical B suffix for bytes. No functional change. Sponsored by: Dell EMC Isilon	2019-09-26 01:51:55 +00:00
Conrad Meyer	f7b69dd986	amd64: Expose vm.pmap.large_map_pml4_entries as a sysctl node It's nice to have sysctl nodes for tunables. Sponsored by: Dell EMC Isilon	2019-09-26 01:50:26 +00:00
Martin Matuska	f057565e0d	MFV r352731: Sync libarchive with vendor. Relevant vendor changes: Issue #1237: Fix integer overflow in archive_read_support_filter_lz4.c PR #1249: Correct some typographical and grammatical errors. PR #1250: Minor corrections to the formatting of manual pages MFC after: 1 week	2019-09-26 01:50:20 +00:00
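
The bspatch entry above (20bd59416d) describes an add_off_t helper that exits with an error message when adding two values read from a patch would overflow. A minimal sketch of that pattern follows; it is not the committed code. It assumes int64_t in place of off_t for portability, and the names checked_add_i64 and add_or_die, as well as the error message text, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: store a + b in *sum and return true, or return
 * false without touching *sum if the addition would overflow.
 * The bounds are tested before adding, because signed overflow is
 * undefined behavior in C and cannot be detected after the fact.
 */
static bool
checked_add_i64(int64_t a, int64_t b, int64_t *sum)
{
	if ((b > 0 && a > INT64_MAX - b) ||
	    (b < 0 && a < INT64_MIN - b))
		return (false);
	*sum = a + b;
	return (true);
}

/*
 * Hypothetical wrapper mirroring the behavior described in the commit
 * message: exit with an error message on overflow, otherwise return
 * the sum.
 */
static int64_t
add_or_die(int64_t a, int64_t b)
{
	int64_t sum;

	if (!checked_add_i64(a, b, &sum)) {
		fprintf(stderr, "offset overflow in patch data\n");
		exit(1);
	}
	return (sum);
}
```

Under these assumptions, a patch reader would call add_or_die() wherever it combines an offset with a length taken from untrusted input, so that a crafted value cannot wrap around and bypass a later bounds check.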


244565 Commits