freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	dd388cfd9b	The net.inet.tcp.nolocaltimewait=1 optimization prevents local TCP connections from entering the TIME_WAIT state. However, it omits sending the ACK for the FIN, which results in RST. This becomes a bigger deal if the sysctl net.inet.tcp.blackhole is 2. In this case RST isn't sent, so the other side of the connection (also local) keeps retransmitting FINs. To fix that in tcp_twstart() we will not call tcp_close() immediately. Instead we will allocate a tcptw on stack and proceed to the end of the function all the way to tcp_twrespond(), to generate the correct ACK, then we will drop the last PCB reference. While here, make a few tiny improvements: - use bools for boolean variable - staticize nolocaltimewait - remove pointless acquisition of socket lock Reported by: jtl Reviewed by: jtl Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D14697	2018-03-21 20:59:30 +00:00
Conrad Meyer	0e33efe4e4	Import Blake2 algorithms (blake2b, blake2s) from libb2 The upstream repository is on github BLAKE2/libb2. Files landed in sys/contrib/libb2 are the unmodified upstream files, except for one difference: secure_zero_memory's contents have been replaced with explicit_bzero() only because the previous implementation broke powerpc link. Preferential use of explicit_bzero() is in progress upstream, so it is anticipated we will be able to drop this diff in the future. sys/crypto/blake2 contains the source files needed to port libb2 to our build system, a wrapped (limited) variant of the algorithm to match the API of our auth_transform softcrypto abstraction, incorporation into the Open Crypto Framework (OCF) cryptosoft(4) driver, as well as an x86 SSE/AVX accelerated OCF driver, blake2(4). Optimized variants of blake2 are compiled for a number of x86 machines (anything from SSE2 to AVX + XOP). On those machines, FPU context will need to be explicitly saved before using blake2(4)-provided algorithms directly. Use via cryptodev / OCF saves FPU state automatically, and use via the auth_transform softcrypto abstraction does not use FPU. The intent of the OCF driver is mostly to enable testing in userspace via /dev/crypto. ATF tests are added with published KAT test vectors to validate correctness. Reviewed by: jhb, markj Obtained from: github BLAKE2/libb2 Differential Revision: https://reviews.freebsd.org/D14662	2018-03-21 16:18:14 +00:00
Conrad Meyer	5fbc5b5a3c	cryptosoft(4): Zero plain hash contexts, too An OCF-naive user program could use these primitives to implement HMAC, for example. This would make the freed context sensitive data. Probably other bzeros in this file should be explicit_bzeros as well. Future work. Reviewed by: jhb, markj Differential Revision: https://reviews.freebsd.org/D14662 (minor part of a larger work)	2018-03-21 16:12:07 +00:00
Stephen Hurd	7021bf0569	Update copyright per Matthew Macy "Under my tutelage Nicole did 85% of the work. At the time it seemed simplest for a number of reasons to put my copyright on it. I now consider that to have been a mistake." Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Approved by: shurd Differential Revision: https://reviews.freebsd.org/D14766	2018-03-21 15:57:36 +00:00
Jonathan T. Looney	7fb2986ff6	If the INP lock is uncontested, avoid taking a reference and jumping through the lock-switching hoops. A few of the INP lookup operations that lock INPs after the lookup do so using this mechanism (to maintain lock ordering): 1. Lock lookup structure. 2. Find INP. 3. Acquire reference on INP. 4. Drop lock on lookup structure. 5. Acquire INP lock. 6. Drop reference on INP. This change provides a slightly shorter path for cases where the INP lock is uncontested: 1. Lock lookup structure. 2. Find INP. 3. Try to acquire the INP lock. 4. If successful, drop lock on lookup structure. Of course, if the INP lock is contested, the functions will need to revert to the previous way of switching locks safely. This saves a few atomic operations when the INP lock is uncontested. Discussed with: gallatin, rrs, rwatson MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D12911	2018-03-21 15:54:46 +00:00
Andrew Turner	d614c09a82	Use a table to find the endpoint configuration On the Allwinner SoCs we need to set a custom endpoint configuration. To allow for this use a table to store the configuration so the attachment can override it. Reviewed by: hselasky Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14783	2018-03-21 15:17:54 +00:00
Warner Losh	7961a77148	Mark psycho interrupts as MPSAFE. It's safe to do so now that we don't need Giant to call shutdown_nice().	2018-03-21 14:47:17 +00:00
Warner Losh	026fb270ca	Unlock giant when calling shutdown_nice()	2018-03-21 14:47:12 +00:00
Warner Losh	b799e21b28	This is MPSAFE on this platform, so don't take Giant out while running the callback.	2018-03-21 14:47:08 +00:00
Warner Losh	9b4bb7d500	These interrupts call shutdown_nice() which should be called Giant unlocked. Rather than dropping it in the interrupt handler, mark these handlers as MPSAFE.	2018-03-21 14:47:03 +00:00
Warner Losh	3e867f24cb	bufshutdown is no longer called with Giant held, so there's no need to drop or pickup Giant anymore. Remove that code and adjust comments.	2018-03-21 14:46:59 +00:00
Warner Losh	d5292812f8	Remove Giant from init creation and vfs_mountroot. Sponsored by: Netflix Discussed with: kib@, mckusick@ Differential Review: https://reviews.freebsd.org/D14712	2018-03-21 14:46:54 +00:00
Warner Losh	df4ee7639e	Revert r331273: "Release the "TUR" reference when clearing the TUR work flag. We mostly" It exposes other issues, so revert to the previous state of known issues.	2018-03-21 12:55:59 +00:00
Konstantin Belousov	661722e76f	Move sysinit and sysuninit linker sets in the data (writeable) section. Both sets are sorted in place, and with the introduction of read-only permissions on the amd64 kernel text, the sorting override depended on CR0.WP turned off. Make it correct by moving the sets into writeable part of the KVA, also fixing boot on machines where hand-off from BIOS to OS occurs with CR0.WP set. Based on submission by: Peter Lei <peter.lei@ieee.org> MFC after: 1 week	2018-03-21 10:26:39 +00:00
Conrad Meyer	c37125d9e5	Add missed sys/limits.h include Apparently header pollution on x86 hid its absence. Sorry, other arch users. Fix the missed header introduced in r331279. Reported by: tinderbox	2018-03-21 03:43:40 +00:00
Conrad Meyer	4948f7bf11	Regenerate sysent files after r331279.	2018-03-21 01:17:01 +00:00
Conrad Meyer	e9ac27430c	Implement getrandom(2) and getentropy(3) The general idea here is to provide userspace programs with well-defined sources of entropy, in a fashion that doesn't require opening a new file descriptor (ulimits) or accessing paths (/dev/urandom may be restricted by chroot or capsicum). getrandom(2) is the more general API, and comes from the Linux world. Since our urandom and random devices are identical, the GRND_RANDOM flag is ignored. getentropy(3) is added as a compatibility shim for the OpenBSD API. truss(1) support is included. Tests for both system calls are provided. Coverage is believed to be at least as comprehensive as LTP getrandom(2) test coverage. Additionally, instructions for running the LTP tests directly against FreeBSD are provided in the "Test Plan" section of the Differential revision linked below. (They pass, of course.) PR: 194204 Reported by: David CARLIER <david.carlier AT hardenedbsd.org> Discussed with: cperciva, delphij, jhb, markj Relnotes: maybe Differential Revision: https://reviews.freebsd.org/D14500	2018-03-21 01:15:45 +00:00
Jamie Gritton	672756aa9f	Represent boolean jail options as an array of structures containing the flag and both the regular and "no" names, instead of two different string arrays whose indices need to match the flag's bit position. This makes them similar to the way "jailsys" options are represented. Loop through either kind of option array with a structure pointer rather than an integer index.	2018-03-20 23:08:42 +00:00
Alexander V. Chernikov	b2b7ca49dc	Use counter(9) api for the bpf(4) statistics. Currently each bpf descriptor uses u64 variables to maintain its counters. On interfaces with high packet rate this leads to unnecessary contention and inaccurate reporting. PR: kern/205320 Reported by: elofu17 at hotmail.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D14726	2018-03-20 22:57:06 +00:00
Warner Losh	7b0eb8dbf8	Release the "TUR" reference when clearing the TUR work flag. We mostly do this right, except when there's no BP and we do a TUR by request. In that case, we clear the flag, but don't release the reference, leaking the reference on rare occasion. PR: 226510 Sponsored by: Netflix	2018-03-20 22:07:45 +00:00
Gleb Smirnoff	83fc34ea0d	At this point iwmesg isn't initialized yet, so print pointer to lock rather than panic before panicking.	2018-03-20 22:05:21 +00:00
Warner Losh	4e96c99bdf	Push down Giant one layer. In the days of yore, back when Pentiums were the new kids on the block and F00F hacks were all the rage, one needed to take out Giant to do anything moderately complicated with the VM, mappings and such. So the pccard / cardbus code held Giant for the entire insertion or removal process. Today, the VM is MP safe. The lock is only needed for dealing with newbus things. Move locking and unlocking Giant to be only around adding and probing devices in pccard and cardbus.	2018-03-20 22:01:18 +00:00
Mark Johnston	1de56ac728	Revert part of r331264: disable interrupts before disabling WP. We might otherwise be preempted, leaving WP disabled while another thread runs on the CPU. Reported by: kib X-MFC with: r331264	2018-03-20 21:36:35 +00:00
Warner Losh	b6735f6ff5	Drop support for lint for cdefs.h.	2018-03-20 21:18:40 +00:00
Warner Losh	39e5f2b207	Remove obsolete lint support.	2018-03-20 21:17:48 +00:00
Mark Johnston	7a79ce2e38	Make use of the KPI added in r331252. MFC after: 2 weeks	2018-03-20 21:16:26 +00:00
Ed Maste	6e7f286b47	Restore close quote lost in r331254	2018-03-20 21:04:47 +00:00
John Baldwin	e875be212d	Use <stdarg.h> instead of <machine/stdarg.h> in userland. <machine/stdarg.h> is a kernel-only header. The standard header for userland is <stdarg.h>. Using the standard header in userland avoids weird build errors when building with external compilers that include their own stdarg.h header. Reviewed by: arichardson, brooks, imp Sponsored by: DARPA / AFRL Differential Revision: https://reviews.freebsd.org/D14776	2018-03-20 21:00:45 +00:00
Konstantin Belousov	8fbcc3343f	Move the CR0.WP manipulation KPI to x86. This should allow to avoid some #ifdefs in the common x86/ code. Requested by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-03-20 20:20:49 +00:00
Ed Maste	b7d779b3e5	Make linuxulator fn declaration match definition I accidentally swapped 'linux_fixup_elf' to 'linux_elf_fixup' in amd64's declaration (only), while bringing this change over from git and encountering a conflict.	2018-03-20 19:28:52 +00:00
Ed Maste	fc2a8776a2	Rename assym.s to assym.inc assym is only to be included by other .s files, and should never actually be assembled by itself. Reviewed by: imp, bdrewery (earlier) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D14180	2018-03-20 17:58:51 +00:00
Konstantin Belousov	9cffc92c62	Disable write protection around patching of XSAVE instruction in the context switch code. Some BIOSes give control to the OS with CR0.WP already set, making the kernel text read-only before cpu_startup(). Reported by: Peter Lei <peter.lei@ieee.org> Reviewed by: jtl Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14768	2018-03-20 17:47:29 +00:00
Konstantin Belousov	2337dc6430	Provide KPI for handling of rw/ro kernel text. This is a pure syntax patch to create an interface to enable and later restore write access to the kernel text and other read-only mapped regions. It is in line with e.g. vm_fault_disable_pagefaults() by allowing the nesting. Discussed with: Peter Lei <peter.lei@ieee.org> Reviewed by: jtl Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14768	2018-03-20 17:43:50 +00:00
John Baldwin	fd40ecf3d4	Set the proper vnet in IPsec callback functions. When using hardware crypto engines, the callback functions used to handle an IPsec packet after it has been encrypted or decrypted can be invoked asynchronously from a worker thread that is not associated with a vnet. Extend 'struct xform_data' to include a vnet pointer and save the current vnet in this new member when queueing crypto requests in IPsec. In the IPsec callback routines, use the new member to set the current vnet while processing the modified packet. This fixes a panic when using hardware offload such as ccr(4) with IPsec after VIMAGE was enabled in GENERIC. Reported by: Sony Arpita Das and Harsh Jain @ Chelsio Reviewed by: bz MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14763	2018-03-20 17:05:23 +00:00
Konstantin Belousov	79e9552ebb	Check for wrap-around in vm_phys_alloc_seg_contig(). It is possible to provide insane values for size in contigmalloc(9) request, which usually does not reach the phys allocator due to failing KVA allocation. But with the forthcoming 4/4 i386, where 32bit architecture has almost 4G KVA, contigmalloc(1G) is not unreasonable outright and KVA might be available sometimes. Then, the calculation of pa_end could wrap around, depending on the physical address, and the checks in vm_phys_alloc_seg_contig() would pass while the iteration in the loop after the 'done' label goes out of the vm_page_array bounds. Fix it by detecting the wrap. Reported and tested by: pho Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14767	2018-03-20 16:17:55 +00:00
Mark Johnston	8c7549da2b	Drop KTR_CONTENTION. It is incomplete, has not been adopted in the other locking primitives, and we have other means of measuring lock contention (lock_profiling, lockstat, KTR_LOCK). Drop it to slightly de-clutter the mutex code and free up a precious KTR class index. Reviewed by: jhb, mjg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D14771	2018-03-20 15:51:05 +00:00
Andrew Turner	ed4c884f2e	Check if the gettime runtime service is valid. The U-Boot efi runtime service expects us to set the address map before calling any runtime services. It will then remap a few functions to their runtime version. One of these is the gettime function. If we call into this without having set a runtime map we get a page fault. Add a check to see if this is valid in efi_init() so we don't try to use the possibly invalid pointer. Reviewed by: imp, kevans (both previous version) X-MFC-With: r330868 Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14759	2018-03-20 13:35:20 +00:00
Warner Losh	400326b667	Kill assert I shouldn't have committed	2018-03-20 13:14:10 +00:00
Warner Losh	afdbfe1e1b	Starting LBA is a 64bit number, so use htole64 instead of htole32. The latter casts the LBA to a 32-bit number before assigning it to the 64 bit structure entity. This works fine on the first 2TB of TRIMs, but terribly beyond that due to truncation. Also, add an assert to make sure we don't send too many DSM TRIM entries in one request. Sponsored by: Netflix	2018-03-20 03:37:14 +00:00
Warner Losh	6f591d13fd	Make kern.cam.nda.num_trim tunable to limit the number of BIO_DELETE requests that we'll collapse into one DSM_TRIM. By default it is 256, which is the max that will fit into a 4k page. Sponsored by: Netflix	2018-03-20 03:37:09 +00:00
Warner Losh	fdfc0a83a3	Remove some redundant MPSAFE flags. This was pointed out in a code review I'm having trouble finding right now, but go ahead and eliminate these. Sponsored by: Netflix	2018-03-20 03:37:04 +00:00
Ed Maste	0a26f9316a	Rationalize license text on Linuxolator files i386 linux.h missed in r330239. Approved by: sos MFC after: 1 week Sponsored by: The FreeBSD Foundation	2018-03-20 02:50:11 +00:00
Justin Hibbits	2acde6a85a	Cast through uintptr_t to narrow the buf domain pointer on 32-bit archs arg2 is an intmax_t, which on 32-bit architectures is 64 bits, wider than a pointer. When &bdomain[i] is added to arg2 it widens from uintptr_t to intmax_t, then gcc whines when it gets cast to a pointer. Casting through uintptr_t silences this warning.	2018-03-20 02:01:30 +00:00
Justin Hibbits	a029f84189	Fix powerpc Book-E build post-331018/331048. pagedaemon_wakeup() was moved from vm_pageout.h to vm_pagequeue.h.	2018-03-20 01:07:22 +00:00
Oleksandr Tymoshenko	108117cc22	[ofw] fix erroneous checks for OF_finddevice(9) return value OF_finddevice returns ((phandle_t)-1) in case of failure. Some code in existing drivers checked return value to be equal to 0 or less/equal to 0 which is also wrong because phandle_t is unsigned type. Most of these checks were for negative cases that were never triggered so there was no impact on functionality. Reviewed by: nwhitehorn MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D14645	2018-03-20 00:03:49 +00:00
Alexander Motin	5f5baf0e96	Update mpr(4) driver from v15 to v18 from Broadcom site. Version 16 is just a number bump, since we already had those changes. Version 17 introduces new AdapterType value, that allows new user-space tools from Broadcom to differentiate adapter generations 3 and 3.5. Version 18 updates headers and adds SAS_DEVICE_DISCOVERY_ERROR reporting. MFC after: 2 weeks	2018-03-19 23:21:45 +00:00
Matt Joras	d6160f6079	Fix initialization of eventhandler mutex. mtx_init does not do a copy of the name string it is passed. The eventhandler code incorrectly passed the parameter string directly to mtx_init instead of using the copy it makes. This was an existing problem with the code that I dutifully copied over in my changes in r325621. Reported by: Anton Rang <rang AT acm.org> Reviewed by: rstone, markj Approved by: rstone (mentor) MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14764	2018-03-19 22:43:27 +00:00
Ed Maste	dc85846736	Rename linuxulator functions with linux_ prefix It's preferable to have a consistent prefix. This also reduces differences between the three linux*_sysvec.c files. Sponsored by: Turing Robotic Industries Inc.	2018-03-19 21:26:32 +00:00
Kristof Provost	b4b8fa3387	pf: Fix memory leak in DIOCRADDTABLES If a user attempts to add two tables with the same name the duplicate table will not be added, but we forgot to free the duplicate table, leaking memory. Ensure we free the duplicate table in the error path. Reported by: Coverity CID: 1382111 MFC after: 3 weeks	2018-03-19 21:13:25 +00:00
Eric Joyner	7d48aa4c72	ixgbe(4): Update shared code, add support for X552 1G, fix bug This patch will: - Update ixgbe shared code - Add support for Intel(R) Ethernet Connection X552 1000BASE-T - Add error handling for link state check preventing VF from stopping traffic after changing PF's MTU value Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com> Reviewed by: Intel Networking Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D13885	2018-03-19 20:55:05 +00:00
Kenneth D. Merry	0afdc47158	cam_periph_acquire() now returns an errno. The ch(4) driver was missed in change 328918, which changed cam_periph_acquire() to return an errno instead of cam_status. As a result, ch(4) failed to attach. Sponsored by: Spectra Logic	2018-03-19 20:19:00 +00:00
John Baldwin	7af5f2acfb	Fix a typo. Reviewed by: kib	2018-03-19 17:14:56 +00:00
Lawrence Stewart	370efe5ac8	Add support for the experimental Internet-Draft "TCP Alternative Backoff with ECN (ABE)" proposal to the New Reno congestion control algorithm module. ABE reduces the amount of congestion window reduction in response to ECN-signalled congestion relative to the loss-inferred congestion response. More details about ABE can be found in the Internet-Draft: https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn The implementation introduces four new sysctls: - net.inet.tcp.cc.abe defaults to 0 (disabled) and can be set to non-zero to enable ABE for ECN-enabled TCP connections. - net.inet.tcp.cc.newreno.beta and net.inet.tcp.cc.newreno.beta_ecn set the multiplicative window decrease factor, specified as a percentage, applied to the congestion window in response to a loss-based or ECN-based congestion signal respectively. They default to the values specified in the draft i.e. beta=50 and beta_ecn=80. - net.inet.tcp.cc.abe_frlossreduce defaults to 0 (disabled) and can be set to non-zero to enable the use of standard beta (50% by default) when repairing loss during an ECN-signalled congestion recovery episode. It enables a more conservative congestion response and is provided for the purposes of experimentation as a result of some discussion at IETF 100 in Singapore. The values of beta and beta_ecn can also be set per-connection by way of the TCP_CCALGOOPT TCP-level socket option and the new CC_NEWRENO_BETA or CC_NEWRENO_BETA_ECN CC algo sub-options. Submitted by: Tom Jones <tj@enoti.me> Tested by: Tom Jones <tj@enoti.me>, Grenville Armitage <garmitage@swin.edu.au> Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D11616	2018-03-19 16:37:47 +00:00
Emmanuel Vadot	d9d3a08ed4	sys/dts: Remove arm64 from subdir as it no longer exists. r325987 removed the arm64 directory, remove it from SUBDIR too.	2018-03-19 15:35:26 +00:00
Ed Maste	9bec2ea66e	linux*_sysvec.c: rationalize whitespace and comments There's a fair amount of duplication between MD linuxulator files. Make indentation and comments consistent between the three versions of linux_sysvec.c to reduce diffs when comparing them. Sponsored by: Turing Robotic Industries Inc.	2018-03-19 15:11:10 +00:00
Hans Petter Selasky	5cd5781c75	Remove redundant integer cast in ibcore. The "ref_count" field already has integer type. MFC after: 1 week Sponsored by: Mellanox Technologies	2018-03-19 13:51:33 +00:00
Ian Lepore	d892051323	Add the device/chip type to the disk d_descr field, and print more info about the chip including the erase block size at attach time. Also add myself to the copyrights since at this point svn blame would point to me as the culprit for much of this.	2018-03-18 18:58:47 +00:00
Ian Lepore	3c9af13c75	Add support for 4K and 32K erase block sizes. Many of the supported chips have these flags set in the ident table, but there was no code to support using the smaller erase sizes.	2018-03-18 18:37:47 +00:00
Ian Lepore	c03ab159f6	Make all internal routines return an int error status, and check the status at all call points. Combine the get_status and wait_for_ready routines, since waiting for ready is the only reason to ever get status.	2018-03-18 17:47:57 +00:00
Ian Lepore	89a1585b8d	Add sc_parent to the softc and use it in place of device_get_parent() calls all over the place. Also pass the softc as the arg to all the internal functions instead of passing a device_t and calling device_get_softc() in each function.	2018-03-18 17:25:23 +00:00
Mark Johnston	95099bbad1	Fix an access of an uninitialized variable in dtrace_probe(). Reported by: Coverity, via cem MFC after: 3 days	2018-03-18 17:01:50 +00:00
Ian Lepore	89a895b63c	Bugfix: wait for writes/erases to complete after starting them, instead of before starting them. Using the wait-before logic would make sense if there was useful time- consuming work that could be done between the end of one write and the beginning of the next, but it also requires doing the wait-for-ready before reading, because a prior write or erase could still be in progress. Reading is the far more common case, so adding a whole extra bus transaction to check for ready before each read would soak up any small gains that might be had from doing async writes.	2018-03-18 16:52:31 +00:00
Mark Johnston	c6a70eaea8	Avoid dequeuing the fault page during a soft fault. Such pages are re-enqueued at the end of the fault handler, preserving LRU. Rather than performing two separate operations per fault, simply requeue the page at the end of the fault (or bump its activation count if it resides in PQ_ACTIVE, avoiding the page queue lock entirely). This elides some page lock and page queue lock operations in common cases, e.g., CoW faults. Note that we must still dequeue the source page for "optimized" CoW faults since the page may not remain enqueued while it is moved to another object. Reviewed by: alc, kib Tested by: pho MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D14625	2018-03-18 16:49:30 +00:00
Mark Johnston	0eb50f9cd2	Have vm_page_{deactivate,launder}() requeue already-queued pages. In many cases the page is not enqueued so the change will have no effect. However, the change is needed to support an optimization in the fault handler and in some cases (sendfile, the buffer cache) it was being emulated by the caller anyway. Reviewed by: alc Tested by: pho MFC after: 2 weeks X-Differential Revision: https://reviews.freebsd.org/D14625	2018-03-18 16:40:56 +00:00
Ian Lepore	19aa9f7183	Eliminate some unneeded intermediate variables. Eliminate some redundant parens in shift-and-mask expressions. Reword and reflow some comments.	2018-03-18 16:36:14 +00:00
Mark Johnston	434862acb1	Have vm_page_replace() assert that the new page is not enqueued. The new page does not belong to a VM object, but the page daemon does not expect to encounter such pages. Reviewed by: alc, kib Tested by: pho MFC after: 1 week X-Differential Revision: https://reviews.freebsd.org/D14625	2018-03-18 16:35:40 +00:00
Ian Lepore	f432eb7ea1	Remove a pointless KASSERT and reword a comment a bit. The KASSERT tested for the same condition that the preceding lines checked for and would have returned EIO, so the assert could never possibly trigger (sc_sectorsize must inherently be an integer multiple of FLASH_PAGE_SIZE).	2018-03-18 16:10:14 +00:00
Ian Lepore	dac94adb63	Do not overwrite the contents of BIO_WRITE buffers. SPI inherently transfers data in both directions at once. When writing to the device, use a dummy buffer for the incoming data, not the same buffer as the outgoing data. Writes are done in FLASH_PAGE_SIZE chunks, which is only 256 bytes, so just put the dummy buffer into the softc.	2018-03-18 15:56:10 +00:00
Mariusz Zaborski	9ea857cf0f	Remove unneeded variable which was introduced in r328472. Pointed out by: pjd@	2018-03-18 15:09:55 +00:00
Conrad Meyer	22aec4de9f	lib(private)zstd: Fix riscv build Link __bswap[ds]i2() intrinsics in to libzstd for riscv, where the C runtime apparently lacks such intrinsics. Broken in r330894. Reported by: asomers Sponsored by: Dell EMC Isilon	2018-03-18 03:42:57 +00:00
Mateusz Guzik	09bdec20a0	locks: slightly depessimize lockstat The slow path is always taken when lockstat is enabled. This induces rdtsc (or other) calls to get the cycle count even when there was no contention. Still go to the slow path to not mess with the fast path, but avoid the heavy lifting unless necessary. This reduces sys and real time during -j 80 buildkernel: before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total So note this is still a significant hit. LOCK_PROFILING results are not affected.	2018-03-17 19:26:33 +00:00
Jeff Roberson	3cec5c77d6	Move the dirty queues inside the per-domain structure. This resolves a bug where we had not hit global dirty limits but a single queue was starved for space by dirty buffers. A single buf_daemon is maintained for now. Add a bd_speedup() when we are low on bufspace. This can happen due to SUJ keeping many bufs locked until a cg block is written. Document this with a comment. Fix sysctls to work with per-domain variables. Add more ddb debugging. Reported by: pho Reviewed by: kib Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14705	2018-03-17 18:14:49 +00:00
Alan Somers	b521cf275c	audit(4): fix a typo in a comment no functional change	2018-03-17 17:56:08 +00:00
Warner Losh	8d1b99a023	Use kern.opts.mk instead of bsd.own.mk (which includes src.opts.mk) here.	2018-03-17 17:18:46 +00:00
Warner Losh	8346f88834	Use FreeBSD-current conventions for building options rather than FreeBSD 10 conventions: include kern.opts.mk.	2018-03-17 17:18:41 +00:00
Warner Losh	d4dccac09a	Remove commented out code to generate opt_inet*.h. That's handled automatically by kern.opts.mk now. Include that instead.	2018-03-17 17:18:37 +00:00
Warner Losh	4dcef3bca1	Add EFI to kernel options. Some parts of MI modules will soon depend on whether EFI is available or not. Add EFI to the list of kernel options so we can use it in the modules build.	2018-03-17 17:18:29 +00:00
Alexander V. Chernikov	1435dcd94f	Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration. Current arp/nd code relies on the feedback from the datapath indicating that the entry is still used. This mechanism is incorporated into the arpresolve()/nd6_resolve() routines. After the inpcb route cache introduction, the packet path for the locally-originated packets changed, passing cached lle pointer to the ether_output() directly. This resulted in the arp/ndp entry expiring each time exactly after the configured max_age interval. During the small window between the ARP/NDP request and reply from the router, most of the packets got lost. Fix this behaviour by plugging datapath notification code to the packet path used by route cache. Unify the notification code by using single inlined function with the per-AF callbacks. Reported by: sthaug at nethelp.no Reviewed by: ae MFC after: 2 weeks	2018-03-17 17:05:48 +00:00
Warner Losh	378e38c1cf	Only take out the periph lock when we're modifying the flags of the softc for an async unit attention. CAM sometimes takes the periph lock and other times does not. We were taking the lock always and running into lock recursion issues on a non-recursive lock. Now we take it selectively. It's not clear why xpt takes the lock selectively before calling us, though, and that's still under investigation. Reported by: avg PR: 226510 (same panic, different circumstances) Sponsored by: Netflix	2018-03-17 16:04:06 +00:00
Ed Maste	f91e2d3a95	Move assym.s to DPSRCS in vmbus module assym.s is only to be included by other .s files, and should not actually be assembled by itself.	2018-03-17 14:50:20 +00:00
Ed Maste	d8ba45e213	Revert r313780 (UFS_ prefix)	2018-03-17 12:59:55 +00:00
Ed Maste	1e2b9afca9	Prefix UFS symbols with UFS_ to reduce namespace pollution Followup to r313780. Also prefix ext2's and nandfs's versions with EXT2_ and NANDFS_. Reported by: kib Reviewed by: kib, mckusick Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D9623	2018-03-17 01:48:27 +00:00
Ed Maste	4e78ff7068	ANSIfy sys/x86	2018-03-17 01:40:09 +00:00
Brooks Davis	28e7752907	Add _IOC_NEWLEN() and _IOC_NEWTYPE() macros. These macros take an existing ioctl(2) command and replace the length with the specified length or length of the specified type respectively. These can be used to define commands for 32-bit compatibility with fewer opportunities for cut-and-paste errors than a whole new definition. Reviewed by: cem, kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14706	2018-03-16 22:23:04 +00:00
Conrad Meyer	db488e4f52	random(4): Poll for signals during large reads Occasionally poll for signals during large reads of the /dev/u?random devices. This allows cancellation via SIGINT of accidental invocations of very large reads. (A 2GB /dev/random read, which takes about 10 seconds on my 2017 AMD Zen processor, can be aborted.) I believe this behavior was intended since 2014 (r273997), just not fully implemented. This is motivated by a potential getrandom(2) interface that may not explicitly forbid extremely large reads on 64-bit platforms -- even larger than the 2GB limit imposed on devfs I/O by default. Such reads, if they are to be allowed, should be cancellable by the user or administrator. Reviewed by: delphij Approved by: secteam (delphij) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14684	2018-03-16 18:50:26 +00:00
Ian Lepore	9c45f7b4fd	Use EFI RTC capabilities info when registering, add bootverbose diagnostics. Make some small improvements to the efirtc driver by obtaining the clock capabilities (resolution and whether the sub-second counters are reset) and using the info when registering the clock. When the hardware zeroes out the subsecond info on clock-set, schedule clock updates to happen just before top-of-second, so that the RTC time is closely in-sync with kernel time. Also, in the identify() routine, always add the driver if EFI runtime services are available, then decide in probe() whether to attach the driver or not. If not attaching and bootverbose is on, say why. All of this is basically to avoid "silent failure" -- if someone thinks there should be an efi rtc and it's not attaching, at least they can set bootverbose and maybe get a clue from the output. Differential Revision: https://reviews.freebsd.org/D14565 (timed out)	2018-03-16 18:16:27 +00:00
Ian Lepore	9c237f3a13	Add the header file needed for the recently-added call to pagedaemon_wakeup().	2018-03-16 16:06:25 +00:00
Michael Tuexen	1574b1e41e	Set the inp_vflag consistently for accepted TCP/IPv6 connections when net.inet6.ip6.v6only=0. Without this patch, the inp_vflag would have both the INP_IPV4 and the INP_IPV6 flags set for accepted TCP/IPv6 connections if the sysctl variable net.inet6.ip6.v6only is 0. This caused netstat to report the source and destination addresses as IPv4 addresses, even though they are IPv6 addresses. PR: 226421 Reviewed by: bz, hiren, kib MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D13514	2018-03-16 15:26:07 +00:00
Ed Maste	d595c5c0d6	linux_errno.c: add newer errno values Also introduce a static assert to ensure the list is kept up to date. Sponsored by: Turing Robotic Industries Inc.	2018-03-16 14:51:47 +00:00
Ed Maste	6e481f83f7	Share a single bsd-linux errno table across MD consumers Three copies of the linuxulator linux_sysvec.c contained identical BSD to Linux errno translation tables, and future work to support other architectures will also use the same table. Move the table to a common file to be used by all. Make it 'const int' to place it in .rodata. (Some existing Linux architectures use MD errno values, but x86 and Arm share the generic set.) This change should introduce no functional change; a followup will add missing errno values. MFC after: 3 weeks Sponsored by: Turing Robotic Industries Inc. Differential Revision: https://reviews.freebsd.org/D14665	2018-03-16 14:46:38 +00:00
Ed Maste	1d071ce340	Move assym.s to DPSRCS in sgx module assym.s is only to be included by other .s files, and should not actually be assembled by itself.	2018-03-16 13:33:42 +00:00
Ed Maste	dd851c507b	ANSIfy i386/vm86.c	2018-03-16 12:12:41 +00:00
Conrad Meyer	27cb8d849f	Garbage collect unused chacha20 code Two copies of chacha20 were imported into the tree on Apr 15 2017 (r316982) and Apr 16 2017 (r317015). Only the latter is actually used by anything, so just go ahead and garbage collect the unused version while it's still only in CURRENT. I'm not making any judgement on which implementation is better. If I pulled the wrong one, feel free to swap the existing implementation out and replace it with the other code (conforming to the API that actually gets used in randomdev, of course). We only need one generic implementation. Sponsored by: Dell EMC Isilon	2018-03-16 07:11:53 +00:00
Conrad Meyer	5d3b36666b	Fix GCC build: Remove redundant pagedaemon_wakeup declaration Introduced in r331018. Reported by: kevans Sponsored by: Dell EMC Isilon	2018-03-16 07:05:09 +00:00
Warner Losh	d85d964829	Try polling the qpairs on timeout. On some systems, we're getting timeouts when we use multiple queues on drives that work perfectly well on other systems. On a hunch, Jim Harris suggested I poll the completion queue when we get a timeout. This patch polls the completion queue if no fatal status was indicated. If it had pending I/O, we complete that request and return. Otherwise, if aborts are enabled and no fatal status, we abort the command and return. Otherwise we reset the card. This may clear up the problem, or we may see it result in lots of timeouts and a performance problem. Either way, we'll know the next step. We may also need to pay attention to the fatal status bit of the controller. PR: 211713 Suggested by: Jim Harris Sponsored by: Netflix	2018-03-16 05:23:48 +00:00
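The timeout handling described in d85d964829 amounts to a three-way decision. The sketch below restates only the order of checks given in the commit message; the function name, the callback `poll_completed_io` and the returned strings are hypothetical and do not correspond to nvme(4) driver symbols.

```python
def handle_timeout(fatal_status, poll_completed_io, aborts_enabled):
    """Pick the recovery action for a timed-out command.

    fatal_status      -- True if the controller reported a fatal status
    poll_completed_io -- callable; polls the completion queue and returns
                         True if it found and completed pending I/O
    aborts_enabled    -- True if the abort command may be used
    """
    # Step 1: with no fatal status, poll the completion queue first.
    if not fatal_status and poll_completed_io():
        return "completed"
    # Step 2: nothing was pending; abort the command if that is allowed.
    if aborts_enabled and not fatal_status:
        return "abort"
    # Step 3: otherwise fall back to resetting the controller.
    return "reset"
```

Note that a fatal status skips the poll entirely, so the callback is never invoked in that case.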
Ian Lepore	3a25d855de	Add required interface header. Reported by: andreast@	2018-03-16 02:46:08 +00:00
Andriy Voskoboinyk	5f792f7478	rtwn(4): de-hardcode ('h/w rate index' - 'corresponding MCS index') constant	2018-03-16 01:03:10 +00:00
Andriy Voskoboinyk	46e18fc6d4	urtw(4), zyd(4): reduce code verbosity. No functional change intended.	2018-03-16 00:38:10 +00:00
Andriy Voskoboinyk	2757acf673	urtw(4): provide names for some commonly used rate indices + drop now-unused urtw_rate2rtl()	2018-03-16 00:09:16 +00:00
Andriy Voskoboinyk	2a440d19c1	Correct comment for IFM_IEEE80211_VHT media variant.	2018-03-15 23:32:29 +00:00
Brooks Davis	064c9c3d42	Add a request structure and make the implementation use it. This allows compatibility translation to take place on the stack (md_ioctl is too big) and is more suitable as a public interface within the kernel than the kern_ioctl interface. Except for the initialization of the md_req from the md_ioctl (including detection of kernel md_file pointers) and the updating of the md_ioctl prior to return, this is a mechanical replacement of md_ioctl and mdio with md_req and mdr. Reviewed by: markj, cem, kib (assorted versions) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14704	2018-03-15 21:42:49 +00:00
Jeff Roberson	30fbfdda6c	Eliminate pageout wakeup races. Take another step towards lockless vmd_free_count manipulation. Reduce the scope of the free lock by using a pageout lock to synchronize sleep and wakeup. Only trigger the pageout daemon on transitions between states. Drive all wakeup operations directly as side-effects from freeing memory rather than requiring an additional function call. Reviewed by: markj, kib Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14612	2018-03-15 19:23:07 +00:00
Brooks Davis	b65794ad84	Move implementation of ioctls into kern_*() functions. Move locks from outside ioctl to the individual implementations. This is the first step of changing the implementations to act on a kernel-internal request struct rather than on struct md_ioctl and to removing the use of kern_ioctl in mountroot. Reviewed by: cem, kib, markj (prior version) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14700	2018-03-15 18:12:55 +00:00
Edward Tomasz Napierala	6616539dcc	Fix iSCSI target crash on session reinstation. The crash scenario goes like this: there's a thread waiting on "reinstate"; because it doesn't update the timeout counter it gets terminated by the callout; at this point the maintenance thread starts the termination routine. The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops the refcount, which allows the maintenance thread to free its resources. At this point another thread receives a PDU. Boom. PR: 222898, 219866 Reported by: Eugene M. Zheganin <emz at norma.perm.ru> Tested by: Eugene M. Zheganin <emz at norma.perm.ru> Reviewed by: mav@ (earlier version) MFC after: 2 weeks Sponsored by: playkey.net	2018-03-15 17:36:13 +00:00
Brooks Davis	94598ac9f9	Restore the behavior of returning the total number of units by unconditionally incrementing i in the loop. Reported by: cem MFC with: r330880 Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14685	2018-03-15 16:37:43 +00:00
Conrad Meyer	8475a4175f	aesni(4): Stylistic/comment enhancements Improve clarity of a comment and style(9) some areas. No functional change. Reported by: markj (on review of a mostly-copied driver) Sponsored by: Dell EMC Isilon	2018-03-15 16:17:02 +00:00
Andriy Gapon	aca41af247	g_access: deal with races created by geoms that drop the topology lock The problem is that g_access() must be called with the GEOM topology lock held. And that gives a false impression that the lock is indeed held across the call. But this isn't always true because many classes, ZVOL being one of the many, need to drop the lock. It's either to perform an I/O on the first open or to acquire a different lock (like in g_mirror_access). That, of course, can break many assumptions. For example, g_slice_access() adds an extra exclusive count on the first open. As described above, an underlying geom may drop the topology lock and that would open a race with another thread that would also request another extra exclusive count. In general, two consumers may be granted incompatible accesses. To avoid this problem the code is changed to mark a geom with special flag before calling its access method and clear the flag afterwards. If another thread sees that flag, then it means that the topology lock has been dropped (either by the geom in question or downstream from it), so it is not safe to make another access call. So, the second thread would use g_topology_sleep() to wait until the flag is cleared and only then would it proceed with the access. Also see http://docs.freebsd.org/cgi/mid.cgi?809d9254-ee56-59d8-69a4-08838e985cea PR: 225960 Reported by: asomers Reviewed by: markj, mav MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14533	2018-03-15 09:16:10 +00:00
Andriy Gapon	289c14e811	MFV r330973: 9164 assert: newds == os->os_dsl_dataset illumos/illumos-gate@5f5913bb83 https://www.illumos.org/issues/9164 This issue has been reported by Alan Somers as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225877 dmu_objset_refresh_ownership() first disowns a dataset (and releases it) and then owns it again. There is an assert that the new dataset object is the same as the old dataset object. When running ZFS Test Suite on FreeBSD we see this panic from zpool_upgrade_007_pos test: panic: solaris assert: newds == os->os_dsl_dataset (0xfffff80045f4c000 == 0xfffff80021ab4800) I see that the old dataset has dsl_dataset_evict_async() pending in ds_dbu.dbu_tqent and its ds_dbuf is NULL. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Don Brady <don.brady@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Andriy Gapon <avg@FreeBSD.org> PR: 225877 Reported by: asomers MFC after: 1 week	2018-03-15 08:49:21 +00:00
Wojciech Macek	d90930743f	Reverting r330925 for now	2018-03-15 06:19:45 +00:00
Alexander Motin	db08ef4353	Increase ABOUT FIRMWARE command timeout to 5s. It seems default timeout of 100ms is not enough for my 2694L card, while it was perfectly fine for others, even for full-height 2694. MFC after: 1 week Sponsored by: iXsystems, Inc.	2018-03-15 01:07:21 +00:00
Ed Maste	03d2db1542	Remove KERNEL_RETPOLINE from BROKEN_OPTIONS on i386 Clang will compile both amd64 and i386 with retpoline. Sponsored by: The FreeBSD Foundation	2018-03-15 00:57:57 +00:00
Jung-uk Kim	8438a7a80a	Merge ACPICA 20180313.	2018-03-14 23:45:48 +00:00
Jung-uk Kim	ba425ae46d	Remove local definitions for _STA method in favor of ACPICA. These macros were added in ACPICA 20051216, more than a decade ago.	2018-03-14 23:42:28 +00:00
Warner Losh	5d7fd8f726	Fix error messages in cut and pasted code. Also, fix an unnecessary deref to get ctrlr. Noticed by: rpokala@ Sponsored by: Netflix	2018-03-14 23:28:28 +00:00
Warner Losh	8b1e6ebe0e	When tearing down a queue pair, also delete the queue entries. The NVME standard has required in section 7.2.6, since at least 1.1, that a clean shutdown is signalled by deleting the submission and the completion queues before setting the shutdown bit in CC. The 1.0 standard, apparently, did not and many of the early Intel cards didn't care. Some newer cards care, at least one whose beta firmware can scramble the card on an unclean shutdown. Linux has done this for some time. To make it possible to move forward with an evaluation of this pre-release card with wonky firmware, delete the queues on the card when we delete the qpair structures. Sponsored by: Netflix	2018-03-14 23:01:18 +00:00
Warner Losh	d61cf64d0e	Don't make the namespace devices eternal. We'll need to delete namespaces soon, so go ahead and stop making these devices eternal. It doesn't help much, and will be getting in the way soon. Sponsored by: Netflix	2018-03-14 23:01:04 +00:00
Conrad Meyer	330b675f65	vfs_bio.c: Apply cleanups motivated by Coverity analysis It is believed that the conditions Coverity indicated were actually impossible to hit. So this patch just adds a cleanup to only compute v_mount once in brelse(), and in vfs_bio_getpages() always initializes error to zero to appease the static analyzer. No functional change intended. Submitted by: Darrick Lew <darrick.freebsd AT gmail.com> Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14613	2018-03-14 22:11:45 +00:00
Steven Hartland	7d147b81ea	Fix mps deadlock when handling panic During shutdown mps waits for its SSU requests to complete however when performing a reboot after handling a panic the scheduler is stopped so getmicrotime which is used can be non-functional. Switch to using the same method as shutdown_panic to ensure we actually complete. In addition reduce the timeout when RB_NOSYNC is set in howto as we expect this to fail. Reviewed by: slm MFC after: 1 week Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D12776	2018-03-14 21:32:23 +00:00
Steven Hartland	a59e720e65	Prevent ZFS TRIM breaking VTOC8 partitions Update the ZFS TRIM code to ensure it respects VTOC8 partition headers as documented by the ZFS On-Disk Specification section 1.3 Before this a zpool create on a VTOC8 partitioned device would overwrite the partition metadata. Reported by: marius Reviewed by: marius agv MFC after: 1 week Sponsored by: Multiplay	2018-03-14 21:21:03 +00:00
Brooks Davis	f287c3e4d3	Fix FSACTL_GET_NEXT_ADAPTER_FIB under 32-bit compat. This includes FSACTL_LNX_GET_NEXT_ADAPTER_FIB. Reviewed by: cem Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14672	2018-03-14 21:11:41 +00:00
John Baldwin	6b02bb1aa7	Fix the check for an empty send socket buffer on a TOE TLS socket. Compare sbavail() with the cached sb_off of already-sent data instead of always comparing with zero. This will correctly close the connection and send the FIN if the socket buffer contains some previously-sent data but no unsent data. Reported by: Harsh Jain @ Chelsio Sponsored by: Chelsio Communications	2018-03-14 20:49:51 +00:00
John Baldwin	02d2bcfaba	Remove TLS-related inlines from t4_tom.h to fix iw_cxgbe(4) build. - Remove the one use of is_tls_offload() and the function. AIO special handling only needs to be disabled when a TOE socket is actively doing TLS offload on transmit. The TOE socket's mode (which affects receive operation) doesn't matter, so remove the check for the socket's mode and only check if a TOE socket has TLS transmit keys configured to determine if an AIO write request should fall back to the normal socket handling instead of the TOE fast path. - Move can_tls_offload() into t4_tls.c. It is not used in critical paths, so inlining isn't that important. Change return type to bool while here. Sponsored by: Chelsio Communications	2018-03-14 20:46:25 +00:00
Brooks Davis	2026d70da4	Add opt_compat.h to isp(4) as required by r330876. MFC with: r330876	2018-03-14 20:07:52 +00:00
Hans Petter Selasky	cbfc3c73ce	Fix compliance of the kstrtoXXX() functions in the LinuxKPI, by skipping one newline character at the end, if any. Found by: greg@unrelenting.technology MFC after: 1 week Sponsored by: Mellanox Technologies	2018-03-14 19:51:28 +00:00
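The parsing rule named in cbfc3c73ce, accepting exactly one trailing newline and nothing else after the digits, can be illustrated as follows. This is a sketch of the rule only: `parse_decimal_strict` is a hypothetical name, it handles base 10 only, and it is not the LinuxKPI implementation.

```python
import re

# Optional sign followed by decimal digits, anchored at the true end of string.
_DECIMAL = re.compile(r"[+-]?[0-9]+\Z")


def parse_decimal_strict(text):
    """Parse a decimal integer, tolerating at most one trailing newline."""
    if text.endswith("\n"):
        text = text[:-1]          # skip one newline, as the commit describes
    if not _DECIMAL.match(text):
        # Anything left over (a second newline, spaces, letters) is an error.
        raise ValueError("invalid integer string: %r" % text)
    return int(text)
```

The single-newline allowance matters for values written with `echo`, which appends exactly one newline to the string it emits.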
Edward Tomasz Napierala	6960c4e135	Fix typo in a warning message. MFC after: 2 weeks	2018-03-14 18:27:06 +00:00
Nathan Whitehorn	7c95bf1e68	Fix fat-fingering ("optional standard") and move all the OF code to being marked "standard", which is less confusing than having it conditional on AIM CPUs here, and then picked up through options FDT from conf/files on Book-E. Request by: jhibbits	2018-03-14 18:07:40 +00:00
Warner Losh	d38677d23c	Create a sysctl kern.cam.{,a,n}da.X.invalidate kern.cam.{,a,n}da.X.invalidate=1 forces daX to detach by calling cam_periph_invalidate on the underlying periph. This is for testing purposes only. Include only with options CAM_TEST_FAILURE and rename the former [AN]DA_TEST_FAILURE, and fix nda to compile with it set. We're using it at work to harden geom and the buffer cache to be resilient in the face of drive failure. Today, it far too often results in a panic. While much work was done on SIM initiated removal for the USB thumb drive removal work, little has been done for periph initiated removal. This simulates what daerror() does for some errors nicely: we get the same panics with it that we do with failing drives. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D14581	2018-03-14 17:53:37 +00:00
Warner Losh	2a559cb8c8	This should have been += so clean builds work. Noticed by: hps@	2018-03-14 16:45:04 +00:00
Warner Losh	157cb465c4	Fix inverted logic that counted all completions as errors, except when they were actual errors. Sponsored by: Netflix	2018-03-14 16:44:57 +00:00
Warner Losh	807e94b2c3	Implement trim collapsing in nda When multiple trims are in the queue, collapse them as much as possible. At present, this usually results in only a few trims being collapsed together, but more work on that will make it possible to do hundreds (up to some configurable max). Sponsored by: Netflix	2018-03-14 16:44:50 +00:00
Warner Losh	8a3de7bc34	Allow NULL ccb to cam_iosched_bio_complete When the ccb is NULL to cam_iosched_bio_complete, just update the other statistics, but not the time. If many operations are collapsed together, this is needed to keep stats properly for the grouped bp. This should fix trim accounting. Sponsored by: Netflix	2018-03-14 16:44:16 +00:00
Nathan Whitehorn	94f513c8db	The expression (aim | fdt) is always true on PowerPC. The last PowerPC platform that can run without a device tree (PS3) still uses the OF_() functions to check if one exists and OF_ is used unconditionally in core parts of the system like powerpc/machdep.c. Reflect this reality in files.powerpc, for example by changing occurrences of aim | fdt to standard.	2018-03-14 16:16:25 +00:00
Ed Maste	7b194b3d3b	Remove stray ; at end of linux_vdso_deinstall()	2018-03-14 13:20:36 +00:00
Wojciech Macek	22eedd96c7	PowerNV: Fix I2C to compile if FDT is disabled Submitted by: Wojciech Macek <wma@semihalf.com> Obtained from: Semihalf Sponsored by: IBM, QCM Technologies	2018-03-14 09:20:03 +00:00
Conrad Meyer	052d3c1290	Update to Zstandard 1.3.3 Includes patch to conditionalize use of __builtin_clz(ll) on __has_builtin(). The issue is tracked upstream at https://github.com/facebook/zstd/pull/884 . Otherwise, these are vanilla Zstandard 1.3.3 files. Note that the 1.3.4 release should be due out soon. Sponsored by: Dell EMC Isilon	2018-03-14 03:00:17 +00:00
Warner Losh	f8f471cf5f	We need opt_compat.h after r330819 and 330820. Add opt_compat.h to fix the stand-alone build case. Sponsored by: Netflix.	2018-03-13 23:36:15 +00:00
John Baldwin	1e9538d253	Support for TLS offload of TOE connections on T6 adapters. The TOE engine in Chelsio T6 adapters supports offloading of TLS encryption and TCP segmentation for offloaded connections. Sockets using TLS are required to use a set of custom socket options to upload RX and TX keys to the NIC and to enable RX processing. Currently these socket options are implemented as TCP options in the vendor specific range. A patched OpenSSL library will be made available in a port / package for use with the TLS TOE support. TOE sockets can either offload both transmit and reception of TLS records or just transmit. TLS offload (both RX and TX) is enabled by setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be enabled on the relevant interface. Transmit offload can be used on any "normal" or TLS TOE socket by using the custom socket option to program a transmit key. This permits most TOE sockets to transparently offload TLS when applications use a patched SSL library (e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL library). Receive offload can only be used with TOE sockets using the TLS mode. The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a list of TCP port numbers. Any connection with either a local or remote port number in that list will be created as a TLS socket rather than a plain TOE socket. Note that although this sysctl accepts an arbitrary list of port numbers, the sysctl(8) tool is only able to set sysctl nodes to a single value. A TLS socket will hang without receiving data if used by an application that is not using a patched SSL library. Thus, the tls_rx_ports node should be used with care. For a server mostly concerned with offloading TLS transmit, this node is not needed as plain TOE sockets will fall back to software crypto when using an unpatched SSL library. 
New per-interface statistics nodes are added giving counts of TLS packets and payload bytes (payload bytes do not include TLS headers or authentication tags/MACs) offloaded via the TOE engine, e.g.: dev.cc.0.stats.rx_tls_octets: 149 dev.cc.0.stats.rx_tls_records: 13 dev.cc.0.stats.tx_tls_octets: 26501823 dev.cc.0.stats.tx_tls_records: 1620 TLS transmit work requests are constructed by a new variant of t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c. TLS transmit work requests require a buffer containing IVs. If the IVs are too large to fit into the work request, a separate buffer is allocated when constructing a work request. This buffer is associated with the transmit descriptor and freed when the descriptor is ACKed by the adapter. Received TLS frames use two new CPL messages. The first message is a CPL_TLS_DATA containing the decrypted payload of a single TLS record. The handler places the mbuf containing the received payload on an mbufq in the TOE pcb. The second message is a CPL_RX_TLS_CMP message which includes a copy of the TLS header and indicates if there were any errors. The handler for this message places the TLS header into the socket buffer followed by the saved mbuf with the payload data. Both of these handlers are contained in tom/t4_tls.c. A few routines were exposed from t4_cpl_io.c for use by t4_tls.c including send_rx_credits(), a new send_rx_modulate(), and t4_close_conn(). TLS keys for both transmit and receive are stored in onboard memory in the NIC in the "TLS keys" memory region. In some cases a TLS socket can hang with pending data available in the NIC that is not delivered to the host. As a workaround, TLS sockets are more aggressive about sending CPL_RX_DATA_ACK messages anytime that any data is read from a TLS socket. In addition, a fallback timer will periodically send CPL_RX_DATA_ACK messages to the NIC for connections that are still in the handshake phase.
Once the connection has finished the handshake and programmed RX keys via the socket option, the timer is stopped. A new function select_ulp_mode() is used to determine what sub-mode a given TOE socket should use (plain TOE, DDP, or TLS). The existing set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and handles initialization of TLS-specific state when necessary in addition to DDP-specific state. Since TLS sockets do not receive individual TCP segments but always receive full TLS records, they can receive more data than is available in the current window (e.g. if a 16k TLS record is received but the socket buffer is itself 16k). To cope with this, just drop the window to 0 when this happens, but track the overage and "eat" the overage as it is read from the socket buffer not opening the window (or adding rx_credits) for the overage bytes. Reviewed by: np (earlier version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14529	2018-03-13 23:05:51 +00:00
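The window handling at the end of 1e9538d253 (drop the window to 0, track the overage, and "eat" it as data is read) can be sketched as simple bookkeeping. The class and method names below are hypothetical and the model ignores everything else the driver does; it only shows how overage bytes are consumed before any credits are returned.

```python
class TlsRxWindow:
    """Toy model of a receive window that can be overrun by whole TLS records."""

    def __init__(self, window):
        self.window = window   # bytes the peer may still send
        self.overage = 0       # bytes received beyond the advertised window

    def record_received(self, nbytes):
        """Account for a full TLS record arriving from the adapter."""
        if nbytes > self.window:
            # The record did not fit: close the window and remember the excess.
            self.overage += nbytes - self.window
            self.window = 0
        else:
            self.window -= nbytes

    def data_read(self, nbytes):
        """Account for the application reading nbytes; return credits granted."""
        eaten = min(self.overage, nbytes)   # pay off the overage first
        self.overage -= eaten
        credits = nbytes - eaten            # only the remainder reopens the window
        self.window += credits
        return credits
```

For example, a 20000-byte record arriving against a 16384-byte window leaves 3616 bytes of overage, and the first 3616 bytes read afterwards return no credits.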
John Baldwin	9689995d23	Simplify error handling in t4_tom.ko module loading. - Change t4_ddp_mod_load() to return void instead of always returning success. This avoids having to pretend to have proper support for unloading when only part of t4_tom_mod_load() has run. - If t4_register_uld() fails, don't invoke t4_tom_mod_unload() directly. The module handling code in the kernel invokes MOD_UNLOAD on a module whose MOD_LOAD fails with an error already. Reviewed by: np (part of a larger patch) MFC after: 1 month Sponsored by: Chelsio Communications	2018-03-13 21:42:38 +00:00
Brooks Davis	c92c85ffeb	md_pad is used by MDIOCLIST and not available for future use. MFC after: 1 week	2018-03-13 20:54:18 +00:00
Brooks Davis	8b9f77a14c	Don't overflow the kernel struct mdio in the MDIOCLIST ioctl. Always terminate the list with -1 and document the ioctl behavior. This preserves existing behavior as seen from userspace with the addition of the unconditional termination which will not be seen by working consumers of MDIOCLIST. Because this ioctl can only be performed by root (in default configurations) and is not used in the base system this bug is not deemed to warrant either a security advisory or an errata notice. Reviewed by: kib Obtained from: CheriBSD Discussed with: security-officer (gordon) MFC after: 3 days Security: kernel heap buffer overflow Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14685	2018-03-13 20:39:06 +00:00
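The pattern behind 8b9f77a14c and its follow-up 94598ac9f9 is a bounded copy into a fixed-size array that is always terminated with -1 while still reporting the total count. The sketch below is an assumption-laden model rather than the md(4) code: the array size of 4, the use of slot 0 for the count, and the name `list_units` are all hypothetical.

```python
def list_units(units, npad=4):
    """Fill a fixed-size pad: [total count, unit, unit, ..., -1].

    At most npad - 2 unit numbers are stored so that the count in slot 0
    and the -1 terminator always fit, however many units exist.
    """
    pad = [0] * npad
    stored = units[: npad - 2]             # never write past the array
    pad[0] = len(units)                    # total count, even when truncated
    pad[1 : 1 + len(stored)] = stored
    pad[1 + len(stored)] = -1              # unconditional terminator
    return pad
```

A consumer that walks the list until it sees -1 stays inside the array, and one that compares the count in slot 0 with the number of entries it read can tell that the list was truncated.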
Brooks Davis	8037cdcd9a	Fix ISP_FC_LIP and ISP_RESCAN on big-endian 64-bit systems. For _IO() ioctls, addr is a pointer to uap->data which is a caddr_t. When the caddr_t stores an int, dereferencing addr as an (int *) results in truncation on little-endian 64-bit systems and corruption (owing to extracting top bits) on big-endian 64-bit systems. In practice the value of chan was probably always zero on systems of the latter type as all such FreeBSD platforms use a register-based calling convention. Reviewed by: mav Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14673	2018-03-13 19:56:10 +00:00
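The endianness effect described in 8037cdcd9a can be reproduced with a byte-level model: an 8-byte slot holds the value, and the buggy read takes only the first 4 bytes as an int. This is a minimal sketch using Python's `struct` module; `read_int_from_pointer_slot` is a hypothetical helper and nothing here is isp(4) code.

```python
import struct


def read_int_from_pointer_slot(value, byteorder):
    """Store value in a 64-bit slot, then read its first 4 bytes as an int.

    This mimics dereferencing the address of a pointer-sized slot as
    (int *) on a 64-bit machine of the given byte order.
    """
    prefix = "<" if byteorder == "little" else ">"
    slot = struct.pack(prefix + "q", value)             # 8-byte caddr_t-sized slot
    (as_int,) = struct.unpack(prefix + "i", slot[:4])   # the 4 bytes at the slot address
    return as_int


# Little-endian: the low 32 bits come first, so small values survive.
le_small = read_int_from_pointer_slot(5, "little")
# Big-endian: the high 32 bits come first, so the value is lost.
be_small = read_int_from_pointer_slot(5, "big")
```

On the little-endian layout the read silently truncates to the low 32 bits, while on the big-endian layout it returns the top 32 bits, which matches the truncation versus corruption distinction drawn in the commit message.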
Konstantin Belousov	741e1c9196	Revert the chunk from r330410 in vm_page_reclaim_run(). There, the pages freed might be managed but the page's lock is not owned. For KPI correctness, the page lock is required around the call to vm_page_free_prep(), which is asserted. Reclaim loop already did the work which could be done by vm_page_free_prep(), so the lock is not needed and the only consequence of not owning it is the assert trigger. Instead of adding the locking to satisfy the assert, revert to the code that calls vm_page_free_phys() directly. Reported by: pho Discussed with: jeff Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-03-13 18:27:23 +00:00
Nathan Whitehorn	9b0ec025d4	Restore missing temporary variable, deleted by accident in r330845. This unbreaks the ppc32 AIM build. Reported by: jhibbits	2018-03-13 18:24:21 +00:00
Kyle Evans	63ee68c220	EFIRT: SetVirtualAddressMap with 1:1 mapping after exiting boot services This fixes a problem encountered on the Lenovo Thinkpad X220/Yoga 11e where runtime services would try to inexplicably jump to other parts of memory where it shouldn't be when attempting to enumerate EFI vars, causing a panic. The virtual mapping is enabled by default and can be disabled by setting efi_disable_vmap in loader.conf(5). Reviewed by: kib (earlier version) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14677	2018-03-13 17:10:52 +00:00
Ed Maste	a95659f75f	Use C99 boolean type for translate_osrel Migrate to modern types before creating MD Linuxolator bits for new architectures. Reviewed by: cem Sponsored by: Turing Robotic Industries Inc. Differential Revision: https://reviews.freebsd.org/D14676	2018-03-13 16:40:29 +00:00
Nathan Whitehorn	8864f35942	Execute PowerPC64/AIM kernel from direct map region when possible. When the kernel can be in real mode in early boot, we can execute from high addresses aliased to the kernel's physical memory. If that high address has the first two bits set to 1 (0xc...), those addresses will automatically become part of the direct map. This reduces page table pressure from the kernel and it sets up the kernel to be used with radix translation, for which it has to be up here. This is accomplished by exploiting the fact that all PowerPC kernels are built as position-independent executables and relocate themselves on start. Before this patch, the kernel runs at 1:1 VA:PA, but that VA/PA is random and set by the bootloader. Very early, it processes its ELF relocations to operate wherever it happens to find itself. This patch uses that mechanism to re-enter and re-relocate the kernel a second time with a new base address set up in the early parts of powerpc_init(). Reviewed by: jhibbits Differential Revision: D14647	2018-03-13 15:03:58 +00:00
Kyle Evans	92f1731bf3	Correct minor typo in comment, efi_dmcap -> efi_tmcap	2018-03-13 15:02:46 +00:00
Kyle Evans	8521b4a9df	efirtc: Pass a dummy tmcap pointer to efi_get_time_locked As noted in the comment, UEFI spec claims the capabilities pointer is optional, but some implementations will choke and attempt to dereference it without checking. This specific problem was found on a Lenovo Thinkpad X220 that would panic in efirtc_identify.	2018-03-13 15:01:23 +00:00
Ed Maste	b7feabf906	Use C99 designated initializers for struct execsw It makes use slightly clearer and facilitates grepping.	2018-03-13 13:09:10 +00:00
Roger Pau Monné	4a6d4e7b58	at_rtc: check in ACPI FADT boot flags if the RTC is present Or else disable the device. Note that the detection can be bypassed by setting the hw.atrtc.enable option in the loader configuration file. More information can be found on atrtc(4). Sponsored by: Citrix Systems R&D Reviewed by: ian Differential revision: https://reviews.freebsd.org/D14399	2018-03-13 09:42:33 +00:00

121303 Commits