If an exception or NMI occurs before the CPU has switched to a pmap different
from vmspace0, the PCPU kcr3 is left zero for the PTI config, which causes
a triple fault in the handler.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
reassembly of inbound TCP segments. The old algorithm just blindly
dropped segments in without coalescing. This meant that every
segment could take up more and more room on the linked list
of segments. That list is now subject to a tighter limit (100
segments), which in a high-BDP situation makes us a lot less
efficient, since we drop segments we receive beyond 100 entries.
What this restructure does is make the reassembly buffer
coalesce segments, putting an emphasis on the two common cases
(which avoid walking the list of segments): adding to the back
of the queue of segments and adding to the front. The reassembly
buffer also supports a couple of debug options (black-box logging
as well as counters for code coverage). These are compiled out by
default but can be enabled by uncommenting the defines.
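As a minimal userspace sketch of the two fast paths (hypothetical names and
simplified bookkeeping, not the actual tcp_reass() code), assuming a
TAILQ-backed queue of segments:

#include <sys/queue.h>
#include <stdint.h>
#include <stdlib.h>

struct seg {
    TAILQ_ENTRY(seg) link;
    uint32_t start;             /* first sequence number held */
    uint32_t len;               /* bytes held */
};
TAILQ_HEAD(seghead, seg);

/*
 * Insert a segment, trying the two common cases first: it extends
 * the tail (in-order arrival) or it extends the head.  Both cases
 * coalesce in place instead of growing the list.
 */
static void
reass_insert(struct seghead *q, uint32_t start, uint32_t len)
{
    struct seg *tail = TAILQ_LAST(q, seghead);
    struct seg *head = TAILQ_FIRST(q);
    struct seg *s;

    if (tail != NULL && start == tail->start + tail->len) {
        tail->len += len;       /* common case 1: append and merge */
        return;
    }
    if (head != NULL && start + len == head->start) {
        head->start = start;    /* common case 2: prepend and merge */
        head->len += len;
        return;
    }
    /* Slow path (walking the list for overlaps) omitted. */
    s = calloc(1, sizeof(*s));
    s->start = start;
    s->len = len;
    if (tail == NULL || start > tail->start)
        TAILQ_INSERT_TAIL(q, s, link);
    else
        TAILQ_INSERT_HEAD(q, s, link);
}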
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D16626
has SMP enabled, lines can get intermixed with other console output,
making these lines hard to read.
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D16689
This is an amalgam of a patch by Doug Ambrisko to
generalize uart_acpi_find_device, imp's moving of the
ACPI table to uart_dev_ns8250.c, and advice from jhb
to work around a bug in the EPYC 3151 BIOS
(the BIOS incorrectly marks the serial ports as
disabled).
Reviewed by: imp
MFC after: 8 weeks
Differential Revision: https://reviews.freebsd.org/D16432
This is effectively a merge from amd64 of r312888, r323235, and r333486.
I've been running this on my POWER9 Talos for some time now with no ill
effects.
Suggested by: mjg
Recent DTS files use the syscon for the emac controller.
We support this, but since U-Boot still uses old DTS files it was never
necessary for us to add that support; it is a problem when using recent
upstream DTS files and will be one when U-Boot catches up.
While here, add a new compatible string to the aw_syscon driver, as Linux
changed it.
Current mitigation for L1TF in bhyve flushes L1D either by an explicit
WRMSR command, or by software reading enough uninteresting data to
fully populate all lines of L1D. If an NMI occurs after either
method completes, but before VM entry, L1D becomes polluted with
the cache lines touched by the NMI handlers. There is no interesting
data which the NMI accesses, but something sensitive might be co-located
on the same cache line, and then L1TF exposes that to a rogue guest.
Use the VM entry MSR load list to ensure atomicity of the L1D flush and
VM entry if updated microcode was loaded. If only the software flush
method is available, try to help the bhyve software flusher by also
flushing L1D on NMI exit to kernel mode.
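For reference, a hedged sketch of the hardware flush primitive; the MSR
number and bit follow Intel's documented IA32_FLUSH_CMD interface, while
the helper name is illustrative:

#include <sys/cdefs.h>
#include <stdint.h>

#define MSR_IA32_FLUSH_CMD  0x10b   /* Intel-documented MSR */
#define IA32_FLUSH_CMD_L1D  0x1     /* bit 0: writeback+invalidate L1D */

static inline void
flush_l1d_hw(void)
{
    uint64_t v = IA32_FLUSH_CMD_L1D;

    /* wrmsr: EDX:EAX is written to the MSR selected by ECX. */
    __asm __volatile("wrmsr" : : "c"((uint32_t)MSR_IA32_FLUSH_CMD),
        "a"((uint32_t)v), "d"((uint32_t)(v >> 32)));
}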
Suggested by and discussed with: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D16790
ObsoleteFiles.inc:
Remove manual pages for arc4random_addrandom(3) and
arc4random_stir(3).
contrib/ntp/lib/isc/random.c:
contrib/ntp/sntp/libevent/evutil_rand.c:
Eliminate in-tree usage of arc4random_addrandom().
crypto/heimdal/lib/roken/rand.c:
crypto/openssh/config.h:
Eliminate in-tree usage of arc4random_stir().
include/stdlib.h:
Remove arc4random_stir() and arc4random_addrandom() prototypes,
provide temporary shims for the transition period.
lib/libc/gen/Makefile.inc:
Hook arc4random-compat.c to build, add hint for Chacha20 source for
kernel, and remove arc4random_addrandom(3) and arc4random_stir(3)
links.
lib/libc/gen/arc4random.c:
Adopt OpenBSD arc4random.c,v 1.54 with bare minimum changes, use the
sys/crypto/chacha20 implementation of keystream.
lib/libc/gen/Symbol.map:
Remove arc4random_stir and arc4random_addrandom interfaces.
lib/libc/gen/arc4random.h:
Adopt OpenBSD arc4random.h,v 1.4 but provide _ARC4_LOCK of our own.
lib/libc/gen/arc4random.3:
Adopt OpenBSD arc4random.3,v 1.35 but keep FreeBSD r114444 and
r118247.
lib/libc/gen/arc4random-compat.c:
Compatibility shims for the arc4random_stir and arc4random_addrandom
functions to preserve ABI. Log once when called but do nothing
otherwise (see the sketch after this file list).
lib/libc/gen/getentropy.c:
lib/libc/include/libc_private.h:
Fold __arc4_sysctl into getentropy.c (renamed to arnd_sysctl).
Remove from libc_private.h as a result.
sys/crypto/chacha20/chacha.c:
sys/crypto/chacha20/chacha.h:
Make it possible to use the kernel implementation in libc.
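As a rough sketch of the arc4random-compat.c shims described above (the
warning mechanics here are illustrative; the real shims may log
differently):

#include <stdio.h>

/* ABI-preserving no-op shims; the symbols remain, the behavior is gone. */
void
arc4random_stir(void)
{
    static int warned;

    if (!warned)
        fprintf(stderr, "arc4random_stir: deprecated no-op\n");
    warned = 1;
}

void
arc4random_addrandom(unsigned char *dat, int datlen)
{
    static int warned;

    (void)dat;
    (void)datlen;
    if (!warned)
        fprintf(stderr, "arc4random_addrandom: deprecated no-op\n");
    warned = 1;
}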
PR: 182610
Reviewed by: cem, markm
Obtained from: OpenBSD
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D16760
The MPTable probe code was using PMAP_MAP_LOW as the PA -> VA offset
when searching for the table signature but still using KERNBASE once
it had found the table. As a result, the mpfps table pointed into a
random part of the kernel text instead of the actual MP Table.
Rather than adding more #ifdef's, use BIOS_PADDRTOVADDR from
<machine/pc/bios.h> which already uses PMAP_MAP_LOW on i386 and KERNBASE
on amd64.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D16802
blocks of a file as contiguously as possible. Since the filesystem
does not know how large a file will grow when it is first being
written, it initially places the file in a set of blocks in which
it currently fits. As it grows, it is relocated to areas with
larger contiguous blocks. In this way it saves its large contiguous
sets of blocks for the files that need them and thus avoids
unnecessarily fragmenting its disk space.
We used to skip reallocating the blocks of a file into a contiguous
sequence if the underlying flash device requested BIO_DELETE
notifications, because devices that benefit from BIO_DELETE also
benefit from not moving the data. However, in the algorithm described
above that reallocates the blocks, the destination for the data is
usually moved before the data is written to the initially allocated
location. So we rarely suffer the penalty of extra writes. With
the addition of the consolidation of contiguous blocks into single
BIO_DELETE operations, having fewer but larger contiguous blocks
reduces the number of (slow and expensive) BIO_DELETE operations.
So when doing BIO_DELETE consolidation, we do block reallocation.
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
When deleting files on filesystems that are stored on flash-memory
(solid-state) disk drives, the filesystem notifies the underlying
disk of the blocks that it is no longer using. The notification
allows the drive to avoid saving these blocks when it needs to
flash (zero out) one of its flash pages. These notifications of
no-longer-being-used blocks are referred to as TRIM notifications.
In FreeBSD these TRIM notifications are sent from the filesystem
to the drive using the BIO_DELETE command.
Until now, the filesystem would send a separate message to the drive
for each block of the file that was deleted. Each Gigabyte of file
size resulted in over 3000 TRIM messages being sent to the drive.
This burst of messages can overwhelm the drive's task queue causing
multiple second delays for read and write requests.
This implementation collects runs of contiguous blocks in the file
and then consolidates them into a single BIO_DELETE command to the
drive. The BIO_DELETE command describes the run of blocks as a
single large block being deleted. Each Gigabyte of file size can
result in as few as two BIO_DELETE commands, and typically fewer
than ten. Though these larger BIO_DELETE commands take longer to
run, they do not clog the drive task queue, so read and write
commands can intersperse effectively with them.
Though this new feature has been thoroughly reviewed and tested, it
is being added disabled by default so as to minimize the possibility
of disrupting the upcoming 12.0 release. It can be enabled by running
``sysctl vfs.ffs.dotrimcons=1''. Users are encouraged to test it.
If no problems arise, we will consider requesting that it be enabled
by default for 12.0.
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
The support for lazy pmap invalidations on i386 was removed in r281707.
This removes the constant for the IPI and stops accounting for it when
sizing the interrupt count arrays.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16801
The TCP client side, or the TCP server side when not using SYN cookies,
used the uptime as the TCP timestamp value. This patch uses an offset in
all cases, which is the result of a keyed hash function taking
the source and destination addresses and port numbers into account.
The keyed hash function is the same as the one used for the initial TSN.
Reviewed by: rrs@
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16636
becomes -1, except these are unsigned integers, so they become very large
numbers and thus are always larger than the maximum bucket; the hash table
insertion fails, causing NAT to fail.
This commit ensures that if the index is already zero it is not reduced
prior to insertion into the hash table.
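A minimal sketch of the guard (the variable and table names are
hypothetical stand-ins for the libalias internals):

static unsigned int
pick_bucket(unsigned int idx, unsigned int nbuckets)
{
    /*
     * idx is unsigned: decrementing 0 would wrap around to a huge
     * value that can never be a valid bucket index, so only
     * decrement while it is still positive.
     */
    if (idx != 0)
        idx--;
    return (idx % nbuckets);
}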
PR: 208566
are situated next to error counters and, in one instance, prior to the
-1 return from various functions. This was useful in the diagnosis of
PR/208566 and will be handy in the future when diagnosing NAT failures.
PR: 208566
MFC after: 3 days
Without this patch, some CS4614 cards require users to reload the driver
manually or the hardware won't be initialised properly. Something like:
# kldload snd_csa
# kldunload snd_csa
# kldload snd_csa
Tested with: Terratec SiXPack 5.1+
I was not aware Warner was making or planning to make forward progress in
this area and have since been informed of that.
It's easy to apply/reapply when churn dies down.
Inspired by r338025, just remove the element size parameter to the
MODULE_PNP_INFO macro entirely. The 'table' parameter is now required to
have correct pointer (or array) type. Since all invocations of the macro
already had this property and the emitted PNP data continues to include the
element size, there is no functional change.
Mostly done with the coccinelle 'spatch' tool:
$ cat modpnpsize0.cocci
@normaltables@
identifier b,c;
expression a,d,e;
declarer MODULE_PNP_INFO;
@@
MODULE_PNP_INFO(a,b,c,d,
-sizeof(d[0]),
e);
@singletons@
identifier b,c,d;
expression a;
declarer MODULE_PNP_INFO;
@@
MODULE_PNP_INFO(a,b,c,&d,
-sizeof(d),
1);
$ rg -l MODULE_PNP_INFO -- sys | \
xargs spatch --in-place --sp-file modpnpsize0.cocci
(Note that coccinelle invokes diff(1) via a PATH search and expects diff to
tolerate the -B flag, which BSD diff does not. So I had to link gdiff into
PATH as diff to use spatch.)
Tinderbox'd (-DMAKE_JUST_KERNELS).
driven by problems found with the algorithms being tested for TRIM
consolidation.
Reported by: Peter Holm
Suggested by: kib
Reviewed by: kib
Sponsored by: Netflix
instead of the size of the whole bge_devs array.
This should stop kldxref searching beyond the end of .rodata when it
processes relocations, and emitting "unhandled relocation type" errors,
at least on i386.
This is very primitive code to inspect the PCI error state and AER
error state, dump the log and clear errors, from ddb.
pci_print_faulted_dev() is made external to allow calling it from
other places. It was called from NMI handler but this chunk is not
included.
Also there is a tunable-controlled code to clear AER on device attach,
disabled by default.
All this code was useful to me when I debugged ACPI_DMAR failures (not
faults) a long time ago.
Reviewed by: cem, imp (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7813
- In configurations with a pseudo devices section, move 'device crypto'
into that section.
- Use a consistent comment. Note that other things common in kernel
configs such as GELI also require 'device crypto', not just IPSEC.
Reviewed by: rgrimes, cem, imp
Differential Revision: https://reviews.freebsd.org/D16775
The fallback logic was broken if hints were found in multiple environments.
If we found a hint in either the loader environment or the static
environment, fallback would be incremented excessively when we returned to
the environment-selection bits. These checks should have also been guarded
by the fbacklvl checks. As a result, fbacklvl could quickly get to a point
where we skip either the static environment and/or the static hints
depending on which environments contained valid hints.
The impact of this bug is minimal, mostly affecting mips boards that use
static hints and may have hints in either the loader environment or the
static environment.
There may be better ways to express the searchable environments and
describing their characteristics (immutable, already searched, etc.) but
this may be revisited after 12 branches.
Reported by: Dan Nelson <dnelson_1901@yahoo.com>
Triaged by: Dan Nelson <dnelson_1901@yahoo.com>
MFC after: 3 days
When coding the pNFS server, I added vn_start_write() calls in nfsrv_copymr()
done while the vnodes were locked, not realizing I had introduced LORs and
possible deadlock when an exported file system on the MDS is suspended.
This patch fixes the LORs by moving the vn_start_write() calls up to before
where the vnodes are locked. For "tvp", the vn_start_write() probably isn't
necessary, because NFS mounts can't be suspended. However, I think doing
so is harmless.
Thanks go to kib@ for letting me know that I had introduced these LORs.
This patch only affects the behaviour of the pNFS server when pnfsdscopymr(8)
is used to recover a mirrored DS.
The domain and flags parameters suffice. In fact, the related functions
kmem_alloc_{attr,contig}_domain() don't have an arena parameter.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D16713
When coding the pNFS server, I added several vn_start_write() calls done
while the vnode was locked, not realizing I had introduced LORs and
possible deadlock when an exported file system on the MDS is suspended.
This patch fixes this by removing the added vn_start_write() calls and
modifying the code so that the extant vn_start_write() call before the
NFS RPC/operation is done when needed by the pNFS server.
Flags are changed so that LayoutCommit and LayoutReturn now get a
vn_start_write() done for them.
When the pNFS server is enabled, the code now also changes the flags for
Getattr, so that the vn_start_write() is done for Getattr, since it may
need to do a vn_set_extattr(). The nfs_writerpc flag array was made global
to the NFS server and renamed nfsrv_writerpc, which is consistent naming
for globals in the NFS server.
Thanks go to kib@ for reporting that doing vn_start_write() while the vnode is
locked results in a LOR.
This patch only affects the behaviour of the pNFS server.
Some internal KASSERTs access the v_iflag field without the vnode
interlock held after such a refcount update. The fences are needed for
the assertions to be correct in the face of store reordering.
Reported and tested by: jhibbits
Reviewed by: kib, mjg
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16756
Relax allocation throttling for ditto blocks. Due to random imbalances
in allocation it tends to push block copies to one vdev that looks
slightly better at the moment. A slightly less strict policy allows us to
improve both data security and, surprisingly, write performance, since we
don't need to touch extra metaslabs on each vdev to respect the min distance.
Sponsored by: iXsystems, Inc.
Use METASLAB_WEIGHT_CLAIM weight to allocate tertiary blocks.
Previous use of METASLAB_WEIGHT_SECONDARY for that caused errors
later on metaslab_activate_allocator() call, leading to massive
load of unneeded metaslabs and write freezes.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Similar to the network stack issue fixed in r337782, pf did not limit the
number of fragments per packet, which could be exploited to generate high
CPU loads with a crafted series of packets.
Limit each packet to no more than 64 fragments. This should be sufficient on
typical networks to allow maximum-sized IP frames.
This addresses the issue for both IPv4 and IPv6.
MFC after: 3 days
Security: CVE-2018-5391
Sponsored by: Klara Systems
In the past, Capsicum allowed changing the process title.
This was broken by r335939.
PR: 230584
Submitted by: Yuichiro NAITO <naito.yuichiro@gmail.com>
Reported by: ian@niw.com.au
MFC after: 1 week
When a pNFS service is running, the sizes of the files created on the MDS
are normally 0, since the data is written to the data files on the DS(s).
However, without this patch, if a Setattr with a non-zero size was done by
a client, the MDS file was set to that size. This was thought to be benign,
but it turns out that files with a non-zero size plus extended attributes
can cause a "ffs_truncate3" panic in UFS. Although the exact cause of this
panic() has not been isolated, this patch avoids the panic() and leaves
the MDS files in a consistent state of always having a size == 0.
Note that these MDS files never store data. The patch also includes an
unnecessary initialization of savsize in case some compiler or static
analyser complains it might not be initialized.
This patch only affects the NFS server when pNFS is enabled via the "-p"
command line option on nfsd.
Fix a regression introduced in r336439.
Rather than allowing any linked list of algorithms, allow at most two
(typically, some combination of encrypt and/or MAC). Removes a WAITOK
malloc in an unsleepable context (classic LOR) by placing both software
algorithm contexts within the OCF-managed session object.
Tested with 'cryptocheck -a all -d cryptosoft0', which includes some
encrypt-and-MAC modes.
PR: 230304
Reported by: sef@
Summary:
PowerISA 3.0 adds a 'darn' instruction to "deliver a random number". This
driver was modeled after (rather, copied and gutted of) the Ivy Bridge
rdrand driver.
This uses the "Conditional Random Number" behavior to remove input bias.
From the ISA reference the 'darn' instruction, and the random number
generator backing it, conforms to the NIST SP800-90B and SP800-90C
standards, compliant to the extent possible at the time the hardware was
designed, and guarantees a minimum 0.5 bits of entropy per bit returned.
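A hedged sketch of reading the conditioned form of the instruction
(operand encoding per the PowerISA 3.0 description: L=1 selects the 64-bit
conditioned result, and a result of all ones signals a transient failure;
the helper name is illustrative):

#include <sys/cdefs.h>
#include <stdint.h>

static inline int
darn_read(uint64_t *out)
{
    uint64_t rnd;

    __asm __volatile("darn %0, 1" : "=r"(rnd));
    if (rnd == UINT64_MAX)
        return (-1);    /* transient failure; caller should retry */
    *out = rnd;
    return (0);
}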
Reviewed By: markm, secteam (delphij)
Approved by: secteam (delphij)
Differential Revision: https://reviews.freebsd.org/D16552
This change allows one to set kern.boot_tag="" and not get a blank line
preceding other boot messages. While this isn't super critical (blank
lines are easy to filter out, both mentally and when processing dmesg
later), it allows for a mode of operation that matches previous behavior.
I intend to MFC this whole series to stable/11 by the end of the month with
boot_tag empty by default to make this effectively a nop in the stable
branch.
Missed in r337940.
(It's not like there are any crypto files IPsec doesn't pull in, so it is
unclear what not defining the crypto option was supposed to achieve.)
Reported by: np@
The wrapper is a thin shim around libsodium's Poly1305 implementation. For
now, we just use the C algorithm and do not attempt to build the
SSE-optimized variant for x86 processors.
The algorithm support has not yet been plumbed through cryptodev, or added
to cryptosoft.
The idea is untouched upstream sources live in sys/contrib/libsodium.
sys/crypto/libsodium are support routines or compatibility headers to allow
building unmodified upstream code.
This is not yet integrated into the build system, so no functional change.
Bring in https://github.com/jedisct1/libsodium at
461ac93b260b91db8ad957f5a576860e3e9c88a1 (August 7, 2018), unmodified.
libsodium is derived from Daniel J. Bernstein et al.'s 2011 NaCl
("Networking and Cryptography Library," pronounced "salt") software library.
At the risk of oversimplifying, libsodium primarily exists to make it easier
to use NaCl. NaCl and libsodium provide high quality implementations of a
number of useful cryptographic concepts (as well as the underlying
primitives) seeing some adoption in newer network protocols.
I considered but dismissed cleaning up the directory hierarchy and
discarding artifacts of other build systems in favor of remaining close to
upstream (and easing future updates).
Nothing is integrated into the build system yet, so in that sense, no
functional change.
toe_l2_resolve to fill up the complete vtag and not just the vid.
Reviewed by: kib@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D16752
jails since FreeBSD 7.
Along with the system call, put the various security.jail.allow_foo and
security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or
BURN_BRIDGES). These sysctls had two disparate uses: on the system side,
they were global permissions for jails created via jail(2) which lacked
fine-grained permission controls; inside a jail, they're read-only
descriptions of what the current jail is allowed to do. The first use
is obsolete along with jail(2), but keep them for the second, read-only use.
Differential Revision: D14791
ZoL made the same mistake, and fixed it with separate commit 863522b1f9:
dsl_scan_scrub_cb: don't double-account non-embedded blocks
We were doing count_block() twice inside this function, once
unconditionally at the beginning (intended to catch the embedded block
case) and once near the end after processing the block.
The double-accounting caused the "zpool scrub" progress statistics in
"zpool status" to climb from 0% to 200% instead of 0% to 100%, and
showed double the I/O rate it was actually seeing.
This was apparently a regression introduced in commit 00c405b4b5e8,
which was an incorrect port of this OpenZFS commit:
https://github.com/openzfs/openzfs/commit/d8a447a7
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Closes #7720
Closes #7738
Reported by: sef
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
- add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
goes to zero OR when the interface goes away
- decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
- add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the inpcb cleanup code simpler but opens the
door wider still to races
- call inp_gcmoptions synchronously after dropping the inpcb lock
Fundamentally multicast needs a rewrite - but keep applying band-aids for now.
Tested by: kp
Reported by: novel, kp, lwhsu
created and before exec.start is called. [1]
- Bump __FreeBSD_version.
This allows attaching ZFS datasets and various other things to be
done before any command/service/rc-script is started in the new
jail.
PR: 228066 [1]
Reviewed by: jamie [1]
Submitted by: Stefan Grönke <stefan@gronke.net> [1]
Differential Revision: https://reviews.freebsd.org/D15330 [1]
So that I don't have to keep grepping around the codebase to remember what each
one does. And maybe it saves someone else some time.
Fix a trivial whitespace issue while here.
No functional change.
Sponsored by: Dell EMC Isilon
Ensure that the valid PCID state is created for the proc0 pmap, since it
might be used by efirt enter() before the first context switch on the BSP.
Sponsored by: The FreeBSD Foundation
MFC after: 6 days
The isonum_* functions are defined to take unsigned char * as an argument,
but the structure fields are defined as char. Change to u_char where needed.
Probably the full structure should be changed, but I'm not sure about the
side effects.
While there, add the __packed attribute.
Differential Revision: https://reviews.freebsd.org/D16564
hashfilters. Two because the driver needs to look up a hashfilter by
its 4-tuple or tid.
A couple of fixes while here:
- Reject attempts to add duplicate hashfilters.
- Do not assume that any part of the 4-tuple that isn't specified is 0.
This makes it consistent with all other mandatory parameters that
already require explicit user input.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Remove the non-INTRNG code.
Remove leftover cut-and-paste code from the lpc code that was the starting point for the port.
Set KERNPHYSADDR and KERNVIRTADDR
Tested on Buffalo_WZR2-G300N
Differential Revision: https://reviews.freebsd.org/D16622
The libkqueue tests have several places that leak memory by using an
idiom like:
puts(kevent_to_str(kevp));
Rework to save the pointer returned from kevent_to_str() and then
free() it after it has been used.
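The reworked idiom, sketched (kevent_to_str() is the test suite's own
helper; the surrounding code is illustrative):

char *str;

str = kevent_to_str(kevp);
puts(str);
free(str);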
Reported by: asomers (pointer to Coverity), Coverity
CID: 1296063, 1296064, 1296065, 1296066, 1296067, 1350287, 1394960
Sponsored by: Dell EMC
Currently, the limits are quite high. On machines with millions of
mbuf clusters, the reassembly queue limits can also run into
the millions. Lower these values.
Also, try to ensure that no bucket will have a reassembly
queue larger than approximately 100 items. This limits the cost to
find the correct reassembly queue when processing an incoming
fragment.
Due to the low limits on each bucket's length, increase the size of
the hash table from 64 to 1024.
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
In particular, try to ensure that no bucket will have a reassembly
queue larger than approximately 100 items. This limits the cost to
find the correct reassembly queue when processing an incoming
fragment.
Due to the low limits on each bucket's length, increase the size of
the hash table from 64 to 1024.
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
On guest entry in bhyve, flush the L1 data cache, using either the L1D
flush command MSR if available, or by reading enough uninteresting
data to fill the whole cache.
Flush is automatically enabled on CPUs which do not report RDCL_NO,
and can be disabled with the hw.vmm.l1d_flush tunable/kenv.
Security: CVE-2018-3646
Reviewed by: emaste, jhb, Tony Luck <tony.luck@intel.com>
Sponsored by: The FreeBSD Foundation
Currently, we process IPv6 fragments with 0 bytes of payload, add them
to the reassembly queue, and do not recognize them as duplicating or
overlapping with adjacent 0-byte fragments. An attacker can exploit this
to create long fragment queues.
There is no legitimate reason for a fragment with no payload. However,
because IPv6 packets with an empty payload are acceptable, allow an
"atomic" fragment with no payload.
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
There is a hashing algorithm which should distribute IPv6 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv6 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.
Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet6.ip6.maxfragbucketsize).
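A sketch of the default computation described above (the names are
hypothetical stand-ins for the sysctl-backed variables):

static void
update_maxfragbucketsize(int maxfragpackets, int nbuckets, int *bucketsize)
{
    int limit;

    /* Default: twice the queue limit, spread across the buckets. */
    limit = (2 * maxfragpackets) / nbuckets;
    if (limit < 1)
        limit = 1;
    *bucketsize = limit;    /* overridable via the sysctl */
}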
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
The IPv4 fragment reassembly code supports a limit on the number of
fragments per packet. The default limit is currently 17 fragments.
Among other things, this limit serves to limit the number of fragments
the code must parse when trying to reassemble a packet.
Add a limit to the IPv6 reassembly code. By default, limit a packet
to 65 fragments (64 on the queue, plus one final fragment to complete
the packet). This allows an average fragment size of 1,008 bytes, which
should be sufficient to hold a fragment. (Recall that the IPv6 minimum
MTU is 1280 bytes. Therefore, this configuration allows a full-size
IPv6 packet to be fragmented on a link with the minimum MTU and still
carry approximately 272 bytes of headers before the fragmented portion
of the packet.)
Users can adjust this limit using the net.inet6.ip6.maxfragsperpacket
sysctl.
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
The IPv6 reassembly fragment limit is based on the number of mbuf clusters,
which are a global resource. However, the limit is currently applied
on a per-VNET basis. Given enough VNETs (or given sufficient customization
on enough VNETs), it is possible that the sum of all the VNET fragment
limits will exceed the number of mbuf clusters available in the system.
Given the fact that the fragment limits are intended (at least in part) to
regulate access to a global resource, the IPv6 fragment limit should
be applied on a global basis.
Note that it is still possible to disable fragmentation for a particular
VNET by setting the net.inet6.ip6.maxfragpackets sysctl to 0 for that
VNET. In addition, it is now possible to disable fragmentation globally
by setting the net.inet6.ip6.maxfrags sysctl to 0.
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
There is a hashing algorithm which should distribute IPv4 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv4 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.
Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet.ip.maxfragbucketsize).
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
The IP reassembly fragment limit is based on the number of mbuf clusters,
which are a global resource. However, the limit is currently applied
on a per-VNET basis. Given enough VNETs (or given sufficient customization
of enough VNETs), it is possible that the sum of all the VNET limits
will exceed the number of mbuf clusters available in the system.
Given the fact that the fragment limit is intended (at least in part) to
regulate access to a global resource, the fragment limit should
be applied on a global basis.
VNET-specific limits can be adjusted by modifying the
net.inet.ip.maxfragpackets and net.inet.ip.maxfragsperpacket
sysctls.
To disable fragment reassembly globally, set net.inet.ip.maxfrags to 0.
To disable fragment reassembly for a particular VNET, set
net.inet.ip.maxfragpackets to 0.
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
Currently, all IPv6 fragment reassembly queues are kept in a flat
linked list. This has a number of implications. Two significant
implications are: all reassembly operations share a common lock,
and it is possible for the linked list to grow quite large.
Improve IPv6 reassembly performance by hashing fragments into buckets,
each of which has its own lock. Calculate the hash key using a Jenkins
hash with a random seed.
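A hedged sketch of the bucket selection (jenkins_hash32() is the kernel's
<sys/hash.h> helper; the key layout and names here are illustrative, not
the actual frag6 code):

#include <sys/types.h>
#include <sys/hash.h>
#include <netinet/in.h>

struct frag6_hashkey {
    struct in6_addr src;
    struct in6_addr dst;
    uint32_t        ident;
};

static u_int
frag6_bucket(const struct frag6_hashkey *key, uint32_t seed, u_int nbuckets)
{
    uint32_t h;

    h = jenkins_hash32((const uint32_t *)key,
        sizeof(*key) / sizeof(uint32_t), seed);
    return (h & (nbuckets - 1));    /* nbuckets is a power of two */
}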
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
Currently, IPv4 fragments are hashed into buckets based on a 32-bit
key which is calculated by (src_ip ^ ip_id) and combined with a random
seed. However, because an attacker can control the values of src_ip
and ip_id, it is possible to construct an attack which causes very
deep chains to form in a given bucket.
To ensure more uniform distribution (and lower predictability for
an attacker), calculate the hash based on a key which includes all
the fields we use to identify a reassembly queue (dst_ip, src_ip,
ip_id, and the ip protocol) as well as a random seed.
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
We always zero the invalidated PTE/PDE for superpages, which means that
the L1TF CPU vulnerability (CVE-2018-3620) can only be used for reading
from the page at zero.
Note that both i386 and amd64 exclude the page from phys_avail[]
array, so this change is redundant, but I think that phys_avail[] on
UEFI-boot does not need to do that. Eventually the blacklisting
should be made conditional on CPUs which report that they are not
vulnerable to L1TF.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
curpmap.
When performing a context switch on a machine without PCID, if the
current %cr3 equals the new pmap's %cr3, which is typical for
kernel_pmap vs. a kernel process, I overlooked updating the PCPU
curpmap value. Remove the check for %cr3 not being equal to pm_cr3
when doing the update. It is believed that this case cannot happen
at all, due to other changes in this revision.
Also, do not set the very first curpmap to kernel_pmap; it should be
the vmspace0 pmap instead, to match curproc.
Move the common code to activate the initial pmap both on BSP and APs
into pmap_activate_boot() helper.
Reported by: eadler, ambrisko
Discussed with: kevans
Reviewed by: alc, markj (previous version)
Tested by: ambrisko (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D16618
Rather than hard-coding the number of CPUs to 2, look up the PVPE field
in MVPConf0, as the valid VPE numbers are from 0 to PVPE inclusive.
Submitted by: "James Clarke" <jrtc4@cam.ac.uk>
Reviewed by: br
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16644
At some point memcpy() may become an ifunc; ifunc resolution cannot be done
until CPU identification has been performed, and CPU identification must
be done after loading any microcode updates.
X-MFC with: r337715
Sponsored by: The FreeBSD Foundation
Trap reads to the arm64 ID registers and write a safe value into them. This
will allow us to put more useful values in these later and have userland
check them to find what features the hardware supports.
These are currently safe defaults, but will later be populated with better
values from the hardware.
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16533
It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled
only when sending redirects for the corresponding IP version is disabled via
sysctl. Otherwise the default forwarding function will be used.
PR: 221137
Submitted by: mckay@
MFC after: 2 weeks
r330951 by smh fixed the mps driver to avoid deadlocks when panicking.
The same code is needed for mpr, so port it here, along with the fix
which allows the scheduled CCBs to complete, avoiding at least a scary
message and likely other unintended consequences.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16663
When we're shutting down, we send a number of start/stop commands to
the known targets. We have to wait for them to complete. During a
panic, the interrupts are off, and using pause to wait for them to
fire and complete won't work: we have to poll after pause returns so
that the completion routines of the CCBs run and we decrement the
outstanding work counts.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16663
xpt_sim_poll takes the sim to poll as an argument. It will do the
proper locking protocol, call the SIM polling routine, and then call
camisr_runqueue to process completions on any CCBs the SIM's poll
routine completed. It will be used during late shutdown when a SIM is
waiting for CCBs it sent during shutdown to finish and the scheduler
isn't running because we've panic'd.
This sequence was used twice in cam_xpt, so refactor those to use this
new function.
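A sketch of the helper as described (the struct cam_sim field and function
names here are recalled from the CAM code and should be treated as
approximate, not verified):

void
xpt_sim_poll(struct cam_sim *sim)
{
    struct mtx *mtx;

    mtx = sim->mtx;             /* the SIM's lock; may be NULL */
    if (mtx != NULL)
        mtx_lock(mtx);
    (*sim->sim_poll)(sim);      /* run the SIM's polling routine */
    if (mtx != NULL)
        mtx_unlock(mtx);
    camisr_runqueue();          /* complete any CCBs the poll finished */
}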
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16663
Move the evdev_ev_kbd_event() helper from evdev to kbd.c, as otherwise evdev
unconditionally requires all keyboard and console stuff to be compiled
into the kernel. This dependency arises because the evdev_ev_kbd_event()
helper references the kbdsw global variable defined in kbd.c through the
kbdd_ioctl() macro.
While here, make all keyboard drivers respect evdev_rcpt_mask while setting
the typematic rate and LEDs via the evdev interface.
Requested by: Milan Obuch <bsd@dino.sk>
Reviewed by: hselasky, gonzo
Differential Revision: https://reviews.freebsd.org/D16614
Now the softc should be retrieved from the struct evdev * pointer
with the evdev_get_softc() helper.
wmt(4) is a sample of a driver that supports both KPIs.
Reviewed by: hselasky, gonzo
Differential Revision: https://reviews.freebsd.org/D16614
Newer chips may require assert/deassert after power down for proper
startup. Check the respective flag in DEVIDLE_CTRL and perform the
operation if necessary.
PR: 221777
Submitted by: marc.priggemeyer@gmail.com
Obtained from: DragonFly BSD
Tested on: Thinkpad T470
Updates in the format described in section 9.11 of the Intel SDM can
now be applied as one of the first steps in booting the kernel. Updates
that are loaded this way are automatically re-applied upon exit from
ACPI sleep states, in contrast with the existing cpucontrol(8)-based
method. For the time being only Intel updates are supported.
Microcode update files are passed to the kernel via loader(8). The
file type must be "cpu_microcode" in order for the file to be recognized
as a candidate microcode update. Updates for multiple CPU types may be
concatenated together into a single file, in which case the kernel
will select and apply a matching update. Memory used to store the
update file will be freed back to the system once the update is applied,
so this approach will not consume more memory than required.
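As a usage sketch, the loader(8) side can be driven from loader.conf; the
variable names below are as documented in loader.conf(5), while the file
path is illustrative:

cpu_microcode_load="YES"
cpu_microcode_name="/boot/firmware/intel-ucode.bin"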
Reviewed by: kib
MFC after: 6 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16370
If faultin() was called outside the swapper (from PHOLD()), do not allow
the swapper to initiate additional swap-ins. Swapper-initiated swap-ins
are serialized because they are synchronous and executed in the
context of thread0. With the added limitation, we only allow
parallel swap-ins from PHOLD(), which is up to PHOLD() users to
manage; usually they do not need to.
Rate-limit the swapper's swap-ins to one in each MAXSLP / 2 seconds
interval, counting faultin() swap-ins.
Suggested by: alc
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D16610
This is the output of
$ cat opcodes opcodes-rvc-pseudo opcodes-rvc opcodes-custom |
./parse-opcodes -c
It is confirmed by author that the output of parse-opcodes is
in the public domain.
This will be required for DDB disassembler.
Discussed with: Andrew Waterman <waterman@eecs.berkeley.edu>
Obtained from: https://github.com/riscv/riscv-opcodes
Sponsored by: DARPA, AFRL
LACP needs to manage the link state itself. Unlike other
lagg protocols, the ability of LACP to pass traffic
depends not only on the lagg members having link, but also
on the LACP protocol converging to a distributing state with the
link partner.
If we prematurely mark the link as up, then we will send a
gratuitous ARP (via arp_handle_ifllchange()) before the LACP
interface is capable of passing traffic. When this happens,
the gratuitous ARP is lost, and our link partner may cache
a stale MAC address (e.g., when the base MAC address for the
lagg bundle changes due to a BIOS change re-ordering NIC
unit numbers).
Reviewed by: jtl, hselasky
Sponsored by: Netflix
Fix a NULL dereference that would occur any time an ioctl() was done, due to a
missing ipmi_enqueue_request callback. Just use the default for now, until we
decide to properly enable IPMI interrupts.
Reported by: kbowling
TODO: KSTAT_TYPE_NAMED support
commit 5e021f56d3437d3523904652fe3cc23ea1f4cb70
Author: Giuseppe Di Natale <dinatale2@users.noreply.github.com>
Date: Mon Jan 29 10:24:52 2018 -0800
Add dbuf hash and dbuf cache kstats
Introduce kstats about the dbuf hash and dbuf cache
to make it easier to inspect state. This should help
with debugging and understanding of these portions
of the codebase.
Correct format of dbuf kstat file.
Introduce a dbc column to dbufs kstat to indicate if
a dbuf is in the dbuf cache.
Introduce field filtering in the dbufstat python script.
Introduce a no header option to the dbufstat python script.
Introduce a test case to test basic mru->mfu list movement
in the ARC.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6906
commit fc5bb51f08a6c91ff9ad3559d0266eeeab0b1f61
Author: Brian Behlendorf <behlendorf1@llnl.gov>
Date: Thu Aug 26 10:52:00 2010 -0700
Fix stack dbuf_hold_impl()
This commit preserves the recursive function dbuf_hold_impl() but moves
the local variables and function arguments to the heap to minimize
the stack frame size. Enough space is initially allocated on the
stack for 20 levels of recursion. This technique was based on commit
34229a2f2ac07363f64ddd63e014964fff2f0671 which reduced stack usage of
traverse_visitbp().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
commit 60948de1ef976aabaa3630707bcc8b5867508507
Author: Brian Behlendorf <behlendorf1@llnl.gov>
Date: Thu Aug 26 10:58:36 2010 -0700
Fix stack noinline
Certain functions must never be automatically inlined by gcc because
they are stack heavy or called recursively. This patch flags all
such functions I've found as 'noinline' to prevent gcc from making
the optimization.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
commit 81edd3e83409218879e7af293daa86b0c40eb015
Author: Peng <peng.hse@xtaotech.com>
Date: Wed Jun 8 15:22:07 2016 +0800
Fix PANIC: metaslab_free_dva(): bad DVA X:Y:Z
The following scenario can result in garbage in the dn_spill field.
The db->db_blkptr must be set to NULL when DNODE_FLAG_SPILL_BLKPTR
is clear to ensure the dn_spill field is cleared.
Current txg = A.
* A new spill buffer is created. Its dbuf is initialized with
db_blkptr = NULL and it's dirtied.
Current txg = B.
* The spill buffer is modified. It's marked as dirty in this txg.
* Additional changes make the spill buffer unnecessary because the
xattr fits into the bonus buffer, so it's removed. The dbuf is
undirtied in this txg, but it's still referenced and cannot be
destroyed.
Current txg = C.
* Starts syncing of txg A
* dbuf_sync_leaf() is called for the spill buffer. Since db_blkptr
is NULL, dbuf_check_blkptr() is called.
* The dbuf starts being written and it reaches the ready state
(not done yet).
* A new change makes the spill buffer necessary again.
sa_build_layouts() ends up calling dbuf_find() to locate the
dbuf. It finds the old dbuf because it has not been destroyed yet
(it will be destroyed when the previous write is done and there
are no more references). The old dbuf has db_blkptr != NULL.
* txg A write is complete and the dbuf released. However it's still
referenced, so it's not destroyed.
Current txg = D.
* Starts syncing of txg B
* dbuf_sync_leaf() is called for the bonus buffer. Its contents are
directly copied into the dnode, overwriting the blkptr area because,
in txg B, the bonus buffer was big enough to hold the entire xattr.
* At this point, the db_blkptr of the spill buffer used in txg C
gets corrupted.
Signed-off-by: Peng <peng.hse@xtaotech.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3937
NB: disabled pending the addition of KSTAT_TYPE_RAW support to the
SPL
commit e0b0ca983d6897bcddf05af2c0e5d01ff66f90db
Author: Brian Behlendorf <behlendorf1@llnl.gov>
Date: Wed Oct 2 17:11:19 2013 -0700
Add visibility in to cached dbufs
Currently there is no mechanism to inspect which dbufs are being
cached by the system. There are some coarse counters in arcstats
but they only give a rough idea of what's being cached. This patch
aims to improve the current situation by adding a new dbufs kstat.
When read this new kstat will walk all cached dbufs linked in to
the dbuf_hash. For each dbuf it will dump detailed information
about the buffer. It will also dump additional information about
the referenced arc buffer and its related dnode. This provides a
more complete view in to exactly what is being cached.
With this generic infrastructure in place utilities can be written
to post-process the data to understand exactly how the caching is
working. For example, the data could be processed to show a list
of all cached dnodes and how much space they're consuming. Or a
similar list could be generated based on dnode type. Many other
ways to interpret the data exist based on what kinds of questions
you're trying to answer.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
commit 50c957f702ea6d08a634e42f73e8a49931dd8055
Author: Ned Bass <bass6@llnl.gov>
Date: Wed Mar 16 18:25:34 2016 -0700
Implement large_dnode pool feature
Justification
-------------
This feature adds support for variable length dnodes. Our motivation is
to eliminate the overhead associated with using spill blocks. Spill
blocks are used to store system attribute data (i.e. file metadata) that
does not fit in the dnode's bonus buffer. By allowing a larger bonus
buffer area the use of a spill block can be avoided. Spill blocks
potentially incur an additional read I/O for every dnode in a dnode
block. As a worst case example, reading 32 dnodes from a 16k dnode block
and all of the spill blocks could issue 33 separate reads. Now suppose
those dnodes have size 1024 and therefore don't need spill blocks. Then
the worst case number of blocks read is reduced from 33 to two: one
per dnode block. In practice spill blocks may tend to be co-located on
disk with the dnode blocks so the reduction in I/O would not be this
drastic. In a badly fragmented pool, however, the improvement could be
significant.
ZFS-on-Linux systems that make heavy use of extended attributes would
benefit from this feature. In particular, ZFS-on-Linux supports the
xattr=sa dataset property which allows file extended attribute data
to be stored in the dnode bonus buffer as an alternative to the
traditional directory-based format. Workloads such as SELinux and the
Lustre distributed filesystem often store enough xattr data to force
spill blocks when xattr=sa is in effect. Large dnodes may therefore
provide a performance benefit to such systems.
Other use cases that may benefit from this feature include files with
large ACLs and symbolic links with long target names. Furthermore,
this feature may be desirable on other platforms in case future
applications or features are developed that could make use of a
larger bonus buffer area.
Implementation
--------------
The size of a dnode may be a multiple of 512 bytes up to the size of
a dnode block (currently 16384 bytes). A dn_extra_slots field was
added to the current on-disk dnode_phys_t structure to describe the
size of the physical dnode on disk. The 8 bits for this field were
taken from the zero filled dn_pad2 field. The field represents how
many "extra" dnode_phys_t slots a dnode consumes in its dnode block.
This convention results in a value of 0 for 512 byte dnodes which
preserves on-disk format compatibility with older software.
Similarly, the in-memory dnode_t structure has a new dn_num_slots field
to represent the total number of dnode_phys_t slots consumed on disk.
Thus dn->dn_num_slots is 1 greater than the corresponding
dnp->dn_extra_slots. This difference in convention was adopted
because, unlike on-disk structures, backward compatibility is not a
concern for in-memory objects, so we used a more natural way to
represent size for a dnode_t.
The default size for newly created dnodes is determined by the value of
a new "dnodesize" dataset property. By default the property is set to
"legacy" which is compatible with older software. Setting the property
to "auto" will allow the filesystem to choose the most suitable dnode
size. Currently this just sets the default dnode size to 1k, but future
code improvements could dynamically choose a size based on observed
workload patterns. Dnodes of varying sizes can coexist within the same
dataset and even within the same dnode block. For example, to enable
automatically-sized dnodes, run
# zfs set dnodesize=auto tank/fish
The user can also specify literal values for the dnodesize property.
These are currently limited to powers of two from 1k to 16k. The
power-of-2 limitation is only for simplicity of the user interface.
Internally the implementation can handle any multiple of 512 up to 16k,
and consumers of the DMU API can specify any legal dnode value.
The size of a new dnode is determined at object allocation time and
stored as a new field in the znode in-memory structure. New DMU
interfaces are added to allow the consumer to specify the dnode size
that a newly allocated object should use. Existing interfaces are
unchanged to avoid having to update every call site and to preserve
compatibility with external consumers such as Lustre. The new
interfaces names are given below. The versions of these functions that
don't take a dnodesize parameter now just call the _dnsize() versions
with a dnodesize of 0, which means use the legacy dnode size.
New DMU interfaces:
dmu_object_alloc_dnsize()
dmu_object_claim_dnsize()
dmu_object_reclaim_dnsize()
New ZAP interfaces:
zap_create_dnsize()
zap_create_norm_dnsize()
zap_create_flags_dnsize()
zap_create_claim_norm_dnsize()
zap_create_link_dnsize()
The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The
spa_maxdnodesize() function should be used to determine the maximum
bonus length for a pool.
These are a few noteworthy changes to key functions:
* The prototype for dnode_hold_impl() now takes a "slots" parameter.
When the DNODE_MUST_BE_FREE flag is set, this parameter is used to
ensure the hole at the specified object offset is large enough to
hold the dnode being created. The slots parameter is also used
to ensure a dnode does not span multiple dnode blocks. In both of
these cases, if a failure occurs, ENOSPC is returned. Keep in mind,
these failure cases are only possible when using DNODE_MUST_BE_FREE.
If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0.
dnode_hold_impl() will check if the requested dnode is already
consumed as an extra dnode slot by a large dnode, in which case
it returns ENOENT.
* The function dmu_object_alloc() advances to the next dnode block
if dnode_hold_impl() returns an error for a requested object.
This is because the beginning of the next dnode block is the only
location it can safely assume to either be a hole or a valid
starting point for a dnode.
* dnode_next_offset_level() and other functions that iterate
through dnode blocks may no longer use a simple array indexing
scheme. These now use the current dnode's dn_num_slots field to
advance to the next dnode in the block. This is to ensure we
properly skip the current dnode's bonus area and don't interpret it
as a valid dnode.
zdb
---
The zdb command was updated to display a dnode's size under the
"dnsize" column when the object is dumped.
For ZIL create log records, zdb will now display the slot count for
the object.
ztest
-----
Ztest chooses a random dnodesize for every newly created object. The
random distribution is more heavily weighted toward small dnodes to
better simulate real-world datasets.
Unused bonus buffer space is filled with non-zero values computed from
the object number, dataset id, offset, and generation number. This
helps ensure that the dnode traversal code properly skips the interior
regions of large dnodes, and that these interior regions are not
overwritten by data belonging to other dnodes. A new test visits each
object in a dataset. It verifies that the actual dnode size matches what
was stored in the ztest block tag when it was created. It also verifies
that the unused bonus buffer space is filled with the expected data
patterns.
ZFS Test Suite
--------------
Added six new large dnode-specific tests, and integrated the dnodesize
property into existing tests for zfs allow and send/recv.
Send/Receive
------------
ZFS send streams for datasets containing large dnodes cannot be received
on pools that don't support the large_dnode feature. A send stream with
large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be
unrecognized by an incompatible receiving pool so that the zfs receive
will fail gracefully.
While not implemented here, it may be possible to generate a
backward-compatible send stream from a dataset containing large
dnodes. The implementation may be tricky, however, because the send
object record for a large dnode would need to be resized to a 512
byte dnode, possibly kicking in a spill block in the process. This
means we would need to construct a new SA layout and possibly
register it in the SA layout object. The SA layout is normally just
sent as an ordinary object record. But if we are constructing new
layouts while generating the send stream we'd have to build the SA
layout object dynamically and send it at the end of the stream.
For sending and receiving between pools that do support large dnodes,
the drr_object send record type is extended with a new field to store
the dnode slot count. This field was repurposed from unused padding
in the structure.
ZIL Replay
----------
The dnode slot count is stored in the uppermost 8 bits of the lr_foid
field. The bits were unused as the object id is currently capped at
48 bits.
Resizing Dnodes
---------------
It should be possible to resize a dnode when it is dirtied if the
current dnodesize dataset property differs from the dnode's size, but
this functionality is not currently implemented. Clearly a dnode can
only grow if there are sufficient contiguous unused slots in the
dnode block, but it should always be possible to shrink a dnode.
Growing dnodes may be useful to reduce fragmentation in a pool with
many spill blocks in use. Shrinking dnodes may be useful to allow
sending a dataset to a pool that doesn't support the large_dnode
feature.
Feature Reference Counting
--------------------------
The reference count for the large_dnode pool feature tracks the
number of datasets that have ever contained a dnode of size larger
than 512 bytes. The first time a large dnode is created in a dataset
the dataset is converted to an extensible dataset. This is a one-way
operation and the only way to decrement the feature count is to
destroy the dataset, even if the dataset no longer contains any large
dnodes. The complexity of reference counting on a per-dnode basis was
too high, so we chose to track it on a per-dataset basis similarly to
the large_block feature.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3542
Taken from:
commit f6046738365571bd647f804958dfdff8a32fbde4
Author: Brian Behlendorf <behlendorf1@llnl.gov>
Date: Sat May 30 09:57:53 2015 -0500
Make arc_prune() asynchronous
As described in the comment above arc_adapt_thread() it is critical
that the arc_adapt_thread() function never sleep while holding a hash
lock. This behavior was possible in the Linux implementation because
the arc_prune() logic was implemented to be synchronous. Under
illumos the analogous dnlc_reduce_cache() function is asynchronous.
To address this the arc_do_user_prune() function has been reworked
into two new functions as follows:
* arc_prune_async() is an asynchronous implementation which dispatches
the prune callback to be run by the system taskq. This makes it
suitable to use in the context of the arc_adapt_thread().
* arc_prune() is a synchronous implementation which depends on the
arc_prune_async() implementation but blocks until the outstanding
callbacks complete. This is used in arc_kmem_reap_now() where it
is safe, and expected, that memory will be freed.
This patch additionally adds the zfs_arc_meta_strategy module option
which allows the meta reclaim strategy to be configured. It defaults
to a balanced strategy which has been proven to work well under Linux
but the illumos meta-only strategy can be enabled.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
There used to be one control queue per adapter (the mgmtq) that was
initialized during adapter init and one per port that was initialized
later during port init. This change moves all the control queues (one
per port/channel) to the adapter so that they are initialized during
adapter init and are available before any port is up. This allows the
driver to issue ctrlq work requests over any channel without having to
bring up any port.
MFH: 2 weeks
Sponsored by: Chelsio Communications
In addition, import the most recent arc_prune_async implementation as a
dependency.
commit 25458cbef9e59ef9ee6a7e729ab2522ed308f88f
Author: Tim Chase <tim@chase2k.com>
Date: Wed Jul 13 07:42:40 2016 -0500
Limit the amount of dnode metadata in the ARC
Metadata-intensive workloads can cause the ARC to become permanently
filled with dnode_t objects as they're pinned by the VFS layer.
Subsequent data-intensive workloads may only benefit from about
25% of the potential ARC (arc_c_max - arc_meta_limit).
In order to help track metadata usage more precisely, the other_size
metadata arcstat has been replaced with dbuf_size, dnode_size and bonus_size.
The new zfs_arc_dnode_limit tunable, which defaults to 10% of
zfs_arc_meta_limit, defines the minimum number of bytes which is desirable
to be consumed by dnodes. Attempts to evict non-metadata will trigger
async prune tasks if the space used by dnodes exceeds this limit.
The new zfs_arc_dnode_reduce_percent tunable specifies the amount by
which the excess dnode space is attempted to be pruned as a percentage of
the amount by which zfs_arc_dnode_limit is being exceeded. By default,
it tries to unpin 10% of the dnodes.
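The pruning arithmetic amounts to something like this (a sketch; the
function name and exact rounding are assumed):

    #include <stdint.h>

    static uint64_t
    dnode_prune_target(uint64_t dnode_size, uint64_t dnode_limit,
        uint64_t reduce_percent)
    {
        if (dnode_size <= dnode_limit)
            return (0);
        /* Try to unpin reduce_percent (default 10%) of the excess. */
        return ((dnode_size - dnode_limit) * reduce_percent / 100);
    }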
The problem of dnode metadata pinning was observed with the following
testing procedure (in this example, zfs_arc_max is set to 4GiB):
- Create a large number of small files until arc_meta_used exceeds
arc_meta_limit (3GiB with default tuning) and arc_prune
starts increasing.
- Create a 3GiB file with dd. Observe arc_meta_used. It will still
be around 3GiB.
- Repeatedly read the 3GiB file and observe arc_meta_used as before.
It will continue to stay around 3GiB.
With this modification, space for the 3GiB file is gradually made
available as subsequent demands on the ARC are made. The previous behavior
can be restored by setting zfs_arc_dnode_limit to the same value as the
zfs_arc_meta_limit.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4345
Issue #4512
Issue #4773
Closes #4858
The pfi_skip_if() function only skipped groups correctly when the members
of the group used the group name as a name prefix.
This is often the case, e.g. group lo usually contains lo0, lo1, ...,
but not always.
Rather than relying on the name, explicitly check for group membership.
Obtained from: OpenBSD (pf_if.c,v 1.62, pf_if.c,v 1.63)
Sponsored by: Essen Hackathon
commit 417104bdd3c7ce07ec58674dd078f9891c3bc780
Author: Ned Bass <bass6@llnl.gov>
Date: Thu Feb 26 12:24:11 2015 -0800
Use cached feature info in spa_add_feature_stats()
Avoid issuing I/O to the pool when retrieving feature flags information.
Trying to read the ZAPs from disk means that zpool clear would hang if
the pool is suspended and recovery would require a reboot. To keep the
feature stats resident in memory, we hang a cached nvlist off of the
spa. It is built up from disk the first time spa_add_feature_stats() is
called, and refreshed thereafter using the cached feature reference
counts. spa_add_feature_stats() gets called at pool import time so we
can be sure the cached nvlist will be available if the pool is later
suspended.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3082
The idea was to get the uncontroversial mechanical change out of the way,
then get the meatier functional changes reviewed subsequently. I had not
realized that the immediately adjacent issue was addressed in a different
direction in r334506 (see Warner's guidance in D15592).
Discussion continues, trying to determine if there is a secondary issue
still[1] and how best to fix it. With 12-related activities coming up,
while that is ongoing, just take this back for now.
[1]: Shutdown-time eventhandler events fire normally during panic's reboot
path. Driver callbacks that attempt to issue and wait on interrupt-
completed IO may never complete, hanging the system. This is particularly
obnoxious in the shutdown/panic path, as the debugger cannot be entered
anymore and the hang prevents reboot restoring availability.
(There's nothing CAM-specific about this problem -- any shutdown
event-triggered driver could do something like this during panic. But most
NICs, etc. don't try to send spin-down commands at shutdown. ;-))
Discussed with: imp, markj
msgbufinit may be called multiple times as we initialize the msgbuf into a
progressively larger buffer. This doesn't happen as of now on head, but it
may happen in the future and we generally support this. As such, only print
the boot tag if we've just initialized the buffer for the first time.
The boot tag also now has a newline appended to it for better visibility,
and has been switched to a normal printf, by request of bde, after we've
denoted that the msgbuf is mapped.
Bring in the functionality for timespec_get from NetBSD. I've lightly
edited the .c file to remove _DIAGASSERT because FreeBSD doesn't have
that functionality and the typical #define'ing it to assert isn't
right here. The man page is verbatim from NetBSD, but will be revised
as part of a larger cleanup of the time man pages (they are
inconsistent and vague in all the wrong places).
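Usage is the standard C11 interface, for example:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    int
    main(void)
    {
        struct timespec ts;

        if (timespec_get(&ts, TIME_UTC) == TIME_UTC)
            printf("%jd.%09ld\n", (intmax_t)ts.tv_sec, ts.tv_nsec);
        return (0);
    }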
Differential Revision: https://reviews.freebsd.org/D16649
The comment above ieee80211_ageq_cleanup specifically notes that the queue
is assumed to be empty, and in order to make it so, ieee80211_ageq_drain
must be used.
Submitted by: Augustin Cavalier <waddlesplash@gmail.com>
Obtained from: Haiku (dffc3e235360cd7b71261239ee8507b7d62a1471)
MFC after: 1 week
MFV:
commit ee36c709c3d5f7040e1bd11f5c75318aa03e789f
Author: Gvozden Neskovic <neskovic@gmail.com>
Date: Sat Aug 27 20:12:53 2016 +0200
perf: 2.75x faster ddt_entry_compare()
First 256bits of ddt_key_t is a block checksum, which are expected
to be close to random data. Hence, on average, comparison only needs to
look at first few bytes of the keys. To reduce number of conditional
jump instructions, the result is computed as: sign(memcmp(k1, k2)).
Sign of an integer 'a' can be obtained as: `(0 < a) - (a < 0)` := {-1, 0, 1} ,
which is computed efficiently. Synthetic performance evaluation of
original and new algorithm over 1G random keys on 2.6GHz Intel(R) Xeon(R)
CPU E5-2660 v3:
old 6.85789 s
new 2.49089 s
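The branch-reducing trick looks roughly like this (a sketch, not the
in-tree ddt_entry_compare()):

    #include <string.h>

    static int
    key_compare(const void *k1, const void *k2, size_t len)
    {
        int r = memcmp(k1, k2, len);

        /* sign(r) as {-1, 0, 1}, without conditional jumps */
        return ((0 < r) - (r < 0));
    }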
perf: 2.8x faster vdev_queue_offset_compare() and vdev_queue_timestamp_compare()
Compute the result directly instead of using conditionals
perf: zfs_range_compare()
Speedup between 1.1x - 2.5x, depending on compiler version and
optimization level.
perf: spa_error_entry_compare()
`bcmp()` is not suitable for comparator use. Use `memcmp()` instead.
perf: 2.8x faster metaslab_compare() and metaslab_rangesize_compare()
perf: 2.8x faster zil_bp_compare()
perf: 2.8x faster mze_compare()
perf: faster dbuf_compare()
perf: faster compares in spa_misc
perf: 2.8x faster layout_hash_compare()
perf: 2.8x faster space_reftree_compare()
perf: libzfs: faster avl tree comparators
perf: guid_compare()
perf: dsl_deadlist_compare()
perf: perm_set_compare()
perf: 2x faster range_tree_seg_compare()
perf: faster unique_compare()
perf: faster vdev_cache_compare()
perf: faster vdev_uberblock_compare()
perf: faster fuid_compare()
perf: faster zfs_znode_hold_compare()
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Richard Elling <richard.elling@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5033
The canonical form of sync is:
sync L, E (if Category Elemental Memory Barriers implemented)
The L field (2 bits) denotes the type of sync:
0 -- hwsync
1 -- lwsync
2 -- ptesync or hwsync
It's been found that most 32-bit CPUs designed prior to the introduction of
lwsync will ignore the L bits. However, some cores, particularly the e500 core,
will trigger an illegal instruction exception. Adding these variants will make
it easier to see which sync variant is actually being used in case of a trap.
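For reference, the common variants map onto the L field like this (a
FreeBSD-style sketch using the standard mnemonics; on cores that predate
lwsync, such as the e500, only plain sync is safe):

    static __inline void
    powerpc_hwsync(void)
    {
        __asm __volatile("sync" ::: "memory");      /* L = 0 */
    }

    static __inline void
    powerpc_lwsync(void)
    {
        __asm __volatile("lwsync" ::: "memory");    /* L = 1 */
    }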
in ipf_nat_checkout() and report it in the frb_natv4out and frb_natv4in
dtrace probes.
This is currently being used to diagnose NAT failures in PR/208566. It's
rather handy so this commit makes it available for future diagnosis and
debugging efforts.
PR: 208566
MFC after: 1 week
No functional change.
Note that this change is careful to set the CCB header xflags after
foo_fill_bar() routines, which generally zero existing flags. An earlier
version of this patch mistakenly set the flag before the fill routines.
Submitted by: Scott Ferris <sferris AT isilon.com>, jhibbits@
Reviewed by: bdrewery@, markj@, and non-committer FreeBSD contributor Anton Rang
Sponsored by: Dell EMC Isilon
Before r329882 the target would be computed after lowmem handlers run
and free pages. On some systems a significant amount of page
reclamation happens this way. However, with r329882 the target is
computed first, which can lead to unnecessary reclamation from the
page cache, and this in turn may result in excessive swapping.
Instead, adjust the target after running lowmem handlers. Don't
invoke the lowmem handlers before the PID controller, though, since
that would hide the true rate of page allocation.
Reviewed by: alc, kib (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16606
BOOT_TAG lived shortly in sys/msgbuf.h, but this wasn't necessarily great
for changing it or removing it. Move it into subr_prf.c and add options for
it to opt_printf.h.
One can specify both the BOOT_TAG and BOOT_TAG_SZ (really, size of the
buffer that holds the BOOT_TAG). We expose it as kern.boot_tag and also add
a loader tunable by the same name that we'll fetch upon initialization of
the msgbuf.
This allows for flexibility and also ensures that there's a consistent way
to figure out the boot tag of the running kernel, rather than relying on
headers to be in-sync.
Prodded super-super-lightly by: imp
own region in the TCAM starting with T6, unlike previous chips where
they were in the same region as normal filters.
These filters "hit" before anything else in the LE's lookup. The exact
order is:
a) High priority filters
b) TOE's active region (TCAM and/or hash)
c) Servers (TOE hw listeners)
d) Normal filters
MFC after: 1 week
Sponsored by: Chelsio Communications
On PowerPC (and possibly other architectures) where EARLY_AP_STARTUP is
not used, the config task queue may be used before it is initialized.
This was observed while trying to mount the root fs from NFS, as
reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230168.
This patch has 2 main changes:
1- Perform a basic initialization of qgroup_config, similar to
what is done in taskqgroup_adjust, but simpler.
This makes qgroup_config ready to be used during NFS root mount.
2- When EARLY_AP_STARTUP is not used, call inm_init() and
in6m_init() right before SI_SUB_ROOT_CONF, because bootp needs
to send multicast packets to request an IP.
PR: Bug 230168
Reported by: sbruno
Reviewed by: jhibbits, mmacy, sbruno
Approved by: jhibbits
Differential Revision: https://reviews.freebsd.org/D16633
We use ps to collect information about all processes in textdump. But
it doesn't contain process arguments which however sometimes are very
useful for debugging. The new 'a' modifier adds that capability.
While here, remove 'm' modifier from ddb.4. It was in the manual page
from its very first revision, but I could not find any evidence of the
code ever supporting it.
Submitted by: Terry Hu <thu@panzura.com>
Reviewed by: kib
MFC after: 1 week
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D16603
From the "newly licensed to drive" PR department, add a BOOT_TAG marker (by
default, --<<BOOT>>--, to the beginning of each boot's dmesg. This makes it
easier to do textproc magic to locate the start of each boot and, of
particular interest to some, the dmesg of the current boot.
The PR has a dmesg(8) component as well that I've opted not to include for
the moment- it was the more contentious part of this PR.
bde@ also made the statement that this boot tag should be written with an
ordinary printf, which I've, for the moment, declined to change in this
patch to keep it more transparent to observers of the boot process.
PR: 43434
Submitted by: dak <aurelien.nephtali@wanadoo.fr> (basically rewritten)
MFC after: maybe never
If OPAL_RTC_READ is busy and does not return the information on the first
call, returning OPAL_BUSY_EVENT instead, the system will crash since the
ymd and hmsm variables will contain junk values.
This happens because we were not calling OPAL_RTC_READ again after
OPAL_POLL_EVENTS returned, which would finally replace the old/junk hmsm
and ymd values.
The code was also mixing up OPAL_RTC_READ and OPAL_POLL_EVENTS return values.
This patch fixes this logic and guarantees that we call OPAL_RTC_READ again
after OPAL_POLL_EVENTS returns, and that the code only proceeds once
OPAL_RTC_READ returns OPAL_SUCCESS.
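A sketch of the fixed flow (assuming the opal_call() wrapper; argument
marshalling and endianness handling are elided, and the function name is
hypothetical):

    static int
    opal_rtc_read_retry(uint64_t ymd_pa, uint64_t hmsm_pa)
    {
        int rv;

        do {
            rv = opal_call(OPAL_RTC_READ, ymd_pa, hmsm_pa);
            if (rv == OPAL_BUSY_EVENT)
                opal_call(OPAL_POLL_EVENTS, 0);
            /* Retry OPAL_RTC_READ itself after polling so the junk
             * ymd/hmsm values are replaced before they are used. */
        } while (rv == OPAL_BUSY_EVENT);

        return (rv == OPAL_SUCCESS ? 0 : ENXIO);
    }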
Reviewed by: jhibbits
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D16617
After a re-read of the appropriate section of RFC5661, I decided that a
few things should be changed related to LayoutRecall callback handling.
Here are the things fixed by this patch.
- For two of the three cases that LayoutRecall is done, I now think
setting the clora_changed argument false is correct.
- All errors other than NFSERR_DELAY returned by LayoutRecall appear
permanent, so don't retry for any of them. (NFSERR_DELAY is retried by
newnfs_request(), so it is not affected by this patch.)
- Instead of waiting "forever" (actually until the process is SIGTERM'd)
for Layouts to be returned during a mirror copy, fail and return
ENXIO after about 1 minute.
Waiting for a <ctrl>C made sense when pnfsdscopymr() was done by itself,
but did not make sense when done via find(1).
This patch only affects the pNFS server.
add support for explicitly requesting that pmap_enter() create a 1 MB page
mapping. (Essentially, this feature allows the machine-independent layer
to create superpage mappings preemptively, and not wait for automatic
promotion to occur.)
Export pmap_ps_enabled() to the machine-independent layer.
Add a flag to pmap_pv_insert_pte1() that specifies whether it should fail
or reclaim a PV entry when one is not available.
Refactor pmap_enter_pte1() into two functions, one by the same name, that
is a general-purpose function for creating pte1 mappings, and another,
pmap_enter_1mpage(), that is used to prefault 1 MB read- and/or execute-
only mappings for execve(2), mmap(2), and shmat(2).
In addition, as an optimization to pmap_enter(..., psind=0), eliminate the
use of pte2_is_managed() from pmap_enter(). Unlike the x86 pmap
implementations, armv6 does not have a managed bit defined within the PTE.
So, pte2_is_managed() is actually a call to PHYS_TO_VM_PAGE(), which is O(n)
in the number of vm_phys_segs[]. All but one call to PHYS_TO_VM_PAGE() in
pmap_enter() can be avoided.
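The machine-independent layer can then request the large mapping up front,
along the lines of (fragment; the flag combination shown is illustrative):

    int rv;

    /* psind == 1 asks pmap_enter() for a 1 MB (pte1) mapping directly
     * instead of waiting for automatic promotion. */
    if (pmap_ps_enabled(pmap))
        rv = pmap_enter(pmap, va, m, prot,
            PMAP_ENTER_NOSLEEP | PMAP_ENTER_NOREPLACE, 1);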
Reviewed by: kib, markj, mmel
Tested by: mmel
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D16555
These were found by the Undefined Behaviour GsoC project at NetBSD:
Do not change the signedness bit with a left shift.
While there, avoid signed integer overflow.
Address both issues by using an unsigned type.
msdosfs_fat.c:512:42, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:521:44, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:744:14, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:744:24, signed integer overflow: -2147483648 - 1 cannot be
represented in type 'int [20]'
msdosfs_fat.c:840:13, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:840:36, signed integer overflow: -2147483648 - 1 cannot be
represented in type 'int [20]'
Detected with micro-UBSan in user mode.
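The fix pattern is simply to shift an unsigned operand (illustrative shape,
not the exact msdosfs source):

    #include <stdint.h>

    static inline uint32_t
    fat_bit_mask(unsigned bit)
    {
        /* was: 1 << bit -- undefined when bit == 31 for a signed int */
        return (1U << bit);
    }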
Hinted from: NetBSD (CVS 1.33)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16615
Do not allow creating more than EXT4_LINK_MAX links to a directory when
dir_nlink is not set, as is done in the fresh e2fsprogs updates.
MFC after: 3 months
The checksum updating functions were not called when a dir index inode was
split, or when a dir entry that was first in its block was removed.
Fix this, and move the dir entry adding logic for the i_count == 0 case
into a new function.
MFC after: 3 months
Before swp_pager_meta_build replaces an old swapblk with a new one,
it frees the old one. To allow such freeing of blocks to be
aggregated, have swp_pager_meta_build return the old swap block, and
make the caller responsible for freeing it.
Define a pair of short static functions, swp_pager_init_freerange and
swp_pager_update_freerange, to do the initialization and updating of
blk addresses and counters used in aggregating blocks to be freed.
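A minimal sketch of the aggregation pattern (signatures assumed; the real
helpers live in vm/swap_pager.c and free via swp_pager_freeswapspace()):

    static void
    swp_pager_init_freerange(daddr_t *start, daddr_t *num)
    {
        *start = SWAPBLK_NONE;    /* no pending run yet */
        *num = 0;
    }

    static void
    swp_pager_update_freerange(daddr_t *start, daddr_t *num, daddr_t blk)
    {
        if (*start + *num == blk) {
            (*num)++;             /* extend the contiguous run */
        } else {
            if (*start != SWAPBLK_NONE)
                swp_pager_freeswapspace(*start, *num);
            *start = blk;
            *num = 1;
        }
    }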
Submitted by: Doug Moore <dougm@rice.edu>
Reviewed by: kib, markj (an earlier version)
Tested by: pho
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D13707
traffic class for rate limiting.
Add experimental knobs that allow the user to specify a default pktsize
and burstsize for traffic classes associated with a port:
dev.<ifname>.<instance>.tc.pktsize
dev.<ifname>.<instance>.tc.burstsize
Sponsored by: Chelsio Communications
The code in newnfs_request() retries RPCs that get a reply of NFSERR_DELAY,
but exempts certain NFSv4 operations. However, for callback RPCs, there
should not be any exemptions at this time. The code would have erroneously
exempted the CBRECALL callback, since it has the same operation number as
the CLOSE operation.
This patch fixes this by checking for a callback RPC (indicated by clp != NULL)
and not checking for exempt operations for callbacks.
This would have only affected the NFSv4 server when delegations are enabled
(they are not enabled by default) and the client replies to CBRECALL with
NFSERR_DELAY. This may never actually happen.
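The shape of the check is roughly (sketch; the exemption helper name is
hypothetical):

    if (nd->nd_repstat == NFSERR_DELAY &&
        (clp != NULL || !nfsv4_op_is_exempt(op))) {
        /* Callback RPCs (clp != NULL) are never exempt, so a
         * CBRECALL reply of NFSERR_DELAY is retried here. */
        goto tryagain;
    }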
Spotted during code inspection.
MFC after: 2 weeks
If a recvmsg(2) or recvmmsg(2) caller doesn't provide sufficient space
for all control messages, the kernel sets MSG_CTRUNC in the message
flags to indicate truncation of the control messages. In the case
of SCM_RIGHTS messages, however, we were failing to dispose of the
rights that had already been externalized into the recipient's file
descriptor table. Add a new function and mbuf type to handle this
cleanup task, and use it any time we fail to copy control messages
out to the recipient. To simplify cleanup, control message truncation
is now only performed at control message boundaries.
The change also fixes a few related bugs:
- Rights could be leaked to the recipient process if an error occurred
while copying out a message's contents.
- We failed to set MSG_CTRUNC if the truncation occurred on a control
message boundary, e.g., if the caller received two control messages
and provided only the exact amount of buffer space needed for the
first.
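From userspace, the truncation is visible like this (a minimal sketch):

    #include <sys/socket.h>
    #include <stdio.h>

    static void
    recv_and_check(int s)
    {
        char data[128];
        char cbuf[CMSG_SPACE(sizeof(int))];    /* room for one fd only */
        struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
        };

        if (recvmsg(s, &msg, 0) >= 0 && (msg.msg_flags & MSG_CTRUNC))
            fprintf(stderr, "control messages truncated\n");
    }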
PR: 131876
Reviewed by: ed (previous version)
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16561
The VGA "text mode" buffer has a pair of bytes for each character: One
byte for the character symbol, and an "attribute" byte encoding the
foreground and background colours. When updating the screen, we were
writing these two bytes separately.
On some virtualized systems, every write results in a glyph being redrawn
into a (graphical) virtual screen; writing these two bytes separately
results in twice as much work being done to draw characters, whereas if
we perform a single 16-bit write instead, the character only needs to be
redrawn once.
On an EC2 c5.4xlarge instance, this change cuts 1.30s from the kernel boot,
speeding it up from 8.90s to 7.60s.
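The idea in miniature (geometry and address are the classic VGA defaults,
shown for illustration only):

    #include <stdint.h>

    #define VGA_COLS 80
    static volatile uint16_t *vga =
        (volatile uint16_t *)(uintptr_t)0xb8000;

    static void
    vga_putc(int row, int col, uint8_t ch, uint8_t attr)
    {
        /* One 16-bit store: attribute in the high byte, glyph in the low. */
        vga[row * VGA_COLS + col] = (uint16_t)attr << 8 | ch;
    }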
MFC after: 1 week
is defined in sys/socket.h, where its value is 28.
A bit of trivia: On NetBSD AF_INET6 is defined as 24. On Solaris it is
defined as 26. This is probably why Darren defaulted to 26, because
ipfilter was originally written for SunOS 4 and Solaris many moons ago.
MFC after: 2 weeks
The <sys/cdefs.h> and <stdatomic.h> headers already included support for
C11 atomics via intrinsics in modern versions of GCC, but these versions
tried to "hide" atomic variables inside a wrapper structure. This wrapper
is not compatible with GCC's internal <stdatomic.h> header, so that if
GCC's <stdatomic.h> was used together with <sys/cdefs.h>, use of C11
atomics would fail to compile. Fix this by not hiding atomic variables
in a structure for modern versions of GCC. The headers already avoid
using a wrapper structure on clang.
Note that this wrapper was only used if C11 was not enabled (e.g.
via -std=c99). Compiling with -std=c11 but with FreeBSD's <stdatomic.h>
instead of GCC's <stdatomic.h> also failed, and this change fixes that
case as well.
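To illustrate the failure mode: with the struct wrapper in place, ordinary
C11 code like the following would not compile once GCC's own <stdatomic.h>
was picked up, because the generic atomic_*() macros expect a real
_Atomic-qualified object rather than a struct:

    #include <stdatomic.h>

    static _Atomic(int) counter;

    int
    bump(void)
    {
        return (atomic_fetch_add(&counter, 1));
    }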
Reported by: Mark Millard
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D16585
them display the current value of the bitfield rather than the fixed
value that was provided when the sysctl node was created.
MFC after: 1 week
Sponsored by: Chelsio Communications
a smaller number of larger TRIM requests. The hope had been to have
the full TRIM consolidation in place for 12.0, but the algorithms
are still under development and need further testing. With this
framework in place it will be possible to easily add TRIM consolidation
once the optimal strategy has been found.
The only functional change with this patch is the elimination of TRIM
requests for blocks that are freed before they are likely to have
been written.
Reviewed by: kib
Discussed with: Warner Losh and Chuck Silvers
Sponsored by: Netflix
Currently, the per-queue limit is a function of the receive buffer
size and the MSS. In certain cases (such as connections with large
receive buffers), the per-queue segment limit can be quite large.
Because we process segments as a linked list, large queues may not
perform acceptably.
The better long-term solution is to make the queue more efficient.
But, in the short-term, we can provide a way for a system
administrator to set the maximum queue size.
We set the default queue limit to 100. This is an effort to balance
performance with a sane resource limit. Depending on their
environment, goals, etc., an administrator may choose to modify this
limit in either direction.
Reviewed by: jhb
Approved by: so
Security: FreeBSD-SA-18:08.tcp
Security: CVE-2018-6922
Now that aw_sid exposes the nvmem interface, use it to read the calibration
data.
Add support for the H5 SoC.
Fix the bindings: we used to have non-upstreamed bindings. Switch to the
ones that have been sent upstream. They are not stable yet, so we switch
from custom, wrong bindings to correct, proposed ones.
Rework aw_sid so it can work with the nvmem interface.
Each SoC exposes a set of fuses (for now the rootkey/boardid and, if
available, the thermal calibration data). A fuse can be private or public;
reading a private fuse needs to be done via some registers instead of
reading it directly.
Each fuse is exposed as a sysctl.
For now, leave the possibility for a driver to read any fuse without using
the nvmem interface, as the awg and emac drivers use this to generate a MAC
address.
At least on x86, fhandle_t is a packed structure, so I believe an
assignment will copy all the bits. However, for some current/future
architectures, there might be padding in the structure that doesn't get
copied via an assignment.
Since NFS assumes a file handle is an opaque blob of bits that can be
compared via memcmp()/bcmp(), all the bits including any padding must be
copied.
This patch replaces the assignments with a call to a byte copy function.
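Sketch of the change (FreeBSD declares fhandle_t in <sys/mount.h>):

    #include <string.h>
    #include <sys/param.h>
    #include <sys/mount.h>

    static void
    fh_copy(fhandle_t *dst, const fhandle_t *src)
    {
        /* was: *dst = *src; -- may skip padding bits on some arches */
        memcpy(dst, src, sizeof(*dst));
    }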
Spotted during code inspection.