freebsd-nq

Author	SHA1	Message	Date
Matt Macy	afbd6cfa72	hpts: remove redundant decl breaking gcc build	2018-06-08 17:37:43 +00:00
Matt Macy	46033610ec	unbreak LINT build after r334804	2018-06-08 05:48:36 +00:00
Matt Macy	7f5336f666	hwpmc: fix arm64 INVARIANTS build	2018-06-08 05:48:28 +00:00
Mateusz Guzik	b8af2820f6	uma: fix up r334824 Turns out there is code which ends up passing M_ZERO to counters. Since counters zero unconditionally on their own, just drop the flag in that place.	2018-06-08 05:40:36 +00:00
Matt Macy	58378a8971	rtentry_zinit: don't blindly pass through M_ZERO to counter alloc	2018-06-08 05:17:06 +00:00
Matt Macy	978910109d	hwpmc: avoid undefined variable on LINT	2018-06-08 05:01:09 +00:00
Matt Macy	eb7c901995	hwpmc: simplify calling convention for hwpmc interrupt handling pmc_process_interrupt takes 5 arguments when only 3 are needed. cpu is always available in curcpu and inuserspace can always be derived from the passed trapframe. While facially a reasonable cleanup this change was motivated by the need to workaround a compiler bug. core2_intr(cpu, tf) -> pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) -> pmc_add_sample(cpu, ring, pm, tf, inuserspace) In the process of optimizing the tail call the tf pointer was getting clobbered: (kgdb) up at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709 4709 pmc_save_kernel_callchain(ps->ps_pc, (kgdb) up 1205 error = pmc_process_interrupt(cpu, PMC_HR, pm, tf, resulting in a crash in pmc_save_kernel_callchain.	2018-06-08 04:58:03 +00:00
Mateusz Guzik	dfa5753e09	amd64: remove now unused bzero, bcmp and bcopy. move pagecopy higher up.	2018-06-08 04:18:42 +00:00
Mateusz Guzik	ea99223ec9	uma: remove M_ZERO support for pcpu zones Nothing in the tree uses it and pcpu zones have a fundamentally different use case than the regular zones - they are not supposed to be allocated and freed all the time. This reduces pollution in the allocation fast path.	2018-06-08 03:16:16 +00:00
Mateusz Guzik	c9ca1a70cc	amd64: fix a retarded bug in memset memset fills the target buffer from a byte-sized value passed in as the second argument. The fully-sized (8 bytes) register containing it is named %rsi. Lower 4 bytes can be referred to as %esi and finally the lowest byte is %sil. Vast majority of all the callers just zero the target buffer and set it up by doing xor %esi,%esi which has a side-effect of zeroing the upper parts of the register as well. Some others do a word-sized move to %esi which has the same result. However, there are callers which only fill %sil. This does not clear up the rest of the register. The value of %rsi is multiplied by $0x0101010101010101 to create a 8-byte sized pattern for 8-byte stores. Prior to the patch, the func just blindly took %rsi assuming the unwanted bytes are zeroed out. Since this is not the case for the callers which only play with %sil (the rest of the register can have absolutely anything), the resulting pattern can be garbage. This has potential for funny bugs. One side effect (which was not amusing) after enabling it instead of bzero was that the kernel was hanging on boot as a xen domU. Reported by: Trond Endrestøl <Trond.Endrestol fagskolen.gjovik.no> Pointy hat: me	2018-06-08 00:47:24 +00:00
Gleb Smirnoff	c5deaf0452	UMA memory debugging enabled with INVARIANTS consists of two things: trashing freed memory and checking that allocated memory is properly trashed, and also of keeping a bitset of freed items. Trashing/checking creates a lot of CPU cache poisoning, while keeping debugging bitsets consistent creates a lot of contention on UMA zone lock(s). The performance difference between INVARIANTS kernel and normal one is mostly attributed to UMA debugging, rather than to all KASSERT checks in the kernel. Add loader tunable vm.debug.divisor that allows either to turn off UMA debugging completely, or turn it on only for a fraction of allocations, while still running all KASSERTs in kernel. That allows running INVARIANTS kernels in production environments without reducing load by orders of magnitude, but still doing useful extra checks. Default value is 1, meaning debug every allocation. Value of 0 would disable UMA debugging completely. Values above 1 enable debugging only for every N-th item. It isn't possible to strictly follow the number, but the amount of debugging is still reduced by a fraction of roughly (N-1)/N. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D15199	2018-06-08 00:15:08 +00:00
Breno Leitao	6d645c57a3	Fix excise_initrd_region() to support 32- and 64-bit initrd params. Changed excise_initrd_region to support both 32- and 64-bit values for linux,initrd-start and linux,initrd-end. This fixes the boot problem on some machines after rS334485. Submitted by: Luis Pires <lffpires@ruabrasil.org> Reviewed by: jhibbits, leitao Approved by: jhibbits (mentor) Differential Revision: https://reviews.freebsd.org/D15667	2018-06-07 21:24:21 +00:00
Randall Stewart	401e870791	Take out the stack alias inadvertently added by my commit. Reported by: Peter Lei	2018-06-07 20:57:12 +00:00
Randall Stewart	12693c6c83	Fix build issue with const and volatile and the myriad ways that the various compilers treat this. The only safe prefetch appears to be for AMD. The other compilers either are not volatile or are not const :( Reported by: Michael Tuexen	2018-06-07 19:57:55 +00:00
Benno Rice	b3b11d6400	Break recursion involving getnewvnode and zfs_rmnode. When we're at our vnode limit, getnewvnode will call into the vnode LRU cache to free up vnodes. If the vnode we try to recycle is a ZFS vnode we end up, eventually, in zfs_rmnode. If the ZFS vnode we're recycling represents something with extended attributes, zfs_rmnode will call zfs_zget which will attempt to allocate another vnode. If the next vnode we try to recycle is also a ZFS vnode representing something with extended attributes we can recurse further. This ends up being unbounded and can end up overflowing the stack. In order to avoid this, restructure zfs_rmnode to simply add the extended attribute directory's object ID to the unlinked set, thus not requiring the allocation of a vnode. We then schedule a task that calls zfs_unlinked_drain which will do the work of properly marking the vnodes for unlinking. zfs_unlinked_drain is also called on mount so these will be cleaned up there. Reviewed by: avg, mav Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D15342	2018-06-07 18:59:32 +00:00
Randall Stewart	89e560f441	This commit brings in a new refactored TCP stack called Rack. Rack includes the following features: - A different SACK processing scheme (the old sack structures are not used). - RACK (Recent acknowledgment) where counting dup-acks is no longer done; instead time is used to know when to retransmit. (see the I-D) - TLP (Tail Loss Probe) where we will probe for tail-losses to attempt to try not to take a retransmit time-out. (see the I-D) - Burst mitigation using TCPHPTS - PRR (proportional rate reduction) see the RFC. Once built into your kernel, you can select this stack either by a socket option where the name of the stack is "rack" or by setting the global sysctl so the default is rack. Note that any connection that does not support SACK will be kicked back to the "default" base FreeBSD stack (currently known as "default"). To build this into your kernel you will need to enable in your kernel: makeoptions WITH_EXTRA_TCP_STACKS=1 options TCPHPTS Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D15525	2018-06-07 18:18:13 +00:00
Konstantin Belousov	943defc3a0	Account for dmap limit when selecting the pages for the bootstrap pagetables. physmap[] can be inconsistent with the physical memory limit due to buggy bios, or to the hw.physmem tunable. Since bootstrap pagetables are initialized by accesses through the DMAP, we must ensure that DMAP really cover the selected pages. This is only relevant when machine has less than 4G RAM and buggy BIOS, which is the combination on Acer Chromebook 720. The call to mp_bootaddress() is moved later to have Maxmem initialized. An alternative could be to always cover 4G for DMAP, but this change seems to be simpler. Reported and tested by: grembo Reviewed by: royger Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D15675	2018-06-07 17:04:34 +00:00
Breno Leitao	4a4b4c98f5	dev/ofw: Fix ofw_fdt_getprop() return values to match documentation Fix the behavior of ofw_fdt_getprop() and ofw_fdt_getprop() functions to match the documentation as the non-fdt code. Submitted by: Luis Pires <lffpires@ruabrasil.org> Reviewed by: manu, jhibbits Approved by: jhibbits (mentor) Differential Revision: https://reviews.freebsd.org/D15680	2018-06-07 15:59:08 +00:00
Andriy Gapon	0fb3a72a0d	x86: reorganize code that deals with unexpected NMI-s Expected NMI-s are those that are either generated by the software (such as a CPU sending NMI to other CPU) or generated by the hardware after the software configured it to do so (such as NMI-s on PMC events). Some unexpected NMI-s can be caused by hardware failures and it is possible to inquire the hardware about them (somewhat like MCA but much more primitive) using an EISA mechanism. In some cases the origin of the NMI can remain truly unknown. This commit should not change any functionality. It just reorganizes the code, so that it is easier to extend with new checks for the origin of the NMI. Also, it frees the code that has nothing to do with ISA from DEV_ISA. MFC after: 3 weeks	2018-06-07 14:46:52 +00:00
Andriy Gapon	413ed27cd7	expand descriptions of x86 panic_on_nmi and kdb_on_nmi sysctls The descriptions were as terse as the variable names and they did not explain additional conditions for knobs. MFC after: 1 week	2018-06-07 14:23:31 +00:00
Breno Leitao	7b2c7b92be	md: use prestaged mfs_root On PowerNV systems, the rootfs is passed through kexec, which loads the rootfs into memory and set two fdt entries to describe where the file is located in the memory; I need to pass this memory region to the md device as a mfs_root, but, current md driver does not support two things: * Just getting a pointer from an external (bootloader) memory. If I need to workaround it, I would need to declare a static array and memcopy from this external memory to this static variable. * The size of the image. The usage of mfs_root_end, which is not a pointer, seems to be not possible for this prestaged scenario. This patch simply adds a new way to load mfs_root from memory. Differential Revision: https://reviews.freebsd.org/D15625 Approved by: kib, jhibbits (mentor)	2018-06-07 13:57:34 +00:00
Jonathan T. Looney	16e05b3275	Fix a typo in vm_domain_set(). When a domain crosses into the severe range, we need to set the domain bit from the vm_severe_domains bitset (instead of clearing it). Reviewed by: jeff, markj Sponsored by: Netflix, Inc.	2018-06-07 13:29:54 +00:00
Eric Joyner	a06424ddd3	iflib: Record TCP checksum info in iflib when TCP checksum is requested ixl(4) (when it switches over to using iflib) devices need the TCP header length in order to do TCP checksum offload. Reviewed by: gallatin@, shurd@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15558	2018-06-07 13:03:07 +00:00
Hans Petter Selasky	71ee95ddf7	Define ARCH_KMALLOC_MINALIGN in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-07 11:44:11 +00:00
Hans Petter Selasky	d150e15285	Wrap timespec64 into timespec in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-07 11:41:42 +00:00
Hans Petter Selasky	a041e75a34	Move the EXPORT_SYMBOL_XXX() function macros into own header file. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-07 11:34:59 +00:00
Hans Petter Selasky	422d8af4df	Implement the dev_pm_set_driver_flags() function macro in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-07 11:29:07 +00:00
Justin Hibbits	b5e08a60e0	Build nvme modules for powerpc, and install man pages NVMe builds for powerpc now, so just build modules for all powerpc targets, and install NVMe man pages for all powerpc targets.	2018-06-07 11:25:36 +00:00
Alan Cox	5b274055d1	When pidctrl_daemon() is called multiple times within an interval, it should use the cumulative error to calculate the output.	2018-06-07 07:48:50 +00:00
Matt Macy	fcabd54160	AF_UNIX: check for unp == unp2 on disconnect	2018-06-07 04:57:40 +00:00
Alan Cox	e768070ca9	pidctrl_daemon() implements a variation on the classical, discrete PID controller that tries to handle early invocations of the controller, in other words, invocations before the expected end of the interval. However, there were some calculation errors in this early invocation case. Notably, if an early invocation occurred while the error was negative, the derivative term was off by a large amount. One visible effect of this error was that processes were being killed by the virtual memory system's OOM killer when in fact there was plentiful free memory. Correct a couple minor errors in the sysctl descriptions, and apply some style fixes. Reviewed by: jeff, markj	2018-06-07 02:54:11 +00:00
Matt Macy	9616acde97	hwpmc: don't do EMIT64 on constant	2018-06-07 02:20:27 +00:00
Matt Macy	f992dd4b5c	pmc: convert native to jsonl and track TSC value of samples - add a '-j' option to filter to enable converting native pmc log format to json lines format to enable the use of scripts and external tooling % pmc filter -j pmc.log pmc.jsonl - Record the tsc value in sampling interrupts as opposed to recording nanotime when the sample is copied to a global log in hardclock - potentially many milliseconds later. - At initialization, record the tsc_freq and the time of day to give us an offset for translating the tsc values in callchain records	2018-06-07 02:03:22 +00:00
Matt Macy	41abd7afa3	hwpmc: don't log pid->name more than once	2018-06-07 00:54:43 +00:00
Matt Macy	155046394a	cpufunc: add rdtscp for x86	2018-06-07 00:54:11 +00:00
Michael Tuexen	ff34bbe9c2	Improve compliance with RFC 4895 and RFC 6458. Silently discard SCTP chunks which have been requested to be authenticated but are received unauthenticated no matter if support for SCTP authentication has been negotiated. This improves compliance with RFC 4895. When the application uses the SCTP_AUTH_CHUNK socket option to request a chunk to be received in an authenticated way, enable the SCTP authentication extension for the end-point. This improves compliance with RFC 6458. Discussed with: Peter Lei MFC after: 3 days	2018-06-06 19:27:06 +00:00
Conrad Meyer	cbb009b9fe	puc(4): Add provisional support for Exar XR17V352 Reportedly, this is sufficient for 4800bps use, but maybe not any better. PR: 228781 Submitted by: peo AT nethead.se	2018-06-06 16:47:33 +00:00
Hans Petter Selasky	40ddfc7604	Make some list functions RCU safe in the LinuxKPI. While at it rename hlist_add_after() into hlist_add_behind(). Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-06 15:49:01 +00:00
Sean Bruno	1a43cff92a	Load balance sockets with new SO_REUSEPORT_LB option. This patch adds a new socket option, SO_REUSEPORT_LB, which allows multiple programs or threads to bind to the same port and incoming connections will be load balanced using a hash function. Most of the code was copied from a similar patch for DragonflyBSD. However, in DragonflyBSD, load balancing is a global on/off setting and can not be set per socket. This patch allows for simultaneous use of both the current SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system. Required changes to structures: Globally change so_options from 16 to 32 bit value to allow for more options. Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets. Limitations: As in DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or threads sharing the same socket). This is a substantially different contribution as compared to its original incarnation at svn r332894 and reverted at svn r332967. Thanks to rwatson@ for the substantive feedback that is included in this commit. Submitted by: Johannes Lundberg <johalun0@gmail.com> Obtained from: DragonflyBSD Relnotes: Yes Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11003	2018-06-06 15:45:57 +00:00
Hans Petter Selasky	17e2a84e9a	Rewrite code using atomic_fcmpset_int() in the LinuxKPI. Suggested by: mjg@ MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-06 15:31:47 +00:00
Hans Petter Selasky	e6e028d01f	Implement the __add_wait_queue_entry_tail() function in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-06 15:19:30 +00:00
Justin Hibbits	3f9e1fc8ee	Revert r334708 This is the wrong place to put the barrier. Requested by: kib,mjg	2018-06-06 15:12:19 +00:00
Hans Petter Selasky	7e95e98db8	Implement the might_sleep_if() function macro in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-06 15:10:11 +00:00
Hans Petter Selasky	ab98f1e8d8	Rename two structure field members while keeping backwards compatibility in the LinuxKPI. Add a comment saying in which Linux version this change was made. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-06 15:06:21 +00:00
Hans Petter Selasky	1b092623b2	Implement the init_wait_entry() function macro in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-06 14:59:23 +00:00
Hans Petter Selasky	23dcf4359e	Implement the atomic_dec_if_positive() function in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-06 13:59:51 +00:00
Hans Petter Selasky	9e067b2256	Implement the ktime_compare() and ktime_after() functions in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-06 13:37:31 +00:00
Hans Petter Selasky	a16387c1d4	Implement the rdmsrl_safe() function macro in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies Sponsored by: Limelight Networks	2018-06-06 13:29:52 +00:00
Andrey V. Elsukov	590d0a43b6	Make in_delayed_cksum() be similar to IPv6 implementation. Use m_copyback() function to write checksum when it isn't located in the first mbuf of the chain. Handmade analog doesn't handle the case when parts of checksum are located in different mbufs. Also in case when mbuf is too short, m_copyback() will allocate new mbuf in the chain instead of making out of bounds write. Also wrap long line and remove now useless KASSERTs. X-MFC after: r334705	2018-06-06 13:01:53 +00:00
Justin Hibbits	32c369f40c	Add a memory barrier after taking a reference on the vnode holdcnt in _vhold This is needed to avoid a race between the VNASSERT() below, and another thread updating the VI_FREE flag, on weakly-ordered architectures. On a 72-thread POWER9, without this barrier a 'make -j72 buildworld' would panic on the assert regularly. It may be possible to use a weaker barrier, and I'll investigate that once all stability issues are worked out on POWER9.	2018-06-06 12:57:11 +00:00
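
The memset bug described in commit c9ca1a70cc can be illustrated with a small C model. This is only a sketch: the committed code is amd64 assembly, the function names below are hypothetical, and masking with 0xff is an assumed way of keeping just the byte that %sil holds, not necessarily the instruction sequence the fix uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the pre-fix behaviour: the whole 64-bit register
 * value is multiplied, so stale bytes above the fill byte leak into the
 * 8-byte store pattern.
 */
uint64_t
fill_pattern_unmasked(uint64_t rsi)
{
	return (rsi * 0x0101010101010101ULL);
}

/*
 * Assumed remedy: keep only the low byte (what %sil holds) before
 * replicating it across all eight bytes.
 */
uint64_t
fill_pattern_masked(uint64_t rsi)
{
	return ((rsi & 0xff) * 0x0101010101010101ULL);
}

/* Self-check: a register whose low byte is 0xab with one stale bit above it. */
void
check_fill_patterns(void)
{
	uint64_t dirty = 0x1ab;

	assert(fill_pattern_masked(dirty) == 0xababababababababULL);
	assert(fill_pattern_unmasked(dirty) != 0xababababababababULL);
}
```

Callers that zero the register with xor %esi,%esi never see the difference, which matches the commit's note that only callers writing %sil alone were affected.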

122556 Commits