122509 Commits

Author SHA1 Message Date
Hans Petter Selasky
a16387c1d4 Implement the rdmsrl_safe() function macro in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-06 13:29:52 +00:00
Andrey V. Elsukov
590d0a43b6 Make in_delayed_cksum() be similar to IPv6 implementation.
Use m_copyback() function to write checksum when it isn't located
in the first mbuf of the chain. Handmade analog doesn't handle the
case when parts of checksum are located in different mbufs.
Also in case when mbuf is too short, m_copyback() will allocate new
mbuf in the chain instead of making out of bounds write.

Also wrap long line and remove now useless KASSERTs.

X-MFC after:	r334705
2018-06-06 13:01:53 +00:00
Justin Hibbits
32c369f40c Add a memory barrier after taking a reference on the vnode holdcnt in _vhold
This is needed to avoid a race between the VNASSERT() below, and another
thread updating the VI_FREE flag, on weakly-ordered architectures.

On a 72-thread POWER9, without this barrier a 'make -j72 buildworld' would
panic on the assert regularly.

It may be possible to use a weaker barrier, and I'll investigate that once
all stability issues are worked out on POWER9.
2018-06-06 12:57:11 +00:00
Andrey V. Elsukov
4a089e6bc5 Use m_copyback() function to write delayed checksum when it isn't located
in the first mbuf of the chain.

MFC after:	1 week
2018-06-06 10:46:24 +00:00
Tom Jones
1fdbfb909f Use UDP len when calculating UDP checksums
The length of the IP payload is normally equal to the UDP length, UDP Options
(draft-ietf-tsvwg-udp-options-02) suggests using the difference between IP
length and UDP length to create space for trailing data.

Correct checksum length calculation to use the UDP length rather than the IP
length when not offloading UDP checksums.

Approved by: jtl (mentor)
Differential Revision:	https://reviews.freebsd.org/D15222
2018-06-06 07:04:40 +00:00
Andrey V. Elsukov
a41372abd4 Fix LINT-NOINET build.
Use known at build time size for min_length value. Also remove the
check from in6_gre_encapcheck(), now it is done in generic code.
2018-06-06 05:17:21 +00:00
Mateusz Guzik
ffc3ab5d39 malloc: elaborate on r334545 due to frequent questions
While here annotate the NULL check as probably true.
2018-06-06 05:08:05 +00:00
Matt Macy
b2ca2e50b9 hwpmc: add summary command and further metadata extensions
metadata changes:
- log pmc sample rate with pmcallocate
- log proc flags with thread / process logging
  to identify user vs kernel threads

fixes:
- use log cpuid to translate event id to event name

Implement rudimentary summary command to track sample
counts by thread and process name within a pmc log.

% make -j4 buildkernel >& /dev/null &
% sudo pmcstat -S unhalted_core_cycles -S llc-misses -O foo sleep 15
% pmc summary foo
cpu_clk_unhalted.thread_p_any:
        idle: 138108207162
        clang-6.0: 105336158004
        sh: 72340108510
        make: 8642012963
        kernel: 7754011631
longest_lat_cache.miss:
        clang-6.0: 87502625
        sh: 40901227
        make: 5500165
        kernel: 3300099
        awk: 2000060

%  pmc summary -f ~/foo
idx: 278 name: cpu_clk_unhalted.thread_p_any rate: 2000003
idle: 69054
clang-6.0: 52668
sh: 36170
make: 4321
kernel: 3877
hwpmc: proc(7445): 3319
awk: 1289
xargs: 357
rand_harvestq: 181
mtree: 102
intr: 53
zfskern: 31
usb: 7
pagedaemon: 4
ntpd: 3
syslogd: 1
acpi_thermal: 1
logger: 1
syncer: 1
snmptrapd: 1
sleep: 1
idx: 17 name: longest_lat_cache.miss rate: 100003
clang-6.0: 875
sh: 409
make: 55
kernel: 33
awk: 20
hwpmc: proc(7445): 14
xargs: 9
idle: 8
intr: 3
zfskern: 2
2018-06-06 02:48:09 +00:00
Andrey V. Elsukov
b941bc1d6e Rework if_gif(4) to use new encap_lookup_t method to speedup lookup
of needed interface when many gif interfaces are present.

Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations.
Use hash table to speedup lookup of needed softc. Interfaces
with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST.
Sysctl net.link.gif.parallel_tunnels is removed. The removal was planed
16 years ago, and actually it could work only for outbound direction.
Each protocol, that can be handled by if_gif(4) interface is registered
by separate encap handler, this helps avoid invoking the handler
for unrelated protocols (GRE, PIM, etc.).

This change allows dramatically improve performance when many gif(4)
interfaces are used.

Sponsored by:	Yandex LLC
2018-06-05 21:24:59 +00:00
Andrey V. Elsukov
73ae9758a1 Constify argument of in6_getscope(). 2018-06-05 20:54:29 +00:00
Andrey V. Elsukov
6d8fdfa9d5 Rework IP encapsulation handling code.
Currently it has several disadvantages:
- it uses single mutex to protect internal structures. It is used by
  data- and control- path, thus there are no parallelism at all.
- it uses single list to keep encap handlers for both INET and INET6
  families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw),
  that isn't used by code in the source tree.
- matches are prioritized and when many tunneling interfaces are
  registered, encapcheck handler of each interface is invoked for each
  packet. The search takes O(n) for n interfaces. All this work is done
  with exclusive lock held.

What this patch includes:
- the datapath is converted to be lockless using epoch(9) KPI.
- struct encaptab now linked using CK_LIST.
- all unused fields removed from struct encaptab. Several new fields
  addedr: min_length is the minimum packet length, that encapsulation
  handler expects to see; exact_match is maximum number of bits, that
  can return an encapsulation handler, when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists;
- added new "encap_lookup_t" method, that will be used later. It is
  targeted to speedup lookup of needed interface, when gif(4)/gre(4) have
  many interfaces.
- the need to use protosw structure is eliminated. The only pr_input
  method was used from this structure, so I don't see the need to keep
  using it.
- encap_input_t method changed to avoid using mbuf tags to store softc
  pointer. Now it is passed directly trough encap_input_t method.
  encap_getarg() funtions is removed.
- all sockaddr structures and code that uses them removed. We don't have
  any code in the tree that uses them. All consumers use encap_attach_func()
  method, that relies on invoking of encapcheck() to determine the needed
  handler.
- introduced struct encap_config, it contains parameters of encap handler
  that is going to be registered by encap_attach() function.
- encap handlers are stored in lists ordered by exact_match value, thus
  handlers that need more bits to match will be checked first, and if
  encapcheck method returns exact_match value, the search will be stopped.
- all current consumers changed to use new KPI.

Reviewed by:	mmacy
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15617
2018-06-05 20:51:01 +00:00
Eric van Gyzen
56009ba0ed Make Coverity more happy with r334545
Coverity complains about:

	if (((flags) & M_WAITOK) || _malloc_item != NULL)

saying:

	The expression
		1 /* (2 | 0x100) & 2 */ || _malloc_item != NULL
	is suspicious because it performs a Boolean operation
	on a constant other than 0 or 1.

Although the code is correct, add "!= 0" to make it slightly
more legible and to silence hundreds(?) of Coverity warnings.

Reported by:	Coverity
Discussed with:	mjg
Sponsored by:	Dell EMC
2018-06-05 20:34:11 +00:00
Andrey V. Elsukov
9f855fb282 tcp_lro.h requires <netinet/in.h>, include it directly instead of
indirect inclusion trough if_gif.h
2018-06-05 19:23:23 +00:00
Hans Petter Selasky
7a13eeba18 Declare and set the global "system_highpri_wq" workqueue structure pointer
in the LinuxKPI.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:49:35 +00:00
Hans Petter Selasky
c6d920309c Implement the INIT_DELAYED_WORK_ONSTACK() function macro in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:46:16 +00:00
Hans Petter Selasky
7f346854b8 Define the __kernel_size_t type in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:42:35 +00:00
Hans Petter Selasky
5a1d03bb7c Implement the task_pid_vnr() function macro in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:40:09 +00:00
Hans Petter Selasky
747d9a8165 Add "access" function pointer to the "vm_operations_struct" structure
in the LinuxKPI. While at it document when to use the "virtual_address" or
the "address" field in the "vm_fault" structure.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:37:28 +00:00
Hans Petter Selasky
0d2dce0b78 Implement mul_u32_u32() function in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:30:36 +00:00
Hans Petter Selasky
f446b7cab4 Implement timer_setup() and from_timer() function macros in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-05 15:20:20 +00:00
Ram Kishore Vegesna
ca21db8546 Issue:
Utility hangs when  OCS_IOCTL_CMD_MGMT_GET_ALL called in parallel on port 0 and port 1.

Fix: Using static structure for results is corrupting the second ioctl request. Removed static for results structure.
Approved by: ken
MFC after: 3 days
2018-06-05 15:05:26 +00:00
Ilya Bakulin
d670d9518f Enable high-speed on the card before increasing frequency on the controller
Increasing operating frequency without telling card to switch
to high-speed mode first upsets some cards and generates CRC errors.

While here, deselect / reselect cards after CMD6 and SCR fetch, as in original code.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15568
2018-06-05 11:03:24 +00:00
Eitan Adler
10c66ed32e usbdevs: sync from NetBSD
This adds several vendors from NetBSD's copy of the same file (r1.749).
Prefer longer more "canonical" names where the names differed.

Sort while here.
2018-06-05 09:52:38 +00:00
Kevin Lo
3fff2af912 Since we don't enable BUF_TRACKING and FULL_BUF_TRACKING buffer debugging
options in GENERIC kernels on arm and arm64, there's no need to disable
them.

Sponsored by:	MSI/FUNTORO
2018-06-05 05:24:42 +00:00
Kevin Lo
ce24b3e7d8 Add support for SIMCom SIM7600E.
Sponsored by:	MSI/FUNTORO
2018-06-05 05:19:04 +00:00
Matt Macy
ebfaf69cc0 hwpmc: log name->pid, name->tid mappings
By logging all threads and processes 'pmc filter'
can now filter on process or thread name, relieving
the user of the burden of determining which tid or
pid was which when the sample was taken.

% pmc filter -T if_io_tqg -P nginx pmc.log pmc-iflib.log

% pmc filter -x -T idle pmc.log pmc-noidle.log
2018-06-05 04:26:40 +00:00
Jung-uk Kim
3d90091d60 MFV: r334448
Import ACPICA 20180531.
2018-06-04 22:26:47 +00:00
Matt Macy
ac7012d284 hwpmc: don't defer user callchain capture completion to ast 2018-06-04 21:17:30 +00:00
Mark Johnston
97bc9a9384 Regen after r334626. 2018-06-04 19:36:47 +00:00
Mark Johnston
9f9c9b22ec Reimplement brk() and sbrk() to avoid the use of _end.
Previously, libc.so would initialize its notion of the break address
using _end, a special symbol emitted by the static linker following
the bss section.  Compatibility issues between lld and ld.bfd could
cause the wrong definition of _end (libc.so's definition rather than
that of the executable) to be used, breaking the brk()/sbrk()
interface.

Avoid this problem and future interoperability issues by simply not
relying on _end.  Instead, modify the break() system call to return
the kernel's view of the current break address, and have libc
initialize its state using an extra syscall upon the first use of the
interface.  As a side effect, this appears to fix brk()/sbrk() usage
in executables run with rtld direct exec, since the kernel and libc.so
no longer maintain separate views of the process' break address.

PR:		228574
Reviewed by:	kib (previous version)
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D15663
2018-06-04 19:35:15 +00:00
Conrad Meyer
ede2f7731d Correctly handle the padding for IPv6-AH, as specified by RFC4302
The RFC specifies that under IPv6 the complete AH header must be 64 bit
aligned, and under IPv4, 32 bit aligned. Prior to this change, we (along
with other BSDs and MacOS) had violated this requirement.

This makes it possible to set up IPv6-AH between Linux and BSD, and also
probably between Windows and BSD.

PR:		222684
Reported and tested by:	Jason Mader <jasonmader AT gmail.com>
Obtained from:	NetBSD xform_ah.c 1.105
		(b939fe2483972eb43d71bf990cfb7f26dece7839 NetBSD/src on GH)
		by Maxime Villard
MFC after:	35.2731 hours
Relnotes:	probably (breaks ipv6 compat with older FreeBSD/NetBSD/MacOS)
Sponsored by:	Dell EMC Isilon
2018-06-04 18:51:06 +00:00
Conrad Meyer
3d825c42ca str(r)chr: Replace union abuse with __DECONST
Writing one union member and reading another is technically illegal C,
although we do it in many places in the tree.  Use the __DECONST macro
instead, which is (technically) a valid C construct.

Trivial style(9) cleanups to touched lines while here.

Sponsored by:	Dell EMC Isilon
2018-06-04 18:47:14 +00:00
Matt Macy
cf823003a7 hwpmc: remove gratuitous curthread checks 2018-06-04 17:49:34 +00:00
Mark Johnston
27e29d103f Correct the description of vm_pageout_scan_inactive() after r334508.
Reported by:	alc
2018-06-04 16:46:36 +00:00
Alan Cox
3e7cb27cdd Use a single, consistent approach to returning success versus failure in
vm_map_madvise().  Previously, vm_map_madvise() used a traditional Unix-
style "return (0);" to indicate success in the common case, but Mach-
style return values in the edge cases.  Since KERN_SUCCESS equals zero,
the only problem with this inconsistency was stylistic.  vm_map_madvise()
has exactly two callers in the entire source tree, and only one of them
cares about the return value.  That caller, kern_madvise(), can be
simplified if vm_map_madvise() consistently uses Unix-style return
values.

Since vm_map_madvise() uses the variable modify_map as a Boolean, make it
one.

Eliminate a redundant error check from kern_madvise().  Add a comment
explaining where the check is performed.

Explicitly note that exec_release_args_kva() doesn't care about
vm_map_madvise()'s return value.  Since MADV_FREE is passed as the
behavior, the return value will always be zero.

Reviewed by:	kib, markj
MFC after:	7 days
2018-06-04 16:28:06 +00:00
Ruslan Bukin
8cd6c09e7e Fix build: ignore a GCC 7.2.0 warning which says that third argument of
memset(3) should contain the number of elements multiplied by the element
size.

Sponsored by:	DARPA, AFRL
2018-06-04 16:20:22 +00:00
Justin Hibbits
12f691959f Align UMA data to 128 byte cacheline size
Suggested by:	mjg
2018-06-04 15:44:17 +00:00
Mark Johnston
bcc51ef48d Fix the NUMA build for non-x86 platforms.
acpi_map_pxm_to_vm_domainid() is currently implemented only on x86.

MFC after:	1 week
2018-06-04 14:56:02 +00:00
Rick Macklem
8472f76005 Revert r334586 since I now think __unused is the better way to handle this. 2018-06-04 11:35:04 +00:00
Matt Macy
9645bcabdf hwpmc: fix fixed counters checks 2018-06-04 04:49:06 +00:00
Matt Macy
07d80fd8dc hwpmc: ABI fixes
- increase pmc cpuid field from 8 to 12 bits
- add cpuid version string to initialize entry in the log
  so that filter can identify which counter index an
  event name maps to
- GC unused config flags
- make fixed counter assignment more robust as well as the
  changes needed to be properly identified for filter
2018-06-04 02:05:48 +00:00
Matt Macy
5de96e33d6 hwpmc: support sampling both kernel and user stacks when interrupted in kernel
This adds the -U options to pmcstat which will attribute in-kernel samples
back to the user stack that invoked the system call. It is not the default,
because when looking at kernel profiles it is generally more desirable to
merge all instances of a given system call together.

Although heavily revised, this change is directly derived from D7350 by
Jonathan T. Looney.

Obtained from: jtl
Sponsored by: Juniper Networks, Limelight Networks
2018-06-04 01:10:23 +00:00
Rick Macklem
12c7a494ad Fix a gcc8 warning about a write only variable.
gcc8 warns that "verf" was set but not used. This was because the code
that uses it is disabled via a "#if 0".
This patch adds a "#if 0" to the variable's declaration and assignment
to get rid of the warning.
This way the code could be re-enabled without difficulty.

Requested by:	mmacy
MFC after:	2 weeks
2018-06-03 19:46:44 +00:00
Matt Macy
2ce69a4d04 hwpmc: ensure that mapin updates are synchronous 2018-06-03 19:37:17 +00:00
Matt Macy
a1a1e8a89a pmc: remove assert that is invalid in interrupt context 2018-06-03 19:37:09 +00:00
Vladimir Kondratyev
6bae6b2538 [evdev] Sync event codes with Linux kernel 4.16
MFC after:	2 weeks
2018-06-03 10:53:10 +00:00
Justin Hibbits
a1a990d8a4 Revert r326083, it doesn't behave as expected.
Even though there do appear to be more artificial frames, with 12, stack
traces no longer list at all.  Revert until a better, more stable value can
be determined.
2018-06-03 03:53:11 +00:00
Mateusz Guzik
d0a22279db Remove an unused argument to turnstile_unpend.
PR:	228694
Submitted by:	Julian Pszczołowski <julian.pszczolowski@gmail.com>
2018-06-02 22:37:53 +00:00
Mateusz Guzik
34c538c356 malloc: try to use builtins for zeroing at the callsite
Plenty of allocation sites pass M_ZERO and sizes which are small and known
at compilation time. Handling them internally in malloc loses this information
and results in avoidable calls to memset.

Instead, let the compiler take the advantage of it whenever possible.

Discussed with:	jeff
2018-06-02 22:20:09 +00:00
Justin Hibbits
5167f178ab Included VSX registers in powerpc core dumps
Summary: Included VSX registers in powerpc core dumps (both kernel and gcore)

Submitted by:	Luis Pires
Differential Revision: https://reviews.freebsd.org/D15512
2018-06-02 20:28:58 +00:00