121899 Commits

Author SHA1 Message Date
Xin LI
b6f7731dba Remove "All rights reserved" from my files.
See r333391 for the rationale.

MFC after:	1 week
2018-05-10 06:41:08 +00:00
Navdeep Parhar
f7a203bc21 cxgbe(4): Disable write-combined doorbells by default.
This had been the default behavior but was changed accidentally as part
of the recent iw_cxgbe+OFED overhaul.  Fix another bug in that change
while here: the global knob affects all the adapters in the system and
should be left alone by per-adapter code.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2018-05-10 06:33:54 +00:00
Justin Hibbits
b4a0a59871 Fix PPC symbol resolution
Summary:
There were 2 issues that were preventing correct symbol resolution
on PowerPC/pseries:

1- memory corruption at chrp_attach() - this caused the initial
   part of the symbol table to become zeroed, which would cause
   the kernel linker to fail to parse it.
   (this was probably zeroing out other memory parts as well)

2- DDB symbol resolution wasn't working because symtab contained
   unrelocated addresses but it was given relocated offsets.
   Although relocating the symbol table fixed this, it broke the
   linker, which already handled this case.
   Thus, the fix consists of adding a new DDB macro:
   DB_STOFFS(offs) that converts a (potentially) relocated offset
   into one that can be compared with symbol table values.

PR:		227093
Submitted by:	Leandro Lupori <leandro.lupori_gmail.com>
Differential Revision: https://reviews.freebsd.org/D15372
2018-05-10 03:59:48 +00:00
Marcelo Araujo
8951f05525 Rework CTL frontend & backend options to use nv(3), allow creating multiple
ioctl frontend ports.

This revision introduces two changes to CTL:
- Changes the way options are passed to CTL_LUN_REQ and CTL_PORT_REQ ioctls.
  Removes ctl_be_arg structure and associated logic and replaces it with
  nv(3)-based logic for passing in and out arguments.
- Allows creating multiple ioctl frontend ports using either ctladm(8) or
  ctld(8).
  New frontend ports are represented by /dev/cam/ctl<pp>.<vp> nodes, eg /dev/cam/ctl5.3.
  Those device nodes respond only to CTL_IO ioctl.

New command-line options for ctladm:
# creates new ioctl frontend port using a free pp and vp=0
ctladm port -c
# creates new ioctl frontend port with pp=10 and vp=0
ctladm port -c -O pp=10
# creates new ioctl frontend port with pp=11 and vp=12
ctladm port -c -O pp=11 -O vp=12
# removes port with number 4 (it's a "targ_port" number, not pp number)
ctladm port -r -p 4

New syntax for ctl.conf:
target ... {
    port ioctl/<pp>
    ...
}

target ... {
    port ioctl/<pp>/<vp>
    ...
}

Note: Most of this work was done by jceel@, thank you.

Submitted by:	jceel
Reworked by:	myself
Reviewed by:	mav (earlier versions and recently during the rework)
Obtained from:  FreeNAS and TrueOS
Relnotes:	Yes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D9299
2018-05-10 03:50:20 +00:00
Warner Losh
3429b518c9 Remove unused bcopyb.
Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:54 +00:00
Warner Losh
baaa3c4d60 Simplify things a little
Rather than include a copy for memmove to call bcopy to call memcpy
(which handles overlapping copies), make memmove a strong reference to
memcpy to save the two calls.

Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:48 +00:00
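
A minimal userland sketch of the aliasing idea described in baaa3c4d60, under stated assumptions: `demo_copy` and `demo_move` are hypothetical names, the GCC/Clang `alias` attribute on an ELF target stands in for whatever mechanism the kernel actually uses, and the copy loop is a simple overlap-safe routine rather than the real one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlap-safe copy: the direction depends on pointer order. */
void *
demo_copy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (d == s || len == 0)
		return (dst);
	if ((uintptr_t)d < (uintptr_t)s) {
		/* Forward copy is safe when dst precedes src. */
		while (len--)
			*d++ = *s++;
	} else {
		/* Backward copy is safe when dst follows src. */
		d += len;
		s += len;
		while (len--)
			*--d = *--s;
	}
	return (dst);
}

/* A second symbol bound to the same code, so no wrapper call is made. */
void *demo_move(void *dst, const void *src, size_t len)
    __attribute__((alias("demo_copy")));
```

Because both names resolve to the same function body, a caller of `demo_move` pays for one call instead of a chain of three.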
Warner Losh
5aa07b053a Move MI-ish bcopy routine to libkern
riscv and powerpc have nearly identical bcopy.c that's
supposed to be mostly MI. Move it to the MI libkern.

Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:38 +00:00
Navdeep Parhar
5174205de5 cxgbe(4): Determine whether the firmware supports the FILTER2 work
request, which can be used to configure hardware NAT and swapmac.

All firmwares released after Jan 2017 support this work request.

Sponsored by:	Chelsio Communications
2018-05-10 00:04:14 +00:00
Mark Johnston
e3d5c4ade1 Remove "All rights reserved" from my files.
See r333391 for the rationale.

MFC after:	1 week
2018-05-09 20:57:18 +00:00
Mariusz Zaborski
31f7586d73 Introduce the 'n' flag for the geli attach command.
If the 'n' flag is provided, the given key number will be used to
decrypt the device. This can be combined with dryrun to verify that the key
is set correctly. It can also be used to determine which key slot we want to
change on an already attached device.

Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D15309
2018-05-09 20:53:38 +00:00
Warner Losh
041f49aece Remove the 'All Rights Reserved' clause from some of the stuff I've
done for Netflix, since I'm in the neighborhood.
2018-05-09 20:32:23 +00:00
Warner Losh
33123867af Use the full year, for real this time.
2018-05-09 20:26:37 +00:00
Mark Johnston
b4fa90d6f9 Fix bxe(4) netdump rx polling.
Reviewed by:	cem, rstone
X-MFC with:	r333287
Sponsored by:	Dell EMC Isilon
2018-05-09 19:54:34 +00:00
Cy Schubert
4273f67609 Fix style error introduced in r333393.
Reported by:	jhb, imp, phk
MFC after:	6 days
X-MFC with:	r333393
2018-05-09 19:05:27 +00:00
Matt Macy
36688f706e Add taskqgroup_config_gtask_deinit to support teardown after
taskqgroup_config_gtask_init.

Approved by:	sbruno
2018-05-09 18:51:35 +00:00
Matt Macy
cbd92ce62e Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
Matt Macy
ca9551221b Remove bogus panic
r333345 added a panic to the default case statement on the incorrect
premise that it should "never happen" when in fact it is simply a
different adapter version.

Reported by:	markj
Approved by:	sbruno
2018-05-09 17:48:52 +00:00
Kyle Evans
f0fb94abca Standardize SPDX tag on files I've added
2018-05-09 16:52:28 +00:00
Kyle Evans
4b3c64f722 Remove "All Rights Reserved" on files that I hold sole copyright on
See r333391 for more detail; in summary: it holds no weight and may be
removed.
2018-05-09 16:44:19 +00:00
John Baldwin
485415ec47 Report TRAP_BRKPT for breakpoint traps on sparc64.
Reviewed by:	marius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D15190
2018-05-09 15:25:26 +00:00
Mateusz Guzik
20ca271fdd amd64: depessimize bcmp for small buffers
Adapt assembly generated by clang for memcmp and use it for <= 64 sized
compares (which are the vast majority).

Sample result of doing stats on Broadwell (% of samples):
before: 4.0 kernel     bcmp                 cache_lookup
after : 0.7 kernel     bcmp                 cache_lookup

The routine is most definitely still not optimal. Anyone interested in
spending time improving it is welcome to take over.

Reviewed by:	kib
2018-05-09 15:16:25 +00:00
Konstantin Belousov
55c9d75e6b Avoid calls to bzero() before ireloc.
Evaluate cpu_stdext_feature early so that the moved link_elf_ireloc() sees
the correct flags; the most important is SMAP.

Tested by:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D15367
2018-05-09 14:39:24 +00:00
Warner Losh
603bbd0631 Minor style nits
Use full copyright year.
Remove 'All Rights Reserved' from new file (rights holder OK'd)
Minor #ifdef motion and #endif tagging
Remove __FBSDID macro from comments

Sponsored by: Netflix
OK'd by: rrs@
2018-05-09 14:11:35 +00:00
Konstantin Belousov
71d1bbce91 Remove PG_U from the rest of the kernel pmap ptes.
Supposedly, the PG_U bits there were set to make it easier to make some
kernel page accessible to userspace in-place.  Since this was not used for
the whole existence of the amd64 pmap.c and the current design of the shared
pages prefers double-mapping over in-place access, remove PG_U
both from the direct map and KVA slots.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-05-09 12:09:08 +00:00
Konstantin Belousov
5aaa5bc3d6 Remove PG_U from the recursive pte for kernel pmap's PML4 page.
This PML4 page is never used for the userspace process, so there are no
security implications.  But the configuration trips the SMAP check, which
should be corrected.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-05-09 12:03:40 +00:00
Andrey V. Elsukov
782360dec3 Bring in some last changes in NAT64 implementation:
o Modify ipfw(8) to be able to set any prefix6, not just Well-Known,
  and also show configured prefix6;
o relocate some definitions and macros into proper place;
o convert nat64_debug and nat64_allow_private variables to be
  VNET-compatible;
o add struct nat64_config that keeps generic configuration needed
  to NAT64 code;
o add nat64_check_prefix6() function to check the validity of a
  user-specified IPv6 prefix according to RFC6052;
o use nat64_check_private_ip4() and nat64_embed_ip4() functions
  instead of nat64_get_ip4() and nat64_set_ip4() macros. This allows
  the use of any configured IPv6 prefixes that are allowed by RFC6052;
o introduce NAT64_WKPFX flag, that is set when IPv6 prefix is
  Well-Known IPv6 prefix. It is used to reduce the overhead of checking this;
o modify nat64lsn_cfg and nat64stl_cfg structures to use nat64_config
  structure. And respectively modify the rest of code;
o remove now unused ro argument from nat64_output() function;
o remove __FreeBSD_version ifdef, NAT64 was not merged to older versions;
o add commented -DIPFIREWALL_NAT64_DIRECT_OUTPUT flag to module's Makefile
  as example.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2018-05-09 11:59:24 +00:00
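
A minimal sketch of the kind of RFC 6052 validation that 782360dec3 mentions. It is not the actual nat64_check_prefix6(); `demo_check_prefix6` and its return convention are hypothetical. It applies two rules from RFC 6052: the prefix length must be one of 32, 40, 48, 56, 64 or 96, and bits 64 to 71 of the address must be zero.

```c
#include <stdint.h>

/*
 * Hypothetical validator, for illustration only.
 * Returns 0 if the prefix is acceptable, -1 otherwise.
 */
int
demo_check_prefix6(const uint8_t addr[16], int plen)
{
	/* RFC 6052 permits only these prefix lengths. */
	switch (plen) {
	case 32:
	case 40:
	case 48:
	case 56:
	case 64:
	case 96:
		break;
	default:
		return (-1);
	}
	/* Bits 64..71 (the ninth octet) are reserved and must be zero. */
	if (addr[8] != 0)
		return (-1);
	return (0);
}
```

With this sketch the Well-Known Prefix 64:ff9b::/96 passes, while a /80 prefix or a prefix with a nonzero ninth octet is rejected.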
Andrey V. Elsukov
2e4531a12b Add IFCAP_LINKSTATE support to if_loop(4).
Reviewed by:	wollman
Obtained from:	Yandex LLC
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D15278
2018-05-09 10:50:51 +00:00
Hans Petter Selasky
c20feee43b Add myself to copyright in the LinuxKPI RCU support layer.
Suggested by:	mmacy@
Sponsored by:	Mellanox Technologies
2018-05-09 08:50:42 +00:00
Navdeep Parhar
89f651e704 cxgbe(4): Add support for hash filters.
These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool.  Hash and
normal TCAM filters can be used together.  The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters.  Any T5/T6 card with memory can support at
least half a million hash filters.  The sample config file with the
driver configures 512K of these; it is possible to double this to 1
million+ in some cases.

The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file.  The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto).  It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).

For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan

Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.

Hash filters support all actions supported by TCAM filters.  A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).

Sponsored by:	Chelsio Communications
2018-05-09 04:09:49 +00:00
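
The difference between the two match styles described in 89f651e704 can be sketched on a single 32-bit field. This is an illustration only: `demo_masked_match` and `demo_exact_match` are hypothetical names, and the real hardware compares the configured field set (5-tuple plus the filter mask fields), not one integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masked match, TCAM style: only the bits set in mask are compared. */
bool
demo_masked_match(uint32_t field, uint32_t value, uint32_t mask)
{
	return ((field & mask) == (value & mask));
}

/* Exact match, hash-filter style: every bit of the field must agree. */
bool
demo_exact_match(uint32_t field, uint32_t value)
{
	return (field == value);
}
```

For instance, a masked rule for 10.0.1.0 with mask 255.255.255.0 accepts 10.0.1.5, whereas an exact-match rule needs the full address 10.0.1.5 to be specified.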
Cy Schubert
bb7af25076 Document intentional fallthrough. (CID 976535)
MFC after:	1 week
2018-05-09 02:07:09 +00:00
Cy Schubert
8d3478a26f Fix memory leak. (CID 1199373).
MFC after:	1 week
2018-05-09 02:02:58 +00:00
Matt Macy
ad738f3791 Reduce overhead of ktrace checks in the common case.
KTRPOINT() checks both if we are tracing _and_ if we are recursing within
ktrace. The second condition is only ever executed if ktrace is actually
enabled. This change moves the check out of the hot path into the functions
themselves.

Discussed with mjg@

Reported by:	mjg@
Approved by:	sbruno@
2018-05-09 00:00:47 +00:00
Sean Bruno
57b4936514 nxge(4):
Remove nxge(4) and associated man page and tools in FreeBSD 12.0.

Submitted by:	kbowling
Reviewed by:	brooks
Relnotes:	yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D1529
2018-05-08 21:14:29 +00:00
Michael Tuexen
45d41de5e6 Fix two typos reported by N. J. Mann, which were introduced in
https://svnweb.freebsd.org/changeset/base/333382 by me.

MFC after:	3 days
2018-05-08 20:39:35 +00:00
Michael Tuexen
9669e724d1 When reporting ERROR or ABORT chunks, don't use more data
than is guaranteed to be contiguous.
Thanks to Felix Weinrank for finding and reporting this bug
by fuzzing the usrsctp stack.

MFC after:	3 days
2018-05-08 18:48:51 +00:00
Jung-uk Kim
e7dfa7d8ab MFV: r333378
Import ACPICA 20180508.
2018-05-08 18:18:27 +00:00
Stephen Hurd
ac88e6da11 iflib: print message when iflib_tx_structures_setup fails
Print a message when iflib_tx_structures_setup fails, like we do for
iflib_rx_structures_setup.

Now that we always print a message from within
iflib_qset_structures_setup when it fails, stop printing one in
iflib_device_register() at the call site.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15300
2018-05-08 17:15:10 +00:00
Konstantin Belousov
053641bb1c Prepare DB# handler for deferred trigger of watchpoints.
Since pop %ss/mov %ss instructions defer all interrupts and exceptions
for the next instruction, it is possible that the userspace watchpoint
trap executes on the first instruction of the kernel entry for
syscall/bpt.

In this case, DB# should be treated similarly to NMI: on amd64 we must
always load GSBASE even if the trap comes from kernel mode, and load
the kernel page table root into %cr3.  Moreover, the trap must
use the dedicated stack, because we are still on the user stack when
trapped on syscall entry.

For i386, we must reload %cr3.  The syscall instruction is not configured,
so there is no issue with executing on user stack when trapping.

Due to some CPU errata it is not always possible to detect that the
userspace watchpoint triggered by inspecting %dr6.  In trap(), compare the
trap %rip with the known unsafe entry points and if matched pretend that
the watchpoint did not fire at all.

Thank you to the MSRC Incident Response Team, and in particular Greg
Lenti and Nate Warfield, for coordinating the response to this issue
across multiple vendors.

Thanks to Computer Recycling at The Working Center of Kitchener for
making hardware available to allow us to test the patch on additional
CPU families.

Reviewed by:	jhb
Discussed with:	Matthew Dillon
Tested by:	emaste
Sponsored by:	The FreeBSD Foundation
Security:	CVE-2018-8897
Security:	FreeBSD-SA-18:06.debugreg
2018-05-08 17:00:34 +00:00
Stephen Hurd
6108c01395 iflib: cleanup queues when iflib_device_register fails
Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15299
2018-05-08 16:56:02 +00:00
Justin Hibbits
151c44e22b Fix wrong cpu0 identification
Summary:
chrp_cpuref_init() was relying on the bootstrap processor to be
the first child of /cpus. That was not always the case, especially
on pseries with FDT.

This change uses the "reg" property of each CPU instead and also
adds several sanity checks to avoid unexpected behavior (maybe
too many panics?).

The main observed symptom was interrupts being missed by the main
processor, leading to timeouts and the kernel aborting the boot.

Submitted by:	Leandro Lupori
Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15174
2018-05-08 13:23:39 +00:00
Hans Petter Selasky
306cf294b2 Fix for missing network interface address event when adding the default IPv6
based link-local address.

The default link local address for IPv6 is added as part of bringing the
network interface up. Move the call to "EVENTHANDLER_INVOKE(ifaddr_event,)"
from the SIOCAIFADDR_IN6 ioctl(2) handler to in6_notify_ifa() which should
catch all the cases of adding IPv6 based addresses to a network interface.
Add a witness warning in case the event handler is not allowed to sleep.

Reviewed by:	network (ae), kib
Differential Revision:	https://reviews.freebsd.org/D13407
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-05-08 11:39:01 +00:00
Matt Macy
10d20c84ed Fix spurious retransmit recovery on low latency networks
TCP's smoothed RTT (SRTT) can be much larger than an actual observed RTT. This can be either because of hz restricting the calculable RTT to 10ms in VMs or 1ms using the default 1000hz or simply because SRTT recently incorporated a larger value.

If an ACK arrives before the calculated badrxtwin (now + SRTT):
tp->t_badrxtwin = ticks + (tp->t_srtt >> (TCP_RTT_SHIFT + 1));

We'll erroneously reset snd_una to snd_max. If multiple segments were dropped and this happens repeatedly the transmit rate will be limited to 1MSS per RTO until we've retransmitted all drops.

Reported by:	rstone
Reviewed by:	hiren, transport
Approved by:	sbruno
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8556
2018-05-08 02:22:34 +00:00
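
A worked sketch of the window arithmetic quoted in 10d20c84ed. It assumes that t_srtt is stored scaled by 2^TCP_RTT_SHIFT with a shift of 5; under that assumption the expression adds half of the smoothed RTT, in ticks, to the current time. The function names are hypothetical.

```c
/* Assumption: t_srtt is kept scaled by 2^TCP_RTT_SHIFT, with a shift of 5. */
#define TCP_RTT_SHIFT 5

/* Same arithmetic as the quoted line, on plain integers. */
int
demo_badrxtwin(int ticks, int t_srtt)
{
	return (ticks + (t_srtt >> (TCP_RTT_SHIFT + 1)));
}

/* Hypothetical helper: nonzero if an ACK at 'now' lands inside the window. */
int
demo_ack_in_window(int now, int badrxtwin)
{
	return ((now - badrxtwin) < 0);
}
```

With hz=1000 and a smoothed RTT of 100 ticks (stored as 100 << 5 = 3200), a retransmit at tick 1000 gives a window ending at tick 1050. An ACK at tick 1040 falls inside the window even if the real RTT is only a few ticks, which is the situation the commit describes.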
Matt Macy
d5210708dd Sleep rather than spin in e1000 when doing long running config operations.
With r333218 it is now possible for drivers to use an sx lock and thus sleep while
waiting on long running operations rather than DELAY().

Reported by:	gallatin
Reviewed by:	sbruno
Approved by:	sbruno
MFC after:	1 month
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14984
2018-05-08 01:39:45 +00:00
Mateusz Guzik
2824088536 Inlined sched_userret.
The tested condition is rarely true and it induces a function call
on each return to userspace.

Bumps getuid rate by about 1% on Broadwell.
2018-05-07 23:36:16 +00:00
Mateusz Guzik
75e9b455a9 Change trap_enotcap to bool and annotate with __read_frequently
It is read on each return to user space.
2018-05-07 23:10:12 +00:00
Mateusz Guzik
79ca7cbf09 Avoid calls to syscall_thread_enter/exit for statically defined syscalls
The entire mechanism is rarely used and performs poorly due to
atomic ops on the syscall table. It also adds overhead for completely
unrelated syscalls.

Reduce it by avoiding the func calls if possible (which constitutes the vast
majority of cases).

Provides about 3% syscall rate speed up for getuid on Broadwell.
2018-05-07 22:29:32 +00:00
Mateusz Guzik
a9456603f2 amd64: stop asserting params != NULL in the syscall path
The parameter is effectively controllable by userspace. It does not matter
what it is set to as it is being passed to copyin - worst case the operation
will just fail.

While here stop computing it unless it is going to be used.

Noted by:	dillon@backplane.com
2018-05-07 21:32:08 +00:00
Warner Losh
b425e3fba2 Put the CPU starting on one line.
2018-05-07 21:09:21 +00:00
Warner Losh
43d9cb5b74 Use device_quiet_children to silence verbose CPU probe messages.
Have cpu0 be noisy, but all the other CPU devices be quiet on boot.
2018-05-07 21:09:17 +00:00
Warner Losh
ad7142757b Add device_quiet_children() and device_has_quiet_children()
If you add a child to a device that has quiet children, we'll
automatically set the quiet flag on the children, and its
children.

This is intended for things like CPU that have a large amount of
repetition in booting that adds nothing.
2018-05-07 21:09:08 +00:00
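
The inheritance behaviour described in ad7142757b can be sketched with a toy device tree. Everything here is hypothetical (`struct demo_dev`, `demo_add_child`, and the flag names); it only models the stated rule that a child added to a device with quiet children becomes quiet, and so do its own children.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy device node, for illustration only. */
struct demo_dev {
	struct demo_dev *parent;
	bool quiet;		/* suppress this device's probe messages */
	bool quiet_children;	/* children added later inherit quiet */
};

void
demo_quiet_children(struct demo_dev *dev)
{
	dev->quiet_children = true;
}

bool
demo_has_quiet_children(const struct demo_dev *dev)
{
	return (dev->quiet_children);
}

void
demo_add_child(struct demo_dev *parent, struct demo_dev *child)
{
	child->parent = parent;
	if (demo_has_quiet_children(parent)) {
		child->quiet = true;
		/* Propagate so that grandchildren are quiet as well. */
		child->quiet_children = true;
	}
}
```

The parent itself stays noisy; only devices attached below it after the call are silenced.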