Commit Graph

122105 Commits

Author SHA1 Message Date
Fedor Uporov
6d4a4ed747 Fix directory blocks checksumming.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15396
2018-05-13 19:48:30 +00:00
Fedor Uporov
c4aa9a026d Fix on-disk inode checksum calculation logic.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15395
2018-05-13 19:29:35 +00:00
Fedor Uporov
e06e5241a0 Fix EXT2FS_DEBUG definition usage.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15394
2018-05-13 19:19:10 +00:00
Mark Johnston
36f8fe9bbb Get rid of vm_pageout_page_queued().
vm_page_queue(), added in r333256, generalizes vm_pageout_page_queued(),
so use it instead.  No functional change intended.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D15402
2018-05-13 13:00:59 +00:00
Rick Macklem
0f13d146a0 Fix a slow leak of session structures in the NFSv4.1 server.
For a fairly rare case of a client doing an ExchangeID after a hard reboot,
the old confirmed clientid still exists, but some clients use a new
co_verifier. For this case, the server was not freeing up the sessions on
the old confirmed clientid.
This patch fixes this case. It also adds two LIST_INIT() macros, which are
actually no-ops, since the structure is malloc()d with M_ZERO so the pointer
is already set to NULL.
It should have minimal impact, since the only way I could exercise this
code path was by doing a hard power cycle (pulling the plug) on a machine
running Linux with a NFSv4.1 mount on the server.
Originally spotted during testing of the ESXi 6.5 client.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-13 12:42:53 +00:00
Rick Macklem
bb3436966a The NFSv4.1 server should return NFSERR_BACKCHANBUSY instead of NFS_OK.
When an NFSv4.1 session is busy due to a callback being in progress,
nfsrv_freesession() should return NFSERR_BACKCHANBUSY instead of NFS_OK.
The only effect this has is that the DestroySession operation will report
the failure for this case and this probably has little or no effect on a
client. Spotted by inspection and no failures related to this have been
reported.

MFC after:	2 months
2018-05-13 12:29:09 +00:00
Konstantin Belousov
2ebc882927 Detect and optimize reads from the hole on UFS.
- Create getblkx(9) variant of getblk(9) which can return error.
- Add GB_NOSPARSE flag for getblk()/getblkx() which requests that BMAP
  be performed before the buffer is created, and that EJUSTRETURN be
  returned in case the requested block does not exist.
- Make ffs_read() use GB_NOSPARSE to avoid instantiating buffer (and
  allocating the pages for it), copying from zero_region instead.

The end result is fewer page allocations and less buffer recycling when a
hole is read, which is important for some benchmarks.

Requested and reviewed by:	jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D14917
2018-05-13 09:47:28 +00:00
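The hole-read idea in 2ebc882927 can be pictured with a small userspace sketch. Everything in it (`struct sparse_file`, `sparse_read_block`, the toy block size) is hypothetical and only models the approach described above: consult the block map first and, when the block does not exist, copy zeros from a shared zero region instead of instantiating a buffer. It is not the kernel's getblkx(9) interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16           /* toy block size, chosen for illustration */
#define NBLOCKS    4

/* Shared read-only zeros, standing in for the kernel's zero_region. */
static const unsigned char zero_region[BLOCK_SIZE];

/* Hypothetical sparse file: a NULL map entry means the block is a hole. */
struct sparse_file {
	const unsigned char *blocks[NBLOCKS];
	unsigned buffers_created;   /* counts simulated buffer instantiations */
};

/*
 * Copy one block into dst. Returns 0 on success, -1 if lbn is out of range.
 * For a hole, no buffer is "created"; zeros are copied from zero_region.
 */
static int
sparse_read_block(struct sparse_file *f, size_t lbn, unsigned char *dst)
{
	assert(f != NULL && dst != NULL);
	if (lbn >= NBLOCKS)
		return (-1);
	if (f->blocks[lbn] == NULL) {
		/* Hole: skip buffer creation entirely. */
		memcpy(dst, zero_region, BLOCK_SIZE);
		return (0);
	}
	f->buffers_created++;
	memcpy(dst, f->blocks[lbn], BLOCK_SIZE);
	return (0);
}
```

In this model, reading a hole leaves `buffers_created` untouched, which corresponds to the saving in page allocations and buffer recycling that the commit message describes.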
Matt Macy
f1401123c5 hwpmc/epoch - don't reference domain if NUMA is not set
It appears that domain information is set correctly independent
of whether or not NUMA is defined. However, there is no memory
backing secondary domains leading to allocation failure.

Reported by:	pho@, np@
Approved by:	sbruno@
2018-05-12 20:00:29 +00:00
Mark Johnston
5f05bda607 DTrace aarch64: Avoid calling unwind_frame() in the probe context.
unwind_frame() may be instrumented by FBT, leading to recursion into
dtrace_probe(). Manually inline unwind_frame() as we do with stack
unwinding code for other architectures.

Submitted by:	Domagoj Stolfa
Reviewed by:	manu
MFC after:	1 week
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D15359
2018-05-12 15:35:26 +00:00
Emmanuel Vadot
dfb8c122c9 aw_mmc: Rework regulator handling
Don't enable regulators on attach but deal with them on power_up/power_off.
Only set the voltage for the signaling regulator since I don't have boards
that can change the supply voltage.
Enable 1.8v signaling voltage.
2018-05-12 13:14:01 +00:00
Emmanuel Vadot
35a186191f aw_mmc: Do not fully init the controller in attach
Only do a reset of the controller at attach and init it at power_up.
We used to enable some interrupts in reset; now only enable the interrupts
we are interested in when doing a request.
While here remove the regulators handling in power_on as it is very wrong
and will be dealt with in another commit.

Tested on: A31, A64
2018-05-12 13:13:34 +00:00
Emmanuel Vadot
2445c37a24 aw_mmc: Remove hardware reset
From all the BSP (Board Source Package) source that I've looked at it seems
that it's never done, remove it.

Tested On: A31, A64
2018-05-12 13:12:59 +00:00
Emmanuel Vadot
a37d59c145 aw_mmc: Read interrupt register value before writing to it
Reported by: jmcneill
2018-05-12 13:12:26 +00:00
Konstantin Belousov
a9c53bbb24 Kernel entry from vm86 mode, where PCB_VM86CALL pcb flag is not set,
is executed on the right stack already.  No copy from the entry stack
to the kstack must be performed for vm86 bios call code to function.

To access the pcb flags on kernel entry, unconditionally switch to
kernel address space if vm86 mode is detected.

This fixes very early vm86 bios calls, typically done when boot is
performed by boot2 without loader, and kernel falls back to BIOS calls
to get SMAP.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
2018-05-12 11:06:59 +00:00
Konstantin Belousov
801bf88ce3 On return from exception or interrupt, returns to vm86 mode with
PCB_VM86CALL pcb flag not set should be treated same as return to
userspace.

Most important, the address space must be switched.  This fixes
usermode vm86 operations after the 4/4 split.

Sponsored by:	The FreeBSD Foundation
2018-05-12 11:02:39 +00:00
Konstantin Belousov
507e50d5f9 Initialize tramp_idleptd during cold pmap startup, before the
exception code is copied to the trampoline.

The correct value is then copied to trampoline automatically, so
tramp_idleptd_reloced can be eliminated.

This will allow the same exception entry code to be used to handle traps
from vm86 bios calls at the early boot stage as after the trampoline is
configured.

Sponsored by:	The FreeBSD Foundation
2018-05-12 10:57:34 +00:00
Konstantin Belousov
6652b9d9ea Create a macro for PIC code which loads %cr3 from tramp_idleptd.
Sponsored by:	The FreeBSD Foundation
2018-05-12 10:51:50 +00:00
Konstantin Belousov
2017ad1e81 Fix use of the custom TSS on i386 after the 4/4 split.
Record common_tssd, the descriptor to be written in GDT to point to
the common TSS, before LTR is executed.  The LTR instruction sets the
loaded descriptor type to 386 TSS busy, which traps on reloads.

Sponsored by:	The FreeBSD Foundation
2018-05-12 10:48:53 +00:00
Matt Macy
d626a614b9 hwpmc(9): clear remaining sample work for hardclock
- fix last minute change in 333509 whereby runcount references
  to a pmc would remain, causing us to pause-loop forever

Approved by:	sbruno
2018-05-12 03:45:30 +00:00
Warner Losh
794af7cfdc Remove extra copy of bcopy.c now that we're using the libkern version
of this file.
2018-05-12 01:43:32 +00:00
Matt Macy
e6b475e0af hwpmc(9): Make pmclog buffer pcpu and update constants
On non-trivial SMP systems the contention on the pmc_owner mutex leads
to a substantial number of samples captured being from the pmc process
itself. This change a) makes buffers larger to avoid contention on the
global list b) makes the working sample buffer per cpu.

Run pmcstat in the background (default event rate of 64k):
pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 &

Before:
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total

After:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total

For a more realistic overhead measurement, set the sample rate to ~2 kHz
on a 2.1 GHz processor:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &

Collecting 10 samples of `make -j96 buildkernel` from each:

x before
+ after

real time:
    N           Min           Max        Median           Avg        Stddev
x  10          76.4        127.62        84.845        88.577     15.100031
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        -28.398 +/- 10.0344
        -32.0602% +/- 7.69825%
        (Student's t, pooled s = 10.6794)

system time:
    N           Min           Max        Median           Avg        Stddev
x  10       2277.96       6948.53       2949.47      3341.492     1385.2677
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        -2277.47 +/- 920.425
        -68.1574% +/- 8.77623%
        (Student's t, pooled s = 979.596)

x no pmc
+ pmc running
real time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10          76.4        127.62        84.845        88.577     15.100031
Difference at 95.0% confidence
        29.73 +/- 10.0335
        50.5208% +/- 17.0525%
        (Student's t, pooled s = 10.6785)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        1.332 +/- 0.248939
        2.2635% +/- 0.426506%
        (Student's t, pooled s = 0.264942)

system time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10       2277.96       6948.53       2949.47      3341.492     1385.2677
Difference at 95.0% confidence
        2309.97 +/- 920.443
        223.937% +/- 89.3039%
        (Student's t, pooled s = 979.616)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        32.493 +/- 16.0042
        3.15% +/- 1.5794%
        (Student's t, pooled s = 17.0331)

Reviewed by:	jeff@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15155
2018-05-12 01:26:34 +00:00
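The percentage lines in the tables above follow directly from the reported averages. A minimal sketch of that arithmetic is below; `percent_change` is a hypothetical helper written for this illustration and is not part of ministat(1) or hwpmc(9).

```c
#include <assert.h>

/* Relative change of `after` versus `before`, in percent. */
static double
percent_change(double before, double after)
{
	assert(before != 0.0);
	return ((after - before) / before * 100.0);
}
```

With the real-time averages from the before/after table, `percent_change(88.577, 60.179)` is roughly -32.06, and with the system-time averages, `percent_change(3341.492, 1064.017)` is roughly -68.16, in line with the reported differences.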
Rick Macklem
5d4835e4b7 Add support for the TestStateID operation to the NFSv4.1 server.
The Linux client now uses the TestStateID operation, so this patch adds
support for it to the NFSv4.1 server. The FreeBSD client never uses this
operation, so it should not be affected.

MFC after:	2 months
2018-05-11 22:16:23 +00:00
Stephen Hurd
b69888c28f Fix LORs in in6?_leave_group()
r333175 updated the join_group functions, but not the leave_group ones.

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15393
2018-05-11 21:42:27 +00:00
Konstantin Belousov
0de8041c8e Remove dead declaration.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-05-11 20:47:45 +00:00
Matt Macy
09f6ff4f1a iflib(9): Add support for cloning pseudo interfaces
Part 3 of many ...
The VPC framework relies heavily on cloning pseudo interfaces
(vmnics, vpc switch, vcpswitch port, hostif, vxlan if, etc).

This pulls in that piece. Some ancillary changes get pulled
in as a side effect.

Reviewed by:	shurd@
Approved by:	sbruno@
Sponsored by:	Joyent, Inc.
Differential Revision:	https://reviews.freebsd.org/D15347
2018-05-11 20:08:28 +00:00
Matt Macy
8dcbd0eae6 epoch(9): always set inited in epoch_init
- set inited in the !usedomains case

Reported by:	jhibbits
Approved by:	sbruno
2018-05-11 18:37:14 +00:00
Sean Bruno
3096900d09 vxge(4): deprecation notice
This hardware isn't totally ancient, about equal to a mxge(4) or mlx4en(4),
but the company was sold to Exar which then promptly exited the Ethernet
business so the card was commercially available for under 2 years. On deep
search, the only usage of these cards I found was by the importing of the
driver. There are code quality issues identified by Brooks and Hiren and
no visible use nor maintainership that warrant removal from FreeBSD 12.0.

Submitted by:	kbowling
Reviewed by:	gnn brooks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15363
2018-05-11 17:26:59 +00:00
Andrey V. Elsukov
e287c474be Apply the change from r272770 to if_ipsec(4) interface.
It is guaranteed that if_ipsec(4) interface is used only for tunnel
mode IPsec, i.e. a decrypted and decapsulated packet has its own IP header.
Thus we can consider it a new packet and clear the protocol flags.
This allows ICMP/ICMPv6 to properly handle errors that this packet may cause.

PR:		228108
MFC after:	1 week
2018-05-11 16:50:25 +00:00
Kenneth D. Merry
e440863e06 Clear out the entire structure, not just the size of a pointer to it.
sys/dev/ocs/ocs_os.c:
	In ocs_thread_create(), use sizeof(*thread) (instead of
	sizeof(thread)) as the size argument to memset so that we clear
	out the entire thread structure instead of just a few bytes of it.

Submitted by:	jtl
MFC after:	3 days
2018-05-11 14:50:26 +00:00
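The sizeof mix-up fixed in e440863e06 is easy to reproduce outside the driver. The sketch below uses a hypothetical `struct demo_thread` and `demo_thread_clear()` (neither is the ocs(4) code) to show the corrected form.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the driver's thread structure. */
struct demo_thread {
	void *handle;
	char name[64];
	int started;
};

static void
demo_thread_clear(struct demo_thread *thread)
{
	assert(thread != NULL);
	/*
	 * sizeof(thread) would be the size of the pointer (typically 4 or
	 * 8 bytes); sizeof(*thread) is the size of the structure it points
	 * to, so the whole structure is cleared.
	 */
	memset(thread, 0, sizeof(*thread));
}
```

Had the call used `sizeof(thread)`, only the first few bytes would have been zeroed and the tail of `name` and `started` would keep whatever values they held before.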
Ed Maste
d2a80426b3 usbdevs: add new Microchip USB-Ethernet device IDs
LAN7800 USB 3.1 to 10/100/1000 Ethernet with PHY
LAN7801 USB 3.1 to 10/100/1000 Ethernet with RGMII interface

Also update manufacturer name for the Vendor ID.  Microchip acquired
SMSC in May 2012.

Sponsored by:	The FreeBSD Foundation
2018-05-11 13:09:21 +00:00
Mateusz Guzik
726f22e081 amd64: align the .data.exclusive_cache_line section to 128
This aligns the section itself compared to other sections, does not change
internal alignment of fields stored inside. This may or may not come later.

The motivation is partially combating adverse effects of the adjacent cache
line prefetcher. Without the annotation part of read_mostly section was on
the line of fire.
2018-05-11 08:56:39 +00:00
Matt Macy
4aa302dfc9 epoch(9): callback task fixes
- initialize the pcpu STAILQ in the NUMA case
- don't enqueue the callback task if there isn't sufficient work to be done

Reported by:	pho@
Approved by:	sbruno@
2018-05-11 08:16:56 +00:00
Mateusz Guzik
782e38aa48 uma: increase alignment to 128 bytes on amd64
Current UMA internals are not suited for efficient operation in
multi-socket environments. In particular there is very common use of
MAXCPU arrays and other fields which are not always properly aligned and
are not local for target threads (apart from the first node of course).
Turns out the existing UMA_ALIGN macro can be used to mostly work around
the problem until the code gets fixed. The current setting of 64 bytes
runs into trouble when adjacent cache line prefetcher gets to work.

An example 128-way benchmark doing a lot of malloc/frees has the following
instruction samples:

before:
kernel`lf_advlockasync+0x43b            32940
          kernel`malloc+0xe5            42380
           kernel`bzero+0x19            47798
   kernel`spinlock_exit+0x26            60423
         kernel`0xffffffff80            78238
                         0x0           136947
   kernel`uma_zfree_arg+0x46           159594
 kernel`uma_zalloc_arg+0x672           180556
   kernel`uma_zfree_arg+0x2a           459923
 kernel`uma_zalloc_arg+0x5ec           489910

after:
            kernel`bzero+0xd            46115
kernel`lf_advlockasync+0x25f            46134
kernel`lf_advlockasync+0x38a            49078
   kernel`fget_unlocked+0xd1            49942
kernel`lf_advlockasync+0x43b            55392
          kernel`copyin+0x4a            56963
           kernel`bzero+0x19            81983
   kernel`spinlock_exit+0x26            91889
         kernel`0xffffffff80           136357
                         0x0           239424

See the review for more details.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D15346
2018-05-11 07:04:57 +00:00
Mateusz Guzik
85c1b3c1cb rmlock: partially depessimize lock/unlock fastpath
Previously the slow path was folded in and partially jumped over in the
common case.
2018-05-11 06:59:54 +00:00
Matt Macy
5c30b378f0 Allow different bridge types to coexist
if_bridge has a lot of limitations that make it scale poorly to higher data
rates. In my projects/VPC branch I leverage the bridge interface between
layers for my high speed soft switch as well as for purposes of stacking
in general.

Reviewed by:	sbruno@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15344
2018-05-11 05:00:40 +00:00
Matt Macy
b2cb28963b epoch(9): fix priority handling, make callback lists pcpu, and other fixes
- Lend priority to preempted threads in epoch_wait to handle the case
  in which we've had priority lent to us. Previously we borrowed the
  priority of the lowest priority preempted thread. (pointed out by mjg@)

- Don't attempt to allocate memory per-domain on powerpc; we don't currently
  handle empty sockets (as is the case on jhibbits Talos' board).

- Handle deferred callbacks as pcpu lists and poll the lists periodically.
  Currently the interval is 1/hz.

- Drop the thread lock when adaptive spinning. Holding the lock starves
  other threads and can even lead to lockups.

- Keep a generation count pcpu so that we don't keep spinning if a thread
  has left and re-entered an epoch section.

- Actually removed the callback from the callback list so that we don't
  double free. Sigh ...

Approved by:	sbruno@
2018-05-11 04:54:12 +00:00
Matt Macy
ef7f29d8e6 Test priority handling in epoch test.
- Double the number of test threads to mp_ncpu*2
- Give each thread a different scheduling priority
2018-05-11 04:47:05 +00:00
Justin Hibbits
04de51dbab No need to bzero splpar_vpa entries
splpar_vpa is in the BSS, so is already zeroed when the kernel starts up.

Tested by:	Leandro Lupori
2018-05-11 02:04:01 +00:00
Dag-Erling Smørgrav
20f8d7bc7e Slight cleanup of interface event logging.
Make if_printf() use vlog() instead of vprintf().  This means it can no
longer return the number of characters printed, as it used to, but every
single call to if_printf() in the entire kernel ignores the return value
anyway; just return 0 so we don't have to change the prototype.

Consistently use if_printf() throughout sys/net/if.c, instead of a
mixture of if_printf() and log().

In ifa_maintain_loopback_route(), don't needlessly log an error if we
either failed to add a route because it already existed or failed to
remove one because it did not.  We still return an error code, though.

MFC after:	1 week
2018-05-11 00:19:49 +00:00
Dag-Erling Smørgrav
6bff85ff9a Reduce <sys/queue.h> pollution.
While <sys/sysctl.h> includes <sys/queue.h> unconditionally, it is only
actually used in code which is conditional on _KERNEL.  Make the #include
itself conditional as well, and fix userland code that uses <sys/queue.h>
for other purposes but relied on <sys/sysctl.h> to bring it in.

MFC after:	1 week
2018-05-11 00:01:43 +00:00
Navdeep Parhar
f348cdad1a cxgbe(4): Add fields to support configuration of hardware NAT and
swapmac (SMAC/DMAC switcheroo) from userspace.

Sponsored by:	Chelsio Communications
2018-05-10 20:39:04 +00:00
Ed Maste
ff8f1e8332 Error out on attempt to link amd64 kernel with old binutils linker
As of r333461 we require ifunc support to link a working amd64 kernel.
The default in-tree bootstrap linker is lld and it has the required
support, as does any modern out-of-tree binutils linker.  The in-tree
GNU ld is from binutils 2.17.50 and it does not have ifunc support,
so produce an error rather than a broken kernel.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D15378
2018-05-10 20:10:02 +00:00
Matt Macy
7bf272a612 Allocate epoch for networking at startup
Additionally add CK to include paths for modules

Approved by:	sbruno@
2018-05-10 19:13:00 +00:00
Matt Macy
06bf2a6aef Add simple preempt safe epoch API
Read locking is overused in the kernel to guarantee liveness. This API makes
it easy to provide liveness guarantees without atomics.

Includes epoch_test kernel module to stress test the API.

Documentation will follow initial use case.

Test case and improvements to preemption handling in response to discussion
with mjg@

Reviewed by:	imp@, shurd@
Approved by:	sbruno@
2018-05-10 17:55:24 +00:00
Li-Wen Hsu
137c41d763 Fix build for platforms using GCC:
- Remove unused or dead store variable
- Remove unused function ctl_copyin_alloc
- Add missing curly brackets, this seems a regression in r287720

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D15383
2018-05-10 17:22:04 +00:00
Jean-Sébastien Pédron
5e251aec86 vt(4): Use default VGA palette
Before this change, the VGA palette was configured to match the shell
palette (e.g. color #1 was red). There was one glitch early in boot when
the vt(4)'s VGA palette was loaded: the loader's logo would switch from
red to blue. Likewise for the "Booting..." message switching from blue
to red. That's because the loader's logo was drawn with the default VGA
palette where a few colors are swapped compared to the shell palette
(e.g. blue <-> red).

This change configures the default VGA palette during initialization and
converts input's colors from shell to VGA palette index.

There should be no visible changes, except the loader's logo which will
keep its original color.

Reviewed by:	eadler
2018-05-10 17:00:33 +00:00
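The shell-to-VGA index conversion mentioned in 5e251aec86 can be sketched as a bit swap. This assumes the usual orderings (shell/ANSI: bit 0 = red, bit 2 = blue; VGA: bit 0 = blue, bit 2 = red); `shell_to_vga_index` is a hypothetical name, and the actual vt(4) code may well use a lookup table instead.

```c
#include <assert.h>

/*
 * Convert a 4-bit shell (ANSI-order) color index to the default VGA order.
 * The two orderings differ only in which of bit 0 and bit 2 means red and
 * which means blue, so swapping those bits maps one index to the other.
 * Bit 1 (green) and bit 3 (intensity) are unchanged.
 */
static unsigned
shell_to_vga_index(unsigned c)
{
	assert(c < 16);
	return ((c & 0xa) | ((c & 0x1) << 2) | ((c & 0x4) >> 2));
}
```

Under these assumptions, shell color 1 (red) becomes VGA index 4 and shell color 4 (blue) becomes VGA index 1, which is the red/blue swap the commit message refers to.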
Jean-Sébastien Pédron
e525438697 vt(4): Put for() loop outside switch() in vt_generate_cons_palette()
This makes it more logical:
 1. It checks the requested color format
 2. It fills the palette accordingly

Also vt_palette_init() is only called when needed (i.e. when the format
is `COLOR_FORMAT_RGB`).
2018-05-10 16:41:47 +00:00
Andrew Gallatin
66fe09d8d9 Fix a panic in the IPv6 multicast code.
Use LIST_FOREACH_SAFE in in6m_disconnect() since we're
deleting and freeing item from the membership list
while traversing the list.

Reviewed by:	mmacy
Sponsored by:	Netflix
2018-05-10 16:19:41 +00:00
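The pattern behind 66fe09d8d9, removing and freeing list entries while walking the list, can be shown with a userspace sketch. `struct member` and `member_list_drain()` are hypothetical and are not the kernel's in6m structures; the fallback macro definition is only there for systems whose <sys/queue.h> lacks the _SAFE variant.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Some non-BSD <sys/queue.h> versions lack the _SAFE variant. */
#ifndef LIST_FOREACH_SAFE
#define LIST_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = (head)->lh_first;					\
	    (var) != NULL && ((tvar) = (var)->field.le_next, 1);	\
	    (var) = (tvar))
#endif

/* Hypothetical membership record, not the kernel's structure. */
struct member {
	int id;
	LIST_ENTRY(member) link;
};
LIST_HEAD(member_list, member);

/* Remove and free every entry; returns how many were freed. */
static int
member_list_drain(struct member_list *head)
{
	struct member *m, *tmp;
	int freed = 0;

	assert(head != NULL);
	/*
	 * The next pointer is saved in tmp before the body runs, so
	 * freeing m does not break the traversal. Plain LIST_FOREACH
	 * would read m's link after free().
	 */
	LIST_FOREACH_SAFE(m, head, link, tmp) {
		LIST_REMOVE(m, link);
		free(m);
		freed++;
	}
	return (freed);
}
```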
Konstantin Belousov
8b4fc8b11c Make fpusave() and fpurestore() on amd64 ifuncs.
From now on, linking amd64 kernel requires either lld or newer ld.bfd.

Reviewed by:	jhb (as part of the large patch)
Discussed with:	emaste
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13838
2018-05-10 15:01:43 +00:00
Andrew Gallatin
d5cdcc3a06 Fix the build after r333457
In r333457, the arguments to kern_pwritev() were accidentally
re-ordered as part of ANSIfication, breaking the build.
2018-05-10 13:19:42 +00:00
Ed Maste
cc3c9df80f ANSIfy sys_generic.c
2018-05-10 11:36:16 +00:00
Marcin Wojtas
2339f28c6e Do not pass header length to the ENA controller
Header length is an optional hint for the ENA device. Because it is not
guaranteed that every packet header will be in the first mbuf
segment, it is better to skip passing any information. If the header
length indicates an invalid value (other than 0), then the
packet will be dropped.

This kind of situation can arise when a UDP packet is fragmented
by the stack in the ip_fragment() function.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reported by:  Krishna Yenduri <kyenduri@brkt.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
2018-05-10 09:37:54 +00:00
Emmanuel Vadot
43fd679efb arm64: Add ALT_BREAK_TO_DEBUGGER to GENERIC
It is useful to enter kdb with an escape sequence.
While here, move USB_DEBUG with the other debug options and define
nooptions USB_DEBUG for GENERIC-NODEBUG
2018-05-10 09:37:50 +00:00
Marcin Wojtas
dbf2eb543b Skip setting the MTU for ENA if it is not changing
On AWS, a network interface can get reinitialized every 30 minutes due
to the MTU being (re)set when a new DHCP lease is obtained. This can
cause packet drop, along with annoying syslog messages.

Skip setting the MTU in the ena driver if the new MTU is the same as the
old MTU. Note this fix is already in the netfront driver.

Testing: Verified ena up/down messages do not appear every 30 min in
/var/log/messages with the fix in place.

Submitted by:   Krishna Yenduri <kyenduri@brkt.com>
Reviewed by: Michal Krawczyk <mk@semihalf.com>
2018-05-10 09:32:59 +00:00
Marcin Wojtas
6461d6a396 Apply fixes in ena-com
* Change ena-com BIT macro to work on unsigned value.
  To make the shifting operations safer, they should be working on
  unsigned values.

* Fix a mutex not owned ASSERT panic in ENA control path.
  A thread calling cv_broadcast()/cv_signal() must hold the mutex used for
  cv_wait(). Fix the ENA control path code that has this problem.

Submitted by:   Krishna Yenduri <kyenduri@brkt.com>
Reviewed by:    Michal Krawczyk <mk@semihalf.com>
Tested by:      Michal Krawczyk <mk@semihalf.com>
2018-05-10 09:25:51 +00:00
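The first item in 6461d6a396, making the BIT macro work on an unsigned value, can be illustrated as follows. `DEMO_BIT` and `demo_bit_is_set()` are hypothetical names, not the ena-com definitions, and the sketch assumes a 32-bit `unsigned int`. With a signed `1 << 31`, the shift overflows into the sign bit, which is undefined behavior in C when `int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Shift an unsigned 1 so that setting bit 31 is well defined. */
#define DEMO_BIT(x)	(1U << (x))

/* Return nonzero if bit n (0..31) is set in flags. */
static int
demo_bit_is_set(uint32_t flags, unsigned n)
{
	assert(n < 32);
	return ((flags & DEMO_BIT(n)) != 0);
}
```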
Marcin Wojtas
fbb0ed71b2 Upgrade ENA version to v0.8.1
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
2018-05-10 09:06:21 +00:00
Xin LI
b6f7731dba Remove "All rights reserved" from my files.
See r333391 for the rationale.

MFC after:	1 week
2018-05-10 06:41:08 +00:00
Navdeep Parhar
f7a203bc21 cxgbe(4): Disable write-combined doorbells by default.
This had been the default behavior but was changed accidentally as part
of the recent iw_cxgbe+OFED overhaul.  Fix another bug in that change
while here: the global knob affects all the adapters in the system and
should be left alone by per-adapter code.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2018-05-10 06:33:54 +00:00
Justin Hibbits
b4a0a59871 Fix PPC symbol resolution
Summary:
There were 2 issues that were preventing correct symbol resolution
on PowerPC/pseries:

1- memory corruption at chrp_attach() - this caused the initial
   part of the symbol table to become zeroed, which would cause
   the kernel linker to fail to parse it.
   (this was probably zeroing out other memory parts as well)

2- DDB symbol resolution wasn't working because symtab contained
   not relocated addresses but it was given relocated offsets.
   Although relocating the symbol table fixed this, it broke the
   linker, that already handled this case.
   Thus, the fix for this consists in adding a new DDB macro:
   DB_STOFFS(offs) that converts a (potentially) relocated offset
   into one that can be compared with symbol table values.

PR:		227093
Submitted by:	Leandro Lupori <leandro.lupori_gmail.com>
Differential Revision: https://reviews.freebsd.org/D15372
2018-05-10 03:59:48 +00:00
Marcelo Araujo
8951f05525 Rework CTL frontend & backend options to use nv(3), allow creating multiple
ioctl frontend ports.

This revision introduces two changes to CTL:
- Changes the way options are passed to CTL_LUN_REQ and CTL_PORT_REQ ioctls.
  Removes ctl_be_arg structure and associated logic and replaces it with
  nv(3)-based logic for passing in and out arguments.
- Allows creating multiple ioctl frontend ports using either ctladm(8) or
  ctld(8).
  New frontend ports are represented by /dev/cam/ctl<pp>.<vp> nodes, eg /dev/cam/ctl5.3.
  Those device nodes respond only to CTL_IO ioctl.

New command-line options for ctladm:
# creates new ioctl frontend port using a free pp and vp=0
ctladm port -c
# creates new ioctl frontend port with pp=10 and vp=0
ctladm port -c -O pp=10
# creates new ioctl frontend port with pp=11 and vp=12
ctladm port -c -O pp=11 -O vp=12
# removes port with number 4 (it's a "targ_port" number, not pp number)
ctladm port -r -p 4

New syntax for ctl.conf:
target ... {
    port ioctl/<pp>
    ...
}

target ... {
    port ioctl/<pp>/<vp>
    ...
}

Note: Most of this work was made by jceel@, thank you.

Submitted by:	jceel
Reworked by:	myself
Reviewed by:	mav (earlier versions and recently during the rework)
Obtained from:  FreeNAS and TrueOS
Relnotes:	Yes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D9299
2018-05-10 03:50:20 +00:00
Warner Losh
3429b518c9 Remove unused bcopyb.
Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:54 +00:00
Warner Losh
baaa3c4d60 Simplify things a little
Rather than include a copy for memmove to call bcopy to call memcpy
(which handles overlapping copies), make memmove a strong reference to
memcpy to save the two calls.

Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:48 +00:00
Warner Losh
5aa07b053a Move MI-ish bcopy routine to libkern
riscv and powerpc have nearly identical bcopy.c that's
supposed to be mostly MI. Move it to the MI libkern.

Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:38 +00:00
Navdeep Parhar
5174205de5 cxgbe(4): Determine whether the firmware supports the FILTER2 work
request, which can be used to configure hardware NAT and swapmac.

All firmwares released after Jan 2017 support this work request.

Sponsored by:	Chelsio Communications
2018-05-10 00:04:14 +00:00
Mark Johnston
e3d5c4ade1 Remove "All rights reserved" from my files.
See r333391 for the rationale.

MFC after:	1 week
2018-05-09 20:57:18 +00:00
Mariusz Zaborski
31f7586d73 Introduce the 'n' flag for the geli attach command.
If the 'n' flag is given, the provided key number will be used to
decrypt the device. This can be combined with dryrun to verify that the key
is set correctly. It can also be used to determine which key slot we want to
change on an already attached device.

Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D15309
2018-05-09 20:53:38 +00:00
Warner Losh
041f49aece Remove the 'All Rights Reserved' clause from some of the stuff I've
done for Netflix, since I'm in the neighborhood.
2018-05-09 20:32:23 +00:00
Warner Losh
33123867af Use the full year, for real this time.
2018-05-09 20:26:37 +00:00
Mark Johnston
b4fa90d6f9 Fix bxe(4) netdump rx polling.
Reviewed by:	cem, rstone
X-MFC with:	r333287
Sponsored by:	Dell EMC Isilon
2018-05-09 19:54:34 +00:00
Cy Schubert
4273f67609 Fix style error introduced in r333393.
Reported by:	jhb, imp, phk
MFC after:	6 days
X-MFC with:	r333393
2018-05-09 19:05:27 +00:00
Matt Macy
36688f706e Add taskqgroup_config_gtask_deinit to support teardown after
taskqgroup_config_gtask_init.

Approved by:	sbruno
2018-05-09 18:51:35 +00:00
Matt Macy
cbd92ce62e Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
Matt Macy
ca9551221b Remove bogus panic
r333345 added a panic to the default case statement on the incorrect
premise that it should "never happen" when in fact it is simply a
different adapter version.

Reported by:	markj
Approved by:	sbruno
2018-05-09 17:48:52 +00:00
Kyle Evans
f0fb94abca Standardize SPDX tag on files I've added
2018-05-09 16:52:28 +00:00
Kyle Evans
4b3c64f722 Remove "All Rights Reserved" on files that I hold sole copyright on
See r333391 for more detail; in summary: it holds no weight and may be
removed.
2018-05-09 16:44:19 +00:00
John Baldwin
485415ec47 Report TRAP_BRKPT for breakpoint traps on sparc64.
Reviewed by:	marius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D15190
2018-05-09 15:25:26 +00:00
Mateusz Guzik
20ca271fdd amd64: depessimize bcmp for small buffers
Adapt assembly generated by clang for memcmp and use it for <= 64 sized
compares (which are the vast majority).

Sample result of doing stats on Broadwell (% of samples):
before: 4.0 kernel     bcmp                 cache_lookup
after : 0.7 kernel     bcmp                 cache_lookup

The routine is most definitely still not optimal. Anyone interested in
spending time improving it is welcome to take over.

Reviewed by:	kib
2018-05-09 15:16:25 +00:00
Konstantin Belousov
55c9d75e6b Avoid calls to bzero() before ireloc.
Evaluate cpu_stdext_feature early so that the moved link_elf_ireloc() sees
the correct flags, most importantly SMAP.

Tested by:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D15367
2018-05-09 14:39:24 +00:00
Warner Losh
603bbd0631 Minor style nits
Use full copyright year.
Remove 'All Rights Reserved' from new file (rights holder OK'd)
Minor #ifdef motion and #endif tagging
Remove __FBSDID macro from comments

Sponsored by: Netflix
OK'd by: rrs@
2018-05-09 14:11:35 +00:00
Konstantin Belousov
71d1bbce91 Remove PG_U from the rest of the kernel pmap ptes.
Supposedly, the PG_U bits there were set to make it easier to expose some
kernel pages to userspace in-place.  Since it was not used for the
whole existence of the amd64 pmap.c and current design of the shared
pages prefers double-mapping over the in-place access, remove PG_U
both from the direct map and KVA slots.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-05-09 12:09:08 +00:00
Konstantin Belousov
5aaa5bc3d6 Remove PG_U from the recursive pte for kernel pmap's PML4 page.
This PML4 page is never used for the userspace process, so there are no
security implications.  But the configuration trips the SMAP check, which
should be corrected.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-05-09 12:03:40 +00:00
Andrey V. Elsukov
782360dec3 Bring in some last changes in NAT64 implementation:
o Modify ipfw(8) to be able to set any prefix6 not just Well-Known,
  and also show configured prefix6;
o relocate some definitions and macros into proper place;
o convert nat64_debug and nat64_allow_private variables to be
  VNET-compatible;
o add struct nat64_config that keeps generic configuration needed
  to NAT64 code;
o add nat64_check_prefix6() function to check the validity of the
  user-specified IPv6 prefix according to RFC6052;
o use nat64_check_private_ip4() and nat64_embed_ip4() functions
  instead of nat64_get_ip4() and nat64_set_ip4() macros. This allows
  the use of any configured IPv6 prefixes that are allowed by RFC6052;
o introduce NAT64_WKPFX flag, that is set when IPv6 prefix is
  Well-Known IPv6 prefix. It is used to reduce overhead to check this;
o modify nat64lsn_cfg and nat64stl_cfg structures to use nat64_config
  structure. And respectively modify the rest of code;
o remove now unused ro argument from nat64_output() function;
o remove __FreeBSD_version ifdef, NAT64 was not merged to older versions;
o add commented -DIPFIREWALL_NAT64_DIRECT_OUTPUT flag to module's Makefile
  as example.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2018-05-09 11:59:24 +00:00
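The RFC6052 embedding that commit 782360dec3 refers to can be sketched as follows. This is a minimal illustration, not the kernel's nat64_embed_ip4(): the function name embed_ip4() and its signature are hypothetical, and only the address layout (IPv4 bytes written after the prefix, skipping the reserved "u" octet at byte 8) is taken from RFC 6052.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel's nat64_embed_ip4()): write the four
 * bytes of an IPv4 address into a 16-byte IPv6 address after a prefix of
 * plen bits, using the RFC 6052 layout.  Byte 8 (the "u" octet) is skipped
 * and left as zero.  Returns 0 on success, -1 on an unsupported plen.
 */
static int
embed_ip4(uint8_t ip6[16], int plen, const uint8_t ip4[4])
{
	int i, pos;

	assert(ip6 != NULL && ip4 != NULL);
	switch (plen) {
	case 32: case 40: case 48: case 56: case 64: case 96:
		break;
	default:
		return (-1);
	}
	pos = plen / 8;
	/* Clear everything after the prefix, including the suffix bytes. */
	memset(ip6 + pos, 0, 16 - pos);
	for (i = 0; i < 4; i++) {
		if (pos == 8)
			pos++;	/* skip the reserved "u" octet */
		ip6[pos++] = ip4[i];
	}
	return (0);
}
```

With the Well-Known prefix 64:ff9b::/96 the IPv4 address simply occupies the last four bytes; with a /64 prefix it lands in bytes 9 through 12.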
Andrey V. Elsukov
2e4531a12b Add IFCAP_LINKSTATE support to if_loop(4).
Reviewed by:	wollman
Obtained from:	Yandex LLC
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D15278
2018-05-09 10:50:51 +00:00
Hans Petter Selasky
c20feee43b Add myself to copyright in the LinuxKPI RCU support layer.
Suggested by:	mmacy@
Sponsored by:	Mellanox Technologies
2018-05-09 08:50:42 +00:00
Navdeep Parhar
89f651e704 cxgbe(4): Add support for hash filters.
These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool.  Hash and
normal TCAM filters can be used together.  The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters.  Any T5/T6 card with memory can support at
least half a million hash filters.  The sample config file with the
driver configures 512K of these; it is possible to double this to 1
million+ in some cases.

The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file.  The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto).  It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).

For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan

Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.

Hash filters support all actions supported by TCAM filters.  A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).

Sponsored by:	Chelsio Communications
2018-05-09 04:09:49 +00:00
Cy Schubert
bb7af25076 Document intentional fallthrough. (CID 976535)
MFC after:	1 week
2018-05-09 02:07:09 +00:00
Cy Schubert
8d3478a26f Fix memory leak. (CID 1199373).
MFC after:	1 week
2018-05-09 02:02:58 +00:00
Matt Macy
ad738f3791 Reduce overhead of ktrace checks in the common case.
KTRPOINT() checks both if we are tracing _and_ if we are recursing within
ktrace. The second condition is only ever executed if ktrace is actually
enabled. This change moves the check out of the hot path into the functions
themselves.

Discussed with mjg@

Reported by:	mjg@
Approved by:	sbruno@
2018-05-09 00:00:47 +00:00
Sean Bruno
57b4936514 nxge(4):
Remove nxge(4) and associated man page and tools in FreeBSD 12.0.

Submitted by:	kbowling
Reviewed by:	brooks
Relnotes:	yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D1529
2018-05-08 21:14:29 +00:00
Michael Tuexen
45d41de5e6 Fix two typos reported by N. J. Mann, which were introduced in
https://svnweb.freebsd.org/changeset/base/333382 by me.

MFC after:	3 days
2018-05-08 20:39:35 +00:00
Michael Tuexen
9669e724d1 When reporting ERROR or ABORT chunks, don't use more data
than is guaranteed to be contiguous.
Thanks to Felix Weinrank for finding and reporting this bug
by fuzzing the usrsctp stack.

MFC after:	3 days
2018-05-08 18:48:51 +00:00
Jung-uk Kim
e7dfa7d8ab MFV: r333378
Import ACPICA 20180508.
2018-05-08 18:18:27 +00:00
Stephen Hurd
ac88e6da11 iflib: print message when iflib_tx_structures_setup fails
Print a message when iflib_tx_structures_setup fails, like we do for
iflib_rx_structures_setup.

Now that we always print a message from within
iflib_qset_structures_setup when it fails, stop printing one in
iflib_device_register() at the call site.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15300
2018-05-08 17:15:10 +00:00
Konstantin Belousov
053641bb1c Prepare DB# handler for deferred trigger of watchpoints.
Since pop %ss/mov %ss instructions defer all interrupts and exceptions
for the next instruction, it is possible that the userspace watchpoint
trap executes on the first instruction of the kernel entry for
syscall/bpt.

In this case, DB# should be treated similarly to NMI: on amd64 we must
always load GSBASE even if the trap comes from kernel mode, and load
the kernel page table root into %cr3.  Moreover, the trap must
use the dedicated stack, because we are still on the user stack when
trapped on syscall entry.

For i386, we must reload %cr3.  The syscall instruction is not configured,
so there is no issue with executing on user stack when trapping.

Due to some CPU errata it is not always possible to detect that the
userspace watchpoint triggered by inspecting %dr6.  In trap(), compare the
trap %rip with the known unsafe entry points and if matched pretend that
the watchpoint did not fire at all.

Thank you to the MSRC Incident Response Team, and in particular Greg
Lenti and Nate Warfield, for coordinating the response to this issue
across multiple vendors.

Thanks to Computer Recycling at The Working Center of Kitchener for
making hardware available to allow us to test the patch on additional
CPU families.

Reviewed by:	jhb
Discussed with:	Matthew Dillon
Tested by:	emaste
Sponsored by:	The FreeBSD Foundation
Security:	CVE-2018-8897
Security:	FreeBSD-SA-18:06.debugreg
2018-05-08 17:00:34 +00:00
Stephen Hurd
6108c01395 iflib: cleanup queues when iflib_device_register fails
Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15299
2018-05-08 16:56:02 +00:00
Justin Hibbits
151c44e22b Fix wrong cpu0 identification
Summary:
chrp_cpuref_init() was relying on the bootstrap processor to be
the first child of /cpus. That was not always the case, especially
on pseries with FDT.

This change uses the "reg" property of each CPU instead and also
adds several sanity checks to avoid unexpected behavior (maybe
too many panics?).

The main observed symptom was interrupts being missed by the main
processor, leading to timeouts and the kernel aborting the boot.

Submitted by:	Leandro Lupori
Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15174
2018-05-08 13:23:39 +00:00
Hans Petter Selasky
306cf294b2 Fix for missing network interface address event when adding the default IPv6
based link-local address.

The default link local address for IPv6 is added as part of bringing the
network interface up. Move the call to "EVENTHANDLER_INVOKE(ifaddr_event,)"
from the SIOCAIFADDR_IN6 ioctl(2) handler to in6_notify_ifa() which should
catch all the cases of adding IPv6 based addresses to a network interface.
Add a witness warning in case the event handler is not allowed to sleep.

Reviewed by:	network (ae), kib
Differential Revision:	https://reviews.freebsd.org/D13407
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-05-08 11:39:01 +00:00
Matt Macy
10d20c84ed Fix spurious retransmit recovery on low latency networks
TCP's smoothed RTT (SRTT) can be much larger than an actual observed RTT. This can be either because of hz restricting the calculable RTT to 10ms in VMs or 1ms using the default 1000hz or simply because SRTT recently incorporated a larger value.

If an ACK arrives before the calculated badrxtwin (now + SRTT):
tp->t_badrxtwin = ticks + (tp->t_srtt >> (TCP_RTT_SHIFT + 1));

We'll erroneously reset snd_una to snd_max. If multiple segments were dropped and this happens repeatedly the transmit rate will be limited to 1MSS per RTO until we've retransmitted all drops.

Reported by:	rstone
Reviewed by:	hiren, transport
Approved by:	sbruno
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8556
2018-05-08 02:22:34 +00:00
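The window arithmetic quoted in 10d20c84ed can be worked through in isolation. The sketch below assumes that t_srtt is stored scaled by 2^TCP_RTT_SHIFT and that TCP_RTT_SHIFT is 5, in which case the extra shift yields half of the smoothed RTT in ticks; both are assumptions here, and the helper names are hypothetical. It shows the computation that the commit message describes as problematic, not the fix itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	RTT_SHIFT	5	/* assumed value of TCP_RTT_SHIFT */

/* End of the bad-retransmit window in ticks, as in the quoted line. */
static uint32_t
badrxtwin(uint32_t ticks, uint32_t t_srtt)
{
	return (ticks + (t_srtt >> (RTT_SHIFT + 1)));
}

/* True while 'now' is before the window end; safe across tick wraparound. */
static bool
ack_in_window(uint32_t now, uint32_t win)
{
	return ((int32_t)(now - win) < 0);
}
```

For example, with a scaled SRTT of 40 ticks (t_srtt = 40 << 5) and a retransmit at tick 1000, the window ends at tick 1020. On a network where the real RTT is a tick or two, a legitimate ACK for the retransmission arrives well inside that window and is mistaken for an ACK of the original transmission.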
Matt Macy
d5210708dd Sleep rather than spin in e1000 when doing long running config operations.
With r333218 it is now possible for drivers to use an sx lock and thus sleep while
waiting on long running operations rather than DELAY().

Reported by:	gallatin
Reviewed by:	sbruno
Approved by:	sbruno
MFC after:	1 month
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14984
2018-05-08 01:39:45 +00:00
Mateusz Guzik
2824088536 Inlined sched_userret.
The tested condition is rarely true and it induces a function call
on each return to userspace.

Bumps getuid rate by about 1% on Broadwell.
2018-05-07 23:36:16 +00:00
Mateusz Guzik
75e9b455a9 Change trap_enotcap to bool and annotate with __read_frequently
It is read on each return to user space.
2018-05-07 23:10:12 +00:00
Mateusz Guzik
79ca7cbf09 Avoid calls to syscall_thread_enter/exit for statically defined syscalls
The entire mechanism is rarely used and performs poorly due to
atomic ops on the syscall table. It also adds overhead for completely
unrelated syscalls.

Reduce it by avoiding the function calls where possible (which constitutes
the vast majority of cases).

Provides about 3% syscall rate speed up for getuid on Broadwell.
2018-05-07 22:29:32 +00:00
Mateusz Guzik
a9456603f2 amd64: stop asserting params != NULL in the syscall path
The parameter is effectively controllable by userspace. It does not matter
what it is set to as it is being passed to copyin - worst case the operation
will just fail.

While here stop computing it unless it is going to be used.

Noted by:	dillon@backplane.com
2018-05-07 21:32:08 +00:00
Warner Losh
b425e3fba2 Put the CPU starting on one line. 2018-05-07 21:09:21 +00:00
Warner Losh
43d9cb5b74 Use device_quiet_children to silence verbose CPU probe messages.
Have cpu0 be noisy, but all the other CPU devices be quiet on boot.
2018-05-07 21:09:17 +00:00
Warner Losh
ad7142757b Add device_quiet_children() and device_has_quiet_children()
If you add a child to a device that has quiet children, we'll
automatically set the quiet flag on the children, and its
children.

This is intended for things like CPU that have a large amount of
repetition in booting that adds nothing.
2018-05-07 21:09:08 +00:00
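The inheritance rule described in ad7142757b (a child added under a device with quiet children becomes quiet, and so do its own children) can be sketched with a toy device tree. struct dev and the dev_*() functions below are hypothetical stand-ins, not the newbus implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device node; not the kernel's device_t. */
struct dev {
	struct dev *parent;
	bool quiet;		/* suppress this device's probe messages */
	bool quiet_children;	/* newly added children become quiet */
};

/* Mark a device so that children added later are quiet. */
static void
dev_quiet_children(struct dev *d)
{
	assert(d != NULL);
	d->quiet_children = true;
}

/* Attach a child; the quiet setting propagates down the tree. */
static void
dev_add_child(struct dev *parent, struct dev *child)
{
	assert(parent != NULL && child != NULL);
	child->parent = parent;
	if (parent->quiet_children) {
		child->quiet = true;
		child->quiet_children = true;
	}
}
```

The parent itself stays noisy in this sketch; only devices added beneath it after the call are silenced.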
Mateusz Guzik
bed34b0b04 amd64: fix up memset added in r333324
There was a missing trick expanding the passed pattern to a full word
by multiplication. As a side effect non-zero patterns would be
incorrectly laid down.

This stems from the use of rep stosq which is word-sized, while the passed
argument is byte-sized.

I initially repurposed memcpy into memset without taking this into account.
All but non-bzero testing was performed with a variant utilizing ERMS, i.e.
using only stosb which happens to not run into the problem whatsoever. So my bad
twice.

Thanks to Oliver Pinter for noting the problem and providing a testcase.
2018-05-07 20:54:42 +00:00
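The multiplication trick mentioned in bed34b0b04 can be shown in portable C. This is a sketch of the idea only, not the amd64 assembly from the commit; expand_byte() and fill_words() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Replicate the low byte of c into all eight bytes of a 64-bit word.
 * Multiplying by 0x0101010101010101 copies the byte into each byte lane;
 * no carries occur because the byte value is below 256.
 */
static uint64_t
expand_byte(int c)
{
	return ((uint64_t)(uint8_t)c * 0x0101010101010101ULL);
}

/* Word-sized fill, standing in for what a "rep stosq" loop stores. */
static void
fill_words(uint64_t *dst, size_t nwords, int c)
{
	uint64_t w = expand_byte(c);
	size_t i;

	assert(dst != NULL || nwords == 0);
	for (i = 0; i < nwords; i++)
		dst[i] = w;
}
```

Without the expansion, a word-sized store of a byte-sized argument would write the pattern into only one byte of each word and zeros into the other seven, which matches the described symptom that only non-zero patterns were laid down incorrectly.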
Andrew Gallatin
1f7ce05d1d Fix an off-by-one error when deciding to request a tx interrupt
The canonical check for whether or not a ring is drainable is
TXQ_AVAIL() > MAX_TX_DESC() + 2.  Use this same construct here,
in order to avoid a potential off-by-one error where we might otherwise
fail to request an interrupt.

Reviewed by:	mmacy
Sponsored by:	Netflix
2018-05-07 18:11:22 +00:00
Mateusz Guzik
f185a3dc33 amd64: tweak the memmove comment regarding authorship
To make it clear the mentioned author did not write memmove.
2018-05-07 17:37:07 +00:00
Andrew Gallatin
e7bd0750af Boost thread priority while changing CPU frequency
Boost the priority of user-space threads when they set
their affinity to a core to adjust its frequency.   This avoids a situation
where a CPU bound kernel thread with the same affinity is running on a
down-clocked core, and will "block" powerd from up-clocking the core
until the kernel thread yields.   This can lead to poor performance,
and to things potentially getting stuck on Giant.

Reviewed by:	kib (imp reviewed earlier version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15246
2018-05-07 15:24:03 +00:00
Mateusz Guzik
6a909b9680 amd64: replace libkern's memset and memmove with assembly variants
memmove is repurposed bcopy (arguments swapped, return value added)
The libkern variant is a wrapper around bcopy, so this is a big
improvement.

memset is repurposed memcpy. The libkern variant is doing fishy stuff,
including branching on 0 and calling bzero.

Both functions are rather crude and subject to partial depessimization.

This is a soft prerequisite to adding variants utilizing the
'Enhanced REP MOVSB/STOSB' bit and letting the kernel patch at runtime.
2018-05-07 15:07:28 +00:00
Alexander Motin
167a34407c Keep CARP state as INIT when net.inet.carp.allow=0.
Currently when net.inet.carp.allow=0 CARP state remains as MASTER, which is
not very useful (if there are other masters -- it can lead to split brain,
if there are none -- it makes no sense).  Having it as INIT makes it clear
that carp packets are disabled.

Submitted by:	wg
MFC after:	1 month
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14477
2018-05-07 14:44:55 +00:00
Andriy Gapon
de15b11aaa x86 cpususpend_handler: call wbinvd after setting suspend state bits
Without a subsequent wbinvd the changes to suspended_cpus (and
resuming_cpus) can be lost at least on AMD systems that use MOESI cache
coherency protocol.  That can happen because one of APs ends up as an
Owner of the corresponding cache line(s) and the changes may never reach
the main memory before the AP is reset.

While here, move clearing of suspended_cpus a little bit earlier as the
fact of returning from savectx (with zero return value) means that the
CPU has fully restored its execution context.

Also, rework the comment that describes the need for resuming_cpus.

This change fixed suspend to RAM on a previously broken AMD-based system.

Reviewed by:	kib
Discussed with:	bde
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D15295
2018-05-07 12:22:25 +00:00
Emmanuel Vadot
3914c76abb clk: clk_set_assigned: Skip frequency of value 0
A frequency of value 0 means that we don't want to change the frequency, so
skip it.
2018-05-07 09:42:35 +00:00
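The skip rule from 3914c76abb amounts to a guard in the loop that applies the 'assigned-clock-rates' values. The sketch below uses a hypothetical struct fake_clk and a plain array of rates in place of the kernel's clk objects and device tree parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a clock; not the kernel's clk structure. */
struct fake_clk {
	uint64_t freq;
};

/*
 * Apply rates[i] to clks[i] for each of the n assigned clocks.  A rate of
 * 0 means "leave this clock's frequency alone".  Returns the number of
 * clocks whose frequency was set.
 */
static size_t
apply_assigned_rates(struct fake_clk *clks, const uint32_t *rates, size_t n)
{
	size_t i, changed = 0;

	assert(clks != NULL && rates != NULL);
	for (i = 0; i < n; i++) {
		if (rates[i] == 0)
			continue;	/* placeholder entry, nothing to do */
		clks[i].freq = rates[i];
		changed++;
	}
	return (changed);
}
```

A zero entry lets a device tree keep the rates list aligned with the 'assigned-clocks' list while changing the frequency of only some of the listed clocks.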
Emmanuel Vadot
08f3f0f953 arm64: rockchip: cru: Call clk_set_assigned
We need to call clk_set_assigned after all the clocks have been registered
to set the parents/rates described in the dtb.
2018-05-07 07:31:25 +00:00
Emmanuel Vadot
d195d4acee clk: Add support for assigned-clock-rates
The properties 'assigned-clocks', 'assigned-clock-parents' and
'assigned-clock-rates' all work together.
'assigned-clocks' holds the list of clocks for which we need to either
assign a new parent or a new frequency.
The old code only supported assigning new parents; add support for
assigning a new frequency too.
2018-05-07 07:30:40 +00:00
Emmanuel Vadot
dff9720331 arm64: rockchip: clk: Add support to reparent to clk_composite
All clk_composite types have the possibility to reparent (choosing another
parent to find a better frequency), add the support for that.
2018-05-07 07:29:48 +00:00
Emmanuel Vadot
66a4c42756 arm64: rk3328: Add pll rates tables
Add the values known to be safe for the rk3328 PLLs.
2018-05-07 07:28:47 +00:00
Emmanuel Vadot
78d07c93a7 arm64: rk: Add support for setting pll rate
Add support for setting pll rate. On RockChip SoCs two kinds of plls are
supported, integer mode and fractional mode.
The two modes are intended to support more frequencies for the core plls.
While here change the recalc method as it appears that the datasheet is
wrong on the calculation method.
2018-05-07 07:28:10 +00:00
Emmanuel Vadot
178f57b143 arm64: rockchip: rk3328: Add armclk clock
Add the clock definition for the arm clock.
While here remove the indexes in the clock table as we will need clocks
with a 0 index (non-exported clocks).
2018-05-07 07:26:48 +00:00
Pedro F. Giffuni
b732ceb6ca msdosfs: use vfs_timestamp() to generate timestamps instead of getnanotime().
Most filesystems, with the notable exceptions of msdosfs and autofs use
only vfs_timestamp() to read the current time. This has the benefit of
configurable granularity (using the vfs.timestamp_precision sysctl).

For convenience, use it on msdosfs too.

Submitted by:	Damjan Jovanovic
Differential Revision:	https://reviews.freebsd.org/D15297
2018-05-06 21:29:29 +00:00
Poul-Henning Kamp
1e3b21b1b8 With the fall-back hack for lint gone, I have no copyright claim on this file. 2018-05-06 21:22:46 +00:00
Matt Macy
b6f6f88018 r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl
to sleep on commands to the NIC when updating multicast filters. More generally this permitted
drivers to use an sx as a softc lock. Unfortunately this change introduced a race whereby
a multicast update would still be queued for deletion when ifconfig deleted the interface
thus calling down into _purgemaddrs and synchronously deleting _all_ of the multicast addresses
on the interface.

Synchronously remove all external references to a multicast address before enqueueing for delete.

Reported by:	lwhsu
Approved by:	sbruno
2018-05-06 20:34:13 +00:00
Matt Macy
7edd877a1e The ifnet pointer (ifp) in rt_newaddrmsg can be valid without ifp->if_addr being set
if the ifnet is still live by way of a reference but
in line for deletion. Check ifp->if_addr before dereferencing.

Approved by:	sbruno
2018-05-06 20:32:47 +00:00
Emmanuel Vadot
41cd649615 am335x_prcm: Delay the frequencies read check
With the Linux 4.17 dts, the compatible for the prcm added 'simplebus', which
means that the simplebus driver will attach to it at the BUS_PASS_BUS pass.
Change the pass for the prcm driver to be at BUS_PASS_BUS so we will win
the attach.
This introduces a problem, as this driver needs the ti_scm one to be already
attached. ti_scm also attaches at BUS_PASS_BUS, but after the prcm one, as it
comes later in the dtb and the simplebus driver simply walks the tree to
attach its children.
Use the bus_new_pass method to defer the frequencies read at BUS_PASS_TIMER.
This fixes booting on BeagleBone*

Reported by:	many
2018-05-06 14:37:11 +00:00
Michael Tuexen
67e8b08bbe Ensure we are not dereferencing a NULL pointer.
This was found by Coverity scanning the usrsctp stack (CID 203808).

MFC after:	3 days
2018-05-06 14:19:50 +00:00
Mark Johnston
9461882562 Add netdump support to iflib.
em(4) and igb(4) were tested by me, and ixgbe(4) and bnxt(4) were
tested by sbruno.

Reviewed by:	mmacy, shurd
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15262
2018-05-06 00:57:52 +00:00
Mark Johnston
c857c7d553 Add netdump support to vtnet(4).
Tested with bhyve.

Reviewed by:	bryanv, julian
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15261
2018-05-06 00:53:52 +00:00
Mark Johnston
306c97e2d8 Add netdump support to re(4).
Tested with a RealTek 8101E adapter.

Reviewed by:	sbruno
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15260
2018-05-06 00:52:17 +00:00
Mark Johnston
eb07d67ef3 Add netdump support to cxgb(4).
Tested with a T320 adapter.

Reviewed by:	np
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15258
2018-05-06 00:48:43 +00:00
Mark Johnston
6eadb68b14 Add netdump support to bxe(4).
Tested with a NetXtreme II BCM57810 adapter.

Reviewed by:	davidcs
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15257
2018-05-06 00:47:39 +00:00
Mark Johnston
ded669627a Add netdump hooks to bge(4).
Tested with a NetXtreme BCM5727 adapter.

Reviewed by:	julian
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15256
2018-05-06 00:45:41 +00:00
Mark Johnston
8a4665833d Add netdump hooks to alc(4).
Tested with an AR8162.

Reviewed by:	julian, sbruno
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15255
2018-05-06 00:43:46 +00:00
Mark Johnston
e505460228 Import the netdump client code.
This is a component of a system which lets the kernel dump core to
a remote host after a panic, rather than to a local storage device.
The server component is available in the ports tree. netdump is
particularly useful on diskless systems.

The netdump(4) man page contains some details describing the protocol.
Support for configuring netdump will be added to dumpon(8) in a future
commit. To use netdump, the kernel must have been compiled with the
NETDUMP option.

The initial revision of netdump was written by Darrell Anderson and
was integrated into Sandvine's OS, from which this version was derived.

Reviewed by:	bdrewery, cem (earlier versions), julian, sbruno
MFC after:	1 month
X-MFC note:	use a spare field in struct ifnet
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15253
2018-05-06 00:38:29 +00:00
Mark Johnston
bd92e6b6f5 Refactor some of the MI kernel dump code in preparation for netdump.
- Add clear_dumper() to complement set_dumper().
- Drain netdump's preallocated mbuf pool when clearing the dumper.
- Don't do bounds checking for dumpers with mediasize 0.
- Add dumper callbacks for initialization and for writing out headers.

Reviewed by:	sbruno
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15252
2018-05-06 00:22:38 +00:00
Mark Johnston
5475ca5aca Add an mbuf allocator for netdump.
The aim is to permit mbuf allocations after a panic without calling into
the page allocator, without imposing any runtime overhead during regular
operation of the system, and without modifying driver code. The approach
taken is to preallocate a number of mbufs and clusters, storing them
in linked lists, and using the lists to back some UMA cache zones. At
panic time, the mbuf and cluster zone pointers are overwritten with
those of the cache zones so that the mbuf allocator returns
preallocated items.

Using this scheme, drivers which cache mbuf zone pointers from
m_getzone() require special handling when implementing netdump support.

Reviewed by:	cem (earlier version), julian, sbruno
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15251
2018-05-06 00:19:48 +00:00
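The preallocation idea in 5475ca5aca can be sketched as a fixed pool with an intrusive free list: all memory is reserved up front, so allocating later never calls into a general-purpose allocator. This is a simplified illustration with hypothetical names and sizes; it does not model UMA cache zones or the swap of zone pointers at panic time.

```c
#include <assert.h>
#include <stddef.h>

#define	POOL_ITEMS	4	/* illustrative count */
#define	ITEM_SIZE	256	/* illustrative item size in bytes */

/* While an item is free, its storage holds the free-list link. */
union item {
	union item *next;
	unsigned char data[ITEM_SIZE];
};

struct pool {
	union item items[POOL_ITEMS];
	union item *free;
};

/* Thread every preallocated item onto the free list. */
static void
pool_init(struct pool *p)
{
	size_t i;

	assert(p != NULL);
	p->free = NULL;
	for (i = 0; i < POOL_ITEMS; i++) {
		p->items[i].next = p->free;
		p->free = &p->items[i];
	}
}

/* Pop an item, or return NULL when the pool is exhausted. */
static void *
pool_alloc(struct pool *p)
{
	union item *it = p->free;

	if (it != NULL)
		p->free = it->next;
	return (it);
}

/* Push an item back onto the free list. */
static void
pool_free(struct pool *p, void *ptr)
{
	union item *it = ptr;

	assert(p != NULL && it != NULL);
	it->next = p->free;
	p->free = it;
}
```

Because allocation is a pointer pop, it imposes no cost on the regular allocator during normal operation and remains usable in a context where the page allocator must not be entered.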
Mark Johnston
c2ba2d1b0e Style.
MFC after:	3 days
2018-05-06 00:11:30 +00:00
Mark Johnston
681554d70b Remove a redundant assertion.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2018-05-06 00:05:03 +00:00
Mark Johnston
40e805221b Avoid dropping the topology lock in gmirror's dumpconf implementation.
Doing so introduces races which can lead to a use-after-free when
grabbing a snapshot of the GEOM mesh.

To ensure that a mirror's disk list remains stable, change its locking
protocol: both the softc lock and the topology lock are now required
to modify the list, so either lock is sufficient for traversal.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-05-06 00:03:24 +00:00
Matt Macy
28c001002a Currently in_pcbfree will unconditionally wunlock the pcbinfo lock
to avoid a LOR on the multicast list lock in the freemoptions routines.
As it turns out, tcp_usr_detach can acquire the tcbinfo lock readonly.
Trying to wunlock the pcbinfo lock in that context has caused a number
of reported crashes.

This change unclutters in_pcbfree and moves the handling of wunlock vs
runlock of pcbinfo to the freemoptions routine.

Reported by:	mjg@, bde@, o.hartmann at walstatt.org
Approved by:	sbruno
2018-05-05 22:40:40 +00:00
Navdeep Parhar
b6f2c452cb cxgbe(4): Update all firmwares to 1.19.1.0.
These firmwares and the following list of changes are from the public
ChelsioUwire-3.7.1.0 release.

T6 Firmware
================================================================================
Version : 1.19.1.0
Date    : 04/23/2018
================================================================================

Fixes
-----

BASE:
- Fixed traffic stall when rate-limit is modified while running traffic.
- Fixes a firmware crash in FW_ETH_TX_EO_WR handling.
- Fixes host DCB support when FW_PORT_CMD is used.

ETH:
- Exit Auto-Negotiation if we don't receive base page from peer within 10s.
  This fixes some cases wherein we keep on restarting auto negotiation without
  ever exiting, resulting in link failure.
- Fixes an issue where VF packets counter were not increasing if VF packets
  coalesced WR is used by driver.

OFLD:
- Kernel and user mode NVMEoF performance enhancements.

FOiSCSI:
- Fixes fw crash when trying to connect to non-existent IPv6 iSNS target.

================================================================================
Version : 1.18.9.0
Date    : 03/27/2018
================================================================================

Fixes
-----

BASE:
- For Ethernet frames less than 64B, pad them with zero bytes as per IEEE spec
  (RFC 894).
- Added a new parameter iqtype to FW_IQ_CMD to identify the ingress NIC or offload
  queues. This fixes an issue where driver was receiving interrupt with no new
  messages in queue.
- FW_PARAMS_CMD processes all the valid parameters and returns value 0UL for
  any unknown parameter.

OFLD:
- Fixes connection failure during SRQ reuse.
- Fixes incorrect cqe in case of WRITE with immediate operation.

FOiSCSI:
- Fixes a fw crash when wrong node-id is passed to FW_FOISCSI_CTRL_WR.

FOFCoE:
- Fixes a fw hang while creating NPIV.

Enhancements
------------

ETH:
- A new WR FW_ETH_TX_PKTS_VM_WR added to support VM packet coalescing.

================================================================================
Version : 1.18.4.0
Date    : 02/28/2018
================================================================================

Fixes
-----

BASE:
- Fixed Rate limiting not working for 101Mbps<=rate limit<=163Mbps range.
- Fixed starting more than 32 VMs on PF4 causing firmware hang.

ETH:
- Fixed link failure due to FEC mismatch with optics.
- Fixed link failure with link toggle stress tests.
- Only BaseR FEC is supported for 50G.
- Fixed a bug in next page handling which sometimes causes link down.
- Fixed port down due to failure to read eeprom contents of some modules.
- Fixed a bug causing adapter to fail with spider configuration.

FOiSCSI:
- Fixed a bug causing login failure when connecting to multiple targets.

Enhancements
------------

BASE:
- Added a new firmware API to retrieve the maximum temperature threshold for
  the chip (FW_PARAM_DEV_DIAG_MAXTMPTHRESH).

ETH:
- Added support for user to control pause negotiation during auto negotiation.

FOiSCSI:
- Added a new facility to redirect few fw events to offload rx queue
  (based on driver's configuration)
- Driver can ignore providing ipv6 prefix len during ipv6 address configuration.

================================================================================
Version : 1.17.14.0
Date    : 12/27/2017
================================================================================

FIXES
-----

BASE:
- Fixed an FLR failure during simultaneous power up of VM.
- Fixed an issue in vlan acl which was limiting vlan range to 1024.

ETH:
- Enabled RS-FEC for 25G active copper cable and 25GBASE-SR.
- When auto negotiation is enabled, final pause settings are resolved
  based on local and peer pause settings.
- Handle NACK for an I2C access.

OFLD:
- Fixed rdma connection cleanup in SO adapter.
- Fixed rdma connections during read invalidate.
- Fixed the crash when invalid BW rate is passed to fw.
- Fixed the traffic hang when BW allocation is changed from switch during traffic.

FOFCoE:
- Fixed an issue where initiator remains logged-in even after LLDP is disabled
  on switch.

ENHANCEMENTS
------------

BASE:
- Added support for 248 VFs.
- Added fw driver periodic calibration for MC.

ETH:
- Added XLAUI port type support.
- Added raw mac entry deletion support (FW_VI_MAC_ID_BASED_FREE).

OFLD:
- Inline IPSec support added (flag F_FW_ULPTX_WR_DATA indicates the inline
  IPSec WR).
- New work request FW_RI_RDMA_WRITE_CMPL_WR (write with completion) added to

T5 Firmware
================================================================================
Version : 1.19.1.0
Date    : 04/23/2018
================================================================================

Fixes
-----

BASE:
- Fixes a firmware crash in FW_ETH_TX_EO_WR handling.
- Fixes host DCB support when FW_PORT_CMD is used.

ETH:
- Fixes an issue where VF packets counter were not increasing if VF packets
  coalesced WR is used by driver.

OFLD:
- Fixes an issue where fw hangs if max traffic rate passed is 0.

FOiSCSI:
- Fixes fw crash when trying to connect to non-existent IPv6 iSNS target.

================================================================================
Version : 1.18.9.0
Date    : 03/27/2018
================================================================================

Fixes
-----

BASE:
- For Ethernet frames less than 64B, pad them with zero bytes as per IEEE spec
  (RFC 894).
- Added a new parameter iqtype to FW_IQ_CMD to identify the ingress NIC or offload
  queues. This fixes an issue where driver was receiving interrupt with no new
  messages in queue.

ETH:
- Pad the Ethernet packets of size less than 64B with zeros. This fixes the
  incorrect checksum generation of packets less than 64B.

FOiSCSI:
- Fixes a fw crash when wrong node-id is passed to FW_FOISCSI_CTRL_WR.

FOFCoE:
- Fixes a fw hang while creating NPIV.

Enhancements
------------

ETH:
- A new WR FW_ETH_TX_PKTS_VM_WR added to support VM packet coalescing.

================================================================================
Version : 1.18.4.0
Date    : 02/28/2018
================================================================================

Fixes
-----

BASE:
- Fixed starting more than 32 VMs on PF4 causing firmware hang.

FOiSCSI:
- Fixed a bug causing login failure when connecting to multiple targets.

Enhancements
------------

BASE:
- Added a new firmware API to retrieve the maximum temperature threshold for
  the chip (FW_PARAM_DEV_DIAG_MAXTMPTHRESH).

ETH:
- Added support for user to control pause negotiation during auto negotiation.

FOiSCSI:
- Added a new facility to redirect few fw events to offload rx queue
  (based on driver's configuration)
- Driver can ignore providing ipv6 prefix len during ipv6 address configuration.

================================================================================
Version : 1.17.14.0
Date    : 12/27/2017
================================================================================

FIXES
-----

BASE:
- Fixed an issue in vlan acl which was limiting vlan range to 1024.

ETH:
- Corrected lane inversion logic.
- Fixed improper LED behavior in T580 cards.
- When auto negotiation is enabled, final pause settings are resolved
  based on local and peer pause settings.
- Handle NACK for an I2C access.

OFLD
- Fixed rdma connections during read invalidate.

FOiSCSI:
- Fixed a connection hang when link is toggled frequently.

FOFCoE:
- Fixed an issue where initiator remains logged-in even after LLDP is disabled
  on switch.

ENHANCEMENTS
------------

BASE:
- Added support for 124 VFs.

ETH:
- Added XLAUI port type support.
- Added raw mac entry deletion support (FW_VI_MAC_ID_BASED_FREE).

OFLD:
- New work request FW_RI_RDMA_WRITE_CMPL_WR (write with completion) added to
  optimize NVMEoF write.

T4 Firmware
================================================================================
Version : 1.19.1.0
Date    : 04/23/2018
================================================================================

Fixes
-----

BASE:
- Fixes a firmware crash in FW_ETH_TX_EO_WR handling.
- Fixes host DCB support when FW_PORT_CMD is used.

FOiSCSI:
- Fixes fw crash when trying to connect to non-existent IPv6 iSNS target.

================================================================================
Version : 1.18.9.0
Date    : 03/27/2018
================================================================================

Fixes
-----

BASE:
- Added a new parameter iqtype to FW_IQ_CMD to identify the ingress NIC or
  offload queues. This fixes an issue where driver was receiving interrupt with
  no new messages in queue.

FOFCoE:
- Fixes a fw hang while creating NPIV.

Enhancements
------------

ETH:
- A new WR FW_ETH_TX_PKTS_VM_WR added to support VM packet coalescing.

================================================================================
Version : 1.18.4.0
Date    : 02/28/2018
================================================================================

Enhancements
------------

BASE:
- Added a new firmware API to retrieve the maximum temperature threshold for
  the chip (FW_PARAM_DEV_DIAG_MAXTMPTHRESH).

================================================================================
Version : 1.17.14.0
Date    : 12/27/2017
================================================================================

FIXES
-----

BASE:
- Fixed an issue in vlan acl which was limiting vlan range to 1024.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2018-05-05 20:16:08 +00:00
Justin Hibbits
10d0cdfc6e Add support for powernv POWER9 MMU initialization
The POWER9 MMU (PowerISA 3.0) is slightly different from current
configurations, using a partition table even for hypervisor mode, and
dropping the SDR1 register.  Key off the newly early-enabled CPU features
flags for the new architecture, and configure the MMU appropriately.

The POWER9 MMU ignores the "PSIZ" field in the PTCR, and expects a 64kB
table.  As we are enabled for powernv (hypervisor mode, no VMs), only
initialize partition table entry 0, and zero out the rest.  The actual
contents of the register are identical to SDR1 from previous architectures.

Along with this, fix a bug in the page table allocation with very large
memory.  The table can be allocated on any 256k boundary.  The
bootstrap_alloc alignment argument is an int, and with large amounts of
memory passing the size of the table as the alignment will overflow an
integer.  Hard-code the alignment at 256k as wider alignment is not
necessary.

Reviewed by:	nwhitehorn
Tested by:	Breno Leitao
Relnotes:	Yes
2018-05-05 16:00:02 +00:00
Justin Hibbits
55a12bbda2 Break out the cpu_features setup to its own function, to be run earlier
The new POWER9 MMU configuration is slightly different from current setups.
Rather than special-casing on POWER9, move the initialization of cpu_features
and cpu_features2 to as early as possible, so that platform and MMU
configuration can be based upon CPU features instead of specific CPUs if at all
possible.

Reviewed by:	nwhitehorn
2018-05-05 15:48:39 +00:00
Justin Hibbits
4f4f92c58f Add POWER9 to the POWER8 bootstrap case blocks
POWER8 and POWER9 have similar configuration requirements for hypervisor setup,
and in the cases here they're identical.  Add the POWER9 constant to the POWER8
list so it's initialized correctly.

Reviewed by:	nwhitehorn
2018-05-05 15:42:58 +00:00
Andriy Gapon
34577ddb15 amdsbwd: fix reboot status reporting
Originally, I overlooked that PMIO register 0xc0 has a dual personality.
It can either be S5/Reset Status register or Misc. Fix register (aka
debug status register).  The mode is controlled by bit 2 in PMIO
register 0xc4.  Apparently there are register programming requirements
for the second personality, so many BIOSes leave the register, after
programming it, in that mode.  So, we need to switch the register to the
correct mode.

Additionally, AMDSB8_WD_RST_STS was defined incorrectly as bit 13 while
it is actually bit 25 (and the register's width is 32 bits, not 16).

With this change I see the following in dmesg after a reset by the
watchdog:
amdsbwd0: ResetStatus = 0x42000000
amdsbwd0: Previous Reset was caused by Watchdog

MFC after:	2 weeks
2018-05-05 05:22:11 +00:00
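
The bit-position fix in the amdsbwd commit above can be checked against the value shown in its dmesg output. The following is a minimal sketch, not the driver's code: the macro and function names are made up for illustration, and the "old" variant assumes the previous code looked at bit 13 of a 16-bit read, which is how the commit message describes the mistake.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; these are not the identifiers used in amdsbwd. */
#define	WD_RST_STS_BIT13	(UINT32_C(1) << 13)	/* old, incorrect position */
#define	WD_RST_STS_BIT25	(UINT32_C(1) << 25)	/* corrected position */

/* Test the watchdog-reset flag in the full 32-bit status value. */
static bool
reset_by_watchdog(uint32_t status)
{
	return ((status & WD_RST_STS_BIT25) != 0);
}

/* Assumed old behaviour: only the low 16 bits are seen, and bit 13 is tested. */
static bool
reset_by_watchdog_old(uint32_t status)
{
	uint16_t low = (uint16_t)status;	/* upper 16 bits are discarded */

	return ((low & WD_RST_STS_BIT13) != 0);
}
```

With the reported value 0x42000000, bits 30 and 25 are set, so only the 32-bit check on bit 25 sees the watchdog flag.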
Andriy Gapon
bd3afae0ca for bus suspend, detach and shutdown iterate children in reverse order
For most buses all children are equal, so the order does not matter.
Other buses, such as acpi, carefully order their child devices to
express implicit dependencies between them.  For such buses it is safer
to bring down devices in the reverse order.

I believe that this is the reason why hpet_suspend had to be disabled.
Some drivers depend on a working event timer until they are suspended.
But previously we would suspend hpet very early.

I tested this change by making hpet_suspend actually stop HPET timers
and tested that too.

Note that this change is not a complete solution as it does not take
into account bus passes.
A better approach would be to track the actual attach order of the
devices and to use the reverse of that.

Reviewed by:	imp, mav
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D15291
2018-05-05 05:19:32 +00:00
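
The reverse-order walk described in the commit above can be sketched in plain C. This is a simplified illustration under stated assumptions: `struct child` and `suspend_children_reverse` are hypothetical stand-ins, not the kernel's device_t or bus code, and the array order stands in for the order in which the bus added its children.

```c
#include <stddef.h>

/* Hypothetical stand-in for a child device. */
struct child {
	const char *name;
	int suspended_at;	/* position in the suspend sequence, -1 if not yet */
};

/*
 * Suspend children from last to first, so devices added later (which may
 * depend on earlier ones) go down before the devices they rely on.
 * Returns the number of children visited.
 */
static size_t
suspend_children_reverse(struct child *children, size_t n)
{
	size_t visited = 0;

	for (size_t i = n; i > 0; i--)
		children[i - 1].suspended_at = (int)visited++;
	return (visited);
}
```

If an event timer such as hpet is the first child, this loop reaches it last, which matches the ordering the commit argues is safer.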
Mateusz Guzik
5ec2c93667 tc: bcopy -> memcpy 2018-05-04 22:48:10 +00:00
Mateusz Guzik
ac7edb45e1 amd64: syscall path bcopy -> memcpy 2018-05-04 22:41:12 +00:00
Mateusz Guzik
0bb311526d Allow the compiler to use __builtin_memcpy
In particular this allows the compiler to avoid heavy-handed machinery
if the to be copied buffer is small.

Reviewed by:	jhb
2018-05-04 22:33:54 +00:00
Jamie Gritton
0e5c6bd436 Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of.  This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by:	kib
Differential Revision:	D14681
2018-05-04 20:54:27 +00:00
Matt Macy
df66feb8da % WITHOUT_FORMAT_EXTENSIONS= XCC=/usr/local/bin/gcc8 make -j96 buildkernel KERNCONF=GENERIC-NODEBUG -s >& log
% grep "inlining failed" log | wc
     234    3570   36065
Consensus on those polled is that inlining failure warnings are not useful

Approved by:	sbruno
2018-05-04 19:31:28 +00:00
Ian Lepore
d8cf9c4f8b Properly support the GPIO_PIN_PRESET_{LOW,HIGH} options when configuring
a gpio pin.  If neither of the options is specified, pre-set the pin's
output value to the pin's current input value, to achieve glitch-free
transitions to output mode on pins that are pulled up or down at reset
or via fdt pinctrl data.
2018-05-04 19:28:05 +00:00
Matt Macy
8fd222ebb4 fix case where pidx_last might be used uninitialized
Reviewed by:	sbruno
2018-05-04 18:59:01 +00:00
Matt Macy
1ae4848cd0 fix gcc8 warnings
Approved by:	sbruno
2018-05-04 18:57:05 +00:00
Matt Macy
d39c265800 fix gcc8 compile
Approved by:	sbruno
2018-05-04 18:25:07 +00:00
Mark Johnston
1b5c869d64 Fix some races introduced in r332974.
With r332974, when performing a synchronized access of a page's "queue"
field, one must first check whether the page is logically dequeued. If
so, then the page lock does not prevent the page from being removed
from its page queue. Introduce vm_page_queue(), which returns the page's
logical queue index. In some cases, direct access to the "queue" field
is still required, but such accesses should be confined to sys/vm.

Reported and tested by:	pho
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15280
2018-05-04 17:17:30 +00:00
Ian Lepore
e780d0fc82 Make reading imx6 gpio pins work correctly whether the pin is in open-drain
mode or not.  An earlier attempt to make this work was done in r320456, by
always reading the pad status register (PSR) instead of the data register.
But it turns out the values in PSR only reflect the electrical level of an
output pin if the pad is configured with the SION (Set Input On) bit in the
pinmux config, and most output gpio pads are not configured that way.

So now a gpio read is done by returning the value from the data register,
which works right whether the pin is configured for input or output, unless
the pin has been set for OPENDRAIN mode, in which case the PSR is read
instead.  For this to work, the pin must also be configured with SION turned
on in the fdt pinmux data, which is a reasonable thing to require for the
unusual case of reading an open-drain output pin.
2018-05-04 16:23:54 +00:00
Stephen Hurd
b89827a052 iflib: fix invalid free during queue allocation failure
In r301567, code was added to cleanup to prevent memory leaks for the
Tx and Rx ring structs. This code carefully tracked txq and rxq, and
made sure to free them properly during cleanup.

Because we assigned the txq and rxq pointers into the ctx->ifc_txqs and
ctx->ifc_rxqs, we carefully reset these pointers to NULL, so that
cleanup code would not accidentally free the memory twice.

This was changed by r304021 ("Update iflib to support more NIC designs"),
which removed this resetting of the pointers to NULL, because it re-used
the txq and rxq pointers as an index into the queue set array.

Unfortunately, the cleanup code was left alone. Thus, if we fail to
allocate DMA or fail to configure the queues using the drivers ifdi
methods, we will attempt to free txq and rxq. These variables would now
incorrectly point to the wrong location, resulting in a page fault.

There are a number of methods to correct this, but ultimately the root
cause was that we reuse the txq and rxq pointers for two different
purposes.

Instead, when allocating, store the returned pointer directly into
ctx->ifc_txqs and ctx->ifc_rxqs. Then, assign this to txq and rxq as
index pointers before starting the loop to allocate each queue.
Drop the cleanup code for txq and rxq, and only use ctx->ifc_txqs and
ctx->ifc_rxqs.

Thus, we no longer need to free txq or rxq under any error flow, and
instead rely solely on the pointers stored in ctx->ifc_txqs and
ctx->ifc_rxqs. This prevents the invalid free(), and ensures that we
still properly cleanup after ourselves as before when failing to
allocate.

Submitted by:	Jacob Keller
Reviewed by:	gallatin, sbruno
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15285
2018-05-04 15:20:34 +00:00
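
The ownership pattern described in the iflib commit above (store the allocation in the context, use a separate cursor for the loop, and free only through the context) can be sketched as follows. This is a userland illustration, not iflib code: `struct ctx`, `struct queue`, the `fail_at` parameter and both function names are hypothetical, and malloc/calloc stand in for the kernel allocators.

```c
#include <stdlib.h>

struct queue {
	void *dma;		/* stand-in for per-queue DMA memory */
};

struct ctx {
	struct queue *qs;	/* the only owning pointer to the queue array */
	int nqs;
};

/* Single cleanup path: everything is reached through the context. */
static void
ctx_free_queues(struct ctx *c)
{
	if (c->qs == NULL)
		return;
	for (int i = 0; i < c->nqs; i++)
		free(c->qs[i].dma);	/* free(NULL) is a no-op */
	free(c->qs);
	c->qs = NULL;
	c->nqs = 0;
}

/* fail_at simulates a DMA allocation failure at that index (-1 for none). */
static int
ctx_alloc_queues(struct ctx *c, int n, int fail_at)
{
	struct queue *q;

	c->qs = calloc((size_t)n, sizeof(*c->qs));
	if (c->qs == NULL)
		return (-1);
	c->nqs = n;
	q = c->qs;		/* cursor only; never passed to free() */
	for (int i = 0; i < n; i++, q++) {
		q->dma = (i == fail_at) ? NULL : malloc(64);
		if (q->dma == NULL) {
			ctx_free_queues(c);
			return (-1);
		}
	}
	return (0);
}
```

Because the cursor `q` is never freed, it does not matter where it points when an error occurs, which is the property the commit relies on.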
Stephen Hurd
4d613f5d04 iflib: remove unused brscp pointer from iflib_queues_alloc
This pointer was no longer written to as of r315217. Since nothing writes
to the variable, remove it.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	gallatin, kmacy, sbruno
Differential Revision:	https://reviews.freebsd.org/D15284
2018-05-04 15:11:16 +00:00
Kyle Evans
6d5127daf4 arm: overlays: Update to new path-based sugar format 2018-05-04 14:38:48 +00:00
Andrey V. Elsukov
5ada542398 Immediately propagate EACCES error code to application from tcp_output.
In r309610 and r315514 the behavior of handling EACCES was changed, and
tcp_output() now returns zero when EACCES happens. The reason of this
change was a hesitation that applications that use TCP-MD5 will be
affected by changes in project/ipsec.

TCP-MD5 code returns EACCES when the security association for a given
connection is not configured. But pfil(9) can return the same error code, and
this change has affected connections blocked by pfil(9). E.g. an application
doesn't return immediately when a SYN segment is blocked; instead it waits
until several retries have failed.

Actually, for a TCP-MD5 application it doesn't matter whether it gets EACCES
after the first SYN or after several tries. Security associations must be
configured before initiating a TCP connection.

I left the EACCES in the switch() to show that it has special handling.

Reported by:	Andreas Longwitz <longwitz at incore dot de>
MFC after:	10 days
2018-05-04 09:28:12 +00:00
Andriy Gapon
ca7019d2ac opensolaris system_taskq does not need to run at maximum priority
In fact, this taskqueue should use "boring" threads, nothing special
about them.

MFC after:	2 weeks
2018-05-04 07:28:01 +00:00
Matt Macy
748ff486b0 dup1_processes -t 96 -s 5 on a dual 8160
x dup_before
+ dup_after
+------------------------------------------------------------+
|             x                                            + |
|x    x   x   x                                         ++ ++|
|   |____AM___|                                          |AM||
+------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5  1.514954e+08 1.5230351e+08 1.5206157e+08 1.5199371e+08     341205.71
+   5 1.5494336e+08 1.5519569e+08 1.5511982e+08 1.5508323e+08     96232.829
Difference at 95.0% confidence
        3.08952e+06 +/- 365604
        2.03266% +/- 0.245071%
        (Student's t, pooled s = 250681)

Reported by:	mjg@
MFC after:	1 week
2018-05-04 06:51:01 +00:00
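
The "Difference at 95.0% confidence" figures in the commit above follow directly from the two Avg columns. A small sketch of that arithmetic, using only the averages printed in the table (the function and constant names are illustrative):

```c
/* Averages copied from the ministat table in the commit message. */
static const double avg_before = 1.5199371e+08;	/* x dup_before */
static const double avg_after = 1.5508323e+08;	/* + dup_after */

/* Absolute difference of the means. */
static double
abs_difference(void)
{
	return (avg_after - avg_before);
}

/* Difference expressed as a percentage of the "before" mean. */
static double
pct_difference(void)
{
	return (100.0 * abs_difference() / avg_before);
}
```

This gives 3089520, i.e. the reported 3.08952e+06, and roughly 2.0327% of the baseline. The +/- bounds also depend on the pooled standard deviation and the Student's t quantile, which this sketch does not reproduce.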
Mateusz Guzik
f0648bcc04 amd64: get rid of the pessimized bcopy in syscall arg copy
The code was unnecessarily conditionally copying either 5 or 6 args.
It can blindly copy 6, which also means the size is known at compilation
time and the operation can be depessimized.

Note the entire syscall handling code is rather slow.

Tested on Skylake, sample result for getppid (calls/s):
without pti: 7310106 -> 10653569
with pti: 3304843 -> 4148306

Some syscalls (like read) did not show any difference; others typically have
very modest wins.
2018-05-04 04:05:07 +00:00
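
The idea in the commit above, that a copy of known size is cheaper than one whose length is decided at run time, can be sketched in userland C. This is an illustration rather than the amd64 syscall code: `NARGS_MAX` and both function names are hypothetical, and the runtime-length variant is only an assumed shape of the "5 or 6 args" logic.

```c
#include <stdint.h>
#include <string.h>

#define	NARGS_MAX	6	/* hypothetical constant for this sketch */

/* Assumed old shape: the length depends on a run-time value. */
static void
copy_args_variable(uint64_t *dst, const uint64_t *src, int narg)
{
	memcpy(dst, src, (size_t)narg * sizeof(*dst));
}

/*
 * New shape: always copy six slots.  The length is a compile-time
 * constant, so the compiler may expand the copy inline.
 */
static void
copy_args_fixed(uint64_t *dst, const uint64_t *src)
{
	memcpy(dst, src, NARGS_MAX * sizeof(*dst));
}
```

Copying one slot more than strictly needed is harmless as long as the source always has six readable slots, which is the assumption the fixed-size version makes.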
Mateusz Guzik
a571c38536 Allow __builtin_memmove instead of bcopy for small buffers of known size
See r323329 for an explanation why this is a good idea.
2018-05-04 04:00:48 +00:00
Pedro F. Giffuni
c85866888d msdosfs: long names of files are created incorrectly.
This fixes a regression that happened in r120492 (2003) where libkiconv
was introduced and we went from checking unlen to checking for '\0'.

PR:		111843
Patch by:	Damjan Jovanovic
MFC after:	1 week
2018-05-04 03:44:12 +00:00
Ed Maste
3804f572a3 zfs_ioctl: avoid out-of-bound read
admbugs:	796
Submitted by:	Domagoj Stolfa <ds815@cam.ac.uk>
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by:	avg
MFC after:	1 day
2018-05-04 00:56:41 +00:00
Ed Maste
b525a10ac0 gpart: add fat32lba MBR partition type
FAT32 partition with LBA addressing.

Reviewed by:	marcel
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D15266
2018-05-04 00:34:27 +00:00
Peter Grehan
598b1345d9 Allow PCI VGA devices to be detached.
GPUs often have a VGA PCI class code and are probed/attached
by the VGA driver. Allow them to be detached so they can
be presented as passthru devices to VM guests.

Submitted by:	mmacy
Reviewed by:	jhb, imp, rgrimes
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D15269
2018-05-03 22:51:44 +00:00
Konstantin Belousov
d5effb01f1 Add helper macros to hide some boring repeatable ceremonies to define
ifuncs on x86.

Also keep helpers to define 'pseudo-ifuncs' which are emulated by the
indirect jmp.

Reviewed by:	jhb (previous version, as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D13838
2018-05-03 21:45:59 +00:00
Konstantin Belousov
7035cf14ee Implement support for ifuncs in the kernel linker.
Required MD bits are only provided for x86.

Reviewed by:	jhb (previous version, as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D13838
2018-05-03 21:37:46 +00:00
Kyle Evans
903da53796 Garbage collect the a83t emac overlays
The 4.16 DTS import brought in emac support for the a83t. Since these
boards' DTS is pulled from /boot and I forgot to hook these up to the build,
they should be fairly safe to go away.

The a83t-sid and h3-sid overlays are still relevant. a83t-sid will likely
come in with 4.18 DTS.
2018-05-03 19:49:40 +00:00
Kyle Evans
60eec3517d dtb/allwinner: Add a83t-sid overlay 2018-05-03 19:45:48 +00:00
Jung-uk Kim
e787342e25 Redo r332918 with the ACPICA API and remove debug.acpi.suspend_deep_bounce.
AcpiOsEnterSleep() was meant to implement this feature.

Reviewed by:	avg
2018-05-03 19:00:50 +00:00
Stephen Hurd
aa8a24d37f Allow iflib NIC drivers to sleep rather than busy wait
Since the move to SMP NIC driver locking has had to go through serious
contortions using mtx around long running hardware operations. This moves
iflib past that.

Individual drivers may now sleep when appropriate.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	shurd
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14983
2018-05-03 17:02:31 +00:00
Andriy Gapon
9a042dbc6e amdsbwd: add suspend and resume methods
Without the suspend method the watchdog may fire in S1 state.  Without
the resume method the watchdog is not re-enabled after returning from S3
state.  I observe this on one of my systems.

Not sure if watchdog(4) should participate in the suspend actions.
Right now everything is up to individual drivers.

MFC after:	2 weeks
2018-05-03 15:33:18 +00:00
Sean Bruno
68ff29affa cc_cubic:
- Update cubic parameters to draft-ietf-tcpm-cubic-04

Submitted by:	Matt Macy <mmacy@mattmacy.io>
Reviewed by:	lstewart
Differential Revision:	https://reviews.freebsd.org/D10556
2018-05-03 15:01:27 +00:00
Sean Bruno
dc86af6273 nxge(4) deprecation notice
Submitted by:	kbowling
Reviewed by:	brooks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15277
2018-05-03 14:48:42 +00:00
Andriy Gapon
7bf91e9a53 hpet: use macros instead of magic values for the timer mode
MFC after:	1 week
2018-05-03 13:14:31 +00:00
Konstantin Belousov
9ea6332090 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D13838
2018-05-03 10:17:37 +00:00
Marcin Wojtas
10ebf6661e Add Marvell ArmadaXP and Armada38X to GENERIC config
Include source files and drivers for Marvell ArmadaXP and Armada38X
in GENERIC kernel config.

Submitted by: Michal Mazur <mkm@semihalf.com>
              Rafal Kozik <rk@semihalf.com>
Reviewed by: manu
Tested by: manu
Obtained from: Semihalf
Sponsored by: Stormshield
2018-05-03 01:23:42 +00:00
Marcin Wojtas
30b5fb13f2 Fix SoC identification issue on Marvell platforms
Marvell SoC identification function was called by SYSINIT on all armv7
platforms, which breaks platforms other than Marvell built with
GENERIC config. Fix this by shifting SoC identifying to Marvell platform
initialization.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: manu
Tested by: manu
Obtained from: Semihalf
Sponsored by: Stormshield
2018-05-03 01:10:41 +00:00
Michael Tuexen
f9656ee690 Send an ICMPv6 PacketTooBig message in case of forwarding a packet which
is too big for the outgoing interface and no firewall is involved.
This problem was introduced in
https://svnweb.freebsd.org/changeset/base/324996
Thanks to Irene Ruengeler for finding the bug and testing the fix.

Reviewed by:	kp@
MFC after:	3 days
2018-05-02 22:11:16 +00:00
Rick Macklem
7427a9f138 Revert r333183, since I am not sure that just initializing the
list is the correct thing to do and that is already done without
this commit.
2018-05-02 21:29:42 +00:00
Rick Macklem
858bb2fc1a Add two missing LIST_INIT()s.
This patch adds two missing LIST_INIT()s. Found by inspection.
In practice, these are currently no-ops, since the structure they are
in is malloc'd with M_ZERO and all LIST_INIT does is set the pointer
in the list head to NULL. (In other words, the M_ZERO has already
correctly initialized it.)

MFC after:	2 months
2018-05-02 20:36:11 +00:00
Konstantin Belousov
952e75c763 mlx5en: Always allow VLAN id 0.
According to the 802.1Q-2014 9.6 VLAN Tag Control Information, VID value 0
means that there is no VLAN tag assigned to the packet, and only PCP and
DEI values from the tag are meaningful.  Current flow table programming
filter out such packets.

When programming VLAN filter for flow table, unconditionally add rule which
accept packets with VLAN id 0.  The packets are already handled correctly
by the network stack.

Reviewed by:	hselasky, slavash
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-05-02 20:22:03 +00:00
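
For readers unfamiliar with the tag layout the mlx5en commit above refers to, the 16-bit 802.1Q Tag Control Information field holds a 3-bit PCP, a 1-bit DEI and a 12-bit VID. A minimal sketch of splitting it (the helper names are illustrative and not taken from the driver):

```c
#include <stdbool.h>
#include <stdint.h>

/* 12-bit VLAN identifier: low bits of the TCI. */
static inline uint16_t
vlan_vid(uint16_t tci)
{
	return (tci & 0x0fff);
}

/* 1-bit drop eligible indicator. */
static inline uint8_t
vlan_dei(uint16_t tci)
{
	return ((tci >> 12) & 0x1);
}

/* 3-bit priority code point: top bits of the TCI. */
static inline uint8_t
vlan_pcp(uint16_t tci)
{
	return ((tci >> 13) & 0x7);
}

/* VID 0 means "no VLAN assigned"; only PCP and DEI carry information. */
static inline bool
vlan_is_priority_tagged(uint16_t tci)
{
	return (vlan_vid(tci) == 0);
}
```

A frame with TCI 0xA000 has PCP 5, DEI 0 and VID 0; this is the kind of packet a VLAN filter without a rule for VID 0 would drop.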
Alexander Motin
c252f63740 Fix LOR between controller and queue locks.
Admin pass-through requests took controller lock before the queue lock,
but in case of request submission to a failed controller controller lock
was taken after the queue lock.  Fix that by reducing the lock scopes and
switching to mtx_pool locks to track pass-through request completion.

Sponsored by:	iXsystems, Inc.
2018-05-02 20:13:03 +00:00
Michael Tuexen
4c6a10903f Simplify the call to tcp_drop(), since the handling of soft error
is also done in tcp_drop(). No functional change.

Sponsored by:	Netflix, Inc.
2018-05-02 20:04:31 +00:00
Oleksandr Tymoshenko
5ebc699ab2 Unbreak RaspberryPi 2 boot after r332839
r332839 changed number of cells per interrupt for local_intc from 1 to 2
to pass type of IRQ. Driver expected only 1 cell so after r332839
all interrupt children of local_intc failed to allocate IRQ resource.

Fix this regression by relaxing check for number of cells in interrupt
property to be either 1 or 2.

PR:		227904
2018-05-02 20:04:25 +00:00
Stephen Hurd
f3e1324b41 Separate list manipulation locking from state change in multicast
Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.

Submitted by:	mmacy <mmacy@mattmacy.io>
Reviewed by:	shurd, sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14969
2018-05-02 19:36:29 +00:00
Peter Grehan
adb947a67a Use PCI power-mgmt to reset a device if FLR fails.
A large number of devices don't support PCIe FLR, in particular
graphics adapters. Use PCI power management to perform the
reset if FLR fails or isn't available, by cycling the device
through the D3 state.

This has been tested by a number of users with Nvidia and AMD GPUs.

Submitted and tested by: Matt Macy
Reviewed by:	jhb, imp, rgrimes
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D15268
2018-05-02 17:41:00 +00:00
Sean Bruno
2695c9c109 Retire ixgb(4)
This driver was for an early and uncommon legacy PCI 10GbE for a single
ASIC, Intel 82597EX. Intel quickly shifted to the long lived ixgbe family.

Submitted by:	kbowling
Reviewed by:	brooks imp jeffrey.e.pieper@intel.com
Relnotes:	yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15234
2018-05-02 15:59:15 +00:00
Roger Pau Monné
9021fe72fc xen: fix formatting of xen_init_ops
No functional change

Sponsored by: Citrix Systems R&D
2018-05-02 10:20:55 +00:00
Roger Pau Monné
2602ef7cfa xen: fix gntdev
Current interface to the gntdev in FreeBSD is wrong, and mostly worked
out of luck before the PTI FreeBSD fixes, when kernel and user-space
were sharing the same page tables.

On FreeBSD ioctls have the size of the passed struct encoded in the
ioctl number, because the generic ioctl handler in the OS takes care
of copying the data from user-space to kernel space, and then calls
the device specific ioctl handler. Thus using ioctl structs with
variable sizes is not possible.

The fix is to turn the array of structs at the end of
ioctl_gntdev_alloc_gref and ioctl_gntdev_map_grant_ref into pointers,
that can be properly accessed from the kernel gntdev driver using the
copyin/copyout functions. Note that this is exactly how it's done for
the privcmd driver.

Sponsored by:   Citrix Systems R&D
2018-05-02 10:19:17 +00:00
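
The size problem described in the gntdev commit above can be illustrated with two struct layouts. This is a sketch with made-up type and field names, not the actual ioctl_gntdev_* definitions: it only shows why a trailing array makes the argument size depend on the element count, while a pointer keeps it fixed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type. */
struct grant_ref {
	uint32_t domid;
	uint32_t ref;
};

/* Before: array at the end; the real argument size grows with count. */
struct map_req_old {
	uint32_t count;
	struct grant_ref refs[1];
};

/* After: fixed-size struct; the kernel fetches refs with copyin(). */
struct map_req_new {
	uint32_t count;
	struct grant_ref *refs;
};

/* Bytes the caller would really need to pass with the old layout. */
static size_t
old_req_size(uint32_t count)
{
	return (offsetof(struct map_req_old, refs) +
	    (size_t)count * sizeof(struct grant_ref));
}

/* Bytes passed with the new layout: independent of count. */
static size_t
new_req_size(uint32_t count)
{
	(void)count;
	return (sizeof(struct map_req_new));
}
```

Since the ioctl number encodes one size, only the second layout lets the generic ioctl handler copy the whole argument correctly for every count.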
Alexander Motin
fc9bdb4ee5 Clean enclosure_table when resetting num_enc_table_entries to zero.
Garbage left there by r325363 in some scenarios found to lead to later
enclosure mapping failures.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2018-05-02 02:41:09 +00:00
Scott Long
4899b94bac Refactor dadone(). There was no useful code sharing in it; it was just
a 1500 line switch statement.  Callers now specify a discrete completion
handler, though they're still welcome to track state via ccb_state.

Sponsored by:	Netflix
2018-05-01 21:42:27 +00:00
Navdeep Parhar
e1320420d5 cxgbe(4): Move all TCAM filter code into a separate file.
Sponsored by:	Chelsio Communications
2018-05-01 20:17:22 +00:00
Scott Long
eed99e7557 cam_periph_runccb() changed several years ago to overwrite the ccb callback
pointer.  It's now unhelpful and misleading for callers to continue to set
it, so bring all callers into conformance.  There's no real functional change,
but it makes reading the code a lot less confusing.

Sponsored by:	Netflix
2018-05-01 20:09:29 +00:00
Jung-uk Kim
835b56bfeb MFV: r333077
Merge ACPICA 20180427.
2018-05-01 19:17:38 +00:00
Eric Joyner
ceebc2f348 ixl(4): Update to 1.9.9-k
Refresh upstream driver before impending conversion to iflib.

Major changes:

- Support for descriptor writeback mode (required by ixlv(4) for AVF support)
- Ability to disable firmware LLDP agent by user (PR 221530)
- Fix for TX queue hang when using TSO (PR 221919)
- Separate descriptor ring sizes for TX and RX rings

PR:		221530, 221919
Submitted by:	Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by:	#IntelNetworking
MFC after:	1 day
Relnotes:	Yes
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D14985
2018-05-01 18:50:12 +00:00