Commit Graph

122105 Commits

Author SHA1 Message Date
Stephen Hurd
5ee36c68e1 Work around lack of TX IRQs in iflib for netmap
When poll() is called via netmap, txsync is initially called,
and if there are no available buffers to reclaim, it waits for the driver
to notify of new buffers. Since the TX IRQ is generally not used in iflib
drivers, this ends up causing a timeout.

Work around this by having the reclaim DELAY(1) if it's initially unable
to reclaim anything, then schedule the tx task, which will spin by
continuously rescheduling itself until some buffers are reclaimed. In
general, the delay is enough to allow some buffers to be reclaimed, so
spinning is minimized.

Reported by:	Johannes Lundberg <johalun0@gmail.com>
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15455
2018-05-16 21:03:22 +00:00
Conrad Meyer
4812c5c5e4 teken: Unbreak syscons' use of teken
Only vt(4) initializes these callbacks non-NULL at this time, so invoke the
function pointers conditionally.

Broken in r333669.

Submitted by:	bde@
2018-05-16 18:12:49 +00:00
Navdeep Parhar
47ae7a7e4f cxgbe(4): Fall back to a failsafe configuration built into the firmware
if an error is reported while pre-processing the configuration file that
the driver attempted to use.

Also, allow the user to explicitly use the built-in configuration with
hw.cxgbe.config_file="built-in"

MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-05-16 17:55:16 +00:00
John Baldwin
964b24b42d Include kernel modules for MALTA kernels.
Sponsored by:	DARPA / AFRL
2018-05-16 17:54:40 +00:00
John Baldwin
ca75fa17ee Export a breakpoint() function to userland for riscv.
As a result, enable tests using breakpoint() on riscv.

Reviewed by:	br
Differential Revision:	https://reviews.freebsd.org/D15191
2018-05-16 16:56:35 +00:00
Ed Maste
c9a1264c61 Clean up vt source whitespace issues 2018-05-16 11:19:03 +00:00
Jean-Sébastien Pédron
4e5a8fdbff vt(4): Resume vt_timer() in vtterm_post_input() only
There is no need to try to resume it after each smaller operations
(putchar, cursor_position, copy, fill).

The resume function already checks if the timer is armed before doing
anything, but it uses an atomic cmpset which is expensive. And resuming
the timer at the end of input processing is enough.

While here, we also skip timer resume if the input is for another
windows than the currently displayed one. I.e. if `ttyv0` is currently
displayed, any changes to `ttyv1` shouldn't resume the timer (which
would refresh `ttyv0`).

By doing the same benchmark as r333669, I get:
  * vt(4), before r333669:  1500 ms
  * vt(4), with this patch:  760 ms
  * syscons(4):              700 ms
2018-05-16 10:08:50 +00:00
Jean-Sébastien Pédron
547e74a8be teken, vt(4): New callbacks to lock the terminal once
... to process input, instead of inside each smaller operations such as
appending a character or moving the cursor forward.

In other words, before we were doing (oversimplified):

  teken_input()
    <for each input character>
      vtterm_putchar()
        VTBUF_LOCK()
        VTBUF_UNLOCK()
      vtterm_cursor_position()
        VTBUF_LOCK()
        VTBUF_UNLOCK()

Now, we are doing:

  vtterm_pre_input()
    VTBUF_LOCK()
  teken_input()
    <for each input character>
      vtterm_putchar()
      vtterm_cursor_position()
  vtterm_post_input()
    VTBUF_UNLOCK()

The situation was even worse when the vtterm_copy() and vtterm_fill()
callbacks were involved.

The new callbacks are:
  * struct terminal_class->tc_pre_input()
  * struct terminal_class->tc_post_input()

They are called in teken_input(), surrounding the while() loop.

The goal is to improve input processing speed of vt(4). As a benchmark,
here is the time taken to write a text file of 360 000 lines (26 MiB) on
`ttyv0`:

  * vt(4), unmodified:      1500 ms
  * vt(4), with this patch: 1200 ms
  * syscons(4):              700 ms

This is on a Haswell laptop with a GENERIC-NODEBUG kernel.

At the same time, the locking is changed in the vt_flush() function
which is responsible to draw the text on screen. So instead of
(indirectly) using VTBUF_LOCK() just to read and reset the dirty area
of the internal buffer, the lock is held for about the entire function,
including the drawing part.

The change is mostly visible while content is scrolling fast: before,
lines could appear garbled while scrolling because the internal buffer
was accessed without locks (once the scrolling was finished, the output
was correct). Now, the scrolling appears correct.

In the end, the locking model is closer to what syscons(4) does.

Differential Revision:	https://reviews.freebsd.org/D15302
2018-05-16 09:01:02 +00:00
Andriy Gapon
c9c4d38aa8 followup to r332730/r332752: set kdb_why to "trap" for fatal traps
This change updates arm, arm64 and mips achitectures.  Additionally, it
removes redundant checks for kdb_active where it already results in
kdb_reenter() and adds kdb_reenter() calls where they were missing.

Some architectures check the return value of kdb_trap(), but some don't.
I haven't changed any of that.

Some trap handling routines have a return code.  I am not sure if I
provided correct ones for returns after kdb_reenter().  kdb_reenter
should never return unless kdb_jmpbufp is NULL for some reason.

Only compile tested for all affected architectures.  There can be bugs
resulting from my poor understanding of architecture specific details.

Reported by:	jhb
Reviewed by:	jhb, eadler
MFC after:	4 weeks
Differential Revision: https://reviews.freebsd.org/D15431
2018-05-16 06:52:08 +00:00
Ed Maste
d6b97a6497 Attempt to fix build by removing EOF backslash-newline
GCC complains:
In file included from .../sys/dev/usb/input/uhid.c:77:
.../usb_rdesc.h:280:37: error: backslash-newline at end of file
2018-05-16 03:17:37 +00:00
Andrew Gallatin
9a6981ef5b Unhook DEBUG_BUFRING from INVARIANTS
Some of the DEBUG_BUFRING checks are racy, and can lead to
spurious assertions when run under high load.  Unhook these
from INVARIANTS until the author can fix or remove them.

Reviewed by:	mmacy
Sponsored by:	Netflix
2018-05-15 23:55:38 +00:00
Navdeep Parhar
b6ce1d5b1d cxgbe(4): Add support for two more flash parts.
Obtained from:	Chelsio Communications
MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-05-15 22:26:09 +00:00
Warner Losh
d9a7a61b2b Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.) This extends
the method used in da to ada, nda, and mda.

Sponsored by: Netflix
Submitted by: Chuck Silvers
2018-05-15 22:22:10 +00:00
Navdeep Parhar
40f242e440 cxgbe(4): Claim some more T5 and T6 boards.
MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-05-15 21:54:59 +00:00
Warner Losh
0eedd21317 Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.)

Sponsored by: Netflix
Submitted by: Chuck Silvers
Differential Revision: https://reviews.freebsd.org/D15435
2018-05-15 21:25:35 +00:00
Marius Strobl
646fd30caf - If present, take advantage of the R/W cache of eMMC revision 1.5 and
later devices. These caches work akin to the ones found in HDDs/SSDs
  that ada(4)/da(4) also enable if existent, but likewise increase the
  likelihood of data loss in case of a sudden power outage etc. On the
  other hand, write performance is up to twice as high for e. g. 1 GiB
  files depending on the actual chip and transfer mode employed.
  For maximum data integrity, the usage of eMMC caches can be disabled
  via the hw.mmcsd.cache tunable.
- Get rid of the NOP mmcsd_open().
2018-05-15 21:15:09 +00:00
Marius Strobl
e388d638b1 Restore style(9) conformance after r320844 (actually requested pre-
commit) and bring the r320844 additions in line with existing bits.
2018-05-15 21:07:11 +00:00
Rick Macklem
0ebe2634be End grace for the NFSv4 server if all mounts do ReclaimComplete.
The NFSv4 protocol requires that the server only allow reclaim of state
and not issue any new open/lock state for a grace period after booting.
The NFSv4.0 protocol required this grace period to be greater than the
lease duration (over 2minutes). For NFSv4.1, the client tells the server
that it has done reclaiming state by doing a ReclaimComplete operation.
If all NFSv4 clients are NFSv4.1, the grace period can end once all the
clients have done ReclaimComplete, shortening the time period considerably.
This patch does this. If there are any NFSv4.0 mounts, the grace period
will still be over 2minutes.
This change is only an optimization and does not affect correct operation.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-15 20:28:50 +00:00
Brooks Davis
514ef08caf Unwrap a line that no longer requires wrapping. 2018-05-15 20:14:38 +00:00
Brooks Davis
423349f696 Remove stray tabs from in_lltable_dump_entry(). 2018-05-15 20:13:00 +00:00
Brooks Davis
9c082ac517 Unwrap some not-so-long lines now that extra tabs been removed. 2018-05-15 17:59:46 +00:00
Brooks Davis
04f0d3db4a Remove stray tabs in in6_lltable_dump_entry(). NFC. 2018-05-15 17:57:46 +00:00
Antoine Brodin
147d12a7d3 vmmdev: return EFAULT when trying to read beyond VM system memory max address
Currently, when using dd(1) to take a VM memory image, the capture never ends,
reading zeroes when it's beyond VM system memory max address.
Return EFAULT when trying to read beyond VM system memory max address.

Reviewed by:	imp, grehan, anish
Approved by:	grehan
Differential Revision:	https://reviews.freebsd.org/D15156
2018-05-15 17:20:58 +00:00
Andriy Gapon
7c5ccd2dce calibrate lapic timer in native_lapic_setup
The idea is to calibrate the LAPIC timer just once and only on boot,
given that [at present] the timer constants are global and shared
between all processors.

My primary motivation is to fix a panic that can happen when dynamically
switching to lapic timer.  The panic is caused by a recursion on
et_hw_mtx when printing the calibration results to console.  See the
review for the details of the panic.

Also, the code should become slightly simpler and easier to read.  The
previous code was racy too.  Multiple processors could start calibrating
the global constants concurrently, although that seems to have been
benign.

Reviewed by:	kib, mav, jhb
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D15422
2018-05-15 16:56:30 +00:00
Stephen Hurd
f2cf90e264 Check that ifma_protospec != NULL in inm_lookup
If ifma_protospec is NULL when inm_lookup() is called, there
is a dereference in a NULL struct pointer. This ensures that struct is
not NULL before comparing the address.

Reported by:	dumbbell
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15440
2018-05-15 16:54:41 +00:00
Andrew Turner
25964cd229 Increase the number of pages we allocate in the arm64 early boot. We are
already close to the limit so increasing the kernel size may cause it to
fail to boot when it runs past the end of allocated memory.

Reported by:	manu
Sponsored by:	DARPA, AFRL
2018-05-15 16:44:35 +00:00
Brooks Davis
e15f0023ed Allow freebsd32 __sysctl(2) to return ENOMEM.
This is required by programs like sockstat that read variably sized
sysctls such as kern.file.  The normal path has no such restriction and
the restriction was added without comment along with initial support for
freebsd32 in 2002 (r100384).

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15438
2018-05-15 16:24:58 +00:00
Hans Petter Selasky
e757cb8ecb Add new USB HID driver for Super Nintendo gamepads.
Differential Revision:	https://reviews.freebsd.org/D15385
Submitted by:		johalun@gmail.com (Johannes Lundberg)
Sponsored by:		Mellanox Technologies
2018-05-15 15:36:34 +00:00
Edward Tomasz Napierala
4e6e77b74f Fix sysctl description.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-05-15 15:11:52 +00:00
Sean Bruno
94f8b882e5 igb(4):
I210 restore functionality if pxeboot rom is enabled on this device.

r333345 attempted to determine if this code was needed or it was some kind
of work around for a problem.  Turns out, its definitely a work around for
hardware locking and synchronization that manifests itself if the option
Rom is enabled and is selected as a boot device (there was a PXE attempt).

Reviewed by:	mmacy
Differential Revision:	https://reviews.freebsd.org/D15439
2018-05-15 13:30:59 +00:00
Andriy Gapon
873c2703d8 Fix 'zpool create -t <tempname>'
Creating a pool with a temporary name fails when we also specify custom
dataset properties: this is because we mistakenly call
zfs_set_prop_nvlist() on the "real" pool name which, as expected,
cannot be found because the SPA is present in the namespace with the
temporary name.

Fix this by specifying the correct pool name when setting the dataset
properties.

Author: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

Obtained from:	ZFS on Linux, zfsonlinux/zfs@4ceb8dd6fd
MFC after:	1 week
2018-05-15 13:27:29 +00:00
Hans Petter Selasky
984507528d Add support for setting type of service, TOS, for outgoing RDMA connections
in the krping kernel test utility.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-05-15 07:46:24 +00:00
Navdeep Parhar
b3daa684d8 cxgbe(4): Filtering related features and fixes.
- Driver support for hardware NAT.
- Driver support for swapmac action.
- Validate a request to create a hashfilter against the filter mask.
- Add a hashfilter config file for T5.

Sponsored by:	Chelsio Communications
2018-05-15 04:24:38 +00:00
Ed Maste
ea0939f0af subr_pidctrl: use standard 2-Clause FreeBSD license and disclaimer
Approved by:	jeff
2018-05-15 00:50:09 +00:00
Marius Strobl
7217ea7c81 Let mmcsd_ioctl() ensure appropriate privileges via priv_check(9). 2018-05-14 21:57:45 +00:00
Marius Strobl
aafdd1d65a The broken DDR52 support of Intel Bay Trail eMMC controllers rumored
in the commit log of r321385 has been confirmed via the public VLI54
erratum. Thus, stop advertising DDR52 for these controllers.
Note that this change should hardly make a difference in practice as
eMMC chips from the same era as these SoCs most likely support HS200
at least, probably even up to HS400ES.
2018-05-14 21:46:06 +00:00
Stephen Hurd
99031b8f7d Replace rmlock with epoch in lagg
Use the new epoch based reclamation API. Now the hot paths will not
block at all, and the sx lock is used for the softc data.  This fixes LORs
reported where the rwlock was obtained when the sxlock was held.

Submitted by:	mmacy
Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15355
2018-05-14 20:06:49 +00:00
John Baldwin
0b3e6e4c50 Make the common interrupt entry point labels local labels.
Kernel debuggers depend on symbol names to find stack frames with a
trapframe rather than a normal stack frame.  The labels used for the
shared interrupt entry point for the PTI and non-PTI cases did not
match the existing patterns confusing debuggers.  Add the '.L' prefix
to mark these symbols as local so they are not visible in the symbol
table.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-05-14 17:27:53 +00:00
Michael Tuexen
482d420926 sctp_get_mbuf_for_msg() should honor the allinone parameter.
When it is not required that the buffer is not a chain, return
a chain. This is based on a patch provided by Irene Ruengeler.
2018-05-14 15:16:51 +00:00
Michael Tuexen
589c42c2c8 Ensure that the MTU's used are multiple of 4.
The length of SCTP packets is always a multiple of 4. Therefore,
ensure that the MTUs used are also a multiple of 4.

Thanks to Irene Ruengeler for providing an earlier version of this
patch.

MFC after:	1 week
2018-05-14 13:50:17 +00:00
Matt Macy
102ccac21c hwpmc: don't reference domain index with no memory backing it
On multi-socket the domain will be correctly set for a given CPU
regardless of whether or not NUMA is enabled.

Approved by:	sbruno
2018-05-14 06:11:25 +00:00
Nathan Whitehorn
b00df92b1f Final fix for alignment issues with the page table first patched with
r333273 and partially reverted with r333594.

Older CPUs implement addition of offsets into the page table by a
bitwise OR rather than actual addition, which only works if the table is
aligned at a multiple of its own size (they also require it to be aligned
at a multiple of 256KB). Newer ones do not have that requirement, but it
hardly matters to enforce it anyway.

The original code was failing on newer systems with huge amounts of RAM
(> 512 GB), in which the page table was 4 GB in size. Because the
bootstrap memory allocator took its alignment parameter as an int, this
turned into a 0, removing any alignment constraint at all and making
the MMU fail. The first round of this patch (r333273) fixed this case by
aligning it at 256 KB, which broke older CPUs. Fix this instead by widening
the alignment parameter.
2018-05-14 04:00:52 +00:00
Matt Macy
8fa7df3668 pmc: don't add pmc owner to list until setup is complete
Once a pmc owner is added to the pmc_ss_owners list it is
visible for all to see. We don't want this to happen until
setup is complete.

Reported by:	mjg
Approved by:	sbruno
2018-05-14 01:08:47 +00:00
Matt Macy
eae1b7dfa1 pmc: fix buildworld
hid ck_queue.h from user

Approved by:	sbruno
2018-05-14 00:56:33 +00:00
Matt Macy
0f00315cb3 hwpmc: fix load/unload race and vm map LOR
- fix load/unload race by allocating the per-domain list structure at boot

- fix long extant vm map LOR by replacing pmc_sx sx_slock with global_epoch
  to protect the liveness of elements of the pmc_ss_owners list

Reported by:	pho
Approved by:	sbruno
2018-05-14 00:21:04 +00:00
Matt Macy
0c58f85b8d epoch(9): allow sx locks to be held across epoch_wait()
The INVARIANTS checks in epoch_wait() were intended to
prevent the block handler from returning with locks held.
What it in fact did was preventing anything except Giant
from being held across it. Check that the number of locks
held has not changed instead.

Approved by:	sbruno@
2018-05-14 00:14:00 +00:00
Nathan Whitehorn
b9ff14e6e9 Revert changes to hash table alignment in r333273, which booting on all G5
systems, pending further analysis.
2018-05-13 23:56:43 +00:00
Rick Macklem
8932a4835f Fix the eir_server_scope reply argument for NFSv4.1 ExchangeID.
In the reply to an ExchangeID operation, the NFSv4.1 server returns a
"scope" value (eir_server_scope). If this value is the same, it indicates
that two servers share state, which is never the case for FreeBSD servers.
As such, the value needs to be unique and it was without this patch.
However, I just found out that it is not supposed to change when the
server reboots and without this patch, it did change.
This patch fixes eir_server_scope so that it does not change when the
server is rebooted.
The only affect not having this patch has is that Linux clients don't
reclaim opens and locks after a server reboot, which meant they lost
any byte range locks held before the server rebooted.
It only affects NFSv4.1 mounts and the FreeBSD NFSv4.1 client was not
affected by this bug.

MFC after:	1 week
2018-05-13 23:38:01 +00:00
Matt Macy
1f4beb6312 epoch(9): cleanups, additional debug checks, and add global_epoch
- GC the _nopreempt routines
    - to really benefit we'd need a separate routine
    - they're not currently in use
    - they complicate the API for no benefit at this time

- check that we're actually in a epoch section at exit

- handle epoch_call() early in boot

- Fix copyright declaration language

Approved by:	sbruno@
2018-05-13 23:24:48 +00:00
Konstantin Belousov
3f3a2d0f8d Fix PMC_IN_TRAP_HANDLER() for i386 after the 4/4 split.
Sponsored by:	The FreeBSD Foundation
2018-05-13 20:10:02 +00:00
Fedor Uporov
6d4a4ed747 Fix directory blocks checksumming.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15396
2018-05-13 19:48:30 +00:00
Fedor Uporov
c4aa9a026d Fix on-disk inode checksum calculation logic.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15395
2018-05-13 19:29:35 +00:00
Fedor Uporov
e06e5241a0 Fix EXT2FS_DEBUG definition usage.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15394
2018-05-13 19:19:10 +00:00
Mark Johnston
36f8fe9bbb Get rid of vm_pageout_page_queued().
vm_page_queue(), added in r333256, generalizes vm_pageout_page_queued(),
so use it instead.  No functional change intended.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D15402
2018-05-13 13:00:59 +00:00
Rick Macklem
0f13d146a0 Fix a slow leak of session structures in the NFSv4.1 server.
For a fairly rare case of a client doing an ExchangeID after a hard reboot,
the old confirmed clientid still exists, but some clients use a new
co_verifier. For this case, the server was not freeing up the sessions on
the old confirmed clientid.
This patch fixes this case. It also adds two LIST_INIT() macros, which are
actually no-ops, since the structure is malloc()d with M_ZERO so the pointer
is already set to NULL.
It should have minimal impact, since the only way I could exercise this
code path was by doing a hard power cycle (pulling the plus) on a machine
running Linux with a NFSv4.1 mount on the server.
Originally spotted during testing of the ESXi 6.5 client.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-13 12:42:53 +00:00
Rick Macklem
bb3436966a The NFSv4.1 server should return NFSERR_BACKCHANBUSY instead of NFS_OK.
When an NFSv4.1 session is busy due to a callback being in progress,
nfsrv_freesession() should return NFSERR_BACKCHANBUSY instead of NFS_OK.
The only effect this has is that the DestroySession operation will report
the failure for this case and this probably has little or no effect on a
client. Spotted by inspection and no failures related to this have been
reported.

MFC after:	2 months
2018-05-13 12:29:09 +00:00
Konstantin Belousov
2ebc882927 Detect and optimize reads from the hole on UFS.
- Create getblkx(9) variant of getblk(9) which can return error.
- Add GB_NOSPARSE flag for getblk()/getblkx() which requests that BMAP
  was performed before the buffer is created, and EJUSTRETURN returned
  in case the requested block does not exist.
- Make ffs_read() use GB_NOSPARSE to avoid instantiating buffer (and
  allocating the pages for it), copying from zero_region instead.

The end result is less page allocations and buffer recycling when a
hole is read, which is important for some benchmarks.

Requested and reviewed by:	jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D14917
2018-05-13 09:47:28 +00:00
Matt Macy
f1401123c5 hwpmc/epoch - don't reference domain if NUMA is not set
It appears that domain information is set correctly independent
of whether or not NUMA is defined. However, there is no memory
backing secondary domains leading to allocation failure.

Reported by:	pho@, np@
Approved by:	sbruno@
2018-05-12 20:00:29 +00:00
Mark Johnston
5f05bda607 DTrace aarch64: Avoid calling unwind_frame() in the probe context.
unwind_frame() may be instrumented by FBT, leading to recursion into
dtrace_probe(). Manually inline unwind_frame() as we do with stack
unwinding code for other architectures.

Submitted by:	Domagoj Stolfa
Reviewed by:	manu
MFC after:	1 week
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D15359
2018-05-12 15:35:26 +00:00
Emmanuel Vadot
dfb8c122c9 aw_mmc: Rework regulator handling
Don't enable regulator on attach but dealt with them on power_up/power_off
Only set the voltage for the signaling regulator since I don't have boards
that can change the supply voltage.
Enable 1.8v signaling voltage.
2018-05-12 13:14:01 +00:00
Emmanuel Vadot
35a186191f aw_mmc: Do not fully init the controller in attach
Only do a reset of the controller at attach and init it at power_up.
We use to enable some interrupts in reset, only enable the interrupts
we are interested in when doing a request.
While here remove the regulators handling in power_on as it is very wrong
and will be dealt with in another commit.

Tested on: A31, A64
2018-05-12 13:13:34 +00:00
Emmanuel Vadot
2445c37a24 aw_mmc: Remove hardware reset
From all the BSP (Board Source Package) source that I've looked at it seems
that it's never done, remove it.

Tested On: A31, A64
2018-05-12 13:12:59 +00:00
Emmanuel Vadot
a37d59c145 aw_mmc: Read interrupt register value before writing to it
Reported by: jmcneill
2018-05-12 13:12:26 +00:00
Konstantin Belousov
a9c53bbb24 Kernel entry from vm86 mode, where PCB_VM86CALL pcb flag is not set,
is executed on the right stack already.  No copy from the entry stack
to the kstack must be performed for vm86 bios call code to function.

To access the pcb flags on kernel entry, unconditionally switch to
kernel address space if vm86 mode is detected.

This fixes very early vm86 bios calls, typically done when boot is
performed by boot2 without loader, and kernel falls back to BIOS calls
to get SMAP.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
2018-05-12 11:06:59 +00:00
Konstantin Belousov
801bf88ce3 On return from exception or interrupt, returns to vm86 mode with
PCB_VM86CALL pcb flag not set should be treated same as return to
userspace.

Most important, the address space must be switched.  This fixes
usermode vm86 operations after the 4/4 split.

Sponsored by:	The FreeBSD Foundation
2018-05-12 11:02:39 +00:00
Konstantin Belousov
507e50d5f9 Initialize tramp_idleptd during cold pmap startup, before the
exception code is copied to the trampoline.

The correct value is then copied to trampoline automatically, so
tramp_idleptd_reloced can be eliminated.

This will allow to use the same exception entry code to handle traps
from vm86 bios calls on early boot stage, as after the trampoline is
configured.

Sponsored by:	The FreeBSD Foundation
2018-05-12 10:57:34 +00:00
Konstantin Belousov
6652b9d9ea Create a macro for PIC code which loads %cr3 from tramp_idleptd.
Sponsored by:	The FreeBSD Foundation
2018-05-12 10:51:50 +00:00
Konstantin Belousov
2017ad1e81 Fix use of the custom TSS on i386 after the 4/4 split.
Record common_tssd, the descriptor to be written in GDT to point to
the common TSS, before LTR is executed.  The LTR instruction sets the
loaded descriptor type to 386 TSS busy, which traps on reloads.

Sponsored by:	The FreeBSD Foundation
2018-05-12 10:48:53 +00:00
Matt Macy
d626a614b9 hwpmc(9): clear remaining sample work for hardclock
- fix last minute change in 333509 where by runcount references
  to a pmc would remaining causing us to pause loop forever

Approved by:	sbruno
2018-05-12 03:45:30 +00:00
Warner Losh
794af7cfdc Remove extra copy of bcopy.c now that we're using the libkern version
of this file.
2018-05-12 01:43:32 +00:00
Matt Macy
e6b475e0af hwpmc(9): Make pmclog buffer pcpu and update constants
On non-trivial SMP systems the contention on the pmc_owner mutex leads
to a substantial number of samples captured being from the pmc process
itself. This change a) makes buffers larger to avoid contention on the
global list b) makes the working sample buffer per cpu.

Run pmcstat in the background (default event rate of 64k):
pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 &

Before:
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total

After:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total

For more realistic overhead measurement set the sample rate for ~2khz
on a 2.1Ghz processor:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &

Collecting 10 samples of `make -j96 buildkernel` from each:

x before
+ after

real time:
    N           Min           Max        Median           Avg        Stddev
x  10          76.4        127.62        84.845        88.577     15.100031
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        -28.398 +/- 10.0344
        -32.0602% +/- 7.69825%
        (Student's t, pooled s = 10.6794)

system time:
    N           Min           Max        Median           Avg        Stddev
x  10       2277.96       6948.53       2949.47      3341.492     1385.2677
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        -2277.47 +/- 920.425
        -68.1574% +/- 8.77623%
        (Student's t, pooled s = 979.596)

x no pmc
+ pmc running
real time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10          76.4        127.62        84.845        88.577     15.100031
Difference at 95.0% confidence
        29.73 +/- 10.0335
        50.5208% +/- 17.0525%
        (Student's t, pooled s = 10.6785)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        1.332 +/- 0.248939
        2.2635% +/- 0.426506%
        (Student's t, pooled s = 0.264942)

system time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10       2277.96       6948.53       2949.47      3341.492     1385.2677
Difference at 95.0% confidence
        2309.97 +/- 920.443
        223.937% +/- 89.3039%
        (Student's t, pooled s = 979.616)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        32.493 +/- 16.0042
        3.15% +/- 1.5794%
        (Student's t, pooled s = 17.0331)

Reviewed by:	jeff@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15155
2018-05-12 01:26:34 +00:00
Rick Macklem
5d4835e4b7 Add support for the TestStateID operation to the NFSv4.1 server.
The Linux client now uses the TestStateID operation, so this patch adds
support for it to the NFSv4.1 server. The FreeBSD client never uses this
operation, so it should not be affected.

MFC after:	2 months
2018-05-11 22:16:23 +00:00
Stephen Hurd
b69888c28f Fix LORs in in6?_leave_group()
r333175 updated the join_group functions, but not the leave_group ones.

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15393
2018-05-11 21:42:27 +00:00
Konstantin Belousov
0de8041c8e Remove dead declaration.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-05-11 20:47:45 +00:00
Matt Macy
09f6ff4f1a iflib(9): Add support for cloning pseudo interfaces
Part 3 of many ...
The VPC framework relies heavily on cloning pseudo interfaces
(vmnics, vpc switch, vcpswitch port, hostif, vxlan if, etc).

This pulls in that piece. Some ancillary changes get pulled
in as a side effect.

Reviewed by:	shurd@
Approved by:	sbruno@
Sponsored by:	Joyent, Inc.
Differential Revision:	https://reviews.freebsd.org/D15347
2018-05-11 20:08:28 +00:00
Matt Macy
8dcbd0eae6 epoch(9): always set inited in epoch_init
- set inited in the !usedomains case

Reported by:	jhibbits
Approved by:	sbruno
2018-05-11 18:37:14 +00:00
Sean Bruno
3096900d09 vxge(4): deprecation notice
This hardware isn't totally ancient, about equal to a mxge(4) or mlx4en(4),
but the company was sold to Exar which then promptly exited the Ethernet
business so the card was commercially available for under 2 years. On deep
search, the only usage of these cards I found was by the importing of the
driver. There are code quality issues identified by Brooks and Hiren and
no visible use nor maintainership that warrant removal from FreeBSD 12.0.

Submitted by:	kbowling
Reviewed by:	gnn brooks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15363
2018-05-11 17:26:59 +00:00
Andrey V. Elsukov
e287c474be Apply the change from r272770 to if_ipsec(4) interface.
It is guaranteed that if_ipsec(4) interface is used only for tunnel
mode IPsec, i.e. decrypted and decapsultaed packet has its own IP header.
Thus we can consider it as new packet and clear the protocols flags.
This allows ICMP/ICMPv6 properly handle errors that may cause this packet.

PR:		228108
MFC after:	1 week
2018-05-11 16:50:25 +00:00
Kenneth D. Merry
e440863e06 Clear out the entire structure, not just the size of a pointer to it.
sys/dev/ocs/ocs_os.c:
	In ocs_thread_create(), use sizeof(*thread) (instead of
	sizeof(thread)) as the size argument to memset so that we clear
	out the entire thread structure instead of just a few bytes of it.

Submitted by:	jtl
MFC after:	3 days
2018-05-11 14:50:26 +00:00
Ed Maste
d2a80426b3 usbdevs: add new Microchip USB-Ethernet device IDs
LAN7800 USB 3.1 to 10/100/1000 Ethernet with PHY
LAN7801 USB 3.1 to 10/100/1000 Ethernet with RGMII interface

Also update manufacturer name for the Vendor ID.  Microchip acquired
SMSC in May 2012.

Sponsored by:	The FreeBSD Foundation
2018-05-11 13:09:21 +00:00
Mateusz Guzik
726f22e081 amd64: align the .data.exclusive_cache_line section to 128
This aligns the section itself compared to other sections, does not change
internal alignment of fields stored inside. This may or may not come later.

The motivation is partially combating adverse effects of the adjacent cache
line prefetcher. Without the annotation part of read_mostly section was on
the line of fire.
2018-05-11 08:56:39 +00:00
Matt Macy
4aa302dfc9 epoch(9): callback task fixes
- initialize the pcpu STAILQ in the NUMA case
- don't enqueue the callback task if there isn't sufficient work to be done

Reported by:	pho@
Approved by:	sbruno@
2018-05-11 08:16:56 +00:00
Mateusz Guzik
782e38aa48 uma: increase alignment to 128 bytes on amd64
Current UMA internals are not suited for efficient operation in
multi-socket environments. In particular there is very common use of
MAXCPU arrays and other fields which are not always properly aligned and
are not local for target threads (apart from the first node of course).
Turns out the existing UMA_ALIGN macro can be used to mostly work around
the problem until the code get fixed. The current setting of 64 bytes
runs into trouble when adjacent cache line prefetcher gets to work.

An example 128-way benchmark doing a lot of malloc/frees has the following
instruction samples:

before:
kernel`lf_advlockasync+0x43b            32940
          kernel`malloc+0xe5            42380
           kernel`bzero+0x19            47798
   kernel`spinlock_exit+0x26            60423
         kernel`0xffffffff80            78238
                         0x0           136947
   kernel`uma_zfree_arg+0x46           159594
 kernel`uma_zalloc_arg+0x672           180556
   kernel`uma_zfree_arg+0x2a           459923
 kernel`uma_zalloc_arg+0x5ec           489910

after:
            kernel`bzero+0xd            46115
kernel`lf_advlockasync+0x25f            46134
kernel`lf_advlockasync+0x38a            49078
   kernel`fget_unlocked+0xd1            49942
kernel`lf_advlockasync+0x43b            55392
          kernel`copyin+0x4a            56963
           kernel`bzero+0x19            81983
   kernel`spinlock_exit+0x26            91889
         kernel`0xffffffff80           136357
                         0x0           239424

See the review for more details.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D15346
2018-05-11 07:04:57 +00:00
Mateusz Guzik
85c1b3c1cb rmlock: partially depessimize lock/unlock fastpath
Previusly the slow path was folded in and partially jumped over in the
common case.
2018-05-11 06:59:54 +00:00
Matt Macy
5c30b378f0 Allow different bridge types to coexist
if_bridge has a lot of limitations that make it scale poorly to higher data
rates. In my projects/VPC branch I leverage the bridge interface between
layers for my high speed soft switch as well as for purposes of stacking
in general.

Reviewed by:	sbruno@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15344
2018-05-11 05:00:40 +00:00
Matt Macy
b2cb28963b epoch(9): fix priority handling, make callback lists pcpu, and other fixes
- Lend priority to preempted threads in epoch_wait to handle the case
  in which we've had priority lent to us. Previously we borrowed the
  priority of the lowest priority preempted thread. (pointed out by mjg@)

- Don't attempt allocate memory per-domain on powerpc, we don't currently
  handle empty sockets (as is the case on jhibbits Talos' board).

- Handle deferred callbacks as pcpu lists and poll the lists periodically.
  Currently the interval is 1/hz.

- Drop the thread lock when adaptive spinning. Holding the lock starves
  other threads and can even lead to lockups.

- Keep a generation count pcpu so that we don't keep spining if a thread
  has left and re-entered an epoch section.

- Actually removed the callback from the callback list so that we don't
  double free. Sigh ...

Approved by:	sbruno@
2018-05-11 04:54:12 +00:00
Matt Macy
ef7f29d8e6 Test priority handling in epoch test.
- Double the number of test threads to mp_ncpu*2
- Give each thread a different scheduling priority
2018-05-11 04:47:05 +00:00
Justin Hibbits
04de51dbab No need to bzero splpar_vpa entries
splpar_vpa is in the BSS, so is already zeroed when the kernel starts up.

Tested by:	Leandro Lupori
2018-05-11 02:04:01 +00:00
Dag-Erling Smørgrav
20f8d7bc7e Slight cleanup of interface event logging.
Make if_printf() use vlog() instead of vprintf().  This means it can no
longer return the number of characters printed, as it used to, but every
single call to if_printf() in the entire kernel ignores the return value
anyway; just return 0 so we don't have to change the prototype.

Consistently use if_printf() throughout sys/net/if.c, instead of a
mixture of if_printf() and log().

In ifa_maintain_loopback_route(), don't needlessly log an error if we
either failed to add a route because it already existed or failed to
remove one because it did not.  We still return an error code, though.

MFC after:	1 week
2018-05-11 00:19:49 +00:00
Dag-Erling Smørgrav
6bff85ff9a Reduce <sys/queue.h> pollution.
While <sys/sysctl.h> includes <sys/queue.h> unconditionally, it is only
actually used in code which is conditional on _KERNEL.  Make the #include
itself conditional as well, and fix userland code that uses <sys/queue.h>
for other purposes but relied on <sys/sysctl.h> to bring it in.

MFC after:	1 week
2018-05-11 00:01:43 +00:00
Navdeep Parhar
f348cdad1a cxgbe(4): Add fields to support configuration of hardware NAT and
swapmac (SMAC/DMAC switcheroo) from userspace.

Sponsored by:	Chelsio Communications
2018-05-10 20:39:04 +00:00
Ed Maste
ff8f1e8332 Error out on attempt to link amd64 kernel with old binutils linker
As of r333461 we require ifunc support to link a working amd64 kernel.
The default in-tree bootstrap linker is lld and it has the required
support, as does any modern out-of-tree binutils linker.  The in-tree
GNU ld is from binutils 2.17.50 and it does not have ifunc support,
so produce an error rather than a broken kernel.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D15378
2018-05-10 20:10:02 +00:00
Matt Macy
7bf272a612 Allocate epoch for networking at startup
Additionally add CK to include paths for modules

Approved by:	sbruno@
2018-05-10 19:13:00 +00:00
Matt Macy
06bf2a6aef Add simple preempt safe epoch API
Read locking is over used in the kernel to guarantee liveness. This API makes
it easy to provide livenes guarantees without atomics.

Includes epoch_test kernel module to stress test the API.

Documentation will follow initial use case.

Test case and improvements to preemption handling in response to discussion
with mjg@

Reviewed by:	imp@, shurd@
Approved by:	sbruno@
2018-05-10 17:55:24 +00:00
Li-Wen Hsu
137c41d763 Fix build for platforms using GCC:
- Remove unused or dead store variable
- Remove unused function ctl_copyin_alloc
- Add missing curly brackets, this seems a regression in r287720

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D15383
2018-05-10 17:22:04 +00:00
Jean-Sébastien Pédron
5e251aec86 vt(4): Use default VGA palette
Before this change, the VGA palette was configured to match the shell
palette (e.g. color #1 was red). There was one glitch early in boot when
the vt(4)'s VGA palette was loaded: the loader's logo would switch from
red to blue. Likewise for the "Booting..." message switching from blue
to red. That's because the loader's logo was drawed with the default VGA
palette where a few colors are swapped compared to the shell palette
(e.g. blue <-> red).

This change configures the default VGA palette during initialization and
converts input's colors from shell to VGA palette index.

There should be no visible changes, except the loader's logo which will
keep its original color.

Reviewed by:	eadler
2018-05-10 17:00:33 +00:00
Jean-Sébastien Pédron
e525438697 vt(4): Put for() loop outside switch() in vt_generate_cons_palette()
This makes it more logical:
 1. It checks the requested color format
 2. It fills the palette accordingly

Also vt_palette_init() is only called when needed (i.e. when the format
is `COLOR_FORMAT_RGB`).
2018-05-10 16:41:47 +00:00
Andrew Gallatin
66fe09d8d9 Fix a panic in the IPv6 multicast code.
Use LIST_FOREACH_SAFE in in6m_disconnect() since we're
deleting and freeing item from the membership list
while traversing the list.

Reviewed by:	mmacy
Sponsored by:	Netflix
2018-05-10 16:19:41 +00:00
Konstantin Belousov
8b4fc8b11c Make fpusave() and fpurestore() on amd64 ifuncs.
From now on, linking amd64 kernel requires either lld or newer ld.bfd.

Reviewed by:	jhb (as part of the large patch)
Discussed with:	emaste
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13838
2018-05-10 15:01:43 +00:00
Andrew Gallatin
d5cdcc3a06 Fix the build after r333457
In r333457, the arguments to kern_pwritev() were accidentally
re-ordered as part of ANSIfication, breaking the build.
2018-05-10 13:19:42 +00:00
Ed Maste
cc3c9df80f ANSIfy sys_generic.c 2018-05-10 11:36:16 +00:00
Marcin Wojtas
2339f28c6e Do not pass header length to the ENA controller
Header length is optional hint for the ENA device. Because It is not
guaranteed that every packet header will be in the first mbuf
segment, it is better to skip passing any information. If the header
length will be indicating invalid value (different than 0), then the
packet will be dropped.

This kind situation can appear, when the UDP packet will be fragmented
by the stack in the ip_fragment() function.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reported by:  Krishna Yenduri <kyenduri@brkt.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
2018-05-10 09:37:54 +00:00
Emmanuel Vadot
43fd679efb arm64: Add ALT_BREAK_TO_DEBUGGER to GENERIC
It is useful to enter kdb with an escape sequence.
While here move the USB_DEBUG with the others debug options and define
nooptions USB_DEBUG for GENERIC-NODEBUG
2018-05-10 09:37:50 +00:00
Marcin Wojtas
dbf2eb543b Skip setting the MTU for ENA if it is not changing
On AWS, a network interface can get reinitialized every 30 minutes due
to the MTU being (re)set when a new DHCP lease is obtained. This can
cause packet drop, along with annoying syslog messages.

Skip setting the MTU in the ena driver if the new MTU is the same as the
old MTU. Note this fix is already in the netfront driver.

Testing: Verified ena up/down messages do not appear every 30 min in
/var/log/messages with the fix in place.

Submitted by:   Krishna Yenduri <kyenduri@brkt.com>
Reviewed by: Michal Krawczyk <mk@semihalf.com>
2018-05-10 09:32:59 +00:00
Marcin Wojtas
6461d6a396 Apply fixes in ena-com
* Change ena-com BIT macro to work on unsigned value.
  To make the shifting operations safer, they should be working on
  unsigned values.

* Fix a mutex not owned ASSERT panic in ENA control path.
  A thread calling cv_broadcast()/cv_signal() must hold the mutex used for
  cv_wait(). Fix the ENA control path code that has this problem.

Submitted by:   Krishna Yenduri <kyenduri@brkt.com>
Reviewed by:    Michal Krawczyk <mk@semihalf.com>
Tested by:      Michal Krawczyk <mk@semihalf.com>
2018-05-10 09:25:51 +00:00
Marcin Wojtas
fbb0ed71b2 Upgrade ENA version to v0.8.1
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
2018-05-10 09:06:21 +00:00
Xin LI
b6f7731dba Remove "All rights reserved" from my files.
See r333391 for the rationale.

MFC after:	1 week
2018-05-10 06:41:08 +00:00
Navdeep Parhar
f7a203bc21 cxgbe(4): Disable write-combined doorbells by default.
This had been the default behavior but was changed accidentally as part
of the recent iw_cxgbe+OFED overhaul.  Fix another bug in that change
while here: the global knob affects all the adapters in the system and
should be left alone by per-adapter code.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2018-05-10 06:33:54 +00:00
Justin Hibbits
b4a0a59871 Fix PPC symbol resolution
Summary:
There were 2 issues that were preventing correct symbol resolution
on PowerPC/pseries:

1- memory corruption at chrp_attach() - this caused the inital
   part of the symbol table to become zeroed, which would cause
   the kernel linker to fail to parse it.
   (this was probably zeroing out other memory parts as well)

2- DDB symbol resolution wasn't working because symtab contained
   not relocated addresses but it was given relocated offsets.
   Although relocating the symbol table fixed this, it broke the
   linker, that already handled this case.
   Thus, the fix for this consists in adding a new DDB macro:
   DB_STOFFS(offs) that converts a (potentially) relocated offset
   into one that can be compared with symbol table values.

PR:		227093
Submitted by:	Leandro Lupori <leandro.lupori_gmail.com>
Differential Revision: https://reviews.freebsd.org/D15372
2018-05-10 03:59:48 +00:00
Marcelo Araujo
8951f05525 Rework CTL frontend & backend options to use nv(3), allow creating multiple
ioctl frontend ports.

This revision introduces two changes to CTL:
- Changes the way options are passed to CTL_LUN_REQ and CTL_PORT_REQ ioctls.
  Removes ctl_be_arg structure and associated logic and replaces it with
  nv(3)-based logic for passing in and out arguments.
- Allows creating multiple ioctl frontend ports using either ctladm(8) or
  ctld(8).
  New frontend ports are represented by /dev/cam/ctl<pp>.<vp> nodes, eg /dev/cam/ctl5.3.
  Those device nodes respond only to CTL_IO ioctl.

New command-line options for ctladm:
# creates new ioctl frontend port with using free pp and vp=0
ctladm port -c
# creates new ioctl frontend port with pp=10 and vp=0
ctladm port -c -O pp=10
# creates new ioctl frontend port with pp=11 and vp=12
ctladm port -c -O pp=11 -O vp=12
# removes port with number 4 (it's a "targ_port" number, not pp number)
ctladm port -r -p 4

New syntax for ctl.conf:
target ... {
    port ioctl/<pp>
    ...
}

target ... {
    port ioctl/<pp>/<vp>
    ...

Note: Most of this work was made by jceel@, thank you.

Submitted by:	jceel
Reworked by:	myself
Reviewed by:	mav (earlier versions and recently during the rework)
Obtained from:  FreeNAS and TrueOS
Relnotes:	Yes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D9299
2018-05-10 03:50:20 +00:00
Warner Losh
3429b518c9 Remove unused bcopyb.
Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:54 +00:00
Warner Losh
baaa3c4d60 Simplify things a little
Rather than include a copy for memmove to call bcopy to call memcpy
(which handles overlapping copies), make memmove a strong reference to
memcpy to save the two calls.

Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:48 +00:00
Warner Losh
5aa07b053a Move MI-ish bcopy routine to libkern
riscv and powerpc have nearly identical bcopy.c that's
supposed to be mostly MI. Move it to the MI libkern.

Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:38 +00:00
Navdeep Parhar
5174205de5 cxgbe(4): Determine whether the firmware supports the FILTER2 work
request, which can be used to configure hardware NAT and swapmac.

All firmwares released after Jan 2017 support this work request.

Sponsored by:	Chelsio Communications
2018-05-10 00:04:14 +00:00
Mark Johnston
e3d5c4ade1 Remove "All rights reserved" from my files.
See r333391 for the rationale.

MFC after:	1 week
2018-05-09 20:57:18 +00:00
Mariusz Zaborski
31f7586d73 Introduce the 'n' flag for the geli attach command.
If the 'n' flag is provided the provided key number will be used to
decrypt device. This can be used combined with dryrun to verify if the key
is set correctly. This can be also used to determine which key slot we want to
change on already attached device.

Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D15309
2018-05-09 20:53:38 +00:00
Warner Losh
041f49aece Remove the 'All Rights Reserved' clause from some of the stuff I've
done for Netflix, since I'm in the neighborhood.
2018-05-09 20:32:23 +00:00
Warner Losh
33123867af Use the full year, for real this time. 2018-05-09 20:26:37 +00:00
Mark Johnston
b4fa90d6f9 Fix bxe(4) netdump rx polling.
Reviewed by:	cem, rstone
X-MFC with:	r333287
Sponsored by:	Dell EMC Isilon
2018-05-09 19:54:34 +00:00
Cy Schubert
4273f67609 Fix style error introduced in r333393.
Reported by:	jhb, imp, phk
MFC after:	6 days
X-MFC with:	r333393
2018-05-09 19:05:27 +00:00
Matt Macy
36688f706e Add taskqgroup_config_gtask_deinit to support teardown after
taskqgroup_config_gtask_init.

Approved by:	sbruno
2018-05-09 18:51:35 +00:00
Matt Macy
cbd92ce62e Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
Matt Macy
ca9551221b Remove bogus panic
r333345 added a panic to the default case statement on the incorrect
premise that it should "never happen" when in fact it is simply a
different adapter version.

Reported by:	markj
Approved by:	sbruno
2018-05-09 17:48:52 +00:00
Kyle Evans
f0fb94abca Standardize SPDX tag on files I've added 2018-05-09 16:52:28 +00:00
Kyle Evans
4b3c64f722 Remove "All Rights Reserved" on files that I hold sole copyright on
See r333391 for more detail; in summary: it holds no weight and may be
removed.
2018-05-09 16:44:19 +00:00
John Baldwin
485415ec47 Report TRAP_BRKPT for breakpoint traps on sparc64.
Reviewed by:	marius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D15190
2018-05-09 15:25:26 +00:00
Mateusz Guzik
20ca271fdd amd64: depessimize bcmp for small buffers
Adapt assembly generated by clang for memcmp and use it for <= 64 sized
compares (which are the vast majority).

Sample result of doing stats on Broadwell (% of samples):
before: 4.0 kernel     bcmp                 cache_lookup
after : 0.7 kernel     bcmp                 cache_lookup

The routine is most definitely still not optimal. Anyone interested in
spending time improving it is welcome to take over.

Reviewed by:	kib
2018-05-09 15:16:25 +00:00
Konstantin Belousov
55c9d75e6b Avoid calls to bzero() before ireloc.
Evaluate cpu_stdext_feature early to have moved link_elf_ireloc() see
correct flags, most important is SMAP.

Tested by:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D15367
2018-05-09 14:39:24 +00:00
Warner Losh
603bbd0631 Minor style nits
Use full copyright year.
Remove 'All Rights Reserved' from new file (rights holder OK'd)
Minor #ifdef motion and #endif tagging
Remove __FBSDID macro from comments

Sponsored by: Netflix
OK'd by: rrs@
2018-05-09 14:11:35 +00:00
Konstantin Belousov
71d1bbce91 Remove PG_U from the rest of the kernel pmap ptes.
Supposedly, they PG_U bits there were set to easier making some kernel
page accessible to userspace in-place.  Since it was not used for the
whole existence of the amd64 pmap.c and current design of the shared
pages prefers double-mapping over the in-place access, remove PG_U
both from the direct map and KVA slots.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-05-09 12:09:08 +00:00
Konstantin Belousov
5aaa5bc3d6 Remove PG_U from the recursive pte for kernel pmap' PML4 page.
This PML4 page is never used for the userspace process, so there is no
security implications.  But the configuration trips SMAP check, which
should be corrected.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-05-09 12:03:40 +00:00
Andrey V. Elsukov
782360dec3 Bring in some last changes in NAT64 implementation:
o Modify ipfw(8) to be able set any prefix6 not just Well-Known,
  and also show configured prefix6;
o relocate some definitions and macros into proper place;
o convert nat64_debug and nat64_allow_private variables to be
  VNET-compatible;
o add struct nat64_config that keeps generic configuration needed
  to NAT64 code;
o add nat64_check_prefix6() function to check validness of specified
  by user IPv6 prefix according to RFC6052;
o use nat64_check_private_ip4() and nat64_embed_ip4() functions
  instead of nat64_get_ip4() and nat64_set_ip4() macros. This allows
  to use any configured IPv6 prefixes that are allowed by RFC6052;
o introduce NAT64_WKPFX flag, that is set when IPv6 prefix is
  Well-Known IPv6 prefix. It is used to reduce overhead to check this;
o modify nat64lsn_cfg and nat64stl_cfg structures to use nat64_config
  structure. And respectivelly modify the rest of code;
o remove now unused ro argument from nat64_output() function;
o remove __FreeBSD_version ifdef, NAT64 was not merged to older versions;
o add commented -DIPFIREWALL_NAT64_DIRECT_OUTPUT flag to module's Makefile
  as example.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2018-05-09 11:59:24 +00:00
Andrey V. Elsukov
2e4531a12b Add IFCAP_LINKSTATE support to if_loop(4).
Reviewed by:	wollman
Obtained from:	Yandex LLC
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D15278
2018-05-09 10:50:51 +00:00
Hans Petter Selasky
c20feee43b Add myself to copyright in the LinuxKPI RCU support layer.
Suggested by:	mmacy@
Sponsored by:	Mellanox Technologies
2018-05-09 08:50:42 +00:00
Navdeep Parhar
89f651e704 cxgbe(4): Add support for hash filters.
These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool.  Hash and
normal TCAM filters can be used together.  The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters.  Any T5/T6 card with memory can support at
least half a million hash filters.  The sample config file with the
driver configures 512K of these, it is possible to double this to 1
million+ in some cases.

The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file.  The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto).  It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).

For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan

Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.

Hash filters support all actions supported by TCAM filters.  A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).

Sponsored by:	Chelsio Communications
2018-05-09 04:09:49 +00:00
Cy Schubert
bb7af25076 Document intentional fallthrough. (CID 976535)
MFC after:	1 week
2018-05-09 02:07:09 +00:00
Cy Schubert
8d3478a26f Fix memory leak. (CID 1199373).
MFC after:	1 week
2018-05-09 02:02:58 +00:00
Matt Macy
ad738f3791 Reduce overhead of ktrace checks in the common case.
KTRPOINT() checks both if we are tracing _and_ if we are recursing within
ktrace. The second condition is only ever executed if ktrace is actually
enabled. This change moves the check out of the hot path in to the functions
themselves.

Discussed with mjg@

Reported by:	mjg@
Approved by:	sbruno@
2018-05-09 00:00:47 +00:00
Sean Bruno
57b4936514 nxge(4):
Remove nxge(4) and associated man page and tools in FreeBSD 12.0.

Submitted by:	kbowling
Reviewed by:	brooks
Relnotes:	yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D1529
2018-05-08 21:14:29 +00:00
Michael Tuexen
45d41de5e6 Fix two typos reported by N. J. Mann, which were introduced in
https://svnweb.freebsd.org/changeset/base/333382 by me.

MFC after:	3 days
2018-05-08 20:39:35 +00:00
Michael Tuexen
9669e724d1 When reporting ERROR or ABORT chunks, don't use more data
that is guaranteed to be contigous.
Thanks to Felix Weinrank for finding and reporting this bug
by fuzzing the usrsctp stack.

MFC after:	3 days
2018-05-08 18:48:51 +00:00
Jung-uk Kim
e7dfa7d8ab MFV: r333378
Import ACPICA 20180508.
2018-05-08 18:18:27 +00:00
Stephen Hurd
ac88e6da11 iflib: print message when iflib_tx_structures_setup fails
Print a message when iflib_tx_structures_setup fails, like we do for
iflib_rx_structures_setup.

Now that we always print a message from within
iflib_qset_structures_setup when it fails, stop printing one in
iflib_device_register() at the call site.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15300
2018-05-08 17:15:10 +00:00
Konstantin Belousov
053641bb1c Prepare DB# handler for deferred trigger of watchpoints.
Since pop %ss/mov %ss instructions defer all interrupts and exceptions
for the next instruction, it is possible that the userspace watchpoint
trap executes on the first instruction of the kernel entry for
syscall/bpt.

In this case, DB# should be treated similarly to NMI: on amd64 we must
always load GSBASE even if the trap comes from kernel mode, and load
the kernel page table root into %cr3.  Moreover, the trap must
use the dedicated stack, because we are still on the user stack when
trapped on syscall entry.

For i386, we must reload %cr3.  The syscall instruction is not configured,
so there is no issue with executing on user stack when trapping.

Due to some CPU erratas it is not always possible to detect that the
userspace watchpoint triggered by inspecting %dr6.  In trap(), compare the
trap %rip with the known unsafe entry points and if matched pretend that
the watchpoint did not fire at all.

Thank you to the MSRC Incident Response Team, and in particular Greg
Lenti and Nate Warfield, for coordinating the response to this issue
across multiple vendors.

Thanks to Computer Recycling at The Working Center of Kitchener for
making hardware available to allow us to test the patch on additional
CPU families.

Reviewed by:	jhb
Discussed with:	Matthew Dillon
Tested by:	emaste
Sponsored by:	The FreeBSD Foundation
Security:	CVE-2018-8897
Security:	FreeBSD-SA-18:06.debugreg
2018-05-08 17:00:34 +00:00
Stephen Hurd
6108c01395 iflib: cleanup queues when iflib_device_register fail
Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15299
2018-05-08 16:56:02 +00:00
Justin Hibbits
151c44e22b Fix wrong cpu0 identification
Summary:
chrp_cpuref_init() was relying on the boot strap processor to be
the first child of /cpus. That was not always the case, specially
on pseries with FDT.

This change uses the "reg" property of each CPU instead and also
adds several sanity checks to avoid unexpected behavior (maybe
too many panics?).

The main observed symptom was interrupts being missed by the main
processor, leading to timeouts and the kernel aborting the boot.

Submitted by:	Leandro Lupori
Reviewed by:	nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15174
2018-05-08 13:23:39 +00:00
Hans Petter Selasky
306cf294b2 Fix for missing network interface address event when adding the default IPv6
based link-local address.

The default link local address for IPv6 is added as part of bringing the
network interface up. Move the call to "EVENTHANDLER_INVOKE(ifaddr_event,)"
from the SIOCAIFADDR_IN6 ioctl(2) handler to in6_notify_ifa() which should
catch all the cases of adding IPv6 based addresses to a network interface.
Add a witness warning in case the event handler is not allowed to sleep.

Reviewed by:	network (ae), kib
Differential Revision:	https://reviews.freebsd.org/D13407
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-05-08 11:39:01 +00:00
Matt Macy
10d20c84ed Fix spurious retransmit recovery on low latency networks
TCP's smoothed RTT (SRTT) can be much larger than an actual observed RTT. This can be either because of hz restricting the calculable RTT to 10ms in VMs or 1ms using the default 1000hz or simply because SRTT recently incorporated a larger value.

If an ACK arrives before the calculated badrxtwin (now + SRTT):
tp->t_badrxtwin = ticks + (tp->t_srtt >> (TCP_RTT_SHIFT + 1));

We'll erroneously reset snd_una to snd_max. If multiple segments were dropped and this happens repeatedly the transmit rate will be limited to 1MSS per RTO until we've retransmitted all drops.

Reported by:	rstone
Reviewed by:	hiren, transport
Approved by:	sbruno
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8556
2018-05-08 02:22:34 +00:00
Matt Macy
d5210708dd Sleep rather than spin in e1000 when doing long running config operations.
With r333218 it is now possible for drivers to use an sx lock and thus sleep while
waiting on long running operations rather than DELAY().

Reported by:	gallatin
Reviewed by:	sbruno
Approved by:	sbruno
MFC after:	1 month
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14984
2018-05-08 01:39:45 +00:00
Mateusz Guzik
2824088536 Inlined sched_userret.
The tested condition is rarely true and it induces a function call
on each return to userspace.

Bumps getuid rate by about 1% on Broadwell.
2018-05-07 23:36:16 +00:00