Commit Graph

131412 Commits

Author SHA1 Message Date
mmacy
80818b2b26 Fix i386 build
Move epoch_section to after td_emuldata, but note the 3 surrounding LP64 holes
while I'm here.

Approved by:	sbruno
2018-05-17 02:54:30 +00:00
lstewart
c5f2dadecb Plug a memory leak and potential NULL-pointer dereference introduced in r331214.
Each TCP connection that uses the system default cc_newreno(4) congestion
control algorithm module leaks a "struct newreno" (8 bytes of memory) at
connection initialisation time. The NULL-pointer dereference is only germane
when using the ABE feature, which is disabled by default.

While at it:

- Defer the allocation of memory until it is actually needed given that ABE is
  optional and disabled by default.

- Document the ENOMEM errno in getsockopt(2)/setsockopt(2).

- Document ENOMEM and ENOBUFS in tcp(4) as being synonymous given that they are
  used interchangeably throughout the code.

- Fix a few other nits also accidentally omitted from the original patch.

Reported by:	Harsh Jain on freebsd-net@
Tested by:	tjh@
Differential Revision:	https://reviews.freebsd.org/D15358
2018-05-17 02:46:27 +00:00
np
24435f6990 cxgbe(4): Allocate offload Tx queues when a card has resources
provisioned for NIC_ETHOFLD and the kernel has option RATELIMIT.

It is possible to use the chip's offload queues for normal NIC Tx and
not just TOE Tx.  The difference is that these queues support out of
order processing of work requests and have a per-"flowid" mechanism for
tracking credits between the driver and hardware.  This allows Tx for
any number of flows bound to different rate limits to be submitted to a
single Tx queue and the work requests for slow flows won't cause HOL
blocking for the rest.

Sponsored by:	Chelsio Communications
2018-05-17 01:42:18 +00:00
mmacy
8520e87bb7 epoch(9): make recursion lighter weight
There isn't any real work to do except bump td_epochnest when recursing.
Skip the additional work in this case.

Approved by:	sbruno
2018-05-17 01:13:40 +00:00
np
61e9a1a2fe cxgbe(4): Add NIC_ETHOFLD to the NIC capabilities allowed by the driver
by default.

This is the first of a series of commits that will add support for
RATELIMIT kernel option to the base if_cxgbe driver, for use with
ordinary NIC traffic "flows".  RATELIMIT is already supported by t4_tom
for the fully-offloaded TCP connections that it handles.

Sponsored by:	Chelsio Communications
2018-05-17 00:52:48 +00:00
mmacy
c6869bc0ff epoch(9): Guarantee forward progress on busy sections
Add epoch section to struct thread. We can use this to
ennable epoch counter to advance even if a section is
perpetually occupied by a thread.

Approved by:	sbruno
2018-05-17 00:45:35 +00:00
mckusick
4b68405396 Fix warning found by Coverity.
CID 1009353:  Error handling issues  (CHECKED_RETURN)
2018-05-16 23:42:02 +00:00
mckusick
6a5f395205 Revert change made in base r171522
(https://svnweb.freebsd.org/base?view=revision&revision=304232)
converting clrbuf() (which clears the entire buffer) to vfs_bio_clrbuf()
(which clears only the new pages that have been added to the buffer).

Failure to properly remove pages from the buffer cache can make
pages that appear not to need clearing to actually have bad random
data in them. See for example base r304232
(https://svnweb.freebsd.org/base?view=revision&revision=304232)
which noted the need to set B_INVAL and B_NOCACHE as well as clear
the B_CACHE flag before calling brelse() to release the buffer.

Rather than trying to find all the incomplete brelse() calls, it
is simpler, though more slightly expensive, to simply clear the
entire buffer when it is newly allocated.

PR: 213507
Submitted by: Damjan Jovanovic
Reviewed by:  kib
2018-05-16 23:30:03 +00:00
mmacy
b4ad383689 hwpmc: Implement per-thread counters for PMC sampling
This implements per-thread counters for PMC sampling. The thread
descriptors are stored in a list attached to the process descriptor.
These thread descriptors can store any per-thread information necessary
for current or future features. For the moment, they just store the counters
for sampling.

The thread descriptors are created when the process descriptor is created.
Additionally, thread descriptors are created or freed when threads
are started or stopped. Because the thread exit function is called in a
critical section, we can't directly free the thread descriptors. Hence,
they are freed to a cache, which is also used as a source of allocations
when needed for new threads.

Approved by:	sbruno
Obtained from:	jtl
Sponsored by:	Juniper Networks, Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15335
2018-05-16 22:29:20 +00:00
mmacy
00950b6e0c Fix !netmap build post r333686
Approved by:	sbruno
2018-05-16 22:25:47 +00:00
shurd
df10b02879 Work around lack of TX IRQs in iflib for netmap
When poll() is called via netmap, txsync is initially called,
and if there are no available buffers to reclaim, it waits for the driver
to notify of new buffers. Since the TX IRQ is generally not used in iflib
drivers, this ends up causing a timeout.

Work around this by having the reclaim DELAY(1) if it's initially unable
to reclaim anything, then schedule the tx task, which will spin by
continuously rescheduling itself until some buffers are reclaimed. In
general, the delay is enough to allow some buffers to be reclaimed, so
spinning is minimized.

Reported by:	Johannes Lundberg <johalun0@gmail.com>
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15455
2018-05-16 21:03:22 +00:00
cem
27c89ceaea teken: Unbreak syscons' use of teken
Only vt(4) initializes these callbacks non-NULL at this time, so invoke the
function pointers conditionally.

Broken in r333669.

Submitted by:	bde@
2018-05-16 18:12:49 +00:00
np
4202e51951 cxgbe(4): Fall back to a failsafe configuration built into the firmware
if an error is reported while pre-processing the configuration file that
the driver attempted to use.

Also, allow the user to explicitly use the built-in configuration with
hw.cxgbe.config_file="built-in"

MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-05-16 17:55:16 +00:00
jhb
bdb0ac4763 Include kernel modules for MALTA kernels.
Sponsored by:	DARPA / AFRL
2018-05-16 17:54:40 +00:00
jhb
1f3be0d244 Export a breakpoint() function to userland for riscv.
As a result, enable tests using breakpoint() on riscv.

Reviewed by:	br
Differential Revision:	https://reviews.freebsd.org/D15191
2018-05-16 16:56:35 +00:00
emaste
75a33a66f9 Clean up vt source whitespace issues 2018-05-16 11:19:03 +00:00
dumbbell
6630e31df5 vt(4): Resume vt_timer() in vtterm_post_input() only
There is no need to try to resume it after each smaller operations
(putchar, cursor_position, copy, fill).

The resume function already checks if the timer is armed before doing
anything, but it uses an atomic cmpset which is expensive. And resuming
the timer at the end of input processing is enough.

While here, we also skip timer resume if the input is for another
windows than the currently displayed one. I.e. if `ttyv0` is currently
displayed, any changes to `ttyv1` shouldn't resume the timer (which
would refresh `ttyv0`).

By doing the same benchmark as r333669, I get:
  * vt(4), before r333669:  1500 ms
  * vt(4), with this patch:  760 ms
  * syscons(4):              700 ms
2018-05-16 10:08:50 +00:00
dumbbell
b9337da075 teken, vt(4): New callbacks to lock the terminal once
... to process input, instead of inside each smaller operations such as
appending a character or moving the cursor forward.

In other words, before we were doing (oversimplified):

  teken_input()
    <for each input character>
      vtterm_putchar()
        VTBUF_LOCK()
        VTBUF_UNLOCK()
      vtterm_cursor_position()
        VTBUF_LOCK()
        VTBUF_UNLOCK()

Now, we are doing:

  vtterm_pre_input()
    VTBUF_LOCK()
  teken_input()
    <for each input character>
      vtterm_putchar()
      vtterm_cursor_position()
  vtterm_post_input()
    VTBUF_UNLOCK()

The situation was even worse when the vtterm_copy() and vtterm_fill()
callbacks were involved.

The new callbacks are:
  * struct terminal_class->tc_pre_input()
  * struct terminal_class->tc_post_input()

They are called in teken_input(), surrounding the while() loop.

The goal is to improve input processing speed of vt(4). As a benchmark,
here is the time taken to write a text file of 360 000 lines (26 MiB) on
`ttyv0`:

  * vt(4), unmodified:      1500 ms
  * vt(4), with this patch: 1200 ms
  * syscons(4):              700 ms

This is on a Haswell laptop with a GENERIC-NODEBUG kernel.

At the same time, the locking is changed in the vt_flush() function
which is responsible to draw the text on screen. So instead of
(indirectly) using VTBUF_LOCK() just to read and reset the dirty area
of the internal buffer, the lock is held for about the entire function,
including the drawing part.

The change is mostly visible while content is scrolling fast: before,
lines could appear garbled while scrolling because the internal buffer
was accessed without locks (once the scrolling was finished, the output
was correct). Now, the scrolling appears correct.

In the end, the locking model is closer to what syscons(4) does.

Differential Revision:	https://reviews.freebsd.org/D15302
2018-05-16 09:01:02 +00:00
avg
440049bc68 followup to r332730/r332752: set kdb_why to "trap" for fatal traps
This change updates arm, arm64 and mips achitectures.  Additionally, it
removes redundant checks for kdb_active where it already results in
kdb_reenter() and adds kdb_reenter() calls where they were missing.

Some architectures check the return value of kdb_trap(), but some don't.
I haven't changed any of that.

Some trap handling routines have a return code.  I am not sure if I
provided correct ones for returns after kdb_reenter().  kdb_reenter
should never return unless kdb_jmpbufp is NULL for some reason.

Only compile tested for all affected architectures.  There can be bugs
resulting from my poor understanding of architecture specific details.

Reported by:	jhb
Reviewed by:	jhb, eadler
MFC after:	4 weeks
Differential Revision: https://reviews.freebsd.org/D15431
2018-05-16 06:52:08 +00:00
emaste
08fb35de69 Attempt to fix build by removing EOF backslash-newline
GCC complains:
In file included from .../sys/dev/usb/input/uhid.c:77:
.../usb_rdesc.h:280:37: error: backslash-newline at end of file
2018-05-16 03:17:37 +00:00
gallatin
b43b15d5bc Unhook DEBUG_BUFRING from INVARIANTS
Some of the DEBUG_BUFRING checks are racy, and can lead to
spurious assertions when run under high load.  Unhook these
from INVARIANTS until the author can fix or remove them.

Reviewed by:	mmacy
Sponsored by:	Netflix
2018-05-15 23:55:38 +00:00
np
8462114df1 cxgbe(4): Add support for two more flash parts.
Obtained from:	Chelsio Communications
MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-05-15 22:26:09 +00:00
imp
f39d874a4b Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.) This extends
the method used in da to ada, nda, and mda.

Sponsored by: Netflix
Submitted by: Chuck Silvers
2018-05-15 22:22:10 +00:00
np
72f672768b cxgbe(4): Claim some more T5 and T6 boards.
MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-05-15 21:54:59 +00:00
imp
b2910ffe25 Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.)

Sponsored by: Netflix
Submitted by: Chuck Silvers
Differential Revision: https://reviews.freebsd.org/D15435
2018-05-15 21:25:35 +00:00
marius
9fcfef67f1 - If present, take advantage of the R/W cache of eMMC revision 1.5 and
later devices. These caches work akin to the ones found in HDDs/SSDs
  that ada(4)/da(4) also enable if existent, but likewise increase the
  likelihood of data loss in case of a sudden power outage etc. On the
  other hand, write performance is up to twice as high for e. g. 1 GiB
  files depending on the actual chip and transfer mode employed.
  For maximum data integrity, the usage of eMMC caches can be disabled
  via the hw.mmcsd.cache tunable.
- Get rid of the NOP mmcsd_open().
2018-05-15 21:15:09 +00:00
marius
c2e7c0202f Restore style(9) conformance after r320844 (actually requested pre-
commit) and bring the r320844 additions in line with existing bits.
2018-05-15 21:07:11 +00:00
rmacklem
098223ca00 End grace for the NFSv4 server if all mounts do ReclaimComplete.
The NFSv4 protocol requires that the server only allow reclaim of state
and not issue any new open/lock state for a grace period after booting.
The NFSv4.0 protocol required this grace period to be greater than the
lease duration (over 2minutes). For NFSv4.1, the client tells the server
that it has done reclaiming state by doing a ReclaimComplete operation.
If all NFSv4 clients are NFSv4.1, the grace period can end once all the
clients have done ReclaimComplete, shortening the time period considerably.
This patch does this. If there are any NFSv4.0 mounts, the grace period
will still be over 2minutes.
This change is only an optimization and does not affect correct operation.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-15 20:28:50 +00:00
brooks
f7144e3d61 Unwrap a line that no longer requires wrapping. 2018-05-15 20:14:38 +00:00
brooks
948a40e7de Remove stray tabs from in_lltable_dump_entry(). 2018-05-15 20:13:00 +00:00
brooks
f7adfa5151 Unwrap some not-so-long lines now that extra tabs been removed. 2018-05-15 17:59:46 +00:00
brooks
f5d8b3c62e Remove stray tabs in in6_lltable_dump_entry(). NFC. 2018-05-15 17:57:46 +00:00
antoine
8156c46154 vmmdev: return EFAULT when trying to read beyond VM system memory max address
Currently, when using dd(1) to take a VM memory image, the capture never ends,
reading zeroes when it's beyond VM system memory max address.
Return EFAULT when trying to read beyond VM system memory max address.

Reviewed by:	imp, grehan, anish
Approved by:	grehan
Differential Revision:	https://reviews.freebsd.org/D15156
2018-05-15 17:20:58 +00:00
avg
d8caa49667 calibrate lapic timer in native_lapic_setup
The idea is to calibrate the LAPIC timer just once and only on boot,
given that [at present] the timer constants are global and shared
between all processors.

My primary motivation is to fix a panic that can happen when dynamically
switching to lapic timer.  The panic is caused by a recursion on
et_hw_mtx when printing the calibration results to console.  See the
review for the details of the panic.

Also, the code should become slightly simpler and easier to read.  The
previous code was racy too.  Multiple processors could start calibrating
the global constants concurrently, although that seems to have been
benign.

Reviewed by:	kib, mav, jhb
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D15422
2018-05-15 16:56:30 +00:00
shurd
d32470f6c3 Check that ifma_protospec != NULL in inm_lookup
If ifma_protospec is NULL when inm_lookup() is called, there
is a dereference in a NULL struct pointer. This ensures that struct is
not NULL before comparing the address.

Reported by:	dumbbell
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15440
2018-05-15 16:54:41 +00:00
andrew
9d8b5dd26c Increase the number of pages we allocate in the arm64 early boot. We are
already close to the limit so increasing the kernel size may cause it to
fail to boot when it runs past the end of allocated memory.

Reported by:	manu
Sponsored by:	DARPA, AFRL
2018-05-15 16:44:35 +00:00
brooks
5bb6870ecb Allow freebsd32 __sysctl(2) to return ENOMEM.
This is required by programs like sockstat that read variably sized
sysctls such as kern.file.  The normal path has no such restriction and
the restriction was added without comment along with initial support for
freebsd32 in 2002 (r100384).

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15438
2018-05-15 16:24:58 +00:00
hselasky
5756ac23f9 Add new USB HID driver for Super Nintendo gamepads.
Differential Revision:	https://reviews.freebsd.org/D15385
Submitted by:		johalun@gmail.com (Johannes Lundberg)
Sponsored by:		Mellanox Technologies
2018-05-15 15:36:34 +00:00
trasz
fa3f8346bc Fix sysctl description.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-05-15 15:11:52 +00:00
sbruno
02d1f9d9e5 igb(4):
I210 restore functionality if pxeboot rom is enabled on this device.

r333345 attempted to determine if this code was needed or it was some kind
of work around for a problem.  Turns out, its definitely a work around for
hardware locking and synchronization that manifests itself if the option
Rom is enabled and is selected as a boot device (there was a PXE attempt).

Reviewed by:	mmacy
Differential Revision:	https://reviews.freebsd.org/D15439
2018-05-15 13:30:59 +00:00
avg
419e295c00 Fix 'zpool create -t <tempname>'
Creating a pool with a temporary name fails when we also specify custom
dataset properties: this is because we mistakenly call
zfs_set_prop_nvlist() on the "real" pool name which, as expected,
cannot be found because the SPA is present in the namespace with the
temporary name.

Fix this by specifying the correct pool name when setting the dataset
properties.

Author: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

Obtained from:	ZFS on Linux, zfsonlinux/zfs@4ceb8dd6fd
MFC after:	1 week
2018-05-15 13:27:29 +00:00
hselasky
b6bb68cbd9 Add support for setting type of service, TOS, for outgoing RDMA connections
in the krping kernel test utility.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-05-15 07:46:24 +00:00
np
29008d1ccb cxgbe(4): Filtering related features and fixes.
- Driver support for hardware NAT.
- Driver support for swapmac action.
- Validate a request to create a hashfilter against the filter mask.
- Add a hashfilter config file for T5.

Sponsored by:	Chelsio Communications
2018-05-15 04:24:38 +00:00
emaste
0e06aa13ba subr_pidctrl: use standard 2-Clause FreeBSD license and disclaimer
Approved by:	jeff
2018-05-15 00:50:09 +00:00
marius
56e8a8ece1 Let mmcsd_ioctl() ensure appropriate privileges via priv_check(9). 2018-05-14 21:57:45 +00:00
marius
88409bdbbf The broken DDR52 support of Intel Bay Trail eMMC controllers rumored
in the commit log of r321385 has been confirmed via the public VLI54
erratum. Thus, stop advertising DDR52 for these controllers.
Note that this change should hardly make a difference in practice as
eMMC chips from the same era as these SoCs most likely support HS200
at least, probably even up to HS400ES.
2018-05-14 21:46:06 +00:00
shurd
8b4a96b13e Replace rmlock with epoch in lagg
Use the new epoch based reclamation API. Now the hot paths will not
block at all, and the sx lock is used for the softc data.  This fixes LORs
reported where the rwlock was obtained when the sxlock was held.

Submitted by:	mmacy
Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15355
2018-05-14 20:06:49 +00:00
jhb
8e38b4b70f Make the common interrupt entry point labels local labels.
Kernel debuggers depend on symbol names to find stack frames with a
trapframe rather than a normal stack frame.  The labels used for the
shared interrupt entry point for the PTI and non-PTI cases did not
match the existing patterns confusing debuggers.  Add the '.L' prefix
to mark these symbols as local so they are not visible in the symbol
table.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-05-14 17:27:53 +00:00
tuexen
2a6092ee58 sctp_get_mbuf_for_msg() should honor the allinone parameter.
When it is not required that the buffer is not a chain, return
a chain. This is based on a patch provided by Irene Ruengeler.
2018-05-14 15:16:51 +00:00
tuexen
e025e3497c Ensure that the MTU's used are multiple of 4.
The length of SCTP packets is always a multiple of 4. Therefore,
ensure that the MTUs used are also a multiple of 4.

Thanks to Irene Ruengeler for providing an earlier version of this
patch.

MFC after:	1 week
2018-05-14 13:50:17 +00:00
mmacy
4c3dbe627d hwpmc: don't reference domain index with no memory backing it
On multi-socket the domain will be correctly set for a given CPU
regardless of whether or not NUMA is enabled.

Approved by:	sbruno
2018-05-14 06:11:25 +00:00
nwhitehorn
ec32568d63 Final fix for alignment issues with the page table first patched with
r333273 and partially reverted with r333594.

Older CPUs implement addition of offsets into the page table by a
bitwise OR rather than actual addition, which only works if the table is
aligned at a multiple of its own size (they also require it to be aligned
at a multiple of 256KB). Newer ones do not have that requirement, but it
hardly matters to enforce it anyway.

The original code was failing on newer systems with huge amounts of RAM
(> 512 GB), in which the page table was 4 GB in size. Because the
bootstrap memory allocator took its alignment parameter as an int, this
turned into a 0, removing any alignment constraint at all and making
the MMU fail. The first round of this patch (r333273) fixed this case by
aligning it at 256 KB, which broke older CPUs. Fix this instead by widening
the alignment parameter.
2018-05-14 04:00:52 +00:00
mmacy
4d2273cd96 pmc: don't add pmc owner to list until setup is complete
Once a pmc owner is added to the pmc_ss_owners list it is
visible for all to see. We don't want this to happen until
setup is complete.

Reported by:	mjg
Approved by:	sbruno
2018-05-14 01:08:47 +00:00
mmacy
c00e92fa97 pmc: fix buildworld
hid ck_queue.h from user

Approved by:	sbruno
2018-05-14 00:56:33 +00:00
mmacy
9888701947 hwpmc: fix load/unload race and vm map LOR
- fix load/unload race by allocating the per-domain list structure at boot

- fix long extant vm map LOR by replacing pmc_sx sx_slock with global_epoch
  to protect the liveness of elements of the pmc_ss_owners list

Reported by:	pho
Approved by:	sbruno
2018-05-14 00:21:04 +00:00
mmacy
641892da2a epoch(9): allow sx locks to be held across epoch_wait()
The INVARIANTS checks in epoch_wait() were intended to
prevent the block handler from returning with locks held.
What it in fact did was preventing anything except Giant
from being held across it. Check that the number of locks
held has not changed instead.

Approved by:	sbruno@
2018-05-14 00:14:00 +00:00
nwhitehorn
5ee2c8220a Revert changes to hash table alignment in r333273, which booting on all G5
systems, pending further analysis.
2018-05-13 23:56:43 +00:00
rmacklem
082a58b8e3 Fix the eir_server_scope reply argument for NFSv4.1 ExchangeID.
In the reply to an ExchangeID operation, the NFSv4.1 server returns a
"scope" value (eir_server_scope). If this value is the same, it indicates
that two servers share state, which is never the case for FreeBSD servers.
As such, the value needs to be unique and it was without this patch.
However, I just found out that it is not supposed to change when the
server reboots and without this patch, it did change.
This patch fixes eir_server_scope so that it does not change when the
server is rebooted.
The only affect not having this patch has is that Linux clients don't
reclaim opens and locks after a server reboot, which meant they lost
any byte range locks held before the server rebooted.
It only affects NFSv4.1 mounts and the FreeBSD NFSv4.1 client was not
affected by this bug.

MFC after:	1 week
2018-05-13 23:38:01 +00:00
mmacy
77f8d39c41 epoch(9): cleanups, additional debug checks, and add global_epoch
- GC the _nopreempt routines
    - to really benefit we'd need a separate routine
    - they're not currently in use
    - they complicate the API for no benefit at this time

- check that we're actually in a epoch section at exit

- handle epoch_call() early in boot

- Fix copyright declaration language

Approved by:	sbruno@
2018-05-13 23:24:48 +00:00
kib
088f960e02 Fix PMC_IN_TRAP_HANDLER() for i386 after the 4/4 split.
Sponsored by:	The FreeBSD Foundation
2018-05-13 20:10:02 +00:00
fsu
58c11e816a Fix directory blocks checksumming.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15396
2018-05-13 19:48:30 +00:00
fsu
faa61df95a Fix on-disk inode checksum calculation logic.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15395
2018-05-13 19:29:35 +00:00
fsu
aad5de15a2 Fix EXT2FS_DEBUG definition usage.
Reviewed by:    pfg
MFC after:      3 months

Differential Revision:    https://reviews.freebsd.org/D15394
2018-05-13 19:19:10 +00:00
markj
97bbbf7239 Get rid of vm_pageout_page_queued().
vm_page_queue(), added in r333256, generalizes vm_pageout_page_queued(),
so use it instead.  No functional change intended.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D15402
2018-05-13 13:00:59 +00:00
rmacklem
de34a49ecf Fix a slow leak of session structures in the NFSv4.1 server.
For a fairly rare case of a client doing an ExchangeID after a hard reboot,
the old confirmed clientid still exists, but some clients use a new
co_verifier. For this case, the server was not freeing up the sessions on
the old confirmed clientid.
This patch fixes this case. It also adds two LIST_INIT() macros, which are
actually no-ops, since the structure is malloc()d with M_ZERO so the pointer
is already set to NULL.
It should have minimal impact, since the only way I could exercise this
code path was by doing a hard power cycle (pulling the plus) on a machine
running Linux with a NFSv4.1 mount on the server.
Originally spotted during testing of the ESXi 6.5 client.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-13 12:42:53 +00:00
rmacklem
1e4675982e The NFSv4.1 server should return NFSERR_BACKCHANBUSY instead of NFS_OK.
When an NFSv4.1 session is busy due to a callback being in progress,
nfsrv_freesession() should return NFSERR_BACKCHANBUSY instead of NFS_OK.
The only effect this has is that the DestroySession operation will report
the failure for this case and this probably has little or no effect on a
client. Spotted by inspection and no failures related to this have been
reported.

MFC after:	2 months
2018-05-13 12:29:09 +00:00
kib
57921f916a Detect and optimize reads from the hole on UFS.
- Create getblkx(9) variant of getblk(9) which can return error.
- Add GB_NOSPARSE flag for getblk()/getblkx() which requests that BMAP
  was performed before the buffer is created, and EJUSTRETURN returned
  in case the requested block does not exist.
- Make ffs_read() use GB_NOSPARSE to avoid instantiating buffer (and
  allocating the pages for it), copying from zero_region instead.

The end result is less page allocations and buffer recycling when a
hole is read, which is important for some benchmarks.

Requested and reviewed by:	jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D14917
2018-05-13 09:47:28 +00:00
mmacy
71ab2f70a9 hwpmc/epoch - don't reference domain if NUMA is not set
It appears that domain information is set correctly independent
of whether or not NUMA is defined. However, there is no memory
backing secondary domains leading to allocation failure.

Reported by:	pho@, np@
Approved by:	sbruno@
2018-05-12 20:00:29 +00:00
markj
225f18443a DTrace aarch64: Avoid calling unwind_frame() in the probe context.
unwind_frame() may be instrumented by FBT, leading to recursion into
dtrace_probe(). Manually inline unwind_frame() as we do with stack
unwinding code for other architectures.

Submitted by:	Domagoj Stolfa
Reviewed by:	manu
MFC after:	1 week
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D15359
2018-05-12 15:35:26 +00:00
manu
1695a02fac aw_mmc: Rework regulator handling
Don't enable regulator on attach but dealt with them on power_up/power_off
Only set the voltage for the signaling regulator since I don't have boards
that can change the supply voltage.
Enable 1.8v signaling voltage.
2018-05-12 13:14:01 +00:00
manu
25129c5588 aw_mmc: Do not fully init the controller in attach
Only do a reset of the controller at attach and init it at power_up.
We use to enable some interrupts in reset, only enable the interrupts
we are interested in when doing a request.
While here remove the regulators handling in power_on as it is very wrong
and will be dealt with in another commit.

Tested on: A31, A64
2018-05-12 13:13:34 +00:00
manu
9ce08fe12f aw_mmc: Remove hardware reset
From all the BSP (Board Source Package) source that I've looked at it seems
that it's never done, remove it.

Tested On: A31, A64
2018-05-12 13:12:59 +00:00
manu
f11cc0f016 aw_mmc: Read interrupt register value before writing to it
Reported by: jmcneill
2018-05-12 13:12:26 +00:00
kib
99b8ab0632 Kernel entry from vm86 mode, where PCB_VM86CALL pcb flag is not set,
is executed on the right stack already.  No copy from the entry stack
to the kstack must be performed for vm86 bios call code to function.

To access the pcb flags on kernel entry, unconditionally switch to
kernel address space if vm86 mode is detected.

This fixes very early vm86 bios calls, typically done when boot is
performed by boot2 without loader, and kernel falls back to BIOS calls
to get SMAP.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
2018-05-12 11:06:59 +00:00
kib
20d1e9d505 On return from exception or interrupt, returns to vm86 mode with
PCB_VM86CALL pcb flag not set should be treated same as return to
userspace.

Most important, the address space must be switched.  This fixes
usermode vm86 operations after the 4/4 split.

Sponsored by:	The FreeBSD Foundation
2018-05-12 11:02:39 +00:00
kib
ca1ba2e919 Initialize tramp_idleptd during cold pmap startup, before the
exception code is copied to the trampoline.

The correct value is then copied to trampoline automatically, so
tramp_idleptd_reloced can be eliminated.

This will allow to use the same exception entry code to handle traps
from vm86 bios calls on early boot stage, as after the trampoline is
configured.

Sponsored by:	The FreeBSD Foundation
2018-05-12 10:57:34 +00:00
kib
137b66125c Create a macro for PIC code which loads %cr3 from tramp_idleptd.
Sponsored by:	The FreeBSD Foundation
2018-05-12 10:51:50 +00:00
kib
c4d86cb0db Fix use of the custom TSS on i386 after the 4/4 split.
Record common_tssd, the descriptor to be written in GDT to point to
the common TSS, before LTR is executed.  The LTR instruction sets the
loaded descriptor type to 386 TSS busy, which traps on reloads.

Sponsored by:	The FreeBSD Foundation
2018-05-12 10:48:53 +00:00
mmacy
dcb7d046f9 hwpmc(9): clear remaining sample work for hardclock
- fix last minute change in 333509 where by runcount references
  to a pmc would remaining causing us to pause loop forever

Approved by:	sbruno
2018-05-12 03:45:30 +00:00
imp
56cab5772f Remove extra copy of bcopy.c now that we're using the libkern version
of this file.
2018-05-12 01:43:32 +00:00
mmacy
2981a3420c hwpmc(9): Make pmclog buffer pcpu and update constants
On non-trivial SMP systems the contention on the pmc_owner mutex leads
to a substantial number of samples captured being from the pmc process
itself. This change a) makes buffers larger to avoid contention on the
global list b) makes the working sample buffer per cpu.

Run pmcstat in the background (default event rate of 64k):
pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 &

Before:
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total

After:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total

For more realistic overhead measurement set the sample rate for ~2khz
on a 2.1Ghz processor:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &

Collecting 10 samples of `make -j96 buildkernel` from each:

x before
+ after

real time:
    N           Min           Max        Median           Avg        Stddev
x  10          76.4        127.62        84.845        88.577     15.100031
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        -28.398 +/- 10.0344
        -32.0602% +/- 7.69825%
        (Student's t, pooled s = 10.6794)

system time:
    N           Min           Max        Median           Avg        Stddev
x  10       2277.96       6948.53       2949.47      3341.492     1385.2677
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        -2277.47 +/- 920.425
        -68.1574% +/- 8.77623%
        (Student's t, pooled s = 979.596)

x no pmc
+ pmc running
real time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10          76.4        127.62        84.845        88.577     15.100031
Difference at 95.0% confidence
        29.73 +/- 10.0335
        50.5208% +/- 17.0525%
        (Student's t, pooled s = 10.6785)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10         58.38         59.15         58.86        58.847    0.22504567
+  10         59.71         60.79        60.135        60.179    0.29957192
Difference at 95.0% confidence
        1.332 +/- 0.248939
        2.2635% +/- 0.426506%
        (Student's t, pooled s = 0.264942)

system time:

HEAD:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10       2277.96       6948.53       2949.47      3341.492     1385.2677
Difference at 95.0% confidence
        2309.97 +/- 920.443
        223.937% +/- 89.3039%
        (Student's t, pooled s = 979.616)

patched:
    N           Min           Max        Median           Avg        Stddev
x  10       1010.15       1073.31      1025.465      1031.524     18.135705
+  10        1038.7       1081.06      1070.555      1064.017      15.85404
Difference at 95.0% confidence
        32.493 +/- 16.0042
        3.15% +/- 1.5794%
        (Student's t, pooled s = 17.0331)

Reviewed by:	jeff@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15155
2018-05-12 01:26:34 +00:00
rmacklem
962b30b569 Add support for the TestStateID operation to the NFSv4.1 server.
The Linux client now uses the TestStateID operation, so this patch adds
support for it to the NFSv4.1 server. The FreeBSD client never uses this
operation, so it should not be affected.

MFC after:	2 months
2018-05-11 22:16:23 +00:00
shurd
210352aeb1 Fix LORs in in6?_leave_group()
r333175 updated the join_group functions, but not the leave_group ones.

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15393
2018-05-11 21:42:27 +00:00
kib
009a13f92b Remove dead declaration.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-05-11 20:47:45 +00:00
mmacy
0a7aab5128 iflib(9): Add support for cloning pseudo interfaces
Part 3 of many ...
The VPC framework relies heavily on cloning pseudo interfaces
(vmnics, vpc switch, vcpswitch port, hostif, vxlan if, etc).

This pulls in that piece. Some ancillary changes get pulled
in as a side effect.

Reviewed by:	shurd@
Approved by:	sbruno@
Sponsored by:	Joyent, Inc.
Differential Revision:	https://reviews.freebsd.org/D15347
2018-05-11 20:08:28 +00:00
mmacy
61486c0416 epoch(9): always set inited in epoch_init
- set inited in the !usedomains case

Reported by:	jhibbits
Approved by:	sbruno
2018-05-11 18:37:14 +00:00
sbruno
efee591c25 vxge(4): deprecation notice
This hardware isn't totally ancient, about equal to a mxge(4) or mlx4en(4),
but the company was sold to Exar which then promptly exited the Ethernet
business so the card was commercially available for under 2 years. On deep
search, the only usage of these cards I found was by the importing of the
driver. There are code quality issues identified by Brooks and Hiren and
no visible use nor maintainership that warrant removal from FreeBSD 12.0.

Submitted by:	kbowling
Reviewed by:	gnn brooks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15363
2018-05-11 17:26:59 +00:00
ae
c53ab47acf Apply the change from r272770 to if_ipsec(4) interface.
It is guaranteed that if_ipsec(4) interface is used only for tunnel
mode IPsec, i.e. decrypted and decapsultaed packet has its own IP header.
Thus we can consider it as new packet and clear the protocols flags.
This allows ICMP/ICMPv6 properly handle errors that may cause this packet.

PR:		228108
MFC after:	1 week
2018-05-11 16:50:25 +00:00
ken
7820a12c47 Clear out the entire structure, not just the size of a pointer to it.
sys/dev/ocs/ocs_os.c:
	In ocs_thread_create(), use sizeof(*thread) (instead of
	sizeof(thread)) as the size argument to memset so that we clear
	out the entire thread structure instead of just a few bytes of it.

Submitted by:	jtl
MFC after:	3 days
2018-05-11 14:50:26 +00:00
emaste
babef78f7a usbdevs: add new Microchip USB-Ethernet device IDs
LAN7800 USB 3.1 to 10/100/1000 Ethernet with PHY
LAN7801 USB 3.1 to 10/100/1000 Ethernet with RGMII interface

Also update manufacturer name for the Vendor ID.  Microchip acquired
SMSC in May 2012.

Sponsored by:	The FreeBSD Foundation
2018-05-11 13:09:21 +00:00
mjg
7831fe46a1 amd64: align the .data.exclusive_cache_line section to 128
This aligns the section itself compared to other sections, does not change
internal alignment of fields stored inside. This may or may not come later.

The motivation is partially combating adverse effects of the adjacent cache
line prefetcher. Without the annotation part of read_mostly section was on
the line of fire.
2018-05-11 08:56:39 +00:00
mmacy
5f425dce2c epoch(9): callback task fixes
- initialize the pcpu STAILQ in the NUMA case
- don't enqueue the callback task if there isn't sufficient work to be done

Reported by:	pho@
Approved by:	sbruno@
2018-05-11 08:16:56 +00:00
mjg
02e19399b7 uma: increase alignment to 128 bytes on amd64
Current UMA internals are not suited for efficient operation in
multi-socket environments. In particular there is very common use of
MAXCPU arrays and other fields which are not always properly aligned and
are not local for target threads (apart from the first node of course).
Turns out the existing UMA_ALIGN macro can be used to mostly work around
the problem until the code get fixed. The current setting of 64 bytes
runs into trouble when adjacent cache line prefetcher gets to work.

An example 128-way benchmark doing a lot of malloc/frees has the following
instruction samples:

before:
kernel`lf_advlockasync+0x43b            32940
          kernel`malloc+0xe5            42380
           kernel`bzero+0x19            47798
   kernel`spinlock_exit+0x26            60423
         kernel`0xffffffff80            78238
                         0x0           136947
   kernel`uma_zfree_arg+0x46           159594
 kernel`uma_zalloc_arg+0x672           180556
   kernel`uma_zfree_arg+0x2a           459923
 kernel`uma_zalloc_arg+0x5ec           489910

after:
            kernel`bzero+0xd            46115
kernel`lf_advlockasync+0x25f            46134
kernel`lf_advlockasync+0x38a            49078
   kernel`fget_unlocked+0xd1            49942
kernel`lf_advlockasync+0x43b            55392
          kernel`copyin+0x4a            56963
           kernel`bzero+0x19            81983
   kernel`spinlock_exit+0x26            91889
         kernel`0xffffffff80           136357
                         0x0           239424

See the review for more details.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D15346
2018-05-11 07:04:57 +00:00
mjg
e6cfcf1248 rmlock: partially depessimize lock/unlock fastpath
Previusly the slow path was folded in and partially jumped over in the
common case.
2018-05-11 06:59:54 +00:00
mmacy
361b54f07a Allow different bridge types to coexist
if_bridge has a lot of limitations that make it scale poorly to higher data
rates. In my projects/VPC branch I leverage the bridge interface between
layers for my high speed soft switch as well as for purposes of stacking
in general.

Reviewed by:	sbruno@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15344
2018-05-11 05:00:40 +00:00
mmacy
73d042eb34 epoch(9): fix priority handling, make callback lists pcpu, and other fixes
- Lend priority to preempted threads in epoch_wait to handle the case
  in which we've had priority lent to us. Previously we borrowed the
  priority of the lowest priority preempted thread. (pointed out by mjg@)

- Don't attempt allocate memory per-domain on powerpc, we don't currently
  handle empty sockets (as is the case on jhibbits Talos' board).

- Handle deferred callbacks as pcpu lists and poll the lists periodically.
  Currently the interval is 1/hz.

- Drop the thread lock when adaptive spinning. Holding the lock starves
  other threads and can even lead to lockups.

- Keep a generation count pcpu so that we don't keep spining if a thread
  has left and re-entered an epoch section.

- Actually removed the callback from the callback list so that we don't
  double free. Sigh ...

Approved by:	sbruno@
2018-05-11 04:54:12 +00:00
mmacy
2076e8aba5 Test priority handling in epoch test.
- Double the number of test threads to mp_ncpu*2
- Give each thread a different scheduling priority
2018-05-11 04:47:05 +00:00
jhibbits
095a58f17a No need to bzero splpar_vpa entries
splpar_vpa is in the BSS, so is already zeroed when the kernel starts up.

Tested by:	Leandro Lupori
2018-05-11 02:04:01 +00:00
des
42f7c6ed63 Slight cleanup of interface event logging.
Make if_printf() use vlog() instead of vprintf().  This means it can no
longer return the number of characters printed, as it used to, but every
single call to if_printf() in the entire kernel ignores the return value
anyway; just return 0 so we don't have to change the prototype.

Consistently use if_printf() throughout sys/net/if.c, instead of a
mixture of if_printf() and log().

In ifa_maintain_loopback_route(), don't needlessly log an error if we
either failed to add a route because it already existed or failed to
remove one because it did not.  We still return an error code, though.

MFC after:	1 week
2018-05-11 00:19:49 +00:00
des
58d2db41a5 Reduce <sys/queue.h> pollution.
While <sys/sysctl.h> includes <sys/queue.h> unconditionally, it is only
actually used in code which is conditional on _KERNEL.  Make the #include
itself conditional as well, and fix userland code that uses <sys/queue.h>
for other purposes but relied on <sys/sysctl.h> to bring it in.

MFC after:	1 week
2018-05-11 00:01:43 +00:00
np
f702263d0a cxgbe(4): Add fields to support configuration of hardware NAT and
swapmac (SMAC/DMAC switcheroo) from userspace.

Sponsored by:	Chelsio Communications
2018-05-10 20:39:04 +00:00
emaste
52fe12515b Error out on attempt to link amd64 kernel with old binutils linker
As of r333461 we require ifunc support to link a working amd64 kernel.
The default in-tree bootstrap linker is lld and it has the required
support, as does any modern out-of-tree binutils linker.  The in-tree
GNU ld is from binutils 2.17.50 and it does not have ifunc support,
so produce an error rather than a broken kernel.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D15378
2018-05-10 20:10:02 +00:00
mmacy
0f77b86d64 Allocate epoch for networking at startup
Additionally add CK to include paths for modules

Approved by:	sbruno@
2018-05-10 19:13:00 +00:00
mmacy
68b801ac97 Add simple preempt safe epoch API
Read locking is over used in the kernel to guarantee liveness. This API makes
it easy to provide livenes guarantees without atomics.

Includes epoch_test kernel module to stress test the API.

Documentation will follow initial use case.

Test case and improvements to preemption handling in response to discussion
with mjg@

Reviewed by:	imp@, shurd@
Approved by:	sbruno@
2018-05-10 17:55:24 +00:00
lwhsu
a2b0dc578d Fix build for platforms using GCC:
- Remove unused or dead store variable
- Remove unused function ctl_copyin_alloc
- Add missing curly brackets, this seems a regression in r287720

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D15383
2018-05-10 17:22:04 +00:00
dumbbell
242ffb3648 vt(4): Use default VGA palette
Before this change, the VGA palette was configured to match the shell
palette (e.g. color #1 was red). There was one glitch early in boot when
the vt(4)'s VGA palette was loaded: the loader's logo would switch from
red to blue. Likewise for the "Booting..." message switching from blue
to red. That's because the loader's logo was drawed with the default VGA
palette where a few colors are swapped compared to the shell palette
(e.g. blue <-> red).

This change configures the default VGA palette during initialization and
converts input's colors from shell to VGA palette index.

There should be no visible changes, except the loader's logo which will
keep its original color.

Reviewed by:	eadler
2018-05-10 17:00:33 +00:00
dumbbell
c7886ada60 vt(4): Put for() loop outside switch() in vt_generate_cons_palette()
This makes it more logical:
 1. It checks the requested color format
 2. It fills the palette accordingly

Also vt_palette_init() is only called when needed (i.e. when the format
is `COLOR_FORMAT_RGB`).
2018-05-10 16:41:47 +00:00
gallatin
21f42492ac Fix a panic in the IPv6 multicast code.
Use LIST_FOREACH_SAFE in in6m_disconnect() since we're
deleting and freeing item from the membership list
while traversing the list.

Reviewed by:	mmacy
Sponsored by:	Netflix
2018-05-10 16:19:41 +00:00
kib
852722fdfd Make fpusave() and fpurestore() on amd64 ifuncs.
From now on, linking amd64 kernel requires either lld or newer ld.bfd.

Reviewed by:	jhb (as part of the large patch)
Discussed with:	emaste
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13838
2018-05-10 15:01:43 +00:00
gallatin
2eceae8f13 Fix the build after r333457
In r333457, the arguments to kern_pwritev() were accidentally
re-ordered as part of ANSIfication, breaking the build.
2018-05-10 13:19:42 +00:00
emaste
90944a37d3 ANSIfy sys_generic.c 2018-05-10 11:36:16 +00:00
mw
462a26d62e Do not pass header length to the ENA controller
Header length is optional hint for the ENA device. Because It is not
guaranteed that every packet header will be in the first mbuf
segment, it is better to skip passing any information. If the header
length will be indicating invalid value (different than 0), then the
packet will be dropped.

This kind situation can appear, when the UDP packet will be fragmented
by the stack in the ip_fragment() function.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reported by:  Krishna Yenduri <kyenduri@brkt.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
2018-05-10 09:37:54 +00:00
manu
6caecdadde arm64: Add ALT_BREAK_TO_DEBUGGER to GENERIC
It is useful to enter kdb with an escape sequence.
While here move the USB_DEBUG with the others debug options and define
nooptions USB_DEBUG for GENERIC-NODEBUG
2018-05-10 09:37:50 +00:00
mw
e2ad139b86 Skip setting the MTU for ENA if it is not changing
On AWS, a network interface can get reinitialized every 30 minutes due
to the MTU being (re)set when a new DHCP lease is obtained. This can
cause packet drop, along with annoying syslog messages.

Skip setting the MTU in the ena driver if the new MTU is the same as the
old MTU. Note this fix is already in the netfront driver.

Testing: Verified ena up/down messages do not appear every 30 min in
/var/log/messages with the fix in place.

Submitted by:   Krishna Yenduri <kyenduri@brkt.com>
Reviewed by: Michal Krawczyk <mk@semihalf.com>
2018-05-10 09:32:59 +00:00
mw
18c1d53725 Apply fixes in ena-com
* Change ena-com BIT macro to work on unsigned value.
  To make the shifting operations safer, they should be working on
  unsigned values.

* Fix a mutex not owned ASSERT panic in ENA control path.
  A thread calling cv_broadcast()/cv_signal() must hold the mutex used for
  cv_wait(). Fix the ENA control path code that has this problem.

Submitted by:   Krishna Yenduri <kyenduri@brkt.com>
Reviewed by:    Michal Krawczyk <mk@semihalf.com>
Tested by:      Michal Krawczyk <mk@semihalf.com>
2018-05-10 09:25:51 +00:00
mw
196546a6e6 Upgrade ENA version to v0.8.1
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
2018-05-10 09:06:21 +00:00
delphij
f29950c935 Remove "All rights reserved" from my files.
See r333391 for the rationale.

MFC after:	1 week
2018-05-10 06:41:08 +00:00
np
33b588ad7d cxgbe(4): Disable write-combined doorbells by default.
This had been the default behavior but was changed accidentally as part
of the recent iw_cxgbe+OFED overhaul.  Fix another bug in that change
while here: the global knob affects all the adapters in the system and
should be left alone by per-adapter code.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2018-05-10 06:33:54 +00:00
jhibbits
560fc64981 Fix PPC symbol resolution
Summary:
There were 2 issues that were preventing correct symbol resolution
on PowerPC/pseries:

1- memory corruption at chrp_attach() - this caused the inital
   part of the symbol table to become zeroed, which would cause
   the kernel linker to fail to parse it.
   (this was probably zeroing out other memory parts as well)

2- DDB symbol resolution wasn't working because symtab contained
   not relocated addresses but it was given relocated offsets.
   Although relocating the symbol table fixed this, it broke the
   linker, that already handled this case.
   Thus, the fix for this consists in adding a new DDB macro:
   DB_STOFFS(offs) that converts a (potentially) relocated offset
   into one that can be compared with symbol table values.

PR:		227093
Submitted by:	Leandro Lupori <leandro.lupori_gmail.com>
Differential Revision: https://reviews.freebsd.org/D15372
2018-05-10 03:59:48 +00:00
araujo
2549fc5001 Rework CTL frontend & backend options to use nv(3), allow creating multiple
ioctl frontend ports.

This revision introduces two changes to CTL:
- Changes the way options are passed to CTL_LUN_REQ and CTL_PORT_REQ ioctls.
  Removes ctl_be_arg structure and associated logic and replaces it with
  nv(3)-based logic for passing in and out arguments.
- Allows creating multiple ioctl frontend ports using either ctladm(8) or
  ctld(8).
  New frontend ports are represented by /dev/cam/ctl<pp>.<vp> nodes, eg /dev/cam/ctl5.3.
  Those device nodes respond only to CTL_IO ioctl.

New command-line options for ctladm:
# creates new ioctl frontend port with using free pp and vp=0
ctladm port -c
# creates new ioctl frontend port with pp=10 and vp=0
ctladm port -c -O pp=10
# creates new ioctl frontend port with pp=11 and vp=12
ctladm port -c -O pp=11 -O vp=12
# removes port with number 4 (it's a "targ_port" number, not pp number)
ctladm port -r -p 4

New syntax for ctl.conf:
target ... {
    port ioctl/<pp>
    ...
}

target ... {
    port ioctl/<pp>/<vp>
    ...

Note: Most of this work was made by jceel@, thank you.

Submitted by:	jceel
Reworked by:	myself
Reviewed by:	mav (earlier versions and recently during the rework)
Obtained from:  FreeNAS and TrueOS
Relnotes:	Yes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D9299
2018-05-10 03:50:20 +00:00
imp
fbbe5571b3 Remove unused bcopyb.
Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:54 +00:00
imp
0bfb069f86 Simplify things a little
Rather than include a copy for memmove to call bcopy to call memcpy
(which handles overlapping copies), make memmove a strong reference to
memcpy to save the two calls.

Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:48 +00:00
imp
741ea3fea2 Move MI-ish bcopy routine to libkern
riscv and powerpc have nearly identical bcopy.c that's
supposed to be mostly MI. Move it to the MI libkern.

Differential Revision: https://reviews.freebsd.org/D15374
2018-05-10 02:31:38 +00:00
np
e34f445e92 cxgbe(4): Determine whether the firmware supports the FILTER2 work
request, which can be used to configure hardware NAT and swapmac.

All firmwares released after Jan 2017 support this work request.

Sponsored by:	Chelsio Communications
2018-05-10 00:04:14 +00:00
markj
73b6d8a09d Remove "All rights reserved" from my files.
See r333391 for the rationale.

MFC after:	1 week
2018-05-09 20:57:18 +00:00
oshogbo
9f099d8764 Introduce the 'n' flag for the geli attach command.
If the 'n' flag is provided the provided key number will be used to
decrypt device. This can be used combined with dryrun to verify if the key
is set correctly. This can be also used to determine which key slot we want to
change on already attached device.

Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D15309
2018-05-09 20:53:38 +00:00
imp
899bd2ec13 Remove the 'All Rights Reserved' clause from some of the stuff I've
done for Netflix, since I'm in the neighborhood.
2018-05-09 20:32:23 +00:00
imp
d8f0115e23 Use the full year, for real this time. 2018-05-09 20:26:37 +00:00
markj
15067afda3 Fix bxe(4) netdump rx polling.
Reviewed by:	cem, rstone
X-MFC with:	r333287
Sponsored by:	Dell EMC Isilon
2018-05-09 19:54:34 +00:00
cy
c946900e92 Fix style error introduced in r333393.
Reported by:	jhb, imp, phk
MFC after:	6 days
X-MFC with:	r333393
2018-05-09 19:05:27 +00:00
mmacy
5a6351e4d0 Add taskqgroup_config_gtask_deinit to support teardown after
taskqgroup_config_gtask_init.

Approved by:	sbruno
2018-05-09 18:51:35 +00:00
mmacy
a0bd5d3d7f Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
mmacy
442cc65d0c Remove bogus panic
r333345 added a panic to the default case statement on the incorrect
premise that it should "never happen" when in fact it is simply a
different adapter version.

Reported by:	markj
Approved by:	sbruno
2018-05-09 17:48:52 +00:00
kevans
e3f7dbda44 Standardize SPDX tag on files I've added 2018-05-09 16:52:28 +00:00
kevans
3ca335caec Remove "All Rights Reserved" on files that I hold sole copyright on
See r333391 for more detail; in summary: it holds no weight and may be
removed.
2018-05-09 16:44:19 +00:00
jhb
aa34e24a27 Report TRAP_BRKPT for breakpoint traps on sparc64.
Reviewed by:	marius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D15190
2018-05-09 15:25:26 +00:00
mjg
c6c875dd41 amd64: depessimize bcmp for small buffers
Adapt assembly generated by clang for memcmp and use it for <= 64 sized
compares (which are the vast majority).

Sample result of doing stats on Broadwell (% of samples):
before: 4.0 kernel     bcmp                 cache_lookup
after : 0.7 kernel     bcmp                 cache_lookup

The routine is most definitely still not optimal. Anyone interested in
spending time improving it is welcome to take over.

Reviewed by:	kib
2018-05-09 15:16:25 +00:00
kib
4895cdf089 Avoid calls to bzero() before ireloc.
Evaluate cpu_stdext_feature early to have moved link_elf_ireloc() see
correct flags, most important is SMAP.

Tested by:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D15367
2018-05-09 14:39:24 +00:00
imp
e08b855ef7 Minor style nits
Use full copyright year.
Remove 'All Rights Reserved' from new file (rights holder OK'd)
Minor #ifdef motion and #endif tagging
Remove __FBSDID macro from comments

Sponsored by: Netflix
OK'd by: rrs@
2018-05-09 14:11:35 +00:00
kib
98f701752a Remove PG_U from the rest of the kernel pmap ptes.
Supposedly, they PG_U bits there were set to easier making some kernel
page accessible to userspace in-place.  Since it was not used for the
whole existence of the amd64 pmap.c and current design of the shared
pages prefers double-mapping over the in-place access, remove PG_U
both from the direct map and KVA slots.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-05-09 12:09:08 +00:00
kib
dfdc3054ac Remove PG_U from the recursive pte for kernel pmap' PML4 page.
This PML4 page is never used for the userspace process, so there is no
security implications.  But the configuration trips SMAP check, which
should be corrected.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-05-09 12:03:40 +00:00
ae
68071c299a Bring in some last changes in NAT64 implementation:
o Modify ipfw(8) to be able set any prefix6 not just Well-Known,
  and also show configured prefix6;
o relocate some definitions and macros into proper place;
o convert nat64_debug and nat64_allow_private variables to be
  VNET-compatible;
o add struct nat64_config that keeps generic configuration needed
  to NAT64 code;
o add nat64_check_prefix6() function to check validness of specified
  by user IPv6 prefix according to RFC6052;
o use nat64_check_private_ip4() and nat64_embed_ip4() functions
  instead of nat64_get_ip4() and nat64_set_ip4() macros. This allows
  to use any configured IPv6 prefixes that are allowed by RFC6052;
o introduce NAT64_WKPFX flag, that is set when IPv6 prefix is
  Well-Known IPv6 prefix. It is used to reduce overhead to check this;
o modify nat64lsn_cfg and nat64stl_cfg structures to use nat64_config
  structure. And respectivelly modify the rest of code;
o remove now unused ro argument from nat64_output() function;
o remove __FreeBSD_version ifdef, NAT64 was not merged to older versions;
o add commented -DIPFIREWALL_NAT64_DIRECT_OUTPUT flag to module's Makefile
  as example.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2018-05-09 11:59:24 +00:00
ae
0490c97003 Add IFCAP_LINKSTATE support to if_loop(4).
Reviewed by:	wollman
Obtained from:	Yandex LLC
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D15278
2018-05-09 10:50:51 +00:00
hselasky
4bc2d771ed Add myself to copyright in the LinuxKPI RCU support layer.
Suggested by:	mmacy@
Sponsored by:	Mellanox Technologies
2018-05-09 08:50:42 +00:00
np
d85f4bf1d5 cxgbe(4): Add support for hash filters.
These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool.  Hash and
normal TCAM filters can be used together.  The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters.  Any T5/T6 card with memory can support at
least half a million hash filters.  The sample config file with the
driver configures 512K of these, it is possible to double this to 1
million+ in some cases.

The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file.  The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto).  It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).

For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan

Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.

Hash filters support all actions supported by TCAM filters.  A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).

Sponsored by:	Chelsio Communications
2018-05-09 04:09:49 +00:00
cy
d595ecd177 Document intentional fallthrough. (CID 976535)
MFC after:	1 week
2018-05-09 02:07:09 +00:00
cy
9890e954c6 Fix memory leak. (CID 1199373).
MFC after:	1 week
2018-05-09 02:02:58 +00:00
mmacy
ebb11d48b8 Reduce overhead of ktrace checks in the common case.
KTRPOINT() checks both if we are tracing _and_ if we are recursing within
ktrace. The second condition is only ever executed if ktrace is actually
enabled. This change moves the check out of the hot path in to the functions
themselves.

Discussed with mjg@

Reported by:	mjg@
Approved by:	sbruno@
2018-05-09 00:00:47 +00:00
sbruno
9e75a9c23d nxge(4):
Remove nxge(4) and associated man page and tools in FreeBSD 12.0.

Submitted by:	kbowling
Reviewed by:	brooks
Relnotes:	yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D1529
2018-05-08 21:14:29 +00:00
tuexen
f92d8838e6 Fix two typos reported by N. J. Mann, which were introduced in
https://svnweb.freebsd.org/changeset/base/333382 by me.

MFC after:	3 days
2018-05-08 20:39:35 +00:00