3992 Commits

Author SHA1 Message Date
ae
8d3e25d418 Add ifaddr_event_ext event. It is similar to ifaddr_event, but the
handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL,
and the pointer to ifaddr. Also ifaddr_event now is implemented using
ifaddr_event_ext handler.

MFC after:	3 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D17100
2018-10-21 15:02:06 +00:00
glebius
8e0b6f937e Fix exiting an epoch(9) we never entered. May happen only with MAC.
2018-10-21 12:39:00 +00:00
hselasky
ccccebe3aa Fix deadlock when destroying VLANs.
Synchronizing the epoch before freeing the multicast addresses while holding
the VLAN_XLOCK() might lead to a deadlock. Use deferred freeing of the VLAN
multicast addresses to resolve deadlock. Backtrace:

Thread1:
epoch_block_handler_preempt()
ck_epoch_synchronize_wait()
epoch_wait_preempt()
vlan_setmulti()
vlan_ioctl()
in6m_release_task()
gtaskqueue_run_locked()
gtaskqueue_thread_loop()
fork_exit()
fork_trampoline()

Thread2:
sleepq_switch()
sleepq_wait()
_sx_xlock_hard()
_sx_xlock()
in6_leavegroup()
in6_purgeaddr()
if_purgeaddrs()
if_detach_internal()
if_detach()
vlan_clone_destroy()
if_clone_destroyif()
if_clone_destroy()
ifioctl()
kern_ioctl()
sys_ioctl()
amd64_syscall()
fast_syscall_common()
syscall()

Differential revision:	https://reviews.freebsd.org/D17496
Reviewed by:		slavash, mmacy
Approved by:		re (kib)
Sponsored by:		Mellanox Technologies
2018-10-15 10:29:29 +00:00
erj
4578e98d65 ixl/iavf(4): Change ixlv to iavf and update it to use iflib(9)
Finishes the conversion of the 40Gb Intel Ethernet drivers to iflib(9) for
FreeBSD 12.0, and fixes numerous bugs in both ixl(4) and iavf(4).

This commit also re-adds the VF driver to GENERIC since it now compiles and
functions.

The VF driver name was changed from ixlv(4) to iavf(4) because the VF driver is
now intended to be used with future products, not just with Fortville/Fort Park
VFs.

A man page update that documents these drivers is forthcoming in a separate
commit.

Reviewed by:    sbruno@, kbowling@
Tested by:      jeffrey.e.pieper@intel.com
Approved by:	re (gjb@)
Relnotes:       yes
Sponsored by:   Intel Corporation
Differential Revision: https://reviews.freebsd.org/D16429
2018-10-12 22:40:54 +00:00
jtl
1d041dd4b1 There are three places where we return from a function which entered an
epoch section without exiting that epoch section. This is bad for two
reasons: the epoch section won't exit, and we will leave the epoch tracker
from the stack on the epoch list.

Fix the epoch leak by making sure we exit epoch sections before returning.

Reviewed by:	ae, gallatin, mmacy
Approved by:	re (gjb, kib)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D17450
2018-10-09 13:26:06 +00:00
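The pattern behind this fix can be illustrated outside the kernel. The sketch below is a user-space stand-in, not the committed code: section_enter(), section_exit() and lookup() are hypothetical names, and a plain counter takes the place of the epoch tracker list. Funneling every return through one exit label keeps enter and exit calls balanced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an epoch section: counts sections still open. */
static int open_sections;

static void section_enter(void) { open_sections++; }
static void section_exit(void)  { assert(open_sections > 0); open_sections--; }

/*
 * Hypothetical lookup that may fail early. Every path, including the
 * error paths, leaves through "out" so the section is always exited.
 */
static int
lookup(const int *table, size_t n, int key, int *idx)
{
	int error = 0;
	size_t i;

	section_enter();
	if (table == NULL || idx == NULL) {
		error = -1;		/* early failure: still exit the section */
		goto out;
	}
	for (i = 0; i < n; i++) {
		if (table[i] == key) {
			*idx = (int)i;
			goto out;
		}
	}
	error = -2;			/* key not found */
out:
	section_exit();
	return (error);
}
```

After any sequence of calls, successful or not, open_sections should be back at zero, which is the invariant the leak violated.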
tuexen
a16e14a2bb Use strlcpy() instead of strncpy().
Approved by:            re (kib@)
CID:			1395980, 1395981
X-MFC with:		r339012
MFC after:              1 week
2018-10-03 07:35:16 +00:00
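The difference that motivates this kind of change: strncpy() leaves the destination without a NUL terminator when the source fills the buffer, while strlcpy() always terminates and reports the full source length. The sketch below spells out those semantics in a hypothetical helper, copy_bounded(), so that it also builds where libc has no strlcpy(); it is an illustration, not the code touched by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper with strlcpy()-style semantics.
 * Copies at most size - 1 bytes, always NUL-terminates when size > 0,
 * and returns strlen(src) so callers can detect truncation.
 */
static size_t
copy_bounded(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	assert(dst != NULL);
	if (size > 0) {
		size_t n = (len < size - 1) ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (len);
}
```

A return value greater than or equal to the buffer size signals that the copy was truncated.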
tuexen
90ffd2da8f Changing the MTU on tun/tap devices should have the same effect
whether it is done via ifconfig, which uses a SIOCSIFMTU ioctl() command, or
via a TUNSIFINFO/TAPSIFINFO ioctl() command.
Without this patch, for IPv6 the new MTU is not used when creating routes.
In particular, when initiating TCP connections after increasing the MTU,
the old MTU is still used to compute the MSS.
Thanks to ae@ and bz@ for helping to improve the patch.

Reviewed by:		ae@, bz@
Approved by:		re (kib@)
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D17180
2018-09-29 13:01:23 +00:00
imp
8efc2b3f05 Reapply, with minor tweaks, r338025, from the original commit:
Remove unused and easy to misuse PNP macro parameter

Inspired by r338025, just remove the element size parameter to the
MODULE_PNP_INFO macro entirely.  The 'table' parameter is now required to
have correct pointer (or array) type.  Since all invocations of the macro
already had this property and the emitted PNP data continues to include the
element size, there is no functional change.

Mostly done with the coccinelle 'spatch' tool:

  $ cat modpnpsize0.cocci
    @normaltables@
    identifier b,c;
    expression a,d,e;
    declarer MODULE_PNP_INFO;
    @@
     MODULE_PNP_INFO(a,b,c,d,
    -sizeof(d[0]),
     e);

    @singletons@
    identifier b,c,d;
    expression a;
    declarer MODULE_PNP_INFO;
    @@
     MODULE_PNP_INFO(a,b,c,&d,
    -sizeof(d),
     1);

  $ rg -l MODULE_PNP_INFO -- sys | \
    xargs spatch --in-place --sp-file modpnpsize0.cocci

(Note that coccinelle invokes diff(1) via a PATH search and expects diff to
tolerate the -B flag, which BSD diff does not.  So I had to link gdiff into
PATH as diff to use spatch.)

Tinderbox'd (-DMAKE_JUST_KERNELS).
Approved by: re (glen)
2018-09-26 17:12:14 +00:00
mmacy
d09bac9dfb fix vlan locking to permit sx acquisition in ioctl calls
- update vlan(9) to handle changes earlier this year in multicast locking

Tested by: np@, darkfiberu at gmail.com

PR:	230510
Reviewed by:	mjoras@, shurd@, sbruno@
Approved by:	re (gjb@)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16808
2018-09-21 01:37:08 +00:00
shurd
bd447b7e63 Fix capabilities handling for iflib drivers
Various capabilities were not being handled correctly in the
SIOCSIFCAP handler. Specifically:

IFCAP_RXCSUM and IFCAP_RXCSUM_IPV6 could be set even if not supported

It was impossible to disable IFCAP_RXCSUM and/or IFCAP_RXCSUM_IPV6 via
ifconfig since it does ioctl() per command-line flag rather than combine
them into a single call.

IFCAP_VLAN_HWCSUM could not be modified via the ioctl()

Setting any combination of the three IFCAP_WOL flags would set only
IFCAP_WOL_MCAST | IFCAP_WOL_MAGIC. For example, setting only
IFCAP_WOL_UCAST would result in both IFCAP_WOL_MCAST and IFCAP_WOL_MAGIC
being enabled, but IFCAP_WOL_UCAST would not be enabled.

Because if_vlancap() was called before if_togglecapenable(), vlan flags
were sometimes not applied correctly.

Interfaces were being unnecessarily stopped and restarted for WoL

PR:		231151
Submitted by:	Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
Reported by:	Shirkdog <mshirk@daemon-security.com>
Reviewed by:	gallatin
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17158
2018-09-20 19:35:35 +00:00
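One way to picture the per-flag handling this message asks for is a toggle mask restricted to supported bits. The following is a minimal sketch under stated assumptions: the CAP_* constants and apply_caps() are hypothetical and do not match the real IFCAP_* values or the iflib SIOCSIFCAP handler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; the real IFCAP_* values differ. */
#define CAP_RXCSUM	0x01u
#define CAP_WOL_UCAST	0x02u
#define CAP_WOL_MCAST	0x04u
#define CAP_WOL_MAGIC	0x08u
#define CAP_WOL		(CAP_WOL_UCAST | CAP_WOL_MCAST | CAP_WOL_MAGIC)

/*
 * Compute the new enabled set from a request. Only bits that differ from
 * the current set AND are supported by the device are toggled, so each
 * WOL flag is handled individually rather than as an all-or-nothing group.
 */
static uint32_t
apply_caps(uint32_t supported, uint32_t enabled, uint32_t requested)
{
	uint32_t toggle;

	assert((enabled & ~supported) == 0);
	toggle = (requested ^ enabled) & supported;
	return (enabled ^ toggle);
}
```

With this shape, requesting only CAP_WOL_UCAST enables exactly that bit, and requesting an unsupported capability leaves the enabled set unchanged.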
ae
74f4aa8745 Restore outbound packets capturing for if_gre(4). It was missed in r335048.
Also clear M_MCAST and M_BCAST flags for encapsulated datagram, since it
will have new IP header.

Approved by:	re (kib)
2018-09-17 10:10:14 +00:00
br
4b45af7432 Don't mark module data as static on RISC-V.
Similar to arm64, the riscv compiler uses PC-relative loads/stores,
and with static data the compiler does not emit relocations.
As a result, the kernel module linker has nothing to fix up and the data is
accessed from the wrong location.

Approved by:	re (gjb)
Sponsored by:	DARPA, AFRL
2018-09-12 08:05:33 +00:00
shurd
0c8b895aa7 Clean up iflib sysctls
Remove sysctls:
txq_drain_encapfail - now a duplicate of encap_txd_encap_fail
intr_link - was never incremented
intr_msix - was never incremented
rx_zero_len - was never incremented

The following were not incremented in all code-paths that apply:
m_pullups, mbuf_defrag, rxd_flush, tx_encap, rx_intr_enables, tx_frees,
encap_txd_encap_fail.

Fixes:
Replace the broken collapse_pkthdr() implementation with an MPASS().
fl_refills and fl_refills_large were not incremented when using netmap.

Reviewed by:	gallatin
Approved by:	re (marius)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16733
2018-09-06 18:51:52 +00:00
bz
44d2e30ec4 Rather than duplicating the functionality of a macro after r322866
use the already existing one.  No functional changes.

Reviewed by:	karels, ae
Approved by:	re (rgrimes)
Differential Revision:	https://reviews.freebsd.org/D17004
2018-09-03 22:10:49 +00:00
shurd
aacbdebbaf Fix compile error due to missing parenthesis in r338372
Approved by:	re (gjb)
2018-08-29 16:21:34 +00:00
shurd
ab689463dd Fix potential data corruption in iflib
The MP ring may have txq pointers enqueued.  Previously, these were
passed to m_free() when IFC_QFLUSH was set.  This patch checks for
the value and doesn't call m_free().

Reviewed by:	gallatin
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16882
2018-08-29 15:55:25 +00:00
markm
d8723e8b03 Remove the Yarrow PRNG algorithm option in accordance with due notice
given in random(4).

This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.

Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.

PR:		230870
Reviewed by:	cem
Approved by:	so(delphij,gtetlow)
Approved by:	re(marius)
Differential Revision:	https://reviews.freebsd.org/D16898
2018-08-26 12:51:46 +00:00
np
e16f1bf84a Unbreak VLANs after r337943.
ether_set_pcp should not be called from ether_output_frame for VLAN
interfaces -- the vid + pcp will be inserted during vlan_transmit in
that case. r337943 sets the VLAN's ifnet's if_pcp to a proper PCP value
and this led to double encapsulation (once with vid 0 and second time
with vid+pcp).

PR: 230794
Reviewed by:	kib@
Approved by:	re@ (gjb@)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D16887
2018-08-24 21:48:13 +00:00
pkelsey
2e5630c90a Extended pf(4) ioctl interface and pfctl(8) to allow bandwidths of
2^32 bps or greater to be used.  Prior to this, bandwidth parameters
would simply wrap at the 2^32 boundary.  The computations in the HFSC
scheduler and token bucket regulator have been modified to operate
correctly up to at least 100 Gbps.  No other algorithms have been
examined or modified for correct operation above 2^32 bps (some may
have existing computation resolution or overflow issues at rates below
that threshold).  pfctl(8) will now limit non-HFSC bandwidth
parameters to 2^32 - 1 before passing them to the kernel.

The extensions to the pf(4) ioctl interface have been made in a
backwards-compatible way by versioning affected data structures,
supporting all versions in the kernel, and implementing macros that
will cause existing code that consumes that interface to use version 0
without source modifications.  If version 0 consumers of the interface
are used against a new kernel that has had bandwidth parameters of
2^32 or greater configured by updated tools, such bandwidth parameters
will be reported as 2^32 - 1 bps by those old consumers.

All in-tree consumers of the pf(4) interface have been updated.  To
update out-of-tree consumers to the latest version of the interface,
define PFIOC_USE_LATEST ahead of any includes and use the code of
pfctl(8) as a guide for the ioctls of interest.

PR:	211730
Reviewed by:	jmallett, kp, loos
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D16782
2018-08-22 19:38:48 +00:00
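The wrap at the 2^32 boundary and the saturation to 2^32 - 1 described above are plain integer arithmetic. A small sketch, with bw_wrap32() and bw_clamp32() as hypothetical names rather than pf(4) functions:

```c
#include <stdint.h>

/* Truncating a 64-bit rate to 32 bits wraps modulo 2^32 (the old behaviour). */
static uint32_t
bw_wrap32(uint64_t bps)
{
	return ((uint32_t)bps);
}

/* Saturating at 2^32 - 1 instead, as described for version 0 consumers. */
static uint32_t
bw_clamp32(uint64_t bps)
{
	return (bps > UINT32_MAX ? UINT32_MAX : (uint32_t)bps);
}
```

For a 10 Gbps rate (10000000000 bps), wrapping yields 1410065408 bps, while clamping reports 4294967295 bps.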
erj
7b4938ad4e if_media: Add new 2.5G/5G/25G/40G/50G/100G/200G/400G media types
Upcoming Ethernet hardware will support new media types that aren't in the kernel
yet, so they are added here. These mostly include new 25G/50G/100G media types;
and this commit introduces new 200G/400G speeds and media.

Reviewed by:	hselasky@, jhb@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D16731
2018-08-22 18:19:56 +00:00
mmacy
b9a21f35a8 fix copy/paste error when clearing ifma flag
CID: 1395119
Reported by:	vangyzen
2018-08-21 22:59:22 +00:00
cem
d70d723ffc Back out r338035 until Warner is finished churning GSoC PNP patches
I was not aware Warner was making or planning to make forward progress in
this area and have since been informed of that.

It's easy to apply/reapply when churn dies down.
2018-08-19 00:46:22 +00:00
cem
3d8ae7a0f4 Remove unused and easy to misuse PNP macro parameter
Inspired by r338025, just remove the element size parameter to the
MODULE_PNP_INFO macro entirely.  The 'table' parameter is now required to
have correct pointer (or array) type.  Since all invocations of the macro
already had this property and the emitted PNP data continues to include the
element size, there is no functional change.

Mostly done with the coccinelle 'spatch' tool:

  $ cat modpnpsize0.cocci
    @normaltables@
    identifier b,c;
    expression a,d,e;
    declarer MODULE_PNP_INFO;
    @@
     MODULE_PNP_INFO(a,b,c,d,
    -sizeof(d[0]),
     e);

    @singletons@
    identifier b,c,d;
    expression a;
    declarer MODULE_PNP_INFO;
    @@
     MODULE_PNP_INFO(a,b,c,&d,
    -sizeof(d),
     1);

  $ rg -l MODULE_PNP_INFO -- sys | \
    xargs spatch --in-place --sp-file modpnpsize0.cocci

(Note that coccinelle invokes diff(1) via a PATH search and expects diff to
tolerate the -B flag, which BSD diff does not.  So I had to link gdiff into
PATH as diff to use spatch.)

Tinderbox'd (-DMAKE_JUST_KERNELS).
2018-08-19 00:22:21 +00:00
np
6e862a5f4b if_vlan(4): A VLAN always has a PCP and its ifnet's if_pcp should be set
to the PCP value in use instead of IFNET_PCP_NONE.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-08-17 01:03:23 +00:00
np
06d6f82b42 Add the ability to look up the 3b PCP of a VLAN interface. Use it in
toe_l2_resolve to fill up the complete vtag and not just the vid.

Reviewed by:	kib@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D16752
2018-08-16 23:46:38 +00:00
mmacy
99cec0a00c Fix in6_multi double free
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
  - add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
  goes to zero OR when the interface goes away
  - decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
  - add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the inpcb cleanup code simpler but opens the
  door wider still to races
  - call inp_gcmoptions synchronously after dropping the inpcb lock

Fundamentally multicast needs a rewrite - but keep applying band-aids for now.

Tested by: kp
Reported by: novel, kp, lwhsu
2018-08-15 20:23:08 +00:00
gallatin
aeea8c4eef lagg: allow lacp to manage the link state
Lacp needs to manage the link state itself. Unlike other
lagg protocols, the ability of lacp to pass traffic
depends not only on the lagg members having link, but also
on the lacp protocol converging to a distributing state with the
link partner.

If we prematurely mark the link as up, then we will send a
gratuitous arp (via arp_handle_ifllchange()) before the lacp
interface is capable of passing traffic. When this happens,
the gratuitous arp is lost, and our link partner may cache
a stale mac address (eg, when the base mac address for the
lagg bundle changes, due to a BIOS change re-ordering NIC
unit numbers)

Reviewed by: jtl, hselasky
Sponsored by: Netflix
2018-08-13 14:13:25 +00:00
kp
7016cbb5d6 pf: Increase default hash table size
Now that we (by default) limit the number of states to 100,000 it makes sense
to also adjust the default size of the hash table.

Based on the benchmarking results in
https://github.com/ocochard/netbenches/blob/master/Atom_C2758_8Cores-Chelsio_T540-CR/pf-states_hashsize/results/fbsd12-head.r332390/README.md
128K entries offers a good compromise between performance and memory use.

Users may still overrule this setting with the net.pf.states_hashsize and
net.pf.source_nodes_hashsize loader(8) tunables.
2018-08-05 13:54:37 +00:00
pkelsey
10742aaed6 Mark the send queue ready so ALTQ is available.
2018-08-04 01:45:17 +00:00
andrew
081aff6081 As with DPCPU_DEFINE_STATIC make VNET_DEFINE_STATIC non-static on arm64 in
modules. It fails in the same way: we are unable to relocate static
variables as the compiler uses PC-relative loads with nothing for the
kernel linker to relocate.

Sponsored by:	DARPA, AFRL
2018-07-30 15:05:07 +00:00
andrew
537fdde573 Ensure the DPCPU and VNET module spaces are aligned to hold a pointer.
Previously they may have been aligned to a char, leading to misaligned
DPCPU and VNET variables.

Sponsored by:	DARPA, AFRL
2018-07-30 14:25:17 +00:00
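The alignment requirement can be shown with generic rounding arithmetic. This is only an illustration of pointer alignment, with round_up() as a hypothetical helper; the commit itself may well achieve the alignment through section or linker attributes instead.

```c
#include <assert.h>
#include <stddef.h>

/* Round off up to the next multiple of align (align must be a power of two). */
static size_t
round_up(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((off + align - 1) & ~(align - 1));
}
```

An offset that is only char-aligned, such as 13, becomes a multiple of sizeof(void *) after round_up(13, sizeof(void *)), so a pointer stored there is correctly aligned.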
andrew
6785775244 As with DPCPU_DEFINE make it a compile error to use static with VNET_DEFINE.
There is the VNET_DEFINE_STATIC macro for that.
2018-07-30 12:44:44 +00:00
pkelsey
06ba49246a ALTQ support for iflib.
Reviewed by:	jmallett, mmacy
Differential Revision:	https://reviews.freebsd.org/D16433
2018-07-25 22:46:36 +00:00
marius
2e3a85c261 Since r336611, n is only used for INET in iflib_parse_header().
Reported by:	rpokala
2018-07-24 23:40:27 +00:00
andrew
a6605d2938 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
andrew
65d35e69cb As with DPCPU create VNET_DEFINE_STATIC for when a variable needs to be
declared static. This will allow us to change the definition on arm64
as it has the same issues described in r336349.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:31:16 +00:00
eugen
4596989e7e epair(4): make sure we do not duplicate MAC addresses
in case of reused if_index.

PR:		229957
Tested by:	O. Hartmann <ohartmann@walstatt.org>
Approved by:	avg (mentor)
2018-07-23 07:11:58 +00:00
marius
9c190e8f72 Use the maximum of isc_tx_{nsegments,tso_segments_max} for MAX_TX_DESC.
Since r336313, TSO support for LEM-class devices is removed again as it
was before the conversion of {l,}em(4) to iflib(4) in r311849 and as a
result, isc_tx_tso_segments_max is 0 for LEM-class devices now. Thus,
inappropriate watermarks were used for this class.

This is really only a band-aid, though, because so far iflib(9) doesn't
fully take into account that DMA engines can support different maxima
of segments for transfers of TSO and non-TSO packets. For example, the
DESC_RECLAIMABLE macro is based on isc_tx_nsegments while MAX_TX_DESC
used isc_tx_tso_segments_max only. For most in-tree consumers that
doesn't make a difference as the maxima are the same for both kinds of
transfers (that is, apart from the fact that TSO may require up to 2
sentinel descriptors but also not with every MAC supported). However,
isc_tx_nsegments is 8 but isc_tx_tso_segments_max is 85 by default
with ixl(4).
2018-07-22 17:51:11 +00:00
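The selection described in this message amounts to taking the larger of two per-device segment limits. A minimal sketch, with max_tx_segments() as a hypothetical helper; the real MAX_TX_DESC macro in iflib may involve more than this, and only the ixl(4) values 8 and 85 come from the message.

```c
/*
 * Pick the larger of the non-TSO and TSO segment limits, so a device
 * with TSO disabled (tso_segments_max == 0) still gets a sane value.
 */
static int
max_tx_segments(int nsegments, int tso_segments_max)
{
	return (nsegments > tso_segments_max ? nsegments : tso_segments_max);
}
```

With the ixl(4) defaults this yields 85; for a device whose TSO limit is 0 it falls back to the non-TSO limit.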
marius
8ef4610a11 - Given that the controlling expression of the receive loop in iflib_rxeof()
tests for avail > 0, avail can never be 0 within that loop. Thus, move
  decrementing avail and budget_left into the loop and before the code which
  checks for additional descriptors having become available in case all the
  previous ones have been processed but there still is budget left so the
  latter code works as expected. [1]
- In iflib_{busdma_load_mbuf_sg,parse_header}(), remove dead stores to m
  and n respectively. [2, 3]
- In collapse_pkthdr(), ensure that m_next isn't NULL before dereferencing
  it. [4]
- Remove a duplicate assignment of segs in iflib_encap().

Reported by:	Coverity
CID:		1356027 [1], 1356047 [2], 1368205 [3], 1356028 [4]
2018-07-22 17:45:44 +00:00
shurd
06b406febd Add knob to control tx ring abdication.
r323954 changed the mp ring behaviour when 64-bit atomics were
available to abdicate the TX ring rather than having one become a
consumer thereby running to completion on TX. The consumer of the mp
ring was then triggered in the tx task rather than blocking the TX call.
While this significantly lowered the number of RX drops in small-packet
forwarding, it also negatively impacts TX performance.

With this change, the default behaviour is reverted, causing one TX ring
to become a consumer during the enqueue call. A new sysctl,
dev.X.Y.iflib.tx_abdicate is added to control this behaviour.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16302
2018-07-20 17:45:26 +00:00
shurd
4db9126b14 Improve netmap TX handling when TX IRQs are not used/supported
Use the timer to poll for TX completions when there are
outstanding TX slots. Track when the last driver timer was called
to prevent overcalling it. Also clean up some kring vs NIC ring
usage.

Reviewed by:	marius, Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16300
2018-07-20 17:24:45 +00:00
ae
d94c744a40 Move invoking of callout_stop(&lle->lle_timer) into llentry_free().
This deduplicates the code a bit, and also implicitly adds missing
callout_stop() to in[6]_lltable_delete_entry() functions.

PR:		209682, 225927
Submitted by:	hselasky (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4605
2018-07-17 11:33:23 +00:00
marius
eeec306a59 Assorted TSO fixes for em(4)/iflib(9) and dead code removal:
- Ever since the workaround for the silicon bug of TSO4 causing MAC hangs
  was committed in r295133, CSUM_TSO always got disabled unconditionally
  by em(4) on the first invocation of em_init_locked(). However, even with
  that problem fixed, it turned out that for at least e. g. 82579 not all
  necessary TSO workarounds are in place, still causing MAC hangs even at
  Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled
  in r323292 (r323293 for stable/10) for the EM-class by default, allowing
  users to turn it on if it happens to work with their particular EM MAC
  in a Gigabit-only environment.
  In head, the TSO workaround for speeds other than Gigabit was lost with
  the conversion to iflib(9) in r311849 (possibly along with another one
  or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4
  got enabled by default again, causing device hangs. Therefore, change the
  default for this hardware class back to have TSO4 off, allowing users
  to turn it on manually if it happens to work in their environment as
  we do in stable/{10,11}. An alternative would be to add a whitelist of
  EM-class devices where TSO4 actually is reliable with the workarounds in
  place, but given that the advantage of TSO at Gigabit speed is rather
  limited - especially with the overhead of these workarounds -, that's
  really not worth it. [1]
  This change includes the addition of an isc_capabilities to struct
  if_softc_ctx so iflib(9) can also handle interface capabilities that
  shouldn't be enabled by default which is used to handle the default-off
  capabilities of e1000 as suggested by shurd@ and moving their handling
  from em_setup_interface() to em_if_attach_pre() accordingly.
- Although 82543 supports TSO4 in theory, the former lem(4) didn't have
  support for TSO4, presumably because TSO4 is even more broken in the
  LEM-class of MACs than the later EM ones. Still, TSO4 for LEM-class
  devices was enabled as part of the conversion to iflib(9) in r311849,
  causing device hangs. So revert back to the pre-r311849 behavior of
  not supporting TSO4 for LEM-class at all, which includes not creating
  a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2]
- In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET
  (65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO
  DMA must have a maxsize of the maximum TSO size plus the size of a
  VLAN header for software VLAN tagging. The iflib(9) converted em(4),
  thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE
  in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET
  in em_setup_interface() (apparently, left-over from pre-iflib(9)
  times). So remove the latter and correct iflib(9) to correctly cap
  the maximum TSO size reported to the stack at IP_MAXPACKET. While at
  it, let iflib(9) use if_sethwtsomax*().
  This change includes the addition of isc_tso_max{seg,}size DMA engine
  constraints for the TSO DMA tag to struct if_shared_ctx and letting
  iflib_txsd_alloc() automatically adjust the maxsize of that tag in case
  IFCAP_VLAN_MTU is supported as requested by shurd@.
- Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet
  header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9)
  so adjustment is automatically done in case IFCAP_VLAN_MTU is supported.
  As a consequence, this adjustment now is also done in case of bnxt(4)
  which missed it previously.
- Move the reduction of the maximum TSO segment count reported to the
  stack by the number of m_pullup(9) calls (which in the worst case,
  can add another mbuf and, thus, the requirement for another DMA
  segment each) in the transmit path for performance reasons from
  em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now
  done in iflib_parse_header() rather than in the no longer existing
  em_xmit(). Moreover, this optimization applies to all drivers using
  iflib(9) and not just em(4); all in-tree iflib(9) consumers still
  have enough room to handle full size TSO packets. Also, reduce the
  adjustment to the maximum number of m_pullup(9)'s now performed in
  iflib_parse_header().
- Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9)
  in r311849 and r335338 respectively, these drivers didn't enable
  IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed
  through by lagg(4). With iflib(9), IFCAP_VLAN_HWFILTER was turned on
  by default but also lagg(4) was fixed in that regard in r203548. So
  just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling
  in {em,ixl,ixlv}_setup_interface().
- Nuke other redundant IFCAP_* setting in {em,ixl,ixlv}_setup_interface()
  which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre()
  now.
- Remove some redundant/dead setting of scctx->isc_tx_csum_flags in
  em_if_attach_pre().
- Remove some IFCAP_* duplicated either directly or indirectly (e. g.
  via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS.
- Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as
  iflib(9) adds that capability unconditionally.
- Remove some unused macros from em(4).
- Bump __FreeBSD_version as some of the above changes require the modules
  of drivers using iflib(9) to be recompiled.

Okayed by:	sbruno@ at 201806 DevSummit Transport Working Group [1]
Reviewed by:	sbruno (earlier version), erj
PR:	219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2]
Differential Revision:	https://reviews.freebsd.org/D15720
2018-07-15 19:04:23 +00:00
kp
21b4f170cf pf: Fix typo in r336221
Reported by:	olivier@
2018-07-12 18:07:28 +00:00
kp
f5bc1a9c7b pf: Increase default state table size
The typical system now has a lot more memory than when pf was new, and is also
expected to handle more connections. Increase the default size of the state
table.
Note that users can overrule this using 'set limit states' in pf.conf.

From OpenBSD:
    The year is 2018.
    Mercury, Bowie, Cash, Motorola and DEC all left us.
    Just pf still has a default state table limit of 10000.
    Had! Now it's a tiny little bit more, 100k.
    lead guitar: me
    ok chorus: phessler theo claudio benno
    background school girl laughing: bob

Obtained from:	OpenBSD
2018-07-12 16:35:35 +00:00
ae
19e11c571f Deduplicate the code.
Add a generic function, if_tunnel_check_nesting(), that checks the
allowed nesting level for tunneling interfaces and also performs loop
detection. Use it in gif(4), gre(4) and me(4) interfaces.

Differential Revision:	https://reviews.freebsd.org/D16162
2018-07-09 11:03:28 +00:00
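The two checks named in this message, a nesting limit and loop detection, can be sketched in user space. Everything below is an assumption made for illustration: check_nesting(), the result codes, and the idea of carrying the traversed interface IDs in an array are hypothetical and say nothing about how if_tunnel_check_nesting() actually records the path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for the sketch. */
enum nest_result { NEST_OK = 0, NEST_LOOP, NEST_TOO_DEEP };

/*
 * path[] holds the IDs of tunnel interfaces a packet has already passed
 * through (depth entries). Check whether sending it through "ifid" next
 * would revisit an interface (a loop) or exceed max_nesting levels.
 */
static enum nest_result
check_nesting(const int *path, size_t depth, int ifid, size_t max_nesting)
{
	size_t i;

	assert(path != NULL || depth == 0);
	for (i = 0; i < depth; i++) {
		if (path[i] == ifid)
			return (NEST_LOOP);
	}
	if (depth + 1 > max_nesting)
		return (NEST_TOO_DEEP);
	return (NEST_OK);
}
```

Keeping both checks in one shared function is what lets several tunnel drivers apply the same policy without duplicating it.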
sbruno
3886e43127 struct ifmediareq *ifmrp is only used in the COMPAT_FREEBSD32 parts of
ifioctl().  Move it inside the proper #ifdef.  This was throwing a valid
"Assigned but unused" warning with gcc.

Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16063
2018-07-07 13:35:06 +00:00
will
5ce23703c1 Revert r335833.
Several third-parties use at least some of these ioctls.  While it would be
better for regression testing if they were used in base (or at least in the
test suite), it's currently not worth the trouble to push through removal.

Submitted by:	antoine, markj
2018-07-04 03:36:46 +00:00
mmacy
14de8a2820 epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
will
af6017a22f pf: remove unused ioctls.
Several ioctls are unused in pf, in the sense that no base utility
references them.  Additionally, a cursory review of pf-based ports
indicates they're not used elsewhere either.  Some of them have been
unused since the original import.  As far as I can tell, they're also
unused in OpenBSD.  Finally, removing this code removes the need for
future pf work to take them into account.

Reviewed by:		kp
Differential Revision:	https://reviews.freebsd.org/D16076
2018-07-01 01:16:03 +00:00
ae
fd52110019 Add NULL pointer check.
The encap_lookup_t method can be invoked by the IP encap subsystem even if
no gif/gre/me interfaces exist. Hash tables are allocated on demand,
when the first interface is created. So, check for a NULL pointer before
accessing the hash table.

PR:		229378
2018-06-28 11:39:27 +00:00
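The situation described here, a lookup that can run before a lazily allocated table exists, looks roughly like the sketch below. The names struct entry, table, table_size and lookup_entry() are hypothetical and stand in for the driver's real hash tables.

```c
#include <stddef.h>

struct entry {
	int		 key;
	struct entry	*next;
};

/* Hypothetical on-demand table: NULL until the first interface is created. */
static struct entry **table;
static size_t table_size;

static struct entry *
lookup_entry(int key)
{
	struct entry *e;

	if (table == NULL)	/* no interface exists yet: nothing to match */
		return (NULL);
	for (e = table[(unsigned)key % table_size]; e != NULL; e = e->next) {
		if (e->key == key)
			return (e);
	}
	return (NULL);
}
```

Without the early return, the first lookup before any interface is created would dereference a NULL table pointer.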
ae
377f86ae2b Move BPFIF_* macro definitions into .c file, where struct bpf_if is
declared.

They are only used in this file and there is no need to export them via
bpfdesc.h.
2018-06-19 10:34:45 +00:00
erj
a5400f53b1 iflib: Style fixes
MFC after:	1 week
2018-06-18 17:27:43 +00:00
marius
6802d9dfbc Assorted fixes to MSI-X/MSI/INTx setup in iflib(9):
- In iflib_msix_init(), an attempt is made to work around VMMs with
  broken MSI-X activation by manually enabling PCIM_MSIXCTRL_MSIX_ENABLE
  before calling pci_alloc_msix(9). Apart from constituting a layering
  violation, this has the problem of leaving PCIM_MSIXCTRL_MSIX_ENABLE
  enabled when falling back to MSI or INTx when e. g. MSI-X is
  blacklisted and initially also when disabled via hw.pci.enable_msix.
  The latter in turn was incorrectly worked around in r325166.
  Since r310806, pci(4) itself has code to deal with broken MSI-X
  handling of VMMs, so all of these workarounds in iflib(9) can go,
  fixing non-working interrupts when falling back to MSI/INTx. In
  any case, possibly further adjustments to broken MSI-X activation
  of VMMs like enabling r310806 by default in VM environments need to
  be placed into pci(4), not iflib(9). [1]
- Also remove the pci_enable_busmaster(9) call from iflib_msix_init(),
  which is already more properly invoked from iflib_device_attach().
- When falling back to MSI/INTx, release the MSI-X BAR resource again.
- When falling back to INTx, ensure scctx->isc_vectors is set to 1 and
  not to something higher from a device with more than one MSI message
  supported.
- Make the nearby ring_state(s) stuff (static) const.

Discussed with:	jhb at BSDCan 2018 [1]
Reviewed by:	imp, jhb
Differential Revision:	https://reviews.freebsd.org/D15729
2018-06-17 20:33:02 +00:00
ae
3d1b3c6fd6 Fix typo.
Reported by:	rpokala
2018-06-16 19:21:09 +00:00
ae
a58623ba71 Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9).
Using rwlock with multiqueue NICs for IP forwarding at high pps
produces high lock contention and is inefficient. Rmlock fits such
workloads better.

Reviewed by:	melifaro, olivier
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15789
2018-06-16 08:26:23 +00:00
ae
b4d3b30b6c Add missing BPF_MTAP2() for outbound packets. 2018-06-14 15:04:30 +00:00
ae
8020fef9d7 Convert if_me(4) driver to use encap_lookup_t method and be lockless on
data path.
2018-06-14 14:53:24 +00:00
jtl
8222f5cb7c Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable.  There are a few exceptions.  For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9).  The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.

(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory.  This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory.  This change makes the behavior consistent.)

This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified.  After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.

Allocations that do need executable memory have various choices.  They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interface to obtain executable pages.

Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.

PR:		228927
Reviewed by:	alc, kib, markj, jhb (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
ae
e6c79fbed1 Rework if_gre(4) to use encap_lookup_t method to speedup lookup
of needed interface when many gre interfaces are present.

Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations. Use hash table to
speedup lookup of needed softc.
2018-06-13 11:11:33 +00:00
jtl
a0a72815a8 Fix a memory leak for the BIOCSETWF ioctl on kernels with the BPF_JITTER
option.

The BPF code was creating a compiled filter in the common filter-creation
path.  However, BPF only uses compiled filters in the read direction.
When creating a write filter, the common filter-creation code was
creating an unneeded write filter and leaking the memory used for that.

MFC after:	2 weeks
Sponsored by:	Netflix
2018-06-11 23:32:06 +00:00
ae
4764682061 Explicitly change the link state when we assign an address.
Since we are setting IFF_UP flag on SIOCSIFADDR, it is possible that
the link state information is still not initialized properly afterwards.
This leads to problems with routing, since now interface has
IFCAP_LINKSTATE capability and a route is considered as working only
when interface's link state is in LINK_STATE_UP (see RT_LINK_IS_UP()
macro).

Reported by:	Marek Zarychta
MFC after:	3 days
2018-06-09 09:57:14 +00:00
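The link-state rule that 4764682061 relies on can be sketched in userspace C. This is only an illustration of the predicate described above: struct iface, CAP_LINKSTATE, the LS_* values and link_usable() are hypothetical stand-ins, not the kernel's struct ifnet, IFCAP_LINKSTATE, LINK_STATE_* or RT_LINK_IS_UP().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's link states and capability bit. */
enum link_state { LS_UNKNOWN, LS_DOWN, LS_UP };
#define CAP_LINKSTATE 0x1u

struct iface {
    unsigned caps;          /* capability bits */
    enum link_state link;   /* last reported link state */
};

/*
 * A route through ifp is treated as working when the interface does not
 * report link state at all, or when it reports the link as up.
 */
static bool
link_usable(const struct iface *ifp)
{
    assert(ifp != NULL);
    if ((ifp->caps & CAP_LINKSTATE) == 0)
        return true;
    return ifp->link == LS_UP;
}
```

Under this rule an interface that advertises the capability but still reports LS_UNKNOWN is not usable for routing, which is why the link state has to be set explicitly when the address is assigned.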
shurd
f7f3ce47d0 Remove tx task spinning added in r333686
This caused issues with PASTE.  Just remove the reschedule since the DELAY()
should be enough for use cases such as pkt-gen which were failing before the
change.

Reported by:	Michio Honda
Sponsored by:	Limelight Networks
2018-06-08 21:49:19 +00:00
mjg
08fabf55c9 uma: fix up r334824
Turns out there is code which ends up passing M_ZERO to counters.
Since counters zero unconditionally on their own, just drop the
flag in that place.
2018-06-08 05:40:36 +00:00
mmacy
69a922f7ab rtentry_zinit: don't blindly pass through M_ZERO to counter alloc 2018-06-08 05:17:06 +00:00
erj
0ac17051d5 iflib: Record TCP checksum info in iflib when TCP checksum is requested
ixl(4) (when it switches over to using iflib) devices need the TCP header
length in order to do TCP checksum offload.

Reviewed by:	gallatin@, shurd@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15558
2018-06-07 13:03:07 +00:00
ae
d1ee857bcf Rework if_gif(4) to use new encap_lookup_t method to speedup lookup
of needed interface when many gif interfaces are present.

Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations.
Use hash table to speedup lookup of needed softc. Interfaces
with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST.
Sysctl net.link.gif.parallel_tunnels is removed. The removal was planned
16 years ago, and actually it could work only for outbound direction.
Each protocol, that can be handled by if_gif(4) interface is registered
by separate encap handler, this helps avoid invoking the handler
for unrelated protocols (GRE, PIM, etc.).

This change dramatically improves performance when many gif(4)
interfaces are used.

Sponsored by:	Yandex LLC
2018-06-05 21:24:59 +00:00
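The softc hash lookup that d1ee857bcf describes can be sketched roughly as below. This is a single-threaded userspace illustration only: struct tunnel, NBUCKETS, tunnel_hash() and the chaining scheme are assumptions made for the example, and the epoch(9)/CK_LIST protection used by the real driver is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64   /* hypothetical table size; must be a power of two */

/* Hypothetical stand-in for a tunnel softc keyed by its outer endpoints. */
struct tunnel {
    uint32_t src, dst;      /* outer IPv4 addresses */
    struct tunnel *next;    /* chain within one bucket */
};

static struct tunnel *buckets[NBUCKETS];

/* Mix both endpoints into a bucket index (illustrative mixing only). */
static unsigned
tunnel_hash(uint32_t src, uint32_t dst)
{
    uint32_t h = src ^ (dst * 2654435761u);

    return ((h >> 16) ^ h) & (NBUCKETS - 1);
}

/* Link a tunnel at the head of its bucket. */
static void
tunnel_insert(struct tunnel *t)
{
    assert(t != NULL);
    unsigned b = tunnel_hash(t->src, t->dst);

    t->next = buckets[b];
    buckets[b] = t;
}

/* Walk only the one bucket that can contain the (src, dst) pair. */
static struct tunnel *
tunnel_lookup(uint32_t src, uint32_t dst)
{
    struct tunnel *t;

    for (t = buckets[tunnel_hash(src, dst)]; t != NULL; t = t->next)
        if (t->src == src && t->dst == dst)
            return t;
    return NULL;
}
```

With many interfaces the per-packet cost depends on the length of one bucket chain rather than on the total number of tunnels.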
ae
dfbd18b5fe Rework IP encapsulation handling code.
Currently it has several disadvantages:
- it uses single mutex to protect internal structures. It is used by
  data- and control- path, thus there is no parallelism at all.
- it uses single list to keep encap handlers for both INET and INET6
  families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw),
  that isn't used by code in the source tree.
- matches are prioritized and when many tunneling interfaces are
  registered, encapcheck handler of each interface is invoked for each
  packet. The search takes O(n) for n interfaces. All this work is done
  with exclusive lock held.

What this patch includes:
- the datapath is converted to be lockless using epoch(9) KPI.
- struct encaptab now linked using CK_LIST.
- all unused fields removed from struct encaptab. Several new fields
  added: min_length is the minimum packet length that the encapsulation
  handler expects to see; exact_match is the maximum number of bits that
  an encapsulation handler can return when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists;
- added new "encap_lookup_t" method, that will be used later. It is
  targeted to speedup lookup of needed interface, when gif(4)/gre(4) have
  many interfaces.
- the need to use protosw structure is eliminated. The only pr_input
  method was used from this structure, so I don't see the need to keep
  using it.
- encap_input_t method changed to avoid using mbuf tags to store softc
  pointer. Now it is passed directly through encap_input_t method.
  encap_getarg() function is removed.
- all sockaddr structures and code that uses them removed. We don't have
  any code in the tree that uses them. All consumers use encap_attach_func()
  method, that relies on invoking of encapcheck() to determine the needed
  handler.
- introduced struct encap_config, it contains parameters of encap handler
  that is going to be registered by encap_attach() function.
- encap handlers are stored in lists ordered by exact_match value, thus
  handlers that need more bits to match will be checked first, and if
  encapcheck method returns exact_match value, the search will be stopped.
- all current consumers changed to use new KPI.

Reviewed by:	mmacy
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15617
2018-06-05 20:51:01 +00:00
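The exact_match ordering described in dfbd18b5fe can be sketched as follows. It is a userspace illustration under stated assumptions: struct handler, handler_insert(), handler_lookup() and the match* callbacks are hypothetical names, not the kernel's struct encaptab or encap KPI, and the sketch assumes a callback never returns more than its own exact_match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered encap handler. */
struct handler {
    int exact_match;                /* most bits this handler can match */
    int (*check)(const void *pkt);  /* matched bits, or 0 for no match */
    struct handler *next;
};

/* Example callbacks used only to exercise the list. */
static int match32(const void *pkt) { (void)pkt; return 32; }
static int match64(const void *pkt) { (void)pkt; return 64; }
static int match_none(const void *pkt) { (void)pkt; return 0; }

/* Keep the list sorted by exact_match, largest first. */
static void
handler_insert(struct handler **head, struct handler *h)
{
    assert(head != NULL && h != NULL);
    while (*head != NULL && (*head)->exact_match >= h->exact_match)
        head = &(*head)->next;
    h->next = *head;
    *head = h;
}

/* Return the best match, stopping as soon as a handler matches exactly. */
static struct handler *
handler_lookup(struct handler *head, const void *pkt)
{
    struct handler *best = NULL;
    int best_prio = 0;

    for (struct handler *h = head; h != NULL; h = h->next) {
        int prio = h->check(pkt);

        if (prio <= best_prio)
            continue;
        best = h;
        best_prio = prio;
        /* Later entries have a smaller or equal exact_match. */
        if (prio == h->exact_match)
            break;
    }
    return best;
}
```

Because the list is sorted, a handler that reports its full exact_match cannot be beaten by anything further down the list, so the walk can end there instead of invoking every remaining callback.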
mmacy
3fe03791ac Reduce overhead of entropy collection
- move harvest mask check inline
- move harvest mask to frequently_read out of actively
  modified cache line
- disable ether_input collection and describe its limitations
  in NOTES

Typically entropy collection in ether_input was stirring zero
in to the entropy pool while at the same time greatly reducing
max pps. This indicates that perhaps we should more closely
scrutinize how much entropy we're getting from a given source
as well as what our actual entropy collection needs are for
seeding Yarrow.

Reviewed by: cem, gallatin, delphij
Approved by: secteam
Differential Revision: https://reviews.freebsd.org/D15526
2018-05-31 21:53:07 +00:00
hselasky
34e8800cc7 Re-apply r190640.
- Restore local change to include <net/bpf.h> inside pcap.h.
This fixes ports build problems.
- Update local copy of dlt.h with new DLT types.
- Revert no longer needed <net/bpf.h> includes which were added
as part of r334277.

Suggested by:	antoine@, delphij@, np@
MFC after:	3 weeks
Sponsored by:	Mellanox Technologies
2018-05-31 09:11:21 +00:00
mmacy
a5d5b1e5b9 if_setlladdr: don't call ioctl in epoch context
PR: 228612
Reported by: markj
2018-05-30 21:46:10 +00:00
kp
fbe5f2b7e0 pf: Add missing include statement
rmlocks require <sys/lock.h> as well as <sys/rmlock.h>.
Unbreak mips build.
2018-05-30 12:40:37 +00:00
kp
de7905d658 pf: Replace rwlock on PF_RULES_LOCK with rmlock
Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock.
This change improves packet processing rate in high pps environments.
Benchmarking by olivier@ shows a 65% improvement in pps.

While here, also eliminate all appearances of "sys/rwlock.h" includes since it
is not used anymore.

Submitted by:	farrokhi@
Differential Revision:	https://reviews.freebsd.org/D15502
2018-05-30 07:11:33 +00:00
shurd
40a1e4b33c iflib: mark irq allocation name parameter as constant
The *name parameter passed to iflib_irq_alloc_generic and
iflib_softirq_alloc_generic is never modified. Many places in code pass
string literals and thus should not be modified.

Mark the *name parameter as a const char * instead, so that we enforce
that the name is not modified before passing to bus_describe_intr()

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	kmacy
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15343
2018-05-29 21:56:39 +00:00
mmacy
3b32e27457 iflib: hold context lock across detach for drivers that need it 2018-05-29 18:03:43 +00:00
mmacy
fd829508af rt_getifa_fib: don't use ifa but info->rti_ifa
Reported by:	kp
2018-05-29 07:14:57 +00:00
mmacy
722df2d2de route: fix missed ref adds
- ensure that we bump the ifa ref whenever we add a reference
- defer freeing epoch protected references until after the if_purgeaddrs
  loop
2018-05-29 00:53:53 +00:00
erj
701350ae28 iflib: Add new shared flag: IFLIB_ADMIN_ALWAYS_RUN
ixl(4)'s nvmupdate utility expects the nvmupdate process to run
while the interface is down; these nvm update commands use the
admin queue, so the admin queue needs to be able to generate
interrupts and be processed while the interface is down.

So add a flag that ixl(4) sets that lets the entire admin task
run even when the interface is marked down/IFF_DRV_RUNNING isn't set.

With this change, nvmupdate should function like it did pre-iflib.

Reviewed by:	gallatin@, sbruno@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15575
2018-05-26 00:46:08 +00:00
mmacy
bef06dbd7a rtrequest1_fib: we need to always bump the ifaddr refcount when we take a reference from
an rtentry. r334118 introduced a case when this was not done.

While we're here make the intent more obvious by moving the refcount
bump down to when we know we'll actually need it.

Reported by:	markj
2018-05-25 19:48:26 +00:00
mmacy
c937b516d8 CK: update consumers to use CK macros across the board
r334189 changed the fields to have names distinct from those in queue.h
in order to expose the oversights as compile time errors
2018-05-24 23:21:23 +00:00
mmacy
710b4829e5 if_delgroups: add missed unlock introduced by r334118 2018-05-24 17:54:08 +00:00
mmacy
ecd6e9d307 UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
pizzamig
fbb6c8b691 Improve MAC address uniqueness on if_epair(4).
As reported in PR184149, it can happen that epair devices can have the same
MAC address.
This solution is based on a 32-bit hash, obtained combining the if_index of
the a interface and the hostid.
If the hostid is zero, a random number is used.

PR:		184149
Reviewed by:	wollman, eugen
Approved by:	cognet
Differential Revision:	https://reviews.freebsd.org/D15329
2018-05-23 13:10:57 +00:00
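The address derivation described in fbb6c8b691 can be sketched as below. The commit message does not say which 32-bit hash is used, so this example swaps in FNV-1a; the byte layout of the resulting address, the 0x0a/0x0b suffix, and the names fnv1a32() and epair_mac() are assumptions made for illustration, and rand() stands in for a kernel random source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 32-bit FNV-1a over a byte buffer (stand-in hash, see note above). */
static uint32_t
fnv1a32(const uint8_t *buf, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Fill mac[6] for one end of a pair. ifindex_a is the if_index of the
 * "a" interface; side is 0 for the "a" end and nonzero for the "b" end.
 */
static void
epair_mac(uint8_t mac[6], uint32_t ifindex_a, uint32_t hostid, int side)
{
    uint8_t key[8];
    uint32_t h;

    assert(mac != NULL);
    if (hostid == 0)
        hostid = (uint32_t)rand();  /* no hostid: fall back to a random value */
    memcpy(key, &ifindex_a, 4);
    memcpy(key + 4, &hostid, 4);
    h = fnv1a32(key, sizeof(key));

    mac[0] = 0x02;                  /* locally administered, unicast */
    mac[1] = (uint8_t)(h >> 24);
    mac[2] = (uint8_t)(h >> 16);
    mac[3] = (uint8_t)(h >> 8);
    mac[4] = (uint8_t)h;
    mac[5] = side ? 0x0b : 0x0a;    /* distinguishes the two ends */
}
```

Mixing the hostid into the hash is what keeps two hosts that both create their first epair from ending up with the same address on a shared segment.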
markj
c354f4f2f6 Simplify lagg_input().
No functional change intended.

MFC after:	2 weeks
2018-05-22 15:35:38 +00:00
mmacy
8e61308048 ck: simplify interface with libkvm consumers by defining ck_queue types
as their queue.h equivalents if !_KERNEL
2018-05-21 01:53:23 +00:00
mmacy
da84b7fa5a net: fix uninitialized variable warning 2018-05-19 19:00:04 +00:00
mmacy
d814acfa7c mp_ring: fix i386
Even though 64-bit atomics are supported on i386 there are panics
indicating that the code does not work correctly there. Switch
to mutex based variant (and fix that while we're here).

Reported by:	pho, kib
2018-05-19 16:44:12 +00:00
mmacy
0db6398617 net: fix set but not used 2018-05-19 05:27:49 +00:00
mmacy
7aeac9ef18 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
mmacy
9f0d447325 epoch(9): allocate net epochs earlier in boot 2018-05-18 18:48:00 +00:00
mmacy
2f5a774893 epoch: move epoch variables to read mostly section 2018-05-18 17:58:15 +00:00
emaste
f0cc1a044c Use NULL for SYSINIT's last arg, which is a pointer type
Sponsored by:	The FreeBSD Foundation
2018-05-18 17:58:09 +00:00
mmacy
a48d80f193 epoch(9): Make epochs non-preemptible by default
There are risks associated with waiting on a preemptible epoch section.
Change the name to make them not be the default and document the issue
under CAVEATS.

Reported by:	markj
2018-05-18 17:29:43 +00:00
mmacy
aac2a8081e epoch: add non-preemptible "critical" variant
adds:
- epoch_enter_critical() - can be called inside a different epoch,
  starts a section that will not acquire any MTX_DEF mutexes or do
  anything that might sleep.
- epoch_exit_critical() - corresponding exit call
- epoch_wait_critical() - wait variant that is guaranteed that any
  threads in a section are running.
- epoch_global_critical - an epoch_wait_critical safe epoch instance

Requested by:   markj
Approved by:	sbruno
2018-05-18 01:52:51 +00:00
mmacy
00950b6e0c Fix !netmap build post r333686
Approved by:	sbruno
2018-05-16 22:25:47 +00:00
shurd
df10b02879 Work around lack of TX IRQs in iflib for netmap
When poll() is called via netmap, txsync is initially called,
and if there are no available buffers to reclaim, it waits for the driver
to notify of new buffers. Since the TX IRQ is generally not used in iflib
drivers, this ends up causing a timeout.

Work around this by having the reclaim DELAY(1) if it's initially unable
to reclaim anything, then schedule the tx task, which will spin by
continuously rescheduling itself until some buffers are reclaimed. In
general, the delay is enough to allow some buffers to be reclaimed, so
spinning is minimized.

Reported by:	Johannes Lundberg <johalun0@gmail.com>
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15455
2018-05-16 21:03:22 +00:00
shurd
8b4a96b13e Replace rmlock with epoch in lagg
Use the new epoch based reclamation API. Now the hot paths will not
block at all, and the sx lock is used for the softc data.  This fixes LORs
reported where the rwlock was obtained when the sxlock was held.

Submitted by:	mmacy
Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15355
2018-05-14 20:06:49 +00:00
mmacy
0a7aab5128 iflib(9): Add support for cloning pseudo interfaces
Part 3 of many ...
The VPC framework relies heavily on cloning pseudo interfaces
(vmnics, vpc switch, vpcswitch port, hostif, vxlan if, etc).

This pulls in that piece. Some ancillary changes get pulled
in as a side effect.

Reviewed by:	shurd@
Approved by:	sbruno@
Sponsored by:	Joyent, Inc.
Differential Revision:	https://reviews.freebsd.org/D15347
2018-05-11 20:08:28 +00:00
ae
c53ab47acf Apply the change from r272770 to if_ipsec(4) interface.
It is guaranteed that if_ipsec(4) interface is used only for tunnel
mode IPsec, i.e. decrypted and decapsulated packet has its own IP header.
Thus we can consider it as new packet and clear the protocols flags.
This allows ICMP/ICMPv6 to properly handle errors that this packet may cause.

PR:		228108
MFC after:	1 week
2018-05-11 16:50:25 +00:00
mmacy
361b54f07a Allow different bridge types to coexist
if_bridge has a lot of limitations that make it scale poorly to higher data
rates. In my projects/VPC branch I leverage the bridge interface between
layers for my high speed soft switch as well as for purposes of stacking
in general.

Reviewed by:	sbruno@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15344
2018-05-11 05:00:40 +00:00