Commit Graph

4000 Commits

Author SHA1 Message Date
Andrey V. Elsukov
c6851ad04c Restore outbound packets capturing for if_gre(4). It was missed in r335048.
Also clear M_MCAST and M_BCAST flags for encapsulated datagram, since it
will have new IP header.

Approved by:	re (kib)
2018-09-17 10:10:14 +00:00
Ruslan Bukin
86c5937532 Don't mark module data as static on RISC-V.
Similar to arm64, riscv compiler uses PC-relative loads/stores,
and with static data compiler does not emit relocations.
In result, kernel module linker has nothing to fix and data accessed
from the wrong location.

Approved by:	re (gjb)
Sponsored by:	DARPA, AFRL
2018-09-12 08:05:33 +00:00
Stephen Hurd
64e6fc1379 Clean up iflib sysctls
Remove sysctls:
txq_drain_encapfail - now a duplicate of encap_txd_encap_fail
intr_link - was never incremented
intr_msix - was never incremented
rx_zero_len - was never incremented

The following were not incremented in all code-paths that apply:
m_pullups, mbuf_defrag, rxd_flush, tx_encap, rx_intr_enables, tx_frees,
encap_txd_encap_fail.

Fixes:
Replace the broken collapse_pkthdr() implementation with an MPASS().
fl_refills and fl_refills_large were not incremented when using netmap.

Reviewed by:	gallatin
Approved by:	re (marius)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16733
2018-09-06 18:51:52 +00:00
Bjoern A. Zeeb
d4672c61d3 Rather than duplicating the functionality of a macro after r322866
use the already existing one.  No functional changes.

Reviewed by:	karels, ae
Approved by:	re (rgrimes)
Differential Revision:	https://reviews.freebsd.org/D17004
2018-09-03 22:10:49 +00:00
Stephen Hurd
bc0e855bd9 Fix compile error due to missing parenthesis in r338372
Approved by:	re (gjb)
2018-08-29 16:21:34 +00:00
Stephen Hurd
a520f8b6fe Fix potential data corruption in iflib
The MP ring may have txq pointers enqueued.  Previously, these were
passed to m_free() when IFC_QFLUSH was set.  This patch checks for
the value and doesn't call m_free().

Reviewed by:	gallatin
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16882
2018-08-29 15:55:25 +00:00
Mark Murray
19fa89e938 Remove the Yarrow PRNG algorithm option in accordance with due notice
given in random(4).

This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.

Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.

PR:		230870
Reviewed by:	cem
Approved by:	so(delphij,gtetlow)
Approved by:	re(marius)
Differential Revision:	https://reviews.freebsd.org/D16898
2018-08-26 12:51:46 +00:00
Navdeep Parhar
47c39432b1 Unbreak VLANs after r337943.
ether_set_pcp should not be called from ether_output_frame for VLAN
interfaces -- the vid + pcp will be inserted during vlan_transmit in
that case. r337943 sets the VLAN's ifnet's if_pcp to a proper PCP value
and this led to double encapsulation (once with vid 0 and second time
with vid+pcp).

PR: 230794
Reviewed by:	kib@
Approved by:	re@ (gjb@)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D16887
2018-08-24 21:48:13 +00:00
Patrick Kelsey
249cc75fd1 Extended pf(4) ioctl interface and pfctl(8) to allow bandwidths of
2^32 bps or greater to be used.  Prior to this, bandwidth parameters
would simply wrap at the 2^32 boundary.  The computations in the HFSC
scheduler and token bucket regulator have been modified to operate
correctly up to at least 100 Gbps.  No other algorithms have been
examined or modified for correct operation above 2^32 bps (some may
have existing computation resolution or overflow issues at rates below
that threshold).  pfctl(8) will now limit non-HFSC bandwidth
parameters to 2^32 - 1 before passing them to the kernel.

The extensions to the pf(4) ioctl interface have been made in a
backwards-compatible way by versioning affected data structures,
supporting all versions in the kernel, and implementing macros that
will cause existing code that consumes that interface to use version 0
without source modifications.  If version 0 consumers of the interface
are used against a new kernel that has had bandwidth parameters of
2^32 or greater configured by updated tools, such bandwidth parameters
will be reported as 2^32 - 1 bps by those old consumers.

All in-tree consumers of the pf(4) interface have been updated.  To
update out-of-tree consumers to the latest version of the interface,
define PFIOC_USE_LATEST ahead of any includes and use the code of
pfctl(8) as a guide for the ioctls of interest.

PR:	211730
Reviewed by:	jmallett, kp, loos
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D16782
2018-08-22 19:38:48 +00:00
Eric Joyner
fe2bf351fe if_media: Add new 2.5G/5G/25G/40G/50G/100G/200G/400G media types
Upcoming Ethernet hardware will support new media types that aren't in the kernel
yet, so they are added here. These mostly include new 25G/50G/100G media types;
and this commit introduces new 200G/400G speeds and media.

Reviewed by:	hselasky@, jhb@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D16731
2018-08-22 18:19:56 +00:00
Matt Macy
77ad07b6a3 fix copy/paste error when clearing ifma flag
CID: 1395119
Reported by:	vangyzen
2018-08-21 22:59:22 +00:00
Conrad Meyer
b8e771e97a Back out r338035 until Warner is finished churning GSoC PNP patches
I was not aware Warner was making or planning to make forward progress in
this area and have since been informed of that.

It's easy to apply/reapply when churn dies down.
2018-08-19 00:46:22 +00:00
Conrad Meyer
faa319436f Remove unused and easy to misuse PNP macro parameter
Inspired by r338025, just remove the element size parameter to the
MODULE_PNP_INFO macro entirely.  The 'table' parameter is now required to
have correct pointer (or array) type.  Since all invocations of the macro
already had this property and the emitted PNP data continues to include the
element size, there is no functional change.

Mostly done with the coccinelle 'spatch' tool:

  $ cat modpnpsize0.cocci
    @normaltables@
    identifier b,c;
    expression a,d,e;
    declarer MODULE_PNP_INFO;
    @@
     MODULE_PNP_INFO(a,b,c,d,
    -sizeof(d[0]),
     e);

    @singletons@
    identifier b,c,d;
    expression a;
    declarer MODULE_PNP_INFO;
    @@
     MODULE_PNP_INFO(a,b,c,&d,
    -sizeof(d),
     1);

  $ rg -l MODULE_PNP_INFO -- sys | \
    xargs spatch --in-place --sp-file modpnpsize0.cocci

(Note that coccinelle invokes diff(1) via a PATH search and expects diff to
tolerate the -B flag, which BSD diff does not.  So I had to link gdiff into
PATH as diff to use spatch.)

Tinderbox'd (-DMAKE_JUST_KERNELS).
2018-08-19 00:22:21 +00:00
Navdeep Parhar
32a52e9e39 if_vlan(4): A VLAN always has a PCP and its ifnet's if_pcp should be set
to the PCP value in use instead of IFNET_PCP_NONE.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-08-17 01:03:23 +00:00
Navdeep Parhar
32d2623ae2 Add the ability to look up the 3b PCP of a VLAN interface. Use it in
toe_l2_resolve to fill up the complete vtag and not just the vid.

Reviewed by:	kib@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D16752
2018-08-16 23:46:38 +00:00
Matt Macy
f9be038601 Fix in6_multi double free
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
  - add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
  goes to zero OR when the interface goes away
  - decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
  - add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the incpb cleanup code simpler but opens the
  door wider still to races
  - call inp_gcmoptions synchronously after dropping the the inpcb lock

Fundamentally multicast needs a rewrite - but keep applying band-aids for now.

Tested by: kp
Reported by: novel, kp, lwhsu
2018-08-15 20:23:08 +00:00
Andrew Gallatin
5ccac9f972 lagg: allow lacp to manage the link state
Lacp needs to manage the link state itself. Unlike other
lagg protocols, the ability of lacp to pass traffic
depends not only on the lagg members having link, but also
on the lacp protocol converging to a distributing state with the
link partner.

If we prematurely mark the link as up, then we will send a
gratuitous arp (via arp_handle_ifllchange()) before the lacp
interface is capable of passing traffic. When this happens,
the gratuitous arp is lost, and our link partner may cache
a stale mac address (eg, when the base mac address for the
lagg bundle changes, due to a BIOS change re-ordering NIC
unit numbers)

Reviewed by: jtl, hselasky
Sponsored by: Netflix
2018-08-13 14:13:25 +00:00
Kristof Provost
91e0f2d200 pf: Increase default hash table size
Now that we (by default) limit the number of states to 100.000 it makse sense
to also adjust the default size of the hash table.

Based on the benchmarking results in
https://github.com/ocochard/netbenches/blob/master/Atom_C2758_8Cores-Chelsio_T540-CR/pf-states_hashsize/results/fbsd12-head.r332390/README.md
128K entries offers a good compromise between performance and memory use.

Users may still overrule this setting with the net.pf.states_hashsize and
net.pf.source_nodes_hashsize loader(8) tunables.
2018-08-05 13:54:37 +00:00
Patrick Kelsey
8f410865b8 Mark the send queue ready so ALTQ is available. 2018-08-04 01:45:17 +00:00
Andrew Turner
b6ea4c5a2a As with DPCPU_DEFINE_STATIC make VNET_DEFINE_STATIC non-static on arm64 in
modules. It also fails in the same way, we are unable to relocate static
variables as the compiler uses PC-relative loads with nothing for the
kernel linker to relocate.

Sponsored by:	DARPA, AFRL
2018-07-30 15:05:07 +00:00
Andrew Turner
cd2106eaea Ensure the DPCPU and VNET module spaces are aligned to hold a pointer.
Previously they may have been aligned to a char, leading to misaligned
DPCPU and VNET variables.

Sponsored by:	DARPA, AFRL
2018-07-30 14:25:17 +00:00
Andrew Turner
bc61d94997 As with DPCPU_DEFINE make it a compile error to use static with VNET_DEFINE.
There is the VNET_DEFINE_STATIC macro for that.
2018-07-30 12:44:44 +00:00
Patrick Kelsey
b8ca475604 ALTQ support for iflib.
Reviewed by:	jmallett, mmacy
Differential Revision:	https://reviews.freebsd.org/D16433
2018-07-25 22:46:36 +00:00
Marius Strobl
c9a49a4fd8 Since r336611, n is only used for INET in iflib_parse_header().
Reported by:	rpokala
2018-07-24 23:40:27 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Andrew Turner
fceba23f93 As with DPCPU create VNET_DEFINE_STATIC for when a variable needs to be
declaired static. This will allow us to change the definition on arm64
as it has the same issues described in r336349.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:31:16 +00:00
Eugene Grosbein
804771f553 epair(4): make sure we do not duplicate MAC addresses
in case of reused if_index.

PR:		229957
Tested by:	O. Hartmann <ohartmann@walstatt.org>
Approved by:	avg (mentor)
2018-07-23 07:11:58 +00:00
Marius Strobl
7474544bac Use the maximum of isc_tx_{nsegments,tso_segments_max} for MAX_TX_DESC.
Since r336313, TSO support for LEM-class devices is removed again as it
was before the conversion of {l,}em(4) to iflib(4) in r311849 and as a
result, isc_tx_tso_segments_max is 0 for LEM-class devices now. Thus,
inappropriate watermarks were used for this class.

This is really only a band-aid, though, because so far iflib(9) doesn't
fully take into account that DMA engines can support different maxima
of segments for transfers of TSO and non-TSO packets. For example, the
DESC_RECLAIMABLE macro is based on isc_tx_nsegments while MAX_TX_DESC
used isc_tx_tso_segments_max only. For most in-tree consumers that
doesn't make a difference as the maxima are the same for both kinds of
transfers (that is, apart from the fact that TSO may require up to 2
sentinel descriptors but also not with every MAC supported). However,
isc_tx_nsegments is 8 but isc_tx_tso_segments_max is 85 by default
with ixl(4).
2018-07-22 17:51:11 +00:00
Marius Strobl
8b8d90931d - Given that the controlling expression of the receive loop in iflib_rxeof()
tests for avail > 0, avail can never be 0 within that loop. Thus, move
  decrementing avail and budget_left into the loop and before the code which
  checks for additional descriptors having become available in case all the
  previous ones have been processed but there still is budget left so the
  latter code works as expected. [1]
- In iflib_{busdma_load_mbuf_sg,parse_header}(), remove dead stores to m
  and n respectively. [2, 3]
- In collapse_pkthdr(), ensure that m_next isn't NULL before dereferencing
  it. [4]
- Remove a duplicate assignment of segs in iflib_encap().

Reported by:	Coverity
CID:		1356027 [1], 1356047 [2], 1368205 [3], 1356028 [4]
2018-07-22 17:45:44 +00:00
Stephen Hurd
fe51d4cdfe Add knob to control tx ring abdication.
r323954 changed the mp ring behaviour when 64-bit atomics were
available to abdicate the TX ring rather than having one become a
consumer thereby running to completion on TX. The consumer of the mp
ring was then triggered in the tx task rather than blocking the TX call.
While this significantly lowered the number of RX drops in small-packet
forwarding, it also negatively impacts TX performance.

With this change, the default behaviour is reverted, causing one TX ring
to become a consumer during the enqueue call. A new sysctl,
dev.X.Y.iflib.tx_abdicate is added to control this behaviour.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16302
2018-07-20 17:45:26 +00:00
Stephen Hurd
dd7fbcf193 Improve netmap TX handling when TX IRQs are not used/supported
Use the timer to poll for TX completions when there are
outstanding TX slots. Track when the last driver timer was called
to prevent overcalling it. Also clean up some kring vs NIC ring
usage.

Reviewed by:	marius, Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16300
2018-07-20 17:24:45 +00:00
Andrey V. Elsukov
acf673edf0 Move invoking of callout_stop(&lle->lle_timer) into llentry_free().
This deduplicates the code a bit, and also implicitly adds missing
callout_stop() to in[6]_lltable_delete_entry() functions.

PR:		209682, 225927
Submitted by:	hselasky (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4605
2018-07-17 11:33:23 +00:00
Marius Strobl
7f87c0406d Assorted TSO fixes for em(4)/iflib(9) and dead code removal:
- Ever since the workaround for the silicon bug of TSO4 causing MAC hangs
  was committed in r295133, CSUM_TSO always got disabled unconditionally
  by em(4) on the first invocation of em_init_locked(). However, even with
  that problem fixed, it turned out that for at least e. g. 82579 not all
  necessary TSO workarounds are in place, still causing MAC hangs even at
  Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled
  in r323292 (r323293 for stable/10) for the EM-class by default, allowing
  users to turn it on if it happens to work with their particular EM MAC
  in a Gigabit-only environment.
  In head, the TSO workaround for speeds other than Gigabit was lost with
  the conversion to iflib(9) in r311849 (possibly along with another one
  or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4
  got enabled by default again, causing device hangs. Therefore, change the
  default for this hardware class back to have TSO4 off, allowing users
  to turn it on manually if it happens to work in their environment as
  we do in stable/{10,11}. An alternative would be to add a whitelist of
  EM-class devices where TSO4 actually is reliable with the workarounds in
  place, but given that the advantage of TSO at Gigabit speed is rather
  limited - especially with the overhead of these workarounds -, that's
  really not worth it. [1]
  This change includes the addition of an isc_capabilities to struct
  if_softc_ctx so iflib(9) can also handle interface capabilities that
  shouldn't be enabled by default which is used to handle the default-off
  capabilities of e1000 as suggested by shurd@ and moving their handling
  from em_setup_interface() to em_if_attach_pre() accordingly.
- Although 82543 support TSO4 in theory, the former lem(4) didn't have
  support for TSO4, presumably because TSO4 is even more broken in the
  LEM-class of MACs than the later EM ones. Still, TSO4 for LEM-class
  devices was enabled as part of the conversion to iflib(9) in r311849,
  causing device hangs. So revert back to the pre-r311849 behavior of
  not supporting TSO4 for LEM-class at all, which includes not creating
  a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2]
- In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET
  (65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO
  DMA must have a maxsize of the maximum TSO size plus the size of a
  VLAN header for software VLAN tagging. The iflib(9) converted em(4),
  thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE
  in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET
  in em_setup_interface() (apparently, left-over from pre-iflib(9)
  times). So remove the later and correct iflib(9) to correctly cap
  the maximum TSO size reported to the stack at IP_MAXPACKET. While at
  it, let iflib(9) use if_sethwtsomax*().
  This change includes the addition of isc_tso_max{seg,}size DMA engine
  constraints for the TSO DMA tag to struct if_shared_ctx and letting
  iflib_txsd_alloc() automatically adjust the maxsize of that tag in case
  IFCAP_VLAN_MTU is supported as requested by shurd@.
- Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet
  header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9)
  so adjustment is automatically done in case IFCAP_VLAN_MTU is supported.
  As a consequence, this adjustment now is also done in case of bnxt(4)
  which missed it previously.
- Move the reduction of the maximum TSO segment count reported to the
  stack by the number of m_pullup(9) calls (which in the worst case,
  can add another mbuf and, thus, the requirement for another DMA
  segment each) in the transmit path for performance reasons from
  em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now
  done in iflib_parse_header() rather than in the no longer existing
  em_xmit(). Moreover, this optimization applies to all drivers using
  iflib(9) and not just em(4); all in-tree iflib(9) consumers still
  have enough room to handle full size TSO packets. Also, reduce the
  adjustment to the maximum number of m_pullup(9)'s now performed in
  iflib_parse_header().
- Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9)
  in r311849 and r335338 respectively, these drivers didn't enable
  IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed
  through by lagg(4). With iflib(9), IFCAP_VLAN_HWFILTER was turned on
  by default but also lagg(4) was fixed in that regard in r203548. So
  just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling
  in {em,ixl,ixlv}_setup_interface().
- Nuke other redundant IFCAP_* setting in {em,ixl,ixlv}_setup_interface()
  which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre()
  now.
- Remove some redundant/dead setting of scctx->isc_tx_csum_flags in
  em_if_attach_pre().
- Remove some IFCAP_* duplicated either directly or indirectly (e. g.
  via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS.
- Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as
  iflib(9) adds that capability unconditionally.
- Remove some unused macros from em(4).
- Bump __FreeBSD_version as some of the above changes require the modules
  of drivers using iflib(9) to be recompiled.

Okayed by:	sbruno@ at 201806 DevSummit Transport Working Group [1]
Reviewed by:	sbruno (earlier version), erj
PR:	219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2]
Differential Revision:	https://reviews.freebsd.org/D15720
2018-07-15 19:04:23 +00:00
Kristof Provost
b895839daa pf: Fix typo in r336221
Reported by:	olivier@
2018-07-12 18:07:28 +00:00
Kristof Provost
f4cdb8aedd pf: Increate default state table size
The typical system now has a lot more memory than when pf was new, and is also
expected to handle more connections. Increase the default size of the state
table.
Note that users can overrule this using 'set limit states' in pf.conf.

From OpenBSD:
    The year is 2018.
    Mercury, Bowie, Cash, Motorola and DEC all left us.
    Just pf still has a default state table limit of 10000.
    Had! Now it's a tiny little bit more, 100k.
    lead guitar: me
    ok chorus: phessler theo claudio benno
    background school girl laughing: bob

Obtained from:	OpenBSD
2018-07-12 16:35:35 +00:00
Andrey V. Elsukov
98a8fdf6da Deduplicate the code.
Add generic function if_tunnel_check_nesting() that does check for
allowed nesting level for tunneling interfaces and also does loop
detection. Use it in gif(4), gre(4) and me(4) interfaces.

Differential Revision:	https://reviews.freebsd.org/D16162
2018-07-09 11:03:28 +00:00
Sean Bruno
2da1967762 struct ifmediareq *ifmrp is only used in the COMPAT_FREEBSD32 parts of
ifioctl().  Move it inside the proper #ifdef.  This was throwing a valid
"Assigned but unused" warning with gcc.

Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16063
2018-07-07 13:35:06 +00:00
Will Andrews
cc535c95ca Revert r335833.
Several third-parties use at least some of these ioctls.  While it would be
better for regression testing if they were used in base (or at least in the
test suite), it's currently not worth the trouble to push through removal.

Submitted by:	antoine, markj
2018-07-04 03:36:46 +00:00
Matt Macy
6573d7580b epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
Will Andrews
c1887e9f09 pf: remove unused ioctls.
Several ioctls are unused in pf, in the sense that no base utility
references them.  Additionally, a cursory review of pf-based ports
indicates they're not used elsewhere either.  Some of them have been
unused since the original import.  As far as I can tell, they're also
unused in OpenBSD.  Finally, removing this code removes the need for
future pf work to take them into account.

Reviewed by:		kp
Differential Revision:	https://reviews.freebsd.org/D16076
2018-07-01 01:16:03 +00:00
Andrey V. Elsukov
6e081509db Add NULL pointer check.
encap_lookup_t method can be invoked by IP encap subsytem even if none
of gif/gre/me interfaces are exist. Hash tables are allocated on demand,
when first interface is created. So, make NULL pointer check before
doing access to hash table.

PR:		229378
2018-06-28 11:39:27 +00:00
Andrey V. Elsukov
ca3cd72b17 Move BPFIF_* macro definitions into .c file, where struct bpf_if is
declared.

They are only used in this file and there is no need to export them via
bpfdesc.h.
2018-06-19 10:34:45 +00:00
Eric Joyner
dfae03b5b5 iflib: Style fixes
MFC after:	1 week
2018-06-18 17:27:43 +00:00
Marius Strobl
e4defe55a8 Assorted fixes to MSI-X/MSI/INTx setup in iflib(9):
- In iflib_msix_init(), VMMs with broken MSI-X activation are trying
  to be worked around by manually enabling PCIM_MSIXCTRL_MSIX_ENABLE
  before calling pci_alloc_msix(9). Apart from constituting a layering
  violation, this has the problem of leaving PCIM_MSIXCTRL_MSIX_ENABLE
  enabled when falling back to MSI or INTx when e. g. MSI-X is black-
  listed and initially also when disabled via hw.pci.enable_msix. The
  later in turn was incorrectly worked around in r325166.
  Since r310806, pci(4) itself has code to deal with broken MSI-X
  handling of VMMs, so all of these workarounds in iflib(9) can go,
  fixing non-working interrupts when falling back to MSI/INTx. In
  any case, possibly further adjustments to broken MSI-X activation
  of VMMs like enabling r310806 by default in VM environments need to
  be placed into pci(4), not iflib(9). [1]
- Also remove the pci_enable_busmaster(9) call from iflib_msix_init(),
  which is already more properly invoked from iflib_device_attach().
- When falling back to MSI/INTx, release the MSI-X BAR resource again.
- When falling back to INTx, ensure scctx->isc_vectors is set to 1 and
  not to something higher from a device with more than one MSI message
  supported.
- Make the nearby ring_state(s) stuff (static) const.

Discussed with:	jhb at BSDCan 2018 [1]
Reviewed by:	imp, jhb
Differential Revision:	https://reviews.freebsd.org/D15729
2018-06-17 20:33:02 +00:00
Andrey V. Elsukov
ae11b3829c Fix typo.
Reported by:	rpokala
2018-06-16 19:21:09 +00:00
Andrey V. Elsukov
20efcfc602 Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9).
Using of rwlock with multiqueue NICs for IP forwarding on high pps
produces high lock contention and inefficient. Rmlock fits better for
such workloads.

Reviewed by:	melifaro, olivier
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15789
2018-06-16 08:26:23 +00:00
Andrey V. Elsukov
9597ff83b8 Add missing BPF_MTAP2() for outbound packets. 2018-06-14 15:04:30 +00:00
Andrey V. Elsukov
2addcba7d5 Convert if_me(4) driver to use encap_lookup_t method and be lockless on
data path.
2018-06-14 14:53:24 +00:00
Jonathan T. Looney
0766f278d8 Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable.  There are a few exceptions.  For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9).  The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.

(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory.  This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory.  This change makes the behavior consistent.)

This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified.  After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.

Allocations that do need executable memory have various choices.  They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.

Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.

PR:		228927
Reviewed by:	alc, kib, markj, jhb (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
Andrey V. Elsukov
a5185adeb6 Rework if_gre(4) to use encap_lookup_t method to speedup lookup
of needed interface when many gre interfaces are present.

Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations. Use hash table to
speedup lookup of needed softc.
2018-06-13 11:11:33 +00:00