Commit Graph

3810 Commits

Author SHA1 Message Date
Hans Petter Selasky
fa3f256682 Disallow TUN and TAP character device IOCTLs to modify the network device
type to any value. This can cause page faults and panics due to accessing
uninitialized fields in the "struct ifnet" which are specific to the network
device type.

MFC after:	1 week
Found by:	jau@iki.fi
PR:		223767
Sponsored by:	Mellanox Technologies
2017-11-29 09:40:11 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Stephen Hurd
7274b2f6be Fix off-by-one error in bit_nclear() usage
bit_nclear() takes the bit numbers for the start and end bits, not the start
and a count.  This was resulting in memory corruption past the end of the
bitstr_t.

Sponsored by:	Limelight Networks
2017-11-20 21:57:04 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Konstantin Belousov
319a53f5ad Fix build.
Sponsored by:	The FreeBSD Foundation
2017-11-19 11:21:16 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Stephen Hurd
d27352644a Fix default numbers of iflib queue sets
The intent appears to be having one RX/TX queue set per core,
but since scctx->isc_n[tr]xqsets is set to max before calling
iflib_msix_init(), both end up being set to total number of cores.

Use ctx->ifc_sysctl_n[rt]xqs as the selected value and
scctx->isc_n[rt]xqsets as the max. This should result in what appears
to be the intended behaviour

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13096
2017-11-16 18:52:58 +00:00
Antoine Brodin
f5056d933a Do not leak control in raw_usend 2017-11-08 23:20:05 +00:00
Konstantin Belousov
3cf8254f1e Add a place for a driver to report rx timestamps in nanoseconds from
boot for the received packets.

The rcv_tstmp field overlaps the place of Ln header length indicators,
not used by received packets.  The basic pkthdr rearrangement change
in sys/mbuf.h was provided by gallatin.

There are two accompanying M_ flags: M_TSTMP means that there is the
timestamp (and it was generated by hardware).

Another flag M_TSTMP_HPREC indicates that the timestamp is
high-precision.  Practically M_TSTMP_HPREC means that hardware
provided additional precision comparing with the stamps when the flag
is not set.  E.g., for ConnectX all packets are stamped by hardware
when PCIe transaction to write out the completion descriptor is
performed, but PTP packet are stamped on port.  For Intel cards, when
PTP assist is enabled, only PTP packets are stamped in the limited
number of registers, so if Intel cards ever start support this
mechanism, they would always set M_TSTMP | M_TSTMP_HPREC if hardware
timestamp is present for the given packet.

Add IFCAP_HWRXTSTMP interface capability to indicate the support for
hardware rx timestamping, and ifconfig(8) command to toggle it.

Based on the patch by:	gallatin
Reviewed by:	gallatin (previous version), hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks (? mbuf KBI issue)
X-Differential revision:	https://reviews.freebsd.org/D12638
2017-11-07 09:29:14 +00:00
Sean Bruno
abec47242a Fix NOINET/NOINET6 build during compilation of iflib.
Reported by:	kib
2017-11-06 19:54:25 +00:00
Stephen Hurd
35e4e998d8 Only chain non-LRO mbufs when LRO is not possible
Preserve packet order between tcp_lro_rx() and if_input() to avoid
creating extra corner cases. If no packets can be LROed, combine them
into one chain for submission via if_input(). If any packet can
potentially be LROed however, retain old behaviour and call if_input()
for each packet.

This should keep the 12% improvement for small packet forwarding intact,
but mostly avoids impacting the LRO case.

Reviewed by:	cem, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12876
2017-11-06 16:23:21 +00:00
Eugene Grosbein
9f23a54e52 Allow a process to assign an IP address to local ppp interface
even if kernel routing table already has a route to the address in question
installed by some routing daemon (PR 223129).

Also, allow loopback route deletion when stopping a VIMAGE jail (PR 222647).

PR:			222647, 223129
Reviewed by:		gnn
Approved by:		avg (mentor), mav (mentor)
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D12747
2017-11-05 14:41:48 +00:00
Kristof Provost
85f330e5fa epair: Fix panic on unload
The VNET_SYSUNINIT() callback is executed after the MOD_UNLOAD. That means
that netisr_unregister() has already been called when
netisr_unregister_vnet() gets calls, leading to an assertion failure.

Restore the expected order of operations by performing everything that
was done in MOD_UNLOAD to a SYSUNINIT() (that will be called after the
VNET_SYSUNINIT()).

Differential Revision:	https://reviews.freebsd.org/D12771
2017-11-01 14:27:26 +00:00
Stephen Hurd
0b6c52b69f Preserve TSO checksum flags
r323941 incorrectly disabled TSO flags based on MTU.

Reported by:	Yuri Pankov <yuripv@gmx.com>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12880
2017-10-31 19:03:35 +00:00
Stephen Hurd
a1b799ca5b Fix PR221990 - Assertion at iflib.c:1947
ifl_pidx and ifl_credits are going out of sync in _iflib_fl_refill() as they
use different update log.  Use the same update logic for both, and add a
final call to isc_rxd_refill() to handle early exits from the loop.

PR:		221990
Reported by:	pho
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12798
2017-10-31 17:50:42 +00:00
Stephen Hurd
10e0d93811 Fix build with nodevice netmap
iru_init() was declared and used outside the DEV_NETMAP
conditional blocks, but was implemented inside one. Move the
implementation out of the DEV_NETMAP block to allow building with
netmap disabled.

Reported by:	Andrew Turner <andrew@fubar.geek.nz>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12842
2017-10-31 02:49:28 +00:00
Stephen Hurd
09b57b7f40 bnxt: HW_LRO Rx Pkt with > 32 fragments caused Crash (iflib)
Broadcom NIC with HW_LRO setting max_agg_segs >= 6 can generate Rx pkt with
64 (2^6) fragments, modify IFLIB_MAX_RX_SEGS to 64 to avoid memory
corruption / Crash.

Submitted by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed by:	shurd, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Broadcom Limited
Differential Revision:	https://reviews.freebsd.org/D12774
2017-10-30 21:20:33 +00:00
Stephen Hurd
2d873474b2 Fix PR222744 - netmap errors with iflib em driver
Fix error when refilling netmap buffers that resulted in the first
buffer of the successive passes through ifl_bus_addrs[] leaving the
first value unset (tmp_pidx started at 1, not zero after the first time
through the loop).

Leave the one unused buffer required by some NICs visible in the netmap
ring rather than hidden. There will always be a buffer in use by the
kernel now when an iflib driver is used via netmap.

Always get the netmap slot index via netmap_idx_n2k() to account for
nkr_hwofs in a consistent way.

Split shared functionality into new functions.
iru_init(): shared by _iflib_fl_refill() and netmap_fl_refill()
netmap_fl_refill(): shared by iflib_netmap_rxsync() and
iflib_netmap_rxq_init()

PR:		222744
Reported by:	Shirkdog <mshirk@daemon-security.com>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12769
2017-10-30 21:14:31 +00:00
Stephen Hurd
0fdea53954 Avoid enabling MSI-X if MSI-X is disabled globally
It was reported on the community call that with
hw.pci.enable_msix=0, iflib would enable MSI-X on the device and attempt
to use it, which caused issues. Test the sysctl explicitly and do not
enable MSI-X if it's disabled globally.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12805
2017-10-30 21:08:12 +00:00
Stephen Hurd
3429c02f82 Some cache related optimizations
1. prefetch 128 bytes of mbufs.
2. Re-order filling the pkt_info so cache stalls happen at the end
3. Define empty prefetch2cachelines() macro when the function isn't present.

Provides small performance improvments on some hardware

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12447
2017-10-23 20:50:08 +00:00
Bjoern A. Zeeb
8e94025b41 With r181803 on 2008-08-17 23:27:27Z the first VIMAGE commit went into
HEAD.  Enable VIMAGE in GENERIC kernels and some others (where GENERIC does
not exist) on HEAD.

Disable building LINT-VIMAGE with VIMAGE being default.

This should give it a lot more exposure in the run-up to 12 to help
us evaluate whether to keep it on by default or not.
We are also hoping to get better performance testing.
The feature can be disabled using nooptions.

Requested by:		many
Reviewed by:		kristof, emaste, hiren
X-MFC after:		never
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D12639
2017-10-20 21:40:59 +00:00
Andriy Voskoboinyk
c64c1f95a7 ifnet(9): split ifc_alloc_unit() (should simplify code flow)
Allocate smallest unit number from pool via ifc_alloc_unit_next()
and exact unit number (if available) via ifc_alloc_unit_specific().

While here, address possible deadlock (mentioned in PR).

PR:		217401
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D12551
2017-10-16 21:21:31 +00:00
Sepherosa Ziehau
2832cbe7d2 rss: Remove never defined UDP_IPV4_EX
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12455
2017-10-11 06:08:01 +00:00
Stephen Hurd
1c0054d261 Fix "taskqgroup_attach: setaffinity failed: 3" with iflib drivers
Improved logging added in r323879 exposed an error during
attach. We need the irq, not the rid to work correctly. em uses
shared irqs, so it will use the same irq for TX as RX. bnxt does
not use shared irqs, or TX irqs at all, so there's no need to set
the TX irq affinity.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12496
2017-10-05 14:43:30 +00:00
Conrad Meyer
916616c4c5 Add PNP metadata to more drivers
GPUs: radeonkms, i915kms
NICs: if_em, if_igb, if_bnxt

This metadata isn't used yet, but it will be handy to have later to
implement automatic module loading.

Reviewed by:	imp, mmacy
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12488
2017-09-26 23:23:58 +00:00
Stephen Hurd
1225d9da9f Have ifmp_ring_enqueue() abdicate instead of switch to a consumer
Move TX out of the enqueue() path. As a result, we need
to have ifmp_ring_check_drainage() pick up from the abdicate state.

We also need to either enqueue the TX task, or check drainage
after calling ifmp_ring_enqueue() to ensure it's sent.

This change results in a 30% small packet forwarding improvement.

Reviewed by:	olivier, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12439
2017-09-23 16:46:30 +00:00
Stephen Hurd
f4d2154e0c Make the rx budget a tunable
This allows tuning the rx budget for special load profiles
as well as more easily testing to determine sane defaults.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12445
2017-09-23 01:37:01 +00:00
Stephen Hurd
20f63282f8 Chain mbufs before passing to if_input()
Build a list of mbufs to pass to if_input() after LRO. Results in
12% small packet forwarding rate improvement.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12444
2017-09-23 01:35:14 +00:00
Stephen Hurd
c5cf217261 Some small packet performance improvements
If the packet is smaller than MTU, disable the TSO flags.
Move TCP header parsing inside the IS_TSO?() test.
Add a new IFLIB_NEED_ZERO_CSUM flag to indicate the checksums need to be zeroed before TX.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12442
2017-09-23 01:33:20 +00:00
Kristof Provost
ed9de14d2f bridge: Set module version
This ensures that the loader will not load the module if it's also built in to
the kernel.

PR:		220860
Submitted by:	Eugene Grosbein <eugen@freebsd.org>
Reported by:	Marie Helene Kvello-Aune <marieheleneka@gmail.com>
2017-09-21 14:14:01 +00:00
Stephen Hurd
d0d0ad0ae2 Fix iflib netmap RX
RXQ setup for netmap was broken because netmap_rxq_init was getting called
before IFDI_INIT - thus we ended up with ring tail pointer being reset to zero.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12140
2017-09-20 20:40:49 +00:00
Alan Cox
e9bfbb02c5 In r288122, we changed vm_page_unwire() so that it returns a Boolean
indicating whether the page's wire count transitioned to zero.  Use that
return value in zbuf_page_free() rather than checking the wire count.

MFC after:	1 week
2017-09-20 04:59:52 +00:00
Stephen Hurd
ab2e3f7958 Revert r323516 (iflib rollup)
This was really too big of a commit even if everything worked, but there
are multiple new issues introduced in the one huge commit, so it's not
worth keeping this until it's fixed.

I'll work on splitting this up into logical chunks and introduce them one
at a time over the next week or two.

Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
2017-09-16 02:41:38 +00:00
Stephen Hurd
d300df0182 Roll up iflib commits from github. This pulls in most of the work done
by Matt Macy as well as other changes which he has accepted via pull
request to his github repo at https://github.com/mattmacy/networking/

This should bring -CURRENT and the github repo into close enough sync to
allow small feature branches rather than a large chain of interdependant
patches being developed out of tree.  The reset of the synchronization
should be able to be completed on github by splitting the remaining
changes that are not yet ready into short feature branches for later
review as smaller commits.

Here is a summary of changes included in this patch:

1)  More checks when INVARIANTS are enabled for eariler problem
    detection
2)  Group Task Queue cleanups
    - Fix use of duplicate shortdesc for gtaskqueue malloc type.
      Some interfaces such as memguard(9) use the short description to
      identify malloc types, so duplicates should be avoided.
3)  Allow gtaskqueues to use ithreads in addition to taskqueues
    - In some cases, this can improve performance
4)  Better logging when taskqgroup_attach*() fails to set interrupt
    affinity.
5)  Do not start gtaskqueues until they're needed
6)  Have mp_ring enqueue function enter the ABDICATED rather than BUSY
    state.  This moves the TX to the gtaskq and allows processing to
    continue faster as well as make TX batching more likely.
7)  Add an ift_txd_errata function to struct if_txrx.  This allows
    drivers to inspect/modify mbufs before transmission.
8)  Add a new IFLIB_NEED_ZERO_CSUM for drivers to indicate they need
    checksums zeroed for checksum offload to work.  This avoids modifying
    packet data in the TX path when possible.
9)  Use ithreads for iflib I/O instead of taskqueues
10) Clean up ioctl and support async ioctl functions
11) Prefetch two cachlines from each mbuf instead of one up to 128B.  We
    often need to parse packet header info beyond 64B.
12) Fix potential memory corruption due to fence post error in
    bit_nclear() usage.
13) Improved hang detection and handling
14) If the packet is smaller than MTU, disable the TSO flags.
    This avoids extra packet parsing when not needed.
15) Move TCP header parsing inside the IS_TSO?() test.
    This avoids extra packet parsing when not needed.
16) Pass chains of mbufs that are not consumed by lro to if_input()
    rather call if_input() for each mbuf.
17) Re-arrange packet header loads to get as much work as possible done
    before a cache stall.
18) Lock the context when calling IFDI_ATTACH_PRE()/IFDI_ATTACH_POST()/
    IFDI_DETACH();
19) Attempt to distribute RX/TX tasks across cores more sensibly,
    especially when RX and TX share an interrupt.  RX will attempt to
    take the first threads on a core, and TX will attempt to take
    successive threads.
20) Allow iflib_softirq_alloc_generic() to request affinity to the same
    cpus an interrupt has affinity with.  This allows TX queues to
    ensure they are serviced by the socket the device is on.
21) Add new iflib sysctls to net.iflib:
    - timer_int - interval at which to run per-queue timers in ticks
    - force_busdma
22) Add new per-device iflib sysctls to dev.X.Y.iflib
    - rx_budget allows tuning the batch size on the RX path
    - watchdog_events Count of watchdog events seen since load
23) Fix error where netmap_rxq_init() could get called before
    IFDI_INIT()
24) e1000: Fixed version of r323008: post-cold sleep instead of DELAY
    when waiting for firmware
    - After interrupts are enabled, convert all waits to sleeps
    - Eliminates e1000 software/firmware synchronization busy waits after
      startup
25) e1000: Remove special case for budget=1 in em_txrx.c
    - Premature optimization which may actually be incorrect with
      multi-segment packets
26) e1000: Split out TX interrupt rather than share an interrupt for
    RX and TX.
    - Allows better performance by keeping RX and TX paths separate
27) e1000: Separate igb from em code where suitable
    Much easier to understand separate functions and "if (is_igb)" than
    previous tests like "if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))"

#blamebruno

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12235
2017-09-13 01:18:42 +00:00
Matt Joras
fdbf11746a Allow vlan interfaces to rx through netmap(4).
Normally after receiving a packet, a vlan(4) interface sends the packet
back through its parent interface's rx routine so that it can be
processed as an untagged frame. It does this by using the parent's
ifp->if_input. This is incompatible with netmap(4), which replaces the
vlan(4) interface's if_input with a netmap(4) hook. Fix this by using
the vlan(4) interface's ifp instead of the parent's directly.

Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	rstone
Approved by:	rstone (mentor)
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12191
2017-09-13 00:25:09 +00:00
Navdeep Parhar
aa0186bc36 Make LACP based lagg work with interfaces (like 100Gbps and 25Gbps) that
report extended media types.

lacp_aggregator_bandwidth() uses the media to determine the speed of the
interface and returns 0 for IFM_OTHER without the bits in the extended
range.

Reported by:	kbowling@
Reviewed by:	eugen_grosbein.net, mjoras@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D12188
2017-09-06 14:36:35 +00:00
Hans Petter Selasky
95ed5015ec Add support for generic backpressure indicator for ratelimited
transmit queues aswell as non-ratelimited ones.

Add the required structure bits in order to support a backpressure
indication with ratelimited connections aswell as non-ratelimited
ones. The backpressure indicator is a value between zero and 65535
inclusivly, indicating if the destination transmit queue is empty or
full respectivly. Applications can use this value as a decision point
for when to stop transmitting data to avoid endless ENOBUFS error
codes upon transmitting an mbuf. This indicator is also useful to
reduce the latency for ratelimited queues.

Reviewed by:		gallatin, kib, gnn
Differential Revision:	https://reviews.freebsd.org/D11518
Sponsored by:		Mellanox Technologies
2017-09-06 13:56:18 +00:00
Sepherosa Ziehau
0f3af0411d if: Add ioctls to get RSS key and hash type/function.
It will be needed by hn(4) to configure its RSS key and hash
type/function in the transparent VF mode in order to match VF's
RSS settings. The description of the transparent VF mode and
the RSS hash value issue are here:
https://svnweb.freebsd.org/base?view=revision&revision=322299
https://svnweb.freebsd.org/base?view=revision&revision=322485

These are generic enough to promise two independent IOCs instead
of abusing SIOCGDRVSPEC.

Setting RSS key and hash type/function is a different story,
which probably requires more discussion.

Comment about UDP_{IPV4,IPV6,IPV6_EX} were only in the patch
in the review request; these hash types are standardized now.

Reviewed by:	gallatin
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12174
2017-09-05 05:28:52 +00:00
Gleb Smirnoff
2cc3b2eec5 Do not abuse flag that is clearly marked as unused.
This creates conflicts with FreeBSD variations that may use it.  The
usage of the flag M_TOOBIG is limited to iflib queue, thus using
one of M_PROTO flags is fine.  There is no need to grab global flag.

Silence from:	kmacy, sbruno (2 weeks)
2017-08-31 23:19:18 +00:00
Sean Bruno
a969350226 Revert r323008 and its conversion of e1000/iflib to using SX locks.
This seems to be missing something on the 82574L causing NFS root mounts
to hang.

Reported by:	kib
2017-08-30 18:56:24 +00:00
Sean Bruno
e17e5b4134 Continuation of lock cleanup in e1000.
Post-cold sleep instead of DELAY when waiting for firmware.

Convert softc mutex to an SX lock.  Change all waits to sleeps
once interrupts are enabled (and it is safe to sleep).

Submitted by:	Matt Macy <matt@mattmacy.io>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12101
2017-08-30 00:20:43 +00:00
Gleb Smirnoff
dcfa556b02 Garbage collect RT_NORTREF, which is no longer in use after FLOWTABLE removal. 2017-08-24 23:08:12 +00:00
Sean Bruno
21e10b16e0 iflib: call device's if_init function during vlan initialization.
Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	shurd
Sponsored by:	Broadcom
Differential Revision:	https://reviews.freebsd.org/D12098
2017-08-23 21:49:56 +00:00
Kristof Provost
9ce40d321d bpf: Fix incorrect cleanup
Cleaning up a bpf_if is a two stage process. We first move it to the
bpf_freelist (in bpfdetach()) and only later do we actually free it (in
bpf_ifdetach()).

We cannot set the ifp->if_bpf to NULL from bpf_ifdetach() because it's
possible that the ifnet has already gone away, or that it has been assigned
a new bpf_if.
This can lead to a struct ifnet which is up, but has if_bpf set to NULL,
which will panic when we try to send the next packet.

Keep track of the pointer to the bpf_if (because it's not always
ifp->if_bpf), and NULL it immediately in bpfdetach().

PR:		213896
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11782
2017-08-16 19:40:07 +00:00
Matt Joras
d148c2a2b1 Rework vlan(4) locking.
Previously the locking of vlan(4) interfaces was not very comprehensive.
Particularly there was very little protection against the destruction of
active vlan(4) interfaces or concurrent modification of a vlan(4)
interface. The former readily produced several different panics.

The changes can be summarized as using two global vlan locks (an
rmlock(9) and an sx(9)) to protect accesses to the if_vlantrunk field of
struct ifnet, in addition to other places where global exclusive access
is required. vlan(4) should now be much more resilient to the destruction
of active interfaces and concurrent calls into the configuration path.

PR:	220980
Reviewed by:	ae, markj, mav, rstone
Approved by:	rstone (mentor)
MFC after:	4 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11370
2017-08-15 17:52:37 +00:00
Sean Bruno
5c5ca36ca2 Don't leak mbufs if clusers exceeds the number of segments. This would
leak mbufs over time causing crashes.

PR:		221202
Submitted by:	Matt Macy <matt@mattmacy.io>
Reported by:	gergely.czuczy@harmless.hu
Sponsored by:	Limelight Networks
2017-08-10 03:43:23 +00:00
Sean Bruno
18a660b344 Export IFCAP_HWSTATS so that we don't experience double stats counting
on iflib enabled devices.

PR:		220198
Submitted by:	Matt Macy <matt@mattmacy.io>
Reported by:	Ben Woods <woodsb02@freebsd.org>
Sponsored by:	Limelight Networks
2017-08-10 03:11:05 +00:00
Andrey V. Elsukov
95e8b991ca Add to if_enc(4) ability to capture packets via BPF after pfil processing.
New flag 0x4 can be configured in net.enc.[in|out].ipsec_bpf_mask.
When it is set, if_enc(4) additionally captures a packet via BPF after
invoking pfil hook. This may be useful for debugging.

MFC after:	2 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D11804
2017-08-09 12:24:07 +00:00
Andrey V. Elsukov
1a01e0e7ac Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook
from enc_hhook().

This should solve the problem when pf is used with if_enc(4) interface,
and outbound packet with existing PCB checked by pf, and this leads to
deadlock due to pf does its own PCB lookup and tries to take rlock when
wlock is already held.

Now we pass PCB pointer if it is known to the pfil hook, this helps to
avoid extra PCB lookup and thus rlock acquiring is not needed.
For inbound packets it is safe to pass NULL, because we do not held any
PCB locks yet.

PR:		220217
MFC after:	3 weeks
Sponsored by:	Yandex LLC
2017-07-31 11:04:35 +00:00
Luiz Otavio O Souza
f227f64a64 Remove the unused mutex since r273220.
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-07-28 04:41:57 +00:00