Commit Graph

3981 Commits

Author SHA1 Message Date
Stephen Hurd
5c1d8c4b73 Split out flag manipulation from general context manipulation in iflib
To avoid blocking on the context lock in the swi thread and risk potential
deadlocks, this change protects lighter weight updates that only need to
be consistent with each other with their own lock.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	shurd
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14967
2018-04-10 19:48:24 +00:00
Stephen Hurd
f422673e10 Make BPF global lock an SX
This allows NIC drivers to sleep on polling config operations.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	shurd
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14982
2018-04-10 19:42:50 +00:00
Vincenzo Maffione
4f80b14ce2 netmap: align codebase to upstream version v11.4
Changelist:
  - remove unused nkr_slot_flags
  - new nm_intr adapter callback to enable/disable interrupts
  - remove unused sysctls and document the other sysctls
  - new infrastructure to support NS_MOREFRAG for NIC ports
  - support for external memory allocator (for now linux-only),
    including linux-specific changes in common headers
  - optimizations within netmap pipes datapath
  - improvements on VALE control API
  - new nm_parse() helper function in netmap_user.h
  - various bug fixes and code clean up

Approved by:	hrs (mentor)
2018-04-09 09:24:26 +00:00
Brooks Davis
8a4a4a43f8 Remove the thread argument from ifr_buffer_*() accessors.
They are always used in a context where curthread is the correct thread.
This makes them more similar to the ifr_data_get_ptr() accessor.
2018-04-06 23:25:54 +00:00
Brooks Davis
e7fdc72e95 ifconf(): correct handling of sockaddrs smaller than struct sockaddr.
Portable programs that use SIOCGIFCONF (e.g. traceroute) assume
that each pseudo ifreq is of length MAX(sizeof(struct ifreq),
sizeof(ifr_name) + ifr_addr.sa_len).  For short sockaddrs we copied
too much from the source sockaddr resulting in a heap leak.

I believe only one such sockaddr exists (struct sockaddr_sco which
is 8 bytes) and it is unclear if such sockaddrs end up on interfaces
in practice.  If it did, the result would be an 8 byte heap leak on
current architectures.

admbugs:	869
Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	3 days
Security:	kernel heap leak
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14981
2018-04-06 20:26:56 +00:00
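A minimal userland sketch of the two rules described in e7fdc72e95. It uses hypothetical, cut-down structs (`struct fake_sockaddr`, `struct fake_ifreq`) rather than the real `struct ifreq`. Each record is `MAX(sizeof(struct ifreq), sizeof(ifr_name) + sa_len)` bytes long. The record is zeroed in full, and only `sa_len` bytes of the source address are copied, so a short sockaddr cannot leak stale heap bytes. Addresses longer than the union are clamped here for simplicity. Real `ifconf()` instead lets such records extend past `struct ifreq`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAKE_IFNAMSIZ 16

/* Hypothetical stand-in for a BSD sockaddr: sa_len is the first byte. */
struct fake_sockaddr {
	uint8_t sa_len;
	uint8_t sa_family;
	char    sa_data[14];
};

/* Hypothetical stand-in for struct ifreq: a name followed by an address. */
struct fake_ifreq {
	char                 ifr_name[FAKE_IFNAMSIZ];
	struct fake_sockaddr ifr_addr;
};

/* Length of one pseudo-ifreq record, as portable SIOCGIFCONF users assume. */
size_t
pseudo_ifreq_len(const struct fake_sockaddr *sa)
{
	size_t want = FAKE_IFNAMSIZ + (size_t)sa->sa_len;

	return (want > sizeof(struct fake_ifreq) ? want : sizeof(struct fake_ifreq));
}

/*
 * Fill one record: zero the whole struct first, then copy only sa_len bytes
 * of the source address (never sizeof(struct fake_sockaddr)).
 */
void
fill_ifreq(struct fake_ifreq *ifr, const char *name,
    const struct fake_sockaddr *sa)
{
	size_t nl = strlen(name);
	size_t n = sa->sa_len;

	memset(ifr, 0, sizeof(*ifr));
	if (nl >= sizeof(ifr->ifr_name))
		nl = sizeof(ifr->ifr_name) - 1;
	memcpy(ifr->ifr_name, name, nl);
	if (n > sizeof(ifr->ifr_addr))
		n = sizeof(ifr->ifr_addr);	/* simplification, see above */
	memcpy(&ifr->ifr_addr, sa, n);
}
```

With an 8-byte address (the `sockaddr_sco` case), bytes 8 through 15 of `ifr_addr` stay zero.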
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more, so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Kristof Provost
adfe2f6aff pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS
These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.

Limit the allocation to required size (or the user allocation, if that's
smaller). That does mean we need to do the allocation with the rules
lock held (so the number doesn't change while we're doing this), so it
can't M_WAITOK.

MFC after:	1 week
2018-04-06 15:54:30 +00:00
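The allocation rule in adfe2f6aff can be sketched in userland as follows. The helper name `alloc_clamped()` is hypothetical, and `calloc()` stands in for the kernel allocator. The item count is the smaller of what exists and what the caller asked for. The byte size is checked for overflow, as mallocarray(9) does, before anything is allocated. In the kernel this allocation would be M_NOWAIT because the rules lock is held.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate room for min(present, requested) items and
 * refuse counts whose byte size would overflow size_t.
 */
void *
alloc_clamped(size_t nitems_present, size_t nitems_user, size_t itemsize,
    size_t *nalloc, int *error)
{
	size_t n = nitems_present < nitems_user ? nitems_present : nitems_user;
	void *p;

	*nalloc = 0;
	*error = 0;
	if (n == 0)
		return (NULL);
	if (itemsize != 0 && n > SIZE_MAX / itemsize) {
		*error = EOVERFLOW;	/* mallocarray(9)-style overflow check */
		return (NULL);
	}
	/* Kernel equivalent must not sleep here (rules lock held). */
	p = calloc(n, itemsize);
	if (p == NULL) {
		*error = ENOMEM;
		return (NULL);
	}
	*nalloc = n;
	return (p);
}
```

A caller asking for a million entries when only three tables exist gets exactly three slots.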
Brooks Davis
756181b8f5 Add 32-bit compat for ioctls that take struct ifgroupreq.
Use an accessor to access ifgr_group and ifgr_groups.

Use a macro CASE_IOC_IFGROUPREQ(cmd) in place of case statements such
as "case SIOCAIFGROUP:". This avoids polluting the switch statements
with large numbers of #ifdefs.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14960
2018-04-05 22:14:55 +00:00
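The case-label macro technique from 756181b8f5 is sketched below. The command numbers, the `IOC32()` mapping and the `WITH_COMPAT32` switch are all hypothetical. The real macro and SIOC* values differ. The macro expands to one or two `case` labels depending on a compile-time option. The caller supplies the trailing colon, so each switch arm stays free of `#ifdef` blocks.

```c
/* Hypothetical native command numbers. */
#define SIOC_ADDGROUP 0x1001u
#define SIOC_DELGROUP 0x1002u

/* Hypothetical mapping from a native command to its 32-bit-layout twin. */
#define IOC32(cmd) ((cmd) + 0x1000u)

#define WITH_COMPAT32 1

#if WITH_COMPAT32
#define CASE_IOC_GROUPREQ(cmd) case IOC32(cmd): case (cmd)
#else
#define CASE_IOC_GROUPREQ(cmd) case (cmd)
#endif

int
classify(unsigned int cmd)
{
	switch (cmd) {
	CASE_IOC_GROUPREQ(SIOC_ADDGROUP):	/* matches both layouts */
		return (1);
	CASE_IOC_GROUPREQ(SIOC_DELGROUP):
		return (2);
	default:
		return (0);
	}
}
```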
Brooks Davis
2443045f30 ifconf(): Always zero the whole struct ifreq.
The previous split of zeroing ifr_name and ifr_addr separately is safe
on current architectures, but would be unsafe if pointers were larger
than 8 bytes. Combining the zeroing adds no real cost (a few
instructions) and makes the security property easier to verify.

Reviewed by:	kib, emaste
Obtained from:	CheriBSD
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14912
2018-04-05 21:58:28 +00:00
Vincenzo Maffione
46023447b6 netmap: align if_ptnet guest driver to the upstream code (commit 0e15788)
The change upgrades the driver to use the split Communication Status
Block (CSB) format. In this way the variables written by the guest
and read by the host are allocated in a different cacheline than
the variables written by the host and read by the guest; this is
needed to avoid cache thrashing.

Approved by:	hrs (mentor)
2018-04-04 21:31:12 +00:00
Brooks Davis
8708f1bdaf Document and enforce assumptions about struct (in6_)ifreq.
- The two types must be type-punnable for shared members of ifr_ifru.
  This allows compatibility accessors to be shared.

- There must be no padding gap between ifr_name and ifr_ifru.  This is
  assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be
  broadly portable.  This is true for all current architectures, but very
  large (256-bit) fat-pointers could violate this invariant.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14910
2018-03-30 21:38:53 +00:00
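One way to enforce the two invariants in 8708f1bdaf at compile time is with C11 `_Static_assert`. The kernel commonly uses CTASSERT() for the same purpose. The structs below are hypothetical, cut-down versions of `struct ifreq` and `struct in6_ifreq`, not the real definitions. The first check rejects a padding gap between the name and the union. The other two confirm that shared union members sit at the same offset, which is what makes the types punnable.

```c
#include <stddef.h>
#include <stdint.h>

#define NAMESZ 16

/* Hypothetical request structs sharing the start of their union. */
struct req {
	char ifr_name[NAMESZ];
	union {
		uint32_t ifru_flags;
		void    *ifru_data;
	} ifr_ifru;
};

struct req6 {
	char ifr_name[NAMESZ];
	union {
		uint32_t ifru_flags;
		void    *ifru_data;
		uint8_t  ifru_addr6[28];
	} ifr_ifru;
};

/* No padding gap between the name and the union. */
_Static_assert(offsetof(struct req, ifr_ifru) == NAMESZ,
    "gap between ifr_name and ifr_ifru");
/* Shared members must line up so one accessor can serve both types. */
_Static_assert(offsetof(struct req, ifr_ifru.ifru_flags) ==
    offsetof(struct req6, ifr_ifru.ifru_flags), "ifru_flags offset differs");
_Static_assert(offsetof(struct req, ifr_ifru.ifru_data) ==
    offsetof(struct req6, ifr_ifru.ifru_data), "ifru_data offset differs");
```

On an architecture whose pointers needed 32-byte alignment, the first assertion would fail the build instead of silently breaking tcpdump.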
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command definitions are required
as struct ifreq is the same size).  This is believed to be sufficient to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Brooks Davis
69f0fecbd6 Remove infrastructure for token-ring networks.
Reviewed by:	cem, imp, jhb, jmallett
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14875
2018-03-28 23:33:26 +00:00
Brooks Davis
38d958a647 Improve copy-and-pasted versions of SIOCGIFADDR.
The original implementation used a reference to ifr_data and a cast to
do the equivalent of accessing ifr_addr. This was copied multiple
times since 1996.

Approved by:	kib
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14873
2018-03-27 20:51:49 +00:00
Brooks Davis
f8f65519d2 Fix a whitespace bug missed in refactoring prior to r331641.
MFC with:	r331641
2018-03-27 18:55:39 +00:00
Brooks Davis
86d2ef167a Fix access to ifru_buffer on freebsd32.
Make all kernel accesses to ifru_buffer go via access functions
which take the process ABI into account and use an appropriate union
to access members in the correct place in struct ifreq.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14846
2018-03-27 18:26:50 +00:00
Konstantin Belousov
f137973487 Allow to specify PCP on packets not belonging to any VLAN.
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be
considered as untagged, and only PCP and DEI values from the VLAN tag
are meaningful.  See for instance
https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html.

Make it possible to specify PCP value for outgoing packets on an
ethernet interface.  When PCP is supplied, the tag is appended, VLAN
id set to 0, and PCP is filled by the supplied value.  The code to do
VLAN tag encapsulation is refactored from the if_vlan.c and moved into
if_ethersubr.c.

Drivers might have issues with filtering VID 0 packets on
receive.  This bug should be fixed for each driver.

Reviewed by:	ae (previous version), hselasky, melifaro
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D14702
2018-03-27 15:29:32 +00:00
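The priority tag described in f137973487 follows the standard 802.1Q layout. The 16-bit Tag Control Information (TCI) is PCP (3 bits), DEI (1 bit) and VID (12 bits), preceded by TPID 0x8100. The sketch below builds such a tag with VID 0 so that only the PCP carries meaning. The function names are illustrative, not the if_ethersubr.c API.

```c
#include <stdint.h>

#define TPID_8021Q 0x8100u	/* 802.1Q tag protocol identifier */

/* Build a TCI: PCP(3) | DEI(1) | VID(12). */
uint16_t
vlan_tci(unsigned pcp, unsigned dei, unsigned vid)
{
	return ((uint16_t)(((pcp & 0x7u) << 13) | ((dei & 0x1u) << 12) |
	    (vid & 0xfffu)));
}

/* Priority-tagged frame: VID 0 means "untagged"; only PCP is meaningful. */
uint16_t
priority_tag(unsigned pcp)
{
	return (vlan_tci(pcp, 0, 0));
}

/* Serialize TPID and TCI in network byte order into a 4-byte buffer. */
void
write_tag(uint8_t out[4], uint16_t tci)
{
	out[0] = (uint8_t)(TPID_8021Q >> 8);
	out[1] = (uint8_t)(TPID_8021Q & 0xff);
	out[2] = (uint8_t)(tci >> 8);
	out[3] = (uint8_t)(tci & 0xff);
}
```

For PCP 5 the tag bytes are 81 00 a0 00.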
Mark Johnston
18628b74bd Clamp IFLIB_RX_COPY_THRESH to MHLEN in iflib_rxd_pkt_get().
If one has added fields to struct mbuf such that MHLEN is smaller than
this threshold (128), iflib_rxd_pkt_get() may otherwise overrun the
internal mbuf buffer while copying.

Reviewed by:	mmacy
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14843
2018-03-25 23:23:19 +00:00
Kristof Provost
effaab8861 netpfil: Introduce PFIL_FWD flag
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by:	ae, kevans
Differential Revision:	https://reviews.freebsd.org/D13715
2018-03-23 16:56:44 +00:00
Alexander V. Chernikov
b2b7ca49dc Use count(9) api for the bpf(4) statistics.
Currently each bpf descriptor uses u64 variables to maintain its counters.
On interfaces with high packet rate this leads to unnecessary contention
and inaccurate reporting.

PR:		kern/205320
Reported by:	elofu17 at hotmail.com
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14726
2018-03-20 22:57:06 +00:00
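The idea behind switching to counter(9) (counter_u64_add() and counter_u64_fetch()) is that each CPU updates only its own slot, and readers sum all slots. The userland sketch below is only illustrative. The CPU index is passed explicitly, the CPU count is fixed, and each slot is padded to an assumed 64-byte cache line to reduce false sharing. The real counter(9) obtains the current CPU itself and manages its own per-CPU memory.

```c
#include <stdint.h>
#include <string.h>

#define NCPU_SKETCH 4	/* hypothetical CPU count */

struct pcpu_counter {
	struct {
		uint64_t v;
		char     pad[64 - sizeof(uint64_t)];	/* assumed cache-line size */
	} slot[NCPU_SKETCH];
};

void
pcpu_counter_zero(struct pcpu_counter *c)
{
	memset(c, 0, sizeof(*c));
}

/* Writer touches only its own CPU's slot: no shared hot variable. */
void
pcpu_counter_add(struct pcpu_counter *c, unsigned cpu, uint64_t inc)
{
	c->slot[cpu % NCPU_SKETCH].v += inc;
}

/* Reader sums every slot to get the total. */
uint64_t
pcpu_counter_fetch(const struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (unsigned i = 0; i < NCPU_SKETCH; i++)
		sum += c->slot[i].v;
	return (sum);
}
```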
Alexander V. Chernikov
1435dcd94f Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.
Current arp/nd code relies on the feedback from the datapath indicating
 that the entry is still used. This mechanism is incorporated into the
 arpresolve()/nd6_resolve() routines. After the inpcb route cache
 introduction, the packet path for the locally-originated packets changed,
 passing cached lle pointer to the ether_output() directly. This resulted
 in the arp/ndp entry expiring each time exactly after the configured max_age
 interval. During the small window between the ARP/NDP request and reply
 from the router, most of the packets got lost.

Fix this behaviour by plugging datapath notification code to the packet
 path used by route cache. Unify the notification code by using single
 inlined function with the per-AF callbacks.

Reported by:	sthaug at nethelp.no
Reviewed by:	ae
MFC after:	2 weeks
2018-03-17 17:05:48 +00:00
Andriy Voskoboinyk
2a440d19c1 Correct comment for IFM_IEEE80211_VHT media variant.
2018-03-15 23:32:29 +00:00
Andrey V. Elsukov
6d5948bbe3 Define ethernet type 0x88A8 as ETHERTYPE_QINQ.
Reviewed by:	kp
Obtained from:	OpenBSD
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14593
2018-03-06 12:01:31 +00:00
Stephen Hurd
226fb85d19 iflib: stop timer callout when stopping
iflib_timer has been seen running after the interface had been removed.
This change prevents that.

Submitted by:	matt.macy@joyent.com
2018-03-02 18:48:07 +00:00
Kristof Provost
bf56a3fe47 pf: Cope with overly large net.pf.states_hashsize
If the user configures a states_hashsize or source_nodes_hashsize value we may
not have enough memory to allocate this. This used to lock up pf, because these
allocations used M_WAITOK.

Cope with this by attempting the allocation with M_NOWAIT and falling back to
the default sizes (with M_WAITOK) if these fail.

PR:		209475
Submitted by:	Fehmi Noyan Isi <fnoyanisi AT yahoo.com>
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D14367
2018-02-25 08:56:44 +00:00
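The fallback strategy in bf56a3fe47 can be sketched as "try the configured size without sleeping, otherwise use the default". Everything here is hypothetical. `alloc_nowait()` models an M_NOWAIT allocation by refusing requests above a fixed budget. Plain `calloc()` stands in for the M_WAITOK fallback, and `DEFAULT_HASHSIZE` is not pf's real default.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEFAULT_HASHSIZE 1024u			/* hypothetical default buckets */
#define NOWAIT_BUDGET    ((size_t)1 << 20)	/* hypothetical memory budget */

/* Models an M_NOWAIT allocation: fails instead of waiting for memory. */
void *
alloc_nowait(size_t n, size_t sz)
{
	if (sz != 0 && n > NOWAIT_BUDGET / sz)
		return (NULL);
	return (calloc(n, sz));
}

/* Try the requested table size; on failure, fall back to the default. */
void *
alloc_hash(size_t requested, size_t bucketsz, size_t *got)
{
	void *p = NULL;

	if (requested != 0)
		p = alloc_nowait(requested, bucketsz);
	if (p != NULL) {
		*got = requested;
		return (p);
	}
	p = calloc(DEFAULT_HASHSIZE, bucketsz);	/* M_WAITOK stand-in */
	*got = (p != NULL) ? DEFAULT_HASHSIZE : 0;
	return (p);
}
```

An oversized `states_hashsize` now degrades to the default table instead of blocking forever.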
Ryan Stone
b3b6ff23e7 Allow route change requests to not specify the gateway.
Only require a gateway to be specified on a route add request.  On
a route change request that does not specify the gateway, the
gateway will remain the same.  This allows changing other route
parameters without having to re-specify the gateway, like in
"route change 10.0.0.0/8 -mtu 9000".

Update the route(8) manpage to explicitly call out this usage
as being supported.

MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Reviewed By: eugen (rtsock.c change), rgrimes
Differential Revision: https://reviews.freebsd.org/D14291
2018-02-21 19:13:23 +00:00
Stephen Hurd
81ad57b1b8 IFLIB: Make isc_magic unsigned
The IFLIB_MAGIC macro is > INT_MAX, so isc_magic should
be able to contain it.

Reported by:	jeb
Sponsored by:	Limelight  Networks
2018-02-21 18:57:00 +00:00
Navdeep Parhar
7cb7c6e37a Catch up with the removal of nkr_slot_flags from upstream netmap. No
functional impact intended.

Submitted by:	Vincenzo Maffione <v.maffione@gmail.com>
2018-02-20 21:42:45 +00:00
Stephen Hurd
a4e5960730 IFLIB: do not remove dmamap on buffer unload
Dmamap is created only on IFC attach. If we remove it on
buffer release, we won't be able to do ifconfig down&up. Only destroy
when in detach.

Reported by:	wma
Reviewed by:	wma
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14060
2018-02-20 18:33:45 +00:00
Wojciech Macek
74549d4b0f BPF: Switch to 32 bit compatible mode only when thread is 32 bit
Sometimes 32-bit and 64-bit ioctls are represented by the same number.
This caused an unnecessary switch to 32-bit compatible mode.

This patch prevents switching when we are dealing with a 64-bit executable.
It fixes the issue mentioned here

Authored by:           Patryk Duda <pdk@semihalf.com>
Submitted by:          Wojciech Macek <wma@semihalf.com>
Reviewed by:           andrew, wma
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14023
2018-01-25 12:13:41 +00:00
Steven Hartland
6fb1399a4c Added missing CTLFLAG_VNET to lacp default_strict_mode
Added CTLFLAG_VNET to net.link.lagg.lacp.default_strict_mode which was missed
in r290450.

Reported by:	julian@
MFC after:	1 week
Sponsored by:	Multiplay
2018-01-24 10:13:14 +00:00
Ryan Stone
bc3d87fd59 Increment the route table gen count after a modify
Increment the route table generation count after modifying a
route.  This signals back to TCP connections that they need to
update their L2 caches as the gateway for their route may have
changed.  This is a heavier hammer than is needed, strictly
speaking, but route changes will be unlikely enough that the
performance effects of invalidating all connection route caches
should be negligible.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13990
Reviewed by:	karels
2018-01-23 03:15:44 +00:00
Ryan Stone
fc21c53f63 Reduce code duplication for inpcb route caching
Add a new macro to clear both the L3 and L2 route caches, to
hopefully prevent future instances where only the L3 cache was
cleared when both should have been.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13989
Reviewed by:	karels
2018-01-23 03:15:39 +00:00
Ryan Stone
dbeab32f94 Invalidate inpcb LLE cache if cached route is invalidated
When the inpcb route cache is invalidated after a change to the
routing tables, we need to invalidate the LLE cache as well.
Previous to this change packets for the connection would continue
to use the old L2 information from the old L3 gateway, and the
packets for the connection would likely be blackholed.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13988
Reviewed by:	karels
2018-01-23 03:15:35 +00:00
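The three route-cache commits above (bc3d87fd59, fc21c53f63, dbeab32f94) combine two ideas. A table-wide generation count is bumped on every modification, and one macro clears the L3 and L2 caches together so a stale link-layer entry cannot outlive its route. The sketch below is a hypothetical model of that pattern. The struct, the `ROUTE_CACHE_CLEAR` macro and the functions are illustrative names, not the inpcb or route.h definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached route (L3) and link-layer entry (L2). */
struct route_cache {
	const void *l3;
	const void *l2;
	uint32_t    gen;	/* table generation the cache was filled at */
};

/* Always clear both layers together. */
#define ROUTE_CACHE_CLEAR(ro) do {	\
	(ro)->l3 = NULL;		\
	(ro)->l2 = NULL;		\
} while (0)

static uint32_t table_gen;	/* bumped on every add, delete or change */

void
route_table_modified(void)
{
	table_gen++;
}

/* Returns 1 if the cached route is still usable, 0 if it was invalidated. */
int
route_cache_check(struct route_cache *ro)
{
	if (ro->gen != table_gen) {
		ROUTE_CACHE_CLEAR(ro);
		ro->gen = table_gen;
		return (0);
	}
	return (ro->l3 != NULL);
}
```

Bumping the generation on a route change is the "heavier hammer" mentioned above: every connection revalidates, not just those using the modified route.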
Konstantin Belousov
279e33d489 Fix compat32 for sysctl net.PF_ROUTE...NET_RT_IFLISTL.
Route messages are aligned to the host long type alignment, which
breaks 32bit.

Reported and tested by:	lwhsu
Diagnosed by:	Yuri Pankov <yuripv@icloud.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-01-22 20:49:17 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Pedro F. Giffuni
443133416b net*: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these are likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the surrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:21:51 +00:00
Steven Hartland
fd3bb7aa46 Disabled the use of flowid for lagg by default
Disabled the use of RSS hash from the network card aka flowid for
lagg(4) interfaces by default as it's currently incompatible with
the lacp and loadbalance protocols.

The incompatibility is due to the fact that the flowid isn't known
for the first packet of a new outbound stream which can result in
the hash calculation method changing and hence a stream being
incorrectly split across multiple interfaces during normal
operation.

This can be re-enabled by setting the following in loader.conf:
net.link.lagg.default_use_flowid="1"

Discussed with: kmacy
Sponsored by:	Multiplay
2018-01-04 20:05:47 +00:00
Kristof Provost
5d0020d6d7 pf: Clean all fragments on shutdown
When pf is unloaded, or a vnet jail using pf is stopped we need to
ensure we clean up all fragments, not just the expired ones.
2017-12-31 10:01:31 +00:00
Bryan Venteicher
ac2b436d20 Add macro for vxlan list mutex lock and unlock
This will simplify some later VNET support.

Submitted by:	hrs
MFC after:	2 weeks
2017-12-30 19:49:40 +00:00
Bryan Venteicher
6d7bc5838b Advertise IFCAP_LINKSTAT after r326480 added link status support
MFC after:	2 weeks
2017-12-30 19:35:12 +00:00
Bryan Venteicher
33e0d8f057 Add support for IPv6 scoped addresses to vxlan
MFC after:	2 weeks
2017-12-30 04:03:53 +00:00
Stephen Hurd
9c58cafaa3 Don't pass rids to taskqgroup_attach()
As everywhere else, we want to pass rman_get_start(irq->ii_res).  This
caused set affinity errors when not using MSI-X vectors (legacy and MSI
interrupts).

Reported by:	sbruno
Sponsored by:	Limelight Networks
2017-12-27 20:42:30 +00:00
Stephen Hurd
ca03863c85 Remove assertion that's not true for !EARLY_AP_STARTUP
gtask->gt_taskqueue is NULL when EARLY_AP_STARTUP is not enabled.
Remove assertion to allow this config to work.

Reported by:	oleg
Sponsored by:	Limelight Networks
2017-12-27 19:14:15 +00:00
Stephen Hurd
de13095409 Fix indentation.
Sponsored by:	Limelight Networks
2017-12-27 19:12:32 +00:00
Eitan Adler
caa7e52f3f kernel: Fix several typos and minor errors
- duplicate words
- typos
- references to old versions of FreeBSD

Reviewed by:	imp, benno
2017-12-27 03:23:21 +00:00
Alexander Kabaev
151ba7933a Do a pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Alexander Kabaev
92f19df431 Do not pass NULL pointer to copyout in if_clone_list.
Sometimes the caller is only interested in how many clones
there are, and a NULL pointer is passed for the destination
buffer. Do not pass it to copyout then.
2017-12-23 16:45:24 +00:00
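The count-only convention described in 92f19df431 can be illustrated with a hypothetical listing function. A NULL destination means "report how many exist" and must never be handed to the copy step. Here `memcpy()` stands in for copyout(9), and the 16-byte name size and function name are assumptions.

```c
#include <stddef.h>
#include <string.h>

#define CLONE_NAMSIZ 16	/* hypothetical fixed name size */

/* Returns the total number of names; copies at most dstcount of them. */
size_t
list_clones(const char *const names[], size_t nnames, char *dst,
    size_t dstcount)
{
	size_t n;

	if (dst == NULL)
		return (nnames);	/* count-only query: never touch dst */
	n = nnames < dstcount ? nnames : dstcount;
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(names[i]);

		if (len >= CLONE_NAMSIZ)
			len = CLONE_NAMSIZ - 1;
		memset(dst + i * CLONE_NAMSIZ, 0, CLONE_NAMSIZ);
		memcpy(dst + i * CLONE_NAMSIZ, names[i], len);	/* copyout() stand-in */
	}
	return (nnames);
}
```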
Alexander Kabaev
ce4ab99d82 Remove some trailing whitespace.
Reviewed by: glebius, ae
Differential Revision: https://reviews.freebsd.org/D10386
2017-12-23 16:24:00 +00:00
Alexander Kabaev
3395dd6eb8 Do not double free the memory in if_clone.
The if_clone_attach function drops the reference on failure, which
frees the if_clone structure. No need to do it a second time.

Reviewed by: glebius, ae
Differential Revision: https://reviews.freebsd.org/D10386
2017-12-23 16:23:58 +00:00
Warner Losh
79554b4049 The device tables end with a sentinel in iflib. Don't include the
sentinel in the output.
2017-12-23 04:50:52 +00:00
Warner Losh
d2064cf030 Use '#' rather than some made up name for fields we want to ignore.
2017-12-22 17:53:27 +00:00
Konstantin Belousov
97755e83f5 Fix build for kernels with SCHED_4BSD.
Sponsored by:	The FreeBSD Foundation
2017-12-21 23:05:13 +00:00
Stephen Hurd
25ac1dd5c7 Don't call tcp_lro_rx() unless hardware verified TCP/UDP csum
It seems that tcp_lro_rx() doesn't verify TCP checksums, so
if there are bad checksums in the packets caused by invalid data, the
invalid data will pass through without errors.

This was noticed with the igb driver and a specific internet host:
fetch http://www.mpfr.org/mpfr-current/mpfr-3.1.6.tar.xz -o test.bin && sha256 test.bin
Would result in a different value sometimes.

This ends up making LRO require RXCSUM to be enabled, and RXCSUM to
support TCP and UDP checksums.

PR:		224346
Reported by:	gjb
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13561
2017-12-21 01:22:36 +00:00
Li-Wen Hsu
40cf51c438 Add missing ;
Approved by:	kevlo
2017-12-20 06:08:16 +00:00
Stephen Hurd
b103855e18 Support attaching tx queues to cpus
This will attempt to use a different thread/core on the same L2
cache when possible, or use the same cpu as the rx thread when not.
If SMP isn't enabled, don't go looking for cores to use. This is mostly
useful when using shared TX/RX queues.

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12446
2017-12-20 01:03:34 +00:00
Stephen Hurd
96fc97c81f Update Matthew Macy contact info
Email address has changed, uses consistent name (Matthew, not Matt)

Reported by:	Matthew Macy <mmacy@mattmacy.io>
Differential Revision:	https://reviews.freebsd.org/D13537
2017-12-19 17:59:00 +00:00
Andrey V. Elsukov
0e253fd12c Fix possible memory leak.
vxlan_ftable entries are sorted in ascending order; due to the wrong argument
order it was possible to stop the search before an existing element was found.
A new element would then be allocated in vxlan_ftable_update_locked() and
could be inserted in the list a second time, or trigger an MPASS() assertion
with INVARIANTS enabled.

PR:		224371
MFC after:	1 week
2017-12-16 14:36:21 +00:00
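The bug class fixed in 0e253fd12c is easy to reproduce with any ascending list that exits early. This is a generic sketch with integer keys standing in for the real ftable keys, not the vxlan code itself. Stopping once the target is smaller than the current element is only correct if the comparison is `cmp(target, elem)`. With the arguments swapped, a search for the largest key stops at the first node and reports "not found".

```c
#include <stddef.h>

struct entry {
	int           key;
	struct entry *next;
};

/* Three-way compare: <0 if a < b, 0 if equal, >0 if a > b. */
int
cmp(int a, int b)
{
	return ((a > b) - (a < b));
}

/* Search an ascending list, stopping early once the target is passed. */
struct entry *
find_sorted(struct entry *head, int target)
{
	for (struct entry *e = head; e != NULL; e = e->next) {
		int c = cmp(target, e->key);	/* order matters: target first */

		if (c == 0)
			return (e);
		if (c < 0)
			break;	/* target would sit before e: not present */
	}
	return (NULL);
}
```

With `cmp(e->key, target)` instead, `find_sorted()` on the list 1, 5, 9 would return NULL for key 9. A caller that then inserts a "new" entry creates the duplicate described above.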
Ryan Stone
19f41c2a52 Plug an ifaddr leak when changing a route's src
If a route is modified in a way that changes the route's source
address (i.e. the address used to access the gateway), then a
reference on the ifaddr representing the old source address will
be leaked if the address type does not have an ifa_rtrequest
method defined.  Plug the leak by releasing the reference in
all cases.

Differential Revision:	https://reviews.freebsd.org/D13417
Reviewed by:	ae
MFC after:	3 weeks
Sponsored by:	Dell
2017-12-14 20:48:50 +00:00
Stephen Hurd
06c47d48dc Increment encap_pad_mbuf_fail when m_dup() fails in padding
Previously, the counter was only incremented when m_append() failed.  Since
the function can also fail on m_dup() now, increment the counter there as
well.

Sponsored by:	Limelight Networks
2017-12-11 20:01:28 +00:00
Stephen Hurd
04993890dc Free mbuf chain when m_dup fails
Fix memory leak where mbuf chain wasn't free()d if iflib_ether_pad()
has a failure in m_dup().

Reported by:	"Ryan Stone" <rysto32@gmail.com>
Sponsored by:	Limelight Networks
2017-12-08 19:50:06 +00:00
Stephen Hurd
a15fbbb8fe Handle read-only mbufs in iflib ether pad function
If ethernet padding is enabled, and a read-only mbuf is passed,
it would modify the mbuf using m_append(). Instead, call m_dup() and
append to the new packet.

Reported by:	Pyun YongHyeon
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13414
2017-12-08 18:43:31 +00:00
Gleb Smirnoff
17eea3202a Garbage collect IFCAP_POLLING_NOCOUNT. It hasn't been used since the very
beginning of polling(4).  The module always ignored the return value
from driver polling handler.
2017-12-06 23:03:34 +00:00
Stephen Hurd
d14c853ba3 iflib: Support padding Ethernet frames to a min size
Some bnxt devices do not correctly send frames smaller than
52 bytes (without CRC), so add a quirk that will pad frames to an
arbitrary size before passing off to the encap routine.

Reported by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13269
2017-12-05 21:00:31 +00:00
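The padding quirk above, together with the read-only handling added in a15fbbb8fe, can be sketched on a flat byte buffer instead of an mbuf chain. The function is hypothetical: it is not iflib_ether_pad(). A short frame is zero-filled up to `min_len` (52 bytes without CRC for the affected bnxt devices). When the buffer must not be modified, or is too small, it is duplicated first, mirroring the m_dup()-then-append approach.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Pad a frame with zeros up to min_len. If buf is read-only or too small,
 * pad a fresh copy instead; the caller frees *out when it differs from buf.
 * Returns the new length, or 0 on allocation failure.
 */
size_t
pad_frame(uint8_t *buf, size_t len, size_t cap, int readonly,
    size_t min_len, uint8_t **out)
{
	*out = buf;
	if (len >= min_len)
		return (len);
	if (readonly || cap < min_len) {
		uint8_t *copy = malloc(min_len);	/* m_dup() stand-in */

		if (copy == NULL)
			return (0);	/* caller still owns and frees buf */
		memcpy(copy, buf, len);
		*out = copy;
	}
	memset(*out + len, 0, min_len - len);
	return (min_len);
}
```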
Stephen Hurd
fe1bcada9b Avoid calling CURVNET_[SET|RESTORE] for each packet
The LRO possible test was calling CURVNET_SET once for IPv4 or IPv6 for
each packet in a chain. Only call it once per chain instead.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	cem, ae
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13368
2017-12-05 20:43:24 +00:00
Eric Joyner
b0f3e715fa ifconfig(8): Display extended compliance code string for SFP transceivers
- Updates tables in affected files with new entries from newer spec
revisions of SFF-8472, SFF-8024, and SFF-8636

- Change ifconfig to read and display the extended compliance code for
SFP media if the extended compliance code is not 0. This was being displayed
for QSFP transceivers only, but SFP28 media uses this to report 25G
capability.

Reviewed by:	melifaro, sbruno
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D13286
2017-12-05 18:42:07 +00:00
Bryan Venteicher
93a5a3b019 Add if media and link status events to vxlan
PR:		214359
MFC after:	2 weeks
2017-12-02 22:04:00 +00:00
Stephen Hurd
a027c8e958 Add support for SIOCGIFXMEDIA to iflib
SIOCGIFXMEDIA is required for extended ethernet media types,
but iflib did not support it.

Reported by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13312
2017-12-01 17:58:20 +00:00
Hans Petter Selasky
b58e7aacf6 Properly define the VLAN_XXX() function macros to avoid miscompilation when
used inside "if" statements comparing with another value.

Detailed explanation:
"if (a ? b : c != 0)" is not the same as "if ((a ? b : c) != 0)"
which is the expected behaviour of a function macro.

Affects:
toecore, linuxkpi and ibcore.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-11-30 11:35:22 +00:00
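The precedence problem fixed in b58e7aacf6 can be shown with two hypothetical macros that differ only in parentheses. Because `==` binds tighter than `?:`, the unparenthesized expansion of `PICK_BAD(a, b, c) == v` becomes `a ? b : (c == v)`.

```c
/* Unparenthesized: merges into the surrounding expression. */
#define PICK_BAD(a, b, c)  a ? b : c
/* Fully parenthesized: behaves like a function call. */
#define PICK_GOOD(a, b, c) ((a) ? (b) : (c))

int
bad_equals(int a, int b, int c, int v)
{
	return (PICK_BAD(a, b, c) == v);	/* parses as a ? b : (c == v) */
}

int
good_equals(int a, int b, int c, int v)
{
	return (PICK_GOOD(a, b, c) == v);
}
```

For example, `bad_equals(1, 3, 0, 5)` yields 3 (true), while the intended comparison `good_equals(1, 3, 0, 5)` yields 0.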
Stephen Hurd
772593dbd9 Fix comment introduced in r326369
The code uses the set of all CPUs, it doesn't zero out the set.

Sponsored by:	Limelight Networks
2017-11-29 18:21:17 +00:00
Stephen Hurd
e516b5353b Ensure that ctx->ifc_cpus is always initialized
If a device didn't support MSI-X, ctx->ifc_cpus would not be initialized,
but the IRQ allocation routines still use the value.  Move the
initialization to common code.

Sponsored by:	Limelight Networks
2017-11-29 18:14:57 +00:00
Hans Petter Selasky
fa3f256682 Disallow TUN and TAP character device IOCTLs to modify the network device
type to any value. This can cause page faults and panics due to accessing
uninitialized fields in the "struct ifnet" which are specific to the network
device type.

MFC after:	1 week
Found by:	jau@iki.fi
PR:		223767
Sponsored by:	Mellanox Technologies
2017-11-29 09:40:11 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual, error-prone
task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Stephen Hurd
7274b2f6be Fix off-by-one error in bit_nclear() usage
bit_nclear() takes the bit numbers for the start and end bits, not the start
and a count.  This was resulting in memory corruption past the end of the
bitstr_t.

Sponsored by:	Limelight Networks
2017-11-20 21:57:04 +00:00
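The bitstring(3) convention behind 7274b2f6be is that `bit_nclear(name, start, stop)` clears bits `start` through `stop` inclusive. Clearing `count` bits therefore needs `stop = start + count - 1`. The sketch below re-implements that calling convention on a plain byte array under a hypothetical name, so it compiles without `<bitstring.h>`. It is not FreeBSD's implementation.

```c
#include <limits.h>

/* Clear bits start..stop INCLUSIVE, following bitstring(3)'s convention. */
void
sketch_bit_nclear(unsigned char *bits, int start, int stop)
{
	for (int i = start; i <= stop; i++)
		bits[i / CHAR_BIT] &= (unsigned char)~(1u << (i % CHAR_BIT));
}

/* Correct way to clear `count` bits beginning at `first`. */
void
clear_range(unsigned char *bits, int first, int count)
{
	if (count > 0)
		sketch_bit_nclear(bits, first, first + count - 1);
}
```

Passing `count` directly as `stop` would instead run up to bit `count`, which for a large `first`/`count` pair writes past the end of the array.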
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Konstantin Belousov
319a53f5ad Fix build.
Sponsored by:	The FreeBSD Foundation
2017-11-19 11:21:16 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Stephen Hurd
d27352644a Fix default numbers of iflib queue sets
The intent appears to be having one RX/TX queue set per core,
but since scctx->isc_n[tr]xqsets is set to max before calling
iflib_msix_init(), both end up being set to total number of cores.

Use ctx->ifc_sysctl_n[rt]xqs as the selected value and
scctx->isc_n[rt]xqsets as the max. This should result in what appears
to be the intended behaviour.

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13096
2017-11-16 18:52:58 +00:00
Antoine Brodin
f5056d933a Do not leak control in raw_usend
2017-11-08 23:20:05 +00:00
Konstantin Belousov
3cf8254f1e Add a place for a driver to report rx timestamps in nanoseconds from
boot for the received packets.

The rcv_tstmp field overlaps the place of Ln header length indicators,
not used by received packets.  The basic pkthdr rearrangement change
in sys/mbuf.h was provided by gallatin.

There are two accompanying M_ flags: M_TSTMP means that a timestamp
is present (and it was generated by hardware).
Another flag M_TSTMP_HPREC indicates that the timestamp is
high-precision.  Practically M_TSTMP_HPREC means that hardware
provided additional precision comparing with the stamps when the flag
is not set.  E.g., for ConnectX all packets are stamped by hardware
when PCIe transaction to write out the completion descriptor is
performed, but PTP packet are stamped on port.  For Intel cards, when
PTP assist is enabled, only PTP packets are stamped in the limited
number of registers, so if Intel cards ever start support this
mechanism, they would always set M_TSTMP | M_TSTMP_HPREC if hardware
timestamp is present for the given packet.
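A consumer of these flags would first test M_TSTMP and only then look at M_TSTMP_HPREC. The sketch below assumes placeholder bit values and a cut-down packet header (`SK_M_TSTMP`, `SK_M_TSTMP_HPREC`, `struct sk_pkthdr` are all hypothetical); the real flag values and the rcv_tstmp field live in sys/mbuf.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values; the real M_TSTMP/M_TSTMP_HPREC are in sys/mbuf.h. */
#define SK_M_TSTMP        0x1u
#define SK_M_TSTMP_HPREC  0x2u

/* Hypothetical cut-down packet header with only the fields used here. */
struct sk_pkthdr {
	uint32_t flags;
	uint64_t rcv_tstmp;	/* nanoseconds since boot */
};

/*
 * Returns true and fills *ns / *hprec only if the hardware supplied a
 * timestamp; *hprec reports whether the high-precision flag was set.
 */
static bool
sk_get_rx_tstmp(const struct sk_pkthdr *ph, uint64_t *ns, bool *hprec)
{
	if ((ph->flags & SK_M_TSTMP) == 0)
		return (false);		/* no hardware timestamp */
	*ns = ph->rcv_tstmp;
	*hprec = (ph->flags & SK_M_TSTMP_HPREC) != 0;
	return (true);
}
```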

Add IFCAP_HWRXTSTMP interface capability to indicate the support for
hardware rx timestamping, and an ifconfig(8) command to toggle it.

Based on the patch by:	gallatin
Reviewed by:	gallatin (previous version), hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks (? mbuf KBI issue)
X-Differential revision:	https://reviews.freebsd.org/D12638
2017-11-07 09:29:14 +00:00
Sean Bruno
abec47242a Fix NOINET/NOINET6 build during compilation of iflib.
Reported by:	kib
2017-11-06 19:54:25 +00:00
Stephen Hurd
35e4e998d8 Only chain non-LRO mbufs when LRO is not possible
Preserve packet order between tcp_lro_rx() and if_input() to avoid
creating extra corner cases. If no packets can be LROed, combine them
into one chain for submission via if_input(). If any packet can
potentially be LROed, however, retain the old behaviour and call if_input()
for each packet.

This should keep the 12% improvement for small packet forwarding intact,
but mostly avoids impacting the LRO case.
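The delivery policy above can be sketched with a simplified packet type. Everything here is hypothetical: `struct sk_pkt` stands in for an mbuf (with `nextpkt` playing the role of m_nextpkt), `sk_if_input()` stands in for if_input(), and the tcp_lro_rx() attempt that real code would make for candidate packets is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf; only the fields this sketch needs. */
struct sk_pkt {
	struct sk_pkt *nextpkt;		/* plays the role of m_nextpkt */
	bool lro_candidate;		/* could this packet be handed to LRO? */
};

static int sk_input_calls;		/* counts calls to the if_input() stand-in */

static void
sk_if_input(struct sk_pkt *chain)
{
	(void)chain;
	sk_input_calls++;
}

static void
sk_deliver(struct sk_pkt **pkts, size_t n)
{
	bool any_lro = false;

	for (size_t i = 0; i < n; i++)
		if (pkts[i]->lro_candidate)
			any_lro = true;

	if (!any_lro) {
		/* Nothing can be LROed: link all packets and submit one chain. */
		for (size_t i = 0; i + 1 < n; i++)
			pkts[i]->nextpkt = pkts[i + 1];
		if (n > 0) {
			pkts[n - 1]->nextpkt = NULL;
			sk_if_input(pkts[0]);
		}
		return;
	}
	/*
	 * At least one packet may be LROed: deliver one packet at a time so
	 * ordering relative to the LRO path is preserved.  (A real driver
	 * would try tcp_lro_rx() on candidates first; omitted here.)
	 */
	for (size_t i = 0; i < n; i++) {
		pkts[i]->nextpkt = NULL;
		sk_if_input(pkts[i]);
	}
}
```

The trade-off is that a single LRO-capable packet in a batch forfeits the chaining benefit for the whole batch, which keeps packet order simple to reason about.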

Reviewed by:	cem, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12876
2017-11-06 16:23:21 +00:00
Eugene Grosbein
9f23a54e52 Allow a process to assign an IP address to local ppp interface
even if kernel routing table already has a route to the address in question
installed by some routing daemon (PR 223129).

Also, allow loopback route deletion when stopping a VIMAGE jail (PR 222647).

PR:			222647, 223129
Reviewed by:		gnn
Approved by:		avg (mentor), mav (mentor)
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D12747
2017-11-05 14:41:48 +00:00
Kristof Provost
85f330e5fa epair: Fix panic on unload
The VNET_SYSUNINIT() callback is executed after the MOD_UNLOAD. That means
that netisr_unregister() has already been called when
netisr_unregister_vnet() gets called, leading to an assertion failure.

Restore the expected order of operations by moving everything that
was done in MOD_UNLOAD to a SYSUNINIT() (which will be called after the
VNET_SYSUNINIT()).

Differential Revision:	https://reviews.freebsd.org/D12771
2017-11-01 14:27:26 +00:00
Stephen Hurd
0b6c52b69f Preserve TSO checksum flags
r323941 incorrectly disabled TSO flags based on MTU.

Reported by:	Yuri Pankov <yuripv@gmx.com>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12880
2017-10-31 19:03:35 +00:00
Stephen Hurd
a1b799ca5b Fix PR221990 - Assertion at iflib.c:1947
ifl_pidx and ifl_credits are going out of sync in _iflib_fl_refill() as they
use different update logic.  Use the same update logic for both, and add a
final call to isc_rxd_refill() to handle early exits from the loop.
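The fix described above (one update path for the producer index and the credit count, plus a final flush after an early loop exit) can be illustrated with a hypothetical free list. `struct sk_fl`, `sk_fl_post()`, `sk_fl_refill()` and `sk_rxd_refill()` are illustrative names, not iflib functions; `avail` simulates how many buffers can be allocated before an early exit.

```c
#include <stdint.h>

/* Hypothetical free-list state; names echo ifl_pidx/ifl_credits. */
struct sk_fl {
	uint32_t size;		/* number of descriptors in the ring */
	uint32_t pidx;		/* producer index */
	uint32_t credits;	/* buffers currently posted */
};

static uint32_t sk_flushed;	/* buffers handed to the "hardware" */

/* Stands in for isc_rxd_refill(): hand a batch to the NIC. */
static void
sk_rxd_refill(uint32_t nbufs)
{
	sk_flushed += nbufs;
}

/* Single update path: pidx and credits always advance together. */
static void
sk_fl_post(struct sk_fl *fl, uint32_t n)
{
	fl->credits += n;
	fl->pidx = (fl->pidx + n) % fl->size;
}

static void
sk_fl_refill(struct sk_fl *fl, uint32_t count, uint32_t batch, uint32_t avail)
{
	uint32_t pending = 0;

	for (uint32_t i = 0; i < count; i++) {
		if (avail == 0)
			break;		/* early exit, e.g. allocation failure */
		avail--;
		sk_fl_post(fl, 1);
		if (++pending == batch) {
			sk_rxd_refill(pending);
			pending = 0;
		}
	}
	/* Final flush: without it, a partial batch never reaches hardware. */
	if (pending > 0)
		sk_rxd_refill(pending);
}
```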

PR:		221990
Reported by:	pho
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12798
2017-10-31 17:50:42 +00:00
Stephen Hurd
10e0d93811 Fix build with nodevice netmap
iru_init() was declared and used outside the DEV_NETMAP
conditional blocks, but was implemented inside one. Move the
implementation out of the DEV_NETMAP block to allow building with
netmap disabled.

Reported by:	Andrew Turner <andrew@fubar.geek.nz>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12842
2017-10-31 02:49:28 +00:00
Stephen Hurd
09b57b7f40 bnxt: HW_LRO Rx Pkt with > 32 fragments caused Crash (iflib)
A Broadcom NIC with HW_LRO and max_agg_segs set >= 6 can generate Rx packets
with 64 (2^6) fragments; increase IFLIB_MAX_RX_SEGS to 64 to avoid memory
corruption and crashes.

Submitted by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed by:	shurd, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Broadcom Limited
Differential Revision:	https://reviews.freebsd.org/D12774
2017-10-30 21:20:33 +00:00
Stephen Hurd
2d873474b2 Fix PR222744 - netmap errors with iflib em driver
Fix an error when refilling netmap buffers that left the first entry of
ifl_bus_addrs[] unset on every pass after the first (tmp_pidx started
at 1, not zero, after the first time through the loop).
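The bug class described above, a batch index that does not restart at zero after each flush, can be shown with a hypothetical batching loop. This is a simplified analogue, not the netmap refill code: `SK_BATCH`, `sk_flush()` and `sk_refill_batched()` are illustrative, and `sk_hw[]` merely records what would be written to descriptors.

```c
#include <stdint.h>

#define SK_BATCH 4	/* hypothetical size of the bus address array */
#define SK_HW_MAX 64

static uint64_t sk_hw[SK_HW_MAX];	/* records what reached "hardware" */
static uint32_t sk_hw_n;

static void
sk_flush(const uint64_t *addrs, uint32_t n)
{
	for (uint32_t i = 0; i < n && sk_hw_n < SK_HW_MAX; i++)
		sk_hw[sk_hw_n++] = addrs[i];
}

/* Fill bus_addrs[] in batches; tmp_pidx must restart at 0 for each batch. */
static void
sk_refill_batched(const uint64_t *bufs, uint32_t n)
{
	uint64_t bus_addrs[SK_BATCH];
	uint32_t tmp_pidx = 0;

	for (uint32_t i = 0; i < n; i++) {
		bus_addrs[tmp_pidx++] = bufs[i];
		if (tmp_pidx == SK_BATCH || i + 1 == n) {
			sk_flush(bus_addrs, tmp_pidx);
			/* Restarting at 1 here would leave bus_addrs[0] stale. */
			tmp_pidx = 0;
		}
	}
}
```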

Leave the one unused buffer required by some NICs visible in the netmap
ring rather than hidden. There will always be a buffer in use by the
kernel now when an iflib driver is used via netmap.

Always get the netmap slot index via netmap_idx_n2k() to account for
nkr_hwofs in a consistent way.

Split shared functionality into new functions.
iru_init(): shared by _iflib_fl_refill() and netmap_fl_refill()
netmap_fl_refill(): shared by iflib_netmap_rxsync() and
iflib_netmap_rxq_init()

PR:		222744
Reported by:	Shirkdog <mshirk@daemon-security.com>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12769
2017-10-30 21:14:31 +00:00
Stephen Hurd
0fdea53954 Avoid enabling MSI-X if MSI-X is disabled globally
It was reported on the community call that with
hw.pci.enable_msix=0, iflib would enable MSI-X on the device and attempt
to use it, which caused issues. Test the sysctl explicitly and do not
enable MSI-X if it's disabled globally.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12805
2017-10-30 21:08:12 +00:00
Stephen Hurd
3429c02f82 Some cache related optimizations
1. prefetch 128 bytes of mbufs.
2. Re-order filling the pkt_info so cache stalls happen at the end
3. Define empty prefetch2cachelines() macro when the function isn't present.

Provides small performance improvements on some hardware.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12447
2017-10-23 20:50:08 +00:00
Bjoern A. Zeeb
8e94025b41 With r181803 on 2008-08-17 23:27:27Z the first VIMAGE commit went into
HEAD.  Enable VIMAGE in GENERIC kernels and some others (where GENERIC does
not exist) on HEAD.

Disable building LINT-VIMAGE with VIMAGE being default.

This should give it a lot more exposure in the run-up to 12 to help
us evaluate whether to keep it on by default or not.
We are also hoping to get better performance testing.
The feature can be disabled using nooptions.

Requested by:		many
Reviewed by:		kristof, emaste, hiren
X-MFC after:		never
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D12639
2017-10-20 21:40:59 +00:00
Andriy Voskoboinyk
c64c1f95a7 ifnet(9): split ifc_alloc_unit() (should simplify code flow)
Allocate smallest unit number from pool via ifc_alloc_unit_next()
and exact unit number (if available) via ifc_alloc_unit_specific().

While here, address possible deadlock (mentioned in PR).

PR:		217401
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D12551
2017-10-16 21:21:31 +00:00
Sepherosa Ziehau
2832cbe7d2 rss: Remove never defined UDP_IPV4_EX
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12455
2017-10-11 06:08:01 +00:00
Stephen Hurd
1c0054d261 Fix "taskqgroup_attach: setaffinity failed: 3" with iflib drivers
Improved logging added in r323879 exposed an error during
attach. We need the irq, not the rid, for this to work correctly. em uses
shared irqs, so it will use the same irq for TX as RX. bnxt does
not use shared irqs, or TX irqs at all, so there's no need to set
the TX irq affinity.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12496
2017-10-05 14:43:30 +00:00
Conrad Meyer
916616c4c5 Add PNP metadata to more drivers
GPUs: radeonkms, i915kms
NICs: if_em, if_igb, if_bnxt

This metadata isn't used yet, but it will be handy to have later to
implement automatic module loading.

Reviewed by:	imp, mmacy
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12488
2017-09-26 23:23:58 +00:00
Stephen Hurd
1225d9da9f Have ifmp_ring_enqueue() abdicate instead of switch to a consumer
Move TX out of the enqueue() path. As a result, we need
to have ifmp_ring_check_drainage() pick up from the abdicate state.

We also need to either enqueue the TX task, or check drainage
after calling ifmp_ring_enqueue() to ensure it's sent.

This change results in a 30% small packet forwarding improvement.

Reviewed by:	olivier, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12439
2017-09-23 16:46:30 +00:00
Stephen Hurd
f4d2154e0c Make the rx budget a tunable
This allows tuning the rx budget for special load profiles
as well as more easily testing to determine sane defaults.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12445
2017-09-23 01:37:01 +00:00
Stephen Hurd
20f63282f8 Chain mbufs before passing to if_input()
Build a list of mbufs to pass to if_input() after LRO. Results in
12% small packet forwarding rate improvement.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12444
2017-09-23 01:35:14 +00:00
Stephen Hurd
c5cf217261 Some small packet performance improvements
If the packet is smaller than MTU, disable the TSO flags.
Move TCP header parsing inside the IS_TSO?() test.
Add a new IFLIB_NEED_ZERO_CSUM flag to indicate the checksums need to be zeroed before TX.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12442
2017-09-23 01:33:20 +00:00
Kristof Provost
ed9de14d2f bridge: Set module version
This ensures that the loader will not load the module if it's also built in to
the kernel.

PR:		220860
Submitted by:	Eugene Grosbein <eugen@freebsd.org>
Reported by:	Marie Helene Kvello-Aune <marieheleneka@gmail.com>
2017-09-21 14:14:01 +00:00
Stephen Hurd
d0d0ad0ae2 Fix iflib netmap RX
RXQ setup for netmap was broken because netmap_rxq_init was getting called
before IFDI_INIT - thus we ended up with ring tail pointer being reset to zero.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12140
2017-09-20 20:40:49 +00:00
Alan Cox
e9bfbb02c5 In r288122, we changed vm_page_unwire() so that it returns a Boolean
indicating whether the page's wire count transitioned to zero.  Use that
return value in zbuf_page_free() rather than checking the wire count.

MFC after:	1 week
2017-09-20 04:59:52 +00:00
Stephen Hurd
ab2e3f7958 Revert r323516 (iflib rollup)
This was really too big of a commit even if everything worked, but there
are multiple new issues introduced in the one huge commit, so it's not
worth keeping this until it's fixed.

I'll work on splitting this up into logical chunks and introduce them one
at a time over the next week or two.

Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
2017-09-16 02:41:38 +00:00
Stephen Hurd
d300df0182 Roll up iflib commits from github. This pulls in most of the work done
by Matt Macy as well as other changes which he has accepted via pull
request to his github repo at https://github.com/mattmacy/networking/

This should bring -CURRENT and the github repo into close enough sync to
allow small feature branches rather than a large chain of interdependent
patches being developed out of tree.  The rest of the synchronization
should be able to be completed on github by splitting the remaining
changes that are not yet ready into short feature branches for later
review as smaller commits.

Here is a summary of changes included in this patch:

1)  More checks when INVARIANTS are enabled for earlier problem
    detection
2)  Group Task Queue cleanups
    - Fix use of duplicate shortdesc for gtaskqueue malloc type.
      Some interfaces such as memguard(9) use the short description to
      identify malloc types, so duplicates should be avoided.
3)  Allow gtaskqueues to use ithreads in addition to taskqueues
    - In some cases, this can improve performance
4)  Better logging when taskqgroup_attach*() fails to set interrupt
    affinity.
5)  Do not start gtaskqueues until they're needed
6)  Have mp_ring enqueue function enter the ABDICATED rather than BUSY
    state.  This moves the TX to the gtaskq and allows processing to
    continue faster as well as make TX batching more likely.
7)  Add an ift_txd_errata function to struct if_txrx.  This allows
    drivers to inspect/modify mbufs before transmission.
8)  Add a new IFLIB_NEED_ZERO_CSUM for drivers to indicate they need
    checksums zeroed for checksum offload to work.  This avoids modifying
    packet data in the TX path when possible.
9)  Use ithreads for iflib I/O instead of taskqueues
10) Clean up ioctl and support async ioctl functions
11) Prefetch two cache lines from each mbuf instead of one, up to 128B.  We
    often need to parse packet header info beyond 64B.
12) Fix potential memory corruption due to fence post error in
    bit_nclear() usage.
13) Improved hang detection and handling
14) If the packet is smaller than MTU, disable the TSO flags.
    This avoids extra packet parsing when not needed.
15) Move TCP header parsing inside the IS_TSO?() test.
    This avoids extra packet parsing when not needed.
16) Pass chains of mbufs that are not consumed by LRO to if_input()
    rather than calling if_input() for each mbuf.
17) Re-arrange packet header loads to get as much work as possible done
    before a cache stall.
18) Lock the context when calling IFDI_ATTACH_PRE()/IFDI_ATTACH_POST()/
    IFDI_DETACH().
19) Attempt to distribute RX/TX tasks across cores more sensibly,
    especially when RX and TX share an interrupt.  RX will attempt to
    take the first threads on a core, and TX will attempt to take
    successive threads.
20) Allow iflib_softirq_alloc_generic() to request affinity to the same
    cpus an interrupt has affinity with.  This allows TX queues to
    ensure they are serviced by the socket the device is on.
21) Add new iflib sysctls to net.iflib:
    - timer_int - interval at which to run per-queue timers in ticks
    - force_busdma
22) Add new per-device iflib sysctls to dev.X.Y.iflib
    - rx_budget allows tuning the batch size on the RX path
    - watchdog_events Count of watchdog events seen since load
23) Fix error where netmap_rxq_init() could get called before
    IFDI_INIT()
24) e1000: Fixed version of r323008: post-cold sleep instead of DELAY
    when waiting for firmware
    - After interrupts are enabled, convert all waits to sleeps
    - Eliminates e1000 software/firmware synchronization busy waits after
      startup
25) e1000: Remove special case for budget=1 in em_txrx.c
    - Premature optimization which may actually be incorrect with
      multi-segment packets
26) e1000: Split out TX interrupt rather than share an interrupt for
    RX and TX.
    - Allows better performance by keeping RX and TX paths separate
27) e1000: Separate igb from em code where suitable
    Much easier to understand separate functions and "if (is_igb)" than
    previous tests like "if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))"

#blamebruno

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12235
2017-09-13 01:18:42 +00:00
Matt Joras
fdbf11746a Allow vlan interfaces to rx through netmap(4).
Normally after receiving a packet, a vlan(4) interface sends the packet
back through its parent interface's rx routine so that it can be
processed as an untagged frame. It does this by using the parent's
ifp->if_input. This is incompatible with netmap(4), which replaces the
vlan(4) interface's if_input with a netmap(4) hook. Fix this by using
the vlan(4) interface's ifp instead of the parent's directly.

Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	rstone
Approved by:	rstone (mentor)
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12191
2017-09-13 00:25:09 +00:00
Navdeep Parhar
aa0186bc36 Make LACP based lagg work with interfaces (like 100Gbps and 25Gbps) that
report extended media types.

lacp_aggregator_bandwidth() uses the media to determine the speed of the
interface and returns 0 for IFM_OTHER without the bits in the extended
range.

Reported by:	kbowling@
Reviewed by:	eugen_grosbein.net, mjoras@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D12188
2017-09-06 14:36:35 +00:00
Hans Petter Selasky
95ed5015ec Add support for a generic backpressure indicator for ratelimited
transmit queues as well as non-ratelimited ones.

Add the required structure bits in order to support a backpressure
indication with ratelimited connections as well as non-ratelimited
ones. The backpressure indicator is a value between zero and 65535
inclusive, where zero means the destination transmit queue is empty
and 65535 means it is full. Applications can use this value as a decision point
for when to stop transmitting data to avoid endless ENOBUFS error
codes upon transmitting an mbuf. This indicator is also useful to
reduce the latency for ratelimited queues.
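One way to produce and consume such a 0..65535 indicator is to scale queue occupancy linearly and compare it with a threshold. The mapping and both helpers below are assumptions for illustration (`sk_backpressure()` and `sk_should_pause()` are hypothetical); the commit does not say how drivers compute the value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mapping of queue occupancy to the indicator range:
 * 0 = empty, 65535 = full.  A zero-capacity queue is reported as full.
 */
static uint16_t
sk_backpressure(uint32_t used, uint32_t capacity)
{
	if (capacity == 0 || used >= capacity)
		return (UINT16_MAX);
	/* Widen before multiplying so large queues cannot overflow. */
	return ((uint16_t)(((uint64_t)used * UINT16_MAX) / capacity));
}

/* Example sender policy: pause once the indicator reaches a threshold. */
static bool
sk_should_pause(uint16_t bp, uint16_t threshold)
{
	return (bp >= threshold);
}
```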

Reviewed by:		gallatin, kib, gnn
Differential Revision:	https://reviews.freebsd.org/D11518
Sponsored by:		Mellanox Technologies
2017-09-06 13:56:18 +00:00
Sepherosa Ziehau
0f3af0411d if: Add ioctls to get RSS key and hash type/function.
It will be needed by hn(4) to configure its RSS key and hash
type/function in the transparent VF mode in order to match VF's
RSS settings. The description of the transparent VF mode and
the RSS hash value issue are here:
https://svnweb.freebsd.org/base?view=revision&revision=322299
https://svnweb.freebsd.org/base?view=revision&revision=322485

These are generic enough to justify two independent ioctls instead
of abusing SIOCGDRVSPEC.

Setting RSS key and hash type/function is a different story,
which probably requires more discussion.

Comments about UDP_{IPV4,IPV6,IPV6_EX} were only in the patch
in the review request; these hash types are standardized now.

Reviewed by:	gallatin
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12174
2017-09-05 05:28:52 +00:00
Gleb Smirnoff
2cc3b2eec5 Do not abuse flag that is clearly marked as unused.
This creates conflicts with FreeBSD variations that may use it.  The
usage of the flag M_TOOBIG is limited to iflib queue, thus using
one of M_PROTO flags is fine.  There is no need to grab global flag.

Silence from:	kmacy, sbruno (2 weeks)
2017-08-31 23:19:18 +00:00
Sean Bruno
a969350226 Revert r323008 and its conversion of e1000/iflib to using SX locks.
This seems to be missing something on the 82574L causing NFS root mounts
to hang.

Reported by:	kib
2017-08-30 18:56:24 +00:00
Sean Bruno
e17e5b4134 Continuation of lock cleanup in e1000.
Post-cold sleep instead of DELAY when waiting for firmware.

Convert softc mutex to an SX lock.  Change all waits to sleeps
once interrupts are enabled (and it is safe to sleep).

Submitted by:	Matt Macy <matt@mattmacy.io>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12101
2017-08-30 00:20:43 +00:00
Gleb Smirnoff
dcfa556b02 Garbage collect RT_NORTREF, which is no longer in use after FLOWTABLE removal.
2017-08-24 23:08:12 +00:00
Sean Bruno
21e10b16e0 iflib: call device's if_init function during vlan initialization.
Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	shurd
Sponsored by:	Broadcom
Differential Revision:	https://reviews.freebsd.org/D12098
2017-08-23 21:49:56 +00:00
Kristof Provost
9ce40d321d bpf: Fix incorrect cleanup
Cleaning up a bpf_if is a two stage process. We first move it to the
bpf_freelist (in bpfdetach()) and only later do we actually free it (in
bpf_ifdetach()).

We cannot set the ifp->if_bpf to NULL from bpf_ifdetach() because it's
possible that the ifnet has already gone away, or that it has been assigned
a new bpf_if.
This can lead to a struct ifnet which is up, but has if_bpf set to NULL,
which will panic when we try to send the next packet.

Keep track of the pointer to the bpf_if (because it's not always
ifp->if_bpf), and NULL it immediately in bpfdetach().

PR:		213896
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11782
2017-08-16 19:40:07 +00:00
Matt Joras
d148c2a2b1 Rework vlan(4) locking.
Previously the locking of vlan(4) interfaces was not very comprehensive.
Particularly there was very little protection against the destruction of
active vlan(4) interfaces or concurrent modification of a vlan(4)
interface. The former readily produced several different panics.

The changes can be summarized as using two global vlan locks (an
rmlock(9) and an sx(9)) to protect accesses to the if_vlantrunk field of
struct ifnet, in addition to other places where global exclusive access
is required. vlan(4) should now be much more resilient to the destruction
of active interfaces and concurrent calls into the configuration path.

PR:	220980
Reviewed by:	ae, markj, mav, rstone
Approved by:	rstone (mentor)
MFC after:	4 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11370
2017-08-15 17:52:37 +00:00
Sean Bruno
5c5ca36ca2 Don't leak mbufs if the number of clusters exceeds the number of
segments. This would leak mbufs over time, causing crashes.

PR:		221202
Submitted by:	Matt Macy <matt@mattmacy.io>
Reported by:	gergely.czuczy@harmless.hu
Sponsored by:	Limelight Networks
2017-08-10 03:43:23 +00:00
Sean Bruno
18a660b344 Export IFCAP_HWSTATS so that we don't experience double stats counting
on iflib enabled devices.

PR:		220198
Submitted by:	Matt Macy <matt@mattmacy.io>
Reported by:	Ben Woods <woodsb02@freebsd.org>
Sponsored by:	Limelight Networks
2017-08-10 03:11:05 +00:00
Andrey V. Elsukov
95e8b991ca Add to if_enc(4) ability to capture packets via BPF after pfil processing.
New flag 0x4 can be configured in net.enc.[in|out].ipsec_bpf_mask.
When it is set, if_enc(4) additionally captures a packet via BPF after
invoking pfil hook. This may be useful for debugging.

MFC after:	2 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D11804
2017-08-09 12:24:07 +00:00
Andrey V. Elsukov
1a01e0e7ac Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook
from enc_hhook().

This should solve the problem when pf is used with the if_enc(4) interface
and an outbound packet with an existing PCB is checked by pf: this leads to
a deadlock because pf does its own PCB lookup and tries to take the rlock
when the wlock is already held.

Now we pass the PCB pointer to the pfil hook if it is known; this avoids
an extra PCB lookup, so acquiring the rlock is not needed.
For inbound packets it is safe to pass NULL, because we do not hold any
PCB locks yet.

PR:		220217
MFC after:	3 weeks
Sponsored by:	Yandex LLC
2017-07-31 11:04:35 +00:00
Luiz Otavio O Souza
f227f64a64 Remove the mutex, which has been unused since r273220.
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-07-28 04:41:57 +00:00
Sean Bruno
9d35858f48 Slight restructure of iflib_busdma_load_mbuf_sg() to fix accounting
when m_collapse() fails.

Submitted by:	krzystof.galazka@intel.com
Reviewed by:	Jeb Cramer <cramerj@intel.com>
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D11476
2017-07-27 22:53:47 +00:00
Sean Bruno
9bc7588cb6 Deprecate unused int isc_max_txqsets and int isc_max_rxqsets as they
were redundant and not being used to set anything up.

Submitted by:	Matt Macy <mmacy@mattmacy.io>
Reported by:	Jeb Cramer <cramerj@intel.com>
Sponsored by:	Limelight Networks
2017-07-27 21:21:43 +00:00
Bjoern A. Zeeb
ae69ad884d After inpcb route caching was put back in place there is no need for
flowtable anymore (as flowtable was never considered to be useful in
the forwarding path).

Reviewed by:		np
Differential Revision:	https://reviews.freebsd.org/D11448
2017-07-27 13:03:36 +00:00
Sean Bruno
51352d9d81 Don't hold the RM lock during lagg_proto_addport() to avoid an LOR.
Submitted by:	Kevin Bowling <kevin.bowling@kev009.com>
Reviewed by:	mav
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11711
2017-07-25 14:41:50 +00:00
Sepherosa Ziehau
86a8f5ff2a rndis: Add LINK_SPEED_CHANGE status
Reviewed by:	hselasky
MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D11650
2017-07-24 03:59:50 +00:00
Sepherosa Ziehau
8819ad852f ethernet: Add ethernet interface attached event and devctl notification.
ifnet_arrival_event may not be adequate in certain situations, e.g.
when the LLADDR is needed.  So the ethernet ifattach event is announced
after all necessary bits are setup.

MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D11617
2017-07-24 03:32:10 +00:00
Luiz Otavio O Souza
f9eece461b Update netmap_user.h with the current version of netmap. This file should
have been committed together with r319881.

MFC after:	1 week
MFC with:	r319881
Pointy hat to:	loos
2017-07-21 03:42:09 +00:00
Dimitry Andric
9d0a88de9b Fix printf format warning in iflib.c
Clang 5.0.0 got better warnings about printf format strings using %zd,
and this leads to the following -Werror warning on e.g. arm:

    sys/net/iflib.c:1517:8: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'bus_size_t' (aka 'unsigned long') [-Werror,-Wformat]
                                              sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
                                              ^~~~~~~~~~~~~~~~~~~~
    sys/net/iflib.c:1517:41: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'bus_size_t' (aka 'unsigned long') [-Werror,-Wformat]
                                              sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
                                                                               ^~~~~~~~~~~~~~~~~~~~~~~

Fix this by casting bus_size_t arguments to uintmax_t, and using %ju
instead.
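The portable pattern this fix applies is the standard C99 one: cast a typedef of unknown width to uintmax_t and print it with %ju. In the sketch below `sk_bus_size_t` is a stand-in for bus_size_t, whose underlying type differs between platforms, and `sk_format_sizes()` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for bus_size_t, whose underlying type varies by platform. */
typedef unsigned long sk_bus_size_t;

static int
sk_format_sizes(char *buf, size_t len, sk_bus_size_t maxsize, int nsegments,
    sk_bus_size_t maxsegsize)
{
	/* %ju plus an explicit uintmax_t cast is correct for any width. */
	return (snprintf(buf, len, "maxsize=%ju nsegments=%d maxsegsize=%ju",
	    (uintmax_t)maxsize, nsegments, (uintmax_t)maxsegsize));
}
```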

Reviewed by:	emaste
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D11679
2017-07-20 20:28:31 +00:00
Sean Bruno
dcbc025ff1 Don't cache mbuf pointers if the number of descriptors is greater than
the number of buffers.

Submitted by:	Matt Macy <mmacy@mattmacy.io>
Sponsored by:	Limelight Networks
2017-07-19 21:18:04 +00:00
Sean Bruno
25d528119f iflib - iflib_busdma_load_mbuf_sg used isc_tx_maxsize as max segment size.
Submitted by:	krzysztof.galazka@intel.com
Differential Revision:	https://reviews.freebsd.org/D11403
2017-07-03 19:23:45 +00:00
Sean Bruno
87890dbaf6 bnxt(4) Enable LRO support, redux
iflib - reset fl->ifl_fragidx to 0 in iflib_fl_bufs_free().  Failing to do so
caused the panic in em/igb when adding it to a bridge device.

iflib - Handle out of order packet delivery from hardware in support of LRO

Out of order updates to rxd's were addressed in r315217, but the fix is
incomplete.  While refilling the buffers, iflib does not consider
the out of order descriptors; hence, it refills sequentially.
The "idx" variable in the _iflib_fl_refill routine is incremented sequentially.
By refilling sequentially, it will overwrite the SGEs that
are *IN USE* by other connections.  The fix is to maintain a bitmap of
rx descriptors, distinguish the used ones from the unused ones, and
refill only at the unused indices.  This patch also fixes a
few bugs in bnxt, related to the same feature.
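The bitmap approach described above can be sketched for a small ring. Everything here is hypothetical (a 16-entry ring tracked in one 32-bit word, `sk_complete()`, `sk_refill_free()`); iflib itself uses a bitstring and its own allocation logic. The point shown is that refill searches for the next clear bit instead of blindly advancing idx over descriptors that are still in use.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_NDESC 16	/* hypothetical ring size; fits in one 32-bit word */

/* One bit per RX descriptor: set = buffer posted and possibly in use. */
static uint32_t sk_rx_bitmap;

static bool
sk_in_use(unsigned idx)
{
	return (((sk_rx_bitmap >> idx) & 1u) != 0);
}

/* Hardware finished with descriptor idx, possibly out of order. */
static void
sk_complete(unsigned idx)
{
	sk_rx_bitmap &= ~(1u << idx);
}

/*
 * Refill up to `count` descriptors, starting the search at *idx and
 * skipping slots still in use.  Returns the number of slots refilled.
 */
static unsigned
sk_refill_free(unsigned *idx, unsigned count)
{
	unsigned filled = 0;

	while (filled < count) {
		unsigned scanned;

		/* Find the next free slot at or after *idx, wrapping around. */
		for (scanned = 0; scanned < SK_NDESC; scanned++)
			if (!sk_in_use((*idx + scanned) % SK_NDESC))
				break;
		if (scanned == SK_NDESC)
			break;		/* every descriptor is in use */
		unsigned i = (*idx + scanned) % SK_NDESC;
		sk_rx_bitmap |= 1u << i;	/* post a fresh buffer in slot i */
		*idx = (i + 1) % SK_NDESC;
		filled++;
	}
	return (filled);
}
```

If slots 3 and 7 complete out of order while all others are still in use, a refill starting at slot 0 fills exactly those two slots and leaves the in-use descriptors untouched.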

Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	venkatkumar.duvvuru@broadcom.com shurd
Differential Revision:	https://reviews.freebsd.org/D10681
2017-07-03 18:23:35 +00:00
Jason A. Harmening
eb36b1d0bc Clean up MD pollution of bus_dma.h:
--Remove special-case handling of sparc64 bus_dmamap* functions.
  Replace with a more generic mechanism that allows MD busdma
  implementations to generate inline mapping functions by
  defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>.  This
  is currently useful for sparc64, x86, and arm64, which all
  implement non-load dmamap operations as simple wrappers
  around map objects which may be bus- or device-specific.

--Remove NULL-checked bus_dmamap macros.  Implement the
  equivalent NULL checks in the inlined x86 implementation.
  For non-x86 platforms, these checks are a minor pessimization
  as those platforms do not currently allow NULL maps.  NULL
  maps were originally allowed on arm64, which appears to have
  been the motivation behind adding arm[64]-specific barriers
  to bus_dma.h, but that support was removed in r299463.

--Simplify the internal interface used by the bus_dmamap_load*
  variants and move it to bus_dma_internal.h

--Fix some drivers that directly include sys/bus_dma.h
  despite the recommendations of bus_dma(9)

Reviewed by:	kib (previous revision), marius
Differential Revision:	https://reviews.freebsd.org/D10729
2017-07-01 05:35:29 +00:00
Justin Hibbits
b436609213 Update comments and simplify conditionals for compat32
Only amd64 (because of i386) needs 32-bit time_t compat now, everything else is
64-bit time_t.  Rather than checking on all 64-bit time_t archs, only check the
oddball amd64/i386.

Reviewed By: emaste, kib, andrew
Differential Revision: https://reviews.freebsd.org/D11364
2017-06-27 01:29:10 +00:00
Justin Hibbits
fbcf7bcdf4 Solve the y2038 problem for powerpc
AKA Make time_t 64 bits on powerpc(32).

Until now, PowerPC was one of two 32-bit architectures with a 32-bit time_t
(the other being i386).  This is an ABI breakage, so all ports,
and all local binaries, *must* be recompiled.

Tested by:	andreast, others
MFC after:	Never
Relnotes:	Yes
2017-06-26 02:25:19 +00:00
Sean Bruno
fa5416a819 Revert r319989 "bnxt(4) Enable LRO support"
This generates startup LORs and panics when adding elements to bridge
devices. I will document further in https://reviews.freebsd.org/D10681

PR:	220073
Submitted by:	dchagin
Reported by:	db
2017-06-17 17:42:52 +00:00
Sean Bruno
51a621f7c0 bnxt(4) Enable LRO support
iflib - Handle out of order packet delivery from hardware in support of LRO

Out of order updates to rxd's were addressed in r315217, but the fix is
incomplete.  While refilling the buffers, iflib does not consider
the out of order descriptors; hence, it refills sequentially.
The "idx" variable in the _iflib_fl_refill routine is incremented sequentially.
By refilling sequentially, it will overwrite the SGEs that
are *IN USE* by other connections.  The fix is to maintain a bitmap of
rx descriptors, distinguish the used ones from the unused ones, and
refill only at the unused indices.  This patch also fixes a
few bugs in bnxt, related to the same feature.

Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	shurd@
Differential Revision:	https://reviews.freebsd.org/D10681
2017-06-15 21:06:03 +00:00
Sean Bruno
3600bd1e38 Revert r319921 which seems to cause NFS booting assertion panics in
various configurations.

Reported by:	pho@
2017-06-15 17:46:20 +00:00
Sean Bruno
f7587db036 Add new sysctl to allow changing of timing of the txq timers.
Add new sysctl to override use of busdma in the driver.

Submitted by:	Drew Gallatin <gallatin@netflix.com>
2017-06-13 23:16:38 +00:00
Sean Bruno
aa8fa07cd8 Plug mbuf leak in the busdma path of iflib.
Submitted by:	Michael Tuexen <tuexen@freebsd.org>
Reported by:	Drew Gallatin <gallatin@netflix.com>
2017-06-13 19:32:23 +00:00
Andrey V. Elsukov
b83aa367a5 Resurrect RTF_RNH_LOCKED flag and restore ability to call rtalloc1_fib()
with acquired RIB lock.

This fixes a possible panic due to trying to acquire RIB rlock when it is
already exclusive locked.

PR:		215963, 215122
MFC after:	1 week
Sponsored by:	Yandex LLC
2017-06-13 10:52:31 +00:00
Alexander Motin
41cf0d54a2 Call VLAN_CAPABILITIES() when LAGG capabilities change.
This makes a VLAN on top of a LAGG expose the proper capabilities if they
are changed after creation.

MFC after:	1 week
2017-05-26 22:22:48 +00:00
Alexander Motin
8403ab7919 Improve applying unified capabilities to the lagg ports.
Some NICs have interdependent capabilities, so disabling one requires
disabling another (TXCSUM/RXCSUM on em).  This code tries harder to reach
a consensus.
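One way to read "reach a consensus" is as a fixed-point loop: compute the capabilities common to all ports, push that set to every port, and repeat, because a port may drop additional capabilities it cannot keep on their own. The sketch below is hypothetical and is not the if_lagg(4) implementation; struct toy_port, toy_port_apply(), toy_lagg_unify(), the TOY_CAP_* bits (stand-ins for the IFCAP_* flags), and the coupled_csum rule modelling the em TXCSUM/RXCSUM dependency are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for IFCAP_RXCSUM, IFCAP_TXCSUM and IFCAP_TSO4. */
#define TOY_CAP_RXCSUM	0x1u
#define TOY_CAP_TXCSUM	0x2u
#define TOY_CAP_TSO4	0x4u
#define TOY_CAP_CSUM	(TOY_CAP_RXCSUM | TOY_CAP_TXCSUM)

struct toy_port {
	uint32_t supported;	/* what the hardware can do */
	uint32_t enabled;	/* what is currently turned on */
	int	 coupled_csum;	/* assumed em-like rule: RX/TX csum only together */
};

/* Ask a port to enable `req`; returns what the port actually enabled. */
static uint32_t
toy_port_apply(struct toy_port *p, uint32_t req)
{
	if (p->coupled_csum && (req & TOY_CAP_CSUM) != TOY_CAP_CSUM)
		req &= ~TOY_CAP_CSUM;	/* dropping one forces dropping both */
	p->enabled = req & p->supported;
	return (p->enabled);
}

/*
 * Repeatedly apply the intersection of all ports' enabled capabilities
 * until every port reports the same set, or max_rounds is exhausted.
 */
static uint32_t
toy_lagg_unify(struct toy_port *ports, int nports, int max_rounds)
{
	uint32_t common = UINT32_MAX;

	for (int i = 0; i < nports; i++)
		common &= ports[i].enabled;
	for (int round = 0; round < max_rounds; round++) {
		uint32_t next = UINT32_MAX;

		for (int i = 0; i < nports; i++)
			next &= toy_port_apply(&ports[i], common);
		if (next == common)
			break;		/* every port accepted the same set */
		common = next;
	}
	return (common);
}
```

With one port that couples RXCSUM and TXCSUM and another that lacks RXCSUM, a single pass would leave the second port with TXCSUM enabled while the first has dropped it; the second round converges both ports on TSO4 alone.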

PR:		219453
MFC after:	1 week
2017-05-26 20:15:33 +00:00
Alexander Motin
e3d90506c4 Remove some code that has been dead since day one.
2017-05-25 23:19:09 +00:00
Alexander Motin
9bcf3ae4c7 Add parent interface reference counting to if_vlan.
Using plain ifunit() is asking for trouble.

MFC after:	1 week
2017-05-23 00:13:27 +00:00
Sepherosa Ziehau
ae0669e390 net/vlan: Revert 305177
Misread the parentheses.

Reported by:	oleg@
Reviewed by:	hps@
MFC after:	3 days
Sponsored by:	Microsoft
2017-05-19 01:42:31 +00:00
Ed Maste
3e85b721d6 Remove register keyword from sys/ and ANSIfy prototypes
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by:	cem, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10193
2017-05-17 00:34:34 +00:00
Ravi Pokala
5b1a5e45b2 Persistently store NIC's hardware MAC address, and add a way to retrieve it
An earlier version of r318160 allocated if_hw_addr unconditionally; when it
became conditional, I forgot to check for NULL in ether_ifattach().

Reviewed by:	kp
MFC after:	1 week
MFC with:	r318160
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D10678
Pointy-hat to:	rpokala
2017-05-11 06:46:39 +00:00
Ravi Pokala
ddae57504b Persistently store NIC's hardware MAC address, and add a way to retrieve it
The MAC address reported by `ifconfig ${nic} ether' does not always match
the address in the hardware, as reported by the driver during attach. In
particular, NICs which are components of a lagg(4) interface all report the
same MAC.

When attaching, the NIC driver passes the MAC address it read from the
hardware as an argument to ether_ifattach(). Keep a second copy of it, and
create ioctl(SIOCGHWADDR) to return it. Teach `ifconfig' to report it along
with the active MAC address.

PR:		194386
Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D10609
2017-05-10 22:13:47 +00:00
Eric Joyner
6e105d4e35 Add several new media types to if_media.h
These include several 25G types (for active direct attach cables and LR modules),
and a missing type for 10G active direct attach.

Differential Revision:	https://reviews.freebsd.org/D10425
Reviewed by:	smh, imp
MFC after:	3 days
Sponsored by:	Intel Corporation
2017-05-10 18:33:40 +00:00