Commit Graph

4079 Commits

Author SHA1 Message Date
mjg
08fabf55c9 uma: fix up r334824
Turns out there is code which ends up passing M_ZERO to counters.
Since counters zero unconditionally on their own, just ignore drop the
flag in that place.
2018-06-08 05:40:36 +00:00
mmacy
69a922f7ab rtentry_zinit: don't blindly pass through M_ZERO to counter alloc 2018-06-08 05:17:06 +00:00
erj
0ac17051d5 iflib: Record TCP checksum info in iflib when TCP checksum is requested
ixl(4) (when it switches over to using iflib) devices need the TCP header
length in order to do TCP checksum offload.

Reviewed by:	gallatin@, shurd@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15558
2018-06-07 13:03:07 +00:00
ae
d1ee857bcf Rework if_gif(4) to use new encap_lookup_t method to speedup lookup
of needed interface when many gif interfaces are present.

Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations.
Use hash table to speedup lookup of needed softc. Interfaces
with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST.
Sysctl net.link.gif.parallel_tunnels is removed. The removal was planed
16 years ago, and actually it could work only for outbound direction.
Each protocol, that can be handled by if_gif(4) interface is registered
by separate encap handler, this helps avoid invoking the handler
for unrelated protocols (GRE, PIM, etc.).

This change allows dramatically improve performance when many gif(4)
interfaces are used.

Sponsored by:	Yandex LLC
2018-06-05 21:24:59 +00:00
ae
dfbd18b5fe Rework IP encapsulation handling code.
Currently it has several disadvantages:
- it uses single mutex to protect internal structures. It is used by
  data- and control- path, thus there are no parallelism at all.
- it uses single list to keep encap handlers for both INET and INET6
  families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw),
  that isn't used by code in the source tree.
- matches are prioritized and when many tunneling interfaces are
  registered, encapcheck handler of each interface is invoked for each
  packet. The search takes O(n) for n interfaces. All this work is done
  with exclusive lock held.

What this patch includes:
- the datapath is converted to be lockless using epoch(9) KPI.
- struct encaptab now linked using CK_LIST.
- all unused fields removed from struct encaptab. Several new fields
  addedr: min_length is the minimum packet length, that encapsulation
  handler expects to see; exact_match is maximum number of bits, that
  can return an encapsulation handler, when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists;
- added new "encap_lookup_t" method, that will be used later. It is
  targeted to speedup lookup of needed interface, when gif(4)/gre(4) have
  many interfaces.
- the need to use protosw structure is eliminated. The only pr_input
  method was used from this structure, so I don't see the need to keep
  using it.
- encap_input_t method changed to avoid using mbuf tags to store softc
  pointer. Now it is passed directly trough encap_input_t method.
  encap_getarg() funtions is removed.
- all sockaddr structures and code that uses them removed. We don't have
  any code in the tree that uses them. All consumers use encap_attach_func()
  method, that relies on invoking of encapcheck() to determine the needed
  handler.
- introduced struct encap_config, it contains parameters of encap handler
  that is going to be registered by encap_attach() function.
- encap handlers are stored in lists ordered by exact_match value, thus
  handlers that need more bits to match will be checked first, and if
  encapcheck method returns exact_match value, the search will be stopped.
- all current consumers changed to use new KPI.

Reviewed by:	mmacy
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15617
2018-06-05 20:51:01 +00:00
mmacy
3fe03791ac Reduce overhead of entropy collection
- move harvest mask check inline
- move harvest mask to frequently_read out of actively
  modified cache line
- disable ether_input collection and describe its limitations
  in NOTES

Typically entropy collection in ether_input was stirring zero
in to the entropy pool while at the same time greatly reducing
max pps. This indicates that perhaps we should more closely
scrutinize how much entropy we're getting from a given source
as well as what our actual entropy collection needs are for
seeding Yarrow.

Reviewed by: cem, gallatin, delphij
Approved by: secteam
Differential Revision: https://reviews.freebsd.org/D15526
2018-05-31 21:53:07 +00:00
hselasky
34e8800cc7 Re-apply r190640.
- Restore local change to include <net/bpf.h> inside pcap.h.
This fixes ports build problems.
- Update local copy of dlt.h with new DLT types.
- Revert no longer needed <net/bpf.h> includes which were added
as part of r334277.

Suggested by:	antoine@, delphij@, np@
MFC after:	3 weeks
Sponsored by:	Mellanox Technologies
2018-05-31 09:11:21 +00:00
mmacy
a5d5b1e5b9 if_setlladdr: don't call ioctl in epoch context
PR: 228612
Reported by: markj
2018-05-30 21:46:10 +00:00
kp
fbe5f2b7e0 pf: Add missing include statement
rmlocks require <sys/lock.h> as well as <sys/rmlock.h>.
Unbreak mips build.
2018-05-30 12:40:37 +00:00
kp
de7905d658 pf: Replace rwlock on PF_RULES_LOCK with rmlock
Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock.
This change improves packet processing rate in high pps environments.
Benchmarking by olivier@ shows a 65% improvement in pps.

While here, also eliminate all appearances of "sys/rwlock.h" includes since it
is not used anymore.

Submitted by:	farrokhi@
Differential Revision:	https://reviews.freebsd.org/D15502
2018-05-30 07:11:33 +00:00
shurd
40a1e4b33c iflib: mark irq allocation name parameter as constant
The *name parameter passed to iflib_irq_alloc_generic and
iflib_softirq_alloc_generic is never modified. Many places in code pass
string literals and thus should not be modified.

Mark the *name parameter as a const char * instead, so that we enforce
that the name is not modified before passing to bus_describe_intr()

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	kmacy
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15343
2018-05-29 21:56:39 +00:00
mmacy
3b32e27457 iflib: hold context lock across detach for drivers that need it 2018-05-29 18:03:43 +00:00
mmacy
fd829508af rt_getifa_fib: don't use ifa but info->rti_ifa
Reported by:	kp
2018-05-29 07:14:57 +00:00
mmacy
722df2d2de route: fix missed ref adds
- ensure that we bump the ifa ref whenever we add a reference
 - defer freeing epoch protected references until after the if_purgaddrs
   loop
2018-05-29 00:53:53 +00:00
erj
701350ae28 iflib: Add new shared flag: IFLIB_ADMIN_ALWAYS_RUN
ixl(4)'s nvmupdate utility expects the nvmupdate process to run
while the interface is down; these nvm update commands use the
admin queue, so the admin queue needs to be able to generate
interrupts and be processed while the interface is down.

So add a flag that ixl(4) sets that lets the entire admin task
run even when the interface is marked down/IFF_DRV_RUNNING isn't set.

With this change, nvmupdate should function like it did pre-iflib.

Reviewed by:	gallatin@, sbruno@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15575
2018-05-26 00:46:08 +00:00
mmacy
bef06dbd7a rtrequest1_fib: we need to always bump the ifaddr refcount when we take a reference from
an rtentry. r334118 introduced a case when this was not done.

While we're here make the intent more obvious by moving the refcount
bump down to when we know we'll actually need it.

Reported by:	markj
2018-05-25 19:48:26 +00:00
mmacy
c937b516d8 CK: update consumers to use CK macros across the board
r334189 changed the fields to have names distinct from those in queue.h
in order to expose the oversights as compile time errors
2018-05-24 23:21:23 +00:00
mmacy
710b4829e5 if_delgroups: add missed unlock introduced by r334118 2018-05-24 17:54:08 +00:00
mmacy
ecd6e9d307 UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
pizzamig
fbb6c8b691 Improve MAC address uniqueness on if_epair(4).
As reported in PR184149, it can happen that epair devices can have the same
MAC address.
This solution is based on a 32-bit hash, obtained combining the if_index of
the a interface and the hostid.
If the hostid is zero, a random number is used.

PR:		184149
Reviewed by:	wollman, eugen
Approved by:	cognet
Differential Revision:	https://reviews.freebsd.org/D15329
2018-05-23 13:10:57 +00:00
markj
c354f4f2f6 Simplify lagg_input().
No functional change intended.

MFC after:	2 weeks
2018-05-22 15:35:38 +00:00
mmacy
8e61308048 ck: simplify interface with libkvm consumers by defining ck_queue types
as their queue.h equivalents if !_KERNEL
2018-05-21 01:53:23 +00:00
mmacy
da84b7fa5a net: fix uninitialized variable warning 2018-05-19 19:00:04 +00:00
mmacy
d814acfa7c mp_ring: fix i386
Even though 64-bit atomics are supported on i386 there are panics
indicating that the code does not work correctly there. Switch
to mutex based variant (and fix that while we're here).

Reported by:	pho, kib
2018-05-19 16:44:12 +00:00
mmacy
0db6398617 net: fix set but not used 2018-05-19 05:27:49 +00:00
mmacy
7aeac9ef18 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
mmacy
9f0d447325 epoch(9): allocate net epochs earlier in boot 2018-05-18 18:48:00 +00:00
mmacy
2f5a774893 epoch: move epoch variables to read mostly section 2018-05-18 17:58:15 +00:00
emaste
f0cc1a044c Use NULL for SYSINIT's last arg, which is a pointer type
Sponsored by:	The FreeBSD Foundation
2018-05-18 17:58:09 +00:00
mmacy
a48d80f193 epoch(9): Make epochs non-preemptible by default
There are risks associated with waiting on a preemptible epoch section.
Change the name to make them not be the default and document the issue
under CAVEATS.

Reported by:	markj
2018-05-18 17:29:43 +00:00
mmacy
aac2a8081e epoch: add non-preemptible "critical" variant
adds:
- epoch_enter_critical() - can be called inside a different epoch,
  starts a section that will acquire any MTX_DEF mutexes or do
  anything that might sleep.
- epoch_exit_critical() - corresponding exit call
- epoch_wait_critical() - wait variant that is guaranteed that any
  threads in a section are running.
- epoch_global_critical - an epoch_wait_critical safe epoch instance

Requested by:   markj
Approved by:	sbruno
2018-05-18 01:52:51 +00:00
mmacy
00950b6e0c Fix !netmap build post r333686
Approved by:	sbruno
2018-05-16 22:25:47 +00:00
shurd
df10b02879 Work around lack of TX IRQs in iflib for netmap
When poll() is called via netmap, txsync is initially called,
and if there are no available buffers to reclaim, it waits for the driver
to notify of new buffers. Since the TX IRQ is generally not used in iflib
drivers, this ends up causing a timeout.

Work around this by having the reclaim DELAY(1) if it's initially unable
to reclaim anything, then schedule the tx task, which will spin by
continuously rescheduling itself until some buffers are reclaimed. In
general, the delay is enough to allow some buffers to be reclaimed, so
spinning is minimized.

Reported by:	Johannes Lundberg <johalun0@gmail.com>
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15455
2018-05-16 21:03:22 +00:00
shurd
8b4a96b13e Replace rmlock with epoch in lagg
Use the new epoch based reclamation API. Now the hot paths will not
block at all, and the sx lock is used for the softc data.  This fixes LORs
reported where the rwlock was obtained when the sxlock was held.

Submitted by:	mmacy
Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15355
2018-05-14 20:06:49 +00:00
mmacy
0a7aab5128 iflib(9): Add support for cloning pseudo interfaces
Part 3 of many ...
The VPC framework relies heavily on cloning pseudo interfaces
(vmnics, vpc switch, vcpswitch port, hostif, vxlan if, etc).

This pulls in that piece. Some ancillary changes get pulled
in as a side effect.

Reviewed by:	shurd@
Approved by:	sbruno@
Sponsored by:	Joyent, Inc.
Differential Revision:	https://reviews.freebsd.org/D15347
2018-05-11 20:08:28 +00:00
ae
c53ab47acf Apply the change from r272770 to if_ipsec(4) interface.
It is guaranteed that if_ipsec(4) interface is used only for tunnel
mode IPsec, i.e. decrypted and decapsultaed packet has its own IP header.
Thus we can consider it as new packet and clear the protocols flags.
This allows ICMP/ICMPv6 properly handle errors that may cause this packet.

PR:		228108
MFC after:	1 week
2018-05-11 16:50:25 +00:00
mmacy
361b54f07a Allow different bridge types to coexist
if_bridge has a lot of limitations that make it scale poorly to higher data
rates. In my projects/VPC branch I leverage the bridge interface between
layers for my high speed soft switch as well as for purposes of stacking
in general.

Reviewed by:	sbruno@
Approved by:	sbruno@
Differential Revision:	https://reviews.freebsd.org/D15344
2018-05-11 05:00:40 +00:00
des
42f7c6ed63 Slight cleanup of interface event logging.
Make if_printf() use vlog() instead of vprintf().  This means it can no
longer return the number of characters printed, as it used to, but every
single call to if_printf() in the entire kernel ignores the return value
anyway; just return 0 so we don't have to change the prototype.

Consistently use if_printf() throughout sys/net/if.c, instead of a
mixture of if_printf() and log().

In ifa_maintain_loopback_route(), don't needlessly log an error if we
either failed to add a route because it already existed or failed to
remove one because it did not.  We still return an error code, though.

MFC after:	1 week
2018-05-11 00:19:49 +00:00
mmacy
0f77b86d64 Allocate epoch for networking at startup
Additionally add CK to include paths for modules

Approved by:	sbruno@
2018-05-10 19:13:00 +00:00
ae
0490c97003 Add IFCAP_LINKSTATE support to if_loop(4).
Reviewed by:	wollman
Obtained from:	Yandex LLC
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D15278
2018-05-09 10:50:51 +00:00
shurd
e81ffd1828 iflib: print message when iflib_tx_structures_setup fails
Print a message when iflib_tx_structures_setup fails, like we do for
iflib_rx_structures_setup.

Now that we always print a message from within
iflib_qset_structures_setup when it fails, stop printing one in
iflib_device_register() at the call site.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15300
2018-05-08 17:15:10 +00:00
shurd
3d0eb99e32 iflib: cleanup queues when iflib_device_register fail
Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15299
2018-05-08 16:56:02 +00:00
gallatin
f2a9371551 Fix an off-by-one error when deciding to request a tx interrupt
The canonical check for whether or not a ring is drainable is
TXQ_AVAIL() > MAX_TX_DESC() + 2.  Use this same construct here,
in order to avoid a potential off-by-one error where we might otherwise
fail to request an interrupt.

Reviewed by:	mmacy
Sponsored by:	Netflix
2018-05-07 18:11:22 +00:00
mmacy
d3f138323c r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl
to sleep on commands to the NIC when updating multicast filters. More generally this permitted
driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a
a multicast update would still be queued for deletion when ifconfig deleted the interface
thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses
on the interface.

Synchronously remove all external references to a multicast address before enqueueing for delete.

Reported by:	lwhsu
Approved by:	sbruno
2018-05-06 20:34:13 +00:00
mmacy
e532109522 The ifnet pointer (ifp) in rt_newaddrmsg can be valid without ifp->if_addr being set if
if the ifnet is still live by way of a reference but
in line for deletion. Check ifp->if_addr before dereferencing.

Approved by:	sbruno
2018-05-06 20:32:47 +00:00
markj
1de3a6fa6d Add netdump support to iflib.
em(4) and igb(4) were tested by me, and ixgbe(4) and bnxt(4) were
tested by sbruno.

Reviewed by:	mmacy, shurd
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15262
2018-05-06 00:57:52 +00:00
markj
071e927e22 Import the netdump client code.
This is a component of a system which lets the kernel dump core to
a remote host after a panic, rather than to a local storage device.
The server component is available in the ports tree. netdump is
particularly useful on diskless systems.

The netdump(4) man page contains some details describing the protocol.
Support for configuring netdump will be added to dumpon(8) in a future
commit. To use netdump, the kernel must have been compiled with the
NETDUMP option.

The initial revision of netdump was written by Darrell Anderson and
was integrated into Sandvine's OS, from which this version was derived.

Reviewed by:	bdrewery, cem (earlier versions), julian, sbruno
MFC after:	1 month
X-MFC note:	use a spare field in struct ifnet
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15253
2018-05-06 00:38:29 +00:00
mmacy
1e91ab0baf fix gcc8 warnings
Approved by:	sbruno
2018-05-04 18:57:05 +00:00
shurd
83e3d4c922 iflib: fix invalid free during queue allocation failure
In r301567, code was added to cleanup to prevent memory leaks for the
Tx and Rx ring structs. This code carefully tracked txq and rxq, and
made sure to free them properly during cleanup.

Because we assigned the txq and rxq pointers into the ctx->ifc_txqs and
ctx->ifc_rxqs, we carefully reset these pointers to NULL, so that
cleanup code would not accidentally free the memory twice.

This was changed by r304021 ("Update iflib to support more NIC designs"),
which removed this resetting of the pointers to NULL, because it re-used
the txq and rxq pointers as an index into the queue set array.

Unfortunately, the cleanup code was left alone. Thus, if we fail to
allocate DMA or fail to configure the queues using the drivers ifdi
methods, we will attempt to free txq and rxq. These variables would now
incorrectly point to the wrong location, resulting in a page fault.

There are a number of methods to correct this, but ultimately the root
cause was that we reuse the txq and rxq pointers for two different
purposes.

Instead, when allocating, store the returned pointer directly into
ctx->ifc_txqs and ctx->ifc_rxqs. Then, assign this to txq and rxq as
index pointers before starting the loop to allocate each queue.
Drop the cleanup code for txq and rxq, and only use ctx->ifc_txqs and
ctx->ifc_rxqs.

Thus, we no longer need to free txq or rxq under any error flow, and
intsead rely solely on the pointers stored in ctx->ifc_txqs and
ctx->ifc_rxqs. This prevents the invalid free(), and ensures that we
still properly cleanup after ourselves as before when failing to
allocate.

Submitted by:	Jacob Keller
Reviewed by:	gallatin, sbruno
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15285
2018-05-04 15:20:34 +00:00
shurd
83da3b83e5 iflib: remove unused brscp pointer from iflib_queues_alloc
This pointer was no longer written to as of r315217. Since nothing writes
to the variable, remove it.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	gallatin, kmacy, sbruno
Differential Revision:	https://reviews.freebsd.org/D15284
2018-05-04 15:11:16 +00:00
shurd
fc5848e1e6 Allow iflib NIC drivers to sleep rather than busy wait
Since the move to SMP NIC driver locking has had to go through serious
contortions using mtx around long running hardware operations. This moves
iflib past that.

Individual drivers may now sleep when appropriate.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	shurd
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14983
2018-05-03 17:02:31 +00:00
shurd
7d4b8facc7 Separate list manipulation locking from state change in multicast
Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.

Submitted by:	mmacy <mmacy@mattmacy.io>
Reviewed by:	shurd, sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14969
2018-05-02 19:36:29 +00:00
gallatin
40ab8d5ea9 Fix iflib_encap() EFBIG handling bugs
1) Don't give up if m_collapse() fails.  Rather than giving up, try
m_defrag() immediately.

2) Fix a leak where, if the NIC driver rejected the defrag'ed chain
as having too many segments, we would fail to free the chain.

Reviewed by:  Matthew Macy <mmacy@mattmacy.io> (this version of patch)
Submitted by: Matthew Macy <mmacy@mattmacy.io> (early version of leak fix)
2018-04-30 23:53:27 +00:00
hselasky
5663cd2837 Add network device event for priority code point, PCP, changes.
When the PCP is changed for either a VLAN network interface or when
prio tagging is enabled for a regular ethernet network interface,
broadcast the IFNET_EVENT_PCP event so applications like ibcore can
update its GID tables accordingly.

MFC after:	3 days
Reviewed by:	ae, kib
Differential Revision:	https://reviews.freebsd.org/D15040
Sponsored by:	Mellanox Technologies
2018-04-26 08:58:27 +00:00
brooks
289ce12298 Translate 32-bit ifmedia requests into native ones.
We use transformation rather than accessors as virtually ever driver
implements SIOCGIFMEDIA and all would have to be touched.

Keep the code readable by always performing copies and (possiably no-op)
transforms.

Reviewed by:	jhb, kib
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14996
2018-04-25 15:30:42 +00:00
markj
943580a700 Use dead_bpf_if instead of bp_null.
This fixes a -Wunused error when DEV_BPF and NETGRAPH_BPF are not
defined.

Also remove a stray semicolon added in r332812.

X-MFC with:	r332812
2018-04-24 17:42:25 +00:00
brooks
9a0f94467e Finish removing FDDI and tokenring media support.
This fixes media display for 802.11 wireless devices.

Software outside the base system that uses these media types and
defines should use #ifdef IFM_FDDI or IFM_TOKEN to include or remove
support.

Reported by:	zeising
Reviewed by:	emaste, kib, zeising
Tested by:	zeising
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15170
2018-04-23 21:10:33 +00:00
ae
c6192ec749 Add dead_bpf_if structure, that should be used as fake bpf_if
during ifnet detach.

Since destroying interface is not atomic operation and due to the
lack of synhronization during destroy, it is possible, that in the
time between bpfdetach() and if_free() some queued on destroying
interface mbuf will be used by ether_input_internal() and
bpf_peers_present() can dereference NULL bpf_if pointer. To protect
from this, assign pointer to empty bpf_if_ext structure instead of
NULL pointer after bpfdetach().

Reviewed by:	melifaro, eugen
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15083
2018-04-20 09:57:31 +00:00
shurd
90779c2bbf iflib: Fix queue distribution when there are no threads
Previously, if there are no threads, all queues which targeted
cores that share an L2 cache were bound to a single core. The intent is
to distribute them across these cores.

Reported by:	olivier
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15120
2018-04-18 15:34:18 +00:00
brooks
26c165ead9 Remove support for the Arcnet protocol.
While Arcnet has some continued deployment in industrial controls, the
lack of drivers for any of the PCI, USB, or PCIe NICs on the market
suggests such users aren't running FreeBSD.

Evidence in the PR database suggests that the cm(4) driver (our sole
Arcnet NIC) was broken in 5.0 and has not worked since.

PR:		182297
Reviewed by:	jhibbits, vangyzen
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15057
2018-04-13 21:18:04 +00:00
sbruno
18512a7765 Restore r332389 after resolution of locking fixes.
Add one extra lock initialization to iflib_register() that was missed
in the git<->phab conversion.

Split out flag manipulation from general context manipulation in iflib

To avoid blocking on the context lock in the swi thread and risk potential
deadlocks, this change protects lighter weight updates that only need to
be consistent with each other with their own lock.

Submitted by:   Matthew Macy <mmacy@mattmacy.io>
Reviewed by:    shurd
Sponsored by:   Limelight Networks
Differential Revision:  https://reviews.freebsd.org/D14967
2018-04-12 14:35:37 +00:00
vmaffione
3c7434c730 netmap: align codebase to the current upstream (commit id 3fb001303718146)
Changelist:
    - Turn tx_rings and rx_rings arrays into arrays of pointers to kring
      structs. This patch includes fixes for ixv, ixl, ix, re, cxgbe, iflib,
      vtnet and ptnet drivers to cope with the change.
    - Generalize the nm_config() callback to accept a struct containing many
      parameters.
    - Introduce NKR_FAKERING to support buffers sharing (used for netmap
      pipes)
    - Improved API for external VALE modules.
    - Various bug fixes and improvements to the netmap memory allocator,
      including support for externally (userspace) allocated memory.
    - Refactoring of netmap pipes: now linked rings share the same netmap
      buffers, with a separate set of kring pointers (rhead, rcur, rtail).
      Buffer swapping does not need to happen anymore.
    - Large refactoring of the control API towards an extensible solution;
      the goal is to allow the addition of more commands and extension of
      existing ones (with new options) without the need of hacks or the
      risk of running out of configuration space.
      A new NIOCCTRL ioctl has been added to handle all the requests of the
      new control API, which cover all the functionalities so far supported.
      The netmap API bumps from 11 to 12 with this patch. Full backward
      compatibility is provided for the old control command (NIOCREGIF), by
      means of a new netmap_legacy module. Many parts of the old netmap.h
      header has now been moved to netmap_legacy.h (included by netmap.h).

Approved by:	hrs (mentor)
2018-04-12 07:20:50 +00:00
mjg
fa5413e897 iflib: fix up a mismerge in r332419
Lead to crashes on boot while in ifconfig.

Submitted by: Matthew Macy <mmacy@mattmacy.io>
2018-04-12 04:11:37 +00:00
shurd
63bcfab69d Properly initialize ifc_nhwtxqs.
Also, since ifc_nhwrxqs is only used in one place, remove it from the struct.
This was preventing iflib_dma_free() from being called via
iflib_device_detach().

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	shurd
Sponsored by:	Limelight Networks
2018-04-11 21:41:59 +00:00
brooks
6dcf9514b3 Remove support for FDDI networks.
Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).

Reviewed by:	kib, jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15017
2018-04-11 17:28:24 +00:00
sbruno
a87ab66df6 Revert r332389 as it is causing panics for various users and we need
to add some more test cases.
2018-04-11 17:26:53 +00:00
shurd
d55e53acf5 Split out flag manipulation from general context manipulation in iflib
To avoid blocking on the context lock in the swi thread and risk potential
deadlocks, this change protects lighter weight updates that only need to
be consistent with each other with their own lock.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	shurd
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14967
2018-04-10 19:48:24 +00:00
shurd
05f9f1edaa Make BPF global lock an SX
This allows NIC drivers to sleep on polling config operations.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	shurd
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14982
2018-04-10 19:42:50 +00:00
vmaffione
8b391e44ef netmap: align codebase to upstream version v11.4
Changelist:
  - remove unused nkr_slot_flags
  - new nm_intr adapter callback to enable/disable interrupts
  - remove unused sysctls and document the other sysctls
  - new infrastructure to support NS_MOREFRAG for NIC ports
  - support for external memory allocator (for now linux-only),
    including linux-specific changes in common headers
  - optimizations within netmap pipes datapath
  - improvements on VALE control API
  - new nm_parse() helper function in netmap_user.h
  - various bug fixes and code clean up

Approved by:	hrs (mentor)
2018-04-09 09:24:26 +00:00
brooks
c2e6899488 Remove the thread argument from ifr_buffer_*() accessors.
They are always used in a context where curthread is the correct thread.
This makes them more similar to the ifr_data_get_ptr() accessor.
2018-04-06 23:25:54 +00:00
brooks
7a2353df98 ifconf(): correct handling of sockaddrs smaller than struct sockaddr.
Portable programs that use SIOCGIFCONF (e.g. traceroute) assume
that each pseudo ifreq is of length MAX(sizeof(struct ifreq),
sizeof(ifr_name) + ifr_addr.sa_len).  For short sockaddrs we copied
too much from the source sockaddr resulting in a heap leak.

I believe only one such sockaddr exists (struct sockaddr_sco which
is 8 bytes) and it is unclear if such sockaddrs end up on interfaces
in practice.  If it did, the result would be an 8 byte heap leak on
current architectures.

admbugs:	869
Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	3 days
Security:	kernel heap leak
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14981
2018-04-06 20:26:56 +00:00
brooks
9d79658aab Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
kp
337a0778fc pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS
These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.

Limit the allocation to required size (or the user allocation, if that's
smaller). That does mean we need to do the allocation with the rules
lock held (so the number doesn't change while we're doing this), so it
can't M_WAITOK.

MFC after:	1 week
2018-04-06 15:54:30 +00:00
brooks
0080e81d7c Add 32-bit compat for ioctls that take struct ifgroupreq.
Use an accessor to access ifgr_group and ifgr_groups.

Use an macro CASE_IOC_IFGROUPREQ(cmd) in place of case statements such
as "case SIOCAIFGROUP:". This avoids poluting the switch statements
with large numbers of #ifdefs.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14960
2018-04-05 22:14:55 +00:00
brooks
8f46ff8fe4 ifconf(): Always zero the whole struct ifreq.
The previous split of zeroing ifr_name and ifr_addr seperately is safe
on current architectures, but would be unsafe if pointers were larger
than 8 bytes. Combining the zeroing adds no real cost (a few
instructions) and makes the security property easier to verify.

Reviewed by:	kib, emaste
Obtained from:	CheriBSD
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14912
2018-04-05 21:58:28 +00:00
vmaffione
6dfc9aee0f netmap: align if_ptnet guest driver to the upstream code (commit 0e15788)
The change upgrades the driver to use the split Communication Status
Block (CSB) format. In this way the variables written by the guest
and read by the host are allocated in a different cacheline than
the variables written by the host and read by the guest; this is
needed to avoid cache thrashing.

Approved by:	hrs (mentor)
2018-04-04 21:31:12 +00:00
brooks
2b96daf50f Document and enforce assumptions about struct (in6_)ifreq.
- The two types must be type-punnable for shared members of ifr_ifru.
  This allows compatibility accessors to be shared.

- There must be no padding gap between ifr_name and ifr_ifru.  This is
  assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be
  broadly portable.  This is true for all current architectures, but very
  large (256-bit) fat-pointers could violate this invariant.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14910
2018-03-30 21:38:53 +00:00
brooks
ac0325b4db Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
brooks
a45d44647f Remove infrastructure for token-ring networks.
Reviewed by:	cem, imp, jhb, jmallett
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14875
2018-03-28 23:33:26 +00:00
brooks
308f791e1c Improve copy-and-pasted versions of SIOCGIFADDR.
The original implementation used a reference to ifr_data and a cast to
do the equivalent of accessing ifr_addr. This was copied multiple
times since 1996.

Approved by:	kib
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14873
2018-03-27 20:51:49 +00:00
brooks
6907bd334c Fix a whitespace bug missed in refactoring prior to r331641.
MFC with:	r331641
2018-03-27 18:55:39 +00:00
brooks
0754c526f1 Fix access to ifru_buffer on freebsd32.
Make all kernel accesses to ifru_buffer go via access functions
which take the process ABI into account and use an appropriate union
to access members in the correct place in struct ifreq.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14846
2018-03-27 18:26:50 +00:00
kib
9de215608c Allow to specify PCP on packets not belonging to any VLAN.
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be
considered as untagged, and only PCP and DEI values from the VLAN tag
are meaningful.  See for instance
https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html.

Make it possible to specify PCP value for outgoing packets on an
ethernet interface.  When PCP is supplied, the tag is appended, VLAN
id set to 0, and PCP is filled by the supplied value.  The code to do
VLAN tag encapsulation is refactored from the if_vlan.c and moved into
if_ethersubr.c.

Drivers might have issues with filtering VID 0 packets on
receive.  This bug should be fixed for each driver.

Reviewed by:	ae (previous version), hselasky, melifaro
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D14702
2018-03-27 15:29:32 +00:00
markj
12dff6d870 Clamp IFLIB_RX_COPY_THRESH to MHLEN in iflib_rxd_pkt_get().
If one has added fields to struct mbuf such that MHLEN is smaller than
this threshold (128), iflib_rxd_pkt_get() may otherwise overrun the
internal mbuf buffer while copying.

Reviewed by:	mmacy
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14843
2018-03-25 23:23:19 +00:00
kp
109a7b5eec netpfil: Introduce PFIL_FWD flag
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by:	ae, kevans
Differential Revision:	https://reviews.freebsd.org/D13715
2018-03-23 16:56:44 +00:00
melifaro
75159f749d Use count(9) api for the bpf(4) statistics.
Currently each bfp descriptor uses u64 variables to maintain its counters.
On interfaces with high packet rate this leads to unnecessary contention
and inaccurate reporting.

PR:		kern/205320
Reported by:	elofu17 at hotmail.com
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14726
2018-03-20 22:57:06 +00:00
melifaro
7bb5ee0db4 Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.
Current arp/nd code relies on the feedback from the datapath indicating
 that the entry is still used. This mechanism is incorporated into the
 arpresolve()/nd6_resolve() routines. After the inpcb route cache
 introduction, the packet path for the locally-originated packets changed,
 passing cached lle pointer to the ether_output() directly. This resulted
 in the arp/ndp entry expire each time exactly after the configured max_age
 interval. During the small window between the ARP/NDP request and reply
 from the router, most of the packets got lost.

Fix this behaviour by plugging datapath notification code to the packet
 path used by route cache. Unify the notification code by using single
 inlined function with the per-AF callbacks.

Reported by:	sthaug at nethelp.no
Reviewed by:	ae
MFC after:	2 weeks
2018-03-17 17:05:48 +00:00
avos
6ddab56276 Correct comment for IFM_IEEE80211_VHT media variant. 2018-03-15 23:32:29 +00:00
ae
d3176e34c7 Define ethernet type 0x88A8 as ETHERTYPE_QINQ.
Reviewed by:	kp
Obtained from:	OpenBSD
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14593
2018-03-06 12:01:31 +00:00
shurd
6bea9d5c59 iflib: stop timer callout when stopping
iflib_timer has been seen running after the interface had been removed.
This change prevents that.

Submitted by:	matt.macy@joyent.com
2018-03-02 18:48:07 +00:00
kp
fc599d4911 pf: Cope with overly large net.pf.states_hashsize
If the user configures a states_hashsize or source_nodes_hashsize value we may
not have enough memory to allocate this. This used to lock up pf, because these
allocations used M_WAITOK.

Cope with this by attempting the allocation with M_NOWAIT and falling back to
the default sizes (with M_WAITOK) if these fail.

PR:		209475
Submitted by:	Fehmi Noyan Isi <fnoyanisi AT yahoo.com>
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D14367
2018-02-25 08:56:44 +00:00
rstone
dd56500020 Allow route change requests to not specify the gateway.
Only require a gateway to be specified on a route add request.  On
a route change request that does not specify the gateway, the
gateway will remain the same.  This allows changing other route
parameters without having to re-specifying the gateway, like in
"route change 10.0.0.0/8 -mtu 9000".

Update the route(8) manpage to explicitly call out this usage
as being supported.

MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Reviewed By: eugen (rtsock.c change), rgrimes
Differential Revision: https://reviews.freebsd.org/D14291
2018-02-21 19:13:23 +00:00
shurd
6699ee32ca IFLIB: Make isc_magic unsigned
The IFLIB_MAGIC macro is > INT_MAX, so isc_magic should
be able to contain it.

Reported by:	jeb
Sponsored by:	Limelight  Networks
2018-02-21 18:57:00 +00:00
np
a38188ac36 Catch up with the removal of nktr_slot_flags from upstream netmap. No
functional impact intended.

Submitted by:	Vincenzo Maffione <v.maffione@gmail.com>
2018-02-20 21:42:45 +00:00
shurd
c1c5080794 IFLIB: do not remove dmamap on buffer unload
Dmamap is created only on IFC attach. If we remove it on
buffer release, we won't be able to do ifconfig down&up. Only destroy
when in detach.

Reported by:	wma
Reviewed by:	wma
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14060
2018-02-20 18:33:45 +00:00
wma
d14bbbd48e BPF: Switch to 32 bit compatible mode only when thread is 32 bit
Sometimes 32 bit and 64 bit ioctls are represented by the same number.
It causes unnecessary switch to 32 bit commpatible mode.

This patch prevents switching when we are dealing with 64 bit executable.
It fixes issue mentioned here

Authored by:           Patryk Duda <pdk@semihalf.com>
Submitted by:          Wojciech Macek <wma@semihalf.com>
Reviewed by:           andrew, wma
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14023
2018-01-25 12:13:41 +00:00
smh
c62edd09fd Added missing CTLFLAG_VNET to lacp default_strict_mode
Added CTLFLAG_VNET to net.link.lagg.lacp.default_strict_mode which was missed
in r290450.

Reported by:	julian@
MFC after:	1 week
Sponsored by:	Multiplay
2018-01-24 10:13:14 +00:00
rstone
9c794ac899 Increment the route table gen count after a modify
Increment the route table generation count after modifying a
route.  This signals back to TCP connections that they need to
update their L2 caches as the gateway for their route may have
changed.  This is a heavier hammer than is needed, strictly
speaking, but route changes will be unlikely enough that the
performance effects of invalidating all connection route caches
should be negligible.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13990
Reviewed by:	karels
2018-01-23 03:15:44 +00:00
rstone
c0c5474ab0 Reduce code duplication for inpcb route caching
Add a new macro to clear both the L3 and L2 route caches, to
hopefully prevent future instances where only the L3 cache was
cleared when both should have been.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13989
Reviewed by:	karels
2018-01-23 03:15:39 +00:00
rstone
4fb0175a26 Invalidate inpcb LLE cache if cached route is invalidated
When the inpcb route cache is invalidated after a change to the
routing tables, we need to invalidate the LLE cache as well.
Previous to this change packets for the connection would continue
to use the old L2 information from the old L3 gateway, and the
packets for the connection would likely be blackholed.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13988
Reviewed by:	karels
2018-01-23 03:15:39 +00:00
kib
027c7f4d66 Fix compat32 for sysctl net.PF_ROUTE...NET_RT_IFLISTL.
Route messages are aligned to the host long type alignment, which
breaks 32bit.

Reported and tested by:	lwhsu
Diagnosed by:	Yuri Pankov <yuripv@icloud.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-01-22 20:49:17 +00:00
pfg
ced875130d Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
pfg
bf156bc88c net*: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:21:51 +00:00
smh
a74104797b Disabled the use of flowid for lagg by default
Disabled the use of RSS hash from the network card aka flowid for
lagg(4) interfaces by default as it's currently incompatible with
the lacp and loadbalance protocols.

The incompatibility is due to the fact that the flowid isn't know
for the first packet of a new outbound stream which can result in
the hash calculation method changing and hence a stream being
incorrectly split across multiple interfaces during normal
operation.

This can be re-enabled by setting the following in loader.conf:
net.link.lagg.default_use_flowid="1"

Discussed with: kmacy
Sponsored by:	Multiplay
2018-01-04 20:05:47 +00:00
kp
affaad48ea pf: Clean all fragments on shutdown
When pf is unloaded, or a vnet jail using pf is stopped we need to
ensure we clean up all fragments, not just the expired ones.
2017-12-31 10:01:31 +00:00
bryanv
5f0e777be3 Add macro for vxlan list mutex lock and unlock
This will simplify some later VNET support.

Submitted by:	hrs
MFC after:	2 weeks
2017-12-30 19:49:40 +00:00
bryanv
10d5e637cc Advertise IFCAP_LINKSTAT after r326480 added link status support
MFC after:	2 weeks
2017-12-30 19:35:12 +00:00
bryanv
acc32109f0 Add support for IPv6 scoped addresses to vxlan
MFC after:	2 weeks
2017-12-30 04:03:53 +00:00
shurd
36e9cfb008 Don't pass rids to taskqgroup_attach()
As everywhere else, we want to pass rman_get_start(irq->ii_res).  This
caused set affinity errors when not using MSI-X vectors (legacy and MSI
interrupts).

Reported by:	sbruno
Sponsored by:	Limelight Networks
2017-12-27 20:42:30 +00:00
shurd
d81b03b75c Remove assertion that's not true for !EARLY_AP_STARTUP
gtask->gt_taskqueue is NULL when EARLY_AP_STARTUP is not enabled.
Remove assertion to allow this config to work.

Reported by:	oleg
Sponsored by:	Limelight Networks
2017-12-27 19:14:15 +00:00
shurd
49a54f9138 Fix indentation.
Sponsored by:	Limelight Networks
2017-12-27 19:12:32 +00:00
eadler
421a929b1e kernel: Fix several typos and minor errors
- duplicate words
- typos
- references to old versions of FreeBSD

Reviewed by:	imp, benno
2017-12-27 03:23:21 +00:00
kan
c8da6fae2c Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
kan
693547b17b Do not pass NULL pointer to copyout in if_clone_list.
Sometimes caller is only interested in how many clones
are there and NULL pointer is passed for the destination
buffer. Do not pass it to copyout then.
2017-12-23 16:45:24 +00:00
kan
1b88a37f88 Remove some trailing whitespace.
Reviewed by: glebius, ae
Differential Revision: https://reviews.freebsd.org/D10386
2017-12-23 16:24:00 +00:00
kan
c3dc6afa32 Do not double free the memory in if_clone.
if_clone_attach function will drop the reference on failure  which will
free the if_clone structure. No need to do it second time.

Reviewed by: glebius, ae
Differential Revision: https://reviews.freebsd.org/D10386
2017-12-23 16:23:58 +00:00
imp
cb06f90265 The device tables end with a sentinel in iflib. Don't include the
sentinel in the output.
2017-12-23 04:50:52 +00:00
imp
ff6ebd2b2f Use '#' rather than some made up name for fields we want to ignore. 2017-12-22 17:53:27 +00:00
kib
b287d6f562 Fix build for kernels with SCHED_4BSD.
Sponsored by:	The FreeBSD Foundation
2017-12-21 23:05:13 +00:00
shurd
4d16fa9f49 Don't call tcp_lro_rx() unless hardware verified TCP/UDP csum
It seems that tcp_lro_rx() doesn't verify TCP checksums, so
if there are bad checksums in the packets caused by invalid data, the
invalid data will pass through without errors.

This was noticed with the igb driver and a specific internet host:
fetch http://www.mpfr.org/mpfr-current/mpfr-3.1.6.tar.xz -o test.bin && sha256 test.bin
Would result in a different value sometimes.

This ends up making LRO require RXCSUM to be enabled, and RXCSUM to
support TCP and UDP checksums.

PR:		224346
Reported by:	gjb
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13561
2017-12-21 01:22:36 +00:00
lwhsu
065607f663 Add missing ;
Approved by:	kevlo
2017-12-20 06:08:16 +00:00
shurd
607c735784 Support attaching tx queues to cpus
This will attempt to use a different thread/core on the same L2
cache when possible, or use the same cpu as the rx thread when not.
If SMP isn't enabled, don't go looking for cores to use. This is mostly
useful when using shared TX/RX queues.

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12446
2017-12-20 01:03:34 +00:00
shurd
0cf83abb13 Update Matthew Macy contact info
Email address has changed, uses consistent name (Matthew, not Matt)

Reported by:	Matthew Macy <mmacy@mattmacy.io>
Differential Revision:	https://reviews.freebsd.org/D13537
2017-12-19 17:59:00 +00:00
ae
84904a6912 Fix possible memory leak.
vxlan_ftable entries are sorted in ascending order, due to wrong arguments
order it is possible to stop search before existing element will be found.
Then new element will be allocated in vxlan_ftable_update_locked() and can
be inserted in the list second time or trigger MPASS() assertion with
enabled INVARIANTS.

PR:		224371
MFC after:	1 week
2017-12-16 14:36:21 +00:00
rstone
01dbe7a2a4 Plug an ifaddr leak when changing a route's src
If a route is modified in a way that changes the route's source
address (i.e. the address used to access the gateway), then a
reference on the ifaddr representing the old source address will
be leaked if the address type does not have an ifa_rtrequest
method defined.  Plug the leak by releasing the reference in
all cases.

Differential Revision:	https://reviews.freebsd.org/D13417
Reviewed by:	ae
MFC after:	3 weeks
Sponsored by:	Dell
2017-12-14 20:48:50 +00:00
shurd
b5324f21c0 Increment encap_pad_mbuf_fail when m_dup() fails in padding
Previously, the counter was only incremented when m_append() failed.  Since
the function can also fail on m_dup() now, increment the counter there as
well.

Sponsored by:	Limelight Networks
2017-12-11 20:01:28 +00:00
shurd
01b240c231 Free mbuf chain when m_dup fails
Fix memory leak where mbuf chain wasn't free()d if iflib_ether_pad()
has a failure in m_dup().

Reported by:	"Ryan Stone" <rysto32@gmail.com>
Sponsored by:	Limelight Networks
2017-12-08 19:50:06 +00:00
shurd
c37f5a169b Handle read-only mbufs in iflib ether pad function
If ethernet padding is enabled, and a read-only mbuf is passed,
it would modify the mbuf using m_append(). Instead, call m_dup() and
append to the new packet.

Reported by:	Pyun YongHyeon
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13414
2017-12-08 18:43:31 +00:00
glebius
9c54c9c64c Garbage collect IFCAP_POLLING_NOCOUNT. It wasn't used since very
beginning of polling(4).  The module always ignored return value
from driver polling handler.
2017-12-06 23:03:34 +00:00
shurd
923e962b98 iflib: Support to padding Ethernet frames to a min size
Some bnxt devices do not correctly send frames smaller than
52 bytes (without CRC), so add a quirk that will pad frames to an
arbitrary size before passing off to the encap routine.

Reported by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13269
2017-12-05 21:00:31 +00:00
shurd
f54cf2fa3b Avoid calling CURVNET_[SET|RESTORE] for each packet
The LRO possible test was calling CURVNET_SET once for IPv4 or IPv6 for
each packet in a chain. Only call it once per chain instead.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	cem, ae
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13368
2017-12-05 20:43:24 +00:00
erj
87656821dc ifconfig(8): Display extended compliance code string for SFP transceivers
- Updates tables in affected files with new entries from newer spec
revisions of SFF-8472, SFF-8024, and SFF-8636

- Change ifconfig to read and display the extended compliance code for
SFP media if the extended compliance code is not 0. This was being displayed
for QSFP transceivers only, but SFP28 media uses this to report 25G
capability.

Reviewed by:	melifaro, sbruno
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D13286
2017-12-05 18:42:07 +00:00
bryanv
8a35018f48 Add if media and link status events to vxlan
PR:		214359
MFC after:	2 weeks
2017-12-02 22:04:00 +00:00
shurd
76ca06b62f Add support for SIOCGIFXMEDIA to iflib
SIOCGIFXMEDIA is required for extended ethernet media types,
but iflib did not support it.

Reported by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13312
2017-12-01 17:58:20 +00:00
hselasky
923fadf4e8 Properly define the VLAN_XXX() function macros to avoid miscompilation when
used inside "if" statements comparing with another value.

Detailed explanation:
"if (a ? b : c != 0)" is not the same like "if ((a ? b : c) != 0)"
which is the expected behaviour of a function macro.

Affects:
toecore, linuxkpi and ibcore.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-11-30 11:35:22 +00:00
shurd
23eaeecabf Fix comment introduced in r326369
The code uses the set of all CPUs, it doesn't zero out the set.

Sponsored by:	Limelight Networks
2017-11-29 18:21:17 +00:00
shurd
0de624aa47 Ensure that ctx->ifc_cpus is always initialized
If a device didn't support MSI-X, ctx->ifc_cpus would not be initialized,
but the IRQ allocation routines still uses the value.  Move the
initialization to common code.

Sponsored by:	Limelight Networks
2017-11-29 18:14:57 +00:00
hselasky
30eed323c8 Disallow TUN and TAP character device IOCTLs to modify the network device
type to any value. This can cause page faults and panics due to accessing
uninitialized fields in the "struct ifnet" which are specific to the network
device type.

MFC after:	1 week
Found by:	jau@iki.fi
PR:		223767
Sponsored by:	Mellanox Technologies
2017-11-29 09:40:11 +00:00
pfg
78a6b08618 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
shurd
4334349090 Fix off-by-one error in bit_nclear() usage
bit_nclear() takes the bit numbers for the start and end bits, not the start
and a count.  This was resulting in memory corruption past the end of the
bitstr_t.

Sponsored by:	Limelight Networks
2017-11-20 21:57:04 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
kib
1df1702579 Fix build.
Sponsored by:	The FreeBSD Foundation
2017-11-19 11:21:16 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
shurd
0708b7cc37 Fix default numbers of iflib queue sets
The intent appears to be having one RX/TX queue set per core,
but since scctx->isc_n[tr]xqsets is set to max before calling
iflib_msix_init(), both end up being set to total number of cores.

Use ctx->ifc_sysctl_n[rt]xqs as the selected value and
scctx->isc_n[rt]xqsets as the max. This should result in what appears
to be the intended behaviour

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13096
2017-11-16 18:52:58 +00:00
antoine
870ff83df6 Do not leak control in raw_usend 2017-11-08 23:20:05 +00:00
kib
ce9362dfb8 Add a place for a driver to report rx timestamps in nanoseconds from
boot for the received packets.

The rcv_tstmp field overlaps the place of Ln header length indicators,
not used by received packets.  The basic pkthdr rearrangement change
in sys/mbuf.h was provided by gallatin.

There are two accompanying M_ flags: M_TSTMP means that there is the
timestamp (and it was generated by hardware).

Another flag M_TSTMP_HPREC indicates that the timestamp is
high-precision.  Practically M_TSTMP_HPREC means that hardware
provided additional precision comparing with the stamps when the flag
is not set.  E.g., for ConnectX all packets are stamped by hardware
when PCIe transaction to write out the completion descriptor is
performed, but PTP packet are stamped on port.  For Intel cards, when
PTP assist is enabled, only PTP packets are stamped in the limited
number of registers, so if Intel cards ever start support this
mechanism, they would always set M_TSTMP | M_TSTMP_HPREC if hardware
timestamp is present for the given packet.

Add IFCAP_HWRXTSTMP interface capability to indicate the support for
hardware rx timestamping, and ifconfig(8) command to toggle it.

Based on the patch by:	gallatin
Reviewed by:	gallatin (previous version), hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks (? mbuf KBI issue)
X-Differential revision:	https://reviews.freebsd.org/D12638
2017-11-07 09:29:14 +00:00
sbruno
4824ebc09f Fix NOINET/NOINET6 build during compilation of iflib.
Reported by:	kib
2017-11-06 19:54:25 +00:00
shurd
230873ed5b Only chain non-LRO mbufs when LRO is not possible
Preserve packet order between tcp_lro_rx() and if_input() to avoid
creating extra corner cases. If no packets can be LROed, combine them
into one chain for submission via if_input(). If any packet can
potentially be LROed however, retain old behaviour and call if_input()
for each packet.

This should keep the 12% improvement for small packet forwarding intact,
but mostly avoids impacting the LRO case.

Reviewed by:	cem, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12876
2017-11-06 16:23:21 +00:00
eugen
8ac1be2af4 Allow a process to assign an IP address to local ppp interface
even if kernel routing table already has a route to the address in question
installed by some routing daemon (PR 223129).

Also, allow loopback route deletion when stopping a VIMAGE jail (PR 222647).

PR:			222647, 223129
Reviewed by:		gnn
Approved by:		avg (mentor), mav (mentor)
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D12747
2017-11-05 14:41:48 +00:00
kp
cbe96f9524 epair: Fix panic on unload
The VNET_SYSUNINIT() callback is executed after the MOD_UNLOAD. That means
that netisr_unregister() has already been called when
netisr_unregister_vnet() gets calls, leading to an assertion failure.

Restore the expected order of operations by performing everything that
was done in MOD_UNLOAD to a SYSUNINIT() (that will be called after the
VNET_SYSUNINIT()).

Differential Revision:	https://reviews.freebsd.org/D12771
2017-11-01 14:27:26 +00:00
shurd
c53c5aea0e Preserve TSO checksum flags
r323941 incorrectly disabled TSO flags based on MTU.

Reported by:	Yuri Pankov <yuripv@gmx.com>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12880
2017-10-31 19:03:35 +00:00
shurd
0268910d9e Fix PR221990 - Assertion at iflib.c:1947
ifl_pidx and ifl_credits are going out of sync in _iflib_fl_refill() as they
use different update log.  Use the same update logic for both, and add a
final call to isc_rxd_refill() to handle early exits from the loop.

PR:		221990
Reported by:	pho
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12798
2017-10-31 17:50:42 +00:00
shurd
a63bfd2b4e Fix build with nodevice netmap
iru_init() was declared and used outside the DEV_NETMAP
conditional blocks, but was implemented inside one. Move the
implementation out of the DEV_NETMAP block to allow building with
netmap disabled.

Reported by:	Andrew Turner <andrew@fubar.geek.nz>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12842
2017-10-31 02:49:28 +00:00
shurd
22a78b5938 bnxt: HW_LRO Rx Pkt with > 32 fragments caused Crash (iflib)
Broadcom NIC with HW_LRO setting max_agg_segs >= 6 can generate Rx pkt with
64 (2^6) fragments, modify IFLIB_MAX_RX_SEGS to 64 to avoid memory
corruption / Crash.

Submitted by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed by:	shurd, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Broadcom Limited
Differential Revision:	https://reviews.freebsd.org/D12774
2017-10-30 21:20:33 +00:00
shurd
77989da976 Fix PR222744 - netmap errors with iflib em driver
Fix error when refilling netmap buffers that resulted in the first
buffer of the successive passes through ifl_bus_addrs[] leaving the
first value unset (tmp_pidx started at 1, not zero after the first time
through the loop).

Leave the one unused buffer required by some NICs visible in the netmap
ring rather than hidden. There will always be a buffer in use by the
kernel now when an iflib driver is used via netmap.

Always get the netmap slot index via netmap_idx_n2k() to account for
nkr_hwofs in a consistent way.

Split shared functionality into new functions.
iru_init(): shared by _iflib_fl_refill() and netmap_fl_refill()
netmap_fl_refill(): shared by iflib_netmap_rxsync() and
iflib_netmap_rxq_init()

PR:		222744
Reported by:	Shirkdog <mshirk@daemon-security.com>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12769
2017-10-30 21:14:31 +00:00
shurd
b8ebc4c70f Avoid enabling MSI-X if MSI-X is disabled globally
It was reported on the community call that with
hw.pci.enable_msix=0, iflib would enable MSI-X on the device and attempt
to use it, which caused issues. Test the sysctl explicitly and do not
enable MSI-X if it's disabled globally.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12805
2017-10-30 21:08:12 +00:00
shurd
8b51323439 Some cache related optimizations
1. prefetch 128 bytes of mbufs.
2. Re-order filling the pkt_info so cache stalls happen at the end
3. Define empty prefetch2cachelines() macro when the function isn't present.

Provides small performance improvments on some hardware

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12447
2017-10-23 20:50:08 +00:00
bz
48b1992757 With r181803 on 2008-08-17 23:27:27Z the first VIMAGE commit went into
HEAD.  Enable VIMAGE in GENERIC kernels and some others (where GENERIC does
not exist) on HEAD.

Disable building LINT-VIMAGE with VIMAGE being default.

This should give it a lot more exposure in the run-up to 12 to help
us evaluate whether to keep it on by default or not.
We are also hoping to get better performance testing.
The feature can be disabled using nooptions.

Requested by:		many
Reviewed by:		kristof, emaste, hiren
X-MFC after:		never
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D12639
2017-10-20 21:40:59 +00:00
avos
204b05f25a ifnet(9): split ifc_alloc_unit() (should simplify code flow)
Allocate smallest unit number from pool via ifc_alloc_unit_next()
and exact unit number (if available) via ifc_alloc_unit_specific().

While here, address possible deadlock (mentioned in PR).

PR:		217401
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D12551
2017-10-16 21:21:31 +00:00
sephe
ba59229eaf rss: Remove never defined UDP_IPV4_EX
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12455
2017-10-11 06:08:01 +00:00
shurd
bb993c697a Fix "taskqgroup_attach: setaffinity failed: 3" with iflib drivers
Improved logging added in r323879 exposed an error during
attach. We need the irq, not the rid to work correctly. em uses
shared irqs, so it will use the same irq for TX as RX. bnxt does
not use shared irqs, or TX irqs at all, so there's no need to set
the TX irq affinity.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12496
2017-10-05 14:43:30 +00:00
cem
8a3fa60af6 Add PNP metadata to more drivers
GPUs: radeonkms, i915kms
NICs: if_em, if_igb, if_bnxt

This metadata isn't used yet, but it will be handy to have later to
implement automatic module loading.

Reviewed by:	imp, mmacy
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12488
2017-09-26 23:23:58 +00:00
shurd
663b443d07 Have ifmp_ring_enqueue() abdicate instead of switch to a consumer
Move TX out of the enqueue() path. As a result, we need
to have ifmp_ring_check_drainage() pick up from the abdicate state.

We also need to either enqueue the TX task, or check drainage
after calling ifmp_ring_enqueue() to ensure it's sent.

This change results in a 30% small packet forwarding improvement.

Reviewed by:	olivier, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12439
2017-09-23 16:46:30 +00:00
shurd
67f35a364e Make the rx budget a tunable
This allows tuning the rx budget for special load profiles
as well as more easily testing to determine sane defaults.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12445
2017-09-23 01:37:01 +00:00
shurd
ceadb28fa2 Chain mbufs before passing to if_input()
Build a list of mbufs to pass to if_input() after LRO. Results in
12% small packet forwarding rate improvement.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12444
2017-09-23 01:35:14 +00:00
shurd
f81cb8df32 Some small packet performance improvements
If the packet is smaller than MTU, disable the TSO flags.
Move TCP header parsing inside the IS_TSO?() test.
Add a new IFLIB_NEED_ZERO_CSUM flag to indicate the checksums need to be zeroed before TX.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12442
2017-09-23 01:33:20 +00:00
kp
deed61f436 bridge: Set module version
This ensures that the loader will not load the module if it's also built in to
the kernel.

PR:		220860
Submitted by:	Eugene Grosbein <eugen@freebsd.org>
Reported by:	Marie Helene Kvello-Aune <marieheleneka@gmail.com>
2017-09-21 14:14:01 +00:00
shurd
29d0e17e0c Fix iflib netmap RX
RXQ setup for netmap was broken because netmap_rxq_init was getting called
before IFDI_INIT - thus we ended up with ring tail pointer being reset to zero.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12140
2017-09-20 20:40:49 +00:00
alc
98c7e4b2b1 In r288122, we changed vm_page_unwire() so that it returns a Boolean
indicating whether the page's wire count transitioned to zero.  Use that
return value in zbuf_page_free() rather than checking the wire count.

MFC after:	1 week
2017-09-20 04:59:52 +00:00
shurd
acad431d06 Revert r323516 (iflib rollup)
This was really too big of a commit even if everything worked, but there
are multiple new issues introduced in the one huge commit, so it's not
worth keeping this until it's fixed.

I'll work on splitting this up into logical chunks and introduce them one
at a time over the next week or two.

Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
2017-09-16 02:41:38 +00:00
shurd
99c641b97c Roll up iflib commits from github. This pulls in most of the work done
by Matt Macy as well as other changes which he has accepted via pull
request to his github repo at https://github.com/mattmacy/networking/

This should bring -CURRENT and the github repo into close enough sync to
allow small feature branches rather than a large chain of interdependant
patches being developed out of tree.  The reset of the synchronization
should be able to be completed on github by splitting the remaining
changes that are not yet ready into short feature branches for later
review as smaller commits.

Here is a summary of changes included in this patch:

1)  More checks when INVARIANTS are enabled for eariler problem
    detection
2)  Group Task Queue cleanups
    - Fix use of duplicate shortdesc for gtaskqueue malloc type.
      Some interfaces such as memguard(9) use the short description to
      identify malloc types, so duplicates should be avoided.
3)  Allow gtaskqueues to use ithreads in addition to taskqueues
    - In some cases, this can improve performance
4)  Better logging when taskqgroup_attach*() fails to set interrupt
    affinity.
5)  Do not start gtaskqueues until they're needed
6)  Have mp_ring enqueue function enter the ABDICATED rather than BUSY
    state.  This moves the TX to the gtaskq and allows processing to
    continue faster as well as make TX batching more likely.
7)  Add an ift_txd_errata function to struct if_txrx.  This allows
    drivers to inspect/modify mbufs before transmission.
8)  Add a new IFLIB_NEED_ZERO_CSUM for drivers to indicate they need
    checksums zeroed for checksum offload to work.  This avoids modifying
    packet data in the TX path when possible.
9)  Use ithreads for iflib I/O instead of taskqueues
10) Clean up ioctl and support async ioctl functions
11) Prefetch two cachlines from each mbuf instead of one up to 128B.  We
    often need to parse packet header info beyond 64B.
12) Fix potential memory corruption due to fence post error in
    bit_nclear() usage.
13) Improved hang detection and handling
14) If the packet is smaller than MTU, disable the TSO flags.
    This avoids extra packet parsing when not needed.
15) Move TCP header parsing inside the IS_TSO?() test.
    This avoids extra packet parsing when not needed.
16) Pass chains of mbufs that are not consumed by lro to if_input()
    rather call if_input() for each mbuf.
17) Re-arrange packet header loads to get as much work as possible done
    before a cache stall.
18) Lock the context when calling IFDI_ATTACH_PRE()/IFDI_ATTACH_POST()/
    IFDI_DETACH();
19) Attempt to distribute RX/TX tasks across cores more sensibly,
    especially when RX and TX share an interrupt.  RX will attempt to
    take the first threads on a core, and TX will attempt to take
    successive threads.
20) Allow iflib_softirq_alloc_generic() to request affinity to the same
    cpus an interrupt has affinity with.  This allows TX queues to
    ensure they are serviced by the socket the device is on.
21) Add new iflib sysctls to net.iflib:
    - timer_int - interval at which to run per-queue timers in ticks
    - force_busdma
22) Add new per-device iflib sysctls to dev.X.Y.iflib
    - rx_budget allows tuning the batch size on the RX path
    - watchdog_events Count of watchdog events seen since load
23) Fix error where netmap_rxq_init() could get called before
    IFDI_INIT()
24) e1000: Fixed version of r323008: post-cold sleep instead of DELAY
    when waiting for firmware
    - After interrupts are enabled, convert all waits to sleeps
    - Eliminates e1000 software/firmware synchronization busy waits after
      startup
25) e1000: Remove special case for budget=1 in em_txrx.c
    - Premature optimization which may actually be incorrect with
      multi-segment packets
26) e1000: Split out TX interrupt rather than share an interrupt for
    RX and TX.
    - Allows better performance by keeping RX and TX paths separate
27) e1000: Separate igb from em code where suitable
    Much easier to understand separate functions and "if (is_igb)" than
    previous tests like "if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))"

#blamebruno

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12235
2017-09-13 01:18:42 +00:00
mjoras
ad8bdad7b2 Allow vlan interfaces to rx through netmap(4).
Normally after receiving a packet, a vlan(4) interface sends the packet
back through its parent interface's rx routine so that it can be
processed as an untagged frame. It does this by using the parent's
ifp->if_input. This is incompatible with netmap(4), which replaces the
vlan(4) interface's if_input with a netmap(4) hook. Fix this by using
the vlan(4) interface's ifp instead of the parent's directly.

Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	rstone
Approved by:	rstone (mentor)
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12191
2017-09-13 00:25:09 +00:00
np
e20a000158 Make LACP based lagg work with interfaces (like 100Gbps and 25Gbps) that
report extended media types.

lacp_aggregator_bandwidth() uses the media to determine the speed of the
interface and returns 0 for IFM_OTHER without the bits in the extended
range.

Reported by:	kbowling@
Reviewed by:	eugen_grosbein.net, mjoras@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D12188
2017-09-06 14:36:35 +00:00
hselasky
7c2ab1d9f6 Add support for generic backpressure indicator for ratelimited
transmit queues aswell as non-ratelimited ones.

Add the required structure bits in order to support a backpressure
indication with ratelimited connections aswell as non-ratelimited
ones. The backpressure indicator is a value between zero and 65535
inclusivly, indicating if the destination transmit queue is empty or
full respectivly. Applications can use this value as a decision point
for when to stop transmitting data to avoid endless ENOBUFS error
codes upon transmitting an mbuf. This indicator is also useful to
reduce the latency for ratelimited queues.

Reviewed by:		gallatin, kib, gnn
Differential Revision:	https://reviews.freebsd.org/D11518
Sponsored by:		Mellanox Technologies
2017-09-06 13:56:18 +00:00
sephe
7d1e282c9d if: Add ioctls to get RSS key and hash type/function.
It will be needed by hn(4) to configure its RSS key and hash
type/function in the transparent VF mode in order to match VF's
RSS settings. The description of the transparent VF mode and
the RSS hash value issue are here:
https://svnweb.freebsd.org/base?view=revision&revision=322299
https://svnweb.freebsd.org/base?view=revision&revision=322485

These are generic enough to promise two independent IOCs instead
of abusing SIOCGDRVSPEC.

Setting RSS key and hash type/function is a different story,
which probably requires more discussion.

Comment about UDP_{IPV4,IPV6,IPV6_EX} were only in the patch
in the review request; these hash types are standardized now.

Reviewed by:	gallatin
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12174
2017-09-05 05:28:52 +00:00
glebius
2ab7524597 Do not abuse flag that is clearly marked as unused.
This creates conflicts with FreeBSD variations that may use it.  The
usage of the flag M_TOOBIG is limited to iflib queue, thus using
one of M_PROTO flags is fine.  There is no need to grab global flag.

Silence from:	kmacy, sbruno (2 weeks)
2017-08-31 23:19:18 +00:00
sbruno
6330970227 Revert r323008 and its conversion of e1000/iflib to using SX locks.
This seems to be missing something on the 82574L causing NFS root mounts
to hang.

Reported by:	kib
2017-08-30 18:56:24 +00:00
sbruno
5e435d46c1 Continuation of lock cleanup in e1000.
Post-cold sleep instead of DELAY when waiting for firmware.

Convert softc mutex to an SX lock.  Change all waits to sleeps
once interrupts are enabled (and it is safe to sleep).

Submitted by:	Matt Macy <matt@mattmacy.io>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12101
2017-08-30 00:20:43 +00:00
glebius
81191b72db Garbage collect RT_NORTREF, which is no longer in use after FLOWTABLE removal. 2017-08-24 23:08:12 +00:00
sbruno
f6aa72734a iflib: call device's if_init function during vlan initialization.
Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	shurd
Sponsored by:	Broadcom
Differential Revision:	https://reviews.freebsd.org/D12098
2017-08-23 21:49:56 +00:00
kp
6ab3759e1c bpf: Fix incorrect cleanup
Cleaning up a bpf_if is a two stage process. We first move it to the
bpf_freelist (in bpfdetach()) and only later do we actually free it (in
bpf_ifdetach()).

We cannot set the ifp->if_bpf to NULL from bpf_ifdetach() because it's
possible that the ifnet has already gone away, or that it has been assigned
a new bpf_if.
This can lead to a struct ifnet which is up, but has if_bpf set to NULL,
which will panic when we try to send the next packet.

Keep track of the pointer to the bpf_if (because it's not always
ifp->if_bpf), and NULL it immediately in bpfdetach().

PR:		213896
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11782
2017-08-16 19:40:07 +00:00
mjoras
7b31c2eca8 Rework vlan(4) locking.
Previously the locking of vlan(4) interfaces was not very comprehensive.
Particularly there was very little protection against the destruction of
active vlan(4) interfaces or concurrent modification of a vlan(4)
interface. The former readily produced several different panics.

The changes can be summarized as using two global vlan locks (an
rmlock(9) and an sx(9)) to protect accesses to the if_vlantrunk field of
struct ifnet, in addition to other places where global exclusive access
is required. vlan(4) should now be much more resilient to the destruction
of active interfaces and concurrent calls into the configuration path.

PR:	220980
Reviewed by:	ae, markj, mav, rstone
Approved by:	rstone (mentor)
MFC after:	4 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11370
2017-08-15 17:52:37 +00:00
sbruno
fa16cef53f Don't leak mbufs if clusers exceeds the number of segments. This would
leak mbufs over time causing crashes.

PR:		221202
Submitted by:	Matt Macy <matt@mattmacy.io>
Reported by:	gergely.czuczy@harmless.hu
Sponsored by:	Limelight Networks
2017-08-10 03:43:23 +00:00
sbruno
3be0299b70 Export IFCAP_HWSTATS so that we don't experience double stats counting
on iflib enabled devices.

PR:		220198
Submitted by:	Matt Macy <matt@mattmacy.io>
Reported by:	Ben Woods <woodsb02@freebsd.org>
Sponsored by:	Limelight Networks
2017-08-10 03:11:05 +00:00
ae
6dd69d60f6 Add to if_enc(4) ability to capture packets via BPF after pfil processing.
New flag 0x4 can be configured in net.enc.[in|out].ipsec_bpf_mask.
When it is set, if_enc(4) additionally captures a packet via BPF after
invoking pfil hook. This may be useful for debugging.

MFC after:	2 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D11804
2017-08-09 12:24:07 +00:00
ae
4c7d86cbec Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook
from enc_hhook().

This should solve the problem when pf is used with if_enc(4) interface,
and outbound packet with existing PCB checked by pf, and this leads to
deadlock due to pf does its own PCB lookup and tries to take rlock when
wlock is already held.

Now we pass PCB pointer if it is known to the pfil hook, this helps to
avoid extra PCB lookup and thus rlock acquiring is not needed.
For inbound packets it is safe to pass NULL, because we do not held any
PCB locks yet.

PR:		220217
MFC after:	3 weeks
Sponsored by:	Yandex LLC
2017-07-31 11:04:35 +00:00
loos
9d6de9853e Remove the unused mutex since r273220.
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-07-28 04:41:57 +00:00
sbruno
f2cee7313a Slight restructure of iflib_busdma_load_mbuf_sg() to fix accounting
when m_collapse() fails.

Submitted by:	krzystof.galazka@intel.com
Reviewed by:	Jeb Cramer <cramerj@intel.com>
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D11476
2017-07-27 22:53:47 +00:00
sbruno
4100e547e2 Deprecate unused int isc_max_txqsets and int isc_max_rxqsets as they
were redundant and not being used to set anything up.

Submitted by:	Matt Macy <mmacy@mattmacy.io>
Reported by:	Jeb Cramer <cramerj@intel.com>
Sponsored by:	Limelight Networks
2017-07-27 21:21:43 +00:00
bz
a0dcb7af20 After inpcb route caching was put back in place there is no need for
flowtable anymore (as flowtable was never considered to be useful in
the forwarding path).

Reviewed by:		np
Differential Revision:	https://reviews.freebsd.org/D11448
2017-07-27 13:03:36 +00:00
sbruno
0c1d43a29b Don't hold the RM lock during lagg_proto_addport() to avoid an LOR.
Submitted by:	Kevin Bowling <kevin.bowling@kev009.com>
Reviewed by:	mav
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11711
2017-07-25 14:41:50 +00:00
sephe
66f5bd66f9 rndis: Add LINK_SPEED_CHANGE status
Reviewed by:	hselasky
MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D11650
2017-07-24 03:59:50 +00:00
sephe
8f8d888385 ethernet: Add ethernet interface attached event and devctl notification.
ifnet_arrival_event may not be adequate under certain situation; e.g.
when the LLADDR is needed.  So the ethernet ifattach event is announced
after all necessary bits are setup.

MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D11617
2017-07-24 03:32:10 +00:00
loos
17be7892ad Update netmap_user.h with the current version of netmap. This file should
have been committed together with r319881.

MFC after:	1 week
MFC with:	r319881
Pointy hat to:	loos
2017-07-21 03:42:09 +00:00
dim
3c2763f846 Fix printf format warning in iflib.c
Clang 5.0.0 got better warnings about printf format strings using %zd,
and this leads to the following -Werror warning on e.g. arm:

    sys/net/iflib.c:1517:8: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'bus_size_t' (aka 'unsigned long') [-Werror,-Wformat]
                                              sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
                                              ^~~~~~~~~~~~~~~~~~~~
    sys/net/iflib.c:1517:41: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'bus_size_t' (aka 'unsigned long') [-Werror,-Wformat]
                                              sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
                                                                               ^~~~~~~~~~~~~~~~~~~~~~~

Fix this by casting bus_size_t arguments to uintmax_t, and using %ju
instead.

Reviewed by:	emaste
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D11679
2017-07-20 20:28:31 +00:00
sbruno
65d93b0e1a Don't cache mbuf pointers if the number of descriptors is greater than
the number of buffers.

Submitted by:	Matt Macy <mmacy@mattmacy.io>
Sponsored by:	Limelight Networks
2017-07-19 21:18:04 +00:00
sbruno
00c83bb626 iflib - flib_busdma_load_mbuf_sg used isc_tx_maxsize as max semgent size.
Submitted by:	krzysztof.galazka@intel.com
Differential Revision:	https://reviews.freebsd.org/D11403
2017-07-03 19:23:45 +00:00
sbruno
247f67d439 bnxt(4) Enable LRO support, redux
iflib - reset fl-ifl_fragidx to 0 on iflib_fl_bufs_free().  This caused the
panic in em/igb when adding it to a bridge device.

iflib - Handle out of order packet delivery from hardware in support of LRO

Out of order updates to rxd's is fixed in r315217. However, it is not
completely fixed.  While refilling the buffers, iflib is not considering
the out of order descriptors. Hence, it is refilling sequentially.
"idx" variable in _iflib_fl_refill routine is incremented sequentially.
By doing refilling sequentially, it will override the SGEs that
are *IN USE* by other connections.  Fix is to maintain a bitmap of
rx descriptors and differentiate the used one with unused one and
refill only at the unused indices.  This patch also fixes a
few bugs in bnxt, related to the same feature.

Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	venkatkumar.duvvuru@broadcom.com shurd
Differential Revision:	https://reviews.freebsd.org/D10681
2017-07-03 18:23:35 +00:00
jah
d1caaa9300 Clean up MD pollution of bus_dma.h:
--Remove special-case handling of sparc64 bus_dmamap* functions.
  Replace with a more generic mechanism that allows MD busdma
  implementations to generate inline mapping functions by
  defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>.  This
  is currently useful for sparc64, x86, and arm64, which all
  implement non-load dmamap operations as simple wrappers
  around map objects which may be bus- or device-specific.

--Remove NULL-checked bus_dmamap macros.  Implement the
  equivalent NULL checks in the inlined x86 implementation.
  For non-x86 platforms, these checks are a minor pessimization
  as those platforms do not currently allow NULL maps.  NULL
  maps were originally allowed on arm64, which appears to have
  been the motivation behind adding arm[64]-specific barriers
  to bus_dma.h, but that support was removed in r299463.

--Simplify the internal interface used by the bus_dmamap_load*
  variants and move it to bus_dma_internal.h

--Fix some drivers that directly include sys/bus_dma.h
  despite the recommendations of bus_dma(9)

Reviewed by:	kib (previous revision), marius
Differential Revision:	https://reviews.freebsd.org/D10729
2017-07-01 05:35:29 +00:00
jhibbits
342f85f1d4 Update comments and simplify conditionals for compat32
Only amd64 (because of i386) needs 32-bit time_t compat now, everything else is
64-bit time_t.  Rather than checking on all 64-bit time_t archs, only check the
oddball amd64/i386.

Reviewed By: emaste, kib, andrew
Differential Revision: https://reviews.freebsd.org/D11364
2017-06-27 01:29:10 +00:00
jhibbits
11d34fd62a Solve the y2038 problem for powerpc
AKA Make time_t 64 bits on powerpc(32).

PowerPC currently (until now) was one of two architectures with a 32-bit time_t
on 32-bit archs (the other being i386).  This is an ABI breakage, so all ports,
and all local binaries, *must* be recompiled.

Tested by:	andreast, others
MFC after:	Never
Relnotes:	Yes
2017-06-26 02:25:19 +00:00
sbruno
627887830a Revert r319989 "bnxt(4) Enable LRO support"
This generates startup LORs and panics when adding elements to bridge
devices. I will document further in https://reviews.freebsd.org/D10681

PR:	220073
Submitted by:	dchagin
Reported by:	db
2017-06-17 17:42:52 +00:00
sbruno
0ba7565d57 bnxt(4) Enable LRO support
iflib - Handle out of order packet delivery from hardware in support of LRO

Out of order updates to rxd's is fixed in r315217. However, it is not
completely fixed.  While refilling the buffers, iflib is not considering
the out of order descriptors. Hence, it is refilling sequentially.
"idx" variable in _iflib_fl_refill routine is incremented sequentially.
By doing refilling sequentially, it will override the SGEs that
are *IN USE* by other connections.  Fix is to maintain a bitmap of
rx descriptors and differentiate the used one with unused one and
refill only at the unused indices.  This patch also fixes a
few bugs in bnxt, related to the same feature.

Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	shurd@
Differential Revision:	https://reviews.freebsd.org/D10681
2017-06-15 21:06:03 +00:00
sbruno
26edb1a898 Revert r319921 which seems to cause NFS booting assertion panics in
various configurations.

Reported by:	pho@
2017-06-15 17:46:20 +00:00
sbruno
c1b571ca59 Add new sysctl to allow changing of timing of the txq timers.
Add new sysctl to override use of busdma in the driver.

Submitted by:	Drew Gallitin <gallatin@netflix.com>
2017-06-13 23:16:38 +00:00
sbruno
3af83f20cf Plug mbuf leak in the busdma path of iflib.
Submitted by:	Michael Tuexen <tuexen@freebsd.org>
Reported by:	Drew Gallitin <gallatin@netflix.com>
2017-06-13 19:32:23 +00:00
ae
032b580995 Resurrect RTF_RNH_LOCKED flag and restore ability to call rtalloc1_fib()
with acquired RIB lock.

This fixes a possible panic due to trying to acquire RIB rlock when it is
already exclusive locked.

PR:		215963, 215122
MFC after:	1 week
Sponsored by:	Yandex LLC
2017-06-13 10:52:31 +00:00
mav
0c95955a5e Call VLAN_CAPABILITIES() when LAGG capabilities change.
This makes VLAN on top of LAGG to expose proper capabilities if they are
changed after creation.

MFC after:	1 week
2017-05-26 22:22:48 +00:00
mav
b1b0d2011f Improve applying unified capabilities to the lagg ports.
Some NICs have some capabilities dependent, so that disabling one require
disabling some other (TXCSUM/RXCSUM on em).  This code tries to reach the
consensus more insistently.

PR:		219453
MFC after:	1 week
2017-05-26 20:15:33 +00:00
mav
20571e84fe Remove some code, dead from the day one. 2017-05-25 23:19:09 +00:00
mav
de4e1191cd Add parent interface reference counting to if_vlan.
Using plain ifunit() looks like a request for troubles.

MFC after:	1 week
2017-05-23 00:13:27 +00:00
sephe
bd9a963c16 net/vlan: Revert 305177
Miss read the parentheses.

Reported by:	oleg@
Reviewed by:	hps@
MFC after:	3 days
Sponsored by:	Microsoft
2017-05-19 01:42:31 +00:00
emaste
1901c3e1f2 Remove register keyword from sys/ and ANSIfy prototypes
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by:	cem, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10193
2017-05-17 00:34:34 +00:00
rpokala
22bc11147b Persistently store NIC's hardware MAC address, and add a way to retrive it
An earlier version of r318160 allocated if_hw_addr unconditionally; when it
became conditional, I forgot to check for NULL in ether_ifattach().

Reviewed by:	kp
MFC after:	1 week
MFC with:	r318160
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D10678
Pointy-hat to:	rpokala
2017-05-11 06:46:39 +00:00
rpokala
0cfdb3c305 Persistently store NIC's hardware MAC address, and add a way to retrive it
The MAC address reported by `ifconfig ${nic} ether' does not always match
the address in the hardware, as reported by the driver during attach. In
particular, NICs which are components of a lagg(4) interface all report the
same MAC.

When attaching, the NIC driver passes the MAC address it read from the
hardware as an argument to ether_ifattach(). Keep a second copy of it, and
create ioctl(SIOCGHWADDR) to return it. Teach `ifconfig' to report it along
with the active MAC address.

PR:		194386
Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D10609
2017-05-10 22:13:47 +00:00
erj
0d01aa2c62 Add several new media types to if_media.h
These include several 25G types (for active direct attach cables and LR modules),
and a missing type for 10G active direct attach.

Differential Revision:	https://reviews.freebsd.org/D10425
Reviewed by:	smh, imp
MFC after:	3 days
Sponsored by:	Intel Corporation
2017-05-10 18:33:40 +00:00
bz
702ccdb9bb Adjust a comment.
MFC after:	3 days
2017-05-09 08:29:55 +00:00
mav
d332b8edc8 Relax r317696 locking to not drain taskqueue under the lock.
MFC after:	11 days
2017-05-05 16:51:53 +00:00
mav
d6379610b4 Fix r317696 build without debug.
MFC after:	2 weeks
2017-05-03 02:54:11 +00:00
mav
39f39609ab Introduce sleepable locks into if_lagg.
Before this change if_lagg was using nonsleepable rmlocks to protect its
internal state.  This patch introduces another sx lock to protect code
paths that require sleeping, while still uses old rmlock to protect hot
nonsleepable data paths.

This change allows to remove taskqueue decoupling used before to change
interface addresses without holding the lock.  Instead it uses sx lock to
protect direct if_ioctl() calls.

As another bonus, the new code synchronizes enabled capabilities of member
interfaces, and allows to control them with ifconfig laggX, that was
impossible before.  This part should fix interoperation with if_bridge,
that may need to disable some capabilities, such as TXCSUM or LRO, to allow
bridging with noncapable interfaces.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D10514
2017-05-02 19:09:11 +00:00
mav
08f5e30a7c Make if_bridge complain if it can't disable some capabilities.
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2017-04-29 08:52:07 +00:00
mav
9899f92afe Propagate IFCAP_LRO from trunk to vlan interface.
False positive here cost nothing, while false negative may lead to some
confusions.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2017-04-29 08:28:59 +00:00
mav
5fe4345195 Allow some control over enabled capabilities for if_vlan.
It improves interoperability with if_bridge, which may need to disable
some capabilities not supported by other members.  IMHO there is still
open question about LRO capability, which may need to be disabled on
physical interface.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2017-04-28 11:00:58 +00:00
brooks
35c0325946 Remove the NATM framework including the en(4), fatm(4), hatm(4), and
patm(4) devices.

Maintaining an address family and framework has real costs when we make
infrastructure improvements.  In the case of NATM we support no devices
manufactured in the last 20 years and some will not even work in modern
motherboards (some newer devices that patm(4) could be updated to
support apparently exist, but we do not currently have support).

With this change, support remains for some netgraph modules that don't
require NATM support code. It is unclear if all these should remain,
though ng_atmllc certainly stands alone.

Note well: FreeBSD 11 supports NATM and will continue to do so until at
least September 30, 2021.  Improvements to the code in FreeBSD 11 are
certainly welcome.

Reviewed by:	philip
Approved by:	harti
2017-04-24 21:21:49 +00:00
mav
e3b5ab75e5 Remove unneeded conditions.
MFC after:	2 weeks
2017-04-22 08:38:49 +00:00
mav
dfb79d4360 Add interface reference counting to if_lagg.
Using plain ifunit() looks like request for troubles.

MFC after:	2 weeks
2017-04-21 13:45:01 +00:00
jkim
9445e9d6e1 Use kmem_malloc() instead of malloc(9) for the native amd64 filter.
r316767 broke the BPF JIT compiler for amd64 because malloc()'d space is no
longer executable.

Discussed with:	kib, alc
2017-04-17 22:02:09 +00:00
jkim
a1dd1ff45c Remove an unnecessary declaration missed in the previous commit. 2017-04-17 21:57:23 +00:00
jkim
22babe5387 Move declarations for a machine-dependent function to the header file. 2017-04-17 21:51:26 +00:00
pkelsey
f3f1c24017 Fix userland tools that don't check the format of routing socket
messages before accessing message fields that may not be present,
removing dead/duplicate/misleading code along the way.

Document the message format for each routing socket message in
route.h.

Fix a bug in usr.bin/netstat introduced in r287351 that resulted in
pointer computation with essentially random 16-bit offsets and
dereferencing of the results.

Reviewed by:	ae
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D10330
2017-04-16 19:17:10 +00:00
ae
b8fb127599 Inherit IPv6 checksum offloading flags to vlan interfaces.
if_vlan(4) interfaces inherit IPv4 checksum offloading flags from the
parent when VLAN_HWCSUM and VLAN_HWTAGGING flags are present on the
parent interface. Do the same for IPv6 checksum offloading flags.

Reported by:	Harry Schmalzbauer
Reviewed by:	np, gnn
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10356
2017-04-11 19:23:25 +00:00
ae
e39b657cfd Do not adjust interface MTU automatically. Leave this task to the system
administrator.

This restores the behavior that was prior to r274246.

No objection from:	#network
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10215
2017-04-11 08:56:18 +00:00
pkelsey
8f92862e14 Fixed typo in comment found while reading commit email for fix of
other typo in same comment.

ned -> need

MFC after:	3 days
2017-04-08 04:50:50 +00:00
pkelsey
a9e90857e4 Fixed typo in comment.
patckets -> packets

MFC after:	3 days
2017-04-08 04:45:52 +00:00
pkelsey
e8472f7b89 Fix typo in comment.
logest -> longest

MFC after:	3 days
2017-04-08 04:37:01 +00:00
sbruno
a2a631bb57 Move pause frame counter out of struct if_ctx and into struct if_softc_ctx_t
so that we can use it in iflib to detect pause frames.

The igb(4) driver definitely used to use this in its old timer function and
I see no reason to restrict it to that driver only.

Sponsored by:	Limelight Networks
2017-04-07 00:33:03 +00:00
sbruno
89da930ab7 Allow MSIX to be turned off by tuneable per interface, per driver.
Sponsored by:	Limelight Networks
2017-04-04 21:03:34 +00:00
ae
818d823c95 Remove "IPFW static rules" rmlock.
Make PFIL's lock global and use it for this purpose.
This reduces the number of locks needed to acquire for each packet.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
No objection from: #network
Differential Revision:	https://reviews.freebsd.org/D10154
2017-04-03 13:35:04 +00:00
sbruno
15395e4837 Don't call init functions directly from the timer/watchdog function.
Enqueue this in the admin task now that it can process it.

Submitted by:	Matt Macy <mmacy@nextbsd.org>
Sponsored by:	Limelight Networks
2017-03-30 16:54:01 +00:00
sbruno
21ffe1a952 Assert IFF_DRV_OACTIVE in iflib_timer() when the "hung" case is detected
so that iflib's admin task can still process the reset directive and restore
functionality.

Sponsored by:	Limelight Networks
2017-03-30 16:03:51 +00:00
ae
1a8fbf8eae ake pfil's locking macros private.
Obtained from:	Yandex LLC
MFC after:	1 week
2017-03-27 08:18:13 +00:00
ae
1aa0212700 Declare module version.
MFC after:	1 week
2017-03-27 07:56:41 +00:00
eri
fa16664608 Correct handling of ALTQ with epair(4) interfaces but presenting that ALTQ(9) is supported.
Approved by:	ae
MFC after:	2 weeks
2017-03-24 00:55:16 +00:00
kp
6f8b05d841 pf: Fix possible shutdown race
Prevent possible races in the pf_unload() / pf_purge_thread() shutdown
code. Lock the pf_purge_thread() with the new pf_end_lock to prevent
these races.

Use a shared/exclusive lock, as we need to also acquire another sx lock
(VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload()
to sleep,

Pointed out by: eri, glebius, jhb
Differential Revision:	https://reviews.freebsd.org/D10026
2017-03-22 21:18:18 +00:00
sbruno
8d94af0c96 Change casting to a uintptr_t to be compatible with non-x86 architectures.
Submitted by:	Matt Macy <mmacy@nextbsd.org>
Reported by:	rpokala
Sponsored by:	Limelight Networks
2017-03-14 22:25:07 +00:00
sbruno
ead4920e1e Fixup LINT by using uint64_t type as we do on all other calls to PNMB()
Found with Jenkins.

Reported by:	lwshu
Sponsored by:	Limelight Networks
2017-03-14 15:08:56 +00:00
sbruno
303d1e3a0e IFLIB updates
- unconditionally enable BUS_DMA on non-x86 architectures
- speed up rxd zeroing via customized function
- support out of order updates to rxd's
- add prefetching to hardware descriptor rings
- only prefetch on 10G or faster hardware
- add seperate tx queue intr function
- preliminary rework of NETMAP interfaces, WIP

Submitted by:	Matt Macy <mmacy@nextbsd.org>
Sponsored by:	Limelight Networks
2017-03-13 22:53:06 +00:00
ae
84935b56a6 Ignore ifnet renaming in the bpf ifnet departure handler.
PR:		213015
MFC after:	1 week
2017-03-13 09:04:10 +00:00
ae
a968322d07 Remove now unneded cast. 2017-03-08 08:09:41 +00:00
ae
749be37c28 Introduce the concept of IPsec security policies scope.
Currently are defined three scopes: global, ifnet, and pcb.
Generic security policies that IKE daemon can add via PF_KEY interface
or an administrator creates with setkey(8) utility have GLOBAL scope.
Such policies can be applied by the kernel to outgoing packets and checked
agains inbound packets after IPsec processing.
Security policies created by if_ipsec(4) interfaces have IFNET scope.
Such policies are applied to packets that are passed through if_ipsec(4)
interface.
And security policies created by application using setsockopt()
IP_IPSEC_POLICY option have PCB scope. Such policies are applied to
packets related to specific socket. Currently there is no way to list
PCB policies via setkey(8) utility.

Modify setkey(8) and libipsec(3) to be able distinguish the scope of
security policies in the `setkey -DP` listing. Add two optional flags:
'-t' to list only policies related to virtual *tunneling* interfaces,
i.e. policies with IFNET scope, and '-g' to list only policies with GLOBAL
scope. By default policies from all scopes are listed.

To implement this PF_KEY's sadb_x_policy structure was modified.
sadb_x_policy_reserved field is used to pass the policy scope from the
kernel to userland. SADB_SPDDUMP message extended to support filtering
by scope: sadb_msg_satype field is used to specify bit mask of requested
scopes.

For IFNET policies the sadb_x_policy_priority field of struct sadb_x_policy
is used to pass if_ipsec's interface if_index to the userland. For GLOBAL
policies sadb_x_policy_priority is used only to manage order of security
policies in the SPDB. For IFNET policies it is not used, so it can be used
to keep if_index.

After this change the output of `setkey -DP` now looks like:
# setkey -DPt
0.0.0.0/0[any] 0.0.0.0/0[any] any
	in ipsec
	esp/tunnel/87.250.242.144-87.250.242.145/unique:145
	spid=7 seq=3 pid=58025 scope=ifnet ifname=ipsec0
	refcnt=1
# setkey -DPg
::/0 ::/0 icmp6 135,0
	out none
	spid=5 seq=1 pid=872 scope=global
	refcnt=1

No objection from:	#network
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9805
2017-03-07 00:13:53 +00:00