Commit Graph

4189 Commits

Author SHA1 Message Date
erj
c98f1a9c3d iflib: Prevent watchdog from resetting idle queues
While changing link state in iflib_link_state_change(), queues are
marked as IFLIB_QUEUE_IDLE to disable watchdog. Currently, iflib_timer()
watchdog does not check for previous queue status before marking it as
IFLIB_QUEUE_HUNG.

This patch adds check of queue status before marking it as hung.

Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>

PR:		239240
Submitted by:	Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reported by:	ultima@
Reviewed by:	gallatin@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21712
2020-01-02 23:35:06 +00:00
melifaro
82f36e1339 Plug loopback idaddr refcount leak.
Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22980
2020-01-02 09:08:45 +00:00
glebius
ef9a657efe In r343631 error code for a packet blocked by a firewall was
changed from EACCES to EPERM.  This change was not intentional,
so fix that.  Return EACCESS if a firewall forbids sending.

Noticed by:	ae
2020-01-01 17:31:43 +00:00
melifaro
a5202d041b Fix NOINET6 build broken by r356236.
MFC after:	2 weeks
2019-12-31 17:57:12 +00:00
melifaro
a8d37c4de1 Split gigantic rtsock route_output() into smaller functions.
Amount of changes to the original code has been intentionally minimised
to ease diffing.
The changes are mostly mechanical, with the following exceptions:

* lltable handler is now called directly based of RTF_LLINFO flag presense.
* "report" logic for updating rtm in RTM_GET/RTM_DELETE has been simplified,
  fixing several potential use-after-free cases in rt_addrinfo.
* llable asserts has been replaced with error-returning, preventing kernel
  crashes when lltable gw af family is invalid (root required).

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22864
2019-12-31 17:26:53 +00:00
markj
7de9f0a7f0 Plug some ifaddr refcount leaks.
- Only take an ifaddr ref in in rt_exportinfo() if the caller explicitly
  requests it.  Take care to release it in this case.
- Don't unconditionally take a ref in rtrequest1_fib().  rt_getifa_fib()
  will acquire a reference, in which case we would previously acquire
  two references.
- Stop taking a reference in rtinit1() before calling rtrequest1_fib().
  rtrequest1_fib() will acquire a reference for the RTM_ADD case.

PR:		242746
Reviewed by:	melifaro (previous version)
Tested by:	ghuckriede@blackberry.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22912
2019-12-27 01:12:54 +00:00
markj
7733daac8a lagg: Clean up handling of the rr_limit option.
- Don't allow an unprivileged user to set the stride. [1]
- Only set the stride under the softc lock.
- Rename the internal fields to accurately reflect their use.  Keep
  ro_bkt to avoid changing the user API.
- Simplify the implementation.  The port index is just sc_seq / stride.
- Document rr_limit in ifconfig.8.

Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com> [1]
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22857
2019-12-22 21:56:47 +00:00
cy
81062ad740 MFV r353141 (by phillip):
Update libpcap from 1.9.0 to 1.9.1.

MFC after:	2 weeks
2019-12-21 21:01:03 +00:00
markj
0812f9fdb5 Deduplicate code between if_delgroup() and if_delgroups().
Fix some style in if_addgroup().  No functional change intended.

Reviewed by:	hselasky
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22892
2019-12-20 20:15:34 +00:00
markj
1d28034bba Fix a memory leak in if_delgroups() introduced in r334118.
PR:		242712
Submitted by:	ghuckriede@blackberry.com
MFC after:	3 days
2019-12-20 17:21:57 +00:00
glebius
eb4e118487 Convert routing statistics to VNET_PCPUSTAT.
Submitted by:	ocochard
Reviewed by:	melifaro, glebius
Differential Revision:	https://reviews.freebsd.org/D22834
2019-12-17 02:02:26 +00:00
jhb
cbeae41eb0 Use a void * argument to callout handlers instead of timeout_t casts.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22684
2019-12-05 18:47:29 +00:00
bz
81a051c6de Allow kernel to compile without BPF.
r297816 added some bpf magic for VIMAGE unconditionally which no longer
allows kernels to compile without bpf (but with other networking).
Add the missing ifdef checks and allow a kernel to compile without bpf
again.

PR:		242136
Reported by:	dave mischler.com
MFC after:	2 weeks
2019-11-24 23:21:47 +00:00
cem
35d496b56a Add explicit SI_SUB_EPOCH
Add explicit SI_SUB_EPOCH, after SI_SUB_TASKQ and before SI_SUB_SMP
(EARLY_AP_STARTUP).  Rename existing "SI_SUB_TASKQ + 1" to SI_SUB_EPOCH.

epoch(9) consumers cannot epoch_alloc() before SI_SUB_EPOCH:SI_ORDER_SECOND,
but likely should allocate before SI_SUB_SMP.  Prior to this change,
consumers (well, epoch itself, and net/if.c) just open-coded the
SI_SUB_TASKQ + 1 order to match epoch.c, but this was fragile.

Reviewed by:	mmacy
Differential Revision:	https://reviews.freebsd.org/D22503
2019-11-22 23:23:40 +00:00
vmaffione
5eaf402248 netmap: check if we already ran mmap before we attempt it
Submitted by:	neel@neelc.org
Reviewed by:	vmaffione
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22390
2019-11-19 21:29:49 +00:00
bz
a8ddb9b30d if_llatbl: change htable_unlink_entry() to early exist if no work to do
Adjust the logic in htable_unlink_entry() to the one in
htable_link_entry() saving a block indent and making it more clear
in which case we do not do any work.

No functional change.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-15 23:12:19 +00:00
bz
0d249af98c if_llatbl: cleanup
Remove function prototypes which are not needed (no use before function
definition for these file static functions).

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-15 11:00:03 +00:00
glebius
9bd9f041d3 In if_siocaddmulti() enter VNET.
Reported & tested by:	garga
2019-11-13 16:28:53 +00:00
bz
228fa8096b lltabl: remove dead code
Remove the long (8? years ago) #if 0 marked function lltable_drain() and
while here also remove the unused function llentry_alloc() which has call
paths tools keep finding and are never used.

Sponsored by:	Netflix
2019-11-13 11:21:02 +00:00
glebius
2be4404487 sysctl_rtsock() has all necessary locking and doesn't need Giant to run.
While here add description.
2019-11-07 19:06:18 +00:00
ae
c2745f25b2 Enqueue lladdr_task to update link level address of vlan, when its parent
interface has changed.

During vlan reconfiguration without destroying interface, it is possible,
that parent interface will be changed. This usually means, that link
layer address of vlan will be different. Therefore we need to update all
associated with vlan's addresses permanent llentries - NDP for IPv6
addresses, and ARP for IPv4 addresses. This is done via lladdr_task
execution. To avoid extra work, before execution do the check, that L2
address is different.

No objection from:	#network
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D22243
2019-11-07 15:00:37 +00:00
erj
2cefc8b702 net: add ETHER_IS_ZERO macro similar to ETHER_IS_BROADCAST
Some places in network code may need to verify that an ethernet address
is not the 'zero' address. Provide a standard macro ETHER_IS_ZERO for
this purpose, similar to the ETHER_IS_BROADCAST macro already available.

This patch also removes previous ETHER_IS_ZERO definitions in several
USB ethernet drivers, in favor of this centrally-located macro.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21240
2019-11-05 00:12:21 +00:00
erj
08b39c8928 iflib: properly release memory allocated for DMA
DMA memory allocations using the bus_dma.h interface are not properly
released in all cases for both Tx and Rx. This causes ~448 bytes of
M_DEVBUF allocations to be leaked.

First, the DMA maps for Rx are not properly destroyed. A slight attempt
is made in iflib_fl_bufs_free to destroy the maps if we're detaching.
However, this function may not be reliably called during detach. Indeed,
there is a comment "asking" if this should be moved out.

Fix this by moving the bus_dmamap_destroy call into iflib_rx_sds_free,
where we already sync and unload the DMA.

Second, the DMA tag associated with the ifr_ifdi descriptor DMA is not
released properly anywhere. Add a call to iflib_dma_free in
iflib_rx_structures_free.

Third, use of NULL as a canary value on the map pointer returned by
bus_dmamap_create is not valid. On some platforms, notably x86, this
value may be NULL. In this case, we fail to properly release the related
resources.

Remove the NULL checks on map values in both iflib_fl_bufs_free and
iflib_txsd_destroy.

With all of these fixes applied, the leaks to M_DEVBUF are squelched,
and iflib drivers now seem to properly cleanup when detaching.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22203
2019-11-04 23:06:57 +00:00
vmaffione
15ff2564d1 netmap: fix build issue in netmap_user.h
The issue was a comparison of integers of different signs
on 32 bit architectures.

Reported by:	jenkins
MFC after:	1 week
2019-10-31 22:16:20 +00:00
vmaffione
78a2f9a94d add valectl to the system commands
The valectl(4) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl, and stored in
tools/tools/netmap

Reviewed by:	hrs, bcr, lwhsu, kevans
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22146
2019-10-31 21:01:34 +00:00
erj
68049ffe3a iflib: cleanup memory leaks on driver detach
From Jake:
The iflib stack failed to release all of the memory allocated under
M_IFLIB during device detach.

Specifically, the ifmp_ring, the ift_ifdi Tx DMA info, and the ifr_ifdi Rx
DMA info were not being released.

Release this memory so that iflib won't leak memory when a device
detaches.

Since we're freeing the ift_ifdi pointer during iflib_txq_destroy we
need to call this only after iflib_dma_free in iflib_tx_structures_free.

Additionally, also ensure that we destroy the callout mutex associated
with each Tx queue when we free it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22157
2019-10-30 20:45:12 +00:00
glebius
4d54a39862 There is a long standing problem with multicast programming for NICs
and IPv6.  With IPv6 we may call if_addmulti() in context of processing
of an incoming packet.  Usually this is interrupt context.  While most
of the NIC drivers are able to reprogram multicast filters without
sleeping, some of them can't.  An example is e1000 family of drivers.
With iflib conversion the problem was somewhat hidden.  Iflib processes
packets in private taskqueue, so going to sleep doesn't trigger an
assertion.  However, the sleep would block operation of the driver and
following incoming packets would fill the ring and eventually would
start being dropped.  Enabling epoch for the full time of a packet
processing again started to trigger assertions for e1000.

Fix this problem once and for all using a general taskqueue to call
if_ioctl() method in all cases when if_addmulti() is called in a
non sleeping context.  Note that nobody cares about returned value.

Reviewed by:	hselasky, kib
Differential Revision:	  https://reviews.freebsd.org/D22154
2019-10-29 17:36:06 +00:00
erj
a49574e5a2 iflib: call ether_ifdetach and netmap_detach before stop
From Jake:
Calling ether_ifdetach after iflib_stop leads to a potential race where
a stale ifp pointer can remain in the route entry list for IPv6 traffic.
This will potentially cause a page fault or other system instability if
the ifp pointer is accessed.

Move both iflib_netmap_detach and ether_ifdetach to be called prior to
iflib_stop. This avoids the race above, and helps ensure that other ifp
references are removed before stopping the interface.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, jhb@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22071
2019-10-23 23:20:49 +00:00
cem
cae3e20483 Prevent a panic when a driver provides bogus debugnet parameters
This is just a bandaid; we should fix the driver(s) too.  Introduced in
r353685.

PR:		241403
X-MFC-With:	r353685
Reported by:	np and others
2019-10-23 16:48:22 +00:00
kevans
83fa8806ca tuntap(4): Fix NOINET build after r353741
Shuffle headers around to more appropriate #ifdef OPTION blocks (INET vs.
INET6) -- double checked LINT-{NOINET,NOINET6,NOIP}, all seem good.

Reported by:	cem
2019-10-23 02:15:15 +00:00
kevans
eb99e62aa8 tuntap(4): properly declare if_tun and if_tap modules
Simply adding MODULE_VERSION does not do the trick, because the modules
haven't been declared. This should actually fix modfind/kldstat, which
r351229 aimed and failed to do.

This should make vm-bhyve do the right thing again when using the ports
version, rather than the latest version not in ports.

MFC after:	3 days
2019-10-22 00:18:16 +00:00
glebius
8ec25643ca Remove obsoleted KPIs that were used to access interface address lists. 2019-10-21 18:17:03 +00:00
kevans
26d6f82958 tuntap(4): restrict scope of net.link.tap.user_open slightly
net.link.tap.user_open has historically allowed non-root users to do devfs
cloning and open /dev/tap* nodes based on permissions. Loosen this up to
make it only allow users to do devfs cloning -- we no longer check it in
tunopen.

This allows tap devices to be created that can actually be opened by a user,
rather than swiftly restricting them to root because the magic sysctl has
not been set.

The sysctl has not yet been completely deprecated, because more thought is
needed for how to handle the devfs cloning case. There is not an easy
suitable replacement for the sysctl there, and more care needs to be placed
in determining whether that's OK or not.

PR:		200185
2019-10-21 14:38:11 +00:00
kevans
70c45ec99a tuntap(4): use cdevpriv w/ dtor for last close instead of d_close
cdevpriv dtors will be called when the reference count on the associated
struct file drops to 0, while d_close can be unreliable for cleaning up
state at "last close" for a number of reasons. As far as tunclose/tundtor is
concerned the difference is minimal, so make the switch.
2019-10-20 22:55:47 +00:00
kevans
58e3081500 tuntap(4): Use make_dev_s to avoid si_drv1 race
This allows us to avoid some dance in tunopen for dealing with the
possibility of dev->si_drv1 being NULL as it's set prior to the devfs node
being created in all cases.

There's still the possibility that the tun device hasn't been fully
initialized, since that's done after the devfs node was created. Alleviate
this by returning ENXIO if we're not to that point of tuncreate yet.

This work is what sparked r353128, full initialization of cloned devices
w/ specified make_dev_args.
2019-10-20 22:39:40 +00:00
kevans
472be58c41 tuntap(4): break out after setting TUN_DSTADDR
This is now the only flag we set in this loop, terminate early.
2019-10-20 21:06:25 +00:00
kevans
1942df0600 tuntap(4): Drop TUN_IASET
This flag appears to have been effectively unused since introduction to
if_tun(4) -- drop it now.
2019-10-20 21:03:48 +00:00
tuexen
71224580cf Fix compile issues when building a kernel without the VIMAGE option.
Thanks to cem@ for discussing the issue which resulted in this patch.

Reviewed by:		cem@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D22089
2019-10-19 20:48:53 +00:00
cem
96be16debb Fix debugnet(4) link/build fallout on some configurations
Introduced in r353685 (sys/conf/files), r353694 (debugnet.c db_printf).

Submitted by:	kevans
Reported by:	cy
X-MFC-With:	r353685, r353694
2019-10-18 22:03:36 +00:00
vmaffione
8e95c619ec tap: add support for virtio-net offloads
This patch is part of an effort to make bhyve networking (in particular TCP)
faster. The key strategy to enhance TCP throughput is to let the whole packet
datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet
overhead is amortized over a large number of bytes.
This capability is supported in the guest by means of the vtnet(4) driver,
which is able to handle TSO/LRO packets leveraging the virtio-net header
(see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf).
A bhyve VM exchanges packets with the host through a network backend,
which can be vale(4) or if_tap(4).
While vale(4) supports TSO/LRO packets, if_tap(4) does not.
This patch extends if_tap(4) with the ability to understand the virtio-net
header, so that a tapX interface can process TSO/LRO packets.
A couple of ioctl commands have been added to configure and probe the
virtio-net header. Once the virtio-net header is set, the tapX interface
acquires all the IFCAP capabilities necessary for TSO/LRO.

Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D21263
2019-10-18 21:53:27 +00:00
glebius
417268623e Make rt_getifa_fib() static. 2019-10-18 15:20:24 +00:00
cem
45bf92cd20 Implement NetGDB(4)
NetGDB(4) is a component of a system using a panic-time network stack to
remotely debug crashed FreeBSD kernels over the network, instead of
traditional serial interfaces.

There are three pieces in the complete NetGDB system.

First, a dedicated proxy server must be running to accept connections from
both NetGDB and gdb(1), and pass bidirectional traffic between the two
protocols.

Second, the NetGDB client is activated much like ordinary 'gdb' and
similarly to 'netdump' in ddb(4) after a panic.  Like other debugnet(4)
clients (netdump(4)), the network interface on the route to the proxy server
must be online and support debugnet(4).

Finally, the remote (k)gdb(1) uses 'target remote <proxy>:<port>' (like any
other TCP remote) to connect to the proxy server.

The NetGDB v1 protocol speaks the literal GDB remote serial protocol, and
uses a 1:1 relationship between GDB packets and sequences of debugnet
packets (fragmented by MTU).  There is no encryption utilized to keep
debugging sessions private, so this is only appropriate for local
segments or trusted networks.

Submitted by:	John Reimer <john.reimer AT emc.com> (earlier version)
Discussed some with:	emaste, markj
Relnotes:	sure
Differential Revision:	https://reviews.freebsd.org/D21568
2019-10-17 21:33:01 +00:00
cem
abc2745a10 debugnet(4): Add optional full-duplex mode
It remains unattached to any client protocol.  Netdump is unaffected
(remaining half-duplex).  The intended consumer is NetGDB.

Submitted by:	John Reimer <john.reimer AT emc.com> (earlier version)
Discussed with:	markj
Differential Revision:	https://reviews.freebsd.org/D21541
2019-10-17 20:25:15 +00:00
glebius
7d6b8e7344 Revert two parts of r353292 that enter epoch when processing vlan capabilities.
It could be that entering epoch isn't necessary here, but better take a
conservative approach.

Submitted by:	kp
2019-10-17 20:18:07 +00:00
cem
f92f351606 debugnet(4): Infer non-server connection parameters
Loosen requirements for connecting to debugnet-type servers.  Only require a
destination address; the rest can theoretically be inferred from the routing
table.

Relax corresponding constraints in netdump(4) and move ifp validation to
debugnet connection time.

Submitted by:	John Reimer <john.reimer AT emc.com> (earlier version)
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D21482
2019-10-17 20:10:32 +00:00
cem
b0452a96d2 Add ddb(4) 'netdump' command to netdump a core without preconfiguration
Add a 'X -s <server> -c <client> [-g <gateway>] -i <interface>' subroutine
to the generic debugnet code.  The imagined use is both netdump, shown here,
and NetGDB (vaporware).  It uses the ddb(4) lexer, with some new extensions,
to parse out IPv4 addresses.

'Netdump' uses the generic debugnet routine to load a configuration and
start a dump, without any netdump configuration prior to panic.

Loosely derived from work by:	John Reimer <john.reimer AT emc.com>
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D21460
2019-10-17 19:49:20 +00:00
cem
07646b2459 debugnet: Respond to broadcast ARP requests
The in-tree netdump code has always ignored non-directed ARP requests, and
that seems to work most of the time for netdump.

In my work and testing on NetGDB, it seems like sometimes the remote FreeBSD
conversant (the non-panic system) will send broadcast-destination ARP
requests to the debugnet kernel; without this change, those are dropped and
the remote will see EHOSTDOWN "Host is down" errors from the userspace
interface of the network stack.

Discussed with:	markj
2019-10-17 17:48:32 +00:00
cem
0478ef8c6b debugnet(4): Check hardware-validated UDP checksums
Similar to INET checksums, lazily validate UDP checksums when the driver has
already performed the check for us.  Like debugnet(4) INET checksums,
validation in software is left as future work.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D21745
2019-10-17 17:19:16 +00:00
cem
f3a0ee41db Split out a more generic debugnet(4) from netdump(4)
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport.  It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).

It is mostly a verbatim code lift from netdump(4).  Netdump(4) remains
the only consumer (until the rest of this patch series lands).

The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c.  UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c.  The
separation is not perfect and future improvement is welcome.  Supporting
INET6 is a long-term goal.

Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry.  I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.

The only functional change here is the mbuf allocation / tracking.  Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time.  If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark.  Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.

No other functional change intended.

Reviewed by:	markj (earlier version)
Some discussion with:	emaste, jhb
Objection from:	marius
Differential Revision:	https://reviews.freebsd.org/D21421
2019-10-17 16:23:03 +00:00
philip
28635f8b5e ether: add older ethertype definitions for QinQ
Older network equipment used the ethertypes 0x9100, 0x9200, and 0x9300 for
outer VLANs, before standardisation introduced 0x88a8.

Submitted by:	 Lutz Donnerhacke <lutz_donnerhacke.de>
Differential Revision:	https://reviews.freebsd.org/D21846
2019-10-17 00:34:53 +00:00