Commit Graph

4266 Commits

Author SHA1 Message Date
Gleb Smirnoff
9758a507e9 gif_transmit() must always be called in the network epoch. 2020-01-15 06:18:32 +00:00
Gleb Smirnoff
2a4bd982d0 Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
data based on the network epoch.   The macro reverses the argument
order of epoch_call(9) - first function, then its argument. NFC
2020-01-15 06:05:20 +00:00
Gleb Smirnoff
97168be809 Mechanically substitute assertion of in_epoch(net_epoch_preempt) to
NET_EPOCH_ASSERT(). NFC
2020-01-15 05:45:27 +00:00
Gleb Smirnoff
3264dcadc9 - Move global network epoch definition to epoch.h, as more different
subsystems tend to need to know about it, and including if_var.h is
  huge header pollution for them.  Polluting possible non-network
  users with single symbol seems much lesser evil.
- Remove non-preemptible network epoch.  Not used yet, and unlikely
  to get used in close future.
2020-01-15 03:34:21 +00:00
Vincenzo Maffione
2ec213aba4 netmap: disable passthrough with no hypervisor support
The netmap passthrough subsystem requires proper support in the
hypervisor. In particular, two PCI device ids (from the Red Hat
PCI vendor id 0x1b36) need to be assigned to the two netmap
virtual devices. We then disable these devices until the ids have
not been assigned, in order to avoid conflicts with other
virtual devices emulated by upstream QEMU.

PR:	241774
MFC after:	3 days
2020-01-13 21:47:23 +00:00
Alexander V. Chernikov
ead85fe415 Add fibnum, family and vnet pointer to each rib head.
Having metadata such as fibnum or vnet in the struct rib_head
 is handy as it eases building functionality in the routing space.
This change is required to properly bring back route redirect support.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23047
2020-01-09 17:21:00 +00:00
Mark Johnston
c23df8eafa lagg: Further cleanup of the rr_limit option.
Add an option flag so that arbitrary updates to a lagg's configuration
do not clear sc_stride.  Preseve compatibility for old ifconfig
binaries.  Update ifconfig to use the new flag and improve the casting
used when parsing the option parameter.

Modify the RR transmit function to avoid locklessly reading sc_stride
twice.  Ensure that sc_stride is always 1 or greater.

Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23092
2020-01-09 14:58:41 +00:00
Kyle Evans
c7bab2a7ca if_vmove: return proper error status
if_vmove can fail if it lost a race and the vnet's already been moved. The
callers (and their callers) can generally cope with this, but right now
success is assumed. Plumb out the ENOENT from if_detach_internal if it
happens so that the error's properly reported to userland.

Reviewed by:	bz, kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22780
2020-01-09 03:52:50 +00:00
Alexander V. Chernikov
e02d3fe70c Fix rtsock route message generation for interface addresses.
Reviewed by:	olivier
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D22974
2020-01-07 21:16:30 +00:00
Eric Joyner
f6afed726b iflib: Prevent watchdog from resetting idle queues
While changing link state in iflib_link_state_change(), queues are
marked as IFLIB_QUEUE_IDLE to disable watchdog. Currently, iflib_timer()
watchdog does not check for previous queue status before marking it as
IFLIB_QUEUE_HUNG.

This patch adds check of queue status before marking it as hung.

Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>

PR:		239240
Submitted by:	Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reported by:	ultima@
Reviewed by:	gallatin@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21712
2020-01-02 23:35:06 +00:00
Alexander V. Chernikov
5fcb2832e3 Plug loopback idaddr refcount leak.
Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22980
2020-01-02 09:08:45 +00:00
Gleb Smirnoff
8d5c56dab1 In r343631 error code for a packet blocked by a firewall was
changed from EACCES to EPERM.  This change was not intentional,
so fix that.  Return EACCESS if a firewall forbids sending.

Noticed by:	ae
2020-01-01 17:31:43 +00:00
Alexander V. Chernikov
d930203192 Fix NOINET6 build broken by r356236.
MFC after:	2 weeks
2019-12-31 17:57:12 +00:00
Alexander V. Chernikov
c83dda362e Split gigantic rtsock route_output() into smaller functions.
Amount of changes to the original code has been intentionally minimised
to ease diffing.
The changes are mostly mechanical, with the following exceptions:

* lltable handler is now called directly based of RTF_LLINFO flag presense.
* "report" logic for updating rtm in RTM_GET/RTM_DELETE has been simplified,
  fixing several potential use-after-free cases in rt_addrinfo.
* llable asserts has been replaced with error-returning, preventing kernel
  crashes when lltable gw af family is invalid (root required).

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22864
2019-12-31 17:26:53 +00:00
Mark Johnston
6b5d8e30f1 Plug some ifaddr refcount leaks.
- Only take an ifaddr ref in in rt_exportinfo() if the caller explicitly
  requests it.  Take care to release it in this case.
- Don't unconditionally take a ref in rtrequest1_fib().  rt_getifa_fib()
  will acquire a reference, in which case we would previously acquire
  two references.
- Stop taking a reference in rtinit1() before calling rtrequest1_fib().
  rtrequest1_fib() will acquire a reference for the RTM_ADD case.

PR:		242746
Reviewed by:	melifaro (previous version)
Tested by:	ghuckriede@blackberry.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22912
2019-12-27 01:12:54 +00:00
Mark Johnston
c104c2990d lagg: Clean up handling of the rr_limit option.
- Don't allow an unprivileged user to set the stride. [1]
- Only set the stride under the softc lock.
- Rename the internal fields to accurately reflect their use.  Keep
  ro_bkt to avoid changing the user API.
- Simplify the implementation.  The port index is just sc_seq / stride.
- Document rr_limit in ifconfig.8.

Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com> [1]
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22857
2019-12-22 21:56:47 +00:00
Cy Schubert
57e22627f9 MFV r353141 (by phillip):
Update libpcap from 1.9.0 to 1.9.1.

MFC after:	2 weeks
2019-12-21 21:01:03 +00:00
Mark Johnston
3f197b134c Deduplicate code between if_delgroup() and if_delgroups().
Fix some style in if_addgroup().  No functional change intended.

Reviewed by:	hselasky
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22892
2019-12-20 20:15:34 +00:00
Mark Johnston
718ef55ec7 Fix a memory leak in if_delgroups() introduced in r334118.
PR:		242712
Submitted by:	ghuckriede@blackberry.com
MFC after:	3 days
2019-12-20 17:21:57 +00:00
Gleb Smirnoff
185c3d2b93 Convert routing statistics to VNET_PCPUSTAT.
Submitted by:	ocochard
Reviewed by:	melifaro, glebius
Differential Revision:	https://reviews.freebsd.org/D22834
2019-12-17 02:02:26 +00:00
John Baldwin
65d2f9c12b Use a void * argument to callout handlers instead of timeout_t casts.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22684
2019-12-05 18:47:29 +00:00
Bjoern A. Zeeb
3232273f42 Allow kernel to compile without BPF.
r297816 added some bpf magic for VIMAGE unconditionally which no longer
allows kernels to compile without bpf (but with other networking).
Add the missing ifdef checks and allow a kernel to compile without bpf
again.

PR:		242136
Reported by:	dave mischler.com
MFC after:	2 weeks
2019-11-24 23:21:47 +00:00
Conrad Meyer
7993a104a1 Add explicit SI_SUB_EPOCH
Add explicit SI_SUB_EPOCH, after SI_SUB_TASKQ and before SI_SUB_SMP
(EARLY_AP_STARTUP).  Rename existing "SI_SUB_TASKQ + 1" to SI_SUB_EPOCH.

epoch(9) consumers cannot epoch_alloc() before SI_SUB_EPOCH:SI_ORDER_SECOND,
but likely should allocate before SI_SUB_SMP.  Prior to this change,
consumers (well, epoch itself, and net/if.c) just open-coded the
SI_SUB_TASKQ + 1 order to match epoch.c, but this was fragile.

Reviewed by:	mmacy
Differential Revision:	https://reviews.freebsd.org/D22503
2019-11-22 23:23:40 +00:00
Vincenzo Maffione
a56e6334d1 netmap: check if we already ran mmap before we attempt it
Submitted by:	neel@neelc.org
Reviewed by:	vmaffione
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22390
2019-11-19 21:29:49 +00:00
Bjoern A. Zeeb
d9a61c960c if_llatbl: change htable_unlink_entry() to early exist if no work to do
Adjust the logic in htable_unlink_entry() to the one in
htable_link_entry() saving a block indent and making it more clear
in which case we do not do any work.

No functional change.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-15 23:12:19 +00:00
Bjoern A. Zeeb
9744f55686 if_llatbl: cleanup
Remove function prototypes which are not needed (no use before function
definition for these file static functions).

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-15 11:00:03 +00:00
Gleb Smirnoff
9352fab6ab In if_siocaddmulti() enter VNET.
Reported & tested by:	garga
2019-11-13 16:28:53 +00:00
Bjoern A. Zeeb
8b5f9bb755 lltabl: remove dead code
Remove the long (8? years ago) #if 0 marked function lltable_drain() and
while here also remove the unused function llentry_alloc() which has call
paths tools keep finding and are never used.

Sponsored by:	Netflix
2019-11-13 11:21:02 +00:00
Gleb Smirnoff
8a9a28c455 sysctl_rtsock() has all necessary locking and doesn't need Giant to run.
While here add description.
2019-11-07 19:06:18 +00:00
Andrey V. Elsukov
a961401ee0 Enqueue lladdr_task to update link level address of vlan, when its parent
interface has changed.

During vlan reconfiguration without destroying interface, it is possible,
that parent interface will be changed. This usually means, that link
layer address of vlan will be different. Therefore we need to update all
associated with vlan's addresses permanent llentries - NDP for IPv6
addresses, and ARP for IPv4 addresses. This is done via lladdr_task
execution. To avoid extra work, before execution do the check, that L2
address is different.

No objection from:	#network
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D22243
2019-11-07 15:00:37 +00:00
Eric Joyner
74954211d6 net: add ETHER_IS_ZERO macro similar to ETHER_IS_BROADCAST
Some places in network code may need to verify that an ethernet address
is not the 'zero' address. Provide a standard macro ETHER_IS_ZERO for
this purpose, similar to the ETHER_IS_BROADCAST macro already available.

This patch also removes previous ETHER_IS_ZERO definitions in several
USB ethernet drivers, in favor of this centrally-located macro.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21240
2019-11-05 00:12:21 +00:00
Eric Joyner
db8e8f1ede iflib: properly release memory allocated for DMA
DMA memory allocations using the bus_dma.h interface are not properly
released in all cases for both Tx and Rx. This causes ~448 bytes of
M_DEVBUF allocations to be leaked.

First, the DMA maps for Rx are not properly destroyed. A slight attempt
is made in iflib_fl_bufs_free to destroy the maps if we're detaching.
However, this function may not be reliably called during detach. Indeed,
there is a comment "asking" if this should be moved out.

Fix this by moving the bus_dmamap_destroy call into iflib_rx_sds_free,
where we already sync and unload the DMA.

Second, the DMA tag associated with the ifr_ifdi descriptor DMA is not
released properly anywhere. Add a call to iflib_dma_free in
iflib_rx_structures_free.

Third, use of NULL as a canary value on the map pointer returned by
bus_dmamap_create is not valid. On some platforms, notably x86, this
value may be NULL. In this case, we fail to properly release the related
resources.

Remove the NULL checks on map values in both iflib_fl_bufs_free and
iflib_txsd_destroy.

With all of these fixes applied, the leaks to M_DEVBUF are squelched,
and iflib drivers now seem to properly cleanup when detaching.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22203
2019-11-04 23:06:57 +00:00
Vincenzo Maffione
9dd7582c12 netmap: fix build issue in netmap_user.h
The issue was a comparison of integers of different signs
on 32 bit architectures.

Reported by:	jenkins
MFC after:	1 week
2019-10-31 22:16:20 +00:00
Vincenzo Maffione
c7c7805531 add valectl to the system commands
The valectl(4) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl, and stored in
tools/tools/netmap

Reviewed by:	hrs, bcr, lwhsu, kevans
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22146
2019-10-31 21:01:34 +00:00
Eric Joyner
244e7cffa5 iflib: cleanup memory leaks on driver detach
From Jake:
The iflib stack failed to release all of the memory allocated under
M_IFLIB during device detach.

Specifically, the ifmp_ring, the ift_ifdi Tx DMA info, and the ifr_ifdi Rx
DMA info were not being released.

Release this memory so that iflib won't leak memory when a device
detaches.

Since we're freeing the ift_ifdi pointer during iflib_txq_destroy we
need to call this only after iflib_dma_free in iflib_tx_structures_free.

Additionally, also ensure that we destroy the callout mutex associated
with each Tx queue when we free it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22157
2019-10-30 20:45:12 +00:00
Gleb Smirnoff
0839aa5c04 There is a long standing problem with multicast programming for NICs
and IPv6.  With IPv6 we may call if_addmulti() in context of processing
of an incoming packet.  Usually this is interrupt context.  While most
of the NIC drivers are able to reprogram multicast filters without
sleeping, some of them can't.  An example is e1000 family of drivers.
With iflib conversion the problem was somewhat hidden.  Iflib processes
packets in private taskqueue, so going to sleep doesn't trigger an
assertion.  However, the sleep would block operation of the driver and
following incoming packets would fill the ring and eventually would
start being dropped.  Enabling epoch for the full time of a packet
processing again started to trigger assertions for e1000.

Fix this problem once and for all using a general taskqueue to call
if_ioctl() method in all cases when if_addmulti() is called in a
non sleeping context.  Note that nobody cares about returned value.

Reviewed by:	hselasky, kib
Differential Revision:	  https://reviews.freebsd.org/D22154
2019-10-29 17:36:06 +00:00
Eric Joyner
1558015e3e iflib: call ether_ifdetach and netmap_detach before stop
From Jake:
Calling ether_ifdetach after iflib_stop leads to a potential race where
a stale ifp pointer can remain in the route entry list for IPv6 traffic.
This will potentially cause a page fault or other system instability if
the ifp pointer is accessed.

Move both iflib_netmap_detach and ether_ifdetach to be called prior to
iflib_stop. This avoids the race above, and helps ensure that other ifp
references are removed before stopping the interface.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, jhb@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22071
2019-10-23 23:20:49 +00:00
Conrad Meyer
65366903c3 Prevent a panic when a driver provides bogus debugnet parameters
This is just a bandaid; we should fix the driver(s) too.  Introduced in
r353685.

PR:		241403
X-MFC-With:	r353685
Reported by:	np and others
2019-10-23 16:48:22 +00:00
Kyle Evans
f7810883d4 tuntap(4): Fix NOINET build after r353741
Shuffle headers around to more appropriate #ifdef OPTION blocks (INET vs.
INET6) -- double checked LINT-{NOINET,NOINET6,NOIP}, all seem good.

Reported by:	cem
2019-10-23 02:15:15 +00:00
Kyle Evans
200abb43c0 tuntap(4): properly declare if_tun and if_tap modules
Simply adding MODULE_VERSION does not do the trick, because the modules
haven't been declared. This should actually fix modfind/kldstat, which
r351229 aimed and failed to do.

This should make vm-bhyve do the right thing again when using the ports
version, rather than the latest version not in ports.

MFC after:	3 days
2019-10-22 00:18:16 +00:00
Gleb Smirnoff
19e09f447f Remove obsoleted KPIs that were used to access interface address lists. 2019-10-21 18:17:03 +00:00
Kyle Evans
3d5013337a tuntap(4): restrict scope of net.link.tap.user_open slightly
net.link.tap.user_open has historically allowed non-root users to do devfs
cloning and open /dev/tap* nodes based on permissions. Loosen this up to
make it only allow users to do devfs cloning -- we no longer check it in
tunopen.

This allows tap devices to be created that can actually be opened by a user,
rather than swiftly restricting them to root because the magic sysctl has
not been set.

The sysctl has not yet been completely deprecated, because more thought is
needed for how to handle the devfs cloning case. There is not an easy
suitable replacement for the sysctl there, and more care needs to be placed
in determining whether that's OK or not.

PR:		200185
2019-10-21 14:38:11 +00:00
Kyle Evans
6025077704 tuntap(4): use cdevpriv w/ dtor for last close instead of d_close
cdevpriv dtors will be called when the reference count on the associated
struct file drops to 0, while d_close can be unreliable for cleaning up
state at "last close" for a number of reasons. As far as tunclose/tundtor is
concerned the difference is minimal, so make the switch.
2019-10-20 22:55:47 +00:00
Kyle Evans
6869d530c7 tuntap(4): Use make_dev_s to avoid si_drv1 race
This allows us to avoid some dance in tunopen for dealing with the
possibility of dev->si_drv1 being NULL as it's set prior to the devfs node
being created in all cases.

There's still the possibility that the tun device hasn't been fully
initialized, since that's done after the devfs node was created. Alleviate
this by returning ENXIO if we're not to that point of tuncreate yet.

This work is what sparked r353128, full initialization of cloned devices
w/ specified make_dev_args.
2019-10-20 22:39:40 +00:00
Kyle Evans
486c0b2269 tuntap(4): break out after setting TUN_DSTADDR
This is now the only flag we set in this loop, terminate early.
2019-10-20 21:06:25 +00:00
Kyle Evans
6041d76e0c tuntap(4): Drop TUN_IASET
This flag appears to have been effectively unused since introduction to
if_tun(4) -- drop it now.
2019-10-20 21:03:48 +00:00
Michael Tuexen
4ad8cb6813 Fix compile issues when building a kernel without the VIMAGE option.
Thanks to cem@ for discussing the issue which resulted in this patch.

Reviewed by:		cem@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D22089
2019-10-19 20:48:53 +00:00
Conrad Meyer
0634308df2 Fix debugnet(4) link/build fallout on some configurations
Introduced in r353685 (sys/conf/files), r353694 (debugnet.c db_printf).

Submitted by:	kevans
Reported by:	cy
X-MFC-With:	r353685, r353694
2019-10-18 22:03:36 +00:00
Vincenzo Maffione
f8bc74e2f4 tap: add support for virtio-net offloads
This patch is part of an effort to make bhyve networking (in particular TCP)
faster. The key strategy to enhance TCP throughput is to let the whole packet
datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet
overhead is amortized over a large number of bytes.
This capability is supported in the guest by means of the vtnet(4) driver,
which is able to handle TSO/LRO packets leveraging the virtio-net header
(see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf).
A bhyve VM exchanges packets with the host through a network backend,
which can be vale(4) or if_tap(4).
While vale(4) supports TSO/LRO packets, if_tap(4) does not.
This patch extends if_tap(4) with the ability to understand the virtio-net
header, so that a tapX interface can process TSO/LRO packets.
A couple of ioctl commands have been added to configure and probe the
virtio-net header. Once the virtio-net header is set, the tapX interface
acquires all the IFCAP capabilities necessary for TSO/LRO.

Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D21263
2019-10-18 21:53:27 +00:00
Gleb Smirnoff
fda454099b Make rt_getifa_fib() static. 2019-10-18 15:20:24 +00:00