freebsd-dev

Author	SHA1	Message	Date
Vincenzo Maffione	9dd7582c12	netmap: fix build issue in netmap_user.h The issue was a comparison of integers of different signs on 32 bit architectures. Reported by: jenkins MFC after: 1 week	2019-10-31 22:16:20 +00:00
Vincenzo Maffione	c7c7805531	add valectl to the system commands The valectl(4) program is used to manage vale(4) switches. Add it to the system commands so that it can be used right away. This program was previously called vale-ctl, and stored in tools/tools/netmap Reviewed by: hrs, bcr, lwhsu, kevans MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D22146	2019-10-31 21:01:34 +00:00
Eric Joyner	244e7cffa5	iflib: cleanup memory leaks on driver detach From Jake: The iflib stack failed to release all of the memory allocated under M_IFLIB during device detach. Specifically, the ifmp_ring, the ift_ifdi Tx DMA info, and the ifr_ifdi Rx DMA info were not being released. Release this memory so that iflib won't leak memory when a device detaches. Since we're freeing the ift_ifdi pointer during iflib_txq_destroy we need to call this only after iflib_dma_free in iflib_tx_structures_free. Additionally, also ensure that we destroy the callout mutex associated with each Tx queue when we free it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: erj@, gallatin@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D22157	2019-10-30 20:45:12 +00:00
Gleb Smirnoff	0839aa5c04	There is a long standing problem with multicast programming for NICs and IPv6. With IPv6 we may call if_addmulti() in context of processing of an incoming packet. Usually this is interrupt context. While most of the NIC drivers are able to reprogram multicast filters without sleeping, some of them can't. An example is e1000 family of drivers. With iflib conversion the problem was somewhat hidden. Iflib processes packets in private taskqueue, so going to sleep doesn't trigger an assertion. However, the sleep would block operation of the driver and following incoming packets would fill the ring and eventually would start being dropped. Enabling epoch for the full time of a packet processing again started to trigger assertions for e1000. Fix this problem once and for all using a general taskqueue to call if_ioctl() method in all cases when if_addmulti() is called in a non sleeping context. Note that nobody cares about returned value. Reviewed by: hselasky, kib Differential Revision: https://reviews.freebsd.org/D22154	2019-10-29 17:36:06 +00:00
Eric Joyner	1558015e3e	iflib: call ether_ifdetach and netmap_detach before stop From Jake: Calling ether_ifdetach after iflib_stop leads to a potential race where a stale ifp pointer can remain in the route entry list for IPv6 traffic. This will potentially cause a page fault or other system instability if the ifp pointer is accessed. Move both iflib_netmap_detach and ether_ifdetach to be called prior to iflib_stop. This avoids the race above, and helps ensure that other ifp references are removed before stopping the interface. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: erj@, gallatin@, jhb@ MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D22071	2019-10-23 23:20:49 +00:00
Conrad Meyer	65366903c3	Prevent a panic when a driver provides bogus debugnet parameters This is just a bandaid; we should fix the driver(s) too. Introduced in r353685. PR: 241403 X-MFC-With: r353685 Reported by: np and others	2019-10-23 16:48:22 +00:00
Kyle Evans	f7810883d4	tuntap(4): Fix NOINET build after r353741 Shuffle headers around to more appropriate #ifdef OPTION blocks (INET vs. INET6) -- double checked LINT-{NOINET,NOINET6,NOIP}, all seem good. Reported by: cem	2019-10-23 02:15:15 +00:00
Kyle Evans	200abb43c0	tuntap(4): properly declare if_tun and if_tap modules Simply adding MODULE_VERSION does not do the trick, because the modules haven't been declared. This should actually fix modfind/kldstat, which r351229 aimed and failed to do. This should make vm-bhyve do the right thing again when using the ports version, rather than the latest version not in ports. MFC after: 3 days	2019-10-22 00:18:16 +00:00
Gleb Smirnoff	19e09f447f	Remove obsoleted KPIs that were used to access interface address lists.	2019-10-21 18:17:03 +00:00
Kyle Evans	3d5013337a	tuntap(4): restrict scope of net.link.tap.user_open slightly net.link.tap.user_open has historically allowed non-root users to do devfs cloning and open /dev/tap* nodes based on permissions. Loosen this up to make it only allow users to do devfs cloning -- we no longer check it in tunopen. This allows tap devices to be created that can actually be opened by a user, rather than swiftly restricting them to root because the magic sysctl has not been set. The sysctl has not yet been completely deprecated, because more thought is needed for how to handle the devfs cloning case. There is not an easy suitable replacement for the sysctl there, and more care needs to be placed in determining whether that's OK or not. PR: 200185	2019-10-21 14:38:11 +00:00
Kyle Evans	6025077704	tuntap(4): use cdevpriv w/ dtor for last close instead of d_close cdevpriv dtors will be called when the reference count on the associated struct file drops to 0, while d_close can be unreliable for cleaning up state at "last close" for a number of reasons. As far as tunclose/tundtor is concerned the difference is minimal, so make the switch.	2019-10-20 22:55:47 +00:00
Kyle Evans	6869d530c7	tuntap(4): Use make_dev_s to avoid si_drv1 race This allows us to avoid some dance in tunopen for dealing with the possibility of dev->si_drv1 being NULL as it's set prior to the devfs node being created in all cases. There's still the possibility that the tun device hasn't been fully initialized, since that's done after the devfs node was created. Alleviate this by returning ENXIO if we're not to that point of tuncreate yet. This work is what sparked r353128, full initialization of cloned devices w/ specified make_dev_args.	2019-10-20 22:39:40 +00:00
Kyle Evans	486c0b2269	tuntap(4): break out after setting TUN_DSTADDR This is now the only flag we set in this loop, terminate early.	2019-10-20 21:06:25 +00:00
Kyle Evans	6041d76e0c	tuntap(4): Drop TUN_IASET This flag appears to have been effectively unused since introduction to if_tun(4) -- drop it now.	2019-10-20 21:03:48 +00:00
Michael Tuexen	4ad8cb6813	Fix compile issues when building a kernel without the VIMAGE option. Thanks to cem@ for discussing the issue which resulted in this patch. Reviewed by: cem@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D22089	2019-10-19 20:48:53 +00:00
Conrad Meyer	0634308df2	Fix debugnet(4) link/build fallout on some configurations Introduced in r353685 (sys/conf/files), r353694 (debugnet.c db_printf). Submitted by: kevans Reported by: cy X-MFC-With: r353685, r353694	2019-10-18 22:03:36 +00:00
Vincenzo Maffione	f8bc74e2f4	tap: add support for virtio-net offloads This patch is part of an effort to make bhyve networking (in particular TCP) faster. The key strategy to enhance TCP throughput is to let the whole packet datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet overhead is amortized over a large number of bytes. This capability is supported in the guest by means of the vtnet(4) driver, which is able to handle TSO/LRO packets leveraging the virtio-net header (see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf). A bhyve VM exchanges packets with the host through a network backend, which can be vale(4) or if_tap(4). While vale(4) supports TSO/LRO packets, if_tap(4) does not. This patch extends if_tap(4) with the ability to understand the virtio-net header, so that a tapX interface can process TSO/LRO packets. A couple of ioctl commands have been added to configure and probe the virtio-net header. Once the virtio-net header is set, the tapX interface acquires all the IFCAP capabilities necessary for TSO/LRO. Reviewed by: kevans Differential Revision: https://reviews.freebsd.org/D21263	2019-10-18 21:53:27 +00:00
Gleb Smirnoff	fda454099b	Make rt_getifa_fib() static.	2019-10-18 15:20:24 +00:00
Conrad Meyer	dda17b3672	Implement NetGDB(4) NetGDB(4) is a component of a system using a panic-time network stack to remotely debug crashed FreeBSD kernels over the network, instead of traditional serial interfaces. There are three pieces in the complete NetGDB system. First, a dedicated proxy server must be running to accept connections from both NetGDB and gdb(1), and pass bidirectional traffic between the two protocols. Second, the NetGDB client is activated much like ordinary 'gdb' and similarly to 'netdump' in ddb(4) after a panic. Like other debugnet(4) clients (netdump(4)), the network interface on the route to the proxy server must be online and support debugnet(4). Finally, the remote (k)gdb(1) uses 'target remote <proxy>:<port>' (like any other TCP remote) to connect to the proxy server. The NetGDB v1 protocol speaks the literal GDB remote serial protocol, and uses a 1:1 relationship between GDB packets and sequences of debugnet packets (fragmented by MTU). There is no encryption utilized to keep debugging sessions private, so this is only appropriate for local segments or trusted networks. Submitted by: John Reimer <john.reimer AT emc.com> (earlier version) Discussed some with: emaste, markj Relnotes: sure Differential Revision: https://reviews.freebsd.org/D21568	2019-10-17 21:33:01 +00:00
Conrad Meyer	e9c6962599	debugnet(4): Add optional full-duplex mode It remains unattached to any client protocol. Netdump is unaffected (remaining half-duplex). The intended consumer is NetGDB. Submitted by: John Reimer <john.reimer AT emc.com> (earlier version) Discussed with: markj Differential Revision: https://reviews.freebsd.org/D21541	2019-10-17 20:25:15 +00:00
Gleb Smirnoff	b2807792f1	Revert two parts of r353292 that enter epoch when processing vlan capabilities. It could be that entering epoch isn't necessary here, but better take a conservative approach. Submitted by: kp	2019-10-17 20:18:07 +00:00
Conrad Meyer	fde2cf65ce	debugnet(4): Infer non-server connection parameters Loosen requirements for connecting to debugnet-type servers. Only require a destination address; the rest can theoretically be inferred from the routing table. Relax corresponding constraints in netdump(4) and move ifp validation to debugnet connection time. Submitted by: John Reimer <john.reimer AT emc.com> (earlier version) Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D21482	2019-10-17 20:10:32 +00:00
Conrad Meyer	8270d35eca	Add ddb(4) 'netdump' command to netdump a core without preconfiguration Add a 'X -s <server> -c <client> [-g <gateway>] -i <interface>' subroutine to the generic debugnet code. The imagined use is both netdump, shown here, and NetGDB (vaporware). It uses the ddb(4) lexer, with some new extensions, to parse out IPv4 addresses. 'Netdump' uses the generic debugnet routine to load a configuration and start a dump, without any netdump configuration prior to panic. Loosely derived from work by: John Reimer <john.reimer AT emc.com> Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D21460	2019-10-17 19:49:20 +00:00
Conrad Meyer	6d567ec2da	debugnet: Respond to broadcast ARP requests The in-tree netdump code has always ignored non-directed ARP requests, and that seems to work most of the time for netdump. In my work and testing on NetGDB, it seems like sometimes the remote FreeBSD conversant (the non-panic system) will send broadcast-destination ARP requests to the debugnet kernel; without this change, those are dropped and the remote will see EHOSTDOWN "Host is down" errors from the userspace interface of the network stack. Discussed with: markj	2019-10-17 17:48:32 +00:00
Conrad Meyer	d39756c142	debugnet(4): Check hardware-validated UDP checksums Similar to INET checksums, lazily validate UDP checksums when the driver has already performed the check for us. Like debugnet(4) INET checksums, validation in software is left as future work. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D21745	2019-10-17 17:19:16 +00:00
Conrad Meyer	7790c8c199	Split out a more generic debugnet(4) from netdump(4) Debugnet is a simplistic and specialized panic- or debug-time reliable datagram transport. It can drive a single connection at a time and is currently unidirectional (debug/panic machine transmit to remote server only). It is mostly a verbatim code lift from netdump(4). Netdump(4) remains the only consumer (until the rest of this patch series lands). The INET-specific logic has been extracted somewhat more thoroughly than previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as much as possible as is protocol-independent, remains in debugnet.c. The separation is not perfect and future improvement is welcome. Supporting INET6 is a long-term goal. Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to 'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the generic module would be more confusing than the refactoring. The only functional change here is the mbuf allocation / tracking. Instead of initiating solely on netdump-configured interface(s) at dumpon(8) configuration time, we watch for any debugnet-enabled NIC for link activation and query it for mbuf parameters at that time. If they exceed the existing high-water mark allocation, we re-allocate and track the new high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone. In a future patch in this series, this will allow initiating netdump from panic ddb(4) without pre-panic configuration. No other functional change intended. Reviewed by: markj (earlier version) Some discussion with: emaste, jhb Objection from: marius Differential Revision: https://reviews.freebsd.org/D21421	2019-10-17 16:23:03 +00:00
Philip Paeps	579b70db89	ether: add older ethertype definitions for QinQ Older network equipment used the ethertypes 0x9100, 0x9200, and 0x9300 for outer VLANs, before standardisation introduced 0x88a8. Submitted by: Lutz Donnerhacke <lutz_donnerhacke.de> Differential Revision: https://reviews.freebsd.org/D21846	2019-10-17 00:34:53 +00:00
Gleb Smirnoff	b46d70fd88	do_link_state_change() is executed in taskqueue context and in general is allowed to sleep. Don't enter the epoch for the whole duration. If some event handlers need the epoch, they should handle that theirselves. Discussed with: hselasky	2019-10-16 16:32:58 +00:00
Hans Petter Selasky	270b83b9d1	The two functions ifnet_byindex() and ifnet_byindex_locked() are exactly the same after the network stack was epochified. Merge the two into one function and cleanup all uses of ifnet_byindex_locked(). While at it: - Add branch prediction macros. - Make sure the ifnet pointer is only deferred once, also when code optimisation is disabled. Sponsored by: Mellanox Technologies	2019-10-15 12:08:09 +00:00
Hans Petter Selasky	93cfeb0ed9	Exclude the network link eventhandler from epochification after r353292. This fixes the following assert when "options RATELIMIT" is used: panic() malloc() sysctl_add_oid() tcp_rl_ifnet_link() do_link_state_change() taskqueue_run_locked() Sponsored by: Mellanox Technologies	2019-10-15 11:20:16 +00:00
Gleb Smirnoff	416a1d1e70	if_delmulti() is never called without ifp argument, assert this instead of doing a useless search through interfaces.	2019-10-14 21:18:37 +00:00
Michael Tuexen	69104ebe0f	Add missing include which breaks builds without VIMAGE. The bug was introduced by me in r353480. Reported by: Michael Butler MFC after: 3 days	2019-10-13 19:58:37 +00:00
Michael Tuexen	d6e23cf0cf	Use an event handler to notify the SCTP about IP address changes instead of calling an SCTP specific function from the IP code. This is a requirement of supporting SCTP as a kernel loadable module. This patch was developed by markj@, I tweaked a bit the SCTP related code. Submitted by: markj@ MFC after: 3 days	2019-10-13 18:17:08 +00:00
Gleb Smirnoff	6dcec895d9	vlan_config() isn't always called in epoch context. Reported by: kp	2019-10-13 15:15:09 +00:00
Gleb Smirnoff	45c1d51c39	Don't use if_maddr_rlock() in sppp(4), use epoch(9) directly instead.	2019-10-10 23:54:37 +00:00
Gleb Smirnoff	73c96bbeac	Don't use if_maddr_rlock() in tuntap(4), use epoch(9) directly instead.	2019-10-10 23:51:14 +00:00
Gleb Smirnoff	4b24e5b1ef	Interface output method must be executed in network epoch, so if_addr_rlock() isn't needed here.	2019-10-10 23:50:32 +00:00
Gleb Smirnoff	fb3fc771f6	Add two extra functions that basically give count of addresses on interface. Such function could been implemented on top of the if_foreach_llm?addr(), but several drivers need counting, so avoid copy-n-paste inside the drivers.	2019-10-10 23:44:56 +00:00
Gleb Smirnoff	826857c833	Provide new KPI for network drivers to access lists of interface addresses. The KPI doesn't reveal neither how addresses are stored, how the access to them is synchronized, neither reveal struct ifaddr and struct ifmaddr. Reviewed by: gallatin, erj, hselasky, philip, stevek Differential Revision: https://reviews.freebsd.org/D21943	2019-10-10 23:42:55 +00:00
Gleb Smirnoff	caeeeaa7c5	ifnet_byindex_ref() requires network epoch.	2019-10-09 16:21:50 +00:00
Gleb Smirnoff	1e80e4f26c	Remove epoch assertion from if_setlladdr(). Originally this function was protected by IF_ADDR_LOCK(), which was a mutex, so that two simultaneous if_setlladdr() can't execute. Later it was switched to IF_ADDR_RLOCK(), likely by a mistake. Later it was switched to NET_EPOCH_ENTER(). Then I incorrectly added NET_EPOCH_ASSERT() here. In reality ifp->if_addr never goes away and never changes its length. So, doing bcopy() in it is always "safe", meaning it won't dereference a wrong pointer or write into someone's else memory. Of course doing two bcopy() in parallel would result in a mess of two addresses, but net epoch doesn't protect against that, neither IF_ADDR_RLOCK() did. So for now, just remove the assertion and leave for later a proper fix. Reported by: markj	2019-10-08 17:55:45 +00:00
Gleb Smirnoff	e9dc46cc30	In DIAGNOSTIC block of if_delmulti_ifma_flags() enter the network epoch. This quickly plugs the regression from r353292. The locking of multicast definitely needs a broader review today... Reported by: pho, dhw	2019-10-08 16:45:56 +00:00
Hans Petter Selasky	a362cf527e	Fix regression issue after r353274: Make sure the vnet_shutdown field is not set until after all VNET_SYSUNINIT()'s in the SI_SUB_VNET_DONE subsystem have been executed. Especially the vnet_if_return() functions requires that if_move() is still operational. Reported by: lwhsu@ MFC after: 1 week Sponsored by: Mellanox Technologies	2019-10-08 11:06:24 +00:00
Gleb Smirnoff	b8a6e03fac	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111	2019-10-07 22:40:05 +00:00
Hans Petter Selasky	4715738b12	Compile time assert a valid subsystem for all VNET init and uninit functions. Using VNET init and uninit functions outside the given range has undefined behaviour. MFC after: 1 week Sponsored by: Mellanox Technologies	2019-10-07 14:24:59 +00:00
Hans Petter Selasky	204e2f30d9	Factor out VNET shutdown check into an own vnet structure field. Remove the now obsolete vnet_state field. This greatly simplifies the detection of VNET shutdown and avoids code duplication. Discussed with: bz@ MFC after: 1 week Sponsored by: Mellanox Technologies	2019-10-07 14:15:41 +00:00
Kyle Evans	291287667c	tuntap(4): loosen up tunclose restrictions Realistically, this cannot work. We don't allow the tun to be opened twice, so it must be done via fd passing, fork, dup, some mechanism like these. Applications demonstrably do not enforce strict ordering when they're handing off tun devices, so the parent closing before the child will easily leave the tun/tap device in a bad state where it can't be destroyed and a confused user because they did nothing wrong. Concede that we can't leave the tun/tap device in this kind of state because of software not playing the TUNSIFPID game, but it is still good to find and fix this kind of thing to keep ifconfig(8) up-to-date and help ensure good discipline in tun handling. MFC after: 3 days	2019-10-04 13:43:07 +00:00
Kyle Evans	59997c3c46	if_tuntap: create /dev aliases when a tuntap device gets renamed Currently, if you do: $ ifconfig tun0 create $ ifconfig tun0 name wg0 $ ls -l /dev \| egrep 'wg\|tun' You will see tun0, but no wg0. In fact, it's slightly more annoying to make the association between the new name and the old name in order to open the device (if it hadn't been opened during the rename). Register an eventhandler for ifnet_arrival_events and catch interface renames. We can determine if the ifnet is a tun easily enough from the if_dname, which matches the cevsw.d_name from the associated tuntap_driver. Some locking dance is required because renames don't require the device to be opened, so it could go away in the middle of handling the ioctl, but as soon as we've verified this isn't the case we can attempt to busy the tun and either bail out if the tun device is dying, or we can proceed with the rename. We only create these aliases on a best-effort basis. Renaming a tun device to "usbctl", which doesn't exist as an ifnet but does as a /dev, is clearly not that disastrous, but we can't and won't create a /dev for that.	2019-10-03 17:54:00 +00:00
Kyle Evans	c4cad1549e	if_tuntap: add a busy/unbusy mechanism, replace destroy OPEN check A future commit will create device aliases when a tuntap device is renamed so that it's still easily found in /dev after the rename. Said mechanism will want to keep the tun alive long enough to either realize that it's about to go away or complete the alias creation, even if the alias is about to get destroyed. While we're introducing it, using it to prevent open devices from going away makes plenty of sense and keeps the logic on waking up tun_destroy clean, so we don't have multiple places trying to cv_broadcast unless it's still in use elsewhere.	2019-10-03 17:46:27 +00:00
Mark Johnston	4166913371	Add IFLIB_SINGLE_IRQ_RX_ONLY. As of r347221 the iflib legacy interrupt mode setup assumes that drivers perform both receive and transmit processing from the interrupt handler. This assumption is invalid in the vmxnet3 driver, so introduce the IFLIB_SINGLE_IRQ_RX_ONLY flag to make iflib avoid tx processing in the interrupt handler. PR: 239118 Reported and tested by: Juraj Lutter <otis@sk.freebsd.org> Obtained from: marius Reviewed by: gallatin MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D21831	2019-09-30 15:59:07 +00:00

1 2 3 4 5 ...

4234 Commits