freebsd-dev

Author	SHA1	Message	Date
Marcelo Araujo	fbd8c33022	Allow changing lagg(4) MTU. Previously, changing the MTU would require destroying the lagg and creating a new one. Now it is allowed to change the MTU of the lagg interface and the MTU of the ports will be set to match. If any port cannot set the new MTU, all ports are reverted to the original MTU of the lagg. Additionally, when adding ports, the MTU of a port will be automatically set to the MTU of the lagg. As always, the MTU of the lagg is initially determined by the MTU of the first port added. If adding an interface as a port for some reason fails, that interface is reverted to its original MTU. Submitted by: Ryan Moeller <ryan@freqlabs.com> Reviewed by: mav Relnotes: Yes Sponsored by: iXsystems Inc. Differential Revision: https://reviews.freebsd.org/D17576	2018-10-30 09:53:57 +00:00
Eugene Grosbein	232485a17e	Prevent stf(4) from panicing due to unprotected access to INADDR_HASH. PR: 220078 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12457 Tested-by: Cassiano Peixoto and others	2018-10-27 04:45:28 +00:00
Eric Joyner	46fa0c2552	Revert r339634. That commit is causing kernel panics in em(4), so this will be reverted until those are fixed. Reported by: ae@, pho@, et al Sponsored by: Intel Corporation	2018-10-23 17:06:36 +00:00
Andrey V. Elsukov	8796e291f8	Add the check that current VNET is ready and access to srchash is allowed. This change is similar to r339646. The callback that checks for appearing and disappearing of tunnel ingress address can be called during VNET teardown. To prevent access to already freed memory, add check to the callback and epoch_wait() call to be sure that callback has finished its work. MFC after: 20 days	2018-10-23 13:11:45 +00:00
Andrey V. Elsukov	221022e190	Add the check that current VNET is ready and access to srchash is allowed. ipsec_srcaddr() callback can be called during VNET teardown, since ingress address checking subsystem isn't VNET specific. And thus callback can make access to already freed memory. To prevent this, use V_ipsec_idhtbl pointer as indicator of VNET readiness. And make epoch_wait() after resetting it to NULL in vnet_ipsec_uninit() to be sure that ipsec_srcaddr() is finished its work. Reported by: kp MFC after: 20 days	2018-10-23 13:03:03 +00:00
Andrey V. Elsukov	e6b383b2f5	Remove softc from idhash when interface is destroyed. MFC after: 20 days	2018-10-23 12:50:28 +00:00
Vincenzo Maffione	2a7db7a63d	netmap: align codebase to the current upstream (sha 8374e1a7e6941) Changelist: - Move large parts of VALE code to a new file and header netmap_bdg.[ch]. This is useful to reuse the code within upcoming projects. - Improvements and bug fixes to pipes and monitors. - Introduce nm_os_onattach(), nm_os_onenter() and nm_os_onexit() to handle differences between FreeBSD and Linux. - Introduce some new helper functions to handle more host rings and fake rings (netmap_all_rings(), netmap_real_rings(), ...) - Added new sysctl to enable/disable hw checksum in emulated netmap mode. - nm_inject: add support for NS_MOREFRAG Approved by: gnn (mentor) Differential Revision: https://reviews.freebsd.org/D17364	2018-10-23 08:55:16 +00:00
Eric Joyner	940f62d616	iflib: drain enqueued tasks before detaching from taskqgroup The taskqgroup_detach function does not check if task is already enqueued when detaching it. This may lead to kernel panic if enqueued task starts after context state lock is destroyed. Ensure that the already enqueued admin tasks are executed before detaching them. The issue was discovered during validation of D16429. Unloading of if_ixlv followed by immediate removal of VFs with iovctl -D may lead to panic on NODEBUG kernel. As well, check if iflib is in detach before enqueueing new admin or iov tasks, to prevent new tasks from executing while the taskqgroup tasks are being drained. Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com> Reviewed by: shurd@, erj@ Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D17404	2018-10-23 04:37:29 +00:00
Hans Petter Selasky	88d57e9eac	Resolve deadlock between epoch(9) and various network interface SX-locks, during if_purgeaddrs(), by not allowing to hold the epoch read lock over typical network IOCTL code paths. This is a regression issue after r334305. Reviewed by: ae (network) Differential revision: https://reviews.freebsd.org/D17647 MFC after: 1 week Sponsored by: Mellanox Technologies	2018-10-22 13:25:26 +00:00
Andrey V. Elsukov	cc958ed28c	Follow the fix in r339532 (by glebius): Fix exiting an epoch(9) we never entered. May happen only with MAC. MFC after: 1 month	2018-10-21 18:30:27 +00:00
Andrey V. Elsukov	2c87fdf059	Rework if_ipsec(4) to use epoch(9) instead of rmlock. * use CK_LIST and FNV hash to keep chains of softc; * read access to softc is protected by epoch(); * write access is protected by ipsec_ioctl_sx. Changing of softc fields is allowed only when softc is unlinked from CK_LIST chains. * linking/unlinking of softc is allowed only when ipsec_ioctl_sx is exclusive locked. * the plain LIST of all softc is replaced by hash table that uses ingress address of tunnels as a key. * added support for appearing/disappearing of ingress address handling. Now it is allowed configure non-local ingress IP address, and thus the problem with if_ipsec(4) configuration that happens on boot, when ingress address is not yet configured, is solved. MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17190	2018-10-21 18:24:20 +00:00
Andrey V. Elsukov	df49ca9fd3	Add handling for appearing/disappearing of ingress addresses to if_me(4). * register handler for ingress address appearing/disappearing; * add new srcaddr hash table for fast softc lookup by srcaddr; * when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface, and set it otherwise; MFC after: 1 month Sponsored by: Yandex LLC	2018-10-21 18:18:37 +00:00
Andrey V. Elsukov	19873f4780	Add handling for appearing/disappearing of ingress addresses to if_gre(4). * register handler for ingress address appearing/disappearing; * add new srcaddr hash table for fast softc lookup by srcaddr; * when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface, and set it otherwise; MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17214	2018-10-21 18:13:45 +00:00
Andrey V. Elsukov	009d82ee0f	Add handling for appearing/disappearing of ingress addresses to if_gif(4). * register handler for ingress address appearing/disappearing; * add new srcaddr hash table for fast softc lookup by srcaddr; * when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface, and set it otherwise; * remove the note about ingress address from BUGS section. MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17134	2018-10-21 18:06:15 +00:00
Andrey V. Elsukov	8251c68d5c	Add KPI that can be used by tunneling interfaces to handle IP addresses appearing and disappearing on the host system. Such handling is need, because tunneling interfaces must use addresses, that are configured on the host as ingress addresses for tunnels. Otherwise the system can send spoofed packets with source address, that belongs to foreign host. The KPI uses ifaddr_event_ext event to implement addresses tracking. Tunneling interfaces register event handlers and then they are notified by the kernel, when an address disappears or appears. ifaddr_event_compat() handler from if.c replaced by srcaddr_change_event() in the ip_encap.c MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17134	2018-10-21 17:55:26 +00:00
Kristof Provost	5191a3aea6	vlan: Fix panic with lagg and vlan vlan_lladdr_fn() is called from taskqueue, which means there's no vnet context set. We can end up trying to send ARP messages (through the iflladdr_event event), which requires a vnet context. PR: 227654 MFC after: 3 days	2018-10-21 16:51:35 +00:00
Andrey V. Elsukov	64d63b1e03	Add ifaddr_event_ext event. It is similar to ifaddr_event, but the handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL, and the pointer to ifaddr. Also ifaddr_event now is implemented using ifaddr_event_ext handler. MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17100	2018-10-21 15:02:06 +00:00
Gleb Smirnoff	0a27163f8d	Fix exiting an epoch(9) we never entered. May happen only with MAC.	2018-10-21 12:39:00 +00:00
Hans Petter Selasky	c32a9d66fb	Fix deadlock when destroying VLANs. Synchronizing the epoch before freeing the multicast addresses while holding the VLAN_XLOCK() might lead to a deadlock. Use deferred freeing of the VLAN multicast addresses to resolve deadlock. Backtrace: Thread1: epoch_block_handler_preempt() ck_epoch_synchronize_wait() epoch_wait_preempt() vlan_setmulti() vlan_ioctl() in6m_release_task() gtaskqueue_run_locked() gtaskqueue_thread_loop() fork_exit() fork_trampoline() Thread2: sleepq_switch() sleepq_wait() _sx_xlock_hard() _sx_xlock() in6_leavegroup() in6_purgeaddr() if_purgeaddrs() if_detach_internal() if_detach() vlan_clone_destroy() if_clone_destroyif() if_clone_destroy() ifioctl() kern_ioctl() sys_ioctl() amd64_syscall() fast_syscall_common() syscall() Differential revision: https://reviews.freebsd.org/D17496 Reviewed by: slavash, mmacy Approved by: re (kib) Sponsored by: Mellanox Technologies	2018-10-15 10:29:29 +00:00
Eric Joyner	77c1fcec91	ixl/iavf(4): Change ixlv to iavf and update it to use iflib(9) Finishes the conversion of the 40Gb Intel Ethernet drivers to iflib(9) for FreeBSD 12.0, and fixes numerous bugs in both ixl(4) and iavf(4). This commit also re-adds the VF driver to GENERIC since it now compiles and functions. The VF driver name was changed from ixlv(4) to iavf(4) because the VF driver is now intended to be used with future products, not just with Fortville/Fort Park VFs. A man page update that documents these drivers is forthcoming in a separate commit. Reviewed by: sbruno@, kbowling@ Tested by: jeffrey.e.pieper@intel.com Approved by: re (gjb@) Relnotes: yes Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D16429	2018-10-12 22:40:54 +00:00
Jonathan T. Looney	13c6ba6d94	There are three places where we return from a function which entered an epoch section without exiting that epoch section. This is bad for two reasons: the epoch section won't exit, and we will leave the epoch tracker from the stack on the epoch list. Fix the epoch leak by making sure we exit epoch sections before returning. Reviewed by: ae, gallatin, mmacy Approved by: re (gjb, kib) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D17450	2018-10-09 13:26:06 +00:00
Michael Tuexen	580e30a33e	Use strlcpy() instead of strncpy(). Approved by: re (kib@) CID: 1395980, 1395981 X-MFC with: r339012 MFC after: 1 week	2018-10-03 07:35:16 +00:00
Michael Tuexen	1687b1ab24	For changing the MTU on tun/tap devices, it should not matter whether it is done via using ifconfig, which uses a SIOCSIFMTU ioctl() command, or doing it using a TUNSIFINFO/TAPSIFINFO ioctl() command. Without this patch, for IPv6 the new MTU is not used when creating routes. Especially, when initiating TCP connections after increasing the MTU, the old MTU is still used to compute the MSS. Thanks to ae@ and bz@ for helping to improve the patch. Reviewed by: ae@, bz@ Approved by: re (kib@) MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D17180	2018-09-29 13:01:23 +00:00
Warner Losh	329e817fcc	Reapply, with minor tweaks, r338025, from the original commit: Remove unused and easy to misuse PNP macro parameter Inspired by r338025, just remove the element size parameter to the MODULE_PNP_INFO macro entirely. The 'table' parameter is now required to have correct pointer (or array) type. Since all invocations of the macro already had this property and the emitted PNP data continues to include the element size, there is no functional change. Mostly done with the coccinelle 'spatch' tool: $ cat modpnpsize0.cocci @normaltables@ identifier b,c; expression a,d,e; declarer MODULE_PNP_INFO; @@ MODULE_PNP_INFO(a,b,c,d, -sizeof(d[0]), e); @singletons@ identifier b,c,d; expression a; declarer MODULE_PNP_INFO; @@ MODULE_PNP_INFO(a,b,c,&d, -sizeof(d), 1); $ rg -l MODULE_PNP_INFO -- sys \| \ xargs spatch --in-place --sp-file modpnpsize0.cocci (Note that coccinelle invokes diff(1) via a PATH search and expects diff to tolerate the -B flag, which BSD diff does not. So I had to link gdiff into PATH as diff to use spatch.) Tinderbox'd (-DMAKE_JUST_KERNELS). Approved by: re (glen)	2018-09-26 17:12:14 +00:00
Matt Macy	b08d611de8	fix vlan locking to permit sx acquisition in ioctl calls - update vlan(9) to handle changes earlier this year in multicast locking Tested by: np@, darkfiberu at gmail.com PR: 230510 Reviewed by: mjoras@, shurd@, sbruno@ Approved by: re (gjb@) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16808	2018-09-21 01:37:08 +00:00
Stephen Hurd	0c919c2370	Fix capabilities handling for iflib drivers Various capabilities were not being handled correctly in the SIOCSIFCAP handler. Specifically: IFCAP_RXCSUM and IFCAP_RXCSUM_IPV6 could be set even if not supported It was impossible to disable IFCAP_RXCSUM and/or IFCAP_RXCSUM_IPV6 via ifconfig since it does ioctl() per command-line flag rather than combine them into a single call. IFCAP_VLAN_HWCSUM could not be modified via the ioctl() Setting any combination of the three IFCAP_WOL flags would set only IFCAP_WOL_MCAST \| IFCAP_WOL_MAGIC. For example, setting only IFCAP_WOL_UCAST would result in both IFCAP_WOL_MCAST and IFCAP_WOL_MAGIC being enabled, but IFCAP_WOL_UCAST would not be enabled. Because if_vlancap() was called before if_togglecapenable(), vlan flags were sometimes not applied correctly. Interfaces were being unnecessarily stopped and restarted for WoL PR: 231151 Submitted by: Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp> Reported by: Shirkdog <mshirk@daemon-security.com> Reviewed by: galladin Approved by: re (gjb) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D17158	2018-09-20 19:35:35 +00:00
Andrey V. Elsukov	c6851ad04c	Restore outbound packets capturing for if_gre(4). It was missed in r335048. Also clear M_MCAST and M_BCAST flags for encapsulated datagram, since it will have new IP header. Approved by: re (kib)	2018-09-17 10:10:14 +00:00
Ruslan Bukin	86c5937532	Don't mark module data as static on RISC-V. Similar to arm64, riscv compiler uses PC-relative loads/stores, and with static data compiler does not emit relocations. In result, kernel module linker has nothing to fix and data accessed from the wrong location. Approved by: re (gjb) Sponsored by: DARPA, AFRL	2018-09-12 08:05:33 +00:00
Stephen Hurd	64e6fc1379	Clean up iflib sysctls Remove sysctls: txq_drain_encapfail - now a duplicate of encap_txd_encap_fail intr_link - was never incremented intr_msix - was never incremented rx_zero_len - was never incremented The following were not incremented in all code-paths that apply: m_pullups, mbuf_defrag, rxd_flush, tx_encap, rx_intr_enables, tx_frees, encap_txd_encap_fail. Fixes: Replace the broken collapse_pkthdr() implementation with an MPASS(). fl_refills and fl_refills_large were not incremented when using netmap. Reviewed by: gallatin Approved by: re (marius) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16733	2018-09-06 18:51:52 +00:00
Bjoern A. Zeeb	d4672c61d3	Rather than duplicating the functionality of a macro after r322866 use the already existing one. No functional changes. Reviewed by: karels, ae Approved by: re (rgrimes) Differential Revision: https://reviews.freebsd.org/D17004	2018-09-03 22:10:49 +00:00
Stephen Hurd	bc0e855bd9	Fix compile error due to missing parenthesis in r338372 Approved by: re (gjb)	2018-08-29 16:21:34 +00:00
Stephen Hurd	a520f8b6fe	Fix potential data corruption in iflib The MP ring may have txq pointers enqueued. Previously, these were passed to m_free() when IFC_QFLUSH was set. This patch checks for the value and doesn't call m_free(). Reviewed by: gallatin Approved by: re (gjb) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16882	2018-08-29 15:55:25 +00:00
Mark Murray	19fa89e938	Remove the Yarrow PRNG algorithm option in accordance with due notice given in random(4). This includes updating of the relevant man pages, and no-longer-used harvesting parameters. Ensure that the pseudo-unit-test still does something useful, now also with the "other" algorithm instead of Yarrow. PR: 230870 Reviewed by: cem Approved by: so(delphij,gtetlow) Approved by: re(marius) Differential Revision: https://reviews.freebsd.org/D16898	2018-08-26 12:51:46 +00:00
Navdeep Parhar	47c39432b1	Unbreak VLANs after r337943. ether_set_pcp should not be called from ether_output_frame for VLAN interfaces -- the vid + pcp will be inserted during vlan_transmit in that case. r337943 sets the VLAN's ifnet's if_pcp to a proper PCP value and this led to double encapsulation (once with vid 0 and second time with vid+pcp). PR: 230794 Reviewed by: kib@ Approved by: re@ (gjb@) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D16887	2018-08-24 21:48:13 +00:00
Patrick Kelsey	249cc75fd1	Extended pf(4) ioctl interface and pfctl(8) to allow bandwidths of 2^32 bps or greater to be used. Prior to this, bandwidth parameters would simply wrap at the 2^32 boundary. The computations in the HFSC scheduler and token bucket regulator have been modified to operate correctly up to at least 100 Gbps. No other algorithms have been examined or modified for correct operation above 2^32 bps (some may have existing computation resolution or overflow issues at rates below that threshold). pfctl(8) will now limit non-HFSC bandwidth parameters to 2^32 - 1 before passing them to the kernel. The extensions to the pf(4) ioctl interface have been made in a backwards-compatible way by versioning affected data structures, supporting all versions in the kernel, and implementing macros that will cause existing code that consumes that interface to use version 0 without source modifications. If version 0 consumers of the interface are used against a new kernel that has had bandwidth parameters of 2^32 or greater configured by updated tools, such bandwidth parameters will be reported as 2^32 - 1 bps by those old consumers. All in-tree consumers of the pf(4) interface have been updated. To update out-of-tree consumers to the latest version of the interface, define PFIOC_USE_LATEST ahead of any includes and use the code of pfctl(8) as a guide for the ioctls of interest. PR: 211730 Reviewed by: jmallett, kp, loos MFC after: 2 weeks Relnotes: yes Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D16782	2018-08-22 19:38:48 +00:00
Eric Joyner	fe2bf351fe	if_media: Add new 2.5G/5G/25G/40G/50G/100G/200G/400G media types Upcoming Ethernet hardware will support new media types that aren't in the kernel yet, so they are added here. These mostly include new 25G/50G/100G media types; and this commit introduces new 200G/400G speeds and media. Reviewed by: hselasky@, jhb@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D16731	2018-08-22 18:19:56 +00:00
Matt Macy	77ad07b6a3	fix copy/paste error when clearing ifma flag CID: 1395119 Reported by: vangyzen	2018-08-21 22:59:22 +00:00
Conrad Meyer	b8e771e97a	Back out r338035 until Warner is finished churning GSoC PNP patches I was not aware Warner was making or planning to make forward progress in this area and have since been informed of that. It's easy to apply/reapply when churn dies down.	2018-08-19 00:46:22 +00:00
Conrad Meyer	faa319436f	Remove unused and easy to misuse PNP macro parameter Inspired by r338025, just remove the element size parameter to the MODULE_PNP_INFO macro entirely. The 'table' parameter is now required to have correct pointer (or array) type. Since all invocations of the macro already had this property and the emitted PNP data continues to include the element size, there is no functional change. Mostly done with the coccinelle 'spatch' tool: $ cat modpnpsize0.cocci @normaltables@ identifier b,c; expression a,d,e; declarer MODULE_PNP_INFO; @@ MODULE_PNP_INFO(a,b,c,d, -sizeof(d[0]), e); @singletons@ identifier b,c,d; expression a; declarer MODULE_PNP_INFO; @@ MODULE_PNP_INFO(a,b,c,&d, -sizeof(d), 1); $ rg -l MODULE_PNP_INFO -- sys \| \ xargs spatch --in-place --sp-file modpnpsize0.cocci (Note that coccinelle invokes diff(1) via a PATH search and expects diff to tolerate the -B flag, which BSD diff does not. So I had to link gdiff into PATH as diff to use spatch.) Tinderbox'd (-DMAKE_JUST_KERNELS).	2018-08-19 00:22:21 +00:00
Navdeep Parhar	32a52e9e39	if_vlan(4): A VLAN always has a PCP and its ifnet's if_pcp should be set to the PCP value in use instead of IFNET_PCP_NONE. MFC after: 1 week Sponsored by: Chelsio Communications	2018-08-17 01:03:23 +00:00
Navdeep Parhar	32d2623ae2	Add the ability to look up the 3b PCP of a VLAN interface. Use it in toe_l2_resolve to fill up the complete vtag and not just the vid. Reviewed by: kib@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D16752	2018-08-16 23:46:38 +00:00
Matt Macy	f9be038601	Fix in6_multi double free This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast address has to be removed from interface lists when the refcount goes to zero OR when the interface goes away - decouple list disconnect from refcount (v6 only for now) - ifmultiaddr can exist past being on interface lists - add flag for tracking whether or not it's enqueued - deferring freeing moptions makes the incpb cleanup code simpler but opens the door wider still to races - call inp_gcmoptions synchronously after dropping the the inpcb lock Fundamentally multicast needs a rewrite - but keep applying band-aids for now. Tested by: kp Reported by: novel, kp, lwhsu	2018-08-15 20:23:08 +00:00
Andrew Gallatin	5ccac9f972	lagg: allow lacp to manage the link state Lacp needs to manage the link state itself. Unlike other lagg protocols, the ability of lacp to pass traffic depends not only on the lagg members having link, but also on the lacp protocol converging to a distributing state with the link partner. If we prematurely mark the link as up, then we will send a gratuitous arp (via arp_handle_ifllchange()) before the lacp interface is capable of passing traffic. When this happens, the gratuitous arp is lost, and our link partner may cache a stale mac address (eg, when the base mac address for the lagg bundle changes, due to a BIOS change re-ordering NIC unit numbers) Reviewed by: jtl, hselasky Sponsored by: Netflix	2018-08-13 14:13:25 +00:00
Kristof Provost	91e0f2d200	pf: Increase default hash table size Now that we (by default) limit the number of states to 100.000 it makse sense to also adjust the default size of the hash table. Based on the benchmarking results in https://github.com/ocochard/netbenches/blob/master/Atom_C2758_8Cores-Chelsio_T540-CR/pf-states_hashsize/results/fbsd12-head.r332390/README.md 128K entries offers a good compromise between performance and memory use. Users may still overrule this setting with the net.pf.states_hashsize and net.pf.source_nodes_hashsize loader(8) tunables.	2018-08-05 13:54:37 +00:00
Patrick Kelsey	8f410865b8	Mark the send queue ready so ALTQ is available.	2018-08-04 01:45:17 +00:00
Andrew Turner	b6ea4c5a2a	As with DPCPU_DEFINE_STATIC make VNET_DEFINE_STATIC non-static on arm64 in modules. It also fails in the same way, we are unable to relocate static variables as the compiler uses PC-relative loads with nothing for the kernel linker to relocate. Sponsored by: DARPA, AFRL	2018-07-30 15:05:07 +00:00
Andrew Turner	cd2106eaea	Ensure the DPCPU and VNET module spaces are aligned to hold a pointer. Previously they may have been aligned to a char, leading to misaligned DPCPU and VNET variables. Sponsored by: DARPA, AFRL	2018-07-30 14:25:17 +00:00
Andrew Turner	bc61d94997	As with DPCPU_DEFINE make it a compile error to use static with VNET_DEFINE. There is the VNET_DEFINE_STATIC macro for that.	2018-07-30 12:44:44 +00:00
Patrick Kelsey	b8ca475604	ALTQ support for iflib. Reviewed by: jmallett, mmacy Differential Revision: https://reviews.freebsd.org/D16433	2018-07-25 22:46:36 +00:00
Marius Strobl	c9a49a4fd8	Since r336611, n is only used for INET in iflib_parse_header(). Reported by: rpokala	2018-07-24 23:40:27 +00:00
Andrew Turner	5f901c92a8	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
Andrew Turner	fceba23f93	As with DPCPU create VNET_DEFINE_STATIC for when a variable needs to be declaired static. This will allow us to change the definition on arm64 as it has the same issues described in r336349. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:31:16 +00:00
Eugene Grosbein	804771f553	epair(4): make sure we do not duplicate MAC addresses in case of reused if_index. PR: 229957 Tested by: O. Hartmann <ohartmann@walstatt.org> Approved by: avg (mentor)	2018-07-23 07:11:58 +00:00
Marius Strobl	7474544bac	Use the maximum of isc_tx_{nsegments,tso_segments_max} for MAX_TX_DESC. Since r336313, TSO support for LEM-class devices is removed again as it was before the conversion of {l,}em(4) to iflib(4) in r311849 and as a result, isc_tx_tso_segments_max is 0 for LEM-class devices now. Thus, inappropriate watermarks were used for this class. This is really only a band-aid, though, because so far iflib(9) doesn't fully take into account that DMA engines can support different maxima of segments for transfers of TSO and non-TSO packets. For example, the DESC_RECLAIMABLE macro is based on isc_tx_nsegments while MAX_TX_DESC used isc_tx_tso_segments_max only. For most in-tree consumers that doesn't make a difference as the maxima are the same for both kinds of transfers (that is, apart from the fact that TSO may require up to 2 sentinel descriptors but also not with every MAC supported). However, isc_tx_nsegments is 8 but isc_tx_tso_segments_max is 85 by default with ixl(4).	2018-07-22 17:51:11 +00:00
Marius Strobl	8b8d90931d	- Given that the controlling expression of the receive loop in iflib_rxeof() tests for avail > 0, avail can never be 0 within that loop. Thus, move decrementing avail and budget_left into the loop and before the code which checks for additional descriptors having become available in case all the previous ones have been processed but there still is budget left so the latter code works as expected. [1] - In iflib_{busdma_load_mbuf_sg,parse_header}(), remove dead stores to m and n respectively. [2, 3] - In collapse_pkthdr(), ensure that m_next isn't NULL before dereferencing it. [4] - Remove a duplicate assignment of segs in iflib_encap(). Reported by: Coverity CID: 1356027 [1], 1356047 [2], 1368205 [3], 1356028 [4]	2018-07-22 17:45:44 +00:00
Stephen Hurd	fe51d4cdfe	Add knob to control tx ring abdication. r323954 changed the mp ring behaviour when 64-bit atomics were available to abdicate the TX ring rather than having one become a consumer thereby running to completion on TX. The consumer of the mp ring was then triggered in the tx task rather than blocking the TX call. While this significantly lowered the number of RX drops in small-packet forwarding, it also negatively impacts TX performance. With this change, the default behaviour is reverted, causing one TX ring to become a consumer during the enqueue call. A new sysctl, dev.X.Y.iflib.tx_abdicate is added to control this behaviour. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16302	2018-07-20 17:45:26 +00:00
Stephen Hurd	dd7fbcf193	Improve netmap TX handling when TX IRQs are not used/supported Use the timer to poll for TX completions when there are outstanding TX slots. Track when the last driver timer was called to prevent overcalling it. Also clean up some kring vs NIC ring usage. Reviewed by: marius, Johannes Lundberg <johalun0@gmail.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16300	2018-07-20 17:24:45 +00:00
Andrey V. Elsukov	acf673edf0	Move invoking of callout_stop(&lle->lle_timer) into llentry_free(). This deduplicates the code a bit, and also implicitly adds missing callout_stop() to in[6]_lltable_delete_entry() functions. PR: 209682, 225927 Submitted by: hselasky (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4605	2018-07-17 11:33:23 +00:00
Marius Strobl	7f87c0406d	Assorted TSO fixes for em(4)/iflib(9) and dead code removal: - Ever since the workaround for the silicon bug of TSO4 causing MAC hangs was committed in r295133, CSUM_TSO always got disabled unconditionally by em(4) on the first invocation of em_init_locked(). However, even with that problem fixed, it turned out that for at least e. g. 82579 not all necessary TSO workarounds are in place, still causing MAC hangs even at Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled in r323292 (r323293 for stable/10) for the EM-class by default, allowing users to turn it on if it happens to work with their particular EM MAC in a Gigabit-only environment. In head, the TSO workaround for speeds other than Gigabit was lost with the conversion to iflib(9) in r311849 (possibly along with another one or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4 got enabled by default again, causing device hangs. Therefore, change the default for this hardware class back to have TSO4 off, allowing users to turn it on manually if it happens to work in their environment as we do in stable/{10,11}. An alternative would be to add a whitelist of EM-class devices where TSO4 actually is reliable with the workarounds in place, but given that the advantage of TSO at Gigabit speed is rather limited - especially with the overhead of these workarounds -, that's really not worth it. [1] This change includes the addition of an isc_capabilities to struct if_softc_ctx so iflib(9) can also handle interface capabilities that shouldn't be enabled by default which is used to handle the default-off capabilities of e1000 as suggested by shurd@ and moving their handling from em_setup_interface() to em_if_attach_pre() accordingly. - Although 82543 support TSO4 in theory, the former lem(4) didn't have support for TSO4, presumably because TSO4 is even more broken in the LEM-class of MACs than the later EM ones. Still, TSO4 for LEM-class devices was enabled as part of the conversion to iflib(9) in r311849, causing device hangs. So revert back to the pre-r311849 behavior of not supporting TSO4 for LEM-class at all, which includes not creating a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2] - In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET (65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO DMA must have a maxsize of the maximum TSO size plus the size of a VLAN header for software VLAN tagging. The iflib(9) converted em(4), thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET in em_setup_interface() (apparently, left-over from pre-iflib(9) times). So remove the later and correct iflib(9) to correctly cap the maximum TSO size reported to the stack at IP_MAXPACKET. While at it, let iflib(9) use if_sethwtsomax(). This change includes the addition of isc_tso_max{seg,}size DMA engine constraints for the TSO DMA tag to struct if_shared_ctx and letting iflib_txsd_alloc() automatically adjust the maxsize of that tag in case IFCAP_VLAN_MTU is supported as requested by shurd@. - Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9) so adjustment is automatically done in case IFCAP_VLAN_MTU is supported. As a consequence, this adjustment now is also done in case of bnxt(4) which missed it previously. - Move the reduction of the maximum TSO segment count reported to the stack by the number of m_pullup(9) calls (which in the worst case, can add another mbuf and, thus, the requirement for another DMA segment each) in the transmit path for performance reasons from em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now done in iflib_parse_header() rather than in the no longer existing em_xmit(). Moreover, this optimization applies to all drivers using iflib(9) and not just em(4); all in-tree iflib(9) consumers still have enough room to handle full size TSO packets. Also, reduce the adjustment to the maximum number of m_pullup(9)'s now performed in iflib_parse_header(). - Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9) in r311849 and r335338 respectively, these drivers didn't enable IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed through by lagg(4). With iflib(9), IFCAP_VLAN_HWFILTER was turned on by default but also lagg(4) was fixed in that regard in r203548. So just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling in {em,ixl,ixlv}_setup_interface(). - Nuke other redundant IFCAP_ setting in {em,ixl,ixlv}_setup_interface() which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre() now. - Remove some redundant/dead setting of scctx->isc_tx_csum_flags in em_if_attach_pre(). - Remove some IFCAP_* duplicated either directly or indirectly (e. g. via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS. - Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as iflib(9) adds that capability unconditionally. - Remove some unused macros from em(4). - Bump __FreeBSD_version as some of the above changes require the modules of drivers using iflib(9) to be recompiled. Okayed by: sbruno@ at 201806 DevSummit Transport Working Group [1] Reviewed by: sbruno (earlier version), erj PR: 219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2] Differential Revision: https://reviews.freebsd.org/D15720	2018-07-15 19:04:23 +00:00
Kristof Provost	b895839daa	pf: Fix typo in r336221 Reported by: olivier@	2018-07-12 18:07:28 +00:00
Kristof Provost	f4cdb8aedd	pf: Increate default state table size The typical system now has a lot more memory than when pf was new, and is also expected to handle more connections. Increase the default size of the state table. Note that users can overrule this using 'set limit states' in pf.conf. From OpenBSD: The year is 2018. Mercury, Bowie, Cash, Motorola and DEC all left us. Just pf still has a default state table limit of 10000. Had! Now it's a tiny little bit more, 100k. lead guitar: me ok chorus: phessler theo claudio benno background school girl laughing: bob Obtained from: OpenBSD	2018-07-12 16:35:35 +00:00
Andrey V. Elsukov	98a8fdf6da	Deduplicate the code. Add generic function if_tunnel_check_nesting() that does check for allowed nesting level for tunneling interfaces and also does loop detection. Use it in gif(4), gre(4) and me(4) interfaces. Differential Revision: https://reviews.freebsd.org/D16162	2018-07-09 11:03:28 +00:00
Sean Bruno	2da1967762	struct ifmediareq *ifmrp is only used in the COMPAT_FREEBSD32 parts of ifioctl(). Move it inside the proper #ifdef. This was throwing a valid "Assigned but unused" warning with gcc. Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16063	2018-07-07 13:35:06 +00:00
Will Andrews	cc535c95ca	Revert r335833. Several third-parties use at least some of these ioctls. While it would be better for regression testing if they were used in base (or at least in the test suite), it's currently not worth the trouble to push through removal. Submitted by: antoine, markj	2018-07-04 03:36:46 +00:00
Matt Macy	6573d7580b	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066	2018-07-04 02:47:16 +00:00
Will Andrews	c1887e9f09	pf: remove unused ioctls. Several ioctls are unused in pf, in the sense that no base utility references them. Additionally, a cursory review of pf-based ports indicates they're not used elsewhere either. Some of them have been unused since the original import. As far as I can tell, they're also unused in OpenBSD. Finally, removing this code removes the need for future pf work to take them into account. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D16076	2018-07-01 01:16:03 +00:00
Andrey V. Elsukov	6e081509db	Add NULL pointer check. encap_lookup_t method can be invoked by IP encap subsytem even if none of gif/gre/me interfaces are exist. Hash tables are allocated on demand, when first interface is created. So, make NULL pointer check before doing access to hash table. PR: 229378	2018-06-28 11:39:27 +00:00
Andrey V. Elsukov	ca3cd72b17	Move BPFIF_* macro definitions into .c file, where struct bpf_if is declared. They are only used in this file and there is no need to export them via bpfdesc.h.	2018-06-19 10:34:45 +00:00
Eric Joyner	dfae03b5b5	iflib: Style fixes MFC after: 1 week	2018-06-18 17:27:43 +00:00
Marius Strobl	e4defe55a8	Assorted fixes to MSI-X/MSI/INTx setup in iflib(9): - In iflib_msix_init(), VMMs with broken MSI-X activation are trying to be worked around by manually enabling PCIM_MSIXCTRL_MSIX_ENABLE before calling pci_alloc_msix(9). Apart from constituting a layering violation, this has the problem of leaving PCIM_MSIXCTRL_MSIX_ENABLE enabled when falling back to MSI or INTx when e. g. MSI-X is black- listed and initially also when disabled via hw.pci.enable_msix. The later in turn was incorrectly worked around in r325166. Since r310806, pci(4) itself has code to deal with broken MSI-X handling of VMMs, so all of these workarounds in iflib(9) can go, fixing non-working interrupts when falling back to MSI/INTx. In any case, possibly further adjustments to broken MSI-X activation of VMMs like enabling r310806 by default in VM environments need to be placed into pci(4), not iflib(9). [1] - Also remove the pci_enable_busmaster(9) call from iflib_msix_init(), which is already more properly invoked from iflib_device_attach(). - When falling back to MSI/INTx, release the MSI-X BAR resource again. - When falling back to INTx, ensure scctx->isc_vectors is set to 1 and not to something higher from a device with more than one MSI message supported. - Make the nearby ring_state(s) stuff (static) const. Discussed with: jhb at BSDCan 2018 [1] Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D15729	2018-06-17 20:33:02 +00:00
Andrey V. Elsukov	ae11b3829c	Fix typo. Reported by: rpokala	2018-06-16 19:21:09 +00:00
Andrey V. Elsukov	20efcfc602	Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9). Using of rwlock with multiqueue NICs for IP forwarding on high pps produces high lock contention and inefficient. Rmlock fits better for such workloads. Reviewed by: melifaro, olivier Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15789	2018-06-16 08:26:23 +00:00
Andrey V. Elsukov	9597ff83b8	Add missing BPF_MTAP2() for outbound packets.	2018-06-14 15:04:30 +00:00
Andrey V. Elsukov	2addcba7d5	Convert if_me(4) driver to use encap_lookup_t method and be lockless on data path.	2018-06-14 14:53:24 +00:00
Jonathan T. Looney	0766f278d8	Make UMA and malloc(9) return non-executable memory in most cases. Most kernel memory that is allocated after boot does not need to be executable. There are a few exceptions. For example, kernel modules do need executable memory, but they don't use UMA or malloc(9). The BPF JIT compiler also needs executable memory and did use malloc(9) until r317072. (Note that a side effect of r316767 was that the "small allocation" path in UMA on amd64 already returned non-executable memory. This meant that some calls to malloc(9) or the UMA zone(9) allocator could return executable memory, while others could return non-executable memory. This change makes the behavior consistent.) This change makes malloc(9) return non-executable memory unless the new M_EXEC flag is specified. After this change, the UMA zone(9) allocator will always return non-executable memory, and a KASSERT will catch attempts to use the M_EXEC flag to allocate executable memory using uma_zalloc() or its variants. Allocations that do need executable memory have various choices. They may use the M_EXEC flag to malloc(9), or they may use a different VM interfact to obtain executable pages. Now that malloc(9) again allows executable allocations, this change also reverts most of r317072. PR: 228927 Reviewed by: alc, kib, markj, jhb (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D15691	2018-06-13 17:04:41 +00:00
Andrey V. Elsukov	a5185adeb6	Rework if_gre(4) to use encap_lookup_t method to speedup lookup of needed interface when many gre interfaces are present. Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speedup lookup of needed softc.	2018-06-13 11:11:33 +00:00
Jonathan T. Looney	16a227c7c9	Fix a memory leak for the BIOCSETWF ioctl on kernels with the BPF_JITTER option. The BPF code was creating a compiled filter in the common filter-creation path. However, BPF only uses compiled filters in the read direction. When creating a write filter, the common filter-creation code was creating an unneeded write filter and leaking the memory used for that. MFC after: 2 weeks Sponsored by: Netflix	2018-06-11 23:32:06 +00:00
Andrey V. Elsukov	44bcc06816	Explicitly change the link state when we assingn an address. Since we are setting IFF_UP flag on SIOCSIFADDR, it is possible, that after this link state information still not initialized properly. This leads to problems with routing, since now interface has IFCAP_LINKSTATE capability and a route is considered as working only when interface's link state is in LINK_STATE_UP (see RT_LINK_IS_UP() macro). Reported by: Marek Zarychta MFC after: 3 days	2018-06-09 09:57:14 +00:00
Stephen Hurd	3ab4a96085	Remove tx task spinning added in r333686 This caused issues with PASTE. Just remove the reschedule since the DELAY() should be enough for use cases such as pkt-gen which were failing before the change. Reported by: Michio Honda Sponsored by: Limelight Networks	2018-06-08 21:49:19 +00:00
Mateusz Guzik	b8af2820f6	uma: fix up r334824 Turns out there is code which ends up passing M_ZERO to counters. Since counters zero unconditionally on their own, just ignore drop the flag in that place.	2018-06-08 05:40:36 +00:00
Matt Macy	58378a8971	rtentry_zinit: don't blindly pass through M_ZERO to counter alloc	2018-06-08 05:17:06 +00:00
Eric Joyner	a06424ddd3	iflib: Record TCP checksum info in iflib when TCP checksum is requested ixl(4) (when it switches over to using iflib) devices need the TCP header length in order to do TCP checksum offload. Reviewed by: gallatin@, shurd@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15558	2018-06-07 13:03:07 +00:00
Andrey V. Elsukov	b941bc1d6e	Rework if_gif(4) to use new encap_lookup_t method to speedup lookup of needed interface when many gif interfaces are present. Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speedup lookup of needed softc. Interfaces with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST. Sysctl net.link.gif.parallel_tunnels is removed. The removal was planed 16 years ago, and actually it could work only for outbound direction. Each protocol, that can be handled by if_gif(4) interface is registered by separate encap handler, this helps avoid invoking the handler for unrelated protocols (GRE, PIM, etc.). This change allows dramatically improve performance when many gif(4) interfaces are used. Sponsored by: Yandex LLC	2018-06-05 21:24:59 +00:00
Andrey V. Elsukov	6d8fdfa9d5	Rework IP encapsulation handling code. Currently it has several disadvantages: - it uses single mutex to protect internal structures. It is used by data- and control- path, thus there are no parallelism at all. - it uses single list to keep encap handlers for both INET and INET6 families. - struct encaptab keeps unneeded information (src, dst, masks, protosw), that isn't used by code in the source tree. - matches are prioritized and when many tunneling interfaces are registered, encapcheck handler of each interface is invoked for each packet. The search takes O(n) for n interfaces. All this work is done with exclusive lock held. What this patch includes: - the datapath is converted to be lockless using epoch(9) KPI. - struct encaptab now linked using CK_LIST. - all unused fields removed from struct encaptab. Several new fields addedr: min_length is the minimum packet length, that encapsulation handler expects to see; exact_match is maximum number of bits, that can return an encapsulation handler, when it wants to consume a packet. - IPv6 and IPv4 handlers are stored in separate lists; - added new "encap_lookup_t" method, that will be used later. It is targeted to speedup lookup of needed interface, when gif(4)/gre(4) have many interfaces. - the need to use protosw structure is eliminated. The only pr_input method was used from this structure, so I don't see the need to keep using it. - encap_input_t method changed to avoid using mbuf tags to store softc pointer. Now it is passed directly trough encap_input_t method. encap_getarg() funtions is removed. - all sockaddr structures and code that uses them removed. We don't have any code in the tree that uses them. All consumers use encap_attach_func() method, that relies on invoking of encapcheck() to determine the needed handler. - introduced struct encap_config, it contains parameters of encap handler that is going to be registered by encap_attach() function. - encap handlers are stored in lists ordered by exact_match value, thus handlers that need more bits to match will be checked first, and if encapcheck method returns exact_match value, the search will be stopped. - all current consumers changed to use new KPI. Reviewed by: mmacy Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15617	2018-06-05 20:51:01 +00:00
Matt Macy	a6bc59f203	Reduce overhead of entropy collection - move harvest mask check inline - move harvest mask to frequently_read out of actively modified cache line - disable ether_input collection and describe its limitations in NOTES Typically entropy collection in ether_input was stirring zero in to the entropy pool while at the same time greatly reducing max pps. This indicates that perhaps we should more closely scrutinize how much entropy we're getting from a given source as well as what our actual entropy collection needs are for seeding Yarrow. Reviewed by: cem, gallatin, delphij Approved by: secteam Differential Revision: https://reviews.freebsd.org/D15526	2018-05-31 21:53:07 +00:00
Hans Petter Selasky	5fd1ea0810	Re-apply r190640. - Restore local change to include <net/bpf.h> inside pcap.h. This fixes ports build problems. - Update local copy of dlt.h with new DLT types. - Revert no longer needed <net/bpf.h> includes which were added as part of r334277. Suggested by: antoine@, delphij@, np@ MFC after: 3 weeks Sponsored by: Mellanox Technologies	2018-05-31 09:11:21 +00:00
Matt Macy	91d6c9b93e	if_setlladdr: don't call ioctl in epoch context PR: 228612 Reported by: markj	2018-05-30 21:46:10 +00:00
Kristof Provost	d25c25dc52	pf: Add missing include statement rmlocks require <sys/lock.h> as well as <sys/rmlock.h>. Unbreak mips build.	2018-05-30 12:40:37 +00:00
Kristof Provost	455969d305	pf: Replace rwlock on PF_RULES_LOCK with rmlock Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock. This change improves packet processing rate in high pps environments. Benchmarking by olivier@ shows a 65% improvement in pps. While here, also eliminate all appearances of "sys/rwlock.h" includes since it is not used anymore. Submitted by: farrokhi@ Differential Revision: https://reviews.freebsd.org/D15502	2018-05-30 07:11:33 +00:00
Stephen Hurd	3e0e6330b5	iflib: mark irq allocation name parameter as constant The name parameter passed to iflib_irq_alloc_generic and iflib_softirq_alloc_generic is never modified. Many places in code pass string literals and thus should not be modified. Mark the name parameter as a const char * instead, so that we enforce that the name is not modified before passing to bus_describe_intr() Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: kmacy Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15343	2018-05-29 21:56:39 +00:00
Matt Macy	6c3c319414	iflib: hold context lock across detach for drivers that need it	2018-05-29 18:03:43 +00:00
Matt Macy	134804c89a	rt_getifa_fib: don't use ifa but info->rti_ifa Reported by: kp	2018-05-29 07:14:57 +00:00
Matt Macy	1ebec5faf4	route: fix missed ref adds - ensure that we bump the ifa ref whenever we add a reference - defer freeing epoch protected references until after the if_purgaddrs loop	2018-05-29 00:53:53 +00:00
Eric Joyner	1d7ef1867a	iflib: Add new shared flag: IFLIB_ADMIN_ALWAYS_RUN ixl(4)'s nvmupdate utility expects the nvmupdate process to run while the interface is down; these nvm update commands use the admin queue, so the admin queue needs to be able to generate interrupts and be processed while the interface is down. So add a flag that ixl(4) sets that lets the entire admin task run even when the interface is marked down/IFF_DRV_RUNNING isn't set. With this change, nvmupdate should function like it did pre-iflib. Reviewed by: gallatin@, sbruno@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15575	2018-05-26 00:46:08 +00:00
Matt Macy	9379029a92	rtrequest1_fib: we need to always bump the ifaddr refcount when we take a reference from an rtentry. r334118 introduced a case when this was not done. While we're here make the intent more obvious by moving the refcount bump down to when we know we'll actually need it. Reported by: markj	2018-05-25 19:48:26 +00:00
Matt Macy	0f8d79d977	CK: update consumers to use CK macros across the board r334189 changed the fields to have names distinct from those in queue.h in order to expose the oversights as compile time errors	2018-05-24 23:21:23 +00:00
Matt Macy	5328b11c95	if_delgroups: add missed unlock introduced by r334118	2018-05-24 17:54:08 +00:00
Matt Macy	4f6c66cc9c	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
Luca Pizzamiglio	11d416666d	Improve MAC address uniqueness on if_epair(4). As reported in PR184149, it can happen that epair devices can have the same MAC address. This solution is based on a 32-bit hash, obtained combining the if_index of the a interface and the hostid. If the hostid is zero, a random number is used. PR: 184149 Reviewed by: wollman, eugen Approved by: cognet Differential Revision: https://reviews.freebsd.org/D15329	2018-05-23 13:10:57 +00:00
Mark Johnston	db5a36bddf	Simplify lagg_input(). No functional change intended. MFC after: 2 weeks	2018-05-22 15:35:38 +00:00
Matt Macy	fd04260d3f	ck: simplify interface with libkvm consumers by defining ck_queue types as their queue.h equivalents if !_KERNEL	2018-05-21 01:53:23 +00:00
Matt Macy	f6cb0dea4c	net: fix uninitialized variable warning	2018-05-19 19:00:04 +00:00
Matt Macy	e335651e1e	mp_ring: fix i386 Even though 64-bit atomics are supported on i386 there are panics indicating that the code does not work correctly there. Switch to mutex based variant (and fix that while we're here). Reported by: pho, kib	2018-05-19 16:44:12 +00:00
Matt Macy	46d0f824be	net: fix set but not used	2018-05-19 05:27:49 +00:00
Matt Macy	d7c5a620e2	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
Matt Macy	f2d19f98c1	epoch(9): allocate net epochs earlier in boot	2018-05-18 18:48:00 +00:00
Matt Macy	d71e30de40	epoch: move epoch variables to read mostly section	2018-05-18 17:58:15 +00:00
Ed Maste	891cf3ed44	Use NULL for SYSINIT's last arg, which is a pointer type Sponsored by: The FreeBSD Foundation	2018-05-18 17:58:09 +00:00
Matt Macy	70398c2f86	epoch(9): Make epochs non-preemptible by default There are risks associated with waiting on a preemptible epoch section. Change the name to make them not be the default and document the issue under CAVEATS. Reported by: markj	2018-05-18 17:29:43 +00:00
Matt Macy	5e68a3dfe3	epoch: add non-preemptible "critical" variant adds: - epoch_enter_critical() - can be called inside a different epoch, starts a section that will acquire any MTX_DEF mutexes or do anything that might sleep. - epoch_exit_critical() - corresponding exit call - epoch_wait_critical() - wait variant that is guaranteed that any threads in a section are running. - epoch_global_critical - an epoch_wait_critical safe epoch instance Requested by: markj Approved by: sbruno	2018-05-18 01:52:51 +00:00
Matt Macy	2aa6f526de	Fix !netmap build post r333686 Approved by: sbruno	2018-05-16 22:25:47 +00:00
Stephen Hurd	5ee36c68e1	Work around lack of TX IRQs in iflib for netmap When poll() is called via netmap, txsync is initially called, and if there are no available buffers to reclaim, it waits for the driver to notify of new buffers. Since the TX IRQ is generally not used in iflib drivers, this ends up causing a timeout. Work around this by having the reclaim DELAY(1) if it's initially unable to reclaim anything, then schedule the tx task, which will spin by continuously rescheduling itself until some buffers are reclaimed. In general, the delay is enough to allow some buffers to be reclaimed, so spinning is minimized. Reported by: Johannes Lundberg <johalun0@gmail.com> Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15455	2018-05-16 21:03:22 +00:00
Stephen Hurd	99031b8f7d	Replace rmlock with epoch in lagg Use the new epoch based reclamation API. Now the hot paths will not block at all, and the sx lock is used for the softc data. This fixes LORs reported where the rwlock was obtained when the sxlock was held. Submitted by: mmacy Reported by: Harry Schmalzbauer <freebsd@omnilan.de> Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15355	2018-05-14 20:06:49 +00:00
Matt Macy	09f6ff4f1a	iflib(9): Add support for cloning pseudo interfaces Part 3 of many ... The VPC framework relies heavily on cloning pseudo interfaces (vmnics, vpc switch, vcpswitch port, hostif, vxlan if, etc). This pulls in that piece. Some ancillary changes get pulled in as a side effect. Reviewed by: shurd@ Approved by: sbruno@ Sponsored by: Joyent, Inc. Differential Revision: https://reviews.freebsd.org/D15347	2018-05-11 20:08:28 +00:00
Andrey V. Elsukov	e287c474be	Apply the change from r272770 to if_ipsec(4) interface. It is guaranteed that if_ipsec(4) interface is used only for tunnel mode IPsec, i.e. decrypted and decapsultaed packet has its own IP header. Thus we can consider it as new packet and clear the protocols flags. This allows ICMP/ICMPv6 properly handle errors that may cause this packet. PR: 228108 MFC after: 1 week	2018-05-11 16:50:25 +00:00
Matt Macy	5c30b378f0	Allow different bridge types to coexist if_bridge has a lot of limitations that make it scale poorly to higher data rates. In my projects/VPC branch I leverage the bridge interface between layers for my high speed soft switch as well as for purposes of stacking in general. Reviewed by: sbruno@ Approved by: sbruno@ Differential Revision: https://reviews.freebsd.org/D15344	2018-05-11 05:00:40 +00:00
Dag-Erling Smørgrav	20f8d7bc7e	Slight cleanup of interface event logging. Make if_printf() use vlog() instead of vprintf(). This means it can no longer return the number of characters printed, as it used to, but every single call to if_printf() in the entire kernel ignores the return value anyway; just return 0 so we don't have to change the prototype. Consistently use if_printf() throughout sys/net/if.c, instead of a mixture of if_printf() and log(). In ifa_maintain_loopback_route(), don't needlessly log an error if we either failed to add a route because it already existed or failed to remove one because it did not. We still return an error code, though. MFC after: 1 week	2018-05-11 00:19:49 +00:00
Matt Macy	7bf272a612	Allocate epoch for networking at startup Additionally add CK to include paths for modules Approved by: sbruno@	2018-05-10 19:13:00 +00:00
Andrey V. Elsukov	2e4531a12b	Add IFCAP_LINKSTATE support to if_loop(4). Reviewed by: wollman Obtained from: Yandex LLC MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15278	2018-05-09 10:50:51 +00:00
Stephen Hurd	ac88e6da11	iflib: print message when iflib_tx_structures_setup fails Print a message when iflib_tx_structures_setup fails, like we do for iflib_rx_structures_setup. Now that we always print a message from within iflib_qset_structures_setup when it fails, stop printing one in iflib_device_register() at the call site. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: gallatin MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15300	2018-05-08 17:15:10 +00:00
Stephen Hurd	6108c01395	iflib: cleanup queues when iflib_device_register fail Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: gallatin MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15299	2018-05-08 16:56:02 +00:00
Andrew Gallatin	1f7ce05d1d	Fix an off-by-one error when deciding to request a tx interrupt The canonical check for whether or not a ring is drainable is TXQ_AVAIL() > MAX_TX_DESC() + 2. Use this same construct here, in order to avoid a potential off-by-one error where we might otherwise fail to request an interrupt. Reviewed by: mmacy Sponsored by: Netflix	2018-05-07 18:11:22 +00:00
Matt Macy	b6f6f88018	r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl to sleep on commands to the NIC when updating multicast filters. More generally this permitted driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a a multicast update would still be queued for deletion when ifconfig deleted the interface thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses on the interface. Synchronously remove all external references to a multicast address before enqueueing for delete. Reported by: lwhsu Approved by: sbruno	2018-05-06 20:34:13 +00:00
Matt Macy	7edd877a1e	The ifnet pointer (ifp) in rt_newaddrmsg can be valid without ifp->if_addr being set if if the ifnet is still live by way of a reference but in line for deletion. Check ifp->if_addr before dereferencing. Approved by: sbruno	2018-05-06 20:32:47 +00:00
Mark Johnston	9461882562	Add netdump support to iflib. em(4) and igb(4) were tested by me, and ixgbe(4) and bnxt(4) were tested by sbruno. Reviewed by: mmacy, shurd MFC after: 1 month Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15262	2018-05-06 00:57:52 +00:00
Mark Johnston	e505460228	Import the netdump client code. This is a component of a system which lets the kernel dump core to a remote host after a panic, rather than to a local storage device. The server component is available in the ports tree. netdump is particularly useful on diskless systems. The netdump(4) man page contains some details describing the protocol. Support for configuring netdump will be added to dumpon(8) in a future commit. To use netdump, the kernel must have been compiled with the NETDUMP option. The initial revision of netdump was written by Darrell Anderson and was integrated into Sandvine's OS, from which this version was derived. Reviewed by: bdrewery, cem (earlier versions), julian, sbruno MFC after: 1 month X-MFC note: use a spare field in struct ifnet Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15253	2018-05-06 00:38:29 +00:00
Matt Macy	1ae4848cd0	fix gcc8 warnings Approved by: sbruno	2018-05-04 18:57:05 +00:00
Stephen Hurd	b89827a052	iflib: fix invalid free during queue allocation failure In r301567, code was added to cleanup to prevent memory leaks for the Tx and Rx ring structs. This code carefully tracked txq and rxq, and made sure to free them properly during cleanup. Because we assigned the txq and rxq pointers into the ctx->ifc_txqs and ctx->ifc_rxqs, we carefully reset these pointers to NULL, so that cleanup code would not accidentally free the memory twice. This was changed by r304021 ("Update iflib to support more NIC designs"), which removed this resetting of the pointers to NULL, because it re-used the txq and rxq pointers as an index into the queue set array. Unfortunately, the cleanup code was left alone. Thus, if we fail to allocate DMA or fail to configure the queues using the drivers ifdi methods, we will attempt to free txq and rxq. These variables would now incorrectly point to the wrong location, resulting in a page fault. There are a number of methods to correct this, but ultimately the root cause was that we reuse the txq and rxq pointers for two different purposes. Instead, when allocating, store the returned pointer directly into ctx->ifc_txqs and ctx->ifc_rxqs. Then, assign this to txq and rxq as index pointers before starting the loop to allocate each queue. Drop the cleanup code for txq and rxq, and only use ctx->ifc_txqs and ctx->ifc_rxqs. Thus, we no longer need to free txq or rxq under any error flow, and intsead rely solely on the pointers stored in ctx->ifc_txqs and ctx->ifc_rxqs. This prevents the invalid free(), and ensures that we still properly cleanup after ourselves as before when failing to allocate. Submitted by: Jacob Keller Reviewed by: gallatin, sbruno Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15285	2018-05-04 15:20:34 +00:00
Stephen Hurd	4d613f5d04	iflib: remove unused brscp pointer from iflib_queues_alloc This pointer was no longer written to as of r315217. Since nothing writes to the variable, remove it. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: gallatin, kmacy, sbruno Differential Revision: https://reviews.freebsd.org/D15284	2018-05-04 15:11:16 +00:00
Stephen Hurd	aa8a24d37f	Allow iflib NIC drivers to sleep rather than busy wait Since the move to SMP NIC driver locking has had to go through serious contortions using mtx around long running hardware operations. This moves iflib past that. Individual drivers may now sleep when appropriate. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14983	2018-05-03 17:02:31 +00:00
Stephen Hurd	f3e1324b41	Separate list manipulation locking from state change in multicast Multicast incorrectly calls in to drivers with a mutex held causing drivers to have to go through all manner of contortions to use a non sleepable lock. Serialize multicast updates instead. Submitted by: mmacy <mmacy@mattmacy.io> Reviewed by: shurd, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14969	2018-05-02 19:36:29 +00:00
Andrew Gallatin	f759470750	Fix iflib_encap() EFBIG handling bugs 1) Don't give up if m_collapse() fails. Rather than giving up, try m_defrag() immediately. 2) Fix a leak where, if the NIC driver rejected the defrag'ed chain as having too many segments, we would fail to free the chain. Reviewed by: Matthew Macy <mmacy@mattmacy.io> (this version of patch) Submitted by: Matthew Macy <mmacy@mattmacy.io> (early version of leak fix)	2018-04-30 23:53:27 +00:00
Hans Petter Selasky	4a381a9e42	Add network device event for priority code point, PCP, changes. When the PCP is changed for either a VLAN network interface or when prio tagging is enabled for a regular ethernet network interface, broadcast the IFNET_EVENT_PCP event so applications like ibcore can update its GID tables accordingly. MFC after: 3 days Reviewed by: ae, kib Differential Revision: https://reviews.freebsd.org/D15040 Sponsored by: Mellanox Technologies	2018-04-26 08:58:27 +00:00
Brooks Davis	3edb7f4eaf	Translate 32-bit ifmedia requests into native ones. We use transformation rather than accessors as virtually ever driver implements SIOCGIFMEDIA and all would have to be touched. Keep the code readable by always performing copies and (possiably no-op) transforms. Reviewed by: jhb, kib Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14996	2018-04-25 15:30:42 +00:00
Mark Johnston	8bd1f0cfd3	Use dead_bpf_if instead of bp_null. This fixes a -Wunused error when DEV_BPF and NETGRAPH_BPF are not defined. Also remove a stray semicolon added in r332812. X-MFC with: r332812	2018-04-24 17:42:25 +00:00
Brooks Davis	4204224162	Finish removing FDDI and tokenring media support. This fixes media display for 802.11 wireless devices. Software outside the base system that uses these media types and defines should use #ifdef IFM_FDDI or IFM_TOKEN to include or remove support. Reported by: zeising Reviewed by: emaste, kib, zeising Tested by: zeising Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15170	2018-04-23 21:10:33 +00:00
Andrey V. Elsukov	2b9600b449	Add dead_bpf_if structure, that should be used as fake bpf_if during ifnet detach. Since destroying interface is not atomic operation and due to the lack of synhronization during destroy, it is possible, that in the time between bpfdetach() and if_free() some queued on destroying interface mbuf will be used by ether_input_internal() and bpf_peers_present() can dereference NULL bpf_if pointer. To protect from this, assign pointer to empty bpf_if_ext structure instead of NULL pointer after bpfdetach(). Reviewed by: melifaro, eugen Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15083	2018-04-20 09:57:31 +00:00
Stephen Hurd	0b75ac7796	iflib: Fix queue distribution when there are no threads Previously, if there are no threads, all queues which targeted cores that share an L2 cache were bound to a single core. The intent is to distribute them across these cores. Reported by: olivier Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15120	2018-04-18 15:34:18 +00:00
Brooks Davis	3a4fc8a8a1	Remove support for the Arcnet protocol. While Arcnet has some continued deployment in industrial controls, the lack of drivers for any of the PCI, USB, or PCIe NICs on the market suggests such users aren't running FreeBSD. Evidence in the PR database suggests that the cm(4) driver (our sole Arcnet NIC) was broken in 5.0 and has not worked since. PR: 182297 Reviewed by: jhibbits, vangyzen Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15057	2018-04-13 21:18:04 +00:00
Sean Bruno	7b610b60df	Restore r332389 after resolution of locking fixes. Add one extra lock initialization to iflib_register() that was missed in the git<->phab conversion. Split out flag manipulation from general context manipulation in iflib To avoid blocking on the context lock in the swi thread and risk potential deadlocks, this change protects lighter weight updates that only need to be consistent with each other with their own lock. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14967	2018-04-12 14:35:37 +00:00
Vincenzo Maffione	2ff91c175e	netmap: align codebase to the current upstream (commit id 3fb001303718146) Changelist: - Turn tx_rings and rx_rings arrays into arrays of pointers to kring structs. This patch includes fixes for ixv, ixl, ix, re, cxgbe, iflib, vtnet and ptnet drivers to cope with the change. - Generalize the nm_config() callback to accept a struct containing many parameters. - Introduce NKR_FAKERING to support buffers sharing (used for netmap pipes) - Improved API for external VALE modules. - Various bug fixes and improvements to the netmap memory allocator, including support for externally (userspace) allocated memory. - Refactoring of netmap pipes: now linked rings share the same netmap buffers, with a separate set of kring pointers (rhead, rcur, rtail). Buffer swapping does not need to happen anymore. - Large refactoring of the control API towards an extensible solution; the goal is to allow the addition of more commands and extension of existing ones (with new options) without the need of hacks or the risk of running out of configuration space. A new NIOCCTRL ioctl has been added to handle all the requests of the new control API, which cover all the functionalities so far supported. The netmap API bumps from 11 to 12 with this patch. Full backward compatibility is provided for the old control command (NIOCREGIF), by means of a new netmap_legacy module. Many parts of the old netmap.h header has now been moved to netmap_legacy.h (included by netmap.h). Approved by: hrs (mentor)	2018-04-12 07:20:50 +00:00
Mateusz Guzik	66def52613	iflib: fix up a mismerge in r332419 Lead to crashes on boot while in ifconfig. Submitted by: Matthew Macy <mmacy@mattmacy.io>	2018-04-12 04:11:37 +00:00
Stephen Hurd	90d72813d2	Properly initialize ifc_nhwtxqs. Also, since ifc_nhwrxqs is only used in one place, remove it from the struct. This was preventing iflib_dma_free() from being called via iflib_device_detach(). Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks	2018-04-11 21:41:59 +00:00
Brooks Davis	0437c8e3b1	Remove support for FDDI networks. Defines in net/if_media.h remain in case code copied from ifconfig is in use elsewere (supporting non-existant media type is harmless). Reviewed by: kib, jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15017	2018-04-11 17:28:24 +00:00
Sean Bruno	7feb881918	Revert r332389 as it is causing panics for various users and we need to add some more test cases.	2018-04-11 17:26:53 +00:00
Stephen Hurd	5c1d8c4b73	Split out flag manipulation from general context manipulation in iflib To avoid blocking on the context lock in the swi thread and risk potential deadlocks, this change protects lighter weight updates that only need to be consistent with each other with their own lock. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14967	2018-04-10 19:48:24 +00:00
Stephen Hurd	f422673e10	Make BPF global lock an SX This allows NIC drivers to sleep on polling config operations. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14982	2018-04-10 19:42:50 +00:00
Vincenzo Maffione	4f80b14ce2	netmap: align codebase to upstream version v11.4 Changelist: - remove unused nkr_slot_flags - new nm_intr adapter callback to enable/disable interrupts - remove unused sysctls and document the other sysctls - new infrastructure to support NS_MOREFRAG for NIC ports - support for external memory allocator (for now linux-only), including linux-specific changes in common headers - optimizations within netmap pipes datapath - improvements on VALE control API - new nm_parse() helper function in netmap_user.h - various bug fixes and code clean up Approved by: hrs (mentor)	2018-04-09 09:24:26 +00:00
Brooks Davis	8a4a4a43f8	Remove the thread argument from ifr_buffer_*() accessors. They are always used in a context where curthread is the correct thread. This makes them more similar to the ifr_data_get_ptr() accessor.	2018-04-06 23:25:54 +00:00
Brooks Davis	e7fdc72e95	ifconf(): correct handling of sockaddrs smaller than struct sockaddr. Portable programs that use SIOCGIFCONF (e.g. traceroute) assume that each pseudo ifreq is of length MAX(sizeof(struct ifreq), sizeof(ifr_name) + ifr_addr.sa_len). For short sockaddrs we copied too much from the source sockaddr resulting in a heap leak. I believe only one such sockaddr exists (struct sockaddr_sco which is 8 bytes) and it is unclear if such sockaddrs end up on interfaces in practice. If it did, the result would be an 8 byte heap leak on current architectures. admbugs: 869 Reviewed by: kib Obtained from: CheriBSD MFC after: 3 days Security: kernel heap leak Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14981	2018-04-06 20:26:56 +00:00
Brooks Davis	6469bdcdb6	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
Kristof Provost	adfe2f6aff	pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS These ioctls can process a number of items at a time, which puts us at risk of overflow in mallocarray() and of impossibly large allocations even if we don't overflow. Limit the allocation to required size (or the user allocation, if that's smaller). That does mean we need to do the allocation with the rules lock held (so the number doesn't change while we're doing this), so it can't M_WAITOK. MFC after: 1 week	2018-04-06 15:54:30 +00:00
Brooks Davis	756181b8f5	Add 32-bit compat for ioctls that take struct ifgroupreq. Use an accessor to access ifgr_group and ifgr_groups. Use an macro CASE_IOC_IFGROUPREQ(cmd) in place of case statements such as "case SIOCAIFGROUP:". This avoids poluting the switch statements with large numbers of #ifdefs. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14960	2018-04-05 22:14:55 +00:00
Brooks Davis	2443045f30	ifconf(): Always zero the whole struct ifreq. The previous split of zeroing ifr_name and ifr_addr seperately is safe on current architectures, but would be unsafe if pointers were larger than 8 bytes. Combining the zeroing adds no real cost (a few instructions) and makes the security property easier to verify. Reviewed by: kib, emaste Obtained from: CheriBSD MFC after: 3 days Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14912	2018-04-05 21:58:28 +00:00
Vincenzo Maffione	46023447b6	netmap: align if_ptnet guest driver to the upstream code (commit 0e15788) The change upgrades the driver to use the split Communication Status Block (CSB) format. In this way the variables written by the guest and read by the host are allocated in a different cacheline than the variables written by the host and read by the guest; this is needed to avoid cache thrashing. Approved by: hrs (mentor)	2018-04-04 21:31:12 +00:00
Brooks Davis	8708f1bdaf	Document and enforce assumptions about struct (in6_)ifreq. - The two types must be type-punnable for shared members of ifr_ifru. This allows compatibility accessors to be shared. - There must be no padding gap between ifr_name and ifr_ifru. This is assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be broadly portable. This is true for all current architectures, but very large (256-bit) fat-pointers could violate this invariant. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14910	2018-03-30 21:38:53 +00:00
Brooks Davis	541d96aaaf	Use an accessor function to access ifr_data. This fixes 32-bit compat (no ioctl command defintions are required as struct ifreq is the same size). This is believed to be sufficent to fully support ifconfig on 32-bit systems. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14900	2018-03-30 18:50:13 +00:00
Brooks Davis	69f0fecbd6	Remove infrastructure for token-ring networks. Reviewed by: cem, imp, jhb, jmallett Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14875	2018-03-28 23:33:26 +00:00
Brooks Davis	38d958a647	Improve copy-and-pasted versions of SIOCGIFADDR. The original implementation used a reference to ifr_data and a cast to do the equivalent of accessing ifr_addr. This was copied multiple times since 1996. Approved by: kib MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14873	2018-03-27 20:51:49 +00:00
Brooks Davis	f8f65519d2	Fix a whitespace bug missed in refactoring prior to r331641. MFC with: r331641	2018-03-27 18:55:39 +00:00
Brooks Davis	86d2ef167a	Fix access to ifru_buffer on freebsd32. Make all kernel accesses to ifru_buffer go via access functions which take the process ABI into account and use an appropriate union to access members in the correct place in struct ifreq. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14846	2018-03-27 18:26:50 +00:00
Konstantin Belousov	f137973487	Allow to specify PCP on packets not belonging to any VLAN. According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be considered as untagged, and only PCP and DEI values from the VLAN tag are meaningful. See for instance https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html. Make it possible to specify PCP value for outgoing packets on an ethernet interface. When PCP is supplied, the tag is appended, VLAN id set to 0, and PCP is filled by the supplied value. The code to do VLAN tag encapsulation is refactored from the if_vlan.c and moved into if_ethersubr.c. Drivers might have issues with filtering VID 0 packets on receive. This bug should be fixed for each driver. Reviewed by: ae (previous version), hselasky, melifaro Sponsored by: Mellanox Technologies MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D14702	2018-03-27 15:29:32 +00:00
Mark Johnston	18628b74bd	Clamp IFLIB_RX_COPY_THRESH to MHLEN in iflib_rxd_pkt_get(). If one has added fields to struct mbuf such that MHLEN is smaller than this threshold (128), iflib_rxd_pkt_get() may otherwise overrun the internal mbuf buffer while copying. Reviewed by: mmacy MFC after: 3 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14843	2018-03-25 23:23:19 +00:00
Kristof Provost	effaab8861	netpfil: Introduce PFIL_FWD flag Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715	2018-03-23 16:56:44 +00:00
Alexander V. Chernikov	b2b7ca49dc	Use count(9) api for the bpf(4) statistics. Currently each bfp descriptor uses u64 variables to maintain its counters. On interfaces with high packet rate this leads to unnecessary contention and inaccurate reporting. PR: kern/205320 Reported by: elofu17 at hotmail.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D14726	2018-03-20 22:57:06 +00:00
Alexander V. Chernikov	1435dcd94f	Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration. Current arp/nd code relies on the feedback from the datapath indicating that the entry is still used. This mechanism is incorporated into the arpresolve()/nd6_resolve() routines. After the inpcb route cache introduction, the packet path for the locally-originated packets changed, passing cached lle pointer to the ether_output() directly. This resulted in the arp/ndp entry expire each time exactly after the configured max_age interval. During the small window between the ARP/NDP request and reply from the router, most of the packets got lost. Fix this behaviour by plugging datapath notification code to the packet path used by route cache. Unify the notification code by using single inlined function with the per-AF callbacks. Reported by: sthaug at nethelp.no Reviewed by: ae MFC after: 2 weeks	2018-03-17 17:05:48 +00:00
Andriy Voskoboinyk	2a440d19c1	Correct comment for IFM_IEEE80211_VHT media variant.	2018-03-15 23:32:29 +00:00
Andrey V. Elsukov	6d5948bbe3	Define ethernet type 0x88A8 as ETHERTYPE_QINQ. Reviewed by: kp Obtained from: OpenBSD MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D14593	2018-03-06 12:01:31 +00:00
Stephen Hurd	226fb85d19	iflib: stop timer callout when stopping iflib_timer has been seen running after the interface had been removed. This change prevents that. Submitted by: matt.macy@joyent.com	2018-03-02 18:48:07 +00:00
Kristof Provost	bf56a3fe47	pf: Cope with overly large net.pf.states_hashsize If the user configures a states_hashsize or source_nodes_hashsize value we may not have enough memory to allocate this. This used to lock up pf, because these allocations used M_WAITOK. Cope with this by attempting the allocation with M_NOWAIT and falling back to the default sizes (with M_WAITOK) if these fail. PR: 209475 Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com> MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14367	2018-02-25 08:56:44 +00:00
Ryan Stone	b3b6ff23e7	Allow route change requests to not specify the gateway. Only require a gateway to be specified on a route add request. On a route change request that does not specify the gateway, the gateway will remain the same. This allows changing other route parameters without having to re-specifying the gateway, like in "route change 10.0.0.0/8 -mtu 9000". Update the route(8) manpage to explicitly call out this usage as being supported. MFC after: 2 weeks Sponsored by: Dell EMC Isilon Reviewed By: eugen (rtsock.c change), rgrimes Differential Revision: https://reviews.freebsd.org/D14291	2018-02-21 19:13:23 +00:00
Stephen Hurd	81ad57b1b8	IFLIB: Make isc_magic unsigned The IFLIB_MAGIC macro is > INT_MAX, so isc_magic should be able to contain it. Reported by: jeb Sponsored by: Limelight Networks	2018-02-21 18:57:00 +00:00
Navdeep Parhar	7cb7c6e37a	Catch up with the removal of nktr_slot_flags from upstream netmap. No functional impact intended. Submitted by: Vincenzo Maffione <v.maffione@gmail.com>	2018-02-20 21:42:45 +00:00
Stephen Hurd	a4e5960730	IFLIB: do not remove dmamap on buffer unload Dmamap is created only on IFC attach. If we remove it on buffer release, we won't be able to do ifconfig down&up. Only destroy when in detach. Reported by: wma Reviewed by: wma Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14060	2018-02-20 18:33:45 +00:00
Wojciech Macek	74549d4b0f	BPF: Switch to 32 bit compatible mode only when thread is 32 bit Sometimes 32 bit and 64 bit ioctls are represented by the same number. It causes unnecessary switch to 32 bit commpatible mode. This patch prevents switching when we are dealing with 64 bit executable. It fixes issue mentioned here Authored by: Patryk Duda <pdk@semihalf.com> Submitted by: Wojciech Macek <wma@semihalf.com> Reviewed by: andrew, wma Obtained from: Semihalf Sponsored by: IBM, QCM Technologies Differential revision: https://reviews.freebsd.org/D14023	2018-01-25 12:13:41 +00:00
Steven Hartland	6fb1399a4c	Added missing CTLFLAG_VNET to lacp default_strict_mode Added CTLFLAG_VNET to net.link.lagg.lacp.default_strict_mode which was missed in r290450. Reported by: julian@ MFC after: 1 week Sponsored by: Multiplay	2018-01-24 10:13:14 +00:00
Ryan Stone	bc3d87fd59	Increment the route table gen count after a modify Increment the route table generation count after modifying a route. This signals back to TCP connections that they need to update their L2 caches as the gateway for their route may have changed. This is a heavier hammer than is needed, strictly speaking, but route changes will be unlikely enough that the performance effects of invalidating all connection route caches should be negligible. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13990 Reviewed by: karels	2018-01-23 03:15:44 +00:00
Ryan Stone	fc21c53f63	Reduce code duplication for inpcb route caching Add a new macro to clear both the L3 and L2 route caches, to hopefully prevent future instances where only the L3 cache was cleared when both should have been. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13989 Reviewed by: karels	2018-01-23 03:15:39 +00:00
Ryan Stone	dbeab32f94	Invalidate inpcb LLE cache if cached route is invalidated When the inpcb route cache is invalidated after a change to the routing tables, we need to invalidate the LLE cache as well. Previous to this change packets for the connection would continue to use the old L2 information from the old L3 gateway, and the packets for the connection would likely be blackholed. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13988 Reviewed by: karels	2018-01-23 03:15:35 +00:00
Konstantin Belousov	279e33d489	Fix compat32 for sysctl net.PF_ROUTE...NET_RT_IFLISTL. Route messages are aligned to the host long type alignment, which breaks 32bit. Reported and tested by: lwhsu Diagnosed by: Yuri Pankov <yuripv@icloud.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-22 20:49:17 +00:00
Pedro F. Giffuni	ac2fffa4b7	Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9). The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler. Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level. Reported by: wosch PR: 225197	2018-01-21 15:42:36 +00:00
Pedro F. Giffuni	443133416b	net*: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these ire likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the sorrounding code could handle NULL values. X-Differential revision: https://reviews.freebsd.org/D13837	2018-01-15 21:21:51 +00:00
Steven Hartland	fd3bb7aa46	Disabled the use of flowid for lagg by default Disabled the use of RSS hash from the network card aka flowid for lagg(4) interfaces by default as it's currently incompatible with the lacp and loadbalance protocols. The incompatibility is due to the fact that the flowid isn't know for the first packet of a new outbound stream which can result in the hash calculation method changing and hence a stream being incorrectly split across multiple interfaces during normal operation. This can be re-enabled by setting the following in loader.conf: net.link.lagg.default_use_flowid="1" Discussed with: kmacy Sponsored by: Multiplay	2018-01-04 20:05:47 +00:00
Kristof Provost	5d0020d6d7	pf: Clean all fragments on shutdown When pf is unloaded, or a vnet jail using pf is stopped we need to ensure we clean up all fragments, not just the expired ones.	2017-12-31 10:01:31 +00:00
Bryan Venteicher	ac2b436d20	Add macro for vxlan list mutex lock and unlock This will simplify some later VNET support. Submitted by: hrs MFC after: 2 weeks	2017-12-30 19:49:40 +00:00
Bryan Venteicher	6d7bc5838b	Advertise IFCAP_LINKSTAT after r326480 added link status support MFC after: 2 weeks	2017-12-30 19:35:12 +00:00
Bryan Venteicher	33e0d8f057	Add support for IPv6 scoped addresses to vxlan MFC after: 2 weeks	2017-12-30 04:03:53 +00:00
Stephen Hurd	9c58cafaa3	Don't pass rids to taskqgroup_attach() As everywhere else, we want to pass rman_get_start(irq->ii_res). This caused set affinity errors when not using MSI-X vectors (legacy and MSI interrupts). Reported by: sbruno Sponsored by: Limelight Networks	2017-12-27 20:42:30 +00:00
Stephen Hurd	ca03863c85	Remove assertion that's not true for !EARLY_AP_STARTUP gtask->gt_taskqueue is NULL when EARLY_AP_STARTUP is not enabled. Remove assertion to allow this config to work. Reported by: oleg Sponsored by: Limelight Networks	2017-12-27 19:14:15 +00:00
Stephen Hurd	de13095409	Fix indentation. Sponsored by: Limelight Networks	2017-12-27 19:12:32 +00:00
Eitan Adler	caa7e52f3f	kernel: Fix several typos and minor errors - duplicate words - typos - references to old versions of FreeBSD Reviewed by: imp, benno	2017-12-27 03:23:21 +00:00
Alexander Kabaev	151ba7933a	Do pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385	2017-12-25 04:48:39 +00:00
Alexander Kabaev	92f19df431	Do not pass NULL pointer to copyout in if_clone_list. Sometimes caller is only interested in how many clones are there and NULL pointer is passed for the destination buffer. Do not pass it to copyout then.	2017-12-23 16:45:24 +00:00
Alexander Kabaev	ce4ab99d82	Remove some trailing whitespace. Reviewed by: glebius, ae Differential Revision: https://reviews.freebsd.org/D10386	2017-12-23 16:24:00 +00:00
Alexander Kabaev	3395dd6eb8	Do not double free the memory in if_clone. if_clone_attach function will drop the reference on failure which will free the if_clone structure. No need to do it second time. Reviewed by: glebius, ae Differential Revision: https://reviews.freebsd.org/D10386	2017-12-23 16:23:58 +00:00
Warner Losh	79554b4049	The device tables end with a sentinel in iflib. Don't include the sentinel in the output.	2017-12-23 04:50:52 +00:00
Warner Losh	d2064cf030	Use '#' rather than some made up name for fields we want to ignore.	2017-12-22 17:53:27 +00:00
Konstantin Belousov	97755e83f5	Fix build for kernels with SCHED_4BSD. Sponsored by: The FreeBSD Foundation	2017-12-21 23:05:13 +00:00
Stephen Hurd	25ac1dd5c7	Don't call tcp_lro_rx() unless hardware verified TCP/UDP csum It seems that tcp_lro_rx() doesn't verify TCP checksums, so if there are bad checksums in the packets caused by invalid data, the invalid data will pass through without errors. This was noticed with the igb driver and a specific internet host: fetch http://www.mpfr.org/mpfr-current/mpfr-3.1.6.tar.xz -o test.bin && sha256 test.bin Would result in a different value sometimes. This ends up making LRO require RXCSUM to be enabled, and RXCSUM to support TCP and UDP checksums. PR: 224346 Reported by: gjb Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D13561	2017-12-21 01:22:36 +00:00
Li-Wen Hsu	40cf51c438	Add missing `;` Approved by: kevlo	2017-12-20 06:08:16 +00:00

... 2 3 4 5 6 ...

4176 Commits