freebsd-skq

Author	SHA1	Message	Date
mmacy	b9a21f35a8	fix copy/paste error when clearing ifma flag CID: 1395119 Reported by: vangyzen	2018-08-21 22:59:22 +00:00
cem	d70d723ffc	Back out r338035 until Warner is finished churning GSoC PNP patches I was not aware Warner was making or planning to make forward progress in this area and have since been informed of that. It's easy to apply/reapply when churn dies down.	2018-08-19 00:46:22 +00:00
cem	3d8ae7a0f4	Remove unused and easy to misuse PNP macro parameter Inspired by r338025, just remove the element size parameter to the MODULE_PNP_INFO macro entirely. The 'table' parameter is now required to have correct pointer (or array) type. Since all invocations of the macro already had this property and the emitted PNP data continues to include the element size, there is no functional change. Mostly done with the coccinelle 'spatch' tool: $ cat modpnpsize0.cocci @normaltables@ identifier b,c; expression a,d,e; declarer MODULE_PNP_INFO; @@ MODULE_PNP_INFO(a,b,c,d, -sizeof(d[0]), e); @singletons@ identifier b,c,d; expression a; declarer MODULE_PNP_INFO; @@ MODULE_PNP_INFO(a,b,c,&d, -sizeof(d), 1); $ rg -l MODULE_PNP_INFO -- sys \| \ xargs spatch --in-place --sp-file modpnpsize0.cocci (Note that coccinelle invokes diff(1) via a PATH search and expects diff to tolerate the -B flag, which BSD diff does not. So I had to link gdiff into PATH as diff to use spatch.) Tinderbox'd (-DMAKE_JUST_KERNELS).	2018-08-19 00:22:21 +00:00
np	6e862a5f4b	if_vlan(4): A VLAN always has a PCP and its ifnet's if_pcp should be set to the PCP value in use instead of IFNET_PCP_NONE. MFC after: 1 week Sponsored by: Chelsio Communications	2018-08-17 01:03:23 +00:00
np	06d6f82b42	Add the ability to look up the 3b PCP of a VLAN interface. Use it in toe_l2_resolve to fill up the complete vtag and not just the vid. Reviewed by: kib@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D16752	2018-08-16 23:46:38 +00:00
mmacy	99cec0a00c	Fix in6_multi double free This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast address has to be removed from interface lists when the refcount goes to zero OR when the interface goes away - decouple list disconnect from refcount (v6 only for now) - ifmultiaddr can exist past being on interface lists - add flag for tracking whether or not it's enqueued - deferring freeing moptions makes the incpb cleanup code simpler but opens the door wider still to races - call inp_gcmoptions synchronously after dropping the the inpcb lock Fundamentally multicast needs a rewrite - but keep applying band-aids for now. Tested by: kp Reported by: novel, kp, lwhsu	2018-08-15 20:23:08 +00:00
gallatin	aeea8c4eef	lagg: allow lacp to manage the link state Lacp needs to manage the link state itself. Unlike other lagg protocols, the ability of lacp to pass traffic depends not only on the lagg members having link, but also on the lacp protocol converging to a distributing state with the link partner. If we prematurely mark the link as up, then we will send a gratuitous arp (via arp_handle_ifllchange()) before the lacp interface is capable of passing traffic. When this happens, the gratuitous arp is lost, and our link partner may cache a stale mac address (eg, when the base mac address for the lagg bundle changes, due to a BIOS change re-ordering NIC unit numbers) Reviewed by: jtl, hselasky Sponsored by: Netflix	2018-08-13 14:13:25 +00:00
kp	7016cbb5d6	pf: Increase default hash table size Now that we (by default) limit the number of states to 100.000 it makse sense to also adjust the default size of the hash table. Based on the benchmarking results in https://github.com/ocochard/netbenches/blob/master/Atom_C2758_8Cores-Chelsio_T540-CR/pf-states_hashsize/results/fbsd12-head.r332390/README.md 128K entries offers a good compromise between performance and memory use. Users may still overrule this setting with the net.pf.states_hashsize and net.pf.source_nodes_hashsize loader(8) tunables.	2018-08-05 13:54:37 +00:00
pkelsey	10742aaed6	Mark the send queue ready so ALTQ is available.	2018-08-04 01:45:17 +00:00
andrew	081aff6081	As with DPCPU_DEFINE_STATIC make VNET_DEFINE_STATIC non-static on arm64 in modules. It also fails in the same way, we are unable to relocate static variables as the compiler uses PC-relative loads with nothing for the kernel linker to relocate. Sponsored by: DARPA, AFRL	2018-07-30 15:05:07 +00:00
andrew	537fdde573	Ensure the DPCPU and VNET module spaces are aligned to hold a pointer. Previously they may have been aligned to a char, leading to misaligned DPCPU and VNET variables. Sponsored by: DARPA, AFRL	2018-07-30 14:25:17 +00:00
andrew	6785775244	As with DPCPU_DEFINE make it a compile error to use static with VNET_DEFINE. There is the VNET_DEFINE_STATIC macro for that.	2018-07-30 12:44:44 +00:00
pkelsey	06ba49246a	ALTQ support for iflib. Reviewed by: jmallett, mmacy Differential Revision: https://reviews.freebsd.org/D16433	2018-07-25 22:46:36 +00:00
marius	2e3a85c261	Since r336611, n is only used for INET in iflib_parse_header(). Reported by: rpokala	2018-07-24 23:40:27 +00:00
andrew	a6605d2938	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
andrew	65d35e69cb	As with DPCPU create VNET_DEFINE_STATIC for when a variable needs to be declaired static. This will allow us to change the definition on arm64 as it has the same issues described in r336349. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:31:16 +00:00
eugen	4596989e7e	epair(4): make sure we do not duplicate MAC addresses in case of reused if_index. PR: 229957 Tested by: O. Hartmann <ohartmann@walstatt.org> Approved by: avg (mentor)	2018-07-23 07:11:58 +00:00
marius	9c190e8f72	Use the maximum of isc_tx_{nsegments,tso_segments_max} for MAX_TX_DESC. Since r336313, TSO support for LEM-class devices is removed again as it was before the conversion of {l,}em(4) to iflib(4) in r311849 and as a result, isc_tx_tso_segments_max is 0 for LEM-class devices now. Thus, inappropriate watermarks were used for this class. This is really only a band-aid, though, because so far iflib(9) doesn't fully take into account that DMA engines can support different maxima of segments for transfers of TSO and non-TSO packets. For example, the DESC_RECLAIMABLE macro is based on isc_tx_nsegments while MAX_TX_DESC used isc_tx_tso_segments_max only. For most in-tree consumers that doesn't make a difference as the maxima are the same for both kinds of transfers (that is, apart from the fact that TSO may require up to 2 sentinel descriptors but also not with every MAC supported). However, isc_tx_nsegments is 8 but isc_tx_tso_segments_max is 85 by default with ixl(4).	2018-07-22 17:51:11 +00:00
marius	8ef4610a11	- Given that the controlling expression of the receive loop in iflib_rxeof() tests for avail > 0, avail can never be 0 within that loop. Thus, move decrementing avail and budget_left into the loop and before the code which checks for additional descriptors having become available in case all the previous ones have been processed but there still is budget left so the latter code works as expected. [1] - In iflib_{busdma_load_mbuf_sg,parse_header}(), remove dead stores to m and n respectively. [2, 3] - In collapse_pkthdr(), ensure that m_next isn't NULL before dereferencing it. [4] - Remove a duplicate assignment of segs in iflib_encap(). Reported by: Coverity CID: 1356027 [1], 1356047 [2], 1368205 [3], 1356028 [4]	2018-07-22 17:45:44 +00:00
shurd	06b406febd	Add knob to control tx ring abdication. r323954 changed the mp ring behaviour when 64-bit atomics were available to abdicate the TX ring rather than having one become a consumer thereby running to completion on TX. The consumer of the mp ring was then triggered in the tx task rather than blocking the TX call. While this significantly lowered the number of RX drops in small-packet forwarding, it also negatively impacts TX performance. With this change, the default behaviour is reverted, causing one TX ring to become a consumer during the enqueue call. A new sysctl, dev.X.Y.iflib.tx_abdicate is added to control this behaviour. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16302	2018-07-20 17:45:26 +00:00
shurd	4db9126b14	Improve netmap TX handling when TX IRQs are not used/supported Use the timer to poll for TX completions when there are outstanding TX slots. Track when the last driver timer was called to prevent overcalling it. Also clean up some kring vs NIC ring usage. Reviewed by: marius, Johannes Lundberg <johalun0@gmail.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16300	2018-07-20 17:24:45 +00:00
ae	d94c744a40	Move invoking of callout_stop(&lle->lle_timer) into llentry_free(). This deduplicates the code a bit, and also implicitly adds missing callout_stop() to in[6]_lltable_delete_entry() functions. PR: 209682, 225927 Submitted by: hselasky (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4605	2018-07-17 11:33:23 +00:00
marius	eeec306a59	Assorted TSO fixes for em(4)/iflib(9) and dead code removal: - Ever since the workaround for the silicon bug of TSO4 causing MAC hangs was committed in r295133, CSUM_TSO always got disabled unconditionally by em(4) on the first invocation of em_init_locked(). However, even with that problem fixed, it turned out that for at least e. g. 82579 not all necessary TSO workarounds are in place, still causing MAC hangs even at Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled in r323292 (r323293 for stable/10) for the EM-class by default, allowing users to turn it on if it happens to work with their particular EM MAC in a Gigabit-only environment. In head, the TSO workaround for speeds other than Gigabit was lost with the conversion to iflib(9) in r311849 (possibly along with another one or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4 got enabled by default again, causing device hangs. Therefore, change the default for this hardware class back to have TSO4 off, allowing users to turn it on manually if it happens to work in their environment as we do in stable/{10,11}. An alternative would be to add a whitelist of EM-class devices where TSO4 actually is reliable with the workarounds in place, but given that the advantage of TSO at Gigabit speed is rather limited - especially with the overhead of these workarounds -, that's really not worth it. [1] This change includes the addition of an isc_capabilities to struct if_softc_ctx so iflib(9) can also handle interface capabilities that shouldn't be enabled by default which is used to handle the default-off capabilities of e1000 as suggested by shurd@ and moving their handling from em_setup_interface() to em_if_attach_pre() accordingly. - Although 82543 support TSO4 in theory, the former lem(4) didn't have support for TSO4, presumably because TSO4 is even more broken in the LEM-class of MACs than the later EM ones. Still, TSO4 for LEM-class devices was enabled as part of the conversion to iflib(9) in r311849, causing device hangs. So revert back to the pre-r311849 behavior of not supporting TSO4 for LEM-class at all, which includes not creating a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2] - In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET (65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO DMA must have a maxsize of the maximum TSO size plus the size of a VLAN header for software VLAN tagging. The iflib(9) converted em(4), thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET in em_setup_interface() (apparently, left-over from pre-iflib(9) times). So remove the later and correct iflib(9) to correctly cap the maximum TSO size reported to the stack at IP_MAXPACKET. While at it, let iflib(9) use if_sethwtsomax(). This change includes the addition of isc_tso_max{seg,}size DMA engine constraints for the TSO DMA tag to struct if_shared_ctx and letting iflib_txsd_alloc() automatically adjust the maxsize of that tag in case IFCAP_VLAN_MTU is supported as requested by shurd@. - Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9) so adjustment is automatically done in case IFCAP_VLAN_MTU is supported. As a consequence, this adjustment now is also done in case of bnxt(4) which missed it previously. - Move the reduction of the maximum TSO segment count reported to the stack by the number of m_pullup(9) calls (which in the worst case, can add another mbuf and, thus, the requirement for another DMA segment each) in the transmit path for performance reasons from em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now done in iflib_parse_header() rather than in the no longer existing em_xmit(). Moreover, this optimization applies to all drivers using iflib(9) and not just em(4); all in-tree iflib(9) consumers still have enough room to handle full size TSO packets. Also, reduce the adjustment to the maximum number of m_pullup(9)'s now performed in iflib_parse_header(). - Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9) in r311849 and r335338 respectively, these drivers didn't enable IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed through by lagg(4). With iflib(9), IFCAP_VLAN_HWFILTER was turned on by default but also lagg(4) was fixed in that regard in r203548. So just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling in {em,ixl,ixlv}_setup_interface(). - Nuke other redundant IFCAP_ setting in {em,ixl,ixlv}_setup_interface() which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre() now. - Remove some redundant/dead setting of scctx->isc_tx_csum_flags in em_if_attach_pre(). - Remove some IFCAP_* duplicated either directly or indirectly (e. g. via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS. - Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as iflib(9) adds that capability unconditionally. - Remove some unused macros from em(4). - Bump __FreeBSD_version as some of the above changes require the modules of drivers using iflib(9) to be recompiled. Okayed by: sbruno@ at 201806 DevSummit Transport Working Group [1] Reviewed by: sbruno (earlier version), erj PR: 219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2] Differential Revision: https://reviews.freebsd.org/D15720	2018-07-15 19:04:23 +00:00
kp	21b4f170cf	pf: Fix typo in r336221 Reported by: olivier@	2018-07-12 18:07:28 +00:00
kp	f5bc1a9c7b	pf: Increate default state table size The typical system now has a lot more memory than when pf was new, and is also expected to handle more connections. Increase the default size of the state table. Note that users can overrule this using 'set limit states' in pf.conf. From OpenBSD: The year is 2018. Mercury, Bowie, Cash, Motorola and DEC all left us. Just pf still has a default state table limit of 10000. Had! Now it's a tiny little bit more, 100k. lead guitar: me ok chorus: phessler theo claudio benno background school girl laughing: bob Obtained from: OpenBSD	2018-07-12 16:35:35 +00:00
ae	19e11c571f	Deduplicate the code. Add generic function if_tunnel_check_nesting() that does check for allowed nesting level for tunneling interfaces and also does loop detection. Use it in gif(4), gre(4) and me(4) interfaces. Differential Revision: https://reviews.freebsd.org/D16162	2018-07-09 11:03:28 +00:00
sbruno	3886e43127	struct ifmediareq *ifmrp is only used in the COMPAT_FREEBSD32 parts of ifioctl(). Move it inside the proper #ifdef. This was throwing a valid "Assigned but unused" warning with gcc. Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16063	2018-07-07 13:35:06 +00:00
will	5ce23703c1	Revert r335833. Several third-parties use at least some of these ioctls. While it would be better for regression testing if they were used in base (or at least in the test suite), it's currently not worth the trouble to push through removal. Submitted by: antoine, markj	2018-07-04 03:36:46 +00:00
mmacy	14de8a2820	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066	2018-07-04 02:47:16 +00:00
will	af6017a22f	pf: remove unused ioctls. Several ioctls are unused in pf, in the sense that no base utility references them. Additionally, a cursory review of pf-based ports indicates they're not used elsewhere either. Some of them have been unused since the original import. As far as I can tell, they're also unused in OpenBSD. Finally, removing this code removes the need for future pf work to take them into account. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D16076	2018-07-01 01:16:03 +00:00
ae	fd52110019	Add NULL pointer check. encap_lookup_t method can be invoked by IP encap subsytem even if none of gif/gre/me interfaces are exist. Hash tables are allocated on demand, when first interface is created. So, make NULL pointer check before doing access to hash table. PR: 229378	2018-06-28 11:39:27 +00:00
ae	377f86ae2b	Move BPFIF_* macro definitions into .c file, where struct bpf_if is declared. They are only used in this file and there is no need to export them via bpfdesc.h.	2018-06-19 10:34:45 +00:00
erj	a5400f53b1	iflib: Style fixes MFC after: 1 week	2018-06-18 17:27:43 +00:00
marius	6802d9dfbc	Assorted fixes to MSI-X/MSI/INTx setup in iflib(9): - In iflib_msix_init(), VMMs with broken MSI-X activation are trying to be worked around by manually enabling PCIM_MSIXCTRL_MSIX_ENABLE before calling pci_alloc_msix(9). Apart from constituting a layering violation, this has the problem of leaving PCIM_MSIXCTRL_MSIX_ENABLE enabled when falling back to MSI or INTx when e. g. MSI-X is black- listed and initially also when disabled via hw.pci.enable_msix. The later in turn was incorrectly worked around in r325166. Since r310806, pci(4) itself has code to deal with broken MSI-X handling of VMMs, so all of these workarounds in iflib(9) can go, fixing non-working interrupts when falling back to MSI/INTx. In any case, possibly further adjustments to broken MSI-X activation of VMMs like enabling r310806 by default in VM environments need to be placed into pci(4), not iflib(9). [1] - Also remove the pci_enable_busmaster(9) call from iflib_msix_init(), which is already more properly invoked from iflib_device_attach(). - When falling back to MSI/INTx, release the MSI-X BAR resource again. - When falling back to INTx, ensure scctx->isc_vectors is set to 1 and not to something higher from a device with more than one MSI message supported. - Make the nearby ring_state(s) stuff (static) const. Discussed with: jhb at BSDCan 2018 [1] Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D15729	2018-06-17 20:33:02 +00:00
ae	3d1b3c6fd6	Fix typo. Reported by: rpokala	2018-06-16 19:21:09 +00:00
ae	a58623ba71	Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9). Using of rwlock with multiqueue NICs for IP forwarding on high pps produces high lock contention and inefficient. Rmlock fits better for such workloads. Reviewed by: melifaro, olivier Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15789	2018-06-16 08:26:23 +00:00
ae	b4d3b30b6c	Add missing BPF_MTAP2() for outbound packets.	2018-06-14 15:04:30 +00:00
ae	8020fef9d7	Convert if_me(4) driver to use encap_lookup_t method and be lockless on data path.	2018-06-14 14:53:24 +00:00
jtl	8222f5cb7c	Make UMA and malloc(9) return non-executable memory in most cases. Most kernel memory that is allocated after boot does not need to be executable. There are a few exceptions. For example, kernel modules do need executable memory, but they don't use UMA or malloc(9). The BPF JIT compiler also needs executable memory and did use malloc(9) until r317072. (Note that a side effect of r316767 was that the "small allocation" path in UMA on amd64 already returned non-executable memory. This meant that some calls to malloc(9) or the UMA zone(9) allocator could return executable memory, while others could return non-executable memory. This change makes the behavior consistent.) This change makes malloc(9) return non-executable memory unless the new M_EXEC flag is specified. After this change, the UMA zone(9) allocator will always return non-executable memory, and a KASSERT will catch attempts to use the M_EXEC flag to allocate executable memory using uma_zalloc() or its variants. Allocations that do need executable memory have various choices. They may use the M_EXEC flag to malloc(9), or they may use a different VM interfact to obtain executable pages. Now that malloc(9) again allows executable allocations, this change also reverts most of r317072. PR: 228927 Reviewed by: alc, kib, markj, jhb (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D15691	2018-06-13 17:04:41 +00:00
ae	e6c79fbed1	Rework if_gre(4) to use encap_lookup_t method to speedup lookup of needed interface when many gre interfaces are present. Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speedup lookup of needed softc.	2018-06-13 11:11:33 +00:00
jtl	a0a72815a8	Fix a memory leak for the BIOCSETWF ioctl on kernels with the BPF_JITTER option. The BPF code was creating a compiled filter in the common filter-creation path. However, BPF only uses compiled filters in the read direction. When creating a write filter, the common filter-creation code was creating an unneeded write filter and leaking the memory used for that. MFC after: 2 weeks Sponsored by: Netflix	2018-06-11 23:32:06 +00:00
ae	4764682061	Explicitly change the link state when we assingn an address. Since we are setting IFF_UP flag on SIOCSIFADDR, it is possible, that after this link state information still not initialized properly. This leads to problems with routing, since now interface has IFCAP_LINKSTATE capability and a route is considered as working only when interface's link state is in LINK_STATE_UP (see RT_LINK_IS_UP() macro). Reported by: Marek Zarychta MFC after: 3 days	2018-06-09 09:57:14 +00:00
shurd	f7f3ce47d0	Remove tx task spinning added in r333686 This caused issues with PASTE. Just remove the reschedule since the DELAY() should be enough for use cases such as pkt-gen which were failing before the change. Reported by: Michio Honda Sponsored by: Limelight Networks	2018-06-08 21:49:19 +00:00
mjg	08fabf55c9	uma: fix up r334824 Turns out there is code which ends up passing M_ZERO to counters. Since counters zero unconditionally on their own, just ignore drop the flag in that place.	2018-06-08 05:40:36 +00:00
mmacy	69a922f7ab	rtentry_zinit: don't blindly pass through M_ZERO to counter alloc	2018-06-08 05:17:06 +00:00
erj	0ac17051d5	iflib: Record TCP checksum info in iflib when TCP checksum is requested ixl(4) (when it switches over to using iflib) devices need the TCP header length in order to do TCP checksum offload. Reviewed by: gallatin@, shurd@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15558	2018-06-07 13:03:07 +00:00
ae	d1ee857bcf	Rework if_gif(4) to use new encap_lookup_t method to speedup lookup of needed interface when many gif interfaces are present. Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speedup lookup of needed softc. Interfaces with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST. Sysctl net.link.gif.parallel_tunnels is removed. The removal was planed 16 years ago, and actually it could work only for outbound direction. Each protocol, that can be handled by if_gif(4) interface is registered by separate encap handler, this helps avoid invoking the handler for unrelated protocols (GRE, PIM, etc.). This change allows dramatically improve performance when many gif(4) interfaces are used. Sponsored by: Yandex LLC	2018-06-05 21:24:59 +00:00
ae	dfbd18b5fe	Rework IP encapsulation handling code. Currently it has several disadvantages: - it uses single mutex to protect internal structures. It is used by data- and control- path, thus there are no parallelism at all. - it uses single list to keep encap handlers for both INET and INET6 families. - struct encaptab keeps unneeded information (src, dst, masks, protosw), that isn't used by code in the source tree. - matches are prioritized and when many tunneling interfaces are registered, encapcheck handler of each interface is invoked for each packet. The search takes O(n) for n interfaces. All this work is done with exclusive lock held. What this patch includes: - the datapath is converted to be lockless using epoch(9) KPI. - struct encaptab now linked using CK_LIST. - all unused fields removed from struct encaptab. Several new fields addedr: min_length is the minimum packet length, that encapsulation handler expects to see; exact_match is maximum number of bits, that can return an encapsulation handler, when it wants to consume a packet. - IPv6 and IPv4 handlers are stored in separate lists; - added new "encap_lookup_t" method, that will be used later. It is targeted to speedup lookup of needed interface, when gif(4)/gre(4) have many interfaces. - the need to use protosw structure is eliminated. The only pr_input method was used from this structure, so I don't see the need to keep using it. - encap_input_t method changed to avoid using mbuf tags to store softc pointer. Now it is passed directly trough encap_input_t method. encap_getarg() funtions is removed. - all sockaddr structures and code that uses them removed. We don't have any code in the tree that uses them. All consumers use encap_attach_func() method, that relies on invoking of encapcheck() to determine the needed handler. - introduced struct encap_config, it contains parameters of encap handler that is going to be registered by encap_attach() function. - encap handlers are stored in lists ordered by exact_match value, thus handlers that need more bits to match will be checked first, and if encapcheck method returns exact_match value, the search will be stopped. - all current consumers changed to use new KPI. Reviewed by: mmacy Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15617	2018-06-05 20:51:01 +00:00
mmacy	3fe03791ac	Reduce overhead of entropy collection - move harvest mask check inline - move harvest mask to frequently_read out of actively modified cache line - disable ether_input collection and describe its limitations in NOTES Typically entropy collection in ether_input was stirring zero in to the entropy pool while at the same time greatly reducing max pps. This indicates that perhaps we should more closely scrutinize how much entropy we're getting from a given source as well as what our actual entropy collection needs are for seeding Yarrow. Reviewed by: cem, gallatin, delphij Approved by: secteam Differential Revision: https://reviews.freebsd.org/D15526	2018-05-31 21:53:07 +00:00
hselasky	34e8800cc7	Re-apply r190640. - Restore local change to include <net/bpf.h> inside pcap.h. This fixes ports build problems. - Update local copy of dlt.h with new DLT types. - Revert no longer needed <net/bpf.h> includes which were added as part of r334277. Suggested by: antoine@, delphij@, np@ MFC after: 3 weeks Sponsored by: Mellanox Technologies	2018-05-31 09:11:21 +00:00
mmacy	a5d5b1e5b9	if_setlladdr: don't call ioctl in epoch context PR: 228612 Reported by: markj	2018-05-30 21:46:10 +00:00
kp	fbe5f2b7e0	pf: Add missing include statement rmlocks require <sys/lock.h> as well as <sys/rmlock.h>. Unbreak mips build.	2018-05-30 12:40:37 +00:00
kp	de7905d658	pf: Replace rwlock on PF_RULES_LOCK with rmlock Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock. This change improves packet processing rate in high pps environments. Benchmarking by olivier@ shows a 65% improvement in pps. While here, also eliminate all appearances of "sys/rwlock.h" includes since it is not used anymore. Submitted by: farrokhi@ Differential Revision: https://reviews.freebsd.org/D15502	2018-05-30 07:11:33 +00:00
shurd	40a1e4b33c	iflib: mark irq allocation name parameter as constant The name parameter passed to iflib_irq_alloc_generic and iflib_softirq_alloc_generic is never modified. Many places in code pass string literals and thus should not be modified. Mark the name parameter as a const char * instead, so that we enforce that the name is not modified before passing to bus_describe_intr() Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: kmacy Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15343	2018-05-29 21:56:39 +00:00
mmacy	3b32e27457	iflib: hold context lock across detach for drivers that need it	2018-05-29 18:03:43 +00:00
mmacy	fd829508af	rt_getifa_fib: don't use ifa but info->rti_ifa Reported by: kp	2018-05-29 07:14:57 +00:00
mmacy	722df2d2de	route: fix missed ref adds - ensure that we bump the ifa ref whenever we add a reference - defer freeing epoch protected references until after the if_purgaddrs loop	2018-05-29 00:53:53 +00:00
erj	701350ae28	iflib: Add new shared flag: IFLIB_ADMIN_ALWAYS_RUN ixl(4)'s nvmupdate utility expects the nvmupdate process to run while the interface is down; these nvm update commands use the admin queue, so the admin queue needs to be able to generate interrupts and be processed while the interface is down. So add a flag that ixl(4) sets that lets the entire admin task run even when the interface is marked down/IFF_DRV_RUNNING isn't set. With this change, nvmupdate should function like it did pre-iflib. Reviewed by: gallatin@, sbruno@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15575	2018-05-26 00:46:08 +00:00
mmacy	bef06dbd7a	rtrequest1_fib: we need to always bump the ifaddr refcount when we take a reference from an rtentry. r334118 introduced a case when this was not done. While we're here make the intent more obvious by moving the refcount bump down to when we know we'll actually need it. Reported by: markj	2018-05-25 19:48:26 +00:00
mmacy	c937b516d8	CK: update consumers to use CK macros across the board r334189 changed the fields to have names distinct from those in queue.h in order to expose the oversights as compile time errors	2018-05-24 23:21:23 +00:00
mmacy	710b4829e5	if_delgroups: add missed unlock introduced by r334118	2018-05-24 17:54:08 +00:00
mmacy	ecd6e9d307	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
pizzamig	fbb6c8b691	Improve MAC address uniqueness on if_epair(4). As reported in PR184149, it can happen that epair devices can have the same MAC address. This solution is based on a 32-bit hash, obtained combining the if_index of the a interface and the hostid. If the hostid is zero, a random number is used. PR: 184149 Reviewed by: wollman, eugen Approved by: cognet Differential Revision: https://reviews.freebsd.org/D15329	2018-05-23 13:10:57 +00:00
markj	c354f4f2f6	Simplify lagg_input(). No functional change intended. MFC after: 2 weeks	2018-05-22 15:35:38 +00:00
mmacy	8e61308048	ck: simplify interface with libkvm consumers by defining ck_queue types as their queue.h equivalents if !_KERNEL	2018-05-21 01:53:23 +00:00
mmacy	da84b7fa5a	net: fix uninitialized variable warning	2018-05-19 19:00:04 +00:00
mmacy	d814acfa7c	mp_ring: fix i386 Even though 64-bit atomics are supported on i386 there are panics indicating that the code does not work correctly there. Switch to mutex based variant (and fix that while we're here). Reported by: pho, kib	2018-05-19 16:44:12 +00:00
mmacy	0db6398617	net: fix set but not used	2018-05-19 05:27:49 +00:00
mmacy	7aeac9ef18	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
mmacy	9f0d447325	epoch(9): allocate net epochs earlier in boot	2018-05-18 18:48:00 +00:00
mmacy	2f5a774893	epoch: move epoch variables to read mostly section	2018-05-18 17:58:15 +00:00
emaste	f0cc1a044c	Use NULL for SYSINIT's last arg, which is a pointer type Sponsored by: The FreeBSD Foundation	2018-05-18 17:58:09 +00:00
mmacy	a48d80f193	epoch(9): Make epochs non-preemptible by default There are risks associated with waiting on a preemptible epoch section. Change the name to make them not be the default and document the issue under CAVEATS. Reported by: markj	2018-05-18 17:29:43 +00:00
mmacy	aac2a8081e	epoch: add non-preemptible "critical" variant adds: - epoch_enter_critical() - can be called inside a different epoch, starts a section that will acquire any MTX_DEF mutexes or do anything that might sleep. - epoch_exit_critical() - corresponding exit call - epoch_wait_critical() - wait variant that is guaranteed that any threads in a section are running. - epoch_global_critical - an epoch_wait_critical safe epoch instance Requested by: markj Approved by: sbruno	2018-05-18 01:52:51 +00:00
mmacy	00950b6e0c	Fix !netmap build post r333686 Approved by: sbruno	2018-05-16 22:25:47 +00:00
shurd	df10b02879	Work around lack of TX IRQs in iflib for netmap When poll() is called via netmap, txsync is initially called, and if there are no available buffers to reclaim, it waits for the driver to notify of new buffers. Since the TX IRQ is generally not used in iflib drivers, this ends up causing a timeout. Work around this by having the reclaim DELAY(1) if it's initially unable to reclaim anything, then schedule the tx task, which will spin by continuously rescheduling itself until some buffers are reclaimed. In general, the delay is enough to allow some buffers to be reclaimed, so spinning is minimized. Reported by: Johannes Lundberg <johalun0@gmail.com> Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15455	2018-05-16 21:03:22 +00:00
shurd	8b4a96b13e	Replace rmlock with epoch in lagg Use the new epoch based reclamation API. Now the hot paths will not block at all, and the sx lock is used for the softc data. This fixes LORs reported where the rwlock was obtained when the sxlock was held. Submitted by: mmacy Reported by: Harry Schmalzbauer <freebsd@omnilan.de> Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15355	2018-05-14 20:06:49 +00:00
mmacy	0a7aab5128	iflib(9): Add support for cloning pseudo interfaces Part 3 of many ... The VPC framework relies heavily on cloning pseudo interfaces (vmnics, vpc switch, vcpswitch port, hostif, vxlan if, etc). This pulls in that piece. Some ancillary changes get pulled in as a side effect. Reviewed by: shurd@ Approved by: sbruno@ Sponsored by: Joyent, Inc. Differential Revision: https://reviews.freebsd.org/D15347	2018-05-11 20:08:28 +00:00
ae	c53ab47acf	Apply the change from r272770 to if_ipsec(4) interface. It is guaranteed that if_ipsec(4) interface is used only for tunnel mode IPsec, i.e. decrypted and decapsultaed packet has its own IP header. Thus we can consider it as new packet and clear the protocols flags. This allows ICMP/ICMPv6 properly handle errors that may cause this packet. PR: 228108 MFC after: 1 week	2018-05-11 16:50:25 +00:00
mmacy	361b54f07a	Allow different bridge types to coexist if_bridge has a lot of limitations that make it scale poorly to higher data rates. In my projects/VPC branch I leverage the bridge interface between layers for my high speed soft switch as well as for purposes of stacking in general. Reviewed by: sbruno@ Approved by: sbruno@ Differential Revision: https://reviews.freebsd.org/D15344	2018-05-11 05:00:40 +00:00
des	42f7c6ed63	Slight cleanup of interface event logging. Make if_printf() use vlog() instead of vprintf(). This means it can no longer return the number of characters printed, as it used to, but every single call to if_printf() in the entire kernel ignores the return value anyway; just return 0 so we don't have to change the prototype. Consistently use if_printf() throughout sys/net/if.c, instead of a mixture of if_printf() and log(). In ifa_maintain_loopback_route(), don't needlessly log an error if we either failed to add a route because it already existed or failed to remove one because it did not. We still return an error code, though. MFC after: 1 week	2018-05-11 00:19:49 +00:00
mmacy	0f77b86d64	Allocate epoch for networking at startup Additionally add CK to include paths for modules Approved by: sbruno@	2018-05-10 19:13:00 +00:00
ae	0490c97003	Add IFCAP_LINKSTATE support to if_loop(4). Reviewed by: wollman Obtained from: Yandex LLC MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15278	2018-05-09 10:50:51 +00:00
shurd	e81ffd1828	iflib: print message when iflib_tx_structures_setup fails Print a message when iflib_tx_structures_setup fails, like we do for iflib_rx_structures_setup. Now that we always print a message from within iflib_qset_structures_setup when it fails, stop printing one in iflib_device_register() at the call site. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: gallatin MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15300	2018-05-08 17:15:10 +00:00
shurd	3d0eb99e32	iflib: cleanup queues when iflib_device_register fail Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: gallatin MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15299	2018-05-08 16:56:02 +00:00
gallatin	f2a9371551	Fix an off-by-one error when deciding to request a tx interrupt The canonical check for whether or not a ring is drainable is TXQ_AVAIL() > MAX_TX_DESC() + 2. Use this same construct here, in order to avoid a potential off-by-one error where we might otherwise fail to request an interrupt. Reviewed by: mmacy Sponsored by: Netflix	2018-05-07 18:11:22 +00:00
mmacy	d3f138323c	r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl to sleep on commands to the NIC when updating multicast filters. More generally this permitted driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a a multicast update would still be queued for deletion when ifconfig deleted the interface thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses on the interface. Synchronously remove all external references to a multicast address before enqueueing for delete. Reported by: lwhsu Approved by: sbruno	2018-05-06 20:34:13 +00:00
mmacy	e532109522	The ifnet pointer (ifp) in rt_newaddrmsg can be valid without ifp->if_addr being set if if the ifnet is still live by way of a reference but in line for deletion. Check ifp->if_addr before dereferencing. Approved by: sbruno	2018-05-06 20:32:47 +00:00
markj	1de3a6fa6d	Add netdump support to iflib. em(4) and igb(4) were tested by me, and ixgbe(4) and bnxt(4) were tested by sbruno. Reviewed by: mmacy, shurd MFC after: 1 month Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15262	2018-05-06 00:57:52 +00:00
markj	071e927e22	Import the netdump client code. This is a component of a system which lets the kernel dump core to a remote host after a panic, rather than to a local storage device. The server component is available in the ports tree. netdump is particularly useful on diskless systems. The netdump(4) man page contains some details describing the protocol. Support for configuring netdump will be added to dumpon(8) in a future commit. To use netdump, the kernel must have been compiled with the NETDUMP option. The initial revision of netdump was written by Darrell Anderson and was integrated into Sandvine's OS, from which this version was derived. Reviewed by: bdrewery, cem (earlier versions), julian, sbruno MFC after: 1 month X-MFC note: use a spare field in struct ifnet Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15253	2018-05-06 00:38:29 +00:00
mmacy	1e91ab0baf	fix gcc8 warnings Approved by: sbruno	2018-05-04 18:57:05 +00:00
shurd	83e3d4c922	iflib: fix invalid free during queue allocation failure In r301567, code was added to cleanup to prevent memory leaks for the Tx and Rx ring structs. This code carefully tracked txq and rxq, and made sure to free them properly during cleanup. Because we assigned the txq and rxq pointers into the ctx->ifc_txqs and ctx->ifc_rxqs, we carefully reset these pointers to NULL, so that cleanup code would not accidentally free the memory twice. This was changed by r304021 ("Update iflib to support more NIC designs"), which removed this resetting of the pointers to NULL, because it re-used the txq and rxq pointers as an index into the queue set array. Unfortunately, the cleanup code was left alone. Thus, if we fail to allocate DMA or fail to configure the queues using the drivers ifdi methods, we will attempt to free txq and rxq. These variables would now incorrectly point to the wrong location, resulting in a page fault. There are a number of methods to correct this, but ultimately the root cause was that we reuse the txq and rxq pointers for two different purposes. Instead, when allocating, store the returned pointer directly into ctx->ifc_txqs and ctx->ifc_rxqs. Then, assign this to txq and rxq as index pointers before starting the loop to allocate each queue. Drop the cleanup code for txq and rxq, and only use ctx->ifc_txqs and ctx->ifc_rxqs. Thus, we no longer need to free txq or rxq under any error flow, and intsead rely solely on the pointers stored in ctx->ifc_txqs and ctx->ifc_rxqs. This prevents the invalid free(), and ensures that we still properly cleanup after ourselves as before when failing to allocate. Submitted by: Jacob Keller Reviewed by: gallatin, sbruno Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15285	2018-05-04 15:20:34 +00:00
shurd	83da3b83e5	iflib: remove unused brscp pointer from iflib_queues_alloc This pointer was no longer written to as of r315217. Since nothing writes to the variable, remove it. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: gallatin, kmacy, sbruno Differential Revision: https://reviews.freebsd.org/D15284	2018-05-04 15:11:16 +00:00
shurd	fc5848e1e6	Allow iflib NIC drivers to sleep rather than busy wait Since the move to SMP NIC driver locking has had to go through serious contortions using mtx around long running hardware operations. This moves iflib past that. Individual drivers may now sleep when appropriate. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14983	2018-05-03 17:02:31 +00:00
shurd	7d4b8facc7	Separate list manipulation locking from state change in multicast Multicast incorrectly calls in to drivers with a mutex held causing drivers to have to go through all manner of contortions to use a non sleepable lock. Serialize multicast updates instead. Submitted by: mmacy <mmacy@mattmacy.io> Reviewed by: shurd, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14969	2018-05-02 19:36:29 +00:00
gallatin	40ab8d5ea9	Fix iflib_encap() EFBIG handling bugs 1) Don't give up if m_collapse() fails. Rather than giving up, try m_defrag() immediately. 2) Fix a leak where, if the NIC driver rejected the defrag'ed chain as having too many segments, we would fail to free the chain. Reviewed by: Matthew Macy <mmacy@mattmacy.io> (this version of patch) Submitted by: Matthew Macy <mmacy@mattmacy.io> (early version of leak fix)	2018-04-30 23:53:27 +00:00
hselasky	5663cd2837	Add network device event for priority code point, PCP, changes. When the PCP is changed for either a VLAN network interface or when prio tagging is enabled for a regular ethernet network interface, broadcast the IFNET_EVENT_PCP event so applications like ibcore can update its GID tables accordingly. MFC after: 3 days Reviewed by: ae, kib Differential Revision: https://reviews.freebsd.org/D15040 Sponsored by: Mellanox Technologies	2018-04-26 08:58:27 +00:00
brooks	289ce12298	Translate 32-bit ifmedia requests into native ones. We use transformation rather than accessors as virtually ever driver implements SIOCGIFMEDIA and all would have to be touched. Keep the code readable by always performing copies and (possiably no-op) transforms. Reviewed by: jhb, kib Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14996	2018-04-25 15:30:42 +00:00
markj	943580a700	Use dead_bpf_if instead of bp_null. This fixes a -Wunused error when DEV_BPF and NETGRAPH_BPF are not defined. Also remove a stray semicolon added in r332812. X-MFC with: r332812	2018-04-24 17:42:25 +00:00
brooks	9a0f94467e	Finish removing FDDI and tokenring media support. This fixes media display for 802.11 wireless devices. Software outside the base system that uses these media types and defines should use #ifdef IFM_FDDI or IFM_TOKEN to include or remove support. Reported by: zeising Reviewed by: emaste, kib, zeising Tested by: zeising Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15170	2018-04-23 21:10:33 +00:00

1 2 3 4 5 ...

3972 Commits