freebsd-skq

Author	SHA1	Message	Date
bz	44d2e30ec4	Rather than duplicating the functionality of a macro after r322866 use the already existing one. No functional changes. Reviewed by: karels, ae Approved by: re (rgrimes) Differential Revision: https://reviews.freebsd.org/D17004	2018-09-03 22:10:49 +00:00
shurd	aacbdebbaf	Fix compile error due to missing parenthesis in r338372 Approved by: re (gjb)	2018-08-29 16:21:34 +00:00
shurd	ab689463dd	Fix potential data corruption in iflib The MP ring may have txq pointers enqueued. Previously, these were passed to m_free() when IFC_QFLUSH was set. This patch checks for the value and doesn't call m_free(). Reviewed by: gallatin Approved by: re (gjb) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16882	2018-08-29 15:55:25 +00:00
markm	d8723e8b03	Remove the Yarrow PRNG algorithm option in accordance with due notice given in random(4). This includes updating of the relevant man pages, and no-longer-used harvesting parameters. Ensure that the pseudo-unit-test still does something useful, now also with the "other" algorithm instead of Yarrow. PR: 230870 Reviewed by: cem Approved by: so(delphij,gtetlow) Approved by: re(marius) Differential Revision: https://reviews.freebsd.org/D16898	2018-08-26 12:51:46 +00:00
np	e16f1bf84a	Unbreak VLANs after r337943. ether_set_pcp should not be called from ether_output_frame for VLAN interfaces -- the vid + pcp will be inserted during vlan_transmit in that case. r337943 sets the VLAN's ifnet's if_pcp to a proper PCP value and this led to double encapsulation (once with vid 0 and second time with vid+pcp). PR: 230794 Reviewed by: kib@ Approved by: re@ (gjb@) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D16887	2018-08-24 21:48:13 +00:00
pkelsey	2e5630c90a	Extended pf(4) ioctl interface and pfctl(8) to allow bandwidths of 2^32 bps or greater to be used. Prior to this, bandwidth parameters would simply wrap at the 2^32 boundary. The computations in the HFSC scheduler and token bucket regulator have been modified to operate correctly up to at least 100 Gbps. No other algorithms have been examined or modified for correct operation above 2^32 bps (some may have existing computation resolution or overflow issues at rates below that threshold). pfctl(8) will now limit non-HFSC bandwidth parameters to 2^32 - 1 before passing them to the kernel. The extensions to the pf(4) ioctl interface have been made in a backwards-compatible way by versioning affected data structures, supporting all versions in the kernel, and implementing macros that will cause existing code that consumes that interface to use version 0 without source modifications. If version 0 consumers of the interface are used against a new kernel that has had bandwidth parameters of 2^32 or greater configured by updated tools, such bandwidth parameters will be reported as 2^32 - 1 bps by those old consumers. All in-tree consumers of the pf(4) interface have been updated. To update out-of-tree consumers to the latest version of the interface, define PFIOC_USE_LATEST ahead of any includes and use the code of pfctl(8) as a guide for the ioctls of interest. PR: 211730 Reviewed by: jmallett, kp, loos MFC after: 2 weeks Relnotes: yes Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D16782	2018-08-22 19:38:48 +00:00
erj	7b4938ad4e	if_media: Add new 2.5G/5G/25G/40G/50G/100G/200G/400G media types Upcoming Ethernet hardware will support new media types that aren't in the kernel yet, so they are added here. These mostly include new 25G/50G/100G media types; and this commit introduces new 200G/400G speeds and media. Reviewed by: hselasky@, jhb@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D16731	2018-08-22 18:19:56 +00:00
mmacy	b9a21f35a8	fix copy/paste error when clearing ifma flag CID: 1395119 Reported by: vangyzen	2018-08-21 22:59:22 +00:00
cem	d70d723ffc	Back out r338035 until Warner is finished churning GSoC PNP patches I was not aware Warner was making or planning to make forward progress in this area and have since been informed of that. It's easy to apply/reapply when churn dies down.	2018-08-19 00:46:22 +00:00
cem	3d8ae7a0f4	Remove unused and easy to misuse PNP macro parameter Inspired by r338025, just remove the element size parameter to the MODULE_PNP_INFO macro entirely. The 'table' parameter is now required to have correct pointer (or array) type. Since all invocations of the macro already had this property and the emitted PNP data continues to include the element size, there is no functional change. Mostly done with the coccinelle 'spatch' tool: $ cat modpnpsize0.cocci @normaltables@ identifier b,c; expression a,d,e; declarer MODULE_PNP_INFO; @@ MODULE_PNP_INFO(a,b,c,d, -sizeof(d[0]), e); @singletons@ identifier b,c,d; expression a; declarer MODULE_PNP_INFO; @@ MODULE_PNP_INFO(a,b,c,&d, -sizeof(d), 1); $ rg -l MODULE_PNP_INFO -- sys | \ xargs spatch --in-place --sp-file modpnpsize0.cocci (Note that coccinelle invokes diff(1) via a PATH search and expects diff to tolerate the -B flag, which BSD diff does not. So I had to link gdiff into PATH as diff to use spatch.) Tinderbox'd (-DMAKE_JUST_KERNELS).	2018-08-19 00:22:21 +00:00
np	6e862a5f4b	if_vlan(4): A VLAN always has a PCP and its ifnet's if_pcp should be set to the PCP value in use instead of IFNET_PCP_NONE. MFC after: 1 week Sponsored by: Chelsio Communications	2018-08-17 01:03:23 +00:00
np	06d6f82b42	Add the ability to look up the 3b PCP of a VLAN interface. Use it in toe_l2_resolve to fill up the complete vtag and not just the vid. Reviewed by: kib@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D16752	2018-08-16 23:46:38 +00:00
mmacy	99cec0a00c	Fix in6_multi double free This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast address has to be removed from interface lists when the refcount goes to zero OR when the interface goes away - decouple list disconnect from refcount (v6 only for now) - ifmultiaddr can exist past being on interface lists - add flag for tracking whether or not it's enqueued - deferring freeing moptions makes the inpcb cleanup code simpler but opens the door wider still to races - call inp_gcmoptions synchronously after dropping the inpcb lock Fundamentally multicast needs a rewrite - but keep applying band-aids for now. Tested by: kp Reported by: novel, kp, lwhsu	2018-08-15 20:23:08 +00:00
gallatin	aeea8c4eef	lagg: allow lacp to manage the link state Lacp needs to manage the link state itself. Unlike other lagg protocols, the ability of lacp to pass traffic depends not only on the lagg members having link, but also on the lacp protocol converging to a distributing state with the link partner. If we prematurely mark the link as up, then we will send a gratuitous arp (via arp_handle_ifllchange()) before the lacp interface is capable of passing traffic. When this happens, the gratuitous arp is lost, and our link partner may cache a stale mac address (eg, when the base mac address for the lagg bundle changes, due to a BIOS change re-ordering NIC unit numbers) Reviewed by: jtl, hselasky Sponsored by: Netflix	2018-08-13 14:13:25 +00:00
kp	7016cbb5d6	pf: Increase default hash table size Now that we (by default) limit the number of states to 100,000 it makes sense to also adjust the default size of the hash table. Based on the benchmarking results in https://github.com/ocochard/netbenches/blob/master/Atom_C2758_8Cores-Chelsio_T540-CR/pf-states_hashsize/results/fbsd12-head.r332390/README.md 128K entries offers a good compromise between performance and memory use. Users may still overrule this setting with the net.pf.states_hashsize and net.pf.source_nodes_hashsize loader(8) tunables.	2018-08-05 13:54:37 +00:00
pkelsey	10742aaed6	Mark the send queue ready so ALTQ is available.	2018-08-04 01:45:17 +00:00
andrew	081aff6081	As with DPCPU_DEFINE_STATIC make VNET_DEFINE_STATIC non-static on arm64 in modules. It also fails in the same way, we are unable to relocate static variables as the compiler uses PC-relative loads with nothing for the kernel linker to relocate. Sponsored by: DARPA, AFRL	2018-07-30 15:05:07 +00:00
andrew	537fdde573	Ensure the DPCPU and VNET module spaces are aligned to hold a pointer. Previously they may have been aligned to a char, leading to misaligned DPCPU and VNET variables. Sponsored by: DARPA, AFRL	2018-07-30 14:25:17 +00:00
andrew	6785775244	As with DPCPU_DEFINE make it a compile error to use static with VNET_DEFINE. There is the VNET_DEFINE_STATIC macro for that.	2018-07-30 12:44:44 +00:00
pkelsey	06ba49246a	ALTQ support for iflib. Reviewed by: jmallett, mmacy Differential Revision: https://reviews.freebsd.org/D16433	2018-07-25 22:46:36 +00:00
marius	2e3a85c261	Since r336611, n is only used for INET in iflib_parse_header(). Reported by: rpokala	2018-07-24 23:40:27 +00:00
andrew	a6605d2938	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
andrew	65d35e69cb	As with DPCPU create VNET_DEFINE_STATIC for when a variable needs to be declared static. This will allow us to change the definition on arm64 as it has the same issues described in r336349. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:31:16 +00:00
eugen	4596989e7e	epair(4): make sure we do not duplicate MAC addresses in case of reused if_index. PR: 229957 Tested by: O. Hartmann <ohartmann@walstatt.org> Approved by: avg (mentor)	2018-07-23 07:11:58 +00:00
marius	9c190e8f72	Use the maximum of isc_tx_{nsegments,tso_segments_max} for MAX_TX_DESC. Since r336313, TSO support for LEM-class devices is removed again as it was before the conversion of {l,}em(4) to iflib(4) in r311849 and as a result, isc_tx_tso_segments_max is 0 for LEM-class devices now. Thus, inappropriate watermarks were used for this class. This is really only a band-aid, though, because so far iflib(9) doesn't fully take into account that DMA engines can support different maxima of segments for transfers of TSO and non-TSO packets. For example, the DESC_RECLAIMABLE macro is based on isc_tx_nsegments while MAX_TX_DESC used isc_tx_tso_segments_max only. For most in-tree consumers that doesn't make a difference as the maxima are the same for both kinds of transfers (that is, apart from the fact that TSO may require up to 2 sentinel descriptors but also not with every MAC supported). However, isc_tx_nsegments is 8 but isc_tx_tso_segments_max is 85 by default with ixl(4).	2018-07-22 17:51:11 +00:00
marius	8ef4610a11	- Given that the controlling expression of the receive loop in iflib_rxeof() tests for avail > 0, avail can never be 0 within that loop. Thus, move decrementing avail and budget_left into the loop and before the code which checks for additional descriptors having become available in case all the previous ones have been processed but there still is budget left so the latter code works as expected. [1] - In iflib_{busdma_load_mbuf_sg,parse_header}(), remove dead stores to m and n respectively. [2, 3] - In collapse_pkthdr(), ensure that m_next isn't NULL before dereferencing it. [4] - Remove a duplicate assignment of segs in iflib_encap(). Reported by: Coverity CID: 1356027 [1], 1356047 [2], 1368205 [3], 1356028 [4]	2018-07-22 17:45:44 +00:00
shurd	06b406febd	Add knob to control tx ring abdication. r323954 changed the mp ring behaviour when 64-bit atomics were available to abdicate the TX ring rather than having one become a consumer thereby running to completion on TX. The consumer of the mp ring was then triggered in the tx task rather than blocking the TX call. While this significantly lowered the number of RX drops in small-packet forwarding, it also negatively impacts TX performance. With this change, the default behaviour is reverted, causing one TX ring to become a consumer during the enqueue call. A new sysctl, dev.X.Y.iflib.tx_abdicate is added to control this behaviour. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16302	2018-07-20 17:45:26 +00:00
shurd	4db9126b14	Improve netmap TX handling when TX IRQs are not used/supported Use the timer to poll for TX completions when there are outstanding TX slots. Track when the last driver timer was called to prevent overcalling it. Also clean up some kring vs NIC ring usage. Reviewed by: marius, Johannes Lundberg <johalun0@gmail.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16300	2018-07-20 17:24:45 +00:00
ae	d94c744a40	Move invoking of callout_stop(&lle->lle_timer) into llentry_free(). This deduplicates the code a bit, and also implicitly adds missing callout_stop() to in[6]_lltable_delete_entry() functions. PR: 209682, 225927 Submitted by: hselasky (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4605	2018-07-17 11:33:23 +00:00
marius	eeec306a59	Assorted TSO fixes for em(4)/iflib(9) and dead code removal: - Ever since the workaround for the silicon bug of TSO4 causing MAC hangs was committed in r295133, CSUM_TSO always got disabled unconditionally by em(4) on the first invocation of em_init_locked(). However, even with that problem fixed, it turned out that for at least e. g. 82579 not all necessary TSO workarounds are in place, still causing MAC hangs even at Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled in r323292 (r323293 for stable/10) for the EM-class by default, allowing users to turn it on if it happens to work with their particular EM MAC in a Gigabit-only environment. In head, the TSO workaround for speeds other than Gigabit was lost with the conversion to iflib(9) in r311849 (possibly along with another one or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4 got enabled by default again, causing device hangs. Therefore, change the default for this hardware class back to have TSO4 off, allowing users to turn it on manually if it happens to work in their environment as we do in stable/{10,11}. An alternative would be to add a whitelist of EM-class devices where TSO4 actually is reliable with the workarounds in place, but given that the advantage of TSO at Gigabit speed is rather limited - especially with the overhead of these workarounds -, that's really not worth it. [1] This change includes the addition of an isc_capabilities to struct if_softc_ctx so iflib(9) can also handle interface capabilities that shouldn't be enabled by default which is used to handle the default-off capabilities of e1000 as suggested by shurd@ and moving their handling from em_setup_interface() to em_if_attach_pre() accordingly. - Although 82543 support TSO4 in theory, the former lem(4) didn't have support for TSO4, presumably because TSO4 is even more broken in the LEM-class of MACs than the later EM ones. 
Still, TSO4 for LEM-class devices was enabled as part of the conversion to iflib(9) in r311849, causing device hangs. So revert back to the pre-r311849 behavior of not supporting TSO4 for LEM-class at all, which includes not creating a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2] - In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET (65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO DMA must have a maxsize of the maximum TSO size plus the size of a VLAN header for software VLAN tagging. The iflib(9) converted em(4), thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET in em_setup_interface() (apparently, left-over from pre-iflib(9) times). So remove the latter and correct iflib(9) to correctly cap the maximum TSO size reported to the stack at IP_MAXPACKET. While at it, let iflib(9) use if_sethwtsomax(). This change includes the addition of isc_tso_max{seg,}size DMA engine constraints for the TSO DMA tag to struct if_shared_ctx and letting iflib_txsd_alloc() automatically adjust the maxsize of that tag in case IFCAP_VLAN_MTU is supported as requested by shurd@. - Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9) so adjustment is automatically done in case IFCAP_VLAN_MTU is supported. As a consequence, this adjustment now is also done in case of bnxt(4) which missed it previously. - Move the reduction of the maximum TSO segment count reported to the stack by the number of m_pullup(9) calls (which in the worst case, can add another mbuf and, thus, the requirement for another DMA segment each) in the transmit path for performance reasons from em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now done in iflib_parse_header() rather than in the no longer existing em_xmit(). 
Moreover, this optimization applies to all drivers using iflib(9) and not just em(4); all in-tree iflib(9) consumers still have enough room to handle full size TSO packets. Also, reduce the adjustment to the maximum number of m_pullup(9)'s now performed in iflib_parse_header(). - Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9) in r311849 and r335338 respectively, these drivers didn't enable IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed through by lagg(4). With iflib(9), IFCAP_VLAN_HWFILTER was turned on by default but also lagg(4) was fixed in that regard in r203548. So just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling in {em,ixl,ixlv}_setup_interface(). - Nuke other redundant IFCAP_ setting in {em,ixl,ixlv}_setup_interface() which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre() now. - Remove some redundant/dead setting of scctx->isc_tx_csum_flags in em_if_attach_pre(). - Remove some IFCAP_* duplicated either directly or indirectly (e. g. via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS. - Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as iflib(9) adds that capability unconditionally. - Remove some unused macros from em(4). - Bump __FreeBSD_version as some of the above changes require the modules of drivers using iflib(9) to be recompiled. Okayed by: sbruno@ at 201806 DevSummit Transport Working Group [1] Reviewed by: sbruno (earlier version), erj PR: 219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2] Differential Revision: https://reviews.freebsd.org/D15720	2018-07-15 19:04:23 +00:00
kp	21b4f170cf	pf: Fix typo in r336221 Reported by: olivier@	2018-07-12 18:07:28 +00:00
kp	f5bc1a9c7b	pf: Increase default state table size The typical system now has a lot more memory than when pf was new, and is also expected to handle more connections. Increase the default size of the state table. Note that users can overrule this using 'set limit states' in pf.conf. From OpenBSD: The year is 2018. Mercury, Bowie, Cash, Motorola and DEC all left us. Just pf still has a default state table limit of 10000. Had! Now it's a tiny little bit more, 100k. lead guitar: me ok chorus: phessler theo claudio benno background school girl laughing: bob Obtained from: OpenBSD	2018-07-12 16:35:35 +00:00
ae	19e11c571f	Deduplicate the code. Add generic function if_tunnel_check_nesting() that does check for allowed nesting level for tunneling interfaces and also does loop detection. Use it in gif(4), gre(4) and me(4) interfaces. Differential Revision: https://reviews.freebsd.org/D16162	2018-07-09 11:03:28 +00:00
sbruno	3886e43127	struct ifmediareq *ifmrp is only used in the COMPAT_FREEBSD32 parts of ifioctl(). Move it inside the proper #ifdef. This was throwing a valid "Assigned but unused" warning with gcc. Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16063	2018-07-07 13:35:06 +00:00
will	5ce23703c1	Revert r335833. Several third-parties use at least some of these ioctls. While it would be better for regression testing if they were used in base (or at least in the test suite), it's currently not worth the trouble to push through removal. Submitted by: antoine, markj	2018-07-04 03:36:46 +00:00
mmacy	14de8a2820	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066	2018-07-04 02:47:16 +00:00
will	af6017a22f	pf: remove unused ioctls. Several ioctls are unused in pf, in the sense that no base utility references them. Additionally, a cursory review of pf-based ports indicates they're not used elsewhere either. Some of them have been unused since the original import. As far as I can tell, they're also unused in OpenBSD. Finally, removing this code removes the need for future pf work to take them into account. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D16076	2018-07-01 01:16:03 +00:00
ae	fd52110019	Add NULL pointer check. The encap_lookup_t method can be invoked by the IP encap subsystem even if no gif/gre/me interfaces exist. Hash tables are allocated on demand, when the first interface is created. So, check for a NULL pointer before accessing the hash table. PR: 229378	2018-06-28 11:39:27 +00:00
ae	377f86ae2b	Move BPFIF_* macro definitions into .c file, where struct bpf_if is declared. They are only used in this file and there is no need to export them via bpfdesc.h.	2018-06-19 10:34:45 +00:00
erj	a5400f53b1	iflib: Style fixes MFC after: 1 week	2018-06-18 17:27:43 +00:00
marius	6802d9dfbc	Assorted fixes to MSI-X/MSI/INTx setup in iflib(9): - In iflib_msix_init(), VMMs with broken MSI-X activation are trying to be worked around by manually enabling PCIM_MSIXCTRL_MSIX_ENABLE before calling pci_alloc_msix(9). Apart from constituting a layering violation, this has the problem of leaving PCIM_MSIXCTRL_MSIX_ENABLE enabled when falling back to MSI or INTx when e. g. MSI-X is blacklisted and initially also when disabled via hw.pci.enable_msix. The latter in turn was incorrectly worked around in r325166. Since r310806, pci(4) itself has code to deal with broken MSI-X handling of VMMs, so all of these workarounds in iflib(9) can go, fixing non-working interrupts when falling back to MSI/INTx. In any case, possibly further adjustments to broken MSI-X activation of VMMs like enabling r310806 by default in VM environments need to be placed into pci(4), not iflib(9). [1] - Also remove the pci_enable_busmaster(9) call from iflib_msix_init(), which is already more properly invoked from iflib_device_attach(). - When falling back to MSI/INTx, release the MSI-X BAR resource again. - When falling back to INTx, ensure scctx->isc_vectors is set to 1 and not to something higher from a device with more than one MSI message supported. - Make the nearby ring_state(s) stuff (static) const. Discussed with: jhb at BSDCan 2018 [1] Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D15729	2018-06-17 20:33:02 +00:00
ae	3d1b3c6fd6	Fix typo. Reported by: rpokala	2018-06-16 19:21:09 +00:00
ae	a58623ba71	Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9). Using rwlock with multiqueue NICs for IP forwarding at high pps produces high lock contention and is inefficient. Rmlock fits better for such workloads. Reviewed by: melifaro, olivier Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15789	2018-06-16 08:26:23 +00:00
ae	b4d3b30b6c	Add missing BPF_MTAP2() for outbound packets.	2018-06-14 15:04:30 +00:00
ae	8020fef9d7	Convert if_me(4) driver to use encap_lookup_t method and be lockless on data path.	2018-06-14 14:53:24 +00:00
jtl	8222f5cb7c	Make UMA and malloc(9) return non-executable memory in most cases. Most kernel memory that is allocated after boot does not need to be executable. There are a few exceptions. For example, kernel modules do need executable memory, but they don't use UMA or malloc(9). The BPF JIT compiler also needs executable memory and did use malloc(9) until r317072. (Note that a side effect of r316767 was that the "small allocation" path in UMA on amd64 already returned non-executable memory. This meant that some calls to malloc(9) or the UMA zone(9) allocator could return executable memory, while others could return non-executable memory. This change makes the behavior consistent.) This change makes malloc(9) return non-executable memory unless the new M_EXEC flag is specified. After this change, the UMA zone(9) allocator will always return non-executable memory, and a KASSERT will catch attempts to use the M_EXEC flag to allocate executable memory using uma_zalloc() or its variants. Allocations that do need executable memory have various choices. They may use the M_EXEC flag to malloc(9), or they may use a different VM interface to obtain executable pages. Now that malloc(9) again allows executable allocations, this change also reverts most of r317072. PR: 228927 Reviewed by: alc, kib, markj, jhb (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D15691	2018-06-13 17:04:41 +00:00
ae	e6c79fbed1	Rework if_gre(4) to use encap_lookup_t method to speedup lookup of needed interface when many gre interfaces are present. Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speedup lookup of needed softc.	2018-06-13 11:11:33 +00:00
jtl	a0a72815a8	Fix a memory leak for the BIOCSETWF ioctl on kernels with the BPF_JITTER option. The BPF code was creating a compiled filter in the common filter-creation path. However, BPF only uses compiled filters in the read direction. When creating a write filter, the common filter-creation code was creating an unneeded write filter and leaking the memory used for that. MFC after: 2 weeks Sponsored by: Netflix	2018-06-11 23:32:06 +00:00
ae	4764682061	Explicitly change the link state when we assign an address. Since we set the IFF_UP flag on SIOCSIFADDR, it is possible that the link state information is still not initialized properly afterwards. This leads to problems with routing, since the interface now has the IFCAP_LINKSTATE capability and a route is considered working only when the interface's link state is LINK_STATE_UP (see RT_LINK_IS_UP() macro). Reported by: Marek Zarychta MFC after: 3 days	2018-06-09 09:57:14 +00:00
shurd	f7f3ce47d0	Remove tx task spinning added in r333686 This caused issues with PASTE. Just remove the reschedule since the DELAY() should be enough for use cases such as pkt-gen which were failing before the change. Reported by: Michio Honda Sponsored by: Limelight Networks	2018-06-08 21:49:19 +00:00
mjg	08fabf55c9	uma: fix up r334824 Turns out there is code which ends up passing M_ZERO to counters. Since counters zero unconditionally on their own, just drop the flag in that place.	2018-06-08 05:40:36 +00:00
mmacy	69a922f7ab	rtentry_zinit: don't blindly pass through M_ZERO to counter alloc	2018-06-08 05:17:06 +00:00
erj	0ac17051d5	iflib: Record TCP checksum info in iflib when TCP checksum is requested ixl(4) (when it switches over to using iflib) devices need the TCP header length in order to do TCP checksum offload. Reviewed by: gallatin@, shurd@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15558	2018-06-07 13:03:07 +00:00
ae	d1ee857bcf	Rework if_gif(4) to use new encap_lookup_t method to speedup lookup of needed interface when many gif interfaces are present. Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speedup lookup of needed softc. Interfaces with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST. Sysctl net.link.gif.parallel_tunnels is removed. The removal was planned 16 years ago, and actually it could work only for outbound direction. Each protocol that can be handled by an if_gif(4) interface is registered by a separate encap handler; this helps avoid invoking the handler for unrelated protocols (GRE, PIM, etc.). This change dramatically improves performance when many gif(4) interfaces are used. Sponsored by: Yandex LLC	2018-06-05 21:24:59 +00:00
ae	dfbd18b5fe	Rework IP encapsulation handling code. Currently it has several disadvantages: - it uses single mutex to protect internal structures. It is used by data- and control- path, thus there is no parallelism at all. - it uses single list to keep encap handlers for both INET and INET6 families. - struct encaptab keeps unneeded information (src, dst, masks, protosw), that isn't used by code in the source tree. - matches are prioritized and when many tunneling interfaces are registered, encapcheck handler of each interface is invoked for each packet. The search takes O(n) for n interfaces. All this work is done with exclusive lock held. What this patch includes: - the datapath is converted to be lockless using epoch(9) KPI. - struct encaptab now linked using CK_LIST. - all unused fields removed from struct encaptab. Several new fields added: min_length is the minimum packet length, that encapsulation handler expects to see; exact_match is maximum number of bits, that can return an encapsulation handler, when it wants to consume a packet. - IPv6 and IPv4 handlers are stored in separate lists; - added new "encap_lookup_t" method, that will be used later. It is targeted to speedup lookup of needed interface, when gif(4)/gre(4) have many interfaces. - the need to use protosw structure is eliminated. The only pr_input method was used from this structure, so I don't see the need to keep using it. - encap_input_t method changed to avoid using mbuf tags to store softc pointer. Now it is passed directly through encap_input_t method. encap_getarg() function is removed. - all sockaddr structures and code that uses them removed. We don't have any code in the tree that uses them. All consumers use encap_attach_func() method, that relies on invoking of encapcheck() to determine the needed handler. - introduced struct encap_config, it contains parameters of encap handler that is going to be registered by encap_attach() function. 
- encap handlers are stored in lists ordered by exact_match value, thus handlers that need more bits to match will be checked first, and if encapcheck method returns exact_match value, the search will be stopped. - all current consumers changed to use new KPI. Reviewed by: mmacy Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15617	2018-06-05 20:51:01 +00:00
mmacy	3fe03791ac	Reduce overhead of entropy collection - move harvest mask check inline - move harvest mask to frequently_read out of actively modified cache line - disable ether_input collection and describe its limitations in NOTES Typically entropy collection in ether_input was stirring zero in to the entropy pool while at the same time greatly reducing max pps. This indicates that perhaps we should more closely scrutinize how much entropy we're getting from a given source as well as what our actual entropy collection needs are for seeding Yarrow. Reviewed by: cem, gallatin, delphij Approved by: secteam Differential Revision: https://reviews.freebsd.org/D15526	2018-05-31 21:53:07 +00:00
hselasky	34e8800cc7	Re-apply r190640. - Restore local change to include <net/bpf.h> inside pcap.h. This fixes ports build problems. - Update local copy of dlt.h with new DLT types. - Revert no longer needed <net/bpf.h> includes which were added as part of r334277. Suggested by: antoine@, delphij@, np@ MFC after: 3 weeks Sponsored by: Mellanox Technologies	2018-05-31 09:11:21 +00:00
mmacy	a5d5b1e5b9	if_setlladdr: don't call ioctl in epoch context PR: 228612 Reported by: markj	2018-05-30 21:46:10 +00:00
kp	fbe5f2b7e0	pf: Add missing include statement rmlocks require <sys/lock.h> as well as <sys/rmlock.h>. Unbreak mips build.	2018-05-30 12:40:37 +00:00
kp	de7905d658	pf: Replace rwlock on PF_RULES_LOCK with rmlock Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock. This change improves packet processing rate in high pps environments. Benchmarking by olivier@ shows a 65% improvement in pps. While here, also eliminate all appearances of "sys/rwlock.h" includes since it is not used anymore. Submitted by: farrokhi@ Differential Revision: https://reviews.freebsd.org/D15502	2018-05-30 07:11:33 +00:00
shurd	40a1e4b33c	iflib: mark irq allocation name parameter as constant The name parameter passed to iflib_irq_alloc_generic and iflib_softirq_alloc_generic is never modified. Many places in code pass string literals and thus should not be modified. Mark the name parameter as a const char * instead, so that we enforce that the name is not modified before passing to bus_describe_intr() Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: kmacy Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15343	2018-05-29 21:56:39 +00:00
mmacy	3b32e27457	iflib: hold context lock across detach for drivers that need it	2018-05-29 18:03:43 +00:00
mmacy	fd829508af	rt_getifa_fib: don't use ifa but info->rti_ifa Reported by: kp	2018-05-29 07:14:57 +00:00
mmacy	722df2d2de	route: fix missed ref adds - ensure that we bump the ifa ref whenever we add a reference - defer freeing epoch protected references until after the if_purgaddrs loop	2018-05-29 00:53:53 +00:00
erj	701350ae28	iflib: Add new shared flag: IFLIB_ADMIN_ALWAYS_RUN ixl(4)'s nvmupdate utility expects the nvmupdate process to run while the interface is down; these nvm update commands use the admin queue, so the admin queue needs to be able to generate interrupts and be processed while the interface is down. So add a flag that ixl(4) sets that lets the entire admin task run even when the interface is marked down/IFF_DRV_RUNNING isn't set. With this change, nvmupdate should function like it did pre-iflib. Reviewed by: gallatin@, sbruno@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15575	2018-05-26 00:46:08 +00:00
mmacy	bef06dbd7a	rtrequest1_fib: we need to always bump the ifaddr refcount when we take a reference from an rtentry. r334118 introduced a case when this was not done. While we're here make the intent more obvious by moving the refcount bump down to when we know we'll actually need it. Reported by: markj	2018-05-25 19:48:26 +00:00
mmacy	c937b516d8	CK: update consumers to use CK macros across the board r334189 changed the fields to have names distinct from those in queue.h in order to expose the oversights as compile time errors	2018-05-24 23:21:23 +00:00
mmacy	710b4829e5	if_delgroups: add missed unlock introduced by r334118	2018-05-24 17:54:08 +00:00
mmacy	ecd6e9d307	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
pizzamig	fbb6c8b691	Improve MAC address uniqueness on if_epair(4). As reported in PR184149, it can happen that epair devices can have the same MAC address. This solution is based on a 32-bit hash, obtained combining the if_index of the a interface and the hostid. If the hostid is zero, a random number is used. PR: 184149 Reviewed by: wollman, eugen Approved by: cognet Differential Revision: https://reviews.freebsd.org/D15329	2018-05-23 13:10:57 +00:00
markj	c354f4f2f6	Simplify lagg_input(). No functional change intended. MFC after: 2 weeks	2018-05-22 15:35:38 +00:00
mmacy	8e61308048	ck: simplify interface with libkvm consumers by defining ck_queue types as their queue.h equivalents if !_KERNEL	2018-05-21 01:53:23 +00:00
mmacy	da84b7fa5a	net: fix uninitialized variable warning	2018-05-19 19:00:04 +00:00
mmacy	d814acfa7c	mp_ring: fix i386 Even though 64-bit atomics are supported on i386 there are panics indicating that the code does not work correctly there. Switch to mutex based variant (and fix that while we're here). Reported by: pho, kib	2018-05-19 16:44:12 +00:00
mmacy	0db6398617	net: fix set but not used	2018-05-19 05:27:49 +00:00
mmacy	7aeac9ef18	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch:
InMpps OMpps InGbs OGbs err     TCP Est %CPU  syscalls csw     irq  GBfree
4.98   0.00  4.42  0.00 4235592 33      83.80 4720653  2149771 1235 247.32
4.73   0.00  4.20  0.00 4025260 33      82.99 4724900  2139833 1204 247.32
4.72   0.00  4.20  0.00 4035252 33      82.14 4719162  2132023 1264 247.32
4.71   0.00  4.21  0.00 4073206 33      83.68 4744973  2123317 1347 247.32
4.72   0.00  4.21  0.00 4061118 33      80.82 4713615  2188091 1490 247.32
4.72   0.00  4.21  0.00 4051675 33      85.29 4727399  2109011 1205 247.32
4.73   0.00  4.21  0.00 4039056 33      84.65 4724735  2102603 1053 247.32
After the patch:
InMpps OMpps InGbs OGbs err     TCP Est %CPU  syscalls csw     irq  GBfree
5.43   0.00  4.20  0.00 3313143 33      84.96 5434214  1900162 2656 245.51
5.43   0.00  4.20  0.00 3308527 33      85.24 5439695  1809382 2521 245.51
5.42   0.00  4.19  0.00 3316778 33      87.54 5416028  1805835 2256 245.51
5.42   0.00  4.19  0.00 3317673 33      90.44 5426044  1763056 2332 245.51
5.42   0.00  4.19  0.00 3314839 33      88.11 5435732  1792218 2499 245.52
5.44   0.00  4.19  0.00 3293228 33      91.84 5426301  1668597 2121 245.52
Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
mmacy	9f0d447325	epoch(9): allocate net epochs earlier in boot	2018-05-18 18:48:00 +00:00
mmacy	2f5a774893	epoch: move epoch variables to read mostly section	2018-05-18 17:58:15 +00:00
emaste	f0cc1a044c	Use NULL for SYSINIT's last arg, which is a pointer type Sponsored by: The FreeBSD Foundation	2018-05-18 17:58:09 +00:00
mmacy	a48d80f193	epoch(9): Make epochs non-preemptible by default There are risks associated with waiting on a preemptible epoch section. Change the name to make them not be the default and document the issue under CAVEATS. Reported by: markj	2018-05-18 17:29:43 +00:00
mmacy	aac2a8081e	epoch: add non-preemptible "critical" variant adds: - epoch_enter_critical() - can be called inside a different epoch, starts a section that will not acquire any MTX_DEF mutexes or do anything that might sleep. - epoch_exit_critical() - corresponding exit call - epoch_wait_critical() - wait variant that is guaranteed that any threads in a section are running. - epoch_global_critical - an epoch_wait_critical safe epoch instance Requested by: markj Approved by: sbruno	2018-05-18 01:52:51 +00:00
mmacy	00950b6e0c	Fix !netmap build post r333686 Approved by: sbruno	2018-05-16 22:25:47 +00:00
shurd	df10b02879	Work around lack of TX IRQs in iflib for netmap When poll() is called via netmap, txsync is initially called, and if there are no available buffers to reclaim, it waits for the driver to notify of new buffers. Since the TX IRQ is generally not used in iflib drivers, this ends up causing a timeout. Work around this by having the reclaim DELAY(1) if it's initially unable to reclaim anything, then schedule the tx task, which will spin by continuously rescheduling itself until some buffers are reclaimed. In general, the delay is enough to allow some buffers to be reclaimed, so spinning is minimized. Reported by: Johannes Lundberg <johalun0@gmail.com> Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15455	2018-05-16 21:03:22 +00:00
shurd	8b4a96b13e	Replace rmlock with epoch in lagg Use the new epoch based reclamation API. Now the hot paths will not block at all, and the sx lock is used for the softc data. This fixes LORs reported where the rwlock was obtained when the sxlock was held. Submitted by: mmacy Reported by: Harry Schmalzbauer <freebsd@omnilan.de> Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15355	2018-05-14 20:06:49 +00:00
mmacy	0a7aab5128	iflib(9): Add support for cloning pseudo interfaces Part 3 of many ... The VPC framework relies heavily on cloning pseudo interfaces (vmnics, vpc switch, vcpswitch port, hostif, vxlan if, etc). This pulls in that piece. Some ancillary changes get pulled in as a side effect. Reviewed by: shurd@ Approved by: sbruno@ Sponsored by: Joyent, Inc. Differential Revision: https://reviews.freebsd.org/D15347	2018-05-11 20:08:28 +00:00
ae	c53ab47acf	Apply the change from r272770 to if_ipsec(4) interface. It is guaranteed that if_ipsec(4) interface is used only for tunnel mode IPsec, i.e. decrypted and decapsulated packet has its own IP header. Thus we can consider it as new packet and clear the protocols flags. This allows ICMP/ICMPv6 to properly handle errors that may cause this packet. PR: 228108 MFC after: 1 week	2018-05-11 16:50:25 +00:00
mmacy	361b54f07a	Allow different bridge types to coexist if_bridge has a lot of limitations that make it scale poorly to higher data rates. In my projects/VPC branch I leverage the bridge interface between layers for my high speed soft switch as well as for purposes of stacking in general. Reviewed by: sbruno@ Approved by: sbruno@ Differential Revision: https://reviews.freebsd.org/D15344	2018-05-11 05:00:40 +00:00
des	42f7c6ed63	Slight cleanup of interface event logging. Make if_printf() use vlog() instead of vprintf(). This means it can no longer return the number of characters printed, as it used to, but every single call to if_printf() in the entire kernel ignores the return value anyway; just return 0 so we don't have to change the prototype. Consistently use if_printf() throughout sys/net/if.c, instead of a mixture of if_printf() and log(). In ifa_maintain_loopback_route(), don't needlessly log an error if we either failed to add a route because it already existed or failed to remove one because it did not. We still return an error code, though. MFC after: 1 week	2018-05-11 00:19:49 +00:00
mmacy	0f77b86d64	Allocate epoch for networking at startup Additionally add CK to include paths for modules Approved by: sbruno@	2018-05-10 19:13:00 +00:00
ae	0490c97003	Add IFCAP_LINKSTATE support to if_loop(4). Reviewed by: wollman Obtained from: Yandex LLC MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15278	2018-05-09 10:50:51 +00:00
shurd	e81ffd1828	iflib: print message when iflib_tx_structures_setup fails Print a message when iflib_tx_structures_setup fails, like we do for iflib_rx_structures_setup. Now that we always print a message from within iflib_qset_structures_setup when it fails, stop printing one in iflib_device_register() at the call site. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: gallatin MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15300	2018-05-08 17:15:10 +00:00
shurd	3d0eb99e32	iflib: cleanup queues when iflib_device_register fails Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: gallatin MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15299	2018-05-08 16:56:02 +00:00
gallatin	f2a9371551	Fix an off-by-one error when deciding to request a tx interrupt The canonical check for whether or not a ring is drainable is TXQ_AVAIL() > MAX_TX_DESC() + 2. Use this same construct here, in order to avoid a potential off-by-one error where we might otherwise fail to request an interrupt. Reviewed by: mmacy Sponsored by: Netflix	2018-05-07 18:11:22 +00:00
mmacy	d3f138323c	r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl to sleep on commands to the NIC when updating multicast filters. More generally this permitted drivers to use an sx as a softc lock. Unfortunately this change introduced a race whereby a multicast update would still be queued for deletion when ifconfig deleted the interface thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses on the interface. Synchronously remove all external references to a multicast address before enqueueing for delete. Reported by: lwhsu Approved by: sbruno	2018-05-06 20:34:13 +00:00
mmacy	e532109522	The ifnet pointer (ifp) in rt_newaddrmsg can be valid without ifp->if_addr being set if the ifnet is still live by way of a reference but in line for deletion. Check ifp->if_addr before dereferencing. Approved by: sbruno	2018-05-06 20:32:47 +00:00
markj	1de3a6fa6d	Add netdump support to iflib. em(4) and igb(4) were tested by me, and ixgbe(4) and bnxt(4) were tested by sbruno. Reviewed by: mmacy, shurd MFC after: 1 month Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15262	2018-05-06 00:57:52 +00:00
markj	071e927e22	Import the netdump client code. This is a component of a system which lets the kernel dump core to a remote host after a panic, rather than to a local storage device. The server component is available in the ports tree. netdump is particularly useful on diskless systems. The netdump(4) man page contains some details describing the protocol. Support for configuring netdump will be added to dumpon(8) in a future commit. To use netdump, the kernel must have been compiled with the NETDUMP option. The initial revision of netdump was written by Darrell Anderson and was integrated into Sandvine's OS, from which this version was derived. Reviewed by: bdrewery, cem (earlier versions), julian, sbruno MFC after: 1 month X-MFC note: use a spare field in struct ifnet Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15253	2018-05-06 00:38:29 +00:00
mmacy	1e91ab0baf	fix gcc8 warnings Approved by: sbruno	2018-05-04 18:57:05 +00:00
shurd	83e3d4c922	iflib: fix invalid free during queue allocation failure In r301567, code was added to cleanup to prevent memory leaks for the Tx and Rx ring structs. This code carefully tracked txq and rxq, and made sure to free them properly during cleanup. Because we assigned the txq and rxq pointers into the ctx->ifc_txqs and ctx->ifc_rxqs, we carefully reset these pointers to NULL, so that cleanup code would not accidentally free the memory twice. This was changed by r304021 ("Update iflib to support more NIC designs"), which removed this resetting of the pointers to NULL, because it re-used the txq and rxq pointers as an index into the queue set array. Unfortunately, the cleanup code was left alone. Thus, if we fail to allocate DMA or fail to configure the queues using the driver's ifdi methods, we will attempt to free txq and rxq. These variables would now incorrectly point to the wrong location, resulting in a page fault. There are a number of methods to correct this, but ultimately the root cause was that we reuse the txq and rxq pointers for two different purposes. Instead, when allocating, store the returned pointer directly into ctx->ifc_txqs and ctx->ifc_rxqs. Then, assign this to txq and rxq as index pointers before starting the loop to allocate each queue. Drop the cleanup code for txq and rxq, and only use ctx->ifc_txqs and ctx->ifc_rxqs. Thus, we no longer need to free txq or rxq under any error flow, and instead rely solely on the pointers stored in ctx->ifc_txqs and ctx->ifc_rxqs. This prevents the invalid free(), and ensures that we still properly cleanup after ourselves as before when failing to allocate. Submitted by: Jacob Keller Reviewed by: gallatin, sbruno Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15285	2018-05-04 15:20:34 +00:00
shurd	83da3b83e5	iflib: remove unused brscp pointer from iflib_queues_alloc This pointer was no longer written to as of r315217. Since nothing writes to the variable, remove it. Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: gallatin, kmacy, sbruno Differential Revision: https://reviews.freebsd.org/D15284	2018-05-04 15:11:16 +00:00
shurd	fc5848e1e6	Allow iflib NIC drivers to sleep rather than busy wait Since the move to SMP, NIC driver locking has had to go through serious contortions using mtx around long running hardware operations. This moves iflib past that. Individual drivers may now sleep when appropriate. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14983	2018-05-03 17:02:31 +00:00
shurd	7d4b8facc7	Separate list manipulation locking from state change in multicast Multicast incorrectly calls in to drivers with a mutex held causing drivers to have to go through all manner of contortions to use a non sleepable lock. Serialize multicast updates instead. Submitted by: mmacy <mmacy@mattmacy.io> Reviewed by: shurd, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14969	2018-05-02 19:36:29 +00:00
gallatin	40ab8d5ea9	Fix iflib_encap() EFBIG handling bugs 1) Don't give up if m_collapse() fails. Rather than giving up, try m_defrag() immediately. 2) Fix a leak where, if the NIC driver rejected the defrag'ed chain as having too many segments, we would fail to free the chain. Reviewed by: Matthew Macy <mmacy@mattmacy.io> (this version of patch) Submitted by: Matthew Macy <mmacy@mattmacy.io> (early version of leak fix)	2018-04-30 23:53:27 +00:00
hselasky	5663cd2837	Add network device event for priority code point, PCP, changes. When the PCP is changed for either a VLAN network interface or when prio tagging is enabled for a regular ethernet network interface, broadcast the IFNET_EVENT_PCP event so applications like ibcore can update their GID tables accordingly. MFC after: 3 days Reviewed by: ae, kib Differential Revision: https://reviews.freebsd.org/D15040 Sponsored by: Mellanox Technologies	2018-04-26 08:58:27 +00:00
brooks	289ce12298	Translate 32-bit ifmedia requests into native ones. We use transformation rather than accessors as virtually every driver implements SIOCGIFMEDIA and all would have to be touched. Keep the code readable by always performing copies and (possibly no-op) transforms. Reviewed by: jhb, kib Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14996	2018-04-25 15:30:42 +00:00
markj	943580a700	Use dead_bpf_if instead of bp_null. This fixes a -Wunused error when DEV_BPF and NETGRAPH_BPF are not defined. Also remove a stray semicolon added in r332812. X-MFC with: r332812	2018-04-24 17:42:25 +00:00
brooks	9a0f94467e	Finish removing FDDI and tokenring media support. This fixes media display for 802.11 wireless devices. Software outside the base system that uses these media types and defines should use #ifdef IFM_FDDI or IFM_TOKEN to include or remove support. Reported by: zeising Reviewed by: emaste, kib, zeising Tested by: zeising Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15170	2018-04-23 21:10:33 +00:00
ae	c6192ec749	Add dead_bpf_if structure, that should be used as fake bpf_if during ifnet detach. Since destroying interface is not atomic operation and due to the lack of synchronization during destroy, it is possible, that in the time between bpfdetach() and if_free() some queued on destroying interface mbuf will be used by ether_input_internal() and bpf_peers_present() can dereference NULL bpf_if pointer. To protect from this, assign pointer to empty bpf_if_ext structure instead of NULL pointer after bpfdetach(). Reviewed by: melifaro, eugen Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15083	2018-04-20 09:57:31 +00:00
shurd	90779c2bbf	iflib: Fix queue distribution when there are no threads Previously, if there are no threads, all queues which targeted cores that share an L2 cache were bound to a single core. The intent is to distribute them across these cores. Reported by: olivier Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15120	2018-04-18 15:34:18 +00:00
brooks	26c165ead9	Remove support for the Arcnet protocol. While Arcnet has some continued deployment in industrial controls, the lack of drivers for any of the PCI, USB, or PCIe NICs on the market suggests such users aren't running FreeBSD. Evidence in the PR database suggests that the cm(4) driver (our sole Arcnet NIC) was broken in 5.0 and has not worked since. PR: 182297 Reviewed by: jhibbits, vangyzen Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15057	2018-04-13 21:18:04 +00:00
sbruno	18512a7765	Restore r332389 after resolution of locking fixes. Add one extra lock initialization to iflib_register() that was missed in the git<->phab conversion. Split out flag manipulation from general context manipulation in iflib To avoid blocking on the context lock in the swi thread and risk potential deadlocks, this change protects lighter weight updates that only need to be consistent with each other with their own lock. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14967	2018-04-12 14:35:37 +00:00
vmaffione	3c7434c730	netmap: align codebase to the current upstream (commit id 3fb001303718146) Changelist: - Turn tx_rings and rx_rings arrays into arrays of pointers to kring structs. This patch includes fixes for ixv, ixl, ix, re, cxgbe, iflib, vtnet and ptnet drivers to cope with the change. - Generalize the nm_config() callback to accept a struct containing many parameters. - Introduce NKR_FAKERING to support buffers sharing (used for netmap pipes) - Improved API for external VALE modules. - Various bug fixes and improvements to the netmap memory allocator, including support for externally (userspace) allocated memory. - Refactoring of netmap pipes: now linked rings share the same netmap buffers, with a separate set of kring pointers (rhead, rcur, rtail). Buffer swapping does not need to happen anymore. - Large refactoring of the control API towards an extensible solution; the goal is to allow the addition of more commands and extension of existing ones (with new options) without the need of hacks or the risk of running out of configuration space. A new NIOCCTRL ioctl has been added to handle all the requests of the new control API, which cover all the functionalities so far supported. The netmap API bumps from 11 to 12 with this patch. Full backward compatibility is provided for the old control command (NIOCREGIF), by means of a new netmap_legacy module. Many parts of the old netmap.h header have now been moved to netmap_legacy.h (included by netmap.h). Approved by: hrs (mentor)	2018-04-12 07:20:50 +00:00
mjg	fa5413e897	iflib: fix up a mismerge in r332419 Led to crashes on boot while in ifconfig. Submitted by: Matthew Macy <mmacy@mattmacy.io>	2018-04-12 04:11:37 +00:00
shurd	63bcfab69d	Properly initialize ifc_nhwtxqs. Also, since ifc_nhwrxqs is only used in one place, remove it from the struct. This was preventing iflib_dma_free() from being called via iflib_device_detach(). Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks	2018-04-11 21:41:59 +00:00
brooks	6dcf9514b3	Remove support for FDDI networks. Defines in net/if_media.h remain in case code copied from ifconfig is in use elsewhere (supporting non-existent media type is harmless). Reviewed by: kib, jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15017	2018-04-11 17:28:24 +00:00
sbruno	a87ab66df6	Revert r332389 as it is causing panics for various users and we need to add some more test cases.	2018-04-11 17:26:53 +00:00
shurd	d55e53acf5	Split out flag manipulation from general context manipulation in iflib To avoid blocking on the context lock in the swi thread and risk potential deadlocks, this change protects lighter weight updates that only need to be consistent with each other with their own lock. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14967	2018-04-10 19:48:24 +00:00
shurd	05f9f1edaa	Make BPF global lock an SX This allows NIC drivers to sleep on polling config operations. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: shurd Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14982	2018-04-10 19:42:50 +00:00
vmaffione	8b391e44ef	netmap: align codebase to upstream version v11.4 Changelist: - remove unused nkr_slot_flags - new nm_intr adapter callback to enable/disable interrupts - remove unused sysctls and document the other sysctls - new infrastructure to support NS_MOREFRAG for NIC ports - support for external memory allocator (for now linux-only), including linux-specific changes in common headers - optimizations within netmap pipes datapath - improvements on VALE control API - new nm_parse() helper function in netmap_user.h - various bug fixes and code clean up Approved by: hrs (mentor)	2018-04-09 09:24:26 +00:00
brooks	c2e6899488	Remove the thread argument from ifr_buffer_*() accessors. They are always used in a context where curthread is the correct thread. This makes them more similar to the ifr_data_get_ptr() accessor.	2018-04-06 23:25:54 +00:00
brooks	7a2353df98	ifconf(): correct handling of sockaddrs smaller than struct sockaddr. Portable programs that use SIOCGIFCONF (e.g. traceroute) assume that each pseudo ifreq is of length MAX(sizeof(struct ifreq), sizeof(ifr_name) + ifr_addr.sa_len). For short sockaddrs we copied too much from the source sockaddr resulting in a heap leak. I believe only one such sockaddr exists (struct sockaddr_sco which is 8 bytes) and it is unclear if such sockaddrs end up on interfaces in practice. If it did, the result would be an 8 byte heap leak on current architectures. admbugs: 869 Reviewed by: kib Obtained from: CheriBSD MFC after: 3 days Security: kernel heap leak Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14981	2018-04-06 20:26:56 +00:00
brooks	9d79658aab	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compatibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
kp	337a0778fc	pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS These ioctls can process a number of items at a time, which puts us at risk of overflow in mallocarray() and of impossibly large allocations even if we don't overflow. Limit the allocation to required size (or the user allocation, if that's smaller). That does mean we need to do the allocation with the rules lock held (so the number doesn't change while we're doing this), so it can't M_WAITOK. MFC after: 1 week	2018-04-06 15:54:30 +00:00
brooks	0080e81d7c	Add 32-bit compat for ioctls that take struct ifgroupreq. Use an accessor to access ifgr_group and ifgr_groups. Use a macro CASE_IOC_IFGROUPREQ(cmd) in place of case statements such as "case SIOCAIFGROUP:". This avoids polluting the switch statements with large numbers of #ifdefs. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14960	2018-04-05 22:14:55 +00:00
brooks	8f46ff8fe4	ifconf(): Always zero the whole struct ifreq. The previous split of zeroing ifr_name and ifr_addr separately is safe on current architectures, but would be unsafe if pointers were larger than 8 bytes. Combining the zeroing adds no real cost (a few instructions) and makes the security property easier to verify. Reviewed by: kib, emaste Obtained from: CheriBSD MFC after: 3 days Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14912	2018-04-05 21:58:28 +00:00
vmaffione	6dfc9aee0f	netmap: align if_ptnet guest driver to the upstream code (commit 0e15788) The change upgrades the driver to use the split Communication Status Block (CSB) format. In this way the variables written by the guest and read by the host are allocated in a different cacheline than the variables written by the host and read by the guest; this is needed to avoid cache thrashing. Approved by: hrs (mentor)	2018-04-04 21:31:12 +00:00
brooks	2b96daf50f	Document and enforce assumptions about struct (in6_)ifreq. - The two types must be type-punnable for shared members of ifr_ifru. This allows compatibility accessors to be shared. - There must be no padding gap between ifr_name and ifr_ifru. This is assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be broadly portable. This is true for all current architectures, but very large (256-bit) fat-pointers could violate this invariant. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14910	2018-03-30 21:38:53 +00:00
brooks	ac0325b4db	Use an accessor function to access ifr_data. This fixes 32-bit compat (no ioctl command definitions are required as struct ifreq is the same size). This is believed to be sufficient to fully support ifconfig on 32-bit systems. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14900	2018-03-30 18:50:13 +00:00
brooks	a45d44647f	Remove infrastructure for token-ring networks. Reviewed by: cem, imp, jhb, jmallett Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14875	2018-03-28 23:33:26 +00:00
brooks	308f791e1c	Improve copy-and-pasted versions of SIOCGIFADDR. The original implementation used a reference to ifr_data and a cast to do the equivalent of accessing ifr_addr. This was copied multiple times since 1996. Approved by: kib MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14873	2018-03-27 20:51:49 +00:00
brooks	6907bd334c	Fix a whitespace bug missed in refactoring prior to r331641. MFC with: r331641	2018-03-27 18:55:39 +00:00
brooks	0754c526f1	Fix access to ifru_buffer on freebsd32. Make all kernel accesses to ifru_buffer go via access functions which take the process ABI into account and use an appropriate union to access members in the correct place in struct ifreq. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14846	2018-03-27 18:26:50 +00:00
kib	9de215608c	Allow to specify PCP on packets not belonging to any VLAN. According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be considered as untagged, and only PCP and DEI values from the VLAN tag are meaningful. See for instance https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html. Make it possible to specify PCP value for outgoing packets on an ethernet interface. When PCP is supplied, the tag is appended, VLAN id set to 0, and PCP is filled by the supplied value. The code to do VLAN tag encapsulation is refactored from the if_vlan.c and moved into if_ethersubr.c. Drivers might have issues with filtering VID 0 packets on receive. This bug should be fixed for each driver. Reviewed by: ae (previous version), hselasky, melifaro Sponsored by: Mellanox Technologies MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D14702	2018-03-27 15:29:32 +00:00
markj	12dff6d870	Clamp IFLIB_RX_COPY_THRESH to MHLEN in iflib_rxd_pkt_get(). If one has added fields to struct mbuf such that MHLEN is smaller than this threshold (128), iflib_rxd_pkt_get() may otherwise overrun the internal mbuf buffer while copying. Reviewed by: mmacy MFC after: 3 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14843	2018-03-25 23:23:19 +00:00
kp	109a7b5eec	netpfil: Introduce PFIL_FWD flag Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715	2018-03-23 16:56:44 +00:00
melifaro	75159f749d	Use counter(9) api for the bpf(4) statistics. Currently each bpf descriptor uses u64 variables to maintain its counters. On interfaces with high packet rate this leads to unnecessary contention and inaccurate reporting. PR: kern/205320 Reported by: elofu17 at hotmail.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D14726	2018-03-20 22:57:06 +00:00
melifaro	7bb5ee0db4	Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration. Current arp/nd code relies on the feedback from the datapath indicating that the entry is still used. This mechanism is incorporated into the arpresolve()/nd6_resolve() routines. After the inpcb route cache introduction, the packet path for the locally-originated packets changed, passing cached lle pointer to the ether_output() directly. This resulted in the arp/ndp entry expiring each time exactly after the configured max_age interval. During the small window between the ARP/NDP request and reply from the router, most of the packets got lost. Fix this behaviour by plugging datapath notification code to the packet path used by route cache. Unify the notification code by using single inlined function with the per-AF callbacks. Reported by: sthaug at nethelp.no Reviewed by: ae MFC after: 2 weeks	2018-03-17 17:05:48 +00:00
avos	6ddab56276	Correct comment for IFM_IEEE80211_VHT media variant.	2018-03-15 23:32:29 +00:00
ae	d3176e34c7	Define ethernet type 0x88A8 as ETHERTYPE_QINQ. Reviewed by: kp Obtained from: OpenBSD MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D14593	2018-03-06 12:01:31 +00:00
shurd	6bea9d5c59	iflib: stop timer callout when stopping iflib_timer has been seen running after the interface had been removed. This change prevents that. Submitted by: matt.macy@joyent.com	2018-03-02 18:48:07 +00:00
kp	fc599d4911	pf: Cope with overly large net.pf.states_hashsize If the user configures a states_hashsize or source_nodes_hashsize value we may not have enough memory to allocate this. This used to lock up pf, because these allocations used M_WAITOK. Cope with this by attempting the allocation with M_NOWAIT and falling back to the default sizes (with M_WAITOK) if these fail. PR: 209475 Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com> MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14367	2018-02-25 08:56:44 +00:00
rstone	dd56500020	Allow route change requests to not specify the gateway. Only require a gateway to be specified on a route add request. On a route change request that does not specify the gateway, the gateway will remain the same. This allows changing other route parameters without having to re-specify the gateway, like in "route change 10.0.0.0/8 -mtu 9000". Update the route(8) manpage to explicitly call out this usage as being supported. MFC after: 2 weeks Sponsored by: Dell EMC Isilon Reviewed By: eugen (rtsock.c change), rgrimes Differential Revision: https://reviews.freebsd.org/D14291	2018-02-21 19:13:23 +00:00
shurd	6699ee32ca	IFLIB: Make isc_magic unsigned The IFLIB_MAGIC macro is > INT_MAX, so isc_magic should be able to contain it. Reported by: jeb Sponsored by: Limelight Networks	2018-02-21 18:57:00 +00:00
np	a38188ac36	Catch up with the removal of nktr_slot_flags from upstream netmap. No functional impact intended. Submitted by: Vincenzo Maffione <v.maffione@gmail.com>	2018-02-20 21:42:45 +00:00
shurd	c1c5080794	IFLIB: do not remove dmamap on buffer unload Dmamap is created only on IFC attach. If we remove it on buffer release, we won't be able to do ifconfig down&up. Only destroy when in detach. Reported by: wma Reviewed by: wma Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14060	2018-02-20 18:33:45 +00:00
wma	d14bbbd48e	BPF: Switch to 32 bit compatible mode only when thread is 32 bit Sometimes 32 bit and 64 bit ioctls are represented by the same number. This causes an unnecessary switch to 32 bit compatible mode. This patch prevents switching when we are dealing with a 64 bit executable. It fixes issue mentioned here Authored by: Patryk Duda <pdk@semihalf.com> Submitted by: Wojciech Macek <wma@semihalf.com> Reviewed by: andrew, wma Obtained from: Semihalf Sponsored by: IBM, QCM Technologies Differential revision: https://reviews.freebsd.org/D14023	2018-01-25 12:13:41 +00:00
smh	c62edd09fd	Added missing CTLFLAG_VNET to lacp default_strict_mode Added CTLFLAG_VNET to net.link.lagg.lacp.default_strict_mode which was missed in r290450. Reported by: julian@ MFC after: 1 week Sponsored by: Multiplay	2018-01-24 10:13:14 +00:00
rstone	9c794ac899	Increment the route table gen count after a modify Increment the route table generation count after modifying a route. This signals back to TCP connections that they need to update their L2 caches as the gateway for their route may have changed. This is a heavier hammer than is needed, strictly speaking, but route changes will be unlikely enough that the performance effects of invalidating all connection route caches should be negligible. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13990 Reviewed by: karels	2018-01-23 03:15:44 +00:00
rstone	c0c5474ab0	Reduce code duplication for inpcb route caching Add a new macro to clear both the L3 and L2 route caches, to hopefully prevent future instances where only the L3 cache was cleared when both should have been. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13989 Reviewed by: karels	2018-01-23 03:15:39 +00:00
rstone	4fb0175a26	Invalidate inpcb LLE cache if cached route is invalidated When the inpcb route cache is invalidated after a change to the routing tables, we need to invalidate the LLE cache as well. Previous to this change packets for the connection would continue to use the old L2 information from the old L3 gateway, and the packets for the connection would likely be blackholed. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13988 Reviewed by: karels	2018-01-23 03:15:39 +00:00
kib	027c7f4d66	Fix compat32 for sysctl net.PF_ROUTE...NET_RT_IFLISTL. Route messages are aligned to the host long type alignment, which breaks 32bit. Reported and tested by: lwhsu Diagnosed by: Yuri Pankov <yuripv@icloud.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-22 20:49:17 +00:00
pfg	ced875130d	Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9). The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler. Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level. Reported by: wosch PR: 225197	2018-01-21 15:42:36 +00:00
pfg	bf156bc88c	net*: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these are likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the surrounding code could handle NULL values. X-Differential revision: https://reviews.freebsd.org/D13837	2018-01-15 21:21:51 +00:00
smh	a74104797b	Disabled the use of flowid for lagg by default Disabled the use of RSS hash from the network card aka flowid for lagg(4) interfaces by default as it's currently incompatible with the lacp and loadbalance protocols. The incompatibility is due to the fact that the flowid isn't known for the first packet of a new outbound stream which can result in the hash calculation method changing and hence a stream being incorrectly split across multiple interfaces during normal operation. This can be re-enabled by setting the following in loader.conf: net.link.lagg.default_use_flowid="1" Discussed with: kmacy Sponsored by: Multiplay	2018-01-04 20:05:47 +00:00
kp	affaad48ea	pf: Clean all fragments on shutdown When pf is unloaded, or a vnet jail using pf is stopped we need to ensure we clean up all fragments, not just the expired ones.	2017-12-31 10:01:31 +00:00
bryanv	5f0e777be3	Add macro for vxlan list mutex lock and unlock This will simplify some later VNET support. Submitted by: hrs MFC after: 2 weeks	2017-12-30 19:49:40 +00:00
bryanv	10d5e637cc	Advertise IFCAP_LINKSTAT after r326480 added link status support MFC after: 2 weeks	2017-12-30 19:35:12 +00:00
bryanv	acc32109f0	Add support for IPv6 scoped addresses to vxlan MFC after: 2 weeks	2017-12-30 04:03:53 +00:00
shurd	36e9cfb008	Don't pass rids to taskqgroup_attach() As everywhere else, we want to pass rman_get_start(irq->ii_res). This caused set affinity errors when not using MSI-X vectors (legacy and MSI interrupts). Reported by: sbruno Sponsored by: Limelight Networks	2017-12-27 20:42:30 +00:00
shurd	d81b03b75c	Remove assertion that's not true for !EARLY_AP_STARTUP gtask->gt_taskqueue is NULL when EARLY_AP_STARTUP is not enabled. Remove assertion to allow this config to work. Reported by: oleg Sponsored by: Limelight Networks	2017-12-27 19:14:15 +00:00
shurd	49a54f9138	Fix indentation. Sponsored by: Limelight Networks	2017-12-27 19:12:32 +00:00
eadler	421a929b1e	kernel: Fix several typos and minor errors - duplicate words - typos - references to old versions of FreeBSD Reviewed by: imp, benno	2017-12-27 03:23:21 +00:00
kan	c8da6fae2c	Do a pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385	2017-12-25 04:48:39 +00:00
kan	693547b17b	Do not pass NULL pointer to copyout in if_clone_list. Sometimes caller is only interested in how many clones are there and NULL pointer is passed for the destination buffer. Do not pass it to copyout then.	2017-12-23 16:45:24 +00:00
kan	1b88a37f88	Remove some trailing whitespace. Reviewed by: glebius, ae Differential Revision: https://reviews.freebsd.org/D10386	2017-12-23 16:24:00 +00:00
kan	c3dc6afa32	Do not double free the memory in if_clone. if_clone_attach function will drop the reference on failure which will free the if_clone structure. No need to do it second time. Reviewed by: glebius, ae Differential Revision: https://reviews.freebsd.org/D10386	2017-12-23 16:23:58 +00:00
imp	cb06f90265	The device tables end with a sentinel in iflib. Don't include the sentinel in the output.	2017-12-23 04:50:52 +00:00
imp	ff6ebd2b2f	Use '#' rather than some made up name for fields we want to ignore.	2017-12-22 17:53:27 +00:00
kib	b287d6f562	Fix build for kernels with SCHED_4BSD. Sponsored by: The FreeBSD Foundation	2017-12-21 23:05:13 +00:00
shurd	4d16fa9f49	Don't call tcp_lro_rx() unless hardware verified TCP/UDP csum It seems that tcp_lro_rx() doesn't verify TCP checksums, so if there are bad checksums in the packets caused by invalid data, the invalid data will pass through without errors. This was noticed with the igb driver and a specific internet host: fetch http://www.mpfr.org/mpfr-current/mpfr-3.1.6.tar.xz -o test.bin && sha256 test.bin Would result in a different value sometimes. This ends up making LRO require RXCSUM to be enabled, and RXCSUM to support TCP and UDP checksums. PR: 224346 Reported by: gjb Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D13561	2017-12-21 01:22:36 +00:00
lwhsu	065607f663	Add missing `;` Approved by: kevlo	2017-12-20 06:08:16 +00:00
shurd	607c735784	Support attaching tx queues to cpus This will attempt to use a different thread/core on the same L2 cache when possible, or use the same cpu as the rx thread when not. If SMP isn't enabled, don't go looking for cores to use. This is mostly useful when using shared TX/RX queues. Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12446	2017-12-20 01:03:34 +00:00
shurd	0cf83abb13	Update Matthew Macy contact info Email address has changed, uses consistent name (Matthew, not Matt) Reported by: Matthew Macy <mmacy@mattmacy.io> Differential Revision: https://reviews.freebsd.org/D13537	2017-12-19 17:59:00 +00:00
ae	84904a6912	Fix possible memory leak. vxlan_ftable entries are sorted in ascending order, due to wrong arguments order it is possible to stop search before existing element will be found. Then new element will be allocated in vxlan_ftable_update_locked() and can be inserted in the list second time or trigger MPASS() assertion with enabled INVARIANTS. PR: 224371 MFC after: 1 week	2017-12-16 14:36:21 +00:00
rstone	01dbe7a2a4	Plug an ifaddr leak when changing a route's src If a route is modified in a way that changes the route's source address (i.e. the address used to access the gateway), then a reference on the ifaddr representing the old source address will be leaked if the address type does not have an ifa_rtrequest method defined. Plug the leak by releasing the reference in all cases. Differential Revision: https://reviews.freebsd.org/D13417 Reviewed by: ae MFC after: 3 weeks Sponsored by: Dell	2017-12-14 20:48:50 +00:00
shurd	b5324f21c0	Increment encap_pad_mbuf_fail when m_dup() fails in padding Previously, the counter was only incremented when m_append() failed. Since the function can also fail on m_dup() now, increment the counter there as well. Sponsored by: Limelight Networks	2017-12-11 20:01:28 +00:00
shurd	01b240c231	Free mbuf chain when m_dup fails Fix memory leak where mbuf chain wasn't free()d if iflib_ether_pad() has a failure in m_dup(). Reported by: "Ryan Stone" <rysto32@gmail.com> Sponsored by: Limelight Networks	2017-12-08 19:50:06 +00:00
shurd	c37f5a169b	Handle read-only mbufs in iflib ether pad function If ethernet padding is enabled, and a read-only mbuf is passed, it would modify the mbuf using m_append(). Instead, call m_dup() and append to the new packet. Reported by: Pyun YongHyeon Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D13414	2017-12-08 18:43:31 +00:00
glebius	9c54c9c64c	Garbage collect IFCAP_POLLING_NOCOUNT. It wasn't used since very beginning of polling(4). The module always ignored return value from driver polling handler.	2017-12-06 23:03:34 +00:00
shurd	923e962b98	iflib: Support padding Ethernet frames to a min size Some bnxt devices do not correctly send frames smaller than 52 bytes (without CRC), so add a quirk that will pad frames to an arbitrary size before passing off to the encap routine. Reported by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D13269	2017-12-05 21:00:31 +00:00
shurd	f54cf2fa3b	Avoid calling CURVNET_[SET|RESTORE] for each packet The LRO possible test was calling CURVNET_SET once for IPv4 or IPv6 for each packet in a chain. Only call it once per chain instead. Submitted by: Matthew Macy <mmacy@mattmacy.io> Reviewed by: cem, ae Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D13368	2017-12-05 20:43:24 +00:00
erj	87656821dc	ifconfig(8): Display extended compliance code string for SFP transceivers - Updates tables in affected files with new entries from newer spec revisions of SFF-8472, SFF-8024, and SFF-8636 - Change ifconfig to read and display the extended compliance code for SFP media if the extended compliance code is not 0. This was being displayed for QSFP transceivers only, but SFP28 media uses this to report 25G capability. Reviewed by: melifaro, sbruno Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D13286	2017-12-05 18:42:07 +00:00
bryanv	8a35018f48	Add if media and link status events to vxlan PR: 214359 MFC after: 2 weeks	2017-12-02 22:04:00 +00:00
shurd	76ca06b62f	Add support for SIOCGIFXMEDIA to iflib SIOCGIFXMEDIA is required for extended ethernet media types, but iflib did not support it. Reported by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D13312	2017-12-01 17:58:20 +00:00
hselasky	923fadf4e8	Properly define the VLAN_XXX() function macros to avoid miscompilation when used inside "if" statements comparing with another value. Detailed explanation: "if (a ? b : c != 0)" is not the same as "if ((a ? b : c) != 0)" which is the expected behaviour of a function macro. Affects: toecore, linuxkpi and ibcore. Reviewed by: kib MFC after: 3 days Sponsored by: Mellanox Technologies	2017-11-30 11:35:22 +00:00
shurd	23eaeecabf	Fix comment introduced in r326369 The code uses the set of all CPUs, it doesn't zero out the set. Sponsored by: Limelight Networks	2017-11-29 18:21:17 +00:00
shurd	0de624aa47	Ensure that ctx->ifc_cpus is always initialized If a device didn't support MSI-X, ctx->ifc_cpus would not be initialized, but the IRQ allocation routines still use the value. Move the initialization to common code. Sponsored by: Limelight Networks	2017-11-29 18:14:57 +00:00
hselasky	30eed323c8	Disallow TUN and TAP character device IOCTLs to modify the network device type to any value. This can cause page faults and panics due to accessing uninitialized fields in the "struct ifnet" which are specific to the network device type. MFC after: 1 week Found by: jau@iki.fi PR: 223767 Sponsored by: Mellanox Technologies	2017-11-29 09:40:11 +00:00
pfg	78a6b08618	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual, error-prone task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
shurd	4334349090	Fix off-by-one error in bit_nclear() usage bit_nclear() takes the bit numbers for the start and end bits, not the start and a count. This was resulting in memory corruption past the end of the bitstr_t. Sponsored by: Limelight Networks	2017-11-20 21:57:04 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
kib	1df1702579	Fix build. Sponsored by: The FreeBSD Foundation	2017-11-19 11:21:16 +00:00
pfg	9da7bdde06	spdx: initial adoption of licensing ID tags. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Initially, only tag files that use BSD 4-Clause "Original" license. RelNotes: yes Differential Revision: https://reviews.freebsd.org/D13133	2017-11-18 14:26:50 +00:00
shurd	0708b7cc37	Fix default numbers of iflib queue sets The intent appears to be having one RX/TX queue set per core, but since scctx->isc_n[tr]xqsets is set to max before calling iflib_msix_init(), both end up being set to total number of cores. Use ctx->ifc_sysctl_n[rt]xqs as the selected value and scctx->isc_n[rt]xqsets as the max. This should result in what appears to be the intended behaviour Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D13096	2017-11-16 18:52:58 +00:00
antoine	870ff83df6	Do not leak control in raw_usend	2017-11-08 23:20:05 +00:00
kib	ce9362dfb8	Add a place for a driver to report rx timestamps in nanoseconds from boot for the received packets. The rcv_tstmp field overlaps the place of Ln header length indicators, not used by received packets. The basic pkthdr rearrangement change in sys/mbuf.h was provided by gallatin. There are two accompanying M_ flags: M_TSTMP means that there is the timestamp (and it was generated by hardware). Another flag M_TSTMP_HPREC indicates that the timestamp is high-precision. Practically M_TSTMP_HPREC means that hardware provided additional precision compared with the stamps when the flag is not set. E.g., for ConnectX all packets are stamped by hardware when PCIe transaction to write out the completion descriptor is performed, but PTP packets are stamped on port. For Intel cards, when PTP assist is enabled, only PTP packets are stamped in the limited number of registers, so if Intel cards ever start supporting this mechanism, they would always set M_TSTMP | M_TSTMP_HPREC if hardware timestamp is present for the given packet. Add IFCAP_HWRXTSTMP interface capability to indicate the support for hardware rx timestamping, and ifconfig(8) command to toggle it. Based on the patch by: gallatin Reviewed by: gallatin (previous version), hselasky Sponsored by: Mellanox Technologies MFC after: 2 weeks (? mbuf KBI issue) X-Differential revision: https://reviews.freebsd.org/D12638	2017-11-07 09:29:14 +00:00
sbruno	4824ebc09f	Fix NOINET/NOINET6 build during compilation of iflib. Reported by: kib	2017-11-06 19:54:25 +00:00
shurd	230873ed5b	Only chain non-LRO mbufs when LRO is not possible Preserve packet order between tcp_lro_rx() and if_input() to avoid creating extra corner cases. If no packets can be LROed, combine them into one chain for submission via if_input(). If any packet can potentially be LROed however, retain old behaviour and call if_input() for each packet. This should keep the 12% improvement for small packet forwarding intact, but mostly avoids impacting the LRO case. Reviewed by: cem, sbruno Approved by: sbruno (mentor) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12876	2017-11-06 16:23:21 +00:00
eugen	8ac1be2af4	Allow a process to assign an IP address to local ppp interface even if kernel routing table already has a route to the address in question installed by some routing daemon (PR 223129). Also, allow loopback route deletion when stopping a VIMAGE jail (PR 222647). PR: 222647, 223129 Reviewed by: gnn Approved by: avg (mentor), mav (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12747	2017-11-05 14:41:48 +00:00
kp	cbe96f9524	epair: Fix panic on unload The VNET_SYSUNINIT() callback is executed after the MOD_UNLOAD. That means that netisr_unregister() has already been called when netisr_unregister_vnet() gets called, leading to an assertion failure. Restore the expected order of operations by performing everything that was done in MOD_UNLOAD to a SYSUNINIT() (that will be called after the VNET_SYSUNINIT()). Differential Revision: https://reviews.freebsd.org/D12771	2017-11-01 14:27:26 +00:00

4079 Commits