freebsd-skq

Author	SHA1	Message	Date
Kristof Provost	fac24ad7f0	bridge: Simplify mac address generation Unconditionally use ether_gen_addr() to generate bridge mac addresses. This function is now less likely to generate duplicate mac addresses across jails. The old hand rolled hostid based code adds no value. Reviewed by: bz Differential Revision: https://reviews.freebsd.org/D24432	2020-04-18 08:00:58 +00:00
Kristof Provost	3f8bc99c4c	ethersubr: Make the mac address generation more robust If we create two (vnet) jails and create a bridge interface in each we end up with the same mac address on both bridge interfaces. These very often conflict, resulting in the same mac address in both jails. Mitigate this problem by including the jail name in the mac address. Reviewed by: kevans, melifaro MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24383	2020-04-18 07:50:30 +00:00
Alexander V. Chernikov	ae4b62595e	Unbreak build by reverting if_bridge part of r360047. Pointy hat to: melifaro	2020-04-17 18:22:37 +00:00
Alexander V. Chernikov	6745294280	Finish r191148: replace rtentry with route in if_bridge if_output() callback. Generic if_output() callback signature was modified to use struct route instead of struct rtentry in r191148, back in 2009. Quoting commit message: Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Fix bridge_output() to match this signature and update the remaining comment in if_var.h. Reviewed by: kp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24394	2020-04-17 17:05:58 +00:00
Adrian Chadd	2538a4b1b4	Remove a duplicate definition of nhops_dump_sysctl() One of the source files included both nhop.h and shared.h, leading to this clashing. Tested with: mips-gcc cross toolchain	2020-04-16 23:28:47 +00:00
Alexander V. Chernikov	5cdc484410	Fix userland build broken by r360014.	2020-04-16 17:53:23 +00:00
Alexander V. Chernikov	539642a29d	Add nhop parameter to rti_filter callback. One of the goals of the new routing KPI defined in r359823 is to entirely hide `struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver more features much faster. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. Additionally, with the followup multipath changes, single rtentry can point to multiple nexthops. With that in mind, convert rti_filter callback used when traversing the routing table to accept pair (rt, nhop) instead of nexthop. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24440	2020-04-16 17:20:18 +00:00
Alexander V. Chernikov	dd4776f0cc	Reorganise nd6 notification code to avoid direct rtentry field access. One of the goals of the new routing KPI defined in r359823 is to entirely hide `struct rtentry` from the consumers. Doing so will allow to improve routing subsystem internals and deliver features more easily. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. It introduces rtfree_func() wrapper around RTFREE() and reorganises nd6 notification code to avoid accessing most of the rtentry fields. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24404	2020-04-14 22:48:33 +00:00
Alexander V. Chernikov	1968b7eb21	Postpone multipath seed init till SI_SUB_LAST, as it is needed only after some userland program installs multiple paths to the same destination. While here, make multipath init conditional. Discussed with: cem,ian	2020-04-14 07:38:34 +00:00
Andrew Gallatin	bd673b9942	lagg: stop double-counting output errors and counting drops as errors Before this change, lagg double-counted errors from lagg members, and counted every drop by a lagg member as an error. Eg, if lagg sent a packet, and the underlying hardware driver dropped it, a counter would be incremented by both lagg and the underlying driver. This change attempts to fix that by incrementing lagg's counters only for errors that do not come from underlying drivers. Reviewed by: hselasky, jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24331	2020-04-13 23:06:56 +00:00
Alexander V. Chernikov	a666325282	Introduce nexthop objects and new routing KPI. This is the foundational change for the routing subsystem rearchitecture. More details and goals are available in https://reviews.freebsd.org/D24141 . This patch introduces concept of nexthop objects and new nexthop-based routing KPI. Nexthops are objects, containing all necessary information for performing the packet output decision. Output interface, mtu, flags, gw address goes there. For most of the cases, these objects will serve the same role as the struct rtentry is currently serving. Typically there will be low tens of such objects for the router even with multiple BGP full-views, as these objects will be shared between routing entries. This allows to store more information in the nexthop. New KPI: struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, uint32_t flowid); struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid, uint32_t flags, uint32_t flowid); These 2 functions are intended to replace all flavours of <in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous fib[46]-generation functions. Upon successful lookup, they return nexthop object which is guaranteed to exist within current NET_EPOCH. If longer lifetime is desired, one can specify NHR_REF as a flag and get a referenced version of the nexthop. Reference semantic closely resembles rtentry one, allowing sed-style conversion. Additionally, another 2 functions are introduced to support uRPF functionality inside variety of our firewalls. Their primary goal is to hide the multipath implementation details inside the routing subsystem, greatly simplifying firewalls implementation: int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, const struct ifnet *src_if); int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid, uint32_t flags, const struct ifnet *src_if); All functions have a separate scopeid argument, paving way to eliminating IPv6 scope embedding and allowing to support IPv4 link-locals in the future. Structure changes: * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size. * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz. Old KPI: During the transition state old and new KPI will coexist. As there are another 4-5 decent-sized conversion patches, it will probably take a couple of weeks. To support both KPIs, fields not required by the new KPI (most of rtentry) have to be kept, resulting in the temporary size increase. Once conversion is finished, rtentry will notably shrink. More details: * architectural overview: https://reviews.freebsd.org/D24141 * list of the next changes: https://reviews.freebsd.org/D24232 Reviewed by: ae,glebius(initial version) Differential Revision: https://reviews.freebsd.org/D24232	2020-04-12 14:30:00 +00:00
Alexander V. Chernikov	62d95afacb	Fix build by adding forgotten header to radix_mpath.c after r359797.	2020-04-11 09:38:45 +00:00
Alexander V. Chernikov	4684d3cbcb	Remove per-AF radix_mpath initialization functions. Split their functionality by moving random seed allocation to SYSINIT and calling (new) generic multipath function from standard IPv4/IPv6 RIB init handlers. Differential Revision: https://reviews.freebsd.org/D24356	2020-04-11 07:37:08 +00:00
Alexander V. Chernikov	aef2d5fb0e	Split rtrequest1_fib() into smaller manageable chunks. No functional changes. * Move route addition / route deletion code from rtrequest1_fib() to add_route() and del_route() respectively. * Rename rtrequest1_fib_change() to change_route() for consistency. * Shrink the scope of ugly info #defines. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24349	2020-04-10 16:27:27 +00:00
Kristof Provost	dd00a42a6b	bridge: Change lists to CK_LIST as a preparation for epochification Prepare the ground for a rework of the bridge locking approach. We will use an epoch-based approach in the datapath, making it safe to iterate over the interface, span and rtnode lists without holding the BRIDGE_LOCK. Replace the relevant lists by their ConcurrencyKit equivalents. No functional change in this commit. Reviewed by: emaste, ae, philip (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24249	2020-04-05 17:15:20 +00:00
Ed Maste	aeb665b538	remove extraneous double ;s in sys/	2020-03-30 16:04:25 +00:00
Mark Johnston	59d50fe5ef	Simplify taskqgroup initialization. taskqgroup initialization was broken into two steps: 1. allocate the taskqgroup structure, at SI_SUB_TASKQ; 2. initialize taskqueues, start taskqueue threads, enqueue "binder" tasks to bind threads to specific CPUs, at SI_SUB_SMP. Step 2 tries to handle the case where tasks have already been attached to a queue, by migrating them to their intended queue. In particular, tasks can't be enqueued before step 2 has completed. This breaks NFS mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP is not defined, since mountroot happens before SI_SUB_SMP in this case. Simplify initialization: do all initialization except for CPU binding at SI_SUB_TASKQ. This means that until CPU binding is completed, group tasks may be executed on a CPU other than that to which they were bound, but this should not be a problem for existing users of the taskqgroup KPIs. Reported by: sbruno Tested by: bdragon, sbruno MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24188	2020-03-30 14:22:52 +00:00
Conrad Meyer	704101dd3d	Fix PNP matching for iflib NIC drivers The previous descriptor string specified that all fields were significant for match. However, the only significant fields for in-tree drivers are vendor:devid, and the fictitious zero values constructed by PVID() did not match real subvendor, subdevice, revision, and/or class values, resulting in no automatic probe. If a future iflib driver needs to match on other criteria, the descriptor string can be updated accordingly. (E.g., "V32" and ~0 for unspecified values in PVID().) Reported by: mav Sponsored by: Dell EMC Isilon	2020-03-24 19:20:10 +00:00
Ed Maste	ed6611cc8c	iflib: simplify MPASS assertion Submitted by: andrew	2020-03-24 17:54:34 +00:00
Ed Maste	68af0153a7	iflib: split compound assertion ThunderX cluster systems are panicking on boot with a failed assertion MPASS(gtask != NULL && gtask->gt_taskqueue != NULL). Split the assertion so that it's clear which part is failing.	2020-03-24 17:25:56 +00:00
Patrick Kelsey	876996910a	Remove extraneous code from iflib ifsd_cidx is never used, and the line removed from rxd_frag_to_sd() is just dead code. Reviewed by: erj, gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23951	2020-03-14 20:13:42 +00:00
Patrick Kelsey	3caff1885f	Remove refill budget from iflib Reviewed by: gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23948	2020-03-14 19:58:50 +00:00
Patrick Kelsey	b38136097a	Allow iflib drivers to specify the buffer size used for each receive queue Reviewed by: erj, gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23947	2020-03-14 19:56:46 +00:00
Patrick Kelsey	e503049067	Remove freelist contiguous-indexes assertion from rxd_frag_to_sd() The vmx driver is an example of an iflib driver that might report packets using non-contiguous descriptors (with unused descriptors either between received packets or between the fragments of a received packet), so this assertion needs to be removed. For such drivers, the freelist producer and consumer indexes don't relate directly to driver ring slots (the driver deals directly with freelist buffer indexes supplied by iflib during refill, and reports them with each fragment during packet reception), but do continue to be used by iflib for accounting, such as determining the number of ring slots that are refillable. PR: 243126, 243392, 240628 Reported by: avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer Reviewed by: gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23946	2020-03-14 19:55:05 +00:00
Patrick Kelsey	4f2beb721b	Fix iflib zero-length fragment handling The dmamap for zero-length fragments should not be unloaded, as doing so breaks the cluster-reuse logic in _iflib_fl_refill(). All zero-length fragments are now handled by the assemble_segments() path so that the cluster-reuse logic there does not have to be replicated in the small-single-fragment-packet path of iflib_rxd_pkt_get(). Packets consisting entirely of zero-length fragments (which result in a NULL mbuf pointer) are now properly tolerated. This allows drivers (such as the vmx driver) to pass such packets to iflib when a descriptor error occurs during packet reception, the advantage being that the refill of descriptors associated with the error packet is handled via the existing iflib machinery without having to duplicate parts of that machinery in the driver to handle that error case. Reviewed by: avg, erj, gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23945	2020-03-14 19:51:55 +00:00
Patrick Kelsey	9e9b738ac5	Fix iflib freelist state corruption This fixes a bug in iflib freelist management that breaks the required correspondence between freelist indexes and driver ring slots. PR: 243126, 243392, 240628 Reported by: avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer Reviewed by: avg, gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23943	2020-03-14 19:43:44 +00:00
Andrew Gallatin	98085bae8c	make lacp's use_numa hashing aware of send tags When I did the use_numa support, I missed the fact that there is a separate hash function for send tag nic selection. So when use_numa is enabled, ktls offload does not work properly, as it does not reliably allocate a send tag on the proper egress nic since different egress nics are selected for send-tag allocation and packet transmit. To fix this, this change: - refactors lacp_select_tx_port_by_hash() and lacp_select_tx_port() to make lacp_select_tx_port_by_hash() always called by lacp_select_tx_port() - pre-shifts flowids to convert them to hashes when calling lacp_select_tx_port_by_hash() - adds a numa_domain field to if_snd_tag_alloc_params - plumbs the numa domain into places where we allocate send tags In testing with NIC TLS setup on a NUMA machine, I see thousands of output errors before the change when enabling kern.ipc.tls.ifnet.permitted=1. After the change, I see no errors, and I see the NIC sysctl counters showing active TLS offload sessions. Reviewed by: rrs, hselasky, jhb Sponsored by: Netflix	2020-03-09 13:44:51 +00:00
Bjoern A. Zeeb	3818c25a1d	Implement optional table entry limits for if_llatbl. Implement counting of table entries linked on a per-table base with an optional (if set > 0) limit of the maximum number of table entries. For that the public lltable_link_entry() and lltable_unlink_entry() functions as well as the internal function pointers change from void to having an int return type. Given no consumer currently sets the new llt_maxentries this can be committed on its own. The moment we make use of the table limits, the callers of the link function must check the return value as it can change and entries might not be added. Adjustments for IPv6 (and possibly IPv4) will follow. Sponsored by: Netflix (originally) Reviewed by: melifaro MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22713	2020-03-04 17:17:02 +00:00
Brooks Davis	8ad798ae9a	Expose ifr_buffer_get_(buffer|length) outside if.c. This is a preparatory commit for D23933. Reviewed by: jhb	2020-03-03 18:05:11 +00:00
Alexander V. Chernikov	ea2773323c	Fix dynamic redirects by adding forgotten RTF_HOST flag. Improve tests to verify the generated route flags. Reported by: jtl MFC after: 2 weeks	2020-03-03 15:33:43 +00:00
Kyle Evans	072afcdffc	if_edsc: generate an arbitrary MAC address When generating a cloned interface instance in edsc_clone_create(), generate a MAC address from the FF OUI with ether_gen_addr(). This allows us to have unique local-link addresses. Previously, the MAC address was zero. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D23842	2020-03-02 02:45:57 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Randall Stewart	d7313dc6f5	This commit expands tcp_ratelimit to be able to handle cards like the mlx-c5 and c6 that require a "setup" routine before the tcp_ratelimit code can declare and use a rate. I add the setup routine to if_var as well as fix tcp_ratelimit to call it. I also revisit the rates so that in the case of a mlx card of type c5/6 we will use about 100 rates concentrated in the range where the most gain can be had (1-200Mbps). Note that I have tested these on a c5 and they work and perform well. In fact in an unloaded system they pace right to the correct rate (great job mlx!). There will be a further commit here from Hans that will add the respective changes to the mlx driver to support this work (which I was testing with). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D23647	2020-02-26 13:48:33 +00:00
Kristof Provost	33b1fe11a2	bridge: Move locking defines into if_bridge.c The locking defines for if_bridge used to live in if_bridgevar.h, but they're only ever used by the bridge implementation itself (in if_bridge.c). Moving them into the .c file. Reported by: philip, emaste Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23808	2020-02-26 08:47:18 +00:00
Gleb Smirnoff	e87c494015	Although most of the NIC drivers are epoch ready, due to peer pressure switch over to opt-in instead of opt-out for epoch. Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch when processing its packets. Now this will create recursive entrance in epoch in >90% network drivers, but will guarantee safeness of the transition. Mark several tested drivers as IFF_KNOWSEPOCH. Reviewed by: hselasky, jeff, bz, gallatin Differential Revision: https://reviews.freebsd.org/D23674	2020-02-24 21:07:30 +00:00
Bjoern A. Zeeb	10108cb673	Partially revert VNET change and expand VNET structure. Revert parts of r353274 replacing vnet_state with a shutdown flag. Not having the state flag for the current SI_SUB_* makes it harder to debug kernel or module panics related to VNET bringup or teardown. Not having the state also does not allow us to check for other dependency levels between components, e.g. for moving interfaces. Expand the VNET structure with the new boolean flag indicating that we are doing a shutdown of a given vnet and update the vnet magic cookie for the change. Update libkvm to compile with a bool in the kernel struct. Bump __FreeBSD_version for (external) module builds to more easily detect the change. Reviewed by: hselasky MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23097	2020-02-17 11:08:50 +00:00
Hans Petter Selasky	bacb11c9ed	Fix kernel panic while trying to read multicast stream. When VIMAGE is enabled make sure the "m_pkthdr.rcvif" pointer is set for all mbufs being input by the IGMP/MLD6 code. Else there will be a NULL-pointer dereference in the netisr code when trying to set the VNET based on the incoming mbuf. Add an assert to catch this when queueing mbufs on a netisr to make debugging of similar cases easier. Found by: Vladislav V. Prodan PR: 244002 Reviewed by: bz@ MFC after: 1 week Sponsored by: Mellanox Technologies	2020-02-17 09:46:32 +00:00
Hans Petter Selasky	f98977b521	Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process incoming packets in taskqueue context. This patch extends r357772. Tested by: yp@mm.st Sponsored by: Mellanox Technologies	2020-02-12 09:19:47 +00:00
Hans Petter Selasky	fb1a29b45e	Make sure the so-called end of receive interrupts don't starve in iflib. When the receive ring cannot be filled with mbufs, due to lack of memory, no more interrupts may be generated to fill the receive ring later on. Make sure to have a watchdog, to try refilling the receive ring from time to time, hopefully when more mbufs are available. Differential Revision: https://reviews.freebsd.org/D23315 MFC after: 1 week Reviewed by: gallatin@ Sponsored by: Mellanox Technologies	2020-02-12 08:30:07 +00:00
Gleb Smirnoff	6c3e93cb5a	Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process incoming packets in taskqueue context. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D23518	2020-02-11 18:57:07 +00:00
Konstantin Belousov	5d1277ca9a	if_media.h: Add 50G KR4 ethernet media type. Submitted by: Adam Peace <adam.e.peace@gmail.com> Reviewed by: hselasky Differential revision: https://reviews.freebsd.org/D23620	2020-02-11 18:03:45 +00:00
Konstantin Belousov	48ad3b215c	if_media.c: staticize and constify ifmedia description structures used under IFMEDIA_DEBUG. The reason for this change is to make it clear the scope of the in-kernel usage of IFM_TYPE_DESCRIPTIONS and IFM_SUBTYPE_ETHERNET_DESCRIPTIONS macros. Also it is somewhat better C. Reviewed by: hselasky Sponsored by: Mellanox Technologies Differential revision: https://reviews.freebsd.org/D23620	2020-02-11 17:45:01 +00:00
Konstantin Belousov	a249895df8	if_media.c: use __FBSDID(). Reviewed by: hselasky Sponsored by: Mellanox Technologies Differential revision: https://reviews.freebsd.org/D23620	2020-02-11 17:41:45 +00:00
Pedro F. Giffuni	a1b769b32d	typo: stray spaces. No functional change	2020-02-07 15:16:04 +00:00
Jeff Roberson	cd0be8b2ed	Temporarily force IFF_NEEDSEPOCH until drivers have been resolved. Recent network epoch changes have left some drivers unexpectedly broken and there is not yet a consensus on the correct fix. This patch has a minor performance impact until we can agree on the correct path forward. Reviewed by: core, network, imp, glebius, hselasky Differential Revision: https://reviews.freebsd.org/D23515	2020-02-06 20:47:50 +00:00
Pedro F. Giffuni	abfc5e8591	ethernet: Add a couple more Ethertypes. Powerlink and Sercos III are used in automation. Both have been standardized and in the case of Ethernet Powerlink there is a BSD-licensed stack.	2020-02-05 19:11:07 +00:00
Pedro F. Giffuni	2a1481fbbf	typo: Registration. Pointed by: Dikshie Fauzie	2020-02-03 02:02:13 +00:00
Pedro F. Giffuni	ad2b6d4e9b	ethernet: Minor cleanup. Consistently use uppercase for ethertype hex numbers.	2020-02-03 01:08:15 +00:00
Pedro F. Giffuni	b33c19776b	style(9): Fix spaces after #define. No functional change.	2020-02-02 19:02:07 +00:00
Pedro F. Giffuni	682397c263	ethernet: add some more Ethertypes. Sort ETHERTYPE_FCOE, from r357414.	2020-02-02 18:33:20 +00:00

4333 Commits