freebsd-skq

Author	SHA1	Message	Date
Alexander V. Chernikov	8a0d57baec	[fib algo] Delay algo init at fib growth to allow reliable use of the rib KPI. Currently, most of the rib(9) KPI does not use rnh pointers, using fibnum and family parameters to determine the rib pointer instead. This works well except for the case when we initialize new rib pointers during fib growth. In that case, there is no mapping between fib/family and the new rib, as an entirely new rib pointer array is populated. Address this by delaying fib algo initialization till after switching to the new pointer array and updating the number of fibs. Set datapath pointer to the dummy function, so the potential callers won't crash the kernel in the brief moment when the rib exists, but no fib algo is attached. This change avoids creating duplicates of existing rib functions with altered signatures. Differential Revision: https://reviews.freebsd.org/D29969 MFC after: 1 week	2021-04-27 22:10:08 +00:00
Alexander V. Chernikov	439d087d0b	[fib algo] always commit static routes synchronously. Modular fib lookup framework features logic that allows route update batching for the algorithms that cannot easily apply the routing change without rebuilding. As a result, dataplane lookups may return old data until the sync takes place. With the default sync timeout of 50ms, it is possible that a new binary such as ping(8) executed right after route(8) will still use the old fib data. To address some aspects of the problem, the framework executes all rtable changes without RTF_GATEWAY synchronously. To fix the aforementioned problem, this diff extends synchronous execution to all RTF_STATIC routes (e.g. ones maintained by route(8)). This fixes a bunch of tests in the networking space. Reported by: ci, arichardson MFC after: 2 weeks	2021-04-27 08:31:40 +00:00
Alexander V. Chernikov	25682e6a49	Fix rtsock sockaddr alignment. `b31fbebeb3` introduced alloc_sockaddr_aligned() which, in fact, failed to produce aligned addresses. Reported by: Oskar Holmlund <oskar.holmlund at yahoo.com> MFC after: immediately	2021-04-27 08:04:19 +00:00
Alexander V. Chernikov	bc5ef45aec	Fix dtrace CTF for the rib_head. `33cb3cb2e3` introduced an `rib_head` structure field under the FIB_ALGO define. This may be problematic for the CTF, as some of the files including `route_var.h` do not have `FIB_ALGO` defined. Make dtrace happy by making the field unconditional. Suggested by: markj	2021-04-27 07:47:53 +00:00
Kristof Provost	5f5bf88949	pfsync: Expose PFSYNCF_OK flag to userspace Add 'syncok' field to ifconfig's pfsync interface output. This allows userspace to figure out when pfsync has completed the initial bulk import. Reviewed by: donner MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29948	2021-04-26 14:31:17 +02:00
Kristof Provost	6fcc8e042a	pf: Allow multiple labels to be set on a rule Allow up to 5 labels to be set on each rule. This offers more flexibility in using labels. For example, it replaces the custom 'schedule' keyword used by pfSense to terminate states according to a schedule. Reviewed by: glebius MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29936	2021-04-26 14:14:21 +02:00
Patrick Kelsey	ca7005f189	iflib: Improve mapping of TX/RX queues to CPUs iflib now supports mapping each (TX,RX) queue pair to the same CPU (default), to separate CPUs, or to a pair of physical and logical CPUs that share the same L2 cache. The mapping mechanism supports unequal numbers of TX and RX queues, with the excess queues always being mapped to consecutive physical CPUs. When the platform cannot distinguish between physical and logical CPUs, all are treated as physical CPUs. See the comment on get_cpuid_for_queue() for the entire matrix. The following device-specific tunables influence the mapping process: dev.<device>.<unit>.iflib.core_offset (existing) dev.<device>.<unit>.iflib.separate_txrx (existing) dev.<device>.<unit>.iflib.use_logical_cores (new) The following new, read-only sysctls provide visibility of the mapping results: dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu When an iflib driver allocates TX softirqs without providing reference RX IRQs, iflib now binds those TX softirqs to CPUs using the above mapping mechanism (that is, treats them as if they were TX IRQs). Previously, such bindings were left up to the grouptaskqueue code and thus fell outside of the iflib CPU mapping strategy. Reviewed by: kbowling Tested by: olivier, pkelsey MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D24094	2021-04-26 01:06:34 -04:00
Alexander V. Chernikov	7d222ce3c1	Fix NOINET[6],!VIMAGE builds after FIB_ALGO addition to GENERIC Reported by: jbeich PR: 255390	2021-04-21 05:53:42 +01:00
Alexander V. Chernikov	67372fb3e0	Fix NOINET[6] build after enabling FIB_ALGO in GENERIC. Submitted by: jbeich PR: 255389	2021-04-21 02:49:18 +01:00
Alexander V. Chernikov	c23385612d	[fib algo] Do not print algo attach/detach message on boot MFC after: 1 day	2021-04-25 08:58:06 +00:00
Alexander V. Chernikov	a81e2e7890	Make gcc happy by initializing error in rib_handle_ifaddr_info().	2021-04-25 08:44:59 +00:00
Stefan Eßer	6409e59427	Fix build with gcc Correctly declare function without arguments as f(void) instead of f().	2021-04-25 10:15:17 +02:00
Alexander V. Chernikov	5d1403a79a	[rtsock] Enforce netmask/RTF_HOST consistency. Traditionally we had 2 sources of information on whether an add/delete route request targets a network or a host route: the netmask (RTA_NETMASK) and the RTF_HOST flag. The former is tricky: the netmask can be empty or can explicitly specify the host netmask. Parsing a netmask sockaddr requires per-family parsing, which rtsock code traditionally avoided. As a result, consistency was not enforced and it was possible to specify a network with the RTF_HOST flag and vice versa. Continue normalization efforts from D29826 and D29826 and ensure that the RTF_HOST flag always reflects host/network data from the netmask field. Differential Revision: https://reviews.freebsd.org/D29958 MFC after: 2 days	2021-04-24 22:41:27 +00:00
Mark Johnston	8e8f1cc9bb	Re-enable network ioctls in capability mode This reverts a portion of `274579831b` ("capsicum: Limit socket operations in capability mode") as at least rtsol and dhcpcd rely on being able to configure network interfaces while in capability mode. Reported by: bapt, Greg V Sponsored by: The FreeBSD Foundation	2021-04-23 09:22:49 -04:00
Andrew Gallatin	3183d0b680	iflib: initialize LRO unconditionally Changes to the LRO code have exposed a bug in iflib where devices which are not capable of doing LRO are still calling tcp_lro_flush_all(), even when they have not initialized the LRO context. This used to be mostly harmless, but the LRO code now sets the VNET based on the ifp in the lro context and will try to access it through a NULL ifp resulting in a panic at boot. To fix this, we unconditionally initialize LRO so that we have a valid LRO context when calling tcp_lro_flush_all(). One alternative is to check the device capabilities before calling tcp_lro_flush_all() or to add a new state flag in the ctx. However, it seems unwise to add an extra, mostly useless test for higher performance devices when we can just initialize LRO for all devices. Reviewed by: erj, hselasky, markj, olivier Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D29928	2021-04-23 05:55:20 -04:00
Alexander V. Chernikov	33cb3cb2e3	Fix rib generation count for fib algo. Currently, PCB caching mechanism relies on the rib generation counter (rnh_gen) to invalidate cached nhops/LLE entries. With certain fib algorithms, it is now possible that the datapath lookup state applies RIB changes with some delay. In that scenario, PCB cache will invalidate on the RIB change, but the new lookup may result in the same nexthop being returned. When fib algo finally gets in sync with the RIB changes, PCB cache will not receive any notification and will end up caching the stale data. To fix this, introduce additional counter, rnh_gen_rib, which is used only when FIB_ALGO is enabled. This counter is incremented by the control plane. Each time fib algo synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value. Differential Revision: https://reviews.freebsd.org/D29812 Reviewed by: donner MFC after: 2 weeks	2021-04-20 22:02:41 +00:00
Alexander V. Chernikov	b31fbebeb3	Relax rtsock message restrictions. Address multiple issues with strict rtsock message validation. D28668 "normalisation" approach was based on the assumption that we always have at least "standard" sockaddr len. It turned out to be false - certain older applications like quagga or routed abuse the sin[6]_len field and set it to the offset of the first fully-zero bit in the mask. It is impossible to normalise such sockaddrs without reallocation. With that in mind, change the approach to use a distinct memory buffer for the altered sockaddrs. This allows supporting the older software while maintaining the guarantee on the "standard" sockaddrs. PR: 255273,255089 Differential Revision: https://reviews.freebsd.org/D29826 MFC after: 3 days	2021-04-20 21:34:19 +00:00
Alexander V. Chernikov	758c9d54d4	Improve error reporting in rtsock.c MFC after: 3 days	2021-04-19 20:36:41 +00:00
Kristof Provost	42ec75f83a	pf: Optionally attempt to preserve rule counter values across ruleset updates Usually rule counters are reset to zero on every update of the ruleset. With keepcounters set pf will attempt to find matching rules between old and new rulesets and preserve the rule counters. MFC after: 4 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29780	2021-04-19 14:31:47 +02:00
Kristof Provost	4f1f67e888	pf: PFRULE_REFS should not be user-visible Split the PFRULE_REFS flag from the rule_flag field. PFRULE_REFS is a kernel-internal flag and should not be exposed to or read from userspace. MFC after: 4 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29778	2021-04-19 14:31:47 +02:00
Jonah Caplan	0e4025bffa	bridgestp: validate timer values in config BPDU IEEE Std 802.1D-2004 Section 17.14 defines permitted ranges for timers. Incoming BPDU messages should be checked against the permitted ranges. The rest of 17.14 appears to be enforced already. PR: 254924 Reviewed by: kp, donner Differential Revision: https://reviews.freebsd.org/D29782	2021-04-19 12:09:18 +02:00
Alexander V. Chernikov	0abb6ff590	fib algo: do not reallocate datapath index for datapath ptr update. Fib algo uses a per-family array indexed by the fibnum to store lookup function pointers and per-fib data. Each algorithm rebuild currently requires re-allocating this array to support atomic change of two pointers. As in practice most changes involve changing only the data pointer, add a shortcut performing an in-flight pointer update. MFC after: 2 weeks	2021-04-18 16:12:13 +01:00
Alexander V. Chernikov	e2f79d9e51	Fib algo: extend KPI by allowing algo to set datapath pointers. Some algorithms may require updating datapath and control plane algo pointers after the (batched) updates. Export fib_set_datapath_ptr() to allow setting the new datapath function or data pointer from the algo. Add fib_set_algo_ptr() to allow updating algo control plane pointer from the algo. Add fib_epoch_call() epoch(9) wrapper to simplify freeing old datapath state. Reviewed by: zec Differential Revision: https://reviews.freebsd.org/D29799 MFC after: 1 week	2021-04-18 16:12:12 +01:00
Alexander V. Chernikov	6b8ef0d428	Add batched update support for the fib algo. Initial fib algo implementation was built on a very simple set of principles w.r.t. updates: 1) an algorithm is either able to apply the change synchronously (DIR24-8) or requires a full rebuild (bsearch, lradix). 2) the framework falls back to rebuild on every error (memory allocation, nhg limit, other internal algo errors, etc). This change brings the new "intermediate" concept - batched updates. An algorithm can indicate that a particular update has to be handled in batched fashion (FLM_BATCH). The framework will write this update and other updates to the temporary buffer instead of pushing them to the algo callback. Depending on the update rate, the framework will batch 50..1024 ms of updates and submit them to a different algo callback. This functionality is handy for the slow-to-rebuild algorithms like DXR. Differential Revision: https://reviews.freebsd.org/D29588 Reviewed by: zec MFC after: 2 weeks	2021-04-14 23:54:11 +01:00
Tai-hwa Liang	d9b61e7153	if_firewire: fixing panic upon packet reception for VNET build netisr_dispatch_src() needs valid VNET pointer or firewire_input() will panic when receiving a packet. Reviewed by: glebius MFC after: 2 weeks	2021-04-13 22:59:58 +00:00
Kurosawa Takahiro	2aa21096c7	pf: Implement the NAT source port selection of MAP-E Customer Edge MAP-E (RFC 7597) requires special care for selecting source ports in NAT operation on the Customer Edge because some bits of the port numbers are used by the Border Relay to distinguish the other side of the IPv4-over-IPv6 tunnel. PR: 254577 Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D29468	2021-04-13 10:53:18 +02:00
Alexander V. Chernikov	afbb64f1d8	Fix vlan creation for the older ifconfig(8) binaries. Reported by: allanjude MFC after: immediately	2021-04-11 18:13:09 +01:00
Alexander V. Chernikov	7f5f3fcc32	Fix direct route installation with net/bird. Slightly relax the gateway validation rules imposed by `2fe5a79425` by requiring only the first 8 bytes (everything before sdl_data) to be present in the AF_LINK gateway. Reported by: olivier	2021-04-10 16:31:16 +01:00
Alexander V. Chernikov	63dceebe68	Appease -Wsign-compare in radix.c Differential Revision: https://reviews.freebsd.org/D29661 Submitted by: zec MFC after: 2 weeks	2021-04-10 13:48:25 +00:00
Alexander V. Chernikov	caf2f62765	Allow to specify debugnet fib in sysctl/tunable. Differential Revision: https://reviews.freebsd.org/D29593 Reviewed by: donner MFC after: 2 weeks	2021-04-10 13:47:49 +00:00
Kristof Provost	d710367d11	pf: Implement nvlist variant of DIOCGETRULE MFC after: 4 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29559	2021-04-10 11:16:01 +02:00
Kristof Provost	5c62eded5a	pf: Introduce nvlist variant of DIOCADDRULE This will make future extensions of the API much easier. The intent is to remove support for DIOCADDRULE in FreeBSD 14. Reviewed by: markj (previous version), glebius (previous version) MFC after: 4 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29557	2021-04-10 11:16:00 +02:00
Alexander V. Chernikov	ee2cf2b360	Implement better rebuild-delay fib algo policy. The intent is to better handle time intervals with a large amount of RIB updates (e.g. BGP peer going up or down), while still keeping low sync delay for other scenarios. The implementation is the following: updates are grouped into buckets of 50ms. If the number of updates within the current bucket exceeds the threshold of 500 routes/sec (e.g. 10 updates per bucket interval), the update is delayed for another 50ms. This can be repeated until the maximum update delay (1 sec) is reached. All 3 variables are runtime tunables: * net.route.algo.fib_max_sync_delay_ms: 1000 * net.route.algo.bucket_change_threshold_rate: 500 * net.route.algo.bucket_time_ms: 50 Differential Revision: https://reviews.freebsd.org/D29588 MFC after: 2 weeks	2021-04-09 21:33:03 +01:00
Alexander V. Chernikov	9e5243d7b6	Enforce checking of the return value of ifa?_try_ref(). Suggested by: hps MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29504	2021-04-05 03:35:19 +01:00
Kristof Provost	4967f672ef	pf: Remove unused variable rt_listid from struct pf_krule Reviewed by: donner MFC after: 4 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29639	2021-04-08 13:24:35 +02:00
Mark Johnston	274579831b	capsicum: Limit socket operations in capability mode Capsicum did not prevent certain privileged networking operations, specifically creation of raw sockets and network configuration ioctls. However, these facilities can be used to circumvent some of the restrictions that capability mode is supposed to enforce. Add capability mode checks to disallow network configuration ioctls and creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET internet sockets. Reviewed by: oshogbo Discussed with: emaste Reported by: manu Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29423	2021-04-07 14:32:56 -04:00
Vincenzo Maffione	361e950180	iflib: add support for netmap offsets Follow-up change to `a6d768d845`. This change adds iflib support for netmap offsets, enabling applications to use offsets on any driver backed by iflib.	2021-04-05 07:54:47 +00:00
Vincenzo Maffione	9bad2638cc	netmap: restore commit `a56e6334d1` The fix in `a56e6334d1` was accidentally reverted by commit `45c67e8f6b`.	2021-04-02 10:45:47 +00:00
Vincenzo Maffione	45c67e8f6b	netmap: several typo fixes No functional changes intended.	2021-04-02 07:01:20 +00:00
Konstantin Belousov	baacf70137	vxlan: correct interface MTU when using hw offloads Otherwise it breaks when offloads such as checksum or TSO are used, because the second (encapsulated) ip_output() processing passes fragments of the encapsulated packet down to the hardware interface. Diagnosed by: hselasky Reviewed by: np Sponsored by: Nvidia Networking / Mellanox Technologies MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29501	2021-03-31 14:38:26 +03:00
Konstantin Belousov	e243367b64	mbuf: add a way to mark flowid as calculated from the internal headers In some settings offload might calculate the hash from the decapsulated packet. Reserve a bit in packet header rsstype to indicate that. Add m_adj_decap() that acts similarly to m_adj, but also either clears flowid if it is not marked as inner, or transfers it to the decapsulated header, clearing the inner indicator. It depends on the internals of m_adj() that reuses the argument packet header for the result. Use m_adj_decap() for decapsulating vxlan(4) and gif(4) input packets. Reviewed by: ae, hselasky, np Sponsored by: Nvidia Networking / Mellanox Technologies MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28773	2021-03-31 14:38:26 +03:00
Alexander V. Chernikov	0c2a0e0380	Fix typo in the `9fa8d1582b`. Reported by: cy	2021-03-29 23:42:48 +00:00
Alexander V. Chernikov	9fa8d1582b	Put bandaid for nhgrp_dump_sysctl() malloc KASSERT(). Recent rtsock changes widened epoch and covered nhgrp_dump_sysctl(), resulting in `netstat -4On` triggering the KASSERT. MFC after: 1 day	2021-03-29 23:12:11 +00:00
Alexander V. Chernikov	0f30a36ded	Rename variables inside nexthop group consider_resize() code. No functional changes. MFC after: 3 days	2021-03-29 23:06:13 +00:00
Alexander V. Chernikov	9095dc7da4	Fix nexthop group index array scaling. The current code has a limit of 127 nexthop groups due to the wrongly-checked bitmask_copy() return value. PR: 254303 Reported by: Aleks <a.ivanov at veesp.com> MFC after: 1 day	2021-03-29 23:00:17 +00:00
Vincenzo Maffione	660a47cb99	netmap: monitor: add a flag to distinguish packet direction The netmap monitor intercepts any TX/RX packets on the monitored port. However, before this change there was no way to tell whether an intercepted packet was being transmitted or received on the monitored port. A TXMON flag in the netmap slot has been added for this purpose.	2021-03-29 16:32:54 +00:00
Vincenzo Maffione	a6d768d845	netmap: add kernel support for the "offsets" feature This feature enables applications to ask netmap to transmit or receive packets starting at a user-specified offset from the beginning of the netmap buffer. This is meant to ease those packet manipulation operations such as pushing or popping packet headers, that may be useful to implement software switches, routers and other packet processors. To use the feature, drivers (e.g., iflib, vtnet, etc.) must have explicit support. This change does not add support for any driver, but introduces the necessary kernel changes. However, offsets support is already included for VALE ports and pipes.	2021-03-29 16:29:01 +00:00
you@x	21d0c01226	netmap: iflib: add nm_config callback This per-driver callback is invoked by netmap when it wants to align the number of TX/RX netmap rings and/or the number of TX/RX netmap slots to the actual state configured in the hardware. The alignment happens when netmap mode is switched on (with no active netmap file descriptors for that netmap port), or when collecting netmap port information. MFC after: 1 week	2021-03-29 09:31:18 +00:00
Alexander V. Chernikov	6f43c72b47	Zero `struct weightened_nhop` fields in nhgrp_get_addition_group(). `struct weightened_nhop` has spare 32bit between the fields due to the alignment (on amd64). Not zeroing these spare bits results in duplicating nhop groups in the kernel due to the way comparison works. MFC after: 1 day	2021-03-20 08:26:03 +00:00
Alexander V. Chernikov	24cd2796cf	Fix !VNET build broken by `66f138563b`.	2021-03-25 00:31:08 +00:00


4654 Commits
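The rebuild-delay policy described in commit `ee2cf2b360` can be approximated as follows. This is a minimal sketch, not the fib_algo code: the struct and function names are hypothetical, the three fields only mirror the `net.route.algo.*` tunables listed in the commit message, and the per-bucket threshold is assumed to be the rate scaled to one bucket (25 updates for 500 routes/sec and 50 ms buckets).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three runtime tunables named in the commit. */
struct sync_delay_policy {
	uint32_t max_sync_delay_ms;	/* net.route.algo.fib_max_sync_delay_ms */
	uint32_t threshold_rate;	/* net.route.algo.bucket_change_threshold_rate */
	uint32_t bucket_time_ms;	/* net.route.algo.bucket_time_ms */
};

/* Assumed per-bucket threshold: updates/sec scaled to one bucket. */
static uint32_t
bucket_threshold(const struct sync_delay_policy *p)
{
	return (p->threshold_rate * p->bucket_time_ms / 1000);
}

/*
 * Given the number of RIB updates seen in each consecutive bucket,
 * return the time (ms after the first update) at which the sync fires.
 * A busy bucket postpones the sync by one more bucket, unless that
 * would push the total delay past max_sync_delay_ms.
 */
static uint32_t
sync_time_ms(const struct sync_delay_policy *p, const uint32_t *updates,
    size_t nbuckets)
{
	uint32_t elapsed = 0;

	assert(p->bucket_time_ms > 0);
	for (size_t i = 0; i < nbuckets; i++) {
		elapsed += p->bucket_time_ms;
		if (updates[i] <= bucket_threshold(p))
			return (elapsed);	/* quiet bucket: sync now */
		if (elapsed + p->bucket_time_ms > p->max_sync_delay_ms)
			return (elapsed);	/* delay budget exhausted */
	}
	return (elapsed);	/* input exhausted: sync at the last boundary */
}
```

Under these assumptions, an isolated change is applied at the end of its first 50 ms bucket, while a sustained burst (for example a BGP peer flap) is held back until the 1 s ceiling.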
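Commit `6f43c72b47` attributes duplicate nexthop groups to uninitialised padding in `struct weightened_nhop`. The sketch below illustrates the general pitfall with a hypothetical `struct wnhop_example` (not the real kernel layout), under the assumption that entries are compared byte-wise: zeroing the whole object before filling in the fields makes two logically identical entries compare equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a weighted nexthop entry. On LP64 targets
 * such as amd64, 4 bytes of padding sit between `weight` and `nh` so
 * that the pointer is 8-byte aligned.
 */
struct wnhop_example {
	uint32_t weight;
	void *nh;
};

/* Zero the whole object (padding included) before setting the fields. */
static void
wnhop_init(struct wnhop_example *w, void *nh, uint32_t weight)
{
	memset(w, 0, sizeof(*w));
	w->weight = weight;
	w->nh = nh;
}

/* Byte-wise comparison: padding bytes take part in the result. */
static bool
wnhop_equal(const struct wnhop_example *a, const struct wnhop_example *b)
{
	return (memcmp(a, b, sizeof(*a)) == 0);
}
```

Without the `memset()`, stack garbage left in the padding would make `wnhop_equal()` report a difference between entries with identical `weight` and `nh`.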
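The port restriction behind commit `2aa21096c7` comes from RFC 7597, Section 5.1: a 16-bit port is split into an offset field A (a bits), the PSID (k bits) and a free field j (m = 16 - a - k bits), and A is kept non-zero when a > 0 so that, with the default a = 6, ports 0-1023 are never used. The helpers below are a hypothetical sketch of that formula for enumerating a Customer Edge's port set; they are not the pf implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical MAP-E port-set parameters (RFC 7597, Section 5.1). */
struct mape_params {
	unsigned a;	/* offset bits */
	unsigned k;	/* PSID length in bits */
	uint16_t psid;	/* port set ID assigned to this CE */
};

static unsigned
mape_m(const struct mape_params *p)
{
	assert(p->a + p->k <= 16);
	return (16 - p->a - p->k);
}

/* Number of ports in the set; A runs from 1 when a > 0. */
static uint32_t
mape_port_count(const struct mape_params *p)
{
	uint32_t a_values = (p->a > 0) ? (1u << p->a) - 1 : 1;

	return (a_values << mape_m(p));
}

/* The idx-th port of the set, idx in [0, mape_port_count()). */
static uint16_t
mape_port(const struct mape_params *p, uint32_t idx)
{
	unsigned m = mape_m(p);
	uint32_t j = idx & ((1u << m) - 1);
	uint32_t a_field = (idx >> m) + (p->a > 0 ? 1 : 0);

	return ((uint16_t)((a_field << (16 - p->a)) |
	    ((uint32_t)p->psid << m) | j));
}

/* Recover the PSID bits from a port number. */
static uint16_t
mape_port_psid(const struct mape_params *p, uint16_t port)
{
	return ((uint16_t)((port >> mape_m(p)) & ((1u << p->k) - 1)));
}
```

For example, with a = 6, k = 8 and PSID 0x34, the set holds 63 × 4 = 252 ports, the first being 1232, and every port carries the PSID in bits that the Border Relay can inspect.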