freebsd-dev

Author	SHA1	Message	Date
Alexander V. Chernikov	53729367d3	Fix subinterface vlan creation. D26436 introduced support for stacked vlans that changed the way vlans are configured. In particular, this change broke setups that have same-number vlans as subinterfaces. Vlan support was initially created assuming "vlanX" semantics. In this paradigm, automatic number assignment supported by cloning (ifconfig vlan create) was a natural fit. When "ifaceX.Y" support was added, allowing the same vlan number on multiple devices, cloning code became more complex, as there is no unified "vlan" namespace anymore. Such interfaces got the first spare index from "vlan" cloner. This, in turn, led to the following problem: ifconfig ix0.333 create -> index 1 ifconfig ix0.444 create -> index 2 ifconfig vlan2 create -> allocation failure This change fixes such allocations by using cloning indexes only for "vlanX" interfaces. Reviewed by: hselasky MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D27505	2021-01-29 21:43:20 +00:00
Gleb Smirnoff	3f43ada98c	Catch up with `6edfd179c8`: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG. Originally IFCAP_NOMAP meant that the mbuf has external storage pointer that points to unmapped address. Then, this was extended to array of such pointers. Then, such mbufs were augmented with header/trailer. Basically, extended mbufs are extended, and set of features is subject to change. The new name should be generic enough to avoid further renaming.	2021-01-29 11:46:24 -08:00
Randall Stewart	1a714ff204	This pulls over all the changes that are in the netflix tree that fix the ratelimit code. There were several bugs in tcp_ratelimit itself and we needed further work to support the multiple tag format coming for the joint TLS and Ratelimit dances. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D28357	2021-01-28 11:53:05 -05:00
John Baldwin	36e0a362ac	Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc(). This gives a more uniform API for send tag life cycle management. Reviewed by: gallatin, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27000	2020-10-29 23:28:39 +00:00
John Baldwin	521eac97f3	Support hardware rate limiting (pacing) with TLS offload. - Add a new send tag type for a send tag that supports both rate limiting (packet pacing) and TLS offload (mostly similar to D22669 but adds a separate structure when allocating the new tag type). - When allocating a send tag for TLS offload, check to see if the connection already has a pacing rate. If so, allocate a tag that supports both rate limiting and TLS offload rather than a plain TLS offload tag. - When setting an initial rate on an existing ifnet KTLS connection, set the rate in the TCP control block inp and then reset the TLS send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit send tag. This allocates the TLS send tag asynchronously from a task queue, so the TLS rate limit tag alloc is always sleepable. - When modifying a rate on a connection using KTLS, look for a TLS send tag. If the send tag is only a plain TLS send tag, assume we failed to allocate a TLS ratelimit tag (either during the TCP_TXTLS_ENABLE socket option, or during the send tag reset triggered by ktls_output_eagain) and ignore the new rate. If the send tag is a ratelimit TLS send tag, change the rate on the TLS tag and leave the inp tag alone. - Lock the inp lock when setting sb_tls_info for a socket send buffer so that the routines in tcp_ratelimit can safely dereference the pointer without needing to grab the socket buffer lock. - Add an IFCAP_TXTLS_RTLMT capability flag and associated administrative controls in ifconfig(8). TLS rate limit tags are only allocated if this capability is enabled. Note that TLS offload (whether unlimited or rate limited) always requires IFCAP_TXTLS[46]. Reviewed by: gallatin, hselasky Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26691	2020-10-29 00:23:16 +00:00
Alexander V. Chernikov	c7cffd65c5	Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q). 802.1ad interfaces are created with ifconfig using the "vlanproto" parameter. Eg., the following creates a 802.1Q VLAN (id #42) over a 802.1ad S-VLAN (id #5) over a physical Ethernet interface (em0). ifconfig vlan5 create vlandev em0 vlan 5 vlanproto 802.1ad up ifconfig vlan42 create vlandev vlan5 vlan 42 inet 10.5.42.1/24 VLAN_MTU, VLAN_HWCSUM and VLAN_TSO capabilities should be properly supported. VLAN_HWTAGGING is only partially supported, as there is currently no IFCAP_VLAN_* denoting the possibility to set the VLAN EtherType to anything else than 0x8100 (802.1ad uses 0x88A8). Submitted by: Olivier Piras Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D26436	2020-10-21 21:28:20 +00:00
John Baldwin	56fb710f1b	Store the send tag type in the common send tag header. Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with their own identical headers that stored the type that the type-specific tag structures inherited from, so in practice it seems drivers need this in the tag anyway. This permits removing these extra header indirections (struct cxgbe_snd_tag and struct mlx5e_snd_tag). In addition, this permits driver-independent code to query the type of a tag, e.g. to know what type of tag is being queried via if_snd_query. Reviewed by: gallatin, hselasky, np, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26689	2020-10-06 17:58:56 +00:00
Mateusz Guzik	662c13053f	net: clean up empty lines in .c and .h files	2020-09-01 21:19:14 +00:00
Kristof Provost	eb03a44325	vlan: Fix panic when vnet jail with a vlan interface is destroyed During vnet cleanup vnet_if_uninit() checks that no more interfaces remain in the vnet. Any interface borrowed from another vnet is returned by vnet_if_return(). Other interfaces (i.e. cloned interfaces) should have been destroyed by their cloner at this point. The if_vlan VNET_SYSUNINIT had priority SI_ORDER_FIRST, which means it had equal priority as vnet_if_uninit(). In other words: it was possible for it to be called after vnet_if_uninit(), which would lead to assertion failures. Set the priority to SI_ORDER_ANY, like other cloners to ensure that vlan interfaces are destroyed before we enter vnet_if_uninit(). The sys/net/if_vlan test provoked this.	2020-01-31 22:54:44 +00:00
Alexander V. Chernikov	4be465ab46	Plug parent iface refcount leak on <ifname>.X vlan creation. PR: kern/242270 Submitted by: Andrew Boyer <aboyer at pensando.io> MFC after: 2 weeks	2020-01-29 18:41:35 +00:00
Alexander Motin	84becee1ac	Update route MTUs for bridge, lagg and vlan interfaces. Those interfaces may implicitly change their MTU on addition of parent interface in addition to normal SIOCSIFMTU ioctl path, where the route MTUs are updated normally. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-01-22 20:36:45 +00:00
Gleb Smirnoff	2a4bd982d0	Introduce NET_EPOCH_CALL() macro and use it everywhere where we free data based on the network epoch. The macro reverses the argument order of epoch_call(9) - first function, then its argument. NFC	2020-01-15 06:05:20 +00:00
Andrey V. Elsukov	a961401ee0	Enqueue lladdr_task to update link level address of vlan, when its parent interface has changed. During vlan reconfiguration without destroying interface, it is possible, that parent interface will be changed. This usually means, that link layer address of vlan will be different. Therefore we need to update all associated with vlan's addresses permanent llentries - NDP for IPv6 addresses, and ARP for IPv4 addresses. This is done via lladdr_task execution. To avoid extra work, before execution do the check, that L2 address is different. No objection from: #network Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D22243	2019-11-07 15:00:37 +00:00
Gleb Smirnoff	b2807792f1	Revert two parts of r353292 that enter epoch when processing vlan capabilities. It could be that entering epoch isn't necessary here, but better take a conservative approach. Submitted by: kp	2019-10-17 20:18:07 +00:00
Gleb Smirnoff	6dcec895d9	vlan_config() isn't always called in epoch context. Reported by: kp	2019-10-13 15:15:09 +00:00
Gleb Smirnoff	b8a6e03fac	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111	2019-10-07 22:40:05 +00:00
Gleb Smirnoff	bf7700e44f	style(9): remove extraneous empty lines	2019-09-25 20:46:09 +00:00
Matt Joras	16cf6bdbb6	Wrap a vlan's parent's if_output in a separate function. When a vlan interface is created, its if_output is set directly to the parent interface's if_output. This is fine in the normal case but has an unfortunate consequence if you end up with a certain combination of vlan and lagg interfaces. Consider you have a lagg interface with a single laggport member. When an interface is added to a lagg its if_output is set to lagg_port_output, which blackholes traffic from the normal networking stack but not certain frames from BPF (pseudo_AF_HDRCMPLT). If you now create a vlan with the laggport member (not the lagg interface) as its parent, its if_output is set to lagg_port_output as well. While this is confusing conceptually and likely represents a misconfigured system, it is not itself a problem. The problem arises when you then remove the lagg interface. Doing this resets the if_output of the laggport member back to its original state, but the vlan's if_output is left pointing to lagg_port_output. This gives rise to the possibility that the system will panic when e.g. bpf is used to send any frames on the vlan interface. Fix this by creating a new function, vlan_output, which simply wraps the parent's current if_output. That way when the parent's if_output is restored there is no stale usage of lagg_port_output. Reviewed by: rstone Differential Revision: D21209	2019-08-30 20:19:43 +00:00
John Baldwin	b2e60773c6	Add kernel-side support for in-kernel TLS. KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotiation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys. Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type. At present, rekeying is not supported though the in-kernel framework should support rekeying. KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session. KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS. Software TLS marks mbufs holding socket data as not ready via M_NOTREADY similar to sendfile(2) when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY is set when invoking pru_send() along with invoking ktls_enqueue(). A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). 
Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption. (Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.) KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators. Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs. ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. 
(This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.) ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover, lacp with flowid enabled, or lacp with flowid enabled. Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes. In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS. Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS. KTLS is enabled via the KERN_TLS kernel option. This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload. Reviewed by: gallatin, hselasky, rrs Obtained from: Netflix Sponsored by: Netflix, Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21277	2019-08-27 00:01:56 +00:00
John Baldwin	66d0c056be	Support IFCAP_NOMAP in vlan(4). Enable IFCAP_NOMAP for a vlan interface if it is supported by the underlying trunk device. Reviewed by: gallatin, hselasky, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20616	2019-06-29 00:51:38 +00:00
John Baldwin	fb3bc59600	Restructure mbuf send tags to provide stronger guarantees. - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code allocating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). 
In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 
802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117	2019-05-24 22:30:40 +00:00
Randall Stewart	fa91f84502	This commit adds the missing release mechanism for the ratelimiting code. The two modules (lagg and vlan) did have allocation routines, and even though they are indirect (and vector down to the underlying interfaces) they both need to have a free routine (that also vectors down to the actual interface). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D19032	2019-02-13 14:57:59 +00:00
Gleb Smirnoff	7b7f772fa0	Bring the comment up to date.	2019-01-10 00:37:14 +00:00
Mark Johnston	72755d285f	Stop setting if_linkmib in vlan(4) ifnets. There are several reasons: - The structure being exported via IFDATA_LINKSPECIFIC doesn't appear to be a standard MIB. - The structure being exported is private to the kernel and always has been. - No other drivers in common use set the if_linkmib field. - Because IFDATA_LINKSPECIFIC can be used to overwrite the linkmib structure, a privileged user could use it to corrupt internal vlan(4) state. [1] PR: 219472 Reported by: CTurt <ecturt@gmail.com> [1] Reviewed by: kp (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18779	2019-01-09 16:47:16 +00:00
Gleb Smirnoff	a68cc38879	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros are quite unsafe, e.g. they will produce buggy code if the same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit. This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
Oleg Bulyzhin	cac302483e	Unbreak kernel build with VLAN_ARRAY defined. MFC after: 1 week	2018-11-21 13:34:21 +00:00
Kristof Provost	5191a3aea6	vlan: Fix panic with lagg and vlan vlan_lladdr_fn() is called from taskqueue, which means there's no vnet context set. We can end up trying to send ARP messages (through the iflladdr_event event), which requires a vnet context. PR: 227654 MFC after: 3 days	2018-10-21 16:51:35 +00:00
Hans Petter Selasky	c32a9d66fb	Fix deadlock when destroying VLANs. Synchronizing the epoch before freeing the multicast addresses while holding the VLAN_XLOCK() might lead to a deadlock. Use deferred freeing of the VLAN multicast addresses to resolve deadlock. Backtrace: Thread1: epoch_block_handler_preempt() ck_epoch_synchronize_wait() epoch_wait_preempt() vlan_setmulti() vlan_ioctl() in6m_release_task() gtaskqueue_run_locked() gtaskqueue_thread_loop() fork_exit() fork_trampoline() Thread2: sleepq_switch() sleepq_wait() _sx_xlock_hard() _sx_xlock() in6_leavegroup() in6_purgeaddr() if_purgeaddrs() if_detach_internal() if_detach() vlan_clone_destroy() if_clone_destroyif() if_clone_destroy() ifioctl() kern_ioctl() sys_ioctl() amd64_syscall() fast_syscall_common() syscall() Differential revision: https://reviews.freebsd.org/D17496 Reviewed by: slavash, mmacy Approved by: re (kib) Sponsored by: Mellanox Technologies	2018-10-15 10:29:29 +00:00
Matt Macy	b08d611de8	fix vlan locking to permit sx acquisition in ioctl calls - update vlan(9) to handle changes earlier this year in multicast locking Tested by: np@, darkfiberu at gmail.com PR: 230510 Reviewed by: mjoras@, shurd@, sbruno@ Approved by: re (gjb@) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16808	2018-09-21 01:37:08 +00:00
Navdeep Parhar	32a52e9e39	if_vlan(4): A VLAN always has a PCP and its ifnet's if_pcp should be set to the PCP value in use instead of IFNET_PCP_NONE. MFC after: 1 week Sponsored by: Chelsio Communications	2018-08-17 01:03:23 +00:00
Navdeep Parhar	32d2623ae2	Add the ability to look up the 3b PCP of a VLAN interface. Use it in toe_l2_resolve to fill up the complete vtag and not just the vid. Reviewed by: kib@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D16752	2018-08-16 23:46:38 +00:00
Andrew Turner	5f901c92a8	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
Matt Macy	d7c5a620e2	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
Hans Petter Selasky	4a381a9e42	Add network device event for priority code point, PCP, changes. When the PCP is changed for either a VLAN network interface or when prio tagging is enabled for a regular ethernet network interface, broadcast the IFNET_EVENT_PCP event so applications like ibcore can update its GID tables accordingly. MFC after: 3 days Reviewed by: ae, kib Differential Revision: https://reviews.freebsd.org/D15040 Sponsored by: Mellanox Technologies	2018-04-26 08:58:27 +00:00
Brooks Davis	541d96aaaf	Use an accessor function to access ifr_data. This fixes 32-bit compat (no ioctl command definitions are required as struct ifreq is the same size). This is believed to be sufficient to fully support ifconfig on 32-bit systems. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14900	2018-03-30 18:50:13 +00:00
Brooks Davis	38d958a647	Improve copy-and-pasted versions of SIOCGIFADDR. The original implementation used a reference to ifr_data and a cast to do the equivalent of accessing ifr_addr. This was copied multiple times since 1996. Approved by: kib MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14873	2018-03-27 20:51:49 +00:00
Konstantin Belousov	f137973487	Allow to specify PCP on packets not belonging to any VLAN. According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be considered as untagged, and only PCP and DEI values from the VLAN tag are meaningful. See for instance https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html. Make it possible to specify PCP value for outgoing packets on an ethernet interface. When PCP is supplied, the tag is appended, VLAN id set to 0, and PCP is filled by the supplied value. The code to do VLAN tag encapsulation is refactored from the if_vlan.c and moved into if_ethersubr.c. Drivers might have issues with filtering VID 0 packets on receive. This bug should be fixed for each driver. Reviewed by: ae (previous version), hselasky, melifaro Sponsored by: Mellanox Technologies MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D14702	2018-03-27 15:29:32 +00:00
Pedro F. Giffuni	ac2fffa4b7	Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9). The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler. Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level. Reported by: wosch PR: 225197	2018-01-21 15:42:36 +00:00
Pedro F. Giffuni	443133416b	net*: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these are likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the surrounding code could handle NULL values. X-Differential revision: https://reviews.freebsd.org/D13837	2018-01-15 21:21:51 +00:00
Matt Joras	fdbf11746a	Allow vlan interfaces to rx through netmap(4). Normally after receiving a packet, a vlan(4) interface sends the packet back through its parent interface's rx routine so that it can be processed as an untagged frame. It does this by using the parent's ifp->if_input. This is incompatible with netmap(4), which replaces the vlan(4) interface's if_input with a netmap(4) hook. Fix this by using the vlan(4) interface's ifp instead of the parent's directly. Reported by: Harry Schmalzbauer <freebsd@omnilan.de> Reviewed by: rstone Approved by: rstone (mentor) MFC after: 3 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12191	2017-09-13 00:25:09 +00:00
Matt Joras	d148c2a2b1	Rework vlan(4) locking. Previously the locking of vlan(4) interfaces was not very comprehensive. Particularly there was very little protection against the destruction of active vlan(4) interfaces or concurrent modification of a vlan(4) interface. The former readily produced several different panics. The changes can be summarized as using two global vlan locks (an rmlock(9) and an sx(9)) to protect accesses to the if_vlantrunk field of struct ifnet, in addition to other places where global exclusive access is required. vlan(4) should now be much more resilient to the destruction of active interfaces and concurrent calls into the configuration path. PR: 220980 Reviewed by: ae, markj, mav, rstone Approved by: rstone (mentor) MFC after: 4 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11370	2017-08-15 17:52:37 +00:00
Alexander Motin	9bcf3ae4c7	Add parent interface reference counting to if_vlan. Using plain ifunit() looks like a request for troubles. MFC after: 1 week	2017-05-23 00:13:27 +00:00
Alexander Motin	59150e9141	Propagate IFCAP_LRO from trunk to vlan interface. False positive here cost nothing, while false negative may lead to some confusions. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2017-04-29 08:28:59 +00:00
Alexander Motin	d89baa5aac	Allow some control over enabled capabilities for if_vlan. It improves interoperability with if_bridge, which may need to disable some capabilities not supported by other members. IMHO there is still open question about LRO capability, which may need to be disabled on physical interface. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2017-04-28 11:00:58 +00:00
Andrey V. Elsukov	070f87e878	Inherit IPv6 checksum offloading flags to vlan interfaces. if_vlan(4) interfaces inherit IPv4 checksum offloading flags from the parent when VLAN_HWCSUM and VLAN_HWTAGGING flags are present on the parent interface. Do the same for IPv6 checksum offloading flags. Reported by: Harry Schmalzbauer Reviewed by: np, gnn MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D10356	2017-04-11 19:23:25 +00:00
Hans Petter Selasky	f3e7afe2d7	Implement kernel support for hardware rate limited sockets. - Add RATELIMIT kernel configuration keyword which must be set to enable the new functionality. - Add support for hardware driven, Receive Side Scaling, RSS aware, rate limited sendqueues and expose the functionality through the already established SO_MAX_PACING_RATE setsockopt(). The API support rates in the range from 1 to 4Gbytes/s which are suitable for regular TCP and UDP streams. The setsockopt(2) manual page has been updated. - Add rate limit function callback API to "struct ifnet" which supports the following operations: if_snd_tag_alloc(), if_snd_tag_modify(), if_snd_tag_query() and if_snd_tag_free(). - Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT flag, which tells if a network driver supports rate limiting or not. - This patch also adds support for rate limiting through VLAN and LAGG intermediate network devices. - How rate limiting works: 1) The userspace application calls setsockopt() after accepting or making a new connection to set the rate which is then stored in the socket structure in the kernel. Later on when packets are transmitted a check is made in the transmit path for rate changes. A rate change implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the destination network interface, which then sets up a custom sendqueue with the given rate limitation parameter. A "struct m_snd_tag" pointer is returned which serves as a "snd_tag" hint in the m_pkthdr for the subsequently transmitted mbufs. 2) When the network driver sees the "m->m_pkthdr.snd_tag" different from NULL, it will move the packets into a designated rate limited sendqueue given by the snd_tag pointer. It is up to the individual drivers how the rate limited traffic will be rate limited. 3) Route changes are detected by the NIC drivers in the ifp->if_transmit() routine when the ifnet pointer in the incoming snd_tag mismatches the one of the network interface. 
The network adapter frees the mbuf and returns EAGAIN which causes the ip_output() to release and clear the send tag. Upon next ip_output() a new "snd_tag" will be tried allocated. 4) When the PCB is detached the custom sendqueue will be released by a non-blocking ifp->if_snd_tag_free() call to the currently bound network interface. Reviewed by: wblock (manpages), adrian, gallatin, scottl (network) Differential Revision: https://reviews.freebsd.org/D3687 Sponsored by: Mellanox Technologies MFC after: 3 months	2017-01-18 13:31:17 +00:00
Bjoern A. Zeeb	89856f7e2d	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shut down. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enhance helper functions to do proper cleanup at a protocol rather than at an interface level. 
Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
Marcelo Araujo	2ccbbd06d2	Add support to priority code point (PCP) that is a 3-bit field which refers to IEEE 802.1p class of service and maps to the frame priority level. Values in order of priority are: 1 (Background (lowest)), 0 (Best effort (default)), 2 (Excellent effort), 3 (Critical applications), 4 (Video, < 100ms latency), 5 (Video, < 10ms latency), 6 (Internetwork control) and 7 (Network control (highest)). Example of usage: root# ifconfig em0.1 create root# ifconfig em0.1 vlanpcp 3 Note: The review D801 includes the pf(4) part, but as discussed with kristof, we won't commit the pf(4) bits for now. The credits of the original code is from rwatson. Differential Revision: https://reviews.freebsd.org/D801 Reviewed by: gnn, adrian, loos Discussed with: rwatson, glebius, kristof Tested by: many including Matthew Grooms <mgrooms__shrew.net> Obtained from: pfSense Relnotes: Yes	2016-06-06 09:51:58 +00:00
Pedro F. Giffuni	a4641f4eaa	sys/net*: minor spelling fixes. No functional change.	2016-05-03 18:05:43 +00:00
Alexander V. Chernikov	8ad43f2d0a	Move iflladdr_event eventhandler invocation to if_setlladdr. Suggested by: glebius	2015-11-14 13:34:03 +00:00


244 Commits