freebsd-skq

Author	SHA1	Message	Date
bz	84630aa64f	Update udp6_output() inp locking to avoid concurrency issues with route cache updates. Bring over locking changes applied to udp_output() for the route cache in r297225 and fixed in r306559 which achieve multiple things: (1) acquire an exclusive inp lock earlier depending on the expected conditions; we add a comment explaining this in udp6, (2) having acquired the exclusive lock earlier eliminates a slight possible chance for a race condition which was present in v4 for multiple years as well and is now gone, and (3) only pass the inp_route6 to ip6_output() if we are holding an exclusive inp lock, so that possible route cache updates in case of routing table generation number changes can happen safely. In addition this change (as the legacy IP counterpart) decomposes the tracking of inp and pcbinfo lock and adds extra assertions, that the two together are acquired correctly. PR: 230950 Reviewed by: karels, markj Approved by: re (gjb) Pointyhat to: bz (for completely missing this bit) Differential Revision: https://reviews.freebsd.org/D17230	2018-09-19 18:49:37 +00:00
markj	6131d60f6a	Fix synchronization of LB group access. Lookups are protected by an epoch section, so the LB group linkage must be a CK_LIST rather than a plain LIST. Furthermore, we were not deferring LB group frees, so in_pcbremlbgrouphash() could race with readers and cause a use-after-free. Reviewed by: sbruno, Johannes Lundberg <johalun0@gmail.com> Tested by: gallatin Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17031	2018-09-10 19:00:29 +00:00
bz	209555fd34	Replicate r328271 from legacy IP to IPv6 using a single macro to clear L2 and L3 route caches. Also mark one function argument as __unused. Reviewed by: karels, ae Approved by: re (rgrimes) Differential Revision: https://reviews.freebsd.org/D17007	2018-09-03 22:27:27 +00:00
bz	78d4f16823	Replicate r307234 from legacy IP to IPv6 code, using the RO_RTFREE() macro rather than hand crafted code. No functional changes. Reviewed by: karels Approved by: re (rgrimes) Differential Revision: https://reviews.freebsd.org/D17006	2018-09-03 22:14:37 +00:00
bz	a789306797	As discussed in D6262 post-commit review, change inp_route to inp_route6 for IPv6 code after r301217. This was most likely a c&p error from the legacy IP code, which did not matter as it is a union and both structures have the same layout at the beginning. No functional changes. Reviewed by: karels, ae Approved by: re (rgrimes) Differential Revision: https://reviews.freebsd.org/D17005	2018-09-03 22:12:48 +00:00
kp	82b4e78e14	frag6: Fix fragment reassembly r337776 started hashing the fragments into buckets for faster lookup. The hashkey is larger than intended. This results in random stack data being included in the hashed data, which in turn means that fragments of the same packet might end up in different buckets, causing the reassembly to fail. Set the correct size for hashkey. PR: 231045 Approved by: re (kib) MFC after: 3 days	2018-08-31 08:37:15 +00:00
gallatin	c5fa7c7210	Reject IPv4 SO_REUSEPORT_LB groups when looking up an IPv6 listening socket Similar to how the IPv4 code will reject an IPv6 LB group, we must ignore IPv4 LB groups when looking up an IPv6 listening socket. If this is not done, a port only match may return an IPv4 socket, which causes problems (like sending IPv6 packets with a hopcount of 0, making them unrouteable). Thanks to rrs for all the work to diagnose this. Approved by: re (rgrimes) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16899	2018-08-27 18:13:20 +00:00
bz	be242dfa60	Unbreak RSS builds after r338257. Folding both RSS blocks together I missed the closing } of the new combined block. Pointyhat to: bz Reported by: np Approved by: re (kib)	2018-08-24 21:49:21 +00:00
bz	415411ca28	MFp4 bz_ipv6_fast: Migrate udp6_send() v4mapped code to udp6_output() saving us a re-lock and further simplifying the address-family handling code by eliminating AF_INET checks and almost all v4mapped handling right after the start as cases could actually not happen anymore. Rework output path locking similar to UDP4 allowing for better parallelism (see r222488, and later versions). Sponsored by: The FreeBSD Foundation (2012) Sponsored by: iXsystems (2012) Differential Revision: https://reviews.freebsd.org/D3721	2018-08-23 16:54:22 +00:00
mmacy	47fa74161c	in_mcast: fix copy paste error when clearing flag	2018-08-22 04:09:55 +00:00
mmacy	d41a2ca02d	Fix null deref in mld_v1_transmit_report After r337866 it is possible for an in_multi6 to be referenced while mid teardown. Handle case of cleared ifnet pointer. Reported by: ae	2018-08-21 23:03:02 +00:00
ae	93218b657a	Properly initialize IP version in IPv6 header. This was missed in r334673. Reported by: Lars Schotte <lars at gustik dot eu>	2018-08-16 09:19:06 +00:00
mmacy	99cec0a00c	Fix in6_multi double free This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast address has to be removed from interface lists when the refcount goes to zero OR when the interface goes away - decouple list disconnect from refcount (v6 only for now) - ifmultiaddr can exist past being on interface lists - add flag for tracking whether or not it's enqueued - deferring freeing moptions makes the inpcb cleanup code simpler but opens the door wider still to races - call inp_gcmoptions synchronously after dropping the inpcb lock Fundamentally multicast needs a rewrite - but keep applying band-aids for now. Tested by: kp Reported by: novel, kp, lwhsu	2018-08-15 20:23:08 +00:00
jtl	bd4f87b859	Lower the default limits on the IPv6 reassembly queue. Currently, the limits are quite high. On machines with millions of mbuf clusters, the reassembly queue limits can also run into the millions. Lower these values. Also, try to ensure that no bucket will have a reassembly queue larger than approximately 100 items. This limits the cost to find the correct reassembly queue when processing an incoming fragment. Due to the low limits on each bucket's length, increase the size of the hash table from 64 to 1024. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:32:07 +00:00
jtl	55789af7ee	Drop 0-byte IPv6 fragments. Currently, we process IPv6 fragments with 0 bytes of payload, add them to the reassembly queue, and do not recognize them as duplicating or overlapping with adjacent 0-byte fragments. An attacker can exploit this to create long fragment queues. There is no legitimate reason for a fragment with no payload. However, because IPv6 packets with an empty payload are acceptable, allow an "atomic" fragment with no payload. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:29:22 +00:00
jtl	e5f23fbf44	Implement a limit on the number of IPv6 reassembly queues per bucket. There is a hashing algorithm which should distribute IPv6 reassembly queues across the available buckets in a relatively even way. However, if there is a flaw in the hashing algorithm which allows a large number of IPv6 fragment reassembly queues to end up in a single bucket, a per-bucket limit could help mitigate the performance impact of this flaw. Implement such a limit, with a default of twice the maximum number of reassembly queues divided by the number of buckets. Recalculate the limit any time the maximum number of reassembly queues changes. However, allow the user to override the value using a sysctl (net.inet6.ip6.maxfragbucketsize). Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:27:41 +00:00
jtl	a7668fa529	Add a limit on the number of fragments per IPv6 packet. The IPv4 fragment reassembly code supports a limit on the number of fragments per packet. The default limit is currently 17 fragments. Among other things, this limit serves to limit the number of fragments the code must parse when trying to reassemble a packet. Add a limit to the IPv6 reassembly code. By default, limit a packet to 65 fragments (64 on the queue, plus one final fragment to complete the packet). This allows an average fragment size of 1,008 bytes, which should be sufficient to hold a fragment. (Recall that the IPv6 minimum MTU is 1280 bytes. Therefore, this configuration allows a full-size IPv6 packet to be fragmented on a link with the minimum MTU and still carry approximately 272 bytes of headers before the fragmented portion of the packet.) Users can adjust this limit using the net.inet6.ip6.maxfragsperpacket sysctl. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:26:07 +00:00
jtl	1f361945df	Make the IPv6 fragment limits be global, rather than per-VNET, limits. The IPv6 reassembly fragment limit is based on the number of mbuf clusters, which are a global resource. However, the limit is currently applied on a per-VNET basis. Given enough VNETs (or given sufficient customization on enough VNETs), it is possible that the sum of all the VNET fragment limits will exceed the number of mbuf clusters available in the system. Given the fact that the fragment limits are intended (at least in part) to regulate access to a global resource, the IPv6 fragment limit should be applied on a global basis. Note that it is still possible to disable fragmentation for a particular VNET by setting the net.inet6.ip6.maxfragpackets sysctl to 0 for that VNET. In addition, it is now possible to disable fragmentation globally by setting the net.inet6.ip6.maxfrags sysctl to 0. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:24:26 +00:00
jtl	dca433c72f	Improve IPv6 reassembly performance by hashing fragments into buckets. Currently, all IPv6 fragment reassembly queues are kept in a flat linked list. This has a number of implications. Two significant implications are: all reassembly operations share a common lock, and it is possible for the linked list to grow quite large. Improve IPv6 reassembly performance by hashing fragments into buckets, each of which has its own lock. Calculate the hash key using a Jenkins hash with a random seed. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:17:37 +00:00
tuexen	f64223beb5	Use a macro to set the assoc state. I missed this in r337706.	2018-08-14 08:33:47 +00:00
ae	694891e438	Restore the ability to send ICMP and ICMPv6 redirects. It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled only when sending redirects for the corresponding IP version is disabled via sysctl. Otherwise, the default forwarding function will be used. PR: 221137 Submitted by: mckay@ MFC after: 2 weeks	2018-08-14 07:54:14 +00:00
luporl	2f30606f2f	[ppc] Fix kernel panic when using BOOTP_NFSROOT On PowerPC (and possibly other architectures that don't use EARLY_AP_STARTUP), the config task queue may be used uninitialized. This was observed while trying to mount the root fs from NFS, as reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230168. This patch has 2 main changes: 1- Perform a basic initialization of qgroup_config, similar to what is done in taskqgroup_adjust, but simpler. This makes qgroup_config ready to be used during NFS root mount. 2- When EARLY_AP_STARTUP is not used, call inm_init() and in6m_init() right before SI_SUB_ROOT_CONF, because bootp needs to send multicast packets to request an IP. PR: Bug 230168 Reported by: sbruno Reviewed by: jhibbits, mmacy, sbruno Approved by: jhibbits Differential Revision: D16633	2018-08-09 14:04:51 +00:00
tuexen	17f71a271f	Add a dtrace provider for UDP-Lite. The dtrace provider for UDP-Lite is modeled after the UDP provider. This fixes the bug that UDP-Lite packets were triggering the UDP provider. Thanks to dteske@ for providing the dwatch module. Reviewed by: dteske@, markj@, rrs@ Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16377	2018-07-31 22:56:03 +00:00
tuexen	e122a5a1f6	Allow implicit TCP connection setup for TCP/IPv6. TCP/IPv4 allows an implicit connection setup using sendto(), which is used for TTCP and TCP fast open. This patch adds support for TCP/IPv6. While there, improve some tests for detecting multicast addresses, which are mapped. Reviewed by: bz@, kbowling@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16458	2018-07-30 21:27:26 +00:00
asomers	b3776cb8de	Make timespecadd(3) and friends public The timespecadd(3) family of macros were imported from NetBSD back in r35029. However, they were initially guarded by #ifdef _KERNEL. In the meantime, we have grown at least 28 syscalls that use timespecs in some way, leading many programs both inside and outside of the base system to redefine those macros. It's better just to make the definitions public. Our kernel currently defines two-argument versions of timespecadd and timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define three-argument versions. Solaris also defines a three-argument version, but only in its kernel. This revision changes our definition to match the common three-argument version. Bump _FreeBSD_version due to the breaking KPI change. Discussed with: cem, jilles, ian, bde Differential Revision: https://reviews.freebsd.org/D14725	2018-07-30 15:46:40 +00:00
andrew	a6605d2938	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
tuexen	ff46e28acc	Add missing dtrace probes for received UDP packets. Fire UDP receive probes when a packet is received and there is no endpoint consuming it. Fire the probe also if the TTL of the received packet is smaller than the minimum required by the endpoint. Clarify also in the man page, when the probe fires. Reviewed by: dteske@, markj@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16046	2018-07-20 15:32:20 +00:00
tuexen	9bf2bb1b21	Whitespace changes due to changes in ident.	2018-07-19 20:16:33 +00:00
tuexen	14de4a3d5b	Revert https://svnweb.freebsd.org/changeset/base/336503 since I also ran the export script with different parameters.	2018-07-19 20:11:14 +00:00
tuexen	5810243631	Whitespace changes due to a change in ident.	2018-07-19 19:33:42 +00:00
ae	d94c744a40	Move invoking of callout_stop(&lle->lle_timer) into llentry_free(). This deduplicates the code a bit, and also implicitly adds missing callout_stop() to in[6]_lltable_delete_entry() functions. PR: 209682, 225927 Submitted by: hselasky (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4605	2018-07-17 11:33:23 +00:00
mmacy	fd2ad050dc	acquire inp lock around ip6_pcbopt to fix IPV6_TCLASS panic Simple fix to address panics relating to setting IPV6_TCLASS with setsockopt(). The premise of this change is that it is ok to call malloc with M_NOWAIT while holding a lock on the in6p. If it later turns out that it is not ok, then major surgery will be required, as ip6_setpktopt() will have to be fixed (as it also calls malloc with M_NOWAIT) which pulls in the ip6_pcbopts(), ip6_setpktopts(), ip6_setpktopt() call chain. Submitted by: Jason Eggnet Reviewed by: rrs, transport, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16201	2018-07-15 00:47:06 +00:00
mmacy	1d12a6fcd3	fix 335919 - check "last" not "inp" where appropriate Submitted by: ae Reported by: cy	2018-07-04 16:34:07 +00:00
mmacy	14de8a2820	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066	2018-07-04 02:47:16 +00:00
mmacy	17a2ff6226	udp6_input: validate inpcb before use When traversing pcbinfo lists (rather than calling lookup) we need to explicitly validate an inpcb before use.	2018-07-03 23:30:53 +00:00
mmacy	4dacdb5826	in6_pcblookup_hash: validate inp for liveness	2018-07-01 01:01:59 +00:00
ae	fd52110019	Add NULL pointer check. The encap_lookup_t method can be invoked by the IP encap subsystem even if no gif/gre/me interfaces exist. Hash tables are allocated on demand, when the first interface is created. So check for a NULL pointer before accessing the hash table. PR: 229378	2018-06-28 11:39:27 +00:00
ae	a58623ba71	Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9). Using rwlock with multiqueue NICs for IP forwarding at high pps produces high lock contention and is inefficient. Rmlock fits better for such workloads. Reviewed by: melifaro, olivier Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15789	2018-06-16 08:26:23 +00:00
ae	1879a98b2d	Add a NULL check like the rest of the code has. It is possible that ifma_protospec becomes NULL in this function for some entry, but it is still referenced and thus it will not be unlinked from the list. Then the "restart" condition triggers and this entry with a NULL ifma_protospec will lead to a page fault. PR: 228982	2018-06-14 09:36:25 +00:00
ae	5568626787	Remove stale comment. in6_ifdetach() can be called from places where addresses are not removed yet.	2018-06-14 09:29:39 +00:00
mmacy	255aa2f16f	Fix PCBGROUPS build post CK conversion of pcbinfo	2018-06-13 23:19:54 +00:00
ae	e6c79fbed1	Rework if_gre(4) to use the encap_lookup_t method to speed up lookup of the needed interface when many gre interfaces are present. Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speed up lookup of the needed softc.	2018-06-13 11:11:33 +00:00
mmacy	1cbc14be82	mechanical CK macro conversion of inpcbinfo lists This is a dependency for converting the inpcbinfo hash and info rlocks to epoch.	2018-06-12 22:18:20 +00:00
sbruno	d0aeaa5af7	Load balance sockets with new SO_REUSEPORT_LB option. This patch adds a new socket option, SO_REUSEPORT_LB, which allows multiple programs or threads to bind to the same port, with incoming connections load balanced using a hash function. Most of the code was copied from a similar patch for DragonflyBSD. However, in DragonflyBSD, load balancing is a global on/off setting and can not be set per socket. This patch allows for simultaneous use of both the current SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system. Required changes to structures: Globally change so_options from 16 to 32 bit value to allow for more options. Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets. Limitations: As in DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or threads sharing the same socket). This is a substantially different contribution as compared to its original incarnation in svn r332894, which was reverted in svn r332967. Thanks to rwatson@ for the substantive feedback that is included in this commit. Submitted by: Johannes Lundberg <johalun0@gmail.com> Obtained from: DragonflyBSD Relnotes: Yes Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11003	2018-06-06 15:45:57 +00:00
ae	f21fcbdd6f	Use m_copyback() function to write delayed checksum when it isn't located in the first mbuf of the chain. MFC after: 1 week	2018-06-06 10:46:24 +00:00
ae	e06398dd48	Fix LINT-NOINET build. Use a size known at build time for the min_length value. Also remove the check from in6_gre_encapcheck(); it is now done in generic code.	2018-06-06 05:17:21 +00:00
ae	d1ee857bcf	Rework if_gif(4) to use the new encap_lookup_t method to speed up lookup of the needed interface when many gif interfaces are present. Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speed up lookup of the needed softc. Interfaces with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST. Sysctl net.link.gif.parallel_tunnels is removed. The removal was planned 16 years ago, and actually it could work only for the outbound direction. Each protocol that can be handled by an if_gif(4) interface is registered by a separate encap handler; this helps avoid invoking the handler for unrelated protocols (GRE, PIM, etc.). This change dramatically improves performance when many gif(4) interfaces are used. Sponsored by: Yandex LLC	2018-06-05 21:24:59 +00:00
ae	8066b881af	Constify argument of in6_getscope().	2018-06-05 20:54:29 +00:00
ae	dfbd18b5fe	Rework IP encapsulation handling code. Currently it has several disadvantages: - it uses a single mutex to protect internal structures. It is used by both the data and control paths, thus there is no parallelism at all. - it uses a single list to keep encap handlers for both INET and INET6 families. - struct encaptab keeps unneeded information (src, dst, masks, protosw) that isn't used by code in the source tree. - matches are prioritized and when many tunneling interfaces are registered, the encapcheck handler of each interface is invoked for each packet. The search takes O(n) for n interfaces. All this work is done with an exclusive lock held. What this patch includes: - the datapath is converted to be lockless using the epoch(9) KPI. - struct encaptab is now linked using CK_LIST. - all unused fields removed from struct encaptab. Several new fields added: min_length is the minimum packet length that the encapsulation handler expects to see; exact_match is the maximum number of bits that an encapsulation handler can return when it wants to consume a packet. - IPv6 and IPv4 handlers are stored in separate lists; - added new "encap_lookup_t" method, that will be used later. It is targeted to speed up lookup of the needed interface when gif(4)/gre(4) have many interfaces. - the need to use the protosw structure is eliminated. Only the pr_input method was used from this structure, so I don't see the need to keep using it. - encap_input_t method changed to avoid using mbuf tags to store the softc pointer. Now it is passed directly through the encap_input_t method. The encap_getarg() function is removed. - all sockaddr structures and code that uses them removed. We don't have any code in the tree that uses them. All consumers use the encap_attach_func() method, which relies on invoking encapcheck() to determine the needed handler. - introduced struct encap_config; it contains parameters of the encap handler that is going to be registered by the encap_attach() function. - encap handlers are stored in lists ordered by exact_match value, thus handlers that need more bits to match will be checked first, and if the encapcheck method returns the exact_match value, the search will be stopped. - all current consumers changed to use the new KPI. Reviewed by: mmacy Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15617	2018-06-05 20:51:01 +00:00
ae	a4e83add6a	Remove empty encap_init() function. MFC after: 2 weeks	2018-05-29 12:32:08 +00:00
mmacy	c937b516d8	CK: update consumers to use CK macros across the board r334189 changed the fields to have names distinct from those in queue.h in order to expose the oversights as compile time errors	2018-05-24 23:21:23 +00:00
mmacy	ecd6e9d307	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
emaste	3de0b917bf	Pair CURVNET_SET and CURVNET_RESTORE in a block Per vnet(9), CURVNET_SET and CURVNET_RESTORE cannot be used as a single statement for a conditional and CURVNET_RESTORE must be in the same block as CURVNET_SET (or a subblock). Reviewed by: andrew Sponsored by: The FreeBSD Foundation	2018-05-21 13:08:44 +00:00
emaste	326c8de9ed	Revert r333968, it broke all archs but i386 and amd64	2018-05-21 11:56:07 +00:00
mmacy	649df389bf	in(6)_mcast: Expand out vnet set / restore macro so that they work in a conditional block Reported by: zec at fer.hr	2018-05-21 08:34:10 +00:00
mmacy	117b59274f	make sure vnet is set when freeing Reported by: pho	2018-05-20 20:48:26 +00:00
mmacy	ce91e745ec	ip(6)_freemoptions: defer imo destruction to epoch callback task Avoid the ugly unlock / lock of the inpcbinfo where we need to figure out what kind of lock we hold by simply deferring the operation to another context. (Also a small dependency for converting the pcbinfo read lock to epoch)	2018-05-20 00:22:28 +00:00
mmacy	7aeac9ef18	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
brooks	f7adfa5151	Unwrap some not-so-long lines now that extra tabs have been removed.	2018-05-15 17:59:46 +00:00
brooks	f5d8b3c62e	Remove stray tabs in in6_lltable_dump_entry(). NFC.	2018-05-15 17:57:46 +00:00
shurd	210352aeb1	Fix LORs in in6?_leave_group() r333175 updated the join_group functions, but not the leave_group ones. Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15393	2018-05-11 21:42:27 +00:00
gallatin	21f42492ac	Fix a panic in the IPv6 multicast code. Use LIST_FOREACH_SAFE in in6m_disconnect() since we're deleting and freeing items from the membership list while traversing the list. Reviewed by: mmacy Sponsored by: Netflix	2018-05-10 16:19:41 +00:00
hselasky	95d5665495	Fix for missing network interface address event when adding the default IPv6 based link-local address. The default link local address for IPv6 is added as part of bringing the network interface up. Move the call to "EVENTHANDLER_INVOKE(ifaddr_event,)" from the SIOCAIFADDR_IN6 ioctl(2) handler to in6_notify_ifa() which should catch all the cases of adding IPv6 based addresses to a network interface. Add a witness warning in case the event handler is not allowed to sleep. Reviewed by: network (ae), kib Differential Revision: https://reviews.freebsd.org/D13407 MFC after: 1 week Sponsored by: Mellanox Technologies	2018-05-08 11:39:01 +00:00
mmacy	d3f138323c	r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl to sleep on commands to the NIC when updating multicast filters. More generally this permitted drivers to use an sx as a softc lock. Unfortunately this change introduced a race whereby a multicast update would still be queued for deletion when ifconfig deleted the interface, thus calling down into _purgemaddrs and synchronously deleting _all_ of the multicast addresses on the interface. Synchronously remove all external references to a multicast address before enqueueing for delete. Reported by: lwhsu Approved by: sbruno	2018-05-06 20:34:13 +00:00
mmacy	d4a327c789	Currently in_pcbfree will unconditionally wunlock the pcbinfo lock to avoid a LOR on the multicast list lock in the freemoptions routines. As it turns out, tcp_usr_detach can acquire the tcbinfo lock readonly. Trying to wunlock the pcbinfo lock in that context has caused a number of reported crashes. This change unclutters in_pcbfree and moves the handling of wunlock vs runlock of pcbinfo to the freemoptions routine. Reported by: mjg@, bde@, o.hartmann at walstatt.org Approved by: sbruno	2018-05-05 22:40:40 +00:00
tuexen	10bc3d3127	Send an ICMPv6 PacketTooBig message in case of forwarding a packet which is too big for the outgoing interface and no firewall is involved. This problem was introduced in https://svnweb.freebsd.org/changeset/base/324996 Thanks to Irene Ruengeler for finding the bug and testing the fix. Reviewed by: kp@ MFC after: 3 days	2018-05-02 22:11:16 +00:00
shurd	7d4b8facc7	Separate list manipulation locking from state change in multicast Multicast incorrectly calls in to drivers with a mutex held causing drivers to have to go through all manner of contortions to use a non sleepable lock. Serialize multicast updates instead. Submitted by: mmacy <mmacy@mattmacy.io> Reviewed by: shurd, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14969	2018-05-02 19:36:29 +00:00
sbruno	257e6e5563	Revert r332894 at the request of the submitter. Submitted by: Johannes Lundberg <johalun0_gmail.com> Sponsored by: Limelight Networks	2018-04-24 19:55:12 +00:00
sbruno	bbf7d4dd03	Load balance sockets with new SO_REUSEPORT_LB option This patch adds a new socket option, SO_REUSEPORT_LB, which allows multiple programs or threads to bind to the same port, with incoming connections load balanced using a hash function. Most of the code was copied from a similar patch for DragonflyBSD. However, in DragonflyBSD, load balancing is a global on/off setting and can not be set per socket. This patch allows for simultaneous use of both the current SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system. Required changes to structures: Globally change so_options from 16 to 32 bit value to allow for more options. Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets. Limitations: As in DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or threads sharing the same socket). Submitted by: Johannes Lundberg <johanlun0@gmail.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11003	2018-04-23 19:51:00 +00:00
ae	ed1f56d3ff	icmp6_reflect() sends an ICMPv6 message with a new IPv6 header, so it is considered a packet originated by our host. Thus rcvif should be NULL, since ipfw(4) uses it to determine that a packet originated from this host. Some icmp6_reflect() consumers reuse the mbuf and m_pkthdr without resetting the rcvif pointer. To avoid this, always reset the m_pkthdr.rcvif pointer to NULL in icmp6_reflect(). Also remove that line and the comment describing it from icmp6_error(), since it no longer matters. PR: 227674 Reported by: eugen MFC after: 1 week	2018-04-23 12:20:07 +00:00
brooks	26c165ead9	Remove support for the Arcnet protocol. While Arcnet has some continued deployment in industrial controls, the lack of drivers for any of the PCI, USB, or PCIe NICs on the market suggests such users aren't running FreeBSD. Evidence in the PR database suggests that the cm(4) driver (our sole Arcnet NIC) was broken in 5.0 and has not worked since. PR: 182297 Reviewed by: jhibbits, vangyzen Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15057	2018-04-13 21:18:04 +00:00
ae	e6638f6749	Add a check that the mbuf does not have a multicast layer 2 address. Such packets should be handled by ip6_mforward(). Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2018-04-13 16:13:59 +00:00
brooks	6dcf9514b3	Remove support for FDDI networks. Defines in net/if_media.h remain in case code copied from ifconfig is in use elsewhere (supporting a non-existent media type is harmless). Reviewed by: kib, jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15017	2018-04-11 17:28:24 +00:00
tuexen	c3e1813aee	Fix a logical inversion bug. Thanks to Irene Ruengeler for finding and reporting this bug. MFC after: 3 days	2018-04-08 12:08:20 +00:00
brooks	9d79658aab	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compatibility improvements may add over 100 more, so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
brooks	2b96daf50f	Document and enforce assumptions about struct (in6_)ifreq. - The two types must be type-punnable for shared members of ifr_ifru. This allows compatibility accessors to be shared. - There must be no padding gap between ifr_name and ifr_ifru. This is assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be broadly portable. This is true for all current architectures, but very large (256-bit) fat-pointers could violate this invariant. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14910	2018-03-30 21:38:53 +00:00
brooks	349ad8a8de	Remove a comment that suggests checking that a non-pointer is non-NULL. Reviewed by: melifaro, markj, hrs, ume Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14904	2018-03-30 18:26:29 +00:00
brooks	a45d44647f	Remove infrastructure for token-ring networks. Reviewed by: cem, imp, jhb, jmallett Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14875	2018-03-28 23:33:26 +00:00
jtl	5ccb8e69d6	This change adds a flag to the DAD entry to indicate whether it is currently on the queue. This prevents accidentally doubly-removing a DAD entry from the queue, while also simplifying some of the logic in nd6_dad_stop(). Reviewed by: ae, hrs, vangyzen MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D10943	2018-03-24 13:18:09 +00:00
jtl	14225f2440	Remove some unnecessary variable sets in IPv6 code, as detected by clang's static analyzer. Reviewed by: bz MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D10940	2018-03-24 12:43:34 +00:00
sbruno	5e726646d5	Revert r331379 as the "simple" lock changes have revealed a deeper problem and need for a rethink. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks	2018-03-23 18:34:38 +00:00
kp	109a7b5eec	netpfil: Introduce PFIL_FWD flag. Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715	2018-03-23 16:56:44 +00:00
sbruno	622ba6f45a	Refactor ip6_getpcbopt() for better locking and memory management. Created GET_PKTOPT_EXT_HDR() and GET_PKTOPT_SOCKADDR() macros to handle safely fetching options from in6p_outputopts, including properly dealing with in6p locking and preparing memory for sooptcopyout(). Changed the function signature of ip6_getpcbopt() to allow the function to acquire and release locks on in6p as needed. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14619	2018-03-22 23:34:48 +00:00
sbruno	57df63d5af	Simple locking fixes in ip_ctloutput, ip6_ctloutput, rip_ctloutput. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14624	2018-03-22 22:29:32 +00:00
sbruno	b74ecf8d2a	Handle locking and memory safety for IPV6_PATHMTU in ip6_ctloutput(). Submitted by: Jason Eggleston <jason@eggnet.com> Reviewed by: ae Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14622	2018-03-22 21:18:34 +00:00
sbruno	bcda73b875	Improve write locking in ip6_ctloutput() with macros. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14620	2018-03-22 20:21:05 +00:00
jtl	ee029a5d0b	If the INP lock is uncontested, avoid taking a reference and jumping through the lock-switching hoops. A few of the INP lookup operations that lock INPs after the lookup do so using this mechanism (to maintain lock ordering): 1. Lock lookup structure. 2. Find INP. 3. Acquire reference on INP. 4. Drop lock on lookup structure. 5. Acquire INP lock. 6. Drop reference on INP. This change provides a slightly shorter path for cases where the INP lock is uncontested: 1. Lock lookup structure. 2. Find INP. 3. Try to acquire the INP lock. 4. If successful, drop lock on lookup structure. Of course, if the INP lock is contested, the functions will need to revert to the previous way of switching locks safely. This saves a few atomic operations when the INP lock is uncontested. Discussed with: gallatin, rrs, rwatson MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D12911	2018-03-21 15:54:46 +00:00
melifaro	7bb5ee0db4	Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration. Current arp/nd code relies on the feedback from the datapath indicating that the entry is still used. This mechanism is incorporated into the arpresolve()/nd6_resolve() routines. After the inpcb route cache introduction, the packet path for locally-originated packets changed, passing the cached lle pointer to ether_output() directly. This resulted in the arp/ndp entry expiring each time exactly after the configured max_age interval. During the small window between the ARP/NDP request and the reply from the router, most of the packets got lost. Fix this behaviour by plugging datapath notification code into the packet path used by the route cache. Unify the notification code by using a single inlined function with per-AF callbacks. Reported by: sthaug at nethelp.no Reviewed by: ae MFC after: 2 weeks	2018-03-17 17:05:48 +00:00
vangyzen	63337e8587	Update the MTU in affected routes when IPv6 RA changes the MTU. ip6_calcmtu() only looks at the interface MTU if neither the TCP hostcache nor the route provides an MTU. Update the routes so they do not provide stale MTUs. This fixes UNH IPv6 conformance test cases v6LC_4_1_08 and v6LC_4_1_09, which use an RA to reduce the link MTU from 1500 to 1280. Reported and tested by: Farrell Woods <Farrell_Woods@Dell.com> Reviewed by: dab, melifaro Discussed with: ae MFC after: 1 week Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D14257	2018-02-12 19:49:20 +00:00
vangyzen	192a6ce090	Fix ICMPv6 redirects. icmp6_redirect_input() validates that a redirect packet came from the current gateway for the respective destination. To do this, it compares the source address, which has an embedded scope zone id, to the next-hop address, which does not. If the address is link-local, which should be the case, the comparison fails and the redirect is ignored. Insert the scope zone id into the next-hop address so the comparison is accurate. Unsurprisingly, this fixes 35 UNH IPv6 conformance test cases. Submitted by: Farrell Woods <Farrell_Woods@Dell.com> (initial revision) Reviewed by: ae melifaro dab MFC after: 1 week Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D14254	2018-02-09 00:13:05 +00:00
ae	8d74fbedd3	Modify ip6_get_prevhdr() so that it can be used safely. Instead of returning a pointer to the previous header, return its offset. In frag6_input() use m_copyback() and the determined offset to store the next header instead of accessing it by pointer and assuming that the memory is contiguous. In rip6_input() use the offset returned by ip6_get_prevhdr() instead of calculating it with pointer arithmetic, because the IP header can belong to another mbuf in the chain. Reported by: Maxime Villard <max at m00nbsd dot net> Reviewed by: kp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D14158	2018-02-05 09:22:07 +00:00
ae	ca864f3d25	Merge r1.120 from NetBSD: Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE, not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain on an mbuf that was already freed. Reported by: Maxime Villard <max at m00nbsd dot net> MFC after: 3 days	2018-02-02 07:39:34 +00:00
vangyzen	08bd8b9104	ND6: Set the correct state for new neighbor cache entries. Restore state 6. Many of the UNH tests end up exercising this state, where we have a new neighbor cache entry and a new link-layer entry is being created for it. The link-layer address is currently unknown, so the initial state of the "llentry" should remain initialized to ND6_LLINFO_NOSTATE so that the ND code will send a solicitation. Setting this to ND6_LLINFO_STALE implies that the link-level entry is valid and can be used (but needs to be refreshed via the Neighbor Unreachability state machine). https://forums.freebsd.org/threads/64287/ Submitted by: Farrell Woods <Farrell_Woods@Dell.com> Reviewed by: mjoras, dab, ae MFC after: 1 week Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D14059	2018-01-29 16:12:26 +00:00
ae	5c4c621097	Do not skip the scope zone violation check when the mbuf has the M_FASTFWD_OURS flag. When the mbuf has the M_FASTFWD_OURS flag, the destination address is local, but we still need to pass the scope zone violation check, because the protocol level expects IPv6 link-local addresses to have embedded scope zone indexes. This should fix the problem when ipfw is used to forward packets to a local address and the source address of a packet is an IPv6 LLA. Reported by: sbruno MFC after: 3 weeks	2018-01-29 11:03:29 +00:00
ae	ffc9f88671	Assign IPv6 link-local address to loopback interfaces with unit > 0. When an interface has the IFF_LOOPBACK flag, in6_ifattach() tries to assign the IPv6 loopback address to this interface. It uses in6ifa_ifpwithaddr() to check that the interface doesn't already have the given address and then uses in6_ifattach_loopback(). If in6_ifattach_loopback() fails, it just exits and thus skips assignment of the IPv6 LLA. Fix this using the in6ifa_ifwithaddr() function: if the IPv6 loopback address is already assigned in the system, do not call in6_ifattach_loopback(). PR: 138678 MFC after: 3 weeks	2018-01-29 10:33:55 +00:00
np	af35a0e296	Do not generate illegal mbuf chains during IP fragment reassembly. Only the first mbuf of the reassembled datagram should have a pkthdr. This was discovered with cxgbe(4) + IPSEC + ping with payload more than interface MTU. cxgbe can generate !M_WRITEABLE mbufs and this results in m_unshare being called on the reassembled datagram, and it complains: panic: m_unshare: m0 0xfffff80020f82600, m 0xfffff8005d054100 has M_PKTHDR PR: 224922 Reviewed by: ae@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14009	2018-01-24 05:09:21 +00:00
asomers	c9a63ac910	sys/netinet6: fix typos in comments. No functional change. MFC after: 3 weeks Sponsored by: Spectra Logic Corp	2018-01-23 19:40:05 +00:00
pfg	ced875130d	Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9). The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler. Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level. Reported by: wosch PR: 225197	2018-01-21 15:42:36 +00:00
pfg	bf156bc88c	net*: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these are likely to overflow; however, the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason, but I started doing the changes before r327796 and at that time it was convenient to make sure the surrounding code could handle NULL values. X-Differential revision: https://reviews.freebsd.org/D13837	2018-01-15 21:21:51 +00:00
pfg	508a3c553c	Fix some typos. Obtained from: OpenBSD (CVS v1.5)	2017-12-28 20:40:56 +00:00

1834 Commits