freebsd-dev

Author	SHA1	Message	Date
Michael Tuexen	e4a5561e01	Fix compilation on platforms using gcc. When compiling RACK on platforms using gcc, a warning that tcp_outflags is defined but not used is issued and terminates compilation on PPC64, for example. So don't indicate that tcp_outflags is used. Reviewed by: rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D20971	2019-07-16 17:54:20 +00:00
Michael Tuexen	8a3cfbff92	Don't free read control entries, which are still on the stream queue when adding them to the read queue fails. MFC after: 1 week	2019-07-15 20:45:01 +00:00
Michael Tuexen	248bd1b80f	Add support for MSG_EOR and MSG_EOF in sendmsg() for SCTP. This is a FreeBSD extension, not covered by POSIX. This issue was found by running syzkaller. MFC after: 1 week	2019-07-15 14:54:04 +00:00
Michael Tuexen	25fa310a5f	Fix socket state handling when freeing an SCTP endpoint. This issue was found by running syzkaller. MFC after: 1 week	2019-07-15 14:52:52 +00:00
Randall Stewart	e5926fd368	This is the second in a number of patches needed to get BBRv1 into the tree. This fixes the DSACK bug but is also needed by BBR. Two more are still to come: one for the pacing code (tcp_ratelimit.c) and a second for the updated LRO code that allows a transport to know the arrival times of packets (tcp_lro.c). After that we should finally be able to get BBRv1 into head. Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D20908	2019-07-14 16:05:47 +00:00
Michael Tuexen	8a956abe12	When calling sctp_initialize_auth_params(), the inp must have at least a read lock. To avoid more complex locking dances, just call it in sctp_aloc_assoc() when the write lock is still held. Reported by: syzbot+08a486f7e6966f1c3cfb@syzkaller.appspotmail.com MFC after: 1 week	2019-07-14 12:04:39 +00:00
Randall Stewart	55f795883f	add back the comment around the pending DSACK fixes.	2019-07-12 11:45:42 +00:00
Randall Stewart	1cf999a5f3	Update to jhb's other suggestion, use #error when we are missing HPTS.	2019-07-11 04:40:58 +00:00
Randall Stewart	9cf3c235c0	Update copyright per jhb's suggestions. Thanks.	2019-07-11 04:38:33 +00:00
Randall Stewart	3b0b41e613	This commit updates rack to what is basically being used at NF as well as sets in some of the groundwork for committing BBR. The hpts system is updated as well as some other needed utilities for the entrance of BBR. This is actually part 1 of 3 more needed commits which will finally complete with BBRv1 being added as a new tcp stack. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D20834	2019-07-10 20:40:39 +00:00
John Baldwin	82334850ea	Add an external mbuf buffer type that holds multiple unmapped pages. Unmapped mbufs allow sendfile to carry multiple pages of data in a single mbuf, without mapping those pages. It is a requirement for Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web serving workloads when used by sendfile, due to effectively compressing socket buffers by an order of magnitude, and hence reducing cache misses. For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer now points to a struct mbuf_ext_pgs structure instead of a data buffer. This structure contains an array of physical addresses (this reduces cache misses compared to an earlier version that stored an array of vm_page_t pointers). It also stores additional fields needed for in-kernel TLS such as the TLS header and trailer data that are currently unused. To more easily detect these mbufs, the M_NOMAP flag is set in m_flags in addition to M_EXT. Various functions like m_copydata() have been updated to safely access packet contents (using uiomove_fromphys()), to make things like BPF safe. NIC drivers advertise support for unmapped mbufs on transmit via a new IFCAP_NOMAP capability. This capability can be toggled via the new 'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only transmit packet contents via DMA and use bus_dma, adding the capability to if_capabilities and if_capenable should be all that is required. If a NIC does not support unmapped mbufs, they are converted to a chain of mapped mbufs (using sf_bufs to provide the mapping) in ip_output or ip6_output. If an unmapped mbuf requires software checksums, it is also converted to a chain of mapped mbufs before computing the checksum. Submitted by: gallatin (earlier version) Reviewed by: gallatin, hselasky, rrs Discussed with: ae, kp (firewalls) Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20616	2019-06-29 00:48:33 +00:00
John Baldwin	6b69072acc	Reject attempts to register a TCP stack being unloaded. Reviewed by: gallatin MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20617	2019-06-27 22:34:05 +00:00
Hans Petter Selasky	59854ecf55	Convert all IPv4 and IPv6 multicast memberships into using a STAILQ instead of a linear array. The multicast memberships for the inpcb structure are protected by a non-sleepable lock, INP_WLOCK(), which needs to be dropped when calling the underlying possibly sleeping if_ioctl() method. When using a linear array to keep track of multicast memberships, the computed memory location of the multicast filter may suddenly change, due to concurrent insertion or removal of elements in the linear array. This in turn leads to various invalid memory access issues and kernel panics. To avoid this problem, put all multicast memberships on a STAILQ based list. Then the memory location of the IPv4 and IPv6 multicast filters become fixed during their lifetime and use after free and memory leak issues are easier to track, for example by: vmstat -m | grep multi All list manipulation has been factored into inline functions including some macros, to easily allow for a future hash-list implementation, if needed. This patch has been tested by pho@. Differential Revision: https://reviews.freebsd.org/D20080 Reviewed by: markj@ MFC after: 1 week Sponsored by: Mellanox Technologies	2019-06-25 11:54:41 +00:00
Andrey V. Elsukov	978f2d1728	Add "tcpmss" opcode to match the TCP MSS value. With this opcode it is possible to match TCP packets with specified MSS option, whose value corresponds to configured in opcode value. It is allowed to specify single value, range of values, or array of specific values or ranges. E.g. # ipfw add deny log tcp from any to any tcpmss 0-500 Reviewed by: melifaro,bcr Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-06-21 10:54:51 +00:00
Kristof Provost	05fc9d78d7	ip_output: pass PFIL_FWD in the slow path If we take the slow path for forwarding we should still tell our firewalls (hooked through pfil(9)) that we're forwarding. Pass the ip_output() flags to ip_output_pfil() so it can set the PFIL_FWD flag when we're forwarding. MFC after: 1 week Sponsored by: Axiado	2019-06-21 07:58:08 +00:00
Jonathan T. Looney	5e02b277a4	Add the ability to limit how much the code will fragment the RACK send map in response to SACKs. The default behavior is unchanged; however, the limit can be activated by changing the new net.inet.tcp.rack.split_limit sysctl. Submitted by: Peter Lei <peterlei@netflix.com> Reported by: jtl Reviewed by: lstewart (earlier version) Security: CVE-2019-5599	2019-06-19 13:55:00 +00:00
Xin LI	f89d207279	Separate kernel crc32() implementation to its own header (gsb_crc32.h) and rename the source to gsb_crc32.c. This is a prerequisite of unifying kernel zlib instances. PR: 229763 Submitted by: Yoshihiro Ota <ota at j.email.ne.jp> Differential Revision: https://reviews.freebsd.org/D20193	2019-06-17 19:49:08 +00:00
John Baldwin	77a0144145	Sort opt_foo.h #includes and add a missing blank line in ip_output().	2019-06-11 22:07:39 +00:00
Bjoern A. Zeeb	4c62bffef5	Fix dpcpu and vnet panics with complex types at the end of the section. Apply a linker script when linking i386 kernel modules to apply padding to a set_pcpu or set_vnet section. The padding value is kind-of random and is used to catch modules not compiled with the linker-script, so possibly still having problems leading to kernel panics. This is needed as the code generated on certain architectures for non-simple-types, e.g., an array can generate an absolute relocation on the edge (just outside) the section and thus will not be properly relocated. Adding the padding to the end of the section will ensure that even absolute relocations of complex types will be inside the section, if they are the last object in there and hence relocation will work properly and avoid panics such as observed with carp.ko or ipsec.ko. There is a rather lengthy discussion of various options to apply in the mentioned PRs and their depends/blocks, and the review. There seems no best solution working across multiple toolchains and multiple version of them, so I took the liberty of taking one, as currently our users (and our CI system) are hitting this on just i386 and we need some solution. I wish we would have a proper fix rather than another "hack". Also backout r340009 which manually, temporarily fixed CARP before 12.0-R "by chance" after a lead-up of various other link-elf.c and related fixes. PR: 230857,238012 With suggestions from: arichardson (originally last year) Tested by: lwhsu Event: Waterloo Hackathon 2019 Reported by: lwhsu, olivier MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D17512	2019-06-08 17:44:42 +00:00
Michael Tuexen	d1156b0505	r347382 added receiver side DSACK support for the TCP base stack. The corresponding changes for the RACK stack were missed and are added by this commit. Reviewed by: Richard Scheffenegger, rrs@ MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D20372	2019-06-06 07:49:03 +00:00
Bjoern A. Zeeb	eafaa1bc35	After parts of the locking fixes in r346595, syzkaller found another one in udp_output(). This one is a race condition. We do check on the laddr and lport without holding a lock in order to determine whether we want a read or a write lock (this is in the "sendto/sendmsg" cases where addr (sin) is given). Instrumenting the kernel showed that after taking the lock, we had bound to a local port from a parallel thread on the same socket. If we find that case, unlock, and retry again. Taking the write lock would not be a problem in first place (apart from killing some parallelism). However the retry is needed as later on based on similar condition checks we do acquire the pcbinfo lock and if the conditions have changed, we might find ourselves with a lock inconsistency, hence at the end of the function when trying to unlock, hitting the KASSERT. Reported by: syzbot+bdf4caa36f3ceeac198f@syzkaller.appspotmail.com Reviewed by: markj MFC after: 6 weeks Event: Waterloo Hackathon 2019	2019-06-01 14:57:42 +00:00
Mark Johnston	8726929d67	netdump: Buffer pages to avoid calling netdump_send() on each 4KB write. netdump waits for acknowledgement from the server for each write. When dumping page table pages, we perform many small writes, limiting throughput. Use the netdump client's buffer to buffer small contiguous writes before calling netdump_send() to flush the MAXDUMPPGS-sized buffer. This results in a significant reduction in the time taken to complete a netdump. Submitted by: Sam Gwydir <sam@samgwydir.com> Reviewed by: cem MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20317	2019-05-31 18:29:12 +00:00
Michael Tuexen	bc35229fad	When an ACK segment as the third message of the three way handshake is received and support for time stamps was negotiated in the SYN/SYNACK exchange, perform the PAWS check and only expand the syn cache entry if the check is passed. Without this check, endpoints may get stuck on the incomplete queue. Reviewed by: jtl@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D20374	2019-05-26 17:18:14 +00:00
John Baldwin	fb3bc59600	Restructure mbuf send tags to provide stronger guarantees. - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code allocating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117	2019-05-24 22:30:40 +00:00
Bjoern A. Zeeb	bd4549f467	Massively blow up the locking-related KASSERTs used to make sure that we end up in a consistent locking state at the end of udp_output() in order to be able to see what the values are based on which we once took a decision (note: some values may have changed). This helped to debug a syzkaller report. MFC after: 2 months Event: Waterloo Hackathon 2019	2019-05-21 19:23:56 +00:00
Bjoern A. Zeeb	f9a6e8d72f	Similarly to r338257,338306 try to fold the two consecutive #ifdef RSS section in udp_output() into one by moving a '}' outside of the conditional block. MFC after: 2 months Event: Waterloo Hackathon 2019	2019-05-21 19:18:55 +00:00
Conrad Meyer	04e0c883c5	Add two missing eventhandler.h headers These are obviously missing from the .c files, but don't show up in any tinderbox configuration (due to latent header pollution of some kind). It seems some configurations don't have this pollution, and the includes are obviously missing, so go ahead and add them. Reported by: Peter Jeremy <peter AT rulingia.com> X-MFC-With: r347984	2019-05-21 00:04:19 +00:00
Conrad Meyer	e2e050c8ef	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
Michael Tuexen	3a4f12e3b1	Allow sending on demand SCTP HEARTBEATS only in the ESTABLISHED state. This issue was found by running syzkaller. MFC after: 3 days	2019-05-19 17:53:36 +00:00
Michael Tuexen	fc26bf717c	Improve input validation for the IPPROTO_SCTP level socket options SCTP_CONNECT_X and SCTP_CONNECT_X_DELAYED. Some issues were found by running syzkaller. MFC after: 3 days	2019-05-19 17:28:00 +00:00
Mark Johnston	f00876fb60	Revert r347582 for now. The inp lock still needs to be dropped when calling into the driver ioctl handler, as some drivers expect to be able to sleep. Reported by: kib	2019-05-16 13:04:26 +00:00
Mark Johnston	5a1e222bfd	Close some races in multicast socket option handling. r333175 converted the global multicast lock to a sleepable sx lock, so the lock order with respect to the (non-sleepable) inp lock changed. To handle this, r333175 and r333505 added code to drop the inp lock, but this opened races that could leave multicast group description structures in an inconsistent state. This change fixes the problem by simply acquiring the global lock sooner. Along the way, this fixes some LORs and bogus error handling introduced in r333175, and commits some related cleanup. Reported by: syzbot+ba7c4943547e0604faca@syzkaller.appspotmail.com Reported by: syzbot+1b803796ab94d11a46f9@syzkaller.appspotmail.com Reviewed by: ae MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20070	2019-05-14 21:30:55 +00:00
Conrad Meyer	64e7d18f34	netdump: Ref the interface we're attached to Serialize netdump configuration / deconfiguration, and discard our configuration when the affiliated interface goes away by monitoring ifnet_departure_event. Reviewed by: markj, with input from vangyzen@ (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20206	2019-05-10 23:12:59 +00:00
Conrad Meyer	070e7bf95e	netdump: Fix boot-time configuration typo Boot-time netdump configuration is much more useful if one can configure the client and gateway addresses. Fix trivial typo. (Long-standing bug, I believe it dates to the original netdump commit.) Spotted by: one of vangyzen@ or markj@ Sponsored by: Dell EMC Isilon	2019-05-10 23:10:22 +00:00
Conrad Meyer	6144b50f8b	netdump: Don't store sensitive key data we don't need Prior to this revision, struct diocskerneldump_arg (and struct netdump_conf with embedded diocskerneldump_arg before r347192), were copied in their entirety to the global 'nd_conf' variable. Also prior to this revision, de-configuring netdump would not remove the key material from global nd_conf. As part of Encrypted Kernel Crash Dumps (EKCD), which was developed contemporaneously with netdump but happened to land first, the diocskerneldump_arg structure will contain sensitive key material (kda_key[]) when encrypted dumps are configured. Netdump doesn't have any use for the key data -- encryption is handled in the core dumper code -- so in this revision, we no longer store it. Unfortunately, I think this leak dates to the initial import of netdump in r333283; so it's present in FreeBSD 12.0. Fortunately, the impact seems relatively minor. Any new netdump configuration would overwrite the key material; for active encrypted netdump configurations, the key data stored was just a duplicate of the key material already in the core dumper code; and no user interface (other than /dev/kmem) actually exposed the leaked material to userspace. Reviewed by: markj, rpokala (earlier commit message) MFC after: 2 weeks Security: yes (minor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20233	2019-05-10 21:55:11 +00:00
Gleb Smirnoff	54bb7ac0c4	Fix regression from r347375: do not panic when sending an IP multicast packet from an interface that doesn't have IPv4 address. Reported by: Michael Butler <imb protected-networks.net>	2019-05-10 21:51:17 +00:00
Andrew Gallatin	4e255d7479	Bind TCP HPTS (pacer) threads to NUMA domains Bind the TCP pacer threads to NUMA domains and build per-domain pacer-thread lookup tables. These tables allow us to use the inpcb's NUMA domain information to match an inpcb with a pacer thread on the same domain. The motivation for this is to keep the TCP connection local to a NUMA domain as much as possible. Thanks to jhb for pre-reviewing an earlier version of the patch. Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20134	2019-05-10 13:41:19 +00:00
Michael Tuexen	b5a154d8e3	Don't use C++ style comments. These were introduced in r347382. Reported by: ngie@	2019-05-09 21:00:15 +00:00
Michael Tuexen	5acfd95cbc	Receiver side DSACK implementation. This adds initial support for RFC 2883. Submitted by: Richard Scheffenegger Reviewed by: rrs@ Differential Revision: https://reviews.freebsd.org/D19334	2019-05-09 07:34:15 +00:00
Michael Tuexen	5cc11a89db	Prevent cwnd from collapsing down to 1 MSS after exiting recovery. This is described in RFC 6582, which updates RFC 3782. Submitted by: Richard Scheffenegger Reviewed by: lstewart@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17614	2019-05-09 07:11:08 +00:00
Gleb Smirnoff	6ca363eb7b	Existence of PCB route caching doesn't allow us to use the new fast route lookup KPI in ip_output() like it is already used in ip_forward(). However, when there is no PCB provided we can use the fast KPI, gaining a performance advantage. A typical case when ip_output() is called without a PCB pointer is a sendto(2) on an unconnected UDP socket. In practice DNS servers do this. Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D19804	2019-05-08 23:39:24 +00:00
Conrad Meyer	6b6e2954dd	List-ify kernel dump device configuration Allow users to specify multiple dump configurations in a prioritized list. This enables fallback to secondary device(s) if primary dump fails. E.g., one might configure a preference for netdump, but fallback to disk dump as a second choice if netdump is unavailable. This change does not list-ify netdump configuration, which is tracked separately from ordinary disk dumps internally; only one netdump configuration can be made at a time, for now. It also does not implement IPv6 netdump. savecore(8) is already capable of scanning and iterating multiple devices from /etc/fstab or passed on the command line. This change doesn't update the rc or loader variables 'dumpdev' in any way; it can still be set to configure a single dump device, and rc.d/savecore still uses it as a single device. Only dumpon(8) is updated to be able to configure the more complicated configurations for now. As part of revving the ABI, unify netdump and disk dump configuration ioctl / structure, and leave room for ipv6 netdump as a future possibility. Backwards-compatibility ioctls are added to smooth ABI transition, especially for developers who may not keep kernel and userspace perfectly synced. Reviewed by: markj, scottl (earlier version) Relnotes: maybe Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19996	2019-05-06 18:24:07 +00:00
Alexander Motin	fb6a844704	ip multicast debug: fix strings vs defines Turning on multicast debug made multicast failure worse because the strings and #define values no longer matched up. Fix them, and make sure they stay matched-up. Submitted by: torek MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-04-29 18:09:55 +00:00
Andrew Gallatin	50575ce11c	Track TCP connection's NUMA domain in the inpcb Drivers can now pass up numa domain information via the mbuf numa domain field. This information is then used by TCP syncache_socket() to associate that information with the inpcb. The domain information is then fed back into transmitted mbufs in ip{6}_output(). This mechanism is nearly identical to what is done to track RSS hash values in the inp_flowid. Follow on changes will use this information for lacp egress port selection, binding TCP pacers to the appropriate NUMA domain, etc. Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20028	2019-04-25 15:37:28 +00:00
Andrey V. Elsukov	aee793eec9	Add GRE-in-UDP encapsulation support as defined in RFC8086. This GRE-in-UDP encapsulation allows the UDP source port field to be used as an entropy field for load-balancing of GRE traffic in transit networks. Also, most multiqueue network cards are able to distribute incoming UDP datagrams to different NIC queues, while very few are able to do this for GRE packets. When an administrator enables UDP encapsulation with the command `ifconfig gre0 udpencap`, the driver creates a kernel socket that binds to the tunnel source address and, after udp_set_kernel_tunneling(), starts receiving all UDP packets destined to port 4754. Each kernel socket maintains a list of tunnels with different destination addresses. Thus when several tunnels use the same source address, they are all handled by a single socket. The IP[V6]_BINDANY socket option is used to be able to bind the socket to the source address even if it is not yet available in the system. This may happen on system boot, when the gre(4) interface is created before the source address becomes available. The encapsulation and sending of packets is done directly from gre(4) into ip[6]_output() without using sockets. Reviewed by: eugen MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D19921	2019-04-24 09:05:45 +00:00
Conrad Meyer	a9f7f19242	netdump: Fix !COMPAT_FREEBSD11 unused variable warning Reported by: Ralf Wenk <iz-rpi03_hs-karlsruhe.de> Sponsored by: Dell EMC Isilon	2019-04-23 17:05:57 +00:00
Bjoern A. Zeeb	d86ecbe993	Fix udp_output() lock inconsistency. In r297225 the initial INP_RLOCK() was replaced by an early acquisition of an r- or w-lock depending on input variables possibly extending the write locked area for reasons not entirely clear but possibly to avoid a later case of unlock and relock leading to a possible race condition and possibly in order to allow the route cache to work for connected sockets. Unfortunately the conditions were not 1:1 replicated (probably because of the route cache needs). While this would not be a problem the legacy IP code compared to IPv6 has an extra case when dealing with IP_SENDSRCADDR. In a particular case we were holding an exclusive inp lock and acquired the shared udpinfo lock (now epoch). When then running into an error case, the locking assertions on release fired as the udpinfo and inp lock levels did not match. Break up the special case and in that particular case acquire an udpinfo lock depending on the exclusivity of the inp lock. MFC After: 9 days Reported-by: syzbot+1f5c6800e4f99bdb1a48@syzkaller.appspotmail.com Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D19594	2019-04-23 10:12:33 +00:00
Hans Petter Selasky	6bbdbbb830	Revert r346530 until further. MFC after: 1 week Sponsored by: Mellanox Technologies	2019-04-22 19:36:19 +00:00
Bjoern A. Zeeb	ade1258dc1	r297225 move the assignment of sin from add to the top of the function. sin is not changed after the initial assignment, so no need to set it again. MFC after: 10 days	2019-04-22 14:53:53 +00:00
Bjoern A. Zeeb	e932299837	Remove some excessive brackets. No functional change. MFC after: 10 days	2019-04-22 14:20:49 +00:00
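
The netdump commit 8726929d67 in the log above describes buffering small contiguous writes and flushing them in one larger send. A minimal userland sketch of that idea follows. It is not the kernel code: `sketch_buf`, `sketch_write()`, `sketch_flush()`, the `SKETCH_BUFSZ` size, and the send callback signature are all hypothetical names and assumptions chosen for illustration, standing in for the netdump client buffer and netdump_send().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the MAXDUMPPGS-sized client buffer. */
#define SKETCH_BUFSZ 65536

/* Callback standing in for netdump_send(); this signature is an assumption. */
typedef int (*sketch_send_fn)(const void *data, size_t len, long long offset);

struct sketch_buf {
	unsigned char data[SKETCH_BUFSZ];
	size_t len;		/* bytes currently buffered */
	long long start;	/* dump offset of data[0] */
	unsigned flushes;	/* number of buffered flushes performed */
	sketch_send_fn send;
};

/* Byte counter used by the stub sender below. */
static size_t sketch_sent_bytes;

/* Stub sender: counts bytes instead of talking to a server. */
static int
sketch_count_send(const void *data, size_t len, long long offset)
{
	(void)data;
	(void)offset;
	sketch_sent_bytes += len;
	return (0);
}

/* Send whatever is buffered as one large write. */
static int
sketch_flush(struct sketch_buf *b)
{
	int error = 0;

	assert(b->send != NULL);
	if (b->len > 0) {
		error = b->send(b->data, b->len, b->start);
		b->flushes++;
		b->len = 0;
	}
	return (error);
}

/* Buffer a write; flush first if it is not contiguous or does not fit. */
static int
sketch_write(struct sketch_buf *b, const void *p, size_t len, long long off)
{
	int error;

	if (b->len > 0 && (off != b->start + (long long)b->len ||
	    b->len + len > SKETCH_BUFSZ)) {
		error = sketch_flush(b);
		if (error != 0)
			return (error);
	}
	if (len > SKETCH_BUFSZ)		/* too large to buffer: send directly */
		return (b->send(p, len, off));
	if (b->len == 0)
		b->start = off;
	memcpy(b->data + b->len, p, len);
	b->len += len;
	if (b->len == SKETCH_BUFSZ)	/* buffer full: flush now */
		return (sketch_flush(b));
	return (0);
}
```

With a 4 KB page size, sixteen contiguous page writes in this sketch result in a single call to the send callback rather than sixteen, which is the effect the commit message attributes to the change. A caller would also need to invoke `sketch_flush()` once at the end of the dump to push out any partial buffer.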

6266 Commits