freebsd-skq

Author	SHA1	Message	Date
jhb	520aafe3ec	Add an external mbuf buffer type that holds multiple unmapped pages. Unmapped mbufs allow sendfile to carry multiple pages of data in a single mbuf, without mapping those pages. It is a requirement for Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web serving workloads when used by sendfile, due to effectively compressing socket buffers by an order of magnitude, and hence reducing cache misses. For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer now points to a struct mbuf_ext_pgs structure instead of a data buffer. This structure contains an array of physical addresses (this reduces cache misses compared to an earlier version that stored an array of vm_page_t pointers). It also stores additional fields needed for in-kernel TLS such as the TLS header and trailer data that are currently unused. To more easily detect these mbufs, the M_NOMAP flag is set in m_flags in addition to M_EXT. Various functions like m_copydata() have been updated to safely access packet contents (using uiomove_fromphys()), to make things like BPF safe. NIC drivers advertise support for unmapped mbufs on transmit via a new IFCAP_NOMAP capability. This capability can be toggled via the new 'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only transmit packet contents via DMA and use bus_dma, adding the capability to if_capabilities and if_capenable should be all that is required. If a NIC does not support unmapped mbufs, they are converted to a chain of mapped mbufs (using sf_bufs to provide the mapping) in ip_output or ip6_output. If an unmapped mbuf requires software checksums, it is also converted to a chain of mapped mbufs before computing the checksum. Submitted by: gallatin (earlier version) Reviewed by: gallatin, hselasky, rrs Discussed with: ae, kp (firewalls) Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20616	2019-06-29 00:48:33 +00:00
jhb	eb4237a478	Reject attempts to register a TCP stack being unloaded. Reviewed by: gallatin MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20617	2019-06-27 22:34:05 +00:00
hselasky	1a5fd513af	Convert all IPv4 and IPv6 multicast memberships into using a STAILQ instead of a linear array. The multicast memberships for the inpcb structure are protected by a non-sleepable lock, INP_WLOCK(), which needs to be dropped when calling the underlying possibly sleeping if_ioctl() method. When using a linear array to keep track of multicast memberships, the computed memory location of the multicast filter may suddenly change, due to concurrent insertion or removal of elements in the linear array. This in turn leads to various invalid memory access issues and kernel panics. To avoid this problem, put all multicast memberships on a STAILQ based list. Then the memory locations of the IPv4 and IPv6 multicast filters become fixed during their lifetime and use after free and memory leak issues are easier to track, for example by: vmstat -m | grep multi All list manipulation has been factored into inline functions including some macros, to easily allow for a future hash-list implementation, if needed. This patch has been tested by pho@. Differential Revision: https://reviews.freebsd.org/D20080 Reviewed by: markj@ MFC after: 1 week Sponsored by: Mellanox Technologies	2019-06-25 11:54:41 +00:00
ae	c6d750cdc7	Add "tcpmss" opcode to match the TCP MSS value. With this opcode it is possible to match TCP packets carrying an MSS option whose value matches the value configured in the opcode. A single value, a range of values, or a list of specific values or ranges may be specified. E.g. # ipfw add deny log tcp from any to any tcpmss 0-500 Reviewed by: melifaro,bcr Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-06-21 10:54:51 +00:00
kp	5b2895d685	ip_output: pass PFIL_FWD in the slow path If we take the slow path for forwarding we should still tell our firewalls (hooked through pfil(9)) that we're forwarding. Pass the ip_output() flags to ip_output_pfil() so it can set the PFIL_FWD flag when we're forwarding. MFC after: 1 week Sponsored by: Axiado	2019-06-21 07:58:08 +00:00
jtl	a21d319b29	Add the ability to limit how much the code will fragment the RACK send map in response to SACKs. The default behavior is unchanged; however, the limit can be activated by changing the new net.inet.tcp.rack.split_limit sysctl. Submitted by: Peter Lei <peterlei@netflix.com> Reported by: jtl Reviewed by: lstewart (earlier version) Security: CVE-2019-5599	2019-06-19 13:55:00 +00:00
delphij	8581c5bfb9	Separate kernel crc32() implementation to its own header (gsb_crc32.h) and rename the source to gsb_crc32.c. This is a prerequisite of unifying kernel zlib instances. PR: 229763 Submitted by: Yoshihiro Ota <ota at j.email.ne.jp> Differential Revision: https://reviews.freebsd.org/D20193	2019-06-17 19:49:08 +00:00
jhb	6c30621191	Sort opt_foo.h #includes and add a missing blank line in ip_output().	2019-06-11 22:07:39 +00:00
bz	24f298a9c6	Fix dpcpu and vnet panics with complex types at the end of the section. Apply a linker script when linking i386 kernel modules to apply padding to a set_pcpu or set_vnet section. The padding value is kind-of random and is used to catch modules not compiled with the linker-script, so possibly still having problems leading to kernel panics. This is needed as the code generated on certain architectures for non-simple-types, e.g., an array can generate an absolute relocation on the edge (just outside) the section and thus will not be properly relocated. Adding the padding to the end of the section will ensure that even absolute relocations of complex types will be inside the section, if they are the last object in there and hence relocation will work properly and avoid panics such as observed with carp.ko or ipsec.ko. There is a rather lengthy discussion of various options to apply in the mentioned PRs and their depends/blocks, and the review. There seems no best solution working across multiple toolchains and multiple versions of them, so I took the liberty of taking one, as currently our users (and our CI system) are hitting this on just i386 and we need some solution. I wish we would have a proper fix rather than another "hack". Also backout r340009 which manually, temporarily fixed CARP before 12.0-R "by chance" after a lead-up of various other link-elf.c and related fixes. PR: 230857,238012 With suggestions from: arichardson (originally last year) Tested by: lwhsu Event: Waterloo Hackathon 2019 Reported by: lwhsu, olivier MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D17512	2019-06-08 17:44:42 +00:00
tuexen	3b648a5e27	r347382 added receiver side DSACK support for the TCP base stack. The corresponding changes for the RACK stack were missed and are added by this commit. Reviewed by: Richard Scheffenegger, rrs@ MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D20372	2019-06-06 07:49:03 +00:00
bz	4dc2772cf1	After parts of the locking fixes in r346595, syzkaller found another one in udp_output(). This one is a race condition. We do check on the laddr and lport without holding a lock in order to determine whether we want a read or a write lock (this is in the "sendto/sendmsg" cases where addr (sin) is given). Instrumenting the kernel showed that after taking the lock, we had bound to a local port from a parallel thread on the same socket. If we find that case, unlock, and retry again. Taking the write lock would not be a problem in first place (apart from killing some parallelism). However the retry is needed as later on based on similar condition checks we do acquire the pcbinfo lock and if the conditions have changed, we might find ourselves with a lock inconsistency, hence at the end of the function when trying to unlock, hitting the KASSERT. Reported by: syzbot+bdf4caa36f3ceeac198f@syzkaller.appspotmail.com Reviewed by: markj MFC after: 6 weeks Event: Waterloo Hackathon 2019	2019-06-01 14:57:42 +00:00
markj	e9b44e8630	netdump: Buffer pages to avoid calling netdump_send() on each 4KB write. netdump waits for acknowledgement from the server for each write. When dumping page table pages, we perform many small writes, limiting throughput. Use the netdump client's buffer to buffer small contiguous writes before calling netdump_send() to flush the MAXDUMPPGS-sized buffer. This results in a significant reduction in the time taken to complete a netdump. Submitted by: Sam Gwydir <sam@samgwydir.com> Reviewed by: cem MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20317	2019-05-31 18:29:12 +00:00
tuexen	6901687b42	When an ACK segment as the third message of the three way handshake is received and support for time stamps was negotiated in the SYN/SYNACK exchange, perform the PAWS check and only expand the syn cache entry if the check is passed. Without this check, endpoints may get stuck on the incomplete queue. Reviewed by: jtl@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D20374	2019-05-26 17:18:14 +00:00
jhb	5518ae8169	Restructure mbuf send tags to provide stronger guarantees. - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code allocating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117	2019-05-24 22:30:40 +00:00
bz	0f70df8712	Massively blow up the locking-related KASSERTs used to make sure that we end up in a consistent locking state at the end of udp_output() in order to be able to see what the values are based on which we once took a decision (note: some values may have changed). This helped to debug a syzkaller report. MFC after: 2 months Event: Waterloo Hackathon 2019	2019-05-21 19:23:56 +00:00
bz	f365d1c4d7	Similarly to r338257,338306 try to fold the two consecutive #ifdef RSS sections in udp_output() into one by moving a '}' outside of the conditional block. MFC after: 2 months Event: Waterloo Hackathon 2019	2019-05-21 19:18:55 +00:00
cem	2e158b518b	Add two missing eventhandler.h headers These are obviously missing from the .c files, but don't show up in any tinderbox configuration (due to latent header pollution of some kind). It seems some configurations don't have this pollution, and the includes are obviously missing, so go ahead and add them. Reported by: Peter Jeremy <peter AT rulingia.com> X-MFC-With: r347984	2019-05-21 00:04:19 +00:00
cem	250e158ddf	Extract eventhandler declarations to sys/_eventhandler.h This allows replacing "sys/eventhandler.h" includes with "sys/_eventhandler.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
tuexen	724fce5e3b	Allow sending on demand SCTP HEARTBEATS only in the ESTABLISHED state. This issue was found by running syzkaller. MFC after: 3 days	2019-05-19 17:53:36 +00:00
tuexen	568140500b	Improve input validation for the IPPROTO_SCTP level socket options SCTP_CONNECT_X and SCTP_CONNECT_X_DELAYED. Some issues were found by running syzkaller. MFC after: 3 days	2019-05-19 17:28:00 +00:00
markj	cd39cf0fa8	Revert r347582 for now. The inp lock still needs to be dropped when calling into the driver ioctl handler, as some drivers expect to be able to sleep. Reported by: kib	2019-05-16 13:04:26 +00:00
markj	46ad7dbca8	Close some races in multicast socket option handling. r333175 converted the global multicast lock to a sleepable sx lock, so the lock order with respect to the (non-sleepable) inp lock changed. To handle this, r333175 and r333505 added code to drop the inp lock, but this opened races that could leave multicast group description structures in an inconsistent state. This change fixes the problem by simply acquiring the global lock sooner. Along the way, this fixes some LORs and bogus error handling introduced in r333175, and commits some related cleanup. Reported by: syzbot+ba7c4943547e0604faca@syzkaller.appspotmail.com Reported by: syzbot+1b803796ab94d11a46f9@syzkaller.appspotmail.com Reviewed by: ae MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20070	2019-05-14 21:30:55 +00:00
cem	21be2414d8	netdump: Ref the interface we're attached to Serialize netdump configuration / deconfiguration, and discard our configuration when the affiliated interface goes away by monitoring ifnet_departure_event. Reviewed by: markj, with input from vangyzen@ (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20206	2019-05-10 23:12:59 +00:00
cem	772f931bda	netdump: Fix boot-time configuration typo Boot-time netdump configuration is much more useful if one can configure the client and gateway addresses. Fix trivial typo. (Long-standing bug, I believe it dates to the original netdump commit.) Spotted by: one of vangyzen@ or markj@ Sponsored by: Dell EMC Isilon	2019-05-10 23:10:22 +00:00
cem	de66b6077b	netdump: Don't store sensitive key data we don't need Prior to this revision, struct diocskerneldump_arg (and struct netdump_conf with embedded diocskerneldump_arg before r347192), were copied in their entirety to the global 'nd_conf' variable. Also prior to this revision, de-configuring netdump would not remove the key material from global nd_conf. As part of Encrypted Kernel Crash Dumps (EKCD), which was developed contemporaneously with netdump but happened to land first, the diocskerneldump_arg structure will contain sensitive key material (kda_key[]) when encrypted dumps are configured. Netdump doesn't have any use for the key data -- encryption is handled in the core dumper code -- so in this revision, we no longer store it. Unfortunately, I think this leak dates to the initial import of netdump in r333283; so it's present in FreeBSD 12.0. Fortunately, the impact seems relatively minor. Any new netdump configuration would overwrite the key material; for active encrypted netdump configurations, the key data stored was just a duplicate of the key material already in the core dumper code; and no user interface (other than /dev/kmem) actually exposed the leaked material to userspace. Reviewed by: markj, rpokala (earlier commit message) MFC after: 2 weeks Security: yes (minor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20233	2019-05-10 21:55:11 +00:00
glebius	909cac0308	Fix regression from r347375: do not panic when sending an IP multicast packet from an interface that doesn't have an IPv4 address. Reported by: Michael Butler <imb protected-networks.net>	2019-05-10 21:51:17 +00:00
gallatin	fbc304aae0	Bind TCP HPTS (pacer) threads to NUMA domains Bind the TCP pacer threads to NUMA domains and build per-domain pacer-thread lookup tables. These tables allow us to use the inpcb's NUMA domain information to match an inpcb with a pacer thread on the same domain. The motivation for this is to keep the TCP connection local to a NUMA domain as much as possible. Thanks to jhb for pre-reviewing an earlier version of the patch. Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20134	2019-05-10 13:41:19 +00:00
tuexen	78ed458c48	Don't use C++ style comments. These were introduced in r347382. Reported by: ngie@	2019-05-09 21:00:15 +00:00
tuexen	4730cb344f	Receiver side DSACK implementation. This adds initial support for RFC 2883. Submitted by: Richard Scheffenegger Reviewed by: rrs@ Differential Revision: https://reviews.freebsd.org/D19334	2019-05-09 07:34:15 +00:00
tuexen	274fb26176	Prevent cwnd from collapsing down to 1 MSS after exiting recovery. This is described in RFC 6582, which updates RFC 3782. Submitted by: Richard Scheffenegger Reviewed by: lstewart@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17614	2019-05-09 07:11:08 +00:00
glebius	c150a0f6fa	Existence of PCB route caching doesn't allow us to use the new fast route lookup KPI in ip_output() like it is already used in ip_forward(). However, when there is no PCB provided we can use the fast KPI, gaining a performance advantage. The typical case when ip_output() is called without a PCB pointer is a sendto(2) on a not connected UDP socket. In practice DNS servers do this. Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D19804	2019-05-08 23:39:24 +00:00
cem	6058a49bde	List-ify kernel dump device configuration Allow users to specify multiple dump configurations in a prioritized list. This enables fallback to secondary device(s) if primary dump fails. E.g., one might configure a preference for netdump, but fallback to disk dump as a second choice if netdump is unavailable. This change does not list-ify netdump configuration, which is tracked separately from ordinary disk dumps internally; only one netdump configuration can be made at a time, for now. It also does not implement IPv6 netdump. savecore(8) is already capable of scanning and iterating multiple devices from /etc/fstab or passed on the command line. This change doesn't update the rc or loader variables 'dumpdev' in any way; it can still be set to configure a single dump device, and rc.d/savecore still uses it as a single device. Only dumpon(8) is updated to be able to configure the more complicated configurations for now. As part of revving the ABI, unify netdump and disk dump configuration ioctl / structure, and leave room for ipv6 netdump as a future possibility. Backwards-compatibility ioctls are added to smooth ABI transition, especially for developers who may not keep kernel and userspace perfectly synced. Reviewed by: markj, scottl (earlier version) Relnotes: maybe Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19996	2019-05-06 18:24:07 +00:00
mav	a49c89d580	ip multicast debug: fix strings vs defines Turning on multicast debug made multicast failure worse because the strings and #define values no longer matched up. Fix them, and make sure they stay matched-up. Submitted by: torek MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-04-29 18:09:55 +00:00
gallatin	63aec3850f	Track TCP connection's NUMA domain in the inpcb Drivers can now pass up numa domain information via the mbuf numa domain field. This information is then used by TCP syncache_socket() to associate that information with the inpcb. The domain information is then fed back into transmitted mbufs in ip{6}_output(). This mechanism is nearly identical to what is done to track RSS hash values in the inp_flowid. Follow on changes will use this information for lacp egress port selection, binding TCP pacers to the appropriate NUMA domain, etc. Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20028	2019-04-25 15:37:28 +00:00
ae	97ddb4fef9	Add GRE-in-UDP encapsulation support as defined in RFC8086. This GRE-in-UDP encapsulation allows the UDP source port field to be used as an entropy field for load-balancing of GRE traffic in transit networks. Also most multiqueue network cards are able to distribute incoming UDP datagrams to different NIC queues, while very few are able to do this for GRE packets. When an administrator enables UDP encapsulation with the command `ifconfig gre0 udpencap`, the driver creates a kernel socket that binds to the tunnel source address and, after udp_set_kernel_tunneling(), starts receiving all UDP packets destined to port 4754. Each kernel socket maintains a list of tunnels with different destination addresses. Thus when several tunnels use the same source address, they are all handled by a single socket. The IP[V6]_BINDANY socket option is used to be able to bind the socket to the source address even if it is not yet available in the system. This may happen on system boot, when the gre(4) interface is created before the source address becomes available. The encapsulation and sending of packets is done directly from gre(4) into ip[6]_output() without using sockets. Reviewed by: eugen MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D19921	2019-04-24 09:05:45 +00:00
cem	16e165fb1d	netdump: Fix !COMPAT_FREEBSD11 unused variable warning Reported by: Ralf Wenk <iz-rpi03_hs-karlsruhe.de> Sponsored by: Dell EMC Isilon	2019-04-23 17:05:57 +00:00
bz	87874d0b3e	Fix udp_output() lock inconsistency. In r297225 the initial INP_RLOCK() was replaced by an early acquisition of an r- or w-lock depending on input variables possibly extending the write locked area for reasons not entirely clear but possibly to avoid a later case of unlock and relock leading to a possible race condition and possibly in order to allow the route cache to work for connected sockets. Unfortunately the conditions were not 1:1 replicated (probably because of the route cache needs). While this would not be a problem the legacy IP code compared to IPv6 has an extra case when dealing with IP_SENDSRCADDR. In a particular case we were holding an exclusive inp lock and acquired the shared udpinfo lock (now epoch). When then running into an error case, the locking assertions on release fired as the udpinfo and inp lock levels did not match. Break up the special case and in that particular case acquire an udpinfo lock depending on the exclusivity of the inp lock. MFC After: 9 days Reported-by: syzbot+1f5c6800e4f99bdb1a48@syzkaller.appspotmail.com Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D19594	2019-04-23 10:12:33 +00:00
hselasky	8019d4dcb5	Revert r346530 until further. MFC after: 1 week Sponsored by: Mellanox Technologies	2019-04-22 19:36:19 +00:00
bz	b1a35f1113	r297225 moved the assignment of sin from add to the top of the function. sin is not changed after the initial assignment, so no need to set it again. MFC after: 10 days	2019-04-22 14:53:53 +00:00
bz	e6c9a8836e	Remove some excessive brackets. No functional change. MFC after: 10 days	2019-04-22 14:20:49 +00:00
hselasky	8752233742	Fix build for mips and powerpc after r346530. Need to include sys/kernel.h to define SYSINIT() which is used by sys/eventhandler.h . MFC after: 1 week Sponsored by: Mellanox Technologies	2019-04-22 08:32:00 +00:00
hselasky	b65993070e	Fix panic in network stack due to memory use after free in relation to fragmented packets. When sending IPv4 and IPv6 fragmented packets and a fragment is lost, the mbuf making up the fragment will remain in the temporary hashed fragment list for a while. If the network interface departs before the so-called slow timeout clears the packet, the fragment causes a panic when the timeout kicks in due to accessing a freed network interface structure. Make sure that when a network device is departing, all hashed IPv4 and IPv6 fragments belonging to it, get freed. Backtrace: panic() icmp6_reflect() hlim = ND_IFINFO(m->m_pkthdr.rcvif)->chlim; ^^^^ rcvif->if_afdata[AF_INET6] is NULL. icmp6_error() frag6_freef() frag6_slowtimo() pfslowtimo() softclock_call_cc() softclock() ithread_loop() Differential Revision: https://reviews.freebsd.org/D19622 Reviewed by: bz (network), adrian MFC after: 1 week Sponsored by: Mellanox Technologies	2019-04-22 07:27:24 +00:00
cem	926ea6e16b	netdump: Fix 11 compatibility DIOCSKERNELDUMP ioctl The logic was present for the 11 version of the DIOCSKERNELDUMP ioctl, but had not been updated for the 12 ABI. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D19980	2019-04-20 16:07:29 +00:00
jhb	b4bd29249c	Push down INP_WLOCK slightly in tcp_ctloutput. The inp lock is not needed for testing the V6 flag as that flag is set once when the inp is created and never changes. For non-TCP socket options the lock is immediately dropped after checking that flag. This just pushes the lock down to only be acquired for TCP socket options. This isn't a hot-path, more a cosmetic cleanup I noticed while reading the code. Reviewed by: bz MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19740	2019-04-18 23:21:26 +00:00
tuexen	485db168ce	When sending IPv4 packets on a SOCK_RAW socket using the IP_HDRINCL option, ensure that the ip_hl field is valid. Furthermore, ensure that the complete IPv4 header is contained in the first mbuf. Finally, move the length checks before relying on them when accessing fields of the IPv4 header. Reported by: jtl@ Reviewed by: jtl@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19181	2019-04-13 10:47:47 +00:00
tuexen	7186df98c8	Fix an SCTP related locking issue. Don't report that the TCB_SEND_LOCK is owned, when it is not. This issue was found by running syzkaller. MFC after: 1 week	2019-04-11 20:39:12 +00:00
markj	9abf4945e6	Reinitialize multicast source filter structures after invalidation. When leaving a multicast group, a hole may be created in the inpcb's source filter and group membership arrays. To remove the hole, the succeeding array elements are copied over by one entry. The multicast code expects that a newly allocated array element is initialized, but the code which shifts a tail of the array was leaving stale data in the final entry. Fix this by explicitly reinitializing the last entry following such a copy. Reported by: syzbot+f8c3c564ee21d650475e@syzkaller.appspotmail.com Reviewed by: ae MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19872	2019-04-11 08:00:59 +00:00
rrs	5883516e75	Fix a small bug in the tcp_log_id where the bucket was unlocked and yet the bucket-unlock flag was not changed to false. This can cause a panic if INVARIANTS is on and we go through the right path (though rare). This fixes the correct bug :) Reported by: syzbot+179a1ad49f3c4c215fa2@syzkaller.appspotmail.com Reviewed by: tuexen@	2019-04-10 18:58:11 +00:00
rgrimes	cda8035706	Use IN_foo() macros from sys/netinet/in.h in place of handcrafted code There are a few places that use hand crafted versions of the macros from sys/netinet/in.h making it difficult to actually alter the values in use by these macros. Correct that by replacing handcrafted code with proper macro usage. Reviewed by: karels, kristof Approved by: bde (mentor) MFC after: 3 weeks Sponsored by: John Gilmore Differential Revision: https://reviews.freebsd.org/D19317	2019-04-04 19:01:13 +00:00
rrs	50c7932ba8	Undo my previous erroneous commit changing the tcp_output kassert. Hmm now the question is where did the tcp_log_id change go :o	2019-04-03 19:35:07 +00:00
np	3ee5c69403	tcp_autorcvbuf_inc was removed in r344433. Discussed with: tuexen@ Sponsored by: Chelsio Communications	2019-03-29 21:39:47 +00:00
jhb	24a54e79c9	Don't check the inp socket pointer in in_pcboutput_eagain. Reviewed by: hps (by saying it was ok to be removed) MFC after: 1 month Sponsored by: Netflix	2019-03-29 19:47:42 +00:00
markj	6cc01e9149	Add CTLFLAG_VNET to the net.inet.icmp.tstamprepl definition. Reported by: Hans Fiedler <hans@hfconsulting.com> MFC after: 3 days	2019-03-26 22:14:50 +00:00
rrs	b6ca75d739	Fix a small bug in the tcp_log_id where the bucket was unlocked and yet the bucket-unlock flag was not changed to false. This can cause a panic if INVARIANTS is on and we go through the right path (though rare). Reported by: syzbot+179a1ad49f3c4c215fa2@syzkaller.appspotmail.com Reviewed by: tuexen@ MFC after: 1 week	2019-03-26 10:41:27 +00:00
tuexen	92665ddcf3	Fix a double free of an SCTP association in an error path. This is joint work with rrs@. The issue was found by running syzkaller. MFC after: 1 week	2019-03-26 08:27:00 +00:00
tuexen	873fcf8446	Initialize scheduler specific data for the FCFS scheduler. This is joint work with rrs@. The issue was reported by using syzkaller. MFC after: 1 week	2019-03-25 16:40:54 +00:00
tuexen	a150bffcbf	Improve locking when tearing down an SCTP association. This is joint work with rrs@ and the issue was found by syzkaller. MFC after: 1 week	2019-03-25 15:23:20 +00:00
tuexen	e89e1927c7	Fix the handling of fragmented unordered messages when using DATA chunks and FORWARD-TSN. This bug was reported in https://github.com/sctplab/usrsctp/issues/286 for the userland stack. This is joint work with rrs@. MFC after: 1 week	2019-03-25 09:47:22 +00:00
tuexen	1ff39c37aa	Fix build issue for the userland stack. Joint work with rrs@. MFC after: 1 week	2019-03-24 12:13:05 +00:00
tuexen	aa72882b7f	Fix more signed/unsigned issues. This time on the send path. This is joint work with rrs@ and was found by running syzkaller. MFC after: 1 week	2019-03-24 10:40:20 +00:00
tuexen	ff6cd9e93e	Fix a signed/unsigned bug when receiving SCTP messages. This is joint work with rrs@. Reported by: syzbot+6b8a4bc8cc828e9d9790@syzkaller.appspotmail.com MFC after: 1 week	2019-03-24 09:46:16 +00:00
tuexen	46b4806255	Limit the size of messages sent on 1-to-many style SCTP sockets with the SCTP_SENDALL flag. Allow also only one operation per SCTP endpoint. This fixes an issue found by running syzkaller and is joint work with rrs@. MFC after: 1 week	2019-03-23 22:56:03 +00:00
tuexen	5e3a245f1b	Limit the number of bytes which can be queued for SCTP sockets. This is joint work with rrs@. Reported by: syzbot+307f167f9bc214f095bc@syzkaller.appspotmail.com MFC after: 1 week	2019-03-23 22:46:29 +00:00
tuexen	f674536274	Add sysctl variable net.inet.tcp.rexmit_initial for setting RTO.Initial used by TCP. Reviewed by: rrs@, 0mp@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D19355	2019-03-23 21:36:59 +00:00
tuexen	202ab2ae5b	Fix a KASSERT() in tcp_output(). When checking the length of the headers at this point, the IP level options have not been added to the mbuf chain. So don't take them into account. Reported by: syzbot+16025fff7ee5f7c5957b@syzkaller.appspotmail.com Reported by: syzbot+adb5836b8a9ff621b2aa@syzkaller.appspotmail.com Reported by: syzbot+d25a5352bcdf40acdbb8@syzkaller.appspotmail.com Reviewed by: rrs@ MFC after: 3 days Sponsored by: Netflix, Inc.	2019-03-23 09:56:41 +00:00
ae	93a7173b74	Add NAT64 CLAT implementation as defined in RFC6877. CLAT is a customer-side translator that algorithmically translates 1:1 private IPv4 addresses to global IPv6 addresses, and vice versa. It is implemented as part of the ipfw_nat64 kernel module. When the module is loaded or compiled into the kernel, it registers the "nat64clat" external action. A named instance of the external action can be created using the `create` command and then used in ipfw rules. The create command accepts two IPv6 prefixes, `plat_prefix` and `clat_prefix`. If plat_prefix is omitted, the IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used. # ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX # ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out # ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in Obtained from: Yandex LLC Submitted by: Boris N. Lytochkin MFC after: 1 month Relnotes: yes Sponsored by: Yandex LLC	2019-03-18 11:44:53 +00:00
glebius	f2e5fcd17c	Remove 'dir' argument from dummynet_io(). This makes it possible to make dn_dir flags private to dummynet. There is still some room for improvement.	2019-03-14 22:32:50 +00:00
glebius	8a5caee402	Remove 'dir' argument in ng_ipfw_input, since ip_fw_args now has this info. While here make 'tee' boolean.	2019-03-14 22:30:05 +00:00
glebius	25a76c3618	Make second argument of ip_divert(), that specifies packet direction a bool. This allows pf(4) to avoid including ipfw(4) private files.	2019-03-14 22:23:09 +00:00
bz	886b55fe42	Improve ARP logging. r344504 added an extra ARP_LOG() call in case of an if_output() failure. It turns out IPv4 can be noisy. In order to not spam the console by default: (a) add a counter for these events so people can keep better track of how often it happens, and (b) add a sysctl to select the default ARP_LOG log level and set it to INFO avoiding the one (the new) DEBUG level by default. Claim a spare (1st one after 10 years since the stats were added) in order to not break netstat from FreeBSD 12->13 updates in the future. Reviewed by: karels Differential Revision: https://reviews.freebsd.org/D19490	2019-03-09 01:12:59 +00:00
tuexen	1e1af27f31	Fix locking bug. MFC after: 3 days	2019-03-08 18:17:57 +00:00
tuexen	272f72e22d	Some cleanup and consistency improvements. MFC after: 3 days	2019-03-08 18:16:19 +00:00
tuexen	8abf87e138	After removing an entry from the stream scheduler list, set the pointers to NULL, since we are checking for it in case the element gets inserted again. This issue was found by running syzkaller. MFC after: 3 days	2019-03-07 08:43:20 +00:00
tuexen	a301c92873	Allocate an association id and register the stcb while holding the lock. This avoids a race where stcbs can be found which are not completely initialized. This was found by running syzkaller. MFC after: 3 days	2019-03-03 19:55:06 +00:00
tuexen	848287bd3d	Remove debug output. MFC after: 3 days	2019-03-02 16:10:11 +00:00
tuexen	bde12a4960	Allow SCTP stream reconfiguration operations only in ESTABLISHED state. This issue was found by running syzkaller. MFC after: 3 days	2019-03-02 14:30:27 +00:00
tuexen	48fbfe3342	Handle the case when calling the IPPROTO_SCTP level socket option SCTP_STATUS on an association with no primary path (early state). This issue was found by running syzkaller. MFC after: 3 days	2019-03-02 14:15:33 +00:00
tuexen	5b706bbb5a	Report the correct length when using the IPPROTO_SCTP level socket options SCTP_GET_PEER_ADDRESSES and SCTP_GET_LOCAL_ADDRESSES.	2019-03-02 13:12:37 +00:00
tuexen	9143b7c0de	Honor the memory limits provided when processing the IPPROTO_SCTP level socket option SCTP_GET_LOCAL_ADDRESSES in a getsockopt() call. Thanks to Thomas Barabosch for reporting the issue which was found by running syzkaller. MFC after: 3 days	2019-03-01 18:47:41 +00:00
tuexen	5ec78c1aa0	Improve consistency, no functional change. MFC after: 3 days	2019-03-01 15:57:55 +00:00
jhb	1498304f92	Various cleanups to the management of multiple TCP stacks. - Use strlcpy() with sizeof() instead of strncpy(). - Simplify initialization of TCP functions structures. init_tcp_functions() was already called before the first call to register a stack. Just inline the work in the SYSINIT and remove the racy helper variable. Instead, KASSERT that the rw lock is initialized when registering a stack. - Protect the default stack via a direct pointer comparison. The default stack uses the name "freebsd" instead of "default" so this protection wasn't working for the default stack anyway. Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19152	2019-02-27 20:24:23 +00:00
bz	0d49e52bdf	Make arp code return (more) errors. arprequest() is a void function and in case of error we simply return without any feedback. In case of any local operation or *if_output() failing no feedback is sent up the stack for the packet which triggered the arp request to be sent. arpresolve_full() has three pre-canned possible errors returned (if we have not yet sent enough arp requests or if we tried often enough without success) otherwise "no error" is returned. Make arprequest() an "internal" function arprequest_internal() which does return a possible error to the caller. Preserve arprequest() as a void wrapper function for external consumers. In arpresolve_full() add extra error checking. Use the arprequest_internal() function and only return an error if none of the three (mentioned above) are already set. This will return possible errors all the way up the stack and allows functions and programs to react to the send errors rather than leaving them in the dark. Also they might get more detailed feedback of why packets cannot be sent and they will receive it quicker. Reviewed by: karels, hselasky Differential Revision: https://reviews.freebsd.org/D18904	2019-02-24 22:49:56 +00:00
glebius	13a1011c10	Support struct ip_mreqn as argument for IP_ADD_MEMBERSHIP. Legacy support for struct ip_mreq remains in place. The struct ip_mreqn is a Linux extension to the classic BSD multicast API. It has an extra field allowing the interface index to be specified explicitly. In Linux it is used as argument for IP_MULTICAST_IF and IP_ADD_MEMBERSHIP. The FreeBSD kernel also declares this structure and has supported it as argument to IP_MULTICAST_IF since r170613. So, we have the structure declared but not fully supported, which confused third party application configure scripts. Code handling IP_ADD_MEMBERSHIP was mixed together with code for IP_ADD_SOURCE_MEMBERSHIP. Bringing legacy and new structure support into the mess would have made the "argument switcharoo" intolerable, so the code was separated into its own switch case clause. MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D19276	2019-02-23 06:03:18 +00:00
tuexen	e763f31429	The receive buffer autoscaling for TCP is based on a linear growth, which is acceptable in the congestion avoidance phase, but not during slow start. The MTU is also not taken into account. Use a method instead, which is based on exponential growth working also in slow start and being independent from the MTU. This is joint work with rrs@. Reviewed by: rrs@, Richard Scheffenegger Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18375	2019-02-21 10:35:32 +00:00
tuexen	192c95e996	This patch addresses an issue brought up by bz@ in D18968: When TCP_REASS_LOGGING is defined, a NULL pointer dereference would happen, if user data was received during the TCP handshake and BB logging is used. A KASSERT is also added to detect tcp_reass() calls with illegal parameter combinations. Reported by: bz@ Reviewed by: rrs@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D19254	2019-02-21 09:34:47 +00:00
tuexen	87f2a8bca4	Reduce the TCP initial retransmission timeout from 3 seconds to 1 second as allowed by RFC 6298. Reviewed by: kbowling@, Richard Scheffenegger Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18941	2019-02-20 18:03:43 +00:00
tuexen	796133921a	Use exponential backoff for retransmitting SYN segments as specified in the TCP RFCs. Reviewed by: rrs@, Richard Scheffenegger Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18974	2019-02-20 17:56:38 +00:00
tuexen	0c4eb6ecc6	Fix a byte ordering issue for the advertised receiver window in ACK segments sent in TIMEWAIT state, which I introduced in r336937. MFC after: 3 days Sponsored by: Netflix, Inc.	2019-02-15 09:45:17 +00:00
ae	50a8601b2e	In r335015 PCB destruction was deferred using epoch_call(). But ipsec_delete_pcbpolicy() uses some VNET-virtualized variables, and thus needs a VNET context, which is missing during gtaskqueue execution. Use inp_vnet context to set curvnet in in_pcbfree_deferred(). PR: 235684 MFC after: 1 week	2019-02-13 15:46:05 +00:00
kp	8e2074c2e1	garp: Fix vnet related panic for gratuitous arp Gratuitous ARP packets are sent from a timer, which means we don't have a vnet context set. As a result we panic trying to send the packet. Set the vnet context based on the interface associated with the interface address. To reproduce: sysctl net.link.ether.inet.garp_rexmit_count=2 ifconfig vtnet1 10.0.0.1/24 up PR: 235699 Reviewed by: vangyzen@ MFC after: 1 week	2019-02-12 21:22:57 +00:00
tuexen	6903958c2a	Improve input validation for raw IPv4 socket using the IP_HDRINCL option. This issue was found by running syzkaller on OpenBSD. Greg Steuck made me aware that the problem might also exist on FreeBSD. Reported by: Greg Steuck MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D18834	2019-02-12 10:17:21 +00:00
tuexen	b80fcf68dd	Fix a locking issue when reporting outbound messages. MFC after: 3 days	2019-02-10 14:02:14 +00:00
tuexen	eee3d66791	Fix a locking issue in the IPPROTO_SCTP level SCTP_PEER_ADDR_THLDS socket option. The problem affects only setsockopt with invalid parameters. This issue was found by syzkaller. MFC after: 3 days	2019-02-10 13:55:32 +00:00
tuexen	fb17e65b4c	Fix a locking bug in the IPPROTO_SCTP level SCTP_EVENT socket option. This occurs when calling setsockopt() with invalid parameters. This issue was found by syzkaller. MFC after: 3 days	2019-02-10 10:42:16 +00:00
tuexen	b30530d5f6	Fix locking for IPPROTO_SCTP level SCTP_DEFAULT_PRINFO socket option. This problem occurred when calling setsockopt() with invalid parameters. This issue was found by running syzkaller. MFC after: 3 days	2019-02-10 08:28:56 +00:00
tuexen	57ec47006e	Ensure that when using the TCP CDG congestion control and setting the sysctl variable net.inet.tcp.cc.cdg.smoothing_factor to 0, the smoothing is disabled. Without this patch, a division by zero occurs. PR: 193762 Reviewed by: lstewart@, rrs@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D19071	2019-02-08 20:42:49 +00:00
tuexen	c6fb952de9	Only reduce the PMTU after the send call. The only way to increase it, is via PMTUD. This fixes an MTU issue reported by Timo Voelker. MFC after: 3 days	2019-02-05 10:29:31 +00:00
tuexen	78654493eb	Fix an off-by-one error in the input validation of the SCTP_RESET_STREAMS socket option. This was found by running syzkaller. MFC after: 3 days	2019-02-05 10:13:51 +00:00
imp	82650adfef	Regularize the Netflix copyright Use recent best practices for Copyright form at the top of the license: 1. Remove all the All Rights Reserved clauses on our stuff. Where we piggybacked others, use a separate line to make things clear. 2. Use "Netflix, Inc." everywhere. 3. Use a single line for the copyright for grep friendliness. 4. Use date ranges in all places for our stuff. Approved by: Netflix Legal (who gave me the form), adrian@ (pmc files)	2019-02-04 21:28:25 +00:00
tuexen	46bca47606	When handling SYN-ACK segments in the SYN-RCVD state, set tp->snd_wnd consistently. This inconsistency was observed when working on the bug reported in PR 235256, although it does not fix the reported issue. The fix for the PR will be a separate commit. PR: 235256 Reviewed by: rrs@, Richard Scheffenegger MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D19033	2019-02-01 12:33:00 +00:00
glebius	711fa71dfe	Repair siftr(4): PFIL_IN and PFIL_OUT are defines of some value, relying on them having particular values can break things.	2019-02-01 08:10:26 +00:00
glebius	9978a7d924	New pfil(9) KPI together with newborn pfil API and control utility. The KPI has been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as the kernel uses the same command structures that the userland ioctl uses. In a nutshell, the [KA]PI is about declaring filtering points, declaring filters and linking and unlinking them together. The new [KA]PI makes it possible to reconfigure pfil(9) configuration: change order of hooks, rehook a filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to an existing list of filters. Now it is possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of the existing packet filters support that yet, however limited usage is already possible, e.g. the default ruleset can be moved to a single interface, as soon as interfaces provide their filtering points. Another future feature is the possibility to create pfil heads that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when a packet has just been received by a NIC and no mbuf was yet allocated. Differential Revision: https://reviews.freebsd.org/D18951	2019-01-31 23:01:03 +00:00
brooks	b686fa9ec4	Add a simple port filter to SIFTR. SIFTR does not allow any kind of filtering, but captures every packet processed by the TCP stack. Often, only a specific session or service is of interest, and doing the filtering in post-processing of the log adds to the overhead of SIFTR. This adds a new sysctl net.inet.siftr.port_filter. When set to zero, all packets get captured as previously. If set to any other value, only packets where either the source or the destination ports match, are captured in the log file. Submitted by: Richard Scheffenegger Reviewed by: Cheng Cui Differential Revision: https://reviews.freebsd.org/D18897	2019-01-30 17:44:30 +00:00
tuexen	7c894e3728	Fix the detection of ECN-setup SYN-ACK packets. RFC 3168 defines an ECN-setup SYN-ACK packet as one with the ECE flag set and the CWR flag not set. The code was only checking if the ECE flag is set. This patch adds the check to verify that the CWR flag is not set. Submitted by: Richard Scheffenegger Reviewed by: tuexen@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18996	2019-01-28 12:45:31 +00:00
tuexen	98afe6eb13	Don't include two header files when not needed. This allows the part of the rewrite of TCP reassembly in this file to be MFCed to stable/11 with manual change. MFC after: 3 days Sponsored by: Netflix, Inc.	2019-01-25 17:08:28 +00:00
tuexen	8e4ad3b84a	Fix a bug in the restart window computation of TCP New Reno When implementing support for IW10, an update in the computation of the restart window used after an idle phase was missed. To minimize code duplication, implement the logic in tcp_compute_initwnd() and call it. This fixes a bug in NewReno, which was not aware of IW10. Submitted by: Richard Scheffenegger Reviewed by: tuexen@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18940	2019-01-25 13:57:09 +00:00
tuexen	cbb1f242e8	Get the arithmetic right... MFC after: 3 days Sponsored by: Netflix, Inc.	2019-01-24 16:47:18 +00:00
tuexen	347de7d146	Kill a trailing whitespace character... MFC after: 3 days Sponsored by: Netflix, Inc.	2019-01-24 16:43:13 +00:00
tuexen	7521f0c094	Update a comment to reflect the current reality. SYN-cache entries live for about 12 seconds, not 45, when default settings are used. MFC after: 1 week Sponsored by: Netflix, Inc.	2019-01-24 16:40:14 +00:00
markj	acda75d443	Style. Reviewed by: bz MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-01-23 22:19:49 +00:00
markj	af69719726	Fix an LLE lookup race. After the afdata read lock was converted to epoch(9), readers could observe a linked LLE and block on the LLE while a thread was unlinking the LLE. The writer would then release the lock and schedule the LLE for deferred free, allowing readers to continue and potentially schedule the LLE timer. By the point the timer fires, the structure is freed, typically resulting in a crash in the callout subsystem. Fix the problem by modifying the lookup path to check for the LLE_LINKED flag upon acquiring the LLE lock. If it's not set, the lookup fails. PR: 234296 Reviewed by: bz Tested by: sbruno, Victor <chernov_victor@list.ru>, Mike Andrews <mandrews@bit0.com> MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18906	2019-01-23 22:18:23 +00:00
brooks	8ba0807050	Make SIFTR work again after r342125 (D18443). Correct a logic error. Only disable when already enabled or enable when disabled. Submitted by: Richard Scheffenegger Reviewed by: Cheng Cui Obtained from: Cheng Cui MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D18885	2019-01-18 21:46:38 +00:00
tuexen	eb82fa876e	Limit the user-controllable amount of memory the kernel allocates via IPPROTO_SCTP level socket options. This issue was found by running syzkaller. MFC after: 1 week	2019-01-16 11:33:47 +00:00
shurd	1b9893d7d7	Fix window update issue when scaling disabled When the TCP window scale option is not used, and the window opens up enough in one soreceive, a window update will not be sent. For example, if recwin == 65535, so->so_rcv.sb_hiwat >= 262144, and so->so_rcv.sb_hiwat <= 524272, the window update will never be sent. This is because recwin and adv are clamped to TCP_MAXWIN << tp->rcv_scale, and so will never be >= so->so_rcv.sb_hiwat / 4 or <= so->so_rcv.sb_hiwat / 8. This patch ensures a window update is sent if the window opens by TCP_MAXWIN << tp->rcv_scale, which should only happen when the window size goes from zero to the max expressible. This issue looks like it was introduced in r306769 when recwin was clamped to TCP_MAXWIN << tp->rcv_scale. MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D18821	2019-01-15 17:40:19 +00:00
tuexen	87ee738236	Fix getsockopt() for IP_OPTIONS/IP_RETOPTS. r336616 copies inp->inp_options using the m_dup() function. However, this function expects an mbuf packet header at the beginning, which is not true in this case. Therefore, use m_copym() instead of m_dup(). This issue was found by syzkaller. Reviewed by: mmacy@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18753	2019-01-09 06:36:57 +00:00
glebius	6d8cc191f9	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros are quite unsafe, e.g. will produce buggy code if the same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and, what's more important, they no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has the same problems, but is not touched by this commit. This is a non-functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
markj	85037a09f3	Support MSG_DONTWAIT in send(2). As it does for recv(2), MSG_DONTWAIT indicates that the call should not block, returning EAGAIN instead. Linux and OpenBSD both implement this, so the change makes porting easier, especially since we do not return EINVAL or so when unrecognized flags are specified. Submitted by: Greg V <greg@unrelenting.technology> Reviewed by: tuexen MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18728	2019-01-04 17:31:50 +00:00
tuexen	c5b096f7c9	Fix a regression in the TCP handling of received segments. When receiving TCP segments the stack protects itself by limiting the resources allocated for a TCP connection. This patch adds an exception to these limitations for the TCP segment which is the next expected in-sequence segment. Without this patch, TCP connections may stall and finally fail in some cases of packet loss. Reported by: jhb@ Reviewed by: jtl@, rrs@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18580	2018-12-20 16:05:30 +00:00
hiren	3dc78ca62f	Revert r331567 CC Cubic: fix underflow for cubic_cwnd() This change is causing TCP connections using cubic to hang. Need to dig more to find exact cause and fix it. Reported by: tj at mrsk dot me, Matt Garber (via twitter) Discussed with: sbruno (previously), allanjude, cperciva MFC after: 3 days	2018-12-15 17:01:16 +00:00
brooks	acaa8abf5c	Fix bugs in pluggable CC algorithm and siftr sysctls. Use the sysctl_handle_int() handler to write out the old value and read the new value into a temporary variable. Use the temporary variable for any checks of values rather than using the CAST_PTR_INT() macro on req->newptr. The prior usage read directly from userspace memory if the sysctl() was called correctly. This is unsafe and doesn't work at all on some architectures (at least i386.) In some cases, the code could also be tricked into reading from kernel memory and leaking limited information about the contents or crashing the system. This was true for CDG, newreno, and siftr on all platforms and true for i386 in all cases. The impact of this bug is largest in VIMAGE jails which have been configured to allow writing to these sysctls. Per discussion with the security officer, we will not be issuing an advisory for this issue as root access and a non-default config are required to be impacted. Reviewed by: markj, bz Discussed with: gordon (security officer) MFC after: 3 days Security: kernel information leak, local DoS (both require root) Differential Revision: https://reviews.freebsd.org/D18443	2018-12-15 15:06:22 +00:00
mjg	7e31d1de7e	Remove unused argument to priv_check_cred. Patch mostly generated with coccinelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
markj	a74ba1d431	Clamp the INPCB port hash tables to IPPORT_MAX + 1 chains. Memory beyond that limit was previously unused, wasting roughly 1MB per 8GB of RAM. Also retire INP_PCBLBGROUP_PORTHASH, which was identical to INP_PCBPORTHASH. Reviewed by: glebius MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D17803	2018-12-05 17:06:00 +00:00
ae	0d01acf0ac	Add ability to request listing and deleting only for dynamic states. This can be useful, when net.inet.ip.fw.dyn_keep_states is enabled, but after rules reloading some state must be deleted. Added new flag '-D' for such purpose. Retire '-e' flag, since there can not be expired states in the meaning that this flag historically had. Also add "verbose" mode for listing of dynamic states, it can be enabled with '-v' flag and adds additional information to states list. This can be useful for debugging. Obtained from: Yandex LLC MFC after: 2 months Sponsored by: Yandex LLC	2018-12-04 16:12:43 +00:00
tuexen	2da835a4bb	Limit option_len for the TCP_CCALGOOPT. Limiting the length to 2048 bytes seems to be acceptable, since the values used right now are using 8 bytes. Reviewed by: glebius, bz, rrs MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18366	2018-11-30 10:50:07 +00:00
markj	5c563658ea	Plug some networking sysctl leaks. Various network protocol sysctl handlers were not zero-filling their output buffers and thus would export uninitialized stack memory to userland. Fix a number of such handlers. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: tuexen MFC after: 3 days Security: kernel memory disclosure Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18301	2018-11-22 20:49:41 +00:00
tuexen	a02b4525ca	A TCP stack is required to check SEG.ACK first, when processing a segment in the SYN-SENT state as stated in Section 3.9 of RFC 793, page 66. Ensure this is also done by the TCP RACK stack. Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18034	2018-11-22 20:05:57 +00:00
tuexen	82210e189d	Ensure that the TCP RACK stack honours the setting of the net.inet.tcp.drop_synfin sysctl-variable. Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18033	2018-11-22 20:02:39 +00:00
tuexen	e4a2b60c79	Ensure that the default RTT stack can make an RTT measurement if the TCP connection was initiated using the RACK stack, but the peer does not support the TCP RACK extension. This ensures that the TCP behaviour on the wire is the same if the TCP connection is initiated using the RACK stack or the default stack. Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18032	2018-11-22 19:56:52 +00:00
tuexen	3ece71ca83	Ensure that TCP RST-segments announce consistently a receiver window of zero. This was already done when sending them via tcp_respond(). Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D17949	2018-11-22 19:49:52 +00:00
tuexen	3ca57eff63	Improve two KASSERTs in the TCP RACK stack. There are two locations where an always true comparison was made in a KASSERT. Replace this by an appropriate check and use a consistent panic message. Also use this code when checking a similar condition. PR: 229664 Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18021	2018-11-21 18:19:15 +00:00
ae	d19730211c	Make multiline APPLY_MASK() macro to be function-like. Reported by: cem MFC after: 1 week	2018-11-20 18:38:28 +00:00
bz	cdb3c89094	Improve the comment for arpresolve_full() in if_ether.c. No functional changes. MFC after: 6 weeks	2018-11-17 16:13:09 +00:00
bz	1324fdd80d	Retire arpresolve_addr(), which is not used anywhere, from if_ether.c.	2018-11-17 16:08:36 +00:00
jtl	805519e47d	Add some additional length checks to the IPv4 fragmentation code. Specifically, block 0-length fragments, even when the MF bit is clear. Also, ensure that every fragment with the MF bit clear ends at the same offset and that no subsequently-received fragments exceed that offset. Reviewed by: glebius, markj MFC after: 3 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D17922	2018-11-16 18:32:48 +00:00
markj	9fb51bf6af	Ensure that IP fragments do not extend beyond IP_MAXPACKET. Such fragments are obviously invalid, and when processed may end up violating the sort order (by offset) of fragments of a given packet. This doesn't appear to be exploitable, however. Reviewed by: emaste Discussed with: jtl MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17914	2018-11-10 03:00:36 +00:00
emaste	a9fb9d9a30	Avoid buffer underwrite in icmp_error icmp_error allocates either an mbuf (with pkthdr) or a cluster depending on the size of data to be quoted in the ICMP reply, but the calculation failed to account for the additional padding that m_align may apply. Include the ip header in the size passed to m_align. On 64-bit archs this will have the net effect of moving everything 4 bytes later in the mbuf or cluster. This will result in slightly pessimal alignment for the ICMP data copy. Also add an assertion that we do not move m_data before the beginning of the mbuf or cluster. Reported by: A reddit user Reviewed by: bz, jtl MFC after: 3 days Security: CVE-2018-17156 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17909	2018-11-08 20:17:36 +00:00
tuexen	ece78450d2	Don't use a function when neither INET nor INET6 are defined. This is a valid case for the userland stack, where this fixes two set-but-not-used warnings in this case. Thanks to Christian Wright for reporting the issue.	2018-11-06 12:55:03 +00:00
jtl	6e3370ff0c	m_pulldown() may reallocate n. Update the oip pointer after the m_pulldown() call. MFC after: 2 weeks Sponsored by: Netflix	2018-11-02 19:14:15 +00:00
bz	3e0888e473	carpstats are the last virtualised variable in the file and end up at the end of the vnet_set. The generated code uses an absolute relocation at one byte beyond the end of the carpstats array. This means the relocation for the vnet does not happen for carpstats initialisation and as a result the kernel panics on module load. This problem has only been observed with carp and only on i386. We considered various possible solutions including using linker scripts to add padding to all kernel modules for pcpu and vnet sections. While the symbols (by chance) stay in the order of appearance in the file adding an unused non-file-local variable at the end of the file will extend the size of set_vnet and hence make the absolute relocation for carpstats work (think of this as a single-module set_vnet padding). This is a (temporary) hack. It is the least intrusive one as we need a timely solution for the upcoming release. We will revisit the problem in HEAD. For a lot more information and the possible alternate solutions please see the PR and the references therein. PR: 230857 MFC after: 3 days	2018-11-01 17:26:18 +00:00
markj	96786af957	Remove redundant checks for a NULL lbgroup table. No functional change intended. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17108	2018-11-01 15:52:49 +00:00
markj	fbdfd87a7a	Improve style in in_pcbinslbgrouphash() and related subroutines. No functional change intended. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17107	2018-11-01 15:51:49 +00:00
tuexen	16b71af179	Remove debug code which slipped in accidentally. MFC after: 4 weeks X-MFC with: r339989 Sponsored by: Netflix, Inc.	2018-11-01 11:41:40 +00:00
tuexen	903e3f7b49	Improve a comment to refer to the actual sections in the TCP specification for the comparisons made. Thanks to lstewart@ for the suggestion. MFC after: 4 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D17595	2018-11-01 11:35:28 +00:00
bz	3431d451a5	Initial implementation of draft-ietf-6man-ipv6only-flag. This change defines the RA "6" (IPv6-Only) flag which routers may advertise, kernel logic to check if all routers on a link have the flag set and accordingly update a per-interface flag. If all routers agree that it is an IPv6-only link, ether_output_frame(), based on the interface flag, will filter out all ETHERTYPE_IP/ARP frames, drop them, and return EAFNOSUPPORT to upper layers. The change also updates ndp to show the "6" flag, ifconfig to display the IPV6_ONLY nd6 flag if set, and rtadvd to allow announcing the flag. Further changes to tcpdump (contrib code) are available and will be upstreamed. Tested the code (slightly earlier version) with 2 FreeBSD IPv6 routers, a FreeBSD laptop on ethernet as well as wifi, and with Win10 and OSX clients (which did not fall over with the "6" flag set but not understood). We may also want to (a) implement an RX filter, and (b) over time enhance user space to, say, stop dhclient from running when the interface flag is set. Also we might want to start IPv6 before IPv4 in the future. All the code is hidden under the EXPERIMENTAL option and not compiled by default as the draft is a work-in-progress and we cannot rely on the fact that IANA will assign the bits as requested by the draft and hence they may change. Dear 6man, you have running code. Discussed with: Bob Hinden, Brian E Carpenter	2018-10-30 20:08:48 +00:00
markj	c5f66c0d3c	Expose some netdump configuration parameters through sysctl. Reviewed by: cem MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17755	2018-10-29 21:16:26 +00:00
eugen	07121b9ff8	Prevent ip_input() from panicking due to unprotected access to INADDR_HASH. PR: 220078 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12457 Tested-by: Cassiano Peixoto and others	2018-10-27 04:59:35 +00:00
eugen	78f0bffe82	Prevent multicast code from panicking due to unprotected access to INADDR_HASH. PR: 220078 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12457 Tested-by: Cassiano Peixoto and others	2018-10-27 04:53:25 +00:00
tuexen	8397e600b6	Add initial descriptions for SCTP related MIB variable. This work was mostly done by Marie-Helene Kvello-Aune. MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D3583	2018-10-26 21:04:17 +00:00
ae	91cf1d92ac	Add the check that current VNET is ready and access to srchash is allowed. This change is similar to r339646. The callback that checks for appearing and disappearing of tunnel ingress address can be called during VNET teardown. To prevent access to already freed memory, add check to the callback and epoch_wait() call to be sure that callback has finished its work. MFC after: 20 days	2018-10-23 13:11:45 +00:00
jhb	439be2ca2b	A couple of style fixes in recent TCP changes. - Add a blank line before a block comment to match other block comments in the same function. - Sort the prototype for sbsndptr_adv and fix whitespace between return type and function name. Reviewed by: gallatin, bz Differential Revision: https://reviews.freebsd.org/D17474	2018-10-22 21:17:36 +00:00
eugen	c7b538ee72	New sysctl: net.inet.icmp.error_keeptags Currently, icmp_error() function copies FIB number from original packet into generated ICMP response but not mbuf_tags(9) chain. This prevents us from easily matching ICMP responses corresponding to tagged original packets by means of packet filter such as ipfw(8). For example, ICMP "time-exceeded in-transit" packets usually generated in response to traceroute probes lose tags attached to original packets. This change adds new sysctl net.inet.icmp.error_keeptags that defaults to 0 to avoid extra overhead when this feature not needed. Set net.inet.icmp.error_keeptags=1 to make icmp_error() copy mbuf_tags from original packet to generated ICMP response. PR: 215874 MFC after: 1 month	2018-10-21 21:29:19 +00:00
ae	221039c91a	Include <sys/eventhandler.h> to fix the build. MFC after: 1 month	2018-10-21 18:39:34 +00:00
ae	b620bf12c6	Add handling for appearing/disappearing of ingress addresses to if_gre(4). * register handler for ingress address appearing/disappearing; * add new srcaddr hash table for fast softc lookup by srcaddr; * when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface, and set it otherwise; MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17214	2018-10-21 18:13:45 +00:00
ae	802ce6d2c8	Add handling for appearing/disappearing of ingress addresses to if_gif(4). * register handler for ingress address appearing/disappearing; * add new srcaddr hash table for fast softc lookup by srcaddr; * when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface, and set it otherwise; * remove the note about ingress address from BUGS section. MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17134	2018-10-21 18:06:15 +00:00
ae	09f5d08690	Add KPI that can be used by tunneling interfaces to handle IP addresses appearing and disappearing on the host system. Such handling is needed because tunneling interfaces must use addresses that are configured on the host as ingress addresses for tunnels. Otherwise the system can send spoofed packets with a source address that belongs to a foreign host. The KPI uses the ifaddr_event_ext event to implement address tracking. Tunneling interfaces register event handlers and are then notified by the kernel when an address disappears or appears. The ifaddr_event_compat() handler from if.c is replaced by srcaddr_change_event() in ip_encap.c. MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17134	2018-10-21 17:55:26 +00:00
ae	32b03c3d5c	Add IPFW_RULE_JUSTOPTS flag, that is used by ipfw(8) to mark rule, that was added using "new rule format". And then, when the kernel returns rule with this flag, ipfw(8) can correctly show it. Reported by: lev MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17373	2018-10-21 15:10:59 +00:00
ae	8d3e25d418	Add ifaddr_event_ext event. It is similar to ifaddr_event, but the handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL, and the pointer to ifaddr. Also ifaddr_event now is implemented using ifaddr_event_ext handler. MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17100	2018-10-21 15:02:06 +00:00
tuexen	d08b474ae2	The handling of RST segments in the SYN-RCVD state exists in two code paths. The two are not consistent, and the one in the syn cache code does not conform to the relevant specifications (Page 69 of RFC 793 and Section 4.2 of RFC 5961). This patch fixes this: * The sequence number checks are fixed as specified on page 69 of RFC 793. * The sysctl variable net.inet.tcp.insecure_rst is now honoured and the behaviour is as specified in Section 4.2 of RFC 5961. Approved by: re (gjb@) Reviewed by: bz@, glebius@, rrs@ Differential Revision: https://reviews.freebsd.org/D17595 Sponsored by: Netflix, Inc.	2018-10-18 19:21:18 +00:00
jtl	ac81f2f2c7	In r338102, the TCP reassembly code was substantially restructured. Prior to this change, the code sometimes used a temporary stack variable to hold details of a TCP segment. r338102 stopped using the variable to hold segments, but did not actually remove the variable. Because the variable is no longer used, we can safely remove it. Approved by: re (gjb)	2018-10-16 14:41:09 +00:00
bz	071c24dda8	In udp_input() when walking the pcblist we can come across an inp marked FREED after the epoch(9) changes. Check once we hold the lock and skip the inp if it is the case. Contrary to IPv6 the locking of the inp is outside the multicast section and hence a single check seems to suffice. PR: 232192 Reviewed by: mmacy, markj Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17540	2018-10-12 22:51:45 +00:00
bz	4136a8dd5f	r217592 moved the check for imo in udp_input() into the conditional block but leaving the variable assignment outside the block, where it is no longer used. Move both the variable and the assignment one block further in. This should result in no functional changes. It will however make upcoming changes slightly easier to apply. Reviewed by: markj, jtl, tuexen Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17525	2018-10-12 11:30:46 +00:00
jtl	1d041dd4b1	There are three places where we return from a function which entered an epoch section without exiting that epoch section. This is bad for two reasons: the epoch section won't exit, and we will leave the epoch tracker from the stack on the epoch list. Fix the epoch leak by making sure we exit epoch sections before returning. Reviewed by: ae, gallatin, mmacy Approved by: re (gjb, kib) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D17450	2018-10-09 13:26:06 +00:00
tuexen	d78994fa55	Avoid truncating unrecognised parameters when reporting them. This resulted in sending malformed packets. Approved by: re (kib@) MFC after: 1 week	2018-10-07 15:13:47 +00:00
tuexen	31b6fdc2c2	Ensure that the ips_localout counter is incremented for locally generated SCTP packets sent over IPv4. This makes the behaviour consistent with IPv6. Reviewed by: ae@, bz@, jtl@ Approved by: re (kib@) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17406	2018-10-07 11:26:15 +00:00
thj	558062352d	Convert UDP length to host byte order When getting the number of bytes to checksum make sure to convert the UDP length to host byte order when the entire header is not in the first mbuf. Reviewed by: jtl, tuexen, ae Approved by: re (gjb), jtl (mentor) Differential Revision: https://reviews.freebsd.org/D17357	2018-10-05 12:51:30 +00:00
rstone	05d785dcc2	Hold a write lock across udp_notify() With the new route cache feature udp_notify() will modify the inp when it needs to invalidate the route cache. Ensure that we hold a write lock on the inp before calling the function to ensure that multiple threads don't race while trying to invalidate the cache (which previously led to a page fault). Differential Revision: https://reviews.freebsd.org/D17246 Reviewed by: sbruno, bz, karels Sponsored by: Dell EMC Isilon Approved by: re (gjb)	2018-10-04 22:03:58 +00:00
tuexen	f23e3b010d	Mitigate providing a timing signal if the COOKIE or AUTH validation fails. Thanks to jmg@ for reporting the issue, which was discussed in https://admbugs.freebsd.org/show_bug.cgi?id=878 Approved by: re (TBD@) MFC after: 1 week	2018-10-01 14:05:31 +00:00
tuexen	2a03dff127	After allocating chunks set the fields in a consistent way. This removes two assignments for the flags field being done twice and adds one, which was missing. Thanks to Felix Weinrank for reporting the issue he found by using fuzz testing of the userland stack. Approved by: re (kib@) MFC after: 1 week	2018-10-01 13:09:18 +00:00
ae	001b7b7b0f	Add INP_INFO_WUNLOCK_ASSERT() macro and use it instead of INP_INFO_UNLOCK_ASSERT() in TCP-related code. For encapsulated traffic it is possible, that the code is running in net_epoch_preempt section, and INP_INFO_UNLOCK_ASSERT() is very strict assertion for such case. PR: 231428 Reviewed by: mmacy, tuexen Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17335	2018-10-01 10:46:00 +00:00
tuexen	04d432fdc0	Plug mbuf leak in the SCTP input path in an error case. Approved by: re (kib@) MFC after: 1 week CID: 749312	2018-09-30 21:54:02 +00:00
tuexen	b367218794	Plug mbuf leaks in the SCTP output path in error cases. Approved by: re (kib@) MFC after: 1 week CID: 1395307	2018-09-30 21:31:33 +00:00
tuexen	6e83ea0505	Fix the handling of ancillary data for SCTP sockets. Implement sctp_process_cmsgs_for_init() and sctp_findassociation_cmsgs() similar to sctp_find_cmsg() to improve consistency and avoid the signed/unsigned issues in sctp_process_cmsgs_for_init() and sctp_findassociation_cmsgs(). Thanks to andrew@ for reporting the problem he found using syzkaller. Approved by: re (kib@) MFC after: 1 week	2018-09-30 16:21:31 +00:00
tuexen	3b9f1e4292	Increment the corresponding UDP stats counter (udps_opackets) when sending UDP encapsulated SCTP packets. This is consistent with the behaviour that when such packets are received, the corresponding UDP stats counter (udps_ipackets) is incremented. Thanks to Peter Lei for making me aware of this inconsistency. Approved by: re (kib@) MFC after: 1 week	2018-09-30 12:16:06 +00:00
tuexen	94131e0973	Fix typo in comment. Reported by: @danfe Approved by: re (kib@) MFC after: 1 week X-MFC: r338941	2018-09-28 19:47:32 +00:00
tuexen	26444c890a	Whitespace changes and fixing a typo. No functional change. Approved by: re (kib@) MFC after: 1 week	2018-09-26 10:24:50 +00:00
tuexen	db05ff408d	Remove the unused parameter 'locked' from the function syncache_respond(). There is no functional change. The parameter became unused in r313330, but wasn't removed. Approved by: re (kib@) MFC after: 1 month Sponsored by: Netflix, Inc.	2018-09-23 16:37:32 +00:00
ae	9b21fdca8b	Add new field max_hdrsize to struct encap_config. It is currently unused and reserved for future use to keep KBI/KPI. Also add several spare pointers to be able to extend the structure if needed. Approved by: re (gjb)	2018-09-20 19:45:27 +00:00
tuexen	5d6c9ffbf4	Remove unused code. Approved by: re (kib@) MFC after: 1 week	2018-09-18 10:53:07 +00:00
tuexen	7ac790e6a6	Fix TCP Fast Open for the TCP RACK stack. * Fix a bug where the SYN handling during established state was applied to a front state. * Move a check for retransmission after the timer handling. This was suppressing timer based retransmissions. * Fix an off-by-one error in the sequence number of retransmissions. * Apply fixes corresponding to https://svnweb.freebsd.org/changeset/base/336934 Reviewed by: rrs@ Approved by: re (kib@) MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16912	2018-09-12 10:27:58 +00:00
markj	6131d60f6a	Fix synchronization of LB group access. Lookups are protected by an epoch section, so the LB group linkage must be a CK_LIST rather than a plain LIST. Furthermore, we were not deferring LB group frees, so in_pcbremlbgrouphash() could race with readers and cause a use-after-free. Reviewed by: sbruno, Johannes Lundberg <johalun0@gmail.com> Tested by: gallatin Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17031	2018-09-10 19:00:29 +00:00
markj	eb3d0d016c	Use ratecheck(9) in in_pcbinslbgrouphash(). Reviewed by: bz, Johannes Lundberg <johalun0@gmail.com> Approved by: re (kib) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17065	2018-09-07 21:11:41 +00:00
bz	f7f4373428	The inp_lle field to struct inpcb, along with two "valid" flags for the rt and lle cache were added in r191129 (2009). To my best knowledge they have never been used and route caching has converted the inp_rt field from that commit to inp_route rendering this field and these flags obsolete. Convert the pointer into a spare pointer to not change the size of the structure anymore (and to have a spare pointer) and mark the two fields as unused. Reviewed by: markj, karels Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17062	2018-09-06 19:55:40 +00:00
bz	d24fb55973	Make tcp_hpts.c compile in a LINT kernel with options RSS and PCBGROUPS added, by adding the missing include files and changing the type of cpuid, which would otherwise cause a false comparison with NETISR_CPUID_NONE. Reviewed by: rrs Approved by: re (marius) Differential Revision: https://reviews.freebsd.org/D16891	2018-09-06 16:11:24 +00:00
markj	8e62ee612e	Define sctp probes only when SCTP is configured. Otherwise the "depends_on provider" guard in sctp.d does not work as intended. Reported by: mjg Reviewed by: tuexen Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17057	2018-09-06 14:15:03 +00:00
markj	4b466cd9a2	Fix style bugs in in_pcblookup_lbgroup(). No functional change intended. Reviewed by: bz, Johannes Lundberg <johalun0@gmail.com> Approved by: re (rgrimes) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17030	2018-09-05 15:04:11 +00:00
eugen	3eb0af32cd	Fix "ipfw fwd" to work for incoming IPv4 packets when ip_tryforward() chooses fast forwarding path, as it already works for IPv6 and for both of them on old slow path. PR: 231143 Reviewed by: ae Approved by: re (gjb) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17039	2018-09-05 13:59:36 +00:00
markj	b60ed05720	Use the correct malloc type in in_pcblbgroup_free(). Approved by: re (kib) Sponsored by: The FreeBSD Foundation	2018-09-03 17:39:09 +00:00
tuexen	7a5c8ca8d6	Fix a shadowed variable warning. Thanks to Peter Lei for reporting the issue. Approved by: re(kib@) MFH: 1 month Sponsored by: Netflix, Inc.	2018-08-24 10:50:19 +00:00
tuexen	de9ca6bed2	Use arc4rand() instead of read_random() in the SCTP and TCP code. This was suggested by jmg@. Reviewed by: delphij@, jmg@, jtl@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16860	2018-08-23 19:10:45 +00:00
tuexen	406f6f6c81	Don't use the explicit number 32 for the length of the secrets, use sizeof() or explicit #defines instead. No functional change. This was suggested by jmg@. MFC after: 1 month XMFC with: r338053 Sponsored by: Netflix, Inc.	2018-08-23 06:03:59 +00:00
tuexen	bd6049115f	Add support for send, receive and state-change DTrace providers for SCTP. They are based on what is specified in the Solaris DTrace manual for Solaris 11.4. Reviewed by: 0mp, dteske, markj Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16839	2018-08-22 21:23:32 +00:00
mmacy	47fa74161c	in_mcast: fix copy paste error when clearing flag	2018-08-22 04:09:55 +00:00
tuexen	3055b3b326	Enabling the IPPROTO_IPV6 level socket option IPV6_USE_MIN_MTU on a TCP socket resulted in sending fragmented IPv6 packets. This is fixed by reducing the MSS to the appropriate value. In addition, if the socket option is set before the handshake happens, announce this MSS to the peer. This is not strictly required, but done since TCP is conservative. PR: 173444 Reviewed by: bz@, rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16796	2018-08-21 14:12:30 +00:00
tuexen	3f0e4c422a	Fix the inheritance of IPv6 level socket options on TCP sockets. This was broken for IPv6 listening socket, which are not IPV6_ONLY, and the accepted TCP connection was using IPv4. Reviewed by: bz@, rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16792	2018-08-21 14:07:36 +00:00
tuexen	ffb8cde70c	Whitespace change.	2018-08-21 13:37:06 +00:00
tuexen	ee82947ed9	Refactor the SHUTDOWN_PENDING state handling. This is not a functional change but a preparation for the upcoming DTrace support. It is necessary to change the state in one logical operation, even if it involves clearing the sub state SHUTDOWN_PENDING. MFC after: 1 month	2018-08-21 13:25:32 +00:00
bz	cc0e36203d	GC inc_isipv6; it was added for "temp" compatibility in 2001, r86764 and does not seem to be used.	2018-08-20 20:06:36 +00:00
rrs	f87b4f4276	This change represents a substantial restructure of the way we reassemble inbound TCP segments. The old algorithm just blindly dropped in segments without coalescing. This meant that every segment could take up greater and greater room on the linked list of segments. This of course is now subject to a tighter limit (100) of segments, which in a high BDP situation will cause us to be a lot more inefficient as we drop segments beyond 100 entries that we receive. What this restructure does is cause the reassembly buffer to coalesce segments, putting an emphasis on the two common cases (which avoid walking the list of segments), i.e. where we add to the back of the queue of segments and where we add to the front. We also have the reassembly buffer supporting a couple of debug options (black box logging as well as counters for code coverage). These are compiled out by default but can be added by uncommenting the defines. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16626	2018-08-20 12:43:18 +00:00
tuexen	1547506439	Don't expose the uptime via the TCP timestamps. The TCP client side or the TCP server side when not using SYN-cookies used the uptime as the TCP timestamp value. This patch uses in all cases an offset, which is the result of a keyed hash function taking the source and destination addresses and port numbers into account. The keyed hash function is the same as used for the initial TSN. Reviewed by: rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16636	2018-08-19 14:56:10 +00:00
np	06d6f82b42	Add the ability to look up the 3b PCP of a VLAN interface. Use it in toe_l2_resolve to fill up the complete vtag and not just the vid. Reviewed by: kib@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D16752	2018-08-16 23:46:38 +00:00
mmacy	99cec0a00c	Fix in6_multi double free This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast address has to be removed from interface lists when the refcount goes to zero OR when the interface goes away - decouple list disconnect from refcount (v6 only for now) - ifmultiaddr can exist past being on interface lists - add flag for tracking whether or not it's enqueued - deferring freeing moptions makes the inpcb cleanup code simpler but opens the door wider still to races - call inp_gcmoptions synchronously after dropping the inpcb lock Fundamentally multicast needs a rewrite - but keep applying band-aids for now. Tested by: kp Reported by: novel, kp, lwhsu	2018-08-15 20:23:08 +00:00
loos	0c53676dfd	Late style follow up on r312770. Submitted by: glebius X-MFC with: r312770 MFC after: 3 days	2018-08-15 15:44:30 +00:00
jtl	25dd68b737	Lower the default limits on the IPv4 reassembly queue. In particular, try to ensure that no bucket will have a reassembly queue larger than approximately 100 items. This limits the cost to find the correct reassembly queue when processing an incoming fragment. Due to the low limits on each bucket's length, increase the size of the hash table from 64 to 1024. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:30:46 +00:00
jtl	fba8297bee	Implement a limit on the number of IPv4 reassembly queues per bucket. There is a hashing algorithm which should distribute IPv4 reassembly queues across the available buckets in a relatively even way. However, if there is a flaw in the hashing algorithm which allows a large number of IPv4 fragment reassembly queues to end up in a single bucket, a per-bucket limit could help mitigate the performance impact of this flaw. Implement such a limit, with a default of twice the maximum number of reassembly queues divided by the number of buckets. Recalculate the limit any time the maximum number of reassembly queues changes. However, allow the user to override the value using a sysctl (net.inet.ip.maxfragbucketsize). Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:23:05 +00:00
jtl	5a5ca2cd22	Add a global limit on the number of IPv4 fragments. The IP reassembly fragment limit is based on the number of mbuf clusters, which are a global resource. However, the limit is currently applied on a per-VNET basis. Given enough VNETs (or given sufficient customization of enough VNETs), it is possible that the sum of all the VNET limits will exceed the number of mbuf clusters available in the system. Given the fact that the fragment limit is intended (at least in part) to regulate access to a global resource, the fragment limit should be applied on a global basis. VNET-specific limits can be adjusted by modifying the net.inet.ip.maxfragpackets and net.inet.ip.maxfragsperpacket sysctls. To disable fragment reassembly globally, set net.inet.ip.maxfrags to 0. To disable fragment reassembly for a particular VNET, set net.inet.ip.maxfragpackets to 0. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:19:49 +00:00
jtl	6acb9fd7be	Improve hashing of IPv4 fragments. Currently, IPv4 fragments are hashed into buckets based on a 32-bit key which is calculated by (src_ip ^ ip_id) and combined with a random seed. However, because an attacker can control the values of src_ip and ip_id, it is possible to construct an attack which causes very deep chains to form in a given bucket. To ensure more uniform distribution (and lower predictability for an attacker), calculate the hash based on a key which includes all the fields we use to identify a reassembly queue (dst_ip, src_ip, ip_id, and the ip protocol) as well as a random seed. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:15:47 +00:00
tuexen	272c923b99	Remove a set but not used warning showing up in usrsctp.	2018-08-14 08:32:33 +00:00
ae	694891e438	Restore ability to send ICMP and ICMPv6 redirects. It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled only when sending redirects for corresponding IP version is disabled via sysctl. Otherwise will be used default forwarding function. PR: 221137 Submitted by: mckay@ MFC after: 2 weeks	2018-08-14 07:54:14 +00:00
tuexen	40db44d0ca	Use the stacb instead of the asoc in state macros. This is not a functional change. Just a preparation for upcoming dtrace state change provider support.	2018-08-13 13:58:45 +00:00
tuexen	9898120d74	Consistently use the macros to modify the assoc state. No functional change.	2018-08-13 11:56:21 +00:00
tuexen	5543b8edb0	Add explicit cast to silence a warning for the userland stack. Thanks to Felix Weinrank for providing the patch.	2018-08-12 14:05:15 +00:00
dteske	03591a0d28	Fix misspellings of transmitter/transmitted Reviewed by: emaste, bcr Sponsored by: Smule, Inc. Differential Revision: https://reviews.freebsd.org/D16025	2018-08-10 20:37:32 +00:00
ae	0645c33e3e	Remove unneeded ipsec-related includes. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D16637	2018-08-10 07:24:01 +00:00
luporl	2f30606f2f	[ppc] Fix kernel panic when using BOOTP_NFSROOT On PowerPC (and possibly other architectures) that don't use EARLY_AP_STARTUP, the config task queue may be used uninitialized. This was observed while trying to mount the root fs from NFS, as reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230168. This patch has 2 main changes: 1- Perform a basic initialization of qgroup_config, similar to what is done in taskqgroup_adjust, but simpler. This makes qgroup_config ready to be used during NFS root mount. 2- When EARLY_AP_STARTUP is not used, call inm_init() and in6m_init() right before SI_SUB_ROOT_CONF, because bootp needs to send multicast packets to request an IP. PR: Bug 230168 Reported by: sbruno Reviewed by: jhibbits, mmacy, sbruno Approved by: jhibbits Differential Revision: D16633	2018-08-09 14:04:51 +00:00
rrs	e506903e41	Fix a small bug in rack where it will end up sending the FIN twice. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16604	2018-08-08 13:36:49 +00:00
jtl	20a8fc967a	Address concerns about CPU usage while doing TCP reassembly. Currently, the per-queue limit is a function of the receive buffer size and the MSS. In certain cases (such as connections with large receive buffers), the per-queue segment limit can be quite large. Because we process segments as a linked list, large queues may not perform acceptably. The better long-term solution is to make the queue more efficient. But, in the short-term, we can provide a way for a system administrator to set the maximum queue size. We set the default queue limit to 100. This is an effort to balance performance with a sane resource limit. Depending on their environment, goals, etc., an administrator may choose to modify this limit in either direction. Reviewed by: jhb Approved by: so Security: FreeBSD-SA-18:08.tcp Security: CVE-2018-6922	2018-08-06 17:36:57 +00:00
rrs	84cb99c5d1	This fixes a bug in Rack where we were not properly using the correct value for Delayed Ack. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16579	2018-08-06 09:22:07 +00:00
glebius	8a6f698b85	Now that after r335979 the kernel addresses in API structures are fixed size, there is no reason left for the unions. Discussed with: brooks	2018-08-04 00:03:21 +00:00
tuexen	17f71a271f	Add a dtrace provider for UDP-Lite. The dtrace provider for UDP-Lite is modeled after the UDP provider. This fixes the bug that UDP-Lite packets were triggering the UDP provider. Thanks to dteske@ for providing the dwatch module. Reviewed by: dteske@, markj@, rrs@ Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16377	2018-07-31 22:56:03 +00:00
tuexen	e39b8e20f5	Fix INET only builds. r336940 introduced an "unused variable" warning on platforms which support INET, but not INET6, like MALTA and MALTA64 as reported by Mark Millard. Improve the #ifdefs to address this issue. Sponsored by: Netflix, Inc.	2018-07-31 06:27:05 +00:00
tuexen	e122a5a1f6	Allow implicit TCP connection setup for TCP/IPv6. TCP/IPv4 allows an implicit connection setup using sendto(), which is used for TTCP and TCP fast open. This patch adds support for TCP/IPv6. While there, improve some tests for detecting multicast addresses, which are mapped. Reviewed by: bz@, kbowling@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16458	2018-07-30 21:27:26 +00:00
tuexen	8cb442560a	Send consistent SEG.WIN when using timewait codepath for TCP. When sending TCP segments from the timewait code path, a stored value of the last sent window is used. Use the same code for computing this in the timewait code path as in the main code path used in tcp_output() to avoid inconsistencies. Reviewed by: rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16503	2018-07-30 21:13:42 +00:00
tuexen	202d355a8d	Fix some TCP fast open issues. The following issues are fixed: * Whenever a TCP server with TCP fast open enabled, calls accept(), recv(), send(), and close() before the TCP-ACK segment has been received, the TCP connection is just dropped and the reception of the TCP-ACK segment triggers the sending of a TCP-RST segment. * Whenever a TCP server with TCP fast open enabled, calls accept(), recv(), send(), send(), and close() before the TCP-ACK segment has been received, the first byte provided in the second send call is not transferred. * Whenever a TCP client with TCP fast open enabled calls sendto() followed by close() the TCP connection is just dropped. Reviewed by: jtl@, kbowling@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16485	2018-07-30 20:35:50 +00:00
tuexen	0f18e562b3	Add missing send/recv dtrace probes for TCP. These missing probes are mostly in the syncache and timewait code. Reviewed by: markj@, rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16369	2018-07-30 20:13:38 +00:00
asomers	b3776cb8de	Make timespecadd(3) and friends public The timespecadd(3) family of macros were imported from NetBSD back in r35029. However, they were initially guarded by #ifdef _KERNEL. In the meantime, we have grown at least 28 syscalls that use timespecs in some way, leading many programs both inside and outside of the base system to redefine those macros. It's better just to make the definitions public. Our kernel currently defines two-argument versions of timespecadd and timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define three-argument versions. Solaris also defines a three-argument version, but only in its kernel. This revision changes our definition to match the common three-argument version. Bump _FreeBSD_version due to the breaking KPI change. Discussed with: cem, jilles, ian, bde Differential Revision: https://reviews.freebsd.org/D14725	2018-07-30 15:46:40 +00:00
rrs	f40c01b28e	This fixes a hole where rack could end up sending an invalid segment into the reassembly queue. This would happen if you enabled the data after close option. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16453	2018-07-30 10:23:29 +00:00
andrew	5b192cd830	icmp_quotelen was accidentally changed in r336676; undo this. Sponsored by: DARPA, AFRL	2018-07-24 16:45:01 +00:00
andrew	a6605d2938	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
rrs	9df38bfa1f	Delete the example tcp stack "fastpath" which was only put in as an example. Sponsored by: Netflix inc. Differential Revision: https://reviews.freebsd.org/D16420	2018-07-24 14:55:47 +00:00
mmacy	813f5d12cc	Fix a potential use after free in getsockopt() access to inp_options Discussed with: jhb Reviewed by: sbruno, transport MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14621	2018-07-22 20:02:14 +00:00
mmacy	6fcaec6a10	NULL out cc_data in pluggable TCP {cc}_cb_destroy When ABE was added (rS331214) to NewReno and leak fixed (rS333699), it now has a destructor (newreno_cb_destroy) for per connection state. Other congestion controls may allocate and free cc_data on entry and exit, but the field is never explicitly NULLed if moving back to NewReno which only internally allocates stateful data (no entry constructor) resulting in a situation where newreno_cb_destroy might be called on a junk pointer. - NULL out cc_data in the framework after calling {cc}_cb_destroy - free(9) checks for NULL so there is no need to perform non-NULL checks before calling free. - Improve a comment about NewReno in tcp_ccalgounload This is the result of a debugging session from Jason Wolfe, Jason Eggleston, and mmacy@ and very helpful insight from lstewart@. Submitted by: Kevin Bowling Reviewed by: lstewart Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16282	2018-07-22 05:37:58 +00:00
tuexen	72da02e61c	Set the IPv4 version in the IP header for UDP and UDPLite.	2018-07-21 02:14:13 +00:00
tuexen	ff46e28acc	Add missing dtrace probes for received UDP packets. Fire UDP receive probes when a packet is received and there is no endpoint consuming it. Fire the probe also if the TTL of the received packet is smaller than the minimum required by the endpoint. Clarify also in the man page, when the probe fires. Reviewed by: dteske@, markj@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16046	2018-07-20 15:32:20 +00:00
tuexen	9bf2bb1b21	Whitespace changes due to changes in ident.	2018-07-19 20:16:33 +00:00
tuexen	14de4a3d5b	Revert https://svnweb.freebsd.org/changeset/base/336503 since I also ran the export script with different parameters.	2018-07-19 20:11:14 +00:00
tuexen	5810243631	Whitespace changes due to change in ident.	2018-07-19 19:33:42 +00:00
rrs	4b9f4bff13	Bump the ICMP echo limits to match the RFC Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16333	2018-07-18 22:49:53 +00:00
ae	d94c744a40	Move invoking of callout_stop(&lle->lle_timer) into llentry_free(). This deduplicates the code a bit, and also implicitly adds missing callout_stop() to in[6]_lltable_delete_entry() functions. PR: 209682, 225927 Submitted by: hselasky (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4605	2018-07-17 11:33:23 +00:00
sbruno	d142ab3470	There was quite a bit of feedback on r336282 that has led the submitter to want to revert it.	2018-07-14 23:53:51 +00:00
sbruno	388f09b02b	Fixup memory management for fetching options in ip_ctloutput() Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14621	2018-07-14 16:19:46 +00:00
markj	0f3f6a3bb8	Remove a duplicate check. PR: 229663 Submitted by: David Binderman <dcb314@hotmail.com> MFC after: 3 days	2018-07-11 14:54:56 +00:00
brooks	39f527e7ee	Use uintptr_t alone when assigning to kvaddr_t variables. Suggested by: jhb	2018-07-10 13:03:06 +00:00
tuexen	c7a5475854	Add support for printing the TCP FO client-side cookie cache via the sysctl interface. This is similar to the TCP host cache. Reviewed by: pkelsey@, kbowling@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D14554	2018-07-10 10:50:43 +00:00
tuexen	6123188c8a	Use appropriate MSS value when populating the TCP FO client cookie cache When a client receives a SYN-ACK segment with a TCP fast open cookie, but without an MSS option, an MSS value from uninitialised stack memory is used. This patch ensures that in case no MSS option is included in the SYN-ACK, the appropriate value as given in RFC 7413 is used. Reviewed by: kbowling@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16175	2018-07-10 10:42:48 +00:00
smh	c6ce3f9dca	Removed pointless NULL check Removed pointless NULL check after malloc with M_WAITOK which can never return NULL. Sponsored by: Multiplay	2018-07-10 08:05:32 +00:00
ae	544b51e5e3	Add "record-state", "set-limit" and "defer-action" rule options to ipfw. "record-state" is similar to "keep-state", but it doesn't produce an implicit O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but it has the same feature as "record-state": it is a single opcode without an implicit O_PROBE_STATE opcode. "defer-action" is targeted to be used with dynamic states. When a rule with this opcode is matched, the rule's action will not be executed; instead a dynamic state will be created. When this state is later matched by "check-state", the rule's action will be executed. This allows creating more complicated rulesets. Submitted by: lev MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D1776	2018-07-09 11:35:18 +00:00
tuexen	68c3da5c03	Allow alternate TCP stack to populate the TCP FO client cookie cache. Without this patch, TCP FO could be used when using alternate TCP stack, but only existing entries in the TCP client cookie cache could be used. This cache was not populated by connections using alternate TCP stacks. Sponsored by: Netflix, Inc.	2018-07-07 12:28:16 +00:00
tuexen	ab8567c6ff	When initializing the TCP FO client cookie cache, take into account whether the TCP FO support is enabled or not for the client side. The code in tcp_fastopen_init() implicitly assumed that the sysctl variable V_tcp_fastopen_client_enable was initialized to 0. This was initially true, but was changed in r335610, which unmasked this bug. Thanks to Pieter de Goeje for reporting the issue on freebsd-net@	2018-07-07 11:18:26 +00:00
brooks	c4d0432c6f	One more 32-bit fix for r335979. Reported by: tuexen	2018-07-06 13:34:45 +00:00
brooks	8baf738e84	Correct breakage on 32-bit platforms from r335979.	2018-07-06 10:03:33 +00:00

6388 Commits