freebsd-dev

Author	SHA1	Message	Date
Conrad Meyer	e2e050c8ef	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
Michael Tuexen	3a4f12e3b1	Allow sending on demand SCTP HEARTBEATS only in the ESTABLISHED state. This issue was found by running syzkaller. MFC after: 3 days	2019-05-19 17:53:36 +00:00
Michael Tuexen	fc26bf717c	Improve input validation for the IPPROTO_SCTP level socket options SCTP_CONNECT_X and SCTP_CONNECT_X_DELAYED. Some issues where found by running syzkaller. MFC after: 3 days	2019-05-19 17:28:00 +00:00
Mark Johnston	f00876fb60	Revert r347582 for now. The inp lock still needs to be dropped when calling into the driver ioctl handler, as some drivers expect to be able to sleep. Reported by: kib	2019-05-16 13:04:26 +00:00
Mark Johnston	5a1e222bfd	Close some races in multicast socket option handling. r333175 converted the global multicast lock to a sleepable sx lock, so the lock order with respect to the (non-sleepable) inp lock changed. To handle this, r333175 and r333505 added code to drop the inp lock, but this opened races that could leave multicast group description structures in an inconsistent state. This change fixes the problem by simply acquiring the global lock sooner. Along the way, this fixes some LORs and bogus error handling introduced in r333175, and commits some related cleanup. Reported by: syzbot+ba7c4943547e0604faca@syzkaller.appspotmail.com Reported by: syzbot+1b803796ab94d11a46f9@syzkaller.appspotmail.com Reviewed by: ae MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20070	2019-05-14 21:30:55 +00:00
Conrad Meyer	64e7d18f34	netdump: Ref the interface we're attached to Serialize netdump configuration / deconfiguration, and discard our configuration when the affiliated interface goes away by monitoring ifnet_departure_event. Reviewed by: markj, with input from vangyzen@ (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20206	2019-05-10 23:12:59 +00:00
Conrad Meyer	070e7bf95e	netdump: Fix boot-time configuration typo Boot-time netdump configuration is much more useful if one can configure the client and gateway addresses. Fix trivial typo. (Long-standing bug, I believe it dates to the original netdump commit.) Spotted by: one of vangyzen@ or markj@ Sponsored by: Dell EMC Isilon	2019-05-10 23:10:22 +00:00
Conrad Meyer	6144b50f8b	netdump: Don't store sensitive key data we don't need Prior to this revision, struct diocskerneldump_arg (and struct netdump_conf with embedded diocskerneldump_arg before r347192), were copied in their entirety to the global 'nd_conf' variable. Also prior to this revision, de-configuring netdump would not remove the the key material from global nd_conf. As part of Encrypted Kernel Crash Dumps (EKCD), which was developed contemporaneously with netdump but happened to land first, the diocskerneldump_arg structure will contain sensitive key material (kda_key[]) when encrypted dumps are configured. Netdump doesn't have any use for the key data -- encryption is handled in the core dumper code -- so in this revision, we no longer store it. Unfortunately, I think this leak dates to the initial import of netdump in r333283; so it's present in FreeBSD 12.0. Fortunately, the impact seems relatively minor. Any new netdump configuration would overwrite the key material; for active encrypted netdump configurations, the key data stored was just a duplicate of the key material already in the core dumper code; and no user interface (other than /dev/kmem) actually exposed the leaked material to userspace. Reviewed by: markj, rpokala (earlier commit message) MFC after: 2 weeks Security: yes (minor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20233	2019-05-10 21:55:11 +00:00
Gleb Smirnoff	54bb7ac0c4	Fix regression from r347375: do not panic when sending an IP multicast packet from an interface that doesn't have IPv4 address. Reported by: Michael Butler <imb protected-networks.net>	2019-05-10 21:51:17 +00:00
Andrew Gallatin	4e255d7479	Bind TCP HPTS (pacer) threads to NUMA domains Bind the TCP pacer threads to NUMA domains and build per-domain pacer-thread lookup tables. These tables allow us to use the inpcb's NUMA domain information to match an inpcb with a pacer thread on the same domain. The motivation for this is to keep the TCP connection local to a NUMA domain as much as possible. Thanks to jhb for pre-reviewing an earlier version of the patch. Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20134	2019-05-10 13:41:19 +00:00
Michael Tuexen	b5a154d8e3	Don't use C++ style comments. These where introduced in r347382. Reported by: ngie@	2019-05-09 21:00:15 +00:00
Michael Tuexen	5acfd95cbc	Receiver side DSACK implemenation. This adds initial support for RFC 2883. Submitted by: Richard Scheffenegger Reviewed by: rrs@ Differential Revision: https://reviews.freebsd.org/D19334	2019-05-09 07:34:15 +00:00
Michael Tuexen	5cc11a89db	Prevent cwnd to collapse down to 1 MSS after exiting recovery. This is descrined in RFC 6582, which updates RFC 3782. Submitted by: Richard Scheffenegger Reviewed by: lstewart@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17614	2019-05-09 07:11:08 +00:00
Gleb Smirnoff	6ca363eb7b	Existense of PCB route caching doesn't allow us to use new fast route lookup KPI in ip_output() like it is already used in ip_forward(). However, when there is no PCB provided we can use fast KPI, gaining performance advantage. Typical case when ip_output() is called without a PCB pointer is a sendto(2) on a not connected UDP socket. In practice DNS servers do this. Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D19804	2019-05-08 23:39:24 +00:00
Conrad Meyer	6b6e2954dd	List-ify kernel dump device configuration Allow users to specify multiple dump configurations in a prioritized list. This enables fallback to secondary device(s) if primary dump fails. E.g., one might configure a preference for netdump, but fallback to disk dump as a second choice if netdump is unavailable. This change does not list-ify netdump configuration, which is tracked separately from ordinary disk dumps internally; only one netdump configuration can be made at a time, for now. It also does not implement IPv6 netdump. savecore(8) is already capable of scanning and iterating multiple devices from /etc/fstab or passed on the command line. This change doesn't update the rc or loader variables 'dumpdev' in any way; it can still be set to configure a single dump device, and rc.d/savecore still uses it as a single device. Only dumpon(8) is updated to be able to configure the more complicated configurations for now. As part of revving the ABI, unify netdump and disk dump configuration ioctl / structure, and leave room for ipv6 netdump as a future possibility. Backwards-compatibility ioctls are added to smooth ABI transition, especially for developers who may not keep kernel and userspace perfectly synced. Reviewed by: markj, scottl (earlier version) Relnotes: maybe Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19996	2019-05-06 18:24:07 +00:00
Alexander Motin	fb6a844704	ip multicast debug: fix strings vs defines Turning on multicast debug made multicast failure worse because the strings and #define values no longer matched up. Fix them, and make sure they stay matched-up. Submitted by: torek MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-04-29 18:09:55 +00:00
Andrew Gallatin	50575ce11c	Track TCP connection's NUMA domain in the inpcb Drivers can now pass up numa domain information via the mbuf numa domain field. This information is then used by TCP syncache_socket() to associate that information with the inpcb. The domain information is then fed back into transmitted mbufs in ip{6}_output(). This mechanism is nearly identical to what is done to track RSS hash values in the inp_flowid. Follow on changes will use this information for lacp egress port selection, binding TCP pacers to the appropriate NUMA domain, etc. Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20028	2019-04-25 15:37:28 +00:00
Andrey V. Elsukov	aee793eec9	Add GRE-in-UDP encapsulation support as defined in RFC8086. This GRE-in-UDP encapsulation allows the UDP source port field to be used as an entropy field for load-balancing of GRE traffic in transit networks. Also most of multiqueue network cards are able distribute incoming UDP datagrams to different NIC queues, while very little are able do this for GRE packets. When an administrator enables UDP encapsulation with command `ifconfig gre0 udpencap`, the driver creates kernel socket, that binds to tunnel source address and after udp_set_kernel_tunneling() starts receiving of all UDP packets destined to 4754 port. Each kernel socket maintains list of tunnels with different destination addresses. Thus when several tunnels use the same source address, they all handled by single socket. The IP[V6]_BINDANY socket option is used to be able bind socket to source address even if it is not yet available in the system. This may happen on system boot, when gre(4) interface is created before source address become available. The encapsulation and sending of packets is done directly from gre(4) into ip[6]_output() without using sockets. Reviewed by: eugen MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D19921	2019-04-24 09:05:45 +00:00
Conrad Meyer	a9f7f19242	netdump: Fix !COMPAT_FREEBSD11 unused variable warning Reported by: Ralf Wenk <iz-rpi03_hs-karlsruhe.de> Sponsored by: Dell EMC Isilon	2019-04-23 17:05:57 +00:00
Bjoern A. Zeeb	d86ecbe993	iFix udp_output() lock inconsistency. In r297225 the initial INP_RLOCK() was replaced by an early acquisition of an r- or w-lock depending on input variables possibly extending the write locked area for reasons not entirely clear but possibly to avoid a later case of unlock and relock leading to a possible race condition and possibly in order to allow the route cache to work for connected sockets. Unfortunately the conditions were not 1:1 replicated (probably because of the route cache needs). While this would not be a problem the legacy IP code compared to IPv6 has an extra case when dealing with IP_SENDSRCADDR. In a particular case we were holding an exclusive inp lock and acquired the shared udbinfo lock (now epoch). When then running into an error case, the locking assertions on release fired as the udpinfo and inp lock levels did not match. Break up the special case and in that particular case acquire and udpinfo lock depending on the exclusitivity of the inp lock. MFC After: 9 days Reported-by: syzbot+1f5c6800e4f99bdb1a48@syzkaller.appspotmail.com Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D19594	2019-04-23 10:12:33 +00:00
Hans Petter Selasky	6bbdbbb830	Revert r346530 until further. MFC after: 1 week Sponsored by: Mellanox Technologies	2019-04-22 19:36:19 +00:00
Bjoern A. Zeeb	ade1258dc1	r297225 move the assignment of sin from add to the top of the function. sin is not changed after the initial assignment, so no need to set it again. MFC after: 10 days	2019-04-22 14:53:53 +00:00
Bjoern A. Zeeb	e932299837	Remove some excessive brackets. No functional change. MFC after: 10 days	2019-04-22 14:20:49 +00:00
Hans Petter Selasky	04f44499ca	Fix build for mips and powerpc after r346530. Need to include sys/kernel.h to define SYSINIT() which is used by sys/eventhandler.h . MFC after: 1 week Sponsored by: Mellanox Technologies	2019-04-22 08:32:00 +00:00
Hans Petter Selasky	40eb389666	Fix panic in network stack due to memory use after free in relation to fragmented packets. When sending IPv4 and IPv6 fragmented packets and a fragment is lost, the mbuf making up the fragment will remain in the temporary hashed fragment list for a while. If the network interface departs before the so-called slow timeout clears the packet, the fragment causes a panic when the timeout kicks in due to accessing a freed network interface structure. Make sure that when a network device is departing, all hashed IPv4 and IPv6 fragments belonging to it, get freed. Backtrace: panic() icmp6_reflect() hlim = ND_IFINFO(m->m_pkthdr.rcvif)->chlim; ^^^^ rcvif->if_afdata[AF_INET6] is NULL. icmp6_error() frag6_freef() frag6_slowtimo() pfslowtimo() softclock_call_cc() softclock() ithread_loop() Differential Revision: https://reviews.freebsd.org/D19622 Reviewed by: bz (network), adrian MFC after: 1 week Sponsored by: Mellanox Technologies	2019-04-22 07:27:24 +00:00
Conrad Meyer	60ade167fd	netdump: Fix 11 compatibility DIOCSKERNELDUMP ioctl The logic was present for the 11 version of the DIOCSKERNELDUMP ioctl, but had not been updated for the 12 ABI. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D19980	2019-04-20 16:07:29 +00:00
John Baldwin	68cea2b106	Push down INP_WLOCK slightly in tcp_ctloutput. The inp lock is not needed for testing the V6 flag as that flag is set once when the inp is created and never changes. For non-TCP socket options the lock is immediately dropped after checking that flag. This just pushes the lock down to only be acquired for TCP socket options. This isn't a hot-path, more a cosmetic cleanup I noticed while reading the code. Reviewed by: bz MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19740	2019-04-18 23:21:26 +00:00
Michael Tuexen	20a6a3a7a7	When sending IPv4 packets on a SOCK_RAW socket using the IP_HDRINCL option, ensure that the ip_hl field is valid. Furthermore, ensure that the complete IPv4 header is contained in the first mbuf. Finally, move the length checks before relying on them when accessing fields of the IPv4 header. Reported by: jtl@ Reviewed by: jtl@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19181	2019-04-13 10:47:47 +00:00
Michael Tuexen	6982c0fac1	Fix an SCTP related locking issue. Don't report that the TCB_SEND_LOCK is owned, when it is not. This issue was found by running syzkaller. MFC after: 1 week	2019-04-11 20:39:12 +00:00
Mark Johnston	f1ef572a1e	Reinitialize multicast source filter structures after invalidation. When leaving a multicast group, a hole may be created in the inpcb's source filter and group membership arrays. To remove the hole, the succeeding array elements are copied over by one entry. The multicast code expects that a newly allocated array element is initialized, but the code which shifts a tail of the array was leaving stale data in the final entry. Fix this by explicitly reinitializing the last entry following such a copy. Reported by: syzbot+f8c3c564ee21d650475e@syzkaller.appspotmail.com Reviewed by: ae MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19872	2019-04-11 08:00:59 +00:00
Randall Stewart	8021928623	Fix a small bug in the tcp_log_id where the bucket was unlocked and yet the bucket-unlock flag was not changed to false. This can cause a panic if INVARIANTS is on and we go through the right path (though rare). This fixes the correct bug :) Reported by: syzbot+179a1ad49f3c4c215fa2@syzkaller.appspotmail.com Reviewed by: tuexen@	2019-04-10 18:58:11 +00:00
Rodney W. Grimes	6c1c6ae537	Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code There are a few places that use hand crafted versions of the macros from sys/netinet/in.h making it difficult to actually alter the values in use by these macros. Correct that by replacing handcrafted code with proper macro usage. Reviewed by: karels, kristof Approved by: bde (mentor) MFC after: 3 weeks Sponsored by: John Gilmore Differential Revision: https://reviews.freebsd.org/D19317	2019-04-04 19:01:13 +00:00
Randall Stewart	fd29ff5d53	Undo my previous erroneous commit changing the tcp_output kassert. Hmm now the question is where did the tcp_log_id change go :o	2019-04-03 19:35:07 +00:00
Navdeep Parhar	7893235ff0	tcp_autorcvbuf_inc was removed in r344433. Discussed with: tuexen@ Sponsored by: Chelsio Communications	2019-03-29 21:39:47 +00:00
John Baldwin	43b65e3c98	Don't check the inp socket pointer in in_pcboutput_eagain. Reviewed by: hps (by saying it was ok to be removed) MFC after: 1 month Sponsored by: Netflix	2019-03-29 19:47:42 +00:00
Mark Johnston	7762bbc30e	Add CTLFLAG_VNET to the net.inet.icmp.tstamprepl definition. Reported by: Hans Fiedler <hans@hfconsulting.com> MFC after: 3 days	2019-03-26 22:14:50 +00:00
Randall Stewart	7854c63d6f	Fix a small bug in the tcp_log_id where the bucket was unlocked and yet the bucket-unlock flag was not changed to false. This can cause a panic if INVARIANTS is on and we go through the right path (though rare). Reported by: syzbot+179a1ad49f3c4c215fa2@syzkaller.appspotmail.com Reviewed by: tuexen@ MFC after: 1 week	2019-03-26 10:41:27 +00:00
Michael Tuexen	eb3b9ea3fe	Fix a double free of an SCTP association in an error path. This is joint work with rrs@. The issue was found by running syzkaller. MFC after: 1 week	2019-03-26 08:27:00 +00:00
Michael Tuexen	7c96d54f20	Initialize scheduler specific data for the FCFS scheduler. This is joint work with rrs@. The issue was reported by using syzkaller. MFC after: 1 week	2019-03-25 16:40:54 +00:00
Michael Tuexen	689ed08920	Improve locking when tearing down an SCTP association. This is joint work with rrs@ and the issue was found by syzkaller. MFC after: 1 week	2019-03-25 15:23:20 +00:00
Michael Tuexen	2de5b90420	Fix the handling of fragmented unordered messages when using DATA chunks and FORWARD-TSN. This bug was reported in https://github.com/sctplab/usrsctp/issues/286 for the userland stack. This is joint work with rrs@. MFC after: 1 week	2019-03-25 09:47:22 +00:00
Michael Tuexen	58e6eeef45	Fix build issue for the userland stack. Joint work with rrs@. MFC after: 1 week	2019-03-24 12:13:05 +00:00
Michael Tuexen	6b6de29ca1	Fox more signed unsigned issues. This time on the send path. This is joint work with rrs@ and was found by running syzkaller. MFC after: 1 week	2019-03-24 10:40:20 +00:00
Michael Tuexen	0d3cf13dab	Fix a signed/unsigned bug when receiving SCTP messages. This is joint work with rrs@. Reported by: syzbot+6b8a4bc8cc828e9d9790@syzkaller.appspotmail.com MFC after: 1 week	2019-03-24 09:46:16 +00:00
Michael Tuexen	7de4780412	Limit the size of messages sent on 1-to-many style SCTP sockets with the SCTP_SENDALL flag. Allow also only one operation per SCTP endpoint. This fixes an issue found by running syzkaller and is joint work with rrs@. MFC after: 1 week	2019-03-23 22:56:03 +00:00
Michael Tuexen	2ef5bd2f0c	Limit the number of bytes which can be queued for SCTP sockets. This is joint work with rrs@. Reported by: syzbot+307f167f9bc214f095bc@syzkaller.appspotmail.com MFC after: 1 week	2019-03-23 22:46:29 +00:00
Michael Tuexen	0999766ddf	Add sysctl variable net.inet.tcp.rexmit_initial for setting RTO.Initial used by TCP. Reviewed by: rrs@, 0mp@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D19355	2019-03-23 21:36:59 +00:00
Michael Tuexen	05fb056c06	Fix a KASSERT() in tcp_output(). When checking the length of the headers at this point, the IP level options have not been added to the mbuf chain. So don't take them into account. Reported by: syzbot+16025fff7ee5f7c5957b@syzkaller.appspotmail.com Reported by: syzbot+adb5836b8a9ff621b2aa@syzkaller.appspotmail.com Reported by: syzbot+d25a5352bcdf40acdbb8@syzkaller.appspotmail.com Reviewed by: rrs@ MFC after: 3 days Sponsored by: Netflix, Inc.	2019-03-23 09:56:41 +00:00
Andrey V. Elsukov	5c04f73e07	Add NAT64 CLAT implementation as defined in RFC6877. CLAT is customer-side translator that algorithmically translates 1:1 private IPv4 addresses to global IPv6 addresses, and vice versa. It is implemented as part of ipfw_nat64 kernel module. When module is loaded or compiled into the kernel, it registers "nat64clat" external action. External action named instance can be created using `create` command and then used in ipfw rules. The create command accepts two IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is ommitted, IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used. # ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX # ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out # ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in Obtained from: Yandex LLC Submitted by: Boris N. Lytochkin MFC after: 1 month Relnotes: yes Sponsored by: Yandex LLC	2019-03-18 11:44:53 +00:00
Gleb Smirnoff	dc0fa4f712	Remove 'dir' argument from dummynet_io(). This makes it possible to make dn_dir flags private to dummynet. There is still some room for improvement.	2019-03-14 22:32:50 +00:00
Gleb Smirnoff	cef9f220cd	Remove 'dir' argument in ng_ipfw_input, since ip_fw_args now has this info. While here make 'tee' boolean.	2019-03-14 22:30:05 +00:00
Gleb Smirnoff	1830dae3d3	Make second argument of ip_divert(), that specifies packet direction a bool. This allows pf(4) to avoid including ipfw(4) private files.	2019-03-14 22:23:09 +00:00
Bjoern A. Zeeb	b25d74e06c	Improve ARP logging. r344504 added an extra ARP_LOG() call in case of an if_output() failure. It turns out IPv4 can be noisy. In order to not spam the console by default: (a) add a counter for these events so people can keep better track of how often it happens, and (b) add a sysctl to select the default ARP_LOG log level and set it to INFO avoiding the one (the new) DEBUG level by default. Claim a spare (1st one after 10 years since the stats were added) in order to not break netstat from FreeBSD 12->13 updates in the future. Reviewed by: karels Differential Revision: https://reviews.freebsd.org/D19490	2019-03-09 01:12:59 +00:00
Michael Tuexen	3a35ad54a8	Fix locking bug. MFC after: 3 days	2019-03-08 18:17:57 +00:00
Michael Tuexen	a458a6e620	Some cleanup and consistency improvements. MFC after: 3 days	2019-03-08 18:16:19 +00:00
Michael Tuexen	e6dcce69ca	After removing an entry from the stream scheduler list, set the pointers to NULL, since we are checking for it in case the element gets inserted again. This issue was found by running syzkaller. MFC after: 3 days	2019-03-07 08:43:20 +00:00
Michael Tuexen	be62c88b80	Allocate an assocition id and register the stcb with holding the lock. This avoids a race where stcbs can be found, which are not completely initialized. This was found by running syzkaller. MFC after: 3 days	2019-03-03 19:55:06 +00:00
Michael Tuexen	5f98c80550	Remove debug output. MFC after: 3 days	2019-03-02 16:10:11 +00:00
Michael Tuexen	bab9988af5	Allow SCTP stream reconfiguration operations only in ESTABLISHED state. This issue was found by running syzkaller. MFC after: 3 days	2019-03-02 14:30:27 +00:00
Michael Tuexen	49f1449309	Handle the case when calling the IPPROTO_SCTP level socket option SCTP_STATUS on an association with no primary path (early state). This issue was found by running syzkaller. MFC after: 3 days	2019-03-02 14:15:33 +00:00
Michael Tuexen	e57d481c5e	Report the correct length when using the IPPROTO_SCTP level socket options SCTP_GET_PEER_ADDRESSES and SCTP_GET_LOCAL_ADDRESSES.	2019-03-02 13:12:37 +00:00
Michael Tuexen	20ab225b61	Honor the memory limits provided when processing the IPPROTO_SCTP level socket option SCTP_GET_LOCAL_ADDRESSES in a getsockopt() call. Thanks to Thomas Barabosch for reporting the issue which was found by running syzkaller. MFC after: 3 days	2019-03-01 18:47:41 +00:00
Michael Tuexen	3aee58ca76	Improve consistency, not functional change. MFC after: 3 days	2019-03-01 15:57:55 +00:00
John Baldwin	dbcc200058	Various cleanups to the management of multiple TCP stacks. - Use strlcpy() with sizeof() instead of strncpy(). - Simplify initialization of TCP functions structures. init_tcp_functions() was already called before the first call to register a stack. Just inline the work in the SYSINIT and remove the racy helper variable. Instead, KASSERT that the rw lock is initialized when registering a stack. - Protect the default stack via a direct pointer comparison. The default stack uses the name "freebsd" instead of "default" so this protection wasn't working for the default stack anyway. Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19152	2019-02-27 20:24:23 +00:00
Bjoern A. Zeeb	a4c69b8bc0	Make arp code return (more) errors. arprequest() is a void function and in case of error we simply return without any feedback. In case of any local operation or *if_output() failing no feedback is send up the stack for the packet which triggered the arp request to be sent. arpresolve_full() has three pre-canned possible errors returned (if we have not yet sent enough arp requests or if we tried often enough without success) otherwise "no error" is returned. Make arprequest() an "internal" function arprequest_internal() which does return a possible error to the caller. Preserve arprequest() as a void wrapper function for external consumers. In arpresolve_full() add an extra error checking. Use the arprequest_internal() function and only return an error if non of the three ones (mentioend above) are already set. This will return possible errors all the way up the stack and allows functions and programs to react on the send errors rather than leaving them in the dark. Also they might get more detailed feedback of why packets cannot be sent and they will receive it quicker. Reviewed by: karels, hselasky Differential Revision: https://reviews.freebsd.org/D18904	2019-02-24 22:49:56 +00:00
Gleb Smirnoff	0dfc145abe	Support struct ip_mreqn as argument for IP_ADD_MEMBERSHIP. Legacy support for struct ip_mreq remains in place. The struct ip_mreqn is Linux extension to classic BSD multicast API. It has extra field allowing to specify the interface index explicitly. In Linux it used as argument for IP_MULTICAST_IF and IP_ADD_MEMBERSHIP. FreeBSD kernel also declares this structure and supports it as argument to IP_MULTICAST_IF since r170613. So, we have structure declared but not fully supported, this confused third party application configure scripts. Code handling IP_ADD_MEMBERSHIP was mixed together with code for IP_ADD_SOURCE_MEMBERSHIP. Bringing legacy and new structure support into the mess would made the "argument switcharoo" intolerable, so code was separated into its own switch case clause. MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D19276	2019-02-23 06:03:18 +00:00
Michael Tuexen	560c058683	The receive buffer autoscaling for TCP is based on a linear growth, which is acceptable in the congestion avoidance phase, but not during slow start. The MTU is is also not taken into account. Use a method instead, which is based on exponential growth working also in slow start and being independent from the MTU. This is joint work with rrs@. Reviewed by: rrs@, Richard Scheffenegger Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18375	2019-02-21 10:35:32 +00:00
Michael Tuexen	a1f0e13475	This patch addresses an issue brought up by bz@ in D18968: When TCP_REASS_LOGGING is defined, a NULL pointer dereference would happen, if user data was received during the TCP handshake and BB logging is used. A KASSERT is also added to detect tcp_reass() calls with illegal parameter combinations. Reported by: bz@ Reviewed by: rrs@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D19254	2019-02-21 09:34:47 +00:00
Michael Tuexen	3b853844d7	Reduce the TCP initial retransmission timeout from 3 seconds to 1 second as allowed by RFC 6298. Reviewed by: kbowling@, Richard Scheffenegger Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18941	2019-02-20 18:03:43 +00:00
Michael Tuexen	c6dcb64b18	Use exponential backoff for retransmitting SYN segments as specified in the TCP RFCs. Reviewed by: rrs@, Richard Scheffenegger Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18974	2019-02-20 17:56:38 +00:00
Michael Tuexen	e82fdca156	Fix a byte ordering issue for the advertised receiver window in ACK segments sent in TIMEWAIT state, which I introduced in r336937. MFC after: 3 days Sponsored by: Netflix, Inc.	2019-02-15 09:45:17 +00:00
Andrey V. Elsukov	c7ee62fcd5	In r335015 PCB destroing was made deferred using epoch_call(). But ipsec_delete_pcbpolicy() uses some VNET-virtualized variables, and thus it needs VNET context, that is missing during gtaskqueue executing. Use inp_vnet context to set curvnet in in_pcbfree_deferred(). PR: 235684 MFC after: 1 week	2019-02-13 15:46:05 +00:00
Kristof Provost	3838c6a3e6	garp: Fix vnet related panic for gratuitous arp Gratuitous ARP packets are sent from a timer, which means we don't have a vnet context set. As a result we panic trying to send the packet. Set the vnet context based on the interface associated with the interface address. To reproduce: sysctl net.link.ether.inet.garp_rexmit_count=2 ifconfig vtnet1 10.0.0.1/24 up PR: 235699 Reviewed by: vangyzen@ MFC after: 1 week	2019-02-12 21:22:57 +00:00
Michael Tuexen	aef0641755	Improve input validation for raw IPv4 socket using the IP_HDRINCL option. This issue was found by running syzkaller on OpenBSD. Greg Steuck made me aware that the problem might also exist on FreeBSD. Reported by: Greg Steuck MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D18834	2019-02-12 10:17:21 +00:00
Michael Tuexen	d9707e43df	Fix a locking issue when reporing outbount messages. MFC after: 3 days	2019-02-10 14:02:14 +00:00
Michael Tuexen	507bb10421	Fix a locking issue in the IPPROTO_SCTP level SCTP_PEER_ADDR_THLDS socket option. The problem affects only setsockopt with invalid parameters. This issue was found by syzkaller. MFC after: 3 days	2019-02-10 13:55:32 +00:00
Michael Tuexen	6cf360772f	Fix a locking bug in the IPPROTO_SCTP level SCTP_EVENT socket option. This occurs when call setsockopt() with invalid parameters. This issue was found by syzkaller. MFC after: 3 days	2019-02-10 10:42:16 +00:00
Michael Tuexen	333669e016	Fix locking for IPPROTO_SCTP level SCTP_DEFAULT_PRINFO socket option. This problem occurred when calling setsockopt() will invalid parameters. This issue was found by running syzkaller. MFC after: 3 days	2019-02-10 08:28:56 +00:00
Michael Tuexen	aa36fbd6fa	Ensure that when using the TCP CDG congestion control and setting the sysctl variable net.inet.tcp.cc.cdg.smoothing_factor to 0, the smoothing is disabled. Without this patch, a division by zero orrurs. PR: 193762 Reviewed by: lstewart@, rrs@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D19071	2019-02-08 20:42:49 +00:00
Michael Tuexen	baed5270e1	Only reduce the PMTU after the send call. The only way to increase it, is via PMTUD. This fixes an MTU issue reported by Timo Voelker. MFC after: 3 days	2019-02-05 10:29:31 +00:00
Michael Tuexen	e4c42fa266	Fix an off-by-one error in the input validation of the SCTP_RESET_STREAMS socketoption. This was found by running syzkaller. MFC after: 3 days	2019-02-05 10:13:51 +00:00
Warner Losh	52467047aa	Regularize the Netflix copyright Use recent best practices for Copyright form at the top of the license: 1. Remove all the All Rights Reserved clauses on our stuff. Where we piggybacked others, use a separate line to make things clear. 2. Use "Netflix, Inc." everywhere. 3. Use a single line for the copyright for grep friendliness. 4. Use date ranges in all places for our stuff. Approved by: Netflix Legal (who gave me the form), adrian@ (pmc files)	2019-02-04 21:28:25 +00:00
Michael Tuexen	116ef4d6e7	When handling SYN-ACK segments in the SYN-RCVD state, set tp->snd_wnd consistently. This inconsistency was observed when working on the bug reported in PR 235256, although it does not fix the reported issue. The fix for the PR will be a separate commit. PR: 235256 Reviewed by: rrs@, Richard Scheffenegger MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D19033	2019-02-01 12:33:00 +00:00
Gleb Smirnoff	547392731f	Repair siftr(4): PFIL_IN and PFIL_OUT are defines of some value, relying on them having particular values can break things.	2019-02-01 08:10:26 +00:00
Gleb Smirnoff	b252313f0b	New pfil(9) KPI together with newborn pfil API and control utility. The KPI have been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as kernel uses same command structures that userland ioctl uses. In nutshell [KA]PI is about declaring filtering points, declaring filters and linking and unlinking them together. New [KA]PI makes it possible to reconfigure pfil(9) configuration: change order of hooks, rehook filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to existing list of filters. Now it possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of existing packet filters yet support that, however limited usage is already possible, e.g. default ruleset can be moved to single interface, as soon as interface would pride their filtering points. Another future feature is possiblity to create pfil heads, that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when packet has just been received by a NIC and no mbuf was yet allocated. Differential Revision: https://reviews.freebsd.org/D18951	2019-01-31 23:01:03 +00:00
Brooks Davis	435a8c1560	Add a simple port filter to SIFTR. SIFTR does not allow any kind of filtering, but captures every packet processed by the TCP stack. Often, only a specific session or service is of interest, and doing the filtering in post-processing of the log adds to the overhead of SIFTR. This adds a new sysctl net.inet.siftr.port_filter. When set to zero, all packets get captured as previously. If set to any other value, only packets where either the source or the destination ports match, are captured in the log file. Submitted by: Richard Scheffenegger Reviewed by: Cheng Cui Differential Revision: https://reviews.freebsd.org/D18897	2019-01-30 17:44:30 +00:00
Michael Tuexen	bf7fcdb18a	Fix the detection of ECN-setup SYN-ACK packets. RFC 3168 defines an ECN-setup SYN-ACK packet as on with the ECE flags set and the CWR flags not set. The code was only checking if ECE flag is set. This patch adds the check to verify that the CWR flags is not set. Submitted by: Richard Scheffenegger Reviewed by: tuexen@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18996	2019-01-28 12:45:31 +00:00
Michael Tuexen	f635b1c264	Don't include two header files when not needed. This allows the part of the rewrite of TCP reassembly in this files to be MFCed to stable/11 with manual change. MFC after: 3 days Sponsored by: Netflix, Inc.	2019-01-25 17:08:28 +00:00
Michael Tuexen	7dc90a1de0	Fix a bug in the restart window computation of TCP New Reno When implementing support for IW10, an update in the computation of the restart window used after an idle phase was missed. To minimize code duplication, implement the logic in tcp_compute_initwnd() and call it. This fixes a bug in NewReno, which was not aware of IW10. Submitted by: Richard Scheffenegger Reviewed by: tuexen@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18940	2019-01-25 13:57:09 +00:00
Michael Tuexen	989321df11	Get the arithmetic right... MFC after: 3 days Sponsored by: Netflix, Inc.	2019-01-24 16:47:18 +00:00
Michael Tuexen	42395cbe31	Kill a trailing whitespace character... MFC after: 3 days Sponsored by: Netflix, Inc.	2019-01-24 16:43:13 +00:00
Michael Tuexen	34bb795ba1	Update a comment to reflect the current reality. SYN-cache entries live for abaut 12 seconds, not 45, when default setting are used. MFC after: 1 week Sponsored by: Netflix, Inc.	2019-01-24 16:40:14 +00:00
Mark Johnston	49cf58e559	Style. Reviewed by: bz MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-01-23 22:19:49 +00:00
Mark Johnston	c06cc56e39	Fix an LLE lookup race. After the afdata read lock was converted to epoch(9), readers could observe a linked LLE and block on the LLE while a thread was unlinking the LLE. The writer would then release the lock and schedule the LLE for deferred free, allowing readers to continue and potentially schedule the LLE timer. By the point the timer fires, the structure is freed, typically resulting in a crash in the callout subsystem. Fix the problem by modifying the lookup path to check for the LLE_LINKED flag upon acquiring the LLE lock. If it's not set, the lookup fails. PR: 234296 Reviewed by: bz Tested by: sbruno, Victor <chernov_victor@list.ru>, Mike Andrews <mandrews@bit0.com> MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18906	2019-01-23 22:18:23 +00:00
Brooks Davis	c53d6b90ba	Make SIFTR work again after r342125 (D18443). Correct a logic error. Only disable when already enabled or enable when disabled. Submitted by: Richard Scheffenegger Reviewed by: Cheng Cui Obtained from: Cheng Cui MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D18885	2019-01-18 21:46:38 +00:00
Michael Tuexen	d9ba240c1c	Limit the user-controllable amount of memory the kernel allocates via IPPROTO_SCTP level socket options. This issue was found by running syzkaller. MFC after: 1 week	2019-01-16 11:33:47 +00:00
Stephen Hurd	500759395a	Fix window update issue when scaling disabled When the TCP window scale option is not used, and the window opens up enough in one soreceive, a window update will not be sent. For example, if recwin == 65535, so->so_rcv.sb_hiwat >= 262144, and so->so_rcv.sb_hiwat <= 524272, the window update will never be sent. This is because recwin and adv are clamped to TCP_MAXWIN << tp->rcv_scale, and so will never be >= so->so_rcv.sb_hiwat / 4 or <= so->so_rcv.sb_hiwat / 8. This patch ensures a window update is sent if the window opens by TCP_MAXWIN << tp->rcv_scale, which should only happen when the window size goes from zero to the max expressible. This issue looks like it was introduced in r306769 when recwin was clamped to TCP_MAXWIN << tp->rcv_scale. MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D18821	2019-01-15 17:40:19 +00:00
Michael Tuexen	10731c54b6	Fix getsockopt() for IP_OPTIONS/IP_RETOPTS. r336616 copies inp->inp_options using the m_dup() function. However, this function expects an mbuf packet header at the beginning, which is not true in this case. Therefore, use m_copym() instead of m_dup(). This issue was found by syzkaller. Reviewed by: mmacy@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18753	2019-01-09 06:36:57 +00:00
Gleb Smirnoff	a68cc38879	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit. This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
Mark Johnston	2f2ddd68a5	Support MSG_DONTWAIT in send(2). As it does for recv(2), MSG_DONTWAIT indicates that the call should not block, returning EAGAIN instead. Linux and OpenBSD both implement this, so the change makes porting easier, especially since we do not return EINVAL or so when unrecognized flags are specified. Submitted by: Greg V <greg@unrelenting.technology> Reviewed by: tuexen MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18728	2019-01-04 17:31:50 +00:00
Michael Tuexen	09423f72fd	Fix a regression in the TCP handling of received segments. When receiving TCP segments the stack protects itself by limiting the resources allocated for a TCP connections. This patch adds an exception to these limitations for the TCP segement which is the next expected in-sequence segment. Without this patch, TCP connections may stall and finally fail in some cases of packet loss. Reported by: jhb@ Reviewed by: jtl@, rrs@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18580	2018-12-20 16:05:30 +00:00
Hiren Panchasara	51e712f865	Revert r331567 CC Cubic: fix underflow for cubic_cwnd() This change is causing TCP connections using cubic to hang. Need to dig more to find exact cause and fix it. Reported by: tj at mrsk dot me, Matt Garber (via twitter) Discussed with: sbruno (previously), allanjude, cperciva MFC after: 3 days	2018-12-15 17:01:16 +00:00
Brooks Davis	855acb84ca	Fix bugs in plugable CC algorithm and siftr sysctls. Use the sysctl_handle_int() handler to write out the old value and read the new value into a temporary variable. Use the temporary variable for any checks of values rather than using the CAST_PTR_INT() macro on req->newptr. The prior usage read directly from userspace memory if the sysctl() was called correctly. This is unsafe and doesn't work at all on some architectures (at least i386.) In some cases, the code could also be tricked into reading from kernel memory and leaking limited information about the contents or crashing the system. This was true for CDG, newreno, and siftr on all platforms and true for i386 in all cases. The impact of this bug is largest in VIMAGE jails which have been configured to allow writing to these sysctls. Per discussion with the security officer, we will not be issuing an advisory for this issue as root access and a non-default config are required to be impacted. Reviewed by: markj, bz Discussed with: gordon (security officer) MFC after: 3 days Security: kernel information leak, local DoS (both require root) Differential Revision: https://reviews.freebsd.org/D18443	2018-12-15 15:06:22 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Mark Johnston	9d2877fc3d	Clamp the INPCB port hash tables to IPPORT_MAX + 1 chains. Memory beyond that limit was previously unused, wasting roughly 1MB per 8GB of RAM. Also retire INP_PCBLBGROUP_PORTHASH, which was identical to INP_PCBPORTHASH. Reviewed by: glebius MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D17803	2018-12-05 17:06:00 +00:00
Andrey V. Elsukov	d66f9c86fa	Add ability to request listing and deleting only for dynamic states. This can be useful, when net.inet.ip.fw.dyn_keep_states is enabled, but after rules reloading some state must be deleted. Added new flag '-D' for such purpose. Retire '-e' flag, since there can not be expired states in the meaning that this flag historically had. Also add "verbose" mode for listing of dynamic states, it can be enabled with '-v' flag and adds additional information to states list. This can be useful for debugging. Obtained from: Yandex LLC MFC after: 2 months Sponsored by: Yandex LLC	2018-12-04 16:12:43 +00:00
Michael Tuexen	c8b53ced95	Limit option_len for the TCP_CCALGOOPT. Limiting the length to 2048 bytes seems to be acceptable, since the values used right now are using 8 bytes. Reviewed by: glebius, bz, rrs MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18366	2018-11-30 10:50:07 +00:00
Mark Johnston	79db6fe7aa	Plug some networking sysctl leaks. Various network protocol sysctl handlers were not zero-filling their output buffers and thus would export uninitialized stack memory to userland. Fix a number of such handlers. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: tuexen MFC after: 3 days Security: kernel memory disclosure Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18301	2018-11-22 20:49:41 +00:00
Michael Tuexen	ad2be38941	A TCP stack is required to check SEG.ACK first, when processing a segment in the SYN-SENT state as stated in Section 3.9 of RFC 793, page 66. Ensure this is also done by the TCP RACK stack. Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18034	2018-11-22 20:05:57 +00:00
Michael Tuexen	fef56019e9	Ensure that the TCP RACK stack honours the setting of the net.inet.tcp.drop_synfin sysctl-variable. Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18033	2018-11-22 20:02:39 +00:00
Michael Tuexen	7e729f0787	Ensure that the default RTT stack can make an RTT measurement if the TCP connection was initiated using the RACK stack, but the peer does not support the TCP RACK extension. This ensures that the TCP behaviour on the wire is the same if the TCP connection is initated using the RACK stack or the default stack. Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18032	2018-11-22 19:56:52 +00:00
Michael Tuexen	794107181a	Ensure that TCP RST-segments announce consistently a receiver window of zero. This was already done when sending them via tcp_respond(). Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D17949	2018-11-22 19:49:52 +00:00
Michael Tuexen	3bea9a2664	Improve two KASSERTs in the TCP RACK stack. There are two locations where an always true comparison was made in a KASSERT. Replace this by an appropriate check and use a consistent panic message. Also use this code when checking a similar condition. PR: 229664 Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18021	2018-11-21 18:19:15 +00:00
Andrey V. Elsukov	5786c6b9f9	Make multiline APPLY_MASK() macro to be function-like. Reported by: cem MFC after: 1 week	2018-11-20 18:38:28 +00:00
Bjoern A. Zeeb	945aad9c62	Improve the comment for arpresolve_full() in if_ether.c. No functional changes. MFC after: 6 weeks	2018-11-17 16:13:09 +00:00
Bjoern A. Zeeb	90d99b6587	Retire arpresolve_addr(), which is not used anywhere, from if_ether.c.	2018-11-17 16:08:36 +00:00
Jonathan T. Looney	2157f3c36a	Add some additional length checks to the IPv4 fragmentation code. Specifically, block 0-length fragments, even when the MF bit is clear. Also, ensure that every fragment with the MF bit clear ends at the same offset and that no subsequently-received fragments exceed that offset. Reviewed by: glebius, markj MFC after: 3 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D17922	2018-11-16 18:32:48 +00:00
Mark Johnston	86af1d0241	Ensure that IP fragments do not extend beyond IP_MAXPACKET. Such fragments are obviously invalid, and when processed may end up violating the sort order (by offset) of fragments of a given packet. This doesn't appear to be exploitable, however. Reviewed by: emaste Discussed with: jtl MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17914	2018-11-10 03:00:36 +00:00
Ed Maste	2bfaf585ca	Avoid buffer underwrite in icmp_error icmp_error allocates either an mbuf (with pkthdr) or a cluster depending on the size of data to be quoted in the ICMP reply, but the calculation failed to account for the additional padding that m_align may apply. Include the ip header in the size passed to m_align. On 64-bit archs this will have the net effect of moving everything 4 bytes later in the mbuf or cluster. This will result in slightly pessimal alignment for the ICMP data copy. Also add an assertion that we do not move m_data before the beginning of the mbuf or cluster. Reported by: A reddit user Reviewed by: bz, jtl MFC after: 3 days Security: CVE-2018-17156 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17909	2018-11-08 20:17:36 +00:00
Michael Tuexen	8553b984a5	Don't use a function when neither INET nor INET6 are defined. This is a valid case for the userland stack, where this fixes two set-but-not-used warnings in this case. Thanks to Christian Wright for reporting the issue.	2018-11-06 12:55:03 +00:00
Jonathan T. Looney	54e675342b	m_pulldown() may reallocate n. Update the oip pointer after the m_pulldown() call. MFC after: 2 weeks Sponsored by: Netflix	2018-11-02 19:14:15 +00:00
Bjoern A. Zeeb	e2c532f156	carpstats are the last virtualised variable in the file and end up at the end of the vnet_set. The generated code uses an absolute relocation at one byte beyond the end of the carpstats array. This means the relocation for the vnet does not happen for carpstats initialisation and as a result the kernel panics on module load. This problem has only been observed with carp and only on i386. We considered various possible solutions including using linker scripts to add padding to all kernel modules for pcpu and vnet sections. While the symbols (by chance) stay in the order of appearance in the file adding an unused non-file-local variable at the end of the file will extend the size of set_vnet and hence make the absolute relocation for carpstats work (think of this as a single-module set_vnet padding). This is a (tmporary) hack. It is the least intrusive one as we need a timely solution for the upcoming release. We will revisit the problem in HEAD. For a lot more information and the possible alternate solutions please see the PR and the references therein. PR: 230857 MFC after: 3 days	2018-11-01 17:26:18 +00:00
Mark Johnston	d9ff5789be	Remove redundant checks for a NULL lbgroup table. No functional change intended. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17108	2018-11-01 15:52:49 +00:00
Mark Johnston	79ee680b65	Improve style in in_pcbinslbgrouphash() and related subroutines. No functional change intended. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17107	2018-11-01 15:51:49 +00:00
Michael Tuexen	6999f6975c	Remove debug code which slipped in accidently. MFC after: 4 weeks X-MFC with: r339989 Sponsored by: Netflix, Inc.	2018-11-01 11:41:40 +00:00
Michael Tuexen	099ab39f44	Improve a comment to refer to the actual sections in the TCP specification for the comparisons made. Thanks to lstewart@ for the suggestion. MFC after: 4 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D17595	2018-11-01 11:35:28 +00:00
Bjoern A. Zeeb	201100c58b	Initial implementation of draft-ietf-6man-ipv6only-flag. This change defines the RA "6" (IPv6-Only) flag which routers may advertise, kernel logic to check if all routers on a link have the flag set and accordingly update a per-interface flag. If all routers agree that it is an IPv6-only link, ether_output_frame(), based on the interface flag, will filter out all ETHERTYPE_IP/ARP frames, drop them, and return EAFNOSUPPORT to upper layers. The change also updates ndp to show the "6" flag, ifconfig to display the IPV6_ONLY nd6 flag if set, and rtadvd to allow announcing the flag. Further changes to tcpdump (contrib code) are availble and will be upstreamed. Tested the code (slightly earlier version) with 2 FreeBSD IPv6 routers, a FreeBSD laptop on ethernet as well as wifi, and with Win10 and OSX clients (which did not fall over with the "6" flag set but not understood). We may also want to (a) implement and RX filter, and (b) over time enahnce user space to, say, stop dhclient from running when the interface flag is set. Also we might want to start IPv6 before IPv4 in the future. All the code is hidden under the EXPERIMENTAL option and not compiled by default as the draft is a work-in-progress and we cannot rely on the fact that IANA will assign the bits as requested by the draft and hence they may change. Dear 6man, you have running code. Discussed with: Bob Hinden, Brian E Carpenter	2018-10-30 20:08:48 +00:00
Mark Johnston	da7d7778b0	Expose some netdump configuration parameters through sysctl. Reviewed by: cem MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17755	2018-10-29 21:16:26 +00:00
Eugene Grosbein	1a5995cc88	Prevent ip_input() from panicing due to unprotected access to INADDR_HASH. PR: 220078 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12457 Tested-by: Cassiano Peixoto and others	2018-10-27 04:59:35 +00:00
Eugene Grosbein	4f1e3122ac	Prevent multicast code from panicing due to unprotected access to INADDR_HASH. PR: 220078 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12457 Tested-by: Cassiano Peixoto and others	2018-10-27 04:53:25 +00:00
Michael Tuexen	de00ad05e6	Add initial descriptions for SCTP related MIB variable. This work was mostly done by Marie-Helene Kvello-Aune. MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D3583	2018-10-26 21:04:17 +00:00
Andrey V. Elsukov	8796e291f8	Add the check that current VNET is ready and access to srchash is allowed. This change is similar to r339646. The callback that checks for appearing and disappearing of tunnel ingress address can be called during VNET teardown. To prevent access to already freed memory, add check to the callback and epoch_wait() call to be sure that callback has finished its work. MFC after: 20 days	2018-10-23 13:11:45 +00:00
John Baldwin	74e10fb613	A couple of style fixes in recent TCP changes. - Add a blank line before a block comment to match other block comments in the same function. - Sort the prototype for sbsndptr_adv and fix whitespace between return type and function name. Reviewed by: gallatin, bz Differential Revision: https://reviews.freebsd.org/D17474	2018-10-22 21:17:36 +00:00
Eugene Grosbein	410634efd1	New sysctl: net.inet.icmp.error_keeptags Currently, icmp_error() function copies FIB number from original packet into generated ICMP response but not mbuf_tags(9) chain. This prevents us from easily matching ICMP responses corresponding to tagged original packets by means of packet filter such as ipfw(8). For example, ICMP "time-exceeded in-transit" packets usually generated in response to traceroute probes lose tags attached to original packets. This change adds new sysctl net.inet.icmp.error_keeptags that defaults to 0 to avoid extra overhead when this feature not needed. Set net.inet.icmp.error_keeptags=1 to make icmp_error() copy mbuf_tags from original packet to generated ICMP response. PR: 215874 MFC after: 1 month	2018-10-21 21:29:19 +00:00
Andrey V. Elsukov	f252e3f2f2	Include <sys/eventhandler.h> to fix the build. MFC after: 1 month	2018-10-21 18:39:34 +00:00
Andrey V. Elsukov	19873f4780	Add handling for appearing/disappearing of ingress addresses to if_gre(4). * register handler for ingress address appearing/disappearing; * add new srcaddr hash table for fast softc lookup by srcaddr; * when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface, and set it otherwise; MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17214	2018-10-21 18:13:45 +00:00
Andrey V. Elsukov	009d82ee0f	Add handling for appearing/disappearing of ingress addresses to if_gif(4). * register handler for ingress address appearing/disappearing; * add new srcaddr hash table for fast softc lookup by srcaddr; * when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface, and set it otherwise; * remove the note about ingress address from BUGS section. MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17134	2018-10-21 18:06:15 +00:00
Andrey V. Elsukov	8251c68d5c	Add KPI that can be used by tunneling interfaces to handle IP addresses appearing and disappearing on the host system. Such handling is need, because tunneling interfaces must use addresses, that are configured on the host as ingress addresses for tunnels. Otherwise the system can send spoofed packets with source address, that belongs to foreign host. The KPI uses ifaddr_event_ext event to implement addresses tracking. Tunneling interfaces register event handlers and then they are notified by the kernel, when an address disappears or appears. ifaddr_event_compat() handler from if.c replaced by srcaddr_change_event() in the ip_encap.c MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17134	2018-10-21 17:55:26 +00:00
Andrey V. Elsukov	094d6f8d75	Add IPFW_RULE_JUSTOPTS flag, that is used by ipfw(8) to mark rule, that was added using "new rule format". And then, when the kernel returns rule with this flag, ipfw(8) can correctly show it. Reported by: lev MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17373	2018-10-21 15:10:59 +00:00
Andrey V. Elsukov	64d63b1e03	Add ifaddr_event_ext event. It is similar to ifaddr_event, but the handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL, and the pointer to ifaddr. Also ifaddr_event now is implemented using ifaddr_event_ext handler. MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17100	2018-10-21 15:02:06 +00:00
Michael Tuexen	93899d10b4	The handling of RST segments in the SYN-RCVD state exists in the code paths. Both are not consistent and the one on the syn cache code does not conform to the relevant specifications (Page 69 of RFC 793 and Section 4.2 of RFC 5961). This patch fixes this: * The sequence numbers checks are fixed as specified on page Page 69 RFC 793. * The sysctl variable net.inet.tcp.insecure_rst is now honoured and the behaviour as specified in Section 4.2 of RFC 5961. Approved by: re (gjb@) Reviewed by: bz@, glebius@, rrs@, Differential Revision: https://reviews.freebsd.org/D17595 Sponsored by: Netflix, Inc.	2018-10-18 19:21:18 +00:00
Jonathan T. Looney	ac75e35d85	In r338102, the TCP reassembly code was substantially restructured. Prior to this change, the code sometimes used a temporary stack variable to hold details of a TCP segment. r338102 stopped using the variable to hold segments, but did not actually remove the variable. Because the variable is no longer used, we can safely remove it. Approved by: re (gjb)	2018-10-16 14:41:09 +00:00
Bjoern A. Zeeb	4ba16a92c7	In udp_input() when walking the pcblist we can come across an inp marked FREED after the epoch(9) changes. Check once we hold the lock and skip the inp if it is the case. Contrary to IPv6 the locking of the inp is outside the multicast section and hence a single check seems to suffice. PR: 232192 Reviewed by: mmacy, markj Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17540	2018-10-12 22:51:45 +00:00
Bjoern A. Zeeb	3afdfcaf33	r217592 moved the check for imo in udp_input() into the conditional block but leaving the variable assignment outside the block, where it is no longer used. Move both the variable and the assignment one block further in. This should result in no functional changes. It will however make upcoming changes slightly easier to apply. Reviewed by: markj, jtl, tuexen Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17525	2018-10-12 11:30:46 +00:00
Jonathan T. Looney	13c6ba6d94	There are three places where we return from a function which entered an epoch section without exiting that epoch section. This is bad for two reasons: the epoch section won't exit, and we will leave the epoch tracker from the stack on the epoch list. Fix the epoch leak by making sure we exit epoch sections before returning. Reviewed by: ae, gallatin, mmacy Approved by: re (gjb, kib) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D17450	2018-10-09 13:26:06 +00:00
Michael Tuexen	3535cdc43e	Avoid truncating unrecognised parameters when reporting them. This resulted in sending malformed packets. Approved by: re (kib@) MFC after: 1 week	2018-10-07 15:13:47 +00:00
Michael Tuexen	3924dfa721	Ensure that the ips_localout counter is incremented for locally generated SCTP packets sent over IPv4. This make the behaviour consistent with IPv6. Reviewed by: ae@, bz@, jtl@ Approved by: re (kib@) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17406	2018-10-07 11:26:15 +00:00
Tom Jones	b6e870116f	Convert UDP length to host byte order When getting the number of bytes to checksum make sure to convert the UDP length to host byte order when the entire header is not in the first mbuf. Reviewed by: jtl, tuexen, ae Approved by: re (gjb), jtl (mentor) Differential Revision: https://reviews.freebsd.org/D17357	2018-10-05 12:51:30 +00:00
Ryan Stone	083a010c62	Hold a write lock across udp_notify() With the new route cache feature udp_notify() will modify the inp when it needs to invalidate the route cache. Ensure that we hold a write lock on the inp before calling the function to ensure that multiple threads don't race while trying to invalidate the cache (which previously lead to a page fault). Differential Revision: https://reviews.freebsd.org/D17246 Reviewed by: sbruno, bz, karels Sponsored by: Dell EMC Isilon Approved by: re (gjb)	2018-10-04 22:03:58 +00:00
Michael Tuexen	15a087e551	Mitigate providing a timing signal if the COOKIE or AUTH validation fails. Thanks to jmg@ for reporting the issue, which was discussed in https://admbugs.freebsd.org/show_bug.cgi?id=878 Approved by: re (TBD@) MFC after: 1 week	2018-10-01 14:05:31 +00:00
Michael Tuexen	9d2e3f14c4	After allocating chunks set the fields in a consistent way. This removes two assignments for the flags field being done twice and adds one, which was missing. Thanks to Felix Weinrank for reporting the issue he found by using fuzz testing of the userland stack. Approved by: re (kib@) MFC after: 1 week	2018-10-01 13:09:18 +00:00
Andrey V. Elsukov	384a5c3c28	Add INP_INFO_WUNLOCK_ASSERT() macro and use it instead of INP_INFO_UNLOCK_ASSERT() in TCP-related code. For encapsulated traffic it is possible, that the code is running in net_epoch_preempt section, and INP_INFO_UNLOCK_ASSERT() is very strict assertion for such case. PR: 231428 Reviewed by: mmacy, tuexen Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17335	2018-10-01 10:46:00 +00:00
Michael Tuexen	1b084a5e5e	Plug mbuf leak in the SCTP input path in an error case. Approved by: re (kib@) MFC after: 1 week CID: 749312	2018-09-30 21:54:02 +00:00
Michael Tuexen	66bcf0b333	Plug mbuf leaks in the SCTP output path in error cases. Approved by: re (kib@) MFC after: 1 week CID: 1395307	2018-09-30 21:31:33 +00:00
Michael Tuexen	8184648425	Fix the handling of ancillary data for SCTP socket. Implement sctp_process_cmsgs_for_init() and sctp_findassociation_cmsgs() similar to sctp_find_cmsg() to improve consistency and avoid the signed/unsigned issues in sctp_process_cmsgs_for_init() and sctp_findassociation_cmsgs(). Thanks to andrew@ for reporting the problem he found using syzcaller. Approved by: re (kib@) MFC after: 1 week	2018-09-30 16:21:31 +00:00
Michael Tuexen	ae0a9a8850	Increment the corresponding UDP stats counter (udps_opackets) when sending UDP encapsulated SCTP packets. This is consistent with the behaviour that when such packets are received, the corresponding UDP stats counter (udps_ipackets) is incremented. Thanks to Peter Lei for making me aware of this inconsistency. Approved by: re (kib@) MFC after: 1 week	2018-09-30 12:16:06 +00:00
Michael Tuexen	3552f16d82	Fix typo in comment. Reported by: @danfe Approved by: re (kib@) MFC after: 1 week X-MFC: r338941	2018-09-28 19:47:32 +00:00
Michael Tuexen	0277ec9c43	Whitespace changes and fixing a typo. No functional change. Approved by: re (kib@) MFC after: 1 week	2018-09-26 10:24:50 +00:00
Michael Tuexen	078a49a077	Remove the unused parameter 'locked' from the function syncache_respond(). There is no functional change. The parameter became unused in r313330, but wasn't removed. Approved by: re (kib@) MFC after: 1 month Sponsored by: Netflix, Inc.	2018-09-23 16:37:32 +00:00
Andrey V. Elsukov	76b09d1823	Add new field max_hdrsize to struct encap_config. It is currently unused and reserved for future use to keep KBI/KPI. Also add several spare pointers to be able extend structure if it will be needed. Approved by: re (gjb)	2018-09-20 19:45:27 +00:00
Michael Tuexen	ba4704a278	Remove unused code. Approved by: re (kib@) MFC after: 1 week	2018-09-18 10:53:07 +00:00
Michael Tuexen	a8a8a8a808	Fix TCP Fast Open for the TCP RACK stack. * Fix a bug where the SYN handling during established state was applied to a front state. * Move a check for retransmission after the timer handling. This was suppressing timer based retransmissions. * Fix an off-by one byte in the sequence number of retransmissions. * Apply fixes corresponding to https://svnweb.freebsd.org/changeset/base/336934 Reviewed by: rrs@ Approved by: re (kib@) MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16912	2018-09-12 10:27:58 +00:00
Mark Johnston	54af3d0dac	Fix synchronization of LB group access. Lookups are protected by an epoch section, so the LB group linkage must be a CK_LIST rather than a plain LIST. Furthermore, we were not deferring LB group frees, so in_pcbremlbgrouphash() could race with readers and cause a use-after-free. Reviewed by: sbruno, Johannes Lundberg <johalun0@gmail.com> Tested by: gallatin Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17031	2018-09-10 19:00:29 +00:00
Mark Johnston	a7026c7fd9	Use ratecheck(9) in in_pcbinslbgrouphash(). Reviewed by: bz, Johannes Lundberg <johalun0@gmail.com> Approved by: re (kib) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17065	2018-09-07 21:11:41 +00:00
Bjoern A. Zeeb	113c4fad55	The inp_lle field to struct inpcb, along with two "valid" flags for the rt and lle cache were added in r191129 (2009). To my best knowledge they have never been used and route caching has converted the inp_rt field from that commit to inp_route rendering this field and these flags obsolete. Convert the pointer into a spare pointer to not change the size of the structure anymore (and to have a spare pointer) and mark the two fields as unused. Reviewed by: markj, karels Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17062	2018-09-06 19:55:40 +00:00
Bjoern A. Zeeb	6d2b0c0166	Make tcp_hpts.c compile a LINT kernel with options RSS and PCBGROUPS added by adding the missing include files and changing a the type of cpuid which would otherwise cause a false comparison with NETISR_CPUID_NONE. Reviewed by: rrs Approved by: re (marius) Differential Revision: https://reviews.freebsd.org/D16891	2018-09-06 16:11:24 +00:00
Mark Johnston	49365eb433	Define sctp probes only when SCTP is configured. Otherwise the "depends_on provider" guard in sctp.d does not work as intended. Reported by: mjg Reviewed by: tuexen Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17057	2018-09-06 14:15:03 +00:00
Mark Johnston	8be02ee4da	Fix style bugs in in_pcblookup_lbgroup(). No functional change intended. Reviewed by: bz, Johannes Lundberg <johalun0@gmail.com> Approved by: re (rgrimes) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17030	2018-09-05 15:04:11 +00:00
Eugene Grosbein	d5d21ad932	Fix "ipfw fwd" to work for incoming IPv4 packets when ip_tryforward() chooses fast forwarding path, as it already works for IPv6 and for both of them on old slow path. PR: 231143 Reviewed by: ae Approved by: re (gjb) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17039	2018-09-05 13:59:36 +00:00
Mark Johnston	73ad0b6abf	Use the correct malloc type in in_pcblbgroup_free(). Approved by: re (kib) Sponsored by: The FreeBSD Foundation	2018-09-03 17:39:09 +00:00
Michael Tuexen	c6c0be2765	Fix a shadowed variable warning. Thanks to Peter Lei for reporting the issue. Approved by: re(kib@) MFH: 1 month Sponsored by: Netflix, Inc.	2018-08-24 10:50:19 +00:00
Michael Tuexen	90ab3571d8	Use arc4rand() instead of read_random() in the SCTP and TCP code. This was suggested by jmg@. Reviewed by: delphij@, jmg@, jtl@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16860	2018-08-23 19:10:45 +00:00
Michael Tuexen	4ba1513d1a	Don't use the explicit number 32 for the length of the secrets, use sizeof() or explicit #definesi instead. No functional change. This was suggested by jmg@. MFC after: 1 month XMFC with: r338053 Sponsored by: Netflix, Inc.	2018-08-23 06:03:59 +00:00
Michael Tuexen	1e88cc8b59	Add support for send, receive and state-change DTrace providers for SCTP. They are based on what is specified in the Solaris DTrace manual for Solaris 11.4. Reviewed by: 0mp, dteske, markj Relnotes: yes Differential Revision: https://reviews.freebsd.org/D16839	2018-08-22 21:23:32 +00:00
Matt Macy	d3878608d7	in_mcast: fix copy paste error when clearing flag	2018-08-22 04:09:55 +00:00
Michael Tuexen	5dff1c3845	Enabling the IPPROTO_IPV6 level socket option IPV6_USE_MIN_MTU on a TCP socket resulted in sending fragmented IPV6 packets. This is fixes by reducing the MSS to the appropriate value. In addtion, if the socket option is set before the handshake happens, announce this MSS to the peer. This is not stricly required, but done since TCP is conservative. PR: 173444 Reviewed by: bz@, rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16796	2018-08-21 14:12:30 +00:00
Michael Tuexen	7d4dcc36a8	Fix the inheritance of IPv6 level socket options on TCP sockets. This was broken for IPv6 listening socket, which are not IPV6_ONLY, and the accepted TCP connection was using IPv4. Reviewed by: bz@, rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16792	2018-08-21 14:07:36 +00:00
Michael Tuexen	6ef849e601	Whitespace change.	2018-08-21 13:37:06 +00:00
Michael Tuexen	1a0b021677	Refactor the SHUTDOWN_PENDING state handling. This is not a functional change but a preperation for the upcoming DTrace support. It is necessary to change the state in one logical operation, even if it involves clearing the sub state SHUTDOWN_PENDING. MFC after: 1 month	2018-08-21 13:25:32 +00:00
Bjoern A. Zeeb	10b070c166	GC inc_isipv6; it was added for "temp" compatibility in 2001, r86764 and does not seem to be used.	2018-08-20 20:06:36 +00:00
Randall Stewart	c28440db29	This change represents a substantial restructure of the way we reassembly inbound tcp segments. The old algorithm just blindly dropped in segments without coalescing. This meant that every segment could take up greater and greater room on the linked list of segments. This of course is now subject to a tighter limit (100) of segments which in a high BDP situation will cause us to be a lot more in-efficent as we drop segments beyond 100 entries that we receive. What this restructure does is cause the reassembly buffer to coalesce segments putting an emphasis on the two common cases (which avoid walking the list of segments) i.e. where we add to the back of the queue of segments and where we add to the front. We also have the reassembly buffer supporting a couple of debug options (black box logging as well as counters for code coverage). These are compiled out by default but can be added by uncommenting the defines. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16626	2018-08-20 12:43:18 +00:00
Michael Tuexen	8e02b4e00c	Don't expose the uptime via the TCP timestamps. The TCP client side or the TCP server side when not using SYN-cookies used the uptime as the TCP timestamp value. This patch uses in all cases an offset, which is the result of a keyed hash function taking the source and destination addresses and port numbers into account. The keyed hash function is the same a used for the initial TSN. Reviewed by: rrs@ MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16636	2018-08-19 14:56:10 +00:00
Navdeep Parhar	32d2623ae2	Add the ability to look up the 3b PCP of a VLAN interface. Use it in toe_l2_resolve to fill up the complete vtag and not just the vid. Reviewed by: kib@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D16752	2018-08-16 23:46:38 +00:00
Matt Macy	f9be038601	Fix in6_multi double free This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast address has to be removed from interface lists when the refcount goes to zero OR when the interface goes away - decouple list disconnect from refcount (v6 only for now) - ifmultiaddr can exist past being on interface lists - add flag for tracking whether or not it's enqueued - deferring freeing moptions makes the incpb cleanup code simpler but opens the door wider still to races - call inp_gcmoptions synchronously after dropping the the inpcb lock Fundamentally multicast needs a rewrite - but keep applying band-aids for now. Tested by: kp Reported by: novel, kp, lwhsu	2018-08-15 20:23:08 +00:00
Luiz Otavio O Souza	59b2022f94	Late style follow up on r312770. Submitted by: glebius X-MFC with: r312770 MFC after: 3 days	2018-08-15 15:44:30 +00:00
Jonathan T. Looney	a967df1c8f	Lower the default limits on the IPv4 reassembly queue. In particular, try to ensure that no bucket will have a reassembly queue larger than approximately 100 items. This limits the cost to find the correct reassembly queue when processing an incoming fragment. Due to the low limits on each bucket's length, increase the size of the hash table from 64 to 1024. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:30:46 +00:00
Jonathan T. Looney	ff790bbad0	Implement a limit on on the number of IPv4 reassembly queues per bucket. There is a hashing algorithm which should distribute IPv4 reassembly queues across the available buckets in a relatively even way. However, if there is a flaw in the hashing algorithm which allows a large number of IPv4 fragment reassembly queues to end up in a single bucket, a per- bucket limit could help mitigate the performance impact of this flaw. Implement such a limit, with a default of twice the maximum number of reassembly queues divided by the number of buckets. Recalculate the limit any time the maximum number of reassembly queues changes. However, allow the user to override the value using a sysctl (net.inet.ip.maxfragbucketsize). Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:23:05 +00:00
Jonathan T. Looney	7b9c5eb0a5	Add a global limit on the number of IPv4 fragments. The IP reassembly fragment limit is based on the number of mbuf clusters, which are a global resource. However, the limit is currently applied on a per-VNET basis. Given enough VNETs (or given sufficient customization of enough VNETs), it is possible that the sum of all the VNET limits will exceed the number of mbuf clusters available in the system. Given the fact that the fragment limit is intended (at least in part) to regulate access to a global resource, the fragment limit should be applied on a global basis. VNET-specific limits can be adjusted by modifying the net.inet.ip.maxfragpackets and net.inet.ip.maxfragsperpacket sysctls. To disable fragment reassembly globally, set net.inet.ip.maxfrags to 0. To disable fragment reassembly for a particular VNET, set net.inet.ip.maxfragpackets to 0. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:19:49 +00:00
Jonathan T. Looney	5d9bd45518	Improve hashing of IPv4 fragments. Currently, IPv4 fragments are hashed into buckets based on a 32-bit key which is calculated by (src_ip ^ ip_id) and combined with a random seed. However, because an attacker can control the values of src_ip and ip_id, it is possible to construct an attack which causes very deep chains to form in a given bucket. To ensure more uniform distribution (and lower predictability for an attacker), calculate the hash based on a key which includes all the fields we use to identify a reassembly queue (dst_ip, src_ip, ip_id, and the ip protocol) as well as a random seed. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:15:47 +00:00
Michael Tuexen	0f1346f7f4	Remove a set but not used warning showing up in usrsctp.	2018-08-14 08:32:33 +00:00
Andrey V. Elsukov	62484790e0	Restore ability to send ICMP and ICMPv6 redirects. It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled only when sending redirects for corresponding IP version is disabled via sysctl. Otherwise will be used default forwarding function. PR: 221137 Submitted by: mckay@ MFC after: 2 weeks	2018-08-14 07:54:14 +00:00
Michael Tuexen	839d21d62e	Use the stacb instead of the asoc in state macros. This is not a functional change. Just a preparation for upcoming dtrace state change provider support.	2018-08-13 13:58:45 +00:00
Michael Tuexen	61a2188021	Use consistently the macors to modify the assoc state. No functional change.	2018-08-13 11:56:21 +00:00
Michael Tuexen	812649d86f	Add explicit cast to silence a warning for the userland stack. Thanks to Felix Weinrank for providing the patch.	2018-08-12 14:05:15 +00:00
Devin Teske	ab9ed8a1bd	Fix misspellings of transmitter/transmitted Reviewed by: emaste, bcr Sponsored by: Smule, Inc. Differential Revision: https://reviews.freebsd.org/D16025	2018-08-10 20:37:32 +00:00
Andrey V. Elsukov	16bbf600d9	Remove unneeded ipsec-related includes. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D16637	2018-08-10 07:24:01 +00:00
Leandro Lupori	c8e2123b6a	[ppc] Fix kernel panic when using BOOTP_NFSROOT On PowerPC (and possibly other architectures), that doesn't use EARLY_AP_STARTUP, the config task queue may be used initialized. This was observed while trying to mount the root fs from NFS, as reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230168. This patch has 2 main changes: 1- Perform a basic initialization of qgroup_config, similar to what is done in taskqgroup_adjust, but simpler. This makes qgroup_config ready to be used during NFS root mount. 2- When EARLY_AP_STARTUP is not used, call inm_init() and in6m_init() right before SI_SUB_ROOT_CONF, because bootp needs to send multicast packages to request an IP. PR: Bug 230168 Reported by: sbruno Reviewed by: jhibbits, mmacy, sbruno Approved by: jhibbits Differential Revision: D16633	2018-08-09 14:04:51 +00:00
Randall Stewart	d18ea344e6	Fix a small bug in rack where it will end up sending the FIN twice. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16604	2018-08-08 13:36:49 +00:00
Jonathan T. Looney	95a914f631	Address concerns about CPU usage while doing TCP reassembly. Currently, the per-queue limit is a function of the receive buffer size and the MSS. In certain cases (such as connections with large receive buffers), the per-queue segment limit can be quite large. Because we process segments as a linked list, large queues may not perform acceptably. The better long-term solution is to make the queue more efficient. But, in the short-term, we can provide a way for a system administrator to set the maximum queue size. We set the default queue limit to 100. This is an effort to balance performance with a sane resource limit. Depending on their environment, goals, etc., an administrator may choose to modify this limit in either direction. Reviewed by: jhb Approved by: so Security: FreeBSD-SA-18:08.tcp Security: CVE-2018-6922	2018-08-06 17:36:57 +00:00
Randall Stewart	936b2b64ae	This fixes a bug in Rack where we were not properly using the correct value for Delayed Ack. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D16579	2018-08-06 09:22:07 +00:00

... 2 3 4 5 6 ...

6389 Commits