freebsd-nq

Author	SHA1	Message	Date
Gleb Smirnoff	e4c40a8a71	Quickly plug another regression from r353292. Again, multicast locking needs lots of work... Reported by: pho	2019-10-08 16:59:17 +00:00
Michael Tuexen	953b78bed9	Validate length before use it, not vice versa. r353060 should have contained this... This fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18070 MFC after: 3 days	2019-10-08 11:07:16 +00:00
Gleb Smirnoff	b8a6e03fac	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111	2019-10-07 22:40:05 +00:00
Michael Tuexen	746c7ae563	In r343587 a simple port filter as sysctl tunable was added to siftr. The new sysctl was not added to the siftr.4 man page at the time. This updates the man page, and removes one left over trailing whitespace. Submitted by: Richard Scheffenegger Reviewed by: bcr@ MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D21619	2019-10-07 20:35:04 +00:00
Randall Stewart	5b63b22075	Brad Davis identified a problem with the new LRO code, VLAN's no longer worked. The problem was that the defines used the same space as the VLAN id. This commit does three things. 1) Move the LRO used fields to the PH_per fields. This is safe since the entire PH_per is used for IP reassembly which LRO code will not hit. 2) Remove old unused pace fields that are not used in mbuf.h 3) The VLAN processing is not in the mbuf queueing code. Consequently if a VLAN submits to Rack or BBR we need to bypass the mbuf queueing for now until rack_bbr_common is updated to handle the VLAN properly. Reported by: Brad Davis	2019-10-06 22:29:02 +00:00
Michael Tuexen	63fb39ba7b	Plumb an mbuf leak in a code path that should not be taken. Also avoid that this path is taken by setting the tail pointer correctly. There is still bug related to handling unordered unfragmented messages which were delayed in deferred handling. This issue was found by OSS-Fuzz testing the usrsctp stack and reported in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17794 MFC after: 3 days	2019-10-06 08:47:10 +00:00
Michael Tuexen	2560cb1eb8	Fix a use after free bug when removing remote addresses. This bug was found by OSS-Fuzz and reported in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18004 MFC after: 3 days	2019-10-05 13:28:01 +00:00
Michael Tuexen	0941b9dc37	Plumb an mbuf leak found by Mark Wodrich from Google by fuzz testing the userland stack and reporting it in: https://github.com/sctplab/usrsctp/issues/396 MFC after: 3 days	2019-10-05 12:34:50 +00:00
Michael Tuexen	44f788d793	Fix the adding of padding to COOKIE-ECHO chunks. Thanks to Mark Wodrich who found this issue while fuzz testing the usrsctp stack and reported the issue in https://github.com/sctplab/usrsctp/issues/382 MFC after: 3 days	2019-10-05 09:46:11 +00:00
Michael Tuexen	aac50dab6d	When skipping the address parameter, take the padding into account. MFC after: 3 days	2019-10-03 20:47:57 +00:00
Michael Tuexen	5989470c37	Cleanup sctp_asconf_error_response() and ensure that the parameter is padded as required. This fixes the followig bug reported by OSS-Fuzz for the usersctp stack: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17790 MFC after: 3 days	2019-10-03 20:39:17 +00:00
Michael Tuexen	967e1a5333	Add missing input validation. This could result in reading from uninitialized memory. The issue was found by OSS-Fuzz for usrsctp and reported in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17780 MFC after: 3 days	2019-10-03 18:36:54 +00:00
Michael Tuexen	2974e263c3	Don't use stack memory which is not initialized. Thanks to Mark Wodrich for reporting this issue for the userland stack in https://github.com/sctplab/usrsctp/issues/380 This issue was also found for usrsctp by OSS-fuzz in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17778 MFC after: 3 days	2019-09-30 12:06:57 +00:00
Michael Tuexen	12a43d0d5d	RFC 7112 requires a host to put the complete IP header chain including the TCP header in the first IP packet. Enforce this in tcp_output(). In addition make sure that at least one byte payload fits in the TCP segement to allow making progress. Without this check, a kernel with INVARIANTS will panic. This issue was found by running an instance of syzkaller. Reviewed by: jtl@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21665	2019-09-29 10:45:13 +00:00
Michael Tuexen	71e85612a9	Replacing MD5 by SipHash improves the performance of the TCP time stamp initialisation, which is important when the host is dealing with a SYN flood. This affects the computation of the initial TCP sequence number for the client side. This has been discussed with secteam@. Reviewed by: gallatin@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21616	2019-09-28 13:13:23 +00:00
Michael Tuexen	79c2a2a07b	Ensure that the INP lock is released before leaving [gs]etsockopt() for RACK specific socket options. These issues were found by a syzkaller instance. Reviewed by: rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21825	2019-09-28 13:05:37 +00:00
Jonathan T. Looney	0b18fb0798	Add new functionality to switch to using cookies exclusively when we the syn cache overflows. Whether this is due to an attack or due to the system having more legitimate connections than the syn cache can hold, this situation can quickly impact performance. To make the system perform better during these periods, the code will now switch to exclusively using cookies until the syn cache stops overflowing. In order for this to occur, the system must be configured to use the syn cache with syn cookie fallback. If syn cookies are completely disabled, this change should have no functional impact. When the system is exclusively using syn cookies (either due to configuration or the overflow detection enabled by this change), the code will now skip acquiring a lock on the syn cache bucket. Additionally, the code will now skip lookups in several places (such as when the system receives a RST in response to a SYN\|ACK frame). Reviewed by: rrs, gallatin (previous version) Discussed with: tuexen Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21644	2019-09-26 15:18:57 +00:00
Jonathan T. Looney	0bee4d631a	Access the syncache secret directly from the V_tcp_syncache variable, rather than indirectly through the backpointer to the tcp_syncache structure stored in the hashtable bucket. This also allows us to remove the requirement in syncookie_generate() and syncookie_lookup() that the syncache hashtable bucket must be locked. Reviewed by: gallatin, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21644	2019-09-26 15:06:46 +00:00
Jonathan T. Looney	867e98f8ee	Remove the unused sch parameter to the syncache_respond() function. The use of this parameter was removed in r313330. This commit now removes passing this now-unused parameter. Reviewed by: gallatin, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21644	2019-09-26 15:02:34 +00:00
Randall Stewart	ac7bd23a7a	lets put (void) in a couple of functions to keep older platforms that are stuck with gcc happy (ppc). The changes are needed in both bbr and rack. Obtained from: Michael Tuexen (mtuexen@)	2019-09-24 20:36:43 +00:00
Randall Stewart	da99b33b17	don't call in_ratelmit detach when RATELIMIT is not compiled in the kernel.	2019-09-24 20:11:55 +00:00
Randall Stewart	2f1cc984db	Fix the ifdefs in tcp_ratelimit.h. They were reversed so that instead of functions only being inside the _KERNEL and the absence of RATELIMIT causing us to have NULL/error returning interfaces we ended up with non-kernel getting the error path. opps..	2019-09-24 20:04:31 +00:00
Randall Stewart	35c7bb3407	This commit adds BBR (Bottleneck Bandwidth and RTT) congestion control. This is a completely separate TCP stack (tcp_bbr.ko) that will be built only if you add the make options WITH_EXTRA_TCP_STACKS=1 and also include the option TCPHPTS. You can also include the RATELIMIT option if you have a NIC interface that supports hardware pacing, BBR understands how to use such a feature. Note that this commit also adds in a general purpose time-filter which allows you to have a min-filter or max-filter. A filter allows you to have a low (or high) value for some period of time and degrade slowly to another value has time passes. You can find out the details of BBR by looking at the original paper at: https://queue.acm.org/detail.cfm?id=3022184 or consult many other web resources you can find on the web referenced by "BBR congestion control". It should be noted that BBRv1 (which this is) does tend to unfairness in cases of small buffered paths, and it will usually get less bandwidth in the case of large BDP paths(when competing with new-reno or cubic flows). BBR is still an active research area and we do plan on implementing V2 of BBR to see if it is an improvement over V1. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D21582	2019-09-24 18:18:11 +00:00
Michael Tuexen	2b861c1538	Plumb a memory leak. Thnanks to Felix Weinrank for finding this issue using fuzz testing and reporting it for the userland stack: https://github.com/sctplab/usrsctp/issues/378 MFC after: 3 days	2019-09-24 13:15:24 +00:00
Michael Tuexen	1325a0de13	Don't hold the info lock when calling sctp_select_a_tag(). This avoids a double lock bug in the NAT colliding state processing of SCTP. Thanks to Felix Weinrank for finding and reporting this issue in https://github.com/sctplab/usrsctp/issues/374 He found this bug using fuzz testing. MFC after: 3 days	2019-09-22 11:11:01 +00:00
Michael Tuexen	44f2a3272e	Cleanup the RTO calculation and perform some consistency checks before computing the RTO. This should fix an overflow issue reported by Felix Weinrank in https://github.com/sctplab/usrsctp/issues/375 for the userland stack and found by running a fuzz tester. MFC after: 3 days	2019-09-22 10:40:15 +00:00
Michael Tuexen	e6b3bd22d8	Fix the handling of invalid parameters in ASCONF chunks. Thanks to Mark Wodrich from Google for reproting the issue in https://github.com/sctplab/usrsctp/issues/376 for the userland stack. MFC after: 3 days	2019-09-20 08:20:20 +00:00
Michael Tuexen	dd3121a895	When the RACK stack computes the space for user data in a TCP segment, it wasn't taking the IP level options into account. This patch fixes this. In addition, it also corrects a KASSERT and adds protection code to assure that the IP header chain and the TCP head fit in the first fragment as required by RFC 7112. Reviewed by: rrs@ MFC after: 3 days Sponsored by: Nertflix, Inc. Differential Revision: https://reviews.freebsd.org/D21666	2019-09-19 10:27:47 +00:00
Michael Tuexen	3c19311544	Only allow a SCTP-AUTH shared key to be updated by the application if it is not deactivated and not used. This avoids a use-after-free problem. Reported by: da_cheng_shao@yeah.net MFC after: 3 days	2019-09-17 09:46:42 +00:00
Michael Tuexen	5b66b7f11b	Don't write to memory outside of the allocated array for SACK blocks. Obtained from: rrs@ MFC after: 3 days Sponsored by: Netflix, Inc.	2019-09-16 08:18:05 +00:00
Michael Tuexen	52f6a8090e	When the IP layer calls back into the SCTP layer to perform the SCTP checksum computation, do not assume that the IP header chain and the SCTP common header are in contiguous memory although the SCTP lays out the mbuf chains that way. If there are IP-level options inserted by the IP layer, the constraint is not fulfilled anymore. This issues was found by running syzkaller. Thanks to markj@ who is running an instance which also provides kernel dumps. This allowed me to find this issue. MFC after: 3 days	2019-09-15 18:29:45 +00:00
Andrew Gallatin	d2e6258258	Avoid unneeded call to arc4random() in syncache_add() Don't call arc4random() unconditionally to initialize sc_iss, and then when syncookies are enabled, just overwrite it with the return value from from syncookie_generate(). Instead, only call arc4random() to initialize sc_iss when syncookies are not enabled. Note that on a system under a syn flood attack, arc4random() becomes quite expensive, and the chacha_poly crypto that it calls is one of the more expensive things happening on the system. Removing this unneeded arc4random() call reduces CPU from about 40% to about 35% in my test scenario (Broadwell Xeon, 6Mpps syn flood attack). Reviewed by: rrs, tuxen, bz Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21591	2019-09-11 18:48:26 +00:00
Randall Stewart	6f32ca1936	With the recent commit of ktls, we no longer have a sb_tls_flags, its just the sb_flags. Also the ratelimit code, now that the defintion is in sockbuf.h, does not need the ktls.h file (or its predecessor). Sponsored by: Netflix Inc	2019-09-11 15:41:36 +00:00
Michael Tuexen	6d261981ee	Only update SACK/DSACK lists when a non-empty segment was received. This fixes hitting a KASSERT with a valid packet exchange. Reviewed by: rrs@, Richard Scheffenegger MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21567	2019-09-09 16:07:47 +00:00
Conrad Meyer	373013b05f	Fix build after r351934 tcp_queue_pkts() is only used if TCPHPTS is defined (and it is not by default). Reported by: gcc	2019-09-06 18:33:39 +00:00
Randall Stewart	af9b9e0d9f	This adds in the missing counter initialization which I had forgotten to bring over.. opps. Differential Revision: https://reviews.freebsd.org/D21127	2019-09-06 18:29:48 +00:00
Warner Losh	82e837f803	Initialize if_hw_tsomaxsegsize to 0 to appease gcc's flow analysis as a fail-safe.	2019-09-06 18:25:42 +00:00
Randall Stewart	e57b2d0e51	This adds the final tweaks to LRO that will now allow me to add BBR. These changes make it so you can get an array of timestamps instead of a compressed ack/data segment. BBR uses this to aid with its delivery estimates. We also now (via Drew's suggestions) will not go to the expense of the tcb lookup if no stack registers to want this feature. If HPTS is not present the feature is not present either and you just get the compressed behavior. Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D21127	2019-09-06 14:25:41 +00:00
Michael Tuexen	ecc5b1d156	Fix the SACK block generation in the base TCP stack by bringing it in sync with the RACK stack. Reviewed by: rrs@ MFC after: 5 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21513	2019-09-04 04:38:31 +00:00
Michael Tuexen	191ae5bfa9	Fix two TCP RACK issues: * Convert the TCP delayed ACK timer from ms to ticks as required. This fixes the timer on platforms with hz != 1000. * Don't delay acknowledgements which report duplicate data using DSACKs. Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21512	2019-09-03 19:48:02 +00:00
Michael Tuexen	fe5dee73f7	This patch improves the DSACK handling to conform with RFC 2883. The lowest SACK block is used when multiple Blocks would be elegible as DSACK blocks ACK blocks get reordered - while maintaining the ordering of SACK blocks not relevant in the DSACK context is maintained. Reviewed by: rrs@, tuexen@ Obtained from: Richard Scheffenegger MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D21038	2019-09-02 19:04:02 +00:00
Michael Tuexen	b21097894e	Fix initialization of top_fsn. MFC after: 3 days	2019-09-01 10:39:16 +00:00
Michael Tuexen	6182677fb3	Improve the handling of state cookie parameters in INIT-ACK chunks. This fixes problem with parameters indicating a zero length or partial parameters after an unknown parameter indicating to stop processing. It also fixes a problem with state cookie parameters after unknown parametes indicating to stop porcessing. Thanks to Mark Wodrich from Google for finding two of these issues by fuzz testing the userland stack and reporting them in https://github.com/sctplab/usrsctp/issues/355 and https://github.com/sctplab/usrsctp/issues/352 MFC after: 3 days	2019-09-01 10:09:53 +00:00
Michael Tuexen	e30a1788fa	Improve function definition. MFC after: 3 days	2019-08-31 13:13:40 +00:00
Michael Tuexen	ec24a1b67c	Improve the handling of illegal sequence number combinations in received data chunks. Abort the association if there are data chunks with larger fragement sequence numbers than the fragement sequence of the last fragment. Thanks to Mark Wodrich from Google who found this issue by fuzz testing the userland stack and reporting this issue in https://github.com/sctplab/usrsctp/issues/355 MFC after: 3 days	2019-08-31 08:18:49 +00:00
John Baldwin	b2e60773c6	Add kernel-side support for in-kernel TLS. KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys. Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type. At present, rekeying is not supported though the in-kernel framework should support rekeying. KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session. KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS. Software TLS marks mbufs holding socket data as not ready via M_NOTREADY similar to sendfile(2) when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY is set when invoking pru_send() along with invoking ktls_enqueue(). A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption. (Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.) KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators. Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs. ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.) ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover, lacp with flowid enabled, or lacp with flowid enabled. Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes. In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS. Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS. KTLS is enabled via the KERN_TLS kernel option. This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload. Reviewed by: gallatin, hselasky, rrs Obtained from: Netflix Sponsored by: Netflix, Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21277	2019-08-27 00:01:56 +00:00
Michael Tuexen	15ddc5e43f	Don't hold the rs_mtx lock while calling malloc(). Reviewed by: rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21416	2019-08-26 16:23:47 +00:00
Randall Stewart	e13ad86c67	Fix an issue when TSO and Rack play together. Basically an retransmission of the initial SYN (with data) would cause us to strip the SYN and decrement/increase offset/len which then caused us a -1 offset and a panic. Reported by: Larry Rosenman (Michael Tuexen helped me debug this at the IETF)	2019-08-21 10:45:28 +00:00
Mark Johnston	ccdc986607	Fix netdump buffering after r348473. nd_buf is used to buffer headers (for both the kernel dump itself and for EKCD) before the final call to netdump_dumper(), which flushes residual data in nd_buf. As a result, a small portion of the residual data would be corrupted. This manifests when kernel dump compression is enabled since both zstd and zlib detect the corruption during decompression. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D21294	2019-08-19 16:29:51 +00:00
Andrey V. Elsukov	c4556b2fbe	Save ip_ttl value and restore it after checksum calculation. Since ipvoly is used for checksum calculation, part of original IP header is zeroed. This part includes ip_ttl field, that can be used later in IP_MINTTL socket option handling. PR: 239799 MFC after: 1 week	2019-08-13 12:47:53 +00:00

1 2 3 4 5 ...

6337 Commits