freebsd-skq

Author	SHA1	Message	Date
Bjoern A. Zeeb	32af08ecad	icmpv6: Fix mbuf change in mld After r354748 mld_input() can change the mbuf. The new pointer is never returned to icmp6_input() and when passed to icmp6_rip6_input() the mbuf may no longer valid leading to a panic. Pass a pointer to the mbuf to mld_input() so we can return an updated version in the non-error case. Add a test sending an MLD packet case which will trigger this bug. Pointyhat to: bz Reported by: gallatin, thj MFC After: 2 weeks X-MFC with: r354748 Sponsored by: Netflix	2019-11-18 21:59:47 +00:00
Bjoern A. Zeeb	808c432f62	nd6: retire defrouter_select(), use _fib() variant. Burn bridges and replace the last two calls of defrouter_select() with defrouter_select_fib(). That allows us to retire defrouter_select() and make it more clear in the calling code that it applies to all FIBs. Sponsored by: Netflix	2019-11-16 00:17:35 +00:00
Bjoern A. Zeeb	f592d0c377	nd6_rtr: Pull in the TAILQ_HEAD() as it is not needed outside nd6_rtr.c. Rename the TAILQ_HEAD() struct and the nd_defrouter variable from "nd_" to "nd6_" as they are not part of the RFC 3542 API which uses "ND_". Ideally I'd like to also rename the struct nd_defrouter {} to "nd6_*" but given that is used externally there is more work to do. No functional changes. MFC after: 3 weeks Sponsored by: Netflix	2019-11-16 00:02:36 +00:00
Bjoern A. Zeeb	63abacc204	netinet*: replace IP6_EXTHDR_GET() In a few places we have IP6_EXTHDR_GET() left in upper layer protocols. The IP6_EXTHDR_GET() macro might perform an m_pulldown() in case the data fragment is not contiguous. Convert these last remaining instances into m_pullup()s instead. In CARP, for example, we will a few lines later call m_pullup() anyway, the IPsec code coming from OpenBSD would otherwise have done the m_pullup() and are copying the data a bit later anyway, so pulling it in seems no better or worse. Note: this leaves very few m_pulldown() cases behind in the tree and we might want to consider removing them as well to make mbuf management easier again on a path to variable size mbufs, especially given m_pulldown() still has an issue not re-checking M_WRITEABLE(). Reviewed by: gallatin MFC after: 8 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22335	2019-11-15 21:44:17 +00:00
Bjoern A. Zeeb	a61b5cfbbf	netinet6: Remove PULLDOWN_TESTs. Remove the KAME introduced PULLDOWN_TESTs which did not even have a compile-time option in sys/conf to turn them on for a custom kernel build. They made the code a lot harder to read or more complicated in a few cases. Convert the IP6_EXTHDR_CHECK() calls into FreeBSD looking code. Rather than throwing the packet away if it would not fit the KAME mbuf expectations, convert the macros to m_pullup() calls. Do not do any extra manual conditional checks upfront as to whether the m_len would suffice (), simply let m_pullup() do its work (incl. an early check). Remove extra m_pullup() calls where earlier in the function or the only caller has already done the pullup. Discussed with: rwatson () Reviewed by: ae MFC after: 8 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22334	2019-11-15 21:40:40 +00:00
Bjoern A. Zeeb	e20b5bc485	nd6: simplify code We are taking the same actions in both cases of the branch inside the block. Simplify that code as the extra branch is not needed. MFC after: 3 weeks Sponsored by: Netflix	2019-11-15 13:45:38 +00:00
Bjoern A. Zeeb	b3a25d2993	nd6: remove unused structs and defines Remove a collections of unused structs and #defines to make it easier to understand what is actually in use. Sponsored by: Netflix	2019-11-13 14:28:07 +00:00
Bjoern A. Zeeb	d64df9a2b2	nd6: make nd6_alloc() file static nd6_alloc() is a function used only locally. Make it static and no longer export it. Keeps the KPI smaller. Sponsored by: Netflix	2019-11-13 13:53:17 +00:00
Bjoern A. Zeeb	ad675b3279	nd6 defrouter: consolidate nd_defrouter manipulations in nd6_rtr.c Move the nd_defrouter along with the sysctl handler from nd6.c to nd6_rtr.c and make the variable file static. Provide (temporary) new accessor functions for code manipulating nd_defrouter from nd6.c, and stop exporting functions no longer needed outside nd6_rtr.c. This also shuffles a few functions around in nd6_rtr.c without functional changes. Given all nd_defrouter logic is now in one place we can tidy up the code, locking and, and other open items. MFC after: 3 weeks X-MFC: keep exporting the functions Sponsored by: Netflix	2019-11-13 12:05:48 +00:00
Bjoern A. Zeeb	a8fe77d877	netinet: update mp to pass the proper value back In ip6_[direct_]input() we are looping over the extension headers to deal with the next header. We pass a pointer to an mbuf pointer to the handling functions. In certain cases the mbuf can be updated there and we need to pass the new one back. That missing in dest6_input() and route6_input(). In tcp6_input() we should also update it before we call tcp_input(). In addition to that mark the mbuf NULL all the times when we return that we are done with handling the packet and no next header should be checked (IPPROTO_DONE). This will eventually allow us to assert proper behaviour and catch the above kind of errors more easily, expecting *mp to always be set. This change is extracted from a larger patch and not an exhaustive change across the entire stack yet. PR: 240135 Reported by: prabhakar.lakhera gmail.com MFC after: 3 weeks Sponsored by: Netflix	2019-11-12 15:46:28 +00:00
Gleb Smirnoff	c17cd08f53	It is unclear why in6_pcblookup_local() would require write access to the PCB hash. The function doesn't modify the hash. It always asserted write lock historically, but with epoch conversion this fails in some special cases. Reviewed by: rwatson, bz Reported-by: syzbot+0b0488ca537e20cb2429@syzkaller.appspotmail.com	2019-11-11 06:28:25 +00:00
Bjoern A. Zeeb	c1131de6f1	frag6: properly handle atomic fragments according to RFCs. RFC 8200 says: "If the fragment is a whole datagram (that is, both the Fragment Offset field and the M flag are zero), then it does not need any further reassembly and should be processed as a fully reassembled packet (i.e., updating Next Header, adjust Payload Length, removing the Fragment header, etc.). .." That means we should remove the fragment header and make all the adjustments rather than just skipping over the fragment header. The difference should be noticeable in that a properly handled atomic fragment triggering an ICMPv6 message at an upper layer (e.g. dest unreach, unreachable port) will not include the fragment header. Update the test cases to also test for an unfragmentable part. That is needed so that the next header is properly updated (not just lengths). MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22155	2019-11-08 14:36:44 +00:00
Gleb Smirnoff	2435e507de	Now with epoch synchronized PCB lookup tables we can greatly simplify locking in udp_output() and udp6_output(). First, we select if we need read or write lock in PCB itself, we take the lock and enter network epoch. Then, we proceed for the rest of the function. In case if we need to modify PCB hash, we would take write lock on it for a short piece of code. We could exit the epoch before allocating an mbuf, but with this patch we are keeping it all the way into ip_output()/ip6_output(). Today this creates an epoch recursion, since ip_output() enters epoch itself. However, once all protocols are reviewed, ip_output() and ip6_output() would require epoch instead of entering it. Note: I'm not 100% sure that in udp6_output() the epoch is required. We don't do PCB hash lookup for a bound socket. And all branches of in6_select_src() don't require epoch, at least they lack assertions. Today inet6 address list is protected by rmlock, although it is CKLIST. AFAIU, the future plan is to protect it by network epoch. That would require epoch in in6_select_src(). Anyway, in future ip6_output() would require epoch, udp6_output() would need to enter it.	2019-11-07 21:01:36 +00:00
Gleb Smirnoff	d797164a86	Since r353292 on input path we are always in network epoch, when we lookup PCBs. Thus, do not enter epoch recursively in in_pcblookup_hash() and in6_pcblookup_hash(). Same applies to tcp_ctlinput() and tcp6_ctlinput(). This leaves several sysctl(9) handlers that return PCB credentials unprotected. Add epoch enter/exit to all of them. Differential Revision: https://reviews.freebsd.org/D22197	2019-11-07 20:49:56 +00:00
Gleb Smirnoff	cf377af6e2	Remove unnecessary recursive epoch enter via INP_INFO_RLOCK macro in icmp6_rip6_input(). It shall always run in the network epoch.	2019-11-07 20:43:12 +00:00
Gleb Smirnoff	f42347c39a	Remove unnecessary recursive epoch enter via INP_INFO_RLOCK macro in raw input functions for IPv4 and IPv6. They shall always run in the network epoch.	2019-11-07 20:40:44 +00:00
Gleb Smirnoff	8d28524a90	Remove unnecessary recursive epoch enter via INP_INFO_RLOCK macro in udp6_input(). It shall always run in the network epoch.	2019-11-07 20:38:53 +00:00
Bjoern A. Zeeb	503f4e4736	netinet*: variable cleanup In preparation for another change factor out various variable cleanups. These mainly include: (1) do not assign values to variables during declaration: this makes the code more readable and does allow for better grouping of variable declarations, (2) do not assign values to variables before need; e.g., if a variable is only used in the 2nd half of a function and we have multiple return paths before that, then do not set it before it is needed, and (3) try to avoid assigning the same value multiple times. MFC after: 3 weeks Sponsored by: Netflix	2019-11-07 18:29:51 +00:00
Gleb Smirnoff	751d8d156a	Widen network epoch coverage in nd6_prefix_onlink() as in6ifa_ifpforlinklocal() requires the epoch. Reported by: bz Reviewed by: bz	2019-11-07 17:00:20 +00:00
Gleb Smirnoff	d6dbfed81e	In nd6_timer() enter the network epoch earlier. The defrouter_del() may call into leaf functions that require epoch. Since the function is already run in non-sleepable context, it should be safe to cover it whole with epoch. Reported by: syzcaller	2019-11-04 17:35:37 +00:00
Bjoern A. Zeeb	6e6b5143f5	Properly set VNET when nuking recvif from fragment queues. In theory the eventhandler invoke should be in the same VNET as the the current interface. We however cannot guarantee that for all cases in the future. So before checking if the fragmentation handling for this VNET is active, switch the VNET to the VNET of the interface to always get the one we want. Reviewed by: hselasky MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22153	2019-10-25 18:54:06 +00:00
Bjoern A. Zeeb	702828f643	frag6: do not leak counter in error cases When allocating the IPv6 fragement packet queue entry we do checks against counters and if we pass we increment one of the counters to claim the spot. Right after that we have two cases (malloc and MAC) which can both fail in which case we free the entry but never released our claim on the counter. In theory this can lead to not accepting new fragments after a long time, especially if it would be MAC "refusing" them. Rather than immediately subtracting the value in the error case, only increment it after these two cases so we can no longer leak it. MFC after: 3 weeks Sponsored by: Netflix	2019-10-25 16:29:09 +00:00
Bjoern A. Zeeb	619456bb59	frag6: prevent overwriting initial fragoff=0 packet meta-data. When we receive the packet with the first fragmented part (fragoff=0) we remember the length of the unfragmentable part and the next header (and should probably also remember ECN) as meta-data on the reassembly queue. Someone replying this packet so far could change these 2 (3) values. While changing the next header seems more severe, for a full size fragmented UDP packet, for example, adding an extension header to the unfragmentable part would go unnoticed (as the framented part would be considered an exact duplicate) but make reassembly fail. So do not allow updating the meta-data after we have seen the first fragmented part anymore. The frag6_20 test case is added which failed before triggering an ICMPv6 "param prob" due to the check for each queued fragment for a max-size violation if a fragoff=0 packet was received. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 22:07:45 +00:00
Bjoern A. Zeeb	cd188da20f	frag6: handling of overlapping fragments to conform to RFC 8200 While the comment was updated in r350746, the code was not. RFC8200 says that unless fragment overlaps are exact (same fragment twice) not only the current fragment but the entire reassembly queue for this packet must be silently discarded, which we now do if fragment offset and fragment length do not match. Obtained from: jtl MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16850	2019-10-24 20:22:52 +00:00
Michael Tuexen	4a91aa8fc9	Ensure that the flags indicating IPv4/IPv6 are not changed by failing bind() calls. This would lead to inconsistent state resulting in a panic. A fix for stable/11 was committed in https://svnweb.freebsd.org/base?view=revision&revision=338986 An accelerated MFC is planned as discussed with emaste@. Reported by: syzbot+2609a378d89264ff5a42@syzkaller.appspotmail.com Obtained from: jtl@ MFC after: 1 day Sponsored by: Netflix, Inc.	2019-10-24 20:05:10 +00:00
Bjoern A. Zeeb	53707abd41	frag6: export another counter read-only by sysctl Similar to the system global counter also export the per-VNET counter "frag6_nfragpackets" detailing the current number of fragment packets in this VNET's reassembly queues. The read-only counter is helpful for in-VNET statistical monitoring and for test-cases. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 20:00:37 +00:00
Bjoern A. Zeeb	dda02192f9	frag6: fix counter leak in error case and optimise code In case the first fragmented part (off=0) arrives we check for the maximum packet size for each fragmented part we already queued with the addition of the unfragmentable part from the first one. For one we do not have to enter the loop at all if this is the first fragmented part to arrive, and we can skip the check. Should we encounter an error case we send an ICMPv6 message for any fragment exceeding the maximum length limit. While dequeueing the original packet and freeing it, statistics were not updated and leaked both the reassembly queue count for the fragment and the global fragment count. Found by code inspection and confirmed by tightening test cases checking more statistical and system counters. While here properly wrap a line. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 19:57:18 +00:00
Bjoern A. Zeeb	e5fffe9a69	frag6.c: do not leak packet queue entry in error case When we are checking for the maximum reassembled packet size of the fragmentable part and run into the error case (packet too big), we are leaking the packet queue enntry if this was a first fragment to arrive. Properly cleanup, removing the queue entry from the bucket, decrementing counters, and freeing the memory. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 19:47:32 +00:00
Bjoern A. Zeeb	30809ba9e3	frag6: leave a note about upper layer header checks TBD Per sepcification the upper layer header needs to be within the first fragment. The check was not done so far and there is an open review for related work, so just leave a note as to where to put it. Move the extraction of frag offset up to this as it is needed to determine whether this is a first fragment or not. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 12:16:15 +00:00
Bjoern A. Zeeb	7715d794ef	frag6: check global limits before hash and lock Check whether we are accepting more fragments (based on global limits) before doing expensive operations of calculating the hash and taking the bucket lock. This slightly increases a "race" between check time and incrementing counters (which is already there) possibly allowing a few more fragments than the maximum limits. However, when under attack, we rather save this CPU time for other packets/work. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 11:58:24 +00:00
Bjoern A. Zeeb	efdfee93c0	frag6: small improvements Rather than walking the mbuf chain manually use m_last() which doing exactly that for us. Defer initializing srcifp for longer as there are multiple exit paths out of the function which do not need it set. Initialize before taking the lock though. Rename the mtx lock to match the type better. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 08:15:40 +00:00
Bjoern A. Zeeb	da89a0fe94	frag6: remove IP6_REASS_MBUF macro The IP6_REASS_MBUF() macro did some pointer gynmastics to end up with the same type as it gets in [(cast *)&]. Spelling it out instead saves all this and makes the code more readable and less obfuscated directly using the structure field. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 07:53:10 +00:00
Bjoern A. Zeeb	f1664f3258	frag6: add "big picture" Add some ASCII relation of how the bits plug together. The terminology difference of "fragmented packets" and "fragment packets" is subtle. While here clear up more whitespace and comments. No functional change. MFC after: 3 weeks Sponsored by: Netflix	2019-10-23 23:10:12 +00:00
Bjoern A. Zeeb	21f08a074d	frag6: replace KAME hand-rolled queues with queue(9) TAILQs Remove the KAME custom circular queue for fragments and fragmented packets and replace them with a standard TAILQ. This make the code a lot more understandable and maintainable and removes further hand-rolled code from the the tree using a standard interface instead. Hide the still public structures under #ifdef _KERNEL as there is no use for them in user space. The naming is a bit confusing now as struct ip6q and the ip6q[] buckets array are not the same anymore; sadly struct ip6q is also used by the MAC framework and we cannot rename it. Submitted by: jtl (initally) MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16847 (jtl's original)	2019-10-23 23:01:18 +00:00
Bjoern A. Zeeb	3c7165b35e	frag6: whitespace changes Remove trailing white space, add a blank line, and compress a comment. No functional changes. MFC after: 10 days Sponsored by: Netflix	2019-10-23 20:37:15 +00:00
Gleb Smirnoff	be0c32e2ff	Execute nd6_dad_timer() in the network epoch, since nd6_dad_duplicated() requires it. Make nd6_dad_starttimer() require network epoch. Two calls out of three happen from nd6_dad_timer(). Enter epoch in the remaining one.	2019-10-22 16:06:33 +00:00
Bjoern A. Zeeb	67a10c4644	frag6: fix vnet teardown leak When shutting down a VNET we did not cleanup the fragmentation hashes. This has multiple problems: (1) leak memory but also (2) leak on the global counters, which might eventually lead to a problem on a system starting and stopping a lot of vnets and dealing with a lot of IPv6 fragments that the counters/limits would be exhausted and processing would no longer take place. Unfortunately we do not have a useable variable to indicate when per-VNET initialization of frag6 has happened (or when destroy happened) so introduce a boolean to flag this. This is needed here as well as it was in r353635 for ip_reass.c in order to avoid tripping over the already destroyed locks if interfaces go away after the frag6 destroy. While splitting things up convert the TRY_LOCK to a LOCK operation in now frag6_drain_one(). The try-lock was derived from a manual hand-rolled implementation and carried forward all the time. We no longer can afford not to get the lock as that would mean we would continue to leak memory. Assert that all the buckets are empty before destroying to lock to ensure long-term stability of a clean shutdown. Reported by: hselasky Reviewed by: hselasky MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22054	2019-10-21 08:48:47 +00:00
Bjoern A. Zeeb	65456706c0	frag6: add read-only sysctl for nfrags. Add a read-only sysctl exporting the global number of fragments (base system and all vnets). This is helpful to (a) know how many fragments are currently being processed, (b) if there are possible leaks, (c) if vnet teardown is not working correctly, and lastly (d) it can be used as part of test-suits to ensure (a) to (c). MFC after: 3 weeks Sponsored by: Netflix	2019-10-21 08:36:15 +00:00
Hans Petter Selasky	a55383e720	Fix panic in network stack due to use after free when receiving partial fragmented packets before a network interface is detached. When sending IPv4 or IPv6 fragmented packets and a fragment is lost before the network device is freed, the mbuf making up the fragment will remain in the temporary hashed fragment list and cause a panic when it times out due to accessing a freed network interface structure. 1) Make sure the m_pkthdr.rcvif always points to a valid network interface. Else the rcvif field should be set to NULL. 2) Use the rcvif of the last received fragment as m_pkthdr.rcvif for the fully defragged packet, instead of the first received fragment. Panic backtrace for IPv6: panic() icmp6_reflect() # tries to access rcvif->if_afdata[AF_INET6]->xxx icmp6_error() frag6_freef() frag6_slowtimo() pfslowtimo() softclock_call_cc() softclock() ithread_loop() Reviewed by: bz Differential Revision: https://reviews.freebsd.org/D19622 MFC after: 1 week Sponsored by: Mellanox Technologies	2019-10-16 09:11:49 +00:00
Gleb Smirnoff	5f5ec65aaf	in6ifa_llaonifp() is never called from fast path, so do not require epoch being entered.	2019-10-14 15:33:53 +00:00
Michael Tuexen	583b625ba8	Remove line not needed. Submitted by: markj@ MFC after: 3 days	2019-10-13 09:35:03 +00:00
Gleb Smirnoff	ef2e580e56	Don't cover in6_ifattach() with network epoch, as it may call into network drivers ioctls, that may sleep. PR: 241223	2019-10-13 04:25:16 +00:00
Mark Johnston	49c5659e1c	Add a missing include of opt_sctp.h. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-10-12 22:58:33 +00:00
Gleb Smirnoff	1e4f4e56b9	ip6_output() has a complex set of gotos, and some can jump out of the epoch section towards return statement. Since entering epoch is cheap, it is easier to cover the whole function with epoch, rather than try to properly maintain its state.	2019-10-09 17:02:28 +00:00
Gleb Smirnoff	3af7f97c4e	Revert changes to rip6_bind() from r353292. This function is always called in syscall context, so it must enter epoch itself. This changeset originates from early version of the patch, and somehow slipped to the final version. Reported by: pho	2019-10-09 05:52:07 +00:00
Mark Johnston	cb49ec5431	Improve locking in the IPV6_V6ONLY socket option handler. Acquire the inp lock before checking whether the socket is already bound, and around updates to the inp_vflag field. MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21867	2019-10-07 23:35:23 +00:00
Gleb Smirnoff	b8a6e03fac	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111	2019-10-07 22:40:05 +00:00
Michael Tuexen	e7a541b0b9	When processing an incoming IPv6 packet over the loopback interface which contains Hop-by-Hop options, the mbuf chain is potentially changed in ip6_hopopts_input(), called by ip6_input_hbh(). This can happen, because of the the use of IP6_EXTHDR_CHECK, which might call m_pullup(). So provide the updated pointer back to the called of ip6_input_hbh() to avoid using a freed mbuf chain in`ip6_input()`. Reviewed by: markj@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21664	2019-09-19 10:22:29 +00:00
John Baldwin	b2e60773c6	Add kernel-side support for in-kernel TLS. KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys. Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type. At present, rekeying is not supported though the in-kernel framework should support rekeying. KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session. KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS. Software TLS marks mbufs holding socket data as not ready via M_NOTREADY similar to sendfile(2) when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY is set when invoking pru_send() along with invoking ktls_enqueue(). A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption. (Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.) KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators. Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs. ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.) ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover, lacp with flowid enabled, or lacp with flowid enabled. Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes. In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS. Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS. KTLS is enabled via the KERN_TLS kernel option. This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload. Reviewed by: gallatin, hselasky, rrs Obtained from: Netflix Sponsored by: Netflix, Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21277	2019-08-27 00:01:56 +00:00
Bjoern A. Zeeb	1540a98e36	frag6: move public structure into file local space. Move ip6asfrag and the accompanying IP6_REASS_MBUF macro from ip6_var.h into frag6.c as they are not used outside frag6.c. Sadly struct ip6q is all over the mac framework so we have to leave it public. This reduces the public KPI space. MFC after: 3 months X-MFC: possibly MFC the #define only to stable branches Sponsored by: Netflix	2019-08-08 10:59:54 +00:00

1 2 3 4 5 ...

1962 Commits