freebsd-skq

Author	SHA1	Message	Date
glebius	61ee0e91c4	Remove unnecessary recursive epoch enter via INP_INFO_RLOCK macro in raw input functions for IPv4 and IPv6. They shall always run in the network epoch.	2019-11-07 20:40:44 +00:00
bz	be19ea6cb3	netinet*: variable cleanup In preparation for another change factor out various variable cleanups. These mainly include: (1) do not assign values to variables during declaration: this makes the code more readable and does allow for better grouping of variable declarations, (2) do not assign values to variables before need; e.g., if a variable is only used in the 2nd half of a function and we have multiple return paths before that, then do not set it before it is needed, and (3) try to avoid assigning the same value multiple times. MFC after: 3 weeks Sponsored by: Netflix	2019-11-07 18:29:51 +00:00
glebius	a09b9c105d	TCP timers are executed in callout context, so they need to enter network epoch to look into PCB lists. Mechanically convert INP_INFO_RLOCK() to NET_EPOCH_ENTER(). No functional change here.	2019-11-07 00:27:23 +00:00
glebius	d85e3111e4	Mechanically convert INP_INFO_RLOCK() to NET_EPOCH_ENTER() in TCP functions that are executed in syscall context. No functional change here.	2019-11-07 00:10:14 +00:00
glebius	62dc620e39	Mechanically convert INP_INFO_RLOCK() to NET_EPOCH_ENTER(). Remove few outdated comments and extraneous assertions. No functional change here.	2019-11-07 00:08:34 +00:00
bz	d7908b9933	Properly set VNET when nuking recvif from fragment queues. In theory the eventhandler invoke should be in the same VNET as the the current interface. We however cannot guarantee that for all cases in the future. So before checking if the fragmentation handling for this VNET is active, switch the VNET to the VNET of the interface to always get the one we want. Reviewed by: hselasky MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22153	2019-10-25 18:54:06 +00:00
tuexen	56626fc5ad	Ensure that the flags indicating IPv4/IPv6 are not changed by failing bind() calls. This would lead to inconsistent state resulting in a panic. A fix for stable/11 was committed in https://svnweb.freebsd.org/base?view=revision&revision=338986 An accelerated MFC is planned as discussed with emaste@. Reported by: syzbot+2609a378d89264ff5a42@syzkaller.appspotmail.com Obtained from: jtl@ MFC after: 1 day Sponsored by: Netflix, Inc.	2019-10-24 20:05:10 +00:00
tuexen	ddaae075ce	Store a handle for the event handler. This will be used when unloading the SCTP as a module. Obtained from: markj@	2019-10-24 09:22:23 +00:00
rrs	f651cbcfcc	Fix a small bug in bbr when running under a VM. Basically what happens is we are more delayed in the pacer calling in so we remove the stack from the pacer and recalculate how much time is left after all data has been acknowledged. However the comparision was backwards so we end up with a negative value in the last_pacing_delay time which causes us to add in a huge value to the next pacing time thus stalling the connection. Reported by: vm2.finance@gmail.com	2019-10-24 05:54:30 +00:00
tuexen	71224580cf	Fix compile issues when building a kernel without the VIMAGE option. Thanks to cem@ for discussing the issue which resulted in this patch. Reviewed by: cem@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D22089	2019-10-19 20:48:53 +00:00
cem	abc2745a10	debugnet(4): Add optional full-duplex mode It remains unattached to any client protocol. Netdump is unaffected (remaining half-duplex). The intended consumer is NetGDB. Submitted by: John Reimer <john.reimer AT emc.com> (earlier version) Discussed with: markj Differential Revision: https://reviews.freebsd.org/D21541	2019-10-17 20:25:15 +00:00
cem	f92f351606	debugnet(4): Infer non-server connection parameters Loosen requirements for connecting to debugnet-type servers. Only require a destination address; the rest can theoretically be inferred from the routing table. Relax corresponding constraints in netdump(4) and move ifp validation to debugnet connection time. Submitted by: John Reimer <john.reimer AT emc.com> (earlier version) Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D21482	2019-10-17 20:10:32 +00:00
cem	b0452a96d2	Add ddb(4) 'netdump' command to netdump a core without preconfiguration Add a 'X -s <server> -c <client> [-g <gateway>] -i <interface>' subroutine to the generic debugnet code. The imagined use is both netdump, shown here, and NetGDB (vaporware). It uses the ddb(4) lexer, with some new extensions, to parse out IPv4 addresses. 'Netdump' uses the generic debugnet routine to load a configuration and start a dump, without any netdump configuration prior to panic. Loosely derived from work by: John Reimer <john.reimer AT emc.com> Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D21460	2019-10-17 19:49:20 +00:00
glebius	a38665c7e7	Quickly fix up r353683: enter the epoch before calling into netisr_dispatch().	2019-10-17 17:02:50 +00:00
cem	f3a0ee41db	Split out a more generic debugnet(4) from netdump(4) Debugnet is a simplistic and specialized panic- or debug-time reliable datagram transport. It can drive a single connection at a time and is currently unidirectional (debug/panic machine transmit to remote server only). It is mostly a verbatim code lift from netdump(4). Netdump(4) remains the only consumer (until the rest of this patch series lands). The INET-specific logic has been extracted somewhat more thoroughly than previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as much as possible as is protocol-independent, remains in debugnet.c. The separation is not perfect and future improvement is welcome. Supporting INET6 is a long-term goal. Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to 'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the generic module would be more confusing than the refactoring. The only functional change here is the mbuf allocation / tracking. Instead of initiating solely on netdump-configured interface(s) at dumpon(8) configuration time, we watch for any debugnet-enabled NIC for link activation and query it for mbuf parameters at that time. If they exceed the existing high-water mark allocation, we re-allocate and track the new high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone. In a future patch in this series, this will allow initiating netdump from panic ddb(4) without pre-panic configuration. No other functional change intended. Reviewed by: markj (earlier version) Some discussion with: emaste, jhb Objection from: marius Differential Revision: https://reviews.freebsd.org/D21421	2019-10-17 16:23:03 +00:00
glebius	a6c3971a22	igmp_v1v2_queue_report() doesn't require epoch.	2019-10-17 16:02:34 +00:00
hselasky	94dc322ef6	Fix panic in network stack due to use after free when receiving partial fragmented packets before a network interface is detached. When sending IPv4 or IPv6 fragmented packets and a fragment is lost before the network device is freed, the mbuf making up the fragment will remain in the temporary hashed fragment list and cause a panic when it times out due to accessing a freed network interface structure. 1) Make sure the m_pkthdr.rcvif always points to a valid network interface. Else the rcvif field should be set to NULL. 2) Use the rcvif of the last received fragment as m_pkthdr.rcvif for the fully defragged packet, instead of the first received fragment. Panic backtrace for IPv6: panic() icmp6_reflect() # tries to access rcvif->if_afdata[AF_INET6]->xxx icmp6_error() frag6_freef() frag6_slowtimo() pfslowtimo() softclock_call_cc() softclock() ithread_loop() Reviewed by: bz Differential Revision: https://reviews.freebsd.org/D19622 MFC after: 1 week Sponsored by: Mellanox Technologies	2019-10-16 09:11:49 +00:00
tuexen	9c9657076e	Separate out SCTP related dtrace code. This is based on work done by markj@. Discussed with: markj@ MFC after: 3 days	2019-10-14 20:32:11 +00:00
rrs	2271f78dc9	if_hw_tsomaxsegsize needs to be initialized to zero, just like in bbr.c and tcp_output.c	2019-10-14 13:10:29 +00:00
tuexen	c376e82038	Rename sctp_dtrace_declare.h to sctp_kdtrace.h for consistentcy. MFC after: 3 days	2019-10-14 13:02:49 +00:00
glebius	c58d10ed59	Revert r353313. It is not needed with r353357 and is actually incorrect.	2019-10-14 04:10:00 +00:00
tuexen	e037f7f53e	Use an event handler to notify the SCTP about IP address changes instead of calling an SCTP specific function from the IP code. This is a requirement of supporting SCTP as a kernel loadable module. This patch was developed by markj@, I tweaked a bit the SCTP related code. Submitted by: markj@ MFC after: 3 days	2019-10-13 18:17:08 +00:00
markj	8cb00574b8	Move SCTP DTrace probe definitions into a .c file. Previously they were defined in a header which was included exactly once. Change this to follow the usual practice of putting definitions in C files. No functional change intended. Discussed with: tuexen MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-10-13 16:14:04 +00:00
tuexen	b9f81c4458	Ensure that local variables are reset to their initial value when dealing with error cases in a loop over all remote addresses. This issue was found and reported by OSS_Fuzz in: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18080 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18086 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18121 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18163 MFC after: 3 days	2019-10-12 17:57:03 +00:00
glebius	3c520efd9f	The divert(4) module must always be running in network epoch, thus call to if_addr_rlock() isn't needed.	2019-10-10 23:48:42 +00:00
imp	516b5533ca	Fix casting error from newer gcc Cast the pointers to (uintptr_t) before assigning to type uint64_t. This eliminates an error from gcc when we cast the pointer to a larger integer type.	2019-10-09 21:02:06 +00:00
hselasky	ff0a2f4dbc	Factor out TCP rateset destruction code. Ensure the epoch_call() function is not called more than one time before the callback has been executed, by always checking the RS_FUNERAL_SCHD flag before invoking epoch_call(). The "rs_number_dead" is balanced again after r353353. Discussed with: rrs@ Sponsored by: Mellanox Technologies	2019-10-09 17:08:40 +00:00
glebius	ac17d49973	Revert most of the multicast changes from r353292. This needs a more accurate approach.	2019-10-09 17:03:20 +00:00
hselasky	166e590d9c	Fix locking order reversal in the TCP ratelimit code by moving destructors outside the rsmtx mutex. Witness message: lock order reversal: (sleepable after non-sleepable) 1st tcp_rs_mtx (rsmtx) @ sys/netinet/tcp_ratelimit.c:242 2nd sysctl lock (sysctl lock) @ sys/kern/kern_sysctl.c:607 Backtrace: witness_debugger witness_checkorder _rm_wlock_debug sysctl_ctx_free rs_destroy epoch_call_task gtaskqueue_run_locked gtaskqueue_thread_loop Discussed with: rrs@ Sponsored by: Mellanox Technologies	2019-10-09 16:48:48 +00:00
jhb	02e5a4c53c	Add a TOE KTLS mode and a TOE hook for allocating TLS sessions. This adds the glue to allocate TLS sessions and invokes it from the TLS enable socket option handler. This also adds some counters for active TOE sessions. The TOE KTLS mode is returned by getsockopt(TLSTX_TLS_MODE) when TOE KTLS is in use on a socket, but cannot be set via setsockopt(). To simplify various checks, a TLS session now includes an explicit 'mode' member set to the value returned by TLSTX_TLS_MODE. Various places that used to check 'sw_encrypt' against NULL to determine software vs ifnet (NIC) TLS now check 'mode' instead. Reviewed by: np, gallatin Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21891	2019-10-08 21:34:06 +00:00
glebius	1c64917ffa	Quickly plug another regression from r353292. Again, multicast locking needs lots of work... Reported by: pho	2019-10-08 16:59:17 +00:00
tuexen	e8c2889eec	Validate length before use it, not vice versa. r353060 should have contained this... This fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18070 MFC after: 3 days	2019-10-08 11:07:16 +00:00
glebius	337378e04f	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111	2019-10-07 22:40:05 +00:00
tuexen	03060312fd	In r343587 a simple port filter as sysctl tunable was added to siftr. The new sysctl was not added to the siftr.4 man page at the time. This updates the man page, and removes one left over trailing whitespace. Submitted by: Richard Scheffenegger Reviewed by: bcr@ MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D21619	2019-10-07 20:35:04 +00:00
rrs	b1b5cae2c8	Brad Davis identified a problem with the new LRO code, VLAN's no longer worked. The problem was that the defines used the same space as the VLAN id. This commit does three things. 1) Move the LRO used fields to the PH_per fields. This is safe since the entire PH_per is used for IP reassembly which LRO code will not hit. 2) Remove old unused pace fields that are not used in mbuf.h 3) The VLAN processing is not in the mbuf queueing code. Consequently if a VLAN submits to Rack or BBR we need to bypass the mbuf queueing for now until rack_bbr_common is updated to handle the VLAN properly. Reported by: Brad Davis	2019-10-06 22:29:02 +00:00
tuexen	f1dc180209	Plumb an mbuf leak in a code path that should not be taken. Also avoid that this path is taken by setting the tail pointer correctly. There is still bug related to handling unordered unfragmented messages which were delayed in deferred handling. This issue was found by OSS-Fuzz testing the usrsctp stack and reported in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17794 MFC after: 3 days	2019-10-06 08:47:10 +00:00
tuexen	e878b27186	Fix a use after free bug when removing remote addresses. This bug was found by OSS-Fuzz and reported in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18004 MFC after: 3 days	2019-10-05 13:28:01 +00:00
tuexen	15678bfa06	Plumb an mbuf leak found by Mark Wodrich from Google by fuzz testing the userland stack and reporting it in: https://github.com/sctplab/usrsctp/issues/396 MFC after: 3 days	2019-10-05 12:34:50 +00:00
tuexen	353256a9ce	Fix the adding of padding to COOKIE-ECHO chunks. Thanks to Mark Wodrich who found this issue while fuzz testing the usrsctp stack and reported the issue in https://github.com/sctplab/usrsctp/issues/382 MFC after: 3 days	2019-10-05 09:46:11 +00:00
tuexen	f630826573	When skipping the address parameter, take the padding into account. MFC after: 3 days	2019-10-03 20:47:57 +00:00
tuexen	9d0210a9cf	Cleanup sctp_asconf_error_response() and ensure that the parameter is padded as required. This fixes the followig bug reported by OSS-Fuzz for the usersctp stack: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17790 MFC after: 3 days	2019-10-03 20:39:17 +00:00
tuexen	ff50a5791b	Add missing input validation. This could result in reading from uninitialized memory. The issue was found by OSS-Fuzz for usrsctp and reported in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17780 MFC after: 3 days	2019-10-03 18:36:54 +00:00
tuexen	5f8f65d380	Don't use stack memory which is not initialized. Thanks to Mark Wodrich for reporting this issue for the userland stack in https://github.com/sctplab/usrsctp/issues/380 This issue was also found for usrsctp by OSS-fuzz in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17778 MFC after: 3 days	2019-09-30 12:06:57 +00:00
tuexen	5f6d43232d	RFC 7112 requires a host to put the complete IP header chain including the TCP header in the first IP packet. Enforce this in tcp_output(). In addition make sure that at least one byte payload fits in the TCP segement to allow making progress. Without this check, a kernel with INVARIANTS will panic. This issue was found by running an instance of syzkaller. Reviewed by: jtl@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21665	2019-09-29 10:45:13 +00:00
tuexen	dc3d530b6c	Replacing MD5 by SipHash improves the performance of the TCP time stamp initialisation, which is important when the host is dealing with a SYN flood. This affects the computation of the initial TCP sequence number for the client side. This has been discussed with secteam@. Reviewed by: gallatin@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21616	2019-09-28 13:13:23 +00:00
tuexen	09c42850dd	Ensure that the INP lock is released before leaving [gs]etsockopt() for RACK specific socket options. These issues were found by a syzkaller instance. Reviewed by: rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21825	2019-09-28 13:05:37 +00:00
jtl	9b94fc3ad9	Add new functionality to switch to using cookies exclusively when we the syn cache overflows. Whether this is due to an attack or due to the system having more legitimate connections than the syn cache can hold, this situation can quickly impact performance. To make the system perform better during these periods, the code will now switch to exclusively using cookies until the syn cache stops overflowing. In order for this to occur, the system must be configured to use the syn cache with syn cookie fallback. If syn cookies are completely disabled, this change should have no functional impact. When the system is exclusively using syn cookies (either due to configuration or the overflow detection enabled by this change), the code will now skip acquiring a lock on the syn cache bucket. Additionally, the code will now skip lookups in several places (such as when the system receives a RST in response to a SYN\|ACK frame). Reviewed by: rrs, gallatin (previous version) Discussed with: tuexen Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21644	2019-09-26 15:18:57 +00:00
jtl	3497c0f2a2	Access the syncache secret directly from the V_tcp_syncache variable, rather than indirectly through the backpointer to the tcp_syncache structure stored in the hashtable bucket. This also allows us to remove the requirement in syncookie_generate() and syncookie_lookup() that the syncache hashtable bucket must be locked. Reviewed by: gallatin, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21644	2019-09-26 15:06:46 +00:00
jtl	7ca5024880	Remove the unused sch parameter to the syncache_respond() function. The use of this parameter was removed in r313330. This commit now removes passing this now-unused parameter. Reviewed by: gallatin, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D21644	2019-09-26 15:02:34 +00:00
rrs	804541b4e4	lets put (void) in a couple of functions to keep older platforms that are stuck with gcc happy (ppc). The changes are needed in both bbr and rack. Obtained from: Michael Tuexen (mtuexen@)	2019-09-24 20:36:43 +00:00

1 2 3 4 5 ...

6299 Commits