freebsd-skq

Author	SHA1	Message	Date
mmacy	7aeac9ef18	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
markj	8d72b7a607	Fix netdump configuration when VIMAGE is enabled. We need to set the current vnet before iterating over the global interface list. Because the dump device may only be set from the host, only proceed with configuration if the thread belongs to the default vnet. [1] Also fix a resource leak that occurs if the priv_check() in set_dumper() fails. Reported by: mmacy, sbruno [1] Reviewed by: sbruno X-MFC with: r333283 Differential Revision: https://reviews.freebsd.org/D15449	2018-05-17 04:08:57 +00:00
lstewart	c5f2dadecb	Plug a memory leak and potential NULL-pointer dereference introduced in r331214. Each TCP connection that uses the system default cc_newreno(4) congestion control algorithm module leaks a "struct newreno" (8 bytes of memory) at connection initialisation time. The NULL-pointer dereference is only germane when using the ABE feature, which is disabled by default. While at it: - Defer the allocation of memory until it is actually needed given that ABE is optional and disabled by default. - Document the ENOMEM errno in getsockopt(2)/setsockopt(2). - Document ENOMEM and ENOBUFS in tcp(4) as being synonymous given that they are used interchangeably throughout the code. - Fix a few other nits also accidentally omitted from the original patch. Reported by: Harsh Jain on freebsd-net@ Tested by: tjh@ Differential Revision: https://reviews.freebsd.org/D15358	2018-05-17 02:46:27 +00:00
brooks	f7144e3d61	Unwrap a line that no longer requires wrapping.	2018-05-15 20:14:38 +00:00
brooks	948a40e7de	Remove stray tabs from in_lltable_dump_entry().	2018-05-15 20:13:00 +00:00
shurd	d32470f6c3	Check that ifma_protospec != NULL in inm_lookup If ifma_protospec is NULL when inm_lookup() is called, there is a dereference in a NULL struct pointer. This ensures that struct is not NULL before comparing the address. Reported by: dumbbell Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15440	2018-05-15 16:54:41 +00:00
tuexen	2a6092ee58	sctp_get_mbuf_for_msg() should honor the allinone parameter. When it is not required that the buffer is not a chain, return a chain. This is based on a patch provided by Irene Ruengeler.	2018-05-14 15:16:51 +00:00
tuexen	e025e3497c	Ensure that the MTU's used are multiple of 4. The length of SCTP packets is always a multiple of 4. Therefore, ensure that the MTUs used are also a multiple of 4. Thanks to Irene Ruengeler for providing an earlier version of this patch. MFC after: 1 week	2018-05-14 13:50:17 +00:00
shurd	210352aeb1	Fix LORs in in6?_leave_group() r333175 updated the join_group functions, but not the leave_group ones. Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15393	2018-05-11 21:42:27 +00:00
imp	d8f0115e23	Use the full year, for real this time.	2018-05-09 20:26:37 +00:00
imp	e08b855ef7	Minor style nits Use full copyright year. Remove 'All Rights Reserved' from new file (rights holder OK'd) Minor #ifdef motion and #endif tagging Remove __FBSDID macro from comments Sponsored by: Netflix OK'd by: rrs@	2018-05-09 14:11:35 +00:00
tuexen	f92d8838e6	Fix two typos reported by N. J. Mann, which were introduced in https://svnweb.freebsd.org/changeset/base/333382 by me. MFC after: 3 days	2018-05-08 20:39:35 +00:00
tuexen	af0c2859d5	When reporting ERROR or ABORT chunks, don't use more data that is guaranteed to be contigous. Thanks to Felix Weinrank for finding and reporting this bug by fuzzing the usrsctp stack. MFC after: 3 days	2018-05-08 18:48:51 +00:00
mmacy	fe8ba29698	Fix spurious retransmit recovery on low latency networks TCP's smoothed RTT (SRTT) can be much larger than an actual observed RTT. This can be either because of hz restricting the calculable RTT to 10ms in VMs or 1ms using the default 1000hz or simply because SRTT recently incorporated a larger value. If an ACK arrives before the calculated badrxtwin (now + SRTT): tp->t_badrxtwin = ticks + (tp->t_srtt >> (TCP_RTT_SHIFT + 1)); We'll erroneously reset snd_una to snd_max. If multiple segments were dropped and this happens repeatedly the transmit rate will be limited to 1MSS per RTO until we've retransmitted all drops. Reported by: rstone Reviewed by: hiren, transport Approved by: sbruno MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8556	2018-05-08 02:22:34 +00:00
mav	5c9b27a2b9	Keep CARP state as INIT when net.inet.carp.allow=0. Currently when net.inet.carp.allow=0 CARP state remains as MASTER, which is not very useful (if there are other masters -- it can lead to split brain, if there are none -- it makes no sense). Having it as INIT makes it clear that carp packets are disabled. Submitted by: wg MFC after: 1 month Relnotes: yes Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D14477	2018-05-07 14:44:55 +00:00
mmacy	d3f138323c	r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl to sleep on commands to the NIC when updating multicast filters. More generally this permitted driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a a multicast update would still be queued for deletion when ifconfig deleted the interface thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses on the interface. Synchronously remove all external references to a multicast address before enqueueing for delete. Reported by: lwhsu Approved by: sbruno	2018-05-06 20:34:13 +00:00
tuexen	e313499b23	Ensure we are not dereferencing a NULL pointer. This was found by Coverity scanning the usrsctp stack (CID 203808). MFC after: 3 days	2018-05-06 14:19:50 +00:00
markj	071e927e22	Import the netdump client code. This is a component of a system which lets the kernel dump core to a remote host after a panic, rather than to a local storage device. The server component is available in the ports tree. netdump is particularly useful on diskless systems. The netdump(4) man page contains some details describing the protocol. Support for configuring netdump will be added to dumpon(8) in a future commit. To use netdump, the kernel must have been compiled with the NETDUMP option. The initial revision of netdump was written by Darrell Anderson and was integrated into Sandvine's OS, from which this version was derived. Reviewed by: bdrewery, cem (earlier versions), julian, sbruno MFC after: 1 month X-MFC note: use a spare field in struct ifnet Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15253	2018-05-06 00:38:29 +00:00
mmacy	d4a327c789	Currently in_pcbfree will unconditionally wunlock the pcbinfo lock to avoid a LOR on the multicast list lock in the freemoptions routines. As it turns out, tcp_usr_detach can acquire the tcbinfo lock readonly. Trying to wunlock the pcbinfo lock in that context has caused a number of reported crashes. This change unclutters in_pcbfree and moves the handling of wunlock vs runlock of pcbinfo to the freemoptions routine. Reported by: mjg@, bde@, o.hartmann at walstatt.org Approved by: sbruno	2018-05-05 22:40:40 +00:00
ae	141c20e3d9	Immediately propagate EACCES error code to application from tcp_output. In r309610 and r315514 the behavior of handling EACCES was changed, and tcp_output() now returns zero when EACCES happens. The reason of this change was a hesitation that applications that use TCP-MD5 will be affected by changes in project/ipsec. TCP-MD5 code returns EACCES when security assocition for given connection is not configured. But the same error code can return pfil(9), and this change has affected connections blocked by pfil(9). E.g. application doesn't return immediately when SYN segment is blocked, instead it waits when several tries will be failed. Actually, for TCP-MD5 application it doesn't matter will it get EACCES after first SYN, or after several tries. Security associtions must be configured before initiating TCP connection. I left the EACCES in the switch() to show that it has special handling. Reported by: Andreas Longwitz <longwitz at incore dot de> MFC after: 10 days	2018-05-04 09:28:12 +00:00
sbruno	8bb5c83353	cc_cubic: - Update cubic parameters to draft-ietf-tcpm-cubic-04 Submitted by: Matt Macy <mmacy@mattmacy.io> Reviewed by: lstewart Differential Revision: https://reviews.freebsd.org/D10556	2018-05-03 15:01:27 +00:00
tuexen	08084bf9c5	SImplify the call to tcp_drop(), since the handling of soft error is also done in tcp_drop(). No functional change. Sponsored by: Netflix, Inc.	2018-05-02 20:04:31 +00:00
shurd	7d4b8facc7	Separate list manipulation locking from state change in multicast Multicast incorrectly calls in to drivers with a mutex held causing drivers to have to go through all manner of contortions to use a non sleepable lock. Serialize multicast updates instead. Submitted by: mmacy <mmacy@mattmacy.io> Reviewed by: shurd, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14969	2018-05-02 19:36:29 +00:00
rrs	65a6d604ff	This change re-arranges the fields within the tcp-pcb so that they are more in order of cache line use as one passes through the tcp_input/output paths (non-errors most likely path). This helps speed up cache line optimization so that the tcp stack runs a bit more efficently. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D15136	2018-04-26 21:41:16 +00:00
sbruno	257e6e5563	Revert r332894 at the request of the submitter. Submitted by: Johannes Lundberg <johalun0_gmail.com> Sponsored by: Limelight Networks	2018-04-24 19:55:12 +00:00
sbruno	bbf7d4dd03	Load balance sockets with new SO_REUSEPORT_LB option This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple programs or threads to bind to the same port and incoming connections will be load balanced using a hash function. Most of the code was copied from a similar patch for DragonflyBSD. However, in DragonflyBSD, load balancing is a global on/off setting and can not be set per socket. This patch allows for simultaneous use of both the current SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system. Required changes to structures Globally change so_options from 16 to 32 bit value to allow for more options. Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets. Limitations As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or threads sharing the same socket). Submitted by: Johannes Lundberg <johanlun0@gmail.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11003	2018-04-23 19:51:00 +00:00
rrs	ff2546242d	These two modules need the tcp_hpts.h file for when the option is enabled (not sure how LINT/build-universe missed this) opps. Sponsored by: Netflix Inc	2018-04-19 15:03:48 +00:00
rrs	863f90dbfa	This commit brings in the TCP high precision timer system (tcp_hpts). It is the forerunner/foundational work of bringing in both Rack and BBR which use hpts for pacing out packets. The feature is optional and requires the TCPHPTS option to be enabled before the feature will be active. TCP modules that use it must assure that the base component is compile in the kernel in which they are loaded. MFC after: Never Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D15020	2018-04-19 13:37:59 +00:00
brooks	26c165ead9	Remove support for the Arcnet protocol. While Arcnet has some continued deployment in industrial controls, the lack of drivers for any of the PCI, USB, or PCIe NICs on the market suggests such users aren't running FreeBSD. Evidence in the PR database suggests that the cm(4) driver (our sole Arcnet NIC) was broken in 5.0 and has not worked since. PR: 182297 Reviewed by: jhibbits, vangyzen Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15057	2018-04-13 21:18:04 +00:00
brooks	6dcf9514b3	Remove support for FDDI networks. Defines in net/if_media.h remain in case code copied from ifconfig is in use elsewere (supporting non-existant media type is harmless). Reviewed by: kib, jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15017	2018-04-11 17:28:24 +00:00
jtl	ba15ec2e62	Add missing header change from r332381. Sponsored by: Netflix, Inc. Pointy hat: jtl	2018-04-10 17:00:37 +00:00
jtl	32f06b2ce4	Modify the net.inet.tcp.function_ids sysctl introduced in r331347. Export additional information which may be helpful to userspace consumers and rename the sysctl to net.inet.tcp.function_info. Sponsored by: Netflix, Inc.	2018-04-10 16:59:36 +00:00
jtl	61b590a78c	Move the TCP Blackbox Recorder probe in tcp_output.c to be with the other tracing/debugging code. Sponsored by: Netflix, Inc.	2018-04-10 15:54:29 +00:00
jtl	fbca93e2ac	Clean up some debugging code left in tcp_log_buf.c from r331347. Sponsored by: Netflix, Inc.	2018-04-10 15:51:37 +00:00
tuexen	c3e1813aee	Fix a logical inversion bug. Thanks to Irene Ruengeler for finding and reporting this bug. MFC after: 3 days	2018-04-08 12:08:20 +00:00
tuexen	41e5c2452e	Small cleanup, no functional change. MFC after: 3 days	2018-04-08 11:50:06 +00:00
tuexen	d43cfbd71e	Fix a signed/unsigned warning showing up for the userland stack on some platforms. Thanks to Felix Weinrank for reporting the issue. MFC after:i 3 days	2018-04-08 11:37:00 +00:00
brooks	9d79658aab	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
jtl	c40b31d1bf	If a user closes the socket before we call tcp_usr_abort(), then tcp_drop() may unlock the INP. Currently, tcp_usr_abort() does not check for this case, which results in a panic while trying to unlock the already-unlocked INP (not to mention, a use-after-free violation). Make tcp_usr_abort() check the return value of tcp_drop(). In the case where tcp_drop() returns NULL, tcp_usr_abort() can skip further steps to abort the connection and simply unlock the INP_INFO lock prior to returning. Reviewed by: glebius MFC after: 2 weeks Sponsored by: Netflix, Inc.	2018-04-06 17:20:37 +00:00
jtl	15af8ae175	Check that in_pcbfree() is only called once for each PCB. If that assumption is violated, "bad things" could follow. I believe such an assert would have detected some of the problems jch@ was chasing in PR 203175 (see r307551). We also use it in our internal TCP development efforts. And, in case a bug does slip through to released code, this change silently ignores subsequent calls to in_pcbfree(). Reviewed by: rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D14990	2018-04-06 16:48:11 +00:00
emaste	44d4315517	Fix kernel memory disclosure in tcp_ctloutput strcpy was used to copy a string into a buffer copied to userland, which left uninitialized data after the terminating 0-byte. Use the same approach as in tcp_subr.c: strncpy and explicit '\0'. admbugs: 765, 822 MFC after: 1 day Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> Reported by: Vlad Tsyrklevich Security: Kernel memory disclosure Sponsored by: The FreeBSD Foundation	2018-04-04 21:12:35 +00:00
jtl	be2627c2e3	r330675 introduced an extra window check in the LRO code to ensure it captured and reported the highest window advertisement with the same SEQ/ACK. However, the window comparison uses modulo 216 math, rather than directly comparing the absolute values. Because windows use absolute values and not modulo 216 math (i.e. they don't wrap), we need to compare the absolute values. Reviewed by: gallatin MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D14937	2018-04-03 13:54:38 +00:00
np	c4e1d948c4	Add a hook to allow the toedev handling an offloaded connection to provide accurate TCP_INFO. Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14816	2018-04-03 01:08:54 +00:00
brooks	ac0325b4db	Use an accessor function to access ifr_data. This fixes 32-bit compat (no ioctl command defintions are required as struct ifreq is the same size). This is believed to be sufficent to fully support ifconfig on 32-bit systems. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14900	2018-03-30 18:50:13 +00:00
np	82a840f165	Fix RSS build (broken in r331309). Sponsored by: Chelsio Communications	2018-03-29 19:48:17 +00:00
brooks	a45d44647f	Remove infrastructure for token-ring networks. Reviewed by: cem, imp, jhb, jmallett Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14875	2018-03-28 23:33:26 +00:00
sbruno	6c66154766	CC Cubic: fix underflow for cubic_cwnd() Singed calculations in cubic_cwnd() can result in negative cwnd value which is then cast to an unsigned value. Values less than 1 mss are generally bad for other parts of the code, also fixed. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14141	2018-03-26 19:53:36 +00:00
jtl	54451a3dd7	Make the TCP blackbox code committed in r331347 be an optional feature controlled by the TCP_BLACKBOX option. Enable this as part of amd64 GENERIC. For now, leave it disabled on other platforms. Sponsored by: Netflix, Inc.	2018-03-24 12:48:10 +00:00
jtl	27cdb06ce8	Fix compilation for platforms that don't support atomic_fetchadd_64() after r331347. Reported by: avg, br, jhibbits Sponsored by: Netflix, Inc.	2018-03-24 12:40:45 +00:00
sbruno	5e726646d5	Revert r331379 as the "simple" lock changes have revealed a deeper problem and need for a rethink. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks	2018-03-23 18:34:38 +00:00
kp	109a7b5eec	netpfil: Introduce PFIL_FWD flag Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715	2018-03-23 16:56:44 +00:00
sbruno	57df63d5af	Simple locking fixes in ip_ctloutput, ip6_ctloutput, rip_ctloutput. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14624	2018-03-22 22:29:32 +00:00
jtl	a93bdf6963	Add the "TCP Blackbox Recorder" which we discussed at the developer summits at BSDCan and BSDCam in 2017. The TCP Blackbox Recorder allows you to capture events on a TCP connection in a ring buffer. It stores metadata with the event. It optionally stores the TCP header associated with an event (if the event is associated with a packet) and also optionally stores information on the sockets. It supports setting a log ID on a TCP connection and using this to correlate multiple connections that share a common log ID. You can log connections in different modes. If you are doing a coordinated test with a particular connection, you may tell the system to put it in mode 4 (continuous dump). Or, if you just want to monitor for errors, you can put it in mode 1 (ring buffer) and dump all the ring buffers associated with the connection ID when we receive an error signal for that connection ID. You can set a default mode that will be applied to a particular ratio of incoming connections. You can also manually set a mode using a socket option. This commit includes only basic probes. rrs@ has added quite an abundance of probes in his TCP development work. He plans to commit those soon. There are user-space programs which we plan to commit as ports. These read the data from the log device and output pcapng files, and then let you analyze the data (and metadata) in the pcapng files. Reviewed by: gnn (previous version) Obtained from: Netflix, Inc. Relnotes: yes Differential Revision: https://reviews.freebsd.org/D11085	2018-03-22 09:40:08 +00:00
glebius	3ef748bde0	Fix LINT-NOINET build initializing local to false. This is a dead code, since for NOINET build isipv6 is always true, but this dead code makes it compilable. Reported by: rpokala	2018-03-22 05:07:57 +00:00
glebius	e5ec0a0e43	The net.inet.tcp.nolocaltimewait=1 optimization prevents local TCP connections from entering the TIME_WAIT state. However, it omits sending the ACK for the FIN, which results in RST. This becomes a bigger deal if the sysctl net.inet.tcp.blackhole is 2. In this case RST isn't send, so the other side of the connection (also local) keeps retransmitting FINs. To fix that in tcp_twstart() we will not call tcp_close() immediately. Instead we will allocate a tcptw on stack and proceed to the end of the function all the way to tcp_twrespond(), to generate the correct ACK, then we will drop the last PCB reference. While here, make a few tiny improvements: - use bools for boolean variable - staticize nolocaltimewait - remove pointless acquisiton of socket lock Reported by: jtl Reviewed by: jtl Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D14697	2018-03-21 20:59:30 +00:00
jtl	ee029a5d0b	If the INP lock is uncontested, avoid taking a reference and jumping through the lock-switching hoops. A few of the INP lookup operations that lock INPs after the lookup do so using this mechanism (to maintain lock ordering): 1. Lock lookup structure. 2. Find INP. 3. Acquire reference on INP. 4. Drop lock on lookup structure. 5. Acquire INP lock. 6. Drop reference on INP. This change provides a slightly shorter path for cases where the INP lock is uncontested: 1. Lock lookup structure. 2. Find INP. 3. Try to acquire the INP lock. 4. If successful, drop lock on lookup structure. Of course, if the INP lock is contested, the functions will need to revert to the previous way of switching locks safely. This saves a few atomic operations when the INP lock is uncontested. Discussed with: gallatin, rrs, rwatson MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D12911	2018-03-21 15:54:46 +00:00
lstewart	22e6359728	Add support for the experimental Internet-Draft "TCP Alternative Backoff with ECN (ABE)" proposal to the New Reno congestion control algorithm module. ABE reduces the amount of congestion window reduction in response to ECN-signalled congestion relative to the loss-inferred congestion response. More details about ABE can be found in the Internet-Draft: https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn The implementation introduces four new sysctls: - net.inet.tcp.cc.abe defaults to 0 (disabled) and can be set to non-zero to enable ABE for ECN-enabled TCP connections. - net.inet.tcp.cc.newreno.beta and net.inet.tcp.cc.newreno.beta_ecn set the multiplicative window decrease factor, specified as a percentage, applied to the congestion window in response to a loss-based or ECN-based congestion signal respectively. They default to the values specified in the draft i.e. beta=50 and beta_ecn=80. - net.inet.tcp.cc.abe_frlossreduce defaults to 0 (disabled) and can be set to non-zero to enable the use of standard beta (50% by default) when repairing loss during an ECN-signalled congestion recovery episode. It enables a more conservative congestion response and is provided for the purposes of experimentation as a result of some discussion at IETF 100 in Singapore. The values of beta and beta_ecn can also be set per-connection by way of the TCP_CCALGOOPT TCP-level socket option and the new CC_NEWRENO_BETA or CC_NEWRENO_BETA_ECN CC algo sub-options. Submitted by: Tom Jones <tj@enoti.me> Tested by: Tom Jones <tj@enoti.me>, Grenville Armitage <garmitage@swin.edu.au> Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D11616	2018-03-19 16:37:47 +00:00
melifaro	7bb5ee0db4	Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration. Current arp/nd code relies on the feedback from the datapath indicating that the entry is still used. This mechanism is incorporated into the arpresolve()/nd6_resolve() routines. After the inpcb route cache introduction, the packet path for the locally-originated packets changed, passing cached lle pointer to the ether_output() directly. This resulted in the arp/ndp entry expire each time exactly after the configured max_age interval. During the small window between the ARP/NDP request and reply from the router, most of the packets got lost. Fix this behaviour by plugging datapath notification code to the packet path used by route cache. Unify the notification code by using single inlined function with the per-AF callbacks. Reported by: sthaug at nethelp.no Reviewed by: ae MFC after: 2 weeks	2018-03-17 17:05:48 +00:00
tuexen	0130aa4ca8	Set the inp_vflag consistently for accepted TCP/IPv6 connections when net.inet6.ip6.v6only=0. Without this patch, the inp_vflag would have INP_IPV4 and the INP_IPV6 flags for accepted TCP/IPv6 connections if the sysctl variable net.inet6.ip6.v6only is 0. This resulted in netstat to report the source and destination addresses as IPv4 addresses, even they are IPv6 addresses. PR: 226421 Reviewed by: bz, hiren, kib MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D13514	2018-03-16 15:26:07 +00:00
sbruno	fcab00fadd	Update tcp_lro with tested bugfixes from Netflix and LLNW: rrs - Lets make the LRO code look for true dup-acks and window update acks fly on through and combine. rrs - Make the LRO engine a bit more aware of ack-only seq space. Lets not have it incorrectly wipe out newer acks for older acks when we have out-of-order acks (common in wifi environments). jeggleston - LRO eating window updates Based on all of the above I think we are RFC compliant doing it this way: https://tools.ietf.org/html/rfc1122 section 4.2.2.16 "Note that TCP has a heuristic to select the latest window update despite possible datagram reordering; as a result, it may ignore a window update with a smaller window than previously offered if neither the sequence number nor the acknowledgment number is increased." Submitted by: Kevin Bowling <kevin.bowling@kev009.com> Reviewed by: rstone gallatin Sponsored by: NetFlix and Limelight Networks Differential Revision: https://reviews.freebsd.org/D14540	2018-03-09 00:08:43 +00:00
tuexen	f5bbb5cecc	When checking the TCP fast cookie length, conststently also check for the minimum length. This fixes a bug where cookies of length 2 bytes (which is smaller than the minimum length of 4) is provided by the server. Sponsored by: Netflix, Inc.	2018-02-27 22:12:38 +00:00
pkelsey	6016e69c01	Ensure signed comparison to avoid false trip of assert during VNET teardown. Reported by: lwhsu MFC after: 1 month	2018-02-26 20:31:16 +00:00
pkelsey	91dbc8d21a	Greatly reduce the number of #ifdefs supporting the TCP_RFC7413 kernel option. The conditional compilation support is now centralized in tcp_fastopen.h and tcp_var.h. This doesn't provide the minimum theoretical code/data footprint when TCP_RFC7413 is disabled, but nearly all the TFO code should wind up being removed by the optimizer, the additional footprint in the syncache entries is a single pointer, and the additional overhead in the tcpcb is at the end of the structure. This enables the TCP_RFC7413 kernel option by default in amd64 and arm64 GENERIC. Reviewed by: hiren MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14048	2018-02-26 03:03:41 +00:00
pkelsey	d2557e90a6	This is an implementation of the client side of TCP Fast Open (TFO) [RFC7413]. It also includes a pre-shared key mode of operation in which the server requires the client to be in possession of a shared secret in order to successfully open TFO connections with that server. The names of some existing fastopen sysctls have changed (e.g., net.inet.tcp.fastopen.enabled -> net.inet.tcp.fastopen.server_enable). Reviewed by: tuexen MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14047	2018-02-26 02:53:22 +00:00
pkelsey	acbbc54961	Fix harmless locking bug in tfp_fastopen_check_cookie(). The keylist lock was not being acquired early enough. The only side effect of this bug is that the effective add time of a new key could be slightly later than it would have been otherwise, as seen by a TFO client. Reviewed by: tuexen MFC after: 1 month Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14046	2018-02-26 02:43:26 +00:00
ae	a6619b0d9c	Reinitialize IP header length after checksum calculation. It is used later by TCP-MD5 code. This fixes the problem with broken TCP-MD5 over IPv4 when NIC has disabled TCP checksum offloading. PR: 223835 MFC after: 1 week	2018-02-10 10:13:17 +00:00
ae	545ce709f1	Rework ipfw dynamic states implementation to be lockless on fast path. o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and for dynamic states implementation information; o added DYN_LOOKUP_NEEDED() macro that can be used to determine the need of new lookup of dynamic states; o ipfw_dyn_rule now becomes obsolete. Currently it used to pass information from kernel to userland only. o IPv4 and IPv6 states now described by different structures dyn_ipv4_state and dyn_ipv6_state; o IPv6 scope zones support is added; o ipfw(4) now depends from Concurrency Kit; o states are linked with "entry" field using CK_SLIST. This allows lockless lookup and protected by mutex modifications. o the "expired" SLIST field is used for states expiring. o struct dyn_data is used to keep generic information for both IPv4 and IPv6; o struct dyn_parent is used to keep O_LIMIT_PARENT information; o IPv4 and IPv6 states are stored in different hash tables; o O_LIMIT_PARENT states now are kept separately from O_LIMIT and O_KEEP_STATE states; o per-cpu dyn_hp pointers are used to implement hazard pointers and they prevent freeing states that are locklessly used by lookup threads; o mutexes to protect modification of lists in hash tables now kept in separate arrays. 65535 limit to maximum number of hash buckets now removed. o Separate lookup and install functions added for IPv4 and IPv6 states and for parent states. o By default now is used Jenkinks hash function. Obtained from: Yandex LLC MFC after: 42 days Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D12685	2018-02-07 18:59:54 +00:00
jhb	3df465f54f	Export tcp_always_keepalive for use by the Chelsio TOM module. This used to work by accident with ld.bfd even though always_keepalive was marked as static. LLD honors static more correctly, so export this variable properly (including moving it into the tcp_* namespace). Reviewed by: bz, emaste MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14129	2018-01-30 23:01:37 +00:00
tuexen	17d3a1c234	Add constant for the PAD chunk as defined in RFC 4820. This will be used by traceroute and traceroute6 soon. MFC after: 1 week	2018-01-27 13:46:55 +00:00
tuexen	42eb2f4fed	Update references in comments, since the IDs have become an RFC long time ago. Also cleanup whitespaces. No functional change. MFC after: 1 week	2018-01-27 13:43:03 +00:00
cem	c060d198e3	style: Remove remaining deprecated MALLOC/FREE macros Mechanically replace uses of MALLOC/FREE with appropriate invocations of malloc(9) / free(9) (a series of sed expressions). Something like: * MALLOC(a, b, ... -> a = malloc(... * FREE( -> free( * free((caddr_t) -> free( No functional change. For now, punt on modifying contrib ipfilter code, leaving a definition of the macro in its KMALLOC(). Reported by: jhb Reviewed by: cy, imp, markj, rmacklem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14035	2018-01-25 22:25:13 +00:00
markj	19e9e67650	Use tcpinfoh_t for TCP headers in the tcp:::debug-{drop,input} probes. The header passed to these probes has some fields converted to host order by tcp_fields_to_host(), so the tcpinfo_t translator doesn't do what we want. Submitted by: Hannes Mehnert MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12647	2018-01-25 15:35:34 +00:00
np	af35a0e296	Do not generate illegal mbuf chains during IP fragment reassembly. Only the first mbuf of the reassembled datagram should have a pkthdr. This was discovered with cxgbe(4) + IPSEC + ping with payload more than interface MTU. cxgbe can generate !M_WRITEABLE mbufs and this results in m_unshare being called on the reassembled datagram, and it complains: panic: m_unshare: m0 0xfffff80020f82600, m 0xfffff8005d054100 has M_PKTHDR PR: 224922 Reviewed by: ae@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14009	2018-01-24 05:09:21 +00:00
rstone	c0c5474ab0	Reduce code duplication for inpcb route caching Add a new macro to clear both the L3 and L2 route caches, to hopefully prevent future instances where only the L3 cache was cleared when both should have been. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13989 Reviewed by: karels	2018-01-23 03:15:39 +00:00
tuexen	b096e231df	Fix a bug related to fast retransmissions. When processing a SACK advancing the cumtsn-ack in fast recovery, increment the miss-indications for all TSN's reported as missing. Thanks to Fabian Ising for finding the bug and to Timo Voelker for provinding a fix. This fix moves also CMT related initialisation of some variables to a more appropriate place. MFC after: 1 week	2018-01-16 21:58:38 +00:00
tuexen	8524b21106	Don't provide a (meaningless) cmsg when proving a notification in a recvmsg() call. MFC after: 1 week	2018-01-15 21:59:20 +00:00
pfg	f48ea5d543	libalias: small memory allocation cleanups. Make the calloc wrappers behave as expected by using mallocarray. It is rather weird that the malloc wrappers also zeroes the memory: update a comment to reflect at least two cases where it is expected. Reviewed by: tuexen	2018-01-12 23:12:30 +00:00
cy	4e273060ca	Correct the comment describing badrs which is bad router solicitiation, not bad router advertisement. MFC after: 3 days	2017-12-29 07:23:18 +00:00
tuexen	6ed0ec5502	White cleanups.	2017-12-26 16:33:55 +00:00
tuexen	ef787a75f4	Clearify CID 1008197. MFC after: 3 days	2017-12-26 16:12:04 +00:00
tuexen	5f82c8cdf4	Clearify issue reported in CID 1008198. MFC after: 3 days	2017-12-26 16:06:11 +00:00
tuexen	13f672e8e2	Fix CID 1008428. MFC after: 1 week	2017-12-26 15:29:11 +00:00
tuexen	d45a807ebc	Fix CID 1008936.	2017-12-26 15:24:42 +00:00
tuexen	c1c52b624d	Allow the first (and second) argument of sn_calloc() be a sum. This fixes a bug reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224103 PR: 224103	2017-12-26 14:37:47 +00:00
tuexen	3168e50bca	When adding support for sending SCTP packets containing an ABORT chunk to ipfw in https://svnweb.freebsd.org/changeset/base/326233, a dependency on the SCTP stack was added to ipfw by accident. This was noted by Kevel Bowling in https://reviews.freebsd.org/D13594 where also a solution was suggested. This patch is based on Kevin's suggestion, but implements the required SCTP checksum computation without any dependency on other SCTP sources. While there, do some cleanups and improve comments. Thanks to Kevin Kevin Browling for reporting the issue and suggesting a fix.	2017-12-26 12:35:02 +00:00
kan	c8da6fae2c	Do pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385	2017-12-25 04:48:39 +00:00
ae	8ac899f5b4	Fix mbuf leak when TCPMD5_OUTPUT() method returns error. PR: 223817 MFC after: 1 week	2017-12-14 12:54:20 +00:00
tuexen	e12cc9dfab	Cleaup, no functional change.	2017-12-13 17:11:57 +00:00
glebius	92cef0bd26	Separate out send buffer autoscaling code into function, so that alternative TCP stacks may reuse it instead of pasting. Obtained from: Netflix	2017-12-07 22:36:58 +00:00
tuexen	dbe62654cb	Retire SCTP_WITH_NO_CSUM option. This option was used in the early days to allow performance measurements extrapolating the use of SCTP checksum offloading. Since this feature is now available, get rid of this option. This also un-breaks the LINT kernel. Thanks to markj@ for making me aware of the problem.	2017-12-07 22:19:08 +00:00
pfg	78a6b08618	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
tuexen	6fd4821b43	Add to ipfw support for sending an SCTP packet containing an ABORT chunk. This is similar to the TCP case. where a TCP RST segment can be sent. There is one limitation: When sending an ABORT in response to an incoming packet, it should be tested if there is no ABORT chunk in the received packet. Currently, it is only checked if the first chunk is an ABORT chunk to avoid parsing the whole packet, which could result in a DOS attack. Thanks to Timo Voelker for helping me to test this patch. Reviewed by: bcr@ (man page part), ae@ (generic, non-SCTP part) Differential Revision: https://reviews.freebsd.org/D13239	2017-11-26 18:19:01 +00:00
tuexen	02d5109d72	Fix SPDX line as suggested by pfg	2017-11-24 19:38:59 +00:00
tuexen	05e0aa35ab	Unbreak compilation when using SCTP_DETAILED_STR_STATS option. MFC after: 1 week	2017-11-24 12:18:48 +00:00
tuexen	4505506cf7	Add SPDX line.	2017-11-24 11:25:53 +00:00
markj	1dac2d8e89	Use the right variable for the IP header parameter to tcp:::send. This addresses a regression from r311225. MFC after: 1 week	2017-11-22 14:13:40 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
tuexen	f5ee292f54	Fix the handling of ERROR chunks which a lot of error causes. While there, clean up the code. Thanks to Felix Weinrank who found the bug by using fuzz-testing the SCTP userland stack. MFC after: 1 week	2017-11-15 22:13:10 +00:00
tuexen	e46f5a7d3e	Simply the code and use the full buffer for contigous chunk representation. MFC after: 1 week	2017-11-14 02:30:21 +00:00
glebius	c3c12191e4	Style r320614: don't initialize at declaration, new line after declarations, shorten variable name to avoid extra long lines. No functional changes.	2017-11-13 22:16:47 +00:00
tuexen	03175f1fb8	Cleanup the handling of control chunks. While there fix some minor bug related to clearing the assoc retransmit counter and the dup TSN handling of NR-SACK chunks. MFC after: 3 days	2017-11-12 21:43:33 +00:00
kib	b5d757f3ab	Use hardware timestamps to report packet timestamps for SO_TIMESTAMP and other similar socket options. Provide new control message SCM_TIME_INFO to supply information about timestamp. Currently it indicates that the timestamp was hardware-assisted and high-precision, for software timestamps the message is not returned. Reserved fields are added to ABI to report additional info about it, it is expected that raw hardware clock value might be useful for some applications. Reviewed by: gallatin (previous version), hselasky Sponsored by: Mellanox Technologies MFC after: 2 weeks X-Differential revision: https://reviews.freebsd.org/D12638	2017-11-07 09:46:26 +00:00
tuexen	41648a46b7	Fix an accounting bug where data was counted twice if on the read queue and on the ordered or unordered queue. While there, improve the checking in INVARIANTs when computing the a_rwnd. MFC after: 3 days	2017-11-05 11:59:33 +00:00
tuexen	2731e7ec9f	Allow the setting of the MTU for future paths using an SCTP socket option. This functionality was missing. MFC after: 1 week	2017-11-03 20:46:12 +00:00
tuexen	5f19587fba	Fix the reporting of the MTU for SCTP sockets when using IPv6. MFC after: 1 week	2017-11-01 16:32:11 +00:00
tuexen	84d5fd2292	Fix parsing error when processing cmsg in SCTP send calls. Thei bug is related to a signed/unsigned mismatch. This should most likely fix the issue in sctp_sosend reported by Dmitry Vyukov on the freebsd-hackers mailing list and found by running syzkaller.	2017-10-27 19:27:05 +00:00
tuexen	9799ec2be8	Fix a bug reported by Felix Weinrank using the libfuzzer on the userland stack. MFC after: 3 days	2017-10-25 09:12:22 +00:00
tuexen	ae1c6d7d9f	Fix a bug in handling special ABORT chunks. Thanks to Felix Weinrank for finding this issue using libfuzzer with the userland stack. MFC after: 3 days	2017-10-24 16:24:12 +00:00
tuexen	874f77fd25	Fix a locking issue found by running AFL on the userland stack. Thanks to Felix Weinrank for reporting the issue. MFC after: 3 days	2017-10-24 14:28:56 +00:00
mav	9dea4c2ca0	Relax per-ifnet cif_vrs list double locking in carp(4). In all cases where cif_vrs list is modified, two locks are held: per-ifnet CIF_LOCK and global carp_sx. It means to read that list only one of them is enough to be held, so we can skip CIF_LOCK when we already have carp_sx. This fixes kernel panic, caused by attempts of copyout() to sleep while holding non-sleepable CIF_LOCK mutex. Discussed with: glebius MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2017-10-19 09:01:15 +00:00
tuexen	2e0c4e791c	Fix a signed/unsigned warning. MFC after: 1 week	2017-10-18 21:08:35 +00:00
tuexen	dab7a8b847	Abort an SCTP association, when a DATA chunk is followed by an unknown chunk with a length smaller than the minimum length. Thanks to Felix Weinrank for making me aware of the problem. MFC after: 3 days	2017-10-18 20:17:44 +00:00
tuexen	5dacf93c38	Revert change which got in accidently.	2017-10-18 18:59:35 +00:00
tuexen	16a5f8a231	Fix a bug introduced in r324638. Thanks to Felix Weinrank for making me aware of this. MFC after: 3 days	2017-10-18 18:56:56 +00:00
tuexen	5514853835	Fix the handling of parital and too short chunks. Ensure that the current behaviour is consistent: stop processing of the chunk, but finish the processing of the previous chunks. This behaviour might be changed in a later commit to ABORT the assoication due to a protocol violation, but changing this is a separate issue. MFC after: 3 days	2017-10-15 19:33:30 +00:00
tuexen	bbb2153a14	Code cleanup, not functional change. This avoids taking a pointer of a packed structure which allows simpler compilation of the userland stack. MFC after: 1 week	2017-10-14 10:02:59 +00:00
glebius	4e98454f00	Declare more TCP globals in tcp_var.h, so that alternative TCP stacks can use them. Gather all TCP tunables in tcp_var.h in one place and alphabetically sort them, to ease maintainance of the list. Don't copy and paste declarations in tcp_stacks/fastpath.c.	2017-10-11 20:36:09 +00:00
glebius	d48dcca311	Declare pmtud_blackhole global variables in tcp_timer.h, so that alternative TCP stacks can legally use them.	2017-10-06 20:33:40 +00:00
tuexen	ca2b4c393c	Ensure that the accept ABORT chunks with the T-bit set only the a non-zero matching peer tag is provided. MFC after: 1 week	2017-10-05 13:29:54 +00:00
jch	7b077c606b	Forgotten bits in r324179: Include sys/syslog.h if INVARIANTS is not defined MFC after: 1 week X-MFC with: r324179 Pointy hat to: jch	2017-10-02 09:45:17 +00:00
pkelsey	de84d253da	The soisconnected() call removed from syncache_socket() in r307966 was not extraneous in the TCP Fast Open (TFO) passive-open case. In the TFO passive-open case, syncache_socket() is being called during processing of a TFO SYN bearing a valid cookie, and a call to soisconnected() is required in order to allow the application to immediately consume any data delivered in the SYN and to have a chance to generate response data to accompany the SYN-ACK. The removal of this call to soisconnected() effectively converted all TFO passive opens to having the same RTT cost as a standard 3WHS. This commit adds a call to soisconnected() to syncache_tfo_expand() so that it is only in the TFO passive-open path, thereby restoring TFO passve-open RTT performance and preserving the non-TFO connection-rate performance gains realized by r307966. MFC after: 1 week Sponsored by: Limelight Networks	2017-10-01 23:37:17 +00:00
jch	954d1a711a	Fix an infinite loop in tcp_tw_2msl_scan() when an INP_TIMEWAIT inp has been destroyed before its tcptw with INVARIANTS undefined. This is a symmetric change of r307551: A INP_TIMEWAIT inp should not be destroyed before its tcptw, and INVARIANTS will catch this case. If INVARIANTS is undefined it will emit a log(LOG_ERR) and avoid a hard to debug infinite loop in tcp_tw_2msl_scan(). Reported by: Ben Rubson, hselasky Submitted by: hselasky Tested by: Ben Rubson, jch MFC after: 1 week Sponsored by: Verisign, inc Differential Revision: https://reviews.freebsd.org/D12267	2017-10-01 21:20:28 +00:00
ae	3193525e89	Some mbuf related fixes in icmp_error() * check mbuf length before doing mtod() and accessing to IP header; * update oip pointer and all depending pointers after m_pullup(); * remove extra checks and extra parentheses, wrap long lines; PR: 222670 Reported by: Prabhakar Lakhera MFC after: 1 week	2017-09-29 06:24:45 +00:00
tuexen	a1f4d252b8	Remove unused function. MFC after: 1 week	2017-09-27 13:05:23 +00:00
sephe	e1f9aeedee	tcp: Don't "negotiate" MSS. _NO_ OSes actually "negotiate" MSS. RFC 879: "... This Maximum Segment Size (MSS) announcement (often mistakenly called a negotiation) ..." This negotiation behaviour was introduced 11 years ago by r159955 without any explaination about why FreeBSD had to "negotiate" MSS: In syncache_respond() do not reply with a MSS that is larger than what the peer announced to us but make it at least tcp_minmss in size. Sponsored by: TCP/IP Optimization Fundraise 2005 The tcp_minmss behaviour is still kept. Syncookie fix was prodded by tuexen, who also helped to test this patch w/ packetdrill. Reviewed by: tuexen, karels, bz (previous version) MFC after: 2 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D12430	2017-09-27 05:52:37 +00:00
tuexen	e2a38ef632	Add missing locking. Found by Coverity while scanning the usrsctp library. MFC after: 1 week	2017-09-22 06:33:01 +00:00
tuexen	bea9f86ffd	Add missing socket lock. MFC after: 1 week	2017-09-22 06:07:47 +00:00
tuexen	8b25605cf2	Code cleanup, no functional change. MFC after: 1 week	2017-09-21 11:56:31 +00:00
tuexen	7ff23453d1	Free the control structure after using is, not before. Found by Coverity while scanning the usrsctp library. MFC after: 1 week	2017-09-21 09:47:56 +00:00
tuexen	862561790e	No need to wakeup, since sctp_add_to_readq() does it. MFC after: 1 week	2017-09-21 09:18:05 +00:00
tuexen	7db17aab93	Protect the address workqueue timer by a mutex. MFC after: 1 week	2017-09-20 21:29:54 +00:00
tuexen	5d986c4bec	Fix a warning. MFC after: 1 week	2017-09-19 20:24:13 +00:00
tuexen	4d5c2e99ca	Avoid an overflow when computing the staleness. This issue was found by running libfuzz on the userland stack. MFC after: 1 week	2017-09-19 20:09:58 +00:00
tuexen	7e36a33d6e	Remove a no longer used variable. Reported by: Felix Weinrank MFC after: 1 week	2017-09-19 15:00:19 +00:00
tuexen	45d9e3e378	Fix an accounting bug and use sctp_timer_start to start a timer. MFC after: 1 week	2017-09-17 09:27:27 +00:00
tuexen	5a3fccd9bc	Remove code not used on any platform currently supported. MFC after: 1 week	2017-09-16 21:26:06 +00:00
tuexen	7af7d26d49	Export the UDP encapsualation port and the path state.	2017-09-12 21:08:50 +00:00
tuexen	56b9f343a0	Add support to print the TCP stack being used. Sponsored by: Netflix, Inc.	2017-09-12 13:34:43 +00:00
tuexen	0e07dc13f0	Fix MTU computation. Coverity scanning usrsctp pointed to this code... MFC after: 3 days	2017-09-09 21:03:40 +00:00
tuexen	6024d7b6bf	Fix locking issues found by Coverity scanning the usrsctp library. MFC after: 3 days	2017-09-09 20:44:56 +00:00
tuexen	cd615c4f53	Silence a Coverity warning from scanning the usrsctp library. MFC after: 3 days	2017-09-09 20:08:26 +00:00
tuexen	ab6f49802f	Savely remove a chunk from the control queue. This bug was found by Coverity scanning the usrsctp library. MFC after: 3 days	2017-09-09 19:49:50 +00:00
hselasky	7c2ab1d9f6	Add support for generic backpressure indicator for ratelimited transmit queues aswell as non-ratelimited ones. Add the required structure bits in order to support a backpressure indication with ratelimited connections aswell as non-ratelimited ones. The backpressure indicator is a value between zero and 65535 inclusivly, indicating if the destination transmit queue is empty or full respectivly. Applications can use this value as a decision point for when to stop transmitting data to avoid endless ENOBUFS error codes upon transmitting an mbuf. This indicator is also useful to reduce the latency for ratelimited queues. Reviewed by: gallatin, kib, gnn Differential Revision: https://reviews.freebsd.org/D11518 Sponsored by: Mellanox Technologies	2017-09-06 13:56:18 +00:00
tuexen	2621be48c9	Fix blackhole detection. There were two bugs related to the blackhole detection: * The smalles size was tried more than two times. * The restored MSS was not the original one, but the second candidate. MFC after: 1 week Sponsored by: Netflix, Inc.	2017-08-28 11:41:18 +00:00
sbruno	d0208bfad0	Use counter(9) for PLPMTUD counters. Remove unused PLPMTUD sysctl counters. Bump UPDATING and FreeBSD Version to indicate a rebuild is required. Submitted by: kevin.bowling@kev009.com Reviewed by: jtl Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12003	2017-08-25 19:41:38 +00:00
tuexen	3417044ad6	Avoid TCP log messages which are false positives. This is https://svnweb.freebsd.org/changeset/base/322812, just for alternate TCP stacks. XMFC with: 322812	2017-08-23 15:08:51 +00:00
tuexen	52bf744b71	Avoid TCP log messages which are false positives. The check for timestamps are too early to handle SYN-ACK correctly. So move it down after the corresponing processing has been done. PR: 216832 Obtained from: antonfb@hesiod.org MFC after: 1 week	2017-08-23 14:50:08 +00:00
tuexen	9181a877e1	Ensure inp_vflag is consistently set for TCP endpoints. Make sure that the flags INP_IPV4 and INP_IPV6 are consistently set for inpcbs used for TCP sockets, no matter if the setting is derived from the net.inet6.ip6.v6only sysctl or the IPV6_V6ONLY socket option. For UDP this was already done right. PR: 221385 MFC after: 1 week	2017-08-18 07:27:15 +00:00
oleg	b81e7a7f7a	Fix comment typo.	2017-08-09 10:46:34 +00:00
des	8930206be0	Correct sysctl names.	2017-08-09 07:24:58 +00:00
bz	a0dcb7af20	After inpcb route caching was put back in place there is no need for flowtable anymore (as flowtable was never considered to be useful in the forwarding path). Reviewed by: np Differential Revision: https://reviews.freebsd.org/D11448	2017-07-27 13:03:36 +00:00
emaste	859cc79a19	cc_cubic: restore braces around if-condition block r307901 was reverted in r321480, restoring an incorrect block delimitation bug present in the original cc_cubic commit. Restore only the bugfix (brace addition) from r307901. CID: 1090182 Approved by: sbruno	2017-07-26 21:23:09 +00:00
sbruno	b4426c87f1	Revert r307901 - Inform CC modules about loss events. This was discussed between various transport@ members and it was requested to be reverted and discussed. Submitted by: Kevin Bowling <kevin.bowling@kev009.com> Reported by: lawrence Reviewed by: hiren Sponsored by: Limelight Networks	2017-07-25 15:08:52 +00:00
sbruno	fab58e2974	Revert r308180 - Set slow start threshold more accurrately on loss ... This was discussed between various transport@ members and it was requested to be reverted and discussed. Submitted by: kevin Reported by: lawerence Reviewed by: hiren	2017-07-25 15:03:05 +00:00
tuexen	91bb3c532b	Remove duplicate statement.	2017-07-25 11:05:53 +00:00
tuexen	0c018c8ab7	Deal with listening socket correctly.	2017-07-20 14:50:13 +00:00
tuexen	e79328a85f	Fix the explicit EOR mode. If the final messages is not complete, send an ABORT. Joint work with rrs@ MFC after: 1 week	2017-07-20 11:09:33 +00:00
tuexen	75df636424	Avoid shadowed variables. MFC after: 1 week	2017-07-19 15:12:23 +00:00
tuexen	425c1c5a8c	Use memset/memcpy instead of bzero/bcopy. Just use one variant instead of both. Use the memset/memcpy ones since they cause less problems in crossplatform deployment. MFC after: 1 week	2017-07-19 14:28:58 +00:00
tuexen	afa6a856b4	Fix the accounting and add code to detect errors in accounting. Joint work with rrs@ MFC after: 1 week	2017-07-19 12:27:40 +00:00
tuexen	b36838d4aa	Fix the handling of Explicit EOR mode. While there, appropriately handle the overhead depending on the usage of DATA or I-DATA chunks. Take the overhead only into account, when required. Joint work with rrs@ MFC after: 1 week	2017-07-15 19:54:03 +00:00
kib	2f63b6248f	Correct sysent flags for dynamically loaded syscalls. Using the https://github.com/google/capsicum-test/ suite, the PosixMqueue.CapModeForked test was failing due to an ECAPMODE after calling kmq_notify(). On further inspection, the dynamically loaded syscall entry was initialized with sy_flags zeroed out, since SYSCALL_INIT_HELPER() left sysent.sy_flags with the default value. Add a new helper SYSCALL{,32}_INIT_HELPER_F() which takes an additional argument to specify the sy_flags value. Submitted by: Siva Mahadevan <smahadevan@freebsdfoundation.org> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D11576	2017-07-14 09:34:44 +00:00
jtl	55bc1e6207	Don't overpromote values when calculating len in tcp_output(). sbavail() returns u_int and sendwin is a uint32_t. Therefore, min() (which operates on two u_int values) is able to correctly calculate the minimum of these two arguments. Reported by: rrs MFC after: 1 week Sponsored by: Netflix	2017-07-05 16:10:30 +00:00
tuexen	4bb3ea12bb	Move to open state after plausibility checks. When doing this too early, the MIB counters go wrong. MFC after: 1 week	2017-07-04 18:24:50 +00:00
tuexen	1963b0270d	Don't hold if refcount on an stcb when it is not needed. This improves the consistency with other parts of the code.	2017-07-04 18:04:44 +00:00
sbruno	2d13e5eb12	Add a sysctl to toggle the use of the sockets LOWAT when calculating auto window growth Submitted by: j@nitrology.com (Jason Wolfe) Reviewed by: gnn hiren Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11016	2017-07-03 19:39:58 +00:00
tuexen	cd3358b852	Handle sctp_get_next_param() in a consistent way. This addresses an issue found by Felix Weinrank using libfuzz. While there, use also consistent nameing. MFC after: 3 days	2017-06-23 21:01:57 +00:00
tuexen	b7801f570d	Check the length of a COOKIE chunk before accessing fields in it. Thanks to Felix Weinrank for reporting the issue he found by using libFuzzer. MFC after: 3 days	2017-06-23 10:09:49 +00:00
tuexen	2abe6d833d	Use a longer buffer for messages in ERROR chunks. This allows them to be sent in a non truncated way and addresses a warning given by newver versions of gcc. Thanks to Anselm Jonas Scholl for reporting it and providing a patch.	2017-06-23 09:27:31 +00:00
tuexen	4b087d3335	Honor the backlog field.	2017-06-23 08:35:54 +00:00
tuexen	fc06311c80	Improve compilation on platforms different from FreeBSD.	2017-06-23 08:34:01 +00:00
glebius	e35d543ec1	Listening sockets improvements. o Separate fields of struct socket that belong to listening from fields that belong to normal dataflow, and unionize them. This shrinks the structure a bit. - Take out selinfo's from the socket buffers into the socket. The first reason is to support braindamaged scenario when a socket is added to kevent(2) and then listen(2) is cast on it. The second reason is that there is future plan to make socket buffers pluggable, so that for a dataflow socket a socket buffer can be changed, and in this case we also want to keep same selinfos through the lifetime of a socket. - Remove struct struct so_accf. Since now listening stuff no longer affects struct socket size, just move its fields into listening part of the union. - Provide sol_upcall field and enforce that so_upcall_set() may be called only on a dataflow socket, which has buffers, and for listening sockets provide solisten_upcall_set(). o Remove ACCEPT_LOCK() global. - Add a mutex to socket, to be used instead of socket buffer lock to lock fields of struct socket that don't belong to a socket buffer. - Allow to acquire two socket locks, but the first one must belong to a listening socket. - Make soref()/sorele() to use atomic(9). This allows in some situations to do soref() without owning socket lock. There is place for improvement here, it is possible to make sorele() also to lock optionally. - Most protocols aren't touched by this change, except UNIX local sockets. See below for more information. o Reduce copy-and-paste in kernel modules that accept connections from listening sockets: provide function solisten_dequeue(), and use it in the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4), infiniband, rpc. o UNIX local sockets. - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX local sockets. Most races exist around spawning a new socket, when we are connecting to a local listening socket. To cover them, we need to hold locks on both PCBs when spawning a third one. This means holding them across sonewconn(). This creates a LOR between pcb locks and unp_list_lock. - To fix the new LOR, abandon the global unp_list_lock in favor of global unp_link_lock. Indeed, separating these two locks didn't provide us any extra parralelism in the UNIX sockets. - Now call into uipc_attach() may happen with unp_link_lock hold if, we are accepting, or without unp_link_lock in case if we are just creating a socket. - Another problem in UNIX sockets is that uipc_close() basicly did nothing for a listening socket. The vnode remained opened for connections. This is fixed by removing vnode in uipc_close(). Maybe the right way would be to do it for all sockets (not only listening), simply move the vnode teardown from uipc_detach() to uipc_close()? Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9770	2017-06-08 21:30:34 +00:00
jtl	8fd8757603	Add the infrastructure to support loading multiple versions of TCP stack modules. It adds support for mangling symbols exported by a module by prepending a string to them. (This avoids overlapping symbols in the kernel linker.) It allows the use of a macro as the module name in the DECLARE_MACRO() and MACRO_VERSION() macros. It allows the code to register stack aliases (e.g. both a generic name ["default"] and version-specific name ["default_10_3p1"]). With these changes, it is trivial to compile TCP stack modules with the name defined in the Makefile and to load multiple versions of the same stack simultaneously. This functionality can be used to enable side-by-side testing of an old and new version of the same TCP stack. It also could support upgrading the TCP stack without a reboot. Reviewed by: gnn, sjg (makefiles only) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D11086	2017-06-08 20:41:28 +00:00
glebius	cf36967bf9	This code was missing socket unlock and socket buffer lock, but it worked since right now these two locks are the same.	2017-06-08 06:37:11 +00:00
glebius	a6ce66b70e	The desired lock here is socket buffer, not socket. Right now they match, but won't in future.	2017-06-08 06:34:09 +00:00
tuexen	dd1ca3606d	Fix the ICMP6 handling for TCP. The ICMP6 packets might not be contained in a single mbuf. So don't assume this. Keep the IPv4 and IPv6 code in sync and make explicit that the syncache code only need the TCP sequence number, not the complete TCP header. MFC after: 3 days Sponsored by: Netflix, Inc.	2017-06-03 21:53:58 +00:00
tuexen	bb1f43f35d	Improve comments to describe what the code does. Reported by: jtl Sponsored by: Netflix, Inc.	2017-06-01 15:11:18 +00:00
jtl	5cc59b8e92	Enforce the limit on ICMP messages before doing work to formulate the response. Delete an unneeded rate limit for UDP under IPv6. Because ICMP6 messages have their own rate limit, it is unnecessary to apply a second rate limit to UDP messages. Reviewed by: glebius MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D10387	2017-05-30 14:32:44 +00:00
tuexen	ab2f2c70c1	Use the SCTP_PCB_FLAGS_ACCEPTING flags to check for listeners. While there, use a macro for checking the listen state to allow for easier changes if required. This done to help glebius@ with his listen changes.	2017-05-26 16:29:00 +00:00
glebius	7845c5b75c	o Rearrange struct inpcb fields to optimize the TCP output code path considering cache line hits and misses. Put the lock and hash list glue into the first cache line, put inp_refcount inp_flags inp_socket into the second cache line. o On allocation zero out entire structure except the lock and list entries, including inp_route inp_lle inp_gencnt. When inp_route and inp_lle were introduced, they were added below inp_zero_size, resulting on not being cleared after free/alloc. This definitely was a source of bugs with route caching. Could be that r315956 has just fixed one of them. The inp_gencnt is reinitialized on every alloc, so it is safe to clear it. This has been proved to improve TCP performance at Netflix. Obtained from: rrs Differential Revision: D10686	2017-05-24 17:47:16 +00:00
tuexen	ad6edddba8	The connect() system call should return -1 and set errno to EAFNOSUPPORT if it is called on a TCP socket * with an IPv6 address and the socket is bound to an IPv4-mapped IPv6 address. * with an IPv4-mapped IPv6 address and the socket is bound to an IPv6 address. Thanks to Jonathan T. Leighton for reporting this issue. Reviewed by: bz gnn MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D9163	2017-05-22 15:29:10 +00:00
ae	9c4008f018	Set M_BCAST and M_MCAST flags on mbuf sent via divert socket. r290383 has changed how mbufs sent by divert socket are handled. Previously they are always handled by slow path processing in ip_input(). Now ip_tryforward() is invoked from ip_input() before in_broadcast() check. Since diverted packet lost all mbuf flags, it passes the broadcast check in ip_tryforward() due to missing M_BCAST flag. In the result the broadcast packet is forwarded to the wire instead of be consumed by network stack. Add in_broadcast() check to the div_output() function. And restore the M_BCAST flag if destination address is broadcast for the given network interface. PR: 209491 MFC after: 1 week	2017-05-17 09:04:09 +00:00
emaste	1901c3e1f2	Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193	2017-05-17 00:34:34 +00:00
glebius	19fea9c3e0	Reduce in_pcbinfo_init() by two params. No users supply any flags to this function (they used to say UMA_ZONE_NOFREE), so flag parameter goes away. The zone_fini parameter also goes away. Previously no protocols (except divert) supplied zone_fini function, so inpcb locks were leaked with slabs. This was okay while zones were allocated with UMA_ZONE_NOFREE flag, but now this is a leak. Fix that by suppling inpcb_fini() function as fini method for all inpcb zones.	2017-05-15 21:58:36 +00:00
ngie	e40523722e	Add missing braces around MCAST_EXCLUDE check when KTR support is compiled into the kernel This ensures that .iss_asm (the number of ASM listeners) isn't incorrectly decremented for MLD-layer source datagrams when inspecting im*s_st[1] (the second state in the structure). MFC after: 2 months PR: 217509 [1] Reported by: Coverity (Isilon) Reviewed by: ae ("This patch looks correct to me." [1]) Submitted by: Miles Ohlrich <miles.ohlrich@isilon.com> Sponsored by: Dell EMC Isilon	2017-05-13 18:41:24 +00:00
glebius	433716df1d	There is no good reason for TCP reassembly zone to be UMA_ZONE_NOFREE. It has strong locking model, doesn't have any timers associated with entries. The entries theirselves are referenced only from the tcpcb zone, which itself is a normal zone, without the UMA_ZONE_NOFREE flag.	2017-05-10 23:32:31 +00:00
eugen	6a3a52c9a4	ipfw nat and natd support multiple aliasing instances with "nat global" feature that chooses right alias_address for outgoing packets that already have corresponding state in one of aliasing instances. This feature works just fine for ICMP, UDP, TCP and SCTP packes but not for others. For example, outgoing PPtP/GRE packets always get alias_address of latest configured instance no matter whether such packets have corresponding state or not. This change unbreaks translation of transit PPtP/GRE connections for "nat global" case fixing a bug in static ProtoAliasOut() function that ignores its "create" argument and performs translation regardless of its value. This static function is called only by LibAliasOutLocked() function and only for packers other than ICMP, UDP, TCP and SCTP. LibAliasOutLocked() passes its "create" argument unmodified. We have only two consumers of LibAliasOutLocked() in the source tree calling it with "create" unequal to 1: "ipfw nat global" code and similar natd code having same problem. All other consumers of LibAliasOutLocked() call it with create = 1 and the patch is "no-op" for such cases. PR: 218968 Approved by: ae, vsevolod (mentor) MFC after: 1 week	2017-05-10 19:41:52 +00:00
tuexen	0deadecfc3	Allow SCTP to use the hostcache. This patch allows the MTU stored in the hostcache to be used as an initial value for SCTP paths. When an ICMP PTB message is received, store the MTU in the hostcache. MFC after: 1 week	2017-04-29 19:20:50 +00:00
tuexen	23c816e1ed	Don't set the DF-bit on timer based retransmissions. MFC after: 1 week	2017-04-29 09:57:27 +00:00
tuexen	5e5f20f8fc	Set the DF bit for responses to out-of-the-blue packets. MFC after: 1 week	2017-04-28 15:38:34 +00:00
tuexen	72585cfb69	Fix an issue with MTU calculation if an ICMP messaeg is received for an SCTP/UDP packet. MFC after: 1 week	2017-04-26 20:21:05 +00:00
tuexen	fcb532bf04	Use consistently uint32_t for mtu values. This does not change functionality, but this cleanup is need for further improvements of ICMP handling. MFC after: 1 week	2017-04-26 19:26:40 +00:00
tuexen	becaf5f7f2	When a SYN-ACK is received in SYN-SENT state, RFC 793 requires the validation of SEG.ACK as the first step. If the ACK is not acceptable, a RST segment should be sent and the segment should be dropped. Up to now, the segment was partially processed. This patch moves the check for the SEG.ACK validation up to the front as required. Reviewed by: hiren, gnn MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D10424	2017-04-26 06:20:58 +00:00
np	99afb2d021	Flush the LRO ctrl as soon as lro_mbufs fills up. There is no need to wait for the next enqueue from the driver. Reviewed by: gnn@, hselasky@, gallatin@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D10432	2017-04-24 22:35:00 +00:00
np	660b10131f	Frames that are not considered for LRO should not be counted in LRO statistics. Reviewed by: gnn@, hselasky@, gallatin@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D10430	2017-04-24 22:31:56 +00:00
brooks	35c0325946	Remove the NATM framework including the en(4), fatm(4), hatm(4), and patm(4) devices. Maintaining an address family and framework has real costs when we make infrastructure improvements. In the case of NATM we support no devices manufactured in the last 20 years and some will not even work in modern motherboards (some newer devices that patm(4) could be updated to support apparently exist, but we do not currently have support). With this change, support remains for some netgraph modules that don't require NATM support code. It is unclear if all these should remain, though ng_atmllc certainly stands alone. Note well: FreeBSD 11 supports NATM and will continue to do so until at least September 30, 2021. Improvements to the code in FreeBSD 11 are certainly welcome. Reviewed by: philip Approved by: harti	2017-04-24 21:21:49 +00:00
tuexen	e7f054bd8f	Represent "a syncache overflow hasn't happend yet" by using -(SYNCOOKIE_LIFETIME + 1) instead of INT64_MIN, since it is good enough and works when time_t is int32 or int64. This fixes the issue reported by cy@ on i386. Reported by: cy MFC after: 1 week Sponsored by: Netflix, Inc.	2017-04-21 06:05:34 +00:00
tuexen	129217562b	Syncoockies can be used in combination with the syncache. If the cache overflows, syncookies are used. This patch restricts the usage of syncookies in this case: accept syncookies only if there was an overflow of the syncache recently. This mitigates a problem reported in PR217637, where is syncookie was accepted without any recent drops. Thanks to glebius@ for suggesting an improvement. PR: 217637 Reviewed by: gnn, glebius MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D10272	2017-04-20 19:19:33 +00:00
np	45f27d78ad	Free lro_hash unconditionally, just like lro_mbuf_data a few lines later. Fix whitespace nit while here.	2017-04-19 23:06:07 +00:00
np	dc369db3bf	Do not leak lro_hash on failure to allocate lro_mbuf_data. MFC after: 1 week	2017-04-19 22:27:26 +00:00
np	df21ae8f03	Remove redundant assignment.	2017-04-19 22:20:41 +00:00
ae	3433e16f91	Rework r316770 to make it protocol independent and general, like we do for streaming sockets. And do more cleanup in the sbappendaddr_locked_internal() to prevent leak information from existing mbuf to the one, that will be possible created later by netgraph. Suggested by: glebius Tested by: Irina Liakh <spell at itl ua> MFC after: 1 week	2017-04-14 09:00:48 +00:00
ae	629029d020	Clear h/w csum flags on mbuf handled by UDP. When checksums of received IP and UDP header already checked, UDP uses sbappendaddr_locked() to pass received data to the socket. sbappendaddr_locked() uses given mbuf as is, and if NIC supports checksum offloading, mbuf contains csum_data and csum_flags that were calculated for already stripped headers. Some NICs support only limited checksums offloading and do not use CSUM_PSEUDO_HDR flag, and csum_data contains some value that UDP/TCP should use for pseudo header checksum calculation. When L2TP is used for tunneling with mpd5, ng_ksocket receives mbuf with filled csum_flags and csum_data, that were calculated for outer headers. When L2TP header is stripped, a packet that was tunneled goes to the IP layer and due to presence of csum_flags (without CSUM_PSEUDO_HDR) and csum_data, the UDP/TCP checksum check fails for this packet. Reported by: Irina Liakh <spell at itl ua> Tested by: Irina Liakh <spell at itl ua> MFC after: 1 week	2017-04-13 17:03:57 +00:00
tuexen	d7ad34a8ce	The sysctl variable net.inet.tcp.drop_synfin is not honored in all states, for example not in SYN-SENT. This patch adds code to check the sysctl variable in other states than LISTEN. Thanks to ae and gnn for providing comments. Reviewed by: gnn MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D9894	2017-04-12 20:27:15 +00:00
ae	f7d1b9ebc6	Make sysctl identifiers for direct netisr queue unique. Introduce IPCTL_INTRDQMAXLEN and IPCTL_INTRDQDROPS macros for this purpose. Reviewed by: gnn MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D10358	2017-04-11 19:20:20 +00:00
smh	42f4e00617	Use estimated RTT for receive buffer auto resizing instead of timestamps Switched from using timestamps to RTT estimates when performing TCP receive buffer auto resizing, as not all hosts support / enable TCP timestamps. Disabled reset of receive buffer auto scaling when not in bulk receive mode, which gives an extra 20% performance increase. Also extracted auto resizing to a common method shared between standard and fastpath modules. With this AWS S3 downloads at ~17ms latency on a 1Gbps connection jump from ~3MB/s to ~100MB/s using the default settings. Reviewed by: lstewart, gnn MFC after: 2 weeks Relnotes: Yes Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D9668	2017-04-10 08:19:35 +00:00
rstone	e52698ef42	Revert the optimization from r304436 r304436 attempted to optimize the handling of incoming UDP packet by only making an expensive call to in_broadcast() if the mbuf was marked as an broadcast packet. Unfortunately, this cannot work in the case of point-to- point L2 protocols like PPP, which have no notion of "broadcast". The optimization has been disabled for several months now with no progress towards fixing it, so it needs to go.	2017-04-05 16:57:13 +00:00
ae	5b90a3f01f	Add O_EXTERNAL_DATA opcode support. This opcode can be used to attach some data to external action opcode. And unlike to O_EXTERNAL_INSTANCE opcode, this opcode does not require creating of named instance to pass configuration arguments to external action handler. The data is coming just next to O_EXTERNAL_ACTION opcode. The userlevel part currenly supports formatting for opcode with ipfw_insn size, by default it expects u16 numeric value in the arg1. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2017-04-03 02:44:40 +00:00
smh	62514c9edc	Allow explicitly assigned IPv4 loopback address to be used in jails If a jail has an explicitly assigned loopback address then allow it to be used instead of remapping requests for the loopback adddress to the first IPv4 address assigned to the jail. This fixes issues where applications attempt to detect their bound port where they requested a loopback address, which was available, but instead the kernel remapped it to the jails first address. A example of this is binding nginx to 127.0.0.1 and then running "service nginx upgrade" which before this change would cause nginx to fail. Also: * Correct the description of prison_check_ip4_locked to match the code. MFC after: 2 weeks Relnotes: Yes Sponsored by: Multiplay	2017-03-31 00:41:54 +00:00
karels	4be5457efa	Enable route and LLE (ndp) caching in TCP/IPv6 tcp_output.c was using a route on the stack for IPv6, which does not allow route caching or LLE/ndp caching. Switch to using the route (v6 flavor) in the in_pcb, which was already present, which caches both L3 and L2 lookups. Reviewed by: gnn hiren MFC after: 2 weeks	2017-03-27 23:48:36 +00:00
karels	2563deee1a	Fix reference count leak with L2 caching. ip_forward, TCP/IPv6, and probably SCTP leaked references to L2 cache entry because they used their own routes on the stack, not in_pcb routes. The original model for route caching was callers that provided a route structure to ip{,6}input() would keep the route, and this model was used for L2 caching as well. Instead, change L2 caching to be done by default only when using a route structure in the in_pcb; the pcb deallocation code frees L2 as well as L3 cacches. A separate change will add route caching to TCP/IPv6. Another suggestion was to have the transport protocols indicate willingness to use L2 caching, but this approach keeps the changes in the network level Reviewed by: ae gnn MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D10059 and those below, will be ignored-- > Description of fields to fill in above: 76 columns --\| > PR: If and which Problem Report is related. > Submitted by: If someone else sent in the change. > Reported by: If someone else reported the issue. > Reviewed by: If someone else reviewed your modification. > Approved by: If you needed approval for this commit. > Obtained from: If the change is from a third party. > MFC after: N [day[s]\|week[s]\|month[s]]. Request a reminder email. > MFH: Ports tree branch name. Request approval for merge. > Relnotes: Set to 'yes' for mention in release notes. > Security: Vulnerability reference (one per line) or description. > Sponsored by: If the change was sponsored by an organization. > Differential Revision: https://reviews.freebsd.org/D### (full phabric URL needed). > Empty fields above will be automatically removed. M netinet/in_pcb.c M netinet/ip_output.c M netinet6/ip6_output.c	2017-03-25 15:06:28 +00:00
glebius	a58d0019c1	Force same alignment on struct xinpgen as we have on struct xinpcb. This fixes 32-bit builds.	2017-03-21 16:23:44 +00:00
glebius	3a5c9aaf2b	Hide struct inpcb, struct tcpcb from the userland. This is a painful change, but it is needed. On the one hand, we avoid modifying them, and this slows down some ideas, on the other hand we still eventually modify them and tools like netstat(1) never work on next version of FreeBSD. We maintain a ton of spares in them, and we already got some ifdef hell at the end of tcpcb. Details: - Hide struct inpcb, struct tcpcb under _KERNEL \|\| _WANT_FOO. - Make struct xinpcb, struct xtcpcb pure API structures, not including kernel structures inpcb and tcpcb inside. Export into these structures the fields from inpcb and tcpcb that are known to be used, and put there a ton of spare space. - Make kernel and userland utilities compilable after these changes. - Bump __FreeBSD_version. Reviewed by: rrs, gnn Differential Revision: D10018	2017-03-21 06:39:49 +00:00
vangyzen	d7b79e158f	Add some ntohl() love to r315277 inet_ntoa() and inet_ntoa_r() take the address in network byte-order. When I removed those calls, I should have replaced them with ntohl() to make the hex addresses slightly less unreadable. Here they are. See r315277 regarding classic blunders. vangyzen: you're deep in "no good deed" territory, it seems --badger Reported by: ian MFC after: 3 days MFC when: I finally get it right Sponsored by: Dell EMC	2017-03-14 20:57:54 +00:00
vangyzen	fad1018fc7	KTR: log IPv4 addresses in hex rather than dotted-quad When I made the changes in r313821, I fell victim to one of the classic blunders, the most famous of which is: never get involved in a land war in Asia. But only slightly less well known is this: Keep your brain turned on and engaged when making a tedious, sweeping, mechanical change. KTR can correctly log the immediate integral values passed to it, as well as constant strings, but not non-constant strings, since they might change by the time ktrdump retrieves them. Reported by: glebius MFC after: 3 days Sponsored by: Dell EMC	2017-03-14 18:27:48 +00:00
cem	d76c6282b3	alias_proxy.c: Fix accidental error quashing This was introduced on accident in r165243, when return sites were unified to add a lock around LibAliasProxyRule(). PR: 217749 Submitted by: Svyatoslav <razmyslov at viva64.com> Sponsored by: Viva64 (PVS-Studio)	2017-03-13 18:05:31 +00:00
ae	662d71cd1c	Fix the L2 address printed in the "arp: %s moved from %*D" message. In the r292978 struct llentry was changed and the ll_addr field become the pointer. PR: 217667 MFC after: 1 week	2017-03-11 04:57:52 +00:00
glebius	65907cccb6	Make inp_lock_assert() depend on INVARIANT_SUPPORT, not INVARIANTS. This will make INVARIANT-enabled modules, that use this function to load successfully on a kernel that has INVARIANT_SUPPORT only.	2017-03-09 00:55:19 +00:00
eri	f2a480c25c	The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets. Reviewed by: adrian, aw Approved by: ae (mentor) Sponsored by: rsync.net Differential Revision: D9235	2017-03-06 04:01:58 +00:00
imp	7e6cabd06e	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
tuexen	19883d1af3	TCP window updates are only sent if the window can be increased by at least 2 * MSS. However, if the receive buffer size is small, this might be impossible. Add back a criterion to send a TCP window update if the window can be increased by at least half of the receive buffer size. This condition was removed in r242252. This patch simply brings it back. PR: 211003 Reviewed by: gnn MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D9475	2017-02-23 18:14:36 +00:00
vangyzen	f36100ef2c	Remove inet_ntoa() from the kernel inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Remove it from the kernel. Suggested by: glebius, emaste Reviewed by: gnn MFC after: never Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:50:01 +00:00
vangyzen	c7c348accc	Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Instead, use inet_ntoa_r() with a buffer on the caller's stack. Suggested by: glebius, emaste Reviewed by: gnn MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:47:41 +00:00
ae	ec763bfc18	Add missing check to fix the build with IPSEC_SUPPORT and without MAC. Submitted by: netchild	2017-02-14 21:33:10 +00:00
ae	5a443cfa6c	Remove IPsec related PCB code from SCTP. The inpcb structure has inp_sp pointer that is initialized by ipsec_init_pcbpolicy() function. This pointer keeps strorage for IPsec security policies associated with a specific socket. An application can use IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options to configure these security policies. Then ip[6]_output() uses inpcb pointer to specify that an outgoing packet is associated with some socket. And IPSEC_OUTPUT() method can use a security policy stored in the inp_sp. For inbound packet the protocol-specific input routine uses IPSEC_CHECK_POLICY() method to check that a packet conforms to inbound security policy configured in the inpcb. SCTP protocol doesn't specify inpcb for ip[6]_output() when it sends packets. Thus IPSEC_OUTPUT() method does not consider such packets as associated with some socket and can not apply security policies from inpcb, even if they are configured. Since IPSEC_CHECK_POLICY() method is called from protocol-specific input routine, it can specify inpcb pointer and associated with socket inbound policy will be checked. But there are two problems: 1. Such check is asymmetric, becasue we can not apply security policy from inpcb for outgoing packet. 2. IPSEC_CHECK_POLICY() expects that caller holds INPCB lock and access to inp_sp is protected. But for SCTP this is not correct, becasue SCTP uses own locks to protect inpcb. To fix these problems remove IPsec related PCB code from SCTP. This imply that IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options will be not applicable to SCTP sockets. To be able correctly check inbound security policies for SCTP, mark its protocol header with the PR_LASTHDR flag. Reported by: tuexen Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D9538	2017-02-13 11:37:52 +00:00
eri	bb348ba959	Committed without approval from mentor. Reported by: gnn	2017-02-12 06:56:33 +00:00
rstone	cb359893cf	Don't zero out srtt after excess retransmits If the TCP stack has retransmitted more than 1/4 of the total number of retransmits before a connection drop, it decides that its current RTT estimate is hopelessly out of date and decides to recalculate it from scratch starting with the next ACK. Unfortunately, it implements this by zeroing out the current RTT estimate. Drop this hack entirely, as it makes it significantly more difficult to debug connection issues. Instead check for excessive retransmits at the point where srtt is updated from an ACK being received. If we've exceeded 1/4 of the maximum retransmits, discard the previous srtt estimate and replace it with the latest rtt measurement. Differential Revision: https://reviews.freebsd.org/D9519 Reviewed by: gnn Sponsored by: Dell EMC Isilon	2017-02-11 17:05:08 +00:00
glebius	c77bfbb1ee	Move tcp_fields_to_net() static inline into tcp_var.h, just below its friend tcp_fields_to_host(). There is third party code that also uses this inline. Reviewed by: ae	2017-02-10 17:46:26 +00:00
eri	2530138e32	Fix build after r313524 Reported-by: ohartmann@walstatt.org	2017-02-10 06:01:47 +00:00
eri	6898c4334b	Revert r313527 Heh svn is not git	2017-02-10 05:58:16 +00:00
eri	b429db62bc	Correct missed variable name. Reported-by: ohartmann@walstatt.org	2017-02-10 05:51:39 +00:00
eri	ed45b31494	The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets. Sponsored-by: rsync.net Differential Revision: D9235 Reviewed-by: adrian	2017-02-10 05:16:14 +00:00
vangyzen	a55a27b7c6	Fix garbage IP addresses in UDP log_in_vain messages If multiple threads emit a UDP log_in_vain message concurrently, the IP addresses could be garbage due to concurrent usage of a single string buffer inside inet_ntoa(). Use inet_ntoa_r() with two stack buffers instead. Reported by: Mark Martinec <Mark.Martinec+freebsd@ijs.si> MFC after: 3 days Relnotes: yes Sponsored by: Dell EMC	2017-02-07 18:57:57 +00:00
ae	0fb6ad528e	Merge projects/ipsec into head/. Small summary ------------- o Almost all IPsec releated code was moved into sys/netipsec. o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules. o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA. o New network pseudo interface if_ipsec(4) added. For now it is build as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs. o The network stack now invokes IPsec functions using special methods. The only one header file <netipsec/ipsec_support.h> should be included to declare all the needed things to work with IPsec. o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods. o TCP_SIGNATURE support was reworked to be more close to RFC. o PF_KEY SADB was reworked: - now all security associations stored in the single SPI namespace, and all SAs MUST have unique SPI. - several hash tables added to speed up lookups in SADB. - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups in the same time. - many PF_KEY message handlers were reworked to reflect changes in SADB. - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses. o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only limited number (4) of bundled SAs, but they are supported for both INET and INET6. o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet. o For inbound security policies added the mode, when the kernel does check for full history of applied IPsec transforms. o References counting rules for security policies and security associations were changed. The proper SA locking added into xform code. o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting. Reviewed by: gnn, wblock Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9352	2017-02-06 08:49:57 +00:00
pkelsey	6c91473476	Fix VIMAGE-related bugs in TFO. The autokey callout vnet context was not being initialized, and the per-vnet fastopen context was only being initialized for the default vnet. PR: 216613 Reported by: Alex Deiter <alex dot deiter at gmail dot com> MFC after: 1 week	2017-02-03 17:02:57 +00:00
gnn	5354226f97	Add an mbuf to ipinfo_t translator to finish cleanup of mbuf passing to TCP probes. Reviewed by: markj MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9401	2017-02-01 19:33:00 +00:00
tuexen	432d67eb8e	Ensure that the variable bail is always initialized before used. MFC after: 1 week	2017-02-01 00:10:29 +00:00
tuexen	695d1ccda1	Take the SCTP common header into account when computing the space available for chunks. This unbreaks the handling of ICMPV6 packets indicating "packet too big". It just worked for IPv4 since we are overbooking for IPv4. MFC after: 1 week	2017-01-31 23:36:31 +00:00
tuexen	b4b6fb9832	Remove a duplicate debug statement. MFC after: 1 week	2017-01-31 23:34:02 +00:00
cy	230ac6bb88	Correct comment grammar and make it easier to understand. MFC after: 1 week	2017-01-30 04:51:18 +00:00
hiren	845c485521	Add a knob to change default behavior of inheriting listen socket's tcp stack regardless of what the default stack for the system is set to. With current/default behavior, after changing the default tcp stack, the application needs to be restarted to pick up that change. Setting this new knob net.inet.tcp.functions_inherit_listen_socket_stack to '0' would change that behavior and make any new connection use the newly selected default tcp stack. Reviewed by: rrs MFC after: 2 weeks Sponsored by: Limelight Networks	2017-01-27 23:10:46 +00:00
loos	5bd3158a3b	After the in_control() changes in r257692, an existing address is (intentionally) deleted first and then completely added again (so all the events, announces and hooks are given a chance to run). This cause an issue with CARP where the existing CARP data structure is removed together with the last address for a given VHID, which will cause a subsequent fail when the address is later re-added. This change fixes this issue by adding a new flag to keep the CARP data structure when an address is not being removed. There was an additional issue with IPv6 CARP addresses, where the CARP data structure would never be removed after a change and lead to VHIDs which cannot be destroyed. Reviewed by: glebius Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2017-01-25 19:04:08 +00:00
tuexen	bd0169975f	Fix a bug where the overhead of the I-DATA chunk was not considered. MFC after: 1 week	2017-01-24 21:30:36 +00:00
hselasky	efa6326974	Implement kernel support for hardware rate limited sockets. - Add RATELIMIT kernel configuration keyword which must be set to enable the new functionality. - Add support for hardware driven, Receive Side Scaling, RSS aware, rate limited sendqueues and expose the functionality through the already established SO_MAX_PACING_RATE setsockopt(). The API support rates in the range from 1 to 4Gbytes/s which are suitable for regular TCP and UDP streams. The setsockopt(2) manual page has been updated. - Add rate limit function callback API to "struct ifnet" which supports the following operations: if_snd_tag_alloc(), if_snd_tag_modify(), if_snd_tag_query() and if_snd_tag_free(). - Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT flag, which tells if a network driver supports rate limiting or not. - This patch also adds support for rate limiting through VLAN and LAGG intermediate network devices. - How rate limiting works: 1) The userspace application calls setsockopt() after accepting or making a new connection to set the rate which is then stored in the socket structure in the kernel. Later on when packets are transmitted a check is made in the transmit path for rate changes. A rate change implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the destination network interface, which then sets up a custom sendqueue with the given rate limitation parameter. A "struct m_snd_tag" pointer is returned which serves as a "snd_tag" hint in the m_pkthdr for the subsequently transmitted mbufs. 2) When the network driver sees the "m->m_pkthdr.snd_tag" different from NULL, it will move the packets into a designated rate limited sendqueue given by the snd_tag pointer. It is up to the individual drivers how the rate limited traffic will be rate limited. 3) Route changes are detected by the NIC drivers in the ifp->if_transmit() routine when the ifnet pointer in the incoming snd_tag mismatches the one of the network interface. The network adapter frees the mbuf and returns EAGAIN which causes the ip_output() to release and clear the send tag. Upon next ip_output() a new "snd_tag" will be tried allocated. 4) When the PCB is detached the custom sendqueue will be released by a non-blocking ifp->if_snd_tag_free() call to the currently bound network interface. Reviewed by: wblock (manpages), adrian, gallatin, scottl (network) Differential Revision: https://reviews.freebsd.org/D3687 Sponsored by: Mellanox Technologies MFC after: 3 months	2017-01-18 13:31:17 +00:00
sobomax	701697521c	Add a new socket option SO_TS_CLOCK to pick from several different clock sources to return timestamps when SO_TIMESTAMP is enabled. Two additional clock sources are: o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME); o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC). In addition to this, this option provides unified interface to get bintime (equivalent of using SO_BINTIME), except it also supported with IPv6 where SO_BINTIME has never been supported. The long term plan is to depreciate SO_BINTIME and move everything to using SO_TS_CLOCK. Idea for this enhancement has been briefly discussed on the Net session during dev summit in Ottawa last June and the general input was positive. This change is believed to benefit network benchmarks/profiling as well as other scenarios where precise time of arrival measurement is necessary. There are two regression test cases as part of this commit: one extends unix domain test code (unix_cmsg) to test new SCM_XXX types and another one implementis totally new test case which exchanges UDP packets between two processes using both conventional methods (i.e. calling clock_gettime(2) before recv(2) and after send(2)), as well as using setsockopt()+recv() in receive path. The resulting delays are checked for sanity for all supported clock types. Reviewed by: adrian, gnn Differential Revision: https://reviews.freebsd.org/D9171	2017-01-16 17:46:38 +00:00
cem	6d240aee91	Fix a variety of cosmetic typos and misspellings No functional change. PR: 216096, 216097, 216098, 216101, 216102, 216106, 216109, 216110 Reported by: Bulat <bltsrc at mail.ru> Sponsored by: Dell EMC Isilon	2017-01-15 18:00:45 +00:00
glebius	23ed1207c8	Use getsock_cap() instead of deprecated fgetsock(). Reviewed by: tuexen	2017-01-13 16:54:44 +00:00
tuexen	36cef24e89	Ensure that the buffer length and the length provided in the IPv4 header match when using a raw socket to send IPv4 packets and providing the header. If they don't match, let send return -1 and set errno to EINVAL. Before this patch is was only enforced that the length in the header is not larger then the buffer length. PR: 212283 Reviewed by: ae, gnn MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9161	2017-01-13 10:55:26 +00:00
sobomax	d54f5e8994	Fix slight type mismatch between so_options defined in sys/socketvar.h and tw_so_options defined here which is supposed to be a copy of the former (short vs u_short respectively). Switch tw_so_options to be "signed short" to match the type of the field it's inherited from.	2017-01-12 10:14:54 +00:00
hiren	9a5ee970a5	sysctl net.inet.tcp.hostcache.list in a jail can see connections from other jails and the host. This commit fixes it. PR: 200361 Submitted by: bz (original version), hiren (minor corrections) Reported by: Marcus Reid <marcus at blazingdot dot com> Reviewed by: bz, gnn Tested by: Lohith Bellad <lohithbsd at gmail dot com> MFC after: 1 week Sponsored by: Limelight Networks (minor corrections)	2017-01-05 17:22:09 +00:00

... 3 4 5 6 7 ...

6064 Commits