Commit Graph

6120 Commits

Author SHA1 Message Date
Matt Macy
3db28e6656 avoid 'tcp_outflags defined but not used' 2018-06-08 17:37:49 +00:00
Matt Macy
afbd6cfa72 hpts: remove redundant decl breaking gcc build 2018-06-08 17:37:43 +00:00
Randall Stewart
89e560f441 This commit brings in a new refactored TCP stack called Rack.
Rack includes the following features:
 - A different SACK processing scheme (the old sack structures are not used).
 - RACK (Recent acknowledgment) where counting dup-acks is no longer done
        instead time is used to knwo when to retransmit. (see the I-D)
 - TLP (Tail Loss Probe) where we will probe for tail-losses to attempt
        to try not to take a retransmit time-out. (see the I-D)
 - Burst mitigation using TCPHTPS
 - PRR (partial rate reduction) see the RFC.

Once built into your kernel, you can select this stack by either
socket option with the name of the stack is "rack" or by setting
the global sysctl so the default is rack.

Note that any connection that does not support SACK will be kicked
back to the "default" base  FreeBSD stack (currently known as "default").

To build this into your kernel you will need to enable in your
kernel:
   makeoptions WITH_EXTRA_TCP_STACKS=1
   options TCPHPTS

Sponsored by:	Netflix Inc.
Differential Revision:		https://reviews.freebsd.org/D15525
2018-06-07 18:18:13 +00:00
Michael Tuexen
ff34bbe9c2 Improve compliance with RFC 4895 and RFC 6458.
Silently dicard SCTP chunks which have been requested to be
authenticated but are received unauthenticated no matter if support
for SCTP authentication has been negotiated. This improves compliance
with RFC 4895.

When the application uses the SCTP_AUTH_CHUNK socket option to
request a chunk to be received in an authenticated way, enable
the SCTP authentication extension for the end-point. This improves
compliance with RFC 6458.

Discussed with:	Peter Lei
MFC after:	3 days
2018-06-06 19:27:06 +00:00
Sean Bruno
1a43cff92a Load balance sockets with new SO_REUSEPORT_LB option.
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations:
As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).

This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967.  Thanks to rwatson@
for the substantive feedback that is included in this commit.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Obtained from:	DragonflyBSD
Relnotes:	Yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003
2018-06-06 15:45:57 +00:00
Andrey V. Elsukov
590d0a43b6 Make in_delayed_cksum() be similar to IPv6 implementation.
Use m_copyback() function to write checksum when it isn't located
in the first mbuf of the chain. Handmade analog doesn't handle the
case when parts of checksum are located in different mbufs.
Also in case when mbuf is too short, m_copyback() will allocate new
mbuf in the chain instead of making out of bounds write.

Also wrap long line and remove now useless KASSERTs.

X-MFC after:	r334705
2018-06-06 13:01:53 +00:00
Tom Jones
1fdbfb909f Use UDP len when calculating UDP checksums
The length of the IP payload is normally equal to the UDP length, UDP Options
(draft-ietf-tsvwg-udp-options-02) suggests using the difference between IP
length and UDP length to create space for trailing data.

Correct checksum length calculation to use the UDP length rather than the IP
length when not offloading UDP checksums.

Approved by: jtl (mentor)
Differential Revision:	https://reviews.freebsd.org/D15222
2018-06-06 07:04:40 +00:00
Andrey V. Elsukov
b941bc1d6e Rework if_gif(4) to use new encap_lookup_t method to speedup lookup
of needed interface when many gif interfaces are present.

Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations.
Use hash table to speedup lookup of needed softc. Interfaces
with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST.
Sysctl net.link.gif.parallel_tunnels is removed. The removal was planed
16 years ago, and actually it could work only for outbound direction.
Each protocol, that can be handled by if_gif(4) interface is registered
by separate encap handler, this helps avoid invoking the handler
for unrelated protocols (GRE, PIM, etc.).

This change allows dramatically improve performance when many gif(4)
interfaces are used.

Sponsored by:	Yandex LLC
2018-06-05 21:24:59 +00:00
Andrey V. Elsukov
6d8fdfa9d5 Rework IP encapsulation handling code.
Currently it has several disadvantages:
- it uses single mutex to protect internal structures. It is used by
  data- and control- path, thus there are no parallelism at all.
- it uses single list to keep encap handlers for both INET and INET6
  families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw),
  that isn't used by code in the source tree.
- matches are prioritized and when many tunneling interfaces are
  registered, encapcheck handler of each interface is invoked for each
  packet. The search takes O(n) for n interfaces. All this work is done
  with exclusive lock held.

What this patch includes:
- the datapath is converted to be lockless using epoch(9) KPI.
- struct encaptab now linked using CK_LIST.
- all unused fields removed from struct encaptab. Several new fields
  addedr: min_length is the minimum packet length, that encapsulation
  handler expects to see; exact_match is maximum number of bits, that
  can return an encapsulation handler, when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists;
- added new "encap_lookup_t" method, that will be used later. It is
  targeted to speedup lookup of needed interface, when gif(4)/gre(4) have
  many interfaces.
- the need to use protosw structure is eliminated. The only pr_input
  method was used from this structure, so I don't see the need to keep
  using it.
- encap_input_t method changed to avoid using mbuf tags to store softc
  pointer. Now it is passed directly trough encap_input_t method.
  encap_getarg() funtions is removed.
- all sockaddr structures and code that uses them removed. We don't have
  any code in the tree that uses them. All consumers use encap_attach_func()
  method, that relies on invoking of encapcheck() to determine the needed
  handler.
- introduced struct encap_config, it contains parameters of encap handler
  that is going to be registered by encap_attach() function.
- encap handlers are stored in lists ordered by exact_match value, thus
  handlers that need more bits to match will be checked first, and if
  encapcheck method returns exact_match value, the search will be stopped.
- all current consumers changed to use new KPI.

Reviewed by:	mmacy
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15617
2018-06-05 20:51:01 +00:00
Mateusz Guzik
34c538c356 malloc: try to use builtins for zeroing at the callsite
Plenty of allocation sites pass M_ZERO and sizes which are small and known
at compilation time. Handling them internally in malloc loses this information
and results in avoidable calls to memset.

Instead, let the compiler take the advantage of it whenever possible.

Discussed with:	jeff
2018-06-02 22:20:09 +00:00
Michael Tuexen
13500cbb61 Don't overflow a buffer if we receive an INIT or INIT-ACK chunk
without a RANDOM parameter but with a CHUNKS or HMAC-ALGO parameter.
Please note that sending this combination violates the specification.

Thnanks to Ronald E. Crane for reporting the issue for the userland
stack.

MFC after:	3 days
2018-06-02 16:28:10 +00:00
Michael Tuexen
c14f9fe5ef Limit the retransmission timer for SYN-ACKs by TCPTV_REXMTMAX.
Use the same logic to handle the SYN-ACK retransmission when sent from
the syn cache code as when sent from the main code.

MFC after:	3 days
Sponsored by:	Netflix, Inc.
2018-06-01 21:24:27 +00:00
Michael Tuexen
badef00d58 Ensure net.inet.tcp.syncache.rexmtlimit is limited by TCP_MAXRXTSHIFT.
If the sysctl variable is set to a value larger than TCP_MAXRXTSHIFT+1,
the array tcp_syn_backoff[] is accessed out of bounds.

Discussed with: jtl@
MFC after:	3 days
Sponsored by:	Netflix, Inc.
2018-06-01 19:58:19 +00:00
Andrey V. Elsukov
12e7376216 Remove empty encap_init() function.
MFC after:	2 weeks
2018-05-29 12:32:08 +00:00
Michael Tuexen
eef8d4a973 Use correct mask.
Introduced in https://svnweb.freebsd.org/changeset/base/333603.
Thanks to Irene Ruengler for testing and reporting the issue.

MFC after:		1 week
X-MFC-with:		333603
2018-05-28 13:31:47 +00:00
Matt Macy
62d733a11d in_pcbladdr: remove debug code that snuck in with ifa epoch conversion r334118 2018-05-27 06:47:09 +00:00
Matt Macy
0f8d79d977 CK: update consumers to use CK macros across the board
r334189 changed the fields to have names distinct from those in queue.h
in order to expose the oversights as compile time errors
2018-05-24 23:21:23 +00:00
Matt Macy
fe524329e4 convert allocations to INVARIANTS M_ZERO 2018-05-24 01:04:56 +00:00
Matt Macy
4f6c66cc9c UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
Matt Macy
630ba2c514 udp: assign flowid to udp sockets round-robin
On a 2x8x2 SKL this increases measured throughput with 64
netperf -H $DUT -t UDP_STREAM -- -m 1

from 590kpps to 1.1Mpps
before:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender.svg
after:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg
2018-05-23 20:50:09 +00:00
Matt Macy
246a619924 epoch: allow for conditionally asserting that the epoch context fields
are unused by zeroing on INVARIANTS builds
2018-05-23 17:00:05 +00:00
Mark Johnston
9f78e2b83d Initialize the dumper struct before calling set_dumper().
Fields owned by the generic code were being left uninitialized,
causing problems in clear_dumper() if an error occurred.

Coverity CID:	1391200
X-MFC with:	r333283
2018-05-22 16:01:56 +00:00
Matt Macy
f42a83f2a6 inpcb: revert deferred inpcb free pending further review 2018-05-21 16:13:43 +00:00
Michael Tuexen
c692df45fc Only fillin data srucuture when actually stored. 2018-05-21 14:53:22 +00:00
Michael Tuexen
d3132db2b5 Do the appropriate accounting when ip_output() fails. 2018-05-21 14:52:18 +00:00
Michael Tuexen
95844fce7d Make clear why there is an assignment, which is not necessary. 2018-05-21 14:51:20 +00:00
Ed Maste
3b9b6b1704 Pair CURVNET_SET and CURVNET_RESTORE in a block
Per vnet(9), CURVNET_SET and CURVNET_RESTORE cannot be used as a single
statement for a conditional and CURVNET_RESTORE must be in the same
block as CURVNET_SET (or a subblock).

Reviewed by:	andrew
Sponsored by:	The FreeBSD Foundation
2018-05-21 13:08:44 +00:00
Ed Maste
15f8acc53f Revert r333968, it broke all archs but i386 and amd64 2018-05-21 11:56:07 +00:00
Matt Macy
ed6bb714b2 in(6)_mcast: Expand out vnet set / restore macro so that they work in a conditional block
Reported by:	zec at fer.hr
2018-05-21 08:34:10 +00:00
Matt Macy
06b15160e1 ensure that vnet is set when doing in_leavegroup 2018-05-21 07:12:06 +00:00
Matt Macy
1a3d880c26 in(s)_moptions: free before tearing down inpcb 2018-05-20 20:08:21 +00:00
Matt Macy
47d2a58560 inpcb: defer destruction of inpcb until after a grace period has elapsed
in_pcbfree will remove the incpb from the list and release the rtentry
while the vnet is set, but the actual destruction will be deferred
until any threads in a (not yet used) epoch section, no longer potentially
have references.
2018-05-20 04:38:04 +00:00
Matt Macy
056b40e29c inpcb: consolidate possible deletion in pcblist functions in to epoch
deferred context.
2018-05-20 02:27:58 +00:00
Matt Macy
ddece765f3 in_pcb: add helper for deferring inpcb rele calls from list functions 2018-05-20 02:17:30 +00:00
Matt Macy
cb6bb2303e ip(6)_freemoptions: defer imo destruction to epoch callback task
Avoid the ugly unlock / lock of the inpcbinfo where we need to
figure out what kind of lock we hold by simply deferring the
operation to another context. (Also a small dependency for
converting the pcbinfo read lock to epoch)
2018-05-20 00:22:28 +00:00
Matt Macy
f6960e207e netinet silence warnings 2018-05-19 05:56:21 +00:00
Matt Macy
7a3c5b05e4 tcp sysctl fix may be uninitialized 2018-05-19 05:55:31 +00:00
Matt Macy
21f6fe2d7c tcp fastopen: fix may be uninitialized 2018-05-19 05:55:00 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Mark Johnston
b35822d9d0 Fix netdump configuration when VIMAGE is enabled.
We need to set the current vnet before iterating over the global
interface list. Because the dump device may only be set from the host,
only proceed with configuration if the thread belongs to the default
vnet. [1]

Also fix a resource leak that occurs if the priv_check() in set_dumper()
fails.

Reported by:	mmacy, sbruno [1]
Reviewed by:	sbruno
X-MFC with:	r333283
Differential Revision:	https://reviews.freebsd.org/D15449
2018-05-17 04:08:57 +00:00
Lawrence Stewart
9891578a40 Plug a memory leak and potential NULL-pointer dereference introduced in r331214.
Each TCP connection that uses the system default cc_newreno(4) congestion
control algorithm module leaks a "struct newreno" (8 bytes of memory) at
connection initialisation time. The NULL-pointer dereference is only germane
when using the ABE feature, which is disabled by default.

While at it:

- Defer the allocation of memory until it is actually needed given that ABE is
  optional and disabled by default.

- Document the ENOMEM errno in getsockopt(2)/setsockopt(2).

- Document ENOMEM and ENOBUFS in tcp(4) as being synonymous given that they are
  used interchangeably throughout the code.

- Fix a few other nits also accidentally omitted from the original patch.

Reported by:	Harsh Jain on freebsd-net@
Tested by:	tjh@
Differential Revision:	https://reviews.freebsd.org/D15358
2018-05-17 02:46:27 +00:00
Brooks Davis
514ef08caf Unwrap a line that no longer requires wrapping. 2018-05-15 20:14:38 +00:00
Brooks Davis
423349f696 Remove stray tabs from in_lltable_dump_entry(). 2018-05-15 20:13:00 +00:00
Stephen Hurd
f2cf90e264 Check that ifma_protospec != NULL in inm_lookup
If ifma_protospec is NULL when inm_lookup() is called, there
is a dereference in a NULL struct pointer. This ensures that struct is
not NULL before comparing the address.

Reported by:	dumbbell
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15440
2018-05-15 16:54:41 +00:00
Michael Tuexen
482d420926 sctp_get_mbuf_for_msg() should honor the allinone parameter.
When it is not required that the buffer is not a chain, return
a chain. This is based on a patch provided by Irene Ruengeler.
2018-05-14 15:16:51 +00:00
Michael Tuexen
589c42c2c8 Ensure that the MTU's used are multiple of 4.
The length of SCTP packets is always a multiple of 4. Therefore,
ensure that the MTUs used are also a multiple of 4.

Thanks to Irene Ruengeler for providing an earlier version of this
patch.

MFC after:	1 week
2018-05-14 13:50:17 +00:00
Stephen Hurd
b69888c28f Fix LORs in in6?_leave_group()
r333175 updated the join_group functions, but not the leave_group ones.

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15393
2018-05-11 21:42:27 +00:00
Warner Losh
33123867af Use the full year, for real this time. 2018-05-09 20:26:37 +00:00
Warner Losh
603bbd0631 Minor style nits
Use full copyright year.
Remove 'All Rights Reserved' from new file (rights holder OK'd)
Minor #ifdef motion and #endif tagging
Remove __FBSDID macro from comments

Sponsored by: Netflix
OK'd by: rrs@
2018-05-09 14:11:35 +00:00
Michael Tuexen
45d41de5e6 Fix two typos reported by N. J. Mann, which were introduced in
https://svnweb.freebsd.org/changeset/base/333382 by me.

MFC after:	3 days
2018-05-08 20:39:35 +00:00
Michael Tuexen
9669e724d1 When reporting ERROR or ABORT chunks, don't use more data
that is guaranteed to be contigous.
Thanks to Felix Weinrank for finding and reporting this bug
by fuzzing the usrsctp stack.

MFC after:	3 days
2018-05-08 18:48:51 +00:00
Matt Macy
10d20c84ed Fix spurious retransmit recovery on low latency networks
TCP's smoothed RTT (SRTT) can be much larger than an actual observed RTT. This can be either because of hz restricting the calculable RTT to 10ms in VMs or 1ms using the default 1000hz or simply because SRTT recently incorporated a larger value.

If an ACK arrives before the calculated badrxtwin (now + SRTT):
tp->t_badrxtwin = ticks + (tp->t_srtt >> (TCP_RTT_SHIFT + 1));

We'll erroneously reset snd_una to snd_max. If multiple segments were dropped and this happens repeatedly the transmit rate will be limited to 1MSS per RTO until we've retransmitted all drops.

Reported by:	rstone
Reviewed by:	hiren, transport
Approved by:	sbruno
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8556
2018-05-08 02:22:34 +00:00
Alexander Motin
167a34407c Keep CARP state as INIT when net.inet.carp.allow=0.
Currently when net.inet.carp.allow=0 CARP state remains as MASTER, which is
not very useful (if there are other masters -- it can lead to split brain,
if there are none -- it makes no sense).  Having it as INIT makes it clear
that carp packets are disabled.

Submitted by:	wg
MFC after:	1 month
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14477
2018-05-07 14:44:55 +00:00
Matt Macy
b6f6f88018 r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl
to sleep on commands to the NIC when updating multicast filters. More generally this permitted
driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a
a multicast update would still be queued for deletion when ifconfig deleted the interface
thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses
on the interface.

Synchronously remove all external references to a multicast address before enqueueing for delete.

Reported by:	lwhsu
Approved by:	sbruno
2018-05-06 20:34:13 +00:00
Michael Tuexen
67e8b08bbe Ensure we are not dereferencing a NULL pointer.
This was found by Coverity scanning the usrsctp stack (CID 203808).

MFC after:	3 days
2018-05-06 14:19:50 +00:00
Mark Johnston
e505460228 Import the netdump client code.
This is a component of a system which lets the kernel dump core to
a remote host after a panic, rather than to a local storage device.
The server component is available in the ports tree. netdump is
particularly useful on diskless systems.

The netdump(4) man page contains some details describing the protocol.
Support for configuring netdump will be added to dumpon(8) in a future
commit. To use netdump, the kernel must have been compiled with the
NETDUMP option.

The initial revision of netdump was written by Darrell Anderson and
was integrated into Sandvine's OS, from which this version was derived.

Reviewed by:	bdrewery, cem (earlier versions), julian, sbruno
MFC after:	1 month
X-MFC note:	use a spare field in struct ifnet
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15253
2018-05-06 00:38:29 +00:00
Matt Macy
28c001002a Currently in_pcbfree will unconditionally wunlock the pcbinfo lock
to avoid a LOR on the multicast list lock in the freemoptions routines.
As it turns out, tcp_usr_detach can acquire the tcbinfo lock readonly.
Trying to wunlock the pcbinfo lock in that context has caused a number
of reported crashes.

This change unclutters in_pcbfree and moves the handling of wunlock vs
runlock of pcbinfo to the freemoptions routine.

Reported by:	mjg@, bde@, o.hartmann at walstatt.org
Approved by:	sbruno
2018-05-05 22:40:40 +00:00
Andrey V. Elsukov
5ada542398 Immediately propagate EACCES error code to application from tcp_output.
In r309610 and r315514 the behavior of handling EACCES was changed, and
tcp_output() now returns zero when EACCES happens. The reason of this
change was a hesitation that applications that use TCP-MD5 will be
affected by changes in project/ipsec.

TCP-MD5 code returns EACCES when security assocition for given connection
is not configured. But the same error code can return pfil(9), and this
change has affected connections blocked by pfil(9). E.g. application
doesn't return immediately when SYN segment is blocked, instead it waits
when several tries will be failed.

Actually, for TCP-MD5 application it doesn't matter will it get EACCES
after first SYN, or after several tries. Security associtions must be
configured before initiating TCP connection.

I left the EACCES in the switch() to show that it has special handling.

Reported by:	Andreas Longwitz <longwitz at incore dot de>
MFC after:	10 days
2018-05-04 09:28:12 +00:00
Sean Bruno
68ff29affa cc_cubic:
- Update cubic parameters to draft-ietf-tcpm-cubic-04

Submitted by:	Matt Macy <mmacy@mattmacy.io>
Reviewed by:	lstewart
Differential Revision:	https://reviews.freebsd.org/D10556
2018-05-03 15:01:27 +00:00
Michael Tuexen
4c6a10903f SImplify the call to tcp_drop(), since the handling of soft error
is also done in tcp_drop(). No functional change.

Sponsored by:	Netflix, Inc.
2018-05-02 20:04:31 +00:00
Stephen Hurd
f3e1324b41 Separate list manipulation locking from state change in multicast
Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.

Submitted by:	mmacy <mmacy@mattmacy.io>
Reviewed by:	shurd, sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14969
2018-05-02 19:36:29 +00:00
Randall Stewart
803a230592 This change re-arranges the fields within the tcp-pcb so that
they are more in order of cache line use as one passes
through the tcp_input/output paths (non-errors most likely path). This
helps speed up cache line optimization so that the tcp stack runs
a bit more efficently.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D15136
2018-04-26 21:41:16 +00:00
Sean Bruno
7875017ca9 Revert r332894 at the request of the submitter.
Submitted by:	Johannes Lundberg <johalun0_gmail.com>
Sponsored by:	Limelight Networks
2018-04-24 19:55:12 +00:00
Sean Bruno
7b7796eea5 Load balance sockets with new SO_REUSEPORT_LB option
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations
As DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).

Submitted by:	Johannes Lundberg <johanlun0@gmail.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003
2018-04-23 19:51:00 +00:00
Randall Stewart
fd389e7cd5 These two modules need the tcp_hpts.h file for
when the option is enabled (not sure how LINT/build-universe
missed this) opps.

Sponsored by:	Netflix Inc
2018-04-19 15:03:48 +00:00
Randall Stewart
3ee9c3c4eb This commit brings in the TCP high precision timer system (tcp_hpts).
It is the forerunner/foundational work of bringing in both Rack and BBR
which use hpts for pacing out packets. The feature is optional and requires
the TCPHPTS option to be enabled before the feature will be active. TCP
modules that use it must assure that the base component is compile in
the kernel in which they are loaded.

MFC after:	Never
Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D15020
2018-04-19 13:37:59 +00:00
Brooks Davis
3a4fc8a8a1 Remove support for the Arcnet protocol.
While Arcnet has some continued deployment in industrial controls, the
lack of drivers for any of the PCI, USB, or PCIe NICs on the market
suggests such users aren't running FreeBSD.

Evidence in the PR database suggests that the cm(4) driver (our sole
Arcnet NIC) was broken in 5.0 and has not worked since.

PR:		182297
Reviewed by:	jhibbits, vangyzen
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15057
2018-04-13 21:18:04 +00:00
Brooks Davis
0437c8e3b1 Remove support for FDDI networks.
Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).

Reviewed by:	kib, jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15017
2018-04-11 17:28:24 +00:00
Jonathan T. Looney
f897951965 Add missing header change from r332381.
Sponsored by:	Netflix, Inc.
Pointy hat:	jtl
2018-04-10 17:00:37 +00:00
Jonathan T. Looney
8b8718b5b0 Modify the net.inet.tcp.function_ids sysctl introduced in r331347.
Export additional information which may be helpful to userspace
consumers and rename the sysctl to net.inet.tcp.function_info.

Sponsored by:	Netflix, Inc.
2018-04-10 16:59:36 +00:00
Jonathan T. Looney
dcaffbd6fb Move the TCP Blackbox Recorder probe in tcp_output.c to be with the
other tracing/debugging code.

Sponsored by:	Netflix, Inc.
2018-04-10 15:54:29 +00:00
Jonathan T. Looney
4889b58ce8 Clean up some debugging code left in tcp_log_buf.c from r331347.
Sponsored by:	Netflix, Inc.
2018-04-10 15:51:37 +00:00
Michael Tuexen
efcf28ef77 Fix a logical inversion bug.
Thanks to Irene Ruengeler for finding and reporting this bug.

MFC after:	3 days
2018-04-08 12:08:20 +00:00
Michael Tuexen
3bcfd10fe1 Small cleanup, no functional change.
MFC after:	3 days
2018-04-08 11:50:06 +00:00
Michael Tuexen
c3115feb7e Fix a signed/unsigned warning showing up for the userland stack
on some platforms.
Thanks to Felix Weinrank for reporting the issue.

MFC after:i	3 days
2018-04-08 11:37:00 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Jonathan T. Looney
8fa799bd74 If a user closes the socket before we call tcp_usr_abort(), then
tcp_drop() may unlock the INP.  Currently, tcp_usr_abort() does not
check for this case, which results in a panic while trying to unlock
the already-unlocked INP (not to mention, a use-after-free violation).

Make tcp_usr_abort() check the return value of tcp_drop(). In the case
where tcp_drop() returns NULL, tcp_usr_abort() can skip further steps
to abort the connection and simply unlock the INP_INFO lock prior to
returning.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
2018-04-06 17:20:37 +00:00
Jonathan T. Looney
45a48d07c3 Check that in_pcbfree() is only called once for each PCB. If that
assumption is violated, "bad things" could follow.

I believe such an assert would have detected some of the problems jch@
was chasing in PR 203175 (see r307551).  We also use it in our internal
TCP development efforts.  And, in case a bug does slip through to
released code, this change silently ignores subsequent calls to
in_pcbfree().

Reviewed by:	rrs
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D14990
2018-04-06 16:48:11 +00:00
Ed Maste
c73b6f4da9 Fix kernel memory disclosure in tcp_ctloutput
strcpy was used to copy a string into a buffer copied to userland, which
left uninitialized data after the terminating 0-byte.  Use the same
approach as in tcp_subr.c: strncpy and explicit '\0'.

admbugs:	765, 822
MFC after:	1 day
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reported by:	Vlad Tsyrklevich
Security:	Kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
2018-04-04 21:12:35 +00:00
Jonathan T. Looney
4b9dc36454 r330675 introduced an extra window check in the LRO code to ensure it
captured and reported the highest window advertisement with the same
SEQ/ACK.  However, the window comparison uses modulo 2**16 math, rather
than directly comparing the absolute values.  Because windows use
absolute values and not modulo 2**16 math (i.e. they don't wrap), we
need to compare the absolute values.

Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D14937
2018-04-03 13:54:38 +00:00
Navdeep Parhar
a64564109a Add a hook to allow the toedev handling an offloaded connection to
provide accurate TCP_INFO.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14816
2018-04-03 01:08:54 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Navdeep Parhar
b9a9e8e9bd Fix RSS build (broken in r331309).
Sponsored by:	Chelsio Communications
2018-03-29 19:48:17 +00:00
Brooks Davis
69f0fecbd6 Remove infrastructure for token-ring networks.
Reviewed by:	cem, imp, jhb, jmallett
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14875
2018-03-28 23:33:26 +00:00
Sean Bruno
e2041bfa7c CC Cubic: fix underflow for cubic_cwnd()
Singed calculations in cubic_cwnd() can result in negative cwnd
value which is then cast to an unsigned value. Values less than
1 mss are generally bad for other parts of the code, also fixed.

Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14141
2018-03-26 19:53:36 +00:00
Jonathan T. Looney
e24e568336 Make the TCP blackbox code committed in r331347 be an optional feature
controlled by the TCP_BLACKBOX option.

Enable this as part of amd64 GENERIC. For now, leave it disabled on
other platforms.

Sponsored by:	Netflix, Inc.
2018-03-24 12:48:10 +00:00
Jonathan T. Looney
9959c8b9e6 Fix compilation for platforms that don't support atomic_fetchadd_64()
after r331347.

Reported by:	avg, br, jhibbits
Sponsored by:	Netflix, Inc.
2018-03-24 12:40:45 +00:00
Sean Bruno
72bfa0bf63 Revert r331379 as the "simple" lock changes have revealed a deeper problem
and need for a rethink.

Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
2018-03-23 18:34:38 +00:00
Kristof Provost
effaab8861 netpfil: Introduce PFIL_FWD flag
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by:	ae, kevans
Differential Revision:	https://reviews.freebsd.org/D13715
2018-03-23 16:56:44 +00:00
Sean Bruno
2a499acf59 Simple locking fixes in ip_ctloutput, ip6_ctloutput, rip_ctloutput.
Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14624
2018-03-22 22:29:32 +00:00
Jonathan T. Looney
2529f56ed3 Add the "TCP Blackbox Recorder" which we discussed at the developer
summits at BSDCan and BSDCam in 2017.

The TCP Blackbox Recorder allows you to capture events on a TCP connection
in a ring buffer. It stores metadata with the event. It optionally stores
the TCP header associated with an event (if the event is associated with a
packet) and also optionally stores information on the sockets.

It supports setting a log ID on a TCP connection and using this to correlate
multiple connections that share a common log ID.

You can log connections in different modes. If you are doing a coordinated
test with a particular connection, you may tell the system to put it in
mode 4 (continuous dump). Or, if you just want to monitor for errors, you
can put it in mode 1 (ring buffer) and dump all the ring buffers associated
with the connection ID when we receive an error signal for that connection
ID. You can set a default mode that will be applied to a particular ratio
of incoming connections. You can also manually set a mode using a socket
option.

This commit includes only basic probes. rrs@ has added quite an abundance
of probes in his TCP development work. He plans to commit those soon.

There are user-space programs which we plan to commit as ports. These read
the data from the log device and output pcapng files, and then let you
analyze the data (and metadata) in the pcapng files.

Reviewed by:	gnn (previous version)
Obtained from:	Netflix, Inc.
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D11085
2018-03-22 09:40:08 +00:00
Gleb Smirnoff
3108a71a34 Fix LINT-NOINET build initializing local to false. This is
a dead code, since for NOINET build isipv6 is always true,
but this dead code makes it compilable.

Reported by:	rpokala
2018-03-22 05:07:57 +00:00
Gleb Smirnoff
dd388cfd9b The net.inet.tcp.nolocaltimewait=1 optimization prevents local TCP connections
from entering the TIME_WAIT state. However, it omits sending the ACK for the
FIN, which results in RST. This becomes a bigger deal if the sysctl
net.inet.tcp.blackhole is 2. In this case RST isn't send, so the other side of
the connection (also local) keeps retransmitting FINs.

To fix that in tcp_twstart() we will not call tcp_close() immediately. Instead
we will allocate a tcptw on stack and proceed to the end of the function all
the way to tcp_twrespond(), to generate the correct ACK, then we will drop the
last PCB reference.

While here, make a few tiny improvements:
- use bools for boolean variable
- staticize nolocaltimewait
- remove pointless acquisiton of socket lock

Reported by:	jtl
Reviewed by:	jtl
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D14697
2018-03-21 20:59:30 +00:00
Jonathan T. Looney
7fb2986ff6 If the INP lock is uncontested, avoid taking a reference and jumping
through the lock-switching hoops.

A few of the INP lookup operations that lock INPs after the lookup do
so using this mechanism (to maintain lock ordering):

1. Lock lookup structure.
2. Find INP.
3. Acquire reference on INP.
4. Drop lock on lookup structure.
5. Acquire INP lock.
6. Drop reference on INP.

This change provides a slightly shorter path for cases where the INP
lock is uncontested:

1. Lock lookup structure.
2. Find INP.
3. Try to acquire the INP lock.
4. If successful, drop lock on lookup structure.

Of course, if the INP lock is contested, the functions will need to
revert to the previous way of switching locks safely.

This saves a few atomic operations when the INP lock is uncontested.

Discussed with:	gallatin, rrs, rwatson
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D12911
2018-03-21 15:54:46 +00:00
Lawrence Stewart
370efe5ac8 Add support for the experimental Internet-Draft "TCP Alternative Backoff with
ECN (ABE)" proposal to the New Reno congestion control algorithm module.
ABE reduces the amount of congestion window reduction in response to
ECN-signalled congestion relative to the loss-inferred congestion response.

More details about ABE can be found in the Internet-Draft:
https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn

The implementation introduces four new sysctls:

- net.inet.tcp.cc.abe defaults to 0 (disabled) and can be set to non-zero to
  enable ABE for ECN-enabled TCP connections.

- net.inet.tcp.cc.newreno.beta and net.inet.tcp.cc.newreno.beta_ecn set the
  multiplicative window decrease factor, specified as a percentage, applied to
  the congestion window in response to a loss-based or ECN-based congestion
  signal respectively. They default to the values specified in the draft i.e.
  beta=50 and beta_ecn=80.

- net.inet.tcp.cc.abe_frlossreduce defaults to 0 (disabled) and can be set to
  non-zero to enable the use of standard beta (50% by default) when repairing
  loss during an ECN-signalled congestion recovery episode. It enables a more
  conservative congestion response and is provided for the purposes of
  experimentation as a result of some discussion at IETF 100 in Singapore.

The values of beta and beta_ecn can also be set per-connection by way of the
TCP_CCALGOOPT TCP-level socket option and the new CC_NEWRENO_BETA or
CC_NEWRENO_BETA_ECN CC algo sub-options.

Submitted by:	Tom Jones <tj@enoti.me>
Tested by:	Tom Jones <tj@enoti.me>, Grenville Armitage <garmitage@swin.edu.au>
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D11616
2018-03-19 16:37:47 +00:00
Alexander V. Chernikov
1435dcd94f Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.
Current arp/nd code relies on the feedback from the datapath indicating
 that the entry is still used. This mechanism is incorporated into the
 arpresolve()/nd6_resolve() routines. After the inpcb route cache
 introduction, the packet path for the locally-originated packets changed,
 passing cached lle pointer to the ether_output() directly. This resulted
 in the arp/ndp entry expire each time exactly after the configured max_age
 interval. During the small window between the ARP/NDP request and reply
 from the router, most of the packets got lost.

Fix this behaviour by plugging datapath notification code to the packet
 path used by route cache. Unify the notification code by using single
 inlined function with the per-AF callbacks.

Reported by:	sthaug at nethelp.no
Reviewed by:	ae
MFC after:	2 weeks
2018-03-17 17:05:48 +00:00
Michael Tuexen
1574b1e41e Set the inp_vflag consistently for accepted TCP/IPv6 connections when
net.inet6.ip6.v6only=0.

Without this patch, the inp_vflag would have INP_IPV4 and the
INP_IPV6 flags for accepted TCP/IPv6 connections if the sysctl
variable net.inet6.ip6.v6only is 0. This resulted in netstat
to report the source and destination addresses as IPv4 addresses,
even they are IPv6 addresses.

PR:			226421
Reviewed by:		bz, hiren, kib
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D13514
2018-03-16 15:26:07 +00:00
Sean Bruno
d7fb35d13a Update tcp_lro with tested bugfixes from Netflix and LLNW:
rrs - Lets make the LRO code look for true dup-acks and window update acks
          fly on through and combine.
    rrs - Make the LRO engine a bit more aware of ack-only seq space. Lets not
          have it incorrectly wipe out newer acks for older acks when we have
          out-of-order acks (common in wifi environments).
    jeggleston - LRO eating window updates

Based on all of the above I think we are RFC compliant doing it this way:

https://tools.ietf.org/html/rfc1122

section 4.2.2.16

"Note that TCP has a heuristic to select the latest window update despite
possible datagram reordering; as a result, it may ignore a window update with
a smaller window than previously offered if neither the sequence number nor the
acknowledgment number is increased."

Submitted by:	Kevin Bowling <kevin.bowling@kev009.com>
Reviewed by:	rstone gallatin
Sponsored by:	NetFlix and Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14540
2018-03-09 00:08:43 +00:00
Michael Tuexen
1c714531e8 When checking the TCP fast cookie length, conststently also check
for the minimum length.

This fixes a bug where cookies of length 2 bytes (which is smaller
than the minimum length of 4) is provided by the server.

Sponsored by:	Netflix, Inc.
2018-02-27 22:12:38 +00:00
Patrick Kelsey
1f13c23f3d Ensure signed comparison to avoid false trip of assert during VNET teardown.
Reported by:	lwhsu
MFC after:	1 month
2018-02-26 20:31:16 +00:00
Patrick Kelsey
18a7530938 Greatly reduce the number of #ifdefs supporting the TCP_RFC7413 kernel option.
The conditional compilation support is now centralized in
tcp_fastopen.h and tcp_var.h. This doesn't provide the minimum
theoretical code/data footprint when TCP_RFC7413 is disabled, but
nearly all the TFO code should wind up being removed by the optimizer,
the additional footprint in the syncache entries is a single pointer,
and the additional overhead in the tcpcb is at the end of the
structure.

This enables the TCP_RFC7413 kernel option by default in amd64 and
arm64 GENERIC.

Reviewed by:	hiren
MFC after:	1 month
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14048
2018-02-26 03:03:41 +00:00
Patrick Kelsey
c560df6f12 This is an implementation of the client side of TCP Fast Open (TFO)
[RFC7413]. It also includes a pre-shared key mode of operation in
which the server requires the client to be in possession of a shared
secret in order to successfully open TFO connections with that server.

The names of some existing fastopen sysctls have changed (e.g.,
net.inet.tcp.fastopen.enabled -> net.inet.tcp.fastopen.server_enable).

Reviewed by:	tuexen
MFC after:	1 month
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14047
2018-02-26 02:53:22 +00:00
Patrick Kelsey
798caa2ee5 Fix harmless locking bug in tfp_fastopen_check_cookie().
The keylist lock was not being acquired early enough.  The only side
effect of this bug is that the effective add time of a new key could
be slightly later than it would have been otherwise, as seen by a TFO
client.

Reviewed by:	tuexen
MFC after:	1 month
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14046
2018-02-26 02:43:26 +00:00
Andrey V. Elsukov
b2fe54bc8b Reinitialize IP header length after checksum calculation. It is used
later by TCP-MD5 code.

This fixes the problem with broken TCP-MD5 over IPv4 when NIC has
disabled TCP checksum offloading.

PR:		223835
MFC after:	1 week
2018-02-10 10:13:17 +00:00
Andrey V. Elsukov
b99a682320 Rework ipfw dynamic states implementation to be lockless on fast path.
o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
  for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
  need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
  information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
  dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
  lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
  and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
  O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
  prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
  separate arrays. 65535 limit to maximum number of hash buckets now
  removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
  and for parent states.
o By default now is used Jenkinks hash function.

Obtained from:	Yandex LLC
MFC after:	42 days
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D12685
2018-02-07 18:59:54 +00:00
John Baldwin
f17985319d Export tcp_always_keepalive for use by the Chelsio TOM module.
This used to work by accident with ld.bfd even though always_keepalive
was marked as static. LLD honors static more correctly, so export this
variable properly (including moving it into the tcp_* namespace).

Reviewed by:	bz, emaste
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14129
2018-01-30 23:01:37 +00:00
Michael Tuexen
cf6339882e Add constant for the PAD chunk as defined in RFC 4820.
This will be used by traceroute and traceroute6 soon.

MFC after:	1 week
2018-01-27 13:46:55 +00:00
Michael Tuexen
07e75d0a37 Update references in comments, since the IDs have become an RFC long
time ago. Also cleanup whitespaces. No functional change.

MFC after:	1 week
2018-01-27 13:43:03 +00:00
Conrad Meyer
222daa421f style: Remove remaining deprecated MALLOC/FREE macros
Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions).  Something like:

* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(

No functional change.

For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().

Reported by:	jhb
Reviewed by:	cy, imp, markj, rmacklem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14035
2018-01-25 22:25:13 +00:00
Mark Johnston
5557762b72 Use tcpinfoh_t for TCP headers in the tcp:::debug-{drop,input} probes.
The header passed to these probes has some fields converted to host
order by tcp_fields_to_host(), so the tcpinfo_t translator doesn't do
what we want.

Submitted by:	Hannes Mehnert
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12647
2018-01-25 15:35:34 +00:00
Navdeep Parhar
09b0b8c058 Do not generate illegal mbuf chains during IP fragment reassembly. Only
the first mbuf of the reassembled datagram should have a pkthdr.

This was discovered with cxgbe(4) + IPSEC + ping with payload more than
interface MTU.  cxgbe can generate !M_WRITEABLE mbufs and this results
in m_unshare being called on the reassembled datagram, and it complains:

panic: m_unshare: m0 0xfffff80020f82600, m 0xfffff8005d054100 has M_PKTHDR

PR:		224922
Reviewed by:	ae@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14009
2018-01-24 05:09:21 +00:00
Ryan Stone
fc21c53f63 Reduce code duplication for inpcb route caching
Add a new macro to clear both the L3 and L2 route caches, to
hopefully prevent future instances where only the L3 cache was
cleared when both should have been.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13989
Reviewed by:	karels
2018-01-23 03:15:39 +00:00
Michael Tuexen
e9a3a1b13c Fix a bug related to fast retransmissions.
When processing a SACK advancing the cumtsn-ack in fast recovery,
increment the miss-indications for all TSN's reported as missing.

Thanks to Fabian Ising for finding the bug and to Timo Voelker
for provinding a fix.

This fix moves also CMT related initialisation of some variables
to a more appropriate place.

MFC after:	1 week
2018-01-16 21:58:38 +00:00
Michael Tuexen
46bf534caf Don't provide a (meaningless) cmsg when proving a notification
in a recvmsg() call.

MFC after:	1 week
2018-01-15 21:59:20 +00:00
Pedro F. Giffuni
84f210952c libalias: small memory allocation cleanups.
Make the calloc wrappers behave as expected by using mallocarray.
It is rather weird that the malloc wrappers also zeroes the memory: update
a comment to reflect at least two cases where it is expected.

Reviewed by:	tuexen
2018-01-12 23:12:30 +00:00
Cy Schubert
f813e520c2 Correct the comment describing badrs which is bad router solicitiation,
not bad router advertisement.

MFC after:	3 days
2017-12-29 07:23:18 +00:00
Michael Tuexen
fa5867cbd6 White cleanups. 2017-12-26 16:33:55 +00:00
Michael Tuexen
f34a628e7e Clearify CID 1008197.
MFC after:	3 days
2017-12-26 16:12:04 +00:00
Michael Tuexen
0460135495 Clearify issue reported in CID 1008198.
MFC after:	3 days
2017-12-26 16:06:11 +00:00
Michael Tuexen
f6ea123171 Fix CID 1008428.
MFC after:	1 week
2017-12-26 15:29:11 +00:00
Michael Tuexen
4830aee72f Fix CID 1008936. 2017-12-26 15:24:42 +00:00
Michael Tuexen
c9256941d0 Allow the first (and second) argument of sn_calloc() be a sum.
This fixes a bug reported in
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224103
PR:		224103
2017-12-26 14:37:47 +00:00
Michael Tuexen
cd90150413 When adding support for sending SCTP packets containing an ABORT chunk
to ipfw in https://svnweb.freebsd.org/changeset/base/326233,
a dependency on the SCTP stack was added to ipfw by accident.

This was noted by Kevel Bowling in https://reviews.freebsd.org/D13594
where also a solution was suggested. This patch is based on Kevin's
suggestion, but implements the required SCTP checksum computation
without any dependency on other SCTP sources.

While there, do some cleanups and improve comments.

Thanks to Kevin Kevin Browling for reporting the issue and suggesting
a fix.
2017-12-26 12:35:02 +00:00
Alexander Kabaev
151ba7933a Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Andrey V. Elsukov
2aad62408b Fix mbuf leak when TCPMD5_OUTPUT() method returns error.
PR:		223817
MFC after:	1 week
2017-12-14 12:54:20 +00:00
Michael Tuexen
cd6340caf7 Cleaup, no functional change. 2017-12-13 17:11:57 +00:00
Gleb Smirnoff
66492fea49 Separate out send buffer autoscaling code into function, so that
alternative TCP stacks may reuse it instead of pasting.

Obtained from:	Netflix
2017-12-07 22:36:58 +00:00
Michael Tuexen
9f0abda051 Retire SCTP_WITH_NO_CSUM option.
This option was used in the early days to allow performance measurements
extrapolating the use of SCTP checksum offloading. Since this feature
is now available, get rid of this option.
This also un-breaks the LINT kernel. Thanks to markj@ for making me
aware of the problem.
2017-12-07 22:19:08 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Michael Tuexen
665c8a2ee5 Add to ipfw support for sending an SCTP packet containing an ABORT chunk.
This is similar to the TCP case. where a TCP RST segment can be sent.

There is one limitation: When sending an ABORT in response to an incoming
packet, it should be tested if there is no ABORT chunk in the received
packet. Currently, it is only checked if the first chunk is an ABORT
chunk to avoid parsing the whole packet, which could result in a DOS attack.

Thanks to Timo Voelker for helping me to test this patch.
Reviewed by: bcr@ (man page part), ae@ (generic, non-SCTP part)
Differential Revision:	https://reviews.freebsd.org/D13239
2017-11-26 18:19:01 +00:00
Michael Tuexen
18442f0a5b Fix SPDX line as suggested by pfg 2017-11-24 19:38:59 +00:00
Michael Tuexen
ad15e1548f Unbreak compilation when using SCTP_DETAILED_STR_STATS option.
MFC after:	1 week
2017-11-24 12:18:48 +00:00
Michael Tuexen
b7d2b5d5b1 Add SPDX line. 2017-11-24 11:25:53 +00:00
Mark Johnston
7a5c730561 Use the right variable for the IP header parameter to tcp:::send.
This addresses a regression from r311225.

MFC after:	1 week
2017-11-22 14:13:40 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Michael Tuexen
3e87bccde3 Fix the handling of ERROR chunks which a lot of error causes.
While there, clean up the code.
Thanks to Felix Weinrank who found the bug by using fuzz-testing
the SCTP userland stack.

MFC after:	1 week
2017-11-15 22:13:10 +00:00
Michael Tuexen
d0f6ab7920 Simply the code and use the full buffer for contigous chunk representation.
MFC after:	1 week
2017-11-14 02:30:21 +00:00
Gleb Smirnoff
3e21cbc802 Style r320614: don't initialize at declaration, new line after declarations,
shorten variable name to avoid extra long lines.
No functional changes.
2017-11-13 22:16:47 +00:00
Michael Tuexen
469a65d1f8 Cleanup the handling of control chunks. While there fix some minor
bug related to clearing the assoc retransmit counter and the dup TSN
handling of NR-SACK chunks.

MFC after:	3 days
2017-11-12 21:43:33 +00:00
Konstantin Belousov
06193f0be0 Use hardware timestamps to report packet timestamps for SO_TIMESTAMP
and other similar socket options.

Provide new control message SCM_TIME_INFO to supply information about
timestamp.  Currently it indicates that the timestamp was
hardware-assisted and high-precision, for software timestamps the
message is not returned.  Reserved fields are added to ABI to report
additional info about it, it is expected that raw hardware clock value
might be useful for some applications.

Reviewed by:	gallatin (previous version), hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
X-Differential revision:	https://reviews.freebsd.org/D12638
2017-11-07 09:46:26 +00:00
Michael Tuexen
253a63b817 Fix an accounting bug where data was counted twice if on the read
queue and on the ordered or unordered queue.
While there, improve the checking in INVARIANTs when computing the
a_rwnd.

MFC after:	3 days
2017-11-05 11:59:33 +00:00
Michael Tuexen
28a6adde1d Allow the setting of the MTU for future paths using an SCTP socket option.
This functionality was missing.

MFC after:	1 week
2017-11-03 20:46:12 +00:00
Michael Tuexen
ba5fc4cf78 Fix the reporting of the MTU for SCTP sockets when using IPv6.
MFC after:	1 week
2017-11-01 16:32:11 +00:00
Michael Tuexen
966dfbf910 Fix parsing error when processing cmsg in SCTP send calls. Thei bug is
related to a signed/unsigned mismatch.
This should most likely fix the issue in sctp_sosend reported by
Dmitry Vyukov on the freebsd-hackers mailing list and found by
running syzkaller.
2017-10-27 19:27:05 +00:00
Michael Tuexen
8d9b040dd4 Fix a bug reported by Felix Weinrank using the libfuzzer on the
userland stack.

MFC after:	3 days
2017-10-25 09:12:22 +00:00
Michael Tuexen
701492a5f6 Fix a bug in handling special ABORT chunks.
Thanks to Felix Weinrank for finding this issue using libfuzzer with
the userland stack.

MFC after:	3 days
2017-10-24 16:24:12 +00:00
Michael Tuexen
adc59f7f46 Fix a locking issue found by running AFL on the userland stack.
Thanks to Felix Weinrank for reporting the issue.

MFC after:	3 days
2017-10-24 14:28:56 +00:00
Alexander Motin
81098a018e Relax per-ifnet cif_vrs list double locking in carp(4).
In all cases where cif_vrs list is modified, two locks are held: per-ifnet
CIF_LOCK and global carp_sx.  It means to read that list only one of them
is enough to be held, so we can skip CIF_LOCK when we already have carp_sx.

This fixes kernel panic, caused by attempts of copyout() to sleep while
holding non-sleepable CIF_LOCK mutex.

Discussed with:	glebius
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2017-10-19 09:01:15 +00:00
Michael Tuexen
af03054c8a Fix a signed/unsigned warning.
MFC after:	1 week
2017-10-18 21:08:35 +00:00
Michael Tuexen
7f75695a3e Abort an SCTP association, when a DATA chunk is followed by an unknown
chunk with a length smaller than the minimum length.

Thanks to Felix Weinrank for making me aware of the problem.
MFC after:	3 days
2017-10-18 20:17:44 +00:00
Michael Tuexen
0d5af38ceb Revert change which got in accidently. 2017-10-18 18:59:35 +00:00
Michael Tuexen
3ed8d364a7 Fix a bug introduced in r324638.
Thanks to Felix Weinrank for making me aware of this.

MFC after:	3 days
2017-10-18 18:56:56 +00:00
Michael Tuexen
80a2d1406f Fix the handling of parital and too short chunks.
Ensure that the current behaviour is consistent: stop processing
of the chunk, but finish the processing of the previous chunks.

This behaviour might be changed in a later commit to ABORT the
assoication due to a protocol violation, but changing this
is a separate issue.

MFC after:	3 days
2017-10-15 19:33:30 +00:00
Michael Tuexen
8c8e10b763 Code cleanup, not functional change.
This avoids taking a pointer of a packed structure which allows simpler
compilation of the userland stack.

MFC after:	1 week
2017-10-14 10:02:59 +00:00
Gleb Smirnoff
3bdf4c4274 Declare more TCP globals in tcp_var.h, so that alternative TCP stacks
can use them.  Gather all TCP tunables in tcp_var.h in one place and
alphabetically sort them, to ease maintainance of the list.

Don't copy and paste declarations in tcp_stacks/fastpath.c.
2017-10-11 20:36:09 +00:00
Gleb Smirnoff
e29c55e4bb Declare pmtud_blackhole global variables in tcp_timer.h, so that
alternative TCP stacks can legally use them.
2017-10-06 20:33:40 +00:00
Michael Tuexen
ff76c8c9fd Ensure that the accept ABORT chunks with the T-bit set only the
a non-zero matching peer tag is provided.

MFC after:	1 week
2017-10-05 13:29:54 +00:00
Julien Charbon
5fcd2d9bfc Forgotten bits in r324179: Include sys/syslog.h if INVARIANTS is not defined
MFC after:	1 week
X-MFC with:	r324179
Pointy hat to:	jch
2017-10-02 09:45:17 +00:00
Patrick Kelsey
3f43239f21 The soisconnected() call removed from syncache_socket() in r307966 was
not extraneous in the TCP Fast Open (TFO) passive-open case.  In the
TFO passive-open case, syncache_socket() is being called during
processing of a TFO SYN bearing a valid cookie, and a call to
soisconnected() is required in order to allow the application to
immediately consume any data delivered in the SYN and to have a chance
to generate response data to accompany the SYN-ACK.  The removal of
this call to soisconnected() effectively converted all TFO passive
opens to having the same RTT cost as a standard 3WHS.

This commit adds a call to soisconnected() to syncache_tfo_expand() so
that it is only in the TFO passive-open path, thereby restoring TFO
passve-open RTT performance and preserving the non-TFO connection-rate
performance gains realized by r307966.

MFC after:	1 week
Sponsored by:	Limelight Networks
2017-10-01 23:37:17 +00:00
Julien Charbon
dfa1f80ce9 Fix an infinite loop in tcp_tw_2msl_scan() when an INP_TIMEWAIT inp has
been destroyed before its tcptw with INVARIANTS undefined.

This is a symmetric change of r307551:

A INP_TIMEWAIT inp should not be destroyed before its tcptw, and INVARIANTS
will catch this case.  If INVARIANTS is undefined it will emit a log(LOG_ERR)
and avoid a hard to debug infinite loop in tcp_tw_2msl_scan().

Reported by:		Ben Rubson, hselasky
Submitted by:		hselasky
Tested by:		Ben Rubson, jch
MFC after:		1 week
Sponsored by:		Verisign, inc
Differential Revision:	https://reviews.freebsd.org/D12267
2017-10-01 21:20:28 +00:00
Andrey V. Elsukov
f415d666c3 Some mbuf related fixes in icmp_error()
* check mbuf length before doing mtod() and accessing to IP header;
* update oip pointer and all depending pointers after m_pullup();
* remove extra checks and extra parentheses, wrap long lines;

PR:		222670
Reported by:	Prabhakar Lakhera
MFC after:	1 week
2017-09-29 06:24:45 +00:00
Michael Tuexen
09c53cb6cc Remove unused function.
MFC after:	1 week
2017-09-27 13:05:23 +00:00
Sepherosa Ziehau
fc572e261f tcp: Don't "negotiate" MSS.
_NO_ OSes actually "negotiate" MSS.

RFC 879:
"... This Maximum Segment Size (MSS) announcement (often mistakenly
called a negotiation) ..."

This negotiation behaviour was introduced 11 years ago by r159955
without any explaination about why FreeBSD had to "negotiate" MSS:

    In syncache_respond() do not reply with a MSS that is larger than what
    the peer announced to us but make it at least tcp_minmss in size.

    Sponsored by:   TCP/IP Optimization Fundraise 2005

The tcp_minmss behaviour is still kept.

Syncookie fix was prodded by tuexen, who also helped to test this
patch w/ packetdrill.

Reviewed by:	tuexen, karels, bz (previous version)
MFC after:	2 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12430
2017-09-27 05:52:37 +00:00
Michael Tuexen
d28a3a393b Add missing locking. Found by Coverity while scanning the usrsctp
library.

MFC after:	1 week
2017-09-22 06:33:01 +00:00
Michael Tuexen
afb908dada Add missing socket lock.
MFC after:	1 week
2017-09-22 06:07:47 +00:00
Michael Tuexen
cdd2d7d4a5 Code cleanup, no functional change.
MFC after:	1 week
2017-09-21 11:56:31 +00:00
Michael Tuexen
53999485e0 Free the control structure after using is, not before.
Found by Coverity while scanning the usrsctp library.
MFC after:	1 week
2017-09-21 09:47:56 +00:00
Michael Tuexen
d0d8c7de19 No need to wakeup, since sctp_add_to_readq() does it.
MFC after:	1 week
2017-09-21 09:18:05 +00:00
Michael Tuexen
2c62ba7377 Protect the address workqueue timer by a mutex.
MFC after:	1 week
2017-09-20 21:29:54 +00:00
Michael Tuexen
3ec509bcd3 Fix a warning.
MFC after:	1 week
2017-09-19 20:24:13 +00:00
Michael Tuexen
564a95f485 Avoid an overflow when computing the staleness.
This issue was found by running libfuzz on the userland stack.

MFC after:	1 week
2017-09-19 20:09:58 +00:00
Michael Tuexen
ad608f06ed Remove a no longer used variable.
Reported by:	Felix Weinrank
MFC after:	1 week
2017-09-19 15:00:19 +00:00
Michael Tuexen
72e23aba22 Fix an accounting bug and use sctp_timer_start to start a timer.
MFC after:	1 week
2017-09-17 09:27:27 +00:00
Michael Tuexen
fe40f49bb3 Remove code not used on any platform currently supported.
MFC after:	1 week
2017-09-16 21:26:06 +00:00
Michael Tuexen
292efb1bc0 Export the UDP encapsualation port and the path state. 2017-09-12 21:08:50 +00:00
Michael Tuexen
e5cccc35c3 Add support to print the TCP stack being used.
Sponsored by:	Netflix, Inc.
2017-09-12 13:34:43 +00:00
Michael Tuexen
7b5f06fbcc Fix MTU computation. Coverity scanning usrsctp pointed to this code...
MFC after:	3 days
2017-09-09 21:03:40 +00:00
Michael Tuexen
f55c326691 Fix locking issues found by Coverity scanning the usrsctp library.
MFC after:	3 days
2017-09-09 20:44:56 +00:00
Michael Tuexen
0c4622dab2 Silence a Coverity warning from scanning the usrsctp library.
MFC after:	3 days
2017-09-09 20:08:26 +00:00
Michael Tuexen
6c2cfc0419 Savely remove a chunk from the control queue.
This bug was found by Coverity scanning the usrsctp library.

MFC after:	3 days
2017-09-09 19:49:50 +00:00
Hans Petter Selasky
95ed5015ec Add support for generic backpressure indicator for ratelimited
transmit queues aswell as non-ratelimited ones.

Add the required structure bits in order to support a backpressure
indication with ratelimited connections aswell as non-ratelimited
ones. The backpressure indicator is a value between zero and 65535
inclusivly, indicating if the destination transmit queue is empty or
full respectivly. Applications can use this value as a decision point
for when to stop transmitting data to avoid endless ENOBUFS error
codes upon transmitting an mbuf. This indicator is also useful to
reduce the latency for ratelimited queues.

Reviewed by:		gallatin, kib, gnn
Differential Revision:	https://reviews.freebsd.org/D11518
Sponsored by:		Mellanox Technologies
2017-09-06 13:56:18 +00:00
Michael Tuexen
3d5af7a127 Fix blackhole detection.
There were two bugs related to the blackhole detection:
* The smalles size was tried more than two times.
* The restored MSS was not the original one, but the second
  candidate.

MFC after:	1 week
Sponsored by:	Netflix, Inc.
2017-08-28 11:41:18 +00:00
Sean Bruno
32a04bb81d Use counter(9) for PLPMTUD counters.
Remove unused PLPMTUD sysctl counters.

Bump UPDATING and FreeBSD Version to indicate a rebuild is required.

Submitted by:	kevin.bowling@kev009.com
Reviewed by:	jtl
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12003
2017-08-25 19:41:38 +00:00
Michael Tuexen
e8aba2eb21 Avoid TCP log messages which are false positives.
This is https://svnweb.freebsd.org/changeset/base/322812, just for
alternate TCP stacks.

XMFC with: 	322812
2017-08-23 15:08:51 +00:00
Michael Tuexen
4da9052a05 Avoid TCP log messages which are false positives.
The check for timestamps are too early to handle SYN-ACK correctly.
So move it down after the corresponing processing has been done.

PR:		216832
Obtained from:	antonfb@hesiod.org
MFC after:	1 week
2017-08-23 14:50:08 +00:00
Michael Tuexen
63ec505a4f Ensure inp_vflag is consistently set for TCP endpoints.
Make sure that the flags INP_IPV4 and INP_IPV6 are consistently set
for inpcbs used for TCP sockets, no matter if the setting is derived
from the net.inet6.ip6.v6only sysctl or the IPV6_V6ONLY socket option.
For UDP this was already done right.

PR:		221385
MFC after:	1 week
2017-08-18 07:27:15 +00:00
Oleg Bulyzhin
ff21796d25 Fix comment typo. 2017-08-09 10:46:34 +00:00
Dag-Erling Smørgrav
d80fbbee5f Correct sysctl names. 2017-08-09 07:24:58 +00:00
Bjoern A. Zeeb
ae69ad884d After inpcb route caching was put back in place there is no need for
flowtable anymore (as flowtable was never considered to be useful in
the forwarding path).

Reviewed by:		np
Differential Revision:	https://reviews.freebsd.org/D11448
2017-07-27 13:03:36 +00:00
Ed Maste
1edbb54fe9 cc_cubic: restore braces around if-condition block
r307901 was reverted in r321480, restoring an incorrect block
delimitation bug present in the original cc_cubic commit. Restore
only the bugfix (brace addition) from r307901.

CID:		1090182
Approved by:	sbruno
2017-07-26 21:23:09 +00:00
Sean Bruno
43053c125a Revert r307901 - Inform CC modules about loss events.
This was discussed between various transport@ members and it was
requested to be reverted and discussed.

Submitted by:	Kevin Bowling <kevin.bowling@kev009.com>
Reported by:	lawrence
Reviewed by:	hiren
Sponsored by:	Limelight Networks
2017-07-25 15:08:52 +00:00
Sean Bruno
5d53981a18 Revert r308180 - Set slow start threshold more accurrately on loss ...
This was discussed between various transport@ members and it was
requested to be reverted and discussed.

Submitted by:	kevin
Reported by:	lawerence
Reviewed by:	hiren
2017-07-25 15:03:05 +00:00
Michael Tuexen
e5a9c519bc Remove duplicate statement. 2017-07-25 11:05:53 +00:00
Michael Tuexen
9dd6ca9602 Deal with listening socket correctly. 2017-07-20 14:50:13 +00:00
Michael Tuexen
bbc9dfbc08 Fix the explicit EOR mode. If the final messages is not complete, send
an ABORT.
Joint work with rrs@
MFC after:	1 week
2017-07-20 11:09:33 +00:00
Michael Tuexen
1f76872c36 Avoid shadowed variables.
MFC after:	1 week
2017-07-19 15:12:23 +00:00
Michael Tuexen
5ba7f91f9d Use memset/memcpy instead of bzero/bcopy.
Just use one variant instead of both. Use the memset/memcpy
ones since they cause less problems in crossplatform deployment.

MFC after:	1 week
2017-07-19 14:28:58 +00:00
Michael Tuexen
28cd0699b6 Fix the accounting and add code to detect errors in accounting.
Joint work with rrs@
MFC after:	1 week
2017-07-19 12:27:40 +00:00
Michael Tuexen
d32ed2c735 Fix the handling of Explicit EOR mode.
While there, appropriately handle the overhead depending on
the usage of DATA or I-DATA chunks. Take the overhead only
into account, when required.

Joint work with rrs@
MFC after:	1 week
2017-07-15 19:54:03 +00:00
Konstantin Belousov
5cead59181 Correct sysent flags for dynamically loaded syscalls.
Using the https://github.com/google/capsicum-test/ suite, the
PosixMqueue.CapModeForked test was failing due to an ECAPMODE after
calling kmq_notify(). On further inspection, the dynamically
loaded syscall entry was initialized with sy_flags zeroed out, since
SYSCALL_INIT_HELPER() left sysent.sy_flags with the default value.

Add a new helper SYSCALL{,32}_INIT_HELPER_F() which takes an
additional argument to specify the sy_flags value.

Submitted by:	Siva Mahadevan <smahadevan@freebsdfoundation.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D11576
2017-07-14 09:34:44 +00:00