freebsd-nq

Author	SHA1	Message	Date
Hans Petter Selasky	95ed5015ec	Add support for generic backpressure indicator for ratelimited transmit queues aswell as non-ratelimited ones. Add the required structure bits in order to support a backpressure indication with ratelimited connections aswell as non-ratelimited ones. The backpressure indicator is a value between zero and 65535 inclusivly, indicating if the destination transmit queue is empty or full respectivly. Applications can use this value as a decision point for when to stop transmitting data to avoid endless ENOBUFS error codes upon transmitting an mbuf. This indicator is also useful to reduce the latency for ratelimited queues. Reviewed by: gallatin, kib, gnn Differential Revision: https://reviews.freebsd.org/D11518 Sponsored by: Mellanox Technologies	2017-09-06 13:56:18 +00:00
Michael Tuexen	3d5af7a127	Fix blackhole detection. There were two bugs related to the blackhole detection: * The smalles size was tried more than two times. * The restored MSS was not the original one, but the second candidate. MFC after: 1 week Sponsored by: Netflix, Inc.	2017-08-28 11:41:18 +00:00
Sean Bruno	32a04bb81d	Use counter(9) for PLPMTUD counters. Remove unused PLPMTUD sysctl counters. Bump UPDATING and FreeBSD Version to indicate a rebuild is required. Submitted by: kevin.bowling@kev009.com Reviewed by: jtl Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12003	2017-08-25 19:41:38 +00:00
Michael Tuexen	e8aba2eb21	Avoid TCP log messages which are false positives. This is https://svnweb.freebsd.org/changeset/base/322812, just for alternate TCP stacks. XMFC with: 322812	2017-08-23 15:08:51 +00:00
Michael Tuexen	4da9052a05	Avoid TCP log messages which are false positives. The check for timestamps are too early to handle SYN-ACK correctly. So move it down after the corresponing processing has been done. PR: 216832 Obtained from: antonfb@hesiod.org MFC after: 1 week	2017-08-23 14:50:08 +00:00
Michael Tuexen	63ec505a4f	Ensure inp_vflag is consistently set for TCP endpoints. Make sure that the flags INP_IPV4 and INP_IPV6 are consistently set for inpcbs used for TCP sockets, no matter if the setting is derived from the net.inet6.ip6.v6only sysctl or the IPV6_V6ONLY socket option. For UDP this was already done right. PR: 221385 MFC after: 1 week	2017-08-18 07:27:15 +00:00
Oleg Bulyzhin	ff21796d25	Fix comment typo.	2017-08-09 10:46:34 +00:00
Dag-Erling Smørgrav	d80fbbee5f	Correct sysctl names.	2017-08-09 07:24:58 +00:00
Bjoern A. Zeeb	ae69ad884d	After inpcb route caching was put back in place there is no need for flowtable anymore (as flowtable was never considered to be useful in the forwarding path). Reviewed by: np Differential Revision: https://reviews.freebsd.org/D11448	2017-07-27 13:03:36 +00:00
Ed Maste	1edbb54fe9	cc_cubic: restore braces around if-condition block r307901 was reverted in r321480, restoring an incorrect block delimitation bug present in the original cc_cubic commit. Restore only the bugfix (brace addition) from r307901. CID: 1090182 Approved by: sbruno	2017-07-26 21:23:09 +00:00
Sean Bruno	43053c125a	Revert r307901 - Inform CC modules about loss events. This was discussed between various transport@ members and it was requested to be reverted and discussed. Submitted by: Kevin Bowling <kevin.bowling@kev009.com> Reported by: lawrence Reviewed by: hiren Sponsored by: Limelight Networks	2017-07-25 15:08:52 +00:00
Sean Bruno	5d53981a18	Revert r308180 - Set slow start threshold more accurrately on loss ... This was discussed between various transport@ members and it was requested to be reverted and discussed. Submitted by: kevin Reported by: lawerence Reviewed by: hiren	2017-07-25 15:03:05 +00:00
Michael Tuexen	e5a9c519bc	Remove duplicate statement.	2017-07-25 11:05:53 +00:00
Michael Tuexen	9dd6ca9602	Deal with listening socket correctly.	2017-07-20 14:50:13 +00:00
Michael Tuexen	bbc9dfbc08	Fix the explicit EOR mode. If the final messages is not complete, send an ABORT. Joint work with rrs@ MFC after: 1 week	2017-07-20 11:09:33 +00:00
Michael Tuexen	1f76872c36	Avoid shadowed variables. MFC after: 1 week	2017-07-19 15:12:23 +00:00
Michael Tuexen	5ba7f91f9d	Use memset/memcpy instead of bzero/bcopy. Just use one variant instead of both. Use the memset/memcpy ones since they cause less problems in crossplatform deployment. MFC after: 1 week	2017-07-19 14:28:58 +00:00
Michael Tuexen	28cd0699b6	Fix the accounting and add code to detect errors in accounting. Joint work with rrs@ MFC after: 1 week	2017-07-19 12:27:40 +00:00
Michael Tuexen	d32ed2c735	Fix the handling of Explicit EOR mode. While there, appropriately handle the overhead depending on the usage of DATA or I-DATA chunks. Take the overhead only into account, when required. Joint work with rrs@ MFC after: 1 week	2017-07-15 19:54:03 +00:00
Konstantin Belousov	5cead59181	Correct sysent flags for dynamically loaded syscalls. Using the https://github.com/google/capsicum-test/ suite, the PosixMqueue.CapModeForked test was failing due to an ECAPMODE after calling kmq_notify(). On further inspection, the dynamically loaded syscall entry was initialized with sy_flags zeroed out, since SYSCALL_INIT_HELPER() left sysent.sy_flags with the default value. Add a new helper SYSCALL{,32}_INIT_HELPER_F() which takes an additional argument to specify the sy_flags value. Submitted by: Siva Mahadevan <smahadevan@freebsdfoundation.org> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D11576	2017-07-14 09:34:44 +00:00
Jonathan T. Looney	cb503ae22d	Don't overpromote values when calculating len in tcp_output(). sbavail() returns u_int and sendwin is a uint32_t. Therefore, min() (which operates on two u_int values) is able to correctly calculate the minimum of these two arguments. Reported by: rrs MFC after: 1 week Sponsored by: Netflix	2017-07-05 16:10:30 +00:00
Michael Tuexen	1698cbd919	Move to open state after plausibility checks. When doing this too early, the MIB counters go wrong. MFC after: 1 week	2017-07-04 18:24:50 +00:00
Michael Tuexen	afffa1a9ad	Don't hold if refcount on an stcb when it is not needed. This improves the consistency with other parts of the code.	2017-07-04 18:04:44 +00:00
Sean Bruno	ac952dd274	Add a sysctl to toggle the use of the sockets LOWAT when calculating auto window growth Submitted by: j@nitrology.com (Jason Wolfe) Reviewed by: gnn hiren Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11016	2017-07-03 19:39:58 +00:00
Michael Tuexen	f4358911bf	Handle sctp_get_next_param() in a consistent way. This addresses an issue found by Felix Weinrank using libfuzz. While there, use also consistent nameing. MFC after: 3 days	2017-06-23 21:01:57 +00:00
Michael Tuexen	d44b45df2c	Check the length of a COOKIE chunk before accessing fields in it. Thanks to Felix Weinrank for reporting the issue he found by using libFuzzer. MFC after: 3 days	2017-06-23 10:09:49 +00:00
Michael Tuexen	1a7abbb3be	Use a longer buffer for messages in ERROR chunks. This allows them to be sent in a non truncated way and addresses a warning given by newver versions of gcc. Thanks to Anselm Jonas Scholl for reporting it and providing a patch.	2017-06-23 09:27:31 +00:00
Michael Tuexen	94f66d603a	Honor the backlog field.	2017-06-23 08:35:54 +00:00
Michael Tuexen	3017b21bb6	Improve compilation on platforms different from FreeBSD.	2017-06-23 08:34:01 +00:00
Gleb Smirnoff	779f106aa1	Listening sockets improvements. o Separate fields of struct socket that belong to listening from fields that belong to normal dataflow, and unionize them. This shrinks the structure a bit. - Take out selinfo's from the socket buffers into the socket. The first reason is to support braindamaged scenario when a socket is added to kevent(2) and then listen(2) is cast on it. The second reason is that there is future plan to make socket buffers pluggable, so that for a dataflow socket a socket buffer can be changed, and in this case we also want to keep same selinfos through the lifetime of a socket. - Remove struct struct so_accf. Since now listening stuff no longer affects struct socket size, just move its fields into listening part of the union. - Provide sol_upcall field and enforce that so_upcall_set() may be called only on a dataflow socket, which has buffers, and for listening sockets provide solisten_upcall_set(). o Remove ACCEPT_LOCK() global. - Add a mutex to socket, to be used instead of socket buffer lock to lock fields of struct socket that don't belong to a socket buffer. - Allow to acquire two socket locks, but the first one must belong to a listening socket. - Make soref()/sorele() to use atomic(9). This allows in some situations to do soref() without owning socket lock. There is place for improvement here, it is possible to make sorele() also to lock optionally. - Most protocols aren't touched by this change, except UNIX local sockets. See below for more information. o Reduce copy-and-paste in kernel modules that accept connections from listening sockets: provide function solisten_dequeue(), and use it in the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4), infiniband, rpc. o UNIX local sockets. - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX local sockets. Most races exist around spawning a new socket, when we are connecting to a local listening socket. To cover them, we need to hold locks on both PCBs when spawning a third one. This means holding them across sonewconn(). This creates a LOR between pcb locks and unp_list_lock. - To fix the new LOR, abandon the global unp_list_lock in favor of global unp_link_lock. Indeed, separating these two locks didn't provide us any extra parralelism in the UNIX sockets. - Now call into uipc_attach() may happen with unp_link_lock hold if, we are accepting, or without unp_link_lock in case if we are just creating a socket. - Another problem in UNIX sockets is that uipc_close() basicly did nothing for a listening socket. The vnode remained opened for connections. This is fixed by removing vnode in uipc_close(). Maybe the right way would be to do it for all sockets (not only listening), simply move the vnode teardown from uipc_detach() to uipc_close()? Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9770	2017-06-08 21:30:34 +00:00
Jonathan T. Looney	dc6a41b936	Add the infrastructure to support loading multiple versions of TCP stack modules. It adds support for mangling symbols exported by a module by prepending a string to them. (This avoids overlapping symbols in the kernel linker.) It allows the use of a macro as the module name in the DECLARE_MACRO() and MACRO_VERSION() macros. It allows the code to register stack aliases (e.g. both a generic name ["default"] and version-specific name ["default_10_3p1"]). With these changes, it is trivial to compile TCP stack modules with the name defined in the Makefile and to load multiple versions of the same stack simultaneously. This functionality can be used to enable side-by-side testing of an old and new version of the same TCP stack. It also could support upgrading the TCP stack without a reboot. Reviewed by: gnn, sjg (makefiles only) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D11086	2017-06-08 20:41:28 +00:00
Gleb Smirnoff	3acfe1e1b0	This code was missing socket unlock and socket buffer lock, but it worked since right now these two locks are the same.	2017-06-08 06:37:11 +00:00
Gleb Smirnoff	12d8a8e7a3	The desired lock here is socket buffer, not socket. Right now they match, but won't in future.	2017-06-08 06:34:09 +00:00
Michael Tuexen	8cb5a8e90a	Fix the ICMP6 handling for TCP. The ICMP6 packets might not be contained in a single mbuf. So don't assume this. Keep the IPv4 and IPv6 code in sync and make explicit that the syncache code only need the TCP sequence number, not the complete TCP header. MFC after: 3 days Sponsored by: Netflix, Inc.	2017-06-03 21:53:58 +00:00
Michael Tuexen	98732609d5	Improve comments to describe what the code does. Reported by: jtl Sponsored by: Netflix, Inc.	2017-06-01 15:11:18 +00:00
Jonathan T. Looney	382a6bbcf1	Enforce the limit on ICMP messages before doing work to formulate the response. Delete an unneeded rate limit for UDP under IPv6. Because ICMP6 messages have their own rate limit, it is unnecessary to apply a second rate limit to UDP messages. Reviewed by: glebius MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D10387	2017-05-30 14:32:44 +00:00
Michael Tuexen	5d08768a2b	Use the SCTP_PCB_FLAGS_ACCEPTING flags to check for listeners. While there, use a macro for checking the listen state to allow for easier changes if required. This done to help glebius@ with his listen changes.	2017-05-26 16:29:00 +00:00
Gleb Smirnoff	6a6cefac3d	o Rearrange struct inpcb fields to optimize the TCP output code path considering cache line hits and misses. Put the lock and hash list glue into the first cache line, put inp_refcount inp_flags inp_socket into the second cache line. o On allocation zero out entire structure except the lock and list entries, including inp_route inp_lle inp_gencnt. When inp_route and inp_lle were introduced, they were added below inp_zero_size, resulting on not being cleared after free/alloc. This definitely was a source of bugs with route caching. Could be that r315956 has just fixed one of them. The inp_gencnt is reinitialized on every alloc, so it is safe to clear it. This has been proved to improve TCP performance at Netflix. Obtained from: rrs Differential Revision: D10686	2017-05-24 17:47:16 +00:00
Michael Tuexen	5dba6ada91	The connect() system call should return -1 and set errno to EAFNOSUPPORT if it is called on a TCP socket * with an IPv6 address and the socket is bound to an IPv4-mapped IPv6 address. * with an IPv4-mapped IPv6 address and the socket is bound to an IPv6 address. Thanks to Jonathan T. Leighton for reporting this issue. Reviewed by: bz gnn MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D9163	2017-05-22 15:29:10 +00:00
Andrey V. Elsukov	38cc96a887	Set M_BCAST and M_MCAST flags on mbuf sent via divert socket. r290383 has changed how mbufs sent by divert socket are handled. Previously they are always handled by slow path processing in ip_input(). Now ip_tryforward() is invoked from ip_input() before in_broadcast() check. Since diverted packet lost all mbuf flags, it passes the broadcast check in ip_tryforward() due to missing M_BCAST flag. In the result the broadcast packet is forwarded to the wire instead of be consumed by network stack. Add in_broadcast() check to the div_output() function. And restore the M_BCAST flag if destination address is broadcast for the given network interface. PR: 209491 MFC after: 1 week	2017-05-17 09:04:09 +00:00
Ed Maste	3e85b721d6	Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193	2017-05-17 00:34:34 +00:00
Gleb Smirnoff	cc487c1697	Reduce in_pcbinfo_init() by two params. No users supply any flags to this function (they used to say UMA_ZONE_NOFREE), so flag parameter goes away. The zone_fini parameter also goes away. Previously no protocols (except divert) supplied zone_fini function, so inpcb locks were leaked with slabs. This was okay while zones were allocated with UMA_ZONE_NOFREE flag, but now this is a leak. Fix that by suppling inpcb_fini() function as fini method for all inpcb zones.	2017-05-15 21:58:36 +00:00
Enji Cooper	bd7459366e	Add missing braces around MCAST_EXCLUDE check when KTR support is compiled into the kernel This ensures that .iss_asm (the number of ASM listeners) isn't incorrectly decremented for MLD-layer source datagrams when inspecting im*s_st[1] (the second state in the structure). MFC after: 2 months PR: 217509 [1] Reported by: Coverity (Isilon) Reviewed by: ae ("This patch looks correct to me." [1]) Submitted by: Miles Ohlrich <miles.ohlrich@isilon.com> Sponsored by: Dell EMC Isilon	2017-05-13 18:41:24 +00:00
Gleb Smirnoff	7637c57ee1	There is no good reason for TCP reassembly zone to be UMA_ZONE_NOFREE. It has strong locking model, doesn't have any timers associated with entries. The entries theirselves are referenced only from the tcpcb zone, which itself is a normal zone, without the UMA_ZONE_NOFREE flag.	2017-05-10 23:32:31 +00:00
Eugene Grosbein	1a356b8b90	ipfw nat and natd support multiple aliasing instances with "nat global" feature that chooses right alias_address for outgoing packets that already have corresponding state in one of aliasing instances. This feature works just fine for ICMP, UDP, TCP and SCTP packes but not for others. For example, outgoing PPtP/GRE packets always get alias_address of latest configured instance no matter whether such packets have corresponding state or not. This change unbreaks translation of transit PPtP/GRE connections for "nat global" case fixing a bug in static ProtoAliasOut() function that ignores its "create" argument and performs translation regardless of its value. This static function is called only by LibAliasOutLocked() function and only for packers other than ICMP, UDP, TCP and SCTP. LibAliasOutLocked() passes its "create" argument unmodified. We have only two consumers of LibAliasOutLocked() in the source tree calling it with "create" unequal to 1: "ipfw nat global" code and similar natd code having same problem. All other consumers of LibAliasOutLocked() call it with create = 1 and the patch is "no-op" for such cases. PR: 218968 Approved by: ae, vsevolod (mentor) MFC after: 1 week	2017-05-10 19:41:52 +00:00
Michael Tuexen	10e0318afa	Allow SCTP to use the hostcache. This patch allows the MTU stored in the hostcache to be used as an initial value for SCTP paths. When an ICMP PTB message is received, store the MTU in the hostcache. MFC after: 1 week	2017-04-29 19:20:50 +00:00
Michael Tuexen	4f43a14a85	Don't set the DF-bit on timer based retransmissions. MFC after: 1 week	2017-04-29 09:57:27 +00:00
Michael Tuexen	b6ecf43450	Set the DF bit for responses to out-of-the-blue packets. MFC after: 1 week	2017-04-28 15:38:34 +00:00
Michael Tuexen	d274bcc661	Fix an issue with MTU calculation if an ICMP messaeg is received for an SCTP/UDP packet. MFC after: 1 week	2017-04-26 20:21:05 +00:00
Michael Tuexen	6ebfa5ee14	Use consistently uint32_t for mtu values. This does not change functionality, but this cleanup is need for further improvements of ICMP handling. MFC after: 1 week	2017-04-26 19:26:40 +00:00

1 2 3 4 5 ...

5790 Commits