freebsd-nq

Author	SHA1	Message	Date
Andrey V. Elsukov	7a60a91011	Add missing check to fix the build with IPSEC_SUPPORT and without MAC. Submitted by: netchild	2017-02-14 21:33:10 +00:00
Andrey V. Elsukov	627c036f65	Remove IPsec related PCB code from SCTP. The inpcb structure has inp_sp pointer that is initialized by ipsec_init_pcbpolicy() function. This pointer keeps strorage for IPsec security policies associated with a specific socket. An application can use IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options to configure these security policies. Then ip[6]_output() uses inpcb pointer to specify that an outgoing packet is associated with some socket. And IPSEC_OUTPUT() method can use a security policy stored in the inp_sp. For inbound packet the protocol-specific input routine uses IPSEC_CHECK_POLICY() method to check that a packet conforms to inbound security policy configured in the inpcb. SCTP protocol doesn't specify inpcb for ip[6]_output() when it sends packets. Thus IPSEC_OUTPUT() method does not consider such packets as associated with some socket and can not apply security policies from inpcb, even if they are configured. Since IPSEC_CHECK_POLICY() method is called from protocol-specific input routine, it can specify inpcb pointer and associated with socket inbound policy will be checked. But there are two problems: 1. Such check is asymmetric, becasue we can not apply security policy from inpcb for outgoing packet. 2. IPSEC_CHECK_POLICY() expects that caller holds INPCB lock and access to inp_sp is protected. But for SCTP this is not correct, becasue SCTP uses own locks to protect inpcb. To fix these problems remove IPsec related PCB code from SCTP. This imply that IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options will be not applicable to SCTP sockets. To be able correctly check inbound security policies for SCTP, mark its protocol header with the PR_LASTHDR flag. Reported by: tuexen Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D9538	2017-02-13 11:37:52 +00:00
Ermal Luçi	c10c5b1eba	Committed without approval from mentor. Reported by: gnn	2017-02-12 06:56:33 +00:00
Ryan Stone	5ede40dcf2	Don't zero out srtt after excess retransmits If the TCP stack has retransmitted more than 1/4 of the total number of retransmits before a connection drop, it decides that its current RTT estimate is hopelessly out of date and decides to recalculate it from scratch starting with the next ACK. Unfortunately, it implements this by zeroing out the current RTT estimate. Drop this hack entirely, as it makes it significantly more difficult to debug connection issues. Instead check for excessive retransmits at the point where srtt is updated from an ACK being received. If we've exceeded 1/4 of the maximum retransmits, discard the previous srtt estimate and replace it with the latest rtt measurement. Differential Revision: https://reviews.freebsd.org/D9519 Reviewed by: gnn Sponsored by: Dell EMC Isilon	2017-02-11 17:05:08 +00:00
Gleb Smirnoff	cfff3743cd	Move tcp_fields_to_net() static inline into tcp_var.h, just below its friend tcp_fields_to_host(). There is third party code that also uses this inline. Reviewed by: ae	2017-02-10 17:46:26 +00:00
Ermal Luçi	e97b60264d	Fix build after r313524 Reported-by: ohartmann@walstatt.org	2017-02-10 06:01:47 +00:00
Ermal Luçi	4616026faf	Revert r313527 Heh svn is not git	2017-02-10 05:58:16 +00:00
Ermal Luçi	c0fadfdbbf	Correct missed variable name. Reported-by: ohartmann@walstatt.org	2017-02-10 05:51:39 +00:00
Ermal Luçi	ed55edceef	The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets. Sponsored-by: rsync.net Differential Revision: D9235 Reviewed-by: adrian	2017-02-10 05:16:14 +00:00
Eric van Gyzen	edf0313b70	Fix garbage IP addresses in UDP log_in_vain messages If multiple threads emit a UDP log_in_vain message concurrently, the IP addresses could be garbage due to concurrent usage of a single string buffer inside inet_ntoa(). Use inet_ntoa_r() with two stack buffers instead. Reported by: Mark Martinec <Mark.Martinec+freebsd@ijs.si> MFC after: 3 days Relnotes: yes Sponsored by: Dell EMC	2017-02-07 18:57:57 +00:00
Andrey V. Elsukov	fcf596178b	Merge projects/ipsec into head/. Small summary ------------- o Almost all IPsec releated code was moved into sys/netipsec. o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules. o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA. o New network pseudo interface if_ipsec(4) added. For now it is build as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs. o The network stack now invokes IPsec functions using special methods. The only one header file <netipsec/ipsec_support.h> should be included to declare all the needed things to work with IPsec. o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods. o TCP_SIGNATURE support was reworked to be more close to RFC. o PF_KEY SADB was reworked: - now all security associations stored in the single SPI namespace, and all SAs MUST have unique SPI. - several hash tables added to speed up lookups in SADB. - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups in the same time. - many PF_KEY message handlers were reworked to reflect changes in SADB. - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses. o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only limited number (4) of bundled SAs, but they are supported for both INET and INET6. o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet. o For inbound security policies added the mode, when the kernel does check for full history of applied IPsec transforms. o References counting rules for security policies and security associations were changed. The proper SA locking added into xform code. o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting. Reviewed by: gnn, wblock Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9352	2017-02-06 08:49:57 +00:00
Patrick Kelsey	ec93ed8d95	Fix VIMAGE-related bugs in TFO. The autokey callout vnet context was not being initialized, and the per-vnet fastopen context was only being initialized for the default vnet. PR: 216613 Reported by: Alex Deiter <alex dot deiter at gmail dot com> MFC after: 1 week	2017-02-03 17:02:57 +00:00
George V. Neville-Neil	82988b50a1	Add an mbuf to ipinfo_t translator to finish cleanup of mbuf passing to TCP probes. Reviewed by: markj MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9401	2017-02-01 19:33:00 +00:00
Michael Tuexen	c03627fd06	Ensure that the variable bail is always initialized before used. MFC after: 1 week	2017-02-01 00:10:29 +00:00
Michael Tuexen	2aa116007c	Take the SCTP common header into account when computing the space available for chunks. This unbreaks the handling of ICMPV6 packets indicating "packet too big". It just worked for IPv4 since we are overbooking for IPv4. MFC after: 1 week	2017-01-31 23:36:31 +00:00
Michael Tuexen	7858d7cb8e	Remove a duplicate debug statement. MFC after: 1 week	2017-01-31 23:34:02 +00:00
Cy Schubert	3df96ee68e	Correct comment grammar and make it easier to understand. MFC after: 1 week	2017-01-30 04:51:18 +00:00
Hiren Panchasara	6134aabe38	Add a knob to change default behavior of inheriting listen socket's tcp stack regardless of what the default stack for the system is set to. With current/default behavior, after changing the default tcp stack, the application needs to be restarted to pick up that change. Setting this new knob net.inet.tcp.functions_inherit_listen_socket_stack to '0' would change that behavior and make any new connection use the newly selected default tcp stack. Reviewed by: rrs MFC after: 2 weeks Sponsored by: Limelight Networks	2017-01-27 23:10:46 +00:00
Luiz Otavio O Souza	338e227ac0	After the in_control() changes in r257692, an existing address is (intentionally) deleted first and then completely added again (so all the events, announces and hooks are given a chance to run). This cause an issue with CARP where the existing CARP data structure is removed together with the last address for a given VHID, which will cause a subsequent fail when the address is later re-added. This change fixes this issue by adding a new flag to keep the CARP data structure when an address is not being removed. There was an additional issue with IPv6 CARP addresses, where the CARP data structure would never be removed after a change and lead to VHIDs which cannot be destroyed. Reviewed by: glebius Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2017-01-25 19:04:08 +00:00
Michael Tuexen	bd60638c98	Fix a bug where the overhead of the I-DATA chunk was not considered. MFC after: 1 week	2017-01-24 21:30:31 +00:00
Hans Petter Selasky	f3e7afe2d7	Implement kernel support for hardware rate limited sockets. - Add RATELIMIT kernel configuration keyword which must be set to enable the new functionality. - Add support for hardware driven, Receive Side Scaling, RSS aware, rate limited sendqueues and expose the functionality through the already established SO_MAX_PACING_RATE setsockopt(). The API support rates in the range from 1 to 4Gbytes/s which are suitable for regular TCP and UDP streams. The setsockopt(2) manual page has been updated. - Add rate limit function callback API to "struct ifnet" which supports the following operations: if_snd_tag_alloc(), if_snd_tag_modify(), if_snd_tag_query() and if_snd_tag_free(). - Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT flag, which tells if a network driver supports rate limiting or not. - This patch also adds support for rate limiting through VLAN and LAGG intermediate network devices. - How rate limiting works: 1) The userspace application calls setsockopt() after accepting or making a new connection to set the rate which is then stored in the socket structure in the kernel. Later on when packets are transmitted a check is made in the transmit path for rate changes. A rate change implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the destination network interface, which then sets up a custom sendqueue with the given rate limitation parameter. A "struct m_snd_tag" pointer is returned which serves as a "snd_tag" hint in the m_pkthdr for the subsequently transmitted mbufs. 2) When the network driver sees the "m->m_pkthdr.snd_tag" different from NULL, it will move the packets into a designated rate limited sendqueue given by the snd_tag pointer. It is up to the individual drivers how the rate limited traffic will be rate limited. 3) Route changes are detected by the NIC drivers in the ifp->if_transmit() routine when the ifnet pointer in the incoming snd_tag mismatches the one of the network interface. The network adapter frees the mbuf and returns EAGAIN which causes the ip_output() to release and clear the send tag. Upon next ip_output() a new "snd_tag" will be tried allocated. 4) When the PCB is detached the custom sendqueue will be released by a non-blocking ifp->if_snd_tag_free() call to the currently bound network interface. Reviewed by: wblock (manpages), adrian, gallatin, scottl (network) Differential Revision: https://reviews.freebsd.org/D3687 Sponsored by: Mellanox Technologies MFC after: 3 months	2017-01-18 13:31:17 +00:00
Maxim Sobolev	339efd75a4	Add a new socket option SO_TS_CLOCK to pick from several different clock sources to return timestamps when SO_TIMESTAMP is enabled. Two additional clock sources are: o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME); o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC). In addition to this, this option provides unified interface to get bintime (equivalent of using SO_BINTIME), except it also supported with IPv6 where SO_BINTIME has never been supported. The long term plan is to depreciate SO_BINTIME and move everything to using SO_TS_CLOCK. Idea for this enhancement has been briefly discussed on the Net session during dev summit in Ottawa last June and the general input was positive. This change is believed to benefit network benchmarks/profiling as well as other scenarios where precise time of arrival measurement is necessary. There are two regression test cases as part of this commit: one extends unix domain test code (unix_cmsg) to test new SCM_XXX types and another one implementis totally new test case which exchanges UDP packets between two processes using both conventional methods (i.e. calling clock_gettime(2) before recv(2) and after send(2)), as well as using setsockopt()+recv() in receive path. The resulting delays are checked for sanity for all supported clock types. Reviewed by: adrian, gnn Differential Revision: https://reviews.freebsd.org/D9171	2017-01-16 17:46:38 +00:00
Conrad Meyer	1d64db52f3	Fix a variety of cosmetic typos and misspellings No functional change. PR: 216096, 216097, 216098, 216101, 216102, 216106, 216109, 216110 Reported by: Bulat <bltsrc at mail.ru> Sponsored by: Dell EMC Isilon	2017-01-15 18:00:45 +00:00
Gleb Smirnoff	0f7ddf91e9	Use getsock_cap() instead of deprecated fgetsock(). Reviewed by: tuexen	2017-01-13 16:54:44 +00:00
Michael Tuexen	24209f0122	Ensure that the buffer length and the length provided in the IPv4 header match when using a raw socket to send IPv4 packets and providing the header. If they don't match, let send return -1 and set errno to EINVAL. Before this patch is was only enforced that the length in the header is not larger then the buffer length. PR: 212283 Reviewed by: ae, gnn MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9161	2017-01-13 10:55:26 +00:00
Maxim Sobolev	5e946c03c7	Fix slight type mismatch between so_options defined in sys/socketvar.h and tw_so_options defined here which is supposed to be a copy of the former (short vs u_short respectively). Switch tw_so_options to be "signed short" to match the type of the field it's inherited from.	2017-01-12 10:14:54 +00:00
Hiren Panchasara	b8a2fb91f6	sysctl net.inet.tcp.hostcache.list in a jail can see connections from other jails and the host. This commit fixes it. PR: 200361 Submitted by: bz (original version), hiren (minor corrections) Reported by: Marcus Reid <marcus at blazingdot dot com> Reviewed by: bz, gnn Tested by: Lohith Bellad <lohithbsd at gmail dot com> MFC after: 1 week Sponsored by: Limelight Networks (minor corrections)	2017-01-05 17:22:09 +00:00
George V. Neville-Neil	fad073dd44	Followup to mtod removal in main stack (r311225). Continued removal of mtod() calls from TCP_PROBE macros. MFC after: 1 week Sponsored by: Limelight Networks	2017-01-04 04:00:28 +00:00
George V. Neville-Neil	2b9c998413	Fix DTrace TCP tracepoints to not use mtod() as it is both unnecessary and dangerous. Those wanting data from an mbuf should use DTrace itself to get the data. PR: 203409 Reviewed by: hiren MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D9035	2017-01-04 02:19:13 +00:00
Enji Cooper	cfff8d3dbd	Unbreak ip_carp with WITHOUT_INET6 enabled by conditionalizing all IPv6 structs under the INET6 #ifdef. Similarly (even though it doesn't seem to affect the build), conditionalize all IPv4 structs under the INET #ifdef This also unbreaks the LINT-NOINET6 tinderbox target on amd64; I have not verified other MACHINE/TARGET pairs (e.g. armv6/arm). MFC after: 2 weeks X-MFC with: r310847 Pointyhat to: jpaetzel Reported by: O. Hartmann <o.hartmann@walstatt.org>	2016-12-30 21:33:01 +00:00
Josh Paetzel	8151740c88	Harden CARP against network loops. If there is a loop in the network a CARP that is in MASTER state will see it's own broadcasts, which will then cause it to assume BACKUP state. When it assumes BACKUP it will stop sending advertisements. In that state it will no longer see advertisements and will assume MASTER... We can't catch all the cases where we are seeing our own CARP broadcast, but we can catch the obvious case. Submitted by: torek Obtained from: FreeNAS MFC after: 2 weeks Sponsored by: iXsystems	2016-12-30 18:46:21 +00:00
Andrey V. Elsukov	2e77d270c1	When we are sending IP fragments, update ip pointers in IP_PROBE() for each fragment. MFC after: 1 week	2016-12-29 19:57:46 +00:00
Michael Tuexen	2048d80aa3	Consistent handling of errors reported from the lower layer. MFC after: 3 days	2016-12-27 22:14:41 +00:00
Michael Tuexen	b7b84c0e02	Whitespace changes. The toolchain for processing the sources has been updated. No functional change. MFC after: 3 days	2016-12-26 11:06:41 +00:00
Michael Tuexen	d6194c562f	Remove a KASSERT which is not always true. In case of the empty queue tp->snd_holes and tcp_sackhole_insert() failing due to memory shortage, tp->snd_holes will be empty. This problem was hit when stress tests where performed by pho. PR: 215513 Reported by: pho Tested by: pho Sponsored by: Netflix, Inc.	2016-12-25 17:37:18 +00:00
Gleb Smirnoff	030b9c2f69	Remove assigned only variable.	2016-12-21 22:47:10 +00:00
Andrey V. Elsukov	ad9f4d6ab6	ip[6]_tryforward does inbound and outbound packet firewall processing. This can lead to change of mbuf pointer (packet filter could do m_pullup(), NAT, etc). Also in case of change of destination address, tryforward can decide that packet should be handled by local system. In this case modified mbuf can be returned to the ip[6]_input(). To handle this correctly, check M_FASTFWD_OURS flag after return from ip[6]_tryforward. And if it is present, update variables that depend from mbuf pointer and skip another inbound firewall processing. No objection from: #network MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D8764	2016-12-19 11:02:49 +00:00
Michael Tuexen	3d6fe5d84c	Fix the handling of buffered messages in stream reset deferred handling. Thanks to Eugen-Andrei Gavriloaie for reporting the issue and providing substantial help in nailing down the issue. MFC after: 1 week	2016-12-17 22:31:30 +00:00
Hiren Panchasara	b6ff672460	We currently don't do TSO if ip options are present. In case of IPv6, we look at in6p_options to check that. That is incorrect as we carry ip options in in6p_outputopts. Also, just checking for in6p_outputopts being NULL won't suffice as we combine ip options and ip header fields both in that one field. The commit fixes this by using ip6_optlen() which correctly calculates length of only ip options for IPv6. Reviewed by: ae, bz MFC after: 3 weeks Sponsored by: Limelight Networks	2016-12-11 23:14:47 +00:00
Michael Tuexen	8b9c95f4a9	Ensure that the reported ppid and tsn are taken from the first fragment. This fixes a bug where the wrong ppid was reported, if * I-DATA was used on the first fragement was not received first * DATA was used and different ppids where used. Thanks to Julian Cordes for making me aware of the issue. MFC after: 1 week	2016-12-11 13:26:35 +00:00
Gleb Smirnoff	8c70a35334	Fix build for 32-bit machines. Submitted by: tuexen	2016-12-09 20:50:35 +00:00
Gleb Smirnoff	3cbee8caa1	Use counter_ratecheck() in the ICMP rate limiting. Together with: rrs, jtl	2016-12-09 17:59:15 +00:00
Michael Tuexen	ebecdad811	Don't bundle a SACK chunk with a SHUTDOWN chunk if it is not required. MFC after: 1 week	2016-12-09 17:58:07 +00:00
Michael Tuexen	8d0a31e19c	Don't send multiple SHUTDOWN chunks in a single packet. Thanks to Felix Weinrank for making me aware of this issue. MFC after: 1 week	2016-12-09 17:57:17 +00:00
Michael Tuexen	b594081bdf	Silence a warning produced by newer versions of gcc. MFC after: 1 week	2016-12-07 22:01:09 +00:00
Michael Tuexen	49656eefc8	Cleanup the names of SSN, SID, TSN, FSN, PPID and MID. This made a couple of bugs visible in handling SSN wrap-arounds when using DATA chunks. Now bulk transfer seems to work fine... This fixes the issue reported in https://github.com/sctplab/usrsctp/issues/111 MFC after: 1 week	2016-12-07 19:30:59 +00:00
Michael Tuexen	5b495f17a5	Whitespace changes. The tools using to generate the sources has been updated and produces different whitespaces. Commit this seperately to avoid intermixing these with real code changes. MFC after: 3 days	2016-12-06 10:21:25 +00:00
Michael Tuexen	4ddd5aadea	Fix the handling of TCP FIN-segments in the CLOSED state When a TCP segment with the FIN bit set was received in the CLOSED state, a TCP RST-ACK-segment is sent. When computing SEG.ACK for this, the FIN counts as one byte. This accounting was missing and is fixed by this patch. Reviewed by: hiren MFC after: 1 month Sponsored by: Netflix, Inc. Differential Revision: https://svn.freebsd.org/base/head	2016-12-02 08:02:31 +00:00
Andrey V. Elsukov	dc9d21f8b0	Rework ip_tryforward() to use FIB4 KPI. Tested by: olivier Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D8526	2016-11-28 17:55:32 +00:00
Hiren Panchasara	2806b2933b	For RTT calculations mid-session, we explicitly ignore ACKs with tsecr of 0 as many borken middle-boxes tend to do that. But during 3whs, in syncache_expand(), we don't do that which causes us to send a RST to such a client. Relax this constraint by only using tsecr to compare against timestamp that we sent when it is not 0. As a result, we'd now accept the final ACK of 3whs with tsecr of 0. Reviewed by: jtl, gnn Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D8552	2016-11-21 20:53:11 +00:00

1 2 3 4 5 ...

5709 Commits