freebsd-dev

Author	SHA1	Message	Date
Michael Tuexen	f885296d70	Don't zero the stats before they are read out. MFC after: 3 days	2014-11-01 10:35:45 +00:00
Andrey V. Elsukov	1d904a55c8	Remove the check for packets with broadcast source from if_gif's encapcheck. The check was recommended in the draft-ietf-ngtrans-mech-05.txt. But it isn't clear, should it compare the source with all direct broadcast addresses in the system or not. RFC 4213 says it is enough to verify that the source address is the address of the encapsulator, as configured on the decapsulator. And this verification can be extended by administrator with any other forms of IPv4 ingress filtering. Discussed with: glebius, melifaro Sponsored by: Yandex LLC	2014-10-31 15:23:24 +00:00
Andrey V. Elsukov	7e4217558c	Fix typo.	2014-10-31 11:40:49 +00:00
Julien Charbon	cea40c4888	Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and tcp_tw_2msl_scan(). This race condition drives unplanned timewait timeout cancellation. Also simplify implementation by holding inpcb reference and removing tcptw reference counting. Differential Revision: https://reviews.freebsd.org/D826 Submitted by: Marc De la Gueronniere <mdelagueronniere@verisign.com> Submitted by: jch Reviewed By: jhb (mentor), adrian, rwatson Sponsored by: Verisign, Inc. MFC after: 2 weeks X-MFC-With: r264321	2014-10-30 08:53:56 +00:00
Hans Petter Selasky	0e1152fcc2	The SYSCTL data pointers can come from userspace and must not be directly accessed. Although this will work on some platforms, it can throw an exception if the pointer is invalid and then panic the kernel. Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-28 12:00:39 +00:00
Hans Petter Selasky	614b50ae8b	Preserve limitation of "TCP_CA_NAME_MAX" when matching the algorithm name. MFC after: 3 days Suggested by: gnn @	2014-10-27 16:08:41 +00:00
Hans Petter Selasky	60a945f95d	Make assignments to "net.inet.tcp.cc.algorithm" work by fixing a bad string comparison. MFC after: 3 days Reported by: Jukka Ukkonen <jau789@gmail.com> Sponsored by: Mellanox Technologies	2014-10-27 11:21:47 +00:00
Mateusz Guzik	e015b1ab0a	Avoid dynamic syscall overhead for statically compiled modules. The kernel tracks syscall users so that modules can safely unregister them. But if the module is not unloadable or was compiled into the kernel, there is no need to do this. Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC during kernel build and 0 otherwise. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-26 19:42:44 +00:00
Michael Tuexen	b3817112b4	Fix a use of an uninitialized variable by making sure that sctp_med_chunk_output() always initializes the reason_code instead of relying on the caller. The variable is only used for debugging purposes. This issue was reported by Peter Bostroem from Google. MFC after: 3 days	2014-10-25 09:25:29 +00:00
Andrey V. Elsukov	a663aa4ce8	Remove redundant check and m_pullup() call.	2014-10-24 13:34:22 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Michael Tuexen	84f3b49ac9	Fix the reported streams in a SCTP_STREAM_RESET_EVENT, if a sent incoming stream reset request was responded with failed or denied. Thanks to Peter Bostroem from Google for reporting the issue. MFC after: 3 days	2014-10-16 15:36:04 +00:00
Andrey V. Elsukov	0b9f5f8a5f	Overhaul if_gif(4): o convert to if_transmit; o use rmlock to protect access to gif_softc; o use sx lock to protect from concurrent ioctls; o remove a lot of unneeded and duplicated code; o remove cached route support (it won't work with concurrent io); o style fixes. Reviewed by: melifaro Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2014-10-14 13:31:47 +00:00
Sean Bruno	882ac53ed7	Handle small file case with regards to plpmtud blackhole detection. Submitted by: Mikhail <mp@lenta.ru> MFC after: 2 weeks Relnotes: yes	2014-10-13 21:06:21 +00:00
Sean Bruno	0f3e3bc526	Catch ipv6 case when attempting to do PLPMTUD blackhole detection. Submitted by: Mikhail <mp@lenta.ru> MFC after: 2 weeks Relnotes: yes	2014-10-13 21:05:29 +00:00
Alexander V. Chernikov	2930362fb1	Fix matching default rule on clear/show commands. Found by: Oleg Ginzburg	2014-10-13 13:49:28 +00:00
Julien Charbon	489dcc9262	A connection in TIME_WAIT state before calling close() actually did not receive any RST packet. Do not set error to ECONNRESET in this case. Differential Revision: https://reviews.freebsd.org/D879 Reviewed by: rpaulo, adrian Approved by: jhb (mentor) Sponsored by: Verisign, Inc.	2014-10-12 23:01:25 +00:00
Robert Watson	f0cace5d94	When deciding whether to call m_pullup() even though there is adequate data in an mbuf, use M_WRITABLE() instead of a direct test of M_EXT; the latter both unnecessarily exposes mbuf-allocator internals in the protocol stack and is also insufficient to catch all cases of non-writability. (NB: m_pullup() does not actually guarantee that a writable mbuf is returned, so further refinement of all of these code paths continues to be required.) Reviewed by: bz MFC after: 3 days Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D900	2014-10-12 15:49:52 +00:00
John Baldwin	585a4290ab	Update ip_divert.ko to depend on version 3 of ipfw.	2014-10-11 16:08:54 +00:00
Bryan Venteicher	81d3ec1763	Add context pointer and source address to the UDP tunnel callback These are needed for the forthcoming vxlan implementation. The context pointer means we do not have to use a spare pointer field in the inpcb, and the source address is required to populate vxlan's forwarding table. While I highly doubt there is an out of tree consumer of the UDP tunneling callback, this change may be difficult to eventually MFC. Phabricator: https://reviews.freebsd.org/D383 Reviewed by: gnn	2014-10-10 06:08:59 +00:00
Bryan Venteicher	a0a9e1b57c	Add missing UDP multicast receive dtrace probes Phabricator: https://reviews.freebsd.org/D924 Reviewed by: rpaulo markj MFC after: 1 month	2014-10-09 22:36:21 +00:00
Michael Tuexen	e03159ea69	Ensure that the flags field of sctp_tmit_chunks is initialized. Thanks to Peter Bostroem from Google for reporting the issue. MFC after: 3 days	2014-10-09 20:08:12 +00:00
Alexander V. Chernikov	779b53d008	Sync to HEAD@r272825.	2014-10-09 15:35:28 +00:00
Marcel Moolenaar	80b47aefa1	Move the SCTP syscalls to netinet with the rest of the SCTP code. The syscalls themselves are tightly coupled with the network stack and therefore should not be in the generic socket code. The following four syscalls have been marked as NOSTD so they can be dynamically registered in sctp_syscalls_init() function: sys_sctp_peeloff sys_sctp_generic_sendmsg sys_sctp_generic_sendmsg_iov sys_sctp_generic_recvmsg The syscalls are also set up to be dynamically registered when COMPAT32 option is configured. As a side effect of moving the SCTP syscalls, getsock_cap needs to be made available outside of the uipc_syscalls.c source file. A proper prototype has been added to the sys/socketvar.h header file. API tests from the SCTP reference implementation have been run to ensure compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout) Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:16:52 +00:00
Bryan Venteicher	c19f98eb74	Check for mbuf copy failure when there are multiple multicast sockets This particular case is the only path where the mbuf could be NULL. udp_append() checked for a NULL mbuf only after invoking the tunneling callback. Our only in tree tunneling callback - SCTP - assumed a non NULL mbuf, and it is a bit odd to make the callbacks responsible for checking this condition. This also reduces the differences between the IPv4 and IPv6 code. MFC after: 1 month	2014-10-09 05:17:47 +00:00
Andrey V. Elsukov	5b7a43f546	When tunneling interface is going to insert mbuf into netisr queue after stripping outer header, consider it as new packet and clear the protocols flags. This fixes problems when IPSEC traffic goes through various tunnels and router doesn't send ICMP/ICMPv6 errors. PR: 174602 Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2014-10-08 21:23:34 +00:00
Michael Tuexen	9ba6106020	Ensure that the list of streams sent in a stream reset parameter fits in an mbuf-cluster. Thanks to Peter Bostroem for drawing my attention to this part of the code.	2014-10-08 15:30:59 +00:00
Michael Tuexen	e29127de2e	Ensure that the number of streams reported in srs_number_streams is consistent with the amount of data provided in the SCTP_RESET_STREAMS socket option. Thanks to Peter Bostroem from Google for drawing my attention to this part of the code.	2014-10-08 15:29:49 +00:00
Alexander V. Chernikov	be8bc45790	Add IP_FW_DUMP_SOPTCODES sopt to be able to determine which opcodes are currently available in kernel.	2014-10-08 11:12:14 +00:00
Sean Bruno	f6f6703f27	Implement PLPMTUD blackhole detection (RFC 4821), inspired by code from xnu sources. If we encounter a network where ICMP is blocked the Needs Frag indicator may not propagate back to us. Attempt to downshift the mss once to a preconfigured value. Default this feature to off for now while we do not have a full PLPMTUD implementation in our stack. Adds the following new sysctl's for control: net.inet.tcp.pmtud_blackhole_detection -- turns on/off this feature net.inet.tcp.pmtud_blackhole_mss -- mss to try for ipv4 net.inet.tcp.v6pmtud_blackhole_mss -- mss to try for ipv6 Adds the following new sysctl's for monitoring: -- Number of times the code was activated to attempt a mss downshift net.inet.tcp.pmtud_blackhole_activated -- Number of times the blackhole mss was used in an attempt to downshift net.inet.tcp.pmtud_blackhole_min_activated -- Number of times that we failed to connect after we downshifted the mss net.inet.tcp.pmtud_blackhole_failed Phabricator: https://reviews.freebsd.org/D506 Reviewed by: rpaulo bz MFC after: 2 weeks Relnotes: yes Sponsored by: Limelight Networks	2014-10-07 21:50:28 +00:00
Alexander V. Chernikov	a5fedf11fc	Sync to HEAD@r272609.	2014-10-06 11:29:50 +00:00
Hans Petter Selasky	b228e6bf57	Minor code styling. Suggested by: glebius @	2014-10-06 06:19:54 +00:00
Michael Tuexen	041353aba4	Remove unused MC_ALIGN macro as suggested by Robert. MFC after: 1 week	2014-10-05 20:30:49 +00:00
Robert Watson	6c572040c6	Eliminate use of M_EXT in IP6_EXTHDR_CHECK() by trimming a redundant 'if'/'else' case: it matches the simple 'else' case that follows. This reduces awareness of external-storage mechanics outside of the mbuf allocator. Reviewed by: bz MFC after: 3 days Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D900	2014-10-05 06:28:53 +00:00
Alexander V. Chernikov	1ce4b35740	Sync to HEAD@r272516.	2014-10-04 12:42:37 +00:00
Hiroki Sato	9c57a5b630	Add an additional routing table lookup when m->m_pkthdr.fibnum is changed at a PFIL hook in ip{,6}_output(). IPFW setfib rule did not perform a routing table lookup when the destination address was not changed. CR: D805	2014-10-02 00:25:57 +00:00
Mark Johnston	00cb6bef99	Add a sysctl, net.inet.icmp.tstamprepl, which can be used to disable replies to ICMP Timestamp packets. PR: 193689 Submitted by: Anthony Cornehl <accornehl@gmail.com> MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division	2014-10-01 18:07:34 +00:00
Alexander V. Chernikov	31f0d081d8	Remove lock init from radix.c. Radix has never managed its locking itself. The only consumer using radix with embedded rwlock is system routing table. Move per-AF lock inits there.	2014-10-01 14:39:06 +00:00
Michael Tuexen	83e95fb30b	The default for UDPLITE_RECV_CSCOV is zero. RFC 3828 recommends that this means full checksum coverage for received packets. If an application is willing to accept packets with partial coverage, it is expected to use the socket option and provide the minimum coverage it accepts. Reviewed by: kevlo MFC after: 3 days	2014-10-01 05:43:29 +00:00
Michael Tuexen	c6d81a3445	UDPLite requires a checksum. Therefore, discard a received packet if the checksum is 0. MFC after: 3 days	2014-09-30 20:29:58 +00:00
Michael Tuexen	0f4a03663b	If the checksum coverage field in the UDPLITE header is the length of the complete UDPLITE packet, the packet has full checksum coverage. So fix the condition. Reviewed by: kevlo MFC after: 3 days	2014-09-30 18:17:28 +00:00
John Baldwin	a9456c081a	Only define the full inm_print() if KTR_IGMPV3 is enabled at compile time.	2014-09-30 17:26:34 +00:00
Michael Tuexen	03f90784bf	Checksum coverage values larger than 65535 for UDPLite are invalid. Check for this when the user calls setsockopt using UDPLITE_{SEND,RECV}CSCOV. Reviewed by: kevlo MFC after: 3 days	2014-09-28 17:22:45 +00:00
Alexander V. Chernikov	29c47f18da	* Split tcp_signature_compute() into 2 pieces: - tcp_get_sav() - SADB key lookup - tcp_signature_do_compute() - actual computation * Fix TCP signature case for listening socket: do not assume EVERY connection coming to socket with TCP_SIGNATURE set to be md5 signed regardless of SADB key existence for particular address. This fixes the case for routing software having _some_ BGP sessions secured by md5. * Simplify TCP_SIGNATURE handling in tcp_input() MFC after: 2 weeks	2014-09-27 07:04:12 +00:00
Adrian Chadd	3aac064c2f	Remove an un-needed bit of pre-processor work - it all lives inside #ifdef RSS.	2014-09-27 05:14:02 +00:00
John-Mark Gurney	469c4e0465	drop unnecessary ifdef IPSEC's. This file is only compiled when IPSEC is defined... Differential Revision: D839 Reviewed by: bz, glebius, gnn Sponsored by: EuroBSDCon DevSummit	2014-09-26 12:48:54 +00:00
Navdeep Parhar	5acf7269da	Catch up with r271119.	2014-09-24 20:12:40 +00:00
Hans Petter Selasky	9fd573c39d	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware are limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot be loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. Reviewed by: adrian, rmacklem Sponsored by: Mellanox Technologies MFC after: 1 week	2014-09-22 08:27:27 +00:00
Hiroki Sato	cc45ae406d	Add a change missing in r271916.	2014-09-21 04:38:50 +00:00
Hiroki Sato	89c58b73e0	- Virtualize interface cloner for gre(4). This fixes a panic when destroying a vnet jail which has a gre(4) interface. - Make net.link.gre.max_nesting vnet-local.	2014-09-21 03:56:06 +00:00
Gleb Smirnoff	32c7c51c2a	Mechanically convert to if_inc_counter().	2014-09-19 10:19:51 +00:00
Gleb Smirnoff	22bfa4f5b1	Remove disabled code, that is very unlikely to be ever enabled again, as well as the comment that explains why is it disabled.	2014-09-19 05:23:47 +00:00
Alan Somers	58a39d8c5b	Fix source address selection on unbound sockets in the presence of multiple fibs. Use the mbuf's or the socket's fib instead of RT_ALL_FIBS. Fixes PR 187553. Also fixes netperf's UDP_STREAM test on a nondefault fib. sys/netinet/ip_output.c In ip_output, lookup the source address using the mbuf's fib instead of RT_ALL_FIBS. sys/netinet/in_pcb.c in in_pcbladdr, lookup the source address using the socket's fib, because we don't seem to have the mbuf fib. They should be the same, though. tests/sys/net/fibs_test.sh Clear the expected failure on udp_dontroute. PR: 187553 CR: https://reviews.freebsd.org/D772 MFC after: 3 weeks Sponsored by: Spectra Logic	2014-09-16 15:28:19 +00:00
Michael Tuexen	b60b0fe6fd	Add an explicit cast to silence a warning when building the userland stack on Windows. This issue was reported by Peter Kasting from Google. MFC after: 3 days	2014-09-16 14:39:24 +00:00
Michael Tuexen	47b80412cd	Use a consistent type for the number of HMAC algorithms. This fixes a bug which resulted in a warning on the userland stack, when compiled on Windows. Thanks to Peter Kasting from Google for reporting the issue and providing a potential fix. MFC after: 3 days	2014-09-16 14:20:33 +00:00
Michael Tuexen	667eb48763	Small cleanup which addresses a warning regarding the truncation of a 64-bit entity to a 32-bit entity. This issue was reported by Peter Kasting from Google. MFC after: 3 days	2014-09-16 13:48:46 +00:00
Gleb Smirnoff	3220a2121c	FreeBSD-SA-14:19.tcp raised attention to the state of our stack towards blind SYN/RST spoofed attack. Originally our stack used in-window checks for incoming SYN/RST as proposed by RFC793. Later, circa 2003 the RST attack was mitigated using the technique described in P. Watson "Slipping in the window" paper [1]. After that, the checks were only relaxed for the sake of compatibility with some buggy TCP stacks. First, r192912 introduced the vulnerability, just fixed by aforementioned SA. Second, r167310 had slightly relaxed the default RST checks, instead of utilizing net.inet.tcp.insecure_rst sysctl. In 2010 a new technique for mitigation of these attacks was proposed in RFC5961 [2]. The idea is to send a "challenge ACK" packet to the peer, to verify that packet arrived isn't spoofed. If peer receives challenge ACK it should regenerate its RST or SYN with correct sequence number. This should not only protect against attacks, but also improve communication with broken stacks, so authors of reverted r167310 and r192912 won't be disappointed. [1] http://bandwidthco.com/whitepapers/netforensics/tcpip/TCP Reset Attacks.pdf [2] http://www.rfc-editor.org/rfc/rfc5961.txt Changes made: o Revert r167310. o Implement "challenge ACK" protection as specified in RFC5961 against RST attack. On by default. - Carefully preserve r138098, which handles empty window edge case, not described by the RFC. - Update net.inet.tcp.insecure_rst description. o Implement "challenge ACK" protection as specified in RFC5961 against SYN attack. On by default. - Provide net.inet.tcp.insecure_syn sysctl, to turn off RFC5961 protection. The changes were tested at Netflix. The tested box didn't show any anomalies compared to control box, except slightly increased number of TCP connections in LAST_ACK state. Reviewed by: rrs Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-16 11:07:25 +00:00
Michael Tuexen	8a0834ec28	Make a type conversion explicit. When compiling this code on Windows as part of the SCTP userland stack, this fixes a warning reported by Peter Kasting from Google. MFC after: 3 days	2014-09-16 10:57:55 +00:00
Xin LI	831ad37ef2	Fix Denial of Service in TCP packet processing. Submitted by: glebius Security: FreeBSD-SA-14:19.tcp	2014-09-16 09:48:24 +00:00
Michael Tuexen	43f9f175c5	The MTU is handled as a 32-bit entity within the SCTP stack. This was reported by Peter Kasting from Google. MFC after: 3 days	2014-09-16 09:22:43 +00:00
Adrian Chadd	f4659f4c27	Ensure the correct software IPv4 hash is done based on the configured RSS parameters, rather than assuming we're hashing IPv4+UDP and IPv4+TCP.	2014-09-16 03:26:42 +00:00
Michael Tuexen	aa7e5af86f	Chunk IDs are 8 bit entities, not 16 bit. Thanks to Peter Kasting from Google for drawing my attention to it. MFC after: 3 days	2014-09-15 19:38:34 +00:00
Hiroki Sato	9bc11d7bd7	Use generic SYSCTL_* macro instead of deprecated SYSCTL_VNET_*. Suggested by: glebius	2014-09-15 14:43:58 +00:00
Hiroki Sato	348aae2398	Make net.inet.ip.sourceroute, net.inet.ip.accept_sourceroute, and net.inet.ip.process_options vnet-aware. Revert changes in r271545. Suggested by: bz	2014-09-15 07:20:40 +00:00
Hans Petter Selasky	72f3100047	Revert r271504. A new patch to solve this issue will be made. Suggested by: adrian @	2014-09-13 20:52:01 +00:00
Hans Petter Selasky	eb93b77ae4	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware are limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot be loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. MFC after: 1 week Sponsored by: Mellanox Technologies	2014-09-13 08:26:09 +00:00
Alan Somers	4f8585e021	Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and ifa_ifwithdstaddr. For the sake of backwards compatibility, the new arguments were added to new functions named ifa_ifwithnet_fib and ifa_ifwithdstaddr_fib, while the old functions became wrappers around the new ones that passed RT_ALL_FIBS for the fib argument. However, the backwards compatibility is not desired for FreeBSD 11, because there are numerous other incompatible changes to the ifnet(9) API. We therefore decided to remove it from head but leave it in place for stable/9 and stable/10. In addition, this commit adds the fib argument to ifa_ifwithbroadaddr for consistency's sake. sys/sys/param.h Increment __FreeBSD_version sys/net/if.c sys/net/if_var.h sys/net/route.c Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute. sys/net/route.c sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_options.c sys/netinet/ip_output.c sys/netinet6/nd6.c Fixup calls of modified functions. share/man/man9/ifnet.9 Document changed API. CR: https://reviews.freebsd.org/D458 MFC after: Never Sponsored by: Spectra Logic	2014-09-11 20:21:03 +00:00
Andrey V. Elsukov	028bdf289d	Add scope zone id to the in_endpoints and hc_metrics structures. A non-global IPv6 address can be used in more than one zone of the same scope. This zone index is used to identify to which zone a non-global address belongs. Also we can have many foreign hosts with equal non-global addresses, but from different zones. So, they can have different metrics in the host cache. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-10 16:26:18 +00:00
Andrey V. Elsukov	a7e201bbac	Make in6_pcblookup_hash_locked and in6_pcbladdr static. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-10 13:17:35 +00:00
Andrey V. Elsukov	1b44e5ffe3	Introduce INP6_PCBHASHKEY macro. Replace usage of hardcoded part of IPv6 address as hash key in all places. Obtained from: Yandex LLC	2014-09-10 12:35:42 +00:00
Adrian Chadd	8ad1a83b48	Calculate the RSS hash for outbound UDPv4 frames. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan	2014-09-09 04:19:36 +00:00
Adrian Chadd	b8bc95cd49	Update the IPv4 input path to handle reassembled frames and incoming frames with no RSS hash. When doing RSS: * Create a new IPv4 netisr which expects the frames to have been verified; it just directly dispatches to the IPv4 input path. * Once IPv4 reassembly is done, re-calculate the RSS hash with the new IP and L3 header; then reinject it as appropriate. * Update the IPv4 netisr to be a CPU affinity netisr with the RSS hash function (rss_soft_m2cpuid) - this will do a software hash if the hardware doesn't provide one. NICs that don't implement hardware RSS hashing will now benefit from RSS distribution - it'll inject into the correct destination netisr. Note: the netisr distribution doesn't work out of the box - netisr doesn't query RSS for how many CPUs and the affinity setup. Yes, netisr likely shouldn't really be doing CPU stuff anymore and should be "some kind of 'thing' that is a workqueue that may or may not have any CPU affinity"; that's for a later commit. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan	2014-09-09 04:18:20 +00:00
Adrian Chadd	72d33245f5	Implement IPv4 RSS software hash functions to use during packet ingress and egress. * rss_mbuf_software_hash_v4 - look at the IPv4 mbuf to fetch the IPv4 details + direction to calculate a hash. * rss_proto_software_hash_v4 - hash the given source/destination IPv4 address, port and direction. * rss_soft_m2cpuid - map the given mbuf to an RSS CPU ("bucket" for now) These functions are intended to be used by the stack to support the following: * Not all NICs do RSS hashing, so we should support some way of doing a hash in software; * The NIC / driver may not hash frames the way we want (eg UDP 4-tuple hashing when the stack is only doing 2-tuple hashing for UDP); so we may need to re-hash frames; * .. same with IPv4 fragments - they will need to be re-hashed after reassembly; * .. and same with things like IP tunneling and such; * The transmit path for things like UDP, RAW and ICMP don't currently have any RSS information attached to them - so they'll need an RSS calculation performed before transmit. TODO: * Counters! Everywhere! * Add a debug mode that software hashes received frames and compares them to the hardware hash provided by the hardware to ensure they match. The IPv6 part of this is missing - I'm going to do some re-juggling of where various parts of the RSS framework live before I add the IPv6 code (read: the IPv6 code is going to go into netinet6/in6_rss.[ch], rather than living here.) Note: This API is still fluid. Please keep that in mind. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan	2014-09-09 03:10:21 +00:00
Adrian Chadd	9d3ddf4384	Add support for receiving and setting flowtype, flowid and RSS bucket information as part of recvmsg(). This is primarily used for debugging/verification of the various processing paths in the IP, PCB and driver layers. Unfortunately the current implementation of the control message path results in a ~10% or so drop in UDP frame throughput when it's used. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan	2014-09-09 01:45:39 +00:00
Adrian Chadd	061a4b4c36	Add a flag to ip_output() - IP_NODEFAULTFLOWID - which prevents it from overriding an existing flowid/flowtype field in the outbound mbuf with the inp_flowid/inp_flowtype details. The upcoming RSS UDP support calculates a valid RSS value for outbound mbufs and since it may change per send, it doesn't cache it in the inpcb. So overriding it here would be wrong. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan	2014-09-09 00:19:02 +00:00
Alexander V. Chernikov	d6164b77f8	Make ipfw_nat module use IP_FW3 codes. Kernel changes: * Split kernel/userland nat structures eliminating IPFW_INTERNAL hack. * Add IP_FW_NAT44_* codes resembling old ones. * Assume that instances can be named (no kernel support currently). * Use both UH+WLOCK locks for all configuration changes. * Provide full ABI support for old sockopts. Userland changes: * Use IP_FW_NAT44_* codes for nat operations. * Remove undocumented ability to show ranges of nat "log" entries.	2014-09-07 18:30:29 +00:00
Michael Tuexen	ad234e3c3d	Address warnings generated by the clang analyzer. MFC after: 1 week	2014-09-07 18:05:37 +00:00
Michael Tuexen	23602b60fb	Address further warnings reported by Patrick Laimbock when compiling in userspace. While there, improve consistency. MFC after: 1 week	2014-09-07 17:07:19 +00:00
Michael Tuexen	24aaac8d59	Use union sctp_sockstore instead of struct sockaddr_storage. This eliminates some warnings when building in userland. Thanks to Patrick Laimbock for reporting this issue. Remove also some unnecessary casts. There should be no functional change. MFC after: 1 week	2014-09-07 09:06:26 +00:00
Michael Tuexen	95e550801c	Use SYSCTL_PROC instead of SYSCTL_VNET_PROC. Suggested by: glebius@ MFC after: 1 week	2014-09-07 07:49:49 +00:00
Michael Tuexen	24110da033	Fix a leak of an address, if the address is scheduled for removal and the stack is torn down. Thanks to Peter Bostroem and Jiayang Liu from Google for reporting the issue. MFC after: 1 week	2014-09-06 20:03:24 +00:00
Michael Tuexen	f47f328dc5	Fix the handling of sysctl variables when used with VIMAGE. While there do some cleanup of the code. MFC after: 1 week	2014-09-06 19:12:14 +00:00
Alexander V. Chernikov	c9daea0b86	Sync to HEAD@r271160.	2014-09-05 13:52:39 +00:00
Gleb Smirnoff	770aa6cb25	Satisfy assertion in m_demote(). Sponsored by: Nginx, Inc.	2014-09-04 19:28:02 +00:00
John Baldwin	a7c7f2a7e2	In tcp_input(), don't acquire the pcbinfo global write lock for SYN packets targeting a listening socket. Permit to reduce TCP input processing starvation in context of high SYN load (e.g. short-lived TCP connections or SYN flood). Submitted by: Julien Charbon <jcharbon@verisign.com> Reviewed by: adrian, hiren, jhb, Mike Bentkofsky	2014-09-04 19:09:08 +00:00
Gleb Smirnoff	07e845a3f4	Fixes for tcp_respond() comment.	2014-09-04 17:05:57 +00:00
Gleb Smirnoff	ba32fcfff9	Improve r265338. When inserting mbufs into TCP reassembly queue, try to collapse adjacent pieces using m_catpkt(). In best case scenario it copies data and frees mbufs, making mbuf exhaustion attack harder. Suggested by: Jonathan Looney <jonlooney gmail.com> Security: Hardens against remote mbuf exhaustion attack. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-04 09:15:44 +00:00
Gleb Smirnoff	bf7dcda366	Clean up unused CSUM_FRAGMENT. Sponsored by: Nginx, Inc.	2014-09-03 08:30:18 +00:00
Gleb Smirnoff	c26544aa7f	Make SOCK_RAW sockets to be truly raw, not modifying received and sent packets at all. Swapping byte order on SOCK_RAW was actually a bug, an artifact from the BSD network stack, that used to convert a packet to native byte order once it is received by kernel. Other operating systems didn't follow this, and later other BSD descendants fixed this, leaving us alone with the bug. Now it is clear that we should fix the bug. In collaboration with: Olivier Cochard-Labbé <olivier cochard.me> See also: https://wiki.freebsd.org/SOCK_RAW Sponsored by: Nginx, Inc.	2014-09-01 14:04:51 +00:00
Alexander V. Chernikov	0cba2b2802	Add support for multi-field values inside ipfw tables. This is the last major change in given branch. Kernel changes: * Use 64-bytes structures to hold multi-value variables. * Use shared array to hold values from all tables (assume each table algo is capable of holding 32-byte variables). * Add some placeholders to support per-table value arrays in future. * Use simple eventhandler-style API to ease the process of adding new table items. Currently table addition may require multiple UH drops/ acquires which is quite tricky due to atomic table modification/swap support, shared array resize, etc. Deal with it by calling special notifier capable of rolling back state before actually performing swap/resize operations. Original operation then restarts itself after acquiring UH lock. * Bump all objhash users default values to at least 64 * Fix custom hashing inside objhash. Userland changes: * Add support for dumping shared value array via "vlist" internal cmd. * Some small print/fill_flags fixes to support u32 values. * valtype is now bitmask of <skipto\|pipe\|fib\|nat\|dscp\|tag\|divert\|netgraph\|limit\|ipv4\|ipv6>. New values can hold distinct values for each of these types. * Provide special "legacy" type which assumes all values are the same. * More helpers/docs following.. Some examples: 3:41 [1] zfscurr0# ipfw table mimimi create valtype skipto,limit,ipv4,ipv6 3:41 [1] zfscurr0# ipfw table mimimi info +++ table(mimimi), set(0) +++ kindex: 2, type: addr references: 0, valtype: skipto,limit,ipv4,ipv6 algorithm: addr:radix items: 0, size: 296 3:42 [1] zfscurr0# ipfw table mimimi add 10.0.0.5 3000,10,10.0.0.1,2a02:978:2::1 added: 10.0.0.5/32 3000,10,10.0.0.1,2a02:978:2::1 3:42 [1] zfscurr0# ipfw table mimimi list +++ table(mimimi), set(0) +++ 10.0.0.5/32 3000,0,10.0.0.1,2a02:978:2::1	2014-08-31 23:51:09 +00:00
Gleb Smirnoff	546451a2e5	Use macros instead of referencing struct if_data that resides in ifnet. Sponsored by: Nginx, Inc.	2014-08-31 06:30:50 +00:00
Michael Tuexen	76031b19ef	Announce SCTP support in the kern.features sysctl variables. MFC after: 3 days	2014-08-26 21:15:34 +00:00
Alexander V. Chernikov	832fd78087	Sync to HEAD@r270409.	2014-08-23 14:58:31 +00:00
Xin LI	a7f77a3950	Restore historical behavior of in_control, in which, when no matching address is found, the first usable address is returned for legacy ioctls like SIOCGIFBRDADDR, SIOCGIFDSTADDR, SIOCGIFNETMASK and SIOCGIFADDR. While there also fix a subtle issue that a caller from a jail asking for INADDR_ANY may get the first IP of the host that does not belong to the jail. Submitted by: glebius Differential Revision: https://reviews.freebsd.org/D667	2014-08-22 19:08:12 +00:00
Lawrence Stewart	8b0fe327e8	Destroy the "qdiffsample_zone" UMA zone on unload to avoid a use-after-unload panic easily triggered by running "sysctl -a" after unload. Reported and tested by: Grenville Armitage <garmitage@swin.edu.au> MFC after: 1 week	2014-08-19 02:19:53 +00:00
Alexander V. Chernikov	4bbd15771b	Make room for multi-type values in struct tentry.	2014-08-15 12:58:32 +00:00
Kevin Lo	73d76e77b6	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb	2014-08-15 02:43:02 +00:00
Alexander V. Chernikov	c21034b744	Replace "cidr" table type with "addr" type. Suggested by: luigi	2014-08-14 21:43:20 +00:00
Alexander V. Chernikov	18ad419788	* Fix displaying dynamic rules for large rulesets. * Clean up some comments.	2014-08-14 08:21:22 +00:00
Alexander V. Chernikov	1b833d535b	Sync to HEAD@r269943.	2014-08-13 16:20:41 +00:00
Michael Tuexen	f0396ad15e	Add support for the SCTP_PR_STREAM_STATUS and SCTP_PR_ASSOC_STATUS socket options. This includes managing the corresponding stat counters. Add the SCTP_DETAILED_STR_STATS kernel option to control per policy counters on every stream. The default is off and only an aggregated counter is available. This is sufficient for the RTCWeb use case. MFC after: 1 week	2014-08-13 15:50:16 +00:00
Alexander V. Chernikov	1940fa7727	Change tablearg value to be 0 (try #2). Most of the tablearg-supported opcodes do not accept 0 as valid value: O_TAG, O_TAGGED, O_PIPE, O_QUEUE, O_DIVERT, O_TEE, O_SKIPTO, O_CALLRET, O_NETGRAPH, O_NGTEE, O_NAT treat 0 as invalid input. The rest are O_SETDSCP and O_SETFIB. 'Fix' them by adding high-order bit (0x8000) set for non-tablearg values. Do translation in kernel for old clients (import_rule0 / export_rule0), teach current ipfw(8) binary to add/remove given bit. This change does not affect handling SETDSCP values, but limit O_SETFIB values to 32767 instead of 65k. Since currently we have either old (16) or new (2^32) max fibs, this should not be a big deal: we're definitely OK for former and have to add another opcode to deal with latter, regardless of tablearg value.	2014-08-12 15:51:48 +00:00
Michael Tuexen	97a0ca5b3e	Change SCTP sysctl from auth_disable to auth_enable. This is consistent with other similar sysctl variable used in SCTP.	2014-08-12 13:13:11 +00:00
Michael Tuexen	c79bec9c75	Add support for the SCTP_AUTH_SUPPORTED and SCTP_ASCONF_SUPPORTED socket options. Add also a sysctl to control the support of ASCONF. MFC after: 1 week	2014-08-12 11:30:16 +00:00
Alexander V. Chernikov	4f43138ade	* Add the ability to lock/unlock given table from changes. Example: # ipfw table si lock # ipfw table si info +++ table(si), set(0) +++ kindex: 0, type: cidr, locked valtype: number, references: 0 algorithm: cidr:radix items: 0, size: 288 # ipfw table si add 4.5.6.7 ignored: 4.5.6.7/32 0 ipfw: Adding record failed: table is locked # ipfw table si unlock # ipfw table si add 4.5.6.7 added: 4.5.6.7/32 0 # ipfw table si lock # ipfw table si delete 4.5.6.7 ignored: 4.5.6.7/32 0 ipfw: Deleting record failed: table is locked # ipfw table si unlock # ipfw table si delete 4.5.6.7 deleted: 4.5.6.7/32 0	2014-08-11 18:09:37 +00:00
Alexander V. Chernikov	3a845e1076	* Add support for batched add/delete for ipfw tables * Add support for atomic batch add (all or none). * Fix panic on deleting non-existing entry in radix algo. Examples: # si is empty # ipfw table si add 1.1.1.1/32 1111 2.2.2.2/32 2222 added: 1.1.1.1/32 1111 added: 2.2.2.2/32 2222 # ipfw table si add 2.2.2.2/32 2200 4.4.4.4/32 4444 exists: 2.2.2.2/32 2200 added: 4.4.4.4/32 4444 ipfw: Adding record failed: record already exists ^^^^^ Returns error but keeps inserted items # ipfw table si list +++ table(si), set(0) +++ 1.1.1.1/32 1111 2.2.2.2/32 2222 4.4.4.4/32 4444 # ipfw table si atomic add 3.3.3.3/32 3333 4.4.4.4/32 4400 5.5.5.5/32 5555 added(reverted): 3.3.3.3/32 3333 exists: 4.4.4.4/32 4400 ignored: 5.5.5.5/32 5555 ipfw: Adding record failed: record already exists ^^^^^ Returns error and reverts added records # ipfw table si list +++ table(si), set(0) +++ 1.1.1.1/32 1111 2.2.2.2/32 2222 4.4.4.4/32 4444	2014-08-11 17:34:25 +00:00
Hans Petter Selasky	e167cb89a2	Fix string length argument passed to "sysctl_handle_string()" so that the complete string is returned by the function and not just only one byte. PR: 192544 MFC after: 2 weeks	2014-08-10 07:51:55 +00:00
Hiren Panchasara	f7469d3e52	Improve comments by listing the criteria for automatic increment of receive socket buffer. Reviewed by: jmg	2014-08-09 21:01:24 +00:00
Michael Tuexen	82eaf95e8d	Small modification of the sctp_input() cleanup to avoid having code between declarations.	2014-08-09 14:33:44 +00:00
Konstantin Belousov	1216eb3320	Fix one more compiler warning, m is not initialized.	2014-08-08 15:50:02 +00:00
Alexander V. Chernikov	8bd1921248	Partially revert previous commit: "0" value is perfectly valid for O_SETFIB and O_SETDSCP, so tablearg remains to be 65535 for now.	2014-08-08 15:33:26 +00:00
Alexander V. Chernikov	2c452b20dd	* Switch tablearg value from 65535 to 0. * Use u16 table kidx instead of integer one for iface opcode. * Provide compatibility layer for old clients.	2014-08-08 14:23:20 +00:00
Alexander V. Chernikov	adf3b2b9d8	* Add IP_FW_TABLE_XMODIFY opcode * Since there seems to be lack of consensus on strict value typing, remove non-default value types. Use userland-only "value format type" to print values. Kernel changes: * Add IP_FW_XMODIFY to permit table run-time modifications. Currently we support changing limit and value format type. Userland changes: * Support IP_FW_XMODIFY opcode. * Support specifying value format type (ftype) in table create/modify req * Fine-print value type/value format type.	2014-08-08 09:27:49 +00:00
Bjoern A. Zeeb	eb5eb08820	Fix argument to KTR after r269699 to unbreak LINT builds.	2014-08-08 09:17:02 +00:00
Alexander V. Chernikov	28ea4fa355	Remove IP_FW_TABLES_XGETSIZE opcode. It is superseded by IP_FW_TABLES_XLIST.	2014-08-08 06:36:26 +00:00
Kevin Lo	8f5a8818f5	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
Alexander V. Chernikov	a73d728d31	Kernel changes: * Implement proper checks for switching between global and set-aware tables * Split IP_FW_DEL mess into the following opcodes: * IP_FW_XDEL (del rules matching pattern) * IP_FW_XMOVE (move rules matching pattern to another set) * IP_FW_SET_SWAP (swap between 2 sets) * IP_FW_SET_MOVE (move one set to another one) * IP_FW_SET_ENABLE (enable/disable sets) * Add IP_FW_XZERO / IP_FW_XRESETLOG to finish IP_FW3 migration. * Use unified ipfw_range_tlv as range description for all of the above. * Check dynamic states IFF there was non-zero number of deleted dyn rules, * Del relevant dynamic states with single traversal instead of per-rule one. Userland changes: * Switch ipfw(8) to use new opcodes.	2014-08-07 21:37:31 +00:00
Michael Tuexen	317e00ef86	Add support for the SCTP_RECONFIG_SUPPORTED and the corresponding sysctl controlling the negotiation of the RE-CONFIG extension. MFC after: 3 days	2014-08-04 20:07:35 +00:00
Hiren Panchasara	76504ce978	Add a comment for easier code understanding.	2014-08-04 19:42:48 +00:00
Alexander V. Chernikov	46d5200874	Implement atomic ipfw table swap. Kernel changes: * Add opcode IP_FW_TABLE_XSWAP * Add support for swapping 2 tables with the same type/ftype/vtype. * Make skipto cache init after ipfw locks init. Userland changes: * Add "table X swap Y" command.	2014-08-03 21:37:12 +00:00
Michael Tuexen	cb9b8e6f7d	Add support for the SCTP_PKTDROP_SUPPORTED socket option and the corresponding sysctl variable. The default is off, since the specification is not an RFC yet. MFC after: 1 week	2014-08-03 18:12:55 +00:00
Michael Tuexen	2fdf7a7a35	Use consistent names for SCTP sysctls. Rename nr_sack_on_off to nrsack_enable. Please note that this extension is off by default since it is not specified in an RFC (yet).	2014-08-03 15:09:13 +00:00
Michael Tuexen	caea98793f	Add SCTP socket option SCTP_NRSACK_SUPPORTED to control the NRSACK extension. The default will still be off, since it is not an RFC (yet). Changing the sysctl name will be in a separate commit. MFC after: 1 week	2014-08-03 14:10:10 +00:00
Alexander V. Chernikov	5f379342d2	Show algorithm-specific data in "table info" output.	2014-08-03 12:19:45 +00:00
Michael Tuexen	dd973b0e15	Add support for the SCTP_PR_SUPPORTED socket option as specified in http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-prpolicies Add also a sysctl controlling the default of the end-points. MFC after: 1 week	2014-08-02 21:36:40 +00:00
Michael Tuexen	59a86c85bb	Fix a copy and paste error. X-MFC with: 269436	2014-08-02 20:37:02 +00:00
Michael Tuexen	f342355a0e	Cleanup the ECN configuration handling and provide an SCTP socket option for controlling ECN on future associations and get the status on current associations. A similar pattern will be used for controlling SCTP extensions in upcoming commits.	2014-08-02 17:35:13 +00:00
Michael Tuexen	47aac6fa4b	Remove the asconf_auth_nochk sysctl. This was off by default and only existed to be able to test with non-compliant peers a long time ago.	2014-08-01 20:49:27 +00:00
Peter Grehan	07b4e38313	Fix byte ordering in default RSS key. The rss_key[] array in netinet/in_rss.c has the bytes in incorrect order. This results in the RSS test vectors in the Microsoft RSS spec and Intel NIC specs giving incorrect results, and making it difficult to verify correct hash operation when RSS functionality is added to new NICs. CR: https://phabric.freebsd.org/D516 Reviewed by: adrian	2014-08-01 18:36:40 +00:00
Alexander V. Chernikov	4c0c07a552	* Permit limiting number of items in table. Kernel changes: * Add TEI_FLAGS_DONTADD entry flag to indicate that insert is not possible * Support given flag in all algorithms * Add "limit" field to ipfw_xtable_info * Add actual limiting code into add_table_entry() Userland changes: * Add "limit" option as "create" table sub-option. Limit modification is currently impossible. * Print human-readable errors in table entry addition/deletion code.	2014-08-01 15:17:46 +00:00
Michael Tuexen	ce11b8429b	Cleanup sctp_send_initiate() and sctp_send_initiate_ack() to be in sync as much as possible. This simplifies upcoming changes.	2014-08-01 12:42:37 +00:00
Alexander V. Chernikov	914bffb6ab	* Add new "flow" table type to support N=1..5-tuple lookups * Add "flow:hash" algorithm Kernel changes: * Add O_IP_FLOW_LOOKUP opcode to support "flow" lookups * Add IPFW_TABLE_FLOW table type * Add "struct tflow_entry" as storage for 6-tuple flows * Add "flow:hash" algorithm. Basically it is auto-growing chained hash table. Additionally, we store mask of fields we need to compare in each instance. * Increase ipfw_obj_tentry size by adding struct tflow_entry * Add per-algorithm stat (ipfw_ta_tinfo) to ipfw_xtable_info * Increase algoname length: 32 -> 64 (algo options passed there as string) * Assume every table type can be customized by flags, use u8 to store "tflags" field. * Simplify ipfw_find_table_entry() by providing @tentry directly to algo callback. * Fix bug in cidr:chash resize procedure. Userland changes: * add "flow table(NAME)" syntax to support n-tuple checking tables. * make fill_flags() separate function to ease working with _s_x arrays * change "table info" output to reflect longer "type" fields Syntax: ipfw table fl2 create type flow:[src-ip][,proto][,src-port][,dst-ip][dst-port] [algo flow:hash] Examples: 0:02 [2] zfscurr0# ipfw table fl2 create type flow:src-ip,proto,dst-port algo flow:hash 0:02 [2] zfscurr0# ipfw table fl2 info +++ table(fl2), set(0) +++ kindex: 0, type: flow:src-ip,proto,dst-port valtype: number, references: 0 algorithm: flow:hash items: 0, size: 280 0:02 [2] zfscurr0# ipfw table fl2 add 2a02:6b8::333,tcp,443 45000 0:02 [2] zfscurr0# ipfw table fl2 add 10.0.0.92,tcp,80 22000 0:02 [2] zfscurr0# ipfw table fl2 list +++ table(fl2), set(0) +++ 2a02:6b8::333,6,443 45000 10.0.0.92,6,80 22000 0:02 [2] zfscurr0# ipfw add 200 count tcp from me to 78.46.89.105 80 flow 'table(fl2)' 00200 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2) 0:03 [2] zfscurr0# ipfw show 00200 0 0 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2) 65535 617 59416 allow ip from any to any 0:03 [2] zfscurr0# telnet -s 10.0.0.92 78.46.89.105 80 Trying 78.46.89.105... .. 0:04 [2] zfscurr0# ipfw show 00200 5 272 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2) 65535 682 66733 allow ip from any to any	2014-07-31 20:08:19 +00:00
Steven Hartland	5af464bbe0	Ensure that IPs added to CARP always use the CARP MAC. Previously there was a race condition between the address addition and associating it with the CARP which resulted in the interface MAC, instead of the CARP MAC, being used for a brief amount of time. This caused "is using my IP address" warnings as well as data being sent to the wrong machine due to incorrect ARP entries being recorded by other devices on the network.	2014-07-31 16:43:56 +00:00
Steven Hartland	d34165f759	Only check error if one could have been generated	2014-07-31 09:18:29 +00:00
Alexander V. Chernikov	b23d5de9b6	* Add number:array algorithm lookup method. Kernel changes: * s/IPFW_TABLE_U32/IPFW_TABLE_NUMBER/ * Force "lookup <port\|uid\|gid\|jid>" to be IPFW_TABLE_NUMBER * Support "lookup" method for number tables * Add number:array algorithm (i32 as key, auto-growing). Userland changes: * Support named tables in "lookup <tag> Table" * Fix handling of "table(NAME,val)" case * Support printing "number" table data.	2014-07-30 14:52:26 +00:00
Hiren Panchasara	39c8c62ec4	Add a comment and while there, fix trailing whitespace.	2014-07-29 23:42:51 +00:00
Alexander V. Chernikov	9d099b4f38	* Dump available table algorithms via "ipfw talist" cmd. Kernel changes: * Add type/refcount fields to table algo instances. * Add IP_FW_TABLES_ALIST opcode to export available algorithms to userland. Userland changes: * Fix cores on empty input inside "ipfw table" handler. * Add "ipfw talist" cmd to print available kernel algorithms. * Change "table info" output to reflect long algorithm config lines.	2014-07-29 22:44:26 +00:00
Gleb Smirnoff	9753faf553	Garbage collect couple of unused fields from struct ifaddr: - ifa_claim_addr() unused since removal of NetAtalk - ifa_metric seems to be never utilized, always a copy of if_metric	2014-07-29 15:01:29 +00:00
Alexander V. Chernikov	68394ec88e	* Add generic ipfw interface tracking API * Rewrite interface tables to use interface indexes Kernel changes: * Add generic interface tracking API: - ipfw_iface_ref (must call unlocked, performs lazy init if needed, allocates state & bumps ref) - ipfw_iface_add_ntfy(UH_WLOCK+WLOCK, links consumer & runs its callback to update ifindex) - ipfw_iface_del_ntfy(UH_WLOCK+WLOCK, unlinks consumer) - ipfw_iface_unref(unlocked, drops reference) Additionally, consumer callbacks are called in interface withdrawal/departure. * Rewrite interface tables to use iface tracking API. Currently tables are implemented the following way: runtime data is stored as sorted array of {ifidx, val} for existing interfaces full data is stored inside namedobj instance (chained hashed table). * Add IP_FW_XIFLIST opcode to dump status of tracked interfaces * Pass @chain ptr to most non-locked algorithm callbacks: (prepare_add, prepare_del, flush_entry ..). This may be needed for better interaction of given algorithm and other ipfw subsystems * Add optional "change_ti" algorithm handler to permit updating of cached table_info pointer (happens in case of table_max resize) * Fix small bug in ipfw_list_tables() * Add badd (insert into sorted array) and bdel (remove from sorted array) funcs Userland changes: * Add "iflist" cmd to print status of currently tracked interface * Add stringnum_cmp for better interface/table names sorting	2014-07-28 19:01:25 +00:00
Marcel Moolenaar	1e0a021e3d	The accept filter code is not specific to the FreeBSD IPv4 network stack, so it really should not be under "optional inet". The fact that uipc_accf.c lives under kern/ lends some weight to making it a "standard" file. Moving kern/uipc_accf.c from "optional inet" to "standard" eliminates the need for #ifdef INET in kern/uipc_socket.c. Also, this meant the net.inet.accf.unloadable sysctl needed to move, as net.inet does not exist without networking compiled in (as it lives in netinet/in_proto.c.) The new sysctl has been named net.accf.unloadable. In order to support existing accept filter sysctls, the net.inet.accf node has been added to netinet/in_proto.c. Submitted by: Steve Kiernan <stevek@juniper.net> Obtained from: Juniper Networks, Inc.	2014-07-26 19:27:34 +00:00
Michael Tuexen	56711f9433	Initialize notification structure. This was missed in an earlier commit. MFC after: 3 days	2014-07-24 18:06:18 +00:00
Hiroki Sato	9be09a6e43	Fix EtherIP. TOS field must be initialized when the inner protocol is PF_LINK, and multicast/broadcast flag should always be dropped because the outer protocol uses unicast even when the inner address is not for unicast. It had been broken since r236951 when gif_output() started to use IFQ_HANDOFF().	2014-07-24 10:42:47 +00:00
Michael Tuexen	e710ed26a3	Cleanup the definition of two structures which are exposed to userland. Therefore no MFC.	2014-07-22 19:54:22 +00:00
Adrian Chadd	58ef629f00	Make the PCBGROUPS code aware of IPv4 UDP 4-tuple.	2014-07-20 07:38:38 +00:00
Adrian Chadd	9870806c93	Add hash awareness of the IPv4 and IPv6 UDP 4-tuple. Note: it would be nice if the supported hash check would be used here!	2014-07-20 07:37:47 +00:00
Adrian Chadd	40c753e3da	Implement rss_gethashconfig() - return the currently supported hash methods by the stack. Right now the stack isn't really setup for RSS with 4-tuple UDP hashing for either IPv4 or IPv6. The specifics: * The UDP init path udp_init() and udplite_init() specify the hash as 2-tuple, so the PCBGROUPS code only tries a 2-tuple check; * The PCBGROUPS and RSS code doesn't know about the UDP hash types just yet, so they're never treated as valid hashes. * For correctness, 4-tuple can't be enabled in the general case because UDP datagrams can be more fragmented than IP datagrams may be. Strictly speaking, TCP datagrams may also be fragmented and this could cause issues with PCBGROUPS/RSS until the IP defragment path grows some code to re-calculate the RSS hash. I'll follow this commit up with awareness of the UDP 4-tuple for those who wish to configure it, but for now it'll stay disabled. No drivers (yet) know to use this function when RSS is enabled.	2014-07-20 07:36:59 +00:00
Adrian Chadd	85415b47c8	Update the comment to be more concise.	2014-07-20 07:31:55 +00:00
Adrian Chadd	5f15473b37	Update the default RSS hash to the Chelsio T5 firmware one - it provides markedly better distribution of IPv6 address/ports than the previous key. The previous key would hash large swaths of the port space for a given source/destination IP address to the same low handful of bits, effectively mapping them to the same queue. This made testing very .. special.	2014-07-18 08:22:13 +00:00
Adrian Chadd	8496de3825	Oops - somehow I missed the IP option numbers clashing with the multicast numbers below. Move them to a new set of non-clashing numbers.	2014-07-17 05:45:54 +00:00
Adrian Chadd	e989b65f79	Add RSS hashing awareness for IPv6 and TCP IPv6 hash types.	2014-07-12 05:43:43 +00:00
Adrian Chadd	d5bb8bd315	Expose in_pcbbind_check_bindmulti() so the upcoming IPv6 RSS changes can be made to use it.	2014-07-12 05:40:13 +00:00
Michael Tuexen	0c8682e8ad	Whitespace changes. MFC after: 1 week	2014-07-11 21:15:40 +00:00
Michael Tuexen	f64a0b069a	Bugfix: When a remote address was added to an endpoint, a source address was selected and cached, but it was not stored that it was cached. This resulted in selecting different source addresses for the INIT-ACK and COOKIE-ACK when possible. Thanks to Niu Zhixiong for reporting the issue. MFC after: 1 week	2014-07-11 17:31:40 +00:00
Gleb Smirnoff	fcc34a238c	Fix style bug: rename the refcount field of m_ext to ext_cnt, to match other members. Sponsored by: Nginx, Inc.	2014-07-11 14:34:29 +00:00
Michael Tuexen	4474d71a7b	Integrate upstream changes. MFC after: 1 week	2014-07-11 06:52:48 +00:00
Adrian Chadd	0a100a6f1e	Implement the first stage of multi-bind listen sockets and RSS socket awareness. * Introduce IP_BINDMULTI - indicating that it's okay to bind multiple sockets on the same bind details. Although the PCB code has been taught about this (see below) this patch doesn't introduce the rest of the PCB changes necessary to distribute lookups among multiple PCB entries in the global wildcard table. * Introduce IP_RSS_LISTEN_BUCKET - placing a listen socket into the given RSS bucket (and thus a single PCBGROUP hash.) * Modify the PCB add path to be aware of IP_BINDMULTI: + Only allow further PCB entries to be added if the owner credentials and IP_BINDMULTI has been specified. Ie, only allow further IP_BINDMULTI sockets to appear if the first bind() was IP_BINDMULTI. * Teach the PCBGROUP code about IP_RSS_LISTEN_BUCKET marked PCB entries. Instead of using the wildcard logic and hashing, these sockets are simply placed into the PCBGROUP and _not_ in the wildcard hash. * When doing a PCBGROUP lookup, also do a wildcard match. This allows for an RSS bucket PCB entry to appear in a PCBGROUP rather than having to exist in the wildcard list. Tested: * TCP IPv4 server testing with igb(4) * TCP IPv4 server testing with ix(4) TODO: * The pcbgroup lookup code duplicated the wildcard and wildcard-PCB logic. This could be refactored into a single function. * This doesn't yet work for IPv6 (The PCBGROUP code in netinet6/ doesn't yet know about this); nor does it yet fully work for UDP.	2014-07-10 03:10:56 +00:00
Gleb Smirnoff	fe82cbe85c	In several cases in ip_output() we obtain reference on ifa. Do not leak it. Together with: asomers, np Sponsored by: Nginx, Inc.	2014-07-09 07:48:05 +00:00
Alexander V. Chernikov	7e767c791f	* Use different rule structures in kernel/userland. * Switch kernel to use per-cpu counters for rules. * Keep ABI/API. Kernel changes: * Each rule is now exported as TLV with optional extensible counter block (ip_fw_bcounter for base one) and ip_fw_rule for rule&cmd data. * Counters need to be explicitly requested by IPFW_CFG_GET_COUNTERS flag. * Separate counters from rules in kernel and clean up ip_fw a bit. * Pack each rule in IPFW_TLV_RULE_ENT tlv to ease parsing. * Introduce versioning in container TLV (may be needed in future). * Fix ipfw_cfg_lheader broken u64 alignment. Userland changes: * Use set_mask from cfg header when requesting config * Fix incorrect read accounting in ipfw_show_config() * Use IPFW_RULE_NOOPT flag instead of playing with _pad * Fix "ipfw -d list": do not print counters for dynamic states * Some small fixes	2014-07-08 23:11:15 +00:00
Xin LI	e432298ade	Initialize SCTP cmsg's and notification's buffer before copying out to userland. Submitted by: tuexen Security: CVE-2014-3953 Security: FreeBSD-SA-14:17.kmem	2014-07-08 21:54:27 +00:00
Alexander V. Chernikov	6447bae661	* Prepare to pass other dynamic states via ipfw_dump_config() Kernel changes: * Change dump format for dynamic states: each state is now stored inside ipfw_obj_dyntlv last dynamic state is indicated by IPFW_DF_LAST flag * Do not perform sooptcopyout() for !SOPT_GET requests. Userland changes: * Introduce foreach_state() function handler to ease work with different states passed by ipfw_dump_config().	2014-07-06 23:26:34 +00:00
Alexander V. Chernikov	81d3153d61	* Add "lookup" table functionality to permit userland entry lookups. * Bump table dump format preserving old ABI. Kernel side: * Add IP_FW_TABLE_XFIND to handle "lookup" request from userland. * Add ta_find_tentry() algorithm callbacks/handlers to support lookups. * Fully switch to ipfw_obj_tentry for various table dumps: algorithms are now required to support the latest (ipfw_obj_tentry) entry dump format, the rest is handled by generic dump code. IP_FW_TABLE_XLIST opcode version bumped (0 -> 1). * Eliminate legacy ta_dump_entry algo handler: dump_table_entry() converts data from current to legacy format. Userland side: * Add "lookup" table parameter. * Change the way table type is guessed: call table_get_info() first, and check value for IPv4/IPv6 type IFF table does not exist. * Fix table_get_list(): do more tries if supplied buffer is not enough. * Separate table_show_entry() from table_show_list().	2014-07-06 18:16:04 +00:00
Hiren Panchasara	43630e625a	Fix a typo.	2014-07-03 23:12:43 +00:00
Alexander V. Chernikov	ac35ff1784	Fully switch to named tables: Kernel changes: * Introduce ipfw_obj_tentry table entry structure to force u64 alignment. * Support "update-on-existing-key" "add" behavior (TEI_FLAGS_UPDATED). * Use "subtype" field to distinguish between IPv4 and IPv6 table records instead of previous hack. * Add value type (vtype) field for kernel tables. Current types are number, ip and dscp * Fix sets mask retrieval for old binaries * Fix crash while using interface tables Userland changes: * Switch ipfw_table_handler() to use named-only tables. * Add "table NAME create [type {cidr\|iface\|u32} [valtype {number\|ip\|dscp}] ..." * Switch ipfw_table_handler to match_token()-based parser. * Switch ipfw_sets_handler to use new ipfw_get_config() for mask retrieval. * Allow ipfw set X table ... syntax to permit using per-set table namespaces.	2014-07-03 22:25:59 +00:00
Hiren Panchasara	cc412412db		2014-07-02 22:04:14 +00:00
Adrian Chadd	81a99d38e9	Remove old reference to IP_RSSCPUID. Submitted by: Eggert, Lars <lars@netapp.com>	2014-07-01 17:27:48 +00:00
Adrian Chadd	8f7e75cbbd	If we're doing RSS then ensure the TCP timer selection uses the multi-CPU callwheel setup, rather than just dumping all the timers on swi0.	2014-06-30 04:26:29 +00:00
Alexander V. Chernikov	6c2997ffec	* Add new IP_FW_XADD opcode which permits to a) specify table ids as names b) add multiple rules at once. Partially convert current code for atomic addition of multiple rules.	2014-06-29 22:35:47 +00:00
Alexander V. Chernikov	563b5ab132	Support showing named tables in ipfw(8) rule listing. Kernel changes: * change base TLV header to be u64 (so size can be u32). * Introduce ipfw_obj_ctlv generic container TLV. * Add IP_FW_XGET opcode which is now used for atomic configuration retrieval. One can specify needed configuration pieces to retrieve via flags field. Currently supported are IPFW_CFG_GET_STATIC (static rules) and IPFW_CFG_GET_STATES (dynamic states). Other configuration pieces (tables, pipes, etc..) support is planned. Userland changes: * Switch ipfw(8) to use new IP_FW_XGET for rule listing. * Split rule listing code into get and show pieces. * Make several steps forward towards libipfw: permit printing states and rules (partially) to supplied buffer. do not die on malloc/kernel failure inside given printing functions. stop assuming cmdline_opts is global symbol.	2014-06-28 23:20:24 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Adrian Chadd	dc847eb656	Add missing variable declarations when using RSS. Reported by: bryanv@	2014-06-27 19:07:00 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Adrian Chadd	7847796a93	Retire IP_RSSCPUID ; the right thing to do is query the RSS bucket; map the bucket to an RSS queue, then map the queue to a CPU ID. This way the bucket->queue and queue->CPU mapping can change over time. Introduce IP_RSSBUCKETID - which instead looks up the RSS bucket. User applications can then map the RSS bucket to a CPU.	2014-06-26 04:12:41 +00:00
Adrian Chadd	a6c88ec4fb	Add another RSS method to query the indirection table entries. There's 128 indirection table entries which correspond to the low 7 bits of the 32 bit RSS hash. Each value will correspond to an RSS bucket. (Then each RSS bucket currently will map to a CPU.) This is a more explicit way of figuring out which RSS bucket is in each RSS indirection slot. It can be inferred by the other methods but I'd rather drivers use something more simplified and explicit.	2014-06-26 02:49:51 +00:00
Michael Tuexen	2f4c57fbe9	Fix a bug which incorrectly allowed two listening SCTP sockets on the same port bound to the wildcard address. MFC after: 3 days	2014-06-20 20:17:39 +00:00
Michael Tuexen	8a794ba826	Fix a bug in the setsockopt()-handling of the SCTP specific option SCTP_PEER_ADDR_THLDS: Use the provided address as intended. MFC after: 3 days	2014-06-20 17:45:00 +00:00
Michael Tuexen	6ba22f19ca	Honor jails for unbound SCTP sockets when selecting source addresses, reporting IP-addresses to the peer during the handshake, adding addresses to the host, reporting the addresses via the sysctl interface (used by netstat, for example) and reporting the addresses to the application via socket options. This issue was reported by Bernd Walter. MFC after: 3 days	2014-06-20 13:26:49 +00:00
Alexander V. Chernikov	9490a62716	* Add IP_FW_TABLE_XCREATE / IP_FW_TABLE_XMODIFY opcodes. * Add 'algoname' string to ipfw_xtable_info, permitting the lookup algorithm to be specified with parameters. * Rework part of ipfw_rewrite_table_uidx(). Sponsored by: Yandex LLC	2014-06-16 13:05:07 +00:00
Alexander V. Chernikov	d3a4f9249c	Simplify opcode handling. * Use one u16 from op3 header to implement opcode versioning. * IP_FW_TABLE_XLIST has now 2 handlers, for ver.0 (old) and ver.1 (current). * Every getsockopt request is now handled in ip_fw_table.c * Rename new opcodes: IP_FW_OBJ_DEL -> IP_FW_TABLE_XDESTROY IP_FW_OBJ_LISTSIZE -> IP_FW_TABLES_XGETSIZE IP_FW_OBJ_LIST -> IP_FW_TABLES_XLIST IP_FW_OBJ_INFO -> IP_FW_TABLE_XINFO IP_FW_OBJ_INFO -> IP_FW_TABLE_XFLUSH * Add some docs about using given opcodes. * Group some legacy opcode/handlers.	2014-06-15 13:40:27 +00:00
Alexander V. Chernikov	f1220db8d7	Move further to eliminate next pieces of number-assuming code inside tables. Kernel changes: * Add IP_FW_OBJ_FLUSH opcode (flush table based on its name/set) * Add IP_FW_OBJ_DUMP opcode (dumps table data based on its names/set) * Add IP_FW_OBJ_LISTSIZE / IP_FW_OBJ_LIST opcodes (get list of kernel tables) Userland changes: * move tables code to separate tables.c file * get rid of tables_max * switch "all"/list handling to new opcodes	2014-06-14 22:47:25 +00:00
Alexander V. Chernikov	9f7d47b025	Add API to ease adding new algorithms/new tabletypes to ipfw. Kernel-side changelog: * Split general tables code and algorithm-specific table data. Current algorithms (IPv4/IPv6 radix and interface tables radix) moved to new ip_fw_table_algo.c file. Tables code now supports any algorithm implementing the following callbacks: +struct table_algo { + char name[64]; + int idx; + ta_init init; + ta_destroy destroy; + table_lookup_t lookup; + ta_prepare_add prepare_add; + ta_prepare_del prepare_del; + ta_add add; + ta_del del; + ta_flush_entry flush_entry; + ta_foreach foreach; + ta_dump_entry dump_entry; + ta_dump_xentry dump_xentry; +}; Change ->state, ->xstate, ->tabletype fields of ip_fw_chain to ->tablestate pointer (array of 32-byte structures necessary for runtime lookups; can probably be shrunk to 16 bytes later): +struct table_info { + table_lookup_t lookup; /* Lookup function */ + void *state; /* Lookup radix/other structure */ + void *xstate; /* eXtended state */ + u_long data; /* Hints for given func */ +}; Add count method for namedobj instance to ease size calculations. * Bump ip_fw3 buffer in ipfw_ctl 128->256 bytes. * Improve bitmask resizing on tables_max change. * Remove table numbers checking from most places. * Fix wrong nesting in ipfw_rewrite_table_uidx(). * Add IP_FW_OBJ_LIST opcode (list all objects of given type, currently implemented for IPFW_OBJTYPE_TABLE). * Add IP_FW_OBJ_LISTSIZE (get buffer size to hold IP_FW_OBJ_LIST data, currently implemented for IPFW_OBJTYPE_TABLE). * Add IP_FW_OBJ_INFO (requests info for one object of given type). Some name changes: s/ipfw_xtable_tlv/ipfw_obj_tlv/ (no table specifics) s/ipfw_xtable_ntlv/ipfw_obj_ntlv/ (no table specifics) Userland changes: * Add do_set3() cmd to ipfw2 to ease dealing with op3-embedded opcodes. * Add/improve support for destroy/info cmds.	2014-06-14 10:58:39 +00:00
Alexander V. Chernikov	b074b7bbce	Make ipfw tables use names as user-level identifier internally: * Add namedobject set-aware api capable of searching/allocating objects by their name/idx. * Switch tables code to use string ids for configuration tasks. * Change locking model: most configuration changes are protected with UH lock, runtime-visible are protected with both locks. * Reduce number of arguments passed to ipfw_table_add/del by using separate structure. * Add internal V_fw_tables_sets tunable (set to 0) to prepare for set-aware tables (requires opcodes/client support) * Implement typed table referencing (and tables are implicitly allocated with all state like radix ptrs on reference) * Add "destroy" ipfw(8) using new IP_FW_DELOBJ opcode Namedobj more detailed: * Blackbox api providing methods to add/del/search/enumerate objects * Statically-sized hashes for names/indexes * Per-set bitmask to indicate free indexes * Separate methods for index alloc/delete/resize Basically, there should not be any user-visible changes except the following: * reducing table_max is not supported * flush & add change table type won't work if table is referenced Sponsored by: Yandex LLC	2014-06-12 09:59:11 +00:00
Michael Tuexen	dfa9c0b787	Use ENOBUFS instead of ENOMEM in error situations related to m_uiotombuf(). This was suggested by kevlo@. MFC after: 3 days	2014-06-05 12:51:12 +00:00
Kevin Lo	71c92ff80a	Fix build UDP-Lite with VIMAGE enabled when building with gcc. Reported and tested by: Jason Hellenthal	2014-06-03 01:30:32 +00:00
Hiren Panchasara	fc5e1956d9	ECN marking implementation for dummynet. Changes include both DCTCP and RFC 3168 ECN marking methodology. DCTCP draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00 Submitted by: Midori Kato (aoimidori27@gmail.com) Worked with: Lars Eggert (lars@netapp.com) Reviewed by: luigi, hiren	2014-06-01 07:28:24 +00:00
Bjoern A. Zeeb	700515aa62	While PAWS is disabled, there are no consumers for the tcp options argument to tcp_twcheck(); thus mark it __unused. MFC after: 2 weeks	2014-05-30 22:34:06 +00:00
Alan Somers	2f308a343f	Fix unintended KBI change from r264905. Add _fib versions of ifa_ifwithnet() and ifa_ifwithdstaddr() The legacy functions will call the _fib() versions with RT_ALL_FIBS, preserving legacy behavior. sys/net/if_var.h sys/net/if.c Add legacy-compatible functions as described above. Ensure legacy behavior when RT_ALL_FIBS is passed as fibnum. sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/net/route.c sys/net/rtsock.c sys/netinet6/nd6.c Call with _fib() functions if we must use a specific fib, or the legacy functions otherwise. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c Improve the udp_dontroute test. The bug that this test exercises is that ifa_ifwithnet() will return the wrong address, if multiple interfaces have addresses on the same subnet but with different fibs. The previous version of the test only considered one possible failure mode: that ifa_ifwithnet_fib() might fail to find any suitable address at all. The new version also checks whether ifa_ifwithnet_fib() finds the correct address by checking where the ARP request goes. Reported by: bz, hrs Reviewed by: hrs MFC after: 1 week X-MFC-with: 264905 Sponsored by: Spectra Logic	2014-05-29 21:03:49 +00:00
Jilles Tjoelker	ced01b33f4	netinet/in.h: Expose htonl(), htons(), ntohl() and ntohs() in strict POSIX mode. Put the htonl(), htons(), ntohl() and ntohs() declarations under __POSIX_VISIBLE >= 200112. POSIX.1-2001 and newer require these to be exposed from <netinet/in.h> (as well as <arpa/inet.h>). Note that it may be unnecessary to check __POSIX_VISIBLE >= 200112 because older versions of POSIX and the C standard do not define this header. However, other places in the same file already perform the check. PR: 188316 Submitted by: Christian Neukirchen	2014-05-29 15:23:37 +00:00
Adrian Chadd	8bde802a2b	The users of RSS shouldn't be directly concerned about hash -> CPU ID mappings. Instead, they should be first mapping to an RSS bucket and then querying the RSS bucket -> CPU ID mapping to figure out the target CPU. When (if?) RSS rebalancing is implemented or some other (non round-robin) distribution of work from buckets to CPU IDs, various bits of code - both userland and kernel - will need to know how this mapping works. So, to support this: * Add a new function rss_m2bucket() - this maps an mbuf to a given bucket. Anything which is currently doing hash -> CPU work may instead wish to do hash -> bucket, and then query the bucket->cpuid map for which CPU it belongs on. Or, map it to a bucket, then re-pin that bucket -> CPU during a rebalance operation. * For userland applications which wish to exploit affinity to RSS buckets, the bucket -> CPU ID mapping is now available via a sysctl. net.inet.rss.bucket_mapping lists the bucket to CPU ID mapping via a list of bucket:cpu pairs.	2014-05-27 08:06:20 +00:00
Bjoern A. Zeeb	3150f357ff	Remove the prototype for the static inline function tcp_signature_verify_input(). The function is defined before first use already. MFC after: 2 weeks	2014-05-24 15:31:40 +00:00
Bjoern A. Zeeb	ad494fa898	syncache_lookup() is a file local function. Make it static and take it out of the public KPI; seems it was never used elsewhere. MFC after: 2 weeks	2014-05-24 15:03:36 +00:00
Bjoern A. Zeeb	4fd2b4eb53	Make tcp_twrespond() file local private; this removes it from the public KPI; it is not used anywhere else and seems it never was. MFC after: 2 weeks	2014-05-24 14:01:18 +00:00
Bjoern A. Zeeb	5688fa661b	Remove the prototypes for things that are no longer file local but were moved to the header file. Pointy hat to: clang || bz MFC after: 2 weeks X-MFC with: r266596 Reported by: gcc build of sparc64	2014-05-23 21:12:33 +00:00
Bjoern A. Zeeb	255cd9fd58	Move the tcp_fields_to_host() and tcp_fields_to_net() (inline) functions to the tcp_var.h header file in order to avoid further duplication with upcoming commits. Reviewed by: np MFC after: 2 weeks	2014-05-23 20:15:01 +00:00
Adrian Chadd	bad008ce85	Use CPU_FIRST() / CPU_NEXT() to iterate over the valid CPU IDs.	2014-05-22 07:25:36 +00:00
Adrian Chadd	883831c675	When RSS is enabled and per cpu TCP timers are enabled, do an RSS lookup for the inp flowid/flowtype to destination CPU. This only modifies the case where RSS is enabled and the per-cpu tcp timer option is enabled. Otherwise the behaviour should be the same as before.	2014-05-18 22:39:01 +00:00
Adrian Chadd	9c42397277	* When copying the flowid from inp -> outbound mbuf, also assign the hashtype to the outbound mbuf. * Add in socket options to fetch the hashid, the hashtype and RSS CPU ID for a given socket.	2014-05-18 22:37:31 +00:00
Adrian Chadd	2f71993288	Ensure that the flowid hashtype is assigned to the inp if the flowid is also assigned.	2014-05-18 22:34:06 +00:00
Adrian Chadd	cc6c187794	Add a new function to do a CPU ID lookup based on RSS hash information. This is intended to be used by various places that wish to hash some information about a TCP/UDP/IP flow but don't necessarily have a live mbuf to do it with. Refactor rss_m2cpuid() to use the refactored function.	2014-05-18 22:32:04 +00:00
Adrian Chadd	34e3dcedec	Add the flowtype to the inpcb. The flowid isn't enough to use as part of any RSS related CPU affinity lookups - the RSS code would like to know what kind of hash it is.	2014-05-18 22:30:12 +00:00


5140 Commits
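
The RSS commits listed above (a6c88ec4fb, 7847796a93, 8bde802a2b) describe a two-step lookup: the low 7 bits of the 32-bit RSS hash select one of 128 indirection table slots, each slot names an RSS bucket, and each bucket currently maps to a CPU. The following is a minimal userland sketch of that idea, not the kernel implementation. Every identifier prefixed with sketch_ is hypothetical, the bucket limit of 16 is an arbitrary assumption, and the round-robin fill of both tables is assumed for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* From the commit text: 128 slots, indexed by the low 7 bits of the hash. */
#define SKETCH_RSS_TABLE_ENTRIES 128u
#define SKETCH_RSS_HASH_MASK     (SKETCH_RSS_TABLE_ENTRIES - 1u)
/* Hypothetical upper bound on the number of buckets (assumption). */
#define SKETCH_RSS_MAX_BUCKETS   16u

struct sketch_rss {
	uint8_t  indir[SKETCH_RSS_TABLE_ENTRIES];    /* slot -> bucket */
	int      bucket_cpu[SKETCH_RSS_MAX_BUCKETS]; /* bucket -> CPU ID */
	unsigned nbuckets;
};

/* Fill both tables round-robin (an assumption made for this sketch). */
static void
sketch_rss_init(struct sketch_rss *rss, unsigned nbuckets, unsigned ncpus)
{
	unsigned i;

	assert(nbuckets > 0 && nbuckets <= SKETCH_RSS_MAX_BUCKETS);
	assert(ncpus > 0);
	rss->nbuckets = nbuckets;
	for (i = 0; i < SKETCH_RSS_TABLE_ENTRIES; i++)
		rss->indir[i] = (uint8_t)(i % nbuckets);
	for (i = 0; i < nbuckets; i++)
		rss->bucket_cpu[i] = (int)(i % ncpus);
}

/* Step 1: hash -> indirection slot -> bucket. */
static unsigned
sketch_rss_hash2bucket(const struct sketch_rss *rss, uint32_t hash)
{
	return (rss->indir[hash & SKETCH_RSS_HASH_MASK]);
}

/* Step 2: bucket -> CPU ID; this mapping is allowed to change over time. */
static int
sketch_rss_bucket2cpu(const struct sketch_rss *rss, unsigned bucket)
{
	assert(bucket < rss->nbuckets);
	return (rss->bucket_cpu[bucket]);
}
```

A caller would resolve the bucket first and only then ask for the CPU, for example sketch_rss_bucket2cpu(&rss, sketch_rss_hash2bucket(&rss, hash)). Keeping the two steps separate is the point made in 8bde802a2b: if a bucket is later re-pinned to another CPU, code that remembers buckets rather than CPU IDs stays correct.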