freebsd-skq

Author	SHA1	Message	Date
Alexander V. Chernikov	aa5f023eaf	Unify nd6 state switching by using newly-created nd6_llinfo_setstate() function. The change is mostly mechanical with the following exception: Last piece of nd6_resolve_slow() was refactored: ND6_LLINFO_PERMANENT condition was removed as always-true, explicit ND6_LLINFO_NOSTATE -> ND6_LLINFO_INCOMPLETE state transition was removed as duplicate. Reviewed by: ae Sponsored by: Yandex LLC	2015-09-21 11:19:53 +00:00
Gleb Smirnoff	399fbd0ec0	Use proper byteswap macro. This isn't a functional change.	2015-09-17 17:27:49 +00:00
Gleb Smirnoff	db642c8e6e	In tcp_ctlinput() separate the (ip == NULL) block from the rest of the function to reduce so many levels of indentation. Style the lines that got now indentation reduced. No functional change. Checked with: md5	2015-09-16 21:42:33 +00:00
Alexander V. Chernikov	59c180c35c	Unify loopback route switching: * prepare gateway before insertion * use RTM_CHANGE instead of explicit find/change route * Remove fib argument from ifa_switch_loopback_route added in r264887: if old ifp fib differs from new one, then the caller is doing something wrong * Make ifa_*_loopback_route call single ifa_maintain_loopback_route().	2015-09-16 06:23:15 +00:00
Brad Davis	e5fe11011a	Remove redundant 'man page' Reviewed by: allanjude	2015-09-15 21:16:45 +00:00
Hiren Panchasara	550e9d4235	Remove unnecessary tcp state transition call. Differential Revision: D3451 Reviewed by: markj MFC after: 2 weeks Sponsored by: Limelight Networks	2015-09-15 20:04:30 +00:00
Alexander V. Chernikov	eec33ea052	* Improve logging invalid arp messages * Remove redundant check in ip_arpinput Suggested by: glebius MFC after: 2 weeks	2015-09-15 08:50:44 +00:00
Alexander V. Chernikov	d3cdb71655	* Require explicit lle unlink prior to calling llentry_delete(). This one slightly decreases time of holding afdata wlock. * While here, make nd6_free() return void. No one has used its return value since r186119.	2015-09-15 06:48:19 +00:00
Alexander V. Chernikov	3e7a2321e3	* Do more fine-grained locking: call eventhandlers/free_entry without holding afdata wlock * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3573	2015-09-14 16:48:19 +00:00
Alexander V. Chernikov	deb6bda6e3	* Improve error checking for arp messages. * Clean stale headers from if_ether.c. Reported by: rozhuk.im at gmail.com Reviewed by: ae MFC after: 2 weeks	2015-09-14 10:28:47 +00:00
Hans Petter Selasky	d76d40126e	Update TSO limits to include all headers. To make driver programming easier the TSO limits are changed to reflect the values used in the BUSDMA tag a network adapter driver is using. The TCP/IP network stack will subtract space for all linklevel and protocol level headers and ensure that the full mbuf chain passed to the network adapter fits within the given limits. Implementation notes: If a network adapter driver needs to fixup the first mbuf in order to support VLAN tag insertion, the size of the VLAN tag should be subtracted from the TSO limit. Else not. Network adapters which typically inline the complete header mbuf could technically transmit one more segment. This patch does not implement a mechanism to recover the last segment for data transmission. It is believed when sufficiently large mbuf clusters are used, the segment limit will not be reached and recovering the last segment will not have any effect. The current TSO algorithm tries to send MTU-sized packets, where the MTU typically is 1500 bytes, which gives 1448 bytes of TCP data payload per packet for IPv4. That means if the TSO length limitation is set to 65536 bytes, there will be a data payload remainder of (65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to recover total TSO length due to inlining mbuf header data will not have any effect, because adding or removing the ETH/IP/TCP headers to or from 324 bytes will not cause more or less TCP payload to be TSO'ed. Existing network adapter limits will be updated separately. Differential Revision: https://reviews.freebsd.org/D3458 Reviewed by: rmacklem MFC after: 2 weeks	2015-09-14 08:36:22 +00:00
George V. Neville-Neil	5d06879adb	Add DTrace probe points, translators and a corresponding script to provide the TCPDEBUG functionality with pure DTrace. Reviewed by: rwatson MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: D3530	2015-09-13 15:50:55 +00:00
Michael Tuexen	30811e70d9	Fix compilation issue introduced in r287717. Thanks to bz@ for making me aware of it. MFC after: 1 week	2015-09-12 21:23:24 +00:00
Michael Tuexen	6802b0904f	Address a compile warning. MFC after: 1 week	2015-09-12 18:00:06 +00:00
Michael Tuexen	86eda749af	Cleanup the handling of error causes for ERROR chunks. This fixes an inconsistency of the padding handling. The final padding is now considered to be a chunk padding. MFC after: 1 week	2015-09-12 17:08:51 +00:00
Michael Tuexen	e629b9fc56	Ensure that ERROR chunks are always padded by implementing this in the routine, which queues an ERROR chunk, instead of relying on the callers to do so. Since one caller missed this, this actually fixes a bug. MFC after: 1 week	2015-09-11 13:54:33 +00:00
Michael Tuexen	0941640f34	RFC 4960 requires that packets containing an INIT chunk bundled with another chunk are silently discarded. Do so, instead of sending an ABORT. MFC after: 1 week	2015-09-07 14:00:38 +00:00
Allan Jude	32d321fa4a	missed file that should have been included in r287528 PR: 184110 Submitted by: Marie Helene Kvello-Aune <marieheleneka@gmail.com> Approved by: wblock (mentor)	2015-09-07 02:00:05 +00:00
Adrian Chadd	499baf0aa7	Replace rss_m2cpuid with rss_soft_m2cpuid_v4 for ip_direct_nh.nh_m2cpuid, because the RSS hash may need to be recalculated. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3564	2015-09-06 20:20:48 +00:00
Alexander V. Chernikov	26deb8826c	Do not pass lle to nd6_ns_output(). Use newly-added nd6_llinfo_get_holdsrc() to extract desired IPv6 source from holdchain and pass it to the nd6_ns_output().	2015-09-05 14:14:03 +00:00
Gleb Smirnoff	388909a12a	Use Jenkins hash for TCP syncache. o Unlike xor, in Jenkins hash every bit of input affects virtually every bit of output, thus salting the hash actually works. With xor salting only provides a false sense of security, since if hash(x) collides with hash(y), then of course, hash(x) ^ salt would also collide with hash(y) ^ salt. [1] o Jenkins provides much better distribution than xor, very close to ideal. TCP connection setup/teardown benchmark has shown a 10% increase with default hash size, and with bigger hashes that still provide possibility for collisions. With enormous hash size, when dataset is by an order of magnitude smaller than hash size, the benchmark has shown a 4% decrease in performance, which is expected and acceptable. Noticed by: Jeffrey Knockel <jeffk cs.unm.edu> [1] Benchmarks by: jch Reviewed by: jch, pkelsey, delphij Security: strengthens protection against hash collision DoS Sponsored by: Nginx, Inc.	2015-09-05 10:15:19 +00:00
Gleb Smirnoff	24067db8ca	Make tcp_mtudisc() static and void. No functional changes. Sponsored by: Nginx, Inc.	2015-09-04 12:02:12 +00:00
Michael Tuexen	6fb9db98b3	Don't leak memory in an error case. MFC after: 1 week	2015-09-04 09:24:07 +00:00
Michael Tuexen	59713bbf27	Add a NULL pointer check to silence the clang code analyzer. MFC after: 1 week	2015-09-04 09:22:16 +00:00
Michael Tuexen	aa1cfca969	Fix a bug where two SHUTDOWN_ACK chunks were sent if a SHUTDOWN chunk was received acking all outstanding data.	2015-09-03 22:15:56 +00:00
Julien Charbon	d6de19ac2f	Put r284245 back in place: If at first this fix was seen as a temporary workaround for a callout(9) issue, it turns out it is instead the right way to use callout in mpsafe mode without using callout_drain(). r284245 commit message: Fix a callout race condition introduced in TCP timers callouts with r281599. In TCP timer context, it is not enough to check callout_stop() return value to decide if a callout is still running or not, previous callout_reset() return values have also to be checked. Differential Revision: https://reviews.freebsd.org/D2763	2015-08-30 13:44:39 +00:00
Michael Tuexen	2e2d67945a	Use 5 times RTO.Max as the default for the shutdown guard timer as required by RFC 4960. The sysctl variable can be used to overwrite this. Discussed with: rrs MFC after: 1 week	2015-08-29 17:26:29 +00:00
Michael Tuexen	e92c2a8d6a	Fix the exporting of SCTP association states to userland. Without this, associations in SHUTDOWN-PENDING were never reported correctly. MFC after: 3 weeks	2015-08-29 09:14:32 +00:00
Adrian Chadd	2527ccad2d	Rename rss_soft_m2cpuid() -> rss_soft_m2cpuid_v4() in preparation for an IPv6 version to show up. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3504	2015-08-29 06:58:30 +00:00
Adrian Chadd	e5562eb934	Replace the printf()s with optional rate limited debugging for RSS. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3471	2015-08-28 05:58:16 +00:00
Bjoern A. Zeeb	a86e5c96af	get_inpcbinfo() and get_pcblist() are UDP local functions and do not do what one would expect by name. Prefix them with "udp_" to at least obviously limit the scope. This is a non-functional change. Reviewed by: gnn, rwatson MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3505	2015-08-27 15:27:41 +00:00
Julien Charbon	bcf9b91395	Revert r284245: "Fix a callout race condition introduced in TCP timers callouts with r281599." r281599 fixed a TCP timer race condition, but due to a callout(9) bug it also introduced another race condition that was worked around with r284245. The callout(9) bug being fixed with r286880, we can now revert the workaround (r284245). Differential Revision: https://reviews.freebsd.org/D2079 (Initial change) Differential Revision: https://reviews.freebsd.org/D2763 (Workaround) Differential Revision: https://reviews.freebsd.org/D3078 (Fix) Sponsored by: Verisign, Inc. MFC after: 2 weeks	2015-08-24 09:30:27 +00:00
Alexander V. Chernikov	5a2555160f	* Split allocation and table linking for lle's. Before that, the logic besides lle_create() was the following: return existing if found, create if not. This behaviour was error-prone since we had to deal with 'sudden' static<>dynamic lle changes. This commit fixes a bunch of different issues like: - refcount leak when lle is converted to static. Simple check case: console 1: while true; do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`; do arp -s $i 00:22:44:66:88:00 ; arp -d $i; done; done console 2: ping -f any-dead-host-in-L2 console 3: # watch for memory consumption: vmstat -m | awk '$1~/lltable/{print$2}' - possible problems in arptimer() / nd6_timer() when dropping/reacquiring lock. New logic explicitly handles use-or-create cases in every lla_create user. Basically, most of the changes are purely mechanical. However, we explicitly avoid using existing lle's for interface/static LLE records. * While here, call lle_event handlers on all real table lle change. * Create lltable_free_entry() calling existing per-lltable lle_free_t callback for entry deletion	2015-08-20 12:05:17 +00:00
Alexander V. Chernikov	a4141c63c5	Check value return from lle_create() for NULL. This bug sneaked unnoticed in r286722. Reported by: adrian	2015-08-19 21:08:42 +00:00
Julien Charbon	31a7749d4b	Make clear that TIME_WAIT timeout expiration is managed solely by tcp_tw_2msl_scan(). Sponsored by: Verisign, Inc.	2015-08-18 08:27:26 +00:00
Alexander V. Chernikov	0c4210f984	Fix panic when handling non-inet arp message introduced in r286825. Submitted by: delphij	2015-08-18 06:16:19 +00:00
Alexander V. Chernikov	512e30ef9f	Split arpresolve() into fast/slow path. This change isolates the most common case (e.g. successful lookup) from more complicated scenarios. It also (tries to) make code more simple by avoiding retry: cycle. The actual goal is to prepare code to the upcoming change that will allow LL address retrieval without acquiring LLE lock at all. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D3383	2015-08-16 12:23:58 +00:00
Michael Tuexen	faadc1b492	Allow the path MTU to grow up to the outgoing interface MTU. MFC after: 3 days	2015-08-14 14:26:13 +00:00
Alexander V. Chernikov	f3bfa7d1cf	Move lle update code from gigantic ip_arpinput() to separate bunch of functions. The goal is to isolate actual lle updates to permit more fine-grained locking. Do all lle link-level update under AFDATA wlock. Sponsored by: Yandex LLC	2015-08-13 13:38:09 +00:00
Hiren Panchasara	ad389a8c3b	Remove unused TCPTV_SRTTDFLT. We initialize srtt with TCPTV_SRTTBASE when we don't have any rtt estimate. Differential Revision: D3334 Sponsored by: Limelight Networks	2015-08-12 16:08:37 +00:00
Alexander V. Chernikov	0447c1367a	Use single 'lle_timer' callout in lltable instead of two different names of the same timer.	2015-08-11 12:38:54 +00:00
Alexander V. Chernikov	314294de5c	Store addresses instead of sockaddrs inside llentry. This permits us having all (not fully true yet) the info needed in lookup process in first 64 bytes of 'struct llentry'. struct llentry layout: BEFORE: [rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]] AFTER [ in[6]_addr MAC .. state .. rwlock ] Currently, address part of struct llentry has only 16 bytes for the key. However, lltable does not restrict any custom lltable consumers with long keys use the previous approach (store key at (lle+1)). Sponsored by: Yandex LLC	2015-08-11 09:26:11 +00:00
Alexander V. Chernikov	41cb42a633	MFP r276712. * Split lltable_init() into lltable_allocate_htbl() (alloc hash table with default callbacks) and lltable_link() ( links any lltable to the list). * Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field. * Move lltable setup to separate functions in in[6]_domifattach.	2015-08-11 05:51:00 +00:00
Alexander V. Chernikov	2caee4be35	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian	2015-08-10 20:50:31 +00:00
Alexander V. Chernikov	11cdad9873	Partially merge r274887,r275334,r275577,r275578,r275586 to minimize differences between projects/routing and HEAD. This commit tries to keep code logic the same while changing underlying code to use unified callbacks. * Add llt_foreach_entry method to traverse all entries in given llt * Add llt_dump_entry method to export particular lle entry in sysctl/rtsock format (code is not indented properly to minimize diff). Will be fixed in the next commits. * Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle. * Add llt_fill_sa_entry method to export address in the lle to sockaddr format. * Add llt_hash method to use in generic hash table support code. * Add llt_free_entry method which is used in llt_prefix_free code. * Prepare for fine-grained locking by separating lle unlink and deletion in lltable_free() and lltable_prefix_free(). * Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable' access by external callers. * Remove @llt argument from lle_free() lle callback since it was unused. * Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting. * Switch to per-af hashing code. * Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method. Update description from these functions. * Use unified lltable_free_entry() function instead of per-af one. Reviewed by: ae	2015-08-10 12:03:59 +00:00
Kristof Provost	30edc5385e	tcp_reass_zone is not a VNET variable. This fixes a panic during 'sysctl -a' on VIMAGE kernels. The tcp_reass_zone variable is not VNET_DEFINE() so we can not mark it as a VNET variable (with CTLFLAG_VNET).	2015-08-09 19:07:24 +00:00
Marius Strobl	d2b5ade3f4	Fix compilation after r286458.	2015-08-08 21:42:15 +00:00
Marius Strobl	6e4cd74673	Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT.	2015-08-08 21:41:59 +00:00
Alexander V. Chernikov	4bdf0b6a9a	MFP r274295: * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2015-08-08 18:14:59 +00:00
Alexander V. Chernikov	e362cf0e9f	MFP r274553: * Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete lla_create now returns with LLE_EXCLUSIVE lock for lle. * Provide typedefs for new/existing lltable callbacks. Reviewed by: ae	2015-08-08 17:48:54 +00:00
Alexander V. Chernikov	331dff0737	Simplify ip[6] simploop: Do not pass 'dst' sockaddr to ip[6]_mloopback: - We have explicit check for AF_INET in ip_output() - We assume ip header inside passed mbuf in ip_mloopback - We assume ip6 header inside passed mbuf in ip6_mloopback	2015-08-08 15:58:35 +00:00
Julien Charbon	079672cb07	Fix a kernel assertion issue introduced with r286227: Avoid too strict INP_INFO_RLOCK_ASSERT checks due to tcp_notify() being called from in6_pcbnotify(). Reported by: Larry Rosenman <ler@lerctr.org> Submitted by: markj, jch	2015-08-08 08:40:36 +00:00
Mark Johnston	8f980c016b	The mbuf parameter to ip_output_pfil() must be an output parameter since pfil(9) hooks may modify the chain. X-MFC-With: r286028	2015-08-03 17:47:02 +00:00
Julien Charbon	ff9b006d61	Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()). - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters. It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections. PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc.	2015-08-03 12:13:54 +00:00
Michael Tuexen	e7e71dd7f3	Don't take the port numbers for packets containing ABORT chunks from a freed mbuf. Just use them from the stcb. MFC after: 3 days	2015-08-02 16:07:30 +00:00
Andrey V. Elsukov	cf14ccb0f7	Remove unneeded #include "opt_inet.h".	2015-07-31 09:02:28 +00:00
Hiren Panchasara	03041aaac8	Update snd_una description to make it more readable. Differential Revision: https://reviews.freebsd.org/D3179 Reviewed by: gnn Sponsored by: Limelight Networks	2015-07-30 19:24:49 +00:00
Ermal Luçi	3c40232395	Avoid double reference decrement when firewalls force relooping of packets When firewalls force a reloop of packets and the caller supplied a route the reference to the route might be reduced twice creating issues. This is especially the scenario when a packet is looped because of operation in the firewall but the new route lookup gives a down route. Differential Revision: https://reviews.freebsd.org/D3037 Reviewed by: gnn Approved by: gnn(mentor)	2015-07-29 20:10:36 +00:00
Ermal Luçi	d9f2a78249	ip_output normalization and fixes ip_output has a big chunk of code used to handle special cases with pfil consumers which also forces a reloop on it. Gather all this code together to make it readable and properly handle the reloop cases. Some of the issues identified: M_IP_NEXTHOP is not handled properly in existing code. route reference leaking is possible with in FIB number change route flags checking is not consistent in the function Differential Revision: https://reviews.freebsd.org/D3022 Reviewed by: gnn Approved by: gnn(mentor) MFC after: 4 weeks	2015-07-29 18:04:01 +00:00
Patrick Kelsey	4741bfcb57	Revert r265338, r271089 and r271123 as those changes do not handle non-inline urgent data and introduce an mbuf exhaustion attack vector similar to FreeBSD-SA-15:15.tcp, but not requiring VNETs. Address the issue described in FreeBSD-SA-15:15.tcp. Reviewed by: glebius Approved by: so Approved by: jmallett (mentor) Security: FreeBSD-SA-15:15.tcp Sponsored by: Norse Corp, Inc.	2015-07-29 17:59:13 +00:00
Andrey V. Elsukov	10a0e0bf0a	Eliminate the use of m_copydata() in gif_encapcheck(). ip_encap already has inspected mbuf's data, at least an IP header. And it is safe to use mtod() and do direct access to needed fields. Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that mbuf has a packet header. Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also remove "martian filters" checks. According to RFC 4213 it is enough to verify that the source address is the address of the encapsulator, as configured on the decapsulator. Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-07-29 14:07:43 +00:00
Andrey V. Elsukov	cc0a3c8ca4	Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149	2015-07-29 08:12:05 +00:00
Michael Tuexen	9ae56375af	Fix a typo reported by Erik Cederstrand. MFC after: 1 week	2015-07-28 08:50:13 +00:00
Michael Tuexen	267dbe63a1	Provide consistent error causes whenever an ABORT chunk is sent. MFC after: 1 week	2015-07-27 22:35:54 +00:00
Michael Tuexen	cf9e47b2f0	Improve locking on Mac OS X. This does not change the functionality on FreeBSD. Reviewed by: rrs MFC after: 1 week	2015-07-26 10:37:40 +00:00
Michael Tuexen	6247db3541	Fix and improve a debug message. The SID was reported as an SSN. MFC after: 1 week	2015-07-26 10:17:17 +00:00
Michael Tuexen	4ff815b71c	Move including netinet/icmp6.h around to avoid a problem when including netinet/icmp6.h and net/netmap.h. Both use ni_flags... This allows to build multistack with SCTP support. MFC after: 1 week	2015-07-25 18:26:09 +00:00
Kristof Provost	fc4443a1d5	Remove stale comment. The IPv6 pseudo header checksum was added by bz in r235961. Sponsored by: Essen FreeBSD Hackathon	2015-07-25 16:14:55 +00:00
Randall Stewart	5f98acb594	Fix silly syntax error emacs chugged in for me.. gesh. MFC after: 3 weeks	2015-07-24 14:13:43 +00:00
Randall Stewart	c616859963	Fix an issue with MAC OS locking and also optimize the case where we are sending back a stream-reset and a sack timer is running, in that case we should just send the SACK. MFC after: 3 weeks	2015-07-24 14:09:03 +00:00
Randall Stewart	7cca17758c	Fix several problems with Stream Reset. 1) We were not handling (or sending) the IN_PROGRESS case if the other side (or our side) was not able to reset (awaiting more data). 2) We would improperly send a stream-reset when we should not. Not waiting until the TSN had been assigned when data was inqueue. Reviewed by: tuexen	2015-07-22 11:30:37 +00:00
Xin LI	47a8e86509	Fix resource exhaustion due to sessions stuck in LAST_ACK state. Submitted by: Jonathan Looney (Juniper SIRT) Reviewed by: lstewart Security: CVE-2015-5358 Security: SA-15:13.tcp	2015-07-21 23:42:15 +00:00
Ermal Luçi	705f4d9c6a	IPSEC, remove variable argument function its already due. Differential Revision: https://reviews.freebsd.org/D3080 Reviewed by: gnn, ae Approved by: gnn(mentor)	2015-07-21 21:46:24 +00:00
Randall Stewart	c0d1be08f6	When a tunneling protocol is being used with UDP we must release the lock on the INP before calling the tunnel protocol, else a LOR may occur (it does with SCTP for sure). Instead we must acquire a ref count and release the lock, taking care to allow for the case where the UDP socket has gone away and not unlocking since the refcnt decrement on the inp will do the unlock in that case. Reviewed by: tuexen MFC after: 3 weeks	2015-07-21 09:54:31 +00:00
Luigi Rizzo	a6e8e92404	fix a typo in a comment	2015-07-18 15:28:32 +00:00
Kevin Lo	ddee45244d	Since the IETF has redefined the meaning of the tos field to accommodate a set of differentiated services, set IPTOS_PREC_* macros using IPTOS_DSCP_* macro definitions. While here, add IPTOS_DSCP_VA macro according to RFC 5865. Differential Revision: https://reviews.freebsd.org/D3119 Reviewed by: gnn	2015-07-18 06:48:30 +00:00
Patrick Kelsey	d57724fd46	Check TCP timestamp option flag so that the automatic receive buffer scaling code does not use an uninitialized timestamp echo reply value from the stack when timestamps are not enabled. Differential Revision: https://reviews.freebsd.org/D3060 Reviewed by: hiren Approved by: jmallett (mentor) MFC after: 3 days Sponsored by: Norse Corp, Inc.	2015-07-17 17:36:33 +00:00
Ermal Luçi	56844a6203	Correct issue presented in r285051, apparently neither clang nor gcc complain about this. But clang inits the var to NULL correctly while gcc on at least mips does not. Correct the undefined behavior by initializing the variable properly. PR: 201371 Differential Revision: https://reviews.freebsd.org/D3036 Reviewed by: gnn Approved by: gnn(mentor)	2015-07-09 16:28:36 +00:00
Michael Tuexen	29b9533b43	Export the ssthresh value per SCTP path via the sysctl interface. MFC after: 1 month	2015-07-07 06:34:28 +00:00
Ermal Luçi	d14122b078	Avoid doing multiple route lookups for the same destination IP during forwarding ip_forward() does a route lookup for testing this packet can be sent to a known destination, it also can do another route lookup if it detects that an ICMP redirect is needed, it forgets all of this and handovers to ip_output() to do the same lookup yet again. This optimisation just does one route lookup during the forwarding path and handovers that to be considered by ip_output(). Differential Revision: https://reviews.freebsd.org/D2964 Approved by: ae, gnn(mentor) MFC after: 1 week	2015-07-02 18:10:41 +00:00
Navdeep Parhar	9523d1bfc3	Fix leak in tcp_lro_rx. Simply clearing M_PKTHDR isn't enough, any tags hanging off the header need to be freed too. Differential Revision: https://reviews.freebsd.org/D2708 Reviewed by: ae@, hiren@	2015-06-30 17:19:58 +00:00
Hiren Panchasara	f85680793b	Avoid a situation where we do not set persist timer after a zero window condition. If you send a 0-length packet, but there is data in the socket buffer, and neither the rexmt nor the persist timer is already set, then activate the persist timer. PR: 192599 Differential Revision: D2946 Submitted by: jlott at averesystems dot com Reviewed by: jhb, jch, gnn, hiren Tested by: jlott at averesystems dot com, jch MFC after: 2 weeks	2015-06-29 21:23:54 +00:00
Hiren Panchasara	4c3972f0af	Reverting r284710. Today I learned: iff == if and only if. Suggested by: many	2015-06-22 22:16:06 +00:00
Hiren Panchasara	26f2eb6979	Fix a typo: s/iff/if/ Sponsored by: Limelight Networks	2015-06-22 21:53:55 +00:00
Michael Tuexen	bb2dc69f9a	Fix two KTRACE related bugs. Reported by: Coverity CID: 1018058, 1018060 MFC after: 3 days	2015-06-19 21:55:12 +00:00
Michael Tuexen	5de07f524d	When setting the primary address, return an error whenever it fails. MFC after: 3 days	2015-06-19 12:48:22 +00:00
Andrey V. Elsukov	efb5228ce8	Fix possible use after free in encap[46]_input(). There is small window, when encap_detach() can free matched entry directly after we release encapmtx. Instead of use pointer to the matched entry, save pointers to needed variables from this entry and use them after release mutex. Pass argument stored in the encaptab entry to encap_fillarg(), instead of pointer to matched entry. Also do not allocate new mbuf tag, when argument that we plan to save in this tag is NULL. Also make encaptab variable static. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-06-18 18:28:38 +00:00
Michael Tuexen	5fe29cdf20	Fix a bug related to flow assignment I introduced in https://svnweb.freebsd.org/base?view=revision&revision=275483 MFC after: 3 days	2015-06-17 19:26:23 +00:00
Michael Tuexen	d089f9b915	Add FIB support for SCTP. This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200379 MFC after: 3 days	2015-06-17 15:20:14 +00:00
Ermal Luçi	f0516b3c96	If there is a system with a bpf consumer running and a packet is wanted to be transmitted but the arp cache entry expired, which triggers an arp request to be sent, the bpf code might want to sleep but crash the system due to a non sleep lock held from the arp entry not released properly. Release the lock before calling the arp request code to solve the issue as is done on all the other code paths. PR: 200323 Approved by: ae, gnn(mentor) MFC after: 1 week Sponsored by: Netgate Differential Revision: https://reviews.freebsd.org/D2828	2015-06-17 12:23:04 +00:00
Michael Tuexen	790a758db8	Correctly detect the case where the last address is removed. MFC after: 3 days	2015-06-14 22:14:00 +00:00
Michael Tuexen	8e4844dc49	Stop the heartbeat timer when removing a net. Thanks to the reporter of https://code.google.com/p/sctp-refimpl/issues/detail?id=14 for reporting the issue. MFC after: 3 days	2015-06-14 17:48:44 +00:00
Michael Tuexen	75cf6fb38e	Fix the reporting of the PMTUD state for specific paths. MFC after: 3 days	2015-06-12 18:59:29 +00:00
Michael Tuexen	9cbf1815c0	Code cleanup. MFC after: 3 days	2015-06-12 17:20:09 +00:00
Michael Tuexen	a6a7d5cf0d	In case of an output error, continue with the next net, don't try to continue sending on the same net. This fixes a bug where an invalid mbuf chain was constructed, if a full size frame of control chunks should be sent and there is a output error. Based on a discussion with rrs@, change move to the next net. This fixes the bug and improves the behaviour. Thanks to Irene Ruengeler for spending a lot of time in narrowing this problem down. MFC after: 3 days	2015-06-12 16:01:41 +00:00
Julien Charbon	cad814eedf	Fix a callout race condition introduced in TCP timers callouts with r281599. In TCP timer context, it is not enough to check callout_stop() return value to decide if a callout is still running or not, previous callout_reset() return values have also to be checked. Differential Revision: https://reviews.freebsd.org/D2763 Reviewed by: hiren Approved by: hiren MFC after: 1 day Sponsored by: Verisign, Inc.	2015-06-10 20:43:07 +00:00
Michael Tuexen	0694a1bc74	Export a pointer to the SCTP socket. This is needed to add SCTP support to sockstat. MFC after: 3 days	2015-06-04 12:46:56 +00:00
Michael Tuexen	c06184c814	Remove printf() noise... MFC after: 3 days	2015-05-29 08:31:15 +00:00
Michael Tuexen	c913390df3	Report the MTU consistently as specified in https://tools.ietf.org/html/rfc6458 Thanks to Irene Ruengeler for helping me to fix this bug. MFC after: 3 days	2015-05-28 20:33:28 +00:00
Michael Tuexen	0818979a3c	Take source and destination address into account when determining the scope. This fixes a problem when a client with a global address connects to a server with a private address. Thanks to Irene Ruengeler in helping me to find the issue. MFC after: 3 days	2015-05-28 19:28:08 +00:00
Michael Tuexen	d60568d78a	Retire SCTP_DONT_DO_PRIVADDR_SCOPE which was never defined. MFC after: 3 days	2015-05-28 18:52:32 +00:00
Michael Tuexen	70fa550b45	Fix a bug where messages would not be sent in SHUTDOWN_RECEIVED state. This problem was reported by Mark Bonnekessel and Markus Boese. Thanks to Irene Ruengeler for helping me to fix the cause of the problem. It can be tested with the following packetdrill script: +0.0 socket(..., SOCK_STREAM, IPPROTO_SCTP) = 3 +0.0 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR) +0.0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 // Check the handshake with an empty(!) cookie +0.1 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0.0 > sctp: INIT[flgs=0, tag=1, a_rwnd=..., os=..., is=..., tsn=0, ...] +0.1 < sctp: INIT_ACK[flgs=0, tag=2, a_rwnd=10000, os=1, is=1, tsn=0, STATE_COOKIE[len=4, val=...]] +0.0 > sctp: COOKIE_ECHO[flgs=0, len=4, val=...] +0.1 < sctp: COOKIE_ACK[flgs=0] +0.0 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 +0.0 write(3, ..., 1024) = 1024 +0.0 > sctp: DATA[flgs=BE, len=1040, tsn=0, sid=0, ssn=0, ppid=0] +0.0 write(3, ..., 1024) = 1024 // Pending due to Nagle +0.0 < sctp: SHUTDOWN[flgs=0, cum_tsn=0] +0.0 > sctp: DATA[flgs=BE, len=1040, tsn=1, sid=0, ssn=1, ppid=0] +0.0 < sctp: SACK[flgs=0, cum_tsn=1, a_rwnd=10000, gaps=[], dups=[]] // Do we need another SHUTDOWN here? +0.0 > sctp: SHUTDOWN_ACK[flgs=0] +0.0 < sctp: SHUTDOWN_COMPLETE[flgs=0] +0.0 close(3) = 0 MFC after: 3 days	2015-05-28 18:34:02 +00:00
Michael Tuexen	1c7db386c4	Use macros for overhead in a consistent way. No functional change. Thanks to Irene Ruengeler for suggesting the change. MFC after: 3 days	2015-05-28 17:57:56 +00:00
Michael Tuexen	ba78590287	Some more debug info cleanup. MFC after: 3 days	2015-05-28 16:39:22 +00:00
Michael Tuexen	b7d130befc	Fix and cleanup the debug information. This has no user-visible changes. Thanks to Irene Ruengeler for providing a patch. MFC after: 3 days	2015-05-28 16:00:23 +00:00
Michael Tuexen	548f47a8f1	Address some compiler warnings. No functional change. MFC after: 3 days	2015-05-28 14:24:21 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Hiren Panchasara	2645638064	Add a new sysctl net.inet.tcp.hostcache.purgenow=1 to expire and purge all entries in hostcache immediately. In collaboration with: bz, rwatson MFC after: 1 week Relnotes: yes Sponsored by: Limelight Networks	2015-05-20 01:08:01 +00:00
Hiren Panchasara	c52102dd25	Correct the wording as we are increasing the window size. Reviewed by: jhb Sponsored by: Limelight Networks	2015-05-19 19:17:20 +00:00
Andrey V. Elsukov	c1b4f79dfa	Add an ability accept encapsulated packets from different sources by one gif(4) interface. Add new option "ignore_source" for gif(4) interface. When it is enabled, gif's encapcheck function requires match only for packet's destination address. Differential Revision: https://reviews.freebsd.org/D2004 Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2015-05-15 12:19:45 +00:00
Michael Tuexen	fcbbf5af1d	Ensure that the COOKIE-ACK can be sent over UDP if the COOKIE-ECHO was received over UDP. Thanks to Felix Weinrank for making me aware of the problem and to Irene Ruengeler for providing the fix. MFC after: 1 week	2015-05-12 08:08:16 +00:00
George V. Neville-Neil	f00543eeab	Add a state transition call to show that we have entered TIME_WAIT. Although this is not important to the rest of the TCP processing it is a convenient way to make the DTrace state-transition probe catch this important state change. MFC after: 1 week	2015-05-01 12:49:03 +00:00
George V. Neville-Neil	bee68e9400	Move the SIFTR DTrace probe out of the writing thread context and directly into the place where the data is collected.	2015-04-30 17:43:40 +00:00
George V. Neville-Neil	981ad3ecf9	Brief demo script showing the various values that can be read via the new SIFTR statically defined tracepoint (SDT). Differential Revision: https://reviews.freebsd.org/D2387 Reviewed by: bz, markj	2015-04-29 17:19:55 +00:00
Alexander V. Chernikov	74b22066b0	Make rule table kernel-index rewriting support any kind of objects. Currently we have tables identified by their names in userland with internal kernel-assigned indices. This works the following way: When userland wishes to communicate with kernel to add or change rule(s), it makes indexed sorted array of table names (internally ipfw_obj_ntlv entries), and refer to indices in that array in rule manipulation. Prior to committing new rule to the ruleset kernel a) finds all referenced tables, bump their refcounts and change values inside the opcodes to be real kernel indices b) auto-creates all referenced but not existing tables and then do a) for them. Kernel does almost the same when exporting rules to userland: prepares array of used tables in all rules in range, and prepends it before the actual ruleset retaining actual in-kernel indexes for that. There is also special translation layer for legacy clients which is able to provide 'real' indices for table names (basically doing atoi()). While it is arguable that every subsystem really needs names instead of numbers, there are several things that should be noted: 1) every non-singleton subsystem needs to store its runtime state somewhere inside ipfw chain (and be able to get it fast) 2) we can't assume object numbers provided by humans will be dense. Existing nat implementation (O(n) access and LIST inside chain) is a good example. Hence the following: * Convert table-centric rewrite code to be more generic, callback-based * Move most of the code from ip_fw_table.c to ip_fw_sockopt.c * Provide abstract API to permit subsystems convert their objects between userland string identifier and in-kernel index. (See struct opcode_obj_rewrite) for more details * Create another per-chain index (in next commit) shared among all subsystems * Convert current NAT44 implementation to use new API, O(1) lookups, shared index and names instead of numbers (in next commit). Sponsored by: Yandex LLC	2015-04-27 08:29:39 +00:00
Andrey V. Elsukov	3e92c37f32	Remove now unneeded KEY_FREESP() for case when ipsec[46]_process_packet() returns EJUSTRETURN. Sponsored by: Yandex LLC	2015-04-27 01:11:09 +00:00
Andrey V. Elsukov	3d80e82d60	Fix possible use after free due to security policy deletion. When we are passing mbuf to IPSec processing via ipsec[46]_process_packet(), we hold one reference to security policy and release it just after return from this function. But IPSec processing can be deffered and when we release reference to security policy after ipsec[46]_process_packet(), user can delete this security policy from SPDB. And when IPSec processing will be done, xform's callback function will do access to already freed memory. To fix this move KEY_FREESP() into callback function. Now IPSec code will release reference to SP after processing will be finished. Differential Revision: https://reviews.freebsd.org/D2324 No objections from: #network Sponsored by: Yandex LLC	2015-04-27 00:55:56 +00:00
Michael Tuexen	55f8a4bb1b	Don't panic under INVARIANTS when receiving a SACK which cumacks a TSN never sent. While there, fix two typos. MFC after: 1 week	2015-04-26 21:47:15 +00:00
Baptiste Daroussin	c37a0a8285	mdoc: fix rendering issues	2015-04-26 11:39:25 +00:00
Andrey V. Elsukov	1ffc12bc42	Fix possible reference leak. Sponsored by: Yandex LLC	2015-04-24 21:05:29 +00:00
Gleb Smirnoff	9c2cd1aa84	Improve carp(4) locking: - Use the carp_sx to serialize not only CARP ioctls, but also carp_attach() and carp_detach(). - Use cif_mtx to lock only access to those the linked list. - These locking changes allow us to do some memory allocations with M_WAITOK and also properly call callout_drain() in carp_destroy(). - In carp_attach() assert that ifaddr isn't attached. We always come here with a pristine address from in[6]_control(). Reviewed by: oleg Sponsored by: Nginx, Inc.	2015-04-21 20:25:12 +00:00
Gleb Smirnoff	28ebe80cab	Provide functions to determine presence of a given address configured on a given interface. Discussed with: np Sponsored by: Nginx, Inc.	2015-04-17 11:57:06 +00:00
Julien Charbon	5571f9cf81	Fix an old and well-documented use-after-free race condition in TCP timers: - Add a reference from tcpcb to its inpcb - Defer tcpcb deletion until TCP timers have finished Differential Revision: https://reviews.freebsd.org/D2079 Submitted by: jch, Marc De La Gueronniere <mdelagueronniere@verisign.com> Reviewed by: imp, rrs, adrian, jhb, bz Approved by: jhb Sponsored by: Verisign, Inc.	2015-04-16 10:00:06 +00:00
Adrian Chadd	3e217461e6	Fix RSS build - netisr input / NETISR_IP_DIRECT is used here.	2015-04-15 00:57:21 +00:00
Mateusz Guzik	2574218578	Replace struct filedesc argument in getsock_cap with struct thread This is a step towards removal of spurious arguments.	2015-04-11 16:00:33 +00:00
Mateusz Guzik	90f54cbfeb	fd: remove filedesc argument from fdclose Just accept a thread instead. This makes it consistent with fdalloc. No functional changes.	2015-04-11 15:40:28 +00:00
Xin LI	843b0e5716	Attempt to fix build after 281351 by defining full prototype for the functions that were moved to ip_reass.c.	2015-04-11 01:06:59 +00:00
Gleb Smirnoff	c047fd1b99	o Use Jenkins hash. With previous hash, for a single source IP address and sequential IP ID case (e.g. ping -f), distribution fell into 8-10 buckets out of 64. With Jenkins hash, distribution is even. o Add random seed to the hash. Sponsored by: Nginx, Inc.	2015-04-10 06:55:43 +00:00
Gleb Smirnoff	1dbefcc00d	Move all code related to IP fragment reassembly to ip_reass.c. Some function names have changed and comments are reformatted or added, but there is no functional change. Claim copyright for me and Adrian. Sponsored by: Nginx, Inc.	2015-04-10 06:02:37 +00:00
Gleb Smirnoff	f25a3d10b3	Now that IP reassembly is no longer under single lock, book-keeping amount of allocations in V_nipq is racy. To fix that, we would simply stop doing book-keeping ourselves, and rely on UMA doing that. There could be a slight overcommit due to caches, but that isn't a big deal. o V_nipq and V_maxnipq go away. o net.inet.ip.fragpackets is now just SYSCTL_UMA_CUR() o net.inet.ip.maxfragpackets could have been just SYSCTL_UMA_MAX(), but historically it has special semantics about values of 0 and -1, so provide sysctl_maxfragpackets() to handle these special cases. o If zone limit lowers either due to net.inet.ip.maxfragpackets or due to kern.ipc.nmbclusters, then new function ipq_drain_tomax() goes over buckets and frees the oldest packets until we are in the limit. The code that (incorrectly) did that in ip_slowtimo() is removed. o ip_reass() doesn't check any limits and calls uma_zalloc(M_NOWAIT). If it fails, a new function ipq_reuse() is called. This function will find the oldest packet in the currently locked bucket, and if there is none, it will search in other buckets until success. Sponsored by: Nginx, Inc.	2015-04-09 22:13:27 +00:00
Gleb Smirnoff	f5746f593c	In the ip_reass() do packet examination and adjusting before acquiring locks and doing lookups. Sponsored by: Nginx, Inc.	2015-04-09 21:32:32 +00:00
Gleb Smirnoff	e3c2c63476	Make ip reassembly queue mutexes per-vnet, putting them into the structure that they protect. Sponsored by: Nginx, Inc.	2015-04-09 21:17:07 +00:00
Gleb Smirnoff	71c70e138d	Use TAILQ_FOREACH_SAFE() instead of implementing it ourselves. Sponsored by: Nginx, Inc.	2015-04-09 09:00:32 +00:00
Gleb Smirnoff	1c0b48c79a	If V_maxnipq is set to zero, drain the queue here and now, instead of relying on timeouts. Sponsored by: Nginx, Inc.	2015-04-09 08:56:23 +00:00
Gleb Smirnoff	55c28800ad	o Since we always update either fragdrop or fragtimeout stat counter when we free a fragment, provide two inline functions that do that for us: ipq_drop() and ipq_timeout(). o Rename ip_free_f() to ipq_free() to match the name scheme of IP reassembly. o Remove assertion from ipq_free(), since it requires extra argument to be passed, but locking scheme is simple enough and function is static. Sponsored by: Nginx, Inc.	2015-04-09 08:52:02 +00:00
Gleb Smirnoff	3de5805b02	Rename ip_drain_locked() to ip_drain_vnet(), since the function differs from ip_drain() not in locking, but in the scope of its work. Sponsored by: Nginx, Inc.	2015-04-09 08:37:16 +00:00
Adrian Chadd	f59e59d5c3	Move the IPv4 reassembly queue locking from a single lock to be per-bucket (global). This significantly improves performance on multi-core servers where there is any kind of IPv4 reassembly going on. glebius@ would like to see the locking moved to be attached to the reassembly bucket, which would make it per-bucket + per-VNET, instead of being global. I decided to keep it global for now as it's the minimal useful change; if people agree / wish to migrate it to be per-bucket / per-VNET then please do feel free to do so. I won't complain. Thanks to Norse Corp for giving me access to much larger servers to test this at across the 4 core boxes I have at home. Differential Revision: https://reviews.freebsd.org/D2095 Reviewed by: glebius (initial comments incorporated into this patch) MFC after: 2 weeks Sponsored by: Norse Corp, Inc (hardware)	2015-04-07 23:09:34 +00:00
Xin LI	edc76c95db	Improve patch for SA-15:04.igmp to solve a potential buffer overflow. Reported by: bde Submitted by: oshogbo	2015-04-07 20:20:03 +00:00
Gleb Smirnoff	93d4534cdc	Add sleepable lock to protect at least against two parallel SIOCSVHs. Sponsored by: Nginx, Inc.	2015-04-06 15:31:19 +00:00
Hans Petter Selasky	c4c4346f5f	Extend fixes made in r278103 and r38754 by copying the complete packet header and not only partial flags and fields. Firewalls can attach classification tags to the outgoing mbufs which should be copied to all the new fragments. Else only the first fragment will be let through by the firewall. This can easily be tested by sending a large ping packet through a firewall. It was also discovered that VLAN related flags and fields should be copied for packets traversing through VLANs. This is all handled by "m_dup_pkthdr()". Regarding the MAC policy check in ip_fragment(), the tag provided by the originating mbuf is copied instead of using the default one provided by m_gethdr(). Tested by: Karim Fodil-Lemelin <fodillemlinkarim at gmail.com> MFC after: 2 weeks Sponsored by: Mellanox Technologies PR: 7802	2015-04-02 15:47:37 +00:00
Julien Charbon	033749179f	Provide better debugging information in tcp_timer_activate() and tcp_timer_active() Differential Revision: https://reviews.freebsd.org/D2179 Suggested by: bz Reviewed by: jhb Approved by: jhb	2015-04-02 14:43:07 +00:00
Gleb Smirnoff	7a742e3744	Provide a comment explaining issues with the counter(9) trick, so that people won't copy and paste it blindly. Prodded by: ian Sponsored by: Nginx, Inc.	2015-04-02 14:22:59 +00:00
Bjoern A. Zeeb	1d549750c9	Try to unbreak the build after r280971 by providing the missing #include header for SYSINIT.	2015-04-02 00:30:53 +00:00
Gleb Smirnoff	6d947416cc	o Use new function ip_fillid() in all places throughout the kernel, where we want to create a new IP datagram. o Add support for RFC6864, which allows to set IP ID for atomic IP datagrams to any value, to improve performance. The behaviour is controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by default. o In case if we generate IP ID, use counter(9) to improve performance. o Gather all code related to IP ID into ip_id.c. Differential Revision: https://reviews.freebsd.org/D2177 Reviewed by: adrian, cy, rpaulo Tested by: Emeric POUPON <emeric.poupon stormshield.eu> Sponsored by: Netflix Sponsored by: Nginx, Inc. Relnotes: yes	2015-04-01 22:26:39 +00:00
Julien Charbon	18832f1fd1	Use appropriate timeout_t* instead of void* in tcp_timer_activate() Suggested by: imp Differential Revision: https://reviews.freebsd.org/D2154 Reviewed by: imp, jhb Approved by: jhb	2015-03-31 10:17:13 +00:00
Gleb Smirnoff	513635bfaa	VNETalize random IP ID engine. Sponsored by: Nginx, Inc.	2015-03-28 16:59:57 +00:00
Gleb Smirnoff	1f08c9479f	Initialize random IP ID engine via SYSINIT() instead of doing that on first packet. This allow to use M_WAITOK and cut down some error handling. Sponsored by: Nginx, Inc.	2015-03-28 16:06:46 +00:00
Fabien Thomas	d612b95e23	On multi CPU systems, we may emit successive packets with the same id. Fix the race by using an atomic operation. Differential Revision: https://reviews.freebsd.org/D2141 Obtained from: emeric.poupon@stormshield.eu MFC after: 1 week Sponsored by: Stormshield	2015-03-27 13:26:59 +00:00
Michael Tuexen	d59909c3e2	Improve the selection of the destination address of SACK chunks. This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196755 and is joint work with rrs@. MFC after: 1 week	2015-03-26 22:05:31 +00:00
Michael Tuexen	a756ffc931	Make sure that we don't free an SCTP shared key too early. Thanks to Pouyan Sepehrdad from Qualcomm Product Security Initiative for reporting the issue. MFC after: 3 days	2015-03-25 22:45:54 +00:00
Michael Tuexen	d9bdc5200a	Use the reference count of the right SCTP inp. Joint work with rrs@ MFC after: 3 days	2015-03-25 21:41:20 +00:00
Michael Tuexen	0426123f75	Fix two bugs which resulted in a screwed up end point list: * Use a safe way to walk through a list while manipulating it. * Have the appropriate locks in place. Joint work with rrs@ MFC after: 3 days	2015-03-24 21:12:45 +00:00
Lawrence Stewart	efca16682d	The addition of flowid and flowtype in r280233 and r280237 respectively forgot to extend the IPv6 packet node format string, which causes a build failure when SIFTR is compiled with IPv6 support. Reported by: Lars Eggert	2015-03-24 15:08:43 +00:00
Michael Tuexen	8427b3fd4f	Fix the bug in the handling of fragmented abandoned SCTP user messages reported in https://code.google.com/p/sctp-refimpl/issues/detail?id=11 Thanks to Lally Singh for reporting it. MFC after: 3 days	2015-03-24 15:05:36 +00:00
Michael Tuexen	7fd5b4365a	Fix an accounting bug related to the per stream chunk counter. While there, don't refer to a net artificially. MFC after: 3 days	2015-03-24 14:51:46 +00:00
Michael Tuexen	ca0f81984a	When an ICMP message is received and the MTU shrinks, only mark outstanding chunks for retransmissions. MFC after: 3 days	2015-03-23 23:34:21 +00:00
Michael Tuexen	d5ec585697	Remove a useless assignment. MFC after: 1 week	2015-03-23 15:12:02 +00:00
Hiren Panchasara	d0a8b2a5ae	Add connection flow type to siftr(4). Suggested by: adrian Sponsored by: Limelight Networks	2015-03-19 00:23:16 +00:00
Hiren Panchasara	a025fd1487	Add connection flowid to siftr(4). Reviewed by: lstewart MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D2089	2015-03-18 23:24:25 +00:00
Adrian Chadd	3b27278218	Correctly const-ify things. Found by: clang 3.6	2015-03-18 04:40:36 +00:00
Ian Lepore	dcdeb95f09	Go back to using sbuf_new() with a preallocated large buffer, to avoid triggering an sbuf auto-drain copyout while holding a lock. Pointed out by: jhb Pointy hat: ian	2015-03-14 23:57:33 +00:00
Ian Lepore	751ccc429d	Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl string returned to userland is nulterminated. PR: 195668	2015-03-14 18:11:24 +00:00
Andrey V. Elsukov	2530ed9e70	Fix `ipfw fwd tablearg'. Use dedicated field nh4 in struct table_value to obtain IPv4 next hop address in tablearg case. Add `fwd tablearg' support for IPv6. ipfw(8) uses INADDR_ANY as next hop address in O_FORWARD_IP opcode for specifying tablearg case. For IPv6 we still use this opcode, but when packet identified as IPv6 packet, we obtain next hop address from dedicated field nh6 in struct table_value. Replace hopstore field in struct ip_fw_args with anonymous union and add hopstore6 field. Use this field to copy tablearg value for IPv6. Replace spare1 field in struct table_value with zoneid. Use it to keep scope zone id for link-local IPv6 addresses. Since spare1 was used internally, replace spare0 array with two variables spare0 and spare1. Use getaddrinfo(3)/getnameinfo(3) functions for parsing and formatting IPv6 addresses in table_value. Use zoneid field in struct table_value to store sin6_scope_id value. Since the kernel still uses embedded scope zone id to represent link-local addresses, convert next_hop6 address into this form before return from pfil processing. This also fixes in6_localip() check for link-local addresses. Differential Revision: https://reviews.freebsd.org/D2015 Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-03-13 09:03:25 +00:00
Michael Tuexen	5ba11c4c2e	Update a comment to get it aligned with the code change. Reported by: brueffer@	2015-03-11 15:40:29 +00:00
Michael Tuexen	975c975bf0	It seems that sb_acc is a better replacement for sb_cc than sb_ccc. At least it unbreaks the use of select() for SCTP sockets. MFC after: 3 days	2015-03-11 15:21:39 +00:00
Michael Tuexen	b3bf169ac7	Fix the adaptation of the path state when thresholds are changed using the SCTP_PEER_ADDR_THLDS socket option. MFC after: 3 days	2015-03-11 14:25:23 +00:00
Michael Tuexen	3cb3567d7e	Keep track on the socket lock state. This fixes a bug showing up on Mac OS X. MFC after: 3 days	2015-03-10 22:38:10 +00:00
Michael Tuexen	2bb7e77385	Unlock the stcb when using setsockopt() for the SCTP_PEER_ADDR_THLDS option. MFC after: 3 days	2015-03-10 21:05:17 +00:00
Michael Tuexen	59b6d5be4e	Add a SCTP socket option to limit the cwnd for each path. MFC after: 1 month	2015-03-10 19:49:25 +00:00
Michael Tuexen	8f290ed51b	Fix a typo. MFC after: 1 week	2015-03-10 09:16:31 +00:00
Julien Charbon	eb96dc3336	In TCP, connect() can return incorrect error code EINVAL instead of EADDRINUSE or ECONNREFUSED PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196035 Differential Revision: https://reviews.freebsd.org/D1982 Reported by: Mark Nunberg <mnunberg@haskalah.org> Submitted by: Harrison Grundy <harrison.grundy@astrodoggroup.com> Reviewed by: adrian, jch, glebius, gnn Approved by: jhb MFC after: 2 weeks	2015-03-09 20:29:16 +00:00
Andrey V. Elsukov	1b5aa92cff	lla_lookup() can directly call llentry_free() for static entries and the last one requires to hold afdata's wlock. PR: 197096 MFC after: 1 week	2015-03-07 18:33:08 +00:00
Hiroki Sato	11d8451df3	Implement Enhanced DAD algorithm for IPv6 described in draft-ietf-6man-enhanced-dad-13. This basically adds a random nonce option (RFC 3971) to NS messages for DAD probe to detect a looped back packet. This looped back packet prevented DAD on some pseudo-interfaces which aggregates multiple L2 links such as lagg(4). The length of the nonce is set to 6 bytes. This algorithm can be disabled by setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis. Reported by: hiren Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D1835	2015-03-02 17:30:26 +00:00
Hans Petter Selasky	9c0f6aa762	Fix a special case in ip_fragment() to produce a more sensible chain of packets. When the data payload length excluding any headers, of an outgoing IPv4 packet exceeds PAGE_SIZE bytes, a special case in ip_fragment() can kick in to optimise the outgoing payload(s). The code which was added in r98849 as part of zero copy socket support assumes that the beginning of any MTU sized payload is aligned to where a MBUF's "m_data" pointer points. This is not always the case and can sometimes cause large IPv4 packets, as part of ping replies, to be split more than needed. Instead of iterating the MBUFs to figure out how much data is in the current chain, use the value already in the "m_pkthdr.len" field of the first MBUF in the chain. Reviewed by: ken @ Differential Revision: https://reviews.freebsd.org/D1893 MFC after: 2 weeks Sponsored by: Mellanox Technologies	2015-02-25 13:58:43 +00:00
Xin LI	cfa498d88e	Fix integer overflow in IGMP protocol. Security: FreeBSD-SA-15:04.igmp Security: CVE-2015-1414 Found by: Mateusz Kocielski, Logicaltrust Analyzed by: Marek Kroemeke, Mateusz Kocielski (shm@NetBSD.org) and 22733db72ab3ed94b5f8a1ffcde850251fe6f466 Submited by: Mariusz Zaborski <oshogbo@FreeBSD.org> Reviewed by: bms	2015-02-25 05:42:59 +00:00
Zbigniew Bodek	8018ac153f	Change struct attribute to avoid aligned operations mismatch Previous __alignment(4) allowed compiler to assume that operations are performed on aligned region. On ARM processor, this led to alignment fault as shown below: trapframe: 0xda9e5b10 FSR=00000001, FAR=a67b680e, spsr=60000113 r0 =00000000, r1 =00000068, r2 =0000007c, r3 =00000000 r4 =a67b6826, r5 =a67b680e, r6 =00000014, r7 =00000068 r8 =00000068, r9 =da9e5bd0, r10=00000011, r11=da9e5c10 r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc <...> udp_input+0x264: ldmia r5, {r0-r3, r6} udp_input+0x268: stmia r12, {r0-r3, r6} This was due to instructions which do not support unaligned access, whereas for __alignment(2) compiler replaced ldmia/stmia with some logically equivalent memcpy operations. In fact, the assumption that 'struct ip' is always 4-byte aligned is definitely false, as we have no impact on data alignment of packet stream received. Another possible solution would be to explicitely perform memcpy() on objects of 'struct ip' type, which, however, would suffer from performance drop, and be merely a problem hiding. Please, note that this has nothing to do with ARM32_DISABLE_ALIGNMENT_FAULTS option, but is related strictly to compiler behaviour. Submitted by: Wojciech Macek <wma@semihalf.com> Reviewed by: glebius, ian Obtained from: Semihalf	2015-02-24 12:57:03 +00:00
Gleb Smirnoff	26d50672d6	The last userland piece of in_var.h is now 'struct in_aliasreq'. Move it to the top of the file, and ifdef _KERNEL the rest.	2015-02-19 23:59:27 +00:00
Gleb Smirnoff	e072c794ad	Now that all users of _WANT_IFADDR are fixed, remove this crutch and hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 23:16:10 +00:00
Gleb Smirnoff	0d159406b6	- Rename 'struct igmp_ifinfo' into 'struct igmp_ifsoftc', since it really represents a context. - Preserve name 'struct igmp_ifinfo' for a new structure, that will be stable API between userland and kernel. - Make sysctl_igmp_ifinfo() return the new 'struct igmp_ifinfo', instead of old one, which had a bunch of internal kernel structures in it. - Move all above declarations from in_var.h to igmp_var.h, since they are private to IGMP code. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 22:35:23 +00:00
Konstantin Belousov	0c6dcac369	Fix build with KTR after r278978.	2015-02-19 15:41:23 +00:00
Gleb Smirnoff	fd1b2a7c57	Widen _KERNEL ifdef to hide more kernel network stack structures from userland.	2015-02-19 06:24:27 +00:00
Gleb Smirnoff	058e08bea9	Use new struct mbufq instead of struct ifqueue to manage packet queues in IPv4 multicast code. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 01:21:02 +00:00
Randall Stewart	2575fbb827	This fixes a bug in the way that the LLE timers for nd6 and arp were being used. They basically would pass in the mutex to the callout_init. Because they used this method to the callout system, it was possible to "stop" the callout. When flushing the table and you stopped the running callout, the callout_stop code would return 1 indicating that it was going to stop the callout (that was about to run on the callout_wheel blocked by the function calling the stop). Now when 1 was returned, it would lower the reference count one extra time for the stopped timer, then a few lines later delete the memory. Of course the callout_wheel was stuck in the lock code and would then crash since it was accessing freed memory. By using callout_init(c, 1) we always get a 0 back and the reference counting bug does not rear its head. We do have to make a few adjustments to the callouts themselves though to make sure it does the proper thing if rescheduled as well as gets the lock. Commented upon by hiren and sbruno See Phabricator D1777 for more details. Commented upon by hiren and sbruno Reviewed by: adrian, jhb and bz Sponsored by: Netflix Inc.	2015-02-09 19:28:11 +00:00
Hans Petter Selasky	609752f04f	The flowid and hashtype should be copied from the originating packet when fragmenting IP packets to preserve the order of the packets in a stream. Else the resulting fragments can be sent out of order when the hardware supports multiple transmit rings. Reviewed by: glebius @ MFC after: 1 week Sponsored by: Mellanox Technologies	2015-02-02 17:32:50 +00:00
Hiren Panchasara	ec446b1375	Make syncookie_mac() use 'tcp_seq irs' in computing hash. This fixes what seems like a simple oversight when the function was added in r253210. Reported by: Daniel Borkmann <dborkman@redhat.com> Florian Westphal <fw@strlen.de> Differential Revision: https://reviews.freebsd.org/D1628 Reviewed by: gnn MFC after: 1 month Sponsored by: Limelight Networks	2015-01-30 17:29:07 +00:00
Michael Tuexen	aec9ef9745	Whitespace change.	2015-01-27 21:30:24 +00:00
Xin LI	6a58f0e913	Fix SCTP stream reset vulnerability. We would like to acknowledge Gerasimos Dimitriadis who reported the issue and Michael Tuexen who analyzed and provided the fix. Security: FreeBSD-SA-15:03.sctp Security: CVE-2014-8613 Submitted by: tuexen	2015-01-27 19:35:38 +00:00
Xin LI	38f2a43815	Fix SCTP SCTP_SS_VALUE kernel memory corruption and disclosure vulnerability. We would like to acknowledge Clement LECIGNE from Google Security Team and Francisco Falcon from Core Security Technologies who discovered the issue independently and reported to the FreeBSD Security Team. Security: FreeBSD-SA-15:02.kmem Security: CVE-2014-8612 Submitted by: tuexen	2015-01-27 19:35:36 +00:00
John Baldwin	002d455873	Use an sbuf to generate the output of the net.inet.tcp.hostcache.list sysctl to avoid a possible buffer overflow if the cache grows while the text is being generated. PR: 172675 MFC after: 2 weeks	2015-01-25 19:45:44 +00:00
Will Andrews	bb269f3ae4	Log hardware interface up/down as "hardware" rather than just "hw". Suggested by: glebius MFC after: 1 week MFC with: 277530	2015-01-23 14:30:24 +00:00
Will Andrews	369a670857	When a CARP state change is caused by an ifconfig request, log it accordingly. Suggested by: glebius MFC after: 1 week MFC with: 277530	2015-01-23 14:28:12 +00:00
Will Andrews	d01641e2c1	Improve CARP logging so that all state transitions are logged. sys/netinet/ip_carp.c: Add a "reason" string parameter to carp_set_state() and carp_master_down_locked() allowing more specific logging information to be passed into these apis. Refactor existing state transition logging into a single log call in carp_set_state(). Update all calls to carp_set_state() and carp_master_down_locked() to pass an appropriate reason string. For state transitions that were previously logged, the output should be unchanged. Submitted by: gibbs (original), asomers (updated) MFC after: 1 week Sponsored by: Spectra Logic MFSpectraBSD: 1039697 on 2014/02/11 (original) 1049992 on 2014/03/21 (updated)	2015-01-22 17:09:54 +00:00
Michael Tuexen	bcbf8c2105	Remove comparisons which are not necessary. Reported by: Coverity CID: 1237826, 1237844, 1237847 MFC after: 1 week	2015-01-20 19:08:55 +00:00
Michael Tuexen	2010054d91	Code cleanup. Reported by: Coverity CID: 749578 MFC after: 1 week	2015-01-19 11:52:08 +00:00
Michael Tuexen	e1600e5058	Fix a bug which only shows up when an mbuf allocation failed. Therefore chances are low that we hit this. Reported by: Coverity CID: 1018886 MFC after: 1 week	2015-01-18 22:00:39 +00:00
Michael Tuexen	d6165c1fca	Remove an unnecessary check. Reported by: Coverity CID: 749576 MFC after: 1 week	2015-01-18 21:16:22 +00:00
Michael Tuexen	3ff78fbbd9	Add protection code to free memory in case of processing an address which is neither IPv4 or IPv6. Reported by: Coverity CID: 749311 MFC after: 1 week	2015-01-18 20:53:20 +00:00
Michael Tuexen	61330de4b0	Remove an unused variable. Reported by: Coverity CID: 750999 MFC after: 1 week	2015-01-18 20:20:27 +00:00
Adrian Chadd	b2bdc62a95	Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific bits. The motivation here is to eventually teach netisr and potentially other networking subsystems a bit more about how RSS work queues / buckets are configured so things have a hope of auto-configuring in the future. * net/rss_config.[ch] takes care of the generic bits for doing configuration, hash function selection, etc; * topelitz.[ch] is now in net/ rather than netinet/; * (and would be in libkern if it didn't directly include RSS_KEYSIZE; that's a later thing to fix up.) * netinet/in_rss.[ch] now just contains the IPv4 specific methods; * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods. This should have no functional impact on anyone currently using the RSS support. Differential Revision: D1383 Reviewed by: gnn, jfv (intel driver bits)	2015-01-18 18:06:40 +00:00
Gleb Smirnoff	fc2517100b	Do not go one layer down to check ifqueue length. First, not all drivers use ifqueue at all. Second, there is no point in this lockless check. Either positive or negative result of the check could be incorrect after a tick. Reviewed by: tuexen Sponsored by: Nginx, Inc.	2015-01-12 18:06:22 +00:00

5450 Commits