freebsd-nq

Author	SHA1	Message	Date
Michael Tuexen	bb2dc69f9a	Fix two KTRACE related bugs. Reported by: Coverity CID: 1018058, 1018060 MFC after: 3 days	2015-06-19 21:55:12 +00:00
Michael Tuexen	5de07f524d	When setting the primary address, return an error whenever it fails. MFC after: 3 days	2015-06-19 12:48:22 +00:00
Andrey V. Elsukov	efb5228ce8	Fix possible use after free in encap[46]_input(). There is small window, when encap_detach() can free matched entry directly after we release encapmtx. Instead of use pointer to the matched entry, save pointers to needed variables from this entry and use them after release mutex. Pass argument stored in the encaptab entry to encap_fillarg(), instead of pointer to matched entry. Also do not allocate new mbuf tag, when argument that we plan to save in this tag is NULL. Also make encaptab variable static. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-06-18 18:28:38 +00:00
Michael Tuexen	5fe29cdf20	Fix a bug related to flow assignment I introduce in https://svnweb.freebsd.org/base?view=revision&revision=275483 MFC after: 3 days	2015-06-17 19:26:23 +00:00
Michael Tuexen	d089f9b915	Add FIB support for SCTP. This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200379 MFC after: 3 days	2015-06-17 15:20:14 +00:00
Ermal Luçi	f0516b3c96	If there is a system with a bpf consumer running and a packet is wanted to be transmitted but the arp cache entry expired, which triggers an arp request to be sent, the bpf code might want to sleep but crash the system due to a non sleep lock held from the arp entry not released properly. Release the lock before calling the arp request code to solve the issue as is done on all the other code paths. PR: 200323 Approved by: ae, gnn(mentor) MFC after: 1 week Sponsored by: Netgate Differential Revision: https://reviews.freebsd.org/D2828	2015-06-17 12:23:04 +00:00
Michael Tuexen	790a758db8	Correctly detect the case where the last address is removed. MFC after: 3 days	2015-06-14 22:14:00 +00:00
Michael Tuexen	8e4844dc49	Stop the heartbeat timer when removing a net. Thanks to the reporter of https://code.google.com/p/sctp-refimpl/issues/detail?id=14 for reporting the issue. MFC after: 3 days	2015-06-14 17:48:44 +00:00
Michael Tuexen	75cf6fb38e	Fix the reporting of the PMTUD state for specific paths. MFC after: 3 days	2015-06-12 18:59:29 +00:00
Michael Tuexen	9cbf1815c0	Code cleanup. MFC after: 3 days	2015-06-12 17:20:09 +00:00
Michael Tuexen	a6a7d5cf0d	In case of an output error, continue with the next net, don't try to continue sending on the same net. This fixes a bug where an invalid mbuf chain was constructed, if a full size frame of control chunks should be sent and there is a output error. Based on a discussion with rrs@, change move to the next net. This fixes the bug and improves the behaviour. Thanks to Irene Ruengeler for spending a lot of time in narrowing this problem down. MFC after: 3 days	2015-06-12 16:01:41 +00:00
Julien Charbon	cad814eedf	Fix a callout race condition introduced in TCP timers callouts with r281599. In TCP timer context, it is not enough to check callout_stop() return value to decide if a callout is still running or not, previous callout_reset() return values have also to be checked. Differential Revision: https://reviews.freebsd.org/D2763 Reviewed by: hiren Approved by: hiren MFC after: 1 day Sponsored by: Verisign, Inc.	2015-06-10 20:43:07 +00:00
Michael Tuexen	0694a1bc74	Export a pointer to the SCTP socket. This is needed to add SCTP support to sockstat. MFC after: 3 days	2015-06-04 12:46:56 +00:00
Michael Tuexen	c06184c814	Remove printf() noise... MFC after: 3 days	2015-05-29 08:31:15 +00:00
Michael Tuexen	c913390df3	Report the MTU consistently as specified in https://tools.ietf.org/html/rfc6458 Thanks to Irene Ruengeler for helping me to fix this bug. MFC after: 3 days	2015-05-28 20:33:28 +00:00
Michael Tuexen	0818979a3c	Take source and destination address into account when determining the scope. This fixes a problem when a client with a global address connects to a server with a private address. Thanks to Irene Ruengeler in helping me to find the issue. MFC after: 3 days	2015-05-28 19:28:08 +00:00
Michael Tuexen	d60568d78a	Retire SCTP_DONT_DO_PRIVADDR_SCOPE which was never defined. MFC after: 3 days	2015-05-28 18:52:32 +00:00
Michael Tuexen	70fa550b45	Fix a bug where messages would not be sent in SHUTDOWN_RECEIVED state. This problem was reported by Mark Bonnekessel and Markus Boese. Thanks to Irene Ruengeler for helping me to fix the cause of the problem. It can be tested with the following packetdrill script: +0.0 socket(..., SOCK_STREAM, IPPROTO_SCTP) = 3 +0.0 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR) +0.0 fcntl(3, F_SETFL, O_RDWR\|O_NONBLOCK) = 0 // Check the handshake with an empty(!) cookie +0.1 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0.0 > sctp: INIT[flgs=0, tag=1, a_rwnd=..., os=..., is=..., tsn=0, ...] +0.1 < sctp: INIT_ACK[flgs=0, tag=2, a_rwnd=10000, os=1, is=1, tsn=0, STATE_COOKIE[len=4, val=...]] +0.0 > sctp: COOKIE_ECHO[flgs=0, len=4, val=...] +0.1 < sctp: COOKIE_ACK[flgs=0] +0.0 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 +0.0 write(3, ..., 1024) = 1024 +0.0 > sctp: DATA[flgs=BE, len=1040, tsn=0, sid=0, ssn=0, ppid=0] +0.0 write(3, ..., 1024) = 1024 // Pending due to Nagle +0.0 < sctp: SHUTDOWN[flgs=0, cum_tsn=0] +0.0 > sctp: DATA[flgs=BE, len=1040, tsn=1, sid=0, ssn=1, ppid=0] +0.0 < sctp: SACK[flgs=0, cum_tsn=1, a_rwnd=10000, gaps=[], dups=[]] // Do we need another SHUTDOWN here? +0.0 > sctp: SHUTDOWN_ACK[flgs=0] +0.0 < sctp: SHUTDOWN_COMPLETE[flgs=0] +0.0 close(3) = 0 MFC after: 3 days	2015-05-28 18:34:02 +00:00
Michael Tuexen	1c7db386c4	Use macros for overhead in a consistent way. No functional change. Thanks to Irene Ruengeler for suggesting the change. MFC after: 3 days	2015-05-28 17:57:56 +00:00
Michael Tuexen	ba78590287	Some more debug info cleanup. MFC after: 3 days	2015-05-28 16:39:22 +00:00
Michael Tuexen	b7d130befc	Fix and cleanup the debug information. This has no user-visible changes. Thanks to Irene Ruengeler for proving a patch. MFC after: 3 days	2015-05-28 16:00:23 +00:00
Michael Tuexen	548f47a8f1	Address some compiler warnings. No functional change. MFC after: 3 days	2015-05-28 14:24:21 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Hiren Panchasara	2645638064	Add a new sysctl net.inet.tcp.hostcache.purgenow=1 to expire and purge all entries in hostcache immediately. In collaboration with: bz, rwatson MFC after: 1 week Relnotes: yes Sponsored by: Limelight Networks	2015-05-20 01:08:01 +00:00
Hiren Panchasara	c52102dd25	Correct the wording as we are increasing the window size. Reviewed by: jhb Sponsored by: Limelight Networks	2015-05-19 19:17:20 +00:00
Andrey V. Elsukov	c1b4f79dfa	Add an ability accept encapsulated packets from different sources by one gif(4) interface. Add new option "ignore_source" for gif(4) interface. When it is enabled, gif's encapcheck function requires match only for packet's destination address. Differential Revision: https://reviews.freebsd.org/D2004 Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2015-05-15 12:19:45 +00:00
Michael Tuexen	fcbbf5af1d	Ensure that the COOKIE-ACK can be sent over UDP if the COOKIE-ECHO was received over UDP. Thanks to Felix Weinrank for makeing me aware of the problem and to Irene Ruengeler for providing the fix. MFC after: 1 week	2015-05-12 08:08:16 +00:00
George V. Neville-Neil	f00543eeab	Add a state transition call to show that we have entered TIME_WAIT. Although this is not important to the rest of the TCP processing it is a conveneint way to make the DTrace state-transition probe catch this important state change. MFC after: 1 week	2015-05-01 12:49:03 +00:00
George V. Neville-Neil	bee68e9400	Move the SIFTR DTrace probe out of the writing thread context and directly into the place where the data is collected.	2015-04-30 17:43:40 +00:00
George V. Neville-Neil	981ad3ecf9	Brief demo script showing the various values that can be read via the new SIFTR statically defined tracepoint (SDT). Differential Revision: https://reviews.freebsd.org/D2387 Reviewed by: bz, markj	2015-04-29 17:19:55 +00:00
Alexander V. Chernikov	74b22066b0	Make rule table kernel-index rewriting support any kind of objects. Currently we have tables identified by their names in userland with internal kernel-assigned indices. This works the following way: When userland wishes to communicate with kernel to add or change rule(s), it makes indexed sorted array of table names (internally ipfw_obj_ntlv entries), and refer to indices in that array in rule manipulation. Prior to committing new rule to the ruleset kernel a) finds all referenced tables, bump their refcounts and change values inside the opcodes to be real kernel indices b) auto-creates all referenced but not existing tables and then do a) for them. Kernel does almost the same when exporting rules to userland: prepares array of used tables in all rules in range, and prepends it before the actual ruleset retaining actual in-kernel indexes for that. There is also special translation layer for legacy clients which is able to provide 'real' indices for table names (basically doing atoi()). While it is arguable that every subsystem really needs names instead of numbers, there are several things that should be noted: 1) every non-singleton subsystem needs to store its runtime state somewhere inside ipfw chain (and be able to get it fast) 2) we can't assume object numbers provided by humans will be dense. Existing nat implementation (O(n) access and LIST inside chain) is a good example. Hence the following: * Convert table-centric rewrite code to be more generic, callback-based * Move most of the code from ip_fw_table.c to ip_fw_sockopt.c * Provide abstract API to permit subsystems convert their objects between userland string identifier and in-kernel index. (See struct opcode_obj_rewrite) for more details * Create another per-chain index (in next commit) shared among all subsystems * Convert current NAT44 implementation to use new API, O(1) lookups, shared index and names instead of numbers (in next commit). Sponsored by: Yandex LLC	2015-04-27 08:29:39 +00:00
Andrey V. Elsukov	3e92c37f32	Remove now unneded KEY_FREESP() for case when ipsec[46]_process_packet() returns EJUSTRETURN. Sponsored by: Yandex LLC	2015-04-27 01:11:09 +00:00
Andrey V. Elsukov	3d80e82d60	Fix possible use after free due to security policy deletion. When we are passing mbuf to IPSec processing via ipsec[46]_process_packet(), we hold one reference to security policy and release it just after return from this function. But IPSec processing can be deffered and when we release reference to security policy after ipsec[46]_process_packet(), user can delete this security policy from SPDB. And when IPSec processing will be done, xform's callback function will do access to already freed memory. To fix this move KEY_FREESP() into callback function. Now IPSec code will release reference to SP after processing will be finished. Differential Revision: https://reviews.freebsd.org/D2324 No objections from: #network Sponsored by: Yandex LLC	2015-04-27 00:55:56 +00:00
Michael Tuexen	55f8a4bb1b	Don't panic under INVARIANTS when receiving a SACK which cumacks a TSN never sent. While there, fix two typos. MFC after: 1 week	2015-04-26 21:47:15 +00:00
Baptiste Daroussin	c37a0a8285	mdoc: fix rendering issues	2015-04-26 11:39:25 +00:00
Andrey V. Elsukov	1ffc12bc42	Fix possible reference leak. Sponsored by: Yandex LLC	2015-04-24 21:05:29 +00:00
Gleb Smirnoff	9c2cd1aa84	Improve carp(4) locking: - Use the carp_sx to serialize not only CARP ioctls, but also carp_attach() and carp_detach(). - Use cif_mtx to lock only access to those the linked list. - These locking changes allow us to do some memory allocations with M_WAITOK and also properly call callout_drain() in carp_destroy(). - In carp_attach() assert that ifaddr isn't attached. We always come here with a pristine address from in[6]_control(). Reviewed by: oleg Sponsored by: Nginx, Inc.	2015-04-21 20:25:12 +00:00
Gleb Smirnoff	28ebe80cab	Provide functions to determine presence of a given address configured on a given interface. Discussed with: np Sponsored by: Nginx, Inc.	2015-04-17 11:57:06 +00:00
Julien Charbon	5571f9cf81	Fix an old and well-documented use-after-free race condition in TCP timers: - Add a reference from tcpcb to its inpcb - Defer tcpcb deletion until TCP timers have finished Differential Revision: https://reviews.freebsd.org/D2079 Submitted by: jch, Marc De La Gueronniere <mdelagueronniere@verisign.com> Reviewed by: imp, rrs, adrian, jhb, bz Approved by: jhb Sponsored by: Verisign, Inc.	2015-04-16 10:00:06 +00:00
Adrian Chadd	3e217461e6	Fix RSS build - netisr input / NETISR_IP_DIRECT is used here.	2015-04-15 00:57:21 +00:00
Mateusz Guzik	2574218578	Replace struct filedesc argument in getsock_cap with struct thread This is is a step towards removal of spurious arguments.	2015-04-11 16:00:33 +00:00
Mateusz Guzik	90f54cbfeb	fd: remove filedesc argument from fdclose Just accept a thread instead. This makes it consistent with fdalloc. No functional changes.	2015-04-11 15:40:28 +00:00
Xin LI	843b0e5716	Attempt to fix build after 281351 by defining full prototype for the functions that were moved to ip_reass.c.	2015-04-11 01:06:59 +00:00
Gleb Smirnoff	c047fd1b99	o Use Jenkins hash. With previous hash, for a single source IP address and sequential IP ID case (e.g. ping -f), distribution fell into 8-10 buckets out of 64. With Jenkins hash, distribution is even. o Add random seed to the hash. Sponsored by: Nginx, Inc.	2015-04-10 06:55:43 +00:00
Gleb Smirnoff	1dbefcc00d	Move all code related to IP fragment reassembly to ip_reass.c. Some function names have changed and comments are reformatted or added, but there is no functional change. Claim copyright for me and Adrian. Sponsored by: Nginx, Inc.	2015-04-10 06:02:37 +00:00
Gleb Smirnoff	f25a3d10b3	Now that IP reassembly is no longer under single lock, book-keeping amount of allocations in V_nipq is racy. To fix that, we would simply stop doing book-keeping ourselves, and rely on UMA doing that. There could be a slight overcommit due to caches, but that isn't a big deal. o V_nipq and V_maxnipq go away. o net.inet.ip.fragpackets is now just SYSCTL_UMA_CUR() o net.inet.ip.maxfragpackets could have been just SYSCTL_UMA_MAX(), but historically it has special semantics about values of 0 and -1, so provide sysctl_maxfragpackets() to handle these special cases. o If zone limit lowers either due to net.inet.ip.maxfragpackets or due to kern.ipc.nmbclusters, then new function ipq_drain_tomax() goes over buckets and frees the oldest packets until we are in the limit. The code that (incorrectly) did that in ip_slowtimo() is removed. o ip_reass() doesn't check any limits and calls uma_zalloc(M_NOWAIT). If it fails, a new function ipq_reuse() is called. This function will find the oldest packet in the currently locked bucket, and if there is none, it will search in other buckets until success. Sponsored by: Nginx, Inc.	2015-04-09 22:13:27 +00:00
Gleb Smirnoff	f5746f593c	In the ip_reass() do packet examination and adjusting before acquiring locks and doing lookups. Sponsored by: Nginx, Inc.	2015-04-09 21:32:32 +00:00
Gleb Smirnoff	e3c2c63476	Make ip reassembly queue mutexes per-vnet, putting them into the structure that they protect. Sponsored by: Nginx, Inc.	2015-04-09 21:17:07 +00:00
Gleb Smirnoff	71c70e138d	Use TAILQ_FOREACH_SAFE() instead of implementing it ourselves. Sponsored by: Nginx, Inc.	2015-04-09 09:00:32 +00:00
Gleb Smirnoff	1c0b48c79a	If V_maxnipq is set to zero, drain the queue here and now, instead of relying on timeouts. Sponsored by: Nginx, Inc.	2015-04-09 08:56:23 +00:00
Gleb Smirnoff	55c28800ad	o Since we always update either fragdrop or fragtimeout stat counter when we free a fragment, provide two inline functions that do that for us: ipq_drop() and ipq_timeout(). o Rename ip_free_f() to ipq_free() to match the name scheme of IP reassembly. o Remove assertion from ipq_free(), since it requires extra argument to be passed, but locking scheme is simple enough and function is static. Sponsored by: Nginx, Inc.	2015-04-09 08:52:02 +00:00
Gleb Smirnoff	3de5805b02	Rename ip_drain_locked() to ip_drain_vnet(), since the function differs from ip_drain() not in locking, but in the scope of its work. Sponsored by: Nginx, Inc.	2015-04-09 08:37:16 +00:00
Adrian Chadd	f59e59d5c3	Move the IPv4 reassembly queue locking from a single lock to be per-bucket (global). This significantly improves performance on multi-core servers where there is any kind of IPv4 reassembly going on. glebius@ would like to see the locking moved to be attached to the reassembly bucket, which would make it per-bucket + per-VNET, instead of being global. I decided to keep it global for now as it's the minimal useful change; if people agree / wish to migrate it to be per-bucket / per-VNET then please do feel free to do so. I won't complain. Thanks to Norse Corp for giving me access to much larger servers to test this at across the 4 core boxes I have at home. Differential Revision: https://reviews.freebsd.org/D2095 Reviewed by: glebius (initial comments incorporated into this patch) MFC after: 2 weeks Sponsored by: Norse Corp, Inc (hardware)	2015-04-07 23:09:34 +00:00
Xin LI	edc76c95db	Improve patch for SA-15:04.igmp to solve a potential buffer overflow. Reported by: bde Submitted by: oshogbo	2015-04-07 20:20:03 +00:00
Gleb Smirnoff	93d4534cdc	Add sleepable lock to protect at least against two parallel SIOCSVHs. Sponsored by: Nginx, Inc.	2015-04-06 15:31:19 +00:00
Hans Petter Selasky	c4c4346f5f	Extend fixes made in r278103 and r38754 by copying the complete packet header and not only partial flags and fields. Firewalls can attach classification tags to the outgoing mbufs which should be copied to all the new fragments. Else only the first fragment will be let through by the firewall. This can easily be tested by sending a large ping packet through a firewall. It was also discovered that VLAN related flags and fields should be copied for packets traversing through VLANs. This is all handled by "m_dup_pkthdr()". Regarding the MAC policy check in ip_fragment(), the tag provided by the originating mbuf is copied instead of using the default one provided by m_gethdr(). Tested by: Karim Fodil-Lemelin <fodillemlinkarim at gmail.com> MFC after: 2 weeks Sponsored by: Mellanox Technologies PR: 7802	2015-04-02 15:47:37 +00:00
Julien Charbon	033749179f	Provide better debugging information in tcp_timer_activate() and tcp_timer_active() Differential Revision: https://reviews.freebsd.org/D2179 Suggested by: bz Reviewed by: jhb Approved by: jhb	2015-04-02 14:43:07 +00:00
Gleb Smirnoff	7a742e3744	Provide a comment explaining issues with the counter(9) trick, so that people won't copy and paste it blindly. Prodded by: ian Sponsored by: Nginx, Inc.	2015-04-02 14:22:59 +00:00
Bjoern A. Zeeb	1d549750c9	Try to unbreak the build after r280971 by providing the missing #include header for SYSINIT.	2015-04-02 00:30:53 +00:00
Gleb Smirnoff	6d947416cc	o Use new function ip_fillid() in all places throughout the kernel, where we want to create a new IP datagram. o Add support for RFC6864, which allows to set IP ID for atomic IP datagrams to any value, to improve performance. The behaviour is controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by default. o In case if we generate IP ID, use counter(9) to improve performance. o Gather all code related to IP ID into ip_id.c. Differential Revision: https://reviews.freebsd.org/D2177 Reviewed by: adrian, cy, rpaulo Tested by: Emeric POUPON <emeric.poupon stormshield.eu> Sponsored by: Netflix Sponsored by: Nginx, Inc. Relnotes: yes	2015-04-01 22:26:39 +00:00
Julien Charbon	18832f1fd1	Use appropriate timeout_t* instead of void* in tcp_timer_activate() Suggested by: imp Differential Revision: https://reviews.freebsd.org/D2154 Reviewed by: imp, jhb Approved by: jhb	2015-03-31 10:17:13 +00:00
Gleb Smirnoff	513635bfaa	VNETalize random IP ID engine. Sponsored by: Nginx, Inc.	2015-03-28 16:59:57 +00:00
Gleb Smirnoff	1f08c9479f	Initialize random IP ID engine via SYSINIT() instead of doing that on first packet. This allow to use M_WAITOK and cut down some error handling. Sponsored by: Nginx, Inc.	2015-03-28 16:06:46 +00:00
Fabien Thomas	d612b95e23	On multi CPU systems, we may emit successive packets with the same id. Fix the race by using an atomic operation. Differential Revision: https://reviews.freebsd.org/D2141 Obtained from: emeric.poupon@stormshield.eu MFC after: 1 week Sponsored by: Stormshield	2015-03-27 13:26:59 +00:00
Michael Tuexen	d59909c3e2	Improve the selection of the destination address of SACK chunks. This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196755 and is joint work with rrs@. MFC after: 1 week	2015-03-26 22:05:31 +00:00
Michael Tuexen	a756ffc931	Make sure that we don't free an SCTP shared key too early. Thanks to Pouyan Sepehrdad from Qualcomm Product Security Initiative for reporting the issue. MFC after: 3 days	2015-03-25 22:45:54 +00:00
Michael Tuexen	d9bdc5200a	Use the reference count of the right SCTP inp. Joint work with rrs@ MFC after: 3 days	2015-03-25 21:41:20 +00:00
Michael Tuexen	0426123f75	Fix two bugs which resulted in a screwed up end point list: * Use a save way to walk throught a list while manipulting it. * Have to appropiate locks in place. Joint work with rrs@ MFC after: 3 days	2015-03-24 21:12:45 +00:00
Lawrence Stewart	efca16682d	The addition of flowid and flowtype in r280233 and r280237 respectively forgot to extend the IPv6 packet node format string, which causes a build failure when SIFTR is compiled with IPv6 support. Reported by: Lars Eggert	2015-03-24 15:08:43 +00:00
Michael Tuexen	8427b3fd4f	Fix the bug in the handling of fragmented abandoned SCTP user messages reported in https://code.google.com/p/sctp-refimpl/issues/detail?id=11 Thanks to Lally Singh for reporting it. MFC after: 3 days	2015-03-24 15:05:36 +00:00
Michael Tuexen	7fd5b4365a	Fix an accounting bug related to the per stream chunk counter. While there, don't refer to a net articifically. MFC after: 3 days	2015-03-24 14:51:46 +00:00
Michael Tuexen	ca0f81984a	When an ICMP message is received and the MTU shrinks, only mark outstanding chunks for retransmissions. MFC after: 3 days	2015-03-23 23:34:21 +00:00
Michael Tuexen	d5ec585697	Remove a useless assignment. MFC after: 1 week	2015-03-23 15:12:02 +00:00
Hiren Panchasara	d0a8b2a5ae	Add connection flow type to siftr(4). Suggested by: adrian Sponsored by: Limelight Networks	2015-03-19 00:23:16 +00:00
Hiren Panchasara	a025fd1487	Add connection flowid to siftr(4). Reviewed by: lstewart MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D2089	2015-03-18 23:24:25 +00:00
Adrian Chadd	3b27278218	Correctly const-ify things. Found by: clang 3.6	2015-03-18 04:40:36 +00:00
Ian Lepore	dcdeb95f09	Go back to using sbuf_new() with a preallocated large buffer, to avoid triggering an sbuf auto-drain copyout while holding a lock. Pointed out by: jhb Pointy hat: ian	2015-03-14 23:57:33 +00:00
Ian Lepore	751ccc429d	Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl string returned to userland is nulterminated. PR: 195668	2015-03-14 18:11:24 +00:00
Andrey V. Elsukov	2530ed9e70	Fix `ipfw fwd tablearg'. Use dedicated field nh4 in struct table_value to obtain IPv4 next hop address in tablearg case. Add `fwd tablearg' support for IPv6. ipfw(8) uses INADDR_ANY as next hop address in O_FORWARD_IP opcode for specifying tablearg case. For IPv6 we still use this opcode, but when packet identified as IPv6 packet, we obtain next hop address from dedicated field nh6 in struct table_value. Replace hopstore field in struct ip_fw_args with anonymous union and add hopstore6 field. Use this field to copy tablearg value for IPv6. Replace spare1 field in struct table_value with zoneid. Use it to keep scope zone id for link-local IPv6 addresses. Since spare1 was used internally, replace spare0 array with two variables spare0 and spare1. Use getaddrinfo(3)/getnameinfo(3) functions for parsing and formatting IPv6 addresses in table_value. Use zoneid field in struct table_value to store sin6_scope_id value. Since the kernel still uses embedded scope zone id to represent link-local addresses, convert next_hop6 address into this form before return from pfil processing. This also fixes in6_localip() check for link-local addresses. Differential Revision: https://reviews.freebsd.org/D2015 Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-03-13 09:03:25 +00:00
Michael Tuexen	5ba11c4c2e	Update a comment to get it aligned with the code change. Reported by: brueffer@	2015-03-11 15:40:29 +00:00
Michael Tuexen	975c975bf0	It seems that sb_acc is a better replacement for sb_cc than sb_ccc. At least it unbreaks the use of select() for SCTP sockets. MFC after: 3 days	2015-03-11 15:21:39 +00:00
Michael Tuexen	b3bf169ac7	Fix the adaptation of the path state when thresholds are changed using the SCTP_PEER_ADDR_THLDS socket option. MFC after: 3 days	2015-03-11 14:25:23 +00:00
Michael Tuexen	3cb3567d7e	Keep track on the socket lock state. This fixes a bug showing up on Mac OS X. MFC after: 3 days	2015-03-10 22:38:10 +00:00
Michael Tuexen	2bb7e77385	Unlock the stcb when using setsockopt() for the SCTP_PEER_ADDR_THLDS option. MFC after: 3 days	2015-03-10 21:05:17 +00:00
Michael Tuexen	59b6d5be4e	Add a SCTP socket option to limit the cwnd for each path. MFC after: 1 month	2015-03-10 19:49:25 +00:00
Michael Tuexen	8f290ed51b	Fix a typo. MFC after: 1 week	2015-03-10 09:16:31 +00:00
Julien Charbon	eb96dc3336	In TCP, connect() can return incorrect error code EINVAL instead of EADDRINUSE or ECONNREFUSED PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196035 Differential Revision: https://reviews.freebsd.org/D1982 Reported by: Mark Nunberg <mnunberg@haskalah.org> Submitted by: Harrison Grundy <harrison.grundy@astrodoggroup.com> Reviewed by: adrian, jch, glebius, gnn Approved by: jhb MFC after: 2 weeks	2015-03-09 20:29:16 +00:00
Andrey V. Elsukov	1b5aa92cff	lla_lookup() can directly call llentry_free() for static entries and the last one requires to hold afdata's wlock. PR: 197096 MFC after: 1 week	2015-03-07 18:33:08 +00:00
Hiroki Sato	11d8451df3	Implement Enhanced DAD algorithm for IPv6 described in draft-ietf-6man-enhanced-dad-13. This basically adds a random nonce option (RFC 3971) to NS messages for DAD probe to detect a looped back packet. This looped back packet prevented DAD on some pseudo-interfaces which aggregates multiple L2 links such as lagg(4). The length of the nonce is set to 6 bytes. This algorithm can be disabled by setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis. Reported by: hiren Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D1835	2015-03-02 17:30:26 +00:00
Hans Petter Selasky	9c0f6aa762	Fix a special case in ip_fragment() to produce a more sensible chain of packets. When the data payload length excluding any headers, of an outgoing IPv4 packet exceeds PAGE_SIZE bytes, a special case in ip_fragment() can kick in to optimise the outgoing payload(s). The code which was added in r98849 as part of zero copy socket support assumes that the beginning of any MTU sized payload is aligned to where a MBUF's "m_data" pointer points. This is not always the case and can sometimes cause large IPv4 packets, as part of ping replies, to be split more than needed. Instead of iterating the MBUFs to figure out how much data is in the current chain, use the value already in the "m_pkthdr.len" field of the first MBUF in the chain. Reviewed by: ken @ Differential Revision: https://reviews.freebsd.org/D1893 MFC after: 2 weeks Sponsored by: Mellanox Technologies	2015-02-25 13:58:43 +00:00
Xin LI	cfa498d88e	Fix integer overflow in IGMP protocol. Security: FreeBSD-SA-15:04.igmp Security: CVE-2015-1414 Found by: Mateusz Kocielski, Logicaltrust Analyzed by: Marek Kroemeke, Mateusz Kocielski (shm@NetBSD.org) and 22733db72ab3ed94b5f8a1ffcde850251fe6f466 Submited by: Mariusz Zaborski <oshogbo@FreeBSD.org> Reviewed by: bms	2015-02-25 05:42:59 +00:00
Zbigniew Bodek	8018ac153f	Change struct attribute to avoid aligned operations mismatch Previous __alignment(4) allowed compiler to assume that operations are performed on aligned region. On ARM processor, this led to alignment fault as shown below: trapframe: 0xda9e5b10 FSR=00000001, FAR=a67b680e, spsr=60000113 r0 =00000000, r1 =00000068, r2 =0000007c, r3 =00000000 r4 =a67b6826, r5 =a67b680e, r6 =00000014, r7 =00000068 r8 =00000068, r9 =da9e5bd0, r10=00000011, r11=da9e5c10 r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc <...> udp_input+0x264: ldmia r5, {r0-r3, r6} udp_input+0x268: stmia r12, {r0-r3, r6} This was due to instructions which do not support unaligned access, whereas for __alignment(2) compiler replaced ldmia/stmia with some logically equivalent memcpy operations. In fact, the assumption that 'struct ip' is always 4-byte aligned is definitely false, as we have no impact on data alignment of packet stream received. Another possible solution would be to explicitely perform memcpy() on objects of 'struct ip' type, which, however, would suffer from performance drop, and be merely a problem hiding. Please, note that this has nothing to do with ARM32_DISABLE_ALIGNMENT_FAULTS option, but is related strictly to compiler behaviour. Submitted by: Wojciech Macek <wma@semihalf.com> Reviewed by: glebius, ian Obtained from: Semihalf	2015-02-24 12:57:03 +00:00
Gleb Smirnoff	26d50672d6	The last userland piece of in_var.h is now 'struct in_aliasreq'. Move it to the top of the file, and ifdef _KERNEL the rest.	2015-02-19 23:59:27 +00:00
Gleb Smirnoff	e072c794ad	Now that all users of _WANT_IFADDR are fixed, remove this crutch and hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 23:16:10 +00:00
Gleb Smirnoff	0d159406b6	- Rename 'struct igmp_ifinfo' into 'struct igmp_ifsoftc', since it really represents a context. - Preserve name 'struct igmp_ifinfo' for a new structure, that will be stable API between userland and kernel. - Make sysctl_igmp_ifinfo() return the new 'struct igmp_ifinfo', instead of old one, which had a bunch of internal kernel structures in it. - Move all above declarations from in_var.h to igmp_var.h, since they are private to IGMP code. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 22:35:23 +00:00
Konstantin Belousov	0c6dcac369	Fix build with KTR after r278978.	2015-02-19 15:41:23 +00:00
Gleb Smirnoff	fd1b2a7c57	Widen _KERNEL ifdef to hide more kernel network stack structures from userland.	2015-02-19 06:24:27 +00:00
Gleb Smirnoff	058e08bea9	Use new struct mbufq instead of struct ifqueue to manage packet queues in IPv4 multicast code. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 01:21:02 +00:00
Randall Stewart	2575fbb827	This fixes a bug in the way that the LLE timers for nd6 and arp were being used. They basically would pass in the mutex to the callout_init. Because they used this method to the callout system, it was possible to "stop" the callout. When flushing the table and you stopped the running callout, the callout_stop code would return 1 indicating that it was going to stop the callout (that was about to run on the callout_wheel blocked by the function calling the stop). Now when 1 was returned, it would lower the reference count one extra time for the stopped timer, then a few lines later delete the memory. Of course the callout_wheel was stuck in the lock code and would then crash since it was accessing freed memory. By using callout_init(c, 1) we always get a 0 back and the reference counting bug does not rear its head. We do have to make a few adjustments to the callouts themselves though to make sure it does the proper thing if rescheduled as well as gets the lock. Commented upon by hiren and sbruno See Phabricator D1777 for more details. Commented upon by hiren and sbruno Reviewed by: adrian, jhb and bz Sponsored by: Netflix Inc.	2015-02-09 19:28:11 +00:00
Hans Petter Selasky	609752f04f	The flowid and hashtype should be copied from the originating packet when fragmenting IP packets to preserve the order of the packets in a stream. Else the resulting fragments can be sent out of order when the hardware supports multiple transmit rings. Reviewed by: glebius @ MFC after: 1 week Sponsored by: Mellanox Technologies	2015-02-02 17:32:50 +00:00
Hiren Panchasara	ec446b1375	Make syncookie_mac() use 'tcp_seq irs' in computing hash. This fixes what seems like a simple oversight when the function was added in r253210. Reported by: Daniel Borkmann <dborkman@redhat.com> Florian Westphal <fw@strlen.de> Differential Revision: https://reviews.freebsd.org/D1628 Reviewed by: gnn MFC after: 1 month Sponsored by: Limelight Networks	2015-01-30 17:29:07 +00:00
Michael Tuexen	aec9ef9745	Whitespace change.	2015-01-27 21:30:24 +00:00
Xin LI	6a58f0e913	Fix SCTP stream reset vulnerability. We would like to acknowledge Gerasimos Dimitriadis who reported the issue and Michael Tuexen who analyzed and provided the fix. Security: FreeBSD-SA-15:03.sctp Security: CVE-2014-8613 Submitted by: tuexen	2015-01-27 19:35:38 +00:00
Xin LI	38f2a43815	Fix SCTP SCTP_SS_VALUE kernel memory corruption and disclosure vulnerability. We would like to acknowledge Clement LECIGNE from Google Security Team and Francisco Falcon from Core Security Technologies who discovered the issue independently and reported to the FreeBSD Security Team. Security: FreeBSD-SA-15:02.kmem Security: CVE-2014-8612 Submitted by: tuexen	2015-01-27 19:35:36 +00:00
John Baldwin	002d455873	Use an sbuf to generate the output of the net.inet.tcp.hostcache.list sysctl to avoid a possible buffer overflow if the cache grows while the text is being generated. PR: 172675 MFC after: 2 weeks	2015-01-25 19:45:44 +00:00
Will Andrews	bb269f3ae4	Log hardware interface up/down as "hardware" rather than just "hw". Suggested by: glebius MFC after: 1 week MFC with: 277530	2015-01-23 14:30:24 +00:00
Will Andrews	369a670857	When a CARP state change is caused by an ifconfig request, log it accordingly. Suggested by: glebius MFC after: 1 week MFC with: 277530	2015-01-23 14:28:12 +00:00
Will Andrews	d01641e2c1	Improve CARP logging so that all state transitions are logged. sys/netinet/ip_carp.c: Add a "reason" string parameter to carp_set_state() and carp_master_down_locked() allowing more specific logging information to be passed into these apis. Refactor existing state transition logging into a single log call in carp_set_state(). Update all calls to carp_set_state() and carp_master_down_locked() to pass an appropriate reason string. For state transitions that were previously logged, the output should be unchanged. Submitted by: gibbs (original), asomers (updated) MFC after: 1 week Sponsored by: Spectra Logic MFSpectraBSD: 1039697 on 2014/02/11 (original) 1049992 on 2014/03/21 (updated)	2015-01-22 17:09:54 +00:00
Michael Tuexen	bcbf8c2105	Remove comparisons which are not necessary. Reported by: Coverity CID: 1237826, 1237844, 1237847 MFC after: 1 week	2015-01-20 19:08:55 +00:00
Michael Tuexen	2010054d91	Code cleanup. Reported by: Coverity CID: 749578 MFC after: 1 week	2015-01-19 11:52:08 +00:00
Michael Tuexen	e1600e5058	Fix a bug which only shows up when an mbuf allocation failed. Therefore chances are low that we hit this. Reported by: Coverity CID: 1018886 MFC after: 1 week	2015-01-18 22:00:39 +00:00
Michael Tuexen	d6165c1fca	Remove an unnecessary check. Reported by: Coverity CID: 749576 MFC after: 1 week	2015-01-18 21:16:22 +00:00
Michael Tuexen	3ff78fbbd9	Add protection code to free memory in case of processing an address which is neither IPv4 or IPv6. Reported by: Coverity CID: 749311 MFC after: 1 week	2015-01-18 20:53:20 +00:00
Michael Tuexen	61330de4b0	Remove an unused variable. Reported by: Coverity CID: 750999 MFC after: 1 week	2015-01-18 20:20:27 +00:00
Adrian Chadd	b2bdc62a95	Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific bits. The motivation here is to eventually teach netisr and potentially other networking subsystems a bit more about how RSS work queues / buckets are configured so things have a hope of auto-configuring in the future. * net/rss_config.[ch] takes care of the generic bits for doing configuration, hash function selection, etc; * topelitz.[ch] is now in net/ rather than netinet/; * (and would be in libkern if it didn't directly include RSS_KEYSIZE; that's a later thing to fix up.) * netinet/in_rss.[ch] now just contains the IPv4 specific methods; * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods. This should have no functional impact on anyone currently using the RSS support. Differential Revision: D1383 Reviewed by: gnn, jfv (intel driver bits)	2015-01-18 18:06:40 +00:00
Gleb Smirnoff	fc2517100b	Do not go one layer down to check ifqueue length. First, not all drivers use ifqueue at all. Second, there is no point in this lockless check. Either positive or negative result of the check could be incorrect after a tick. Reviewed by: tuexen Sponsored by: Nginx, Inc.	2015-01-12 18:06:22 +00:00
Gleb Smirnoff	fc7ea8b690	Remove incorrect layering violating code that: a) assumed that ifqueue length is measured in bytes, instead of packets b) assumed that any interface has working ifqueue c) incremented global counter instead of ifi_oqdrops Sponsored by: Nginx, Inc.	2015-01-12 09:41:12 +00:00
Hiren Panchasara	64807b300f	DCTCP (Data Center TCP) implementation. DCTCP congestion control algorithm aims to maximise throughput and minimise latency in data center networks by utilising the proportion of Explicit Congestion Notification (ECN) marked packets received from capable hardware as a congestion signal. Highlights: Implemented as a mod_cc(4) module. ECN (Explicit congestion notification) processing is done differently from RFC3168. Takes one-sided DCTCP into consideration where only one of the sides is using DCTCP and other is using standard ECN. IETF draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00 Thesis report by Midori Kato: https://eggert.org/students/kato-thesis.pdf Submitted by: Midori Kato <katoon@sfc.wide.ad.jp> and Lars Eggert <lars@netapp.com> with help and modifications from hiren Differential Revision: https://reviews.freebsd.org/D604 Reviewed by: gnn	2015-01-12 08:33:04 +00:00
Michael Tuexen	d89abe19b0	Remove dead code. Reported by: Coverity CID: 748664 MFC after: 1 week	2015-01-12 07:55:16 +00:00
Michael Tuexen	df26ea6839	Remove dead code. Reported by: Coverity CID: 1018052 MFC after: 1 week	2015-01-12 07:39:52 +00:00
Michael Tuexen	f104b614a0	Remove dead code. Reported by: Coverity CID: 1018053 MFC after: 1 week	2015-01-12 07:29:35 +00:00
Michael Tuexen	f0dc2113ca	Remove dead code. Reported by: Coverity CID: 748663 MFC after: 1 week	2015-01-11 22:49:20 +00:00
Michael Tuexen	448e859674	Remove dead code. Reported by: Coverity CID: 748660, 748661 MFC after: 1 week	2015-01-11 22:23:39 +00:00
Michael Tuexen	e88f89a393	Remove dead code. Reported by: Coverity CID: 748665 MFC after: 1 week	2015-01-11 21:55:30 +00:00
Michael Tuexen	d3cfd43074	Remove dead code. Reported by: Coverity CID: 748666 MFC after: 1 week	2015-01-11 21:44:56 +00:00
Michael Tuexen	4be807c4d6	Minimize the usage of SCTP_BUF_IS_EXTENDED. This should help Robert...	2015-01-10 20:49:57 +00:00
Michael Tuexen	296d0b9495	Retire SCTP_BUF_EXTEND_SIZE. This patch was suggested by Robert Watson.	2015-01-10 13:56:26 +00:00
Alexander V. Chernikov	d63e657c04	* Deal with ARCNET L2 multicast mapping for IPv6 the same way as in IPv4: handle it in arc_output() instead of nd6_storelladdr(). * Remove IFT_ARCNET check from arpresolve() since arc_output() does not use arpresolve() to handle broadcast/multicast. This check was there since r84931. It looks like it was not used since r89099 (initial import of Arcnet support where multicast is handled separately). * Remove IFT_IEEE1394 case from nd6_storelladdr() since firewire_output() calles nd6_storelladdr() for unicast addresses only. * Remove IFT_ARCNET case from nd6_storelladdr() since arc_output() now handles multicast by itself. As a result, we have the following pattern: all non-ethernet-style media have their own multicast map handling inside their appropriate routines. On the other hand, arpresolve() (and nd6_storelladdr()) which meant to be 'generic' ones de-facto handles ethernet-only multicast maps. MFC after: 3 weeks	2015-01-09 12:56:51 +00:00
Robert Watson	e1165035a6	Use M_WRITABLE() and M_LEADINGSPACE() rather than checking M_EXT and doing hand-crafted length calculations in the IP options code. Reviewed by: bz Sponsored by: EMC / Isilon Storage Division	2015-01-06 14:32:28 +00:00
Luiz Otavio O Souza	57c5139c46	Remove the check that prevent carp(4) advskew to be set to '0'. CARP devices are created with advskew set to '0' and once you set it to any other value in the valid range (0..254) you can't set it back to zero. The code in question is also used to prevent that zeroed values overwrite the CARP defaults when a new CARP device is created. Since advskew already defaults to '0' for newly created devices and the new value is guaranteed to be within the valid range, it is safe to overwrite it here. PR: 194672 Reported by: cmb@pfsense.org In collaboration with: garga Tested by: garga MFC after: 2 weeks	2015-01-06 13:07:13 +00:00
Alexander V. Chernikov	3a7498636a	* Allocate hash tables separately * Make llt_hash() callback more flexible * Default hash size and hashing method is now per-af * Move lltable allocation to separate function	2015-01-05 17:23:02 +00:00
Robert Watson	ed6a66ca6c	To ease changes to underlying mbuf structure and the mbuf allocator, reduce the knowledge of mbuf layout, and in particular constants such as M_EXT, MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single M_ALIGN() macro, implemented by a now-inlined m_align() function: - Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline. - Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align(). - Update consumers around the tree to simply use M_ALIGN(). This change eliminates a number of cases where mbuf consumers must be aware of whether or not mbufs returned by the allocator use external storage, but also assumptions about the size of the returned mbuf. This will make it easier to introduce changes in how we use external storage, as well as features such as variable-size mbufs. Differential Revision: https://reviews.freebsd.org/D1436 Reviewed by: glebius, trasz, gnn, bz Sponsored by: EMC / Isilon Storage Division	2015-01-05 09:58:32 +00:00
Alexander V. Chernikov	b44a7d5d87	* Use unified code for deleting entry by sockaddr instead of per-af one. * Remove now unused llt_delete_addr callback.	2015-01-03 19:09:06 +00:00
Alexander V. Chernikov	20dd899505	* Hide lltable implementation details in if_llatbl_var.h * Make most of lltable_* methods 'normal' functions instead of inline * Add lltable_get_<af\|ifp>() functions to access given lltable fields * Temporarily resurrect nd6_lookup() function	2015-01-03 16:04:28 +00:00
Alexander V. Chernikov	73db4e007a	Finish r275628: remove remaining 'base' references.	2015-01-03 13:41:53 +00:00
Adrian Chadd	492ccbe14d	Migrate the RSS IPv6 hash code to use pointers to the v6 addresses rather than passing them in by value. The eventual aim is to do incremental hash construction rather than all of the memcpy()'ing into a contiguous buffer for the hash function, which does show up as taking quite a bit of CPU during profiling. Tested: * a variety of laptops/desktop setups I have, with v6 connectivity Differential Revision: D1404 Reviewed by: bz, rpaulo	2014-12-31 22:52:43 +00:00
Andrey V. Elsukov	f188f14d43	Extern declarations in C files loses compile-time checking that the functions' calls match their definitions. Move them to header files. Reviewed by: jilles (previous version)	2014-12-25 21:32:37 +00:00
Andrey V. Elsukov	132c449079	Remove in_gif.h and in6_gif.h files. They only contain function declarations used by gif(4). Instead declare these functions in C files. Also make some variables static.	2014-12-23 16:17:37 +00:00
Michael Tuexen	f3ba71bee4	Don't check twice that inp is not NULL. Reported by: Coverity CID: 748671 MFC after: 3 days	2014-12-21 13:58:53 +00:00
Warner Losh	61f26cae7d	Where appropriate, use the modern terms for the one true time base (UTC) rather than the archaic (GMT) in comments. Except where the comments are making fun of people doing this (and pedants who insist on the new terms).	2014-12-21 05:07:11 +00:00
Michael Tuexen	b03b5d729a	Fix and harmonize the validation of PR-SCTP policies. Reported by: Coverity CID: 1232044 MFC after: 3 days	2014-12-20 21:17:28 +00:00
Michael Tuexen	ca10a8d944	Cleanup the code. Reported by: Coverity CID: 1232003	2014-12-20 13:47:38 +00:00
Michael Tuexen	142a4d9e86	Add a missing break. Reported by: Coverity CID: 1232014 MFC after: 3 days	2014-12-17 20:34:38 +00:00
Andrey V. Elsukov	44eb8bbe7b	Do not count security policy violation twice. ipsec*_in_reject() do this by their own. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 19:20:13 +00:00
Andrey V. Elsukov	0332a55f0f	Use ipsec4_in_reject() to simplify ip_ipsec_fwd() and ip_ipsec_input(). ipsec4_in_reject() does the same things, also it counts policy violation errors. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 18:55:54 +00:00
Andrey V. Elsukov	0275b2e369	Remove flag/flags argument from the following functions: ipsec_getpolicybyaddr() ipsec4_checkpolicy() ip_ipsec_output() ip6_ipsec_output() The only flag used here was IP_FORWARDING. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 18:35:34 +00:00
Andrey V. Elsukov	619764beab	Remove flags and tunalready arguments from ipsec4_process_packet() and make its prototype similar to ipsec6_process_packet. The flags argument isn't used here, tunalready is always zero. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 17:34:49 +00:00
Andrey V. Elsukov	8922ddbe40	Move ip_ipsec_fwd() from ip_input() into ip_forward(). Remove check for presence PACKET_TAG_IPSEC_IN_DONE mbuf tag from ip_ipsec_fwd(). PACKET_TAG_IPSEC_IN_DONE tag means that packet is already handled by IPSEC code. This means that before IPSEC processing it was destined to our address and security policy was checked in the ip_ipsec_input(). After IPSEC processing packet has new IP addresses and destination address isn't our own. So, anyway we can't check security policy from the mbuf tag, because it corresponds to different addresses. We should check security policy that corresponds to packet attributes in both cases - when it has a mbuf tag and when it has not. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 16:53:29 +00:00
Andrey V. Elsukov	e58320f127	Remove PACKET_TAG_IPSEC_IN_DONE mbuf tag lookup and usage of its security policy. The changed block of code in ip*_ipsec_input() is called when packet has ESP/AH header. Presence of PACKET_TAG_IPSEC_IN_DONE mbuf tag in the same time means that packet was already handled by IPSEC and reinjected in the netisr, and it has another ESP/AH headers (encrypted twice?). Since it was already processed by IPSEC code, the AH/ESP headers was already stripped (and probably outer IP header was stripped too) and security policy from the tdb_ident was applied to those headers. It is incorrect to apply this security policy to current headers. Also make ip_ipsec_input() prototype similar to ip6_ipsec_input(). Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 14:58:55 +00:00
Andrey V. Elsukov	dd9cd45b44	Remove check for presence of PACKET_TAG_IPSEC_PENDING_TDB and PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED mbuf tags. They aren't used in FreeBSD. Instead check presence of PACKET_TAG_IPSEC_OUT_DONE mbuf tag. If it is found, bypass security policy lookup as described in the comment. PACKET_TAG_IPSEC_OUT_DONE tag added to mbuf when IPSEC code finishes ESP/AH processing. Since it was already finished, this means the security policy placed in the tdb_ident was already checked. And there is no reason to check it again here. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 14:43:44 +00:00
Alexander V. Chernikov	ee7e9a4e17	* Do not assume lle has sockaddr key after struct lle: use llt_fill_sa_entry() llt method to store lle address in sa. * Eliminate L3_ADDR macro and either reference IPv4/IPv6 address directly from lle or use newly-created llt_fill_sa_entry(). * Do not store sockaddr inside arp/ndp lle anymore.	2014-12-09 00:48:08 +00:00
Alexander V. Chernikov	d82ed5051c	Simplify lle lookup/create api by using addresses instead of sockaddrs.	2014-12-08 23:23:53 +00:00
Alexander V. Chernikov	73b52ad896	Use llt_prepare_static_entry method to prepare valid per-af static entry.	2014-12-07 23:59:44 +00:00
Alexander V. Chernikov	0368226e65	* Retire abstract llentry_free() in favor of lltable_drop_entry_queue() and explicit calls to RTENTRY_FREE_LOCKED() * Use lltable_prefix_free() in arp_ifscrub to be consistent with nd6. * Rename <lltable_\|llt>_delete function to _delete_addr() to note that this function is used to external callers. Make this function maintain its own locking. * Use lookup/unlink/clear call chain from internal callers instead of delete_addr. * Fix LLE_DELETED flag handling	2014-12-07 23:08:07 +00:00
Alexander V. Chernikov	721cd2e032	Do not enforce particular lle storage scheme: * move lltable allocation to per-domain callbacks. * make llentry_link/unlink functions overridable llt methods. * make hash table traversal another overridable llt method.	2014-12-07 17:32:06 +00:00
Alexander V. Chernikov	a743ccd468	* Add llt_clear_entry() callback which is able to do all lle cleanup including unlinking/freeing * Relax locking in lltable_prefix_free_af/lltable_free * Do not pass @llt to lle free callback: it is always NULL now. * Unify arptimer/nd6_llinfo_timer: explicitly unlock lle avoiding unlock/lock sequinces * Do not pass unlocked lle to nd6_ns_output(): add nd6_llinfo_get_holdsrc() to retrieve preferred source address from lle hold queue and pass it instead of lle. * Finally, make nd6_create() create and return unlocked lle * Separate defrtr handling code from nd6_free(): use nd6_check_del_defrtr() to check if we need to keep entry instead of performing GC, use nd6_check_recalc_defrtr() to perform actual recalc on lle removal. * Move isRouter handling from nd6_cache_lladdr() to separate nd6_check_router() * Add initial code to maintain lle runtime flags in sync.	2014-12-07 15:42:46 +00:00
Michael Tuexen	39cbb549cc	Include the received chunk padding when reporting an unknown chunk. MFC after: 1 week	2014-12-06 22:57:19 +00:00
Michael Tuexen	d59107f700	Fix the support of mapped IPv4 addresses. Thanks to Mark Bonnekessel and Markus Boese for making me aware of the problems. MFC after: 1 week	2014-12-06 20:00:08 +00:00
Craig Rodrigues	a8da5dd658	MFp4: @181627 Allow UMA allocated memory to be freed when VNET jails are torn down. Differential Revision: D1201 Submitted by: bz Reviewed by: rwatson, gnn	2014-12-06 02:59:59 +00:00
Michael Tuexen	457b4b8836	This is the SCTP specific companion of https://svnweb.freebsd.org/changeset/base/275358 which was provided by Hans Petter Selasky.	2014-12-04 21:17:50 +00:00
Michael Tuexen	4e88d37a2a	Do the renaming of sb_cc to sb_ccc in a way with less code changes by using a macro. This is an alternate approach to https://svnweb.freebsd.org/changeset/base/275326 which is easier to handle upstream. Discussed with: rrs, glebius	2014-12-02 20:29:29 +00:00
Andrey V. Elsukov	2d957916ef	Remove route chaching support from ipsec code. It isn't used for some time. * remove sa_route_union declaration and route_cache member from struct secashead; * remove key_sa_routechange() call from ICMP and ICMPv6 code; * simplify ip_ipsec_mtu(); * remove #include <net/route.h>; Sponsored by: Yandex LLC	2014-12-02 04:20:50 +00:00
Alexander V. Chernikov	a6e934e359	Put bandaid on arptimer-derived crashed for 'arp -da' deleted records. The proper fix will arrive after convering lltable 'delete' method.	2014-12-01 22:37:36 +00:00
Alexander V. Chernikov	9b65db85e2	Do more fine-grained locking in lltable code: lltable_create_lle() does actual new lle creation without extensive locking and existing lle search. Move lle updating code from gigantic in_arpinput() to arp_update_llle() and some other functions. IPv6 changes to follow.	2014-12-01 21:43:48 +00:00
Hans Petter Selasky	c25290420e	Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but does no longer have any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file. This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows. "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before. Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped. MFC after: 1 month Sponsored by: Mellanox Technologies	2014-12-01 11:45:24 +00:00
Alexander V. Chernikov	ce313fdd71	* Unify lle table dump/prefix removal code. * Rename lla_XXX -> lltable_XXX_lle to reduce number of name prefixes used by lltable code.	2014-11-30 14:35:01 +00:00
Gleb Smirnoff	2cbcd3c198	Merge from projects/sendfile: - Provide pru_ready function for TCP. - Don't call tcp_output() from tcp_usr_send() if no ready data was put into the socket buffer. - In case of dropped connection don't try to m_freem() not ready data. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-30 13:43:52 +00:00
Gleb Smirnoff	651e4e6a30	Merge from projects/sendfile: extend protocols API to support sending not ready data: o Add new flag to pru_send() flags - PRUS_NOTREADY. o Add new protocol method pru_ready(). Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-30 13:24:21 +00:00
Gleb Smirnoff	0f9d0a73a4	Merge from projects/sendfile: o Introduce a notion of "not ready" mbufs in socket buffers. These mbufs are now being populated by some I/O in background and are referenced outside. This forces following implications: - An mbuf which is "not ready" can't be taken out of the buffer. - An mbuf that is behind a "not ready" in the queue neither. - If sockbet buffer is flushed, then "not ready" mbufs shouln't be freed. o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc. The sb_ccc stands for ""claimed character count", or "committed character count". And the sb_acc is "available character count". Consumers of socket buffer API shouldn't already access them directly, but use sbused() and sbavail() respectively. o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones with M_BLOCKED. o New field sb_fnrdy points to the first not ready mbuf, to avoid linear search. o New function sbready() is provided to activate certain amount of mbufs in a socket buffer. A special note on SCTP: SCTP has its own sockbufs. Unfortunately, FreeBSD stack doesn't yet allow protocol specific sockbufs. Thus, SCTP does some hacks to make itself compatible with FreeBSD: it manages sockbufs on its own, but keeps sb_cc updated to inform the stack of amount of data in them. The new notion of "not ready" data isn't supported by SCTP. Instead, only a mechanical substitute is done: s/sb_cc/sb_ccc/. A proper solution would be to take away struct sockbuf from struct socket and allow protocols to implement their own socket buffers, like SCTP already does. This was discussed with rrs@. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 12:52:33 +00:00
Gleb Smirnoff	300fa232ee	Missed in r274421: use sbavail() instead of bare access to sb_cc.	2014-11-30 12:11:01 +00:00
Alexander V. Chernikov	5d14e4cd76	Provide rte_<get\|set> methods to access rtentry for external consumers.	2014-11-29 19:27:43 +00:00
Alexander V. Chernikov	74860d4f7c	Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr - return lle flags IFF needed. Do not pass rte to arpresolve - pass is_gateway flag instead.	2014-11-27 23:06:25 +00:00
Alexander V. Chernikov	4c9df1c5b8	Fix r274855: use proper unlock method.	2014-11-23 16:40:33 +00:00
Alexander V. Chernikov	73d770287d	Do more fine-grained lltable locking: use table runtime lock as rare as we can.	2014-11-23 15:38:06 +00:00
Alexander V. Chernikov	9479029b1f	* Add lltable llt_hash callback * Move lltable items insertions/deletions to generic llt code.	2014-11-23 12:15:28 +00:00
Alexander V. Chernikov	7c066c18db	Use less-invasive approach for IF_AFDATA lock: convert into 2 locks: use rwlock accessible via external functions (IF_AFDATA_CFG_* -> if_afdata_cfg_()) for all control plane tasks use rmlock (IF_AFDATA_RUN_) for fast-path lookups.	2014-11-22 19:53:36 +00:00
Alexander V. Chernikov	27688dfe1d	Temporarily revert r274774.	2014-11-22 17:57:54 +00:00
Alexander V. Chernikov	d7ebebfba1	Fix non-debug build after r274855.	2014-11-22 17:30:37 +00:00
Alexander V. Chernikov	8f465f6690	Convert &in_ifaddr_lock to dual-locking model: use rwlock accessible via external functions (IN_IFADDR_CFG_* -> in_ifaddr_cfg_()) for all control plane tasks use rmlock (IN_IFADDR_RUN_) for fast-path lookups.	2014-11-22 16:27:51 +00:00
Alexander V. Chernikov	2e47d2f953	Mark ifaddr/rtsock static entries RLLE_VALID.	2014-11-21 23:37:59 +00:00
Alexander V. Chernikov	86b94cffe4	Finish r274774: add more headers/fix build for non-debug case.	2014-11-21 23:36:21 +00:00
Alexander V. Chernikov	9883e41b4b	Switch IF_AFDATA lock to rmlock	2014-11-21 02:28:56 +00:00
Alexander V. Chernikov	4d56c133fb	Sync to HEAD@r274766	2014-11-21 01:22:33 +00:00
Alexander V. Chernikov	f9723c7705	Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp we need to return. Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags fields for all relevant functions.	2014-11-20 22:41:59 +00:00
Julien Charbon	71da715374	Re-introduce padding fields removed with r264321 to keep struct tcptw ABI unchanged. Suggested by: jhb Approved by: jhb (mentor) MFC after: 1 day X-MFC-With: r264321	2014-11-17 14:56:02 +00:00
Alexander V. Chernikov	7f948f12f6	Finish r274175: do control plane MTU tracking. Update route MTU in case of ifnet MTU change. Add new RTF_FIXEDMTU to track explicitly specified MTU. Old behavior: ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU. User has to manually update all routes. ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU. However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu gets updated. New behavior: ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them. ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu. route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag. route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag. PR: 194238 MFC after: 1 month CR: D1125	2014-11-17 01:05:29 +00:00
Alexander V. Chernikov	df629abf3e	Rework LLE code locking: * struct llentry is now basically split into 2 pieces: all fields within 64 bytes (amd64) are now protected by both ifdata lock AND lle lock, e.g. you require both locks to be held exclusively for modification. All data necessary for fast path operations is kept here. Some fields were added: - r_l3addr - makes lookup key liev within first 64 bytes. - r_flags - flags, containing pre-compiled decision whether given lle contains usable data or not. Current the only flag is RLLE_VALID. - r_len - prepend data len, currently unused - r_kick - used to provide feedback to control plane (see below). All other fields are protected by lle lock. * Add simple state machine for ARP to handle "about to expire" case: Current model (for the fast path) is the following: - rlock afdata - find / rlock rte - runlock afdata - see if "expire time" is approaching (time_uptime + la->la_preempt > la->la_expire) - if true, call arprequest() and decrease la_preempt - store MAC and runlock rte New model (data plane): - rlock afdata - find rte - check if it can be used using r_* fields only - if true, store MAC - if r_kick field != 0 set it to 0. - runlock afdata New mode (control plane): - schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries) seconds instead of V_arpt_keep. - on first timer invocation change state from ARP_LLINFO_REACHABLE to ARP_LLINFO_VERIFY, sets r_kick to 1 and shedules next call in V_arpt_rexmit (default to 1 sec). - on subsequent timer invocations in ARP_LLINFO_VERIFY state, checks for r_kick value: reschedule if not changed, and send arprequest() if set to zero (e.g. entry was used). * Convert IPv4 path to use new single-lock approach. IPv6 bits to follow. * Slow down in_arpinput(): now valid reply will (in most cases) require acquiring afdata WLOCK twice. This is requirement for storing changed lle data. This change will be slightly optimized in future. * Provide explicit hash link/unlink functions for both ipv4/ipv6 code. This will probably be moved to generic lle code once we have per-AF hashing callback inside lltable. * Perform lle unlink on deletion immediately instead of delaying it to the timer routine. * Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the presence of lle reference used for safe callout calls.	2014-11-16 20:12:49 +00:00
Alexander V. Chernikov	b9497d2c91	Move "slow path" handling from arpresolve() to newly-created arpresolve_slow().	2014-11-15 20:02:22 +00:00
Alexander V. Chernikov	b4b1367ae4	* Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete Assume lla_create to return LLE_EXCLUSIVE lock for lle. * Rework lla_rt_output to perform all lle changes under afdata WLOCK. * change arp_ifscrub() ackquire afdata WLOCK, the same as arp_ifinit().	2014-11-15 18:54:07 +00:00
Gleb Smirnoff	cfa6009e36	In preparation of merging projects/sendfile, transform bare access to sb_cc member of struct sockbuf to a couple of inline functions: sbavail() and sbused() Right now they are equal, but once notion of "not ready socket buffer data", will be checked in, they are going to be different. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-12 09:57:15 +00:00
Hans Petter Selasky	3c7c188c16	Fix some minor TSO issues: - Improve description of TSO limits. - Remove a not needed KASSERT() - Remove some not needed variable casts. Sponsored by: Mellanox Technologies Discussed with: lstewart @ MFC after: 1 week	2014-11-11 12:05:59 +00:00
Alexander V. Chernikov	670e8b3b8c	Kill custom in_matroute() radix mathing function removing one rte mutex lock. Initially in_matrote() in_clsroute() in their current state was introduced by r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them in route table, setting RTPRF_OURS flag and some expire time. After that, either GC came or RTPRF_OURS got removed on first-packet. It was a good solution in that days (and probably another decade after that) to keep TCP metrics. However, after moving metrics to TCP hostcache in r122922, most of in_rmx functionality became unused. It might had been used for flushing icmp-originated routes before rte mutexes/refcounting, but I'm not sure about that. So it looks like this is nearly impossible to make GC do its work nowadays: in_rtkill() ignores non-RTPRF_OURS routes. route can only become RTPRF_OURS after dropping last reference via rtfree() which calls in_clsroute(), which, it turn, ignores UP and non-RTF_DYNAMIC routes. Dynamic routes can still be installed via received redirect, but they have default lifetime (no specific rt_expire) and no one has another trie walker to call RTFREE() on them. So, the changelist: * remove custom rnh_match / rnh_close matching function. * remove all GC functions * partially revert r256695 (proto3 is no more used inside kernel, it is not possible to use rt_expire from user point of view, proto3 support is not complete) * Finish r241884 (similar to this commit) and remove remaining IPv6 parts MFC after: 1 month	2014-11-11 02:52:40 +00:00
Alexander V. Chernikov	d1f79a3bfc	Remove kernel handling of ICMP_SOURCEQUENCH. It hasn't been used for a very long time. Additionally, it was deprecated by RFC 6633.	2014-11-10 23:10:01 +00:00
Alexander V. Chernikov	f7bab8d0dd	Switch route radix to dual-lock model: use rmlock for data patch access, and config rwlock for conrol plane processing. Route table changes require bock locks held.	2014-11-10 00:07:06 +00:00
Alexander V. Chernikov	603eaf792b	Renove faith(4) and faithd(8) from base. It looks like industry have chosen different (and more traditional) stateless/statuful NAT64 as translation mechanism. Last non-trivial commits to both faith(4) and faithd(8) happened more than 12 years ago, so I assume it is time to drop RFC3142 in FreeBSD. No objections from: net@	2014-11-09 21:33:01 +00:00
Alexander V. Chernikov	033074c440	Replace 'struct route ' if_output() argument with 'struct nhop_info '. Leave 'struct route' as is for legacy routing api users. Remove most of rtalloc_ign*-derived functions.	2014-11-09 16:33:04 +00:00
Alexander V. Chernikov	55e5eda676	Separate radix and routing: use different structures for route and for other customers. Introduce new 'struct rib_head' for routing purposes and make all routing api use it.	2014-11-09 00:36:39 +00:00
Andrey V. Elsukov	3e88eb903b	Remove ip6_getdstifaddr() and all functions to work with auxiliary data. It isn't safe to keep unreferenced ifaddrs. Use in6ifa_ifwithaddr() to determine ifaddr corresponding to destination address. Since currently we keep addresses with embedded scope zone, in6ifa_ifwithaddr is called with zero zoneid and marked with XXX. Also remove route and lle lookups from ip6_input. Use in6ifa_ifwithaddr() instead. Sponsored by: Yandex LLC	2014-11-08 19:38:34 +00:00
Alexander V. Chernikov	a9413f6ca0	Sync to HEAD@r274297.	2014-11-08 18:13:35 +00:00
Alexander V. Chernikov	1398ffe5bc	Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2014-11-08 16:38:15 +00:00
Alexander V. Chernikov	2c498ee6be	Finish r274290: arg.nextstop / arg.updating are not used anymore.	2014-11-08 15:58:17 +00:00
Alexander V. Chernikov	9c83df7d48	Retire rtq_timeout altering code. This code was initially introduced by r6399 to enhance expiring large number of host cache routes. Since we don't have route cloning anymore and no one altered V_rtq_toomany default (which is 128) in nearly 20 years, I assume this can be safely deleted.	2014-11-08 15:39:32 +00:00
Alexander V. Chernikov	22b08fd8b7	Split radix implementation and system route table structure: use new 'struct radix_head' for radix.	2014-11-07 22:52:02 +00:00
Andrey V. Elsukov	f325335caf	Overhaul if_gre(4). Split it into two modules: if_gre(4) for GRE encapsulation and if_me(4) for minimal encapsulation within IP. gre(4) changes: * convert to if_transmit; * rework locking: protect access to softc with rmlock, protect from concurrent ioctls with sx lock; * correct interface accounting for outgoing datagramms (count only payload size); * implement generic support for using IPv6 as delivery header; * make implementation conform to the RFC 2784 and partially to RFC 2890; * add support for GRE checksums - calculate for outgoing datagramms and check for inconming datagramms; * add support for sending sequence number in GRE header; * remove support of cached routes. This fixes problem, when gre(4) doesn't work at system startup. But this also removes support for having tunnels with the same addresses for inner and outer header. * deprecate support for various GREXXX ioctls, that doesn't used in FreeBSD. Use our standard ioctls for tunnels. me(4): * implementation conform to RFC 2004; * use if_transmit; * use the same locking model as gre(4); PR: 164475 Differential Revision: D1023 No objections from: net@ Relnotes: yes Sponsored by: Yandex LLC	2014-11-07 19:13:19 +00:00
Gleb Smirnoff	6df8a71067	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
Alexander V. Chernikov	930d2a42c9	Remove legacy inet lookup functions.	2014-11-07 00:29:00 +00:00
Alexander V. Chernikov	064b1bdb2d	Convert lle rtchecks to use new routing API. For inet/ case, this involves reverting r225947 which seem to be pretty strange commit and should be reverted in HEAD ad well.	2014-11-06 23:35:22 +00:00
Alexander V. Chernikov	146a181f28	Finish r274118: remove useless fields from struct domain. Sponsored by: Yandex LLC	2014-11-06 14:39:04 +00:00
Alexander V. Chernikov	1a75e3b20f	Make checks for rt_mtu generic: Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking might be an option in some situation, it is not feasible to do MTU checks there: generic (or per-domain) routing code is perfectly capable of doing this. We currrently have 3 places where MTU is altered: 1) route addition. In this case domain overrides radix _addroute callback (in[6]_addroute) and all necessary checks/fixes are/can be done there. 2) route change (especially, GW change). In this case, there are no explicit per-domain calls, but one can override rte by setting ifa_rtrequest hook to domain handler (inet6 does this). 3) ifconfig ifaceX mtu YYYY In this case, we have no callbacks, but ip[6]_output performes runtime checks and decreases rt_mtu if necessary. Generally, the goals are to be able to handle all MTU changes in control plane, not in runtime part, and properly deal with increased interface MTU. This commit changes the following: * removes hooks setting MTU from drivers side * adds proper per-doman MTU checks for case 1) * adds generic MTU check for case 2) * The latter is done by using new dom_ifmtu callback since if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size. However, IPv6 mtu might be different from if_mtu one (e.g. default 1280) for some cases, so we need an abstract way to know maximum MTU size for given interface and domain. * moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies user-supplied data which must be checked. * removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to use this functions on new non-inserted rte. More changes will follow soon. MFC after: 1 month Sponsored by: Yandex LLC	2014-11-06 13:13:09 +00:00
Alexander V. Chernikov	9f25cbe45e	Remove old hack abusing domattach from NFS code. According to IANA RPC uaddr registry, there are no AFs except IPv4 and IPv6, so it's not worth being too abstract here. Remove ne_rtable[AF_MAX+1] and use explicit per-AF radix tries. Use own initialization without relying on domattach code. While I admit that this was one of the rare places in kernel networking code which really was capable of doing multi-AF without any AF-depended code, it is not possible anymore to rely on dom* code. While here, change terrifying "Invalid radix node head, rn:" message, to different non-understandable "netcred already exists for given addr/mask", but less terrifying. Since we know that rn_addaddr() returns NULL if the same record already exists, we should provide more friendly error. MFC after: 1 month	2014-11-05 00:58:01 +00:00
Alexander V. Chernikov	69b74805d5	Convert gif and stf to use new routing api.	2014-11-04 18:48:13 +00:00
Alexander V. Chernikov	5c9ef37854	Sync to HEAD@r274095.	2014-11-04 18:22:33 +00:00
Alexander V. Chernikov	8c3cfe0be0	Hide 'struct rtentry' and all its macro inside new header: net/route_internal.h The goal is to make its opaque for all code except route/rtsock and proto domain _rmx.	2014-11-04 17:28:13 +00:00
Alexander V. Chernikov	4003643db1	Convert tcp_maxmtu6() to use new routing api.	2014-11-04 17:01:40 +00:00
Alexander V. Chernikov	257480b8ab	Convert netinet6/ to use new routing API. * Remove &ifpp from ip6_output() in favor of ri->ri_nh_info * Provide different wrappers to in6_selectsrc: Currently it is used by 2 differenct type of customers: - socket-based one, which all are unsure about provided address scope and - in-kernel ones (ND code mostly), which don't have any sockets, options, crededentials, etc. So, we provide two different wrappers to in6_selectsrc() returning select source. * Make different versions of selectroute(): Currenly selectroute() is used in two scenarios: - SAS, via in6_selecsrc() -> in6_selectif() -> selectroute() - output, via in6_output -> wrapper -> selectroute() Provide different versions for each customer: - fib6_lookup_nh_basic()-based in6_selectif() which is capable of returning interface only, without MTU/NHOP/L2 calculations - full-blown fib6_selectroute() with cached route/multipath/ MTU/L2 * Stop using routing table for link-local address lookups * Add in6_ifawithifp_lla() to make for-us check faster for link-local * Add in6_splitscope / in6_setllascope for faster embed/deembed scopes	2014-11-04 15:39:56 +00:00
Hans Petter Selasky	4952ad427a	Restore spares used in "struct tcpcb" and bump "__FreeBSD_version" to indicate need for kernel module re-compilation. Sponsored by: Mellanox Technologies	2014-11-03 13:01:58 +00:00
Michael Tuexen	f885296d70	Don't zero the stats before they are read out. MFC after: 3 days	2014-11-01 10:35:45 +00:00
Andrey V. Elsukov	1d904a55c8	Remove the check for packets with broadcast source from if_gif's encapcheck. The check was recommened in the draft-ietf-ngtrans-mech-05.txt. But it isn't clear, should it compare the source with all direct broadcast addresses in the system or not. RFC 4213 says it is enough to verify that the source address is the address of the encapsulator, as configured on the decapsulator. And this verification can be extended by administrator with any other forms of IPv4 ingress filtering. Discussed with: glebius, melifaro Sponsored by: Yandex LLC	2014-10-31 15:23:24 +00:00
Andrey V. Elsukov	7e4217558c	Fix typo.	2014-10-31 11:40:49 +00:00
Julien Charbon	cea40c4888	Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and tcp_tw_2msl_scan(). This race condition drives unplanned timewait timeout cancellation. Also simplify implementation by holding inpcb reference and removing tcptw reference counting. Differential Revision: https://reviews.freebsd.org/D826 Submitted by: Marc De la Gueronniere <mdelagueronniere@verisign.com> Submitted by: jch Reviewed By: jhb (mentor), adrian, rwatson Sponsored by: Verisign, Inc. MFC after: 2 weeks X-MFC-With: r264321	2014-10-30 08:53:56 +00:00
Hans Petter Selasky	0e1152fcc2	The SYSCTL data pointers can come from userspace and must not be directly accessed. Although this will work on some platforms, it can throw an exception if the pointer is invalid and then panic the kernel. Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-28 12:00:39 +00:00
Hans Petter Selasky	614b50ae8b	Preserve limitation of "TCP_CA_NAME_MAX" when matching the algorithm name. MFC after: 3 days Suggested by: gnn @	2014-10-27 16:08:41 +00:00
Hans Petter Selasky	60a945f95d	Make assignments to "net.inet.tcp.cc.algorithm" work by fixing a bad string comparison. MFC after: 3 days Reported by: Jukka Ukkonen <jau789@gmail.com> Sponsored by: Mellanox Technologies	2014-10-27 11:21:47 +00:00
Mateusz Guzik	e015b1ab0a	Avoid dynamic syscall overhead for statically compiled modules. The kernel tracks syscall users so that modules can safely unregister them. But if the module is not unloadable or was compiled into the kernel, there is no need to do this. Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC during kernel build and 0 otherwise. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-26 19:42:44 +00:00
Alexander V. Chernikov	7b42f6fae2	* Convert TOE framework to use new routing api. * Add fib6_lookup_nh_ext(). * Rename union structures: nhop64_basic -> nhopu_basic, nhop64_extended -> nhopu_extended	2014-10-25 18:25:00 +00:00
Alexander V. Chernikov	9f65116cc1	* Increase nh_flags to be u16 thus reducing nhop payload to be 48 bytes * Use NHF_ namespace for all nhop flags * Rename nhop_data -> nhop_prepend * Rename fib4_lookup_nh_extended -> fib4_lookup_nh_ext * Add "flags" argument to fib4_lookup_nh_ext() to specify whether we want returned nh_ext structure to be refcounted or not.	2014-10-25 15:32:56 +00:00
Michael Tuexen	b3817112b4	Fix a use of an uninitialized variable by makeing sure that sctp_med_chunk_output() always initialized the reason_code instead of relying on the caller. The variable is only used for debugging purpose. This issue was reported by Peter Bostroem from Google. MFC after: 3 days	2014-10-25 09:25:29 +00:00
Alexander V. Chernikov	7de36f627e	Convert ip_fastfwd() to use new routing api.	2014-10-24 23:08:44 +00:00
Alexander V. Chernikov	b863adaaf3	Convert last piece of ip_forward to use new rouing api.	2014-10-24 22:00:25 +00:00
Alexander V. Chernikov	0652a307ca	Convert arpinput() to use new routing api.	2014-10-24 19:38:05 +00:00
Alexander V. Chernikov	5bc9d581fd	Convert all ip_rtaddr() users to fib4_lookup_nh_extended(). Remove ip_rtaddr().	2014-10-24 17:40:32 +00:00
Andrey V. Elsukov	a663aa4ce8	Remove redundant check and m_pullup() call.	2014-10-24 13:34:22 +00:00
Alexander V. Chernikov	f50706648c	Add new fib4_lookup_nh_extended() which fills in nhop4_extended structure without doinf L2 resolve. It also requires freeing references by calling fib4_free_nh_ext(). Convert in_pcbladdr() to use it. Convert tcp_maxmtu() to use it.	2014-10-23 23:11:04 +00:00
Alexander V. Chernikov	18d8739c6b	Convert inp_lookup_mcast_ifp() to new routing api.	2014-10-23 21:38:54 +00:00
Alexander V. Chernikov	2bb83c79f6	Rename ip_sendmbuf to fib4_sendmbuf() and move it to rt_nhops api. Convert IPv4 SAS to use new routing api.	2014-10-23 21:09:14 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Alexander V. Chernikov	b4e8f808bf	Switch IPv4 output path to use new routing api. The goals of the new API is to provide consumers with minimal needed information, but as fast as possible. So we provide full nexthop info copied into alighed on-cache structure instead of rte/ia pointers, their refcounts and locks. This does not provide solution for protecting from egress ifp destruction, but does not make it any worse. Current changes: nhops: Add fib4_lookup_prepend() function which stores either full L2+L3 prepend info (e.g. MAC header in case of plain IPv4) or L3 info with NH_FLAGS_L2_INCOMPLETE flag indicating that no valid L2 info exists and we have to take "slow" path. ip_output: Currently ip[ 46]_output consumers use 'struct route' for the following purposes: 1) double lookup avoidance(route caching) 2) plain route caching 3) get path MTU to be able to notify source. The former pattern is mostly used by various tunnels (gif, gre, stf). (Actually, gre is the only remaining, others were already converted. Their locking model did not scale good enogh to benefit from such caching, so we have (temporarily) removed it without any performance loss). Plain route caching used by SCTP is simply wrong and should be removed. Temporary break it for now just to be able to compile. Optimize path mtu reporting by providing it in new 'route_info' stucture. Minimize games with @ia locking/refcounting for route lookup: add special nhop[46]_extended structure to store more route attributes. Pointer to given structure can be passed to fib4_lookup_prepend() to indicate we want this info (we actually needs it for UDP and raw IP). ether_output: Provide light-weight ether_output2() call to deal with transmitting L2 frame (e.g. properly handle broadcast/simloop/bridge/ other L2 hooks before actually transmitting frame by if_transmit()). Add a hack based on new RT_NHOP ro_flag to distinguish which version should we call. Better way is probably to add a new "if_output_frame" driver callbacks. Next steps: * Convert ip_fastfwd part * Implement auto-growing array for per-radix nexthops * Implement LLE tracking for nexthop calculations to be able to immediately provide all necessary info in single route lookup for gateway routes * Switch radix locking scheme to runtime/cfg lock * Implement multipath support for rtsock * Implement "tracked nexthops" for tunnels (e.g. _proper_ nexthop caching) * Add IPv6 support for remaining parts (postponed not to interfere with user/ae/inet6 branch) * Consider adding "if_output_frame" driver call to ease logical frame pushing.	2014-10-19 21:07:35 +00:00
Michael Tuexen	84f3b49ac9	Fix the reported streams in a SCTP_STREAM_RESET_EVENT, if a sent incoming stream reset request was responded with failed or denied. Thanks to Peter Bostroem from Google for reporting the issue. MFC after: 3 days	2014-10-16 15:36:04 +00:00
Andrey V. Elsukov	0b9f5f8a5f	Overhaul if_gif(4): o convert to if_transmit; o use rmlock to protect access to gif_softc; o use sx lock to protect from concurrent ioctls; o remove a lot of unneeded and duplicated code; o remove cached route support (it won't work with concurrent io); o style fixes. Reviewed by: melifaro Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2014-10-14 13:31:47 +00:00
Sean Bruno	882ac53ed7	Handle small file case with regards to plpmtud blackhole detection. Submitted by: Mikhail <mp@lenta.ru> MFC after: 2 weeks Relnotes: yes	2014-10-13 21:06:21 +00:00
Sean Bruno	0f3e3bc526	Catch ipv6 case when attempting to do PLPMTUD blackhole detection. Submitted by: Mikhail <mp@lenta.ru> MFC after: 2 weeks Relnotes: yes	2014-10-13 21:05:29 +00:00
Alexander V. Chernikov	2930362fb1	Fix matching default rule on clear/show commands. Found by: Oleg Ginzburg	2014-10-13 13:49:28 +00:00
Julien Charbon	489dcc9262	A connection in TIME_WAIT state before calling close() actually did not received any RST packet. Do not set error to ECONNRESET in this case. Differential Revision: https://reviews.freebsd.org/D879 Reviewed by: rpaulo, adrian Approved by: jhb (mentor) Sponsored by: Verisign, Inc.	2014-10-12 23:01:25 +00:00
Robert Watson	f0cace5d94	When deciding whether to call m_pullup() even though there is adequate data in an mbuf, use M_WRITABLE() instead of a direct test of M_EXT; the latter both unnecessarily exposes mbuf-allocator internals in the protocol stack and is also insufficient to catch all cases of non-writability. (NB: m_pullup() does not actually guarantee that a writable mbuf is returned, so further refinement of all of these code paths continues to be required.) Reviewed by: bz MFC after: 3 days Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D900	2014-10-12 15:49:52 +00:00
John Baldwin	585a4290ab	Update ip_divert.ko to depend on version 3 of ipfw.	2014-10-11 16:08:54 +00:00
Bryan Venteicher	81d3ec1763	Add context pointer and source address to the UDP tunnel callback These are needed for the forthcoming vxlan implementation. The context pointer means we do not have to use a spare pointer field in the inpcb, and the source address is required to populate vxlan's forwarding table. While I highly doubt there is an out of tree consumer of the UDP tunneling callback, this change may be a difficult to eventually MFC. Phabricator: https://reviews.freebsd.org/D383 Reviewed by: gnn	2014-10-10 06:08:59 +00:00
Bryan Venteicher	a0a9e1b57c	Add missing UDP multicast receive dtrace probes Phabricator: https://reviews.freebsd.org/D924 Reviewed by: rpaulo markj MFC after: 1 month	2014-10-09 22:36:21 +00:00
Michael Tuexen	e03159ea69	Ensure that the flags field of sctp_tmit_chunks is initialized. Thanks to Peter Bostroem from Google for reporting the issue. MFC after: 3 days	2014-10-09 20:08:12 +00:00
Alexander V. Chernikov	779b53d008	Sync to HEAD@r272825.	2014-10-09 15:35:28 +00:00
Marcel Moolenaar	80b47aefa1	Move the SCTP syscalls to netinet with the rest of the SCTP code. The syscalls themselves are tightly coupled with the network stack and therefore should not be in the generic socket code. The following four syscalls have been marked as NOSTD so they can be dynamically registered in sctp_syscalls_init() function: sys_sctp_peeloff sys_sctp_generic_sendmsg sys_sctp_generic_sendmsg_iov sys_sctp_generic_recvmsg The syscalls are also set up to be dynamically registered when COMPAT32 option is configured. As a side effect of moving the SCTP syscalls, getsock_cap needs to be made available outside of the uipc_syscalls.c source file. A proper prototype has been added to the sys/socketvar.h header file. API tests from the SCTP reference implementation have been run to ensure compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout) Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:16:52 +00:00

... 3 4 5 6 7 ...

5416 Commits