freebsd-skq

Author	SHA1	Message	Date
Gleb Smirnoff	e3a7aa6f56	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-05 01:17:47 +00:00
Gleb Smirnoff	2a7da7299d	Remove ifa_ref()/ifa_free(), which are atomic(9), from ip_output(). The ifaddr is already referenced by the rtentry, and we are holding reference on the rtentry throughout the function execution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-04 19:49:41 +00:00
John Baldwin	5b26ea5df3	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
Gleb Smirnoff	4a2dd8d4fb	Improve logging of send errors, reporting error code and interface. Reduce code duplication between INET and INET6. Tested by: Lytochkin Boris <lytboris gmail.com>	2014-02-22 19:20:40 +00:00
Michael Tuexen	1213f0e749	Remove redundant code and fix a style error. MFC after: 3 days	2014-02-20 20:14:43 +00:00
Gleb Smirnoff	0ff96b4f55	o Remove at compile time the HASH_ALL code, that was never tested and is unfinished. However, I've tested my version, it works okay. As before it is unfinished: timeout aren't driven by TCP session state. To enable the HASH_ALL mode, one needs in kernel config: options FLOWTABLE_HASH_ALL o Reduce the alignment on flentry to 64 bytes. Without the FLOWTABLE_HASH_ALL option, twice less memory would be consumed by flows. o API to ip_output()/ip6_output() got even more thin: 1 liner. o Remove unused unions. Simply use fle->f_key[]. o Merge all IPv4 code into flowtable_lookup_ipv4(), and do same flowtable_lookup_ipv6(). Stop copying data to on stack sockaddr structures, simply use key[] on stack. o Move code from flowtable_lookup_common() that actually works on insertion into flowtable_insert(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-17 11:50:56 +00:00
Mikolaj Golub	db2f5a2461	Fixup for r261590 (vnet sysctl handlers cleanup). Reviewed by: glebius	2014-02-09 08:13:17 +00:00
Gleb Smirnoff	5d6d7e756b	o Revamp API between flowtable and netinet, netinet6. - ip_output() and ip_output6() simply call flowtable_lookup(), passing mbuf and address family. That's the only code under #ifdef FLOWTABLE in the protocols code now. o Revamp statistics gathering and export. - Remove hand made pcpu stats, and utilize counter(9). - Snapshot of statistics is available via 'netstat -rs'. - All sysctls are moved into net.flowtable namespace, since spreading them over net.inet isn't correct. o Properly separate at compile time INET and INET6 parts. o General cleanup. - Remove chain of multiple flowtables. We simply have one for IPv4 and one for IPv6. - Flowtables are allocated in flowtable.c, symbols are static. - With proper argument to SYSINIT() we no longer need flowtable_ready. - Hash salt doesn't need to be per-VNET. - Removed rudimentary debugging, which use quite useless in dtrace era. The runtime behavior of flowtable shouldn't be changed by this commit. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-07 15:18:23 +00:00
Gleb Smirnoff	92f8975ff4	Utilize SYSCTL_UMA_CUR() to export usage of syncache and tcp reassembly zones. Sponsored by: Nginx, Inc.	2014-02-07 14:31:51 +00:00
Gleb Smirnoff	96d111245f	Catch up on r261590.	2014-02-07 14:26:33 +00:00
Peter Wemm	62b90589e4	Adjust r239672 from rrs and r258821 from eadler. By definition, the very first FIN is not a duplicate. Process it normally and don't feed it to congestion control as though it were a dupe. Don't prevent CC from seeing later dupe acks while in a half close state.	2014-01-28 21:13:15 +00:00
George V. Neville-Neil	6f3caa6d81	Decrease lock contention within the TCP accept case by removing the INP_INFO lock from tcp_usr_accept. As the PR/patch states this was following the advice already in the code. See the PR below for a full disucssion of this change and its measured effects. PR: 183659 Submitted by: Julian Charbon Reviewed by: jhb	2014-01-28 20:28:32 +00:00
Gleb Smirnoff	547246a373	Fix fallout from r241923. Calculate length of payload in pim_input() properly. While here, remove extra variable and incorrect condition before m_pullup(). Reported by: Olivier Cochard-Labbé <olivier cochard.me> Sponsored by: Nginx, Inc.	2014-01-22 10:57:42 +00:00
Alexander V. Chernikov	f6b84910bb	Further rework netinet6 address handling code: * Set ia address/mask values BEFORE attaching to address lists. Inet6 address assignment is not atomic, so the simplest way to do this atomically is to fill in ia before attach. * Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code). * Do some renamings: in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here) in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code) in6_ifaddloop -> nd6_add_ifa_lle in6_ifremloop -> nd6_rem_ifa_lle * Split working with LLE and route announce code for last two. Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour. * Call device SIOCSIFADDR handler IFF we're adding first address. In IPv4 we have to call it on every address change since ARP record is installed by arp_ifinit() which is called by given handler. IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so there is no reason to call SIOCSIFADDR often.	2014-01-19 16:07:27 +00:00
Adrian Chadd	9db69902c6	If the flowid is available for the mbuf that finalised the creation of a syncache connection, copy it into the inp_flowid field. Without this, an incoming TCP connection won't have an inp_flowid marked until some data comes in, and this means that things like the per-CPU TCP timer option will choose a different CPU for the timer work. (It also means that if one grabbed the flowid via an ioctl from userland, it won't be available until some data has been received.) Sponsored by: Netflix, Inc.	2014-01-18 23:48:20 +00:00
George V. Neville-Neil	d9e1bc4f0d	Fix various places where we don't properly release a lock PR: 185043 Submitted by: Michael Bentkofsky MFC after: 2 weeks	2014-01-16 22:14:54 +00:00
Gleb Smirnoff	3c065f2f3e	Cleanup comments and whitespace. No functional changes.	2014-01-16 12:58:03 +00:00
Alexander V. Chernikov	a49b317c41	Fix refcount leak on netinet ifa. Reviewed by: glebius MFC after: 2 weeks Sponsored by: Yandex LLC	2014-01-16 12:35:18 +00:00
Alexander V. Chernikov	054692a4bd	Fix ipfw fwd for IPv4 traffic broken by r249894. Problem case: Original lookup returns route with GW set, so gw points to rte->rt_gateway. After that we're changing dst and performing lookup another time. Since fwd host is most probably directly reachable, resulting rte does not contain rt_gateway, so gw is not set. Finally, we end with packet transmitted to proper interface but wrong link-layer address. Found by: lstewart Discussed with: ae,lstewart MFC after: 2 weeks Sponsored by: Yandex LLC	2014-01-16 11:50:00 +00:00
Alexander V. Chernikov	d375edc9b5	Simplify inet alias handling code: if we're adding/removing alias which has the same prefix as some other alias on the same interface, use newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg(). This eliminates the following rtsock messages: Pinned RTM_ADD for prefix (for alias addition). Pinned RTM_DELETE for prefix (for alias withdrawal). Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24): before commit, addition: got message of size 116 on Fri Jan 10 14:13:15 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 got message of size 192 on Fri Jan 10 14:13:15 2014 RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff after commit, addition: got message of size 116 on Fri Jan 10 13:56:26 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255 before commit, wihdrawal: got message of size 192 on Fri Jan 10 13:58:59 2014 RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff got message of size 116 on Fri Jan 10 13:58:59 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 adter commit, withdrawal: got message of size 116 on Fri Jan 10 14:14:11 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong (and requires some hacks to keep prefix in route table on RTM_DELETE). I've tested this change with quagga (no change) and bird (). bird alias handling is already broken in BSD sysdep code, so nothing changes here, too. I'm going to MFC this change if there will be no complains about behavior change. While here, fix some style(9) bugs introduced by r260488 (pointed by glebius and bde). Sponsored by: Yandex LLC MFC after: 4 weeks	2014-01-10 12:13:55 +00:00
Gleb Smirnoff	0cc726f25a	Make failure of ifpromisc() a non-fatal error. This makes it possible to run carp(4) on vtnet(4). Sponsored by: Nginx, Inc.	2014-01-03 11:03:12 +00:00
Andrey V. Elsukov	e2d14d9317	Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with LLE_CREATE flag. MFC after: 1 week	2014-01-03 02:32:05 +00:00
Gleb Smirnoff	183e1c8634	Fix regression from r249894. Now we pass "gw" as argument to if_output method, thus for multicast case we need it to point at "dst". PR: 185395 Submitted by: ae	2014-01-02 10:18:39 +00:00
Andrey V. Elsukov	ea0c377602	lla_lookup() does modification only when LLE_CREATE is specified. Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing lla_lookup() without LLE_CREATE flag. Reviewed by: glebius, adrian MFC after: 1 week Sponsored by: Yandex LLC	2014-01-02 08:40:37 +00:00
Gleb Smirnoff	9706c950a2	Fix couple of bugs from r257692 related to scan of address list on an interface: - in in_control() skip over not AF_INET addresses. - in in_aifaddr_ioctl() and in_difaddr_ioctl() do correct check of address family, w/o accessing memory beyond struct ifaddr. Sponsored by: Nginx, Inc.	2013-12-29 22:20:06 +00:00
Michael Tuexen	04aab884d7	Address some warnings which showed up on the userland version. MFC after: 1 week	2013-12-27 13:07:00 +00:00
Sergey Kandaurov	b8b4cfcdf6	Draft-ietf-tcpm-initcwnd-05 became RFC6928. MFC after: 1 week	2013-12-26 04:24:08 +00:00
Bjoern A. Zeeb	415167d52b	Add more (IPv6) related Internet Protocols: - Host Identity Protocol (RFC5201) - Shim6 Protocol (RFC5533) - 2x experimentation and testing (RFC3692, RFC4727) This does not indicate interest to implement/support these protocols, but they are part of the "IPv6 Extension Header Types" [1] based on RFC7045 and might thus be needed by filtering and next header parsing implementations. References: [1] http://www.iana.org/assignments/ipv6-parameters Obtained from: http://www.iana.org/assignments/protocol-numbers MFC after: 1 week	2013-12-25 20:26:49 +00:00
Gleb Smirnoff	ec5df3a7b1	It'll be okay to use LibAliasDetachHandlers() here, relying on the fact that all handlers come from modules' bss and are followed by NODIR handler.	2013-12-25 09:43:51 +00:00
Gleb Smirnoff	535e0a0981	Cleanup alias module handler register/unregister. - Remove locking, since all module(9) events are running under &Giant. - Use TAILQ for protocol handlers and fix a bug which led to infinite cycle. Bug found in VirtualBox [1] - Simplify code everywhere. - Fix documentation. [1] https://www.virtualbox.org/pipermail/vbox-dev/2013-November/011936.html PR: 183792 [1] Submitted by: Valery Ushakov <uwe NetBSD.org> [1] Sponsored by: Nginx, Inc.	2013-12-25 03:24:20 +00:00
Gleb Smirnoff	2fb87f0892	Kill space at eols.	2013-12-25 02:06:57 +00:00
Gleb Smirnoff	1019f603d5	Remove from kernel the "dll" code.	2013-12-25 01:58:19 +00:00
Gleb Smirnoff	22d3fb1917	Whitespace cleanup.	2013-12-25 01:52:55 +00:00
Dimitry Andric	36f54f0aaa	In sys/netinet/in_mcast.c, inm_is_ifp_detached() is only used whenever KTR is defined, so put it between #ifdef KTR guards. This avoids a warning about a unused function if KTR is not enabled. MFC after: 3 days	2013-12-24 20:25:18 +00:00
Adrian Chadd	ac7e121247	Disable the now unpredicably bogus check for whether we have eneough queue space before queuing a bunch of IP fragments. As the comment in the committed change says, in the post-if_transmit(), post-SMP, post-preemption world, there's just too much overlapping concurrent code paths and different approaches to driver transmit queue management to have this code even remotely be effective. The only specific place it could be useful is if ALTQ is enabled but again it doesn't at all promise that all the fragments will be transmitted anyway. The main reason for committing this change is to disable a parallel place where the drops counter is incremented. This is a side effect of an upcoming change to ixgbe/cxgbe to handle the queue drops counter slightly better. Sponsored by: Netflix, Inc.	2013-12-20 07:41:03 +00:00
Eitan Adler	5f30ec9b63	In a situation where: - The remote host sends a FIN - in an ACK for a sequence number for which an ACK has already been received - There is still unacked data on route to the remote host - The packet does not contain a window update The packet may be dropped without processing the FIN flag. PR: kern/99188 Submitted by: Staffan Ulfberg <staffan@ulfberg.se> Discussed with: andre MFC after: never	2013-12-02 03:11:25 +00:00
Michael Tuexen	c302aeb123	In http://svnweb.freebsd.org/changeset/base/258221 I introduced a bug which initialized global locks whenever the SCTP stack initialized. This was fixed in http://svnweb.freebsd.org/changeset/base/258574 by rodrigc@. He just initialized the locks for the default vnet. This fix reverts to the old behaviour before r258221, which explicitly makes sure it is only called once, because this works also on other platforms. MFC after: 3 days X-MFC with: r258574.	2013-11-30 12:51:19 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
Adrian Chadd	fa22ce1570	Convert over the TCP probes to use mtod() rather than directly dereferencing m->m_data. Sponsored by: Netflix, Inc.	2013-11-25 22:55:06 +00:00
Craig Rodrigues	c0c61281b4	Only initialize some mutexes for the default VNET. In r208160, sctp_it_ctl was made a global variable, across all VNETs. However, sctp_init() is called for every VNET that is created. This results in the same global mutexes which are part of sctp_it_ctl being initialized. This can result in crashes if many jails are created. To reproduce the problem: (1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS, INVARIANTS. (2) Run this command in a loop: jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo (see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html ) Witness will warn about the same mutex being initialized. Fix the problem by only initializing these mutexes in the default VNET.	2013-11-25 18:49:37 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Gleb Smirnoff	c1f7c3f500	In r257692 I intentionally deleted code that handled P2P interfaces with equal addresses on both sides. It appeared that OpenVPN uses such configutations. Submitted by: trociny	2013-11-17 15:14:07 +00:00
Mikolaj Golub	a3985bdd12	Deregister helper hooks on vnet destroy.	2013-11-17 15:09:39 +00:00
Michael Tuexen	2a44dbf682	Use SCTP_PR_SCTP_TTL when the user provides a positive timetolive in sctp_sendmsg(). MFC after: 3 days	2013-11-16 19:57:56 +00:00
Michael Tuexen	04194e4f7d	Remove a stray write operation. MFC after: 3 days	2013-11-16 16:09:09 +00:00
Michael Tuexen	dcb3fc4cd6	When determining if an address belongs to an stcb, take the address family into account for wildcard bound endpoints. MFC after: 3 days	2013-11-16 15:34:14 +00:00
Michael Tuexen	f4f34bde23	Cleanups which result in fixes which have been made upstream and where partially suggested by Andrew Galante. There is no functional change in FreeBSD. MFC after: 3 days	2013-11-16 15:04:49 +00:00
Gleb Smirnoff	555036b5f6	Remove never used ioctls that originate from KAME. The proof of their zero usage was exp-run from misc/183538.	2013-11-11 05:39:42 +00:00
Gleb Smirnoff	2f3eb7f4d8	Make TCP_KEEP* socket options readable. At least PostgreSQL wants to read the values. Reported by: sobomax	2013-11-08 13:04:14 +00:00
Michael Tuexen	de72f4e54b	Get rid of the artification limitation enforced by SCTP_AUTH_RANDOM_SIZE_MAX. This was suggested by Andrew Galante. MFC after: 3 days	2013-11-07 18:50:11 +00:00

1 2 3 4 5 ...

4742 Commits