freebsd-skq

Author	SHA1	Message	Date
melifaro	9284124548	Use "hash" value in rtalloc_mpath_fib() instead of RTF_ANNOUNCE flag. Hashing method is the same as in in6_src.c. (Probably we need better one). MFC after: 2 weeks	2014-04-26 16:46:33 +00:00
melifaro	7b860c446e	Unify sa_equal() macro usage. MFC after: 2 weeks	2014-04-26 14:52:03 +00:00
asomers	f8a34b6f49	Fix subnet and default routes on different FIBs on the same subnet. These two bugs are closely related. The root cause is that ifa_ifwithnet does not consider FIBs when searching for an interface address. sys/net/if_var.h sys/net/if.c Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr. Those functions will only return an address whose interface fib equals the argument. sys/net/route.c Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib arguments. sys/netinet/in.c Update in_addprefix to consider the interface fib when adding prefixes. This will prevent it from not adding a subnet route when one already exists on a different fib. sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/netinet6/nd6.c Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet. In some cases there wasn't a clear specific fib number to use. In others, I was unable to test those functions so I chose RT_DEFAULT_FIB to minimize divergence from current behavior. I will fix some of the latter changes along with PR kern/187553. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c tests/sys/netinet/Makefile Revert r263738. The udp_dontroute test was right all along. However, bugs kern/187550 and kern/187553 cancelled each other out when it came to this test. Because of kern/187553, ifa_ifwithnet searched the default fib instead of the requested one, but because of kern/187550, there was an applicable subnet route on the default fib. The new test added in r263738 doesn't work right, however. I can verify with dtrace that ifa_ifwithnet returned the wrong address before I applied this commit, but route(8) miraculously found the correct interface to use anyway. I don't know how. Clear expected failure messages for kern/187550 and kern/187552. PR: kern/187550 PR: kern/187552 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic	2014-04-24 23:56:56 +00:00
ae	661b8edb01	Remove unused variable. PR: 173521 MFC after: 1 week Sponsored by: Yandex LLC	2014-04-17 06:40:11 +00:00
ae	60c35fc4bc	Properly release the in6_multi lock. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-12 02:05:31 +00:00
kevlo	bd0aef7d51	Minor style cleanups.	2014-04-07 01:55:53 +00:00
kevlo	45fcb795ff	Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks. Tested with vlc and a test suite [1]. [1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz Reviewed by: jhb, glebius, adrian	2014-04-07 01:53:03 +00:00
ae	3b453386af	Remove unused label. MFC after: 1 week	2014-03-31 14:40:35 +00:00
ae	1348bd8004	Don't generate an ICMPv6 error message if packet was consumed by filter. MFC after: 1 week Sponsored by: Yandex LLC	2014-03-31 14:27:22 +00:00
rwatson	f411704afc	Several years after initial development, merge prototype support for linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets. (1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs. (2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/ configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid()to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed. (3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context). (4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead. Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement. No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers. Sponsored by: Juniper Networks (original work) Sponsored by: EMC/Isilon (patch update and merge)	2014-03-15 00:57:50 +00:00
glebius	d734bed796	Since both netinet/ and netinet6/ call into netipsec/ and netpfil/, the protocol specific mbuf flags are shared between them. - Move all M_FOO definitions into a single place: netinet/in6.h, to avoid future clashes. - Resolve clash between M_DECRYPTED and M_SKIP_FIREWALL which resulted in a failure of operation of IPSEC and packet filters. Thanks to Nicolas and Georgios for all the hard work on bisecting, testing and finally finding the root of the problem. PR: kern/186755 PR: kern/185876 In collaboration with: Georgios Amanakis <gamanakis gmail.com> In collaboration with: Nicolas DEFFAYET <nicolas-ml deffayet.com> Sponsored by: Nginx, Inc.	2014-03-12 14:29:08 +00:00
glebius	8a3e4bbebb	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-05 01:17:47 +00:00
jhb	d9d6b88f18	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
rodrigc	0b83820bc8	Remove KASSERT from in6p_lookup_mcast_ifp(). When the devel/jenkins port, version 1.551 was started, the kernel would panic if INVARIANTS was enabled in the kernel config. Suggested by: bms	2014-02-23 01:27:22 +00:00
glebius	f62415c467	o Remove at compile time the HASH_ALL code, that was never tested and is unfinished. However, I've tested my version, it works okay. As before it is unfinished: timeout aren't driven by TCP session state. To enable the HASH_ALL mode, one needs in kernel config: options FLOWTABLE_HASH_ALL o Reduce the alignment on flentry to 64 bytes. Without the FLOWTABLE_HASH_ALL option, twice less memory would be consumed by flows. o API to ip_output()/ip6_output() got even more thin: 1 liner. o Remove unused unions. Simply use fle->f_key[]. o Merge all IPv4 code into flowtable_lookup_ipv4(), and do same flowtable_lookup_ipv6(). Stop copying data to on stack sockaddr structures, simply use key[] on stack. o Move code from flowtable_lookup_common() that actually works on insertion into flowtable_insert(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-17 11:50:56 +00:00
melifaro	dfdcfd9e83	Further simplify nd6_output_lle. Currently we have 3 usage patterns: 1) nd6_output (most traffic flow, no lle supplied, lle RLOCK sufficient) 2) corner cases for output (no lle, STALE lle, so on). lle WLOCK needed. 3) nd* internal machinery (WLOCK'ed lle provided, perform packet queueing). We separate case 1 and implement it inside its only customer - nd6_output. This leads to some code duplication (especially SEND stuff, which should be hooked to output in a different way), but simplifies locking and control flow logic for nd6_output_lle. Reviewed by: ae MFC after: 3 weeks Sponsored by: Yandex LLC	2014-02-13 19:09:04 +00:00
ae	6fb6c5eabc	Drop packets to multicast address whose scop field contains the reserved value 0. MFC after: 1 week Sponsored by: Yandex LLC	2014-02-13 14:10:44 +00:00
brueffer	b55833da5c	Only count table lookups when we're actually processing packets. PR: 183462 Submitted by: Sven-Thorsten Dietrich <thebigcorporation at gmail.com> Reviewed by: bms MFC after: 1 month	2014-02-10 14:47:51 +00:00
brueffer	576b9091dd	For IPv6, return the same error code as IPv4 when mrouter is not initialized. PR: 178472 Submitted by: Sven-Thorsten Dietrich <sven at vyatta.com> Reviewed by: bms	2014-02-10 14:36:51 +00:00
melifaro	0525ade939	Simplify nd6_output_lle: * Check ND6_IFF_IFDISABLED before acquiring any locks * Assume m is always non-NULL * remove 'bad' case not used anymore * Simply if_output conditional MFC after: 2 weeks Sponsored by: Yandex LLC	2014-02-10 12:52:33 +00:00
glebius	9d7706f9f4	o Revamp API between flowtable and netinet, netinet6. - ip_output() and ip_output6() simply call flowtable_lookup(), passing mbuf and address family. That's the only code under #ifdef FLOWTABLE in the protocols code now. o Revamp statistics gathering and export. - Remove hand made pcpu stats, and utilize counter(9). - Snapshot of statistics is available via 'netstat -rs'. - All sysctls are moved into net.flowtable namespace, since spreading them over net.inet isn't correct. o Properly separate at compile time INET and INET6 parts. o General cleanup. - Remove chain of multiple flowtables. We simply have one for IPv4 and one for IPv6. - Flowtables are allocated in flowtable.c, symbols are static. - With proper argument to SYSINIT() we no longer need flowtable_ready. - Hash salt doesn't need to be per-VNET. - Removed rudimentary debugging, which is quite useless in the dtrace era. The runtime behavior of flowtable shouldn't be changed by this commit. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-07 15:18:23 +00:00
ae	3fc4af35a2	Unlock entry before retry. Submitted by: melifaro MFC after: 1 week	2014-02-07 10:58:46 +00:00
ae	d86e904d3f	Take exclusive lock only when lle isn't NULL. We don't need write access to lle in most cases. MFC after: 1 week Sponsored by: Yandex LLC	2014-02-02 07:28:04 +00:00
melifaro	6ee59753ec	Further rework netinet6 address handling code: * Set ia address/mask values BEFORE attaching to address lists. Inet6 address assignment is not atomic, so the simplest way to do this atomically is to fill in ia before attach. * Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code). * Do some renamings: in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here) in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code) in6_ifaddloop -> nd6_add_ifa_lle in6_ifremloop -> nd6_rem_ifa_lle * Split working with LLE and route announce code for last two. Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour. * Call device SIOCSIFADDR handler IFF we're adding first address. In IPv4 we have to call it on every address change since ARP record is installed by arp_ifinit() which is called by given handler. IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so there is no reason to call SIOCSIFADDR often.	2014-01-19 16:07:27 +00:00
melifaro	421d2fc5eb	Use in6_localip() instead of hand-rolled cycle. MFC after: 2 weeks	2014-01-18 20:54:55 +00:00
melifaro	4e82296063	Add in6_prepare_ifra() function to ease preparing in-kernel IPv6 address requests. MFC after: 2 weeks	2014-01-18 20:32:59 +00:00
melifaro	9f1142ff95	Do some style(9) not done in r260851 to improve readability. MFC after: 2 weeks	2014-01-18 15:57:43 +00:00
melifaro	9b02dc0fae	Split in6_update_ifa() into smaller pieces leaving functionality intact. Discussed with: ae MFC after: 2 weeks	2014-01-18 15:52:52 +00:00
ae	308e5129f6	Mechanically replace direct accessing to if_xname to using if_name() macro.	2014-01-10 12:33:28 +00:00
jmg	016d28765a	revert part of r260485 which changes how part of the header gets included.. netstat uses -DKERNEL=1 to get these parts and breaks the build w/o it... melifaro@ says that ae@ is probably asleep, and the PR doesn't have this part of the patch... Probably a local change got in by accident.. PR: 185148 Pointy hat to: ae@	2014-01-09 22:41:18 +00:00
ae	15b36ec523	Remove extra nesting from X_ip6_mforward() function. Also remove disabled definitions from ip6_mroute.h. PR: 185148 Sponsored by: Yandex LLC	2014-01-09 15:38:28 +00:00
ae	1e65346e1d	Add MRT6_DLOG() macro for debugging. Reduce number of MRT6DEBUG ifdefs and fix some broken format strings. MFC after: 1 week Sponsored by: Yandex LLC	2014-01-09 14:58:06 +00:00
melifaro	db2be6a793	Introduce IN6_MASK_ADDR() macro to unify various hand-rolled code to do IPv6 addr & mask in different places. MFC after: 2 weeks	2014-01-08 22:13:32 +00:00
ae	6ba4e83021	Use pointer to struct sockaddr_in6 in lla_lookup() call. This prevents from triggering KASSERT in in6_lltable_lookup.	2014-01-03 02:40:56 +00:00
ae	941bb837f9	Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with LLE_CREATE flag. MFC after: 1 week	2014-01-03 02:32:05 +00:00
ae	4b9dcf4e75	lla_lookup() does modification only when LLE_CREATE is specified. Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing lla_lookup() without LLE_CREATE flag. Reviewed by: glebius, adrian MFC after: 1 week Sponsored by: Yandex LLC	2014-01-02 08:40:37 +00:00
adrian	64782f1b07	Use an RLOCK here instead of an RWLOCK - matching all the other calls to lla_lookup(). This drastically reduces the very high lock contention when doing parallel TCP throughput tests (> 1024 sockets) with IPv6. Tested: * parallel IPv6 TCP bulk data exchange, 8192 sockets MFC after: 1 week Sponsored by: Netflix, Inc.	2014-01-01 00:56:26 +00:00
bz	1b0023f911	Correct warnings comparing unsigned variables < 0 constantly reported while building kernels. All instances removed are indeed unsigned so the expressions could not be true. MFC after: 1 week	2013-12-25 20:08:44 +00:00
dim	a51bfa2a7a	In sys/netinet6/in6_mcast.c, in6m_is_ifp_detached() is only used whenever KTR is defined, so put it between #ifdef KTR guards. This avoids a warning about a unused function if KTR is not enabled. MFC after: 3 days	2013-12-24 20:30:13 +00:00
ae	809aa53485	Free mbuf in case of error. MFC after: 1 week	2013-12-17 10:53:17 +00:00
attilio	7ee4e910ce	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
ae	f03201ef21	Fix panic with RADIX_MPATH, when RTFREE_LOCKED() called for already unlocked route. Use in6_rtalloc() instead of in6_rtalloc1. This helps simplify the code and remove several now unused variables. PR: 156283 MFC after: 2 weeks	2013-11-11 12:49:00 +00:00
glebius	3c1f482e0e	Remove never used ioctls that originate from KAME. The proof of their zero usage was exp-run from misc/183538.	2013-11-11 05:39:42 +00:00
tuexen	d30ae7faf7	Changes from upstream to improve compilation when INET or INET6 or none of them is defined. MFC after: 3 days	2013-11-02 20:12:19 +00:00
glebius	f469ae1d45	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
glebius	2c1ec831c9	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
glebius	ff6e113f1b	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
ae	d4d55062e5	Initialize inc_fibnum for properly handling ICMP6_PACKET_TOO_BIG errors in multifib environment. PR: 183265 MFC after: 1 week	2013-10-25 01:02:25 +00:00
glebius	790225cfbc	- Utilize counter(9) to accumulate statistics on interface addresses. Add four counters to struct ifaddr. This kills '+=' on variables shared between processors for every packet. - Nuke struct if_data from struct ifaddr. - In ip_input() do not put a reference on ifaddr, instead update statistics right now in place and do IN_IFADDR_RUNLOCK(). This removes atomic(9) for every packet. [1] - To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in rtsock.c fill if_data fields using counter_u64_fetch(). - Accidentally fix bug in COMPAT_32 version of NET_RT_IFLISTL, which took if_data not from the ifaddr, but from ifaddr's ifnet. [2] Submitted by: melifaro [1], pluknet[2] Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 11:37:57 +00:00
glebius	564d02b304	Remove ifa_init() and provide ifa_alloc() that will allocate and setup struct ifaddr internally. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:31:42 +00:00
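
The f411704afc entry above describes merging a software implementation of the Toeplitz hash used by RSS. As a rough illustration of that technique only, here is a minimal bitwise sketch in C. It is not the code from that commit: the function name `toeplitz_hash_sketch` is hypothetical, and treating key bits past the end of the key as zero is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise software Toeplitz hash (illustrative sketch, hypothetical name).
 * For every set bit in the input, XOR in the 32-bit window of the key
 * that starts at the same bit offset. Key bits beyond keylen are treated
 * as zero (an assumption of this sketch).
 */
uint32_t
toeplitz_hash_sketch(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t i, next;
	int b;

	if (keylen < 4)
		return (0);	/* need at least one full 32-bit window */

	/* Initial window: the first 32 bits of the key, MSB first. */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 0; b < 8; b++) {
			/* Input bits are consumed MSB first. */
			if (data[i] & (0x80 >> b))
				hash ^= window;
			/* Slide the window left and pull in the next key bit. */
			window <<= 1;
			next = i * 8 + (size_t)b + 32;
			if (next / 8 < keylen &&
			    (key[next / 8] & (0x80 >> (next % 8))))
				window |= 1;
		}
	}
	return (hash);
}
```

A caller would typically pass the concatenated tuple fields (addresses, then ports) in network byte order as `data`. The per-bit loop also shows why the commit message calls software hashing costly on general-purpose CPUs: the work scales with the number of input bits.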


1297 Commits