freebsd-dev

Author	SHA1	Message	Date
Hiroki Sato	705bef548a	Cancel DAD for an ifa when the ifp has ND6_IFF_IFDISABLED as early as possible and do not clear IN6_IFF_TENTATIVE. If IFDISABLED was accidentally set after a DAD started, TENTATIVE could be cleared because no NA was received due to IFDISABLED, and as a result it could prevent DAD when manually clearing IFDISABLED after that.	2014-05-16 15:53:31 +00:00
Alexander V. Chernikov	b980262e63	Pass radix head ptr along with rte to rtexpunge(). Rename rtexpunge to rt_expunge().	2014-05-03 16:28:54 +00:00
Alexander V. Chernikov	cf58751a44	Use "hash" value in rtalloc_mpath_fib() instead of RTF_ANNOUNCE flag. Hashing method is the same as in in6_src.c. (Probably we need better one). MFC after: 2 weeks	2014-04-26 16:46:33 +00:00
Alexander V. Chernikov	36d55f0f9d	Unify sa_equal() macro usage. MFC after: 2 weeks	2014-04-26 14:52:03 +00:00
Alan Somers	0cfee0c223	Fix subnet and default routes on different FIBs on the same subnet. These two bugs are closely related. The root cause is that ifa_ifwithnet does not consider FIBs when searching for an interface address. sys/net/if_var.h sys/net/if.c Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr. Those functions will only return an address whose interface fib equals the argument. sys/net/route.c Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib arguments. sys/netinet/in.c Update in_addprefix to consider the interface fib when adding prefixes. This will prevent it from not adding a subnet route when one already exists on a different fib. sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/netinet6/nd6.c Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet. In some cases there wasn't a clear specific fib number to use. In others, I was unable to test those functions so I chose RT_DEFAULT_FIB to minimize divergence from current behavior. I will fix some of the latter changes along with PR kern/187553. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c tests/sys/netinet/Makefile Revert r263738. The udp_dontroute test was right all along. However, bugs kern/187550 and kern/187553 cancelled each other out when it came to this test. Because of kern/187553, ifa_ifwithnet searched the default fib instead of the requested one, but because of kern/187550, there was an applicable subnet route on the default fib. The new test added in r263738 doesn't work right, however. I can verify with dtrace that ifa_ifwithnet returned the wrong address before I applied this commit, but route(8) miraculously found the correct interface to use anyway. I don't know how. Clear expected failure messages for kern/187550 and kern/187552. PR: kern/187550 PR: kern/187552 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic	2014-04-24 23:56:56 +00:00
Andrey V. Elsukov	52c57247d3	Remove unused variable. PR: 173521 MFC after: 1 week Sponsored by: Yandex LLC	2014-04-17 06:40:11 +00:00
Andrey V. Elsukov	4fd913364f	Properly release the in6_multi lock. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-12 02:05:31 +00:00
Kevin Lo	d1b18731d9	Minor style cleanups.	2014-04-07 01:55:53 +00:00
Kevin Lo	e06e816f67	Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks. Tested with vlc and a test suite [1]. [1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz Reviewed by: jhb, glebius, adrian	2014-04-07 01:53:03 +00:00
Andrey V. Elsukov	cd71804c84	Remove unused label. MFC after: 1 week	2014-03-31 14:40:35 +00:00
Andrey V. Elsukov	27aa751c90	Don't generate an ICMPv6 error message if packet was consumed by filter. MFC after: 1 week Sponsored by: Yandex LLC	2014-03-31 14:27:22 +00:00
Robert Watson	7527624efa	Several years after initial development, merge prototype support for linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets. (1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs. (2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/ configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid() to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed. (3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context). (4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead. Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement. No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers. Sponsored by: Juniper Networks (original work) Sponsored by: EMC/Isilon (patch update and merge)	2014-03-15 00:57:50 +00:00
Gleb Smirnoff	aa69c61235	Since both netinet/ and netinet6/ call into netipsec/ and netpfil/, the protocol specific mbuf flags are shared between them. - Move all M_FOO definitions into a single place: netinet/in6.h, to avoid future clashes. - Resolve clash between M_DECRYPTED and M_SKIP_FIREWALL which resulted in a failure of operation of IPSEC and packet filters. Thanks to Nicolas and Georgios for all the hard work on bisecting, testing and finally finding the root of the problem. PR: kern/186755 PR: kern/185876 In collaboration with: Georgios Amanakis <gamanakis gmail.com> In collaboration with: Nicolas DEFFAYET <nicolas-ml deffayet.com> Sponsored by: Nginx, Inc.	2014-03-12 14:29:08 +00:00
Gleb Smirnoff	e3a7aa6f56	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-05 01:17:47 +00:00
John Baldwin	5b26ea5df3	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
Craig Rodrigues	47a79fadc6	Remove KASSERT from in6p_lookup_mcast_ifp(). When the devel/jenkins port, version 1.551 was started, the kernel would panic if INVARIANTS was enabled in the kernel config. Suggested by: bms	2014-02-23 01:27:22 +00:00
Gleb Smirnoff	0ff96b4f55	o Remove at compile time the HASH_ALL code, that was never tested and is unfinished. However, I've tested my version, it works okay. As before it is unfinished: timeouts aren't driven by TCP session state. To enable the HASH_ALL mode, one needs in kernel config: options FLOWTABLE_HASH_ALL o Reduce the alignment on flentry to 64 bytes. Without the FLOWTABLE_HASH_ALL option, flows would consume half as much memory. o API to ip_output()/ip6_output() got even more thin: 1 liner. o Remove unused unions. Simply use fle->f_key[]. o Merge all IPv4 code into flowtable_lookup_ipv4(), and do the same for flowtable_lookup_ipv6(). Stop copying data to on stack sockaddr structures, simply use key[] on stack. o Move code from flowtable_lookup_common() that actually works on insertion into flowtable_insert(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-17 11:50:56 +00:00
Alexander V. Chernikov	f6990c4e3e	Further simplify nd6_output_lle. Currently we have 3 usage patterns: 1) nd6_output (most traffic flow, no lle supplied, lle RLOCK sufficient) 2) corner cases for output (no lle, STALE lle, so on). lle WLOCK needed. 3) nd* internal machinery (WLOCK'ed lle provided, perform packet queueing). We separate case 1 and implement it inside its only customer - nd6_output. This leads to some code duplication (especially SEND stuff, which should be hooked to output in a different way), but simplifies locking and control flow logic for nd6_output_lle. Reviewed by: ae MFC after: 3 weeks Sponsored by: Yandex LLC	2014-02-13 19:09:04 +00:00
Andrey V. Elsukov	e4c77ca0c0	Drop packets to multicast address whose scop field contains the reserved value 0. MFC after: 1 week Sponsored by: Yandex LLC	2014-02-13 14:10:44 +00:00
Christian Brueffer	d37872314f	Only count table lookups when we're actually processing packets. PR: 183462 Submitted by: Sven-Thorsten Dietrich <thebigcorporation at gmail.com> Reviewed by: bms MFC after: 1 month	2014-02-10 14:47:51 +00:00
Christian Brueffer	1b55364ed9	For IPv6, return the same error code as IPv4 when mrouter is not initialized. PR: 178472 Submitted by: Sven-Thorsten Dietrich <sven at vyatta.com> Reviewed by: bms	2014-02-10 14:36:51 +00:00
Alexander V. Chernikov	9dffa6a3f3	Simplify nd6_output_lle: * Check ND6_IFF_IFDISABLED before acquiring any locks * Assume m is always non-NULL * Remove 'bad' case not used anymore * Simplify if_output conditional MFC after: 2 weeks Sponsored by: Yandex LLC	2014-02-10 12:52:33 +00:00
Gleb Smirnoff	5d6d7e756b	o Revamp API between flowtable and netinet, netinet6. - ip_output() and ip_output6() simply call flowtable_lookup(), passing mbuf and address family. That's the only code under #ifdef FLOWTABLE in the protocols code now. o Revamp statistics gathering and export. - Remove hand made pcpu stats, and utilize counter(9). - Snapshot of statistics is available via 'netstat -rs'. - All sysctls are moved into net.flowtable namespace, since spreading them over net.inet isn't correct. o Properly separate at compile time INET and INET6 parts. o General cleanup. - Remove chain of multiple flowtables. We simply have one for IPv4 and one for IPv6. - Flowtables are allocated in flowtable.c, symbols are static. - With proper argument to SYSINIT() we no longer need flowtable_ready. - Hash salt doesn't need to be per-VNET. - Removed rudimentary debugging, which is quite useless in dtrace era. The runtime behavior of flowtable shouldn't be changed by this commit. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-07 15:18:23 +00:00
Andrey V. Elsukov	74a976fffd	Unlock entry before retry. Submitted by: melifaro MFC after: 1 week	2014-02-07 10:58:46 +00:00
Andrey V. Elsukov	51eecdc35a	Take exclusive lock only when lle isn't NULL. We don't need write access to lle in most cases. MFC after: 1 week Sponsored by: Yandex LLC	2014-02-02 07:28:04 +00:00
Alexander V. Chernikov	f6b84910bb	Further rework netinet6 address handling code: * Set ia address/mask values BEFORE attaching to address lists. Inet6 address assignment is not atomic, so the simplest way to do this atomically is to fill in ia before attach. * Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code). * Do some renamings: in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here) in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code) in6_ifaddloop -> nd6_add_ifa_lle in6_ifremloop -> nd6_rem_ifa_lle * Split working with LLE and route announce code for last two. Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour. * Call device SIOCSIFADDR handler IFF we're adding first address. In IPv4 we have to call it on every address change since ARP record is installed by arp_ifinit() which is called by given handler. IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so there is no reason to call SIOCSIFADDR often.	2014-01-19 16:07:27 +00:00
Alexander V. Chernikov	0c5d4bde90	Use in6_localip() instead of hand-rolled cycle. MFC after: 2 weeks	2014-01-18 20:54:55 +00:00
Alexander V. Chernikov	9080e7d023	Add in6_prepare_ifra() function to ease preparing in-kernel IPv6 address requests. MFC after: 2 weeks	2014-01-18 20:32:59 +00:00
Alexander V. Chernikov	b6a16fc853	Do some style(9) not done in r260851 to improve readability. MFC after: 2 weeks	2014-01-18 15:57:43 +00:00
Alexander V. Chernikov	60d7c722a5	Split in6_update_ifa() into smaller pieces leaving functionality intact. Discussed with: ae MFC after: 2 weeks	2014-01-18 15:52:52 +00:00
Andrey V. Elsukov	e74966f60b	Mechanically replace direct accessing to if_xname to using if_name() macro.	2014-01-10 12:33:28 +00:00
John-Mark Gurney	f2effe745c	revert part of r260485 which changes how part of the header gets included.. netstat uses -DKERNEL=1 to get these parts and breaks the build w/o it... melifaro@ says that ae@ is probably asleep, and the PR doesn't have this part of the patch... Probably a local change got in by accident.. PR: 185148 Pointy hat to: ae@	2014-01-09 22:41:18 +00:00
Andrey V. Elsukov	78415d1082	Remove extra nesting from X_ip6_mforward() function. Also remove disabled definitions from ip6_mroute.h. PR: 185148 Sponsored by: Yandex LLC	2014-01-09 15:38:28 +00:00
Andrey V. Elsukov	0a6b0ffa54	Add MRT6_DLOG() macro for debugging. Reduce number of MRT6DEBUG ifdefs and fix some broken format strings. MFC after: 1 week Sponsored by: Yandex LLC	2014-01-09 14:58:06 +00:00
Alexander V. Chernikov	1dc8f6a82c	Introduce IN6_MASK_ADDR() macro to unify various hand-rolled code to do IPv6 addr & mask in different places. MFC after: 2 weeks	2014-01-08 22:13:32 +00:00
Andrey V. Elsukov	b88aef1dcf	Use pointer to struct sockaddr_in6 in lla_lookup() call. This prevents from triggering KASSERT in in6_lltable_lookup.	2014-01-03 02:40:56 +00:00
Andrey V. Elsukov	e2d14d9317	Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with LLE_CREATE flag. MFC after: 1 week	2014-01-03 02:32:05 +00:00
Andrey V. Elsukov	ea0c377602	lla_lookup() does modification only when LLE_CREATE is specified. Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing lla_lookup() without LLE_CREATE flag. Reviewed by: glebius, adrian MFC after: 1 week Sponsored by: Yandex LLC	2014-01-02 08:40:37 +00:00
Adrian Chadd	c445d2520d	Use an RLOCK here instead of an RWLOCK - matching all the other calls to lla_lookup(). This drastically reduces the very high lock contention when doing parallel TCP throughput tests (> 1024 sockets) with IPv6. Tested: * parallel IPv6 TCP bulk data exchange, 8192 sockets MFC after: 1 week Sponsored by: Netflix, Inc.	2014-01-01 00:56:26 +00:00
Bjoern A. Zeeb	010c2b8192	Correct warnings comparing unsigned variables < 0 constantly reported while building kernels. All instances removed are indeed unsigned so the expressions could not be true. MFC after: 1 week	2013-12-25 20:08:44 +00:00
Dimitry Andric	6c5a340e56	In sys/netinet6/in6_mcast.c, in6m_is_ifp_detached() is only used whenever KTR is defined, so put it between #ifdef KTR guards. This avoids a warning about an unused function if KTR is not enabled. MFC after: 3 days	2013-12-24 20:30:13 +00:00
Andrey V. Elsukov	569aad57d2	Free mbuf in case of error. MFC after: 1 week	2013-12-17 10:53:17 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Andrey V. Elsukov	ee674966f4	Fix panic with RADIX_MPATH, when RTFREE_LOCKED() called for already unlocked route. Use in6_rtalloc() instead of in6_rtalloc1. This helps simplify the code and remove several now unused variables. PR: 156283 MFC after: 2 weeks	2013-11-11 12:49:00 +00:00
Gleb Smirnoff	555036b5f6	Remove never used ioctls that originate from KAME. The proof of their zero usage was exp-run from misc/183538.	2013-11-11 05:39:42 +00:00
Michael Tuexen	b54ddf225f	Changes from upstream to improve compilation when INET or INET6 or none of them is defined. MFC after: 3 days	2013-11-02 20:12:19 +00:00
Gleb Smirnoff	c3322cb91c	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
Gleb Smirnoff	eedc7fd9e8	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Andrey V. Elsukov	baa09f1891	Initialize inc_fibnum for properly handling ICMP6_PACKET_TOO_BIG errors in multifib environment. PR: 183265 MFC after: 1 week	2013-10-25 01:02:25 +00:00
Gleb Smirnoff	7caf4ab7ac	- Utilize counter(9) to accumulate statistics on interface addresses. Add four counters to struct ifaddr. This kills '+=' on variables shared between processors for every packet. - Nuke struct if_data from struct ifaddr. - In ip_input() do not put a reference on ifaddr, instead update statistics right now in place and do IN_IFADDR_RUNLOCK(). This removes atomic(9) for every packet. [1] - To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in rtsock.c fill if_data fields using counter_u64_fetch(). - Accidentally fix bug in COMPAT_32 version of NET_RT_IFLISTL, which took if_data not from the ifaddr, but from ifaddr's ifnet. [2] Submitted by: melifaro [1], pluknet[2] Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 11:37:57 +00:00
Gleb Smirnoff	4675896098	Remove ifa_init() and provide ifa_alloc() that will allocate and setup struct ifaddr internally. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:31:42 +00:00
Gleb Smirnoff	6ed910fabe	Hide 'struct ifaddr' definition from userland. Two tools left that use it, namely ipftest(1) and ifmcstat(1). These sniff structure definition using _WANT_IFADDR define. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:19:24 +00:00
Gleb Smirnoff	3fa98cf9ac	Remove unsigned < 0 check.	2013-10-15 10:12:19 +00:00
Gleb Smirnoff	ca695e0807	Remove useless check of ia6 against NULL, right after dereferencing it.	2013-10-15 10:11:23 +00:00
Gleb Smirnoff	0218539652	Now counter_u64_t is known to userland, thus remove hack from r253086. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:09:33 +00:00
Hiroki Sato	6378e1f369	Do not try to detach if the interface does not support IPv6. Tested by: hselasky PR: usb/182820 Approved by: re (glebius)	2013-10-10 09:43:15 +00:00
Gleb Smirnoff	491b520174	Fix mbuf leak. Submitted by: Loganaden Velvindron <logan elandsys.com> Obtained from: NetBSD Approved by: re (kib)	2013-10-07 12:07:40 +00:00
Bjoern A. Zeeb	fd291ae3ec	Update comment from draft to RFC number. Submitted by: Loganaden Velvindron (logan elandsys.com) Approved by: re (gjb) MFC after: 6 days	2013-09-22 14:53:07 +00:00
Mikolaj Golub	4d3dfd450a	Unregister inet/inet6 pfil hooks on vnet destroy. Discussed with: andre Approved by: re (rodrigc)	2013-09-13 18:45:10 +00:00
Dag-Erling Smørgrav	1a05c762b9	Fix the length calculation for the final block of a sendfile(2) transmission which could be tricked into rounding up to the nearest page size, leaking up to a page of kernel memory. [13:11] In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR and SIOCSIFNETMASK at the socket layer rather than pass them on to the link layer without validation or credential checks. [SA-13:12] Prevent cross-mount hardlinks between different nullfs mounts of the same underlying filesystem. [SA-13:13] Security: CVE-2013-5666 Security: FreeBSD-SA-13:11.sendfile Security: CVE-2013-5691 Security: FreeBSD-SA-13:12.ifioctl Security: CVE-2013-5710 Security: FreeBSD-SA-13:13.nullfs Approved by: re	2013-09-10 10:05:59 +00:00
John Baldwin	fa302f207f	Use an unsigned long when indexing into mfchashtbl[] and mf6ctable[]. This matches the types used when computing hash indices and the type of the maximum size of mfchashtbl[]. PR: kern/181821 Submitted by: Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4) MFC after: 1 week	2013-09-05 14:16:37 +00:00
John Baldwin	fd77bbb967	Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery	2013-08-26 18:16:05 +00:00
Mark Johnston	57f6086735	Implement the ip, tcp, and udp DTrace providers. The probe definitions use dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD. Tested by: gnn, hiren MFC after: 1 month	2013-08-25 21:54:41 +00:00
Michael Tuexen	1a94cdbea7	Provide human readable debug output.	2013-08-25 12:44:03 +00:00
Andre Oppermann	9850f95989	For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits. The upper 32bits are not occupied for now. Sponsored by: The FreeBSD Foundation	2013-08-25 09:49:00 +00:00
Andre Oppermann	1b4381afbb	Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. The changes in particular are: o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on. o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header. o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others). o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros. o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ. o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid. o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities. o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. The drivers can be corrected one by one and MFC'd without issue. o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures). Sponsored by: The FreeBSD Foundation	2013-08-24 19:51:18 +00:00
Xin LI	acde2476c4	Fix an integer overflow in computing the size of a temporary buffer, which can result in a buffer that is too small for the requested operation. Security: CVE-2013-3077 Security: FreeBSD-SA-13:09.ip_multicast	2013-08-22 00:51:37 +00:00
Andre Oppermann	86bd049144	Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings. Consistently use it within IP, IPv6 and ethernet protocols. Discussed with: trociny, glebius	2013-08-19 13:27:32 +00:00
Andre Oppermann	88388bdcbe	Move the global M_SKIP_FIREWALL mbuf flags to a protocol layer specific flag instead. The flag is only used within the IP and IPv6 layer 3 protocols. Because some firewall packages treat IPv4 and IPv6 packets the same the flag should have the same value for both. Discussed with: trociny, glebius	2013-08-19 11:08:36 +00:00
Hiroki Sato	5a04191532	Return 0 in nbi->expire when la_expire == 0. Conversion from time_uptime to time_second should not be performed in this case.	2013-08-17 07:14:45 +00:00
Hiroki Sato	ffa0165ae0	Fix incompatibility in ICMPV6CTL_ND6_PRLIST sysctl, and SIOCGPRLST_IN6, SIOCGDRLST_IN6, and SIOCGNBRINFO_IN6 ioctl. These userland interfaces treat expiration times in time_second, not time_uptime.	2013-08-06 17:10:52 +00:00
Hiroki Sato	7d26db1792	- Use time_uptime instead of time_second in data structures for PF_INET6 in kernel. This fixes various malfunction when the wall time clock is changed. Bump __FreeBSD_version to 1000041. - Use clock_gettime(CLOCK_MONOTONIC_FAST) in userland utilities. MFC after: 1 month	2013-08-05 20:13:02 +00:00
Hiroki Sato	41541ebf94	Fix a panic in tmpaddrtimer.	2013-08-05 00:36:12 +00:00
Hiroki Sato	0de0dd9be8	Allocate in6_ifextra (ifp->if_afdata[AF_INET6]) only for IPv6-capable interfaces. This eliminates unnecessary IPv6 processing for non-IPv6 interfaces. MFC after: 3 days	2013-07-31 16:24:49 +00:00
Andrey V. Elsukov	6794f46021	Remove the large part of struct ipsecstat. Only a few fields of this structure are used, but they already have equal fields in the struct newipsecstat, that was introduced with FAST_IPSEC and then was merged together with old ipsecstat structure. This fixes kernel stack overflow on some architectures after migration of ipsecstat to PCPU counters. Reported by: Taku YAMAMOTO, Maciej Milewski	2013-07-23 14:14:24 +00:00
Mikolaj Golub	f122b319eb	A complete duplication of binding should be allowed if on both new and duplicated sockets a multicast address is bound and either SO_REUSEPORT or SO_REUSEADDR is set. But actually it works for the following combinations: * SO_REUSEPORT is set for the first socket and SO_REUSEPORT for the new; * SO_REUSEADDR is set for the first socket and SO_REUSEADDR for the new; * SO_REUSEPORT is set for the first socket and SO_REUSEADDR for the new; and fails for this: * SO_REUSEADDR is set for the first socket and SO_REUSEPORT for the new. Fix the last case. PR: 179901 MFC after: 1 month	2013-07-12 19:08:33 +00:00
Andrey V. Elsukov	9f0f032d10	Correct the size of allocated memory to store array of counters.	2013-07-09 15:20:46 +00:00
Andrey V. Elsukov	2841260cd6	Migrate structs in6_ifstat and icmp6_ifstat to PCPU counters.	2013-07-09 09:59:46 +00:00
Andrey V. Elsukov	a786f67981	Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters.	2013-07-09 09:54:54 +00:00
Andrey V. Elsukov	c80211e3cf	Prepare network statistics structures for migration to PCPU counters. Use uint64_t as type for all fields of structures. Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat, in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat, pfkeystat, pim6stat, pimstat, rip6stat, udpstat. Discussed with: arch@	2013-07-09 09:32:06 +00:00
Mikolaj Golub	efdf104bca	In r227207, to fix the issue with possible NULL inp_socket pointer dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR for multicast), INP_REUSEPORT flag was introduced to cache the socket option. It was decided then that one flag would be enough to cache both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR setsockopt(2), it was checked if it was called for a multicast address and INP_REUSEPORT was set accordingly. Unfortunately that approach does not work when setsockopt(2) is called before binding to a multicast address: the multicast check fails and INP_REUSEPORT is not set. Fix this by adding INP_REUSEADDR flag to unconditionally cache SO_REUSEADDR. PR: 179901 Submitted by: Michael Gmelin freebsd grem.de (initial version) Reviewed by: rwatson MFC after: 1 week	2013-07-04 18:38:00 +00:00
Hiroki Sato	af8056441e	- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal. To configure an autoconfigured link-local address (RFC 4862), the following rc.conf(5) configuration can be used: ifconfig_bridge0_ipv6="inet6 auto_linklocal" - if_bridge(4) now removes IPv6 addresses on a member interface to be added when the parent interface or one of the existing member interfaces has an IPv6 address. if_bridge(4) merges each link-local scope zone which the member interfaces form respectively, so it causes address scope violation. Removal of the IPv6 addresses prevents it. - if_lagg(4) now removes IPv6 addresses on a member interfaces unconditionally. - Set reasonable flags to non-IPv6-capable interfaces. [] Submitted by: rpaulo [] MFC after: 1 week	2013-07-02 16:58:15 +00:00
Qing Li	378aa8d85e	Delete the nd6 entries associated with an off-link prefix if the same prefix cannot be found on an alternative interface. Reviewed by: hrs MFC after: 1 week	2013-06-24 05:01:13 +00:00
Andrey V. Elsukov	6659296cb0	Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics accounting. MFC after: 2 weeks	2013-06-20 09:55:53 +00:00
Andrey V. Elsukov	faf146b9e5	Use PIM6STAT_INC() and MRT6STAT_INC() macros for IPv6 multicast statistics accounting. MFC after: 2 weeks	2013-06-19 21:50:17 +00:00
Andrey V. Elsukov	f1d7ebfe05	Use RIP6STAT_INC() macro for raw ip6 statistics accounting. MFC after: 2 weeks	2013-06-19 20:48:34 +00:00
Andrey V. Elsukov	49dc650cea	Use ICMP6STAT_INC() macro for ICMPv6 errors accounting. MFC after: 2 weeks	2013-06-19 15:59:21 +00:00
Alexander V. Chernikov	6bdfdb2c5e	Really fix netmask address family this time. MFC with: r250813	2013-05-19 19:42:46 +00:00
Alexander V. Chernikov	346e9c9de8	Finish r85740 : Make IPv6 netmask has address family set. This pleases routing daemons like bird. MFC after: 2 weeks	2013-05-19 19:19:01 +00:00
Julian Elischer	4871fc4ab5	Finally change the mbuf to have its own fib field instead of stealing 4 flag bits. This was supposed to happen in 8.0, and again in 2012.. MFC after: never	2013-05-16 16:20:17 +00:00
Michael Tuexen	3457ccdaea	Honor the net.inet6.ip6.v6only sysctl variable and the IPV6_V6ONLY socket option for SCTP sockets in the same way as for UDP or TCP sockets. MFC after: 2 weeks	2013-05-10 18:09:38 +00:00
Hiroki Sato	5df1b6b57e	Use FF02:0:0:0:0:2:FF00::/104 prefix for IPv6 Node Information Group Address. Although KAME implementation used FF02:0:0:0:0:2::/96 based on older versions of draft-ietf-ipngwg-icmp-name-lookup, it has been changed in RFC 4620. The kernel always joins the /104-prefixed address, and additionally does /96-prefixed one only when net.inet6.icmp6.nodeinfo_oldmcprefix=1. The default value of the sysctl is 1. ping6(8) -N flag now uses /104-prefixed one. When this flag is specified twice, it uses /96-prefixed one instead. Reviewed by: ume Based on work by: Thomas Scheffler PR: conf/174957 MFC after: 2 weeks	2013-05-04 19:16:26 +00:00
Gleb Smirnoff	47e8d432d5	Add const qualifier to the dst parameter of the ifnet if_output method.	2013-04-26 12:50:32 +00:00
Andrey V. Elsukov	817f395375	Remove unused variable. MFC after: 1 week	2013-04-24 10:24:01 +00:00
Oleg Bulyzhin	1571132f14	Plug static llentry leak (ipv4 & ipv6 were affected). PR: kern/172985 MFC after: 1 month	2013-04-21 21:28:38 +00:00
Tijl Coosemans	6c81895dab	Fix build after r249543.	2013-04-16 16:59:29 +00:00
Andrey V. Elsukov	4ff7c740fe	Fix accounting after the r249528, also add several another counters to the statistics.	2013-04-16 11:31:26 +00:00
Andrey V. Elsukov	eca4d72003	Use IP6S_M2MMAX macro.	2013-04-16 11:19:13 +00:00
Andrey V. Elsukov	43851aae9a	Replace hardcoded numbers.	2013-04-16 11:12:58 +00:00
Andrey V. Elsukov	e7a87117d3	The source address selection algorithm tries to apply several rules for the set of IPv6 addresses. Now each attempt goes into IPv6 statistics, even if given rule did not won. Change this and take into account only those rules, that won. Also add accounting for cases, when algorithm fails to select an address.	2013-04-15 21:02:40 +00:00
Andrey V. Elsukov	ecc5c7387b	Free memory after deleting an address policy entry. MFC after: 1 week	2013-04-12 07:59:54 +00:00
Andrey V. Elsukov	9cb8d207af	Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats. MFC after: 1 week	2013-04-09 07:11:22 +00:00
Kevin Lo	b3dcd51dde	Clean up some unused leftover code. Pointed out by: ae	2013-03-22 01:45:54 +00:00
Kevin Lo	dda95c6e59	Remove unused global variables. Reviewed by: ae, glebius	2013-03-22 01:40:17 +00:00
Gleb Smirnoff	10e5acc3c6	- Use m_getcl() instead of hand allocating. - Do not calculate constant length values at run time, CTASSERT() their sanity. - Remove superfluous cleaning of mbuf fields after allocation. - Replace compat macros with function calls. Sponsored by: Nginx, Inc.	2013-03-15 13:48:53 +00:00
Gleb Smirnoff	7b07d1bed0	- Use m_getcl() instead of hand allocating. - Use m_get()/m_gethdr() instead of macros. - Remove superfluous cleaning of mbuf fields after allocation. Sponsored by: Nginx, Inc.	2013-03-15 12:50:29 +00:00
Gleb Smirnoff	aa48181169	Use m_getcl() instead of hand made allocation. Sponsored by: Nginx, Inc.	2013-03-15 12:33:23 +00:00
Andrey V. Elsukov	b7c896c9be	Take the inpcb rlock before calculating checksum, it was accidentally moved in r191672. Obtained from: Yandex LLC MFC after: 1 week	2013-03-12 02:20:20 +00:00
Navdeep Parhar	63a97a4040	Generate lle_event in the IPv6 neighbor discovery code too. Reviewed by: bz@	2013-01-26 00:05:22 +00:00
Navdeep Parhar	f31b83e118	Avoid NULL dereference in nd6_storelladdr when no mbuf is provided. It is called this way from a couple of places in the OFED code. (toecore calls it too but that's going to change shortly). Reviewed by: bz@	2013-01-25 23:11:13 +00:00
Andrey V. Elsukov	3ea87cb57f	Simplify in6_setscope() function to get better performance. Currently we use interface indices as zone IDs for link-local and interface-local scopes, and since we don't have any tool to configure zone IDs, there is no need to acquire the afdata lock several times per packet only to read if_index value. So, now in6_setscope reads zone IDs for interface-local, link-local and global scopes without a lock. Sponsored by: Yandex LLC MFC after: 2 weeks	2013-01-10 00:10:24 +00:00
Andrey V. Elsukov	6e9a5a5e52	Remove unneeded variable. MFC after: 1 week	2013-01-09 18:54:58 +00:00
Hajimu UMEMOTO	164051cea5	Add no_prefer_iface option. It stops treating the address on the interface as special by source address selection rule even when the interface is outgoing interface. This is desired in some situation. Requested by: hrs Reviewed by: IHANet folks including hrs MFC after: 1 week	2013-01-09 18:18:08 +00:00
Andrey V. Elsukov	78c235c99f	The in6_setscope() function determines the scope zone id of an address and embeds it into address. Inside the kernel we keep addresses with embedded zone id only for two scopes: link-local and interface-local. For other scopes this function is nop in most cases. To reduce an overhead of locking, first check that address is capable for embedding. Also, handle the loopback address before acquire the lock. Sponsored by: Yandex LLC MFC after: 1 week	2013-01-09 00:36:06 +00:00
Peter Wemm	8a1163e82f	Temporarily revert rev 244678. This is causing loopback problems with the lo (loopback) interfaces.	2013-01-03 10:21:28 +00:00
Gleb Smirnoff	468e45f3bd	The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify all interested parties in case if interface flag IFF_UP has changed. However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol code, but in code of interface drivers. To fix this historical layering violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the IFF_UP flag, and if it did, run the if_up() handler. This fixes configuring an address under CARP control on an interface that was initially !IFF_UP. P.S. I intentionally omitted handling the IFF_SMART flag. This flag was never ever used in any driver since it was introduced, and since it means another layering violation, it should be garbage collected instead of pretended to be supported.	2012-12-25 13:01:58 +00:00
Andrey V. Elsukov	f8fe3dc9aa	When we have some address to forward (e.g. it was specified with ipfw fwd), we should pass it as first argument into in6_selectroute_fib function to initiate new route lookup. MFC after: 1 week	2012-12-19 17:28:17 +00:00
Andrey V. Elsukov	16607317b5	Make dst_sa initialization only when it is actually needed. MFC after: 1 week	2012-12-19 17:08:49 +00:00
Andrey V. Elsukov	61d88f3421	The selectroute functions does own account of EHOSTUNREACH errors, no need to do it twice. MFC after: 1 week	2012-12-19 17:02:07 +00:00
Andrey V. Elsukov	79672fd277	Use M_PROTO7 flag for M_IP6_NEXTHOP, because M_PROTO2 was used for M_AUTHIPHDR. Pointy hat to: ae Reported by: Vadim Goncharov MFC after: 3 days	2012-12-17 14:36:56 +00:00
Andrey V. Elsukov	68eba526b9	In additional to the tailq of IPv6 addresses add the hash table. For now use 256 buckets and fnv_hash function. Use xor'ed 32-bit s6_addr32 parts of in6_addr structure as a hash key. Update in6_localip and in6_is_addr_deprecated to use hash table for fastest lookup. Sponsored by: Yandex LLC Discussed with: dwmalone, glebius, bz	2012-12-15 20:04:24 +00:00
Gleb Smirnoff	b1ec2940af	Fix problem in r238990. The LLE_LINKED flag should be tested prior to entering llentry_free(), and in case if we lose the race, we should simply perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread performing arptimer(), it will remove two references from the lle instead of one. Reported by: Ian FREISLICH <ianf clue.co.za>	2012-12-13 11:11:15 +00:00
Hiroki Sato	0bebb5448b	- Move definition of V_deembed_scopeid to scope6_var.h. - Deembed scope id in L3 address in in6_lltable_dump(). - Simplify scope id recovery in rtsock routines. - Remove embedded scope id handling in ndp(8) and route(8) completely.	2012-12-05 19:45:24 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Andrey V. Elsukov	35e692dc0e	Remove opt_inet.h, it isn't required here. MFC after: 1 week	2012-11-20 14:09:37 +00:00
Hiroki Sato	3829d13564	Check if an extracted zoneid is equal to the non-zero sin6_scope_id only when it is link-local or MC interface-local.	2012-11-18 16:06:51 +00:00
Michael Tuexen	3a51a2647a	Add support for SCTP/UDP/IPV6. This completes the support of http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-udp-encaps MFC after: 1 week	2012-11-17 20:04:04 +00:00
Andrey V. Elsukov	73cb2f38f2	Reduce the overhead of locking, use IF_AFDATA_RLOCK() when we are doing simple lookups. Sponsored by: Yandex LLC MFC after: 1 week	2012-11-16 12:12:02 +00:00
Andrey V. Elsukov	5cf7ec13f7	if_afdata lock was converted from mutex to rwlock a long ago, so we can replace IF_AFDATA_LOCK() macro depending to the access type. Sponsored by: Yandex LLC MFC after: 1 week	2012-11-14 17:36:06 +00:00
Andrey V. Elsukov	862a3e1227	SCOPE6_LOCK protects V_sid_default, no need to acquire it without any access to V_sid_default. Sponsored by: Yandex LLC MFC after: 1 week	2012-11-14 17:23:48 +00:00
Andrey V. Elsukov	f062401329	zoneid has unsigned type. MFC after: 1 week	2012-11-14 17:14:03 +00:00
David E. O'Brien	f1e0de695c	Use consistent style.	2012-11-13 01:48:00 +00:00
Andrey V. Elsukov	ffdbf9da3b	Remove the recently added sysctl variable net.pfil.forward. Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set. Suggested by: andre	2012-11-02 01:20:55 +00:00
Michael Tuexen	21f67da7c4	Whitespace changes due to upstream integration of SCTP changes in the FreeBSD code base.	2012-10-29 20:47:32 +00:00
Andrey V. Elsukov	c1de64a495	Remove the IPFIREWALL_FORWARD kernel option and make possible to turn on the related functionality in the runtime via the sysctl variable net.pfil.forward. It is turned off by default. Sponsored by: Yandex LLC Discussed with: net@ MFC after: 2 weeks	2012-10-25 09:39:14 +00:00
Xin LI	6f56329a25	Remove __P. Submitted by: kevlo Reviewed by: md5(1) MFC after: 2 months	2012-10-22 21:49:56 +00:00
Gleb Smirnoff	8f134647ca	Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert host byte order before pass a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtructs header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>	2012-10-22 21:09:03 +00:00
Alexander V. Chernikov	7994d24f0c	Eliminate code checking if found IPv6 rte is dynamic. IPv6 redirects are using (different) ND-based approach described in RFC 4861. This change is similar to r241406 which conditionally skips the same check in IPv4. This change is part of bigger patch eliminating rte locking. Sponsored by: Yandex LLC. OK'd by: hrs MFC after: 2 weeks	2012-10-22 12:54:52 +00:00
Andre Oppermann	c9b652e3e8	Mechanically remove the last stray remains of spl* calls from net/. They have been Noop's for a long time now.	2012-10-18 13:57:24 +00:00
Alexander V. Chernikov	3bff27cd67	Cleanup documentation: cloning route support has been removed in r186119. MFC after: 2 weeks	2012-10-13 09:31:01 +00:00
Kevin Lo	9823d52705	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
Kevin Lo	a10cee30c9	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
Andriy Gapon	ce8b4f7cb9	ip6_ipsec_output: fix a typo in r241344 Acting as a remote drone of glebius.	2012-10-08 13:45:40 +00:00
Gleb Smirnoff	23e9c6dc1e	After r241245 it appeared that in_delayed_cksum(), which still expects host byte order, was sometimes called with net byte order. Since we are moving towards net byte order throughout the stack, the function was converted to expect net byte order, and its consumers fixed appropriately: - ip_output(), ipfilter(4) not changed, since already call in_delayed_cksum() with header in net byte order. - divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order there and back. - mrouting code and IPv6 ipsec now need to switch byte order there and back, but I hope, this is temporary solution. - In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum(). - pf_route() catches up on r241245 changes to ip_output().	2012-10-08 08:03:58 +00:00
Gleb Smirnoff	d6d3f01e0a	Merge the projects/pf/head branch, that was worked on for last six months, into head. The most significant achievements in the new code: o Fine grained locking, thus much better performance. o Fixes to many problems in pf, that were specific to FreeBSD port. New code doesn't have that many ifdefs and much less OpenBSDisms, thus is more attractive to our developers. Those interested in details, can browse through SVN log of the projects/pf/head branch. And for reference, here is exact list of revisions merged: r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330, r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656, r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782, r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868, r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223, r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456, r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505, r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168, r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230, r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398, r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548, r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672, r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169, r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442, r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522, r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661, r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212. 
I'd like to thank people who participated in early testing: Tested by: Florian Smeets <flo freebsd.org> Tested by: Chekaluk Vitaly <artemrts ukr.net> Tested by: Ben Wilber <ben desync.com> Tested by: Ian FREISLICH <ianf cloudseed.co.za>	2012-09-08 06:41:54 +00:00
Mikolaj Golub	ab16a5bd08	In ip6_ctloutput() guard inp_flags modifications with INP_WLOCK. MFC after: 2 weeks	2012-08-19 08:16:13 +00:00
Gleb Smirnoff	ea53792942	Fix races between in_lltable_prefix_free(), lla_lookup(), llentry_free() and arptimer(): o Use callout_init_rw() for lle timeout, this allows us safely disestablish them. - This allows us to simplify the arptimer() and make it race safe. o Consistently use ifp->if_afdata_lock to lock access to linked lists in the lle hashes. o Introduce new lle flag LLE_LINKED, which marks an entry that is attached to the hash. - Use LLE_LINKED to avoid double unlinking via consequent calls to llentry_free(). - Mark lle with LLE_DELETED via |= operation instead of =, so that other flags won't be lost. o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more consistent and provide more informative KASSERTs. The patch is a collaborative work of all submitters and myself. PR: kern/165863 Submitted by: Andrey Zonov <andrey zonov.org> Submitted by: Ryan Stone <rysto32 gmail.com> Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>	2012-08-02 13:57:49 +00:00
Gleb Smirnoff	b9aee262e5	Some more whitespace cleanup.	2012-08-01 09:00:26 +00:00
Bjoern A. Zeeb	3b43b78342	In case of IPsec we have to do delayed checksum calculations before adding any extension header, or rather before calling into IPsec processing as we may send the packet and not return to IPv6 output processing here. PR: kern/170116 MFC After: 3 days	2012-07-31 23:34:06 +00:00
Gleb Smirnoff	ea50c13ebe	Some style(9) and whitespace changes. Together with: Andrey Zonov <andrey zonov.org>	2012-07-31 11:31:12 +00:00
Bjoern A. Zeeb	d2ed798615	Properly apply #ifdef INET and leave a comment that we are (will) apply delayed IPv6 checksum processing in ip6_output.c when doing IPsec. PR: kern/170116 MFC after: 3 days	2012-07-31 05:44:03 +00:00
Bjoern A. Zeeb	68c99a6023	Improve the should-never-hit printf to ease debugging in case we'd ever hit it again when doing the delayed IPv6 checksum calculations. MFC after: 3 days	2012-07-31 05:34:54 +00:00
Bjoern A. Zeeb	5dbbe4fdd2	For consistency put the IPsec comment inside the #ifdef section. MFC after: 3 days	2012-07-29 00:45:24 +00:00
Bjoern A. Zeeb	c50ffbdf4b	Fix a comment that we do not have an SA yet but need to acquire one. MFC after: 3 days	2012-07-29 00:44:41 +00:00
Michael Tuexen	5e20b91dbe	Changes which improve compilation if neither INET nor INET6 is defined. MFC after: 3 days	2012-07-15 20:16:17 +00:00
Michael Tuexen	e0e00a4d0f	#ifdef INET and INET6 consistently. This also fixes a bug, where it was done wrong. MFC after: 3 days	2012-07-15 11:04:49 +00:00
Hiroki Sato	f6c336fe66	Remove "prefer_source" address selection option. FreeBSD has had an implementation of RFC 3484 for this purpose for a long time and "prefer_source" was never implemented actually. ND6_IFF_PREFER_SOURCE macro is left intact.	2012-07-09 06:21:46 +00:00
Bjoern A. Zeeb	4018ea9a2b	Implement handling of "atomic fragments" as outlined in draft-gont-6man-ipv6-atomic-fragments to mitigate one class of possible fragmentation-based attacks. MFC after: 5 days	2012-07-08 15:30:24 +00:00
Bjoern A. Zeeb	8dcc26febe	As mentioned in the commit message of r237571 (copied from a prototype patch of mine) also check if the 2nd in6_setscope() failed and return the error in that case. MFC after: 5 days	2012-07-08 08:49:37 +00:00
Gleb Smirnoff	bf9840512a	When ip_output()/ip6_output() is supplied a struct route *ro argument, it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning here: it may be supplied to provide route, and it may be supplied to store and return to caller the route that ip_output()/ip6_output() finds. In the latter case skipping FLOWTABLE lookup is pessimisation. The difference between struct route filled by FLOWTABLE and filled by rtalloc() family is that the former doesn't hold a reference on its rtentry. Reference is hold by flow entry, and it is about to be released in future. Thus, route filled by FLOWTABLE shouldn't be passed to RTFREE() macro. - Introduce new flag for struct route/route_in6, that marks route not holding a reference on rtentry. - Introduce new macro RO_RTFREE() that cleans up a struct route depending on its kind. - All callers to ip_output()/ip6_output() that do supply non-NULL but empty route should use RO_RTFREE() to free results of lookup. - ip_output()/ip6_output() now do FLOWTABLE lookup always when ro->ro_rt == NULL. Tested by: tuexen (SCTP part)	2012-07-04 07:37:53 +00:00
Gleb Smirnoff	3df6468a2d	Remove route caching from IP multicast routing code. There is no reason to do that, and also, cached route never got unreferenced, which meant a reference leak. Reviewed by: bms	2012-07-02 19:44:18 +00:00
Michael Tuexen	a8775ad93d	Move common code parts to sctp_common_input_processing(). MFC after: 3 days	2012-07-02 16:44:09 +00:00
Bruce M Simpson	df7e35725b	Kick the current-state report timer when a V1 group report would be triggered. Submitted by: rpaulo@ MFC after: 3 days	2012-06-28 23:48:40 +00:00
Bruce M Simpson	b7d882304b	Fix a typo in MLD query exponent processing. Submitted by: rpaulo@ MFC after: 3 days	2012-06-28 23:45:37 +00:00
Bruce M Simpson	289ca95209	In MLDv2 general query processing, do not enforce the strict check on query origins. Submitted by: Gu Yong MFC after: 3 days	2012-06-28 23:44:47 +00:00
Michael Tuexen	b1754ad17b	Pass the src and dst address of a received packet explicitly around. MFC after: 3 days	2012-06-28 16:01:08 +00:00
Xin LI	5b8f1a8676	Fix a LOR acquiring the if_afdata lock while holding an rtentry lock. Possibly do some extra work in case we would not get into the ifa0 != NULL paths later as we already do for the mltaddr before. XXX We should possibly error in case in6_setscope fails. Reference: http://lists.freebsd.org/pipermail/freebsd-net/2011-September/029829.html Submitted by: bz MFC after: 1 week	2012-06-25 20:56:32 +00:00
Michael Tuexen	6dc5aabcb7	Unify sctp_input() and sctp6_input(). MFC after: 3 days	2012-06-25 19:13:43 +00:00
Michael Tuexen	39803b8c58	Whitespace cleanup. MFC after: 3 days	2012-06-25 17:15:09 +00:00
Michael Tuexen	20cc2188f3	Pass the packet length explicitly around. MFC after: 3 days	2012-06-24 23:12:24 +00:00
Michael Tuexen	f938425253	Do packet logging in a consistent way. MFC after: 3 days	2012-06-24 21:25:54 +00:00
Bjoern A. Zeeb	23bc7025c4	Just add a comment to further investigate when being closer to that code again next time. The condition of the 2nd if() is very unlikely ever met.	2012-06-22 21:26:35 +00:00
Michael Tuexen	f30ac43257	Pass flowid explicitly through the stack instead of taking it from the mbuf chain at different places. While there: Fix several bugs related to VRFs. MFC after: 3 days	2012-06-14 06:54:48 +00:00
Michael Tuexen	b36dcb9a39	Deliver IPV6_TCLASS, IPV6_HOPLIMIT and IPV6_PKTINFO cmsgs (if requested) on IPV6 sockets, which have been marked to be not IPV6_V6ONLY, for each received IPV4 packet. MFC after: 3 days	2012-06-12 13:57:56 +00:00
Bjoern A. Zeeb	15cc25e9c0	Plug two interface address refcount leaks in early error return cases in the ioctl path. Reported by: rpaulo Reviewed by: emax MFC after: 3 days	2012-06-05 13:27:37 +00:00
Maksim Yevmenkin	3df0e439b0	Plug reference leak. Interface routes are refcounted as packets move through the stack, and there's garbage collection tied to it so that route changes can safely propagate while traffic is flowing. In our setup, we weren't changing or deleting any routes, but the refcounting logic in ip6_input() was wrong and caused a reference leak on every inbound V6 packet. This eventually caused a 32bit overflow, and the resulting 0 value caused the garbage collection to run on the active route. That then snowballed into the panic. Reviewed by: scottl MFC after: 3 days	2012-06-03 07:36:59 +00:00
Michael Tuexen	a6cff10f2a	Separate SCTP checksum offloading for IPv4 and IPv6. While there: remove some trailing whitespace. MFC after: 3 days X-MFC with: 236170	2012-05-30 20:56:07 +00:00
Maksim Yevmenkin	c784f9e5eb	When we return deprecated addresses, we need to reference them. Reviewed by: bz, scottl MFC after: 3 days	2012-05-30 20:02:39 +00:00
Bjoern A. Zeeb	356ab07e2d	It turns out that too many drivers are not only parsing the L2/3/4 headers for TSO but also for generic checksum offloading. Ideally we would only have one common function shared amongst all drivers, and perhaps when updating them for IPv6 we should introduce that. Eventually we should provide the meta information along with mbufs to avoid (re-)parsing entirely. To not break IPv6 (checksums and offload) and to be able to MFC the changes without risking to hurt 3rd party drivers, duplicate the v4 framework, as other OSes have done as well. Introduce interface capability flags for TX/RX checksum offload with IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6 flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6 fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and add an alias for CSUM_DATA_VALID_IPV6. This pretty much brings IPv6 handling in line with IPv4. TSO is still handled in a different way and not via if_hwassist. Update ifconfig to allow (un)setting of the new capability flags. Update loopback to announce the new capabilities and if_hwassist flags. Individual driver updates will have to follow, as will SCTP. Reported by: gallatin, dim, .. Reviewed by: gallatin (glanced at?) MFC after: 3 days X-MFC with: r235961,235959,235958	2012-05-28 09:30:13 +00:00
Bjoern A. Zeeb	c69baa7e91	Correctly get the payload length in host byte order. While we already plan to support >64k payload here, the IPv6 header payload length obviously is only 16 bit and the calculations need to be right. Reported by: dim Tested by: dim MFC after: 1 day X-MFC: with r235958	2012-05-26 23:58:51 +00:00
Michael Tuexen	8d9638ab33	Get rid of SCTP specific code to avoid CRC32C computations on loopback. Just just offloading. MFC after: 3 days	2012-05-26 09:16:33 +00:00
Bjoern A. Zeeb	39e19560d6	MFp4 bz_ipv6_fast: Use M_ZERO with malloc rather than calling bzero() ourselves. Change if () panic() checks to KASSERT()s as they are only catching invariants in code flow but not dependent on network input/output. Move initial assignments indirecting pointers after the lock has been acquired. Passing layer boundaries, reset M_PROTOFLAGS. Remove a NULL assignment before free. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-25 09:27:16 +00:00
Bjoern A. Zeeb	ae14505058	MFp4 bz_ipv6_fast: Factor out Hop-By-Hop option processing. It's still not heavily used, it reduces the footprint of ip6_input() and makes ip6_input() more readable. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-25 02:58:21 +00:00
Bjoern A. Zeeb	5aa624a803	MFp4 bz_ipv6_fast: Defer checksum calculations on UDP6 output and respect the mbuf flags set by NICs having done checksum validation for us already, thus saving the computing time in the input path as well. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-25 02:19:17 +00:00
Bjoern A. Zeeb	e7b92e2769	MFp4 bz_ipv6_fast: Add support for delayed checksum calculations in the IPv6 output path. We currently cannot offload to the card if we add extension headers (which incl. fragmentation). Fix two SCTP offload support copy&paste bugs: calculate checksums if fragmenting and no need to flag IPv4 header checksums in the IPv6 forwarding path. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-25 02:17:16 +00:00
Bjoern A. Zeeb	d3443481dc	MFp4 bz_ipv6_fast: Hide the ip6aux functions. The only one referenced outside ip6_input.c is not compiled in yet (__notyet__) in route6.c (r235954). We do have accessor functions that should be used. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days X-MFC: KPI?	2012-05-25 01:48:15 +00:00
Bjoern A. Zeeb	f8315b5fd6	MFp4 bz_ipv6_fast: Simplify the code removing a return from an earlier else case, not differing from the default function return called now. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-25 01:45:05 +00:00
Bjoern A. Zeeb	1b53a49ad9	MFp4 bz_ipv6_fast: We currently nowhere set IP6A_SWAP making the entire check useless with the current code. Keep around but do not compile in. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-25 01:43:52 +00:00
Bjoern A. Zeeb	2cf62998da	MFp4 bz_ipv6_fast: No need to hold the (expensive) rt lock over (expensive) logging. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-25 01:42:48 +00:00
Bjoern A. Zeeb	ecade87edf	MFp4 bz_ipv6_fast: Introduce a (for now copied stripped down) in6_cksum_pseudo() function. We should be able to use this from in6_cksum() but we should also ponder possible MD specific improvements. It takes an extra csum argument to allow for easy checks as will be done by the upper layer protocol input paths. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-24 18:25:09 +00:00
Bjoern A. Zeeb	2889eb8bdf	MFp4 bz_ipv6_fast: Optimize in6_cksum(), re-ordering work and limiting variable initialization, removing a bzero() for mostly re-initialized struct values, making use of the newly introduced in6_getscope(), as well as converting an if/panic to a KASSERT(). Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-24 18:05:10 +00:00
Bjoern A. Zeeb	0edc703a02	MFp4 bz_ipv6_fast: Introduce in6_getscope() to allow more effective checksum computations without the need to copy the address to clear the scope. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-24 16:30:13 +00:00
Michael Tuexen	807aad636f	Use consistent text at the beginning of the files. MFC after: 3 days	2012-05-23 11:26:28 +00:00
Marius Strobl	7195094c3c	Rewrite nd6_sysctl_{d,p}rlist() to avoid misaligned accesses to char arrays casted to structs by getting rid of these buffers entirely. In r169832, it was tried to paper over this issue by 32-bit aligning the buffers. Depending on compiler optimizations that still was insufficient for 64-bit architectures with strong alignment requirements though. While at it, add comments regarding the total lack of locking in this area. Tested by: bz Reviewed by: bz (slightly earlier version), yongari (earlier version) MFC after: 1 week	2012-05-20 05:12:31 +00:00
Michael Tuexen	1a5b79010a	Missed to commit this in r235414. MFC after: 3 days	2012-05-13 19:25:21 +00:00
Michael Tuexen	410a3b1ef0	Use ECONNABORTED in cases where the ABORT was sent to the peer. MFC after: 3 days	2012-05-13 16:56:16 +00:00
Michael Tuexen	a2b42326b5	Provide in the association change notification the received ABORT chunk if case of SCTP_COMM_LOST or SCTP_CANT_STR_ASSOC as required by RFC 6458. MFC after: 3 days	2012-05-12 20:11:35 +00:00
Gleb Smirnoff	d0e6c546a2	in6_pcblookup_local() still can return a pcb with NULL inp_socket. To avoid panic, do not dereference inp_socket, but obtain reuse port option from inp_flags2, like this is done after next call to in_pcblookup_local() a few lines down below. Submitted by: rwatson	2012-03-21 08:43:38 +00:00
Michael Tuexen	dea47f3999	Clean up, no functional change. MFC after: 3 days.	2012-03-15 14:22:05 +00:00

1453 Commits