freebsd-skq

Author	SHA1	Message	Date
Andrey V. Elsukov	a7e201bbac	Make in6_pcblookup_hash_locked and in6_pcbladdr static. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-10 13:17:35 +00:00
Andrey V. Elsukov	1b44e5ffe3	Introduce INP6_PCBHASHKEY macro. Replace usage of hardcoded part of IPv6 address as hash key in all places. Obtained from: Yandex LLC	2014-09-10 12:35:42 +00:00
Andrey V. Elsukov	5dbfa43f65	Add the ability to set the `prefer_source' flag on an IPv6 address. It affects the IPv6 source address selection algorithm (RFC 6724) and allows overriding the last rule ("longest matching prefix") when choosing among equivalent addresses. The address with `prefer_source' will be the preferred source address. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2014-09-09 10:52:50 +00:00
Adrian Chadd	a4d98bf442	Add basic RSS awareness for the UDPv6 send path. This doesn't include the same kind of userland overriding that the IPv4 path has; nor does it yet know about 2-tuple versus 4-tuple hashing. That'll come later. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan	2014-09-09 04:20:53 +00:00
Adrian Chadd	b174de323a	Add IP_NODEFAULTFLOWID awareness to ip6_output(). Differential Revision: https://reviews.freebsd.org/D527	2014-09-09 00:21:21 +00:00
Michael Tuexen	24aaac8d59	Use union sctp_sockstore instead of struct sockaddr_storage. This eliminates some warnings when building in userland. Thanks to Patrick Laimbock for reporting this issue. Also remove some unnecessary casts. There should be no functional change. MFC after: 1 week	2014-09-07 09:06:26 +00:00
Andrey V. Elsukov	ccc53de916	Add the reverse part to rule #9 . Also change its description in the netstat(8) output. MFC after: 1 week	2014-09-01 09:30:34 +00:00
Mark Johnston	5fc2632281	Add some missing checks for unsupported interfaces (e.g. pflog(4)) when handling ioctls. While here, remove duplicated checks for a NULL ifp in in6_control(): this check is already done near the beginning of the function. PR: 189117 Reviewed by: hrs MFC after: 2 weeks	2014-08-22 19:21:08 +00:00
Kevin Lo	73d76e77b6	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb	2014-08-15 02:43:02 +00:00
Kevin Lo	8f5a8818f5	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
Andrey V. Elsukov	d6e6b9943b	Add new rule to source address selection algorithm. It prefers address with better virtual status. Use ifa_preferred() to choose better address. PR: 187341 Tested by: des MFC after: 1 week	2014-07-30 15:08:12 +00:00
Gleb Smirnoff	9753faf553	Garbage collect couple of unused fields from struct ifaddr: - ifa_claim_addr() unused since removal of NetAtalk - ifa_metric seems to be never utilized, always a copy of if_metric	2014-07-29 15:01:29 +00:00
Hiroki Sato	9be09a6e43	Fix EtherIP. TOS field must be initialized when the inner protocol is PF_LINK, and multicast/broadcast flag should always be dropped because the outer protocol uses unicast even when the inner address is not for unicast. It had been broken since r236951 when gif_output() started to use IFQ_HANDOFF().	2014-07-24 10:42:47 +00:00
Adrian Chadd	0ae3f42231	When it's time to do 4-tuple UDP IPv6 hashing, make sure this is a known type.	2014-07-20 07:39:54 +00:00
Adrian Chadd	c7c0d94874	Add IPv6 flowid, bindmulti and RSS awareness.	2014-07-12 05:46:33 +00:00
Adrian Chadd	a8a2d8003a	Add INP_RSS_BUCKET_SET awareness for IPv6 pcbgroup entries. This ensures that a listen socket with INP_RSS_BUCKET_SET set will use the pre-determined PCBGROUP rather than what the hashing path chooses.	2014-07-12 05:45:53 +00:00
Adrian Chadd	6e4405cee1	Add the IPv6 versions of the multi-bind, hash/hash type and RSS options.	2014-07-12 05:44:16 +00:00
Andrey V. Elsukov	ff899182ec	Fix condition. Sponsored by: Yandex LLC	2014-07-11 06:34:15 +00:00
Bryan Venteicher	6700a7d44b	Use the appropriate IPv6 hashtype defines when looking up the PCBGROUP Reviewed by: adrian@	2014-07-07 00:02:49 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Hajimu UMEMOTO	f4839cbc0a	Make nd6_gctimer tunable. MFC after: 1 week	2014-06-23 16:27:29 +00:00
Kevin Lo	ea93c6a613	Catch up with r186809, correct comments.	2014-06-23 05:17:39 +00:00
Andrey V. Elsukov	45b4fb0449	Remove unused variable. Sponsored by: Yandex LLC	2014-06-08 09:08:51 +00:00
Alan Somers	2f308a343f	Fix unintended KBI change from r264905. Add _fib versions of ifa_ifwithnet() and ifa_ifwithdstaddr() The legacy functions will call the _fib() versions with RT_ALL_FIBS, preserving legacy behavior. sys/net/if_var.h sys/net/if.c Add legacy-compatible functions as described above. Ensure legacy behavior when RT_ALL_FIBS is passed as fibnum. sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/net/route.c sys/net/rtsock.c sys/netinet6/nd6.c Call with _fib() functions if we must use a specific fib, or the legacy functions otherwise. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c Improve the udp_dontroute test. The bug that this test exercises is that ifa_ifwithnet() will return the wrong address, if multiple interfaces have addresses on the same subnet but with different fibs. The previous version of the test only considered one possible failure mode: that ifa_ifwithnet_fib() might fail to find any suitable address at all. The new version also checks whether ifa_ifwithnet_fib() finds the correct address by checking where the ARP request goes. Reported by: bz, hrs Reviewed by: hrs MFC after: 1 week X-MFC-with: 264905 Sponsored by: Spectra Logic	2014-05-29 21:03:49 +00:00
Hiroki Sato	82a9fa4a1d	Add rwlock to struct dadq. A panic could occur when a large number of addresses performed DAD at the same time.	2014-05-29 20:53:53 +00:00
VANHULLEBUS Yvan	aaf2cfc0d6	Fixed IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels. For IPv6-in-IPv4, you may need to do the following command on the tunnel interface if it is configured as IPv4 only: ifconfig <interface> inet6 -ifdisabled Code logic inspired from NetBSD. PR: kern/169438 Submitted by: emeric.poupon@netasq.com Reviewed by: fabient, ae Obtained from: NETASQ	2014-05-28 12:45:27 +00:00
Hiroki Sato	705bef548a	Cancel DAD for an ifa when the ifp has ND6_IFF_IFDISABLED as early as possible and do not clear IN6_IFF_TENTATIVE. If IFDISABLED was accidentally set after a DAD started, TENTATIVE could be cleared because no NA was received due to IFDISABLED, and as a result it could prevent DAD when manually clearing IFDISABLED after that.	2014-05-16 15:53:31 +00:00
Alexander V. Chernikov	b980262e63	Pass radix head ptr along with rte to rtexpunge(). Rename rtexpunge to rt_expunge().	2014-05-03 16:28:54 +00:00
Alexander V. Chernikov	cf58751a44	Use "hash" value in rtalloc_mpath_fib() instead of RTF_ANNOUNCE flag. Hashing method is the same as in in6_src.c. (Probably we need better one). MFC after: 2 weeks	2014-04-26 16:46:33 +00:00
Alexander V. Chernikov	36d55f0f9d	Unify sa_equal() macro usage. MFC after: 2 weeks	2014-04-26 14:52:03 +00:00
Alan Somers	0cfee0c223	Fix subnet and default routes on different FIBs on the same subnet. These two bugs are closely related. The root cause is that ifa_ifwithnet does not consider FIBs when searching for an interface address. sys/net/if_var.h sys/net/if.c Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr. Those functions will only return an address whose interface fib equals the argument. sys/net/route.c Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib arguments. sys/netinet/in.c Update in_addprefix to consider the interface fib when adding prefixes. This will prevent it from not adding a subnet route when one already exists on a different fib. sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/netinet6/nd6.c Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet. In some cases there wasn't a clear specific fib number to use. In others, I was unable to test those functions so I chose RT_DEFAULT_FIB to minimize divergence from current behavior. I will fix some of the latter changes along with PR kern/187553. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c tests/sys/netinet/Makefile Revert r263738. The udp_dontroute test was right all along. However, bugs kern/187550 and kern/187553 cancelled each other out when it came to this test. Because of kern/187553, ifa_ifwithnet searched the default fib instead of the requested one, but because of kern/187550, there was an applicable subnet route on the default fib. The new test added in r263738 doesn't work right, however. I can verify with dtrace that ifa_ifwithnet returned the wrong address before I applied this commit, but route(8) miraculously found the correct interface to use anyway. I don't know how. Clear expected failure messages for kern/187550 and kern/187552. PR: kern/187550 PR: kern/187552 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic	2014-04-24 23:56:56 +00:00
Andrey V. Elsukov	52c57247d3	Remove unused variable. PR: 173521 MFC after: 1 week Sponsored by: Yandex LLC	2014-04-17 06:40:11 +00:00
Andrey V. Elsukov	4fd913364f	Properly release the in6_multi lock. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-12 02:05:31 +00:00
Kevin Lo	d1b18731d9	Minor style cleanups.	2014-04-07 01:55:53 +00:00
Kevin Lo	e06e816f67	Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks. Tested with vlc and a test suite [1]. [1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz Reviewed by: jhb, glebius, adrian	2014-04-07 01:53:03 +00:00
Andrey V. Elsukov	cd71804c84	Remove unused label. MFC after: 1 week	2014-03-31 14:40:35 +00:00
Andrey V. Elsukov	27aa751c90	Don't generate an ICMPv6 error message if packet was consumed by filter. MFC after: 1 week Sponsored by: Yandex LLC	2014-03-31 14:27:22 +00:00
Robert Watson	7527624efa	Several years after initial development, merge prototype support for linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets. (1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs. (2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/ configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid()to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed. (3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context). (4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. 
If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead. Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement. No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers. Sponsored by: Juniper Networks (original work) Sponsored by: EMC/Isilon (patch update and merge)	2014-03-15 00:57:50 +00:00
Gleb Smirnoff	aa69c61235	Since both netinet/ and netinet6/ call into netipsec/ and netpfil/, the protocol specific mbuf flags are shared between them. - Move all M_FOO definitions into a single place: netinet/in6.h, to avoid future clashes. - Resolve clash between M_DECRYPTED and M_SKIP_FIREWALL which resulted in a failure of operation of IPSEC and packet filters. Thanks to Nicolas and Georgios for all the hard work on bisecting, testing and finally finding the root of the problem. PR: kern/186755 PR: kern/185876 In collaboration with: Georgios Amanakis <gamanakis gmail.com> In collaboration with: Nicolas DEFFAYET <nicolas-ml deffayet.com> Sponsored by: Nginx, Inc.	2014-03-12 14:29:08 +00:00
Gleb Smirnoff	e3a7aa6f56	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-05 01:17:47 +00:00
John Baldwin	5b26ea5df3	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
Craig Rodrigues	47a79fadc6	Remove KASSERT from in6p_lookup_mcast_ifp(). When the devel/jenkins port, version 1.551 was started, the kernel would panic if INVARIANTS was enabled in the kernel config. Suggested by: bms	2014-02-23 01:27:22 +00:00
Gleb Smirnoff	0ff96b4f55	o Remove at compile time the HASH_ALL code, that was never tested and is unfinished. However, I've tested my version, it works okay. As before it is unfinished: timeout aren't driven by TCP session state. To enable the HASH_ALL mode, one needs in kernel config: options FLOWTABLE_HASH_ALL o Reduce the alignment on flentry to 64 bytes. Without the FLOWTABLE_HASH_ALL option, twice less memory would be consumed by flows. o API to ip_output()/ip6_output() got even more thin: 1 liner. o Remove unused unions. Simply use fle->f_key[]. o Merge all IPv4 code into flowtable_lookup_ipv4(), and do same flowtable_lookup_ipv6(). Stop copying data to on stack sockaddr structures, simply use key[] on stack. o Move code from flowtable_lookup_common() that actually works on insertion into flowtable_insert(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-17 11:50:56 +00:00
Alexander V. Chernikov	f6990c4e3e	Further simplify nd6_output_lle. Currently we have 3 usage patterns: 1) nd6_output (most traffic flow, no lle supplied, lle RLOCK sufficient) 2) corner cases for output (no lle, STALE lle, so on). lle WLOCK needed. 3) nd* internal machinery (WLOCK'ed lle provided, perform packet queuing). We separate case 1 and implement it inside its only customer - nd6_output. This leads to some code duplication (especially SEND stuff, which should be hooked to output in a different way), but simplifies locking and control flow logic for nd6_output_lle. Reviewed by: ae MFC after: 3 weeks Sponsored by: Yandex LLC	2014-02-13 19:09:04 +00:00
Andrey V. Elsukov	e4c77ca0c0	Drop packets to multicast address whose scop field contains the reserved value 0. MFC after: 1 week Sponsored by: Yandex LLC	2014-02-13 14:10:44 +00:00
Christian Brueffer	d37872314f	Only count table lookups when we're actually processing packets. PR: 183462 Submitted by: Sven-Thorsten Dietrich <thebigcorporation at gmail.com> Reviewed by: bms MFC after: 1 month	2014-02-10 14:47:51 +00:00
Christian Brueffer	1b55364ed9	For IPv6, return the same error code as IPv4 when mrouter is not initialized. PR: 178472 Submitted by: Sven-Thorsten Dietrich <sven at vyatta.com> Reviewed by: bms	2014-02-10 14:36:51 +00:00
Alexander V. Chernikov	9dffa6a3f3	Simplify nd6_output_lle: * Check ND6_IFF_IFDISABLED before acquiring any locks * Assume m is always non-NULL * Remove 'bad' case not used anymore * Simplify if_output conditional MFC after: 2 weeks Sponsored by: Yandex LLC	2014-02-10 12:52:33 +00:00
Gleb Smirnoff	5d6d7e756b	o Revamp API between flowtable and netinet, netinet6. - ip_output() and ip6_output() simply call flowtable_lookup(), passing mbuf and address family. That's the only code under #ifdef FLOWTABLE in the protocols code now. o Revamp statistics gathering and export. - Remove hand made pcpu stats, and utilize counter(9). - Snapshot of statistics is available via 'netstat -rs'. - All sysctls are moved into net.flowtable namespace, since spreading them over net.inet isn't correct. o Properly separate at compile time INET and INET6 parts. o General cleanup. - Remove chain of multiple flowtables. We simply have one for IPv4 and one for IPv6. - Flowtables are allocated in flowtable.c, symbols are static. - With proper argument to SYSINIT() we no longer need flowtable_ready. - Hash salt doesn't need to be per-VNET. - Removed rudimentary debugging, which is quite useless in the dtrace era. The runtime behavior of flowtable shouldn't be changed by this commit. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-07 15:18:23 +00:00
Andrey V. Elsukov	74a976fffd	Unlock entry before retry. Submitted by: melifaro MFC after: 1 week	2014-02-07 10:58:46 +00:00
Andrey V. Elsukov	51eecdc35a	Take exclusive lock only when lle isn't NULL. We don't need write access to lle in most cases. MFC after: 1 week Sponsored by: Yandex LLC	2014-02-02 07:28:04 +00:00
Alexander V. Chernikov	f6b84910bb	Further rework netinet6 address handling code: * Set ia address/mask values BEFORE attaching to address lists. Inet6 address assignment is not atomic, so the simplest way to do this atomically is to fill in ia before attach. * Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code). * Do some renamings: in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here) in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code) in6_ifaddloop -> nd6_add_ifa_lle in6_ifremloop -> nd6_rem_ifa_lle * Split working with LLE and route announce code for last two. Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour. * Call device SIOCSIFADDR handler IFF we're adding first address. In IPv4 we have to call it on every address change since ARP record is installed by arp_ifinit() which is called by given handler. IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so there is no reason to call SIOCSIFADDR often.	2014-01-19 16:07:27 +00:00
Alexander V. Chernikov	0c5d4bde90	Use in6_localip() instead of hand-rolled cycle. MFC after: 2 weeks	2014-01-18 20:54:55 +00:00
Alexander V. Chernikov	9080e7d023	Add in6_prepare_ifra() function to ease preparing in-kernel IPv6 address requests. MFC after: 2 weeks	2014-01-18 20:32:59 +00:00
Alexander V. Chernikov	b6a16fc853	Do some style(9) not done in r260851 to improve readability. MFC after: 2 weeks	2014-01-18 15:57:43 +00:00
Alexander V. Chernikov	60d7c722a5	Split in6_update_ifa() into smaller pieces leaving functionality intact. Discussed with: ae MFC after: 2 weeks	2014-01-18 15:52:52 +00:00
Andrey V. Elsukov	e74966f60b	Mechanically replace direct accessing to if_xname to using if_name() macro.	2014-01-10 12:33:28 +00:00
John-Mark Gurney	f2effe745c	revert part of r260485 which changes how part of the header gets included.. netstat uses -DKERNEL=1 to get these parts and breaks the build w/o it... melifaro@ says that ae@ is probably asleep, and the PR doesn't have this part of the patch... Probably a local change got in by accident.. PR: 185148 Pointy hat to: ae@	2014-01-09 22:41:18 +00:00
Andrey V. Elsukov	78415d1082	Remove extra nesting from X_ip6_mforward() function. Also remove disabled definitions from ip6_mroute.h. PR: 185148 Sponsored by: Yandex LLC	2014-01-09 15:38:28 +00:00
Andrey V. Elsukov	0a6b0ffa54	Add MRT6_DLOG() macro for debugging. Reduce number of MRT6DEBUG ifdefs and fix some broken format strings. MFC after: 1 week Sponsored by: Yandex LLC	2014-01-09 14:58:06 +00:00
Alexander V. Chernikov	1dc8f6a82c	Introduce IN6_MASK_ADDR() macro to unify various hand-rolled code to do IPv6 addr & mask in different places. MFC after: 2 weeks	2014-01-08 22:13:32 +00:00
Andrey V. Elsukov	b88aef1dcf	Use pointer to struct sockaddr_in6 in lla_lookup() call. This prevents from triggering KASSERT in in6_lltable_lookup.	2014-01-03 02:40:56 +00:00
Andrey V. Elsukov	e2d14d9317	Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with LLE_CREATE flag. MFC after: 1 week	2014-01-03 02:32:05 +00:00
Andrey V. Elsukov	ea0c377602	lla_lookup() does modification only when LLE_CREATE is specified. Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing lla_lookup() without LLE_CREATE flag. Reviewed by: glebius, adrian MFC after: 1 week Sponsored by: Yandex LLC	2014-01-02 08:40:37 +00:00
Adrian Chadd	c445d2520d	Use an RLOCK here instead of an RWLOCK - matching all the other calls to lla_lookup(). This drastically reduces the very high lock contention when doing parallel TCP throughput tests (> 1024 sockets) with IPv6. Tested: * parallel IPv6 TCP bulk data exchange, 8192 sockets MFC after: 1 week Sponsored by: Netflix, Inc.	2014-01-01 00:56:26 +00:00
Bjoern A. Zeeb	010c2b8192	Correct warnings comparing unsigned variables < 0 constantly reported while building kernels. All instances removed are indeed unsigned so the expressions could not be true. MFC after: 1 week	2013-12-25 20:08:44 +00:00
Dimitry Andric	6c5a340e56	In sys/netinet6/in6_mcast.c, in6m_is_ifp_detached() is only used whenever KTR is defined, so put it between #ifdef KTR guards. This avoids a warning about an unused function if KTR is not enabled. MFC after: 3 days	2013-12-24 20:30:13 +00:00
Andrey V. Elsukov	569aad57d2	Free mbuf in case of error. MFC after: 1 week	2013-12-17 10:53:17 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Andrey V. Elsukov	ee674966f4	Fix panic with RADIX_MPATH, when RTFREE_LOCKED() called for already unlocked route. Use in6_rtalloc() instead of in6_rtalloc1. This helps simplify the code and remove several now unused variables. PR: 156283 MFC after: 2 weeks	2013-11-11 12:49:00 +00:00
Gleb Smirnoff	555036b5f6	Remove never used ioctls that originate from KAME. The proof of their zero usage was exp-run from misc/183538.	2013-11-11 05:39:42 +00:00
Michael Tuexen	b54ddf225f	Changes from upstream to improve compilation when INET or INET6 or none of them is defined. MFC after: 3 days	2013-11-02 20:12:19 +00:00
Gleb Smirnoff	c3322cb91c	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
Gleb Smirnoff	eedc7fd9e8	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare for this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Andrey V. Elsukov	baa09f1891	Initialize inc_fibnum for properly handling ICMP6_PACKET_TOO_BIG errors in multifib environment. PR: 183265 MFC after: 1 week	2013-10-25 01:02:25 +00:00
Gleb Smirnoff	7caf4ab7ac	- Utilize counter(9) to accumulate statistics on interface addresses. Add four counters to struct ifaddr. This kills '+=' on variables shared between processors for every packet. - Nuke struct if_data from struct ifaddr. - In ip_input() do not put a reference on ifaddr, instead update statistics right now in place and do IN_IFADDR_RUNLOCK(). This removes atomic(9) for every packet. [1] - To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in rtsock.c fill if_data fields using counter_u64_fetch(). - Accidentally fix bug in COMPAT_32 version of NET_RT_IFLISTL, which took if_data not from the ifaddr, but from ifaddr's ifnet. [2] Submitted by: melifaro [1], pluknet[2] Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 11:37:57 +00:00
Gleb Smirnoff	4675896098	Remove ifa_init() and provide ifa_alloc() that will allocate and setup struct ifaddr internally. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:31:42 +00:00
Gleb Smirnoff	6ed910fabe	Hide 'struct ifaddr' definition from userland. Two tools left that use it, namely ipftest(1) and ifmcstat(1). These sniff structure definition using _WANT_IFADDR define. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:19:24 +00:00
Gleb Smirnoff	3fa98cf9ac	Remove unsigned < 0 check.	2013-10-15 10:12:19 +00:00
Gleb Smirnoff	ca695e0807	Remove useless check of ia6 against NULL, right after dereferencing it.	2013-10-15 10:11:23 +00:00
Gleb Smirnoff	0218539652	Now counter_u64_t is known to userland, thus remove hack from r253086. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:09:33 +00:00
Hiroki Sato	6378e1f369	Do not try to detach if the interface does not support IPv6. Tested by: hselasky PR: usb/182820 Approved by: re (glebius)	2013-10-10 09:43:15 +00:00
Gleb Smirnoff	491b520174	Fix mbuf leak. Submitted by: Loganaden Velvindron <logan elandsys.com> Obtained from: NetBSD Approved by: re (kib)	2013-10-07 12:07:40 +00:00
Bjoern A. Zeeb	fd291ae3ec	Update comment from draft to RFC number. Submitted by: Loganaden Velvindron (logan elandsys.com) Approved by: re (gjb) MFC after: 6 days	2013-09-22 14:53:07 +00:00
Mikolaj Golub	4d3dfd450a	Unregister inet/inet6 pfil hooks on vnet destroy. Discussed with: andre Approved by: re (rodrigc)	2013-09-13 18:45:10 +00:00
Dag-Erling Smørgrav	1a05c762b9	Fix the length calculation for the final block of a sendfile(2) transmission which could be tricked into rounding up to the nearest page size, leaking up to a page of kernel memory. [13:11] In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR and SIOCSIFNETMASK at the socket layer rather than pass them on to the link layer without validation or credential checks. [SA-13:12] Prevent cross-mount hardlinks between different nullfs mounts of the same underlying filesystem. [SA-13:13] Security: CVE-2013-5666 Security: FreeBSD-SA-13:11.sendfile Security: CVE-2013-5691 Security: FreeBSD-SA-13:12.ifioctl Security: CVE-2013-5710 Security: FreeBSD-SA-13:13.nullfs Approved by: re	2013-09-10 10:05:59 +00:00
John Baldwin	fa302f207f	Use an unsigned long when indexing into mfchashtbl[] and mf6ctable[]. This matches the types used when computing hash indices and the type of the maximum size of mfchashtbl[]. PR: kern/181821 Submitted by: Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4) MFC after: 1 week	2013-09-05 14:16:37 +00:00
John Baldwin	fd77bbb967	Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery	2013-08-26 18:16:05 +00:00
Mark Johnston	57f6086735	Implement the ip, tcp, and udp DTrace providers. The probe definitions use dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD. Tested by: gnn, hiren MFC after: 1 month	2013-08-25 21:54:41 +00:00
Michael Tuexen	1a94cdbea7	Provide human readable debug output.	2013-08-25 12:44:03 +00:00
Andre Oppermann	9850f95989	For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits. The upper 32bits are not occupied for now. Sponsored by: The FreeBSD Foundation	2013-08-25 09:49:00 +00:00
Andre Oppermann	1b4381afbb	Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. The changes in particular are: o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on. o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header. o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others). o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros. o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ. o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid. o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities. o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. 
The drivers can be corrected one by one and MFC'd without issue. o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures). Sponsored by: The FreeBSD Foundation	2013-08-24 19:51:18 +00:00
Xin LI	acde2476c4	Fix an integer overflow in computing the size of a temporary buffer, which can result in a buffer that is too small for the requested operation. Security: CVE-2013-3077 Security: FreeBSD-SA-13:09.ip_multicast	2013-08-22 00:51:37 +00:00
Andre Oppermann	86bd049144	Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings. Consistently use it within IP, IPv6 and ethernet protocols. Discussed with: trociny, glebius	2013-08-19 13:27:32 +00:00
Andre Oppermann	88388bdcbe	Move the global M_SKIP_FIREWALL mbuf flags to a protocol layer specific flag instead. The flag is only used within the IP and IPv6 layer 3 protocols. Because some firewall packages treat IPv4 and IPv6 packets the same the flag should have the same value for both. Discussed with: trociny, glebius	2013-08-19 11:08:36 +00:00
Hiroki Sato	5a04191532	Return 0 in nbi->expire when la_expire == 0. Conversion from time_uptime to time_second should not be performed in this case.	2013-08-17 07:14:45 +00:00
Hiroki Sato	ffa0165ae0	Fix incompatibility in ICMPV6CTL_ND6_PRLIST sysctl, and SIOCGPRLST_IN6, SIOCGDRLST_IN6, and SIOCGNBRINFO_IN6 ioctl. These userland interfaces treat expiration times in time_second, not time_uptime.	2013-08-06 17:10:52 +00:00

1381 Commits