freebsd-dev

Author	SHA1	Message	Date
Alexander V. Chernikov	94017572ab	Fix flowtable part missed in r294706.	2016-01-25 09:31:32 +00:00
Alexander V. Chernikov	61eee0e202	MFP r287070,r287073: split radix implementation and route table structure. There are number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don't have _any_ requirements and first 2 does not use radix locking. On the other hand, routing structure do have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not known anything about its consumers internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still uses the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.	2016-01-25 06:33:15 +00:00
Alexander V. Chernikov	809da2a3e0	Remove unused radix_mpath definitions.	2016-01-25 05:28:19 +00:00
Marcelo Araujo	d62edc5eb5	Add an IOCTL rr_limit to let users fine tuning the number of packets to be sent using roundrobin protocol and set a better granularity and distribution among the interfaces. Tuning the number of packages sent by interface can increase throughput and reduce unordered packets as well as reduce SACK. Example of usage: # ifconfig bge0 up # ifconfig bge1 up # ifconfig lagg0 create # ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \ 192.168.1.1 netmask 255.255.255.0 # ifconfig lagg0 rr_limit 500 Reviewed by: thompsa, glebius, adrian (old patch) Approved by: bapt (mentor) Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D540	2016-01-23 04:18:44 +00:00
Bjoern A. Zeeb	009e81b164	MFH @r294567	2016-01-22 15:11:40 +00:00
Bjoern A. Zeeb	1f12da0e82	Just checkpoint the WIP in order to be able to make the tree update easier. Note: this is currently not in a usable state as certain teardown parts are not called and the DOMAIN rework is missing. More to come soon and find its way to head. Obtained from: P4 //depot/user/bz/vimage/... Sponsored by: The FreeBSD Foundation	2016-01-22 15:00:01 +00:00
Alexander V. Chernikov	b7d076ed19	Clean up original route path selection logic a bit. NULL pointer dereference claimed by Coverity was possible if one (or several) next-hops for had their weights set to 0. CID: 1348482	2016-01-15 13:47:11 +00:00
Alexander V. Chernikov	fcbfdb37a1	Fix panic in IP redirect. Panic was introduced in r293466. Found by: Yamagi Burmeister <lists at yamagi.org>>	2016-01-14 16:31:00 +00:00
Alexander V. Chernikov	10e0e23528	Remove now-unused wrappers for various routing functions.	2016-01-14 08:54:44 +00:00
Alexander V. Chernikov	0eb64f4e44	Remove RTF_RNH_LOCKED support from rtalloc1_fib(). Last caller using it was eliminated in r293471. Sponsored by: Yandex LLC	2016-01-13 14:32:48 +00:00
Alexander V. Chernikov	59747033cd	Bring RADIX_MPATH support to new routing KPI to ease migration. Move actual rte selection process from rtalloc_mpath_fib() to the rt_path_selectrte() function. Add public rt_mpath_select() to use in fibX_lookup_ functions.	2016-01-11 08:45:28 +00:00
Alexander V. Chernikov	e5f3746abd	Do not rewrite all ro_flags.	2016-01-11 08:00:13 +00:00
Alexander V. Chernikov	64e9493420	Fix userland build broken by r293470. Pointy hat to: melifaro	2016-01-09 18:42:12 +00:00
Alexander V. Chernikov	36402a681f	Finish r275196: do not dereference rtentry in if_output() routines. The only piece of information that is required is rt_flags subset. In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags to check if this particular mbuf needs to be dropped (and what error should be returned). Note that if_loop() will always return EHOSTUNREACH for "reject" routes regardless of RTF_HOST flag existence. This is due to upcoming routing changes where RTF_HOST value won't be available as lookup result. All other functions require RTF_GATEWAY flag to check if they need to return EHOSTUNREACH instead of EHOSTDOWN error. There are 11 places where non-zero 'struct route' is passed to if_output(). For most of the callers (forwarding, bpf, arp) does not care about exact error value. In fact, the only place where this result is propagated is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()). Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary rte flags to ro_flags. Call this function in ip_output() after looking up/ verifying rte. Reviewed by: ae	2016-01-09 16:34:37 +00:00
Alexander V. Chernikov	ea8d14925c	Remove sys/eventhandler.h from net/route.h Reviewed by: ae	2016-01-09 09:34:39 +00:00
Alexander V. Chernikov	f2b2e77a41	(Temporarily) remove route_redirect_event eventhandler. Such handler should pass different set of variables, instead of directly providing 2 locked route entries. Given that it hasn't been really used since at least 2012, remove current code. Will re-add it after finishing most major routing-related changes. Discussed with: np	2016-01-09 06:26:40 +00:00
Alexander V. Chernikov	16703ea811	Please Coverity by removing unneccessary check (rt_key() is always set). Coverity CID: 1347797	2016-01-09 05:39:06 +00:00
Alexander V. Chernikov	048738b546	Do more fine-grained locking in rtrequest1_fib(). Last consumer using RTF_RNH_LOCKED flag was eliminated in r291643. Restrict passing RTF_RNH_LOCKED to rtrequest1_fib() and do better locking for RTM_ADD / RTM_DELETE cases.	2016-01-08 16:25:11 +00:00
Alexander V. Chernikov	9a1b64d5a0	Add rib_lookup_info() to provide API for retrieving individual route entries data in unified format. There are control plane functions that require information other than just next-hop data (e.g. individual rtentry fields like flags or prefix/mask). Given that the goal is to avoid rte reference/refcounting, re-use rt_addrinfo structure to store most rte fields. If caller wants to retrieve key/mask or gateway (which are sockaddrs and are allocated separately), it needs to provide sufficient-sized sockaddrs structures w/ ther pointers saved in passed rt_addrinfo. Convert: * lltable new records checks (in_lltable_rtcheck(), nd6_is_new_addr_neighbor(). * rtsock pre-add/change route check. * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because 1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should not be multiple host routes for such hosts 2) if we have multiple routes we should inspect them (which is not done). 3) the entire idea of abusing KRT as storage for ND proxy seems odd. Userland programs should be used for that purpose).	2016-01-04 15:03:20 +00:00
Alexander V. Chernikov	0d4df0290e	Handle IPV6_PATHMTU option by spliting ip6_getpmtu_ctl() from ip6_getpmtu(). Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to the caller. Currently, ip6_getpmtu() has 2 totally different use cases: 1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU and return it, w/o any reusability. 2) Actual ip6_output() data path where we (nearly) always use the provided route lookup data. If this data is not 'valid' we need to perform another lookup and save the result (which cannot be re-used by ip6_output()). Given that, handle 1) by calling separate function doing rte lookup itself. Resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both ip6_getpmtu_ctl() and ip6_getpmtu(). For 2) instead of storing ref'ed rte, store mtu (the only needed data from the lookup result) inside newly-added ro_mtu field. 'struct route' was shrinked by 8(or 4 bytes) in r292978. Grow it again by 4 bytes. New ro_mtu field will be used in other places like ip/tcp_output (EMSGSIZE handling from output routines). Reviewed by: ae	2016-01-03 09:54:03 +00:00
Alexander V. Chernikov	6cdb18544d	Remove second EVENTHANDLER_REGISTER slipped in r292978. Describe the reason of doing unconditional M_PREPEND in ether_output().	2016-01-01 10:15:06 +00:00
Marcelo Araujo	25656def0d	Clean up unused-but-set-variable spotted by gcc4.9. Reviewed by: ngie Approved by: rodrigc (mentor) Differential Revision: https://reviews.freebsd.org/D4719	2015-12-31 07:03:41 +00:00
Alexander V. Chernikov	4fb3a8208c	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with is length) to prepend to mbuf. These two changes permits routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address change are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which ether returs error or fully-prepared link header. Add <arp\|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of there have ther header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102	2015-12-31 05:03:27 +00:00
Marcelo Araujo	2bfd3dfb9f	Wrap using #ifdef 'notyet' those variables and statements not yet implemented to lower the compiler warnings. It fix the case of unused-but-set-variable spotted by gcc4.9. Reviewed by: ngie, ae Approved by: bapt (mentor) Differential Revision: https://reviews.freebsd.org/D4720	2015-12-31 02:01:20 +00:00
Alexander V. Chernikov	a18742e938	Add SFF-8024 Extended Specification Compliance Submitted by: markb_mellanox.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4666	2015-12-28 09:26:07 +00:00
Bjoern A. Zeeb	f501e6f136	If vnets are torn down while ifconfig runs an ioctl to say, destroy an epair(4), we may hit if_detach_internal() without holding a lock and by the time we aquire it the interface might be gone. We should not panic() in this case as it is our fault for not holding the lock all the way. It is not ideal to return silently without error to user space, but other callers will all ignore the return values so do not change the entire KPI for little benefit for now. The ifp will be dealt with one way or another still. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4529	2015-12-22 15:03:45 +00:00
Bjoern A. Zeeb	616bc4f476	If bootverbose is enabled every vnet startup and virtual interface creation will print extra lines on the console. We are generally not interested in this (repeated) information for each VNET. Thus only print it for the default VNET. Virtual interfaces on the base system will remain printing information, but e.g. each loopback in each vnet will no longer cause a "bpf attached" line. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4531	2015-12-22 15:00:04 +00:00
Bjoern A. Zeeb	76d68eccbd	Simplify bringup order by removing a SYSINIT making it a static list initialization. Mfp4 @180384,180385: There is no need for a dedicated SYSINIT here. The list can be initialized statically. Sponsored by: CK Software GmbH Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4528	2015-12-22 14:57:04 +00:00
Steven Hartland	d6e82913c1	Revert r292275 & r292379 glebius has concerns about these changes so reverting those can be discussed and addressed. Sponsored by: Multiplay	2015-12-17 14:41:30 +00:00
Alexander V. Chernikov	427c2f4ef0	Provide additional lle data in IPv6 lltable dump used by ndp(8). Before the change, things like lle state were queried via SIOCGNBRINFO_IN6 by ndp(8) for _each_ lle entry in dump. This ioctl was added in 1999, probably to avoid touching rtsock code. This change maps SIOCGNBRINFO_IN6 data to standard rtsock dump the following way: expire (already) maps to rtm_rmx.rmx_expire isrouter -> rtm_flags & RTF_GATEWAY asked -> rtm_rmx.rmx_pksent state -> rtm_rmx.rmx_state (maps to rmx_weight via define) Reviewed by: ae	2015-12-16 10:14:16 +00:00
Alexander V. Chernikov	0792bcbb54	Convert if_stf(4) to new routing api.	2015-12-16 09:18:20 +00:00
Steven Hartland	52e53e2de0	Fix lagg failover due to missing notifications When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved. This results is slow failover, dropped packets and network outages for the lagg interface when the primary link goes down. We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire a ifnet_link_event which are now monitored by rip and nd6. Upon receiving these events each protocol trigger the relevant notifications: * inet4 => Gratuitous ARP * inet6 => Unsolicited Neighbour Announce This also fixes the carp IPv6 NA's that stopped working after r251584 which added the ipv6_route__llma route. The new behavour can be controlled using the sysctls: * net.link.ether.inet.arp_on_link * net.inet6.icmp6.nd6_on_link Also removed unused param from lagg_port_state and added descriptions for the sysctls while here. PR: 156226 MFC after: 1 month Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4111	2015-12-15 16:02:11 +00:00
Alexander V. Chernikov	6af272d88e	Fix PINNED routes handling. Before r291643, adding new interface prefix had the following logic: try_add: EEXIST && (PINNED) { try_del(w/o PINNED flag) if (OK) try_add(PINNED) } In r291643, deletion was performed w/ PINNED flag held which leaded to new interface prefixes (like ::1) overriding older ones. Fix this by requesting deletion w/o RTF_PINNED. PR: kern/205285 Submitted by: Fabian Keil <fk at fabiankeil.de>	2015-12-13 16:37:01 +00:00
Alexander V. Chernikov	12cb7521c2	Remove LLE read lock from IPv6 fast path. LLE structure is mostly unchanged during its lifecycle: there are only 2 things relevant for fast path lookup code: 1) link-level address change. Since r286722, these updates are performed under AFDATA WLOCK. 2) Some sort of feedback indicating that this particular entry is used so we send NS to perform reachability verification instead of expiring entry. The only signal that is needed from fast path is something like binary yes/no. The latter is solved by the following changes: Special r_skip_req (introduced in D3688) value is used for fast path feedback. It is read lockless by fast path, but updated under req_mutex mutex. If this field is non-zero, then fast path will acquire lock and set it back to 0. After transitioning to STALE state, callout timer is armed to run each V_nd6_delay seconds to make sure that if packet was transmitted at the start of given interval, we would be able to switch to PROBE state in V_nd6_delay seconds as user expects. (in STALE state) timer is rescheduled until original V_nd6_gctimer expires keeping lle in STALE state (remaining timer value stored in lle_remtime). (in STALE state) timer is rescheduled if packet was transmitted less that V_nd6_delay seconds ago to make sure we transition to PROBE state exactly after V_n6_delay seconds. As a result, all packets towards lle in REACHABLE/STALE/PROBE states are handled by fast path without acquiring lle read lock. Differential Revision: https://reviews.freebsd.org/D3780	2015-12-13 07:39:49 +00:00
Alexander V. Chernikov	65ff3638df	Merge helper fib* functions used for basic lookups. Vast majority of rtalloc(9) users require only basic info from route table (e.g. "does the rtentry interface match with the interface I have?". "what is the MTU?", "Give me the IPv4 source address to use", etc..). Instead of hand-rolling lookups, checking if rtentry is up, valid, dealing with IPv6 mtu, finding "address" ifp (almost never done right), provide easy-to-use API hiding all the complexity and returning the needed info into small on-stack structure. This change also helps hiding route subsystem internals (locking, direct rtentry accesses). Additionaly, using this API improves lookup performance since rtentry is not locked. (This is safe, since all the rtentry changes happens under both radix WLOCK and rtentry WLOCK). Sponsored by: Yandex LLC	2015-12-08 10:50:03 +00:00
Alexander V. Chernikov	f8aee88f0b	Remove LLE read lock from IPv4 fast path. LLE structure is mostly unchanged during its lifecycle. To be more specific, there are 2 things relevant for fast path lookup code: 1) link-level address change. Since r286722, these updates are performed under AFDATA WLOCK. 2) Some sort of feedback indicating that this particular entry is used so we re-send arp request to perform reachability verification instead of expiring entry. The only signal that is needed from fast path is something like binary yes/no. The latter is solved by the following changes: 1) introduce special r_skip_req field which is read lockless by fast path, but updated under (new) req_mutex mutex. If this field is non-zero, then fast path will acquire lock and set it back to 0. 2) introduce simple state machine: incomplete->reachable<->verify->deleted. Before that we implicitely had incomplete->reachable->deleted state machine, with V_arpt_keep between "reachable" and "deleted". Verification was performed in runtime 5 seconds before V_arpt_keep expire. This is changed to "change state to verify 5 seconds before V_arpt_keep, set r_skip_req to non-zero value and check it every second". If the value is zero - then send arp verification probe. These changes do not introduce any signifficant control plane overhead: typically lle callout timer would fire 1 time more each V_arpt_keep (1200s) for used lles and up to arp_maxtries (5) for dead lles. As a result, all packets towards "reachable" lle are handled by fast path without acquiring lle read lock. Additional "req_mutex" is needed because callout / arpresolve_slow() or eventhandler might keep LLE lock for signifficant amount of time, which might not be feasible for fast path locking (e.g. having rmlock as ether AFDATA or lltable own lock). Differential Revision: https://reviews.freebsd.org/D3688	2015-12-05 09:50:37 +00:00
Alexander V. Chernikov	4b3dc89847	Move RTF_PINNED handling to generic route code. This eliminates last RTF_RNH_LOCKED rtrequest1_fib() user.	2015-12-02 08:17:31 +00:00
Enji Cooper	af5c99e53f	Fix LINT-NOIP kernels after r291467 rn is only used if INET or INET6 are defined Sponsored by: EMC / Isilon Storage Division	2015-12-01 05:59:53 +00:00
Alexander V. Chernikov	674e0823c1	Move flowtable rte checks to separate function.	2015-11-30 05:59:22 +00:00
Alexander V. Chernikov	e8b0643eee	Add new rt_foreach_fib_walk_del() function for deleting route entries by filter function instead of picking into routing table details in each consumer. Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED user). This simplifies future nexthops/mulitipath changes and rtrequest1_fib() locking refactoring. Actual changes: Add "rt_chain" field to permit rte grouping while doing batched delete from routing table (thus growing rte 200->208 on amd64). Add "rti_filter" / "rti_filterdata" / "rti_spare" fields to rt_addrinfo to pass filter function to various routing subsystems in standard way. Convert all rt_expunge() customers to new rt_addinfo-based api and eliminate rt_expunge().	2015-11-30 05:51:14 +00:00
Enji Cooper	766b4e4b5c	Fix building sys/modules/if_enc by adding missing headers X-MFC with: r291292, r291299 (if that ever happens) Pointyhat to: ae	2015-11-25 21:16:10 +00:00
Andrey V. Elsukov	03b7b4bf05	Fix the build.	2015-11-25 11:31:07 +00:00
Andrey V. Elsukov	ef91a9765d	Overhaul if_enc(4) and make it loadable in run-time. Use hhook(9) framework to achieve ability of loading and unloading if_enc(4) kernel module. INET and INET6 code on initialization registers two helper hooks points in the kernel. if_enc(4) module uses these helper hook points and registers its hooks. IPSEC code uses these hhook points to call helper hooks implemented in if_enc(4).	2015-11-25 07:31:59 +00:00
Fabien Thomas	d6d3f24890	Implement the sadb_x_policy_priority field as it is done in Linux: lower priority policies are inserted first. Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu> Reviewed by: ae Sponsored by: Stormshield	2015-11-17 14:39:33 +00:00
Alexander V. Chernikov	e4790abf19	Pass provided af instead of AF_UNSPEC to setwa_f callback.	2015-11-14 18:16:17 +00:00
Alexander V. Chernikov	8ad43f2d0a	Move iflladdr_event eventhandler invocation to if_setlladdr. Suggested by: glebius	2015-11-14 13:34:03 +00:00
Randall Stewart	7c4676ddee	This fixes several places where callout_stops return is examined. The new return codes of -1 were mistakenly being considered "true". Callout_stop now returns -1 to indicate the callout had either already completed or was not running and 0 to indicate it could not be stopped. Also update the manual page to make it more consistent no non-zero in the callout_stop or callout_reset descriptions. MFC after: 1 Month with associated callout change.	2015-11-13 22:51:35 +00:00
Alexander V. Chernikov	b13c5b5db2	Use lladdr_event to propagate gratiotus arp. Differential Revision: https://reviews.freebsd.org/D4019	2015-11-09 10:11:14 +00:00
Alexander V. Chernikov	ddd208f7ad	Unify setting lladdr for AF_INET[6].	2015-11-07 11:12:00 +00:00
Steven Hartland	c1be893c44	Add sysctl to control LACP strict compliance default Add net.link.lagg.lacp.default_strict_mode which defines the default value for LACP strict compliance for created lagg devices. Also: * Add lacp_strict option to ifconfig(8). * Fix lagg(4) creation examples. * Minor style(9) fix. MFC after: 1 week	2015-11-06 15:33:27 +00:00
George V. Neville-Neil	33872124a5	Replace the fastforward path with tryforward which does not require a sysctl and will always be on. The former split between default and fast forwarding is removed by this commit while preserving the ability to use all network stack features. Differential Revision: https://reviews.freebsd.org/D4042 Reviewed by: ae, melifaro, olivier, rwatson MFC after: 1 month Sponsored by: Rubicon Communications (Netgate)	2015-11-05 07:26:32 +00:00
Randall Stewart	d1a6f62c45	Fix three flowtable bugs, a) one lookup issue, b) a two cleaner issue. MFC after: 3 days Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D4014	2015-11-02 21:21:00 +00:00
Alexander V. Chernikov	bb3d23fd35	Fix lladdr change propagation for on vlans on top of it. Fix lladdr update when setting mac address manually. Fix lladdr_event for slave ports addition. MFC after: 4 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D4004	2015-11-01 19:59:04 +00:00
Bryan Drewery	2780ba06c7	Avoid passing an uninitialized 'i'. Currently nothing was depending on it anyhow. Coverity CID: 1331562	2015-10-29 18:58:18 +00:00
Andrey V. Elsukov	50bc87bc43	Check the size of data available in mbuf, before using them. PR: 202667 MFC after: 1 week	2015-10-28 17:55:37 +00:00
Kristof Provost	2602284308	pf: Fix compliation warning with gcc While fixing the PF_ANEQ() macro I messed up the parentheses, leading to compliation warnings with gcc. Spotted by: ian Pointy Hat: kp	2015-10-25 18:09:03 +00:00
Kristof Provost	7d7624233a	PF_ANEQ() macro will in most situations returns TRUE comparing two identical IPv4 packets (when it should return FALSE). It happens because PF_ANEQ() doesn't stop if first 32 bits of IPv4 packets are equal and starts to check next 3*32 bits (like for IPv6 packet). Those bits containt some garbage and in result PF_ANEQ() wrongly returns TRUE. Fix: Check if packet is of AF_INET type and if it is then compare only first 32 bits of data. PR: 204005 Submitted by: Miłosz Kaniewski	2015-10-25 13:14:53 +00:00
Ed Maste	40a02d00a5	if_tap: correct typo in sysctl description (Enably) Sponsored by: The FreeBSD Foundation	2015-10-21 19:56:16 +00:00
Alexander V. Chernikov	f221bcaa06	Remove several compat functions from pre-fib era.	2015-10-17 17:26:44 +00:00
Hiroki Sato	b7a581eaa6	Fix a panic when destroying a lagg interface. Differential Revision: https://reviews.freebsd.org/D3883	2015-10-16 01:16:01 +00:00
Kristof Provost	c110fc49da	pf: Fix TSO issues In certain configurations (mostly but not exclusively as a VM on Xen) pf produced packets with an invalid TCP checksum. The problem was that pf could only handle packets with a full checksum. The FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only addresses, length and protocol). Certain network interfaces expect to see the pseudo-header checksum, so they end up producing packets with invalid checksums. To fix this stop calculating the full checksum and teach pf to only update TCP checksums if TSO is disabled or the change affects the pseudo-header checksum. PR: 154428, 193579, 198868 Reviewed by: sbruno MFC after: 1 week Relnotes: yes Sponsored by: RootBSD Differential Revision: https://reviews.freebsd.org/D3779	2015-10-14 16:21:41 +00:00
Hiroki Sato	023d10cbc7	Fix a bug that caused reinitialization failure of MAC addresses on the lagg interface when removing the primary port. PR: 201916 Differential Revision: https://reviews.freebsd.org/D3301	2015-10-07 06:32:34 +00:00
Marcelo Araujo	973532fc7d	Remove per complete the fec aggregation protocol. The remove began with revision r271733. NOTE: This patch must never be merge to 10-Stable Reviewed by: glebius Approved by: bapt (mentor) Relnotes: Yes Sponsored by: EuroBSDCon Sweden. Differential Revision: D3786	2015-10-04 08:00:29 +00:00
Hiroki Sato	f1aaad0cd9	Add IFCAP_LINKSTATE support.	2015-10-03 09:15:23 +00:00
Andrey V. Elsukov	1a6fb597b0	Always detach encap handler when reconfiguring tunnel. Reported by: hrs MFC after: 1 week	2015-10-03 03:57:58 +00:00
Alexander V. Chernikov	1558cb2448	Eliminate nd6_nud_hint() and its TCP bindings. Initially function was introduced in r53541 (KAME initial commit) to "provide hints from upper layer protocols that indicate a connection is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability Confirmation). However, it was converted to do nothing (e.g. just return) in r122922 (tcp_hostcache implementation) back in 2003. Some defines were moved to tcp_var.h in r169541. Then, it was broken (for non-corner cases) by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So, right now this code is broken and has no "real" base users. Differential Revision: https://reviews.freebsd.org/D3699	2015-09-27 05:29:34 +00:00
Alexander V. Chernikov	1fe201c322	Simplify the way of attaching IPv6 link-layer header. Problem description: How do we currently perform layer 2 resolution and header imposition: For IPv4 we have the following chain: ip_output() -> (ether\|atm\|whatever)_output() -> arpresolve() Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data. For IPv6 situation is more complex: ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr() We have ip6_ouput() which calls nd6_output() instead of link output routine. nd6_output() does the following: * checks if lle exists, creates it if needed (similar to arpresolve()) * performes lle state transitions (similar to arpresolve()) * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder). After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again. As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()). Fix this behavior by eliminating first ND lookup. To be more specific: * make all nd6_output() consumers use nd6_output_ifp() instead * rename nd6_output[_slow]() to nd6_resolve_[slow]() * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers * Make all nd6_storelladdr() users use nd6_resolve() * eliminate nd6_storelladdr() The resulting callchain is the following: ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve() Error handling: Currently sending packet to non-existing la results in ip6_<output\|forward> -> nd6_output() -> nd6_output _lle() which returns 0. In new scenario packet is propagated to <ether\|whatever>_output() -> nd6_resolve() which will return EWOULDBLOCK, and that result will be converted to 0. (And EWOULDBLOCK is actually used by IB/TOE code). Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D1469	2015-09-16 14:26:28 +00:00
Andrey V. Elsukov	b71bed24a6	Use KASSERT for some checks, that are late to do. Discussed with: melifaro, glebius	2015-09-16 13:17:00 +00:00
Oleg Bulyzhin	3f70ebbf05	Remove superfluous m_freem(). MFC after: 1 month	2015-09-16 10:07:45 +00:00
Alexander V. Chernikov	59c180c35c	Unify loopback route switching: * prepare gateway before insertion * use RTM_CHANGE instead of explicit find/change route * Remove fib argument from ifa_switch_loopback_route added in r264887: if old ifp fib differes from new one, that the caller is doing something wrong * Make ifa_*_loopback_route call single ifa_maintain_loopback_route().	2015-09-16 06:23:15 +00:00
Alexander V. Chernikov	d3cdb71655	* Require explicitl lle unlink prior to calling llentry_delete(). This one slightly decreases time of holding afdata wlock. * While here, make nd6_free() return void. No one has used its return value since r186119.	2015-09-15 06:48:19 +00:00
Eric van Gyzen	17a036563d	Fix the handling of IPv6 On-Link Redirects. On receipt of a redirect message, install an interface route for the redirected destination. On removal of the corresponding Neighbor Cache entry, remove the interface route. This requires changes in rtredirect_fib() to cope with an AF_LINK address for the gateway and with the absence of RTF_GATEWAY. This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo Phase 2 test suite. Unrelated to the above, fix a recursion on the radix node head lock triggered by the Tahi Redirected to Alternate Router test cases. When I first wrote this patch in October 2012, all Section 2 (Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE, and 8-STABLE. cem@ recently rebased the 10.x patch onto head and reported that it passes Tahi. (Thanks!) These other test cases also passed in 2012: * the RTF_MODIFIED case, with IPv4 and IPv6 (using a RTF_HOST\|RTF_GATEWAY route for the destination) * the redirected-to-self case, with IPv4 and IPv6 * a valid IPv4 redirect All testing in 2012 was done with WITNESS and INVARIANTS. Tested by: EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015, Mark Kelley <mark_kelley@dell.com> in 2012, TC Telkamp <terence_telkamp@dell.com> in 2012 PR: 152791 Reviewed by: melifaro (current rev), bz (earlier rev) Approved by: kib (mentor) MFC after: 1 month Relnotes: yes Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D3602	2015-09-14 19:17:25 +00:00
Alexander V. Chernikov	3e7a2321e3	* Do more fine-grained locking: call eventhandlers/free_entry without holding afdata wlock * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3573	2015-09-14 16:48:19 +00:00
Hans Petter Selasky	d76d40126e	Update TSO limits to include all headers. To make driver programming easier the TSO limits are changed to reflect the values used in the BUSDMA tag a network adapter driver is using. The TCP/IP network stack will subtract space for all linklevel and protocol level headers and ensure that the full mbuf chain passed to the network adapter fits within the given limits. Implementation notes: If a network adapter driver needs to fixup the first mbuf in order to support VLAN tag insertion, the size of the VLAN tag should be subtracted from the TSO limit. Else not. Network adapters which typically inline the complete header mbuf could technically transmit one more segment. This patch does not implement a mechanism to recover the last segment for data transmission. It is believed when sufficiently large mbuf clusters are used, the segment limit will not be reached and recovering the last segment will not have any effect. The current TSO algorithm tries to send MTU-sized packets, where the MTU typically is 1500 bytes, which gives 1448 bytes of TCP data payload per packet for IPv4. That means if the TSO length limitiation is set to 65536 bytes, there will be a data payload remainder of (65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to recover total TSO length due to inlining mbuf header data will not have any effect, because adding or removing the ETH/IP/TCP headers to or from 324 bytes will not cause more or less TCP payload to be TSO'ed. Existing network adapter limits will be updated separately. Differential Revision: https://reviews.freebsd.org/D3458 Reviewed by: rmacklem MFC after: 2 weeks	2015-09-14 08:36:22 +00:00
Hiroki Sato	b1c250ff3f	- Remove GIF_{SEND,ACCEPT}_REVETHIP. - Simplify EADDRNOTAVAIL and EAFNOSUPPORT conditions. MFC after: 3 days	2015-09-10 05:59:39 +00:00
Alexander V. Chernikov	441f9243df	Constantify lookup key in ifa_ifwith* functions. Some places in our network stack already have const arguments (like if_output() routines and LLE functions). Code using ifa_ifwith (and similar functins) along with LLE/_output functions is currently bound to use tricks like __DECONST(). Provide a cleaner way by making sockaddr lookup key really constant. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3464	2015-09-05 05:33:20 +00:00
Hiroki Sato	18e199ad72	Fix a panic which was reproducible by an infinite loop of "ifconfig epair0 create && ifconfig epair0a destroy". This was caused by an uninitialized function pointer in softc->media.	2015-09-02 16:30:45 +00:00
Alexander V. Chernikov	3b0fd911fa	Simplify lla_rt_output()/nd6_add_ifa_lle() by setting lle state in alloc handler, based on flags.	2015-08-31 05:03:36 +00:00
Adrian Chadd	8f1111cf0b	Remove now unused (and #if 0'ed out) headers.	2015-08-29 04:33:31 +00:00
Adrian Chadd	e5562eb934	Replace the printf()s with optional rate limited debugging for RSS. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3471	2015-08-28 05:58:16 +00:00
Kristof Provost	64b3b4d611	pf: Remove support for 'scrub fragment crop\|drop-ovl' The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse users into making poor choices. It's also a fairly large amount of complex code, so just remove the support altogether. Users who have 'scrub fragment crop\|drop-ovl' in their pf configuration will be implicitly converted to 'scrub fragment reassemble'. Reviewed by: gnn, eri Relnotes: yes Differential Revision: https://reviews.freebsd.org/D3466	2015-08-27 21:27:47 +00:00
Luiz Otavio O Souza	c614d0a443	Fix the spelling of eri's name. Pointy hat to: loos MFC with: r287009	2015-08-24 23:40:36 +00:00
Luiz Otavio O Souza	0a70aaf8f5	Add ALTQ(9) support for the CoDel algorithm. CoDel is a parameterless queue discipline that handles variable bandwidth and RTT. It can be used as the single queue discipline on an interface or as a sub discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ. Differential Revision: https://reviews.freebsd.org/D3272 Reviewd by: rpaulo, gnn (previous version) Obtained from: pfSense Sponsored by: Rubicon Communications (Netgate)	2015-08-21 22:02:22 +00:00
Alexander V. Chernikov	5a2555160f	* Split allocation and table linking for lle's. Before that, the logic besides lle_create() was the following: return existing if found, create if not. This behaviour was error-prone since we had to deal with 'sudden' static<>dynamic lle changes. This commit fixes bunch of different issues like: - refcount leak when lle is converted to static. Simple check case: console 1: while true; do for i in `arp -an\|awk '$4~/incomp/{print$2}'\|tr -d '()'`; do arp -s $i 00:22:44:66:88:00 ; arp -d $i; done; done console 2: ping -f any-dead-host-in-L2 console 3: # watch for memory consumption: vmstat -m \| awk '$1~/lltable/{print$2}' - possible problems in arptimer() / nd6_timer() when dropping/reacquiring lock. New logic explicitly handles use-or-create cases in every lla_create user. Basically, most of the changes are purely mechanical. However, we explicitly avoid using existing lle's for interface/static LLE records. * While here, call lle_event handlers on all real table lle change. * Create lltable_free_entry() calling existing per-lltable lle_free_t callback for entry deletion	2015-08-20 12:05:17 +00:00
Hiren Panchasara	0e02b43a07	Make LAG LACP fast timeout tunable through IOCTL. Differential Revision: D3300 Submitted by: LN Sundararajan <lakshmi.n at msystechnologies> Reviewed by: wblock, smh, gnn, hiren, rpokala at panasas MFC after: 2 weeks Sponsored by: Panasas	2015-08-12 20:21:04 +00:00
Alexander V. Chernikov	0447c1367a	Use single 'lle_timer' callout in lltable instead of two different names of the same timer.	2015-08-11 12:38:54 +00:00
Alexander V. Chernikov	314294de5c	Store addresses instead of sockaddrs inside llentry. This permits us having all (not fully true yet) all the info needed in lookup process in first 64 bytes of 'struct llentry'. struct llentry layout: BEFORE: [rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]] AFTER [ in[6]_addr MAC .. state .. rwlock ] Currently, address part of struct llentry has only 16 bytes for the key. However, lltable does not restrict any custom lltable consumers with long keys use the previous approach (store key at (lle+1)). Sponsored by: Yandex LLC	2015-08-11 09:26:11 +00:00
Alexander V. Chernikov	41cb42a633	MFP r276712. * Split lltable_init() into lltable_allocate_htbl() (alloc hash table with default callbacks) and lltable_link() ( links any lltable to the list). * Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field. * Move lltable setup to separate functions in in[6]_domifattach.	2015-08-11 05:51:00 +00:00
Alexander V. Chernikov	2caee4be35	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian	2015-08-10 20:50:31 +00:00
Alexander V. Chernikov	11cdad9873	Partially merge r274887,r275334,r275577,r275578,r275586 to minimize differences between projects/routing and HEAD. This commit tries to keep code logic the same while changing underlying code to use unified callbacks. * Add llt_foreach_entry method to traverse all entries in given llt * Add llt_dump_entry method to export particular lle entry in sysctl/rtsock format (code is not indented properly to minimize diff). Will be fixed in the next commits. * Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle. * Add llt_fill_sa_entry method to export address in the lle to sockaddr format. * Add llt_hash method to use in generic hash table support code. * Add llt_free_entry method which is used in llt_prefix_free code. * Prepare for fine-grained locking by separating lle unlink and deletion in lltable_free() and lltable_prefix_free(). * Provide lltable_get<ifp\|af>() functions to reduce direct 'struct lltable' access by external callers. * Remove @llt agrument from lle_free() lle callback since it was unused. * Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting. * Switch to per-af hashing code. * Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method. Update description from these functions. * Use unified lltable_free_entry() function instead of per-af one. Reviewed by: ae	2015-08-10 12:03:59 +00:00
Alexander V. Chernikov	4bdf0b6a9a	MFP r274295: * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2015-08-08 18:14:59 +00:00
Alexander V. Chernikov	e362cf0e9f	MFP r274553: * Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete lla_create now returns with LLE_EXCLUSIVE lock for lle. * Provide typedefs for new/existing lltable callbacks. Reviewed by: ae	2015-08-08 17:48:54 +00:00
Luiz Otavio O Souza	9224217213	Remove the mtx_sleep() from the kqueue f_event filter. The filter is called from the network hot path and must not sleep. The filter runs with the descriptor lock held and does not manipulates the buffers, so it is not necessary sleep when the hold buffer is in use. Just ignore the hold buffer contents when it is being copied to user space (when hold buffer in use is set). This fix the "Sleeping thread owns a non-sleepable lock" panic when the userland thread is too busy reading the packets from bpf(4). PR: 200323 MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-08-03 22:14:45 +00:00
Luiz Otavio O Souza	98fa5d858c	Add a KASSERT() to make sure we wont rotate the buffers twice (rotate the buffers while the hold buffer is in use). Suggested by: ed, ghelmer MFC with: r286142	2015-08-03 18:22:31 +00:00
John-Mark Gurney	bba6880eab	looks like all archs either have clang or cdefs included before.. drop this include as unnecessary.. Requested by: bde	2015-08-02 21:33:40 +00:00
John-Mark Gurney	70e47040b0	convert to C11's _Static_assert, and pull in sys/cdefs.h for compatibility w/ older non-C11 compilers... passed make tinerdbox.. Suggested by: imp	2015-08-02 00:15:52 +00:00
Luiz Otavio O Souza	f87e372ef2	Remove two unnecessary sleeps from the hot path in bpf(4). The first one never triggers because bpf_canfreebuf() can only be true for zero-copy buffers and zero-copy buffers are not read with read(2). The second also never triggers, because we check the free buffer before calling ROTATE_BUFFERS(). If the hold buffer is in use the free buffer will be NULL and there is nothing else to do besides drop the packet. If the free buffer isn't NULL the hold buffer _is_ free and it is safe to rotate the buffers. Update the comment in ROTATE_BUFFERS macro to match the logic described here. While here fix a few typos in comments. MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-31 21:43:27 +00:00
Luiz Otavio O Souza	faa693cdbe	Remove the sleep from the buffer allocation routine. The buffer must be allocated (or even changed) before the interface is set and thus, there is no need to verify if the buffer is in use. MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-31 20:25:54 +00:00
Luiz Otavio O Souza	4f42daa4a3	Do not allocate the buffers at opening of the descriptor, because once the buffer is allocated we are committed to a particular buffer method (BPF_BUFMODE_BUFFER in this case). If we are using zero-copy buffers, the userland program must register its buffers before set the interface. If we are using kernel memory buffers, we can allocate the buffer at the time that the interface is being set. This fix allows the usage of BIOCSETBUFMODE after r235746. Update the comments to reflect the recent changes. MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-31 20:02:12 +00:00
Andrey V. Elsukov	926381e108	Ansify if_stf.c	2015-07-31 09:04:22 +00:00

1 2 3 4 5 ...

3574 Commits