freebsd-dev

Author	SHA1	Message	Date
Ryan Stone	bc3d87fd59	Increment the route table gen count after a modify Increment the route table generation count after modifying a route. This signals back to TCP connections that they need to update their L2 caches as the gateway for their route may have changed. This is a heavier hammer than is needed, strictly speaking, but route changes will be unlikely enough that the performance effects of invalidating all connection route caches should be negligible. MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13990 Reviewed by: karels	2018-01-23 03:15:44 +00:00
Ryan Stone	19f41c2a52	Plug an ifaddr leak when changing a route's src If a route is modified in a way that changes the route's source address (i.e. the address used to access the gateway), then a reference on the ifaddr representing the old source address will be leaked if the address type does not have an ifa_rtrequest method defined. Plug the leak by releasing the reference in all cases. Differential Revision: https://reviews.freebsd.org/D13417 Reviewed by: ae MFC after: 3 weeks Sponsored by: Dell	2017-12-14 20:48:50 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Bjoern A. Zeeb	ae69ad884d	After inpcb route caching was put back in place there is no need for flowtable anymore (as flowtable was never considered to be useful in the forwarding path). Reviewed by: np Differential Revision: https://reviews.freebsd.org/D11448	2017-07-27 13:03:36 +00:00
Andrey V. Elsukov	b83aa367a5	Resurrect RTF_RNH_LOCKED flag and restore ability to call rtalloc1_fib() with acquired RIB lock. This fixes a possible panic due to trying to acquire RIB rlock when it is already exclusive locked. PR: 215963, 215122 MFC after: 1 week Sponsored by: Yandex LLC	2017-06-13 10:52:31 +00:00
Warner Losh	fbbd9655e5	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
Luiz Otavio O Souza	8f1c8ade60	Fix the typos and style(9) in comment. MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2016-12-08 18:18:48 +00:00
Andrey V. Elsukov	abe95d87ee	Replace rw_init/rw_destroy with corresponding macros. Obtained from: Yandex LLC	2016-10-06 14:42:06 +00:00
Bjoern A. Zeeb	89856f7e2d	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shutdown. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enahnce helper functions to do proper cleanup at a protocol rather than at an interface level. Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
Bjoern A. Zeeb	80ae8d609a	Provide a public interface to rt_flushifroutes which takes the address family as an argument as well. This will be used to cleanup individual protocols during VNET teardown. Obtained from: projects/vnet Sponsored by: The FreeBSD Foundation	2016-06-06 12:49:47 +00:00
George V. Neville-Neil	6d76822688	This change re-adds L2 caching for TCP and UDP, as originally added in D4306 but removed due to other changes in the system. Restore the llentry pointer to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as appropriate. Submitted by: Mike Karels Differential Revision: https://reviews.freebsd.org/D6262	2016-06-02 17:51:29 +00:00
Bjoern A. Zeeb	4f321dbd1c	Fix compile errors after r297225: - properly V_irtualise variable access unbreaking VIMAGE kernels. - remove the volatile from the function return type to make architecture using gcc happy [-Wreturn-type] "type qualifiers ignored on function return type" I am not entirely happy with this solution putting the u_int there but it will do for now.	2016-03-24 11:40:10 +00:00
George V. Neville-Neil	84cc0778d0	FreeBSD previously provided route caching for TCP (and UDP). Re-add route caching for TCP, with some improvements. In particular, invalidate the route cache if a new route is added, which might be a better match. The cache is automatically invalidated if the old route is deleted. Submitted by: Mike Karels Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4306	2016-03-24 07:54:56 +00:00
Bjoern A. Zeeb	a5243af262	Code duplication but rib_head is special. Not found an easy way to go back and harmize the use cases among RIB, IPFW, PF yet but it's also not the scope of this work. Prevents instant panics on teardown and frees the FIB bits again. Sponsored by: The FreeBSD Foundation	2016-02-03 21:56:51 +00:00
Alexander V. Chernikov	94017572ab	Fix flowtable part missed in r294706.	2016-01-25 09:31:32 +00:00
Alexander V. Chernikov	61eee0e202	MFP r287070,r287073: split radix implementation and route table structure. There are number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don't have _any_ requirements and first 2 does not use radix locking. On the other hand, routing structure do have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not known anything about its consumers internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still uses the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.	2016-01-25 06:33:15 +00:00
Alexander V. Chernikov	fcbfdb37a1	Fix panic in IP redirect. Panic was introduced in r293466. Found by: Yamagi Burmeister <lists at yamagi.org>>	2016-01-14 16:31:00 +00:00
Alexander V. Chernikov	10e0e23528	Remove now-unused wrappers for various routing functions.	2016-01-14 08:54:44 +00:00
Alexander V. Chernikov	0eb64f4e44	Remove RTF_RNH_LOCKED support from rtalloc1_fib(). Last caller using it was eliminated in r293471. Sponsored by: Yandex LLC	2016-01-13 14:32:48 +00:00
Alexander V. Chernikov	f2b2e77a41	(Temporarily) remove route_redirect_event eventhandler. Such handler should pass different set of variables, instead of directly providing 2 locked route entries. Given that it hasn't been really used since at least 2012, remove current code. Will re-add it after finishing most major routing-related changes. Discussed with: np	2016-01-09 06:26:40 +00:00
Alexander V. Chernikov	16703ea811	Please Coverity by removing unneccessary check (rt_key() is always set). Coverity CID: 1347797	2016-01-09 05:39:06 +00:00
Alexander V. Chernikov	048738b546	Do more fine-grained locking in rtrequest1_fib(). Last consumer using RTF_RNH_LOCKED flag was eliminated in r291643. Restrict passing RTF_RNH_LOCKED to rtrequest1_fib() and do better locking for RTM_ADD / RTM_DELETE cases.	2016-01-08 16:25:11 +00:00
Alexander V. Chernikov	9a1b64d5a0	Add rib_lookup_info() to provide API for retrieving individual route entries data in unified format. There are control plane functions that require information other than just next-hop data (e.g. individual rtentry fields like flags or prefix/mask). Given that the goal is to avoid rte reference/refcounting, re-use rt_addrinfo structure to store most rte fields. If caller wants to retrieve key/mask or gateway (which are sockaddrs and are allocated separately), it needs to provide sufficient-sized sockaddrs structures w/ ther pointers saved in passed rt_addrinfo. Convert: * lltable new records checks (in_lltable_rtcheck(), nd6_is_new_addr_neighbor(). * rtsock pre-add/change route check. * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because 1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should not be multiple host routes for such hosts 2) if we have multiple routes we should inspect them (which is not done). 3) the entire idea of abusing KRT as storage for ND proxy seems odd. Userland programs should be used for that purpose).	2016-01-04 15:03:20 +00:00
Alexander V. Chernikov	6af272d88e	Fix PINNED routes handling. Before r291643, adding new interface prefix had the following logic: try_add: EEXIST && (PINNED) { try_del(w/o PINNED flag) if (OK) try_add(PINNED) } In r291643, deletion was performed w/ PINNED flag held which leaded to new interface prefixes (like ::1) overriding older ones. Fix this by requesting deletion w/o RTF_PINNED. PR: kern/205285 Submitted by: Fabian Keil <fk at fabiankeil.de>	2015-12-13 16:37:01 +00:00
Alexander V. Chernikov	4b3dc89847	Move RTF_PINNED handling to generic route code. This eliminates last RTF_RNH_LOCKED rtrequest1_fib() user.	2015-12-02 08:17:31 +00:00
Enji Cooper	af5c99e53f	Fix LINT-NOIP kernels after r291467 rn is only used if INET or INET6 are defined Sponsored by: EMC / Isilon Storage Division	2015-12-01 05:59:53 +00:00
Alexander V. Chernikov	674e0823c1	Move flowtable rte checks to separate function.	2015-11-30 05:59:22 +00:00
Alexander V. Chernikov	e8b0643eee	Add new rt_foreach_fib_walk_del() function for deleting route entries by filter function instead of picking into routing table details in each consumer. Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED user). This simplifies future nexthops/mulitipath changes and rtrequest1_fib() locking refactoring. Actual changes: Add "rt_chain" field to permit rte grouping while doing batched delete from routing table (thus growing rte 200->208 on amd64). Add "rti_filter" / "rti_filterdata" / "rti_spare" fields to rt_addrinfo to pass filter function to various routing subsystems in standard way. Convert all rt_expunge() customers to new rt_addinfo-based api and eliminate rt_expunge().	2015-11-30 05:51:14 +00:00
Alexander V. Chernikov	e4790abf19	Pass provided af instead of AF_UNSPEC to setwa_f callback.	2015-11-14 18:16:17 +00:00
Bryan Drewery	2780ba06c7	Avoid passing an uninitialized 'i'. Currently nothing was depending on it anyhow. Coverity CID: 1331562	2015-10-29 18:58:18 +00:00
Alexander V. Chernikov	f221bcaa06	Remove several compat functions from pre-fib era.	2015-10-17 17:26:44 +00:00
Eric van Gyzen	17a036563d	Fix the handling of IPv6 On-Link Redirects. On receipt of a redirect message, install an interface route for the redirected destination. On removal of the corresponding Neighbor Cache entry, remove the interface route. This requires changes in rtredirect_fib() to cope with an AF_LINK address for the gateway and with the absence of RTF_GATEWAY. This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo Phase 2 test suite. Unrelated to the above, fix a recursion on the radix node head lock triggered by the Tahi Redirected to Alternate Router test cases. When I first wrote this patch in October 2012, all Section 2 (Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE, and 8-STABLE. cem@ recently rebased the 10.x patch onto head and reported that it passes Tahi. (Thanks!) These other test cases also passed in 2012: * the RTF_MODIFIED case, with IPv4 and IPv6 (using a RTF_HOST\|RTF_GATEWAY route for the destination) * the redirected-to-self case, with IPv4 and IPv6 * a valid IPv4 redirect All testing in 2012 was done with WITNESS and INVARIANTS. Tested by: EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015, Mark Kelley <mark_kelley@dell.com> in 2012, TC Telkamp <terence_telkamp@dell.com> in 2012 PR: 152791 Reviewed by: melifaro (current rev), bz (earlier rev) Approved by: kib (mentor) MFC after: 1 month Relnotes: yes Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D3602	2015-09-14 19:17:25 +00:00
Alexander V. Chernikov	441f9243df	Constantify lookup key in ifa_ifwith* functions. Some places in our network stack already have const arguments (like if_output() routines and LLE functions). Code using ifa_ifwith (and similar functins) along with LLE/_output functions is currently bound to use tricks like __DECONST(). Provide a cleaner way by making sockaddr lookup key really constant. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3464	2015-09-05 05:33:20 +00:00
Alexander V. Chernikov	2caee4be35	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian	2015-08-10 20:50:31 +00:00
Alexander V. Chernikov	4bdf0b6a9a	MFP r274295: * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2015-08-08 18:14:59 +00:00
Luiz Otavio O Souza	8b15f615e0	Follow r256586 and rename the kernel version of the Free() macro to R_Free(). This matches the other macros and reduces the chances to clash with other headers. This also fixes the build of radix.c outside of the kernel environment. Reviewed by: glebius	2015-07-30 02:09:03 +00:00
Marcelo Araujo	546afaf83d	Remove duplicate header entry.	2015-04-16 02:44:37 +00:00
Alexander V. Chernikov	7f948f12f6	Finish r274175: do control plane MTU tracking. Update route MTU in case of ifnet MTU change. Add new RTF_FIXEDMTU to track explicitly specified MTU. Old behavior: ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU. User has to manually update all routes. ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU. However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu gets updated. New behavior: ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them. ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu. route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag. route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag. PR: 194238 MFC after: 1 month CR: D1125	2014-11-17 01:05:29 +00:00
Alexander V. Chernikov	ac2cf5d37e	Revert r274585: rte lock is properly destroyed in uma dtor callback. Pointed by: glebius	2014-11-16 18:15:23 +00:00
Alexander V. Chernikov	3cb04899de	Make witness happy: destroy rte lock before free. MFC after: 2 weeks	2014-11-16 14:56:31 +00:00
Alexander V. Chernikov	57c3556b58	Fix build. Pointy hat to: melifaro	2014-11-06 17:50:35 +00:00
Alexander V. Chernikov	146a181f28	Finish r274118: remove useless fields from struct domain. Sponsored by: Yandex LLC	2014-11-06 14:39:04 +00:00
Alexander V. Chernikov	1a75e3b20f	Make checks for rt_mtu generic: Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking might be an option in some situation, it is not feasible to do MTU checks there: generic (or per-domain) routing code is perfectly capable of doing this. We currrently have 3 places where MTU is altered: 1) route addition. In this case domain overrides radix _addroute callback (in[6]_addroute) and all necessary checks/fixes are/can be done there. 2) route change (especially, GW change). In this case, there are no explicit per-domain calls, but one can override rte by setting ifa_rtrequest hook to domain handler (inet6 does this). 3) ifconfig ifaceX mtu YYYY In this case, we have no callbacks, but ip[6]_output performes runtime checks and decreases rt_mtu if necessary. Generally, the goals are to be able to handle all MTU changes in control plane, not in runtime part, and properly deal with increased interface MTU. This commit changes the following: * removes hooks setting MTU from drivers side * adds proper per-doman MTU checks for case 1) * adds generic MTU check for case 2) * The latter is done by using new dom_ifmtu callback since if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size. However, IPv6 mtu might be different from if_mtu one (e.g. default 1280) for some cases, so we need an abstract way to know maximum MTU size for given interface and domain. * moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies user-supplied data which must be checked. * removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to use this functions on new non-inserted rte. More changes will follow soon. MFC after: 1 month Sponsored by: Yandex LLC	2014-11-06 13:13:09 +00:00
Hiroki Sato	ee0bd4b909	Make net.add_addr_allfibs vnet-local.	2014-09-21 03:48:20 +00:00
Alan Somers	4f8585e021	Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and ifa_ifwithdstaddr. For the sake of backwards compatibility, the new arguments were added to new functions named ifa_ifwithnet_fib and ifa_ifwithdstaddr_fib, while the old functions became wrappers around the new ones that passed RT_ALL_FIBS for the fib argument. However, the backwards compatibility is not desired for FreeBSD 11, because there are numerous other incompatible changes to the ifnet(9) API. We therefore decided to remove it from head but leave it in place for stable/9 and stable/10. In addition, this commit adds the fib argument to ifa_ifwithbroadaddr for consistency's sake. sys/sys/param.h Increment __FreeBSD_version sys/net/if.c sys/net/if_var.h sys/net/route.c Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute. sys/net/route.c sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_options.c sys/netinet/ip_output.c sys/netinet6/nd6.c Fixup calls of modified functions. share/man/man9/ifnet.9 Document changed API. CR: https://reviews.freebsd.org/D458 MFC after: Never Sponsored by: Spectra Logic	2014-09-11 20:21:03 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Alan Somers	2f308a343f	Fix unintended KBI change from r264905. Add _fib versions of ifa_ifwithnet() and ifa_ifwithdstaddr() The legacy functions will call the _fib() versions with RT_ALL_FIBS, preserving legacy behavior. sys/net/if_var.h sys/net/if.c Add legacy-compatible functions as described above. Ensure legacy behavior when RT_ALL_FIBS is passed as fibnum. sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/net/route.c sys/net/rtsock.c sys/netinet6/nd6.c Call with _fib() functions if we must use a specific fib, or the legacy functions otherwise. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c Improve the udp_dontroute test. The bug that this test exercises is that ifa_ifwithnet() will return the wrong address, if multiple interfaces have addresses on the same subnet but with different fibs. The previous version of the test only considered one possible failure mode: that ifa_ifwithnet_fib() might fail to find any suitable address at all. The new version also checks whether ifa_ifwithnet_fib() finds the correct address by checking where the ARP request goes. Reported by: bz, hrs Reviewed by: hrs MFC after: 1 week X-MFC-with: 264905 Sponsored by: Spectra Logic	2014-05-29 21:03:49 +00:00
Alexander V. Chernikov	972ed56a33	Remove additional fib checks from rtalloc1_fib. It looks like current consumers are either unaware of MRT (and uses RT_DEFAULT_FIB implicitly) or know what thay are doing, In latter case they will be either hit by KASSERT or ESCRH will be returned due to NULL rnh.	2014-05-03 16:38:05 +00:00

1 2 3 4 5 ...

272 Commits