freebsd-dev

Author	SHA1	Message	Date
Bjoern A. Zeeb	89856f7e2d	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shutdown. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enahnce helper functions to do proper cleanup at a protocol rather than at an interface level. Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
Conrad Meyer	2769d06203	in_lltable_alloc and in6 copy: Don't leak LLE in error path Fix a memory leak in error conditions introduced in r292978. Reported by: Coverity CIDs: 1347009, 1347010 Sponsored by: EMC / Isilon Storage Division	2016-04-26 23:13:48 +00:00
Alexander V. Chernikov	9a1b64d5a0	Add rib_lookup_info() to provide API for retrieving individual route entries data in unified format. There are control plane functions that require information other than just next-hop data (e.g. individual rtentry fields like flags or prefix/mask). Given that the goal is to avoid rte reference/refcounting, re-use rt_addrinfo structure to store most rte fields. If caller wants to retrieve key/mask or gateway (which are sockaddrs and are allocated separately), it needs to provide sufficient-sized sockaddrs structures w/ ther pointers saved in passed rt_addrinfo. Convert: * lltable new records checks (in_lltable_rtcheck(), nd6_is_new_addr_neighbor(). * rtsock pre-add/change route check. * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because 1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should not be multiple host routes for such hosts 2) if we have multiple routes we should inspect them (which is not done). 3) the entire idea of abusing KRT as storage for ND proxy seems odd. Userland programs should be used for that purpose).	2016-01-04 15:03:20 +00:00
Alexander V. Chernikov	4fb3a8208c	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with is length) to prepend to mbuf. These two changes permits routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address change are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which ether returs error or fully-prepared link header. Add <arp\|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of there have ther header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102	2015-12-31 05:03:27 +00:00
Alexander V. Chernikov	f8aee88f0b	Remove LLE read lock from IPv4 fast path. LLE structure is mostly unchanged during its lifecycle. To be more specific, there are 2 things relevant for fast path lookup code: 1) link-level address change. Since r286722, these updates are performed under AFDATA WLOCK. 2) Some sort of feedback indicating that this particular entry is used so we re-send arp request to perform reachability verification instead of expiring entry. The only signal that is needed from fast path is something like binary yes/no. The latter is solved by the following changes: 1) introduce special r_skip_req field which is read lockless by fast path, but updated under (new) req_mutex mutex. If this field is non-zero, then fast path will acquire lock and set it back to 0. 2) introduce simple state machine: incomplete->reachable<->verify->deleted. Before that we implicitely had incomplete->reachable->deleted state machine, with V_arpt_keep between "reachable" and "deleted". Verification was performed in runtime 5 seconds before V_arpt_keep expire. This is changed to "change state to verify 5 seconds before V_arpt_keep, set r_skip_req to non-zero value and check it every second". If the value is zero - then send arp verification probe. These changes do not introduce any signifficant control plane overhead: typically lle callout timer would fire 1 time more each V_arpt_keep (1200s) for used lles and up to arp_maxtries (5) for dead lles. As a result, all packets towards "reachable" lle are handled by fast path without acquiring lle read lock. Additional "req_mutex" is needed because callout / arpresolve_slow() or eventhandler might keep LLE lock for signifficant amount of time, which might not be feasible for fast path locking (e.g. having rmlock as ether AFDATA or lltable own lock). Differential Revision: https://reviews.freebsd.org/D3688	2015-12-05 09:50:37 +00:00
Randall Stewart	7c4676ddee	This fixes several places where callout_stops return is examined. The new return codes of -1 were mistakenly being considered "true". Callout_stop now returns -1 to indicate the callout had either already completed or was not running and 0 to indicate it could not be stopped. Also update the manual page to make it more consistent no non-zero in the callout_stop or callout_reset descriptions. MFC after: 1 Month with associated callout change.	2015-11-13 22:51:35 +00:00
Alexander V. Chernikov	ddd208f7ad	Unify setting lladdr for AF_INET[6].	2015-11-07 11:12:00 +00:00
Alexander V. Chernikov	26a6057525	Fix deletion of ifaddr lle entries when deleting prefix from interface in down state. Regression appeared in r287789, where the "prefix has no corresponding installed route" case was forgotten. Additionally, lltable_delete_addr() was called with incorrect byte order (default is network for lltable code). While here, improve comments on given cases and byte order. PR: 203573 Submitted by: phk	2015-10-18 12:26:25 +00:00
Alexander V. Chernikov	4a336ef40c	rtsock requests for deleting interface address lles started to return EPERM instead of old "ignore-and-return 0" in r287789. This broke arp -da / ndp -cn behavior (they exit on rtsock command failure). Fix this by translating LLE_IFADDR to RTM_PINNED flag, passing it to userland and making arp/ndp ignore these entries in batched delete. MFC after: 2 weeks	2015-09-27 04:54:29 +00:00
Alexander V. Chernikov	59c180c35c	Unify loopback route switching: * prepare gateway before insertion * use RTM_CHANGE instead of explicit find/change route * Remove fib argument from ifa_switch_loopback_route added in r264887: if old ifp fib differes from new one, that the caller is doing something wrong * Make ifa_*_loopback_route call single ifa_maintain_loopback_route().	2015-09-16 06:23:15 +00:00
Alexander V. Chernikov	3e7a2321e3	* Do more fine-grained locking: call eventhandlers/free_entry without holding afdata wlock * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3573	2015-09-14 16:48:19 +00:00
Alexander V. Chernikov	5a2555160f	* Split allocation and table linking for lle's. Before that, the logic besides lle_create() was the following: return existing if found, create if not. This behaviour was error-prone since we had to deal with 'sudden' static<>dynamic lle changes. This commit fixes bunch of different issues like: - refcount leak when lle is converted to static. Simple check case: console 1: while true; do for i in `arp -an\|awk '$4~/incomp/{print$2}'\|tr -d '()'`; do arp -s $i 00:22:44:66:88:00 ; arp -d $i; done; done console 2: ping -f any-dead-host-in-L2 console 3: # watch for memory consumption: vmstat -m \| awk '$1~/lltable/{print$2}' - possible problems in arptimer() / nd6_timer() when dropping/reacquiring lock. New logic explicitly handles use-or-create cases in every lla_create user. Basically, most of the changes are purely mechanical. However, we explicitly avoid using existing lle's for interface/static LLE records. * While here, call lle_event handlers on all real table lle change. * Create lltable_free_entry() calling existing per-lltable lle_free_t callback for entry deletion	2015-08-20 12:05:17 +00:00
Alexander V. Chernikov	0447c1367a	Use single 'lle_timer' callout in lltable instead of two different names of the same timer.	2015-08-11 12:38:54 +00:00
Alexander V. Chernikov	314294de5c	Store addresses instead of sockaddrs inside llentry. This permits us having all (not fully true yet) all the info needed in lookup process in first 64 bytes of 'struct llentry'. struct llentry layout: BEFORE: [rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]] AFTER [ in[6]_addr MAC .. state .. rwlock ] Currently, address part of struct llentry has only 16 bytes for the key. However, lltable does not restrict any custom lltable consumers with long keys use the previous approach (store key at (lle+1)). Sponsored by: Yandex LLC	2015-08-11 09:26:11 +00:00
Alexander V. Chernikov	41cb42a633	MFP r276712. * Split lltable_init() into lltable_allocate_htbl() (alloc hash table with default callbacks) and lltable_link() ( links any lltable to the list). * Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field. * Move lltable setup to separate functions in in[6]_domifattach.	2015-08-11 05:51:00 +00:00
Alexander V. Chernikov	11cdad9873	Partially merge r274887,r275334,r275577,r275578,r275586 to minimize differences between projects/routing and HEAD. This commit tries to keep code logic the same while changing underlying code to use unified callbacks. * Add llt_foreach_entry method to traverse all entries in given llt * Add llt_dump_entry method to export particular lle entry in sysctl/rtsock format (code is not indented properly to minimize diff). Will be fixed in the next commits. * Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle. * Add llt_fill_sa_entry method to export address in the lle to sockaddr format. * Add llt_hash method to use in generic hash table support code. * Add llt_free_entry method which is used in llt_prefix_free code. * Prepare for fine-grained locking by separating lle unlink and deletion in lltable_free() and lltable_prefix_free(). * Provide lltable_get<ifp\|af>() functions to reduce direct 'struct lltable' access by external callers. * Remove @llt agrument from lle_free() lle callback since it was unused. * Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting. * Switch to per-af hashing code. * Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method. Update description from these functions. * Use unified lltable_free_entry() function instead of per-af one. Reviewed by: ae	2015-08-10 12:03:59 +00:00
Marius Strobl	6e4cd74673	Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT.	2015-08-08 21:41:59 +00:00
Alexander V. Chernikov	e362cf0e9f	MFP r274553: * Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete lla_create now returns with LLE_EXCLUSIVE lock for lle. * Provide typedefs for new/existing lltable callbacks. Reviewed by: ae	2015-08-08 17:48:54 +00:00
Andrey V. Elsukov	cc0a3c8ca4	Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149	2015-07-29 08:12:05 +00:00
Gleb Smirnoff	28ebe80cab	Provide functions to determine presence of a given address configured on a given interface. Discussed with: np Sponsored by: Nginx, Inc.	2015-04-17 11:57:06 +00:00
Randall Stewart	2575fbb827	This fixes a bug in the way that the LLE timers for nd6 and arp were being used. They basically would pass in the mutex to the callout_init. Because they used this method to the callout system, it was possible to "stop" the callout. When flushing the table and you stopped the running callout, the callout_stop code would return 1 indicating that it was going to stop the callout (that was about to run on the callout_wheel blocked by the function calling the stop). Now when 1 was returned, it would lower the reference count one extra time for the stopped timer, then a few lines later delete the memory. Of course the callout_wheel was stuck in the lock code and would then crash since it was accessing freed memory. By using callout_init(c, 1) we always get a 0 back and the reference counting bug does not rear its head. We do have to make a few adjustments to the callouts themselves though to make sure it does the proper thing if rescheduled as well as gets the lock. Commented upon by hiren and sbruno See Phabricator D1777 for more details. Commented upon by hiren and sbruno Reviewed by: adrian, jhb and bz Sponsored by: Netflix Inc.	2015-02-09 19:28:11 +00:00
Alexander V. Chernikov	3a7498636a	* Allocate hash tables separately * Make llt_hash() callback more flexible * Default hash size and hashing method is now per-af * Move lltable allocation to separate function	2015-01-05 17:23:02 +00:00
Alexander V. Chernikov	b44a7d5d87	* Use unified code for deleting entry by sockaddr instead of per-af one. * Remove now unused llt_delete_addr callback.	2015-01-03 19:09:06 +00:00
Alexander V. Chernikov	20dd899505	* Hide lltable implementation details in if_llatbl_var.h * Make most of lltable_* methods 'normal' functions instead of inline * Add lltable_get_<af\|ifp>() functions to access given lltable fields * Temporarily resurrect nd6_lookup() function	2015-01-03 16:04:28 +00:00
Alexander V. Chernikov	73db4e007a	Finish r275628: remove remaining 'base' references.	2015-01-03 13:41:53 +00:00
Alexander V. Chernikov	ee7e9a4e17	* Do not assume lle has sockaddr key after struct lle: use llt_fill_sa_entry() llt method to store lle address in sa. * Eliminate L3_ADDR macro and either reference IPv4/IPv6 address directly from lle or use newly-created llt_fill_sa_entry(). * Do not store sockaddr inside arp/ndp lle anymore.	2014-12-09 00:48:08 +00:00
Alexander V. Chernikov	d82ed5051c	Simplify lle lookup/create api by using addresses instead of sockaddrs.	2014-12-08 23:23:53 +00:00
Alexander V. Chernikov	73b52ad896	Use llt_prepare_static_entry method to prepare valid per-af static entry.	2014-12-07 23:59:44 +00:00
Alexander V. Chernikov	0368226e65	* Retire abstract llentry_free() in favor of lltable_drop_entry_queue() and explicit calls to RTENTRY_FREE_LOCKED() * Use lltable_prefix_free() in arp_ifscrub to be consistent with nd6. * Rename <lltable_\|llt>_delete function to _delete_addr() to note that this function is used to external callers. Make this function maintain its own locking. * Use lookup/unlink/clear call chain from internal callers instead of delete_addr. * Fix LLE_DELETED flag handling	2014-12-07 23:08:07 +00:00
Alexander V. Chernikov	721cd2e032	Do not enforce particular lle storage scheme: * move lltable allocation to per-domain callbacks. * make llentry_link/unlink functions overridable llt methods. * make hash table traversal another overridable llt method.	2014-12-07 17:32:06 +00:00
Alexander V. Chernikov	a743ccd468	* Add llt_clear_entry() callback which is able to do all lle cleanup including unlinking/freeing * Relax locking in lltable_prefix_free_af/lltable_free * Do not pass @llt to lle free callback: it is always NULL now. * Unify arptimer/nd6_llinfo_timer: explicitly unlock lle avoiding unlock/lock sequinces * Do not pass unlocked lle to nd6_ns_output(): add nd6_llinfo_get_holdsrc() to retrieve preferred source address from lle hold queue and pass it instead of lle. * Finally, make nd6_create() create and return unlocked lle * Separate defrtr handling code from nd6_free(): use nd6_check_del_defrtr() to check if we need to keep entry instead of performing GC, use nd6_check_recalc_defrtr() to perform actual recalc on lle removal. * Move isRouter handling from nd6_cache_lladdr() to separate nd6_check_router() * Add initial code to maintain lle runtime flags in sync.	2014-12-07 15:42:46 +00:00
Alexander V. Chernikov	9b65db85e2	Do more fine-grained locking in lltable code: lltable_create_lle() does actual new lle creation without extensive locking and existing lle search. Move lle updating code from gigantic in_arpinput() to arp_update_llle() and some other functions. IPv6 changes to follow.	2014-12-01 21:43:48 +00:00
Alexander V. Chernikov	ce313fdd71	* Unify lle table dump/prefix removal code. * Rename lla_XXX -> lltable_XXX_lle to reduce number of name prefixes used by lltable code.	2014-11-30 14:35:01 +00:00
Alexander V. Chernikov	4c9df1c5b8	Fix r274855: use proper unlock method.	2014-11-23 16:40:33 +00:00
Alexander V. Chernikov	73d770287d	Do more fine-grained lltable locking: use table runtime lock as rare as we can.	2014-11-23 15:38:06 +00:00
Alexander V. Chernikov	9479029b1f	* Add lltable llt_hash callback * Move lltable items insertions/deletions to generic llt code.	2014-11-23 12:15:28 +00:00
Alexander V. Chernikov	7c066c18db	Use less-invasive approach for IF_AFDATA lock: convert into 2 locks: use rwlock accessible via external functions (IF_AFDATA_CFG_* -> if_afdata_cfg_()) for all control plane tasks use rmlock (IF_AFDATA_RUN_) for fast-path lookups.	2014-11-22 19:53:36 +00:00
Alexander V. Chernikov	27688dfe1d	Temporarily revert r274774.	2014-11-22 17:57:54 +00:00
Alexander V. Chernikov	8f465f6690	Convert &in_ifaddr_lock to dual-locking model: use rwlock accessible via external functions (IN_IFADDR_CFG_* -> in_ifaddr_cfg_()) for all control plane tasks use rmlock (IN_IFADDR_RUN_) for fast-path lookups.	2014-11-22 16:27:51 +00:00
Alexander V. Chernikov	86b94cffe4	Finish r274774: add more headers/fix build for non-debug case.	2014-11-21 23:36:21 +00:00
Alexander V. Chernikov	9883e41b4b	Switch IF_AFDATA lock to rmlock	2014-11-21 02:28:56 +00:00
Alexander V. Chernikov	f9723c7705	Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp we need to return. Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags fields for all relevant functions.	2014-11-20 22:41:59 +00:00
Alexander V. Chernikov	df629abf3e	Rework LLE code locking: * struct llentry is now basically split into 2 pieces: all fields within 64 bytes (amd64) are now protected by both ifdata lock AND lle lock, e.g. you require both locks to be held exclusively for modification. All data necessary for fast path operations is kept here. Some fields were added: - r_l3addr - makes lookup key liev within first 64 bytes. - r_flags - flags, containing pre-compiled decision whether given lle contains usable data or not. Current the only flag is RLLE_VALID. - r_len - prepend data len, currently unused - r_kick - used to provide feedback to control plane (see below). All other fields are protected by lle lock. * Add simple state machine for ARP to handle "about to expire" case: Current model (for the fast path) is the following: - rlock afdata - find / rlock rte - runlock afdata - see if "expire time" is approaching (time_uptime + la->la_preempt > la->la_expire) - if true, call arprequest() and decrease la_preempt - store MAC and runlock rte New model (data plane): - rlock afdata - find rte - check if it can be used using r_* fields only - if true, store MAC - if r_kick field != 0 set it to 0. - runlock afdata New mode (control plane): - schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries) seconds instead of V_arpt_keep. - on first timer invocation change state from ARP_LLINFO_REACHABLE to ARP_LLINFO_VERIFY, sets r_kick to 1 and shedules next call in V_arpt_rexmit (default to 1 sec). - on subsequent timer invocations in ARP_LLINFO_VERIFY state, checks for r_kick value: reschedule if not changed, and send arprequest() if set to zero (e.g. entry was used). * Convert IPv4 path to use new single-lock approach. IPv6 bits to follow. * Slow down in_arpinput(): now valid reply will (in most cases) require acquiring afdata WLOCK twice. This is requirement for storing changed lle data. This change will be slightly optimized in future. * Provide explicit hash link/unlink functions for both ipv4/ipv6 code. This will probably be moved to generic lle code once we have per-AF hashing callback inside lltable. * Perform lle unlink on deletion immediately instead of delaying it to the timer routine. * Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the presence of lle reference used for safe callout calls.	2014-11-16 20:12:49 +00:00
Alexander V. Chernikov	b4b1367ae4	* Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete Assume lla_create to return LLE_EXCLUSIVE lock for lle. * Rework lla_rt_output to perform all lle changes under afdata WLOCK. * change arp_ifscrub() ackquire afdata WLOCK, the same as arp_ifinit().	2014-11-15 18:54:07 +00:00
Alexander V. Chernikov	a9413f6ca0	Sync to HEAD@r274297.	2014-11-08 18:13:35 +00:00
Gleb Smirnoff	6df8a71067	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
Alexander V. Chernikov	064b1bdb2d	Convert lle rtchecks to use new routing API. For inet/ case, this involves reverting r225947 which seem to be pretty strange commit and should be reverted in HEAD ad well.	2014-11-06 23:35:22 +00:00
Alexander V. Chernikov	8c3cfe0be0	Hide 'struct rtentry' and all its macro inside new header: net/route_internal.h The goal is to make its opaque for all code except route/rtsock and proto domain _rmx.	2014-11-04 17:28:13 +00:00
Hiroki Sato	cc45ae406d	Add a change missing in r271916.	2014-09-21 04:38:50 +00:00
Xin LI	a7f77a3950	Restore historical behavior of in_control, which, when no matching address is found, the first usable address is returned for legacy ioctls like SIOCGIFBRDADDR, SIOCGIFDSTADDR, SIOCGIFNETMASK and SIOCGIFADDR. While there also fix a subtle issue that a caller from a jail asking for INADDR_ANY may get the first IP of the host that do not belong to the jail. Submitted by: glebius Differential Revision: https://reviews.freebsd.org/D667	2014-08-22 19:08:12 +00:00
Steven Hartland	5af464bbe0	Ensure that IP's added to CARP always use the CARP MAC Previously there was a race condition between the address addition and associating it with the CARP which resulted in the interface MAC, instead of the CARP MAC, being used for a brief amount of time. This caused "is using my IP address" warnings as well as data being sent to the wrong machine due to incorrect ARP entries being recorded by other devices on the network.	2014-07-31 16:43:56 +00:00
Steven Hartland	d34165f759	Only check error if one could have been generated	2014-07-31 09:18:29 +00:00
Gleb Smirnoff	9753faf553	Garbage collect couple of unused fields from struct ifaddr: - ifa_claim_addr() unused since removal of NetAtalk - ifa_metric seems to be never utilized, always a copy of if_metric	2014-07-29 15:01:29 +00:00
Alan Somers	7278b62aee	Fix a panic when removing an IP address from an interface, if the same address exists on another interface. The panic was introduced by change 264887, which changed the fibnum parameter in the call to rtalloc1_fib() in ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS. The solution is to use the interface fib in that call. For the majority of users, that will be equivalent to the legacy behavior. PR: kern/189089 Reported by: neel Reviewed by: neel MFC after: 3 weeks X-MFC with: 264887 Sponsored by: Spectra Logic	2014-04-29 14:46:45 +00:00
Alan Somers	0cfee0c223	Fix subnet and default routes on different FIBs on the same subnet. These two bugs are closely related. The root cause is that ifa_ifwithnet does not consider FIBs when searching for an interface address. sys/net/if_var.h sys/net/if.c Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr. Those functions will only return an address whose interface fib equals the argument. sys/net/route.c Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib arguments. sys/netinet/in.c Update in_addprefix to consider the interface fib when adding prefixes. This will prevent it from not adding a subnet route when one already exists on a different fib. sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/netinet6/nd6.c Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet. In some cases it there wasn't a clear specific fib number to use. In others, I was unable to test those functions so I chose RT_DEFAULT_FIB to minimize divergence from current behavior. I will fix some of the latter changes along with PR kern/187553. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c tests/sys/netinet/Makefile Revert r263738. The udp_dontroute test was right all along. However, bugs kern/187550 and kern/187553 cancelled each other out when it came to this test. Because of kern/187553, ifa_ifwithnet searched the default fib instead of the requested one, but because of kern/187550, there was an applicable subnet route on the default fib. The new test added in r263738 doesn't work right, however. I can verify with dtrace that ifa_ifwithnet returned the wrong address before I applied this commit, but route(8) miraculously found the correct interface to use anyway. I don't know how. Clear expected failure messages for kern/187550 and kern/187552. PR: kern/187550 PR: kern/187552 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic	2014-04-24 23:56:56 +00:00
Alan Somers	0489b8916e	Fix host and network routes for new interfaces when net.add_addr_allfibs=0 sys/net/route.c In rtinit1, use the interface fib instead of the process fib. The latter wasn't very useful because ifconfig(8) is usually invoked with the default process fib. Changing ifconfig(8) to use setfib(2) would be redundant, because it already sets the interface fib. tests/sys/netinet/fibs_test.sh Clear the expected ATF failure sys/net/if.c Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib sys/netinet/in.c sys/net/if_var.h Add a fibnum argument to ifa_switch_loopback_route, a subroutine of in_scrubprefix. Pass it the interface fib. PR: kern/187549 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-04-24 17:23:16 +00:00
Kevin Lo	e06e816f67	Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks. Tested with vlc and a test suite [1]. [1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz Reviewed by: jhb, glebius, adrian	2014-04-07 01:53:03 +00:00
Alan Somers	743c072a09	Correct ARP update handling when the routes for network interfaces are restricted to a single FIB in a multifib system. Restricting an interface's routes to the FIB to which it is assigned (by setting net.add_addr_allfibs=0) causes ARP updates to fail with "arpresolve: can't allocate llinfo for x.x.x.x". This is due to the ARP update code hard coding it's lookup for existing routing entries to FIB 0. sys/netinet/in.c: When dealing with RTM_ADD (add route) requests for an interface, use the interface's assigned FIB instead of the default (FIB 0). sys/netinet/if_ether.c: In arpresolve(), enhance error message generated when an lla_lookup() fails so that the interface causing the error is visible in logs. tests/sys/netinet/fibs_test.sh Clear ATF expected error. PR: kern/167947 Submitted by: Nikolay Denev <ndenev@gmail.com> (previous version) Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-26 22:46:03 +00:00
Alexander V. Chernikov	a49b317c41	Fix refcount leak on netinet ifa. Reviewed by: glebius MFC after: 2 weeks Sponsored by: Yandex LLC	2014-01-16 12:35:18 +00:00
Alexander V. Chernikov	d375edc9b5	Simplify inet alias handling code: if we're adding/removing alias which has the same prefix as some other alias on the same interface, use newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg(). This eliminates the following rtsock messages: Pinned RTM_ADD for prefix (for alias addition). Pinned RTM_DELETE for prefix (for alias withdrawal). Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24): before commit, addition: got message of size 116 on Fri Jan 10 14:13:15 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 got message of size 192 on Fri Jan 10 14:13:15 2014 RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff after commit, addition: got message of size 116 on Fri Jan 10 13:56:26 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255 before commit, wihdrawal: got message of size 192 on Fri Jan 10 13:58:59 2014 RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff got message of size 116 on Fri Jan 10 13:58:59 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 adter commit, withdrawal: got message of size 116 on Fri Jan 10 14:14:11 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong (and requires some hacks to keep prefix in route table on RTM_DELETE). I've tested this change with quagga (no change) and bird (). bird alias handling is already broken in BSD sysdep code, so nothing changes here, too. I'm going to MFC this change if there will be no complains about behavior change. While here, fix some style(9) bugs introduced by r260488 (pointed by glebius and bde). Sponsored by: Yandex LLC MFC after: 4 weeks	2014-01-10 12:13:55 +00:00
Andrey V. Elsukov	e2d14d9317	Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with LLE_CREATE flag. MFC after: 1 week	2014-01-03 02:32:05 +00:00
Gleb Smirnoff	9706c950a2	Fix couple of bugs from r257692 related to scan of address list on an interface: - in in_control() skip over not AF_INET addresses. - in in_aifaddr_ioctl() and in_difaddr_ioctl() do correct check of address family, w/o accessing memory beyond struct ifaddr. Sponsored by: Nginx, Inc.	2013-12-29 22:20:06 +00:00
Gleb Smirnoff	c1f7c3f500	In r257692 I intentionally deleted code that handled P2P interfaces with equal addresses on both sides. It appeared that OpenVPN uses such configutations. Submitted by: trociny	2013-11-17 15:14:07 +00:00
Gleb Smirnoff	555036b5f6	Remove never used ioctls that originate from KAME. The proof of their zero usage was exp-run from misc/183538.	2013-11-11 05:39:42 +00:00
Gleb Smirnoff	77b89ad837	Provide compat layer for OSIOCAIFADDR.	2013-11-06 19:46:20 +00:00
Gleb Smirnoff	821b5caf7a	Fix my braino in r257692. For SIOCG*ADDR we don't need exact match on specified address, actually in most cases the address isn't specified. Reported by: peter	2013-11-06 08:36:08 +00:00
Nathan Whitehorn	6224cd89c0	Fix build on GCC.	2013-11-06 01:14:00 +00:00
Gleb Smirnoff	f7a39160c2	Rewrite in_control(), so that it is comprehendable without getting mad. o Provide separate functions for SIOCAIFADDR and for SIOCDIFADDR, with clear code flow from beginning to the end. After that the rest of in_control() gets very small and clear. o Provide sx(9) lock to protect against parallel ioctl() invocations. o Reimplement logic from r201282, that tried to keep localhost route in table when multiple P2P interfaces with same local address are created and deleted. Discussed with: pluknet, melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-11-05 07:44:15 +00:00
Gleb Smirnoff	b1b9dcae46	Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by default from the very beginning. It was placed in wrong namespace net.link.ether, originally it had been at another wrong namespace. It was incorrectly documented at incorrect manual page arp(8). Since new-ARP commit, the tunable have been consulted only on route addition, and ignored on route deletion. Behaviour of a system with tunable turned off is not fully correct, and has no advantages comparing to normal behavior.	2013-11-05 07:32:09 +00:00
Gleb Smirnoff	237bf7f773	Cleanup in_ifscrub(), which is just an entry to in_scrubprefix().	2013-11-01 10:18:41 +00:00
Gleb Smirnoff	c3322cb91c	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
Gleb Smirnoff	4675896098	Remove ifa_init() and provide ifa_alloc() that will allocate and setup struct ifaddr internally. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:31:42 +00:00
Andrey V. Elsukov	5b7cb97c2b	Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU counters.	2013-07-09 09:50:15 +00:00
Oleg Bulyzhin	1571132f14	Plug static llentry leak (ipv4 & ipv6 were affected). PR: kern/172985 MFC after: 1 month	2013-04-21 21:28:38 +00:00
Gleb Smirnoff	9711a168b9	Retire struct sockaddr_inarp. Since ARP and routing are separated, "proxy only" entries don't have any meaning, thus we don't need additional field in sockaddr to pass SIN_PROXY flag. New kernel is binary compatible with old tools, since sizes of sockaddr_inarp and sockaddr_in match, and sa_family are filled with same value. The structure declaration is left for compatibility with third party software, but in tree code no longer use it. Reviewed by: ru, andre, net@	2013-01-31 08:55:21 +00:00
Peter Wemm	8a1163e82f	Temporarily revert rev 244678. This is causing loopback problems with the lo (loopback) interfaces.	2013-01-03 10:21:28 +00:00
Gleb Smirnoff	468e45f3bd	The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify all interested parties in case if interface flag IFF_UP has changed. However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR and SIOCAIFADDR_IN6 can, too. The actual \|= is done not in the protocol code, but in code of interface drivers. To fix this historical layering violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the IFF_UP flag, and if it did, run the if_up() handler. This fixes configuring an address under CARP control on an interface that was initially !IFF_UP. P.S. I intentionally omitted handling the IFF_SMART flag. This flag was never ever used in any driver since it was introduced, and since it means another layering violation, it should be garbage collected instead of pretended to be supported.	2012-12-25 13:01:58 +00:00
Gleb Smirnoff	3e6c8b5366	Minor style(9) changes: - Remove declaration in initializer. - Add empty line between logical blocks.	2012-12-24 21:35:48 +00:00
Randall Stewart	7db496de2c	Though I disagree, I conceed to jhb & Rui. Note that we still have a problem with this whole structure of locks and in_input.c [it does not lock which it should not, but this can lead to crashes]. (I have seen it in our SQA testbed.. besides the one with a refcnt issue that I will have SQA work on next week ;-)	2012-08-19 11:54:02 +00:00
Randall Stewart	9424879158	Ok jhb, lets move the ifa_free() down to the bottom to assure that all tables and such are removed before we start to free. This won't protect the Hash in ip_input.c but in theory should protect any other uses that do use locks. MFC after: 1 week (or more)	2012-08-17 05:51:46 +00:00
Randall Stewart	184749821f	Its never a good idea to double free the same address. MFC after: 1 week (after the other commits ahead of this gets MFC'd)	2012-08-16 17:55:16 +00:00
Gleb Smirnoff	ea53792942	Fix races between in_lltable_prefix_free(), lla_lookup(), llentry_free() and arptimer(): o Use callout_init_rw() for lle timeout, this allows us safely disestablish them. - This allows us to simplify the arptimer() and make it race safe. o Consistently use ifp->if_afdata_lock to lock access to linked lists in the lle hashes. o Introduce new lle flag LLE_LINKED, which marks an entry that is attached to the hash. - Use LLE_LINKED to avoid double unlinking via consequent calls to llentry_free(). - Mark lle with LLE_DELETED via \|= operation istead of =, so that other flags won't be lost. o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more consistent and provide more informative KASSERTs. The patch is a collaborative work of all submitters and myself. PR: kern/165863 Submitted by: Andrey Zonov <andrey zonov.org> Submitted by: Ryan Stone <rysto32 gmail.com> Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>	2012-08-02 13:57:49 +00:00
Gleb Smirnoff	b9aee262e5	Some more whitespace cleanup.	2012-08-01 09:00:26 +00:00
Gleb Smirnoff	ea50c13ebe	Some style(9) and whitespace changes. Together with: Andrey Zonov <andrey zonov.org>	2012-07-31 11:31:12 +00:00
Navdeep Parhar	09fe63205c	- Updated TOE support in the kernel. - Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features. - iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the works and will follow soon. Build-tested with make universe. 30s overview ============ What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface: # ifconfig -m \| grep TOE Enable/disable TCP offload on an interface (just like any other ifnet capability): # ifconfig cxgbe0 toe # ifconfig cxgbe0 -toe Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat: # netstat -np tcp \| grep toe # sockstat -46c \| grep toe Reviewed by: bz, gnn Sponsored by: Chelsio communications. MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)	2012-06-19 07:34:13 +00:00
Gleb Smirnoff	90b357f6ec	M_DONTWAIT is a flag from historical mbuf(9) allocator, not malloc(9) or uma(9) flag.	2012-04-10 06:52:39 +00:00
Kip Macy	a93cda789a	When using flowtable llentrys can outlive the interface with which they're associated at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer valid. Move the free pointer in to the llentry itself and update the initalization sites. MFC after: 2 weeks	2012-02-23 18:21:37 +00:00
Bjoern A. Zeeb	81d5d46b3c	Add multi-FIB IPv6 support to the core network stack supplementing the original IPv4 implementation from r178888: - Use RT_DEFAULT_FIB in the IPv4 implementation where noticed. - Use rtfib() KPI with explicit RT_DEFAULT_FIB where applicable in the NFS code. - Use the new in6_rt KPI in TCP, gif(4), and the IPv6 network stack where applicable. - Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally prevent multiple initializations of callouts in in6_inithead(). - Use wrapper functions where needed to preserve the current KPI to ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup. - Fix (related) comments (both technical or style). - Convert to rtinit() where applicable and only use custom loops where currently not possible otherwise. - Multicast group, most neighbor discovery address actions and faith(4) are locked to the default FIB. Individual IPv6 addresses will only appear in the default FIB, however redirect information and prefixes of connected subnets are automatically propagated to all FIBs by default (mimicking IPv4 behavior as closely as possible). Sponsored by: Cisco Systems, Inc.	2012-02-03 13:08:44 +00:00
Gleb Smirnoff	56cf9dc1f6	Drop support for SIOCSIFADDR, SIOCSIFNETMASK, SIOCSIFBRDADDR, SIOCSIFDSTADDR ioctl commands. PR: 163524 Reviewed by: net	2012-01-16 09:53:24 +00:00
John Baldwin	137f91e80f	Convert all users of IF_ADDR_LOCK to use new locking macros that specify either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks	2012-01-05 19:00:36 +00:00
John Baldwin	0f188ebb1d	Use a helper variable to wrap a long line.	2012-01-04 13:29:26 +00:00
John Baldwin	56d6e1292d	In the handling of the SIOC[DG]LIFADDR icotls in in_lifaddr_ioctl(), add missing interface address list locking and grab a reference on the matching interface address after dropping the lock while it is used to avoid a potential use after free. Reviewed by: bz MFC after: 1 week	2012-01-04 13:26:56 +00:00
John Baldwin	0823c29b81	Fix the SIOC[DG]LIFADDR ioctls in in_lifaddr_ioctl() to work with IPv4 interface address rather than IPv6. Submitted by: hrs Reviewed by: bz MFC after: 1 week	2012-01-04 13:23:51 +00:00
Gleb Smirnoff	7121247312	Provide ABI compatibility shim to enable configuring of addresses with ifconfig(8) prior to r228571. Requested by: brooks	2011-12-21 12:39:08 +00:00
Gleb Smirnoff	92ed4e1a24	Since size of struct in_aliasreq has just been changed in r228571, and thus ifconfig(8) needs recompile, it is a good chance to make parameter checks on SIOCAIFADDR arguments more strict.	2011-12-16 13:30:17 +00:00
Gleb Smirnoff	08b68b0e4c	A major overhaul of the CARP implementation. The ip_carp.c was started from scratch, copying needed functionality from the old implemenation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces, they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics has been changed too: now one doesn't need to clone carpXX interface, he/she should directly configure a vhid on a Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(8) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows to run a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]	2011-12-16 12:16:56 +00:00
Gleb Smirnoff	55174c34ef	Belatedly catch up with r151555. in_scrubprefix() also needs this fix. We should compare not only addresses, but their masks, too, when searching for matching prefix.	2011-12-13 06:56:43 +00:00
Gleb Smirnoff	f769e5b0fa	Fix a very special case when SIOCAIFADDR supplies mask of 0.0.0.0, don't overwrite the mask with autoguessing based on classes.	2011-12-06 20:55:20 +00:00
Gleb Smirnoff	89b9325530	Fix one more fallout from r227791: do not overwrite trimmed sa_len on the ia_sockmask when doing SIOCSIFNETMASK. Reported by: Stefan Bethke <stb lassitu.de>, gonzo Pointy hat to: glebius	2011-11-28 13:30:14 +00:00
Gleb Smirnoff	c6e5c71116	Remove superfluous check: SIOCAIFADDR must have ifra_addr supplied.	2011-11-24 22:46:11 +00:00

1 2 3 4 5 ...

338 Commits