freebsd-nq

Author	SHA1	Message	Date
Qing Li	c7ab66020f	The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was due to the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations due to multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days	2009-12-30 21:35:34 +00:00
John Baldwin	5428776e2c	Change vlan interfaces to cope more usefully with the parent interface being renamed. Previously the vlan interfaces would lose their configuration as if the parent interface had been physically removed. Now vlan interfaces ignore rename events. - Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being renamed. This flag can be checked in ifnet departure/arrival event handlers to treat rename events differently. - Change the ifnet departure event handler in the if_vlan(4) driver to ignore departure events due to a trunk interface being renamed. Reviewed by: brooks, rwatson MFC after: 1 week	2009-12-29 13:35:18 +00:00
Luigi Rizzo	830c6e2b97	bring in several cleanups tested in ipfw3-head branch, namely: r201011 - move most of ng_ipfw.h into ip_fw_private.h, as this code is ipfw-specific. This removes a dependency on ng_ipfw.h from some files. - move many equivalent definitions of direction (IN, OUT) for reinjected packets into ip_fw_private.h - document the structure of the packet tags used for dummynet and netgraph; r201049 - merge some common code to attach/detach hooks into a single function. r201055 - remove some duplicated code in ip_fw_pfil. The input and output processing uses almost exactly the same code so there is no need to use two separate hooks. ip_fw_pfil.o goes from 2096 to 1382 bytes of .text r201057 (see the svn log for full details) - macros to make the conversion of ip_len and ip_off between host and network format more explicit r201113 (the remaining parts) - readability fixes -- put braces around some large for() blocks, localize variables so the compiler does not think they are uninitialized, do not insist on precise allocation size if we have more than we need. r201119 - when doing a lookup, keys must be in big endian format because this is what the radix code expects (this fixes a bug in the recently-introduced 'lookup' option) No ABI changes in this commit. MFC after: 1 week	2009-12-28 10:47:04 +00:00
Robert Watson	912f6323cd	When warning about possible netisr configuration problems during boot, report using "netisr_init" rather than "netisr2", which was the development name for the project. MFC after: 3 days	2009-12-23 12:33:59 +00:00
Robert Watson	0a32e29f59	Refine netisr.c comments a bit.	2009-12-23 12:31:27 +00:00
Luigi Rizzo	de240d1013	merge code from ipfw3-head to reduce contention on the ipfw lock and remove all O(N) sequences from kernel critical sections in ipfw. In detail: 1. introduce a IPFW_UH_LOCK to arbitrate requests from the upper half of the kernel. Some things, such as 'ipfw show', can be done holding this lock in read mode, whereas insert and delete require IPFW_UH_WLOCK. 2. introduce a mapping structure to keep rules together. This replaces the 'next' chain currently used in ipfw rules. At the moment the map is a simple array (sorted by rule number and then rule_id), so we can find a rule quickly instead of having to scan the list. This reduces many expensive lookups from O(N) to O(log N). 3. when an expensive operation (such as insert or delete) is done by userland, we grab IPFW_UH_WLOCK, create a new copy of the map without blocking the bottom half of the kernel, then acquire IPFW_WLOCK and quickly update pointers to the map and related info. After dropping IPFW_LOCK we can then continue the cleanup protected by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side is only blocked for O(1). 4. do not pass pointers to rules through dummynet, netgraph, divert etc, but rather pass a <slot, chain_id, rulenum, rule_id> tuple. We validate the slot index (in the array of #2) with chain_id, and if successful do a O(1) dereference; otherwise, we can find the rule in O(log N) through <rulenum, rule_id> All the above does not change the userland/kernel ABI, though there are some disgusting casts between pointers and uint32_t Operation costs now are as follows: Function Old Now Planned ------------------------------------------------------------------- + skipto X, non cached O(N) O(log N) + skipto X, cached O(1) O(1) XXX dynamic rule lookup O(1) O(log N) O(1) + skipto tablearg O(N) O(1) + reinject, non cached O(N) O(log N) + reinject, cached O(1) O(1) + kernel blocked during setsockopt() O(N) O(1) ------------------------------------------------------------------- The only (very small) regression is on dynamic rule lookup and this will be fixed in a day or two, without changing the userland/kernel ABI Supported by: Valeria Paoli MFC after: 1 month	2009-12-22 19:01:47 +00:00
John Baldwin	8e9683767c	Remove commented out prototype for ifinit(). This prototype has been commented out since 1.1 and has not been present in <sys/systm.h> since at least 1.1 of that file. It is also not needed in FreeBSD due to SYSINIT().	2009-12-21 20:09:19 +00:00
Luigi Rizzo	70228fb346	Start splitting ip_fw2.c and ip_fw.h into smaller components. At this time we pull out from ip_fw2.c the logging functions, and support for dynamic rules, and move kernel-only stuff into netinet/ipfw/ip_fw_private.h No ABI change involved in this commit, unless I made some mistake. ip_fw.h has changed, though not in the userland-visible part. Files touched by this commit: conf/files now references the two new source files netinet/ip_fw.h remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h. netinet/ipfw/ip_fw_private.h new file with kernel-specific ipfw definitions netinet/ipfw/ip_fw_log.c ipfw_log and related functions netinet/ipfw/ip_fw_dynamic.c code related to dynamic rules netinet/ipfw/ip_fw2.c removed the pieces that goes in the new files netinet/ipfw/ip_fw_nat.c minor rearrangement to remove LOOKUP_NAT from the main headers. This require a new function pointer. A bunch of other kernel files that included netinet/ip_fw.h now require netinet/ipfw/ip_fw_private.h as well. Not 100% sure i caught all of them. MFC after: 1 month	2009-12-15 16:15:14 +00:00
Luigi Rizzo	614cb83990	Move the scan for max_keylen into route.c::route_init(), and make max_keylen an argument for rn_init(). This removes an unnecessary dependency on domain.h from radix.c MFC after: 7 days	2009-12-14 20:12:51 +00:00
Bjoern A. Zeeb	de0bd6f76b	Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error. Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack. We cannot change the classic jailed() call to do that, as it is used outside the network stack as well. Discussed with: julian, zec, jamie, rwatson (back in Sept) MFC after: 5 days	2009-12-13 13:57:32 +00:00
Luigi Rizzo	a50f6188de	Make the code buildable in userland so it is easier to test it: this requires a small reordering of headers and a few #defines to map functions not available in userland. Remove a useless #ifndef block at the beginning of the file. Introduce (temporarily) rn_init2(), see the comment in the code for the proper long term change. No ABI or functional change. MFC after: 7 days	2009-12-12 15:49:28 +00:00
Luigi Rizzo	22efc80fd8	No functional changes (who dares to touch this code!) but: - cast the result of LEN() to int as this is the main usage. - use LEN() in one place where it was forgotten. - Document the use of a static variable in rw mode. More small changes to follow. MFC after: 7 days	2009-12-10 10:34:30 +00:00
John Baldwin	34605f8542	Remove if_timer/if_watchdog now that they are no longer used. The space used by if_timer is reserved for expanding if_index to an int in the future. Reviewed by: rwatson, brooks	2009-11-30 21:25:57 +00:00
Jung-uk Kim	c12b965f99	General style cleanup, no functional change.	2009-11-20 21:12:40 +00:00
Jung-uk Kim	5ecf77367c	- Allocate scratch memory on stack instead of pre-allocating it with the filter as we do from bpf_filter()[1]. - Revert experimental use of contigmalloc(9)/contigfree(9). It has no performance benefit over malloc(9)/free(9)[2]. Requested by: rwatson[1] Pointed out by: rwatson, jhb, alc[2]	2009-11-20 18:49:20 +00:00
Jung-uk Kim	ae4fdab8a8	- Change internal function bpf_jit_compile() to return allocated size of the generated binary and remove page size limitation for userland. - Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make sure the generated binary aligns properly and make it physically contiguous.	2009-11-18 23:40:19 +00:00
Jung-uk Kim	366652f987	- Make BPF JIT compiler working again in userland. We are limiting size of generated native binary to page size for now. - Update copyright date and fix some style nits.	2009-11-18 19:26:17 +00:00
Michael Tuexen	7f2797200f	Fix a LOR showing up with sctp_bsd_addr(): Do not hold a rt lock when calling rt_newaddrmsg(). Reviewed by: qingli Approved by: rrs (mentor) MFC after: 1 month	2009-11-17 12:57:10 +00:00
Xin LI	1a9d4dda9b	Revert revision 199201 for now as it has introduced a kernel vulnerability and requires more polishing.	2009-11-12 19:02:10 +00:00
Xin LI	41c8c6e876	Add interface description capability as inspired by OpenBSD. MFC after: 3 months	2009-11-11 21:30:58 +00:00
John Baldwin	e1b17582f4	Take a step towards removing if_watchdog/if_timer. Don't explicitly set if_watchdog/if_timer to NULL/0 when initializing an ifnet. if_alloc() sets those members to NULL/0 already.	2009-11-06 14:55:01 +00:00
Robert Watson	974e99b008	Remove unneeded blank line from bpf_drvinit(). MFC after: 3 days	2009-10-23 17:26:29 +00:00
Christian Brueffer	4382b0681e	Check pointer for NULL before dereferencing it, not after. PR: 138390 Submitted by: Patroklos Argyroudis <argp@census-labs.com> MFC after: 1 week	2009-10-22 06:17:04 +00:00
Qing Li	fc02477e1c	Verify "smp_started" is true before calling sched_bind() and sched_unbind(). Reviewed by: kmacy MFC after: 3 days	2009-10-22 00:32:01 +00:00
Qing Li	48d0c039cb	The flow-table function flowtable_route_flush() may be called during system initialization time. Since the flow-table is designed to maintain per CPU flow cache, the existing code did not check whether "smp_started" is true before calling sched_bind() and sched_unbind(), which triggers a page fault. Reviewed by: jeff MFC after: immediately	2009-10-20 21:27:03 +00:00
Robert Watson	cee8119875	Clean up comments, white space, and style in pfil.c (especially new VNET bits). MFC after: 3 days (not VNET bits)	2009-10-19 15:19:14 +00:00
Robert Watson	23b5fd2285	Remove unused pfil_flags field in packet_filter_hook. MFC after: 3 days	2009-10-18 22:54:09 +00:00
Robert Watson	c9ddf688b6	Sort function prototypes in pfil.h, clean up white space, and better align fields for printing. MFC after: 3 days	2009-10-18 22:43:28 +00:00
Robert Watson	33c89765f1	Line-wrap pfil.c so that it prints more nicely. MFC after: 3 days	2009-10-18 11:27:34 +00:00
Bjoern A. Zeeb	382e8b5ad9	Unbreak the VIMAGE build with IPSEC, broken with r197952 by virtualizing the pfil hooks. For consistency add the V_ to virtualize the pfil hooks in here as well. MFC after: 55 days X-MFC after: julian MFCed r197952.	2009-10-14 11:55:55 +00:00
Julian Elischer	0b4b0b0fee	Virtualize the pfil hooks so that different jails may chose different packet filters. ALso allows ipfw to be enabled on on ejail and disabled on another. In 8.0 it's a global setting. Sitting aroung in tree waiting to commit for: 2 months MFC after: 2 months	2009-10-11 05:59:43 +00:00
Bjoern A. Zeeb	db44ff4047	Put #ifdef INET around parts of the FLOWTABLE code, to unbreak nooptions INET kernel builds. MFC after: 3 days X-MFC: with r197687	2009-10-03 10:56:03 +00:00
Qing Li	e5c610d659	The flow-table associates TCP/UDP flows and IP destinations with specific routes. When the routing table changes, for example, when a new route with a more specific prefix is inserted into the routing table, the flow-table is not updated to reflect that change. As such existing connections cannot take advantage of the new path. In some cases the path is broken. This patch will update the affected flow-table entries when a more specific route is added. The route entry is properly marked when a route is deleted from the table. In this case, when the flow-table performs a search, the stale entry is updated automatically. Therefore this patch is not necessary for route deletion. Submitted by: simon, phk Reviewed by: bz, kmacy MFC after: 3 days	2009-10-01 20:32:29 +00:00
Qing Li	46e7f9838b	A wrong variable is used when setting up the interface address route, which broke source address selection in some code paths. Submitted by: noted by bz Reviewed by: hrs MFC after: immediately	2009-09-20 17:22:19 +00:00
Marko Zec	38d61195b8	Style fix - break too long a line in two. Spotted by: bz MFC after: 3 days	2009-09-18 09:03:23 +00:00
Marko Zec	989e04112b	V_irtualize the lltables list, making ARP and ND reasonably usable again with options VIMAGE kernels. Submitted by: bz (the original version, probably identical to this one) Reviewed by: many @ DevSummit Cambridge MFC after: 3 days	2009-09-17 14:52:15 +00:00
Qing Li	9bb7d0f47a	Self pointing routes are installed for configured interface addresses and address aliases. After an interface is brought down and brought back up again, those self pointing routes disappeared. This patch ensures after an interface is brought back up, the loopback routes are reinstalled properly. Reviewed by: bz MFC after: immediately	2009-09-15 19:18:34 +00:00
Robert Watson	e76d823b81	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks	2009-09-12 20:03:45 +00:00
Ed Maste	1bdc73d337	Compare pointer with NULL, not 0.	2009-09-09 03:36:43 +00:00
Navdeep Parhar	9a31144537	Add arp_update_event. This replaces route_arp_update_event, which has not worked since the arp-v2 rewrite. The event handler will be called with the llentry write-locked and can examine la_flags to determine whether the entry is being added or removed. Reviewed by: gnn, kmacy Approved by: gnn (mentor) MFC after: 1 month	2009-09-08 21:17:17 +00:00
Qing Li	d134008aa0	The addresses that are assigned to the loopback interface should be part of the kernel routing table. Reviewed by: bz MFC after: immediately	2009-09-05 20:24:37 +00:00
Qing Li	9452b0d2de	This patch fixes the following issues: - Interface link-local address is not reachable within the node that owns the interface, this is due to the mismatch in address scope as the result of the installed interface address loopback route. Therefore for each interface address loopback route, the rt_gateway field (of AF_LINK type) will be used to track which interface a given address belongs to. This will aid the address source to use the proper interface for address scope/zone validation. - The loopback address is not reachable. The root cause is the same as the above. - Empty nd6 entries are created for the IPv6 loopback addresses only for validation reason. Doing so will eliminate as much of the special case (loopback addresses) handling code as possible, however, these empty nd6 entries should not be returned to the userland applications such as the "ndp" command. Since both of the above issues contain common files, these files are committed together. Reviewed by: bz MFC after: immediately	2009-09-05 16:43:16 +00:00
George V. Neville-Neil	54fc657d59	Add ARP statistics to the kernel and netstat. New counters now exist for: requests sent replies sent requests received replies received packets received total packets dropped due to no ARP entry entrys timed out Duplicate IPs seen The new statistics are seen in the netstat command when it is given the -s command line switch. MFC after: 2 weeks In collaboration with: bz	2009-09-03 21:10:57 +00:00
Qing Li	5311e988ea	As part of r196609, a call to "rtalloc" did not take the fib into account. So call the appropriate "rtalloc_ign_fib()" instead of calling "rtalloc_ign()". Reviewed by:i pointed out by bz MFC after: immediately	2009-08-31 00:14:37 +00:00
Marko Zec	a99fcfd4ca	Introduce a separate sx lock for protecting lists of vnet sysinit and sysuninit handlers. Previously, sx_vnet, which is a lock designated for protecting the vnet list, was (ab)used for protecting vnet sysinit / sysuninit handler lists as well. Holding exclusively the sx_vnet lock while invoking sysinit and / or sysuninit handlers turned out to be problematic, since some of the handlers may attempt to wake up another thread and wait for it to walk over the vnet list, hence acquire a shared lock on sx_vnet, which in turn leads to a deadlock. Protecting vnet sysinit / sysuninit lists with a separate lock mitigates this issue, which was first observed with flowtable_flush() / flowtable_cleaner() in sys/net/flowtable.c. Reviewed by: rwatson, jhb MFC after: 3 days	2009-08-28 22:30:55 +00:00
Qing Li	9231d35f4d	In ip_output(), the flow-table module must not try to cache L2/L3 information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type. Since the L2 information (rt_lle) is invalid for these interface types, accidental caching attempt will trigger panic when the invalid rt_lle reference is accessed. When installing a new route, or when updating an existing route, the user supplied gateway address may be an interface address (this is particularly true for point-to-point interface related modules such as ppp, if_tun, if_gif). Currently the routing command handler always set the RTF_GATEWAY flag if the gateway address is given as part of the command paramters. Therefore the gateway address must be verified against interface addresses or else the route would be treated as an indirect route, thus making that route unusable. Reviewed by: kmacy, julia, rwatson Verified by: marcus MFC after: 3 days	2009-08-28 07:01:09 +00:00
Robert Watson	ed2dabfc68	Add IFNET_HOLD reserved pointer value for the ifindex ifnet array, which allows an index to be reserved for an ifnet without making the ifnet available for management operations. Use this in if_alloc() while the ifnet lock is released between initial index allocation and completion of ifnet initialization. Add ifindex_free() to centralize the implementation of releasing an ifindex value. Use in if_free() and if_vmove(), as well as when releasing a held index in if_alloc(). Reviewed by: bz MFC after: 3 days	2009-08-26 11:13:10 +00:00
Robert Watson	61f6986b07	Break out allocation of new ifindex values from if_alloc() and if_vmove(), and centralize in a single function ifindex_alloc(). Assert the IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not close all known races in this code. Reviewed by: bz MFC after: 3 days	2009-08-25 20:21:16 +00:00
Robert Watson	dc56e98f0d	Use locks specific to the lltable code, rather than borrow the ifnet list/index locks, to protect link layer address tables. This avoids lock order issues during interface teardown, but maintains the bug that sysctl copy routines may be called while a non-sleepable lock is held. Reviewed by: bz, kmacy MFC after: 3 days	2009-08-25 09:52:38 +00:00
Jack F Vogel	3de029efaf	When bridging LRO is causing a problem, the believe that it would work as long as all interfaces have TSO seems to be false, until the matter gets sorted out just disable LRO completely.	2009-08-24 21:04:51 +00:00
Robert Watson	8e937462f4	Make if_grow static -- it's not used outside of if.c, and with the internals destined to change, it's better if it remains that way. MFC after: 3 days	2009-08-24 12:52:05 +00:00
Marko Zec	52db6805ea	When moving ifnets from one vnet to another, and the ifnet has ifaddresses of AF_LINK type which thus have an embedded if_index "backpointer", we must update that if_index backpointer to reflect the new if_index that our ifnet just got assigned. This change affects only options VIMAGE builds. Submitted by: bz Reviewed by: bz Approved by: re (rwatson), julian (mentor)	2009-08-24 10:14:09 +00:00
Robert Watson	6852110b64	Rather than using IFNET_RLOCK() when iterating over (and modifying) the ifnet list during if_ef load, directly acquire the ifnet_sxlock exclusively. That way when if_alloc() recurses the lock, it's a write recursion rather than a read->write recursion. This code structure is arguably a bug, so add a comment indicating that this is the case. Post-8.0, we should fix this, but this commit resolves panic-on-load for if_ef. Discussed with: bz, julian Reported by: phk MFC after: 3 days	2009-08-23 21:00:21 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stablize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Julian Elischer	cd81cd3fd1	Don't allow access to the internals until it has all been set up. Specifically, not until the per-vnet parts have been set up. Submitted by: kmacy@ Reviewed by: julian@, zec@ Approved by: re(rwatson) MFC after: immediately	2009-08-21 09:22:32 +00:00
Kip Macy	6d37c3ecd9	This change fixes a comment and addresses a complaint by kib@ by moving a frequently executed flowtable syslog statement from being conditional on bootverbose to conditional on a per-vnet flowtable sysctl. Approved by: re@	2009-08-19 20:13:09 +00:00
Kip Macy	3ee42584f9	- change the interface to flowtable_lookup so that we don't rely on the mbuf for obtaining the fib index - check that a cached flow corresponds to the same fib index as the packet for which we are doing the lookup - at interface detach time flush any flows referencing stale rtentrys associated with the interface that is going away (fixes reported panics) - reduce the time between cleans in case the cleaner is running at the time the eventhandler is called and the wakeup is missed less time will elapse before the eventhandler returns - separate per-vnet initialization from global initialization (pointed out by jeli@) Reviewed by: sam@ Approved by: re@	2009-08-18 20:28:58 +00:00
Kip Macy	d53e359b9a	fix netboot issue by disabling flowtable lookups until initialization has been run Reviewed by: rwatson@ Approved by: re@	2009-08-17 19:09:28 +00:00
Robert Watson	d931ea0961	Remove unused if_rawoutput() macro; it has been unused since at least FreeBSD 2. Approved by: re (kib)	2009-08-15 22:26:26 +00:00
Marko Zec	9abb486279	Appease VNET_DEBUG - in if_vmove we temporarily switch i.e. recurse from one vnet to another which is OK, so no need to flood the console with warnings here. Approved by: re (rwatson), julian (mentor)	2009-08-14 22:46:45 +00:00
Marko Zec	67addcde86	Make VNET_DEBUG a standalone compile-time option, i.e. decouple it from INVARIANTS. Reviewed by: bz Approved by: re (rwatson), julian (mentor)	2009-08-14 22:41:39 +00:00
Bjoern A. Zeeb	eb79e1c76e	Make it possible to change the vnet sysctl variables on jails with their own virtual network stack. Jails only inheriting a network stack cannot change anything that cannot be changed from within a prison. Reviewed by: rwatson, zec Approved by: re (kib)	2009-08-13 10:26:34 +00:00
Bjoern A. Zeeb	20b0cdb749	Put multiple instructions into a block when iterating; unbreaks NET_RT_DUMP, which otherwise only returned information of AF_MAX. This was broken in r193232 (save your time - my bug, my fix). PR: kern/137700 Reported by: Larry Baird (lab gta.com) Tested by: Larry Baird (lab gta.com) Reviewed by: zec, lstewart, qing Approved by: re (kib)	2009-08-13 09:29:52 +00:00
Jung-uk Kim	a36599cce7	Always embed pointer to BPF JIT function in BPF descriptor to avoid inconsistency when opt_bpf.h is not included. Reviewed by: rwatson Approved by: re (rwatson)	2009-08-12 17:28:53 +00:00
Bjoern A. Zeeb	281c86a4ef	Update DDB show vnet command to print all used and available information. Reviewed by: rwatson, zec Approved by: re	2009-08-12 12:00:21 +00:00
Bjoern A. Zeeb	1b501e53f3	Put minimum alignment on the dpcpu and vnet section so that ld when adding the __start_ symbol knows the expected section alignment and can place the __start_ symbol correctly. These sections will not support symbols with super-cache line alignment requirements. For full details, see posting to freebsd-current, 2009-08-10, Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>. Debugging and testing patches by: Kamigishi Rei (spambox haruhiism.net), np, lstewart, jhb, kib, rwatson Tested by: Kamigishi Rei, lstewart Reviewed by: kib Approved by: re	2009-08-12 10:26:03 +00:00
Robert Watson	315e3e38fa	Many network stack subsystems use a single global data structure to hold all pertinent statatistics for the subsystem. These structures are sometimes "borrowed" by kernel modules that require a place to store statistics for similar events. Add KPI accessor functions for statistics structures referenced by kernel modules so that they no longer encode certain specifics of how the data structures are named and stored. This change is intended to make it easier to move to per-CPU network stats following 8.0-RELEASE. The following modules are affected by this change: if_bridge if_cxgb if_gif ip_mroute ipdivert pf In practice, most of these statistics consumers should, in fact, maintain their own statistics data structures rather than borrowing structures from the base network stack. However, that change is too agressive for this point in the release cycle. Reviewed by: bz Approved by: re (kib)	2009-08-02 19:43:32 +00:00
Robert Watson	6aad5c1c93	The colour was red as shall be the letters of this warning to people upon boot if the experimental VIMAGE feature was compiled into the kernel. Submitted by: bz Reviewed by: zec Approved by: re (vimage blanket)	2009-08-01 22:22:45 +00:00
Robert Watson	c8f6a13820	Minor style tweaks. Approved by: re (vimage blanket)	2009-08-01 21:58:32 +00:00
Robert Watson	6bc2c7b70c	Make the vnet alloc/destroy paths a bit easier to followg by merging vnet_data_init/vnet_data_destroy into vnet_alloc/vnet_destroy. Reviewed by: bz, zec Approved by: re (vimage blanket)	2009-08-01 21:54:15 +00:00
Robert Watson	7429a3f3d8	Remove vnet_foreach() utility function, which previously allowed vnet.c to iterate virtual network stacks without being aware of the implementation details previously hidden in kern_vimage.c. Now they are in the same file, so remove this added complexity. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 20:24:45 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Robert Watson	ed3db012fc	Reorder and recomment vnet.c and vnet.h on the basis that they are no longer solely about the virtual network stack memory allocator. Approved by: re (vimage blanket)	2009-07-30 12:41:19 +00:00
Robert Watson	a9bcca799e	Revise header comments for vnet.h as we now implement VNET_SYSINIT, not just VNET_DEFINE in vnet.h. Approved by: re (vimage blanket)	2009-07-28 22:17:34 +00:00
Qing Li	9fca4f79c7	The new flow table caches both the routing table entry as well as the L2 information. For an indirect route the cached L2 entry contains the MAC address of the gateway. Typically the default route is used to transmit multicast packets when explicit multicast routes are not available. The ether_output() function bypasses L2 resolution function if it verifies the L2 cache is valid, because the cached L2 address (a unicast MAC address) is copied into the packets as the destination MAC address. This validation, however, does not apply to broadcast and multicast packets because the destination MAC address is mapped according to a standard method instead. Submitted by: Xin Li Reviewed by: bz Approved by: re	2009-07-28 17:16:54 +00:00
Qing Li	df813b7ea2	This patch does the following: - Allow loopback route to be installed for address assigned to interface of IFF_POINTOPOINT type. - Install loopback route for an IPv4 interface addreess when the "useloopback" sysctl variable is enabled. Similarly, install loopback route for an IPv6 interface address when the sysctl variable "nd6_useloopback" is enabled. Deleting loopback routes for interface addresses is unconditional in case these sysctl variables were disabled after an interface address has been assigned. Reviewed by: bz Approved by: re	2009-07-27 17:08:06 +00:00
Bjoern A. Zeeb	d0ea47437a	Update epair(4) to the new netisr implementation and polish things a bit: - use dpcpu data to track the ifps with packets queued up, - per-cpu locking and driver flags - along with .nh_drainedcpu and NETISR_POLICY_CPU. - Put the mbufs in flight reference count, preventing interfaces from going away, under INVARIANTS as this is a general problem of the stack and should be solved in if.c/netisr but still good to verify the internal queuing logic. - Permit changing the MTU to virtually everythinkg like we do for loopback. Hook epair(4) up to the build. Approved by: re (kib)	2009-07-26 12:20:07 +00:00
Bjoern A. Zeeb	be31e5e7b5	Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctls (ifconfig ifN (-)vnet <jname\|jid>) work correctly. Move vi_if_move to if.c and split it up into two functions(), one for each ioctl. In the reclaim case, correctly set the vnet before calling if_vmove. Instead of silently allowing a move of an interface from the current vnet to the current vnet, return an error. () There is some duplicate interface name checking before actually moving the interface between network stacks without locking and thus race prone. Ideally if_vmove will correctly and automagically handle these in the future. Suggested by: rwatson (*) Approved by: re (kib)	2009-07-26 11:29:26 +00:00
Robert Watson	d0728d7174	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
Bjoern A. Zeeb	a08362ce46	sysctl_msec_to_ticks is used with both virtualized and non-vrtiualized sysctls so we cannot used one common function. Add a macro to convert the arg1 in the virtualized case to vnet.h to not expose the maths to all over the code. Add a wrapper for the single virtualized call, properly handling arg1 and call the default implementation from there. Convert the two over places to use the new macro. Reviewed by: rwatson Approved by: re (kib)	2009-07-21 21:58:55 +00:00
Robert Watson	0a4747d4d0	Garbage collect vnet module registrations that have neither constructors nor destructors, as there's no actual work to do. In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added. Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 13:55:33 +00:00
Robert Watson	17ef1feb8a	Add macros VNET_SETNAME and VNET_SYMPREFIX, and expose to userspace if _WANT_VNET is defined. This way we don't need separate definitions in libkvm. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 07:50:50 +00:00
Robert Watson	006e9db452	Normalize field naming for struct vnet, fix two debugging printfs that print them. Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-19 17:40:45 +00:00
Robert Watson	5ee847d3ac	Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep locks with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)	2009-07-19 14:20:53 +00:00
Jamie Gritton	7afcbc18b3	Remove the interim vimage containers, struct vimage and struct procg, and the ioctl-based interface that supported them. Approved by: re (kib), bz (mentor)	2009-07-17 14:48:21 +00:00
Robert Watson	1e77c1056a	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
Robert Watson	c1e200ffcc	Add missing license line for vnet.h, correct white space nit. Approved by: re (kensmith) (implicit)	2009-07-15 00:56:15 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Kip Macy	6a7bff2c31	Re-factoring for adding weighted routes introduced a fairly irritating bug where the system will panic when RADIX_MPATH is enabled. This change fixes this. Approved by: re@	2009-07-11 21:56:23 +00:00
Rui Paulo	59aa14a91d	Implementation of the upcoming Wireless Mesh standard, 802.11s, on the net80211 wireless stack. This work is based on the March 2009 D3.0 draft standard. This standard is expected to become final next year. This includes two main net80211 modules, ieee80211_mesh.c which deals with peer link management, link metric calculation, routing table control and mesh configuration and ieee80211_hwmp.c which deals with the actually routing process on the mesh network. HWMP is the mandatory routing protocol on by the mesh standard, but others, such as RA-OLSR, can be implemented. Authentication and encryption are not implemented. There are several scripts under tools/tools/net80211/scripts that can be used to test different mesh network topologies and they also teach you how to setup a mesh vap (for the impatient: ifconfig wlan0 create wlandev ... wlanmode mesh). A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled by default on GENERIC kernels for i386, amd64, sparc64 and pc98. Drivers that support mesh networks right now are: ath, ral and mwl. More information at: http://wiki.freebsd.org/WifiMesh Please note that this work is experimental. Also, please note that bridging a mesh vap with another network interface is not yet supported. Many thanks to the FreeBSD Foundation for sponsoring this project and to Sam Leffler for his support. Also, I would like to thank Gateworks Corporation for sending me a Cambria board which was used during the development of this project. Reviewed by: sam Approved by: re (kensmith) Obtained from: projects/mesh11s	2009-07-11 15:02:45 +00:00
Bjoern A. Zeeb	ba3b25b35a	In case we cannot queue a packet reaching the queue limit, retain the semantics netisr_queue() always had and free the mbuf along with returning the error. Reviewed by: rwatson Approved by: re (kensmith)	2009-06-30 05:21:00 +00:00
Brooks Davis	6cb7f168db	Remove support for the /dev/net/* per-interface devices. They serve little purpose and are unused in the base system. The IOCTL functionality is entirely duplicated and routing sockets provide a richer interface than the kqueue functionality. Further, it is not practical for these devices to be made sensible in the face of VIMAGE. Bump __FreeBSD_version on the off chance that there is any code out there that actually uses this stuff. Reviewed by: rwatson Discussed with: bz, zec Approved by: re@ (kensmith)	2009-06-29 19:46:29 +00:00
Robert Watson	395cbe82d2	Remove unnecessary include of kdb.h that snuck in during ifaddr refcount work. Reported by: pluknet <pluknet at gmail.com> Approved by: re (kib)	2009-06-27 10:30:28 +00:00
Robert Watson	9e6e01ebf6	In light of DPCPU use by netisr, revise various for loops from using MAXCPU to mp_maxid, and handling and reporting of requests to use more threads than we have CPUs to run them on. Reviewed by: bz Approved by: re (kib) MFC after: 6 weeks	2009-06-26 20:39:36 +00:00
Robert Watson	ba16a0fab1	Use if_addr_rlock/if_addr_runlock for if_spp when iterating if_addrhead, as it is loadable as a module. Approved by: re (kib) MFC after: 6 weeks	2009-06-26 18:50:49 +00:00
Robert Watson	3893212ddc	Update if_stf and if_tun to use if_addr_rlock()/if_addr_runlock() rather than IF_ADDR_LOCK()/IF_ADDR_UNLOCK() when iterating ifp->if_addrhead. MFC after: 6 weeks	2009-06-26 00:45:20 +00:00
Robert Watson	f9ef96ca71	Define four wrapper functions for interface address locking, if_addr_rlock() and if_addr_runlock() for regular address lists, and if_maddr_rlock() and if_maddr_runlock() for multicast address lists. We will use these in various kernel modules to avoid encoding specific type and locking strategy information into modules that currently use IF_ADDR_LOCK() and IF_ADDR_UNLOCK() directly. MFC after: 6 weeks	2009-06-26 00:36:47 +00:00
Robert Watson	534027673b	Convert netisr to use dynamic per-CPU storage (DPCPU) instead of sizing arrays to [MAXCPU], offering moderate memory savings. In some places, this requires using CPU_ABSENT() to handle less common platforms with sparse CPU IDs. In several places, assert that the selected CPUID for work placement or statistics is not CPU_ABSENT() to be on the safe side. Discussed with: bz, jeff	2009-06-26 00:19:25 +00:00
Konstantin Belousov	9f80ce043d	Change the type of uio_resid member of struct uio from int to ssize_t. Note that this does not actually enable full-range i/o requests for 64 architectures, and is done now to update KBI only. Tested by: pho Reviewed by: jhb, bde (as part of the review of the bigger patch)	2009-06-25 18:46:30 +00:00
Robert Watson	2d9cfabad4	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz	2009-06-25 11:52:33 +00:00
Bjoern A. Zeeb	98c230c87e	Merge from p4: CH154790,154793,154874 Import if_epair(4), a virtual cross-over Ethernet-like interface pair. Note these files are 1:1 from p4 and not yet connected to the build not knowing about the new netisr interface. Sponsored by: The FreeBSD Foundation	2009-06-24 22:21:30 +00:00
Navdeep Parhar	456ae55008	Add 10Gbase-T to known ethernet media types. Approved by: gnn (mentor) MFC after: 1 week.	2009-06-24 21:53:25 +00:00
Navdeep Parhar	52d9cb1252	About to add 10Gbase-T to known media types, this is just a whitespace cleanup before that commit. No functional impact. Approved by: gnn (mentor)	2009-06-24 21:51:42 +00:00
Robert Watson	3baaf2974d	In if_setlladdr(), use IF_ADDR_LOCK() and ifaddr references to improve the safety of link layer address manipulation. MFC after: 6 weeks	2009-06-24 10:36:48 +00:00
Robert Watson	6c7ffe9340	Break at_ifawithnet() into two variants: - at_ifawithnet(), which acquires an locks it needs and returns an at_ifaddr reference. - at_ifawithnet_locked(), which relies on the caller locking at_ifaddr_list, and returns a pointer rather than a reference. Update various consumers to prefer one or the other, including ether and fddi output, to properly release at_ifaddr references. Rework at_control() to manage locking and references in a manner identical to in_control(). MFC after: 6 weeks	2009-06-24 10:32:44 +00:00
Robert Watson	5c66449004	Lock if_addrhead when iterating, and where necessary acquire and release ifadr references in if_sppp. MFC after: 6 weeks	2009-06-24 08:53:23 +00:00
Robert Watson	fe0ecfd64d	Make stf_getsrcifa6() return a reference to an in6_ifaddr rather than a pointer, and dispose of the references when no longer needed. MFC after: 6 weeks	2009-06-24 08:52:09 +00:00
Robert Watson	8c0fec805f	Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)	2009-06-23 20:19:09 +00:00
Bjoern A. Zeeb	5736e6fb9d	After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.	2009-06-23 17:03:45 +00:00
Bjoern A. Zeeb	a877d0cffa	Remove duplicate #include <net/route.h> from the middle of the file.	2009-06-23 13:16:16 +00:00
Marko Zec	fa057b15bd	V_irtualize flowtable state. This change should make options VIMAGE kernel builds usable again, to some extent at least. Note that the size of struct vnet_inet has changed, though in accordance with one-bump-per-day policy we didn't update the __FreeBSD_version number, given that it has already been touched by r194640 a few hours ago. Reviewed by: bz Approved by: julian (mentor)	2009-06-22 21:19:24 +00:00
Bjoern A. Zeeb	3952a5abc9	Updates after r194640: - shrink size guards for vnet_net. vnet_rtable does not need size guards as it is self-contained. - remove a bunch of defines from vnet.h no longer valid.	2009-06-22 17:56:07 +00:00
Bjoern A. Zeeb	b58ea5f310	Move virtualization of routing related variables into their own Vimage module, which had been there already but now is stateful. All variables are now file local; so this further limits the global spreading of routing related things throughout the kernel. Add a missing function local variable in case of MPATHing. Reviewed by: zec	2009-06-22 17:48:16 +00:00
Bjoern A. Zeeb	f987f19301	Collect all VIMAGE_GLOBALS variables in one place. No longer export rt_tables as all lookups go through rt_tables_get_rnh(). We cannot make rt_tables (and rtstat, rttrash[1]) static as netstat -r (-rs[1]) would stop working on a stripped VIMAGE_GLOBALS kernel. Reviewed by: zec Presumably broken by: phk 13.5y ago in r12820 [1]	2009-06-22 15:07:12 +00:00
Robert Watson	8896f83a58	Add a new function, ifa_ifwithaddr_check(), which rather than returning a pointer to an ifaddr matching the passed socket address, returns a boolean indicating whether one was present. In the (near) future, ifa_ifwithaddr() will return a referenced ifaddr rather than a raw ifaddr pointer, and the new wrapper will allow callers that care only about the boolean condition to avoid having to free that reference. MFC after: 3 weeks	2009-06-22 10:59:34 +00:00
Bjoern A. Zeeb	bed56bb51b	After the update to fxp(4) in r194573 we should no longer need this DELAY(100) hack introduced in r56938. Thanks to: yongari MFC after: 6 weeks X-MFC note: not before the fxp(4) changes	2009-06-22 10:27:20 +00:00
Robert Watson	1099f828b3	Clean up common ifaddr management: - Unify reference count and lock initialization in a single function, ifa_init(). - Move tear-down from a macro (IFAFREE) to a function ifa_free(). - Move reference count bump from a macro (IFAREF) to a function ifa_ref(). - Instead of using a u_int protected by a mutex to refcount(9) for reference count management. The ifa_mtx is now used for exactly one ioctl, and possibly should be removed. MFC after: 3 weeks	2009-06-21 19:30:33 +00:00
Roman Divacky	e40bae9a45	Switch cmd argument to u_long. This matches what if_ethersubr.c does and allows the code to compile cleanly on amd64 with clang. Reviewed by: rwatson Approved by: ed (mentor)	2009-06-21 10:29:31 +00:00
Roman Divacky	2b7d10c225	In non-debugging mode make this define (void)0 instead of nothing. This helps to catch bugs like the below with clang. if (cond); <--- note the trailing ; something(); Approved by: ed (mentor) Discussed on: current@	2009-06-21 08:49:06 +00:00
Kip Macy	d49cd9a18e	add helper function for flushing software queues	2009-06-19 23:11:20 +00:00
Christian S.J. Peron	0e37f3e196	Implement the -z (zero counters) option for the various bpf counters. Add necessary changes to the kernel for this (basically introduce a bpf_zero_counters() function). As well, update the man page. MFC after: 1 month Discussed with: rwatson	2009-06-19 20:31:44 +00:00
Bjoern A. Zeeb	ebd8672cc3	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
Bjoern A. Zeeb	7654a365db	Add the explicit include of vimage.h to another five .c files still missing it. Remove the "hidden" kernel only include of vimage.h from ip_var.h added with the very first Vimage commit r181803 to avoid further kernel poisoning.	2009-06-17 12:44:11 +00:00
Sam Leffler	d659538f72	r193336 moved ifq_detach to if_free which broke if_alloc followed by if_free (w/o doing if_attach); move ifq_attach to if_alloc and rename ifq_attach/detach to ifq_init/ifq_delete to better identify their purpose Reviewed by: jhb, kmacy	2009-06-15 19:50:03 +00:00
Jamie Gritton	9ed47d01eb	Get vnets from creds instead of threads where they're available, and from passed threads instead of curthread. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 19:01:53 +00:00
Jamie Gritton	679e13901c	Manage vnets via the jail system. If a jail is given the boolean parameter "vnet" when it is created, a new vnet instance will be created along with the jail. Networks interfaces can be moved between prisons with an ioctl similar to the one that moves them between vimages. For now vnets will co-exist under both jails and vimages, but soon struct vimage will be going away. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 18:59:29 +00:00
Bjoern A. Zeeb	ed655c8c07	Add an optional callback function that will be invoked when a per-CPU queue was drained. It will never fire for a directly dispatched packet. You will most likely never want to use this for any ordinary netisr usage and you will never blame netisr in case you try to use it and it does not work as expected. Reviewed by: rwatson	2009-06-14 17:15:18 +00:00
Bjoern A. Zeeb	736801ace4	Garbage collect an extern for a non-existent variable. While here let the comment end in a '.' and mark the #endif of _KERNEL. Reviewed by: rwatson (as part of a larger patch)	2009-06-12 20:50:28 +00:00
Bjoern A. Zeeb	53be8fca00	Move the kernel option FLOWTABLE chacking from the header file to the actual implementation. Remove the accessor functions for the compiled out case, just returning "unavail" values. Remove the kernel conditional from the header file as it is no longer needed, only leaving the externs. Hide the improperly virtualized SYSCTL/TUNABLE for the flowtable size under the kernel option as well. Reviewed by: rwatson	2009-06-12 20:46:36 +00:00
VANHULLEBUS Yvan	7b495c4494	Added support for NAT-Traversal (RFC 3948) in IPsec stack. Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele (julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense team, and all people who used / tried the NAT-T patch for years and reported bugs, patches, etc... X-MFC: never Reviewed by: bz Approved by: gnn(mentor) Obtained from: NETASQ	2009-06-12 15:44:35 +00:00
Bjoern A. Zeeb	259d2d5431	carp(4) allows people to share a set of IP addresses and can only use IPv4/v6 for inter-node communication (according to my reading). Properly wrap the carp callouts in INET \|\| INET6 and refelect this in sys/conf/files as well. While in theory this should be ok, it might be a bit optimistic to think that carp could build with inet6 only[1]. Discussed with: mlaier [1]	2009-06-11 10:26:38 +00:00
Konstantin Belousov	d8b0556c6d	Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland	2009-06-10 20:59:32 +00:00
Bjoern A. Zeeb	c03528b663	SCTP needs either IPv4 or IPv6 as lower layer[1]. So properly hide the already #ifdef SCTP code with #if defined(INET) \|\| defined(INET6) as well to get us closer to a non-INET/INET6 kernel. Discussed with: tuexen [1]	2009-06-10 14:36:59 +00:00
Bjoern A. Zeeb	974524bf59	ip_gif_ttl/GIF_TTL are only used by the inet part in in_gif.c, so put the initialization under #ifdef INET.	2009-06-10 13:39:51 +00:00
Bjoern A. Zeeb	74900cdf7f	The llentry *lle is only used in cases of INET or INET6. Put the variable declaration under proper #ifdefs. In case variables are only needed for one of the two AFs more them into proper scope.	2009-06-10 09:07:05 +00:00
Kip Macy	3576e2f4a2	revert to opt-in flowtable	2009-06-09 21:55:28 +00:00
Oleg Bulyzhin	dda10d624c	Close long existed race with net.inet.ip.fw.one_pass = 0: If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc) it carries pointer to matching ipfw rule. If this packet then reinjected back to ipfw, ruleset processing starts from that rule. If rule was deleted meanwhile, due to existed race condition panic was possible (as well as other odd effects like parsing rules in 'reap list'). P.S. this commit changes ABI so userland ipfw related binaries should be recompiled. MFC after: 1 month Tested by: Mikolaj Golub	2009-06-09 21:27:11 +00:00
Kip Macy	15d13a59a3	make flowtable opt-out	2009-06-09 20:27:30 +00:00
Kip Macy	ee117d034c	move jenkins hash to its own header in libkern	2009-06-09 20:21:40 +00:00
Kip Macy	a913be0917	- add drbr routines for accessing #qentries and conditionally dequeueing - track bytes enqueued in buf_ring	2009-06-09 19:19:16 +00:00
Bjoern A. Zeeb	c2f16e371e	Remove one INET dependency by calling the general AF agnostic version for doing the routing lookup. Reviewed by: kmacy	2009-06-09 09:50:43 +00:00
Hiroki Sato	4cd5f57d6b	Style fix. Submitted by: bz	2009-06-09 08:09:30 +00:00
Hiroki Sato	fb70e72b0c	- Fix sanity check of GIFSOPTS ioctl. - Rename option mask s/GIF_FULLOPTS/GIF_OPTMASK/ Spotted by: Eygene Ryabinkin, delphij	2009-06-09 02:27:59 +00:00
Bjoern A. Zeeb	5c40c5e989	Remove two unneeded, hidden includes.	2009-06-08 20:04:46 +00:00
Bjoern A. Zeeb	8d8bc0182e	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
Marko Zec	bc29160df3	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be incrementaly fixed over the next weeks in smaller incremental commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
Hiroki Sato	dbe5926046	Fix and add a workaround on an issue of EtherIP packet with reversed version field sent via gif(4)+if_bridge(4). The EtherIP implementation found on FreeBSD 6.1, 6.2, 6.3, 7.0, 7.1, and 7.2 had an interoperability issue because it sent the incorrect EtherIP packets and discarded the correct ones. This change introduces the following two flags to gif(4): accept_rev_ethip_ver: accepts both correct EtherIP packets and ones with reversed version field, if enabled. If disabled, the gif accepts the correct packets only. This flag is enabled by default. send_rev_ethip_ver: sends EtherIP packets with reversed version field intentionally, if enabled. If disabled, the gif sends the correct packets only. This flag is disabled by default. These flags are stored in struct gif_softc and can be set by ifconfig(8) on per-interface basis. Note that this is an incompatible change of EtherIP with the older FreeBSD releases. If you need to interoperate older FreeBSD boxes and new versions after this commit, setting "send_rev_ethip_ver" is needed. Reviewed by: thompsa and rwatson Spotted by: Shunsuke SHINOMIYA PR: kern/125003 MFC after: 2 weeks	2009-06-07 23:00:40 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Luigi Rizzo	115a40c7bf	More cleanup in preparation of ipfw relocation (no actual code change): + move ipfw and dummynet hooks declarations to raw_ip.c (definitions in ip_var.h) same as for most other global variables. This removes some dependencies from ip_input.c; + remove the IPFW_LOADED macro, just test ip_fw_chk_ptr directly; + remove the DUMMYNET_LOADED macro, just test ip_dn_io_ptr directly; + move ip_dn_ruledel_ptr to ip_fw2.c which is the only file using it; To be merged together with rev 193497 MFC after: 5 days	2009-06-05 13:44:30 +00:00
Sam Leffler	c9dd371765	move ifq_detach from if_detach to if_free; this permits callers to reference if_snd in the period between detach+free which helps simplify detach code Reviewed by: jhb, rwatson	2009-06-02 18:53:21 +00:00

1 2 3 4 5 ...

2678 Commits