freebsd-skq

Author	SHA1	Message	Date
kib	1bf8e84d75	Provide 32bit compat shims for sysctl net.route NET_RT_IFLIST. This allows getifaddrs(3) to work for compat32 binaries. Submitted by: jhb (6.x version) Reviewed by: emaste Tested by: emaste and <pluknet gmail com> MFC after: 2 weeks	2010-04-25 16:42:47 +00:00
julian	e5e811b1e9	Move two copies of the same definition to a common include file. MFC after: 3 weeks	2010-04-14 23:06:07 +00:00
delphij	58be1647f8	When an underlying ioctl(2) handler returns an error, our ioctl(2) interface considers that it hits a fatal error, and will not copyout the request structure back for _IOW and _IOWR ioctls, keeping them untouched. The previous implementation of the SIOCGIFDESCR ioctl intends to feed the buffer length back to userland. However, if we return an error, the feedback would be defeated and ifconfig(8) would trap into an infinite loop. This commit changes SIOCGIFDESCR to set buffer field to NULL to indicate the previous ENAMETOOLONG case. Reported by: bschmidt MFC after: 2 weeks	2010-04-14 22:02:19 +00:00
bz	778f026aa2	Take a reference to make sure that the interface cannot go away during if_clone_destroy() in case parallel threads try to. PR: kern/116837 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 10 days	2010-04-11 18:47:38 +00:00
bz	35f1e4bd73	Check that the interface is on the list of cloned interfaces before trying to remove it to avoid panics in case of two threads trying to remove it in parallel. PR: kern/116837 Submitted by: Takahiro Kurosawa (takahiro.kurosawa gmail.com) (orig version) MFC after: 10 days	2010-04-11 18:41:31 +00:00
bz	d7a91dc6bf	Plug reference leaks in the link-layer code ("new-arp") that previously prevented the link-layer entry from being freed. In both in.c and in6.c (though that code path seems to be basically dead) plug a reference leak in case of a pending callout being drained. In if_ether.c consistently add a reference before resetting the callout and in case we canceled a pending one remove the reference for that. In the final case in arptimer, before freeing the expired entry, remove the reference again and explicitly call callout_stop() to clear the active flag. In nd6.c:nd6_free() we are only ever called from the callout function and thus need to remove the reference there as well before calling into llentry_free(). In if_llatbl.c when freeing entire tables make sure that in case we cancel a pending callout to remove the reference as well. Reviewed by: qingli (earlier version) MFC after: 10 days Problem observed, patch tested by: simon on ipv6gw.f.o, Christian Kratzer (ck cksoft.de), Evgenii Davidov (dado korolev-net.ru) PR: kern/144564 Configurations still affected: with options FLOWTABLE	2010-04-11 16:04:08 +00:00
bz	d5b3b3f9d9	In if_detach_internal() we cannot hold the af_data lock over the dom_ifdetach() calls as they might sleep for callout_drain(). Do as we do in if_attachdomain1() [r121470] and handle if_afdata_initialized earlier and call dom_ifdetach() unlocked. Discussed with: rwatson MFC after: 10 days	2010-04-11 11:51:44 +00:00
bz	674a87c918	In if_detach_internal() only try to do the detach run if if_attachdomain1() has actually succeeded to initialize and attach. There is a theoretical possibility to drop out early in if_attachdomain1() leaving the array uninitialized if we cannot get the lock. Discussed with: rwatson MFC after: 10 days	2010-04-11 11:49:24 +00:00
jkim	2b1849a3fc	Check the pointer to JIT binary filter before its de-allocation. Submitted by: Alexander Sack (asack at niksun dot com) MFC after: 3 days	2010-03-29 20:24:03 +00:00
rpaulo	e14131f581	Add MCS to the list of media types. Sponsored by: iXsystems, inc.	2010-03-23 13:15:11 +00:00
kmacy	01cb21605b	- boot-time size the ipv4 flowtable and the maximum number of flows - increase flow cleaning frequency and decrease flow caching time when near the flow limit - stop allocating new flows when within 3% of maxflows don't start allocating again until below 12.5% MFC after: 7 days	2010-03-22 23:04:12 +00:00
emaste	41b3806743	Avoid holding the VLAN_LOCK() over the parent interface SIOCGIFMEDIA ioctl call, as it may sleep. Reviewed by: rwatson	2010-03-21 15:00:33 +00:00
bz	102a1f8933	Split eventhandler_register() into an internal part and a wrapper function that provides the allocated and setup eventhandler entry. Add a new wrapper for VIMAGE that allocates extra space to hold the callback function and argument in addition to an extra wrapper function. While the wrapper function goes as normal callback function the argument points to the extra space allocated holding the original func and arg that the wrapper function can then call. Provide an iterator function for the virtual network stack (vnet) that will call the callback function for each network stack. Provide a new set of macros for VNET that in the non-VIMAGE case will just call eventhandler_register() while in the VIMAGE case it will use vimage_eventhandler_register() passing in the extra iterator function but will only register once rather than per-vnet. We need a special macro in case we are interested in the tag returned as we must check for curvnet and can neither simply assign the return value, nor not change it in the non-vnet0 case without that. Sponsored by: ISPsystem Discussed with: jhb Reviewed by: zec (earlier version), jhb MFC after: 1 month	2010-03-19 19:51:03 +00:00
bz	abd6b4ebb7	Add ddb support to the "new" link layer code ("new-arp"): - show all lltables [1] (optional flag to also show the llentries as well) - show lltable <struct lltable > - show llentry <struct llentry > MFC after: 6 days	2010-03-18 09:09:59 +00:00
qingli	4ff4954e4e	Verify interface up status using its link state only if the interface has such capability. The interface capability flag indicates whether such capability exists. This approach is much more backward compatible. Physical device driver changes will be part of another commit. Also updated the ifconfig utility to show the LINKSTATE capability if present. Reviewed by: rwatson, imp, juli MFC after: 3 days	2010-03-16 17:59:12 +00:00
mlaier	d62719cf37	Fix a small bug in drbr_dequeue_cond spotted while preparing MFC of r203834. MFC after: 3 days	2010-03-15 21:15:03 +00:00
kmacy	a5e0110227	flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB pointed out by jkim@	2010-03-12 19:58:51 +00:00
jkim	5e802872af	Fix a style(9) nit.	2010-03-12 19:42:42 +00:00
kmacy	965730af4f	re-update copyright to 2010 pointed out by danfe@	2010-03-12 19:26:45 +00:00
jkim	df5e72589a	Tidy up callout for select(2) and read timeout. - Add a missing callout_drain(9) before the descriptor deallocation.[1] - Prefer callout_init_mtx(9) over callout_init(9) and let the callout subsystem handle the mutex for callout function. PR: kern/144453 Submitted by: Alexander Sack (asack at niksun dot com)[1] MFC after: 1 week	2010-03-12 19:14:58 +00:00
qingli	5c999930df	The flow-table module retrieves the destination and source address as well as the transport protocol port information from the outbound packets. The routing code is generic and compares every byte in the given sockaddr object. Therefore the temporary sockaddr objects must be cleared due to padding bytes. In addition, the port information must be stripped or the route search will either fail or return the incorrect route entry. Unit testing is done using OpenVPN over the if_tun interface. MFC after: 7 days	2010-03-12 10:24:58 +00:00
kmacy	0315e29550	fix stats reporting sysctl	2010-03-12 06:31:19 +00:00
kmacy	128542c758	- restructure flowtable to support ipv6 - add a name argument to flowtable_alloc for printing with ddb commands - extend ddb commands to print destination address or 4-tuples - don't parse ports in ulp header if FL_HASH_ALL is not passed - add kern_flowtable_insert to enable more generic use of flowtable (e.g. system calls for adding entries) - don't hash loopback addresses - cleanup whitespace - keep statistics per-cpu for per-cpu flowtables to avoid cache line contention - add sysctls to accumulate stats and report aggregate MFC after: 7 days	2010-03-12 05:03:26 +00:00
qingli	cde322640a	The if_tap interface is of IFT_ETHERNET type, but it does not set or update the if_link_state variable. As such RT_LINK_IS_UP() fails for the if_tap interface. Also, the RT_LINK_IS_UP() needs to bypass all loopback interfaces because loopback interfaces are considered up logically as long as the system is running. This patch fixes the above issues by setting and updating the if_link_state variable when the tap interface is opened or closed respectively. Similary approach is already done in the if_tun device. MFC after: 3 days	2010-03-11 17:56:46 +00:00
qingli	93013817b0	One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible. The other advantage of ECMP is for failover. The issue with the current code, is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix, the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole. Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed. MFC after: 5 days	2010-03-09 01:11:45 +00:00
delphij	fe5f1f57b8	Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg interface. The check itself seems to be coming from OpenBSD but does not seem to be useful for our code. Discussed with: thomasa MFC after: 1 month	2010-03-09 00:52:16 +00:00
bz	07f7a52d59	Not only flush the ipfw tables when unloading ipfw or tearing down a virtual netowrk stack, but also free the Radix Node Head. Sponsored by: ISPsystem Reviewed by: julian MFC after: 5 days	2010-03-07 15:37:58 +00:00
bz	8ef34a3986	Introduce a function rn_detachhead() that will free the radix table root nodes. This is only needed (and available) in the virtualization case to free the resources when tearing down a virtual network stack. Sponsored by: ISPsystem Reviewed by: julian, zec MFC after: 5 days	2010-03-06 21:27:26 +00:00
bz	3338a0e11e	Rework reference counting in case we queue into the netisr, or overflow the netisr queue and fall back to the interface queue so that we can garuantee that the ifnet pointer stays valid. Formerly we ended up with reference counts <= 0 in case the netisr had returned ENOBUFS. The idea is to track any packet in the netisr queue and only change the refount on edge operations for the fallback interface queue. This also avoids problems in case the if_snd.ifq_len lies to us. Also rework refount assertions to make sure they trigger if we go below 1. Formerly a negative refence count did not trigger the assert as the refcount variable is u_int. Sponsored by: ISPsystem MFC after: 5 days	2010-03-06 21:22:28 +00:00
luigi	5ceeac4aa8	Bring in the most recent version of ipfw and dummynet, developed and tested over the past two months in the ipfw3-head branch. This also happens to be the same code available in the Linux and Windows ports of ipfw and dummynet. The major enhancement is a completely restructured version of dummynet, with support for different packet scheduling algorithms (loadable at runtime), faster queue/pipe lookup, and a much cleaner internal architecture and kernel/userland ABI which simplifies future extensions. In addition to the existing schedulers (FIFO and WF2Q+), we include a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new, very fast version of WF2Q+ called QFQ. Some test code is also present (in sys/netinet/ipfw/test) that lets you build and test schedulers in userland. Also, we have added a compatibility layer that understands requests from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries, and replies correctly (at least, it does its best; sometimes you just cannot tell who sent the request and how to answer). The compatibility layer should make it possible to MFC this code in a relatively short time. Some minor glitches (e.g. handling of ipfw set enable/disable, and a workaround for a bug in RELENG_7's /sbin/ipfw) will be fixed with separate commits. CREDITS: This work has been partly supported by the ONELAB2 project, and mostly developed by Riccardo Panicucci and myself. The code for the qfq scheduler is mostly from Fabio Checconi, and Marta Carbone and Francesco Magno have helped with testing, debugging and some bug fixes.	2010-03-02 17:40:48 +00:00
luigi	1a096d21d2	remove unnecessary casts leftover from a bogus fix to a previous bug	2010-03-02 16:24:16 +00:00
alfred	f34ce3dd38	Merge projects/enhanced_coredumps (r204346) into HEAD: Enhanced process coredump routines. This brings in the following features: 1) Limit number of cores per process via the %I coredump formatter. Example: if corefilename is set to %N.%I.core AND num_cores = 3, then if a process "rpd" cores, then the corefile will be named "rpd.0.core", however if it cores again, then the kernel will generate "rpd.1.core" until we hit the limit of "num_cores". this is useful to get several corefiles, but also prevent filling the machine with corefiles. 2) Encode machine hostname in core dump name via %H. 3) Compress coredumps, useful for embedded platforms with limited space. A sysctl kern.compress_user_cores is made available if turned on. To enable compressed coredumps, the following config options need to be set: options COMPRESS_USER_CORES device zlib # brings in the zlib requirements. device gzio # brings in the kernel vnode gzip output module. 4) Eventhandlers are fired to indicate coredumps in progress. 5) The imgact sv_coredump routine has grown a flag to pass in more state, currently this is used only for passing a flag down to compress the coredump or not. Note that the gzio facility can be used for generic output of gzip'd streams via vnodes. Obtained from: Juniper Networks Reviewed by: kan	2010-03-02 06:58:58 +00:00
joel	bb682915c9	The NetBSD Foundation has granted permission to remove clause 3 and 4 from their software. Obtained from: NetBSD	2010-03-01 17:05:46 +00:00
rwatson	384e695fac	Whitespace tweak. MFC after: 3 days	2010-03-01 00:43:05 +00:00
rwatson	0b098aa759	Changes to support crashdump analysis of netisr: - Rename the netisr protocol registration array, 'np' to 'netisr_proto', in order to reduce the chances of symbol name collisions. It remains statically defined, but it will be looked up by netstat(1). - Move certain internal structure definitions from netisr.c to netisr_internal.h so that netstat(1) can find them. They remain private, and should not be used for any other purpose (for example, they should not be used by kernel modules, which must instead use the public interfaces in netisr.h). - Store a kernel-compiled version of NETISR_MAXPROT in the global variable netisr_maxprot, and export via a sysctl, so that it is available for use by netstat(1). This is especially important for crashdump interpretation, where the size of the workstream structure is determined by the maximum number of protocols compiled into the kernel. MFC after: 1 week Sponsored by: Juniper Networks	2010-03-01 00:42:36 +00:00
kib	f93a8fa5df	In both if_tun and if_tap: Do not do additional dev_ref() on the newly created interface in the if_clone create method [1]. This reference is not needed and never removed, causing struct cdevpriv leakage. Remove the setting of SI_CHEAPCLONE flag as well, since it is unused. For dev_clone handlers, create cdevs with the call make_dev_credf(MAKEDEV_REF) instead of calling make_dev() and then dev_ref(), to avoid a race. Call drain_dev_clone_events() at the module unload time after dev_clone handler is deinstalled. Submitted by: Mikolaj Golub <to.my.trociny gmail com> [1] MFC after: 1 week	2010-02-28 16:25:49 +00:00
rwatson	e0805e87a3	Fix edge cases in several KASSERTs: use <= rather than < when testing that counters have not gone about MAXCPU or NETISR_MAXPROT. These problems caused panics on UP kernels with INVARIANTS when using sysctl -a, but would also have caused problems for 32-core boxes or if the netisr protocol vector was fully populated. Reported by: nwhitehorn, Neel Natu <neelnatu@gmail.com> MFC after: 4 days	2010-02-25 09:51:14 +00:00
bz	44a0d7a588	Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets' in the db_show_all_table as 'show all ifnets' and with that follow the convention for showing complete lists. Submitted by: thompsa MFC after: 3 days	2010-02-24 15:54:24 +00:00
rwatson	79e4f88402	Fix constant assignment for netisr protocol information sysctl. MFC after: 1 week Spotted by: bz	2010-02-22 16:16:16 +00:00
rwatson	ac0cbafef7	Export netisr configuration and statistics to userspace via sysctl(9). MFC after: 1 week Sponsored by: Juniper Networks	2010-02-22 15:03:16 +00:00
rwatson	e1faba5621	ifconfig(8) expects interface fooX to be supported by the module if_foo, and will try to load it if it's not present. To better meet these expectations, change the module name for the loopback interface from 'loop' to 'if_lo'. The loopback interface is always compiled into the base kernel, so there are no resulting changes in kld files, etc. Discussed with: brooks (ages ago) MFC after: 1 week	2010-02-21 15:25:47 +00:00
yongari	55d2bfd121	Add __FBSDID. Reviewed by: sam	2010-02-21 00:07:45 +00:00
yongari	453676a091	Add TSO support on VLANs. Intentionally separated IFCAP_VLAN_HWTSO from IFCAP_VLAN_HWTAGGING. I think some hardwares may be able to TSO over VLAN without VLAN hardware tagging. Driver changes and userland support will follow. Reviewed by: thompsa	2010-02-20 22:47:20 +00:00
bz	27d5a92985	Start to implement ifnet DDB support: - 'show ifnets' prints a list of ifnet s per virtual network stack, - 'show ifnet <struct ifnet >' prints fields matching the given ifp. We do not yet print the complete set of fields and might want to factor this out to an extra if_debug.c file in case this grows a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1]. Sponsored by: ISPsystem Suggested by: rwatson [1] Reviewed by: rwatson MFC after: 5 days	2010-02-20 22:09:48 +00:00
bz	77e8f746fc	Enhance a panic string to contain more useful debugging information. Sponsored by: ISPsystem Reviewed by: rwatson MFC after: 5 days	2010-02-20 21:43:36 +00:00
jkim	30767ce70b	Return partially filled buffer for non-blocking read(2) in non-immediate mode. PR: kern/143855	2010-02-20 00:19:21 +00:00
pjd	12d96a648c	Mark various sysctls also as tunables. Reviewed by: rwatson MFC after: 1 week	2010-02-15 09:19:07 +00:00
mlaier	b056acb864	Fix drbr and altq interaction: - introduce drbr_needs_enqueue that returns whether the interface/br needs an enqueue operation: returns true if altq is enabled or there are already packets in the ring (as we need to maintain packet order) - update all drbr consumers - fix drbr_flush - avoid using the driver queue (IFQ_DRV_*) in the altq case as the multiqueue consumer does not provide enough protection, serialize altq interaction with the main queue lock - make drbr_dequeue_cond work with altq Discussed with: kmacy, yongari, jfv MFC after: 4 weeks	2010-02-13 16:04:58 +00:00
bz	7b5cc4e51b	Add DDB support for printing vnet_sysinit and vnet_sysuninit ordered call lists. Try to lookup function/symbol names and print those in addition to the pointers, along with the constants for subsystem and order. This is useful for debugging vnet teardown ordering issues. Make it possible to call the actual printing frunction from normal code at runtime, ie. from vnet_sysuninit(), if DDB support is there. Sponsored by: ISPsystem MFC After: 8 days	2010-02-09 22:39:34 +00:00
bz	eb28e347bf	Add an SDT provider for "vnet"s along with probes for vnet_alloc and vnet_destroy. Use the line number rather than NULL as dummy argument. Note: the fbt provider does not reliably provide :return probes (depending on optimization levels used at compile time) making it unusable for scripts to generate complete call-traces with well defined boundaries over allocations or destructions of virtual network stacks. Sponsored by: ISPsystem MFC After: 8 days	2010-02-09 22:15:59 +00:00
eri	3c38fdad1e	Propagate the vlan eventis to the underlying interfaces/members so they can do initialization of hw related features. PR: kern/141646 Reviewed by: thompsa Approved by: thompsa(co-mentor) MFC after: 2 weeks	2010-02-06 13:49:35 +00:00
zec	393450ca7b	Instead of spamming the console on each curvnet recursion event, print out each such call graph only once, along with a stack backtrace. This should make kernels built with VNET_DEBUG reasonably usable again in busy / production environments. Introduce a new DDB command "show vnetrcrs" which dumps the whole log of distinctive curvnet recursion events. This might be useful when recursion reports get burried / lost too deep in the message buffer. In the later case stack backtraces are not available. Reviewed by: bz MFC after: 3 days	2010-02-04 07:55:42 +00:00
hrs	0f402c2331	- Check if_type of "addm <interface>" before setting the interface's MTU to the if_bridge(4) interface. This fixes a bug that MTU value of "addm <interface>" is used even when it is invalid for the if_bridge(4) member: # ifconfig bridge0 create # ifconfig bridge0 bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500 ... # ifconfig bridge0 addm lo0 ifconfig: BRDGADD lo0: Invalid argument # ifconfig bridge0 bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 16384 ... - Do not ignore MTU value of an interface even when if_type == IFT_GIF. This fixes MTU mismatch when an if_bridge(4) interface has a gif(4) interface and no other interface as the member, and it is directly used for L2 communication with EtherIP tunneling enabled. - Implement SIOCSIFMTU ioctl. Changing the MTU is allowed only when all members have the same MTU value.	2010-01-31 08:16:37 +00:00
delphij	d9a0cd0982	Revised revision 199201 (add interface description capability as inspired by OpenBSD), based on comments from many, including rwatson, jhb, brooks and others. Sponsored by: iXsystems, Inc. MFC after: 1 month	2010-01-27 00:30:07 +00:00
syrinx	40d92428fb	While flushing the multicast filter of an interface, do not zero the relevant ifmultiaddr structures' reference to the parent interface, unless the parent interface is really detaching. While here, program only link layer multicast filters to a wlan's hardware parent interface. PR: kern/142391, kern/142392 Reviewed by: sam, rpaolo, bms MFC after: 1 week	2010-01-24 16:17:58 +00:00
thompsa	d0093d69aa	Do not hold the lock over if_setlladdr() as it calls into the interface driver init routine.	2010-01-19 04:29:42 +00:00
thompsa	5056e27c2d	Declare a new EVENTHANDLER called iflladdr_event which signals that the L2 address on an interface has changed. This lets stacked interfaces such as vlan(4) detect that their lower interface has changed and adjust things in order to keep working. Previously this situation broke at least vlan(4) and lagg(4) configurations. The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the risk of a loop. PR: kern/142927 Submitted by: Nikolay Denev	2010-01-18 20:34:00 +00:00
bz	8458b2914e	Correct a typo. MFC after: 5 days	2010-01-10 12:03:53 +00:00
trasz	2ced506c4b	Stop GCC from complaining about lagg_port_checkstacking() being unused.	2010-01-08 16:44:33 +00:00
mbr	7450f52a57	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week	2010-01-07 21:01:37 +00:00
luigi	f1fcae96ad	put ip_var before ip_fw_private.h as this will be needed in the near future	2010-01-07 10:27:52 +00:00
luigi	40024ff7c3	Various cleanup done in ipfw3-head branch including: - use a uniform mtag format for all packets that exit and re-enter the firewall in the middle of a rulechain. On reentry, all tags containing reinject info are renamed to MTAG_IPFW_RULE so the processing is simpler. - make ipfw and dummynet use ip_len and ip_off in network format everywhere. Conversion is done only once instead of tracking the format in every place. - use a macro FREE_PKT to dispose of mbufs. This eases portability. On passing i also removed a few typos, staticise or localise variables, remove useless declarations and other minor things. Overall the code shrinks a bit and is hopefully more readable. I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr. For ng_ipfw i am actually waiting for feedback from glebius@ because we might have some small changes to make. For if_bridge and if_ethersubr feedback would be welcome (there are still some redundant parts in these two modules that I would like to remove, but first i need to check functionality).	2010-01-04 19:01:22 +00:00
jhb	26cb0488b3	Use stricter checking to match possible vlan clones by not allowing extra garbage characters around or within the tag. Reviewed by: brooks MFC after: 3 days	2009-12-31 20:44:38 +00:00
brooks	a5cc24440b	The devices that supported EVFILT_NETDEV kqueue filters were removed in r195175. Remove all definitions, documentation, and usage. fifo_misc.c: Remove all kqueue tests as fifo_io.c performs all those that would have remained. Reviewed by: rwatson MFC after: 3 weeks X-MFC note: don't change vlan_link_state() function signature	2009-12-31 20:29:58 +00:00
qingli	c44d3e680a	Remove a deleted comment line that was brought back by my previous commit. MFC after: 5 days	2009-12-31 01:09:16 +00:00
qingli	ed965a92bc	The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was due to the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations due to multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days	2009-12-30 21:35:34 +00:00
jhb	3ce93dcb7c	Change vlan interfaces to cope more usefully with the parent interface being renamed. Previously the vlan interfaces would lose their configuration as if the parent interface had been physically removed. Now vlan interfaces ignore rename events. - Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being renamed. This flag can be checked in ifnet departure/arrival event handlers to treat rename events differently. - Change the ifnet departure event handler in the if_vlan(4) driver to ignore departure events due to a trunk interface being renamed. Reviewed by: brooks, rwatson MFC after: 1 week	2009-12-29 13:35:18 +00:00
luigi	483862a5a2	bring in several cleanups tested in ipfw3-head branch, namely: r201011 - move most of ng_ipfw.h into ip_fw_private.h, as this code is ipfw-specific. This removes a dependency on ng_ipfw.h from some files. - move many equivalent definitions of direction (IN, OUT) for reinjected packets into ip_fw_private.h - document the structure of the packet tags used for dummynet and netgraph; r201049 - merge some common code to attach/detach hooks into a single function. r201055 - remove some duplicated code in ip_fw_pfil. The input and output processing uses almost exactly the same code so there is no need to use two separate hooks. ip_fw_pfil.o goes from 2096 to 1382 bytes of .text r201057 (see the svn log for full details) - macros to make the conversion of ip_len and ip_off between host and network format more explicit r201113 (the remaining parts) - readability fixes -- put braces around some large for() blocks, localize variables so the compiler does not think they are uninitialized, do not insist on precise allocation size if we have more than we need. r201119 - when doing a lookup, keys must be in big endian format because this is what the radix code expects (this fixes a bug in the recently-introduced 'lookup' option) No ABI changes in this commit. MFC after: 1 week	2009-12-28 10:47:04 +00:00
rwatson	c7656477f0	When warning about possible netisr configuration problems during boot, report using "netisr_init" rather than "netisr2", which was the development name for the project. MFC after: 3 days	2009-12-23 12:33:59 +00:00
rwatson	0b860a6956	Refine netisr.c comments a bit.	2009-12-23 12:31:27 +00:00
luigi	2043aec456	merge code from ipfw3-head to reduce contention on the ipfw lock and remove all O(N) sequences from kernel critical sections in ipfw. In detail: 1. introduce a IPFW_UH_LOCK to arbitrate requests from the upper half of the kernel. Some things, such as 'ipfw show', can be done holding this lock in read mode, whereas insert and delete require IPFW_UH_WLOCK. 2. introduce a mapping structure to keep rules together. This replaces the 'next' chain currently used in ipfw rules. At the moment the map is a simple array (sorted by rule number and then rule_id), so we can find a rule quickly instead of having to scan the list. This reduces many expensive lookups from O(N) to O(log N). 3. when an expensive operation (such as insert or delete) is done by userland, we grab IPFW_UH_WLOCK, create a new copy of the map without blocking the bottom half of the kernel, then acquire IPFW_WLOCK and quickly update pointers to the map and related info. After dropping IPFW_LOCK we can then continue the cleanup protected by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side is only blocked for O(1). 4. do not pass pointers to rules through dummynet, netgraph, divert etc, but rather pass a <slot, chain_id, rulenum, rule_id> tuple. We validate the slot index (in the array of #2) with chain_id, and if successful do a O(1) dereference; otherwise, we can find the rule in O(log N) through <rulenum, rule_id> All the above does not change the userland/kernel ABI, though there are some disgusting casts between pointers and uint32_t Operation costs now are as follows: Function Old Now Planned ------------------------------------------------------------------- + skipto X, non cached O(N) O(log N) + skipto X, cached O(1) O(1) XXX dynamic rule lookup O(1) O(log N) O(1) + skipto tablearg O(N) O(1) + reinject, non cached O(N) O(log N) + reinject, cached O(1) O(1) + kernel blocked during setsockopt() O(N) O(1) ------------------------------------------------------------------- The only (very small) regression is on dynamic rule lookup and this will be fixed in a day or two, without changing the userland/kernel ABI Supported by: Valeria Paoli MFC after: 1 month	2009-12-22 19:01:47 +00:00
jhb	70c567a753	Remove commented out prototype for ifinit(). This prototype has been commented out since 1.1 and has not been present in <sys/systm.h> since at least 1.1 of that file. It is also not needed in FreeBSD due to SYSINIT().	2009-12-21 20:09:19 +00:00
luigi	c4e6c7a490	Start splitting ip_fw2.c and ip_fw.h into smaller components. At this time we pull out from ip_fw2.c the logging functions, and support for dynamic rules, and move kernel-only stuff into netinet/ipfw/ip_fw_private.h No ABI change involved in this commit, unless I made some mistake. ip_fw.h has changed, though not in the userland-visible part. Files touched by this commit: conf/files now references the two new source files netinet/ip_fw.h remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h. netinet/ipfw/ip_fw_private.h new file with kernel-specific ipfw definitions netinet/ipfw/ip_fw_log.c ipfw_log and related functions netinet/ipfw/ip_fw_dynamic.c code related to dynamic rules netinet/ipfw/ip_fw2.c removed the pieces that goes in the new files netinet/ipfw/ip_fw_nat.c minor rearrangement to remove LOOKUP_NAT from the main headers. This require a new function pointer. A bunch of other kernel files that included netinet/ip_fw.h now require netinet/ipfw/ip_fw_private.h as well. Not 100% sure i caught all of them. MFC after: 1 month	2009-12-15 16:15:14 +00:00
luigi	003a092f8a	Move the scan for max_keylen into route.c::route_init(), and make max_keylen an argument for rn_init(). This removes an unnecessary dependency on domain.h from radix.c MFC after: 7 days	2009-12-14 20:12:51 +00:00
bz	932cbdbe4d	Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error. Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack. We cannot change the classic jailed() call to do that, as it is used outside the network stack as well. Discussed with: julian, zec, jamie, rwatson (back in Sept) MFC after: 5 days	2009-12-13 13:57:32 +00:00
luigi	29b041fe97	Make the code buildable in userland so it is easier to test it: this requires a small reordering of headers and a few #defines to map functions not available in userland. Remove a useless #ifndef block at the beginning of the file. Introduce (temporarily) rn_init2(), see the comment in the code for the proper long term change. No ABI or functional change. MFC after: 7 days	2009-12-12 15:49:28 +00:00
luigi	54e485de2b	No functional changes (who dares to touch this code!) but: - cast the result of LEN() to int as this is the main usage. - use LEN() in one place where it was forgotten. - Document the use of a static variable in rw mode. More small changes to follow. MFC after: 7 days	2009-12-10 10:34:30 +00:00
jhb	1f48c677b5	Remove if_timer/if_watchdog now that they are no longer used. The space used by if_timer is reserved for expanding if_index to an int in the future. Reviewed by: rwatson, brooks	2009-11-30 21:25:57 +00:00
jkim	2d93b6e424	General style cleanup, no functional change.	2009-11-20 21:12:40 +00:00
jkim	052bc52af7	- Allocate scratch memory on stack instead of pre-allocating it with the filter as we do from bpf_filter()[1]. - Revert experimental use of contigmalloc(9)/contigfree(9). It has no performance benefit over malloc(9)/free(9)[2]. Requested by: rwatson[1] Pointed out by: rwatson, jhb, alc[2]	2009-11-20 18:49:20 +00:00
jkim	8e75a7257b	- Change internal function bpf_jit_compile() to return allocated size of the generated binary and remove page size limitation for userland. - Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make sure the generated binary aligns properly and make it physically contiguous.	2009-11-18 23:40:19 +00:00
jkim	efc247aeb3	- Make BPF JIT compiler working again in userland. We are limiting size of generated native binary to page size for now. - Update copyright date and fix some style nits.	2009-11-18 19:26:17 +00:00
tuexen	7b17948ff8	Fix a LOR showing up with sctp_bsd_addr(): Do not hold a rt lock when calling rt_newaddrmsg(). Reviewed by: qingli Approved by: rrs (mentor) MFC after: 1 month	2009-11-17 12:57:10 +00:00
delphij	8fed657163	Revert revision 199201 for now as it has introduced a kernel vulnerability and requires more polishing.	2009-11-12 19:02:10 +00:00
delphij	13a19ef806	Add interface description capability as inspired by OpenBSD. MFC after: 3 months	2009-11-11 21:30:58 +00:00
jhb	892fe839c4	Take a step towards removing if_watchdog/if_timer. Don't explicitly set if_watchdog/if_timer to NULL/0 when initializing an ifnet. if_alloc() sets those members to NULL/0 already.	2009-11-06 14:55:01 +00:00
rwatson	07e1c4bd14	Remove unneeded blank line from bpf_drvinit(). MFC after: 3 days	2009-10-23 17:26:29 +00:00
brueffer	245a0bbb51	Check pointer for NULL before dereferencing it, not after. PR: 138390 Submitted by: Patroklos Argyroudis <argp@census-labs.com> MFC after: 1 week	2009-10-22 06:17:04 +00:00
qingli	e4c3421f52	Verify "smp_started" is true before calling sched_bind() and sched_unbind(). Reviewed by: kmacy MFC after: 3 days	2009-10-22 00:32:01 +00:00
qingli	88bb68ef0f	The flow-table function flowtable_route_flush() may be called during system initialization time. Since the flow-table is designed to maintain per CPU flow cache, the existing code did not check whether "smp_started" is true before calling sched_bind() and sched_unbind(), which triggers a page fault. Reviewed by: jeff MFC after: immediately	2009-10-20 21:27:03 +00:00
rwatson	d42d258f85	Clean up comments, white space, and style in pfil.c (especially new VNET bits). MFC after: 3 days (not VNET bits)	2009-10-19 15:19:14 +00:00
rwatson	dc93fde3ad	Remove unused pfil_flags field in packet_filter_hook. MFC after: 3 days	2009-10-18 22:54:09 +00:00
rwatson	c564f117b3	Sort function prototypes in pfil.h, clean up white space, and better align fields for printing. MFC after: 3 days	2009-10-18 22:43:28 +00:00
rwatson	066984a67b	Line-wrap pfil.c so that it prints more nicely. MFC after: 3 days	2009-10-18 11:27:34 +00:00
bz	b0448eafb0	Unbreak the VIMAGE build with IPSEC, broken with r197952 by virtualizing the pfil hooks. For consistency add the V_ to virtualize the pfil hooks in here as well. MFC after: 55 days X-MFC after: julian MFCed r197952.	2009-10-14 11:55:55 +00:00
julian	79c1f884ef	Virtualize the pfil hooks so that different jails may chose different packet filters. ALso allows ipfw to be enabled on on ejail and disabled on another. In 8.0 it's a global setting. Sitting aroung in tree waiting to commit for: 2 months MFC after: 2 months	2009-10-11 05:59:43 +00:00
bz	1e3cae3b31	Put #ifdef INET around parts of the FLOWTABLE code, to unbreak nooptions INET kernel builds. MFC after: 3 days X-MFC: with r197687	2009-10-03 10:56:03 +00:00
qingli	42eac0e4cd	The flow-table associates TCP/UDP flows and IP destinations with specific routes. When the routing table changes, for example, when a new route with a more specific prefix is inserted into the routing table, the flow-table is not updated to reflect that change. As such existing connections cannot take advantage of the new path. In some cases the path is broken. This patch will update the affected flow-table entries when a more specific route is added. The route entry is properly marked when a route is deleted from the table. In this case, when the flow-table performs a search, the stale entry is updated automatically. Therefore this patch is not necessary for route deletion. Submitted by: simon, phk Reviewed by: bz, kmacy MFC after: 3 days	2009-10-01 20:32:29 +00:00
qingli	85d603eeff	A wrong variable is used when setting up the interface address route, which broke source address selection in some code paths. Submitted by: noted by bz Reviewed by: hrs MFC after: immediately	2009-09-20 17:22:19 +00:00
zec	58bb001f4b	Style fix - break too long a line in two. Spotted by: bz MFC after: 3 days	2009-09-18 09:03:23 +00:00
zec	02353f41db	V_irtualize the lltables list, making ARP and ND reasonably usable again with options VIMAGE kernels. Submitted by: bz (the original version, probably identical to this one) Reviewed by: many @ DevSummit Cambridge MFC after: 3 days	2009-09-17 14:52:15 +00:00
qingli	3a82e44273	Self pointing routes are installed for configured interface addresses and address aliases. After an interface is brought down and brought back up again, those self pointing routes disappeared. This patch ensures after an interface is brought back up, the loopback routes are reinstalled properly. Reviewed by: bz MFC after: immediately	2009-09-15 19:18:34 +00:00
rwatson	53eaed07bc	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks	2009-09-12 20:03:45 +00:00
emaste	e750f5d891	Compare pointer with NULL, not 0.	2009-09-09 03:36:43 +00:00
np	ba75578c03	Add arp_update_event. This replaces route_arp_update_event, which has not worked since the arp-v2 rewrite. The event handler will be called with the llentry write-locked and can examine la_flags to determine whether the entry is being added or removed. Reviewed by: gnn, kmacy Approved by: gnn (mentor) MFC after: 1 month	2009-09-08 21:17:17 +00:00
qingli	5b9cf14b54	The addresses that are assigned to the loopback interface should be part of the kernel routing table. Reviewed by: bz MFC after: immediately	2009-09-05 20:24:37 +00:00
qingli	0cca60c70d	This patch fixes the following issues: - Interface link-local address is not reachable within the node that owns the interface, this is due to the mismatch in address scope as the result of the installed interface address loopback route. Therefore for each interface address loopback route, the rt_gateway field (of AF_LINK type) will be used to track which interface a given address belongs to. This will aid the address source to use the proper interface for address scope/zone validation. - The loopback address is not reachable. The root cause is the same as the above. - Empty nd6 entries are created for the IPv6 loopback addresses only for validation reason. Doing so will eliminate as much of the special case (loopback addresses) handling code as possible, however, these empty nd6 entries should not be returned to the userland applications such as the "ndp" command. Since both of the above issues contain common files, these files are committed together. Reviewed by: bz MFC after: immediately	2009-09-05 16:43:16 +00:00
gnn	60eb51e7a3	Add ARP statistics to the kernel and netstat. New counters now exist for: requests sent replies sent requests received replies received packets received total packets dropped due to no ARP entry entrys timed out Duplicate IPs seen The new statistics are seen in the netstat command when it is given the -s command line switch. MFC after: 2 weeks In collaboration with: bz	2009-09-03 21:10:57 +00:00
qingli	04fc72216c	As part of r196609, a call to "rtalloc" did not take the fib into account. So call the appropriate "rtalloc_ign_fib()" instead of calling "rtalloc_ign()". Reviewed by:i pointed out by bz MFC after: immediately	2009-08-31 00:14:37 +00:00
zec	9940277b0b	Introduce a separate sx lock for protecting lists of vnet sysinit and sysuninit handlers. Previously, sx_vnet, which is a lock designated for protecting the vnet list, was (ab)used for protecting vnet sysinit / sysuninit handler lists as well. Holding exclusively the sx_vnet lock while invoking sysinit and / or sysuninit handlers turned out to be problematic, since some of the handlers may attempt to wake up another thread and wait for it to walk over the vnet list, hence acquire a shared lock on sx_vnet, which in turn leads to a deadlock. Protecting vnet sysinit / sysuninit lists with a separate lock mitigates this issue, which was first observed with flowtable_flush() / flowtable_cleaner() in sys/net/flowtable.c. Reviewed by: rwatson, jhb MFC after: 3 days	2009-08-28 22:30:55 +00:00
qingli	f92dfc8485	In ip_output(), the flow-table module must not try to cache L2/L3 information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type. Since the L2 information (rt_lle) is invalid for these interface types, accidental caching attempt will trigger panic when the invalid rt_lle reference is accessed. When installing a new route, or when updating an existing route, the user supplied gateway address may be an interface address (this is particularly true for point-to-point interface related modules such as ppp, if_tun, if_gif). Currently the routing command handler always set the RTF_GATEWAY flag if the gateway address is given as part of the command paramters. Therefore the gateway address must be verified against interface addresses or else the route would be treated as an indirect route, thus making that route unusable. Reviewed by: kmacy, julia, rwatson Verified by: marcus MFC after: 3 days	2009-08-28 07:01:09 +00:00
rwatson	cffae4081c	Add IFNET_HOLD reserved pointer value for the ifindex ifnet array, which allows an index to be reserved for an ifnet without making the ifnet available for management operations. Use this in if_alloc() while the ifnet lock is released between initial index allocation and completion of ifnet initialization. Add ifindex_free() to centralize the implementation of releasing an ifindex value. Use in if_free() and if_vmove(), as well as when releasing a held index in if_alloc(). Reviewed by: bz MFC after: 3 days	2009-08-26 11:13:10 +00:00
rwatson	8184e58b3a	Break out allocation of new ifindex values from if_alloc() and if_vmove(), and centralize in a single function ifindex_alloc(). Assert the IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not close all known races in this code. Reviewed by: bz MFC after: 3 days	2009-08-25 20:21:16 +00:00
rwatson	544dfa0789	Use locks specific to the lltable code, rather than borrow the ifnet list/index locks, to protect link layer address tables. This avoids lock order issues during interface teardown, but maintains the bug that sysctl copy routines may be called while a non-sleepable lock is held. Reviewed by: bz, kmacy MFC after: 3 days	2009-08-25 09:52:38 +00:00
jfv	91fcf01735	When bridging LRO is causing a problem, the believe that it would work as long as all interfaces have TSO seems to be false, until the matter gets sorted out just disable LRO completely.	2009-08-24 21:04:51 +00:00
rwatson	260dfcf9e9	Make if_grow static -- it's not used outside of if.c, and with the internals destined to change, it's better if it remains that way. MFC after: 3 days	2009-08-24 12:52:05 +00:00
zec	47445e571b	When moving ifnets from one vnet to another, and the ifnet has ifaddresses of AF_LINK type which thus have an embedded if_index "backpointer", we must update that if_index backpointer to reflect the new if_index that our ifnet just got assigned. This change affects only options VIMAGE builds. Submitted by: bz Reviewed by: bz Approved by: re (rwatson), julian (mentor)	2009-08-24 10:14:09 +00:00
rwatson	f0ee7a159b	Rather than using IFNET_RLOCK() when iterating over (and modifying) the ifnet list during if_ef load, directly acquire the ifnet_sxlock exclusively. That way when if_alloc() recurses the lock, it's a write recursion rather than a read->write recursion. This code structure is arguably a bug, so add a comment indicating that this is the case. Post-8.0, we should fix this, but this commit resolves panic-on-load for if_ef. Discussed with: bz, julian Reported by: phk MFC after: 3 days	2009-08-23 21:00:21 +00:00
rwatson	ef8d755d4d	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stablize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
julian	009dd32306	Don't allow access to the internals until it has all been set up. Specifically, not until the per-vnet parts have been set up. Submitted by: kmacy@ Reviewed by: julian@, zec@ Approved by: re(rwatson) MFC after: immediately	2009-08-21 09:22:32 +00:00
kmacy	e10b9149e4	This change fixes a comment and addresses a complaint by kib@ by moving a frequently executed flowtable syslog statement from being conditional on bootverbose to conditional on a per-vnet flowtable sysctl. Approved by: re@	2009-08-19 20:13:09 +00:00
kmacy	bbd03fa206	- change the interface to flowtable_lookup so that we don't rely on the mbuf for obtaining the fib index - check that a cached flow corresponds to the same fib index as the packet for which we are doing the lookup - at interface detach time flush any flows referencing stale rtentrys associated with the interface that is going away (fixes reported panics) - reduce the time between cleans in case the cleaner is running at the time the eventhandler is called and the wakeup is missed less time will elapse before the eventhandler returns - separate per-vnet initialization from global initialization (pointed out by jeli@) Reviewed by: sam@ Approved by: re@	2009-08-18 20:28:58 +00:00
kmacy	2836450c4c	fix netboot issue by disabling flowtable lookups until initialization has been run Reviewed by: rwatson@ Approved by: re@	2009-08-17 19:09:28 +00:00
rwatson	e905082a5a	Remove unused if_rawoutput() macro; it has been unused since at least FreeBSD 2. Approved by: re (kib)	2009-08-15 22:26:26 +00:00
zec	51ca260850	Appease VNET_DEBUG - in if_vmove we temporarily switch i.e. recurse from one vnet to another which is OK, so no need to flood the console with warnings here. Approved by: re (rwatson), julian (mentor)	2009-08-14 22:46:45 +00:00
zec	b464f1dafc	Make VNET_DEBUG a standalone compile-time option, i.e. decouple it from INVARIANTS. Reviewed by: bz Approved by: re (rwatson), julian (mentor)	2009-08-14 22:41:39 +00:00
bz	5307a46b8b	Make it possible to change the vnet sysctl variables on jails with their own virtual network stack. Jails only inheriting a network stack cannot change anything that cannot be changed from within a prison. Reviewed by: rwatson, zec Approved by: re (kib)	2009-08-13 10:26:34 +00:00
bz	b6a41509df	Put multiple instructions into a block when iterating; unbreaks NET_RT_DUMP, which otherwise only returned information of AF_MAX. This was broken in r193232 (save your time - my bug, my fix). PR: kern/137700 Reported by: Larry Baird (lab gta.com) Tested by: Larry Baird (lab gta.com) Reviewed by: zec, lstewart, qing Approved by: re (kib)	2009-08-13 09:29:52 +00:00
jkim	cda13d6d86	Always embed pointer to BPF JIT function in BPF descriptor to avoid inconsistency when opt_bpf.h is not included. Reviewed by: rwatson Approved by: re (rwatson)	2009-08-12 17:28:53 +00:00
bz	a5f0cffcb4	Update DDB show vnet command to print all used and available information. Reviewed by: rwatson, zec Approved by: re	2009-08-12 12:00:21 +00:00
bz	d74deee9a1	Put minimum alignment on the dpcpu and vnet section so that ld when adding the __start_ symbol knows the expected section alignment and can place the __start_ symbol correctly. These sections will not support symbols with super-cache line alignment requirements. For full details, see posting to freebsd-current, 2009-08-10, Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>. Debugging and testing patches by: Kamigishi Rei (spambox haruhiism.net), np, lstewart, jhb, kib, rwatson Tested by: Kamigishi Rei, lstewart Reviewed by: kib Approved by: re	2009-08-12 10:26:03 +00:00
rwatson	5c6699ad3d	Many network stack subsystems use a single global data structure to hold all pertinent statatistics for the subsystem. These structures are sometimes "borrowed" by kernel modules that require a place to store statistics for similar events. Add KPI accessor functions for statistics structures referenced by kernel modules so that they no longer encode certain specifics of how the data structures are named and stored. This change is intended to make it easier to move to per-CPU network stats following 8.0-RELEASE. The following modules are affected by this change: if_bridge if_cxgb if_gif ip_mroute ipdivert pf In practice, most of these statistics consumers should, in fact, maintain their own statistics data structures rather than borrowing structures from the base network stack. However, that change is too agressive for this point in the release cycle. Reviewed by: bz Approved by: re (kib)	2009-08-02 19:43:32 +00:00
rwatson	2eafab39fe	The colour was red as shall be the letters of this warning to people upon boot if the experimental VIMAGE feature was compiled into the kernel. Submitted by: bz Reviewed by: zec Approved by: re (vimage blanket)	2009-08-01 22:22:45 +00:00
rwatson	3b267ddfaa	Minor style tweaks. Approved by: re (vimage blanket)	2009-08-01 21:58:32 +00:00
rwatson	648ff24430	Make the vnet alloc/destroy paths a bit easier to followg by merging vnet_data_init/vnet_data_destroy into vnet_alloc/vnet_destroy. Reviewed by: bz, zec Approved by: re (vimage blanket)	2009-08-01 21:54:15 +00:00
rwatson	e3917e768d	Remove vnet_foreach() utility function, which previously allowed vnet.c to iterate virtual network stacks without being aware of the implementation details previously hidden in kern_vimage.c. Now they are in the same file, so remove this added complexity. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 20:24:45 +00:00
rwatson	fb9ffed650	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
rwatson	466a4af8b2	Reorder and recomment vnet.c and vnet.h on the basis that they are no longer solely about the virtual network stack memory allocator. Approved by: re (vimage blanket)	2009-07-30 12:41:19 +00:00
rwatson	c387c55113	Revise header comments for vnet.h as we now implement VNET_SYSINIT, not just VNET_DEFINE in vnet.h. Approved by: re (vimage blanket)	2009-07-28 22:17:34 +00:00
qingli	4092d532fe	The new flow table caches both the routing table entry as well as the L2 information. For an indirect route the cached L2 entry contains the MAC address of the gateway. Typically the default route is used to transmit multicast packets when explicit multicast routes are not available. The ether_output() function bypasses L2 resolution function if it verifies the L2 cache is valid, because the cached L2 address (a unicast MAC address) is copied into the packets as the destination MAC address. This validation, however, does not apply to broadcast and multicast packets because the destination MAC address is mapped according to a standard method instead. Submitted by: Xin Li Reviewed by: bz Approved by: re	2009-07-28 17:16:54 +00:00
qingli	8c1899d934	This patch does the following: - Allow loopback route to be installed for address assigned to interface of IFF_POINTOPOINT type. - Install loopback route for an IPv4 interface addreess when the "useloopback" sysctl variable is enabled. Similarly, install loopback route for an IPv6 interface address when the sysctl variable "nd6_useloopback" is enabled. Deleting loopback routes for interface addresses is unconditional in case these sysctl variables were disabled after an interface address has been assigned. Reviewed by: bz Approved by: re	2009-07-27 17:08:06 +00:00
bz	83f1495433	Update epair(4) to the new netisr implementation and polish things a bit: - use dpcpu data to track the ifps with packets queued up, - per-cpu locking and driver flags - along with .nh_drainedcpu and NETISR_POLICY_CPU. - Put the mbufs in flight reference count, preventing interfaces from going away, under INVARIANTS as this is a general problem of the stack and should be solved in if.c/netisr but still good to verify the internal queuing logic. - Permit changing the MTU to virtually everythinkg like we do for loopback. Hook epair(4) up to the build. Approved by: re (kib)	2009-07-26 12:20:07 +00:00
bz	3aec900b26	Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctls (ifconfig ifN (-)vnet <jname\|jid>) work correctly. Move vi_if_move to if.c and split it up into two functions(), one for each ioctl. In the reclaim case, correctly set the vnet before calling if_vmove. Instead of silently allowing a move of an interface from the current vnet to the current vnet, return an error. () There is some duplicate interface name checking before actually moving the interface between network stacks without locking and thus race prone. Ideally if_vmove will correctly and automagically handle these in the future. Suggested by: rwatson (*) Approved by: re (kib)	2009-07-26 11:29:26 +00:00
rwatson	b3be1c6e3b	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
bz	1f4b104d4d	sysctl_msec_to_ticks is used with both virtualized and non-vrtiualized sysctls so we cannot used one common function. Add a macro to convert the arg1 in the virtualized case to vnet.h to not expose the maths to all over the code. Add a wrapper for the single virtualized call, properly handling arg1 and call the default implementation from there. Convert the two over places to use the new macro. Reviewed by: rwatson Approved by: re (kib)	2009-07-21 21:58:55 +00:00
rwatson	d32e9e9901	Garbage collect vnet module registrations that have neither constructors nor destructors, as there's no actual work to do. In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added. Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 13:55:33 +00:00
rwatson	fb3be5ae64	Add macros VNET_SETNAME and VNET_SYMPREFIX, and expose to userspace if _WANT_VNET is defined. This way we don't need separate definitions in libkvm. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 07:50:50 +00:00
rwatson	80ed051e0c	Normalize field naming for struct vnet, fix two debugging printfs that print them. Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-19 17:40:45 +00:00
rwatson	6955067932	Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep locks with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)	2009-07-19 14:20:53 +00:00
jamie	9f81cbd9ec	Remove the interim vimage containers, struct vimage and struct procg, and the ioctl-based interface that supported them. Approved by: re (kib), bz (mentor)	2009-07-17 14:48:21 +00:00
rwatson	88f8de4d40	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
rwatson	bc565b5b97	Add missing license line for vnet.h, correct white space nit. Approved by: re (kensmith) (implicit)	2009-07-15 00:56:15 +00:00
rwatson	57ca4583e7	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
kmacy	ccb66bdf1d	Re-factoring for adding weighted routes introduced a fairly irritating bug where the system will panic when RADIX_MPATH is enabled. This change fixes this. Approved by: re@	2009-07-11 21:56:23 +00:00
rpaulo	8424d74020	Implementation of the upcoming Wireless Mesh standard, 802.11s, on the net80211 wireless stack. This work is based on the March 2009 D3.0 draft standard. This standard is expected to become final next year. This includes two main net80211 modules, ieee80211_mesh.c which deals with peer link management, link metric calculation, routing table control and mesh configuration and ieee80211_hwmp.c which deals with the actually routing process on the mesh network. HWMP is the mandatory routing protocol on by the mesh standard, but others, such as RA-OLSR, can be implemented. Authentication and encryption are not implemented. There are several scripts under tools/tools/net80211/scripts that can be used to test different mesh network topologies and they also teach you how to setup a mesh vap (for the impatient: ifconfig wlan0 create wlandev ... wlanmode mesh). A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled by default on GENERIC kernels for i386, amd64, sparc64 and pc98. Drivers that support mesh networks right now are: ath, ral and mwl. More information at: http://wiki.freebsd.org/WifiMesh Please note that this work is experimental. Also, please note that bridging a mesh vap with another network interface is not yet supported. Many thanks to the FreeBSD Foundation for sponsoring this project and to Sam Leffler for his support. Also, I would like to thank Gateworks Corporation for sending me a Cambria board which was used during the development of this project. Reviewed by: sam Approved by: re (kensmith) Obtained from: projects/mesh11s	2009-07-11 15:02:45 +00:00
bz	f236086095	In case we cannot queue a packet reaching the queue limit, retain the semantics netisr_queue() always had and free the mbuf along with returning the error. Reviewed by: rwatson Approved by: re (kensmith)	2009-06-30 05:21:00 +00:00
brooks	0cabaf8791	Remove support for the /dev/net/* per-interface devices. They serve little purpose and are unused in the base system. The IOCTL functionality is entirely duplicated and routing sockets provide a richer interface than the kqueue functionality. Further, it is not practical for these devices to be made sensible in the face of VIMAGE. Bump __FreeBSD_version on the off chance that there is any code out there that actually uses this stuff. Reviewed by: rwatson Discussed with: bz, zec Approved by: re@ (kensmith)	2009-06-29 19:46:29 +00:00
rwatson	5cfc09d074	Remove unnecessary include of kdb.h that snuck in during ifaddr refcount work. Reported by: pluknet <pluknet at gmail.com> Approved by: re (kib)	2009-06-27 10:30:28 +00:00
rwatson	83138eec8f	In light of DPCPU use by netisr, revise various for loops from using MAXCPU to mp_maxid, and handling and reporting of requests to use more threads than we have CPUs to run them on. Reviewed by: bz Approved by: re (kib) MFC after: 6 weeks	2009-06-26 20:39:36 +00:00
rwatson	d5ad725967	Use if_addr_rlock/if_addr_runlock for if_spp when iterating if_addrhead, as it is loadable as a module. Approved by: re (kib) MFC after: 6 weeks	2009-06-26 18:50:49 +00:00
rwatson	3dffb5419c	Update if_stf and if_tun to use if_addr_rlock()/if_addr_runlock() rather than IF_ADDR_LOCK()/IF_ADDR_UNLOCK() when iterating ifp->if_addrhead. MFC after: 6 weeks	2009-06-26 00:45:20 +00:00
rwatson	c4ac6ab020	Define four wrapper functions for interface address locking, if_addr_rlock() and if_addr_runlock() for regular address lists, and if_maddr_rlock() and if_maddr_runlock() for multicast address lists. We will use these in various kernel modules to avoid encoding specific type and locking strategy information into modules that currently use IF_ADDR_LOCK() and IF_ADDR_UNLOCK() directly. MFC after: 6 weeks	2009-06-26 00:36:47 +00:00
rwatson	6bcc6a8413	Convert netisr to use dynamic per-CPU storage (DPCPU) instead of sizing arrays to [MAXCPU], offering moderate memory savings. In some places, this requires using CPU_ABSENT() to handle less common platforms with sparse CPU IDs. In several places, assert that the selected CPUID for work placement or statistics is not CPU_ABSENT() to be on the safe side. Discussed with: bz, jeff	2009-06-26 00:19:25 +00:00
kib	a7a5954511	Change the type of uio_resid member of struct uio from int to ssize_t. Note that this does not actually enable full-range i/o requests for 64 architectures, and is done now to update KBI only. Tested by: pho Reviewed by: jhb, bde (as part of the review of the bigger patch)	2009-06-25 18:46:30 +00:00
rwatson	ea70a3542d	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz	2009-06-25 11:52:33 +00:00
bz	e1dc108b61	Merge from p4: CH154790,154793,154874 Import if_epair(4), a virtual cross-over Ethernet-like interface pair. Note these files are 1:1 from p4 and not yet connected to the build not knowing about the new netisr interface. Sponsored by: The FreeBSD Foundation	2009-06-24 22:21:30 +00:00
np	e77e976678	Add 10Gbase-T to known ethernet media types. Approved by: gnn (mentor) MFC after: 1 week.	2009-06-24 21:53:25 +00:00
np	e3f82ee9f1	About to add 10Gbase-T to known media types, this is just a whitespace cleanup before that commit. No functional impact. Approved by: gnn (mentor)	2009-06-24 21:51:42 +00:00
rwatson	d5bc5239c3	In if_setlladdr(), use IF_ADDR_LOCK() and ifaddr references to improve the safety of link layer address manipulation. MFC after: 6 weeks	2009-06-24 10:36:48 +00:00
rwatson	16b92db4cd	Break at_ifawithnet() into two variants: - at_ifawithnet(), which acquires an locks it needs and returns an at_ifaddr reference. - at_ifawithnet_locked(), which relies on the caller locking at_ifaddr_list, and returns a pointer rather than a reference. Update various consumers to prefer one or the other, including ether and fddi output, to properly release at_ifaddr references. Rework at_control() to manage locking and references in a manner identical to in_control(). MFC after: 6 weeks	2009-06-24 10:32:44 +00:00
rwatson	f32ae09bf9	Lock if_addrhead when iterating, and where necessary acquire and release ifadr references in if_sppp. MFC after: 6 weeks	2009-06-24 08:53:23 +00:00
rwatson	5972c25713	Make stf_getsrcifa6() return a reference to an in6_ifaddr rather than a pointer, and dispose of the references when no longer needed. MFC after: 6 weeks	2009-06-24 08:52:09 +00:00
rwatson	c9ef486fe1	Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)	2009-06-23 20:19:09 +00:00
bz	0808d0b1a6	After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.	2009-06-23 17:03:45 +00:00
bz	37fcd1fbea	Remove duplicate #include <net/route.h> from the middle of the file.	2009-06-23 13:16:16 +00:00
zec	f9dfbad808	V_irtualize flowtable state. This change should make options VIMAGE kernel builds usable again, to some extent at least. Note that the size of struct vnet_inet has changed, though in accordance with one-bump-per-day policy we didn't update the __FreeBSD_version number, given that it has already been touched by r194640 a few hours ago. Reviewed by: bz Approved by: julian (mentor)	2009-06-22 21:19:24 +00:00
bz	e11fa9da51	Updates after r194640: - shrink size guards for vnet_net. vnet_rtable does not need size guards as it is self-contained. - remove a bunch of defines from vnet.h no longer valid.	2009-06-22 17:56:07 +00:00
bz	309ab541f2	Move virtualization of routing related variables into their own Vimage module, which had been there already but now is stateful. All variables are now file local; so this further limits the global spreading of routing related things throughout the kernel. Add a missing function local variable in case of MPATHing. Reviewed by: zec	2009-06-22 17:48:16 +00:00
bz	425e742e59	Collect all VIMAGE_GLOBALS variables in one place. No longer export rt_tables as all lookups go through rt_tables_get_rnh(). We cannot make rt_tables (and rtstat, rttrash[1]) static as netstat -r (-rs[1]) would stop working on a stripped VIMAGE_GLOBALS kernel. Reviewed by: zec Presumably broken by: phk 13.5y ago in r12820 [1]	2009-06-22 15:07:12 +00:00
rwatson	3c02410d55	Add a new function, ifa_ifwithaddr_check(), which rather than returning a pointer to an ifaddr matching the passed socket address, returns a boolean indicating whether one was present. In the (near) future, ifa_ifwithaddr() will return a referenced ifaddr rather than a raw ifaddr pointer, and the new wrapper will allow callers that care only about the boolean condition to avoid having to free that reference. MFC after: 3 weeks	2009-06-22 10:59:34 +00:00
bz	c5ae8d95f2	After the update to fxp(4) in r194573 we should no longer need this DELAY(100) hack introduced in r56938. Thanks to: yongari MFC after: 6 weeks X-MFC note: not before the fxp(4) changes	2009-06-22 10:27:20 +00:00
rwatson	1f7e54e8c5	Clean up common ifaddr management: - Unify reference count and lock initialization in a single function, ifa_init(). - Move tear-down from a macro (IFAFREE) to a function ifa_free(). - Move reference count bump from a macro (IFAREF) to a function ifa_ref(). - Instead of using a u_int protected by a mutex to refcount(9) for reference count management. The ifa_mtx is now used for exactly one ioctl, and possibly should be removed. MFC after: 3 weeks	2009-06-21 19:30:33 +00:00
rdivacky	9992cf9aeb	Switch cmd argument to u_long. This matches what if_ethersubr.c does and allows the code to compile cleanly on amd64 with clang. Reviewed by: rwatson Approved by: ed (mentor)	2009-06-21 10:29:31 +00:00
rdivacky	cc5ff80770	In non-debugging mode make this define (void)0 instead of nothing. This helps to catch bugs like the below with clang. if (cond); <--- note the trailing ; something(); Approved by: ed (mentor) Discussed on: current@	2009-06-21 08:49:06 +00:00
kmacy	6154623e0c	add helper function for flushing software queues	2009-06-19 23:11:20 +00:00
csjp	888867acdc	Implement the -z (zero counters) option for the various bpf counters. Add necessary changes to the kernel for this (basically introduce a bpf_zero_counters() function). As well, update the man page. MFC after: 1 month Discussed with: rwatson	2009-06-19 20:31:44 +00:00
bz	48dc6805f8	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
bz	b017972f11	Add the explicit include of vimage.h to another five .c files still missing it. Remove the "hidden" kernel only include of vimage.h from ip_var.h added with the very first Vimage commit r181803 to avoid further kernel poisoning.	2009-06-17 12:44:11 +00:00
sam	df30f1c9f2	r193336 moved ifq_detach to if_free which broke if_alloc followed by if_free (w/o doing if_attach); move ifq_attach to if_alloc and rename ifq_attach/detach to ifq_init/ifq_delete to better identify their purpose Reviewed by: jhb, kmacy	2009-06-15 19:50:03 +00:00
jamie	f950eed7d7	Get vnets from creds instead of threads where they're available, and from passed threads instead of curthread. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 19:01:53 +00:00
jamie	5675a54fb1	Manage vnets via the jail system. If a jail is given the boolean parameter "vnet" when it is created, a new vnet instance will be created along with the jail. Networks interfaces can be moved between prisons with an ioctl similar to the one that moves them between vimages. For now vnets will co-exist under both jails and vimages, but soon struct vimage will be going away. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 18:59:29 +00:00
bz	56983733aa	Add an optional callback function that will be invoked when a per-CPU queue was drained. It will never fire for a directly dispatched packet. You will most likely never want to use this for any ordinary netisr usage and you will never blame netisr in case you try to use it and it does not work as expected. Reviewed by: rwatson	2009-06-14 17:15:18 +00:00
bz	6d318791b0	Garbage collect an extern for a non-existent variable. While here let the comment end in a '.' and mark the #endif of _KERNEL. Reviewed by: rwatson (as part of a larger patch)	2009-06-12 20:50:28 +00:00
bz	64d1a84073	Move the kernel option FLOWTABLE chacking from the header file to the actual implementation. Remove the accessor functions for the compiled out case, just returning "unavail" values. Remove the kernel conditional from the header file as it is no longer needed, only leaving the externs. Hide the improperly virtualized SYSCTL/TUNABLE for the flowtable size under the kernel option as well. Reviewed by: rwatson	2009-06-12 20:46:36 +00:00
vanhu	16c1346b9a	Added support for NAT-Traversal (RFC 3948) in IPsec stack. Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele (julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense team, and all people who used / tried the NAT-T patch for years and reported bugs, patches, etc... X-MFC: never Reviewed by: bz Approved by: gnn(mentor) Obtained from: NETASQ	2009-06-12 15:44:35 +00:00
bz	fe42112add	carp(4) allows people to share a set of IP addresses and can only use IPv4/v6 for inter-node communication (according to my reading). Properly wrap the carp callouts in INET \|\| INET6 and refelect this in sys/conf/files as well. While in theory this should be ok, it might be a bit optimistic to think that carp could build with inet6 only[1]. Discussed with: mlaier [1]	2009-06-11 10:26:38 +00:00
kib	e1cb2941d4	Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland	2009-06-10 20:59:32 +00:00
bz	0b5c06357f	SCTP needs either IPv4 or IPv6 as lower layer[1]. So properly hide the already #ifdef SCTP code with #if defined(INET) \|\| defined(INET6) as well to get us closer to a non-INET/INET6 kernel. Discussed with: tuexen [1]	2009-06-10 14:36:59 +00:00
bz	9089a9160f	ip_gif_ttl/GIF_TTL are only used by the inet part in in_gif.c, so put the initialization under #ifdef INET.	2009-06-10 13:39:51 +00:00
bz	f5e2381b7d	The llentry *lle is only used in cases of INET or INET6. Put the variable declaration under proper #ifdefs. In case variables are only needed for one of the two AFs more them into proper scope.	2009-06-10 09:07:05 +00:00

... 2 3 4 5 6 ...

2796 Commits