freebsd-skq

Author	SHA1	Message	Date
rwatson	79f582472b	Add an optional netisr dispatch point at ether_input(), but set the default dispatch method to NETISR_DISPATCH_DIRECT in order to force direct dispatch. This adds a fairly negligble overhead without changing default behavior, but in the future will allow deferred or hybrid dispatch to other worker threads before link layer processing has taken place. For example, this could allow redistribution using RSS hashes without ethernet header cache line hits, if the NIC was unable to adequately implement load balancing to too small a number of input queues -- perhaps due to hard queueset counts of 1, 3, or 8, but in a modern system with 16-128 threads. This can happen on highly threaded systems, where you want want an ithread per core, redistributing work to other queues, but also on virtualised systems where hardware hashing is (or is not) available, but only a single queue has been directed to one VCPU on a VM. Note: this adds a previously non-present assertion about the equivalence of the ifnet from which the packet is received, and the ifnet stamped in the mbuf header. I believe this assertion to generally be true, but we'll find out soon -- if it's not, we might have to add additional overhead in some cases to add an m_tag with the originating ifnet pointer stored in it. Reviewed by: bz MFC after: 3 weeks Sponsored by: Juniper Networks, Inc.	2011-06-01 20:00:25 +00:00
nwhitehorn	a69e106b2f	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb	2011-05-31 15:11:43 +00:00
rwatson	d81ad1a304	Rework netisr policy mechanism so that per-protocol dispatch policies can be represented: - A single policy namespace is defined, consisting of four possible policies: "default" to use the global default, "deferred" to force deferred dispatch, "direct" to employ direct dispatch where possible, and "hybrid" which makes a dynamic decision based on CPU affinity, ordering, etc. Routines are implemented to convert between strings and an integer namespace. - A new global variable, netisr_dispatch_policy, subsumes existing global variables for direct dispatch, forced direct dispatch, etc, and is used for explicit policy interpretation and composition. Old variables remain so that they can be exported by legacy sysctls for use by old netstat(1) binaries. A new sysctl and tunable, netisr.dispatch.policy, accepts the above strings for specifying a global policy default. - The protocol registration structure, netisr_handler, grows an nh_dispatch field, which accepts a per-policy policy override. The default value is '0', which corresponds to "default", meaning that protocols will accept the global default policy unless otherwise specified. - Policies are now interpreted and composed explicitly at various points in packet dispatch; protocol policies override global policies. - Protocols grow the ability to express a non-opinion about affinity even when implenting m2cpuid by returning NETISR_CPUID_NONE. In that case, the framework falls back on source ordering, rather than simply using the current CPU. These changes are in support of allowing link layer re-dispatch based on RSS or similar hashes provided by NICs, especially in the case where the number of hardware receive queues matches hardware core count, rather than hardware thread count, requiring further software redistributeon. (i.e., on RMI XLR). MFC after: 3 weeks Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-24 12:34:19 +00:00
zec	0e7fcf0b2c	Allow for vlan(4) interfaces with MTU of 1500 bytes to be configured on top of epair(4) virtual interfaces, since there's no physical hardware associated with epair interfaces which would imply any constraints on MTU sizes. MFC after: 3 days	2011-05-24 08:02:55 +00:00
zec	3ab76c6800	Let epair(4) virtual interfaces report fake link / media status, by borrowing the skeleton of if_media manipulation and reporting code from if_lagg(4). The main motivation behind this change is to allow for epair(4) interfaces to participate in STP if_bridge(4) configurations. Reviewed by: bz MFC after: 3 days	2011-05-24 07:57:28 +00:00
qingli	a1bf1a2582	The statically configured (permanent) ARP entries are removed when an interface is brought down, even though the interface address is still valid. This patch maintains the permanent ARP entries as long as the interface address (having the same prefix as that of the ARP entries) is valid. Reviewed by: delphij MFC after: 5 days	2011-05-20 19:12:20 +00:00
marius	03919a3303	- Add 10baseT as an alias for 10baseT/UTP. - Add shorthand aliases for common media+option combinations as announced by miibus(4) so that one can actually supply the media strings found in the dmesg output to ifconfig(8). Obtained from: NetBSD (in principle) MFC after: 2 weeks	2011-05-15 12:58:29 +00:00
yongari	0aab37190f	Fix white space nits and style	2011-05-06 20:46:29 +00:00
yongari	b282e7aca8	Do not increment collision counter if transmit have failed. Transmission error in tun(4) is queueing error(i.e. ENOBUFS) and it has nothing to do with collision. Reported by: Zeus V Panchenko (zeus <> ibs dot dn dot ua)	2011-05-06 20:37:07 +00:00
thompsa	fc83a48265	LACP frames must not be send VLAN-tagged, check for that before processing. PR: kern/156743 Submitted by: Dmitrij Tejblum MFC after: 1 week	2011-04-30 20:34:52 +00:00
bz	1910487722	Make various (pseudo) interfaces compile without INET in the kernel adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:30:44 +00:00
glebius	8c3b1b83f9	When removing ifnets, we should first remove the reference to ifnet from the interface index, then decrease refcount, not vice versa. Otherwise there is a race (reproducible) when if_free_internal() contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the index for a long time. It may be picked by some other thread, that runs ifnet_byindex_ref(), who takes the ifnet from index, and bumps refcount. When reader drops the lock, if_free_internal() proceeds with free. Then reader tries to free it a second time.	2011-04-04 07:45:08 +00:00
jeff	2d7d8c05e7	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
dchagin	0f55e947bd	Remove dead code. MFC after: 1 Week	2011-03-20 08:35:00 +00:00
dchagin	a96a94aaf2	ouch, newrt is used on the return path, my fault. Partialy revert the previous change. MFC after: 1 Week.	2011-03-19 21:10:57 +00:00
dchagin	b3fb445553	A bit rearranged rtalloc1_fib() code. Initialize a variable when it is really needed. To avoid code duplication move the miss label to line up and jump on it. MFC after: 1 Week	2011-03-19 19:50:36 +00:00
dchagin	9dffdca01d	Remove a now unused variable. MFC after: 1 Week	2011-03-19 16:52:06 +00:00
eri	2ad117efbd	Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none. Approved by: thompsa(mentor) MFC after: 1 week	2011-03-04 20:37:38 +00:00
bz	209ebad7af	Hide the outer IP addresses of a tunnel interfaces (gif(4), gre(4)) from processes inside jails if the addresses do not belong to the jail. Originally reported by: Pieter de Boer via remko PR: kern/151119 Tested by: Piotr KUCHARSKI (nospam 42.pl) [gif] MFC after: 1 week	2011-03-02 21:39:08 +00:00
brucec	6d9b42b486	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
bz	b9b7d3e93a	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
bz	1289b7e264	Mfp4 CH=177255: Resort the CURVNET_SET* macros in the non-VNET_DEBUG case to match the call order of the VNET_DEBUG case. Add the VNET_ASSERT() to the non-VNET_DEBUG case as well so that INVARIANTS will still catch problems. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 2 weeks	2011-02-11 14:17:58 +00:00
bz	aad842be85	Mfp4 CH=177255: Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS. Change the syntax to match KASSERT() to allow more flexible panic messages rather than having a printf with hardcoded arguments before panic. Adjust the few assertions we have to the new format (and enhance the output). Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 2 weeks	2011-02-11 13:27:00 +00:00
bz	3bac00c1ca	Mfp4 CH=177255: Use __func__ rather than __FUNCTION__. MFC after: 2 weeks	2011-02-11 12:56:05 +00:00
mlaier	f0ab2e35d4	As info.rti_info[RTAX_DST] can point inside of rtm we must not free the rtm until rt_dispatch is done with the sockaddr. Found by: memguard MFC after: 3 days	2011-02-10 01:24:09 +00:00
jhb	aff100c87f	Fix a LOR by dropping the global ifnet locks while allocating a new ifnet table in if_grow(). The order of the SYSINIT's for ifnet state were swapped so that the various locks were initialized before being used. Reviewed by: pluknet, bz MFC after: 2 weeks	2011-01-24 22:21:58 +00:00
mdf	6648b8cede	sysctl(8) should use the CTLTYPE to determine the type of data when reading. (This was already done for writing to a sysctl). This requires all SYSCTL setups to specify a type. Most of them are now checked at compile-time. Remove SYSCTL_X sysctl additions as the print being in hex should be controlled by the -x flag to sysctl(8). Succested by: bde	2011-01-19 17:04:07 +00:00
mdf	5e41205b16	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the net* piece.	2011-01-12 19:53:50 +00:00
jhb	c17f46e472	Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes. Reviewed by: bde	2011-01-11 13:59:06 +00:00
bz	7864be44ec	MfP4 CH=185246 [1]: Add FEATURE() to announce optional VIMAGE. MFC after: 3 days [1] for the moment put it in vnet.c.	2011-01-09 20:40:21 +00:00
jhb	3a7fd5f8c7	- Restore dropping the priority of syncer down to PPAUSE when it is idle. This was lost when it was converted to using a condition variable instead of lbolt. - Drop the priority of flowtable down to PPAUSE when it is idle as well since it is a similar background task. MFC after: 2 weeks	2011-01-06 22:17:07 +00:00
marius	2c3e165ed0	Teach ifconfig(8) the handy shared option shortcut aliases the NetBSD counterpart also takes, i.e. "fdx" for "full-duplex", "flow" for "flowcontrol", "hdx" for "half-duplex" as well as "loop" and "loopback" for "hw-loopback". MFC after: 1 week	2011-01-05 15:28:30 +00:00
marius	577786b402	Fix whitespace. MFC after: 1 week	2011-01-05 14:51:04 +00:00
bz	43c37d3ddd	Use NULL rather than 0 to invalidate a pointer. Rather than duplicating the LLE_FREE_LOCKED() macro code in LLE_FREE(), call it directly (like we do for the RT_* macros). Sponsored by: ISPsystem [1] Reviewed by: julian [1] MFC After: 1 week [1] Early 2010.	2010-12-31 21:57:54 +00:00
bz	2a04b52e8a	Print the vnet pointer under DDB when iterating over flowtables of each virtual network stack instance. Sponsored by: ISPsystem [1] Reviewed by: julian [1] MFC after: 1 week [1] Early 2010.	2010-12-31 21:20:32 +00:00
bz	558c7e035b	Move the increment operation under the lock and split the condition variable into two so that we can see on which one we are waiting. This might also more properly propagate the update of the flowclean_cycles flag and avoid "hangs" people were seeing. Suggested by: rwatson [1] Sponsored by: ISPsystem [1] Reviewed by: julian [1] Updated by: Mikolaj Golub (to.my.trociny gmail.com) Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC After: 1 week [1] Early 2010, initial version.	2010-12-31 21:06:52 +00:00
alc	971b02b7bc	Introduce and use a new VM interface for temporarily pinning pages. This new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel. In collaboration with: kib@	2010-12-25 21:26:56 +00:00
weongyo	6dc48cb05c	Adds IFF_CANTCONFIG to IFF_CANTCHANGE that it shouldn't happen through ioctl(2).	2010-12-07 20:31:04 +00:00
weongyo	33417874f4	Introduces IFF_CANTCONFIG interface flag to point that the interface isn't configurable in a meaningful way. This is for ifconfig(8) or other tools not to change code whenever IFT_USB-like interfaces are registered at the interface list. Reviewed by: brooks No objections: gavin, jkim	2010-12-07 20:23:47 +00:00
maxim	c6ed377d5e	o Swap descriptions for net.bpf.bufsize and net.bpf.maxbufsize. PR: misc/152531 MFC after: 1 week	2010-11-24 05:50:19 +00:00
zec	163ca01105	Allow for vlan(4) ifnets to have overlapping unit numbers if they are created in separated vnets. As a side-effect of having a separated if_cloner instance for each vnet, all vlan ifnets created in a vnet will be automatically destroyed when vnet teardown is initiated. Disallow SIOCSETVLAN and SIOCGETVLAN ioctls on vlan ifnets which are associated with physical ifnets residing in parent vnets. This is an interim vlan-specific solution which will be superseded by a more generic if_cloner V_irtualization change from p4. For nooptions VIMAGE builds, this should be a no-op change. Discussed with: bz MFC after: 3 days	2010-11-22 23:35:29 +00:00
dim	fb307d7d1d	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 \| dim \| 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) \| 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 \| dim \| 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) \| 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 \| dim \| 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) \| 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
bz	076c86b1ea	Add a missing ';' and change the debugging sysctl from xint to int. Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 3 days	2010-11-21 19:33:19 +00:00
dim	02778a6df6	Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined.	2010-11-14 20:40:55 +00:00
dim	fda4020a88	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
dim	7dff36caf7	Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-14 20:23:02 +00:00
marius	278d761d73	o Flesh out the generic IEEE 802.3 annex 31B full duplex flow control support in mii(4): - Merge generic flow control advertisement (which can be enabled by passing by MIIF_DOPAUSE to mii_attach(9)) and parsing support from NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD, IFM_FLOW isn't implemented as a global option via the "don't care mask" but instead as a media specific option this. This has the following advantages: o allows flow control advertisement with autonegotiation to be turned on and off via ifconfig(8) with the default typically being off (though MIIF_FORCEPAUSE has been added causing flow control to be always advertised, allowing to easily MFC this changes for drivers that previously used home-grown support for flow control that behaved that way without breaking POLA) o allows to deal with PHY drivers where flow control advertisement with manual selection doesn't work or at least isn't implemented, like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4), by setting MIIF_NOMANPAUSE o the available combinations of media options are readily available from the `ifconfig -m` output - Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so these are understood by ifconfig(8). o Make the master/slave support in mii(4) actually usable: - Change IFM_ETH_MASTER from being implemented as a global option via the "don't care mask" to a media specific one as it actually is only applicable to IFM_1000_T to date. - Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to actually configure manually selected slave mode (like we also do in the PHY specific implementations). - Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it is understood by ifconfig(8). o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4), e1000phy(4) and ip1000phy(4) to use the generic flow control support instead of home-grown solutions via IFM_FLAGs. This includes changing these PHY drivers and smcphy(4) to no longer unconditionally advertise support for flow control but only if the selected media has IFM_FLOW set (or MIIF_FORCEPAUSE is set) and implemented for these media variants, i.e. typically only for copper. o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0 and some IFM_FLAGn. o Switch brgphy(4) to add at least the the supported copper media based on the contents of the BMSR via mii_phy_add_media() instead of hardcoding them. The latter approach seems to have developed historically, besides causing unnecessary code duplication it was also undesirable because brgphy_mii_phy_auto() already based the capability advertisement on the contents of the BMSR though. o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not just BCM5701. Apparently this was a misinterpretation of a workaround in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but this doesn't mean we can't set these as well on other PHYs for manual media selection. o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so IFM_1000_T master mode support now is generally available with all PHY drivers. o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's not applicable there. Reviewed by: yongari (plus additional testing) Obtained from: NetBSD (partially), OpenBSD (partially) MFC after: 2 weeks	2010-11-14 13:26:10 +00:00
kib	1889bb0afa	Use 'z' modifier for size_t printing.	2010-11-13 11:11:51 +00:00
dim	7b0aabca30	Similar to r212647, remove the workaround in sys/net/vnet.h for an ld bug (incorrect placement of __start_SECNAME in some cases) that was fixed in r210245. There is already an UPDATING entry about needing a recent ld. MFC after: 1 month	2010-11-12 22:59:50 +00:00
gnn	c3225b5eaa	Add a queue to hold packets while we await an ARP reply. When a fast machine first brings up some non TCP networking program it is quite possible that we will drop packets due to the fact that only one packet can be held per ARP entry. This leads to packets being missed when a program starts or restarts if the ARP data is not currently in the ARP cache. This code adds a new sysctl, net.link.ether.inet.maxhold, which defines a system wide maximum number of packets to be held in each ARP entry. Up to maxhold packets are queued until an ARP reply is received or the ARP times out. The default setting is the old value of 1 which has been part of the BSD networking code since time immemorial. Expose the time we hold an incomplete ARP entry by adding the sysctl net.link.ether.inet.wait, which defaults to 20 seconds, the value used when the new ARP code was added.. Reviewed by: bz, rpaulo MFC after: 3 weeks	2010-11-12 22:03:02 +00:00
dim	2c7257916f	Use the same treatment as in linker_set.h for the __start and __stop symbols of the set_vnet and set_pcpu sections, so those symbols will always be emitted in kernel modules, if they use vnet.h or pcpu.h. Also, for pcpu.h, make the __(start\|stop)_set_pcpu declarations, and associated macros invisible to userland, to prevent it picking up these symbols. Reviewed by: kib	2010-11-11 19:18:52 +00:00
rpaulo	2631ae0f3d	Sync DLTs with the latest pcap version.	2010-10-29 18:41:09 +00:00
bz	491af1942e	Factor out DDB commands from r204145, r204279 into if_debug.c for further enhancements (1). Switch to a standard 2-clause BSD license for this (2). Unfortunately we have to un-static the ifindex_table for this but do not publicly export it. Suggested by: rwatson (1) a while back. Approved by: thompsa (2) for the change from r204279. MFC after: 6 days	2010-10-25 08:30:19 +00:00
pluknet	8950ed8036	Reshuffle SIOCGIFCONF32 handler from r155224. - move all the chunks into one file, which allows to hide SIOCGIFCONF32 global definition as well. - replace __amd64__ with proper COMPAT_FREEBSD32 around. - handle 32bit capacity before going into the handler itself instead of doing internal 32bit specific changes within it (e.g. as it's done for SIOCGDEFIFACE32_IN6). - use explicitely sized types for ABI compat. Approved by: kib (mentor) MFC after: 2 weeks	2010-10-21 16:20:48 +00:00
bz	753b9262ae	Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating over all interfaces to make sure the address will neither change nor be freed while we are working on it. PR: kern/146250 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 1 week	2010-10-16 19:25:27 +00:00
bz	e7e5079137	lltable_drain() has never been used so far, thus #if 0 it for now. While touching it add the missing locking to the now disabled code for the time when we'll resurrect it. MFC after: 3 days	2010-10-16 18:42:09 +00:00
bz	054f4fff23	Only hide the ifa and not the tp under #ifdef INET as the tp is needed for locking evenwhen there is no INET. MFC after: 3 days	2010-10-01 15:14:14 +00:00
jhb	b8914d3479	- Expand scope of tun/tap softc locks to cover more softc fields and driver-maintained ifnet fields (such as if_drv_flags). - Use soft locks as the mutex that protects each interface's knote list rather than using the global knote list lock. Also, use the softc for kn_hook instead of the cdev. - Use mtx_sleep() instead of tsleep() when blocking in the read routines. This fixes a lost wakeup race. - Remove D_NEEDGIANT now that the cdevsw routines use the softc lock where locking is needed. - Lock IFQ when calculating the result for FIONREAD in tap(4). tun(4) already did this. - Remove remaining spl calls. Submitted by: Marcin Cieslak saper of saper\|info (3) MFC after: 2 weeks	2010-09-22 21:02:43 +00:00
jkim	4b1a6e6702	Fix a typo in a comment. Submitted by: afiveg	2010-09-16 18:37:33 +00:00
mdf	ab3a8b533a	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno that sbuf_overflowed() did.	2010-09-10 16:42:16 +00:00
bz	06977c291c	MFp4 CH=183259: No reason to use if_free_type() as we don't change our type. Just if_free() is fine. MFC after: 3 days	2010-09-02 16:11:12 +00:00
emaste	a9a1b47f1d	Add a sysctl knob to accept input packets on any link in a failover lagg.	2010-09-01 16:53:38 +00:00
bz	70761a9ea6	MFp4 CH=182972: Add explicit linkstate UP/DOWN for the epair. This is needed by carp(4) and other things to work. MFC after: 5 days	2010-08-27 23:22:58 +00:00
rpaulo	ea11ba6788	Add an extra comment to the SDT probes definition. This allows us to get use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]	2010-08-22 11:18:57 +00:00
zec	e1e5264fc5	When moving an ethernet ifnet from one vnet to another, destroy the associated ng_ether netgraph node in the current vnet, and create a new one in the target vnet. Reviewed by: julian MFC after: 3 days	2010-08-13 18:17:32 +00:00
will	d548943ae9	Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with the appropriate ifdefs. Reviewed by: bz Approved by: ken (mentor)	2010-08-11 20:18:19 +00:00
will	aa4e762c4a	Allow carp(4) to be loaded as a kernel module. Follow precedent set by bridge(4), lagg(4) etc. and make use of function pointers and pf_proto_register() to hook carp into the network stack. Currently, because of the uncertainty about whether the unload path is free of race condition panics, unloads are disallowed by default. Compiling with CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure. This commit requires IP6PROTOSPACER, introduced in r211115. Reviewed by: bz, simon Approved by: ken (mentor) MFC after: 2 weeks	2010-08-11 00:51:50 +00:00
jhb	4ab164bfd3	Adjust the interface type in the link layer socket address for vlan(4) interfaces to be a vlan (IFT_L2VLAN) rather than an Ethernet interface (IFT_ETHER). The code already fixed if_type in the ifnet causing some places to report the interface as a vlan (e.g. arp -a output) and other places to report the interface as Ethernet (getifaddrs(3)). Now they should all report IFT_L2VLAN. Reviewed by: brooks MFC after: 1 month	2010-08-06 15:15:26 +00:00
kib	c5df0ad3b8	Properly set ifi_datalen for compat32 struct if_data32. PR: kern/149240 Submitted by: Stef Walter <stef memberwebs com> MFC after: 1 weeks	2010-08-03 15:40:42 +00:00
glebius	b89adb17c9	Don't check malloc(M_WAITOK) result.	2010-07-27 11:56:49 +00:00
bz	0078d05705	Return NULL rather than 0 for a pointer. MFC after: 3 days	2010-07-27 11:54:01 +00:00
glebius	43f917d899	When installing a new ARP entry via 'arp -S', lla_lookup() will either find an existing entry, or allocate a new one. In the latter case an entry would have flags, that were supplied as argument to lla_lookup(). In case of an existing entry, flags aren't modified. This lead to losing LLE_PUB and/or LLE_PROXY flags. We should apply these flags either in lla_rt_output() or in the in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is a more correct choice. PR: kern/148784, kern/146539 Silence from: qingli, 5 days	2010-07-27 10:05:27 +00:00
jkim	9fdab3dd15	Fix an obvious typo from r1.1. We were acquiring an exclusive writer lock regardless of the given flags. MFC after: 3 days	2010-07-22 18:44:40 +00:00
luigi	bbbbe1a022	whitespace cleanup	2010-07-15 14:41:59 +00:00
luigi	9f2bcd601a	small portability fix to build on linux/windows	2010-07-15 14:41:06 +00:00
jkim	14f08fd627	Implement flexible BPF timestamping framework. - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses. - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now. - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format as binuptime(9) and getbinuptime(9) do. Reviewed by: net@	2010-06-15 19:28:44 +00:00
jhb	9b74a62d73	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
zec	66c3a596d7	Provide a macro for registering a virtualized sysctl handler for VNET opaque data. MFC after: 30 days	2010-06-02 15:29:21 +00:00
qingli	f6ab4a6810	This patch fixes the problem where proxy ARP entries cannot be added over the if_ng interface. MFC after: 3 days	2010-05-25 20:42:35 +00:00
jhb	7df2d528cf	Ignore failures from removing multicast addresses from the parent (trunk) interface when tearing down a vlan interface. If a trunk interface is detached, all of its multicast addresses are removed before the ifnet departure eventhandlers are invoked. This means that all of the multicast addresses are removed before the vlan interfaces are removed which causes the if_delmulti() calls in the vlan teardown to fail. In the VLAN_ARRAY case, this left vlan interfaces referencing a no longer valid parent interface. In the !VLAN_ARRAY case, the eventhandler gets stuck in an infinite loop retrying vlan_unconfig_locked() forever. In general the callers of vlan_unconfig_locked() do not expect nor handle failure, so I believe it is safer to ignore the errors and tear down as much of the vlan state as possible. Silence from: net@ MFC after: 4 days	2010-05-17 19:36:56 +00:00
kmacy	a68dd336d5	allocate ipv6 flows from the ipv6 flow zone reported by: rrs@ MFC after: 3 days	2010-05-16 21:48:39 +00:00
bz	c9d1ca826b	Fix an issue with the dynamic pcpu/vnet data allocators. We cannot expect that modspace is the last entry in the linker set and thus that modspace + possible extra space up to PAGE_SIZE would be contiguous. For the moment do not support more than _MODMIN space and ignore the extra space (). (*) We know how to get it back but it'll need testing. Discussed with: jeff, rwatson (briefly) Reviewed by: jeff Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-05-14 21:11:58 +00:00
kmacy	5e73b28c4d	workaround bug with ipv6 where a flow can have a null rtentry	2010-05-12 04:51:20 +00:00
alc	e6c77ecaea	Remove page queues locking from all sf_buf_mext()-like functions. The page lock now suffices. Fix a couple nearby style violations.	2010-05-06 17:43:41 +00:00
alc	c9aaa1e2a2	Add page locking to the vm_page_cow* functions. Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib	2010-05-04 15:55:41 +00:00
sobomax	213eac1f2c	Add new tunable 'net.link.ifqmaxlen' to set default send interface queue length. The default value for this parameter is 50, which is quite low for many of today's uses and the only way to modify this parameter right now is to edit if_var.h file. Also add read-only sysctl with the same name, so that it's possible to retrieve the current value. MFC after: 1 month	2010-05-03 07:32:50 +00:00
alc	387e15c45a	This is the first step in transitioning responsibility for synchronizing access to the page's wire_count from the page queues lock to the page lock. Submitted by: kmacy	2010-05-03 05:41:50 +00:00
kmacy	1dc1263413	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
bz	0a90ef1728	MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitspace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOABLS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days	2010-04-29 11:52:42 +00:00
kmacy	e04522d342	need to initialize the lock before it is used MFC after: 3 days	2010-04-27 23:48:50 +00:00
bz	7fcd42cfee	MFP4: @177254 Add missing CURVNET_RESTORE() calls for multiple code paths, to stop leaking the currently cached vnet into callers and to the process. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-04-27 15:16:54 +00:00
kib	2861e20e63	Provide compat32 shims for bpf(4), except zero-copy facilities. bd_compat32 field of struct bpf_d is kept unconditionally to not impose the requirement of including "opt_compat.h" on all numerous users of bpfdesc.h. Submitted by: jhb (version for 6.x) Reviewed and tested by: emaste MFC after: 2 weeks	2010-04-25 16:43:41 +00:00
kib	1bf8e84d75	Provide 32bit compat shims for sysctl net.route NET_RT_IFLIST. This allows getifaddrs(3) to work for compat32 binaries. Submitted by: jhb (6.x version) Reviewed by: emaste Tested by: emaste and <pluknet gmail com> MFC after: 2 weeks	2010-04-25 16:42:47 +00:00
julian	e5e811b1e9	Move two copies of the same definition to a common include file. MFC after: 3 weeks	2010-04-14 23:06:07 +00:00
delphij	58be1647f8	When an underlying ioctl(2) handler returns an error, our ioctl(2) interface considers that it hits a fatal error, and will not copyout the request structure back for _IOW and _IOWR ioctls, keeping them untouched. The previous implementation of the SIOCGIFDESCR ioctl intends to feed the buffer length back to userland. However, if we return an error, the feedback would be defeated and ifconfig(8) would trap into an infinite loop. This commit changes SIOCGIFDESCR to set buffer field to NULL to indicate the previous ENAMETOOLONG case. Reported by: bschmidt MFC after: 2 weeks	2010-04-14 22:02:19 +00:00
bz	778f026aa2	Take a reference to make sure that the interface cannot go away during if_clone_destroy() in case parallel threads try to. PR: kern/116837 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 10 days	2010-04-11 18:47:38 +00:00
bz	35f1e4bd73	Check that the interface is on the list of cloned interfaces before trying to remove it to avoid panics in case of two threads trying to remove it in parallel. PR: kern/116837 Submitted by: Takahiro Kurosawa (takahiro.kurosawa gmail.com) (orig version) MFC after: 10 days	2010-04-11 18:41:31 +00:00
bz	d7a91dc6bf	Plug reference leaks in the link-layer code ("new-arp") that previously prevented the link-layer entry from being freed. In both in.c and in6.c (though that code path seems to be basically dead) plug a reference leak in case of a pending callout being drained. In if_ether.c consistently add a reference before resetting the callout and in case we canceled a pending one remove the reference for that. In the final case in arptimer, before freeing the expired entry, remove the reference again and explicitly call callout_stop() to clear the active flag. In nd6.c:nd6_free() we are only ever called from the callout function and thus need to remove the reference there as well before calling into llentry_free(). In if_llatbl.c when freeing entire tables make sure that in case we cancel a pending callout to remove the reference as well. Reviewed by: qingli (earlier version) MFC after: 10 days Problem observed, patch tested by: simon on ipv6gw.f.o, Christian Kratzer (ck cksoft.de), Evgenii Davidov (dado korolev-net.ru) PR: kern/144564 Configurations still affected: with options FLOWTABLE	2010-04-11 16:04:08 +00:00
bz	d5b3b3f9d9	In if_detach_internal() we cannot hold the af_data lock over the dom_ifdetach() calls as they might sleep for callout_drain(). Do as we do in if_attachdomain1() [r121470] and handle if_afdata_initialized earlier and call dom_ifdetach() unlocked. Discussed with: rwatson MFC after: 10 days	2010-04-11 11:51:44 +00:00
bz	674a87c918	In if_detach_internal() only try to do the detach run if if_attachdomain1() has actually succeeded to initialize and attach. There is a theoretical possibility to drop out early in if_attachdomain1() leaving the array uninitialized if we cannot get the lock. Discussed with: rwatson MFC after: 10 days	2010-04-11 11:49:24 +00:00
jkim	2b1849a3fc	Check the pointer to JIT binary filter before its de-allocation. Submitted by: Alexander Sack (asack at niksun dot com) MFC after: 3 days	2010-03-29 20:24:03 +00:00
rpaulo	e14131f581	Add MCS to the list of media types. Sponsored by: iXsystems, inc.	2010-03-23 13:15:11 +00:00
kmacy	01cb21605b	- boot-time size the ipv4 flowtable and the maximum number of flows - increase flow cleaning frequency and decrease flow caching time when near the flow limit - stop allocating new flows when within 3% of maxflows don't start allocating again until below 12.5% MFC after: 7 days	2010-03-22 23:04:12 +00:00
emaste	41b3806743	Avoid holding the VLAN_LOCK() over the parent interface SIOCGIFMEDIA ioctl call, as it may sleep. Reviewed by: rwatson	2010-03-21 15:00:33 +00:00
bz	102a1f8933	Split eventhandler_register() into an internal part and a wrapper function that provides the allocated and setup eventhandler entry. Add a new wrapper for VIMAGE that allocates extra space to hold the callback function and argument in addition to an extra wrapper function. While the wrapper function goes as normal callback function the argument points to the extra space allocated holding the original func and arg that the wrapper function can then call. Provide an iterator function for the virtual network stack (vnet) that will call the callback function for each network stack. Provide a new set of macros for VNET that in the non-VIMAGE case will just call eventhandler_register() while in the VIMAGE case it will use vimage_eventhandler_register() passing in the extra iterator function but will only register once rather than per-vnet. We need a special macro in case we are interested in the tag returned as we must check for curvnet and can neither simply assign the return value, nor not change it in the non-vnet0 case without that. Sponsored by: ISPsystem Discussed with: jhb Reviewed by: zec (earlier version), jhb MFC after: 1 month	2010-03-19 19:51:03 +00:00
bz	abd6b4ebb7	Add ddb support to the "new" link layer code ("new-arp"): - show all lltables [1] (optional flag to also show the llentries as well) - show lltable <struct lltable > - show llentry <struct llentry > MFC after: 6 days	2010-03-18 09:09:59 +00:00
qingli	4ff4954e4e	Verify interface up status using its link state only if the interface has such capability. The interface capability flag indicates whether such capability exists. This approach is much more backward compatible. Physical device driver changes will be part of another commit. Also updated the ifconfig utility to show the LINKSTATE capability if present. Reviewed by: rwatson, imp, juli MFC after: 3 days	2010-03-16 17:59:12 +00:00
mlaier	d62719cf37	Fix a small bug in drbr_dequeue_cond spotted while preparing MFC of r203834. MFC after: 3 days	2010-03-15 21:15:03 +00:00
kmacy	a5e0110227	flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB pointed out by jkim@	2010-03-12 19:58:51 +00:00
jkim	5e802872af	Fix a style(9) nit.	2010-03-12 19:42:42 +00:00
kmacy	965730af4f	re-update copyright to 2010 pointed out by danfe@	2010-03-12 19:26:45 +00:00
jkim	df5e72589a	Tidy up callout for select(2) and read timeout. - Add a missing callout_drain(9) before the descriptor deallocation.[1] - Prefer callout_init_mtx(9) over callout_init(9) and let the callout subsystem handle the mutex for callout function. PR: kern/144453 Submitted by: Alexander Sack (asack at niksun dot com)[1] MFC after: 1 week	2010-03-12 19:14:58 +00:00
qingli	5c999930df	The flow-table module retrieves the destination and source address as well as the transport protocol port information from the outbound packets. The routing code is generic and compares every byte in the given sockaddr object. Therefore the temporary sockaddr objects must be cleared due to padding bytes. In addition, the port information must be stripped or the route search will either fail or return the incorrect route entry. Unit testing is done using OpenVPN over the if_tun interface. MFC after: 7 days	2010-03-12 10:24:58 +00:00
kmacy	0315e29550	fix stats reporting sysctl	2010-03-12 06:31:19 +00:00
kmacy	128542c758	- restructure flowtable to support ipv6 - add a name argument to flowtable_alloc for printing with ddb commands - extend ddb commands to print destination address or 4-tuples - don't parse ports in ulp header if FL_HASH_ALL is not passed - add kern_flowtable_insert to enable more generic use of flowtable (e.g. system calls for adding entries) - don't hash loopback addresses - cleanup whitespace - keep statistics per-cpu for per-cpu flowtables to avoid cache line contention - add sysctls to accumulate stats and report aggregate MFC after: 7 days	2010-03-12 05:03:26 +00:00
qingli	cde322640a	The if_tap interface is of IFT_ETHERNET type, but it does not set or update the if_link_state variable. As such RT_LINK_IS_UP() fails for the if_tap interface. Also, the RT_LINK_IS_UP() needs to bypass all loopback interfaces because loopback interfaces are considered up logically as long as the system is running. This patch fixes the above issues by setting and updating the if_link_state variable when the tap interface is opened or closed respectively. Similary approach is already done in the if_tun device. MFC after: 3 days	2010-03-11 17:56:46 +00:00
qingli	93013817b0	One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible. The other advantage of ECMP is for failover. The issue with the current code, is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix, the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole. Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed. MFC after: 5 days	2010-03-09 01:11:45 +00:00
delphij	fe5f1f57b8	Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg interface. The check itself seems to be coming from OpenBSD but does not seem to be useful for our code. Discussed with: thomasa MFC after: 1 month	2010-03-09 00:52:16 +00:00
bz	07f7a52d59	Not only flush the ipfw tables when unloading ipfw or tearing down a virtual netowrk stack, but also free the Radix Node Head. Sponsored by: ISPsystem Reviewed by: julian MFC after: 5 days	2010-03-07 15:37:58 +00:00
bz	8ef34a3986	Introduce a function rn_detachhead() that will free the radix table root nodes. This is only needed (and available) in the virtualization case to free the resources when tearing down a virtual network stack. Sponsored by: ISPsystem Reviewed by: julian, zec MFC after: 5 days	2010-03-06 21:27:26 +00:00
bz	3338a0e11e	Rework reference counting in case we queue into the netisr, or overflow the netisr queue and fall back to the interface queue so that we can garuantee that the ifnet pointer stays valid. Formerly we ended up with reference counts <= 0 in case the netisr had returned ENOBUFS. The idea is to track any packet in the netisr queue and only change the refount on edge operations for the fallback interface queue. This also avoids problems in case the if_snd.ifq_len lies to us. Also rework refount assertions to make sure they trigger if we go below 1. Formerly a negative refence count did not trigger the assert as the refcount variable is u_int. Sponsored by: ISPsystem MFC after: 5 days	2010-03-06 21:22:28 +00:00
luigi	5ceeac4aa8	Bring in the most recent version of ipfw and dummynet, developed and tested over the past two months in the ipfw3-head branch. This also happens to be the same code available in the Linux and Windows ports of ipfw and dummynet. The major enhancement is a completely restructured version of dummynet, with support for different packet scheduling algorithms (loadable at runtime), faster queue/pipe lookup, and a much cleaner internal architecture and kernel/userland ABI which simplifies future extensions. In addition to the existing schedulers (FIFO and WF2Q+), we include a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new, very fast version of WF2Q+ called QFQ. Some test code is also present (in sys/netinet/ipfw/test) that lets you build and test schedulers in userland. Also, we have added a compatibility layer that understands requests from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries, and replies correctly (at least, it does its best; sometimes you just cannot tell who sent the request and how to answer). The compatibility layer should make it possible to MFC this code in a relatively short time. Some minor glitches (e.g. handling of ipfw set enable/disable, and a workaround for a bug in RELENG_7's /sbin/ipfw) will be fixed with separate commits. CREDITS: This work has been partly supported by the ONELAB2 project, and mostly developed by Riccardo Panicucci and myself. The code for the qfq scheduler is mostly from Fabio Checconi, and Marta Carbone and Francesco Magno have helped with testing, debugging and some bug fixes.	2010-03-02 17:40:48 +00:00
luigi	1a096d21d2	remove unnecessary casts leftover from a bogus fix to a previous bug	2010-03-02 16:24:16 +00:00
alfred	f34ce3dd38	Merge projects/enhanced_coredumps (r204346) into HEAD: Enhanced process coredump routines. This brings in the following features: 1) Limit number of cores per process via the %I coredump formatter. Example: if corefilename is set to %N.%I.core AND num_cores = 3, then if a process "rpd" cores, then the corefile will be named "rpd.0.core", however if it cores again, then the kernel will generate "rpd.1.core" until we hit the limit of "num_cores". this is useful to get several corefiles, but also prevent filling the machine with corefiles. 2) Encode machine hostname in core dump name via %H. 3) Compress coredumps, useful for embedded platforms with limited space. A sysctl kern.compress_user_cores is made available if turned on. To enable compressed coredumps, the following config options need to be set: options COMPRESS_USER_CORES device zlib # brings in the zlib requirements. device gzio # brings in the kernel vnode gzip output module. 4) Eventhandlers are fired to indicate coredumps in progress. 5) The imgact sv_coredump routine has grown a flag to pass in more state, currently this is used only for passing a flag down to compress the coredump or not. Note that the gzio facility can be used for generic output of gzip'd streams via vnodes. Obtained from: Juniper Networks Reviewed by: kan	2010-03-02 06:58:58 +00:00
joel	bb682915c9	The NetBSD Foundation has granted permission to remove clause 3 and 4 from their software. Obtained from: NetBSD	2010-03-01 17:05:46 +00:00
rwatson	384e695fac	Whitespace tweak. MFC after: 3 days	2010-03-01 00:43:05 +00:00
rwatson	0b098aa759	Changes to support crashdump analysis of netisr: - Rename the netisr protocol registration array, 'np' to 'netisr_proto', in order to reduce the chances of symbol name collisions. It remains statically defined, but it will be looked up by netstat(1). - Move certain internal structure definitions from netisr.c to netisr_internal.h so that netstat(1) can find them. They remain private, and should not be used for any other purpose (for example, they should not be used by kernel modules, which must instead use the public interfaces in netisr.h). - Store a kernel-compiled version of NETISR_MAXPROT in the global variable netisr_maxprot, and export via a sysctl, so that it is available for use by netstat(1). This is especially important for crashdump interpretation, where the size of the workstream structure is determined by the maximum number of protocols compiled into the kernel. MFC after: 1 week Sponsored by: Juniper Networks	2010-03-01 00:42:36 +00:00
kib	f93a8fa5df	In both if_tun and if_tap: Do not do additional dev_ref() on the newly created interface in the if_clone create method [1]. This reference is not needed and never removed, causing struct cdevpriv leakage. Remove the setting of SI_CHEAPCLONE flag as well, since it is unused. For dev_clone handlers, create cdevs with the call make_dev_credf(MAKEDEV_REF) instead of calling make_dev() and then dev_ref(), to avoid a race. Call drain_dev_clone_events() at the module unload time after dev_clone handler is deinstalled. Submitted by: Mikolaj Golub <to.my.trociny gmail com> [1] MFC after: 1 week	2010-02-28 16:25:49 +00:00
rwatson	e0805e87a3	Fix edge cases in several KASSERTs: use <= rather than < when testing that counters have not gone about MAXCPU or NETISR_MAXPROT. These problems caused panics on UP kernels with INVARIANTS when using sysctl -a, but would also have caused problems for 32-core boxes or if the netisr protocol vector was fully populated. Reported by: nwhitehorn, Neel Natu <neelnatu@gmail.com> MFC after: 4 days	2010-02-25 09:51:14 +00:00
bz	44a0d7a588	Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets' in the db_show_all_table as 'show all ifnets' and with that follow the convention for showing complete lists. Submitted by: thompsa MFC after: 3 days	2010-02-24 15:54:24 +00:00
rwatson	79e4f88402	Fix constant assignment for netisr protocol information sysctl. MFC after: 1 week Spotted by: bz	2010-02-22 16:16:16 +00:00
rwatson	ac0cbafef7	Export netisr configuration and statistics to userspace via sysctl(9). MFC after: 1 week Sponsored by: Juniper Networks	2010-02-22 15:03:16 +00:00
rwatson	e1faba5621	ifconfig(8) expects interface fooX to be supported by the module if_foo, and will try to load it if it's not present. To better meet these expectations, change the module name for the loopback interface from 'loop' to 'if_lo'. The loopback interface is always compiled into the base kernel, so there are no resulting changes in kld files, etc. Discussed with: brooks (ages ago) MFC after: 1 week	2010-02-21 15:25:47 +00:00
yongari	55d2bfd121	Add __FBSDID. Reviewed by: sam	2010-02-21 00:07:45 +00:00
yongari	453676a091	Add TSO support on VLANs. Intentionally separated IFCAP_VLAN_HWTSO from IFCAP_VLAN_HWTAGGING. I think some hardwares may be able to TSO over VLAN without VLAN hardware tagging. Driver changes and userland support will follow. Reviewed by: thompsa	2010-02-20 22:47:20 +00:00
bz	27d5a92985	Start to implement ifnet DDB support: - 'show ifnets' prints a list of ifnet s per virtual network stack, - 'show ifnet <struct ifnet >' prints fields matching the given ifp. We do not yet print the complete set of fields and might want to factor this out to an extra if_debug.c file in case this grows a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1]. Sponsored by: ISPsystem Suggested by: rwatson [1] Reviewed by: rwatson MFC after: 5 days	2010-02-20 22:09:48 +00:00
bz	77e8f746fc	Enhance a panic string to contain more useful debugging information. Sponsored by: ISPsystem Reviewed by: rwatson MFC after: 5 days	2010-02-20 21:43:36 +00:00
jkim	30767ce70b	Return partially filled buffer for non-blocking read(2) in non-immediate mode. PR: kern/143855	2010-02-20 00:19:21 +00:00
pjd	12d96a648c	Mark various sysctls also as tunables. Reviewed by: rwatson MFC after: 1 week	2010-02-15 09:19:07 +00:00
mlaier	b056acb864	Fix drbr and altq interaction: - introduce drbr_needs_enqueue that returns whether the interface/br needs an enqueue operation: returns true if altq is enabled or there are already packets in the ring (as we need to maintain packet order) - update all drbr consumers - fix drbr_flush - avoid using the driver queue (IFQ_DRV_*) in the altq case as the multiqueue consumer does not provide enough protection, serialize altq interaction with the main queue lock - make drbr_dequeue_cond work with altq Discussed with: kmacy, yongari, jfv MFC after: 4 weeks	2010-02-13 16:04:58 +00:00
bz	7b5cc4e51b	Add DDB support for printing vnet_sysinit and vnet_sysuninit ordered call lists. Try to lookup function/symbol names and print those in addition to the pointers, along with the constants for subsystem and order. This is useful for debugging vnet teardown ordering issues. Make it possible to call the actual printing frunction from normal code at runtime, ie. from vnet_sysuninit(), if DDB support is there. Sponsored by: ISPsystem MFC After: 8 days	2010-02-09 22:39:34 +00:00
bz	eb28e347bf	Add an SDT provider for "vnet"s along with probes for vnet_alloc and vnet_destroy. Use the line number rather than NULL as dummy argument. Note: the fbt provider does not reliably provide :return probes (depending on optimization levels used at compile time) making it unusable for scripts to generate complete call-traces with well defined boundaries over allocations or destructions of virtual network stacks. Sponsored by: ISPsystem MFC After: 8 days	2010-02-09 22:15:59 +00:00
eri	3c38fdad1e	Propagate the vlan eventis to the underlying interfaces/members so they can do initialization of hw related features. PR: kern/141646 Reviewed by: thompsa Approved by: thompsa(co-mentor) MFC after: 2 weeks	2010-02-06 13:49:35 +00:00
zec	393450ca7b	Instead of spamming the console on each curvnet recursion event, print out each such call graph only once, along with a stack backtrace. This should make kernels built with VNET_DEBUG reasonably usable again in busy / production environments. Introduce a new DDB command "show vnetrcrs" which dumps the whole log of distinctive curvnet recursion events. This might be useful when recursion reports get burried / lost too deep in the message buffer. In the later case stack backtraces are not available. Reviewed by: bz MFC after: 3 days	2010-02-04 07:55:42 +00:00
hrs	0f402c2331	- Check if_type of "addm <interface>" before setting the interface's MTU to the if_bridge(4) interface. This fixes a bug that MTU value of "addm <interface>" is used even when it is invalid for the if_bridge(4) member: # ifconfig bridge0 create # ifconfig bridge0 bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500 ... # ifconfig bridge0 addm lo0 ifconfig: BRDGADD lo0: Invalid argument # ifconfig bridge0 bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 16384 ... - Do not ignore MTU value of an interface even when if_type == IFT_GIF. This fixes MTU mismatch when an if_bridge(4) interface has a gif(4) interface and no other interface as the member, and it is directly used for L2 communication with EtherIP tunneling enabled. - Implement SIOCSIFMTU ioctl. Changing the MTU is allowed only when all members have the same MTU value.	2010-01-31 08:16:37 +00:00
delphij	d9a0cd0982	Revised revision 199201 (add interface description capability as inspired by OpenBSD), based on comments from many, including rwatson, jhb, brooks and others. Sponsored by: iXsystems, Inc. MFC after: 1 month	2010-01-27 00:30:07 +00:00
syrinx	40d92428fb	While flushing the multicast filter of an interface, do not zero the relevant ifmultiaddr structures' reference to the parent interface, unless the parent interface is really detaching. While here, program only link layer multicast filters to a wlan's hardware parent interface. PR: kern/142391, kern/142392 Reviewed by: sam, rpaolo, bms MFC after: 1 week	2010-01-24 16:17:58 +00:00
thompsa	d0093d69aa	Do not hold the lock over if_setlladdr() as it calls into the interface driver init routine.	2010-01-19 04:29:42 +00:00
thompsa	5056e27c2d	Declare a new EVENTHANDLER called iflladdr_event which signals that the L2 address on an interface has changed. This lets stacked interfaces such as vlan(4) detect that their lower interface has changed and adjust things in order to keep working. Previously this situation broke at least vlan(4) and lagg(4) configurations. The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the risk of a loop. PR: kern/142927 Submitted by: Nikolay Denev	2010-01-18 20:34:00 +00:00
bz	8458b2914e	Correct a typo. MFC after: 5 days	2010-01-10 12:03:53 +00:00
trasz	2ced506c4b	Stop GCC from complaining about lagg_port_checkstacking() being unused.	2010-01-08 16:44:33 +00:00
mbr	7450f52a57	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week	2010-01-07 21:01:37 +00:00
luigi	f1fcae96ad	put ip_var before ip_fw_private.h as this will be needed in the near future	2010-01-07 10:27:52 +00:00
luigi	40024ff7c3	Various cleanup done in ipfw3-head branch including: - use a uniform mtag format for all packets that exit and re-enter the firewall in the middle of a rulechain. On reentry, all tags containing reinject info are renamed to MTAG_IPFW_RULE so the processing is simpler. - make ipfw and dummynet use ip_len and ip_off in network format everywhere. Conversion is done only once instead of tracking the format in every place. - use a macro FREE_PKT to dispose of mbufs. This eases portability. On passing i also removed a few typos, staticise or localise variables, remove useless declarations and other minor things. Overall the code shrinks a bit and is hopefully more readable. I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr. For ng_ipfw i am actually waiting for feedback from glebius@ because we might have some small changes to make. For if_bridge and if_ethersubr feedback would be welcome (there are still some redundant parts in these two modules that I would like to remove, but first i need to check functionality).	2010-01-04 19:01:22 +00:00
jhb	26cb0488b3	Use stricter checking to match possible vlan clones by not allowing extra garbage characters around or within the tag. Reviewed by: brooks MFC after: 3 days	2009-12-31 20:44:38 +00:00
brooks	a5cc24440b	The devices that supported EVFILT_NETDEV kqueue filters were removed in r195175. Remove all definitions, documentation, and usage. fifo_misc.c: Remove all kqueue tests as fifo_io.c performs all those that would have remained. Reviewed by: rwatson MFC after: 3 weeks X-MFC note: don't change vlan_link_state() function signature	2009-12-31 20:29:58 +00:00
qingli	c44d3e680a	Remove a deleted comment line that was brought back by my previous commit. MFC after: 5 days	2009-12-31 01:09:16 +00:00
qingli	ed965a92bc	The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was due to the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations due to multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days	2009-12-30 21:35:34 +00:00
jhb	3ce93dcb7c	Change vlan interfaces to cope more usefully with the parent interface being renamed. Previously the vlan interfaces would lose their configuration as if the parent interface had been physically removed. Now vlan interfaces ignore rename events. - Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being renamed. This flag can be checked in ifnet departure/arrival event handlers to treat rename events differently. - Change the ifnet departure event handler in the if_vlan(4) driver to ignore departure events due to a trunk interface being renamed. Reviewed by: brooks, rwatson MFC after: 1 week	2009-12-29 13:35:18 +00:00
luigi	483862a5a2	bring in several cleanups tested in ipfw3-head branch, namely: r201011 - move most of ng_ipfw.h into ip_fw_private.h, as this code is ipfw-specific. This removes a dependency on ng_ipfw.h from some files. - move many equivalent definitions of direction (IN, OUT) for reinjected packets into ip_fw_private.h - document the structure of the packet tags used for dummynet and netgraph; r201049 - merge some common code to attach/detach hooks into a single function. r201055 - remove some duplicated code in ip_fw_pfil. The input and output processing uses almost exactly the same code so there is no need to use two separate hooks. ip_fw_pfil.o goes from 2096 to 1382 bytes of .text r201057 (see the svn log for full details) - macros to make the conversion of ip_len and ip_off between host and network format more explicit r201113 (the remaining parts) - readability fixes -- put braces around some large for() blocks, localize variables so the compiler does not think they are uninitialized, do not insist on precise allocation size if we have more than we need. r201119 - when doing a lookup, keys must be in big endian format because this is what the radix code expects (this fixes a bug in the recently-introduced 'lookup' option) No ABI changes in this commit. MFC after: 1 week	2009-12-28 10:47:04 +00:00
rwatson	c7656477f0	When warning about possible netisr configuration problems during boot, report using "netisr_init" rather than "netisr2", which was the development name for the project. MFC after: 3 days	2009-12-23 12:33:59 +00:00
rwatson	0b860a6956	Refine netisr.c comments a bit.	2009-12-23 12:31:27 +00:00
luigi	2043aec456	merge code from ipfw3-head to reduce contention on the ipfw lock and remove all O(N) sequences from kernel critical sections in ipfw. In detail: 1. introduce a IPFW_UH_LOCK to arbitrate requests from the upper half of the kernel. Some things, such as 'ipfw show', can be done holding this lock in read mode, whereas insert and delete require IPFW_UH_WLOCK. 2. introduce a mapping structure to keep rules together. This replaces the 'next' chain currently used in ipfw rules. At the moment the map is a simple array (sorted by rule number and then rule_id), so we can find a rule quickly instead of having to scan the list. This reduces many expensive lookups from O(N) to O(log N). 3. when an expensive operation (such as insert or delete) is done by userland, we grab IPFW_UH_WLOCK, create a new copy of the map without blocking the bottom half of the kernel, then acquire IPFW_WLOCK and quickly update pointers to the map and related info. After dropping IPFW_LOCK we can then continue the cleanup protected by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side is only blocked for O(1). 4. do not pass pointers to rules through dummynet, netgraph, divert etc, but rather pass a <slot, chain_id, rulenum, rule_id> tuple. We validate the slot index (in the array of #2) with chain_id, and if successful do a O(1) dereference; otherwise, we can find the rule in O(log N) through <rulenum, rule_id> All the above does not change the userland/kernel ABI, though there are some disgusting casts between pointers and uint32_t Operation costs now are as follows: Function Old Now Planned ------------------------------------------------------------------- + skipto X, non cached O(N) O(log N) + skipto X, cached O(1) O(1) XXX dynamic rule lookup O(1) O(log N) O(1) + skipto tablearg O(N) O(1) + reinject, non cached O(N) O(log N) + reinject, cached O(1) O(1) + kernel blocked during setsockopt() O(N) O(1) ------------------------------------------------------------------- The only (very small) regression is on dynamic rule lookup and this will be fixed in a day or two, without changing the userland/kernel ABI Supported by: Valeria Paoli MFC after: 1 month	2009-12-22 19:01:47 +00:00
jhb	70c567a753	Remove commented out prototype for ifinit(). This prototype has been commented out since 1.1 and has not been present in <sys/systm.h> since at least 1.1 of that file. It is also not needed in FreeBSD due to SYSINIT().	2009-12-21 20:09:19 +00:00
luigi	c4e6c7a490	Start splitting ip_fw2.c and ip_fw.h into smaller components. At this time we pull out from ip_fw2.c the logging functions, and support for dynamic rules, and move kernel-only stuff into netinet/ipfw/ip_fw_private.h No ABI change involved in this commit, unless I made some mistake. ip_fw.h has changed, though not in the userland-visible part. Files touched by this commit: conf/files now references the two new source files netinet/ip_fw.h remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h. netinet/ipfw/ip_fw_private.h new file with kernel-specific ipfw definitions netinet/ipfw/ip_fw_log.c ipfw_log and related functions netinet/ipfw/ip_fw_dynamic.c code related to dynamic rules netinet/ipfw/ip_fw2.c removed the pieces that goes in the new files netinet/ipfw/ip_fw_nat.c minor rearrangement to remove LOOKUP_NAT from the main headers. This require a new function pointer. A bunch of other kernel files that included netinet/ip_fw.h now require netinet/ipfw/ip_fw_private.h as well. Not 100% sure i caught all of them. MFC after: 1 month	2009-12-15 16:15:14 +00:00
luigi	003a092f8a	Move the scan for max_keylen into route.c::route_init(), and make max_keylen an argument for rn_init(). This removes an unnecessary dependency on domain.h from radix.c MFC after: 7 days	2009-12-14 20:12:51 +00:00
bz	932cbdbe4d	Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error. Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack. We cannot change the classic jailed() call to do that, as it is used outside the network stack as well. Discussed with: julian, zec, jamie, rwatson (back in Sept) MFC after: 5 days	2009-12-13 13:57:32 +00:00
luigi	29b041fe97	Make the code buildable in userland so it is easier to test it: this requires a small reordering of headers and a few #defines to map functions not available in userland. Remove a useless #ifndef block at the beginning of the file. Introduce (temporarily) rn_init2(), see the comment in the code for the proper long term change. No ABI or functional change. MFC after: 7 days	2009-12-12 15:49:28 +00:00
luigi	54e485de2b	No functional changes (who dares to touch this code!) but: - cast the result of LEN() to int as this is the main usage. - use LEN() in one place where it was forgotten. - Document the use of a static variable in rw mode. More small changes to follow. MFC after: 7 days	2009-12-10 10:34:30 +00:00
jhb	1f48c677b5	Remove if_timer/if_watchdog now that they are no longer used. The space used by if_timer is reserved for expanding if_index to an int in the future. Reviewed by: rwatson, brooks	2009-11-30 21:25:57 +00:00
jkim	2d93b6e424	General style cleanup, no functional change.	2009-11-20 21:12:40 +00:00
jkim	052bc52af7	- Allocate scratch memory on stack instead of pre-allocating it with the filter as we do from bpf_filter()[1]. - Revert experimental use of contigmalloc(9)/contigfree(9). It has no performance benefit over malloc(9)/free(9)[2]. Requested by: rwatson[1] Pointed out by: rwatson, jhb, alc[2]	2009-11-20 18:49:20 +00:00
jkim	8e75a7257b	- Change internal function bpf_jit_compile() to return allocated size of the generated binary and remove page size limitation for userland. - Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make sure the generated binary aligns properly and make it physically contiguous.	2009-11-18 23:40:19 +00:00
jkim	efc247aeb3	- Make BPF JIT compiler working again in userland. We are limiting size of generated native binary to page size for now. - Update copyright date and fix some style nits.	2009-11-18 19:26:17 +00:00
tuexen	7b17948ff8	Fix a LOR showing up with sctp_bsd_addr(): Do not hold a rt lock when calling rt_newaddrmsg(). Reviewed by: qingli Approved by: rrs (mentor) MFC after: 1 month	2009-11-17 12:57:10 +00:00
delphij	8fed657163	Revert revision 199201 for now as it has introduced a kernel vulnerability and requires more polishing.	2009-11-12 19:02:10 +00:00
delphij	13a19ef806	Add interface description capability as inspired by OpenBSD. MFC after: 3 months	2009-11-11 21:30:58 +00:00
jhb	892fe839c4	Take a step towards removing if_watchdog/if_timer. Don't explicitly set if_watchdog/if_timer to NULL/0 when initializing an ifnet. if_alloc() sets those members to NULL/0 already.	2009-11-06 14:55:01 +00:00
rwatson	07e1c4bd14	Remove unneeded blank line from bpf_drvinit(). MFC after: 3 days	2009-10-23 17:26:29 +00:00
brueffer	245a0bbb51	Check pointer for NULL before dereferencing it, not after. PR: 138390 Submitted by: Patroklos Argyroudis <argp@census-labs.com> MFC after: 1 week	2009-10-22 06:17:04 +00:00
qingli	e4c3421f52	Verify "smp_started" is true before calling sched_bind() and sched_unbind(). Reviewed by: kmacy MFC after: 3 days	2009-10-22 00:32:01 +00:00
qingli	88bb68ef0f	The flow-table function flowtable_route_flush() may be called during system initialization time. Since the flow-table is designed to maintain per CPU flow cache, the existing code did not check whether "smp_started" is true before calling sched_bind() and sched_unbind(), which triggers a page fault. Reviewed by: jeff MFC after: immediately	2009-10-20 21:27:03 +00:00
rwatson	d42d258f85	Clean up comments, white space, and style in pfil.c (especially new VNET bits). MFC after: 3 days (not VNET bits)	2009-10-19 15:19:14 +00:00
rwatson	dc93fde3ad	Remove unused pfil_flags field in packet_filter_hook. MFC after: 3 days	2009-10-18 22:54:09 +00:00
rwatson	c564f117b3	Sort function prototypes in pfil.h, clean up white space, and better align fields for printing. MFC after: 3 days	2009-10-18 22:43:28 +00:00
rwatson	066984a67b	Line-wrap pfil.c so that it prints more nicely. MFC after: 3 days	2009-10-18 11:27:34 +00:00
bz	b0448eafb0	Unbreak the VIMAGE build with IPSEC, broken with r197952 by virtualizing the pfil hooks. For consistency add the V_ to virtualize the pfil hooks in here as well. MFC after: 55 days X-MFC after: julian MFCed r197952.	2009-10-14 11:55:55 +00:00
julian	79c1f884ef	Virtualize the pfil hooks so that different jails may chose different packet filters. ALso allows ipfw to be enabled on on ejail and disabled on another. In 8.0 it's a global setting. Sitting aroung in tree waiting to commit for: 2 months MFC after: 2 months	2009-10-11 05:59:43 +00:00
bz	1e3cae3b31	Put #ifdef INET around parts of the FLOWTABLE code, to unbreak nooptions INET kernel builds. MFC after: 3 days X-MFC: with r197687	2009-10-03 10:56:03 +00:00
qingli	42eac0e4cd	The flow-table associates TCP/UDP flows and IP destinations with specific routes. When the routing table changes, for example, when a new route with a more specific prefix is inserted into the routing table, the flow-table is not updated to reflect that change. As such existing connections cannot take advantage of the new path. In some cases the path is broken. This patch will update the affected flow-table entries when a more specific route is added. The route entry is properly marked when a route is deleted from the table. In this case, when the flow-table performs a search, the stale entry is updated automatically. Therefore this patch is not necessary for route deletion. Submitted by: simon, phk Reviewed by: bz, kmacy MFC after: 3 days	2009-10-01 20:32:29 +00:00
qingli	85d603eeff	A wrong variable is used when setting up the interface address route, which broke source address selection in some code paths. Submitted by: noted by bz Reviewed by: hrs MFC after: immediately	2009-09-20 17:22:19 +00:00
zec	58bb001f4b	Style fix - break too long a line in two. Spotted by: bz MFC after: 3 days	2009-09-18 09:03:23 +00:00
zec	02353f41db	V_irtualize the lltables list, making ARP and ND reasonably usable again with options VIMAGE kernels. Submitted by: bz (the original version, probably identical to this one) Reviewed by: many @ DevSummit Cambridge MFC after: 3 days	2009-09-17 14:52:15 +00:00
qingli	3a82e44273	Self pointing routes are installed for configured interface addresses and address aliases. After an interface is brought down and brought back up again, those self pointing routes disappeared. This patch ensures after an interface is brought back up, the loopback routes are reinstalled properly. Reviewed by: bz MFC after: immediately	2009-09-15 19:18:34 +00:00
rwatson	53eaed07bc	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks	2009-09-12 20:03:45 +00:00
emaste	e750f5d891	Compare pointer with NULL, not 0.	2009-09-09 03:36:43 +00:00
np	ba75578c03	Add arp_update_event. This replaces route_arp_update_event, which has not worked since the arp-v2 rewrite. The event handler will be called with the llentry write-locked and can examine la_flags to determine whether the entry is being added or removed. Reviewed by: gnn, kmacy Approved by: gnn (mentor) MFC after: 1 month	2009-09-08 21:17:17 +00:00
qingli	5b9cf14b54	The addresses that are assigned to the loopback interface should be part of the kernel routing table. Reviewed by: bz MFC after: immediately	2009-09-05 20:24:37 +00:00
qingli	0cca60c70d	This patch fixes the following issues: - Interface link-local address is not reachable within the node that owns the interface, this is due to the mismatch in address scope as the result of the installed interface address loopback route. Therefore for each interface address loopback route, the rt_gateway field (of AF_LINK type) will be used to track which interface a given address belongs to. This will aid the address source to use the proper interface for address scope/zone validation. - The loopback address is not reachable. The root cause is the same as the above. - Empty nd6 entries are created for the IPv6 loopback addresses only for validation reason. Doing so will eliminate as much of the special case (loopback addresses) handling code as possible, however, these empty nd6 entries should not be returned to the userland applications such as the "ndp" command. Since both of the above issues contain common files, these files are committed together. Reviewed by: bz MFC after: immediately	2009-09-05 16:43:16 +00:00
gnn	60eb51e7a3	Add ARP statistics to the kernel and netstat. New counters now exist for: requests sent replies sent requests received replies received packets received total packets dropped due to no ARP entry entrys timed out Duplicate IPs seen The new statistics are seen in the netstat command when it is given the -s command line switch. MFC after: 2 weeks In collaboration with: bz	2009-09-03 21:10:57 +00:00
qingli	04fc72216c	As part of r196609, a call to "rtalloc" did not take the fib into account. So call the appropriate "rtalloc_ign_fib()" instead of calling "rtalloc_ign()". Reviewed by:i pointed out by bz MFC after: immediately	2009-08-31 00:14:37 +00:00
zec	9940277b0b	Introduce a separate sx lock for protecting lists of vnet sysinit and sysuninit handlers. Previously, sx_vnet, which is a lock designated for protecting the vnet list, was (ab)used for protecting vnet sysinit / sysuninit handler lists as well. Holding exclusively the sx_vnet lock while invoking sysinit and / or sysuninit handlers turned out to be problematic, since some of the handlers may attempt to wake up another thread and wait for it to walk over the vnet list, hence acquire a shared lock on sx_vnet, which in turn leads to a deadlock. Protecting vnet sysinit / sysuninit lists with a separate lock mitigates this issue, which was first observed with flowtable_flush() / flowtable_cleaner() in sys/net/flowtable.c. Reviewed by: rwatson, jhb MFC after: 3 days	2009-08-28 22:30:55 +00:00
qingli	f92dfc8485	In ip_output(), the flow-table module must not try to cache L2/L3 information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type. Since the L2 information (rt_lle) is invalid for these interface types, accidental caching attempt will trigger panic when the invalid rt_lle reference is accessed. When installing a new route, or when updating an existing route, the user supplied gateway address may be an interface address (this is particularly true for point-to-point interface related modules such as ppp, if_tun, if_gif). Currently the routing command handler always set the RTF_GATEWAY flag if the gateway address is given as part of the command paramters. Therefore the gateway address must be verified against interface addresses or else the route would be treated as an indirect route, thus making that route unusable. Reviewed by: kmacy, julia, rwatson Verified by: marcus MFC after: 3 days	2009-08-28 07:01:09 +00:00
rwatson	cffae4081c	Add IFNET_HOLD reserved pointer value for the ifindex ifnet array, which allows an index to be reserved for an ifnet without making the ifnet available for management operations. Use this in if_alloc() while the ifnet lock is released between initial index allocation and completion of ifnet initialization. Add ifindex_free() to centralize the implementation of releasing an ifindex value. Use in if_free() and if_vmove(), as well as when releasing a held index in if_alloc(). Reviewed by: bz MFC after: 3 days	2009-08-26 11:13:10 +00:00
rwatson	8184e58b3a	Break out allocation of new ifindex values from if_alloc() and if_vmove(), and centralize in a single function ifindex_alloc(). Assert the IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not close all known races in this code. Reviewed by: bz MFC after: 3 days	2009-08-25 20:21:16 +00:00
rwatson	544dfa0789	Use locks specific to the lltable code, rather than borrow the ifnet list/index locks, to protect link layer address tables. This avoids lock order issues during interface teardown, but maintains the bug that sysctl copy routines may be called while a non-sleepable lock is held. Reviewed by: bz, kmacy MFC after: 3 days	2009-08-25 09:52:38 +00:00
jfv	91fcf01735	When bridging LRO is causing a problem, the believe that it would work as long as all interfaces have TSO seems to be false, until the matter gets sorted out just disable LRO completely.	2009-08-24 21:04:51 +00:00
rwatson	260dfcf9e9	Make if_grow static -- it's not used outside of if.c, and with the internals destined to change, it's better if it remains that way. MFC after: 3 days	2009-08-24 12:52:05 +00:00
zec	47445e571b	When moving ifnets from one vnet to another, and the ifnet has ifaddresses of AF_LINK type which thus have an embedded if_index "backpointer", we must update that if_index backpointer to reflect the new if_index that our ifnet just got assigned. This change affects only options VIMAGE builds. Submitted by: bz Reviewed by: bz Approved by: re (rwatson), julian (mentor)	2009-08-24 10:14:09 +00:00
rwatson	f0ee7a159b	Rather than using IFNET_RLOCK() when iterating over (and modifying) the ifnet list during if_ef load, directly acquire the ifnet_sxlock exclusively. That way when if_alloc() recurses the lock, it's a write recursion rather than a read->write recursion. This code structure is arguably a bug, so add a comment indicating that this is the case. Post-8.0, we should fix this, but this commit resolves panic-on-load for if_ef. Discussed with: bz, julian Reported by: phk MFC after: 3 days	2009-08-23 21:00:21 +00:00
rwatson	ef8d755d4d	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stablize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
julian	009dd32306	Don't allow access to the internals until it has all been set up. Specifically, not until the per-vnet parts have been set up. Submitted by: kmacy@ Reviewed by: julian@, zec@ Approved by: re(rwatson) MFC after: immediately	2009-08-21 09:22:32 +00:00
kmacy	e10b9149e4	This change fixes a comment and addresses a complaint by kib@ by moving a frequently executed flowtable syslog statement from being conditional on bootverbose to conditional on a per-vnet flowtable sysctl. Approved by: re@	2009-08-19 20:13:09 +00:00
kmacy	bbd03fa206	- change the interface to flowtable_lookup so that we don't rely on the mbuf for obtaining the fib index - check that a cached flow corresponds to the same fib index as the packet for which we are doing the lookup - at interface detach time flush any flows referencing stale rtentrys associated with the interface that is going away (fixes reported panics) - reduce the time between cleans in case the cleaner is running at the time the eventhandler is called and the wakeup is missed less time will elapse before the eventhandler returns - separate per-vnet initialization from global initialization (pointed out by jeli@) Reviewed by: sam@ Approved by: re@	2009-08-18 20:28:58 +00:00
kmacy	2836450c4c	fix netboot issue by disabling flowtable lookups until initialization has been run Reviewed by: rwatson@ Approved by: re@	2009-08-17 19:09:28 +00:00
rwatson	e905082a5a	Remove unused if_rawoutput() macro; it has been unused since at least FreeBSD 2. Approved by: re (kib)	2009-08-15 22:26:26 +00:00
zec	51ca260850	Appease VNET_DEBUG - in if_vmove we temporarily switch i.e. recurse from one vnet to another which is OK, so no need to flood the console with warnings here. Approved by: re (rwatson), julian (mentor)	2009-08-14 22:46:45 +00:00
zec	b464f1dafc	Make VNET_DEBUG a standalone compile-time option, i.e. decouple it from INVARIANTS. Reviewed by: bz Approved by: re (rwatson), julian (mentor)	2009-08-14 22:41:39 +00:00
bz	5307a46b8b	Make it possible to change the vnet sysctl variables on jails with their own virtual network stack. Jails only inheriting a network stack cannot change anything that cannot be changed from within a prison. Reviewed by: rwatson, zec Approved by: re (kib)	2009-08-13 10:26:34 +00:00
bz	b6a41509df	Put multiple instructions into a block when iterating; unbreaks NET_RT_DUMP, which otherwise only returned information of AF_MAX. This was broken in r193232 (save your time - my bug, my fix). PR: kern/137700 Reported by: Larry Baird (lab gta.com) Tested by: Larry Baird (lab gta.com) Reviewed by: zec, lstewart, qing Approved by: re (kib)	2009-08-13 09:29:52 +00:00
jkim	cda13d6d86	Always embed pointer to BPF JIT function in BPF descriptor to avoid inconsistency when opt_bpf.h is not included. Reviewed by: rwatson Approved by: re (rwatson)	2009-08-12 17:28:53 +00:00
bz	a5f0cffcb4	Update DDB show vnet command to print all used and available information. Reviewed by: rwatson, zec Approved by: re	2009-08-12 12:00:21 +00:00
bz	d74deee9a1	Put minimum alignment on the dpcpu and vnet section so that ld when adding the __start_ symbol knows the expected section alignment and can place the __start_ symbol correctly. These sections will not support symbols with super-cache line alignment requirements. For full details, see posting to freebsd-current, 2009-08-10, Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>. Debugging and testing patches by: Kamigishi Rei (spambox haruhiism.net), np, lstewart, jhb, kib, rwatson Tested by: Kamigishi Rei, lstewart Reviewed by: kib Approved by: re	2009-08-12 10:26:03 +00:00
rwatson	5c6699ad3d	Many network stack subsystems use a single global data structure to hold all pertinent statatistics for the subsystem. These structures are sometimes "borrowed" by kernel modules that require a place to store statistics for similar events. Add KPI accessor functions for statistics structures referenced by kernel modules so that they no longer encode certain specifics of how the data structures are named and stored. This change is intended to make it easier to move to per-CPU network stats following 8.0-RELEASE. The following modules are affected by this change: if_bridge if_cxgb if_gif ip_mroute ipdivert pf In practice, most of these statistics consumers should, in fact, maintain their own statistics data structures rather than borrowing structures from the base network stack. However, that change is too agressive for this point in the release cycle. Reviewed by: bz Approved by: re (kib)	2009-08-02 19:43:32 +00:00
rwatson	2eafab39fe	The colour was red as shall be the letters of this warning to people upon boot if the experimental VIMAGE feature was compiled into the kernel. Submitted by: bz Reviewed by: zec Approved by: re (vimage blanket)	2009-08-01 22:22:45 +00:00
rwatson	3b267ddfaa	Minor style tweaks. Approved by: re (vimage blanket)	2009-08-01 21:58:32 +00:00
rwatson	648ff24430	Make the vnet alloc/destroy paths a bit easier to followg by merging vnet_data_init/vnet_data_destroy into vnet_alloc/vnet_destroy. Reviewed by: bz, zec Approved by: re (vimage blanket)	2009-08-01 21:54:15 +00:00
rwatson	e3917e768d	Remove vnet_foreach() utility function, which previously allowed vnet.c to iterate virtual network stacks without being aware of the implementation details previously hidden in kern_vimage.c. Now they are in the same file, so remove this added complexity. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 20:24:45 +00:00
rwatson	fb9ffed650	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
rwatson	466a4af8b2	Reorder and recomment vnet.c and vnet.h on the basis that they are no longer solely about the virtual network stack memory allocator. Approved by: re (vimage blanket)	2009-07-30 12:41:19 +00:00
rwatson	c387c55113	Revise header comments for vnet.h as we now implement VNET_SYSINIT, not just VNET_DEFINE in vnet.h. Approved by: re (vimage blanket)	2009-07-28 22:17:34 +00:00
qingli	4092d532fe	The new flow table caches both the routing table entry as well as the L2 information. For an indirect route the cached L2 entry contains the MAC address of the gateway. Typically the default route is used to transmit multicast packets when explicit multicast routes are not available. The ether_output() function bypasses L2 resolution function if it verifies the L2 cache is valid, because the cached L2 address (a unicast MAC address) is copied into the packets as the destination MAC address. This validation, however, does not apply to broadcast and multicast packets because the destination MAC address is mapped according to a standard method instead. Submitted by: Xin Li Reviewed by: bz Approved by: re	2009-07-28 17:16:54 +00:00
qingli	8c1899d934	This patch does the following: - Allow loopback route to be installed for address assigned to interface of IFF_POINTOPOINT type. - Install loopback route for an IPv4 interface addreess when the "useloopback" sysctl variable is enabled. Similarly, install loopback route for an IPv6 interface address when the sysctl variable "nd6_useloopback" is enabled. Deleting loopback routes for interface addresses is unconditional in case these sysctl variables were disabled after an interface address has been assigned. Reviewed by: bz Approved by: re	2009-07-27 17:08:06 +00:00
bz	83f1495433	Update epair(4) to the new netisr implementation and polish things a bit: - use dpcpu data to track the ifps with packets queued up, - per-cpu locking and driver flags - along with .nh_drainedcpu and NETISR_POLICY_CPU. - Put the mbufs in flight reference count, preventing interfaces from going away, under INVARIANTS as this is a general problem of the stack and should be solved in if.c/netisr but still good to verify the internal queuing logic. - Permit changing the MTU to virtually everythinkg like we do for loopback. Hook epair(4) up to the build. Approved by: re (kib)	2009-07-26 12:20:07 +00:00
bz	3aec900b26	Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctls (ifconfig ifN (-)vnet <jname\|jid>) work correctly. Move vi_if_move to if.c and split it up into two functions(), one for each ioctl. In the reclaim case, correctly set the vnet before calling if_vmove. Instead of silently allowing a move of an interface from the current vnet to the current vnet, return an error. () There is some duplicate interface name checking before actually moving the interface between network stacks without locking and thus race prone. Ideally if_vmove will correctly and automagically handle these in the future. Suggested by: rwatson (*) Approved by: re (kib)	2009-07-26 11:29:26 +00:00
rwatson	b3be1c6e3b	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
bz	1f4b104d4d	sysctl_msec_to_ticks is used with both virtualized and non-vrtiualized sysctls so we cannot used one common function. Add a macro to convert the arg1 in the virtualized case to vnet.h to not expose the maths to all over the code. Add a wrapper for the single virtualized call, properly handling arg1 and call the default implementation from there. Convert the two over places to use the new macro. Reviewed by: rwatson Approved by: re (kib)	2009-07-21 21:58:55 +00:00
rwatson	d32e9e9901	Garbage collect vnet module registrations that have neither constructors nor destructors, as there's no actual work to do. In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added. Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 13:55:33 +00:00
rwatson	fb3be5ae64	Add macros VNET_SETNAME and VNET_SYMPREFIX, and expose to userspace if _WANT_VNET is defined. This way we don't need separate definitions in libkvm. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 07:50:50 +00:00
rwatson	80ed051e0c	Normalize field naming for struct vnet, fix two debugging printfs that print them. Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-19 17:40:45 +00:00
rwatson	6955067932	Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep locks with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)	2009-07-19 14:20:53 +00:00
jamie	9f81cbd9ec	Remove the interim vimage containers, struct vimage and struct procg, and the ioctl-based interface that supported them. Approved by: re (kib), bz (mentor)	2009-07-17 14:48:21 +00:00
rwatson	88f8de4d40	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
rwatson	bc565b5b97	Add missing license line for vnet.h, correct white space nit. Approved by: re (kensmith) (implicit)	2009-07-15 00:56:15 +00:00
rwatson	57ca4583e7	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
kmacy	ccb66bdf1d	Re-factoring for adding weighted routes introduced a fairly irritating bug where the system will panic when RADIX_MPATH is enabled. This change fixes this. Approved by: re@	2009-07-11 21:56:23 +00:00
rpaulo	8424d74020	Implementation of the upcoming Wireless Mesh standard, 802.11s, on the net80211 wireless stack. This work is based on the March 2009 D3.0 draft standard. This standard is expected to become final next year. This includes two main net80211 modules, ieee80211_mesh.c which deals with peer link management, link metric calculation, routing table control and mesh configuration and ieee80211_hwmp.c which deals with the actually routing process on the mesh network. HWMP is the mandatory routing protocol on by the mesh standard, but others, such as RA-OLSR, can be implemented. Authentication and encryption are not implemented. There are several scripts under tools/tools/net80211/scripts that can be used to test different mesh network topologies and they also teach you how to setup a mesh vap (for the impatient: ifconfig wlan0 create wlandev ... wlanmode mesh). A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled by default on GENERIC kernels for i386, amd64, sparc64 and pc98. Drivers that support mesh networks right now are: ath, ral and mwl. More information at: http://wiki.freebsd.org/WifiMesh Please note that this work is experimental. Also, please note that bridging a mesh vap with another network interface is not yet supported. Many thanks to the FreeBSD Foundation for sponsoring this project and to Sam Leffler for his support. Also, I would like to thank Gateworks Corporation for sending me a Cambria board which was used during the development of this project. Reviewed by: sam Approved by: re (kensmith) Obtained from: projects/mesh11s	2009-07-11 15:02:45 +00:00
bz	f236086095	In case we cannot queue a packet reaching the queue limit, retain the semantics netisr_queue() always had and free the mbuf along with returning the error. Reviewed by: rwatson Approved by: re (kensmith)	2009-06-30 05:21:00 +00:00
brooks	0cabaf8791	Remove support for the /dev/net/* per-interface devices. They serve little purpose and are unused in the base system. The IOCTL functionality is entirely duplicated and routing sockets provide a richer interface than the kqueue functionality. Further, it is not practical for these devices to be made sensible in the face of VIMAGE. Bump __FreeBSD_version on the off chance that there is any code out there that actually uses this stuff. Reviewed by: rwatson Discussed with: bz, zec Approved by: re@ (kensmith)	2009-06-29 19:46:29 +00:00
rwatson	5cfc09d074	Remove unnecessary include of kdb.h that snuck in during ifaddr refcount work. Reported by: pluknet <pluknet at gmail.com> Approved by: re (kib)	2009-06-27 10:30:28 +00:00

... 3 4 5 6 7 ...

2938 Commits