freebsd-dev

Author	SHA1	Message	Date
Bjoern A. Zeeb	3d07127c64	When adding IPv6 fwd support to ipfw in r225044 these two files were not committed. Initialize next_hop6 to align with the IPv4 code. PR: bin/117214 MFC after: 3 weeks X-MFC with: r225044 Approved by: re (kib)	2011-08-27 08:49:55 +00:00
Attilio Rao	6aba400a70	Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#. That happens because the selinfo interface has no way to drain the waiters before destroying the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object. Fix this by adding the seldrain() routine which should be called before destroying the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which install selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the file descriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements a full correct logic for preventing this from happening. Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks	2011-08-25 15:51:54 +00:00
Qing Li	fc96aabef1	When the RADIX_MPATH kernel option is enabled, the RADIX_MPATH code tries to find the first route node of an ECMP chain before executing the route command. If the system has a default route, and the specific route argument to the command does not exist in the routing table, then the default route would be reached. The current code does not verify that the reached node matches the given route argument, and therefore erroneously removed the entry. This patch fixes that bug. Approved by: re MFC after: 3 days	2011-08-25 04:31:20 +00:00
Kevin Lo	e9ff3d45e4	In rtinit1(), before rtrequest1_fib() is called, info.rti_flags is initialized by flags (function argument) or-ed with ifa->ifa_flags. If both NICs have a loopback route to themselves, IFA_RTSELF is set on the ifa(s). As IFA_RTSELF is defined by RTF_HOST, rtrequest1_fib() is called with RTF_HOST flag even if netmask is not NULL. Consequently, netmask is set to zero in rtrequest1_fib(), and the request to add a network route is silently changed to a request to add a host route. Tested by: Andrew Boyer <aboyer at averesystems.com> Submitted by: Svatopluk Kraus <onwahe at gmail dot com> Approved by: re (hrs)	2011-08-08 05:25:51 +00:00
Sergey Kandaurov	c94a66f8ae	Add missing MODULE_VERSION() definition to protect against duplicating module loads. PR: kern/159345 Reported by: Eugene Grosbein <egrosbein att rdtc ru> Tested by: Eugene Grosbein <egrosbein att rdtc ru> Approved by: re (kib) MFC after: 1 week	2011-08-01 11:24:55 +00:00
Bjoern A. Zeeb	d9a362862c	Add spares to the network stack for FreeBSD-9: - TCP keep* timers - TCP UTO (adjust from what was there already) - netmap - route caching - user cookie (temporary to allow for the real fix) Slightly re-shuffle struct ifnet moving fields out of the middle of spares and to better align. Discussed with: rwatson (slightly earlier version)	2011-07-17 21:15:20 +00:00
Mark Peek	a4980a95b5	Clear the filter memory area before using it. Leaving it uninitialized may leak previous kernel stack contents through a malicious BPF filter. PR: kern/158880 Submitted by: Guy Harris Obtained from: OpenBSD MFC after: 1 week	2011-07-14 21:06:22 +00:00
Marko Zec	13e255fab7	Permit ARP to proceed for IPv4 host routes for which the gateway is the same as the host address. This already works fine for INET6 and ND6. While here, remove two function pointers from struct lltable which are only initialized but never used. MFC after: 3 days	2011-07-08 09:38:33 +00:00
Andrew Thompson	6069a2c0bd	Grab the rlock before checking if our interface is enabled, it could be possible to hit a dead pointer when changing interfaces. PR: kern/156978 Submitted by: Andrew Boyer MFC after: 1 week	2011-07-07 20:02:09 +00:00
Bjoern A. Zeeb	a34c6aeb85	Tag mbufs of all incoming frames or packets with the interface's FIB setting (either default or if supported as set by SIOCSIFFIB, e.g. from ifconfig). Submitted by: Alexander V. Chernikov (melifaro ipfw.ru) Reviewed by: julian MFC after: 2 weeks	2011-07-03 16:08:38 +00:00
Bjoern A. Zeeb	43deddcdfe	Remove extra white space to comply with style for the rest of the struct. MFC after: 2 weeks	2011-07-03 15:34:09 +00:00
Bjoern A. Zeeb	35fd7bc020	Add infrastructure to allow all frames/packets received on an interface to be assigned to a non-default FIB instance. You may need to recompile world or ports due to the change of struct ifnet. Submitted by: cjsp Submitted by: Alexander V. Chernikov (melifaro ipfw.ru) (original versions) Reviewed by: julian Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru) MFC after: 2 weeks X-MFC: use spare in struct ifnet	2011-07-03 12:22:02 +00:00
Sergey Kandaurov	235195988b	Update ifc_len field of struct ifconf passed for the ioctl SIOCGIFCONF32 (i.e. under COMPAT_FREEBSD32) in case ifconf() returned success to match the native SIOCGIFCONF behavior. PR: kern/158369 Reported by: Paul Procacci <pprocacci att gmail com> MFC after: 1 week	2011-06-28 08:41:44 +00:00
Bjoern A. Zeeb	f5857e2d3d	Garbage collect never used global, sysctl, externs. MFC after: 1 week	2011-06-21 07:19:03 +00:00
Bjoern A. Zeeb	b8b8e0c981	Leave an extra comment about flowtable and IPv6 support rectifying a previous comment. MFC after: 1 week	2011-06-20 12:35:12 +00:00
Bjoern A. Zeeb	52dcd04ba3	gre(4) was using a field in the softc to detect possible recursion. On MP systems this is not a usable solution anymore and could easily lead to false positives triggering enough logging that even using the console was no longer usable (multiple parallel ping -f can do). Switch to the suggested solution of using mbuf tags to carry per packet state between gre_output() invocations. Contrary to the proposed solution modelled after gif(4) only allocate one mbuf tag per packet rather than per packet and per gre_output() pass through. As the sysctl to control the possible valid (gre in gre) nestings does no sanity checks, make sure to always allocate space in the mbuf tag for at least one, and at most 255 possible gre interfaces to detect loops in addition to the counter. Submitted by: Cristian KLEIN (cristi net.utcluj.ro) (original version) PR: kern/114714 Reviewed by: Cristian KLEIN (cristi net.utcluj.ro) Reviewed by: Wooseog Choi (ben_choi hotmail.com) Sponsored by: Sandvine Incorporated MFC after: 1 week	2011-06-18 09:34:03 +00:00
Luigi Rizzo	c9d658e9f7	Grab one of the ifcap bits for netmap, and enable printing in ifconfig. Document the fact that we might want an IFCAP_CANTCHANGE mask, even though the value is not yet used in sys/net/if.c (asked on -current a week ago, no feedback so i assume no objection).	2011-06-14 12:40:55 +00:00
Marko Zec	2fe7ca2ca6	Set curvnet context in a callout-triggered code path. MFC after: 3 days	2011-06-07 20:46:03 +00:00
John Baldwin	190367ef1c	Properly return an ENOBUFS error if a write to a tun(4) device fails due to m_uiotombuf() failing. While here, trim unneeded error handling related to tuninit() since it can never fail. Submitted by: Martin Birgmeier la5lbtyi aon at Reviewed by: glebius MFC after: 1 week	2011-06-03 13:47:05 +00:00
Robert Watson	6cb52192fe	Add an optional netisr dispatch point at ether_input(), but set the default dispatch method to NETISR_DISPATCH_DIRECT in order to force direct dispatch. This adds a fairly negligible overhead without changing default behavior, but in the future will allow deferred or hybrid dispatch to other worker threads before link layer processing has taken place. For example, this could allow redistribution using RSS hashes without ethernet header cache line hits, if the NIC was unable to adequately implement load balancing to too small a number of input queues -- perhaps due to hard queueset counts of 1, 3, or 8, but in a modern system with 16-128 threads. This can happen on highly threaded systems, where you want an ithread per core, redistributing work to other queues, but also on virtualised systems where hardware hashing is (or is not) available, but only a single queue has been directed to one VCPU on a VM. Note: this adds a previously non-present assertion about the equivalence of the ifnet from which the packet is received, and the ifnet stamped in the mbuf header. I believe this assertion to generally be true, but we'll find out soon -- if it's not, we might have to add additional overhead in some cases to add an m_tag with the originating ifnet pointer stored in it. Reviewed by: bz MFC after: 3 weeks Sponsored by: Juniper Networks, Inc.	2011-06-01 20:00:25 +00:00
Nathan Whitehorn	d098f93019	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb	2011-05-31 15:11:43 +00:00
Robert Watson	f2d2d69438	Rework netisr policy mechanism so that per-protocol dispatch policies can be represented: - A single policy namespace is defined, consisting of four possible policies: "default" to use the global default, "deferred" to force deferred dispatch, "direct" to employ direct dispatch where possible, and "hybrid" which makes a dynamic decision based on CPU affinity, ordering, etc. Routines are implemented to convert between strings and an integer namespace. - A new global variable, netisr_dispatch_policy, subsumes existing global variables for direct dispatch, forced direct dispatch, etc, and is used for explicit policy interpretation and composition. Old variables remain so that they can be exported by legacy sysctls for use by old netstat(1) binaries. A new sysctl and tunable, netisr.dispatch.policy, accepts the above strings for specifying a global policy default. - The protocol registration structure, netisr_handler, grows an nh_dispatch field, which accepts a per-policy policy override. The default value is '0', which corresponds to "default", meaning that protocols will accept the global default policy unless otherwise specified. - Policies are now interpreted and composed explicitly at various points in packet dispatch; protocol policies override global policies. - Protocols grow the ability to express a non-opinion about affinity even when implementing m2cpuid by returning NETISR_CPUID_NONE. In that case, the framework falls back on source ordering, rather than simply using the current CPU. These changes are in support of allowing link layer re-dispatch based on RSS or similar hashes provided by NICs, especially in the case where the number of hardware receive queues matches hardware core count, rather than hardware thread count, requiring further software redistribution. (i.e., on RMI XLR). MFC after: 3 weeks Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-24 12:34:19 +00:00
Marko Zec	9f8cab7fc2	Allow for vlan(4) interfaces with MTU of 1500 bytes to be configured on top of epair(4) virtual interfaces, since there's no physical hardware associated with epair interfaces which would imply any constraints on MTU sizes. MFC after: 3 days	2011-05-24 08:02:55 +00:00
Marko Zec	2dccdd4562	Let epair(4) virtual interfaces report fake link / media status, by borrowing the skeleton of if_media manipulation and reporting code from if_lagg(4). The main motivation behind this change is to allow for epair(4) interfaces to participate in STP if_bridge(4) configurations. Reviewed by: bz MFC after: 3 days	2011-05-24 07:57:28 +00:00
Qing Li	5b84dc789a	The statically configured (permanent) ARP entries are removed when an interface is brought down, even though the interface address is still valid. This patch maintains the permanent ARP entries as long as the interface address (having the same prefix as that of the ARP entries) is valid. Reviewed by: delphij MFC after: 5 days	2011-05-20 19:12:20 +00:00
Marius Strobl	d09c5f16b0	- Add 10baseT as an alias for 10baseT/UTP. - Add shorthand aliases for common media+option combinations as announced by miibus(4) so that one can actually supply the media strings found in the dmesg output to ifconfig(8). Obtained from: NetBSD (in principle) MFC after: 2 weeks	2011-05-15 12:58:29 +00:00
Pyun YongHyeon	d2d0470dc6	Fix white space nits and style	2011-05-06 20:46:29 +00:00
Pyun YongHyeon	26b8066bce	Do not increment collision counter if transmit has failed. Transmission error in tun(4) is queueing error (i.e. ENOBUFS) and it has nothing to do with collision. Reported by: Zeus V Panchenko (zeus <> ibs dot dn dot ua)	2011-05-06 20:37:07 +00:00
Andrew Thompson	627cecc5c9	LACP frames must not be sent VLAN-tagged, check for that before processing. PR: kern/156743 Submitted by: Dmitrij Tejblum MFC after: 1 week	2011-04-30 20:34:52 +00:00
Bjoern A. Zeeb	a0ae8f04e8	Make various (pseudo) interfaces compile without INET in the kernel adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:30:44 +00:00
Gleb Smirnoff	4c506522c1	When removing ifnets, we should first remove the reference to ifnet from the interface index, then decrease refcount, not vice versa. Otherwise there is a race (reproducible) when if_free_internal() contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the index for a long time. It may be picked by some other thread, that runs ifnet_byindex_ref(), who takes the ifnet from index, and bumps refcount. When reader drops the lock, if_free_internal() proceeds with free. Then reader tries to free it a second time.	2011-04-04 07:45:08 +00:00
Jeff Roberson	e4cd31dd3c	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
Dmitry Chagin	2093339ead	Remove dead code. MFC after: 1 Week	2011-03-20 08:35:00 +00:00
Dmitry Chagin	e579f1c1cf	ouch, newrt is used on the return path, my fault. Partially revert the previous change. MFC after: 1 Week.	2011-03-19 21:10:57 +00:00
Dmitry Chagin	523e60025b	A bit rearranged rtalloc1_fib() code. Initialize a variable when it is really needed. To avoid code duplication move the miss label to line up and jump on it. MFC after: 1 Week	2011-03-19 19:50:36 +00:00
Dmitry Chagin	6a873ef717	Remove a now unused variable. MFC after: 1 Week	2011-03-19 16:52:06 +00:00
Ermal Luçi	5f82cfdf6c	Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none. Approved by: thompsa(mentor) MFC after: 1 week	2011-03-04 20:37:38 +00:00
Bjoern A. Zeeb	e3416ab0c0	Hide the outer IP addresses of a tunnel interfaces (gif(4), gre(4)) from processes inside jails if the addresses do not belong to the jail. Originally reported by: Pieter de Boer via remko PR: kern/151119 Tested by: Piotr KUCHARSKI (nospam 42.pl) [gif] MFC after: 1 week	2011-03-02 21:39:08 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Bjoern A. Zeeb	1fb51a12f2	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
Bjoern A. Zeeb	144e6203ff	Mfp4 CH=177255: Resort the CURVNET_SET* macros in the non-VNET_DEBUG case to match the call order of the VNET_DEBUG case. Add the VNET_ASSERT() to the non-VNET_DEBUG case as well so that INVARIANTS will still catch problems. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 2 weeks	2011-02-11 14:17:58 +00:00
Bjoern A. Zeeb	0028e52461	Mfp4 CH=177255: Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS. Change the syntax to match KASSERT() to allow more flexible panic messages rather than having a printf with hardcoded arguments before panic. Adjust the few assertions we have to the new format (and enhance the output). Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 2 weeks	2011-02-11 13:27:00 +00:00
Bjoern A. Zeeb	6cf986ac19	Mfp4 CH=177255: Use __func__ rather than __FUNCTION__. MFC after: 2 weeks	2011-02-11 12:56:05 +00:00
Max Laier	826bf287b5	As info.rti_info[RTAX_DST] can point inside of rtm we must not free the rtm until rt_dispatch is done with the sockaddr. Found by: memguard MFC after: 3 days	2011-02-10 01:24:09 +00:00
John Baldwin	5f3b301a43	Fix a LOR by dropping the global ifnet locks while allocating a new ifnet table in if_grow(). The order of the SYSINIT's for ifnet state were swapped so that the various locks were initialized before being used. Reviewed by: pluknet, bz MFC after: 2 weeks	2011-01-24 22:21:58 +00:00
Matthew D Fleming	f8e4b4ef49	sysctl(8) should use the CTLTYPE to determine the type of data when reading. (This was already done for writing to a sysctl). This requires all SYSCTL setups to specify a type. Most of them are now checked at compile-time. Remove SYSCTL_X sysctl additions as the print being in hex should be controlled by the -x flag to sysctl(8). Suggested by: bde	2011-01-19 17:04:07 +00:00
Matthew D Fleming	f88910cdf5	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the net* piece.	2011-01-12 19:53:50 +00:00
John Baldwin	58ccf5b41c	Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes. Reviewed by: bde	2011-01-11 13:59:06 +00:00
Bjoern A. Zeeb	9269189f98	MfP4 CH=185246 [1]: Add FEATURE() to announce optional VIMAGE. MFC after: 3 days [1] for the moment put it in vnet.c.	2011-01-09 20:40:21 +00:00
John Baldwin	a8f4344f08	- Restore dropping the priority of syncer down to PPAUSE when it is idle. This was lost when it was converted to using a condition variable instead of lbolt. - Drop the priority of flowtable down to PPAUSE when it is idle as well since it is a similar background task. MFC after: 2 weeks	2011-01-06 22:17:07 +00:00
Marius Strobl	a0fc3825c3	Teach ifconfig(8) the handy shared option shortcut aliases the NetBSD counterpart also takes, i.e. "fdx" for "full-duplex", "flow" for "flowcontrol", "hdx" for "half-duplex" as well as "loop" and "loopback" for "hw-loopback". MFC after: 1 week	2011-01-05 15:28:30 +00:00
Marius Strobl	8b28e7e1a3	Fix whitespace. MFC after: 1 week	2011-01-05 14:51:04 +00:00
Bjoern A. Zeeb	962be6dfb3	Use NULL rather than 0 to invalidate a pointer. Rather than duplicating the LLE_FREE_LOCKED() macro code in LLE_FREE(), call it directly (like we do for the RT_* macros). Sponsored by: ISPsystem [1] Reviewed by: julian [1] MFC After: 1 week [1] Early 2010.	2010-12-31 21:57:54 +00:00
Bjoern A. Zeeb	c9a2711a54	Print the vnet pointer under DDB when iterating over flowtables of each virtual network stack instance. Sponsored by: ISPsystem [1] Reviewed by: julian [1] MFC after: 1 week [1] Early 2010.	2010-12-31 21:20:32 +00:00
Bjoern A. Zeeb	f0a56b0678	Move the increment operation under the lock and split the condition variable into two so that we can see on which one we are waiting. This might also more properly propagate the update of the flowclean_cycles flag and avoid "hangs" people were seeing. Suggested by: rwatson [1] Sponsored by: ISPsystem [1] Reviewed by: julian [1] Updated by: Mikolaj Golub (to.my.trociny gmail.com) Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC After: 1 week [1] Early 2010, initial version.	2010-12-31 21:06:52 +00:00
Alan Cox	82de724fe1	Introduce and use a new VM interface for temporarily pinning pages. This new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel. In collaboration with: kib@	2010-12-25 21:26:56 +00:00
Weongyo Jeong	c5649739a5	Add IFF_CANTCONFIG to IFF_CANTCHANGE so that it cannot be changed through ioctl(2).	2010-12-07 20:31:04 +00:00
Weongyo Jeong	6e3cb00068	Introduce the IFF_CANTCONFIG interface flag to indicate that the interface isn't configurable in a meaningful way. This is so that ifconfig(8) or other tools need no code changes whenever IFT_USB-like interfaces are registered at the interface list. Reviewed by: brooks No objections: gavin, jkim	2010-12-07 20:23:47 +00:00
Maxim Konovalov	57542d0481	o Swap descriptions for net.bpf.bufsize and net.bpf.maxbufsize. PR: misc/152531 MFC after: 1 week	2010-11-24 05:50:19 +00:00
Marko Zec	ccf7ba972c	Allow for vlan(4) ifnets to have overlapping unit numbers if they are created in separated vnets. As a side-effect of having a separated if_cloner instance for each vnet, all vlan ifnets created in a vnet will be automatically destroyed when vnet teardown is initiated. Disallow SIOCSETVLAN and SIOCGETVLAN ioctls on vlan ifnets which are associated with physical ifnets residing in parent vnets. This is an interim vlan-specific solution which will be superseded by a more generic if_cloner V_irtualization change from p4. For nooptions VIMAGE builds, this should be a no-op change. Discussed with: bz MFC after: 3 days	2010-11-22 23:35:29 +00:00
Dimitry Andric	3e288e6238	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
Bjoern A. Zeeb	2c8b047c07	Add a missing ';' and change the debugging sysctl from xint to int. Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 3 days	2010-11-21 19:33:19 +00:00
Dimitry Andric	c3adda9fc3	Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined.	2010-11-14 20:40:55 +00:00
Dimitry Andric	31c6a0037e	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
Dimitry Andric	47d46d92c2	Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-14 20:23:02 +00:00
Marius Strobl	efd4fc3fb3	o Flesh out the generic IEEE 802.3 annex 31B full duplex flow control support in mii(4): - Merge generic flow control advertisement (which can be enabled by passing by MIIF_DOPAUSE to mii_attach(9)) and parsing support from NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD, IFM_FLOW isn't implemented as a global option via the "don't care mask" but instead as a media specific option. This has the following advantages: o allows flow control advertisement with autonegotiation to be turned on and off via ifconfig(8) with the default typically being off (though MIIF_FORCEPAUSE has been added causing flow control to be always advertised, allowing to easily MFC these changes for drivers that previously used home-grown support for flow control that behaved that way without breaking POLA) o allows to deal with PHY drivers where flow control advertisement with manual selection doesn't work or at least isn't implemented, like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4), by setting MIIF_NOMANPAUSE o the available combinations of media options are readily available from the `ifconfig -m` output - Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so these are understood by ifconfig(8). o Make the master/slave support in mii(4) actually usable: - Change IFM_ETH_MASTER from being implemented as a global option via the "don't care mask" to a media specific one as it actually is only applicable to IFM_1000_T to date. - Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to actually configure manually selected slave mode (like we also do in the PHY specific implementations). - Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it is understood by ifconfig(8). 
o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4), e1000phy(4) and ip1000phy(4) to use the generic flow control support instead of home-grown solutions via IFM_FLAGs. This includes changing these PHY drivers and smcphy(4) to no longer unconditionally advertise support for flow control but only if the selected media has IFM_FLOW set (or MIIF_FORCEPAUSE is set) and implemented for these media variants, i.e. typically only for copper. o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0 and some IFM_FLAGn. o Switch brgphy(4) to add at least the supported copper media based on the contents of the BMSR via mii_phy_add_media() instead of hardcoding them. The latter approach seems to have developed historically, besides causing unnecessary code duplication it was also undesirable because brgphy_mii_phy_auto() already based the capability advertisement on the contents of the BMSR though. o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not just BCM5701. Apparently this was a misinterpretation of a workaround in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but this doesn't mean we can't set these as well on other PHYs for manual media selection. o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so IFM_1000_T master mode support now is generally available with all PHY drivers. o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's not applicable there. Reviewed by: yongari (plus additional testing) Obtained from: NetBSD (partially), OpenBSD (partially) MFC after: 2 weeks	2010-11-14 13:26:10 +00:00
Konstantin Belousov	7b3b099e07	Use 'z' modifier for size_t printing.	2010-11-13 11:11:51 +00:00
Dimitry Andric	7e54af0831	Similar to r212647, remove the workaround in sys/net/vnet.h for an ld bug (incorrect placement of __start_SECNAME in some cases) that was fixed in r210245. There is already an UPDATING entry about needing a recent ld. MFC after: 1 month	2010-11-12 22:59:50 +00:00
George V. Neville-Neil	e162ea60d4	Add a queue to hold packets while we await an ARP reply. When a fast machine first brings up some non TCP networking program it is quite possible that we will drop packets due to the fact that only one packet can be held per ARP entry. This leads to packets being missed when a program starts or restarts if the ARP data is not currently in the ARP cache. This code adds a new sysctl, net.link.ether.inet.maxhold, which defines a system wide maximum number of packets to be held in each ARP entry. Up to maxhold packets are queued until an ARP reply is received or the ARP times out. The default setting is the old value of 1 which has been part of the BSD networking code since time immemorial. Expose the time we hold an incomplete ARP entry by adding the sysctl net.link.ether.inet.wait, which defaults to 20 seconds, the value used when the new ARP code was added. Reviewed by: bz, rpaulo MFC after: 3 weeks	2010-11-12 22:03:02 +00:00
Dimitry Andric	4403994d7d	Use the same treatment as in linker_set.h for the __start and __stop symbols of the set_vnet and set_pcpu sections, so those symbols will always be emitted in kernel modules, if they use vnet.h or pcpu.h. Also, for pcpu.h, make the __(start|stop)_set_pcpu declarations, and associated macros invisible to userland, to prevent it picking up these symbols. Reviewed by: kib	2010-11-11 19:18:52 +00:00
Rui Paulo	09b6dcf968	Sync DLTs with the latest pcap version.	2010-10-29 18:41:09 +00:00
Bjoern A. Zeeb	a38de0134b	Factor out DDB commands from r204145, r204279 into if_debug.c for further enhancements (1). Switch to a standard 2-clause BSD license for this (2). Unfortunately we have to un-static the ifindex_table for this but do not publicly export it. Suggested by: rwatson (1) a while back. Approved by: thompsa (2) for the change from r204279. MFC after: 6 days	2010-10-25 08:30:19 +00:00
Sergey Kandaurov	9af74f3d68	Reshuffle SIOCGIFCONF32 handler from r155224. - move all the chunks into one file, which allows to hide SIOCGIFCONF32 global definition as well. - replace __amd64__ with proper COMPAT_FREEBSD32 around. - handle 32bit capacity before going into the handler itself instead of doing internal 32bit specific changes within it (e.g. as it's done for SIOCGDEFIFACE32_IN6). - use explicitly sized types for ABI compat. Approved by: kib (mentor) MFC after: 2 weeks	2010-10-21 16:20:48 +00:00
Bjoern A. Zeeb	ee7c7fee94	Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating over all interfaces to make sure the address will neither change nor be freed while we are working on it. PR: kern/146250 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 1 week	2010-10-16 19:25:27 +00:00
Bjoern A. Zeeb	fc2bfb3294	lltable_drain() has never been used so far, thus #if 0 it for now. While touching it add the missing locking to the now disabled code for the time when we'll resurrect it. MFC after: 3 days	2010-10-16 18:42:09 +00:00
Bjoern A. Zeeb	b6b8c0779d	Only hide the ifa and not the tp under #ifdef INET as the tp is needed for locking even when there is no INET. MFC after: 3 days	2010-10-01 15:14:14 +00:00
John Baldwin	24f481fde2	- Expand scope of tun/tap softc locks to cover more softc fields and driver-maintained ifnet fields (such as if_drv_flags). - Use soft locks as the mutex that protects each interface's knote list rather than using the global knote list lock. Also, use the softc for kn_hook instead of the cdev. - Use mtx_sleep() instead of tsleep() when blocking in the read routines. This fixes a lost wakeup race. - Remove D_NEEDGIANT now that the cdevsw routines use the softc lock where locking is needed. - Lock IFQ when calculating the result for FIONREAD in tap(4). tun(4) already did this. - Remove remaining spl calls. Submitted by: Marcin Cieslak saper of saper|info (3) MFC after: 2 weeks	2010-09-22 21:02:43 +00:00
Jung-uk Kim	d0d7bcdf92	Fix a typo in a comment. Submitted by: afiveg	2010-09-16 18:37:33 +00:00
Matthew D Fleming	4d369413e1	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno than sbuf_overflowed() did.	2010-09-10 16:42:16 +00:00
Bjoern A. Zeeb	73e39d6137	MFp4 CH=183259: No reason to use if_free_type() as we don't change our type. Just if_free() is fine. MFC after: 3 days	2010-09-02 16:11:12 +00:00
Ed Maste	be4572c896	Add a sysctl knob to accept input packets on any link in a failover lagg.	2010-09-01 16:53:38 +00:00
Bjoern A. Zeeb	c749353940	MFp4 CH=182972: Add explicit linkstate UP/DOWN for the epair. This is needed by carp(4) and other things to work. MFC after: 5 days	2010-08-27 23:22:58 +00:00
Rui Paulo	79856499bd	Add an extra comment to the SDT probes definition. This allows us to use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwatson [1]	2010-08-22 11:18:57 +00:00
Marko Zec	d3c351c50f	When moving an ethernet ifnet from one vnet to another, destroy the associated ng_ether netgraph node in the current vnet, and create a new one in the target vnet. Reviewed by: julian MFC after: 3 days	2010-08-13 18:17:32 +00:00
Will Andrews	9963e8a52c	Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with the appropriate ifdefs. Reviewed by: bz Approved by: ken (mentor)	2010-08-11 20:18:19 +00:00
Will Andrews	54bfbd5153	Allow carp(4) to be loaded as a kernel module. Follow precedent set by bridge(4), lagg(4) etc. and make use of function pointers and pf_proto_register() to hook carp into the network stack. Currently, because of the uncertainty about whether the unload path is free of race condition panics, unloads are disallowed by default. Compiling with CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure. This commit requires IP6PROTOSPACER, introduced in r211115. Reviewed by: bz, simon Approved by: ken (mentor) MFC after: 2 weeks	2010-08-11 00:51:50 +00:00
John Baldwin	3ba24fde11	Adjust the interface type in the link layer socket address for vlan(4) interfaces to be a vlan (IFT_L2VLAN) rather than an Ethernet interface (IFT_ETHER). The code already fixed if_type in the ifnet causing some places to report the interface as a vlan (e.g. arp -a output) and other places to report the interface as Ethernet (getifaddrs(3)). Now they should all report IFT_L2VLAN. Reviewed by: brooks MFC after: 1 month	2010-08-06 15:15:26 +00:00
Konstantin Belousov	04f3205755	Properly set ifi_datalen for compat32 struct if_data32. PR: kern/149240 Submitted by: Stef Walter <stef memberwebs com> MFC after: 1 week	2010-08-03 15:40:42 +00:00
Gleb Smirnoff	b17f26b00c	Don't check malloc(M_WAITOK) result.	2010-07-27 11:56:49 +00:00
Bjoern A. Zeeb	cd292f1264	Return NULL rather than 0 for a pointer. MFC after: 3 days	2010-07-27 11:54:01 +00:00
Gleb Smirnoff	85011246ac	When installing a new ARP entry via 'arp -S', lla_lookup() will either find an existing entry, or allocate a new one. In the latter case an entry would have flags that were supplied as argument to lla_lookup(). In case of an existing entry, flags aren't modified. This led to losing LLE_PUB and/or LLE_PROXY flags. We should apply these flags either in lla_rt_output() or in the in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is a more correct choice. PR: kern/148784, kern/146539 Silence from: qingli, 5 days	2010-07-27 10:05:27 +00:00
Jung-uk Kim	82040afcf3	Fix an obvious typo from r1.1. We were acquiring an exclusive writer lock regardless of the given flags. MFC after: 3 days	2010-07-22 18:44:40 +00:00
Luigi Rizzo	1f6ad072ea	whitespace cleanup	2010-07-15 14:41:59 +00:00
Luigi Rizzo	b62cb72c48	small portability fix to build on linux/windows	2010-07-15 14:41:06 +00:00
Jung-uk Kim	547d94bde3	Implement flexible BPF timestamping framework. - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses. - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now. - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format as binuptime(9) and getbinuptime(9) do. Reviewed by: net@	2010-06-15 19:28:44 +00:00
John Baldwin	3aa6d94e0c	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
Marko Zec	b1ae592bd4	Provide a macro for registering a virtualized sysctl handler for VNET opaque data. MFC after: 30 days	2010-06-02 15:29:21 +00:00
Qing Li	0ed6142b31	This patch fixes the problem where proxy ARP entries cannot be added over the if_ng interface. MFC after: 3 days	2010-05-25 20:42:35 +00:00
John Baldwin	6f359e2828	Ignore failures from removing multicast addresses from the parent (trunk) interface when tearing down a vlan interface. If a trunk interface is detached, all of its multicast addresses are removed before the ifnet departure eventhandlers are invoked. This means that all of the multicast addresses are removed before the vlan interfaces are removed which causes the if_delmulti() calls in the vlan teardown to fail. In the VLAN_ARRAY case, this left vlan interfaces referencing a no longer valid parent interface. In the !VLAN_ARRAY case, the eventhandler gets stuck in an infinite loop retrying vlan_unconfig_locked() forever. In general the callers of vlan_unconfig_locked() do not expect nor handle failure, so I believe it is safer to ignore the errors and tear down as much of the vlan state as possible. Silence from: net@ MFC after: 4 days	2010-05-17 19:36:56 +00:00
Kip Macy	83e711ec14	allocate ipv6 flows from the ipv6 flow zone reported by: rrs@ MFC after: 3 days	2010-05-16 21:48:39 +00:00
Bjoern A. Zeeb	793f71bf2e	Fix an issue with the dynamic pcpu/vnet data allocators. We cannot expect that modspace is the last entry in the linker set and thus that modspace + possible extra space up to PAGE_SIZE would be contiguous. For the moment do not support more than *_MODMIN space and ignore the extra space (*). (*) We know how to get it back but it'll need testing. Discussed with: jeff, rwatson (briefly) Reviewed by: jeff Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-05-14 21:11:58 +00:00
Kip Macy	19d0491585	workaround bug with ipv6 where a flow can have a null rtentry	2010-05-12 04:51:20 +00:00
Alan Cox	f0c0d3998d	Remove page queues locking from all sf_buf_mext()-like functions. The page lock now suffices. Fix a couple nearby style violations.	2010-05-06 17:43:41 +00:00
Alan Cox	a7283d3213	Add page locking to the vm_page_cow* functions. Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib	2010-05-04 15:55:41 +00:00
Maxim Sobolev	e50d35e6c6	Add new tunable 'net.link.ifqmaxlen' to set default send interface queue length. The default value for this parameter is 50, which is quite low for many of today's uses and the only way to modify this parameter right now is to edit if_var.h file. Also add read-only sysctl with the same name, so that it's possible to retrieve the current value. MFC after: 1 month	2010-05-03 07:32:50 +00:00
Alan Cox	913814935a	This is the first step in transitioning responsibility for synchronizing access to the page's wire_count from the page queues lock to the page lock. Submitted by: kmacy	2010-05-03 05:41:50 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Bjoern A. Zeeb	82cea7e6f3	MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitespace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOBALS (before r185088) and try to reduce the diff to stable/7 and earlier as much as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days	2010-04-29 11:52:42 +00:00
Kip Macy	3e8b572db4	need to initialize the lock before it is used MFC after: 3 days	2010-04-27 23:48:50 +00:00
Bjoern A. Zeeb	1b610a749e	MFP4: @177254 Add missing CURVNET_RESTORE() calls for multiple code paths, to stop leaking the currently cached vnet into callers and to the process. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-04-27 15:16:54 +00:00
Konstantin Belousov	fc0a61a401	Provide compat32 shims for bpf(4), except zero-copy facilities. bd_compat32 field of struct bpf_d is kept unconditionally to not impose the requirement of including "opt_compat.h" on all numerous users of bpfdesc.h. Submitted by: jhb (version for 6.x) Reviewed and tested by: emaste MFC after: 2 weeks	2010-04-25 16:43:41 +00:00
Konstantin Belousov	427a928af7	Provide 32bit compat shims for sysctl net.route NET_RT_IFLIST. This allows getifaddrs(3) to work for compat32 binaries. Submitted by: jhb (6.x version) Reviewed by: emaste Tested by: emaste and <pluknet gmail com> MFC after: 2 weeks	2010-04-25 16:42:47 +00:00
Julian Elischer	7a90b21212	Move two copies of the same definition to a common include file. MFC after: 3 weeks	2010-04-14 23:06:07 +00:00
Xin LI	57d848483e	When an underlying ioctl(2) handler returns an error, our ioctl(2) interface considers that it hits a fatal error, and will not copyout the request structure back for _IOW and _IOWR ioctls, keeping them untouched. The previous implementation of the SIOCGIFDESCR ioctl intends to feed the buffer length back to userland. However, if we return an error, the feedback would be defeated and ifconfig(8) would trap into an infinite loop. This commit changes SIOCGIFDESCR to set buffer field to NULL to indicate the previous ENAMETOOLONG case. Reported by: bschmidt MFC after: 2 weeks	2010-04-14 22:02:19 +00:00
Bjoern A. Zeeb	d0088cde62	Take a reference to make sure that the interface cannot go away during if_clone_destroy() in case parallel threads try to. PR: kern/116837 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 10 days	2010-04-11 18:47:38 +00:00
Bjoern A. Zeeb	c769e1be01	Check that the interface is on the list of cloned interfaces before trying to remove it to avoid panics in case of two threads trying to remove it in parallel. PR: kern/116837 Submitted by: Takahiro Kurosawa (takahiro.kurosawa gmail.com) (orig version) MFC after: 10 days	2010-04-11 18:41:31 +00:00
Bjoern A. Zeeb	becba438d2	Plug reference leaks in the link-layer code ("new-arp") that previously prevented the link-layer entry from being freed. In both in.c and in6.c (though that code path seems to be basically dead) plug a reference leak in case of a pending callout being drained. In if_ether.c consistently add a reference before resetting the callout and in case we canceled a pending one remove the reference for that. In the final case in arptimer, before freeing the expired entry, remove the reference again and explicitly call callout_stop() to clear the active flag. In nd6.c:nd6_free() we are only ever called from the callout function and thus need to remove the reference there as well before calling into llentry_free(). In if_llatbl.c when freeing entire tables make sure that in case we cancel a pending callout to remove the reference as well. Reviewed by: qingli (earlier version) MFC after: 10 days Problem observed, patch tested by: simon on ipv6gw.f.o, Christian Kratzer (ck cksoft.de), Evgenii Davidov (dado korolev-net.ru) PR: kern/144564 Configurations still affected: with options FLOWTABLE	2010-04-11 16:04:08 +00:00
Bjoern A. Zeeb	d8c136591a	In if_detach_internal() we cannot hold the af_data lock over the dom_ifdetach() calls as they might sleep for callout_drain(). Do as we do in if_attachdomain1() [r121470] and handle if_afdata_initialized earlier and call dom_ifdetach() unlocked. Discussed with: rwatson MFC after: 10 days	2010-04-11 11:51:44 +00:00
Bjoern A. Zeeb	318c3213e5	In if_detach_internal() only try to do the detach run if if_attachdomain1() has actually succeeded to initialize and attach. There is a theoretical possibility to drop out early in if_attachdomain1() leaving the array uninitialized if we cannot get the lock. Discussed with: rwatson MFC after: 10 days	2010-04-11 11:49:24 +00:00
Jung-uk Kim	704858479c	Check the pointer to JIT binary filter before its de-allocation. Submitted by: Alexander Sack (asack at niksun dot com) MFC after: 3 days	2010-03-29 20:24:03 +00:00
Rui Paulo	59fe4a8ce6	Add MCS to the list of media types. Sponsored by: iXsystems, inc.	2010-03-23 13:15:11 +00:00
Kip Macy	3059584e2a	- boot-time size the ipv4 flowtable and the maximum number of flows - increase flow cleaning frequency and decrease flow caching time when near the flow limit - stop allocating new flows when within 3% of maxflows; don't start allocating again until below 12.5% MFC after: 7 days	2010-03-22 23:04:12 +00:00
Ed Maste	d8564efde1	Avoid holding the VLAN_LOCK() over the parent interface SIOCGIFMEDIA ioctl call, as it may sleep. Reviewed by: rwatson	2010-03-21 15:00:33 +00:00
Bjoern A. Zeeb	42eedeac00	Split eventhandler_register() into an internal part and a wrapper function that provides the allocated and setup eventhandler entry. Add a new wrapper for VIMAGE that allocates extra space to hold the callback function and argument in addition to an extra wrapper function. While the wrapper function goes as normal callback function the argument points to the extra space allocated holding the original func and arg that the wrapper function can then call. Provide an iterator function for the virtual network stack (vnet) that will call the callback function for each network stack. Provide a new set of macros for VNET that in the non-VIMAGE case will just call eventhandler_register() while in the VIMAGE case it will use vimage_eventhandler_register() passing in the extra iterator function but will only register once rather than per-vnet. We need a special macro in case we are interested in the tag returned as we must check for curvnet and can neither simply assign the return value, nor not change it in the non-vnet0 case without that. Sponsored by: ISPsystem Discussed with: jhb Reviewed by: zec (earlier version), jhb MFC after: 1 month	2010-03-19 19:51:03 +00:00
Bjoern A. Zeeb	335b943f8e	Add ddb support to the "new" link layer code ("new-arp"): - show all lltables [1] (optional flag to also show the llentries as well) - show lltable <struct lltable *> - show llentry <struct llentry *> MFC after: 6 days	2010-03-18 09:09:59 +00:00
Qing Li	6b533b5ddb	Verify interface up status using its link state only if the interface has such capability. The interface capability flag indicates whether such capability exists. This approach is much more backward compatible. Physical device driver changes will be part of another commit. Also updated the ifconfig utility to show the LINKSTATE capability if present. Reviewed by: rwatson, imp, juli MFC after: 3 days	2010-03-16 17:59:12 +00:00
Max Laier	4c71aa5890	Fix a small bug in drbr_dequeue_cond spotted while preparing MFC of r203834. MFC after: 3 days	2010-03-15 21:15:03 +00:00
Kip Macy	8847ae28f5	flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB pointed out by jkim@	2010-03-12 19:58:51 +00:00
Jung-uk Kim	5d7af3a1cc	Fix a style(9) nit.	2010-03-12 19:42:42 +00:00
Kip Macy	a398ca9cea	re-update copyright to 2010 pointed out by danfe@	2010-03-12 19:26:45 +00:00
Jung-uk Kim	9fee1bd1d8	Tidy up callout for select(2) and read timeout. - Add a missing callout_drain(9) before the descriptor deallocation.[1] - Prefer callout_init_mtx(9) over callout_init(9) and let the callout subsystem handle the mutex for callout function. PR: kern/144453 Submitted by: Alexander Sack (asack at niksun dot com)[1] MFC after: 1 week	2010-03-12 19:14:58 +00:00
Qing Li	688ba6823b	The flow-table module retrieves the destination and source address as well as the transport protocol port information from the outbound packets. The routing code is generic and compares every byte in the given sockaddr object. Therefore the temporary sockaddr objects must be cleared due to padding bytes. In addition, the port information must be stripped or the route search will either fail or return the incorrect route entry. Unit testing is done using OpenVPN over the if_tun interface. MFC after: 7 days	2010-03-12 10:24:58 +00:00
Kip Macy	112125d206	fix stats reporting sysctl	2010-03-12 06:31:19 +00:00
Kip Macy	d4121a02c0	- restructure flowtable to support ipv6 - add a name argument to flowtable_alloc for printing with ddb commands - extend ddb commands to print destination address or 4-tuples - don't parse ports in ulp header if FL_HASH_ALL is not passed - add kern_flowtable_insert to enable more generic use of flowtable (e.g. system calls for adding entries) - don't hash loopback addresses - cleanup whitespace - keep statistics per-cpu for per-cpu flowtables to avoid cache line contention - add sysctls to accumulate stats and report aggregate MFC after: 7 days	2010-03-12 05:03:26 +00:00
Qing Li	355ad3ead4	The if_tap interface is of IFT_ETHER type, but it does not set or update the if_link_state variable. As such RT_LINK_IS_UP() fails for the if_tap interface. Also, the RT_LINK_IS_UP() needs to bypass all loopback interfaces because loopback interfaces are considered up logically as long as the system is running. This patch fixes the above issues by setting and updating the if_link_state variable when the tap interface is opened or closed respectively. A similar approach is already used in the if_tun device. MFC after: 3 days	2010-03-11 17:56:46 +00:00
Qing Li	c7ea0aa648	One of the advantages of enabling ECMP (a.k.a. RADIX_MPATH) is to allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible. The other advantage of ECMP is for failover. The issue with the current code is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix and the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole. Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed. MFC after: 5 days	2010-03-09 01:11:45 +00:00
Xin LI	13d85d4382	Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg interface. The check itself seems to be coming from OpenBSD but does not seem to be useful for our code. Discussed with: thompsa MFC after: 1 month	2010-03-09 00:52:16 +00:00
Bjoern A. Zeeb	e253cdd07c	Not only flush the ipfw tables when unloading ipfw or tearing down a virtual network stack, but also free the Radix Node Head. Sponsored by: ISPsystem Reviewed by: julian MFC after: 5 days	2010-03-07 15:37:58 +00:00
Bjoern A. Zeeb	1bb635b04d	Introduce a function rn_detachhead() that will free the radix table root nodes. This is only needed (and available) in the virtualization case to free the resources when tearing down a virtual network stack. Sponsored by: ISPsystem Reviewed by: julian, zec MFC after: 5 days	2010-03-06 21:27:26 +00:00
Bjoern A. Zeeb	eea3faf77b	Rework reference counting in case we queue into the netisr, or overflow the netisr queue and fall back to the interface queue so that we can guarantee that the ifnet pointer stays valid. Formerly we ended up with reference counts <= 0 in case the netisr had returned ENOBUFS. The idea is to track any packet in the netisr queue and only change the refcount on edge operations for the fallback interface queue. This also avoids problems in case the if_snd.ifq_len lies to us. Also rework refcount assertions to make sure they trigger if we go below 1. Formerly a negative reference count did not trigger the assert as the refcount variable is u_int. Sponsored by: ISPsystem MFC after: 5 days	2010-03-06 21:22:28 +00:00
Luigi Rizzo	cc4d3c30ea	Bring in the most recent version of ipfw and dummynet, developed and tested over the past two months in the ipfw3-head branch. This also happens to be the same code available in the Linux and Windows ports of ipfw and dummynet. The major enhancement is a completely restructured version of dummynet, with support for different packet scheduling algorithms (loadable at runtime), faster queue/pipe lookup, and a much cleaner internal architecture and kernel/userland ABI which simplifies future extensions. In addition to the existing schedulers (FIFO and WF2Q+), we include a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new, very fast version of WF2Q+ called QFQ. Some test code is also present (in sys/netinet/ipfw/test) that lets you build and test schedulers in userland. Also, we have added a compatibility layer that understands requests from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries, and replies correctly (at least, it does its best; sometimes you just cannot tell who sent the request and how to answer). The compatibility layer should make it possible to MFC this code in a relatively short time. Some minor glitches (e.g. handling of ipfw set enable/disable, and a workaround for a bug in RELENG_7's /sbin/ipfw) will be fixed with separate commits. CREDITS: This work has been partly supported by the ONELAB2 project, and mostly developed by Riccardo Panicucci and myself. The code for the qfq scheduler is mostly from Fabio Checconi, and Marta Carbone and Francesco Magno have helped with testing, debugging and some bug fixes.	2010-03-02 17:40:48 +00:00
Luigi Rizzo	7bc2288264	remove unnecessary casts leftover from a bogus fix to a previous bug	2010-03-02 16:24:16 +00:00
Alfred Perlstein	e722820434	Merge projects/enhanced_coredumps (r204346) into HEAD: Enhanced process coredump routines. This brings in the following features: 1) Limit number of cores per process via the %I coredump formatter. Example: if corefilename is set to %N.%I.core AND num_cores = 3, then if a process "rpd" cores, then the corefile will be named "rpd.0.core", however if it cores again, then the kernel will generate "rpd.1.core" until we hit the limit of "num_cores". this is useful to get several corefiles, but also prevent filling the machine with corefiles. 2) Encode machine hostname in core dump name via %H. 3) Compress coredumps, useful for embedded platforms with limited space. A sysctl kern.compress_user_cores is made available if turned on. To enable compressed coredumps, the following config options need to be set: options COMPRESS_USER_CORES device zlib # brings in the zlib requirements. device gzio # brings in the kernel vnode gzip output module. 4) Eventhandlers are fired to indicate coredumps in progress. 5) The imgact sv_coredump routine has grown a flag to pass in more state, currently this is used only for passing a flag down to compress the coredump or not. Note that the gzio facility can be used for generic output of gzip'd streams via vnodes. Obtained from: Juniper Networks Reviewed by: kan	2010-03-02 06:58:58 +00:00
Joel Dahl	7df6f59359	The NetBSD Foundation has granted permission to remove clause 3 and 4 from their software. Obtained from: NetBSD	2010-03-01 17:05:46 +00:00
Robert Watson	60efbc9991	Whitespace tweak. MFC after: 3 days	2010-03-01 00:43:05 +00:00
Robert Watson	938448cd87	Changes to support crashdump analysis of netisr: - Rename the netisr protocol registration array, 'np' to 'netisr_proto', in order to reduce the chances of symbol name collisions. It remains statically defined, but it will be looked up by netstat(1). - Move certain internal structure definitions from netisr.c to netisr_internal.h so that netstat(1) can find them. They remain private, and should not be used for any other purpose (for example, they should not be used by kernel modules, which must instead use the public interfaces in netisr.h). - Store a kernel-compiled version of NETISR_MAXPROT in the global variable netisr_maxprot, and export via a sysctl, so that it is available for use by netstat(1). This is especially important for crashdump interpretation, where the size of the workstream structure is determined by the maximum number of protocols compiled into the kernel. MFC after: 1 week Sponsored by: Juniper Networks	2010-03-01 00:42:36 +00:00
Konstantin Belousov	22e62e7e6e	In both if_tun and if_tap: Do not do additional dev_ref() on the newly created interface in the if_clone create method [1]. This reference is not needed and never removed, causing struct cdevpriv leakage. Remove the setting of SI_CHEAPCLONE flag as well, since it is unused. For dev_clone handlers, create cdevs with the call make_dev_credf(MAKEDEV_REF) instead of calling make_dev() and then dev_ref(), to avoid a race. Call drain_dev_clone_events() at the module unload time after dev_clone handler is deinstalled. Submitted by: Mikolaj Golub <to.my.trociny gmail com> [1] MFC after: 1 week	2010-02-28 16:25:49 +00:00
Robert Watson	7f450feb07	Fix edge cases in several KASSERTs: use <= rather than < when testing that counters have not gone above MAXCPU or NETISR_MAXPROT. These problems caused panics on UP kernels with INVARIANTS when using sysctl -a, but would also have caused problems for 32-core boxes or if the netisr protocol vector was fully populated. Reported by: nwhitehorn, Neel Natu <neelnatu@gmail.com> MFC after: 4 days	2010-02-25 09:51:14 +00:00
Bjoern A. Zeeb	7405f23cd7	Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets' in the db_show_all_table as 'show all ifnets' and with that follow the convention for showing complete lists. Submitted by: thompsa MFC after: 3 days	2010-02-24 15:54:24 +00:00
Robert Watson	c4fbf89fc5	Fix constant assignment for netisr protocol information sysctl. MFC after: 1 week Spotted by: bz	2010-02-22 16:16:16 +00:00
Robert Watson	2d22f334ea	Export netisr configuration and statistics to userspace via sysctl(9). MFC after: 1 week Sponsored by: Juniper Networks	2010-02-22 15:03:16 +00:00
Robert Watson	5702371bd2	ifconfig(8) expects interface fooX to be supported by the module if_foo, and will try to load it if it's not present. To better meet these expectations, change the module name for the loopback interface from 'loop' to 'if_lo'. The loopback interface is always compiled into the base kernel, so there are no resulting changes in kld files, etc. Discussed with: brooks (ages ago) MFC after: 1 week	2010-02-21 15:25:47 +00:00
Pyun YongHyeon	8b2d91810b	Add __FBSDID. Reviewed by: sam	2010-02-21 00:07:45 +00:00
Pyun YongHyeon	9b76d9cb3d	Add TSO support on VLANs. Intentionally separated IFCAP_VLAN_HWTSO from IFCAP_VLAN_HWTAGGING. I think some hardware may be able to TSO over VLAN without VLAN hardware tagging. Driver changes and userland support will follow. Reviewed by: thompsa	2010-02-20 22:47:20 +00:00
Bjoern A. Zeeb	c9fdacdac8	Start to implement ifnet DDB support: - 'show ifnets' prints a list of ifnet *s per virtual network stack, - 'show ifnet <struct ifnet *>' prints fields matching the given ifp. We do not yet print the complete set of fields and might want to factor this out to an extra if_debug.c file in case this grows a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1]. Sponsored by: ISPsystem Suggested by: rwatson [1] Reviewed by: rwatson MFC after: 5 days	2010-02-20 22:09:48 +00:00
Bjoern A. Zeeb	58606037c1	Enhance a panic string to contain more useful debugging information. Sponsored by: ISPsystem Reviewed by: rwatson MFC after: 5 days	2010-02-20 21:43:36 +00:00
Jung-uk Kim	8df67d77ed	Return partially filled buffer for non-blocking read(2) in non-immediate mode. PR: kern/143855	2010-02-20 00:19:21 +00:00
Pawel Jakub Dawidek	784949026c	Mark various sysctls also as tunables. Reviewed by: rwatson MFC after: 1 week	2010-02-15 09:19:07 +00:00
Max Laier	193cbc4d24	Fix drbr and altq interaction: - introduce drbr_needs_enqueue that returns whether the interface/br needs an enqueue operation: returns true if altq is enabled or there are already packets in the ring (as we need to maintain packet order) - update all drbr consumers - fix drbr_flush - avoid using the driver queue (IFQ_DRV_*) in the altq case as the multiqueue consumer does not provide enough protection, serialize altq interaction with the main queue lock - make drbr_dequeue_cond work with altq Discussed with: kmacy, yongari, jfv MFC after: 4 weeks	2010-02-13 16:04:58 +00:00
Bjoern A. Zeeb	3e0490b3fe	Add DDB support for printing vnet_sysinit and vnet_sysuninit ordered call lists. Try to lookup function/symbol names and print those in addition to the pointers, along with the constants for subsystem and order. This is useful for debugging vnet teardown ordering issues. Make it possible to call the actual printing function from normal code at runtime, i.e., from vnet_sysuninit(), if DDB support is there. Sponsored by: ISPsystem MFC After: 8 days	2010-02-09 22:39:34 +00:00
Bjoern A. Zeeb	61d033d436	Add an SDT provider for "vnet"s along with probes for vnet_alloc and vnet_destroy. Use the line number rather than NULL as dummy argument. Note: the fbt provider does not reliably provide :return probes (depending on optimization levels used at compile time) making it unusable for scripts to generate complete call-traces with well defined boundaries over allocations or destructions of virtual network stacks. Sponsored by: ISPsystem MFC After: 8 days	2010-02-09 22:15:59 +00:00
Ermal Luçi	644da90d9f	Propagate the vlan events to the underlying interfaces/members so they can do initialization of hw related features. PR: kern/141646 Reviewed by: thompsa Approved by: thompsa(co-mentor) MFC after: 2 weeks	2010-02-06 13:49:35 +00:00
Marko Zec	0a705ab66f	Instead of spamming the console on each curvnet recursion event, print out each such call graph only once, along with a stack backtrace. This should make kernels built with VNET_DEBUG reasonably usable again in busy / production environments. Introduce a new DDB command "show vnetrcrs" which dumps the whole log of distinctive curvnet recursion events. This might be useful when recursion reports get buried / lost too deep in the message buffer. In the latter case stack backtraces are not available. Reviewed by: bz MFC after: 3 days	2010-02-04 07:55:42 +00:00
Hiroki Sato	c2a5f1a57a	- Check if_type of "addm <interface>" before setting the interface's MTU to the if_bridge(4) interface. This fixes a bug that MTU value of "addm <interface>" is used even when it is invalid for the if_bridge(4) member: # ifconfig bridge0 create # ifconfig bridge0 bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500 ... # ifconfig bridge0 addm lo0 ifconfig: BRDGADD lo0: Invalid argument # ifconfig bridge0 bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 16384 ... - Do not ignore MTU value of an interface even when if_type == IFT_GIF. This fixes MTU mismatch when an if_bridge(4) interface has a gif(4) interface and no other interface as the member, and it is directly used for L2 communication with EtherIP tunneling enabled. - Implement SIOCSIFMTU ioctl. Changing the MTU is allowed only when all members have the same MTU value.	2010-01-31 08:16:37 +00:00
Xin LI	215940b3fa	Revised revision 199201 (add interface description capability as inspired by OpenBSD), based on comments from many, including rwatson, jhb, brooks and others. Sponsored by: iXsystems, Inc. MFC after: 1 month	2010-01-27 00:30:07 +00:00
Shteryana Shopova	93ec7edca7	While flushing the multicast filter of an interface, do not zero the relevant ifmultiaddr structures' reference to the parent interface, unless the parent interface is really detaching. While here, program only link layer multicast filters to a wlan's hardware parent interface. PR: kern/142391, kern/142392 Reviewed by: sam, rpaolo, bms MFC after: 1 week	2010-01-24 16:17:58 +00:00
Andrew Thompson	6117727b6c	Do not hold the lock over if_setlladdr() as it calls into the interface driver init routine.	2010-01-19 04:29:42 +00:00
Andrew Thompson	ea4ca115b7	Declare a new EVENTHANDLER called iflladdr_event which signals that the L2 address on an interface has changed. This lets stacked interfaces such as vlan(4) detect that their lower interface has changed and adjust things in order to keep working. Previously this situation broke at least vlan(4) and lagg(4) configurations. The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the risk of a loop. PR: kern/142927 Submitted by: Nikolay Denev	2010-01-18 20:34:00 +00:00
Bjoern A. Zeeb	3c20163a70	Correct a typo. MFC after: 5 days	2010-01-10 12:03:53 +00:00
Edward Tomasz Napierala	22133b4449	Stop GCC from complaining about lagg_port_checkstacking() being unused.	2010-01-08 16:44:33 +00:00
Martin Blapp	c2ede4b379	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week	2010-01-07 21:01:37 +00:00
Luigi Rizzo	0bcfa8e4b3	put ip_var before ip_fw_private.h as this will be needed in the near future	2010-01-07 10:27:52 +00:00
Luigi Rizzo	7173b6e554	Various cleanup done in ipfw3-head branch including: - use a uniform mtag format for all packets that exit and re-enter the firewall in the middle of a rulechain. On reentry, all tags containing reinject info are renamed to MTAG_IPFW_RULE so the processing is simpler. - make ipfw and dummynet use ip_len and ip_off in network format everywhere. Conversion is done only once instead of tracking the format in every place. - use a macro FREE_PKT to dispose of mbufs. This eases portability. On passing i also removed a few typos, staticise or localise variables, remove useless declarations and other minor things. Overall the code shrinks a bit and is hopefully more readable. I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr. For ng_ipfw i am actually waiting for feedback from glebius@ because we might have some small changes to make. For if_bridge and if_ethersubr feedback would be welcome (there are still some redundant parts in these two modules that I would like to remove, but first i need to check functionality).	2010-01-04 19:01:22 +00:00
John Baldwin	fb92ad4af5	Use stricter checking to match possible vlan clones by not allowing extra garbage characters around or within the tag. Reviewed by: brooks MFC after: 3 days	2009-12-31 20:44:38 +00:00
Brooks Davis	a6fffd6cb0	The devices that supported EVFILT_NETDEV kqueue filters were removed in r195175. Remove all definitions, documentation, and usage. fifo_misc.c: Remove all kqueue tests as fifo_io.c performs all those that would have remained. Reviewed by: rwatson MFC after: 3 weeks X-MFC note: don't change vlan_link_state() function signature	2009-12-31 20:29:58 +00:00
Qing Li	9f1409057b	Remove a deleted comment line that was brought back by my previous commit. MFC after: 5 days	2009-12-31 01:09:16 +00:00
Qing Li	c7ab66020f	The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was due to the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations due to multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days	2009-12-30 21:35:34 +00:00
John Baldwin	5428776e2c	Change vlan interfaces to cope more usefully with the parent interface being renamed. Previously the vlan interfaces would lose their configuration as if the parent interface had been physically removed. Now vlan interfaces ignore rename events. - Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being renamed. This flag can be checked in ifnet departure/arrival event handlers to treat rename events differently. - Change the ifnet departure event handler in the if_vlan(4) driver to ignore departure events due to a trunk interface being renamed. Reviewed by: brooks, rwatson MFC after: 1 week	2009-12-29 13:35:18 +00:00
Luigi Rizzo	830c6e2b97	bring in several cleanups tested in ipfw3-head branch, namely: r201011 - move most of ng_ipfw.h into ip_fw_private.h, as this code is ipfw-specific. This removes a dependency on ng_ipfw.h from some files. - move many equivalent definitions of direction (IN, OUT) for reinjected packets into ip_fw_private.h - document the structure of the packet tags used for dummynet and netgraph; r201049 - merge some common code to attach/detach hooks into a single function. r201055 - remove some duplicated code in ip_fw_pfil. The input and output processing uses almost exactly the same code so there is no need to use two separate hooks. ip_fw_pfil.o goes from 2096 to 1382 bytes of .text r201057 (see the svn log for full details) - macros to make the conversion of ip_len and ip_off between host and network format more explicit r201113 (the remaining parts) - readability fixes -- put braces around some large for() blocks, localize variables so the compiler does not think they are uninitialized, do not insist on precise allocation size if we have more than we need. r201119 - when doing a lookup, keys must be in big endian format because this is what the radix code expects (this fixes a bug in the recently-introduced 'lookup' option) No ABI changes in this commit. MFC after: 1 week	2009-12-28 10:47:04 +00:00
Robert Watson	912f6323cd	When warning about possible netisr configuration problems during boot, report using "netisr_init" rather than "netisr2", which was the development name for the project. MFC after: 3 days	2009-12-23 12:33:59 +00:00
Robert Watson	0a32e29f59	Refine netisr.c comments a bit.	2009-12-23 12:31:27 +00:00
Luigi Rizzo	de240d1013	merge code from ipfw3-head to reduce contention on the ipfw lock and remove all O(N) sequences from kernel critical sections in ipfw. In detail: 1. introduce a IPFW_UH_LOCK to arbitrate requests from the upper half of the kernel. Some things, such as 'ipfw show', can be done holding this lock in read mode, whereas insert and delete require IPFW_UH_WLOCK. 2. introduce a mapping structure to keep rules together. This replaces the 'next' chain currently used in ipfw rules. At the moment the map is a simple array (sorted by rule number and then rule_id), so we can find a rule quickly instead of having to scan the list. This reduces many expensive lookups from O(N) to O(log N). 3. when an expensive operation (such as insert or delete) is done by userland, we grab IPFW_UH_WLOCK, create a new copy of the map without blocking the bottom half of the kernel, then acquire IPFW_WLOCK and quickly update pointers to the map and related info. After dropping IPFW_LOCK we can then continue the cleanup protected by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side is only blocked for O(1). 4. do not pass pointers to rules through dummynet, netgraph, divert etc, but rather pass a <slot, chain_id, rulenum, rule_id> tuple. We validate the slot index (in the array of #2) with chain_id, and if successful do a O(1) dereference; otherwise, we can find the rule in O(log N) through <rulenum, rule_id>. All the above does not change the userland/kernel ABI, though there are some disgusting casts between pointers and uint32_t. Operation costs now are as follows (Function: Old, Now, Planned): + skipto X, non cached: O(N), O(log N); + skipto X, cached: O(1), O(1); XXX dynamic rule lookup: O(1), O(log N), O(1); + skipto tablearg: O(N), O(1); + reinject, non cached: O(N), O(log N); + reinject, cached: O(1), O(1); + kernel blocked during setsockopt(): O(N), O(1). The only (very small) regression is on dynamic rule lookup and this will be fixed in a day or two, without changing the userland/kernel ABI. Supported by: Valeria Paoli MFC after: 1 month	2009-12-22 19:01:47 +00:00
John Baldwin	8e9683767c	Remove commented out prototype for ifinit(). This prototype has been commented out since 1.1 and has not been present in <sys/systm.h> since at least 1.1 of that file. It is also not needed in FreeBSD due to SYSINIT().	2009-12-21 20:09:19 +00:00
Luigi Rizzo	70228fb346	Start splitting ip_fw2.c and ip_fw.h into smaller components. At this time we pull out from ip_fw2.c the logging functions, and support for dynamic rules, and move kernel-only stuff into netinet/ipfw/ip_fw_private.h No ABI change involved in this commit, unless I made some mistake. ip_fw.h has changed, though not in the userland-visible part. Files touched by this commit: conf/files now references the two new source files netinet/ip_fw.h remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h. netinet/ipfw/ip_fw_private.h new file with kernel-specific ipfw definitions netinet/ipfw/ip_fw_log.c ipfw_log and related functions netinet/ipfw/ip_fw_dynamic.c code related to dynamic rules netinet/ipfw/ip_fw2.c removed the pieces that goes in the new files netinet/ipfw/ip_fw_nat.c minor rearrangement to remove LOOKUP_NAT from the main headers. This require a new function pointer. A bunch of other kernel files that included netinet/ip_fw.h now require netinet/ipfw/ip_fw_private.h as well. Not 100% sure i caught all of them. MFC after: 1 month	2009-12-15 16:15:14 +00:00
Luigi Rizzo	614cb83990	Move the scan for max_keylen into route.c::route_init(), and make max_keylen an argument for rn_init(). This removes an unnecessary dependency on domain.h from radix.c MFC after: 7 days	2009-12-14 20:12:51 +00:00
Bjoern A. Zeeb	de0bd6f76b	Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error. Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack. We cannot change the classic jailed() call to do that, as it is used outside the network stack as well. Discussed with: julian, zec, jamie, rwatson (back in Sept) MFC after: 5 days	2009-12-13 13:57:32 +00:00
Luigi Rizzo	a50f6188de	Make the code buildable in userland so it is easier to test it: this requires a small reordering of headers and a few #defines to map functions not available in userland. Remove a useless #ifndef block at the beginning of the file. Introduce (temporarily) rn_init2(), see the comment in the code for the proper long term change. No ABI or functional change. MFC after: 7 days	2009-12-12 15:49:28 +00:00
Luigi Rizzo	22efc80fd8	No functional changes (who dares to touch this code!) but: - cast the result of LEN() to int as this is the main usage. - use LEN() in one place where it was forgotten. - Document the use of a static variable in rw mode. More small changes to follow. MFC after: 7 days	2009-12-10 10:34:30 +00:00
John Baldwin	34605f8542	Remove if_timer/if_watchdog now that they are no longer used. The space used by if_timer is reserved for expanding if_index to an int in the future. Reviewed by: rwatson, brooks	2009-11-30 21:25:57 +00:00
Jung-uk Kim	c12b965f99	General style cleanup, no functional change.	2009-11-20 21:12:40 +00:00
Jung-uk Kim	5ecf77367c	- Allocate scratch memory on stack instead of pre-allocating it with the filter as we do from bpf_filter()[1]. - Revert experimental use of contigmalloc(9)/contigfree(9). It has no performance benefit over malloc(9)/free(9)[2]. Requested by: rwatson[1] Pointed out by: rwatson, jhb, alc[2]	2009-11-20 18:49:20 +00:00
Jung-uk Kim	ae4fdab8a8	- Change internal function bpf_jit_compile() to return allocated size of the generated binary and remove page size limitation for userland. - Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make sure the generated binary aligns properly and make it physically contiguous.	2009-11-18 23:40:19 +00:00
Jung-uk Kim	366652f987	- Make BPF JIT compiler working again in userland. We are limiting size of generated native binary to page size for now. - Update copyright date and fix some style nits.	2009-11-18 19:26:17 +00:00
Michael Tuexen	7f2797200f	Fix a LOR showing up with sctp_bsd_addr(): Do not hold a rt lock when calling rt_newaddrmsg(). Reviewed by: qingli Approved by: rrs (mentor) MFC after: 1 month	2009-11-17 12:57:10 +00:00
Xin LI	1a9d4dda9b	Revert revision 199201 for now as it has introduced a kernel vulnerability and requires more polishing.	2009-11-12 19:02:10 +00:00
Xin LI	41c8c6e876	Add interface description capability as inspired by OpenBSD. MFC after: 3 months	2009-11-11 21:30:58 +00:00
John Baldwin	e1b17582f4	Take a step towards removing if_watchdog/if_timer. Don't explicitly set if_watchdog/if_timer to NULL/0 when initializing an ifnet. if_alloc() sets those members to NULL/0 already.	2009-11-06 14:55:01 +00:00
Robert Watson	974e99b008	Remove unneeded blank line from bpf_drvinit(). MFC after: 3 days	2009-10-23 17:26:29 +00:00
Christian Brueffer	4382b0681e	Check pointer for NULL before dereferencing it, not after. PR: 138390 Submitted by: Patroklos Argyroudis <argp@census-labs.com> MFC after: 1 week	2009-10-22 06:17:04 +00:00
Qing Li	fc02477e1c	Verify "smp_started" is true before calling sched_bind() and sched_unbind(). Reviewed by: kmacy MFC after: 3 days	2009-10-22 00:32:01 +00:00
Qing Li	48d0c039cb	The flow-table function flowtable_route_flush() may be called during system initialization time. Since the flow-table is designed to maintain per CPU flow cache, the existing code did not check whether "smp_started" is true before calling sched_bind() and sched_unbind(), which triggers a page fault. Reviewed by: jeff MFC after: immediately	2009-10-20 21:27:03 +00:00
Robert Watson	cee8119875	Clean up comments, white space, and style in pfil.c (especially new VNET bits). MFC after: 3 days (not VNET bits)	2009-10-19 15:19:14 +00:00
Robert Watson	23b5fd2285	Remove unused pfil_flags field in packet_filter_hook. MFC after: 3 days	2009-10-18 22:54:09 +00:00
Robert Watson	c9ddf688b6	Sort function prototypes in pfil.h, clean up white space, and better align fields for printing. MFC after: 3 days	2009-10-18 22:43:28 +00:00
Robert Watson	33c89765f1	Line-wrap pfil.c so that it prints more nicely. MFC after: 3 days	2009-10-18 11:27:34 +00:00
Bjoern A. Zeeb	382e8b5ad9	Unbreak the VIMAGE build with IPSEC, broken with r197952 by virtualizing the pfil hooks. For consistency add the V_ to virtualize the pfil hooks in here as well. MFC after: 55 days X-MFC after: julian MFCed r197952.	2009-10-14 11:55:55 +00:00
Julian Elischer	0b4b0b0fee	Virtualize the pfil hooks so that different jails may choose different packet filters. Also allows ipfw to be enabled on one jail and disabled on another. In 8.0 it's a global setting. Sitting around in tree waiting to commit for: 2 months MFC after: 2 months	2009-10-11 05:59:43 +00:00
Bjoern A. Zeeb	db44ff4047	Put #ifdef INET around parts of the FLOWTABLE code, to unbreak nooptions INET kernel builds. MFC after: 3 days X-MFC: with r197687	2009-10-03 10:56:03 +00:00
Qing Li	e5c610d659	The flow-table associates TCP/UDP flows and IP destinations with specific routes. When the routing table changes, for example, when a new route with a more specific prefix is inserted into the routing table, the flow-table is not updated to reflect that change. As such existing connections cannot take advantage of the new path. In some cases the path is broken. This patch will update the affected flow-table entries when a more specific route is added. The route entry is properly marked when a route is deleted from the table. In this case, when the flow-table performs a search, the stale entry is updated automatically. Therefore this patch is not necessary for route deletion. Submitted by: simon, phk Reviewed by: bz, kmacy MFC after: 3 days	2009-10-01 20:32:29 +00:00
Qing Li	46e7f9838b	A wrong variable is used when setting up the interface address route, which broke source address selection in some code paths. Submitted by: noted by bz Reviewed by: hrs MFC after: immediately	2009-09-20 17:22:19 +00:00
Marko Zec	38d61195b8	Style fix - break too long a line in two. Spotted by: bz MFC after: 3 days	2009-09-18 09:03:23 +00:00
Marko Zec	989e04112b	V_irtualize the lltables list, making ARP and ND reasonably usable again with options VIMAGE kernels. Submitted by: bz (the original version, probably identical to this one) Reviewed by: many @ DevSummit Cambridge MFC after: 3 days	2009-09-17 14:52:15 +00:00
Qing Li	9bb7d0f47a	Self pointing routes are installed for configured interface addresses and address aliases. After an interface is brought down and brought back up again, those self pointing routes disappeared. This patch ensures after an interface is brought back up, the loopback routes are reinstalled properly. Reviewed by: bz MFC after: immediately	2009-09-15 19:18:34 +00:00
Robert Watson	e76d823b81	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks	2009-09-12 20:03:45 +00:00
Ed Maste	1bdc73d337	Compare pointer with NULL, not 0.	2009-09-09 03:36:43 +00:00
Navdeep Parhar	9a31144537	Add arp_update_event. This replaces route_arp_update_event, which has not worked since the arp-v2 rewrite. The event handler will be called with the llentry write-locked and can examine la_flags to determine whether the entry is being added or removed. Reviewed by: gnn, kmacy Approved by: gnn (mentor) MFC after: 1 month	2009-09-08 21:17:17 +00:00
Qing Li	d134008aa0	The addresses that are assigned to the loopback interface should be part of the kernel routing table. Reviewed by: bz MFC after: immediately	2009-09-05 20:24:37 +00:00
Qing Li	9452b0d2de	This patch fixes the following issues: - Interface link-local address is not reachable within the node that owns the interface, this is due to the mismatch in address scope as the result of the installed interface address loopback route. Therefore for each interface address loopback route, the rt_gateway field (of AF_LINK type) will be used to track which interface a given address belongs to. This will aid the address source to use the proper interface for address scope/zone validation. - The loopback address is not reachable. The root cause is the same as the above. - Empty nd6 entries are created for the IPv6 loopback addresses only for validation reason. Doing so will eliminate as much of the special case (loopback addresses) handling code as possible, however, these empty nd6 entries should not be returned to the userland applications such as the "ndp" command. Since both of the above issues contain common files, these files are committed together. Reviewed by: bz MFC after: immediately	2009-09-05 16:43:16 +00:00
George V. Neville-Neil	54fc657d59	Add ARP statistics to the kernel and netstat. New counters now exist for: requests sent, replies sent, requests received, replies received, packets received, total packets dropped due to no ARP entry, entries timed out, duplicate IPs seen. The new statistics are seen in the netstat command when it is given the -s command line switch. MFC after: 2 weeks In collaboration with: bz	2009-09-03 21:10:57 +00:00
Qing Li	5311e988ea	As part of r196609, a call to "rtalloc" did not take the fib into account. So call the appropriate "rtalloc_ign_fib()" instead of calling "rtalloc_ign()". Reviewed by: pointed out by bz MFC after: immediately	2009-08-31 00:14:37 +00:00
Marko Zec	a99fcfd4ca	Introduce a separate sx lock for protecting lists of vnet sysinit and sysuninit handlers. Previously, sx_vnet, which is a lock designated for protecting the vnet list, was (ab)used for protecting vnet sysinit / sysuninit handler lists as well. Holding exclusively the sx_vnet lock while invoking sysinit and / or sysuninit handlers turned out to be problematic, since some of the handlers may attempt to wake up another thread and wait for it to walk over the vnet list, hence acquire a shared lock on sx_vnet, which in turn leads to a deadlock. Protecting vnet sysinit / sysuninit lists with a separate lock mitigates this issue, which was first observed with flowtable_flush() / flowtable_cleaner() in sys/net/flowtable.c. Reviewed by: rwatson, jhb MFC after: 3 days	2009-08-28 22:30:55 +00:00
Qing Li	9231d35f4d	In ip_output(), the flow-table module must not try to cache L2/L3 information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type. Since the L2 information (rt_lle) is invalid for these interface types, accidental caching attempt will trigger panic when the invalid rt_lle reference is accessed. When installing a new route, or when updating an existing route, the user supplied gateway address may be an interface address (this is particularly true for point-to-point interface related modules such as ppp, if_tun, if_gif). Currently the routing command handler always sets the RTF_GATEWAY flag if the gateway address is given as part of the command parameters. Therefore the gateway address must be verified against interface addresses or else the route would be treated as an indirect route, thus making that route unusable. Reviewed by: kmacy, julia, rwatson Verified by: marcus MFC after: 3 days	2009-08-28 07:01:09 +00:00
Robert Watson	ed2dabfc68	Add IFNET_HOLD reserved pointer value for the ifindex ifnet array, which allows an index to be reserved for an ifnet without making the ifnet available for management operations. Use this in if_alloc() while the ifnet lock is released between initial index allocation and completion of ifnet initialization. Add ifindex_free() to centralize the implementation of releasing an ifindex value. Use in if_free() and if_vmove(), as well as when releasing a held index in if_alloc(). Reviewed by: bz MFC after: 3 days	2009-08-26 11:13:10 +00:00
Robert Watson	61f6986b07	Break out allocation of new ifindex values from if_alloc() and if_vmove(), and centralize in a single function ifindex_alloc(). Assert the IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not close all known races in this code. Reviewed by: bz MFC after: 3 days	2009-08-25 20:21:16 +00:00
Robert Watson	dc56e98f0d	Use locks specific to the lltable code, rather than borrow the ifnet list/index locks, to protect link layer address tables. This avoids lock order issues during interface teardown, but maintains the bug that sysctl copy routines may be called while a non-sleepable lock is held. Reviewed by: bz, kmacy MFC after: 3 days	2009-08-25 09:52:38 +00:00
Jack F Vogel	3de029efaf	When bridging, LRO is causing a problem; the belief that it would work as long as all interfaces have TSO seems to be false. Until the matter gets sorted out, just disable LRO completely.	2009-08-24 21:04:51 +00:00
Robert Watson	8e937462f4	Make if_grow static -- it's not used outside of if.c, and with the internals destined to change, it's better if it remains that way. MFC after: 3 days	2009-08-24 12:52:05 +00:00
Marko Zec	52db6805ea	When moving ifnets from one vnet to another, and the ifnet has ifaddresses of AF_LINK type which thus have an embedded if_index "backpointer", we must update that if_index backpointer to reflect the new if_index that our ifnet just got assigned. This change affects only options VIMAGE builds. Submitted by: bz Reviewed by: bz Approved by: re (rwatson), julian (mentor)	2009-08-24 10:14:09 +00:00
Robert Watson	6852110b64	Rather than using IFNET_RLOCK() when iterating over (and modifying) the ifnet list during if_ef load, directly acquire the ifnet_sxlock exclusively. That way when if_alloc() recurses the lock, it's a write recursion rather than a read->write recursion. This code structure is arguably a bug, so add a comment indicating that this is the case. Post-8.0, we should fix this, but this commit resolves panic-on-load for if_ef. Discussed with: bz, julian Reported by: phk MFC after: 3 days	2009-08-23 21:00:21 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stabilize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Julian Elischer	cd81cd3fd1	Don't allow access to the internals until it has all been set up. Specifically, not until the per-vnet parts have been set up. Submitted by: kmacy@ Reviewed by: julian@, zec@ Approved by: re(rwatson) MFC after: immediately	2009-08-21 09:22:32 +00:00
Kip Macy	6d37c3ecd9	This change fixes a comment and addresses a complaint by kib@ by moving a frequently executed flowtable syslog statement from being conditional on bootverbose to conditional on a per-vnet flowtable sysctl. Approved by: re@	2009-08-19 20:13:09 +00:00
Kip Macy	3ee42584f9	- change the interface to flowtable_lookup so that we don't rely on the mbuf for obtaining the fib index - check that a cached flow corresponds to the same fib index as the packet for which we are doing the lookup - at interface detach time flush any flows referencing stale rtentrys associated with the interface that is going away (fixes reported panics) - reduce the time between cleans in case the cleaner is running at the time the eventhandler is called and the wakeup is missed less time will elapse before the eventhandler returns - separate per-vnet initialization from global initialization (pointed out by jeli@) Reviewed by: sam@ Approved by: re@	2009-08-18 20:28:58 +00:00
Kip Macy	d53e359b9a	fix netboot issue by disabling flowtable lookups until initialization has been run Reviewed by: rwatson@ Approved by: re@	2009-08-17 19:09:28 +00:00
Robert Watson	d931ea0961	Remove unused if_rawoutput() macro; it has been unused since at least FreeBSD 2. Approved by: re (kib)	2009-08-15 22:26:26 +00:00
Marko Zec	9abb486279	Appease VNET_DEBUG - in if_vmove we temporarily switch i.e. recurse from one vnet to another which is OK, so no need to flood the console with warnings here. Approved by: re (rwatson), julian (mentor)	2009-08-14 22:46:45 +00:00
Marko Zec	67addcde86	Make VNET_DEBUG a standalone compile-time option, i.e. decouple it from INVARIANTS. Reviewed by: bz Approved by: re (rwatson), julian (mentor)	2009-08-14 22:41:39 +00:00
Bjoern A. Zeeb	eb79e1c76e	Make it possible to change the vnet sysctl variables on jails with their own virtual network stack. Jails only inheriting a network stack cannot change anything that cannot be changed from within a prison. Reviewed by: rwatson, zec Approved by: re (kib)	2009-08-13 10:26:34 +00:00
Bjoern A. Zeeb	20b0cdb749	Put multiple instructions into a block when iterating; unbreaks NET_RT_DUMP, which otherwise only returned information of AF_MAX. This was broken in r193232 (save your time - my bug, my fix). PR: kern/137700 Reported by: Larry Baird (lab gta.com) Tested by: Larry Baird (lab gta.com) Reviewed by: zec, lstewart, qing Approved by: re (kib)	2009-08-13 09:29:52 +00:00
Jung-uk Kim	a36599cce7	Always embed pointer to BPF JIT function in BPF descriptor to avoid inconsistency when opt_bpf.h is not included. Reviewed by: rwatson Approved by: re (rwatson)	2009-08-12 17:28:53 +00:00
Bjoern A. Zeeb	281c86a4ef	Update DDB show vnet command to print all used and available information. Reviewed by: rwatson, zec Approved by: re	2009-08-12 12:00:21 +00:00
Bjoern A. Zeeb	1b501e53f3	Put minimum alignment on the dpcpu and vnet section so that ld when adding the __start_ symbol knows the expected section alignment and can place the __start_ symbol correctly. These sections will not support symbols with super-cache line alignment requirements. For full details, see posting to freebsd-current, 2009-08-10, Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>. Debugging and testing patches by: Kamigishi Rei (spambox haruhiism.net), np, lstewart, jhb, kib, rwatson Tested by: Kamigishi Rei, lstewart Reviewed by: kib Approved by: re	2009-08-12 10:26:03 +00:00
Robert Watson	315e3e38fa	Many network stack subsystems use a single global data structure to hold all pertinent statistics for the subsystem. These structures are sometimes "borrowed" by kernel modules that require a place to store statistics for similar events. Add KPI accessor functions for statistics structures referenced by kernel modules so that they no longer encode certain specifics of how the data structures are named and stored. This change is intended to make it easier to move to per-CPU network stats following 8.0-RELEASE. The following modules are affected by this change: if_bridge if_cxgb if_gif ip_mroute ipdivert pf In practice, most of these statistics consumers should, in fact, maintain their own statistics data structures rather than borrowing structures from the base network stack. However, that change is too aggressive for this point in the release cycle. Reviewed by: bz Approved by: re (kib)	2009-08-02 19:43:32 +00:00
Robert Watson	6aad5c1c93	The colour was red as shall be the letters of this warning to people upon boot if the experimental VIMAGE feature was compiled into the kernel. Submitted by: bz Reviewed by: zec Approved by: re (vimage blanket)	2009-08-01 22:22:45 +00:00
Robert Watson	c8f6a13820	Minor style tweaks. Approved by: re (vimage blanket)	2009-08-01 21:58:32 +00:00
Robert Watson	6bc2c7b70c	Make the vnet alloc/destroy paths a bit easier to follow by merging vnet_data_init/vnet_data_destroy into vnet_alloc/vnet_destroy. Reviewed by: bz, zec Approved by: re (vimage blanket)	2009-08-01 21:54:15 +00:00
Robert Watson	7429a3f3d8	Remove vnet_foreach() utility function, which previously allowed vnet.c to iterate virtual network stacks without being aware of the implementation details previously hidden in kern_vimage.c. Now they are in the same file, so remove this added complexity. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 20:24:45 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Robert Watson	ed3db012fc	Reorder and recomment vnet.c and vnet.h on the basis that they are no longer solely about the virtual network stack memory allocator. Approved by: re (vimage blanket)	2009-07-30 12:41:19 +00:00
Robert Watson	a9bcca799e	Revise header comments for vnet.h as we now implement VNET_SYSINIT, not just VNET_DEFINE in vnet.h. Approved by: re (vimage blanket)	2009-07-28 22:17:34 +00:00

2954 Commits