freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	cad582b753	Better comment for ifa_init(), ifa_ref(), ifa_free().	2012-02-05 08:53:05 +00:00
Gleb Smirnoff	adc7231d5d	In ifa_init() initialize if_data.ifi_datalen. This would be required after upcoming changes from bz@. Discussed with: bz	2012-02-05 08:31:15 +00:00
Bjoern A. Zeeb	81d5d46b3c	Add multi-FIB IPv6 support to the core network stack supplementing the original IPv4 implementation from r178888: - Use RT_DEFAULT_FIB in the IPv4 implementation where noticed. - Use rtfib() KPI with explicit RT_DEFAULT_FIB where applicable in the NFS code. - Use the new in6_rt KPI in TCP, gif(4), and the IPv6 network stack where applicable. - Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally prevent multiple initializations of callouts in in6_inithead(). - Use wrapper functions where needed to preserve the current KPI to ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup. - Fix (related) comments (both technical or style). - Convert to rtinit() where applicable and only use custom loops where currently not possible otherwise. - Multicast group, most neighbor discovery address actions and faith(4) are locked to the default FIB. Individual IPv6 addresses will only appear in the default FIB, however redirect information and prefixes of connected subnets are automatically propagated to all FIBs by default (mimicking IPv4 behavior as closely as possible). Sponsored by: Cisco Systems, Inc.	2012-02-03 13:08:44 +00:00
Bjoern A. Zeeb	a84986256f	Move a comment from rtinit1() to the top of the file where dealing with the (maximum) number of FIBs trying to clarify that evetually FIBs should probably attached to domain(9) specific storage. [1] Add a comment on a limitimation on the rt_add_addr_allfibs option. Use RT_DEFAULT_FIB instead of 0 where applicable. Add empty line to functions without local variables per style. Put public yet unused in-tree function rtinit_fib() under BURN_BRIDGES to indicate that it might go away in the future. No functional change. Discussed with: julian [1] (clarification on what the original one meant) Sponsored by: Cisco Systems, Inc.	2012-02-03 12:25:14 +00:00
Bjoern A. Zeeb	b3dd077152	Minor optimization doing input validation with a possible early return before doing further work. Sponsored by: Cisco Systems, Inc.	2012-02-03 11:20:11 +00:00
Bjoern A. Zeeb	096f27864f	Fix FLOWTABLE IPv6 handling in route.c missed in r205066. While doing so, for consistency with the rtalloc_ign_fib(9) interface called, remove the "in_" prefix from rtalloc_ign_wrapper() no longer indicating that it would only handle the INET case. Sponsored by: Cisco Systems, Inc.	2012-02-03 10:17:34 +00:00
Bjoern A. Zeeb	b680a383a8	Allow for IPv6 to allocate (and in the VIMAGE case free) as many routing tables (FIBs) as IPv4. Prepare various general rt* functions for multi-FIB IPv6 handling in addition to already existing multi-FIB IPv4 cases. Sponsored by: Cisco Systems, Inc.	2012-02-03 09:23:55 +00:00
Bjoern A. Zeeb	556d81ddd7	Rather than putting magic 0s as FIB argument into the rt* calls, provide a macro RT_DEFAULT_FIB defined to 0 to more easily identify the cases tied to the default FIB. Sponsored by: Cisco Systems, Inc.	2012-02-03 09:06:24 +00:00
Kip Macy	0fe48d670f	A flowtable entry can continue referencing an llentry indefinitely if the entry is repeatedly referenced within its timeout window. This change clears the LLE_VALID flag when an llentry is removed from an interface's hash table and adds an extra check to the flowtable code for the LLE_VALID flag in llentry to avoid retaining and using a stale reference. Reviewed by: qingli@ MFC after: 2 weeks	2012-01-26 20:02:40 +00:00
Bjoern A. Zeeb	8d74af3668	Replace random ARIN direct assignment legacy IPs with proper RFC 5735 TEST-NET1 block for use in documentation and example code addresses. MFC after: 3 days	2012-01-24 15:20:31 +00:00
Eitan Adler	dde153da49	- Fix trivial typo Approved by: nwhitehorn MFC after: 3 days	2012-01-14 17:07:52 +00:00
Robert Watson	7983103ae6	Clarify throughout the vlan(4) code the difference between a "tag" (the 802.1q-defined 16-bit VID, CFI, and PCP field in host by order) and a VLAN ID (VID). Tags go in packets. VIDs identify VLANs. No functional change is intended, so this should be safe to MFC. Further cleanup with functional changes will be committed separately (for example, renaming vlan_tag/vlan_tag_p, which modify the KPI and KBI). Reviewed by: bz Sponsored by: ADARA Networks, Inc. MFC after: 3 days	2012-01-12 18:39:37 +00:00
Lawrence Stewart	9a7e6bac47	Consumers of bpfdetach() expect it to remove all bpf_if structs from the bpf_iflist list which reference the specified ifnet. The existing implementation only removes the first matching bpf_if found in the list, effectively leaking list entries if an ifnet has been bpfattach()ed multiple times with different DLTs. Fix the leak by performing the detach logic in a loop, stopping when all bpf_if structs referencing the specified ifnet have been detached and removed from the bpf_iflist list. Whilst here, also: - Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never exist in the list with a NULL ifnet pointer. - Except when INVARIANTS is in the kernel config, silently ignore the case where no bpf_if referencing the specified ifnet is found, as it is harmless and does not require user attention. Reviewed by: csjp MFC after: 1 week	2012-01-10 00:48:29 +00:00
John Baldwin	fbcebf7f71	Convert the per-interface address list lock from a mutex to a reader/writer lock. Reviewed by: bz	2012-01-09 19:34:12 +00:00
Gleb Smirnoff	c82c34b4a9	Copy ifa->if_data to ifam->ifam_data. This was forgotten in r228571. Submitted by: bz	2012-01-08 17:11:53 +00:00
Gleb Smirnoff	94901d5e60	Move arprequest() declaration to if_ether.h.	2012-01-08 13:34:00 +00:00
Gleb Smirnoff	dcb39bd84a	Since r228571 CARP is no longer an interface.	2012-01-06 12:05:43 +00:00
John Baldwin	137f91e80f	Convert all users of IF_ADDR_LOCK to use new locking macros that specify either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks	2012-01-05 19:00:36 +00:00
John Baldwin	a2cb1d522b	Add new variants of the IF_ADDR_LOCK() macros used for protecting interface address lists that distinguish read locks from write locks. To preserve the KPI, the previous operations are mapped to the write lock macros. The lock is still kept as a mutex for now. Reviewed by: bz MFC after: 2 weeks	2012-01-05 18:35:49 +00:00
Robert Watson	5a39f779b2	Refine last comment. Submitted by: joeld Sponsored by: ADARA Networks, Inc. MFC after: 3 days	2012-01-05 11:42:34 +00:00
Robert Watson	15f6780ef4	Add comment to the VLAN code about its integration with VIMAGE: we see what the code is doing, we recognise the legitimacy of its goal, but we're not quite sure it's going about it the right way. More pondering is clearly required. Sponsored by: ADARA Networks, Inc. Discussed with: bz MFC after: 3 days	2012-01-05 11:24:22 +00:00
Lawrence Stewart	253a3814d4	Revert r228986 until it can be reworked to avoid panicing the kernel when the same interface is attached multiple times with different DLTs, as is done in net80211 for example. Reported by: adrian	2011-12-31 07:21:28 +00:00
Lawrence Stewart	0f89fc22f3	- Introduce the net.bpf.tscfg sysctl tree and associated code so as to make one aspect of time stamp configuration per interface rather than per BPF descriptor. Prior to this, the order in which BPF devices were opened and the per descriptor time stamp configuration settings could cause non-deterministic and unintended behaviour with respect to time stamping. With the new scheme, a BPF attached interface's tscfg sysctl entry can be set to "default", "none", "fast", "normal" or "external". Setting "default" means use the system default option (set with the net.bpf.tscfg.default sysctl), "none" means do not generate time stamps for tapped packets, "fast" means generate time stamps for tapped packets using a hz granularity system clock read, "normal" means generate time stamps for tapped packets using a full timecounter granularity system clock read and "external" (currently unimplemented) means use the time stamp provided with the packet from an underlying source. - Utilise the recently introduced sysclock_getsnapshot() and sysclock_snap2bintime() KPIs to ensure the system clock is only read once per packet, regardless of the number of BPF descriptors and time stamp formats requested. Use the per BPF attached interface time stamp configuration to control if sysclock_getsnapshot() is called and whether the system clock read is fast or normal. The per BPF descriptor time stamp configuration is then used to control how the system clock snapshot is converted to a bintime by sysclock_snap2bintime(). - Remove all FAST related BPF descriptor flag variants. Performing a "fast" read of the system clock is now controlled per BPF attached interface using the net.bpf.tscfg sysctl tree. - Update the bpf.4 man page. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ In collaboration with: Julien Ridoux (jridoux at unimelb edu au)	2011-12-30 08:57:58 +00:00
Pyun YongHyeon	1ad7a2570d	Update if_obytes and if_omcast after successful transmit. While I'm here update if_oerrors if parent interface of vlan is not up and running. Previously it updated collision counter and it was confusing to interprete it. PR: kern/163478 Reviewed by: glebius, jhb Tested by: Joe Holden < lists <> rewt dot org dot uk >	2011-12-29 18:40:58 +00:00
Gleb Smirnoff	7121247312	Provide ABI compatibility shim to enable configuring of addresses with ifconfig(8) prior to r228571. Requested by: brooks	2011-12-21 12:39:08 +00:00
Gleb Smirnoff	f08535f872	Restore a feature that was present in 5.x and 6.x, and was cleared in 7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP preemption, while it is running its bulk update. However, reimplement the feature in more elegant manner, that is partially inspired by newer OpenBSD: - Rename term "suppression" to "demotion", to match with OpenBSD. - Keep a global demotion factor, that can be raised by several conditions, for now these are: - interface goes down - carp(4) has problems with ip_output() or ip6_output() - pfsync performs bulk update - Unlike in OpenBSD the demotion factor isn't a counter, but is actual value added to advskew. The adjustment values for particular error conditions are also configurable, and their defaults are maximum advskew value, so a single failure bumps demotion to maximum. This is for POLA compatibility, and should satisfy most users. - Demotion factor is a writable sysctl, so user can do foot shooting, if he desires to.	2011-12-20 13:53:31 +00:00
Gleb Smirnoff	08b68b0e4c	A major overhaul of the CARP implementation. The ip_carp.c was started from scratch, copying needed functionality from the old implemenation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces, they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics has been changed too: now one doesn't need to clone carpXX interface, he/she should directly configure a vhid on a Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(8) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows to run a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]	2011-12-16 12:16:56 +00:00
Gleb Smirnoff	f3909e37ff	Simplify rtrequest(RTM_ADD): ifa can't be NULL after rt_getifa_fib().	2011-12-15 12:49:10 +00:00
Brooks Davis	f26fa169e7	Remove the unused if_free_type() function. X-MFC after: never	2011-12-09 23:26:28 +00:00
Luigi Rizzo	506cc70cce	1. Fix the handling of link reset while in netmap more. A link reset now is completely transparent for the netmap client: even if the NIC resets its own ring (e.g. restarting from 0), the client will not see any change in the current rx/tx positions, because the driver will keep track of the offset between the two. 2. make the device-specific code more uniform across different drivers There were some inconsistencies in the implementation of the netmap support routines, now drivers have been aligned to a common code structure. 3. import netmap support for ixgbe . This is implemented as a very small patch for ixgbe.c (233 lines, 11 chunks, mostly comments: in total the patch has only 54 lines of new code) , as most of the code is in an external file sys/dev/netmap/ixgbe_netmap.h , following some initial comments from Jack Vogel about making changes less intrusive. (Note, i have emailed Jack multiple times asking if he had comments on this structure of the code; i got no reply so i assume he is fine with it). Support for other drivers (em, lem, re, igb) will come later. "ixgbe" is now the reference driver for netmap support. Both the external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should serve as a reference for other device drivers. Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap, the sender does 14.88 Mpps at 1050 Mhz and 14.2 Mpps at 900 MHz on an i7-860 with 4 cores and 82599 card. Haven't tried yet more aggressive optimizations such as adding 'prefetch' instructions in the time-critical parts of the code.	2011-12-05 12:06:53 +00:00
Lawrence Stewart	3e47c78798	Revert r227778 in preparation for committing reworked patches in its place.	2011-11-29 12:55:26 +00:00
John Baldwin	d9b1d61535	Change the if_vlan driver to use if_transmit for forwarding packets to the parent interface. This avoids the overhead of queueing a packet to an IFQ only to immediately dequeue it again. Suggested by: np Reviewed by: brooks MFC after: 1 month	2011-11-28 19:35:08 +00:00
Gleb Smirnoff	2e9fff5b18	- Use generic alloc_unr(9) allocator for if_clone, instead of hand-made. - When registering new cloner, check whether a cloner with same name already exist. - When allocating unit, also check with help of ifunit() whether such interface already exist or not. [1] PR: kern/162789 [1]	2011-11-28 14:44:59 +00:00
Gleb Smirnoff	c0ba290b5f	Improve logging: - don't hardcode function name - use LOG_DEBUG for such a debug message - print error value	2011-11-22 19:42:17 +00:00
Lawrence Stewart	b6f1c7db32	- When feed-forward clock support is compiled in, change the BPF header to contain both a regular timestamp obtained from the system clock and the current feed-forward ffcounter value. This enables new possibilities including comparison of timekeeping performance and timestamp correction during post processing. - Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping packets using the feedback or feed-forward system clock. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-21 04:17:24 +00:00
Luigi Rizzo	68b8534bdf	Bring in support for netmap, a framework for very efficient packet I/O from userspace, capable of line rate at 10G, see http://info.iet.unipi.it/~luigi/netmap/ At this time I am bringing in only the generic code (sys/dev/netmap/ plus two headers under sys/net/), and some sample applications in tools/tools/netmap. There is also a manpage in share/man/man4 [1] In order to make use of the framework you need to build a kernel with "device netmap", and patch individual drivers with the code that you can find in sys/dev/netmap/head.diff The file will go away as the relevant pieces are committed to the various device drivers, which should happen in a few days after talking to the driver maintainers. Netmap support is available at the moment for Intel 10G and 1G cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re"). I have partial patches for "bge" and am starting to work on "cxgbe". Hopefully changes are trivial enough so interested third parties can submit their patches. Interested people can contact me for advice on how to add netmap support to specific devices. CREDITS: Netmap has been developed by Luigi Rizzo and other collaborators at the Universita` di Pisa, and supported by EU project CHANGE (http://www.change-project.eu/) The code is distributed under a BSD Copyright. [1] In my opinion is a bad idea to have all manpage in one directory. We should place kernel documentation in the same dir that contains the code, which would make it much simpler to keep doc and code in sync, reduce the clutter in share/man/ and incidentally is the policy used for all of userspace code. Makefiles and doc tools can be trivially adjusted to find the manpages in the relevant subdirs.	2011-11-17 12:17:39 +00:00
Robert Millan	ea4d9a14f1	Remove a few bits of FreeBSD 2.x compatibility code. Approved by: kib (mentor)	2011-11-14 18:21:27 +00:00
Brooks Davis	4b22573a89	In r191367 the need for if_free_type() was removed and a new member if_alloctype was used to store the origional interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free(). MFC after: 1 Month	2011-11-11 22:57:52 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Max Laier	3ca1a2d6a0	Fix a use-after-free/redzone issue in the routing code. Reported by (repeatedly): Mike Tancsa Prodded by (repeatedly): bz Forgotten by (repeatedly): mlaier MFC after: 2 weeks	2011-11-03 18:33:30 +00:00
Gleb Smirnoff	a0af7c3edb	Add macro IF_DEQUEUE_ALL(ifq, m), that takes the entire mbuf chain off the queue. It can be utilized in queue processing to avoid multiple locking/unlocking.	2011-10-27 09:45:12 +00:00
Qing Li	46a70de2b0	The host-id/interface-id can have a specific value and is properly masked out when adding a prefix route through the "route" command. However, when deleting the route, simply changing the command keyword from "add" to "delete" does not work. The failoure is observed in both IPv4 and IPv6 route insertion. The patch makes the route command behavior consistent between the "add" and the "delete" operation. MFC after: 1 week	2011-10-25 00:34:39 +00:00
Ed Schouten	cf05e311ea	Add missing #includes. According to POSIX, these two header files should be able to be included by themselves, not depending on other headers. The <net/if.h> header uses struct sockaddr when __BSD_VISIBLE=1, while <netinet/tcp.h> uses integer datatypes (u_int32_t, u_short, etc). MFC after: 2 months	2011-10-21 12:58:34 +00:00
Ed Schouten	a185bd12f3	Get rid of D_PSEUDO. It seems the D_PSEUDO flag was meant to allow make_dev() to return NULL. Nowadays we have a different interface for that; make_dev_p(). There's no need to keep it there. While there, remove an unneeded D_NEEDMINOR from the gpio driver. Discussed with: gonzo@ (gpio)	2011-10-18 08:09:44 +00:00
Bjoern A. Zeeb	528737fdfe	Pass the fibnum where we need filtering of the message on the rtsock allowing routing daemons to filter routing updates on an rtsock per FIB. Adjust raw_input() and split it into wrapper and a new function taking an optional callback argument even though we only have one consumer [1] to keep the hackish flags local to rtsock.c. PR: kern/134931 Submitted by: multiple (see PR) Suggested by: rwatson [1] Reviewed by: rwatson MFC after: 3 days	2011-09-28 13:48:36 +00:00
Kip Macy	1eeb6d97d0	Make KBI changes required for future MFCing of inpcb rtentry / llentry caching. Reviewed by: rwatson, bz Approved by: re (kib)	2011-09-20 20:27:26 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Andrew Thompson	0fe082e7d5	On the first loop for generating a bridge MAC address use the local hostid, this gives a good chance of keeping the same address over reboots. This is intended to help IPV6 and similar which generate their addresses from the mac. PR: kern/160300 Submitted by: mdodd Approved by: re (kib)	2011-09-04 22:06:32 +00:00
Bjoern A. Zeeb	3d07127c64	When adding IPv6 fwd support to ipfw in r225044 these two files were not committed. Initialize next_hop6 to align with the IPv4 code. PR: bin/117214 MFC after: 3 weeks X-MFC with: r225044 Approved by: re (kib)	2011-08-27 08:49:55 +00:00
Attilio Rao	6aba400a70	Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#. That happens because the selinfo interface has no way to drain the waiters before to destroy the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object. Fix this by adding the seldrain() routine which should be called before to destroy the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which installs selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements a full correct logic for preventing this from happening. Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks	2011-08-25 15:51:54 +00:00
Qing Li	fc96aabef1	When the RADIX_MPATH kernel option is enabled, the RADIX_MPATH code tries to find the first route node of an ECMP chain before executing the route command. If the system has a default route, and the specific route argument to the command does not exist in the routing table, then the default route would be reached. The current code does not verify the reached node matches the given route argument, therefore erroneous removed the entry. This patch fixes that bug. Approved by: re MFC after: 3 days	2011-08-25 04:31:20 +00:00
Kevin Lo	e9ff3d45e4	In rtinit1(), before rtrequest1_fib() is called, info.rti_flags is initialized by flags (function argument) or-ed with ifa->ifa_flags. If both NIC has a loopback route to itself, so IFA_RTSELF is set on ifa(s). As IFA_RTSELF is defined by RTF_HOST, rtrequest1_fib() is called with RTF_HOST flag even if netmask is not NULL. Consequently, netmask is set to zero in rtrequest1_fib(), and request to add network route is changed under hands to request to add host route. Tested by: Andrew Boyer <aboyer at averesystems.com> Submitted by: Svatopluk Kraus <onwahe at gmail dot com> Approved by: re (hrs)	2011-08-08 05:25:51 +00:00
Sergey Kandaurov	c94a66f8ae	Add missing MODULE_VERSION() definition to protect against duplicating module loads. PR: kern/159345 Reported by: Eugene Grosbein <egrosbein att rdtc ru> Tested by: Eugene Grosbein <egrosbein att rdtc ru> Approved by: re (kib) MFC after: 1 week	2011-08-01 11:24:55 +00:00
Bjoern A. Zeeb	d9a362862c	Add spares to the network stack for FreeBSD-9: - TCP keep* timers - TCP UTO (adjust from what was there already) - netmap - route caching - user cookie (temporary to allow for the real fix) Slightly re-shuffle struct ifnet moving fields out of the middle of spares and to better align. Discussed with: rwatson (slightly earlier version)	2011-07-17 21:15:20 +00:00
Mark Peek	a4980a95b5	Clear the filter memory area before using it. Leaving it uninitialized may leak previous kernel stack contents through a malicioius BPF filter. PR: kern/158880 Submitted by: Guy Harris Obtained from: OpenBSD MFC after: 1 week	2011-07-14 21:06:22 +00:00
Marko Zec	13e255fab7	Permit ARP to proceed for IPv4 host routes for which the gateway is the same as the host address. This already works fine for INET6 and ND6. While here, remove two function pointers from struct lltable which are only initialized but never used. MFC after: 3 days	2011-07-08 09:38:33 +00:00
Andrew Thompson	6069a2c0bd	Grab the rlock before checking if our interface is enabled, it could be possible to hit a dead pointer when changing interfaces. PR: kern/156978 Submitted by: Andrew Boyer MFC after: 1 week	2011-07-07 20:02:09 +00:00
Bjoern A. Zeeb	a34c6aeb85	Tag mbufs of all incoming frames or packets with the interface's FIB setting (either default or if supported as set by SIOCSIFFIB, e.g. from ifconfig). Submitted by: Alexander V. Chernikov (melifaro ipfw.ru) Reviewed by: julian MFC after: 2 weeks	2011-07-03 16:08:38 +00:00
Bjoern A. Zeeb	43deddcdfe	Remove extra white space to comply with style for the rest of the struct. MFC after: 2 weeks	2011-07-03 15:34:09 +00:00
Bjoern A. Zeeb	35fd7bc020	Add infrastructure to allow all frames/packets received on an interface to be assigned to a non-default FIB instance. You may need to recompile world or ports due to the change of struct ifnet. Submitted by: cjsp Submitted by: Alexander V. Chernikov (melifaro ipfw.ru) (original versions) Reviewed by: julian Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru) MFC after: 2 weeks X-MFC: use spare in struct ifnet	2011-07-03 12:22:02 +00:00
Sergey Kandaurov	235195988b	Update ifc_len field of struct ifconf passed for the ioctl SIOCGIFCONF32 (i.e. under COMPAT_FREEBSD32) in case ifconf() returned success to match the native SIOCGIFCONF behavior. PR: kern/158369 Reported by: Paul Procacci <pprocacci att gmail com> MFC after: 1 week	2011-06-28 08:41:44 +00:00
Bjoern A. Zeeb	f5857e2d3d	Garbage collect never used global, sysctl, externs. MFC after: 1 week	2011-06-21 07:19:03 +00:00
Bjoern A. Zeeb	b8b8e0c981	Leave an extra comment about flowtable and IPv6 support rectifying a previous comment. MFC after: 1 week	2011-06-20 12:35:12 +00:00
Bjoern A. Zeeb	52dcd04ba3	gre(4) was using a field in the softc to detect possible recursion. On MP systems this is not a usable solution anymore and could easily lead to false positives triggering enough logging that even using the console was no longer usable (multiple parallel ping -f can do). Switch to the suggested solution of using mbuf tags to carry per packet state between gre_output() invocations. Contrary to the proposed solution modelled after gif(4) only allocate one mbuf tag per packet rather than per packet and per gre_output() pass through. As the sysctl to control the possible valid (gre in gre) nestings does no sanity checks, make sure to always allocate space in the mbuf tag for at least one, and at most 255 possible gre interfaces to detect loops in addition to the counter. Submitted by: Cristian KLEIN (cristi net.utcluj.ro) (original version) PR: kern/114714 Reviewed by: Cristian KLEIN (cristi net.utcluj.ro) Reviewed bu: Wooseog Choi (ben_choi hotmail.com) Sponsored by: Sandvine Incorporated MFC after: 1 week	2011-06-18 09:34:03 +00:00
Luigi Rizzo	c9d658e9f7	Grab one of the ifcap bits for netmap, and enable printing in ifconfig. Document the fact that we might want an IFCAP_CANTCHANGE mask, even though the value is not yet used in sys/net/if.c (asked on -current a week ago, no feedback so i assume no objection).	2011-06-14 12:40:55 +00:00
Marko Zec	2fe7ca2ca6	Set curvnet context in a callout-trigerred code path. MFC after: 3 days	2011-06-07 20:46:03 +00:00
John Baldwin	190367ef1c	Properly return an ENOBUFS error if a write to a tun(4) device fails due to m_uiotombuf() failing. While here, trim unneeded error handling related to tuninit() since it can never fail. Submitted by: Martin Birgmeier la5lbtyi aon at Reviewed by: glebius MFC after: 1 week	2011-06-03 13:47:05 +00:00
Robert Watson	6cb52192fe	Add an optional netisr dispatch point at ether_input(), but set the default dispatch method to NETISR_DISPATCH_DIRECT in order to force direct dispatch. This adds a fairly negligble overhead without changing default behavior, but in the future will allow deferred or hybrid dispatch to other worker threads before link layer processing has taken place. For example, this could allow redistribution using RSS hashes without ethernet header cache line hits, if the NIC was unable to adequately implement load balancing to too small a number of input queues -- perhaps due to hard queueset counts of 1, 3, or 8, but in a modern system with 16-128 threads. This can happen on highly threaded systems, where you want want an ithread per core, redistributing work to other queues, but also on virtualised systems where hardware hashing is (or is not) available, but only a single queue has been directed to one VCPU on a VM. Note: this adds a previously non-present assertion about the equivalence of the ifnet from which the packet is received, and the ifnet stamped in the mbuf header. I believe this assertion to generally be true, but we'll find out soon -- if it's not, we might have to add additional overhead in some cases to add an m_tag with the originating ifnet pointer stored in it. Reviewed by: bz MFC after: 3 weeks Sponsored by: Juniper Networks, Inc.	2011-06-01 20:00:25 +00:00
Nathan Whitehorn	d098f93019	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb	2011-05-31 15:11:43 +00:00
Robert Watson	f2d2d69438	Rework netisr policy mechanism so that per-protocol dispatch policies can be represented: - A single policy namespace is defined, consisting of four possible policies: "default" to use the global default, "deferred" to force deferred dispatch, "direct" to employ direct dispatch where possible, and "hybrid" which makes a dynamic decision based on CPU affinity, ordering, etc. Routines are implemented to convert between strings and an integer namespace. - A new global variable, netisr_dispatch_policy, subsumes existing global variables for direct dispatch, forced direct dispatch, etc, and is used for explicit policy interpretation and composition. Old variables remain so that they can be exported by legacy sysctls for use by old netstat(1) binaries. A new sysctl and tunable, netisr.dispatch.policy, accepts the above strings for specifying a global policy default. - The protocol registration structure, netisr_handler, grows an nh_dispatch field, which accepts a per-policy policy override. The default value is '0', which corresponds to "default", meaning that protocols will accept the global default policy unless otherwise specified. - Policies are now interpreted and composed explicitly at various points in packet dispatch; protocol policies override global policies. - Protocols grow the ability to express a non-opinion about affinity even when implenting m2cpuid by returning NETISR_CPUID_NONE. In that case, the framework falls back on source ordering, rather than simply using the current CPU. These changes are in support of allowing link layer re-dispatch based on RSS or similar hashes provided by NICs, especially in the case where the number of hardware receive queues matches hardware core count, rather than hardware thread count, requiring further software redistributeon. (i.e., on RMI XLR). MFC after: 3 weeks Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-24 12:34:19 +00:00
Marko Zec	9f8cab7fc2	Allow for vlan(4) interfaces with MTU of 1500 bytes to be configured on top of epair(4) virtual interfaces, since there's no physical hardware associated with epair interfaces which would imply any constraints on MTU sizes. MFC after: 3 days	2011-05-24 08:02:55 +00:00
Marko Zec	2dccdd4562	Let epair(4) virtual interfaces report fake link / media status, by borrowing the skeleton of if_media manipulation and reporting code from if_lagg(4). The main motivation behind this change is to allow for epair(4) interfaces to participate in STP if_bridge(4) configurations. Reviewed by: bz MFC after: 3 days	2011-05-24 07:57:28 +00:00
Qing Li	5b84dc789a	The statically configured (permanent) ARP entries are removed when an interface is brought down, even though the interface address is still valid. This patch maintains the permanent ARP entries as long as the interface address (having the same prefix as that of the ARP entries) is valid. Reviewed by: delphij MFC after: 5 days	2011-05-20 19:12:20 +00:00
Marius Strobl	d09c5f16b0	- Add 10baseT as an alias for 10baseT/UTP. - Add shorthand aliases for common media+option combinations as announced by miibus(4) so that one can actually supply the media strings found in the dmesg output to ifconfig(8). Obtained from: NetBSD (in principle) MFC after: 2 weeks	2011-05-15 12:58:29 +00:00
Pyun YongHyeon	d2d0470dc6	Fix white space nits and style	2011-05-06 20:46:29 +00:00
Pyun YongHyeon	26b8066bce	Do not increment collision counter if transmit have failed. Transmission error in tun(4) is queueing error(i.e. ENOBUFS) and it has nothing to do with collision. Reported by: Zeus V Panchenko (zeus <> ibs dot dn dot ua)	2011-05-06 20:37:07 +00:00
Andrew Thompson	627cecc5c9	LACP frames must not be send VLAN-tagged, check for that before processing. PR: kern/156743 Submitted by: Dmitrij Tejblum MFC after: 1 week	2011-04-30 20:34:52 +00:00
Bjoern A. Zeeb	a0ae8f04e8	Make various (pseudo) interfaces compile without INET in the kernel adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:30:44 +00:00
Gleb Smirnoff	4c506522c1	When removing ifnets, we should first remove the reference to ifnet from the interface index, then decrease refcount, not vice versa. Otherwise there is a race (reproducible) when if_free_internal() contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the index for a long time. It may be picked by some other thread, that runs ifnet_byindex_ref(), who takes the ifnet from index, and bumps refcount. When reader drops the lock, if_free_internal() proceeds with free. Then reader tries to free it a second time.	2011-04-04 07:45:08 +00:00
Jeff Roberson	e4cd31dd3c	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
Dmitry Chagin	2093339ead	Remove dead code. MFC after: 1 Week	2011-03-20 08:35:00 +00:00
Dmitry Chagin	e579f1c1cf	ouch, newrt is used on the return path, my fault. Partialy revert the previous change. MFC after: 1 Week.	2011-03-19 21:10:57 +00:00
Dmitry Chagin	523e60025b	A bit rearranged rtalloc1_fib() code. Initialize a variable when it is really needed. To avoid code duplication move the miss label to line up and jump on it. MFC after: 1 Week	2011-03-19 19:50:36 +00:00
Dmitry Chagin	6a873ef717	Remove a now unused variable. MFC after: 1 Week	2011-03-19 16:52:06 +00:00
Ermal Luçi	5f82cfdf6c	Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none. Approved by: thompsa(mentor) MFC after: 1 week	2011-03-04 20:37:38 +00:00
Bjoern A. Zeeb	e3416ab0c0	Hide the outer IP addresses of a tunnel interfaces (gif(4), gre(4)) from processes inside jails if the addresses do not belong to the jail. Originally reported by: Pieter de Boer via remko PR: kern/151119 Tested by: Piotr KUCHARSKI (nospam 42.pl) [gif] MFC after: 1 week	2011-03-02 21:39:08 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Bjoern A. Zeeb	1fb51a12f2	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
Bjoern A. Zeeb	144e6203ff	Mfp4 CH=177255: Resort the CURVNET_SET* macros in the non-VNET_DEBUG case to match the call order of the VNET_DEBUG case. Add the VNET_ASSERT() to the non-VNET_DEBUG case as well so that INVARIANTS will still catch problems. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 2 weeks	2011-02-11 14:17:58 +00:00
Bjoern A. Zeeb	0028e52461	Mfp4 CH=177255: Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS. Change the syntax to match KASSERT() to allow more flexible panic messages rather than having a printf with hardcoded arguments before panic. Adjust the few assertions we have to the new format (and enhance the output). Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 2 weeks	2011-02-11 13:27:00 +00:00
Bjoern A. Zeeb	6cf986ac19	Mfp4 CH=177255: Use __func__ rather than __FUNCTION__. MFC after: 2 weeks	2011-02-11 12:56:05 +00:00
Max Laier	826bf287b5	As info.rti_info[RTAX_DST] can point inside of rtm we must not free the rtm until rt_dispatch is done with the sockaddr. Found by: memguard MFC after: 3 days	2011-02-10 01:24:09 +00:00
John Baldwin	5f3b301a43	Fix a LOR by dropping the global ifnet locks while allocating a new ifnet table in if_grow(). The order of the SYSINIT's for ifnet state were swapped so that the various locks were initialized before being used. Reviewed by: pluknet, bz MFC after: 2 weeks	2011-01-24 22:21:58 +00:00
Matthew D Fleming	f8e4b4ef49	sysctl(8) should use the CTLTYPE to determine the type of data when reading. (This was already done for writing to a sysctl). This requires all SYSCTL setups to specify a type. Most of them are now checked at compile-time. Remove SYSCTL_X sysctl additions as the print being in hex should be controlled by the -x flag to sysctl(8). Succested by: bde	2011-01-19 17:04:07 +00:00
Matthew D Fleming	f88910cdf5	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the net* piece.	2011-01-12 19:53:50 +00:00
John Baldwin	58ccf5b41c	Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes. Reviewed by: bde	2011-01-11 13:59:06 +00:00
Bjoern A. Zeeb	9269189f98	MfP4 CH=185246 [1]: Add FEATURE() to announce optional VIMAGE. MFC after: 3 days [1] for the moment put it in vnet.c.	2011-01-09 20:40:21 +00:00
John Baldwin	a8f4344f08	- Restore dropping the priority of syncer down to PPAUSE when it is idle. This was lost when it was converted to using a condition variable instead of lbolt. - Drop the priority of flowtable down to PPAUSE when it is idle as well since it is a similar background task. MFC after: 2 weeks	2011-01-06 22:17:07 +00:00
Marius Strobl	a0fc3825c3	Teach ifconfig(8) the handy shared option shortcut aliases the NetBSD counterpart also takes, i.e. "fdx" for "full-duplex", "flow" for "flowcontrol", "hdx" for "half-duplex" as well as "loop" and "loopback" for "hw-loopback". MFC after: 1 week	2011-01-05 15:28:30 +00:00
Marius Strobl	8b28e7e1a3	Fix whitespace. MFC after: 1 week	2011-01-05 14:51:04 +00:00
Bjoern A. Zeeb	962be6dfb3	Use NULL rather than 0 to invalidate a pointer. Rather than duplicating the LLE_FREE_LOCKED() macro code in LLE_FREE(), call it directly (like we do for the RT_* macros). Sponsored by: ISPsystem [1] Reviewed by: julian [1] MFC After: 1 week [1] Early 2010.	2010-12-31 21:57:54 +00:00
Bjoern A. Zeeb	c9a2711a54	Print the vnet pointer under DDB when iterating over flowtables of each virtual network stack instance. Sponsored by: ISPsystem [1] Reviewed by: julian [1] MFC after: 1 week [1] Early 2010.	2010-12-31 21:20:32 +00:00
Bjoern A. Zeeb	f0a56b0678	Move the increment operation under the lock and split the condition variable into two so that we can see on which one we are waiting. This might also more properly propagate the update of the flowclean_cycles flag and avoid "hangs" people were seeing. Suggested by: rwatson [1] Sponsored by: ISPsystem [1] Reviewed by: julian [1] Updated by: Mikolaj Golub (to.my.trociny gmail.com) Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC After: 1 week [1] Early 2010, initial version.	2010-12-31 21:06:52 +00:00
Alan Cox	82de724fe1	Introduce and use a new VM interface for temporarily pinning pages. This new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel. In collaboration with: kib@	2010-12-25 21:26:56 +00:00
Weongyo Jeong	c5649739a5	Adds IFF_CANTCONFIG to IFF_CANTCHANGE that it shouldn't happen through ioctl(2).	2010-12-07 20:31:04 +00:00
Weongyo Jeong	6e3cb00068	Introduces IFF_CANTCONFIG interface flag to point that the interface isn't configurable in a meaningful way. This is for ifconfig(8) or other tools not to change code whenever IFT_USB-like interfaces are registered at the interface list. Reviewed by: brooks No objections: gavin, jkim	2010-12-07 20:23:47 +00:00
Maxim Konovalov	57542d0481	o Swap descriptions for net.bpf.bufsize and net.bpf.maxbufsize. PR: misc/152531 MFC after: 1 week	2010-11-24 05:50:19 +00:00
Marko Zec	ccf7ba972c	Allow for vlan(4) ifnets to have overlapping unit numbers if they are created in separated vnets. As a side-effect of having a separated if_cloner instance for each vnet, all vlan ifnets created in a vnet will be automatically destroyed when vnet teardown is initiated. Disallow SIOCSETVLAN and SIOCGETVLAN ioctls on vlan ifnets which are associated with physical ifnets residing in parent vnets. This is an interim vlan-specific solution which will be superseded by a more generic if_cloner V_irtualization change from p4. For nooptions VIMAGE builds, this should be a no-op change. Discussed with: bz MFC after: 3 days	2010-11-22 23:35:29 +00:00
Dimitry Andric	3e288e6238	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 \| dim \| 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) \| 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 \| dim \| 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) \| 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 \| dim \| 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) \| 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
Bjoern A. Zeeb	2c8b047c07	Add a missing ';' and change the debugging sysctl from xint to int. Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 3 days	2010-11-21 19:33:19 +00:00
Dimitry Andric	c3adda9fc3	Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined.	2010-11-14 20:40:55 +00:00
Dimitry Andric	31c6a0037e	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
Dimitry Andric	47d46d92c2	Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-14 20:23:02 +00:00
Marius Strobl	efd4fc3fb3	o Flesh out the generic IEEE 802.3 annex 31B full duplex flow control support in mii(4): - Merge generic flow control advertisement (which can be enabled by passing by MIIF_DOPAUSE to mii_attach(9)) and parsing support from NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD, IFM_FLOW isn't implemented as a global option via the "don't care mask" but instead as a media specific option this. This has the following advantages: o allows flow control advertisement with autonegotiation to be turned on and off via ifconfig(8) with the default typically being off (though MIIF_FORCEPAUSE has been added causing flow control to be always advertised, allowing to easily MFC this changes for drivers that previously used home-grown support for flow control that behaved that way without breaking POLA) o allows to deal with PHY drivers where flow control advertisement with manual selection doesn't work or at least isn't implemented, like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4), by setting MIIF_NOMANPAUSE o the available combinations of media options are readily available from the `ifconfig -m` output - Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so these are understood by ifconfig(8). o Make the master/slave support in mii(4) actually usable: - Change IFM_ETH_MASTER from being implemented as a global option via the "don't care mask" to a media specific one as it actually is only applicable to IFM_1000_T to date. - Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to actually configure manually selected slave mode (like we also do in the PHY specific implementations). - Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it is understood by ifconfig(8). o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4), e1000phy(4) and ip1000phy(4) to use the generic flow control support instead of home-grown solutions via IFM_FLAGs. This includes changing these PHY drivers and smcphy(4) to no longer unconditionally advertise support for flow control but only if the selected media has IFM_FLOW set (or MIIF_FORCEPAUSE is set) and implemented for these media variants, i.e. typically only for copper. o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0 and some IFM_FLAGn. o Switch brgphy(4) to add at least the the supported copper media based on the contents of the BMSR via mii_phy_add_media() instead of hardcoding them. The latter approach seems to have developed historically, besides causing unnecessary code duplication it was also undesirable because brgphy_mii_phy_auto() already based the capability advertisement on the contents of the BMSR though. o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not just BCM5701. Apparently this was a misinterpretation of a workaround in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but this doesn't mean we can't set these as well on other PHYs for manual media selection. o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so IFM_1000_T master mode support now is generally available with all PHY drivers. o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's not applicable there. Reviewed by: yongari (plus additional testing) Obtained from: NetBSD (partially), OpenBSD (partially) MFC after: 2 weeks	2010-11-14 13:26:10 +00:00
Konstantin Belousov	7b3b099e07	Use 'z' modifier for size_t printing.	2010-11-13 11:11:51 +00:00
Dimitry Andric	7e54af0831	Similar to r212647, remove the workaround in sys/net/vnet.h for an ld bug (incorrect placement of __start_SECNAME in some cases) that was fixed in r210245. There is already an UPDATING entry about needing a recent ld. MFC after: 1 month	2010-11-12 22:59:50 +00:00
George V. Neville-Neil	e162ea60d4	Add a queue to hold packets while we await an ARP reply. When a fast machine first brings up some non TCP networking program it is quite possible that we will drop packets due to the fact that only one packet can be held per ARP entry. This leads to packets being missed when a program starts or restarts if the ARP data is not currently in the ARP cache. This code adds a new sysctl, net.link.ether.inet.maxhold, which defines a system wide maximum number of packets to be held in each ARP entry. Up to maxhold packets are queued until an ARP reply is received or the ARP times out. The default setting is the old value of 1 which has been part of the BSD networking code since time immemorial. Expose the time we hold an incomplete ARP entry by adding the sysctl net.link.ether.inet.wait, which defaults to 20 seconds, the value used when the new ARP code was added.. Reviewed by: bz, rpaulo MFC after: 3 weeks	2010-11-12 22:03:02 +00:00
Dimitry Andric	4403994d7d	Use the same treatment as in linker_set.h for the __start and __stop symbols of the set_vnet and set_pcpu sections, so those symbols will always be emitted in kernel modules, if they use vnet.h or pcpu.h. Also, for pcpu.h, make the __(start\|stop)_set_pcpu declarations, and associated macros invisible to userland, to prevent it picking up these symbols. Reviewed by: kib	2010-11-11 19:18:52 +00:00
Rui Paulo	09b6dcf968	Sync DLTs with the latest pcap version.	2010-10-29 18:41:09 +00:00
Bjoern A. Zeeb	a38de0134b	Factor out DDB commands from r204145, r204279 into if_debug.c for further enhancements (1). Switch to a standard 2-clause BSD license for this (2). Unfortunately we have to un-static the ifindex_table for this but do not publicly export it. Suggested by: rwatson (1) a while back. Approved by: thompsa (2) for the change from r204279. MFC after: 6 days	2010-10-25 08:30:19 +00:00
Sergey Kandaurov	9af74f3d68	Reshuffle SIOCGIFCONF32 handler from r155224. - move all the chunks into one file, which allows to hide SIOCGIFCONF32 global definition as well. - replace __amd64__ with proper COMPAT_FREEBSD32 around. - handle 32bit capacity before going into the handler itself instead of doing internal 32bit specific changes within it (e.g. as it's done for SIOCGDEFIFACE32_IN6). - use explicitely sized types for ABI compat. Approved by: kib (mentor) MFC after: 2 weeks	2010-10-21 16:20:48 +00:00
Bjoern A. Zeeb	ee7c7fee94	Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating over all interfaces to make sure the address will neither change nor be freed while we are working on it. PR: kern/146250 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 1 week	2010-10-16 19:25:27 +00:00
Bjoern A. Zeeb	fc2bfb3294	lltable_drain() has never been used so far, thus #if 0 it for now. While touching it add the missing locking to the now disabled code for the time when we'll resurrect it. MFC after: 3 days	2010-10-16 18:42:09 +00:00
Bjoern A. Zeeb	b6b8c0779d	Only hide the ifa and not the tp under #ifdef INET as the tp is needed for locking evenwhen there is no INET. MFC after: 3 days	2010-10-01 15:14:14 +00:00
John Baldwin	24f481fde2	- Expand scope of tun/tap softc locks to cover more softc fields and driver-maintained ifnet fields (such as if_drv_flags). - Use soft locks as the mutex that protects each interface's knote list rather than using the global knote list lock. Also, use the softc for kn_hook instead of the cdev. - Use mtx_sleep() instead of tsleep() when blocking in the read routines. This fixes a lost wakeup race. - Remove D_NEEDGIANT now that the cdevsw routines use the softc lock where locking is needed. - Lock IFQ when calculating the result for FIONREAD in tap(4). tun(4) already did this. - Remove remaining spl calls. Submitted by: Marcin Cieslak saper of saper\|info (3) MFC after: 2 weeks	2010-09-22 21:02:43 +00:00
Jung-uk Kim	d0d7bcdf92	Fix a typo in a comment. Submitted by: afiveg	2010-09-16 18:37:33 +00:00
Matthew D Fleming	4d369413e1	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno that sbuf_overflowed() did.	2010-09-10 16:42:16 +00:00
Bjoern A. Zeeb	73e39d6137	MFp4 CH=183259: No reason to use if_free_type() as we don't change our type. Just if_free() is fine. MFC after: 3 days	2010-09-02 16:11:12 +00:00
Ed Maste	be4572c896	Add a sysctl knob to accept input packets on any link in a failover lagg.	2010-09-01 16:53:38 +00:00
Bjoern A. Zeeb	c749353940	MFp4 CH=182972: Add explicit linkstate UP/DOWN for the epair. This is needed by carp(4) and other things to work. MFC after: 5 days	2010-08-27 23:22:58 +00:00
Rui Paulo	79856499bd	Add an extra comment to the SDT probes definition. This allows us to get use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]	2010-08-22 11:18:57 +00:00
Marko Zec	d3c351c50f	When moving an ethernet ifnet from one vnet to another, destroy the associated ng_ether netgraph node in the current vnet, and create a new one in the target vnet. Reviewed by: julian MFC after: 3 days	2010-08-13 18:17:32 +00:00
Will Andrews	9963e8a52c	Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with the appropriate ifdefs. Reviewed by: bz Approved by: ken (mentor)	2010-08-11 20:18:19 +00:00
Will Andrews	54bfbd5153	Allow carp(4) to be loaded as a kernel module. Follow precedent set by bridge(4), lagg(4) etc. and make use of function pointers and pf_proto_register() to hook carp into the network stack. Currently, because of the uncertainty about whether the unload path is free of race condition panics, unloads are disallowed by default. Compiling with CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure. This commit requires IP6PROTOSPACER, introduced in r211115. Reviewed by: bz, simon Approved by: ken (mentor) MFC after: 2 weeks	2010-08-11 00:51:50 +00:00
John Baldwin	3ba24fde11	Adjust the interface type in the link layer socket address for vlan(4) interfaces to be a vlan (IFT_L2VLAN) rather than an Ethernet interface (IFT_ETHER). The code already fixed if_type in the ifnet causing some places to report the interface as a vlan (e.g. arp -a output) and other places to report the interface as Ethernet (getifaddrs(3)). Now they should all report IFT_L2VLAN. Reviewed by: brooks MFC after: 1 month	2010-08-06 15:15:26 +00:00
Konstantin Belousov	04f3205755	Properly set ifi_datalen for compat32 struct if_data32. PR: kern/149240 Submitted by: Stef Walter <stef memberwebs com> MFC after: 1 weeks	2010-08-03 15:40:42 +00:00
Gleb Smirnoff	b17f26b00c	Don't check malloc(M_WAITOK) result.	2010-07-27 11:56:49 +00:00
Bjoern A. Zeeb	cd292f1264	Return NULL rather than 0 for a pointer. MFC after: 3 days	2010-07-27 11:54:01 +00:00
Gleb Smirnoff	85011246ac	When installing a new ARP entry via 'arp -S', lla_lookup() will either find an existing entry, or allocate a new one. In the latter case an entry would have flags, that were supplied as argument to lla_lookup(). In case of an existing entry, flags aren't modified. This lead to losing LLE_PUB and/or LLE_PROXY flags. We should apply these flags either in lla_rt_output() or in the in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is a more correct choice. PR: kern/148784, kern/146539 Silence from: qingli, 5 days	2010-07-27 10:05:27 +00:00
Jung-uk Kim	82040afcf3	Fix an obvious typo from r1.1. We were acquiring an exclusive writer lock regardless of the given flags. MFC after: 3 days	2010-07-22 18:44:40 +00:00
Luigi Rizzo	1f6ad072ea	whitespace cleanup	2010-07-15 14:41:59 +00:00
Luigi Rizzo	b62cb72c48	small portability fix to build on linux/windows	2010-07-15 14:41:06 +00:00
Jung-uk Kim	547d94bde3	Implement flexible BPF timestamping framework. - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses. - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now. - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format as binuptime(9) and getbinuptime(9) do. Reviewed by: net@	2010-06-15 19:28:44 +00:00
John Baldwin	3aa6d94e0c	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
Marko Zec	b1ae592bd4	Provide a macro for registering a virtualized sysctl handler for VNET opaque data. MFC after: 30 days	2010-06-02 15:29:21 +00:00
Qing Li	0ed6142b31	This patch fixes the problem where proxy ARP entries cannot be added over the if_ng interface. MFC after: 3 days	2010-05-25 20:42:35 +00:00
John Baldwin	6f359e2828	Ignore failures from removing multicast addresses from the parent (trunk) interface when tearing down a vlan interface. If a trunk interface is detached, all of its multicast addresses are removed before the ifnet departure eventhandlers are invoked. This means that all of the multicast addresses are removed before the vlan interfaces are removed which causes the if_delmulti() calls in the vlan teardown to fail. In the VLAN_ARRAY case, this left vlan interfaces referencing a no longer valid parent interface. In the !VLAN_ARRAY case, the eventhandler gets stuck in an infinite loop retrying vlan_unconfig_locked() forever. In general the callers of vlan_unconfig_locked() do not expect nor handle failure, so I believe it is safer to ignore the errors and tear down as much of the vlan state as possible. Silence from: net@ MFC after: 4 days	2010-05-17 19:36:56 +00:00
Kip Macy	83e711ec14	allocate ipv6 flows from the ipv6 flow zone reported by: rrs@ MFC after: 3 days	2010-05-16 21:48:39 +00:00
Bjoern A. Zeeb	793f71bf2e	Fix an issue with the dynamic pcpu/vnet data allocators. We cannot expect that modspace is the last entry in the linker set and thus that modspace + possible extra space up to PAGE_SIZE would be contiguous. For the moment do not support more than _MODMIN space and ignore the extra space (). (*) We know how to get it back but it'll need testing. Discussed with: jeff, rwatson (briefly) Reviewed by: jeff Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-05-14 21:11:58 +00:00
Kip Macy	19d0491585	workaround bug with ipv6 where a flow can have a null rtentry	2010-05-12 04:51:20 +00:00
Alan Cox	f0c0d3998d	Remove page queues locking from all sf_buf_mext()-like functions. The page lock now suffices. Fix a couple nearby style violations.	2010-05-06 17:43:41 +00:00
Alan Cox	a7283d3213	Add page locking to the vm_page_cow* functions. Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib	2010-05-04 15:55:41 +00:00
Maxim Sobolev	e50d35e6c6	Add new tunable 'net.link.ifqmaxlen' to set default send interface queue length. The default value for this parameter is 50, which is quite low for many of today's uses and the only way to modify this parameter right now is to edit if_var.h file. Also add read-only sysctl with the same name, so that it's possible to retrieve the current value. MFC after: 1 month	2010-05-03 07:32:50 +00:00
Alan Cox	913814935a	This is the first step in transitioning responsibility for synchronizing access to the page's wire_count from the page queues lock to the page lock. Submitted by: kmacy	2010-05-03 05:41:50 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Bjoern A. Zeeb	82cea7e6f3	MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitspace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOABLS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days	2010-04-29 11:52:42 +00:00
Kip Macy	3e8b572db4	need to initialize the lock before it is used MFC after: 3 days	2010-04-27 23:48:50 +00:00
Bjoern A. Zeeb	1b610a749e	MFP4: @177254 Add missing CURVNET_RESTORE() calls for multiple code paths, to stop leaking the currently cached vnet into callers and to the process. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-04-27 15:16:54 +00:00
Konstantin Belousov	fc0a61a401	Provide compat32 shims for bpf(4), except zero-copy facilities. bd_compat32 field of struct bpf_d is kept unconditionally to not impose the requirement of including "opt_compat.h" on all numerous users of bpfdesc.h. Submitted by: jhb (version for 6.x) Reviewed and tested by: emaste MFC after: 2 weeks	2010-04-25 16:43:41 +00:00
Konstantin Belousov	427a928af7	Provide 32bit compat shims for sysctl net.route NET_RT_IFLIST. This allows getifaddrs(3) to work for compat32 binaries. Submitted by: jhb (6.x version) Reviewed by: emaste Tested by: emaste and <pluknet gmail com> MFC after: 2 weeks	2010-04-25 16:42:47 +00:00
Julian Elischer	7a90b21212	Move two copies of the same definition to a common include file. MFC after: 3 weeks	2010-04-14 23:06:07 +00:00
Xin LI	57d848483e	When an underlying ioctl(2) handler returns an error, our ioctl(2) interface considers that it hits a fatal error, and will not copyout the request structure back for _IOW and _IOWR ioctls, keeping them untouched. The previous implementation of the SIOCGIFDESCR ioctl intends to feed the buffer length back to userland. However, if we return an error, the feedback would be defeated and ifconfig(8) would trap into an infinite loop. This commit changes SIOCGIFDESCR to set buffer field to NULL to indicate the previous ENAMETOOLONG case. Reported by: bschmidt MFC after: 2 weeks	2010-04-14 22:02:19 +00:00
Bjoern A. Zeeb	d0088cde62	Take a reference to make sure that the interface cannot go away during if_clone_destroy() in case parallel threads try to. PR: kern/116837 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 10 days	2010-04-11 18:47:38 +00:00
Bjoern A. Zeeb	c769e1be01	Check that the interface is on the list of cloned interfaces before trying to remove it to avoid panics in case of two threads trying to remove it in parallel. PR: kern/116837 Submitted by: Takahiro Kurosawa (takahiro.kurosawa gmail.com) (orig version) MFC after: 10 days	2010-04-11 18:41:31 +00:00
Bjoern A. Zeeb	becba438d2	Plug reference leaks in the link-layer code ("new-arp") that previously prevented the link-layer entry from being freed. In both in.c and in6.c (though that code path seems to be basically dead) plug a reference leak in case of a pending callout being drained. In if_ether.c consistently add a reference before resetting the callout and in case we canceled a pending one remove the reference for that. In the final case in arptimer, before freeing the expired entry, remove the reference again and explicitly call callout_stop() to clear the active flag. In nd6.c:nd6_free() we are only ever called from the callout function and thus need to remove the reference there as well before calling into llentry_free(). In if_llatbl.c when freeing entire tables make sure that in case we cancel a pending callout to remove the reference as well. Reviewed by: qingli (earlier version) MFC after: 10 days Problem observed, patch tested by: simon on ipv6gw.f.o, Christian Kratzer (ck cksoft.de), Evgenii Davidov (dado korolev-net.ru) PR: kern/144564 Configurations still affected: with options FLOWTABLE	2010-04-11 16:04:08 +00:00
Bjoern A. Zeeb	d8c136591a	In if_detach_internal() we cannot hold the af_data lock over the dom_ifdetach() calls as they might sleep for callout_drain(). Do as we do in if_attachdomain1() [r121470] and handle if_afdata_initialized earlier and call dom_ifdetach() unlocked. Discussed with: rwatson MFC after: 10 days	2010-04-11 11:51:44 +00:00
Bjoern A. Zeeb	318c3213e5	In if_detach_internal() only try to do the detach run if if_attachdomain1() has actually succeeded to initialize and attach. There is a theoretical possibility to drop out early in if_attachdomain1() leaving the array uninitialized if we cannot get the lock. Discussed with: rwatson MFC after: 10 days	2010-04-11 11:49:24 +00:00
Jung-uk Kim	704858479c	Check the pointer to JIT binary filter before its de-allocation. Submitted by: Alexander Sack (asack at niksun dot com) MFC after: 3 days	2010-03-29 20:24:03 +00:00
Rui Paulo	59fe4a8ce6	Add MCS to the list of media types. Sponsored by: iXsystems, inc.	2010-03-23 13:15:11 +00:00
Kip Macy	3059584e2a	- boot-time size the ipv4 flowtable and the maximum number of flows - increase flow cleaning frequency and decrease flow caching time when near the flow limit - stop allocating new flows when within 3% of maxflows don't start allocating again until below 12.5% MFC after: 7 days	2010-03-22 23:04:12 +00:00
Ed Maste	d8564efde1	Avoid holding the VLAN_LOCK() over the parent interface SIOCGIFMEDIA ioctl call, as it may sleep. Reviewed by: rwatson	2010-03-21 15:00:33 +00:00
Bjoern A. Zeeb	42eedeac00	Split eventhandler_register() into an internal part and a wrapper function that provides the allocated and setup eventhandler entry. Add a new wrapper for VIMAGE that allocates extra space to hold the callback function and argument in addition to an extra wrapper function. While the wrapper function goes as normal callback function the argument points to the extra space allocated holding the original func and arg that the wrapper function can then call. Provide an iterator function for the virtual network stack (vnet) that will call the callback function for each network stack. Provide a new set of macros for VNET that in the non-VIMAGE case will just call eventhandler_register() while in the VIMAGE case it will use vimage_eventhandler_register() passing in the extra iterator function but will only register once rather than per-vnet. We need a special macro in case we are interested in the tag returned as we must check for curvnet and can neither simply assign the return value, nor not change it in the non-vnet0 case without that. Sponsored by: ISPsystem Discussed with: jhb Reviewed by: zec (earlier version), jhb MFC after: 1 month	2010-03-19 19:51:03 +00:00
Bjoern A. Zeeb	335b943f8e	Add ddb support to the "new" link layer code ("new-arp"): - show all lltables [1] (optional flag to also show the llentries as well) - show lltable <struct lltable > - show llentry <struct llentry > MFC after: 6 days	2010-03-18 09:09:59 +00:00
Qing Li	6b533b5ddb	Verify interface up status using its link state only if the interface has such capability. The interface capability flag indicates whether such capability exists. This approach is much more backward compatible. Physical device driver changes will be part of another commit. Also updated the ifconfig utility to show the LINKSTATE capability if present. Reviewed by: rwatson, imp, juli MFC after: 3 days	2010-03-16 17:59:12 +00:00
Max Laier	4c71aa5890	Fix a small bug in drbr_dequeue_cond spotted while preparing MFC of r203834. MFC after: 3 days	2010-03-15 21:15:03 +00:00
Kip Macy	8847ae28f5	flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB pointed out by jkim@	2010-03-12 19:58:51 +00:00
Jung-uk Kim	5d7af3a1cc	Fix a style(9) nit.	2010-03-12 19:42:42 +00:00
Kip Macy	a398ca9cea	re-update copyright to 2010 pointed out by danfe@	2010-03-12 19:26:45 +00:00
Jung-uk Kim	9fee1bd1d8	Tidy up callout for select(2) and read timeout. - Add a missing callout_drain(9) before the descriptor deallocation.[1] - Prefer callout_init_mtx(9) over callout_init(9) and let the callout subsystem handle the mutex for callout function. PR: kern/144453 Submitted by: Alexander Sack (asack at niksun dot com)[1] MFC after: 1 week	2010-03-12 19:14:58 +00:00
Qing Li	688ba6823b	The flow-table module retrieves the destination and source address as well as the transport protocol port information from the outbound packets. The routing code is generic and compares every byte in the given sockaddr object. Therefore the temporary sockaddr objects must be cleared due to padding bytes. In addition, the port information must be stripped or the route search will either fail or return the incorrect route entry. Unit testing is done using OpenVPN over the if_tun interface. MFC after: 7 days	2010-03-12 10:24:58 +00:00
Kip Macy	112125d206	fix stats reporting sysctl	2010-03-12 06:31:19 +00:00
Kip Macy	d4121a02c0	- restructure flowtable to support ipv6 - add a name argument to flowtable_alloc for printing with ddb commands - extend ddb commands to print destination address or 4-tuples - don't parse ports in ulp header if FL_HASH_ALL is not passed - add kern_flowtable_insert to enable more generic use of flowtable (e.g. system calls for adding entries) - don't hash loopback addresses - cleanup whitespace - keep statistics per-cpu for per-cpu flowtables to avoid cache line contention - add sysctls to accumulate stats and report aggregate MFC after: 7 days	2010-03-12 05:03:26 +00:00
Qing Li	355ad3ead4	The if_tap interface is of IFT_ETHERNET type, but it does not set or update the if_link_state variable. As such RT_LINK_IS_UP() fails for the if_tap interface. Also, the RT_LINK_IS_UP() needs to bypass all loopback interfaces because loopback interfaces are considered up logically as long as the system is running. This patch fixes the above issues by setting and updating the if_link_state variable when the tap interface is opened or closed respectively. Similary approach is already done in the if_tun device. MFC after: 3 days	2010-03-11 17:56:46 +00:00
Qing Li	c7ea0aa648	One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible. The other advantage of ECMP is for failover. The issue with the current code, is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix, the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole. Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed. MFC after: 5 days	2010-03-09 01:11:45 +00:00
Xin LI	13d85d4382	Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg interface. The check itself seems to be coming from OpenBSD but does not seem to be useful for our code. Discussed with: thomasa MFC after: 1 month	2010-03-09 00:52:16 +00:00
Bjoern A. Zeeb	e253cdd07c	Not only flush the ipfw tables when unloading ipfw or tearing down a virtual netowrk stack, but also free the Radix Node Head. Sponsored by: ISPsystem Reviewed by: julian MFC after: 5 days	2010-03-07 15:37:58 +00:00
Bjoern A. Zeeb	1bb635b04d	Introduce a function rn_detachhead() that will free the radix table root nodes. This is only needed (and available) in the virtualization case to free the resources when tearing down a virtual network stack. Sponsored by: ISPsystem Reviewed by: julian, zec MFC after: 5 days	2010-03-06 21:27:26 +00:00
Bjoern A. Zeeb	eea3faf77b	Rework reference counting in case we queue into the netisr, or overflow the netisr queue and fall back to the interface queue so that we can garuantee that the ifnet pointer stays valid. Formerly we ended up with reference counts <= 0 in case the netisr had returned ENOBUFS. The idea is to track any packet in the netisr queue and only change the refount on edge operations for the fallback interface queue. This also avoids problems in case the if_snd.ifq_len lies to us. Also rework refount assertions to make sure they trigger if we go below 1. Formerly a negative refence count did not trigger the assert as the refcount variable is u_int. Sponsored by: ISPsystem MFC after: 5 days	2010-03-06 21:22:28 +00:00
Luigi Rizzo	cc4d3c30ea	Bring in the most recent version of ipfw and dummynet, developed and tested over the past two months in the ipfw3-head branch. This also happens to be the same code available in the Linux and Windows ports of ipfw and dummynet. The major enhancement is a completely restructured version of dummynet, with support for different packet scheduling algorithms (loadable at runtime), faster queue/pipe lookup, and a much cleaner internal architecture and kernel/userland ABI which simplifies future extensions. In addition to the existing schedulers (FIFO and WF2Q+), we include a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new, very fast version of WF2Q+ called QFQ. Some test code is also present (in sys/netinet/ipfw/test) that lets you build and test schedulers in userland. Also, we have added a compatibility layer that understands requests from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries, and replies correctly (at least, it does its best; sometimes you just cannot tell who sent the request and how to answer). The compatibility layer should make it possible to MFC this code in a relatively short time. Some minor glitches (e.g. handling of ipfw set enable/disable, and a workaround for a bug in RELENG_7's /sbin/ipfw) will be fixed with separate commits. CREDITS: This work has been partly supported by the ONELAB2 project, and mostly developed by Riccardo Panicucci and myself. The code for the qfq scheduler is mostly from Fabio Checconi, and Marta Carbone and Francesco Magno have helped with testing, debugging and some bug fixes.	2010-03-02 17:40:48 +00:00
Luigi Rizzo	7bc2288264	remove unnecessary casts leftover from a bogus fix to a previous bug	2010-03-02 16:24:16 +00:00
Alfred Perlstein	e722820434	Merge projects/enhanced_coredumps (r204346) into HEAD: Enhanced process coredump routines. This brings in the following features: 1) Limit number of cores per process via the %I coredump formatter. Example: if corefilename is set to %N.%I.core AND num_cores = 3, then if a process "rpd" cores, then the corefile will be named "rpd.0.core", however if it cores again, then the kernel will generate "rpd.1.core" until we hit the limit of "num_cores". this is useful to get several corefiles, but also prevent filling the machine with corefiles. 2) Encode machine hostname in core dump name via %H. 3) Compress coredumps, useful for embedded platforms with limited space. A sysctl kern.compress_user_cores is made available if turned on. To enable compressed coredumps, the following config options need to be set: options COMPRESS_USER_CORES device zlib # brings in the zlib requirements. device gzio # brings in the kernel vnode gzip output module. 4) Eventhandlers are fired to indicate coredumps in progress. 5) The imgact sv_coredump routine has grown a flag to pass in more state, currently this is used only for passing a flag down to compress the coredump or not. Note that the gzio facility can be used for generic output of gzip'd streams via vnodes. Obtained from: Juniper Networks Reviewed by: kan	2010-03-02 06:58:58 +00:00
Joel Dahl	7df6f59359	The NetBSD Foundation has granted permission to remove clause 3 and 4 from their software. Obtained from: NetBSD	2010-03-01 17:05:46 +00:00
Robert Watson	60efbc9991	Whitespace tweak. MFC after: 3 days	2010-03-01 00:43:05 +00:00
Robert Watson	938448cd87	Changes to support crashdump analysis of netisr: - Rename the netisr protocol registration array, 'np' to 'netisr_proto', in order to reduce the chances of symbol name collisions. It remains statically defined, but it will be looked up by netstat(1). - Move certain internal structure definitions from netisr.c to netisr_internal.h so that netstat(1) can find them. They remain private, and should not be used for any other purpose (for example, they should not be used by kernel modules, which must instead use the public interfaces in netisr.h). - Store a kernel-compiled version of NETISR_MAXPROT in the global variable netisr_maxprot, and export via a sysctl, so that it is available for use by netstat(1). This is especially important for crashdump interpretation, where the size of the workstream structure is determined by the maximum number of protocols compiled into the kernel. MFC after: 1 week Sponsored by: Juniper Networks	2010-03-01 00:42:36 +00:00
Konstantin Belousov	22e62e7e6e	In both if_tun and if_tap: Do not do additional dev_ref() on the newly created interface in the if_clone create method [1]. This reference is not needed and never removed, causing struct cdevpriv leakage. Remove the setting of SI_CHEAPCLONE flag as well, since it is unused. For dev_clone handlers, create cdevs with the call make_dev_credf(MAKEDEV_REF) instead of calling make_dev() and then dev_ref(), to avoid a race. Call drain_dev_clone_events() at the module unload time after dev_clone handler is deinstalled. Submitted by: Mikolaj Golub <to.my.trociny gmail com> [1] MFC after: 1 week	2010-02-28 16:25:49 +00:00
Robert Watson	7f450feb07	Fix edge cases in several KASSERTs: use <= rather than < when testing that counters have not gone about MAXCPU or NETISR_MAXPROT. These problems caused panics on UP kernels with INVARIANTS when using sysctl -a, but would also have caused problems for 32-core boxes or if the netisr protocol vector was fully populated. Reported by: nwhitehorn, Neel Natu <neelnatu@gmail.com> MFC after: 4 days	2010-02-25 09:51:14 +00:00
Bjoern A. Zeeb	7405f23cd7	Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets' in the db_show_all_table as 'show all ifnets' and with that follow the convention for showing complete lists. Submitted by: thompsa MFC after: 3 days	2010-02-24 15:54:24 +00:00
Robert Watson	c4fbf89fc5	Fix constant assignment for netisr protocol information sysctl. MFC after: 1 week Spotted by: bz	2010-02-22 16:16:16 +00:00
Robert Watson	2d22f334ea	Export netisr configuration and statistics to userspace via sysctl(9). MFC after: 1 week Sponsored by: Juniper Networks	2010-02-22 15:03:16 +00:00

... 2 3 4 5 6 ...

2953 Commits