freebsd-skq

Author	SHA1	Message	Date
glebius	0bf7c1fae8	Better comment for ifa_init(), ifa_ref(), ifa_free().	2012-02-05 08:53:05 +00:00
glebius	8be39a40a2	In ifa_init() initialize if_data.ifi_datalen. This would be required after upcoming changes from bz@. Discussed with: bz	2012-02-05 08:31:15 +00:00
kmacy	8466f3ca88	A flowtable entry can continue referencing an llentry indefinitely if the entry is repeatedly referenced within its timeout window. This change clears the LLE_VALID flag when an llentry is removed from an interface's hash table and adds an extra check to the flowtable code for the LLE_VALID flag in llentry to avoid retaining and using a stale reference. Reviewed by: qingli@ MFC after: 2 weeks	2012-01-26 20:02:40 +00:00
bz	417a8b2daa	Replace random ARIN direct assignment legacy IPs with proper RFC 5735 TEST-NET1 block for use in documentation and example code addresses. MFC after: 3 days	2012-01-24 15:20:31 +00:00
eadler	8cde6e1e87	- Fix trivial typo Approved by: nwhitehorn MFC after: 3 days	2012-01-14 17:07:52 +00:00
rwatson	ea95a90ac2	Clarify throughout the vlan(4) code the difference between a "tag" (the 802.1q-defined 16-bit VID, CFI, and PCP field in host byte order) and a VLAN ID (VID). Tags go in packets. VIDs identify VLANs. No functional change is intended, so this should be safe to MFC. Further cleanup with functional changes will be committed separately (for example, renaming vlan_tag/vlan_tag_p, which modify the KPI and KBI). Reviewed by: bz Sponsored by: ADARA Networks, Inc. MFC after: 3 days	2012-01-12 18:39:37 +00:00
lstewart	15b1ff3609	Consumers of bpfdetach() expect it to remove all bpf_if structs from the bpf_iflist list which reference the specified ifnet. The existing implementation only removes the first matching bpf_if found in the list, effectively leaking list entries if an ifnet has been bpfattach()ed multiple times with different DLTs. Fix the leak by performing the detach logic in a loop, stopping when all bpf_if structs referencing the specified ifnet have been detached and removed from the bpf_iflist list. Whilst here, also: - Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never exist in the list with a NULL ifnet pointer. - Except when INVARIANTS is in the kernel config, silently ignore the case where no bpf_if referencing the specified ifnet is found, as it is harmless and does not require user attention. Reviewed by: csjp MFC after: 1 week	2012-01-10 00:48:29 +00:00
jhb	0577b44f73	Convert the per-interface address list lock from a mutex to a reader/writer lock. Reviewed by: bz	2012-01-09 19:34:12 +00:00
glebius	06fc76befd	Copy ifa->if_data to ifam->ifam_data. This was forgotten in r228571. Submitted by: bz	2012-01-08 17:11:53 +00:00
glebius	f99edf0f86	Move arprequest() declaration to if_ether.h.	2012-01-08 13:34:00 +00:00
glebius	3c3da145a2	Since r228571 CARP is no longer an interface.	2012-01-06 12:05:43 +00:00
jhb	4ef366671a	Convert all users of IF_ADDR_LOCK to use new locking macros that specify either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks	2012-01-05 19:00:36 +00:00
jhb	219e62f17e	Add new variants of the IF_ADDR_LOCK() macros used for protecting interface address lists that distinguish read locks from write locks. To preserve the KPI, the previous operations are mapped to the write lock macros. The lock is still kept as a mutex for now. Reviewed by: bz MFC after: 2 weeks	2012-01-05 18:35:49 +00:00
rwatson	e3196d8882	Refine last comment. Submitted by: joeld Sponsored by: ADARA Networks, Inc. MFC after: 3 days	2012-01-05 11:42:34 +00:00
rwatson	628c91bb51	Add comment to the VLAN code about its integration with VIMAGE: we see what the code is doing, we recognise the legitimacy of its goal, but we're not quite sure it's going about it the right way. More pondering is clearly required. Sponsored by: ADARA Networks, Inc. Discussed with: bz MFC after: 3 days	2012-01-05 11:24:22 +00:00
lstewart	8a799f2a2f	Revert r228986 until it can be reworked to avoid panicking the kernel when the same interface is attached multiple times with different DLTs, as is done in net80211 for example. Reported by: adrian	2011-12-31 07:21:28 +00:00
lstewart	1b1510811a	- Introduce the net.bpf.tscfg sysctl tree and associated code so as to make one aspect of time stamp configuration per interface rather than per BPF descriptor. Prior to this, the order in which BPF devices were opened and the per descriptor time stamp configuration settings could cause non-deterministic and unintended behaviour with respect to time stamping. With the new scheme, a BPF attached interface's tscfg sysctl entry can be set to "default", "none", "fast", "normal" or "external". Setting "default" means use the system default option (set with the net.bpf.tscfg.default sysctl), "none" means do not generate time stamps for tapped packets, "fast" means generate time stamps for tapped packets using a hz granularity system clock read, "normal" means generate time stamps for tapped packets using a full timecounter granularity system clock read and "external" (currently unimplemented) means use the time stamp provided with the packet from an underlying source. - Utilise the recently introduced sysclock_getsnapshot() and sysclock_snap2bintime() KPIs to ensure the system clock is only read once per packet, regardless of the number of BPF descriptors and time stamp formats requested. Use the per BPF attached interface time stamp configuration to control if sysclock_getsnapshot() is called and whether the system clock read is fast or normal. The per BPF descriptor time stamp configuration is then used to control how the system clock snapshot is converted to a bintime by sysclock_snap2bintime(). - Remove all FAST related BPF descriptor flag variants. Performing a "fast" read of the system clock is now controlled per BPF attached interface using the net.bpf.tscfg sysctl tree. - Update the bpf.4 man page. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ In collaboration with: Julien Ridoux (jridoux at unimelb edu au)	2011-12-30 08:57:58 +00:00
yongari	69a9d9792b	Update if_obytes and if_omcast after successful transmit. While I'm here update if_oerrors if parent interface of vlan is not up and running. Previously it updated the collision counter, which was confusing to interpret. PR: kern/163478 Reviewed by: glebius, jhb Tested by: Joe Holden < lists <> rewt dot org dot uk >	2011-12-29 18:40:58 +00:00
glebius	653f8c5e71	Provide ABI compatibility shim to enable configuring of addresses with ifconfig(8) prior to r228571. Requested by: brooks	2011-12-21 12:39:08 +00:00
glebius	8c74bad9f3	Restore a feature that was present in 5.x and 6.x, and was cleared in 7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP preemption, while it is running its bulk update. However, reimplement the feature in more elegant manner, that is partially inspired by newer OpenBSD: - Rename term "suppression" to "demotion", to match with OpenBSD. - Keep a global demotion factor, that can be raised by several conditions, for now these are: - interface goes down - carp(4) has problems with ip_output() or ip6_output() - pfsync performs bulk update - Unlike in OpenBSD the demotion factor isn't a counter, but is actual value added to advskew. The adjustment values for particular error conditions are also configurable, and their defaults are maximum advskew value, so a single failure bumps demotion to maximum. This is for POLA compatibility, and should satisfy most users. - Demotion factor is a writable sysctl, so user can do foot shooting, if he desires to.	2011-12-20 13:53:31 +00:00
glebius	27a36f6ac8	A major overhaul of the CARP implementation. The ip_carp.c was started from scratch, copying needed functionality from the old implementation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces, they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics has been changed too: now one doesn't need to clone carpXX interface, he/she should directly configure a vhid on a Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(8) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows to run a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]	2011-12-16 12:16:56 +00:00
glebius	c43116e67e	Simplify rtrequest(RTM_ADD): ifa can't be NULL after rt_getifa_fib().	2011-12-15 12:49:10 +00:00
brooks	9029bb4f3b	Remove the unused if_free_type() function. X-MFC after: never	2011-12-09 23:26:28 +00:00
luigi	298ffde665	1. Fix the handling of link reset while in netmap mode. A link reset now is completely transparent for the netmap client: even if the NIC resets its own ring (e.g. restarting from 0), the client will not see any change in the current rx/tx positions, because the driver will keep track of the offset between the two. 2. make the device-specific code more uniform across different drivers There were some inconsistencies in the implementation of the netmap support routines, now drivers have been aligned to a common code structure. 3. import netmap support for ixgbe . This is implemented as a very small patch for ixgbe.c (233 lines, 11 chunks, mostly comments: in total the patch has only 54 lines of new code) , as most of the code is in an external file sys/dev/netmap/ixgbe_netmap.h , following some initial comments from Jack Vogel about making changes less intrusive. (Note, i have emailed Jack multiple times asking if he had comments on this structure of the code; i got no reply so i assume he is fine with it). Support for other drivers (em, lem, re, igb) will come later. "ixgbe" is now the reference driver for netmap support. Both the external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should serve as a reference for other device drivers. Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap, the sender does 14.88 Mpps at 1050 MHz and 14.2 Mpps at 900 MHz on an i7-860 with 4 cores and 82599 card. Haven't tried yet more aggressive optimizations such as adding 'prefetch' instructions in the time-critical parts of the code.	2011-12-05 12:06:53 +00:00
lstewart	9ff63371a4	Revert r227778 in preparation for committing reworked patches in its place.	2011-11-29 12:55:26 +00:00
jhb	27cc3c5372	Change the if_vlan driver to use if_transmit for forwarding packets to the parent interface. This avoids the overhead of queueing a packet to an IFQ only to immediately dequeue it again. Suggested by: np Reviewed by: brooks MFC after: 1 month	2011-11-28 19:35:08 +00:00
glebius	5393e1dd70	- Use generic alloc_unr(9) allocator for if_clone, instead of hand-made. - When registering new cloner, check whether a cloner with same name already exist. - When allocating unit, also check with help of ifunit() whether such interface already exist or not. [1] PR: kern/162789 [1]	2011-11-28 14:44:59 +00:00
glebius	ec7618f8d2	Improve logging: - don't hardcode function name - use LOG_DEBUG for such a debug message - print error value	2011-11-22 19:42:17 +00:00
lstewart	1ce25155b2	- When feed-forward clock support is compiled in, change the BPF header to contain both a regular timestamp obtained from the system clock and the current feed-forward ffcounter value. This enables new possibilities including comparison of timekeeping performance and timestamp correction during post processing. - Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping packets using the feedback or feed-forward system clock. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-21 04:17:24 +00:00
luigi	b97eb69f80	Bring in support for netmap, a framework for very efficient packet I/O from userspace, capable of line rate at 10G, see http://info.iet.unipi.it/~luigi/netmap/ At this time I am bringing in only the generic code (sys/dev/netmap/ plus two headers under sys/net/), and some sample applications in tools/tools/netmap. There is also a manpage in share/man/man4 [1] In order to make use of the framework you need to build a kernel with "device netmap", and patch individual drivers with the code that you can find in sys/dev/netmap/head.diff The file will go away as the relevant pieces are committed to the various device drivers, which should happen in a few days after talking to the driver maintainers. Netmap support is available at the moment for Intel 10G and 1G cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re"). I have partial patches for "bge" and am starting to work on "cxgbe". Hopefully changes are trivial enough so interested third parties can submit their patches. Interested people can contact me for advice on how to add netmap support to specific devices. CREDITS: Netmap has been developed by Luigi Rizzo and other collaborators at the Universita` di Pisa, and supported by EU project CHANGE (http://www.change-project.eu/) The code is distributed under a BSD Copyright. [1] In my opinion is a bad idea to have all manpage in one directory. We should place kernel documentation in the same dir that contains the code, which would make it much simpler to keep doc and code in sync, reduce the clutter in share/man/ and incidentally is the policy used for all of userspace code. Makefiles and doc tools can be trivially adjusted to find the manpages in the relevant subdirs.	2011-11-17 12:17:39 +00:00
rmh	ff5c11fefd	Remove a few bits of FreeBSD 2.x compatibility code. Approved by: kib (mentor)	2011-11-14 18:21:27 +00:00
brooks	e4a4d6436f	In r191367 the need for if_free_type() was removed and a new member if_alloctype was used to store the original interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free(). MFC after: 1 Month	2011-11-11 22:57:52 +00:00
ed	0c56cf839d	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
ed	e97eae1577	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
mlaier	7aec0d1835	Fix a use-after-free/redzone issue in the routing code. Reported by (repeatedly): Mike Tancsa Prodded by (repeatedly): bz Forgotten by (repeatedly): mlaier MFC after: 2 weeks	2011-11-03 18:33:30 +00:00
glebius	fb9b0b1cbe	Add macro IF_DEQUEUE_ALL(ifq, m), that takes the entire mbuf chain off the queue. It can be utilized in queue processing to avoid multiple locking/unlocking.	2011-10-27 09:45:12 +00:00
qingli	9ae130094e	The host-id/interface-id can have a specific value and is properly masked out when adding a prefix route through the "route" command. However, when deleting the route, simply changing the command keyword from "add" to "delete" does not work. The failure is observed in both IPv4 and IPv6 route insertion. The patch makes the route command behavior consistent between the "add" and the "delete" operation. MFC after: 1 week	2011-10-25 00:34:39 +00:00
ed	b18bd1101c	Add missing #includes. According to POSIX, these two header files should be able to be included by themselves, not depending on other headers. The <net/if.h> header uses struct sockaddr when __BSD_VISIBLE=1, while <netinet/tcp.h> uses integer datatypes (u_int32_t, u_short, etc). MFC after: 2 months	2011-10-21 12:58:34 +00:00
ed	832b15d289	Get rid of D_PSEUDO. It seems the D_PSEUDO flag was meant to allow make_dev() to return NULL. Nowadays we have a different interface for that; make_dev_p(). There's no need to keep it there. While there, remove an unneeded D_NEEDMINOR from the gpio driver. Discussed with: gonzo@ (gpio)	2011-10-18 08:09:44 +00:00
bz	a13ffdabcc	Pass the fibnum where we need filtering of the message on the rtsock allowing routing daemons to filter routing updates on an rtsock per FIB. Adjust raw_input() and split it into wrapper and a new function taking an optional callback argument even though we only have one consumer [1] to keep the hackish flags local to rtsock.c. PR: kern/134931 Submitted by: multiple (see PR) Suggested by: rwatson [1] Reviewed by: rwatson MFC after: 3 days	2011-09-28 13:48:36 +00:00
kmacy	e3079e1350	Make KBI changes required for future MFCing of inpcb rtentry / llentry caching. Reviewed by: rwatson, bz Approved by: re (kib)	2011-09-20 20:27:26 +00:00
kmacy	99851f359e	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
thompsa	cb24e9343d	On the first loop for generating a bridge MAC address use the local hostid, this gives a good chance of keeping the same address over reboots. This is intended to help IPV6 and similar which generate their addresses from the mac. PR: kern/160300 Submitted by: mdodd Approved by: re (kib)	2011-09-04 22:06:32 +00:00
bz	75529036c4	When adding IPv6 fwd support to ipfw in r225044 these two files were not committed. Initialize next_hop6 to align with the IPv4 code. PR: bin/117214 MFC after: 3 weeks X-MFC with: r225044 Approved by: re (kib)	2011-08-27 08:49:55 +00:00
attilio	683d7a54ce	Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#. That happens because the selinfo interface has no way to drain the waiters before to destroy the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object. Fix this by adding the seldrain() routine which should be called before to destroy the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which installs selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements a full correct logic for preventing this from happening. Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks	2011-08-25 15:51:54 +00:00
qingli	631a8abdff	When the RADIX_MPATH kernel option is enabled, the RADIX_MPATH code tries to find the first route node of an ECMP chain before executing the route command. If the system has a default route, and the specific route argument to the command does not exist in the routing table, then the default route would be reached. The current code does not verify the reached node matches the given route argument, and therefore erroneously removes the entry. This patch fixes that bug. Approved by: re MFC after: 3 days	2011-08-25 04:31:20 +00:00
kevlo	c7105822b4	In rtinit1(), before rtrequest1_fib() is called, info.rti_flags is initialized by flags (function argument) or-ed with ifa->ifa_flags. If both NIC has a loopback route to itself, so IFA_RTSELF is set on ifa(s). As IFA_RTSELF is defined by RTF_HOST, rtrequest1_fib() is called with RTF_HOST flag even if netmask is not NULL. Consequently, netmask is set to zero in rtrequest1_fib(), and request to add network route is changed under hands to request to add host route. Tested by: Andrew Boyer <aboyer at averesystems.com> Submitted by: Svatopluk Kraus <onwahe at gmail dot com> Approved by: re (hrs)	2011-08-08 05:25:51 +00:00
pluknet	957e60f904	Add missing MODULE_VERSION() definition to protect against duplicating module loads. PR: kern/159345 Reported by: Eugene Grosbein <egrosbein att rdtc ru> Tested by: Eugene Grosbein <egrosbein att rdtc ru> Approved by: re (kib) MFC after: 1 week	2011-08-01 11:24:55 +00:00
bz	352be4e985	Add spares to the network stack for FreeBSD-9: - TCP keep* timers - TCP UTO (adjust from what was there already) - netmap - route caching - user cookie (temporary to allow for the real fix) Slightly re-shuffle struct ifnet moving fields out of the middle of spares and to better align. Discussed with: rwatson (slightly earlier version)	2011-07-17 21:15:20 +00:00
mp	f3103cdbe2	Clear the filter memory area before using it. Leaving it uninitialized may leak previous kernel stack contents through a malicious BPF filter. PR: kern/158880 Submitted by: Guy Harris Obtained from: OpenBSD MFC after: 1 week	2011-07-14 21:06:22 +00:00
zec	99a0b299b3	Permit ARP to proceed for IPv4 host routes for which the gateway is the same as the host address. This already works fine for INET6 and ND6. While here, remove two function pointers from struct lltable which are only initialized but never used. MFC after: 3 days	2011-07-08 09:38:33 +00:00
thompsa	9d0e437193	Grab the rlock before checking if our interface is enabled, it could be possible to hit a dead pointer when changing interfaces. PR: kern/156978 Submitted by: Andrew Boyer MFC after: 1 week	2011-07-07 20:02:09 +00:00
bz	300a95bf76	Tag mbufs of all incoming frames or packets with the interface's FIB setting (either default or if supported as set by SIOCSIFFIB, e.g. from ifconfig). Submitted by: Alexander V. Chernikov (melifaro ipfw.ru) Reviewed by: julian MFC after: 2 weeks	2011-07-03 16:08:38 +00:00
bz	cf260e73d6	Remove extra white space to comply with style for the rest of the struct. MFC after: 2 weeks	2011-07-03 15:34:09 +00:00
bz	9cad5bfef3	Add infrastructure to allow all frames/packets received on an interface to be assigned to a non-default FIB instance. You may need to recompile world or ports due to the change of struct ifnet. Submitted by: cjsp Submitted by: Alexander V. Chernikov (melifaro ipfw.ru) (original versions) Reviewed by: julian Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru) MFC after: 2 weeks X-MFC: use spare in struct ifnet	2011-07-03 12:22:02 +00:00
pluknet	29ff2b4aff	Update ifc_len field of struct ifconf passed for the ioctl SIOCGIFCONF32 (i.e. under COMPAT_FREEBSD32) in case ifconf() returned success to match the native SIOCGIFCONF behavior. PR: kern/158369 Reported by: Paul Procacci <pprocacci att gmail com> MFC after: 1 week	2011-06-28 08:41:44 +00:00
bz	a8e6967ab3	Garbage collect never used global, sysctl, externs. MFC after: 1 week	2011-06-21 07:19:03 +00:00
bz	98020d5ece	Leave an extra comment about flowtable and IPv6 support rectifying a previous comment. MFC after: 1 week	2011-06-20 12:35:12 +00:00
bz	f4689a8d0f	gre(4) was using a field in the softc to detect possible recursion. On MP systems this is not a usable solution anymore and could easily lead to false positives triggering enough logging that even using the console was no longer usable (multiple parallel ping -f can do). Switch to the suggested solution of using mbuf tags to carry per packet state between gre_output() invocations. Contrary to the proposed solution modelled after gif(4) only allocate one mbuf tag per packet rather than per packet and per gre_output() pass through. As the sysctl to control the possible valid (gre in gre) nestings does no sanity checks, make sure to always allocate space in the mbuf tag for at least one, and at most 255 possible gre interfaces to detect loops in addition to the counter. Submitted by: Cristian KLEIN (cristi net.utcluj.ro) (original version) PR: kern/114714 Reviewed by: Cristian KLEIN (cristi net.utcluj.ro) Reviewed by: Wooseog Choi (ben_choi hotmail.com) Sponsored by: Sandvine Incorporated MFC after: 1 week	2011-06-18 09:34:03 +00:00
luigi	7cd78b912e	Grab one of the ifcap bits for netmap, and enable printing in ifconfig. Document the fact that we might want an IFCAP_CANTCHANGE mask, even though the value is not yet used in sys/net/if.c (asked on -current a week ago, no feedback so i assume no objection).	2011-06-14 12:40:55 +00:00
zec	5bdda7f91b	Set curvnet context in a callout-triggered code path. MFC after: 3 days	2011-06-07 20:46:03 +00:00
jhb	b37cdd581d	Properly return an ENOBUFS error if a write to a tun(4) device fails due to m_uiotombuf() failing. While here, trim unneeded error handling related to tuninit() since it can never fail. Submitted by: Martin Birgmeier la5lbtyi aon at Reviewed by: glebius MFC after: 1 week	2011-06-03 13:47:05 +00:00
rwatson	79f582472b	Add an optional netisr dispatch point at ether_input(), but set the default dispatch method to NETISR_DISPATCH_DIRECT in order to force direct dispatch. This adds a fairly negligible overhead without changing default behavior, but in the future will allow deferred or hybrid dispatch to other worker threads before link layer processing has taken place. For example, this could allow redistribution using RSS hashes without ethernet header cache line hits, if the NIC was unable to adequately implement load balancing to too small a number of input queues -- perhaps due to hard queueset counts of 1, 3, or 8, but in a modern system with 16-128 threads. This can happen on highly threaded systems, where you want an ithread per core, redistributing work to other queues, but also on virtualised systems where hardware hashing is (or is not) available, but only a single queue has been directed to one VCPU on a VM. Note: this adds a previously non-present assertion about the equivalence of the ifnet from which the packet is received, and the ifnet stamped in the mbuf header. I believe this assertion to generally be true, but we'll find out soon -- if it's not, we might have to add additional overhead in some cases to add an m_tag with the originating ifnet pointer stored in it. Reviewed by: bz MFC after: 3 weeks Sponsored by: Juniper Networks, Inc.	2011-06-01 20:00:25 +00:00
nwhitehorn	a69e106b2f	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb	2011-05-31 15:11:43 +00:00
rwatson	d81ad1a304	Rework netisr policy mechanism so that per-protocol dispatch policies can be represented: - A single policy namespace is defined, consisting of four possible policies: "default" to use the global default, "deferred" to force deferred dispatch, "direct" to employ direct dispatch where possible, and "hybrid" which makes a dynamic decision based on CPU affinity, ordering, etc. Routines are implemented to convert between strings and an integer namespace. - A new global variable, netisr_dispatch_policy, subsumes existing global variables for direct dispatch, forced direct dispatch, etc, and is used for explicit policy interpretation and composition. Old variables remain so that they can be exported by legacy sysctls for use by old netstat(1) binaries. A new sysctl and tunable, netisr.dispatch.policy, accepts the above strings for specifying a global policy default. - The protocol registration structure, netisr_handler, grows an nh_dispatch field, which accepts a per-policy policy override. The default value is '0', which corresponds to "default", meaning that protocols will accept the global default policy unless otherwise specified. - Policies are now interpreted and composed explicitly at various points in packet dispatch; protocol policies override global policies. - Protocols grow the ability to express a non-opinion about affinity even when implementing m2cpuid by returning NETISR_CPUID_NONE. In that case, the framework falls back on source ordering, rather than simply using the current CPU. These changes are in support of allowing link layer re-dispatch based on RSS or similar hashes provided by NICs, especially in the case where the number of hardware receive queues matches hardware core count, rather than hardware thread count, requiring further software redistribution. (i.e., on RMI XLR). MFC after: 3 weeks Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-24 12:34:19 +00:00
zec	0e7fcf0b2c	Allow for vlan(4) interfaces with MTU of 1500 bytes to be configured on top of epair(4) virtual interfaces, since there's no physical hardware associated with epair interfaces which would imply any constraints on MTU sizes. MFC after: 3 days	2011-05-24 08:02:55 +00:00
zec	3ab76c6800	Let epair(4) virtual interfaces report fake link / media status, by borrowing the skeleton of if_media manipulation and reporting code from if_lagg(4). The main motivation behind this change is to allow for epair(4) interfaces to participate in STP if_bridge(4) configurations. Reviewed by: bz MFC after: 3 days	2011-05-24 07:57:28 +00:00
qingli	a1bf1a2582	The statically configured (permanent) ARP entries are removed when an interface is brought down, even though the interface address is still valid. This patch maintains the permanent ARP entries as long as the interface address (having the same prefix as that of the ARP entries) is valid. Reviewed by: delphij MFC after: 5 days	2011-05-20 19:12:20 +00:00
marius	03919a3303	- Add 10baseT as an alias for 10baseT/UTP. - Add shorthand aliases for common media+option combinations as announced by miibus(4) so that one can actually supply the media strings found in the dmesg output to ifconfig(8). Obtained from: NetBSD (in principle) MFC after: 2 weeks	2011-05-15 12:58:29 +00:00
yongari	0aab37190f	Fix white space nits and style	2011-05-06 20:46:29 +00:00
yongari	b282e7aca8	Do not increment collision counter if transmit have failed. Transmission error in tun(4) is queueing error(i.e. ENOBUFS) and it has nothing to do with collision. Reported by: Zeus V Panchenko (zeus <> ibs dot dn dot ua)	2011-05-06 20:37:07 +00:00
thompsa	fc83a48265	LACP frames must not be sent VLAN-tagged, check for that before processing. PR: kern/156743 Submitted by: Dmitrij Tejblum MFC after: 1 week	2011-04-30 20:34:52 +00:00
bz	1910487722	Make various (pseudo) interfaces compile without INET in the kernel adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:30:44 +00:00
glebius	8c3b1b83f9	When removing ifnets, we should first remove the reference to ifnet from the interface index, then decrease refcount, not vice versa. Otherwise there is a race (reproducible) when if_free_internal() contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the index for a long time. It may be picked by some other thread, that runs ifnet_byindex_ref(), who takes the ifnet from index, and bumps refcount. When reader drops the lock, if_free_internal() proceeds with free. Then reader tries to free it a second time.	2011-04-04 07:45:08 +00:00
jeff	2d7d8c05e7	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
dchagin	0f55e947bd	Remove dead code. MFC after: 1 Week	2011-03-20 08:35:00 +00:00
dchagin	a96a94aaf2	ouch, newrt is used on the return path, my fault. Partially revert the previous change. MFC after: 1 Week.	2011-03-19 21:10:57 +00:00
dchagin	b3fb445553	A bit rearranged rtalloc1_fib() code. Initialize a variable when it is really needed. To avoid code duplication move the miss label to line up and jump on it. MFC after: 1 Week	2011-03-19 19:50:36 +00:00
dchagin	9dffdca01d	Remove a now unused variable. MFC after: 1 Week	2011-03-19 16:52:06 +00:00
eri	2ad117efbd	Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none. Approved by: thompsa(mentor) MFC after: 1 week	2011-03-04 20:37:38 +00:00
bz	209ebad7af	Hide the outer IP addresses of tunnel interfaces (gif(4), gre(4)) from processes inside jails if the addresses do not belong to the jail. Originally reported by: Pieter de Boer via remko PR: kern/151119 Tested by: Piotr KUCHARSKI (nospam 42.pl) [gif] MFC after: 1 week	2011-03-02 21:39:08 +00:00
brucec	6d9b42b486	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
bz	b9b7d3e93a	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
bz	1289b7e264	Mfp4 CH=177255: Resort the CURVNET_SET* macros in the non-VNET_DEBUG case to match the call order of the VNET_DEBUG case. Add the VNET_ASSERT() to the non-VNET_DEBUG case as well so that INVARIANTS will still catch problems. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 2 weeks	2011-02-11 14:17:58 +00:00
bz	aad842be85	Mfp4 CH=177255: Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS. Change the syntax to match KASSERT() to allow more flexible panic messages rather than having a printf with hardcoded arguments before panic. Adjust the few assertions we have to the new format (and enhance the output). Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 2 weeks	2011-02-11 13:27:00 +00:00
bz	3bac00c1ca	Mfp4 CH=177255: Use __func__ rather than __FUNCTION__. MFC after: 2 weeks	2011-02-11 12:56:05 +00:00
mlaier	f0ab2e35d4	As info.rti_info[RTAX_DST] can point inside of rtm we must not free the rtm until rt_dispatch is done with the sockaddr. Found by: memguard MFC after: 3 days	2011-02-10 01:24:09 +00:00
jhb	aff100c87f	Fix a LOR by dropping the global ifnet locks while allocating a new ifnet table in if_grow(). The order of the SYSINIT's for ifnet state were swapped so that the various locks were initialized before being used. Reviewed by: pluknet, bz MFC after: 2 weeks	2011-01-24 22:21:58 +00:00
mdf	6648b8cede	sysctl(8) should use the CTLTYPE to determine the type of data when reading. (This was already done for writing to a sysctl). This requires all SYSCTL setups to specify a type. Most of them are now checked at compile-time. Remove SYSCTL_X sysctl additions as the print being in hex should be controlled by the -x flag to sysctl(8). Suggested by: bde	2011-01-19 17:04:07 +00:00
mdf	5e41205b16	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the net* piece.	2011-01-12 19:53:50 +00:00
jhb	c17f46e472	Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes. Reviewed by: bde	2011-01-11 13:59:06 +00:00
bz	7864be44ec	MfP4 CH=185246 [1]: Add FEATURE() to announce optional VIMAGE. MFC after: 3 days [1] for the moment put it in vnet.c.	2011-01-09 20:40:21 +00:00
jhb	3a7fd5f8c7	- Restore dropping the priority of syncer down to PPAUSE when it is idle. This was lost when it was converted to using a condition variable instead of lbolt. - Drop the priority of flowtable down to PPAUSE when it is idle as well since it is a similar background task. MFC after: 2 weeks	2011-01-06 22:17:07 +00:00
marius	2c3e165ed0	Teach ifconfig(8) the handy shared option shortcut aliases the NetBSD counterpart also takes, i.e. "fdx" for "full-duplex", "flow" for "flowcontrol", "hdx" for "half-duplex" as well as "loop" and "loopback" for "hw-loopback". MFC after: 1 week	2011-01-05 15:28:30 +00:00
marius	577786b402	Fix whitespace. MFC after: 1 week	2011-01-05 14:51:04 +00:00
bz	43c37d3ddd	Use NULL rather than 0 to invalidate a pointer. Rather than duplicating the LLE_FREE_LOCKED() macro code in LLE_FREE(), call it directly (like we do for the RT_* macros). Sponsored by: ISPsystem [1] Reviewed by: julian [1] MFC After: 1 week [1] Early 2010.	2010-12-31 21:57:54 +00:00
bz	2a04b52e8a	Print the vnet pointer under DDB when iterating over flowtables of each virtual network stack instance. Sponsored by: ISPsystem [1] Reviewed by: julian [1] MFC after: 1 week [1] Early 2010.	2010-12-31 21:20:32 +00:00
bz	558c7e035b	Move the increment operation under the lock and split the condition variable into two so that we can see on which one we are waiting. This might also more properly propagate the update of the flowclean_cycles flag and avoid "hangs" people were seeing. Suggested by: rwatson [1] Sponsored by: ISPsystem [1] Reviewed by: julian [1] Updated by: Mikolaj Golub (to.my.trociny gmail.com) Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC After: 1 week [1] Early 2010, initial version.	2010-12-31 21:06:52 +00:00
alc	971b02b7bc	Introduce and use a new VM interface for temporarily pinning pages. This new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel. In collaboration with: kib@	2010-12-25 21:26:56 +00:00
weongyo	6dc48cb05c	Add IFF_CANTCONFIG to IFF_CANTCHANGE so that it cannot be changed through ioctl(2).	2010-12-07 20:31:04 +00:00
weongyo	33417874f4	Introduce the IFF_CANTCONFIG interface flag to indicate that the interface isn't configurable in a meaningful way. This lets ifconfig(8) and other tools avoid code changes whenever IFT_USB-like interfaces are registered in the interface list. Reviewed by: brooks No objections: gavin, jkim	2010-12-07 20:23:47 +00:00
maxim	c6ed377d5e	o Swap descriptions for net.bpf.bufsize and net.bpf.maxbufsize. PR: misc/152531 MFC after: 1 week	2010-11-24 05:50:19 +00:00
zec	163ca01105	Allow for vlan(4) ifnets to have overlapping unit numbers if they are created in separated vnets. As a side-effect of having a separated if_cloner instance for each vnet, all vlan ifnets created in a vnet will be automatically destroyed when vnet teardown is initiated. Disallow SIOCSETVLAN and SIOCGETVLAN ioctls on vlan ifnets which are associated with physical ifnets residing in parent vnets. This is an interim vlan-specific solution which will be superseded by a more generic if_cloner V_irtualization change from p4. For nooptions VIMAGE builds, this should be a no-op change. Discussed with: bz MFC after: 3 days	2010-11-22 23:35:29 +00:00
dim	fb307d7d1d	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
bz	076c86b1ea	Add a missing ';' and change the debugging sysctl from xint to int. Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 3 days	2010-11-21 19:33:19 +00:00
dim	02778a6df6	Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined.	2010-11-14 20:40:55 +00:00
dim	fda4020a88	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
dim	7dff36caf7	Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-14 20:23:02 +00:00
marius	278d761d73	o Flesh out the generic IEEE 802.3 annex 31B full duplex flow control support in mii(4): - Merge generic flow control advertisement (which can be enabled by passing MIIF_DOPAUSE to mii_attach(9)) and parsing support from NetBSD into mii_physubr.c and ukphy_subr.c. Unlike in NetBSD, IFM_FLOW isn't implemented as a global option via the "don't care mask" but instead as a media specific option. This has the following advantages: o allows flow control advertisement with autonegotiation to be turned on and off via ifconfig(8) with the default typically being off (though MIIF_FORCEPAUSE has been added causing flow control to be always advertised, allowing to easily MFC these changes for drivers that previously used home-grown support for flow control that behaved that way without breaking POLA) o allows to deal with PHY drivers where flow control advertisement with manual selection doesn't work or at least isn't implemented, like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4), by setting MIIF_NOMANPAUSE o the available combinations of media options are readily available from the `ifconfig -m` output - Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so these are understood by ifconfig(8). o Make the master/slave support in mii(4) actually usable: - Change IFM_ETH_MASTER from being implemented as a global option via the "don't care mask" to a media specific one as it actually is only applicable to IFM_1000_T to date. - Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to actually configure manually selected slave mode (like we also do in the PHY specific implementations). - Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it is understood by ifconfig(8). 
o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4), e1000phy(4) and ip1000phy(4) to use the generic flow control support instead of home-grown solutions via IFM_FLAGs. This includes changing these PHY drivers and smcphy(4) to no longer unconditionally advertise support for flow control but only if the selected media has IFM_FLOW set (or MIIF_FORCEPAUSE is set) and implemented for these media variants, i.e. typically only for copper. o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0 and some IFM_FLAGn. o Switch brgphy(4) to add at least the supported copper media based on the contents of the BMSR via mii_phy_add_media() instead of hardcoding them. The latter approach seems to have developed historically, besides causing unnecessary code duplication it was also undesirable because brgphy_mii_phy_auto() already based the capability advertisement on the contents of the BMSR though. o Let brgphy(4) set IFM_1000_T master mode on all supported PHYs and not just BCM5701. Apparently this was a misinterpretation of a workaround in the Linux tg3 driver; BCM5701 seems to require RGPHY_1000CTL_MSE and BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but this doesn't mean we can't set these as well on other PHYs for manual media selection. o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so IFM_1000_T master mode support now is generally available with all PHY drivers. o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's not applicable there. Reviewed by: yongari (plus additional testing) Obtained from: NetBSD (partially), OpenBSD (partially) MFC after: 2 weeks	2010-11-14 13:26:10 +00:00
kib	1889bb0afa	Use 'z' modifier for size_t printing.	2010-11-13 11:11:51 +00:00
dim	7b0aabca30	Similar to r212647, remove the workaround in sys/net/vnet.h for an ld bug (incorrect placement of __start_SECNAME in some cases) that was fixed in r210245. There is already an UPDATING entry about needing a recent ld. MFC after: 1 month	2010-11-12 22:59:50 +00:00
gnn	c3225b5eaa	Add a queue to hold packets while we await an ARP reply. When a fast machine first brings up some non TCP networking program it is quite possible that we will drop packets due to the fact that only one packet can be held per ARP entry. This leads to packets being missed when a program starts or restarts if the ARP data is not currently in the ARP cache. This code adds a new sysctl, net.link.ether.inet.maxhold, which defines a system wide maximum number of packets to be held in each ARP entry. Up to maxhold packets are queued until an ARP reply is received or the ARP times out. The default setting is the old value of 1 which has been part of the BSD networking code since time immemorial. Expose the time we hold an incomplete ARP entry by adding the sysctl net.link.ether.inet.wait, which defaults to 20 seconds, the value used when the new ARP code was added. Reviewed by: bz, rpaulo MFC after: 3 weeks	2010-11-12 22:03:02 +00:00
dim	2c7257916f	Use the same treatment as in linker_set.h for the __start and __stop symbols of the set_vnet and set_pcpu sections, so those symbols will always be emitted in kernel modules, if they use vnet.h or pcpu.h. Also, for pcpu.h, make the __(start|stop)_set_pcpu declarations, and associated macros invisible to userland, to prevent it picking up these symbols. Reviewed by: kib	2010-11-11 19:18:52 +00:00
rpaulo	2631ae0f3d	Sync DLTs with the latest pcap version.	2010-10-29 18:41:09 +00:00
bz	491af1942e	Factor out DDB commands from r204145, r204279 into if_debug.c for further enhancements (1). Switch to a standard 2-clause BSD license for this (2). Unfortunately we have to un-static the ifindex_table for this but do not publicly export it. Suggested by: rwatson (1) a while back. Approved by: thompsa (2) for the change from r204279. MFC after: 6 days	2010-10-25 08:30:19 +00:00
pluknet	8950ed8036	Reshuffle SIOCGIFCONF32 handler from r155224. - move all the chunks into one file, which allows to hide SIOCGIFCONF32 global definition as well. - replace __amd64__ with proper COMPAT_FREEBSD32 around. - handle 32bit capacity before going into the handler itself instead of doing internal 32bit specific changes within it (e.g. as it's done for SIOCGDEFIFACE32_IN6). - use explicitly sized types for ABI compat. Approved by: kib (mentor) MFC after: 2 weeks	2010-10-21 16:20:48 +00:00
bz	753b9262ae	Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating over all interfaces to make sure the address will neither change nor be freed while we are working on it. PR: kern/146250 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 1 week	2010-10-16 19:25:27 +00:00
bz	e7e5079137	lltable_drain() has never been used so far, thus #if 0 it for now. While touching it add the missing locking to the now disabled code for the time when we'll resurrect it. MFC after: 3 days	2010-10-16 18:42:09 +00:00
bz	054f4fff23	Only hide the ifa and not the tp under #ifdef INET as the tp is needed for locking even when there is no INET. MFC after: 3 days	2010-10-01 15:14:14 +00:00
jhb	b8914d3479	- Expand scope of tun/tap softc locks to cover more softc fields and driver-maintained ifnet fields (such as if_drv_flags). - Use softc locks as the mutex that protects each interface's knote list rather than using the global knote list lock. Also, use the softc for kn_hook instead of the cdev. - Use mtx_sleep() instead of tsleep() when blocking in the read routines. This fixes a lost wakeup race. - Remove D_NEEDGIANT now that the cdevsw routines use the softc lock where locking is needed. - Lock IFQ when calculating the result for FIONREAD in tap(4). tun(4) already did this. - Remove remaining spl calls. Submitted by: Marcin Cieslak saper of saper|info (3) MFC after: 2 weeks	2010-09-22 21:02:43 +00:00
jkim	4b1a6e6702	Fix a typo in a comment. Submitted by: afiveg	2010-09-16 18:37:33 +00:00
mdf	ab3a8b533a	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno than sbuf_overflowed() did.	2010-09-10 16:42:16 +00:00
bz	06977c291c	MFp4 CH=183259: No reason to use if_free_type() as we don't change our type. Just if_free() is fine. MFC after: 3 days	2010-09-02 16:11:12 +00:00
emaste	a9a1b47f1d	Add a sysctl knob to accept input packets on any link in a failover lagg.	2010-09-01 16:53:38 +00:00
bz	70761a9ea6	MFp4 CH=182972: Add explicit linkstate UP/DOWN for the epair. This is needed by carp(4) and other things to work. MFC after: 5 days	2010-08-27 23:22:58 +00:00
rpaulo	ea11ba6788	Add an extra comment to the SDT probes definition. This allows us to use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwatson [1]	2010-08-22 11:18:57 +00:00
zec	e1e5264fc5	When moving an ethernet ifnet from one vnet to another, destroy the associated ng_ether netgraph node in the current vnet, and create a new one in the target vnet. Reviewed by: julian MFC after: 3 days	2010-08-13 18:17:32 +00:00
will	d548943ae9	Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with the appropriate ifdefs. Reviewed by: bz Approved by: ken (mentor)	2010-08-11 20:18:19 +00:00
will	aa4e762c4a	Allow carp(4) to be loaded as a kernel module. Follow precedent set by bridge(4), lagg(4) etc. and make use of function pointers and pf_proto_register() to hook carp into the network stack. Currently, because of the uncertainty about whether the unload path is free of race condition panics, unloads are disallowed by default. Compiling with CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure. This commit requires IP6PROTOSPACER, introduced in r211115. Reviewed by: bz, simon Approved by: ken (mentor) MFC after: 2 weeks	2010-08-11 00:51:50 +00:00
jhb	4ab164bfd3	Adjust the interface type in the link layer socket address for vlan(4) interfaces to be a vlan (IFT_L2VLAN) rather than an Ethernet interface (IFT_ETHER). The code already fixed if_type in the ifnet causing some places to report the interface as a vlan (e.g. arp -a output) and other places to report the interface as Ethernet (getifaddrs(3)). Now they should all report IFT_L2VLAN. Reviewed by: brooks MFC after: 1 month	2010-08-06 15:15:26 +00:00
kib	c5df0ad3b8	Properly set ifi_datalen for compat32 struct if_data32. PR: kern/149240 Submitted by: Stef Walter <stef memberwebs com> MFC after: 1 week	2010-08-03 15:40:42 +00:00
glebius	b89adb17c9	Don't check malloc(M_WAITOK) result.	2010-07-27 11:56:49 +00:00
bz	0078d05705	Return NULL rather than 0 for a pointer. MFC after: 3 days	2010-07-27 11:54:01 +00:00
glebius	43f917d899	When installing a new ARP entry via 'arp -S', lla_lookup() will either find an existing entry, or allocate a new one. In the latter case an entry would have flags that were supplied as argument to lla_lookup(). In case of an existing entry, flags aren't modified. This led to losing LLE_PUB and/or LLE_PROXY flags. We should apply these flags either in lla_rt_output() or in the in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is a more correct choice. PR: kern/148784, kern/146539 Silence from: qingli, 5 days	2010-07-27 10:05:27 +00:00
jkim	9fdab3dd15	Fix an obvious typo from r1.1. We were acquiring an exclusive writer lock regardless of the given flags. MFC after: 3 days	2010-07-22 18:44:40 +00:00
luigi	bbbbe1a022	whitespace cleanup	2010-07-15 14:41:59 +00:00
luigi	9f2bcd601a	small portability fix to build on linux/windows	2010-07-15 14:41:06 +00:00
jkim	14f08fd627	Implement flexible BPF timestamping framework. - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses. - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now. - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format as binuptime(9) and getbinuptime(9) do. Reviewed by: net@	2010-06-15 19:28:44 +00:00
jhb	9b74a62d73	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
zec	66c3a596d7	Provide a macro for registering a virtualized sysctl handler for VNET opaque data. MFC after: 30 days	2010-06-02 15:29:21 +00:00
qingli	f6ab4a6810	This patch fixes the problem where proxy ARP entries cannot be added over the if_ng interface. MFC after: 3 days	2010-05-25 20:42:35 +00:00
jhb	7df2d528cf	Ignore failures from removing multicast addresses from the parent (trunk) interface when tearing down a vlan interface. If a trunk interface is detached, all of its multicast addresses are removed before the ifnet departure eventhandlers are invoked. This means that all of the multicast addresses are removed before the vlan interfaces are removed which causes the if_delmulti() calls in the vlan teardown to fail. In the VLAN_ARRAY case, this left vlan interfaces referencing a no longer valid parent interface. In the !VLAN_ARRAY case, the eventhandler gets stuck in an infinite loop retrying vlan_unconfig_locked() forever. In general the callers of vlan_unconfig_locked() do not expect nor handle failure, so I believe it is safer to ignore the errors and tear down as much of the vlan state as possible. Silence from: net@ MFC after: 4 days	2010-05-17 19:36:56 +00:00
kmacy	a68dd336d5	allocate ipv6 flows from the ipv6 flow zone reported by: rrs@ MFC after: 3 days	2010-05-16 21:48:39 +00:00
bz	c9d1ca826b	Fix an issue with the dynamic pcpu/vnet data allocators. We cannot expect that modspace is the last entry in the linker set and thus that modspace + possible extra space up to PAGE_SIZE would be contiguous. For the moment do not support more than _MODMIN space and ignore the extra space (*). (*) We know how to get it back but it'll need testing. Discussed with: jeff, rwatson (briefly) Reviewed by: jeff Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-05-14 21:11:58 +00:00
kmacy	5e73b28c4d	workaround bug with ipv6 where a flow can have a null rtentry	2010-05-12 04:51:20 +00:00
alc	e6c77ecaea	Remove page queues locking from all sf_buf_mext()-like functions. The page lock now suffices. Fix a couple nearby style violations.	2010-05-06 17:43:41 +00:00
alc	c9aaa1e2a2	Add page locking to the vm_page_cow* functions. Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib	2010-05-04 15:55:41 +00:00
sobomax	213eac1f2c	Add new tunable 'net.link.ifqmaxlen' to set default send interface queue length. The default value for this parameter is 50, which is quite low for many of today's uses and the only way to modify this parameter right now is to edit if_var.h file. Also add read-only sysctl with the same name, so that it's possible to retrieve the current value. MFC after: 1 month	2010-05-03 07:32:50 +00:00
alc	387e15c45a	This is the first step in transitioning responsibility for synchronizing access to the page's wire_count from the page queues lock to the page lock. Submitted by: kmacy	2010-05-03 05:41:50 +00:00
kmacy	1dc1263413	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00


2900 Commits