freebsd-skq

Author	SHA1	Message	Date
Pedro F. Giffuni	fe267a5590	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
Andrey V. Elsukov	c00bf73076	Do not adjust interface MTU automatically. Leave this task to the system administrator. This restores the behavior that was prior to r274246. No objection from: #network MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D10215	2017-04-11 08:56:18 +00:00
Pedro F. Giffuni	74b8d63dcc	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
Marcelo Araujo	2bfd3dfb9f	Wrap using #ifdef 'notyet' those variables and statements not yet implemented to lower the compiler warnings. It fix the case of unused-but-set-variable spotted by gcc4.9. Reviewed by: ngie, ae Approved by: bapt (mentor) Differential Revision: https://reviews.freebsd.org/D4720	2015-12-31 02:01:20 +00:00
Andrey V. Elsukov	50bc87bc43	Check the size of data available in mbuf, before using them. PR: 202667 MFC after: 1 week	2015-10-28 17:55:37 +00:00
Hiroki Sato	f1aaad0cd9	Add IFCAP_LINKSTATE support.	2015-10-03 09:15:23 +00:00
Andrey V. Elsukov	1a6fb597b0	Always detach encap handler when reconfiguring tunnel. Reported by: hrs MFC after: 1 week	2015-10-03 03:57:58 +00:00
Andrey V. Elsukov	eccfe69a5c	Add new socket ioctls SIOC[SG]TUNFIB to set FIB number of encapsulated packets on tunnel interfaces. Add support of these ioctls to gre(4), gif(4) and me(4) interfaces. For incoming packets M_SETFIB() should use if_fib value from ifnet structure, use proper value in gre(4) and me(4). Differential Revision: https://reviews.freebsd.org/D2462 No objection from: #network MFC after: 2 weeks Sponsored by: Yandex LLC	2015-05-12 07:37:27 +00:00
Andrey V. Elsukov	84d03ddad3	Fix an error introduced in r274246. Pass mtag argument into m_tag_locate() to continue the search from the last found mtag. X-MFC after: r274246	2015-01-10 03:26:46 +00:00
Andrey V. Elsukov	f188f14d43	Extern declarations in C files loses compile-time checking that the functions' calls match their definitions. Move them to header files. Reviewed by: jilles (previous version)	2014-12-25 21:32:37 +00:00
Bjoern A. Zeeb	4dbd7c5dc4	After r274246 make the tree compile again. gcc requires variables to be initialised in two places. One of them is correctly used only under the same conditional though. For module builds properly check if the kernel supports INET or INET6, as otherwise various mips kernels without IPv6 support would fail to build.	2014-11-08 14:41:32 +00:00
Andrey V. Elsukov	f325335caf	Overhaul if_gre(4). Split it into two modules: if_gre(4) for GRE encapsulation and if_me(4) for minimal encapsulation within IP. gre(4) changes: * convert to if_transmit; * rework locking: protect access to softc with rmlock, protect from concurrent ioctls with sx lock; * correct interface accounting for outgoing datagramms (count only payload size); * implement generic support for using IPv6 as delivery header; * make implementation conform to the RFC 2784 and partially to RFC 2890; * add support for GRE checksums - calculate for outgoing datagramms and check for inconming datagramms; * add support for sending sequence number in GRE header; * remove support of cached routes. This fixes problem, when gre(4) doesn't work at system startup. But this also removes support for having tunnels with the same addresses for inner and outer header. * deprecate support for various GREXXX ioctls, that doesn't used in FreeBSD. Use our standard ioctls for tunnels. me(4): * implementation conform to RFC 2004; * use if_transmit; * use the same locking model as gre(4); PR: 164475 Differential Revision: D1023 No objections from: net@ Relnotes: yes Sponsored by: Yandex LLC	2014-11-07 19:13:19 +00:00
Hiroki Sato	89c58b73e0	- Virtualize interface cloner for gre(4). This fixes a panic when destroying a vnet jail which has a gre(4) interface. - Make net.link.gre.max_nesting vnet-local.	2014-09-21 03:56:06 +00:00
Gleb Smirnoff	3751dddb3e	Mechanically convert to if_inc_counter().	2014-09-19 10:39:58 +00:00
Gleb Smirnoff	a6f2696932	Increase errors, not queue drops, in cases the module is supplied with a bad packet or if mbuf allocation failes.	2014-09-19 05:43:38 +00:00
Kevin Lo	73d76e77b6	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb	2014-08-15 02:43:02 +00:00
Gleb Smirnoff	45c203fce2	Remove AppleTalk support. AppleTalk was a network transport protocol for Apple Macintosh devices in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was a legacy protocol and primary networking protocol is TCP/IP. The last Mac OS X release to support AppleTalk happened in 2009. The same year routing equipment vendors (namely Cisco) end their support. Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 06:29:43 +00:00
Gleb Smirnoff	555036b5f6	Remove never used ioctls that originate from KAME. The proof of their zero usage was exp-run from misc/183538.	2013-11-11 05:39:42 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Gleb Smirnoff	540b1a7238	Clean up SIOCSIFDSTADDR usage from ifnet drivers. The ioctl itself is extremely outdated, and I doubt that it was ever used for ifnet drivers. It was used for AF_INET sockets in pre-FreeBSD time. Approved by: re (hrs) Sponsored by: Nginx, Inc.	2013-09-11 09:19:44 +00:00
Hiroki Sato	b8992a6792	Add IFF_MONITOR support to gre(4). Tested by: Chip Marshall MFC after: 1 week	2013-05-11 19:05:38 +00:00
Gleb Smirnoff	47e8d432d5	Add const qualifier to the dst parameter of the ifnet if_output method.	2013-04-26 12:50:32 +00:00
Gleb Smirnoff	dc4ad05ecd	Use m_get/m_gethdr instead of compat macros. Sponsored by: Nginx, Inc.	2013-03-15 12:55:30 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Gleb Smirnoff	8f134647ca	Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert host byte order before pass a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtructs header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>	2012-10-22 21:09:03 +00:00
Andre Oppermann	c9b652e3e8	Mechanically remove the last stray remains of spl* calls from net/. They have been Noop's for a long time now.	2012-10-18 13:57:24 +00:00
Gleb Smirnoff	42a58907c3	Make the "struct if_clone" opaque to users of the cloning API. Users now use function calls: if_clone_simple() if_clone_advanced() to initialize a cloner, instead of macros that initialize if_clone structure. Discussed with: brooks, bz, 1 year ago	2012-10-16 13:37:54 +00:00
Kevin Lo	9823d52705	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
Kevin Lo	a10cee30c9	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Bjoern A. Zeeb	52dcd04ba3	gre(4) was using a field in the softc to detect possible recursion. On MP systems this is not a usable solution anymore and could easily lead to false positives triggering enough logging that even using the console was no longer usable (multiple parallel ping -f can do). Switch to the suggested solution of using mbuf tags to carry per packet state between gre_output() invocations. Contrary to the proposed solution modelled after gif(4) only allocate one mbuf tag per packet rather than per packet and per gre_output() pass through. As the sysctl to control the possible valid (gre in gre) nestings does no sanity checks, make sure to always allocate space in the mbuf tag for at least one, and at most 255 possible gre interfaces to detect loops in addition to the counter. Submitted by: Cristian KLEIN (cristi net.utcluj.ro) (original version) PR: kern/114714 Reviewed by: Cristian KLEIN (cristi net.utcluj.ro) Reviewed bu: Wooseog Choi (ben_choi hotmail.com) Sponsored by: Sandvine Incorporated MFC after: 1 week	2011-06-18 09:34:03 +00:00
Bjoern A. Zeeb	e3416ab0c0	Hide the outer IP addresses of a tunnel interfaces (gif(4), gre(4)) from processes inside jails if the addresses do not belong to the jail. Originally reported by: Pieter de Boer via remko PR: kern/151119 Tested by: Piotr KUCHARSKI (nospam 42.pl) [gif] MFC after: 1 week	2011-03-02 21:39:08 +00:00
Maxim Sobolev	e50d35e6c6	Add new tunable 'net.link.ifqmaxlen' to set default send interface queue length. The default value for this parameter is 50, which is quite low for many of today's uses and the only way to modify this parameter right now is to edit if_var.h file. Also add read-only sysctl with the same name, so that it's possible to retrieve the current value. MFC after: 1 month	2010-05-03 07:32:50 +00:00
Joel Dahl	7df6f59359	The NetBSD Foundation has granted permission to remove clause 3 and 4 from their software. Obtained from: NetBSD	2010-03-01 17:05:46 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Bjoern A. Zeeb	7654a365db	Add the explicit include of vimage.h to another five .c files still missing it. Remove the "hidden" kernel only include of vimage.h from ip_var.h added with the very first Vimage commit r181803 to avoid further kernel poisoning.	2009-06-17 12:44:11 +00:00
Kip Macy	279aa3d419	Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson	2009-04-16 20:30:28 +00:00
Bjoern A. Zeeb	4b79449e2f	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
Marko Zec	8b615593fc	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
Julian Elischer	44554a6de7	Add support for actually sending WCCP return packets via GRE. This MAY be combined by a clever person with the 'key' code recently added, however a cursary glance suggest that it would be safer to just keep the patches as it is unlikely that the two modes would be used together and the separate patch has been extensively tested. Obtained from: here and there MFC after: 1 week	2008-07-20 21:45:15 +00:00
Julian Elischer	a54eadd8c4	change a variable name ot stop it from colliding with other names in some situations. (i.e. in vimage) MFC after: 1 week	2008-06-26 22:59:49 +00:00
Andrew Thompson	131c55bc5b	Add support for the optional key in the GRE header. PR: kern/114714 Submitted by: Cristian KLEIN	2008-06-20 17:26:34 +00:00
Julian Elischer	8b07e49a00	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly. ipfw has grown 2 new keywords: setfib N ip from anay to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco	2008-05-09 23:03:00 +00:00
Julian Elischer	29481f8846	Replace really convoluted code that simplifies to "a ^= 0x01;"	2008-03-19 22:29:11 +00:00
Robert Watson	cc9bdf2a62	Sync comments to code: we now use priv_check() rather than suser() to determine privilege. Approved by: re (bmah)	2007-06-26 23:01:01 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Brooks Davis	43bc7a9c62	With exception of the if_name() macro, all definitions in net_osdep.h were unused or already in if_var.h so add if_name() to if_var.h and remove net_osdep.h along with all references to it. Longer term we may want to kill off if_name() entierly since all modern BSDs have if_xname variables rendering it unnecessicary.	2006-08-04 21:27:40 +00:00
Sam Leffler	6b7330e2d4	Revise network interface cloning to take an optional opaque parameter that can specify configuration parameters: o rev cloner api's to add optional parameter block o add SIOCCREATE2 that accepts parameter data o rev vlan support to use new api (maintain old code) Reviewed by: arch@	2006-07-09 06:04:01 +00:00
Christian S.J. Peron	16d878cc99	Fix the following bpf(4) race condition which can result in a panic: (1) bpf peer attaches to interface netif0 (2) Packet is received by netif0 (3) ifp->if_bpf pointer is checked and handed off to bpf (4) bpf peer detaches from netif0 resulting in ifp->if_bpf being initialized to NULL. (5) ifp->if_bpf is dereferenced by bpf machinery (6) Kaboom This race condition likely explains the various different kernel panics reported around sending SIGINT to tcpdump or dhclient processes. But really this race can result in kernel panics anywhere you have frequent bpf attach and detach operations with high packet per second load. Summary of changes: - Remove the bpf interface's "driverp" member - When we attach bpf interfaces, we now set the ifp->if_bpf member to the bpf interface structure. Once this is done, ifp->if_bpf should never be NULL. [1] - Introduce bpf_peers_present function, an inline operation which will do a lockless read bpf peer list associated with the interface. It should be noted that the bpf code will pickup the bpf_interface lock before adding or removing bpf peers. This should serialize the access to the bpf descriptor list, removing the race. - Expose the bpf_if structure in bpf.h so that the bpf_peers_present function can use it. This also removes the struct bpf_if; hack that was there. - Adjust all consumers of the raw if_bpf structure to use bpf_peers_present Now what happens is: (1) Packet is received by netif0 (2) Check to see if bpf descriptor list is empty (3) Pickup the bpf interface lock (4) Hand packet off to process From the attach/detach side: (1) Pickup the bpf interface lock (2) Add/remove from bpf descriptor list Now that we are storing the bpf interface structure with the ifnet, there is is no need to walk the bpf interface list to locate the correct bpf interface. We now simply look up the interface, and initialize the pointer. This has a nice side effect of changing a bpf interface attach operation from O(N) (where N is the number of bpf interfaces), to O(1). [1] From now on, we can no longer check ifp->if_bpf to tell us whether or not we have any bpf peers that might be interested in receiving packets. In collaboration with: sam@ MFC after: 1 month	2006-06-02 19:59:33 +00:00

1 2

91 Commits