freebsd-skq

Author	SHA1	Message	Date
David Malone	91433904b5	Rather than calling microtime() in catchpacket(), make catchpacket() take a timeval indicating when the packet was captured. Move microtime() to the calling functions and grab the timestamp as soon as we know that we're going to call catchpacket at least once. This means that we call microtime() once per matched packet, as opposed to once per matched packet per bpf listener. It also means that we return the same timestamp to all bpf listeners, rather than slightly different ones. It would be more accurate to call microtime() even earlier for all packets, as you have to grab (1+#listener) locks before you can determine if the packet will be logged. You could always grab a timestamp before the locks, but microtime() can be costly, so this didn't seem like a good idea. (I guess most ethernet interfaces will have a bpf listener these days because of dhclient. That means that we could be doing two bpf locks on most packets going through the interface.) PR: 71711	2006-07-24 15:42:04 +00:00
Christian S.J. Peron	4b19419ee7	Adjust descriptor locking to tell the kqueue subsystem that our descriptor is already locked. The reason to do this is to avoid two lock+unlock operations in a row. We need the lock here to serialize access to bd_pid for stats collection purposes. Drop the locks altogether on detach, as they will be picked up by knlist_remove. This should fix a failed locking assertion when kqueue is being used with bpf descriptors. Discussed with: jmg	2006-07-03 20:02:06 +00:00
Christian S.J. Peron	19ba8395e1	Since we are doing some bpf(4) clean up, change a couple of function prototypes to be consistent. Also, ANSI'fy function definitions. There is no functional change here.	2006-06-15 15:39:12 +00:00
Christian S.J. Peron	7eae78a419	If bpf(4) has not been compiled into the kernel, initialize the bpf interface pointer to a zeroed, statically allocated bpf_if structure. This way the LIST_EMPTY() macro will always return true. This allows us to remove the additional unconditional memory reference for each packet in the fast path. Discussed with: sam	2006-06-14 02:23:28 +00:00
Christian S.J. Peron	16d878cc99	Fix the following bpf(4) race condition which can result in a panic: (1) bpf peer attaches to interface netif0 (2) Packet is received by netif0 (3) ifp->if_bpf pointer is checked and handed off to bpf (4) bpf peer detaches from netif0 resulting in ifp->if_bpf being initialized to NULL. (5) ifp->if_bpf is dereferenced by bpf machinery (6) Kaboom This race condition likely explains the various different kernel panics reported around sending SIGINT to tcpdump or dhclient processes. But really this race can result in kernel panics anywhere you have frequent bpf attach and detach operations with high packet per second load. Summary of changes: - Remove the bpf interface's "driverp" member - When we attach bpf interfaces, we now set the ifp->if_bpf member to the bpf interface structure. Once this is done, ifp->if_bpf should never be NULL. [1] - Introduce bpf_peers_present function, an inline operation which will do a lockless read of the bpf peer list associated with the interface. It should be noted that the bpf code will pick up the bpf_interface lock before adding or removing bpf peers. This should serialize the access to the bpf descriptor list, removing the race. - Expose the bpf_if structure in bpf.h so that the bpf_peers_present function can use it. This also removes the struct bpf_if; hack that was there. - Adjust all consumers of the raw if_bpf structure to use bpf_peers_present Now what happens is: (1) Packet is received by netif0 (2) Check to see if bpf descriptor list is empty (3) Pick up the bpf interface lock (4) Hand packet off to process From the attach/detach side: (1) Pick up the bpf interface lock (2) Add/remove from bpf descriptor list Now that we are storing the bpf interface structure with the ifnet, there is no need to walk the bpf interface list to locate the correct bpf interface. We now simply look up the interface, and initialize the pointer. This has a nice side effect of changing a bpf interface attach operation from O(N) (where N is the number of bpf interfaces), to O(1). [1] From now on, we can no longer check ifp->if_bpf to tell us whether or not we have any bpf peers that might be interested in receiving packets. In collaboration with: sam@ MFC after: 1 month	2006-06-02 19:59:33 +00:00
Ruslan Ermilov	293c06a186	Fix -Wundef warnings.	2006-05-30 19:24:01 +00:00
Christian S.J. Peron	1fc9e38706	Pickup locks for the BPF interface structure. It's quite possible that bpf(4) descriptors can be added and removed on this interface while we are processing stats. MFC after: 2 weeks	2006-05-07 03:21:43 +00:00
Jung-uk Kim	848c454cc1	Add BPF Just-In-Time compiler support for ng_bpf(4). The sysctl is changed from net.bpf.jitter.enable to net.bpf_jitter.enable and this controls both bpf(4) and ng_bpf(4) now.	2005-12-07 21:30:47 +00:00
Jung-uk Kim	ae275efcae	Add experimental BPF Just-In-Time compiler for amd64 and i386. Use the following kernel configuration option to enable: options BPF_JITTER If you want to use bpf_filter() instead (e. g., debugging), do: sysctl net.bpf.jitter.enable=0 to turn it off. Currently BIOCSETWF and bpf_mtap2() are unsupported, and bpf_mtap() is partially supported because 1) no need, 2) avoid expensive m_copydata(9). Obtained from: WinPcap 3.1 (for i386)	2005-12-06 02:58:12 +00:00
Christian S.J. Peron	cb1d4f92ec	Protect PID initializations for statistics by the bpf descriptor locks. Also while we are here, protect the bpf descriptor during knlist_remove{add} operations. Discussed with: rwatson	2005-10-04 15:06:10 +00:00
Andre Oppermann	035ba19027	Undo a tad little optimization to bpf_mtap() introduced in rev. 1.95 which broke the correct handling of the BIOCGSEESENT flag in the bpf listener. PR: kern/56441 Submitted by: <vys at renet.ru> MFC after: 3 days	2005-09-14 16:37:05 +00:00
Christian S.J. Peron	b75a24a075	Instead of caching the PID which opened the bpf descriptor, continuously refresh the PID which has the descriptor open. The PID is refreshed in various operations like ioctl(2), kevent(2) or poll(2). This produces more accurate information about current bpf consumers. While we are here remove the bd_pcomm member of the bpf stats structure because now that we have an accurate PID we can look up the process name via the kern.proc.pid sysctl variable. This is the trick that NetBSD decided to use to deal with this issue. Special care needs to be taken when MFC'ing this change, as we have made a change to the bpf stats structure. What will end up happening is we will leave the pcomm structure but just mark it as being un-used. This way we keep the ABI intact. MFC after: 1 month Discussed with: Rui Paulo < rpaulo at NetBSD dot org >	2005-09-05 23:08:04 +00:00
Christian S.J. Peron	93e39f0b93	Introduce two new ioctl(2) commands, BIOCLOCK and BIOCSETWF. These commands enhance the security of bpf(4) by further relinquishing the privilege of the bpf(4) consumer (assuming the ioctl commands are being implemented). Once BIOCLOCK is executed, the device becomes locked which prevents the execution of ioctl(2) commands which can change the underlying parameters of the bpf(4) device. An example might be the setting of bpf(4) filter programs or attaching to different network interfaces. BIOCSETWF can be used to set write filters for outgoing packets. Currently if a bpf(4) consumer is compromised, the bpf(4) descriptor can essentially be used as a raw socket, regardless of consumer's UID. Write filters give users the ability to constrain which packets can be sent through the bpf(4) descriptor. These features are currently implemented by a couple programs which came from OpenBSD, such as the new dhclient and pflogd. -Modify bpf_setf(9) to accept a "cmd" parameter. This will be used to specify whether a read or write filter is to be set. -Add a bpf(4) filter program as a parameter to bpf_movein(9) as we will run the filter program on the mbuf data once we move the packet in from user-space. -Rather than execute two uiomove operations, (one for the link header and the other for the packet data), execute one and manually copy the link header into the sockaddr structure via bcopy. -Restructure bpf_setf to compensate for write filters, as well as read. -Adjust bpf(4) stats structures to include a bd_locked member. It should be noted that the FreeBSD and OpenBSD implementations differ a bit in the sense that we unconditionally enforce the lock, where OpenBSD enforces it only if the calling credential is not root. Idea from: OpenBSD Reviewed by: mlaier	2005-08-22 19:35:48 +00:00
Christian S.J. Peron	4ddfb5312a	Add missing braces around bpf_filter which were missed when I merged the bpfstat code. Pointed out by: iedowse Pointy hat to: csjp MFC after: 3 days	2005-08-18 22:30:52 +00:00
Robert Watson	6a113b3de7	Merge the dev_clone and dev_clone_cred event handlers into a single event handler, dev_clone, which accepts a credential argument. Implementors of the event can ignore it if they're not interested, and most do. This avoids having multiple event handler types and fall-back/precedence logic in devfs. This changes the kernel API for /dev cloning, and may affect third party packages containing cloning kernel modules. Requested by: phk MFC after: 3 days	2005-08-08 19:55:32 +00:00
Christian S.J. Peron	422a63da6e	Rather than hold a mutex over calls to SYSCTL_OUT allocate a temporary buffer then pass the array to user-space once we have dropped the lock. While we are here, drop an assertion which could result in a kernel panic under certain race conditions. Pointed out by: rwatson	2005-07-26 17:21:56 +00:00
Christian S.J. Peron	69f7644bc9	Introduce new sysctl variable: net.bpf.stats. This sysctl variable can be used to pass statistics regarding dropped, matched and received packet counts from the kernel to user-space. While we are here introduce a new counter for filtered or matched packets. We currently keep track of packets received or dropped by the bpf device, but not how many packets actually matched the bpf filter. -Introduce net.bpf.stats sysctl OID -Move sysctl variables after the function prototypes so we can reference bpf_stats_sysctl(9) without build errors. -Introduce bpf descriptor counter which is used mainly for sizing of the xbpf_d array. -Introduce a xbpf_d structure which will act as an external representation of the bpf_d structure. -Add the following members to the bpfd structure: bd_fcount - Number of packets which matched bpf filter bd_pid - PID which opened the bpf device bd_pcomm - Process name which opened the device. It should be noted that it's possible that the process which opened the device could be long gone at the time of stats collection. An example might be a process that opens the bpf device forks then exits leaving the child process with the bpf fd. Reviewed by: mdodd	2005-07-24 17:21:17 +00:00
Suleiman Souhlal	571dcd15e2	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone	2005-07-01 16:28:32 +00:00
David Malone	01399f34a5	Fix some long standing bugs in writing to the BPF device attached to a DLT_NULL interface. In particular: 1) Consistently use type u_int32_t for the header of a DLT_NULL device - it continues to represent the address family as always. 2) In the DLT_NULL case get bpf_movein to store the u_int32_t in a sockaddr rather than in the mbuf, to be consistent with all the DLT types. 3) Consequently fix a bug in bpf_movein/bpfwrite which only permitted packets up to 4 bytes less than the MTU to be written. 4) Fix all DLT_NULL devices to have the code required to allow writing to their bpf devices. 5) Move the code to allow writing to if_lo from if_simloop to looutput, because it only applies to DLT_NULL devices but was being applied to other devices that use if_simloop possibly incorrectly. PR: 82157 Submitted by: Matthew Luckie <mjl@luckie.org.nz> Approved by: re (scottl)	2005-06-26 18:11:11 +00:00
Brooks Davis	fc74a9f93a	Stop embedding struct ifnet at the top of driver softcs. Instead the struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com. This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go. Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr. Reviewed by: sobomax, sam	2005-06-10 16:49:24 +00:00
Christian S.J. Peron	0eb206049e	Change the maximum bpf program instruction limitation from being hard- coded at 512 (BPF_MAXINSNS) to being tunable. This is useful for users who wish to use complex or large bpf programs when filtering traffic. For now we will default it to BPF_MAXINSNS. I have tested bpf programs with well over 21,000 instructions without any problems. Discussed with: phk	2005-06-06 22:19:59 +00:00
Christian S.J. Peron	a3272e3ce3	-introduce net.bpf sysctl instead of the less intuitive debug.* debug.bpf_bufsize is now net.bpf.bufsize debug.bpf_maxbufsize is now net.bpf.maxbufsize -move function prototypes for bpf_drvinit and bpf_clone up to the top of the file with the others -assert bpfd lock in catchpacket() and bpf_wakeup() MFC after: 2 weeks	2005-05-04 03:09:28 +00:00
Poul-Henning Kamp	f4f6abcb4e	Explicitly hold a reference to the cdev we have just cloned. This closes the race where the cdev was reclaimed before it ever made it back to devfs lookup.	2005-03-31 12:19:44 +00:00
Brian Feldman	4549709fb5	You must selwakeup{,pri}() when closing a selectable object or the td->td_sel will get trashed and crash the system. Fix BPF's mistake in this area. MFC after: 1 day	2005-03-27 23:16:17 +00:00
John-Mark Gurney	7819da7944	fix a bug where bpf would try to wakeup before updating the state.. This was causing kqueue not to see the correct state and not wake up a process that is waiting... Submitted by: nCircle Network Security, Inc.	2005-03-02 21:59:39 +00:00
Gleb Smirnoff	31199c8463	Use NET_CALLOUT_MPSAFE macro.	2005-03-01 12:01:17 +00:00
Robert Watson	a8e93fb7ec	In bpf_setf(), protect against races between multiple user threads attempting to change the BPF filter on a BPF descriptor at the same time: retrieve the old filter pointer under the same locked region as setting the new pointer. MFC after: 3 days	2005-02-28 14:04:09 +00:00
Robert Watson	d1a67300e2	Update a comment describing bpf_iflist to indicate that the BPF interface structures correspond to specific link layers, so the same network interface may appear more than once. MFC after: 3 days	2005-02-28 12:35:52 +00:00
Warner Losh	c398230b64	/* -> /*- for license, minor formatting changes	2005-01-07 01:45:51 +00:00
Pawel Jakub Dawidek	77fc70c1ef	Fix mbuf leak. Submitted by: Johnny Eriksson <bygg@cafax.se> MFC after: 5 days	2004-12-27 15:53:44 +00:00
Poul-Henning Kamp	e76eee5562	Include fcntl.h Check O_NONBLOCK instead of IO_NDELAY Include uio.h Don't include vnode.h Don't include filedesc.h	2004-12-22 17:37:57 +00:00
John-Mark Gurney	86c9a45388	don't try to recurse on the bpf lock.. kqueue already locks the bpf lock now... Submitted by: Ed Maste of Sandvine Inc. MFC after: 1 week	2004-12-17 03:21:46 +00:00
Sam Leffler	3518d22073	Don't require a device to be marked up when issuing BIOCSETIF.	2004-12-08 05:40:02 +00:00
Brian Feldman	93daabdd83	Don't recurse the BPF descriptor lock during the BIOCSDLT operation (and panic). To try to finish making BPF safe, at the very least, the BPF descriptor lock really needs to change into a reader/writer lock that controls access to "settings," and a mutex that controls access to the selinfo/knote/callout. Also, use of callout_drain() instead of callout_stop() (which is really a much more widespread issue).	2004-10-06 04:25:37 +00:00
Robert Watson	46448b5a1b	Reformulate bpf_detachd() to acquire the BIF_LOCK() as well as BPFD_LOCK() when removing a descriptor from an interface descriptor list. Hold both over the operation, and do a better job at maintaining the invariant that you can't find partially connected descriptors on an active interface descriptor list. This appears to close a race that resulted in the kernel performing a NULL pointer dereference when BPF sessions are detached during heavy network activity on SMP systems. RELENG_5 candidate.	2004-09-09 04:11:12 +00:00
Robert Watson	4a3feeaa86	Reformulate use of linked lists in 'struct bpf_d' and 'struct bpf_if' to use queue(3) list macros rather than hand-crafted lists. While here, move to doubly linked lists to eliminate iterating lists in order to remove entries. This change simplifies and clarifies the list logic in the BPF descriptor code as a first step towards revising the locking strategy. RELENG_5 candidate. Reviewed by: fenner	2004-09-09 00:19:27 +00:00
Robert Watson	d17d818425	Compare/set pointers using NULL not 0.	2004-09-09 00:11:50 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Robert Watson	46691dd8d7	Do a lockless read of the BPF interface structure descriptor list head before grabbing BPF locks to see if there are any entries in order to avoid the cost of locking if there aren't any. Avoids a mutex lock/ unlock for each packet received if there are no BPF listeners.	2004-08-05 02:37:36 +00:00
Robert Watson	572bde2aea	Prefer NULL to '0' when checking a pointer value.	2004-07-24 16:58:56 +00:00
Robert Watson	28b8605232	In the BPF and ethernet bridging code, don't allow callouts to execute without Giant if we're not debug.mpsafenet=1.	2004-07-05 16:28:31 +00:00
Poul-Henning Kamp	f3732fd15b	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
Poul-Henning Kamp	89c9c53da0	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
Robert Watson	b8f9429d55	Switch to conditionally acquiring and dropping Giant around calls into ifp->if_output() based on debug.mpsafenet. That way once bpfwrite() can be called without Giant, it will acquire Giant (if desired) before entering the network stack.	2004-06-11 03:47:21 +00:00
Robert Watson	8240bf1e04	Un-staticize 'dst' sockaddr in the stack of bpfwrite() to prevent the need to synchronize access to the structure. I believe this should fit into the stack under the necessary circumstances, but if not we can either add synchronization or use a thread-local malloc for the duration.	2004-06-11 03:45:42 +00:00
Warner Losh	f36cfd49ad	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 20:46:16 +00:00
Robert Watson	f747d2dd90	Grab Giant after MAC processing on outgoing packets being sent via BPF. Grab the BPF descriptor lock before entering MAC since the MAC Framework references BPF descriptor fields, including the BPF descriptor label. Submitted by: sam	2004-02-29 15:32:33 +00:00
Poul-Henning Kamp	dc08ffec87	Device megapatch 4/6: Introduce d_version field in struct cdevsw, this must always be initialized to D_VERSION. Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.	2004-02-21 21:10:55 +00:00
Poul-Henning Kamp	c9c7976f7f	Device megapatch 1/6: Free approx 86 major numbers with a mostly automatically generated patch. A number of strategic drivers have been left behind by caution, and a few because they still (ab)use their major number.	2004-02-21 19:42:58 +00:00
Dag-Erling Smørgrav	9e6108885c	Random style fixes and a comment update. No functional changes.	2004-02-16 18:19:15 +00:00
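
Note on commit 16d878cc99: the packet path and attach/detach path described there (lockless check of the descriptor list, then take the interface lock, then hand off the packet) can be sketched roughly as below. This is a userland illustration only, not the kernel code: struct peer, struct bpf_if_sketch, peers_present(), peer_attach(), peer_detach() and tap() are hypothetical names, and a pthread mutex stands in for the bpf interface lock. Real concurrent code would also need an atomic load for the unlocked read.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userland stand-ins for the kernel structures. */
struct peer {
	struct peer *next;
	int          delivered;	/* packets handed to this listener */
};

struct bpf_if_sketch {
	struct peer    *peers;	/* head of the descriptor list */
	pthread_mutex_t lock;	/* stands in for the bpf interface lock */
};

/*
 * Lockless read of the list head. The result is only a hint; the list
 * is re-read under the lock before it is walked. (Assumption: a real
 * implementation would use an atomic load here.)
 */
static inline int
peers_present(const struct bpf_if_sketch *bif)
{
	return (bif->peers != NULL);
}

/* Attach side: take the interface lock, then add to the list. */
static void
peer_attach(struct bpf_if_sketch *bif, struct peer *p)
{
	assert(p != NULL);
	pthread_mutex_lock(&bif->lock);
	p->next = bif->peers;
	bif->peers = p;
	pthread_mutex_unlock(&bif->lock);
}

/* Detach side: take the interface lock, then unlink from the list. */
static void
peer_detach(struct bpf_if_sketch *bif, struct peer *p)
{
	struct peer **pp;

	pthread_mutex_lock(&bif->lock);
	for (pp = &bif->peers; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == p) {
			*pp = p->next;
			p->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&bif->lock);
}

/*
 * Packet side: skip the lock entirely when nobody is listening,
 * otherwise lock and deliver. Returns the number of listeners reached.
 */
static int
tap(struct bpf_if_sketch *bif)
{
	struct peer *p;
	int n = 0;

	if (!peers_present(bif))
		return (0);
	pthread_mutex_lock(&bif->lock);
	for (p = bif->peers; p != NULL; p = p->next) {
		p->delivered++;
		n++;
	}
	pthread_mutex_unlock(&bif->lock);
	return (n);
}
```

Because the interface structure itself is never freed or set to NULL in this sketch, a detach that races with tap() can at worst make the locked walk find an empty list, which mirrors the reasoning given in the commit message.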

170 Commits