freebsd-skq

Author	SHA1	Message	Date
jkim	14f08fd627	Implement flexible BPF timestamping framework. - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses. - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now. - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format as binuptime(9) and getbinuptime(9) do. Reviewed by: net@	2010-06-15 19:28:44 +00:00
bz	7fcd42cfee	MFP4: @177254 Add missing CURVNET_RESTORE() calls for multiple code paths, to stop leaking the currently cached vnet into callers and to the process. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-04-27 15:16:54 +00:00
kib	2861e20e63	Provide compat32 shims for bpf(4), except zero-copy facilities. bd_compat32 field of struct bpf_d is kept unconditionally to not impose the requirement of including "opt_compat.h" on all numerous users of bpfdesc.h. Submitted by: jhb (version for 6.x) Reviewed and tested by: emaste MFC after: 2 weeks	2010-04-25 16:43:41 +00:00
jkim	2b1849a3fc	Check the pointer to JIT binary filter before its de-allocation. Submitted by: Alexander Sack (asack at niksun dot com) MFC after: 3 days	2010-03-29 20:24:03 +00:00
jkim	5e802872af	Fix a style(9) nit.	2010-03-12 19:42:42 +00:00
jkim	df5e72589a	Tidy up callout for select(2) and read timeout. - Add a missing callout_drain(9) before the descriptor deallocation.[1] - Prefer callout_init_mtx(9) over callout_init(9) and let the callout subsystem handle the mutex for callout function. PR: kern/144453 Submitted by: Alexander Sack (asack at niksun dot com)[1] MFC after: 1 week	2010-03-12 19:14:58 +00:00
jkim	30767ce70b	Return partially filled buffer for non-blocking read(2) in non-immediate mode. PR: kern/143855	2010-02-20 00:19:21 +00:00
rwatson	07e1c4bd14	Remove unneeded blank line from bpf_drvinit(). MFC after: 3 days	2009-10-23 17:26:29 +00:00
rwatson	53eaed07bc	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks	2009-09-12 20:03:45 +00:00
jkim	cda13d6d86	Always embed pointer to BPF JIT function in BPF descriptor to avoid inconsistency when opt_bpf.h is not included. Reviewed by: rwatson Approved by: re (rwatson)	2009-08-12 17:28:53 +00:00
rwatson	fb9ffed650	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
csjp	888867acdc	Implement the -z (zero counters) option for the various bpf counters. Add necessary changes to the kernel for this (basically introduce a bpf_zero_counters() function). As well, update the man page. MFC after: 1 month Discussed with: rwatson	2009-06-19 20:31:44 +00:00
bz	48dc6805f8	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
kib	e1cb2941d4	Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland	2009-06-10 20:59:32 +00:00
rwatson	f4934662e5	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
sam	4578aab134	rev bpf attach/detach event api to include the dlt	2009-05-25 16:34:35 +00:00
sam	f487a64e06	add bpf_track eventhandler for monitoring bpf taps attached/detached Reviewed by: csjp	2009-05-18 17:18:40 +00:00
zec	d78a1b1a82	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
csjp	274c597d1c	Disable zerocopy by default for now. It's causing some problems in pcap consumers which fork after the shared pages have been setup. pflogd(8) is an example. The problem is understood and there is a fix coming in shortly. Folks who want to continue using it can do so by setting net.bpf.zerocopy_enable to 1. Discussed with: rwatson	2009-03-10 14:28:19 +00:00
rwatson	f18c279752	When resetting a BPF descriptor, properly check that zero-copy buffers are not currently owned by userspace before clearing or rotating them. Otherwise we may not play by the rules of the shared memory protocol, potentially corrupting packet data or causing userspace applications that are playing by the rules to spin due to being notified that a buffer is complete but the shared memory header not reflecting that. This behavior was seen with pflogd by a number of reporters; note that this fix is not sufficient to get pflogd properly working with zero-copy BPF, due to pflogd opening the BPF device before forking, leading to the shared memory buffer not being propery inherited in the privilege-separated child. We're still deciding how to fix that problem. This change exposes buffer-model specific strategy information in reset_d(), which will be fixed at a later date once we've decided how best to improve the BPF buffer abstraction. Reviewed by: csjp Reported by: keramida	2009-03-07 22:17:44 +00:00
csjp	d55a784c1d	Mark the bpf stats sysctl as being mpsafe. We do not require Giant here.	2009-03-07 17:07:29 +00:00
csjp	59b707a408	Switch the default buffer mode in bpf(4) to zero-copy buffers. Discussed with: rwatson	2009-03-02 19:42:01 +00:00
zec	95a15f5c84	Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-26 22:32:07 +00:00
des	66f807ed8b	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
jkim	a19823099f	Make bpf_maxinsns visible from ng_bpf.c. Pass me the pointyhat, please.	2008-08-29 20:34:06 +00:00
ed	c392c35035	Change bpf(4) to use the cdevpriv API. Right now the bpf(4) driver uses the cloning API to generate /dev/bpf%u. When an application such as tcpdump needs a BPF, it opens /dev/bpf0, /dev/bpf1, etc. until it opens the first available device node. We used this approach, because our devfs implementation didn't allow per-descriptor data. Now that we can, make it use devfs_get_cdevpriv() to obtain the private data. To remain compatible with the existing implementation, add a symlink from /dev/bpf0 to /dev/bpf. I've already changed libpcap to compile with HAVE_CLONING_BPF, which makes it use /dev/bpf. There may be other applications in the base system (dhclient) that use the loop to obtain a valid bpf. Discussed on: src-committers Approved by: csjp	2008-08-13 15:41:21 +00:00
csjp	471e3f43d5	Annotate why we do not call BPF_CHECK_DIRECTION() in this tapping routine. There is no way for the caller to tell us which direction this packet is going. With the bpf_mtap{2} routines, we can check the interface pointer. MFC after: 2 weeks	2008-08-01 21:38:46 +00:00
jkim	5faf505c39	Allow injecting big packets via bpf(4) up to min(MTU, 16K-byte). MFC after: 1 week	2008-07-14 22:41:48 +00:00
dwmalone	ea74539fbc	Add a new ioctl for changing the read filter (BIOCSETFNR). This is just like BIOCSETF but it doesn't drop all the packets buffered on the discriptor and reset the statistics. Also, when setting the write filter, don't drop packets waiting to be read or reset the statistics. PR: 118486 Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz> MFC after: 1 month	2008-07-07 09:25:49 +00:00
csjp	4f71d026f8	Make sure we are clearing the ZBUF_FLAG_IMMUTABLE any time a free buffer is reclaimed by the kernel. This fixes a bug resulted in the kernel over writing packet data while user-space was still processing it when zerocopy is enabled. (Or a panic if invariants was enabled). Discussed with: rwatson	2008-07-05 20:11:28 +00:00
jhb	7f51c7d8b8	Set D_TRACKCLOSE to avoid a race in devfs that could lead to orphaned bpf devices never getting fully closed. MFC after: 3 days	2008-05-09 19:29:08 +00:00
jkim	2723f83a5a	Check packet directions more properly instead of just checking received interface is null. PR: kern/123138 Submitted by: Dmitry (hanabana at mail dot ru) MFC after: 1 week	2008-04-28 19:42:11 +00:00
jkim	1759259742	Revert the previous commit and use M_PROMISC flag instead. It is safer because it will never be used for outgoing packets.	2008-04-15 17:08:24 +00:00
jkim	477235af67	Remove M_SKIP_FIREWALL abuse and add more appropriate check. Pointyhat to: jkim Reported by: Eugene Grosbein (eugen at kuzbass dot ru) MFC after: 3 days	2008-04-15 00:50:01 +00:00
rwatson	6d3db5778b	Maintain and observe a ZBUF_FLAG_IMMUTABLE flag on zero-copy BPF buffer kernel descriptors, which is used to allow the buffer currently in the BPF "store" position to be assigned to userspace when it fills, even if userspace hasn't acknowledged the buffer in the "hold" position yet. To implement this, notify the buffer model when a buffer becomes full, and check that the store buffer is writable, not just for it being full, before trying to append new packet data. Shared memory buffers will be assigned to userspace at most once per fill, be it in the store or in the hold position. This removes the restriction that at most one shared memory can by owned by userspace, reducing the chances that userspace will need to call select() after acknowledging one buffer in order to wait for the next buffer when under high load. This more fully realizes the goal of zero system calls in order to process a high-speed packet stream from BPF. Update bpf.4 to reflect that both buffers may be owned by userspace at once; caution against assuming this.	2008-04-07 02:51:00 +00:00
ru	3b1bf8c2e9	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.	2008-03-25 09:39:02 +00:00
rwatson	59900e5206	Check for a NULL free buffer pointer in BPF before invoking bpf_canfreebuf() in order to avoid potentially calling a non-inlinable but trivial function in zero-copy buffer mode for every packet received when we couldn't free the buffer anyway. MFC after: 4 months	2008-03-25 07:41:33 +00:00
csjp	310e3f93dd	Introduce support for zero-copy BPF buffering, which reduces the overhead of packet capture by allowing a user process to directly "loan" buffer memory to the kernel rather than using read(2) to explicitly copy data from kernel address space. The user process will issue new BPF ioctls to set the shared memory buffer mode and provide pointers to buffers and their size. The kernel then wires and maps the pages into kernel address space using sf_buf(9), which on supporting architectures will use the direct map region. The current "buffered" access mode remains the default, and support for zero-copy buffers must, for the time being, be explicitly enabled using a sysctl for the kernel to accept requests to use it. The kernel and user process synchronize use of the buffers with atomic operations, avoiding the need for system calls under load; the user process may use select()/poll()/kqueue() to manage blocking while waiting for network data if the user process is able to consume data faster than the kernel generates it. Patchs to libpcap are available to allow libpcap applications to transparently take advantage of this support. Detailed information on the new API may be found in bpf(4), including specific atomic operations and memory barriers required to synchronize buffer use safely. These changes modify the base BPF implementation to (roughly) abstrac the current buffer model, allowing the new shared memory model to be added, and add new monitoring statistics for netstat to print. The implementation, with the exception of some monitoring hanges that break the netstat monitoring ABI for BPF, will be MFC'd. Zerocopy bpf buffers are still considered experimental are disabled by default. To experiment with this new facility, adjust the net.bpf.zerocopy_enable sysctl variable to 1. Changes to libpcap will be made available as a patch for the time being, and further refinements to the implementation are expected. Sponsored by: Seccuris Inc. In collaboration with: rwatson Tested by: pwood, gallatin MFC after: 4 months [1] [1] Certain portions will probably not be MFCed, specifically things that can break the monitoring ABI.	2008-03-24 13:49:17 +00:00
rwatson	877d7c65ba	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
rwatson	b3df56a7f6	Add comment that bpfread() has multi-threading issues. Fix minor white space nit.	2008-02-02 20:35:05 +00:00
rwatson	940f7f2284	Use __FBSDID() in the kernel BPF implementation. MFC after: 3 days	2007-12-25 13:24:02 +00:00
rwatson	d2a56e1c39	Remove trailing whitespace from lines in BPF. MFC after: 3 days	2007-12-23 14:10:33 +00:00
rwatson	60570a92bf	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
csjp	e5337493b3	Make sure that we refresh the PID on read(2) and write(2) operations. This fixes the process portion of the bpf(4) stats if the peer forks into the background after it's opened the descriptor. This bug results in the following behavior for netstat -B: # netstat -B Pid Netif Flags Recv Drop Match Sblen Hblen Command netstat: kern.proc.pid failed: No such process 78023 em0 p--s-- 2237404 43119 2237404 13986 0 ?????? MFC after: 1 week	2007-10-12 14:58:34 +00:00
thompsa	16614f83a0	Check for multicast destination on bpf injected packets and update the M_*CAST flags, the absense of these flags causes problems in other areas such as bridging which expect them to be correct. At the moment only Ethernet DLTs are checked. Reviewed by: bms, csjp, sam Approved by: re (bmah)	2007-09-10 00:03:06 +00:00
rwatson	23574c8673	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
rwatson	a62dbe240a	Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove definition of NET_CALLOUT_MPSAFE, which is no longer required now that debug.mpsafenet has been removed. The once over: bz Approved by: re (kensmith)	2007-07-28 07:31:30 +00:00
csjp	6711a44482	Silence some gcc 4 warnings. It is expected that the bpf_movein() routine will intialize the the header length and re-initialize the mbuf pointer to reference the mbuf that is allocated after moving user supplied packet data in.	2007-06-17 21:51:43 +00:00
csjp	bc108718c3	- Conditionally pickup Giant around the network interface ioctl routines if we are running with !mpsafenet - Change un-conditional Giant acquisition around ifpromisc to occur only if we are running with !mpsafenet With these locking bits in place, we can now remove the Giant requirement from BPF, so drop the D_NEEDGIANT device flag. This change removes Giant acquisitions around BPF device handlers (read, write, ioctl etc). MFC after: 1 month Discussed with: rwatson	2007-06-15 02:53:51 +00:00
jkim	2bd7382fdc	Add three new ioctl(2) commands for bpf(4). - BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining whether incoming, outgoing, or all packets on the interface should be returned by BPF. Set to BPF_D_IN to see only incoming packets on the interface. Set to BPF_D_INOUT to see packets originating locally and remotely on the interface. Set to BPF_D_OUT to see only outgoing packets on the interface. This setting is initialized to BPF_D_INOUT by default. BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but kept for backward compatibility. - BIOCFEEDBACK sets packet feedback mode. This allows injected packets to be fed back as input to the interface when output via the interface is successful. When BPF_D_INOUT direction is set, injected outgoing packet is not returned by BPF to avoid duplication. This flag is initialized to zero by default. Note that libpcap has been modified to support BPF_D_OUT direction for pcap_setdirection(3) and PCAP_D_OUT direction is functional now. Reviewed by: rwatson	2007-02-26 22:24:14 +00:00

1 2 3 4 5

225 Commits