Commit Graph

289 Commits

Author SHA1 Message Date
Alexander V. Chernikov
afa85850e7 Fix old panic when BPF consumer attaches to destroying interface.
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'

Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.

Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)

Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:17:29 +00:00
Alexander V. Chernikov
6c74ff0ea6 Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@)
Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@)
Protect most of bpf_setf() by BPF global lock

Add several forgotten assertions (thanks to adrian@)

Document current locking model inside bpf.c
Document EVENTHANDLER(9) usage inside BPF.

Approved by:       kib(mentor)
Tested by:         gnn
MFC in:            4 weeks
2012-05-21 22:13:48 +00:00
Alexander V. Chernikov
9431cc1696 Fix build broken by r233938.
Pointed by:     David Wolfskill <david@catwhisker.org>
Approved by:    kib (mentor)
Pointy hat to:  melifaro
2012-04-06 13:34:19 +00:00
Alexander V. Chernikov
51ec1eb70d - Improve performace for writer-only BPF users.
Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send
raw ethernet frames. The only FreeBSD interface that can be used to send raw frames
is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses
BPF only to send data. This leads us to the situation when software like cdpd,
being run on high-traffic-volume interface significantly reduces overall performance
since we have to acquire additional locks for every packet.

Here we add sysctl that changes BPF behavior in the following way:
If program came and opens BPF socket without explicitly specifyin read filter we
assume it to be write-only and add it to special writer-only per-interface list.
This makes bpf_peers_present() return 0, so no additional overhead is introduced.
After filter is supplied, descriptor is added to original per-interface list permitting
packets to be captured.

Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of
setting snap length.

Fortunately, most programs explicitly sets (event catch-all) filter after that.
tcpdump(1) is a good example.

So a bit hackis approach is taken: we upgrade description only after second
BIOCSETF is received.

Sysctl is named net.bpf.optimize_writers and is turned off by default.

- While here, document all sysctl variables in bpf.4

Sponsored by Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      4 weeks
2012-04-06 06:55:21 +00:00
Alexander V. Chernikov
e4b3229aa5 - Improve BPF locking model.
Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greately improves performance: in most common case we need to acquire 1
reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is is temporary solution,
struct bpf_if should be made opaque for any external caller.

Found by:       Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by:   Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      3 weeks
2012-04-06 06:53:58 +00:00
Juli Mallett
9624d94701 o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
using the o32 ABI.  This mostly follows nwhitehorn's lead in implementing
   COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
   32-bit ABI being used.  Since the MIPS port is relatively-new, even the 32-bit
   ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
   with 32-bit compatibility, then, disable some code which assumes otherwise
   wrongly when built for MIPS.  A more general macro to check in this case would
   seem like a good idea eventually.  If someone adds support for using n32
   userland with n64 kernels on MIPS, then they will have to add a variety of
   flags related to each piece of the ABI that can vary.  That's probably the
   right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
   freebsd32 compat code.  Probably this should be generalized at some point.

Reviewed by:	gonzo
2012-03-03 08:19:18 +00:00
Lawrence Stewart
9a7e6bac47 Consumers of bpfdetach() expect it to remove all bpf_if structs from the
bpf_iflist list which reference the specified ifnet. The existing implementation
only removes the first matching bpf_if found in the list, effectively leaking
list entries if an ifnet has been bpfattach()ed multiple times with different
DLTs.

Fix the leak by performing the detach logic in a loop, stopping when all bpf_if
structs referencing the specified ifnet have been detached and removed from the
bpf_iflist list.

Whilst here, also:

- Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never
  exist in the list with a NULL ifnet pointer.

- Except when INVARIANTS is in the kernel config, silently ignore the case where
  no bpf_if referencing the specified ifnet is found, as it is harmless and does
  not require user attention.

Reviewed by:	csjp
MFC after:	1 week
2012-01-10 00:48:29 +00:00
Lawrence Stewart
253a3814d4 Revert r228986 until it can be reworked to avoid panicing the kernel when the
same interface is attached multiple times with different DLTs, as is done in
net80211 for example.

Reported by:	adrian
2011-12-31 07:21:28 +00:00
Lawrence Stewart
0f89fc22f3 - Introduce the net.bpf.tscfg sysctl tree and associated code so as to make one
aspect of time stamp configuration per interface rather than per BPF
  descriptor. Prior to this, the order in which BPF devices were opened and the
  per descriptor time stamp configuration settings could cause non-deterministic
  and unintended behaviour with respect to time stamping. With the new scheme, a
  BPF attached interface's tscfg sysctl entry can be set to "default", "none",
  "fast", "normal" or "external". Setting "default" means use the system default
  option (set with the net.bpf.tscfg.default sysctl), "none" means do not
  generate time stamps for tapped packets, "fast" means generate time stamps for
  tapped packets using a hz granularity system clock read, "normal" means
  generate time stamps for tapped packets using a full timecounter granularity
  system clock read and "external" (currently unimplemented) means use the time
  stamp provided with the packet from an underlying source.

- Utilise the recently introduced sysclock_getsnapshot() and
  sysclock_snap2bintime() KPIs to ensure the system clock is only read once per
  packet, regardless of the number of BPF descriptors and time stamp formats
  requested. Use the per BPF attached interface time stamp configuration to
  control if sysclock_getsnapshot() is called and whether the system clock read
  is fast or normal. The per BPF descriptor time stamp configuration is then
  used to control how the system clock snapshot is converted to a bintime by
  sysclock_snap2bintime().

- Remove all FAST related BPF descriptor flag variants. Performing a "fast"
  read of the system clock is now controlled per BPF attached interface using
  the net.bpf.tscfg sysctl tree.

- Update the bpf.4 man page.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

In collaboration with:	Julien Ridoux (jridoux at unimelb edu au)
2011-12-30 08:57:58 +00:00
Lawrence Stewart
3e47c78798 Revert r227778 in preparation for committing reworked patches in its place. 2011-11-29 12:55:26 +00:00
Lawrence Stewart
b6f1c7db32 - When feed-forward clock support is compiled in, change the BPF header to
contain both a regular timestamp obtained from the system clock and the
  current feed-forward ffcounter value. This enables new possibilities including
  comparison of timekeeping performance and timestamp correction during post
  processing.

- Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping
  packets using the feedback or feed-forward system clock.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-21 04:17:24 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Attilio Rao
6aba400a70 Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks
2011-08-25 15:51:54 +00:00
Jung-uk Kim
d0d7bcdf92 Fix a typo in a comment.
Submitted by:	afiveg
2010-09-16 18:37:33 +00:00
Jung-uk Kim
547d94bde3 Implement flexible BPF timestamping framework.
- Allow setting format, resolution and accuracy of BPF time stamps per
listener.  Previously, we were only able to use microtime(9).  Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command.  Document all supported options in bpf(4) and their uses.

- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts.  bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp.  The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries.  On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long.  On 32-bit platforms, the size may
increase by 8 bytes.  For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested.  However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.

- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver.  Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.

Reviewed by:	net@
2010-06-15 19:28:44 +00:00
Bjoern A. Zeeb
1b610a749e MFP4: @177254
Add missing CURVNET_RESTORE() calls for multiple code paths, to stop
leaking the currently cached vnet into callers and to the process.

Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	4 days
2010-04-27 15:16:54 +00:00
Konstantin Belousov
fc0a61a401 Provide compat32 shims for bpf(4), except zero-copy facilities.
bd_compat32 field of struct bpf_d is kept unconditionally to not
impose the requirement of including "opt_compat.h" on all numerous
users of bpfdesc.h.

Submitted by:	jhb (version for 6.x)
Reviewed and tested by:	emaste
MFC after:	2 weeks
2010-04-25 16:43:41 +00:00
Jung-uk Kim
704858479c Check the pointer to JIT binary filter before its de-allocation.
Submitted by:	Alexander Sack (asack at niksun dot com)
MFC after:	3 days
2010-03-29 20:24:03 +00:00
Jung-uk Kim
5d7af3a1cc Fix a style(9) nit. 2010-03-12 19:42:42 +00:00
Jung-uk Kim
9fee1bd1d8 Tidy up callout for select(2) and read timeout.
- Add a missing callout_drain(9) before the descriptor deallocation.[1]
- Prefer callout_init_mtx(9) over callout_init(9) and let the callout
subsystem handle the mutex for callout function.

PR:		kern/144453
Submitted by:	Alexander Sack (asack at niksun dot com)[1]
MFC after:	1 week
2010-03-12 19:14:58 +00:00
Jung-uk Kim
8df67d77ed Return partially filled buffer for non-blocking read(2)
in non-immediate mode.

PR:		kern/143855
2010-02-20 00:19:21 +00:00
Robert Watson
974e99b008 Remove unneeded blank line from bpf_drvinit().
MFC after:	3 days
2009-10-23 17:26:29 +00:00
Robert Watson
e76d823b81 Use C99 initialization for struct filterops.
Obtained from:	Mac OS X
Sponsored by:	Apple Inc.
MFC after:	3 weeks
2009-09-12 20:03:45 +00:00
Jung-uk Kim
a36599cce7 Always embed pointer to BPF JIT function in BPF descriptor
to avoid inconsistency when opt_bpf.h is not included.

Reviewed by:	rwatson
Approved by:	re (rwatson)
2009-08-12 17:28:53 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Christian S.J. Peron
0e37f3e196 Implement the -z (zero counters) option for the various bpf counters.
Add necessary changes to the kernel for this (basically introduce a
bpf_zero_counters() function).  As well, update the man page.

MFC after:	1 month
Discussed with:	rwatson
2009-06-19 20:31:44 +00:00
Bjoern A. Zeeb
ebd8672cc3 Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.
2009-06-17 15:01:01 +00:00
Konstantin Belousov
d8b0556c6d Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by:	jhb [1]
Noted by:	jhb [2]
Reviewed by:	jhb
Tested by:	rnoland
2009-06-10 20:59:32 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Sam Leffler
5ce8d9708c rev bpf attach/detach event api to include the dlt 2009-05-25 16:34:35 +00:00
Sam Leffler
b743c31009 add bpf_track eventhandler for monitoring bpf taps attached/detached
Reviewed by:	csjp
2009-05-18 17:18:40 +00:00
Marko Zec
21ca7b57bd Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
Christian S.J. Peron
ffeeb9241a Disable zerocopy by default for now. It's causing some problems in pcap
consumers which fork after the shared pages have been setup.  pflogd(8)
is an example.  The problem is understood and there is a fix coming in
shortly.

Folks who want to continue using it can do so by setting

net.bpf.zerocopy_enable

to 1.

Discussed with:	rwatson
2009-03-10 14:28:19 +00:00
Robert Watson
e82669d99b When resetting a BPF descriptor, properly check that zero-copy buffers
are not currently owned by userspace before clearing or rotating them.

Otherwise we may not play by the rules of the shared memory protocol,
potentially corrupting packet data or causing userspace applications
that are playing by the rules to spin due to being notified that a
buffer is complete but the shared memory header not reflecting that.

This behavior was seen with pflogd by a number of reporters; note that
this fix is not sufficient to get pflogd properly working with
zero-copy BPF, due to pflogd opening the BPF device before forking,
leading to the shared memory buffer not being propery inherited in the
privilege-separated child.  We're still deciding how to fix that
problem.

This change exposes buffer-model specific strategy information in
reset_d(), which will be fixed at a later date once we've decided how
best to improve the BPF buffer abstraction.

Reviewed by:	csjp
Reported by:	keramida
2009-03-07 22:17:44 +00:00
Christian S.J. Peron
927094113e Mark the bpf stats sysctl as being mpsafe. We do not require
Giant here.
2009-03-07 17:07:29 +00:00
Christian S.J. Peron
f32b457b6e Switch the default buffer mode in bpf(4) to zero-copy buffers.
Discussed with:	rwatson
2009-03-02 19:42:01 +00:00
Marko Zec
97021c2464 Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by:  bz, julian
Approved by:  julian (mentor)
Obtained from:        //depot/projects/vimage-commit2/...
X-MFC after:  never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
2008-11-26 22:32:07 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Jung-uk Kim
12dc95824b Make bpf_maxinsns visible from ng_bpf.c.
Pass me the pointyhat, please.
2008-08-29 20:34:06 +00:00
Ed Schouten
136600fe59 Change bpf(4) to use the cdevpriv API.
Right now the bpf(4) driver uses the cloning API to generate /dev/bpf%u.
When an application such as tcpdump needs a BPF, it opens /dev/bpf0,
/dev/bpf1, etc. until it opens the first available device node. We used
this approach, because our devfs implementation didn't allow
per-descriptor data.

Now that we can, make it use devfs_get_cdevpriv() to obtain the private
data. To remain compatible with the existing implementation, add a
symlink from /dev/bpf0 to /dev/bpf. I've already changed libpcap to
compile with HAVE_CLONING_BPF, which makes it use /dev/bpf. There may be
other applications in the base system (dhclient) that use the loop to
obtain a valid bpf.

Discussed on:	src-committers
Approved by:	csjp
2008-08-13 15:41:21 +00:00
Christian S.J. Peron
a05cf8c6db Annotate why we do not call BPF_CHECK_DIRECTION() in this tapping routine.
There is no way for the caller to tell us which direction this packet is
going.  With the bpf_mtap{2} routines, we can check the interface pointer.

MFC after:	2 weeks
2008-08-01 21:38:46 +00:00
Jung-uk Kim
968c88bc75 Allow injecting big packets via bpf(4) up to min(MTU, 16K-byte).
MFC after:	1 week
2008-07-14 22:41:48 +00:00
David Malone
f11c35082b Add a new ioctl for changing the read filter (BIOCSETFNR). This is
just like BIOCSETF but it doesn't drop all the packets buffered on
the discriptor and reset the statistics.

Also, when setting the write filter, don't drop packets waiting to
be read or reset the statistics.

PR:		118486
Submitted by:	Matthew Luckie <mluckie@cs.waikato.ac.nz>
MFC after:	1 month
2008-07-07 09:25:49 +00:00
Christian S.J. Peron
29f612ec71 Make sure we are clearing the ZBUF_FLAG_IMMUTABLE any time a free buffer
is reclaimed by the kernel.  This fixes a bug resulted in the kernel
over writing packet data while user-space was still processing it when
zerocopy is enabled.  (Or a panic if invariants was enabled).

Discussed with:	rwatson
2008-07-05 20:11:28 +00:00
John Baldwin
7fb547c7f5 Set D_TRACKCLOSE to avoid a race in devfs that could lead to orphaned bpf
devices never getting fully closed.

MFC after:	3 days
2008-05-09 19:29:08 +00:00
Jung-uk Kim
f81a2a4956 Check packet directions more properly instead of just checking received
interface is null.

PR:		kern/123138
Submitted by:	Dmitry (hanabana at mail dot ru)
MFC after:	1 week
2008-04-28 19:42:11 +00:00
Jung-uk Kim
8cd892f752 Revert the previous commit and use M_PROMISC flag instead.
It is safer because it will never be used for outgoing packets.
2008-04-15 17:08:24 +00:00
Jung-uk Kim
9a3a0f9278 Remove M_SKIP_FIREWALL abuse and add more appropriate check.
Pointyhat to:	jkim
Reported by:	Eugene Grosbein (eugen at kuzbass dot ru)
MFC after:	3 days
2008-04-15 00:50:01 +00:00
Robert Watson
a7a91e6592 Maintain and observe a ZBUF_FLAG_IMMUTABLE flag on zero-copy BPF
buffer kernel descriptors, which is used to allow the buffer
currently in the BPF "store" position to be assigned to userspace
when it fills, even if userspace hasn't acknowledged the buffer
in the "hold" position yet.  To implement this, notify the buffer
model when a buffer becomes full, and check that the store buffer
is writable, not just for it being full, before trying to append
new packet data.  Shared memory buffers will be assigned to
userspace at most once per fill, be it in the store or in the
hold position.

This removes the restriction that at most one shared memory can
by owned by userspace, reducing the chances that userspace will
need to call select() after acknowledging one buffer in order to
wait for the next buffer when under high load.  This more fully
realizes the goal of zero system calls in order to process a
high-speed packet stream from BPF.

Update bpf.4 to reflect that both buffers may be owned by userspace
at once; caution against assuming this.
2008-04-07 02:51:00 +00:00
Ruslan Ermilov
ea26d58729 Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by:	arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
2008-03-25 09:39:02 +00:00
Robert Watson
fa0c2b3474 Check for a NULL free buffer pointer in BPF before invoking
bpf_canfreebuf() in order to avoid potentially calling a non-inlinable
but trivial function in zero-copy buffer mode for every packet
received when we couldn't free the buffer anyway.

MFC after:	4 months
2008-03-25 07:41:33 +00:00
Christian S.J. Peron
4d621040ff Introduce support for zero-copy BPF buffering, which reduces the
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.

The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.

The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patchs to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.

These changes modify the base BPF implementation to (roughly) abstrac
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring hanges that break
the netstat monitoring ABI for BPF, will be MFC'd.

Zerocopy bpf buffers are still considered experimental are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.

Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.

Sponsored by:		Seccuris Inc.
In collaboration with:	rwatson
Tested by:		pwood, gallatin
MFC after:		4 months [1]

[1] Certain portions will probably not be MFCed, specifically things
    that can break the monitoring ABI.
2008-03-24 13:49:17 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Robert Watson
31b32e6dc3 Add comment that bpfread() has multi-threading issues.
Fix minor white space nit.
2008-02-02 20:35:05 +00:00
Robert Watson
c786600793 Use __FBSDID() in the kernel BPF implementation.
MFC after:	3 days
2007-12-25 13:24:02 +00:00
Robert Watson
2a0a392e1c Remove trailing whitespace from lines in BPF.
MFC after:	3 days
2007-12-23 14:10:33 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Christian S.J. Peron
50ed6e0713 Make sure that we refresh the PID on read(2) and write(2) operations.
This fixes the process portion of the bpf(4) stats if the peer forks
into the background after it's opened the descriptor.  This bug
results in the following behavior for netstat -B:

# netstat -B
  Pid  Netif  Flags      Recv      Drop     Match Sblen Hblen Command
netstat: kern.proc.pid failed: No such process
78023    em0 p--s--   2237404     43119   2237404 13986     0 ??????

MFC after:	1 week
2007-10-12 14:58:34 +00:00
Andrew Thompson
cb44b6dfe8 Check for multicast destination on bpf injected packets and update the M_*CAST
flags, the absense of these flags causes problems in other areas such as
bridging which expect them to be correct.

At the moment only Ethernet DLTs are checked.

Reviewed by:	bms, csjp, sam
Approved by:	re (bmah)
2007-09-10 00:03:06 +00:00
Robert Watson
0bf686c125 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
Robert Watson
c6b2899785 Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove
definition of NET_CALLOUT_MPSAFE, which is no longer required now that
debug.mpsafenet has been removed.

The once over:	bz
Approved by:	re (kensmith)
2007-07-28 07:31:30 +00:00
Christian S.J. Peron
d83e603ac7 Silence some gcc 4 warnings. It is expected that the bpf_movein() routine
will intialize the the header length and re-initialize the mbuf pointer
to reference the mbuf that is allocated after moving user supplied packet
data in.
2007-06-17 21:51:43 +00:00
Christian S.J. Peron
5632c9822a - Conditionally pickup Giant around the network interface
ioctl routines if we are running with !mpsafenet
- Change un-conditional Giant acquisition around ifpromisc
  to occur only if we are running with !mpsafenet

With these locking bits in place, we can now remove the Giant
requirement from BPF, so drop the D_NEEDGIANT device flag.
This change removes Giant acquisitions around BPF device
handlers (read, write, ioctl etc).

MFC after:	1 month
Discussed with:	rwatson
2007-06-15 02:53:51 +00:00
Jung-uk Kim
560a54e10c Add three new ioctl(2) commands for bpf(4).
- BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining
whether incoming, outgoing, or all packets on the interface should be
returned by BPF.  Set to BPF_D_IN to see only incoming packets on the
interface.  Set to BPF_D_INOUT to see packets originating locally and
remotely on the interface.  Set to BPF_D_OUT to see only outgoing
packets on the interface.  This setting is initialized to BPF_D_INOUT
by default.  BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but
kept for backward compatibility.

- BIOCFEEDBACK sets packet feedback mode.  This allows injected packets
to be fed back as input to the interface when output via the interface is
successful.  When BPF_D_INOUT direction is set, injected outgoing packet
is not returned by BPF to avoid duplication.  This flag is initialized to
zero by default.

Note that libpcap has been modified to support BPF_D_OUT direction for
pcap_setdirection(3) and PCAP_D_OUT direction is functional now.

Reviewed by:	rwatson
2007-02-26 22:24:14 +00:00
Robert Watson
5d1f828354 Remove slightly dubious comment; add descriptive strings for several
sysctls.

MFC after:	3 days
2007-01-28 16:38:44 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Robert Watson
a359443290 Since bpf_allocbufs() uses malloc() with M_WAITOK, don't check return
values for NULL or return an error state.  Assert that all three bpf
buffer pointers are NULL before starting.

MFC after:	1 week
2006-08-09 16:30:26 +00:00
Sam Leffler
246b546762 add support for 802.11 packet injection via bpf
Together with:	Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Reviewed by:	arch@
MFC after:	1 month
2006-07-26 03:15:16 +00:00
David Malone
91433904b5 Rather than calling mircotime() in catchpacket(), make catchpacket()
take a timeval indicating when the packet was captured. Move
microtime() to the calling functions and grab the timestamp as soon
as we know that we're going to call catchpacket at least once.

This means that we call microtime() once per matched packet, as
opposed to once per matched packet per bpf listener. It also means
that we return the same timestamp to all bpf listeners, rather than
slightly different ones.

It would be more accurate to call microtime() even earlier for all
packets, as you have to grab (1+#listener) locks before you can
determine if the packet will be logged. You could always grab a
timestamp before the locks, but microtime() can be costly, so this
didn't seem like a good idea.

(I guess most ethernet interfaces will have a bpf listener these
days because of dhclient. That means that we could be doing two bpf
locks on most packets going through the interface.)

PR:		71711
2006-07-24 15:42:04 +00:00
Christian S.J. Peron
4b19419ee7 Adjust descriptor locking to tell the kqueue subsystem that our descriptor is
already locked. The reason to do this is to avoid two lock+unlock operations
in a row. We need the lock here to serialize access to bd_pid for stats
collection purposes.

Drop the locks all together on detach, as they will be picked up by
knlist_remove.

This should fix a failed locking assertion when kqueue is being used with bpf
descriptors.

Discussed with:	jmg
2006-07-03 20:02:06 +00:00
Christian S.J. Peron
19ba8395e1 Since we are doing some bpf(4) clean up, change a couple of function prototypes
to be consistent. Also, ANSI'fy function definitions. There is no functional
change here.
2006-06-15 15:39:12 +00:00
Christian S.J. Peron
7eae78a419 If bpf(4) has not been compiled into the kernel, initialize the bpf interface
pointer to a zeroed, statically allocated bpf_if structure. This way the
LIST_EMPTY() macro will always return true. This allows us to remove the
additional unconditional memory reference for each packet in the fast path.

Discussed with:	sam
2006-06-14 02:23:28 +00:00
Christian S.J. Peron
16d878cc99 Fix the following bpf(4) race condition which can result in a panic:
(1) bpf peer attaches to interface netif0
	(2) Packet is received by netif0
	(3) ifp->if_bpf pointer is checked and handed off to bpf
	(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
	    initialized to NULL.
	(5) ifp->if_bpf is dereferenced by bpf machinery
	(6) Kaboom

This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.

Summary of changes:

- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
  bpf interface structure. Once this is done, ifp->if_bpf should never be
  NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
  a lockless read bpf peer list associated with the interface. It should
  be noted that the bpf code will pickup the bpf_interface lock before adding
  or removing bpf peers. This should serialize the access to the bpf descriptor
  list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
  can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present

Now what happens is:

	(1) Packet is received by netif0
	(2) Check to see if bpf descriptor list is empty
	(3) Pickup the bpf interface lock
	(4) Hand packet off to process

From the attach/detach side:

	(1) Pickup the bpf interface lock
	(2) Add/remove from bpf descriptor list

Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).

[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
    not we have any bpf peers that might be interested in receiving packets.

In collaboration with:	sam@
MFC after:	1 month
2006-06-02 19:59:33 +00:00
Ruslan Ermilov
293c06a186 Fix -Wundef warnings. 2006-05-30 19:24:01 +00:00
Christian S.J. Peron
1fc9e38706 Pickup locks for the BPF interface structure. It's quite possible that
bpf(4) descriptors can be added and removed on this interface while we
are processing stats.

MFC after:	2 weeks
2006-05-07 03:21:43 +00:00
Jung-uk Kim
848c454cc1 Add BPF Just-In-Time compiler support for ng_bpf(4).
The sysctl is changed from net.bpf.jitter.enable to net.bpf_jitter.enable
and this controls both bpf(4) and ng_bpf(4) now.
2005-12-07 21:30:47 +00:00
Jung-uk Kim
ae275efcae Add experimental BPF Just-In-Time compiler for amd64 and i386.
Use the following kernel configuration option to enable:

	options BPF_JITTER

If you want to use bpf_filter() instead (e. g., debugging), do:

	sysctl net.bpf.jitter.enable=0

to turn it off.

Currently BIOCSETWF and bpf_mtap2() are unsupported, and bpf_mtap() is
partially supported because 1) no need, 2) avoid expensive m_copydata(9).

Obtained from:	WinPcap 3.1 (for i386)
2005-12-06 02:58:12 +00:00
Christian S.J. Peron
cb1d4f92ec Protect PID initializations for statistics by the bpf descriptor
locks. Also while we are here, protect the bpf descriptor during
knlist_remove{add} operations.

Discussed with:	rwatson
2005-10-04 15:06:10 +00:00
Andre Oppermann
035ba19027 Undo a tad little optimization to bpf_mtap() introduced in rev. 1.95
which broke the correct handling of the BIOCGSEESENT flag in the bpf
listener.

PR:		kern/56441
Submitted by:	<vys at renet.ru>
MFC after:	3 days
2005-09-14 16:37:05 +00:00
Christian S.J. Peron
b75a24a075 Instead of caching the PID which opened the bpf descriptor, continuously
refresh the PID which has the descriptor open. The PID is refreshed in various
operations like ioctl(2), kevent(2) or poll(2). This produces more accurate
information about current bpf consumers. While we are here remove the bd_pcomm
member of the bpf stats structure because now that we have an accurate PID we
can lookup the via the kern.proc.pid sysctl variable. This is the trick that
NetBSD decided to use to deal with this issue.

Special care needs to be taken when MFC'ing this change, as we have made a
change to the bpf stats structure. What will end up happening is we will leave
the pcomm structure but just mark it as being un-used. This way we keep the ABI
in tact.

MFC after:	1 month
Discussed with:	Rui Paulo < rpaulo at NetBSD dot org >
2005-09-05 23:08:04 +00:00
Christian S.J. Peron
93e39f0b93 Introduce two new ioctl(2) commands, BIOCLOCK and BIOCSETWF. These commands
enhance the security of bpf(4) by further relinquishing the privilege of
the bpf(4) consumer (assuming the ioctl commands are being implemented).

Once BIOCLOCK is executed, the device becomes locked which prevents the
execution of ioctl(2) commands which can change the underly parameters of the
bpf(4) device. An example might be the setting of bpf(4) filter programs or
attaching to different network interfaces.

BIOCSETWF can be used to set write filters for outgoing packets. Currently if
a bpf(4) consumer is compromised, the bpf(4) descriptor can essentially be used
as a raw socket, regardless of consumer's UID. Write filters give users the
ability to constrain which packets can be sent through the bpf(4) descriptor.

These features are currently implemented by a couple programs which came from
OpenBSD, such as the new dhclient and pflogd.

-Modify bpf_setf(9) to accept a "cmd" parameter. This will be used to specify
 whether a read or write filter is to be set.
-Add a bpf(4) filter program as a parameter to bpf_movein(9) as we will run the
 filter program on the mbuf data once we move the packet in from user-space.
-Rather than execute two uiomove operations, (one for the link header and the
 other for the packet data), execute one and manually copy the linker header
 into the sockaddr structure via bcopy.
-Restructure bpf_setf to compensate for write filters, as well as read.
-Adjust bpf(4) stats structures to include a bd_locked member.

It should be noted that the FreeBSD and OpenBSD implementations differ a bit in
the sense that we unconditionally enforce the lock, where OpenBSD enforces it
only if the calling credential is not root.

Idea from:	OpenBSD
Reviewed by:	mlaier
2005-08-22 19:35:48 +00:00
Christian S.J. Peron
4ddfb5312a Add missing braces around bpf_filter which were missed when I
merged the bpfstat code.

Pointed out by:	iedowse
Pointy hat to:	csjp
MFC after:	3 days
2005-08-18 22:30:52 +00:00
Robert Watson
6a113b3de7 Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do.  This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.

Requested by:	phk
MFC after:	3 days
2005-08-08 19:55:32 +00:00
Christian S.J. Peron
422a63da6e Rather than hold a mutex over calls to SYSCTL_OUT allocate a
temporary buffer then pass the array to user-space once we have
dropped the lock.

While we are here, drop an assertion which could result in a
kernel panic under certain race conditions.

Pointed out by:	rwatson
2005-07-26 17:21:56 +00:00
Christian S.J. Peron
69f7644bc9 Introduce new sysctl variable: net.bpf.stats. This sysctl variable can
be used to pass statistics regarding dropped, matched and received
packet counts from the kernel to user-space. While we are here
introduce a new counter for filtered or matched packets. We currently
keep track of packets received or dropped by the bpf device, but not
how many packets actually matched the bpf filter.

-Introduce net.bpf.stats sysctl OID
-Move sysctl variables after the function prototypes so we can
 reference bpf_stats_sysctl(9) without build errors.
-Introduce bpf descriptor counter which is used mainly for sizing
 of the xbpf_d array.
-Introduce a xbpf_d structure which will act as an external
 representation of the bpf_d structure.
-Add a the following members to the bpfd structure:

	bd_fcount	- Number of packets which matched bpf filter
	bd_pid		- PID which opened the bpf device
	bd_pcomm	- Process name which opened the device.

It should be noted that it's possible that the process which opened
the device could be long gone at the time of stats collection. An
example might be a process that opens the bpf device forks then exits
leaving the child process with the bpf fd.

Reviewed by:	mdodd
2005-07-24 17:21:17 +00:00
Suleiman Souhlal
571dcd15e2 Fix the recent panics/LORs/hangs created by my kqueue commit by:
- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by:	jmg
Approved by:	re (scottl), scottl (mentor override)
Pointyhat to:	ssouhlal
Will be happy:	everyone
2005-07-01 16:28:32 +00:00
David Malone
01399f34a5 Fix some long standing bugs in writing to the BPF device attached to
a DLT_NULL interface. In particular:

        1) Consistently use type u_int32_t for the header of a
           DLT_NULL device - it continues to represent the address
           family as always.
        2) In the DLT_NULL case get bpf_movein to store the u_int32_t
           in a sockaddr rather than in the mbuf, to be consistent
           with all the DLT types.
        3) Consequently fix a bug in bpf_movein/bpfwrite which
           only permitted packets up to 4 bytes less than the MTU
           to be written.
        4) Fix all DLT_NULL devices to have the code required to
           allow writing to their bpf devices.
        5) Move the code to allow writing to if_lo from if_simloop
           to looutput, because it only applies to DLT_NULL devices
           but was being applied to other devices that use if_simloop
           possibly incorrectly.

PR:		82157
Submitted by:	Matthew Luckie <mjl@luckie.org.nz>
Approved by:	re (scottl)
2005-06-26 18:11:11 +00:00
Brooks Davis
fc74a9f93a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
Christian S.J. Peron
0eb206049e Change the maximum bpf program instruction limitation from being hard-
coded at 512 (BPF_MAXINSNS) to being tunable. This is useful for users
who wish to use complex or large bpf programs when filtering traffic.
For now we will default it to BPF_MAXINSNS. I have tested bpf programs
with well over 21,000 instructions without any problems.

Discussed with:	phk
2005-06-06 22:19:59 +00:00
Christian S.J. Peron
a3272e3ce3 -introduce net.bpf sysctl instead of the less intuitive debug.*
debug.bpf_bufsize is now net.bpf.bufsize
    debug.bpf_maxbufsize is now net.bpf.maxbufsize

-move function prototypes for bpf_drvinit and bpf_clone up to the
 top of the file with the others
-assert bpfd lock in catchpacket() and bpf_wakeup()

MFC after:	2 weeks
2005-05-04 03:09:28 +00:00
Poul-Henning Kamp
f4f6abcb4e Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.
2005-03-31 12:19:44 +00:00
Brian Feldman
4549709fb5 You must selwakeup{,pri}() when closing a selectable object or the
td->td_sel will get trashed and crash the system.  Fix BPF's mistake
in this area.

MFC after:	1 day
2005-03-27 23:16:17 +00:00
John-Mark Gurney
7819da7944 fix a bug where bpf would try to wakeup before updating the state.. This
was causing kqueue not to see the correct state and not wake up a process
that is waiting...

Submitted by:	nCircle Network Security, Inc.
2005-03-02 21:59:39 +00:00
Gleb Smirnoff
31199c8463 Use NET_CALLOUT_MPSAFE macro. 2005-03-01 12:01:17 +00:00
Robert Watson
a8e93fb7ec In bpf_setf(), protect against races between multiple user threads
attempting to change the BPF filter on a BPF descriptor at the same
time: retrieve the old filter pointer under the same locked region
as setting the new pointer.

MFC after:	3 days
2005-02-28 14:04:09 +00:00
Robert Watson
d1a67300e2 Update a comment describing bpf_iflist to indicate that the BPF interface
structures correspond to specific link layers, so the same network
interface may appear more than once.

MFC after:	3 days
2005-02-28 12:35:52 +00:00
Warner Losh
c398230b64 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
Pawel Jakub Dawidek
77fc70c1ef Fix mbuf leak.
Submitted by:	Johnny Eriksson <bygg@cafax.se>
MFC after:	5 days
2004-12-27 15:53:44 +00:00
Poul-Henning Kamp
e76eee5562 Include fcntl.h
Check O_NONBLOCK instead of IO_NDELAY
Include uio.h
Don't include vnode.h
Don't include filedesc.h
2004-12-22 17:37:57 +00:00