freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	ed63043b21	- Utilize m_get2(), accidentially fixing some signedness bugs. - Return EMSGSIZE in both cases if uio_resid is oversized or undersized. - No need to clear rcvif.	2013-01-24 14:29:31 +00:00
Guy Helmer	3b3b91e736	Changes to resolve races in bpfread() and catchpacket() that, at worst, cause kernel panics. Add a flag to the bpf descriptor to indicate whether the hold buffer is in use. In bpfread(), set the "hold buffer in use" flag before dropping the descriptor lock during the call to bpf_uiomove(). Everywhere else the hold buffer is used or changed, wait while the hold buffer is in use by bpfread(). Add a KASSERT in bpfread() after re-acquiring the descriptor lock to assist uncovering any additional hold buffer races.	2012-12-10 16:14:44 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Alexander V. Chernikov	f079a0fa8c	Fix bpf_if structure leak introduced in r235745. Move all such structures to delayed-free lists and delete all matching on interface departure event. MFC after: 1 week	2012-12-02 21:43:37 +00:00
Guy Helmer	0e8a1cb3c9	Work around a race in bpfread() by validating the hold buffer pointer before freeing it. Otherwise, we can lose a buffer and cause a panic in catchpacket().	2012-11-06 21:07:04 +00:00
Alexander V. Chernikov	4fe83b8159	Fix typo introduced in r236559. Pointed by: bcr Approved by: kib(mentor)	2012-06-09 10:04:40 +00:00
Alexander V. Chernikov	784292f89a	Fix panic introduced by r235745. Panic occurs after first packet traverse renamed interface. Add several comments on locking Found by: avg Approved by: ae(mentor) Tested by: avg MFC after: 1 week	2012-06-04 12:36:58 +00:00
Jung-uk Kim	9b7d4a7f2d	Fix style(9) nits, reduce unnecessary type castings, etc., for bpf_setf().	2012-05-29 22:28:46 +00:00
Jung-uk Kim	8b04b48a7d	- Save the previous filter right before we set new one. - Reduce duplicate code and make it little easier to read. MFC after: 2 weeks	2012-05-29 22:21:53 +00:00
Jung-uk Kim	6f731135ac	Fix 32-bit shim for BIOCSETF to drop all packets buffered on the descriptor and reset statistics as it should. MFC after: 3 days	2012-05-29 18:44:53 +00:00
Alexander V. Chernikov	a86227d176	Fix BPF_JITTER code broken by r235746. Pointed by: jkim Reviewed by: jkim (except locking changes) Approved by: (mentor) MFC after: 2 weeks	2012-05-29 12:52:30 +00:00
Alexander V. Chernikov	97aacec622	Make most BPF ioctls() SMP-safe. Approved by: kib(mentor) MFC in: 4 weeks	2012-05-21 22:21:00 +00:00
Alexander V. Chernikov	c7b0200eb5	Call bpf_jitter() before acquiring BPF global lock due to malloc() being used inside bpf_jitter. Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl. This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock. Approved by: kib(mentor) MFC in: 4 weeks	2012-05-21 22:19:19 +00:00
Alexander V. Chernikov	afa85850e7	Fix old panic when BPF consumer attaches to destroying interface. 'flags' field is added to the end of bpf_if structure. Currently the only flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd() Problem can be easily triggered on SMP stable/[89] by the following command (sort of): 'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done' Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory, while interface is still UP and there can be routes via this interface. Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api. Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches) Approved by: kib(mentor) MFC in: 4 weeks	2012-05-21 22:17:29 +00:00
Alexander V. Chernikov	6c74ff0ea6	Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@) Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@) Protect most of bpf_setf() by BPF global lock Add several forgotten assertions (thanks to adrian@) Document current locking model inside bpf.c Document EVENTHANDLER(9) usage inside BPF. Approved by: kib(mentor) Tested by: gnn MFC in: 4 weeks	2012-05-21 22:13:48 +00:00
Alexander V. Chernikov	9431cc1696	Fix build broken by r233938. Pointed by: David Wolfskill <david@catwhisker.org> Approved by: kib (mentor) Pointy hat to: melifaro	2012-04-06 13:34:19 +00:00
Alexander V. Chernikov	51ec1eb70d	- Improve performace for writer-only BPF users. Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send raw ethernet frames. The only FreeBSD interface that can be used to send raw frames is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses BPF only to send data. This leads us to the situation when software like cdpd, being run on high-traffic-volume interface significantly reduces overall performance since we have to acquire additional locks for every packet. Here we add sysctl that changes BPF behavior in the following way: If program came and opens BPF socket without explicitly specifyin read filter we assume it to be write-only and add it to special writer-only per-interface list. This makes bpf_peers_present() return 0, so no additional overhead is introduced. After filter is supplied, descriptor is added to original per-interface list permitting packets to be captured. Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of setting snap length. Fortunately, most programs explicitly sets (event catch-all) filter after that. tcpdump(1) is a good example. So a bit hackis approach is taken: we upgrade description only after second BIOCSETF is received. Sysctl is named net.bpf.optimize_writers and is turned off by default. - While here, document all sysctl variables in bpf.4 Sponsored by Yandex LLC Reviewed by: glebius (previous version) Reviewed by: silence on -net@ Approved by: (mentor) MFC after: 4 weeks	2012-04-06 06:55:21 +00:00
Alexander V. Chernikov	e4b3229aa5	- Improve BPF locking model. Interface locks and descriptor locks are converted from mutex(9) to rwlock(9). This greately improves performance: in most common case we need to acquire 1 reader lock instead of 2 mutexes. - Remove filter(descriptor) (reader) lock in bpf_mtap[2] This was suggested by glebius@. We protect filter by requesting interface writer lock on filter change. - Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h without including rwlock stuff. However, this is is temporary solution, struct bpf_if should be made opaque for any external caller. Found by: Dmitrij Tejblum <tejblum@yandex-team.ru> Sponsored by: Yandex LLC Reviewed by: glebius (previous version) Reviewed by: silence on -net@ Approved by: (mentor) MFC after: 3 weeks	2012-04-06 06:53:58 +00:00
Juli Mallett	9624d94701	o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands using the o32 ABI. This mostly follows nwhitehorn's lead in implementing COMPAT_FREEBSD32 on powerpc64. o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the 32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit ABIs use a 64-bit time_t. o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS with 32-bit compatibility, then, disable some code which assumes otherwise wrongly when built for MIPS. A more general macro to check in this case would seem like a good idea eventually. If someone adds support for using n32 userland with n64 kernels on MIPS, then they will have to add a variety of flags related to each piece of the ABI that can vary. That's probably the right time to generalize further. o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the freebsd32 compat code. Probably this should be generalized at some point. Reviewed by: gonzo	2012-03-03 08:19:18 +00:00
Lawrence Stewart	9a7e6bac47	Consumers of bpfdetach() expect it to remove all bpf_if structs from the bpf_iflist list which reference the specified ifnet. The existing implementation only removes the first matching bpf_if found in the list, effectively leaking list entries if an ifnet has been bpfattach()ed multiple times with different DLTs. Fix the leak by performing the detach logic in a loop, stopping when all bpf_if structs referencing the specified ifnet have been detached and removed from the bpf_iflist list. Whilst here, also: - Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never exist in the list with a NULL ifnet pointer. - Except when INVARIANTS is in the kernel config, silently ignore the case where no bpf_if referencing the specified ifnet is found, as it is harmless and does not require user attention. Reviewed by: csjp MFC after: 1 week	2012-01-10 00:48:29 +00:00
Lawrence Stewart	253a3814d4	Revert r228986 until it can be reworked to avoid panicing the kernel when the same interface is attached multiple times with different DLTs, as is done in net80211 for example. Reported by: adrian	2011-12-31 07:21:28 +00:00
Lawrence Stewart	0f89fc22f3	- Introduce the net.bpf.tscfg sysctl tree and associated code so as to make one aspect of time stamp configuration per interface rather than per BPF descriptor. Prior to this, the order in which BPF devices were opened and the per descriptor time stamp configuration settings could cause non-deterministic and unintended behaviour with respect to time stamping. With the new scheme, a BPF attached interface's tscfg sysctl entry can be set to "default", "none", "fast", "normal" or "external". Setting "default" means use the system default option (set with the net.bpf.tscfg.default sysctl), "none" means do not generate time stamps for tapped packets, "fast" means generate time stamps for tapped packets using a hz granularity system clock read, "normal" means generate time stamps for tapped packets using a full timecounter granularity system clock read and "external" (currently unimplemented) means use the time stamp provided with the packet from an underlying source. - Utilise the recently introduced sysclock_getsnapshot() and sysclock_snap2bintime() KPIs to ensure the system clock is only read once per packet, regardless of the number of BPF descriptors and time stamp formats requested. Use the per BPF attached interface time stamp configuration to control if sysclock_getsnapshot() is called and whether the system clock read is fast or normal. The per BPF descriptor time stamp configuration is then used to control how the system clock snapshot is converted to a bintime by sysclock_snap2bintime(). - Remove all FAST related BPF descriptor flag variants. Performing a "fast" read of the system clock is now controlled per BPF attached interface using the net.bpf.tscfg sysctl tree. - Update the bpf.4 man page. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ In collaboration with: Julien Ridoux (jridoux at unimelb edu au)	2011-12-30 08:57:58 +00:00
Lawrence Stewart	3e47c78798	Revert r227778 in preparation for committing reworked patches in its place.	2011-11-29 12:55:26 +00:00
Lawrence Stewart	b6f1c7db32	- When feed-forward clock support is compiled in, change the BPF header to contain both a regular timestamp obtained from the system clock and the current feed-forward ffcounter value. This enables new possibilities including comparison of timekeeping performance and timestamp correction during post processing. - Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping packets using the feedback or feed-forward system clock. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-21 04:17:24 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Attilio Rao	6aba400a70	Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#. That happens because the selinfo interface has no way to drain the waiters before to destroy the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object. Fix this by adding the seldrain() routine which should be called before to destroy the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which installs selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements a full correct logic for preventing this from happening. Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks	2011-08-25 15:51:54 +00:00
Jung-uk Kim	d0d7bcdf92	Fix a typo in a comment. Submitted by: afiveg	2010-09-16 18:37:33 +00:00
Jung-uk Kim	547d94bde3	Implement flexible BPF timestamping framework. - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses. - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now. - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format as binuptime(9) and getbinuptime(9) do. Reviewed by: net@	2010-06-15 19:28:44 +00:00
Bjoern A. Zeeb	1b610a749e	MFP4: @177254 Add missing CURVNET_RESTORE() calls for multiple code paths, to stop leaking the currently cached vnet into callers and to the process. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-04-27 15:16:54 +00:00
Konstantin Belousov	fc0a61a401	Provide compat32 shims for bpf(4), except zero-copy facilities. bd_compat32 field of struct bpf_d is kept unconditionally to not impose the requirement of including "opt_compat.h" on all numerous users of bpfdesc.h. Submitted by: jhb (version for 6.x) Reviewed and tested by: emaste MFC after: 2 weeks	2010-04-25 16:43:41 +00:00
Jung-uk Kim	704858479c	Check the pointer to JIT binary filter before its de-allocation. Submitted by: Alexander Sack (asack at niksun dot com) MFC after: 3 days	2010-03-29 20:24:03 +00:00
Jung-uk Kim	5d7af3a1cc	Fix a style(9) nit.	2010-03-12 19:42:42 +00:00
Jung-uk Kim	9fee1bd1d8	Tidy up callout for select(2) and read timeout. - Add a missing callout_drain(9) before the descriptor deallocation.[1] - Prefer callout_init_mtx(9) over callout_init(9) and let the callout subsystem handle the mutex for callout function. PR: kern/144453 Submitted by: Alexander Sack (asack at niksun dot com)[1] MFC after: 1 week	2010-03-12 19:14:58 +00:00
Jung-uk Kim	8df67d77ed	Return partially filled buffer for non-blocking read(2) in non-immediate mode. PR: kern/143855	2010-02-20 00:19:21 +00:00
Robert Watson	974e99b008	Remove unneeded blank line from bpf_drvinit(). MFC after: 3 days	2009-10-23 17:26:29 +00:00
Robert Watson	e76d823b81	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks	2009-09-12 20:03:45 +00:00
Jung-uk Kim	a36599cce7	Always embed pointer to BPF JIT function in BPF descriptor to avoid inconsistency when opt_bpf.h is not included. Reviewed by: rwatson Approved by: re (rwatson)	2009-08-12 17:28:53 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Christian S.J. Peron	0e37f3e196	Implement the -z (zero counters) option for the various bpf counters. Add necessary changes to the kernel for this (basically introduce a bpf_zero_counters() function). As well, update the man page. MFC after: 1 month Discussed with: rwatson	2009-06-19 20:31:44 +00:00
Bjoern A. Zeeb	ebd8672cc3	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
Konstantin Belousov	d8b0556c6d	Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland	2009-06-10 20:59:32 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Sam Leffler	5ce8d9708c	rev bpf attach/detach event api to include the dlt	2009-05-25 16:34:35 +00:00
Sam Leffler	b743c31009	add bpf_track eventhandler for monitoring bpf taps attached/detached Reviewed by: csjp	2009-05-18 17:18:40 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Christian S.J. Peron	ffeeb9241a	Disable zerocopy by default for now. It's causing some problems in pcap consumers which fork after the shared pages have been setup. pflogd(8) is an example. The problem is understood and there is a fix coming in shortly. Folks who want to continue using it can do so by setting net.bpf.zerocopy_enable to 1. Discussed with: rwatson	2009-03-10 14:28:19 +00:00
Robert Watson	e82669d99b	When resetting a BPF descriptor, properly check that zero-copy buffers are not currently owned by userspace before clearing or rotating them. Otherwise we may not play by the rules of the shared memory protocol, potentially corrupting packet data or causing userspace applications that are playing by the rules to spin due to being notified that a buffer is complete but the shared memory header not reflecting that. This behavior was seen with pflogd by a number of reporters; note that this fix is not sufficient to get pflogd properly working with zero-copy BPF, due to pflogd opening the BPF device before forking, leading to the shared memory buffer not being propery inherited in the privilege-separated child. We're still deciding how to fix that problem. This change exposes buffer-model specific strategy information in reset_d(), which will be fixed at a later date once we've decided how best to improve the BPF buffer abstraction. Reviewed by: csjp Reported by: keramida	2009-03-07 22:17:44 +00:00
Christian S.J. Peron	927094113e	Mark the bpf stats sysctl as being mpsafe. We do not require Giant here.	2009-03-07 17:07:29 +00:00
Christian S.J. Peron	f32b457b6e	Switch the default buffer mode in bpf(4) to zero-copy buffers. Discussed with: rwatson	2009-03-02 19:42:01 +00:00
Marko Zec	97021c2464	Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-26 22:32:07 +00:00

1 2 3 4 5 ...

252 Commits