Commit Graph

281 Commits

Author SHA1 Message Date
wma
d14bbbd48e BPF: Switch to 32 bit compatible mode only when thread is 32 bit
Sometimes 32 bit and 64 bit ioctls are represented by the same number.
It causes unnecessary switch to 32 bit commpatible mode.

This patch prevents switching when we are dealing with 64 bit executable.
It fixes issue mentioned here

Authored by:           Patryk Duda <pdk@semihalf.com>
Submitted by:          Wojciech Macek <wma@semihalf.com>
Reviewed by:           andrew, wma
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14023
2018-01-25 12:13:41 +00:00
eadler
421a929b1e kernel: Fix several typos and minor errors
- duplicate words
- typos
- references to old versions of FreeBSD

Reviewed by:	imp, benno
2017-12-27 03:23:21 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
kp
6ab3759e1c bpf: Fix incorrect cleanup
Cleaning up a bpf_if is a two stage process. We first move it to the
bpf_freelist (in bpfdetach()) and only later do we actually free it (in
bpf_ifdetach()).

We cannot set the ifp->if_bpf to NULL from bpf_ifdetach() because it's
possible that the ifnet has already gone away, or that it has been assigned
a new bpf_if.
This can lead to a struct ifnet which is up, but has if_bpf set to NULL,
which will panic when we try to send the next packet.

Keep track of the pointer to the bpf_if (because it's not always
ifp->if_bpf), and NULL it immediately in bpfdetach().

PR:		213896
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11782
2017-08-16 19:40:07 +00:00
jhibbits
342f85f1d4 Update comments and simplify conditionals for compat32
Only amd64 (because of i386) needs 32-bit time_t compat now, everything else is
64-bit time_t.  Rather than checking on all 64-bit time_t archs, only check the
oddball amd64/i386.

Reviewed By: emaste, kib, andrew
Differential Revision: https://reviews.freebsd.org/D11364
2017-06-27 01:29:10 +00:00
jhibbits
11d34fd62a Solve the y2038 problem for powerpc
AKA Make time_t 64 bits on powerpc(32).

PowerPC currently (until now) was one of two architectures with a 32-bit time_t
on 32-bit archs (the other being i386).  This is an ABI breakage, so all ports,
and all local binaries, *must* be recompiled.

Tested by:	andreast, others
MFC after:	Never
Relnotes:	Yes
2017-06-26 02:25:19 +00:00
ae
84935b56a6 Ignore ifnet renaming in the bpf ifnet departure handler.
PR:		213015
MFC after:	1 week
2017-03-13 09:04:10 +00:00
imp
7e6cabd06e Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
kib
42da5a6952 Hide the boottime and bootimebin globals, provide the getboottime(9)
and getboottimebin(9) KPI. Change consumers of boottime to use the
KPI.  The variables were renamed to avoid shadowing issues with local
variables of the same name.

Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure.  As a
preparation, this commit only introduces the interface.

Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance.  Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.

Tested by:	pho (as part of the bigger patch)
Reviewed by:	jhb (same)
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
X-Differential revision:	https://reviews.freebsd.org/D7302
2016-07-27 11:08:59 +00:00
pfg
729533413f sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
2016-04-21 19:57:40 +00:00
cem
9904129a9f bpf_getdltlist: Don't overrun 'lst'
'lst' is allocated with 'n1' members.  'n' indexes 'lst'.  So 'n == n1' is an
invalid 'lst' index.  This is a follow-up to r296009.

Reported by:	Coverity
CID:		1352743
Sponsored by:	EMC / Isilon Storage Division
2016-04-20 01:39:31 +00:00
bz
146f54f44d During if_vmove() we call if_detach_internal() which in turn calls the event
handler notifying about interface departure and one of the consumers will
detach if_bpf.
There is no way for us to re-attach this easily as the DLT and hdrlen are
only given on interface creation.
Add a function to allow us to query the DLT and hdrlen from a current
BPF attachment and after if_attach_internal() manually re-add the if_bpf
attachment using these values.

Found by panics triggered by nd6 packets running past BPF_MTAP() with no
proper if_bpf pointer on the interface.

Also add a basic DDB show function to investigate the if_bpf attachment
of an interface.

Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5896
2016-04-11 10:00:38 +00:00
kib
4c4becd77e In bpf_getdltlist(), do not call copyout(9) while holding bpf lock.
Copy the data into temprorary malloced buffer and drop the lock for
copyout.

Reported, reviewed and tested by:	cem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-24 22:00:35 +00:00
melifaro
93152c67c9 Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
bz
a240c87b63 If bootverbose is enabled every vnet startup and virtual interface
creation will print extra lines on the console. We are generally not
interested in this (repeated) information for each VNET. Thus only
print it for the default VNET. Virtual interfaces on the base system
will remain printing information, but e.g. each loopback in each vnet
will no longer cause a "bpf attached" line.

Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Reviewed by:		gnn
Differential Revision:	https://reviews.freebsd.org/D4531
2015-12-22 15:00:04 +00:00
loos
1511a8a27a Remove the mtx_sleep() from the kqueue f_event filter.
The filter is called from the network hot path and must not sleep.

The filter runs with the descriptor lock held and does not manipulates the
buffers, so it is not necessary sleep when the hold buffer is in use.

Just ignore the hold buffer contents when it is being copied to user space
(when hold buffer in use is set).

This fix the "Sleeping thread owns a non-sleepable lock" panic when the
userland thread is too busy reading the packets from bpf(4).

PR:		200323
MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-08-03 22:14:45 +00:00
loos
b6407759b0 Add a KASSERT() to make sure we wont rotate the buffers twice (rotate the
buffers while the hold buffer is in use).

Suggested by:	ed, ghelmer
MFC with:	r286142
2015-08-03 18:22:31 +00:00
loos
bf2e4c7805 Remove two unnecessary sleeps from the hot path in bpf(4).
The first one never triggers because bpf_canfreebuf() can only be true for
zero-copy buffers and zero-copy buffers are not read with read(2).

The second also never triggers, because we check the free buffer before
calling ROTATE_BUFFERS().  If the hold buffer is in use the free buffer
will be NULL and there is nothing else to do besides drop the packet.  If
the free buffer isn't NULL the hold buffer _is_ free and it is safe to
rotate the buffers.

Update the comment in ROTATE_BUFFERS macro to match the logic described
here.

While here fix a few typos in comments.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 21:43:27 +00:00
loos
4244ee9e43 Do not allocate the buffers at opening of the descriptor, because once
the buffer is allocated we are committed to a particular buffer method
(BPF_BUFMODE_BUFFER in this case).

If we are using zero-copy buffers, the userland program must register its
buffers before set the interface.

If we are using kernel memory buffers, we can allocate the buffer at the
time that the interface is being set.

This fix allows the usage of BIOCSETBUFMODE after r235746.

Update the comments to reflect the recent changes.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 20:02:12 +00:00
markj
639543ed86 Move the definition of struct bpf_if to bpf.c.
A couple of fields are still exposed via struct bpf_if_ext so that
bpf_peers_present() can be inlined into its callers. However, this change
eliminates some type duplication in the resulting CTF container, since
otherwise ctfmerge(1) propagates the duplication through all types that
contain a struct bpf_if.

Differential Revision:	https://reviews.freebsd.org/D2319
Reviewed by:	melifaro, rpaulo
2015-04-20 22:08:11 +00:00
mav
161ff946d4 Activate write-only optimization if bpf device opened with O_WRONLY.
dhclient opens bpf as write-only to send packets. It never reads received
packets from that descriptor, but processing them in kernel takes time.
Especially much time takes packet timestamping on systems with expensive
timecounter, such as bhyve guest, where network speed dropped in half.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-04-20 10:44:46 +00:00
melifaro
59ad9f39b6 Eliminate SIOCGIFADDR handling in bpf.
Quoting 19 years bpf.4 manual from bpf-1.2a1:
"
(SIOCGIFADDR is obsolete under BSD systems.  SIOCGIFCONF should be
 used to query link-level addresses.)
"
* SIOCGIFADDR was not imported in NetBSD (bpf.c 1.36) and OpenBSD.
* Last bits (e.g. manpage claiming SIOCGIFADDR exists) was cleaned
  from NetBSD via kern/21513 5 years ago,
  from OpenBSD via documentation/6352 5 years ago.
2015-01-16 10:09:28 +00:00
glebius
99f4ec50e8 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
hselasky
a0b8ff0c54 The SYSCTL data pointers can come from userspace and must not be
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.

Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-28 12:00:39 +00:00
melifaro
e9f6263cd3 Improve logic besides net.bpf.optimize_writers.
Direct bpf(4) consumers should now work fine with this tunable turned on.
In fact, the only case when optimized_writers can change program
behavior is direct bpf(4) consumer setting its read filter to
catch-all one.

MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-06-11 11:27:44 +00:00
adrian
42bc4565be Convert the random entropy harvesting code to use a const void * pointer
rather than just void *.

Then, as part of this, convert a couple of mbuf m->m_data accesses
to mtod(m, const void *).

Reviewed by:	markm
Approved by:	security-officer (delphij)
Sponsored by:	Netflix, Inc.
2013-11-01 20:53:49 +00:00
glebius
ff6e113f1b The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
ghelmer
37dcf710d1 While waiting for the bpf hold buffer to become idle, check
the return value from mtx_sleep() and exit bpfread() on
errors such as EINTR.

Reviewed by:	jhb
2013-05-23 21:33:10 +00:00
glebius
37a43650ed Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.
2013-03-12 13:42:47 +00:00
glebius
9f6a60e000 - Utilize m_get2(), accidentially fixing some signedness bugs.
- Return EMSGSIZE in both cases if uio_resid is oversized or undersized.
- No need to clear rcvif.
2013-01-24 14:29:31 +00:00
ghelmer
726beb3d43 Changes to resolve races in bpfread() and catchpacket() that, at worst,
cause kernel panics.

Add a flag to the bpf descriptor to indicate whether the hold buffer
is in use. In bpfread(), set the "hold buffer in use" flag before
dropping the descriptor lock during the call to bpf_uiomove().
Everywhere else the hold buffer is used or changed, wait while
the hold buffer is in use by bpfread(). Add a KASSERT in bpfread()
after re-acquiring the descriptor lock to assist uncovering any
additional hold buffer races.
2012-12-10 16:14:44 +00:00
glebius
8e20fa5ae9 Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
melifaro
20cedcd3c8 Fix bpf_if structure leak introduced in r235745.
Move all such structures to delayed-free lists and
delete all matching on interface departure event.

MFC after:	1 week
2012-12-02 21:43:37 +00:00
ghelmer
249e3589f9 Work around a race in bpfread() by validating the hold buffer pointer
before freeing it. Otherwise, we can lose a buffer and cause a panic
in catchpacket().
2012-11-06 21:07:04 +00:00
melifaro
ca3b31c289 Fix typo introduced in r236559.
Pointed by:   bcr
Approved by:  kib(mentor)
2012-06-09 10:04:40 +00:00
melifaro
511c0436a1 Fix panic introduced by r235745. Panic occurs after first packet traverse renamed interface.
Add several comments on locking

Found by:         avg
Approved by:      ae(mentor)
Tested by:        avg
MFC after:        1 week
2012-06-04 12:36:58 +00:00
jkim
f2bf1d383e Fix style(9) nits, reduce unnecessary type castings, etc., for bpf_setf(). 2012-05-29 22:28:46 +00:00
jkim
9a4c9bf04c - Save the previous filter right before we set new one.
- Reduce duplicate code and make it little easier to read.

MFC after:	2 weeks
2012-05-29 22:21:53 +00:00
jkim
2287ab7bca Fix 32-bit shim for BIOCSETF to drop all packets buffered on the descriptor
and reset statistics as it should.

MFC after:	3 days
2012-05-29 18:44:53 +00:00
melifaro
ce59e84fbe Fix BPF_JITTER code broken by r235746.
Pointed by:       jkim
Reviewed by:      jkim (except locking changes)
Approved by:      (mentor)
MFC after:        2 weeks
2012-05-29 12:52:30 +00:00
melifaro
e145f3abac Make most BPF ioctls() SMP-safe.
Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:21:00 +00:00
melifaro
e5f61c6580 Call bpf_jitter() before acquiring BPF global lock due to malloc() being used inside bpf_jitter.
Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl.
This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock.

Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:19:19 +00:00
melifaro
34ec5c8650 Fix old panic when BPF consumer attaches to destroying interface.
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'

Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.

Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)

Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:17:29 +00:00
melifaro
1c776bfa6e Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@)
Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@)
Protect most of bpf_setf() by BPF global lock

Add several forgotten assertions (thanks to adrian@)

Document current locking model inside bpf.c
Document EVENTHANDLER(9) usage inside BPF.

Approved by:       kib(mentor)
Tested by:         gnn
MFC in:            4 weeks
2012-05-21 22:13:48 +00:00
melifaro
ff7ed324aa Fix build broken by r233938.
Pointed by:     David Wolfskill <david@catwhisker.org>
Approved by:    kib (mentor)
Pointy hat to:  melifaro
2012-04-06 13:34:19 +00:00
melifaro
85ccef88d3 - Improve performace for writer-only BPF users.
Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send
raw ethernet frames. The only FreeBSD interface that can be used to send raw frames
is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses
BPF only to send data. This leads us to the situation when software like cdpd,
being run on high-traffic-volume interface significantly reduces overall performance
since we have to acquire additional locks for every packet.

Here we add sysctl that changes BPF behavior in the following way:
If program came and opens BPF socket without explicitly specifyin read filter we
assume it to be write-only and add it to special writer-only per-interface list.
This makes bpf_peers_present() return 0, so no additional overhead is introduced.
After filter is supplied, descriptor is added to original per-interface list permitting
packets to be captured.

Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of
setting snap length.

Fortunately, most programs explicitly sets (event catch-all) filter after that.
tcpdump(1) is a good example.

So a bit hackis approach is taken: we upgrade description only after second
BIOCSETF is received.

Sysctl is named net.bpf.optimize_writers and is turned off by default.

- While here, document all sysctl variables in bpf.4

Sponsored by Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      4 weeks
2012-04-06 06:55:21 +00:00
melifaro
8b1d10268c - Improve BPF locking model.
Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greately improves performance: in most common case we need to acquire 1
reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is is temporary solution,
struct bpf_if should be made opaque for any external caller.

Found by:       Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by:   Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      3 weeks
2012-04-06 06:53:58 +00:00
jmallett
50c253779f o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
using the o32 ABI.  This mostly follows nwhitehorn's lead in implementing
   COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
   32-bit ABI being used.  Since the MIPS port is relatively-new, even the 32-bit
   ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
   with 32-bit compatibility, then, disable some code which assumes otherwise
   wrongly when built for MIPS.  A more general macro to check in this case would
   seem like a good idea eventually.  If someone adds support for using n32
   userland with n64 kernels on MIPS, then they will have to add a variety of
   flags related to each piece of the ABI that can vary.  That's probably the
   right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
   freebsd32 compat code.  Probably this should be generalized at some point.

Reviewed by:	gonzo
2012-03-03 08:19:18 +00:00
lstewart
15b1ff3609 Consumers of bpfdetach() expect it to remove all bpf_if structs from the
bpf_iflist list which reference the specified ifnet. The existing implementation
only removes the first matching bpf_if found in the list, effectively leaking
list entries if an ifnet has been bpfattach()ed multiple times with different
DLTs.

Fix the leak by performing the detach logic in a loop, stopping when all bpf_if
structs referencing the specified ifnet have been detached and removed from the
bpf_iflist list.

Whilst here, also:

- Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never
  exist in the list with a NULL ifnet pointer.

- Except when INVARIANTS is in the kernel config, silently ignore the case where
  no bpf_if referencing the specified ifnet is found, as it is harmless and does
  not require user attention.

Reviewed by:	csjp
MFC after:	1 week
2012-01-10 00:48:29 +00:00
lstewart
8a799f2a2f Revert r228986 until it can be reworked to avoid panicing the kernel when the
same interface is attached multiple times with different DLTs, as is done in
net80211 for example.

Reported by:	adrian
2011-12-31 07:21:28 +00:00