133956 Commits

Author SHA1 Message Date
arybchik
f0c5ab9387 sfxge(4): insert filters for encapsulated packets
On Medford, with full-featured firmware running, encapsulated
packets may not be delivered unless filters are inserted for
them, as ordinary filters are not applied to encapsulated
packets. So filters for encapsulated packets need to be
inserted for each class of encapsulated packet. For simplicity,
catch-all filters are always inserted. These may match more
packets than the OS has asked for, but trying to insert more
precise filters increases complexity for little gain.

Submitted by:   Mark Spender <mspender at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18074
2018-11-23 09:03:32 +00:00
arybchik
0cc8ee2dd9 sfxge(4): support filters for encapsulated packets
This supports filters which match all unicast or multicast
inner frames in VXLAN, GENEVE, or NVGRE packets.
(Additional fields to match on can be added easily.)

Submitted by:   Mark Spender <mspender at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18073
2018-11-23 09:03:20 +00:00
arybchik
262bf6afbb sfxge(4): use proper MCDI command for encap filters
MC_CMD_FILTER_OP_IN_EXT is needed to set filters for encapsulated
packets.

Submitted by:   Mark Spender <mspender at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18072
2018-11-23 09:03:09 +00:00
arybchik
23174883ee sfxge(4): provide information about supported tunnels
VXLAN/NVGRE (and Geneve) support is available on SFN8xxx when the
full-feature firmware variant is running.

Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18071
2018-11-23 09:02:58 +00:00
arybchik
6c3839dc7c sfxge(4): let caller know that queue is already flushed
Tx/Rx queue may be already flushed due to Tx/Rx error on the queue or
MC reboot. Caller needs to know that the queue is already flushed to
avoid waiting for flush done event.

Submitted by:   Andy Moreton <amoreton at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18070
2018-11-23 07:50:56 +00:00
arybchik
56dc042300 sfxge(4): fix error code usage
MCDI results returned in req.emr_rc have already been translated
from MC_CMD_ERR_* to errno names, so using an MC_CMD_ERR_* value
is incorrect.

Submitted by:   Andy Moreton <amoreton at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18069
2018-11-23 07:50:45 +00:00
arybchik
8d2b3bd70e sfxge(4): fix out of bounds read in VIs allocation
Submitted by:   Andy Moreton <amoreton at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18068
2018-11-23 07:50:34 +00:00
arybchik
871a40bad0 sfxge(4): fix potential buffer overflow in Tx queue init
Improve error checking to avoid a caller overflowing the MCDI
request buffer if the requested TXQ size was excessively large.

Submitted by:   Andy Moreton <amoreton at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18067
2018-11-23 07:50:22 +00:00
arybchik
54070c13c2 sfxge(4): fix failure path in EF10 Tx queue PIO enable
Submitted by:   Andy Moreton <amoreton at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18066
2018-11-23 07:43:44 +00:00
arybchik
50624fdbc6 sfxge(4): add advanced function to extract FW version
Some libefx-based drivers might need this functionality to
indicate DPCPU FW IDs as part of FW version info to assist
experienced users.

Submitted by:   Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18065
2018-11-23 07:38:59 +00:00
arybchik
c54f54c911 sfxge(4): add MCDI agnostic wrapper for MAC stats clear
If a libefx-based driver needs some way to clear port statistics,
then an MCDI agnostic method is required.

Submitted by:   Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18064
2018-11-23 07:26:37 +00:00
mjg
2dadc8e0dd Revert "fork: fix use-after-free with vfork"
This intermittently breaks libc's handling of vfork in the case where
forking succeeded but execve did not.

The vfork code in libc performs waitpid with WNOHANG in case of a failed exec.
With the fix, the exit codepath was waking up the parent before the child had
fully transitioned to a zombie. The woken parent would call waitpid, which
could find a not-yet-zombie child and fail to reap it due to the WNOHANG
flag.

While removing the flag fixes the problem, it is not an option due to older
releases which would still suffer from the kernel change.

Revert the fix until a solution can be worked out.

Note that while the use-after-free that returns with this revert is a real
bug, its side effects are limited because struct proc memory
is never released by UMA.
2018-11-23 04:38:50 +00:00
rmacklem
ea74f03a35 Make sure the NFS readdir client fills in all "struct dirent" data.
The NFS client code (nfsrpc_readdir() and nfsrpc_readdirplus()) wasn't
filling in parts of the readdir reply, such as d_pad[01] and the bytes
at the end of d_name within d_reclen. As such, data left in a buffer cache
block could be leaked to userland in the readdir reply.
This patch makes sure all of the data is filled in.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	kib, markj
MFC after:	2 weeks
2018-11-23 00:17:47 +00:00
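The leak described above comes from copying out a record whose pad fields and name tail were never written. A minimal userland sketch of the remedy follows. The header layout and the name demo_fill_dirent() are hypothetical and are not the NFS client code; only the idea (zero the whole record before filling it) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header, loosely modelled on struct dirent. */
struct demo_dirent_hdr {
	uint64_t d_fileno;
	uint16_t d_reclen;	/* total record length, rounded up to 8 bytes */
	uint8_t  d_type;
	uint8_t  d_pad0;
	uint16_t d_namlen;
	uint16_t d_pad1;
};

/*
 * Write one directory record into buf.  Returns the record length,
 * or 0 if the name is too long or the record does not fit.
 */
static size_t
demo_fill_dirent(void *buf, size_t buflen, uint64_t fileno, uint8_t type,
    const char *name)
{
	struct demo_dirent_hdr hdr;
	unsigned char *p = buf;
	size_t namlen, reclen;

	assert(buf != NULL && name != NULL);
	namlen = strlen(name);
	reclen = (sizeof(hdr) + namlen + 1 + 7) & ~(size_t)7;
	if (namlen > 255 || reclen > buflen)
		return (0);

	/* Zero the whole record first: pad fields and the name tail included. */
	memset(p, 0, reclen);

	memset(&hdr, 0, sizeof(hdr));
	hdr.d_fileno = fileno;
	hdr.d_reclen = (uint16_t)reclen;
	hdr.d_type = type;
	hdr.d_namlen = (uint16_t)namlen;
	memcpy(p, &hdr, sizeof(hdr));
	memcpy(p + sizeof(hdr), name, namlen);	/* NUL and tail are already 0 */
	return (reclen);
}
```

With a 3-byte name the record is rounded up to 24 bytes, and the bytes after the name inside the record read back as zero even if the buffer previously held other data.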
mjg
42ea18a714 Annotate TDP_RFPPWAIT as unlikely.
The flag is only set on vfork, but is tested for *all* syscalls.
On amd64 this shortens common-case (not vfork) code.
2018-11-22 21:38:24 +00:00
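The annotation mentioned above is a branch-prediction hint. A small sketch of the idea, assuming a GCC- or Clang-compatible compiler that provides __builtin_expect: the macro name, the flag value, and the helper functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical equivalent of a "predict false" annotation. */
#define	DEMO_PREDICT_FALSE(exp)	__builtin_expect((exp), 0)

/* Hypothetical flag value; the real TDP_RFPPWAIT value is not given here. */
#define	DEMO_TDP_RFPPWAIT	0x02000000u

static int demo_slow_path_calls;

/* Stand-in for the rarely taken vfork wait path. */
static void
demo_wait_for_child(void)
{
	demo_slow_path_calls++;
}

/* Runs on every "syscall return"; the flag test is hinted as rarely true. */
static void
demo_syscall_return(unsigned int td_pflags)
{
	if (DEMO_PREDICT_FALSE((td_pflags & DEMO_TDP_RFPPWAIT) != 0))
		demo_wait_for_child();
}
```

The hint does not change behaviour. It only lets the compiler lay out the common case as the fall-through path.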
mjg
09e7b78f66 fork: remove avoidable proc lock/unlock pair
We don't have to access the process after making it runnable, so there
is no need to hold it either.

Sponsored by:	The FreeBSD Foundation
2018-11-22 21:29:36 +00:00
mjg
75deef51a7 fork: fix use-after-free with vfork
The pointer to the child is stored without any reference held. Then it is
blindly used to wait until P_PPWAIT is cleared. However, if the child is
autoreaped, it could have exited and been freed before the parent started
waiting.

Use the existing hold mechanism to mitigate the problem. Most common case
of doing exec remains unchanged. The corner case of doing exit performs
wake up before waiting for holds to clear.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18295
2018-11-22 21:08:37 +00:00
markj
5c563658ea Plug some networking sysctl leaks.
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland.  Fix a number of such handlers.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	tuexen
MFC after:	3 days
Security:	kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18301
2018-11-22 20:49:41 +00:00
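The class of bug fixed here can be illustrated in userland. In the sketch below a structure with compiler-inserted padding is cleared before its members are set, so the padding bytes that get copied out do not carry stale stack contents. The structure, its fields, and demo_export_stats() are hypothetical, and memcpy() stands in for the copy to userland. Strictly speaking the C standard leaves padding values unspecified after member stores, so this shows the idiom as it behaves in practice with common compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical exported structure with implicit padding. */
struct demo_stats {
	uint8_t  proto;		/* padding typically follows this member */
	uint64_t packets;
	uint16_t port;		/* trailing padding typically follows */
};

/* Build the reply in a local structure and copy it to the output buffer. */
static void
demo_export_stats(void *out, uint8_t proto, uint64_t packets, uint16_t port)
{
	struct demo_stats st;

	assert(out != NULL);
	memset(&st, 0, sizeof(st));	/* zero-fill, padding included */
	st.proto = proto;
	st.packets = packets;
	st.port = port;
	memcpy(out, &st, sizeof(st));	/* stands in for the copy to userland */
}
```

Without the memset(), the padding would hold whatever was previously on the stack, and that is what the caller would receive.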
tuexen
a02b4525ca A TCP stack is required to check SEG.ACK first, when processing a
segment in the SYN-SENT state as stated in Section 3.9 of RFC 793,
page 66. Ensure this is also done by the TCP RACK stack.

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18034
2018-11-22 20:05:57 +00:00
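The RFC 793 rule referenced above says that in SYN-SENT the ACK field is examined before anything else: an ACK with SEG.ACK =< ISS or SEG.ACK > SND.NXT is answered with a reset, unless the segment itself carries RST, in which case it is dropped. A minimal sketch of that first check follows. The types and names are hypothetical and this is not the RACK stack's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Modular (wraparound-safe) sequence number comparisons. */
#define	DEMO_SEQ_LEQ(a, b)	((int32_t)((a) - (b)) <= 0)
#define	DEMO_SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)

enum demo_action {
	DEMO_CONTINUE,		/* ACK absent or acceptable: keep processing */
	DEMO_SEND_RST,		/* unacceptable ACK: reply with a reset */
	DEMO_DROP		/* unacceptable ACK on a RST segment: discard */
};

/* Hypothetical view of the fields of an incoming segment that matter here. */
struct demo_seg {
	bool	 ack;		/* ACK bit set */
	bool	 rst;		/* RST bit set */
	uint32_t seg_ack;	/* SEG.ACK */
};

/* First step of SYN-SENT processing: check SEG.ACK before anything else. */
static enum demo_action
demo_syn_sent_check_ack(uint32_t iss, uint32_t snd_nxt,
    const struct demo_seg *seg)
{
	assert(seg != NULL);
	if (seg->ack && (DEMO_SEQ_LEQ(seg->seg_ack, iss) ||
	    DEMO_SEQ_GT(seg->seg_ack, snd_nxt)))
		return (seg->rst ? DEMO_DROP : DEMO_SEND_RST);
	return (DEMO_CONTINUE);
}
```

With ISS 1000 and SND.NXT 1001, only SEG.ACK 1001 passes the check. The signed-difference macros keep the comparison correct when sequence numbers wrap.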
tuexen
82210e189d Ensure that the TCP RACK stack honours the setting of the
net.inet.tcp.drop_synfin sysctl-variable.

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18033
2018-11-22 20:02:39 +00:00
tuexen
e4a2b60c79 Ensure that the default TCP stack can make an RTT measurement if
the TCP connection was initiated using the RACK stack, but the
peer does not support the TCP RACK extension.

This ensures that the TCP behaviour on the wire is the same if
the TCP connection is initiated using the RACK stack or the default
stack.

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18032
2018-11-22 19:56:52 +00:00
tuexen
3ece71ca83 Ensure that TCP RST-segments consistently announce a receiver window of
zero. This was already done when sending them via tcp_respond().

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D17949
2018-11-22 19:49:52 +00:00
markj
fbbdea9acc Clear unused bytes in ia32_osendsig().
Mirror the fix for the native i386 implementation from r218327.  This
code is compiled only when the non-default COMPAT_43 option is
configured.

Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18298
2018-11-22 17:51:19 +00:00
emaste
1a89b15463 proto: change device permissions to 0600
C Turt reports that the driver is not thread safe and may have
exploitable races.

Note that the proto device is intended for prototyping and development,
and is not for use on production systems.  From the man page:

SECURITY CONSIDERATIONS
     Because programs have direct access to the hardware, the proto
     driver is inherently insecure.  It is not advisable to use this
     driver on a production machine.

The proto device is not included in any of FreeBSD's kernel config files
(although the module is built).

The issues in the proto device still need to be fixed, and the device is
inherently (and intentionally) insecure, but it might as well be limited
to root only.

admbugs:	782
Reported by:	C Turt <ecturt@gmail.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-22 16:55:09 +00:00
arybchik
710a20b029 sfxge(4): limit max TXQ size on Medford to 2048
Queues with 4096 descriptors are not supported as the top bit is used for vfifo
stuffing.

Submitted by:   Mark Spender <mspender at solarflare.com>
Reviewed by:    gnn
Sponsored by:   Solarflare Communications, Inc.
MFC after:      2 days
Differential Revision:  https://reviews.freebsd.org/D8948
2018-11-22 16:15:24 +00:00
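A size limit like the one described above is usually enforced when the queue is created. The sketch below shows one way such a check could look. The function name, the minimum of 512 descriptors, and the power-of-two requirement are assumptions for illustration. Only the Medford maximum of 2048 and the unsupported 4096 come from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Assumed lower bound, for illustration only. */
#define	DEMO_TXQ_MINNDESCS	512u
/* Per the commit message: 4096-descriptor queues are not supported on Medford. */
#define	DEMO_TXQ_MAXNDESCS_MEDFORD	2048u

/*
 * Validate a requested Tx queue size against a per-controller maximum.
 * Returns 0 if acceptable, EINVAL otherwise.
 */
static int
demo_txq_size_check(unsigned int ndescs, unsigned int max_ndescs)
{
	assert(max_ndescs >= DEMO_TXQ_MINNDESCS);

	/* Assumed requirement: ring sizes are powers of two. */
	if (ndescs == 0 || (ndescs & (ndescs - 1)) != 0)
		return (EINVAL);
	if (ndescs < DEMO_TXQ_MINNDESCS || ndescs > max_ndescs)
		return (EINVAL);
	return (0);
}
```

Calling demo_txq_size_check(4096, DEMO_TXQ_MAXNDESCS_MEDFORD) is rejected with EINVAL, while 2048 is accepted.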
arybchik
440bb161d3 sfxge(4): support packed stream Rx mode in libefx
Submitted by:   Artem V. Andreev <Artem.Andreev@oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18022
2018-11-22 14:31:35 +00:00
arybchik
63eeccfbf4 sfxge(4): cleanup: move into right place
Due to an incorrect merge, a piece of code was put in the wrong
place and diverged from libefx in other locations.

Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18024
2018-11-22 14:10:46 +00:00
mjg
ffdee46ab5 uipc_usrreq: fix inode number assignment
The code was incrementing a global variable in an unsafe manner.
Two different threads stating two different sockets could have resulted
in the same inode numbers assigned to both.

Creation is protected by a global lock, so move the assignment there.
Since inode numbers are now 64-bit, drop the check for overflows.

Sponsored by:	The FreeBSD Foundation
2018-11-21 22:25:05 +00:00
mjg
b51585a153 proc: update list manipulation comment on process exit
Processes stay in the hash until they get reaped.

This code does not unlink the child from the parent, so remove
the claim that it does.

Sponsored by:	The FreeBSD Foundation
2018-11-21 22:16:10 +00:00
mjg
6fd8f10bb4 uipc_shm: use unr64 for inode numbers
Sponsored by:	The FreeBSD Foundation
2018-11-21 22:01:06 +00:00
mjg
b06fa0f93a proc: convert pfind & friends to use pidhash locks and other cleanup
pfind_locked is retired as it relied on allproc which unnecessarily
restricts locking of the hash.

Sponsored by:	The FreeBSD Foundation
2018-11-21 20:15:56 +00:00
mjg
71aabf21a1 proc: implement pid hash locks and an iterator
forks, exits and waits are frequently stalled during poudriere -j 128 runs
due to killpg and process list exports performed for each package.

Both uses take the allproc lock. The latter case can be modified to iterate
over the hash with finer grained locking instead.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17817
2018-11-21 18:56:15 +00:00
tuexen
3ca57eff63 Improve two KASSERTs in the TCP RACK stack.
There are two locations where an always true comparison was made in
a KASSERT. Replace this by an appropriate check and use a consistent
panic message. Also use this code when checking a similar condition.

PR:			229664
Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18021
2018-11-21 18:19:15 +00:00
mav
e9a5feab20 Revert r340096: 9952 Block size change during zfs receive drops spill block
It was reported, and I easily reproduced, that this change triggers a panic
when receiving a replication stream with embedded blocks enabled, when a short
file that compresses into one embedded block changes its block size.  I am not
sure that the problem is in this particular patch rather than merely triggered
by it, but since investigation and a fix will take some time, I've decided to
revert this for now.

PR:		198457, 233277
2018-11-21 18:18:57 +00:00
markj
2e57d41f44 Avoid unsynchronized updates to kn_status.
kn_status is protected by the kqueue's lock, but we were updating it
without the kqueue lock held.  For EVFILT_TIMER knotes, there is no
knlist lock, so the knote activation could occur during the kn_status
update and result in KN_QUEUED being lost, in which case we'd enqueue
an already-enqueued knote, corrupting the queue.

Fix the problem by setting or clearing KN_DISABLED before dropping the
kqueue lock to call into the filter.  KN_DISABLED is used only by the
core kevent code, so there is no side effect from setting it earlier.

Reported and tested by:	Sylvain GALLIANO <sg@efficientip.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18060
2018-11-21 17:32:09 +00:00
markj
6994e92830 Remove KN_HASKQLOCK.
It is a write-only flag whose last use was removed in r302235.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18059
2018-11-21 17:28:10 +00:00
markj
746f5464d4 Use taskqueue_quiesce(9) to implement taskq_wait().
PR:		227784
Reviewed by:	cem
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17975
2018-11-21 17:19:08 +00:00
markj
0e3d68b2b4 Add a taskqueue_quiesce(9) KPI.
This is similar to taskqueue_drain_all(9) but will wait for the queue
to become idle before returning instead of only waiting for
already-enqueued tasks to finish.  This will be used in the opensolaris
compat layer.

PR:		227784
Reviewed by:	cem
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17975
2018-11-21 17:18:27 +00:00
jhibbits
fa1f71dac0 DTrace/powerpc: Fix FBT return probes
The FBT function boundary prober was setting one return probe marker value,
but the dtrace handler was expecting another.  This causes a hang when
tracing return probes.
2018-11-21 16:47:11 +00:00
oleg
875afd892a Unbreak kernel build with VLAN_ARRAY defined.
MFC after:	1 week
2018-11-21 13:34:21 +00:00
bwidawsk
11416ef4a8 linuxkpi: Use pageproc instead of vmproc
According to markj@:
pageproc contains the page daemon and laundry threads, which are
responsible for managing the LRU page queues and writing back dirty
pages.  vmproc's main task is to swap out kernel stacks when the system
is under memory pressure, and swap them back in when necessary.  It's a
somewhat legacy component of the system and isn't required.  You can
build a kernel without it by specifying "options NO_SWAPPING" (which is
a somewhat misleading name), in which vm_swapout_dummy.c is compiled
instead of vm_swapout.c.

Based on this, we want pageproc to emulate kswapd, not vmproc.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D18061
2018-11-21 04:34:18 +00:00
bwidawsk
0a288ff526 Add definitions for Intel Speed Shift
These definitions will be used by a driver to implement Hardware
P-States (autonomous control of HWP, via Intel Speed Shift technology).

Reviewed by:	kib
Approved by:	emaste (mentor)
Differential Revision:	https://reviews.freebsd.org/D18050
2018-11-21 00:21:58 +00:00
bwidawsk
0c21a36e53 linuxkpi: Remove duplicated text
Somehow this got botched while moving from git -> svn
2018-11-20 23:05:09 +00:00
bwidawsk
82426acbdd linuxkpi: Add some basic swap functions
These are used by kms-drm to determine various heuristics related to
memory conditions.

The number of free swap pages is just a variable, and reading it could be
much cheaper with either a new getter or by simply extern'ing
swap_total. However, this patch opts to use the more expensive,
existing interface, since this isn't an operation in a high-performance
path.

This allows us to remove some more gpl linuxkpi and do the following in
kms-drm:
git rm linuxkpi/gplv2/include/linux/swap.h

Reviewed by:    mmacy, Johannes Lundberg <johalun0@gmail.com>
Approved by:    emaste (mentor)
Differential Revision:  https://reviews.freebsd.org/D18052
2018-11-20 22:49:19 +00:00
markj
62e656950f Clear pad bytes in the struct exported by kern.ntp_pll.gettime.
Reported by:	Thomas Barabosch, Fraunhofer FKIE
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-20 20:32:10 +00:00
zeising
6ae84e7e37 Enable evdev on ppc32
Enable evdev on ppc32 as well, similar to what was done for i386 and amd64 in
r340387 and ppc64 in r340632.

Evdev can be used by X and is used by wayland to handle input devices.

Approved by:	jhibbits
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D18049
2018-11-20 19:31:02 +00:00
ae
d19730211c Make the multiline APPLY_MASK() macro function-like.
Reported by:	cem
MFC after:	1 week
2018-11-20 18:38:28 +00:00
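A multi-statement macro behaves like a single statement only when it is wrapped, conventionally in do { ... } while (0). The sketch below shows that idiom with a hypothetical four-word mask macro. It is not the actual APPLY_MASK() definition, and whether the commit used exactly this form is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical multi-statement macro made function-like: the
 * do { } while (0) wrapper turns the body into one statement, so it
 * can be used in an unbraced if/else and still needs a trailing ';'.
 */
#define	DEMO_APPLY_MASK(addr, mask) do {	\
	(addr)[0] &= (mask)[0];			\
	(addr)[1] &= (mask)[1];			\
	(addr)[2] &= (mask)[2];			\
	(addr)[3] &= (mask)[3];			\
} while (0)

/* Mask the address when enabled, otherwise clear it entirely. */
static void
demo_mask_or_clear(int enabled, uint32_t addr[4], const uint32_t mask[4])
{
	assert(addr != NULL && mask != NULL);
	if (enabled)
		DEMO_APPLY_MASK(addr, mask);
	else
		memset(addr, 0, 4 * sizeof(addr[0]));
}
```

Without the wrapper, only the first statement would be governed by the if, and the else would fail to compile.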
mjg
442ec8ecee tmpfs: use unr64 for inode numbers
Sponsored by:	The FreeBSD Foundation
2018-11-20 15:14:30 +00:00
markj
7cafe98f91 Handle kernel superpage mappings in pmap_remove_l2().
PR:		233088
Reviewed by:	alc, andrew, kib
Tested by:	sbruno
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17981
2018-11-20 15:12:37 +00:00
mjg
b359994628 pipe: use unr64
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18054
2018-11-20 14:59:27 +00:00
mjg
b46ad9fa56 Implement unr64
Important users of unr like tmpfs or pipes can get away with just
ever-increasing counters, making the overhead of managing the state
for 32 bit counters a pessimization.

Change it to an atomic variable. This can be further sped up by making
the counts variable "allocate" ranges and store them per-cpu.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18054
2018-11-20 14:58:41 +00:00
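The approach described above, an ever-increasing 64-bit counter bumped atomically, can be sketched in userland with C11 atomics. The names below are hypothetical and this is not the kernel's unr64 implementation. It only illustrates why no range bookkeeping is needed when the number space is 64 bits wide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator state: just one atomic counter. */
struct demo_unr64 {
	_Atomic uint64_t counter;
};

/* Start handing out numbers at 'low'. */
static void
demo_unr64_init(struct demo_unr64 *u, uint64_t low)
{
	assert(u != NULL);
	atomic_init(&u->counter, low);
}

/*
 * Return the next unique number.  A single atomic fetch-and-add is
 * enough: concurrent callers each observe a distinct previous value.
 */
static uint64_t
demo_unr64_alloc(struct demo_unr64 *u)
{
	assert(u != NULL);
	return (atomic_fetch_add_explicit(&u->counter, 1,
	    memory_order_relaxed));
}
```

Numbers are never returned to the pool in this scheme, which is the trade-off that makes it cheap.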