Commit Graph

241484 Commits

Author SHA1 Message Date
Johannes Lundberg
1462308d8b LinuxKPI: Add vm_fault_t type.
This patch is part of D19565

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-14 23:32:02 +00:00
Johannes Lundberg
395be823fd LinuxKPI: Add context member to ww_mutex and bump FreeBSD version.
This patch is part of https://reviews.freebsd.org/D19565.

Reviewed by:	hps
Approved by:	imp (mentor), hps
2019-05-14 23:21:20 +00:00
Johannes Lundberg
02927c768a LinuxKPI: Let del_timer return a value to match Linux.
This patch is part of https://reviews.freebsd.org/D19565.

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-14 23:12:14 +00:00
Rick Macklem
711d44ee56 Replace global list for grouplist with list(s) for each exportlist element.
In mountd.c, the grouplist structures are linked into a single global
linked list headed by "grphead". The only use of this linked list is
to free all list elements when the exportlist elements are also all being
free'd at the time the exports are being reloaded.
This patch replaces this one global linked list head with a list head in
each exportlist structure, where the grouplist elements for that exported
file system are linked.
The only change is that now the grouplist elements are free'd with the
associated exportlist element as they are free'd instead of all grouplist
elements being free'd after the exportlist elements are free'd. This
change should have no effect in practice.
This is being done, since a future patch that will add a "-I" option for
incrementally updating the exports in the kernel needs to know which
grouplist elements are associated with each exported file system and
having them linked into a list headed by the exportlist element does that.

MFC after:	1 month
2019-05-14 22:00:47 +00:00
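Note on 711d44ee56: the per-export list layout described above can be sketched roughly as follows. This is a minimal illustration, not the mountd.c code; `ex_sketch`, `grp_sketch`, `ex_add_group()` and `ex_free()` are hypothetical, simplified stand-ins for mountd's structures and helpers.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for mountd's structures. */
struct grp_sketch {
	struct grp_sketch *gr_next;	/* next group for the same export */
};

struct ex_sketch {
	struct ex_sketch *ex_next;	/* next exported file system */
	struct grp_sketch *ex_grphead;	/* groups owned by this export */
};

/* Link a new group into the export that owns it; returns 0 on success. */
static int
ex_add_group(struct ex_sketch *ep)
{
	struct grp_sketch *grp = calloc(1, sizeof(*grp));

	if (grp == NULL)
		return (-1);
	grp->gr_next = ep->ex_grphead;
	ep->ex_grphead = grp;
	return (0);
}

/* Free one export together with its groups; returns the groups freed. */
static int
ex_free(struct ex_sketch *ep)
{
	struct grp_sketch *grp, *next;
	int nfreed = 0;

	for (grp = ep->ex_grphead; grp != NULL; grp = next) {
		next = grp->gr_next;
		free(grp);
		nfreed++;
	}
	free(ep);
	return (nfreed);
}
```

Because each export owns its groups, an incremental update only has to walk the list hanging off the export being changed.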
Mark Johnston
5a1e222bfd Close some races in multicast socket option handling.
r333175 converted the global multicast lock to a sleepable sx lock,
so the lock order with respect to the (non-sleepable) inp lock changed.
To handle this, r333175 and r333505 added code to drop the inp lock,
but this opened races that could leave multicast group description
structures in an inconsistent state.  This change fixes the problem by
simply acquiring the global lock sooner.  Along the way, this fixes
some LORs and bogus error handling introduced in r333175, and commits
some related cleanup.

Reported by:	syzbot+ba7c4943547e0604faca@syzkaller.appspotmail.com
Reported by:	syzbot+1b803796ab94d11a46f9@syzkaller.appspotmail.com
Reviewed by:	ae
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20070
2019-05-14 21:30:55 +00:00
Edward Tomasz Napierala
060d0b57b8 Fix handling of r10 in Linux ptrace(2). This fixes decoding
of the 'flags' argument to mmap(2) with Linux strace(1).

Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20223
2019-05-14 20:59:44 +00:00
Kyle Evans
db226f0d8e tuntap: Defer clearing if_softc until after if_detach
r346670 added an sx to close a race between the ifioctl handler and
interface destruction. Unfortunately, it clears if_softc immediately after
the interface is closed, but before if_detach has been invoked.

Any time before detachment, an interface that's part of a bridge may still
receive traffic that's pushed through tunstart/tunstart_l2 and promptly
lead to a panic because if_softc is now NULL.

Fix it by deferring the clearing of if_softc until after the interface has
detached and thus been removed from the bridge. if_softc still gets cleared
in case another thread has already entered the ioctl handler before it's
replaced with ifdead_ioctl.

Reported by:	markj
MFC after:	3 days
2019-05-14 20:32:29 +00:00
Mark Johnston
367ba2d2a3 Specify -z notext when building with -z ifunc-noplt.
The upstream implementation of -z ifunc-noplt disallows its combination
with -z text.  The option does not have much significance for kernel
builds, though.

Reviewed by:	kib (previous version)
Discussed with:	emaste
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20260
2019-05-14 18:26:39 +00:00
Mark Johnston
b5155bc919 Remove redundant -Wl uses from the kernel's LDFLAGS.
No functional change intended.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-05-14 18:10:32 +00:00
Konstantin Belousov
7355a02bdd Mitigations for Microarchitectural Data Sampling.
Microarchitectural buffers on some Intel processors utilizing
speculative execution may allow a local process to obtain a memory
disclosure.  An attacker may be able to read secret data from the
kernel or from a process when executing untrusted code (for example,
in a web browser).

Reference: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00233.html
Security:	CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091
Security:	FreeBSD-SA-19:07.mds
Reviewed by:	jhb
Tested by:	emaste, lwhsu
Approved by:	so (gtetlow)
2019-05-14 17:02:20 +00:00
Guangyuan Yang
3a89c98bec Fix some spelling errors in ng_eiface(4).
MFC after:	3 days
PR:		237764
Submitted by:	Tom Marcoen <tom.marcoen@gmail.com>
2019-05-14 15:41:34 +00:00
Mark Johnston
0ac6ef663b Fix formatting.
MFC after:	3 days
2019-05-14 15:19:48 +00:00
Andrey V. Elsukov
2317067c31 Remove bpf interface lock; it no longer exists. 2019-05-14 10:21:28 +00:00
Conrad Meyer
e199792d23 Revert r346292 (permit_nonrandom_stackcookies)
We have a better, more comprehensive knob for this now:
kern.random.initial_seeding.bypass_before_seeding=1.

Requested by:	delphij
Sponsored by:	Dell EMC Isilon
2019-05-13 23:37:44 +00:00
Toomas Soome
b17868a211 loader: fix memory handling errors in module.c
file_loadraw():
check for file_alloc() and strdup() results.
we leak 'name'.

mod_load() does leak 'filename'.

mod_loadkld() does not need to check fp, file_discard() does check.
2019-05-13 22:17:11 +00:00
Andrey V. Elsukov
82d7bf6b1b Avoid possible recursion on BPF_LOCK() in bpfwrite().
Release BPF_LOCK() before invoking if_output() and if_input().
Also enter the epoch section before releasing the lock; this should prevent
access to an ifnet that may be freed on interface detach.

Reported by:	markj
2019-05-13 20:17:55 +00:00
Conrad Meyer
e8e1f0b420 Fortuna: Fix false negatives in is_random_seeded()
(1) We may have had sufficient entropy to consider Fortuna seeded, but the
random_fortuna_seeded() function would produce a false negative if
fs_counter was still zero.  This condition could arise after
random_harvestq_prime() processed the /boot/entropy file and before any
read-type operation invoked "pre_read()."  Fortuna's fs_counter variable is
only incremented (if certain conditions are met) by reseeding, which is
invoked by random_fortuna_pre_read().

is_random_seeded(9) was introduced in r346282, but the function was unused
prior to r346358, which introduced this regression.  The regression broke
initial seeding of arc4random(9) and broke periodic reseeding[A], until something
other than arc4random(9) invoked read_random(9) or read_random_uio(9) directly.
(Such as userspace getrandom(2) or read(2) of /dev/random.  By default,
/etc/rc.d/random does this during multiuser start-up.)

(2) The conditions under which Fortuna will reseed (including initial seeding)
are: (a) sufficient "entropy" (by sheer byte count; default 64) is collected
in the zeroth pool (of 32 pools), and (b) it has been at least 100ms since
the last reseed (to prevent trivial DoS; part of FS&K design).  Prior to
this revision, initial seeding might have been prevented if the reseed
function was invoked during the first 100ms of boot.

This revision addresses both of these issues.  If random_fortuna_seeded()
observes a zero fs_counter, it invokes random_fortuna_pre_read() and checks
again.  This addresses the problem where entropy actually was sufficient,
but nothing had attempted a read -> pre_read yet.

The second change is to disable the 100ms reseed guard when Fortuna has
never been seeded yet (fs_lasttime == 0).  The guard is intended to prevent
gratuitous subsequent reseeds, not initial seeding!

Machines running CURRENT between r346358 and this revision are encouraged to
refresh when possible.  Keys generated by userspace with /dev/random or
getrandom(2) during this timeframe are safe, but any long-term session keys
generated by kernel arc4random consumers are potentially suspect.

[A]: Broken in the sense that is_random_seeded(9) false negatives would cause
arc4random(9) to (re-)seed with weak entropy (SHA256(cyclecount ||
FreeBSD_version)).

PR:		237869
Reported by:	delphij, dim
Reviewed by:	delphij
Approved by:	secteam(delphij)
X-MFC-With:	r346282, r346358 (if ever)
Security:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20239
2019-05-13 19:35:35 +00:00
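Note on e8e1f0b420: the two fixes described above (retrying via pre_read when fs_counter is zero, and skipping the 100ms guard before the first seeding) can be modelled as below. This is a sketch under stated assumptions, not the Fortuna implementation: `fortuna_sketch`, `sketch_pre_read()` and `sketch_seeded()` are hypothetical names, a single byte counter stands in for pool 0, and timestamps are assumed to be nonzero milliseconds.

```c
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_MINPOOLSIZE	64	/* default byte threshold cited above */
#define	SKETCH_GUARD_MS		100	/* minimum spacing between reseeds */

/* Hypothetical model of the state the commit message mentions. */
struct fortuna_sketch {
	uint64_t fs_counter;	/* nonzero once seeded */
	uint64_t fs_lasttime;	/* ms timestamp of last reseed, 0 = never */
	uint32_t pool0_bytes;	/* bytes collected in the zeroth pool */
};

/* Reseed if pool 0 is full enough and the guard allows it. */
static void
sketch_pre_read(struct fortuna_sketch *fs, uint64_t now_ms)
{
	if (fs->pool0_bytes < SKETCH_MINPOOLSIZE)
		return;
	/* The guard only applies once a first seeding has happened. */
	if (fs->fs_lasttime != 0 &&
	    now_ms - fs->fs_lasttime < SKETCH_GUARD_MS)
		return;
	fs->fs_counter++;
	fs->fs_lasttime = now_ms;	/* assumes now_ms > 0 */
	fs->pool0_bytes = 0;
}

/* A zero counter triggers one pre_read attempt before answering. */
static bool
sketch_seeded(struct fortuna_sketch *fs, uint64_t now_ms)
{
	if (fs->fs_counter == 0)
		sketch_pre_read(fs, now_ms);
	return (fs->fs_counter != 0);
}
```

With this shape, a caller that asks "are we seeded?" 5ms into boot with a full pool gets a positive answer, while later reseeds are still rate-limited.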
Mark Johnston
aa0a893384 Add an UPDATING entry and bump __FreeBSD_version for r347532.
Reported by:	rgrimes, Oliver Pinter <oliver.pinter@hardenedbsd.org>
2019-05-13 18:48:08 +00:00
Mark Johnston
8cd6a80d7d Restore the pre-r347532 behaviour of ignoring wiring failures in mmap().
The error handling added in r347532 is not right when mapping vnodes
and will be fixed separately.

Reported by:	syzbot+1d2cc393bd6c88a548be@syzkaller.appspotmail.com
MFC with:	r347532
2019-05-13 18:40:01 +00:00
Dmitry Chagin
6e4cf32e95 Add warning to the Linuxulator makefiles that building it outside of a
kernel does not make sense.

PR:		222861
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20179
2019-05-13 18:28:40 +00:00
Dmitry Chagin
c5156c7785 Linuxulator depends on fundamental kernel settings such as SMP. Many
of them are listed in opt_global.h, which is not generated while building
modules outside of a kernel, so such modules never match the real configured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
Also remove all of the DEBUG printfs, as they are incomplete and not
informative in threaded programs; moreover, half of the system calls do not
have a DEBUG printf. For debugging Linux programs we have dtrace, ktr and ktrace.

PR:		222861
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20178
2019-05-13 18:24:29 +00:00
Dmitry Chagin
caaad8736e Linuxulator getpeername() returns EINVAL when namelen is less than 0.
MFC after:	2 weeks
2019-05-13 18:14:20 +00:00
Mark Johnston
adbb25df4b Extend the libcap_sysctl tests.
- Add some coverage for cap_sysctl(3).
- Add a test for the case where the caller wishes to find the sysctl
  output length without specifying an output buffer.

Reviewed by:	oshogbo
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17856
2019-05-13 17:53:03 +00:00
Mark Johnston
3c766430f7 Convert the libcap_sysctl test cases to ATF.
Reviewed by:	oshogbo
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17855
2019-05-13 17:51:03 +00:00
Mark Johnston
1608c46ea4 Add cap_sysctl(3) and cap_sysctlnametomib(3).
These complement cap_sysctlbyname(3) to provide a drop-in
replacement for the corresponding libc functions.

Also revise the libcap_sysctl limit interface to provide access
to sysctls by MIB, and to avoid direct manipulation of nvlists
by the caller.

Reviewed by:	oshogbo
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17854
2019-05-13 17:49:54 +00:00
Dmitry Chagin
d5368bf3df Our bsd_to_linux_sockaddr() and linux_to_bsd_sockaddr() functions
alter the userspace sockaddr to convert the format between Linux and BSD versions.
That's a minimum of 3 copyin/copyout operations for one syscall.

Also, some syscalls use linux_sa_put() and linux_getsockaddr() to copy a
sockaddr to or from userspace, respectively.

To avoid this chaos, especially converting sockaddr in userspace,
rewrite these 4 functions to convert sockaddr only in the kernel and leave
only 2 of these functions.

Also, in order to reduce duplication between the MD parts of the Linuxulator, move
the struct sockaddr conversion functions that are MI into the linux_common module.

PR:		232920
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20157
2019-05-13 17:48:16 +00:00
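Note on d5368bf3df: converting on a kernel-side copy, rather than rewriting the caller's buffer, might look roughly like this. The struct and function names are hypothetical layout-only models, not the Linuxulator's definitions; the sketch covers only the header (BSD's length byte plus one-byte family versus Linux's two-byte family) and two address families, whose numeric values are reproduced as local constants.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, layout-only models of the two sockaddr headers. */
struct l_sockaddr_sketch {
	uint16_t sa_family;
	char	 sa_data[14];
};

struct b_sockaddr_sketch {
	uint8_t	sa_len;
	uint8_t	sa_family;
	char	sa_data[14];
};

#define	L_AF_INET	2
#define	L_AF_INET6	10	/* Linux value */
#define	B_AF_INET	2
#define	B_AF_INET6	28	/* FreeBSD value */

/*
 * Fill a separate BSD-style header from a Linux-style one.  The source
 * is const: the caller's buffer is never modified.  len is the total
 * address length supplied by the caller.
 */
static int
sketch_linux_to_bsd(const struct l_sockaddr_sketch *lsa, uint8_t len,
    struct b_sockaddr_sketch *bsa)
{
	switch (lsa->sa_family) {
	case L_AF_INET:
		bsa->sa_family = B_AF_INET;
		break;
	case L_AF_INET6:
		bsa->sa_family = B_AF_INET6;
		break;
	default:
		return (-1);	/* family not handled by this sketch */
	}
	bsa->sa_len = len;
	memcpy(bsa->sa_data, lsa->sa_data, sizeof(bsa->sa_data));
	return (0);
}
```

The point of the shape is that a syscall needs one copyin (or one copyout) of the address, with the format change happening entirely on the kernel's copy.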
Mark Johnston
54a3a11421 Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes.  User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.

The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2).  Only user-wired pages are subject to the system-wide
limit which helps provide some safety against deadlocks.  In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
short of killing the wiring process.  The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.

The choice to count virtual user-wired pages rather than physical
pages was done for simplicity.  There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.

The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded.  For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.

Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM.  Users that wish to exceed the limit must tune
vm.max_user_wired.

Reviewed by:	kib, ngie (mlock() test changes)
Tested by:	pho (earlier version)
MFC after:	45 days
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19908
2019-05-13 16:38:48 +00:00
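Note on 54a3a11421: a separate, limit-checked counter for user wirings could be sketched as below. This is an illustration using C11 atomics, assuming a compare-and-swap loop; the function names and the limit of 1024 pages are hypothetical, and only the roles of v_user_wire_count and vm.max_user_wired come from the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_ulong user_wire_count;		/* models v_user_wire_count */
static unsigned long max_user_wired = 1024;	/* models vm.max_user_wired */

/* Reserve npages against the limit; false means the caller sees ENOMEM. */
static bool
sketch_user_wire_reserve(unsigned long npages)
{
	unsigned long old = atomic_load(&user_wire_count);

	do {
		/* Written to avoid overflow in old + npages. */
		if (npages > max_user_wired ||
		    old > max_user_wired - npages)
			return (false);
	} while (!atomic_compare_exchange_weak(&user_wire_count, &old,
	    old + npages));
	return (true);
}

/* Give back a reservation when the pages are unwired. */
static void
sketch_user_wire_release(unsigned long npages)
{
	atomic_fetch_sub(&user_wire_count, npages);
}
```

Kernel wirings never touch this counter, so a large ZFS ARC cannot by itself cause mlock(2) to fail.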
Andrey V. Elsukov
af1f58df99 Do not leak memory used for binary filter. 2019-05-13 14:07:02 +00:00
Andrey V. Elsukov
699281b545 Rework locking in BPF code to remove rwlock from fast path.
At high packet rates, contention on the rwlock in the bpf_*tap*() functions
can lead to dropped packets. To avoid this, migrate this code to use
the epoch(9) KPI and ConcurrencyKit's lists.

* all lists changed to use CK_LIST;
* reference counting added to bpf_if and bpf_d;
* now bpf_if references ifnet and releases this reference on destroy;
* each bpf_d descriptor references bpf_if when it is attached;
* new struct bpf_program_buffer introduced to keep BPF filter programs;
* bpf_program_buffer, bpf_d and bpf_if structures are freed by
  epoch_call();
* bpf_freelist and ifnet_departure event are no longer needed, thus
  both are removed;

Reviewed by:	melifaro
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D20224
2019-05-13 13:45:28 +00:00
Emmanuel Vadot
eb4c63f731 Revert r347356 and r347371
passwd related files need to be tagged as config file so pkg update
will attempt merging them when we install a new package.
We should use CONFS for that.
Revert for now until I come up with a better version of this patch as
it breaks pkgbase for users.
2019-05-13 12:38:33 +00:00
Andrey V. Elsukov
740d4c7c9f Revert r347402. After r347429 symlink is no longer needed. 2019-05-13 08:34:13 +00:00
Mark Johnston
11a5fc4fb9 Catch up with r347241.
MFC with:	r347241
2019-05-13 01:18:17 +00:00
Ruslan Bukin
b803d0b790 Add support for HiFive Unleashed -- the board with a multi-core RISC-V SoC
from SiFive, Inc.

The first core on this SoC (hart 0) is a 64-bit microcontroller.

o Pick a hart to run the boot process using a hart lottery.
  This allows excluding hart 0 from running the boot process.
  (BBL releases hart 0 after the main harts, so it never wins the lottery).
o Renumber CPUs early on boot.
  Exclude non-MMU cores. Store the original hart ID in struct pcpu. This
  makes it possible to find the correct destination for IPIs and remote
  sfence calls.

Thanks to SiFive, Inc for the board provided.

Reviewed by:	markj
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20225
2019-05-12 16:17:05 +00:00
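Note on b803d0b790: the hart lottery and the CPU renumbering described above can be sketched in portable C as follows. The real boot path is early machine-dependent code; this only models the idea. All names are hypothetical, and treating the lottery winner as CPU 0 is an assumption of the sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint hart_lottery;

/* Returns true for exactly one caller: the first hart to arrive. */
static bool
sketch_win_lottery(void)
{
	return (atomic_fetch_add(&hart_lottery, 1) == 0);
}

/*
 * Build a logical-CPU to hart-ID map, skipping harts without an MMU.
 * The original hart ID is kept so IPIs can be sent to the right target.
 * Returns the number of usable CPUs.
 */
static int
sketch_renumber(const bool has_mmu[], int nharts, int boot_hart,
    int cpu_to_hart[])
{
	int cpu = 0;

	cpu_to_hart[cpu++] = boot_hart;	/* assumed: winner becomes CPU 0 */
	for (int hart = 0; hart < nharts; hart++) {
		if (hart == boot_hart || !has_mmu[hart])
			continue;
		cpu_to_hart[cpu++] = hart;
	}
	return (cpu);
}
```

For a five-hart SoC where hart 0 has no MMU and hart 2 wins the lottery, the map comes out as CPU 0 = hart 2, then harts 1, 3 and 4.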
Emmanuel Vadot
da153f8e60 arm: allwinner: aw_clk_nm: Don't reparent the clock if we didn't ask
When looking for the best frequency don't change the clock parent if the
clock wasn't configured to do that.
2019-05-12 15:27:01 +00:00
Mateusz Guzik
8ba6c1391b cache: fix a brainfart in r347505
If bumping the counter goes over the limit we have to decrement it back.

Previous code would only bump the counter after adding the entry (thus allowing
the cache to go over the limit).

Sponsored by:	The FreeBSD Foundation
2019-05-12 07:56:01 +00:00
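Note on 8ba6c1391b: the "bump first, undo on overflow" pattern the message describes might be written like this. It is a generic sketch with C11 atomics; `numcache` follows the name used in the neighbouring commits, while `sketch_limit` and `sketch_cache_reserve()` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_long numcache;
static long sketch_limit = 2;	/* hypothetical cache size limit */

/* Bump first; undo the bump if it pushed the count past the limit. */
static bool
sketch_cache_reserve(void)
{
	long n = atomic_fetch_add(&numcache, 1) + 1;

	if (n > sketch_limit) {
		atomic_fetch_sub(&numcache, 1);
		return (false);
	}
	return (true);
}
```

Forgetting the decrement on the failure path would leave the counter permanently inflated, which is the kind of mistake the fix addresses.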
Mateusz Guzik
2425b5168c seqc: fix sed-introduced typos (seqcuence -> sequence)
Sponsored by:	The FreeBSD Foundation
2019-05-12 07:13:25 +00:00
Mateusz Guzik
b72515e129 amd64: tidy up pagezero*/pagecopy (movq -> movl)
Sponsored by:	The FreeBSD Foundation
2019-05-12 07:11:44 +00:00
Mateusz Guzik
5bf50787e6 cache: bump numcache on entry, while here fix lnumcache type
Sponsored by:	The FreeBSD Foundation
2019-05-12 06:59:22 +00:00
Mateusz Guzik
45372f1a6f amd64: fixup MEMMOVE comment (10 -> r10)
Sponsored by:	The FreeBSD Foundation
2019-05-12 06:42:17 +00:00
Mateusz Guzik
63ad3b65b0 cache: push sdt probes in cache_zap_locked to code doing the work
Avoids branching to check which probe to evaluate. The very same check was
being done later to do the actual work.

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:39:30 +00:00
Mateusz Guzik
a8c2fcb287 x86: store pending bitmapped IPIs in per-cpu areas
This gets rid of the global cpu_ipi_pending array.

While here, replace cmpset with fcmpset in the delivery code and opportunistically
check if the given IPI is already pending.

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:36:54 +00:00
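Note on a8c2fcb287: posting a bitmapped IPI with an fcmpset-style loop and an early "already pending" check could look roughly as follows. C11's compare-exchange refreshes the expected value on failure, which is the property fcmpset provides. The array stands in for per-CPU storage, and the names and the "interrupt only when nothing was pending" rule are assumptions of this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_MAXCPU	4

/* Stand-in for a per-CPU field; one pending word per CPU. */
static _Atomic uint32_t ipi_pending[SKETCH_MAXCPU];

/* Returns true if the caller must actually raise the interrupt. */
static bool
sketch_post_ipi(int cpu, unsigned int ipi)
{
	uint32_t bit = UINT32_C(1) << ipi;
	uint32_t old = atomic_load(&ipi_pending[cpu]);

	for (;;) {
		if (old & bit)
			return (false);	/* already pending, nothing to do */
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&ipi_pending[cpu], &old,
		    old | bit))
			break;
	}
	/* Assumed: an interrupt is only needed if nothing was pending. */
	return (old == 0);
}
```

The early return avoids writing to the target CPU's cacheline when the requested bit is already set.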
Mateusz Guzik
8eae2be460 amd64: stop re-reading curpc in suword
Plugs re-reads missed in r341719

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:34:58 +00:00
Mateusz Guzik
5e57adc874 random(4): depessimize arc4random
- __predict_false reseeding on entry as it is almost never true.
- don't blindly atomic_cmpset as on x86 it ends up dirtying the cacheline;
  it almost never succeeds, per above
- fetch the timestamp prior to getting the cpu number

Reviewed by:	cem
Approved by:	secteam (delphij)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20242
2019-05-12 06:32:46 +00:00
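Note on 5e57adc874: the first two bullets above amount to "read before you compare-and-swap, and tell the compiler the branch is unlikely". A sketch, assuming GCC or Clang for `__builtin_expect`; the flag, counter and function names are hypothetical and this is not the arc4random code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Local stand-in for the kernel's __predict_false() (GCC/Clang builtin). */
#define	SKETCH_PREDICT_FALSE(exp)	__builtin_expect((exp), 0)

static atomic_int reseed_requested;	/* hypothetical "please reseed" flag */
static int reseeds;			/* counts reseeds, for illustration */

/* Read first so the common "no reseed" case never writes the cacheline. */
static bool
sketch_claim_reseed(void)
{
	int expected = 1;

	if (atomic_load_explicit(&reseed_requested,
	    memory_order_relaxed) == 0)
		return (false);
	return (atomic_compare_exchange_strong(&reseed_requested,
	    &expected, 0));
}

static void
sketch_arc4_enter(void)
{
	if (SKETCH_PREDICT_FALSE(sketch_claim_reseed()))
		reseeds++;
}
```

Only the caller that wins the compare-and-swap performs the reseed; everyone else takes the read-only fast path.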
Rick Macklem
3e08dc749c Factor code into two new functions in preparation for a future commit.
Factor code into two functions.
read_exportfile() is a function which reads the exports file(s) and calls
get_exportlist_one() to process each of them.
delete_export() is a function which deletes the exports in the kernel for a file
system.
The contents of these functions are just the same code as was used to do the
operations, moved into separate functions. As such, there is no semantic change.
This is being done in preparation for a future commit that will add an
option to do incremental changes of kernel exports upon receiving SIGHUP.

MFC after:	1 month
2019-05-11 22:41:58 +00:00
Jens Schweikhardt
82455a3319 Correct a handful of typos. 2019-05-11 19:31:54 +00:00
Cy Schubert
706a3d9c65 Support the use of the ipsec kld.
X-MFC with:	r347410
2019-05-11 17:59:13 +00:00
Doug Moore
87ae0686a2 A new parameter to blist_alloc specifies an upper bound on the size of
the allocation request, so that the blocks allocated are from the next
set of free blocks big enough to satisfy the minimum requirements of
the request, and the number of blocks allocated is as many as
possible, up to the specified maximum. The implementation of
swp_pager_getswapspace uses this parameter to ask for a number of
blocks between the new halved request size and the previous failed
request size. Thus a request for 32 blocks may fail, but instead of
getting only 16 blocks, the caller asks for 16 to 31 next, and
might get 19 or 27, which is closer to what they originally wanted.

I expect this to lead to bigger block allocations and less block
fragmentation, at least in some cases.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20001
2019-05-11 16:15:13 +00:00
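Note on 87ae0686a2: the min/max request loop described above can be illustrated with a toy allocator. This does not reproduce blist(9) or swp_pager_getswapspace; `free_runs`, `toy_alloc()` and `toy_getswapspace()` are hypothetical, and the free-space map is just an array of run lengths.

```c
#include <stddef.h>

/* Toy free-space map: lengths of consecutive free runs (hypothetical). */
static int free_runs[] = { 3, 19, 8 };
#define	NRUNS	(sizeof(free_runs) / sizeof(free_runs[0]))

/*
 * Take blocks from the first run holding at least *count blocks, up to
 * maxcount.  On success, store the number taken in *count and return the
 * run index; return -1 if no run is big enough.
 */
static int
toy_alloc(int *count, int maxcount)
{
	for (size_t i = 0; i < NRUNS; i++) {
		if (free_runs[i] >= *count) {
			int take = free_runs[i] < maxcount ?
			    free_runs[i] : maxcount;

			free_runs[i] -= take;
			*count = take;
			return ((int)i);
		}
	}
	return (-1);
}

/* On failure, halve the minimum and cap the maximum just below it. */
static int
toy_getswapspace(int npages)
{
	int minpages = npages, maxpages = npages;

	while (minpages >= 1) {
		int got = minpages;

		if (toy_alloc(&got, maxpages) >= 0)
			return (got);
		maxpages = minpages - 1;
		minpages >>= 1;
	}
	return (0);
}
```

With the runs above, a request for 32 fails, the retry asks for 16 to 31, and the caller receives 19 blocks rather than only 16.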
Justin Hibbits
2f420a7c7f revert r346588 for now
The rewrite of strcmp in assembly uses an instruction added in PowerISA
2.05, making it SIGILL on CPUs older than the POWER6, such as the PPC970 in
the PowerMac G5.  Revert this until we get clang+lld, or retire the in-tree
binutils in favor of newer binutils with IFUNC support, whichever comes
first.
2019-05-11 15:17:42 +00:00
Emmanuel Vadot
f78a4afd30 twsi: Calculate the clock param based on the bus frequency
Instead of precalculating the different speeds, respect the bus frequency
and calculate the clock register parameter based on it.
If the platform didn't register the core clk, fall back on the precomputed
values (this is likely to be the case on Marvell boards).
2019-05-11 15:03:51 +00:00
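Note on f78a4afd30: deriving a clock parameter from the bus frequency is typically a small search over divider settings. The sketch below assumes a divider model of freq = clk / (10 * (M + 1) * 2^(N + 1)) with M in 0..15 and N in 0..7; that model, the ranges and all names are assumptions for illustration and should be checked against the driver and the controller documentation.

```c
#include <stdint.h>

/* Assumed divider model: freq = clk / (10 * (M + 1) * 2^(N + 1)). */
static uint32_t
sketch_twsi_rate(uint32_t clk, unsigned int m, unsigned int n)
{
	return (clk / ((10 * (m + 1)) << (n + 1)));
}

/*
 * Search M (0..15) and N (0..7) for the rate closest to target.
 * Stores the chosen pair and returns the rate it yields.
 */
static uint32_t
sketch_twsi_param(uint32_t clk, uint32_t target, unsigned int *mp,
    unsigned int *np)
{
	uint32_t best_diff = UINT32_MAX, best_rate = 0;

	for (unsigned int n = 0; n < 8; n++) {
		for (unsigned int m = 0; m < 16; m++) {
			uint32_t rate = sketch_twsi_rate(clk, m, n);
			uint32_t diff = rate > target ?
			    rate - target : target - rate;

			/* Strict "<" keeps the first best match found. */
			if (diff < best_diff) {
				best_diff = diff;
				best_rate = rate;
				*mp = m;
				*np = n;
			}
		}
	}
	return (best_rate);
}
```

Under this model, a 24 MHz core clock and a 100 kHz bus target resolve to M = 11, N = 0, which divides the clock by exactly 240.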
Emmanuel Vadot
e69181cfc6 allwinner: clk: sun8i_r: Correct resets
The i2c reset wasn't defined and some bits were wrong; correct them.
2019-05-11 15:02:55 +00:00