127130 Commits

Author SHA1 Message Date
Andrey V. Elsukov
2317067c31 Remove bpf interface lock, it is no longer exist. 2019-05-14 10:21:28 +00:00
Conrad Meyer
e199792d23 Revert r346292 (permit_nonrandom_stackcookies)
We have a better, more comprehensive knob for this now:
kern.random.initial_seeding.bypass_before_seeding=1.

Requested by:	delphij
Sponsored by:	Dell EMC Isilon
2019-05-13 23:37:44 +00:00
Andrey V. Elsukov
82d7bf6b1b Avoid possible recursion on BPF_LOCK() in bpfwrite().
Release BPF_LOCK() before invoking if_output() and if_input().
Also enter epoch section before releasing lock, this should prevent
access to ifnet that may be freed on interface detach.

Reported by:	markj
2019-05-13 20:17:55 +00:00
Conrad Meyer
e8e1f0b420 Fortuna: Fix false negatives in is_random_seeded()
(1) We may have had sufficient entropy to consider Fortuna seeded, but the
random_fortuna_seeded() function would produce a false negative if
fs_counter was still zero.  This condition could arise after
random_harvestq_prime() processed the /boot/entropy file and before any
read-type operation invoked "pre_read()."  Fortuna's fs_counter variable is
only incremented (if certain conditions are met) by reseeding, which is
invoked by random_fortuna_pre_read().

is_random_seeded(9) was introduced in r346282, but the function was unused
prior to r346358, which introduced this regression.  The regression broke
initial seeding of arc4random(9) and broke periodic reseeding[A], until something
other than arc4random(9) invoked read_random(9) or read_random_uio(9) directly.
(Such as userspace getrandom(2) or read(2) of /dev/random.  By default,
/etc/rc.d/random does this during multiuser start-up.)

(2) The conditions under which Fortuna will reseed (including initial seeding)
are: (a) sufficient "entropy" (by sheer byte count; default 64) is collected
in the zeroth pool (of 32 pools), and (b) it has been at least 100ms since
the last reseed (to prevent trivial DoS; part of FS&K design).  Prior to
this revision, initial seeding might have been prevented if the reseed
function was invoked during the first 100ms of boot.

This revision addresses both of these issues.  If random_fortuna_seeded()
observes a zero fs_counter, it invokes random_fortuna_pre_read() and checks
again.  This addresses the problem where entropy actually was sufficient,
but nothing had attempted a read -> pre_read yet.

The second change is to disable the 100ms reseed guard when Fortuna has
never been seeded yet (fs_lasttime == 0).  The guard is intended to prevent
gratuitous subsequent reseeds, not initial seeding!

Machines running CURRENT between r346358 and this revision are encouraged to
refresh when possible.  Keys generated by userspace with /dev/random or
getrandom(9) during this timeframe are safe, but any long-term session keys
generated by kernel arc4random consumers are potentially suspect.

[A]: Broken in the sense that is_random_seeded(9) false negatives would cause
arc4random(9) to (re-)seed with weak entropy (SHA256(cyclecount ||
FreeBSD_version)).

PR:		237869
Reported by:	delphij, dim
Reviewed by:	delphij
Approved by:	secteam(delphij)
X-MFC-With:	r346282, r346358 (if ever)
Security:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20239
2019-05-13 19:35:35 +00:00
Mark Johnston
aa0a893384 Add an UPDATING entry and bump __FreeBSD_version for r347532.
Reported by:	rgrimes, Oliver Pinter <oliver.pinter@hardenedbsd.org>
2019-05-13 18:48:08 +00:00
Mark Johnston
8cd6a80d7d Restore the pre-r347532 behaviour of ignoring wiring failures in mmap().
The error handling added in r347532 is not right when mapping vnodes
and will be fixed separately.

Reported by:	syzbot+1d2cc393bd6c88a548be@syzkaller.appspotmail.com
MFC with:	r347532
2019-05-13 18:40:01 +00:00
Dmitry Chagin
6e4cf32e95 Add warning to the Linuxulator makefiles that building it outside of a
kernel does not make sence.

PR:		222861
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20179
2019-05-13 18:28:40 +00:00
Dmitry Chagin
c5156c7785 Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR:		222861
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20178
2019-05-13 18:24:29 +00:00
Dmitry Chagin
caaad8736e Linuxulator getpeername() returns EINVAL in case then namelen less then 0.
MFC after:	2 weeks
2019-05-13 18:14:20 +00:00
Dmitry Chagin
d5368bf3df Our bsd_to_linux_sockaddr() and linux_to_bsd_sockaddr() functions
alter the userspace sockaddr to convert the format between linux and BSD versions.
That's the minimum 3 of copyin/copyout operations for one syscall.

Also some syscall uses linux_sa_put() and linux_getsockaddr() when load
sockaddr to userspace or from userspace accordingly.

To avoid this chaos, especially converting sockaddr in the userspace,
rewrite these 4 functions to convert sockaddr only in kernel and leave
only 2 of this functions.

Also in order to reduce duplication between MD parts of the Linuxulator put
struct sockaddr conversion functions that are MI out into linux_common module.

PR:		232920
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20157
2019-05-13 17:48:16 +00:00
Mark Johnston
54a3a11421 Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes.  User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.

The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2).  Only user-wired pages are subject to the system-wide
limit which helps provide some safety against deadlocks.  In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
shorting of killing the wiring process.  The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.

The choice to count virtual user-wired pages rather than physical
pages was done for simplicity.  There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.

The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded.  For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.

Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM.  Users that wish to exceed the limit must tune
vm.max_user_wired.

Reviewed by:	kib, ngie (mlock() test changes)
Tested by:	pho (earlier version)
MFC after:	45 days
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19908
2019-05-13 16:38:48 +00:00
Andrey V. Elsukov
af1f58df99 Do not leak memory used for binary filter. 2019-05-13 14:07:02 +00:00
Andrey V. Elsukov
699281b545 Rework locking in BPF code to remove rwlock from fast path.
On high packets rate the contention on rwlock in bpf_*tap*() functions
can lead to packets dropping. To avoid this, migrate this code to use
epoch(9) KPI and ConcurrencyKit's lists.

* all lists changed to use CK_LIST;
* reference counting added to bpf_if and bpf_d;
* now bpf_if references ifnet and releases this reference on destroy;
* each bpf_d descriptor references bpf_if when it is attached;
* new struct bpf_program_buffer introduced to keep BPF filter programs;
* bpf_program_buffer, bpf_d and bpf_if structures are freed by
  epoch_call();
* bpf_freelist and ifnet_departure event are no longer needed, thus
  both are removed;

Reviewed by:	melifaro
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D20224
2019-05-13 13:45:28 +00:00
Andrey V. Elsukov
740d4c7c9f Revert r347402. After r347429 symlink is no longer needed. 2019-05-13 08:34:13 +00:00
Mark Johnston
11a5fc4fb9 Catch up with r347241.
MFC with:	r347241
2019-05-13 01:18:17 +00:00
Ruslan Bukin
b803d0b790 Add support for HiFive Unleashed -- the board with a multi-core RISC-V SoC
from SiFive, Inc.

The first core on this SoC (hart 0) is a 64-bit microcontroller.

o Pick a hart to run boot process using hart lottery.
  This allows to exclude hart 0 from running the boot process.
  (BBL releases hart 0 after the main harts, so it never wins the lottery).
o Renumber CPUs early on boot.
  Exclude non-MMU cores. Store the original hart ID in struct pcpu. This
  allows to find out the correct destination for IPIs and remote sfence
  calls.

Thanks to SiFive, Inc for the board provided.

Reviewed by:	markj
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20225
2019-05-12 16:17:05 +00:00
Emmanuel Vadot
da153f8e60 arm: allwinner: aw_clk_nm: Don't reparent the clock if we didn't ask
When looking for the best frequency don't change the clock parent if the
clock wasn't configured to do that.
2019-05-12 15:27:01 +00:00
Mateusz Guzik
8ba6c1391b cache: fix a brainfart in r347505
If bumping over the counter goes over the limit we have to decrement it back.

Previous code would only bump the counter after adding the entry (thus allowing
the cache to go over the limit).

Sponsored by:	The FreeBSD Foundation
2019-05-12 07:56:01 +00:00
Mateusz Guzik
2425b5168c seqc: fix sed-introduced typos (seqcuence -> sequence)
Sponsored by:	The FreeBSD Foundation
2019-05-12 07:13:25 +00:00
Mateusz Guzik
b72515e129 amd64: tidy up pagezero*/pagecopy (movq -> movl)
Sponsored by:	The FreeBSD Foundation
2019-05-12 07:11:44 +00:00
Mateusz Guzik
5bf50787e6 cache: bump numcache on entry, while here fix lnumcache type
Sponsored by:	The FreeBSD Foundation
2019-05-12 06:59:22 +00:00
Mateusz Guzik
45372f1a6f amd64: fixup MEMMOVE comment (10 -> r10)
Sponsored by:	The FreeBSD Foundation
2019-05-12 06:42:17 +00:00
Mateusz Guzik
63ad3b65b0 cache: push sdt probes in cache_zap_locked to code doing the work
Avoids branching to check which probe to evaluate. Very same check was
being done later to do the actual work.

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:39:30 +00:00
Mateusz Guzik
a8c2fcb287 x86: store pending bitmapped IPIs in per-cpu areas
This gets rid of the global cpu_ipi_pending array.

While replace cmpset with fcmpset in the delivery code and opportunistically
check if given IPI is already pending.

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:36:54 +00:00
Mateusz Guzik
8eae2be460 amd64: stop re-reading curpc in suword
Plugs re-reads missed in r341719

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:34:58 +00:00
Mateusz Guzik
5e57adc874 random(4): depessimize arc4random
- __predict_false reseeding on entry as it is almost never true.
- don't blindly atomic_cmpset as on x86 it ends up dirtying the cacheline.
it almost ever succeeds per above
- fetch the timestamp prior to getting the cpu number

Reviewed by:	cem
Approved by:	secteam (delphij)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20242
2019-05-12 06:32:46 +00:00
Cy Schubert
706a3d9c65 Support the use of the ipsec kld.
X-MFC with:	r347410
2019-05-11 17:59:13 +00:00
Doug Moore
87ae0686a2 A new parameter to blist_alloc specifies an upper bound on the size of
the allocation request, so that the blocks allocated are from the next
set of free blocks big enough to satisfy the minimum requirements of
the request, and the number of blocks allocated are as many as
possible, up to the specified maximum. The implementation of
swp_pager_getswapspace uses this parameter to ask for a number of
blocks between the new halved request size and the previous failed
request size. Thus a request for 32 blocks may fail, but instead of
getting only 16 blocks instead, the caller asks for 16 to 31 next, and
might get 19 or 27, which is closer to what they originally wanted.

I expect this to lead to bigger block allocations and less block
fragmentation, at least in some cases.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20001
2019-05-11 16:15:13 +00:00
Emmanuel Vadot
f78a4afd30 twsi: Calculate the clock param based on the bus frequency
Instead of precalculating the different speed, respect the bus frequency
and calculate the clock register parameter based on it.
If the platform didn't register the core clk, fallback on the precomputed
values (This is likely do be the case on Marvell boards).
2019-05-11 15:03:51 +00:00
Emmanuel Vadot
e69181cfc6 allwinner: clk: sun8i_r: Correct resets
The i2c reset wasn't defined and some bits where wrong, correct them.
2019-05-11 15:02:55 +00:00
Emmanuel Vadot
45f64a5956 allwinner: clk: prediv_mux: Init the current parent
Do not init the first parent but read the clock register to find
it's current parent and init this one.
2019-05-11 15:02:20 +00:00
Doug Moore
48e98a2afc Callers of swp_pager_getswapspace get either as many blocks as they
requested, or none, and in the latter case it is up to them to pick a
smaller request to make - which they always do by halving the failed
request. This change to swp_pager_getswapspace leaves the task of
downsizing the request to the function and not its caller. It still
does so by halving the original request.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20228
2019-05-11 10:16:43 +00:00
Doug Moore
535192530c When bitpos can't be implemented with an inline ffs* instruction,
change the binary search so that it does not depend on a single bit
only being set in the bitmask. Use bitpos more generally, and avoid
some clearing of bits to accommodate its current behavior.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20237
2019-05-11 09:09:10 +00:00
Kyle Evans
81b3b91e6b tuntap: Improve style
No functional change.

tun_flags of the tuntap_driver was renamed to ident_flags to reflect the
fact that it's a subset of the tun_flags that identifies a tuntap device.
This maps more easily (visually) to the TUN_DRIVER_IDENT_MASK that masks off
the bits of tun_flags that are applicable to tuntap driver ident. This is a
purely cosmetic change.
2019-05-11 04:18:06 +00:00
Doug Moore
0cb36fc9c2 Revert r347469.
Approved by: kib (mentor)
2019-05-11 02:13:52 +00:00
Conrad Meyer
64e7d18f34 netdump: Ref the interface we're attached to
Serialize netdump configuration / deconfiguration, and discard our
configuration when the affiliated interface goes away by monitoring
ifnet_departure_event.

Reviewed by:	markj, with input from vangyzen@ (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20206
2019-05-10 23:12:59 +00:00
Doug Moore
12cd7ded68 Don't use _Generic, as many systems don't know about it. Go back to a lo-tech switch statement.
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20235
2019-05-10 23:12:37 +00:00
Conrad Meyer
070e7bf95e netdump: Fix boot-time configuration typo
Boot-time netdump configuration is much more useful if one can configure the
client and gateway addresses.  Fix trivial typo.

(Long-standing bug, I believe it dates to the original netdump commit.)

Spotted by:	one of vangyzen@ or markj@
Sponsored by:	Dell EMC Isilon
2019-05-10 23:10:22 +00:00
Johannes Lundberg
5098ed5f3b Implement linux_pci_unregister_drm_driver in linuxkpi so that drm drivers
can be unloaded.

This patch is a part of D19565.

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
2019-05-10 23:10:22 +00:00
Doug Moore
4ab18ea23a When bitpos can't be implemented with an inline ffs* instruction,
change the binary search so that it does not depend on a single bit
only being set in the bitmask. Use bitpos more generally, and avoid
some clearing of bits to accommodate its current behavior.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20232
2019-05-10 22:49:01 +00:00
Doug Moore
d4808c4403 Add a (q)uit option to the subr_blist test program.
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20234
2019-05-10 22:02:29 +00:00
Conrad Meyer
6144b50f8b netdump: Don't store sensitive key data we don't need
Prior to this revision, struct diocskerneldump_arg (and struct netdump_conf
with embedded diocskerneldump_arg before r347192), were copied in their
entirety to the global 'nd_conf' variable.  Also prior to this revision,
de-configuring netdump would *not* remove the the key material from global
nd_conf.

As part of Encrypted Kernel Crash Dumps (EKCD), which was developed
contemporaneously with netdump but happened to land first, the
diocskerneldump_arg structure will contain sensitive key material
(kda_key[]) when encrypted dumps are configured.

Netdump doesn't have any use for the key data -- encryption is handled in
the core dumper code -- so in this revision, we no longer store it.

Unfortunately, I think this leak dates to the initial import of netdump in
r333283; so it's present in FreeBSD 12.0.

Fortunately, the impact *seems* relatively minor.  Any new *netdump*
configuration would overwrite the key material; for active encrypted netdump
configurations, the key data stored was just a duplicate of the key material
already in the core dumper code; and no user interface (other than
/dev/kmem) actually exposed the leaked material to userspace.

Reviewed by:	markj, rpokala (earlier commit message)
MFC after:	2 weeks
Security:	yes (minor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20233
2019-05-10 21:55:11 +00:00
Gleb Smirnoff
54bb7ac0c4 Fix regression from r347375: do not panic when sending an IP multicast
packet from an interface that doesn't have IPv4 address.

Reported by:	Michael Butler <imb protected-networks.net>
2019-05-10 21:51:17 +00:00
John Baldwin
c9d337083f Apply r280991 to ip6_fragment.
This uses m_dup_pkthdr() to copy all of the metadata about a packet to
each of its fragments including VLAN tags, mbuf tags, etc. instead of
hand-copying a few fields.

Reviewed by:	bz
MFC after:	1 month
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20117
2019-05-10 20:15:40 +00:00
Doug Moore
09b380a1ff Replace the expression "-mask & ~mask" with a function call that does
the same thing, but is commented so that it might be better
understood.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20231
2019-05-10 19:55:29 +00:00
Justin Hibbits
f04019c3c6 powerpc: Initialize the Hardware Interrupt Offset Register (HIOR) earlier for ppc970
Since we now have a much larger KVA on powerpc64, it's possible to get SLB
traps earlier in boot, possibly even before the HIOR is properly configured
for us.  Move the HIOR setup to immediately after reset, so that we use our
exception handlers instead of Open Firmware's.

PR:		233863
Submitted by:	Mark Millard (partial)
Reported by:	Mark Millard
MFC after:	2 weeks
2019-05-10 19:36:14 +00:00
Doug Moore
d1c34a6b76 blist_next_leaf_alloc walks over all the meta-nodes between one leaf
and the next one, and if blocks are allocated from the next leaf, it
walks back toward where it started, as long as there are interleaving
meta-nodes to be updated on account of the last free blocks under
those meta-nodes being allocated. Only if the walk goes all the way
back to the starting point must we calculate the position of the
meta-node that is the least-comment parent of one leaf and the next,
and update a bit in that meta-node to indicate the allocation of its
last free block.

There's no need to start calculating the position of that least-common
parent until the walk back reaches the original starting point, and
there's no need for a calculation that updates 'radius' to tell us
when we've walked back to the beginning, since comparing scan to next
suffices for that.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20229
2019-05-10 18:25:06 +00:00
Doug Moore
b1f59c92d8 Replace panic() with KASSERT() and provide more useful information when failure happens.
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20226
2019-05-10 18:22:40 +00:00
Bryan Drewery
29317e6a0e Fix build race with machine links and genoffset.o.
Generate the ilinks for all dependency objects not just the ones
in the CLEAN list.

Possibly related to r345351

Reported by:	kmoore
MFC after:	2 weeks
X-MFC-with:	r345351
Sponsored by:	Dell EMC Isilon
2019-05-10 18:09:27 +00:00
Emmanuel Vadot
595d5338fa arm64: rockchip: Don't always put PLL to normal mode
We used to put every PLL in normal mode (meaning that the output would
be the result of the PLL configuration) instead of slow mode (the output
is equal to the external oscillator frequency, 24-26Mhz) but this doesn't
work for most of the PLLs as when we put them into normal mode the registers
configuring the output frequency haven't been set.
Add a normal_mode member in clk_pll_def/clk_pll_sc struct and if it's true
we then set the PLL to normal mode.
For now only set it to the LPLL and BPLL (Little cluster PLL and Big cluster
PLL respectively).

Reviewed by:	ganbold
Differential Revision:	https://reviews.freebsd.org/D20174
2019-05-10 16:45:17 +00:00