In mountd.c, the grouplist structures are linked into a single global
linked list headed by "grphead". The only use of this linked list is
to free all list elements when the exportlist elements are also all being
free'd at the time the exports are being reloaded.
This patch replaces this one global linked list head with a list head in
each exportlist structure, where the grouplist elements for that exported
file system are linked.
The only change is that now the grouplist elements are free'd with the
associated exportlist element as they are free'd instead of all grouplist
elements being free'd after the exportlist elements are free'd. This
change should have no effect in practice.
This is being done, since a future patch that will add a "-I" option for
incrementally updating the exports in the kernel needs to know which
grouplist elements are associated with each exported file system and
having them linked into a list headed by the exportlist element does that.
MFC after: 1 month
r333175 converted the global multicast lock to a sleepable sx lock,
so the lock order with respect to the (non-sleepable) inp lock changed.
To handle this, r333175 and r333505 added code to drop the inp lock,
but this opened races that could leave multicast group description
structures in an inconsistent state. This change fixes the problem by
simply acquiring the global lock sooner. Along the way, this fixes
some LORs and bogus error handling introduced in r333175, and commits
some related cleanup.
Reported by: syzbot+ba7c4943547e0604faca@syzkaller.appspotmail.com
Reported by: syzbot+1b803796ab94d11a46f9@syzkaller.appspotmail.com
Reviewed by: ae
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20070
of the 'flags' argument to mmap(2) with Linux strace(1).
Reviewed by: dchagin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20223
r346670 added an sx to close a race between the ifioctl handler and
interface destruction. Unfortunately, it clears if_softc immediately after
the interface is closed, but before if_detach has been invoked.
Any time before detachment, an interface that's part of a bridge may still
receive traffic that's pushed through tunstart/tunstart_l2 and promptly
lead to a panic because if_softc is now NULL.
Fix it by deferring the clearing of if_softc until after the interface has
detached and thus been removed from the bridge. if_softc still gets cleared
in case another thread has already entered the ioctl handler before it's
replaced with ifdead_ioctl.
Reported by: markj
MFC after: 3 days
The upstream implementation of -z ifunc-noplt disallows its combination
with -z text. The option does not have much significance for kernel
builds, though.
Reviewed by: kib (previous version)
Discussed with: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20260
Microarchitectural buffers on some Intel processors utilizing
speculative execution may allow a local process to obtain a memory
disclosure. An attacker may be able to read secret data from the
kernel or from a process when executing untrusted code (for example,
in a web browser).
Reference: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00233.html
Security: CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091
Security: FreeBSD-SA-19:07.mds
Reviewed by: jhb
Tested by: emaste, lwhsu
Approved by: so (gtetlow)
We have a better, more comprehensive knob for this now:
kern.random.initial_seeding.bypass_before_seeding=1.
Requested by: delphij
Sponsored by: Dell EMC Isilon
file_loadraw():
check for file_alloc() and strdup() results.
we leak 'name'.
mod_load() does leak 'filename'.
mod_loadkld() does not need to check fp, file_discard() does check.
Release BPF_LOCK() before invoking if_output() and if_input().
Also enter epoch section before releasing lock, this should prevent
access to ifnet that may be freed on interface detach.
Reported by: markj
(1) We may have had sufficient entropy to consider Fortuna seeded, but the
random_fortuna_seeded() function would produce a false negative if
fs_counter was still zero. This condition could arise after
random_harvestq_prime() processed the /boot/entropy file and before any
read-type operation invoked "pre_read()." Fortuna's fs_counter variable is
only incremented (if certain conditions are met) by reseeding, which is
invoked by random_fortuna_pre_read().
is_random_seeded(9) was introduced in r346282, but the function was unused
prior to r346358, which introduced this regression. The regression broke
initial seeding of arc4random(9) and broke periodic reseeding[A], until something
other than arc4random(9) invoked read_random(9) or read_random_uio(9) directly.
(Such as userspace getrandom(2) or read(2) of /dev/random. By default,
/etc/rc.d/random does this during multiuser start-up.)
(2) The conditions under which Fortuna will reseed (including initial seeding)
are: (a) sufficient "entropy" (by sheer byte count; default 64) is collected
in the zeroth pool (of 32 pools), and (b) it has been at least 100ms since
the last reseed (to prevent trivial DoS; part of FS&K design). Prior to
this revision, initial seeding might have been prevented if the reseed
function was invoked during the first 100ms of boot.
This revision addresses both of these issues. If random_fortuna_seeded()
observes a zero fs_counter, it invokes random_fortuna_pre_read() and checks
again. This addresses the problem where entropy actually was sufficient,
but nothing had attempted a read -> pre_read yet.
The second change is to disable the 100ms reseed guard when Fortuna has
never been seeded yet (fs_lasttime == 0). The guard is intended to prevent
gratuitous subsequent reseeds, not initial seeding!
Machines running CURRENT between r346358 and this revision are encouraged to
refresh when possible. Keys generated by userspace with /dev/random or
getrandom(9) during this timeframe are safe, but any long-term session keys
generated by kernel arc4random consumers are potentially suspect.
[A]: Broken in the sense that is_random_seeded(9) false negatives would cause
arc4random(9) to (re-)seed with weak entropy (SHA256(cyclecount ||
FreeBSD_version)).
PR: 237869
Reported by: delphij, dim
Reviewed by: delphij
Approved by: secteam(delphij)
X-MFC-With: r346282, r346358 (if ever)
Security: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20239
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.
So, we should prevent our users from building obviously defective modules.
Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.
PR: 222861
Reviewed by: trasz
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20178
- Add some coverage for cap_sysctl(3).
- Add a test for the case where the caller wishes to find the sysctl
output length without specifying an output buffer.
Reviewed by: oshogbo
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17856
These complement cap_sysctlbyname(3) to provide a drop-in
replacement for the corresponding libc functions.
Also revise the libcap_sysctl limit interface to provide access
to sysctls by MIB, and to avoid direct manipulation of nvlists
by the caller.
Reviewed by: oshogbo
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17854
alter the userspace sockaddr to convert the format between linux and BSD versions.
That's the minimum 3 of copyin/copyout operations for one syscall.
Also some syscall uses linux_sa_put() and linux_getsockaddr() when load
sockaddr to userspace or from userspace accordingly.
To avoid this chaos, especially converting sockaddr in the userspace,
rewrite these 4 functions to convert sockaddr only in kernel and leave
only 2 of this functions.
Also in order to reduce duplication between MD parts of the Linuxulator put
struct sockaddr conversion functions that are MI out into linux_common module.
PR: 232920
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20157
Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes. User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.
The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2). Only user-wired pages are subject to the system-wide
limit which helps provide some safety against deadlocks. In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
shorting of killing the wiring process. The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.
The choice to count virtual user-wired pages rather than physical
pages was done for simplicity. There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.
The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded. For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.
Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM. Users that wish to exceed the limit must tune
vm.max_user_wired.
Reviewed by: kib, ngie (mlock() test changes)
Tested by: pho (earlier version)
MFC after: 45 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19908
On high packets rate the contention on rwlock in bpf_*tap*() functions
can lead to packets dropping. To avoid this, migrate this code to use
epoch(9) KPI and ConcurrencyKit's lists.
* all lists changed to use CK_LIST;
* reference counting added to bpf_if and bpf_d;
* now bpf_if references ifnet and releases this reference on destroy;
* each bpf_d descriptor references bpf_if when it is attached;
* new struct bpf_program_buffer introduced to keep BPF filter programs;
* bpf_program_buffer, bpf_d and bpf_if structures are freed by
epoch_call();
* bpf_freelist and ifnet_departure event are no longer needed, thus
both are removed;
Reviewed by: melifaro
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D20224
passwd related files need to be tagged as config file so pkg update
will attempt merging them when we install a new package.
We should use CONFS for that.
Revert for now until I come up with a better version of this patch as
it breaks pkgbase for users.
from SiFive, Inc.
The first core on this SoC (hart 0) is a 64-bit microcontroller.
o Pick a hart to run boot process using hart lottery.
This allows to exclude hart 0 from running the boot process.
(BBL releases hart 0 after the main harts, so it never wins the lottery).
o Renumber CPUs early on boot.
Exclude non-MMU cores. Store the original hart ID in struct pcpu. This
allows to find out the correct destination for IPIs and remote sfence
calls.
Thanks to SiFive, Inc for the board provided.
Reviewed by: markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D20225
If bumping over the counter goes over the limit we have to decrement it back.
Previous code would only bump the counter after adding the entry (thus allowing
the cache to go over the limit).
Sponsored by: The FreeBSD Foundation
This gets rid of the global cpu_ipi_pending array.
While replace cmpset with fcmpset in the delivery code and opportunistically
check if given IPI is already pending.
Sponsored by: The FreeBSD Foundation
- __predict_false reseeding on entry as it is almost never true.
- don't blindly atomic_cmpset as on x86 it ends up dirtying the cacheline.
it almost ever succeeds per above
- fetch the timestamp prior to getting the cpu number
Reviewed by: cem
Approved by: secteam (delphij)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20242
Factor code into two functions.
read_exportfile() a functon which reads the exports file(s) and calls
get_exportlist_one() to process each of them.
delete_export() a function which deletes the exports in the kernel for a file
system.
The contents of these functions is just the same code as was used to do the
operations, moved into separate functions. As such, there is no semantic change.
This is being done in preparation for a future commit that will add an
option to do incremental changes of kernel exports upon receiving SIGHUP.
MFC after: 1 month
the allocation request, so that the blocks allocated are from the next
set of free blocks big enough to satisfy the minimum requirements of
the request, and the number of blocks allocated are as many as
possible, up to the specified maximum. The implementation of
swp_pager_getswapspace uses this parameter to ask for a number of
blocks between the new halved request size and the previous failed
request size. Thus a request for 32 blocks may fail, but instead of
getting only 16 blocks instead, the caller asks for 16 to 31 next, and
might get 19 or 27, which is closer to what they originally wanted.
I expect this to lead to bigger block allocations and less block
fragmentation, at least in some cases.
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20001
The rewrite of strcmp in assembly uses an instruction added in PowerISA
2.05, making it SIGILL on CPUs older than the POWER6, such as the PPC970 in
the PowerMac G5. Revert this until we get clang+lld, or retire the in-tree
binutils in favor of newer binutils with IFUNC support, whichever comes
first.
Instead of precalculating the different speed, respect the bus frequency
and calculate the clock register parameter based on it.
If the platform didn't register the core clk, fallback on the precomputed
values (This is likely do be the case on Marvell boards).