The miscellaneous x86 sysent->sv_setregs() implementations tried to
migrate PSL_T from the previous program to the new executed one, but
they evaluated regs->tf_eflags after the whole regs structure was
bzeroed. Make this functional by saving PSL_T value before zeroing.
Note that if the debugger is not attached, executing the first
instruction in the new program with PSL_T set results in SIGTRAP, and
since all intercepted signals are reset to default dispostion on
exec(2), this means that non-debugged process gets killed immediately
if PSL_T is inherited. In particular, since suid images drop
P_TRACED, attempt to set PSL_T for execution of such program would
kill the process.
Another issue with userspace PSL_T handling is that it is reset by
trap(). It is reasonable to clear PSL_T when entering SIGTRAP
handler, to allow the signal to be handled without recursion or
delivery of blocked fault. But it is not reasonable to return back to
the normal flow with PSL_T cleared. This is too late to change, I
think.
Discussed with: bde, Ali Mashtizadeh
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D14995
This was inadvertently overriding the first found SYSDIR with the last
of /usr/src which could result in the wrong headers being used if not
building from /usr/src.
SYSDIR?= is not used here to avoid evaluating the exists() when unneeded.
Reported by: rgrimes, sjg, Mark Millard
Pointyhat to: bdrewery
Sponsored by: Dell EMC
"Terminus BSD Console" is a derivative of Terminus that is provided
by Mr. Dimitar Zhekov under the 2-clause BSD license for use by the
FreeBSD vt(4) console and other BSDs.
PR: 227409
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
In pti-enabled pmap, the PCID allocation scheme assigns temporal id
for the kernel page table, and user page table twin PCID is
calculating by setting high bit in the kernel PCID. So the kernel AS
is mapped with per-vmspace PCID, and we must completely shut down all
mappings in KVA when switching contexts, so that newly switched thread
would see all changes in KVA occured while it was not executing.
After all, KVA is same between all threads.
Currently the pti context switch for the user part of the page table
gets its TLB entries flushed too. It is excessive. The same PCID
flushing algorithm that is used for non-pti pmap, correctly works for
the UVA mappings. The only shared TLB entries are the pages from KVA
accessed by the kernel entry trampoline. All of them are static
except per-thread TSS and LDT. For TSS and LDT, the lifetime of newly
allocated entries is the whole thread life, so it is fine as well. If
not fine, then explicit shutdowns for current pmap of the newly
allocated LDT and TSS pages would be enough.
Also restore the constant value for the pm_pcid for the kernel_pmap.
Before, for PTI pmap, pm_pcid was erronously rolled same as user
pmap's pm_pcid, but it was not used.
Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D14961
After r331668 handling of F_NOT flag done in one place by
print_instruction() function. Also remove unused argument from
print_ip[6]() functions.
MFC after: 1 week
Some BIOSes have trouble booting from GPT in non-UEFI mode. This is
commonly reported with Lenovo laptops, including my x220. As we do not
currently support booting FreeBSD/i386 via UEFI there's no reason to
prefer GPT.
The "vestigial swap partition" was added in r265017 to work around an
issue with loader's GPT support, so we should not need it when using
MBR.
We may want to make the same change to amd64, although the issue there is
mitigated by such systems booting via UEFI in the common case.
PR: 227422
Reviewed by: gjb
MFC after: 3 weeks
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Originally, on the VAX exect() enable tracing once the new executable
image was loaded. This was possible because tracing was controllable
through user space code by setting the PSL_T flag. The following
instruction is a system call that activated tracing (as all
instructions do) by copying PSL_T to PSL_TP (trace pending). The
first instruction of the new executable image would trigger a trace
fault.
This is not portable to all platforms and the behavior was replaced with
ptrace(PT_TRACE_ME, ...) since FreeBSD forked off of the CSRG repository.
Platforms either incorrectly call execve(), trigger trace faults inside
the original executable, or do contain an implementation of this
function.
The exect() interfaces is deprecated or removed on NetBSD and OpenBSD.
Submitted by: Ali Mashtizadeh <ali@mashtizadeh.com>
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D14989
mdoc treats verbatim quotes in .Dl as a string delimiter and does
not pass them to the rendered output. Use special char \*q to specify
double quote
PR: 216755
MFC after: 3 days
To create hybrid boot media we want to specify a partition at a known location.
This extends the syntax of size partitions to include an optional offset that
can be absolute or relative. It also introduces validation to make sure that
this hasn't resulted in overlapping partitions. I haven't added this to the
file and process partition specifications yet but the mechanics are designed
such that if someone comes up with a good way of specifying the offset it
will be fairly easy to add in.
Reviewed by: imp
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14916
Add one extra lock initialization to iflib_register() that was missed
in the git<->phab conversion.
Split out flag manipulation from general context manipulation in iflib
To avoid blocking on the context lock in the swi thread and risk potential
deadlocks, this change protects lighter weight updates that only need to
be consistent with each other with their own lock.
Submitted by: Matthew Macy <mmacy@mattmacy.io>
Reviewed by: shurd
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14967
Directory mtime will only change if a file is added or removed, not
modified. For /var/cron/tabs, this is fine because of how crontab(1) manages
it using temp files so all crontab(1) changes will trigger a reload of the
database.
For /etc/cron.d and /usr/local/etc/cron.d, this is not necessarily the case.
Instead of checking their mtime, we should descend into them and check mtime
on all jobs also.
Reported by: des
Reviewed by: bapt
MFC after: 1 week
The change adds -t <name> option to zpool create and -t option to zpool
import in its form with an old name and a new name. This allows to
import (or create) a pool under a name that's different from its real,
permanent name without affecting that name. This is useful when working
with VM images or images of other physical systems if they happen to
have a ZFS pool with the same name as the host system.
The changes come from ZoL with some small tweaks.
The porting has been done by julian.
The change is being submitted to OpenZFS:
https://github.com/openzfs/openzfs/pull/600
Submitted by: julian
Reviewed by: smh
MFC after: 2 weeks
Sponsored by: Panzura (porting)
Differential Revision: https://reviews.freebsd.org/D14972
Changelist:
- Turn tx_rings and rx_rings arrays into arrays of pointers to kring
structs. This patch includes fixes for ixv, ixl, ix, re, cxgbe, iflib,
vtnet and ptnet drivers to cope with the change.
- Generalize the nm_config() callback to accept a struct containing many
parameters.
- Introduce NKR_FAKERING to support buffers sharing (used for netmap
pipes)
- Improved API for external VALE modules.
- Various bug fixes and improvements to the netmap memory allocator,
including support for externally (userspace) allocated memory.
- Refactoring of netmap pipes: now linked rings share the same netmap
buffers, with a separate set of kring pointers (rhead, rcur, rtail).
Buffer swapping does not need to happen anymore.
- Large refactoring of the control API towards an extensible solution;
the goal is to allow the addition of more commands and extension of
existing ones (with new options) without the need of hacks or the
risk of running out of configuration space.
A new NIOCCTRL ioctl has been added to handle all the requests of the
new control API, which cover all the functionalities so far supported.
The netmap API bumps from 11 to 12 with this patch. Full backward
compatibility is provided for the old control command (NIOCREGIF), by
means of a new netmap_legacy module. Many parts of the old netmap.h
header has now been moved to netmap_legacy.h (included by netmap.h).
Approved by: hrs (mentor)
In UTF-8 locales mandoc uses a number of characters outside of the Basic
Latin group, e.g. from general punctuation or miscellaneous mathematical
symbols, and these rendered as ? in text mode.
This change adds (char, replacement, code point, description):
– - U+2013 En Dash
⟨ < U+27E8 Mathematical Left Angle Bracket
⟩ > U+27E9 Mathematical Right Angle Bracket
This change addresses some common cases; there are others that still
need to be added after a more thorough review.
PR: 227409
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Also, since ifc_nhwrxqs is only used in one place, remove it from the struct.
This was preventing iflib_dma_free() from being called via
iflib_device_detach().
Submitted by: Matthew Macy <mmacy@mattmacy.io>
Reviewed by: shurd
Sponsored by: Limelight Networks
Refactor the currdev setting to find the device we booted from. Limit
searching when we don't already have a reasonable currdev from that to
the same device only. Search a little harder for ZFS volumes as that's
needed for loader.efi to live on an ESP.
Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D13784
There's problems with them. The order of efi stuff isn't quite right,
and there's various problems. Revert until thos problems can be fixed.
Reviewed by: kevans@
Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).
Reviewed by: kib, jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15017
pf ioctls frequently take a variable number of elements as argument. This can
potentially allow users to request very large allocations. These will fail,
but even a failing M_NOWAIT might tie up resources and result in concurrent
M_WAITOK allocations entering vm_wait and inducing reclamation of caches.
Limit these ioctls to what should be a reasonable value, but allow users to
tune it should they need to.
Differential Revision: https://reviews.freebsd.org/D15018
Now that 10 years have passed since the original limit of 10000 was
committed, bump it a little bit.
Spinning waiting for writers is semi-informed in the sense that we always
know if the owner is running and base the decision to spin on that.
However, no such information is provided for read-locking. In particular
this means that it is possible for a write-spinner to completely waste cpu
time waiting for the lock to be released, while the reader holding it was
preempted and is now waiting for the spinner to go off cpu.
Nonetheless, in majority of cases it is an improvement to spin instead of
instantly giving up and going to sleep.
The current approach is pretty simple: snatch the number of current readers
and performs that many pauses before checking again. The total number of
pauses to execute is limited to 10k. If the lock is still not free by
that time, go to sleep.
Given the previously noted problem of not knowing whether spinning makes
any sense to begin with the new limit has to remain rather conservative.
But at the very least it should also be related to the machine. Waiting
for writers uses parameters selected based on the number of activated
hardware threads. The upper limit of pause instructions to be executed
in-between re-reads of the lock is typically 16384 or 32678. It was
selected as the limit of total spins. The lower bound is set to
already present 10000 as to not change it for smaller machines.
Bumping the limit reduces system time by few % during benchmarks like
buildworld, buildkernel and others. Tested on 2 and 4 socket machines
(Broadwell, Skylake).
Figuring out how to make a more informed decision while not pessimizing
the fast path is left as an exercise for the reader.
Add a -R option to setfacl to operate recursively on directories, along
with the accompanying flags -H, -L, and -P (whose behaviour mimics
chmod).
A patch was submitted with PR 155163, but this is a new implementation
based on comments raised in the Phabricator review for that patch
(review D9096).
PR: 155163
Submitted by: Mitchell Horne <mhorne063@gmail.com>
Reviewed by: jilles
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14934
Sometimes the values contain geli passphrases being communicated from
loader(8) to the kernel, and some day the compiler may decide to start
eliding calls to memset() for a pointer which is not dereferenced again
before being passed to free().
Most other architectures already re-enter KDB on faults, powerpc and mips
are the only outliers. Correct this for powerpc, so that now bad addresses
can be handled gracefully instead of panicking.