The removed text claimed that memcpy is implemented using bcopy and thus
strings may overlap. Use of bcopy is an implementation detail that is
no longer true, even if the implementation (on some archs) does allow
overlap.
In any case behaviour is undefined per the C standard if memcpy is
called with overlapping objects, and this man page already claimed that
src and dst may not overlap.
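As an illustration of the rule above, a caller whose source and destination
may overlap should reach for memmove(3) instead. The sketch below is not from
the man page; drop_prefix() is a hypothetical helper used only to show an
overlapping copy done safely.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: drop the first n bytes of buf (which holds len
 * bytes) by sliding the tail to the front.  Source and destination
 * overlap, so memmove(3) is required; memcpy(3) would be undefined
 * behaviour here.  Returns the new length.
 */
size_t
drop_prefix(char *buf, size_t len, size_t n)
{
	if (n >= len)
		return (0);
	memmove(buf, buf + n, len - n);
	return (len - n);
}
```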
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31192
rmacklem@ spotted two things in the system call:
- Upon returning from a successful operation, vop_stddeallocate can
update rmsr.r_offset to a value greater than the file size. This behavior,
although harmless, can be confusing.
- The EINVAL return value for rqsr.r_offset + rqsr.r_len > OFF_MAX is
undocumented.
This commit has the following changes:
- vop_stddeallocate and shm_deallocate now bound the affected area
further by the file size.
- The EINVAL case for rqsr.r_offset + rqsr.r_len > OFF_MAX is
documented.
- The fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9)'s return
len is explicitly documented to be the value 0, and the return offset
is restricted to be the smaller of off + len and the current file size,
as suggested by kib@. This semantic allows callers to interact better
with potential file size growth after the call.
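The documented result can be modelled in a few lines. This is only a sketch of
the arithmetic described above, not the kernel code: struct range_model and
model_dealloc_result() are hypothetical names, INT64_MAX stands in for
OFF_MAX, and rejecting negative inputs is an assumption of the model.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the request/result range; not the real struct. */
struct range_model {
	int64_t offset;
	int64_t len;
};

/*
 * Model the documented outcome of a successful deallocation: the returned
 * length is 0 and the returned offset is the smaller of offset + len and
 * the current file size.  Returns EINVAL when offset + len would exceed
 * INT64_MAX (standing in for OFF_MAX) or an input is negative (assumed).
 */
int
model_dealloc_result(const struct range_model *req, int64_t file_size,
    struct range_model *res)
{
	int64_t end;

	if (req->offset < 0 || req->len < 0 ||
	    req->len > INT64_MAX - req->offset)
		return (EINVAL);
	end = req->offset + req->len;
	res->offset = end < file_size ? end : file_size;
	res->len = 0;
	return (0);
}
```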
Sponsored by: The FreeBSD Foundation
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D31604
Add missing wrapper code to librt for these new functions so that
SIGEV_THREAD works. Without machinery to convert it to SIGEV_THREAD_ID,
you got EINVAL.
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31618
Allow multiple vector IOs to be started with one system call.
aio_readv() and aio_writev() already used these opcodes under the
covers. This commit makes them available to user space.
Being non-standard extensions, they're only visible if __BSD_VISIBLE is
defined, like the functions.
Reviewed by: asomers, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31627
Variant I architectures use off and Variant II ones use size + off.
Define TLS_VARIANT_I/TLS_VARIANT_II symbols similarly to how libc
handles it.
Reviewed by: kib
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31539
Differential revision: https://reviews.freebsd.org/D31541
Add support for 'VDSO_TH_ALGO_X86_PVCLK'; add vDSO-based timekeeping for
devices that support the KVM/XEN paravirtual clock API.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31418
Remove a useless note about unlinking temporary files; they are unlinked
in tmpfile(3) [1]. Add a note about __cxa_atexit().
Explain exactly how the FreeBSD implementations of exit() and _Exit()
differ.
Noted by: markj [1]
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D31425
Add fflush(stdout) as the common idiom. Explain the need to use exit()
but advise against it.
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D31425
_PC_MIN_HOLE_SIZE and _PC_DEALLOC_PRESENT were mixed somehow before this
fix.
Sponsored by: The FreeBSD Foundation
Reviewed by: delphij
Differential Revision: https://reviews.freebsd.org/D31436
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.
The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.
fspacectl(2) takes cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.
fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28347
The current POSIX standard requires fork() to be async-signal safe. Neither
our implementation nor implementations in other operating systems are,
and practically it is impossible to make fork() async-signal safe without
too much effort. Also, that would impose an undue requirement that all atfork
handlers be async-signal safe as well, which contradicts their main
use.
As a result, the Austin Group dropped the requirement and added a new function,
_Fork(), that should be async-signal safe but does not call atfork
handlers. Basically, _Fork() can be implemented as a raw syscall.
glibc 2.34 added _Fork(); do the same for FreeBSD.
Clarify threading behavior for fork() in the manpage.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D31378
They deliberately read out-of-bounds values to avoid byte-by-byte
loads and check multiple bytes at once. While this will work on x86,
it is flagged as an out-of-bounds read with ASAN, so we have to
disable instrumentation here. This also causes bounds errors for CHERI,
so in CheriBSD we use implementations that avoid OOB reads.
Differential Revision: https://reviews.freebsd.org/D31045
The ifunc resolver is called before the sanitizer runtime is initialized,
so any instrumentation results in an immediate crash.
Reviewed By: kib
Differential Revision: https://reviews.freebsd.org/D31046
This is needed to bootstrap llvm-tblgen on Linux since LLVM calls
`::open(...)` which does not work if open is a statement macro.
Also stop defining O_SHLOCK/O_EXLOCK and update the only bootstrap tools
user of those flags to deal with missing definitions.
Reviewed By: jrtc27
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31226
Linux standardized what we call CLOCK_{REALTIME,MONOTONIC}_FAST as
CLOCK_{REALTIME,MONOTONIC}_COARSE. In addition, Linux spells
CLOCK_UPTIME as CLOCK_BOOTTIME.
Add aliases to time.h and document these new aliases in
clock_gettime(2).
Reviewed by: vangyzen, kib (prior), dchagin (prior)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30988
of the /dev/hpet and /dev/hv_tsc devices, to not leak internal libc
file descriptors on exec.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31344
The left side of the MIN() expression is the (signed) result of pointer
subtraction (ptrdiff_t). The right-hand side is also the (signed)
result of pointer subtraction, additionally subtracting the element size
('es'), which is unsigned size_t. This coerces the right-hand
expression into an unsigned value. MIN(signed, unsigned) triggers
-Wsign-compare.
Sorting elements of size greater than SSIZE_MAX is nonsensical, so we
can instead treat the element size as ssize_t, leaving the right-hand
result the same signedness as the left.
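The effect of the cast can be shown in isolation. This is a reduced model of
the comparison rather than the qsort code itself; min_span() and its
parameter names are hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical model of the comparison: both operands are pointer
 * differences, and the right one additionally subtracts the element size.
 * Casting es to ssize_t keeps the whole right-hand expression signed, so
 * a negative result stays negative instead of wrapping to a huge
 * unsigned value.
 */
ptrdiff_t
min_span(ptrdiff_t left, ptrdiff_t right, size_t es)
{
	ptrdiff_t rhs = right - (ssize_t)es;

	return (left < rhs ? left : rhs);
}
```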
Reviewed by: arichardson, kib
Differential Revision: https://reviews.freebsd.org/D31292
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.
This is really really important for programs that use route(4) to keep
in sync with the system. If we lose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.
Reviewed by: philip (network), kbowling (transport), gbe (manpages)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26652
Before this patch there was a chance for a thread that called rand(3)
slightly later to see rand3_state already allocated, but not yet
initialized. While this API is not expected to be thread-safe, it
is not expected to crash. ztest on a 64-thread system reproduced it
reliably for me.
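The message does not say how the window was closed. One common way to avoid
exposing a half-built object is to initialize it fully before publishing the
pointer, as in the sketch below. This is an assumed pattern, not the committed
fix; struct gen_state, shared_state and get_state() are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical generator state; not libc's actual layout. */
struct gen_state {
	unsigned long seed;
};

static _Atomic(struct gen_state *) shared_state;

/*
 * Return the shared state, creating it on first use.  The object is fully
 * initialized before the compare-and-swap publishes it, so no other thread
 * can observe an allocated but uninitialized state.  Returns NULL if the
 * allocation fails.
 */
struct gen_state *
get_state(void)
{
	struct gen_state *cur, *fresh;

	cur = atomic_load(&shared_state);
	if (cur != NULL)
		return (cur);
	fresh = malloc(sizeof(*fresh));
	if (fresh == NULL)
		return (NULL);
	fresh->seed = 1;		/* initialize first ... */
	cur = NULL;
	if (atomic_compare_exchange_strong(&shared_state, &cur, fresh))
		return (fresh);		/* ... then publish */
	free(fresh);			/* another thread won the race */
	return (cur);
}
```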
Submitted by: avg@
MFC after: 1 month
The early environment is typically cleared, so these new options
need the PRESERVE_EARLY_KENV kernel config(8) option. These environments
are reported as missing by kenv(1) if the option is not present in the
running kernel.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D30835
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
This permits more efficient accesses of thread-local variables, which
are heavily used at least by jemalloc and locale-aware code. Note that
on amd64 and i386, jemalloc's thread-local variables already have their
TLS model overridden by defining JEMALLOC_TLS_MODEL.
For now the change is applied only to tested platforms, but should in
principle be enabled everywhere.
PR: 255840
Suggested by: jrtc27
Reviewed by: kib
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31070
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
This introduces a new, per-process flag, "NO_NEW_PRIVS", which
is inherited, preserved on exec, and cannot be cleared. The flag,
when set, makes subsequent execs ignore any SUID and SGID bits,
instead executing those binaries as if they were not set.
The main purpose of the flag is implementation of Linux
PR_SET_NO_NEW_PRIVS prctl(2), and possibly also unprivileged
chroot.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30939
Finally, we have the correct function definition for strmode. NetBSD/OpenBSD
did this many years ago. This code is safe against weird sign extension.
Reviewed by: imp@
Pull Request: https://github.com/freebsd/freebsd-src/pull/493
When debugging POSIX shared memory issues, it's really
useful to learn that there is a command line tool now
to manipulate shared memory segments.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D30896
so that libc vdso and kernel syscall give closer results.
Reported by: dchagin
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30873
Call binuptime inside switch statement, instead of pre-calculating
the abs argument.
Change the type of the abs argument to bool.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30873
We can use the buffer passed to fread(3) directly in the FILE *.
The buffer needs to be reset before each call to __srefill().
This preserves the expected behavior in all cases.
The change was found originally in OpenBSD and later adopted by NetBSD.
MFC after: 2 weeks
Obtained from: OpenBSD (CVS 1.18)
Differential Revision: https://reviews.freebsd.org/D30548
Previously, a negative change list length would be treated the same as
an empty change list. A negative event list length would result in
bogus copyouts. Make kevent(2) return EINVAL for both cases so that
application bugs are more easily found, and to be more robust against
future changes to kevent internals.
Reviewed by: imp, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30480
Document that LOG_PID is ignored and can not be disabled.
This change was made along with the move from RFC 3164 to RFC 5424 log messages.
PR: 255664
Reported by: des.gaufres@gmail.com
Reviewed by: gbe, jilles
Approved by: gbe (mentor, manpages), jilles
There are still references to timed(8) and timedc(8) in the base system,
which were removed in 2018.
PR: 255425
Reported by: Ceri Davies <ceri at submonkey dot net>
Reviewed by: ygy, gbe
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30232
It reopens the passed file descriptor, checking the file backing vnode's
current access rights against the open mode. In particular, this flag allows
converting a file descriptor opened with O_PATH into an operable file
descriptor, assuming permissions allow that.
Reviewed by: markj
Tested by: Andrew Walker <awalker@ixsystems.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30148
It writes the core of a live stopped process to the file descriptor
provided as an argument.
Based on the initial version from https://reviews.freebsd.org/D29691,
submitted by Michał Górny <mgorny@gentoo.org>.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955
Teach poll(2) to support Linux-style POLLRDHUP events for sockets, if
requested. Triggered when the remote peer shuts down writing or closes
its end.
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29757
While most 64-bit architectures have an assembly implementation of this
file, RISC-V does not. As we now store 8 bytes instead of 4 it should speed
up RISC-V.
Reviewed By: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29536
While most 64-bit architectures have an assembly implementation of this
file, RISC-V does not. As we now copy 8 bytes instead of 4 it should speed
up RISC-V. Using intptr_t instead of int also allows using this file for
CHERI pure-capability code since trying to copy pointers using integer
loads/stores will invalidate pointers.
Reviewed By: kib
Obtained from: CheriBSD (partially)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29535
This commit should not have introduced any functional changes, but
apparently it did. This appears to have broken LDAP setups.
Reverting for now. Will reland once I have fixed the breakage.
This reverts commit 5245bf7b92.
Reported By: Александр Недоцуков, brd
MFC after: immediately
It seems to be a nice idea to show how fork() is usually used in
practice. This may act as a guide to developers who want to quickly
recall how to use the fork() function.
Reviewed by: bcr, yuripv
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27626
if VREAD access is checked as allowed during open
Requested by: wulf
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29323
by only keeping hold count on the vnode, instead of the use count.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29323
It is currently allowed to fchownat(2), fchmodat(2), fchflagsat(2),
utimensat(2), fstatat(2), and linkat(2).
For linkat(2), PRIV_VFS_FHOPEN privilege is required to exercise the flag.
It allows linking any open file.
Requested by: trasz
Tested by: pho, trasz
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29111
This is the same change as d36d681615, but for libc static implementation
of dl_iterate_phdr().
Reported by: emacsray@gmail.com
PR: 254774
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29623
This was only needed on 32-bit arm prior to ARMv6. As we only support
ARMv6 or later, remove it.
Reviewed by: mannu
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D29624
dl_iterate_phdr() dlpi_tls_data should provide the TLS module segment
address, and not the TLS init segment address as it does now.
Reported by: emacsray@gmail.com
PR: 254774
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Instead of polling nleft[i] (without appropriate memory barriers!) and
using sleep() to detect the exit, just call pthread_join() on all threads.
Also replace the use of a mutex guarding the increments with an atomic
fetch_add. This should reduce the runtime of this test on SMP systems.
Finally, remove all the debug printfs unless DEBUG_OUTPUT is set in
the environment.
Test Plan: still fails sometimes on qemu (but maybe less often?)
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D29390
POWER architecture CPUs (Book-S) require natural alignment for
cache-inhibited storage accesses. Since we can't know the caching model
for a page ahead of time, always enforce natural alignment in bcopy.
This fixes a SIGBUS when calling the function with misaligned pointers
on POWER7.
Submitted by: Bruno Larsen <bruno.larsen@eldorado.org.br>
Reviewed by: luporl, bdragon (IRC)
MFC after: 1 week
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D28776
The error cases (goto fin) of _nsdispatch were missing the unlock.
This change also drops the checks for __isthreaded since the pthread stubs
are already no-ops if threads are not being used. Dropping those conditionals
allows clang's thread safety analysis to deal with the file and also makes
the code a bit more readable. While touching the file also add a few more
assertions in debug mode that the right locks are held.
Reviewed By: markj
Differential Revision: https://reviews.freebsd.org/D29372
- Defined the MAXLINE constant (8192 octets by default instead of 2048) to
set the limit in one place. It sets the maximum number of characters in
a syslog message. RFC 5424 doesn't limit the maximum size of the message.
Named after MAXLINE in syslogd(8).
- Set the size of the fmt_cpy buffer to MAXLINE for rendering formatted
(%m) messages.
- Introduced autoexpansion of the sending socket buffer up to MAXLINE.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D27205
With an out-of-tree Clang, we can use the -resource-dir flag when linking
to point it at the runtime libraries from the current SYSROOT.
This moves the path to the clang-internal library directory to a separate
.mk file that can be used by Makefiles that want to find the sanitizer
libraries. I intend to re-use this .mk file for my upcoming changes that
allow building the entire base system with ASAN/UBSAN/MSAN.
Reviewed By: dim
Differential Revision: https://reviews.freebsd.org/D28852
with the reasoning that the flags did not work properly and were not
shipped in a release.
O_RESOLVE_BENEATH is kept as useful.
Reviewed by: markj
Tested by: arichardson, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D28907
Parentheses added to HASZERO macro to avoid a GCC warning, and formatted
with clang-format as we have adopted these and don't consider them
'contrib' code.
Obtained from: musl (snapshot at commit 4d0a82170a25)
Reviewed by: kib (libc integration), mjg (both earlier)
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17630
The default behavior for attaching processes to jails is that the jail's
cpuset augments the attaching processes, so that it cannot be used to
escalate a user's ability to take advantage of more CPUs than the
administrator wanted them to.
This is problematic when root needs to manage jails that have disjoint
sets with whatever process is attaching, as this would otherwise result
in a deadlock. Therefore, if we did not have an appropriate common
subset of cpus/domains for our new policy, we now allow the process to
simply take on the jail set *if* it has the privilege to widen its mask
anyways.
With the new logic, root can still usefully cpuset a process that
attaches to a jail with the desire of maintaining the set it was given
pre-attachment while still retaining the ability to manage child jails
without jumping through hoops.
A test has been added to demonstrate the issue; cpuset of a process
down to just the first CPU and attempting to attach to a jail without
access to any of the same CPUs previously resulted in EDEADLK and now
results in taking on the jail's mask for privileged users.
PR: 253724
Reviewed by: jamie (also discussed with)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28952
This flag has been set on startup since 65618fdda0.
However, this causes some of the math-related tests to fail as they report
zero instead of a tiny number. This fixes at least
/usr/tests/lib/msun/ldexp_test and possibly others.
Additionally, setting this flag prevents printf() from printing subnormal
numbers in decimal form.
See also https://www.openwall.com/lists/musl/2021/02/26/1
PR: 253847
Reviewed By: mmel
Differential Revision: https://reviews.freebsd.org/D28938
This caused LDBL_MANT_DIG to not be defined and therefore the scalbnl
alias was not being emitted for double==long double platforms.
Fixes: 760b2ffc ("Update scalbn* functions to the musl versions")
Reported by: Jenkins
We could just use a C implementation using __builtin_fabs(), but using
this assembly version guarantees that there is no additional prolog/epilog
code. Additionally, clang generates worse code for masking off the top bit
than GCC: https://bugs.llvm.org/show_bug.cgi?id=49377.
This fixes the RISCV64 softfloat world build after cf97d2a1da. That commit
added -fno-builtin to the msun tests which resulted in the first references to
fabs (previously the compiler inlined all calls).
Reviewed By: dim
Reported by: mjg
Differential Revision: https://reviews.freebsd.org/D28994
Building R on powerpc64 exposed a problem in fpsetmask() whereby we
were not properly clamping the provided mask to the valid range.
This same issue affects powerpc and powerpcspe.
Properly limit the range of bits that can be set via fpsetmask().
While here, use the correct fp_except_t type instead of fp_rnd_t.
Reported by: pkubaj, jhibbits (in IRC)
Sponsored by: Tag1 Consulting, Inc.
MFC after: 1 week
Building R exposed a problem in fpsetmask() whereby we were not properly
clamping the provided mask to the valid range.
R initializes the mask by calling fpsetmask(~0) on FreeBSD. Since we
recently enabled precise exceptions, this was causing an immediate
SIGFPE because we were attempting to set invalid bits in the fpscr.
Properly limit the range of bits that can be set via fpsetmask().
While here, use the correct fp_except_t type instead of fp_rnd_t.
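The clamping amounts to masking the argument with the set of valid exception
bits before it is stored. The sketch below models that idea only: the
MODEL_FP_X_* values, model_fp_except_t and the static variable standing in for
the hardware register are all hypothetical, and the real bit layout is
architecture-specific.

```c
/* Hypothetical bit layout for illustration; real values are per-arch. */
#define	MODEL_FP_X_INV	0x01u
#define	MODEL_FP_X_DZ	0x02u
#define	MODEL_FP_X_OFL	0x04u
#define	MODEL_FP_X_UFL	0x08u
#define	MODEL_FP_X_IMP	0x10u
#define	MODEL_FP_X_ALL	(MODEL_FP_X_INV | MODEL_FP_X_DZ | MODEL_FP_X_OFL | \
			    MODEL_FP_X_UFL | MODEL_FP_X_IMP)

typedef unsigned int model_fp_except_t;

/* Stand-in for the hardware control register. */
static model_fp_except_t model_mask;

/*
 * Install a new exception mask, clamped to the valid bits, and return the
 * previous mask.  A caller passing ~0 can no longer set invalid bits.
 */
model_fp_except_t
model_fpsetmask(model_fp_except_t mask)
{
	model_fp_except_t old = model_mask;

	model_mask = mask & MODEL_FP_X_ALL;
	return (old);
}
```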
Reported by: pkubaj (in IRC)
MFC after: 1 week
Sponsored by: Tag1 Consulting, Inc.
All supported platforms support thread-local vars and __thread.
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28796
Add a BUGS section about using pwrite(2) when O_APPEND is set on the fd.
MFC after: 3 days
Submitted by: Ka Ho Ng <khng300@gmail.com>
Reviewed by: gbe, yuripv
Differential Revision: https://reviews.freebsd.org/D28372
jail_attach(2) performs an internal chroot operation, leaving it up to
the calling process to assure the working directory is inside the jail.
Add a matching internal chdir operation to the jail's root. Also
ignore kern.chroot_allow_open_directories, and always disallow the
operation if there are any directory descriptors open.
Reported by: mjg
Approved by: markj, kib
MFC after: 3 days
This causes problems when using ASAN with a runtime older than 12.0 since
the intercept does not expect qsort() to call itself using an interposable
function call. This results in infinite recursion and stack exhaustion
when a binary compiled with -fsanitize=address calls qsort.
See also https://bugs.llvm.org/show_bug.cgi?id=46832 and
https://reviews.llvm.org/D84509 (ASAN runtime patch).
To prevent this problem, this patch uses a static helper function
for the actual qsort() implementation. This prevents interposition and
allows for direct calls. As a nice side-effect, we can also move the
qsort_s checks to the top-level function and out of the recursive calls.
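The shape of that change, reduced to a toy sort over ints, might look like the
following. This is a sketch of the pattern only, not the libc code; sort_impl()
and sort_ints() are hypothetical names.

```c
#include <stddef.h>

/*
 * The recursion lives in a static function, so the recursive calls bind
 * directly and cannot be redirected through an interposed public symbol.
 */
static void
sort_impl(int *v, ptrdiff_t lo, ptrdiff_t hi)
{
	ptrdiff_t i, j;
	int pivot, tmp;

	if (lo >= hi)
		return;
	pivot = v[lo + (hi - lo) / 2];
	i = lo;
	j = hi;
	while (i <= j) {
		while (v[i] < pivot)
			i++;
		while (v[j] > pivot)
			j--;
		if (i <= j) {
			tmp = v[i];
			v[i] = v[j];
			v[j] = tmp;
			i++;
			j--;
		}
	}
	sort_impl(v, lo, j);
	sort_impl(v, i, hi);
}

/*
 * Hypothetical public entry point: argument checks run once here, then
 * the work is delegated to the static helper.
 */
int
sort_ints(int *v, size_t n)
{
	if (v == NULL && n != 0)
		return (-1);
	if (n > 1)
		sort_impl(v, 0, (ptrdiff_t)n - 1);
	return (0);
}
```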
Reviewed By: kib
Differential Revision: https://reviews.freebsd.org/D28133
Preserve more space for swap device names.
Prevent line overflow with long device name.
Don't draw a bar when swap is not used at all.
Simplify and optimize code.
Change the label to end at end of 100%.
PR: 251655
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D27496
It was reported that getdirentries(2) was
returning dirents with d_off set to 0 for an NFS
mount.
This is believed to be correct behaviour at
this time (it may change for some NFS mounts
in the future), but is inconsistent with what the
getdirentries(2) man page says.
This patch fixes the man page.
This is a content change.
PR: 253428
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28664
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.
This is really really important for programs that use route(4) to keep in sync
with the system. If we lose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.
This reverts commit 710e45c4b8.
It breaks for some corner cases on big endian ppc64.
Given the stage of the release process it is best to revert for now.
Reported by: jhibbits
This is a tradeoff which saves jumps for smaller sizes while making
the 8-16 range slower (roughly in line with the other cases).
Tested with glibc test suite.
For example size 3 (most common with vfs namecache) (ops/s):
before: 407086026
after: 461391995
The regressed range of 8-16 (with 8 as example):
before: 540850489
after: 461671032
The previous code neglected to use primitives which can find the end
of the string without having to branch on every character.
While here augment the somewhat misleading commentary -- strlen as
implemented here leaves performance on the table, especially so for
userspace. Every arch should get a dedicated variant instead.
In the meantime this commit lessens the problem.
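For readers unfamiliar with such primitives, the usual one is a word-wide
"has a zero byte" test. The sketch below shows it in a bounded scan that never
reads past the buffer, which is an assumption made to keep the example
well-defined and is not how the libc routine is structured; bounded_strlen()
is a hypothetical name and the macros follow the widely used formulation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	ONES	((uint64_t)-1 / UINT8_MAX)	/* 0x0101...01 */
#define	HIGHS	(ONES * (UINT8_MAX / 2 + 1))	/* 0x8080...80 */
#define	HASZERO(x)	((((x) - ONES) & ~(x)) & HIGHS)

/*
 * Hypothetical helper: length of the string in buf, which holds cap bytes
 * in total.  Whole words are tested while at least 8 bytes remain, so the
 * scan never reads past buf + cap.  Returns cap if no NUL is found.
 */
size_t
bounded_strlen(const char *buf, size_t cap)
{
	uint64_t w;
	size_t i = 0;

	while (cap - i >= sizeof(w)) {
		memcpy(&w, buf + i, sizeof(w));	/* alignment-safe load */
		if (HASZERO(w))
			break;			/* a NUL is in this word */
		i += sizeof(w);
	}
	/* Finish byte by byte: locate the NUL or run out of buffer. */
	while (i < cap && buf[i] != '\0')
		i++;
	return (i);
}
```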
Tested with glibc test suite.
Naive test just calling strlen in a loop on Haswell (ops/s):
$(perl -e "print 'A' x 3"):
before: 211198039
after: 338626619
$(perl -e "print 'A' x 100"):
before: 83151997
after: 98285919
This is all code only run on ARMv4 and ARMv5. Support for these has
been dropped from FreeBSD.
Differential Revision: https://reviews.freebsd.org/D28314
This was only used when building for ARMv4 or some ARMv5 or when
_STANDALONE is defined. As ARMv4 and ARMv5 support has been removed,
and we only define _STANDALONE in the bootloader where we don't use
this version of memcpy we can remove it.
Differential Revision: https://reviews.freebsd.org/D28313
Because the "files" and "compat" implementations failed to set the
"stayopen", keyed lookups would close the database handle, contrary to
the purpose of setgroupent(3). setpassent(3)'s implementation does not
have this bug.
PR: 165527
Submitted by: Andrey Simonenko
MFC after: 1 month
The getpwent(3) and getgrent(3) implementations maintain some internal
iterator state. Interleaved calls to functions which do passwd/group
lookups using a key, such as getpwnam(3), would in some cases clobber
this state, causing a subsequent getpwent() or getgrent() call to
restart iteration from the beginning of the database or to terminate
early. This is particularly troublesome in programming environments
where execution of green threads is interleaved within a single OS
thread.
Take care to restore any iterator state following a keyed lookup. The
"files" provider for the passwd database was already handling this
correctly, but "compat" was not, and both providers had this problem
when accessing the group database.
PR: 252094
Submitted by: Viktor Dukhovni <ietf-dane@dukhovni.org>
MFC after: 1 month
Some NSS regression tests for getgrent(3) and getpwent(3) were not
testing anything because the test incorrectly requested creation of a
database snapshot.
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
This file has other questionable code and "optimizations" (such as copying
one int at a time) that are probably no longer useful, so it might make
sense to replace it with a different implementation at some point.
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D28134
Define a non-const static char EMSG[] = "" to avoid having to add
__DECONST() to all uses of EMSG. Also make current_dash a const char *
to fix this warning.
Previously, we would accept any kind of LIO_* opcode, including ones
that were intended for in-kernel use only like LIO_SYNC (which is not
defined in userland). The situation became more serious with
022ca2fc7f. After that revision, setting
aio_lio_opcode to LIO_WRITEV or LIO_READV would trigger an assertion.
Note that POSIX does not specify what should happen if aio_lio_opcode is
invalid.
MFC-with: 022ca2fc7f
Reviewed by: jhb, tmunro, 0mp
Differential Revision: https://reviews.freebsd.org/D28078
Without wrapping, rtld services and malloc(3) are not guaranteed
to operate correctly in the forked child.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28088
which makes stack prot correct for non-main threads created by binaries
with statically linked libthr.
Cache the result, but do not engage in full double-checked locking,
since calculation of the return value is idempotent.
PR: 252549
Reported and reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28075
Detect and use RDTSCP if available, instead of fence+RDTSC. For AMD Zens+,
use LFENCE+RDTSC instead of RDTSCP (or MFENCE;RDTSC previously).
Reviewed by: gallatin, markj
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
Create an array of rdtsc selectors and provide a helper that calculates the
index into the selectors array.
Reviewed by: gallatin, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
Instead of providing ifuncs for each kind of fence, define ifuncs
that combine fence and invocation of RDTSC. This refactoring makes
introduction of RDTSCP use possible.
Reviewed by: gallatin, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
regcomp.c uses the "start + count < end" idiom to check that there are
"count" bytes available in an array of char "start" and "end" both point to.
This is fine, unless "start + count" goes beyond the last element of the
array. In this case, pedantic interpretation of the C standard makes the
comparison of such a pointer against "end" undefined, and optimizers from
hell will happily remove as much code as possible because of this.
An example of this occurs in regcomp.c's bothcases(), which defines
bracket[3], sets "next" to "bracket" and "end" to "bracket + 2". Then it
invokes p_bracket(), which starts with "if (p->next + 5 < p->end)"...
Because bothcases() and p_bracket() are static functions in regcomp.c, there
is a real risk of miscompilation if aggressive inlining happens.
The following diff rewrites the "start + count < end" constructs into "end -
start > count". Assuming "end" and "start" are always pointing in the array
(such as "bracket[3]" above), "end - start" is well-defined and can be
compared without trouble.
As a bonus, MORE2() implies MORE() therefore SEETWO() can be simplified a
bit.
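As an illustration only, the rewritten check can be sketched as a
standalone helper. The name has_more_than() is hypothetical and is not the
macro used in regcomp.c. It assumes "start" and "end" point into, or one
past the end of, the same array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True when more than "count" chars lie between start and end.
 * The pointer start + count is never formed, so nothing is computed
 * beyond the end of the array.
 */
bool
has_more_than(const char *start, const char *end, ptrdiff_t count)
{
	assert(start <= end);
	return (end - start > count);
}
```

With bracket[3], "start" at bracket and "end" at bracket + 2, a count of 5
is rejected by comparing two well-defined integers rather than by forming
bracket + 5.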
PR: 252403
aio_fsync(O_DSYNC, ...) is the asynchronous version of fdatasync(2).
Reviewed by: kib, asomers, jhb
Differential Revision: https://reviews.freebsd.org/D25071
POSIX O_DSYNC means that writes include an implicit fdatasync(2), just
as O_SYNC implies fsync(2).
VOP_WRITE() functions that understand the new IO_DATASYNC flag can act
accordingly, but we'll still pass down IO_SYNC so that file systems that
don't understand it will continue to provide the stronger O_SYNC
behaviour.
Flag also applies to fcntl(2).
Reviewed by: kib, delphij
Differential Revision: https://reviews.freebsd.org/D25090
As suggested in D27598. This also supports MK_WERROR.clang=no and
MK_WERROR.gcc=no to support the existing NO_WERROR.<compiler> uses.
Reviewed By: brooks
Differential Revision: https://reviews.freebsd.org/D27601
POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but they're an obvious extension. They work just
like their synchronous equivalents pwritev and preadv.
It isn't yet possible to use vectored aiocbs with lio_listio, but that
could be added in the future.
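Since the new functions are described as working like pwritev and preadv,
the sketch below shows only the synchronous pwritev side, for comparison.
It does not call aio_writev itself, and write_two_parts() is a hypothetical
helper name:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gather two strings into one write at offset 0; returns bytes written. */
ssize_t
write_two_parts(const char *path, const char *head, const char *body)
{
	struct iovec iov[2];
	ssize_t n;
	int fd;

	assert(path != NULL && head != NULL && body != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return (-1);

	/* iov_base is not const-qualified, hence the casts. */
	iov[0].iov_base = (void *)head;
	iov[0].iov_len = strlen(head);
	iov[1].iov_base = (void *)body;
	iov[1].iov_len = strlen(body);

	n = pwritev(fd, iov, 2, 0);
	if (close(fd) == -1)
		n = -1;
	return (n);
}
```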
Reviewed by: jhb, kib, bcr
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D27743
PR#252358 reported a serious performance problem w.r.t.
cp(1) when copying large non-sparse files.
This problem appears to have been caused by cp(1)
calling copy_file_range(2) with a small "len" argument.
This patch adds a recommendation to use a large "len"
value where possible, for performance reasons.
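A minimal sketch of a copy loop that follows this recommendation.
copy_path() is a hypothetical helper, and the choice of SSIZE_MAX as the
"large" value is an assumption of this sketch:

```c
#define _GNU_SOURCE		/* exposes copy_file_range() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

/* Copy src to dst; returns the number of bytes copied, or -1 on error. */
off_t
copy_path(const char *src, const char *dst)
{
	off_t total = 0;
	ssize_t n;
	int infd, outfd;

	assert(src != NULL && dst != NULL);
	if ((infd = open(src, O_RDONLY)) == -1)
		return (-1);
	outfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (outfd == -1) {
		close(infd);
		return (-1);
	}

	/* Ask for as much as possible per call instead of small chunks. */
	while ((n = copy_file_range(infd, NULL, outfd, NULL,
	    SSIZE_MAX, 0)) > 0)
		total += n;

	close(infd);
	if (close(outfd) == -1)
		n = -1;
	return (n == -1 ? -1 : total);
}
```

The loop is still needed because a single call may copy fewer bytes than
requested; a return value of 0 means end of file.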
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D27935
The current POSIX.1-202x draft (1.1) was used as source material.
Submitted by: Soumendra Ganguly <soumendraganguly@gmail.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D27787
The cpuset(2) tests should be run as root (require.user properly set) with
>= 3 cpus for maximum coverage. All tests that want to modify the cpuset
don't assume any particular cpu layout (i.e. the first cpu may not be 0, the
last may not be first + count) and the following scenarios are tested:
1.) newset: basic test; execute cpuset() to grab a new cpuset and make sure
the assigned cpuset then has a different ID.
2.) transient: create a new cpuset then assign the process its original
cpuset, ensuring that the one we created is now gone.
3.) deadlk: test assigning an anonymous mask, then resetting the process
base affinity with 1-cpu overlap w.r.t. the anonymous mask and with
0-cpu overlap w.r.t. the anonymous mask.
4.) jail_attach_newbase: process attaches to a jail with its own
cpuset+mask (e.g. cpuset -c -l 1,2 jail -c path=/ command=/bin/sh)
5.) jail_attach_newbase_plain: process attaches to a jail with its own
cpuset (e.g. cpuset -c jail -c path=/ command=/bin/sh)
6.) jail_attach_prevbase: process attaches to a jail with the containing
jail's root cpuset (e.g. jail -c path=/ command=/bin/sh)
7.) jail_attach_plain: process attaches to a jail with the containing jail's
root cpuset+mask.
8.) badparent: creates a new cpuset and modifies the anonymous thread mask,
then setid's back to the original and checks that cpuset_getid() returns
the expected set.
Differential Revision: https://reviews.freebsd.org/D27307
Add shims to map NetBSD's API to CPUSET(9). Obviously the invalid-input
parts of these tests are relatively useless, since we're just testing
shims that aren't used elsewhere, but there's still some value in
the parts testing valid inputs.
Differential Revision: https://reviews.freebsd.org/D27307