The second 'linuxcommon' line was added by c66f5b079d
but Linuxulator's modules dependend on 'linux_common'.
To avoid such mistakes in the future rename moduledata name and module
name to 'linux_common' and retire 'linuxcommon' line.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D30409
MFC after: 2 weeks
Remove all the 'entry' and 'return' probes; they clutter up the source
and are redundant to FBT.
Reviewed By: dchagin
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30040
This is required for the current Arch Linux binaries to work.
PR: 254112
Reviewed By: emaste
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29218
Without it, Qt5 apps from Focal fail to start, being unable to load
their plugins. It's also necessary for glibc 2.33, as found in recent
Arch snapshots.
PR: 254112
Reviewed By: kib
Sponsored by: The FreeBSD Foundation, EPSRC
Differential Revision: https://reviews.freebsd.org/D28192
This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.
Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'. This matches what
I've observed under Ubuntu Focal VM.
Reviewed By: chuck, dchagin
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29609
For native FreeBSD binaries, the return value from __getcwd(2)
doesn't really matter, as the libc wrapper takes over and returns
the proper errno.
PR: kern/254120
Reported By: Alex S <iwtcex@gmail.com>
Reviewed By: kib
Sponsored By: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29217
linux_shared_page_init() creates an object and grabs and maps a single
page to back the VDSO. When destroying the VDSO object, we failed to
destroy the mapping and free KVA. Fix this.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28696
Current epoll implementation stores udata fields of epoll_event
structure in special dynamically-sized table rather than in udata field
of backing kevent structure because of 2 reasons:
1. Kevent's udata size is smaller than epoll's on 32-bit archs.
2. Kevent's udata can be clobbered on execution EPOLL_CTL_ADD as kqueue
modifies existing event while epoll returns error in this case.
After r320043 has introduced four new 64bit user data members (ext[]),
we can store epoll udata in one of them and drop aforementioned table.
According to kqueue_register() source code ext members are not updated
when existing kevent is modified that fixes p.2.
As a side effect the patch fixes PR/252582.
Reviewed by: trasz
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D28169
It returns "unconfined", like Linux without SELinux would.
Sponsored By: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28164
Previously the flags were passed as-is, which could resulted
in spurious EAGAIN returned for non-blocking sockets, which
broke some Steam games.
PR: 248065
Reported By: Alex S <iwtcex@gmail.com>
Tested By: Alex S <iwtcex@gmail.com>
Reviewed By: emaste
MFC After: 3 days
Sponsored By: The FreeBSD Foundation
The lock around callout_drain() is unnecessary and may cause
deadlock when one closes a timer descriptor during timer execution.
Reviewed By: delphij
Submitted By: ankohuu_outlook.com (Shunchao Hu)
Differential Revision: https://reviews.freebsd.org/D28148
On Linux, read(2) from a timerfd file descriptor returns an unsigned
8-byte integer (uint64_t) containing the number of expirations
that have occurred, if the timer has already expired one or more
times since its settings were last modified using timerfd_settime(),
or since the last successful read(2). That's to say, once we do
a read or call timerfd_settime(), timer fd's expiration count should
be zero. Some Linux applications create timerfd and add it to epoll
with LT mode, when event comes, they do timerfd_settime instead
of read to stop event source from trigger. On FreeBSD,
timerfd_settime(2) didn't set the count to zero, which caused high
CPU utilization.
Submitted by: ankohuu_outlook.com (Shunchao Hu)
Differential Revision: https://reviews.freebsd.org/D28231
Replace all uses of kern_mmap with kern_mmap_req move the old kern_mmap.
Reand rename kern_mmap_req to kern_mmap .
The helper saved some code churn initially, but having multiple
interfaces is sub-optimal.
Obtained from: CheriBSD
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28292
This is required for Qt5, as found in Ubuntu Focal. The library contains
the minimum kernel version encoded in an ELF note; this makes rtld ignore
it altogether, with a confusing error message. Without it, things fail
like this:
$ konsole: error while loading shared libraries: libQt5Core.so.5: cannot
open shared object file: No such file or directory
For reference, the Qt kernel version requirements can be found at:
https://github.com/qt/qtbase/blob/dev/src/corelib/global/minimum-linux_p.h
Sponsored by: The FreeBSD Foundation
Reviewed By: emaste
Differential Revision: https://reviews.freebsd.org/D28105
eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues
are not passable between processes. And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow. This came up when debugging strange HW video decode
failures in Firefox. A native implementation would avoid these problems
and help with porting Linux software.
Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.
Submitted by: greg@unrelenting.technology
Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26668
linux_common.c to linux_util.c so they become available on i386.
linux_common.c defines the linux_common kernel module but this module does
not exist on i386 and linux_common.c is not included in the linux module.
linux_util.c is included in the linux_common module on amd64 and the linux
module on i386.
Remove linux_common.c from files.i386 again. It was added recently in
r367433 when the DTrace provider definitions were moved.
The V4L feature declarations were moved to linux_common in r283423.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
The two flags are distinct and it is impossible to correctly handle clone(2)
without the assistance of fork1(). This change depends on the pwddesc split
introduced in r367777.
I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
should be treated the opposite way p_fd is (based on RFFDG flag). This is a
little ugly, but the benefit is that existing RFFDG API is preserved.
Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are
copied, while !RFFDG indicates both should be cloned.
In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects
independent fd tables.
The previous conflation of CLONE_FS and CLONE_FILES was introduced in
r163371 (2006).
Discussed with: markj, trasz (earlier version)
Differential Revision: https://reviews.freebsd.org/D27016
As this ABI is still fresh (r367287), let's correct some mistakes now:
- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess
Discussed with: kib, markj, trasz
Differential Revision: https://reviews.freebsd.org/D27084
from a Linux binary. Should come handy for AppImages.
Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26959
- map those IPv4 / IPv6 socket options which exist in FreeBSD
+ most of them visually verified to have the same type/layout of arguments
+ not tested with linux programs to behave as intended
- be more human readable for known options which are not handled
- be more verbose for unhandled socket message flags we know about
- print the jail ID in linux_msg if run in a jail
- add possibility to print debug message about known missing parts only once
- add multiple levels of sysctl linux.debug:
1: print debug messages, tell about unimplemented stuff (only once)
2: like 1, but also print messages about implemented but not tested
stuff (only once)
3+: like 2, but no rate limiting of messages
- increase default linux debug level from 1 to 3
We are a lot more verbose in as we need to be (e.g. some of the IP socket
options which are the same, and share the same memory layout, and are
believed to work). The reason is that we have no good testsuite to test those
linux-bits. The LTP or other test suites like the python one, are not fully
up to the task we need. As such the excessive messages about emulated but not
tested socket options.
IMO any MFC (possible, but most probably not by me) should set the default
debug level to 1.
Discussed with: trasz
Move dtrace SDT definitions into linux_common module code. Also, build
linux_dummy.c into the linux_common kld -- we don't need separate
versions of these stubs for 32- and 64-bit emulation.
Reported by: several
PR: 250897
Discussed with: emaste, trasz
Tested by: John Kennedy, Yasuhiro KIMURA, Oleg Sidorkin
X-MFC-With: r367395
Differential Revision: https://reviews.freebsd.org/D27124
This will be used by fuse(4).
Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26974
Add some missing netlink_family definitions and produce vaguely
human-readable error messages for those definitions, like we used to do for
just ROUTE and KOBJECT_UEVENTS.
Additionally, if we know it's a netfilter socket but didn't find it in the
table, fall back to printing that instead of the generic handler ("socket
domain 16, ...").
No change to the emulator correctness, just mildly improved diagnostics for
gaps.
Proxy the flag to the roughly analogous FreeBSD procctl 'TRACE'.
TRACE-disabled processes are not coredumped, and Linux !DUMPABLE processes
can not be ptraced. There are some additional semantics around ownership of
files in the /proc/[pid] pseudo-filesystem, which we do not attempt to
emulate correctly at this time.
Reviewed by: markj (earlier version)
Differential Revision: https://reviews.freebsd.org/D27015
This is required by some major linux applications, such as Chrome and
Firefox. (As well as Electron-using applications, which are essentially
a bundled version of Chrome.)
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27012
On Linux, sqlite probes for underlying F2FS filesystems that support
certain kinds of atomic update with this ioctl. The expected result on
non-F2FS filesystem (i.e., all FreeBSD filesystems) is any error value.
Minimally implement the ioctl and avoid the warning message.
(This shows up in Linux Chrome, which embeds sqlite.)
Reviewed by: emaste, trasz
Differential Revision: https://reviews.freebsd.org/D27050
more readable. While here, add linux_check_errtbl() function to make
sure we don't leave holes.
No objections: emaste (earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26972
Turns out the dummy rlimits fix prlimit(1), but break su(8)
(login-1:4.5-1ubuntu2) - although not sudo(8), for some reason.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26814
defaults, makes core files smaller, and fixes applications which use
pthread_join(3) in a wrong way, namely Steam.
This is based on a patch submitted by Jason Yang, which I've reworked
to set the limit instead of only changing the value reported (which
is enough to fix the bug for Linux pthreads, but could be confusing).
PR: 248225
Submitted by: Jason_YH_Yang at wistron.com (earlier version)
Analyzed by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26778
and current address space is already destroyed, so kern_execve()
terminates the process.
While there, clean up some internals of post_execve() inlined in init_main.
Reported by: Peter <pmc@citylink.dinoex.sub.org>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26525
This makes it possible to run an unmodified Linux syzkaller executor
against the Linuxulator, and have it gather code coverage information.
Sponsored by: The FreeBSD Foundation
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.
It defaults to on (== current behavior).
make -C /root/linux-5.3-rc8 -s -j 1 bzImage:
use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
After r340674, the "continue" would restart the loop without having
updated clen, resulting in an infinite loop. Restore the old behaviour
of simply ignoring all control messages on such sockets, since we
currently only implement handling for AF_UNIX-specific messages.
Reported by: syzkaller
Reviewed by: tijl
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26093
vm_object_madvise() is a no-op for unmanaged objects, but we should also
limit the scope of mappings on which pmap_remove() is called. In
particular, with the WIP largepage shm objects patch the kernel must
remove mappings of such objects along superpage boundaries, and without
this check Linux madvise(MADV_DONTNEED) could violate that requirement.
Reviewed by: alc, kib
MFC with: r362631
Sponsored by: Juniper Networks, Klara Inc.
Differential Revision: https://reviews.freebsd.org/D26084
It is documented as a raw hardware-based clock not subject to NTP or
incremental adjustments. With this "not as precise as CLOCK_MONOTONIC"
description in mind, map it to our CLOCK_MONOTNIC_FAST (the same
mapping as for the linux CLOCK_MONOTONIC_COARSE).
This is needed for the webcomponent of steam (chromium) and some
other steam component or game.
The linux-steam-utils port contains a LD_PRELOAD based fix for this.
There this is mapped to CLOCK_MONOTONIC.
As an untrained ear/eye (= the majority of people) is normaly not
noticing a difference of jitter in the 10-20 ms range, specially
if you don't pay attention like for example in a browser session
while watching a video stream, the mapping to CLOCK_MONOTONIC_FAST
seems more appropriate than to CLOCK_MONOTONIC.
The reason for this is to work around an idiosyncrasy of glibc
getttynam(3) implementation: it checks whether st_dev returned for
fd 0 is the same as st_dev returned for the target of /proc/self/fd/0
symlink, and with linux chroots having their own devfs instance,
the check will fail if you chrooted into it.
PR: kern/240767
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25559
memfd_create fds will no longer require an ftruncate(2) to set the size;
they'll grow (to the extent that it's possible) upon write(2)-like syscalls.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25502
it would fail with EINVAL, breaking some of the Python regression
tests.
While here, cap the user-controlled message length.
Note that the code doesn't seem to be copying out the new length
in either (success or failure) case. This will be addressed separately.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25392
TCGETS et al are frequently issued by Linux binaries while the previous code
avoidably ping-pongs a global sx lock and serializes on Giant.
Note that even with the fix the common case will serialize on a per-tty lock.
and fixes a bug where calling accept(2) could result in closing fd 0.
Note that the code still contains a number of problems: it makes
assumptions about l_sockaddr_in being the same as sockaddr_in,
the EFAULT-related code looks like it doesn't work at all, and the
socket type check is racy. Those will be addressed later on;
I'm trying to work in small steps to avoid breaking one thing while
fixing another.
It fixes Redis, among other things.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25461
rpokala notes that splitting the definitions like this is kind of silly,
since the comment applies to both. Move the comment up (or the definition
down, depending on your perspective on life) accordingly.
Reported by: rpokala
This effectively mirrors our libc implementation, but with minor fudging --
name needs to be copied in from userspace, so we just copy it straight into
stack-allocated memfd_name into the correct position rather than allocating
memory that needs to be cleaned up.
The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have
also been implemented now that we support them.
Note that this implementation is still not quite at feature parity w.r.t.
the actual Linux version; some caveats, from my foggy memory:
- Need to implement SHM_GROW_ON_WRITE, default for memfd (in progress)
- LTP wants the memfd name exposed to fdescfs
- Linux allows open() of an fdescfs fd with O_TRUNC to truncate after dup.
(?)
Interested parties can install and run LTP from ports (devel/linux-ltp) to
confirm any fixes.
PR: 240874
Reviewed by: kib, trasz
Differential Revision: https://reviews.freebsd.org/D21845
with python3.8 from Focal triggers those.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25491
Linux MADV_DONTNEED is not advisory: it has side effects for anonymous
memory, and some system software depends on that. In particular,
MADV_DONTNEED causes anonymous pages to be discarded. If the mapping is
a private mapping of a named object then subsequent faults are to
repopulate the range from that object, otherwise pages will be
zero-filled. For mappings of non-anonymous objects, Linux MADV_DONTNEED
can be implemented in the same way as our MADV_DONTNEED.
This implementation differs from Linux semantics in its handling of
private mappings, inherited through fork(), of non-anonymous objects.
After applying MADV_DONTNEED, subsequent faults will repopulate the
mapping from the parent object rather than the root of the shadow chain.
PR: 230160
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25330
FreeBSD madvise(2) directly. While some of the flag values match,
most don't.
PR: kern/230160
Reported by: markj
Reviewed by: markj
Discussed with: brooks, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25272
If multithreaded non-Linux process execs Linux binary, then non-Linux
threads different from the one that execing are cleared by
single-threading at boundary, and then terminating them in
post_execve(). Since at that time the process is already switched to
linux ABI, linuxolator is involved in the thread clearing on boundary,
but cannot find the emul data.
Handle it by pre-creating emuldata for all threads in the execing process.
Also remove a code in linux_proc_exec() handler that cleared emul data
for other threads when execing from multithreaded Linux process. It is
excessive.
PR: 247020
Reported by: Martin FIlla <freebsd@sysctl.cz>
Reported by: Henrique L. Amorim, Independent Security Researcher
Reported by: Rodrigo Rubira Branco (BSDaemon), Amazon Web Services
Reviewed by: markj
Tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25293
PR: kern/240432
Analyzed by by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25248
the debug messages. While here, clean up some variable naming.
Reviewed by: bcr (manpages), emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25230
- Use the same definition of free memory as Linux.
- Rename the totalbig and freebig fields to match the corresponding
names on Linux.
Discussed with: alc
MFC after: 1 week
applications, which often depend on this being the case. There's a new
sysctl, compat.linux.default_openfiles, to control this behaviour.
Reviewed by: kevans, emaste, bcr (manpages)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25177
standard SO_SNDBUF/SO_RCVBUF. Mostly cosmetics, to get rid
of the warning during 'apt upgrade'.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25173
The previous code was computing an incorrect value in a very expensive
manner. "sharedram" is supposed to be the amount of memory used by
named swap objects, which on FreeBSD basically corresponds to memory
usage by shared memory objects (including, for example, GEM objects) and
tmpfs. We currently have no cheap way to count such pages. The
previous code tried to determine the number of copy-on-write pages
shared between processes.
Just replace the computed value with 0. illumos reportedly does the
same thing. Linux itself did not populate this field until a 2014
commit, "mm: export NR_SHMEM via sysinfo(2) / si_meminfo() interfaces".
Reported by: mjg
MFC after: 1 week
Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h
and replace existing definitation with includes where required. This
eliminates duplicate code and allows Linux and FreeBSD compatability
headers to be included in the same files.
Input from: cem, jhb
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24275
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer intended to use a designated
initializer or compount literal. We take advantage of the mandatory
zeroing of fields not listed in the initializer.
Remove kern_mmap_fpcheck() and use kern_mmap_req().
The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations. In CheriBSD we have already added two more arguments.
Reviewed by: kib
Discussed with: kevans
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D23164
On Linux the valid range of priorities for the SCHED_FIFO and SCHED_RR
scheduling policies is [1,99]. For SCHED_OTHER the single valid priority is
0. On FreeBSD it is [0,31] for all policies. Programs are supposed to
query the valid range using sched_get_priority_(min|max), but of course some
programs assume the Linux values are valid.
This commit adds a tunable compat.linux.map_sched_prio. When enabled
sched_get_priority_(min|max) return the Linux values and sched_setscheduler
and sched_(get|set)param translate between FreeBSD and Linux values.
Because there are more Linux levels than FreeBSD levels, multiple Linux
levels map to a single FreeBSD level, which means pre-emption might not
happen as it does on Linux, so the tunable allows to disable this behaviour.
It is enabled by default because I think it is unlikely that anyone runs
real-time software under Linux emulation on FreeBSD that critically relies
on correct pre-emption.
This fixes FMOD, a commercial sound library used by several games.
PR: 240043
Tested by: Alex S <iwtcex@gmail.com>
Reviewed by: dchagin
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D23790
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Submitted by: Bora Özarslan <borako.ozarslan@gmail.com>
Submitted by: Yang Wang <2333@outlook.jp>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19917
- handle the CLOCK_{PROCESS,THREAD}_CPUTIME_ID specified directly;
- fix thread id calculation as in the Linuxulator we should
convert the user supplied thread id to struct thread * by linux_tdfind();
- fix CPUCLOCK_SCHED case by using kern_{process,thread}_cputime()
directly as native get_cputime() used by kern_clock_gettime() uses
native tdfind()/pfind() to find proccess/thread.
PR: 240990
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23341
MFC after: 2 weeks
so don't initialize nwhich in declaration and remove stale comment from r161304.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D23339
MFC after: 2 weeks
for missing IP_RECVERR setsockopt(2) support. Without it, DNS
resolution is broken for glibc >= 2.30 (glibc BZ #24047).
From the user point of view this fixes "yum update" on recent
CentOS 8.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23234
This unbreaks Mono (mono-devel-4.6.2.7+dfsg-1ubuntu1 from Ubuntu Bionic);
previously would crash on "amd64_is_imm32" assert.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23306
This unbreaks Mono (mono-devel-4.6.2.7+dfsg-1ubuntu1 from Ubuntu Bionic);
previously would crash on "amd64_is_imm32" assert.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
The Linux32 system call argument fetcher places each argument (passed in
registers in the Linux x86 system call convention) into an entry in the
generic system call args array. Each member of this array is 8 bytes
wide, so this approach is broken for system calls that take off_t
arguments.
Fix the problem by splitting l_loff_t arguments in the 32-bit system
call descriptions, the same as we do for FreeBSD32. Change entry points
to handle this using the PAIR32TO64 macro.
Move linux_ftruncate64() into compat/linux.
PR: 243155
Reported by: Alex S <iwtcex@gmail.com>
Reviewed by: kib (previous version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23210
sys_setsockopt. Just a cleanup; no functional changes.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22812
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.
Reviewed by: kib (previous version)
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D23036
Linux mmap rejects mmap() on a write-only file with EACCES.
linux_mmap_common currently does a fun dance to grab the fp associated with
the passed in fd, validates it, then drops the reference and calls into
kern_mmap(). Doing so is perhaps both fragile and premature; there's still
plenty of chance for the request to get rejected with a more appropriate
error, and it's prone to a race where the file we ultimately mmap has
changed after it drops its referenced.
This change alleviates the need to do this by providing a kern_mmap variant
that allows the caller to inspect the fp just before calling into the fileop
layer. The callback takes flags, prot, and maxprot as one could imagine
scenarios where any of these, in conjunction with the file itself, may
influence a caller's decision.
The file type check in the linux compat layer has been removed; EINVAL is
seemingly not an appropriate response to the file not being a vnode or
device. The fileop layer will reject the operation with ENODEV if it's not
supported, which more closely matches the common linux description of
mmap(2) return values.
If we discover that we're allowing an mmap() on a file type that Linux
normally wouldn't, we should restrict those explicitly.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22977
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
syscall is to query the CPU number and the NUMA domain the calling
thread is currently running on. The third argument is ignored.
It doesn't do anything regarding scheduling - it's literally
just a way to query the current state, without any guarantees
you won't get rescheduled an opcode later.
This unbreaks Java from CentOS 8
(java-11-openjdk-11.0.5.10-0.el8_0.x86_64).
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22972
copy_file_range(2) is implemented natively since r350315, make it available
for Linux binaries too.
Reviewed by: kib (mentor), trasz (previous version)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D22959
devices. It's required for LTP, among other things. It's not
complete, but good enough for now.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22950