memfd_create fds will no longer require an ftruncate(2) to set the size;
they'll grow (to the extent that it's possible) upon write(2)-like syscalls.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25502
it would fail with EINVAL, breaking some of the Python regression
tests.
While here, cap the user-controlled message length.
Note that the code doesn't seem to be copying out the new length
in either (success or failure) case. This will be addressed separately.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25392
create /sys/class/power_supply/. This silences some warnings
from biology/linux-foldingathome.
Reported by: 0mp
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25557
TCGETS et al are frequently issued by Linux binaries while the previous code
avoidably ping-pongs a global sx lock and serializes on Giant.
Note that even with the fix the common case will serialize on a per-tty lock.
Unify functions bodies.
Do not call tdfind() if pid is passed, and do not call pfind() if tid
is supplied.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25534
and fixes a bug where calling accept(2) could result in closing fd 0.
Note that the code still contains a number of problems: it makes
assumptions about l_sockaddr_in being the same as sockaddr_in,
the EFAULT-related code looks like it doesn't work at all, and the
socket type check is racy. Those will be addressed later on;
I'm trying to work in small steps to avoid breaking one thing while
fixing another.
It fixes Redis, among other things.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25461
and not the process ID. Make sure the linux_task_exiting() function uses tdfind()
to lookup the BSD procedure structure pointer by the "pid" field, and only
fallback to pfind() when no match is found! This makes linux_task_exiting()
in line with the rest of the code.
Differential Revision: https://reviews.freebsd.org/D25509
Submitted by: Greg V <greg@unrelenting.technology>
MFC after: 1 week
Sponsored by: Mellanox Technologies
This is to silence down some Chromium assertions.
PR: kern/240991
Analyzed by: Alex S <iwtcex@gmail.com>
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25256
rpokala notes that splitting the definitions like this is kind of silly,
since the comment applies to both. Move the comment up (or the definition
down, depending on your perspective on life) accordingly.
Reported by: rpokala
This effectively mirrors our libc implementation, but with minor fudging --
name needs to be copied in from userspace, so we just copy it straight into
stack-allocated memfd_name into the correct position rather than allocating
memory that needs to be cleaned up.
The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have
also been implemented now that we support them.
Note that this implementation is still not quite at feature parity w.r.t.
the actual Linux version; some caveats, from my foggy memory:
- Need to implement SHM_GROW_ON_WRITE, default for memfd (in progress)
- LTP wants the memfd name exposed to fdescfs
- Linux allows open() of an fdescfs fd with O_TRUNC to truncate after dup.
(?)
Interested parties can install and run LTP from ports (devel/linux-ltp) to
confirm any fixes.
PR: 240874
Reviewed by: kib, trasz
Differential Revision: https://reviews.freebsd.org/D21845
with python3.8 from Focal triggers those.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25491
Linux MADV_DONTNEED is not advisory: it has side effects for anonymous
memory, and some system software depends on that. In particular,
MADV_DONTNEED causes anonymous pages to be discarded. If the mapping is
a private mapping of a named object then subsequent faults are to
repopulate the range from that object, otherwise pages will be
zero-filled. For mappings of non-anonymous objects, Linux MADV_DONTNEED
can be implemented in the same way as our MADV_DONTNEED.
This implementation differs from Linux semantics in its handling of
private mappings, inherited through fork(), of non-anonymous objects.
After applying MADV_DONTNEED, subsequent faults will repopulate the
mapping from the parent object rather than the root of the shadow chain.
PR: 230160
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25330
RB_CLEAR_NODE. But it is not an expression, and ought not to be
enclosed in parens. Remove them.
Approved by: markj
Differential Revision: https://reviews.freebsd.org/D25421
pointers. Define RB_SWAP_CHILD to replace the child of a parent with
its twin, and use it in 4 places. Use RB_SET in rb_link_node to remove
the only linuxkpi reference to color, and then drop color- and
parent-related definitions that are defined and used only in rbtree.h.
This is intended to be entirely cosmetic, with no impact on program
behavior, and leave RB_PARENT and RB_SET_PARENT as the only ways to
read and write rb parent pointers.
Reviewed by: markj, kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D25264
For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.
Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.
Reviewed by: mjg, kib, tmunro
Approved by: mjg (mentor)
Obtained from: Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision: https://reviews.freebsd.org/D25024
FreeBSD madvise(2) directly. While some of the flag values match,
most don't.
PR: kern/230160
Reported by: markj
Reviewed by: markj
Discussed with: brooks, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25272
Functions which take untrusted user ranges must validate against the
bounds of the map, and also check for wraparound. Instead of having the
same logic duplicated in a number of places, add a function to check.
Reviewed by: dougm, kib
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25328
If multithreaded non-Linux process execs Linux binary, then non-Linux
threads different from the one that execing are cleared by
single-threading at boundary, and then terminating them in
post_execve(). Since at that time the process is already switched to
linux ABI, linuxolator is involved in the thread clearing on boundary,
but cannot find the emul data.
Handle it by pre-creating emuldata for all threads in the execing process.
Also remove a code in linux_proc_exec() handler that cleared emul data
for other threads when execing from multithreaded Linux process. It is
excessive.
PR: 247020
Reported by: Martin FIlla <freebsd@sysctl.cz>
Reported by: Henrique L. Amorim, Independent Security Researcher
Reported by: Rodrigo Rubira Branco (BSDaemon), Amazon Web Services
Reviewed by: markj
Tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25293
PR: kern/240432
Analyzed by by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25248
making them break when the representation changes. Revert changes that
eliminated the color field from rb-trees, leaving everything as it was
before.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D25250
RB_LEFT or RB_RIGHT, so they aren't stripping off the color bit
encoded there. Strip off that bit for linuxkpi.
Reported by: dch
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D25245
the debug messages. While here, clean up some variable naming.
Reviewed by: bcr (manpages), emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25230
- Use the same definition of free memory as Linux.
- Rename the totalbig and freebig fields to match the corresponding
names on Linux.
Discussed with: alc
MFC after: 1 week
applications, which often depend on this being the case. There's a new
sysctl, compat.linux.default_openfiles, to control this behaviour.
Reviewed by: kevans, emaste, bcr (manpages)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25177
standard SO_SNDBUF/SO_RCVBUF. Mostly cosmetics, to get rid
of the warning during 'apt upgrade'.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25173
significant bit in the pointer to the node from its parent to indicate
that the node is red. Have the tree rotation macros leave the
old-parent/new-child node red and the new-parent/old-child node black.
This change makes RB_LEFT and RB_RIGHT no longer assignable, and
RB_COLOR no longer defined. Any code that modifies the tree or
examines a node color would have to be modified after this change.
Reviewed by: markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D25105
Add a freebsd32_ptrace() and move as many freebsd32 shims as possible
to freebsd32_ptrace(). Aside from register sets, freebsd32 passes
pointers to native structures to kern_ptrace() and converts to/from
native/32-bit structure formats in freebsd32_ptrace() outside of
kern_ptrace().
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D25195
The previous code was computing an incorrect value in a very expensive
manner. "sharedram" is supposed to be the amount of memory used by
named swap objects, which on FreeBSD basically corresponds to memory
usage by shared memory objects (including, for example, GEM objects) and
tmpfs. We currently have no cheap way to count such pages. The
previous code tried to determine the number of copy-on-write pages
shared between processes.
Just replace the computed value with 0. illumos reportedly does the
same thing. Linux itself did not populate this field until a 2014
commit, "mm: export NR_SHMEM via sysinfo(2) / si_meminfo() interfaces".
Reported by: mjg
MFC after: 1 week
and not only the software cache of that register. Else
pci_channel_offline() won't detect that the PCI device is gone when
using the LinuxKPI.
MFC after: 1 week
Sponsored by: Mellanox Technologies