
3296 Commits

Author SHA1 Message Date
Kyle Evans
85c5f3cb57 Add COMPAT12 support to makesyscalls.sh
Reviewed by:	kib, imp, brooks (all without syscalls.master edits)
Differential Revision:	https://reviews.freebsd.org/D21366
2019-09-25 17:29:45 +00:00
Tijl Coosemans
55258ab0ff Create a "drm" subdirectory for drm devices in linsysfs. Recent versions of
Linux libdrm check for the existence of this directory:

https://cgit.freedesktop.org/mesa/drm/commit/?id=f8392583418aef5e27bfed9989aeb601e20cc96d

MFC after:	2 weeks
2019-09-23 12:27:55 +00:00
Ed Maste
2eb6ef203a linux: add trivial renameat2 implementation
Just return EINVAL if flags != 0.  The Linux man page documents one
case of EINVAL as "The filesystem does not support one of the flags in
flags."

After r351723 userland binaries will try using new system calls.

Reported by:	mjg
Reviewed by:	mjg, trasz
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21590
2019-09-11 13:01:59 +00:00
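A minimal sketch of the renameat2 flag check described above, in C. The helper name `renameat2_check_flags()` is hypothetical and this is not the linuxulator source. It only shows the rule that any non-zero flags word fails with EINVAL, after which a caller would continue down the ordinary renameat path.

```c
#include <errno.h>

/*
 * Hypothetical helper: any non-zero renameat2 flags word (for example
 * RENAME_NOREPLACE or RENAME_EXCHANGE) is treated as unsupported and
 * rejected with EINVAL, one of the EINVAL cases the Linux man page
 * documents.  A return of 0 means the caller may proceed as renameat(2).
 */
static int
renameat2_check_flags(unsigned int flags)
{
	if (flags != 0)
		return (EINVAL);
	return (0);
}
```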
Hans Petter Selasky
4c8ba7d94f Use true and false when dealing with bool type in the LinuxKPI.
No functional change.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-09-11 08:24:47 +00:00
Hans Petter Selasky
16732c193c Fix synchronous work drain issue in the LinuxKPI.
A work callback may restart itself. Loop in the drain function to see if the
work has been rescheduled and stop the subsequent reschedules, if any.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-09-11 08:20:13 +00:00
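The re-check loop in the drain fix above can be modelled in a few lines of single-threaded C. Everything here is a hypothetical stand-in for the LinuxKPI work and taskqueue machinery, including `struct work_sketch`, the `queued` flag, and the example callback. The point is only this: a drain that checks once can return while a self-rescheduled instance is still pending, so it has to loop until nothing is queued.

```c
#include <stdbool.h>

/* Hypothetical model of a work item whose callback may re-queue itself. */
struct work_sketch {
	bool queued;                        /* pending execution */
	int runs;                           /* times the callback has run */
	void (*fn)(struct work_sketch *);   /* the work callback */
};

static void
queue_work_sketch(struct work_sketch *w)
{
	w->queued = true;
}

/* Example callback: re-queues itself until it has run three times. */
static void
requeue_three_times(struct work_sketch *w)
{
	if (w->runs < 3)
		queue_work_sketch(w);
}

/*
 * Synchronous drain: run the pending instance, then check again, since
 * the callback may have rescheduled itself; return only when nothing
 * is queued any more.
 */
static void
drain_work_sketch(struct work_sketch *w)
{
	while (w->queued) {
		w->queued = false;
		w->runs++;
		w->fn(w);
	}
}
```

A real drain must also cope with the callback running concurrently on another thread, which this model ignores.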
Hans Petter Selasky
6575da5eef Fix broken DECLARE_TASKLET() macro after r347852.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-09-11 07:53:49 +00:00
Jeff Roberson
c75757481f Replace redundant code with a few new vm_page_grab facilities:
- VM_ALLOC_NOCREAT will grab without creating a page.
- vm_page_grab_valid() will grab and page in if necessary.
- vm_page_busy_acquire() automates some busy acquire loops.

Discussed with:	alc, kib, markj
Tested by:	pho (part of larger branch)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21546
2019-09-10 19:08:01 +00:00
Mark Johnston
fee2a2fa39 Change synchronization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accommodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
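To make the counter layout in the reference-counting change above concrete, here is a sketch using C11 atomics. The bit positions, macro names, and function names are assumptions chosen for illustration and are not copied from sys/vm. The sketch shows three things:

- one flag bit standing for the object's reference;
- a second flag bit that makes a wire_mapped-style attempt fail while mappings are being removed;
- an unwire that reports when the last reference is gone.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: two flag bits above a wiring count. */
#define REF_OBJREF	0x80000000u	/* the page belongs to an object */
#define REF_BLOCKED	0x40000000u	/* new mapped wirings are refused */
#define REF_FLAGS	(REF_OBJREF | REF_BLOCKED)
#define REF_COUNT(c)	((c) & ~REF_FLAGS)

struct page_sketch {
	_Atomic uint32_t ref_count;
};

/*
 * Hypothetical wire_mapped-style operation: add a wiring with a CAS
 * loop, but fail if BLOCKED is set so the caller can fall back to a
 * slower path (the fault handler, in the commit's description).
 */
static bool
page_wire_mapped_sketch(struct page_sketch *m)
{
	uint32_t old = atomic_load(&m->ref_count);

	do {
		if ((old & REF_BLOCKED) != 0)
			return (false);
	} while (!atomic_compare_exchange_weak(&m->ref_count, &old, old + 1));
	return (true);
}

/* Block and unblock new mapped wirings while mappings are removed. */
static void
page_block_refs_sketch(struct page_sketch *m)
{
	atomic_fetch_or(&m->ref_count, REF_BLOCKED);
}

static void
page_unblock_refs_sketch(struct page_sketch *m)
{
	atomic_fetch_and(&m->ref_count, ~REF_BLOCKED);
}

/*
 * Drop one wiring.  Returns true when that was the last reference
 * (count reached zero and no object reference), i.e. the caller
 * would free the page.
 */
static bool
page_unwire_sketch(struct page_sketch *m)
{
	uint32_t old = atomic_fetch_sub(&m->ref_count, 1);

	return ((old & ~REF_BLOCKED) == 1);
}
```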
Johannes Lundberg
f6668e9f56 LinuxKPI: Improve sysfs support.
- Add functions for creating and merging sysfs groups.
- Add sysfs_streq function to compare strings ignoring newline from the
  sysctl userland call.
- Add a call to sysfs_create_groups in device_add.
- Remove duplicate header include.
- Bump __FreeBSD_version.

Reviewed by:	hselasky
Approved by:	imp (mentor), hselasky
MFC after:	4 days
Differential Revision:	D21542
2019-09-06 15:43:53 +00:00
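The sysfs_streq comparison rule above is easy to state in code. This is a standalone sketch named `sysfs_streq_sketch()`, a hypothetical stand-in modelled on the documented Linux semantics of `sysfs_streq`. Two strings compare equal if they match exactly, or if one of them carries a single trailing newline, as values written from userland often do (for example by echo).

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for sysfs_streq(): return true if s1 and s2
 * are equal, treating a single trailing newline on either string as
 * if it were the terminating NUL.
 */
static bool
sysfs_streq_sketch(const char *s1, const char *s2)
{
	/* Skip the common prefix. */
	while (*s1 != '\0' && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return (true);		/* both ended together */
	if (*s1 == '\0' && s2[0] == '\n' && s2[1] == '\0')
		return (true);		/* s2 has one extra trailing newline */
	if (s1[0] == '\n' && s1[1] == '\0' && *s2 == '\0')
		return (true);		/* s1 has one extra trailing newline */
	return (false);
}
```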
Edward Tomasz Napierala
e55366be83 Fix /proc/mounts for autofs(5) mounts.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-09-04 18:00:54 +00:00
Konstantin Belousov
fe69291ff4 Add procctl(PROC_STACKGAP_CTL)
It allows a process to request, retroactively, that the stack gap not
be applied to its stacks.  It is also possible to control the gap for
the process after exec.

PR:	239894
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21352
2019-09-03 18:56:25 +00:00
Edward Tomasz Napierala
bb3c7a5440 Make linprocfs(4) report Tgid; Linux ltrace(1) needs it.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-09-03 16:33:02 +00:00
Mateusz Guzik
d05b53e0ba Add sysctlbyname system call
Previously userspace would issue one syscall to resolve the sysctl and then
another one to actually use it. Do it all in one trip.

Fallback is provided in case newer libc happens to be running on an older
kernel.

Submitted by:	Pawel Biernacki
Reported by:	kib, brooks
Differential Revision:	https://reviews.freebsd.org/D17282
2019-09-03 04:16:30 +00:00
Edward Tomasz Napierala
1d3a302b4a Bump Linux version to 3.2.0. Without it, binaries linked against
glibc 2.24 and up (eg Ubuntu 19.04) fail with "FATAL: kernel too old".

This alone is not enough to make newer binaries actually work;
fix/hack/workaround is pending review at https://reviews.freebsd.org/D20687.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20757
2019-09-02 18:10:35 +00:00
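As the version-bump commit above describes, newer glibc refuses to start on a kernel it considers too old. The sketch below only illustrates that kind of version comparison. It packs versions the way Linux's KERNEL_VERSION() macro traditionally does, ((major << 16) + (minor << 8) + patch). The function names are hypothetical, not glibc internals.

```c
#include <stdint.h>

/* Hypothetical helper: pack a version like Linux's KERNEL_VERSION(a, b, c). */
static uint32_t
linux_version_code(unsigned int major, unsigned int minor, unsigned int patch)
{
	return ((major << 16) + (minor << 8) + patch);
}

/* Hypothetical startup check: nonzero means "refuse to run". */
static int
kernel_too_old(uint32_t running, uint32_t minimum)
{
	return (running < minimum);
}
```

With a reported 2.6.x version, a libc that requires 3.2.0 fails this test, which corresponds to the "FATAL: kernel too old" case; reporting 3.2.0 passes it.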
Edward Tomasz Napierala
7a8cbc5288 Relax compat.linux.osrelease checks. This way one can set, e.g.,
'compat.linux.osrelease=3.10.0-957.12.1.el7.x86_64', which
corresponds to CentOS 7.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20685
2019-09-02 16:57:42 +00:00
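One way to read "relaxed" in the osrelease commit above: require only a leading major.minor.patch triple and ignore any distribution suffix. The sketch below does that. `parse_osrelease_sketch()` and `parse_num()` are hypothetical helpers, not the linuxulator's actual parser, and numeric overflow is not handled.

```c
#include <ctype.h>

/* Hypothetical helper: parse one decimal number and advance *sp past it. */
static int
parse_num(const char **sp, unsigned int *out)
{
	const char *s = *sp;
	unsigned int v = 0;

	if (!isdigit((unsigned char)*s))
		return (-1);
	while (isdigit((unsigned char)*s))
		v = v * 10 + (unsigned int)(*s++ - '0');
	*out = v;
	*sp = s;
	return (0);
}

/*
 * Hypothetical relaxed parser: require a leading "major.minor.patch"
 * and ignore whatever follows it, such as the "-957.12.1.el7.x86_64"
 * suffix of a CentOS 7 release string.
 */
static int
parse_osrelease_sketch(const char *s, unsigned int *major,
    unsigned int *minor, unsigned int *patch)
{
	if (parse_num(&s, major) != 0 || *s++ != '.' ||
	    parse_num(&s, minor) != 0 || *s++ != '.' ||
	    parse_num(&s, patch) != 0)
		return (-1);
	return (0);
}
```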
Johannes Lundberg
458ba18d43 LinuxKPI: Add sysfs create/remove functions that handle multiple files in one call.
Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	D21475
2019-09-02 14:51:59 +00:00
Hans Petter Selasky
4d83500fda Use DEVICE memory instead of UNCACHEABLE on aarch64 in ioremap() in the LinuxKPI.
This fixes system hangs on reading device registers on aarch64.

Tested with:	Marvell MACCHIATObin (Armada8k) + mlx4en, amdgpu
Submitted by:	Greg V <greg@unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D20789
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-09-02 08:34:45 +00:00
Konstantin Belousov
bb9e2184f0 Change locking requirements for VOP_UNSET_TEXT().
Require the vnode to be locked for the VOP_UNSET_TEXT() call.  This
will be used by the following bug fix for a tmpfs issue.

Tested by:	sbruno, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-18 20:24:52 +00:00
Conrad Meyer
419fe172c2 Linuxkpi: Prevent easy generated ctor name conflicts with prefix
Sponsored by:	Dell EMC Isilon
2019-08-17 03:00:58 +00:00
Hans Petter Selasky
4f109faadc Implement pci_enable_msi() and pci_disable_msi() in the LinuxKPI.
This patch makes the DRM graphics driver in ports usable on aarch64.

Submitted by:	Greg V <greg@unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D21008
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-08-14 09:36:25 +00:00
John Baldwin
e28024fbad Fix build with DRM and INVARIANTS enabled.
The DRM drivers use the lockdep assertion macros with spinlock_t locks
which are backed by mutexes, not sx locks.  This causes compile
failures since you can't use sx_assert with a mutex.  Instead, change
the lockdep macros to use lock_class methods.  This works by assuming
that each LinuxKPI locking primitive embeds a FreeBSD lock as its
first structure and uses a cast to get to the underlying 'struct
lock_object'.

Reviewed by:	hselasky
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20992
2019-08-13 21:15:59 +00:00
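The first-member cast in the DRM/INVARIANTS fix above relies on a C guarantee: a pointer to a structure, suitably converted, points to its initial member. The sketch below illustrates that layout. All types here are hypothetical stand-ins, not the real `struct lock_object`, `struct mtx`, `struct sx`, or LinuxKPI lock types, and an enum replaces the real lock_class pointer.

```c
/* Hypothetical lock classes standing in for lock_class methods. */
enum lock_class_sketch { LC_MUTEX, LC_SX };

/* Hypothetical common header at the start of every native lock. */
struct lock_object_sketch {
	const char *lo_name;
	enum lock_class_sketch lo_class;
};

/* Hypothetical native locks: the header is always the first member. */
struct mtx_sketch {
	struct lock_object_sketch lock_object;
	int owner;
};

struct sx_sketch {
	struct lock_object_sketch lock_object;
	int shared_count;
};

/* Hypothetical LinuxKPI wrappers: the native lock is the first member. */
typedef struct { struct mtx_sketch m; } spinlock_sketch_t;
typedef struct { struct sx_sketch sx; } rw_semaphore_sketch_t;

/*
 * Any wrapper can be reduced to its lock_object by a cast, so a
 * lockdep-style assertion can dispatch on the class instead of
 * hard-coding sx_assert().
 */
static enum lock_class_sketch
lockdep_class_sketch(const void *lk)
{
	return (((const struct lock_object_sketch *)lk)->lo_class);
}
```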
Konstantin Belousov
62375ca79c compat/linux: Remove obsoleted and somewhat confusing comments related to COMPAT_43.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21200
2019-08-11 19:17:29 +00:00
Justin Hibbits
4eaa2fde6f Fix 32-bit build again, post r350570.
Missed this part with my testing as well.  Pass the right type to
BUS_TRANSLATE_RESOURCE().
2019-08-04 20:00:39 +00:00
Justin Hibbits
4b238da67b Fix 32-bit build post-r350570
The error message prints a rman_res_t, which is a uintmax_t.  Explicitly
cast, just for future-proofing, and use the correct format.
2019-08-04 19:55:43 +00:00
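The portable pattern behind the rman_res_t fix above is to print a uintmax_t with the `j` length modifier after an explicit cast, rather than choosing %lx or %llx per architecture. Below is a minimal sketch. `rman_res_sketch_t` and `format_res_sketch()` are hypothetical stand-ins, and the exact format string used in the commit is not shown here.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption for illustration: a resource address type that is a uintmax_t. */
typedef uintmax_t rman_res_sketch_t;

/*
 * %jx is the conversion for uintmax_t on every ABI, so it is correct on
 * 32-bit and 64-bit builds alike; the explicit cast keeps the call
 * correct even if the typedef changes later.
 */
static int
format_res_sketch(char *buf, size_t len, rman_res_sketch_t start)
{
	return (snprintf(buf, len, "%#jx", (uintmax_t)start));
}
```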
Justin Hibbits
937a05ba81 Add necessary bits for Linux KPI to work correctly on powerpc
PowerPC, and possibly other architectures, use different address ranges for
PCI space vs physical address space, which is only mapped at resource
activation time, when the BAR gets written.  The DRM kernel modules do not
activate the rman resources, so as not to waste KVA, instead only mapping
parts of the PCI memory at a time.  This introduces a
BUS_TRANSLATE_RESOURCE() method, implemented in the Open Firmware/FDT PCI
driver, to perform this necessary translation without activating the
resource.

In addition to system KPI changes, LinuxKPI is updated to handle a
big-endian host, by adding proper endian swaps to the I/O functions.

Submitted by:	mmacy
Reported by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D21096
2019-08-04 19:28:10 +00:00
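The big-endian part of the powerpc change above can be illustrated with a portable little-endian load and store. In this sketch the byte buffer stands in for device registers, which is an assumption. Real accessors perform a single volatile 32-bit MMIO access and, on a big-endian host, byte-swap the value (for example with le32toh()). This sketch instead assembles the bytes, so it gives the same result on any host.

```c
#include <stdint.h>

/* Sketch: read a 32-bit little-endian value regardless of host byte order. */
static uint32_t
readl_le_sketch(const volatile uint8_t *reg)
{
	return ((uint32_t)reg[0] |
	    ((uint32_t)reg[1] << 8) |
	    ((uint32_t)reg[2] << 16) |
	    ((uint32_t)reg[3] << 24));
}

/* Sketch: write a 32-bit value in little-endian byte order. */
static void
writel_le_sketch(volatile uint8_t *reg, uint32_t v)
{
	reg[0] = (uint8_t)(v & 0xff);
	reg[1] = (uint8_t)((v >> 8) & 0xff);
	reg[2] = (uint8_t)((v >> 16) & 0xff);
	reg[3] = (uint8_t)(v >> 24);
}
```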
Konstantin Belousov
fc83c5a7d0 Add a randomized stack gap between the strings and the pointers to argv/envs.
This effectively makes the stack base on the csu _start entry
randomized.

The gap is enabled if ASLR for the ABI is enabled; then
kern.elf{64,32}.aslr.stack_gap specifies the maximum percentage of the
initial stack size that can be wasted on the gap.  Setting it to zero
disables the gap, and the maximum is capped at 50%.

Only amd64 for now.

Reviewed by:	cem, markj
Discussed with:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21081
2019-07-31 20:23:10 +00:00
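Here is a sketch of how the stack_gap tunable above could bound the gap. `stack_gap_sketch()` is hypothetical. The caller-supplied random value, the 16-byte alignment, and the rounding are assumptions. Only the rules "zero disables the gap" and "capped at 50%" come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical gap computation.  pct is the tunable (percent of the
 * initial stack size); rnd is a random value supplied by the caller.
 * Zero disables the gap, values above 50 are clamped to 50, and the
 * result is rounded down to an assumed 16-byte alignment.
 */
static size_t
stack_gap_sketch(size_t stack_size, unsigned int pct, uint64_t rnd)
{
	size_t max_gap;

	if (pct == 0)
		return (0);
	if (pct > 50)
		pct = 50;
	max_gap = stack_size / 100 * pct;
	if (max_gap == 0)
		return (0);
	return ((size_t)(rnd % max_gap) & ~(size_t)15);
}
```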
Konstantin Belousov
48d35b8f45 Regen.
2019-07-31 19:20:39 +00:00
Konstantin Belousov
4dd892181d freebsd32 shims for copy_file_range(2).
Reviewed by:	brooks, rmacklem (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21092
2019-07-31 19:20:05 +00:00
Kyle Evans
b5a7ac997f kern_shm_open: push O_CLOEXEC into caller control
The motivation for this change is to allow wrappers around shm to be written
that don't set CLOEXEC. kern_shm_open currently accepts O_CLOEXEC but sets
it unconditionally. kern_shm_open is used by the shm_open(2) syscall, which
is mandated by POSIX to set CLOEXEC, and by CloudABI's sys_fd_create1().
Presumably O_CLOEXEC is intended in the latter caller, but it's unclear from
the context.

sys_shm_open() now unconditionally sets O_CLOEXEC to meet POSIX
requirements, and a comment has been dropped into kern_fd_open() to explain
the situation and add a pointer to where O_CLOEXEC setting is maintained for
shm_open(2) correctness. CloudABI's sys_fd_create1() also unconditionally
sets O_CLOEXEC to match previous behavior.

This also has the side-effect of making flags correctly reflect the
O_CLOEXEC status on this fd for the rest of kern_shm_open(), but a
glance-over leads me to believe that it didn't really matter.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21119
2019-07-31 15:16:51 +00:00
Mark Johnston
918988576c Avoid relying on header pollution from sys/refcount.h.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-29 20:26:01 +00:00
Andriy Gapon
c66f5b079d linuxcommon: add module version
MFC after:	2 weeks
2019-07-10 13:47:10 +00:00
Tijl Coosemans
e2fba140a8 Let linuxulator mprotect mask unsupported bits before calling kern_mprotect.
After r349240 kern_mprotect returns EINVAL for unsupported bits in the prot
argument.  Linux rtld uses PROT_GROWSDOWN and PROT_GROWSUP when marking the
stack executable.  Mask these bits like kern_mprotect used to do.  For other
unsupported bits EINVAL is returned like Linux does.

Reviewed by:	trasz, brooks
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20864
2019-07-10 08:19:33 +00:00
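The masking rule in the mprotect commit above can be sketched as a small filter in front of the native call. The constant values are assumed from Linux's generic mman-common.h, and `linux_mprotect_prot_sketch()` is a hypothetical helper, not the linuxulator function. The filter does three things: it strips the GROWS bits, rejects any other unknown bit with EINVAL, and passes the remaining rwx bits on.

```c
#include <errno.h>

/* Linux prot values (assumed from Linux's generic mman-common.h). */
#define LINUX_PROT_READ		0x1
#define LINUX_PROT_WRITE	0x2
#define LINUX_PROT_EXEC		0x4
#define LINUX_PROT_GROWSDOWN	0x01000000
#define LINUX_PROT_GROWSUP	0x02000000

#define LINUX_PROT_RWX	(LINUX_PROT_READ | LINUX_PROT_WRITE | LINUX_PROT_EXEC)
#define LINUX_PROT_GROWS (LINUX_PROT_GROWSDOWN | LINUX_PROT_GROWSUP)

/*
 * Hypothetical filter: silently drop the GROWS bits, reject any other
 * unsupported bit with EINVAL, and return the rwx bits in *out for the
 * native mprotect path.
 */
static int
linux_mprotect_prot_sketch(int prot, int *out)
{
	prot &= ~LINUX_PROT_GROWS;
	if ((prot & ~LINUX_PROT_RWX) != 0)
		return (EINVAL);
	*out = prot;
	return (0);
}
```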
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
Ed Maste
b97ebbbf72 Update Linux compat version to 2.6.36
New system calls between 2.6.32 and 2.6.36 are already implemented.

This should be mostly NFC as far as contemporary Linux applications are
concerned though, as Linux kernel 3.2 is the oldest supported by a
number of popular distros today; work is in progress by others to enable
support for those applications.

Discussed with:	trasz
MFC after:	1 month
2019-07-04 20:42:08 +00:00
Edward Tomasz Napierala
0fabd7b5cc Return ENOTSUP for Linux FS_IOC_FIEMAP ioctl.
Linux man(1) calls it for no good reason; this avoids the console spam
(eg '(man): ioctl fd=4, cmd=0x660b ('f',11) is not implemented').

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20690
2019-07-04 20:16:04 +00:00
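The console message in the FIEMAP commit above identifies the ioctl as type 'f', number 11. This sketch assumes the usual Linux _IOC layout, with the command number in bits 0-7 and the type in bits 8-15, and shows the decoding and the ENOTSUP answer. The handler name and the ENOTTY fallback are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Linux _IOC layout: command number in bits 0-7, type in bits 8-15. */
static unsigned int
ioc_type(uint32_t cmd)
{
	return ((cmd >> 8) & 0xff);
}

static unsigned int
ioc_nr(uint32_t cmd)
{
	return (cmd & 0xff);
}

/* Hypothetical handler: answer FS_IOC_FIEMAP ('f', 11) with ENOTSUP. */
static int
linux_ioctl_fiemap_sketch(uint32_t cmd)
{
	if (ioc_type(cmd) == 'f' && ioc_nr(cmd) == 11)
		return (ENOTSUP);
	return (ENOTTY);	/* not handled by this sketch */
}
```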
Edward Tomasz Napierala
2478d444d1 Fix linuxulator prlimit64(2) with pid == 0. This makes 'ulimit -a'
return something reasonable, and helps Linux binaries that attempt
to close all the files, eg apt(8).

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20692
2019-07-04 19:40:01 +00:00
Hans Petter Selasky
8996977a89 Remove dead code added after r348743 in the LinuxKPI. The
LINUXKPI_VERSION macro is not defined for any compiled LinuxKPI code
which basically means __GFP_NOTWIRED is never checked when allocating
pages. This should work fine with the existing external DRM code as
long as the page wiring and unwiring is balanced.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-07-03 09:48:20 +00:00
Mark Johnston
fc795c25d4 Remove the CDIOCREADSUBCHANNEL_SYSSPACE ioctl.
This was added for emulation of Linux's CDROMSUBCHNL, but allows
users with read access to a cd(4) device to overwrite kernel memory
provided that the driver detects some media present.

Reimplement CDROMSUBCHNL by bouncing the data from CDIOCREADSUBCHANNEL
through the linux_cdrom_subchnl structure passed from userspace.

admbugs:	768
Reported by:	Alex Fortune
Security:	CVE-2019-5602
Security:	FreeBSD-SA-19:11.cd_ioctl
2019-07-03 00:10:01 +00:00
Konstantin Belousov
5dc7e31a09 Control implicit PROT_MAX() using procctl(2) and the FreeBSD note
feature bit.

In particular, allocate the bit to opt the image out of implicit
PROTMAX enablement.  Provide procctl(2) verbs to set and query
implicit PROTMAX handling.  The knobs mimic the same per-image flag
and per-process controls for ASLR.

Reviewed by:	emaste, markj (previous version)
Discussed with:	brooks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D20795
2019-07-02 19:07:17 +00:00
Johannes Lundberg
6425fed7e6 LinuxKPI: Additions to rcu list.
- Add rcu list functions.
- Make rcu hlist's foreach macro use rcu calls instead of the non-rcu macro.
- Bump FreeBSD version so we have a checkpoint for the vboxvideo drm driver.

Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	D20719
2019-06-21 18:48:07 +00:00
Johannes Lundberg
62260f68b4 LinuxKPI: Add atomic_long_sub macro.
Reviewed by:	imp (mentor), hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	D20718
2019-06-21 16:43:16 +00:00
Mark Johnston
88ea538a98 Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m).
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone.  But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place.  No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20470
2019-06-07 18:23:29 +00:00
Mark Johnston
1ef5e651fd Make the linuxkpi's alloc_pages() consistently return wired pages.
Previously it did this only on platforms without a direct map.  This
also more closely matches Linux's semantics.

Since some DRM v5.0 code assumes the old behaviour, use a
LINUXKPI_VERSION guard to preserve that until the out-of-tree module
is updated.

Reviewed by:	hselasky, kib (earlier versions), johalun
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20502
2019-06-06 16:09:19 +00:00
Brooks Davis
4af6033324 makesyscalls.sh: always use absolute path for syscalls.conf
syscalls.conf is included using "." which per the Open Group:

 If file does not contain a <slash>, the shell shall use the search
 path specified by PATH to find the directory containing file.

POSIX shells don't fall back to the current working directory.

Submitted by:	Nathaniel Wesley Filardo <nwf20@cl.cam.ac.uk>
Reviewed by:	bdrewery
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20476
2019-05-30 20:56:23 +00:00
Dmitry Chagin
c5afec6e89 Complete LOCAL_PEERCRED support. Cache pid of the remote process in the
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.

PR:		215202
Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20415
2019-05-30 14:24:26 +00:00
Dmitry Chagin
1410bfe142 Linux does not support MSG_OOB for unix(4) or non-stream-oriented sockets;
return EOPNOTSUPP as Linux does.

Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20409
2019-05-30 14:21:51 +00:00
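The MSG_OOB rule above amounts to a pre-check before sending. `check_msg_oob_sketch()` is a hypothetical helper, not the linuxulator code. It refuses MSG_OOB with EOPNOTSUPP on unix-domain sockets and on any socket that is not stream-oriented; otherwise it returns 0 and the send proceeds.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical pre-check: MSG_OOB is refused with EOPNOTSUPP on
 * unix-domain sockets and on non-stream sockets, as Linux does.
 */
static int
check_msg_oob_sketch(int domain, int type, int flags)
{
	if ((flags & MSG_OOB) == 0)
		return (0);
	if (domain == AF_UNIX || type != SOCK_STREAM)
		return (EOPNOTSUPP);
	return (0);
}
```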
Dmitry Chagin
8128cfc59e Do not leak sa in linux_recvmsg() call if kern_recvit() fails.
MFC after:	1 week
2019-05-21 18:08:19 +00:00
Dmitry Chagin
57cb29a73e Do not use uninitialised sa.
Reported by:	tijl@
MFC after:	1 week
2019-05-21 18:05:57 +00:00
Dmitry Chagin
dcd6241868 Do not leak sa in linux_recvfrom() call if kern_recvit() fails.
MFC after:	1 week
2019-05-21 18:03:58 +00:00
Conrad Meyer
e12be3218a Include eventhandler.h in more compilation units
This was enumerated with exhaustive search for sys/eventhandler.h includes,
cross-referenced against EVENTHANDLER_* usage with the comm(1) utility.  Manual
checking was performed to avoid redundant includes in some drivers where a
common os_bsd.h (for example) included sys/eventhandler.h indirectly, but it is
possible some of these are redundant with driver-specific headers in ways I
didn't notice.

(These CUs did not show up as missing eventhandler.h in tinderbox.)

X-MFC-With:	r347984
2019-05-21 01:18:43 +00:00