Commit Graph

1173 Commits

Author SHA1 Message Date
Ed Maste
2eb6ef203a linux: add trivial renameat2 implementation
Just return EINVAL if flags != 0.  The Linux man page documents one
case of EINVAL as "The filesystem does not support one of the flags in
flags."

After r351723 userland binaries will try using new system calls.

Reported by:	mjg
Reviewed by:	mjg, trasz
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21590
2019-09-11 13:01:59 +00:00
Edward Tomasz Napierala
1d3a302b4a Bump Linux version to 3.2.0. Without it, binaries linked against
glibc 2.24 and up (eg Ubuntu 19.04) fail with "FATAL: kernel too old".

This alone is not enough to make newer binaries actually work;
fix/hack/workaround is pending review at https://reviews.freebsd.org/D20687.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20757
2019-09-02 18:10:35 +00:00
Edward Tomasz Napierala
7a8cbc5288 Relax compat.linux.osrelease checks. This way one can do eg
'compat.linux.osrelease=3.10.0-957.12.1.el7.x86_64', which
corresponds to CentOS 7.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20685
2019-09-02 16:57:42 +00:00
Konstantin Belousov
bb9e2184f0 Change locking requirements for VOP_UNSET_TEXT().
Require the vnode to be locked for the VOP_UNSET_TEXT() call.  This
will be used by the following bug fix for a tmpfs issue.

Tested by:	sbruno, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-18 20:24:52 +00:00
Konstantin Belousov
62375ca79c compat/linux: Remove obsoleted and somewhat confusing comments related to COMPAT_43.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21200
2019-08-11 19:17:29 +00:00
Andriy Gapon
c66f5b079d linuxcommon: add module version
MFC after:	2 weeks
2019-07-10 13:47:10 +00:00
Tijl Coosemans
e2fba140a8 Let linuxulator mprotect mask unsupported bits before calling kern_mprotect.
After r349240 kern_mprotect returns EINVAL for unsupported bits in the prot
argument.  Linux rtld uses PROT_GROWSDOWN and PROT_GROWS_UP when marking the
stack executable.  Mask these bits like kern_mprotect used to do.  For other
unsupported bits EINVAL is returned like Linux does.

Reviewed by:	trasz, brooks
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20864
2019-07-10 08:19:33 +00:00
Ed Maste
b97ebbbf72 Update Linux compat version to 2.6.36
New system calls between 2.6.32 and 2.6.26 are already implemented.

This should be mostly NFC as far as contemporary Linux applications are
concerned though, as Linux kernel 3.2 is the oldest supported by a
number of popular distros today; work is in progress by others to enable
support for those applications.

Discussed with:	trasz
MFC after:	1 month
2019-07-04 20:42:08 +00:00
Edward Tomasz Napierala
0fabd7b5cc Return ENOTSUP for Linux FS_IOC_FIEMAP ioctl.
Linux man(1) calls it for no good reason; this avoids the console spam
(eg '(man): ioctl fd=4, cmd=0x660b ('f',11) is not implemented').

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20690
2019-07-04 20:16:04 +00:00
Edward Tomasz Napierala
2478d444d1 Fix linuxulator prlimit64(2) with pid == 0. This makes 'ulimit -a'
return something reasonable, and helps linux binaries which attempt
to close all the files, eg apt(8).

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20692
2019-07-04 19:40:01 +00:00
Mark Johnston
fc795c25d4 Remove the CDIOCREADSUBCHANNEL_SYSSPACE ioctl.
This was added for emulation of Linux's CDROMSUBCHNL, but allows
users with read access to a cd(4) device to overwrite kernel memory
provided that the driver detects some media present.

Reimplement CDROMSUBCHNL by bouncing the data from CDIOCREADSUBCHANNEL
through the linux_cdrom_subchnl structure passed from userspace.

admbugs:	768
Reported by:	Alex Fortune
Security:	CVE-2019-5602
Security:	FreeBSD-SA-19:11.cd_ioctl
2019-07-03 00:10:01 +00:00
Dmitry Chagin
c5afec6e89 Complete LOCAL_PEERCRED support. Cache pid of the remote process in the
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.

PR:		215202
Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20415
2019-05-30 14:24:26 +00:00
Dmitry Chagin
1410bfe142 Linux does not support MSG_OOB for unix(4) or non-stream oriented socket,
return EOPNOTSUPP as a Linux does.

Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20409
2019-05-30 14:21:51 +00:00
Dmitry Chagin
8128cfc59e Do not leak sa in linux_recvmsg() call if kern_recvit() fails.
MFC after:	1 week
2019-05-21 18:08:19 +00:00
Dmitry Chagin
57cb29a73e Do not use uninitialised sa.
Reported by:	tijl@
MFC after:	1 week
2019-05-21 18:05:57 +00:00
Dmitry Chagin
dcd6241868 Do not leak sa in linux_recvfrom() call if kern_recvit() fails.
MFC after:	1 week
2019-05-21 18:03:58 +00:00
Edward Tomasz Napierala
d49fb289c8 Implement PTRACE_O_TRACESYSGOOD. This makes Linux strace(1) work.
Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20200
2019-05-19 12:58:44 +00:00
Dmitry Chagin
aa28871254 Linux send() call returns EAGAIN instead of ENOTCONN in case when the
socket is non-blocking and connect() is not finished yet.

Initial patch developed by Steven Hartland in 2008 and adopted by me.

PR:		129169
Reported by:	smh@
MFC after:	2 weeks
2019-05-19 09:23:20 +00:00
Dmitry Chagin
c5156c7785 Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR:		222861
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20178
2019-05-13 18:24:29 +00:00
Dmitry Chagin
caaad8736e Linuxulator getpeername() returns EINVAL in case then namelen less then 0.
MFC after:	2 weeks
2019-05-13 18:14:20 +00:00
Dmitry Chagin
d5368bf3df Our bsd_to_linux_sockaddr() and linux_to_bsd_sockaddr() functions
alter the userspace sockaddr to convert the format between linux and BSD versions.
That's the minimum 3 of copyin/copyout operations for one syscall.

Also some syscall uses linux_sa_put() and linux_getsockaddr() when load
sockaddr to userspace or from userspace accordingly.

To avoid this chaos, especially converting sockaddr in the userspace,
rewrite these 4 functions to convert sockaddr only in kernel and leave
only 2 of this functions.

Also in order to reduce duplication between MD parts of the Linuxulator put
struct sockaddr conversion functions that are MI out into linux_common module.

PR:		232920
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20157
2019-05-13 17:48:16 +00:00
Dmitry Chagin
bbac65c772 Rewrite linux_ifflags() in more readable Linuxulator style.
Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20146
2019-05-06 19:57:51 +00:00
Dmitry Chagin
9c1437ae57 Complete r347052 (https://reviews.freebsd.org/D20137) as it it was not
a final revision.

Fix style issues and change bool-like variables from int to bool.

Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20141
2019-05-06 19:56:13 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Dmitry Chagin
d151344dbf In order to reduce duplication between MD parts of the Linuxulator
move bits that are MI out into the headers in compat/linux.
For that remove bogus _packed attribute from struct l_sockaddr
and use MI types for struct members.

And continue to move into the linux_common module a code that is
intended for both Linuxulator modules (both instruction set - 32 & 64 bit)
or for external modules like linsysfs or linprocfs.

To avoid header pollution introduce new sys/compat/linux_common.h header.

Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20137
2019-05-03 08:42:49 +00:00
Dmitry Chagin
5d520d7fab Follow the FreeBSD and implement PDEATH_SIG prctl ops in the Linuxulator.
It was first introduced in r163734 and missied by me in r283383.

MFC after:	1 week
2019-04-30 17:18:05 +00:00
Mariusz Zaborski
a1304030b8 Introduce funlinkat syscall that always us to check if we are removing
the file associated with the given file descriptor.

Reviewed by:	kib, asomers
Reviewed by:	cem, jilles, brooks (they reviewed previous version)
Discussed with:	pjd, and many others
Differential Revision:	https://reviews.freebsd.org/D14567
2019-04-06 09:34:26 +00:00
Dmitry Chagin
803fff9065 Whitespace cleanup (annoying).
MFC after:	1 month
2019-03-24 15:08:30 +00:00
Dmitry Chagin
f730d606d5 Update syscall.master to 5.0.
For 32-bit Linuxulator, ipc() syscall was historically
the entry point for the IPC API. Starting in Linux 4.18, direct
syscalls are provided for the IPC. Enable it.

MFC after:	1 month
2019-03-24 14:50:02 +00:00
Dmitry Chagin
7dabf89bcf Linux between 4.18 and 5.0 split IPC system calls.
In preparation for doing this in the Linuxulator modify our linux_shmat()
to match actual Linux shmat() system call.

MFC after:	1 month
2019-03-24 14:44:35 +00:00
Ed Maste
347a8ed1bf linuxulator: fix stack memory disclosure in linux_sigaltstack
Most siginfo_to_lsiginfo callers already zeroed the l_siginfo_t before
callit it, but linux_waitid did not.  Instead of zeroing in the called
function to address linux_waitid (as in commit 2e6ebe70), just do it in
linux_waitid.

admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	Andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 17:12:16 +00:00
Ed Maste
9866e7bbae linuxulator: fix stack memory disclosure in linux_ioctl_termio
admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 16:21:03 +00:00
Ed Maste
4308a37410 linuxulator: fix stack memory disclosure in linux_ioctl_v4l
admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	andrew
MFC after:	1 day
Security:	Kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 16:19:02 +00:00
Kirk McKusick
88640c0e8b Create new EINTEGRITY error with message "Integrity check failed".
An integrity check such as a check-hash or a cross-correlation failed.
The integrity error falls between EINVAL that identifies errors in
parameters to a system call and EIO that identifies errors with the
underlying storage media. EINTEGRITY is typically raised by intermediate
kernel layers such as a filesystem or an in-kernel GEOM subsystem when
they detect inconsistencies. Uses include allowing the mount(8) command
to return a different exit value to automate the running of fsck(8)
during a system boot.

These changes make no use of the new error, they just add it. Later
commits will be made for the use of the new error number and it will
be added to additional manual pages as appropriate.

Reviewed by:    gnn, dim, brueffer, imp
Discussed with: kib, cem, emaste, ed, jilles
Differential Revision: https://reviews.freebsd.org/D18765
2019-01-17 06:35:45 +00:00
Mark Johnston
bb376a990c Specify the correct option level when emulating SO_PEERCRED.
Our equivalent to SO_PEERCRED, LOCAL_PEERCRED, is implemented at
socket option level 0, not SOL_SOCKET.

PR:		234722
Submitted by:	Dániel Bakai <bakaidl@gmail.com>
MFC after:	2 weeks
2019-01-08 17:21:59 +00:00
Mateusz Guzik
cc426dd319 Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
Mark Johnston
792843c38f Pass malloc flags directly through kevent(2) subroutines.
Some kevent functions have a boolean "waitok" parameter for use when
calling malloc(9).  Replace them with the corresponding malloc() flags:
the desired behaviour is known at compile-time, so this eliminates a
couple of conditional branches, and makes the code easier to read.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18318
2018-11-24 17:06:01 +00:00
Tijl Coosemans
7df0e7beb7 Fix another user address dereference in linux_sendmsg syscall.
This was hidden behind the LINUX_CMSG_NXTHDR macro which dereferences its
second argument.  Stop using the macro as well as LINUX_CMSG_FIRSTHDR.  Use
the size field of the kernel copy of the control message header to obtain
the next control message.

PR:		217901
MFC after:	2 days
X-MFC-With:	r340631
2018-11-20 14:18:57 +00:00
Tijl Coosemans
e3b385fc95 Do proper copyin of control message data in the Linux sendmsg syscall.
Instead of calling m_append with a user address, allocate an mbuf cluster
and copy data into it using copyin.  For the SCM_CREDS case, instead of
zeroing a stack variable and appending that to the mbuf, zero part of the
mbuf cluster directly.  One mbuf cluster is also the size limit used by
the FreeBSD sendmsg syscall (uipc_syscalls.c:sockargs()).

PR:		217901
Reviewed by:	kib
MFC after:	3 days
2018-11-19 15:31:54 +00:00
Mateusz Guzik
2c054ce924 proc: always store parent pid in p_oppid
Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17825
2018-11-16 17:07:54 +00:00
Brooks Davis
318f0d7720 Use declared types for caddr_t arguments.
Leave ptrace(2) alone for the moment as it's defined to take a caddr_t.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:46:38 +00:00
Tijl Coosemans
8fc08087a1 On amd64 both Linux compat modules, linux.ko and linux64.ko, provide
linux_ioctl_(un)register_handler that allows other driver modules to
register ioctl handlers.  The ioctl syscall implementation in each Linux
compat module iterates over the list of handlers and forwards the call to
the appropriate driver.  Because the registration functions have the same
name in each module it is not possible for a driver to support both 32 and
64 bit linux compatibility.

Move the list of ioctl handlers to linux_common.ko so it is shared by
both Linux modules and all drivers receive both 32 and 64 bit ioctl calls
with one registration.  These ioctl handlers normally forward the call
to the FreeBSD ioctl handler which can handle both 32 and 64 bit.

Keep the special COMPAT_LINUX32 ioctl handlers in linux.ko in a separate
list for now and let the ioctl syscall iterate over that list first.
Later, COMPAT_LINUX32 support can be added to the 64 bit ioctl handlers
via a runtime check for ILP32 like is done for COMPAT_FREEBSD32 and then
this separate list would disappear again.  That is a much bigger effort
however and this commit is meant to be MFCable.

This enables linux64 support in x11/nvidia-driver*.

PR:		206711
Reviewed by:	kib
MFC after:	3 days
2018-11-06 13:51:08 +00:00
Konstantin Belousov
4f77f48884 Implement O_BENEATH and AT_BENEATH.
Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory.  Supposedly the interface is similar
to the same proposed Linux flags.

Reviewed by:	jilles (code, previous version of manpages), 0mp (manpages)
Discussed with:	allanjude, emaste, jonathan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17547
2018-10-25 22:16:34 +00:00
Mark Johnston
c7902fbeae Improve handling of control message truncation.
If a recvmsg(2) or recvmmsg(2) caller doesn't provide sufficient space
for all control messages, the kernel sets MSG_CTRUNC in the message
flags to indicate truncation of the control messages.  In the case
of SCM_RIGHTS messages, however, we were failing to dispose of the
rights that had already been externalized into the recipient's file
descriptor table.  Add a new function and mbuf type to handle this
cleanup task, and use it any time we fail to copy control messages
out to the recipient.  To simplify cleanup, control message truncation
is now only performed at control message boundaries.

The change also fixes a few related bugs:
- Rights could be leaked to the recipient process if an error occurred
  while copying out a message's contents.
- We failed to set MSG_CTRUNC if the truncation occurred on a control
  message boundary, e.g., if the caller received two control messages
  and provided only the exact amount of buffer space needed for the
  first.

PR:		131876
Reviewed by:	ed (previous version)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16561
2018-08-07 16:36:48 +00:00
Alan Somers
6040822c4e Make timespecadd(3) and friends public
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub.  NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions.  Solaris also defines a three-argument version, but
only in its kernel.  This revision changes our definition to match the
common three-argument version.

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with:	cem, jilles, ian, bde
Differential Revision:	https://reviews.freebsd.org/D14725
2018-07-30 15:46:40 +00:00
Ed Maste
e8a1ec3e05 Split kern_break from sys_break and use it in linuxulator
Previously the linuxulator's linux_brk invoked the FreeBSD sys_break
syscall implementation directly.  Instead, move the bulk of the existing
implementation to kern_break, and call that from both sys_break and
linux_brk.

This also addresses a minor bug in linux_brk in that we now return the
actual (rounded up) break address, rather than the requested value.

Reviewed by:	brooks (earlier version)
Sponsored by:	Turing Robotic Industries
Differential Revision:	https://reviews.freebsd.org/D16019
2018-06-27 14:45:13 +00:00
Ed Maste
9c42fa94a6 Quiet unused fn warning for linuxulator w/o legacy syscalls
Sponsored by:	Turing Robotic Industries
2018-06-25 19:24:50 +00:00
Chuck Tuffli
3575504976 Fix the Linux kernel version number calculation
The Linux compatibility code was converting the version number (e.g.
2.6.32) in two different ways and then comparing the results.

The linux_map_osrel() function converted MAJOR.MINOR.PATCH similar to
what FreeBSD does natively. I.e. where major=v0, minor=v1, and patch=v2
    v = v0 * 1000000 + v1 * 1000 + v2;

The LINUX_KERNVER() macro, on the other hand, converted the value with
bit shifts. I.e. where major=a, minor=b, and patch=c
    v = (((a) << 16) + ((b) << 8) + (c))

The Linux kernel uses the later format via the KERNEL_VERSION() macro in
include/generated/uapi/linux/version.h

Fix is to use the LINUX_KERNVER() macro in linux_map_osrel() as well as
in the .trans_osrel functions.

PR: 229209
Reviewed by: emaste, cem, imp (mentor)
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D15952
2018-06-22 00:02:03 +00:00
Konstantin Belousov
31665c1abb linux_clone_thread: mark new thread as TDB_BORN.
So that the ptrace code will catch it and report it to attached
debugger.  Enables debugging of threaded Linux binaries with FreeBSD
debugger.

Submitted by:	Yanko Yankulov <yanko.yankulov@gmail.com>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15880
2018-06-21 21:15:04 +00:00
Ed Maste
aec1e6d390 linuxulator: handle V3 capget/capset
Linux 2.6.26 introduced 64-bit capability sets.  Extend our stub
implementation to handle both 32- and 64-bit.  (We still report no
capabilities in capget, and disallow any in capset.)

Reviewed by:	chuck
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D15887
2018-06-19 21:26:23 +00:00