Commit Graph

139 Commits

Author SHA1 Message Date
Edward Tomasz Napierala
9d167945e8 linux: improve reporting for unsupported syscall flags
Filter out the flags we do support; previously we would print
out the flag value verbatim.

Reviewed By:	dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30693
2021-06-15 10:18:18 +01:00
Philippe Michaud-Boudreault
2362ad457a linux: implement statx(2)
PR:		252106
Reviewed By:	dchagin
Differential Revision:	https://reviews.freebsd.org/D30466
2021-06-08 10:08:56 +01:00
Edward Tomasz Napierala
4b45c2bb83 linux: make fstatat(2) handle AT_EMPTY_PATH
Without it, Qt5 apps from Focal fail to start, being unable to load
their plugins.  It's also necessary for glibc 2.33, as found in recent
Arch snapshots.

PR:		254112
Reviewed By:	kib
Sponsored by:	The FreeBSD Foundation, EPSRC
Differential Revision:	https://reviews.freebsd.org/D28192
2021-04-16 08:56:19 +01:00
Edward Tomasz Napierala
1a34e9fad6 Fix potential race condition in linux stat(2).
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25618
2020-10-20 17:19:10 +00:00
Mateusz Guzik
1a18003240 compat: clean up empty lines in .c and .h files 2020-09-01 21:24:33 +00:00
Mateusz Guzik
7ad2a82da2 vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error
Most consumers pass NULL.
2020-08-19 02:51:17 +00:00
Mateusz Guzik
a125ed50a6 linux: add sysctl compat.linux.use_emul_path
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.

It defaults to on (== current behavior).

make -C /root/linux-5.3-rc8 -s -j 1 bzImage:

use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
2020-08-18 22:04:22 +00:00
Edward Tomasz Napierala
17f701a3fb Make linux stat(2) return the same st_dev for every devfs instance.
The reason for this is to work around an idiosyncrasy of glibc
getttynam(3) implementation: it checks whether st_dev returned for
fd 0 is the same as st_dev returned for the target of /proc/self/fd/0
symlink, and with linux chroots having their own devfs instance,
the check will fail if you chrooted into it.

PR:		kern/240767
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25559
2020-07-11 13:08:16 +00:00
Mateusz Guzik
c8b3463dd0 vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036
2020-01-07 15:56:24 +00:00
Edward Tomasz Napierala
273ce4ae1b Make Linux stat(2) et al distinguish between block and character
devices.  It's required for LTP, among other things.  It's not
complete, but good enough for now.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22950
2019-12-29 13:54:02 +00:00
Dmitry Chagin
c5156c7785 Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR:		222861
Reviewed by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20178
2019-05-13 18:24:29 +00:00
Ed Maste
931e2a1a6e linuxulator: do not include legacy syscalls on arm64
Existing linuxulator platforms (i386, amd64) support legacy syscalls,
such as non-*at ones like open, but arm64 and other new platforms do
not.

Wrap these in #ifdef LINUX_LEGACY_SYSCALLS, #defined in the MD linux.h
files.  We may need finer grained control in the future but this is
sufficient for now.

Reviewed by:	andrew
Sponsored by:	Turing Robotic Industries
Differential Revision:	https://reviews.freebsd.org/D15237
2018-06-15 14:41:51 +00:00
Bruce Evans
407a812657 Oops, r335053 had an old version of the comment about 16-bit linux dev_t
translation.
2018-06-13 12:44:45 +00:00
Bruce Evans
ab35e1c71b Fix the encoding of major and minor numbers in 64-bit dev_t by restoring
the old encodings for the lower 16 and 32 bits and only using the
higher 32 bits for unusually large major and minor numbers.  This
change breaks compatibility with the previous encoding (which was only
used in -current).

Fix truncation to (essentially) 16-bit dev_t in newnfs v3.

Any encoding of device numbers gives an ABI, so it can't be changed
without translations for compatibility.  Extra bits give the much
larger complication that the translations need to compress into fewer
bits.  Fortunately, more than 32 bits are rarely needed, so
compression is rarely needed except for 16-bit linux dev_t where it
was always needed but never done.

The previous encoding moved the major number into the top 32 bits.
Almost no translation code handled this, so the major number was blindly
truncated away in most 32-bit encodings.  E.g., for ffs, mknod(8) with
major = 1 and minor = 2 gave dev_t = 0x10000002; ffs cannot represent
this and blindly truncated it to 2.  But if this mknod was run on any
released version of FreeBSD, it gives dev_t = 0x102.  ffs can represent
this, but in the previous encoding it was not decoded, giving major = 0,
minor = 0x102.

The presence of bugs was most obvious for exporting dev_t's from an
old system to -current, since bugs in newnfs augment them.  I fixed
oldnfs to support 32-bit dev_t in 1996 (r16634), but this regressed
to 16-bit dev_t in newnfs, first to the old 16-bit encoding and then
further in -current.  E.g., old ad0 with major = 234, minor = 0x10002
had the correct (major, minor) number on the wire, but newnfs truncated
this to (234, 2) and then the previous encoding shifted the major
number into oblivion as seen by ffs or old applications.

I first tried to fix this by translating on every ABI/API boundary, but
there are too many boundaries and too many sloppy translations by blind
truncation.  So use the old encoding for the low 32 bits so that sloppy
translations work no worse than before provided the high 32 bits are
not set.  Add some error checking for when bits are lost.  Keep not
doing any error checking for translations for almost everything in
compat/linux.

compat/freebsd32/freebsd32_misc.c:
Optionally check for losing bits after possibly-truncating assignments as
before.

compat/linux/linux_stats.c:
Depend on the representation being compatible with Linux's (or just with
itself for local use) and spell some of the translations as assignments in
a macro that hides the details.

fs/nfsclient/nfs_clcomsubs.c:
Essentially the same fix as in 1996, except there is now no possible
truncation in makedev() itself.  Also fix nearby style bugs.

kern/vfs_syscalls.c:
As for freebsd32.  Also update the sysctl description to include file
numbers, and change it to describe device ids as device numbers.

sys/types.h:
Use inline functions (wrapped by macros) since the expressions are now
a bit too complicated for plain macros.  Describe the encoding and
some of the reasons for it.  16-bit compatibility didn't leave many
reasonable choices for the 32-bit encoding, and 32-bit compatibility
doesn't leave many reasonable choices for the 64-bit encoding.  My
choice is to put the 8 new minor bits in the low 8 bits of the top 32
bits.  This minimizes discontiguities.

Reviewed by:	kib (except for rewrite of the comment in linux_stats.c)
2018-06-13 12:22:00 +00:00
Bruce Evans
372639f944 Fix some bugs found while fixing the representation and translation
of 64-bit dev_t's (but not ones involving dev_t's).

st_size was supposed to be clamped in cvtstat() and linux's copy_stat(),
but the clamping code wasn't aware that st_size is signed, and also had
an obfuscated off-by-1 value for the unsigned limit, so its effect was
to produce a bizarre negative size instead of clamping.

Change freebsd32's copy_ostat() to be no worse than cvtstat().  It was
missing clamping and bzero()ing of padding.

Reviewed by:	kib (except a final fix of the clamp to the signed maximum)
2018-06-13 08:50:43 +00:00
Matt Macy
cbd92ce62e Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
Ed Maste
340f4a8d3e Linuxulator: apply style(9) to return
Sponsored by:	Turing Robotic Industries Inc.
2018-03-12 15:35:24 +00:00
Ed Maste
0ba1b36553 Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text.  To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.  Additional files
waiting on permission from others are listed in review D14210.

Approved by:	kan, marcel, sos, rdivacky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-02-16 15:00:14 +00:00
Pedro F. Giffuni
7f2d13d607 sys/compat: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:13:23 +00:00
Pedro F. Giffuni
2c75d7b08d Small style(9) issue: spaces vs TAB. 2017-09-24 20:57:03 +00:00
Ed Maste
dbaa9ebf1b Add ZFS to Linux statfs ftype
PR:		220086
Reviewed by:	cem
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D11252
2017-06-18 11:51:03 +00:00
Edward Tomasz Napierala
e801ac7852 Fix linux_fstatfs() to return proper value for f_frsize. Without it,
linux df(1) binary from Xenial shows garbage.

Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9692
2017-02-25 20:32:37 +00:00
Edward Tomasz Napierala
3b51ec0886 Get rid of foo_sys() in linuxulator code. It was commented out, and it
would be useless anyway - there is no point in pretending to have block
devices; our "block" devices are in fact character ones, and can only
be accessed as such.

Discussed with:	dchagin
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-02-21 15:57:01 +00:00
Konstantin Belousov
2f304845e2 Do not allocate struct statfs on kernel stack.
Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable.  With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from:	ino64 work by gleb
Discussed with:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-05 17:19:26 +00:00
Dmitry Chagin
2ad0231309 Check bsd_to_linux_statfs() return value. Forgotten in r297070.
MFC after:	1 week
2016-03-20 19:06:21 +00:00
Dmitry Chagin
525c9796c3 Return EOVERFLOW in case when actual statfs values are large enough and
not fit into 32 bit fileds of a Linux struct statfs.

PR:		181012
MFC after:	1 week
2016-03-20 18:31:30 +00:00
Dmitry Chagin
7958a34cb5 Whitespaces, style(9) fixes. No functional changes.
MFC after:	1 week
2016-03-20 14:06:27 +00:00
Dmitry Chagin
99546279d6 Implement fstatfs64 system call.
PR:		181012
Submitted by:	John Wehle
MFC after:	1 week
2016-03-20 13:21:20 +00:00
Mateusz Guzik
f131759f54 fd: make 'rights' a manadatory argument to fget* functions 2015-07-05 19:05:16 +00:00
Dmitry Chagin
9802eb9ebc Implement Linux specific syncfs() system call. 2015-05-24 18:08:01 +00:00
Dmitry Chagin
2166e4e0a5 As for now our tmpfs is no longer being considered
"highly experimental" remove /dev/shm magic commited
in r218497 and convert tmpfs type to an expected magic number.

Differential Revision:	https://reviews.freebsd.org/D1497
Reviewed by:	emaste, trasz
2015-05-24 17:26:58 +00:00
Dmitry Chagin
606bcc1741 Add newfstatat system call for 64-bit Linuxulator.
Differential Revision:	https://reviews.freebsd.org/D1071
Reviewed by:	trasz
2015-05-24 15:48:34 +00:00
Dmitry Chagin
4ca75bed31 Fix compilation with -DDEBUG option.
Differential Revision:	https://reviews.freebsd.org/D1070
Reviewed by:	trasz
2015-05-24 15:47:15 +00:00
Dmitry Chagin
7f8f1d7f7a Disable i386 call for x86-64 Linux.
Differential Revision:	https://reviews.freebsd.org/D1067
Reviewed by:	trasz
2015-05-24 15:43:53 +00:00
Dmitry Chagin
297f61cc01 Get ready to commit x86_64 Linux emulation.
All fields of type l_int in struct statfs are defined
as l_long on i386 and amd64.

Differential Revision:	https://reviews.freebsd.org/D1064
Reviewed by:	trasz
2015-05-24 15:39:08 +00:00
Konstantin Belousov
6e646651d3 Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 18:01:51 +00:00
Ed Schouten
7870adb640 Remove direct access to si_name.
Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.

MFC after:	2 weeks
2012-02-10 12:35:57 +00:00
Ulrich Spörlein
9a14aa017b Convert files to UTF-8 2012-01-15 13:23:18 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Alexander Leidinger
529844c77c Linux' shm_open() fails because it wants to find some funky shmfs
to construct the full pathname. It starts to search at the default
mountpoint which is /dev/shm. If this fails it runs through fstab
and searches for shmfs and tmpfs. Whatever it finds will be
statfs()'ed to be checked for Linux' fs magic for shmfs (0x01021994).

Ideally our tmpfs should deliver this fs magic to Linux processes, but
as our tmpfs is considered to be an experimental feature we can not
assume that there is always a tmpfs available.

To make shared memory work in the Linuxulator, force the fs type of
/dev/shm (which can be a symlink) to match what Linux expects. The user
is responsible (info has to be added to the linux base ports and the docs)
to setup a suitable link for /dev/shm.

Noticed by:	Andre Albsmeier <Andre.Albsmeier@siemens.com>
Submitted by:	Andre Albsmeier <Andre.Albsmeier@siemens.com>
MFC after:	1 month
2011-02-09 20:23:22 +00:00
Ed Schouten
510ea843ba Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.
A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.
2010-03-28 13:13:22 +00:00
Pawel Jakub Dawidek
957d68dd91 No need to include security/mac/mac_framework.h here. 2010-02-18 22:26:01 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Jamie Gritton
7ae27ff49f Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions.  This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-07 18:36:47 +00:00
Ed Schouten
0eee862a54 Don't make Linux stat() open character devices to resolve its name.
The existing code calls kern_open() to resolve the vnode of a pathname
right after a stat(). This is not correct, because it causes random
character devices to be opened in /dev. This means ls'ing a tape
streamer will cause it to rewind, for example. Changes I have made:

- Add kern_statat_vnhook() to allow binary emulators to `post-process'
  struct stat, using the proper vnode.

- Remove unneeded printf's from stat() and statfs().

- Make the Linuxolator use kern_statat_vnhook(), replacing
  translate_path_major_minor_at().

- Let translate_fd_major_minor() use vp->v_rdev instead of
  vp->v_un.vu_cdev.

Result:

	crw-rw-rw- 1 root root   0, 14 Feb 20 13:54 /dev/ptmx
	crw--w---- 1 root adm  136,  0 Feb 20 14:03 /dev/pts/0
	crw--w---- 1 root adm  136,  1 Feb 20 14:02 /dev/pts/1
	crw--w---- 1 ed   tty  136,  2 Feb 20 14:03 /dev/pts/2

Before this commit, ptmx also had a major number of 136, because it
silently allocated and deallocated a pseudo-terminal. Device nodes that
cannot be opened now have proper major/minor-numbers.

Reviewed by:	kib, netchild, rdivacky (thanks!)
2009-02-20 13:05:29 +00:00
Ed Schouten
a4611ab612 Last step of splitting up minor and unit numbers: remove minor().
Inside the kernel, the minor() function was responsible for obtaining
the device minor number of a character device. Because we made device
numbers dynamically allocated and independent of the unit number passed
to make_dev() a long time ago, it was actually a misnomer. If you really
want to obtain the device number, you should use dev2udev().

We already converted all the drivers to use dev2unit() to obtain the
device unit number, which is still used by a lot of drivers. I've
noticed not a single driver passes NULL to dev2unit(). Even if they
would, its behaviour would make little sense. This is why I've removed
the NULL check.

Ths commit removes minor(), minor2unit() and unit2minor() from the
kernel. Because there was a naming collision with uminor(), we can
rename umajor() and uminor() back to major() and minor(). This means
that the makedev(3) manual page also applies to kernel space code now.

I suspect umajor() and uminor() isn't used that often in external code,
but to make it easier for other parties to port their code, I've
increased __FreeBSD_version to 800062.
2009-01-28 17:57:16 +00:00
Ed Schouten
bc093719ca Integrate the new MPSAFE TTY layer to the FreeBSD operating system.
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

  The old TTY layer has a driver model that is not abstract enough to
  make it friendly to use. A good example is the output path, where the
  device drivers directly access the output buffers. This means that an
  in-kernel PPP implementation must always convert network buffers into
  TTY buffers.

  If a PPP implementation would be built on top of the new TTY layer
  (still needs a hooks layer, though), it would allow the PPP
  implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

  With the old TTY layer, it isn't entirely safe to destroy TTY's from
  the system. This implementation has a two-step destructing design,
  where the driver first abandons the TTY. After all threads have left
  the TTY, the TTY layer calls a routine in the driver, which can be
  used to free resources (unit numbers, etc).

  The pts(4) driver also implements this feature, which means
  posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

  One of the major improvements is the per-TTY mutex, which is expected
  to improve scalability when compared to the old Giant locking.
  Another change is the unbuffered copying to userspace, which is both
  used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from:		//depot/projects/mpsafetty/...
Approved by:		philip (ex-mentor)
Discussed:		on the lists, at BSDCan, at the DevSummit
Sponsored by:		Snow B.V., the Netherlands
dcons(4) fixed by:	kan
2008-08-20 08:31:58 +00:00
Ed Schouten
a147e6cadf Push down the major/minor conversion for pts/%u to improve consistency.
In the mpsafetty branch, Linux sshd seems to work properly inside a
jail. Some small modifications had to be made to the Linux compatibility
layer.

The Linux PTY routines always expect the device major number to be 136
or higher. Our code always set the major/minor number pair to 136:0.
This makes routines like ttyname() and ptsname() fail, because we'll end
up having ambiguous device numbers.

The conversion was not performed on all *stat() routines, which meant in
some cases the numbers didn't get transformed. By pushing the conversion
into linux_driver_get_major_minor(), the transformation will take place
on all calls.

Approved by:	philip (mentor), rdivacky
2008-06-02 08:40:06 +00:00
Konstantin Belousov
48b05c3f82 Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
    renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by:	rdivacky
Sponsored by:	Google Summer of Code 2007
Tested by:	pho
2008-04-08 09:45:49 +00:00
Konstantin Belousov
d075105da0 After applying LCONVPATH() to the path, do use the converted path
instead of original user-mode string in the linux_stat() and
linux_lstat() syscalls.

Tested by:	Peter Holm
MFC after:	3 days
2008-01-05 12:36:35 +00:00