636 Commits

Author SHA1 Message Date
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
kib
d836be949c Remove the support for mknod(S_IFMT), which created dummy vnodes with
VBAD type.

FFS ffs_write() VOP catches such vnodes and panics, other VOPs do not
check for the type and their behaviour is really undefined.  The
comment claims that this support was done for 'badsect' to flag bad
sectors, we do not have such facility in kernel anyway.

Reported by:	Dmitry Vyukov <dvyukov@google.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-22 08:11:45 +00:00
emaste
5d14c78c8e allow posix_fallocate in capability mode
posix_fallocate is logically equivalent to writing zero blocks to the
desired file size and there is no reason to prevent calling it in
capability mode. posix_fallocate already checked for the CAP_WRITE
right, so we merely need to list it in capabilities.conf.

Reviewed by:	allanjude
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D12640
2017-10-12 15:45:53 +00:00
dchagin
b084ee9dfc Implement proper Linux /dev/fd and /proc/self/fd behavior by adding
Linux specific things to the native fdescfs file system.

Unlike FreeBSD, the Linux fdescfs is a directory containing a symbolic
links to the actual files, which the process has open.
A readlink(2) call on this file returns a full path in case of regular file
or a string in a special format (type:[inode], anon_inode:<file-type>, etc..).
As well as in a FreeBSD, opening the file in the Linux fdescfs directory is
equivalent to duplicating the corresponding file descriptor.

Here we have mutually exclusive requirements:
- in case of readlink(2) call fdescfs lookup() method should return VLNK
vnode otherwise our kern_readlink() fail with EINVAL error;
- in the other calls fdescfs lookup() method should return non VLNK vnode.

For what new vnode v_flag VV_READLINK was added, which is set if fdescfs has beed
mounted with linrdlnk option an modified kern_readlinkat() to properly handle it.

For now For Linux ABI compatibility mount fdescfs volume with linrdlnk option:

    mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd

Reviewed by:	kib@
MFC after:	1 week
Relnotes:	yes
2017-08-01 03:40:19 +00:00
kib
8aea8b1631 Define ino64_trunc_error under same conditions as the code which uses
the variable.

Noted by:	bde
Sponsored by:	The FreeBSD Foundation
2017-06-30 16:10:21 +00:00
kib
f703462277 Enhance vfs.ino64_trunc_error sysctl.
Provide a new mode "2" which returns a special overflow indicator in
the non-representable field instead of the silent truncation (mode
"0") or EOVERFLOW (mode "1").

In particular, the typical use of st_ino to detect hard links with
mode "2" reports false positives, which might be more suitable for
some uses.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
2017-06-09 11:17:08 +00:00
kib
6a43deb7a3 Add sysctl vfs.ino64_trunc_error controlling action on truncating
inode number or link count for the ABI compat binaries.

Right now, and by default after the change, too large 64bit values are
silently truncated to 32 bits.  Enabling the knob causes the system to
return EOVERFLOW for stat(2) family of compat syscalls when some
values cannot be completely represented by the old structures.  For
getdirentries(2), knob skips the dirents which would cause non-trivial
truncation of d_ino.

EOVERFLOW error is specified by the X/Open 1996 LFS document
('Adding Support for Arbitrary File Sizes to the Single UNIX
Specification').

Based on the discussion with:	bde
Sponsored by:	The FreeBSD Foundation
2017-06-05 11:40:30 +00:00
kib
e75ba1d5c4 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
emaste
1901c3e1f2 Remove register keyword from sys/ and ANSIfy prototypes
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by:	cem, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10193
2017-05-17 00:34:34 +00:00
rwatson
7bd4c799c3 Audit arguments to posix_fallocate(2) and posix_fadvise(2) system calls.
As posix_fadvise() does not lock the vnode argument, don't capture
detailed vnode information for the time being.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL
2017-03-31 14:17:14 +00:00
trasz
c8e06fdde3 Replace calls to sys_truncate() with kern_truncate().
Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9371
2017-01-31 15:19:44 +00:00
trasz
3612389ea7 Add kern_lseek() and use it instead of sys_lseek() in various compats.
I didn't touch svr4/, there's no point.

Reviewed by:	ed@, kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9366
2017-01-30 12:24:47 +00:00
trasz
69861f6d1f Replace sys_ftruncate() with kern_ftruncate() in various compats.
Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9368
2017-01-30 11:50:54 +00:00
kib
5def9fa2c2 Do not allocate struct statfs on kernel stack.
Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable.  With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from:	ino64 work by gleb
Discussed with:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-05 17:19:26 +00:00
kib
6d69bbcc31 Some style fixes for getfstat(2)-related code.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-05 17:03:35 +00:00
kib
e1aff3b457 The callers of kern_getfsstat(UIO_SYSSPACE) expect that *buf always
returns memory which must be freed, regardless of the error.  Assign
NULL to *buf in case we are not going to allocate any memory due to
invalid mode.

Reported and tested by:	pho
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks (together with r310638)
Differential revision:	https://reviews.freebsd.org/D9042
2017-01-04 16:09:45 +00:00
kib
778442cef7 There is no need to use temporary statfs buffer for fsid obliteration
and prison enforcement.  Do it on the caller buffer directly.

Besides eliminating memory copies, this change also removes large
structure from the kernel stack.

Extracted from:	ino64 work by gleb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-02 18:59:23 +00:00
kib
9a7682de00 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-02 18:49:48 +00:00
kib
577892f66a Move common code from kern_statfs() and kern_fstatfs() into a new helper.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-02 18:20:22 +00:00
jhb
9105298a66 Rename the 'flags' argument to getfsstat() to 'mode' and validate it.
This argument is not a bitmask of flags, but only accepts a single value.
Fail with EINVAL if an invalid value is passed to 'flag'.  Rename the
'flags' argument to getmntinfo(3) to 'mode' as well to match.

This is a followup to r308088.

Reviewed by:	kib
MFC after:	1 month
2016-12-27 20:21:11 +00:00
mjg
aa6fdf05e8 vfs: use vrefact in getcwd and fchdir 2016-12-12 19:16:35 +00:00
kib
a41f4cc9a5 Allow some dotdot lookups in capability mode.
If dotdot lookup does not escape from the file descriptor passed as
the lookup root, we can allow the component traversal.  Track the
directories traversed, and check the result of dotdot lookup against
the recorded list of the directory vnodes.

Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently
disabled by default until more verification of the approach is done.

Disallow non-local filesystems for dotdot, since remote server might
conspire with the local process to allow it to escape the namespace.
This might be too cautious, provide the knob
vfs.lookup_cap_dotdot_nonlocal to override as well.

Idea by:	rwatson
Discussed with:	emaste, jonathan, rwatson
Reviewed by:	mjg (previous version)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 week
Differential revision:	https://reviews.freebsd.org/D8110
2016-11-02 12:43:15 +00:00
trasz
2c0de38912 Fix getfsstat(2) with MNT_WAIT to not skip filesystems that are in the
process of being unmounted.  Previously it would skip them, even if the
unmount eventually failed eg due to the filesystem being busy.

This behaviour broke autounmountd(8) - if you tried to manually unmount
a mounted filesystem, using 'automount -u', and the autounmountd attempted
to refresh the filesystem list in that very moment, it would conclude that
the filesystem got unmounted and not try to unmount it afterwards.

Reviewed by:	kib@
Tested by:	pho@
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8030
2016-11-02 09:43:19 +00:00
trasz
5ea37b9562 Fix getfsstat(2) handling of flags. The 'flags' argument is an enum,
not a bitfield. For the intended usage - being passed either MNT_WAIT,
or MNT_NOWAIT - this shouldn't introduce any changes in behaviour.

Reviewed by:	jhb@
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8373
2016-10-29 12:38:30 +00:00
emaste
00b67b15b9 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
ed
b877ee1f8a Eliminate use of sys_fsync() and sys_fdatasync().
Make the kern_fsync() function public, so that it can be used by other
parts of the kernel. Fix up existing consumers to make use of it.

Requested by:	kib
2016-08-15 20:11:52 +00:00
kib
aa6b4fc56a Add an implementation of fdatasync(2).
The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing
code with fsync(2).  For all filesystems, this commit provides the
implementation which delegates the work of VOP_FDATASYNC() to
VOP_FSYNC().  This is functionally correct but not efficient.

This is not yet POSIX-compliant implementation, because it does not
ensure that queued AIO requests are completed before returning.

Reviewed by:	mckusick
Discussed with:	avg (ZFS), jhb (AIO part)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D7471
2016-08-15 19:08:51 +00:00
kib
53c82a1389 Do not allow creation of char or block special nodes with VNOVAL dev_t.
As was reported on http://seclists.org/oss-sec/2016/q3/68, tmpfs code
contains assertion that rdev != VNOVAL.  On FreeBSD, there is no other
consequences except triggering the assert.  To be compatible with
systems where device nodes have some significance, reject mknod(2)
call with dev == VNOVAL at the syscall level.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-07-15 09:23:18 +00:00
rwatson
f003d8da48 Audit the file-descriptor number argument for openat(2). Remove a comment
about the desirability of auditing the number, as it was in fact in the
wrong place (in the common path for open(2) and openat(2), and only the
latter accepts a file-descriptor argument).  Where other ABIs support
openat(2), it may be necessary to do additional argument auditing as it is
not performed in kern_openat(9).

MFC after:	3 days
Sponsored by:	DARPA, AFRL
2016-07-10 09:50:21 +00:00
glebius
a0c5a05de8 Fix kernel stack disclosures in the Linux and 4.3BSD compat layers.
Submitted by:	CTurt
Security:	SA-16:20
Security:	SA-16:21
2016-05-31 16:56:30 +00:00
jhb
1b87e4306e Simplify AIO initialization now that it is standard.
- Mark AIO system calls as STD and remove the helpers to dynamically
  register them.
- Use COMPAT6 for the old system calls with the older sigevent instead of
  an 'o' prefix.
- Simplify the POSIX configuration to note that AIO is always available.
- Handle AIO in the default VOP_PATHCONF instead of special casing it in
  the pathconf() system call.  fpathconf() is still hackish.
- Remove freebsd32_aio_cancel() as it just called the native one directly.

Reviewed by:	kib
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5589
2016-03-09 19:05:11 +00:00
markj
9abb1836d9 Improve error handling for posix_fallocate(2) and posix_fadvise(2).
- Set td_errno so that ktrace and dtrace can obtain the syscall error
  number in the usual way.
- Pass negative error numbers directly to the syscall layer, as they're
  not intended to be returned to userland.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5425
2016-02-25 19:58:23 +00:00
mckusick
3ddfa78dde Clarify a comment in kern_openat() about the use of falloc_noinstall().
Suggested by: Steve Jacobson
2016-02-07 01:04:47 +00:00
trasz
a6632d64f5 The freebsd4_getfsstat() was broken in r281551 to always return 0 on success.
All versions of getfsstat(3) are supposed to return the number of [o]statfs
structs in the array that was copied out.

Also fix missing bounds checking and signed comparison of unsigned types.

Submitted by:	bde@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-11-20 14:08:12 +00:00
markj
d6b5b38ff0 Revert r288628 and instead fix a discrepancy between the posix_fadvise(2)
man page and POSIX: posix_fadvise(2) returns an error number on failure.

Reported by:	jilles
MFC after:	1 week
2015-10-03 22:27:14 +00:00
markj
f73def85e6 The return value of posix_fadvise(2) is just an error status, so
sys_posix_fadvise() should simply return the errno (or 0) to syscallenter()
rather than setting a return value.

MFC after:	1 week
2015-10-03 19:37:41 +00:00
markj
6348241c12 As a step towards the elimination of PG_CACHED pages, rework the handling
of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to
the head of the inactive queue instead of being cached.

This affects the implementation of POSIX_FADV_NOREUSE as well, since it
works by applying POSIX_FADV_DONTNEED to file ranges after they have been
read or written.  At that point the corresponding buffers may still be
dirty, so the previous implementation would coalesce successive ranges and
apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the
dirty buffers would eventually be cached.  To preserve this behaviour in an
efficient manner, this change adds a new buf flag, B_NOREUSE, which causes
the pages backing a VMIO buf to be placed at the head of the inactive queue
when the buf is released.  POSIX_FADV_NOREUSE then works by setting this
flag in bufs that underlie the specified range.

Reviewed by:	alc, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3726
2015-09-30 23:06:29 +00:00
avg
425c0bb088 save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE
SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters
where n is typically smaller than 5.

Perhaps SDT_PROBE should be made a private implementation detail.

MFC after:	20 days
2015-09-28 12:14:16 +00:00
ed
066f63003b Decompose linkat()/renameat() rights to source and target.
To make it easier to understand how Capsicum interacts with linkat() and
renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}.

This also addresses a shortcoming in Capsicum, where it isn't possible
to disable linking to files stored in a directory. Creating hardlinks
essentially makes it possible to access files with additional rights.

Reviewed by:	rwatson, wblock
Differential Revision:	https://reviews.freebsd.org/D3411
2015-08-27 15:16:41 +00:00
bz
782cc51566 Try to unbreak the build after r285390 removing the obsolete static
declaration.
2015-07-12 00:26:22 +00:00
mjg
c71e9ab863 Move chdir/chroot-related fdp manipulation to kern_descrip.c
Prefix exported functions with pwd_.

Deduplicate some code by adding a helper for setting fd_cdir.

Reviewed by:	kib
2015-07-11 16:19:11 +00:00
mjg
1a3e7a935e Replace struct filedesc argument in getvnode with struct thread
This is is a step towards removal of spurious arguments.
2015-06-16 13:09:18 +00:00
mjg
82d355a2e3 Tidy up sys_umask a little bit
Consistently use saved fdp pointer as it cannot change. If it could change the
code would be already incorrect.

No functional changes.
2015-05-18 13:43:33 +00:00
kib
2254748ed0 The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and
pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x
kernels which required padding before the off_t parameter.  The
fcntl(2) contains compatibility code to handle kernels before the
struct flock was changed during the 8.x CURRENT development.  The
shims were reasonable to allow easier revert to the older kernel at
that time.

Now, two or three major releases later, shims do not serve any
purpose.  Such old kernels cannot handle current libc, so revert the
compatibility code.

Make padded syscalls support conditional under the COMPAT6 config
option.  For COMPAT32, the syscalls were under COMPAT6 already.

Remove WITHOUT_SYSCALL_COMPAT build option, which only purpose was to
(partially) disable the removed shims.

Reviewed by:	jhb, imp (previous versions)
Discussed with:	peter
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-18 21:50:13 +00:00
trasz
009c656eaf Rewrite linprocfs_domtab() as a wrapper around kern_getfsstat(). This
adds missing jail and MAC checks.

Differential Revision:	https://reviews.freebsd.org/D2193
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-15 09:13:11 +00:00
jilles
308eebbd52 utimensat: Correct Capsicum required capability rights. 2015-04-04 21:47:54 +00:00
mjg
0a219ba739 filedesc: simplify fget_unlocked & friends
Introduce fget_fcntl which performs appropriate checks when needed.
This removes a branch from fget_unlocked.

Introduce fget_mmap dealing with cap_rights_to_vmprot conversion.
This removes a branch from _fget.

Modify fget_unlocked to pass sequence counter to interested callers so
that they can perform their own checks and make sure the result was
otained from stable & current state.

Reviewed by:	silence on -hackers
2015-02-17 23:54:06 +00:00
jilles
67db24d0f2 Add futimens and utimensat system calls.
The core kernel part is patch file utimes.2008.4.diff from
pluknet@FreeBSD.org. I updated the code for API changes, added the manual
page and added compatibility code for old kernels. There is also audit and
Capsicum support.

A new UTIME_* constant might allow setting birthtimes in future.

Differential Revision:	https://reviews.freebsd.org/D1426
Submitted by:	pluknet (partially)
Reviewed by:	delphij, pluknet, rwatson
Relnotes:	yes
2015-01-23 21:07:08 +00:00
kib
77c9d3f4e8 The VOP_LOOKUP() implementations for CREATE op do not put the name
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.

Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter().  Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.

Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.

Suggested by:	Chris Torek <torek@pi-coral.com>
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-18 10:01:12 +00:00
jkim
66032933f9 Correct a typo to fix chown(2). It was broken since r274476.
Pointy hat to:	kib
X-MFC-With:	r274476
2014-11-13 23:51:13 +00:00