Commit Graph

345 Commits

Author SHA1 Message Date
jamie
1c11f552d6 Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of.  This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by:	kib
Differential Revision:	D14681
2018-05-04 20:54:27 +00:00
brooks
9d79658aab Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
jhb
050eb30c42 Report INT_MAX for LINK_MAX for devfs' VOP_PATHCONF().
devfs uses ints for link counts internally and already reports the
full link count via stat() post ino64.

Sponsored by:	Chelsio Communications
2017-12-19 20:07:57 +00:00
jhb
4531c97d0c Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX for devfs' VOP_PATHCONF().
MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 19:53:34 +00:00
jhb
3efec8ad25 Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf().
Having all filesystems fall through to default values isn't always correct
and these values can vary for different filesystem implementations.  Most
of these changes just use the existing default values with a few exceptions:
- Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact
  permissions check this claims for chown().
- Use NANDFS_NAME_LEN for NAME_MAX for nandfs.
- Don't report a LINK_MAX of 0 on smbfs.  Now fail with EINVAL to
  indicate hard links aren't supported.

Requested by:	bde (though perhaps not this exact implementation)
Reviewed by:	kib (earlier version)
MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 19:51:36 +00:00
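
From userland, the effect of per-filesystem pathconf values can be seen through pathconf(2), which returns -1 both on failure (for example EINVAL for an unsupported variable) and when no limit exists. The following is a minimal sketch of telling those cases apart; query_limit() and the limit_status enum are hypothetical names introduced here for illustration and are not part of the commit.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum limit_status { LIMIT_VALUE, LIMIT_NONE, LIMIT_ERROR };

/*
 * Query one pathconf(2) variable for a path.
 * pathconf() returns -1 both on error and when the variable has no
 * limit, so errno is cleared first and inspected afterwards.
 */
static enum limit_status
query_limit(const char *path, int name, long *value)
{
	long v;

	assert(path != NULL && value != NULL);
	errno = 0;
	v = pathconf(path, name);
	if (v != -1) {
		*value = v;
		return (LIMIT_VALUE);
	}
	/* errno unchanged: no limit; errno set (e.g. EINVAL): failure. */
	return (errno == 0 ? LIMIT_NONE : LIMIT_ERROR);
}
```

A caller could pass _PC_LINK_MAX for a path on a filesystem without hard links and would be expected to see LIMIT_ERROR with errno set to EINVAL, per the smbfs note above.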
kib
a26924110c In devfs_lookupx() dotdot lookup case, avoid dereferencing
dvp->v_mount after dvp is unlocked.

The vnode might be reclaimed after unlock, so v_mount becomes NULL.
Cache the struct mount pointer before the unlock, the struct is
type-stable.

Note that devfs_allocv() reads mp->mnt_data but does not operate on it
further when dirent is doomed.  The unmount cannot proceed until all
dirents are reclaimed.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-12-14 13:41:11 +00:00
manu
cae4d7505c r326394 is calling malloc with M_WAITOK under a lock, revert for now
Reported by:	andrew
2017-11-30 14:06:54 +00:00
manu
cbfe9423a0 devfs: Avoid a malloc/free if we just need to increment the refcount
MFC after:	1 week
Sponsored by:	Gandi.net
2017-11-30 12:38:42 +00:00
pfg
43f5681c36 sys/fs: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-27 15:15:37 +00:00
jhb
c5f4b54f16 Only handle _PC_MAX_CANON, _PC_MAX_INPUT, and _PC_VDISABLE for TTY devices.
Move handling of these three pathconf() variables out of vop_stdpathconf()
and into devfs_pathconf() as TTY devices can only be devfs files.  In
addition, only return settings for these three variables for devfs devices
whose device switch has the D_TTY flag set.

Discussed with:	bde, kib
Sponsored by:	Chelsio Communications
2017-09-21 23:05:32 +00:00
kib
e75ba1d5c4 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in a backward-compatible way, but
there is no general mechanism to handle other sysctl MIBs which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
trasz
c37dffb562 Simplify devfs_fsync() by removing it. This might also be a minor
optimization, as vn_isdisk() needs to lock a global mutex.

Reviewed by:	imp
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9628
2017-02-20 16:18:33 +00:00
kib
68b818dcb5 Apply noexec mount option for mmap(PROT_EXEC).
Right now the noexec mount option prevents image activators from trying
to execve files on the mount point.  Also, after r127187, noexec
also limits max_prot map entries permissions for mappings of files
from such mounts, but not the actual mapping permissions.

As a result, the API behaviour is inconsistent.  The files from noexec
mount can be mapped with PROT_EXEC, but if mprotect(2) drops execution
permission, it cannot be re-enabled later.  Make this consistent
logically and aligned with behaviour of other systems, by disallowing
PROT_EXEC for mmap(2).

Note that this change only ensures aligned results from mmap(2) and
mprotect(2), it does not prevent actual code execution from files
coming from noexec mount.  Such files can always be read into
anonymous executable memory and executed from there.

Reported by:	shamaz.mazum@gmail.com
PR:	217062
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-02-19 20:51:04 +00:00
kib
9849e14951 Minor style fixes.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-02-16 06:31:36 +00:00
trasz
d3a6b909a1 Change the "devfs_fsync: vop_stdfsync failed" from panic to a printf.
It's not a proper fix, but should be better than what we have now.
Since it got broken some six months ago it results in an incredibly
annoying and trivially reproducible panic every time e.g. a USB disk
gets disconnected.

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-02-15 16:52:21 +00:00
kib
42da5a6952 Hide the boottime and boottimebin globals, provide the getboottime(9)
and getboottimebin(9) KPI. Change consumers of boottime to use the
KPI.  The variables were renamed to avoid shadowing issues with local
variables of the same name.

Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure.  As a
preparation, this commit only introduces the interface.

Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance.  Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.

Tested by:	pho (as part of the bigger patch)
Reviewed by:	jhb (same)
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
X-Differential revision:	https://reviews.freebsd.org/D7302
2016-07-27 11:08:59 +00:00
cem
4c8503deb3 devfs: Move most ioctl logic down to vnode layer
Devfs' file layer ioctl is now just a thin shim around the vnode layer.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D7286
2016-07-25 16:28:02 +00:00
kib
f3b36332af Another follow-up to r291460. Only access vp->v_rdev for VCHR vnodes
in devfs_reclaim().

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
2016-06-15 15:55:14 +00:00
kib
d0b1101f75 Remove zero assignments in the cdev allocator. cdp memory is
requested with M_ZERO.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-05-21 09:55:32 +00:00
pfg
0f281bd3eb sys/devfs: unsign an index to prevent signed integer overflow.
cdp_maxdirent in struct:cdev_priv is of type u_int.  Use the same
type for the corresponding index in devfs_revoke().

MFC after:	1 week
2016-04-28 02:39:43 +00:00
kib
8c18805577 When devfs dirent is freed, a vnode might still keep a pointer to it,
apparently.  Interlock and clear the pointer to avoid free memory
dereference.

Submitted by:	bde (previous version)
MFC after:	3 weeks
2016-01-22 20:30:51 +00:00
kib
32d7f35235 Assert that the linkage between struct cdev_privdata and struct
file is consistent.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-01-17 08:34:35 +00:00
kib
21f21e7647 Make devfs_fpdrop() static. It was not a public KPI, and it has no
reason to remain exported for some time.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-01-13 14:03:06 +00:00
kib
ae62b8f932 Hide transient EBADF errors caused by the parallel revoke(2) or forced
unmount of devfs mounts, by restarting the failed syscall.

When restarted, failing syscalls eventually either stop finding the
node and return ENOENT, or the vnode op vectors finally transition
to the deadfs vop.  The latter return EIO or another error more
appropriate for the operation.

Submitted by:	bde
Tested by:	pho
MFC after:	3 weeks
2016-01-02 20:29:28 +00:00
kib
348ef00d1d Minor style cleanup.
Submitted by:	bde
MFC after:	1 week
2016-01-01 15:48:48 +00:00
kib
76abdf80ab Make it possible for the cdevsw d_close() driver method to detect last
close and close due to revoke(2)-like operation.

A new FLASTCLOSE flag indicates that this is last close.  FREVOKE is
set for revokes, and FNONBLOCK is also set, same as is already done
for VOP_CLOSE() call from vgonel().

The flags reuse user open(2) flags which are never stored in f_flag,
to not consume bit space in the ABI visible way.  Assert this with the
static check.

Requested and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-12-22 20:37:34 +00:00
kib
2a63539543 Keep devfs mount locked for the whole duration of the devfs_setattr(),
and ensure that our dirent is instantiated.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-22 20:22:17 +00:00
jhb
f4698eb999 The cdevpriv_dtr_t typedef was not able to be used in a function prototype
like the various d_*_t typedefs since it declared a function pointer rather
than a function.  Add a new d_priv_dtor_t typedef that declares the function
and can be used as a function prototype.  The previous typedef wasn't
useful outside of the cdevpriv implementation, so retire it.

The name d_priv_dtor_t was chosen to be more consistent with cdev methods
since it is commonly used in place of d_close_t even though it is not a
direct pointer in struct cdevsw.

Reviewed by:	kib, imp
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D4340
2015-12-02 18:27:30 +00:00
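
The distinction between a typedef that names a function pointer and one that names a function type can be sketched in plain C. The names dtor_ptr_t, dtor_fn_t, count_dtor and run_dtor below are hypothetical stand-ins chosen for illustration; they are not the kernel's cdevpriv_dtr_t or d_priv_dtor_t definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Names a pointer type, so it cannot be used to declare a function. */
typedef void (*dtor_ptr_t)(void *data);

/* Names the function type itself. */
typedef void dtor_fn_t(void *data);

/* A function typedef can act as a prototype for a real function. */
static dtor_fn_t count_dtor;

static int dtor_calls;

/* The definition still has to spell out the parameter list. */
static void
count_dtor(void *data)
{
	(void)data;
	dtor_calls++;
}

/* Invoke a destructor and report how many calls have happened so far. */
static int
run_dtor(dtor_fn_t *fn, void *data)
{
	dtor_ptr_t p = fn;	/* dtor_fn_t * and dtor_ptr_t are the same type */

	assert(p != NULL);
	p(data);
	return (dtor_calls);
}
```

Writing `static dtor_ptr_t count_dtor;` instead would declare a pointer variable rather than a function, which is the limitation the commit message describes.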
trasz
db61d1271a Make it possible to forcibly unmount devfs.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-24 14:04:44 +00:00
trasz
9e31188bdc After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-23 14:53:54 +00:00
jhb
3fab33edd0 The changes that introduced fo_mmap() treated all character device
mappings as if MAP_SHARED was always present since in general MAP_PRIVATE
is not permitted for character devices.  However, there is one exception
in that MAP_PRIVATE mappings are permitted for /dev/zero.

Only require a writable file descriptor (FWRITE) for shared, writable
mappings of character devices.  vm_mmap_cdev() will reject any private
mappings for other devices.

Reviewed by:	kib
Reported by:	sbruno (broke qemu cross-builds), peter
Differential Revision:	https://reviews.freebsd.org/D3316
2015-08-06 16:50:37 +00:00
jhb
bba1e1e047 Add a new file operations hook for mmap operations. File type-specific
logic is now placed in the mmap hook implementation rather than requiring
it to be placed in sys/vm/vm_mmap.c.  This hook allows new file types to
support mmap() as well as potentially allowing mmap() for existing file
types that do not currently support any mapping.

The vm_mmap() function is now split up into two functions.  A new
vm_mmap_object() function handles the "back half" of vm_mmap() and accepts
a referenced VM object to map rather than a (handle, handle_type) tuple.
vm_mmap() is now reduced to converting a (handle, handle_type) tuple to
a VM object and then calling vm_mmap_object() to handle the actual mapping.
The vm_mmap() function remains for use by other parts of the kernel
(e.g. device drivers and exec) but now only supports mapping vnodes,
character devices, and anonymous memory.

The mmap() system call invokes vm_mmap_object() directly with a NULL object
for anonymous mappings.  For mappings using a file descriptor, the
descriptor's fo_mmap() hook is invoked instead.  The fo_mmap() hook is
responsible for performing type-specific checks and adjustments to
arguments as well as possibly modifying mapping parameters such as flags
or the object offset.  The fo_mmap() hook routines then call
vm_mmap_object() to handle the actual mapping.

The fo_mmap() hook is optional.  If it is not set, then fo_mmap() will
fail with ENODEV.  A fo_mmap() hook is implemented for regular files,
character devices, and shared memory objects (created via shm_open()).

While here, consistently use the VM_PROT_* constants for the vm_prot_t
type for the 'prot' variable passed to vm_mmap() and vm_mmap_object()
as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines.
Previously some places were using the mmap()-specific PROT_* constants
instead.  While this happens to work because PROT_xx == VM_PROT_xx,
using VM_PROT_* is more correct.

Differential Revision:	https://reviews.freebsd.org/D2658
Reviewed by:	alc (glanced over), kib
MFC after:	1 month
Sponsored by:	Chelsio
2015-06-04 19:41:15 +00:00
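
The optional-hook behaviour described above (a missing fo_mmap() hook fails with ENODEV) follows a common dispatch pattern. Below is a simplified userland sketch of that pattern only: the struct layouts, the hook signature, file_mmap(), regular_mmap(), regular_ops and nomap_ops are all assumptions made for illustration and do not match the kernel's struct fileops.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct file;

/* Simplified, hypothetical hook signature. */
typedef int mmap_hook_t(struct file *fp, size_t len);

struct fileops {
	mmap_hook_t *fo_mmap;	/* optional; NULL when unsupported */
};

struct file {
	const struct fileops *f_ops;
	size_t mapped;		/* bookkeeping for this sketch only */
};

/* Dispatch wrapper: a file type without a hook fails with ENODEV. */
static int
file_mmap(struct file *fp, size_t len)
{
	assert(fp != NULL && fp->f_ops != NULL);
	if (fp->f_ops->fo_mmap == NULL)
		return (ENODEV);
	return (fp->f_ops->fo_mmap(fp, len));
}

/* Type-specific hook for a file type that supports mapping. */
static int
regular_mmap(struct file *fp, size_t len)
{
	fp->mapped = len;
	return (0);
}

static const struct fileops regular_ops = { .fo_mmap = regular_mmap };
static const struct fileops nomap_ops = { .fo_mmap = NULL };
```

Keeping the NULL check in one wrapper means new file types opt in by filling a single table slot, without touching the generic mmap path.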
kib
b7c417708f Refine r280308. Do not completely disable timestamping of devfs nodes
on reads or writes, the time marks are used to display idle time by
w(1) [1].  Instead, use vfs.devfs.dotimes as the selector of default
precision vs. using time_second.  The latter gives seconds precision,
which is good enough for the purpose.

Note that timestamp updates are unlocked and the updates themselves, as
well as the check in devfs_timestamp, are non-atomic.

Noted by:	truckman [1]
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-01 08:25:40 +00:00
delphij
041657da93 Disable timestamping on devfs read/write operations by default.
Currently we update timestamps unconditionally when doing read or
write operations.  This may slow things down on hardware where
reading timestamps is expensive (e.g. HPET, because the default
vfs.timestamp_precision setting is now nanosecond) with limited
benefit.

A new sysctl variable, vfs.devfs.dotimes is added, which can be
set to non-zero value when the old behavior is desirable.

Differential Revision:	https://reviews.freebsd.org/D2104
Reported by:	Mike Tancsa <mike sentex net>
Reviewed by:	kib
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2015-03-21 01:14:11 +00:00
kib
3bc9cbc06a The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems.  Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.

Sponsored by:  EMC / Isilon Storage Division
Submitted by:   Conrad Meyer
MFC after:	1 week
2015-02-27 16:43:50 +00:00
kib
f748dc7ade Stop enforcing additional reference on all cdevs, which was introduced
in r277199.  Acquire the necessary reference in delist_dev_locked()
and inform destroy_devl() about it using CDP_UNREF_DTR flag.

Fix some style nits, add asserts.

Discussed with:	hselasky
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-19 17:36:52 +00:00
kib
b3741c8701 Ignore devfs directory entries for devices either being destroyed or
delisted.  The check is racy.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-19 17:24:52 +00:00
hselasky
b04cbf0c36 Avoid race with "dev_rel()" when using the recently added
"delist_dev()" function. Make sure the character device structure
doesn't go away until the end of the "destroy_dev()" function due to
concurrently running cleanup code inside "devfs_populate()".

MFC after:	1 week
Reported by:	dchagin@
2015-01-14 22:07:13 +00:00
mjg
5c258fe4b9 Fix up some session-related races in devfs.
One was introduced with r272596, the rest was there to begin with.

Noted by: jhb
2014-11-03 03:12:15 +00:00
hselasky
49c137f7be Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, since
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
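
The "logical OR where binary OR was expected" item is easy to miss because the code still compiles. A small sketch with made-up flag values (FLAG_RD and FLAG_MPSAFE are hypothetical and are not the kernel's CTLFLAG_* definitions) shows the difference, along with a compile-time check in the spirit of CTASSERT:

```c
#include <assert.h>

/* Hypothetical flag bits for illustration only. */
#define FLAG_RD		0x80000000u
#define FLAG_MPSAFE	0x00040000u

/* Bitwise OR keeps both flag bits. */
static unsigned int
combine_bitwise(unsigned int a, unsigned int b)
{
	return (a | b);
}

/* Logical OR collapses any nonzero operands to the value 1. */
static unsigned int
combine_logical(unsigned int a, unsigned int b)
{
	return (a || b);
}

/* Compile-time check that the logical form loses the flag bits. */
static_assert((FLAG_RD || FLAG_MPSAFE) == 1,
    "logical OR yields 1, not a flag mask");
```

With these values the bitwise form yields 0x80040000, while the logical form yields 1, silently dropping both flags.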
kib
6b880aa521 When vnode bypass cannot be performed on the cdev file descriptor for
read/write/poll/ioctl, call standard vnode filedescriptor fop.  This
restores the special handling for terminals by calling the deadfs VOP,
instead of always returning ENXIO for destroyed devices or revoked
terminals.

Since destroyed (and not revoked) device would use devfs_specops VOP
vector, make dead_read/write/poll non-static and fill VOP table with
pointers to the functions, instead of VOP_PANIC.

Noted and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-10-15 13:16:51 +00:00
mjg
b14804421c devfs: tidy up after 272596
This moves a var to an if statement, no functional changes.

MFC after:	1 week
2014-10-06 07:22:48 +00:00
mjg
59a46e800a devfs: don't take proctree_lock unconditionally in devfs_close
MFC after:	1 week
2014-10-06 06:20:35 +00:00
jhb
8f082668d0 Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
  various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
  for each file and then convert that to a kinfo_ofile structure rather than
  keeping a second, different set of code that directly manipulates
  type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision:	https://reviews.freebsd.org/D775
Reviewed by:	kib, glebius (earlier version)
2014-09-22 16:20:47 +00:00
kib
f5cc3af6c8 In msdosfs_setattr(), add a check for result of the utimes(2)
permissions test, forgotten in r164033.

Refactor the permission checks for utimes(2) into vnode helper
function vn_utimes_perm(9), and simplify its code comparing with the
UFS origin, by writing the call to VOP_ACCESSX only once.  Use the
helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5).

Reported by:	bde
Reviewed by:	bde, trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-17 07:11:00 +00:00
mav
794856e581 Fix long known bug with handling device aliases residing not in devfs root.
Historically creation of device aliases created symbolic links using only
name of target device as a link target, not considering current directory.
Fix that by adding number of "../" chunks to the target device name,
required to get out of the current directory to devfs root first.

MFC after:	1 month
2013-12-12 11:05:48 +00:00
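
The alias fix described above amounts to prefixing the link target with one "../" per directory level of the alias. The following is a userland sketch of that idea, not the kernel code from the commit; alias_link_target() and its signature are hypothetical, and both names are assumed to be relative to the devfs root.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build a symlink target for an alias that may live in a subdirectory.
 * One "../" is prepended for every '/' in the alias name, so the link
 * climbs back to the root before naming the target.
 * Returns 0 on success, -1 if buf is too small.
 */
static int
alias_link_target(const char *alias, const char *target, char *buf,
    size_t len)
{
	size_t depth = 0, need;
	const char *p;

	assert(alias != NULL && target != NULL && buf != NULL);
	for (p = alias; *p != '\0'; p++)
		if (*p == '/')
			depth++;
	/* Three bytes per "../", plus the target and its terminator. */
	need = depth * 3 + strlen(target) + 1;
	if (need > len)
		return (-1);
	buf[0] = '\0';
	while (depth-- > 0)
		strcat(buf, "../");
	strcat(buf, target);
	return (0);
}
```

For an alias named "disk/by-id/foo" pointing at "da0", this produces "../../da0"; an alias directly in the root gets the bare target name, matching the historical behaviour.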
kib
3dc87905fa Similar to debug.iosize_max_clamp sysctl, introduce
devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized
i/o requests on the devfs files.

Sponsored by:	The FreeBSD Foundation
Reminded by:	Dmitry Sivachenko <trtrmitya@gmail.com>
MFC after:	1 week
2013-10-15 06:33:10 +00:00
kib
58fc49ff4f Remove two instances of ARGSUSED comment, and wrap lines nearby the
code that is to be changed.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-10-15 06:28:11 +00:00
kib
6a459eb27c Make the seek a method of the struct fileops.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-21 17:36:01 +00:00
kib
408a640438 Restore the previous sendfile(2) behaviour on the block devices.
Provide valid .fo_sendfile method for several missed struct fileops.

Reviewed by:	glebius
Sponsored by:	The FreeBSD Foundation
2013-08-16 14:22:20 +00:00