Commit Graph

235 Commits

Author SHA1 Message Date
Konstantin Belousov
aeace3c33c Assert that the linkage between struct cdev_privdata and and struct
file is consistent.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-01-17 08:34:35 +00:00
Konstantin Belousov
a53b7c692d Make devfs_fpdrop() static. It was not a public KPI, and it has no
reason to remain exported for some time.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-01-13 14:03:06 +00:00
Konstantin Belousov
fb57d63e47 Hide transient EBADF errors caused by the parallel revoke(2) or forced
unmount of devfs mounts, by restarting the failed syscall.

When restarted, failing syscalls eventually either stop finding the
node and returning ENOENT, or the vnode op vectors finally transition
to the deadfs vop.  The later return EIO or other error, more
appropriate for the operation.

Submitted by:	bde
Tested by:	pho
MFC after:	3 weeks
2016-01-02 20:29:28 +00:00
Konstantin Belousov
d52aff3c7a Minor style cleanup.
Submitted by:	bde
MFC after:	1 week
2016-01-01 15:48:48 +00:00
Konstantin Belousov
cccac8a1ef Make it possible for the cdevsw d_close() driver method to detect last
close and close due to revoke(2)-like operation.

A new FLASTCLOSE flag indicates that this is last close.  FREVOKE is
set for revokes, and FNONBLOCK is also set, same as is already done
for VOP_CLOSE() call from vgonel().

The flags reuse user open(2) flags which are never stored in f_flag,
to not consume bit space in the ABI visible way.  Assert this with the
static check.

Requested and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-12-22 20:37:34 +00:00
Konstantin Belousov
b63d070ad1 Keep devfs mount locked for the whole duration of the devfs_setattr(),
and ensure that our dirent is instantiated.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-22 20:22:17 +00:00
John Baldwin
8d7e0f5889 The cdevpriv_dtr_t typedef was not able to be used in a function prototype
like the various d_*_t typedefs since it declared a function pointer rather
than a function.  Add a new d_priv_dtor_t typedef that declares the function
and can be used as a function prototype.  The previous typedef wasn't
useful outside of the cdevpriv implementation, so retire it.

The name d_priv_dtor_t was chosen to be more consistent with cdev methods
since it is commonly used in place of d_close_t even though it is not a
direct pointer in struct cdevsw.

Reviewed by:	kib, imp
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D4340
2015-12-02 18:27:30 +00:00
Edward Tomasz Napierala
6e572e084b After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-23 14:53:54 +00:00
John Baldwin
fada4adf95 The changes that introduced fo_mmap() treated all character device
mappings as if MAP_SHARED was always present since in general MAP_PRIVATE
is not permitted for character devices.  However, there is one exception
in that MAP_PRIVATE mappings are permitted for /dev/zero.

Only require a writable file descriptor (FWRITE) for shared, writable
mappings of character devices.  vm_mmap_cdev() will reject any private
mappings for other devices.

Reviewed by:	kib
Reported by:	sbruno (broke qemu cross-builds), peter
Differential Revision:	https://reviews.freebsd.org/D3316
2015-08-06 16:50:37 +00:00
John Baldwin
7077c42623 Add a new file operations hook for mmap operations. File type-specific
logic is now placed in the mmap hook implementation rather than requiring
it to be placed in sys/vm/vm_mmap.c.  This hook allows new file types to
support mmap() as well as potentially allowing mmap() for existing file
types that do not currently support any mapping.

The vm_mmap() function is now split up into two functions.  A new
vm_mmap_object() function handles the "back half" of vm_mmap() and accepts
a referenced VM object to map rather than a (handle, handle_type) tuple.
vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a
a VM object and then calling vm_mmap_object() to handle the actual mapping.
The vm_mmap() function remains for use by other parts of the kernel
(e.g. device drivers and exec) but now only supports mapping vnodes,
character devices, and anonymous memory.

The mmap() system call invokes vm_mmap_object() directly with a NULL object
for anonymous mappings.  For mappings using a file descriptor, the
descriptors fo_mmap() hook is invoked instead.  The fo_mmap() hook is
responsible for performing type-specific checks and adjustments to
arguments as well as possibly modifying mapping parameters such as flags
or the object offset.  The fo_mmap() hook routines then call
vm_mmap_object() to handle the actual mapping.

The fo_mmap() hook is optional.  If it is not set, then fo_mmap() will
fail with ENODEV.  A fo_mmap() hook is implemented for regular files,
character devices, and shared memory objects (created via shm_open()).

While here, consistently use the VM_PROT_* constants for the vm_prot_t
type for the 'prot' variable passed to vm_mmap() and vm_mmap_object()
as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines.
Previously some places were using the mmap()-specific PROT_* constants
instead.  While this happens to work because PROT_xx == VM_PROT_xx,
using VM_PROT_* is more correct.

Differential Revision:	https://reviews.freebsd.org/D2658
Reviewed by:	alc (glanced over), kib
MFC after:	1 month
Sponsored by:	Chelsio
2015-06-04 19:41:15 +00:00
Konstantin Belousov
bda2eb9ae8 Refine r280308. Do not completely disable timestamping of devfs nodes
on reads or writes, the time marks are used to display idle time by
w(1) [1].  Instead, use vfs.devfs.dotimes as the selector of default
precision vs. using time_second.  The later gives seconds precision,
which is good enough for the purpose.

Note that timestamp updates are unlocked and the updates itself, as
well as the check in devfs_timestamp, are non-atomic.

Noted by:	truckman [1]
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-01 08:25:40 +00:00
Xin LI
4f9343fc7c Disable timestamping on devfs read/write operations by default.
Currently we update timestamps unconditionally when doing read or
write operations.  This may slow things down on hardware where
reading timestamps is expensive (e.g. HPET, because of the default
vfs.timestamp_precision setting is nanosecond now) with limited
benefit.

A new sysctl variable, vfs.devfs.dotimes is added, which can be
set to non-zero value when the old behavior is desirable.

Differential Revision:	https://reviews.freebsd.org/D2104
Reported by:	Mike Tancsa <mike sentex net>
Reviewed by:	kib
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2015-03-21 01:14:11 +00:00
Konstantin Belousov
08189ed667 The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems.  Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.

Sponsored by:  EMC / Isilon Storage Division
Submitted by:   Conrad Meyer
MFC after:	1 week
2015-02-27 16:43:50 +00:00
Konstantin Belousov
a57a934a38 Ignore devfs directory entries for devices either being destroyed or
delisted.  The check is racy.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-19 17:24:52 +00:00
Mateusz Guzik
4dba07b216 Fix up some session-related races in devfs.
One was introduced with r272596, the rest was there to begin with.

Noted by: jhb
2014-11-03 03:12:15 +00:00
Konstantin Belousov
f12aa60c62 When vnode bypass cannot be performed on the cdev file descriptor for
read/write/poll/ioctl, call standard vnode filedescriptor fop.  This
restores the special handling for terminals by calling the deadfs VOP,
instead of always returning ENXIO for destroyed devices or revoked
terminals.

Since destroyed (and not revoked) device would use devfs_specops VOP
vector, make dead_read/write/poll non-static and fill VOP table with
pointers to the functions, to instead of VOP_PANIC.

Noted and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-10-15 13:16:51 +00:00
Mateusz Guzik
3a222fe000 devfs: tidy up after 272596
This moves a var to an if statement, no functional changes.

MFC after:	1 week
2014-10-06 07:22:48 +00:00
Mateusz Guzik
1d1b55fbba devfs: don't take proctree_lock unconditionally in devfs_close
MFC after:	1 week
2014-10-06 06:20:35 +00:00
John Baldwin
9696feebe2 Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
  various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
  for each file and then convert that to a kinfo_ofile structure rather than
  keeping a second, different set of code that directly manipulates
  type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision:	https://reviews.freebsd.org/D775
Reviewed by:	kib, glebius (earlier version)
2014-09-22 16:20:47 +00:00
Konstantin Belousov
7b81a399a4 In msdosfs_setattr(), add a check for result of the utimes(2)
permissions test, forgotten in r164033.

Refactor the permission checks for utimes(2) into vnode helper
function vn_utimes_perm(9), and simplify its code comparing with the
UFS origin, by writing the call to VOP_ACCESSX only once.  Use the
helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5).

Reported by:	bde
Reviewed by:	bde, trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-17 07:11:00 +00:00
Konstantin Belousov
bf3e483b44 Similar to debug.iosize_max_clamp sysctl, introduce
devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized
i/o requests on the devfs files.

Sponsored by:	The FreeBSD Foundation
Reminded by:	Dmitry Sivachenko <trtrmitya@gmail.com>
MFC after:	1 week
2013-10-15 06:33:10 +00:00
Konstantin Belousov
64548150b6 Remove two instances of ARGSUSED comment, and wrap lines nearby the
code that is to be changed.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-10-15 06:28:11 +00:00
Konstantin Belousov
c0a46535c4 Make the seek a method of the struct fileops.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-21 17:36:01 +00:00
Konstantin Belousov
b1dd38f408 Restore the previous sendfile(2) behaviour on the block devices.
Provide valid .fo_sendfile method for several missed struct fileops.

Reviewed by:	glebius
Sponsored by:	The FreeBSD Foundation
2013-08-16 14:22:20 +00:00
Konstantin Belousov
2ca4998342 Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.

For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.

Noted and reviewed by:	jilles
Discussed with:	bde
MFC after:	2 weeks
2013-02-07 14:53:33 +00:00
Konstantin Belousov
ad9789f6db Do not force a writer to the devfs file to drain the buffer writes.
Requested and tested by:	Ian Lepore <freebsd@damnhippie.dyndns.org>
MFC after:	2 weeks
2012-12-23 22:43:27 +00:00
Hans Petter Selasky
07da61a6cc Streamline use of cdevpriv and correct some corner cases.
1) It is not useful to call "devfs_clear_cdevpriv()" from
"d_close" callbacks, hence for example read, write, ioctl and
so on might be sleeping at the time of "d_close" being called
and then then freed private data can still be accessed.
Examples: dtrace, linux_compat, ksyms (all fixed by this patch)

2) In sys/dev/drm* there are some cases in which memory will
be freed twice, if open fails, first by code in the open
routine, secondly by the cdevpriv destructor. Move registration
of the cdevpriv to the end of the drm open routines.

3) devfs_clear_cdevpriv() is not called if the "d_open" callback
registered cdevpriv data and the "d_open" callback function
returned an error. Fix this.

Discussed with:	phk
MFC after:	2 weeks
2012-08-15 16:19:39 +00:00
Konstantin Belousov
c5c1199c83 Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is 64bit quantity,
always read and written under the mtxpool protection. This fixes
apparently easy to trigger race when parallel lseek()s or lseek() and
read/write could destroy file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by:	pho
No objections from:	jhb
MFC after:    3 weeks
2012-07-02 21:01:03 +00:00
Alexander Motin
d499701b0c Revert devfs part of r235911. I was unaware about old but unfinished
discussion between kib@ and gibbs@ about it.
2012-05-24 18:19:23 +00:00
Alexander Motin
f6ad3f237a MFprojects/zfsd:
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information\for disk elements into the
CAM device database.

Sponsored by:   Spectra Logic Corporation
Sponsored by:   iXsystems, Inc.
Submitted by:   gibbs, will, mav
2012-05-24 14:07:44 +00:00
Konstantin Belousov
526d0bd547 Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with:	bde, das (previous versions)
MFC after:	1 month
2012-02-21 01:05:12 +00:00
John Baldwin
e517e6f12c Explicitly use curthread while manipulating td_fpop during last close
of a devfs file descriptor in devfs_close_f().  The passed in td argument
may be NULL if the close was invoked by garbage collection of open
file descriptors in pending control messages in the socket buffer of a
UNIX domain socket after it was closed.

PR:		kern/151758
Submitted by:	Andrey Shidakov  andrey shidakov ru
Submitted by:	Ruben van Staveren  ruben verweg com
Reviewed by:	kib
MFC after:	2 weeks
2011-12-09 17:49:34 +00:00
Konstantin Belousov
f82360acf2 Existing VOP_VPTOCNP() interface has a fatal flow that is critical for
nullfs.  The problem is that resulting vnode is only required to be
held on return from the successfull call to vop, instead of being
referenced.

Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or like.

Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.

Tested by:	pho
Reviewed by:	mckusick
MFC after:	3 weeks (subject of re approval)
2011-11-19 07:50:49 +00:00
John Baldwin
dccc45e4c0 Move the cleanup of f_cdevpriv when the reference count of a devfs
file descriptor drops to zero out of _fdrop() and into devfs_close_f()
as it is only relevant for devfs file descriptors.

Reviewed by:	kib
MFC after:	1 week
2011-11-04 03:39:31 +00:00
Konstantin Belousov
1fef78c3f0 Fix kernel panic when d_fdopen csw method is called for NULL fp.
This may happen when kernel consumer calls VOP_OPEN().

Reported by:	Tavis Ormandy <taviso  cmpxchg8b com> through delphij
MFC after:	3 days
2011-11-03 18:55:18 +00:00
Konstantin Belousov
9c00bb9190 Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by:	glebius
Reviewed by:	rwatson
Approved by:	re (bz)
2011-08-16 20:07:47 +00:00
Konstantin Belousov
724ce55b5b While fixing the looping of a thread while devfs vnode is reclaimed,
r179247 introduced a possibility of devfs_allocv() returning spurious
ENOENT. If the vnode is selected by vnlru daemon for reclamation, then
devfs_allocv() can get ENOENT from vget() due to devfs_close() dropping
vnode lock around the call to cdevsw d_close method.

Use LK_RETRY in the vget() call, and do some part of the devfs_reclaim()
work in devfs_allocv(), clearing vp->v_data and de->de_vnode. Retry the
allocation of the vnode, now with de->de_vnode == NULL.

The check vp->v_data == NULL at the start of devfs_close() cannot be
affected by the change, since vnode lock must be held while VI_DOOMED
is set, and only dropped after the check.

Reported and tested by:	Kohji Okuno <okuno.kohji jp panasonic com>
Reviewed by:	attilio
MFC after:	3 weeks
2011-07-13 21:07:41 +00:00
Jaakko Heinonen
2d843e7d34 Don't allow user created symbolic links to cover another entries marked
with DE_USER. If a devfs rule hid such entry, it was possible to create
infinite number of symbolic links with the same name.

Reviewed by:	kib
2010-12-15 16:49:47 +00:00
Jaakko Heinonen
ef456eec95 - Assert that dm_lock is exclusively held in devfs_rules_apply() and
in devfs_vmkdir() while adding the entry to de_list of the parent.
- Apply devfs rules to newly created directories and symbolic links.

PR:		kern/125034
Submitted by:	Mateusz Guzik (original version)
2010-12-15 16:42:44 +00:00
Jaakko Heinonen
d318c565d7 Add reference counting for devfs paths containing user created symbolic
links. The reference counting is needed to be able to determine if a
specific devfs path exists. For true device file paths we can traverse
the cdevp_list but a separate directory list is needed for user created
symbolic links.

Add a new directory entry flag DE_USER to mark entries which should
unreference their parent directory on deletion.

A new function to traverse cdevp_list and the directory list will be
introduced in a separate commit.

Idea from:	kib
Reviewed by:	kib
2010-09-27 17:47:09 +00:00
Jaakko Heinonen
6adc52306a Modify devfs_fqpn() for future use in devfs path reference counting
code:

- Accept devfs_mount and devfs_dirent as the arguments instead of a
  vnode. This generalizes the function so that it can be used from
  contexts where vnode references are not available.
- Accept NULL cnp argument. No '/' will be appended, if a NULL cnp is
  provided.
- Make the function global and add its prototype to devfs.h.

Reviewed by:	kib
2010-09-21 16:49:02 +00:00
Jaakko Heinonen
89d10571db Remove empty devfs directories automatically.
devfs_delete() now recursively removes empty parent directories unless
the DEVFS_DEL_NORECURSE flag is specified. devfs_delete() can't be
called anymore with a parent directory vnode lock held because the
possible parent directory deletion needs to lock the vnode. Thus we
unlock the parent directory vnode in devfs_remove() before calling
devfs_delete().

Call devfs_populate_vp() from devfs_symlink() and devfs_vptocnp() as now
directories can get removed.

Add a check for DE_DOOMED flag to devfs_populate_vp() because
devfs_delete() drops dm_lock before the VI_DOOMED vnode flag gets set.
This ensures that devfs_populate_vp() returns an error for directories
which are in progress of deletion.

Reviewed by:	kib
Discussed on:	freebsd-current (mostly silence)
2010-09-15 14:23:55 +00:00
Jaakko Heinonen
4136388a18 Set de_dir for user created symbolic links. This will be needed to be
able to resolve their parent directories.
2010-08-26 16:01:29 +00:00
Jaakko Heinonen
f5efcd64f4 Call devfs_populate_vp() from devfs_getattr(). It was possible that
fstat(2) returned stale information through an open file descriptor.
2010-08-25 15:29:12 +00:00
Jaakko Heinonen
0f6bb099ae Introduce and use devfs_populate_vp() to unlock a vnode before calling
devfs_populate(). This is a prerequisite for the automatic removal of
empty directories which will be committed in the future.

Reviewed by:	kib (previous version)
2010-08-22 16:08:12 +00:00
John Baldwin
3634d5b241 Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and
LK_CANRECURSE after a lock is created.  Use them to implement macros that
otherwise manipulated the flags directly.  Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe.  This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.

Reviewed by:	kib
MFC after:	3 days
2010-08-20 19:46:50 +00:00
Jaakko Heinonen
96835d61b6 Call dev_rel() in error paths.
Reported by:	kib
Reviewed by:	kib
MFC after:	2 weeks
2010-08-19 16:39:00 +00:00
Jaakko Heinonen
64040d3978 Allow user created symbolic links to cover device files and directories
if the device file appears during or after the link creation.

User created symbolic links are now inserted at the head of the
directory entry list after the "." and ".." entries. A new directory
entry flag DE_COVERED indicates that an entry is covered by a symbolic
link.

PR:		kern/114057
Reviewed by:	kib
Idea from:	kib
Discussed on:	freebsd-current (mostly silence)
2010-08-12 15:29:07 +00:00
Konstantin Belousov
3979450b4c Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created
cdev will never be destroyed. Propagate the flag to devfs vnodes as
VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a
thread reference on such nodes.

In collaboration with:	pho
MFC after:	1 month
2010-08-06 09:42:15 +00:00
Konstantin Belousov
9968a42675 Enable shared locks for the devfs vnodes. Honor the locking mode
requested by lookup(). This should be a nop at the moment.

In collaboration with:	pho
MFC after:	1 month
2010-08-06 09:23:47 +00:00