215 Commits

Author SHA1 Message Date
Konstantin Belousov
bf3e483b44 Similar to debug.iosize_max_clamp sysctl, introduce
devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized
i/o requests on the devfs files.

Sponsored by:	The FreeBSD Foundation
Reminded by:	Dmitry Sivachenko <trtrmitya@gmail.com>
MFC after:	1 week
2013-10-15 06:33:10 +00:00
Konstantin Belousov
64548150b6 Remove two instances of ARGSUSED comment, and wrap lines nearby the
code that is to be changed.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-10-15 06:28:11 +00:00
Konstantin Belousov
c0a46535c4 Make the seek a method of the struct fileops.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-21 17:36:01 +00:00
Konstantin Belousov
b1dd38f408 Restore the previous sendfile(2) behaviour on the block devices.
Provide valid .fo_sendfile method for several missed struct fileops.

Reviewed by:	glebius
Sponsored by:	The FreeBSD Foundation
2013-08-16 14:22:20 +00:00
Konstantin Belousov
2ca4998342 Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.

For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.

Noted and reviewed by:	jilles
Discussed with:	bde
MFC after:	2 weeks
2013-02-07 14:53:33 +00:00
Konstantin Belousov
ad9789f6db Do not force a writer to the devfs file to drain the buffer writes.
Requested and tested by:	Ian Lepore <freebsd@damnhippie.dyndns.org>
MFC after:	2 weeks
2012-12-23 22:43:27 +00:00
Hans Petter Selasky
07da61a6cc Streamline use of cdevpriv and correct some corner cases.
1) It is not useful to call "devfs_clear_cdevpriv()" from
"d_close" callbacks, since, for example, read, write, ioctl and
so on might be sleeping at the time of "d_close" being called
and the freed private data can then still be accessed.
Examples: dtrace, linux_compat, ksyms (all fixed by this patch)

2) In sys/dev/drm* there are some cases in which memory will
be freed twice, if open fails, first by code in the open
routine, secondly by the cdevpriv destructor. Move registration
of the cdevpriv to the end of the drm open routines.

3) devfs_clear_cdevpriv() is not called if the "d_open" callback
registered cdevpriv data and the "d_open" callback function
returned an error. Fix this.

Discussed with:	phk
MFC after:	2 weeks
2012-08-15 16:19:39 +00:00
Konstantin Belousov
c5c1199c83 Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is a 64bit quantity,
is always read and written under the mtxpool protection. This fixes an
apparently easy to trigger race where parallel lseek()s or lseek() and
read/write could destroy the file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by:	pho
No objections from:	jhb
MFC after:    3 weeks
2012-07-02 21:01:03 +00:00
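The f_offset race described in this commit can be illustrated with a small userland sketch. This is not the kernel KPI: `struct file_sketch`, `offset_read()`, `offset_write()` and `offset_advance()` are hypothetical names, and a pthread mutex stands in for the mtxpool lock. The only point shown is that every access to the 64-bit offset goes through the same lock, so a 32-bit CPU can never observe a half-updated value.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userland stand-in for struct file; not the kernel layout. */
struct file_sketch {
	pthread_mutex_t	f_lock;		/* protects f_offset */
	int64_t		f_offset;	/* 64-bit, so not atomic on 32-bit CPUs */
};

static void
file_init(struct file_sketch *fp)
{
	assert(fp != NULL);
	pthread_mutex_init(&fp->f_lock, NULL);
	fp->f_offset = 0;
}

/* Read the whole 64-bit offset under the lock. */
static int64_t
offset_read(struct file_sketch *fp)
{
	int64_t off;

	pthread_mutex_lock(&fp->f_lock);
	off = fp->f_offset;
	pthread_mutex_unlock(&fp->f_lock);
	return (off);
}

/* Replace the offset under the lock, as an lseek()-style caller would. */
static void
offset_write(struct file_sketch *fp, int64_t off)
{
	pthread_mutex_lock(&fp->f_lock);
	fp->f_offset = off;
	pthread_mutex_unlock(&fp->f_lock);
}

/* Advance the offset as one locked read-modify-write; returns the new value. */
static int64_t
offset_advance(struct file_sketch *fp, int64_t delta)
{
	int64_t off;

	pthread_mutex_lock(&fp->f_lock);
	fp->f_offset += delta;
	off = fp->f_offset;
	pthread_mutex_unlock(&fp->f_lock);
	return (off);
}
```

Funnelling all accesses through three small helpers mirrors the idea of encapsulating f_offset behind a lock/unlock interface instead of letting each consumer touch the field directly.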
Alexander Motin
d499701b0c Revert devfs part of r235911. I was unaware about old but unfinished
discussion between kib@ and gibbs@ about it.
2012-05-24 18:19:23 +00:00
Alexander Motin
f6ad3f237a MFprojects/zfsd:
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information for disk elements into the
CAM device database.

Sponsored by:   Spectra Logic Corporation
Sponsored by:   iXsystems, Inc.
Submitted by:   gibbs, will, mav
2012-05-24 14:07:44 +00:00
Konstantin Belousov
526d0bd547 Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows SSIZE_MAX-sized i/o requests to be performed from
the usermode.

Discussed with:	bde, das (previous versions)
MFC after:	1 month
2012-02-21 01:05:12 +00:00
John Baldwin
e517e6f12c Explicitly use curthread while manipulating td_fpop during last close
of a devfs file descriptor in devfs_close_f().  The passed in td argument
may be NULL if the close was invoked by garbage collection of open
file descriptors in pending control messages in the socket buffer of a
UNIX domain socket after it was closed.

PR:		kern/151758
Submitted by:	Andrey Shidakov  andrey shidakov ru
Submitted by:	Ruben van Staveren  ruben verweg com
Reviewed by:	kib
MFC after:	2 weeks
2011-12-09 17:49:34 +00:00
Konstantin Belousov
f82360acf2 Existing VOP_VPTOCNP() interface has a fatal flaw that is critical for
nullfs.  The problem is that resulting vnode is only required to be
held on return from the successful call to vop, instead of being
referenced.

Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or the like.

Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.

Tested by:	pho
Reviewed by:	mckusick
MFC after:	3 weeks (subject to re approval)
2011-11-19 07:50:49 +00:00
John Baldwin
dccc45e4c0 Move the cleanup of f_cdevpriv when the reference count of a devfs
file descriptor drops to zero out of _fdrop() and into devfs_close_f()
as it is only relevant for devfs file descriptors.

Reviewed by:	kib
MFC after:	1 week
2011-11-04 03:39:31 +00:00
Konstantin Belousov
1fef78c3f0 Fix kernel panic when d_fdopen csw method is called for NULL fp.
This may happen when kernel consumer calls VOP_OPEN().

Reported by:	Tavis Ormandy <taviso  cmpxchg8b com> through delphij
MFC after:	3 days
2011-11-03 18:55:18 +00:00
Konstantin Belousov
9c00bb9190 Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by:	glebius
Reviewed by:	rwatson
Approved by:	re (bz)
2011-08-16 20:07:47 +00:00
Konstantin Belousov
724ce55b5b While fixing the looping of a thread while devfs vnode is reclaimed,
r179247 introduced a possibility of devfs_allocv() returning spurious
ENOENT. If the vnode is selected by vnlru daemon for reclamation, then
devfs_allocv() can get ENOENT from vget() due to devfs_close() dropping
vnode lock around the call to cdevsw d_close method.

Use LK_RETRY in the vget() call, and do some part of the devfs_reclaim()
work in devfs_allocv(), clearing vp->v_data and de->de_vnode. Retry the
allocation of the vnode, now with de->de_vnode == NULL.

The check vp->v_data == NULL at the start of devfs_close() cannot be
affected by the change, since vnode lock must be held while VI_DOOMED
is set, and only dropped after the check.

Reported and tested by:	Kohji Okuno <okuno.kohji jp panasonic com>
Reviewed by:	attilio
MFC after:	3 weeks
2011-07-13 21:07:41 +00:00
Jaakko Heinonen
2d843e7d34 Don't allow user created symbolic links to cover other entries marked
with DE_USER. If a devfs rule hid such an entry, it was possible to create
an infinite number of symbolic links with the same name.

Reviewed by:	kib
2010-12-15 16:49:47 +00:00
Jaakko Heinonen
ef456eec95 - Assert that dm_lock is exclusively held in devfs_rules_apply() and
in devfs_vmkdir() while adding the entry to de_list of the parent.
- Apply devfs rules to newly created directories and symbolic links.

PR:		kern/125034
Submitted by:	Mateusz Guzik (original version)
2010-12-15 16:42:44 +00:00
Jaakko Heinonen
d318c565d7 Add reference counting for devfs paths containing user created symbolic
links. The reference counting is needed to be able to determine if a
specific devfs path exists. For true device file paths we can traverse
the cdevp_list but a separate directory list is needed for user created
symbolic links.

Add a new directory entry flag DE_USER to mark entries which should
unreference their parent directory on deletion.

A new function to traverse cdevp_list and the directory list will be
introduced in a separate commit.

Idea from:	kib
Reviewed by:	kib
2010-09-27 17:47:09 +00:00
Jaakko Heinonen
6adc52306a Modify devfs_fqpn() for future use in devfs path reference counting
code:

- Accept devfs_mount and devfs_dirent as the arguments instead of a
  vnode. This generalizes the function so that it can be used from
  contexts where vnode references are not available.
- Accept NULL cnp argument. No '/' will be appended, if a NULL cnp is
  provided.
- Make the function global and add its prototype to devfs.h.

Reviewed by:	kib
2010-09-21 16:49:02 +00:00
Jaakko Heinonen
89d10571db Remove empty devfs directories automatically.
devfs_delete() now recursively removes empty parent directories unless
the DEVFS_DEL_NORECURSE flag is specified. devfs_delete() can't be
called anymore with a parent directory vnode lock held because the
possible parent directory deletion needs to lock the vnode. Thus we
unlock the parent directory vnode in devfs_remove() before calling
devfs_delete().

Call devfs_populate_vp() from devfs_symlink() and devfs_vptocnp() as now
directories can get removed.

Add a check for DE_DOOMED flag to devfs_populate_vp() because
devfs_delete() drops dm_lock before the VI_DOOMED vnode flag gets set.
This ensures that devfs_populate_vp() returns an error for directories
which are in progress of deletion.

Reviewed by:	kib
Discussed on:	freebsd-current (mostly silence)
2010-09-15 14:23:55 +00:00
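The recursive removal of empty parents described in this commit can be sketched in userland as follows. This is an illustration only, not the devfs code: `struct dirent_sketch`, `entry_delete()` and `DEL_NORECURSE` are hypothetical, locking is omitted entirely, and the sketch assumes the root directory is never removed.

```c
#include <assert.h>
#include <stddef.h>

#define	DEL_NORECURSE	0x01	/* hypothetical mirror of DEVFS_DEL_NORECURSE */

/* Hypothetical directory entry: just enough state to show the walk upward. */
struct dirent_sketch {
	struct dirent_sketch	*parent;	/* NULL for the root */
	int			nchildren;	/* live entries below this one */
	int			removed;	/* set once the entry is deleted */
};

/*
 * Delete an empty, non-root entry, then keep deleting parents that became
 * empty as a result, unless DEL_NORECURSE is given.
 */
static void
entry_delete(struct dirent_sketch *de, int flags)
{
	struct dirent_sketch *parent;

	assert(de != NULL && de->parent != NULL && de->nchildren == 0);
	for (;;) {
		parent = de->parent;
		de->removed = 1;
		parent->nchildren--;
		/* Stop at a non-empty parent, at the root, or if told not to recurse. */
		if ((flags & DEL_NORECURSE) != 0 || parent->nchildren > 0 ||
		    parent->parent == NULL)
			break;
		de = parent;
	}
}
```

Deleting the last entry of a directory removes that directory too, and the walk stops as soon as it reaches a parent that still has other children.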
Jaakko Heinonen
4136388a18 Set de_dir for user created symbolic links. This will be needed to be
able to resolve their parent directories.
2010-08-26 16:01:29 +00:00
Jaakko Heinonen
f5efcd64f4 Call devfs_populate_vp() from devfs_getattr(). It was possible that
fstat(2) returned stale information through an open file descriptor.
2010-08-25 15:29:12 +00:00
Jaakko Heinonen
0f6bb099ae Introduce and use devfs_populate_vp() to unlock a vnode before calling
devfs_populate(). This is a prerequisite for the automatic removal of
empty directories which will be committed in the future.

Reviewed by:	kib (previous version)
2010-08-22 16:08:12 +00:00
John Baldwin
3634d5b241 Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and
LK_CANRECURSE after a lock is created.  Use them to implement macros that
otherwise manipulated the flags directly.  Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe.  This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.

Reviewed by:	kib
MFC after:	3 days
2010-08-20 19:46:50 +00:00
Jaakko Heinonen
96835d61b6 Call dev_rel() in error paths.
Reported by:	kib
Reviewed by:	kib
MFC after:	2 weeks
2010-08-19 16:39:00 +00:00
Jaakko Heinonen
64040d3978 Allow user created symbolic links to cover device files and directories
if the device file appears during or after the link creation.

User created symbolic links are now inserted at the head of the
directory entry list after the "." and ".." entries. A new directory
entry flag DE_COVERED indicates that an entry is covered by a symbolic
link.

PR:		kern/114057
Reviewed by:	kib
Idea from:	kib
Discussed on:	freebsd-current (mostly silence)
2010-08-12 15:29:07 +00:00
Konstantin Belousov
3979450b4c Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created
cdev will never be destroyed. Propagate the flag to devfs vnodes as
VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a
thread reference on such nodes.

In collaboration with:	pho
MFC after:	1 month
2010-08-06 09:42:15 +00:00
Konstantin Belousov
9968a42675 Enable shared locks for the devfs vnodes. Honor the locking mode
requested by lookup(). This should be a nop at the moment.

In collaboration with:	pho
MFC after:	1 month
2010-08-06 09:23:47 +00:00
Konstantin Belousov
3a6fc63c9f Initialize VV_ISTTY vnode flag on the devfs vnode creation instead of
doing it on each open.

In collaboration with:	pho
MFC after:	1 month
2010-08-06 09:06:55 +00:00
Jaakko Heinonen
f40645c83d Add a new function devfs_parent_dirent() for resolving devfs parent
directory entry. Use the new function in devfs_fqpn(), devfs_lookupx()
and devfs_vptocnp() instead of manually resolving the parent entry.

Reviewed by:	kib
2010-06-09 15:29:12 +00:00
Jaakko Heinonen
59e0452e82 Don't try to call cdevsw d_close() method when devfs_close() is called
because of insmntque1() failure.

Found with:	stress2
Suggested and reviewed by:	kib
2010-06-01 18:57:21 +00:00
Ed Schouten
8dc9b4cf04 Let access overriding to TTYs depend on the cdev_priv, not the vnode.
Basically this commit changes two things, which improves access to TTYs
in exceptional conditions. Basically the problem was that when you ran
jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the
node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if
you want to attach to screens quickly, use ssh(1), etc.

The fixes:

- Cache the cdev_priv of the controlling TTY in struct session. Change
  devfs_access() to compare against the cdev_priv instead of the vnode.
  This allows you to bypass UNIX permissions, even across different
  mounts of devfs.

- Extend devfs_prison_check() to unconditionally expose the device node
  of the controlling TTY, even if normal prison nesting rules normally
  don't allow this. This actually allows you to interact with this
  device node.

To be honest, I'm not really happy with this solution. We now have to
store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp).
In an ideal world, we should just get rid of the latter two and only use
s_ttyp, but this makes certain pieces of code very impractical (e.g.
devfs, kern_exit.c).

Reported by:	Many people
2009-12-19 18:42:12 +00:00
Ed Schouten
f8f6146082 Improve nested jail awareness of devfs by handling credentials.
Now that we start to use credentials on character devices more often
(because of MPSAFE TTY), move the prison-checks that are in place in the
TTY code into devfs.

Instead of strictly comparing the prisons, use the more common
prison_check() function to compare credentials. This means that
pseudo-terminals are only visible in devfs by processes within the same
jail and parent jails.

Even though regular users in parent jails can now interact with
pseudo-terminals from child jails, this seems to be the right approach.
These processes are also capable of interacting with the jailed
processes anyway, through signals for example.

Reviewed by:	kib, rwatson (older version)
2009-06-20 14:50:32 +00:00
Konstantin Belousov
c4df27d5c8 VOP_IOCTL takes unlocked vnode as an argument. Due to this, v_data may
be NULL or dereferenced memory may be freed at an arbitrary moment.

Lock the vnode in cd9660, devfs and pseudofs implementation of VOP_IOCTL
to prevent reclaim; check whether the vnode was already reclaimed after
the lock is granted.

Reported by:	georg at dts su
Reviewed by:	des (pseudofs)
MFC after:	2 weeks
2009-06-10 13:57:36 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Konstantin Belousov
0e9bd89d7d Devfs replaces file ops vector with devfs-specific one in devfs_open(),
before the struct file is fully initialized in vn_open(), in particular,
fp->f_vnode is NULL. Other thread calling file operation before f_vnode
is set results in NULL pointer dereference in devvn_refthread().

Initialize f_vnode before calling d_fdopen() cdevsw method, that might
set file ops too.

Reported and tested by:	Chris Timmons <cwt networks cwu edu>
	(RELENG_7 version)
MFC after:	3 days
2009-05-15 19:23:05 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and related parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavily changed by this commit so third-party modules need
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
Robert Watson
885868cd8f Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by:    jeff
Reviewed by:    jeff
Discussed with: rmacklem, zach loafman @ isilon
2009-04-10 10:52:19 +00:00
Konstantin Belousov
52dc9305aa Enable advisory file locking for devfs vnodes.
Reported by:	Timothy Redaelli <timothy redaelli eu>
MFC after:	1 week
2009-03-11 12:53:16 +00:00
Konstantin Belousov
125dcf8c7d Extract the no_poll() and vop_nopoll() code into the common routine
poll_no_poll().
Return a poll_no_poll() result from devfs_poll_f() when
filedescriptor does not reference the live cdev, instead of ENXIO.

Noted and tested by:	hps
MFC after:	1 week
2009-03-06 15:35:37 +00:00
Bjoern A. Zeeb
7956d34b95 Remove unused local variables.
Submitted by:	Christoph Mallon christoph.mallon@gmx.de
Reviewed by:	kib
MFC after:	2 weeks
2009-01-31 17:36:22 +00:00
Edward Tomasz Napierala
71624181c8 Don't panic with "vinvalbuf: dirty bufs" when the mounted device that was
being written to goes away.

Reviewed by:	kib, scottl
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2009-01-08 19:13:34 +00:00
Konstantin Belousov
c7c7520a95 Do not leak devfs_de_interlock on error.
Another pointy hat for my collection.
2008-12-12 11:10:10 +00:00
Joe Marcus Clarke
4c44fd376a Implement VOP_VPTOCNP for devfs. Directory and character device vnodes are
properly translated to their component names.

Reviewed by:	arch
Approved by:	kib
2008-12-12 01:00:38 +00:00
Edward Tomasz Napierala
15bc6b2bd8 Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by:	rwatson (mentor)
2008-10-28 13:44:11 +00:00
Konstantin Belousov
7818e0a545 Save previous content of the td_fpop before storing the current
filedescriptor into it. Make sure that td_fpop is NULL when calling
d_mmap from dev_pager_getpages().

Change guards against td_fpop field being non-NULL with private state
for another device, and against sudden clearing the td_fpop. This
could occur when either a driver method calls another driver through
the filedescriptor operation, or a page fault happens while a driver is
writing to memory backed by another driver.

Noted by:	rwatson
Tested by:	rnoland
MFC after:	3 days
2008-09-26 14:50:49 +00:00
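The save-and-restore pattern this commit describes for td_fpop can be sketched in userland as below. The types and functions (`struct thread_sketch`, `struct file_sketch`, `call_with_fpop()`, `record_fpop()`) are hypothetical and only borrow the field name from the commit message. The sketch shows why a nested call through another driver leaves the outer driver's value intact.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the td_fpop field name comes from the commit. */
struct file_sketch {
	int	id;
};

struct thread_sketch {
	struct file_sketch	*td_fpop;	/* file currently being operated on */
};

typedef int (*driver_fn)(struct thread_sketch *td, void *arg);

/*
 * Run a driver method with td_fpop pointing at fp, then put back whatever
 * was there before instead of unconditionally clearing it.
 */
static int
call_with_fpop(struct thread_sketch *td, struct file_sketch *fp,
    driver_fn fn, void *arg)
{
	struct file_sketch *fpop_saved;
	int error;

	assert(td != NULL && fn != NULL);
	fpop_saved = td->td_fpop;	/* save the previous content */
	td->td_fpop = fp;
	error = fn(td, arg);
	td->td_fpop = fpop_saved;	/* restore it for the outer caller */
	return (error);
}

/* Example driver method: records which file it saw in td_fpop. */
static int
record_fpop(struct thread_sketch *td, void *arg)
{
	*(struct file_sketch **)arg = td->td_fpop;
	return (0);
}
```

If the outer driver had set td_fpop to its own file before the nested call, it still sees that file afterwards, which is the situation the guards in the commit are meant to protect.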
Konstantin Belousov
caf8aec886 fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields to work in vn_stat(). This is needed to get birthtime initialized
correctly.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:50:52 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally useless.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00