Commit Graph

2203 Commits

Author SHA1 Message Date
Konstantin Belousov
9e40a5f827 When devfs_allocv() committed to create new vnode, since de_vnode is NULL,
the dm_lock is held while the newly allocated vnode is locked. Since no
other threads may try to lock the new vnode yet, the LOR there cannot
result in the deadlock.

Shut down the witness warning to note this fact.

Tested by:	pho
Prodded by:	attilio
2008-06-05 09:15:47 +00:00
Ed Schouten
16151645c2 Revert the changes I made to devfs_setattr() in r179457.
As discussed with Robert Watson and John Baldwin, it would be better if
PTY's are created with proper permissions, turning grantpt() into a
no-op.

Bypassing security frameworks like MAC by passing NOCRED to
VOP_SETATTR() will only make things more complex.

Approved by:	philip (mentor)
2008-06-01 14:02:46 +00:00
Ed Schouten
34d1dcf0cc Merge back devfs changes from the mpsafetty branch.
In the mpsafetty branch, PTY's are allocated through the posix_openpt()
system call. The controller side of a PTY now uses its own file
descriptor type (just like sockets, vnodes, pipes, etc).

To remain compatible with existing FreeBSD and Linux C libraries, we can
still create PTY's by opening /dev/ptmx or /dev/ptyXX. These nodes
implement d_fdopen(). Devfs has been slightly changed here, to allow
finit() to be called from d_fdopen().

The routine grantpt() has also been moved into the kernel. This routine
is a little odd, because it needs to bypass standard UNIX permissions.
It needs to change the owner/group/mode of the slave device node, which
may often not be possible. The old implementation solved this by
spawning a setuid utility.

When VOP_SETATTR() is called with NOCRED, devfs_setattr() dereferences
ap->a_cred, causing a kernel panic. Change the de_{uid,gid,mode} code to
allow changes when a->a_cred is set to NOCRED.

Approved by:	philip (mentor)
2008-05-31 14:06:37 +00:00
Ulf Lilleengen
60af8a6a7a - Add locking to all filesystem operations in fdescfs and flag it as MPSAFE.
- Use proper synhronization primitives to protect the internal fdesc node cache
  used in fdescfs.
- Properly initialize and uninitalize hash.
- Remove unused functions.

Since fdescfs might recurse on itself, adding proper locking to it needed some
tricky workarounds in some parts to make it work. For instance, a descriptor in
fdescfs could refer to an open descriptor to itself, thus forcing the thread to
recurse on vnode locks. Because of this, other race conditions also had to be
fixed.

Tested by:	pho
Reviewed by:	kib (mentor)
Approved by:	kib (mentor)
2008-05-24 14:51:30 +00:00
Konstantin Belousov
772e245341 When vget() fails (because the vnode has been reclaimed), there is no
sense to loop trying to vget() the vnode again.

PR:	122977
Submitted by:	Arthur Hartwig <arthur.hartwig nokia com>
Tested by:	pho
Reviewed by:	jhb
MFC after:	1 week
2008-05-23 16:36:39 +00:00
Konstantin Belousov
82f4d64035 Implement the per-open file data for the cdev.
The patch does not change the cdevsw KBI. Management of the data is
provided by the functions
int	devfs_set_cdevpriv(void *priv, cdevpriv_dtr_t dtr);
int	devfs_get_cdevpriv(void **datap);
void	devfs_clear_cdevpriv(void);
All of the functions are supposed to be called from the cdevsw method
contexts.

- devfs_set_cdevpriv assigns the priv as private data for the file
  descriptor which is used to initiate currently performed driver
  operation. dtr is the function that will be called when either the
  last refernce to the file goes away, the device is destroyed  or
  devfs_clear_cdevpriv is called.
- devfs_get_cdevpriv is the obvious accessor.
- devfs_clear_cdevpriv allows to clear the private data for the still
  open file.

Implementation keeps the driver-supplied pointers in the struct
cdev_privdata, that is referenced both from the struct file and struct
cdev, and cannot outlive any of the referee.

Man pages will be provided after the KPI stabilizes.

Reviewed by:	jhb
Useful suggestions from:	jeff, antoine
Debugging help and tested by:	pho
MFC after:	1 month
2008-05-21 09:31:44 +00:00
Markus Brueffer
9c2bf69d32 Fix and speedup timestamp calculations which is roughly based on the patch in
the mentioned PR:

- bounds check time->month as it is used as an array index
- fix usage of time->month as array index (month is 1-12)
- fix calculation based on time->day (day is 1-31)
- fix the speedup code as it doesn't calculate correct timestamps before
  the year 2000 and reduce the number of calculation in the year-by-year code
- speedup month calculations by replacing the array content with cumulative
  values
- add microseconds calculation
- fix an endian problem

PR:		kern/97786
Submitted by:	Andriy Gapon <avg@topspin.kiev.ua>
Reviewed by:	scottl (earlier version)
Approved by:	emax (mentor)
MFC after:	1 week
2008-05-16 22:31:17 +00:00
Attilio Rao
58c5a5eb70 lockinit() can't accept LK_EXCLUSIVE as an initializaiton flag, so just
drop it.

Reported by:	Josh Carroll <josh dot carroll at gmail dot com>
Submitted by:	jhb
2008-05-15 21:39:25 +00:00
John Baldwin
06d0d0e274 Don't explicitly drop Giant around d_open/d_fdopen/d_close for MPSAFE
drivers.  Since devfs is already marked MPSAFE it shouldn't be held
anyway.

MFC after:	2 weeks
Discussed with:	phk
2008-05-07 19:03:57 +00:00
Daichi GOTO
3af387c9d2 - change function name from *_vdir to *_vnode because
VSOCK has been added as cache target. Now they process
  not only VDIR but also VSOCK.
- fixed panic issue caused by cache incorrect free process
  by "umount -f"

Submitted by:	Masanori OZAWA <ozawa@ongs.co.jp>
MFC after:	1 week
2008-05-07 05:32:55 +00:00
Daichi GOTO
fe5f08cda3 o Fixed multi thread access issue reported by Alexander V. Chernikov
(admin@su29.net)
  fixed: kern/109950

PR:		kern/109950
Submitted by:	Alexander V. Chernikov (admin@su29.net)
Reviewed by:	Masanori OZAWA (ozawa@ongs.co.jp)
MFC after:	1 week
2008-04-25 11:37:20 +00:00
Daichi GOTO
938161d61a o Improved unix socket connection issue
fixed: kern/118346

PR:		kern/118346
Submitted by:	Masanori OZAWA (ozawa@ongs.co.jp)
MFC after:	1 week
2008-04-25 09:53:52 +00:00
Daichi GOTO
5307411cbe o Fixed rename panic issue
Submitted by:	Masanori OZAWA (ozawa@ongs.co.jp)
MFC after:	1 week
2008-04-25 09:44:47 +00:00
Daichi GOTO
a9b794ff5e o Fixed inaccessible issue especially including devfs on unionfs case.
fixed also: kern/117829

PR:		kern/117829
Submitted by:	Masanori OZAWA (ozawa@ongs.co.jp)
MFC after:	1 week
2008-04-25 09:38:48 +00:00
Daichi GOTO
a68ae31c71 o Added system hang-up process when VOP_READDIR of unionfs_nodeget()
returns not end of the file status on debug mode (DIAGNOSTIC defined)
  kernel.

Submitted by:	Masanori OZAWA (ozawa@ongs.co.jp)
MFC after:	1 week
2008-04-25 07:58:19 +00:00
Konstantin Belousov
eab626f110 Move the head of byte-level advisory lock list from the
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.

Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
with the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.

The implementation of the lf_purgelocks() is submitted by dfr.

Reported by:	kris
Tested by:	kris, pho
Discussed with:	jeff, dfr
MFC after:	2 weeks
2008-04-16 11:33:32 +00:00
Doug Rabson
18121c17f5 When calling lf_advlock to unlock a record, make sure that ap->a_fl->l_type
is F_UNLCK otherwise we trigger a LOCKF_DEBUG panic.

MFC after: 3 days
2008-04-14 09:22:48 +00:00
Attilio Rao
047dd67e96 Optimize lockmgr in order to get rid of the pool mutex interlock, of the
state transitioning flags and of msleep(9) callings.
Use, instead, an algorithm very similar to what sx(9) and rwlock(9)
alredy do and direct accesses to the sleepqueue(9) primitive.

In order to avoid writer starvation a mechanism very similar to what
rwlock(9) uses now is implemented, with the correspective per-thread
shared lockmgrs counter.

This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and
lockmgr_args_rw().  These two are like the 2 "normal" versions, but they
both accept a rwlock as interlock.  In order to realize this, the general
lockmgr manager function "__lockmgr_args()" has been implemented through
the generic lock layer. It supports all the blocking primitives, but
currently only these 2 mappers live.

The patch drops the support for WITNESS atm, but it will be probabilly
added soon. Also, there is a little race in the draining code which is
also present in the current CVS stock implementation: if some sharers,
once they wakeup, are in the runqueue they can contend the lock with
the exclusive drainer.  This is hard to be fixed but the now committed
code mitigate this issue a lot better than the (past) CVS version.
In addition assertive KA_HELD and KA_UNHELD have been made mute
assertions because they are dangerous and they will be nomore supported
soon.

In order to avoid namespace pollution, stack.h is splitted into two
parts: one which includes only the "struct stack" definition (_stack.h)
and one defining the KPI.  In this way, newly added _lockmgr.h can
just include _stack.h.

Kernel ABI results heavilly changed by this commit (the now committed
version of "struct lock" is a lot smaller than the previous one) and
KPI results broken by lockmgr_rw() / lockmgr_args_rw() introduction,
so manpages and __FreeBSD_version will be updated accordingly.

Tested by:      kris, pho, jeff, danger
Reviewed by:    jeff
Sponsored by:   Google, Summer of Code program 2007
2008-04-06 20:08:51 +00:00
Konstantin Belousov
8eb6b6ecb6 The temporary workaround for the call to the vget() without lock type in
the fdesc_allocvp(). The caller of the fdesc_allocvp() expects that the
returned vnode is not reclaimed. Do lock the vnode exclusive and drop
the lock after.

Reported by:	pho
Reviewed by:	jeff
2008-04-04 09:37:57 +00:00
Konstantin Belousov
57b4252e45 Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:01:21 +00:00
Jeff Roberson
4c65d593e2 - Simplify null_hashget() and null_hashins() by using vref() rather
than a complex series of steps involving vget() without a lock type
   to emulate the same thing.
2008-03-29 23:24:54 +00:00
Doug Rabson
dfdcada31e Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
  client handle safely with replies being de-multiplexed at the socket
  upcall (typically driven directly by the NIC interrupt) and handed
  off to whichever thread matches the reply. For UDP sockets, many RPC
  clients can share the same socket. This allows the use of a single
  privileged UDP port number to talk to an arbitrary number of remote
  hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
  server would be relatively straightforward and would follow
  approximately the Solaris KPI. A single thread should be sufficient
  for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
  callbacks. I've tested the NLM server reasonably extensively - it
  passes both my own tests and the NFS Connectathon locking tests
  running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
  support for the local NFS client's locking needs, it does have to
  field async replies and granted callbacks from remote NLMs that the
  local client has contacted. We relay these replies to the userland
  rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
  it will detect deadlocks caused by a lock request that covers more
  than one blocking request. As required by the NLM protocol, all
  deadlock detection happens synchronously - a user is guaranteed that
  if a lock request isn't rejected immediately, the lock will
  eventually be granted. The old system allowed for a 'deferred
  deadlock' condition where a blocked lock request could wake up and
  find that some other deadlock-causing lock owner had beaten them to
  the lock.

* Since both local and remote locks are managed by the same kernel
  locking code, local and remote processes can safely use file locks
  for mutual exclusion. Local processes have no fairness advantage
  compared to remote processes when contending to lock a region that
  has just been unlocked - the local lock manager enforces a strict
  first-come first-served model for both local and remote lockers.

Sponsored by:	Isilon Systems
PR:		95247 107555 115524 116679
MFC after:	2 weeks
2008-03-26 15:23:12 +00:00
Jeff Roberson
698b1a6643 - Complete part of the unfinished bufobj work by consistently using
BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
 - Create a new lock in the bufobj to lock bufobj fields independently.
   This leaves the vnode interlock as an 'identity' lock while the bufobj
   is an io lock.  The bufobj lock is ordered before the vnode interlock
   and also before the mnt ilock.
 - Exploit this new lock order to simplify softdep_check_suspend().
 - A few sync related functions are marked with a new XXX to note that
   we may not properly interlock against a non-zero bv_cnt when
   attempting to sync all vnodes on a mountlist.  I do not believe this
   race is important.  If I'm wrong this will make these locations easier
   to find.

Reviewed by:	kib (earlier diff)
Tested by:	kris, pho (earlier diff)
2008-03-22 09:15:16 +00:00
Konstantin Belousov
91a35e7870 Do not dereference cdev->si_cdevsw, use the dev_refthread() to properly
obtain the reference. In particular, this fixes the panic reported in
the PR. Remove the comments stating that this needs to be done.

PR:	kern/119422
MFC after:	1 week
2008-03-20 16:08:42 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Robert Watson
970a2d8770 Replace lockmgr lock protecting nwfs vnode hash table with an sx lock.
MFC after:	1 month
2008-03-02 19:02:30 +00:00
Robert Watson
7947229ff6 Replace lockmgr lock protecting smbfs node hash table with sx lock.
MFC after:	1 month
2008-03-02 18:56:13 +00:00
Attilio Rao
7fbfba7bf8 - Handle buffer lock waiters count directly in the buffer cache instead
than rely on the lockmgr support [1]:
  * bump the waiters only if the interlock is held
  * let brelvp() return the waiters count
  * rely on brelvp() instead than BUF_LOCKWAITERS() in order to check
    for the waiters number
- Remove a namespace pollution introduced recently with lockmgr.h
  including lock.h by including lock.h directly in the consumers and
  making it mandatory for using lockmgr.
- Modify flags accepted by lockinit():
  * introduce LK_NOPROFILE which disables lock profiling for the
    specified lockmgr
  * introduce LK_QUIET which disables ktr tracing for the specified
    lockmgr [2]
  * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it
    can only be used on a per-instance basis
- Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer
  used

This patch breaks KPI so __FreBSD_version will be bumped and manpages
updated by further commits. Additively, 'struct buf' changes results in
a disturbed ABI also.

[2] Really, currently there is no ktr tracing in the lockmgr, but it
will be added soon.

[1] Submitted by:	kib
Tested by:	pho, Andrea Barberio <insomniac at slackware dot it>
2008-03-01 19:47:50 +00:00
Konstantin Belousov
e6591b84ff Rename fdescfs vnode from "fdesc" to "fdescfs" to avoid name collision
of the vnode lock with the fdesc_mtx mutex. Having different kinds of
locks with the same name confuses witness.
2008-02-26 10:10:55 +00:00
Robert Watson
18ff731caa Add "Make MPSAFE" to the Coda todo list.
MFC after:	3 days
2008-02-26 09:27:47 +00:00
Attilio Rao
81c794f998 Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by:	Andrea Barberio <insomniac at slackware dot it>
2008-02-25 18:45:57 +00:00
Attilio Rao
628f51d275 Introduce some functions in the vnode locks namespace and in the ffs
namespace in order to handle lockmgr fields in a controlled way instead
than spreading all around bogus stubs:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode

In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock

Side note: union_subr.c::unionfs_node_update() is the only other function
directly handling lockmgr fields. As this is not simple to fix, it has
been left behind as "sole" exception.
2008-02-24 16:38:58 +00:00
Marcel Moolenaar
043ec583dc Don't check the bpbSecPerTrack and bpbHeads fields of the BPB.
They are typically 0 on new ia64 systems. Since we don't use
either field, there's no harm in not checking.
2008-02-21 03:19:46 +00:00
Robert Watson
fa8003c6b9 Remove custom queue macros in Coda, replacing them with queue(9) tailq
macros.  The only semantic change was the need to add a vc_opened field
to struct vcomm since we can no longer use the request queue returning
to an uninitialized state to hold whether or not the device is open.

MFC after:	1 month
2008-02-17 14:33:28 +00:00
Robert Watson
b15ce9be2e Remove namecache performance-tuning todo for Coda: we now use the FreeBSD
name cache.

MFC after:	1 month
2008-02-17 12:40:27 +00:00
Robert Watson
a8c34e8ee0 The possibly interruptible msleep in coda_call() means well, but is
fundamentally fairly confused about how signals work and when it is
appropriate for upcalls to be interrupted.  In particular, we should
be exempting certain upcalls from interruption, we should not always
eventually time out sleeping on a upcall, and we should not be
interrupting the sleep for certain signals that we currently are
(including SIGINFO).  This code needs to be reworked in the style of
NFS interruptible mounts.

MFC after:	1 month
2008-02-15 13:31:35 +00:00
Robert Watson
c30ddc8d99 Spell replys as replies.
MFC after:	1 month
2008-02-15 12:11:45 +00:00
Robert Watson
93b510870f Reorder and clean up make_coda_node(), annotate weaknesses in the
implementation.

MFC after:	1 month
2008-02-15 11:58:11 +00:00
Robert Watson
c0964f549b Remove debugging code under OLD_DIAGNOSTIC; this is all >10 years old and
hasn't been used in that time.

MFC after:	1 month
2008-02-14 00:55:03 +00:00
Robert Watson
57a77b811f In Coda, flush the attribute cache for a cnode when its fid is
changed, as its synthesized inode number may have changed and we
want stat(2) to pick up the new inode number.

MFC after:	1 month
2008-02-14 00:30:06 +00:00
Robert Watson
89d1d7886a Update cache flushing behavior in light of recent namecache and
access cache improvements:

- Flush just access control state on CODA_PURGEUSER, not the full
  namecache for /coda.

- When replacing a fid on a cnode as a result of, e.g.,
  reintegration after offline operation, we no longer need to
  purge the namecache entries associated with its vnode.

MFC after:	1 month
2008-02-13 19:50:17 +00:00
Robert Watson
38ab9a906a Implement a rudimentary access cache for the Coda kernel module,
modeled on the access cache found in NFS, smbfs, and the Linux coda
module.  This is a positive access cache of a single entry per file,
tracking recently granted rights, but unlike NFS and smbfs,
supporting explicit invalidation by the distributed file system.

For each cnode, maintain a C_ACCCACHE flag indicating the validity
of the cache, and a cached uid and mode tracking recently granted
positive access control decisions.

Prefer the cache to venus_access() in VOP_ACCESS() if it is valid,
and when we must fall back to venus_access(), update the cache.

Allow Venus to clear the access cache, either the whole cache on
CODA_FLUSH, or just entries for a specific uid on CODA_PURGEUSER.
Unlike the Coda module on Linux, we don't flush all entries on a
user purge using a generation number, we instead walk present
cnodes and clear only entries for the specific user, meaning it is
somewhat more expensive but won't hit all users.

Since the Coda module is agressive about not keeping around
unopened cnodes, the utility of the cache is somewhat limited for
files, but works will for directories.  We should make Coda less
agressive about GCing cnodes in VOP_INACTIVE() in order to improve
the effectiveness of in-kernel caching of attributes and access
rights.

MFC after:	1 month
2008-02-13 15:45:12 +00:00
Robert Watson
d25a3c4c44 Remove now-unused Coda namecache.
MFC after:	1 month
2008-02-13 13:26:01 +00:00
Robert Watson
44abffb44b Rather than having the Coda module use its own namecache, use the global
VFS namecache, as is done by the Coda module on Linux.  Unlike the Coda
namecache, the global VFS namecache isn't tagged by credential, so use
ore conservative flushing behavior (for now) when CODA_PURGEUSER is
issued by Venus.

This improves overall integration with the FreeBSD VFS, including
allowing __getcwd() to work better, procfs/procstat monitoring, and so
on.  This improves shell behavior in many cases, and improves ".."
handling.  It may lead to some slowdown until we've implemented a
specific access cache, which should net improve performance, but in the
mean time, lookup access control now always goes to Venus, whereas
previously it didn't.

MFC after:	1 month
2008-02-13 13:06:22 +00:00
Attilio Rao
d1215e10d2 Fix a lock leak in the ntfs locking scheme:
When ntfs_ntput() reaches 0 in the refcount the inode lockmgr is not
released and directly destroyed. Fix this by unlocking the lockmgr() even
in the case of zero-refcount.

Reported by: dougb, yar, Scot Hetzel <swhetzel at gmail dot com>
Submitted by: yar
2008-02-13 13:02:12 +00:00
Robert Watson
4f52b754df Clean up coda_pathconf() slightly while debugging a problem there.
MFC after:	1 month
2008-02-11 00:01:45 +00:00
Robert Watson
21bb029533 Since we're now actively maintaining the Coda module in the FreeBSD source
tree, restyle everything but coda.h (which is more explicitly shared
across systems) into a closer approximation to style(9).

Remove a few more unused function prototypes.

Add or clarify some comments.

MFC after:	1 month
2008-02-10 11:18:12 +00:00
Robert Watson
d57786ec68 Various further non-functional cleanups to coda:
- Rename print_vattr to coda_print_vattr and make static, rename
  print_cred to coda_print_cred.
- Remove unused coda_vop_nop.
- Add XXX comment because coda_readdir forwards to the cache vnode's
  readdir rather than venus_readdir, and annotate venus_readdir as
  unused.
- Rename vc_nb_* to vc_*.
- Use d_open_t, d_close_t, d_read_t, d_write_t, d_ioctl_t and d_poll_t
  for prototyping vc_* as that is the intent, don't use our own
  definitions.
- Rename coda_nb_statfs to coda_statfs, rename NB_SFS_SIZ to
  CODA_SFS_SIZ.
- Replace one more OBE reference to NetBSD with a reference to FreeBSD.
- Tidy up a little vertical whitespace here and there.
- Annotate coda_nc_zapvnode as unused.
- Remove unused vcodattach.
- Annotate VM_INTR as unused.
- Annotate that coda_fhtovp is unused and doesn't match the FreeBSD
  prototype, so isn't hooked up to vfs_fhtovp.  If we want NFS export of
  Coda to work someday, this needs to be fixed.
- Remove unused getNewVnode.
- Remove unused coda_vget, coda_init, coda_quotactl prototypes.

MFC after:	1 month
2008-02-09 12:49:18 +00:00
Robert Watson
fc9d8f0057 No reason not to maintain stats on statfs in Coda, as it's done for
other VFS operations, so uncomment the existing statistics gathering.

MFC after:	1 month
2008-02-09 11:40:49 +00:00
Robert Watson
8571e9a189 Remove unused devtomp(), which exploited UFS-specific knowledge to find
the mountpoint for a specific device.  This was implemented incorrectly,
a bad idea in a fundamental sense, and also never used, so presumably
a long-idle debugging function.

MFC after:	1 month
2008-02-09 11:12:18 +00:00
Robert Watson
82e4904ffb Since Coda is effectively a stacked file system, use VOP_EOPNOTSUPP
for vop_bmap; delete the existing stub that returned either EINVAL
or EOPNOTSUPP, and had unreachable calls to VOP_BMAP on the cache
vnode.

MFC after:	1 month
2008-02-09 09:33:19 +00:00
Robert Watson
37245e3742 Lock cache vnode when VOP_FSYNC() is called on a Coda vnode.
MFC after:	1 month
2008-02-09 00:12:22 +00:00
Robert Watson
6dc70a9dec Make all calls to vn_lock() in Coda, including recently added ones,
use LK_RETRY, since failure is undesirable (and not handled).

MFC after:	1 month
Pointed out by:	kib
2008-02-09 00:03:22 +00:00
Robert Watson
7a246a6314 The Coda module was originally ported to NetBSD from Mach by rvb, and
then later to FreeBSD.  Update various NetBSD-related comments: in some
cases delete them because they don't appply, in others update to say
FreeBSD as they still apply but in FreeBSD (and might for that matter
no longer apply on NetBSD), and flag one case where I'm not sure
whether it applies.

MFC after:	1 month
2008-02-08 23:15:36 +00:00
Robert Watson
efeac2fb25 Before invoking vnode operations on cache vnodes, acquire the vnode
locks of those vnodes.  Probably, Coda should do the same lock sharing/
pass-through that is done for nullfs, but in the mean time this ensures
that locks are adequately held to prevent corruption of data structures
in the cache file system.

Assuming most operations came from the top layer of Coda and weren't
performed directly on the cache vnodes, in practice this corruption was
relatively unlikely as the Coda vnode locks were ensuring exclusive
access for most consumers.

This causes WITNESS to squeal like a pig immediately when Coda is used,
rather than waiting until file close; I noticed these problems because
of the lack of said squealing.

MFC after:	1 month
2008-02-08 23:01:40 +00:00
Robert Watson
99a2317ed3 Remove undefined coda excluded by #if 1 #else, which previously protected
vget() calls using inode numbers to query the root of /coda, which is not
needed since we now cache the root vnode with the mountpoint.

MFC after:	1 month
2008-02-08 22:37:15 +00:00
Attilio Rao
2433c4883e Conver all explicit instances to VOP_ISLOCKED(arg, NULL) into
VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should
only acquire curthread as argument; this will lead in axing the additional
argument from both functions, making the code cleaner.

Reviewed by: jeff, kib
2008-02-08 21:45:47 +00:00
Robert Watson
c55376e791 Remove Giant acquisition around soreceive() and sosend() in fifofs. The
bug that caused us to reintroduce it is believed to be fixed, and Kris
says he no longer sees problems with fifofs in highly parallel builds.
If this works out, we'll MFC it for 7.1.

MFC after:	3 months
Pointed out by:	kris
2008-01-26 12:34:23 +00:00
Attilio Rao
0e9eb108f0 Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
  always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
  Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
  present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
  curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo
2008-01-24 12:34:30 +00:00
Robert Watson
9d3e5c0e2b Put "coda_rdwr: Internally Opening" printf generated by in-kernel writes
to files, such as ktrace output, under CODA_VERBOSE.  Otherwise, each
such call to VOP_WRITE() results in a kernel printf.

MFC after:	3 days
Obtained from:	NetBSD
2008-01-21 21:39:08 +00:00
Robert Watson
e866951b59 Replace references to VOP_LOCK() w/o LK_RETRY to vn_lock() with LK_RETRY,
avoiding extra error handling, or in some cases, missing error handling.

MFC after:	3 days
Discussed with:	kib
2008-01-21 21:19:07 +00:00
Robert Watson
9440b9f7ea Remove unused oldhash definition from Coda namecache.
MFC after:	3 days
2008-01-19 19:21:07 +00:00
Robert Watson
de5910460a Improve default vnode operation handling for Coda:
- Don't specify vnode operations for mknod, lease, and advlock--let them
  fall through to vop_default.

- Implement vop_default with &default_vnodeops, rather than with VOP_PANIC,
  so that unimplemented vnode operations are handled in more sensible ways
  than panicking, such as EOPNOTSUPP on ACL queries generated by bsdtar,
  or mknod.

MFC after:	3 days
2008-01-19 17:12:44 +00:00
Robert Watson
aeab4f72a0 Rework coda_statfs(): no longer need to zero the statfs structure or
fill out all fields, just fill out the ones the file system knows
about.  Among other things, this causes the outpuf of "mount" and
"df" to make quite a bit more sense as /dev/cfs0 is specified as the
mountfrom name.

MFC after:	3 days
2008-01-19 16:39:14 +00:00
Robert Watson
82bf4517ef Zero mi_rotovp and coda_ctlvp immediately after calling vrele() on the
vnodes during coda_unmount() in order to detect errant use of them
after the vnode references may no longer be valid.

No need to clear the VV_ROOT flag on mi_rootvp flag (especially after
the vnode reference is no longer valid) as this isn't done on other
file systems.

MFC after:	3 days
2008-01-19 15:40:46 +00:00
Robert Watson
96b1e9b015 Don't acquire an additional vnode reference to a vnode when it is opened
and then release it when it is closed: we rely on the caller to keep the
vnode around with a valid reference.  This avoids vrele() destroying the
vnode vop_close() is being called from during a call to vop_close(), and
a crash due to lockmgr recursing the vnode lock when a Coda unmount
occurs.

MFC after:	3 days
2008-01-19 15:39:10 +00:00
Robert Watson
76898521e8 Don't declare functions as extern.
Move all extern variable definitions to associated .h files, move some
extern variable definitions between include files to place them more
appropriately.

MFC after:	3 days
2008-01-19 14:32:44 +00:00
Robert Watson
11cc4ab95a Use VOP_NULL rather than VOP_PANIC for Coda's vop_print routine, so as
to avoid panicking in DDB show lockedvnods.

MFC after:	3 days
2008-01-19 13:41:56 +00:00
Robert Watson
d883e8e720 Lock the new directory vnode returned by coda_mkdir(), as this is required
by FreeBSD's vnode locking protocol.

MFC after:	3 days
2008-01-19 13:29:14 +00:00
Robert Watson
6885d70dfe Borrow the VM object associated with an underlying cache vnode with the
Coda vnode derived from it, in the style of nullfs.  This allows files
in the Coda file system to be memory-mapped, such as with execve(2) or
mmap(2).

MFC after:	3 days
Reported by:	Rune <u+openafsdev-sr55 at chalmers dot se>
2008-01-19 13:27:14 +00:00
Konstantin Belousov
61af195933 udf_vget() shall vgone() the vnode when the file_entry cannot be allocated
or read from the volume. Otherwise, half-constructed vnode could be found
later and cause panic when accessed.

PR:	118322
MFC after:	1 week
2008-01-18 12:09:54 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Attilio Rao
d7a7e17968 Remove explicit calling of lockmgr() with the NULL argument.
Now, lockmgr() function can only be called passing curthread and the
KASSERT() is upgraded according with this.

In order to support on-the-fly owner switching, the new function
lockmgr_disown() has been introduced and gets used in BUF_KERNPROC().
KPI, so, results changed and FreeBSD version will be bumped soon.
Differently from previous code, we assume idle thread cannot try to
acquire the lockmgr as it cannot sleep, so loose the relative check[1]
in BUF_KERNPROC().

Tested by: kris

[1] kib asked for a KASSERT in the lockmgr_disown() about this
condition, but after thinking at it, as this is a well known general
rule, I found it not really necessary.
2008-01-08 23:48:31 +00:00
John Baldwin
314464f422 Lock the vnode interlock while reading v_usecount to update si_usecount
in a cdev in devfs_reclaim().

MFC after:	3 days
Reviewed by:	jeff (a while ago)
2008-01-08 04:45:24 +00:00
John Baldwin
e46502943a Make ftruncate a 'struct file' operation rather than a vnode operation.
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
  a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
  object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
  vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.

Submitted by:	rwatson
2008-01-07 20:05:19 +00:00
Attilio Rao
7a52326a0d g_vfs_close() wants the sx topology lock held while executing, so just
add correct locking to the operation of unmounting.
This will prevent debugging kernels from panicking if mounting a
non-hpfs partition (I'm not sure if this can be a problem with a
successful mounting operation though).

MFC: 3 days
2008-01-07 16:51:24 +00:00
Jeff Roberson
397c19d175 Remove explicit locking of struct file.
- Introduce a finit() which is used to initailize the fields of struct file
   in such a way that the ops vector is only valid after the data, type,
   and flags are valid.
 - Protect f_flag and f_count with atomic operations.
 - Remove the global list of all files and associated accounting.
 - Rewrite the unp garbage collection such that it no longer requires
   the global list of all files and instead uses a list of all unp sockets.
 - Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by:	kris, pho
2007-12-30 01:42:15 +00:00
Attilio Rao
100f241571 Trimm out now unused option LK_EXCLUPGRADE from the lockmgr namespace.
This option just adds complexity and the new implementation no longer
will support it, so axing it now that it is unused is probabilly the
better idea.

FreeBSD version is bumped in order to reflect the KPI breakage introduced
by this patch.

In the ports tree, kris found that only old OSKit code uses it, but as
it is thought to work only on 2.x kernels serie, version bumping will
solve any problem.
2007-12-28 00:38:13 +00:00
Robert Watson
3de213cc00 Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument.  This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.
2007-12-25 17:52:02 +00:00
Markus Brueffer
a8a27cb0f9 Fix calculation of descriptor tag checksums. According to ECMA-167, Part 4,
7.2.3, bytes 0-3 and 5-15 are used to calculate the checksum of a descriptor
tag.

PR:		kern/90521
Submitted by:	Björn König <bkoenig@cs.tu-berlin.de>
Reviewed by:	scottl
Approved by:	emax (mentor)
2007-12-11 19:49:40 +00:00
Xin LI
1fa8f5f051 Turn MPASS(0) into panic with more obvious reason why the assertion
is failed.
2007-12-07 00:00:21 +00:00
Xin LI
745973bd99 size_max should be unsigned, as such, use size_t here. 2007-12-06 23:19:05 +00:00
Wojciech A. Koszek
9889281da3 Explicitly initialize 'error' to 0 (two places). It lets one to build tmpfs
from the latest source tree with older compiler--gcc3.

Reviewed by:	kib@ (on freebsd-current@)
Approved by:	cognet@ (mentor)
2007-12-04 20:14:15 +00:00
Maxim Konovalov
23c1e989a6 o English lesson from bde@: "iff" is not a typo, it means "if and only if".
Backout previous.
2007-11-18 09:21:30 +00:00
Xin LI
7871e52bfd MFp4: Several fixes to tmpfs which makes it to survive from pho@'s
strees2 suite, to quote his letter, this change:

1. It removes the tn_lookup_dirent stuff. I think this cannot be fixed,
   because nothing protects vnode/tmpfs node between lookup is done, and
   actual operation is performed, in the case the vnode lock is dropped.
   At least, this is the case with the from vnode for rename.

   For now, we do the linear lookup in the parent node. This has its own
   drawbacks. Not mentioning speed (that could be fixed by using hash), the
   real problem is the situation where several hardlinks exist in the dvp.
   But, I think this is fixable.

2. The patch restores the VV_ROOT flag on the root vnode after it became
   reclaimed and allocated again. This fixes MPASS assertion at the start
   of the tmpfs_lookup() reported by many.

Submitted by:	kib
2007-11-18 04:52:40 +00:00
Xin LI
e0f51ae7cd MFp4: Fix several style(9) bugs.
Submitted by:	des
2007-11-18 04:40:42 +00:00
Maxim Konovalov
3f61687ba1 o Mask maximum file permissions we get from mount_ntfs -m
with ACCESSPERMS.  Document in mount_ntfs(8) only the nine
low-order bits of mask are used (taken from mount_msdosfs(8)).

PR:		kern/114856
Submitted by:	Ighighi
MFC after:	1 month
2007-11-17 17:05:01 +00:00
Maxim Konovalov
4adf89efc6 o Fix a typo in the comment. 2007-11-17 16:19:48 +00:00
Maxim Konovalov
6b0659fc0f o Do not leak inodes hash table at module unload.
PR:		kern/118017
Submitted by:	Ighighi
MFC after:	1 week
2007-11-13 19:34:06 +00:00
Xin LI
eed4ee29e5 Correct a stack overflow which will trigger panics when
mode= is specified, caused by incorrect format string
specified to vfs_scanopt() and subsequently vsscanf().

Pointed out by:	kib
Submitted by:	des
2007-11-12 18:57:33 +00:00
Tom Rhodes
ededffc06b Remove some debugging code that, while useful, doesn't belong in the committed
version.  While here, expand a macro only used once.

Discussed with/oked by:	bde
2007-10-25 08:23:08 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Xin LI
3247c9ddcc Fixes to msdosfs dirtyflag related stuff:
- markvoldirty() needs to write to underlying GEOM provider.  We
   have to do that *before* g_access() which sets the GEOM provider
   to read-only.
 - Remove dirty flag before free'ing iconv related resources.  The
   dirty flag removal could fail, and it is hard to revert the
   iconv-free after the fail.
 - Mark volume as dirty if we have failed to mark it clean for safe.
 - Other style fixes to the touched functions.
2007-10-22 17:43:43 +00:00
Bruce Evans
cb65c1ee29 Implement the async (really, delayed-write) mount option for msdosfs.
This is much simpler than for ffs since there are many fewer places
where we need to choose between a delayed write and a sync write --
just 5 in msdosfs and more than 30 in ffs.

This is more complete and correct than in ffs.  Several places in ffs
are are still missing the choice.  ffs_update() has a layering violation
that breaks callers which want to force a sync update (mainly fsync(2)
and O_SYNC write(2)).

However, fsync(2) and O_SYNC write(2) are still more broken than in
ffs, since they are broken for default (non-sync non-async) mounts
too.  Both fail to sync the FAT in all cases, and both fail to sync
the directory entry in some cases after losing a race.  Async everything
is probably safer than the half-baked sync of metadata given by default
mounts.
2007-10-19 12:23:25 +00:00
Bruce Evans
9e916c3163 Add noclusterr and noclusterw options to the options list. I forgot these
when I implemented clustering.
2007-10-18 16:25:47 +00:00
Bruce Evans
7c3fc9de5c Fix some style bugs in the mount options list. Mainly, sort the list,
leaving space for adding missing options.  Negative options are sorted
after removing their "no" prefix, and generic options are sorted before
msdosfs-specific ones.
2007-10-18 15:48:10 +00:00
Bruce Evans
cefb55828f In msdosfs_settattr(), don't do synchronous updates of the denode
(except indirectly for the size pseudo-attribute).  If anything deserves
a sync update, then it is ids and immutable flags, since these are
related to security, but ffs never synced these and msdosfs doesn't
support them.  (ufs_setattr() only does an update in one case where
it is least needed (for timestamps); it did pessimal sync updates for
timestamps until 1998/03/08 but was changed for unlogged reasons related
to soft updates.)

Now msdosfs calls deupdat() with waitfor == 0, which normally gives a
delayed update to disk but always gives a sync update of timestamps
in core, while for ffs everything is delayed until the syncer daemon
or other activity causes an update (except for timestamps).

This gives a large optimization mainly for things like cp -p, where
attribute adjustment could easily triple the number of physical I/O's
if it is done synchronously (but cp -p to msdosfs is not as bad as
that, since msdosfs doesn't support many attributes so null adjustments
are more common, and msdosfs doesn't support ctimes so even if cp
doesn't weed out null adjustments they don't become non-null after
clobbering the ctime).
2007-10-18 07:26:21 +00:00
Alfred Perlstein
77465d9390 Get rid of qaddr_t.
Requested by: bde
2007-10-16 10:54:55 +00:00
Daichi GOTO
1016626062 This changes give nullfs correctly work with latest unionfs.
Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:57:11 +00:00
Daichi GOTO
20885def58 Added whiteout behavior option. ``-o whiteout=always'' is default mode
(it is established practice) and ``-o whiteout=whenneeded'' is less
disk-space using mode especially for resource restricted environments
like embedded environments. (Contributed by Ed Schouten. Thanks)

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:55:38 +00:00
Daichi GOTO
524f3f285d Default copy mode has been changed from traditional-mode to transparent-mode.
Some folks who have reported some issues have solved with transparent mode.
We guess it is time to change the default copy mode. The transparent-mode is
the best in most situations.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:53:38 +00:00
Daichi GOTO
7d72c5e67d Fixed un-vrele issue of upper layer root vnode of unionfs.
Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:52:01 +00:00
Daichi GOTO
6c98d0e9db Added NULL check code pointed out by Coverity. (via Stanislav
Sedov. Thanks)

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:50:58 +00:00
Daichi GOTO
57821163d3 - It has been become MPSAFE.
- Fixed lock panic issue under MPSAFE.
- Fixed panic issue whenever it locks vnode with reclaim.
- Fixed lock implementations not conforming to vnode_if.src style.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:49:30 +00:00
Daichi GOTO
7e0c899579 Fixed vnode unlock/vrele untreated issues whenever errors have
occurred during some treatments.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:47:44 +00:00
Daichi GOTO
dc2dd18518 - Added support for vfs_cache on unionfs. As a result, you can use
applications that use procfs on unionfs.
- Removed unionfs internal cache mechanism because it has
  vfs_cache support instead. As a result, it just simplified code of
  unionfs.
- Fixed kern/111262 issue.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:46:11 +00:00
Daichi GOTO
5adc408078 Added treatments to prevent readdir infinity loop using with Linux binary
compatibility feature.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:44:06 +00:00
Daichi GOTO
b2b0db08c5 Changed it frees unneeded memory ASAP.
Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:42:05 +00:00
Daichi GOTO
3282e2c406 Log:
Improved access permission check treatments.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:37:52 +00:00
John Baldwin
c1f7cf23b1 Use the correct pid when checking to see whether or not the /proc/<pid>
directory itself (rather than any of its contents) is visible to the
current thread.

MFC after:	1 week
PR:		kern/90063
Submitted by:	john of 8192.net
Approved by:	re (kensmith)
2007-10-05 17:37:25 +00:00
Xin LI
3543c1b429 MFp4: Provide a dummy verb "export" to shut up the message
showed up at start when NFS is enabled.

Reported by:	rafan
Approved by:	re (tmpfs blanket)
2007-10-04 17:11:48 +00:00
Xin LI
386c969205 Additional work is still needed before we can claim that tmpfs
is stable enough for production usage.  Warn user upon mount.

Approved by:	re (tmpfs blanket)
2007-10-04 17:08:46 +00:00
Bruce Evans
ed316d339f Remove some of the pessimizations involving writing the fsi sector.
All active fields in fsi are advisory/optional, so we shouldn't do
extra work to make them valid at all times, but instead we write to
the fsi too often (we still do), and we searched for a free cluster
for fsinxtfree too often.

This commit just removes the whole search and its results, so that we
write out our in-core copy of fsinxtfree instead of writing a "fixed"
copy and clobbering our in-core copy.  This saves fixing 3 bugs:
- off-by-1 error for the end of the search, resulting in fsinxtfree
  not actually being adjusted iff only the last cluster is free.
- missing adjustment when no clusters are free.
- off-by-many error for the start of the search.  Starting the search
  at 0 instead of at (the in-core copy of) fsinxtfree did more than
  defeat the reasons for existence of fsinxtfree.  fsinxtfree exists
  mainly to avoid having to start at 0 for just the first search per
  mount, but has the side effect of reducing bias towards allocating
  near cluster 0.  The bias would normally only be generated by the
  first search per mount (if fsinxtfree is not supported), but since
  we also adjusted the in-core copy of fsinxtfree here, we were doing
  extra work to maximize the bias.

Approved by:	re (kensmith)
2007-09-23 14:49:32 +00:00
Craig Rodrigues
00cedf971b Disable multiple ntfs mounts to the same mountpoint.
Eliminates panics due to locking issues.
Idea taken from src/sys/gnu/fs/xfs/FreeBSD/xfs_super.c.

PR:	89966, 92000, 104393
Reported by:	H. Matsuo <hiroshi50000 yahoo co jp>,
		Chris <m2chrischou gmail.com>,
		Andrey V. Elsukov <bu7cher yandex ru>,
		Jan Henrik Sylvester <me janh de>
Approved by:	re (kensmith)
2007-09-21 23:50:15 +00:00
Jeff Roberson
b61ce5b0e6 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
Bruce Evans
c2819440b3 Fix races in msdosfs_lookup() and msdosfs_readdir(). These functions
can easily block in bread(), and then there was nothing to prevent the
static buffer (nambuf_{ptr,len,last_id}) being clobbered by another
thread.

The effects of the bug seem to have been limited to failed lookups and
mangled names in readdir(), since Giant locking provides enough
serialization to prevent concurrent calls to the functions that access
the buffer.  They were very obvious for multiple concurrent tree walks,
especially with a small cluster size.

The bug was introduced in msdosfs_conv.c 1.34 and associated changes,
and is in all releases starting with 5.2.

The fix is to allocate the buffer as a local variable and pass around
pointers to it like "_r" functions in libc do.  Stack use from this
is large but not too large.  This also fixes a memory leak on module
unload.

Reviewed by:	kib
Approved by:	re (kensmith)
2007-08-31 22:29:55 +00:00
Xin LI
1f32d0127b MFp4: rework tmpfs_readdir() logic in terms of correctness.
Approved by:	re (tmpfs blanket)
Tested with:	fstest, fsx
2007-08-16 11:00:07 +00:00
John Baldwin
1dc5b1cc56 On 6.x this works:
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by:	jhb
Reported by:	Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by:	re (bmah)
Proxy commit for:	rodrigc
2007-08-15 17:40:09 +00:00
Xin LI
ad3638ee08 MFp4:
- LK_RETRY prohibits vget() and vn_lock() to return error.
   Remove associated code. [1]
 - Properly use vhold() and vdrop() instead of their unlocked
   versions, we are guaranteed to have the vnode's interlock
   unheld. [1]
 - Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
   with the same way used in modern NetBSD versions. [2]
 - Reorganize tmpfs_readdir to reduce duplicated code.

Submitted by:	kib [1]
Obtained from:	NetBSD [2]
Approved by:	re (tmpfs blanket)
2007-08-10 11:00:30 +00:00
Xin LI
0ae6383d39 MFp4:
- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
 - Properly lock around tn_vnode to avoid NULL deference
 - Be more careful handling vnodes (*)

(*) This is a WIP
[1] by pjd via howardsu

Thanks kib@ for his valuable VFS related comments.

Tested with:	fsx, fstest, tmpfs regression test set
Found by:	pho's stress2 suite
Approved by:	re (tmpfs blanket)
2007-08-10 05:24:49 +00:00
Bruce Evans
a4e6807c49 In msdosfs_read() and msdosfs_write(), don't check explicitly for
(uio_offset < 0) since this can't happen.  If this happens, then the
general code handles the problem safely (better than before for reading,
returning 0 (EOF) instead of the bogus errno EINVAL, and the same as
before for writing, returning EFBIG).

In msdosfs_read(), don't check for (uio_resid < 0).  msdosfs_write()
already didn't check.

In msdosfs_read(), document in a comment our assumptions that the caller
passed a valid uio_offset and uio_resid.  ffs checks using KASSERT(),
and that is enough sanity checking.  In the same comment, partly document
there is no need to check for the EOVERFLOW case, unlike in ffs where this
case can happen at least in theory.

In msdosfs_write(), add a comment about why the checking of
(uio_resid == 0) is explicit, unlike in ffs.

In msdosfs_write(), check for impossibly large final offsets before
checking if the file size rlimit would be exceeded, so that we don't
have an overflow bug in the rlimit check and are consistent with ffs.
We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final
offset would be impossibly large but not so large as to cause overflow.
Overflow normally gave the benign behaviour of no signal.

Approved by:	re (kensmith) (blanket)
2007-08-07 10:35:27 +00:00
Bruce Evans
b7837a91c9 Fix and update the comments about the effect of the read-only flag on writing.
They are still too verbose.

Remove nearby unreachable code for handling symlinks.

Approved by:	re (kensmith) (blanket)
2007-08-07 05:42:10 +00:00
Bruce Evans
e3117f852e Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix some whitespace errors; fix only one case of
a boolean comparison of a non-boolean).

Improve an error message by quoting ".", and by not printing large positive
values as negative ones.

Approved by:	re (kensmith) (blanket)
2007-08-07 03:59:49 +00:00
Bruce Evans
c0f5121cac Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix only a couple of whtespace errors).

Approved by:	re (kensmith) (blanket)
2007-08-07 03:43:28 +00:00
Bruce Evans
2d7c6b2724 Fix some style bugs (mainly some whitespace errors).
Approved by:	re (kensmith) (blanket)
2007-08-07 03:38:36 +00:00
Bruce Evans
b6d0381e7e Fix some style bugs (some whitespace errors only).
Approved by:	re (kensmith) (blanket)
2007-08-07 03:22:10 +00:00
Bruce Evans
d2bb66bacd Sort includes.
Remove rotted banal comment attached to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:28:33 +00:00
Bruce Evans
6becd1c855 Sort includes.
Remove banal comments attached to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:27:35 +00:00
Bruce Evans
5696c6e0b2 Sort includes.
Remove banal comments before includes.  Remove rotted banal comments attached
to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:20:37 +00:00
Bruce Evans
9b0802c90b Remove unused include(s).
Remove banal comments before includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:11:16 +00:00
Bruce Evans
a878a31c13 Remove unused include(s).
Approved by:	re (kensmith) (blanket)
2007-08-07 02:08:06 +00:00
Bruce Evans
eba34270fa Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of
depending on namespace pollution in <sys/buf.h> and/or <sys/vnode.h>

Approved by:	re (kensmith) (blanket)
2007-08-07 01:40:27 +00:00
Bruce Evans
1103771d95 Include <sys/mutex.h>'s prerequisite <sys/lock.h> instead of depending on
namespace pollution in <sys/vnode.h>.

Sort the include of <sys/mutex.h> instead of unsorting it after
<sys/vnode.h> and depending on the pollution there.

Approved by:	re (kensmith) (blanket)
2007-08-07 01:37:59 +00:00
Bruce Evans
6fd81fc7a6 Remove unused include(s).
Approved by:	re (kensmith) (blanket)
2007-08-07 01:07:16 +00:00
Bruce Evans
8d61a735c6 Silently fix up the estimated next free cluster number from the fsinfo
sector, instead of failing the whole mount if it is garbage.  Fields
in the fsinfo sector are only advisory, so there are better sanity
checks than this, and we already silently fix up the only other advisory
field in the fsinfo (the free cluster count).

This wasn't handled quite right in rev.1.92, 1.117, or in NetBSD.  1.92
also failed the whole mount for the non-garbage magic value 0xffffffff
1.117 fixed this well enough in practice since garbage values shouldn't
occur in practice, but left the error handling larger and more convoluted
than necessary.  Now we handle the magic value as a special case of
fixing up all out of bounds values.

Also fix up the estimated next free cluster number when there is no
fsinfo sector.  We were using 0, but CLUST_FIRST is safer.

Approved by:	re (kensmith)
2007-08-05 12:58:34 +00:00
Bruce Evans
3726942956 Oops, fix the fix for the i/o size of the fsinfo block. Its log
message explained why the size is 1 sector, but the code used a
size of 1 cluster.

I/o sizes larger than necessary may cause serious coherency problems
in the buffer cache.  Here I think there were only minor efficiency
problems, since a too-large fsinfo buffer could only get far enough
to overlap buffers for the same vnode (the device vnode), so mappings
are coherent at the page level although not at the buffer level, and
the former is probably enough due to our limited use of the fsinfo
buffer.

Approved by:	re (kensmith)
2007-08-03 23:13:50 +00:00
Xin LI
fb7557140e MFp4 - Refine locking to eliminate some potential race/panics:
- Copy before testing a pointer.  This closes a race window.
 - Use msleep with the node interlock instead of tsleep.
 - Do proper locking around access to tn_vpstate.
 - Assert vnode VOP lock for dir_{atta,de}tach to capture
   inconsistent locking.

Suggested by:	kib
Submitted by:	delphij
Reviewed by:	Howard Su
Approved by:	re (tmpfs blanket)
2007-08-03 06:24:31 +00:00
Pawel Jakub Dawidek
57fd3d5572 When we do open, we should lock the vnode exclusively. This fixes few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with:	kib, ups
Approved by:	re (rwatson)
2007-07-26 16:58:09 +00:00
Xin LI
f62e5595fd MFp4: Force 64-bit arithmatic when caculating the maximum file size.
This fixes tmpfs caculations on 32-bit systems equipped with more than
4GB swap.

Reported by:	Craig Boston <craig xfoil gank org>
PR:		kern/114870
Approved by:	re (tmpfs blanket)
2007-07-24 17:14:53 +00:00
Bruce Evans
4eb3abf0a5 Make using msdosfs as the root file system sort of work:
o Initialize ownerships and permissions.  They were garbage (0) for
  root mounts since vfs_mountroot_try() doesn't ask for them to be set
  and msdosfs's old incomplete code to set them was removed.  The
  garbage happened to give the correct ownerships root:wheel, but it
  gave permissions 000 so init could not be execed.  Use the macros
  for root: wheel and 0755.  (The removed code gave 0:0 and 0777.  0755
  is more normal and secure, thought wrong for /tmp.)

o Check the readonly flag for initial (non-MNT_UPDATE) mounts in the
  correct place, as in ffs.  For root mounts, it is only passed in
  mp->mnt_flags, since vfs_mountroot_try() only passes it as a flag
  and nothing translates the flag to the "ro" option string.  msdosfs
  only looked for it in the string, so it gave a rw mount for root
  mounts without even clearing the flag in mp->mnt_flags, so the final
  state was inconsistent.  Checking the flag only in mp->mnt_flags
  works for initial userland mounts too.  The MNT_UPDATE case is
  messier.

The main point that should work but doesn't is fsck of msdosfs root
while it is mounted ro.  This needs mainly MNT_RELOAD support to work.
It should be possible to run fsck -p and succeed provided the fs is
consistent, not just for msdosfs, but this fails because fsck -p always
tries to open the device rw.  The hack that allows open for writing
in ffs is not implemented in msdosfs, since without MNT_RELOAD support
writing could only be harmful.  So fsck must be turned off to use
msdosfs as root.  This is quite dangerous, since msdosfs is still missing
actually using its fs-dirty flag internally, so it is happy to mount
dirty fileystems rw.

Unrelated changes:
- Fix missing error handling for MNT_UPDATE from rw to ro.
- Catch up with renaming msdos to msdosfs in a string.

Approved by:	re (kensmith)
2007-07-23 07:10:17 +00:00
Xin LI
7280082944 MFp4: When swapping is not enabled, allow creating files by taking
physical memory pages into account for tm_maxfilesize.

Reported by:	Dominique Goncalves <dominique.goncalves gmail.com>
Submitted by:	Howard Su
Approved by:	re (tmpfs blanket)
2007-07-23 06:54:58 +00:00
Bruce Evans
6b6c5f5ef9 Implement vfs clustering for msdosfs.
This gives a very large speedup for small block sizes (in my tests,
about 5 times for write and 3 times for read with a block size of 512,
if clustering is possible) and a moderate speedup for the moderatatly
large block sizes that should be used on non-small media (4K is the
best size in most cases, and the speedup for that is about 1.3 times
for write and 1.2 times for read).  mmap() should benefit from clustering
like read()/write(), but the current implementation of vm only supports
clustering (at least for getpages) if the fs block size is >= PAGE SIZE.

msdosfs is now only slightly slower than ffs with soft updates for
writing and slightly faster for reading when both use their best block
sizes.  Writing is slower for msdosfs because of more sync writes.
Reading is faster for msdosfs because indirect blocks interfere with
clustering in ffs.

The changes in msdosfs_read() and msdosfs_write() are simpler merges
of corresponding code in ffs (after fixing some style bugs in ffs).
msdosfs_bmap() needs fs-specific code.  This implementation loops
calling a lower level bmap function to do the hard parts.  This is a
bit inefficient, but is efficient enough since msdsfs_bmap() is only
called when there is physical i/o to do.

Approved by:	re (hrs)
2007-07-20 17:06:57 +00:00
Bruce Evans
d34b0a1bac Clean up before implementing vfs clustering for msdosfs:
In msdosfs_read(), mainly reorder the main loop to the same order as in
ffs_read().

In msdosfs_write() and extendfile(), use vfs_bio_clrbuf() instead of
clrbuf().  I think this just just a bogus optimization, but ffs always
does it and msdosfs already did it in one place, and it is what I've
tested.

In msdosfs_write(), merge good bits from a comment in ffs_write(), and
fix 1 style bug.

In the main comment for msdosfs_pcbmap(), improve wording and catch
up with 13 years of changes in the function.  This comment belongs in
VOP_BMAP.9 but that doesn't exist.

In msdosfs_bmap(), return EFBIG if the requested cluster number is out
of bounds instead of blindly truncating it, and fix many style bugs.

Approved by:	re (hrs)
2007-07-20 16:21:47 +00:00
Robert Watson
825eaf3470 Make sure we release the control vnode in Coda:
We allocate coda_ctlvp when /coda is mounted, but never release it.
During the unmount this vnode was marked as UNMOUNTING and when venus
is started a second time the system would hang, possibly waiting for
the old vnode to disappear.

So now we call vrele on the control vnode when file system is unmounted
to drop the reference we got during the mount. I'm pretty sure it is
also necessary to not skip the handling in coda_inactive for the control
vnode, it seems like that is the place we actually get rid of the vnode
once the refcount has dropped to 0.

Submitted by:	Jan Harkes <jaharkes at cs dot cmu dot edu>
Approved by:	re (kensmith)
2007-07-20 11:14:51 +00:00
Xin LI
c5be778305 MFp4: Rework on tmpfs's mapped read/write procedures. This
should finally fix fsx test case.

The printf's added here would be eventually turned into
assertions.

Submitted by:	Mingyan Guo (mostly)
Approved by:	re (tmpfs blanket)
2007-07-19 03:34:50 +00:00
Robert Watson
00f05dc847 Complete repo-copy and move of Coda from src/sys/coda to src/sys/fs/coda
by removing files from src/sys/coda, and updating include paths in the
new location, kernel configuration, and  Makefiles.  In one case add
$FreeBSD$.

Discussed with:		anderson, Jan Harkes <jaharkes@cs.cmu.edu>
Approved by:		re (kensmith)
Repo-copy madness:	simon
2007-07-12 21:04:58 +00:00
Robert Watson
d21e51d059 Forced commit to recognize repo-copy of Coda files from src/sys/coda to
src/sys/fs/coda.

Discussed with:         anderson, Jan Harkes <jaharkes@cs.cmu.edu>
Approved by:            re (kensmith)
Repo-copy madness:      simon
2007-07-12 20:40:38 +00:00
Bruce Evans
93fe42b62f Round up the FAT block size to a multiple of the sector size so that i/o
to the FAT is possible.

Make the FAT block size less arbitrary before it is rounded up:
- for FAT12, default to 3*512 instead of to 3 sectors.  The magic 3 is
  the default number of 512-byte FAT sectors on a floppy drive.  That
  many sectors is too many if the sector size is larger.
- for !FAT12, default to PAGE_SIZE instead of to 4096.  Remove
  MSDOSFS_DFLTBSIZE since it only obfuscated this 4096.

For reading the BPB, use a block size of 8192 instead of 2048 so that
sector sizes up to 8192 can work.  We should try several sizes, or just
try the maximum supported size (MAXBSIZE = 64K).  I use 8192 because
that is enough for DVD-RW's (even 2048 is enough) and 8192 has been
tested a lot in use by ffs.

This completes fixing msdosfs for some large sector sizes (up to 8K
for read and 64K for write).  Microsoft documents support for sector
sizes up to 4K in mdosfs.  ffs is currently limited to 8K for both
read and write.

Approved by:	re (kensmith)
Approved by:	nyan (several years ago)
2007-07-12 17:17:47 +00:00
Bruce Evans
fd7c4230b2 Fix some bugs involving the fsinfo block (many remain unfixed). This is
part of fixing msdosfs for large sector sizes.  One of the fixed bugs
was fatal for large sector sizes.

1. The fsinfo block has size 512, but it was misunderstood and declared
   as having size 1024, with nothing in the second 512 bytes except a
   signature at the end.  The second 512 bytes actually normally (if
   the file system was created by Windows) consist of a second boot
   sector which is normally (in WinXP) empty except for a signature --
   the normal layout is one boot sector, one fsinfo sector, another
   boot sector, then these 3 sectors duplicated.  However, other
   layouts are valid.  newfs_msdos produces a valid layout with one
   boot sector, one fsinfo sector, then these 2 sectors duplicated.
   The signature check for the extra part of the fsinfo was thus
   normally checking the signature in either the second boot sector
   or the first boot sector in the copy, and thus accidentally
   succeeding.  The extra signature check would just fail for weirder
   layouts with 512-byte sectors, and for normal layouts with any other
   sector size.

   Remove the extra bytes and the extra signature check.

2. Old versions did i/o to the fsinfo block using size 1024, with the
   second half only used for the extra signature check on read.  This
   was harmless for sector size 512, and worked accidentally for sector
   size 1024.  The i/o just failed for larger sector sizes.

   The version being fixed did i/o to the fsinfo block using size
   fsi_size(pmp) = (1024 << ((pmp)->pm_BlkPerSec >> 2)).  This
   expression makes no sense.  It happens to work for sector small
   sector sizes, but for sector size 32K it gives the preposterous
   value of 64M and thus causes panics.  A sector size of 32768 is
   necessary for at least some DVD-RW's (where the minimum write size
   is 32768 although the minimum read size is 2048).

   Now that the size of the fsinfo block is 512, it always fits in
   one sector so there is no need for a macro to express it.  Just
   use the sector size where the old code uses 1024.

Approved by:	re (kensmith)
Approved by:	nyan (several years ago for a different version of (2))
2007-07-12 16:09:07 +00:00