Commit Graph

783 Commits

Author SHA1 Message Date
Stanislav Sedov
52d9a478fe - Style(9) improvements.
- Convert all K&R definitions to ANSI equialents.
- Retire bsd_malloc and bsd_free macros and
  use malloc/free directly.
- Drop some unused debugging calls.

This commit brings no functional changes.
2009-06-03 14:18:37 +00:00
Stanislav Sedov
22db14d54b - Sync our copies of ext2fs Linux headers to current Linux versions.
Minimize differencies between our ext2fs headers and relevant Linux
  versions by using EXT2_SB macro to access the superblock fields. Most
  of the differencies in access to these fields are now hidden inside
  this macro.
- Rename the s_db_per_group field of ext2fs_sb_info to s_gdb_count
  to reflect the similar change in Linux headers. New name also seem
  to be more appropriate for this field.
- Use proper types for s_first_inode and s_inode_size in-core superblock
  fields. Now they reflec types used in the on-disk superblock version.
- Add support for older filesystem revisions that doesn't have proper
  s_first_ino and s_inode_size fields in the on-disk superblock. In these
  cases predefined values for these fields are used.
- Add simple sanity checks for s_first_inode and s_inode_size correctness.

Reviewed by:	bde (previous version)
MFC after:	2 weeks
2009-06-03 13:25:50 +00:00
Alexander Kabaev
0d6b4bcc58 Remove empty files and do nto try to build them.
Apparently, they are problematic for CTF users.

PR:	119298
Submitted by: Julian H. Stacey
2009-05-18 17:20:24 +00:00
Attilio Rao
120b18d86f FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now).  Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by:	jhb, kmacy
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2009-05-14 17:43:00 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
Konstantin Belousov
c1d8b5e82c Fix two issues with bufdaemon, often causing the processes to hang in
the "nbufkv" sleep.

First, ffs background cg group block write requests a new buffer for
the shadow copy. When ffs_bufwrite() is called from the bufdaemon due
to buffers shortage, requesting the buffer deadlock bufdaemon.
Introduce a new flag for getnewbuf(), GB_NOWAIT_BD, to request getblk
to not block while allocating the buffer, and return failure
instead. Add a flag argument to the geteblk to allow to pass the flags
to getblk(). Do not repeat the getnewbuf() call from geteblk if buffer
allocation failed and either GB_NOWAIT_BD is specified, or geteblk()
is called from bufdaemon (or its helper, see below). In
ffs_bufwrite(), fall back to synchronous cg block write if shadow
block allocation failed.

Since r107847, buffer write assumes that vnode owning the buffer is
locked. The second problem is that buffer cache may accumulate many
buffers belonging to limited number of vnodes. With such workload,
quite often threads that own the mentioned vnodes locks are trying to
read another block from the vnodes, and, due to buffer cache
exhaustion, are asking bufdaemon for help. Bufdaemon is unable to make
any substantial progress because the vnodes are locked.

Allow the threads owning vnode locks to help the bufdaemon by doing
the flush pass over the buffer cache before getnewbuf() is going to
uninterruptible sleep. Move the flushing code from buf_daemon() to new
helper function buf_do_flush(), that is called from getnewbuf().  The
number of buffers flushed by single call to buf_do_flush() from
getnewbuf() is limited by new sysctl vfs.flushbufqtarget.  Prevent
recursive calls to buf_do_flush() by marking the bufdaemon and threads
that temporarily help bufdaemon by TDP_BUFNEED flag.

In collaboration with:	pho
Reviewed by:	 tegge (previous version)
Tested by:	 glebius, yandex ...
MFC after:	 3 weeks
2009-03-16 15:39:46 +00:00
David Schultz
bb2a335b35 Don't declare bin_search() as an inline function, since there's no
inline definition of it.
2009-03-08 06:14:33 +00:00
Ed Schouten
802cb57e34 Add memmove() to the kernel, making the kernel compile with Clang.
When copying big structures, LLVM generates calls to memmove(), because
it may not be able to figure out whether structures overlap. This caused
linker errors to occur. memmove() is now implemented using bcopy().
Ideally it would be the other way around, but that can be solved in the
future. On ARM we don't do add anything, because it already has
memmove().

Discussed on:	arch@
Reviewed by:	rdivacky
2009-02-28 16:21:25 +00:00
Stanislav Sedov
b51ba332bb - Eliminate warnings in debug print macros by explicitly converting all
field to unsigned long.
2009-01-18 15:10:46 +00:00
Stanislav Sedov
e6b6691abb - Whitespace fixes.
- s_bmask field doesn't exist.
- Use correct flags in debug printf.
2009-01-18 14:54:46 +00:00
Stanislav Sedov
291156ecf1 - Obtain inode sizes and location of the first inode based on the contents
of superblock rather than using hardcoded values. This fixes ext2fs on
  filesystems with inode sized other than 128.

Submitted by:	Alex Lyashkov <Alexey.Lyashkov@Sun.COM> (based on)
MFC after:	2 weeks
2009-01-18 14:04:56 +00:00
Konstantin Belousov
178e776169 Do not incorrectly add the low 5 bits of the offset to the resulting
position of the found zero bit.

Submitted by:	Jaakko Heinonen <jh saunalahti fi>
MFC after:	2 weeks
2009-01-04 15:56:49 +00:00
Edward Tomasz Napierala
0da50f6ef8 According to phk@, VOP_STRATEGY should never, _ever_, return
anything other than 0.  Make it so.  This fixes
"panic: VOP_STRATEGY failed bp=0xc320dd90 vp=0xc3b9f648",
encountered when writing to an orphaned filesystem.  Reason
for the panic was the following assert:
KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp));
at vfs_bio:bufstrategy().

Reviewed by:	scottl, phk
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2008-12-16 21:13:11 +00:00
Edward Tomasz Napierala
f02bd9eacc Adapt to accmode_t changes.
Approved by:	rwatson (mentor), kan
2008-11-14 09:58:16 +00:00
Attilio Rao
83b3bdbc8a Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
  mnt_lock and using instead a refcount in order to keep track of lock
  requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
  useless.
- Due to the change above, vfs_busy() is no more linked to a lockmgr.
  Change so its KPI by removing the interlock argument and defining 2 new
  flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
  old version (which was unlinked from the lockmgr alredy) and
  MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
  once the mnt interlock is held (ability still desired by most consumers).
- The stub used into vfs_mount_destroy(), that allows to override the
  mnt_ref if running for more than 3 seconds, make it totally useless.
  Remove it as it was thought to work into older versions.
  If a problem of "refcount held never going away" should appear, we will
  need to fix properly instead than trust on such hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
  leaving the MNTK_MWAIT flag on even if it the waiters were actually
  woken up. Just a place in vfs_mount_destroy() is left because it is
  going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with:	kib
Tested by:	pho
2008-11-02 10:15:42 +00:00
Edward Tomasz Napierala
15bc6b2bd8 Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by:	rwatson (mentor)
2008-10-28 13:44:11 +00:00
Konstantin Belousov
3a8540afa5 Garbage-collect ext2_kqfilter vop that is now a copy of vop_stdkqfilter(). 2008-10-28 12:15:11 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Attilio Rao
0d7935fd01 Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by:	kib
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-10-10 21:23:50 +00:00
Konstantin Belousov
caf8aec886 fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:50:52 +00:00
Konstantin Belousov
86dacdfe2b Initialize va_flags and va_filerev properly in VOP_GETATTR(). Don't
initialize va_vaflags and va_spare because they are not part of the
VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Reviewed by:	bde
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:46:45 +00:00
Konstantin Belousov
bdb8094763 Garbage-collect vn_write_suspend_wait().
Suggested and reviewed by:	tegge
Tested by:	pho
MFC after:	1 month
2008-09-16 11:09:26 +00:00
Sam Leffler
39297ba455 Make ddb command registration dynamic so modules can extend
the command set (only so long as the module is present):
o add db_command_register and db_command_unregister to add and remove
  commands, respectively
o replace linker sets with SYSINIT's (and SYSUINIT's) that register
  commands
o expose 3 list heads: db_cmd_table, db_show_table, and db_show_all_table
  for registering top-level commands, show operands, and show all operands,
  respectively

While here also:
o sort command lists
o add DB_ALIAS, DB_SHOW_ALIAS, and DB_SHOW_ALL_ALIAS to add aliases
  for existing commands
o add "show all trace" as an alias for "show alltrace"
o add "show all locks" as an alias for "show alllocks"

Submitted by:	Guillaume Ballet <gballet@gmail.com> (original version)
Reviewed by:	jhb
MFC after:	1 month
2008-09-15 22:45:14 +00:00
Edward Tomasz Napierala
dfa7fd1d70 Remove VSVTX, VSGID and VSUID. This should be a no-op,
as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID.

Approved by:	rwatson (mentor)
2008-09-10 13:16:41 +00:00
Attilio Rao
59d4932531 Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.
Manpages are updated accordingly.

Tested by:	Diego Sardina <siarodx at gmail dot com>
2008-08-31 14:26:08 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
Attilio Rao
09400d5abe - Disallow XFS mounting in write mode. The write support never worked really
and there is no need to maintain it.
- Fix vn_get() in order to let it call vget(9) with a valid locking
  request.  vget(9) returns the vnode locked in order to prevent recycling,
  but in this case internal XFS locks alredy prevent it from happening, so
  it is safe to drop the vnode lock before to return by vn_get().
- Add a VNASSERT() in vget(9) in order to catch malformed locking requests.

Discussed with:	kan, kib
Tested by:	Lothar Braun <lothar at lobraun dot de>
2008-07-21 23:01:09 +00:00
Konstantin Belousov
eab626f110 Move the head of byte-level advisory lock list from the
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.

Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
with the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.

The implementation of the lf_purgelocks() is submitted by dfr.

Reported by:	kris
Tested by:	kris, pho
Discussed with:	jeff, dfr
MFC after:	2 weeks
2008-04-16 11:33:32 +00:00
John Baldwin
d952ba1bd5 Fix a nit with the 'nofoo' options where 'foo' is mapped to 'nonofoo'
(such as 'atime' vs 'noatime').  The filesystems will always see either
'nofoo' or 'nonofoo', never plain 'foo'.  As such, their list of valid
mount options should include 'nofoo' instead of 'foo'.  With this fix,
you can do 'mount -u -o atime' on a FFS filesystem that isn't marked as
noatime without getting an error.  You can also update a noatime FFS
filesystem mounted via mount(2) (e.g. 6.x /sbin/mount binary) to 'atime'
using nmount(2) (e.g. 7.x /sbin/mount binary).

MFC after:	1 week
Reviewed by:	crodig
2008-03-26 20:48:07 +00:00
Attilio Rao
628f51d275 Introduce some functions in the vnode locks namespace and in the ffs
namespace in order to handle lockmgr fields in a controlled way instead
than spreading all around bogus stubs:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode

In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock

Side note: union_subr.c::unionfs_node_update() is the only other function
directly handling lockmgr fields. As this is not simple to fix, it has
been left behind as "sole" exception.
2008-02-24 16:38:58 +00:00
Attilio Rao
84887fa362 - Add real assertions to lockmgr locking primitives.
A couple of notes for this:
  * WITNESS support, when enabled, is only used for shared locks in order
    to avoid problems with the "disowned" locks
  * KA_HELD and KA_UNHELD only exists in the lockmgr namespace in order
    to assert for a generic thread (not curthread) owning or not the
    lock.  Really, this kind of check is bogus but it seems very
    widespread in the consumers code.  So, for the moment, we cater this
    untrusted behaviour, until the consumers are not fixed and the
    options could be removed (hopefully during 8.0-CURRENT lifecycle)
  * Implementing KA_HELD and KA_UNHELD (not surported natively by
    WITNESS) made necessary the introduction of LA_MASKASSERT which
    specifies the range for default lock assertion flags
  * About other aspects, lockmgr_assert() follows exactly what other
    locking primitives offer about this operation.

- Build real assertions for buffer cache locks on the top of
  lockmgr_assert().  They can be used with the BUF_ASSERT_*(bp)
  paradigm.

- Add checks at lock destruction time and use a cookie for verifying
  lock integrity at any operation.

- Redefine BUF_LOCKFREE() in order to not use a direct assert but
  let it rely on the aforementioned destruction time check.

KPI results evidently broken, so __FreeBSD_version bumping and
manpage update result necessary and will be committed soon.

Side note: lockmgr_assert() will be used soon in order to implement
real assertions in the vnode namespace replacing the legacy and still
bogus "VOP_ISLOCKED()" way.

Tested by:      kris (earlier version)
Reviewed by:    jhb
2008-02-13 20:44:19 +00:00
Attilio Rao
0e9eb108f0 Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
  always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
  Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
  present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
  curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo
2008-01-24 12:34:30 +00:00
Attilio Rao
d638e093d6 - Introduce the function lockmgr_recursed() which returns true if the
lockmgr lkp, when held in exclusive mode, is recursed
- Introduce the function BUF_RECURSED() which does the same for bufobj
  locks based on the top of lockmgr_recursed()
- Introduce the function BUF_ISLOCKED() which works like the counterpart
  VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj

BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus
BUF_REFCNT() in a more explicative and SMP-compliant way.
This allows us to axe out BUF_REFCNT() and leaving the function
lockcount() totally unused in our stock kernel. Further commits will
axe lockcount() as well as part of lockmgr() cleanup.

KPI results, obviously, broken so further commits will update manpages
and freebsd version.

Tested by: kris (on UFS and NFS)
2008-01-19 17:36:23 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Craig Rodrigues
d2169cb67d Remove duplicate "union" from ext2_opts.
Noticed by:	bde
2007-10-27 16:14:33 +00:00
Alfred Perlstein
77465d9390 Get rid of qaddr_t.
Requested by: bde
2007-10-16 10:54:55 +00:00
Olivier Houchard
7dd9c45f26 Some times ago, vfs_getopts() was changed, so that it would set error to
ENOENT if the option wasn't provided, instead of setting it to 0.
xfs however didn't catch up on this, so it assumed something went bad if
vfs_getopts() sets the error to non-zero, and just returns the error.
Unbreak xfs mount by just ignoring the error if vfs_getopts() sets the
error to ENOENT, as we should have sane defaults.

Reviewed by:    kan
Approved by:    re (rwatson)
Tested by:      rpaulo
2007-08-20 15:33:22 +00:00
John Baldwin
1dc5b1cc56 On 6.x this works:
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by:	jhb
Reported by:	Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by:	re (bmah)
Proxy commit for:	rodrigc
2007-08-15 17:40:09 +00:00
Craig Rodrigues
d678780e60 The last entry in the ext2_opts array must be NULL,
otherwise the kernel with crash in vfs_filteropt() if an invalid
mount option is passed to ext2fs.

Approved by:	re (kensmith)
2007-07-14 21:18:19 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Matt Jacob
4f9822d264 Remove 'inline' qualifiers from functions which are not, in fact, inlines. 2007-06-10 04:54:42 +00:00
Konstantin Belousov
7a31868ed0 Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file:
part 2. Convert calls missed in the first big commit.

Noted by:	rwatson
Pointy hat to:	kib
2007-06-01 14:33:11 +00:00
Jeff Roberson
1c4bcd050a - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
Alexander Kabaev
4e5001c263 Bow to incomplete GCC 4. constant propagation optimizations and
initialize some of the local variables GCC claims are being used
uninitialized.
2007-05-30 03:03:06 +00:00
Craig Rodrigues
3b1b4d767f Change #include <machine/pcpu.h> to #include <sys/pcpu.h>
to get definition of curthread, required by <sys/sx.h>.
2007-04-01 12:48:10 +00:00
John Baldwin
4e7f640dfb Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks.  The algorithms for
manipulating the lock cookie are very similar to that rwlocks.  This patch
also adds support for exclusive locks using the same algorithm as mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given locks behavior.  The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option.  Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels.  As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.

MFC after:	1 month
Submitted by:	attilio
Tested by:	kris, pjd
2007-03-31 23:23:42 +00:00
Craig Rodrigues
cd8c2bbe40 Add "force" to ext2_ops, to match what was in the old mount_ext2fs binary.
Reported by:	Ivan Voras <ivoras fer hr>
2007-03-15 00:09:50 +00:00
Tor Egge
61b9d89ff0 Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
Pawel Jakub Dawidek
bb531912ff Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better
describe the privilege.

OK'ed by:	rwatson
2007-03-01 20:47:42 +00:00
Pawel Jakub Dawidek
3b2eb461e0 Avoid checking for privileges if there is no need to.
Discussed with:	rwatson
2007-03-01 20:38:24 +00:00
Pawel Jakub Dawidek
10bcafe9ab Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00
Konstantin Belousov
2cc7d26f7f Cylinder group bitmaps and blocks containing inode for a snapshot
file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then kernel would panic due to recursive buffer lock
acquision.

Avoid dealing with buffers in bdwrite() that are from other side of
snaplock divisor in the lock order then the buffer being written. Add
new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in
the bdwrite(). Default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, specialized implementation is
used.

Reviewed by:	tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by:	Peter Holm
X-MFC after:	3 weeks (if ever: it changes ABI)
2007-01-23 10:01:19 +00:00
Craig Rodrigues
ba8e255297 Previously, the mount_ext2fs binary listed the acceptable mount
options for ext2fs.  Now that we use nmount() directly from the mount
binary to access ext2fs filesystems, add the list of acceptable mount
options to ext2_ops, so that vfs_filteropts() will accept
options like "noatime" for ext2fs.

PR:		105483
Noticed by:	Dr. Markus Waldeck <waldeck gmx de>
MFC after:	1 month
2006-11-18 18:22:11 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Tor Egge
a1e363f256 Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC.  Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.
2006-09-26 04:15:59 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Alexander Leidinger
d056fa046c Add snd_emu10kx driver for Creative SoundBlaster Live! and Audigy series
sound cards with optional pseudo-multichannel playback.

It's based on snd_emu10k1 sound driver. Single channel version is available
from audio/emu10kx port since some time.

The two new ALSA header files (GPLed), which contain Audigy 2 ("p16v") and
Audigy 2 Value ("p17v") specific interfaces, are latest versions from ALSA
Mercurial repository.

This is not connected to the build yet.

Submitted by:	Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
2006-07-15 19:36:28 +00:00
Alexander Leidinger
53f4fb1109 - Update ALSA emu10k1.h (it was imported as emu10k1-alsa.h) header file to
latest version from Mercurial repository. It brings definition of some
  additional Audigy 2 / Audigy 2 Value registers.
- Use new #defines from ALSA emu10k1.h
- Remove unused include files:
  + emu10k1-ac97.h was imported from ALSA and never used,
  + emu10k1.h was imported from Creative Linux emu10k1 driver, but only
    AUDIGY_CODEBASE was used from it.

Submitted by:	Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
2006-07-15 19:19:54 +00:00
Craig Rodrigues
277815d5c2 Implement vnode operations for setting and removing extended attributes. 2006-06-11 03:32:50 +00:00
Craig Rodrigues
c45dc21431 Restore routines for getting and listing extended attributes which
were lost in the last merge.
2006-06-11 00:55:32 +00:00
Craig Rodrigues
6c0a366182 Restore changes to spinlock macros before merge. 2006-06-11 00:54:35 +00:00
Craig Rodrigues
79fde20246 Remove debugging printf 2006-06-11 00:19:00 +00:00
Craig Rodrigues
1aba098ee0 Temporarily disable log recovery until we fix panics. 2006-06-10 23:40:58 +00:00
Craig Rodrigues
a41448133d Logical OR the following flags into the va_mode field:
S_IFDIR when making a directory
S_IFLNK when making a symbolic link
S_IFIFO when making a pipe

xfs_ialloc() checks this field for these flags when figuring
out whether to make a directory, make a symbolic link or make a pipe.
2006-06-10 23:10:55 +00:00
Craig Rodrigues
a65032a3ae Call g_vfs_close() if:
(1)  _xfs_mount() fails
(2)  at the end of _xfs_unmount()
2006-06-10 19:04:21 +00:00
Craig Rodrigues
490f186b62 Do not call vput() after we call VOP_UNLOCK(). 2006-06-10 19:02:13 +00:00
Craig Rodrigues
1b5ea8f89c Change %llx to %jx in printf() to eliminate warnings on 64-bit platforms. 2006-06-09 06:57:00 +00:00
Craig Rodrigues
20e82979ab Bring back changes in version 1.3 lost in previous commit. 2006-06-09 06:40:22 +00:00
Craig Rodrigues
94ba00f925 More changes due to latest XFS import.
Work done by:	Russell Cattelan <cattelan at xfs dot org>
2006-06-09 06:07:42 +00:00
Craig Rodrigues
331b6cc0ea Sync XFS for FreeBSD tree with newer changes from SGI XFS for Linux tree.
Improve support for writing to XFS partitions.

Work done by:	Russell Cattelan <cattelan at xfs dot org>
2006-06-09 06:04:06 +00:00
Craig Rodrigues
392cb4c78c Include "xfs_macros.h" to fix tinderbox build breakage. 2006-06-01 20:51:59 +00:00
Warner Losh
088c5ab556 Cope with -Wundef. This means including xfs_macros.h early in a few more
files and changing #if XXXKAN -> #ifdef XXXKAN.

# this is just compile tested, since I don't have xfs partitions.
2006-06-01 19:01:47 +00:00
Craig Rodrigues
4df2902f3b Add support for "export" option, to allow NFS exporting
of XFS filesystems.
2006-05-26 13:01:53 +00:00
Craig Rodrigues
4ca073e86c Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems.  Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.

Approved by:	dumbbell
2006-05-26 11:58:30 +00:00
Craig Rodrigues
5eb304a91a Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems.  Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.
2006-05-26 00:32:21 +00:00
Giorgos Keramidas
ac7050c114 Check for VFS_STATFS() failure in _xfs_mount() and abort the mount
on errors.

Found by:	Coverity Prevent
Approved by:	rodrigc, Russell Cattelan
MFC after:	4 weeks
2006-05-05 18:41:56 +00:00
Martin Cracauer
447be3f1e4 Repair ext2fs writes.
Strong candidate for backport to 6.x.

When allocating new blocks, the search for block group beginnings
would fail with a segfault.  There was a side-effect read access with
an off-by-one errors.  The results were not used in the error case so
the code worked in the past.  But now the FreeBSD kernel has tighter
mappings and the word accessed is not mapped (for me).

The Linux kernel has rewritten most of the allocation strategy by now.
Also, the Linux kernel cleaned up the integration of these files and
it look feasable to wrap the original Linux files in wrapper that
provides their favorite arguments instead of dragging around our own
code.
2006-04-13 19:37:32 +00:00
John Baldwin
8abafcd00f Update a DB_SET to DB_FUNC I missed yesterday. 2006-03-08 15:47:48 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Jean-Sébastien Pédron
566abf420d Don't hold a reference to the disk vnode for each inode. 2006-01-05 19:27:07 +00:00
Martin Cracauer
0265cf8906 This is the style-fix for my previous commit. Sorry for the delay, I
forgot about it.
2005-12-29 21:34:49 +00:00
Dag-Erling Smørgrav
0430a5e289 Eradicate caddr_t from the VFS API. 2005-12-14 00:49:52 +00:00
Craig Rodrigues
63ec04adea Hide DDB-specific functions inside check for #ifdef DDB.
Noticed by:	des
2005-12-13 22:42:02 +00:00
Craig Rodrigues
e497db39e8 Inherit system-wide BLKDEV_IOSIZE definition.
Submitted by:	kan
2005-12-13 02:32:30 +00:00
Craig Rodrigues
15fd62fa0f #define __user to nothing 2005-12-12 03:21:37 +00:00
Craig Rodrigues
93d9c69ff4 Initial import of read-only support for SGI's XFS filesystem.
Contributed by:		XFS for FreeBSD project
2005-12-12 01:04:32 +00:00
Ruslan Ermilov
342ed5d948 Fix -Wundef warnings found when compiling i386 LINT, GENERIC and
custom kernels.
2005-12-05 11:58:35 +00:00
Ruslan Ermilov
3238c6bd33 Fix -Wundef from compiling the amd64 LINT. 2005-12-04 10:06:06 +00:00
Ruslan Ermilov
b2be20597c Oops, the bug is still here, but reimplement the cpp(1) conditional properly. 2005-12-04 09:57:09 +00:00
Ruslan Ermilov
44dbe53f04 There no longer seems to be this bug in gcc(1). Remove the
badly implemented workaround that caused a workaround to be
applied to all architectures, not only amd64.
2005-12-04 09:47:20 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Martin Cracauer
12c8305335 Fix this:
kern/87959	cracauer	ext2fs: no cp(1) possible, mmap returns EINVAL

ext2fs was missing vnode_create_vobject.

(Reisefs probably has the same problem but I want to get this in quick
for 6-release)
2005-10-28 18:39:00 +00:00
Jean-Sébastien Pédron
9d575322b0 Apply the same fix to a potential race in the ISDOTDOT code
in reiserfs_lookup() that was used to fix an actual race in
ufs_lookup.c:1.78. This is not currently a hazard, but the
bug would be activated by marking reiserfs as MPSAFE.

Reviewed by:	mux (mentor)
MFC after:	2 weeks
2005-10-21 09:15:26 +00:00
Don Lewis
9e4ce0ae8f Apply the same fix to a potential race in the ISDOTDOT code in
ext2_lookup() that was used to fix an actual race in ufs_lookup.c:1.78.
This is not currently a hazard, but the bug would be activated by
marking ext2fs as MPSAFE.

Requested by:	bde
MFC after:	2 weeks
2005-10-16 21:39:29 +00:00
Robert Watson
5f419982c2 Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by:	bde
2005-09-28 07:03:03 +00:00
Robert Watson
84d2b7df26 Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions.  In most cases, this means adding an
acquisition of Giant immediately around the function.  In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset.  This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after:	1 week
2005-09-19 16:51:43 +00:00
Craig Rodrigues
bdb7d194d0 In ext2_mountfs(), check that the superblock size, SBSIZE,
is aligned with the sectorsize value returned by GEOM, before
doing a bread() of the superblock.
This eliminates a panic when trying the following on an empty CD-ROM drive:
mount_ext2fs /dev/acd0 /mnt

Reviewed by:	phk
2005-09-10 21:30:49 +00:00
Don Lewis
d07f87a218 Add a new struct buf flag bit, B_PERSISTENT, and use it to tag
struct bufs that are persistently held by ext2fs.  Ignore any buffers
with this flag in the code in boot() that counts "busy" and dirty
buffers and attempts to sync the dirty buffers, which is done before
attempting to unmount all the file systems during shutdown.

This fixes the problem caused by any ext2fs file systems that are
mounted at system shutdown time, which caused boot() to give up on
a non-zero number of buffers and skip the call to vfs_unmountall().
This left all the mounted file systems in a dirty state and caused
them to all require cleanup by fsck on reboot.

Move the two separate copies of the "busy" buffer test in boot()
to a separate function.

Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro.

Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with
this and previous flag changes.

PR:		kern/56675, kern/85163
Tested by:	"Matthias Andree" matthias.andree at gmx.de
Reviewed by:	bde
MFC after:	3 days
2005-09-08 06:30:05 +00:00
Suleiman Souhlal
68da388325 Unbreak hpfs/ntfs/udf/ext2fs/reiserfs mounting.
Another pointyhat to:	ssouhlal
2005-09-03 20:23:41 +00:00