966 Commits

Author SHA1 Message Date
jeff
46f09d5bc3 - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
 - Reflect these changes in the proc.h documentation and consumers throughout
   the kernel.  This is a substantial reduction in locking cost for these
   fields and was made possible by recent changes to threading support.
2008-03-19 06:19:01 +00:00
rwatson
877d7c65ba In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
cokane
1dbd762cae Replace the non-MPSAFE timeout(9) API in ffs_softdep.c with the MPSAFE
callout_* API (e.g. callout_init_mtx(9)). This was one of the numerous
items on the http://wiki.freebsd.org/SMPTODO list.

Reviewed by:	imp, obrien, jhb
MFC after:	1 week
2008-03-13 20:15:48 +00:00
emaste
747c71e58b Remove include of opt_quota.h; as of revision 1.205 there is no longer
any #ifdef QUOTA conditional code.
2008-03-10 18:44:07 +00:00
kib
e378ea9934 Initialize mnt_stat.f_iosize before autostarting UFS1 extattrs.
It is normally initialized by ffs_statfs() after ffs_mount finished.

The extattr autostart code calls the ufs_lookup(), that uses value above
to iterate over the directory blocks, see bmask initialization in the
ufs_lookup() and ufsdirhash. Having the filesystem with root directory
spanning more then one block would result in reading a random kernel
memory.

PR:	kern/120781
Test case provided by:	rwatson
MFC after:	1 week
2008-03-05 16:34:03 +00:00
rwatson
96767d9090 Move setting of MNTK_MPSAFE flag before UFS1 extended attribute
auto-start so that the flag is set before we start performing I/O
in the auto-start routine.

MFC after:	2 weeks
Suggested by:	kib
2008-03-04 12:10:03 +00:00
keramida
41eb7744b1 Minor typo nit. 2008-02-25 19:31:44 +00:00
attilio
4014b55830 Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by:	Andrea Barberio <insomniac at slackware dot it>
2008-02-25 18:45:57 +00:00
attilio
0d54671a48 Introduce some functions in the vnode locks namespace and in the ffs
namespace in order to handle lockmgr fields in a controlled way instead
than spreading all around bogus stubs:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode

In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock

Side note: union_subr.c::unionfs_node_update() is the only other function
directly handling lockmgr fields. As this is not simple to fix, it has
been left behind as "sole" exception.
2008-02-24 16:38:58 +00:00
attilio
265cb5fb91 - Introduce lockmgr_args() in the lockmgr space. This function performs
the same operation of lockmgr() but accepting a custom wmesg, prio and
  timo for the particular lock instance, overriding default values
  lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo.
- Use lockmgr_args() in order to implement BUF_TIMELOCK()
- Cleanup BUF_LOCK()
- Remove LK_INTERNAL as it is nomore used in the lockmgr namespace

Tested by:	Andrea Barberio <insomniac at slackware dot it>
2008-02-15 21:04:36 +00:00
attilio
7213f4c32b Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
  always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
  Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
  present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
  curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo
2008-01-24 12:34:30 +00:00
attilio
caa2ca048b - Introduce the function lockmgr_recursed() which returns true if the
lockmgr lkp, when held in exclusive mode, is recursed
- Introduce the function BUF_RECURSED() which does the same for bufobj
  locks based on the top of lockmgr_recursed()
- Introduce the function BUF_ISLOCKED() which works like the counterpart
  VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj

BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus
BUF_REFCNT() in a more explicative and SMP-compliant way.
This allows us to axe out BUF_REFCNT() and leaving the function
lockcount() totally unused in our stock kernel. Further commits will
axe lockcount() as well as part of lockmgr() cleanup.

KPI results, obviously, broken so further commits will update manpages
and freebsd version.

Tested by: kris (on UFS and NFS)
2008-01-19 17:36:23 +00:00
attilio
71b7824213 VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
attilio
18d0a0dd51 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
kib
149cd5b092 ffs_balloc_ufsX() routines, in the case of recovering from the failed
allocation, free the indirect blocks before clearing the disk pointers,
that could lead to the softupdate inconsistencies in the case of the
machine or disk crash at the wrong time.

Rearrange the recover code to do the ffs_blkfree() after the second
ffs_syncvnode(), that clears the pointers chain.

Proposed and reviewed by:	tegge
Tested by:	Peter Holm
MFC after:	3 weeks
2008-01-03 12:28:57 +00:00
obrien
51625bf288 style(9) 2008-01-02 01:19:17 +00:00
kib
c9fb56ffc8 The ffs_balloc() routines, whan allocating the indirect blocks for
the inode, do the rollback in case the allocation failed (due to
insufficient free space or quota limits). But, the code does leaves the
buffers corresponding to the inoirect blocks on the vnode bufobj list.
This causes several assertion failures (for instance, "ffs_truncate3"
in ffs_truncate()) to fail, and could result in the indirect block
aliasing problem, like writing the context of such blocks to random
disk location.

Remove the buffers from the bufobj properly.

Reported and tested by:	Peter Holm
Reviewed by:	tegge
MFC after:	3 weeks
2007-12-29 13:31:27 +00:00
kensmith
66cb6fd44f Fix a broken check that recently became more annoying because it now
gets enabled when INVARIANTS is on instead of DIAGNOSTIC (which apparently
nobody uses).  From Tor's description:

  This happens when the block range spans two block maps, the first in the
  inode (mapping up to NDADDR direct blocks) and the second being the first
  indirect block.  The current check assumes that both block maps are
  indirect blocks.

Work done by:	tegge
Tested by:	kris, kensmith
2007-12-01 13:12:43 +00:00
ru
1f9b2f54c8 Fix build without INVARIANTS and update a comment to match
a change made in previous revision.
2007-11-09 11:04:36 +00:00
obrien
1ae16b4e64 Turn most ffs 'DIAGNOSTIC's into INVARIANTS. 2007-11-08 17:21:51 +00:00
rwatson
60570a92bf Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
julian
51d643caa6 Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
alfred
3a60df401c Get rid of qaddr_t.
Requested by: bde
2007-10-16 10:54:55 +00:00
bz
621c3a5b99 Fix a DIV0 in case a large value for fs_avgfilesize or fs_avgfpdir
is given (with newfs or tunefs) and dirsize overflows.

In case dirsize is <= 0 because of an overflow set maxcontigdirs
to 0 so it will be 1 later. This is what would happen for large
fs_avgfilesize. [1]

Identified with help from:	roberto, pjd
Submitted by:			pjd [1]
Approved by:			re (rwatson)
MFC after:			8 days
2007-09-10 14:12:29 +00:00
rodrigc
85690ea861 Perform range check before allocating memory when reading
extended attributes.

Reviewed by:	kib
Approved by:	re (hrs)
PR:		114389
2007-07-13 18:51:08 +00:00
kib
2f486f25b6 Fix livelock that could occur when snapshoting UFS with quotas, where
some quota limit was exceeded. Sequence of UFS_VALLOC()/UFS_VFREE()
call there could cause inodeblock to have both freefile and inodedep
dependencies without any inode in the block being marked for write.
Then, softdep_check_suspend() would return EAGAIN forewer.

Force write of inodeblock with allocated freefile softdependency by
setting IN_MODIFIED flag in softdep_freefile and unconditionally calling
UFS_UPDATE() in ufs_reclaim.

Reported by:	kris
Debug help and tested by: 	Peter Holm
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-06-22 13:22:37 +00:00
rwatson
00b02345d4 Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
jeff
91d1501790 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
jeff
a7a8bac81f - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
kib
162fa8dc6d Since renaming of vop_lock to _vop_lock, pre- and post-condition
function calls are no more generated for vop_lock.
Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption
about vop naming conventions. This restores pre/post-condition calls.
2007-05-18 13:02:13 +00:00
thompsa
25f570bc48 Add a newline to the printf message. 2007-05-03 22:39:52 +00:00
kib
0f0ec54312 Fix the NAMEI zone leak when snapshot was successfully created.
Reported and tested by:	Peter Holm
MFC after:		2 weeks
2007-04-10 09:31:42 +00:00
kib
859db1e740 Recalculate the NEWBLOCK flag for pagedep structure after the softdep
lock is dropped, since pagedep may be already processed and deallocated.

Found and tested by:	kris
MFC after:		2 weeks
2007-04-10 09:30:41 +00:00
kib
61f6e27c62 When LK_NOWAIT is passed as argument to process_worklist_item(), this
does not prevent handle_workitem_remove() from recursing into a blocking
version. Add the dirrem to worklist instead of processing it now if this
is the case.

Reported and tested by:	kris
Submitted by:		tegge
MFC after:		2 weeks
2007-04-10 09:28:17 +00:00
delphij
5a4b35079b Use *_EMPTY macros when appropriate. 2007-04-04 07:29:53 +00:00
kib
951894c128 Revert rev. 1.205. Replace unconditional acquision of Giant when QUOTAS are
defined with VFS_LOCK_GIANT(NULL) call.
This shall fix softdep operation when mpsafe_vfs = 0.

Reported and tested by:	kris
Submitted by:	tegge
MFC after:	1 week
2007-03-29 08:26:04 +00:00
kib
7f02b9589e Mark UFS as being MP-Safe in "options QUOTA" case too. Remove no more
neccessary Giant acquisions in softdepend processing code.

Tested by:	Peter Holm
Reviewed by:	tegge
Approved by:	re (kensmith)
2007-03-20 10:51:45 +00:00
brian
c235d1165c When we write extended attributes, assert that the inode hasn't
already been deleted.  The assertion is important to show that
we won't end up accounting for extended attribute blocks (using
fs_pendingblocks) in our subsequent call to fs_alloc().

Agreed verbally by: mckusick

MFC after:	3 weeks
2007-03-19 18:51:02 +00:00
kib
3f7d43feef Implement fine-grained locking for UFS quotas.
Each struct dquot gets dq_lock mutex to protect dq_flags and to interlock
with DQ_LOCK. qhash, dqfreelist and dq.dq_cnt are protected by global
dqhlock mutex.

i_dquot array for inode is protected by lockmgr' vnode lock, corresponding
assert added to the dqget(). Access to struct ufsmount quota-related fields
(um_quotas and um_qflags) is protected by um_lock.

Tested by:	Peter Holm
Reviewed by:	tegge
Approved by:	re (kensmith)

This work were not possible without enormous amount of help given by
Tor Egge and Peter Holm. Tor reviewed each version of patch, pointed out
numerous errors and provided invaluable suggestions. Peter did tireless
testing of the patch as it was developed.
2007-03-14 08:54:08 +00:00
tegge
214bc5723c Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
pjd
d38fdb51a0 Fix build breakage. 2007-03-01 23:14:46 +00:00
pjd
e544923453 Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better
describe the privilege.

OK'ed by:	rwatson
2007-03-01 20:47:42 +00:00
pjd
9558665f1e Avoid checking for privileges if there is no need to.
Discussed with:	rwatson
2007-03-01 20:38:24 +00:00
brian
c3843b2cca Account for di_blocks allocations when IN_SPACECOUNTED is set in an
inode's i_flag.

It's possible that after ufs_infactive() calls softdep_releasefile(),
i_nlink stays >0 for a considerable amount of time (> 60 seconds here).
During this period, any ffs allocation routines that alter di_blocks
must also account for the blocks in the filesystem's fs_pendingblocks
value.

This change fixes an eventual df/du discrepency that will happen as
the result of fs_pendingblocks being reduced to <0.

The only manifestation of this that people may recognise is the
following message on boot:

    /somefs: update error: blocks -N files M

at which point the negative pending block count is adjusted to zero.

Reviewed by:	tegge
MFC after:	3 weeks
2007-02-23 20:23:35 +00:00
mckusick
96503737f7 The functions that set and delete external attributes must check
that the filesystem is not mounted read-only before proceeding.

Reported by: Ryan Beasley <ryanb@FreeBSD.org>
MFC after: 1 week
2007-02-21 08:50:06 +00:00
mckusick
368de56c4b This README file is obsolete. The cited problems were fixed long ago
and the code is installed by default so no longer requires action by
the administrator to be included.
2007-02-17 08:25:43 +00:00
pjd
cb2d7c85a8 Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00
tegge
06da132002 Call pbgetvp() and pbrelvp() instead of setting b_vp directly.
PR:		kern/108151
2007-02-04 23:42:02 +00:00
kib
fdd50404d1 Cylinder group bitmaps and blocks containing inode for a snapshot
file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then kernel would panic due to recursive buffer lock
acquision.

Avoid dealing with buffers in bdwrite() that are from other side of
snaplock divisor in the lock order then the buffer being written. Add
new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in
the bdwrite(). Default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, specialized implementation is
used.

Reviewed by:	tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by:	Peter Holm
X-MFC after:	3 weeks (if ever: it changes ABI)
2007-01-23 10:01:19 +00:00
mpp
0f6ed07b89 Quota system cleanup.
1) Do not do quota accounting for the actual quota data files
   or for file system snapshot files ("system" files).  This
   prevents a deadlock descibed in PR kern/30958 if the kernel
   ever has to grow the quota file.  Snapshot files were already
   exempt from the quota checks, but this change generalized the check.
2) Fix a cast that caused extremely large uids/gids to incorrectly
   write the quota information to the data file at a truncated
   value for a uint_t32 id value.  The incorrect cast caused quota
   files in this case to be around 4GB in size, with the correct cast
   they can now be 131GB in size.  Also related to PR kern/30958.
3) Check for what appear to be negative UIDs/GIDs and not account
   for them.  This prevents the quota files from becoming 131GB in
   size and causing quotacheck to run forever at bootup.  This could
   also cause the kernel to try and expand the quota file, which might
   deadlock due to the issue in #1.  kern/30958 and kern/38156
   (and some much older closed PR's).
4) With the deadlock problems gone, the kernel can now expand the
   size of the quota database files if it needs to.
5) Pass in the i-node count change value to chkiq and chkiqchg as an
   int, like it used to be before the common routine was split up
   into 2 different routines to increase / decrease the i-node in-use
   count.  Prevents an underflow on the i-node count.  Related
   to PR kern/89247.
6) Prevent the block usage from growing slowly if a file system is
   full and the write was denied due to that fact.  PR kern/89247.

Some of these changes require an updated quotacheck to prevent
the creation of huge (131GB) quota data files (item #3).

#1/#4 probably fixes a lot of the random hangs when quotas are enabled,
possibly some of the jail hangs.
2007-01-20 11:58:32 +00:00