Commit Graph

816 Commits

Author SHA1 Message Date
pjd
b49c6b382f The v_data field is a pointer, so set it to NULL, not 0.
MFC after:	3 days
2011-10-25 14:01:17 +00:00
jonathan
9c782bcdf6 Change one printf() to log().
As noted in kern/159780, printf() is not very jail-friendly, since it can't be easily monitored by jail management tools. This patch reports an error via log() instead, which, if nobody is watching the log file, still prints to the console.

Approved by: mentor (rwatson)
Submitted by: Eugene Grosbein <eugen@eg.sd.rdtc.ru>
MFC after: 5 days
2011-10-07 09:51:12 +00:00
kib
cc9fd3ad53 Move parts of the commit log for r166167, where Tor explained the
interaction between vnode locks and vfs_busy(), into comment.

MFC after:	1 week
2011-10-04 18:45:29 +00:00
attilio
683d7a54ce Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks
2011-08-25 15:51:54 +00:00
mckusick
ffeefed9fc Move the MNTK_SUJ flag in mnt_kern_flag to MNT_SUJ in mnt_flag
so that it is visible to userland programs. This change enables
the `mount' command with no arguments to be able to show if a
filesystem is mounted using journaled soft updates as opposed
to just normal soft updates.

Approved by: re (bz)
2011-07-24 18:27:09 +00:00
alc
21902be08c Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings.  Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write().  It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by:	kib
2011-06-29 16:40:41 +00:00
jonathan
d77535edb6 Tidy up a capabilities-related comment.
This comment refers to an #ifdef that hasn't been merged [yet?]; remove it.

Approved by: rwatson
2011-06-24 14:40:22 +00:00
mdf
bbbc4c5455 Use a name instead of a magic number for kern_yield(9) when the priority
should not change.  Fetch the td_user_pri under the thread lock.  This
is probably not necessary but a magic number also seems preferable to
knowing the implementation details here.

Requested by:	Jason Behmer < jason DOT behmer AT isilon DOT com >
2011-05-13 05:27:58 +00:00
attilio
d685681d59 Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by:	Sandvine Incorporated
Discussed with:	emaste, des
MFC after:	2 weeks
2011-04-28 16:02:05 +00:00
rmacklem
872195caf9 Fix a LOR in vfs_busy() where, after msleeping, it would lock
the mutexes in the wrong order for the case where the
MBF_MNTLSTLOCK is set. I believe this did have the
potential for deadlock. For example, if multiple nfsd threads
called vfs_busyfs(), which calls vfs_busy() with MBF_MNTLSTLOCK.
Thanks go to pho for catching this during his testing.

Tested by:	pho
Submitted by:	kib
MFC after:	2 weeks
2011-04-23 11:22:48 +00:00
pluknet
6d33997006 Remove malloc type M_NETADDR unused since splitting into vfs_subr.c
and vfs_export.c.

MFC after:	1 week
2011-04-04 16:23:01 +00:00
kib
4d0733e0f8 Do not assert buffer lock in VFS_STRATEGY() when kernel already paniced.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2011-03-08 11:50:59 +00:00
mdf
33ee365b55 Based on discussions on the svn-src mailing list, rework r218195:
- entirely eliminate some calls to uio_yeild() as being unnecessary,
   such as in a sysctl handler.

 - move should_yield() and maybe_yield() to kern_synch.c and move the
   prototypes from sys/uio.h to sys/proc.h

 - add a slightly more generic kern_yield() that can replace the
   functionality of uio_yield().

 - replace source uses of uio_yield() with the functional equivalent,
   or in some cases do not change the thread priority when switching.

 - fix a logic inversion bug in vlrureclaim(), pointed out by bde@.

 - instead of using the per-cpu last switched ticks, use a per thread
   variable for should_yield().  With PREEMPTION, the only reasonable
   use of this is to determine if a lock has been held a long time and
   relinquish it.  Without PREEMPTION, this is essentially the same as
   the per-cpu variable.
2011-02-08 00:16:36 +00:00
mdf
b291e9a365 Put the general logic for being a CPU hog into a new function
should_yield().  Use this in various places.  Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after:	1 week
2011-02-02 16:35:10 +00:00
kib
f5d7ab843b When vtruncbuf() iterates over the vnode buffer list, lock buffer object
before checking the validity of the next buffer pointer. Otherwise, the
buffer might be reclaimed after the check, causing iteration to run into
wrong buffer.

Reported and tested by:	pho
MFC after:	1 week
2011-01-25 14:04:02 +00:00
mdf
ca68749f2a Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need
to rely on the format string.
2011-01-18 21:14:18 +00:00
mdf
f6a71a40b2 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the kernel changes.
2011-01-12 19:54:19 +00:00
jhb
3a7fd5f8c7 - Restore dropping the priority of syncer down to PPAUSE when it is idle.
This was lost when it was converted to using a condition variable instead
  of lbolt.
- Drop the priority of flowtable down to PPAUSE when it is idle as well
  since it is a similar background task.

MFC after:	2 weeks
2011-01-06 22:17:07 +00:00
kib
747e187ac4 Teach ddb "show mount" about MNTK_SUJ flag. 2010-12-27 12:06:38 +00:00
kib
87501c4bfe Allow shared-locked vnode to be passed to vunref(9).
When shared-locked vnode is supplied as an argument to vunref(9) and
resulting usecount is 0, set VI_OWEINACT and do not try to upgrade vnode
lock. The later could cause vnode unlock, allowing the vnode to be
reclaimed meantime.

Tested by:	pho
MFC after:	1 week
2010-11-24 12:30:41 +00:00
kib
7980fb6d3a Remove prtactive variable and related printf()s in the vop_inactive
and vop_reclaim() methods. They seems to be unused, and the reported
situation is normal for the forced unmount.

MFC after:   1 week
X-MFC-note:  keep prtactive symbol in vfs_subr.c
2010-11-19 21:17:34 +00:00
brucec
d3c1b43ec6 Fix some more style(9) issues. 2010-11-14 16:10:15 +00:00
brucec
34e35cbdea Fix style(9) issues from r215281 and r215282.
MFC after:	1 week
2010-11-14 08:06:29 +00:00
brucec
76f2d09c8d Add descriptions to some more sysctls.
PR:	kern/148510
MFC after:	1 week
2010-11-14 07:38:42 +00:00
kib
107ea66c07 Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak
when mount and update are executed in parallel.

Encapsulate syncer vnode deallocation into the helper function
vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c.

Found and reviewed by:	jh (previous version of the patch)
Tested by:	pho
MFC after:	3 weeks
2010-09-11 13:06:06 +00:00
emaste
5216dea167 As long as we are going to panic anyway, there's no need to hide additional
information behind DIAGNOSTIC.
2010-09-01 13:47:11 +00:00
jh
c2cb836190 execve(2) has a special check for file permissions: a file must have at
least one execute bit set, otherwise execve(2) will return EACCES even
for an user with PRIV_VFS_EXEC privilege.

Add the check also to vaccess(9), vaccess_acl_nfs4(9) and
vaccess_acl_posix1e(9). This makes access(2) to better agree with
execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check
to zfs_freebsd_access() too. There may be other file systems which are
not using vaccess*() functions and need to be handled separately.

PR:		kern/125009
Reviewed by:	bde, trasz
Approved by:	pjd (ZFS part)
2010-08-30 16:30:18 +00:00
pjd
bc73fabf27 There is a bug in vfs_allocate_syncvnode() failure handling in mount code.
Actually it is hard to properly handle such a failure, especially in MNT_UPDATE
case. The only reason for the vfs_allocate_syncvnode() function to fail is
getnewvnode() failure. Fortunately it is impossible for current implementation
of getnewvnode() to fail, so we can assert this and make
vfs_allocate_syncvnode() void. This in turn free us from handling its failures
in the mount code.

Reviewed by:	kib
MFC after:	1 month
2010-08-28 08:57:15 +00:00
kib
ade28bdd40 The buffers b_vflags field is not always properly protected by
bufobj lock. If b_bufobj is not NULL, then bufobj lock should be
held when manipulating the flags. Not doing this sometimes leaves
BV_BKGRDINPROG to be erronously set, causing softdep' getdirtybuf() to
stuck indefinitely in "getbuf" sleep, waiting for background write to
finish which is not actually performed.

Add BO_LOCK() in the cases where it was missed.

In collaboration with:	pho
Tested by:	bz
Reviewed by:	jeff
MFC after:	1 month
2010-08-12 08:36:23 +00:00
alc
b6ec5a5f0a In order for MAXVNODES_MAX to be an "int" on powerpc and sparc, we must
cast PAGE_SIZE to an "int".  (Powerpc and sparc, unlike the other
architectures, define PAGE_SIZE as a "long".)

Submitted by:	Andreas Tobler
2010-08-04 05:09:02 +00:00
alc
329f9f0435 Update the "desiredvnodes" calculation. In particular, make the part of
the calculation that is based on the kernel's heap size more conservative.
Hopefully, this will eliminate the need for MAXVNODES_MAX, but for the
time being set MAXVNODES_MAX to a large value.

Reviewed by:	jhb@
MFC after:	6 weeks
2010-08-02 21:33:36 +00:00
ed
76489ac1ea Use ISO C99 integer types in sys/kern where possible.
There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.
2010-06-21 09:55:56 +00:00
pjd
b3024a4af9 Backout r207970 for now, it can lead to deadlocks.
Reported by:	kan
MFC after:	3 days
2010-06-17 17:39:51 +00:00
kib
2ba33ab98e Sometimes vnodes share the lock despite being different vnodes on
different mount points, e.g. the nullfs vnode and the covered vnode
from the lower filesystem. In this case, existing assertion in
vop_rename_pre() may be triggered.

Check for vnode locks equiality instead of the vnodes itself to
not trip over the situation.

Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
Tested by:	pho
MFC after:	2 weeks
2010-06-03 10:20:08 +00:00
zml
773cda6040 Add VOP_ADVLOCKPURGE so that the file system is called when purging
locks (in the case where the VFS impl isn't using lf_*)

Submitted by:       Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by:        zml, dfr
2010-05-12 21:24:46 +00:00
pjd
05f836c1c3 When there is no memory or KVA, try to help by reclaiming some vnodes.
This helps with 'kmem_map too small' panics.

No objections from:	kib
Tested by:		Alexander V. Ribchansky <shurik@zk.informjust.ua>
MFC after:		1 week
2010-05-12 16:42:28 +00:00
pjd
f1b200bbcc I added vfs_lowvnodes event, but it was only used for a short while and now
it is totally unused. Remove it.

MFC after:	3 days
2010-05-11 22:46:36 +00:00
jeff
a574495410 - Merge soft-updates journaling from projects/suj/head into head. This
brings in support for an optional intent log which eliminates the need
   for background fsck on unclean shutdown.

Sponsored by:   iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
2010-04-24 07:05:35 +00:00
jh
6b4bef1bca Add missing MNT_NFS4ACLS. 2010-04-04 14:48:43 +00:00
pjd
bd6cec6aca Fix some whitespace nits. 2010-04-03 11:19:20 +00:00
pjd
f0663e1c41 Add missing mnt_kern_flag flags in 'show mount' output. 2010-04-03 11:15:55 +00:00
kib
86c35b90b7 Add function vop_rename_fail(9) that performs needed cleanup for locks
and references of the VOP_RENAME(9) arguments. Use vop_rename_fail()
in deadfs_rename().

Tested by:	Mikolaj Golub
MFC after:	1 week
2010-04-02 14:03:01 +00:00
kib
1c4578d223 Add new function vunref(9) that decrements vnode use count (and hold
count) while vnode is exclusively locked.

The code for vput(9), vrele(9) and vunref(9) is merged.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	3 weeks
2010-01-17 21:24:27 +00:00
kib
56603546c6 Add a knob to allow reclaim of the directory vnodes that are source of
the namecache records. The reclamation is not enabled by default because
for typical workload it would make namecache unusable, but large nested
directory tree easily puts any process that accesses filesystem into 1
second wait for vlru.

Reported by:	yar (long time ago)
MFC after:	3 days
2009-12-28 15:35:39 +00:00
trasz
a78d6a0fda Now that all the callers seem to be fixed, add KASSERTs to make sure VAPPEND
is not being used improperly.
2009-12-26 11:36:10 +00:00
kib
b79e14054c VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm object
flag. Besides providing the redundand information, need to update both
vnode and object flags causes more acquisition of vnode interlock.
OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects.

Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for
vnode-backed vm objects.

Suggested and reviewed by:	alc
Tested by:	pho
MFC after:	3 weeks
2009-12-21 12:29:38 +00:00
jh
0c0aa71530 Extend ddb(4) "show mount" command to print active string mount options.
Note that only option names are printed, not values.

Reviewed by:	pjd
Approved by:	trasz (mentor)
MFC after:	2 weeks
2009-11-19 14:33:03 +00:00
trasz
d5661d631d Provide default implementation for VOP_ACCESS(9), so that filesystems which
want to provide VOP_ACCESSX(9) don't have to implement both.  Note that
this commit makes implementation of either of these two mandatory.

Reviewed by:	kib
2009-10-01 17:22:03 +00:00
rwatson
53eaed07bc Use C99 initialization for struct filterops.
Obtained from:	Mac OS X
Sponsored by:	Apple Inc.
MFC after:	3 weeks
2009-09-12 20:03:45 +00:00
kib
3119d26c95 In vfs_mark_atime(9), be resistent against reclaimed vnodes.
Assert that neccessary locks are taken, since vop might not be called.

Tested by:	pho
MFC after:	3 days
2009-09-09 10:51:50 +00:00