buffer for the last vnode on the mount back to the server, it
returns. At that point, the code continues with the unmount,
including freeing up the nfs specific part of the mount structure.
It is possible that an nfsiod thread will try to check for an
empty I/O queue in the nfs specific part of the mount structure
after it has been free'd by the unmount. This patch avoids this problem by
setting the iodmount entries for the mount back to NULL while holding the
mutex in the unmount and checking the appropriate entry is non-NULL after
acquiring the mutex in the nfsiod thread.
Reported and tested by: pho
Reviewed by: kib
MFC after: 2 weeks
fixed the issues with file descriptor locks, but the same problems are
present for vnode lock/user map lock.
If the nfs_asyncio() cannot find the free nfsiod, schedule task to
create new nfsiod and return error. This causes fall back to the
synchronous i/o for nfs_strategy(), or does not start read at all in
the case of readahead. The caller that holds vnode and potentially
user map lock does not wait for kproc_create() to finish, preventing
the LORs.
The change effectively reverts r203072, because we never hand off the
request to newly created nfsiod thread anymore.
Reviewed by: jhb
Tested by: jhb, pluknet
MFC after: 3 weeks
vnode lock and several locks needed during fork, like fd lock.
Instead, schedule the task to be executed in the taskqueue context. We
still waiting for the fork to finish, but the context of the thread
executing the task does not make real LORs with our vnode lock.
Submitted by: pluknet at gmail com
Reviewed by: jhb
Tested by: pho
MFC after: 3 weeks
module that can be used by both the regular and experimental nfs
clients. This fixes the problem reported by jh@ where /dev/nfslock
would be registered twice when both nfs clients were used.
I also defined the size of the lm_fh field to be the correct value,
as it should be the maximum size of an NFSv3 file handle.
Reviewed by: jh
MFC after: 2 weeks
Without this patch it was possible for a different thread that calls
nfs_asyncio() to snitch a newly created nfsiod thread that was
intended for another caller of nfs_asyncio(), because the nfs_iod_mtx
mutex was unlocked while the new nfsiod thread was created. This patch
labels the newly created nfsiod, so that it is not taken by another
caller of nfs_asyncio(). This is believed to fix the problem reported
on the freebsd-stable email list under the subject:
FreeBSD NFS client/Linux NFS server issue.
Tested by: to DOT my DOT trociny AT gmail DOT com
Reviewed by: jhb
MFC after: 2 weeks
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.
timeout occurring at exactly the same time. If this happens, the nfsiod
exits although there may be a queued async IO request for it.
Found by : Kris Kennaway
Approved by: re
client into the kernel by default, and many users won't use NFS,
don't start an extra 4 kernel threads that are unused. Once NFS
becomes active, it will start nfsiod's as it needs them.
We might consider mandating a minimum iod's equal to the number of
active NFS mounts (truncated to some value), which would force some
to remain available without having to create a new one if the file
system is mostly inactive.
PR: 70880
MFC after: 2 weeks
Prodded by: cel
Head nod: peter
Pointed out by: Joe <fbsd_user at a1poweruser dot com>
- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some
memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.
- NFS direct IO completely bypasses the buffer and page caches.
If a file is open for direct IO all caching is disabled.
- Direct IO for Directories will be addressed later.
- 2 new NFS directio related sysctls are added. One is a knob to
disable NFS direct IO completely (direct IO is enabled by default).
The other is to disallow mmaped IO on a file that has at least one
O_DIRECT open (see the comment in nfs_vnops.c for more details).
The default is to allow mmaps on a file that has O_DIRECT opens.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from: Yahoo!
userland and a dedicated system call to get replies.
The vnode-bypass of fifos broke this into a panic.
Ditch all the magic and create a device /dev/nfslock instead, and
use that for both directions apart from the shorter path, this is
also faster because the device driver runs Giant free using the
vnode bypass.
Noticed by: marcel
This includes a modified form of some code from Thomas Moestl (tmm@)
to properly clean up the UMA zone and the "nfsnodehashtbl" hash
table.
Reviewed By: iedowse
PR: 16299
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.
Reviewed by: jake, peter, jhb
vfs.nfs.iodmaxidle (idle time before nfsiod's exit). Make it adaptive
so that we create nfsiod's on demand and they go away after not being
used for a while. The upper limit is NFS_MAXASYNCDAEMON (currently 20).
More will be done here, but this is a useful checkpoint.
Submitted by: Maxime Henrion <mux@qualys.com>
reference: with td->td_ucred, it will be desirable to authorize
based on td->td_ucred, rather than p->p_ucred.
o Since the same variable 'p' was later used with pfind() on the target
process for the wakeup, introduce a new local variable 'targetp'
to use instead.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
actually in the kernel. This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there. As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).
This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size. This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.
Reviewed by: bde
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.
Pre-rfork code assumed inherent locking of a process's file descriptor
array. However, with the advent of rfork() the file descriptor table
could be shared between processes. This patch closes over a dozen
serious race conditions related to one thread manipulating the table
(e.g. closing or dup()ing a descriptor) while another is blocked in
an open(), close(), fcntl(), read(), write(), etc...
PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
<sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
into vnode dirtyblkhd we append it to the list instead of prepend it to
the list in order to maintain a 'forward' locality of reference, which
is arguably better then 'reverse'. The original algorithm did things this
way to but at a huge time cost.
Enhance the append interlock for NFS writes to handle intr/soft mounts
better.
Fix the hysteresis for NFS async daemon I/O requests to reduce the
number of unnecessary context switches.
Modify handling of NFS mount options. Any given user option that is
too high now defaults to the kernel maximum for that option rather then
the kernel default for that option.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
when returning an error. Bug fix was extracted from the PR. The PR
is not yet entirely resolved by this commit.
PR: kern/13049
Reviewed by: Matt Dillon <dillon@freebsd.org>
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>