Commit Graph

13099 Commits

Author SHA1 Message Date
Konstantin Belousov
7a61281f22 Correct the lock class for the vm object lock.
Reported and tested by:	joel
2013-03-09 10:16:08 +00:00
Alexander Motin
21a37a7196 Rework overflow checks of r247898 to not let too "intelligent" compiler to
optimize it out.

Submitted by:	bde
2013-03-09 09:07:13 +00:00
Attilio Rao
89f6b8632c Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
  sys/mutex.h in consumers directly to cater its inlining functions
  using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
  consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
  the compat layer because the name clash between FreeBSD and solaris
  versions must be avoided.
  At this purpose zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit.  Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	pjd (ZFS specific review)
Discussed with:	alc
Tested by:	pho
2013-03-09 02:32:23 +00:00
Andre Oppermann
15ae0c9af9 Move the callout subsystem initialization to its own SYSINIT()
from being indirectly called via cpu_startup()+vm_ksubmap_init().
The boot order position remains the same at SI_SUB_CPU.

Allocation of the callout array is changed to stardard kernel malloc
from a slightly obscure direct kernel_map allocation.

kern_timeout_callwheel_alloc() is renamed to callout_callwheel_init()
to better describe its purpose.
kern_timeout_callwheel_init() is removed simplifying the per-cpu
initialization.

Reviewed by:	davide
2013-03-08 10:37:17 +00:00
Andre Oppermann
f8ccf82a4c Move the auto-sizing of the callout array from init_param2() to
kern_timeout_callwheel_alloc() where it is actually used.

This is a mechanical move and no tuning parameters are changed.

The pre-allocated callout array is only used for legacy timeout(9)
calls and is only allocated and active on cpu0.  Eventually all
remaining users of timeout(9) should switch to the callout_* API.

Reviewed by:	davide
2013-03-08 10:14:58 +00:00
Alexander Motin
836972b877 Fix off-by-one error in nanoseconds validation.
Submitted by:	bde
2013-03-07 16:50:07 +00:00
Ian Lepore
9a2bff7ca6 Call sched_prio() to immediately change the priority of the thread in
response to an rtprio_thread() call, when the priority is different
than the old priority, and either the old or the new priority class is
not RTP_PRIO_NORMAL (timeshare).

The reasoning for the second half of the test is that if it's a change in
timeshare priority, then the scheduler is going to adjust that priority
in a way that completely wipes out the requested change anyway, so
what's the point?  (If that's not true, then allowing a thread to change
its own timeshare priority would subvert the scheduler's adjustments and
let a cpu-bound thread monopolize the cpu; if allowed at all, that
should require priveleges.)

On the other hand, if either the old or new priority class is not
timeshare, then the scheduler doesn't make automatic adjustments, so we
should honor the request and make the priority change right away.  The
reason the old class gets caught up in this is the very reason for this
change:  when thread A changes the priority of its child thread B from
idle back to timeshare, thread B never actually gets moved to a
timeshare-range run queue unless there are some idle cycles available
to allow it to first get scheduled again as an idle thread.

Reviewed by:	jhb@
2013-03-07 02:53:29 +00:00
Alexander Motin
b5ea3779da Reduce minimal time intervals of setitimer(2) from 1/HZ to 1/(16*HZ) by
using callout_reset_sbt() instead of callout_reset().  We can't remove
lower limit completely in this case because of significant processing
overhead, caused by unability to use direct callout execution due to using
process mutex in callout handler for sending SEGALRM signal.  With support
of periodic events that would allow unprivileged user to abuse the system.

Reviewed by:	davide
2013-03-06 22:40:47 +00:00
Alexander Motin
980c545d76 Fix time math overflows and improve zero intervals handling in poll(),
select(), nanosleep() and kevent() functions after calloutng changes.

Reported by:	bde
2013-03-06 19:37:38 +00:00
Fabien Thomas
d49302aead Add a generic way to call per event allocate / release function.
Reviewed by:	mav
MFC after:	1 month
2013-03-05 10:18:48 +00:00
Davide Italiano
ac42a1726a Complete r247813:
Use true/false instead of TRUE/FALSE.

Reported by:	attilio
Requested by:	jhb
2013-03-04 21:52:12 +00:00
Davide Italiano
a4a3ce9919 Use C99 'bool' rather than Machish 'boolean_t'.
Requested by:	jhb
2013-03-04 21:09:22 +00:00
Davide Italiano
40e794ab19 MFcalloutng:
- Rewrite kevent() timeout implementation to allow sub-tick precision.
- Make the interval timings for EVFILT_TIMER more accurate. This also
removes an hack introduced in r238424.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 16:55:16 +00:00
Davide Italiano
cf5e4fe6bb MFcalloutng:
Fix kern_select() and sys_poll() so that they can handle sub-tick
precision for timeouts (in the same fashion it was done for nanosleep()
in r247797).

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 16:41:27 +00:00
Davide Italiano
4601bab1fb MFcalloutng (r244251 with minor changes):
Specify that precision of 0.5s is enough for resource limitation.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 16:25:12 +00:00
Davide Italiano
c38250c9b9 MFcalloutng (r244255 by mav, with minor changes):
Specify that syslog doesn't need exactly 5 wakeups per second.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 16:07:55 +00:00
Davide Italiano
098176f0d0 MFcalloutng:
kern_nanosleep() is now converted to use tsleep_sbt(). With this change
nanosleep() and usleep() can handle sub-tick precision for timeouts.
Also, try to help coalesce of events passing as argument to tsleep_bt()
a precision value calculated as a percentage of the sleep time.
This percentage is default 5%, but it can tuned according to users
need via the sysctl interface.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 15:57:41 +00:00
Davide Italiano
037637812d Fix build with DIAGNOSTIC/CALLOUT_PROFILING options turned on.
Reported by:	kib, David Wolfskill <david at catwhisker dot org>
Pointy-hat to:	davide
2013-03-04 15:03:52 +00:00
Davide Italiano
24e48c6d5b MFcalloutng:
Introduce sbt variants of msleep(), msleep_spin(), pause(), tsleep() in
the KPI, allowing to specify timeout in 'sbintime_t' rather than ticks.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 12:48:41 +00:00
Davide Italiano
461537356a MFcalloutng:
Extend condvar(9) KPI introducing sbt variant of cv_timedwait. This
rely on the previously committed sleepq_set_timeout_sbt().

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 12:20:48 +00:00
Davide Italiano
965ac611ec MFcalloutng:
Convert sleepqueue(9) bits to the new callout KPI. Take advantage of
the possibility to run callback directly from hw interrupt context.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 11:51:46 +00:00
Davide Italiano
dbd2e1677f MFcalloutng (r244355):
Make loadavg calculation callout direct. There are several reasons for it:
 - it is very simple and doesn't worth context switch to SWI;
 - since SWI is no longer used here, we can remove twelve years old hack,
excluding this SWI from from the loadavg statistics;
 - it fixes problem when eventtimer (HPET) shares interrupt with some other
device, and that interrupt thread counted as permanent loadavg of 1; now
loadavg accounted before that interrupt thread is scheduled.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, Fabian Keil, markj
2013-03-04 11:22:19 +00:00
Davide Italiano
5b999a6be0 - Make callout(9) tickless, relying on eventtimers(4) as backend for
precise time event generation. This greatly improves granularity of
callouts which are not anymore constrained to wait next tick to be
scheduled.
- Extend the callout KPI introducing a set of callout_reset_sbt* functions,
which take a sbintime_t as timeout argument. The new KPI also offers a
way for consumers to specify precision tolerance they allow, so that
callout can coalesce events and reduce number of interrupts as well as
potentially avoid scheduling a SWI thread.
- Introduce support for dispatching callouts directly from hardware
interrupt context, specifying an additional flag. This feature should be
used carefully, as long as interrupt context has some limitations
(e.g. no sleeping locks can be held).
- Enhance mechanisms to gather informations about callwheel, introducing
a new sysctl to obtain stats.

This change breaks the KBI. struct callout fields has been changed, in
particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t'
(8 bytes) and another 'sbintime_t' field was added for precision.

Together with:	mav
Reviewed by:	attilio, bde, luigi, phk
Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo (amd64, sparc64), marius (sparc64), ian (arm),
		markj (amd64), mav, Fabian Keil
2013-03-04 11:09:56 +00:00
Pawel Jakub Dawidek
8cb539f18f For some reason when I started to pass filedescent structures instead of
pointers to the file structure receiving descriptors stopped to work when also
at least few kilobytes of data is being send. In the kernel the
soreceive_generic() function doesn't see control mbuf as the first mbuf and
unp_externalize() is never called, first 6(?) kilobytes of data is missing as
well on receiving end.

This breaks for example tmux.

I don't know yet why going from 8 bytes to sizeof(struct filedescent) per
descriptor (or even to 16 bytes per descriptor) breaks things, but to
work-around it for now use 8 bytes per file descriptor at the cost of memory
allocation.

Reported by:	flo, Diane Bruce, Jan Beich <jbeich@tormail.org>
Simple testcase provided by:	mjg
2013-03-03 23:39:30 +00:00
Pawel Jakub Dawidek
5f39e56581 Use dedicated malloc type for filecaps-related data, so we can detect any
memory leaks easier.
2013-03-03 23:25:45 +00:00
Pawel Jakub Dawidek
a6157c3d61 Plug memory leaks in file descriptors passing. 2013-03-03 23:23:35 +00:00
Davide Italiano
3f555c45eb callwheelmask and callwheelsize are always greater than zero.
Switch their type to u_int.
2013-03-03 15:01:33 +00:00
Davide Italiano
0fb285b716 Remove a couple of unused include. 2013-03-03 14:47:02 +00:00
Alexander Motin
4514d6fa18 MFcalloutng:
Some whitespace fixes.
2013-03-03 09:11:24 +00:00
Pawel Jakub Dawidek
378a73d1bd Regen after r247667. 2013-03-02 21:12:54 +00:00
Pawel Jakub Dawidek
7493f24ee6 - Implement two new system calls:
int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
	int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

  which allow to bind and connect respectively to a UNIX domain socket with a
  path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
  the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
  in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwatson, jilles, kib, des
2013-03-02 21:11:30 +00:00
Pawel Jakub Dawidek
6d4e99aaef If the target file already exists, check for the CAP_UNLINKAT capabiity right
on the target directory descriptor, but only if this is renameat(2) and real
target directory descriptor is given (not AT_FDCWD). Without this fix regular
rename(2) fails if the target file already exists.

Reported by:	Michael Butler <imb@protected-networks.net>
Reported by:	Larry Rosenman <ler@lerctr.org>
Sponsored by:	The FreeBSD Foundation
2013-03-02 09:58:47 +00:00
Pawel Jakub Dawidek
1dc31587bf Regen after r247602. 2013-03-02 00:55:09 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
John Baldwin
f9379dc411 Replace the TDP_NOSLEEPING flag with a counter so that the
THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest.

Reviewed by:	attilio
2013-03-01 22:03:31 +00:00
Pawel Jakub Dawidek
71ac38e896 Remove unnecessary variables. 2013-03-01 21:58:56 +00:00
Pawel Jakub Dawidek
f4d0191b22 Reduce lock scope a little. 2013-03-01 21:57:02 +00:00
Marius Strobl
db9066f798 - Use strdup(9) instead of reimplementing it.
- Use __DECONST instead of strange casts.
- Reduce code duplication and simplify name2oid().

PR:		176373
Submitted by:	Christoph Mallon
MFC after:	1 week
2013-03-01 18:49:14 +00:00
Konstantin Belousov
58248e57ab Make the default implementation of the VOP_VPTOCNP() fail if the
directory entry, matched by the inode number, is ".".

NFSv4 client might instantiate the distinct vnodes which have the same
inode number, since single v4 export can be combined from several
filesystems on the server.  For instance, a case when the nested
server mount point is exactly one directory below the top of the
export, causes directory and its parent to have the same inode number
2.  The vop_stdvptocnp() algorithm then returns "." as the name of the
lower directory.

Filtering out the "." entry with ENOENT works around this behaviour,
the error forces getcwd(3) to fall back to usermode implementation,
which compares both st_dev and st_ino.

Based on the submission by:	rmacklem
Tested by:	rmacklem
MFC after:	1 week
2013-03-01 18:40:14 +00:00
Davide Italiano
e234a588cb MFcalloutng:
Style fixes.
2013-02-28 16:22:49 +00:00
Alexander Motin
fdc5dd2d2f MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
2013-02-28 13:46:03 +00:00
Davide Italiano
acccf7d8b4 MFcalloutng:
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to extimate furter sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements will be committed in the next days.

The commmit is not targeted for MFC.
2013-02-28 10:46:54 +00:00
Konstantin Belousov
20f4e3e158 Make recursive getblk() slightly more useful. Keep the buffer state
intact if getblk() is done on the already owned buffer.  Exit from
brelse() early when the lock recursion is detected, otherwise brelse()
might prematurely destroy the buffer under some circumstances.

Sponsored by:	The FreeBSD Foundation
Noted by:	mckusick
Tested by:	pho
MFC after:	2 weeks
2013-02-27 07:34:09 +00:00
Alexander Motin
1af19ee4a2 Add support for good old 8192Hz profiling clock to software PMC.
Reviewed by:	fabient
2013-02-26 18:13:42 +00:00
Attilio Rao
590f9303e5 Merge from vmobj-rwlock branch:
Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h.

Sponsored by:	EMC / Isilon storage division
Tested by:	pho
Reviewed by:	alc
2013-02-26 01:00:11 +00:00
Pawel Jakub Dawidek
1d59211b2e Style.
Suggested by:	kib
2013-02-25 20:51:29 +00:00
Pawel Jakub Dawidek
893365e42d After r237012, the fdgrowtable() doesn't drop the filedesc lock anymore,
so update a stale comment.

Reviewed by:	kib, keramida
2013-02-25 20:50:08 +00:00
John Baldwin
593efaf9f7 Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep().  In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing.  Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag.  Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed.  For now, only the NFS clients are set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
  the request as being on behalf of the thread performing the lookup
  (cnp_thread) rather than using a NULL thread pointer.  This causes
  NFS to properly handle signals during this VOP on an interruptible
  mount.

PR:		kern/176179
Reported by:	Russell Cattelan (sigdeferstop() recursion)
Reviewed by:	kib
MFC after:	1 month
2013-02-21 19:02:50 +00:00
Jamie Gritton
ffc72591b1 Don't worry if a module is already loaded when looking for a fstype to mount
(possible in a race condition).

Reviewed by:	kib
MFC after:	1 week
2013-02-21 02:41:37 +00:00
John Baldwin
353374b525 Fix a few typos. 2013-02-19 16:35:27 +00:00