Commit Graph

2442 Commits

Author SHA1 Message Date
kib
c8fe5d3045 Do not leak vnode lock when msdosfs mount is updated and specified
device is different from the device used to the original mount.

Note that update_mp does not need devvp locked, and pmp->pm_devvp cannot
be freed meantime.

Reported and tested by:	pho
MFC after:	3 weeks
2010-03-02 17:24:33 +00:00
kib
7c1461258e Only destroy pm_fatlock on error if it was initialized.
MFC after:	3 weeks
2010-03-02 11:02:59 +00:00
kib
6f730a76e6 Mark msdosfs as mpsafe.
Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:19:22 +00:00
kib
ea5c9b9023 Fix the race between dotdot lookup and forced unmount, by using
msdosfs-specific variant of vn_vget_ino(), msdosfs_deget_dotdot().

As was done for UFS, relookup the dotdot denode after the call to
msdosfs_deget_dotdot(), because vnode lock is dropped and directory
might be moved.

Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:17:29 +00:00
kib
9f07095877 Use pm_fatlock to protect per-filesystem rb tree used to allocate fileno
on the large FAT volumes. Previously, a single global mutex was used.

Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:16:43 +00:00
kib
4521136655 Add assertions for FAT bitmap state.
Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:15:45 +00:00
kib
eb2b3c8672 Use pm_fatlock to protect fat bitmap.
Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:13:59 +00:00
kib
e272e9af75 Add per-mountpoint lockmgr lock for msdosfs. It is intended to be used
as fat bitmap lock and to replace global mutex protecting fileno rbtree.

Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:13:07 +00:00
kib
8b37b1e408 In msdosfs deget(), properly handle the case when the vnode is found in hash.
Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:11:31 +00:00
kib
11cf2ecaee In msdosfs_inactive(), reclaim the vnodes both for SLOT_DELETED and
SLOT_EMPTY deName[0] values. Besides conforming to FAT specification, it
also clears the issue where vfs_hash_insert found the vnode in hash, and
newly allocated vnode is vput()ed. There, deName[0] == 0, and vnode is
not reclaimed, indefinitely kept on mountlist.

Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:10:41 +00:00
kib
21f579c3fb Remove seemingly unneeded unlock/relock of the dvp in msdosfs_rmdir,
causing LOR.

Reported and tested by:	pho
MFC after:	3 weeks
2010-02-28 17:09:09 +00:00
kib
cc5d11fa86 Assert that the msdosfs vnode is (e)locked in several places.
The plan is to use vnode lock to protect denode and fat cache,
and having separate lock for block use map.

Change the check and return on impossible condition into KASSERT().

Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:07:49 +00:00
kib
9be56bb3cb Remove unused global statistic about fat cache usage.
Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:06:42 +00:00
uqs
f41a820b03 Fix common misspelling of hierarchy
Pointed out by:		bf1783 at gmail
Approved by:		np (cxgb), kientzle (tar, etc.), philip (mentor)
2010-02-20 10:19:19 +00:00
kib
3dd6e87cb0 Invalid filesystem might cause the bp to be never read.
Noted by:	Pedro F. Giffuni <giffunip tutopia com>
Obtanined from:	NetBSD
MFC after:	1 week
2010-02-14 12:10:49 +00:00
rmacklem
2d2c919325 Change the default value for vfs.newnfs.enable_locallocks to 0 for
the experimental NFS server, since local locking is known to be
broken and the patch to fix it is still a work in progress.

MFC after:	5 days
2010-02-14 00:18:32 +00:00
rmacklem
f9c823994c This fixes the experimental NFS server so that it won't crash in the
caching code for IPv6 by fixing a typo that used the incorrect variable.
It also fixes the indentation of the statement above it.

Reported by:	simon AT comsys.ntu-kpi.kiev.ua
MFC after:	5 days
2010-02-13 23:56:19 +00:00
kib
43f4e3729a Fix function name in the comment in the second location too.
Submitted by:	ed
MFC after:	1 week
2010-02-13 12:50:09 +00:00
kib
b6bb83ada4 - Add idempotency guards so the structures can be used in other utilities.
- Update bpb structs with reserved fields.
- In direntry struct join deName with deExtension. Although a
  fix was attempted in the past, these fields were being overflowed,
  Now this is consistent with the spec, and we can now share the
  WinChksum code with NetBSD.

Submitted by:	Pedro F. Giffuni <giffunip tutopia com>
Mostly obtained from:	NetBSD
Reviewed by:	bde
MFC after:	2 weeks
2010-02-13 12:41:07 +00:00
kib
b99a62d97c Use M_ZERO instead of calling bzero().
Fix function name in the comment.

MFC after:	1 week
2010-02-13 12:11:03 +00:00
kib
5fa4d4f819 Remove unused macros.
MFC after:	1 week
2010-02-13 11:34:25 +00:00
rmacklem
f54e0fab4c Patch the experimental NFS client so that there is a timeout
for negative name cache entries in a manner analogous to
r202767 for the regular NFS client. Also, make the code in
nfs_lookup() compatible with that of the regular client
and replace the sysctl variable that enabled negative name
caching with the mount point option.

MFC after:	2 weeks
2010-01-31 19:12:24 +00:00
ed
39978ea638 Properly use dev_refl()/dev_rel() in kern.devname.
While there, perform some clean-up fixes. Update some stale comments on
struct cdev * instead of dev_t and devfs_random(). Also add some missing
whitespace.

MFC after:	1 week
2010-01-31 15:19:16 +00:00
jh
cf48230780 Add "maxfilesize" mount option for tmpfs to allow specifying the
maximum file size limit. Default is UINT64_MAX when the option is
not specified. It was useless to set the limit to the total amount of
memory and swap in the system.

Use tmpfs_mem_info() rather than get_swpgtotal() in tmpfs_mount() to
check if there is enough memory available.

Remove now unused get_swpgtotal().

Reviewed by:	Gleb Kurtsou
Approved by:	trasz (mentor)
2010-01-29 12:09:14 +00:00
rmacklem
c20e37e81a Patch the experimental NFS client in a manner analogous to
r203072 for the regular NFS client. Also, delete two fields
of struct nfsmount that are not used by the FreeBSD port of
the client.

MFC after:	2 weeks
2010-01-28 16:17:24 +00:00
trasz
de024c2452 Don't touch v_interlock; use VI_* macros instead. 2010-01-27 19:30:44 +00:00
marius
6d095a5fb7 On LP64 struct ifid is 64-bit aligned while struct fid is 32-bit aligned
so on architectures with strict alignment requirements we can't just simply
cast the latter to the former but need to copy it bytewise instead.

PR:		143010
MFC after:	3 days
2010-01-23 22:38:01 +00:00
jh
03418e0c50 Truncate read request rather than returning EIO if the request is
larger than MAXPHYS + 1. This fixes a problem with cat(1) when it
uses a large I/O buffer.

Reported by:	Fernando Apesteguía
Suggested by:	jilles
Reviewed by:	des
Approved by:	trasz (mentor)
2010-01-22 08:45:12 +00:00
jh
c1e8eaa2ba - Change the type of nodes_max to u_int and use "%u" format string to
convert its value. [1]
- Set default tm_nodes_max to min(pages + 3, UINT32_MAX). It's more
  reasonable than the old four nodes per page (with page size 4096) because
  non-empty regular files always use at least one page. This fixes possible
  overflow in the calculation. [2]
- Don't allow more than tm_nodes_max nodes allocated in tmpfs_alloc_node().

PR:		kern/138367
Suggested by:	bde [1], Gleb Kurtsou [2]
Approved by:	trasz (mentor)
2010-01-20 16:56:20 +00:00
lulf
4c036705bb Revert parts of r202283:
- Return EOPNOTSUPP before EROFS to be consistent with other filesystems.
- Fix setting of the nodump flag for users without PRIV_VFS_SYSFLAGS privilege.

Submitted by:	jh@
2010-01-18 19:09:16 +00:00
lulf
8ab7715033 Bring in the ext2fs work done by Aditya Sarawgi during and after Google Summer
of Code 2009:

- BSDL block and inode allocation policies for ext2fs. This involves the use
  FFS1 style block and inode allocation for ext2fs. Preallocation was removed
  since it was GPL'd.
- Make ext2fs MPSAFE by introducing locks to per-mount datastructures.
- Fixes for kern/122047 PR.
- Various small bugfixes.
- Move out of gnu/ directory.

Sponsored by:   Google Inc.
Submitted by:	Aditya Sarawgi <sarawgi.aditya AT SPAMFREE gmail DOT com>
2010-01-14 14:30:54 +00:00
jh
37c0c9cce1 - Fix some style bugs in tmpfs_mount(). [1]
- Remove a stale comment about tmpfs_mem_info() 'total' argument.

Reported by:	bde [1]
2010-01-13 14:17:21 +00:00
brooks
a880630c51 Update the comment on printing group membership to reflect that fact
that each groupt the process is a member of is printed rather than an
entry for each group the user could be a member of.

MFC after:	3 days
2010-01-09 23:23:52 +00:00
trasz
c819c5cab4 Remove unused smbfs_smb_qpathinfo(). 2010-01-08 15:53:07 +00:00
jh
0f220ff694 - Change the type of size_max to u_quad_t because its value is converted
with vfs_scanopt(9) using the "%qu" format string.
- Limit the maximum value of size_max to (SIZE_MAX - PAGE_SIZE) to
  prevent overflow in howmany() macro.

PR:		kern/141194
Approved by:	trasz (mentor)
MFC after:	2 weeks
2010-01-08 07:57:43 +00:00
rmacklem
b51bb2125d The test for "same client" for the experimental nfs server over NFSv4
was broken w.r.t. byte range lock conflicts when it was the same client
and the request used the open_to_lock_owner4 case, since lckstp->ls_clp
was not set. This patch fixes it by using "clp" instead of "lckstp->ls_clp".

MFC after:	2 weeks
2010-01-03 20:08:10 +00:00
rmacklem
a507f603a7 Fix three related problems in the experimental nfs client when
checking for conflicts w.r.t. byte range locks for NFSv4.
1 - Return 0 instead of EACCES when a conflict is found, for F_GETLK.
2 - Check for "same file" when checking for a conflict.
3 - Don't check for a conflict for the F_UNLCK case.
2010-01-03 18:27:10 +00:00
rmacklem
71291348da Fix the experimental NFS client so that it can create Unix
domain sockets on an NFSv4 mount point. It was generating
incorrect XDR in the request for this case.

Tested by:	infofarmer
MFC after:	2 weeks
2009-12-31 18:02:48 +00:00
rmacklem
155f429095 When porting the experimental nfs subsystem to the FreeBSD8 krpc,
I added 3 functions that were already in the experimental client
under different names. This patch deletes the functions in the
experimental client and renames the calls to use the other set.
(This is just removal of duplicated code and does not fix any bug.)

MFC after:	2 weeks
2009-12-26 19:15:15 +00:00
rmacklem
0ccdae7b7d Modify the experimental server so that it uses VOP_ACCESSX().
This is necessary in order to enable NFSv4 ACL support. The
argument to nfsvno_accchk() was changed to an accmode_t and
the function nfsrv_aclaccess() was no longer needed and,
therefore, deleted.

Reviewed by:	trasz
MFC after:	2 weeks
2009-12-25 20:44:19 +00:00
ed
449c6ac843 Let access overriding to TTYs depend on the cdev_priv, not the vnode.
Basically this commit changes two things, which improves access to TTYs
in exceptional conditions. Basically the problem was that when you ran
jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the
node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if
you want to attach to screens quickly, use ssh(1), etc.

The fixes:

- Cache the cdev_priv of the controlling TTY in struct session. Change
  devfs_access() to compare against the cdev_priv instead of the vnode.
  This allows you to bypass UNIX permissions, even across different
  mounts of devfs.

- Extend devfs_prison_check() to unconditionally expose the device node
  of the controlling TTY, even if normal prison nesting rules normally
  don't allow this. This actually allows you to interact with this
  device node.

To be honest, I'm not really happy with this solution. We now have to
store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp).
In an ideal world, we should just get rid of the latter two and only use
s_ttyp, but this makes certian pieces of code very impractical (e.g.
devfs, kern_exit.c).

Reported by:	Many people
2009-12-19 18:42:12 +00:00
delphij
dd5a134c1b Allow using IPv6 in nfsrvd_sentcache() callback.
PR:		kern/141289
Submitted by:	Petr Lampa <lampa fit vutbr cz>
Approved by:	rmacklem
MFC after:	1 week
2009-12-08 23:43:50 +00:00
guido
8de2ee35f1 Fix ntfs such that it understand media with a non-512-bytes sector size:
1. Fixups are always done on 512 byte chunks (in stead of sectors). This
is kind of stupid.
2. Conevrt between NTFS blocknumbers (the blocksize equals the media
sector size) and the bread() and getblk() blocknr (which are 512-byte
sized)

NB: this change should not affect ntfs for 512-byte sector sizes.
2009-12-07 15:15:08 +00:00
trasz
67e0aa531b Remove unneeded ifdefs.
Reviewed by:	rmacklem
2009-12-03 18:03:42 +00:00
trasz
fe87d5a624 Don't use ap->a_td->td_ucred when we were passed ap->a_cred. 2009-12-02 18:09:22 +00:00
rmacklem
5ca6063e7a Modify the experimental nfs server so that it falls back to
using VOP_LOOKUP() when VFS_VGET() returns EOPNOTSUPP in the
ReaddirPlus RPC. This patch is based upon one by pjd@ for the
regular nfs server which has not yet been committed. It is needed
when a ZFS volume is exported and ReaddirPlus (which almost
always happens for NFSv4) is performed by a client. The patch
also simplifies vnode lock handling somewhat.

MFC after:	2 weeks
2009-11-23 16:08:15 +00:00
rmacklem
483d9d96d0 Patch the experimental NFS server is a manner analagous to
r197525, so that the creation verifier is handled correctly
in va_atime for 64bit architectures. There were two problems.
One was that the code incorrectly assumed that
sizeof (struct timespec) == 8 and the other was that the tv_sec
field needs to be assigned from a signed 32bit integer, so that
sign extension occurs on 64bit architectures. This is required
for correct operation when exporting ZFS volumes.

Reviewed by:	pjd
MFC after:	2 weeks
2009-11-20 21:21:13 +00:00
jh
2727bc045c Create verifier used by FreeBSD NFS client is suboptimal because the
first part of a verifier is set to the first IP address from
V_in_ifaddrhead list. This address is typically the loopback address
making the first part of the verifier practically non-unique. The second
part of the verifier is initialized to zero making its initial value
non-unique too.

This commit changes the strategy for create verifier initialization:
just initialize it to a random value. Also move verifier handling into
its own function and use a mutex to protect the variable.

This change is a candidate for porting to sys/nfsclient.

Reviewed by:	jhb, rmacklem
Approved by:	trasz (mentor)
2009-11-11 15:43:07 +00:00
attilio
e3a5598e1c - Improve comments about locking of the "struct fifoinfo" which is a bit
unclear.
- Fix a memory leak [0]

[0] Diagnosed by:	Dorr H. Clark <dclark at engr dot scu dot edu>
MFC:	1 week
2009-11-06 22:29:46 +00:00
alc
33ab51801e There is no need to "busy" a page when the object is locked for the duration
of the operation.
2009-10-26 18:02:05 +00:00
ru
e866365bb9 Spell DIAGNOSTIC correctly. 2009-10-24 18:49:17 +00:00
jh
7a2f24399a Unloading of the nfscl module is unsupported because newnfslock doesn't
support unloading. It's not trivial to implement newnfslock unloading so
for now just admit that unloading is unsupported and refuse to attempt
unload in all nfscl module event handlers.

Reviewed by:	rmacklem
Approved by:	trasz (mentor)
2009-10-20 15:06:18 +00:00
jh
d857c58c98 Fix ordering of nfscl_modevent() and ncl_uninit(). nfscl_modevent() must
be called after ncl_uninit() when unloading the nfscl module because
ncl_uninit() uses ncl_iod_mutex which is destroyed in nfscl_modevent().

Reviewed by:	rmacklem
Approved by:	trasz (mentor)
2009-10-20 15:01:46 +00:00
jh
daadfe0f12 Fix comment typos.
Reviewed by:	rmacklem
Approved by:	trasz (mentor)
2009-10-20 14:57:26 +00:00
delphij
3af844143a Add locking around access to parent node, and bail out when the parent
node is already freed rather than panicking the system.

PR:		kern/122038
Submitted by:	gk
Tested by:	pho
MFC after:	1 week
2009-10-11 07:03:56 +00:00
delphij
9fa7d8b89b Add a special workaround to handle UIO_NOCOPY case. This fixes data
corruption observed when sendfile() is being used.

PR:		kern/127213
Submitted by:	gk
MFC after:	2 weeks
2009-10-07 23:17:15 +00:00
delphij
df8f450648 Fix a bug that causes the fsx test case of mmap'ed page being out of sync
of read/write, inspired by ZFS's counterpart.

PR:		kern/139312
Submitted by:	gk@
MFC after:	1 week
2009-10-04 10:38:04 +00:00
trasz
d5661d631d Provide default implementation for VOP_ACCESS(9), so that filesystems which
want to provide VOP_ACCESSX(9) don't have to implement both.  Note that
this commit makes implementation of either of these two mandatory.

Reviewed by:	kib
2009-10-01 17:22:03 +00:00
trasz
1a81c5cfff Fix typo in the comment. 2009-09-30 18:50:50 +00:00
kib
08eef7d34c Add per-process osrel node to the procfs, to allow read and set p_osrel
value for the process.

Approved by:	des (procfs maintainer)
MFC after:	3 weeks
2009-09-23 12:08:08 +00:00
rwatson
53eaed07bc Use C99 initialization for struct filterops.
Obtained from:	Mac OS X
Sponsored by:	Apple Inc.
MFC after:	3 weeks
2009-09-12 20:03:45 +00:00
rmacklem
2f0e817202 Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs
vnodes, since these nodes are not linked into the mount queue and,
as such, the vn_lock() cannot cause a deadlock so LORs are harmless.

Suggested by:	kib
Approved by:	kib (mentor)
MFC after:	3 days
2009-09-09 20:37:49 +00:00
phk
5297261651 Revert previous commit and add myself to the list of people who should
know better than to commit with a cat in the area.
2009-09-08 13:19:05 +00:00
phk
2314521104 Add necessary include. 2009-09-08 13:16:55 +00:00
kib
8aa65727e7 If a race is detected, pfs_vncache_alloc() may reclaim a vnode that had
never been inserted into the pfs_vncache list. Since pfs_vncache_free()
does not anticipate this case, it decrements pfs_vncache_entries
unconditionally; if the vnode was not in the list, pfs_vncache_entries
will no longer reflect the actual number of list entries. This may cause
size of the cache to exceed the configured maximum. It may also trigger
a panic during module unload or system shutdown.

Do not decrement pfs_vncache_entries for the vnode that was not in the
list.

Submitted by:	tegge
Reviewed by:	des
MFC after:	1 week
2009-09-07 12:10:41 +00:00
kib
30f476628e insmntque_stddtr() clears vp->v_data and resets vp->v_op to
dead_vnodeops before calling vgone(). Revert r189706 and corresponding
part of the r186560.

Noted and reviewed by:	tegge
Approved by:	des (pseudofs part)
MFC after:	3 days
2009-09-07 11:55:34 +00:00
kib
06073946fa Remove spurious pfs_unlock().
PR:	kern/137310
Reviewed by:	des
MFC after:	3 days
2009-08-31 09:26:04 +00:00
jilles
0b9c3c2f3d Fix poll() on half-closed sockets, while retaining POLLHUP for fifos.
This reverts part of r196460, so that sockets only return POLLHUP if both
directions are closed/error. Fifos get POLLHUP by closing the unused
direction immediately after creating the sockets.

The tools/regression/poll/*poll.c tests now pass except for two other things:
- if POLLHUP is returned, POLLIN is always returned as well instead of only
  when there is data left in the buffer to be read
- fifo old/new reader distinction does not work the way POSIX specs it

Reviewed by:	kib, bde
2009-08-25 21:44:14 +00:00
zec
19bb42c468 Fix NFS panics with options VIMAGE kernels by apropriately setting curvnet
context inside the RPC code.

Temporarily set td's cred to mount's cred before calling socreate() via
__rpc_nconf2socket().

Submitted by:	rmacklem (in part)
Reviewed by:	rmacklem, rwatson
Discussed with:	dfr, bz
Approved by:	re (rwatson), julian (mentor)
MFC after:	3 days
2009-08-24 10:09:30 +00:00
rmacklem
ea9d3c1b1e Apply the same patch as r196205 for nfs_upgrade_lock() and
nfs_downgrade_lock() to the experimental nfs client.

Approved by:	re (kensmith), kib (mentor)
2009-08-17 16:12:28 +00:00
rwatson
fb9ffed650 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
jhb
03d158678f Fix some LORs between vnode locks and filedescriptor table locks.
- Don't grab the filedesc lock just to read fd_cmask.
- Drop vnode locks earlier when mounting the root filesystem and before
  sanitizing stdin/out/err file descriptors during execve().

Submitted by:	kib
Approved by:	re (rwatson)
MFC after:	1 week
2009-07-31 13:40:06 +00:00
rmacklem
1aa3b666bc Fix the experimental nfs client so that it only calls ncl_vinvalbuf()
for NFSv2 and not NFSv4 when nfscl_mustflush() returns 0. Since
nfscl_mustflush() only returns 0 when there is a valid write delegation
issued to the client, it only affects the case of an NFSv4 mount with
callbacks/delegations enabled.

Approved by:	 re (kensmith), kib (mentor)
2009-07-29 14:50:31 +00:00
jhb
44220d7e1e Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
rmacklem
fc8aed6a34 When vfs.newnfs.callback_addr is set to an IPv4 address, the
experimental NFSv4 client might try and use it as an IPv6 address,
breaking callbacks. The fix simply initializes the isinet6 variable
for this case.

Approved by:	re (kensmith), kib (mentor)
2009-07-22 18:10:44 +00:00
rmacklem
d00a1eab14 Add changes to the experimental nfs client to use the PBDRY flag for
msleep(9) when a vnode lock or similar may be held. The changes are
just a clone of the changes applied to the regular nfs client by
r195703.

Approved by:	re (kensmith), kib (mentor)
2009-07-22 14:37:53 +00:00
rmacklem
ce2a12d7ea When using an NFSv4 mount in the experimental nfs client with delegations
being issued from the server, there was a case where an Open issued locally
based on the delegation would be released before the associated vnode
became inactive. If the delegation was recalled after the open was released,
an Open against the server would not have been acquired and subsequent I/O
operations would need to use the special stateid of all zeros. This patch
fixes that case.

Approved by:	re (kensmith), kib (mentor)
2009-07-22 14:32:28 +00:00
rmacklem
dcc7a6b868 Fix two bugs in the experimental nfs client:
- When the root vnode was acquired during mounting, mnt_stat.f_iosize was
  still set to 0, so getnewvnode() would set bo_bsize == 0. This would
  confuse getblk(), so that it always returned the first block causing
  the problem when the root directory of the mount point was greater
  than one block in size. It was fixed by setting mnt_stat.f_iosize to
  NFS_DIRBLKSIZ before calling ncl_nget() to acquire the root vnode.
- NFSMNT_INT was being set temporarily while the initial connect to a
  server was being done. This erroneously configured the krpc for
  interruptible RPCs, which caused problems because signals weren't
  being masked off as they would have been for interruptible mounts.
  This code was deleted to fix the problem. Since mount_nfs does an
  NFS null RPC before the mount system call, connections to the server
  should work ok.

Tested by:	swell dot k at gmail dot com
Approved by:	re (kensmith), kib (mentor)
2009-07-19 16:44:26 +00:00
rmacklem
7d9d8e91f8 Fix the experimental nfs client so that it does not cause a
"share->excl" panic when doing a lookup of dotdot at the root
of a server's file system. The patch avoids calling vn_lock()
for that case, since nfscl_nget() has already acquired a lock
for the vnode.

Approved by:	re (kensmith), kib (mentor)
2009-07-14 23:10:23 +00:00
rwatson
57ca4583e7 Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
rmacklem
a3e6ec19e5 Add calls to the experimental nfs client for the case of an "intr" mount,
so that signals that aren't supposed to terminate RPCs in progress are
masked off during the RPC.

Approved by:	re (kensmith), kib (mentor)
2009-07-12 17:07:35 +00:00
rmacklem
049621beb5 Fix the handling of dotdot in lookup for the experimental nfs client
in a manner analagous to the change in r195294 for the regular nfs client.

Approved by:	re (kensmith), kib (mentor)
2009-07-12 17:02:17 +00:00
rmacklem
224efd8f1b Since the nfscl_getclose() function both decremented open counts and,
optionally, created a separate list of NFSv4 opens to be closed, it
was possible for the associated OpenOwner to be free'd before the Open
was closed. The problem was that the Open was taken off the OpenOwner
list before the Close RPC was done and OpenOwners can be free'd once the
list is empty. This patch separates out the case of doing the Close RPC
into a separate function called nfscl_doclose() and simplifies nfsrpc_doclose()
so that it closes a single open instead of a list of them. This avoids
removing the Open from the OpenOwner list before doing the Close RPC.

Approved by:	re (kensmith), kib (mentor)
2009-07-09 19:00:29 +00:00
kib
6c6bda868d Fix poll(2) and select(2) for named pipes to return "ready for read"
when all writers, observed by reader, exited. Use writer generation
counter for fifo, and store the snapshot of the fifo generation in the
f_seqcount field of struct file, that is otherwise unused for fifos.
Set FreeBSD-undocumented POLLINIGNEOF flag only when file f_seqcount is
equal to fifo' fi_wgen, and revert r89376.

Fix POLLINIGNEOF for sockets and pipes, and return POLLHUP for them.
Note that the patch does not fix not returning POLLHUP for fifos.

PR:	kern/94772
Submitted by:	bde (original version)
Reviewed by:	rwatson, jilles
Approved by:	re (kensmith)
MFC after:	6 weeks (might be)
2009-07-07 09:43:44 +00:00
kib
350f96b4bf In vn_vget_ino() and their inline equivalents, mnt_ref() the mount point
around the sequence that drop vnode lock and then busies the mount point.
Not having vlocked node or direct reference to the mp allows for the
forced unmount to proceed, making mp unmounted or reused.

Tested by:	pho
Reviewed by:	jeff
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-02 18:02:55 +00:00
kib
a7a5954511 Change the type of uio_resid member of struct uio from int to ssize_t.
Note that this does not actually enable full-range i/o requests for
64 architectures, and is done now to update KBI only.

Tested by:	pho
Reviewed by:	jhb, bde (as part of the review of the bigger patch)
2009-06-25 18:46:30 +00:00
rwatson
ea70a3542d Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support.  As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done.  This means that one
class of reader-writer races still exists.

MFC after:      6 weeks
Reviewed by:    bz
2009-06-25 11:52:33 +00:00
kib
fa686c638e Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
kib
2fc79768f3 Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by
vn_open_cred in default implementation. Valid struct ucred is needed for
audit and MAC, and curthread credentials may be wrong.

This further requires modifying the interface of vn_fullpath(9), but it
is out of scope of this change.

Reviewed by:	rwatson
2009-06-21 19:21:01 +00:00
rdivacky
51f7852228 In non-debugging mode make this define (void)0 instead of nothing. This
helps to catch bugs like the below with clang.

	if (cond);		<--- note the trailing ;
	   something();

Approved by:	ed (mentor)
Discussed on:	current@
2009-06-21 08:36:30 +00:00
rmacklem
7dbb188309 Replace RPCAUTH_UNIXGIDS with NFS_MAXGRPS so that nfscbd.c will build.
Approved by:	kib (mentor)
2009-06-20 17:11:07 +00:00
ed
63a4c7f522 Improve nested jail awareness of devfs by handling credentials.
Now that we start to use credentials on character devices more often
(because of MPSAFE TTY), move the prison-checks that are in place in the
TTY code into devfs.

Instead of strictly comparing the prisons, use the more common
prison_check() function to compare credentials. This means that
pseudo-terminals are only visible in devfs by processes within the same
jail and parent jails.

Even though regular users in parent jails can now interact with
pseudo-terminals from child jails, this seems to be the right approach.
These processes are also capable of interacting with the jailed
processes anyway, through signals for example.

Reviewed by:	kib, rwatson (older version)
2009-06-20 14:50:32 +00:00
rmacklem
40e33db11b Change the size of the nfsc_groups[] array in the experimental nfs
client to RPCAUTH_UNIXGIDS + 1 (17), since that is what can go on
the wire for AUTH_SYS authentication.

Reviewed by:	brooks
Approved by:	kib (mentor)
2009-06-20 00:54:57 +00:00
brooks
f53c1c309d Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively.  (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer.  Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively.  Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary.  In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups.  When feasible, truncate
the group list rather than generating an error.

Minor changes:
  - Reduce the number of hand rolled versions of groupmember().
  - Do not assign to both cr_gid and cr_groups[0].
  - Modify ipfw to cache ucreds instead of part of their contents since
    they are immutable once referenced by more than one entity.

Submitted by:	Isilon Systems (initial implementation)
X-MFC after:	never
PR:		bin/113398 kern/133867
2009-06-19 17:10:35 +00:00
alc
f0cb2be073 Fix some of the style errors in *getpages(). 2009-06-18 05:56:24 +00:00
rmacklem
08a5e53744 Add the SVC_RELEASE(xprt), as required by r194407.
Approved by:	kib (mentor)
2009-06-17 22:55:59 +00:00
bz
48dc6805f8 Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.
2009-06-17 15:01:01 +00:00
rmacklem
f0ea4f4c9f Fix handling of ".." in nfs_lookup() for the forced dismount case
by cribbing the change made to the regular nfs client in r194358.

Approved by:	kib (mentor)
2009-06-17 14:10:18 +00:00
bz
b017972f11 Add the explicit include of vimage.h to another five .c files still
missing it.

Remove the "hidden" kernel only include of vimage.h from ip_var.h added
with the very first Vimage commit r181803 to avoid further kernel poisoning.
2009-06-17 12:44:11 +00:00
rmacklem
2afe948156 Remove the "int *" typecast for the aresid argument to vn_rdwr()
and change the type of the argument from size_t to int. This
should avoid issues on 64bit architectures.

Suggested by:	kib
Approved by:	kib (mentor)
2009-06-16 13:52:21 +00:00