Commit Graph

2305 Commits

Author SHA1 Message Date
Konstantin Belousov
3364c323e6 Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
Konstantin Belousov
c808c9632d Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by
vn_open_cred in default implementation. Valid struct ucred is needed for
audit and MAC, and curthread credentials may be wrong.

This further requires modifying the interface of vn_fullpath(9), but it
is out of scope of this change.

Reviewed by:	rwatson
2009-06-21 19:21:01 +00:00
Roman Divacky
23057f089b In non-debugging mode make this define (void)0 instead of nothing. This
helps to catch bugs like the below with clang.

	if (cond);		<--- note the trailing ;
	   something();

Approved by:	ed (mentor)
Discussed on:	current@
2009-06-21 08:36:30 +00:00
Rick Macklem
65cc6600c5 Replace RPCAUTH_UNIXGIDS with NFS_MAXGRPS so that nfscbd.c will build.
Approved by:	kib (mentor)
2009-06-20 17:11:07 +00:00
Ed Schouten
f8f6146082 Improve nested jail awareness of devfs by handling credentials.
Now that we start to use credentials on character devices more often
(because of MPSAFE TTY), move the prison-checks that are in place in the
TTY code into devfs.

Instead of strictly comparing the prisons, use the more common
prison_check() function to compare credentials. This means that
pseudo-terminals are only visible in devfs by processes within the same
jail and parent jails.

Even though regular users in parent jails can now interact with
pseudo-terminals from child jails, this seems to be the right approach.
These processes are also capable of interacting with the jailed
processes anyway, through signals for example.

Reviewed by:	kib, rwatson (older version)
2009-06-20 14:50:32 +00:00
Rick Macklem
2c1e6cce5b Change the size of the nfsc_groups[] array in the experimental nfs
client to RPCAUTH_UNIXGIDS + 1 (17), since that is what can go on
the wire for AUTH_SYS authentication.

Reviewed by:	brooks
Approved by:	kib (mentor)
2009-06-20 00:54:57 +00:00
Brooks Davis
838d985825 Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively.  (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer.  Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively.  Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary.  In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups.  When feasible, truncate
the group list rather than generating an error.

Minor changes:
  - Reduce the number of hand rolled versions of groupmember().
  - Do not assign to both cr_gid and cr_groups[0].
  - Modify ipfw to cache ucreds instead of part of their contents since
    they are immutable once referenced by more than one entity.

Submitted by:	Isilon Systems (initial implementation)
X-MFC after:	never
PR:		bin/113398 kern/133867
2009-06-19 17:10:35 +00:00
Alan Cox
57a7e73261 Fix some of the style errors in *getpages(). 2009-06-18 05:56:24 +00:00
Rick Macklem
76b30a0cd4 Add the SVC_RELEASE(xprt), as required by r194407.
Approved by:	kib (mentor)
2009-06-17 22:55:59 +00:00
Bjoern A. Zeeb
ebd8672cc3 Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.
2009-06-17 15:01:01 +00:00
Rick Macklem
81e3c4fc8e Fix handling of ".." in nfs_lookup() for the forced dismount case
by cribbing the change made to the regular nfs client in r194358.

Approved by:	kib (mentor)
2009-06-17 14:10:18 +00:00
Bjoern A. Zeeb
7654a365db Add the explicit include of vimage.h to another five .c files still
missing it.

Remove the "hidden" kernel only include of vimage.h from ip_var.h added
with the very first Vimage commit r181803 to avoid further kernel poisoning.
2009-06-17 12:44:11 +00:00
Rick Macklem
47b7dc9933 Remove the "int *" typecast for the aresid argument to vn_rdwr()
and change the type of the argument from size_t to int. This
should avoid issues on 64bit architectures.

Suggested by:	kib
Approved by:	kib (mentor)
2009-06-16 13:52:21 +00:00
Alan Cox
47f11d9a46 Eliminate unnecessary variables. 2009-06-13 20:21:08 +00:00
Jamie Gritton
c1f192193d Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by:	bz (mentor)
2009-06-13 15:39:12 +00:00
Jamie Gritton
01de879ac7 Use getcredhostuuid instead of accessing the prison directly.
Approved by:	bz (mentor)
2009-06-13 15:35:22 +00:00
John Baldwin
04f7f4636f Update the inline version of vn_get_ino() for ".." lookups to match the
recentish changes to vn_get_ino().

MFC after:	1 week
2009-06-12 21:19:57 +00:00
Rick Macklem
934a309971 This commit is analagous to r193952, but for the experimental nfs
subsystem. Add a test for VI_DOOMED just after ncl_upgrade_vnlock() in
ncl_bioread_check_cons(). This is required since it is possible
for the vnode to be vgonel()'d while in ncl_upgrade_vnlock() when
a forced dismount is in progress. Also, move the check for VI_DOOMED
in ncl_vinvalbuf() down to after ncl_upgrade_vnlock() and replace the
out of date comment for it.

Approved by:	kib (mentor)
2009-06-10 21:16:39 +00:00
Konstantin Belousov
93bc76dc3e For cd9660_ioctl, check for recycled vnode after locking it.
Noted by:	Jaakko Heinonen <jh saunalahti fi>
MFC after:	2 weeks
2009-06-10 15:48:34 +00:00
Konstantin Belousov
d6da640860 Fix r193923 by noting that type of a_fp is struct file *, not int.
It was assumed that r193923 was trivial change that cannot be done
wrong.

MFC after:	2 weeks
2009-06-10 14:24:31 +00:00
Konstantin Belousov
e4d9bdc105 s/a_fdidx/a_fp/ for VOP_OPEN comments that inline struct vop_open_args
definition.

Discussed with:	bde
MFC after:	2 weeks
2009-06-10 14:09:05 +00:00
Konstantin Belousov
c4702e66f4 Remove unused VOP_IOCTL and VOP_KQFILTER implementations for fifofs.
MFC after:	2 weeks
2009-06-10 14:02:22 +00:00
Konstantin Belousov
c4df27d5c8 VOP_IOCTL takes unlocked vnode as an argument. Due to this, v_data may
be NULL or derefenced memory may become free at arbitrary moment.

Lock the vnode in cd9660, devfs and pseudofs implementation of VOP_IOCTL
to prevent reclaim; check whether the vnode was already reclaimed after
the lock is granted.

Reported by:	georg at dts su
Reviewed by:	des (pseudofs)
MFC after:	2 weeks
2009-06-10 13:57:36 +00:00
Rick Macklem
410654ec1b Since vn_lock() with the LK_RETRY flag never returns an error
for FreeBSD-CURRENT, the code that checked for and returned the
error was broken. Change it to check for VI_DOOMED set after
vn_lock() and return an error for that case. I believe this
should only happen for forced dismounts.

Approved by:	kib (mentor)
2009-06-09 15:18:01 +00:00
Rick Macklem
2fa32154e4 Fix nfscl_getcl() so that it doesn't crash when it is called to
do an NFSv4 Close operation with the cred argument NULL. Also,
clarify what NULL arguments mean in the function's comment.

Approved by:	kib (mentor)
2009-06-08 18:41:23 +00:00
Robert Watson
dde155e95a Use #ifdef APPLE_MAC instead of #ifdef MAC to conditionalize Apple-specific
behavior for unicode support in UDF so as not to conflict with the MAC
Framework.

Note that Apple's XNU kernel also uses #ifdef MAC for the MAC Framework.

Suggested by:	pjd
MFC after:	3 days
2009-06-06 07:13:57 +00:00
Dag-Erling Smørgrav
c097b30885 Drop Giant.
MFC after:	1 week
2009-06-06 00:44:13 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Robert Watson
37ba986a5f Don't check MAC in the NFS server ACL set path, right now we aren't
enforcing MAC for NFS clients.
2009-06-05 14:15:00 +00:00
Robert Watson
927e0a56ce Re-add opt_mac.h include, which is required in order for MNT_MULTILABEL
to be set properly on devfs.  Otherwise, it isn't possible to set labels
on /dev nodes.

Reported by:	Sergio Rodriguez <sergiorr at yahoo.com>
MFC after:	3 days
2009-06-04 10:30:18 +00:00
Alan Cox
1f17689408 nfs_write() can use the recently introduced vfs_bio_set_valid() instead of
vfs_bio_set_validclean(), thereby avoiding the page queues lock.

Garbage collect vfs_bio_set_validclean().  Nothing uses it any longer.
2009-05-31 20:18:02 +00:00
Konstantin Belousov
b00098d164 Unlock the pseudofs vnode before calling fill method for pfs_readlink().
The fill code may need to lock another vnode, e.g. procfs file
implementation.

Reviewed by:	des
Tested by:	pho
MFC after:	2 weeks
2009-05-31 15:01:50 +00:00
Konstantin Belousov
b0f34bb643 Implement the bypass routine for VOP_VPTOCNP in nullfs.
Among other things, this makes procfs <pid>/file working for executables
started from nullfs mount.

Tested by:	pho
PR:	94269, 104938
2009-05-31 14:58:43 +00:00
Konstantin Belousov
b9131889d2 Do not drop vnode interlock in null_checkvp(). null_lock() verifies that
v_data is not-null before calling NULLVPTOLOWERVP(), and dropping the
interlock allows for reclaim to clean v_data and free the memory.

While there, remove unneeded semicolons and convert the infinite loops
to panics. I have a will to remove null_checkvp() altogether, or leave
it as a trivial stub, but not now.

Reported and tested by:	pho
2009-05-31 14:54:20 +00:00
Konstantin Belousov
cec9ed6d7f Lock the real null vnode lock before substitution of vp->v_vnlock.
This should not really matter for correctness, since vp->v_lock is
not locked before the call, and null_lock() holds the interlock,
but makes the control flow for reclaim more clear.

Tested by:	pho
2009-05-31 14:52:45 +00:00
Marko Zec
705fe7ce35 Unbreak options VIMAGE kernel builds.
Approved by:	julian (mentor)
2009-05-31 11:57:51 +00:00
Rick Macklem
d00a615a98 Add a check to v_type == VREG for the recently modified code that
does NFSv4 Closes in the experimental client's VOP_INACTIVE().
I also replaced a bunch of ap->a_vp with a local copy of vp,
because I thought that made it more readable.

Approved by:	kib (mentor)
2009-05-30 22:11:12 +00:00
Edward Tomasz Napierala
c97fcdba57 Add VOP_ACCESSX, which can be used to query for newly added V*
permissions, such as VWRITE_ACL.  For a filsystems that don't
implement it, there is a default implementation, which works
as a wrapper around VOP_ACCESS.

Reviewed by:	rwatson@
2009-05-30 13:59:05 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Alan Cox
3933ec4d15 Make *getpages()s' assertion on the state of each page's dirty bits
stricter.
2009-05-28 18:11:09 +00:00
Dag-Erling Smørgrav
26088b9d4d Use a temporary variable to avoid a duplicate strlen().
Submitted by:	kib
MFC after:	1 week
2009-05-28 10:24:26 +00:00
Rick Macklem
fdd88deeba Fix handling of NFSv4 Close operations in ncl_inactive(). Only
do them for NFSv4 and flush writes to the server before doing
the Close(s), as required. Also, use the a_td argument instead of
curthread.

Approved by:	kib (mentor)
2009-05-27 19:41:29 +00:00
Alan Cox
e2d0be0172 Eliminate redundant setting of a page's valid bits and pointless clearing
of the same page's dirty bits.
2009-05-27 18:12:10 +00:00
Rick Macklem
ee1c1433e3 Add a function to the experimental nfs subsystem that tests to see
if a local file system supports NFSv4 ACLs. This allows the
NFSHASNFS4ACL() macro to be correctly implemented. The NFSv4 ACL
support should now work when the server exports a ZFS volume.

Approved by:	kib (mentor)
2009-05-27 15:16:56 +00:00
Jamie Gritton
0304c73163 Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails.  Child jails may be restricted more than their parents,
but never less.  Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system.  Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings.  The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by:	bz (mentor)
2009-05-27 14:11:23 +00:00
Rick Macklem
c3e22f831f Fix the experimental nfs subsystem so that it builds with the
current NFSv4 ACLs, as defined in sys/acl.h. It still needs a
way to test a mount point for NFSv4 ACL support before it will
work. Until then, the NFSHASNFS4ACL() macro just always returns 0.

Approved by:	kib (mentor)
2009-05-26 22:21:53 +00:00
Edward Tomasz Napierala
ff392a1d87 Adapt to the new ACL #define names.
Reviewed by:	rmacklem@
2009-05-26 17:01:00 +00:00
Rick Macklem
10a5a16efa Add two sysctl variables to the experimental nfs server, so
that the range of versions of NFS handled by the server can
be limited. The nfsd daemon must be restarted after these
sysctl variables are changed, in order for the change to take
effect.

Approved by:	kib (mentor)
2009-05-26 01:47:37 +00:00
Rick Macklem
9183a2a33e Fix the handling of NFSv4 Illegal Operation number to conform
to RFC3530 (the operation number in the reply must be set to
the value for OP_ILLEGAL). Also cleaned up some indentation.

Approved by:	kib (mentor)
2009-05-26 01:16:09 +00:00
Rick Macklem
415670e4c2 Fix the experimental nfs server's interface to the new krpc so
that it handles the case of a non-exported NFSv4 root correctly.
Also, delete handling for the case where nd_repstat is already
set in nfs_proc(), since that no longer happens.

Approved by:	kib (mentor)
2009-05-26 01:09:33 +00:00