directory. vrele() may lock the passed vnode, which in these cases would
give an invalid lock order of child -> parent. These situations are
deadlock prone although do not typically deadlock because the vrele
is typically not releasing the last reference to the vnode. Users of
vrele must consider it as a call to vn_lock() and order it appropriately.
MFC After: 1 week
Sponsored by: Isilon Systems, Inc.
Tested by: kkenn
kern_prot.c. This API handles reference counting among many other things.
Notably, if MAC is compiled into the kernel, it will properly initialize the
MAC labels when the ucred is allocated.
This work is in preparation for a new MAC entry point which will be responsible
for properly initializing policy specific labels for the NFS server credential.
Utilization of the crfree/crget APIs reduce the complexity associated with
this label's management.
Submitted by: green (with changes) [1]
Obtained from: TrustedBSD Project
Discussed with: rwatson, alfred
[1] I moved the ucred allocation outside the scope of the NFS server lock to
prevent M_WAIKOK allocations from occurring with non-sleep-able locks held.
Additionally, to reduce complexity, the ucred persist as long as the NFS
server descriptor.
to see if additional write requests will arrive that can be coalesced and
clustered with earlier ones. When doing so, it must determine whether
the two requests are made by credentials with the same access writes, so
as not to coalesce improperly. NFSW_SAMECRED() implements a test of two
credentials using a binary compare.
Replace NFSW_SAMECRED() macro with nfsrv_samecred() function, which is
aware of the contents and layout of a struct ucred, rather than a simple
binary compare. While the binary compare works when ucred is simply a
zero'd and embedded 'struct ucred' in the NFS descriptor, it will work
less well when the ucred associated with an NFS descriptor is "real", so
has defined and populated reference count, mutex, etc.
MFC after: 1 week
Obtained from: TrustedBSD Project
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:
The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.
Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.
If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to the
cached mount credential, or a credential cached along with
any delayed write data.
Discussed with: rwatson
be made holding the NFS server mutex. To clean this up, introduce a
version of the function, nfsrv_access_withgiant(), that expects the
NFS server mutex to already have been dropped and Giant acquired.
Wrap nfsrv_access() around this. This permits callers to more
efficiently check access if they're in a code block performing VFS
operations, and can be substitited for the nfsrv_access() call that
triggered this bug.
PR: 73807, 73208
MFC after: 1 week
Initialize b_bufobj for all buffers.
Make incore() and gbincore() take a bufobj instead of a vnode.
Make inmem() local to vfs_bio.c
Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),
Make buf_vlist_add() take a bufobj instead of a vnode.
Eliminate other uses of bp->b_vp where bp->b_bufobj will do.
Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
failure in the NFS server would result in a leaked instance of the NFS
server subsystem lock. Liberally sprinkle assertions in all target
labels for error unwinding to assert the desired locking state.
RELENG_5_3 candidate.
MFC after: 3 days
Reported by: Wilkinson, Alex <alex dot wilkinson at dsto dot defence dot gov dot au>
prevent leakage of Giant. With INVARIANTS, this results in an
assertion failure following execution of the RPC. Without INVARIANTS,
it could result in problems if the NFS server is killed causing nfsd
to return to user space holding Giant.
Feet provided by: brueffer
The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
a vnode. Not bumped into with asserts in the main tree because we
run the NFS server with Giant by default. Discovered by inspection.
Complete annotations of Giant acquisition/release to note that it's
only because of VFS that we acquire Giant in most places in the NFS
server.
subsystem lock to avoid tripping over an assertion regarding whether
the lock is held or not. This is likely to be the cause of a panic
tripped over by Andrea Campi.
mechanism so that early processing on mbufs can be performed before
a context switch to the NFS server threads. Because of this, if
the socket code is running without Giant, the NFS server also needs
to be able to run the upcall code without relying on the presence on
Giant. This change modifies the NFS server to run using a "giant
code lock" covering operation of the whole subsystem. Work is in
progress to move to data-based locking as part of the NFSv4 server
changes.
Introduce an NFS server subsystem lock, 'nfsd_mtx', and a set of
macros to operate on the lock:
NFSD_LOCK_ASSERT() Assert nfsd_mtx owned by current thread
NFSD_UNLOCK_ASSERT() Assert nfsd_mtx not owned by current thread
NFSD_LOCK_DONTCARE() Advisory: this function doesn't care
NFSD_LOCK() Lock nfsd_mtx
NFSD_UNLOCK() Unlock nfsd_mtx
Constify a number of global variables/structures in the NFS server
code, as they are not modified and contain constants only:
nfsrvv2_procid nfsrv_nfsv3_procid nonidempotent
nfsv2_repstat nfsv2_type nfsrv_nfsv3_procid
nfsrvv2_procid nfsrv_v2errmap nfsv3err_null
nfsv3err_getattr nfsv3err_setattr nfsv3err_lookup
nfsv3err_access nfsv3err_readlink nfsv3err_read
nfsv3err_write nfsv3err_create nfsv3err_mkdir
nfsv3err_symlink nfsv3err_mknod nfsv3err_remove
nfsv3err_rmdir nfsv3err_rename nfsv3err_link
nfsv3err_readdir nfsv3err_readdirplus nfsv3err_fsstat
nfsv3err_fsinfo nfsv3err_pathconf nfsv3err_commit
nfsrv_v3errmap
There are additional structures that should be constified but due
to their being passed into general purpose functions without const
arguments, I have not yet converted.
In general, acquire nfsd_mtx when accessing any of the global NFS
structures, including struct nfssvc_sock, struct nfsd, struct
nfsrv_descript.
Release nfsd_mtx whenever calling into VFS, and acquire Giant for
calls into VFS. Giant is not required for any part of the
operation of the NFS server with the exception of calls into VFS.
Giant will never by acquired in the upcall code path. However, it
may operate entirely covered by Giant, or not. If debug.mpsafenet
is set to 0, the system calls will acquire Giant across all
operations, and the upcall will assert Giant. As such, by default,
this enables locking and allows us to test assertions, but should not
cause any substantial new amount of code to be run without Giant.
Bugs should manifest in the form of lock assertion failures for now.
This approach is similar (but not identical) to modifications to the
BSD/OS NFS server code snapshot provided by BSDi as part of their
SMPng snapshot. The strategy is almost the same (single lock over
the NFS server), but differs in the following ways:
- Our NFS client and server code bases don't overlap, which means
both fewer bugs and easier locking (thanks Peter!). Also means
NFSD_*() as opposed to NFS_*().
- We make broad use of assertions, whereas the BSD/OS code does not.
- Made slightly different choices about how to handle macros building
packets but operating with side effects.
- We acquire Giant only when entering VFS from the NFS server daemon
threads.
- Serious bugs in BSD/OS implementation corrected -- the snapshot we
received was clearly a work in progress.
Based on ideas from: BSDi SMPng Snapshot
Reviewed by: rick@snowhite.cis.uoguelph.ca
Extensive testing by: kris
FreeBSD, we can have a negative available space value, but the
corresponding fields in the NFS protocol are unsigned. So
trnucate the value to 0 if it's negative, so that the client
doesn't receive absurdly high values.
Tested by: cognet
short read operations at the end of a file to not have the "eof"
flag set as they should. The problem is that the requested read
count was compared against the rounded-up reply data length instead
of the actual reply data length. This bug appears to have been
introduced in revision 1.78 (June 1999). It causes first-time reads
of certain file sizes (e.g 4094 bytes) to fail with EIO on a RedHat
9.0 NFSv3 client.
MFC after: 1 week
when serving up more than about 32 active files. For details see
section 6.3 (pg 111) of Daniel Ellard and Margo Seltzer, ``NFS
Tricks and Benchmarking Traps'' in the Proceedings of the Usenix
2003 Freenix Track, June 9-14, 2003 pg 101-114.
Obtained from: Daniel Ellard <ellard@eecs.harvard.edu>
Sponsored by: DARPA & NAI Labs.
is not pretty, but it fixes the code so that it no longer violates the
vnode locking rules in the VFS API and doesn't trip any of the locking
assertions enabled by the DEBUG_VFS_LOCKS kernel configuration option.
There is one report that this patch fixed a "locking against myself"
panic on an NFS server that was tripped by a diskless client.
Approved by: re (scottl)
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
whether its caller has locked the vm_object. (This is a temporary
measure to bootstrap vm_object locking.)
- Remove the buftimelock mutex and acquire the buf's interlock to protect
these fields instead.
- Hold the vnode interlock while locking bufs on the clean/dirty queues.
This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
BUF_LOCK with a LK_TIMEFAIL to a single lock.
Reviewed by: arch, mckusick
32k read and write operations on datagram sockets when in fact we
reject requests larger than 16k. It must be the case that virtually
all clients use data sizes of 16k or less for UDP transport (FreeBSD's
client defaults to 8k and never exceeds 16k), as this bug has been
present ever since NFSv3 support was added.
Reported by: Senthil <lihtnes78@netscape.net>
Reviewed by: dillon
Approved by: re
MFC-after: 1 week
nfsm_srvpathsiz. This macro plucks a length out of an rpc request and
verifies that its size does not exceed NFS_MAXPATHLEN. If it does
it generates an ENAMETOOLONG response.
- Use this macro, and the existing nfsm_srvnamsiz macro in two places
where we deal with paths passed in by the client.
This fixes a linux interoperability bug. Linux was sending oversized path
components which would cause us to ignore the request all together. This
causes linux to hang indefinitly while it waits for a response. This
could still happen in other cases where we error out with EBADRPC.
Sponsored by: Isilon Systems, Inc.
Reviewed by: alfred, fabbri@isilon.com, neal@isilon.com
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.
Idea stolen from: BSD/OS
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on. Extensive testing has appeared to have shown no
increase in overhead.
Disadvantages
Dirties more cache lines during lookups.
Not as fast as a hash table lookup (but still N log N and optimal
when there is locality of reference).
Advantages
vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
syncer operate more efficiently.
I get to rip out all the old hacks (some of which were mine) that tried
to keep the v_dirtyblkhd tailq sorted.
The per-vnode splay tree should be easier to lock / SMPng pushdown on
vnodes will be easier.
This commit along with another that Alan is working on for the VM page
global hash table will allow me to implement ranged fsync(), optimize
server-side nfs commit rpcs, and implement partial syncs by the
filesystem syncer (aka filesystem syncer would detect that someone is
trying to get the vnode lock, remembers its place, and skip to the
next vnode).
Note that the buffer cache splay is somewhat more complex then other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.
Suggested by: alc
nfsrv_readdir and nfsrv_readdirplus can return. A client request
containing an over-large `count' field could trigger the "Bad nfs
svc reply" panic in nfs_syscalls.c.
Spotted while trying to reproduce kern/37304, which turned out to
be fixed in FreeBSD a long time ago.
MFC after: 1 week
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
server side. This can lead to a system deadlock.
Reviewed by: iedowse
Tested by: Alexey G Misurenko <mag@caravan.ru>, iedowse
Bug found with help by: Alexey G Misurenko <mag@caravan.ru>
MFC at: earliest convenience
that va_mode of the supplied attributes is filled in with a valid
file mode (i.e not VNOVAL, and only ALLPERM bits set). However,
some NFS server op functions didn't guarantee this for all possible
request messages:
If a V3 client chose not include to a mode specification, we could
end up creating an ffs inode with mode 0177777, requiring a manual
fsck on the next reboot. Fix this by setting va_mode to 0 before
calling the VOP if a mode hasn't been supplied by the client.
In nfsrv_symlink(), S_IFMT bits supplied by a V2 client could end
up in the va_mode passed to VOP_SYMLINK with similar effects. We
now use the macro nfstov_mode() to correctly mask the bits.
semantics of the nfsm_reply() macro were changed so that the caller
has to explicitly handle the V2 error case, whereas before,
nfsm_reply() did a `goto nfsmout' then. A few server ops (setattr,
readlink, create, mkdir) weren't updated to match, so errors in the
V2 case could cause protocol hangs and leaked mbufs.
Correct some comments that describe the old nfsm_reply behaviour.
[older, harmless nit] Remove the unnecessary `nfsmreply0' label in
nfsrv_create(), since for its users, the main `ereply' label does
the same thing.
temporary storage. In the old NFS code it wasn't at all clear if
the value of `tl' was used across or after macro calls, but I'm
fairly confident that the convention was to keep its use local.
Each ex-macro function now uses a local version of this variable,
so all of the double-indirection goes away.
The only exception to the `local use' rule for `tl' is nfsm_clget(),
which is left unchanged by this commit.
Reviewed by: peter
out nd.ni_vp to prevent the resource cleanup code at the end of
nfsrv_symlink from trying to vrele it. This fixes a "vrele: negative
ref cnt" panic that can occur when a symlink is attempted on an NFS
filesystem with no free space. Found locally, but the symptoms
correspond to those in the PR referenced below.
PR: kern/26878
MFC after: 3 days
to do it explicitly in nfsrv_noop so that the reply gets sent back
to the client. This fixes the generation of a selection of RPC
error replies (RPC_PROGMISMATCH, RPC_PROGUNAVAIL, RPC_PROCUNAVAIL
etc.) that are used by some clients to detect support for optional
protocols and features.
Reviewed by: peter
Reported by: Thomas Quinot <quinot@inf.enst.fr>
PR: kern/31479
next to equivalent m_len adjustments. Move the nfsm_subs.h macros
into groups depending on which phase they are used in, since that
affects the error recovery requirements. Collect some of the common error
checking into a single macro as preparation for unwinding some more.
Have nfs_rephead return a value instead of secretly modifying args.
Remove some unused function arguments that were being passed around.
Clarify nfsm_reply()'s error handling (I hope).
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
of returning an error code to the caller, NFS server op routines
must themselves build an error reply and return 0 to the caller.
This is achieved by replacing the erroneous return statements with
code that jumps forward to the op function's reply code. We need
to be careful to ensure that the 'struct mount' pointer is NULL
though, so that the final vn_finished_write() call becomes a no-op.
Reviewed by: mckusick, dillon
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.
Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
<sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)
This patch is machine generated except for the ccd.c and buf.h parts.
generate the NFSv3 Version id. boottime itself may change, sometimes
once every tick if you are running xntpd, which really throws off
clients. Clients will tend to throw away what they believe to be
stale data too often, and can get into long loops rewriting the same
data over and over again because they believe the server has rebooted
over and over again due to the changing version id.
Approved by: jkh
cannot unilaterally pass data to a client it can reduce the physical
disk transaction overhead by reading larger blocks. This results in
better pipelining of requests/responses over the network and an almost
100% increase in cpu efficiency on the server. On a 100BaseTX network
NFS read performance increases from 8.5 MBytes/sec to 10 MB/sec (maxed
out), and cpu efficiency increases from 72% idle to 80% idle on the server.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
NFS packets, mainly initializing structure pointers to NULL which
are conditionally freed prior to return.
PR: kern/15249
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.
error for a directory. I have made this change after a great deal of
review although I cannot be absolutely sure that this meets the spec.
The issue devolves into whether changes in an underlying (UFS) directory
can cause NFS directory blocks to be renumbered. My read of the code
indicates that NFS directory blocks will not be renumbered, which means
that the cookies should still remain valid after a change is made to
the underlying directory. This being the case, a cookie error should
not be returned when a change is made to the underlying directory and,
instead, the NFS client should rely on mtime detection to invalidate and
reload the directory.
The use of mtime is problematic in of itself, due to insufficient
resolution, which is why I believe the original conservative error
handling was done. Still, there have been dozens of bug reports by
people needing solaris<->FreeBSD interoperability and these have to
be accomodated.
previously issued synchronously even if async daemons (nfsiod's) were
available. The commit has been moved from the strategy code to the doio
code in order to asynchronize it.
Removed use of lastr in preparation for removal of vnode->v_lastr. It
has been replaced with seqcount, which is already supported by the system
and, in fact, gives us a better heuristic for sequential detection then
lastr ever did.
Made major performance improvements to the server side commit. The
server previously fsync'd the entire file for each commit rpc. The
server now bawrite()s only those buffers related to the offset/size
specified in the commit rpc.
Note that we do not commit the meta-data yet. This works still needs
to be done.
Note that a further optimization can be done (and has not yet been done)
on the client: we can merge multiple potential commit rpc's into a
single rpc with a greater file offset/size range and greatly reduce
rpc traffic.
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
being incremented by 4 bytes too few each time through the loop, which
allows more data into the mbuf chain that we really want. In the worst
case, when we're using 32K read/write sizes with a TCP client, this causes
readdirplus replies to sometimes exceed NFS_MAXPACKET which leads to a
panic. This problem cropped up for me using an IRIX 6.5.4 NFSv3 TCP client
with 32K read/write sizes, however supposedly it can be triggered by
WinNT NFS servers too. In theory, it can probably be triggered by any
NFS v3 implementation using TCP as long as it's using the maxiumum block
size.
Reviewed by: Matthew Dillon <dillon@backplane.com>
txdr_hyper and fxdr_hyper tweaks to avoid excessive CPU order knowledge.
nfs_serv.c: don't call nfsm_adj() with negative values, windows clients
could crash servers when doing a readdir of a large directory.
nfs_socket.c: Use IP_PORTRANGE to get a priviliged port without a spin
loop trying to bind(). Don't clobber a mbuf pointer or we get panics
on a NFS3ERR_JUKEBOX error from a server when reusing a freed mbuf.
nfs_subs.c: Don't loose st_blocks on NFSv2 mounts when > 2GB.
Obtained from: OpenBSD
udev_t in the kernel but still called dev_t in userland.
Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()
For now they're functions, they will become in-line functions
after one of the next two steps in this process.
Return major/minor/makedev to macro-hood for userland.
Register a name in cdevsw[] for the "filedescriptor" driver.
In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.
In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).
A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.
Without DEVT_FASCIST I belive this patch is a no-op.
Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.
Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1:
s/suser/suser_xxx/
2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with
later.
There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
This makes it possible to change the sysctl tree at runtime.
* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.
Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
sure that this is necessary to be a sync write here since a VOP_FSYNC()
follows and it will schedule, sort and complete the writes that the
vm_object_page_clean() started (as I think I understand things).
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days
time.