Make the alias list a SLIST.
Drop the "fast recycling" optimization of vnodes (including
the returning of a prexisting but stale vnode from checkalias).
It doesn't buy us anything now that we don't hardlimit
vnodes anymore.
Rename checkalias2() and checkalias() to addalias() and
addaliasu() - which takes dev_t and udev_t arg respectively.
Make the revoke syscalls use vcount() instead of VALIASED.
Remove VALIASED flag, we don't need it now and it is faster
to traverse the much shorter lists than to maintain the
flag.
vfs_mountedon() can check the dev_t directly, all the vnodes
point to the same one.
Print the devicename in specfs/vprint().
Remove a couple of stale LFS vnode flags.
Remove unimplemented/unused LK_DRAINED;
Diskslice/label code not yet handled.
Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)
Add the correct hook for devfs to kern_conf.c
The net result of this excercise is that a lot less files depends on DEVFS,
and devtoname() gets more sensible output in many cases.
A few drivers had minor additional cleanups performed relating to cdevsw
registration.
A few drivers don't register a cdevsw{} anymore, but only use make_dev().
Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it throughout.
please see comment in sys/conf.h about the flag argument.
Remove strategy argument from all the diskslice/label/bad144
implementations, it should be found from the dev_t.
Remove bogus and unused strategy1 routines.
Remove open/close arguments from dssize(). Pick them up from dev_t.
Remove unused and unfinished setgeom support from diskslice/label/bad144 code.
operations. This allows a device driver better insight into
what is going on that the current:
proc1: open /dev/foo R/O
devsw->open( R/O, proc1 )
proc2: open /dev/foo R/W
devsw->open( R/W, proc2 )
proc2: close
/* nothing, but device is
really only R/O open */
proc1: close
devsw->close( R/O, proc1 )
- %q -> %ll; don't assume that the promotion of off_t is quad_t; only
assume that off_t's are representable as long longs.
- printing of dev_t's was completely broken.
Fixed nearby printf format errors not reported by gcc -Wformat on i386's:
- printing of ino_t's and pointers was sloppy.
Translated from: a similar fix in ufs_readwrite.c rev.1.61.
Don't forget to set DE_ACCESS for short reads.
Check for invalid (negative) offsets before checking for reads of
0 bytes, as in ufs, although checking for invalid offsets at all
is probably a bug.
vnodes referencing this device.
Details:
cdevsw->d_parms has been removed, the specinfo is available
now (== dev_t) and the driver should modify it directly
when applicable, and the only driver doing so, does so:
vn.c. I am not sure the logic in checking for "<" was right
before, and it looks even less so now.
An intial pool of 50 struct specinfo are depleted during
early boot, after that malloc had better work. It is
likely that fewer than 50 would do.
Hashing is done from udev_t to dev_t with a prime number
remainder hash, experiments show no better hash available
for decent cost (MD5 is only marginally better) The prime
number used should not be close to a power of two, we use
83 for now.
Add new checkalias2() to get around the loss of info from
dev2udev() in bdevvp();
The aliased vnodes are hung on a list straight of the dev_t,
and speclisth[SPECSZ] is unused. The sharing of struct
specinfo means that the v_specnext moves into the vnode
which grows by 4 bytes.
Don't use a VBLK dev_t which doesn't make sense in MFS, now
we hang a dummy cdevsw on B/Cmaj 253 so that things look sane.
Storage overhead from all of this is O(50k).
Bump __FreeBSD_version to 400009
The next step will add the stuff needed so device-drivers can start to
hang things from struct specinfo
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.
cdevsw_add() will print an message if the d_maj field looks bogus.
Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.
Move bdevsw() and devsw() functions to kern/kern_conf.c
Bump __FreeBSD_version to 400006
This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions
if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.
I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.
Reformat and initialize correctly all "struct cdevsw".
Initialize the d_maj and d_bmaj fields.
The d_reset field was not removed, although it is never used.
I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.
Vinum and i4b not modified, patches emailed to respective authors.
udev_t in the kernel but still called dev_t in userland.
Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()
For now they're functions, they will become in-line functions
after one of the next two steps in this process.
Return major/minor/makedev to macro-hood for userland.
Register a name in cdevsw[] for the "filedescriptor" driver.
In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.
In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).
A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.
Without DEVT_FASCIST I belive this patch is a no-op.
Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.
Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
Made a new (inline) function devsw(dev_t dev) and substituted it.
Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)
DEVFS will eventually benefit from this change too.
Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline)
function.
Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
to the order of the cmaj/bmaj arguments!)
Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
(ditto!)
(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1:
s/suser/suser_xxx/
2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with
later.
There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
before mounting (should help to do not mount extended partitions:-).
Fixed problem with hanging while unmounting busy fs.
And (the most important) added some locks to prevent
simulaneous access to kernel structures!
cannot yet be closed, though.
I hope I got all credits right, and that the multiple submitted by lines
do not break anyone's scripts...
PR: kern/5038, kern/5567
Submitted by: Keith Jang <keith@email.gcn.net.tw>
Submitted by: Joachim Kuebart <joki@kuebart.stuttgart.netsurf.de>
Submitted by: Byung Yang <byung@wam.umd.edu>
Submitted by: Motomichi Matsuzaki <mzaki@e-mail.ne.jp>
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
ufs/ufs/ufs_readwrite.c kern/vfs_bio.c
Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>
This can have bad security implications, but the impact on FreeBSD
systems is minimal because this fs isn't in the default kernels and it
is unknown if it even works.
Submitted by: Manuel Bouyer <bouyer@antioche.eu.org> and
Artur Grabowski <art@stacken.kth.se>
Add d_parms() to {c,b}devsw[]. If non-NULL this function points to
a device routine that will properly fill in the specinfo structure.
vfs_subr.c's checkalias() supplies appropriate defaults. This change
should be fully backwards compatible with existing devices.
is the preparation step for moving pmap storage out of vmspace proper.
Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillion <dillon@apollo.backplane.com>
attempt to optimize forks but were essentially given-up on due to
problems and replaced with an explicit dup of the vm_map_entry structure.
Prior to the removal, they were entirely unused.
c_caddr_t with extreme prejudice. Here we want to convert from
`const char *' to `const char *'. Casting through c_caddr_t is
not the way to do this. The original cast to caddr_t was apparently
to break warnings about const mismatches in other versions of BSD
(in 4.4BSDLite2, the conversion is from `const char *path' to
plain caddr_t).
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
lock, and add some macros and function parameters to make sure that
the information get to the point where it can be put in the lock
structure.
While I'm here, add DEBUG_VFS_LOCKS to LINT.
is enough to satisfy things like StarOffice. This is a hack, but doing
it properly would be a LOT of work, and would require extensive grovelling
around in the user address space to find the argv[].
Obtained from: Mostly from Andrzej Bialecki <abial@nask.pl>.
Noticed by: Carl Mascott <cmascott@world.std.com>
Don't write access time of a file more than once per day. (Its precision is
1 day anyway). Don't try to write access and creation time in nonwin95 case.
Suggested by: bde (long time ago).
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>
client programs are allowed to finish up (coda_call is
forced to complete) and release their locks. Thus there
is a reasonable chance that the vflush implicit in the
unmount will not get hung on held locks.
references to them.
The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.
device drivers about sectors no longer in use.
Device-drivers receive the call through d_strategy, if they have
D_CANFREE in d_flags.
This allows flash based devices to erase the sectors and avoid
pointlessly carrying them around in compactions.
Reviewed by: Kirk Mckusick, bde
Sponsored by: M-Systems (www.m-sys.com)
respectively. Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
bits. We used a private, wrong, version of `struct dirent' to help
break getdirentries(), and we use a silly check that the size of this
struct is a power of 2 to help break mount() if getdirentries() would
not work. This fix just changes the struct to match `struct dirent'
(except for the name length).
There is only cdevsw (which should be renamed in a later edit to deventry
or something). cdevsw contains the union of what were in both bdevsw an
cdevsw entries. The bdevsw[] table stiff exists and is a second pointer
to the cdevsw entry of the device. it's major is in d_bmaj rather than
d_maj. some cleanup still to happen (e.g. dsopen now gets two pointers
to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).
rawread()/rawwrite() went away as part of this though it's not strictly
the same patch, just that it involves all the same lines in the drivers.
cdroms no longer have write() entries (they did have rawwrite (?)).
tapes no longer have support for bdev operations.
Reviewed by: Eivind Eklund and Mike Smith
Changes suggested by eivind.
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
emulators. The emulators assume that filesystem may just ignore cookies, and
handle this case correctly. So we just ignore cookies.
Also sync *_readdir "prototypes" with reality.
Check args using the same expression as in fdesc and kernfs. The check
was actually already correct, modulo overflow. It could be tightened
up to either allow huge (aligned) offsets, treating them as EOF, or
disallow all offsets beyond EOF.
Didn't fix invalid address calculation &foo[i] where i may be out of
bounds.
Didn't fix shooting of foot using a private unportable dirent struct.
and missing arg checking.
Panic instead of returning bogus error codes or forgetting to check
all cases if fdesc_readdir() gets called for a non-directory. This
can't happen.