Commit Graph

2296 Commits

Author SHA1 Message Date
Alan Cox
4221e284a3 The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS.  These hacks have caused no
end of trouble, especially when combined with mmap().  I've removed
them.  Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write.  NFS does, however,
optimize piecemeal appends to files.  For most common file operations,
you will not notice the difference.  The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations.  NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write.  There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault.  This
is not correct operation.  The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid.  A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid.  This operation is
necessary to properly support mmap().  The zeroing occurs most often
when dealing with file-EOF situations.  Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten.  B_CACHE operation is now
formally defined in comments and more straightforward in
implementation.  B_CACHE for VMIO buffers is based on the validity of
the backing store.  B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa).  biodone() is now responsible for setting B_CACHE
when a successful read completes.  B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated.  VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE.  This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount.  These have been fixed.  getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made.  A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain.  The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-02 23:57:16 +00:00
Mark Murray
a8af2bd86b This routine was "use"ing File::Basename. This commit removes that
"use" and replaces it with equivalent inline code. The reason is that
Perl has some very nasty circular dependancies, and I am trying to
get the System Perl upgraded by one maintenance level.

The basic rule, until I can find a way to solve this, is that
the build tools MAY NOT use any library code; it must all be inline.
1999-05-02 08:55:27 +00:00
Mike Smith
4a034f21cd Add a hook that can be called to initialise a slave processor's memory
range attributes after they have been extracted from the master.

Hook up the i686 MP code to do this for each AP.

Be more careful about printing the default memory type for the i686.

Suggestions from: luoqi
1999-04-30 22:09:45 +00:00
Poul-Henning Kamp
07901f227b Add beer-ware license and $Id$
Noticed by:	dillon
1999-04-30 06:51:51 +00:00
Poul-Henning Kamp
430210c00b Make BOOTP to work again.
Submitted by:	dillon
Reviewed by:	phk
1999-04-30 06:30:15 +00:00
Dmitrij Tejblum
188554bba1 Set curproc at the end of proc0_init().
This patch also moves the bogus comment (the comment is still not quite
right) and (as a side effect) removes some verbose initialisations (we
depend on static initialisation to 0 for almost everything in proc0).

The alpha kernels are bootable again. The change  won't affect i386's
until machdep.c is changed.

Submitted by:	bde
1999-04-29 22:51:59 +00:00
Alan Cox
0043b4376a Address a performance problem in getnewbuf:
In heavy-writing situations, QUEUE_LRU can contain a large number
	of DELWRI buffers at its head.  These buffers must be moved
	to the tail if they cannot be written async in order to reduce
	the scanning time required to skip past these buffers in later
	getnewbuf() calls.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-04-29 18:15:25 +00:00
Poul-Henning Kamp
75c1354190 This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing.  The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact:  "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

   I have no scripts for setting up a jail, don't ask me for them.

   The IP number should be an alias on one of the interfaces.

   mount a /proc in each jail, it will make ps more useable.

   /proc/<pid>/status tells the hostname of the prison for
   jailed processes.

   Quotas are only sensible if you have a mountpoint per prison.

   There are no privisions for stopping resource-hogging.

   Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by:   http://www.rndassociates.com/
Run for almost a year by:       http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
Poul-Henning Kamp
02daf150a4 Add the jail system call. 1999-04-28 11:28:49 +00:00
Dmitrij Tejblum
604359cf9b s/static foo_devsw_installed = 0;/static int foo_devsw_installed;/.
(Edited automatically)
1999-04-28 10:54:24 +00:00
Luoqi Chen
5206bca10a Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
  is added to point to per-cpu pages, per-cpu global variables are now
  accessed through this new selector (%fs). The selectors in gdt table are
  rearranged for cache line optimization.
- fask_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by:	Alan Cox	<alc@cs.rice.edu>
		John Dyson	<dyson@iquest.net>
		Julian Elischer	<julian@whistel.com>
		Bruce Evans	<bde@zeta.org.au>
		David Greenman	<dg@root.com>
1999-04-28 01:04:33 +00:00
Poul-Henning Kamp
1c308b817a Change suser_xxx() to suser() where it applies. 1999-04-27 12:21:16 +00:00
Poul-Henning Kamp
f711d546d2 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
Peter Wemm
b6ad3506f3 Register the local (unix domain) sockets ourselves. 1999-04-26 08:56:53 +00:00
Peter Wemm
5b23857d22 Redo domain registration to use SYSINITS rather than linker sets.
Get rid of the spl wrapper kludge, it doesn't seem to be needed between
init calls since all that's running is the domain/protocol timers and they
are safe since domain list modifications are splnet() protected (which
blocks the timers)
1999-04-26 08:56:09 +00:00
Peter Wemm
fc51d58e62 Fix a very long standing bug in run_interrupt_driven_config_hooks(). It
was fetching the next pointer from memory that could have been free()'d.
1999-04-25 22:13:34 +00:00
Poul-Henning Kamp
0bb2226a4d Make the machdep.i8254_freq and machdep.tsc_freq sysctls modify the
timecounter as well

Asked for by:	bde, jhay
1999-04-25 09:00:00 +00:00
Dmitrij Tejblum
ba41a07d04 Fixed printf format errors on alpha. 1999-04-24 18:50:48 +00:00
Andrey A. Chernov
02a3d5261d Lite2 bugfixes merge:
so_linger is in seconds, not in 1/HZ
range checking in SO_*TIMEO was wrong

PR: 11252
1999-04-24 18:22:34 +00:00
Poul-Henning Kamp
22f054e258 Fix a braino in the v_id wraparound code. Give more (current) details
in comment.

PR:		11307
Spotted by:	Ville-Pertti Keinonen <will@iki.fi>
1999-04-24 17:58:14 +00:00
Dmitrij Tejblum
0dd9741eb4 Use pointer arithmetic to do pointer arithmetic. 1999-04-24 11:25:01 +00:00
SADA Kenji
565592bd9c The function msgrcv() could copy larger data than it should do
under some circumstances.
PR:		kern/10765
Submitted by:	Yasuhito FUTATSUKI <futatuki@fureai.or.jp>
1999-04-21 13:30:01 +00:00
Peter Wemm
54a8c69347 Stage 1 of a cleanup of the i386 interrupt registration mechanism.
Interrupts under the new scheme are managed by the i386 nexus with the
awareness of the resource manager.  There is further room for optimizing
the interfaces still.  All the users of register_intr()/intr_create()
should be gone, with the exception of pcic and i386/isa/clock.c.
1999-04-21 07:26:30 +00:00
Alan Cox
f78fd73fa6 Address several problems in vn_read and vn_write:
1. Make read-ahead work for pread and aio_read.

2. Fix one place where a comparison of uio_offset with -1
   wasn't updated to use FOF_OFFSET.

3. Honor O_APPEND in the FOF_OFFSET case.

In addition, use the variable name "ioflag" in both vn_read and
vn_write to avoid possible confusion between the variable "flag"
and the parameter "flags".

Submitted by:	Bruce Evans <bde@zeta.org.au> and me
1999-04-21 05:56:45 +00:00
Dag-Erling Smørgrav
5f967b24fc Make the location of init(8) tunable at boot time. 1999-04-20 21:15:13 +00:00
Peter Wemm
4d823d728f GC some stray debugging printf()s... 1999-04-19 19:39:08 +00:00
Peter Wemm
d95939af7a Zap LKM option and support. Farewell old friend. 1999-04-19 14:19:52 +00:00
Peter Wemm
db42d90829 unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.
1999-04-19 14:14:14 +00:00
Peter Wemm
2072df97aa GC some unused code. 1999-04-17 09:12:35 +00:00
Peter Wemm
e91896117b Well folks, this is it - The second stage of the removal for build support
for LKM's..
1999-04-17 08:36:07 +00:00
Peter Wemm
6182fdbda8 Bring the 'new-bus' to the i386. This extensively changes the way the
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatability
shims' to enable a smoother transition.  eisa, isapnp and pccard* are
not yet using the new resource manager.  Once fully converted, all drivers
will be loadable, including PCI and ISA.

(Some other changes appear to have snuck in, including a port of Soren's
 ATA driver to the Alpha.  Soren, back this out if you need to.)

This is a checkpoint of work-in-progress, but is quite functional.

The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.

Approved by:	core
1999-04-16 21:22:55 +00:00
Dmitrij Tejblum
35871a15c5 getnewbuf(): check return value from tsleep(). Interruptible NFS may pass
PCATCH to slpflag.
1999-04-14 18:51:52 +00:00
Tor Egge
87c737bc83 Backout early start of APs since it caused some machines to hang. 1999-04-13 03:24:47 +00:00
Eivind Eklund
e9e9477aac More consistent with surrounding style. (Hey - it looked great in the
diff...)

Prodded by:	bde
1999-04-12 14:34:52 +00:00
Dag-Erling Smørgrav
eca2ddda6f Typo in comment. 1999-04-12 10:07:15 +00:00
Eivind Eklund
2a96b3faf9 Staticize. 1999-04-11 02:27:06 +00:00
Eivind Eklund
632a035f84 Staticize. 1999-04-11 02:17:47 +00:00
Tor Egge
44c57e7121 Add prototype for wait_ap(). 1999-04-11 00:43:43 +00:00
Tor Egge
90c26b0d2d Let BSP wait until all APs are initialized. 1999-04-10 22:58:29 +00:00
Dag-Erling Smørgrav
5a00f36414 Allow setting MAXFILES in the kernel config. 1999-04-09 16:28:11 +00:00
Nick Sayer
c0bd94a75d More secure clock management. Allow positive steps only once per second
for as much as one second, but no more. Allows a miscreant to
double-time march the clock, but no worse.

XXX Unlike putting negative deltas in a while(1), performing small
positive steps inside of a while(1) will return EPERM for the
unpermitted ones. Repeated negative deltas are clamped without
error (but the kernel does log a notice).
1999-04-07 19:48:09 +00:00
Matt Jacob
3f92429a24 Fix last delta so file would compile again- I think I got it
right. Add a clarifying (to me at least) comment. Some formatting
fixes.
1999-04-07 17:32:21 +00:00
Peter Wemm
bfda1e3ff7 Disable the mtrr copy calls, it doesn't work with the i686_mem.c stuff.
This should make it compile/link again.
1999-04-07 17:08:40 +00:00
Nick Sayer
fcae3aa61f If securelevel>1, allow the clock to be adjusted negatively only up to
1 second prior to the highest the clock has run so far. This allows
time adjusters like xntpd to do their work, but the worst a miscreant
can do is "freeze" the clock, not go back in time.

We still need to decide on an algorithm to clamp positive adjustments.
As it stands, it is possible to achieve arbitrary negative adjustments
by "wrapping" time around.

PR:		10361
1999-04-07 16:36:56 +00:00
Alan Cox
b2e2337ba1 Fix a performance problem with the new getnewbuf() code: in an outofspace
condition ( bufspace > hibufspace ), an inappropriate scan of the empty
queue was performed looking for buffer space to free up.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-04-07 02:41:54 +00:00
Peter Wemm
57dc594832 Use the reference counted PHOLD()/PRELE() rather than P_PHYSIO. 1999-04-06 03:04:47 +00:00
Peter Wemm
af8ad83e5c Use the reference-counted PHOLD()/PRELE() rather than P_NOSWAP. 1999-04-06 03:03:34 +00:00
Peter Wemm
88b4f4ee55 LK_RETRY is a vn_lock() flag, not one for lockmgr(). 1999-04-06 03:02:11 +00:00
Julian Elischer
8d17e69460 Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
    vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
    ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>
1999-04-05 19:38:30 +00:00
Dmitrij Tejblum
5cc4ab5323 Regenerate (padding for pread and pwrite). 1999-04-04 21:43:36 +00:00