Commit Graph

936 Commits

phk
96e683d754 Fix floppy drives on machines with lots of RAM.
The fix works by reverting the ordering of free memory so that the
chances of contig_malloc() succeeding increase.

PR:		23291
Submitted by:	Andrew Atrens <atrens@nortel.ca>
2000-12-18 20:12:13 +00:00
tanimura
a8dbeb3f7b - If swap metadata does not fit into the KVM, reduce the number of
struct swblock entries by dividing the number of entries by 2
until the swap metadata fits.

- Reject swapon(2) upon failure of swap_zone allocation.

This is just a temporary fix. Better solutions include:
(suggested by:	dillon)

o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.

Reviewed by:	alfred, dillon
2000-12-13 10:01:00 +00:00
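
A minimal sketch of the halving strategy described in the entry above, in plain C. The struct fields, the KVM budget parameter, and the function name are illustrative assumptions, not the committed swap-pager code:

	#include <stddef.h>

	struct swblock {                /* placeholder fields, for illustration only */
		struct swblock *swb_next;
		int             swb_count;
	};

	/*
	 * Keep halving the number of struct swblock entries until the swap
	 * metadata fits in the available KVM; a result of 0 means even a
	 * minimal table will not fit, so swapon(2) should be rejected.
	 */
	static size_t
	fit_swap_metadata(size_t nentries, size_t kvm_budget)
	{
		while (nentries > 0 &&
		    nentries * sizeof(struct swblock) > kvm_budget)
			nentries /= 2;
		return (nentries);
	}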
jake
a4ad237eaa - Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr.  Also provides macros for the flags
  passed to specify shared, exclusive, or release, which map to the
  lockmgr flags.  This is so that the use of lockmgr can be easily
  replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
2000-12-13 00:17:05 +00:00
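
A sketch of the kind of wrapper macro the entry above describes; the AP_* flag names and the exact mapping onto lockmgr(9) calls are assumptions for illustration, not the committed definitions:

	/* Hypothetical wrapper so lockmgr can later be replaced by an rwlock. */
	#define	AP_SHARED	0x01
	#define	AP_EXCLUSIVE	0x02
	#define	AP_RELEASE	0x04

	#define	ALLPROC_LOCK(how) do {						\
		if ((how) == AP_SHARED)						\
			lockmgr(&allproc_lock, LK_SHARED, NULL, curproc);	\
		else if ((how) == AP_EXCLUSIVE)					\
			lockmgr(&allproc_lock, LK_EXCLUSIVE, NULL, curproc);	\
		else								\
			lockmgr(&allproc_lock, LK_RELEASE, NULL, curproc);	\
	} while (0)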
dillon
7317f38702 Be less conservative with a recently added KASSERT. Certain edge
cases with file fragments and read-write mmaps can lead to a situation
    where a VM page has odd dirty bits, e.g. 0xFC - due to being dirtied by
    an mmap and only the fragment (representing a non-page-aligned end of
    file) synced via a filesystem buffer.  A correct solution that
    guarantees consistent m->dirty for the file EOF case is being
    worked on.  In the meantime we can't be so conservative in the
    KASSERT.
2000-12-11 07:52:47 +00:00
dwmalone
dd75d1d73b Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
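
The conversion pattern referenced above, shown with a hypothetical driver allocation; M_ZERO asks the kernel malloc(9) for zeroed memory, so the separate bzero() disappears:

	/* Before: allocate, then clear by hand. */
	sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK);
	bzero(sc, sizeof(*sc));

	/* After: let malloc(9) zero the allocation itself. */
	sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK | M_ZERO);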
alfred
012cf93dca Really fix phys_pager:
Backout the previous delta (rev 1.4), it didn't make any difference.

If the requested handle is NULL then don't add it to the list of
objects, to be found by handle.

The problem is that when asking for a NULL handle you are implying
you want a new object.  Because objects with NULL handles were
being added to the list, any further requests for phys backed
objects with NULL handles would return a reference to the initial
NULL handle object after finding it on the list.

Basically one couldn't have more than one phys backed object without
a handle in the entire system without this fix.  If you did more
than one shared memory allocation using the phys pager it would
give you your initial allocation again.
2000-12-06 21:52:23 +00:00
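
A condensed sketch of the guard described above: only objects created with a non-NULL handle are entered on the lookup list, so every NULL-handle request yields a fresh object. The list and helper names follow the VM pager code, but the function itself is an illustration, not the committed phys_pager_alloc():

	static vm_object_t
	phys_alloc_sketch(void *handle, vm_ooffset_t size)
	{
		vm_object_t object;

		if (handle == NULL) {
			/* Anonymous request: never list it, always a new object. */
			return (vm_object_allocate(OBJT_PHYS, OFF_TO_IDX(size)));
		}
		/* Named request: share an existing object or create and list one. */
		object = vm_pager_object_lookup(&phys_pager_object_list, handle);
		if (object == NULL) {
			object = vm_object_allocate(OBJT_PHYS, OFF_TO_IDX(size));
			TAILQ_INSERT_TAIL(&phys_pager_object_list, object,
			    pager_object_list);
		}
		return (object);
	}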
alfred
f9b418f187 Adjust the allocation size to properly deal with non-PAGE_SIZE
allocations, specifically allocations < PAGE_SIZE, which the code
previously did not handle properly.
2000-12-05 22:22:24 +00:00
bde
2f9fb1a43d Backed out previous commit. Don't depend on namespace pollution in
<sys/buf.h>.
2000-12-02 12:03:58 +00:00
jhb
967c93e448 Protect p_stat with sched_lock. 2000-12-02 06:09:44 +00:00
jhb
b8b5877ce7 Protect p_stat with sched_lock. 2000-12-02 03:29:33 +00:00
alfred
663be26bfa remove unneeded sys/ucred.h includes 2000-11-30 18:52:32 +00:00
jake
0c0be4e826 Protect the following with a lockmgr lock:
allproc
	zombproc
	pidhashtbl
	proc.p_list
	proc.p_hash
	nextpid

Reviewed by:	jhb
Obtained from:	BSD/OS and netbsd
2000-11-22 07:42:04 +00:00
rwatson
4a8be80602 o Export dmmax ("Maximum size of a swap block") using SYSCTL_INT.
This removes a reason that systat requires setgid kmem.  More to
  come.
2000-11-20 00:39:04 +00:00
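
For reference, exporting an existing kernel integer read-only through sysctl takes a single macro; the parent node and variable shown below are illustrative:

	static int dmmax;	/* set when swap is configured */

	SYSCTL_INT(_vm, OID_AUTO, dmmax, CTLFLAG_RD, &dmmax, 0,
	    "Maximum size of a swap block");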
dillon
2ace352085 Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory
    situations prior to now.

    The new code is based on the concept that I/O must be able to function in
    a low memory situation.  All major modules related to I/O (except
    networking) have been adjusted to allow allocation out of the system
    reserve memory pool.  These modules now detect a low memory situation but
    rather than block they instead continue to operate, then return resources
    to the memory pool instead of caching them or leaving them wired.

    Code has been added to stall in a low-memory situation prior to a vnode
    being locked.

    Thus situations where a process blocks in a low-memory condition while
    holding a locked vnode have been reduced to near nothing.  Not only will
    I/O continue to operate, but many prior deadlock conditions simply no
    longer exist.

Implement a number of VFS/BIO fixes

	(found by Ian): in biodone(), bogus-page replacement code, the loop
        was not properly incrementing loop variables prior to a continue
        statement.  We do not believe this code can be hit anyway but we
        aren't taking any chances.  We'll turn the whole section into a
        panic (as it already is in brelse()) after the release is rolled.

	In biodone(), the foff calculation was incorrectly
        clamped to the iosize, causing the wrong foff to be calculated
        for pages in the case of an I/O error or biodone() called without
        initiating I/O.  The problem always caused a panic before.  Now it
        doesn't.  The problem is mainly an issue with NFS.

	Fixed casts for ~PAGE_MASK.  This code worked properly before only
        because the calculations use signed arithmetic.  Better to properly
        extend PAGE_MASK first before inverting it for the 64-bit masking
        op.

	In brelse(), the bogus_page fixup code was improperly throwing
        away the original contents of 'm' when it did the j-loop to
        fix the bogus pages.  The result was that it would potentially
        invalidate parts of the *WRONG* page(!), leading to corruption.

	There may still be cases where a background bitmap write is
        being duplicated, causing potential corruption.  We have identified
        a potentially serious bug related to this but the fix is still TBD.
        So instead this patch contains a KASSERT to detect the problem
  	and panic the machine rather than continue to corrupt the filesystem.
	The problem does not occur very often; it is very hard to
	reproduce, and it may or may not be the cause of the corruption
	people have reported.

Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
2000-11-18 23:06:26 +00:00
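
The ~PAGE_MASK cast item in the list above boils down to the following pattern; foffset stands in for a 64-bit file offset, and the point is to widen PAGE_MASK before inverting it rather than relying on sign extension of the inverted int:

	/* Works only because the inverted int mask happens to sign-extend. */
	off_t aligned_bad  = foffset & ~PAGE_MASK;

	/* Extend PAGE_MASK to the full width first, then invert it. */
	off_t aligned_good = foffset & ~(off_t)PAGE_MASK;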
dillon
59e131028f Add the splvm()'s suggested in PR 20609 to protect vm_pager_page_unswapped().
The remainder of the PR is still open.

PR: kern/20609 (partial fix)
2000-11-18 21:11:23 +00:00
dillon
15a44d16ca This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
tegge
5262a5cdb7 Clear the MAP_ENTRY_USER_WIRED flag from cloned vm_map entries.
PR:		2840
2000-11-02 21:38:18 +00:00
phk
f82e4ca62c Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing
the offending inline function (BUF_KERNPROC) on it being #included
already.

I'm not sure BUF_KERNPROC() is even the right thing to do or in the
right place or implemented the right way (inline vs normal function).

Remove consequently unneeded #includes of <sys/proc.h>
2000-10-29 14:54:55 +00:00
jhb
9ae17765f4 - Catch a machine/mutex.h -> sys/mutex.h conversion I somehow missed.
- Close a small race condition.  The sched_lock mutex protects
  p->p_stat as well as the run queues.  Another CPU could change p_stat
  of the process while we are waiting for the lock, and we would end up
  scheduling a process that isn't runnable.
2000-10-25 00:04:16 +00:00
ps
c71ac689e0 Implement write combining for crashdumps. This is useful when
write caching is disabled on both SCSI and IDE disks where large
memory dumps could take up to an hour to complete.

Taking an i386 scsi based system with 512MB of ram and timing (in
seconds) how long it took to complete a dump, the following results
were obtained:

Before:				After:
	WCE           TIME		WCE           TIME
	------------------		------------------
	1	141.820972		1	 15.600111
	0	797.265072		0	 65.480465

Obtained from:	Yahoo!
Reviewed by:	peter
2000-10-17 10:05:49 +00:00
dillon
b9b696f1ab The swap bitmap allocator was not calculating the bitmap size properly
in the face of non-stripe-aligned swap areas.  The bug could cause a
      panic during boot.

      Refuse to configure a swap area that is too large (67 GB or so)

      Properly document the power-of-2 requirement for SWB_NPAGES.

      The patch is slightly different than the one Tor enclosed in the PR,
      but accomplishes the same thing.

PR: kern/20273
Submitted by: Tor.Egge@fast.no
2000-10-13 16:44:34 +00:00
jasone
db944ecf00 For lockmgr mutex protection, use an array of mutexes that are allocated
and initialized during boot.  This avoids bloating sizeof(struct lock).
As a side effect, it is no longer necessary to enforce the assumption that
lockinit()/lockdestroy() calls are paired, so the LK_VALID flag has been
removed.

Idea taken from:	BSD/OS.
2000-10-12 22:37:28 +00:00
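
A sketch of the shared-pool idea described above: rather than embedding a mutex in each struct lock, hash the lock's address into a small array of mutexes set up at boot. The pool size and hash below are illustrative assumptions:

	#define	LOCK_MTX_POOLSIZE	32	/* illustrative */

	static struct mtx lock_mtx_pool[LOCK_MTX_POOLSIZE];

	/* Pick an interlock for a lockmgr lock without growing struct lock. */
	static struct mtx *
	lock_mtx_for(struct lock *lkp)
	{
		return (&lock_mtx_pool[((uintptr_t)lkp >> 8) % LOCK_MTX_POOLSIZE]);
	}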
dwmalone
16be903415 If a process is over its resource limit for datasize, still allow
it to lower its memory usage. This was mentioned on the mailing
lists ages ago, and I've lost the name of the person who brought
it up.

Reviewed by:	alc
2000-10-06 13:03:50 +00:00
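
The rule above reduces to a simple check: shrinking is always permitted, and only growth is measured against the datasize rlimit. A standalone sketch with illustrative names:

	#include <stddef.h>

	/* May the data segment be resized from oldsize to newsize? */
	static int
	datasize_change_allowed(size_t oldsize, size_t newsize, size_t rlim_data)
	{
		if (newsize <= oldsize)
			return (1);		/* shrinking: always allowed */
		return (newsize <= rlim_data);	/* growing: enforce the limit */
	}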
jasone
4e290e67b7 Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
jhb
6aa22e7189 - Add a new process flag P_NOLOAD that marks a process that should be
ignored during load average calculations.
- Set this flag for the idle processes and the softinterrupt process.
2000-09-15 22:00:23 +00:00
bp
a7bc78c86d Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT.
They will be used by nullfs and other stacked filesystems to support full
cache coherency.

Reviewed in general by:	mckusick, dillon
2000-09-12 09:49:08 +00:00
jasone
769e0f974d Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
obrien
dba23c4faf Make the arguments match the functionality of the functions. 2000-08-26 04:51:39 +00:00
peter
441c016268 Minor cleanups:
- remove unused variables (fix warnings)
 - use a more consistent ANSI style rather than a mixture
 - remove dead #if 0 code and declarations
2000-07-28 22:03:08 +00:00
mckusick
b86877bef0 Clean up the snapshot code so that it no longer depends on the use of
the SF_IMMUTABLE flag to prevent writing. Instead put in explicit
checking for the SF_SNAPSHOT flag in the appropriate places. With
this change, it is now possible to rename and link to snapshot files.
It is also possible to set or clear any of the owner, group, or
other read bits on the file, though none of the write or execute
bits can be set. There is also an explicit test to prevent the
setting or clearing of the SF_SNAPSHOT flag via chflags() or
fchflags(). Note also that the modify time cannot be changed as
it needs to accurately reflect the time that the snapshot was taken.

Submitted by:	Robert Watson <rwatson@FreeBSD.org>
2000-07-26 23:07:01 +00:00
mckusick
a3d0c189ea Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then syncs the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot, so for brevity and timeliness
they are not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
alfred
4036af0a9c #elsif -> #elif
Noticed by: green
2000-07-11 09:41:29 +00:00
jhb
b6e74b58eb Support for unsigned integer and long sysctl variables. Update the
SYSCTL_LONG macro to be consistent with other integer sysctl variables
and require an initial value instead of assuming 0.  Update several
sysctl variables to use the unsigned types.

PR:		15251
Submitted by:	Kelly Yancey <kbyanc@posi.net>
2000-07-05 07:46:41 +00:00
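
Illustrative declarations in the updated style: unsigned variables get the new unsigned macros, and SYSCTL_LONG now takes an explicit initial value like the other integer macros (the variables shown are hypothetical):

	static unsigned int	example_count;
	static long		example_bytes;

	SYSCTL_UINT(_kern, OID_AUTO, example_count, CTLFLAG_RW,
	    &example_count, 0, "An unsigned counter (illustrative)");
	SYSCTL_LONG(_kern, OID_AUTO, example_bytes, CTLFLAG_RW,
	    &example_bytes, 0, "A long value; note the explicit initial value");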
phk
e5de271d47 Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by:	bde
2000-07-04 11:25:35 +00:00
jhb
183fce400a Replace the PQ_*CACHE options with a single PQ_CACHESIZE option that you
set equal to the number of kilobytes in your cache.  The old options are
still supported for backwards compatibility.

Submitted by:	Kelly Yancey <kbyanc@posi.net>
2000-07-04 08:55:18 +00:00
mckusick
2f0e9591fa Simplify and rationalise the management of the vnode free list
(preparing the code to add snapshots).
2000-07-04 04:32:40 +00:00
phk
61ff05be25 Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our
sources:

        -sysctl_vm_zone SYSCTL_HANDLER_ARGS
        +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
2000-07-03 09:35:31 +00:00
markm
53a44ce9d3 Nifty idea from Jeroen van Gelderen; don't call a routine to check if
we are using the /dev/zero device, just check a flag (supplied by
/dev/zero).
Reviewed by:	dfr
2000-06-25 09:44:32 +00:00
hsu
ce8639fd44 Add missing increment of allocation counter. 2000-06-05 06:34:41 +00:00
dillon
82627e96a0 This is a cleanup patch to Peter's new OBJT_PHYS VM object type
and sysv shared memory support for it.  It implements a new
    PG_UNMANAGED flag that has slightly different characteristics
    from PG_FICTITIOUS.

    A new sysctl, kern.ipc.shm_use_phys has been added to enable the
    use of physically-backed sysv shared memory rather than swap-backed.
    Physically backed shm segments are not tracked with PV entries,
    allowing programs which use a large shm segment as a rendezvous
    point to operate without eating an insane amount of KVM in the
    PV entry management.  Read: Oracle.

    Peter's OBJT_PHYS object will also allow us to eventually implement
    page-table sharing and/or 4MB physical page support for such segments.
    We're half way there.
2000-05-29 22:40:54 +00:00
dfr
14048face6 Brucify the pmap_enter_temporary() changes. 2000-05-29 19:21:01 +00:00
dillon
7832cdab03 Fix bug in vm_pageout_page_stats() that always resulted in a full
scan of the active queue.  This fix is not expected to have any
    noticeable impact on performance.

Noticed by: Rik van Riel <riel@conectiva.com.br>
2000-05-29 02:31:55 +00:00
dfr
3d3263476e Add a new pmap entry point, pmap_enter_temporary() to be used during
dumps to create temporary page mappings. This replaces the use of CADDR1
which is fairly x86 specific.

Reviewed by: dillon
2000-05-28 15:49:55 +00:00
jake
961b97d434 Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
jake
d93fbc9916 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
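
For context, conventional queue(3) declarations pass a bare tag and the macros assume it names a struct; the change above (backed out in the entry preceding it) made the caller spell out the full type. An illustrative before/after:

	#include <sys/queue.h>

	struct foo {
		LIST_ENTRY(foo)	f_link;		/* conventional: bare tag, struct assumed */
		int		f_value;
	};
	LIST_HEAD(foo_list, foo) foo_head = LIST_HEAD_INITIALIZER(foo_head);

	/*
	 * Under the (since backed out) change, the type argument would have
	 * been written in full instead, e.g.:
	 *
	 *	LIST_ENTRY(struct foo)		f_link;
	 *	LIST_HEAD(foo_list, struct foo)	foo_head;
	 */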
peter
807a551902 Checkpoint of a new physical memory backed object type, that does not
have pv_entries.  This is intended for very special circumstances,
e.g. a certain database that has a 1GB shm segment mapped into 300
processes.  That would consume 2GB of kvm just to hold the pv_entries
alone.  This would not be used on systems unless the physical RAM was

This is a work-in-progress, but is a useful and functional checkpoint.
Matt has got some more fixes for it that will be committed soon.

Reviewed by:	dillon
2000-05-21 13:41:29 +00:00
peter
ee5cd6988f Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that.  In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
its shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a separate thing and build it into a machine
dependent part of vm_page_t.  This eliminates having a separate set of
structures that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system.  (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc).  Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries.  This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist.  This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by:	dillon
2000-05-21 12:50:18 +00:00
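
An illustration of the calling-convention shift described above, using pmap_enter() as the example; the argument lists follow the interface of that era and are shown for flavor, not as the exact committed diff:

	/* Before (illustrative): callers looked up and passed a physical address. */
	pmap_enter(pmap, va, VM_PAGE_TO_PHYS(m), VM_PROT_READ | VM_PROT_WRITE, FALSE);

	/* After (illustrative): the vm_page_t is handed to the pmap layer directly. */
	pmap_enter(pmap, va, m, VM_PROT_READ | VM_PROT_WRITE, FALSE);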
dillon
5303cc9523 Fixed bug in madvise() / MADV_WILLNEED. When the request is offset
from the base of the first map_entry the call to pmap_object_init_pt()
    uses the wrong start VA.  MFC to follow.

PR: i386/18095
2000-05-14 18:46:40 +00:00
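
For reference, the userland call whose kernel-side handling the fix above corrects: madvise(2) with MADV_WILLNEED asks for a range to be faulted in ahead of use, and the bug concerned requests whose start lies at an offset into a mapping. A small illustrative wrapper:

	#include <sys/mman.h>
	#include <stddef.h>

	/*
	 * Hint that a range starting partway into a mapping will be needed
	 * soon.  'base' is assumed to be page aligned (as returned by mmap)
	 * and offset/length to be multiples of the page size.
	 */
	int
	prefault_hint(void *base, size_t offset, size_t length)
	{
		return (madvise((char *)base + offset, length, MADV_WILLNEED));
	}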
phk
36c3965ff9 Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bde's teachings on the
subject of nested includes.

Disk drivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.h> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
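
The include rule implied above, sketched for a hypothetical driver source file: <sys/bio.h> must now be included explicitly before <sys/buf.h>, since the latter requires it but does not nest it:

	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/bio.h>	/* struct bio; prerequisite for <sys/buf.h> */
	#include <sys/buf.h>	/* only if the driver also needs buffer-cache data */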
phk
c7feb17572 Convert the vm_pager_strategy() interface to take a struct bio instead of
a struct buf.  Don't try to examine B_ASYNC; it is a layering violation
to do so.  The only current user of this interface is vn(4) which, since
it emulates a disk interface, operates on struct bio already.
2000-05-03 07:47:46 +00:00