Commit Graph

1697 Commits

Author SHA1 Message Date
dyson
aad85d5a04 In kern_physio.c fix tsleep priority messup.
In vfs_bio.c, remove b_generation count usage,
	remove redundant reassignbuf,
	remove redundant spl(s),
	manage page PG_ZERO flags more correctly,
	utilize in invalid value for b_offset until it
		is properly initialized.  Add asserts
		for #ifdef DIAGNOSTIC, when b_offset is
		improperly used.
	when a process is not performing I/O, and just waiting
		on a buffer generally, make the sleep priority
		low.
	only check page validity in getblk for B_VMIO buffers.

In vfs_cluster, add b_offset asserts, correct pointer calculation
	for clustered reads.  Improve readability of certain parts of
	the code.  Remove redundant spl(s).

In vfs_subr, correct usage of vfs_bio_awrite (From Andrew Gallatin
	<gallatin@cs.duke.edu>).  More vtruncbuf problems fixed.
1998-03-19 22:48:16 +00:00
dyson
dc6d7f19f8 Fix an embarassing problem in vtruncbuf. 1998-03-19 18:46:58 +00:00
dyson
9470d0a235 Correct a problem where data OR metadata could be thrown away if a
buffer is grown.
1998-03-17 17:36:05 +00:00
kato
7fa5494b44 Deleted PC-98 code because (1) machine dependent code should not be in
here, and (2) the flag used in PC-98 code has been assigned to another
purpose.
1998-03-17 08:41:28 +00:00
dyson
ca4225334c Correct a severely evil bug in the vtruncbuf code. It didn't cause
me any problems until after the previous commit.  This problem then
caused a severe case of creeping crud on my diskdrive, and hosed
my system so bad, that I needed to do a complete reinstall.  Sorry!!!

I assume that others have manifest this bug.
1998-03-17 06:30:52 +00:00
julian
54a5f1c1b0 Remove a soft-update hook that was accidentally added to the READ path.
also add some comments, and a couple of very minor cosmetic changes.
1998-03-16 18:39:41 +00:00
phk
edaad77cba A bunch of BNN (Bruce Normal Nits) from bde:
Bring back the softclock inlining
	save a couple of <<32's
	many white-space shuffles.
1998-03-16 10:19:12 +00:00
dyson
9bd499d7fa Allow vfs_ioopt to be enabled with a (temporary) config option. 1998-03-16 02:13:03 +00:00
dyson
6e92f5716b Some VM improvements, including elimination of alot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliabiliy
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, support page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
dyson
7c9ff06841 Disable the vfs.ioopt option for now, so that we don't get gratuitious
bugreports.  I might not be able to fix the problems before 3.0, due
to other, more important things.
1998-03-14 19:50:36 +00:00
tegge
fca0f92630 Don't misuse vnode interlocks in routines that can be called from interrupts.
PR:		5893
1998-03-14 02:55:01 +00:00
dufault
06743e32eb idprio processes must be preempted as soon as anything is runnable. 1998-03-11 20:50:42 +00:00
msmith
46304bfe5a If the root mount fails from a device that is not the compatability slice
of a disk, because that slice does not exist, try again mounting from the
compatability slice.

This handles the case where a disk has been initialised by 'disklabel
auto', which places a bogus and invalid slice entry on the disk.
The bootstrap is not smart enough to reject this slice, and pretends to
boot from it.  Believing the the bootstrap at this point is unwise.

Booting from non-'wd' disks thus prepared is still broken, as
'disklabel -rwB xdN auto' does not initialise the disk type field, and
the bootstrap mistakenly claims that the disk is handled by 'wd'.

Behaviour is now consistent with DEVFS expected characteristics.
1998-03-11 00:10:31 +00:00
jb
a069106ffc Add statements to generate a sys/syscall.mk file for inclusion
during the libc/libc_r to automatically pick up syscall names on
the assumption that default asm code needs to generated for them.

In the up-coming changes to the libc makefiles, there is the option
to provide a machine dependent asm source file which will turn off
the automatic generation of the default. There is also an option
to just stop code being generated for a syscall. In most cases,
though, the default asm code is all that is required, so this
change makes that the most convenient was to do business.

Idea suggested by: bde
1998-03-09 04:00:42 +00:00
julian
10c5ccc30a Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by:	Kirk McKusick (mcKusick@mckusick.com)
Obtained from:  WHistle development tree
1998-03-08 09:59:44 +00:00
dyson
54f61de05f Free the first page also if it is not valid. 1998-03-08 06:21:33 +00:00
dyson
8ceb6160f4 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
tegge
8644d41f2d The APs now reload the interrupt descriptor table pointer after
f00f_hack has run.

Use the global r_idt descriptor in f00f_hack when in SMP mode,
so the APs find the relocated interrupt descriptor table.

Submitted by:	Partially from David A Adkins <adkin003@tc.umn.edu>
1998-03-07 20:16:49 +00:00
dyson
f5ff9feb67 Some kern_lock code improvements. Add missing wakeup, and enable
disabling some diagnostics when memory or speed is at a premium.
1998-03-07 19:25:34 +00:00
bde
ab37e4723b Set the input and output buffer sizes and the input buffer watermarks
dynamically depending on the line speed(s).  This should give the old
sizes and watermarks until drivers are changed.

Display the input watermarks in pstat and sicontrol.
1998-03-07 15:36:29 +00:00
dufault
e28788f2a4 Reviewed by: msmith, bde long ago
POSIX.4 headers and sysctl variables.  Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
1998-03-04 10:27:00 +00:00
dufault
8893ec06df Reviewed by: msmith, bde long ago
Fix for RTPRIO scheduler to eliminate invalid context switches.

POSIX.4 headers and sysctl variables.  Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
1998-03-04 10:25:55 +00:00
dyson
1ae42d49a8 Fix a rounding error for the NFS buffer validend.
Submitted by:	John W. De Boskey <jwd@unx.sas.com>
1998-03-04 03:17:30 +00:00
tegge
9f3982f0f6 When entering the apic version of slow interrupt handler, level
interrupts are masked, and EOI is sent iff the corresponding ISR bit
is set in the local apic. If the CPU cannot obtain the interrupt
service lock (currently the global kernel lock) the interrupt is
forwarded to the CPU holding that lock.

Clock interrupts now have higher priority than other slow interrupts.
1998-03-03 22:56:30 +00:00
tegge
beae57c5b3 Forward the signal if the process runs on a different CPU. This reduces
the signal handling latency for cpu-bound processes that performs very
few system calls.

The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.
1998-03-03 20:55:26 +00:00
tegge
9b0c9780e5 Reduce timeout before assuming that forwarding of hardclock or softclock
failed. Don't complain on forwarding failure, unless
BETTER_CLOCK_DIAGNOSTIC is defined.
1998-03-03 20:09:14 +00:00
peter
1cfd0bd061 Update the ELF image activator to use some of the exec resources rather
than rolling it's own.  This means that it now uses the "safe"
exec_map_first_page() to get the ld.so headers rather than risking a panic
on a page fault failure (eg: NFS server goes down).
Since all the ELF tools go to a lot of trouble to make sure everything
lives in the first page for executables, this is a win.  I have not seen
any ELF executable on any system where all the headers didn't fit in the
first page with lots of room to spare.
I have been running variations of this code for some time on my pure ELF
systems.
1998-03-02 05:47:58 +00:00
dyson
1928b68b1a Change vfs.ioopt default back to '0'. 1998-03-01 23:07:45 +00:00
msmith
950d32131b The intent is to get rid of WILLRELE in vnode_if.src by making
a complement to all ops that return a vpp, VFS_VRELE.  This is
initially only for file systems that implement the following ops
that do a WILLRELE:

	vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
	vop_rename, vop_mkdir, vop_rmdir, vop_symlink

This is initial DNA that doesn't do anything yet.  VFS_VRELE is
implemented but not called.

A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.

VFS_VRELE implementations were made for the following file systems:

Standard (vfs_vrele)
	ffs mfs nfs msdosfs devfs ext2fs

Custom
	union umapfs

Just EOPNOTSUPP
	fdesc procfs kernfs portal cd9660

These implementations may change as VOP changes are implemented.

In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers.  vput will be replaced by unlock in these cases.  Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer.  This will have minimal impact on the structure of the
existing code.

This will only be done for vnode arguments that are released by the various
fs vop implementations.

Wider use of VFS_VRELE will likely require restructuring of the code.

Reviewed by:	phk, dyson, terry et. al.
Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-03-01 22:46:53 +00:00
guido
406aea3e09 Make sure that you can only bind a more specific address when it is
done by the same uid.
Obtained from: OpenBSD
1998-03-01 19:39:29 +00:00
dyson
69e5a1e9f5 1) Use a more consistent page wait methodology.
2)	Do not unnecessarily force page blocking when paging
	pages out.
3)	Further improve swap pager performance and correctness,
	including fixing the paging in progress deadlock (except
	in severe I/O error conditions.)
4)	Enable vfs_ioopt=1 as a default.
5)	Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."
1998-03-01 04:18:54 +00:00
guido
391bb65d14 Raise ncallout from NPROC + 16 to NPROC + 16 + MAXFILES. This shold
prevent a possible DOS attack. The proper fix (to dynamically grow
the callout list) is in the make.
Submitted by:	Paul Traina
1998-02-27 19:58:29 +00:00
bde
bd35cc6b95 Removed unused #includes. 1998-02-25 13:08:07 +00:00
bde
e15b8a8b3a Removed a stale comment and staler code. 1998-02-25 06:30:15 +00:00
bde
df1db50d54 Don't depend on "implicit int" or bloat the data section in the
declaration of ptc_devsw_installed.

Fixed a spelling error.
1998-02-25 06:19:15 +00:00
bde
6c18b55140 Don't depend on "implicit int". 1998-02-25 06:16:37 +00:00
bde
cd90b66afc Declare function pointer args as pointers, not as functions. 1998-02-25 06:13:32 +00:00
bde
59d87ac480 Fixed a missing newline in a debugging printf.
Fixed punctuation in some comments.
1998-02-25 06:04:46 +00:00
bde
d185d33240 Removed unused #includes. 1998-02-25 05:58:50 +00:00
bde
6898166010 Fixed the calculation of `delta' in settime(). We once set all
times consistently wrong (up to 1 tick too late), but recent changes
fixed the setting of the main clock, making other times inconsistent.
The inconsistencies tended to show up as a negative resource usage
for the process that set the time.

Fixed the check for setting the clock backwards.  A stale timestamp
(`time') was checked, so it was possible to set the clock backwards
by up to almost 1 tick.  Until recently, this bug was compensated
for by setting the clock consistently wrong.

Merged the comment about setting the clock backwards from Lite2.

Removed latency micro-optimizations/speed pessimizations in settime().
microtime() and set_timecounter() are relatively expensive, and
they must be called together with clock updates blocked to get a
consistent `delta', so significant latency optimizations are not
possible.

Removed some stale comments.
1998-02-25 04:10:32 +00:00
dyson
7a5637f439 Try to dynamically size the VM_KMEM_SIZE (but is still able to be overridden
in a way identically as before.)  I had problems with the system properly
handling the number of vnodes when there is alot of system memory, and the
default VM_KMEM_SIZE.  Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.
1998-02-23 07:41:23 +00:00
dyson
32e0a3673a Clean-up the vget mechanism by permanently attaching VM objects to
vnodes, therefore vget doesn't need to do so anymore.  Other minor
improvements include the temp free vnode queue obeying the VAGE
flag and a printf that warns of to-be-removed code being executed.
1998-02-23 06:59:52 +00:00
phk
044e1e6296 Replace TOD clock code with more systematic approach.
Highlights:
    * Simple model for underlying hardware.
    * Hardware basis for timekeeping can be changed on the fly.
    * Only one hardware clock responsible for TOD keeping.
    * Provides a real nanotime() function.
    * Time granularity: .232E-18 seconds.
    * Frequency granularity:  .238E-12 s/s
    * Frequency adjustment is continuous in time.
    * Less overhead for frequency adjustment.
    * Improves xntpd performance.

Reviewed by:    bde, bde, bde
1998-02-20 16:36:17 +00:00
bde
6982cfc5c0 Staticized.
Don't depend on "implicit int".
1998-02-20 13:52:15 +00:00
bde
bfefd71bbf Don't depend on "implicit int" or bloat the data section in the
declaration of xxx_devsw_installed.
1998-02-20 13:46:58 +00:00
bde
999552194c Don't depend on "implicit int". 1998-02-20 13:37:40 +00:00
bde
9fca072392 Removed unused #includes. 1998-02-20 13:11:54 +00:00
fenner
b318e8e53a Revert sosend() to its behavior from 4.3-Tahoe and before: if
so_error is set, clear it before returning it.  The behavior
introduced in 4.3-Reno (to not clear so_error) causes potentially
transient errors (e.g.  ECONNREFUSED if the other end hasn't opened
its socket yet) to be permanent on connected datagram sockets that
are only used for writing.

(soreceive() clears so_error before returning it, as does
getsockopt(...,SO_ERROR,...).)

Submitted by:	Van Jacobson <van@ee.lbl.gov>, via a comment in the vat sources.
1998-02-19 19:38:20 +00:00
eivind
a5daa0b2f8 Add HW_WDOG to LINT, and turn it into a new-style option. 1998-02-16 23:57:49 +00:00
phk
369a4922e3 A bunch of nits from bde. 1998-02-15 14:15:21 +00:00