Commit Graph

1512 Commits

Author SHA1 Message Date
Tor Egge
5758c2de94 Add two workarounds for broken MP tables:
- Attempt to handle PCI devices where the interrupt is
	  an ISA/EISA interrupt according to the mp table.

	- Attempt to handle multiple IO APIC pins connected to
	  the same PCI or ISA/EISA interrupt source.  Print a
	  warning if this happens, since performance is suboptimal.
	  This workaround is only used for PCI devices.

With these two workarounds, the -SMP kernel is capable of running on
my Asus P/I-P65UP5 motherboard when version 1.4 of the MP table is disabled.
1998-04-01 21:07:37 +00:00
Tor Egge
300e9a7696 Declare some variables modified by interrupt handlers as volatile. 1998-04-01 20:38:28 +00:00
Poul-Henning Kamp
227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
Peter Dufault
8a6472b723 Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and
_KPOSIX_PRIORITY_SCHEDULING options to work.  Changes:

Change all "posix4" to "p1003_1b".  Misnamed files are left
as "posix4" until I'm told if I can simply delete them and add
new ones;

Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;

Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;

Add options to LINT;

Minor fixes to P1003_1B code during testing.
1998-03-28 11:51:01 +00:00
Bruce Evans
08637435f2 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
Jonathan Lemon
640c4313af Add the ability to make real-mode BIOS calls from the kernel. Currently,
everything is contained inside #ifdef VM86, so this option must be
present in the config file to use this functionality.

Thanks to Tor Egge, these changes should work on SMP machines.  However,
it may not be throughly SMP-safe.

Currently, the only BIOS calls made are memory-sizing routines at bootup,
these replace reading the RTC values.
1998-03-23 19:52:59 +00:00
KATO Takenori
61324207f1 Make EPSON_BOUNCEDMA a new-style option. 1998-03-17 09:11:03 +00:00
Mike Smith
4d1b2b52df Add missing entry to list of major device names. This list should not
exist.
1998-03-17 00:28:02 +00:00
Mike Smith
14e6f79297 Spell 'compatibility' like everyone else. 1998-03-16 12:07:54 +00:00
Mike Smith
2167f84289 Use dkmakeminor() rather than magic knowledge of the size and location of
the slice field.  Handle incomprehensible slice numbers slightly better.
Suggested by:	bde
1998-03-16 11:50:39 +00:00
Poul-Henning Kamp
7ab95d0bab Be less draconian about the TSC if APM is configured, use it for
timecounting if APM-BIOS isn't found.
Be just as draconian about SMP as always, but explain it better.
1998-03-16 10:06:58 +00:00
John Dyson
bef608bd7e Some VM improvements, including elimination of alot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliabiliy
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, support page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
Mike Smith
0fd31d39fa Use dsname() to generate the disk region name for the "changing root
device to" message.  Suppress this message if only the slice number
has changed.
1998-03-15 04:42:23 +00:00
Tor Egge
c555a715f5 On SMP systems, initially follow the MP spec with regard to which pin
on the IOAPIC being connected to the 8254 timer interrupt.
Verify that timer interrupts are delivered. If they aren't, attempt
a fallback to mixed mode (i.e. routing the timer interrupt via the 8259 PIC).
1998-03-14 03:11:50 +00:00
Tor Egge
d20d60be28 Don't use the standard macros for disabling/enabling interrupt.
On SMP systems, this left the mpintr_lock simplelock locked, causing
further calls to disable_intr to deadlock or panic.
1998-03-14 03:02:15 +00:00
Bruce Evans
7762bc7bdf Fixed breakage of the !SMP case in vm_page_zero_idle() in the
previous commit.  Opportunities to clean pages were often missed,
and leaving of the idle state was sometimes delayed until the next
interrupt (after any that occurred while cleaning).

Fixed an unstaticization, a syntax error and a style bug in the
previous commit.
1998-03-12 09:55:57 +00:00
Bruce Evans
d1d9d2601f Don't depend on "implicit int" or bloat the data section in the
declaration of mem_devsw_installed.

Reduced include nesting.
1998-03-12 09:14:18 +00:00
Eivind Eklund
005092bba6 Turn "PMAP_SHPGPERPROC" into a new-style option, add it to LINT, and
document it there.
1998-03-09 22:09:13 +00:00
Mike Smith
72928bdadb "Correct behaviour" involves being consistent with the canonical names of
other partitions.  In this case, they appear in the first slice in the
WHOLE_DISK_SLICE case.
1998-03-09 08:35:33 +00:00
Mike Smith
34ef5a9d25 Merge from 2.2; behave correctly in the presence of a slice number that
doesn't directly correspond to the slice field in the device minor number.
1998-03-09 08:10:21 +00:00
Mike Smith
f2dddd5e99 Construct the minor number for the root device taking into account the
slice number passed in by the bootblocks.  This means the kernel will
not use the compatability slice to obtain the root filesystem when
booting from a sliced disk.

Use the extraction macros from reboot.h rather than stating them in full
again.
1998-03-08 15:06:55 +00:00
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
Tor Egge
1146c3560f The APs now reload the interrupt descriptor table pointer after
f00f_hack has run.

Use the global r_idt descriptor in f00f_hack when in SMP mode,
so the APs find the relocated interrupt descriptor table.

Submitted by:	Partially from David A Adkins <adkin003@tc.umn.edu>
1998-03-07 20:16:49 +00:00
Tor Egge
5dd528cd4e Remove special handling for resuming clock interrupt when using APIC_IO.
The `generic' vector stubs do the right thing.
1998-03-05 21:45:53 +00:00
Tor Egge
622a086be3 Use t_idt instead of idt inside setidt() if f00f_hack() has relocated the IDT.
Submitted by:	Bruce Evans <bde@zeta.org.au>
1998-03-05 19:37:03 +00:00
KATO Takenori
3a94cb7f7d Defined CCR6 and CCR7 (configuration registers of M2 CPU.) 1998-03-04 11:39:16 +00:00
Peter Dufault
f3df61a1cd Reviewed by: msmith, bde long ago
Fix for RTPRIO scheduler to eliminate invalid context switches.
1998-03-04 10:25:03 +00:00
Tor Egge
02c1dc3bbc When entering the apic version of slow interrupt handler, level
interrupts are masked, and EOI is sent iff the corresponding ISR bit
is set in the local apic. If the CPU cannot obtain the interrupt
service lock (currently the global kernel lock) the interrupt is
forwarded to the CPU holding that lock.

Clock interrupts now have higher priority than other slow interrupts.
1998-03-03 22:56:30 +00:00
Tor Egge
3163861c7b Forward the signal if the process runs on a different CPU. This reduces
the signal handling latency for cpu-bound processes that performs very
few system calls.

The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.
1998-03-03 20:55:26 +00:00
Tor Egge
fe9cd27373 Reduce timeout before assuming that forwarding of hardclock or softclock
failed. Don't complain on forwarding failure, unless
BETTER_CLOCK_DIAGNOSTIC is defined.
1998-03-03 20:09:14 +00:00
Tor Egge
221e0c595b forward_statclock and forward_hardclock are located in mp_machdep.c. 1998-03-03 19:44:34 +00:00
Peter Wemm
c8a7999933 Update the ELF image activator to use some of the exec resources rather
than rolling it's own.  This means that it now uses the "safe"
exec_map_first_page() to get the ld.so headers rather than risking a panic
on a page fault failure (eg: NFS server goes down).
Since all the ELF tools go to a lot of trouble to make sure everything
lives in the first page for executables, this is a win.  I have not seen
any ELF executable on any system where all the headers didn't fit in the
first page with lots of room to spare.
I have been running variations of this code for some time on my pure ELF
systems.
1998-03-02 05:47:58 +00:00
John Dyson
ffc82b0a70 1) Use a more consistent page wait methodology.
2)	Do not unnecessarily force page blocking when paging
	pages out.
3)	Further improve swap pager performance and correctness,
	including fixing the paging in progress deadlock (except
	in severe I/O error conditions.)
4)	Enable vfs_ioopt=1 as a default.
5)	Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."
1998-03-01 04:18:54 +00:00
Poul-Henning Kamp
3301c800f5 Prevent the TSC from being used on APM machines, we have no idea if
it runs at a constant frequency.  This was less of an issue before,
because the TSC only interpolated in the HZ intervals, but now where
the timecounter is used all the way, this becomes much more visible.

Nit: Fix a printf which triggered the bde-filter.
1998-02-28 21:16:13 +00:00
John Dyson
660957521c Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs.  The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by:	Partially from Tor Egge.
1998-02-25 03:56:15 +00:00
Bruce Evans
6aeb4f6474 Removed vestiges of previous microtime() implementation. 1998-02-25 02:20:30 +00:00
John Dyson
d9bed5bee1 Try to dynamically size the VM_KMEM_SIZE (but is still able to be overridden
in a way identically as before.)  I had problems with the system properly
handling the number of vnodes when there is alot of system memory, and the
default VM_KMEM_SIZE.  Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.

Add some accouting for vm_zone memory allocations, and provide properly
for vm_zone allocations out of the kmem_map.  Also move the vm_zone
allocation stats to the VM OID tree from the KERN OID tree.
1998-02-23 07:42:43 +00:00
Bruce Evans
168be6199e Quick fix for the i8254 timecounter often gaining 10 msec. 1998-02-23 00:11:25 +00:00
Jordan K. Hubbard
6bf2dcfc50 Add missing CLOCK_UNLOCK() before write_eflags().
Submitted by:	dave adkins <adkin003@tc.umn.edu>
1998-02-21 20:45:27 +00:00
Poul-Henning Kamp
7ec73f6417 Replace TOD clock code with more systematic approach.
Highlights:
    * Simple model for underlying hardware.
    * Hardware basis for timekeeping can be changed on the fly.
    * Only one hardware clock responsible for TOD keeping.
    * Provides a real nanotime() function.
    * Time granularity: .232E-18 seconds.
    * Frequency granularity:  .238E-12 s/s
    * Frequency adjustment is continuous in time.
    * Less overhead for frequency adjustment.
    * Improves xntpd performance.

Reviewed by:    bde, bde, bde
1998-02-20 16:36:17 +00:00
Bruce Evans
39e4376ba7 Removed unused #includes. 1998-02-20 13:11:54 +00:00
Mike Smith
42be88e88d Remove DISABLE_PSE option which was masking (but not fixing) the problem.
A correct fix for execution off MFS filesystems has been committed.
1998-02-16 23:57:03 +00:00
Mike Smith
354f83d34f TEMPORARILY disable support for the 4MB kernel page, as it appears to be
causing installation images for -current to be unbootable.

Submitted by:	phk
1998-02-16 00:29:05 +00:00
Bruce Evans
ee10b2a475 Removed a superstitious fnop() that broke the usefulness of the FPU's
"last instruction" pointer.
1998-02-15 06:25:26 +00:00
KATO Takenori
53a76bb7ed Use RDMSR instruction instead of WRMSR. 1998-02-13 09:34:42 +00:00
Bruce Evans
359888780e Ifdefed SMP-only declarations. 1998-02-13 06:59:22 +00:00
Bruce Evans
1d804e7925 Update timer0_prescaler_count before calling hardclock() while timer0
is "acquired".  This fixes a TSC biasing error of about 10 msec when
pcaudio is active.

Update `time' before calling hardclock() when timer0 is being released.
This is not known to be important.

Added some delays in writertc().  Efficiency is not critical here, unlike
in rtcin(), and we already use conservative delays there.

Don't touch the hardware when machdep.i8254_freq is being changed but
the maximum count wouldn't change.  This fixes jitter of up to 10 msec
for most small adjustments to machdep.i8254_freq.  When the maximum
count needs to change, the hardware should be adjusted more carefully.
1998-02-13 06:33:16 +00:00
Bruce Evans
9f449d2aa2 Ifdefed some npx code. npx should be optional again. 1998-02-13 05:30:18 +00:00
Bruce Evans
9729f279db Fixed missing privilege checking and off-by-1 bounds checking in
i386_set_ioperm().  Don't use a magic number for the bound.

Fixed missing bounds checking in i386_get_ioperm().  Don't use a
magic number for the bound elsewhere in this function.

Removed some bogus initializers.
1998-02-13 05:25:37 +00:00
Bruce Evans
045b6fefd8 Fixed initialization of the 4MB page. Kernels larger than about 2.75MB
(from _btext to _end) crashed in pmap_bootstrap().  Smaller kernels
worked accidentally.
1998-02-12 22:00:01 +00:00