Commit Graph

39971 Commits

Author SHA1 Message Date
Poul-Henning Kamp
43f0db6cc5 Don't do silly thing if the disk_create() event gets canceled.
Approved by:	re/scottl
2003-05-25 16:57:10 +00:00
Jeff Roberson
30fd5d085d - Reset the free ent to NULL if we have consumed the last free entry. This
fixes a problem where we would overwrite old data if we ran out of free
   entries.

Submitted by:	sam
Approved by:	re (scottl)
2003-05-25 08:48:42 +00:00
Don Lewis
263c8abeb9 Beat vnode locking in the NFS server code into submission. This change
is not pretty, but it fixes the code so that it no longer violates the
vnode locking rules in the VFS API and doesn't trip any of the locking
assertions enabled by the DEBUG_VFS_LOCKS kernel configuration option.
There is one report that this patch fixed a "locking against myself"
panic on an NFS server that was tripped by a diskless client.

Approved by:	re (scottl)
2003-05-25 06:17:33 +00:00
Don Lewis
a35e7eaa1a Always set the hardware parse bit in the IPCB structure when this
structure, which is new to the 82550 and 82551, is used to transmit
a packet.  This appears to fix the packet truncation problem that was
observed when using 82550-based fxp cards to transmit ICMP or fragmented
UDP packets of certain lengths which only had one to three bytes in the
second and final mbuf of the packet.  This matches a note in the "Intel
8255x 10/100 Mbps Ethernet Controller Family Open Source Software Developer
Manual", which says that the hardware parse bit should be set when sending
these types of packets.

There have also been unconfirmed reports of similar problems when
transmitting TCP packets, which should not be affected by the above
mentioned change because the hardware parse bit was already being set
if the stack requested hardware checksumming of the packet.  If the
problem remains, the use of the IPCB structure can be disabled to
cause the driver to fall back to using the older 82559 interface with
82550-based cards by setting
        hint.fxp.UNIT_NUMBER.ipcbxmit_disable
to a non-zero value at boot time, or using kenv to set this variable
before using kldload to load the fxp driver.

Approved by:	re (jhb)
2003-05-25 05:04:26 +00:00
Marcel Moolenaar
dc0545462e Now that we define user mode as any IP address that isn't in the
kernel's VA regions, we cannot limit the use of break-based
syscalls to user mode only. The signal trampolines are in the
gateway page, which is mapped into the process address space in
region 5 and thus is kernel space.

We don't special case the gateway page here. Allow break-based
syscalls from anywhere in the kernel VA space.

Approved by: re@ (blanket)
2003-05-25 01:01:28 +00:00
Warner Losh
f9aedaa4ba Ignore the 'must allocate below 1MB' flag for the TPL_BAR_REG. It is
set on realtek cards, but they work without it (and don't work with
it).  The standard seems to imply that this is just a hint anyway, so
this should be harmless.  It doesn't appear to be set on any other
cardbus cards that I have (or have seen).

This should make the rl based CardBus cards work again.  I've been
running it for about a month now.

Approved by: re@ (jhb)
2003-05-24 23:23:41 +00:00
Marcel Moolenaar
d7f827116f Fix a source of instability specific to an EPC userland. We return
to userland with interrupts disabled until we restore PSR. However,
it has been observed that interrupts do actually happen before they
are enabled again. This is a bit surprising and I don't know yet
what's going on exactly. Nevertheless, the code was not crafted
carefully enough to allow interrupts to happen and we could
clobber the kernel stack of another thread when interrupts did
happen.

This is what happens: we restore the (memory) stack pointer (sp)
and the register stack base prior to restoring ar.k6 and ar.k7.
This is not a problem if interrupts don't happen between setting
sp/ar.bspstore and ar.k6/ar.k7. Alas, interrupts can happen.
Since sp/ar.bspstore already point to the userland stacks, we
need to switch to the kernel stack in interrupt. However, ar.k6
and ar.k7 have not been set, which means that we were switching
to some unrelated kstack and happily clobbered the trapframe
present there if the thread to which the kstack belonged was
in kernel mode or otherwise we could have our trapframe clobbered
if that other thread enters the kernel. Nasty either way.

We now carefully restore ar.k6 prior to restoring ar.bspstore and
likewise for ar.k7 and sp. All we need is the guarantee that an
interrupt does not clobber ar.k6 or ar.k7 before we're back in
userland. That has been achieved by restoring ar.k6/ar.k7
unconditionally (see exception.s)

While here, remove the disabling of interrupts on EPC entry. It
was added as a way to "resolve" the crashes until it was understood
what was going on. I think I achieved the latter, so we can remove
the patch. Note that setting up a trapframe with interrupts
enabled has it's own share of corner cases, but it's better to
properly fixed those than to keep a mostly wrong patch around
because we're afraid to remove it...

Approved by: re@ (blanket)
2003-05-24 22:53:10 +00:00
Marcel Moolenaar
a7b90d80fc Be more careful how we restore interrupts. Don't rewrite most of the
PSR only to achieve setting PSR.i back to it's previous value. It
makes it impossible to change any of the 30+ other unrelated bits
when done between intr_disable() and intr_restore(). That's bad.

Instead have intr_disable() return 1 when interrupts were previously
enabled and 0 otherwise and only enable interrupts in intr_restore()
when given a non-0 value.

This change specifically disallows using intr_restore() to disable
interrupts. The reason is simple: interrupts only need to be restored
after they are being disabled, which means that intr_restore() is
called with interrupts disabled and we only need to enable them if
they were previously enabled.

This change does not fix any bugs, other than that it bugged me...

Approved by: re@ (blanket)
2003-05-24 21:44:24 +00:00
Marcel Moolenaar
95f2dbba40 Consistently us the same metric to differentiate between kernel mode
and user mode. We need to take into account that the EPC syscall path
introduces a grey area in which one can argue either way, including a
third: neither.

We now use the region in which the IP address lies. Regions 5, 6 and 7
are kernel VA regions and if the IP lies any any of those regions we
assume we're in kernel mode. Hence, we can be in kernel mode even if
we're not on the kernel stack and/or have user privileges. There're
gremlins living in the twilight zone :-)

For the EPC syscall path this particularly means that the process
leaves user mode the moment it calls into the gateway page. This
makes the most sense because from a process' point of view the call
represents a request to the kernel for some service and that service
has been performed if the call returns. With the metric we picked,
this also means that we're back in user mode IFF the call returns.

Approved by: re@ (blanket)
2003-05-24 21:16:19 +00:00
Marcel Moolenaar
fb4aa34f3b Unconditionally restore ar.k7 (memory stack) and ar.k6 (register stack)
when returning from an interrupt. Both registers are used on interrupt
to switch to the right kernel stack, but other than that they are not
used. This means we only have to make sure they contain proper values
while in user mode. As such, we conditionally restored these registers
based on whether we returned to userland or not. A nice property of
conditionally restoring ar.k6 and ar.k7 is that it introduces two
invariants: ar.k6 always points to the bottom of the kernel stack and
ar.k7 always points to the top of the kernel stack (immediately below
the PCB we have there).

However, the EPC syscall path introduces an irregularity: there's no
"thin red line" between user and kernel. There's a grey area that's a
couple of instructions wide. Any interruption in that grey area is
bound to see an inconsistent state. One such state is that we're in
kernel space for all practical purposes, but we still need to have
ar.k6 and ar.k7 restored as if we're in userland.

Thus: restore ar.k6 and ar.k7 unconditionally at the cost of losing
a valuable invariant. Both registers now hold the extend of the
usable portion of the kernel stack at any interrupt nesting, which
when in userland mean the bottom and the top of the kstack.
2003-05-24 20:51:55 +00:00
Peter Wemm
3ebd9b48ce Stop profiled libc from exploding, matching gcc's generated code.
Approved by: re (amd64/* blanket)
2003-05-24 18:24:03 +00:00
Marcel Moolenaar
d1d7df1905 Fix an alpha inheritance bug:
On alpha, PAL is involved in context management and after wiring
the CPU (in alpha_init()) a context switch was performed to tell
PAL about the context. This was bogusly brought over to ia64
where it introduced bugs, because we restored the context from
a mostly uninitialized PCB.

The cleanup constitutes:
o  Remove the unused arguments from ia64_init().
o  Don't return from ia64_init(), but instead call mi_startup()
   directly. This reduces the amount of muckery in assembly and
   also allows for the next bullet:
o  Save our currect context prior to calling mi_startup(). The
   reason for this is that many threads are created from thread0
   by cloning the PCB. By saving our context in the PCB, we have
   something sane to clone. It also ensures that a cloned thread
   that does not alter the context in any way will return to
   the saved context, where we're ready for the eventuality with
   a nice, user unfriendly panic().

The cleanup fixes at least the following bugs:
o  Entering mi_startup() with the RSE in enforced lazy mode.
o  Re-execution of ia64_init() in certain "lab" conditions.

While here, add proper unwind directives to __start() so that
the unwind knows it has reached the bottom of the (call) stack.

Approved by: re@ (blanket)
2003-05-24 00:17:34 +00:00
Marcel Moolenaar
ca125f9c17 Fix a (new) source of instability:
When interrupting a kernel context, we don't need to switch stacks
(memory nor register). As such, we were also not restoring the
register stack pointer (ar.bspstore). This, however, fails to be
valid in 1 situation: when we interrupt a register stack switch as
is being done in restorectx(). The problem is that restorectx()
needs to have ar.bsp == ar.bspstore before it can assign the new
value to ar.bspstore. This is achieved by doing a loadrs prior to
assigning to ar.bspstore. If we take an interrupt in between the
loadrs and the assignment and we don't make sure we restore the
ar.bspstore prior to returning from the interrupt, we switch
stacks with possibly non-zero dirty registers, which means that
the new frame pointer (ar.bsp) will be invalid.

So, instead of jumping over the restoration of the register frame
pointer and related registers, we conditionalize it based on whether
we return to kernel context or user context. A future performance
tweak is possible by only restoring ar.bspstore when returning to
kernel mode *and* when the RSE is in enforced lazy mode. One cannot
assume ar.bsp == ar.bspstore if the RSE is not in enforced lazy mode
anyway.

While here (well, not quite) don't unconditionally assign to
ar.bspstore in exception_save. Only do that when we actually switch
stacks. It can only harm us to do it unconditionally.

Approved by: re@ (blanket)
2003-05-23 23:55:31 +00:00
Marcel Moolenaar
42b919d4a6 In swapctx(), put the RSE in enforced lazy mode before we flush the
register stack. There's nothing really wrong with flushing before
putting the RSE in enforced lazy mode, provided you don't depend on
ar.bspstore being equal to ar.bsp when the RSE has been put in
enforced lazy more. The small window between the flush and setting
the RSE may be sufficient to have the RSE eagerly increase the dirty
region (and hence cause ar.bspstore != ar.bsp) or have an interrupt
that may even get the laziest RSE to do something.

Anyway: we don't depend on ar.bspstore being equal to ar.bsp, so
nothing was and is broken. But the code was non-intuitive and
easily confuses. This is a source of future bugs.

Note: the advantage of not depending on ar.bspstore is that there's
some recilience against an interrupted flushrs. Clobbering is limited
to stacked register contents only, not to RSE address clobbering.

Approved: re@ (blanket)
2003-05-23 23:16:43 +00:00
Alan Cox
2e05d89828 Make the maximum number of vnodes a function of both the physical memory
size and the kernel's heap size, specifically, vm_kmem_size.  This
function allows a maximum of 40% of the vm_kmem_size to be used for
vnodes and vm objects.  This is a conservative bound based upon recent
problem reports.  (In other words, a slight increase in this percentage
may be safe.)

Finally, machines with less than ~3GB of RAM should be unaffected
by this change, i.e., the maximum number of vnodes should remain
the same.  If necessary, machines with 3GB or more of RAM can increase
the maximum number of vnodes by increasing vm_kmem_size.

Desired by:	scottl
Tested by:	jake
Approved by:	re (rwatson,scottl)
2003-05-23 19:54:02 +00:00
Peter Wemm
d9cd1af4aa Typo fix. oops.
Submitted by:  jmallett
Approved by:   re (blanket amd64/*)
2003-05-23 06:36:46 +00:00
Peter Wemm
cbd667fa2f Update comments. Note that the kernel is at -1GB, not -2GB as erroniously
implied by the previous commit.  KVM is still only 1GB until
pmap_growkernel() learns about the extra page table level.

Approved by:  re (blanket)
2003-05-23 06:35:45 +00:00
Peter Wemm
f229f5cf85 As suggested by the gdb folks, pad the 'struct fpreg' to a full 512 bytes
to match the native fxsave/fxrstor object size since thats apparently what
the Linux/NetBSD folks do.
2003-05-23 06:31:56 +00:00
Peter Wemm
637068b1d3 Low risk amd64 fix. Use a vm_offset_t for the virtual location of the
buffer space instead of a u_int32_t.  Otherwise the upper 32 bits of
the address space get truncated and syscons blows up.

Approved by:	re (safe, low risk amd64 fixes)
2003-05-23 05:10:49 +00:00
Peter Wemm
9f0c4ab393 Deal with the user VM space expanding. 32 bit applications do not like
having their stack at the 512GB mark.  Give 4GB of user VM space for 32
bit apps.  Note that this is significantly more than on i386 which gives
only about 2.9GB of user VM to a process (1GB for kernel, plus page
table pages which eat user VM space).

Approved by: re (blanket)
2003-05-23 05:07:33 +00:00
Peter Wemm
3c9a3c9ca3 Major pmap rework to take advantage of the larger address space on amd64
systems.  Of note:
- Implement a direct mapped region using 2MB pages.  This eliminates the
  need for temporary mappings when getting ptes.  This supports up to
  512GB of physical memory for now.  This should be enough for a while.
- Implement a 4-tier page table system.  Most of the infrastructure is
  there for 128TB of userland virtual address space, but only 512GB is
  presently enabled due to a mystery bug somewhere.  The design of this
  was heavily inspired by the alpha pmap.c.
- The kernel is moved into the negative address space(!).
- The kernel has 2GB of KVM available.
- Provide a uma memory allocator to use the direct map region to take
  advantage of the 2MB TLBs.
- Fixed some assumptions in the bus_space macros about the ability
  to fit virtual addresses in an 'int'.

Notable missing things:
- pmap_growkernel() should be able to grow to 512GB of KVM by expanding
  downwards below kernbase.  The kernel must be at the top 2GB of the
  negative address space because of gcc code generation strategies.
- need to fix the >512GB user vm code.

Approved by:	re (blanket)
2003-05-23 05:04:54 +00:00
Greg Lehey
74f2cc2c9c Change the way the plex lock mutexes work. Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded.  Now we maintain a pool of mutexes (currently
32) to be shared by all plexes.  This is still a lot better than the
splhigh() method used in other architectures.

expand_table: Add parameters file and line if we're debugging.

Approved by: re (jhb)
2003-05-23 01:15:55 +00:00
Greg Lehey
93573e2e76 Change the way the plex lock mutexes work. Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded.  Now we maintain a pool of mutexes (currently
32) to be shared by all plexes.  This is still a lot better than the
splhigh() method used in other architectures.

Add and clarify comments.

Approved by: re (jhb)
2003-05-23 01:15:30 +00:00
Greg Lehey
7db14b2ff2 expand_table: Add parameters file and line if we're debugging.
MMalloc, vinum_meminfo: Use strlcpy to copy file name.

Approved by: re (jhb)
2003-05-23 01:15:01 +00:00
Greg Lehey
d026346c86 Change the way the plex lock mutexes work. Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded.  Now we maintain a pool of mutexes (currently
32) to be shared by all plexes.  This is still a lot better than the
splhigh() method used in other architectures.

Approved by: re (jhb)
2003-05-23 01:14:35 +00:00
Greg Lehey
8a697ff435 detachobject: Update volume config after detaching a plex.
update_volume_config: Remove redundant diskconfig parameter.

Approved by: re (jhb)
2003-05-23 01:14:13 +00:00
Greg Lehey
cb5eba5e09 Change the way the plex lock mutexes work. Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded.  Now we maintain a pool of mutexes (currently
32) to be shared by all plexes.  This is still a lot better than the
splhigh() method used in other architectures.

update_volume_config: Remove redundant diskconfig parameter.

expand_table: Add parameters file and line if we're debugging.

Approved by: re (jhb)
2003-05-23 01:13:43 +00:00
Greg Lehey
f7b76dc815 Change many strcpys to strlcpys, etc.
Submitted by:	   Ted Unangst <tedu@stanford.edu>

Correct some inaccurate and badly formatted comments.

config_subdisk: If our drive is down, ensure that the subdisk is
		crashed.  Previously it was possible for the subdisk
		to be up when the drive was down.

Change the way the plex lock mutexes work.  Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded.  Now we maintain a pool of mutexes (currently
32) to be shared by all plexes.  This is still a lot better than the
splhigh() method used in other architectures.

update_volume_config: Remove redundant diskconfig parameter.

Approved by: re (jhb)
2003-05-23 01:13:10 +00:00
Peter Wemm
997f3bfc2a Merge from i386/trap.c rev 1.252. Use td_critnest instead of the
spinlocks count for explicitly enabling interrupts.

Approved by:	re (blanket)
2003-05-22 20:09:50 +00:00
Bernd Walter
cdc95e1bb8 Calculate routed interrupts using the slot number from the device and
not that of the bridge.

Approved by:	re (jhb)
2003-05-22 17:45:26 +00:00
Mike Barcroft
6f9622a926 Fix two misuses of __BSD_VISIBLE.
Submitted by:	bde
Approved by:	re
2003-05-22 17:07:57 +00:00
Julian Elischer
faaa20f639 When we are spilling threads out of the run queue during panic, make sure we
keep the thread state variable consistent with its real state.
i.e. Don't say it's on the run queue when it isn't.

Also clarify the associated comment.

Turns a double panic back to a single panic :-/

Approved by:	re@ (jhb)
2003-05-21 18:53:25 +00:00
Poul-Henning Kamp
67fd2837cd Return ENXIO if the softc pointer is NULL, in all likelyhood the
disk is in the process of disappearing.

Approved by:	re/rwats*
2003-05-21 18:52:29 +00:00
Paul Saab
3284b9ee87 Make ciss usable under PAE
Approved by:	re (scottl)
2003-05-21 07:17:06 +00:00
Paul Saab
487a8c7e61 - Make this work with PAE.
- atomically load and clear the status block so we dont miss an
  update.
  Submitted by: jdp

Approved by:	re (scottl)
2003-05-21 07:00:49 +00:00
Nate Lawson
742d91f211 Quirk for Hitachi DVD USB drive. It returns "invalid field in cdb" for
normal INQUIRY requests so enable the NO_INQUIRY quirk.

Submitted by:	Lars Eggert <larse@ISI.EDU>
Approved by:	re (scottl)
2003-05-21 00:22:07 +00:00
John Baldwin
7f4725bd09 The per-CPU spinlocks list is only maintained when WITNESS is enabled.
Thus, treat all page faults while in a critical section as fatal rather
than just those that occur with a non-empty spinlocks list.  All such page
faults are fatal anyways.  Calling trap_fatal() earlier increases the
chances of getting more useful panic messages and a possible DDB prompt.

Approved by:	re (scottl)
2003-05-20 20:50:33 +00:00
Nate Lawson
2f8f9581dd Remove a redundant quirk. Instead, we wildcard all Asahi Optical chips.
Approved by:	re
2003-05-20 18:04:42 +00:00
Marcel Moolenaar
bfaccb767c o Fix a definite bogon: the dirty bity fault, instruction access
failt and data access fault install the PTE in question into
   the VHPT table. However, a post-increment was missing and we
   wrote the raw PTE data into the pagesize/access key field.
   This leaves a corrupt VHPT entry.
o  While here, remove the explicit cache purge. Insertion into
   the translation implicitly purges any overlapping entries.
o  Make sure there's a cycle break between the itc and the rfi.
o  Whitespace fixes.
2003-05-20 06:57:20 +00:00
Marcel Moolenaar
14d2ae56c7 Rename the "IA64 ITC" counter to "ITC" counter. We don't call the
"TSC" counter on i386 "I386 TSC".

Approved by: re@ (blanket)
2003-05-20 06:51:20 +00:00
Marcel Moolenaar
9b9ce577d4 Prevent corruption of the VHPT collision chain by protecting it with
a mutex. The only volatile chain operations are insertion and deletion
but since updating an existing PTE also updates the VHPT entry itself,
and we have the VHPT mutex in both other cases, we also lock when we
update an existing PTE even though no chain operation is involved.
Note that we perform the insertion and deletion careful enough that
we don't need to lock traversals. If we need to lock traversals, we
also need to lock from the exception handler, which we can't without
creating a trapframe.

We're now able to withstand a -j8 buildworld. More work is needed to
withstand Murphy fields. In other words: we still have a bogon...

Approved by: re@ (blanket)
2003-05-20 02:52:41 +00:00
Peter Wemm
62d8fb93d0 Deal with the possibility of negative available space from the file server
to avoid Bad Things(TM) happening (eg: df crashing with a floating point
exception).

Submitted by:	Harold Gutch <logix@foobar.franken.de>
Approved by:	re (scottl)
2003-05-19 22:35:00 +00:00
Peter Wemm
3830dc4629 Another x86-64 comment fixup
Approved by:	re (blanket amd64 stuff)
2003-05-19 22:19:02 +00:00
Peter Wemm
92f0cd89a0 s/x86_64/amd64/ in comments in header.
Approved by:	re (blanket amd64)
2003-05-19 22:15:30 +00:00
Alexander Kabaev
980ded9a7d sys/sys/limits.h:
- Fix visibilty test for LONG_BIT and WORD_BIT.  `#if defined(__FOO_VISIBLE)'
   is alays wrong because __FOO_VISIBLE is always defined (to 0 for
   invisibility).

sys/<arch>/include/limits.h
sys/<arch>/include/_limits.h:

 - Style fixes.

Submitted by:	bde
Reviewed by:	bsdmike
Approved by:	re (scottl)
2003-05-19 20:29:07 +00:00
Søren Schmidt
e1750fb855 Print the right position on disk errors
Approved by: re@
2003-05-19 13:43:12 +00:00
Søren Schmidt
c9f5649b3e Unbork the chip locating code.
Approved by: re@
2003-05-19 13:42:23 +00:00
Marcel Moolenaar
b8c4149cff Turn pmap_install_pte() into a critical section. We better not get
interrupted while writing into the VHPT table. While here, make sure
memory accesses a properly ordered. Tag invalidation must happen
first so that the hardware VHPT walker will not be able to match
this entry while we're updating it and we have to make sure the new
new tag gets written only after the PTE is completely updated.

Approved by: re (blanket)
2003-05-19 08:02:36 +00:00
Marcel Moolenaar
a75b99ea2d Unconditionally set pcb_current_pmap. WIP versions of the code
previously committed cleared pcb_current_pmap prior to changing
the region registers, but that was removed before committing.
Since we don't normally (at all?) pass a NULL pointer, the bug
was mostly harmless. Fix it while I'm here...

I'm here because we need to have data serialization after writing
to the region registers. Not doing so was likely the cause of the
hangs we were experiencing. General exceptions in cpu_switch may
also be caused by the lack of serialization.

Approved by: re (blanket)
2003-05-19 06:05:30 +00:00
Marcel Moolenaar
dc0bde0f18 pmap_install() needs to be atomic WRT to context switching. Protect
switching user regions (region 0-4) with schedlock. Avoid unnecessary
recursion on schedlock by moving the core functionality to another
function (pmap_switch()) where we assert schedlock is held. Turn
pmap_install() into a wrapper that grabs schedlock. This minimizes
the number of callsites that need to be changed.
Since we already have schedlock in cpu_switch() and cpu_throw(),
have them call pmap_switch() directly. These were also the only two
calls to pmap_install() outside pmap.c, so make pmap_install() static
and remove its prototype from pmap.h

Approved by: re (blanket)
2003-05-19 04:16:30 +00:00