On alpha, PAL is involved in context management and after wiring
the CPU (in alpha_init()) a context switch was performed to tell
PAL about the context. This was bogusly brought over to ia64
where it introduced bugs, because we restored the context from
a mostly uninitialized PCB.
The cleanup constitutes:
o Remove the unused arguments from ia64_init().
o Don't return from ia64_init(), but instead call mi_startup()
directly. This reduces the amount of muckery in assembly and
also allows for the next bullet:
o Save our current context prior to calling mi_startup(). The
reason for this is that many threads are created from thread0
by cloning the PCB. By saving our context in the PCB, we have
something sane to clone. It also ensures that a cloned thread
that does not alter the context in any way will return to
the saved context, where we're ready for the eventuality with
a nice, user unfriendly panic().
The cleanup fixes at least the following bugs:
o Entering mi_startup() with the RSE in enforced lazy mode.
o Re-execution of ia64_init() in certain "lab" conditions.
While here, add proper unwind directives to __start() so that
the unwinder knows it has reached the bottom of the (call) stack.
Approved by: re@ (blanket)
When interrupting a kernel context, we don't need to switch stacks
(neither memory nor register). As such, we were also not restoring the
register stack pointer (ar.bspstore). This, however, fails to be
valid in one situation: when we interrupt a register stack switch as
is being done in restorectx(). The problem is that restorectx()
needs to have ar.bsp == ar.bspstore before it can assign the new
value to ar.bspstore. This is achieved by doing a loadrs prior to
assigning to ar.bspstore. If we take an interrupt in between the
loadrs and the assignment and we don't make sure we restore the
ar.bspstore prior to returning from the interrupt, we switch
stacks with possibly non-zero dirty registers, which means that
the new frame pointer (ar.bsp) will be invalid.
So, instead of jumping over the restoration of the register frame
pointer and related registers, we conditionalize it based on whether
we return to kernel context or user context. A future performance
tweak is possible by only restoring ar.bspstore when returning to
kernel mode *and* when the RSE is in enforced lazy mode. One cannot
assume ar.bsp == ar.bspstore if the RSE is not in enforced lazy mode
anyway.
While here (well, not quite) don't unconditionally assign to
ar.bspstore in exception_save. Only do that when we actually switch
stacks. It can only harm us to do it unconditionally.
Approved by: re@ (blanket)
register stack. There's nothing really wrong with flushing before
putting the RSE in enforced lazy mode, provided you don't depend on
ar.bspstore being equal to ar.bsp when the RSE has been put in
enforced lazy mode. The small window between the flush and setting
the RSE may be sufficient to have the RSE eagerly increase the dirty
region (and hence cause ar.bspstore != ar.bsp) or have an interrupt
that may even get the laziest RSE to do something.
Anyway: we don't depend on ar.bspstore being equal to ar.bsp, so
nothing was and is broken. But the code was non-intuitive and
easily confusing, which makes it a source of future bugs.
Note: the advantage of not depending on ar.bspstore is that there's
some resilience against an interrupted flushrs. Clobbering is limited
to stacked register contents only, not to RSE address clobbering.
Approved by: re@ (blanket)
size and the kernel's heap size, specifically, vm_kmem_size. This
function allows a maximum of 40% of the vm_kmem_size to be used for
vnodes and vm objects. This is a conservative bound based upon recent
problem reports. (In other words, a slight increase in this percentage
may be safe.)
Finally, machines with less than ~3GB of RAM should be unaffected
by this change, i.e., the maximum number of vnodes should remain
the same. If necessary, machines with 3GB or more of RAM can increase
the maximum number of vnodes by increasing vm_kmem_size.
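
A minimal sketch of the bound described above: the vnode limit is the smaller
of a RAM-derived estimate and the count that fits in 40% (2/5) of
vm_kmem_size. The function name, the RAM-derived input, and the per-object
sizes are assumptions for illustration, not the committed code.

```c
#include <stdint.h>

/* Hypothetical per-object costs in bytes; real sizes depend on the kernel. */
#define VNODE_SIZE	272u
#define VM_OBJECT_SIZE	132u

/*
 * Cap a RAM-derived vnode count so that vnodes plus their VM objects
 * consume at most 40% (2/5) of the kernel heap.
 */
static uint64_t
vnode_limit(uint64_t ram_based, uint64_t vm_kmem_size)
{
	uint64_t kmem_based = 2 * vm_kmem_size /
	    (5 * (uint64_t)(VNODE_SIZE + VM_OBJECT_SIZE));

	return (ram_based < kmem_based ? ram_based : kmem_based);
}
```

With a small RAM-derived estimate the heap bound never applies, which matches
the claim that smaller machines are unaffected; raising vm_kmem_size raises
the second term only.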
Desired by: scottl
Tested by: jake
Approved by: re (rwatson,scottl)
buffer space instead of a u_int32_t. Otherwise the upper 32 bits of
the address space get truncated and syscons blows up.
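
To illustrate the truncation, here is a small sketch that round-trips an
address through a 32-bit and a 64-bit integer. The helper names and the sample
address are hypothetical; the type actually used in the fix is not shown in
this log.

```c
#include <stdint.h>

/* Storing a 64-bit address in a 32-bit integer drops the upper half. */
static uint64_t
roundtrip_u32(uint64_t addr)
{
	uint32_t narrow = (uint32_t)addr;	/* upper 32 bits are lost */

	return (narrow);
}

/* A 64-bit integer preserves the full address. */
static uint64_t
roundtrip_u64(uint64_t addr)
{
	uint64_t wide = addr;

	return (wide);
}
```

For an address such as 0xffffffff800b8000 the narrow version yields
0x800b8000, which points somewhere else entirely.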
Approved by: re (safe, low risk amd64 fixes)
having their stack at the 512GB mark. Give 4GB of user VM space for 32
bit apps. Note that this is significantly more than on i386 which gives
only about 2.9GB of user VM to a process (1GB for kernel, plus page
table pages which eat user VM space).
Approved by: re (blanket)
systems. Of note:
- Implement a direct mapped region using 2MB pages. This eliminates the
need for temporary mappings when getting ptes. This supports up to
512GB of physical memory for now. This should be enough for a while.
- Implement a 4-tier page table system. Most of the infrastructure is
there for 128TB of userland virtual address space, but only 512GB is
presently enabled due to a mystery bug somewhere. The design of this
was heavily inspired by the alpha pmap.c.
- The kernel is moved into the negative address space(!).
- The kernel has 2GB of KVM available.
- Provide a uma memory allocator to use the direct map region to take
advantage of the 2MB TLBs.
- Fixed some assumptions in the bus_space macros about the ability
to fit virtual addresses in an 'int'.
Notable missing things:
- pmap_growkernel() should be able to grow to 512GB of KVM by expanding
downwards below kernbase. The kernel must be at the top 2GB of the
negative address space because of gcc code generation strategies.
- need to fix the >512GB user vm code.
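
The address arithmetic behind the points above can be sketched as follows.
This assumes 4KB base pages and 9 index bits per table level; the direct map
base and all helper names are hypothetical, chosen only so the example is
self-contained.

```c
#include <stdint.h>

#define PAGE_SHIFT	12	/* 4KB base pages */
#define LEVEL_BITS	9	/* 512 entries per table */
/* Hypothetical direct map base in the negative half of the address space. */
#define DMAP_BASE	0xffffff0000000000ULL

/* With a direct map, physical <-> virtual is plain arithmetic. */
static uint64_t
phys_to_dmap(uint64_t pa)
{
	return (DMAP_BASE + pa);
}

static uint64_t
dmap_to_phys(uint64_t va)
{
	return (va - DMAP_BASE);
}

/* Table index at a given level: 0 = page table, 3 = top level. */
static unsigned
pt_index(uint64_t va, int level)
{
	return ((unsigned)((va >> (PAGE_SHIFT + LEVEL_BITS * level)) &
	    ((1u << LEVEL_BITS) - 1)));
}
```

One top-level entry covers 2^39 bytes (512GB), which is why a single entry
bounds both the direct map and the currently enabled user address space; a
level-1 entry covers 2^21 bytes, the 2MB page size.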
Approved by: re (blanket)
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded. Now we maintain a pool of mutexes (currently
32) to be shared by all plexes. This is still a lot better than the
splhigh() method used in other architectures.
expand_table: Add parameters file and line if we're debugging.
Approved by: re (jhb)
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded. Now we maintain a pool of mutexes (currently
32) to be shared by all plexes. This is still a lot better than the
splhigh() method used in other architectures.
Add and clarify comments.
Approved by: re (jhb)
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded. Now we maintain a pool of mutexes (currently
32) to be shared by all plexes. This is still a lot better than the
splhigh() method used in other architectures.
update_volume_config: Remove redundant diskconfig parameter.
expand_table: Add parameters file and line if we're debugging.
Approved by: re (jhb)
Submitted by: Ted Unangst <tedu@stanford.edu>
Correct some inaccurate and badly formatted comments.
config_subdisk: If our drive is down, ensure that the subdisk is
crashed. Previously it was possible for the subdisk
to be up when the drive was down.
Change the way the plex lock mutexes work. Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded. Now we maintain a pool of mutexes (currently
32) to be shared by all plexes. This is still a lot better than the
splhigh() method used in other architectures.
update_volume_config: Remove redundant diskconfig parameter.
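
A user-space sketch of the mutex pool idea, using pthread mutexes as a
stand-in for kernel mutexes. The modulo mapping from plex number to pool slot
and the helper names are assumptions; only the pool size of 32 comes from the
text.

```c
#include <pthread.h>
#include <stddef.h>

#define PLEXMUTEXES	32	/* pool size quoted in the text */

/* The pool lives outside the plex table, so growing the table moves no locks. */
static pthread_mutex_t plexmutex[PLEXMUTEXES];

static void
plexmutex_init(void)
{
	for (int i = 0; i < PLEXMUTEXES; i++)
		pthread_mutex_init(&plexmutex[i], NULL);
}

/* Map a plex number to a pool slot; plexes that collide share a lock. */
static pthread_mutex_t *
plex_lock_for(unsigned plexno)
{
	return (&plexmutex[plexno % PLEXMUTEXES]);
}
```

Sharing a slot costs some false contention between unrelated plexes, but the
mutex addresses stay stable when the plex table is reallocated.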
Approved by: re (jhb)
keep the thread state variable consistent with its real state.
i.e. Don't say it's on the run queue when it isn't.
Also clarify the associated comment.
Turns a double panic back to a single panic :-/
Approved by: re@ (jhb)
Thus, treat all page faults while in a critical section as fatal rather
than just those that occur with a non-empty spinlocks list. All such page
faults are fatal anyways. Calling trap_fatal() earlier increases the
chances of getting more useful panic messages and a possible DDB prompt.
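
The change in policy can be sketched as two predicates. The structure and
field names below are hypothetical simplifications of per-thread state, not
the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical, pared-down thread state; field names are illustrative. */
struct thread_sketch {
	int critnest;		/* critical section nesting depth */
	int spinlocks_held;	/* entries on the thread's spin lock list */
};

/* Old policy: fatal only if spin locks are held in the critical section. */
static bool
fault_fatal_old(const struct thread_sketch *td)
{
	return (td->critnest > 0 && td->spinlocks_held > 0);
}

/* New policy: any page fault inside a critical section is fatal. */
static bool
fault_fatal_new(const struct thread_sketch *td)
{
	return (td->critnest > 0);
}
```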
Approved by: re (scottl)
fault and data access fault install the PTE in question into
the VHPT table. However, a post-increment was missing and we
wrote the raw PTE data into the pagesize/access key field.
This leaves a corrupt VHPT entry.
o While here, remove the explicit cache purge. Insertion into
the translation implicitly purges any overlapping entries.
o Make sure there's a cycle break between the itc and the rfi.
o Whitespace fixes.
a mutex. The only volatile chain operations are insertion and deletion
but since updating an existing PTE also updates the VHPT entry itself,
and we have the VHPT mutex in both other cases, we also lock when we
update an existing PTE even though no chain operation is involved.
Note that we perform the insertion and deletion carefully enough that
we don't need to lock traversals. If we need to lock traversals, we
also need to lock from the exception handler, which we can't without
creating a trapframe.
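
One way to make chain insertion and deletion safe for unlocked traversals is
sketched below with C11 atomics. This is an illustration of the general
technique, not the pmap code: the types and function names are hypothetical,
and writers are assumed to hold the chain mutex (not shown).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct chain_entry {
	uint64_t tag;
	_Atomic(struct chain_entry *) next;
};

static _Atomic(struct chain_entry *) chain_head;

/* Writer (mutex held): link the entry fully, then publish it. */
static void
chain_insert(struct chain_entry *e)
{
	atomic_store_explicit(&e->next,
	    atomic_load_explicit(&chain_head, memory_order_relaxed),
	    memory_order_relaxed);
	/* A reader sees either the old head or a fully linked entry. */
	atomic_store_explicit(&chain_head, e, memory_order_release);
}

/* Writer (mutex held): unlink, leaving the victim's next pointer intact. */
static void
chain_remove(struct chain_entry *victim)
{
	_Atomic(struct chain_entry *) *link = &chain_head;
	struct chain_entry *e;

	while ((e = atomic_load_explicit(link, memory_order_relaxed)) != NULL) {
		if (e == victim) {
			atomic_store_explicit(link,
			    atomic_load_explicit(&victim->next,
			    memory_order_relaxed), memory_order_release);
			return;
		}
		link = &e->next;
	}
}

/* Reader: no lock taken. */
static struct chain_entry *
chain_lookup(uint64_t tag)
{
	struct chain_entry *e;

	e = atomic_load_explicit(&chain_head, memory_order_acquire);
	while (e != NULL && e->tag != tag)
		e = atomic_load_explicit(&e->next, memory_order_acquire);
	return (e);
}
```

Because a removed entry keeps its next pointer, a reader that was standing on
it can still reach the rest of the chain.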
We're now able to withstand a -j8 buildworld. More work is needed to
withstand Murphy fields. In other words: we still have a bogon...
Approved by: re@ (blanket)
to avoid Bad Things(TM) happening (eg: df crashing with a floating point
exception).
Submitted by: Harold Gutch <logix@foobar.franken.de>
Approved by: re (scottl)
- Fix visibility test for LONG_BIT and WORD_BIT. `#if defined(__FOO_VISIBLE)'
is always wrong because __FOO_VISIBLE is always defined (to 0 for
invisibility).
sys/<arch>/include/limits.h
sys/<arch>/include/_limits.h:
- Style fixes.
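
The visibility pitfall can be shown with the preprocessor alone. In this
sketch __FOO_VISIBLE stands in for a system header visibility macro, and the
two result macros are hypothetical names used only to record which branch was
taken.

```c
/* Visibility macros are always defined; 0 means "hidden". */
#define __FOO_VISIBLE	0

#if defined(__FOO_VISIBLE)	/* wrong: true even though the value is 0 */
#define DEFINED_TEST_EXPOSES	1
#else
#define DEFINED_TEST_EXPOSES	0
#endif

#if __FOO_VISIBLE		/* right: tests the value */
#define VALUE_TEST_EXPOSES	1
#else
#define VALUE_TEST_EXPOSES	0
#endif
```

The defined() test exposes the symbol even when it should be hidden, while
the value test does not.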
Submitted by: bde
Reviewed by: bsdmike
Approved by: re (scottl)
interrupted while writing into the VHPT table. While here, make sure
memory accesses are properly ordered. Tag invalidation must happen
first so that the hardware VHPT walker will not be able to match
this entry while we're updating it and we have to make sure the
new tag gets written only after the PTE is completely updated.
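
The ordering requirement can be sketched in C11 atomics. The entry layout,
the invalid-tag marker, and the function name are assumptions; the real
update is done in the pmap code with architecture-specific ordering
primitives, and a hardware walker is not modeled by the C memory model.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TAG_INVALID	(1ULL << 63)	/* hypothetical "no match" marker */

struct vhpt_entry_sketch {
	uint64_t pte;
	uint64_t itir;
	_Atomic uint64_t tag;
};

static void
vhpt_update(struct vhpt_entry_sketch *e, uint64_t pte, uint64_t itir,
    uint64_t tag)
{
	/* 1. Invalidate the tag so a walker cannot match a half-written entry. */
	atomic_store_explicit(&e->tag, TAG_INVALID, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	/* 2. Update the body of the entry. */
	e->pte = pte;
	e->itir = itir;

	/* 3. Write the new tag only after the body is complete. */
	atomic_store_explicit(&e->tag, tag, memory_order_release);
}
```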
Approved by: re (blanket)
previously committed cleared pcb_current_pmap prior to changing
the region registers, but that was removed before committing.
Since we don't normally (at all?) pass a NULL pointer, the bug
was mostly harmless. Fix it while I'm here...
I'm here because we need to have data serialization after writing
to the region registers. Not doing so was likely the cause of the
hangs we were experiencing. General exceptions in cpu_switch may
also be caused by the lack of serialization.
Approved by: re (blanket)
switching user regions (region 0-4) with schedlock. Avoid unnecessary
recursion on schedlock by moving the core functionality to another
function (pmap_switch()) where we assert schedlock is held. Turn
pmap_install() into a wrapper that grabs schedlock. This minimizes
the number of callsites that need to be changed.
Since we already have schedlock in cpu_switch() and cpu_throw(),
have them call pmap_switch() directly. These were also the only two
calls to pmap_install() outside pmap.c, so make pmap_install() static
and remove its prototype from pmap.h.
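
The wrapper arrangement can be sketched as follows. The lock is simulated
with a counter and the pmap type is a hypothetical stand-in; returning the
previous pmap is an assumption made so the example has something to check.

```c
#include <assert.h>
#include <stddef.h>

struct pmap_sketch {
	int id;
};

static int sched_lock_depth;		/* stand-in for the scheduler lock */
static struct pmap_sketch *current_pmap;

static void
sched_lock(void)
{
	sched_lock_depth++;
}

static void
sched_unlock(void)
{
	assert(sched_lock_depth > 0);
	sched_lock_depth--;
}

/* Core operation: callers must already hold the lock. */
static struct pmap_sketch *
pmap_switch(struct pmap_sketch *pm)
{
	struct pmap_sketch *prev;

	assert(sched_lock_depth > 0);
	prev = current_pmap;
	current_pmap = pm;	/* region register updates would go here */
	return (prev);
}

/* Wrapper for callers that do not hold the lock. */
static struct pmap_sketch *
pmap_install(struct pmap_sketch *pm)
{
	struct pmap_sketch *prev;

	sched_lock();
	prev = pmap_switch(pm);
	sched_unlock();
	return (prev);
}
```

Callers that already hold the lock call pmap_switch() directly and avoid
recursing on it; everything else keeps using the wrapper unchanged.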
Approved by: re (blanket)
Change config format slightly to save plex preferences correctly.
vinum_scandisk: reinitialise volatile pointer after function call.
This is the "deafc0de" bug.
Approved by: re (scottl)
processes in the first pass. Among other things, this will give
us a chance to launder vnode-backed pages before concluding that
we need more swap. This is particularly useful for systems that
have no swap.
While here, update a comment and remove some long-unused code.
Reported by: Lucky Green <shamrock@cypherpunks.to>
Suggested by: dillon
Approved by: re (rwatson)
This fixes net/pppoa port for Alcatel Speedtouch devices.
Submitted by: Jay Cornwall <jay@evilrealms.net>
Tested by: Francois Rogler <francois@rogler.org>
Approved by: re (scottl)