backing store before we discard them. It is possible that we
enter the kernel (due to an execve in this case) with a lot of
dirty user registers and that the RSE has only partially spilled
them (to make room for new frames). We cannot move the backing
store pointer down (to discard user registers) when not all of
the user registers are on the backing store.
So, we flush the register stack IFF this happens. Unconditionally
doing the flush is too costly, because the condition in which we
need to flush is very rare.
This change appears to fix the SIGSEGV that sometimes happen for
newly executed processes and so far also appears to fix the last
of the corruption. It is possible, although not likely, that this
change prevents some other bug from happening, even though it is
itself not a fix. Hence the uncertainty. We'll know in a couple
of months I guess :-)
the stack to be changed in a way incompatible with elf32_map_insert()
where we used data_buf without initializing it for when the partial
mapping resulting in a misaligned image (typical when the page size
implied by the image is not the same as the page size in use by the
kernel). Since data_buf is passed by reference to vm_map_find(), the
compiler cannot warn about it.
While here, move all local variables to the top of the function.
of C strings internally; C strings require a lot of return value
checking that (a) takes a lot of space, and (b) is difficult to get
right. Prior to the advent of compartment support, modeling APIs
for helper functions on snprintf worked fine; with the additional
complexity, the sbuf_printf() API makes a lot more sense.
While doing this, break out the printing of sequential compartment
lists into a helper function, mac_{biba,mls}_compartment_to_string().
This permits the main body of mac_{biba,mls}_element_to_string()
to be concerned only with identifying sequential ranges rather
than rendering.
At a less disruptive moment, we'll push the move from snprintf()-like
interface to sbuf()-like interface up into the MAC Framework layer.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Use ->init() and ->fini() to handle the mutex in geom_disk.c
Remove the g_add_class() function and replace it with a standardized
g_modevent() function.
This adds the basic infrastructure for loading/unloading GEOM classes
only meaningful for fragments. Also don't bother to byte-swap the
ip_id when we do generate it; it is only used at the receiver as a
nonce. I tried several different permutations of this code with no
measurable difference to each other or to the unmodified version, so
I've settled on the one for which gcc seems to generate the best code.
(If anyone cares to microoptimize this differently for an architecture
where it actually matters, feel free.)
Suggested by: Steve Bellovin's paper in IMW'02
switch to it before calling mi_startup(). The bootstack is WAY too small
for running acpica during probe/attach. While here, pass modulep/physfree
to the startup routine, rather than writing to the global variables in
locore.S.
Approved by: re (amd64/*)
This machine uses a non-standard scheme to specify the interrupts to
be assigned for devices in PCI slots; instead of giving the INO
or full interrupt number (which is done for the other devices in this
box), the firmware interrupt properties contain intpin numbers, which
have to be swizzled as usual on PCI-PCI bridges; however, the PCI host
bridge nodes have no interrupt map, so we need to guess the
correct INO by slot number of the device or the closest PCI-PCI
bridge leading to it, and the intpin.
To do this, this fix makes the following changes:
- Add a newbus method for sparc64 PCI host bridges to guess
the INO, and glue code in ofw_pci_orb_callback() to invoke it based
on a new quirk entry. The guessing is only done for interrupt numbers
too low to contain any IGN found on e450s.
- Create another new quirk entry was created to prevent mapping of EBus
interrupts at PCI level; the e450 has full INOs in the interrupt
properties of EBus devices, so trying to remap them could cause
problems.
- Set both quirk entries for e450s; remove the no-swizzle entry.
- Determine the psycho half (bus A or B) a driver instance manages
in psycho_attach()
- Implement the new guessing method for psycho, using the slot number,
psycho half and property value (intpin).
Thanks go to the testers, especially Brian Denehy, who tested many kernels
for me until I had found the right workaround.
Tested by: Brian Denehy <B.Denehy@90east.com>, jake, fenner,
Marius Strobl <marius@alchemy.franken.de>,
Marian Dobre <mari@onix.ro>
Approved by: re (scottl)
The current name is confusing, because it indicates to
the client that a bus_dmamap_sync() operation is not
necessary when the flag is specified, which is wrong.
The main purpose of this flag is to hint the underlying
architecture that DMA memory should be mapped in a coherent
way, but the architecture can ignore it. But if the
architecture does supports coherent mapping of memory, then
it makes bus_dmamap_sync() calls cheap.
This flag is the same as the one in NetBSD's Bus DMA.
Reviewed by: gibbs, scottl, des (implicitly)
Approved by: re@ (jhb)
nfs_lock.c. Right now, if we permit a signal to interrupt the sleep,
we will slip the lock and no process on that client, the server, or
any other client will be able to acquire the lock. This can happen,
for example, if a user hits Ctrl-C or Ctrl-T while a process is
waiting for the lock. By removing PCATCH, we prevent that from
happening, at the cost of not permitting a user-requested lock abort:
also nasty. However, a user interface bug might be preferable to a
serious semantic bug, so we go with that for now.
We need to teach the rpc.lockd/kernel protocol how to abort lock
requests, and rpc.lockd how to handle aborted lock requests; patches
for the kernel bit are floating around, but no rpc.lockd bit yet.
Approved by: re (scottl)
mismerged from the MAC tree, and didn't get picked up because warnings
are not normally fatal in per-module builds, only when they are linked
into a kernel (such as LINT).
Reported by: des and the technicolor tinderbox
Approved by: re (scottl)
Use the special LUNLEN_SINGLE_LEVEL constant for
post Rev A4 hardware for single byte luns. Without
this change, Rev B hardware would place the single
byte of lun data in byte 0 of the lun structure when
it should be in byte 1. Since there are few if any
devices on the market that support multiple luns in
target mode, the corrupted lun field (which was only
corrupted for non-zero luns) wasn't hurting us.
Approved by: re (rwatson)
aic79xx.h:
aic79xx.reg:
Return the SCB_TAG field to 16byte alignment.
It seems that on some PCI systems, SCBs are not
transferred correctly to the controller with
the previous placement of the SCB_TAG field.
Approved by: re (rwatson)
section to stop gcc generating the dwarf2 .eh_frame unwind tables. It
is dead weight for the time being. Maybe it can be used to perform
stack traces and/or get the location of function arguments in ddb, but
that requires a dwarf2 runtime interpreter, which we do not have.
Approved by: re (amd64 "safe" bits)
disassembler has not been updated yet, and will do some very strange
things. It does tracebacks (without function arguments due to regparm
calling conventions) if -fno-omit-frame-pointer is used (to come later).
This achieves basic functionality.
Approved by: re (amd64/* blanket)
in the kernel, the sysctl_register() call would fail, as expected.
However, when unloading this module again, the kernel would then panic
in sysctl_unregister(). Print a message error instead.
Submitted by: Nicolai Petri <nicolai@catpipe.net>
Reviewed by: imp
Approved by: re@ (jhb)
value to be written into tick_compare in tick_hardclock(). While
we were taking care that the value to be written was at least TICK_GRACE
ticks in the future, a vector interrupt could happen between calculating
the value and writing it. If it took longer than TICK_GRACE to complete
(which is doubtful for a single device-triggered vector interrupt, but
quite likely for some IPIs), the value written would be in the past
and tick interrupts (which drive hardclock and statclock) would stop
until %tick wraps around, which takes a long time.
Also, increase TICK_GRACE from 1000 to 10000 for good measure.
Reported by: kris
Reviewed by: jake
Approved by: re (scottl)
ID allocation is not there yet. This fixes a few warnings about \_OS_ not
being found and an S3 freeze for one user.
Re-staticize AcpiNsRemoveReference() since it is not needed elsewhere.
Approved by: re (scottl)
buf_start() to avoid triggering a panic in softdep_disk_io_initiation()
if b_iocmd happened to be BIO_READ. The later initialisation of
b_iocmd in cluster_wbuild() could probably be moved to before the
buf_start() call, but this patch keeps the change as simple as
possible.
This is reported to fix occasional "softdep_disk_io_initiation: read"
panics, especially on NFS servers.
Reported by: Nick Hilliard <nick@netability.ie>
Tested by: Nick Hilliard <nick@netability.ie>
Approved by: re (rwatson)
function couldn't handle chains of > MCLBYTES, and it had a bug which
caused corruption and panics in certain low mbuf situations.
Additionally, change the failure case so that looutput returns ENOBUFS
rather than attempting to pass on non-defragmented mbuf chains.
Finally, remove the printf which would happen every time the low memory
situation occured. It served no useful purpose other than to clue me
in as to what was causing the panic in question. :)
MFC after: 4 days
865. The APSIZE register has a variable-sized field of enabled bits.
To figure out how many bits a specific host bridge supports, write the
maximum width and see how many bits are set in the hardware. We then
use this mask for setting and getting the aperture size. Prior to this,
the agp(4) driver would treat an aperture size of 256 MB as 128 MB and
would not allocate enough physical memory for the GART as a result.
MFC after: 3 days
Sponsored by: The Weather Channel
Approved by: re (rwatson)
NetBSD dsmethod.c rev 1.7
Fix parent-child loop problem
Fix a reference count problem that may cause unexpected memory free
Intel 20030512 ACPICA drop (nsalloc.c)
Approved by: re (jhb)
Obtained from: NetBSD, Intel
Reported by: mbr, kochi AT netbsd.org
BUS_DMASYNC_ definitions remain as before. The does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.
Approved by: re (bmah)
used by DDB and we cannot know in advance whether it's save to
sleep. It often enough isn't. We may want to pre-allocate space
to cover the most common cases without having to use malloc at
all, but that requires some analysis. We leave that for later.
Approved by: re@ (blanket)
o If the address was not within user space we jumped to fusufault
where we would clear pcb_onfault and return 0. There are two
bugs here:
1. We never got to the point where we assigned the address of
pcb_onfault to r15, which means that we would clobber some
random memory location, including I/O space or ROM.
2. We're supposed to return -1 on error.
o Make sure we have proper memory ordering for setting pcb_onfault,
doing the memory access to user space and clearing pcb_onfault.
For the fu* family of functions this means that we need a mf
instruction, because we don't have acquire semantics on stores
and release semantics on loads (hence st;ld cannot be ordered
without intermediate mf).
While here, implement casuptr() so that we are a (small) step
closer to supporting libthr and deobfuscate the non-implementation
of {f|s}uswintr.
Approved by: re@ (blanket)
VM_ALLOC_INTERRUPT to VM_ALLOC_SYSTEM. There was no mention of
this in commit log as it was considered harmless. Guess what:
it does harm. WITNESS showed that we can not safely grab the
page queue lock in vm_page_alloc() in all cases as we may have
to sleep on it. Revert the request to VM_ALLOC_INTERRUPT to
circumvent this. We panic if vm_page_alloc returns 0. I'm not
entirely happy about this, but we have bigger fish to fry.
Approved by: re@ (blanket)
aic79xx.c:
In ahd_handle_ign_wide_residue():
o Use SCB_XFERLEN_ODD SCB field to determine transfer
"oddness" rather than the DATA_COUNT_ODD logic.
SCB_XFERLEN_ODD is toggled on every ignore wide
residue message so that multiple ignore wide residue
messages for the same transaction are properly supported.
o If the sg list has been exausted, the sequencer
doesn't bother to update the residual data count
since it is known to be zero. Perform the zeroing
manually before calculating the remaining data count.
o Use multibyte in/out macros instead of shifting/masking
by hand.
aic79xx_inline.h:
In ahd_setup_scb_common(), setup the SCB_XFERLEN_ODD field.
aic79xx.reg:
Use the SCB_TASK_ATTRIBUTE field as a bit field in the
non-packetized case. We currently only define one bit,
SCB_XFERLEN_ODD.
Remove the ODD_SEG bit field that was used to carry the odd
transfer length information through the SG cache. This
is obviated by SCB_XFERLEN_ODD field.
Remove the DATA_COUNT_ODD scratch ram byte that was used
dynamicaly compute data transfer oddness. This is obviated
by SCB_XFERLEN_ODD field.
aic79xx.seq:
Remove all updates to the DATA_COUNT_ODD scratch ram field.
Remove all uses of ODD_SEG. These two save quite a few
sequencer instructions.
Use SCB_XFERLEN_ODD to validate the end of transfer
ignore wide residue message case.
aic7xxx.c:
In ahc_handle_ign_wide_residue():
o Use SCB_XFERLEN_ODD SCB field to determine transfer
"oddness" rather than the DATA_COUNT_ODD logic.
SCB_XFERLEN_ODD is toggled on every ignore wide
residue message so that multiple ignore wide residue
messages for the same transaction are properly supported.
o If the sg list has been exausted, the sequencer
doesn't bother to update the residual data count
since it is known to be zero. Perform the zeroing
manually before calculating the remaining data count.
o Ensure that SG_LIST_NULL is cleared in the
residual sg pointer for "mid-transfer" ignore
wide residue cases.
o Use multibyte in/out macros instead of shifting/masking
by hand.
aic7xxx.h:
Modify the SCB_GET_LUN() macro to mask the lun hardware
SCB field with LID. This leaves two bits in the LUN
field that can be used for other purposes.
aic7xxx.reg:
Change LID to be 0x3F. This is the maximum supported
lun size for non-packetized SCSI. Map the top bit
of the lun to SCB_XFERLEN_ODD. The host must set
this bit whenever a transfer is an odd length.
Remove the ODD_SEG bit field that was used to carry the odd
transfer length information through the SG cache. This
is obviated by SCB_XFERLEN_ODD field.
Remove the DATA_COUNT_ODD scratch ram byte that was used
dynamicaly compute data transfer oddness. This is obviated
by SCB_XFERLEN_ODD field.
aic7xxx.seq:
Be more careful in our handling of the SCB_LUN field. It
must be masked with LID if only lun information is desired.
Remove all updates to the DATA_COUNT_ODD scratch ram field.
Remove all uses of ODD_SEG. These two save quite a few
sequencer instructions.
Use SCB_XFERLEN_ODD to validate the end of transfer
ignore wide residue message case.
aic7xxx_inline.h:
In ahc_queue_scb(), setup the SCB_XFERLEN_ODD field.
Approved by: RE
FAILDIS in the SEQCTL register, not the HCNTRL register.
aic7xxx.c:
Remeber SEQCTL settings in the "seqctl" field of our
softc. seqctl defaults to just having FASTMODE set,
but the bus attachments can override this.
aic7xxx.h:
Add the seqctl softc field.
aic7xxx_pci.c:
Update the seqctl softc field and manually update SEQCTL
when to many PCI errors occur
Approved by: RE
to be more efficient by having the sequencer copy the
single byte of valid lun data into the long lun field.
aic79xx.c:
Memset our hardware SCB to 0 so that untouched
fields don't confuse diagnostic output. With the
old method for handling the Rev A bug, if the long
lun field was not 0, this could result in bogus
lun information being sent to drives.
Use the same SCB transfer size for all chip types
now that the long lun is not DMA'ed to the chip.
aic79xx.seq:
Add code to copy lun information for Rev.A hardware.
aic79xx_inline.h:
Remove host update of the long_lun field on every
packetized command.
Sort IDs based on chip type.
Remove IROC IDs. We'll switch to using the IROC masks
if/when we want to start attaching to IROC controllers.
Approved by: RE
because we could fail due to a small buffer and loop and rerun. If this
happens, then the vsnprintf() will have already taken the arguments off
the va_list. For i386 and others, this doesn't matter because the
va_list type is a passed as a copy. But on powerpc and amd64, this is
fatal because the va_list is a reference to an external structure that
keeps the vararg state due to the more complicated argument passing system.
On amd64, arguments can be passed as follows:
First 6 int/pointer type arguments go in registers, the rest go on
the memory stack.
Float and double are similar, except using SSE registers.
long double (80 bit precision) are similar except using the x87 stack.
Where the 'next argument' comes from depends on how many have been
processed so far and what type it is. For amd64, gcc keeps this state
somewhere that is referenced by the va_list.
I found a description that showed the va_copy was required here:
http://mirrors.ccs.neu.edu/cgi-bin/unixhelp/man-cgi?va_end+9
The single unix spec doesn't mention va_copy() at all.
Anyway, the problem was that the sysctl kern.geom.conf* nodes would panic
due to walking off the end of the va_arg lists in vsnprintf. A better fix
would be to have sbuf_vprintf() use a single pass and call kvprintf()
with a callback function that stored the results and grew the buffer
as needed.
Approved by: re (scottl)
is not pretty, but it fixes the code so that it no longer violates the
vnode locking rules in the VFS API and doesn't trip any of the locking
assertions enabled by the DEBUG_VFS_LOCKS kernel configuration option.
There is one report that this patch fixed a "locking against myself"
panic on an NFS server that was tripped by a diskless client.
Approved by: re (scottl)
structure, which is new to the 82550 and 82551, is used to transmit
a packet. This appears to fix the packet truncation problem that was
observed when using 82550-based fxp cards to transmit ICMP or fragmented
UDP packets of certain lengths which only had one to three bytes in the
second and final mbuf of the packet. This matches a note in the "Intel
8255x 10/100 Mbps Ethernet Controller Family Open Source Software Developer
Manual", which says that the hardware parse bit should be set when sending
these types of packets.
There have also been unconfirmed reports of similar problems when
transmitting TCP packets, which should not be affected by the above
mentioned change because the hardware parse bit was already being set
if the stack requested hardware checksumming of the packet. If the
problem remains, the use of the IPCB structure can be disabled to
cause the driver to fall back to using the older 82559 interface with
82550-based cards by setting
hint.fxp.UNIT_NUMBER.ipcbxmit_disable
to a non-zero value at boot time, or using kenv to set this variable
before using kldload to load the fxp driver.
Approved by: re (jhb)
kernel's VA regions, we cannot limit the use of break-based
syscalls to user mode only. The signal trampolines are in the
gateway page, which is mapped into the process address space in
region 5 and thus is kernel space.
We don't special case the gateway page here. Allow break-based
syscalls from anywhere in the kernel VA space.
Approved by: re@ (blanket)
set on realtek cards, but they work without it (and don't work with
it). The standard seems to imply that this is just a hint anyway, so
this should be harmless. It doesn't appear to be set on any other
cardbus cards that I have (or have seen).
This should make the rl based CardBus cards work again. I've been
running it for about a month now.
Approved by: re@ (jhb)
to userland with interrupts disabled until we restore PSR. However,
it has been observed that interrupts do actually happen before they
are enabled again. This is a bit surprising and I don't know yet
what's going on exactly. Nevertheless, the code was not crafted
carefully enough to allow interrupts to happen and we could
clobber the kernel stack of another thread when interrupts did
happen.
This is what happens: we restore the (memory) stack pointer (sp)
and the register stack base prior to restoring ar.k6 and ar.k7.
This is not a problem if interrupts don't happen between setting
sp/ar.bspstore and ar.k6/ar.k7. Alas, interrupts can happen.
Since sp/ar.bspstore already point to the userland stacks, we
need to switch to the kernel stack in interrupt. However, ar.k6
and ar.k7 have not been set, which means that we were switching
to some unrelated kstack and happily clobbered the trapframe
present there if the thread to which the kstack belonged was
in kernel mode or otherwise we could have our trapframe clobbered
if that other thread enters the kernel. Nasty either way.
We now carefully restore ar.k6 prior to restoring ar.bspstore and
likewise for ar.k7 and sp. All we need is the guarantee that an
interrupt does not clobber ar.k6 or ar.k7 before we're back in
userland. That has been achieved by restoring ar.k6/ar.k7
unconditionally (see exception.s)
While here, remove the disabling of interrupts on EPC entry. It
was added as a way to "resolve" the crashes until it was understood
what was going on. I think I achieved the latter, so we can remove
the patch. Note that setting up a trapframe with interrupts
enabled has it's own share of corner cases, but it's better to
properly fixed those than to keep a mostly wrong patch around
because we're afraid to remove it...
Approved by: re@ (blanket)
PSR only to achieve setting PSR.i back to it's previous value. It
makes it impossible to change any of the 30+ other unrelated bits
when done between intr_disable() and intr_restore(). That's bad.
Instead have intr_disable() return 1 when interrupts were previously
enabled and 0 otherwise and only enable interrupts in intr_restore()
when given a non-0 value.
This change specifically disallows using intr_restore() to disable
interrupts. The reason is simple: interrupts only need to be restored
after they are being disabled, which means that intr_restore() is
called with interrupts disabled and we only need to enable them if
they were previously enabled.
This change does not fix any bugs, other than that it bugged me...
Approved by: re@ (blanket)
and user mode. We need to take into account that the EPC syscall path
introduces a grey area in which one can argue either way, including a
third: neither.
We now use the region in which the IP address lies. Regions 5, 6 and 7
are kernel VA regions and if the IP lies any any of those regions we
assume we're in kernel mode. Hence, we can be in kernel mode even if
we're not on the kernel stack and/or have user privileges. There're
gremlins living in the twilight zone :-)
For the EPC syscall path this particularly means that the process
leaves user mode the moment it calls into the gateway page. This
makes the most sense because from a process' point of view the call
represents a request to the kernel for some service and that service
has been performed if the call returns. With the metric we picked,
this also means that we're back in user mode IFF the call returns.
Approved by: re@ (blanket)
when returning from an interrupt. Both registers are used on interrupt
to switch to the right kernel stack, but other than that they are not
used. This means we only have to make sure they contain proper values
while in user mode. As such, we conditionally restored these registers
based on whether we returned to userland or not. A nice property of
conditionally restoring ar.k6 and ar.k7 is that it introduces two
invariants: ar.k6 always points to the bottom of the kernel stack and
ar.k7 always points to the top of the kernel stack (immediately below
the PCB we have there).
However, the EPC syscall path introduces an irregularity: there's no
"thin red line" between user and kernel. There's a grey area that's a
couple of instructions wide. Any interruption in that grey area is
bound to see an inconsistent state. One such state is that we're in
kernel space for all practical purposes, but we still need to have
ar.k6 and ar.k7 restored as if we're in userland.
Thus: restore ar.k6 and ar.k7 unconditionally at the cost of losing
a valuable invariant. Both registers now hold the extend of the
usable portion of the kernel stack at any interrupt nesting, which
when in userland mean the bottom and the top of the kstack.
On alpha, PAL is involved in context management and after wiring
the CPU (in alpha_init()) a context switch was performed to tell
PAL about the context. This was bogusly brought over to ia64
where it introduced bugs, because we restored the context from
a mostly uninitialized PCB.
The cleanup constitutes:
o Remove the unused arguments from ia64_init().
o Don't return from ia64_init(), but instead call mi_startup()
directly. This reduces the amount of muckery in assembly and
also allows for the next bullet:
o Save our currect context prior to calling mi_startup(). The
reason for this is that many threads are created from thread0
by cloning the PCB. By saving our context in the PCB, we have
something sane to clone. It also ensures that a cloned thread
that does not alter the context in any way will return to
the saved context, where we're ready for the eventuality with
a nice, user unfriendly panic().
The cleanup fixes at least the following bugs:
o Entering mi_startup() with the RSE in enforced lazy mode.
o Re-execution of ia64_init() in certain "lab" conditions.
While here, add proper unwind directives to __start() so that
the unwind knows it has reached the bottom of the (call) stack.
Approved by: re@ (blanket)
When interrupting a kernel context, we don't need to switch stacks
(memory nor register). As such, we were also not restoring the
register stack pointer (ar.bspstore). This, however, fails to be
valid in 1 situation: when we interrupt a register stack switch as
is being done in restorectx(). The problem is that restorectx()
needs to have ar.bsp == ar.bspstore before it can assign the new
value to ar.bspstore. This is achieved by doing a loadrs prior to
assigning to ar.bspstore. If we take an interrupt in between the
loadrs and the assignment and we don't make sure we restore the
ar.bspstore prior to returning from the interrupt, we switch
stacks with possibly non-zero dirty registers, which means that
the new frame pointer (ar.bsp) will be invalid.
So, instead of jumping over the restoration of the register frame
pointer and related registers, we conditionalize it based on whether
we return to kernel context or user context. A future performance
tweak is possible by only restoring ar.bspstore when returning to
kernel mode *and* when the RSE is in enforced lazy mode. One cannot
assume ar.bsp == ar.bspstore if the RSE is not in enforced lazy mode
anyway.
While here (well, not quite) don't unconditionally assign to
ar.bspstore in exception_save. Only do that when we actually switch
stacks. It can only harm us to do it unconditionally.
Approved by: re@ (blanket)
register stack. There's nothing really wrong with flushing before
putting the RSE in enforced lazy mode, provided you don't depend on
ar.bspstore being equal to ar.bsp when the RSE has been put in
enforced lazy more. The small window between the flush and setting
the RSE may be sufficient to have the RSE eagerly increase the dirty
region (and hence cause ar.bspstore != ar.bsp) or have an interrupt
that may even get the laziest RSE to do something.
Anyway: we don't depend on ar.bspstore being equal to ar.bsp, so
nothing was and is broken. But the code was non-intuitive and
easily confuses. This is a source of future bugs.
Note: the advantage of not depending on ar.bspstore is that there's
some recilience against an interrupted flushrs. Clobbering is limited
to stacked register contents only, not to RSE address clobbering.
Approved: re@ (blanket)
size and the kernel's heap size, specifically, vm_kmem_size. This
function allows a maximum of 40% of the vm_kmem_size to be used for
vnodes and vm objects. This is a conservative bound based upon recent
problem reports. (In other words, a slight increase in this percentage
may be safe.)
Finally, machines with less than ~3GB of RAM should be unaffected
by this change, i.e., the maximum number of vnodes should remain
the same. If necessary, machines with 3GB or more of RAM can increase
the maximum number of vnodes by increasing vm_kmem_size.
Desired by: scottl
Tested by: jake
Approved by: re (rwatson,scottl)
buffer space instead of a u_int32_t. Otherwise the upper 32 bits of
the address space get truncated and syscons blows up.
Approved by: re (safe, low risk amd64 fixes)
having their stack at the 512GB mark. Give 4GB of user VM space for 32
bit apps. Note that this is significantly more than on i386 which gives
only about 2.9GB of user VM to a process (1GB for kernel, plus page
table pages which eat user VM space).
Approved by: re (blanket)
systems. Of note:
- Implement a direct mapped region using 2MB pages. This eliminates the
need for temporary mappings when getting ptes. This supports up to
512GB of physical memory for now. This should be enough for a while.
- Implement a 4-tier page table system. Most of the infrastructure is
there for 128TB of userland virtual address space, but only 512GB is
presently enabled due to a mystery bug somewhere. The design of this
was heavily inspired by the alpha pmap.c.
- The kernel is moved into the negative address space(!).
- The kernel has 2GB of KVM available.
- Provide a uma memory allocator to use the direct map region to take
advantage of the 2MB TLBs.
- Fixed some assumptions in the bus_space macros about the ability
to fit virtual addresses in an 'int'.
Notable missing things:
- pmap_growkernel() should be able to grow to 512GB of KVM by expanding
downwards below kernbase. The kernel must be at the top 2GB of the
negative address space because of gcc code generation strategies.
- need to fix the >512GB user vm code.
Approved by: re (blanket)
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded. Now we maintain a pool of mutexes (currently
32) to be shared by all plexes. This is still a lot better than the
splhigh() method used in other architectures.
expand_table: Add parameters file and line if we're debugging.
Approved by: re (jhb)
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded. Now we maintain a pool of mutexes (currently
32) to be shared by all plexes. This is still a lot better than the
splhigh() method used in other architectures.
Add and clarify comments.
Approved by: re (jhb)
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded. Now we maintain a pool of mutexes (currently
32) to be shared by all plexes. This is still a lot better than the
splhigh() method used in other architectures.
Approved by: re (jhb)
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded. Now we maintain a pool of mutexes (currently
32) to be shared by all plexes. This is still a lot better than the
splhigh() method used in other architectures.
update_volume_config: Remove redundant diskconfig parameter.
expand_table: Add parameters file and line if we're debugging.
Approved by: re (jhb)
Submitted by: Ted Unangst <tedu@stanford.edu>
Correct some inaccurate and badly formatted comments.
config_subdisk: If our drive is down, ensure that the subdisk is
crashed. Previously it was possible for the subdisk
to be up when the drive was down.
Change the way the plex lock mutexes work. Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded. Now we maintain a pool of mutexes (currently
32) to be shared by all plexes. This is still a lot better than the
splhigh() method used in other architectures.
update_volume_config: Remove redundant diskconfig parameter.
Approved by: re (jhb)
keep the thread state variable consistent with its real state.
i.e. Don't say it's on the run queue when it isn't.
Also clarify the associated comment.
Turns a double panic back to a single panic :-/
Approved by: re@ (jhb)
Thus, treat all page faults while in a critical section as fatal rather
than just those that occur with a non-empty spinlocks list. All such page
faults are fatal anyways. Calling trap_fatal() earlier increases the
chances of getting more useful panic messages and a possible DDB prompt.
Approved by: re (scottl)
failt and data access fault install the PTE in question into
the VHPT table. However, a post-increment was missing and we
wrote the raw PTE data into the pagesize/access key field.
This leaves a corrupt VHPT entry.
o While here, remove the explicit cache purge. Insertion into
the translation implicitly purges any overlapping entries.
o Make sure there's a cycle break between the itc and the rfi.
o Whitespace fixes.
a mutex. The only volatile chain operations are insertion and deletion
but since updating an existing PTE also updates the VHPT entry itself,
and we have the VHPT mutex in both other cases, we also lock when we
update an existing PTE even though no chain operation is involved.
Note that we perform the insertion and deletion careful enough that
we don't need to lock traversals. If we need to lock traversals, we
also need to lock from the exception handler, which we can't without
creating a trapframe.
We're now able to withstand a -j8 buildworld. More work is needed to
withstand Murphy fields. In other words: we still have a bogon...
Approved by: re@ (blanket)
to avoid Bad Things(TM) happening (eg: df crashing with a floating point
exception).
Submitted by: Harold Gutch <logix@foobar.franken.de>
Approved by: re (scottl)
- Fix visibilty test for LONG_BIT and WORD_BIT. `#if defined(__FOO_VISIBLE)'
is alays wrong because __FOO_VISIBLE is always defined (to 0 for
invisibility).
sys/<arch>/include/limits.h
sys/<arch>/include/_limits.h:
- Style fixes.
Submitted by: bde
Reviewed by: bsdmike
Approved by: re (scottl)