Thus, treat all page faults while in a critical section as fatal rather
than just those that occur with a non-empty spinlocks list. All such page
faults are fatal anyways. Calling trap_fatal() earlier increases the
chances of getting more useful panic messages and a possible DDB prompt.
Approved by: re (scottl)
failt and data access fault install the PTE in question into
the VHPT table. However, a post-increment was missing and we
wrote the raw PTE data into the pagesize/access key field.
This leaves a corrupt VHPT entry.
o While here, remove the explicit cache purge. Insertion into
the translation implicitly purges any overlapping entries.
o Make sure there's a cycle break between the itc and the rfi.
o Whitespace fixes.
a mutex. The only volatile chain operations are insertion and deletion
but since updating an existing PTE also updates the VHPT entry itself,
and we have the VHPT mutex in both other cases, we also lock when we
update an existing PTE even though no chain operation is involved.
Note that we perform the insertion and deletion careful enough that
we don't need to lock traversals. If we need to lock traversals, we
also need to lock from the exception handler, which we can't without
creating a trapframe.
We're now able to withstand a -j8 buildworld. More work is needed to
withstand Murphy fields. In other words: we still have a bogon...
Approved by: re@ (blanket)
to avoid Bad Things(TM) happening (eg: df crashing with a floating point
exception).
Submitted by: Harold Gutch <logix@foobar.franken.de>
Approved by: re (scottl)
- Fix visibilty test for LONG_BIT and WORD_BIT. `#if defined(__FOO_VISIBLE)'
is alays wrong because __FOO_VISIBLE is always defined (to 0 for
invisibility).
sys/<arch>/include/limits.h
sys/<arch>/include/_limits.h:
- Style fixes.
Submitted by: bde
Reviewed by: bsdmike
Approved by: re (scottl)
interrupted while writing into the VHPT table. While here, make sure
memory accesses a properly ordered. Tag invalidation must happen
first so that the hardware VHPT walker will not be able to match
this entry while we're updating it and we have to make sure the new
new tag gets written only after the PTE is completely updated.
Approved by: re (blanket)
previously committed cleared pcb_current_pmap prior to changing
the region registers, but that was removed before committing.
Since we don't normally (at all?) pass a NULL pointer, the bug
was mostly harmless. Fix it while I'm here...
I'm here because we need to have data serialization after writing
to the region registers. Not doing so was likely the cause of the
hangs we were experiencing. General exceptions in cpu_switch may
also be caused by the lack of serialization.
Approved by: re (blanket)
switching user regions (region 0-4) with schedlock. Avoid unnecessary
recursion on schedlock by moving the core functionality to another
function (pmap_switch()) where we assert schedlock is held. Turn
pmap_install() into a wrapper that grabs schedlock. This minimizes
the number of callsites that need to be changed.
Since we already have schedlock in cpu_switch() and cpu_throw(),
have them call pmap_switch() directly. These were also the only two
calls to pmap_install() outside pmap.c, so make pmap_install() static
and remove its prototype from pmap.h
Approved by: re (blanket)
Change config format slightly to save plex preferences correctly.
vinum_scandisk: reinitialise volatile pointer after function call.
This is the "deafc0de" bug.
Approved by: re (scottl)
processes in the first pass. Among other things, this will give
us a chance to launder vnode-backed pages before concluding that
we need more swap. This is particularly useful for systems that
have no swap.
While here, update a comment and remove some long-unused code.
Reported by: Lucky Green <shamrock@cypherpunks.to>
Suggested by: dillon
Approved by: re (rwatson)
This fixes net/pppoa port for Alcatel Speedtouch devices.
Submitted by: Jay Cornwall <jay@evilrealms.net>
Tested by: Francois Rogler <francois@rogler.org>
Approved by: re (scottl)
prime objectives are:
o Implement a syscall path based on the epc inststruction (see
sys/ia64/ia64/syscall.s).
o Revisit the places were we need to save and restore registers
and define those contexts in terms of the register sets (see
sys/ia64/include/_regset.h).
Secundairy objectives:
o Remove the requirement to use contigmalloc for kernel stacks.
o Better handling of the high FP registers for SMP systems.
o Switch to the new cpu_switch() and cpu_throw() semantics.
o Add a good unwinder to reconstruct contexts for the rare
cases we need to (see sys/contrib/ia64/libuwx)
Many files are affected by this change. Functionally it boils
down to:
o The EPC syscall doesn't preserve registers it does not need
to preserve and places the arguments differently on the stack.
This affects libc and truss.
o The address of the kernel page directory (kptdir) had to
be unstaticized for use by the nested TLB fault handler.
The name has been changed to ia64_kptdir to avoid conflicts.
The renaming affects libkvm.
o The trapframe only contains the special registers and the
scratch registers. For syscalls using the EPC syscall path
no scratch registers are saved. This affects all places where
the trapframe is accessed. Most notably the unaligned access
handler, the signal delivery code and the debugger.
o Context switching only partly saves the special registers
and the preserved registers. This affects cpu_switch() and
triggered the move to the new semantics, which additionally
affects cpu_throw().
o The high FP registers are either in the PCB or on some
CPU. context switching for them is done lazily. This affects
trap().
o The mcontext has room for all registers, but not all of them
have to be defined in all cases. This mostly affects signal
delivery code now. The *context syscalls are as of yet still
unimplemented.
Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.
Besides more efficient context saving and restoring, which of
course yields a noticable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.
This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be notived as an improvement if it's noticed at all.
Approved by: re@ (jhb)
the vnode and restart the loop. Vflush() is vulnerable since it does not
hold a reference to the vnode and it holds no other locks while waiting
for the vnode lock. The vnode will no longer be on the list when the
loop is restarted.
Approved by: re (rwatson)
switching to kernel_pmap. The pmap is not special enough.
o Clear the active bit on the pmap we're switching out.
o Fix some nearby style(9) bugs.
Approved by: re@
kernel_vm_end in pmap_bootstrap. Don't delay the initialization until
we need to grow the kernel VM space. This BTW happens twice before
we enter either single- or multi-user mode. Don't adjust kernel_vm_end
while growing based on whether the KPT contains a non-NULL entry. We
trust kernel_vm_end to be correct and we make sure it's still correct
after growing.
Define virtual_avail and virtual_end in terms of VM_MIN_KERNEL_ADDRESS
and VM_MAX_KERNEL_ADDRESS (resp). Don't hardcode region knowledge.
o Limit the size of the region ID map to 64KB. This gives a bitmap
that is large enough to keep track of 2^19 numbers. The minimal map
size is 32KB. The reason we limit the map size is that processor
models may have implemented a 24-bit region ID, which would give
a 2MB bitmap while the maximum number of allocations is always
less than PID_MAX*5, which is less than 2^19.
o Allocate all region IDs up-front. The slight downside of reserving
more RIDs then a process needs (3 for ia64 native and 1 for ia32)
is preferable over the call to pmap_ensure_rid() where RIDs are
allocated on demand. On SMP systems this may lead to a race
condition.
o When allocating a region ID, don't use arc4random(). We're not
interested in randomness or uniform distribution across the
spectrum. We only need uniqueness. Random numbers may easily
collide when the number of allocated RIDs is high, creating a
possibly unbounded retry rate.
ia64 only uses relocations with addend, remove the sections specific to
non-addend relocations (.rel.*). Also remove C++ specific sections.
Approved by: re@ (blanket)
PT_DETACH ptrace(2) requests from functioning as advertised in the
manual page. As described in kern/35175, the PT_DETACH request will,
under certain circumstances, pass an unwanted signal on to the traced
process upan detaching from it. The PT_CONTINUE request will
sometimes fail if you make it pass a signal that has "properties" that
differ from the properties of the signal that origionally caused the
traced process to be stopped. Since PT_KILL is nothing than
PT_CONTINUE with SIGKILL, it is broken too. In the PT_KILL case, this
leads to an unkillable process.
PR: 44011
Submitted by: Mark Kettenis <kettenis@chello.nl>
Approved by: re(jhb)
on if_fxp cards. When flow control is enabled, if the operating system
doesn't acknowledge the packet buffer filling, the card will begin to
generate ethernet quench packets, but appears to get into a feedback
loop of some sort, hosing local switches. This is a temporary workaround
for 5.1: the ability to configure flow control should probably be
exposed by some or another management interface on ethernet link layer
devices.
Approved by: re (bmah)
Reviewed by: mux
instead of taking the (userland) eflags from the trap frame and masking
out PSL_I. There is no need to inherit any flags from the forking process;
the old method however can cause flags set in userland for the forking
process to be bogusly set in kernel mode when the newly forked process
runs for the first time (in particular PSL_T, which is set for userland
when the process is single-stepped; this would cause trace traps in
kernel mode).
Approved by: re (jhb)
VOP_SETATTR() or VOP_GETATTR(); without these locks (a) VFS_DEBUG_LOCKS
will panic, and (b) it may be possible to corrupt entries in the cached
vnode attributes in the nfsnode, since nfsnode attribute cache data is
also protected by the vnode lock.
Approved by: re (jhb)
Pointed out by: VFS_DEBUG_LOCKS
only while holding appropriate vnode locks. This patch slides the lock
release for ufs_extattr_enable() to continue to hold the active vnode lock
on a backing file until after the flag change; it also acquires a vnode
lock when disabling an attribute and hence clearing a flag on the backing
vnode. This permits VFS_DEBUG_LOCKS to run UFS1 extended attributes
without panicking, as well as preventing a potential race and vnode flag
problem.
Approved by: re (jhb)
Pointed out by: DEBUG_VFS_LOCKS
netstat(1) not display it for now because its effects are not yet
completely implemented and we're about to cut 5.2-RELEASE.
This is temporary.
Approved by: re (scottl, rwatson)
in the case where the bridge node was closed down but a timeout
still applied to it, the final reference to the node was freeing the private
data structure using the wrong malloc type.
Approved by: re@
all of the Optio series have the same problems. It might be a better
approach eventually to add wildcard support to USB quirks.
PR: kern/50271, kern/46369
Approved by: re (rwatson)
- Fix compilation without GEM_DEBUG.
- Do not #define GEM_DEBUG by default; it adds overhead (due to bzero()ing
RX space) and is not needed any more, since the driver is quite stable
now.
- Fix watchdog timeouts when failing to load TX packets.
- Do not forcibly limit the number of descriptors used for a packet to
GEM_NTXSEGS, by passing this number to bus_dma_tag_create(). There is
no requirement for a limit any lower than the total number of
available descriptors, and the present limit caused network problems
due to mbuf chains requiring more descriptors.
GEM_NTXSEGS is still used to estimate the interrupt window size, for
which we just need an estimate.
Approved by: re (rwatson)
The submitter of PR 32118 told me that this patch also fixes autoselecting
for znyx 4 port cards (10baseT, 100baseTX did work already).
PR: 32118
Reviewed by: imp
Approved by: rwatson (re)
The advantage of using register sets is that you don't focus on each
register seperately, but instead instroduce a level of abstraction.
This reduces the chance of errors, and also simplifies the code.
The register sers form the basis of everything register.
The sets in this file are:
struct _special
contains all of the control related registers, such as instruction
pointer and stack pointer. It also contains interrupt specific registers
like the faulting address. The set is roughly split in 3 groups. The
first contains the registers that define a context or thread. This is
the only group that the kernel needs to switch threads. The second group
contains registers needed in addition to the first group needed to switch
userland threads. This group contains the thread pointer and the FP control
register. The third group contains those registers we need for execption
handling and are used on top of the first two groups.
struct _callee_saved, struct _callee_saved_fp
These sets contain the preserved registers, including the NaT after
spilling. The general registers (including branch registers) are
seperated from the FP registers for ptrace(2).
struct _caller_saved, struct _caller_saved_fp
These sets contain the scratch registers based on SDM 2.1, This means that
both ar.csd and ar.ccd are included here, even though they contain ia32
segment register descriptions. We keep seperate NaT bits for scratch and
preserved registers, because they are never saved/restored at the same
time.
struct _high_fp
The upper 96 FP registers that can be enabled/disabled seperately on
the CPU from the lower 32 FP registers. Due to the size of this set,
we treat them specially, even though they are defined as scratch
registers.
CVS ----------------------------------------------------------------------
save and restore "sets" of registers in various places.
The restorectx and swapctx functions are used by cpu_switch()
and deal with the special registers, as well as the preserved
registers.
The *callee_saved* functions are used to save and restore the
preserved registers (integer and floating-point). They are
useful for signal delivery and ptrace support.
The save_high_fp and restore_high_fp functions are used to
"load" and "unload" to and from the CPU as part of lazy context
switching.
The ia32 specific context functions have been kept with the ia32
code.
Approved by: re@ (blanket)
on the epc instruction. The epc instruction, given the permissions
of the page in which the epc is located, allows the privilege level
to be increased with little or no overhead. The previous privilege
level is recorded in the current frame marker and is restored by
a regular (function) return.
Since the epc instruction has to live in a page with non-standard
properties, we hardwire a "gateway" page in the address space. The
address of the gateway page is exported to userland in ar.k7. This
allows us to rewire the page without breaking the ABI.
The syscall stubs in libc are regular function calls that slightly
differ from the normal runtime. The difference is mostly to simplify
the stubs themselves by by moving some of the logic to the kernel.
The libc stubs call into the gateway page (offset 0), from where the
kernel trampolines to the code that sets up a minimal trapframe and
arranges to execute from the kernel stack.
The way back is basicly the same. The kernel returns to the gateway
page, whereby privilege is dropped, and jumps back to the syscall
stub.
Only the special registers are saved in the trapframe. None of the
scratch registers are preserved and since the kernel follows the
same runtime model, none of the preserved registers are saved.
Future enhancements can include the implementation of lightweight
syscalls, where kernel functions are performed without setting up
a trapframe. Good candidates are the *context syscalls for example.
Now that there's a gateway page from which code can be executed in
a non-privileged context, we also have the ideal place to put the
signal trampolines. By moving the signal trampolines from the user
stack to the gateway page, we open up the doors to unexecutable
stacks. The gateway page contains signal trampolines for both the
"legacy" break-based syscall code and the new and improved epc-
based syscall code.
Approved: re@ (blanket)
available by Hewlett-Packard under the MIT license. The unwinder is
small, clean and fast and needed little adaptation for use in the
kernel.
This import has embedded in it the changes needed to make it build
in a kernel environment.
To optimize the common case, the kernel will minimize the number
of registers saved by not saving the preserved registers. In case
access to preserved registers is needed (signal handling, ptrace)
the kernel will unwind to the context of the syscall or exception.
For this we need an unwinder.
Approved by: re (blanket)
load_gs() calls into a single place that is less likely to go wrong.
Eliminate the per-process context switching of MSR_GSBASE, because it
should be constant for a single cpu. Instead, save/restore it during
the loading of the new %gs selector for the new process.
Approved by: re (amd64/* blanket)
has already been registered with ATAPI/CAM (else there is nothing
to do). atapi_cam_reinit_bus may be called before the bus is
registered if an ATAPI command times out during the boot sequence.
PR: i386/51421
Reviewed by: roberto
Approved by: re (rwatson)
MFC after: 1 week
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls. It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4. The ia64 code has not implemented signal delivery, so I had
to do that.
Before you say it, yes, this does need to go in a common place. But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.
On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure. The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting. The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes. This still needs
to be optimized.
Approved by: re (amd64/* blanket)
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.
Reviewed by: arch@
Approved by: re (rwatson)
don't add the current time to it, but leave it as clear so that when the
timer is disabled, the it_value is always clear.
Reviewed by: bde
Approved by: re (rwatson)
desired buffer is found at one of the roots more than 60% of the time.
Thus, checking both roots before performing either splay eliminates
unnecessary splays on the first tree splayed.
Approved by: re (jhb)
we do not have to run so long with interrupts disabled. This involved
creating tf_addr in the trapframe. Reorganize the trap stubs so that
they consistently reserve the stack space and initialize any missing
bits.
Approved by: re (amd64 stuff)
by using a __packed keyword for the fxp_rfa structure. The Intel
guys who designed this structure with unaligned fields deserve
to be shot.
Tested by: kris
Approved by: re@ (jhb)
synchronization primitives from inside DDB is generally a bad idea,
and in this case it frequently results in panics due to DDB commands
being executed from the sio fast interrupt context on a serial
console. Replace the locking with a note that a lack of locking
means that DDB may get see inconsistent views of the mount and vnode
lists, which could also result in a panic. More frequently,
though, this avoids a panic than causes it.
Discussed with ages ago: bde
Approved by: re (scottl)
common code, the non-trivial part is #ifdef'ed and only executes when
loading amd64 kernels. The rest is trivial but needed for the the amd64
case. (Two variables changed from char ** to Elf_Addr).
Approved by: re (amd64 "low-risk" stuff)
bus_dma MD code for AMD64. (And a trivial ifdef update in dev/kbd because
of this). More updates are needed here to take advantage of the 64 bit
instructions.
Approved by: re (blanket amd64/*)
value on entry and exit. This isn't as easy as it sounds because when
we recursively trap or interrupt, we have to avoid duplicating the
swapgs instruction or we end up back with the userland %gs. I implemented
this by testing TF_CS to see if we're coming from supervisor mode
already, and check for returning to supervisor. To avoid a race with
interrupts in the brief period after beginning executing the handler and
before the swapgs, convert all trap gates to interrupt gates, and reenable
interrupts immediately after the swapgs. I am not happy with this.
There are other possible ways to do this that should be investigated.
(eg: storing the GS.base MSR value in the trapframe)
Add some sysarch functions to let the userland code get to this.
Approved by: re (blanket amd64/*)
- Simplify and correct the bus manager election process.
- Check link_active when choosing cycle master.
- Fix location of the cmr bit.
Approved by: re (scottl)
series. This driver was generously developed and released by David
Jeffreys and Adaptec. I've updated it to work with 5.x and fixed a
few bugs.
MFC After: 1 week
reason for the duplication was that m_freem() was meant to eventually
be optimized to hold the lock of the cache being freed to as long as
possible across frees but the difficulty of implementing said
optimization right now is too high, given that in some cases (see MAC
and non-cluster external buffers), we need to call into other subsytems,
something not permissible when the cache lock is held.
This change minimizes code duplication while keeping at least the
atomic mbuf+cluster free optimization.
Suggested by: luigi
could use different versions of the math code depending on whether there
was real floating point hardware or math emulation. Since the fpu is
part of the core specification on amd64, there is no need for this here.
Approved by: re (blanket amd64/*)
still outstanding, give them a chance to complete.
If after 10 seconds we still find outstanding I/O requests, complete
the close with a console warning that the system is likely to panic
later on.
This is a workaround for umount -f not quite doing the right thing.
Approved by: re/scottl
should really be renamed to fpu.h and npx.c to fpu.c since its part of
the core architecture on amd64 systems, not an isa 'numeric processor
extension'.
uses of m_flags in the kernel. (A future commit will move all
private m_flags users here so they're obvious without a great
deal of searching.)
This should fix the mbuf double-free panics those using ppp or
ipfw reset rules have been seeing since the double-free detection
code went in.
constants in question refer to the number of label slots, not the
maximum number of policies that may be loaded. This should reduce
confusion regarding an element in the MAC sysctl MIB, as well as
make it more clear what the affect of changing the compile-time
constants is.
Approved by: re (jhb)
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
that interrupts be disabled and remain disabled until %cr2 is read.
Otherwise we can preempt and another process can fault, and by the
time we read %cr2, we see a different processes fault address. This
Greatly Confuses vm_fault() (to say the least). The i386 port has
got this marked as a bug workaround for a Cyrix CPU, which is what
lead me astray. Its actually necessary for preemption, regardless
of whether Cyrix cpus had a bug or not.
call handler before it was safe. It was possible for to lose context
and for something else to clobber the PCPU scratch variable. This
moves the interrupt enable *way* too late, but its better safe than
sorry for the moment.
ASIC revision is really the major number of the CHIPID. Also store
the chipid, asic rev and chip revision in the softc for later use.
- The write twice to send producer index workaround only applies to
the 5700_BX chips, so only do it there.
Requested by: jdp
- Do not initalize the LED's to 0x00. The default configuration
the chip comes up in should yeild proper operation of the LED's.
Confirmed by: John Cagle <john.cagle@hp.com>
Approved by: re (blanket)
(1) Accept that we're now going to use mutexes, so don't attempt
to avoid treating them as mutexes. This cleans up locking
accessor function names some.
(2) Rename variables to _mtx, _cv, _count, simplifying the naming.
(3) Add a new form of the _busy() primitive that conditionally
makes the list busy: if there are entries on the list, bump
the busy count. If there are no entries, don't bump the busy
count. Return a boolean indicating whether or not the busy
count was bumped.
(4) Break mac_policy_list into two lists: one with the same name
holding dynamic policies, and a new list, mac_static_policy_list,
which holds policies loaded before mac_late and without the
unload flag set. The static list may be accessed without
holding the busy count, since it can't change at run-time.
(5) In general, prefer making the list busy conditionally, meaning
we pay only one mutex lock per entry point if all modules are
on the static list, rather than two (since we don't have to
lower the busy count when we're done with the framework). For
systems running just Biba or MLS, this will halve the mutex
accesses in the network stack, and may offer a substantial
performance benefits.
(6) Lay the groundwork for a dynamic-free kernel option which
eliminates all locking associated with dynamically loaded or
unloaded policies, for pre-configured systems requiring
maximum performance but less run-time flexibility.
These changes have been running for a few weeks on MAC development
branch systems.
Approved by: re (jhb)
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
sure that the MAC label on TCP responses during TIMEWAIT is
properly set from either the socket (if available), or the mbuf
that it's responding to.
Unfortunately, this is made somewhat difficult by the TCP code,
as tcp_twstart() calls tcp_twrespond() after discarding the socket
but without a reference to the mbuf that causes the "response".
Passing both the socket and the mbuf works arounds this--eventually
it might be good to make sure the mbuf always gets passed in in
"response" scenarios but working through this provided to
complicate things too much.
Approved by: re (scottl)
Reviewed by: hsu
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
copying for mbuf headers now works properly in m_dup_pkthdr(), so
we don't need to do an explicit copy.
Approved by: re (jhb)
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
leads to a panic at unload time, as we own 2 instances of callout and
untimeout() only one.
Will I'm there, remove a call to callout_handler_init(), one is enough.
Reviewed by: wpaul
does the copyin stuff and then calls the second part kern_sendit to do
the hard work. Don't bother holding Giant during the copyin phase.
The intent of this is to allow the Linux emulator to impliment send*
syscalls without using the stackgap.
ILMI daemons. Factor out common softc fields for all ATM interfaces that
need to be externally visible into an ifatm structure and make the midway
driver using this structure and fill the MIB.
entering sys_process.c debugging primitives, or we violate assertions.
Also, be more careful about releasing the process lock around calls
to uiomove() which may sleep waiting for paging machinations or
related notions. We may want to defer the uiomove() in at least
one case, but jhb will look into that at a later date.
Reported by: Philippe Charnier <charnier@xp11.frmug.org>
Reviewed by: jhb
argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}.
The BSD semantics didn't allow the usage of shared segment after
being marked for removal through IPC_RMID.
The patch involves the following functions:
- shmat
- shmctl
- shm_find_segment_by_shmid
- shm_find_segment_by_shmidx
- linux_shmat
- linux_shmctl
Submitted by: Orlando Bassotto <orlando.bassotto@ieo-research.it>
Reviewed by: marcel
Maintain sector sizes for all objects, not just for drives. Some of
this could do with improvement: in particular, we get an error if the
components of an object have different sector sizes.
Clean up some comments.
the -v option, though it's not clear that it won't bite us elsewhere.
Forgotten by: phk
Implement setreadpol() function for the VINUM_READPOL ioctl.
Submitted by: Allan Saddi <allan@saddi.com>
with it.
Finally implement read policies. The previous "implementation" didn't
work because it referred to plexes which were almost invariably when
referred to. Instead, deprecate the "prefer" keyword for volumes
(though it's still there for the moment) and add a keyword "preferred"
to the plex definition. The relationship is like this:
Old:
vol foo ... prefer foo.p3
New:
plex foo.p3 volume foo preferred
print_config: Print "preferred" where appropriate.
No longer print "prefer" on volume config entries.
work because it referred to plexes which were almost invariably when
referred to. Instead, deprecate the "prefer" keyword for volumes
(though it's still there for the moment) and add a keyword "preferred"
to the plex definition. The relationship is like this:
Old:
vol foo ... prefer foo.p3
New:
plex foo.p3 volume foo preferred
give_plex_to_volume: set preferred plex if specified on plex
definition entry. This involves adding a parameter to the function to
specify the preferred plex.
config_plex: Implement preferred keyword.
Remove DBL_DIG, DBL_MIN, DBL_MAX and their FLT_ counterparts, they
were marked for deprecation ever since SUSv1 at least.
Only define ULLONG_MIN/MAX and LLONG_MAX if long long type is
supported.
Restore a lost comment in MI _limits.h file and remove it from
sys/limits.h where it does not belong.
Add ioctl VINUM_READCONFIG which implements both the "read" and
"start" commands in vinum(8). Aim for marginally better error
messages when something goes wrong.
Switch to handling bad SCSI status as a sequencer interrupt
instead of having the kernel proccess these failures via
the completion queue. This is done because:
o The old scheme required us to pause the sequencer and clear
critical sections for each SCB. It seems that these pause
actions, if coincident with a sequencer FIFO interrupt, would
result in a FIFO interrupt getting lost or directing to the
wrong FIFO. This caused hangs when the driver was stressed
under high "queue full" loads.
o The completion code assumed that it was always called with
the sequencer running. This may not be the case in timeout
processing where completions occur manually via
ahd_pause_and_flushwork().
o With this scheme, the extra expense of clearing critical
sections is avoided since the sequencer will only self pause
once all pending selections have cleared and it is not in
a critical section.
aic79xx.c
Add code to handle the new BAD_SCB_STATUS sequencer
interrupt code. This just redirects the SCB through
the already existing ahd_complete_scb() code path.
Remove code in ahd_handle_scsi_status() that paused
the sequencer, made sure that no selections where
pending, and cleared critical sections. Bad
status SCBs are now only processed when all of these
conditions are true.
aic79xx.reg:
Add the BAD_SCB_STATUS sequencer interrupt code.
aic79xx.seq:
When completing an SCB upload to the host, if
we are doing this because the SCB contains non-zero
SCSI status, defer completing the SCB until there
are no pending selection events. When completing
these SCBs, use the new BAD_SCB_STATUS sequencer
interrupt. For all other uploaded SCBs (currently
only for underruns), the SCB is completed via the
normal done queue. Additionally, keep the SCB that
is currently being uploaded on the COMPLETE_DMA_SCB
list until the dma is completed, not just until the
DMA is started. This ensures that the DMA is restarted
properly should the host disable the DMA transfer for
some reason.
In our RevA workaround for Maxtor drives, guard against
the host pausing us while trying to pause I/O until the
first data-valid REQ by clearing the current snapshot
so that we can tell if the transfer has completed prior
to us noticing the REQINIT status.
In cfg4data_intr, shave off an instruction before getting
the data path running by adding an entrypoint to the
overrun handler to also increment the FIFO use count.
In the overrun handler, be sure to clear our LONGJMP
address in both exit paths.
Perform a few sequencer optimizations.
aic79xx.c:
Print the full path from the SCB when a packetized
status overrun occurs.
Remove references to LONGJMP_SCB which is being
removed from firmware usage.
Print the new SCB_FIFO_USE_COUNT field in the
per-SCB section of ahd_dump_card_state(). The
SCB_TAG field is now re-used by the sequencer,
so it no longer makes sense to reference this
field in the kernel driver.
aic79xx.h:
Re-arrange fields in the hardware SCB from largest
size type to smallest. This makes it easier to
move fields without changing field alignment.
The hardware scb tag field is now down near the
"spare" portion of the SCB to facilitate reuse
by the sequencer.
aic79xx.reg:
Remove LONGJMP_ADDR.
Rearrange SCB fields to match aic79xx.h.
Add SCB_FIFO_USE_COUNT as the first byte
of the SCB_TAG field.
aic79xx.seq:
Add a per-SCB "Fifos in use count" field and use
it to determine when it is safe (all data posted)
to deliver status back to the host. The old method
involved polling one or both FIFOs to verify that
the current task did not have pending data. This
makes running down the GSFIFO very cheap, so we
will empty the GSFIFO in one idle loop pass in
all cases.
Use this simplification of the completion process
to prune down the data FIFO teardown sequencer for
packetized transfers. Much more code is now shared
between the data residual and transfer complete cases.
Correct some issues in the packetized status handler.
It used to be possible to CLRCHN our FIFO before status
had fully transferred to the host. We also failed to
handle NONPACKREQ phases that could occur should a CRC
error occur during transmission of the status data packet.
Correct a few big endian issues:
aic79xx.c:
aic79xx_inline.h:
aic79xx_pci.c:
aic79xx_osm.c:
o Always get the SCB's tag via the SCB_GET_TAG acccessor
o Add missing use of byte swapping macros when touching
hscb fields.
o Don't double swap SEEPROM data when it is printed.
Correct a big-endian bug. We cannot assign a
o When assigning a 32bit LE variable to a 64bit LE
variable, we must be explict about how the words
of the 64bit LE variable are initialized. Cast to
(uint32_t*) to do this.
aic79xx.c:
In ahd_clear_critical_section(), hit CRLSCSIINT
after restoring the interrupt masks to avoid what
appears to be a glitch on SCSIINT. Any real SCSIINT
status will be persistent and will immidiately
reset SCSIINT. This clear should only get rid of
spurious SCSIINTs.
This glitch was the cause of the "Unexpected PKT busfree"
status that occurred under high queue full loads
Call ahd_fini_scbdata() after shutdown so that
any ahd_chip_init() routine that might access
SCB data will not access free'd memory.
Reset the bus on an IOERR since the chip doesn't
seem to reset to the new voltage level without
this.
Change offset calculation for scatter gather maps
so that the calculation is correct if an integral
multiple of sg lists does not fit in the allocation
size.
Adjust bus dma tag for data buffers based on 39BIT
addressing flag in our softc.
Use the QFREEZE count to simplify ahd_pause_and_flushworkd().
We can thus rely on the sequencer eventually clearing ENSELO.
In ahd_abort_scbs(), fix a bug that could potentially
corrupt sequencer state. The saved SCB was being
restored in the SCSI mode instead of the saved mode.
It turns out that the SCB did not need to be saved at all
as the scbptr is already restored by all subroutines
called during this function that modify that register.
aic79xx.c:
aic79xx.h:
aic79xx_pci.c:
Add support for parsing the seeprom vital product
data. The VPD data are currently unused.
aic79xx.h:
aic79xx.seq:
aic79xx_pci.c:
Add a firmware workaround to make the LED blink
brighter during packetized operations on the H2A.
aic79xx_inline.h:
The host does not use timer interrupts, so don't
gate our decision on whether or not to unpause
the sequencer on whether or not a timer interrupt
is pending.
aic7xxx.h:
Split out core chip initialization into ahc_chip_init().
This will allow us to reset the chip correctly at times
other than initial chip setup.
aic7770.c
aic7xxx_pci.c:
Flesh out bus chip init methods for our two
bus attachments and use these, in addition to
bus suspend/resume hooks to get the core in
better shape for handling these events.
When disabling PCI parity error checking, use FAILDIS.
Although the chip docs indicate that clearing PERRESPEN
should also work, it does not.
Auto-disable pci parity error checking after informing
the user of AHC_PCI_TARGET_PERR_THRESH number of parity
errors observed as a target.
aic7xxx.h:
aic7xxx_pci.c
aic7770.c
aic7xxx.c
Add the instruction_ram_size softc field.
Remove the now unused stack_size softc field.
Modify ahc_loadseq to return a failure code
and to actually check the downloaded instruction
count against the limit set in our softc.
Modify callers of ahc_loadseq to handle load
failures as appropriate.
Set instruction RAM sizes for each chip type.
aic7xxx_pci.c:
Add some delay in the aic785X termination
control code. This may fix problems with
the 2930.
Be consistent in how we access config space
registers. 16bit registers are accessed using
16bit ops.
aic7xxx.c:
Correct spelling errors.
Have ahc_force_renegotiation() take a devinfo as is done
in the U320 driver. Use this argument to correct a bug
in the selection timeout handler where we forced a renegotiation
with the last device that had set SAVED_SCSIID. SAVED_SCSIID
is only updated once a selection is *sucessfull* and so is
stale for any selection timeout.
Cleanup the setup of the devinfo for busfree events. We
now use this devinfo for a call to ahc_force_renegotiation()
at the bottom of the routine, so it must be initialized in
all cases.
In ahc_pause_and_flushwork(), adjust the loop so that it
will exit in the hot-eject case even if the INT_PEND mask
is something other than 0xFF (as it is in this driver).
Correct a wrapping string constant.
Call ahc_fini_scbdata() after shutdown so that
any ahc_chip_init() routine that might access
SCB data will not access free'd memory.
Correctly setup our buffer tag to indicate that 39bit
addressing is available if in 39bit addressing mode.
Rearrange some variable declarations based on
type size.
aic7xxx.c
aic7xxx.h:
aic7xxx.reg:
Consistently use MAX_OFFSET for the user max syncrate
set from non-volatile storage. This ensures that the
offset does not conflict with AHC_OFFSET_UNKNOWN.
Change AHC_OFFSET_UNKNOWN to 0xFF. This is
a value that the curr->offset can never be,
unlike '0' which we previously used. This
fixes code that only checks for a non-zero
offset to determine if a sync negotiation
is required since it will fire in the unknown
case even if the goal is async.
Change MAX_OFFSET to 0x7f which is the max
offset U160 aic7xxx controllers can negotiate.
This ensures that curr->offset will not
match AHC_OFFSET_UNKNOWN.
aic7xxx_inline.h:
Have our inline interrupt handler return with a value
indicating whether we serviced a real interrupt. This
is required for Linux support.
Return earlier if the interrupt is not for us.
patch workarounds for each phy revision.
Obtained from: NetBSD & Broadcom Linux driver
- Disable AUTOPOLL when accessing the PHY as it may cause PCI errors.
Obtained from: NetBSD
- Check the UPDATED bit in the status block so the driver knows
that the status block as indeed changed since the last access.
Broadcom documentation states drivers should unset the UPDATED/CHANGED
bits after reading them.
- When changing media types, first loop the phy then set the media.
Broadcom documentation and Linux drivers do this and I observed
much better handling of link after this change.
- Broadcom documentation states that for 1000BaseT operation,
autonegotiation must be enabled. Fix hard coding of media so that
the driver only advertises 1000BaseT as the supported media type
and enable autonegotition.
- Only set Master/Slave on the 5701.
Obtained from Broadcom Linux driver.
this is fixed in a newer version of ACPICA and I don't want to take
this off the vendor branch for a trivial reason. This patch was
applied to NetBSD by kochi-san, who also posted the patch to
acpi-jp@jp.freebsd.org.
# My Dell Inspiron 8000 now powers off!
Submitted by: takayoshi kochi-san kochi at netbsd dot org
trustworthy for vnode-backed objects.
- Restore the old behavior of vm_object_page_remove() when the end
of the given range is zero. Add a comment to vm_object_page_remove()
regarding this behavior.
Reported by: iedowse
- Make sure we don't release the READ CAPACITY CCB twice
- If we have a device that needs a 16 byte READ CAPACITY command, make
sure we call xpt_schedule() so we can get a CCB.
- Don't unlock the peripheral until we're fully probed.
Many thanks to Julian Elischer for providing hardware and testing this.
Tested by: julian
(If there is a legitimate need to correctly encode and pack a
disklabel with an invalid checksum custom tools can be built for
that.)
Make bsd_disklabel_le_dec() validate the magics, number of partitions
(against a new parameter) and the checksum.
Vastly simplify the logic of the GEOM::BSD class implementation:
Let g_bsd_modify() always take a byte-stream label.
This simplifies all users, except the ioctl's which now have to
convert to a byte-stream first. Their loss.
g_bsd_modify() is called with topology held now, and it returns
with it held.
Always update the md5sum in g_bsd_modify(), otherwise the check
is no use after the first modification of the label. Make the
MD5 over the bytestream version of the label.
Move the rawoffset hack to g_bsd_modify() and remove all the
inram/ondisk conversions.
Don't configure hotspots in g_bsd_modify(), do it in taste instead,
we do not support moving the label to a different location on the
fly anyway.
This passes all current regression tests.
- Use htole* macros where appropriate so that the driver could work on non-x86 architectures
- Use m_getcl() instead of MGETHDR/MCLGET macros
Submitted by: sam (Sam Leffler)
Note: this might print failure messages on some systems, unfortunatly
the info from the device, stating if flushing is supported, cannot be trusted
so the operation is always issued on all devices, just in case...