don't need any #ifdef stuff to use atomic_load/store_64() elsewhere in
the kernel. For armv4 the atomics are trivial to implement for kernel
code (just disable interrupts), less so for user mode, so this only has
the kernel mode implementations for now.
value shared across multiple cores is with atomic_load_64() and
atomic_store_64(), because the normal 64-bit load/store instructions
are not atomic on 32-bit arm. Luckily the ldrexd/strexd instructions
that are atomic are fairly cheap on armv6. Because it's fairly simple
to do, this implements all the ops for 64-bit, not just load/store.
Reviewed by: andrew, cognet
It turns out the version of gas we're using interprets the old '_all' mask
as 'fc' instead of 'fsxc'. That is, "all" doesn't really mean "all".
This was the cause of the "wrong-endian register restore" bug that's
been causing problems with some cortex-a9 chips. The 'endian' bit in the
spsr register would never get changed (it falls into the 'x' mask group)
and the first return-from-exception would fail if the chip had powered on
with garbage in the spsr register that included the big-endian bit. It's
unknown why this affected only certain cortex-a9 chips.
instruction set. Thumb-2 requires an if-then instruction to implement
conditional codes.
When building for ARM mode the it-then instructions do not generate any
assembled instruction as per the ARMv7-A Architecture Reference Manual, and
are safe to use.
While this allows the atomic instructions to be built, it doesn't mean we
fully support Thumb code. It works in small tests, but is still known to
fail in a large number of places.
While here add a check for the armv6t2 architecture.
this some compilers will place a cmp instruction before the atomic operation
and expect to be able to use the result afterwards. By adding "cc" to the
list of used registers we tell the compiler to not do this.
Cummulative patch of changes that are not vendor-specific:
- ARMv6 and ARMv7 architecture support
- ARM SMP support
- VFP/Neon support
- ARM Generic Interrupt Controller driver
- Simplification of startup code for all platforms
and ifnet functions
- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own
- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers
- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues
This work was supported by Bitgravity Inc. and Chelsio Inc.
De-hardcode usage of ARM_TP_ADDRESS and RAS local storage, and move this
special purpose page to a more convenient place i.e. after the vectors high
page, more towards the end of address space. Previous location (0xe000_0000)
caused grief if KVA was to go beyond the default limit.
Note that ARM world rebuilding is required after this change since the
location of ARM_TP_ADDRESS is shared between kernel and userland.
Submitted by: Grzegorz Bernacki (gjb AT semihalf dot com)
Reviewed by: imp
Approved by: cognet (mentor)
The RAS implementation would set the end address, then the start
address. These were used by the kernel to restart a RAS sequence if
it was interrupted. When the thread switching code ran, it would
check these values and adjust the PC and clear them if it did.
However, there's a small flaw in this scheme. Thread T1, sets the end
address and gets preempted. Thread T2 runs and also does a RAS
operation. This resets end to zero. Thread T1 now runs again and
sets start and then begins the RAS sequence, but is preempted before
the RAS sequence executes its last instruction. The kernel code that
would ordinarily restart the RAS sequence doesn't because the PC isn't
between start and 0, so the PC isn't set to the start of the sequence.
So when T1 is resumed again, it is at the wrong location for RAS to
produce the correct results. This causes the wrong results for the
atomic sequence.
The window for the first race is 3 instructions. The window for the
second race is 5-10 instructions depending on the atomic operation.
This makes this failure fairly rare and hard to reproduce.
Mutexs are implemented in libthr using atomic operations. When the
above race would occur, a lock could get stuck locked, causing many
downstream problems, as you might expect.
Also, make sure to reset the start and end address when doing a syscall, or
a malicious process could set them before doing a syscall.
Reviewed by: imp, ups (thanks guys)
Pointy hat to: cognet
MFC After: 3 days
It should just contain the value we want to add, as if we're interrupted
between the add and the str, we will restart from the beginning. Just use
a register we can scratch instead.
MFC After: 1 week
the modified memory rather than using register operands that held a pointer
to the memory. The biggest effect is that we now correctly tell the
compiler that these functions change the memory that these functions
modify.
Reviewed by: cognet
variable and returns the previous value of the variable.
Tested on: i386, alpha, sparc64, arm (cognet)
Reviewed by: arch@
Submitted by: cognet (arm)
MFC after: 1 week
in the arm __swp() and sparc64 casa() and casax() functions is actually
being used as an input and output and not just the value of the register
that points to the memory location. This was the underlying source of
the mbuf refcount problems on sparc64 a while back. For arm this should be
a nop because __swp() has a constraint to clobber all memory which can
probably be removed now.
Reviewed by: alc, cognet
MFC after: 1 week
variables rather than void * variables. This makes it easier and simpler
to get asm constraints and volatile keywords correct.
MFC after: 3 days
Tested on: i386, alpha, sparc64
Compiled on: ia64, powerpc, amd64
Kernel toolchain busted on: arm
It only supports sa1110 (on simics) right now, but xscale support should come
soon.
Some of the initial work has been provided by :
Stephane Potvin <sepotvin at videotron.ca>
Most of this comes from NetBSD.