Commit Graph

56 Commits

Author SHA1 Message Date
jkim
0c7a0c810c Implement atomic_load_acq_64(9) and atomic_store_rel_64(9) for i386. These
functions are implemented with CMPXCHG8B instruction where it is available,
i. e., all Pentium-class and later processors.  Note this instruction is
also used for atomic_store_rel_64() because a simple XCHG-like instruction
for 64-bit memory access does not exist, unfortunately.  If the processor
lacks the instruction, i. e., 80486-class CPUs, two 32-bit load/store are
performed with interrupt temporarily disabled, assuming it does not support
SMP.  Although this assumption may be little naive, it is true in reality.
This implementation is inspired by Linux.
2011-04-06 23:59:59 +00:00
kib
43f2291188 Inform a compiler which asm statements in the x86 implementation of
atomics change eflags.

Reviewed by:	jhb
MFC after:	2 weeks
2010-12-18 16:41:11 +00:00
phk
3cad03a86c Rename an argument from "exp" to "expect" since the former makes FlexeLint
uneasy, in case anybody think it might be exp(3) in libm.

This also makes it consistent with other archs.
2010-05-20 06:18:03 +00:00
attilio
615a58802f atomic_cmpset_barr_* was added in order to cope with compilers willing to
specify their own version of atomic_cmpset_* which could have been
different than the membar version.

Right now, however, FreeBSD is bound mostly to GCC-like compilers and
it is desired to add new support and compat shim mostly when there is
a real necessity, in order to avoid too much compatibility bloats.

In this optic, bring back atomic_cmpset_{acq, rel}_* to be the same as
atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction.

Requested by:	jhb
Reviewed by:	jhb
Tested by:	Giovanni Trematerra <giovanni dot trematerra at
		gmail dot com>
2009-10-09 15:51:40 +00:00
attilio
d6f29069b6 - All the functions in atomic.h needs to be in "physical" form (like
not defined through macros or similar) in order to be later compiled in
  the kernel and offer this way the support for modules (and
  compatibility among the UP case and SMP case).
  Fix this for the newly introduced atomic_cmpset_barr_* cases by defining
  and specifying a template.  Note that the new DEFINE_CMPSET_GEN()
  template save more typing on amd64 than the current code. [1]
- Fix the style for memory barriers on amd64.

[1] Reported by:	Paul B. Mahol <onemda at gmail dot com>
2009-10-06 23:48:28 +00:00
attilio
b1ce942125 Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used.  GCC, however, does that aggressively, even in
presence of volatile operands.  The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory" even if that is
theoretically an heavy-weight operation, flushing the content of all
the registers and forcing reload of them (We could rely, however, on
gcc DTRT by just understanding the purpose as this is a well-known
pattern for many modern operating-systems).

Not all our memory barriers, right now, clobber memory for GCC-like
compilers. The most notable cases are IA32 and amd64 where the memory
barrier are treacted the same as normal atomic instructions.
Fix this by offering the possibility to implement atomic instructions
with memory barriers separately from the normal version and implement
the GCC-like specific one using memory clobbering.
Thanks to Chris Lattner (@apple) for his discussion on llvm specifics.

Reported by:	jhb
Reviewed by:	jhb
Tested by:	rdivacky, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
2009-10-06 13:45:49 +00:00
kib
2a6c909fae Restore memory clobber, to cause mb on the compiler level too.
Use more sane formatting of the assembler.

Pointed out by:	bde
2008-12-06 21:33:44 +00:00
kib
ffffd56bb1 Unconditionally use locked addition of zero to tip of the stack for
memory barriers on i386. It works as a serialization instruction on
all IA32 CPUs.

Alternative solution of using {s,l,}fence requires run-time checking
of the presense of the corresponding SSE or SSE2 extensions, and
possible boot-time patching of the kernel text.

Suggested by:	many
2008-12-05 21:17:54 +00:00
kmacy
9d3bb599b1 - bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
  allow drivers to efficiently manage multiple hardware queues
  (i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.
2008-11-22 05:55:56 +00:00
pjd
ea49d310bf Implement atomic_fetchadd_long() for all architectures and document it.
Reviewed by:	attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)
2008-03-16 21:20:50 +00:00
bde
c4404408fa Fix oops in previous commit. 2006-12-29 15:48:18 +00:00
bde
b616a20d08 Fixed some style bugs (mainly assorted errors in comments, and inconsistent
spelling of `result').
2006-12-29 15:29:49 +00:00
bde
0a6fe7fb48 Fixed some style bugs (whitespace only). 2006-12-29 14:28:23 +00:00
bde
6a28e42ead Try harder to garbage-collect the "LOCORE" (really asm) version of
MPLOCKED.  The cleaning in rev.1.25 was supposed to have been undone
by rev.1.26, but 1.26 could never have actually affected asm files
since atomic.h is full of C declarations so including it in asm files
would just give syntax errors.  The asm MPLOCKED is even less needed
than when misplaced definitions of it were first removed, and is now
unused in any asm file in the src tree except in anachronismns in
sys/i386/i386/support.s.
2006-12-29 13:36:26 +00:00
bde
9d0b590514 Avoid an instruction in atomic_cmpset_{int_long)() in most cases.
These functions are used a lot for mutexes, so this reduces the text
size of an average kernel by about 0.75%.  This wasn't intended to
be a significant optimization, but it somehow increased the maximum
number of packets per second that can be transmitted by my bge hardware
from 320000 to 460000 (this benchmark is CPU-bound and remarkably
sensitive to changes in the text section).

Details: we would prefer to leave the result of the cmpxchg in %al,
but cannot tell gcc that it is there, so we have to convert it to an
integer register.  We converted  to %al, then to %[re]ax, but the
latter step is usually wasted since gcc usually only wants the condition
code and can recover it from %al just as easily as from %[re]ax.  Let
gcc promote %al in the few cases where this is needed.

Nearby style fixes;
- let gcc manage the load of `res', and don't abuse `res' for a copy of `exp'
- don't echo `res's name in comments
- consistently spell the condition code as 'e' after comparison for equality
- don't hard-code %al anywhere except in constraints
- for the version that doesn't use cmpxchg, there is no requirement to use
  %al anywhere, so don't hard-code it in the constraints either.

Style non-fix:
- for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al)
  for the main output operand, although this is not required.  The input
  and output operands that use the "a" constraint are now decoupled, and
  this makes things clearer except for the reason that the output register
  is hard-coded.  It is now just a hack to tell gcc that the input "a" has
  been clobbered without increasing the number of operands.
2006-12-27 20:26:00 +00:00
des
af5e05fb0b Use wrapper macros for atomic pointer operations in order to perform the
correct casts.  This should probably be merged to other architectures.
2006-03-28 14:34:48 +00:00
jhb
89caa56972 Add a new atomic_fetchadd() primitive that atomically adds a value to a
variable and returns the previous value of the variable.

Tested on:	i386, alpha, sparc64, arm (cognet)
Reviewed by:	arch@
Submitted by:	cognet (arm)
MFC after:	1 week
2005-09-27 17:39:11 +00:00
jhb
b729e912ca Stop using the '+' constraint modifier with inline assembly. The '+'
constraint is actually only allowed for register operands.  Instead, use
separate input and output memory constraints.

Education from:	alc
Reviewed by:	alc
Tested on:	i386, alpha
MFC after:	1 week
2005-09-15 19:31:22 +00:00
jhb
c7383aebd6 Convert the atomic_ptr() operations over to operating on uintptr_t
variables rather than void * variables.  This makes it easier and simpler
to get asm constraints and volatile keywords correct.

MFC after:	3 days
Tested on:	i386, alpha, sparc64
Compiled on:	ia64, powerpc, amd64
Kernel toolchain busted on:	arm
2005-07-15 18:17:59 +00:00
jhb
febd45c7d3 Some cleanups and tweaks to some of the atomic.h files in preparation for
further changes and fixes in the future:
- Use aliases via macros rather than duplicated inlines wherever possible.
- Move all the aliases to the bottom of these files and the inline
  functions to the top.
- Add various comments.
- On alpha, drop atomic_{load_acq,store_rel}_{8,char,16,short}().
- On i386 and amd64, don't duplicate the extern declarations for functions
  in the two non-inline cases (KLD_MODULE and compiler doesn't do inlines),
  instead, consolidate those two cases.
- Some whitespace fixes.

Approved by:	re (scottl)
2005-07-09 12:38:53 +00:00
joerg
c85a3e95f7 netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild.  Extension to other compilers is supposed
to be possible, of course.

Submitted by:	netchild
Reviewed by:	various developers on arch@, some time ago
2005-03-02 21:33:29 +00:00
jhb
cc03c9de7d Initiate deorbit burn sequence for 80386 support in FreeBSD: Remove
80386 (I386_CPU) support from the kernel.
2004-11-16 20:42:32 +00:00
jhb
747be736a2 Spell _KERNEL correctly so that UP kernels are actually optimized again.
Submitted by:	pjd
2004-11-12 19:18:46 +00:00
jhb
ba0a3a16d6 - Use the SMP style ops for atomic_load/store() in userland so that
libraries and binaries will work on both UP and SMP machines.
- Remove unnecessary gcc memory barrier from the UP atomic_store() op.

Submitted by:	bde
2004-11-12 18:40:22 +00:00
jhb
dabf0d0c4f - Place the gcc memory barrier hint in the right place in the 80386 version
of atomic_store_rel().
- Use the 80386 versions of atomic_load_acq() and atomic_store_rel() that
  do not use serializing instructions on all UP kernels since a UP machine
  does need to synchronize with other CPUs.  This trims lots of cycles from
  spin locks on UP kernels among other things.

Benchmarked by:	rwatson
2004-11-11 22:42:25 +00:00
trhodes
dfcfecd6e4 These are changes to allow to use the Intel C/C++ compiler (lang/icc)
to build the kernel. It doesn't affect the operation if gcc.

Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as
icc v8 may define __GNUC__ some parts may look strange but are
necessary.

Additional changes:
 - in_cksum.[ch]:
   * use a generic C version instead of the assembly version in the !gcc
     case (ASM code breaks with the optimizations icc does)
     -> no bad checksums with an icc compiled kernel
     Help from:		andre, grehan, das
     Stolen from: 	alpha version via ppc version
     The entire checksum code should IMHO be replaced with the DragonFly
     version (because it isn't guaranteed future revisions of gcc will
     include similar optimizations) as in:
        ---snip---
          Revision  Changes    Path
          1.12      +1 -0      src/sys/conf/files.i386
          1.4       +142 -558  src/sys/i386/i386/in_cksum.c
          1.5       +33 -69    src/sys/i386/include/in_cksum.h
          1.5       +2 -0      src/sys/netinet/igmp.c
          1.6       +0 -1      src/sys/netinet/in.h
          1.6       +2 -0      src/sys/netinet/ip_icmp.c

          1.4       +3 -4      src/contrib/ipfilter/ip_compat.h
          1.3       +1 -2      src/sbin/natd/icmp.c
          1.4       +0 -1      src/sbin/natd/natd.c
          1.48      +1 -0      src/sys/conf/files
          1.2       +0 -1      src/sys/conf/files.amd64
          1.13      +0 -1      src/sys/conf/files.i386
          1.5       +0 -1      src/sys/conf/files.pc98
          1.7       +1 -1      src/sys/contrib/ipfilter/netinet/fil.c
          1.10      +2 -3      src/sys/contrib/ipfilter/netinet/ip_compat.h
          1.10      +1 -1      src/sys/contrib/ipfilter/netinet/ip_fil.c
          1.7       +1 -1      src/sys/dev/netif/txp/if_txp.c
          1.7       +1 -1      src/sys/net/ip_mroute/ip_mroute.c
          1.7       +1 -2      src/sys/net/ipfw/ip_fw2.c
          1.6       +1 -2      src/sys/netinet/igmp.c
          1.4       +158 -116  src/sys/netinet/in_cksum.c
          1.6       +1 -1      src/sys/netinet/ip_gre.c
          1.7       +1 -2      src/sys/netinet/ip_icmp.c
          1.10      +1 -1      src/sys/netinet/ip_input.c
          1.10      +1 -2      src/sys/netinet/ip_output.c
          1.13      +1 -2      src/sys/netinet/tcp_input.c
          1.9       +1 -2      src/sys/netinet/tcp_output.c
          1.10      +1 -1      src/sys/netinet/tcp_subr.c
          1.10      +1 -1      src/sys/netinet/tcp_syncache.c
          1.9       +1 -2      src/sys/netinet/udp_usrreq.c

          1.5       +1 -2      src/sys/netinet6/ipsec.c
          1.5       +1 -2      src/sys/netproto/ipsec/ipsec.c
          1.5       +1 -1      src/sys/netproto/ipsec/ipsec_input.c
          1.4       +1 -2      src/sys/netproto/ipsec/ipsec_output.c

          and finally remove
            sys/i386/i386        in_cksum.c
            sys/i386/include     in_cksum.h
        ---snip---
 - endian.h:
   * DTRT in C++ mode
 - quad.h:
   * we don't use gcc v1 anymore, remove support for it
   Suggested by:	bde (long ago)
 - assym.h:
   * avoid zero-length arrays (remove dependency on a gcc specific
     feature)
     This change changes the contents of the object file, but as it's
     only used to generate some values for a header, and the generator
     knows how to handle this, there's no impact in the gcc case.
   Explained by:	bde
   Submitted by:	Marius Strobl <marius@alchemy.franken.de>
 - aicasm.c:
   * minor change to teach it about the way icc spells "-nostdinc"
   Not approved by:	gibbs (no reply to my mail)
 - bump __FreeBSD_version (lang/icc needs to know about the changes)

Incarnations of this patch survive gcc compiles since a loooong time,
I use it on my desktop. An icc compiled kernel works since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.

Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.

Reviewed by:	-arch
Submitted by:	netchild
2004-03-12 21:45:33 +00:00
bde
1c7581a731 Fixed pedantic syntax errors. Many macros didn't permit a semicolon after
their invocation in the !KLD_MODULE case, but a semicolon is provided after
all invocations and is required in the KLD_MODULE case.
2003-11-17 02:55:25 +00:00
bde
a2958cef1d Avoid a warning for compiling with `gcc -Wbad-function cast'. (This
is the warning that points to the bug in `(char *)malloc(...)' where
malloc() is implicitly declared as returning int.  We do similar things
here, but they work because u_int is the same as uintptr_t on i386's.)
2003-11-17 02:11:13 +00:00
pirzyk
0bdb757b1a Add a knob to turn on and off the CMPXCHG instruction on > i386 IA32 systems.
This is most beneficial for vmware client os installs.

Reviewed by: jmallet, iedowse, tlambert2@mindspring.com
MFC After: never, -STABLE does not currently use this instruction
2002-10-14 19:33:12 +00:00
markm
c5b0a0ebfe Beautify. This has the side effect of improving portability and
making lint work cleaner.

Inspired to do this by:	jhb
2002-07-18 15:56:46 +00:00
markm
a6f49f3bf1 Clean up the syntax WRT semicolons at the end of function-like-macros, and protect GCCisms from non-GNU compilers and lint. 2002-07-17 16:19:37 +00:00
bmilekic
a3cf10660c Make MPLOCKED work again in asm files and stringify it explicitly
where necessary.

Reviewed by: jake
2002-02-28 06:17:05 +00:00
bde
78de9bc0b9 Garbage-collect the "LOCORE" version of MPLOCKED. 2002-02-11 03:41:59 +00:00
jhb
6eb183d2d9 Allow the ATOMIC_ASM() macro to pass in the constraints on the V parameter
since the char versions need to use either ax, bx, cx, or dx.

Submitted by:	Peter Jeremy (mostly)
Recommended by:	bde
2001-12-18 08:51:34 +00:00
jhb
0c3e87867f Use newer constraints for atomic_cmpset().
Requested by:	bde
2001-11-12 18:53:45 +00:00
jhb
25544c8c32 Use newer constraints for inline assembly for an operand that is both an
input and an output by using the '+' modifier rather than listing the
operand in both the input and output sections.

Reviwed by:	bde
2001-11-12 16:57:33 +00:00
jhb
ea70c67008 Allow atomic ops to be somewhat safely used in userland. We always use
lock prefixes in the userland case so that the binaries will work on both
SMP and UP systems.
2001-10-08 20:58:24 +00:00
markm
4e9c36b300 RIP <machine/lock.h>.
Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by:	jhb
2001-02-11 10:44:09 +00:00
jhb
a3a305a2b4 - Sort of lie and say that %eax is an output only and not an input for the
non-386 atomic_load_acq().  %eax is an input since its value is used in
  the cmpxchg instruction, but we don't care what value it is, so setting
  it to a specific value is just wasteful.  Thus, it is being used without
  being initialized as the warning stated, but it is ok for it to be used
  because its value isn't important.  Thus, we are only sort of lying when
  we say it is an output only operand.
- Add "cc" to the clobber list for atomic_load_acq() since the cmpxchgl
  changes ZF.
2001-01-17 02:15:11 +00:00
jhb
b83c97ff1c - Fix atomic_load_* and atomic_store_* to generate functions for atomic.c
that modules can call.
- Remove the old gcc <= 2.8 versions of the atomic ops.
- Resort the order of some things in the file so that there is only
  one #ifdef for KLD_MODULE, and so that all WANT_FUNCTIONS stuff is
  moved to the bottom of the file.
- Remove ATOMIC_ACQ_REL() and just use explicit macros instead.
2001-01-16 00:18:36 +00:00
jhb
7fa2e61832 Fix the atomic_load_acq() and atomic_store_rel() functions to properly
implement memory fences for the 486+.  The 386 still uses versions w/o
memory fences as all operations on the 386 are not program ordered.
The 386 versions are not MP safe.
2001-01-14 09:55:21 +00:00
jhb
94e76ebf96 The x86 atomic operations are already locked, so they do not need an
additional locked instruction to guarantee a write barrier for the acquire
variants.

Approved by:	dfr
Pointy hat to:	jhb
2000-10-28 00:28:15 +00:00
jhb
03e6157448 - Add atomic_cmpset_{acq_,rel_,}_long
- Add in atomic operations for 8-bit, 16-bit, and 32-bit integers
2000-10-25 21:56:16 +00:00
jhb
787712af1c - Expand the set of atomic operations to optionally include memory barriers
in most of the atomic operations.  Now for these operations, you can
  use the normal atomic operation, you can use the operation with a read
  barrier, or you can use the operation with a write barrier.  The function
  names follow the same semantics used in the ia64 instruction set.  An
  atomic operation with a read barrier has the extra suffix 'acq', due to
  it having "acquire" semantics.  An atomic operation with a write barrier
  has the extra suffix 'rel'.  These suffixes are inserted between the
  name of the operation to perform and the typename.  For example, the
  atomic_add_int() function now has 3 variants:
  - atomic_add_int() - this is the same as the previous function
  - atomic_add_acq_int() - this function combines the add operation with a
    read memory barrier
  - atomic_add_rel_int() - this function combines the add operation with a
    write memory barrier
- Add 'ptr' to the list of types that we can perform atomic operations
  on.  This allows one to do atomic operations on uintptr_t's.  This is
  useful in the mutex code, for example, because the actual mutex lock is
  a pointer.
- Add two new operations for doing loads and stores with memory barriers.
  The new load operations use a read barrier before the load, and the
  new store operations use a write barrier after the load.  For example,
  atomic_load_acq_int() will atomically load an integer as well as
  enforcing a read barrier.
2000-10-20 07:00:48 +00:00
jhb
4cc9f87fa2 Add atomic_readandclear_int and atomic_readandclear_long. 2000-10-05 22:19:50 +00:00
phk
6c07bccbf7 Introduce atomic_cmpset_int() and atomic_cmpset_long() from SMPng a
few hours earlier than the rest.

The next DEVFS commit needs these functions.

Alpha versions by: dfr
i386 versions by: jakeb

Approved by:    SMPng
2000-09-06 11:21:14 +00:00
obrien
5cfeff679b When using _asm{} in GCC, one must specify the operand's size if one
specifies the instruction's operation size.  GCC will default to 32-bit
operands reguardless of the prototype (ie, formal parameters' type)
of an inline function.
2000-05-10 01:15:55 +00:00
peter
556bfe0799 Use the rev 1.1.2.1 code from RELENG_3 for atomic operations rather
than the non-atomic C macros.
1999-10-04 16:24:08 +00:00
peter
fb976dbce4 Typo: s/__GNUC_MINOR_/__GNUC_MINOR__/
(__GNUC_MINOR__ on egcs in -current is "91" and is going to be "95" soon)
1999-10-04 16:18:04 +00:00
eivind
f53ad097bb Allow compilation with older versions of GCC, in order to make it possible
to bootstrap and work with -current from older versions of FreeBSD.
1999-10-03 21:15:25 +00:00