Commit Graph

86 Commits

Author SHA1 Message Date
Hans Petter Selasky
25a1e0f636 Implement missing atomic_fcmpset_XXX() support for i386.
This also fixes i386 build after r337527.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-08-09 11:30:13 +00:00
Hans Petter Selasky
a7a7f5b472 Make sure kernel modules built by default are portable between UP and
SMP systems by extending defined(SMP) to include defined(KLD_MODULE).

This is a regression issue after r335873 .

Discussed with:		mmacy@
Sponsored by:		Mellanox Technologies
2018-07-06 10:13:42 +00:00
Matt Macy
f4b3640475 inline atomics and allow tied modules to inline locks
- inline atomics in modules on i386 and amd64 (they were always
  inline on other arches)
- allow modules to opt in to inlining locks by specifying
  MODULE_TIED=1 in the makefile

Reviewed by: kib
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16079
2018-07-02 19:48:38 +00:00
Hans Petter Selasky
43bb1274d0 Implement atomic_add_64() and atomic_subtract_64() for the i386 target.
While at it add missing _acq_ and _rel_ variants for 64-bit atomic
operations under i386.

Reviewed by:	kib @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-05-29 11:59:02 +00:00
Konstantin Belousov
30d4f9e888 Add atomic_load(9) and atomic_store(9) operations.
They provide relaxed-ordered atomic access semantic.  Due to the
FreeBSD memory model, the operations are syntaxical wrappers around
the volatile accesses.  The volatile qualifier is used to ensure that
the access not optimized out and in turn depends on the volatile
semantic as implemented by supported compilers.

The motivation for adding the operation is to help people coming from
other systems or knowing the C11/C++ standards where atomics have
special type and require use of the special access operations.  It is
still the case that FreeBSD requires plain load and stores of aligned
integer types to be atomic.

Suggested by:	jhb
Reviewed by:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13534
2017-12-19 09:59:20 +00:00
Pedro F. Giffuni
83ef78be95 sys/i386: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:08:52 +00:00
Hans Petter Selasky
322f006ecc Implement atomic_fetchadd_64() for i386. This function is needed by the
atomic64 header file in the LinuxKPI for i386.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-24 12:10:42 +00:00
Gleb Smirnoff
83c9dea1ba - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place.  To do per-cpu stats, convert all fields that previously were
  maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
  before we have set up UMA and we can do counter_u64_alloc(), provide an
  early counter mechanism:
  o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
  o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
    so that at early stages of boot, before counters are allocated we already
    point to a counter that can be safely written to.
  o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
  to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by:	kib, gallatin, marius, lidl
Differential Revision:	https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
Mark Johnston
5788c2bde1 Adjust the constraint for "src" in atomic_(f)cmpset_8.
"r" is not sufficient to prevent the use of invalid byte-width registers
with at least gcc.

Reported and reviewed by:	bde
X-MFC-With:	r315718
2017-03-27 16:18:19 +00:00
Mark Johnston
3d6732549d Add support for 8- and 16-bit atomic_(f)cmpset to x86.
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10068
2017-03-22 17:29:04 +00:00
Konstantin Belousov
57f6622f92 For i386, remove config options CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE
and device npx.

This means that FPU is always initialized and handled when available,
and SSE+ register file and exception are handled when available.  This
makes the kernel FPU code much easier to maintain by the cost of
slight bloat for CPUs older than 25 years.

CPU_DISABLE_CMPXCHG outlived its usefulness, see the removed comment
explaining the original purpose.

Suggested by and discussed with:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2017-02-03 12:51:40 +00:00
Mateusz Guzik
ed869ff019 i386: fixup fcmpset
An incorrect output specifier was used which worked with clang by accident,
but breaks with the in-tree gcc version.

While here plug a whitespace nit.

Reported by:	bde
2017-02-02 01:33:08 +00:00
Mateusz Guzik
e7a98aef79 i386: add atomic_fcmpset
Tested by:	pho
2017-01-30 02:24:54 +00:00
Sepherosa Ziehau
dfdc9a05c6 atomic: Add testandclear on i386/amd64
Reviewed by:	kib
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6381
2016-05-16 07:19:33 +00:00
Konstantin Belousov
0b6476ec5b Improve comments.
Submitted by:	bde
MFC after:	2 weeks
2015-07-30 15:47:53 +00:00
Konstantin Belousov
48cae112b5 Use private cache line for the locked nop in *mb() on i386.
Suggested by:	alc
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-30 00:13:20 +00:00
Konstantin Belousov
dd5b64258f MFamd64 r285934: Remove store/load (= full) barrier from the i386
atomic_load_acq_*().

Noted by:	alc (long time ago)
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-29 23:59:17 +00:00
Konstantin Belousov
8954a9a4e6 Add the atomic_thread_fence() family of functions with intent to
provide a semantic defined by the C11 fences with corresponding
memory_order.

atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.

atomic_thread_fence_rel() is r, w | w.

atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation.  Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.

atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.

Reviewed by:	alc
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-07-08 18:12:24 +00:00
Konstantin Belousov
3ac3c0f269 Add a comment about too strong semantic of atomic_load_acq() on x86.
Submitted by:	bde
MFC after:	2 weeks
2015-06-29 09:58:40 +00:00
Konstantin Belousov
7626d062c3 Remove unneeded data dependency, currently imposed by
atomic_load_acq(9), on it source, for x86.

Right now, atomic_load_acq() on x86 is sequentially consistent with
other atomics, code ensures this by doing store/load barrier by
performing locked nop on the source.  Provide separate primitive
__storeload_barrier(), which is implemented as the locked nop done on
a cpu-private variable, and put __storeload_barrier() before load, to
keep seq_cst semantic but avoid introducing false dependency on the
no-modification of the source for its later use.

Note that seq_cst property of x86 atomic_load_acq() is not documented
and not carried by atomics implementations on other architectures,
although some kernel code relies on the behaviour.  This commit does
not intend to change this.

Reviewed by:	alc
Discussed with:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-28 05:04:08 +00:00
Jung-uk Kim
5188b5f3c2 Implement atomic_cmpset_64() and atomic_swap_64() for i386. 2013-08-21 22:30:11 +00:00
Jung-uk Kim
3264fd707a Reimplement atomic_load_acq_64() and atomic_store_rel_64() for i386. These
functions are now real functions rather than function pointers.  Supposedly,
it is faster for modern processors.

Suggested by:	bde
2013-08-21 22:27:42 +00:00
Jung-uk Kim
d36eb3f1c4 Remove empty lines before return statements for style consistency. 2013-08-21 22:05:58 +00:00
Jung-uk Kim
8a1ee2d346 Implement atomic_swap() and atomic_testandset().
Reviewed by:	arch, bde, jilles, kib
2013-08-21 22:03:06 +00:00
Jung-uk Kim
da255e4c7f - Remove the "a" constraint from main output operand for atomic_cmpset().
- Use "+" modifier for the "expect" because it is also an output (unused).
2013-08-21 21:30:06 +00:00
Jung-uk Kim
fe94be3da7 Use '+' modifier for a memory operand that is both an input and an output.
It was actually done in r86301 but reverted in r150182 because GCC 3.x was
not able to handle it for a memory operand.  Apparently, this problem was
fixed in GCC 4.1+ and several contrib sources already rely on this feature.
2013-08-21 21:14:16 +00:00
Jung-uk Kim
c1c84ce1bf Remove bogus labels. No functional change. 2013-08-21 20:49:46 +00:00
Jung-uk Kim
ee93d1173a Use consistent style. No functional change. 2013-08-21 20:43:50 +00:00
Attilio Rao
3a4730256a Add an unified macro to deny ability from the compiler to reorder
instruction loads/stores at its will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.

Reviewed by:	bde, kib
Discussed with:	dim, theraven
MFC after:	2 weeks
2012-10-09 14:32:30 +00:00
Konstantin Belousov
fa9f322df9 Use plain store for atomic_store_rel on x86, instead of implicitly
locked xchg instruction.  IA32 memory model guarantees that store has
release semantic, since stores cannot pass loads or stores.

Reviewed by:	  bde, jhb
Tested by:	  pho
MFC after:	  2 weeks
2012-06-02 18:10:16 +00:00
Jung-uk Kim
d521c6b9c4 Implement atomic_load_acq_64(9) and atomic_store_rel_64(9) for i386. These
functions are implemented with CMPXCHG8B instruction where it is available,
i. e., all Pentium-class and later processors.  Note this instruction is
also used for atomic_store_rel_64() because a simple XCHG-like instruction
for 64-bit memory access does not exist, unfortunately.  If the processor
lacks the instruction, i. e., 80486-class CPUs, two 32-bit load/store are
performed with interrupt temporarily disabled, assuming it does not support
SMP.  Although this assumption may be little naive, it is true in reality.
This implementation is inspired by Linux.
2011-04-06 23:59:59 +00:00
Konstantin Belousov
7222d2fbee Inform a compiler which asm statements in the x86 implementation of
atomics change eflags.

Reviewed by:	jhb
MFC after:	2 weeks
2010-12-18 16:41:11 +00:00
Poul-Henning Kamp
065b12a703 Rename an argument from "exp" to "expect" since the former makes FlexeLint
uneasy, in case anybody think it might be exp(3) in libm.

This also makes it consistent with other archs.
2010-05-20 06:18:03 +00:00
Attilio Rao
8448afced8 atomic_cmpset_barr_* was added in order to cope with compilers willing to
specify their own version of atomic_cmpset_* which could have been
different than the membar version.

Right now, however, FreeBSD is bound mostly to GCC-like compilers and
it is desired to add new support and compat shim mostly when there is
a real necessity, in order to avoid too much compatibility bloats.

In this optic, bring back atomic_cmpset_{acq, rel}_* to be the same as
atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction.

Requested by:	jhb
Reviewed by:	jhb
Tested by:	Giovanni Trematerra <giovanni dot trematerra at
		gmail dot com>
2009-10-09 15:51:40 +00:00
Attilio Rao
d9492a4483 - All the functions in atomic.h needs to be in "physical" form (like
not defined through macros or similar) in order to be later compiled in
  the kernel and offer this way the support for modules (and
  compatibility among the UP case and SMP case).
  Fix this for the newly introduced atomic_cmpset_barr_* cases by defining
  and specifying a template.  Note that the new DEFINE_CMPSET_GEN()
  template save more typing on amd64 than the current code. [1]
- Fix the style for memory barriers on amd64.

[1] Reported by:	Paul B. Mahol <onemda at gmail dot com>
2009-10-06 23:48:28 +00:00
Attilio Rao
86d2e48c22 Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used.  GCC, however, does that aggressively, even in
presence of volatile operands.  The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory" even if that is
theoretically an heavy-weight operation, flushing the content of all
the registers and forcing reload of them (We could rely, however, on
gcc DTRT by just understanding the purpose as this is a well-known
pattern for many modern operating-systems).

Not all our memory barriers, right now, clobber memory for GCC-like
compilers. The most notable cases are IA32 and amd64 where the memory
barrier are treacted the same as normal atomic instructions.
Fix this by offering the possibility to implement atomic instructions
with memory barriers separately from the normal version and implement
the GCC-like specific one using memory clobbering.
Thanks to Chris Lattner (@apple) for his discussion on llvm specifics.

Reported by:	jhb
Reviewed by:	jhb
Tested by:	rdivacky, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
2009-10-06 13:45:49 +00:00
Konstantin Belousov
422dcc2416 Restore memory clobber, to cause mb on the compiler level too.
Use more sane formatting of the assembler.

Pointed out by:	bde
2008-12-06 21:33:44 +00:00
Konstantin Belousov
2640173120 Unconditionally use locked addition of zero to tip of the stack for
memory barriers on i386. It works as a serialization instruction on
all IA32 CPUs.

Alternative solution of using {s,l,}fence requires run-time checking
of the presense of the corresponding SSE or SSE2 extensions, and
possible boot-time patching of the kernel text.

Suggested by:	many
2008-12-05 21:17:54 +00:00
Kip Macy
db7f0b974f - bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
  allow drivers to efficiently manage multiple hardware queues
  (i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.
2008-11-22 05:55:56 +00:00
Pawel Jakub Dawidek
6eb4157ffc Implement atomic_fetchadd_long() for all architectures and document it.
Reviewed by:	attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)
2008-03-16 21:20:50 +00:00
Bruce Evans
0b194ec872 Fix oops in previous commit. 2006-12-29 15:48:18 +00:00
Bruce Evans
f28e1c8f99 Fixed some style bugs (mainly assorted errors in comments, and inconsistent
spelling of `result').
2006-12-29 15:29:49 +00:00
Bruce Evans
6c296ffa81 Fixed some style bugs (whitespace only). 2006-12-29 14:28:23 +00:00
Bruce Evans
7e4277e591 Try harder to garbage-collect the "LOCORE" (really asm) version of
MPLOCKED.  The cleaning in rev.1.25 was supposed to have been undone
by rev.1.26, but 1.26 could never have actually affected asm files
since atomic.h is full of C declarations so including it in asm files
would just give syntax errors.  The asm MPLOCKED is even less needed
than when misplaced definitions of it were first removed, and is now
unused in any asm file in the src tree except in anachronismns in
sys/i386/i386/support.s.
2006-12-29 13:36:26 +00:00
Bruce Evans
26ab2d1d23 Avoid an instruction in atomic_cmpset_{int_long)() in most cases.
These functions are used a lot for mutexes, so this reduces the text
size of an average kernel by about 0.75%.  This wasn't intended to
be a significant optimization, but it somehow increased the maximum
number of packets per second that can be transmitted by my bge hardware
from 320000 to 460000 (this benchmark is CPU-bound and remarkably
sensitive to changes in the text section).

Details: we would prefer to leave the result of the cmpxchg in %al,
but cannot tell gcc that it is there, so we have to convert it to an
integer register.  We converted  to %al, then to %[re]ax, but the
latter step is usually wasted since gcc usually only wants the condition
code and can recover it from %al just as easily as from %[re]ax.  Let
gcc promote %al in the few cases where this is needed.

Nearby style fixes;
- let gcc manage the load of `res', and don't abuse `res' for a copy of `exp'
- don't echo `res's name in comments
- consistently spell the condition code as 'e' after comparison for equality
- don't hard-code %al anywhere except in constraints
- for the version that doesn't use cmpxchg, there is no requirement to use
  %al anywhere, so don't hard-code it in the constraints either.

Style non-fix:
- for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al)
  for the main output operand, although this is not required.  The input
  and output operands that use the "a" constraint are now decoupled, and
  this makes things clearer except for the reason that the output register
  is hard-coded.  It is now just a hack to tell gcc that the input "a" has
  been clobbered without increasing the number of operands.
2006-12-27 20:26:00 +00:00
Dag-Erling Smørgrav
6f0f8cca25 Use wrapper macros for atomic pointer operations in order to perform the
correct casts.  This should probably be merged to other architectures.
2006-03-28 14:34:48 +00:00
John Baldwin
3c2bc2bf26 Add a new atomic_fetchadd() primitive that atomically adds a value to a
variable and returns the previous value of the variable.

Tested on:	i386, alpha, sparc64, arm (cognet)
Reviewed by:	arch@
Submitted by:	cognet (arm)
MFC after:	1 week
2005-09-27 17:39:11 +00:00
John Baldwin
80d52f16da Stop using the '+' constraint modifier with inline assembly. The '+'
constraint is actually only allowed for register operands.  Instead, use
separate input and output memory constraints.

Education from:	alc
Reviewed by:	alc
Tested on:	i386, alpha
MFC after:	1 week
2005-09-15 19:31:22 +00:00
John Baldwin
122eceef61 Convert the atomic_ptr() operations over to operating on uintptr_t
variables rather than void * variables.  This makes it easier and simpler
to get asm constraints and volatile keywords correct.

MFC after:	3 days
Tested on:	i386, alpha, sparc64
Compiled on:	ia64, powerpc, amd64
Kernel toolchain busted on:	arm
2005-07-15 18:17:59 +00:00
John Baldwin
48281036d7 Some cleanups and tweaks to some of the atomic.h files in preparation for
further changes and fixes in the future:
- Use aliases via macros rather than duplicated inlines wherever possible.
- Move all the aliases to the bottom of these files and the inline
  functions to the top.
- Add various comments.
- On alpha, drop atomic_{load_acq,store_rel}_{8,char,16,short}().
- On i386 and amd64, don't duplicate the extern declarations for functions
  in the two non-inline cases (KLD_MODULE and compiler doesn't do inlines),
  instead, consolidate those two cases.
- Some whitespace fixes.

Approved by:	re (scottl)
2005-07-09 12:38:53 +00:00