numam-dpdk/lib
Gavin Hu 9ed8770628 ring/c11: synchronize load and store of the tail
Synchronize the load-acquire of the tail with the store-release in
update_tail. The store-release ensures that all ring operations,
enqueue or dequeue, are visible to observers on the other side as soon
as they see the updated tail. The load-acquire is needed because a
data dependency is not a reliable way to enforce ordering: the compiler
may break it by keeping values in temporaries to improve performance.
When computing free_entries and avail_entries, load the heads and tails
with atomic semantics instead.
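
The acquire/release pairing described above can be sketched as follows. This is a minimal illustration only, not the code of the patch: it uses standard C11 <stdatomic.h> rather than the compiler builtins, it assumes a single producer and a single consumer, and the struct and function names (sketch_headtail, sketch_update_tail, sketch_free_entries, sketch_avail_entries) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one side's head/tail pair. */
struct sketch_headtail {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

/* Publish progress. The store-release orders every earlier access to
 * the ring slots before the new tail value becomes visible. */
static inline void
sketch_update_tail(struct sketch_headtail *ht, uint32_t old_val,
		uint32_t new_val)
{
	/* Assumption: single producer/consumer, so nobody else moves
	 * this tail between reservation and publication. */
	assert(atomic_load_explicit(&ht->tail, memory_order_relaxed) == old_val);
	atomic_store_explicit(&ht->tail, new_val, memory_order_release);
}

/* Producer side: the load-acquire of the consumer tail pairs with the
 * consumer's store-release in sketch_update_tail(). */
static inline uint32_t
sketch_free_entries(struct sketch_headtail *prod,
		struct sketch_headtail *cons, uint32_t capacity)
{
	uint32_t old_head = atomic_load_explicit(&prod->head,
			memory_order_acquire);
	uint32_t cons_tail = atomic_load_explicit(&cons->tail,
			memory_order_acquire);

	/* Unsigned arithmetic wraps modulo 2^32, so the result stays
	 * correct when the indices wrap around. */
	return capacity + cons_tail - old_head;
}

/* Consumer side: the load-acquire of the producer tail pairs with the
 * producer's store-release in sketch_update_tail(). */
static inline uint32_t
sketch_avail_entries(struct sketch_headtail *cons,
		struct sketch_headtail *prod)
{
	uint32_t old_head = atomic_load_explicit(&cons->head,
			memory_order_acquire);
	uint32_t prod_tail = atomic_load_explicit(&prod->tail,
			memory_order_acquire);

	return prod_tail - old_head;
}
```

In this sketch, a consumer that observes the new producer tail through the load-acquire is also guaranteed to observe the ring entries written before the matching store-release.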

The patch was benchmarked with test/ring_perf_autotest. With two lcores
it reduces the enqueue/dequeue latency by 5% ~ 27.6%; the actual gain
depends on the number of lcores, the depth of the ring, and whether the
ring is SPSC or MPMC. With one lcore it also improves slightly, by about
3 ~ 4%. The largest improvement is for MPMC with two lcores and a ring
size of 32, where the latency drops by (3.26-2.36)/3.26 = 27.6%.

This patch is a bug fix; the performance improvement is a bonus. In our
analysis, the improvement comes from cache line pre-filling after
hoisting the load-acquire up out of __atomic_compare_exchange_n.

The test command:
$ sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=1024 -- -i

Test results with this patch (two cores):
 SP/SC bulk enq/dequeue (size: 8): 5.86
 MP/MC bulk enq/dequeue (size: 8): 10.15
 SP/SC bulk enq/dequeue (size: 32): 1.94
 MP/MC bulk enq/dequeue (size: 32): 2.36

For comparison, the test results without this patch:
 SP/SC bulk enq/dequeue (size: 8): 6.67
 MP/MC bulk enq/dequeue (size: 8): 13.12
 SP/SC bulk enq/dequeue (size: 32): 2.04
 MP/MC bulk enq/dequeue (size: 32): 3.26
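
The quoted percentages follow from the two result sets as (without - with) / without. A small sketch of that arithmetic is given here; the helper name latency_reduction_pct is hypothetical and not part of the test suite.

```c
#include <assert.h>

/* Relative latency reduction in percent, given the result without the
 * patch (before) and with the patch (after). */
static double
latency_reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

For example, latency_reduction_pct(3.26, 2.36) gives about 27.6 for MP/MC with ring size 32, and latency_reduction_pct(2.04, 1.94) gives about 4.9 for SP/SC with ring size 32, which matches the 5% ~ 27.6% range stated for two lcores.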

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Jia He <justin.he@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-11-05 14:34:19 +01:00
..
librte_acl eal: add macro for attribute weak 2018-10-25 02:11:23 +02:00
librte_bbdev remove useless constructor headers 2018-07-12 00:00:35 +02:00
librte_bitratestats bitrate: add sanity check on parameters 2018-07-26 20:07:57 +02:00
librte_bpf bpf: fix a typo 2018-10-25 11:27:49 +02:00
librte_cfgfile
librte_cmdline ethdev: support MAC address as iterator filter 2018-10-26 22:14:06 +02:00
librte_compat buildtools: change license to SPDX 2018-07-26 22:45:17 +02:00
librte_compressdev compressdev: fix op allocation 2018-11-02 12:25:39 +01:00
librte_cryptodev lib: reduce global variable usage 2018-10-29 02:34:27 +01:00
librte_distributor
librte_eal mem: add thread unsafe version for DMA mask check 2018-11-05 01:02:14 +01:00
librte_efd
librte_ethdev add missing static keyword to globals 2018-10-29 02:01:08 +01:00
librte_eventdev lib: reduce global variable usage 2018-10-29 02:34:27 +01:00
librte_flow_classify flow_classify: do not check for invalid socket ID 2018-10-11 10:37:45 +02:00
librte_gro
librte_gso gso: support UDP/IPv4 fragmentation 2018-07-11 23:45:20 +02:00
librte_hash hash: remove unnecessary pause 2018-10-26 22:01:37 +02:00
librte_ip_frag ip_frag: fix overflow in key comparison 2018-10-28 11:16:49 +01:00
librte_jobstats
librte_kni kni: add function to set link state on kernel interface 2018-10-26 19:46:15 +02:00
librte_kvargs kvargs: support list value 2018-10-26 22:14:06 +02:00
librte_latencystats latency: fix timestamp marking and latency calculation 2018-10-25 10:30:13 +02:00
librte_lpm lpm6: add incremental update on delete 2018-09-19 17:11:37 +02:00
librte_mbuf ethdev: support metadata as flow rule criteria 2018-10-26 22:14:05 +02:00
librte_member remove useless constructor headers 2018-07-12 00:00:35 +02:00
librte_mempool malloc: add function to check if socket is external 2018-10-11 11:11:25 +02:00
librte_meter meter: remove experimental tag from profile API 2018-08-06 01:15:11 +02:00
librte_metrics metrics: disallow null as metric name 2018-07-26 20:30:18 +02:00
librte_net add missing static keyword to globals 2018-10-29 02:01:08 +01:00
librte_pci use SPDX tag for 6WIND copyrighted files 2018-05-25 10:47:06 +02:00
librte_pdump mk: build with _GNU_SOURCE defined by default 2018-10-22 11:28:27 +02:00
librte_pipeline pipeline: add table action for packet decap 2018-10-12 19:33:34 +02:00
librte_port port: add symmetric crypto 2018-10-12 19:33:02 +02:00
librte_power power: fix traffic aware build 2018-10-26 14:51:36 +02:00
librte_rawdev lib: reduce global variable usage 2018-10-29 02:34:27 +01:00
librte_reorder
librte_ring ring/c11: synchronize load and store of the tail 2018-11-05 14:34:19 +01:00
librte_sched mk: build with _GNU_SOURCE defined by default 2018-10-22 11:28:27 +02:00
librte_security security: support PDCP 2018-10-24 15:12:33 +02:00
librte_table lib/librte_table: add hash function headers 2018-10-12 17:58:53 +02:00
librte_telemetry build: add dependency on telemetry to apps with meson 2018-10-27 15:21:33 +02:00
librte_timer eal: make semantics of lcore role function more intuitive 2018-04-26 16:58:18 +02:00
librte_vhost vhost: initialize postcopy ufd properly 2018-10-26 22:14:06 +02:00
Makefile telemetry: introduce infrastructure 2018-10-27 15:18:20 +02:00
meson.build build: change default driver installation directory 2018-10-27 23:22:12 +02:00