numam-dpdk/lib
Morten Brørup b77f58604a mempool: align cache objects on cache lines
Add __rte_cache_aligned to the objs array.
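The layout sketch below makes the change concrete. It uses hypothetical, simplified structures whose header fields are loosely modeled on the mempool cache (`size`, `flushthresh`, `len`). Three things are assumptions: the array length `CACHE_OBJS`, the 64 B line size, and the use of a GCC-style `aligned` attribute as a stand-in for `__rte_cache_aligned`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; the real value is target dependent. */
#define CACHE_LINE_SIZE 64
/* Hypothetical array length, chosen only for illustration. */
#define CACHE_OBJS 1024

/* Simplified stand-in for the mempool cache before the change. */
struct cache_before {
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;
	void *objs[CACHE_OBJS];
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* Same fields, but objs is forced onto a cache line boundary,
 * mirroring what __rte_cache_aligned does to the member. */
struct cache_after {
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;
	void *objs[CACHE_OBJS] __attribute__((aligned(CACHE_LINE_SIZE)));
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* objs now starts exactly at the second cache line. */
static_assert(offsetof(struct cache_after, objs) == CACHE_LINE_SIZE,
	      "objs must start on a cache line boundary");
```

On a typical 64-bit target, `offsetof(struct cache_before, objs)` is 16, which is where the objects start in the "current" diagrams. With the attribute it becomes 64, at the cost of padding out the rest of line 0.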

It makes no difference in the general case, but if get/put operations
always move 32 objects, it reduces the number of memory (or last-level
cache) accesses from five to four 64 B cache lines for every get/put
operation.
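The five-versus-four figure follows from simple arithmetic. This assumes 8 B object pointers and 64 B cache lines. It also assumes that, without the alignment, objs starts 16 B into the structure (three 32-bit header fields padded to pointer alignment, as in the diagrams). Let b be the byte offset of a burst's first object. A 32-object burst then covers 256 B, and the number of cache lines holding its objects is:

```latex
L(b) = \left\lceil \frac{(b \bmod 64) + 32 \cdot 8}{64} \right\rceil,
\qquad
L_{\text{current}} = \left\lceil \frac{16 + 256}{64} \right\rceil = 5,
\qquad
L_{\text{aligned}} = \left\lceil \frac{0 + 256}{64} \right\rceil = 4
```

Because 256 is a multiple of 64, every 32-object burst keeps the same offset within a line (16 B before the change, 0 B after). The saving therefore applies to every burst, not just the first.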

For readability reasons, an example using 16 objects follows:

Currently, with 16 objects (128 B), we access 3 cache lines:

      ┌────────┐
      │len     │
cache │********│---
line0 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line2 │        │
      │        │
      └────────┘

With the alignment, it is also 3 cache lines:

      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤---
      │********│ ^
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line2 │********│ |
      │********│ v
      └────────┘---

However, accessing the objects at the bottom of the mempool cache is a
special case, where cache line0 is also used for objects.

Consider the next burst (and any following bursts):

Current:
      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line1 │        │
      │        │
      ├────────┤
      │        │
cache │********│---
line2 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line3 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line4 │        │
      │        │
      └────────┘
4 cache lines touched, incl. line0 for len.

With the proposed alignment:
      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line1 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line2 │        │
      │        │
      ├────────┤
      │********│---
cache │********│ ^
line3 │********│ |
      │********│ | 16 objects
      ├────────┤ | 128B
      │********│ |
cache │********│ |
line4 │********│ |
      │********│_v_
      └────────┘
Only 3 cache lines touched, incl. line0 for len.
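The counts in the four diagrams can be reproduced with a small helper. It is a sketch under the same assumptions as the diagrams: 64 B lines, 8 B pointers, a cache-line-aligned structure with `len` in line 0, and objs at offset 16 (current) or 64 (aligned). The function names are hypothetical and not part of the mempool API.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64u /* assumed cache line size in bytes */
#define OBJ_SIZE 8u         /* assumed sizeof(void *) on a 64-bit target */

/* Cache lines covered by objs[first .. first + n - 1], where objs[] starts
 * objs_off bytes from the start of the cache line aligned structure. */
static size_t obj_lines(size_t objs_off, size_t first, size_t n)
{
	assert(n > 0);
	size_t start = objs_off + first * OBJ_SIZE;
	size_t last = start + n * OBJ_SIZE - 1; /* last byte accessed */
	return last / CACHE_LINE_SIZE - start / CACHE_LINE_SIZE + 1;
}

/* Total lines touched by one get/put: the object lines, plus line 0 for
 * len unless the first object already shares line 0 with len. */
static size_t lines_touched(size_t objs_off, size_t first, size_t n)
{
	size_t lines = obj_lines(objs_off, first, n);
	if (objs_off + first * OBJ_SIZE >= CACHE_LINE_SIZE)
		lines++;
	return lines;
}
```

For the first 16-object burst, `lines_touched(16, 0, 16)` and `lines_touched(64, 0, 16)` both return 3. For the next burst, `lines_touched(16, 16, 16)` returns 4 and `lines_touched(64, 16, 16)` returns 3, matching the diagrams. For a 32-object burst past the bottom, `obj_lines(16, 32, 32)` and `obj_lines(64, 32, 32)` give 5 and 4, the figures quoted at the top of the message.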

Credits go to Olivier Matz for the nice ASCII graphics.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2022-10-30 10:07:58 +01:00
acl version: 22.11-rc0 2022-07-21 12:13:48 +02:00
bbdev bbdev: fix build with clang 3.4.2 2022-10-11 01:34:07 +02:00
bitratestats version: 22.11-rc0 2022-07-21 12:13:48 +02:00
bpf eal: remove unneeded includes from a public header 2022-09-21 15:31:03 +02:00
cfgfile version: 22.11-rc0 2022-07-21 12:13:48 +02:00
cmdline version: 22.11-rc0 2022-07-21 12:13:48 +02:00
compressdev dev: hide driver object 2022-09-23 16:14:34 +02:00
cryptodev security: hide session structure 2022-10-04 22:37:54 +02:00
distributor eal: remove unneeded includes from a public header 2022-09-21 15:31:03 +02:00
dmadev dmadev: support telemetry dump dmadev 2022-10-03 12:03:36 +02:00
eal rwlock: promote trylock operations as stable 2022-10-27 13:00:11 +02:00
efd eal: remove unneeded includes from a public header 2022-09-21 15:31:03 +02:00
ethdev ethdev: add structure for indirect flow age update 2022-10-28 12:41:03 +02:00
eventdev eventdev/eth_tx: fix queue delete 2022-10-21 11:42:08 +02:00
fib lib: remove empty return types from doxygen comments 2022-10-26 17:51:51 +02:00
flow_classify flow_classify: mark library as deprecated 2022-10-28 16:20:59 +02:00
gpudev dev: hide driver object 2022-09-23 16:14:34 +02:00
graph graph: fix node objects allocation 2022-10-10 17:30:39 +02:00
gro gro: check payload length after trim 2022-10-26 17:18:11 +02:00
gso version: 22.11-rc0 2022-07-21 12:13:48 +02:00
hash lib: remove empty return types from doxygen comments 2022-10-26 17:51:51 +02:00
ip_frag ip_frag: add IPv4 fragment copy 2022-08-29 16:24:18 +02:00
ipsec lib: remove empty return types from doxygen comments 2022-10-26 17:51:51 +02:00
jobstats version: 22.11-rc0 2022-07-21 12:13:48 +02:00
kni kni: add deprecation warning at runtime 2022-10-10 17:04:09 +02:00
kvargs version: 22.11-rc0 2022-07-21 12:13:48 +02:00
latencystats version: 22.11-rc0 2022-07-21 12:13:48 +02:00
lpm lib: remove empty return types from doxygen comments 2022-10-26 17:51:51 +02:00
mbuf mbuf: move next pointer to first cache line if PA disabled 2022-10-09 13:14:57 +02:00
member member: fix build with GCC 5.4.0 2022-10-10 12:20:01 +02:00
mempool mempool: align cache objects on cache lines 2022-10-30 10:07:58 +01:00
meter eal: remove unneeded includes from a public header 2022-09-21 15:31:03 +02:00
metrics metrics: return error code on initialization failures 2022-10-03 12:03:36 +02:00
net net: fix build with -Wpedantic 2022-09-29 09:20:12 +02:00
node node: check Rx element allocation 2022-10-10 17:53:12 +02:00
pcapng pcapng: record received RSS hash in pcap file 2022-10-27 10:29:59 +02:00
pci eal: remove unneeded includes from a public header 2022-09-21 15:31:03 +02:00
pdump pdump: do not allow enable/disable in primary process 2022-10-21 14:54:26 +02:00
pipeline mbuf: add helper to get/set IOVA address 2022-10-08 23:58:26 +02:00
port port: prevent unnecessary flush for ring output port 2022-09-22 16:56:58 +02:00
power power: fix P-state number parsing 2022-10-26 23:36:56 +02:00
rawdev rawdev: support telemetry dump rawdev 2022-10-03 12:03:36 +02:00
rcu rcu: fix build with datapath debug log 2022-10-06 12:37:11 +02:00
regexdev regexdev: add maximum number of mbuf segments 2022-10-09 14:54:30 +02:00
reorder lib: remove empty return types from doxygen comments 2022-10-26 17:51:51 +02:00
rib lib: remove empty return types from doxygen comments 2022-10-26 17:51:51 +02:00
ring mem: fix API doc about allocation on secondary processes 2022-10-04 13:36:13 +02:00
sched sched: fix subport profile configuration 2022-10-28 16:20:59 +02:00
security security: hide session structure 2022-10-04 22:37:54 +02:00
stack version: 22.11-rc0 2022-07-21 12:13:48 +02:00
table table: add entry ID for learner tables 2022-09-24 11:35:23 +02:00
telemetry telemetry: make help command more helpful 2022-09-26 13:49:38 +02:00
timer timer: fix stopping all timers 2022-10-05 15:29:54 +02:00
vhost vhost: promote per-queue stats API to stable 2022-10-26 11:11:03 +02:00
meson.build flow_classify: mark library as deprecated 2022-10-28 16:20:59 +02:00