b77f58604a
Add __rte_cache_aligned to the objs array.

It makes no difference in the general case, but if get/put operations
are always 32 objects, it will reduce the number of memory (or last
level cache) accesses from five to four 64 B cache lines for every
get/put operation.

For readability reasons, an example using 16 objects follows:

Currently, with 16 objects (128B), we access 3 cache lines:

      ┌────────┐
      │len     │
cache │********│---
line0 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line2 │        │
      │        │
      └────────┘

With the alignment, it is also 3 cache lines:

      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤---
      │********│ ^
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line2 │********│ |
      │********│ v
      └────────┘---

However, accessing the objects at the bottom of the mempool cache is a
special case, where cache line0 is also used for objects.

Consider the next burst (and any following bursts):

Current:

      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line1 │        │
      │        │
      ├────────┤
      │        │
cache │********│---
line2 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line3 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line4 │        │
      │        │
      └────────┘

4 cache lines touched, incl. line0 for len.

With the proposed alignment:

      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line1 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line2 │        │
      │        │
      ├────────┤
      │********│---
cache │********│ ^
line3 │********│ |
      │********│ | 16 objects
      ├────────┤ | 128B
      │********│ |
cache │********│ |
line4 │********│ |
      │********│_v_
      └────────┘

Only 3 cache lines touched, incl. line0 for len.

Credits go to Olivier Matz for the nice ASCII graphics.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
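The cache line counts in the diagrams can be reproduced with a little address arithmetic. The following is a minimal C11 sketch. It assumes 64-byte cache lines, 8-byte pointers (a typical 64-bit target, where an unaligned objs array starts at byte 16, as in the diagrams), and a structure that itself starts on a cache line boundary. The structures `cache_current` and `cache_aligned`, the capacity `CACHE_OBJS`, and the helpers `obj_lines()` and `total_lines()` are hypothetical names used only for illustration. They mimic the header-then-objs layout and are not the real `struct rte_mempool_cache`. Standard `alignas(CACHE_LINE)` stands in for `__rte_cache_aligned`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size in bytes */
#define CACHE_OBJS 64	/* hypothetical capacity, for illustration only */

/* Hypothetical, simplified stand-ins for a mempool cache: a small header
 * (including len) followed by the objs array. Not the real DPDK struct. */
struct cache_current {
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;
	void *objs[CACHE_OBJS];			/* starts right after the header */
};

struct cache_aligned {
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;
	alignas(CACHE_LINE) void *objs[CACHE_OBJS]; /* like __rte_cache_aligned */
};

static_assert(offsetof(struct cache_aligned, objs) % CACHE_LINE == 0,
	      "objs must start on a cache line boundary");

/* Number of distinct cache lines spanned by objs[first .. first + n - 1],
 * where objs_off is the byte offset of objs[0] from the start of the
 * (cache line aligned) structure. Requires n > 0. */
static unsigned int
obj_lines(size_t objs_off, unsigned int first, unsigned int n)
{
	size_t begin = objs_off + first * sizeof(void *);
	size_t last = begin + n * sizeof(void *) - 1;

	return (unsigned int)(last / CACHE_LINE - begin / CACHE_LINE + 1);
}

/* Same, but also counting line 0, which holds len and is touched by
 * every get/put operation. */
static unsigned int
total_lines(size_t objs_off, unsigned int first, unsigned int n)
{
	size_t begin = objs_off + first * sizeof(void *);
	unsigned int lines = obj_lines(objs_off, first, n);

	/* If the burst starts in line 0, that line is already counted. */
	return begin / CACHE_LINE == 0 ? lines : lines + 1;
}
```

Under these assumptions, `total_lines()` returns 3 for both layouts for the first 16-object burst (`first = 0`), and 4 for the current layout versus 3 for the aligned one for the next burst (`first = 16`), matching the diagrams. For 32-object bursts at positions that are multiples of 32, `obj_lines()` returns 5 for the current layout and 4 for the aligned one, which is the five-to-four reduction described above.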