The current implementation uses the '__sync' built-ins to synchronize
statistics across worker threads. The '__sync' built-in functions are
full barriers, which hurts performance, so add per-worker packet
statistics to remove the synchronization between worker threads.

Since the maximum core number can reach 256, disable the per-core
stats print by default and add the --insight-worker option to enable it.
For example:

  sudo examples/packet_ordering/arm64-armv8a-linuxapp-gcc/packet_ordering \
      -l 112-115 --socket-mem=1024,1024 -n 4 -- -p 0x03 --insight-worker

  RX thread stats:
   - Pkts rxd: 226539223
   - Pkts enqd to workers ring: 226539223

  Worker thread stats on core [113]:
   - Pkts deqd from workers ring: 77557888
   - Pkts enqd to tx ring: 77557888
   - Pkts enq to tx failed: 0

  Worker thread stats on core [114]:
   - Pkts deqd from workers ring: 148981335
   - Pkts enqd to tx ring: 148981335
   - Pkts enq to tx failed: 0

  Worker thread stats:
   - Pkts deqd from workers ring: 226539223
   - Pkts enqd to tx ring: 226539223
   - Pkts enq to tx failed: 0

  TX stats:
   - Pkts deqd from tx ring: 226539223
   - Ro Pkts transmitted: 226539168
   - Ro Pkts tx failed: 0
   - Pkts transmitted w/o reorder: 0
   - Pkts tx failed w/o reorder: 0
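
The per-worker counters described above can be sketched roughly as
follows. This is an illustrative sketch, not the code in this patch:
the names worker_stats, wkr_stats, worker_account, worker_stats_total,
MAX_LCORES and CACHE_LINE are hypothetical, the 64-byte cache line is
an assumption, and the totals are assumed to be read only after the
workers have stopped.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdalign.h>
#include <stdio.h>

#define MAX_LCORES 256 /* assumption: matches the 256-core limit above */
#define CACHE_LINE 64  /* assumption: typical cache-line size in bytes */

/* Hypothetical per-worker counters. Aligning the first member pads the
 * struct to a full cache line, so two workers never share a line. */
struct worker_stats {
	alignas(CACHE_LINE) uint64_t deq_pkts;
	uint64_t enq_pkts;
	uint64_t enq_failed_pkts;
};

static struct worker_stats wkr_stats[MAX_LCORES];

/* Called only by the worker running on core_id. Each worker writes its
 * own slot, so plain additions suffice and no __sync barrier is needed. */
static inline void
worker_account(unsigned int core_id, uint64_t deqd, uint64_t enqd)
{
	assert(core_id < MAX_LCORES);
	assert(enqd <= deqd);
	wkr_stats[core_id].deq_pkts += deqd;
	wkr_stats[core_id].enq_pkts += enqd;
	wkr_stats[core_id].enq_failed_pkts += deqd - enqd;
}

/* Sum all per-core slots. When insight_worker is non-zero, also print
 * the slots of cores that handled at least one packet. */
static inline struct worker_stats
worker_stats_total(int insight_worker)
{
	struct worker_stats total = {0};
	unsigned int i;

	for (i = 0; i < MAX_LCORES; i++) {
		if (insight_worker && wkr_stats[i].deq_pkts != 0)
			printf("core [%u]: deqd %" PRIu64 ", enqd %" PRIu64
			       ", enq failed %" PRIu64 "\n", i,
			       wkr_stats[i].deq_pkts, wkr_stats[i].enq_pkts,
			       wkr_stats[i].enq_failed_pkts);
		total.deq_pkts += wkr_stats[i].deq_pkts;
		total.enq_pkts += wkr_stats[i].enq_pkts;
		total.enq_failed_pkts += wkr_stats[i].enq_failed_pkts;
	}
	return total;
}
```

The cost of the barrier-free hot path is that the aggregate is only
computed when the stats are printed, by walking every slot once.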
Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>