4b42e90ef0
This patch fixes rte_memcpy performance on Haswell and Broadwell for vhost when the copy size is larger than 256 bytes.

For large copies, such as those for 1024- or 1518-byte packets, rte_memcpy is observed to suffer a high ratio of store-buffer-full events, which stall the pipeline in scenarios like vhost enqueue. Adjusting the instruction layout alleviates this. Note that the issue may not be visible in micro tests.

How to reproduce:

Run PHY-VM-PHY using vhost/virtio, or vhost/virtio loopback, with large packets such as 1024 or 1518 bytes. If PHY-VM-PHY is used, make sure the packet generation rate is not the bottleneck.

Test report: http://dpdk.org/ml/archives/dev/2016-May/039716.html

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>