Simple functional test for rte_smp_mb() implementations.
Also when executed on a single lcore could be used as rough
estimation how many cycles particular implementation of rte_smp_mb()
might take.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>