Instead of polling nleft[i] (without appropriate memory barriers!) and
using sleep() to detect the exit, just call pthread_join() on all
threads. Also replace the mutex guarding the increments with an atomic
fetch_add. This should reduce the runtime of this test on SMP systems.
Finally, remove all the debug printfs unless DEBUG_OUTPUT is set in
the environment.
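
For illustration only, a minimal sketch of the pattern described above,
not the actual test code: NTHREADS, NITERS, counter, worker() and
run_workers() are hypothetical names, and C11 <stdatomic.h> is assumed
for the atomic increment.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define	NTHREADS	8	/* hypothetical thread count */
#define	NITERS		100000	/* hypothetical increments per thread */

static atomic_int counter;	/* shared counter, no mutex needed */
static int debug_output;	/* set when DEBUG_OUTPUT is in the environment */

static void *
worker(void *arg)
{
	int id = (int)(intptr_t)arg;

	/* Atomic increment replaces lock/increment/unlock. */
	for (int i = 0; i < NITERS; i++)
		atomic_fetch_add(&counter, 1);
	if (debug_output)
		printf("thread %d done\n", id);
	return (NULL);
}

/* Start all workers, wait for them, and return the final count. */
int
run_workers(void)
{
	pthread_t threads[NTHREADS];
	int started = 0, failed = 0;

	debug_output = getenv("DEBUG_OUTPUT") != NULL;
	atomic_store(&counter, 0);

	for (int i = 0; i < NTHREADS; i++) {
		if (pthread_create(&threads[i], NULL, worker,
		    (void *)(intptr_t)i) != 0) {
			failed = 1;
			break;
		}
		started++;
	}
	/*
	 * pthread_join() blocks until each thread has exited, so no
	 * polling or sleep() is needed, and the joined thread's writes
	 * are visible to the caller afterwards.
	 */
	for (int i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	return (failed ? -1 : atomic_load(&counter));
}
```

If every thread starts, run_workers() should return NTHREADS * NITERS;
per-thread output appears only when DEBUG_OUTPUT is set.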
Test Plan: still fails sometimes on qemu (but maybe less often?)
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D29390