3554cddbfa
I benchmarked this by simultaneously extracting 4 large tarballs (basically world images) on a 4-processor AMD64 system, in a malloc-backed md. With this patch, system time was reduced by 43%, and wall clock time by 33%. Submitted by: jeff MFC after: 1 week