cem 36a09a6cd5 stat(1): cache id->name resolution
When invoked on a large list of files, it is most common for a small number of
uids/gids to own most of the results.

Like ls(1), use pwcache(3) to avoid repeatedly looking up the same IDs.

Example microbenchmark and non-scientific results:

$ time (find /usr/src -type f -print0 | xargs -0 stat >/dev/null)

BEFORE:
3.62s user 5.23s system 102% cpu 8.655 total
3.47s user 5.38s system 102% cpu 8.647 total

AFTER:
1.23s user 1.81s system 108% cpu 2.810 total
1.43s user 1.54s system 107% cpu 2.754 total

Does this microbenchmark have any real-world significance?  Until a use case
is demonstrated otherwise, I doubt it.  Ordinarily I would be resistant to
optimizing pointless microbenchmarks in base utilities (e.g., recent totally
gratuitous changes to yes(1)).  However, the pwcache(3) APIs actually
simplify stat(1) logic ever so slightly compared to the raw APIs they wrap,
so I think this is at worst harmless.

PR:		230491
Reported by:	Thomas Hurst <tom AT hur.st>
Discussed with:	gad@
2018-08-11 02:56:43 +00:00
..