Mateusz Guzik 20ca271fdd amd64: depessimize bcmp for small buffers
Adapt assembly generated by clang for memcmp and use it for <= 64 sized
compares (which are the vast majority).

Sample result of doing stats on Broadwell (% of samples):
before: 4.0 kernel     bcmp                 cache_lookup
after : 0.7 kernel     bcmp                 cache_lookup

The routine is most definitely still not optimal. Anyone interested in
spending time improving it is welcome to take over.

Reviewed by:	kib
2018-05-09 15:16:25 +00:00
..
2018-03-20 17:58:51 +00:00
2018-03-20 17:58:51 +00:00