Mateusz Guzik 20ca271fdd amd64: depessimize bcmp for small buffers
Adapt the assembly clang generates for memcmp and use it for compares of
64 bytes or fewer (which are the vast majority).
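
For illustration only, the small-buffer idea can be sketched in C. This is
not the assembly from this commit; it is a minimal sketch that assumes the
usual trick of covering the tail with a final load that overlaps the
previous one. The names sketch_bcmp, load64, load32 and load16 are
hypothetical, and sizes above 64 simply defer to memcmp here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: unaligned loads expressed with memcpy. */
static uint64_t load64(const unsigned char *p) { uint64_t v; memcpy(&v, p, sizeof(v)); return v; }
static uint32_t load32(const unsigned char *p) { uint32_t v; memcpy(&v, p, sizeof(v)); return v; }
static uint16_t load16(const unsigned char *p) { uint16_t v; memcpy(&v, p, sizeof(v)); return v; }

/*
 * bcmp semantics: return 0 if the buffers are equal, nonzero otherwise.
 * No ordering information is produced.
 */
int
sketch_bcmp(const void *a, const void *b, size_t len)
{
	const unsigned char *p = a, *q = b;
	size_t i;

	/* Large buffers take the generic path (an assumption of this sketch). */
	if (len > 64)
		return (memcmp(p, q, len) != 0);

	if (len >= 8) {
		/* Whole 8-byte words first, stopping before the last word. */
		for (i = 0; i + 8 < len; i += 8)
			if (load64(p + i) != load64(q + i))
				return (1);
		/* Final word ends exactly at len and may overlap the previous one. */
		return (load64(p + len - 8) != load64(q + len - 8));
	}
	if (len >= 4)
		return ((load32(p) != load32(q)) |
		    (load32(p + len - 4) != load32(q + len - 4)));
	if (len >= 2)
		return ((load16(p) != load16(q)) |
		    (load16(p + len - 2) != load16(q + len - 2)));
	if (len == 1)
		return (p[0] != q[0]);
	return (0);
}
```

The overlapping final load avoids a byte-by-byte tail loop, so every size up
to 64 is handled with a handful of word compares.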

Sample result of doing stats on Broadwell (% of samples):
before: 4.0 kernel     bcmp                 cache_lookup
after : 0.7 kernel     bcmp                 cache_lookup

The routine is most definitely still not optimal. Anyone interested in
spending time improving it is welcome to take over.

Reviewed by:	kib
2018-05-09 15:16:25 +00:00