Xin LI dabae22639 Two optimizations to MI strlen(3) inspired by David S. Miller's
blog posting [1].

 - Use word-sized test for unaligned pointer before working
   the hard way.

   Memory page boundary is always integral multiple of a word
   alignment boundary.  Therefore, if we can access memory
   referenced by pointer p, then (p & ~word mask) must be also
   accessible.

 - Better utilization of multi-issue processor's ability of
   concurrency.

   The previous implementation utilized a formular that must be
   executed sequentially.  However, the ~, & and - operations can
   actually be caculated at the same time when the operand were
   different and unrelated.

   The original Hacker's Delight formular also offered consistent
   performance regardless whether the input would contain
   characters with their highest-bit set, as it catches real
   nul characters only.

These two optimizations has shown further improvements over the
previous implementation on microbenchmarks on i386 and amd64 CPU
including Pentium 4, Core Duo 2 and i7.

[1] http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2010/03/08#strlen_1

MFC after:	1 month
2010-03-12 21:14:56 +00:00
..
2009-06-23 14:11:41 +00:00
2007-06-03 17:20:27 +00:00
2009-11-16 14:33:31 +00:00
2010-01-31 21:47:39 +00:00
2010-02-20 08:19:19 +00:00
2009-11-25 04:45:45 +00:00