It is O(n) in the length of the haystack (big) string, and has special
cases for short needle (little) strings, of one to four bytes, to avoid
excessive overhead.
There are a small set of nearly trivial cases where the startup overhead
of the musl implementation makes it slightly slower -- for example, a 31
byte needle that matches the beginning of the haystack. It's faster for
non-trivial cases, and significantly so for inputs that trigger worst-
case behaviour of the previous implementation. As an example, in my
tests a 16K needle that matches the end of a 64K haystack is nearly
2000x faster with this implementation.
Reviewed by: bapt (earlier), ed (earlier)
Obtained from: musl (snapshot at commit c718f9fc)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2601