87aa297030
implementations inspired by the ones in DragonFly. Unlike the DragonFly versions, these have a small data cache footprint, and my tests show that they're never slower than the old code except when the charset or the span is 0 or 1 characters long. This implementation is generally faster than DragonFly's until either the charset or the span reaches roughly 32 to 64 characters.
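
For illustration only, here is a minimal sketch of one way a span-style scan can keep its data cache footprint small: a one-bit-per-byte-value membership table (32 bytes when CHAR_BIT is 8) instead of a one-byte-per-value table. The message above does not say which technique the committed code uses, so the bitmap approach, the function name span_bitmap, and the constant WORD_BITS are all assumptions made for this example, not the committed implementation.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the committed code): returns the length of the
 * initial run of bytes in s that all appear in charset.
 */
size_t
span_bitmap(const char *s, const char *charset)
{
	/* Bits per table word; an assumption-free way to size the bitmap. */
	enum { WORD_BITS = sizeof(unsigned long) * CHAR_BIT };
	/* One bit per possible byte value, rounded up to whole words. */
	unsigned long tbl[(UCHAR_MAX + WORD_BITS) / WORD_BITS] = { 0 };
	const unsigned char *p;
	size_t n = 0;

	/* Mark every byte value that occurs in charset. */
	for (p = (const unsigned char *)charset; *p != '\0'; p++)
		tbl[*p / WORD_BITS] |= 1UL << (*p % WORD_BITS);

	/* Count leading bytes of s whose bit is set in the table. */
	for (p = (const unsigned char *)s; *p != '\0'; p++, n++)
		if ((tbl[*p / WORD_BITS] & (1UL << (*p % WORD_BITS))) == 0)
			break;

	return (n);
}
```

Building the table costs a fixed amount of work per call, which is consistent with the observation above that very short charsets or spans (0 or 1 characters) are the cases where a table-based approach can lose to a simple nested loop.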