freebsd-dev/usr.bin/sort
Conrad Meyer 9f7e5bdad1 sort(1): Fix two wchar-related bugs in radixsort
Sort(1)'s radixsort implementation was broken for multibyte LC_CTYPEs in at
least two ways:

  * In actual radix sort, it would only bucket the least significant
    byte from each wchar, ignoring the 24 most-significant bits of each
    unicode character.

  * In degenerate cases / "fast paths," it would fall back to another
    sorting algorithm (default: mergesort) with a bogus comparator
    offset.  The string comparison functions in sort(1) take an offset
    in units of the operating character size.  However, radixsort was
    passing an offset in units of bytes.  The byte offset must be
    divided by sizeof(wchar_t).

This revision addresses both discovered issues.

Some example testcases:

  $ (echo 耳 ; echo 脳 ; echo 耳) | \
  LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

  $ (echo 耳 ; echo 脳 ; echo 耳) | \
  LC_CTYPE=C LC_COLLATE=C LANG=C           sort --radixsort --debug

  $ (for i in $(jot 34); do echo 耳耳耳耳耳; echo 耳耳耳耳脳; echo 耳耳耳耳脴; done) | \
  LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

PR:		247494
Reported by:	knu
MFC after:	I do not intend to, but parties interested in stable might want to
2020-06-23 16:43:48 +00:00
..
nls
tests
bwstring.c
bwstring.h
coll.c
coll.h
file.c
file.h
Makefile
Makefile.depend Update Makefile.depend files 2019-12-11 17:37:53 +00:00
Makefile.depend.options
mem.c
mem.h
radixsort.c sort(1): Fix two wchar-related bugs in radixsort 2020-06-23 16:43:48 +00:00
radixsort.h
sort.1.in
sort.c
sort.h
vsort.c
vsort.h