freebsd-dev

Conrad Meyer 9f7e5bdad1 sort(1): Fix two wchar-related bugs in radixsort	2020-06-23 16:43:48 +00:00

Sort(1)'s radixsort implementation was broken for multibyte LC_CTYPEs in at
least two ways:

* In actual radix sort, it would only bucket the least significant byte from
  each wchar, ignoring the 24 most-significant bits of each unicode character.

* In degenerate cases / "fast paths," it would fall back to another sorting
  algorithm (default: mergesort) with a bogus comparator offset. The string
  comparison functions in sort(1) take an offset in units of the operating
  character size. However, radixsort was passing an offset in units of bytes.
  The byte offset must be divided by sizeof(wchar_t).

This revision addresses both discovered issues.

Some example testcases:

    $ (echo 耳 ; echo 脳 ; echo 耳) | \
      LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

    $ (echo 耳 ; echo 脳 ; echo 耳) | \
      LC_CTYPE=C LC_COLLATE=C LANG=C sort --radixsort --debug

    $ (for i in $(jot 34); do echo 耳耳耳耳耳; echo 耳耳耳耳脳; echo 耳耳耳耳脴; done) | \
      LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

PR:		247494
Reported by:	knu
MFC after:	I do not intend to, but parties interested in stable might want to
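
The two bugs described in the commit message above can be illustrated with a
minimal sketch in C. This is not the code from radixsort.c: the helper names
wchar_byte() and byte_to_char_offset() are hypothetical and exist only for this
illustration. The first helper shows why bucketing on the least significant
byte alone cannot separate 耳 (U+8033) from 脳 (U+8133), since both end in
0x33. The second shows the unit conversion the message calls for before a byte
offset is handed to a comparator that counts in characters.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: return byte number `idx` of a wide character,
 * where idx 0 is the least significant byte.  A radix sort over wide
 * characters has to consider every byte of the character, not only
 * idx 0.
 */
static unsigned int
wchar_byte(wchar_t wc, size_t idx)
{
	assert(idx < sizeof(wchar_t));
	return ((unsigned int)(((unsigned long)wc >> (8 * idx)) & 0xff));
}

/*
 * Hypothetical helper: convert an offset counted in bytes into an
 * offset counted in wide characters, which is the unit the commit
 * message says the string comparison functions expect.
 */
static size_t
byte_to_char_offset(size_t byte_off)
{
	/* A byte offset into a wchar_t string should be character-aligned. */
	assert(byte_off % sizeof(wchar_t) == 0);
	return (byte_off / sizeof(wchar_t));
}
```

With these helpers, wchar_byte(0x8033, 0) and wchar_byte(0x8133, 0) are both
0x33, so a pass that only looks at byte 0 puts the two characters in the same
bucket. They differ at idx 1 (0x80 versus 0x81). Likewise, a position reached
after three full wide characters is byte offset 3 * sizeof(wchar_t), which
byte_to_char_offset() maps back to 3.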
nls
tests	sort(1): Add bits to allow easy checking against NetBSD tests	2018-06-20 03:10:49 +00:00
bwstring.c	various: general adoption of SPDX licensing ID tags.	2017-11-27 15:37:16 +00:00
bwstring.h	various: general adoption of SPDX licensing ID tags.	2017-11-27 15:37:16 +00:00
coll.c	sort(1): Memoize MD5 computation to reduce repeated computation	2019-04-13 04:42:17 +00:00
coll.h	sort(1): Memoize MD5 computation to reduce repeated computation	2019-04-13 04:42:17 +00:00
file.c	sort(1): Fix -m when only implicit stdin is used for input	2018-06-20 03:31:19 +00:00
file.h	sort(1): Fix -m when only implicit stdin is used for input	2018-06-20 03:31:19 +00:00
Makefile	Don't use absolute path to sed when building usr.bin/join	2018-08-23 18:18:43 +00:00
Makefile.depend	Update Makefile.depend files	2019-12-11 17:37:53 +00:00
Makefile.depend.options	Add Makefile.depend.options	2019-12-11 17:37:37 +00:00
mem.c	various: general adoption of SPDX licensing ID tags.	2017-11-27 15:37:16 +00:00
mem.h	various: general adoption of SPDX licensing ID tags.	2017-11-27 15:37:16 +00:00
radixsort.c	sort(1): Fix two wchar-related bugs in radixsort	2020-06-23 16:43:48 +00:00
radixsort.h	various: general adoption of SPDX licensing ID tags.	2017-11-27 15:37:16 +00:00
sort.1.in	Adjust history, info source from v1's manuals	2019-09-04 13:44:46 +00:00
sort.c	sort(1): Memoize MD5 computation to reduce repeated computation	2019-04-13 04:42:17 +00:00
sort.h	various: general adoption of SPDX licensing ID tags.	2017-11-27 15:37:16 +00:00
vsort.c	various: general adoption of SPDX licensing ID tags.	2017-11-27 15:37:16 +00:00
vsort.h	various: general adoption of SPDX licensing ID tags.	2017-11-27 15:37:16 +00:00