Commit Graph

132 Commits

Author SHA1 Message Date
Baptiste Daroussin
226e41467e sort: deindent file_reader_free and cleanup its usage 2022-10-13 10:51:17 +02:00
Baptiste Daroussin
ffd41d39c6 sort: simplify file_reader_clean
Deindent the function, remove useless tests:
 - free already test if argument is NULL
 - closefile already test if the input is stdin or null
2022-10-13 10:42:23 +02:00
Baptiste Daroussin
f9d9a7cc4f sort: deindent closefile 2022-10-13 10:38:12 +02:00
Baptiste Daroussin
48a53cc484 sort: use asprintf(3) instead of malloc + snprintf(3) 2022-10-13 10:34:57 +02:00
Baptiste Daroussin
958b0d4642 sort: deindent openfile 2022-10-13 10:31:34 +02:00
Baptiste Daroussin
f079ef8aa4 sort: simplify the code to handle -z flag 2022-10-13 10:24:11 +02:00
Baptiste Daroussin
4d4fcf619e sort: cleanup now unused structutre and prototypes 2022-10-13 10:24:11 +02:00
Baptiste Daroussin
8b9071360a sort: unify the code to read from FILE *
Previously the code to read from a local file or stdin was sperarated
After the change to remove the home made line reader used for stdin
(replaced by getdelim) it apprears that the rest of the code which is
used to read from any FILE * but stdin can benefit from the exact same
change.
2022-10-13 10:24:11 +02:00
Baptiste Daroussin
e8815fb30b sort: remove unused function 2022-10-13 10:24:11 +02:00
Baptiste Daroussin
f02c783757 sort: use memset to initialize structure when possible 2022-10-13 10:24:11 +02:00
Baptiste Daroussin
3f9e5e59bd sort: use mkstemp(3) instead of reinventing it
MFC After:	1 week
2022-10-12 18:01:57 +02:00
Baptiste Daroussin
b58094c0d9 sort: replace home made line reader by getdelim(3)
The previous code had bug when reading lines with an unexpected
encoding, returning without the full line being captured.
This result in sort complaining with "sort: Illegal byte sequence"

Using getdelim(3) instead of the home made code, fixes the situation.

PR:		241679
Reported by:	Ronald F. Guilmette <rfg-freebsd@tristatelogic.com>
MFC After:	1 week
Reviewed by:	markj, imp
Differential Revision:	https://reviews.freebsd.org/D36948
2022-10-12 17:37:33 +02:00
Baptiste Daroussin
ed990a7a2f sort: remove NLS support
NLS support for sort(1) is:
1/ incomplete: many error string are not using nls
2/ only covers hu_HU.ISO8859-2
2022-10-12 16:24:29 +02:00
Baptiste Daroussin
ecc3c29167 sort: replace malloc+memset with calloc 2022-10-12 16:12:04 +02:00
Baptiste Daroussin
a312f3e742 sort: add wrapper around calloc 2022-10-12 16:12:04 +02:00
Doug Rabson
0c19c4db74 Move sort to runtime
Allows pkg bootstrap without having to install FreeBSD-utilities
2022-07-29 11:27:25 +01:00
Mark Johnston
8d8b9b560a sort: Fix message catalogue usage
- Check that catopen() succeeded before calling catclose().  musl will
  crash in the latter if the catalogue descriptor is -1.
- Keep the message catalogue open for most of sort(1)'s actual
  operation.
- Don't use catgets(3) to print error messages if catopen(3) had failed.

Reviewed by:	arichardson, emaste
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34081
2022-01-28 16:52:29 -05:00
Mark Johnston
e9bfb50d5e sort: Fix random sort
bwsrawdata() is supposed to return the string buffer.

PR:		259451
Reported by:	sigsys@gmail.com
Fixes:		d053fb22f6 ("usr.bin/sort: Avoid UBSan errors")
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-10-29 14:29:50 -04:00
Alex Richardson
d053fb22f6 usr.bin/sort: Avoid UBSan errors
UBSan complains about out-of-bounds accesses for zero-length arrays. To
avoid this we can use flexible array members. However, the C standard does
not allow for structures that only contain flexible array members, so we
move the length parameters into that structure too.

Split out from D28233.

Reviewed By:	markj
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D31009
2021-07-06 10:51:05 +01:00
Cyril Zhang
68d3790ba0 sort: Change default algorithm to mergesort
This results in a significant improvement in the runtime of sort(1) when
radix sort cannot be used.  This comes at the expense of increased
memory usage, but this is small relative to sort's overall memory usage.

PR:		255551
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30319
2021-06-17 13:53:03 -04:00
Mark Johnston
186ba88a7c sort: Hook NetBSD tests up to the build
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-05-13 09:34:01 -04:00
Cyril Zhang
71ec05a212 sort: Cache value of MB_CUR_MAX
Every usage of MB_CUR_MAX results in a call to __mb_cur_max.  This is
inefficient and redundant.  Caching the value of MB_CUR_MAX in a global
variable removes these calls and speeds up the runtime of sort.  For
numeric sorting, runtime is almost halved in some tests.

PR:		255551
PR:		255840
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30170
2021-05-13 09:33:19 -04:00
Cyril Zhang
fa43162c63 sort: Stop "fixing" obsolete key syntax after -- flag
PR:		255798
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30234
2021-05-13 09:33:19 -04:00
Alex Richardson
266b51ac6e Fix -Wpointer-sign warnings in bwstring.c 2020-09-10 15:37:19 +00:00
Gordon Bergling
2d955e4199 sort(1): Remove duplicate option check
Reviewed by:	lwhsu, emaste
Approved by:	emaste
Obtained from:	DragonFlyBSD
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23892
2020-09-08 15:01:49 +00:00
Conrad Meyer
9f7e5bdad1 sort(1): Fix two wchar-related bugs in radixsort
Sort(1)'s radixsort implementation was broken for multibyte LC_CTYPEs in at
least two ways:

  * In actual radix sort, it would only bucket the least significant
    byte from each wchar, ignoring the 24 most-significant bits of each
    unicode character.

  * In degenerate cases / "fast paths," it would fall back to another
    sorting algorithm (default: mergesort) with a bogus comparator
    offset.  The string comparison functions in sort(1) take an offset
    in units of the operating character size.  However, radixsort was
    passing an offset in units of bytes.  The byte offset must be
    divided by sizeof(wchar_t).

This revision addresses both discovered issues.

Some example testcases:

  $ (echo 耳 ; echo 脳 ; echo 耳) | \
  LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

  $ (echo 耳 ; echo 脳 ; echo 耳) | \
  LC_CTYPE=C LC_COLLATE=C LANG=C           sort --radixsort --debug

  $ (for i in $(jot 34); do echo 耳耳耳耳耳; echo 耳耳耳耳脳; echo 耳耳耳耳脴; done) | \
  LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

PR:		247494
Reported by:	knu
MFC after:	I do not intend to, but parties interested in stable might want to
2020-06-23 16:43:48 +00:00
Simon J. Gerraty
2c9a9dfc18 Update Makefile.depend files
Update a bunch of Makefile.depend files as
a result of adding Makefile.depend.options files

Reviewed by:	 bdrewery
MFC after:	1 week
Sponsored by:   Juniper Networks
Differential Revision:  https://reviews.freebsd.org/D22494
2019-12-11 17:37:53 +00:00
Simon J. Gerraty
5ab1c5846f Add Makefile.depend.options
Leaf directories that have dependencies impacted
by options need a Makefile.depend.options file
to avoid churn in Makefile.depend

DIRDEPS for cases such as OPENSSL, TCP_WRAPPERS etc
can be set in local.dirdeps-options.mk
which can add to those set in Makefile.depend.options

See share/mk/dirdeps-options.mk

Reviewed by:	 bdrewery
MFC after:	1 week
Sponsored by:   Juniper Networks
Differential Revision:  https://reviews.freebsd.org/D22469
2019-12-11 17:37:37 +00:00
Sevan Janiyan
08509077b3 Adjust history, info source from v1's manuals
https://www.bell-labs.com/usr/dmr/www/1stEdman.html

MFC after:	5 days
2019-09-04 13:44:46 +00:00
Conrad Meyer
f20b149b45 sort(1): Memoize MD5 computation to reduce repeated computation
Experimentally, reduces sort -R time of a 148160 line corpus from about
3.15s to about 0.93s on this particular system.

There's probably room for improvement using some digest other than md5, but
I don't want to look at sort(1) anymore.  Some discussion of other possible
improvements in the Test Plan section of the Differential.

PR:		230792
Reviewed by:	jhb (earlier version)
Differential Revision:	https://reviews.freebsd.org/D19885
2019-04-13 04:42:17 +00:00
Conrad Meyer
7a590a370a sort(1): Simplify and bound random seeding
Bound input file processing length to avoid the issue reported in [1].  For
simplicity, only allow regular file and character device inputs.  For
character devices, only allow /dev/random (and /dev/urandom symblink).

32 bytes of random is perfectly sufficient to seed MD5; we don't need any
more.  Users that want to use large files as seeds are encouraged to truncate
those files down to an appropriate input file via tools like sha256(1).

(This does not change the sort algorithm of sort -R.)

[1]: https://lists.freebsd.org/pipermail/freebsd-hackers/2018-August/053152.html

PR:		230792
Reported by:	Ali Abdallah <aliovx AT gmail.com>
Relnotes:	yes
2019-04-11 05:08:49 +00:00
Conrad Meyer
74504eefa1 sort(1): Whitespace and style cleanup
No functional change.

Sponsored by:	Dell EMC Isilon
2019-04-11 00:39:06 +00:00
Conrad Meyer
fff4eaebbf sort(1): randomcoll: Skip the memory allocation entirely
There's no reason to order based on strcmp of ASCII digests instead of
memcmp of the raw digests.

While here, remove collision fallback.  If you collide two MD5s, they're
probably the same string anyway.  If robustness against MD5 collisions is
desired, maybe we shouldn't use MD5.

None of the behavior of sort -R is specified by POSIX, so we're free to
implement this however we like.  E.g., using a 128-bit counter and block cipher
to generate unique indices for each line of input.

PR:		230792 (2/many)
Relnotes:	This will change the sort order for a given dataset with a
		given seed.  Other similarly breaking changes are planned.
Sponsored by:	Dell EMC Isilon
2019-04-04 23:32:27 +00:00
Conrad Meyer
e667e2a480 sort(1): randomcoll: Don't sort on ENOMEM
PR:		230792 (1/many)
Sponsored by:	Dell EMC Isilon
2019-04-04 20:27:13 +00:00
Alex Richardson
101db63b42 Don't use absolute path to sed when building usr.bin/join
This is required to build sort on Linux hosts since sed is in /bin there.

Approved By:	jhb (mentor)
2018-08-23 18:18:43 +00:00
Kyle Evans
7137597e15 sort(1): Fix -m when only implicit stdin is used for input
Observe:

printf "a\nb\nc\n" > /tmp/foo
# Next command results in no output
cat /tmp/foo | sort -m
# Next command results in proper output
cat /tmp/foo | sort -m -
# Also works:
sort -m /tmp/foo

Some const'ification was done to simplify the actual solution of adding "-"
explicitly to the file list if we didn't have any file arguments left over.

PR:		190099
MFC after:	1 week
2018-06-20 03:31:19 +00:00
Kyle Evans
36180cd53d sort(1): Add bits to allow easy checking against NetBSD tests
I'm looking at sort(1) failures, for better or worse.
2018-06-20 03:10:49 +00:00
Mark Johnston
7f180c0f80 Fix the WITH_SORT_THREADS build.
PR:		201664
MFC after:	1 week
2018-02-07 20:36:37 +00:00
Pedro F. Giffuni
1de7b4b805 various: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:37:16 +00:00
Bryan Drewery
ea825d0274 DIRDEPS_BUILD: Update dependencies.
Sponsored by:	Dell EMC Isilon
2017-10-31 00:07:04 +00:00
Pedro F. Giffuni
759a9a9d24 sort(1): Remove unneeded initializations.
Found by:	Clang static analyzer
2017-02-17 19:53:20 +00:00
Pedro F. Giffuni
692cd1a3b2 sort - Don't live-loop threads.
Worker threads now use a pthread_cond_t to wait for work instead of
burning the cpu up.

Obtained from:	DragonflyBSD (07774aea0ccf64a48fcfad8899e3bf7c8f18277a)
MFC after:	2 weeks
2017-01-23 15:39:51 +00:00
Marius Strobl
ed7aec1e45 - Use correct offsets into the keys set array. As the elements of this
zero-length array are dynamically sized at run-time based on the use
  of hints, compilers can't be expected to figure out these offsets on
  their own. [1]
- Fix incorrect comparison in cmp_nans(). [2]

PR:		204571 [1], 202301 [2]
Submitted by:	David Binderman [2]
MFC after:	3 days
2016-12-28 17:13:03 +00:00
Xin LI
3611de44ef pages and psize are always assigned, so there is no need to initialize
them as zero.

MFC after:	2 weeks
2016-11-28 06:38:41 +00:00
Xin LI
c514c3ed4f Eliminate variables that are computed, assigned but never
used.

MFC after:	2 weeks
2016-11-28 06:36:10 +00:00
Xin LI
665d2db378 Fix an obvious typo.
MFC after:	2 weeks
2016-11-28 06:32:05 +00:00
Gabor Kovesdan
a6be469014 - Fix typo
PR:		211245
Submitted by:	Christoph Schonweiler <public2016@hauptsignal.at>
MFC after:	5 days
2016-09-08 14:50:23 +00:00
Pedro F. Giffuni
80c7cc1c8f Cleanup unnecessary semicolons from utilities we all love. 2016-04-15 22:31:22 +00:00
Baptiste Daroussin
902b9f79f7 Fix some mdoc(7) issues
Obtained from:	DragonflyBSD
2015-10-24 13:43:10 +00:00
Gabor Kovesdan
a7bc18929d -C and -c allow at most one input file. Ensure this is the case when the
input files are specified through --files0-from.

Submitted by:	tim@OpenBSD
Obtained from:	OpenBSD
MFC after:	1 week
2015-10-22 10:57:15 +00:00