Commit Graph

107 Commits

Author SHA1 Message Date
Conrad Meyer
9f7e5bdad1 sort(1): Fix two wchar-related bugs in radixsort
Sort(1)'s radixsort implementation was broken for multibyte LC_CTYPEs in at
least two ways:

  * In actual radix sort, it would only bucket the least significant
    byte from each wchar, ignoring the 24 most-significant bits of each
    unicode character.

  * In degenerate cases / "fast paths," it would fall back to another
    sorting algorithm (default: mergesort) with a bogus comparator
    offset.  The string comparison functions in sort(1) take an offset
    in units of the operating character size.  However, radixsort was
    passing an offset in units of bytes.  The byte offset must be
    divided by sizeof(wchar_t).

This revision addresses both discovered issues.

Some example testcases:

  $ (echo 耳 ; echo 脳 ; echo 耳) | \
  LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

  $ (echo 耳 ; echo 脳 ; echo 耳) | \
  LC_CTYPE=C LC_COLLATE=C LANG=C           sort --radixsort --debug

  $ (for i in $(jot 34); do echo 耳耳耳耳耳; echo 耳耳耳耳脳; echo 耳耳耳耳脴; done) | \
  LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug

PR:		247494
Reported by:	knu
MFC after:	I do not intend to, but parties interested in stable might want to
2020-06-23 16:43:48 +00:00
Simon J. Gerraty
2c9a9dfc18 Update Makefile.depend files
Update a bunch of Makefile.depend files as
a result of adding Makefile.depend.options files

Reviewed by:	 bdrewery
MFC after:	1 week
Sponsored by:   Juniper Networks
Differential Revision:  https://reviews.freebsd.org/D22494
2019-12-11 17:37:53 +00:00
Simon J. Gerraty
5ab1c5846f Add Makefile.depend.options
Leaf directories that have dependencies impacted
by options need a Makefile.depend.options file
to avoid churn in Makefile.depend

DIRDEPS for cases such as OPENSSL, TCP_WRAPPERS etc
can be set in local.dirdeps-options.mk
which can add to those set in Makefile.depend.options

See share/mk/dirdeps-options.mk

Reviewed by:	 bdrewery
MFC after:	1 week
Sponsored by:   Juniper Networks
Differential Revision:  https://reviews.freebsd.org/D22469
2019-12-11 17:37:37 +00:00
Sevan Janiyan
08509077b3 Adjust history, info source from v1's manuals
https://www.bell-labs.com/usr/dmr/www/1stEdman.html

MFC after:	5 days
2019-09-04 13:44:46 +00:00
Conrad Meyer
f20b149b45 sort(1): Memoize MD5 computation to reduce repeated computation
Experimentally, reduces sort -R time of a 148160 line corpus from about
3.15s to about 0.93s on this particular system.

There's probably room for improvement using some digest other than md5, but
I don't want to look at sort(1) anymore.  Some discussion of other possible
improvements in the Test Plan section of the Differential.

PR:		230792
Reviewed by:	jhb (earlier version)
Differential Revision:	https://reviews.freebsd.org/D19885
2019-04-13 04:42:17 +00:00
Conrad Meyer
7a590a370a sort(1): Simplify and bound random seeding
Bound input file processing length to avoid the issue reported in [1].  For
simplicity, only allow regular file and character device inputs.  For
character devices, only allow /dev/random (and /dev/urandom symblink).

32 bytes of random is perfectly sufficient to seed MD5; we don't need any
more.  Users that want to use large files as seeds are encouraged to truncate
those files down to an appropriate input file via tools like sha256(1).

(This does not change the sort algorithm of sort -R.)

[1]: https://lists.freebsd.org/pipermail/freebsd-hackers/2018-August/053152.html

PR:		230792
Reported by:	Ali Abdallah <aliovx AT gmail.com>
Relnotes:	yes
2019-04-11 05:08:49 +00:00
Conrad Meyer
74504eefa1 sort(1): Whitespace and style cleanup
No functional change.

Sponsored by:	Dell EMC Isilon
2019-04-11 00:39:06 +00:00
Conrad Meyer
fff4eaebbf sort(1): randomcoll: Skip the memory allocation entirely
There's no reason to order based on strcmp of ASCII digests instead of
memcmp of the raw digests.

While here, remove collision fallback.  If you collide two MD5s, they're
probably the same string anyway.  If robustness against MD5 collisions is
desired, maybe we shouldn't use MD5.

None of the behavior of sort -R is specified by POSIX, so we're free to
implement this however we like.  E.g., using a 128-bit counter and block cipher
to generate unique indices for each line of input.

PR:		230792 (2/many)
Relnotes:	This will change the sort order for a given dataset with a
		given seed.  Other similarly breaking changes are planned.
Sponsored by:	Dell EMC Isilon
2019-04-04 23:32:27 +00:00
Conrad Meyer
e667e2a480 sort(1): randomcoll: Don't sort on ENOMEM
PR:		230792 (1/many)
Sponsored by:	Dell EMC Isilon
2019-04-04 20:27:13 +00:00
Alex Richardson
101db63b42 Don't use absolute path to sed when building usr.bin/join
This is required to build sort on Linux hosts since sed is in /bin there.

Approved By:	jhb (mentor)
2018-08-23 18:18:43 +00:00
Kyle Evans
7137597e15 sort(1): Fix -m when only implicit stdin is used for input
Observe:

printf "a\nb\nc\n" > /tmp/foo
# Next command results in no output
cat /tmp/foo | sort -m
# Next command results in proper output
cat /tmp/foo | sort -m -
# Also works:
sort -m /tmp/foo

Some const'ification was done to simplify the actual solution of adding "-"
explicitly to the file list if we didn't have any file arguments left over.

PR:		190099
MFC after:	1 week
2018-06-20 03:31:19 +00:00
Kyle Evans
36180cd53d sort(1): Add bits to allow easy checking against NetBSD tests
I'm looking at sort(1) failures, for better or worse.
2018-06-20 03:10:49 +00:00
Mark Johnston
7f180c0f80 Fix the WITH_SORT_THREADS build.
PR:		201664
MFC after:	1 week
2018-02-07 20:36:37 +00:00
Pedro F. Giffuni
1de7b4b805 various: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:37:16 +00:00
Bryan Drewery
ea825d0274 DIRDEPS_BUILD: Update dependencies.
Sponsored by:	Dell EMC Isilon
2017-10-31 00:07:04 +00:00
Pedro F. Giffuni
759a9a9d24 sort(1): Remove unneeded initializations.
Found by:	Clang static analyzer
2017-02-17 19:53:20 +00:00
Pedro F. Giffuni
692cd1a3b2 sort - Don't live-loop threads.
Worker threads now use a pthread_cond_t to wait for work instead of
burning the cpu up.

Obtained from:	DragonflyBSD (07774aea0ccf64a48fcfad8899e3bf7c8f18277a)
MFC after:	2 weeks
2017-01-23 15:39:51 +00:00
Marius Strobl
ed7aec1e45 - Use correct offsets into the keys set array. As the elements of this
zero-length array are dynamically sized at run-time based on the use
  of hints, compilers can't be expected to figure out these offsets on
  their own. [1]
- Fix incorrect comparison in cmp_nans(). [2]

PR:		204571 [1], 202301 [2]
Submitted by:	David Binderman [2]
MFC after:	3 days
2016-12-28 17:13:03 +00:00
Xin LI
3611de44ef pages and psize are always assigned, so there is no need to initialize
them as zero.

MFC after:	2 weeks
2016-11-28 06:38:41 +00:00
Xin LI
c514c3ed4f Eliminate variables that are computed, assigned but never
used.

MFC after:	2 weeks
2016-11-28 06:36:10 +00:00
Xin LI
665d2db378 Fix an obvious typo.
MFC after:	2 weeks
2016-11-28 06:32:05 +00:00
Gabor Kovesdan
a6be469014 - Fix typo
PR:		211245
Submitted by:	Christoph Schonweiler <public2016@hauptsignal.at>
MFC after:	5 days
2016-09-08 14:50:23 +00:00
Pedro F. Giffuni
80c7cc1c8f Cleanup unnecessary semicolons from utilities we all love. 2016-04-15 22:31:22 +00:00
Baptiste Daroussin
902b9f79f7 Fix some mdoc(7) issues
Obtained from:	DragonflyBSD
2015-10-24 13:43:10 +00:00
Gabor Kovesdan
a7bc18929d -C and -c allow at most one input file. Ensure this is the case when the
input files are specified through --files0-from.

Submitted by:	tim@OpenBSD
Obtained from:	OpenBSD
MFC after:	1 week
2015-10-22 10:57:15 +00:00
Simon J. Gerraty
2ef6d5a7b9 new depends 2015-06-16 23:37:19 +00:00
Simon J. Gerraty
ccfb965433 Add META_MODE support.
Off by default, build behaves normally.
WITH_META_MODE we get auto objdir creation, the ability to
start build from anywhere in the tree.

Still need to add real targets under targets/ to build packages.

Differential Revision:       D2796
Reviewed by: brooks imp
2015-06-13 19:20:56 +00:00
Simon J. Gerraty
44d314f704 dirdeps.mk now sets DEP_RELDIR 2015-06-08 23:35:17 +00:00
Simon J. Gerraty
98e0ffaefb Merge sync of head 2015-05-27 01:19:58 +00:00
Pedro F. Giffuni
0f4b9a9057 Remove custom getdelim(3) and fix a small memory leak.
Originally from Andre Smagin.

Obtained from:	OpenBSD
MFC after:	1 week
2015-04-07 01:17:49 +00:00
Pedro F. Giffuni
b904ab99bc sort(1): Cleanups and a small memory leak.
Remove useless check for leading blanks in the month name.  The
code didn't adjust len after stripping blanks so even if a month
*did* start with a blank we'd end up copying garbage at the end.
Also convert a malloc + memcpy to strdup and fix a memory leak in
the wide char version if mbstowcs() fails.
Originally from Andre Smagin.

Obtained from:  OpenBSD (CVS rev. 1.2, 1.3)
MFC after:	1 week
2015-04-07 01:17:29 +00:00
Pedro F. Giffuni
45e151e97d sort: style knits / cleanups.
Minor cleanups that got accidentally reverted.

Obtained from:	OpenBSD
2015-04-06 03:02:20 +00:00
Pedro F. Giffuni
e5f71a07e4 Revert (partial) r281123, r281125:
sort: style knits / cleanups.

Our style guide(9) specifies that in absence of local variables
an empty line must be inserted.

Pointed out by:	eadler
2015-04-06 02:35:55 +00:00
Pedro F. Giffuni
db8026c7bb sort: style knits / cleanups.
Obtained from:	OpenBSD
2015-04-05 23:06:42 +00:00
Pedro F. Giffuni
bd0f80c667 sort: Fix a comment.
Obtained from:	OpenBSD
2015-04-05 22:34:03 +00:00
Pedro F. Giffuni
f79477ebd5 sort: Cleanup small issues with spaces.
Obtained from:	OpenBSD
2015-04-05 22:22:43 +00:00
Joel Dahl
385385fb1c mdoc: use An macro. 2015-01-04 12:42:08 +00:00
Baptiste Daroussin
3e11bd9e2a Convert to usr.bin/ to LIBADD
Reduce overlinking
2014-11-25 14:29:10 +00:00
Simon J. Gerraty
9268022b74 Merge from head@274682 2014-11-19 01:07:58 +00:00
Sean Bruno
e4684c7895 Change LDFLAGS to LDADD in order to allow static builds. This is more
proper way to ensure that the command line compile works the way we intend.

Add explicity DPADD statemens on LIBMD and LIBPTHREAD depending on which
options are used in the build.

Reviewed by:	andrew
MFC after:	2 weeks
2014-11-15 18:03:38 +00:00
Baptiste Daroussin
3e16491d77 Make sure to not skip any argument when converting from deprecated
+POS1, -POS2 to -kPOS1,POS2, so that sort +0n gets translated to sort -k1,1n
as it is expected

PR:		193994
Submitted by:	rodrigo
MFC after:	3 days
2014-10-02 06:29:49 +00:00
Simon J. Gerraty
ee7b0571c2 Merge head from 7/28 2014-08-19 06:50:54 +00:00
Glen Barber
96c566eaf9 Remove trailing '.' from See Also section.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2014-07-30 04:40:50 +00:00
Simon J. Gerraty
fae50821ae Updated dependencies 2014-05-16 14:09:51 +00:00
Simon J. Gerraty
76b28ad6ab Updated dependencies 2014-05-10 05:16:28 +00:00
Simon J. Gerraty
cc3f4b9965 Merge from head 2014-05-08 23:54:15 +00:00
Warner Losh
c6063d0da8 Use src.opts.mk in preference to bsd.own.mk except where we need stuff
from the latter.
2014-05-06 04:22:01 +00:00
Simon J. Gerraty
3b8f084595 Merge head 2014-04-28 07:50:45 +00:00
Bryan Drewery
459d6434a8 Fix spelling error.
MFC after:	3 days
2014-04-25 15:27:19 +00:00
Pedro F. Giffuni
b1a409863f Various style(9) fixes and typos in grep, sort and patch.
MFC after:	3 days
2014-04-21 22:52:18 +00:00