Through testing, the user noted that some Cyrillic characters were not
sorting correctly, and this was confirmed.
After extensive testing and review, the localedef tool was eliminated
as the culprit. The sustitutions were encoded correctly in LC_COLLATE.
The error was mainly in wcscoll where character expansions were
mishandled. The main directive pass routines had to be written to
go back for a new collation value when the "state" variable was set.
Before pointers were being advanced, the second lookup was gettting
applied to the wrong character, etc.
The "eat expansion codes" section on collate.c also had a bug. Later
own, the "state" variable logic was changed to only set if next
code was greater than zero (rather than >= 0).
Some additional cleanups got captured from previous work:
1) The previous commit moved the binary search comment from the
correct location to a wrong location because it's wrong upstream
in Illumos. The comment has little value so I just removed it.
2) Don't check if pointers are null before freeing, this is
redundant as free() handles null pointers.
3) The two binary search trees were standardized wrt initialization
4) On the binary search trees, a negative "high" exits rather than
checking the table count again.
Submitted by: marino
Obtained from: DragonflyBSD
The main "fix" here is properly setting a collate loading error for each
early return. Tweaks include removing unnecessary null checks, adding
assertions (from Illumos) and a couple of variables to reduces code
differences and improve readability. For normal use, there are no
functional changes here.
Obtained from: DragonflyBSD, Illumos
packed LC_COLLATE binary formats. These were generated with the colldef
tool, but the new LC_COLLATE files are going to be generated by the new
localedef tool using CLDR POSIX files as input. The BSD-flavored
version of localedef identifies the format as "BSD 1.0". Any
LC_COLLATE file with a different version will simply not be loaded, and
all LC* categories will get set to "C" (aka "POSIX") locale.
This work is based off of Nexenta's contribution to Illumos.
The integration with xlocale is John Marino's work for Dragonfly.
The following commits will enable localedef tool, disable the colldef
tool, add generated colldef directory, and finally remove colldef from
base.
The only difference with Dragonfly are:
- a few fixes to build with clang
- And identification of the flavor as "BSD 1.0" instead of "Dragonfly 4.4"
Obtained from: Dragonfly
load of _l suffixed versions of various standard library functions that use
the global locale, making them take an explicit locale parameter. Also
adds support for per-thread locales. This work was funded by the FreeBSD
Foundation.
Please test any code you have that uses the C standard locale functions!
Reviewed by: das (gdtoa changes)
Approved by: dim (mentor)
currently cached data. It allows a number of nice things, like: removing
fallback code from single locale loading, remove memory leak when LC_CTYPE
data loaded again and again, efficient cache use, not only for
setlocale(locale1); setlocale(locale1), but for setlocale(locale1);
setlocale("C"); setlocale(locale1) too (i.e. data file loaded only once).
Also, make an internal _getprogname() that is used only inside
libc. For libc, getprogname(3) is a weak symbol in case a
function of the same name is defined in userland.
adding (weak definitions to) stubs for some of the pthread
functions. If the threads library is linked in, the real
pthread functions will pulled in.
Use the following convention for system calls wrapped by the
threads library:
__sys_foo - actual system call
_foo - weak definition to __sys_foo
foo - weak definition to __sys_foo
Change all libc uses of system calls wrapped by the threads
library from foo to _foo. In order to define the prototypes
for _foo(), we introduce namespace.h and un-namespace.h
(suggested by bde). All files that need to reference these
system calls, should include namespace.h before any standard
includes, then include un-namespace.h after the standard
includes and before any local includes. <db.h> is an exception
and shouldn't be included in between namespace.h and
un-namespace.h namespace.h will define foo to _foo, and
un-namespace.h will undefine foo.
Try to eliminate some of the recursive calls to MT-safe
functions in libc/stdio in preparation for adding a mutex
to FILE. We have recursive mutexes, but would like to avoid
using them if possible.
Remove uneeded includes of <errno.h> from a few files.
Add $FreeBSD$ to a few files in order to pass commitprep.
Approved by: -arch
just use _foo() <-- foo(). In the case of a libpthread that doesn't do
call conversion (such as linuxthreads and our upcoming libpthread), this
is adequate. In the case of libc_r, we still need three names, which are
now _thread_sys_foo() <-- _foo() <-- foo().
Convert all internal libc usage of: aio_suspend(), close(), fsync(), msync(),
nanosleep(), open(), fcntl(), read(), and write() to _foo() instead of foo().
Remove all internal libc usage of: creat(), pause(), sleep(), system(),
tcdrain(), wait(), and waitpid().
Make thread cancellation fully POSIX-compliant.
Suggested by: deischen
points. For library functions, the pattern is __sleep() <--
_libc_sleep() <-- sleep(). The arrows represent weak aliases. For
system calls, the pattern is _read() <-- _libc_read() <-- read().
else, it is equivalent to strdup(). So, we will check if the substitution
tables are trivial at the load time, and possibly save 2 calls to
__collate_substitute() in strcoll().
Still, __collate_substitute() should not exist.
Other minor optimizations. I got ~30% speedup in strcoll() for 50 char strings,
~40% speedup for 100 char strings, and unmeasurable speedup for 1M strings.
Collates are still terribly slow. To make them reasonable fast,
__collate_substitute() should be killed.
In some cases replace if (a == null) a = malloc(x); else a =
realloc(a, x); with simple reallocf(a, x). Per ANSI-C, this is
guaranteed to be the same thing.
I've been running these on my system here w/o ill effects for some
time. However, the CTM-express is at part 6 of 34 for the CAM
changes, so I've not been able to do a build world with the CAM in the
tree with these changes. Shouldn't impact anything, but...
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
Vulnerable: all programs that use setlocale(LC_COLLATE),
setlocale(LC_CTYPE), or setlocale(LC_ALL). The only setuid/setgid
binary i've found for this is w(1).
Should go into 2.2.
strdup() it to prevent unsetenv() or setenv() effects. Check its length to
not allow user to overflow internal locale buffer. Move PATH_LOCALE
handling code into one place.
POSIX: make better stub for LC_MONETARY & LC_NUMERIC, now it check
locale directory existance instead of refusing all non-C non-POSIX
locales. POSIX treats empty locale env variable as unset variable
while our old code treats it as "C" locale, fix it. Implement previous locale
restoring, if locale setting fails. Old code assumes success if some
of LC_ALL subset is successed even other fails, POSIX treats it as
failure with previous locale restoring, fix it.
Remove unneccessary length checking in currentlocale()
for gcc >= 2.5 and no-ops for gcc >= 2.6. Converted to use __dead2
or __pure2 where it wasn't already done, except in math.h where use
of __pure was mostly wrong.