Commit Graph

135 Commits

Author SHA1 Message Date
Andrey A. Chernov
12eae8c8f3 1) Eliminate possibility to call __*collate_range_cmp() with inclomplete
locale (which cause core dump) by removing whole 'table' argument
by which it passed.

2) Restore __collate_range_cmp() in __sccl().

3) Collating [a-z] range in regcomp() only for single bytes locales
(we can't do it now for other ones). In previous state only first 256
wchars are considered and all others are just silently dropped from the
range.
2016-07-14 09:07:25 +00:00
Andrey A. Chernov
1daad8f5ad Back out non-collating [a-z] ranges.
Instead of changing whole course to another POSIX-permitted way
for consistency and uniformity I decide to completely ignore missing
regex fucntionality and concentrace on fixing bugs in what we have now,
too many small obstacles instead, counting ports.
2016-07-14 08:18:12 +00:00
Andrey A. Chernov
5a5807dd4c Remove broken support for collation in [a-z] type ranges.
Only first 256 wide chars are considered currently, all other are just
dropped from the range. Proper implementation require reverse tables
database lookup, since objects are really big as max UTF-8 (1114112
code points), so just the same scanning as it was for 256 chars will
slow things down.

POSIX does not require collation for [a-z] type ranges and does not
prohibit it for non-POSIX locales. POSIX require collation for ranges
only for POSIX (or C) locale which is equal to ASCII and binary for
other chars, so we already have it.

No other *BSD implements collation for [a-z] type ranges.

Restore ABI compatibility with unused now __collate_range_cmp() which
is visible from outside (will be removed later).
2016-07-10 03:49:38 +00:00
Pedro F. Giffuni
3c2c0c0443 libc/locale: Fix type breakage in __collate_range_cmp().
When collation support was brought in, the second and third
arguments in __collate_range_cmp() were changed from int to
wchar_t, breaking the ABI. Change them to a "char" type which
makes more sense and keeps the ABI compatible.

Also introduce __wcollate_range_cmp() which does work with wide
characters. This function is used only internally in libc so
we don't export it. Use the new function in glob(3), fnmatch(3),
and regexec(3).

PR:		179721
Suggested by:	ache. jilles
MFC after:	3 weeks (perhaps partial only)
2016-06-05 19:12:52 +00:00
Pedro F. Giffuni
93ea9f9fa1 libc: regexec(3) adjustment.
Change the behavior of when REG_STARTEND is combined with REG_NOTBOL.

From the original posting[1]:

"Enable the assumption that pmatch[0].rm_so is a continuation offset
to  a string and allows us to do a proper assessment of the character
in  regards to it's word position ('^' or '\<'), without risking going
into unallocated memory."

This change makes us similar to how glibc handles REG_STARTEND |
REG_NOTBOL, and is closely related to a soon-to-land fix to sed.

Special thanks to Martijn van Duren and Ingo Schwarze for working
out some consistent behaviour.

Differential Revision:	https://reviews.freebsd.org/D6257
Taken from:	openbsd-tech 2016-05-24 [1]  (Martijn van Duren)
Relnotes:	yes
MFC after:	1 month
2016-05-25 15:35:23 +00:00
Pedro F. Giffuni
e9fe9edde7 libc/regex: fix two buffer underruns.
Fix some rather complex regex issues found on OpenBSD as part of some
ongoing work to fix a sed(1) bug.

Curiously the OpenBSD tests don't trigger segfaults on FreeBSD but the
bugs were confirmed by running a port of FreeBSD's regex under OpenBSD's
malloc. Huge thanks to Ingo for confirming the behavior.

Taken from:	Ingo Schwarze (through openbsd-tech 2016-05-15)
MFC after:	1 week
2016-05-21 19:54:10 +00:00
Pedro F. Giffuni
32223c1b7d libc: spelling fixes.
Mostly on comments.
2016-04-30 01:24:24 +00:00
Pedro F. Giffuni
039b877ef0 regex: prevent two improbable signed integer overflows.
In matcher() we used an integer to index nsub of type size_t.
In print() we used an integer to index nstates of type sopno,
typedef'd long.
In both cases the indexes never take negative values.

Match the types to avoid any error.

MFC after:	5 days
2016-04-23 20:45:09 +00:00
Enji Cooper
1385475525 Add -static to CFLAGS to unbreak the tests by using a libc.a with
the xlocale private symbols exposed which aren't exposed publicly
via the DSO

PR: 191354
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
2015-12-13 06:33:52 +00:00
Enji Cooper
df943f005e Fix -Wformat issues and minor whitespace issues in surrounding areas
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
2015-12-05 02:25:20 +00:00
Enji Cooper
e3bc7f4da8 split.ih:
- Create automatically generated include header for split.c

main.c:
- Use function definitions from debug.ih and split.ih instead of externs

Sponsored by: EMC / Isilon Storage Division
2015-12-05 02:23:44 +00:00
Enji Cooper
16c284eca2 Use == instead of = in the function comment above split(..) so mkh -p
exposes split(..).

MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
2015-12-05 02:18:36 +00:00
Enji Cooper
8eba7ea3b3 Use ANSI C function prototypes/definitions instead of K&R style ones
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
2015-12-05 02:07:55 +00:00
Enji Cooper
1dce0e7706 Add missing headers and sort #includes per style(9)
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
2015-12-05 01:19:35 +00:00
Enji Cooper
f76918a8c4 - Use ANSI C function prototypes/definitions instead of K&R style ones
- Add a missing return type for main(..)

MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
2015-12-05 01:13:18 +00:00
Enji Cooper
b20e9c5f4c Fix -Wformat warnings by using the correct format qualifiers
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
2015-12-05 01:12:58 +00:00
Baptiste Daroussin
a2746a445f mdoc: rendering fixes 2015-04-26 10:55:39 +00:00
Pedro F. Giffuni
12eb0bca33 computematchjumps(): fix allocator sizeof operand mismatch.
Mostly cosmetical warning.

Found by:	Clang static analyzer
2015-04-22 17:09:02 +00:00
Pedro F. Giffuni
cc3a001f1c Prevent NULL pointer de-reference.
As a follow up to r279090, if dp hasn't been defined, we
shouldn't attempt to do an optimization here.
2015-02-21 15:02:27 +00:00
Pedro F. Giffuni
d58abbfe0f regex(3): Fix uninitialized pointer values.
CID:	405582	(also clang static checker)
CID:	1018724
2015-02-20 21:21:38 +00:00
Xin LI
a89629abd9 Disallow pattern spaces which would cause intermediate calculations to
overflow size_t.

Obtained from:	DragonFly (2841837793bd095a82f477e9c370cfe6cfb3862c dillon)
Security:	CERT VU#695940
MFC after:	3 days
2015-02-14 00:23:53 +00:00
Joel Dahl
f7e00d4bbd mdoc: remove EOL whitespace. 2014-12-29 13:50:59 +00:00
Xin LI
cfbebadc60 Plug a memory leak.
Obtained from:	DragonFlyBSD (commit 5119ece)
MFC after:	2 weeks
2014-12-19 06:48:47 +00:00
Pedro F. Giffuni
6e28368366 regex(3): Add support for \< and \> word delimiters
Solaris and other OSs have support for \< and \> as word
delimiters in utilities like sed(1). These are useful to
have for general compatiblity with Solaris but should be
avoided for portability with other systems, including the
traditional BSDs.

Bump __FreeBSD_version as this is likely to affect some
userland utilities.

Reference:
https://www.illumos.org/issues/516

PR:		bin/153257
Obtained from:	Illumos
MFC after:	1 month
2014-06-30 20:54:25 +00:00
Pedro F. Giffuni
0279beeb0e Revert r267675:
The code doesn't really benefit of using reallocf() in this case.
Also, the realloc() results being assigned temporary variable which
makes blind replacement with reallocf() mostly useless.

Pointed out by:		stefanf, bde
2014-06-21 01:43:56 +00:00
Pedro F. Giffuni
eb0fb866de regex: Make use of reallocf().
Use of reallocf is useful in libraries as we are not certain the
application will exit after NULL.

This somewhat reduces portability but if since you are building
this as part of libc it is likely you have our non-standard
reallocf(3) already.

Reviewed by:	ache
MFC after:	5 days
2014-06-20 15:29:09 +00:00
Pedro F. Giffuni
e234ddef95 Revert r265367:
Use of calloc instead of malloc in regex (from OpenBSD).

In this case the change makes no sense since we are using realloc() later.

Reported by:	ache
2014-05-05 18:04:57 +00:00
Pedro F. Giffuni
9d12ca17b5 regex: Use calloc instead of malloc.
Mostly to reduce differences with OpenBSD.

Obtained from:	OpenBSD (CVS rev. 1.17)
MFC after:	3 days
2014-05-05 16:41:15 +00:00
Pedro F. Giffuni
eab20bceca regex: Remove some unreachable breaks.
This is based on a much bigger cleanup done in Illumos.

Reference:
https://www.illumos.org/issues/2077

MFC after:	1 week
2014-05-01 23:34:14 +00:00
Marcel Moolenaar
8876613dc5 Replace use of ${.CURDIR} by ${LIBC_SRCTOP} and define ${LIBC_SRCTOP}
if not already defined. This allows building libc from outside of
lib/libc using a reach-over makefile.

A typical use-case is to build a standard ILP32 version and a COMPAT32
version in a single iteration by building the COMPAT32 version using a
reach-over makefile.

Obtained from:	Juniper Networks, Inc.
2014-03-04 02:19:39 +00:00
Xin LI
d0ebccde13 Fix assignment of maximum bounadary.
Submitted by:	Sascha Wildner <saw online de>
Obtained from:	DragonFly rev fd39c81ba220f7ad6e4dc9b30d45e828cf58a1ad
MFC after:	2 weeks
2013-03-01 23:26:13 +00:00
David Chisnall
7dfd88318b Remove some duplicated copyright notices.
Approved by:	dim (mentor)
2012-03-06 12:53:44 +00:00
David Chisnall
3c87aa1d3d Implement xlocale APIs from Darwin, mainly for use by libc++. This adds a
load of _l suffixed versions of various standard library functions that use
the global locale, making them take an explicit locale parameter.  Also
adds support for per-thread locales.  This work was funded by the FreeBSD
Foundation.

Please test any code you have that uses the C standard locale functions!

Reviewed by:    das (gdtoa changes)
Approved by:    dim (mentor)
2011-11-20 14:45:42 +00:00
Kevin Lo
5249ac8610 Converting int to wint_t leads to broekn comparison of raw char
and encoded wint_t.

Spotted by:	ache
2011-11-11 01:35:07 +00:00
Kevin Lo
2bf213eb6c - Don't handle out-of-memory condition
- Fix types of function arguments match their declaration

Reviewed by:	delphij
Obtained from:	NetBSD
2011-11-10 01:44:05 +00:00
Ulrich Spörlein
0d9deed52c mdoc: drop redundant .Pp and .LP calls
They have no effect when coming in pairs, or before .Bl/.Bd
2010-10-08 12:40:16 +00:00
Diomidis Spinellis
e7cbc6ee95 Fix an off-by-one error in the marking of the O_CH operator
following an OOR2 operator.

PR:		130504
MFC after:	2 weeks
2009-09-16 06:32:23 +00:00
Diomidis Spinellis
bca3476acd Add a couple of debugging statements. 2009-09-16 06:29:23 +00:00
Diomidis Spinellis
b7bb894067 Add two test cases from PR 130504.
An additional one coming from http://www.research.att.com/~gsf/testregex/
was not added; at some point the entire AT&T regression test harness
should be imported here.
But that would also mean commitment to fix the uncovered errors.

PR:		130504
Submitted by:	Chris Kuklewicz
2009-09-15 21:15:29 +00:00
Giorgos Keramidas
0efdfdee5f Add two example regexps: (1) one for matching all the characters
that belong in a character class, and (2) one for matching all
the characters *not* in a character class.

Submitted by:	Mark B, mkbucc at gmail.com
MFC after:	3 days
2008-09-05 17:41:20 +00:00
Kevin Lo
8f9872ccb3 getopt(3) returns -1, not EOF. 2008-02-18 03:19:25 +00:00
Xin LI
54a648d1e5 Diff reduction against other *BSDs: ANSIfy function
prototypes.  No function changes.
2007-06-11 03:05:54 +00:00
Xin LI
eb2b3d109a Const'ify and ANSIfy the internal interfaces of regex(3).
This is the final change that makes libc to compile with
WERROR on my amd64 crashbox.
2007-05-25 12:44:58 +00:00
Daniel Eischen
5f864214bb Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
Xin LI
2f1b6e8bb5 Test cases for back references.
Obtained from:	OpenBSD
2007-03-05 09:44:41 +00:00
Xin LI
082063a051 Only stop evaluation of a back reference if the match length is
zero and the recursion level is too deep.

Obtained from:	OpenBSD
2007-03-05 09:43:55 +00:00
Xin LI
0f4481c5e4 Avoid infinite recursion on:
echo "foo foo bar bar bar baz" | sed 's/\([^ ]*\)\( *\1\)*/\1/g'

Obtained from:	OpenBSD via NetBSD (rev. 1.18)
2007-03-05 03:07:36 +00:00
Warner Losh
c879ae3536 Per Regents of the University of Calfornia letter, remove advertising
clause.

# If I've done so improperly on a file, please let me know.
2007-01-09 00:28:16 +00:00
Daniel Eischen
6fad3aaf15 Add each directory's symbol map file to SYM_MAPS. 2006-03-13 01:15:01 +00:00
Daniel Eischen
cce72e8860 Add symbol maps and initial symbol version definitions to libc.
Reviewed by:	davidxu
2006-03-13 00:53:21 +00:00