MFC r306782-r306783

r306782:
localedef: Fix ctype dump (fixed wide spread errors)

This commit is from John Marino in dragonfly with the following commit log:

====
This was a CTYPE encoding error involving consecutive points of the same
ctype.  It was reported by myself to Illumos over a year ago but I was
unsure if it was only happening on BSD.  Given the cause, the bug is also
present on Illumos.

Basically, if consecutive points were of the exact same ctype, they would
be defined as a range regardless.  For example, all of these would be
considered equivalent:

  <A> ... <C>, <H>  (converts to <A> .. <H>)
  <A>, <B>, <H>     (converts to <A> .. <H>)
  <A>, <J> ... <H>  (converts to <A> .. <H>)

So all the points that shouldn't have been defined got "bridged" by the
extreme points.

The effects were recently reported to FreeBSD on PR 213013.  There are
countless places were the ctype flags are misdefined, so this is a major
fix that has to be MFC'd.
====

This reveals a bad change I did on the testsuite: while 0x07FF is a valid
unicode it is not used yet (reserved for future use)

PR:		213013
Submitted by:	marino@
Reported by:	Kurtis Rader <krader@skepticism.us>
Obtained from:	Dragonfly
MFC after:	1 month

r306783:
localedef: Improve cc_list parsing

original commit log:
=====
I had originally suspected the parsing of ctype definition files as being
the source of the ctype flag mis-definitions, but it wasn't.  In the
process, I simplified the cc_list parsing so I'm committing the no-impact
improvement separately.  It removes some parsing redundancies and
won't parse partial range definitions anymore.
====

Submitted by:	marino
Obtained from:	Dragonfly
MFC after:	1 month
This commit is contained in:
Baptiste Daroussin 2016-11-05 09:46:48 +00:00
parent 2843197d3c
commit ec99d4d6d8
3 changed files with 15 additions and 16 deletions

View File

@ -88,7 +88,7 @@ static struct test {
0xFFFF, 0x5D, 0x5B, 0x10000, 0x10FFFF, 0x5D, 0x0A
},
#ifdef __FreeBSD__
{ 1, -1, -1, 1, 1, -1, 1, 1, 1, 1, -1, 1, 1, -1, -1,
{ 1, -1, -1, 1, 1, -1, -1, 1, 1, 1, -1, 1, 1, -1, -1,
#else
{ 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1,
#endif

View File

@ -407,9 +407,9 @@ dump_ctype(void)
continue;
}
if ((last_ct != NULL) && (last_ct->ctype == ctn->ctype)) {
if ((last_ct != NULL) && (last_ct->ctype == ctn->ctype) &&
(last_ct->wc + 1 == wc)) {
ct[rl.runetype_ext_nranges-1].max = wc;
last_ct = ctn;
} else {
rl.runetype_ext_nranges++;
ct = realloc(ct,
@ -417,8 +417,8 @@ dump_ctype(void)
ct[rl.runetype_ext_nranges - 1].min = wc;
ct[rl.runetype_ext_nranges - 1].max = wc;
ct[rl.runetype_ext_nranges - 1].map = ctn->ctype;
last_ct = ctn;
}
last_ct = ctn;
if (ctn->tolower == 0) {
last_lo = NULL;
} else if ((last_lo != NULL) &&

View File

@ -27,6 +27,8 @@
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*
* $FreeBSD$
*/
/*
@ -321,21 +323,18 @@ ctype_kw : T_ISUPPER cc_list T_NL
| T_TOLOWER conv_list T_NL
;
cc_list : cc_list T_SEMI cc_range_end
| cc_list T_SEMI cc_char
| cc_char
;
cc_list : cc_list T_SEMI T_CHAR
cc_range_end : T_ELLIPSIS T_SEMI T_CHAR
{
add_ctype($3);
add_ctype_range($3);
}
| cc_list T_SEMI T_SYMBOL
{
add_charmap_undefined($3);
}
| cc_list T_SEMI T_ELLIPSIS T_SEMI T_CHAR
{
/* note that the endpoints *must* be characters */
add_ctype_range($5);
}
| T_CHAR
;
cc_char : T_CHAR
{
add_ctype($1);
}