freebsd-skq/contrib/awk/test/rebt8b1.awk
2001-11-02 21:06:08 +00:00

139 lines
5.9 KiB
Awk
Raw Blame History

# From hankedr@dms.auburn.edu Sun Jan 28 12:25:43 2001
# Received: from mail.actcom.co.il [192.114.47.13]
# by localhost with POP3 (fetchmail-5.5.0)
# for arnold@localhost (single-drop); Sun, 28 Jan 2001 12:25:43 +0200 (IST)
# Received: by actcom.co.il (mbox arobbins)
# (with Cubic Circle's cucipop (v1.31 1998/05/13) Sun Jan 28 12:27:08 2001)
# X-From_: hankedr@dms.auburn.edu Sat Jan 27 15:15:57 2001
# Received: from lmail.actcom.co.il by actcom.co.il with ESMTP
# (8.9.1a/actcom-0.2) id PAA23801 for <arobbins@actcom.co.il>;
# Sat, 27 Jan 2001 15:15:55 +0200 (EET)
# (rfc931-sender: lmail.actcom.co.il [192.114.47.13])
# Received: from billohost.com (www.billohost.com [209.196.35.10])
# by lmail.actcom.co.il (8.9.3/8.9.1) with ESMTP id PAA15998
# for <arobbins@actcom.co.il>; Sat, 27 Jan 2001 15:16:27 +0200
# Received: from yak.dms.auburn.edu (yak.dms.auburn.edu [131.204.53.2])
# by billohost.com (8.9.3/8.9.3) with ESMTP id IAA00467
# for <arnold@skeeve.com>; Sat, 27 Jan 2001 08:15:52 -0500
# Received: (from hankedr@localhost)
# by yak.dms.auburn.edu (8.9.3/8.9.3/Debian/GNU) id HAA24441;
# Sat, 27 Jan 2001 07:15:44 -0600
# Date: Sat, 27 Jan 2001 07:15:44 -0600
# Message-Id: <200101271315.HAA24441@yak.dms.auburn.edu>
# From: Darrel Hankerson <hankedr@dms.auburn.edu>
# To: arnold@skeeve.com
# Subject: [stolfi@ic.unicamp.br: Bug in [...]* matching with acute-u]
# Mime-Version: 1.0 (generated by tm-edit 7.106)
# Content-Type: message/rfc822
# Status: R
#
# From: Jorge Stolfi <stolfi@ic.unicamp.br>
# To: bug-gnu-utils@gnu.org
# Subject: Bug in [...]* matching with acute-u
# MIME-Version: 1.0
# Reply-To: stolfi@ic.unicamp.br
# X-MIME-Autoconverted: from 8bit to quoted-printable by grande.dcc.unicamp.br id GAA10716
# Sender: bug-gnu-utils-admin@gnu.org
# Errors-To: bug-gnu-utils-admin@gnu.org
# X-BeenThere: bug-gnu-utils@gnu.org
# X-Mailman-Version: 2.0
# Precedence: bulk
# List-Help: <mailto:bug-gnu-utils-request@gnu.org?subject=help>
# List-Post: <mailto:bug-gnu-utils@gnu.org>
# List-Subscribe: <http://mail.gnu.org/mailman/listinfo/bug-gnu-utils>,
# <mailto:bug-gnu-utils-request@gnu.org?subject=subscribe>
# List-Id: Bug reports for the GNU utilities <bug-gnu-utils.gnu.org>
# List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/bug-gnu-utils>,
# <mailto:bug-gnu-utils-request@gnu.org?subject=unsubscribe>
# List-Archive: <http://mail.gnu.org/pipermail/bug-gnu-utils/>
# Date: Sat, 27 Jan 2001 06:46:11 -0200 (EDT)
# Content-Transfer-Encoding: 8bit
# X-MIME-Autoconverted: from quoted-printable to 8bit by manatee.dms.auburn.edu id CAA14936
# Content-Type: text/plain; charset=iso-8859-1
# <mailto:bug-gnu-utils-request@gnu.org?subject=subscribe>
# <mailto:bug-gnu-utils-request@gnu.org?subject=uns
# Content-Length: 3137
#
#
#
# Hi,
#
# I think I have run into a bug in gawk's handling of REs of the
# form [...]* when the bracketed list includes certain 8-bit characters,
# specifically u-acute (octal \372).
#
# The problem occurs in GNU Awk 3.0.4, both under
# Linux 2.2.14-5.0 (intel i686) and SunOS 5.5 (Sun sparc).
#
# Here is a program that illustrates the bug, and its output.
# The first two lines of the output should be equal, shouldn't they?
#
# ----------------------------------------------------------------------
#! /usr/bin/gawk -f
BEGIN {
s = "bananas and ananases in canaan";
t = s; gsub(/[an]*n/, "AN", t); printf "%-8s %s\n", "[an]*n", t;
t = s; gsub(/[an<61>]*n/, "AN", t); printf "%-8s %s\n", "[an<61>]*n", t;
print "";
t = s; gsub(/[a<>]*n/, "AN", t); printf "%-8s %s\n", "[a<>]*n", t;
print "";
t = s; gsub(/[an]n/, "AN", t); printf "%-8s %s\n", "[an]n", t;
t = s; gsub(/[a<>]n/, "AN", t); printf "%-8s %s\n", "[a<>]n", t;
t = s; gsub(/[an<61>]n/, "AN", t); printf "%-8s %s\n", "[an<61>]n", t;
print "";
t = s; gsub(/[an]?n/, "AN", t); printf "%-8s %s\n", "[an]?n", t;
t = s; gsub(/[a<>]?n/, "AN", t); printf "%-8s %s\n", "[a<>]?n", t;
t = s; gsub(/[an<61>]?n/, "AN", t); printf "%-8s %s\n", "[an<61>]?n", t;
print "";
t = s; gsub(/[an]+n/, "AN", t); printf "%-8s %s\n", "[an]+n", t;
t = s; gsub(/[a<>]+n/, "AN", t); printf "%-8s %s\n", "[a<>]+n", t;
t = s; gsub(/[an<61>]+n/, "AN", t); printf "%-8s %s\n", "[an<61>]+n", t;
}
# ----------------------------------------------------------------------
# [an]*n bANas ANd ANases iAN cAN
# [an<61>]*n bananas and ananases in canaan
#
# [a<>]*n bANANas ANd ANANases iAN cANAN
#
# [an]n bANANas ANd ANANases in cANaAN
# [a<>]n bANANas ANd ANANases in cANaAN
# [an<61>]n bANANas ANd ANANases in cANaAN
#
# [an]?n bANANas ANd ANANases iAN cANaAN
# [a<>]?n bANANas ANd ANANases iAN cANaAN
# [an<61>]?n bANANas ANd ANANases iAN cANaAN
#
# [an]+n bANas ANd ANases in cAN
# [a<>]+n bANANas ANd ANANases in cANAN
# [an<61>]+n bananas and ananases in canaan
# ----------------------------------------------------------------------
#
# Apparently the problem is specific to u-acute; I've tried several
# other 8-bit characters and they seem to behave as expected.
#
# By comparing the second and third output lines, it would seem that the
# problem involves backtracking out of a partial match of [...]* in
# order to match the next sub-expression, when the latter begins with
# one of the given characters.
#
#
# All the best,
#
# --stolfi
#
# ------------------------------------------------------------------------
# Jorge Stolfi | http://www.dcc.unicamp.br/~stolfi | stolfi@dcc.unicamp.br
# Institute of Computing (formerly DCC-IMECC) | Wrk +55 (19)3788-5858
# Universidade Estadual de Campinas (UNICAMP) | +55 (19)3788-5840
# Av. Albert Einstein 1251 - Caixa Postal 6176 | Fax +55 (19)3788-5847
# 13083-970 Campinas, SP -- Brazil | Hom +55 (19)3287-4069
# ------------------------------------------------------------------------
#
# _______________________________________________
# Bug-gnu-utils mailing list
# Bug-gnu-utils@gnu.org
# http://mail.gnu.org/mailman/listinfo/bug-gnu-utils
#
#