wc(1): Extend non-controversial optimizations to '-c' mode

wc(1)'s slow path for counting words or multibyte characters requires
conversion of the 8-bit input stream to wide characters.  However, a faster
path can be used for counting only lines ('-l' -- newlines have the same
representation in all supported encodings) or bytes ('-c').

The existing line count optimization was not used if the input was the
implicit stdin.  Additionally, it wasn't used if only byte counting was
requested.  This change expands the fast path to both of these scenarios.
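
As a rough illustration of the fast path described above (not the code in
wc.c), the sketch below always counts bytes and scans for newlines only when
line counting is requested. The names fast_counts, count_buffer and count_fd,
and the 64 kB buffer, are assumptions made for this example. The longest-line
tracking and SIGINFO handling of the real loop are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical result type for this sketch; wc.c keeps separate counters. */
struct fast_counts {
	uintmax_t lines;
	uintmax_t bytes;
};

/*
 * Account for one buffer.  Bytes are always added; the newline scan
 * only runs when want_lines is nonzero, so a pure '-c' run skips it.
 */
static void
count_buffer(struct fast_counts *fc, const char *buf, size_t len,
    int want_lines)
{
	const char *p;

	assert(fc != NULL && (buf != NULL || len == 0));
	fc->bytes += len;
	if (!want_lines)
		return;
	for (p = buf; len-- > 0; ++p)
		if (*p == '\n')
			fc->lines++;
}

/*
 * Drain fd through count_buffer().  Returns 0 at EOF, -1 on a read error.
 * The 64 kB buffer size is an assumption for the sketch.
 */
static int
count_fd(int fd, struct fast_counts *fc, int want_lines)
{
	static char buf[64 * 1024];
	ssize_t n;

	while ((n = read(fd, buf, sizeof(buf))) > 0)
		count_buffer(fc, buf, (size_t)n, want_lines);
	return (n == 0 ? 0 : -1);
}
```

Because no wide-character conversion is involved, the same loop works on any
descriptor, including an implicit stdin.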

Expanding the buffer size from 64 kB helps reduce the number of read(2)
calls needed, but exactly what impact that change has and what size to
expand the buffer to are still under discussion.
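
To give a feel for the arithmetic only, the hypothetical helper below computes
the minimum number of data-returning read(2) calls for a given file and buffer
size. It assumes every call fills the buffer. Short reads (for example from a
pipe) and the final call that returns 0 add to this. The helper says nothing
about which size is actually faster, and the sizes in the usage note are
illustrative, not a proposal from this commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lower bound on read(2) calls that return data for a file of
 * file_size bytes when each call asks for buf_size bytes
 * (ceiling division).  Hypothetical helper, not part of wc.c.
 */
static uintmax_t
min_data_reads(uintmax_t file_size, uintmax_t buf_size)
{
	assert(buf_size > 0);
	return (file_size / buf_size + (file_size % buf_size != 0));
}
```

For a 1 GiB file this gives 16384 calls with a 64 kB buffer and 1024 calls
with a 1 MiB buffer.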

PR:		224160
Tested by:	wosch (earlier version)
Sponsored by:	Dell EMC Isilon
Conrad Meyer 2017-12-09 21:55:19 +00:00
parent 219afc4fe2
commit de1430411d

@@ -206,8 +206,7 @@ cnt(const char *file)
 linect = wordct = charct = llct = tmpll = 0;
 if (file == NULL)
 fd = STDIN_FILENO;
-else {
-if ((fd = open(file, O_RDONLY, 0)) < 0) {
+else if ((fd = open(file, O_RDONLY, 0)) < 0) {
 xo_warn("%s: open", file);
 return (1);
 }
@@ -218,7 +217,7 @@ cnt(const char *file)
 * lines than to get words, since the word count requires some
 * logic.
 */
-if (doline) {
+if (doline || dochar) {
 while ((len = read(fd, buf, MAXBSIZE))) {
 if (len == -1) {
 xo_warn("%s: read", file);
@@ -230,6 +229,7 @@ cnt(const char *file)
 llct);
 }
 charct += len;
+if (doline) {
 for (p = buf; len--; ++p)
 if (*p == '\n') {
 if (tmpll > llct)
@@ -239,7 +239,9 @@ cnt(const char *file)
 } else
 tmpll++;
+}
 }
 reset_siginfo();
+if (doline)
 tlinect += linect;
 if (dochar)
 tcharct += charct;
@@ -270,7 +272,6 @@ cnt(const char *file)
 return (0);
 }
 }
-}
 /* Do it the hard way... */
 word: gotsp = 1;