143 Commits

Author SHA1 Message Date
jilles
aca221db9d wordexp: Rewrite to make WRDE_NOCMD reliable.
Shell syntax is too complicated to detect command substitution and unquoted
operators reliably without implementing much of sh's parser. Therefore, have
sh do this detection.

While changing sh's support anyway, also read input from a pipe instead of
arguments to avoid {ARG_MAX} limits and improve privacy, and output count
and length using 16 instead of 8 digits.

The basic concept is:
execl("/bin/sh", "sh", "-c", "freebsd_wordexp ${1:+\"$1\"} -f "$2",
    "", flags & WRDE_NOCMD ? "-p" : "", <pipe with words>);

The WRDE_BADCHAR error is still implemented in libc. POSIX requires us to
fail strings containing unquoted braces with code WRDE_BADCHAR. Since this
is normally not a syntax error in sh, there is still a need for checking
code in libc, we_check().

The new we_check() is an optimistic check that all the characters
  <newline> | & ; < > ( ) { }
are quoted. To avoid duplicating too much sh logic, such characters are
permitted when quoting characters are seen, even if the quoting characters
may themselves be quoted. This code reports all WRDE_BADCHAR errors; bad
characters that get past it and are a syntax error in sh return WRDE_SYNTAX.

Although many implementations of WRDE_NOCMD erroneously allow some command
substitutions (and ours even documented this), there appears to be code that
relies on its security (codesearch.debian.net shows quite a few uses).
Passing untrusted data to wordexp() still exposes a denial of service
possibility and a fairly large attack surface.

Reviewed by:	wblock (man page only)
MFC after:	2 weeks
Relnotes:	yes
Security:	fixes command execution with wordexp(untrusted, WRDE_NOCMD)
2015-09-30 21:32:29 +00:00
jilles
6633d3a788 sh: Allow empty << EOF markers. 2015-09-02 19:49:55 +00:00
jilles
f504ca457f sh: Don't create bad parse result when postponing a bad substitution error.
An invalid substitution like ${var@} does not cause a parse error but is
stored in the intermediate representation, to be written as part of the
error message. If there is a CTL* byte in the stored part, this confuses
some code such as the code to skip an unused alternative such as in
${var-alternative}.

To keep things simple, do not store CTL* bytes.

Found with afl-fuzz.

MFC after:	1 week
2015-08-23 20:44:53 +00:00
jilles
5e4f40c860 sh: Avoid negative character values from $'\Uffffffff' etc.
The negative value was not expected and generated the low 8 bits as a byte,
which may be an invalid character encoding.

The final shift in creating the negative value was undefined as well.

Make the temporary variable unsigned to fix this.
2015-08-20 22:05:55 +00:00
jilles
a0092595b9 sh: Prefer "" to nullstr where possible. 2015-02-15 21:47:43 +00:00
jilles
6c8ea83ec5 sh: Prepend "$0: " to error messages if there is no command name. 2014-11-22 23:28:41 +00:00
jilles
4eb9d5ad9f sh: Allow backslash-newline continuation in more places:
* directly after a $
 * directly after ${
 * between the characters of a multi-character operator token
 * within a parameter name
2014-10-19 11:59:15 +00:00
jilles
885ad60537 sh: Make parseredir() a proper function instead of an emulated nested
function.
2014-10-15 21:26:09 +00:00
jilles
caaf5f0082 sh: Remove more gotos. 2014-10-15 21:20:56 +00:00
jilles
ce43ba6175 sh: Fix LINENO and prompt after $'\0 and newline. 2014-10-03 20:24:56 +00:00
jilles
b9a6c69366 sh: Remove arbitrary length limit on << EOF markers.
This also simplifies the code.
2014-09-14 16:46:30 +00:00
jilles
860c473214 sh: Make checkend() a real function instead of an emulated nested function.
No functional change is intended, but the generated code is slightly
different.
2014-09-14 16:27:49 +00:00
jilles
4e839bb9f4 sh: Add some const keywords. 2014-09-14 15:59:15 +00:00
jilles
9b565c0250 sh: Allow aliases to force alias substitution on the following word.
If an alias's value ends with a space or tab, the next word is also
checked for aliases.

This is a POSIX feature. It is useful with utilities like command and
nohup (alias them to themselves followed by a space).
2014-01-26 21:19:33 +00:00
jilles
6d50a40d08 sh: Simplify list() in the parser.
The erflag argument was only used by old-style (``) command substitutions.
We can remove it and handle the special case in the command substitution
code.
2013-08-30 20:50:28 +00:00
jilles
fc9c61263d sh: Separate out nbinary allocation into a function. 2013-08-30 20:37:52 +00:00
jilles
67c1af2857 sh: Use makename() where possible. 2013-08-30 20:13:33 +00:00
jilles
e60f4ea26b sh: Add a function for the case where one token is required in the parse. 2013-08-30 13:25:15 +00:00
jilles
415d59b5d8 sh: Cast -1 to pointer rather than pointer to variable of wrong type.
NEOF needs to be a non-null pointer distinct from valid union node pointers.
It is not dereferenced.

The new NEOF is much like SIG_ERR except that it is an object pointer
instead of a function pointer.

The variable tokpushback can now be static.
2013-08-30 10:45:02 +00:00
jilles
e694117f82 sh: Disallow empty simple commands.
As per POSIX, a simple command must have at least one redirection,
assignment word or command word.

These occured in rare cases such as  eval "f()" .

The extension of allowing no commands inside { }, if, while, for, etc.
remains.
2013-08-25 10:57:48 +00:00
jilles
e5ea815310 sh: Remove unnecessary reset functions.
These are already handled by exception handlers.
2013-08-16 20:24:41 +00:00
jilles
79360f25fc sh: Allow a lone redirection before '|', ';;' or ';&'.
Example: </dev/null | :

PR:		181240
MFC after:	1 week
2013-08-14 19:34:13 +00:00
jilles
44afceea6d sh: Remove an incorrect comment. 2013-07-25 20:50:35 +00:00
jilles
45d56acf59 sh: Remove #define MKINIT.
MKINIT only served for the removed mkinit. Many variables can be static now.
2013-07-25 19:48:15 +00:00
jilles
0ad2a46f33 sh: Remove mkinit.
Replace the RESET blocks with regular functions and a reset() function that
calls them all.

This code generation tool is unusual and does not appear to provide much
benefit. I do not think isolating the knowledge about which modules need to
be reset is worth an almost 500-line build tool and wider scope for
variables used by the reset functions.

Also, relying on reset functions is often wrong: the cleanup should be done
in exception handlers so that no stale state remains after 'command eval'
and the like.
2013-07-25 15:08:41 +00:00
jilles
3deb97fc0b sh: Fix various compiler warnings.
It now passes WARNS=7 with clang on i386.

GCC 4.2.1 does not understand setjmp() properly so will always trigger
-Wuninitialized. I will not add the volatile keywords to suppress this.
2013-04-01 17:18:22 +00:00
jilles
b479a582c3 sh: Fix crash when parsing '{ } &'.
MFC after:	1 week
2013-01-13 19:26:33 +00:00
jilles
1af5c5cc5f sh: Don't lose $? when backquoted command ends with semicolon or newline.
An empty simple command was added and overwrote the exit status with 0.

This affects `...` but not $(...).

Example:
  v=`false;`; echo $?
2013-01-13 19:19:40 +00:00
jilles
f78ebc168b sh: Remove special support for background simple commands.
It expands the arguments in the parent shell process, which is incorrect.
2011-06-18 23:58:59 +00:00
jilles
c7a72567a8 sh: Add case statement fallthrough (with ';&' instead of ';;').
Replacing ;; with the new control operator ;& will cause the next list to be
executed as well without checking its pattern, continuing until a list ends
with ;; or until the end of the case statement. This is like omitting
"break" in a C "switch" statement.

The sequence ;& was formerly invalid.

This feature is proposed for the next POSIX issue in Austin Group issue
#449.
2011-06-17 13:03:49 +00:00
jilles
bd55770c94 sh: Do parameter expansion before printing PS4 (set -x).
The function name expandstr() and the general idea of doing this kind of
expansion by treating the text as a here document without end marker is from
dash.

All variants of parameter expansion and arithmetic expansion also work (the
latter is not required by POSIX but it does not take extra code and many
other shells also allow it).

Command substitution is prevented because I think it causes too much code to
be re-entered (for example creating an unbounded recursion of trace lines).

Unfortunately, our LINENO is somewhat crude, otherwise PS4='$LINENO+ ' would
be quite useful.
2011-06-09 23:12:23 +00:00
jilles
3dd8ae4222 sh: Expand aliases after assignments and redirections. 2011-05-21 22:03:06 +00:00
jilles
343e29c626 sh: Allow terminating a heredoc with a terminator at EOF without a newline.
This is sometimes used with eval or old-style command substitution, and most
shells other than ash derivatives allow it.

It can also be used with scripts that violate POSIX's requirement on the
application that they end in a newline (scripts must be text files except
that line length is unlimited).

Example:
v=`cat <<EOF
foo
EOF`
echo $v

This commit does not add support for the similar construct with new-style
command substitution, like
v=$(cat <<EOF
foo
EOF)
This continues to require a newline after the terminator.
2011-05-20 16:03:36 +00:00
jilles
c9be2081e0 sh: Add \u/\U support (in $'...') for UTF-8.
Because we have no iconv in base, support for other charsets is not
possible.

Note that \u/\U are processed using the locale that was active when the
shell started. This is necessary to avoid behaviour that depends on the
parse/execute split (for example when placing braces around an entire
script). Therefore, UTF-8 encoding is implemented manually.
2011-05-08 17:40:10 +00:00
jilles
5a49f52603 sh: Add $'quoting' (C-style escape sequences).
A string between $' and ' may contain backslash escape sequences similar to
the ones in a C string constant (except that a single-quote must be escaped
and a double-quote need not be). Details are in the sh(1) man page.

This construct is useful to include unprintable characters, tabs and
newlines in strings; while this can be done with a command substitution
containing a printf command, that needs ugly workarounds if the result is to
end with a newline as command substitution removes all trailing newlines.

The construct may also be useful in future to describe unprintable
characters without needing to write those characters themselves in 'set -x',
'export -p' and the like.

The implementation attempts to comply to the proposal for the next issue of
the POSIX specification. Because this construct is not in POSIX.1-2008,
using it in scripts intended to be portable is unwise.

Matching the minimal locale support in the rest of sh, the \u and \U
sequences are currently not useful.

Exp-run done by: pav (with some other sh(1) changes)
2011-05-05 20:55:55 +00:00
jilles
fa0f3c42ef sh: Detect an error for ${#var<GARBAGE>}.
In particular, this makes things like ${#foo[0]} and ${#foo[@]} errors
rather than silent equivalents of ${#foo}.

PR:		bin/151720
Submitted by:	Mark Johnston
Exp-run done by: pav (with some other sh(1) changes)
2011-05-04 21:49:34 +00:00
jilles
1347144ea4 sh: Do not word split "${#parameter}".
This is only a problem if IFS contains digits, which is unusual but valid.

Because of an incorrect fix for PR bin/12137, "${#parameter}" was treated
as ${#parameter}. The underlying problem was that "${#parameter}"
erroneously added CTLESC bytes before determining the length. This
was properly fixed for PR bin/56147 but the incorrect fix was not backed
out.

Reported by:	Seeker on forums.freebsd.org
MFC after:	2 weeks
2011-04-20 22:24:54 +00:00
jilles
161663c247 sh: Fix some parameter expansion variants ${#...}.
These already worked: $# ${#} ${##} ${#-} ${#?}
These now work as well: ${#+word} ${#-word} ${##word} ${#%word}

There is an ambiguity in the standard with ${#?}: it could be the length of
$? or it could be $# giving an error in the (impossible) case that it is not
set. We continue to use the former interpretation as it seems more useful.
2011-03-13 20:02:39 +00:00
jilles
ff6aee65ce sh: Fix two things about {(...)} <redir:
* In {(...) <redir1;} <redir2, do not drop redir1.
* Maintain the difference between (...) <redir and {(...)} <redir:
  In (...) <redir, the redirection is performed in the child, while in
  {(...)} <redir it should be performed in the parent (like {(...); :;}
  <redir)
2011-02-05 15:02:19 +00:00
jilles
de73f385a5 sh: Allow arbitrary large numbers in CHECKSTRSPACE.
Reduce "stack string" API somewhat and simplify code.
Add a check for integer overflow of the "stack string" length (probably
incomplete).
2010-12-26 13:25:47 +00:00
uqs
889baffc86 Remove duplicate check, turning dead code into live code.
Coverity CID:	5114
Reviewed by:	jilles
2010-12-13 10:48:49 +00:00
jilles
7377de8f91 sh: Code size optimizations to "stack string" memory allocation:
* Prefer one CHECKSTRSPACE with multiple USTPUTC to multiple STPUTC.
* Add STPUTS macro (based on function) and use it instead of loops that add
  nul-terminated strings to the stack string.

No functional change is intended, but code size is about 1K less on i386.
2010-11-23 22:17:39 +00:00
jilles
000173def6 sh: Fix some issues with aliases and case, by importing dash checkkwd code.
This moves the function of the noaliases variable into the checkkwd
variable. This way it is properly reset on errors and aliases can be used
normally in the commands for each case (the case labels recognize the
keyword esac but no aliases).

The new code is clearer as well.

Obtained from:	dash
2010-11-02 23:44:29 +00:00
jilles
4de067d3c2 sh: Use iteration instead of recursion to evaluate semicolon lists.
This reduces CPU and memory usage when executing long lists (such
as long functions).
2010-10-31 12:06:02 +00:00
jilles
2ae15286ba sh: Tweak some string constants to reduce code size.
* Reduce some needless differences.
* Shorten some error messages that should not happen.
2010-10-29 21:44:43 +00:00
jilles
038f244ca5 sh: Reject function names ending in one of !%*+-=?@}~
These do something else in ksh: name=(...) is an array or compound variable
assignment and the others are extended patterns.

This is the last patch of the ones tested in the exp run.

Exp-run done by:	pav (with some other sh(1) changes)
2010-10-29 21:20:56 +00:00
jilles
f98d5a366d sh: Detect various additional errors in the parser.
Apart from detecting breakage earlier or at all, this also fixes a segfault
in the testsuite. The "handling" of the breakage left an invalid internal
representation in some cases.

Examples:
  echo a; do echo b
  echo `) echo a`
  echo `date; do do do`

Exp-run done by:	pav (with some other sh(1) changes)
2010-10-29 21:06:57 +00:00
jilles
b6e7fcf97b sh: Error out on various specials/keywords in the wrong place in backticks.
Example:
  echo `date)`

Exp-run done by:	pav (with some other sh(1) changes)
Obtained from:		NetBSD (Christos Zoulas, NetBSD PR 11317)
2010-10-29 20:23:41 +00:00
jilles
28ad180ab4 sh: Do IFS splitting on word in ${v+word} and ${v-word}.
The code is inspired by NetBSD sh somewhat, but different because we
preserve the old Almquist/Bourne/Korn ability to have an unquoted part in a
quoted ${v+word}. For example, "${v-"*"}" expands to $v as a single field if
v is set, but generates filenames otherwise.

Note that this is the only place where we split text literally from the
script (the similar ${v=word} assigns to v and then expands $v). The parser
must now add additional markers to allow the expansion code to know whether
arbitrary characters in substitutions are quoted.

Example:
  for i in ${$+a b c}; do echo $i; done

Exp-run done by:	pav (with some other sh(1) changes)
2010-10-29 13:42:18 +00:00
jilles
6f54496b16 sh: Only accept a '}' inside ${v+-=?...} if double-quote state matches.
If double-quote state does not match, treat the '}' literally.

This ensures double-quote state remains the same before and after a
${v+-=?...} which helps with expand.c.

It makes things like
  ${foo+"\${bar}"}
which I have seen in the wild work as expected.

Exp-run done by:	pav (with some other sh(1) changes)
2010-10-28 22:34:49 +00:00