119 Commits

Author SHA1 Message Date
Jilles Tjoelker
d358fa780b wordexp: Rewrite to make WRDE_NOCMD reliable.
Shell syntax is too complicated to detect command substitution and unquoted
operators reliably without implementing much of sh's parser. Therefore, have
sh do this detection.

While changing sh's support anyway, also read input from a pipe instead of
arguments to avoid {ARG_MAX} limits and improve privacy, and output count
and length using 16 instead of 8 digits.

The basic concept is:
execl("/bin/sh", "sh", "-c", "freebsd_wordexp ${1:+\"$1\"} -f \"$2\"",
    "", flags & WRDE_NOCMD ? "-p" : "", <pipe with words>);

The WRDE_BADCHAR error is still implemented in libc. POSIX requires us to
fail strings containing unquoted braces with code WRDE_BADCHAR. Since this
is normally not a syntax error in sh, there is still a need for checking
code in libc, we_check().

The new we_check() is an optimistic check that all the characters
  <newline> | & ; < > ( ) { }
are quoted. To avoid duplicating too much sh logic, such characters are
permitted when quoting characters are seen, even if the quoting characters
may themselves be quoted. This code reports all WRDE_BADCHAR errors; bad
characters that get past it and are a syntax error in sh return WRDE_SYNTAX.

Although many implementations of WRDE_NOCMD erroneously allow some command
substitutions (and ours even documented this), there appears to be code that
relies on its security (codesearch.debian.net shows quite a few uses).
Passing untrusted data to wordexp() still exposes a denial of service
possibility and a fairly large attack surface.

Reviewed by:	wblock (man page only)
MFC after:	2 weeks
Relnotes:	yes
Security:	fixes command execution with wordexp(untrusted, WRDE_NOCMD)
2015-09-30 21:32:29 +00:00
Jilles Tjoelker
62c3711632 sh: Add set -o nolog.
POSIX requires this to prevent entering function definitions in history but
this implementation does nothing except retain the option's value. In ksh88,
function definitions were usually entered in the history file, even when
they came from ~/.profile and the $ENV file, to allow displaying their
definitions.

This is also the first option that does not have a letter.
2015-08-29 19:41:47 +00:00
Jilles Tjoelker
4a4867d667 sh: Fix out of bounds read when there is no ] after a [:class:].
The initial check for a matching ] was incorrect if a ] may be consumed by a
[:class:]. The subsequent loop assumed that there must be a ].

Remove the initial check and make the loop cope with a missing ].

Found with afl-fuzz.

MFC after:	1 week
2015-08-25 21:55:15 +00:00
Nathan Whitehorn
c4b725f42a Fix uninitialized variable that broke sh on PowerPC starting with r278826. 2015-02-26 20:59:18 +00:00
Jilles Tjoelker
7034d8df04 sh: Various cleanups to expand.c:
* Remove some gotos.
* Remove unused parameter.
* Remove duplicate code.
2015-02-15 22:38:00 +00:00
Jilles Tjoelker
781bfb5a53 sh: Prefer "" to nullstr where possible. 2015-02-15 21:47:43 +00:00
Jilles Tjoelker
a4652c280b sh: Add stsavestr(), like savestr() but allocates using stalloc(). 2015-02-15 21:41:29 +00:00
Jilles Tjoelker
f649ab8b15 sh: Remove EXP_REDIR.
EXP_REDIR was supposed to generate pathnames in redirection if exactly one
file matches, as permitted but not required by POSIX in interactive mode. It
is unlikely this will be implemented.

No functional change is intended.

MFC after:	1 week
2014-12-21 22:18:30 +00:00
Jilles Tjoelker
08dc8cf90c sh: Use DQSYNTAX only while expanding, not SQSYNTAX.
Quoting during expansion only cares about CCTL, which is the same for
DQSYNTAX and SQSYNTAX.
2014-11-22 16:03:18 +00:00
Jilles Tjoelker
5dff1efc27 sh: Fix corruption of CTL* bytes in positional parameters in redirection.
EXP_REDIR was not being checked for while expanding positional parameters in
redirection, so CTL* bytes were not being prefixed where they should be.

MFC after:	1 week
2014-10-31 22:28:10 +00:00
Jilles Tjoelker
3fb51b3a43 Treat IFS separators in "$*" as quoted.
This makes a difference if IFS starts with *, ?, [ or a CTL* byte.
2014-10-28 22:14:31 +00:00
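A minimal sketch of the case 3fb51b3a43 addresses, assuming a POSIX shell. When IFS starts with *, "$*" joins the positional parameters with a literal *. The variable names joined and match are illustrative and do not come from the commit.

```shell
# IFS starts with a glob character; "$*" joins with the first character of IFS.
IFS='*'
set -- a b
joined="$*"

# With the separator treated as quoted, the case pattern is the literal
# string a*b, so aXb is not expected to match on a shell with this fix.
case aXb in
"$*") match=glob ;;
*) match=literal ;;
esac

unset IFS   # restore default field splitting
printf '%s\n' "$joined"
```

A shell that treated the separator as unquoted would interpret it as a pattern character in the case statement.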
Jilles Tjoelker
622fdf3236 sh: Remove more gotos. 2014-10-15 21:20:56 +00:00
Jilles Tjoelker
33c5acf038 sh: Eliminate some gotos. 2014-10-05 21:51:36 +00:00
Jilles Tjoelker
7b9104c0a9 sh: Correctly handle positional parameters beyond INT_MAX on 64-bit systems.
Currently, there can be no more than INT_MAX positional parameters. Make
sure to treat all higher ones as unset to avoid incorrect results and
crashes.

On 64-bit systems, our atoi() takes the low 32 bits of the strtol() and
sign-extends them.

On 32-bit systems, the call to atoi() returned INT_MAX for too high values
and there is not enough address space for so many positional parameters, so
there was no issue.
2014-07-12 21:54:11 +00:00
Jilles Tjoelker
5ddabb8348 sh: Consistently treat ${01} like $1.
Leading zeroes were ignored when checking whether a positional parameter is
set, but not when expanding its value. Ignore leading zeroes in any case.
2014-07-12 10:27:30 +00:00
Jilles Tjoelker
1632bf1a88 sh: Fix possible memory leaks and double frees with unexpected SIGINT. 2014-03-26 20:43:40 +00:00
Jilles Tjoelker
61346cbdc7 sh: Add some consts. 2014-03-14 21:45:37 +00:00
Jilles Tjoelker
a2cba42fc2 sh: Make argstr() return where it stopped and simplify expari() using this. 2014-03-04 22:30:38 +00:00
Jilles Tjoelker
ce16da82dd sh: Simplify expari().
Redo expari() like evalvar(). This makes the logic more understandable and
avoids possible problems if arithmetic expansion occurs when CTLESC characters
are not generated (looking backwards for CTLARI is not generally possible in
that case but the old code tried anyway).

This adds an extra argstr() recursion.
2014-03-02 22:59:34 +00:00
Jilles Tjoelker
5439648913 sh: Do not corrupt internal representation if LINENO inner expansion fails.
Example:
  f() { : ${LINENO+$((1/0))}; }
and call this function twice.
2014-02-27 16:54:43 +00:00
Jilles Tjoelker
85bf1d2f07 sh: Make expari() static. 2014-02-26 21:38:42 +00:00
Jilles Tjoelker
670dd3f08f sh: Prefer memcpy() to strcpy() in most cases. Remove the scopy macro. 2013-11-30 21:27:11 +00:00
Jilles Tjoelker
46c6b52dfb sh: Fix various compiler warnings.
It now passes WARNS=7 with clang on i386.

GCC 4.2.1 does not understand setjmp() properly so will always trigger
-Wuninitialized. I will not add the volatile keywords to suppress this.
2013-04-01 17:18:22 +00:00
Jilles Tjoelker
4dc6bdd3e7 sh: Expand here documents in the current process.
Expand here documents at the same point other redirections are expanded but
use a non-fork subshell environment (like simple command substitutions) for
compatibility. Substitution errors result in an empty here document like
before.

As a result, a fork is avoided for short (<4K) expanded here documents.

Unexpanded here documents (with quoted end marker after <<) are not affected
by this change. They already only forked when >4K.

Side effects:
* Order of expansion is slightly different.
* Slow expansions are not executed in parallel with the redirected command.
* A non-fork subshell environment is subtly different from a forked process.
2013-02-03 15:54:57 +00:00
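The distinction 4dc6bdd3e7 draws between expanded and unexpanded here documents can be sketched as follows, assuming any POSIX shell. The variable names are illustrative only.

```shell
name=world

# Unquoted end marker: the body undergoes parameter expansion.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted end marker: the body is taken literally.
literal=$(cat <<'EOF'
hello $name
EOF
)

printf '%s\n' "$expanded" "$literal"
```

Only the first form is affected by the change; the second was already written without expansion.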
Jilles Tjoelker
260fc3f4d2 sh: Make various functions static. 2012-01-01 22:17:12 +00:00
Jilles Tjoelker
820491f824 sh: Make patmatch() non-recursive. 2012-01-01 20:50:19 +00:00
Jilles Tjoelker
6e8db49a44 sh: Use dirent.d_type in pathname generation.
This improves performance for globs where a slash or another component
follows a component with metacharacters by eliminating unnecessary attempts
to open files that are not directories.
2011-12-28 23:40:46 +00:00
Jilles Tjoelker
7a2b9d4b38 sh: Cache de->d_namlen in a local variable. 2011-12-28 23:30:17 +00:00
Jilles Tjoelker
ff4dc67299 sh: Add support for named character classes in bracket expressions.
Example:
  case x in [[:alpha:]]) echo yes ;; esac
2011-06-15 21:48:10 +00:00
Jilles Tjoelker
454a02b372 sh: Fix duplicate prototypes for builtins.
Have mkbuiltins write the prototypes for the *cmd functions to builtins.h
instead of builtins.c and include builtins.h in more .c files instead of
duplicating prototypes for *cmd functions in other headers.
2011-06-13 21:03:27 +00:00
Jilles Tjoelker
c543e1ae9e sh: Save/restore changed variables in optimized command substitution.
In optimized command substitution, save and restore any variables changed by
expansions (${var=value} and $((var=assigned))), instead of trying to
determine if an expansion may cause such changes.

If $! is referenced in optimized command substitution, do not cause jobs to
be remembered longer.

This fixes $(jobs $!) again, simplifies the man page and shortens the code.
2011-06-12 23:06:04 +00:00
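A small sketch of the behaviour c543e1ae9e has to preserve, assuming a POSIX shell: a variable assigned by an expansion inside a command substitution must not be visible afterwards, whether or not the shell forks. The names var, out and after are illustrative.

```shell
unset var
# ${var=value} assigns inside the command substitution...
out=$(echo "${var=value}")
# ...but the assignment must not be visible in the parent shell.
after=${var-unset}
printf '%s %s\n' "$out" "$after"
```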
Jilles Tjoelker
f5ac5937d3 sh: Fix locale-dependent ranges in bracket expressions.
When I added UTF-8 support in r221646, the LC_COLLATE-based ordering broke
because of sign extension of char.

Because of libc restrictions, this does not work for UTF-8. For UTF-8
locales, ranges always use character code order.
2011-06-12 12:54:52 +00:00
Jilles Tjoelker
292e667663 sh: Do parameter expansion before printing PS4 (set -x).
The function name expandstr() and the general idea of doing this kind of
expansion by treating the text as a here document without end marker is from
dash.

All variants of parameter expansion and arithmetic expansion also work (the
latter is not required by POSIX but it does not take extra code and many
other shells also allow it).

Command substitution is prevented because I think it causes too much code to
be re-entered (for example creating an unbounded recursion of trace lines).

Unfortunately, our LINENO is somewhat crude, otherwise PS4='$LINENO+ ' would
be quite useful.
2011-06-09 23:12:23 +00:00
Jilles Tjoelker
715a0dd556 sh: Fix unquoted $@/$* if IFS=''.
If IFS is null, unquoted $@/$* should still expand to separate words.
This differs from quoted $@ (which does not depend on IFS) in that pathname
generation is performed and empty words are removed.
2011-05-27 15:56:13 +00:00
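The difference 715a0dd556 describes between unquoted $@ and quoted "$@" with a null IFS can be sketched like this, assuming a shell that implements the fixed behaviour. The counters unquoted and quoted are illustrative names.

```shell
IFS=''
set -- 'a b' '' c

set -- $@        # unquoted: separate words, empty words removed
unquoted=$#

set -- 'a b' '' c
set -- "$@"      # quoted: every parameter preserved, including the empty one
quoted=$#

unset IFS        # restore default field splitting
printf '%s %s\n' "$unquoted" "$quoted"
```

Because IFS is null, 'a b' is not split further; only the empty parameter disappears in the unquoted case.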
Jilles Tjoelker
7cc6b3df80 sh: Add UTF-8 support to pattern matching.
?, [...] patterns match codepoints instead of bytes. They do not match
invalid sequences. [...] patterns must not contain invalid sequences;
otherwise they will not match anything. This is so that ${var#?} removes the
first codepoint, not the first byte, without putting UTF-8 knowledge into
the ${var#pattern} code. However, * continues to match any string and an
invalid sequence matches an identical invalid sequence. (This differs from
fnmatch(3).)
2011-05-08 11:32:20 +00:00
Jilles Tjoelker
4c244ed255 sh: Add UTF-8 support to ${#var}.
If the current locale uses UTF-8, ${#var} counts codepoints (more precisely,
bytes b with (b & 0xc0) != 0x80).
2011-05-07 14:32:16 +00:00
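The counting rule in 4c244ed255, bytes b with (b & 0xc0) != 0x80, can be reproduced outside the shell's own ${#var} code. This is only a sketch of the rule, not the implementation in sh; count_codepoints is a hypothetical helper and it relies on the POSIX od utility.

```shell
# Count bytes on stdin that are not UTF-8 continuation bytes,
# i.e. bytes b with (b & 0xc0) != 0x80 (192 and 128 in decimal).
count_codepoints() {
    n=0
    for b in $(od -An -v -tu1); do
        if [ $(( b & 192 )) -ne 128 ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# "h", U+00E9 encoded as the two bytes 0xC3 0xA9, then "llo": 6 bytes.
count=$(printf 'h\303\251llo' | count_codepoints)
printf '%s\n' "$count"
```

The rule counts lead bytes and ASCII bytes, so it does not need to decode or validate the sequences.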
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Jilles Tjoelker
3e0b768c63 sh: Remove comment mentioning herefd, which is gone. 2011-02-02 21:48:53 +00:00
Jilles Tjoelker
acd7984f96 sh: Don't do optimized command substitution if expansions have side effects.
Before considering executing a command substitution in the same process,
check if any of the expansions may have a side effect; if so, execute it in
a new process, as happens when it is not a single simple command.

Although the check happens at run time, it is a static check that does not
depend on current state. It is triggered by:
- expanding $! (which may cause the job to be remembered)
- ${var=value} default value assignment
- assignment operators in arithmetic
- parameter substitutions in arithmetic except ${#param}, $$, $# and $?
- command substitutions in arithmetic

This means that $((v+1)) does not prevent optimized command substitution,
whereas $(($v+1)) does, because $v might expand to something containing
assignment operators.

Scripts should not depend on these exact details for correctness. It is also
imaginable to have the shell fork if and when a side effect is encountered
or to create a new temporary namespace for variables.

Due to the $! change, the construct $(jobs $!) no longer works. The value of
$! should be stored in a variable outside command substitution first.
2010-12-28 21:27:08 +00:00
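The reason acd7984f96 treats $(($v+1)) as a possible side effect can be sketched as follows, assuming a POSIX shell with assignment operators in arithmetic. The variable contents are made up for illustration.

```shell
v='w=5'
unset w
# $v is substituted textually first, so the arithmetic expression becomes
# w=5+1 and assigns to w as a side effect.
result=$(( $v + 1 ))
printf '%s %s\n' "$result" "$w"
```

A static check cannot know what $v will contain, which is why the parameter substitution alone is enough to disable the optimization.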
Jilles Tjoelker
d8f32e7287 sh: Allow arbitrarily large numbers in CHECKSTRSPACE.
Reduce "stack string" API somewhat and simplify code.
Add a check for integer overflow of the "stack string" length (probably
incomplete).
2010-12-26 13:25:47 +00:00
Ulrich Spörlein
f6b767b026 Remove dead code.
c is assigned 0 and *loc is pointing to NULL, so c!=0 cannot be true,
and dereferencing loc would be a bad idea anyway.

Coverity Prevent:	CID 5113
Reviewed by:		jilles
2010-12-18 22:16:15 +00:00
Jilles Tjoelker
fa0951d63a sh: Fix corruption of command substitutions with special chars after newline
The CTLESC byte to protect a special character was output before instead of
after a newline directly preceding the special character.

The special handling of newlines is because command substitutions discard
all trailing newlines.
2010-12-16 23:28:20 +00:00
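A sketch of the situation in fa0951d63a, assuming a POSIX shell: a special character directly after a newline inside command substitution output, followed by trailing newlines that must be discarded. The names out and expected are illustrative.

```shell
# Output: "a", newline, "*", then three trailing newlines.
out=$(printf 'a\n*\n\n\n')

# Expected value: the newline before * is kept, trailing newlines are removed.
expected='a
*'
[ "$out" = "$expected" ] && echo same || echo different
```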
Jilles Tjoelker
9f5a68a002 sh: Remove the herefd hack.
The herefd hack wrote out partial here documents while expanding them. It
seems an unnecessary complication given that other expansions just allocate
memory. It causes bugs because the stack is also used for intermediate
results such as arithmetic expressions. Such places should disable herefd
for the duration but not all of them do, and I prefer removing the need for
disabling herefd to disabling it everywhere needed.

Here documents larger than 1024 bytes will use a bit more CPU time and
memory.

Additionally this allows a later change to expand here documents in the
current shell environment. (This is faster for small here documents but also
changes behaviour.)

Obtained from:	dash
2010-12-12 00:07:27 +00:00
Jilles Tjoelker
f7dea8517f sh: Replace some macros and repeated code in expand.c with functions.
No functional change is intended, but the binary is about 1K smaller on
i386.
2010-12-11 22:13:29 +00:00
Jilles Tjoelker
9d37e15722 sh: Code size optimizations to "stack string" memory allocation:
* Prefer one CHECKSTRSPACE with multiple USTPUTC to multiple STPUTC.
* Add STPUTS macro (based on function) and use it instead of loops that add
  nul-terminated strings to the stack string.

No functional change is intended, but code size is about 1K less on i386.
2010-11-23 22:17:39 +00:00
Jilles Tjoelker
aeb5d06504 sh: Code size optimizations to buffered output.
This is mainly less use of the outc macro.

No functional change is intended, but code size is about 2K less on i386.
2010-11-20 14:14:52 +00:00
Jilles Tjoelker
60f7eec450 sh: Fix some issues with CTL* bytes and ${var#pat}.
subevalvar() incorrectly assumed that CTLESC bytes were present iff the
expansion was quoted. However, they are present iff various processing such
as word splitting is to be done later on.

Example:
  v=@$e@$e@$e@
  y="${v##*"$e"}"
  echo "$y"
failed if $e contained the magic CTLESC byte.

Exp-run done by:	pav (with some other sh(1) changes)
2010-10-29 19:34:57 +00:00
Jilles Tjoelker
048f26671a sh: Do IFS splitting on word in ${v+word} and ${v-word}.
The code is inspired by NetBSD sh somewhat, but different because we
preserve the old Almquist/Bourne/Korn ability to have an unquoted part in a
quoted ${v+word}. For example, "${v-"*"}" expands to $v as a single field if
v is set, but generates filenames otherwise.

Note that this is the only place where we split text literally from the
script (the similar ${v=word} assigns to v and then expands $v). The parser
must now add additional markers to allow the expansion code to know whether
arbitrary characters in substitutions are quoted.

Example:
  for i in ${$+a b c}; do echo $i; done

Exp-run done by:	pav (with some other sh(1) changes)
2010-10-29 13:42:18 +00:00
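The splitting introduced by 048f26671a can be observed by counting fields, assuming a shell with this behaviour. The counters unquoted and quoted are illustrative names.

```shell
unset v
set -- ${v-a b c}      # unquoted: word is split on IFS
unquoted=$#
set -- "${v-a b c}"    # quoted: word stays a single field
quoted=$#
printf '%s %s\n' "$unquoted" "$quoted"
```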
David E. O'Brien
8832864298 In the spirit of r90111, depend on c89 and remove the "STATIC" macro
and its usage.
2010-10-13 22:18:03 +00:00
John Baldwin
8ab2e97063 Make DEBUG traces 64-bit clean:
- Use %t to print ptrdiff_t values.
- Cast a ptrdiff_t value explicitly to int for a field width specifier.

While here, sort includes.

Submitted by:	Garrett Cooper
2010-10-13 13:22:11 +00:00