New features:
* proper lazy evaluation of || and &&
* ?: ternary operator
* executable is considerably smaller (8K on i386) because lex and yacc are
no longer used
Differences from dash:
* arith_t instead of intmax_t
* imaxdiv() not used
* unset or null variables default to 0
* let/exp builtin (undocumented, will probably be removed later)
Obtained from: dash
* In {(...) <redir1;} <redir2, do not drop redir1.
* Maintain the difference between (...) <redir and {(...)} <redir:
In (...) <redir, the redirection is performed in the child, while in
{(...)} <redir it should be performed in the parent (like {(...); :;}
<redir)
POSIX requires this and it is simpler than the previous code that remembered
command locations when appending directories to PATH.
In particular,
PATH=$PATH
is no longer a no-op but discards all cached command locations.
If execve() returns an [ENOEXEC] error, check if the file is binary before
trying to execute it using sh. A file is considered binary if at least one
of the first 256 bytes is '\0'.
In particular, trying to execute ELF binaries for the wrong architecture now
fails with an "Exec format error" message instead of syntax errors and
potentially strange results.
These are called "shell procedures" in the source.
If execve() failed with [ENOEXEC], the shell would reinitialize itself
and execute the program as a script. This requires a fair amount of code
which is not frequently used (most scripts have a #! magic number).
Therefore just execute a new instance of sh (_PATH_BSHELL) to run the
script.
This matches the constants from <signal.h> with 'SIG' removed, which POSIX
requires kill and trap to accept and 'kill -l' to write.
'kill -l', 'trap', 'trap -l' output is now upper case.
In Turkish locales, signal names with an upper case 'I' are now accepted,
while signal names with a lower case 'i' are no longer accepted, and the
output of 'killall -l' now contains proper capital 'I' without dot instead
of a dotted capital 'I'.
* There is no plan for an alternative to the command "set".
* Attempting to unset a readonly variable has not raised an error for quite
a while, so the order of unsetting a variable and a function with the same
name does not matter.
MFC after: 1 week
When a foreground job exits on a signal, a message is printed to stdout
about this. The buffer was not flushed after this which could result in the
message being written to the wrong file if the next command was a builtin
and had stdout redirected.
Example:
sh -c 'kill -9 $$'; : > foo; echo FOO:; cat foo
Reported by: gcooper
MFC after: 1 week
This is useful so that it is easier to exit on a signal than to reset the
trap to default and resend the signal. It matches ksh93. POSIX says that
'exit' without args from a trap action uses the exit status from the last
command before the trap, which is different from 'exit $?' and matches this
if the previous command is assumed to have exited on the signal.
If the signal is SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU, or if the default
action for the signal is to ignore it, a normal _exit(2) is done with exit
status 128+signal_number.
* Make 'trap --' do the same as 'trap' instead of nothing.
* Make '--' stop option processing (note that '-' action is not an option).
Side effect: The error message for an unknown option is different.
All builtins are now always found before a PATH search.
Most ash derivatives have an undocumented feature where the presence of an
entry "%builtin" in $PATH will cause builtins to be checked at that point of
the PATH search, rather than before looking at any directories as documented
in the man page (very old versions do document this feature).
I am removing this feature from sh, as it complicates the code, may violate
expectations (for example, /usr/bin/alias is very close to a forkbomb with
PATH=/usr/bin:%builtin, only /usr/bin/builtin not being another link saves
it) and appears to be unused (all the %builtin google code search finds is
in some sort of ash source code).
Note that aliases and functions took and take precedence above builtins.
Because aliases work on a lexical level they can only ever be overridden on
a lexical level (quoting or preceding 'builtin' or 'command'). Allowing
override of functions via PATH does not really fit in the model of sh and it
would work differently from %builtin if implemented.
Note: POSIX says special builtins are found before functions. We comply to
this because we do not allow functions with the same name as a special
builtin.
Silence from: freebsd-hackers@ (message sent 20101225)
Discussed with: dougb
It should use the original exit status, just like falling off the
end of the trap handler.
Outside an EXIT trap, 'exit' is still equivalent to 'exit $?'.
An error message is written, the builtin is not executed, nonzero exit
status is returned but the shell does not abort.
This was already checked for special builtins and external commands, with
the same consequences except that the shell aborts for special builtins.
Obtained from: NetBSD
Change the criterion for builtins to be safe to execute in the same process
in optimized command substitution from a blacklist of only cd, . and eval to
a whitelist.
This avoids clobbering the main shell environment such as by $(exit 4) and
$(set -x).
The builtins jobid, jobs, times and trap can still show information not
available in a child process; this is deliberately permitted. (Changing
traps is not.)
For some builtins, whether they are safe depends on the arguments passed to
them. Some of these are always considered unsafe to keep things simple; this
only harms efficiency a little in the rare case they are used alone in a
command substitution.
If SIGINT arrived at exactly the right moment (unlikely), an exception
handler in a no longer active stack frame would be called.
Because the old handler was not used in the normal path, clang thought it
was a dead value and if an exception happened it would longjmp() to garbage.
This caused builtins/fc1.0 to fail if histedit.c was compiled with clang.
MFC after: 1 week
Before considering to execute a command substitution in the same process,
check if any of the expansions may have a side effect; if so, execute it in
a new process just like happens if it is not a single simple command.
Although the check happens at run time, it is a static check that does not
depend on current state. It is triggered by:
- expanding $! (which may cause the job to be remembered)
- ${var=value} default value assignment
- assignment operators in arithmetic
- parameter substitutions in arithmetic except ${#param}, $$, $# and $?
- command substitutions in arithmetic
This means that $((v+1)) does not prevent optimized command substitution,
whereas $(($v+1)) does, because $v might expand to something containing
assignment operators.
Scripts should not depend on these exact details for correctness. It is also
imaginable to have the shell fork if and when a side effect is encountered
or to create a new temporary namespace for variables.
Due to the $! change, the construct $(jobs $!) no longer works. The value of
$! should be stored in a variable outside command substitution first.
Command substitutions consisting of a single simple command are executed in
the main shell process but this should be invisible apart from performance
and very few exceptions such as $(trap).
Maintain a pointer to the end of the stack string area instead of how much
space is left. This simplifies the macros in memalloc.h. The places where
the new variable must be updated are only where the memory area is created,
destroyed or resized.
This allows specifying a %job (which is equivalent to the corresponding
process group).
Additionally, it improves reliability of kill from sh in high-load
situations and ensures "kill" finds the correct utility regardless of PATH,
as required by POSIX (unless the undocumented %builtin mechanism is used).
Side effect: fatal errors (any error other than kill(2) failure) now return
exit status 2 instead of 1. (This is consistent with other sh builtins, but
not in NetBSD.)
Code size increases about 1K on i386.
Obtained from: NetBSD
The #define for warnx now behaves much like the libc function (except that
it uses sh command name and output).
Also, it now uses C99 __VA_ARGS__ so there is no need for three different
macros for 0, 1 or 2 parameters.
Constants in arithmetic starting with 0 should be octal only.
This avoids the following highly puzzling result:
$ echo $((018-017))
3
by making it an error instead.
c is assigned 0 and *loc is pointing to NULL, so c!=0 cannot be true,
and dereferencing loc would be a bad idea anyway.
Coverity Prevent: CID 5113
Reviewed by: jilles
The CTLESC byte to protect a special character was output before instead of
after a newline directly preceding the special character.
The special handling of newlines is because command substitutions discard
all trailing newlines.
* Prefer kill(-X) to killpg(X).
* Remove some dead code.
* No additional SIGINT is needed if int_pending() is already true.
No functional change is intended.
The herefd hack wrote out partial here documents while expanding them. It
seems unnecessary complication given that other expansions just allocate
memory. It causes bugs because the stack is also used for intermediate
results such as arithmetic expressions. Such places should disable herefd
for the duration but not all of them do, and I prefer removing the need for
disabling herefd to disabling it everywhere needed.
Here documents larger than 1024 bytes will use a bit more CPU time and
memory.
Additionally this allows a later change to expand here documents in the
current shell environment. (This is faster for small here documents but also
changes behaviour.)
Obtained from: dash
The code to translate the internal representation to text did not know about
various additions to the internal representation since the original ash and
therefore wrote binary stuff to the terminal.
The code is used in the jobs command and similar output.
Note that the output is far from complete and mostly serves for recognition
purposes.
If describing the status of a pipeline, write all elements of the pipeline
and show the status of the last process (which would also end up in $?).
Only write one report per job, not one for every process that exits.
To keep some earlier behaviour, if any process started by the shell in a
foreground job terminates because of a signal, write a message about the
signal (at most one message per job, however).
Also, do not write messages about signals in the wait builtin in
non-interactive shells. Only true foreground jobs now write such messages
(for example, "Terminated").
In r208489, I added code to reap zombies when forking new processes, to
limit the amount of zombies. However, this can lead to marking a job as done
or stopped if it consists of multiple processes and the first process ends
very quickly. Fix this by only checking for zombies before forking the first
process of a job and not marking any jobs without processes as done or
stopped.
The getpgid() call will fail if the first process in the job has already
terminated, resulting in output of "-1".
The pgid of a job is always the pid of the first process in the job and
other code already relies on this.
Make sure all built-in commands are in the subsection named such, except
exp, let and wordexp which are deliberately undocumented. The text said only
built-ins that really need to be a built-in were documented there but in
fact almost all of them were already documented.
* Prefer one CHECKSTRSPACE with multiple USTPUTC to multiple STPUTC.
* Add STPUTS macro (based on function) and use it instead of loops that add
nul-terminated strings to the stack string.
No functional change is intended, but code size is about 1K less on i386.
If getcwd fails, do not treat this as an error, but print a warning and
unset PWD. This is similar to the behaviour when starting the shell in a
directory whose name cannot be determined.
Since is_alpha/is_name/is_in_name were made ASCII-only, this can no longer
happen.
Additionally, the check was wrong because it did not include the new
CTLQUOTEEND.
This was removed in 2001 but I think it is appropriate to add it back:
* I do not want to encourage people to write fragile and non-portable echo
commands by making printf much slower than echo.
* Recent versions of Autoconf use it a lot.
* Almost no software still wants to support systems that do not have
printf(1) at all.
* In many other shells printf is already a builtin.
Side effect: printf is now always the builtin version (which behaves
identically to /usr/bin/printf) and cannot be overridden via PATH (except
via the undocumented %builtin mechanism).
Code size increases about 5K on i386. Embedded folks might want to replace
/usr/bin/printf with a hard link to /usr/bin/alias.
The information in sh(1) about the echo builtin is equivalent, though less
extensive.
The echo(1) man page (bin/echo/echo.1) is different.
Unfortunately, sh's echo builtin and /bin/echo have gone out of sync and
this probably cannot be fixed any more.
Reported by: uqs (list of untouched files)
MFC after: 1 week
In particular, remove the text about ksh-like features, which are usually
taken for granted nowadays. The original Bourne shell is fading away and for
most users our /bin/sh is one of the most minimalistic they know.
Convert the tests to the perl prove format.
Remove obsolete TEST.README (results of an old TEST.sh for some old Unices)
and TEST.csh (old tests without correct values, far less complete than
TEST.sh).
MFC after: 1 week
This moves the function of the noaliases variable into the checkkwd
variable. This way it is properly reset on errors and aliases can be used
normally in the commands for each case (the case labels recognize the
keyword esac but no aliases).
The new code is clearer as well.
Obtained from: dash
I've noticed various terminal emulators that need to obtain a sane
default termios structure use very complex `hacks'. Even though POSIX
doesn't provide any functionality for this, extend our termios API with
cfmakesane(3), which is similar to the commonly supported cfmakeraw(3),
except that it fills the termios structure with sane defaults.
Change all code in our base system to use this function, instead of
depending on <sys/ttydefaults.h> to provide TTYDEF_*.
These do something else in ksh: name=(...) is an array or compound variable
assignment and the others are extended patterns.
This is the last patch of the ones tested in the exp run.
Exp-run done by: pav (with some other sh(1) changes)
Apart from detecting breakage earlier or at all, this also fixes a segfault
in the testsuite. The "handling" of the breakage left an invalid internal
representation in some cases.
Examples:
echo a; do echo b
echo `) echo a`
echo `date; do do do`
Exp-run done by: pav (with some other sh(1) changes)
subevalvar() incorrectly assumed that CTLESC bytes were present iff the
expansion was quoted. However, they are present iff various processing such
as word splitting is to be done later on.
Example:
v=@$e@$e@$e@
y="${v##*"$e"}"
echo "$y"
failed if $e contained the magic CTLESC byte.
Exp-run done by: pav (with some other sh(1) changes)
The code is inspired by NetBSD sh somewhat, but different because we
preserve the old Almquist/Bourne/Korn ability to have an unquoted part in a
quoted ${v+word}. For example, "${v-"*"}" expands to $v as a single field if
v is set, but generates filenames otherwise.
Note that this is the only place where we split text literally from the
script (the similar ${v=word} assigns to v and then expands $v). The parser
must now add additional markers to allow the expansion code to know whether
arbitrary characters in substitutions are quoted.
Example:
for i in ${$+a b c}; do echo $i; done
Exp-run done by: pav (with some other sh(1) changes)
If double-quote state does not match, treat the '}' literally.
This ensures double-quote state remains the same before and after a
${v+-=?...} which helps with expand.c.
It makes things like
${foo+"\${bar}"}
which I have seen in the wild work as expected.
Exp-run done by: pav (with some other sh(1) changes)
This is a syntax error.
POSIX does not say explicitly whether defining a function with the same name
as a special builtin is allowed, but it does say that it is impossible to
call such a function.
A special builtin can still be overridden with an alias.
This commit is part of a set of changes that will ensure that when
something looks like a special builtin to the parser, it is one. (Not the
other way around, as it remains possible to call a special builtin named
by a variable or other substitution.)
Exp-run done by: pav (with some other sh(1) changes)
Add some conservative checks on function names:
- Disallow expansions or quoting characters; these can only be called via
strange control characters
- Disallow '/'; these functions cannot be called anyway, as exec.c assumes
they are pathnames
- Make the CTL* bytes work properly in function names.
These are syntax errors.
POSIX does not require us to support more than names (letters, digits and
underscores, not starting with a digit), but I do not want to restrict it
that much at this time.
Exp-run done by: pav (with some other sh(1) changes)
This is how ksh93 treats ! within a pipeline and makes the ! in
a | ! b | c
negate the exit status of the pipeline, as if it were
a | { ! b | c; }
Side effect: something like
f() ! a
is now a syntax error, because a function definition takes a command,
not a pipeline.
Exp-run done by: pav (with some other sh(1) changes)
For multi-command pipelines,
1. all commands are direct children of the shell (unlike the original
Bourne shell)
2. all commands are executed in a subshell (unlike the real Korn shell)
MFC after: 1 week