unifdef(1): Improve worst-case bound on symbol resolution
Use RB_TREE to make some algorithms O(lg N) and O(N lg N) instead of O(N)
and O(N^2).
While here, remove arbitrarily limit on number of macros understood.
Reverts r354877 and r354878, which disabled the (correct) test.
PR: 242095
Reported by: lwhsu
Use RB_TREE to make some algorithms O(lg N) and O(N lg N) instead of O(N)
and O(N^2). Because N is typically small and the former linear array also has
great constant factors (as a property of CPU caching), this doesn't provide
material benefit most or all of the time.
While here, remove arbitrarily limit on number of macros understood.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
The most notable new feature is support for processing multiple
files in one invocation. There is also support for more make-friendly
exit statuses.
The most notable bug fix is #line directives now include the input
file name.
Obtained from: http://dotat.at/prog/unifdef
Only significant change is to fix a bu when the output file overwrites
the input when the input is redirected.
Obtained from: http://dotat.at/prog/unifdef
Fix a long-standing cpp compatibility bug: The -DFOO argument
(without an explicit value) should define FOO to 1 not to the empty
string.
Add support for CRLF newlines, based on a suggestion from Mark Rushakoff.
Obtained from: http://dotat.at/prog/unifdef/
file can safely be the same as the input file. Idea from IRIX unifdef(1).
This version fixes a bug in the NetBSD unifdef which refuses to
write to a -o outfile which does not exist.
Obtained from: NetBSD
If the number of nested #if blocks exceeds 64, nest() increments
the nesting depth and then reports an error. The message includes
the line number for the start of the current #if block, which is
read from past the end of the relevant array.
Avoid the out-of-bounds read by reporting the error and exiting
before the nesting depth has a chance to increase.
Submitted by: Jonathan Nieder <jrnieder@gmail.com>
Main highlights:
(A) The new -B option compresses blank lines around a deleted section
so that blank lines around "paragraphs" of code don't get doubled.
(B) Lenient evaluation of && and || so that #if expressions can be
evaluated even when some of their sub-expressions cannot be.
(C) The evaluator can now handle macros with arguments.
(D) Portability fixes, especially for unifdefall.
Contributions from:
Ben Hutchings at Solarflare Communications (A and B)
Anders H Kaseorg <andersk@mit.edu> (A and C)
Jonathan Nieder <jrnieder@gmail.com> (D)
Obtained from: http://dotat.at/prog/unifdef/
caused by files that have #endif and no newline on the last line
(reported by joe@). Also fix a benign uninitialized variable bug.
Update and tidy the copyright.
Allow the user to run unifdef without defining any symbols. This is
useful in conjunction with the -k flag.
Fix a bug in the -s handling code that would have caused out-of-bounds
array accesses.
Add a -n option to insert #line directives in the output.
Ignore comment markers inside string and character literals
(bug reported by Amos Shapira <amos.shapira@netregistry.com.au>).
More accurate copyright notices.
Fix the usage synopsis.
Amend the copyright notice to reflect the fact that there's no Berkeley
code left.
Fix a typo in a comment, improve the descriptions of the way we use
some global variables (relevant to the bug below), and note that
division-by-zero has side effects so the current expression evaluator
can't be trivially extended to arithmetic in its current design.
Avoid hitting an abort(); /* bug */ when in "text mode" (i.e.
ignoring comment state) by updating the line parser state properly.
PR: 53907
* Be less strict about multi-line preprocessor directives (e.g. those
with comments hanging off the right-hand end) since they're more
of a problem in practise than I expected. Prompted by phk.
* Fix the handling of "ignore" symbols.
* Style pedantry from OpenBSD and Ted Unangst <tedu@stanford.edu>,
including some whitespace fixes and removal of strcpy()
(and not including excessively strict KNF enforcement).
* Fix some typos and terminological inconsistencies.
* The partial-evaluation of #elif sequences was broken and the
spaghetti logic of its implementation was too hard to understand.
I've re-done it using a straight-forward table-driven push-down
automaton.
* The pre-processor line parser did not allow for all of the weird
places that people might put comments, which could have caused it
to add syntax-errors to the output by removing a #if line containing
the start- or end-marker of a comment.
* The lexer didn't need to special-case the handling of string-literals
or character-constants, but it did need to learn about line-continuations
(backslash-newline).
* The input routine was buggy and bit-rotten and trivially replacable
with fgets(). I've also made the program static- and const-safe and
improved the presentation-order. The formatting of the state-transition
tables remains non-stylish.
This commit-messsage was brought to you by code-point 45.
MFC-after: one-week
constant controlling expressions: in particular, removing #if 0 sections
is considered "rude". This commit changes the default so that such
things are passed through unchanged, and the old behaviour can be had
with the -k "kill konsts" flag.
Suggested by: markm
MFC after: 3 weeks
* Ensure we work within the array bounds when parsing command-line options;
* Replace h0h0getopt with getopt(3);
* Use consistent whitespace style in the function declarations.
Revieweded by: dwmalone (mentor)
* It now knows about the existence of #elif which would have
caused it to produce incorrect results in some situations.
* It can now process #if and #elif lines according to the
values of symbols that are specified on the command line.
The expression parser is only a simple subset of what C
allows but it should be sufficient for most real-world
code (it can cope with everything it finds in xterm).
* It has an option for printing all of the symbols that might
control #if processing. The unifdefall script uses this
option along with cpp -dM to strip all #ifs from a file.
* It has much larger static limits.
* It handles nested #ifs much more completely.
There have also been many style improvements: KNF; ANSI function
definitions; all global stuff moved to the top of the file; use
stdbool instead of h0h0bool; const-correctness; err(3) instead
of fprintf(stderr, ...); enum instead of #define; commentary.
I used NetBSD's unifdef as the basis of this since it has received
the most attention over the years.
PR: 37454
Reviewed by: markm, dwmalone
Approved by: dwmalone (mentor)
MFC after: 3 weeks