From 39b86a6a6f48aa1365855593fb99578e8d62ac36 Mon Sep 17 00:00:00 2001 From: Hans Petter Selasky Date: Wed, 25 Feb 2015 21:10:03 +0000 Subject: [PATCH] Update to upstream version 2.10 The most notable new feature is support for definition files. Obtained from: http://dotat.at/prog/unifdef MFC after: 1 week --- usr.bin/unifdef/unifdef.1 | 129 +++++--- usr.bin/unifdef/unifdef.c | 560 +++++++++++++++++++++++----------- usr.bin/unifdef/unifdef.h | 4 +- usr.bin/unifdef/unifdefall.sh | 34 +-- 4 files changed, 486 insertions(+), 241 deletions(-) diff --git a/usr.bin/unifdef/unifdef.1 b/usr.bin/unifdef/unifdef.1 index f09a2448a05f..24a9205d8a76 100644 --- a/usr.bin/unifdef/unifdef.1 +++ b/usr.bin/unifdef/unifdef.1 @@ -31,7 +31,7 @@ .\" .\" $FreeBSD$ .\" -.Dd February 21, 2012 +.Dd January 7, 2014 .Dt UNIFDEF 1 PRM .Os " " .Sh NAME @@ -44,6 +44,7 @@ .Op Fl [i]D Ns Ar sym Ns Op = Ns Ar val .Op Fl [i]U Ns Ar sym .Ar ... +.Op Fl f Ar defile .Op Fl x Bro Ar 012 Brc .Op Fl M Ar backext .Op Fl o Ar outfile @@ -70,11 +71,17 @@ utility acts on .Ic #elif , #else , and .Ic #endif -lines. -A directive is only processed -if the symbols specified on the command line are sufficient to allow -.Nm -to get a definite value for its control expression. +lines, +using macros specified in +.Fl D +and +.Fl U +command line options or in +.Fl f +definitions files. +A directive is processed +if the macro specifications are sufficient to provide +a definite value for its control expression. If the result is false, the directive and the following lines under its control are removed. If the result is true, @@ -84,7 +91,7 @@ An or .Ic #ifndef directive is passed through unchanged -if its controlling symbol is not specified on the command line. +if its controlling macro is not specified. Any .Ic #if or @@ -110,7 +117,7 @@ and .Ic #elif lines: integer constants, -integer values of symbols defined on the command line, +integer values of macros defined on the command line, the .Fn defined operator, @@ -131,16 +138,42 @@ if either operand of .Ic || is definitely true then the result is true. .Pp -In most cases, the +When evaluating an expression, .Nm -utility does not distinguish between object-like macros -(without arguments) and function-like arguments (with arguments). -If a macro is not explicitly defined, or is defined with the +does not expand macros first. +The value of a macro must be a simple number, +not an expression. +A limited form of indirection is allowed, +where one macro's value is the name of another. +.Pp +In most cases, +.Nm +does not distinguish between object-like macros +(without arguments) and function-like macros (with arguments). +A function-like macro invocation can appear in +.Ic #if +and +.Ic #elif +control expressions. +If the macro is not explicitly defined, +or is defined with the .Fl D -flag on the command-line, its arguments are ignored. +flag on the command-line, +or with +.Ic #define +in a +.Fl f +definitions file, +its arguments are ignored. If a macro is explicitly undefined on the command line with the .Fl U -flag, it may not have any arguments since this leads to a syntax error. +flag, +or with +.Ic #undef +in a +.Fl f +definitions file, +it may not have any arguments since this leads to a syntax error. .Pp The .Nm @@ -161,7 +194,7 @@ It uses .Nm Fl s and .Nm cpp Fl dM -to get lists of all the controlling symbols +to get lists of all the controlling macros and their definitions (or lack thereof), then invokes .Nm @@ -169,19 +202,15 @@ with appropriate arguments to process the file. .Sh OPTIONS .Bl -tag -width indent -compact .It Fl D Ns Ar sym Ns = Ns Ar val -Specify that a symbol is defined to a given value -which is used when evaluating -.Ic #if -and -.Ic #elif -control expressions. +Specify that a macro is defined to a given value. .Pp .It Fl D Ns Ar sym -Specify that a symbol is defined to the value 1. +Specify that a macro is defined to the value 1. .Pp .It Fl U Ns Ar sym -Specify that a symbol is undefined. -If the same symbol appears in more than one argument, +Specify that a macro is undefined. +.Pp +If the same macro appears in more than one argument, the last occurrence dominates. .Pp .It Fl iD Ns Ar sym Ns Op = Ns Ar val @@ -193,9 +222,37 @@ are ignored within and .Ic #ifndef blocks -controlled by symbols +controlled by macros specified with these options. .Pp +.It Fl f Ar defile +The file +.Ar defile +contains +.Ic #define +and +.Ic #undef +preprocessor directives, +which have the same effect as the corresponding +.Fl D +and +.Fl U +command-line arguments. +You can have multiple +.Fl f +arguments and mix them with +.Fl D +and +.Fl U +arguments; +later options override earlier ones. +.Pp +Each directive must be on a single line. +Object-like macro definitions (without arguments) +are set to the given value. +Function-like macro definitions (with arguments) +are treated as if they are set to 1. +.Pp .It Fl b Replace removed lines with blank lines instead of deleting them. @@ -291,24 +348,15 @@ instead of the standard output when processing a single file. Instead of processing an input file as usual, this option causes .Nm -to produce a list of symbols that appear in expressions -that -.Nm -understands. -It is useful in conjunction with the -.Fl dM -option of -.Xr cpp 1 -for creating -.Nm -command lines. +to produce a list of macros that are used in +preprocessor directive controlling expressions. .Pp .It Fl S Like the .Fl s -option, but the nesting depth of each symbol is also printed. +option, but the nesting depth of each macro is also printed. This is useful for working out the number of possible combinations -of interdependent defined/undefined symbols. +of interdependent defined/undefined macros. .Pp .It Fl t Disables parsing for C strings, comments, @@ -406,6 +454,9 @@ in comment. .Sh SEE ALSO .Xr cpp 1 , .Xr diff 1 +.Pp +The unifdef home page is +.Pa http://dotat.at/prog/unifdef .Sh HISTORY The .Nm @@ -431,7 +482,7 @@ cannot be handled in every situation. .Pp Trigraphs are not recognized. .Pp -There is no support for symbols with different definitions at +There is no support for macros with different definitions at different points in the source file. .Pp The text-mode and ignore functionality does not correspond to modern diff --git a/usr.bin/unifdef/unifdef.c b/usr.bin/unifdef/unifdef.c index 22f81efa00be..fd378a14c1eb 100644 --- a/usr.bin/unifdef/unifdef.c +++ b/usr.bin/unifdef/unifdef.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2002 - 2013 Tony Finch + * Copyright (c) 2002 - 2014 Tony Finch * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -46,7 +46,7 @@ #include "unifdef.h" static const char copyright[] = - "@(#) $Version: unifdef-2.7 $\n" + "@(#) $Version: unifdef-2.10 $\n" "@(#) $FreeBSD$\n" "@(#) $Author: Tony Finch (dot@dotat.at) $\n" "@(#) $URL: http://dotat.at/prog/unifdef $\n" @@ -138,7 +138,7 @@ static char const * const linestate_name[] = { */ #define MAXDEPTH 64 /* maximum #if nesting */ #define MAXLINE 4096 /* maximum length of line */ -#define MAXSYMS 4096 /* maximum number of symbols */ +#define MAXSYMS 16384 /* maximum number of symbols */ /* * Sometimes when editing a keyword the replacement text is longer, so @@ -180,6 +180,15 @@ static char *tempname; /* avoid splatting input */ static char tline[MAXLINE+EDITSLOP];/* input buffer plus space */ static char *keyword; /* used for editing #elif's */ +/* + * When processing a file, the output's newline style will match the + * input's, and unifdef correctly handles CRLF or LF endings whatever + * the platform's native style. The stdio streams are opened in binary + * mode to accommodate platforms whose native newline style is CRLF. + * When the output isn't a processed input file (when it is error / + * debug / diagnostic messages) then unifdef uses native line endings. + */ + static const char *newline; /* input file format */ static const char newline_unix[] = "\n"; static const char newline_crlf[] = "\r\n"; @@ -200,33 +209,41 @@ static bool firstsym; /* ditto */ static int exitmode; /* exit status mode */ static int exitstat; /* program exit status */ -static void addsym(bool, bool, char *); +static void addsym1(bool, bool, char *); +static void addsym2(bool, const char *, const char *); static char *astrcat(const char *, const char *); static void cleantemp(void); -static void closeout(void); +static void closeio(void); static void debug(const char *, ...); +static void debugsym(const char *, int); +static bool defundef(void); +static void defundefile(const char *); static void done(void); static void error(const char *); -static int findsym(const char *); +static int findsym(const char **); static void flushline(bool); static void hashline(void); static void help(void); static Linetype ifeval(const char **); static void ignoreoff(void); static void ignoreon(void); +static void indirectsym(void); static void keywordedit(const char *); +static const char *matchsym(const char *, const char *); static void nest(void); static Linetype parseline(void); static void process(void); static void processinout(const char *, const char *); static const char *skipargs(const char *); static const char *skipcomment(const char *); +static const char *skiphash(void); +static const char *skipline(const char *); static const char *skipsym(const char *); static void state(Ifstate); -static int strlcmp(const char *, const char *, size_t); static void unnest(void); static void usage(void); static void version(void); +static const char *xstrdup(const char *, const char *); #define endsym(c) (!isalnum((unsigned char)c) && c != '_') @@ -238,7 +255,7 @@ main(int argc, char *argv[]) { int opt; - while ((opt = getopt(argc, argv, "i:D:U:I:M:o:x:bBcdehKklmnsStV")) != -1) + while ((opt = getopt(argc, argv, "i:D:U:f:I:M:o:x:bBcdehKklmnsStV")) != -1) switch (opt) { case 'i': /* treat stuff controlled by these symbols as text */ /* @@ -248,17 +265,17 @@ main(int argc, char *argv[]) */ opt = *optarg++; if (opt == 'D') - addsym(true, true, optarg); + addsym1(true, true, optarg); else if (opt == 'U') - addsym(true, false, optarg); + addsym1(true, false, optarg); else usage(); break; case 'D': /* define a symbol */ - addsym(false, true, optarg); + addsym1(false, true, optarg); break; case 'U': /* undef a symbol */ - addsym(false, false, optarg); + addsym1(false, false, optarg); break; case 'I': /* no-op for compatibility with cpp */ break; @@ -278,6 +295,9 @@ main(int argc, char *argv[]) case 'e': /* fewer errors from dodgy lines */ iocccok = true; break; + case 'f': /* definitions file */ + defundefile(optarg); + break; case 'h': help(); break; @@ -334,6 +354,7 @@ main(int argc, char *argv[]) argc = 1; if (argc == 1 && !inplace && ofilename == NULL) ofilename = "-"; + indirectsym(); atexit(cleantemp); if (ofilename != NULL) @@ -392,10 +413,10 @@ processinout(const char *ifn, const char *ofn) if (backext != NULL) { char *backname = astrcat(ofn, backext); if (rename(ofn, backname) < 0) - err(2, "can't rename \"%s\" to \"%s%s\"", ofn, ofn, backext); + err(2, "can't rename \"%s\" to \"%s\"", ofn, backname); free(backname); } - if (rename(tempname, ofn) < 0) + if (replace(tempname, ofn) < 0) err(2, "can't rename \"%s\" to \"%s\"", tempname, ofn); free(tempname); tempname = NULL; @@ -434,7 +455,7 @@ synopsis(FILE *fp) { fprintf(fp, "usage: unifdef [-bBcdehKkmnsStV] [-x{012}] [-Mext] [-opath] \\\n" - " [-[i]Dsym[=val]] [-[i]Usym] ... [file] ...\n"); + " [-[i]Dsym[=val]] [-[i]Usym] [-fpath] ... [file] ...\n"); } static void @@ -455,6 +476,7 @@ help(void) " -iDsym=val \\ ignore C strings and comments\n" " -iDsym ) in sections controlled by these\n" " -iUsym / preprocessor symbols\n" + " -fpath file containing #define and #undef directives\n" " -b blank lines instead of deleting them\n" " -B compress blank lines around deleted section\n" " -c complement (invert) keep vs. delete\n" @@ -650,12 +672,12 @@ done(void) { if (incomment) error("EOF in comment"); - closeout(); + closeio(); } /* * Write a line to the output or not, according to command line options. - * If writing fails, closeout() will print the error and exit. + * If writing fails, closeio() will print the error and exit. */ static void flushline(bool keep) @@ -671,19 +693,19 @@ flushline(bool keep) if (lnnum && delcount > 0) hashline(); if (fputs(tline, output) == EOF) - closeout(); + closeio(); delcount = 0; blankmax = blankcount = blankline ? blankcount + 1 : 0; } } else { if (lnblank && fputs(newline, output) == EOF) - closeout(); + closeio(); exitstat = 1; delcount += 1; blankcount = 0; } if (debugging && fflush(output) == EOF) - closeout(); + closeio(); } /* @@ -700,20 +722,21 @@ hashline(void) e = fprintf(output, "#line %d \"%s\"%s", linenum, linefile, newline); if (e < 0) - closeout(); + closeio(); } /* * Flush the output and handle errors. */ static void -closeout(void) +closeio(void) { /* Tidy up after findsym(). */ if (symdepth && !zerosyms) printf("\n"); - if (ferror(output) || fclose(output) == EOF) - err(2, "%s: can't write to output", filename); + if (output != NULL && (ferror(output) || fclose(output) == EOF)) + err(2, "%s: can't write to output", filename); + fclose(input); } /* @@ -727,6 +750,8 @@ process(void) is preceded by a large number of blank lines. */ blankmax = blankcount = 1000; zerosyms = true; + newline = NULL; + linenum = 0; while (lineval != LT_EOF) { lineval = parseline(); trans_table[ifstate[depth]][lineval](); @@ -746,105 +771,86 @@ parseline(void) { const char *cp; int cursym; - int kwlen; Linetype retval; Comment_state wascomment; - linenum++; - if (fgets(tline, MAXLINE, input) == NULL) { - if (ferror(input)) - err(2, "can't read %s", filename); - else - return (LT_EOF); - } + wascomment = incomment; + cp = skiphash(); + if (cp == NULL) + return (LT_EOF); if (newline == NULL) { if (strrchr(tline, '\n') == strrchr(tline, '\r') + 1) newline = newline_crlf; else newline = newline_unix; } - retval = LT_PLAIN; - wascomment = incomment; - cp = skipcomment(tline); - if (linestate == LS_START) { - if (*cp == '#') { - linestate = LS_HASH; - firstsym = true; - cp = skipcomment(cp + 1); - } else if (*cp != '\0') - linestate = LS_DIRTY; + if (*cp == '\0') { + retval = LT_PLAIN; + goto done; } - if (!incomment && linestate == LS_HASH) { - keyword = tline + (cp - tline); - cp = skipsym(cp); - kwlen = cp - keyword; + keyword = tline + (cp - tline); + if ((cp = matchsym("ifdef", keyword)) != NULL || + (cp = matchsym("ifndef", keyword)) != NULL) { + cp = skipcomment(cp); + if ((cursym = findsym(&cp)) < 0) + retval = LT_IF; + else { + retval = (keyword[2] == 'n') + ? LT_FALSE : LT_TRUE; + if (value[cursym] == NULL) + retval = (retval == LT_TRUE) + ? LT_FALSE : LT_TRUE; + if (ignore[cursym]) + retval = (retval == LT_TRUE) + ? LT_TRUEI : LT_FALSEI; + } + } else if ((cp = matchsym("if", keyword)) != NULL) + retval = ifeval(&cp); + else if ((cp = matchsym("elif", keyword)) != NULL) + retval = linetype_if2elif(ifeval(&cp)); + else if ((cp = matchsym("else", keyword)) != NULL) + retval = LT_ELSE; + else if ((cp = matchsym("endif", keyword)) != NULL) + retval = LT_ENDIF; + else { + cp = skipsym(keyword); /* no way can we deal with a continuation inside a keyword */ if (strncmp(cp, "\\\r\n", 3) == 0 || strncmp(cp, "\\\n", 2) == 0) Eioccc(); - if (strlcmp("ifdef", keyword, kwlen) == 0 || - strlcmp("ifndef", keyword, kwlen) == 0) { - cp = skipcomment(cp); - if ((cursym = findsym(cp)) < 0) - retval = LT_IF; - else { - retval = (keyword[2] == 'n') - ? LT_FALSE : LT_TRUE; - if (value[cursym] == NULL) - retval = (retval == LT_TRUE) - ? LT_FALSE : LT_TRUE; - if (ignore[cursym]) - retval = (retval == LT_TRUE) - ? LT_TRUEI : LT_FALSEI; - } - cp = skipsym(cp); - } else if (strlcmp("if", keyword, kwlen) == 0) - retval = ifeval(&cp); - else if (strlcmp("elif", keyword, kwlen) == 0) - retval = linetype_if2elif(ifeval(&cp)); - else if (strlcmp("else", keyword, kwlen) == 0) - retval = LT_ELSE; - else if (strlcmp("endif", keyword, kwlen) == 0) - retval = LT_ENDIF; - else { + cp = skipline(cp); + retval = LT_PLAIN; + goto done; + } + cp = skipcomment(cp); + if (*cp != '\0') { + cp = skipline(cp); + if (retval == LT_TRUE || retval == LT_FALSE || + retval == LT_TRUEI || retval == LT_FALSEI) + retval = LT_IF; + if (retval == LT_ELTRUE || retval == LT_ELFALSE) + retval = LT_ELIF; + } + /* the following can happen if the last line of the file lacks a + newline or if there is too much whitespace in a directive */ + if (linestate == LS_HASH) { + long len = cp - tline; + if (fgets(tline + len, MAXLINE - len, input) == NULL) { + if (ferror(input)) + err(2, "can't read %s", filename); + /* append the missing newline at eof */ + strcpy(tline + len, newline); + cp += strlen(newline); + linestate = LS_START; + } else { linestate = LS_DIRTY; - retval = LT_PLAIN; - } - cp = skipcomment(cp); - if (*cp != '\0') { - linestate = LS_DIRTY; - if (retval == LT_TRUE || retval == LT_FALSE || - retval == LT_TRUEI || retval == LT_FALSEI) - retval = LT_IF; - if (retval == LT_ELTRUE || retval == LT_ELFALSE) - retval = LT_ELIF; - } - if (retval != LT_PLAIN && (wascomment || incomment)) { - retval = linetype_2dodgy(retval); - if (incomment) - linestate = LS_DIRTY; - } - /* skipcomment normally changes the state, except - if the last line of the file lacks a newline, or - if there is too much whitespace in a directive */ - if (linestate == LS_HASH) { - size_t len = cp - tline; - if (fgets(tline + len, MAXLINE - len, input) == NULL) { - if (ferror(input)) - err(2, "can't read %s", filename); - /* append the missing newline at eof */ - strcpy(tline + len, newline); - cp += strlen(newline); - linestate = LS_START; - } else { - linestate = LS_DIRTY; - } } } - if (linestate == LS_DIRTY) { - while (*cp != '\0') - cp = skipcomment(cp + 1); + if (retval != LT_PLAIN && (wascomment || linestate != LS_START)) { + retval = linetype_2dodgy(retval); + linestate = LS_DIRTY; } +done: debug("parser line %d state %s comment %s line", linenum, comment_name[incomment], linestate_name[linestate]); return (retval); @@ -854,34 +860,34 @@ parseline(void) * These are the binary operators that are supported by the expression * evaluator. */ -static Linetype op_strict(int *p, int v, Linetype at, Linetype bt) { +static Linetype op_strict(long *p, long v, Linetype at, Linetype bt) { if(at == LT_IF || bt == LT_IF) return (LT_IF); return (*p = v, v ? LT_TRUE : LT_FALSE); } -static Linetype op_lt(int *p, Linetype at, int a, Linetype bt, int b) { +static Linetype op_lt(long *p, Linetype at, long a, Linetype bt, long b) { return op_strict(p, a < b, at, bt); } -static Linetype op_gt(int *p, Linetype at, int a, Linetype bt, int b) { +static Linetype op_gt(long *p, Linetype at, long a, Linetype bt, long b) { return op_strict(p, a > b, at, bt); } -static Linetype op_le(int *p, Linetype at, int a, Linetype bt, int b) { +static Linetype op_le(long *p, Linetype at, long a, Linetype bt, long b) { return op_strict(p, a <= b, at, bt); } -static Linetype op_ge(int *p, Linetype at, int a, Linetype bt, int b) { +static Linetype op_ge(long *p, Linetype at, long a, Linetype bt, long b) { return op_strict(p, a >= b, at, bt); } -static Linetype op_eq(int *p, Linetype at, int a, Linetype bt, int b) { +static Linetype op_eq(long *p, Linetype at, long a, Linetype bt, long b) { return op_strict(p, a == b, at, bt); } -static Linetype op_ne(int *p, Linetype at, int a, Linetype bt, int b) { +static Linetype op_ne(long *p, Linetype at, long a, Linetype bt, long b) { return op_strict(p, a != b, at, bt); } -static Linetype op_or(int *p, Linetype at, int a, Linetype bt, int b) { +static Linetype op_or(long *p, Linetype at, long a, Linetype bt, long b) { if (!strictlogic && (at == LT_TRUE || bt == LT_TRUE)) return (*p = 1, LT_TRUE); return op_strict(p, a || b, at, bt); } -static Linetype op_and(int *p, Linetype at, int a, Linetype bt, int b) { +static Linetype op_and(long *p, Linetype at, long a, Linetype bt, long b) { if (!strictlogic && (at == LT_FALSE || bt == LT_FALSE)) return (*p = 0, LT_FALSE); return op_strict(p, a && b, at, bt); @@ -899,7 +905,7 @@ static Linetype op_and(int *p, Linetype at, int a, Linetype bt, int b) { */ struct ops; -typedef Linetype eval_fn(const struct ops *, int *, const char **); +typedef Linetype eval_fn(const struct ops *, long *, const char **); static eval_fn eval_table, eval_unary; @@ -912,7 +918,7 @@ static eval_fn eval_table, eval_unary; */ struct op { const char *str; - Linetype (*fn)(int *, Linetype, int, Linetype, int); + Linetype (*fn)(long *, Linetype, long, Linetype, long); }; struct ops { eval_fn *inner; @@ -930,7 +936,7 @@ static const struct ops eval_ops[] = { }; /* Current operator precedence level */ -static int prec(const struct ops *ops) +static long prec(const struct ops *ops) { return (ops - eval_ops); } @@ -941,7 +947,7 @@ static int prec(const struct ops *ops) * We reset the constexpr flag in the last two cases. */ static Linetype -eval_unary(const struct ops *ops, int *valp, const char **cpp) +eval_unary(const struct ops *ops, long *valp, const char **cpp) { const char *cp; char *ep; @@ -975,32 +981,33 @@ eval_unary(const struct ops *ops, int *valp, const char **cpp) if (ep == cp) return (LT_ERROR); lt = *valp ? LT_TRUE : LT_FALSE; - cp = skipsym(cp); - } else if (strncmp(cp, "defined", 7) == 0 && endsym(cp[7])) { + cp = ep; + } else if (matchsym("defined", cp) != NULL) { cp = skipcomment(cp+7); - debug("eval%d defined", prec(ops)); if (*cp == '(') { cp = skipcomment(cp+1); defparen = true; } else { defparen = false; } - sym = findsym(cp); + sym = findsym(&cp); + cp = skipcomment(cp); + if (defparen && *cp++ != ')') { + debug("eval%d defined missing ')'", prec(ops)); + return (LT_ERROR); + } if (sym < 0) { + debug("eval%d defined unknown", prec(ops)); lt = LT_IF; } else { + debug("eval%d defined %s", prec(ops), symname[sym]); *valp = (value[sym] != NULL); lt = *valp ? LT_TRUE : LT_FALSE; } - cp = skipsym(cp); - cp = skipcomment(cp); - if (defparen && *cp++ != ')') - return (LT_ERROR); constexpr = false; } else if (!endsym(*cp)) { debug("eval%d symbol", prec(ops)); - sym = findsym(cp); - cp = skipsym(cp); + sym = findsym(&cp); if (sym < 0) { lt = LT_IF; cp = skipargs(cp); @@ -1029,11 +1036,11 @@ eval_unary(const struct ops *ops, int *valp, const char **cpp) * Table-driven evaluation of binary operators. */ static Linetype -eval_table(const struct ops *ops, int *valp, const char **cpp) +eval_table(const struct ops *ops, long *valp, const char **cpp) { const struct op *op; const char *cp; - int val; + long val; Linetype lt, rt; debug("eval%d", prec(ops)); @@ -1071,7 +1078,7 @@ static Linetype ifeval(const char **cpp) { Linetype ret; - int val = 0; + long val = 0; debug("eval %s", *cpp); constexpr = killconsts ? false : true; @@ -1080,6 +1087,49 @@ ifeval(const char **cpp) return (constexpr ? LT_IF : ret == LT_ERROR ? LT_IF : ret); } +/* + * Read a line and examine its initial part to determine if it is a + * preprocessor directive. Returns NULL on EOF, or a pointer to a + * preprocessor directive name, or a pointer to the zero byte at the + * end of the line. + */ +static const char * +skiphash(void) +{ + const char *cp; + + linenum++; + if (fgets(tline, MAXLINE, input) == NULL) { + if (ferror(input)) + err(2, "can't read %s", filename); + else + return (NULL); + } + cp = skipcomment(tline); + if (linestate == LS_START && *cp == '#') { + linestate = LS_HASH; + return (skipcomment(cp + 1)); + } else if (*cp == '\0') { + return (cp); + } else { + return (skipline(cp)); + } +} + +/* + * Mark a line dirty and consume the rest of it, keeping track of the + * lexical state. + */ +static const char * +skipline(const char *cp) +{ + if (*cp != '\0') + linestate = LS_DIRTY; + while (*cp != '\0') + cp = skipcomment(cp + 1); + return (cp); +} + /* * Skip over comments, strings, and character literals and stop at the * next character position that is not whitespace. Between calls we keep @@ -1232,88 +1282,226 @@ skipsym(const char *cp) return (cp); } +/* + * Skip whitespace and take a copy of any following identifier. + */ +static const char * +getsym(const char **cpp) +{ + const char *cp = *cpp, *sym; + + cp = skipcomment(cp); + cp = skipsym(sym = cp); + if (cp == sym) + return NULL; + *cpp = cp; + return (xstrdup(sym, cp)); +} + +/* + * Check that s (a symbol) matches the start of t, and that the + * following character in t is not a symbol character. Returns a + * pointer to the following character in t if there is a match, + * otherwise NULL. + */ +static const char * +matchsym(const char *s, const char *t) +{ + while (*s != '\0' && *t != '\0') + if (*s != *t) + return (NULL); + else + ++s, ++t; + if (*s == '\0' && endsym(*t)) + return(t); + else + return(NULL); +} + /* * Look for the symbol in the symbol table. If it is found, we return * the symbol table index, else we return -1. */ static int -findsym(const char *str) +findsym(const char **strp) { - const char *cp; + const char *str; int symind; - cp = skipsym(str); - if (cp == str) - return (-1); + str = *strp; + *strp = skipsym(str); if (symlist) { + if (*strp == str) + return (-1); if (symdepth && firstsym) printf("%s%3d", zerosyms ? "" : "\n", depth); firstsym = zerosyms = false; printf("%s%.*s%s", - symdepth ? " " : "", - (int)(cp-str), str, - symdepth ? "" : "\n"); + symdepth ? " " : "", + (int)(*strp-str), str, + symdepth ? "" : "\n"); /* we don't care about the value of the symbol */ return (0); } for (symind = 0; symind < nsyms; ++symind) { - if (strlcmp(symname[symind], str, cp-str) == 0) { - debug("findsym %s %s", symname[symind], - value[symind] ? value[symind] : ""); + if (matchsym(symname[symind], str) != NULL) { + debugsym("findsym", symind); return (symind); } } return (-1); } +/* + * Resolve indirect symbol values to their final definitions. + */ +static void +indirectsym(void) +{ + const char *cp; + int changed, sym, ind; + + do { + changed = 0; + for (sym = 0; sym < nsyms; ++sym) { + if (value[sym] == NULL) + continue; + cp = value[sym]; + ind = findsym(&cp); + if (ind == -1 || ind == sym || + *cp != '\0' || + value[ind] == NULL || + value[ind] == value[sym]) + continue; + debugsym("indir...", sym); + value[sym] = value[ind]; + debugsym("...ectsym", sym); + changed++; + } + } while (changed); +} + +/* + * Add a symbol to the symbol table, specified with the format sym=val + */ +static void +addsym1(bool ignorethis, bool definethis, char *symval) +{ + const char *sym, *val; + + sym = symval; + val = skipsym(sym); + if (definethis && *val == '=') { + symval[val - sym] = '\0'; + val = val + 1; + } else if (*val == '\0') { + val = definethis ? "1" : NULL; + } else { + usage(); + } + addsym2(ignorethis, sym, val); +} + /* * Add a symbol to the symbol table. */ static void -addsym(bool ignorethis, bool definethis, char *sym) +addsym2(bool ignorethis, const char *sym, const char *val) { + const char *cp = sym; int symind; - char *val; - symind = findsym(sym); + symind = findsym(&cp); if (symind < 0) { if (nsyms >= MAXSYMS) errx(2, "too many symbols"); symind = nsyms++; } - symname[symind] = sym; ignore[symind] = ignorethis; - val = sym + (skipsym(sym) - sym); - if (definethis) { - if (*val == '=') { - value[symind] = val+1; - *val = '\0'; - } else if (*val == '\0') - value[symind] = "1"; - else - usage(); - } else { - if (*val != '\0') - usage(); - value[symind] = NULL; - } - debug("addsym %s=%s", symname[symind], + symname[symind] = sym; + value[symind] = val; + debugsym("addsym", symind); +} + +static void +debugsym(const char *why, int symind) +{ + debug("%s %s%c%s", why, symname[symind], + value[symind] ? '=' : ' ', value[symind] ? value[symind] : "undef"); } /* - * Compare s with n characters of t. - * The same as strncmp() except that it checks that s[n] == '\0'. + * Add symbols to the symbol table from a file containing + * #define and #undef preprocessor directives. */ -static int -strlcmp(const char *s, const char *t, size_t n) +static void +defundefile(const char *fn) { - while (n-- && *t != '\0') - if (*s != *t) - return ((unsigned char)*s - (unsigned char)*t); - else - ++s, ++t; - return ((unsigned char)*s); + filename = fn; + input = fopen(fn, "rb"); + if (input == NULL) + err(2, "can't open %s", fn); + linenum = 0; + while (defundef()) + ; + if (ferror(input)) + err(2, "can't read %s", filename); + else + fclose(input); + if (incomment) + error("EOF in comment"); +} + +/* + * Read and process one #define or #undef directive + */ +static bool +defundef(void) +{ + const char *cp, *kw, *sym, *val, *end; + + cp = skiphash(); + if (cp == NULL) + return (false); + if (*cp == '\0') + goto done; + /* strip trailing whitespace, and do a fairly rough check to + avoid unsupported multi-line preprocessor directives */ + end = cp + strlen(cp); + while (end > tline && strchr(" \t\n\r", end[-1]) != NULL) + --end; + if (end > tline && end[-1] == '\\') + Eioccc(); + + kw = cp; + if ((cp = matchsym("define", kw)) != NULL) { + sym = getsym(&cp); + if (sym == NULL) + error("missing macro name in #define"); + if (*cp == '(') { + val = "1"; + } else { + cp = skipcomment(cp); + val = (cp < end) ? xstrdup(cp, end) : ""; + } + debug("#define"); + addsym2(false, sym, val); + } else if ((cp = matchsym("undef", kw)) != NULL) { + sym = getsym(&cp); + if (sym == NULL) + error("missing macro name in #undef"); + cp = skipcomment(cp); + debug("#undef"); + addsym2(false, sym, NULL); + } else { + error("unrecognized preprocessor directive"); + } + skipline(cp); +done: + debug("parser line %d state %s comment %s line", linenum, + comment_name[incomment], linestate_name[linestate]); + return (true); } /* @@ -1324,12 +1512,34 @@ astrcat(const char *s1, const char *s2) { char *s; int len; + size_t size; - len = 1 + snprintf(NULL, 0, "%s%s", s1, s2); - s = (char *)malloc(len); + len = snprintf(NULL, 0, "%s%s", s1, s2); + if (len < 0) + err(2, "snprintf"); + size = (size_t)len + 1; + s = (char *)malloc(size); if (s == NULL) err(2, "malloc"); - snprintf(s, len, "%s%s", s1, s2); + snprintf(s, size, "%s%s", s1, s2); + return (s); +} + +/* + * Duplicate a segment of a string, checking for failure. + */ +static const char * +xstrdup(const char *start, const char *end) +{ + size_t n; + char *s; + + if (end < start) abort(); /* bug */ + n = (size_t)(end - start) + 1; + s = malloc(n); + if (s == NULL) + err(2, "malloc"); + snprintf(s, n, "%s", start); return (s); } @@ -1356,6 +1566,6 @@ error(const char *msg) else warnx("%s: %d: %s (#if line %d depth %d)", filename, linenum, msg, stifline[depth], depth); - closeout(); + closeio(); errx(2, "output may be truncated"); } diff --git a/usr.bin/unifdef/unifdef.h b/usr.bin/unifdef/unifdef.h index 8837b68bf6fd..e2e0bd8f3b17 100644 --- a/usr.bin/unifdef/unifdef.h +++ b/usr.bin/unifdef/unifdef.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2012 Tony Finch + * Copyright (c) 2012 - 2013 Tony Finch * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -40,6 +40,8 @@ #define fbinmode(fp) (fp) +#define replace(old,new) rename(old,new) + static FILE * mktempmode(char *tmp, int mode) { diff --git a/usr.bin/unifdef/unifdefall.sh b/usr.bin/unifdef/unifdefall.sh index 6c8614b53932..6abf60b6e0e6 100644 --- a/usr.bin/unifdef/unifdefall.sh +++ b/usr.bin/unifdef/unifdefall.sh @@ -47,32 +47,14 @@ trap 'rm -r "$tmp" || exit 2' EXIT export LC_ALL=C -# list of all controlling macros -"$unifdef" $debug -s "$@" | sort | uniq >"$tmp/ctrl" +# list of all controlling macros; assume these are undefined +"$unifdef" $debug -s "$@" | sort -u | sed 's/^/#undef /' >"$tmp/undefs" # list of all macro definitions -cpp -dM "$@" | sort | sed 's/^#define //' >"$tmp/hashdefs" -# list of defined macro names -sed 's/[^A-Za-z0-9_].*$//' <"$tmp/hashdefs" >"$tmp/alldef" -# list of undefined and defined controlling macros -comm -23 "$tmp/ctrl" "$tmp/alldef" >"$tmp/undef" -comm -12 "$tmp/ctrl" "$tmp/alldef" >"$tmp/def" -# create a sed script that extracts the controlling macro definitions -# and converts them to unifdef command-line arguments -sed 's|.*|s/^&\\(([^)]*)\\)\\{0,1\\} /-D&=/p|' <"$tmp/def" >"$tmp/script" -# create the final unifdef command -{ echo "$unifdef" $debug -k '\' - # convert the controlling undefined macros to -U arguments - sed 's/.*/-U& \\/' <"$tmp/undef" - # convert the controlling defined macros to quoted -D arguments - sed -nf "$tmp/script" <"$tmp/hashdefs" | - sed "s/'/'\\\\''/g;s/.*/'&' \\\\/" - echo '"$@"' -} >"$tmp/cmd" +cc -E -dM "$@" | sort >"$tmp/defs" + case $debug in --d) for i in ctrl hashdefs alldef undef def script cmd - do echo ==== $i - cat "$tmp/$i" - done 1>&2 +-d) cat "$tmp/undefs" "$tmp/defs" 1>&2 esac -# run the command we just created -sh "$tmp/cmd" "$@" + +# order of -f arguments means definitions override undefs +"$unifdef" $debug -k -f "$tmp/undefs" -f "$tmp/defs" "$@"