16dd3b375c
- Add xo_format_is_numeric() with improved logic to decide if format
strings are numeric, so json output quotes them
- Convert docs to sphinx/rst
- update tests
Includes fix for PR 221676:
27d3021cc3 (diff-5a0d468963477f7daedb8308c219dd80)
PR: 221676
MFC after: 5 days
371 lines
13 KiB
ReStructuredText
371 lines
13 KiB
ReStructuredText
|
|
.. index:: Field Formatting
|
|
|
|
Field Formatting
|
|
----------------
|
|
|
|
The field format is similar to the format string for printf(3). Its
|
|
use varies based on the role of the field, but generally is used to
|
|
format the field's contents.
|
|
|
|
If the format string is not provided for a value field, it defaults to
|
|
"%s".
|
|
|
|
Note a field definition can contain zero or more printf-style
|
|
'directives', which are sequences that start with a '%' and end with
|
|
one of following characters: "diouxXDOUeEfFgGaAcCsSp". Each directive
|
|
is matched by one of more arguments to the xo_emit function.
|
|
|
|
The format string has the form::
|
|
|
|
'%' format-modifier * format-character
|
|
|
|
The format-modifier can be:
|
|
|
|
- a '#' character, indicating the output value should be prefixed
|
|
with '0x', typically to indicate a base 16 (hex) value.
|
|
- a minus sign ('-'), indicating the output value should be padded on
|
|
the right instead of the left.
|
|
- a leading zero ('0') indicating the output value should be padded on the
|
|
left with zeroes instead of spaces (' ').
|
|
- one or more digits ('0' - '9') indicating the minimum width of the
|
|
argument. If the width in columns of the output value is less than
|
|
the minimum width, the value will be padded to reach the minimum.
|
|
- a period followed by one or more digits indicating the maximum
|
|
number of bytes which will be examined for a string argument, or the maximum
|
|
width for a non-string argument. When handling ASCII strings this
|
|
functions as the field width but for multi-byte characters, a single
|
|
character may be composed of multiple bytes.
|
|
xo_emit will never dereference memory beyond the given number of bytes.
|
|
- a second period followed by one or more digits indicating the maximum
|
|
width for a string argument. This modifier cannot be given for non-string
|
|
arguments.
|
|
- one or more 'h' characters, indicating shorter input data.
|
|
- one or more 'l' characters, indicating longer input data.
|
|
- a 'z' character, indicating a 'size_t' argument.
|
|
- a 't' character, indicating a 'ptrdiff_t' argument.
|
|
- a ' ' character, indicating a space should be emitted before
|
|
positive numbers.
|
|
- a '+' character, indicating sign should emitted before any number.
|
|
|
|
Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
|
|
removed eventually.
|
|
|
|
The format character is described in the following table:
|
|
|
|
===== ================= ======================
|
|
Ltr Argument Type Format
|
|
===== ================= ======================
|
|
d int base 10 (decimal)
|
|
i int base 10 (decimal)
|
|
o int base 8 (octal)
|
|
u unsigned base 10 (decimal)
|
|
x unsigned base 16 (hex)
|
|
X unsigned long base 16 (hex)
|
|
D long base 10 (decimal)
|
|
O unsigned long base 8 (octal)
|
|
U unsigned long base 10 (decimal)
|
|
e double [-]d.ddde+-dd
|
|
E double [-]d.dddE+-dd
|
|
f double [-]ddd.ddd
|
|
F double [-]ddd.ddd
|
|
g double as 'e' or 'f'
|
|
G double as 'E' or 'F'
|
|
a double [-]0xh.hhhp[+-]d
|
|
A double [-]0Xh.hhhp[+-]d
|
|
c unsigned char a character
|
|
C wint_t a character
|
|
s char \* a UTF-8 string
|
|
S wchar_t \* a unicode/WCS string
|
|
p void \* '%#lx'
|
|
===== ================= ======================
|
|
|
|
The 'h' and 'l' modifiers affect the size and treatment of the
|
|
argument:
|
|
|
|
===== ============= ====================
|
|
Mod d, i o, u, x, X
|
|
===== ============= ====================
|
|
hh signed char unsigned char
|
|
h short unsigned short
|
|
l long unsigned long
|
|
ll long long unsigned long long
|
|
j intmax_t uintmax_t
|
|
t ptrdiff_t ptrdiff_t
|
|
z size_t size_t
|
|
q quad_t u_quad_t
|
|
===== ============= ====================
|
|
|
|
.. index:: UTF-8
|
|
.. index:: Locale
|
|
|
|
.. _utf-8:
|
|
|
|
UTF-8 and Locale Strings
|
|
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
For strings, the 'h' and 'l' modifiers affect the interpretation of
|
|
the bytes pointed to argument. The default '%s' string is a 'char \*'
|
|
pointer to a string encoded as UTF-8. Since UTF-8 is compatible with
|
|
ASCII data, a normal 7-bit ASCII string can be used. '%ls' expects a
|
|
'wchar_t \*' pointer to a wide-character string, encoded as a 32-bit
|
|
Unicode values. '%hs' expects a 'char \*' pointer to a multi-byte
|
|
string encoded with the current locale, as given by the LC_CTYPE,
|
|
LANG, or LC_ALL environment varibles. The first of this list of
|
|
variables is used and if none of the variables are set, the locale
|
|
defaults to "UTF-8".
|
|
|
|
libxo will convert these arguments as needed to either UTF-8 (for XML,
|
|
JSON, and HTML styles) or locale-based strings for display in text
|
|
style::
|
|
|
|
xo_emit("All strings are utf-8 content {:tag/%ls}",
|
|
L"except for wide strings");
|
|
|
|
======== ================== ===============================
|
|
Format Argument Type Argument Contents
|
|
======== ================== ===============================
|
|
%s const char \* UTF-8 string
|
|
%S const char \* UTF-8 string (alias for '%ls')
|
|
%ls const wchar_t \* Wide character UNICODE string
|
|
%hs const char * locale-based string
|
|
======== ================== ===============================
|
|
|
|
.. admonition:: "Long", not "locale"
|
|
|
|
The "*l*" in "%ls" is for "*long*", following the convention of "%ld".
|
|
It is not "*locale*", a common mis-mnemonic. "%S" is equivalent to
|
|
"%ls".
|
|
|
|
For example, the following function is passed a locale-base name, a
|
|
hat size, and a time value. The hat size is formatted in a UTF-8
|
|
(ASCII) string, and the time value is formatted into a wchar_t
|
|
string::
|
|
|
|
void print_order (const char *name, int size,
|
|
struct tm *timep) {
|
|
char buf[32];
|
|
const char *size_val = "unknown";
|
|
|
|
if (size > 0)
|
|
snprintf(buf, sizeof(buf), "%d", size);
|
|
size_val = buf;
|
|
}
|
|
|
|
wchar_t when[32];
|
|
wcsftime(when, sizeof(when), L"%d%b%y", timep);
|
|
|
|
xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
|
|
name, size_val);
|
|
xo_emit("It was ordered on {:order-time/%ls}.\n",
|
|
when);
|
|
}
|
|
|
|
It is important to note that xo_emit will perform the conversion
|
|
required to make appropriate output. Text style output uses the
|
|
current locale (as described above), while XML, JSON, and HTML use
|
|
UTF-8.
|
|
|
|
UTF-8 and locale-encoded strings can use multiple bytes to encode one
|
|
column of data. The traditional "precision'" (aka "max-width") value
|
|
for "%s" printf formatting becomes overloaded since it specifies both
|
|
the number of bytes that can be safely referenced and the maximum
|
|
number of columns to emit. xo_emit uses the precision as the former,
|
|
and adds a third value for specifying the maximum number of columns.
|
|
|
|
In this example, the name field is printed with a minimum of 3 columns
|
|
and a maximum of 6. Up to ten bytes of data at the location given by
|
|
'name' are in used in filling those columns::
|
|
|
|
xo_emit("{:name/%3.10.6s}", name);
|
|
|
|
Characters Outside of Field Definitions
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Characters in the format string that are not part of a field
|
|
definition are copied to the output for the TEXT style, and are
|
|
ignored for the JSON and XML styles. For HTML, these characters are
|
|
placed in a <div> with class "text"::
|
|
|
|
EXAMPLE:
|
|
xo_emit("The hat is {:size/%s}.\n", size_val);
|
|
TEXT:
|
|
The hat is extra small.
|
|
XML:
|
|
<size>extra small</size>
|
|
JSON:
|
|
"size": "extra small"
|
|
HTML:
|
|
<div class="text">The hat is </div>
|
|
<div class="data" data-tag="size">extra small</div>
|
|
<div class="text">.</div>
|
|
|
|
.. index:: errno
|
|
|
|
"%m" Is Supported
|
|
~~~~~~~~~~~~~~~~~
|
|
|
|
libxo supports the '%m' directive, which formats the error message
|
|
associated with the current value of "errno". It is the equivalent
|
|
of "%s" with the argument strerror(errno)::
|
|
|
|
xo_emit("{:filename} cannot be opened: {:error/%m}", filename);
|
|
xo_emit("{:filename} cannot be opened: {:error/%s}",
|
|
filename, strerror(errno));
|
|
|
|
"%n" Is Not Supported
|
|
~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
libxo does not support the '%n' directive. It's a bad idea and we
|
|
just don't do it.
|
|
|
|
The Encoding Format (eformat)
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The "eformat" string is the format string used when encoding the field
|
|
for JSON and XML. If not provided, it defaults to the primary format
|
|
with any minimum width removed. If the primary is not given, both
|
|
default to "%s".
|
|
|
|
Content Strings
|
|
~~~~~~~~~~~~~~~
|
|
|
|
For padding and labels, the content string is considered the content,
|
|
unless a format is given.
|
|
|
|
.. index:: printf-like
|
|
|
|
Argument Validation
|
|
~~~~~~~~~~~~~~~~~~~
|
|
|
|
Many compilers and tool chains support validation of printf-like
|
|
arguments. When the format string fails to match the argument list,
|
|
a warning is generated. This is a valuable feature and while the
|
|
formatting strings for libxo differ considerably from printf, many of
|
|
these checks can still provide build-time protection against bugs.
|
|
|
|
libxo provide variants of functions that provide this ability, if the
|
|
"--enable-printflike" option is passed to the "configure" script.
|
|
These functions use the "_p" suffix, like "xo_emit_p()",
|
|
xo_emit_hp()", etc.
|
|
|
|
The following are features of libxo formatting strings that are
|
|
incompatible with printf-like testing:
|
|
|
|
- implicit formats, where "{:tag}" has an implicit "%s";
|
|
- the "max" parameter for strings, where "{:tag/%4.10.6s}" means up to
|
|
ten bytes of data can be inspected to fill a minimum of 4 columns and
|
|
a maximum of 6;
|
|
- percent signs in strings, where "{:filled}%" makes a single,
|
|
trailing percent sign;
|
|
- the "l" and "h" modifiers for strings, where "{:tag/%hs}" means
|
|
locale-based string and "{:tag/%ls}" means a wide character string;
|
|
- distinct encoding formats, where "{:tag/#%s/%s}" means the display
|
|
styles (text and HTML) will use "#%s" where other styles use "%s";
|
|
|
|
If none of these features are in use by your code, then using the "_p"
|
|
variants might be wise:
|
|
|
|
================== ========================
|
|
Function printf-like Equivalent
|
|
================== ========================
|
|
xo_emit_hv xo_emit_hvp
|
|
xo_emit_h xo_emit_hp
|
|
xo_emit xo_emit_p
|
|
xo_emit_warn_hcv xo_emit_warn_hcvp
|
|
xo_emit_warn_hc xo_emit_warn_hcp
|
|
xo_emit_warn_c xo_emit_warn_cp
|
|
xo_emit_warn xo_emit_warn_p
|
|
xo_emit_warnx xo_emit_warnx_p
|
|
xo_emit_err xo_emit_err_p
|
|
xo_emit_errx xo_emit_errx_p
|
|
xo_emit_errc xo_emit_errc_p
|
|
================== ========================
|
|
|
|
.. index:: performance
|
|
.. index:: XOEF_RETAIN
|
|
|
|
.. _retain:
|
|
|
|
Retaining Parsed Format Information
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
libxo can retain the parsed internal information related to the given
|
|
format string, allowing subsequent xo_emit calls, the retained
|
|
information is used, avoiding repetitive parsing of the format string::
|
|
|
|
SYNTAX:
|
|
int xo_emit_f(xo_emit_flags_t flags, const char fmt, ...);
|
|
EXAMPLE:
|
|
xo_emit_f(XOEF_RETAIN, "{:some/%02d}{:thing/%-6s}{:fancy}\n",
|
|
some, thing, fancy);
|
|
|
|
To retain parsed format information, use the XOEF_RETAIN flag to the
|
|
xo_emit_f() function. A complete set of xo_emit_f functions exist to
|
|
match all the xo_emit function signatures (with handles, varadic
|
|
argument, and printf-like flags):
|
|
|
|
================== ========================
|
|
Function Flags Equivalent
|
|
================== ========================
|
|
xo_emit_hv xo_emit_hvf
|
|
xo_emit_h xo_emit_hf
|
|
xo_emit xo_emit_f
|
|
xo_emit_hvp xo_emit_hvfp
|
|
xo_emit_hp xo_emit_hfp
|
|
xo_emit_p xo_emit_fp
|
|
================== ========================
|
|
|
|
The format string must be immutable across multiple calls to xo_emit_f(),
|
|
since the library retains the string. Typically this is done by using
|
|
static constant strings, such as string literals. If the string is not
|
|
immutable, the XOEF_RETAIN flag must not be used.
|
|
|
|
The functions xo_retain_clear() and xo_retain_clear_all() release
|
|
internal information on either a single format string or all format
|
|
strings, respectively. Neither is required, but the library will
|
|
retain this information until it is cleared or the process exits::
|
|
|
|
const char *fmt = "{:name} {:count/%d}\n";
|
|
for (i = 0; i < 1000; i++) {
|
|
xo_open_instance("item");
|
|
xo_emit_f(XOEF_RETAIN, fmt, name[i], count[i]);
|
|
}
|
|
xo_retain_clear(fmt);
|
|
|
|
The retained information is kept as thread-specific data.
|
|
|
|
Example
|
|
~~~~~~~
|
|
|
|
In this example, the value for the number of items in stock is emitted::
|
|
|
|
xo_emit("{P: }{Lwc:In stock}{:in-stock/%u}\n",
|
|
instock);
|
|
|
|
This call will generate the following output::
|
|
|
|
TEXT:
|
|
In stock: 144
|
|
XML:
|
|
<in-stock>144</in-stock>
|
|
JSON:
|
|
"in-stock": 144,
|
|
HTML:
|
|
<div class="line">
|
|
<div class="padding"> </div>
|
|
<div class="label">In stock</div>
|
|
<div class="decoration">:</div>
|
|
<div class="padding"> </div>
|
|
<div class="data" data-tag="in-stock">144</div>
|
|
</div>
|
|
|
|
Clearly HTML wins the verbosity award, and this output does
|
|
not include XOF_XPATH or XOF_INFO data, which would expand the
|
|
penultimate line to::
|
|
|
|
<div class="data" data-tag="in-stock"
|
|
data-xpath="/top/data/item/in-stock"
|
|
data-type="number"
|
|
data-help="Number of items in stock">144</div>
|