=head1 NAME
|
||
|
|
||
|
perldebguts - Guts of Perl debugging
|
||
|
|
||
|
=head1 DESCRIPTION
|
||
|
|
||
|
This is not the perldebug(1) manpage, which tells you how to use
|
||
|
the debugger. This manpage describes low-level details ranging
|
||
|
between difficult and impossible for anyone who isn't incredibly
|
||
|
intimate with Perl's guts to understand. Caveat lector.
|
||
|
|
||
|
=head1 Debugger Internals
|
||
|
|
||
|
Perl has special debugging hooks at compile-time and run-time used
|
||
|
to create debugging environments. These hooks are not to be confused
|
||
|
with the I<perl -Dxxx> command described in L<perlrun>, which is
usable only if a special Perl is built per the instructions in the
F<INSTALL> podpage in the Perl source tree.
|
||
|
|
||
|
For example, whenever you call Perl's built-in C<caller> function
|
||
|
from the package DB, the arguments that the corresponding stack
|
||
|
frame was called with are copied to the @DB::args array. These
mechanisms are enabled by calling Perl with the B<-d> switch.
Specifically, the following additional features are enabled
(cf. L<perlvar/$^P>):
|
||
|
|
||
|
=over
|
||
|
|
||
|
=item *
|
||
|
|
||
|
Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require
|
||
|
'perl5db.pl'}> if not present) before the first line of your program.
|
||
|
|
||
|
=item *
|
||
|
|
||
|
The array C<@{"_<$filename"}> holds the lines of $filename for all
|
||
|
files compiled by Perl. The same for C<eval>ed strings that contain
|
||
|
subroutines, or which are currently being executed. The $filename
|
||
|
for C<eval>ed strings looks like C<(eval 34)>. Code assertions
|
||
|
in regexes look like C<(re_eval 19)>.
|
||
|
|
||
|
=item *
|
||
|
|
||
|
The hash C<%{"_<$filename"}> contains breakpoints and actions keyed
|
||
|
by line number. Individual entries (as opposed to the whole hash)
|
||
|
are settable. Perl only cares about Boolean true here, although
|
||
|
the values used by F<perl5db.pl> have the form
|
||
|
C<"$break_condition\0$action">. Values in this hash are magical
|
||
|
in numeric context: they are zeros if the line is not breakable.
|
||
|
|
||
|
The same holds for evaluated strings that contain subroutines, or
|
||
|
which are currently being executed. The $filename for C<eval>ed strings
|
||
|
looks like C<(eval 34)> or C<(re_eval 19)>.
|
||
|
|
||
|
=item *
|
||
|
|
||
|
The scalar C<${"_<$filename"}> contains C<"_<$filename">. This is
|
||
|
also the case for evaluated strings that contain subroutines, or
|
||
|
which are currently being executed. The $filename for C<eval>ed
|
||
|
strings looks like C<(eval 34)> or C<(re_eval 19)>.
|
||
|
|
||
|
=item *
|
||
|
|
||
|
After each C<require>d file is compiled, but before it is executed,
|
||
|
C<DB::postponed(*{"_<$filename"})> is called if the subroutine
|
||
|
C<DB::postponed> exists. Here, the $filename is the expanded name of
|
||
|
the C<require>d file, as found in the values of %INC.
|
||
|
|
||
|
=item *
|
||
|
|
||
|
After each subroutine C<subname> is compiled, the existence of
|
||
|
C<$DB::postponed{subname}> is checked. If this key exists,
|
||
|
C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine
|
||
|
also exists.
|
||
|
|
||
|
=item *
|
||
|
|
||
|
A hash C<%DB::sub> is maintained, whose keys are subroutine names
|
||
|
and whose values have the form C<filename:startline-endline>.
|
||
|
C<filename> has the form C<(eval 34)> for subroutines defined inside
|
||
|
C<eval>s, or C<(re_eval 19)> for those within regex code assertions.
|
||
|
|
||
|
=item *
|
||
|
|
||
|
When the execution of your program reaches a point that can hold a
|
||
|
breakpoint, the C<DB::DB()> subroutine is called if any of the variables
|
||
|
$DB::trace, $DB::single, or $DB::signal is true. These variables
|
||
|
are not C<local>izable. This feature is disabled when executing
|
||
|
inside C<DB::DB()>, including functions called from it
|
||
|
unless C<< $^D & (1<<30) >> is true.
|
||
|
|
||
|
=item *
|
||
|
|
||
|
When execution of the program reaches a subroutine call, a call to
|
||
|
C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the
|
||
|
name of the called subroutine. (This doesn't happen if the subroutine
|
||
|
was compiled in the C<DB> package.)
|
||
|
|
||
|
=back
|
||
|
|
||
|
Note that if C<&DB::sub> needs external data for it to work, no
|
||
|
subroutine call is possible until that data is available. For the standard
|
||
|
debugger, the C<$DB::deep> variable (how many levels of recursion
|
||
|
deep into the debugger you can go before a mandatory break) gives
|
||
|
an example of such a dependency.
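The C<@DB::args> behaviour mentioned at the start of this section can be observed without B<-d>, because it depends only on C<caller> being compiled in package C<DB>. The following is a minimal sketch of that one mechanism. The helper name C<args_of_caller> and the sample subroutine C<outer> are invented for illustration and are not part of any debugger API.

```perl
use strict;
use warnings;

# Hypothetical helper: return the arguments that the subroutine
# which called us was itself invoked with.
sub args_of_caller {
    package DB;              # caller() must be compiled in package DB
    @DB::args = ();          # clear anything left over from earlier calls
    my @frame = caller(1);   # frame 1 is the subroutine that called us
    return @DB::args;        # filled in by caller() as a side effect
}

# Illustrative subroutine whose arguments we want to recover.
sub outer {
    return args_of_caller();
}

my @seen = outer('alpha', 42);
print "outer was called with: @seen\n";
```

The C<package DB> statement is scoped to the body of the helper, so the rest of the file stays in its own package.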
|
||
|
|
||
|
=head2 Writing Your Own Debugger
|
||
|
|
||
|
The minimal working debugger consists of one line
|
||
|
|
||
|
sub DB::DB {}
|
||
|
|
||
|
which is quite handy as the contents of the C<PERL5DB> environment
|
||
|
variable:
|
||
|
|
||
|
$ PERL5DB="sub DB::DB {}" perl -d your-script
|
||
|
|
||
|
Another brief debugger, slightly more useful, could be created
|
||
|
with only the line:
|
||
|
|
||
|
sub DB::DB {print ++$i; scalar <STDIN>}
|
||
|
|
||
|
This debugger would print the sequential number of each encountered
|
||
|
statement, and would wait for you to hit a newline before continuing.
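A one-line debugger of this kind can also be driven from another Perl program, which is convenient for experiments. The sketch below is only an illustration under stated assumptions: it sets C<PERL5DB> to a small C<DB::DB> that reads each statement's text from the C<@{"_E<lt>$filename"}> array described earlier, runs a three-line sample program under B<-d> in a child process, and collects what the debugger printed. The sample program and the variable names are invented, and the list form of pipe C<open> is assumed to be available on the platform.

```perl
use strict;
use warnings;

# The debugger itself, kept on one line as PERL5DB expects.
# It prints "LINE: SOURCE" for every statement about to be executed.
local $ENV{PERL5DB} =
    'sub DB::DB { my (undef, $file, $line) = caller; '
  . 'print "$line: ", ${"::_<$file"}[$line] }';

# Illustrative program to be traced, one statement per line.
my $program = join "\n", 'my $x = 1;', '$x++;', '$x *= 2;', '';

# Run the program under -d in a child perl and read the trace back.
open(my $child, '-|', $^X, '-d', '-e', $program)
    or die "cannot run $^X: $!";
my @trace = <$child>;
close $child;

print @trace;
```

Because C<DB::DB> is not re-entered while it is running, the statements inside the debugger do not appear in the trace.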
|
||
|
|
||
|
The following debugger is quite functional:
|
||
|
|
||
|
    {
      package DB;
      sub DB  {}
      sub sub {print ++$i, " $sub\n"; &$sub}
    }
|
||
|
|
||
|
It prints the sequential number of each subroutine call and the name of the
|
||
|
called subroutine. Note that C<&DB::sub> should be compiled into the
|
||
|
package C<DB>.
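To try the subroutine-tracing debugger without writing a separate file, it can be passed through C<PERL5DB> and run in a child process. This is a hedged sketch rather than a recommended setup: the traced program, the subroutine C<greet>, and the variable names are made up for the example, and the list form of pipe C<open> is assumed to work on the platform.

```perl
use strict;
use warnings;

# The debugger from the text, written on one line for PERL5DB.
# Both subs are compiled in package DB, so the call through &$sub
# is not itself intercepted by DB::sub.
local $ENV{PERL5DB} =
    '{ package DB; sub DB {} '
  . 'sub sub { print ++$i, " $sub\n"; &$sub } }';

# Illustrative program: one subroutine, called twice.
my $program = 'sub greet { return "hi" } greet(); greet();';

# Run it under -d in a child perl and collect the debugger's output.
open(my $child, '-|', $^X, '-d', '-e', $program)
    or die "cannot run $^X: $!";
my @calls = <$child>;
close $child;

print @calls;
```

Each line of output is a call counter followed by the fully qualified name that Perl stored in C<$DB::sub>.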
|
||
|
|
||
|
At the start, the debugger reads your rc file (F<./.perldb> or
|
||
|
F<~/.perldb> under Unix), which can set important options. This file may
|
||
|
define a subroutine C<&afterinit> to be executed after the debugger is
|
||
|
initialized.
|
||
|
|
||
|
After the rc file is read, the debugger reads the PERLDB_OPTS
|
||
|
environment variable and parses this as the remainder of a C<O ...>
|
||
|
line as one might enter at the debugger prompt.
|
||
|
|
||
|
The debugger also maintains magical internal variables, such as
|
||
|
C<@DB::dbline> and C<%DB::dbline>, which are aliases for
C<@{"::_<current_file"}> and C<%{"::_<current_file"}>. Here C<current_file>
|
||
|
is the currently selected file, either explicitly chosen with the
|
||
|
debugger's C<f> command, or implicitly by flow of execution.
|
||
|
|
||
|
Some functions are provided to simplify customization. See
|
||
|
L<perldebug/"Options"> for a description of the options parsed by
|
||
|
C<DB::parse_options(string)>. The function C<DB::dump_trace(skip[,
|
||
|
count])> skips the specified number of frames and returns a list
|
||
|
containing information about the calling frames (all of them, if
|
||
|
C<count> is missing). Each entry is a reference to a hash with
|
||
|
keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine
|
||
|
name, or info about C<eval>), C<args> (C<undef> or a reference to
|
||
|
an array), C<file>, and C<line>.
|
||
|
|
||
|
The function C<DB::print_trace(FH, skip[, count[, short]])> prints
|
||
|
formatted info about caller frames. The last two functions may be
|
||
|
convenient as arguments to the C<< < >> and C<< << >> commands.
|
||
|
|
||
|
Note that any variables and functions that are not documented in
|
||
|
this manpage (or in L<perldebug>) are considered for internal
|
||
|
use only, and as such are subject to change without notice.
|
||
|
|
||
|
=head1 Frame Listing Output Examples
|
||
|
|
||
|
The C<frame> option can be used to control the output of frame
|
||
|
information. For example, contrast this expression trace:
|
||
|
|
||
|
$ perl -de 42
|
||
|
Stack dump during die enabled outside of evals.
|
||
|
|
||
|
Loading DB routines from perl5db.pl patch level 0.94
|
||
|
Emacs support available.
|
||
|
|
||
|
Enter h or `h h' for help.
|
||
|
|
||
|
main::(-e:1): 0
|
||
|
DB<1> sub foo { 14 }
|
||
|
|
||
|
DB<2> sub bar { 3 }
|
||
|
|
||
|
DB<3> t print foo() * bar()
|
||
|
main::((eval 172):3): print foo() * bar();
|
||
|
main::foo((eval 168):2):
|
||
|
main::bar((eval 170):2):
|
||
|
42
|
||
|
|
||
|
with this one, once the C<O>ption C<frame=2> has been set:
|
||
|
|
||
|
DB<4> O f=2
|
||
|
frame = '2'
|
||
|
DB<5> t print foo() * bar()
|
||
|
3: foo() * bar()
|
||
|
entering main::foo
|
||
|
2: sub foo { 14 };
|
||
|
exited main::foo
|
||
|
entering main::bar
|
||
|
2: sub bar { 3 };
|
||
|
exited main::bar
|
||
|
42
|
||
|
|
||
|
By way of demonstration, we present below a laborious listing
|
||
|
resulting from setting your C<PERLDB_OPTS> environment variable to
|
||
|
the value C<f=n N>, and running I<perl -d -V> from the command line.
|
||
|
Examples using various values of C<n> are shown to give you a feel
for the difference between settings. Long though it may be, this
|
||
|
is not a complete listing, but only excerpts.
|
||
|
|
||
|
=over 4
|
||
|
|
||
|
=item 1
|
||
|
|
||
|
  entering main::BEGIN
   entering Config::BEGIN
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   Package lib/Config.pm.
   entering Config::TIEHASH
   entering Exporter::import
    entering Exporter::export
  entering Config::myconfig
   entering Config::FETCH
   entering Config::FETCH
   entering Config::FETCH
   entering Config::FETCH
|
||
|
|
||
|
=item 2
|
||
|
|
||
|
  entering main::BEGIN
   entering Config::BEGIN
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   exited Config::BEGIN
   Package lib/Config.pm.
   entering Config::TIEHASH
   exited Config::TIEHASH
   entering Exporter::import
    entering Exporter::export
    exited Exporter::export
   exited Exporter::import
  exited main::BEGIN
  entering Config::myconfig
   entering Config::FETCH
   exited Config::FETCH
   entering Config::FETCH
   exited Config::FETCH
   entering Config::FETCH
|
||
|
|
||
|
=item 4
|
||
|
|
||
|
  in  $=main::BEGIN() from /dev/null:0
   in  $=Config::BEGIN() from lib/Config.pm:2
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   Package lib/Config.pm.
   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li
  in  @=Config::myconfig() from /dev/null:0
   in  $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574
|
||
|
|
||
|
=item 6
|
||
|
|
||
|
  in  $=main::BEGIN() from /dev/null:0
   in  $=Config::BEGIN() from lib/Config.pm:2
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   out $=Config::BEGIN() from lib/Config.pm:0
   Package lib/Config.pm.
   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
   out $=Config::TIEHASH('Config') from lib/Config.pm:644
   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
  out $=main::BEGIN() from /dev/null:0
  in  @=Config::myconfig() from /dev/null:0
   in  $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
   out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
   out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
   out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
   in  $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
|
||
|
|
||
|
=item 14
|
||
|
|
||
|
  in  $=main::BEGIN() from /dev/null:0
   in  $=Config::BEGIN() from lib/Config.pm:2
    Package lib/Exporter.pm.
    Package lib/Carp.pm.
   out $=Config::BEGIN() from lib/Config.pm:0
   Package lib/Config.pm.
   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
   out $=Config::TIEHASH('Config') from lib/Config.pm:644
   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
  out $=main::BEGIN() from /dev/null:0
  in  @=Config::myconfig() from /dev/null:0
   in  $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
   out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
   in  $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
   out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
|
||
|
|
||
|
=item 30
|
||
|
|
||
|
  in  $=CODE(0x15eca4)() from /dev/null:0
   in  $=CODE(0x182528)() from lib/Config.pm:2
    Package lib/Exporter.pm.
   out $=CODE(0x182528)() from lib/Config.pm:0
   scalar context return from CODE(0x182528): undef
   Package lib/Config.pm.
   in  $=Config::TIEHASH('Config') from lib/Config.pm:628
   out $=Config::TIEHASH('Config') from lib/Config.pm:628
   scalar context return from Config::TIEHASH: empty hash
   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
    scalar context return from Exporter::export: ''
   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
   scalar context return from Exporter::import: ''
|
||
|
|
||
|
=back
|
||
|
|
||
|
In all cases shown above, the line indentation shows the call tree.
|
||
|
If bit 2 of C<frame> is set, a line is printed on exit from a
|
||
|
subroutine as well. If bit 4 is set, the arguments are printed
|
||
|
along with the caller info. If bit 8 is set, the arguments are
|
||
|
printed even if they are tied or references. If bit 16 is set, the
|
||
|
return value is printed, too.
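Since C<frame> is a bit mask, the settings used in the listings (2, 6, 14 and 30) are sums of the values just described. The short sketch below decodes a C<frame> value into the features it turns on. The table simply restates the paragraph above, and the helper name C<describe_frame> is hypothetical and not part of the debugger.

```perl
use strict;
use warnings;

# Bit values and their effects, as described in the text.
my @FRAME_BITS = (
    [  2, 'print a line on subroutine exit' ],
    [  4, 'print arguments with the caller info' ],
    [  8, 'print arguments even if tied or references' ],
    [ 16, 'print the return value' ],
);

# Hypothetical helper: list the features enabled by a frame value.
sub describe_frame {
    my ($frame) = @_;
    return map { $_->[1] } grep { $frame & $_->[0] } @FRAME_BITS;
}

# The settings used in the listings above.
for my $frame (2, 6, 14, 30) {
    print "frame=$frame: ", join('; ', describe_frame($frame)), "\n";
}
```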
|
||
|
|
||
|
When a package is compiled, a line like this
|
||
|
|
||
|
Package lib/Carp.pm.
|
||
|
|
||
|
is printed with proper indentation.
|
||
|
|
||
|
=head1 Debugging regular expressions
|
||
|
|
||
|
There are two ways to enable debugging output for regular expressions.
|
||
|
|
||
|
If your perl is compiled with C<-DDEBUGGING>, you may use the
|
||
|
B<-Dr> flag on the command line.
|
||
|
|
||
|
Otherwise, one can C<use re 'debug'>, which has effects at
|
||
|
compile time and run time. It is not lexically scoped.
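The second method can be tried from a small driver script. The sketch below is an illustration only: it runs a child perl that enables C<use re 'debug'> for the pattern discussed in the following subsections and captures the report, which is written to STDERR. The exact wording and layout of the report differ between Perl releases, so it may not match the listings in this document line for line. The variable names are invented, and the C<re> extension is assumed to be installed.

```perl
use strict;
use warnings;

# Child program: redirect STDERR to STDOUT before the regex is
# compiled, then enable regex debugging and attempt one match.
my $program = <<'CHILD';
BEGIN { open STDERR, '>&', \*STDOUT or die "cannot dup STDOUT: $!" }
use re 'debug';
'abcdefg__gh__' =~ /[bc]d(ef*g)+h[ij]k$/;
CHILD

# Run the child and collect everything it reported.
open(my $child, '-|', $^X, '-e', $program)
    or die "cannot run $^X: $!";
my @report = <$child>;
close $child;

print @report;
```

The redirection sits in a C<BEGIN> block because the compile-time part of the report is produced before any run-time statement executes.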
|
||
|
|
||
|
=head2 Compile-time output
|
||
|
|
||
|
The debugging output at compile time looks like this:
|
||
|
|
||
|
compiling RE `[bc]d(ef*g)+h[ij]k$'
|
||
|
size 43 first at 1
|
||
|
1: ANYOF(11)
|
||
|
11: EXACT <d>(13)
|
||
|
13: CURLYX {1,32767}(27)
|
||
|
15: OPEN1(17)
|
||
|
17: EXACT <e>(19)
|
||
|
19: STAR(22)
|
||
|
20: EXACT <f>(0)
|
||
|
22: EXACT <g>(24)
|
||
|
24: CLOSE1(26)
|
||
|
26: WHILEM(0)
|
||
|
27: NOTHING(28)
|
||
|
28: EXACT <h>(30)
|
||
|
30: ANYOF(40)
|
||
|
40: EXACT <k>(42)
|
||
|
42: EOL(43)
|
||
|
43: END(0)
|
||
|
anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
|
||
|
stclass `ANYOF' minlen 7
|
||
|
|
||
|
The first line shows the pre-compiled form of the regex. The second
|
||
|
shows the size of the compiled form (in arbitrary units, usually
|
||
|
4-byte words) and the label I<id> of the first node that does a
|
||
|
match.
|
||
|
|
||
|
The last line (split into two lines above) contains optimizer
|
||
|
information. In the example shown, the optimizer found that the match
|
||
|
should contain a substring C<de> at offset 1, plus substring C<gh>
|
||
|
at some offset between 3 and infinity. Moreover, when checking for
|
||
|
these substrings (to abandon impossible matches quickly), Perl will check
|
||
|
for the substring C<gh> before checking for the substring C<de>. The
|
||
|
optimizer may also use the knowledge that the match starts (at the
|
||
|
C<first> I<id>) with a character class, and the match cannot be
|
||
|
shorter than 7 chars.
|
||
|
|
||
|
The fields of interest which may appear in the last line are
|
||
|
|
||
|
=over
|
||
|
|
||
|
=item C<anchored> I<STRING> C<at> I<POS>
|
||
|
|
||
|
=item C<floating> I<STRING> C<at> I<POS1..POS2>
|
||
|
|
||
|
See above.
|
||
|
|
||
|
=item C<matching floating/anchored>
|
||
|
|
||
|
Which substring to check first.
|
||
|
|
||
|
=item C<minlen>
|
||
|
|
||
|
The minimal length of the match.
|
||
|
|
||
|
=item C<stclass> I<TYPE>
|
||
|
|
||
|
Type of first matching node.
|
||
|
|
||
|
=item C<noscan>
|
||
|
|
||
|
Don't scan for the found substrings.
|
||
|
|
||
|
=item C<isall>
|
||
|
|
||
|
Means that the optimizer info is all that the regular
|
||
|
expression contains, and thus one does not need to enter the regex engine at
|
||
|
all.
|
||
|
|
||
|
=item C<GPOS>
|
||
|
|
||
|
Set if the pattern contains C<\G>.
|
||
|
|
||
|
=item C<plus>
|
||
|
|
||
|
Set if the pattern starts with a repeated char (as in C<x+y>).
|
||
|
|
||
|
=item C<implicit>
|
||
|
|
||
|
Set if the pattern starts with C<.*>.
|
||
|
|
||
|
=item C<with eval>
|
||
|
|
||
|
Set if the pattern contains eval-groups, such as C<(?{ code })> and
|
||
|
C<(??{ code })>.
|
||
|
|
||
|
=item C<anchored(TYPE)>
|
||
|
|
||
|
If the pattern may match only at a handful of places, with C<TYPE>
|
||
|
being C<BOL>, C<MBOL>, or C<GPOS>. See the table below.
|
||
|
|
||
|
=back
|
||
|
|
||
|
If a substring is known to match at end-of-line only, it may be
|
||
|
followed by C<$>, as in C<floating `k'$>.
|
||
|
|
||
|
The optimizer-specific info is used to avoid entering (a slow) regex
|
||
|
engine on strings that will definitely not match. If the C<isall> flag
|
||
|
is set, a call to the regex engine may be avoided even when the optimizer
|
||
|
found an appropriate place for the match.
|
||
|
|
||
|
The rest of the output contains the list of I<nodes> of the compiled
|
||
|
form of the regex. Each line has format
|
||
|
|
||
|
C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>)
|
||
|
|
||
|
=head2 Types of nodes
|
||
|
|
||
|
Here are the possible types, with short descriptions:
|
||
|
|
||
|
    # TYPE arg-description [num-args] [longjump-len] DESCRIPTION

    # Exit points
    END         no       End of program.
    SUCCEED     no       Return from a subroutine, basically.

    # Anchors:
    BOL         no       Match "" at beginning of line.
    MBOL        no       Same, assuming multiline.
    SBOL        no       Same, assuming singleline.
    EOS         no       Match "" at end of string.
    EOL         no       Match "" at end of line.
    MEOL        no       Same, assuming multiline.
    SEOL        no       Same, assuming singleline.
    BOUND       no       Match "" at any word boundary
    BOUNDL      no       Match "" at any word boundary
    NBOUND      no       Match "" at any word non-boundary
    NBOUNDL     no       Match "" at any word non-boundary
    GPOS        no       Matches where last m//g left off.

    # [Special] alternatives
    ANY         no       Match any one character (except newline).
    SANY        no       Match any one character.
    ANYOF       sv       Match character in (or not in) this class.
    ALNUM       no       Match any alphanumeric character
    ALNUML      no       Match any alphanumeric char in locale
    NALNUM      no       Match any non-alphanumeric character
    NALNUML     no       Match any non-alphanumeric char in locale
    SPACE       no       Match any whitespace character
    SPACEL      no       Match any whitespace char in locale
    NSPACE      no       Match any non-whitespace character
    NSPACEL     no       Match any non-whitespace char in locale
    DIGIT       no       Match any numeric character
    NDIGIT      no       Match any non-numeric character

    # BRANCH    The set of branches constituting a single choice are hooked
    #           together with their "next" pointers, since precedence prevents
    #           anything being concatenated to any individual branch.  The
    #           "next" pointer of the last BRANCH in a choice points to the
    #           thing following the whole choice.  This is also where the
    #           final "next" pointer of each individual branch points; each
    #           branch starts with the operand node of a BRANCH node.
    #
    BRANCH      node     Match this alternative, or the next...

    # BACK      Normal "next" pointers all implicitly point forward; BACK
    #           exists to make loop structures possible.
    # not used
    BACK        no       Match "", "next" ptr points backward.

    # Literals
    EXACT       sv       Match this string (preceded by length).
    EXACTF      sv       Match this string, folded (prec. by length).
    EXACTFL     sv       Match this string, folded in locale (w/len).

    # Do nothing
    NOTHING     no       Match empty string.
    # A variant of above which delimits a group, thus stops optimizations
    TAIL        no       Match empty string. Can jump here from outside.

    # STAR,PLUS '?', and complex '*' and '+', are implemented as circular
    #           BRANCH structures using BACK.  Simple cases (one character
    #           per match) are implemented with STAR and PLUS for speed
    #           and to minimize recursive plunges.
    #
    STAR        node     Match this (simple) thing 0 or more times.
    PLUS        node     Match this (simple) thing 1 or more times.

    CURLY       sv   2   Match this simple thing {n,m} times.
    CURLYN      no   2   Match next-after-this simple thing
    #                    {n,m} times, set parens.
    CURLYM      no   2   Match this medium-complex thing {n,m} times.
    CURLYX      sv   2   Match this complex thing {n,m} times.

    # This terminator creates a loop structure for CURLYX
    WHILEM      no       Do curly processing and see if rest matches.

    # OPEN,CLOSE,GROUPP ...are numbered at compile time.
    OPEN        num  1   Mark this point in input as start of #n.
    CLOSE       num  1   Analogous to OPEN.

    REF         num  1   Match some already matched string
    REFF        num  1   Match already matched string, folded
    REFFL       num  1   Match already matched string, folded in loc.

    # grouping assertions
    IFMATCH     off  1 2 Succeeds if the following matches.
    UNLESSM     off  1 2 Fails if the following matches.
    SUSPEND     off  1 1 "Independent" sub-regex.
    IFTHEN      off  1 1 Switch, should be preceded by switcher.
    GROUPP      num  1   Whether the group matched.

    # Support for long regex
    LONGJMP     off  1 1 Jump far away.
    BRANCHJ     off  1 1 BRANCH with long offset.

    # The heavy worker
    EVAL        evl  1   Execute some Perl code.

    # Modifiers
    MINMOD      no       Next operator is not greedy.
    LOGICAL     no       Next opcode should set the flag only.

    # This is not used yet
    RENUM       off  1 1 Group with independently numbered parens.

    # This is not really a node, but an optimized away piece of a "long" node.
    # To simplify debugging output, we mark it as if it were a node
    OPTIMIZED   off      Placeholder for dump.
|
||
|
|
||
|
=head2 Run-time output
|
||
|
|
||
|
First of all, when doing a match, one may get no run-time output even
|
||
|
if debugging is enabled. This means that the regex engine was never
|
||
|
entered and that all of the job was therefore done by the optimizer.
|
||
|
|
||
|
If the regex engine was entered, the output may look like this:
|
||
|
|
||
|
Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__'
|
||
|
Setting an EVAL scope, savestack=3
|
||
|
2 <ab> <cdefg__gh_> | 1: ANYOF
|
||
|
3 <abc> <defg__gh_> | 11: EXACT <d>
|
||
|
4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}
|
||
|
4 <abcd> <efg__gh_> | 26: WHILEM
|
||
|
0 out of 1..32767 cc=effff31c
|
||
|
4 <abcd> <efg__gh_> | 15: OPEN1
|
||
|
4 <abcd> <efg__gh_> | 17: EXACT <e>
|
||
|
5 <abcde> <fg__gh_> | 19: STAR
|
||
|
EXACT <f> can match 1 times out of 32767...
|
||
|
Setting an EVAL scope, savestack=3
|
||
|
6 <bcdef> <g__gh__> | 22: EXACT <g>
|
||
|
7 <bcdefg> <__gh__> | 24: CLOSE1
|
||
|
7 <bcdefg> <__gh__> | 26: WHILEM
|
||
|
1 out of 1..32767 cc=effff31c
|
||
|
Setting an EVAL scope, savestack=12
|
||
|
7 <bcdefg> <__gh__> | 15: OPEN1
|
||
|
7 <bcdefg> <__gh__> | 17: EXACT <e>
|
||
|
restoring \1 to 4(4)..7
|
||
|
failed, try continuation...
|
||
|
7 <bcdefg> <__gh__> | 27: NOTHING
|
||
|
7 <bcdefg> <__gh__> | 28: EXACT <h>
|
||
|
failed...
|
||
|
failed...
|
||
|
|
||
|
The most significant information in the output is about the particular I<node>
|
||
|
of the compiled regex that is currently being tested against the target string.
|
||
|
The format of these lines is
|
||
|
|
||
|
C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE>
|
||
|
|
||
|
The I<TYPE> info is indented with respect to the backtracking level.
|
||
|
Other incidental information appears interspersed within.
|
||
|
|
||
|
=head1 Debugging Perl memory usage
|
||
|
|
||
|
Perl is a profligate wastrel when it comes to memory use. There
|
||
|
is a saying that to estimate memory usage of Perl, assume a reasonable
|
||
|
algorithm for memory allocation, multiply that estimate by 10, and
|
||
|
while you still may miss the mark, at least you won't be quite so
|
||
|
astonished. This is not absolutely true, but may provide a good
|
||
|
grasp of what happens.
|
||
|
|
||
|
Assume that an integer cannot take less than 20 bytes of memory, a
|
||
|
float cannot take less than 24 bytes, a string cannot take less
|
||
|
than 32 bytes (all these examples assume 32-bit architectures, the
|
||
|
results are quite a bit worse on 64-bit architectures). If a variable
|
||
|
is accessed in two of three different ways (which require an integer,
|
||
|
a float, or a string), the memory footprint may increase yet another
|
||
|
20 bytes. A sloppy malloc(3) implementation can inflate these
|
||
|
numbers dramatically.
|
||
|
|
||
|
On the opposite end of the scale, a declaration like
|
||
|
|
||
|
sub foo;
|
||
|
|
||
|
may take up to 500 bytes of memory, depending on which release of Perl
|
||
|
you're running.
|
||
|
|
||
|
Anecdotal estimates of source-to-compiled code bloat suggest an
|
||
|
eightfold increase. This means that the compiled form of reasonable
|
||
|
(normally commented, properly indented etc.) code will take
|
||
|
about eight times more space in memory than the code took
|
||
|
on disk.
|
||
|
|
||
|
There are two Perl-specific ways to analyze memory usage:
|
||
|
$ENV{PERL_DEBUG_MSTATS} and the B<-DL> command-line switch. The first
|
||
|
is available only if Perl is compiled with Perl's malloc(); the
|
||
|
second only if Perl was built with C<-DDEBUGGING>. See the
|
||
|
instructions for how to do this in the F<INSTALL> podpage at
|
||
|
the top level of the Perl source tree.
|
||
|
|
||
|
=head2 Using C<$ENV{PERL_DEBUG_MSTATS}>
|
||
|
|
||
|
If your perl is using Perl's malloc() and was compiled with the
|
||
|
necessary switches (this is the default), then it will print memory
|
||
|
usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS}
|
||
|
> 1 >>, and before termination of the program when C<<
|
||
|
$ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format is similar to
|
||
|
the following example:
|
||
|
|
||
|
$ PERL_DEBUG_MSTATS=2 perl -e "require Carp"
|
||
|
Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
|
||
|
14216 free: 130 117 28 7 9 0 2 2 1 0 0
|
||
|
437 61 36 0 5
|
||
|
60924 used: 125 137 161 55 7 8 6 16 2 0 1
|
||
|
74 109 304 84 20
|
||
|
Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.
|
||
|
Memory allocation statistics after execution: (buckets 4(4)..8188(8192)
|
||
|
30888 free: 245 78 85 13 6 2 1 3 2 0 1
|
||
|
315 162 39 42 11
|
||
|
175816 used: 265 176 1112 111 26 22 11 27 2 1 1
|
||
|
196 178 1066 798 39
|
||
|
Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.
|
||
|
|
||
|
It is possible to ask for such a statistic at arbitrary points in
|
||
|
your execution using the mstats() function out of the standard
|
||
|
Devel::Peek module.
|
||
|
|
||
|
Here is some explanation of that format:
|
||
|
|
||
|
=over
|
||
|
|
||
|
=item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)>
|
||
|
|
||
|
Perl's malloc() uses bucketed allocations. Every request is rounded
|
||
|
up to the closest bucket size available, and a bucket is taken from
|
||
|
the pool of buckets of that size.
|
||
|
|
||
|
The line above describes the limits of buckets currently in use.
|
||
|
Each bucket has two sizes: memory footprint and the maximal size
|
||
|
of user data that can fit into this bucket. Suppose in the above
|
||
|
example that the smallest bucket were size 4. The biggest bucket
|
||
|
would have usable size 8188, and the memory footprint would be 8192.
|
||
|
|
||
|
In a Perl built for debugging, some buckets may have negative usable
|
||
|
size. This means that these buckets cannot (and will not) be used.
|
||
|
For larger buckets, the memory footprint may be one page greater
|
||
|
than a power of 2. If so, the corresponding power of two is
|
||
|
printed in the C<APPROX> field above.
|
||
|
|
||
|
=item Free/Used
|
||
|
|
||
|
The 1 or 2 rows of numbers following that correspond to the number
|
||
|
of buckets of each size between C<SMALLEST> and C<GREATEST>. In
|
||
|
the first row, the sizes (memory footprints) of buckets are powers
|
||
|
of two--or possibly one page greater. In the second row, if present,
|
||
|
the memory footprints of the buckets are between the memory footprints
|
||
|
of two buckets "above".
|
||
|
|
||
|
For example, suppose under the previous example, the memory footprints
|
||
|
were
|
||
|
|
||
|
     free:    8     16    32    64    128  256 512 1024 2048 4096 8192
           4     12    24    48    80
|
||
|
|
||
|
With non-C<DEBUGGING> perl, the buckets starting from C<128> have
|
||
|
a 4-byte overhead, and thus an 8192-long bucket may take up to
|
||
|
8188-byte allocations.
|
||
|
|
||
|
=item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS>
|
||
|
|
||
|
The first two fields give the total amount of memory perl sbrk(2)ed
|
||
|
(ess-broken? :-) and number of sbrk(2)s used. The third number is
|
||
|
what perl thinks about continuity of returned chunks. So long as
|
||
|
this number is positive, malloc() will assume that it is probable
|
||
|
that sbrk(2) will provide continuous memory.
|
||
|
|
||
|
Memory allocated by external libraries is not counted.
|
||
|
|
||
|
=item C<pad: 0>
|
||
|
|
||
|
The amount of sbrk(2)ed memory needed to keep buckets aligned.
|
||
|
|
||
|
=item C<heads: 2192>
|
||
|
|
||
|
Although memory overhead of bigger buckets is kept inside the bucket, for
|
||
|
smaller buckets, it is kept in separate areas. This field gives the
|
||
|
total size of these areas.
|
||
|
|
||
|
=item C<chain: 0>
|
||
|
|
||
|
malloc() may want to subdivide a bigger bucket into smaller buckets.
|
||
|
If only a part of the deceased bucket is left unsubdivided, the rest
|
||
|
is kept as an element of a linked list. This field gives the total
|
||
|
size of these chunks.
|
||
|
|
||
|
=item C<tail: 6144>
|
||
|
|
||
|
To minimize the number of sbrk(2)s, malloc() asks for more memory. This
|
||
|
field gives the size of the yet unused part, which is sbrk(2)ed, but
|
||
|
never touched.
|
||
|
|
||
|
=back
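As a worked check of these fields, the two sample reports shown earlier can be added up. In both of them the free and used byte totals plus the four "odd ends" come to exactly the amount reported as sbrk()ed. This is an observation about the sample figures rather than a documented guarantee. The sketch below parses the summary lines and repeats the sum; the helper name C<parse_summary> and the data layout are invented for illustration.

```perl
use strict;
use warnings;

# Figures copied from the "after compilation" and "after execution"
# reports shown earlier in this section.
my @reports = (
    { free => 14216, used => 60924,
      summary => 'Total sbrk(): 77824/21:119. '
               . 'Odd ends: pad+heads+chain+tail: 0+636+0+2048.' },
    { free => 30888, used => 175816,
      summary => 'Total sbrk(): 215040/47:145. '
               . 'Odd ends: pad+heads+chain+tail: 0+2192+0+6144.' },
);

# Hypothetical helper: split a summary line into named fields.
sub parse_summary {
    my ($line) = @_;
    $line =~ m{Total sbrk\(\): (\d+)/(\d+):(-?\d+)\. .*?: (\d+)\+(\d+)\+(\d+)\+(\d+)\.}
        or die "unrecognized summary line: $line";
    return { sbrked => $1, sbrks => $2, continuous => $3,
             pad => $4, heads => $5, chain => $6, tail => $7 };
}

for my $r (@reports) {
    my $s = parse_summary($r->{summary});
    my $accounted = $r->{free} + $r->{used}
                  + $s->{pad} + $s->{heads} + $s->{chain} + $s->{tail};
    printf "sbrk()ed %d, accounted for %d\n", $s->{sbrked}, $accounted;
}
```

For the first report this is 14216 + 60924 + 0 + 636 + 0 + 2048 = 77824, and for the second 30888 + 175816 + 0 + 2192 + 0 + 6144 = 215040.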

=head2 Example of using B<-DL> switch

Below we show how to analyse memory usage by

    do 'lib/auto/POSIX/autosplit.ix';

The file in question contains a header and 146 lines similar to

    sub getcwd;

B<WARNING>: The discussion below supposes a 32-bit architecture. In
newer releases of Perl, memory usage of the constructs discussed
here is greatly improved, but the story discussed below is a real-life
story. This story is mercilessly terse, and assumes rather more than cursory
knowledge of Perl internals. Type space to continue, `q' to quit.
(Actually, you just want to skip to the next section.)

Here is the itemized list of Perl allocations performed during parsing
of this file:

 !!! "after" at test.pl line 3.
    Id  subtot   4   8  12  16  20  24  28  32  36  40  48  56  64  72  80 80+
  0 02   13752   .   .   .   . 294   .   .   .   .   .   .   .   .   .   .   4
  0 54    5545   .   .   8 124  16   .   .   .   1   1   .   .   .   .   .   3
  5 05      32   .   .   .   .   .   .   .   1   .   .   .   .   .   .   .   .
  6 02    7152   .   .   .   .   .   .   .   .   .   . 149   .   .   .   .   .
  7 02    3600   .   .   .   .   . 150   .   .   .   .   .   .   .   .   .   .
  7 03      64   .  -1   .   1   .   .   2   .   .   .   .   .   .   .   .   .
  7 04    7056   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   7
  7 17   38404   .   .   .   .   .   .   .   1   .   . 442 149   .   . 147   .
  9 03    2078  17 249  32   .   .   .   .   2   .   .   .   .   .   .   .   .

To see this list, insert two C<warn('!...')> statements around the call:

    warn('!');
    do 'lib/auto/POSIX/autosplit.ix';
    warn('!!! "after"');

and run it with Perl's B<-DL> option. The first warn() will print
memory allocation info before parsing the file and will memorize
the statistics at this point (we ignore what it prints). The second
warn() prints increments with respect to these memorized data. This
is the printout shown above.
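
One way to check that the printout is being read correctly is to
compare each C<subtot> with the counts in its row. The sketch below
assumes (the text does not say so explicitly) that each size column is
an upper bound on the size of the allocations counted under it, so that
the sum of count times column size can never be smaller than C<subtot>.
Only rows with no C<80+> entries are used, since those entries have no
known size. The rows are copied from the printout; upper_bound() is a
made-up helper name.

    use strict;
    use warnings;

    # Rows copied from the printout: Id => [ subtot, { size => count } ]
    my %rows = (
        '505' => [   32, { 32 => 1 } ],
        '602' => [ 7152, { 48 => 149 } ],
        '702' => [ 3600, { 24 => 150 } ],
        '703' => [   64, { 8 => -1, 16 => 1, 28 => 2 } ],
        '903' => [ 2078, { 4 => 17, 8 => 249, 12 => 32, 32 => 2 } ],
    );

    # Hypothetical helper: sum of count * column size over one row.
    sub upper_bound {
        my ($counts) = @_;
        my $total = 0;
        $total += $_ * $counts->{$_} for keys %$counts;
        return $total;
    }

    for my $id (sort keys %rows) {
        my ($subtot, $counts) = @{ $rows{$id} };
        printf "%s: subtot %5d, bound %5d\n",
            $id, $subtot, upper_bound($counts);
    }

For 505, 602, 702 and 703 the two numbers agree. For 903 the bound is
2508 against a subtot of 2078, which is what one would expect if those
allocations vary in size within each column.
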

Different I<Id>s on the left correspond to different subsystems of
the perl interpreter. They are just the first argument given to
the perl memory allocation API named New(). To find what C<9 03>
means, just B<grep> the perl source for C<903>. You'll find it in
F<util.c>, function savepvn(). (I know, you wonder why we told you
to B<grep> and then gave away the answer. That's because grepping
the source is good for the soul.) This function is used to store
a copy of an existing chunk of memory. Using a C debugger, one can
see that the function was called either directly from gv_init() or
via sv_magic(), and that gv_init() is called from gv_fetchpv()--which
was itself called from newSUB(). Please stop to catch your breath now.

B<NOTE>: To reach this point in the debugger and skip the calls to
savepvn() during the compilation of the main program, you should
set a C breakpoint in Perl_warn(), continue until this point is
reached, and I<then> set
a C breakpoint in Perl_savepvn(). Note that you may need to skip a
handful of Perl_savepvn() calls that do not correspond to mass production
of CVs (there are more C<903> allocations than 146 similar lines of
F<lib/auto/POSIX/autosplit.ix>). Note also that C<Perl_> prefixes are
added by macroization code in perl header files to avoid conflicts
with external libraries.

Anyway, we see that C<903> ids correspond to creation of globs, twice
per glob: once for the glob name and once for the glob stringification
magic.

Here are explanations for other I<Id>s above:

=over

=item C<717>

Creates bigger C<XPV*> structures. In the case above, it
creates 3 C<AV>s per subroutine, one for a list of lexical variable
names, one for a scratchpad (which contains lexical variables and
C<targets>), and one for the array of scratchpads needed for
recursion.

It also creates a C<GV> and a C<CV> per subroutine, all called from
start_subparse().

=item C<002>

Creates a C array corresponding to the C<AV> of scratchpads and the
scratchpad itself. The first fake entry of this scratchpad is
created though the subroutine itself is not defined yet.

It also creates C arrays to keep data for the stash. This is one HV,
but it grows; thus, there are 4 big allocations: the big chunks are not
freed, but are kept as additional arenas for C<SV> allocations.

=item C<054>

Creates a C<HEK> for the name of the glob for the subroutine. This
name is a key in a I<stash>.

Big allocations with this I<Id> correspond to allocations of new
arenas to keep C<HE>.

=item C<602>

Creates a C<GP> for the glob for the subroutine.

=item C<702>

Creates the C<MAGIC> for the glob for the subroutine.

=item C<704>

Creates I<arenas> which keep SVs.

=back

=head2 B<-DL> details

If Perl is run with the B<-DL> option, then warn()s that start with `!'
behave specially. They print a list of I<categories> of memory
allocations, and statistics of allocations of different sizes for
these categories.

If the warn() string starts with

=over

=item C<!!!>

print changed categories only; print the differences in counts of allocations.

=item C<!!>

print grown categories only; print the absolute values of counts, and totals.

=item C<!>

print nonempty categories; print the absolute values of counts and totals.

=back
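
The three prefixes can be tried side by side. The following is a
minimal sketch, assuming a perl on which the B<-DL> switch is
available; without it, the same calls are ordinary warnings that just
print their message. stats_request() is a made-up helper, not part of
Perl, and the string-building line in the middle is arbitrary filler
that allocates some memory between the reports.

    use strict;
    use warnings;

    # Hypothetical helper: build the string handed to warn().
    sub stats_request {
        my ($bangs, $label) = @_;
        die "expected 1, 2 or 3 exclamation marks\n"
            unless defined $bangs && $bangs =~ /\A[123]\z/;
        return ('!' x $bangs) . (defined $label ? " $label" : '');
    }

    warn(stats_request(1));              # nonempty categories
    my @filler = map { 'x' x $_ } 1 .. 50;
    warn(stats_request(2, 'grown'));     # grown categories only
    warn(stats_request(3, 'changed'));   # changed categories, differences

Passing C<3> and the label C<"after"> (with its quotes) produces the
same string as the C<warn('!!! "after"')> call used in the example
earlier.
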

=head2 Limitations of B<-DL> statistics

If an extension or external library does not use the Perl API to
allocate memory, such allocations are not counted.

=head1 SEE ALSO

L<perldebug>,
L<perlguts>,
L<perlrun>,
L<re>,
and
L<Devel::DProf>.