=head1 NAME

perlsub - Perl subroutines

=head1 SYNOPSIS

To declare subroutines:

    sub NAME;             # A "forward" declaration.
    sub NAME(PROTO);      #  ditto, but with prototypes

    sub NAME BLOCK        # A declaration and a definition.
    sub NAME(PROTO) BLOCK #  ditto, but with prototypes

To define an anonymous subroutine at runtime:

    $subref = sub BLOCK;         # no proto
    $subref = sub (PROTO) BLOCK; # with proto

To import subroutines:

    use PACKAGE qw(NAME1 NAME2 NAME3);

To call subroutines:

    NAME(LIST);    # & is optional with parentheses.
    NAME LIST;     # Parentheses optional if predeclared/imported.
    &NAME;         # Makes current @_ visible to called subroutine.

=head1 DESCRIPTION

Like many languages, Perl provides for user-defined subroutines. These
may be located anywhere in the main program, loaded in from other files
via the C<do>, C<require>, or C<use> keywords, or even generated on the
fly using C<eval> or anonymous subroutines (closures). You can even call
a function indirectly using a variable containing its name or a CODE reference
to it.

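The indirect calls mentioned above can be sketched as follows. The sub
C<greet> and the variable names are hypothetical, chosen only for
illustration. Calling a sub through a variable that holds its name relies
on symbolic references, which C<use strict 'refs'> forbids, so the sketch
turns that stricture off for the one statement that needs it.

```perl
use strict;
use warnings;

# Hypothetical example sub.
sub greet { return "hello, $_[0]" }

my $code_ref = \&greet;                 # a CODE reference
my $by_ref   = $code_ref->("world");    # call through the reference
my $by_amp   = &$code_ref("world");     # the same call, older syntax

my $name = "greet";                     # a variable holding the sub's name
my $by_name;
{
    no strict 'refs';                   # symbolic references need this
    $by_name = &$name("world");
}

print "$by_ref / $by_amp / $by_name\n";
```

A CODE reference is usually the safer of the two, because it keeps
working under C<use strict>.
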
The Perl model for function call and return values is simple: all
functions are passed as parameters one single flat list of scalars, and
all functions likewise return to their caller one single flat list of
scalars. Any arrays or hashes in these call and return lists will
collapse, losing their identities--but you may always use
pass-by-reference instead to avoid this. Both call and return lists may
contain as many or as few scalar elements as you'd like. (Often a
function without an explicit return statement is called a subroutine, but
there's really no difference from the language's perspective.)

Any arguments passed to the routine come in as the array C<@_>. Thus if you
called a function with two arguments, those would be stored in C<$_[0]>
and C<$_[1]>. The array C<@_> is a local array, but its elements are
aliases for the actual scalar parameters. In particular, if an element
C<$_[0]> is updated, the corresponding argument is updated (or an error
occurs if it is not updatable). If an argument is an array or hash
element which did not exist when the function was called, that element is
created only when (and if) it is modified or if a reference to it is
taken. (Some earlier versions of Perl created the element whether or not
it was assigned to.) Note that assigning to the whole array C<@_> removes
the aliasing, and does not update any arguments.

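The aliasing rules just described can be seen in a short sketch. The
subs C<double_in_place> and C<reset_args> are hypothetical names
introduced here for illustration only.

```perl
use strict;
use warnings;

# Updating $_[0] updates the caller's variable, because the
# elements of @_ are aliases for the actual arguments.
sub double_in_place {
    $_[0] *= 2;
    return;
}

# Assigning to the whole array removes the aliasing, so the
# caller's variables are left alone.
sub reset_args {
    @_ = (0) x @_;
    return;
}

my $n = 21;
double_in_place($n);    # $n is now 42

my $m = 7;
reset_args($m);         # $m is still 7

print "n=$n m=$m\n";    # prints "n=42 m=7"
```
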
The return value of the subroutine is the value of the last expression
evaluated. Alternatively, a C<return> statement may be used to exit the
subroutine, optionally specifying the returned value, which will be
evaluated in the appropriate context (list, scalar, or void) depending
on the context of the subroutine call. If you specify no return value,
the subroutine will return an empty list in a list context, an undefined
value in a scalar context, or nothing in a void context. If you return
one or more arrays and/or hashes, these will be flattened together into
one large indistinguishable list.

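A brief sketch of these return rules follows. All sub and variable
names here are hypothetical and serve only to illustrate the behavior
described above.

```perl
use strict;
use warnings;

sub nothing   { return }                  # no return value specified
sub last_expr { my $x = shift; $x * 2 }   # value of the last expression

sub two_arrays {
    my @a = (1, 2);
    my @b = (3, 4);
    return (@a, @b);                      # flattened into one list
}

my @list   = nothing();       # empty list in list context
my $scalar = nothing();       # undef in scalar context
my $twice  = last_expr(21);   # 42
my @flat   = two_arrays();    # (1, 2, 3, 4)

print scalar(@list), " ", defined($scalar) ? "defined" : "undef",
      " $twice @flat\n";
```
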
Perl does not have named formal parameters; in practice you simply
assign C<@_> to a C<my()> list of variables. Any variables you use in the
function that aren't declared private are global variables. For the gory
details on creating private variables, see
L<"Private Variables via my()"> and L<"Temporary Values via local()">.
To create protected environments for a set of functions in a separate
package (and probably a separate file), see L<perlmod/"Packages">.

Example:

    sub max {
        my $max = shift(@_);
        foreach $foo (@_) {
            $max = $foo if $max < $foo;
        }
        return $max;
    }
    $bestday = max($mon,$tue,$wed,$thu,$fri);

Example:

    # get a line, combining continuation lines
    #  that start with whitespace

    sub get_line {
        $thisline = $lookahead;  # GLOBAL VARIABLES!!
        LINE: while (defined($lookahead = <STDIN>)) {
            if ($lookahead =~ /^[ \t]/) {
                $thisline .= $lookahead;
            }
            else {
                last LINE;
            }
        }
        $thisline;
    }

    $lookahead = <STDIN>;       # get first line
    while ($_ = get_line()) {
        ...
    }

Use array assignment to a local list to name your formal arguments:

    sub maybeset {
        my($key, $value) = @_;
        $Foo{$key} = $value unless $Foo{$key};
    }

This also has the effect of turning call-by-reference into call-by-value,
because the assignment copies the values. Otherwise a function is free to
do in-place modifications of C<@_> and change its caller's values.

    upcase_in($v1, $v2);  # this changes $v1 and $v2
    sub upcase_in {
        for (@_) { tr/a-z/A-Z/ }
    }

You aren't allowed to modify constants in this way, of course. If an
argument were actually literal and you tried to change it, you'd take a
(presumably fatal) exception. For example, this won't work:

    upcase_in("frederick");

It would be much safer if the C<upcase_in()> function
were written to return a copy of its parameters instead
of changing them in place:

    ($v3, $v4) = upcase($v1, $v2);  # this doesn't
    sub upcase {
        return unless defined wantarray;  # void context, do nothing
        my @parms = @_;
        for (@parms) { tr/a-z/A-Z/ }
        return wantarray ? @parms : $parms[0];
    }

Notice how this (unprototyped) function doesn't care whether it was passed
real scalars or arrays. Perl will see everything as one big long flat C<@_>
parameter list. This is one area where Perl's simple
argument-passing style shines. The C<upcase()> function would work perfectly
well without changing the C<upcase()> definition even if we fed it things
like this:

    @newlist   = upcase(@list1, @list2);
    @newlist   = upcase( split /:/, $var );

Do not, however, be tempted to do this:

    (@a, @b)   = upcase(@list1, @list2);

Like the flat incoming parameter list, the return list is also flat.
So all you have managed to do here is store everything in C<@a> and
make C<@b> an empty list. See L<"Pass by Reference"> for alternatives.

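The pitfall can be checked with a small sketch. The subs C<upcase_all>
and C<upcase_each> are hypothetical stand-ins (they use C<uc> rather
than the C<tr> shown earlier); the second returns array references,
which is one way to keep the groups apart.

```perl
use strict;
use warnings;

# Hypothetical stand-in that returns one flat list.
sub upcase_all { return map { uc $_ } @_ }

my @list1 = qw(a b);
my @list2 = qw(c d e);

# The return list is flat, so the first array swallows everything.
my (@x, @y);
(@x, @y) = upcase_all(@list1, @list2);   # @x gets 5 elements, @y none

# Returning one array reference per input keeps the groups apart.
sub upcase_each {
    return map { [ map { uc $_ } @$_ ] } @_;
}
my ($first, $second) = upcase_each(\@list1, \@list2);

print scalar(@x), " ", scalar(@y), " | @$first | @$second\n";
```
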
A subroutine may be called using the "C<&>" prefix. The "C<&>" is optional
in modern Perls, and so are the parentheses if the subroutine has been
predeclared. (Note, however, that the "C<&>" is I<NOT> optional when
you're just naming the subroutine, such as when it's used as an
argument to C<defined()> or C<undef()>. Nor is it optional when you want to
do an indirect subroutine call with a subroutine name or reference
using the C<&$subref()> or C<&{$subref}()> constructs. See L<perlref>
for more on that.)

Subroutines may be called recursively. If a subroutine is called using
the "C<&>" form, the argument list is optional, and if omitted, no C<@_> array is
set up for the subroutine: the C<@_> array at the time of the call is
visible to the subroutine instead. This is an efficiency mechanism that
new users may wish to avoid.

    &foo(1,2,3);        # pass three arguments
    foo(1,2,3);         # the same

    foo();              # pass a null list
    &foo();             # the same

    &foo;               # foo() gets current args, like foo(@_) !!
    foo;                # like foo() IFF sub foo predeclared, else "foo"

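The difference between these call forms can be observed with a small
sketch. The subs C<count_args> and C<wrapper> are hypothetical and
exist only to count what the callee receives.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

sub wrapper {
    my $shared = &count_args;     # no argument list: callee sees wrapper's @_
    my $empty  = &count_args();   # explicit null list
    my $plain  = count_args();    # the same as the line above
    return ($shared, $empty, $plain);
}

my @counts = wrapper('a', 'b', 'c');
print "@counts\n";                # prints "3 0 0"
```
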
Not only does the "C<&>" form make the argument list optional, but it also
disables any prototype checking on the arguments you do provide. This
is partly for historical reasons, and partly for having a convenient way
to cheat if you know what you're doing. See the section on Prototypes below.

Functions whose names are in all upper case are reserved to the Perl core,
just as are modules whose names are in all lower case. A function in
all capitals is a loosely-held convention meaning it will be called
indirectly by the run-time system itself. Functions that do special,
pre-defined things are C<BEGIN>, C<END>, C<AUTOLOAD>, and C<DESTROY>--plus all the
functions mentioned in L<perltie>. The 5.005 release adds C<INIT>
to this list.

=head2 Private Variables via my()

Synopsis:

    my $foo;            # declare $foo lexically local
    my (@wid, %get);    # declare list of variables local
    my $foo = "flurp";  # declare $foo lexical, and init it
    my @oof = @bar;     # declare @oof lexical, and init it

A "C<my>" declares the listed variables to be confined (lexically) to the
enclosing block, conditional (C<if/unless/elsif/else>), loop
(C<for/foreach/while/until/continue>), subroutine, C<eval>, or
C<do/require/use>'d file. If more than one value is listed, the list
must be placed in parentheses. All listed elements must be legal lvalues.
Only alphanumeric identifiers may be lexically scoped--magical
builtins like C<$/> must currently be C<local>ized with "C<local>" instead.

Unlike dynamic variables created by the "C<local>" operator, lexical
variables declared with "C<my>" are totally hidden from the outside world,
including any called subroutines (even if it's the same subroutine called
from itself or elsewhere--every call gets its own copy).

This doesn't mean that a C<my()> variable declared in a statically
I<enclosing> lexical scope would be invisible. Only the dynamic scopes
are cut off. For example, the C<bumpx()> function below has access to the
lexical C<$x> variable because both the my and the sub occurred at the same
scope, presumably the file scope.

    my $x = 10;
    sub bumpx { $x++ }

(An C<eval()>, however, can see the lexical variables of the scope it is
being evaluated in so long as the names aren't hidden by declarations within
the C<eval()> itself. See L<perlref>.)

The parameter list to C<my()> may be assigned to if desired, which allows you
to initialize your variables. (If no initializer is given for a
particular variable, it is created with the undefined value.) Commonly
this is used to name the parameters to a subroutine. Examples:

    $arg = "fred";        # "global" variable
    $n = cube_root(27);
    print "$arg thinks the root is $n\n";
 fred thinks the root is 3

    sub cube_root {
        my $arg = shift;  # name doesn't matter
        $arg **= 1/3;
        return $arg;
    }

The "C<my>" is simply a modifier on something you might assign to. So when
you do assign to the variables in its argument list, the "C<my>" doesn't
change whether those variables are viewed as a scalar or an array. So

    my ($foo) = <STDIN>;                # WRONG?
    my @FOO = <STDIN>;

both supply a list context to the right-hand side, while

    my $foo = <STDIN>;

supplies a scalar context. But the following declares only one variable:

    my $foo, $bar = 1;                  # WRONG

That has the same effect as

    my $foo;
    $bar = 1;

The declared variable is not introduced (is not visible) until after
the current statement. Thus,

    my $x = $x;

can be used to initialize the new C<$x> with the value of the old C<$x>, and
the expression

    my $x = 123 and $x == 123

is false unless the old C<$x> happened to have the value C<123>.

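A small sketch of this rule, using hypothetical variable names: on the
right-hand side of the inner declaration, C<$x> still means the outer
variable.

```perl
use strict;
use warnings;

my $x = 10;
my $inner;
{
    # The $x on the right is still the outer variable; the new $x
    # only becomes visible in the following statement.
    my $x = $x + 1;
    $inner = $x;        # 11
}
my $after = $x;         # 10, the outer $x was never changed

print "$inner $after\n";
```
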
Lexical scopes of control structures are not bounded precisely by the
braces that delimit their controlled blocks; control expressions are
part of the scope, too. Thus in the loop

    while (defined(my $line = <>)) {
        $line = lc $line;
    } continue {
        print $line;
    }

the scope of C<$line> extends from its declaration throughout the rest of
the loop construct (including the C<continue> clause), but not beyond
it. Similarly, in the conditional

    if ((my $answer = <STDIN>) =~ /^yes$/i) {
        user_agrees();
    } elsif ($answer =~ /^no$/i) {
        user_disagrees();
    } else {
        chomp $answer;
        die "'$answer' is neither 'yes' nor 'no'";
    }

the scope of C<$answer> extends from its declaration throughout the rest
of the conditional (including C<elsif> and C<else> clauses, if any),
but not beyond it.

(None of the foregoing applies to C<if/unless> or C<while/until>
modifiers appended to simple statements. Such modifiers are not
control structures and have no effect on scoping.)

The C<foreach> loop defaults to scoping its index variable dynamically
(in the manner of C<local>; see below). However, if the index
variable is prefixed with the keyword "C<my>", then it is lexically
scoped instead. Thus in the loop

    for my $i (1, 2, 3) {
        some_function();
    }

the scope of C<$i> extends to the end of the loop, but not beyond it, and
so the value of C<$i> is unavailable in C<some_function()>.

Some users may wish to encourage the use of lexically scoped variables.
As an aid to catching implicit references to package variables,
if you say

    use strict 'vars';

then any variable reference from there to the end of the enclosing
block must either refer to a lexical variable, or must be fully
qualified with the package name. A compilation error results
otherwise. An inner block may countermand this with S<"C<no strict 'vars'>">.

A C<my()> has both a compile-time and a run-time effect. At compile time,
the compiler takes notice of it; the principal usefulness of this is to
quiet S<"C<use strict 'vars'>">. The actual initialization is delayed until
run time, so it gets executed appropriately; every time through a loop,
for example.

Variables declared with "C<my>" are not part of any package and are therefore
never fully qualified with the package name. In particular, you're not
allowed to try to make a package variable (or other global) lexical:

    my $pack::var;      # ERROR!  Illegal syntax
    my $_;              # also illegal (currently)

In fact, dynamic variables (also known as package or global variables)
are still accessible using the fully qualified C<::> notation even while a
lexical of the same name is also visible:

    package main;
    local $x = 10;
    my    $x = 20;
    print "$x and $::x\n";

That will print out C<20> and C<10>.

You may declare "C<my>" variables at the outermost scope of a file to hide
any such identifiers totally from the outside world. This is similar
to C's static variables at the file level. To do this with a subroutine
requires the use of a closure (anonymous function with lexical access).
If a block (such as an C<eval()>, function, or C<package>) wants to create
a private subroutine that cannot be called from outside that block,
it can declare a lexical variable containing an anonymous sub reference:

    my $secret_version = '1.001-beta';
    my $secret_sub = sub { print $secret_version };
    &$secret_sub();

As long as the reference is never returned by any function within the
module, no outside module can see the subroutine, because its name is not in
any package's symbol table. Remember that it's not I<REALLY> called
C<$some_pack::secret_version> or anything; it's just C<$secret_version>,
unqualified and unqualifiable.

This does not work with object methods, however; all object methods have
to be in the symbol table of some package to be found.

=head2 Persistent Private Variables

Just because a lexical variable is lexically (also called statically)
scoped to its enclosing block, C<eval>, or C<do> FILE, this doesn't mean that
within a function it works like a C static. It normally works more
like a C auto, but with implicit garbage collection.

Unlike local variables in C or C++, Perl's lexical variables don't
necessarily get recycled just because their scope has exited.
If something more permanent is still aware of the lexical, it will
stick around. So long as something else references a lexical, that
lexical won't be freed--which is as it should be. You wouldn't want
memory being freed until you were done using it, or kept around once you
were done. Automatic garbage collection takes care of this for you.

This means that you can pass back or save away references to lexical
variables, whereas to return a pointer to a C auto is a grave error.
It also gives us a way to simulate C's function statics. Here's a
mechanism for giving a function private variables with both lexical
scoping and a static lifetime. If you do want to create something like
C's static variables, just enclose the whole function in an extra block,
and put the static variable outside the function but in the block.

    {
        my $secret_val = 0;
        sub gimme_another {
            return ++$secret_val;
        }
    }
    # $secret_val now becomes unreachable by the outside
    # world, but retains its value between calls to gimme_another

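The same lifetime rule is what makes closures useful. As a sketch under
the assumption of a hypothetical generator sub named C<make_counter>,
each call creates a fresh lexical that stays alive for as long as the
returned anonymous sub refers to it:

```perl
use strict;
use warnings;

# Hypothetical generator: every call gets its own private $count.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

my $from_five = make_counter(5);
my $from_zero = make_counter(0);

my @seen;
push @seen, $from_five->();   # 5
push @seen, $from_five->();   # 6
push @seen, $from_zero->();   # 0, an independent counter
push @seen, $from_five->();   # 7

print "@seen\n";              # prints "5 6 0 7"
```

Unlike the block-and-named-sub form above, this gives every counter its
own copy of the private variable.
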
If this function is being sourced in from a separate file
via C<require> or C<use>, then this is probably just fine. If it's
all in the main program, you'll need to arrange for the C<my()>
to be executed early, either by putting the whole block above
your main program, or more likely, placing merely a C<BEGIN>
sub around it to make sure it gets executed before your program
starts to run:

    sub BEGIN {
        my $secret_val = 0;
        sub gimme_another {
            return ++$secret_val;
        }
    }

See L<perlmod/"Package Constructors and Destructors"> about the C<BEGIN> function.

If declared at the outermost scope, the file scope, then lexicals work
somewhat like C's file statics. They are available to all functions in
that same file declared below them, but are inaccessible from outside of
the file. This is sometimes used in modules to create private variables
for the whole module.

=head2 Temporary Values via local()

B<NOTE>: In general, you should be using "C<my>" instead of "C<local>", because
it's faster and safer. Exceptions to this include the global punctuation
variables, filehandles and formats, and direct manipulation of the Perl
symbol table itself. Format variables often use "C<local>" though, as do
other variables whose current value must be visible to called
subroutines.

Synopsis:

    local $foo;                 # declare $foo dynamically local
    local (@wid, %get);         # declare list of variables local
    local $foo = "flurp";       # declare $foo dynamic, and init it
    local @oof = @bar;          # declare @oof dynamic, and init it

    local *FH;                  # localize $FH, @FH, %FH, &FH  ...
    local *merlyn = *randal;    # now $merlyn is really $randal, plus
                                #     @merlyn is really @randal, etc
    local *merlyn = 'randal';   # SAME THING: promote 'randal' to *randal
    local *merlyn = \$randal;   # just alias $merlyn, not @merlyn etc

A C<local()> modifies its listed variables to be "local" to the enclosing
block, C<eval>, or C<do FILE>--and to I<any subroutine called from within that block>.
A C<local()> just gives temporary values to global (meaning package)
variables. It does B<not> create a local variable. This is known as
dynamic scoping. Lexical scoping is done with "C<my>", which works more
like C's auto declarations.

If more than one variable is given to C<local()>, they must be placed in
parentheses. All listed elements must be legal lvalues. This operator works
by saving the current values of those variables in its argument list on a
hidden stack and restoring them upon exiting the block, subroutine, or
eval. This means that called subroutines can also reference the local
variable, but not the global one. The argument list may be assigned to if
desired, which allows you to initialize your local variables. (If no
initializer is given for a particular variable, it is created with an
undefined value.) Commonly this is used to name the parameters to a
subroutine. Examples:

    for $i ( 0 .. 9 ) {
        $digits{$i} = $i;
    }
    # assume this function uses global %digits hash
    parse_num();

    # now temporarily add to %digits hash
    if ($base12) {
        # (NOTE: not claiming this is efficient!)
        local %digits  = (%digits, 't' => 10, 'e' => 11);
        parse_num();  # parse_num gets this new %digits!
    }
    # old %digits restored here

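The contrast between dynamic and lexical scoping can be made concrete
with a short sketch. The package variable C<$level> and the three subs
are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

our $level = "global";

sub show_level { return $level }      # reads the package variable

sub with_local {
    local $level = "temporary";       # old value saved, restored on exit
    return show_level();              # the called sub sees "temporary"
}

sub with_my {
    my $level = "lexical";            # a new, unrelated lexical variable
    return show_level();              # the called sub still sees "global"
}

my $via_local = with_local();
my $via_my    = with_my();
my $afterward = show_level();         # "global" again

print "$via_local $via_my $afterward\n";
```
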
Because C<local()> is a run-time command, it gets executed every time
through a loop. In releases of Perl previous to 5.0, this used more stack
storage each time until the loop was exited. Perl now reclaims the space
each time through, but it's still more efficient to declare your variables
outside the loop.

A C<local> is simply a modifier on an lvalue expression. When you assign to
a C<local>ized variable, the C<local> doesn't change whether its list is viewed
as a scalar or an array. So

    local($foo) = <STDIN>;
    local @FOO = <STDIN>;

both supply a list context to the right-hand side, while

    local $foo = <STDIN>;

supplies a scalar context.

A note about C<local()> and composite types is in order. Something
like C<local(%foo)> works by temporarily placing a brand new hash in
the symbol table. The old hash is left alone, but is hidden "behind"
the new one.

This means the old variable is completely invisible via the symbol
table (i.e. the hash entry in the C<*foo> typeglob) for the duration
of the dynamic scope within which the C<local()> was seen. This
has the effect of allowing one to temporarily occlude any magic on
composite types. For instance, this will briefly alter a tied
hash to some other implementation:

    tie %ahash, 'APackage';
    [...]
    {
       local %ahash;
       tie %ahash, 'BPackage';
       [..called code will see %ahash tied to 'BPackage'..]
       {
          local %ahash;
          [..%ahash is a normal (untied) hash here..]
       }
    }
    [..%ahash back to its initial tied self again..]

As another example, a custom implementation of C<%ENV> might look
like this:

    {
        local %ENV;
        tie %ENV, 'MyOwnEnv';
        [..do your own fancy %ENV manipulation here..]
    }
    [..normal %ENV behavior here..]

It's also worth taking a moment to explain what happens when you
C<local>ize a member of a composite type (i.e. an array or hash element).
In this case, the element is C<local>ized I<by name>. This means that
when the scope of the C<local()> ends, the saved value will be
restored to the hash element whose key was named in the C<local()>, or
the array element whose index was named in the C<local()>. If that
element was deleted while the C<local()> was in effect (e.g. by a
C<delete()> from a hash or a C<shift()> of an array), it will spring
back into existence, possibly extending an array and filling in the
skipped elements with C<undef>. For instance, if you say

    %hash = ( 'This' => 'is', 'a' => 'test' );
    @ary  = ( 0..5 );
    {
         local($ary[5]) = 6;
         local($hash{'a'}) = 'drill';
         while (my $e = pop(@ary)) {
             print "$e . . .\n";
             last unless $e > 3;
         }
         if (@ary) {
             $hash{'only a'} = 'test';
             delete $hash{'a'};
         }
    }
    print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n";
    print "The array has ",scalar(@ary)," elements: ",
          join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n";

Perl will print

    6 . . .
    4 . . .
    3 . . .
    This is a test only a test.
    The array has 6 elements: 0, 1, 2, undef, undef, 5

Note also that when you C<local>ize a member of a composite type that
B<did not exist previously>, the value is treated as though it were
in an lvalue context, i.e., it is first created and then C<local>ized.
The consequence of this is that the hash or array is in fact permanently
modified. For instance, if you say

    %hash = ( 'This' => 'is', 'a' => 'test' );
    @ary  = ( 0..5 );
    {
        local($ary[8]) = 0;
        local($hash{'b'}) = 'whatever';
    }
    printf "%%hash has now %d keys, \@ary %d elements.\n",
        scalar(keys(%hash)), scalar(@ary);

Perl will print

    %hash has now 3 keys, @ary 9 elements.

The above behavior of C<local()> on non-existent members of composite
types is subject to change in the future.

=head2 Passing Symbol Table Entries (typeglobs)

[Note: The mechanism described in this section was originally the only
way to simulate pass-by-reference in older versions of Perl. While it
still works fine in modern versions, the new reference mechanism is
generally easier to work with. See below.]

Sometimes you don't want to pass the value of an array to a subroutine
but rather the name of it, so that the subroutine can modify the global
copy of it rather than working with a local copy. In Perl you can
refer to all objects of a particular name by prefixing the name
with a star: C<*foo>. This is often known as a "typeglob", because the
star on the front can be thought of as a wildcard match for all the
funny prefix characters on variables and subroutines and such.

When evaluated, the typeglob produces a scalar value that represents
all the objects of that name, including any filehandle, format, or
subroutine. When assigned to, it causes the name mentioned to refer to
whatever "C<*>" value was assigned to it. Example:

    sub doubleary {
        local(*someary) = @_;
        foreach $elem (@someary) {
            $elem *= 2;
        }
    }
    doubleary(*foo);
    doubleary(*bar);

Note that scalars are already passed by reference, so you can modify
scalar arguments without using this mechanism by referring explicitly
to C<$_[0]> etc. You can modify all the elements of an array by passing
all the elements as scalars, but you have to use the C<*> mechanism (or
the equivalent reference mechanism) to C<push>, C<pop>, or change the size of
an array. It will certainly be faster to pass the typeglob (or reference).

Even if you don't want to modify an array, this mechanism is useful for
passing multiple arrays in a single LIST, because normally the LIST
mechanism will merge all the array values so that you can't extract out
the individual arrays. For more on typeglobs, see
L<perldata/"Typeglobs and Filehandles">.

=head2 When to Still Use local()

Despite the existence of C<my()>, there are still three places where the
C<local()> operator shines. In fact, in these three places, you
I<must> use C<local> instead of C<my>.

=over

=item 1. You need to give a global variable a temporary value, especially C<$_>.

The global variables, like C<@ARGV> or the punctuation variables, must be
C<local>ized with C<local()>. This block reads in F</etc/motd>, and splits
it up into chunks separated by lines of equal signs, which are placed
in C<@Fields>.

    {
        local @ARGV = ("/etc/motd");
        local $/ = undef;
        local $_ = <>;
        @Fields = split /^\s*=+\s*$/;
    }

In particular, it's important to C<local>ize C<$_> in any routine that assigns
to it. Look out for implicit assignments in C<while> conditionals.

=item 2. You need to create a local file or directory handle or a local function.

A function that needs a filehandle of its own must use C<local()>
on a complete typeglob. This can be used to create new symbol
table entries:

    sub ioqueue {
        local  (*READER, *WRITER);    # not my!
        pipe    (READER,  WRITER)     or die "pipe: $!";
        return (*READER, *WRITER);
    }
    ($head, $tail) = ioqueue();

See the Symbol module for a way to create anonymous symbol table
entries.

Because assignment of a reference to a typeglob creates an alias, this
can be used to create what is effectively a local function, or at least,
a local alias.

    {
        local *grow = \&shrink; # only until this block exits
        grow();                 # really calls shrink()
        move();                 # if move() grow()s, it shrink()s too
    }
    grow();                     # get the real grow() again

See L<perlref/"Function Templates"> for more about manipulating
functions by name in this way.

=item 3. You want to temporarily change just one element of an array or hash.

You can C<local>ize just one element of an aggregate. Usually this
is done on dynamics:

    {
        local $SIG{INT} = 'IGNORE';
        funct();                            # uninterruptible
    }
    # interruptibility automatically restored here

But it also works on lexically declared aggregates. Prior to 5.005,
this operation could on occasion misbehave.

=back

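The third case in the list above can be sketched with an ordinary
package hash. The hash C<%config> and the sub C<mode> are hypothetical,
introduced only to show that one element is replaced and then restored
while the rest of the hash is untouched.

```perl
use strict;
use warnings;

our %config = (verbose => 0, colour => 'auto');

sub mode { return $config{verbose} ? "loud" : "quiet" }

my $inside;
{
    local $config{verbose} = 1;   # only this element is saved and replaced
    $inside = mode();             # "loud": called subs see the new value
}
my $outside = mode();             # "quiet": the old value is back

print "$inside $outside $config{colour}\n";
```
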
=head2 Pass by Reference

If you want to pass more than one array or hash into a function--or
return them from it--and have them maintain their integrity, then
you're going to have to use an explicit pass-by-reference. Before you
do that, you need to understand references as detailed in L<perlref>.
This section may not make much sense to you otherwise.

Here are a few simple examples. First, let's pass in several
arrays to a function and have it C<pop> all of them, returning a new
list of all their former last elements:

    @tailings = popmany ( \@a, \@b, \@c, \@d );

    sub popmany {
        my $aref;
        my @retlist = ();
        foreach $aref ( @_ ) {
            push @retlist, pop @$aref;
        }
        return @retlist;
    }

Here's how you might write a function that returns a
list of keys occurring in all the hashes passed to it:

    @common = inter( \%foo, \%bar, \%joe );
    sub inter {
        my ($k, $href, %seen); # locals
        foreach $href (@_) {
            while ( $k = each %$href ) {
                $seen{$k}++;
            }
        }
        return grep { $seen{$_} == @_ } keys %seen;
    }

So far, we're using just the normal list return mechanism.
What happens if you want to pass or return a hash? Well,
if you're using only one of them, or you don't mind them
concatenating, then the normal calling convention is ok, although
a little expensive.

Where people get into trouble is here:

    (@a, @b) = func(@c, @d);
or
    (%a, %b) = func(%c, %d);

That syntax simply won't work. It sets just C<@a> or C<%a> and clears the C<@b> or
C<%b>. Plus the function didn't get passed two separate arrays or
hashes: it got one long list in C<@_>, as always.

If you can arrange for everyone to deal with this through references, it's
cleaner code, although not so nice to look at. Here's a function that
takes two array references as arguments, returning the two array references
in order of how many elements they have in them:

    ($aref, $bref) = func(\@c, \@d);
    print "@$aref has more than @$bref\n";
    sub func {
        my ($cref, $dref) = @_;
        if (@$cref > @$dref) {
            return ($cref, $dref);
        } else {
            return ($dref, $cref);
        }
    }

It turns out that you can actually do this also:

    (*a, *b) = func(\@c, \@d);
    print "@a has more than @b\n";
    sub func {
        local (*c, *d) = @_;
        if (@c > @d) {
            return (\@c, \@d);
        } else {
            return (\@d, \@c);
        }
    }

Here we're using the typeglobs to do symbol table aliasing. It's
a tad subtle, though, and also won't work if you're using C<my()>
variables, because only globals (well, and C<local()>s) are in the symbol table.

If you're passing around filehandles, you could usually just use the bare
typeglob, like C<*STDOUT>, but typeglob references would be better because
they'll still work properly under S<C<use strict 'refs'>>. For example:

    splutter(\*STDOUT);
    sub splutter {
        my $fh = shift;
        print $fh "her um well a hmmm\n";
    }

    $rec = get_rec(\*STDIN);
    sub get_rec {
        my $fh = shift;
        return scalar <$fh>;
    }

Another way to do this is using C<*HANDLE{IO}>; see L<perlref> for usage
and caveats.
|
|
|
|
If you're planning on generating new filehandles, you could do this:
|
|
|
|
sub openit {
|
|
my $path = shift;
|
|
local *FH;
|
|
return open (FH, $path) ? *FH : undef;
|
|
}
|
|
|
|
Although that will actually produce a small memory leak. See the bottom
|
|
of L<perlfunc/open()> for a somewhat cleaner way using the C<IO::Handle>
|
|
package.
|
|
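As a rough sketch of the object-based style, and not necessarily the exact code shown in L<perlfunc>, the following uses C<IO::File>, a subclass of C<IO::Handle>. The name C<openit_io()> is hypothetical:

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical variant of openit(): returns an IO::File object,
# or undef if the file can't be opened for reading.
sub openit_io {
    my $path = shift;
    return IO::File->new($path, 'r');
}

my $fh = openit_io($0);    # try the running script itself
if (defined $fh) {
    my $first_line = <$fh>;
    $fh->close;
}
```

Because the handle is an ordinary object held in a lexical, no glob has to be localized and handed back.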
|
|
=head2 Prototypes
|
|
|
|
As of the 5.002 release of perl, if you declare
|
|
|
|
sub mypush (\@@)
|
|
|
|
then C<mypush()> takes arguments exactly like C<push()> does. The declaration
|
|
of the function to be called must be visible at compile time. The prototype
|
|
affects only the interpretation of new-style calls to the function, where
|
|
new-style is defined as not using the C<&> character. In other words,
|
|
if you call it like a builtin function, then it behaves like a builtin
|
|
function. If you call it like an old-fashioned subroutine, then it
|
|
behaves like an old-fashioned subroutine. It naturally falls out from
|
|
this rule that prototypes have no influence on subroutine references
|
|
like C<\&foo> or on indirect subroutine calls like C<&{$subref}> or
|
|
C<$subref-E<gt>()>.
|
|
|
|
Method calls are not influenced by prototypes either, because the
function to be called is indeterminate at compile time: it depends
on inheritance.
|
|
|
|
Because the intent is primarily to let you define subroutines that work
|
|
like builtin commands, here are the prototypes for some other functions
|
|
that parse almost exactly like the corresponding builtins.
|
|
|
|
    Declared as               Called as

    sub mylink ($$)           mylink $old, $new
    sub myvec ($$$)           myvec $var, $offset, 1
    sub myindex ($$;$)        myindex &getstring, "substr"
    sub mysyswrite ($$$;$)    mysyswrite $buf, 0, length($buf) - $off, $off
    sub myreverse (@)         myreverse $a, $b, $c
    sub myjoin ($@)           myjoin ":", $a, $b, $c
    sub mypop (\@)            mypop @array
    sub mysplice (\@$$@)      mysplice @array, @array, 0, @pushme
    sub mykeys (\%)           mykeys %{$hashref}
    sub myopen (*;$)          myopen HANDLE, $name
    sub mypipe (**)           mypipe READHANDLE, WRITEHANDLE
    sub mygrep (&@)           mygrep { /foo/ } $a, $b, $c
    sub myrand ($)            myrand 42
    sub mytime ()             mytime
|
|
|
|
Any backslashed prototype character represents an actual argument
|
|
that absolutely must start with that character. The value passed
|
|
to the subroutine (as part of C<@_>) will be a reference to the
|
|
actual argument given in the subroutine call, obtained by applying
|
|
C<\> to that argument.
|
|
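Here is one way a C<mypush()> body might look, as a sketch only. What matters is that the C<\@> slot shows up in C<@_> as an array reference even though the caller wrote a plain array:

```perl
use strict;
use warnings;

# Hypothetical mypush(): the \@ slot arrives as an array reference.
sub mypush (\@@) {
    my ($aref, @rest) = @_;
    push @$aref, @rest;
    return scalar @$aref;
}

my @stack = (1, 2);
my $len = mypush @stack, 3, 4;    # called like the builtin, no backslash
print "@stack ($len)\n";          # prints "1 2 3 4 (4)"
```

Note that the definition appears before the call, so the prototype is visible at compile time.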
|
|
Unbackslashed prototype characters have special meanings. Any
|
|
unbackslashed C<@> or C<%> eats all the rest of the arguments, and forces
|
|
list context. An argument represented by C<$> forces scalar context. An
|
|
C<&> requires an anonymous subroutine, which, if passed as the first
|
|
argument, does not require the "C<sub>" keyword or a subsequent comma. A
|
|
C<*> allows the subroutine to accept a bareword, constant, scalar expression,
|
|
typeglob, or a reference to a typeglob in that slot. The value will be
|
|
available to the subroutine either as a simple scalar, or (in the latter
|
|
two cases) as a reference to the typeglob.
|
|
|
|
A semicolon separates mandatory arguments from optional arguments.
|
|
(It is redundant before C<@> or C<%>.)
|
|
|
|
Note how the last three examples above are treated specially by the parser.
|
|
C<mygrep()> is parsed as a true list operator, C<myrand()> is parsed as a
|
|
true unary operator with unary precedence the same as C<rand()>, and
|
|
C<mytime()> is truly without arguments, just like C<time()>. That is, if you
|
|
say
|
|
|
|
mytime +2;
|
|
|
|
you'll get C<mytime() + 2>, not C<mytime(2)>, which is how it would be parsed
|
|
without the prototype.
|
|
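The difference is easy to check with a pair of stand-in functions. Both names below are made up for this sketch, and C<mytime()> simply returns a fixed number rather than the real time:

```perl
use strict;
use warnings;

# Hypothetical stand-ins; neither is a real builtin.
sub mytime () { 100 }                      # takes no arguments at all
sub plain     { return @_ ? $_[0] : 100 }  # no prototype: ordinary list operator

my $with_proto    = mytime +2;   # parsed as mytime() + 2
my $without_proto = plain +2;    # parsed as plain(+2)

print "$with_proto $without_proto\n";   # prints "102 2"
```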
|
|
The interesting thing about C<&> is that you can generate new syntax with it:
|
|
|
|
sub try (&@) {
|
|
my($try,$catch) = @_;
|
|
eval { &$try };
|
|
if ($@) {
|
|
local $_ = $@;
|
|
&$catch;
|
|
}
|
|
}
|
|
sub catch (&) { $_[0] }
|
|
|
|
try {
|
|
die "phooey";
|
|
} catch {
|
|
/phooey/ and print "unphooey\n";
|
|
};
|
|
|
|
That prints C<"unphooey">. (Yes, there are still unresolved
|
|
issues having to do with the visibility of C<@_>. I'm ignoring that
|
|
question for the moment. (But note that if we make C<@_> lexically
|
|
scoped, those anonymous subroutines can act like closures... (Gee,
|
|
is this sounding a little Lispish? (Never mind.))))
|
|
|
|
And here's a reimplementation of C<grep>:
|
|
|
|
sub mygrep (&@) {
|
|
my $code = shift;
|
|
my @result;
|
|
foreach $_ (@_) {
|
|
push(@result, $_) if &$code;
|
|
}
|
|
@result;
|
|
}
|
|
|
|
Some folks would prefer full alphanumeric prototypes. Alphanumerics have
|
|
been intentionally left out of prototypes for the express purpose of
|
|
someday in the future adding named, formal parameters. The current
|
|
mechanism's main goal is to let module writers provide better diagnostics
|
|
for module users. Larry feels the notation quite understandable to Perl
|
|
programmers, and that it will not intrude greatly upon the meat of the
|
|
module, nor make it harder to read. The line noise is visually
|
|
encapsulated into a small pill that's easy to swallow.
|
|
|
|
It's probably best to prototype new functions, not retrofit prototyping
|
|
into older ones. That's because you must be especially careful about
|
|
silent impositions of differing list versus scalar contexts. For example,
|
|
if you decide that a function should take just one parameter, like this:
|
|
|
|
sub func ($) {
|
|
my $n = shift;
|
|
print "you gave me $n\n";
|
|
}
|
|
|
|
and someone has been calling it with an array or expression
|
|
returning a list:
|
|
|
|
func(@foo);
|
|
func( split /:/ );
|
|
|
|
Then you've just supplied an automatic C<scalar()> in front of their
|
|
argument, which can be more than a bit surprising. The old C<@foo>
|
|
which used to hold one thing doesn't get passed in. Instead,
|
|
the C<func()> now gets passed in C<1>, that is, the number of elements
|
|
in C<@foo>. And the C<split()> gets called in a scalar context and
|
|
starts scribbling on your C<@_> parameter list.
|
|
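A small sketch makes the surprise concrete. The two functions are hypothetical and differ only in whether they carry a C<($)> prototype:

```perl
use strict;
use warnings;

sub one_arg ($) { return $_[0] }   # prototype forces scalar context
sub any_args    { return $_[0] }   # no prototype: plain list context

my @foo = ('only');

my $got_count = one_arg(@foo);     # @foo evaluated in scalar context
my $got_elem  = any_args(@foo);    # @foo flattened into @_

print "$got_count $got_elem\n";    # prints "1 only"
```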
|
|
This is all very powerful, of course, and should be used only in moderation
|
|
to make the world a better place.
|
|
|
|
=head2 Constant Functions
|
|
|
|
Functions with a prototype of C<()> are potential candidates for
|
|
inlining. If the result after optimization and constant folding is
|
|
either a constant or a lexically-scoped scalar which has no other
|
|
references, then it will be used in place of function calls made
|
|
without C<&> or C<do>. Calls made using C<&> or C<do> are never
|
|
inlined. (See F<constant.pm> for an easy way to declare most
|
|
constants.)
|
|
|
|
The following functions would all be inlined:
|
|
|
|
sub pi () { 3.14159 } # Not exact, but close.
|
|
sub PI () { 4 * atan2 1, 1 } # As good as it gets,
|
|
# and it's inlined, too!
|
|
sub ST_DEV () { 0 }
|
|
sub ST_INO () { 1 }
|
|
|
|
sub FLAG_FOO () { 1 << 8 }
|
|
sub FLAG_BAR () { 1 << 9 }
|
|
sub FLAG_MASK () { FLAG_FOO | FLAG_BAR }
|
|
|
|
sub OPT_BAZ () { not (0x1B58 & FLAG_MASK) }
|
|
sub BAZ_VAL () {
|
|
if (OPT_BAZ) {
|
|
return 23;
|
|
}
|
|
else {
|
|
return 42;
|
|
}
|
|
}
|
|
|
|
sub N () { int(BAZ_VAL) / 3 }
|
|
BEGIN {
|
|
my $prod = 1;
|
|
for (1..N) { $prod *= $_ }
|
|
sub N_FACTORIAL () { $prod }
|
|
}
|
|
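For comparison, some of the same constants can be declared with the C<constant> pragma mentioned earlier. This is only a sketch of the simple one-name-at-a-time form:

```perl
use strict;
use warnings;

use constant PI        => 4 * atan2(1, 1);
use constant FLAG_FOO  => 1 << 8;
use constant FLAG_BAR  => 1 << 9;
use constant FLAG_MASK => FLAG_FOO | FLAG_BAR;

printf "%.5f %d\n", PI, FLAG_MASK;   # prints "3.14159 768"
```

Each C<use constant> takes effect at compile time, which is why C<FLAG_MASK> can be built from the two flags declared just above it.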
|
|
If you redefine a subroutine that was eligible for inlining, you'll get
|
|
a mandatory warning. (You can use this warning to tell whether or not a
|
|
particular subroutine is considered constant.) The warning is
|
|
considered severe enough not to be optional because previously compiled
|
|
invocations of the function will still be using the old value of the
|
|
function. If you need to be able to redefine the subroutine you need to
|
|
ensure that it isn't inlined, either by dropping the C<()> prototype
|
|
(which changes the calling semantics, so beware) or by thwarting the
|
|
inlining mechanism in some other way, such as
|
|
|
|
sub not_inlined () {
|
|
23 if $];
|
|
}
|
|
|
|
=head2 Overriding Builtin Functions
|
|
|
|
Many builtin functions may be overridden, though this should be tried
|
|
only occasionally and for good reason. Typically this might be
|
|
done by a package attempting to emulate missing builtin functionality
|
|
on a non-Unix system.
|
|
|
|
Overriding may be done only by importing the name from a
|
|
module--ordinary predeclaration isn't good enough. However, the
|
|
C<subs> pragma (compiler directive) lets you, in effect, predeclare subs
|
|
via the import syntax, and these names may then override the builtin ones:
|
|
|
|
use subs 'chdir', 'chroot', 'chmod', 'chown';
|
|
chdir $somewhere;
|
|
sub chdir { ... }
|
|
|
|
To unambiguously refer to the builtin form, one may precede the
|
|
builtin name with the special package qualifier C<CORE::>. For example,
|
|
saying C<CORE::open()> will always refer to the builtin C<open()>, even
|
|
if the current package has imported some other subroutine called
|
|
C<&open()> from elsewhere.
|
|
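Putting the two pieces together, here is a sketch of a C<chdir> override that records each request and then defers to the real builtin through C<CORE::>. The C<@visited> log is purely an assumption of this example:

```perl
use strict;
use warnings;
use subs 'chdir';

my @visited;    # hypothetical log of requested directories

sub chdir {
    my $dir = shift;
    push @visited, $dir;
    return CORE::chdir($dir);    # the real builtin does the work
}

chdir '.';            # calls the override
CORE::chdir('.');     # bypasses it entirely

print scalar(@visited), "\n";    # prints "1"
```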
|
|
Library modules should not in general export builtin names like "C<open>"
|
|
or "C<chdir>" as part of their default C<@EXPORT> list, because these may
|
|
sneak into someone else's namespace and change the semantics unexpectedly.
|
|
Instead, if the module adds the name to the C<@EXPORT_OK> list, then it's
|
|
possible for a user to import the name explicitly, but not implicitly.
|
|
That is, they could say
|
|
|
|
use Module 'open';
|
|
|
|
and it would import the C<open> override, but if they said
|
|
|
|
use Module;
|
|
|
|
they would get the default imports without the overrides.
|
|
|
|
The foregoing mechanism for overriding builtins is restricted, quite
|
|
deliberately, to the package that requests the import. There is a second
|
|
method that is sometimes applicable when you wish to override a builtin
|
|
everywhere, without regard to namespace boundaries. This is achieved by
|
|
importing a sub into the special namespace C<CORE::GLOBAL::>. Here is an
|
|
example that quite brazenly replaces the C<glob> operator with something
|
|
that understands regular expressions.
|
|
|
|
package REGlob;
|
|
require Exporter;
|
|
@ISA = 'Exporter';
|
|
@EXPORT_OK = 'glob';
|
|
|
|
sub import {
|
|
my $pkg = shift;
|
|
return unless @_;
|
|
my $sym = shift;
|
|
my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0));
|
|
$pkg->export($where, $sym, @_);
|
|
}
|
|
|
|
sub glob {
|
|
my $pat = shift;
|
|
my @got;
|
|
local(*D);
|
|
if (opendir D, '.') { @got = grep /$pat/, readdir D; closedir D; }
|
|
@got;
|
|
}
|
|
1;
|
|
|
|
And here's how it could be (ab)used:
|
|
|
|
#use REGlob 'GLOBAL_glob'; # override glob() in ALL namespaces
|
|
package Foo;
|
|
use REGlob 'glob'; # override glob() in Foo:: only
|
|
print for <^[a-z_]+\.pm\$>; # show all pragmatic modules
|
|
|
|
Note that the initial comment shows a contrived, even dangerous example.
|
|
By overriding C<glob> globally, you would be forcing the new (and
|
|
subversive) behavior for the C<glob> operator for B<every> namespace,
|
|
without the complete cognizance or cooperation of the modules that own
|
|
those namespaces. Naturally, this should be done with extreme caution--if
|
|
it must be done at all.
|
|
|
|
The C<REGlob> example above does not implement all the support needed to
|
|
cleanly override perl's C<glob> operator. The builtin C<glob> has
|
|
different behaviors depending on whether it appears in a scalar or list
|
|
context, but our C<REGlob> doesn't. Indeed, many perl builtins have such
|
|
context sensitive behaviors, and these must be adequately supported by
|
|
a properly written override. For a fully functional example of overriding
|
|
C<glob>, study the implementation of C<File::DosGlob> in the standard
|
|
library.
|
|
|
|
|
|
=head2 Autoloading
|
|
|
|
If you call a subroutine that is undefined, you would ordinarily get an
|
|
immediate fatal error complaining that the subroutine doesn't exist.
|
|
(Likewise for subroutines being used as methods, when the method
|
|
doesn't exist in any base class of the class package.) If,
|
|
however, there is an C<AUTOLOAD> subroutine defined in the package or
|
|
packages that were searched for the original subroutine, then that
|
|
C<AUTOLOAD> subroutine is called with the arguments that would have been
|
|
passed to the original subroutine. The fully qualified name of the
|
|
original subroutine magically appears in the C<$AUTOLOAD> variable in the
|
|
same package as the C<AUTOLOAD> routine. The name is not passed as an
|
|
ordinary argument because, er, well, just because, that's why...
|
|
|
|
Most C<AUTOLOAD> routines will load in a definition for the subroutine in
|
|
question using C<eval()>, and then execute that subroutine using a special
form of C<goto()> that erases the stack frame of the C<AUTOLOAD> routine
|
|
without a trace. (See the standard C<AutoLoader> module, for example.)
|
|
But an C<AUTOLOAD> routine can also just emulate the routine and never
|
|
define it. For example, let's pretend that a function that wasn't defined
|
|
should just call C<system()> with those arguments. All you'd do is this:
|
|
|
|
sub AUTOLOAD {
|
|
my $program = $AUTOLOAD;
|
|
$program =~ s/.*:://;
|
|
system($program, @_);
|
|
}
|
|
date();
|
|
who('am', 'i');
|
|
ls('-l');
|
|
|
|
In fact, if you predeclare the functions you want to call that way, you don't
|
|
even need the parentheses:
|
|
|
|
use subs qw(date who ls);
|
|
date;
|
|
who "am", "i";
|
|
ls '-l';
|
|
|
|
A more complete example of this is the standard Shell module, which
|
|
can treat undefined subroutine calls as calls to Unix programs.
|
|
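Going back to the define-then-jump style described earlier, here is a minimal sketch. It installs a code reference with a glob assignment rather than compiling source with C<eval>, and the C<%bodies> table is a made-up stand-in for wherever the real definitions would come from:

```perl
use strict;
use warnings;

our $AUTOLOAD;

# Hypothetical table of routines that have not been installed yet.
my %bodies = ( double => sub { return 2 * $_[0] } );

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';
    my $code = $bodies{$name} or die "No such subroutine: $AUTOLOAD";
    no strict 'refs';
    *{$AUTOLOAD} = $code;    # install it, so later calls skip AUTOLOAD
    goto &$code;             # replace this frame with the real routine
}

my $r = double(21);
print "$r\n";                # prints "42"
```

After the first call, C<double()> exists in the symbol table, so C<AUTOLOAD> is not consulted for it again.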
|
|
Mechanisms are available for module writers to help split the modules
up into autoloadable files. See the standard AutoLoader module
described in L<AutoLoader> and in L<AutoSplit>, the standard
SelfLoader module in L<SelfLoader>, and the document on adding C
functions to perl code in L<perlxs>.
|
|
|
|
=head1 SEE ALSO
|
|
|
|
See L<perlref> for more about references and closures. See L<perlxs> if
|
|
you'd like to learn about calling C subroutines from perl. See L<perlmod>
|
|
to learn about bundling up your functions in separate files.
|