=head1 NAME

perlmod - Perl modules (packages and symbol tables)

=head1 DESCRIPTION

=head2 Packages

Perl provides a mechanism for alternative namespaces to protect
packages from stomping on each other's variables. In fact, there's
really no such thing as a global variable in Perl. The package
statement declares the compilation unit as being in the given
namespace. The scope of the package declaration is from the
declaration itself through the end of the enclosing block, C<eval>,
or file, whichever comes first (the same scope as the my() and
local() operators). Unqualified dynamic identifiers will be in
this namespace, except for those few identifiers that, if unqualified,
default to the main package instead of the current one as described
below. A package statement affects only dynamic variables--including
those you've used local() on--but I<not> lexical variables created
with my(). Typically it would be the first declaration in a file
included by the C<do>, C<require>, or C<use> operators. You can
switch into a package in more than one place; it merely influences
which symbol table is used by the compiler for the rest of that
block. You can refer to variables and filehandles in other packages
by prefixing the identifier with the package name and a double
colon: C<$Package::Variable>. If the package name is null, the
C<main> package is assumed. That is, C<$::sail> is equivalent to
C<$main::sail>.

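As a minimal sketch of these scoping rules (the package C<Boat> and
its variables are made up for illustration), the following shows a
block-scoped package statement, the C<$::name> shorthand, and a
lexical that never enters any symbol table:

    use strict;
    use warnings;

    our $sail = 'main sail';            # $main::sail

    {
        package Boat;                   # lasts until the closing brace
        our $sail = 'boat sail';        # $Boat::sail
        my  $oar  = 'lexical';          # lexical: lives in no symbol table

        sub describe { return "$sail / $::sail / $main::sail" }
    }

    # Back in package main.
    print Boat::describe(), "\n";       # boat sail / main sail / main sail
    print "$sail\n";                    # main sail
    print exists $Boat::{oar} ? "found oar\n" : "no \$Boat::oar\n";

The last line looks in the C<%Boat::> symbol table directly and should
report that C<$oar> is not there, since my() variables are not package
variables.
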
The old package delimiter was a single quote, but double colon is now the
preferred delimiter, in part because it's more readable to humans, and
in part because it's more readable to B<emacs> macros. It also makes C++
programmers feel like they know what's going on--as opposed to using the
single quote as separator, which was there to make Ada programmers feel
like they knew what's going on. Because the old-fashioned syntax is still
supported for backwards compatibility, if you try to use a string like
C<"This is $owner's house">, you'll be accessing C<$owner::s>; that is,
the $s variable in package C<owner>, which is probably not what you meant.
Use braces to disambiguate, as in C<"This is ${owner}'s house">.

Packages may themselves contain package separators, as in
C<$OUTER::INNER::var>. This implies nothing about the order of
name lookups, however. There are no relative packages: all symbols
are either local to the current package, or must be fully qualified
from the outer package name down. For instance, there is nowhere
within package C<OUTER> that C<$INNER::var> refers to
C<$OUTER::INNER::var>. It would treat package C<INNER> as a totally
separate global package.

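The absence of relative packages can be checked with a short sketch.
The subroutine names C<relative> and C<absolute> are invented for this
example; the package names follow the text above:

    use strict;
    use warnings;

    $OUTER::INNER::var = 'nested';
    $INNER::var        = 'top-level';

    package OUTER;

    # Even inside OUTER, $INNER::var means the top-level package INNER.
    sub relative { return $INNER::var }
    sub absolute { return $OUTER::INNER::var }

    package main;

    print OUTER::relative(), "\n";      # top-level
    print OUTER::absolute(), "\n";      # nested

Only the fully qualified name reaches the variable in the nested
package.
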
Only identifiers starting with letters (or underscore) are stored
in a package's symbol table. All other symbols are kept in package
C<main>, including all punctuation variables, like $_. In addition,
when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV,
ARGVOUT, ENV, INC, and SIG are forced to be in package C<main>,
even when used for other purposes than their built-in one. If you
have a package called C<m>, C<s>, or C<y>, then you can't use the
qualified form of an identifier because it would be instead interpreted
as a pattern match, a substitution, or a transliteration.

Variables beginning with underscore used to be forced into package
main, but we decided it was more useful for package writers to be able
to use leading underscore to indicate private variables and method names.
$_ is still global though. See also L<perlvar/"Technical Note on the
Syntax of Variable Names">.

C<eval>ed strings are compiled in the package in which the eval() was
compiled. (Assignments to C<$SIG{}>, however, assume the signal
handler specified is in the C<main> package. Qualify the signal handler
name if you wish to have a signal handler in a package.) For an
example, examine F<perl5db.pl> in the Perl library. It initially switches
to the C<DB> package so that the debugger doesn't interfere with variables
in the program you are trying to debug. At various points, however, it
temporarily switches back to the C<main> package to evaluate various
expressions in the context of the C<main> package (or wherever you came
from). See L<perldebug>.

The special symbol C<__PACKAGE__> contains the current package, but cannot
(easily) be used to construct variables.

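One way to see why this is awkward, sketched with a made-up package
C<Counter>: C<__PACKAGE__> does not interpolate inside a string, so the
variable name has to be built by concatenation and then reached through
a symbolic reference, which in turn requires relaxing C<strict refs>:

    use strict;
    use warnings;

    package Counter;

    our $count = 42;

    sub report {
        # "__PACKAGE__" inside quotes would be taken literally.
        my $name = __PACKAGE__ . '::count';

        no strict 'refs';               # allow the symbolic reference
        return "$name = ${$name}";
    }

    package main;

    print Counter::report(), "\n";      # Counter::count = 42

In ordinary code, simply writing C<$count> or C<$Counter::count> is
clearer; the sketch only illustrates the limitation.
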
See L<perlsub> for other scoping issues related to my() and local(),
and L<perlref> regarding closures.

=head2 Symbol Tables

The symbol table for a package happens to be stored in the hash of that
name with two colons appended. The main symbol table's name is thus
C<%main::>, or C<%::> for short. Likewise, the symbol table for the nested
package mentioned earlier is named C<%OUTER::INNER::>.

The value in each entry of the hash is what you are referring to when you
use the C<*name> typeglob notation. In fact, the following have the same
effect, though the first is more efficient because it does the symbol
table lookups at compile time:

    local *main::foo    = *main::bar;
    local $main::{foo}  = $main::{bar};

You can use this to print out all the variables in a package, for
instance. The standard but antiquated F<dumpvar.pl> library and
the CPAN module Devel::Symdump make use of this.

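A rough sketch of such a dump follows. It is not how F<dumpvar.pl> or
Devel::Symdump are implemented; the package C<Demo> and the helper
C<list_symbols> are assumptions made up for this example. The helper
walks the keys of the package's symbol table hash and reports which
slots of each typeglob are in use:

    use strict;
    use warnings;

    package Demo;                       # hypothetical package to inspect

    our $scalar = 1;
    our @array  = (1, 2);
    our %hash   = (a => 1);
    sub routine { return }

    package main;

    # List what lives in a package's symbol table, one sigil per slot.
    sub list_symbols {
        my ($package) = @_;
        my @found;
        no strict 'refs';               # symbol-table access by name
        for my $name (sort keys %{"${package}::"}) {
            my $full = "${package}::$name";
            push @found, "\$$name" if defined ${$full};
            push @found, "\@$name" if defined *{$full}{ARRAY};
            push @found, "\%$name" if defined *{$full}{HASH};
            push @found, "\&$name" if defined &{$full};
        }
        return @found;
    }

    print join(' ', list_symbols('Demo')), "\n";

The output should include C<$scalar>, C<@array>, C<%hash>, and
C<&routine>. Scalars that exist but hold an undefined value are not
reported by this sketch.
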
Assignment to a typeglob performs an aliasing operation, i.e.,

    *dick = *richard;

causes variables, subroutines, formats, and file and directory handles
accessible via the identifier C<richard> also to be accessible via the
identifier C<dick>. If you want to alias only a particular variable or
subroutine, assign a reference instead:

    *dick = \$richard;

Which makes $richard and $dick the same variable, but leaves
@richard and @dick as separate arrays. Tricky, eh?

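Both forms of aliasing can be tried side by side. This is a small
sketch using the same identifiers as above, with values chosen only
for the demonstration:

    use strict;
    use warnings;

    our $richard = 'one';
    our @richard = (1, 2, 3);
    our ($dick, @dick);

    # Alias only the scalar slot: $dick and $richard become one variable.
    *dick = \$richard;
    $dick = 'changed';
    print "$richard\n";                 # changed
    print scalar(@dick), "\n";          # 0: @dick is still a separate array

    # Alias the whole typeglob: every slot is now shared.
    *dick = *richard;
    push @dick, 4;
    print "@richard\n";                 # 1 2 3 4

The variables are declared with C<our> because the aliasing applies to
package variables, not to lexicals.
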
This mechanism may be used to pass and return cheap references
into or from subroutines if you don't want to copy the whole
thing. It only works when assigning to dynamic variables, not
lexicals.

    %some_hash = ();                    # can't be my()
    *some_hash = fn( \%another_hash );
    sub fn {
        local *hashsym = shift;
        # now use %hashsym normally, and you
        # will affect the caller's %another_hash
        my %nhash = (); # do what you want
        return \%nhash;
    }

On return, the reference will overwrite the hash slot in the
symbol table specified by the *some_hash typeglob. This
is a somewhat tricky way of passing around references cheaply
when you don't want to have to remember to dereference variables
explicitly.

Another use of symbol tables is for making "constant" scalars.

    *PI = \3.14159265358979;

Now you cannot alter $PI, which is probably a good thing all in all.
This isn't the same as a constant subroutine, which is subject to
optimization at compile-time. This isn't. A constant subroutine is one
prototyped to take no arguments and to return a constant expression.
See L<perlsub> for details on these. The C<use constant> pragma is a
convenient shorthand for these.

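The difference between the two kinds of constant can be sketched as
follows. The constant name C<E> is chosen only for illustration, and
the C<eval> block is there to trap the run-time error raised by the
attempted assignment:

    use strict;
    use warnings;

    our $PI;
    *PI = \3.14159265358979;            # alias $PI to a read-only value

    # Assigning to the alias dies at run time.
    my $ok = eval { $PI = 3; 1 };
    print "assignment failed: $@" unless $ok;

    # A constant subroutine, here created by the constant pragma.
    use constant E => 2.718281828459045;

    printf "%.2f %.2f\n", $PI, E;       # 3.14 2.72

The error message should mention a modification of a read-only value.
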
You can say C<*foo{PACKAGE}> and C<*foo{NAME}> to find out what name and
package the *foo symbol table entry comes from. This may be useful
in a subroutine that gets passed typeglobs as arguments:

    sub identify_typeglob {
        my $glob = shift;
        print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n";
    }
    identify_typeglob *foo;
    identify_typeglob *bar::baz;

This prints

    You gave me main::foo
    You gave me bar::baz

The C<*foo{THING}> notation can also be used to obtain references to the
individual elements of *foo; see L<perlref>.

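As a brief sketch of that notation (the variable contents are invented
for the example), each slot of the typeglob yields its own reference,
and a slot that has never been used yields undef:

    use strict;
    use warnings;

    our $foo = 'scalar value';
    our @foo = (1, 2, 3);
    sub foo { return 'sub value' }

    my $scalar_ref = *foo{SCALAR};
    my $array_ref  = *foo{ARRAY};
    my $code_ref   = *foo{CODE};
    my $hash_ref   = *foo{HASH};        # undef: %foo was never used

    print $$scalar_ref, "\n";           # scalar value
    print scalar(@$array_ref), "\n";    # 3
    print $code_ref->(), "\n";          # sub value
    print defined $hash_ref ? "hash\n" : "no hash\n";   # no hash

The references obtained this way point at the same data as C<\$foo>,
C<\@foo>, and C<\&foo>.
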
Subroutine definitions (and declarations, for that matter) need
not necessarily be situated in the package whose symbol table they
occupy. You can define a subroutine outside its package by
explicitly qualifying the name of the subroutine:

    package main;
    sub Some_package::foo { ... }   # &foo defined in Some_package

This is just a shorthand for a typeglob assignment at compile time:

    BEGIN { *Some_package::foo = sub { ... } }

and is I<not> the same as writing:

    {
        package Some_package;
        sub foo { ... }
    }

In the first two versions, the body of the subroutine is
lexically in the main package, I<not> in Some_package. So
something like this:

    package main;

    $Some_package::name = "fred";
    $main::name = "barney";

    sub Some_package::foo {
        print "in ", __PACKAGE__, ": \$name is '$name'\n";
    }

    Some_package::foo();

prints:

    in main: $name is 'barney'

rather than:

    in Some_package: $name is 'fred'

This also has implications for the use of the SUPER:: qualifier
(see L<perlobj>).

=head2 Package Constructors and Destructors

Four special subroutines act as package constructors and destructors.
These are the C<BEGIN>, C<CHECK>, C<INIT>, and C<END> routines. The
C<sub> is optional for these routines.

A C<BEGIN> subroutine is executed as soon as possible, that is, the moment
it is completely defined, even before the rest of the containing file
is parsed. You may have multiple C<BEGIN> blocks within a file--they
will execute in order of definition. Because a C<BEGIN> block executes
immediately, it can pull in definitions of subroutines and such from other
files in time to be visible to the rest of the file. Once a C<BEGIN>
has run, it is immediately undefined and any code it used is returned to
Perl's memory pool. This means you can't ever explicitly call a C<BEGIN>.

An C<END> subroutine is executed as late as possible, that is, after
perl has finished running the program and just before the interpreter
is being exited, even if it is exiting as a result of a die() function.
(But not if it's polymorphing into another program via C<exec>, or
being blown out of the water by a signal--you have to trap that yourself
(if you can).) You may have multiple C<END> blocks within a file--they
will execute in reverse order of definition; that is: last in, first
out (LIFO). C<END> blocks are not executed when you run perl with the
C<-c> switch.

Inside an C<END> subroutine, C<$?> contains the value that the program is
going to pass to C<exit()>. You can modify C<$?> to change the exit
value of the program. Beware of changing C<$?> by accident (e.g. by
running something via C<system>).

Similar to C<BEGIN> blocks, C<INIT> blocks are run just before the
Perl runtime begins execution, in "first in, first out" (FIFO) order.
For example, the code generators documented in L<perlcc> make use of
C<INIT> blocks to initialize and resolve pointers to XSUBs.

Similar to C<END> blocks, C<CHECK> blocks are run just after the
Perl compile phase ends and before the run time begins, in
LIFO order. C<CHECK> blocks are again useful in the Perl compiler
suite to save the compiled state of the program.

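The ordering rules for the four kinds of block can be observed with a
small sketch. The array C<@order> and the labels are invented for the
example, and the blocks are deliberately written out of order. It
assumes the file is run directly as a script, since C<CHECK> and
C<INIT> blocks in code loaded later at run time are not executed:

    use strict;
    use warnings;

    our @order;

    END   { push @order, 'END 1'; print join(', ', @order), "\n" }
    END   { push @order, 'END 2' }
    INIT  { push @order, 'INIT 1' }
    INIT  { push @order, 'INIT 2' }
    CHECK { push @order, 'CHECK 1' }
    CHECK { push @order, 'CHECK 2' }
    BEGIN { push @order, 'BEGIN 1' }
    BEGIN { push @order, 'BEGIN 2' }

    push @order, 'main program';

Run as a script, this prints

    BEGIN 1, BEGIN 2, CHECK 2, CHECK 1, INIT 1, INIT 2, main program, END 2, END 1

which shows C<BEGIN> and C<INIT> running in order of definition, and
C<CHECK> and C<END> running in reverse order.
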
When you use the B<-n> and B<-p> switches to Perl, C<BEGIN> and
C<END> work just as they do in B<awk>, as a degenerate case. Both
C<BEGIN> and C<CHECK> blocks are run when you use the B<-c> switch
for a compile-only syntax check, although your main code is not;
C<INIT> and C<END> blocks are skipped.

=head2 Perl Classes

There is no special class syntax in Perl, but a package may act
as a class if it provides subroutines to act as methods. Such a
package may also derive some of its methods from another class (package)
by listing the other package name(s) in its global @ISA array (which
must be a package global, not a lexical).

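A minimal sketch of a package acting as a class and inheriting through
@ISA might look like this. The class names C<Animal> and C<Dog> and
their methods are made up for illustration:

    use strict;
    use warnings;

    package Animal;

    sub new   { my ($class, %args) = @_; return bless {%args}, $class }
    sub name  { return $_[0]{name} }
    sub speak { my $self = shift; return $self->name . ' makes a sound' }

    package Dog;

    our @ISA = ('Animal');              # package global, not "my @ISA"

    sub speak { my $self = shift; return $self->name . ' barks' }

    package main;

    my $dog = Dog->new(name => 'Rex');  # new() and name() come from Animal
    print $dog->speak, "\n";            # Rex barks
    print $dog->isa('Animal') ? "is an Animal\n" : "not an Animal\n";

Method lookup starts in C<Dog> and falls back to the packages listed
in C<@Dog::ISA>, which is why C<new> is found even though C<Dog> does
not define it.
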
For more on this, see L<perltoot> and L<perlobj>.

=head2 Perl Modules

A module is just a set of related functions in a library file, i.e.,
a Perl package with the same name as the file. It is specifically
designed to be reusable by other modules or programs. It may do this by
providing a mechanism for exporting some of its symbols into the
symbol table of any package using it. Or it may function as a class
definition and make its semantics available implicitly through
method calls on the class and its objects, without explicitly
exporting anything. Or it can do a little of both.

For example, to start a traditional, non-OO module called Some::Module,
create a file called F<Some/Module.pm> and start with this template:

    package Some::Module;  # assumes Some/Module.pm

    use strict;
    use warnings;

    BEGIN {
        use Exporter   ();
        our ($VERSION, @ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS);

        # set the version for version checking
        $VERSION     = 1.00;
        # if using RCS/CVS, this may be preferred
        $VERSION = do { my @r = (q$Revision: 2.21 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r }; # must be all one line, for MakeMaker

        @ISA         = qw(Exporter);
        @EXPORT      = qw(&func1 &func2);
        %EXPORT_TAGS = ( );     # eg: TAG => [ qw!name1 name2! ],

        # your exported package globals go here,
        # as well as any optionally exported functions
        @EXPORT_OK   = qw($Var1 %Hashit &func3);
    }
    our @EXPORT_OK;

    # exported package globals go here;
    # they must be declared for use under strict
    our $Var1;
    our %Hashit;

    # non-exported package globals go here
    our @more;
    our $stuff;

    # initialize package globals, first exported ones
    $Var1   = '';
    %Hashit = ();

    # then the others (which are still accessible as $Some::Module::stuff)
    $stuff  = '';
    @more   = ();

    # all file-scoped lexicals must be created before
    # the functions below that use them.

    # file-private lexicals go here
    my $priv_var    = '';
    my %secret_hash = ();

    # here's a file-private function as a closure,
    # callable as &$priv_func;  it cannot be prototyped.
    my $priv_func = sub {
        # stuff goes here.
    };

    # make all your functions, whether exported or not;
    # remember to put something interesting in the {} stubs
    sub func1      {}    # no prototype
    sub func2()    {}    # proto'd void
    sub func3($$)  {}    # proto'd to 2 scalars

    # this one isn't exported, but could be called!
    sub func4(\%)  {}    # proto'd to 1 hash ref

    END { }       # module clean-up code here (global destructor)

    ## YOUR CODE GOES HERE

    1;  # don't forget to return a true value from the file

Then go on to declare and use your variables in functions without
any qualifications. See L<Exporter> and L<perlmodlib> for
details on mechanics and style issues in module creation.

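To try the exporting mechanism without creating a separate file, a
module can be inlined in a C<BEGIN> block. This is only a sketch: the
module name C<My::Greeter> and its functions are hypothetical, and
setting an entry in C<%INC> by hand is a trick to make C<use> believe
the file has already been loaded:

    use strict;
    use warnings;

    BEGIN {
        package My::Greeter;            # hypothetical module, inlined
        use Exporter ();
        our @ISA       = ('Exporter');
        our @EXPORT    = ('hello');     # exported by default
        our @EXPORT_OK = ('goodbye');   # exported only on request

        sub hello   { return "hello, $_[0]" }
        sub goodbye { return "goodbye, $_[0]" }

        $INC{'My/Greeter.pm'} = __FILE__;   # pretend the file was loaded
    }

    use My::Greeter;                    # imports hello()
    use My::Greeter qw(goodbye);        # imports goodbye() explicitly

    print hello('world'), "\n";         # hello, world
    print goodbye('world'), "\n";       # goodbye, world

In a real project the package would live in F<My/Greeter.pm> and the
C<%INC> assignment would not be needed.
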
Perl modules are included into your program by saying

    use Module;

or

    use Module LIST;

This is exactly equivalent to

    BEGIN { require Module; import Module; }

or

    BEGIN { require Module; import Module LIST; }

As a special case

    use Module ();

is exactly equivalent to

    BEGIN { require Module; }

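The equivalence can be exercised with two core modules. In this sketch
the import is written as a class method call, C<< List::Util->import >>,
rather than in the indirect object form shown above; the choice of
List::Util and File::Basename is only for illustration:

    use strict;
    use warnings;

    # Roughly what "use List::Util qw(sum max);" does.
    BEGIN {
        require List::Util;
        List::Util->import(qw(sum max));
    }

    # Roughly what "use File::Basename ();" does: load, import nothing.
    BEGIN { require File::Basename; }

    print sum(1, 2, 3), ' ', max(1, 2, 3), "\n";              # 6 3
    print File::Basename::basename('/tmp/notes.txt'), "\n";   # notes.txt
    print defined &main::basename ? "imported\n" : "not imported\n";

The last line should print C<not imported>, because nothing was
imported from File::Basename and its functions are reachable only by
their fully qualified names.
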
All Perl module files have the extension F<.pm>. The C<use> operator
assumes this so you don't have to spell out "F<Module.pm>" in quotes.
This also helps to differentiate new modules from old F<.pl> and
F<.ph> files. Module names are also capitalized unless they're
functioning as pragmas; pragmas are in effect compiler directives,
and are sometimes called "pragmatic modules" (or even "pragmata"
if you're a classicist).

The two statements:

    require SomeModule;
    require "SomeModule.pm";

differ from each other in two ways. In the first case, any double
colons in the module name, such as C<Some::Module>, are translated
into your system's directory separator, usually "/". The second
case does not, and would have to be specified literally. The other
difference is that seeing the first C<require> clues in the compiler
that uses of indirect object notation involving "SomeModule", as
in C<$ob = purge SomeModule>, are method calls, not function calls.
(Yes, this really can make a difference.)

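When the module name is known only at run time, the translation has to
be done by hand before using the string form of C<require>. The helper
C<module_to_path> below is a hypothetical sketch of that step, and
File::Spec::Functions is used only because it is a core module with a
nested name:

    use strict;
    use warnings;

    # Hypothetical helper: build the relative path that the bareword
    # form of require would search for. Keys of %INC use "/".
    sub module_to_path {
        my ($module) = @_;
        (my $path = $module) =~ s{::}{/}g;
        return "$path.pm";
    }

    my $path = module_to_path('File::Spec::Functions');
    print "$path\n";                    # File/Spec/Functions.pm

    require $path;                      # string form: path used literally
    print exists $INC{$path} ? "loaded\n" : "not loaded\n";   # loaded
    print File::Spec::Functions::catfile('a', 'b'), "\n";

After a successful C<require>, the path appears as a key in C<%INC>,
which is how Perl avoids loading the same file twice.
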
Because the C<use> statement implies a C<BEGIN> block, the importing
of semantics happens as soon as the C<use> statement is compiled,
before the rest of the file is compiled. This is how it is able
to function as a pragma mechanism, and also how modules are able to
declare subroutines that are then visible as list or unary operators for
the rest of the current file. This will not work if you use C<require>
instead of C<use>. With C<require> you can get into this problem:

    require Cwd;                # make Cwd:: accessible
    $here = Cwd::getcwd();

    use Cwd;                    # import names from Cwd::
    $here = getcwd();

    require Cwd;                # make Cwd:: accessible
    $here = getcwd();           # oops! no main::getcwd()

In general, C<use Module ()> is recommended over C<require Module>,
because it determines module availability at compile time, not in the
middle of your program's execution. An exception would be if two modules
each tried to C<use> each other, and each also called a function from
that other module. In that case, it's easy to use C<require>s instead.

Perl packages may be nested inside other package names, so we can have
package names containing C<::>. But if we used that package name
directly as a filename it would make for unwieldy or impossible
filenames on some systems. Therefore, if a module's name is, say,
C<Text::Soundex>, then its definition is actually found in the library
file F<Text/Soundex.pm>.

Perl modules always have a F<.pm> file, but there may also be
dynamically linked executables (often ending in F<.so>) or autoloaded
subroutine definitions (often ending in F<.al>) associated with the
module. If so, these will be entirely transparent to the user of
the module. It is the responsibility of the F<.pm> file to load
(or arrange to autoload) any additional functionality. For example,
although the POSIX module happens to do both dynamic loading and
autoloading, the user can say just C<use POSIX> to get it all.

=head1 SEE ALSO

See L<perlmodlib> for general style issues related to building Perl
modules and classes, as well as descriptions of the standard library
and CPAN, L<Exporter> for how Perl's standard import/export mechanism
works, L<perltoot> and L<perltootc> for an in-depth tutorial on
creating classes, L<perlobj> for a hard-core reference document on
objects, L<perlsub> for an explanation of functions and scoping,
and L<perlxstut> and L<perlguts> for more information on writing
extension modules.