=head1 NAME

perlobj - Perl objects

=head1 DESCRIPTION

First of all, you need to understand what references are in Perl.
See L<perlref> for that.  Second, if you still find the following
reference work too complicated, a tutorial on object-oriented programming
in Perl can be found in L<perltoot>.

If you're still with us, then
here are three very simple definitions that you should find reassuring.

=over 4

=item 1.

An object is simply a reference that happens to know which class it
belongs to.

=item 2.

A class is simply a package that happens to provide methods to deal
with object references.

=item 3.

A method is simply a subroutine that expects an object reference (or
a package name, for class methods) as the first argument.

=back

We'll cover these points now in more depth.

=head2 An Object is Simply a Reference

Unlike say C++, Perl doesn't provide any special syntax for
constructors.  A constructor is merely a subroutine that returns a
reference to something "blessed" into a class, generally the
class that the subroutine is defined in.  Here is a typical
constructor:

    package Critter;
    sub new { bless {} }

That word C<new> isn't special.  You could have written
a constructor this way, too:

    package Critter;
    sub spawn { bless {} }

In fact, this might even be preferable, because the C++ programmers won't
be tricked into thinking that C<new> works in Perl as it does in C++.
It doesn't.  We recommend that you name your constructors whatever
makes sense in the context of the problem you're solving.  For example,
constructors in the Tk extension to Perl are named after the widgets
they create.

One thing that's different about Perl constructors compared with those in
C++ is that in Perl, they have to allocate their own memory.  (The other
thing is that they don't automatically call overridden base-class
constructors.)  The C<{}> allocates an anonymous hash containing no
key/value pairs, and returns it.  The bless() takes that reference and
tells the object it references that it's now a Critter, and returns
the reference.  This is for convenience, because the referenced object
itself knows that it has been blessed, and the reference to it could
have been returned directly, like this:

    sub new {
        my $self = {};
        bless $self;
        return $self;
    }

In fact, you often see such a thing in more complicated constructors
that wish to call methods in the class as part of the construction:

    sub new {
        my $self = {};
        bless $self;
        $self->initialize();
        return $self;
    }

If you care about inheritance (and you should; see
L<perlmod/"Modules: Creation, Use, and Abuse">),
then you want to use the two-arg form of bless
so that your constructors may be inherited:

    sub new {
        my $class = shift;
        my $self = {};
        bless $self, $class;
        $self->initialize();
        return $self;
    }

Or if you expect people to call not just C<CLASS-E<gt>new()> but also
C<$obj-E<gt>new()>, then use something like this.  The initialize()
method used will be of whatever $class we blessed the
object into:

    sub new {
        my $this = shift;
        my $class = ref($this) || $this;
        my $self = {};
        bless $self, $class;
        $self->initialize();
        return $self;
    }

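As a minimal sketch of why the two-arg form matters, the following puts
that last constructor into C<Critter> and inherits it from a subclass.
The C<Bird> class and the C<legs> attribute are hypothetical names used
only for illustration.

```perl
use strict;
use warnings;

package Critter;

# Constructor usable as Critter->new() or as $critter->new().
sub new {
    my $this  = shift;
    my $class = ref($this) || $this;   # accept an object or a class name
    my $self  = {};
    bless $self, $class;
    $self->initialize();               # looked up starting in $class
    return $self;
}

sub initialize {
    my $self = shift;
    $self->{legs} = 4;                 # hypothetical attribute
}

package Bird;                          # hypothetical subclass
our @ISA = ('Critter');

sub initialize {
    my $self = shift;
    $self->{legs} = 2;
}

package main;

my $tweety = Bird->new;                # inherited constructor, blessed into Bird
my $clone  = $tweety->new;             # invoked on an object, still a Bird
print ref($tweety), " has $tweety->{legs} legs\n";
print ref($clone),  " has $clone->{legs} legs\n";
```

Because the object is blessed into C<$class> rather than into the package
the constructor was compiled in, both objects are C<Bird>s and
C<Bird::initialize> is the one that runs.
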
Within the class package, the methods will typically deal with the
reference as an ordinary reference.  Outside the class package,
the reference is generally treated as an opaque value that may
be accessed only through the class's methods.

A constructor may re-bless a referenced object currently belonging to
another class, but then the new class is responsible for all cleanup
later.  The previous blessing is forgotten, as an object may belong
to only one class at a time.  (Although of course it's free to
inherit methods from many classes.)  If you find yourself having to
do this, the parent class is probably misbehaving, though.

A clarification:  Perl objects are blessed.  References are not.  Objects
know which package they belong to.  References do not.  The bless()
function uses the reference to find the object.  Consider
the following example:

    $a = {};
    $b = $a;
    bless $a, "BLAH";
    print "\$b is a ", ref($b), "\n";

This reports $b as being a BLAH, so obviously bless()
operated on the object and not on the reference.

=head2 A Class is Simply a Package

Unlike say C++, Perl doesn't provide any special syntax for class
definitions.  You use a package as a class by putting method
definitions into the class.

There is a special array within each package called @ISA, which says
where else to look for a method if you can't find it in the current
package.  This is how Perl implements inheritance.  Each element of the
@ISA array is just the name of another package that happens to be a
class package.  The classes are searched (depth first) for missing
methods in the order that they occur in @ISA.  The classes accessible
through @ISA are known as base classes of the current class.

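The depth-first search order can be sketched with a small hierarchy.
All four class names here are hypothetical and exist only to show which
definition gets found.

```perl
use strict;
use warnings;

package Animal;
sub speak { return "generic noise" }
sub name  { return "animal" }

package Pet;
our @ISA = ('Animal');
sub name { return "pet" }

package Guard;
sub speak { return "woof" }

package Dog;
our @ISA = ('Pet', 'Guard');   # searched left to right, depth first

package main;

my $dog   = bless {}, 'Dog';
my $speak = $dog->speak;   # Dog, Pet, Animal: found before Guard is tried
my $name  = $dog->name;    # Dog, Pet: found in Pet
print "speak: $speak\n";
print "name:  $name\n";
```

Note that C<Guard::speak> is never reached, because the search exhausts
C<Pet> and everything C<Pet> inherits from before moving on to the next
element of C<@Dog::ISA>.
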
All classes implicitly inherit from class C<UNIVERSAL> as their
last base class.  Several commonly used methods are automatically
supplied in the UNIVERSAL class; see L<"Default UNIVERSAL methods"> for
more details.

If a missing method is found in one of the base classes, it is cached
in the current class for efficiency.  Changing @ISA or defining new
subroutines invalidates the cache and causes Perl to do the lookup again.

If neither the current class, its named base classes, nor the UNIVERSAL
class contains the requested method, these three places are searched
all over again, this time looking for a method named AUTOLOAD().  If an
AUTOLOAD is found, this method is called on behalf of the missing method,
setting the package global $AUTOLOAD to be the fully qualified name of
the method that was intended to be called.

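Here is a minimal sketch of an AUTOLOAD method that serves as a
catch-all read accessor.  The C<Lazy> class and its C<color> field are
hypothetical.  Since DESTROY is looked up like any other method, the
sketch returns early when it is asked to stand in for a destructor.

```perl
use strict;
use warnings;

package Lazy;
our $AUTOLOAD;   # set by Perl to the fully qualified name of the missing method

sub new {
    my $class = shift;
    return bless { color => 'red' }, $class;
}

sub AUTOLOAD {
    my $self = shift;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                # strip the package qualifier
    return if $name eq 'DESTROY';     # don't treat destruction as an accessor
    die "No such method: $AUTOLOAD"
        unless ref($self) && exists $self->{$name};
    return $self->{$name};
}

package main;

my $obj   = Lazy->new;
my $color = $obj->color;              # no such sub, so AUTOLOAD handles it
print "color is $color\n";
```
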
If none of that works, Perl finally gives up and complains.

Perl classes do method inheritance only.  Data inheritance is left up
to the class itself.  By and large, this is not a problem in Perl,
because most classes model the attributes of their object using an
anonymous hash, which serves as its own little namespace to be carved up
by the various classes that might want to do something with the object.
The only problem with this is that you can't be sure that you aren't using
a piece of the hash that is already used.  A reasonable workaround
is to prefix your field names in the hash with the package name.

    sub bump {
        my $self = shift;
        $self->{ __PACKAGE__ . ".count"}++;
    }

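To see the workaround in action, this sketch has a base class and a
subclass that both keep a field called "count" in the same hash.  The
C<Counter> and C<LoudCounter> classes and the C<shout> method are
hypothetical.

```perl
use strict;
use warnings;

package Counter;
sub new  { my $class = shift; return bless {}, $class }
sub bump {
    my $self = shift;
    return ++$self->{ __PACKAGE__ . ".count" };   # key is "Counter.count"
}

package LoudCounter;
our @ISA = ('Counter');
sub shout {
    my $self = shift;
    return ++$self->{ __PACKAGE__ . ".count" };   # key is "LoudCounter.count"
}

package main;

my $c = LoudCounter->new;
$c->bump;
$c->bump;
$c->shout;
print join(", ", map { "$_=$c->{$_}" } sort keys %$c), "\n";
# prints "Counter.count=2, LoudCounter.count=1"
```

Each class sees only its own key, because C<__PACKAGE__> is the name of
the package the method was compiled in, not the class of the object.
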
=head2 A Method is Simply a Subroutine

Unlike say C++, Perl doesn't provide any special syntax for method
definition.  (It does provide a little syntax for method invocation
though.  More on that later.)  A method expects its first argument
to be the object (reference) or package (string) it is being invoked
on.  There are just two types of methods, which we'll call class and
instance.  (Sometimes you'll hear these called static and virtual, in
honor of the two C++ method types they most closely resemble.)

A class method expects a class name as the first argument.  It
provides functionality for the class as a whole, not for any individual
object belonging to the class.  Constructors are typically class
methods.  Many class methods simply ignore their first argument, because
they already know what package they're in, and don't care what package
they were invoked via.  (These aren't necessarily the same, because
class methods follow the inheritance tree just like ordinary instance
methods.)  Another typical use for class methods is to look up an
object by name:

    sub find {
        my ($class, $name) = @_;
        $objtable{$name};
    }

An instance method expects an object reference as its first argument.
Typically it shifts the first argument into a "self" or "this" variable,
and then uses that as an ordinary reference.

    sub display {
        my $self = shift;
        my @keys = @_ ? @_ : sort keys %$self;
        foreach my $key (@keys) {
            print "\t$key => $self->{$key}\n";
        }
    }

=head2 Method Invocation

There are two ways to invoke a method, one of which you're already
familiar with, and the other of which will look familiar.  Perl 4
already had an "indirect object" syntax that you use when you say

    print STDERR "help!!!\n";

This same syntax can be used to call either class or instance methods.
We'll use the two methods defined above, the class method to look up
an object reference and the instance method to print out its attributes.

    $fred = find Critter "Fred";
    display $fred 'Height', 'Weight';

These could be combined into one statement by using a BLOCK in the
indirect object slot:

    display {find Critter "Fred"} 'Height', 'Weight';

For C++ fans, there's also a syntax using -E<gt> notation that does exactly
the same thing.  The parentheses are required if there are any arguments.

    $fred = Critter->find("Fred");
    $fred->display('Height', 'Weight');

or in one statement,

    Critter->find("Fred")->display('Height', 'Weight');

There are times when one syntax is more readable, and times when the
other syntax is more readable.  The indirect object syntax is less
cluttered, but it has the same ambiguity as ordinary list operators.
Indirect object method calls are parsed using the same rule as list
operators: "If it looks like a function, it is a function".  (Presuming
for the moment that you think two words in a row can look like a
function name.  C++ programmers seem to think so with some regularity,
especially when the first word is "new".)  Thus, the parentheses of

    new Critter ('Barney', 1.5, 70)

are assumed to surround ALL the arguments of the method call, regardless
of what comes after.  Saying

    new Critter ('Bam' x 2), 1.4, 45

would be equivalent to

    Critter->new('Bam' x 2), 1.4, 45

which is unlikely to do what you want.

There are times when you wish to specify which class's method to use.
In this case, you can call your method as an ordinary subroutine
call, being sure to pass the requisite first argument explicitly:

    $fred =  MyCritter::find("Critter", "Fred");
    MyCritter::display($fred, 'Height', 'Weight');

Note however, that this does not do any inheritance.  If you wish
merely to specify that Perl should I<START> looking for a method in a
particular package, use an ordinary method call, but qualify the method
name with the package like this:

    $fred = Critter->MyCritter::find("Fred");
    $fred->MyCritter::display('Height', 'Weight');

If you're trying to control where the method search begins I<and> you're
executing in the class itself, then you may use the SUPER pseudo class,
which says to start looking in your base class's @ISA list without having
to name it explicitly:

    $self->SUPER::display('Height', 'Weight');

Please note that the C<SUPER::> construct is meaningful I<only> within the
class.

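The two ways of steering the method search can be sketched as follows.
The C<Dragon> subclass and the C<describe> method are hypothetical; the
subclass extends the base-class behavior through C<SUPER::>, while the
caller outside the class uses a package-qualified method name.

```perl
use strict;
use warnings;

package Critter;
sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub describe {
    my $self = shift;
    return "a critter named $self->{name}";
}

package Dragon;
our @ISA = ('Critter');
sub describe {
    my $self = shift;
    my $base = $self->SUPER::describe();   # search starts in @Dragon::ISA
    return "$base that breathes fire";
}

package main;

my $smaug = Dragon->new(name => 'Smaug');
my $full  = $smaug->describe;              # Dragon::describe
my $plain = $smaug->Critter::describe;     # search starts in Critter instead
print "$full\n";
print "$plain\n";
```
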
Sometimes you want to call a method when you don't know the method name
ahead of time.  You can use the arrow form, replacing the method name
with a simple scalar variable containing the method name:

    $method = $fast ? "findfirst" : "findbest";
    $fred->$method(@args);

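A runnable sketch of the same idea follows.  The C<Finder> class and the
bodies of C<findfirst> and C<findbest> are assumptions made up for the
example; only the calling convention comes from the text above.

```perl
use strict;
use warnings;

package Finder;
sub new {
    my $class = shift;
    return bless { items => [ 1, 3, 2 ] }, $class;
}
sub findfirst {
    my $self = shift;
    return $self->{items}[0];
}
sub findbest {
    my $self = shift;
    my ($best) = sort { $b <=> $a } @{ $self->{items} };   # largest value
    return $best;
}

package main;

my $finder = Finder->new;
my %result;
for my $fast (1, 0) {
    my $method = $fast ? "findfirst" : "findbest";
    $result{$method} = $finder->$method();   # method name chosen at run time
    print "$method => $result{$method}\n";
}
```

The method is still resolved through the normal inheritance search, so
this works under C<use strict> and with inherited methods alike.
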
=head2 Default UNIVERSAL methods

The C<UNIVERSAL> package automatically contains the following methods that
are inherited by all other classes:

=over 4

=item isa(CLASS)

C<isa> returns I<true> if its object is blessed into C<CLASS> or into a
subclass of C<CLASS>.

C<isa> is also exportable and can be called as a sub with two arguments.  This
makes it possible to check what a reference points to.  Example:

    use UNIVERSAL qw(isa);

    if(isa($ref, 'ARRAY')) {
        #...
    }

=item can(METHOD)

C<can> checks to see if its object has a method called C<METHOD>.
If it does, then a reference to the sub is returned; if it does not, then
I<undef> is returned.

=item VERSION( [NEED] )

C<VERSION> returns the version number of the class (package).  If the
NEED argument is given then it will check that the current version (as
defined by the $VERSION variable in the given package) is not less than
NEED; it will die if this is not the case.  This method is normally
called as a class method.  This method is called automatically by the
C<VERSION> form of C<use>.

    use A 1.2 qw(some imported subs);
    # implies:
    A->VERSION(1.2);

=back

B<NOTE:> C<can> directly uses Perl's internal code for method lookup, and
C<isa> uses a very similar method and caching strategy.  This may cause
strange effects if the Perl code dynamically changes @ISA in any package.

You may add other methods to the UNIVERSAL class via Perl or XS code.
You do not need to C<use UNIVERSAL> in order to make these methods
available to your program.  This is necessary only if you wish to
have C<isa> available as a plain subroutine in the current package.

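The three methods can be exercised in method-call form, without
importing anything, as in this sketch.  The class names, the
C<display> method, and the version number are hypothetical.

```perl
use strict;
use warnings;

package Critter;
our $VERSION = '1.20';
sub new     { my $class = shift; return bless {}, $class }
sub display { return "critter on display" }

package Dragon;
our @ISA = ('Critter');

package main;

my $d = Dragon->new;
print "is a Critter\n" if     $d->isa('Critter');
print "not a Wombat\n" unless $d->isa('Wombat');

my $code = $d->can('display');          # code reference, or undef
print $code->($d), "\n" if $code;
print "cannot fly\n" unless $d->can('fly');

print "Critter version ", Critter->VERSION, "\n";
my $ok = eval { Critter->VERSION(2); 1 };   # dies: 1.20 is less than 2
print "version 2 not available\n" unless $ok;
```

Calling the sub returned by C<can> as C<< $code->($d) >> passes the
object explicitly, just like the ordinary subroutine-call form described
under L<"Method Invocation">.
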
=head2 Destructors

When the last reference to an object goes away, the object is
automatically destroyed.  (This may even be after you exit, if you've
stored references in global variables.)  If you want to capture control
just before the object is freed, you may define a DESTROY method in
your class.  It will automatically be called at the appropriate moment,
and you can do any extra cleanup you need to do.  Perl passes a reference
to the object under destruction as the first (and only) argument.  Beware
that the reference is a read-only value, and cannot be modified by
manipulating C<$_[0]> within the destructor.  The object itself (i.e.
the thingy the reference points to, namely C<${$_[0]}>, C<@{$_[0]}>,
C<%{$_[0]}> etc.) is not similarly constrained.

If you arrange to re-bless the reference before the destructor returns,
perl will again call the DESTROY method for the re-blessed object after
the current one returns.  This can be used for clean delegation of
object destruction, or for ensuring that destructors in the base classes
of your choosing get called.  Explicitly calling DESTROY is also possible,
but is rarely needed.

Do not confuse the foregoing with how objects I<CONTAINED> in the current
one are destroyed.  Such objects will be freed and destroyed automatically
when the current object is freed, provided no other references to them exist
elsewhere.

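The difference between an object and the objects it contains can be
sketched like this.  The C<Logged> class and its C<@log> array are
hypothetical; the array just records the order in which the destructors
run.

```perl
use strict;
use warnings;

package Logged;
our @log;   # records destructor calls, for illustration only

sub new {
    my ($class, $name, $inner) = @_;
    return bless { name => $name, inner => $inner }, $class;
}

sub DESTROY {
    my $self = shift;
    push @log, "destroying $self->{name}";
}

package main;

{
    my $outer = Logged->new('outer', Logged->new('inner'));
}   # the last reference to the outer object goes away here

print "$_\n" for @Logged::log;
```

The outer object's DESTROY runs first, while its hash is still intact.
Only when that hash is then freed does the contained object lose its
last reference and get its own DESTROY call.
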
=head2 WARNING

While indirect object syntax may well be appealing to English speakers and
to C++ programmers, be not seduced!  It suffers from two grave problems.

The first problem is that an indirect object is limited to a name,
a scalar variable, or a block, because it would have to do too much
lookahead otherwise, just like any other postfix dereference in the
language.  (These are the same quirky rules as are used for the filehandle
slot in functions like C<print> and C<printf>.)  This can lead to horribly
confusing precedence problems, as in these next two lines:

    move $obj->{FIELD};                 # probably wrong!
    move $ary[$i];                      # probably wrong!

Those actually parse as the very surprising:

    $obj->move->{FIELD};                # Well, lookee here
    $ary->move->[$i];                   # Didn't expect this one, eh?

Rather than what you might have expected:

    $obj->{FIELD}->move();              # You should be so lucky.
    $ary[$i]->move;                     # Yeah, sure.

The left side of ``-E<gt>'' is not so limited, because it's an infix operator,
not a postfix operator.

As if that weren't bad enough, think about this: Perl must guess I<at
compile time> whether C<name> and C<move> above are functions or methods.
Usually Perl gets it right, but when it doesn't, you get a function
call compiled as a method, or vice versa.  This can introduce subtle
bugs that are hard to unravel.  For example, calling a method C<new>
in indirect notation--as C++ programmers are so wont to do--can
be miscompiled into a subroutine call if there's already a C<new>
function in scope.  You'd end up calling the current package's C<new>
as a subroutine, rather than the desired class's method.  The compiler
tries to cheat by remembering bareword C<require>s, but the grief if it
messes up just isn't worth the years of debugging it would likely take
you to track such subtle bugs down.

The infix arrow notation using ``C<-E<gt>>'' doesn't suffer from either
of these disturbing ambiguities, so we recommend you use it exclusively.

=head2 Summary

That's about all there is to it.  Now you need just to go off and buy a
book about object-oriented design methodology, and bang your forehead
with it for the next six months or so.

=head2 Two-Phased Garbage Collection

For most purposes, Perl uses a fast and simple reference-based
garbage collection system.  For this reason, there's an extra
dereference going on at some level, so if you haven't built
your Perl executable using your C compiler's C<-O> flag, performance
will suffer.  If you I<have> built Perl with C<cc -O>, then this
probably won't matter.

A more serious concern is that unreachable memory with a non-zero
reference count will not normally get freed.  Therefore, this is a bad
idea:

    {
        my $a;
        $a = \$a;
    }

Even though $a I<should> go away, it can't.  When building recursive data
structures, you'll have to break the self-reference yourself explicitly
if you don't care to leak.  For example, here's a self-referential
node such as one might use in a sophisticated tree structure:

    sub new_node {
        my $self = shift;
        my $class = ref($self) || $self;
        my $node = {};
        $node->{LEFT} = $node->{RIGHT} = $node;
        $node->{DATA} = [ @_ ];
        return bless $node => $class;
    }

If you create nodes like that, they (currently) won't go away unless you
break their self reference yourself.  (In other words, this is not to be
construed as a feature, and you shouldn't depend on it.)

Almost.

When an interpreter thread finally shuts down (usually when your program
exits), then a rather costly but complete mark-and-sweep style of garbage
collection is performed, and everything allocated by that thread gets
destroyed.  This is essential to support Perl as an embedded or a
multithreadable language.  For example, this program demonstrates Perl's
two-phased garbage collection:

    #!/usr/bin/perl
    package Subtle;

    sub new {
        my $test;
        $test = \$test;
        warn "CREATING " . \$test;
        return bless \$test;
    }

    sub DESTROY {
        my $self = shift;
        warn "DESTROYING $self";
    }

    package main;

    warn "starting program";
    {
        my $a = Subtle->new;
        my $b = Subtle->new;
        $$a = 0;  # break selfref
        warn "leaving block";
    }

    warn "just exited block";
    warn "time to die...";
    exit;

When run as F</tmp/test>, the following output is produced:

    starting program at /tmp/test line 18.
    CREATING SCALAR(0x8e5b8) at /tmp/test line 7.
    CREATING SCALAR(0x8e57c) at /tmp/test line 7.
    leaving block at /tmp/test line 23.
    DESTROYING Subtle=SCALAR(0x8e5b8) at /tmp/test line 13.
    just exited block at /tmp/test line 26.
    time to die... at /tmp/test line 27.
    DESTROYING Subtle=SCALAR(0x8e57c) during global destruction.

Notice that "global destruction" bit there?  That's the thread
garbage collector reaching the unreachable.

Objects are always destructed, even when regular refs aren't.  In fact,
objects are destructed in a separate pass before ordinary refs, just to
try to prevent object destructors from using refs that have themselves
been destructed.  Plain refs are only garbage-collected if the destruct level
is greater than 0.  You can test the higher levels of global destruction
by setting the PERL_DESTRUCT_LEVEL environment variable, presuming
C<-DDEBUGGING> was enabled during perl build time.

A more complete garbage collection strategy will be implemented
at a future date.

In the meantime, the best solution is to create a non-recursive container
class that holds a pointer to the self-referential data structure.
Define a DESTROY method for the containing object's class that manually
breaks the circularities in the self-referential structure.

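A minimal sketch of that container approach follows.  The C<Node> and
C<Tree> classes and the C<$destroyed> counter are hypothetical names
chosen for the example; the node is built the same way as in
C<new_node> above.

```perl
use strict;
use warnings;

package Node;
our $destroyed = 0;   # counts Node destructor calls, for illustration only

sub new {
    my $class = shift;
    my $node  = bless { DATA => [ @_ ] }, $class;
    $node->{LEFT} = $node->{RIGHT} = $node;   # self-referential, as above
    return $node;
}

sub DESTROY { $destroyed++ }

package Tree;         # non-recursive container: nothing points back at it

sub new {
    my $class = shift;
    return bless { root => Node->new(@_) }, $class;
}

sub DESTROY {
    my $self = shift;
    my $root = $self->{root} or return;
    delete @{$root}{qw(LEFT RIGHT)};          # break the circularities by hand
}

package main;

{
    my $tree = Tree->new('payload');
}   # Tree::DESTROY runs here, and the node is then freed normally

print "nodes destroyed before exit: $Node::destroyed\n";
```

Because the container itself takes part in no cycle, its reference
count drops to zero as soon as the block ends, which gives its DESTROY
method the chance to untangle the structure it holds.  A real tree
would need to walk every node rather than just the root.
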
=head1 SEE ALSO

A kinder, gentler tutorial on object-oriented programming in Perl can
be found in L<perltoot>.
You should also check out L<perlbot> for other object tricks, traps, and tips,
as well as L<perlmodlib> for some style guides on constructing both modules
and classes.