417 lines
14 KiB
Plaintext
417 lines
14 KiB
Plaintext
|
|
=head1 NAME
|
|
|
|
perlreftut - Mark's very short tutorial about references
|
|
|
|
=head1 DESCRIPTION
|
|
|
|
One of the most important new features in Perl 5 was the capability to
|
|
manage complicated data structures like multidimensional arrays and
|
|
nested hashes. To enable these, Perl 5 introduced a feature called
|
|
`references', and using references is the key to managing complicated,
|
|
structured data in Perl. Unfortunately, there's a lot of funny syntax
|
|
to learn, and the main manual page can be hard to follow. The manual
|
|
is quite complete, and sometimes people find that a problem, because
|
|
it can be hard to tell what is important and what isn't.
|
|
|
|
Fortunately, you only need to know 10% of what's in the main page to get
|
|
90% of the benefit. This page will show you that 10%.
|
|
|
|
=head1 Who Needs Complicated Data Structures?
|
|
|
|
One problem that came up all the time in Perl 4 was how to represent a
|
|
hash whose values were lists. Perl 4 had hashes, of course, but the
|
|
values had to be scalars; they couldn't be lists.
|
|
|
|
Why would you want a hash of lists? Let's take a simple example: You
|
|
have a file of city and country names, like this:
|
|
|
|
Chicago, USA
|
|
Frankfurt, Germany
|
|
Berlin, Germany
|
|
Washington, USA
|
|
Helsinki, Finland
|
|
New York, USA
|
|
|
|
and you want to produce an output like this, with each country mentioned
|
|
once, and then an alphabetical list of the cities in that country:
|
|
|
|
Finland: Helsinki.
|
|
Germany: Berlin, Frankfurt.
|
|
USA: Chicago, New York, Washington.
|
|
|
|
The natural way to do this is to have a hash whose keys are country
|
|
names. Associated with each country name key is a list of the cities in
|
|
that country. Each time you read a line of input, split it into a country
|
|
and a city, look up the list of cities already known to be in that
|
|
country, and append the new city to the list. When you're done reading
|
|
the input, iterate over the hash as usual, sorting each list of cities
|
|
before you print it out.
|
|
|
|
If hash values can't be lists, you lose. In Perl 4, hash values can't
|
|
be lists; they can only be strings. You lose. You'd probably have to
|
|
combine all the cities into a single string somehow, and then when
|
|
time came to write the output, you'd have to break the string into a
|
|
list, sort the list, and turn it back into a string. This is messy
|
|
and error-prone. And it's frustrating, because Perl already has
|
|
perfectly good lists that would solve the problem if only you could
|
|
use them.
|
|
|
|
=head1 The Solution
|
|
|
|
By the time Perl 5 rolled around, we were already stuck with this
|
|
design: Hash values must be scalars. The solution to this is
|
|
references.
|
|
|
|
A reference is a scalar value that I<refers to> an entire array or an
|
|
entire hash (or to just about anything else). Names are one kind of
|
|
reference that you're already familiar with. Think of the President:
|
|
a messy, inconvenient bag of blood and bones. But to talk about him,
|
|
or to represent him in a computer program, all you need is the easy,
|
|
convenient scalar string "Bill Clinton".
|
|
|
|
References in Perl are like names for arrays and hashes. They're
|
|
Perl's private, internal names, so you can be sure they're
|
|
unambiguous. Unlike "Bill Clinton", a reference only refers to one
|
|
thing, and you always know what it refers to. If you have a reference
|
|
to an array, you can recover the entire array from it. If you have a
|
|
reference to a hash, you can recover the entire hash. But the
|
|
reference is still an easy, compact scalar value.
|
|
|
|
You can't have a hash whose values are arrays; hash values can only be
|
|
scalars. We're stuck with that. But a single reference can refer to
|
|
an entire array, and references are scalars, so you can have a hash of
|
|
references to arrays, and it'll act a lot like a hash of arrays, and
|
|
it'll be just as useful as a hash of arrays.
|
|
|
|
We'll come back to this city-country problem later, after we've seen
|
|
some syntax for managing references.
|
|
|
|
|
|
=head1 Syntax
|
|
|
|
There are just two ways to make a reference, and just two ways to use
|
|
it once you have it.
|
|
|
|
=head2 Making References
|
|
|
|
B<Make Rule 1>
|
|
|
|
If you put a C<\> in front of a variable, you get a
|
|
reference to that variable.
|
|
|
|
$aref = \@array; # $aref now holds a reference to @array
|
|
$href = \%hash; # $href now holds a reference to %hash
|
|
|
|
Once the reference is stored in a variable like $aref or $href, you
|
|
can copy it or store it just the same as any other scalar value:
|
|
|
|
$xy = $aref; # $xy now holds a reference to @array
|
|
$p[3] = $href; # $p[3] now holds a reference to %hash
|
|
$z = $p[3]; # $z now holds a reference to %hash
|
|
|
|
|
|
These examples show how to make references to variables with names.
|
|
Sometimes you want to make an array or a hash that doesn't have a
|
|
name. This is analogous to the way you like to be able to use the
|
|
string C<"\n"> or the number 80 without having to store it in a named
|
|
variable first.
|
|
|
|
B<Make Rule 2>
|
|
|
|
C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
|
|
that array. C<{ ITEMS }> makes a new, anonymous hash. and returns a
|
|
reference to that hash.
|
|
|
|
$aref = [ 1, "foo", undef, 13 ];
|
|
# $aref now holds a reference to an array
|
|
|
|
$href = { APR => 4, AUG => 8 };
|
|
# $href now holds a reference to a hash
|
|
|
|
|
|
The references you get from rule 2 are the same kind of
|
|
references that you get from rule 1:
|
|
|
|
# This:
|
|
$aref = [ 1, 2, 3 ];
|
|
|
|
# Does the same as this:
|
|
@array = (1, 2, 3);
|
|
$aref = \@array;
|
|
|
|
|
|
The first line is an abbreviation for the following two lines, except
|
|
that it doesn't create the superfluous array variable C<@array>.
|
|
|
|
|
|
=head2 Using References
|
|
|
|
What can you do with a reference once you have it? It's a scalar
|
|
value, and we've seen that you can store it as a scalar and get it back
|
|
again just like any scalar. There are just two more ways to use it:
|
|
|
|
B<Use Rule 1>
|
|
|
|
If C<$aref> contains a reference to an array, then you
|
|
can put C<{$aref}> anywhere you would normally put the name of an
|
|
array. For example, C<@{$aref}> instead of C<@array>.
|
|
|
|
Here are some examples of that:
|
|
|
|
Arrays:
|
|
|
|
|
|
@a @{$aref} An array
|
|
reverse @a reverse @{$aref} Reverse the array
|
|
$a[3] ${$aref}[3] An element of the array
|
|
$a[3] = 17; ${$aref}[3] = 17 Assigning an element
|
|
|
|
|
|
On each line are two expressions that do the same thing. The
|
|
left-hand versions operate on the array C<@a>, and the right-hand
|
|
versions operate on the array that is referred to by C<$aref>, but
|
|
once they find the array they're operating on, they do the same things
|
|
to the arrays.
|
|
|
|
Using a hash reference is I<exactly> the same:
|
|
|
|
%h %{$href} A hash
|
|
keys %h keys %{$href} Get the keys from the hash
|
|
$h{'red'} ${$href}{'red'} An element of the hash
|
|
$h{'red'} = 17 ${$href}{'red'} = 17 Assigning an element
|
|
|
|
|
|
B<Use Rule 2>
|
|
|
|
C<${$aref}[3]> is too hard to read, so you can write C<$aref-E<gt>[3]>
|
|
instead.
|
|
|
|
C<${$href}{red}> is too hard to read, so you can write
|
|
C<$href-E<gt>{red}> instead.
|
|
|
|
Most often, when you have an array or a hash, you want to get or set a
|
|
single element from it. C<${$aref}[3]> and C<${$href}{'red'}> have
|
|
too much punctuation, and Perl lets you abbreviate.
|
|
|
|
If C<$aref> holds a reference to an array, then C<$aref-E<gt>[3]> is
|
|
the fourth element of the array. Don't confuse this with C<$aref[3]>,
|
|
which is the fourth element of a totally different array, one
|
|
deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the
|
|
same way that C<$item> and C<@item> are.
|
|
|
|
Similarly, C<$href-E<gt>{'red'}> is part of the hash referred to by
|
|
the scalar variable C<$href>, perhaps even one with no name.
|
|
C<$href{'red'}> is part of the deceptively named C<%href> hash. It's
|
|
easy to forget to leave out the C<-E<gt>>, and if you do, you'll get
|
|
bizarre results when your program gets array and hash elements out of
|
|
totally unexpected hashes and arrays that weren't the ones you wanted
|
|
to use.
|
|
|
|
|
|
=head1 An Example
|
|
|
|
Let's see a quick example of how all this is useful.
|
|
|
|
First, remember that C<[1, 2, 3]> makes an anonymous array containing
|
|
C<(1, 2, 3)>, and gives you a reference to that array.
|
|
|
|
Now think about
|
|
|
|
@a = ( [1, 2, 3],
|
|
[4, 5, 6],
|
|
[7, 8, 9]
|
|
);
|
|
|
|
@a is an array with three elements, and each one is a reference to
|
|
another array.
|
|
|
|
C<$a[1]> is one of these references. It refers to an array, the array
|
|
containing C<(4, 5, 6)>, and because it is a reference to an array,
|
|
B<USE RULE 2> says that we can write C<$a[1]-E<gt>[2]> to get the
|
|
third element from that array. C<$a[1]-E<gt>[2]> is the 6.
|
|
Similarly, C<$a[0]-E<gt>[1]> is the 2. What we have here is like a
|
|
two-dimensional array; you can write C<$a[ROW]-E<gt>[COLUMN]> to get
|
|
or set the element in any row and any column of the array.
|
|
|
|
The notation still looks a little cumbersome, so there's one more
|
|
abbreviation:
|
|
|
|
=head1 Arrow Rule
|
|
|
|
In between two B<subscripts>, the arrow is optional.
|
|
|
|
Instead of C<$a[1]-E<gt>[2]>, we can write C<$a[1][2]>; it means the
|
|
same thing. Instead of C<$a[0]-E<gt>[1]>, we can write C<$a[0][1]>;
|
|
it means the same thing.
|
|
|
|
Now it really looks like two-dimensional arrays!
|
|
|
|
You can see why the arrows are important. Without them, we would have
|
|
had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For
|
|
three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
|
|
the unreadable C<${${$x[2]}[3]}[5]>.
|
|
|
|
|
|
=head1 Solution
|
|
|
|
Here's the answer to the problem I posed earlier, of reformatting a
|
|
file of city and country names.
|
|
|
|
1 while (<>) {
|
|
2 chomp;
|
|
3 my ($city, $country) = split /, /;
|
|
4 push @{$table{$country}}, $city;
|
|
5 }
|
|
6
|
|
7 foreach $country (sort keys %table) {
|
|
8 print "$country: ";
|
|
9 my @cities = @{$table{$country}};
|
|
10 print join ', ', sort @cities;
|
|
11 print ".\n";
|
|
12 }
|
|
|
|
|
|
The program has two pieces: Lines 1--5 read the input and build a
|
|
data structure, and lines 7--12 analyze the data and print out the
|
|
report.
|
|
|
|
In the first part, line 4 is the important one. We're going to have a
|
|
hash, C<%table>, whose keys are country names, and whose values are
|
|
(references to) arrays of city names. After acquiring a city and
|
|
country name, the program looks up C<$table{$country}>, which holds (a
|
|
reference to) the list of cities seen in that country so far. Line 4 is
|
|
totally analogous to
|
|
|
|
push @array, $city;
|
|
|
|
except that the name C<array> has been replaced by the reference
|
|
C<{$table{$country}}>. The C<push> adds a city name to the end of the
|
|
referred-to array.
|
|
|
|
In the second part, line 9 is the important one. Again,
|
|
C<$table{$country}> is (a reference to) the list of cities in the country, so
|
|
we can recover the original list, and copy it into the array C<@cities>,
|
|
by using C<@{$table{$country}}>. Line 9 is totally analogous to
|
|
|
|
@cities = @array;
|
|
|
|
except that the name C<array> has been replaced by the reference
|
|
C<{$table{$country}}>. The C<@> tells Perl to get the entire array.
|
|
|
|
The rest of the program is just familiar uses of C<chomp>, C<split>, C<sort>,
|
|
C<print>, and doesn't involve references at all.
|
|
|
|
There's one fine point I skipped. Suppose the program has just read
|
|
the first line in its input that happens to mention Greece.
|
|
Control is at line 4, C<$country> is C<'Greece'>, and C<$city> is
|
|
C<'Athens'>. Since this is the first city in Greece,
|
|
C<$table{$country}> is undefined---in fact there isn't an C<'Greece'> key
|
|
in C<%table> at all. What does line 4 do here?
|
|
|
|
4 push @{$table{$country}}, $city;
|
|
|
|
|
|
This is Perl, so it does the exact right thing. It sees that you want
|
|
to push C<Athens> onto an array that doesn't exist, so it helpfully
|
|
makes a new, empty, anonymous array for you, installs it in the table,
|
|
and then pushes C<Athens> onto it. This is called `autovivification'.
|
|
|
|
|
|
=head1 The Rest
|
|
|
|
I promised to give you 90% of the benefit with 10% of the details, and
|
|
that means I left out 90% of the details. Now that you have an
|
|
overview of the important parts, it should be easier to read the
|
|
L<perlref> manual page, which discusses 100% of the details.
|
|
|
|
Some of the highlights of L<perlref>:
|
|
|
|
=over 4
|
|
|
|
=item *
|
|
|
|
You can make references to anything, including scalars, functions, and
|
|
other references.
|
|
|
|
=item *
|
|
|
|
In B<USE RULE 1>, you can omit the curly brackets whenever the thing
|
|
inside them is an atomic scalar variable like C<$aref>. For example,
|
|
C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as
|
|
C<${$aref}[1]>. If you're just starting out, you may want to adopt
|
|
the habit of always including the curly brackets.
|
|
|
|
=item *
|
|
|
|
To see if a variable contains a reference, use the `ref' function.
|
|
It returns true if its argument is a reference. Actually it's a
|
|
little better than that: It returns HASH for hash references and
|
|
ARRAY for array references.
|
|
|
|
=item *
|
|
|
|
If you try to use a reference like a string, you get strings like
|
|
|
|
ARRAY(0x80f5dec) or HASH(0x826afc0)
|
|
|
|
If you ever see a string that looks like this, you'll know you
|
|
printed out a reference by mistake.
|
|
|
|
A side effect of this representation is that you can use C<eq> to see
|
|
if two references refer to the same thing. (But you should usually use
|
|
C<==> instead because it's much faster.)
|
|
|
|
=item *
|
|
|
|
You can use a string as if it were a reference. If you use the string
|
|
C<"foo"> as an array reference, it's taken to be a reference to the
|
|
array C<@foo>. This is called a I<soft reference> or I<symbolic reference>.
|
|
|
|
=back
|
|
|
|
You might prefer to go on to L<perllol> instead of L<perlref>; it
|
|
discusses lists of lists and multidimensional arrays in detail. After
|
|
that, you should move on to L<perldsc>; it's a Data Structure Cookbook
|
|
that shows recipes for using and printing out arrays of hashes, hashes
|
|
of arrays, and other kinds of data.
|
|
|
|
=head1 Summary
|
|
|
|
Everyone needs compound data structures, and in Perl the way you get
|
|
them is with references. There are four important rules for managing
|
|
references: Two for making references and two for using them. Once
|
|
you know these rules you can do most of the important things you need
|
|
to do with references.
|
|
|
|
=head1 Credits
|
|
|
|
Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref@plover.com>)
|
|
|
|
This article originally appeared in I<The Perl Journal>
|
|
(http://tpj.com) volume 3, #2. Reprinted with permission.
|
|
|
|
The original title was I<Understand References Today>.
|
|
|
|
=head2 Distribution Conditions
|
|
|
|
Copyright 1998 The Perl Journal.
|
|
|
|
When included as part of the Standard Version of Perl, or as part of
|
|
its complete documentation whether printed or otherwise, this work may
|
|
be distributed only under the terms of Perl's Artistic License. Any
|
|
distribution of this file or derivatives thereof outside of that
|
|
package require that special arrangements be made with copyright
|
|
holder.
|
|
|
|
Irrespective of its distribution, all code examples in these files are
|
|
hereby placed into the public domain. You are permitted and
|
|
encouraged to use this code in your own programs for fun or for profit
|
|
as you see fit. A simple comment in the code giving credit would be
|
|
courteous but is not required.
|
|
|
|
|
|
|
|
|
|
=cut
|