=head1 NAME

perldsc - Perl Data Structures Cookbook

=head1 DESCRIPTION

The single feature most sorely lacking in the Perl programming language
prior to its 5.0 release was complex data structures. Even without direct
language support, some valiant programmers did manage to emulate them, but
it was hard work and not for the faint of heart. You could occasionally
get away with the C<$m{$AoA,$b}> notation borrowed from B<awk> in which the
keys are actually more like a single concatenated string C<"$AoA$b">, but
traversal and sorting were difficult. More desperate programmers even
hacked Perl's internal symbol table directly, a strategy that proved hard
to develop and maintain--to put it mildly.

The 5.0 release of Perl let us have complex data structures. You
may now write something like this and all of a sudden, you'd have an array
with three dimensions!

    for $x (1 .. 10) {
        for $y (1 .. 10) {
            for $z (1 .. 10) {
                $AoA[$x][$y][$z] =
                    $x ** $y + $z;
            }
        }
    }

Alas, however simple this may appear, underneath it's a much more
elaborate construct than meets the eye!

How do you print it out? Why can't you just say C<print @AoA>? How do
you sort it? How can you pass it to a function or get one of these back
from a function? Is it an object? Can you save it to disk to read
back later? How do you access whole rows or columns of that matrix? Do
all the values have to be numeric?

As you see, it's quite easy to become confused. While some small portion
of the blame for this can be attributed to the reference-based
implementation, it's really more due to a lack of existing documentation with
examples designed for the beginner.

This document is meant to be a detailed but understandable treatment of the
many different sorts of data structures you might want to develop. It
should also serve as a cookbook of examples. That way, when you need to
create one of these complex data structures, you can just pinch, pilfer, or
purloin a drop-in example from here.

Let's look at each of these possible constructs in detail. There are separate
sections on each of the following:

=over 5

=item * arrays of arrays

=item * hashes of arrays

=item * arrays of hashes

=item * hashes of hashes

=item * more elaborate constructs

=back

But for now, let's look at general issues common to all
these types of data structures.

=head1 REFERENCES

The most important thing to understand about all data structures in Perl,
including multidimensional arrays, is that even though they might
appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally
one-dimensional. They can hold only scalar values (meaning a string,
number, or a reference). They cannot directly contain other arrays or
hashes, but instead contain I<references> to other arrays or hashes.

You can't use a reference to an array or hash in quite the same way that you
would a real array or hash. For C or C++ programmers unused to
distinguishing between arrays and pointers to the same, this can be
confusing. If so, just think of it as the difference between a structure
and a pointer to a structure.

You can (and should) read more about references in the perlref(1) man
page. Briefly, references are rather like pointers that know what they
point to. (Objects are also a kind of reference, but we won't be needing
them right away--if ever.) This means that when you have something which
looks to you like an access to a two-or-more-dimensional array and/or hash,
what's really going on is that the base type is
merely a one-dimensional entity that contains references to the next
level. It's just that you can I<use> it as though it were a
two-dimensional one. This is actually the way almost all C
multidimensional arrays work as well.

    $array[7][12]                       # array of arrays
    $array[7]{string}                   # array of hashes
    $hash{string}[7]                    # hash of arrays
    $hash{string}{'another string'}     # hash of hashes

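To see this one-dimensional layout for yourself, ask ref() what each
top-level element holds. The following sketch (with made-up data) shows
that every slot of C<@AoA> is an ordinary scalar containing an array
reference, and that the arrow between adjacent subscripts is optional:

    use strict;
    use warnings;

    my @AoA = ( [ "fred", "barney" ], [ "george", "jane", "elroy" ] );

    # Each top-level element is a scalar holding a reference.
    for my $i ( 0 .. $#AoA ) {
        print "slot $i holds: ", ref( $AoA[$i] ), "\n";   # ARRAY
    }

    # These three spellings all reach the same element.
    my $x = $AoA[1][2];        # arrow between subscripts is implied
    my $y = $AoA[1]->[2];      # explicit arrow
    my $z = ${ $AoA[1] }[2];   # explicit block dereference
    print "$x $y $z\n";        # elroy elroy elroy
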
Now, because the top level contains only references, if you try to print
out your array with a simple print() function, you'll get something
that doesn't look very nice, like this:

    @AoA = ( [2, 3], [4, 5, 7], [0] );
    print $AoA[1][2];
  7
    print @AoA;
  ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)

That's because Perl doesn't (ever) implicitly dereference your variables.
If you want to get at the thing a reference is referring to, then you have
to do this yourself using either prefix typing indicators, like
C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows,
like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>.

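For instance, printing the rows of an array of arrays legibly means
dereferencing each one yourself. A minimal sketch, reusing the data from
the example above: inside a double-quoted string, C<@$row> interpolates
the elements of the referenced array, separated by spaces.

    use strict;
    use warnings;

    my @AoA = ( [2, 3], [4, 5, 7], [0] );

    # Dereference each row reference before printing it.
    for my $row (@AoA) {
        print "@$row\n";
    }

    # The same idea, collected into a single string.
    my $text = join "\n", map { join ' ', @$_ } @AoA;
    print "$text\n";
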
=head1 COMMON MISTAKES

The two most common mistakes made in constructing something like
an array of arrays are either accidentally counting the number of
elements or else taking a reference to the same memory location
repeatedly. Here's the case where you just get the count instead
of a nested array:

    for $i (1..10) {
        @array = somefunc($i);
        $AoA[$i] = @array;      # WRONG!
    }

That's just the simple case of assigning an array to a scalar and getting
its element count. If that's what you really and truly want, then you
might do well to consider being a tad more explicit about it, like this:

    for $i (1..10) {
        @array = somefunc($i);
        $counts[$i] = scalar @array;
    }

Here's the case of taking a reference to the same memory location
again and again:

    for $i (1..10) {
        @array = somefunc($i);
        $AoA[$i] = \@array;     # WRONG!
    }

So, what's the big problem with that? It looks right, doesn't it?
After all, I just told you that you need an array of references, so by
golly, you've made me one!

Unfortunately, while this is true, it's still broken. All the references
in @AoA refer to the I<very same place>, and they will therefore all hold
whatever was last in @array! It's similar to the problem demonstrated in
the following C program:

    #include <stdio.h>
    #include <pwd.h>

    int main(void) {
        struct passwd *rp, *dp;

        rp = getpwnam("root");
        dp = getpwnam("daemon");

        printf("daemon name is %s\nroot name is %s\n",
                dp->pw_name, rp->pw_name);
        return 0;
    }

This will print

    daemon name is daemon
    root name is daemon

The problem is that getpwnam() typically returns a pointer to a static
area that each call overwrites, so both C<rp> and C<dp> are pointers to
the same location in memory! In C, you'd have to remember to malloc()
yourself some new memory. In Perl, you'll want to use the array
constructor C<[]> or the hash constructor C<{}> instead. Here's the right
way to do the preceding broken code fragments:

    for $i (1..10) {
        @array = somefunc($i);
        $AoA[$i] = [ @array ];
    }

The square brackets make a reference to a new array with a I<copy>
of what's in @array at the time of the assignment. This is what
you want.

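The difference is easy to observe side by side. Because C<somefunc> is
never defined in this document, the sketch below substitutes a
hypothetical version that returns C<$i> copies of C<$i>; otherwise it
mirrors the wrong and right loops above, with C<@array> declared once
outside the loop so that it behaves like the global C<@array> there.

    use strict;
    use warnings;

    # Hypothetical stand-in for somefunc(): returns $i copies of $i.
    sub somefunc { my $i = shift; return ($i) x $i }

    my @array;                   # one array, reused on every pass
    my (@wrong, @right);
    for my $i (1 .. 3) {
        @array = somefunc($i);
        $wrong[$i] = \@array;    # WRONG: every slot refers to @array
        $right[$i] = [ @array ]; # RIGHT: a fresh anonymous copy
    }

    print "wrong: @{ $wrong[1] }\n";   # wrong: 3 3 3
    print "right: @{ $right[1] }\n";   # right: 1
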
Note that this will produce something similar, but it's
much harder to read:

    for $i (1..10) {
        @array = 0 .. $i;
        @{$AoA[$i]} = @array;
    }

Is it the same? Well, maybe so--and maybe not. The subtle difference
is that when you assign something in square brackets, you know for sure
it's always a brand new reference with a new I<copy> of the data.
Something else could be going on in this new case with the C<@{$AoA[$i]}>
dereference on the left-hand side of the assignment. It all depends on
whether C<$AoA[$i]> had been undefined to start with, or whether it
already contained a reference. If you had already populated @AoA with
references, as in

    $AoA[3] = \@another_array;

then the assignment with the indirection on the left-hand side would
use the existing reference that was already there:

    @{$AoA[3]} = @array;

Of course, this I<would> have the "interesting" effect of clobbering
@another_array. (Have you ever noticed that when a programmer says
something is "interesting", rather than meaning "intriguing",
they're disturbingly more apt to mean that it's "annoying",
"difficult", or both? :-)

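The clobbering can be reproduced in a few lines. In this sketch (the
names follow the text; the values are invented), C<$AoA[3]> already
refers to C<@another_array>, so assigning through C<@{ $AoA[3] }>
overwrites that named array, whereas the same assignment to an undefined
slot autovivifies a new anonymous array:

    use strict;
    use warnings;

    my @another_array = qw(keep these values);
    my @array         = qw(new data);
    my @AoA;

    $AoA[3] = \@another_array;   # slot 3 already holds a reference
    @{ $AoA[3] } = @array;       # so this writes into @another_array

    print "@another_array\n";    # new data

    # Slot 4 is undefined, so Perl creates a new anonymous array.
    @{ $AoA[4] } = @array;
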
So just remember always to use the array or hash constructors with C<[]>
or C<{}>, and you'll be fine, although it's not always optimally
efficient.

Surprisingly, the following dangerous-looking construct will
actually work out fine:

    for $i (1..10) {
        my @array = somefunc($i);
        $AoA[$i] = \@array;
    }

That's because my() is more of a run-time statement than it is a
compile-time declaration I<per se>. This means that the my() variable is
remade afresh each time through the loop. So even though it I<looks> as
though you stored the same variable reference each time, you actually did
not! This is a subtle distinction that can produce more efficient code at
the risk of misleading all but the most experienced of programmers. So I
usually advise against teaching it to beginners. In fact, except for
passing arguments to functions, I seldom like to see the gimme-a-reference
operator (backslash) used much at all in code. Rather, I advise
beginners (and most of the rest of us) to use the
much more easily understood constructors C<[]> and C<{}> instead of
relying upon lexical (or dynamic) scoping and hidden reference-counting to
do the right thing behind the scenes.

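A quick way to convince yourself is to compare the stored references
after the loop. The sketch below reuses a hypothetical C<somefunc>
(returning C<$i> copies of C<$i>, since the real one is never shown) and
checks that every slot points at a different array:

    use strict;
    use warnings;

    # Hypothetical stand-in for somefunc(): returns $i copies of $i.
    sub somefunc { my $i = shift; return ($i) x $i }

    my @AoA;
    for my $i (1 .. 3) {
        my @array = somefunc($i);   # a new @array on every pass
        $AoA[$i] = \@array;
    }

    # Each slot now refers to its own array with its own contents.
    for my $i (1 .. 3) {
        print "$i: @{ $AoA[$i] }\n";
    }
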
In summary:

    $AoA[$i] = [ @array ];      # usually best
    $AoA[$i] = \@array;         # perilous; just how my() was that array?
    @{ $AoA[$i] } = @array;     # way too tricky for most programmers

=head1 CAVEAT ON PRECEDENCE

Speaking of things like C<@{$AoA[$i]}>, the following are actually the
same thing:

    $aref->[2][2]       # clear
    $$aref[2][2]        # confusing

That's because Perl's precedence rules on its five prefix dereferencers
(which look like someone swearing: C<$ @ * % &>) make them bind more
tightly than the postfix subscripting brackets or braces! This will no
doubt come as a great shock to the C or C++ programmer, who is quite
accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th>
element of C<a>. That is, they first take the subscript, and only then
dereference the thing at that subscript. That's fine in C, but this isn't C.

The seemingly equivalent construct in Perl, C<$$aref[$i]>, first
dereferences C<$aref>, treating it as a reference to an array, and only
then gives you the I<i'th> element of the array that C<$aref> points to.
If you wanted the C notion, you'd have to write C<${$AoA[$i]}> to force
C<$AoA[$i]> to be evaluated first, before the leading C<$> dereferencer.

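Here is the contrast in runnable form, with invented data. C<$aref>
refers to an array of arrays, while C<@AoA> here holds scalar
references, which is the kind of element the C-style reading
C<${$AoA[$i]}> expects:

    use strict;
    use warnings;

    my $aref = [ [ "a", "b", "c" ], [ "d", "e", "f" ], [ "g", "h", "i" ] ];

    my $via_prefix = $$aref[2][2];   # $ binds first: deref $aref, then subscript
    my $via_arrow  = $aref->[2][2];  # the same element, spelled clearly

    my ($x, $y) = (10, 20);
    my @AoA = ( \$x, \$y );          # an array of scalar references
    my $c_style = ${ $AoA[1] };      # subscript first, then dereference

    print "$via_prefix $via_arrow $c_style\n";   # i i 20
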
=head1 WHY YOU SHOULD ALWAYS C<use strict>

If this is starting to sound scarier than it's worth, relax. Perl has
some features to help you avoid its most common pitfalls. The best
way to avoid getting confused is to start every program like this:

    #!/usr/bin/perl -w
    use strict;

This way, you'll be forced to declare all your variables with my(), and
accidental "symbolic dereferencing" will be disallowed. Therefore if
you'd done this:

    my $aref = [
        [ "fred", "barney", "pebbles", "bambam", "dino", ],
        [ "homer", "bart", "marge", "maggie", ],
        [ "george", "jane", "elroy", "judy", ],
    ];

    print $aref[2][2];

the compiler would immediately flag that as an error I<at compile time>,
because you were accidentally accessing C<@aref>, an undeclared
variable, and it would thereby remind you to write instead:

    print $aref->[2][2];

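To watch C<use strict> catch this mistake without writing a separate
script, you can compile the faulty line inside a string eval(). This is
only a sketch for experimenting; the exact wording of the error message
varies between Perl versions, so the check looks only at its stable
beginning:

    use strict;
    use warnings;

    my $code = q{
        use strict;
        my $aref = [ [ "fred", "barney" ], [ "george", "jane" ] ];
        print $aref[1][1];    # oops: this means @aref, not $aref
        1;
    };

    my $ok = eval $code;      # compilation fails, so eval returns undef
    print "caught: $@" unless $ok;
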
=head1 DEBUGGING

Before version 5.002, the standard Perl debugger didn't do a very nice job of
printing out complex data structures. With 5.002 or above, the
debugger includes several new features, including command-line editing as
well as the C<x> command to dump out complex data structures. For
example, given the assignment to $aref above, here's the debugger output:

    DB<1> x $aref
    $aref = ARRAY(0x13b5a0)
       0  ARRAY(0x1f0a24)
          0  'fred'
          1  'barney'
          2  'pebbles'
          3  'bambam'
          4  'dino'
       1  ARRAY(0x13b558)
          0  'homer'
          1  'bart'
          2  'marge'
          3  'maggie'
       2  ARRAY(0x13b540)
          0  'george'
          1  'jane'
          2  'elroy'
          3  'judy'

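Outside the debugger, one convenient option is the Data::Dumper module
that ships with Perl, which prints a structure as Perl source you could
paste back into a program. A minimal sketch using the same data as
C<$aref> above; setting C<$Data::Dumper::Indent> to 1 selects a compact,
fixed indentation style:

    use strict;
    use warnings;
    use Data::Dumper;

    my $aref = [
        [ "fred", "barney", "pebbles", "bambam", "dino", ],
        [ "homer", "bart", "marge", "maggie", ],
        [ "george", "jane", "elroy", "judy", ],
    ];

    $Data::Dumper::Indent = 1;                      # compact indentation
    my $dump = Data::Dumper->Dump( [$aref], ['aref'] );
    print $dump;                                    # starts with "$aref = ["
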
=head1 CODE EXAMPLES

Presented with little comment (these will get their own manpages someday),
here are short code examples illustrating access of various
types of data structures.

=head1 ARRAYS OF ARRAYS

=head2 Declaration of an ARRAY OF ARRAYS

    @AoA = (
           [ "fred", "barney" ],
           [ "george", "jane", "elroy" ],
           [ "homer", "marge", "bart" ],
         );

=head2 Generation of an ARRAY OF ARRAYS

    # reading from file
    while ( <> ) {
        push @AoA, [ split ];
    }

    # calling a function
    for $i ( 1 .. 10 ) {
        $AoA[$i] = [ somefunc($i) ];
    }

    # using temp vars
    for $i ( 1 .. 10 ) {
        @tmp = somefunc($i);
        $AoA[$i] = [ @tmp ];
    }

    # add to an existing row
    push @{ $AoA[0] }, "wilma", "betty";

=head2 Access and Printing of an ARRAY OF ARRAYS

    # one element
    $AoA[0][0] = "Fred";

    # another element
    $AoA[1][1] =~ s/(\w)/\u$1/;

    # print the whole thing with refs
    for $aref ( @AoA ) {
        print "\t [ @$aref ],\n";
    }

    # print the whole thing with indices
    for $i ( 0 .. $#AoA ) {
        print "\t [ @{$AoA[$i]} ],\n";
    }

    # print the whole thing one at a time
    for $i ( 0 .. $#AoA ) {
        for $j ( 0 .. $#{ $AoA[$i] } ) {
            print "elt $i $j is $AoA[$i][$j]\n";
        }
    }

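Whole rows and columns need slightly different handling. A row is simply
a dereferenced element, and part of a row is a slice in the
C<@{ ... }[ ... ]> form; a column has to be gathered from every row. The
sketch below also passes the array of arrays to a function by reference
and gets a plain list back. The helper name C<column> is made up for
illustration, and rows too short to have the requested column contribute
C<undef>:

    use strict;
    use warnings;

    my @AoA = (
        [ "fred", "barney" ],
        [ "george", "jane", "elroy" ],
        [ "homer", "marge", "bart" ],
    );

    # Hypothetical helper: return column $j of the array of arrays
    # that $aoa_ref refers to, one value per row.
    sub column {
        my ($aoa_ref, $j) = @_;
        return map { $_->[$j] } @$aoa_ref;
    }

    my @row1  = @{ $AoA[1] };         # whole row: george jane elroy
    my @slice = @{ $AoA[1] }[0, 2];   # part of a row: george elroy
    my @col1  = column(\@AoA, 1);     # whole column: barney jane marge
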
=head1 HASHES OF ARRAYS

=head2 Declaration of a HASH OF ARRAYS

    %HoA = (
           flintstones        => [ "fred", "barney" ],
           jetsons            => [ "george", "jane", "elroy" ],
           simpsons           => [ "homer", "marge", "bart" ],
         );

=head2 Generation of a HASH OF ARRAYS

    # reading from file
    # flintstones: fred barney wilma dino
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $HoA{$1} = [ split ];
    }

    # reading from file; more temps
    # flintstones: fred barney wilma dino
    while ( $line = <> ) {
        ($who, $rest) = split /:\s*/, $line, 2;
        @fields = split ' ', $rest;
        $HoA{$who} = [ @fields ];
    }

    # calling a function that returns a list
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        $HoA{$group} = [ get_family($group) ];
    }

    # likewise, but using temps
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        @members = get_family($group);
        $HoA{$group} = [ @members ];
    }

    # append new members to an existing family
    push @{ $HoA{"flintstones"} }, "wilma", "betty";

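When the same key can appear on several input lines, you may want to
accumulate members rather than overwrite them. Because push()
autovivifies a missing array reference, no initialization is needed. In
this sketch the input lines are invented and come from an in-memory list
instead of C<< <> >>, so that it runs on its own:

    use strict;
    use warnings;

    my @lines = (
        "flintstones: fred barney",
        "jetsons: george jane",
        "flintstones: wilma dino",
    );

    my %HoA;
    for my $line (@lines) {
        my ($who, $rest) = split /:\s*/, $line, 2;
        # The first push for a family creates $HoA{$who} automatically.
        push @{ $HoA{$who} }, split ' ', $rest;
    }
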
=head2 Access and Printing of a HASH OF ARRAYS

    # one element
    $HoA{flintstones}[0] = "Fred";

    # another element
    $HoA{simpsons}[1] =~ s/(\w)/\u$1/;

    # print the whole thing
    foreach $family ( keys %HoA ) {
        print "$family: @{ $HoA{$family} }\n";
    }

    # print the whole thing with indices
    foreach $family ( keys %HoA ) {
        print "$family: ";
        foreach $i ( 0 .. $#{ $HoA{$family} } ) {
            print " $i = $HoA{$family}[$i]";
        }
        print "\n";
    }

    # print the whole thing sorted by number of members
    foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) {
        print "$family: @{ $HoA{$family} }\n";
    }

    # print the whole thing sorted by number of members and name
    foreach $family ( sort {
                               @{$HoA{$b}} <=> @{$HoA{$a}}
                                           ||
                                       $a cmp $b
               } keys %HoA )
    {
        print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
    }

=head1 ARRAYS OF HASHES

=head2 Declaration of an ARRAY OF HASHES

    @AoH = (
           {
               lead     => "fred",
               friend   => "barney",
           },
           {
               lead     => "george",
               wife     => "jane",
               son      => "elroy",
           },
           {
               lead     => "homer",
               wife     => "marge",
               son      => "bart",
           }
     );

=head2 Generation of an ARRAY OF HASHES

    # reading from file
    # format: lead=fred friend=barney
    while ( <> ) {
        $rec = {};
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $rec->{$key} = $value;
        }
        push @AoH, $rec;
    }

    # reading from file
    # format: lead=fred friend=barney
    # no temp
    while ( <> ) {
        push @AoH, { split /[\s=]+/ };
    }

    # calling a function that returns a key/value pair list, like
    # "lead","fred","daughter","pebbles"
    while ( %fields = getnextpairset() ) {
        push @AoH, { %fields };
    }

    # likewise, but using no temp vars
    while (<>) {
        push @AoH, { parsepairs($_) };
    }

    # add key/value to an element
    $AoH[0]{pet} = "dino";
    $AoH[2]{pet} = "santa's little helper";

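The parsepairs() function used above is never defined in this document.
Purely as an illustration, here is one hypothetical way to write it for
the C<lead=fred friend=barney> format: split the line on whitespace, then
split each field at its first C<=>. The sample lines are invented:

    use strict;
    use warnings;

    # Hypothetical helper: turn "k1=v1 k2=v2" into (k1, v1, k2, v2).
    sub parsepairs {
        my ($line) = @_;
        return map { split /=/, $_, 2 } split ' ', $line;
    }

    my @AoH;
    for ("lead=fred friend=barney\n", "lead=george wife=jane son=elroy\n") {
        push @AoH, { parsepairs($_) };
    }

    print "$AoH[1]{son}\n";    # elroy
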
=head2 Access and Printing of an ARRAY OF HASHES

    # one element
    $AoH[0]{lead} = "fred";

    # another element
    $AoH[1]{lead} =~ s/(\w)/\u$1/;

    # print the whole thing with refs
    for $href ( @AoH ) {
        print "{ ";
        for $role ( keys %$href ) {
            print "$role=$href->{$role} ";
        }
        print "}\n";
    }

    # print the whole thing with indices
    for $i ( 0 .. $#AoH ) {
        print "$i is { ";
        for $role ( keys %{ $AoH[$i] } ) {
            print "$role=$AoH[$i]{$role} ";
        }
        print "}\n";
    }

    # print the whole thing one at a time
    for $i ( 0 .. $#AoH ) {
        for $role ( keys %{ $AoH[$i] } ) {
            print "elt $i $role is $AoH[$i]{$role}\n";
        }
    }

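A common next step is ordering the records by one of their fields.
Inside the sort block, C<$a> and C<$b> are hash references here, so you
compare C<< $a->{...} >> with C<< $b->{...} >>; grep() filters records
the same way. A sketch with invented data:

    use strict;
    use warnings;

    my @AoH = (
        { lead => "homer",  wife   => "marge" },
        { lead => "fred",   friend => "barney" },
        { lead => "george", wife   => "jane" },
    );

    # Sort the records alphabetically by their lead.
    my @by_lead = sort { $a->{lead} cmp $b->{lead} } @AoH;

    # Keep only the records that have a wife entry.
    my @married = grep { exists $_->{wife} } @AoH;

    print join(", ", map { $_->{lead} } @by_lead), "\n";  # fred, george, homer
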
=head1 HASHES OF HASHES

=head2 Declaration of a HASH OF HASHES

    %HoH = (
           flintstones => {
                   lead      => "fred",
                   pal       => "barney",
           },
           jetsons     => {
                   lead      => "george",
                   wife      => "jane",
                   "his boy" => "elroy",
           },
           simpsons    => {
                   lead      => "homer",
                   wife      => "marge",
                   kid       => "bart",
           },
    );

=head2 Generation of a HASH OF HASHES

    # reading from file
    # flintstones: lead=fred pal=barney wife=wilma pet=dino
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $who = $1;
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $HoH{$who}{$key} = $value;
        }
    }

    # reading from file; more temps
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $who = $1;
        $rec = {};
        $HoH{$who} = $rec;
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $rec->{$key} = $value;
        }
    }

    # calling a function that returns a key/value hash
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        $HoH{$group} = { get_family($group) };
    }

    # likewise, but using temps
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        %members = get_family($group);
        $HoH{$group} = { %members };
    }

    # append new members to an existing family
    %new_folks = (
        wife => "wilma",
        pet  => "dino",
    );

    for $what (keys %new_folks) {
        $HoH{flintstones}{$what} = $new_folks{$what};
    }

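The append loop can also be written as a single hash-slice assignment
through the reference. This relies on keys() and values() returning
their lists in the same order for a hash that is not modified in
between, which Perl guarantees. A sketch with a pared-down C<%HoH>:

    use strict;
    use warnings;

    my %HoH       = ( flintstones => { lead => "fred", pal => "barney" } );
    my %new_folks = ( wife => "wilma", pet => "dino" );

    # Slice assignment: add every key/value pair of %new_folks at once.
    @{ $HoH{flintstones} }{ keys %new_folks } = values %new_folks;
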
=head2 Access and Printing of a HASH OF HASHES

    # one element
    $HoH{flintstones}{wife} = "wilma";

    # another element
    $HoH{simpsons}{lead} =~ s/(\w)/\u$1/;

    # print the whole thing
    foreach $family ( keys %HoH ) {
        print "$family: { ";
        for $role ( keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

    # print the whole thing somewhat sorted
    foreach $family ( sort keys %HoH ) {
        print "$family: { ";
        for $role ( sort keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

    # print the whole thing sorted by number of members
    foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) {
        print "$family: { ";
        for $role ( sort keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

    # establish a sort order (rank) for each role
    $i = 0;
    for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }

    # now print the whole thing sorted by number of members
    foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) {
        print "$family: { ";
        # and print these according to rank order
        for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

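One trap when probing nested hashes is autovivification: looking up a
key two levels down creates the intermediate level if it is missing. In
this sketch (family names invented), merely testing C<exists> on a
missing family's lead adds an empty entry for that family, while testing
one level at a time leaves the hash untouched:

    use strict;
    use warnings;

    my %HoH = ( jetsons => { lead => "george", wife => "jane" } );

    # This test creates $HoH{addams} = {} as a side effect.
    my $found = exists $HoH{addams}{lead};

    # Checking one level at a time leaves %HoH alone.
    my $safe = exists $HoH{munsters} && exists $HoH{munsters}{lead};
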
=head1 MORE ELABORATE RECORDS

=head2 Declaration of MORE ELABORATE RECORDS

Here's a sample showing how to create and use a record whose fields are of
many different sorts:

    $rec = {
        TEXT      => $string,
        SEQUENCE  => [ @old_values ],
        LOOKUP    => { %some_table },
        THATCODE  => \&some_function,
        THISCODE  => sub { $_[0] ** $_[1] },
        HANDLE    => \*STDOUT,
    };

    print $rec->{TEXT};

    print $rec->{SEQUENCE}[0];
    $last = pop @{ $rec->{SEQUENCE} };

    print $rec->{LOOKUP}{"key"};
    ($first_k, $first_v) = each %{ $rec->{LOOKUP} };

    $answer = $rec->{THATCODE}->($arg);
    $answer = $rec->{THISCODE}->($arg1, $arg2);

    # note the block braces needed around a filehandle expression
    print { $rec->{HANDLE} } "a string\n";

    use FileHandle;
    $rec->{HANDLE}->autoflush(1);
    $rec->{HANDLE}->print(" a string\n");

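When a record mixes field types like this, the built-in ref() reports
what each field holds, which helps when writing generic dumping or
validation code. In this sketch, C<$string>, C<@old_values>,
C<%some_table>, and C<some_function> (none of which are defined above)
are replaced with stand-in values:

    use strict;
    use warnings;

    sub some_function { return "called with @_" }   # stand-in

    my $rec = {
        TEXT     => "hello",
        SEQUENCE => [ 1, 2, 3 ],
        LOOKUP   => { key => "value" },
        THATCODE => \&some_function,
        THISCODE => sub { $_[0] ** $_[1] },
        HANDLE   => \*STDOUT,
    };

    # ref() returns '' for a plain scalar, otherwise ARRAY, HASH, CODE, GLOB, ...
    for my $field ( sort keys %$rec ) {
        my $type = ref( $rec->{$field} ) || 'plain scalar';
        print "$field: $type\n";
    }
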
=head2 Declaration of a HASH OF COMPLEX RECORDS

    %TV = (
       flintstones => {
           series   => "flintstones",
           nights   => [ qw(monday thursday friday) ],
           members  => [
               { name => "fred",    role => "lead", age => 36, },
               { name => "wilma",   role => "wife", age => 31, },
               { name => "pebbles", role => "kid",  age =>  4, },
           ],
       },

       jetsons     => {
           series   => "jetsons",
           nights   => [ qw(wednesday saturday) ],
           members  => [
               { name => "george",  role => "lead", age => 41, },
               { name => "jane",    role => "wife", age => 39, },
               { name => "elroy",   role => "kid",  age =>  9, },
           ],
       },

       simpsons    => {
           series   => "simpsons",
           nights   => [ qw(monday) ],
           members  => [
               { name => "homer",   role => "lead", age => 34, },
               { name => "marge",   role => "wife", age => 37, },
               { name => "bart",    role => "kid",  age => 11, },
           ],
       },
    );

=head2 Generation of a HASH OF COMPLEX RECORDS

    # reading from file
    # this is most easily done by having the file itself be
    # in the raw data format as shown above. perl is happy
    # to parse complex data structures if declared as data, so
    # sometimes it's easiest to do that

    # here's a piece by piece build up
    $rec = {};
    $rec->{series} = "flintstones";
    $rec->{nights} = [ find_days() ];

    @members = ();
    # assume this file in field=value syntax
    while (<>) {
        %fields = split /[\s=]+/;
        push @members, { %fields };
    }
    $rec->{members} = [ @members ];

    # now remember the whole thing
    $TV{ $rec->{series} } = $rec;

    ###########################################################
    # now, you might want to make interesting extra fields that
    # include pointers back into the same data structure so if
    # you change one piece, it changes everywhere, like for example
    # if you wanted a {kids} field that was a reference
    # to an array of the kids' records without having duplicate
    # records and thus update problems.
    ###########################################################
    foreach $family (keys %TV) {
        $rec = $TV{$family}; # temp pointer
        @kids = ();
        for $person ( @{ $rec->{members} } ) {
            if ($person->{role} =~ /kid|son|daughter/) {
                push @kids, $person;
            }
        }
        # REMEMBER: $rec and $TV{$family} point to same data!!
        $rec->{kids} = [ @kids ];
    }

    # you copied the array, but the array itself contains pointers
    # to uncopied objects. this means that if you make bart get
    # older via

    $TV{simpsons}{kids}[0]{age}++;

    # then the same change shows up in
    print $TV{simpsons}{members}[2]{age};

    # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
    # both point to the same underlying anonymous hash table

    # print the whole thing
    foreach $family ( keys %TV ) {
        print "the $family";
        print " is on during @{ $TV{$family}{nights} }\n";
        print "its members are:\n";
        for $who ( @{ $TV{$family}{members} } ) {
            print " $who->{name} ($who->{role}), age $who->{age}\n";
        }
        # there is no {lead} field, so find the lead by role
        ($lead) = grep { $_->{role} eq "lead" } @{ $TV{$family}{members} };
        print "it turns out that $lead->{name} has ";
        print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
        print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
        print "\n";
    }

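Here is the sharing described in the comments above, boiled down to one
family so that it runs on its own. grep() copies references, not the
hashes they point to, so an update made through C<{kids}> is visible
through C<{members}> as well:

    use strict;
    use warnings;

    my %TV = (
        simpsons => {
            series  => "simpsons",
            members => [
                { name => "homer", role => "lead", age => 34 },
                { name => "marge", role => "wife", age => 37 },
                { name => "bart",  role => "kid",  age => 11 },
            ],
        },
    );

    my $rec = $TV{simpsons};
    $rec->{kids} = [ grep { $_->{role} =~ /kid|son|daughter/ } @{ $rec->{members} } ];

    $TV{simpsons}{kids}[0]{age}++;                # bart gets older...
    print $TV{simpsons}{members}[2]{age}, "\n";   # ...12 here, too
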
=head1 Database Ties

You cannot easily tie a multilevel data structure (such as a hash of
hashes) to a dbm file. The first problem is that all but GDBM and
Berkeley DB have size limitations, but beyond that, you also have problems
with how references are to be represented on disk. One experimental
module that does partially attempt to address this need is the MLDBM
module. Check your nearest CPAN site as described in L<perlmodlib> for
source code to MLDBM.

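If you only need to save a structure to disk and read it back later,
rather than keep it tied, one alternative is the Storable module, bundled
with Perl since 5.8. A minimal sketch; the file comes from File::Temp and
the data is invented:

    use strict;
    use warnings;
    use Storable qw(nstore retrieve);
    use File::Temp qw(tempfile);

    my %HoH = (
        flintstones => { lead => "fred",   pal  => "barney" },
        jetsons     => { lead => "george", wife => "jane" },
    );

    my (undef, $file) = tempfile( UNLINK => 1 );

    nstore( \%HoH, $file ) or die "can't store to $file\n";  # portable byte order
    my $copy = retrieve($file);                              # a deep copy

    print $copy->{jetsons}{wife}, "\n";   # jane
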
=head1 SEE ALSO

perlref(1), perllol(1), perldata(1), perlobj(1)

=head1 AUTHOR

Tom Christiansen <F<tchrist@perl.com>>

Last update:
Wed Oct 23 04:57:50 MET DST 1996