Perl References Tutorial (For Kittens)

Perl references can trip up novice and expert monks alike. Here is a gentle introduction suitable for kittens, and a cheat sheet for everyone. This is the formal version of my original.

What are References?

Simple Perl variables are called scalars. Examples of scalar values are:

$number      = 123;
$string      = "String";
open(my $file_handle, "<", "filename");  ## open() stores a new file handle in $file_handle
$null_value  = undef;
$instance    = MyClass->new;
$reference   = \"Reference of a String";

Perl has a few simple data types: scalars ($), arrays (@), hashes (%), and subroutines (&). The type is indicated by the variable's prefix symbol, which also tells Perl whether you want a single scalar or a non-scalar set of values back.

The whole array (or a slice of it) is written with the array symbol (@array, @array[0,1]), but a single element is a scalar, so it takes the scalar symbol ($array[0]). Hashes follow the same rule: the whole hash is %hash, while a single element is $hash{"key"}.

Scalar References

A reference value does not hold one of these complex data types itself, but a "pointer" to one, and the reference is itself a scalar. You can create a reference with the \ referencing operator, the [ ] anonymous array constructor, or the { } anonymous hash constructor.

When you call a subroutine, the usual idiom my ($x, $y) = @_; copies your arguments into the subroutine's own variables, so changing them does not modify your original data. (Strictly speaking, the elements of @_ are aliases to your variables, but code rarely modifies @_ directly.) By passing a reference instead, the subroutine can update the value in your scalar. (It also needs to treat that positional parameter as a reference. More on this later.)

$i = 123;             ## A scalar
$iref = \$i;          #=> SCALAR(0x80108d678)  ## Reference to a scalar
$sref = \"Hello";     ## A reference to a string or other literal
increment(\$i);       ## Pass by reference
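The increment() call above is not defined anywhere in this tutorial. Here is a minimal sketch of what such a subroutine might look like, assuming it simply adds one to the scalar its reference points to (the name and behavior are illustrative):

```perl
use strict;
use warnings;

# Hypothetical increment(): takes a REFERENCE to a scalar and
# updates the caller's variable through it.
sub increment {
    my ($ref) = @_;    # copy the reference (not the value) out of @_
    $$ref += 1;        # dereference and modify the original scalar
    return;
}

my $i = 123;
increment(\$i);        # pass a reference to $i
print "$i\n";          # prints 124
```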

Dereferencing Syntax

To get the value pointed to by the reference variable, add the type symbol ($, @, %, or &) onto the left side of the reference variable, such as: @$reference.

To dereference an expression, such as a path through a data structure, wrap the expression in braces and prefix it with the type symbol ($, @, %, or &), for example: %{ ... }. This also works for references returned from a subroutine call.

   $$scalar_reference;               #=> 123
   @$array_reference;                #=> (1,2,3)
   %$hash_reference;                 #=> ('key'=>"value", ...)
   ${ $scalar_reference };           #=> 123
   %{ $dbtable->{rows}[0] }          #=> ('column'=>"value", ...)
   %{ get_hash_reference() }         #=> ('key'=>"value", ...)
   @{ get_array_reference() }        #=> (1, 2, 3)
   &{ get_sub_reference() }();       #=> Calls returned subroutine reference
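To try these forms yourself, here is a small self-contained sketch. The get_*_reference() subroutines are hypothetical stand-ins that just return freshly built references:

```perl
use strict;
use warnings;

# Hypothetical helpers that each return a reference.
sub get_array_reference { return [1, 2, 3] }
sub get_hash_reference  { return { key => 'value' } }
sub get_sub_reference   { return sub { return "called with @_" } }

my $n = 123;
my $scalar_reference = \$n;

my $value  = ${ $scalar_reference };          # 123
my @list   = @{ get_array_reference() };      # (1, 2, 3)
my %hash   = %{ get_hash_reference() };       # (key => 'value')
my $result = &{ get_sub_reference() }('x');   # "called with x"

print "$value @list $hash{key} $result\n";
```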

Array

The array holds a list of scalar values.

@array = (1, 2, 'three', $file_handle, undef);
$array[1]                  #=> 2    ## Second element array (start counting at 0)
@array[1,2]                #=> (2, 'three') ## A "slice" as list of elements selected from array
@array[1,2] = (12, 13);    #=> Assigns multiple array items

$#array                    #=> 4    ## Last element index (0..n) of array, -1 if empty.
scalar(@array)             #=> 5    ## Number of elements in the array, 0 if empty.

push @array, 123;                   ## Appends value(s) to the end of the array
pop  @array;               #=> 123  ## Removes and returns the last element of the array
unshift @array, 123;                ## Prepends value(s) onto the head of the array
shift @array               #=> 123  ## Removes and returns the first element from the array

foreach $element (@array) { # The foreach structure iterates over the array
  print $element
}

for ($i=0; $i<scalar(@array); $i++) { # The famous C-style for loop can be used
  print $array[$i];
}

foreach my $i (0..$#array) { # Use the Range operator to iterate over the indexes of the array
  print "Element $i is $array[$i]\n";
}

# Use map to iterate over the array, returning new array ($_ is the temp variable in block)
@array = map { $_ * 2 } @array; # This example doubles the values of each array element

# Use grep to generate list of elements matching your conditional expression ($_ is temp var)
@array = grep { $_ % 2 } @array; # Returns the odd numbers from an array of numbers

Array References

To get an array reference, either build a new anonymous array with the [1, 2, 3] syntax, or take a reference to an existing array by prefixing it with a backslash, like \@array.

Dereference an array reference with @$arrayref, use the arrow $arrayref->[index] to reach a single element, or use the @{array_ref_expression} syntax for more complex expressions.

@arr = (1,2,3);       # Setup an array
$aref = \@arr;        #=> ARRAY(0x80108d678)
@$aref = (1,2,3);     # Assigns a list to the array $aref points to (here, @arr)
@$aref;               #=> (1, 2, 3)
@{$aref}[2,1,0]       #=> (3, 2, 1)   # Also: @$aref[2,1,0]
$aref = [qw(a b c)];  #=> ARRAY(0x80108d678)
$aref = [1,2,3];      #=> ARRAY(0x80108d678)
@$aref;               #=> (1, 2, 3)
$aref->[1];           #=> 2   ## Second element of array
$$aref[1];            #=> 2   ## Alternate syntax
$aref = [split(/ /, $sentence)]; # wrap a function to return an array reference instead
$aref = [ map { $_ * 2 } @$aref ]; # Using map with an array reference
@$aref << $value;     #=> Does NOT append! (<< appends in Ruby, but in Perl it is a bit shift)
push(@$aref, $value); #=> Works!

scalar(@$aref)        #=> 3   ## Number of elements in the referenced array
$#$aref               #=> 2   ## Last element index, -1 if empty
$#{$aref}             #=> 2   ## Last element index, alternate syntax
$#{[]}                #=> -1  ## ... Empty arrays return -1.

foreach my $element (@{ arrayref_function() }) { ... };

foreach my $i (0..$#{$aref}) { # Use the Range operator to iterate over the indexes of the array
  print "Element $i is $aref->[$i]\n";
}
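One point that is easy to miss: \@arr does not copy the array, so changes made through the reference show up in the original, while [ @$aref ] builds a new array. A short sketch illustrating the difference:

```perl
use strict;
use warnings;

my @arr  = (1, 2, 3);
my $aref = \@arr;          # $aref points at @arr itself, not a copy

push @$aref, 4;            # append through the reference
$aref->[0] = 10;           # change an element through the reference

print "@arr\n";            # prints "10 2 3 4": @arr was changed in place

my $copy = [ @$aref ];     # [ ... ] builds a NEW anonymous array (a shallow copy)
push @$copy, 5;
print scalar(@arr), " ", scalar(@$copy), "\n";   # prints "4 5"
```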

Tables

Arrays can contain other arrays. A "table" is a 2-dimensional array. You may also use more dimensions.

$table = [ [1,2,3], [4,5,6], [7,8,9] ];
$table->[1]            #=> [4,5,6]
$table->[0][1]         #=> 2
push( @$table, [10,11,12] ); # Appends new row onto table
push( @{ $table->[ $#{$table} ] }, 13 );  # Appends to last row of the table, $table->[3][3] == 13
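As a sketch of building a table row by row (a multiplication table is just an arbitrary example), this pushes anonymous arrays onto an initially empty array reference:

```perl
use strict;
use warnings;

# Build a 3x3 multiplication table as an array reference of array references.
my $table = [];
for my $r (1 .. 3) {
    push @$table, [ map { $r * $_ } 1 .. 3 ];   # each row is a new anonymous array
}

push @{ $table->[-1] }, 99;     # append to the last row; -1 indexes from the end

print join(" ", @$_), "\n" for @$table;
# 1 2 3
# 2 4 6
# 3 6 9 99
```

Here $table->[-1] is simply a shorter way to write $table->[ $#{$table} ] for the last row.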

Hash

A Hash is also known as a dictionary or associative array. It provides a key/value structure to set and retrieve values by a key. It is built from a list of key/value pairs, and it flattens back into such a list when assigned or sent as a function parameter.

%hash        = (one=>1, two=>2, three=>3);

$hash{one};                #=> 1     ## The associated value for the key value of "one"
@hash{'one', 'two'};       #=> (1, 2) ## Returns a sliced list of values

%hash                      #=> ('one', 1, 'two', 2, 'three', 3)  ## Decomposes into a list (pair order not guaranteed)
keys %hash                 #=> ('one', 'two', 'three') ## List of keys (order not guaranteed)
values %hash               #=> (1, 2, 3) ## List of values, in the same order as keys
exists $hash{$key}         #=> Boolean   ## Hash operator returns true if key exists in the hash
delete $hash{three};       #=> 3         # Deletes key/value from hash, returns the value

%hash = %{ hashref_function() };

while ( ($key, $value) = each %hash) { # Iterates over a hash
   print $key, $value;
}

foreach $key (keys %hash) {            # Iterates over a hash by key value
   print $key, $hash{$key};
}

Note: When iterating over a hash using each, it keeps a "cursor" in the hash of the current key/value pair. If you do not finish iterating, the next time you start iterating, it will continue where it left off instead of at the first key/value pair of the hash. To reset the cursor, call keys %hash before iteration.
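A small sketch of the cursor behavior described in the note above, stopping each early and then resetting the iterator with keys before looping again:

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, three => 3);

# Start iterating with each, but stop after the first pair.
my ($first_key) = each %hash;

# Without a reset, the next each() would continue from the second pair.
keys %hash;                  # calling keys in void context resets the iterator

my $count = 0;
while (my ($key, $value) = each %hash) {
    $count++;                # visits all three pairs again
}
print "$count\n";            # prints 3
```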

Hash References

To get a hash reference, use the {key=>value} syntax to build an anonymous hash, or prefix your hash variable with a backslash like: \%hash.

Dereference a hash reference with %$hashref, use the arrow $hashref->{key} to reach a single value, or use the %{hash_ref_expression} syntax for more complex expressions.

%hash = (one=>1, two=>2, three=>3);
$href = \%hash;       #=> HASH(0x80108d6f0)
%hash = %$href;       #=> ('one', 1, 'two', 2, 'three', 3)
$href = {one=>1, two=>2, three=>3};
                      #=> HASH(0x80108d6f0)
$href->{one};         #=> 1 # Access a value by reference
$$href{one};          #=> 1 # Alternate syntax
@{$href}{"one","two"} #=> (1, 2)   ## @{} because we are returning a list

keys %$href                 #=> ('one', 'two', 'three') ## List of keys
values %$href               #=> (1, 2, 3) ## List of values
exists $href->{$key}        #=> Boolean   ## Hash operator returns true if key exists in the hash
delete $href->{three};      #=> 3         # Deletes key/value from hash, returns the value

foreach $key (keys %$href) { print $key, $href->{$key}; } # Iterate over Hash Reference
while ( ($key, $value) = each %$href) { print $key, $value; }

# Hashes can contain other hashes!
$href = {one=>1, word_counts=>{"the"=>34, "train"=>4} };
$href->{word_counts}{the}     #=> 34
@words = keys( %{ $href->{word_counts} } ); # De-reference inner hash with %{...} construct
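As a sketch of where nested hash references pay off, this counts words into the inner word_counts hash (the sentence is made-up sample data). Incrementing a missing key starts from undef, which ++ treats as 0:

```perl
use strict;
use warnings;

my $href = { one => 1, word_counts => {} };

# Count words into the inner hash through the outer reference.
my $sentence = "the train left the station";
for my $word (split / /, $sentence) {
    $href->{word_counts}{$word}++;    # a missing key starts at undef (treated as 0)
}

my @words = sort keys %{ $href->{word_counts} };   # dereference the inner hash
print "@words\n";                                  # prints "left station the train"
print $href->{word_counts}{the}, "\n";             # prints 2
```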

Code References

Code references are useful for callbacks or referencing alternate processing subroutines.

$code = sub { @_ };           # Anonymous function reference
$code = \&functionName;       # Reference to function (or \&ModuleName::functionName)
&$code();                     # Call by function reference
&{$code}();            # Alternate Syntax
$code->();                    # Third way of doing this
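One common use of code references is a dispatch table: a hash mapping names to subroutines. A minimal sketch, where %dispatch and run_command() are hypothetical names:

```perl
use strict;
use warnings;

# A hypothetical "dispatch table": a hash of code references keyed by command name.
my %dispatch = (
    add => sub { my ($x, $y) = @_; return $x + $y },
    mul => sub { my ($x, $y) = @_; return $x * $y },
);

sub run_command {
    my ($name, @args) = @_;
    my $code = $dispatch{$name} or die "Unknown command: $name\n";
    return $code->(@args);          # call through the code reference
}

print run_command('add', 2, 3), "\n";   # prints 5
print run_command('mul', 2, 3), "\n";   # prints 6
```

Adding a new command is then just a matter of adding another entry to the hash.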

Usage: Passing data to functions

When you call a Perl function, Perl collects the arguments into a list and invokes the function. The function receives them in the special default array variable @_ and identifies each parameter by its position.

myfunction($i, $s);   #=> sets @_ = (123, "String");

The called function parses the incoming argument list with standard Perl notation:

($first, $second, @rest) = @_;

That statement takes the incoming argument array and puts the first element in $first, the second in $second, and the remaining arguments (if any) into the @rest array.

When you pass an array or hash (a hash flattens into a list of key/value pairs) to a function, Perl "flattens" it and passes each value as a separate positional parameter.

myfunction($i, @arr, %hash); #=> sets @_ = (123, 1, 2, 3, 'name', 1, 1, "one", "etc.", $i);

Oops! Now myfunction() doesn't know where the array begins and ends, nor where the hash begins or ends. The only thing it knows for sure is the first argument, since that is a simple scalar.
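A quick way to see the flattening is to count what arrives in @_. This sketch uses a hypothetical count_args() helper; passing references instead keeps each array as one argument:

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many positional arguments arrived in @_.
sub count_args { return scalar(@_) }

my @first  = (1, 2, 3);
my @second = (4, 5);

my $flat = count_args(@first, @second);    # 5: both arrays flattened into one list
my $refs = count_args(\@first, \@second);  # 2: one reference per array
print "$flat $refs\n";                     # prints "5 2"
```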

This flattening is fine, even convenient, as long as every parameter is a simple value except the last one, which may be an array or hash.

myfunction($i, @arr);  #=> Sets @_ to (123, 1, 2, 3);
#...
sub myfunction {
  my ($i, @arr) = @_;    #=> allows function to reconstruct the parameters
}

But what if you need to pass both an array AND a hash to a function? This is where references save the day. Since a reference to an array or hash is a single value, the reference to the whole array takes only one positional argument.

myfunction($i, \@arr, \%hash);
#...
sub myfunction {
  my ($i, $aref, $href) = @_;   #=> $i is 123, $aref and $href are references.
  @$aref;                       #=> (1, 2, 3)
  %$href;                       #=> (name=>1, 1=>"one", "etc."=>$i)
}
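A slightly fuller sketch of the pattern above, with a hypothetical describe() subroutine that receives a label, an array reference, and a hash reference (keys are sorted because hash order is not guaranteed):

```perl
use strict;
use warnings;

# Hypothetical: report on an array and a hash passed together by reference.
sub describe {
    my ($label, $aref, $href) = @_;   # three scalars: a string and two references
    my $count = scalar(@$aref);       # dereference the array
    my @keys  = sort keys %$href;     # dereference the hash
    return "$label: $count items, keys @keys";
}

my @arr  = (1, 2, 3);
my %hash = (name => 1, size => 2);
my $report = describe("demo", \@arr, \%hash);
print "$report\n";    # prints "demo: 3 items, keys name size"
```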
      

A function can also use this technique to return several arrays or hashes to a caller in one list of values.

return ($i, \@arr, \%hash);

There is also a performance benefit to passing references instead of whole arrays and hashes. For large structures, it takes time to copy each element of the array into @_ (and then into the function's own variables). With a reference, only a single scalar needs to be passed in the @_ argument list. In computer science, this is known as "passing by reference" instead of "passing by value".

Lastly, when you pass a reference into a function, that function can change the value of the passed variable "in place" without returning it explicitly. This technique is useful at times, but such "side effect" practices are often discouraged, so use it only when it makes sense, okay?

sub myfunction {
  $iref = shift;           # Shifts first value off of @_ array
  ++$$iref;                # Increment the value at the reference
}
#...
$j = 123;                  #=> 123
myfunction(\$j);           # myfunction() will change the value of $j
$j;                        #=> 124

This example demonstrates the concept, but it is also one of those cases where the side effect is not okay. I would instead pass the value itself and have the function return the new value.
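A minimal sketch of that alternative, assuming a hypothetical incremented() function that takes the value and returns the new one, so the change is visible at the call site:

```perl
use strict;
use warnings;

# Side-effect-free alternative: take the value, return the new value.
sub incremented {
    my ($n) = @_;      # a copy of the caller's value
    return $n + 1;     # the caller decides what to do with the result
}

my $j = 123;
$j = incremented($j);  # the assignment makes the change explicit at the call site
print "$j\n";          # prints 124
```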

Usage: Data Structures

A table or 2-dimensional array is not a native perl datatype as it is in many languages. Instead, perl lets you build your own as an array of arrays. Since perl array elements can only hold scalar values, each row must be stored as a reference. So a perl table is actually an array of references.

@row1 = (1, 2, 3);
@row2 = (4, 5, 6);
@table = (\@row1, \@row2);

$table[0]->[1];    # => 2 ## Value of first row, second column
$table[0][1];         # => 2 ## The -> is optional between subscripts, so it is implied here

Often, it's best to bite the bullet and fully embrace a reference when you are using a data structure like this. It helps me to think of it as a starting point, and makes the syntax more friendly in the long run.

$table = [ [1,2,3], [4,5,6] ];  # Table is a reference to an array of arrays :-)
$table->[0][1];                 #=> 2
$table[0][1];  # Wrong!         # Indexes a different variable, @table, not the reference in $table

See how I defined the table using the [ ] array reference syntax? It is clearer to write and read. Also, the table subscripts are now together, with a single -> at the start instead of one between them.

Also, notice the difference between addressing the elements of a table that starts as an array and one that starts as an array reference? It can trip you up, so I suggest always using a reference to avoid the confusion, because mixing syntax styles in a program is painful. Using one consistent syntax across a large program and its complex data structures reduces the chance of errors.

@table = ([1,2,3], [4,5,6]);    # Avoid: array of references
$table[0][1];                   #=> 2 ## While this works nicely...
$table[0]->[1];                 #=> 2 ## ... this is what perl does

$table = [ [1,2,3], [4,5,6] ];  # Suggested: start off with a reference
$table->[0][1];                 #=> 2 ## The usual way to access the element

$reference->{key}               # This makes a consistent usage for all data structures
$reference->{key}[0];           # ... Hash of arrays
$reference->[0]{key};           # ... Array of hashes

A result set of database rows is best represented as an array of hash references, where each row is a (colname=>value) hash. Again, let's start with a reference to the result set.

$rows = [];                          # Initialize $rows as array reference
push @$rows, {id=>1, name=>"Allen"}; # Add first row, a hash reference.
push @$rows, {id=>2, name=>"Bob"};   # Second row

# Get at the data by $rows->[row_number]{column_name_as_hash_key}
$rows->[0]{name};                    #=> Allen

Did you follow all that? You may want to read through it a few times.

Dereferencing References of References

Now for the fun part. We saw how to dereference a reference by prefixing it with a type symbol (as in @$array_reference), and how to use the -> operator to navigate through a series of nested references.

@arr = @$array_reference;
%hash = %$hash_reference;
$i = $$scalar_reference;

$array_reference->[0];
$hash_reference->{key};
$array_of_hashes->[0]{key};
$array_of_arrays->[0][2];  ## a 2-Dimensional table
$three_dim_table->[0][1][2];
$arr_hash_array->[1]{key}[2];

Now say we want to operate on an array reference returned by the -> operator. To dereference an expression like this, we use the @{expression} syntax for an array reference and the %{expression} syntax for a hash reference.

Here is how this works with our 2-dimensional table structure:

$table = [ [1,2,3], [4,5,6] ];
$table->[0];               # Returns a reference to [1,2,3]
@row = @{$table->[0]};     # Returns the array of (1, 2, 3)
push @{$table->[0]}, 0;    # $table is now [ [1,2,3,0], [4,5,6] ]

scalar(@{$table->[1]})     #=> 3  ## The number of items of the second row
$#{$table->[1]}            #=> 2  ## Index of last element in the referenced array (-1 when empty)
@{$table->[1]}[1, 2]       #=> (5, 6) ## Slice of array, returns table[1][1,2] as a list

foreach $row_ref (@$table) {   # Iterate over each row
  foreach $value (@$row_ref) { # Iterate over each column in that row
    $value;                    # Do something with each value in the table
  }
}

for ($i=$#{$table}; $i>=0; $i--) { # Reverse iteration by index
  for ($j=$#{$table->[$i]}; $j>=0; $j--) { # ... reverse iteration for each row
    $table->[$i][$j] += $prev;    # Do something with that referenced location
    $prev = $table->[$i][$j];     # ... Totally contrived example
  }
}

foreach my $i (0..$#{$table}) { # Again, using the range operator
  foreach my $j (0..$#{$table->[$i]}) {
    print "table row $i column $j is: $table->[$i][$j]\n";
  }
}

Now let's look at our result set, an array reference of hash references:

$rows = [ {id=>1, name=>"Allen"}, {id=>2, name=>"Bob"} ]; # Array Ref of Hashes
$rows->[0];                # Returns reference to the first row
%hash = %{$rows->[0]};     # Dereferences the first row as a hash
@hash_keys = keys %{$rows->[0]}; # ... and now return its keys

foreach $hash_key (keys %{$rows->[0]}) { } # Iterate over the keys
while (($k,$v) = each %{$rows->[0]}) { }   # Iterate over the "row"

foreach $row_ref (@$rows) {               # Iterate over each row
  while (($k,$v) = each %$row_ref) {      # Iterate over each key-value pair
    ($k, $v);                             # Do something with each pair
  }
}
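Because hash order is not guaranteed, output from each or keys can vary between runs. A sketch that walks the same rows but sorts each row's keys so the result is predictable:

```perl
use strict;
use warnings;

my $rows = [ { id => 1, name => "Allen" }, { id => 2, name => "Bob" } ];

my @lines;
foreach my $row_ref (@$rows) {                  # each $row_ref is a hash reference
    my @pairs = map { sprintf "%s=%s", $_, $row_ref->{$_} }   # "key=value" strings
                sort keys %$row_ref;            # sorted, since hash order is not fixed
    push @lines, join(", ", @pairs);
}
print "$_\n" for @lines;
# id=1, name=Allen
# id=2, name=Bob
```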

Inspecting the reference for type

Okay, now imagine you need to write a subroutine that takes a reference and must determine what type of data the reference points to.

The ref function takes a value and returns the type of data it references, or the empty string if the value is not a reference. It acts as a "type of" operator for references, but it does not distinguish between kinds of plain scalars (strings, numbers, undef).

ref 'asdf'                   #=> ''        ## Non-reference values return the empty string
ref [1,2,3]                  #=> 'ARRAY'
ref {a=>12}                  #=> 'HASH'
ref sub {}                   #=> 'CODE'    ## Code reference (see Code References above)
ref \123                     #=> 'SCALAR'  ## Reference to a scalar
ref MyClass->new             #=> 'MyClass' ## Objects return their package/class name

ref undef                    #=> ''        ## undef is not a reference either
defined(undef);              #=> ''        ## False: defined() returns the empty string

$r = [{a=>1}];
ref $r                       #=> 'ARRAY'   ## Array of Hashes
ref $r->[0]                  #=> 'HASH'    ## The first element is a hash reference
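As a sketch of the kind of subroutine described at the start of this section, here is a hypothetical deep_sum() that uses ref to decide how to walk a nested structure of array references, hash references, and plain numbers:

```perl
use strict;
use warnings;

# Hypothetical helper: add up every number in a nested structure,
# using ref() to decide how to handle each piece.
sub deep_sum {
    my ($data) = @_;
    my $type = ref $data;
    if ($type eq 'ARRAY') {
        my $total = 0;
        $total += deep_sum($_) for @$data;          # recurse into each element
        return $total;
    }
    if ($type eq 'HASH') {
        my $total = 0;
        $total += deep_sum($_) for values %$data;   # recurse into each value
        return $total;
    }
    if ($type eq '') {
        return defined $data ? $data : 0;           # plain scalar; treat undef as 0
    }
    die "deep_sum: cannot handle a $type reference\n";
}

my $data = [ 1, { a => 2, b => [3, 4] }, undef, 5 ];
print deep_sum($data), "\n";    # prints 15
```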

Congratulations, you are now a master of basic perl references!

Closures

Closures are a cool trick, stolen outright from the Lisp universe. A closure is a block of code that is passed to another function or stored for later execution, much like the code references we saw before.

The really cool part is that this code block runs in the context where it was defined. It has access to the lexical (my) variables that were in scope when it was created, and can alter them, even when invoked from within another function.

One of the useful things we can do with closures is to create a callback code block that injects our specific logic into a general purpose routine.

Closures are "anonymous code blocks" using the sub { } syntax (like a subroutine definition without a name).

my $i = 1;                 # A lexical variable
$closure  = sub { ++$i; }; # Reference to a closure that captures $i
&$closure();               # Run the reference
$i;                        #=> 2
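Closures really show their value when each one captures its own copy of a lexical variable. A minimal sketch with a hypothetical make_counter() factory; every call creates a fresh $count that only the returned closure can reach:

```perl
use strict;
use warnings;

# Each call to make_counter() creates a NEW lexical $count,
# and the returned closure keeps that variable alive.
sub make_counter {
    my ($start) = @_;
    my $count = $start;
    return sub { return $count++ };   # captures this particular $count
}

my $c1 = make_counter(1);
my $c2 = make_counter(10);

my @seen = ($c1->(), $c1->(), $c2->(), $c1->());
print "@seen\n";    # prints "1 2 10 3"
```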

Let's build a useful function to tie this all together. Perl has a map construct that iterates over a list, injects each value into a block as $_, and returns the list of values produced by the block. It does not use a closure; it is a built-in language construct. For instance:

@array = ( 1, 2, 3);
@array = map { $_ + 1 } @array; # Adds 1 to each element in the array
@array                          #=> (2, 3, 4)

# Alternate (non-map) way to do this:
@result = ();
foreach (@array) { push @result, $_ + 1; }
@array = @result;

It's a simple, but rather ugly syntax. It sets the $_ variable inside the block for each value of the array.

Suppose we want to create a map-like iterator for a hash. We can write a general routine, map_hash(), that iterates over a hash and executes a passed closure (or code block) for each key-value pair. The code block returns a key-value pair, either the same or changed, and map_hash() returns a new hash of the resulting pairs.

sub map_hash {
  my ($hash_ref, $block_ref) = @_;   # Args: map_hash(\%hash, sub {} );
  keys %$hash_ref; # Resets hash iterator in case it was not finished
  my %result = ();
  while ( my ($k, $v) = each %$hash_ref ) {
    ($k, $v) = &$block_ref($k, $v);  # Calls the closure or code block
    $result{$k} = $v;                # Place new key-value in result hash
  }
  %result;                           # Returns the new hash
}

# Call map_hash to upper-case the keys of a hash.
%hash = (a=>1, b=>2);
%hash = map_hash(\%hash, \&uc_hash_keys);
%hash;                               #=> (A=>1, B=>2)

sub uc_hash_keys {
  my ($k, $v) = @_;  # Input arguments: key, value pair
  (uc $k, $v);       # Return upper-cased key, value pair
}

# Call map_hash to sum the values of all keys in the hash
my $total = 0;                       # The closure can see (and change) this lexical!
%hash = map_hash(\%hash,
                   sub {             # A closure that adds each value to $total
                     my ($k, $v) = @_;  # Input arguments: key, value pair
                     $total += $v;   # Adds value to $total defined above
                     ($k, $v);       # Return unchanged key, value pair
                   }
                 );
$total;                              #=> 3

Classes and Object Oriented Perl

Wait! What is this doing here? We are talking about Perl References, right? Well, the perl object model really is just a reference "blessed" with a package name. You can execute any method (a function in OOP is now called a method) in that package using the -> operator.

package Person;                   # Define our Person class

sub new {                         # Constructor method
  my ($class, %attributes) = @_;  # * Receives the package name and any arguments
  my $self = \%attributes;        # * A fresh hash reference for each instance
  bless $self, $class;            # * Tells perl $self is an instance of Person
}                                 # * bless returns $self, which is returned to caller

sub talk {
  my ($self, $message) = @_;      # $self is the object reference, always passed in first
  print "$self->{name} says $message\n";
}

package main;                     # Change back to our "main" namespace

$person = {id=>1, name=>"Allen"}; # $person is a hash reference
bless $person, 'Person';          #=> Person=HASH(0x100804ed0), instance created without "new"

# Or we can use the "new" syntaxes we see in other languages
$person = new Person(id=>1, name=>"Allen");   # Indirect object syntax (discouraged)
$person = Person->new(id=>1, name=>"Allen");  # Preferred method-call syntax

# Now $person is an instance of class Person (implemented with a hash).

# Call a method on the person object reference
$person->talk("hi");              #=> Allen says hi

# This is syntax-sugar for the true calling notation (which explains the $self arg)
$person = Person::new('Person', id=>1, name=>"Allen");
Person::talk($person, "hi");

Of course, there is more to understand about object-oriented perl than this basic example. I wanted to demonstrate that perl objects are also references, and require the same syntax.

Also, you see that all perl objects are specially "blessed" references to perl hashes, arrays, or other things you can reference. In practice, though, you will find that it is almost always a hash, because the hash keys conveniently serve as the instance variables of the object.
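Because ref returns the class name for an object, it hides the underlying data type. A sketch using the core Scalar::Util module, whose blessed() and reftype() functions report the class and the underlying type separately (the name() accessor is added here just for illustration):

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

package Person;
sub new {
    my ($class, %attributes) = @_;
    return bless { %attributes }, $class;   # a fresh anonymous hash per object
}
sub name { my ($self) = @_; return $self->{name} }   # simple accessor

package main;

my $person = Person->new(id => 1, name => "Allen");
print ref($person), "\n";       # prints "Person"  (ref returns the class name)
print reftype($person), "\n";   # prints "HASH"    (the underlying data type)
print blessed($person), "\n";   # prints "Person"
print $person->name, "\n";      # prints "Allen"
print $person->{name}, "\n";    # still a hash reference underneath: prints "Allen"
```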

Bonus: Simulating The Ruby Call

NOTE: This is more advanced and builds on several techniques from this document. If you are new to this topic, feel free to skip ahead to the next section.

In the Ruby (1.9) language, which names Perl as a close ancestor, you can call a function like this:

my_func(data, hashkey:"value", hashkey2:123) { "This is a closure" }

def my_func(data, options={}, &block)
   options      #=> { hashkey: "value", hashkey2: 123 }
   block.call   #=> "This is a closure"
end

my_func2(1, 2, 3, 4, 5, 6, 7, 8, debug:true) { |x| x * 2 }

def my_func2(*args)
   args         #=> [1, 2, 3, 4, 5, 6, 7, 8, {debug: true}]
   options = args.last.is_a?(Hash) ? args.pop : {} #=> {debug: true}
   args         #=> [1, 2, 3, 4, 5, 6, 7, 8]
   args.map {|x| yield(x) } # The passed block is called with yield
end

Notice that the arguments that look like named parameters (hashkey:"value") are gathered into a hash and appended to the call arguments, where my_func captures them in the variable options. If a block is given on the call, it can be captured in a final parameter prefixed with an ampersand, like &block (even when it is not captured this way, it is still available through yield).

In my_func2, we accept any number of arguments, which the "splat" (*) operator collects into the variable args. This mirrors how a perl function is invoked: the parameters are assembled into an array, and the function must parse out the values at the positions it expects.

However, any name-value pairs specified at the end of the call are put into a hash, which is still passed as the last element of the args array. If we don't expect the args array to end with a hash, we can use this trick to pop off the hash as the last element of the args array, which we can use to control preferences for the function.

We can use this trick as well with Perl when we also would not otherwise expect the last argument to be a hash reference or a closure (anonymous function reference).

my_func2(1, 2, 3, 4, 5, 6, 7, 8, {debug=>1}, sub { 2 * shift });

sub my_func2 {
  my ($block, $options, @args) = ruby_args(@_);
  map { &$block($_, $options) } @args;
}

sub ruby_args {
  # Check if last argument is a CODE block...
  my $block = scalar(@_) && ref($_[-1]) eq 'CODE' ? pop : sub {shift}; # Passed Closure?

  # You can either do this to put the optional arguments into a hash reference...
  my $options = scalar(@_) && ref($_[-1]) eq 'HASH' ? pop : {}; # as Hash Reference

  # ... Or this to dereference it and store in a hash. Use one or the other,
  # since each one pops the hash off @_:
  # my %options = scalar(@_) && ref($_[-1]) eq 'HASH' ? %{pop()} : (); # as Hash

  return ($block, $options, @_); #=> ($block, {debug=>1}, 1, 2, 3, 4, 5, 6, 7, 8)
}

Here, I inspected the special @_ argument array to see if the last element ($_[-1]; we use $ to access a single element of @_, and the -1 index tells perl to count back from the end of the array, so it points to the last element) is a code or hash reference. If it is, I use pop to take it off the end of the array; otherwise I use a default value (an empty hash reference, or a block that simply returns its first argument).