Concurrent programming in Erlang

2.5. Function Definition

The following sections describe in more detail the syntax of an Erlang function. We start by giving names to the different syntactic elements of a function. This is followed by descriptions of these elements.

2.5.1. Terminology

Consider the following module:

-module(lists2).                               % 1
                                               % 2
-export([flat_length/1]).                      % 3
                                               % 4
%% flat_length(List)                           % 5
%%  Calculate the length of a list of lists.   % 6
                                               % 7
flat_length(List) ->                           % 8
    flat_length(List, 0).                      % 9
                                               % 10
flat_length([H|T], N) when list(H) ->          % 11
    flat_length(H, flat_length(T, N));         % 12
flat_length([H|T], N) ->                       % 13
    flat_length(T, N + 1);                     % 14
flat_length([], N) ->                          % 15
    N.                                         % 16

Each line carries a comment giving its number (% 1, % 2, and so on). Comments start with the % character (which can occur anywhere in a line) and extend to the end of the line.

Line 1 contains the module declaration. This must come before any other declarations or any code.

The leading - in lines 1 and 3 is called the attribute prefix. module(lists2) is an example of an attribute.

Lines 2, 4, etc., are blank. Sequences of one or more blanks, tabs, newline characters, etc., are treated as if they were a single blank.

Line 3 declares that the function flat_length, which has one argument, will be found in and should be exported from the module.

Lines 5 and 6 contain comments.

Lines 8 and 9 contain a definition of the function flat_length/1. This consists of a single clause.

The expression flat_length(List) is referred to as the head of the clause. The expressions following the -> are referred to as the body of the clause.

Lines 11 to 16 contain the definition of the function flat_length/2. This function consists of three clauses; these are separated by semicolons (;) and the last one is terminated by a full stop (.).

The first argument of flat_length/2 in line 11 is the list [H|T]. H is referred to as the head of the list and T as the tail of the list. The expression list(H), which comes between the keyword when and the -> arrow, is called a guard. The body of the clause is evaluated if the patterns in the clause head match and if the guard tests succeed.

The first clause of flat_length/2 is called a guarded clause; the other clauses are said to be unguarded.

flat_length/2 is a local function, i.e. it cannot be called from outside the module (this is because it does not occur in the export attribute).

The module lists2 contains definitions of the functions flat_length/1 and flat_length/2. These represent two entirely different functions – this is in contrast to languages such as C or Pascal where a function name can only occur once with a fixed number of arguments.
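As a rough illustration, the module above can be written so that it compiles under a current Erlang/OTP release. This is only a sketch under two assumptions that are not part of the text: the module is renamed flat_demo, and the guard test list/1 is written in its newer spelling, is_list/1.

```erlang
-module(flat_demo).
-export([flat_length/1]).

%% flat_length/1 is exported; flat_length/2 below is local to the module.
flat_length(List) ->
    flat_length(List, 0).

%% Same name, different number of arguments: an entirely separate function.
flat_length([H|T], N) when is_list(H) ->   % guarded clause
    flat_length(H, flat_length(T, N));
flat_length([_H|T], N) ->                  % unguarded clause
    flat_length(T, N + 1);
flat_length([], N) ->                      % unguarded clause
    N.
```

With this sketch, flat_demo:flat_length([a, [b, c]]) evaluates to 3, while a call to flat_demo:flat_length([a], 0) from outside the module fails, because flat_length/2 is not exported.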

2.5.2. Clauses

Each function is built from a number of clauses. The clauses are separated by semicolons (;). Each individual clause consists of a clause head, an optional guard and a body. These are described below.

2.5.3. Clause heads

The head of a clause consists of a function name followed by a number of arguments separated by commas. Each argument is a valid pattern.

When a function call is made, the call is sequentially matched against the set of clause heads which define the function.
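The sequential matching of a call against the clause heads can be sketched as follows. The module head_demo and the function describe/1 are hypothetical names chosen for this illustration only.

```erlang
-module(head_demo).
-export([describe/1]).

%% The clause heads are tried from top to bottom; the first head whose
%% pattern matches the argument of the call is the one that is chosen.
describe([]) -> empty;
describe([_]) -> one_element;
describe([_, _ | _]) -> two_or_more;
describe(_Other) -> not_a_list.
```

Because matching proceeds in order, the catch-all pattern _Other is placed last; if it came first it would match every call and the remaining clauses would never be reached.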

2.5.4. Clause guards

Guards are conditions which have to be fulfilled before a clause is chosen.

A guard can be a simple test or a sequence of simple tests separated by commas. A simple test is an arithmetic comparison, a term comparison, or a call to a system predefined test function. Guards can be viewed as an extension of pattern matching. User-defined functions cannot be used in guards.

To evaluate a guard all the tests are evaluated. If all are true then the guard succeeds, otherwise it fails. The order of evaluation of the tests in a guard is undefined.

If the guard succeeds then the body of this clause is evaluated. If the guard test fails, the next candidate clause is tried, etc.

Once a matching head and guard of a clause have been selected the system commits to this clause and evaluates the body of the clause.
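A minimal sketch of how a failing guard causes the next candidate clause to be tried is given here. The module guard_demo and the function kind/1 are hypothetical, and the guard tests use the is_ spelling accepted by current Erlang/OTP releases.

```erlang
-module(guard_demo).
-export([kind/1]).

%% Both tests in the first guard must succeed for that clause to be chosen.
%% If either fails, the next clause is tried.
kind(X) when is_integer(X), X > 0 -> positive_integer;
kind(X) when is_integer(X) -> other_integer;
kind(_) -> not_an_integer.
```

For example, kind(-3) fails the test X > 0 in the first clause, so the second clause is selected and its body is evaluated.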

We can write a version of factorial using guarded clauses.

factorial(N) when N == 0 -> 1;
factorial(N) when N > 0 -> N * factorial(N - 1).

Note that in the above example we could have reversed the clause order, thus:

factorial(N) when N > 0 -> N * factorial(N - 1);
factorial(N) when N == 0 -> 1.

since in this case the combination of head patterns and guard tests serves to identify the correct clause uniquely.
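The two orderings can be compared directly by placing them side by side in one module. The names fact_demo, fact_a/1 and fact_b/1 are assumptions introduced for this sketch.

```erlang
-module(fact_demo).
-export([fact_a/1, fact_b/1]).

%% Base case first.
fact_a(N) when N == 0 -> 1;
fact_a(N) when N > 0 -> N * fact_a(N - 1).

%% Recursive case first; the guards still select exactly one clause.
fact_b(N) when N > 0 -> N * fact_b(N - 1);
fact_b(N) when N == 0 -> 1.
```

Both fact_a(5) and fact_b(5) evaluate to 120. For a negative argument neither guard succeeds, so no clause is chosen and the call fails.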

2.5.5. Guard tests

The complete set of guard tests is as follows:

Guard           Succeeds if
atom(X)         X is an atom
constant(X)     X is not a list or tuple
float(X)        X is a float
integer(X)      X is an integer
list(X)         X is a list or []
number(X)       X is an integer or float
pid(X)          X is a process identifier
port(X)         X is a port
reference(X)    X is a reference
tuple(X)        X is a tuple
binary(X)       X is a binary

In addition, certain BIFs, together with arithmetic expressions, are allowed in guards. These are as follows:

element/2, float/1, hd/1, length/1, round/1, self/0, size/1, trunc/1, tl/1, abs/1, node/1, node/0, nodes/0
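The following sketch combines several guard tests with two of the BIFs listed above (size/1 and length/1). It assumes a current Erlang/OTP release, where the tests are spelled with an is_ prefix (is_atom/1, is_list/1, and so on); the module test_demo and the function classify/1 are hypothetical.

```erlang
-module(test_demo).
-export([classify/1]).

%% Each clause is selected by a guard test, optionally combined with a BIF.
classify(X) when is_atom(X) -> atom;
classify(X) when is_integer(X) -> integer;
classify(X) when is_float(X) -> float;
classify(X) when is_tuple(X), size(X) == 2 -> pair;
classify(X) when is_tuple(X) -> tuple;
classify(X) when is_list(X), length(X) > 2 -> long_list;
classify(X) when is_list(X) -> list;
classify(X) when is_binary(X) -> binary;
classify(X) when is_pid(X) -> pid;
classify(_) -> other.
```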

2.5.6. Term comparisons

The term comparison operators which are allowed in a guard are as follows:

Operator    Description                      Type
X > Y       X greater than Y                 coerce
X < Y       X less than Y                    coerce
X =< Y      X equal to or less than Y        coerce
X >= Y      X greater than or equal to Y     coerce
X == Y      X equal to Y                     coerce
X /= Y      X not equal to Y                 coerce
X =:= Y     X equal to Y                     exact
X =/= Y     X not equal to Y                 exact

The comparison operators work as follows: first, both sides of the operator are evaluated where possible (i.e. when they are arithmetic expressions or contain guard function BIFs); then the comparison is performed.

For the purposes of comparison the following ordering is defined:

number < atom < reference < port < pid < tuple < list

Tuples are ordered first by their size then by their elements. Lists are ordered by comparing heads, then tails.

When both arguments of a comparison operator are numbers and the type of the operator is coerce, then if one argument is an integer and the other a float, the integer is converted to a float before the comparison is performed.

The exact comparison operators perform no such conversion.

Thus 5.0 == 1 + 4 succeeds whereas 5.0 =:= 1 + 4 fails.
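The difference between the coerce and exact operators, and the ordering between terms of different types, can be tried out with a small sketch. The module cmp_demo and its three functions are hypothetical names used only for this illustration.

```erlang
-module(cmp_demo).
-export([coerce_equal/2, exact_equal/2, smaller/2]).

%% == converts an integer to a float before comparing.
coerce_equal(X, Y) when X == Y -> true;
coerce_equal(_, _) -> false.

%% =:= performs no conversion.
exact_equal(X, Y) when X =:= Y -> true;
exact_equal(_, _) -> false.

%% Returns whichever argument comes first in the term ordering.
smaller(X, Y) when X < Y -> X;
smaller(_, Y) -> Y.
```

Here coerce_equal(5.0, 1 + 4) is true and exact_equal(5.0, 1 + 4) is false. smaller(1, a) returns 1, since numbers precede atoms, and smaller({z}, {a, b}) returns {z}, since tuples are ordered first by size.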

Examples of guarded function clause heads:

foo(X, Y, Z) when integer(X), integer(Y), integer(Z), X == Y + Z ->
foo(X, Y, Z) when list(X), hd(X) == {Y, length(Z)} ->
foo(X, Y, Z) when {X, Y, size(Z)} == {a, 12, X} ->
foo(X) when list(X), hd(X) == c1, hd(tl(X)) == c2 ->

Note that no new variables may be introduced in a guard.

2.5.7. Clause bodies

The body of a clause consists of a sequence of one or more expressions which are separated by commas. All the expressions in a sequence are evaluated sequentially. The value of the sequence is defined to be the value of the last expression in the sequence. For example, the second clause of factorial could be written:

factorial(N) when N > 0 ->
    N1 = N - 1,
    F1 = factorial(N1),
    N * F1.

During the evaluation of a sequence, each expression is evaluated and the result is either matched against a pattern or discarded.

There are several reasons for splitting the body of a function into a sequence of calls:

  • To ensure sequential execution of code – each expression in a function body is evaluated sequentially, while functions occurring in a nested function call could be executed in any order.
  • To increase clarity – it may be clearer to write the function as a sequence of expressions.
  • To unpack return values from a function.
  • To reuse the results of a function call.
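Unpacking the return value of a function inside a sequence might look like the sketch below. The module body_demo, the helper sum_and_count/1 and the function mean/1 are assumptions made for this example; lists:sum/1 is the standard library function.

```erlang
-module(body_demo).
-export([mean/1]).

%% Hypothetical helper returning two values packed in a tuple.
sum_and_count(List) ->
    {lists:sum(List), length(List)}.

mean(List) ->
    {Sum, Count} = sum_and_count(List),  % unpack the returned tuple
    Sum / Count.                         % value of the last expression is the result
```

For instance, body_demo:mean([1, 2, 3]) evaluates to 2.0.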

Multiple reuse of a function value can be illustrated as follows:

good(X) ->
    Temp = lic(X),
    {cos(Temp), sin(Temp)}.

would be preferable to:

bad(X) ->
    {cos(lic(X)), sin(lic(X))}.

which means the same thing. lic is some long and involved calculation, i.e. some function whose value is expensive to compute.
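A runnable variant of the two functions is sketched here under some assumptions: the module is named reuse_demo, cos and sin are taken from the standard math module, and lic/1 is an arbitrary stand-in for an expensive calculation, not a function defined by the text.

```erlang
-module(reuse_demo).
-export([good/1, bad/1]).

%% Hypothetical stand-in for a long and involved calculation.
lic(X) ->
    lists:foldl(fun(I, Acc) -> Acc + X / I end, 0.0, lists:seq(1, 1000)).

%% lic/1 is evaluated once and its value is reused.
good(X) ->
    Temp = lic(X),
    {math:cos(Temp), math:sin(Temp)}.

%% lic/1 is evaluated twice for the same argument.
bad(X) ->
    {math:cos(lic(X)), math:sin(lic(X))}.
```

Both functions return the same tuple for a given argument; good/1 simply performs the expensive step once instead of twice.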