Make your functions functional

Published November 22, 2016

Introduction : global variables

Global variables are a Bad Thing. Everyone knows this, right ?

But do you know exactly why ? I have asked this question around, and many of us can’t exactly explain why global variables should be avoided.

It is not a question of scope. Indeed, global constants have the same scope as global variables, but global constants are generally seen as a Good Thing, because they let you put a label over what would otherwise be “magic values”.

Some people answer that global variables should be avoided because they cause multithreading issues. They do: a global variable can be accessed from any function, and could be written and read simultaneously from several threads. But I don’t think this is the main issue because, as everyone knows, global variables should be avoided even when there is only a single thread in a program.

I think that global variables are a problem because they break functions.

Functions are useful to decompose a program (or another function) into simpler elements, and for this reason, they reduce complexity, and are a tool to improve expressiveness of the code. But to do this, functions must respect certain rules. One of the rules to respect stems from the very definition of a function:

A function takes inputs, and provides outputs.

[Figure: inputs → f → outputs]

It sounds simple, because it is. And to keep it simple, the important thing to understand is that a function must clearly show what its inputs and outputs are. This is where global variables break functions. As soon as there is a global variable, every function in its scope can potentially have this global variable as input and/or output. And this is hidden from the function declaration. So the function has inputs and outputs, but does not tell exactly what they are. Such functions are… dysfunctional.

Note how global constants don’t have this issue. They are not an input of a function, because they cannot vary (as input does by definition), and they are certainly not an output either, because the function cannot write in them.
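To make the hidden dependency concrete, here is a minimal sketch. All the names (taxRatePercent, priceWithTaxHidden, priceWithTax) are made up for illustration. Both functions compute the same thing, but only the second one shows all of its inputs in its declaration:

```cpp
// Hypothetical global variable: any function may read or write it.
int taxRatePercent = 20;

// The declaration announces one input, but the result also depends on
// taxRatePercent. That second input is invisible at the call site.
int priceWithTaxHidden(int price)
{
    return price + price * taxRatePercent / 100;
}

// Every input appears in the declaration, and the output is the return value.
int priceWithTax(int price, int ratePercent)
{
    return price + price * ratePercent / 100;
}
```

Calling priceWithTaxHidden(100) twice can give two different results if some other function changed taxRatePercent in between, whereas priceWithTax(100, 20) always means the same thing.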

As a result, a function must clearly express its inputs and outputs. This idea happens to be at the basis of functional programming, so we could formulate the guideline this way:

Make your functions functional !

The rest of this post shows how to do this in an idiomatic way in C++.

Expressing the inputs of a function

Quite simply, inputs come into a function through its parameters. Generally, inputs are expressed by passing a reference-to-const parameter (const T&). So when you read or write a function prototype, bear in mind that reference-to-const means input. For some types that are cheap to copy, such as primitive types, input can also come in by value.

Expressing the input-output parameters

C++ allows a function to modify its inputs. Such parameters are both input and output. The typical way to represent this is by reference-to-not-const (T&).
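As an illustration of these conventions, here is a small sketch with made-up functions (totalLength, square and appendSuffix are not from any library). The parameter types alone tell the reader which arguments are pure inputs and which ones may be modified:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Input by reference-to-const: the function can read names but not modify it.
std::size_t totalLength(const std::vector<std::string>& names)
{
    std::size_t total = 0;
    for (const std::string& name : names)
    {
        total += name.size();
    }
    return total;
}

// Input by value: fine for types that are cheap to copy, such as int.
int square(int x)
{
    return x * x;
}

// Input-output by reference-to-not-const: the caller's vector is read and modified.
// suffix is a plain input, so it comes in by reference-to-const.
void appendSuffix(std::vector<std::string>& names, const std::string& suffix)
{
    for (std::string& name : names)
    {
        name += suffix;
    }
}
```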

Expressing the outputs of a function

The rule here is:

Outputs should come out by the return type.

Output f(const Input& input);

This does sound natural, but there are many cases where we are reluctant to do this, and a clumsier way is often seen instead: passing the output as a parameter by reference-to-not-const (T&), like so:

void f(const Input& input, Output& output);

Then the function would be in charge of filling this output parameter.

There are several drawbacks with using this technique:

It is not natural. Outputs should come out by the return type. With the above code, you end up with an awkward syntax at call site:

Output output;
f(input, output);

As opposed to the simpler syntax:

Output output = f(input);

And this gets even more awkward when there are several functions called in a row.

You have no guarantee that the function is actually going to fill the output.

Maybe it does not make sense to default-construct the Output class. In this case you would force it to be default-constructible, for a questionable reason.
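The last drawback can be sketched with a hypothetical Report class (invented for this example) that cannot meaningfully exist without a title, and therefore has no default constructor. Returning it by value works as is, while an output parameter would not even compile at call site:

```cpp
#include <string>
#include <utility>

// Hypothetical type: a Report always has a title, so there is
// deliberately no default constructor.
class Report
{
public:
    explicit Report(std::string title) : title_(std::move(title)) {}
    const std::string& title() const { return title_; }

private:
    std::string title_;
};

// Output by return type: no default constructor needed.
Report makeReport(const std::string& subject)
{
    return Report("Report on " + subject);
}

// With an output parameter, the caller would have to write:
//     Report report;               // does not compile: no default constructor
//     fillReport(subject, report);
// The only way out would be to add a default constructor to Report
// just to accommodate the function's interface.
```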

If producing outputs through the return type is better, why doesn’t everyone do it all the time ?

There are three kinds of reasons that prevent us from doing it, and all of them can be worked around, most of the time very easily. They are: performance, error handling and multiple return types.

Performance

In C, returning by value sounded like folly, because it incurred a copy of objects, instead of copying pointers. But C++ has several language mechanisms that avoid the copy when returning by value: Return Value Optimisation (RVO) elides it altogether, and move semantics replaces it with a cheap move. For example, returning a local std::vector by value moves it at worst, instead of copying it. And moving such a container takes about as much time as copying a few pointers.

In fact you don’t even have to master RVO or move semantics to return objects by value. Just do it ! In many cases the compiler will do its best to elide the copy, and for the cases where it doesn’t, you have over 80% probability that this code is not in the performance-critical part of the program anyway.

Only when your profiler has shown that a copy made during a return by value in a specific function is your performance bottleneck should you consider degrading your code by passing the output parameter by reference. And even then, you could still have other options (like facilitating RVO or implementing move semantics for the returned type).
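If you would rather check this than take it on faith, here is a small experiment. Tracked is a hypothetical instrumented type written for this sketch: its copy constructor counts how many times it is called. Returning a local Tracked by value should never increment the counter, because since C++11 the compiler either elides the copy or falls back to the move constructor:

```cpp
#include <vector>

// Hypothetical instrumented type: counts the copies made of it.
struct Tracked
{
    static int copies;
    std::vector<int> data;

    Tracked() = default;
    Tracked(const Tracked& other) : data(other.data) { ++copies; }
    Tracked(Tracked&&) noexcept = default;
};

int Tracked::copies = 0;

// Builds a large object and returns it by value.
Tracked makeTracked()
{
    Tracked result;
    result.data.assign(1000, 42);
    return result; // elided (NRVO) or moved, but not copied
}
```

After Tracked t = makeTracked();, the counter Tracked::copies is still 0, even though t holds 1000 elements.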

Error handling

Sometimes a function may not be able to compute its output in certain cases. For example the function may fail with certain inputs. Then what can be returned if there is no output ?

In this case some code falls back to the pattern of passing output by reference, because the function doesn’t have to fill it. Then to indicate whether the output was filled or not, the function returns a boolean or an error code like:

bool f(const Input& input, Output& output);

This makes for clumsy and brittle code at call site:

Output output;
bool success = f(input, output);
if (success)
{
   // use output ...
}

The cleanest solution for the call site is for the function to throw an exception when it fails, and return an output when it succeeds. However, the surrounding code has to be exception safe, and many teams don’t use exceptions in their codebase anyway.
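For teams that do use exceptions, the approach could look like the following sketch. The function firstWord is made up for the example, and treating an empty input as the failure case is an arbitrary choice:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical function: returns the first word of text.
// Failure (empty input) is reported by an exception, so the return
// type only has to carry the output.
std::string firstWord(const std::string& text)
{
    if (text.empty())
    {
        throw std::invalid_argument("firstWord: empty input");
    }
    // If there is no space, find returns npos and substr yields the whole string.
    return text.substr(0, text.find(' '));
}
```

The call site on the success path is then simply std::string word = firstWord(text);, with the failure handled wherever the caller chooses to catch std::invalid_argument.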

Even then, there is still a solution to make output come out by the return type: use optional.

You can see all about optional in a dedicated post, but in short, optional<T> represents an object that can be any value of type T, or empty. So when the function succeeds, you can return an optional containing the actual output, and when it fails, you can just return an empty optional:

boost::optional<Output> f(const Input& input);

Note that optional has since been standardized, and is natively available as std::optional from C++17.

And at the calling site:

boost::optional<Output> output = f(input); // in C++11 simply write auto output = f(input);
if (output)
{
   // use *output...
}
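Here is a complete sketch of this pattern. To keep it self-contained it uses std::optional from C++17 rather than boost::optional, and the function firstDigitPosition is invented for the example:

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical function: returns the position of the first digit in text,
// or an empty optional when text contains no digit.
std::optional<std::size_t> firstDigitPosition(const std::string& text)
{
    const std::size_t position = text.find_first_of("0123456789");
    if (position == std::string::npos)
    {
        return std::nullopt; // failure: nothing to return
    }
    return position; // success: the optional wraps the actual output
}
```

At call site, auto position = firstDigitPosition(text); followed by if (position) gives access to the value through *position, exactly as above.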

Multiple return types

In C++, only one type can be returned from a function. So when a function must return several outputs, the following pattern is sometimes seen:

void f(const Input& input, Output1& output1, Output2& output2);

Or worse, asymmetrically:

Output1 f(const Input& input, Output2& output2);

Still falling back to the dreaded pattern of passing outputs by reference.

The cleanest solution to fix this and produce several outputs by return type, as the language stands today (< C++17), is defining a new structure grouping the outputs:

struct Outputs
{
   Output1 output1;
   Output2 output2;
};

Which leads to the more expressive declaration:

Outputs f(const Input& input);

If the two outputs are often together, it might even make sense to group them in an actual object (with private data and public methods), although this is not always the case.
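As a concrete instance of the structure approach, here is a sketch with a hypothetical MinMax structure and minMax function. It assumes the input vector is not empty:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical structure grouping the two outputs.
struct MinMax
{
    int min;
    int max;
};

// Returns both extremes of values in one pass.
// Precondition (assumed for this sketch): values is not empty.
MinMax minMax(const std::vector<int>& values)
{
    const auto extremes = std::minmax_element(values.begin(), values.end());
    return MinMax{*extremes.first, *extremes.second};
}
```

The call site reads MinMax result = minMax(values);, and each output then has a name (result.min, result.max) instead of a position.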

In C++11, a quicker but less clean solution is to use tuples:

std::tuple<Output1, Output2> f(const Input& input);

And at call site:

Output1 output1;
Output2 output2;
std::tie(output1, output2) = f(input);

This has the drawback of forcing the outputs to be default constructible. (If you’re not familiar with tuples yet, don’t worry, we get into the details of how the above works when we explore tuples in a dedicated post).
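For reference, a compilable version of the tuple approach might look like this. The functions toHoursAndMinutes and formatDuration are made up for the sketch:

```cpp
#include <string>
#include <tuple>

// Hypothetical function returning two outputs as a tuple:
// a duration in minutes, split into hours and remaining minutes.
std::tuple<int, int> toHoursAndMinutes(int totalMinutes)
{
    return std::make_tuple(totalMinutes / 60, totalMinutes % 60);
}

// Call site with std::tie: the outputs must exist before the call,
// which is why they have to be default constructible.
std::string formatDuration(int totalMinutes)
{
    int hours = 0;
    int minutes = 0;
    std::tie(hours, minutes) = toHoursAndMinutes(totalMinutes);
    return std::to_string(hours) + "h" + std::to_string(minutes) + "min";
}
```

Note that nothing in std::tuple<int, int> says which element is the hours and which is the minutes. This is part of what makes tuples less clean than a dedicated structure.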

As a final note, here is the syntax that C++17 provides to natively receive multiple return values:

auto [output1, output2] = f(input);

This is the best of both worlds. It is called Structured Bindings. Here f could return a std::tuple, or a structure with public data members like Outputs above.
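Here is a sketch of Structured Bindings used with a plain structure rather than a tuple, which requires a C++17 compiler. Division, divide and roundedUpQuotient are hypothetical names, and the code assumes positive operands:

```cpp
// Hypothetical structure grouping the two outputs of a division.
struct Division
{
    int quotient;
    int remainder;
};

// Precondition (assumed for this sketch): divisor is not zero.
Division divide(int dividend, int divisor)
{
    return Division{dividend / divisor, dividend % divisor};
}

// Call site: both outputs are declared and initialized in one statement,
// so neither needs to be default constructible.
int roundedUpQuotient(int dividend, int divisor)
{
    auto [quotient, remainder] = divide(dividend, divisor);
    return remainder == 0 ? quotient : quotient + 1;
}
```

The function keeps an expressive return type with named members, and the call site gets the conciseness of std::tie without the declarations up front.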

Conclusion

In conclusion, strive to have outputs coming out of your functions by their return type. When this is impractical, use another solution, but bear in mind that it is detrimental to the clarity and expressiveness of your code.