Expressiveness, Nullable Types, and Composition (Part 1)

Published July 16, 2019

This week we have a series of two articles on composing nullable types written by Rafael Varago. Rafael is a Software Engineer at eGym GmbH, he’s been working with C++, Scala, Go, build systems (e.g Modern CMake), embedded applications, and distributed systems. He enjoys Declarative Programming and Modern C++. Rafael loves learning new technologies and he relatively often writes on Medium @varago.rafael. He can be found online on Github and on LinkedIn.

We’re software developers, our mission is to provide software that solves problems. And in its essence, writing software is all about composition.

We have a big problem to solve, so we break it up into smaller pieces that can be more easily comprehended, and then we compose these smaller pieces together into working software.

Imagine the problem of calculating the arithmetic mean given a list of numbers, maybe for an IoT application that keeps track of temperature measurements.

In C++, one possible way to solve the problem would be:

template <typename T>
T average(std::vector<T> const& series) {
    auto const sum = std::accumulate(begin(series), end(series), T{});
    return sum / std::size(series);
}

We broke the problem into two smaller ones:

Compute the sum of elements in the list.
Compute the length of elements in the list.

These two simpler tasks were accomplished by using STL Algorithms, which is an extremely powerful tool that should be part of all C++ developers’ toolkit:

Then we finally composed those two by dividing the former by the latter.

This works as expected for almost all cases. But what happens if series is an empty vector? That’s an exercise I’ll leave for the reader :).

Showing ZIP codes

Imagine an application that shows the ZIP code for a given person based on an association with an address. For this problem, we have the following classes:

struct zip_code {};
struct address {};
struct person {};

And the functions:

address findAddress(person const&);
zip_code getZipCode(address const&);

We also have a function person findPerson(Query const&) that returns an instance of a person that matches the specified search query, maybe by requesting some database.

These functions might be composed together to solve our problem as a pipeline, i.e. series of functions sequentially applied:

auto const customQuery = buildQuery();
auto const zipCode = getZipCode(findAddress(findPerson(customQuery)));
use(zipCode);

That should do the trick. Does it?

However, consider the case where the built customQuery didn’t match any person at all.

Moreover, the application might also allow having a person in the database that doesn’t have a corresponding address.

How should we handle these cases where the function may fail? How should we clearly and unambiguously communicate such failure to the client code?

Being expressive and bringing type-safety with `std::optional<T>`

There are many answers for those questions, for instance, throwing an exception. But we have to be sure that those failure conditions are really exceptional. Otherwise, we’d be risking using exceptions for flow control, and that’s not a good idea.

Here, I’m picking up Total Functions. So, I’m going to lift failures into the C++ powerful static type system and therefore rely on its type-checker to enforce guarantees at compile-time.

One way to achieve such a goal is through the use of the famous and glorious std::optional<T>, which is a standard type meant to be a vocabulary type that exactly represents the possible absence of a value, or nullability, sort of similar to a pointer but far more clearly and precisely. Given a type T, an std::optional<T> wraps, or lifts, T and can have all the values that T can have or it can be empty. It’s basically a container that can hold zero or one element.

In terms of Algebraic Data Types, an std::optional<T> models a sum type Being #T cardinality of the “set” T, roughly speaking, the number of elements that can inhabit a given type T. In this context, an std::optional<T> satisfies the following constraint:

#std::optional<T> = #T + 1

If we change the signatures to return an std::optional<T>, then we’d end up with:

std::optional<address> findAddress();
zip_code getZipCode();

Mixing std::optional<T> and Composition

Going back to the caller code, it may become something like:

auto const customQuery = buildQuery();
auto const person = findPerson(customQuery);
if (!person) return;
auto const address = findAddress(person.value());
if (!address) return;
auto const zipCode = getZipCode(address.value());
use(zipCode);

Hmm..It became relatively more verbose. What we have now:

Each intermediate step demands a safety check against an empty std::optional<T>, so we ended up with duplicated code for error handling.
If any check fails, then we do the same action: return from the function.
The error handling happens in the middle of the code, therefore distracting us from the main logic and making it harder to understand the business logic that we’re coding up.

Furthermore, the example involves only three functions, but we could have many more and for each added function, we also have to add more logic for handling…Things can get very hairy!

Even more critical, we have to make several calls to std::optional<T> accessor member function, in those cases to value(), and for each call, we have to make sure we’ve checked that it’s not empty before accessing its value. Otherwise, we would trigger a bad_optional_access. Thus, it’d be nice to minimize the direct calls to value() by wrapping intermediary ones inside a function that does the checking and then accesses the value. And only make the direct call to value() from our code at the very end of the composition.

Essentially, std::optional<T> has reduced our ability to compose, or chain, the operations as we had before. The code became slightly more complex to understand, and therefore to change.

Before, we were able to compose findPerson, findAddress, and getZipCode:

(query ->person) andThen (person ->address) andThen (address -> zipcode)

Where andThen is the usual function composition: evaluates the first function and then feeds its return value into the second function.

Such pipeline of function applications can be reduced into a “single function”, which is the composition of the whole pipeline:

(query ->zipcode)

That receives a query, evaluates, or applies, the intermediate functions under the covers, to finally return a zip_code.

But now, we have:

(query ->optional<person>) andThen (person ->optional<address>) andThen (address -> zipcode)

That we would like to reduce to:

(query ->optional<zipcode>)

But this isn’t possible. Because it can’t be composed anymore, given that we now have mismatches between the return type the first function and the input type of the second, i.e. findPerson returns an std::optional<person> whereas findAddress expects a person.

Thus, in order to complete the new functions, we have to somehow “adapt” the involved types. We need something more powerful than andThen that knows how to compose functions that return types lifted, i.e. wrapped, into std::optional<T>, or maybe more generally into a concept representing nullable types that abstracts std::optional<T>.

Ideally, we’d like to have both:

Expressiveness and safety brought by std::optional<T>.
Ability to compose operations in std::optional<T> as easy as we can do for T.

Fortunately, std::optional<T> is getting a nice monadic interface soon, and monads are, essentially, all about composition.

Looking into the future: C++20 and monadic composition for `std::optional<T>`

Among the new features described on the proposal for adding monadic composition to std::optional<T>, two of them are of particular interest for our example:

map: Given an std::optional<A> and a function f: A -> B, map uses f to map over std::optional<A>, yielding another std::optional std::optional<B>.
and_then: Given an std::optional<A> and a function f: A ->std::optional<B>, and_then uses f to map over std::optional<A>, yielding another std::optional<B>.

These are usually called combinators, since they are used to combine basic behaviours into more complex ones. Being a little bit more strict to terminology, these are used to compose effectfull functions, or effectfull programs, where std::optional<T> is an effect for a failed computation.

With map and and_then at our disposal, we could rewrite our example as:

auto const customQuery = buildQuery();
auto const zipCode = findPerson(customQuery)
.and_then(findAddress)
.map(getZipCode);
if (!zipCode) return;
use(zipCode.value());

In my opinion, that’s clearer than before, not duplicated checks against empty, just one and it happens at the end. By the way, who else is looking forward for C++20? :).

Composing other nullable types

By returning std::optional<T> we were able to represent functions that may fail to produce an answer, but they can’t give us more information about the reason for this, for instance, a person either wasn’t found or a connection problem happened when requesting the external database where we executed the query. To provide more information about the failure, or to distinguish between several failures, we would need to select another type that can hold such extra information.

One type can be std::variant<T, E> where T is the type of the returned value in case of a successful execution, whereas E is the type of the error that happened that we want to communicate to the caller code.

Similar to std::optional<E>, an std::variant<T, E> is also a sum type that can be either T or E, but not both at the same time. In terms of the Algebra of Types, it satisfies the constraint:

#std::variant<T, E> = #T + #E

Interesting side-note: std::variant<T, E> can be interpreted as a generalization of std::optional<T>, but that’s a topic for another time, let’s -try- to keep the focus here :).

Equipped with std::variant<T, E>, the above example could be changed to:

struct error {}; // represents a possible error that happened
struct zip_code {};
struct address {};
struct person {};
std::variant<person, error> findPerson(Query const&)
std::variant<address, error> findAddress(person const&);
zip_code getZipCode(address const&);

As far as I know, we won’t have monadic composition for std::variant<A, E> in C++20, but maybe in the via std::expected<A, E> or other excellent libraries already available, such as tl::expected<A, E>. If you happen to know about other similar proposals, I’d love to hear more about.

So, we would have to go back to add error handling at the middle of composition. Maybe ending up with something like:

auto const customQuery = buildQuery();
auto const person = findPerson(customQuery);
if (!std::holds_alternative<person>(person)) return;
auto const address = findAddress(std::get<person>(person));
if (!std::holds_alternative<address>(address)) return;
auto const zipCode = getZipCode(std::get<address>(address));
use(zipCode);

That’s very similar to our approach before using C++20’s monadic composition. We have error handling mixed up with business logic.

Conclusion

In this article we briefly described the idea behind composition and its importance for software development. We saw how to bring expressiveness and type-safety via std::optional<T>, and a taste of its monadic composition that comes with C++20.

And we completed the article with two open questions: