Jonathan Boccara's blog

Strong Types for Safe Indexing in Collections – Part 2

Published November 4, 2021 - 0 Comments

In the previous article on strong types, we set out to find how to use strong types for safe indexing in collections.

More precisely, if we have two vectors with two indices to access them, how can we use strong types to make sure we use the right index for the right vector, and that we don’t swap them by mistake?

In other terms, if we have two collections:

std::vector<int> foos = {1, 2, 3};
std::vector<int> bars = {10, 20};

And we create two separate indices by using strong types:

using FooIndex = NamedType<size_t, struct FooTag, PreIncrementable, Comparable>;
using BarIndex = NamedType<size_t, struct BarTag, PreIncrementable, Comparable>;

Those are two different types wrapping a size_t and that can be incremented and compared.

How can we make this first piece of code compile:

for (FooIndex fooIndex = FooIndex{0}; fooIndex < FooIndex{foos.size()}; ++fooIndex)
{
    for (BarIndex barIndex = BarIndex{0}; barIndex < BarIndex{bars.size()}; ++barIndex)
    {
        std::cout << foos[fooIndex] << '-' << bars[barIndex] << '\n'; // ok, correct indices
    }
}

And how to make this one trigger a compilation error?

for (FooIndex fooIndex = FooIndex{0}; fooIndex < FooIndex{foos.size()}; ++fooIndex)
{
    for (BarIndex barIndex = BarIndex{0}; barIndex < BarIndex{bars.size()}; ++barIndex)
    {
        std::cout << foos[barIndex] << '-' << bars[fooIndex] << '\n'; // oops, wrong indices!
    }
}

In the previous article we saw how to reuse the code of std::vector to implement a new data structure with a custom operator[]. We’ll see now another approach: how to use a proxy of a standard std::vector with a custom operator[].

Using a proxy: the simple case

Using a proxy consists in storing a reference to the vector, and providing an operator[] with a custom interface that calls the normal operator[] of std::vector:

template<typename T, typename Index>
class StrongIndexAccess
{
public:
    explicit StrongIndexAccess(std::vector<T> const& vector) : vector_(vector){}

    typename std::vector<T>::const_reference operator[](Index pos) const
    {
        return vector_[pos.get()];
    }

private:
    std::vector<T> vector_;
};

We can then create two different StrongIndexAccess by using the two strongly typed indices:

auto indexedFoos = StrongIndexAccess<int, FooIndex>(foos);
auto indexedBars = StrongIndexAccess<int, BarIndex>(bars);

Then the following piece of code compiles:

for (FooIndex fooIndex = FooIndex{0}; fooIndex < FooIndex{foos.size()}; ++fooIndex)
{
    for (BarIndex barIndex = BarIndex{0}; barIndex < BarIndex{bars.size()}; ++barIndex)
    {
        std::cout << indexedFoos[fooIndex] << '-' << indexedBars[barIndex] << '\n';
    }
}

And this one doesn’t:

for (FooIndex fooIndex = FooIndex{0}; fooIndex < FooIndex{foos.size()}; ++fooIndex)
{
    for (BarIndex barIndex = BarIndex{0}; barIndex < BarIndex{bars.size()}; ++barIndex)
    {
        std::cout << indexedFoos[barIndex] << '-' << indexedBars[fooIndex] << '\n';
    }
}

This is exactly what we wanted. Are we done then?

The above code works well for const references, which don’t allow modifying the values inside of the vector. To allow it we need to support non const references.

Also, our above code doesn’t support taking a reference on an incoming temporary vector:

auto indexedFoos = StrongIndexAccess<int, FooIndex>(std::vector<int>{1, 2, 3});
auto indexedBars = StrongIndexAccess<int, BarIndex>(std::vector<int>{10, 20});

The code as we wrote it will compile, but as soon as we try to access the values through StrongIndexAccess, we get to undefined behaviour, typically with the application crashing, because we’re accessing a destroyed object.

We need to make our StrongIndexAccess support those two additional cases, and this is where the fun begins.

Handling non const, lvalue and rvalue references

Before writing code, let’s decide on how to handle the tree cases of incoming values:

  • const lvalue reference: std::vector<T> const& vector
  • non const lvalue reference: std::vector<T>& vector
  • non const rvalue reference: std::vector<T>&& vector

We don’t include const rvalue references because they are virtually never used.

In the first two cases, with a lvalue reference, we can use the same idea as in the initial code. The source value being an lvalue, we know it’s going to stick around for some time before being destroyed, so we can just keep a reference to it. The reference has to be const or non const depending on the incoming value.

In the case of the rvalue though, we can’t just keep a reference: the incoming value is about to be destroyed, or is being moved from, which means in either case that we don’t want to access afterwards.

Another way then is to keep the whole value inside of our StrongIndexAccess, only for rvalues. Indeed an rvalue, especially of type std::vector, is made to be moved inside of our class.

In summary, here is what we want to do based on the type of the incoming value:

  • const lvalue reference: keep a const lvalue reference
  • non const lvalue reference: keep a non const lvalue reference
  • non const rvalue reference: keep the whole value

The implementation

This implies that the type of our data member depends on the incoming type to the constructor of StrongIndexAccess. C++ doesn’t allow to do that, but we can get away with something equivalent by using std::variant.

So we want a std::variant<std::vector&, std::vector const&, std::vector> as a member, or something like that, and be able to get a const or non const reference on this when we need it in operator[].

This is not something straightforward to implement (although not very difficult) especially since std::variant does not accept reference types.

Luckily, we’ve already done all the work when we saw How to Store an lvalue or an rvalue in the Same Object.

Let’s reuse our code from back then, with the Storage type and its accessors getReference and getConstReference. We can just initialise the data member of type Storage depending on the incoming value in the constructor:

template<typename T, typename Index>
class StrongIndexAccess
{
public:
    explicit StrongIndexAccess(std::vector<T>& vector) : vector_(NonConstReference(vector)){}
    explicit StrongIndexAccess(std::vector<T> const& vector) : vector_(ConstReference(vector)){}
    explicit StrongIndexAccess(std::vector<T>&& vector) : vector_(Value(std::move(vector))){}

    typename std::vector<T>::reference operator[](Index pos)
    {
        auto& vector = getReference(vector_);
        return vector[pos.get()];
    }

    typename std::vector<T>::const_reference operator[](Index pos) const
    {
        auto const& vector = getConstReference(vector_);
        return vector[pos.get()];
    }

private:
    Storage<std::vector<T>> vector_;
};

If you’re curious about how Storage works exactly, have a look at this preview article.

Where to put the custom code

In the previous article we saw how to introduce another data structure than std::vector to achieve our purpose of customising operator[]. And in this article we’ve just seen how to introduce a proxy to support the custom operator[] without changing the data structure.

The drawback of the proxy is that you have two objects in the client code: the data structure and the proxy. Whereas by customising the data structure there is only the data structure to manipulate. But the advantage of the proxy is that it is a more modular solution.

All in all, I prefer the solution of the proxy. Which one do you prefer? Would you have solved the problem of strong indexing differently? Let me know in a comment!

You will also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin