Jonathan Boccara's blog

The SoA Vector – Part 2: Implementation in C++

Published December 21, 2018 - 0 Comments

Today’s guest post is the second part of a two-posts series written by Sidney Congard. Sidney is an almost graduated student and an intern at QuasarDB, a company writing it’s own database in C++17. He has been doing C++ in his free time regularly for two years.

Also interested in writing on Fluent C++? Check out the guest posting area!

Like we saw in the first part of this series on SoA, the SoA is a way to organise the data of a collection of objects to optimise the performance of certain use cases: traversing the collection by accessing the same data member of all the objects:

struct person {
   std::string name;
   int age;
};

std::vector<person> persons = ...

for (auto& person : persons)
{
   ++person.age;
}

The SoA in its barest expression is this:

struct persons {
    std::vector<std::string> names;
    std::vector<int> ages;
};

By putting all the ages next to each other in memory, we optimise the performance of the traversal. But such a structure is not a container in itself, and is in particular not compatible with the STL.

Let’s design an SoA collection with an interface as close as possible to std::vector<persons>, but with the SoA structure of components stored in separate arrays.

Proxy types

Here are the most basic expressions that we want to support:

auto persons = soa::vector<person>{};
persons.push_back({ “Julie”, 46 });
persons[0].age += 1;

The operator[] allows to modify the components through their names. So we need to implement a proxy class which holds references to each component with the same names, which will be created by our soa::vector iterator.

It means that we can’t avoid to use of a macro to create these proxy types, unless we let the user write the proxy type explicitly. This macro then allow us to generate another proxy (for const references).

This macro can be tedious to write: The Boost.Preprocessor library can help by providing high-level macro functions to generate our code. Metaclasses will surely allow us to avoid this once they are available!

On the implementation side, we will have a tuple of vectors. We can then later improve this by having a single allocation and a tuple of indexes and accepting a custom allocator as a template parameter, but it won’t affect much its use.

namespace soa {

template <class T>
class vector {
    // We will see how we get these '???' types later.
    std::tuple<std::vector<???>, ...> vectors_;
};

}

Creating the proxy types with macros

Let’s start by creating the proxy types, that will be what we get when we dereference an iterator coming from our SoA vector:

#define SOA_PP_REF(type, member) \
decltype(std::declval<type>().member) & member;

#define SOA_PP_CREF(type, member) \
decltype(std::declval<type>().member) const& member;

#define SOA_DEFINE_TYPE(type, ...) \
namespace soa { \

    template <> \
    struct ref_proxy<::type> { \
        SOA_PP_MAP(SOA_PP_REF, ::type, __VA_ARGS__) \
    }; \
    template <> \
    struct cref_proxy<::type> { \
        SOA_PP_MAP(SOA_PP_CREF, ::type, __VA_ARGS__) \
    }; \
}

The above code relies on the macro SOA_PP_MAP(macro, type, args...) which will expands to macro(type, arg) for each arg in args. We will skip the implementation of the implementation of the SOA_PP_MAP here. If you’re interested to see its code you can check it out here.

To instantiate the proxy types corresponding to the following type:

struct person {
    std::string name;
    int age;
};

We would invoke the macro this way:

SOA_DEFINE_TYPE(person, name, age);

The code generated by the macro would look like this:

namespace soa {

template <>
struct ref_proxy<person> {
    std::string& name;
    int& age;
};
template <>
struct cref_proxy<person> {
    std::string const& name;
    int const& age;
};
}

The iterator class

Now we can make iterators that creates our proxy when they are dereferenced. I didn’t find if there is a way to make them work with the arrow operator as well, so please tell me if you know how !

namespace soa {

template <class Vector>
class iterator {
    Vector* vec_;
    int index_;
    // This is used to write functions using every vector at the same time.
    // We assume the iterator is a friend of our vector.
    using sequence_type = std::index_sequence<std::tuple_size_v<decltype(vec_->vectors_)>>;

    // We create a proxy object from the components at 'index_' of each 'vec_' vector.
    template <size_t...Is>
    ref_proxy<typename Vector::value_type> make_proxy(std::index_sequence<Is...>) const noexcept {
        return { std::get<Is>(vec_->vectors_)[index_] ... };
    }
public:
    iterator& operator++() noexcept { return ++index_, *this; }
    // I don't put the rest of our random-access iterator implementation (based on index_).
    // The full code is available on GitHub as explained at the end of the article.

    // The dereferencing operator simply returns a new proxy object.
    auto operator*() const noexcept {
        return make_proxy(sequence_type{});
    }
};
}

Once we have this iterator, the soa::vector::operator[] is now easy to write:

template <class T>
auto soa::vector<T>::operator[](int i) {
    return *soa::iterator<vector<T>>{ this, i };
}

Implementing push_back

The push_back method needs to deconstruct the given object into its components:

template <class T>
void soa::vector<T>::push_back(T const& val) {
    auto elements = detail::as_tuple(val);
    detail::for_each(vectors_, elements, [] (auto& vec, auto const& elt) {
        vec.push_back(elt);
    });
}

To implement the helper functions used in this code, we can use C++17 structured bindings with aggregates to have a tuple of references on its members. Then, we can iterate over the tuple elements and put them into our tuple of vectors (that can be deduced from aggregate tuple).

namespace detail {

// Arity gives us the number of components of an aggregate by counting the number of references in it’s proxy.
template <class T>
constexpr int aggregate_arity = sizeof(soa::ref_proxy<T>) / sizeof(void*);

// as_tuple returns a tuple of references on the given aggregate’s components.
// Currently, we cannot make this function variadic, so we must recopy come code, manually or with a macro.
// We skip this here for simplicity but you can find the details in the Github library
// As an example, this is the function implementation when our aggregate has three components :
template <class T>
auto as_tuple_impl(T& agg, std::integral_constant<int, 3>) {
    auto& [v1, v2, v3] = agg;
    return std::forward_as_tuple(v1, v2, v3);
}

// This function dispatches the call on the 'as_tuple_impl' function which takes the right number of components.
template <class T>
auto as_tuple(T&& agg) {
    using arity = std::integral_constant<int, aggregate_arity<T>>;
    return as_tuple_impl(agg, arity{});
}

// for_each applies a function on two tuple elements.
template <class T1, class T2, class BinaryOp, size_t...Is>
void for_each_impl(T1& t1, T2& t2, BinaryOp& f, std::index_sequence<Is…>) {
    (f(std::get<Is>(t1, t2)), ...);
}
template <class T1, class T2, class BinaryOp>
void for_each(T1&& t1, T2&& t2, BinaryOp&& f) {
    static_assert(std::tuple_size_v<T1> == std::tuple_size_v<T2>);
    using seq = std::make_index_sequence<std::tuple_size_v<T1>>;
    for_each_impl(t1, t2, f, seq{});
}

}

We now have our core functionalities: a way to add elements and to access them (through iterators). Then, we can copy most of std::vector‘s interface (begin() and end(), back() and front(), emplace_back(components...), vector moves and copies, …) by using the same tools.

In particular, this makes our SoA vector compatible with STL algorithms:

std::string get_name_by_age(soa::vector<person> const& persons, int required_age) {
    auto const it = std::find_if(persons.begin(), persons.end(), [=] (auto&& p) {
        return p.age == required_age;
    });
    return (*it).name;
}

Performance of traversing the collection

Since we’re having a proxy in the iterator, iterating in the collection could be harder to optimize for the compiler.

Let’s consider a simple traversal in the standard case of an AoS simple collection such as an array:

void copy_ages(int const* src, int* __restrict dst) {
    for (int i = 0; i < persons.size(); ++i) {
        dst[i] = src[i];
    }
}

With the right optimization flags (-O3 for Clang and gcc and /Ox for MSVC), the compiler generates a memcpy to haul the entire collection.

Let’s now consider the same traversal with our SoA collection, that uses a proxy with its iterator:

void copy_ages_with_proxy(soa::vector<user::person> const& persons, int* __restrict dst) {
    for (int i = 0; i < persons.size(); ++i) {
        dst[i] = persons[i].age;
    }
}

With -O3, GCC and Clang compiles this function with memcpy, but not MSVC with /Ox. MSVC generates a less efficient loop that copies the elements one by one.

For more complex use cases, there is a good chance that we miss this kind of optimizations on every compiler.

The whole point of SoA was an optimal performance, so can we do something to have a optimized traversal, regardless of the compiler?

One way to do this is to give to the user a way to access directly one of our array of components.

namespace soa {

template <class T>
struct vector_span {
    T* begin_;
    T* end_;
    T* begin() const { return begin_ };
    T* end()   const { return end_ };
};

template <class T>
template <size_t I>
auto vector<T>::get_span() {
    auto& vec = std::get<I>(vectors_);
    return vector_span{ vec.data(), vec.data() + vec.size() };
}

}

The above code uses an numeric index (size_t I) to identify the member in the data object.

But now that we know the component’s names, we can allow the user to access to these arrays through these names! To achieve this, we can inherit those spans from our soa::vector. To do so, we will have a third class created with our macro:

SOA_DEFINE_TYPE(person, name, age);

This macro generates this code:

namespace soa {

template <>
struct members<person> {
    vector_span<decltype(std::declval<person>().name)> name;
    vector_span<decltype(std::declval<person>().age)> age;
};

}

We then make our soa::vector inherits from this structure:

namespace soa {

template <class T>
class vector : public members<T> {
    // ...
};
}

Now we can access our components without the proxy:

int sum_ages(soa::vector<person>& persons) {
    return std::reduce(persons.age.begin(), persons.age.end());
}

These spans can be painful to maintain when the vector is modified, but our functionality is here. In my implementation, I improved this by storing one pointer per span and removing the tuple of vectors. As a result, I have only one allocation and no information is copied (the size is stored once and can be retrieved by the custom spans).

Polishing off the interface

Finally, we can improve our proxies by adding them operators:

  • ref_proxy<T>::operator T() to construct a T by copying the proxy elements. It requires T to be copy constructible.
  • ref_proxy<T>::operator=(T const&) to assign by copy T elements to the proxy’s elements. It also requires T to be copy constructible.
  • ref_proxy<T>::operator=(T&&) to assign by move T elements to the proxy’s elements.

Here are the new expressions this allows us to write:

person change_last_person(soa::vector<person>& persons) {
    // Move assignment operator
    persons.back() = { "Abbie", 26 };

    // Cast operator
    return persons.back();
}

Unfortunately, I don’t know a way to construct a T by moving the proxy elements. We can continue to extend our interface but I think we covered most things here. My final implementation can be found on the GitHub repository . I’ll be glad to know any alternative design or insights about it !

There is also eastl::tuple_vector<Ts…> that I discovered after creating soa::vector: it has the same goal as soa::vector, although it targets tuples.

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin