Jonathan Boccara's blog

How std::any Works

Published February 5, 2021

In the previous post we’ve seen a very nice technique to use value semantics with inheritance and virtual methods, which was made possible by std::any.

Given its usefulness, it is worth understanding std::any better. std::any is sometimes said to be “the modern void*”, but it does much more than a void*.

A void* loses information about the type of the objects it points to:

int i = 42;
void* pvi = &i;

double d = *static_cast<double*>(pvi); // wrong type: undefined behaviour, yet execution marches on

But std::any somehow remembers information about the type:

int i = 42;
std::any ai = i;

double d = std::any_cast<double&>(ai); // throws an exception of type std::bad_any_cast

It doesn’t give access to the static type of the object it was given, but it is still able to recognize when we’re trying to cast it to the wrong type.
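To see this behaviour in a complete program, here is a minimal sketch. The helper names wrongCastThrows and holdsInt are made up for illustration; the pointer overload of std::any_cast and the type() member function are part of the standard library.

```cpp
#include <any>
#include <typeinfo>

// Hypothetical helper: true if casting an any that holds an int to double& throws.
bool wrongCastThrows()
{
    std::any ai = 42;
    try
    {
        double& d = std::any_cast<double&>(ai);
        (void)d; // never reached: the cast above throws
    }
    catch (std::bad_any_cast const&)
    {
        return true;
    }
    return false;
}

// Hypothetical helper: true if the any currently holds an int.
bool holdsInt(std::any const& a)
{
    // The pointer overload returns nullptr on a mismatch instead of throwing.
    return std::any_cast<int>(&a) != nullptr && a.type() == typeid(int);
}
```

The pointer overload is handy when a type mismatch is an expected outcome rather than an error.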

How does std::any achieve that?

A naive implementation of std::any

std::any is a modern void* that has to remember information about type. A first implementation can be to represent it as a void* with a std::type_info:

struct any
{
    void* data_;
    std::type_info const& type_;

    // ... see below for implementation ... // 
};

We can make the constructor of our any fill in those two pieces of information pretty easily:

struct any
{
    void* data_;
    std::type_info const& type_;

    template<typename T>
    explicit any(T&& value)
        : data_{new T{std::forward<T>(value)}}
        , type_{typeid(T)}
    {
    }
};

To implement any_cast we can then just compare the typeids of the type in the any_cast and the one in the any:

template<typename T>
T& any_cast(any& aAny)
{
    if (typeid(T) == aAny.type_)
    {
        return *static_cast<T*>(aAny.data_);
    }
    else
    {
        throw std::bad_any_cast{};
    }
}
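Put together, the two pieces above can be exercised as follows. This is only a sketch: the namespace naive and the function naiveAnyBehaves are made up for illustration, and since this first version of any has no destructor, the demo releases the stored int by hand.

```cpp
#include <any>      // only for std::bad_any_cast
#include <typeinfo>
#include <utility>

namespace naive // hypothetical namespace, to keep this apart from std::any
{
struct any
{
    void* data_;
    std::type_info const& type_;

    template<typename T>
    explicit any(T&& value)
        : data_{new T{std::forward<T>(value)}}
        , type_{typeid(T)}
    {
    }
};

template<typename T>
T& any_cast(any& aAny)
{
    if (typeid(T) == aAny.type_)
    {
        return *static_cast<T*>(aAny.data_);
    }
    throw std::bad_any_cast{};
}

// Hypothetical demo: the right cast works, the wrong one throws.
bool naiveAnyBehaves()
{
    any a(42); // an rvalue, so T is deduced as int

    bool const rightCastWorks = any_cast<int>(a) == 42;

    bool wrongCastThrows = false;
    try
    {
        any_cast<double>(a);
    }
    catch (std::bad_any_cast const&)
    {
        wrongCastThrows = true;
    }

    // The naive any never frees its object, so we do it by hand here.
    delete &any_cast<int>(a);

    return rightCastWorks && wrongCastThrows;
}
} // namespace naive
```

Note that the constructor only accepts rvalues as written: for an lvalue int, T would be deduced as int&, and new int& does not compile.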

But this approach breaks down for other features of std::any. For example, to copy an std::any:

any a(42);
any b = a;

To copy, we need to call the copy constructor of the object stored in the any. And a type_info, which is runtime type information, is not enough to do that. We need code that knows the static type in order to call the copy constructor.

Keeping the static type

Keeping a type doesn’t seem possible: we can’t store a type as a data member. However, lambdas make this possible.

The key here is to store function pointers as data members, and to invoke those function pointers to get runtime types or to copy objects:

struct any
{
    void* data_;
    std::type_info const& (*getType_)();
    void* (*clone_)(void* other);

    // ... see below for implementation ... //
};

The getType_ function pointer can be called to retrieve the std::type_info of the object passed to initialize the any, and the clone_ function pointer can be used to call the copy constructor.

We can implement those two function pointers with lambdas:

struct any
{
    void* data_;
    std::type_info const& (*getType_)();
    void* (*clone_)(void* otherData);

    template<typename T>
    explicit any(T&& value)
        : data_{new T{std::forward<T>(value)}}
        , getType_{[]() -> std::type_info const& { return typeid(T); }}
        , clone_([](void* otherData) -> void* { return new T(*static_cast<T*>(otherData)); })
    {
    }
};

Here we’re leveraging a very powerful aspect of lambdas: they can embed local type information and, as long as they capture nothing, be converted to function pointers. This is a sort of type erasure, but one that keeps track of the static type internally.
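The mechanism can be isolated from any in a few lines. In this sketch, the aliases TypeGetter and Cloner and the functions makeTypeGetter, makeCloner and cloneThroughErasedPointer are hypothetical names, introduced only to show that the returned function pointers mention no type in their signature and yet still act on T.

```cpp
#include <string>
#include <typeinfo>

using TypeGetter = std::type_info const& (*)();
using Cloner = void* (*)(void*);

// The lambda captures nothing, so it converts implicitly to a function pointer.
// T does not appear in the pointer's type, but it is baked into the code it points to.
template<typename T>
TypeGetter makeTypeGetter()
{
    return []() -> std::type_info const& { return typeid(T); };
}

template<typename T>
Cloner makeCloner()
{
    return [](void* object) -> void* { return new T(*static_cast<T*>(object)); };
}

// Hypothetical demo: copies a std::string through a pointer that only sees void*.
std::string cloneThroughErasedPointer(std::string original)
{
    Cloner clone = makeCloner<std::string>();
    void* copy = clone(&original);

    std::string result = *static_cast<std::string*>(copy);
    delete static_cast<std::string*>(copy);
    return result;
}
```

A lambda that captured something could not be converted this way, which is why the object to clone is passed in as a parameter.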

We can now implement the copy constructor:

struct any
{
    void* data_;
    std::type_info const& (*getType_)();
    void* (*clone_)(void* otherData);

    template<typename T>
    explicit any(T&& value)
        : data_{new T{std::forward<T>(value)}}
        , getType_{[]() -> std::type_info const& { return typeid(T); }}
        , clone_([](void* otherData) -> void* { return new T(*static_cast<T*>(otherData)); })
    {
    }

    any(any const& other)
    : data_(other.clone_(other.data_))
    , getType_(other.getType_)
    , clone_(other.clone_)
    {
    }
};

The copy constructor of any invokes clone_, which uses the static type baked into its implementation to invoke the copy constructor of the underlying object. We also copy the function pointers so that the new object, which has the same underlying type, can use them.

Note that we could have kept the type_info as a data member instead of using a function pointer to return it. Using a function pointer has the advantage of consistency inside the class, but it’s not a very strong advantage.

Deallocating memory

Our implementation of any performs a dynamic allocation to store its underlying object. This memory has to be released at some point.

But since it is undefined behaviour to delete a void*, we have to call delete on a typed pointer. We can again use a function pointer created from a lambda to achieve that:

struct any
{
    void* data_;
    std::type_info const& (*getType_)();
    void* (*clone_)(void* otherData);
    void (*destroy_)(void* data);

    template<typename T>
    explicit any(T&& value)
        : data_{new T{std::forward<T>(value)}}
        , getType_{[]() -> std::type_info const& { return typeid(T); }}
        , clone_([](void* otherData) -> void* { return new T(*static_cast<T*>(otherData)); })
        , destroy_([](void* data) { delete static_cast<T*>(data); })
    {
    }

    any(any const& other)
    : data_(other.clone_(other.data_))
    , getType_(other.getType_)
    , clone_(other.clone_)
    , destroy_(other.destroy_)
    {
    }

    ~any()
    {
        destroy_(data_);
    }
};
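For reference, here is one way the pieces could fit together with an any_cast that goes through getType_. It is a sketch that makes a few assumptions beyond the code above: std::decay_t is used so that lvalues can be stored, an enable_if keeps the template constructor from competing with the copy constructor, and a copy assignment operator is added, since the compiler-generated one would copy the raw pointer and lead to a double delete. The namespace complete and the function copiesAreIndependent are made-up names.

```cpp
#include <any>         // only for std::bad_any_cast
#include <string>
#include <type_traits>
#include <typeinfo>
#include <utility>

namespace complete // hypothetical namespace, to keep this apart from std::any
{
struct any
{
    void* data_;
    std::type_info const& (*getType_)();
    void* (*clone_)(void* otherData);
    void (*destroy_)(void* data);

    // U is the stored type: T without references and cv-qualifiers.
    // The enable_if rules out U == any, so copies go to the copy constructor.
    template<typename T, typename U = std::decay_t<T>,
             typename = std::enable_if_t<!std::is_same_v<U, any>>>
    explicit any(T&& value)
        : data_(new U(std::forward<T>(value)))
        , getType_([]() -> std::type_info const& { return typeid(U); })
        , clone_([](void* otherData) -> void* { return new U(*static_cast<U*>(otherData)); })
        , destroy_([](void* data) { delete static_cast<U*>(data); })
    {
    }

    any(any const& other)
        : data_(other.clone_(other.data_))
        , getType_(other.getType_)
        , clone_(other.clone_)
        , destroy_(other.destroy_)
    {
    }

    // Copy-and-swap: 'other' is already a copy. We take over its members,
    // and its destructor then releases the object we used to hold.
    any& operator=(any other) noexcept
    {
        std::swap(data_, other.data_);
        std::swap(getType_, other.getType_);
        std::swap(clone_, other.clone_);
        std::swap(destroy_, other.destroy_);
        return *this;
    }

    ~any()
    {
        destroy_(data_);
    }
};

template<typename T>
T& any_cast(any& aAny)
{
    if (typeid(T) == aAny.getType_())
    {
        return *static_cast<T*>(aAny.data_);
    }
    throw std::bad_any_cast{};
}

// Hypothetical demo: modifying a copy leaves the original untouched.
bool copiesAreIndependent()
{
    any a(std::string{"hello"});
    any b = a;
    any_cast<std::string>(b) += " world";

    return any_cast<std::string>(a) == "hello"
        && any_cast<std::string>(b) == "hello world";
}
} // namespace complete
```

Move operations are left out to keep the sketch short; they would need an empty state, which this any doesn’t have.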

The real implementation of std::any

Is our implementation of any production-ready? Hardly. For the record, the implementation of std::any in libstdc++ is about 600 lines of code.

Our implementation is useful to understand the concepts underlying the implementation of any, but there is more to it. In particular, we could group all function pointers into one larger function, to reduce the size of the any. Also, we’ve ignored the small object optimisation.
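To illustrate the grouping idea, here is a sketch in which a single function pointer dispatches on an operation code. It is not a copy of what libstdc++ does: the namespace grouped and the names Op, Result, manage and copyKeepsTypeAndValue are made up, and a static member function template is used in place of lambdas. As in a more robust constructor, std::decay_t and an enable_if are assumed so that lvalues and copies behave.

```cpp
#include <type_traits>
#include <typeinfo>
#include <utility>

namespace grouped // hypothetical namespace, to keep this apart from std::any
{
enum class Op { GetType, Clone, Destroy };

// Out-parameter for the operations that produce a result.
union Result
{
    void* object;
    std::type_info const* type;
};

struct any
{
    void* data_;
    void (*manage_)(Op op, void* data, Result* result); // one pointer instead of three

    // All the code that needs the static type U lives in this one function.
    template<typename U>
    static void manage(Op op, void* data, Result* result)
    {
        switch (op)
        {
        case Op::GetType: result->type = &typeid(U); break;
        case Op::Clone:   result->object = new U(*static_cast<U*>(data)); break;
        case Op::Destroy: delete static_cast<U*>(data); break;
        }
    }

    template<typename T, typename U = std::decay_t<T>,
             typename = std::enable_if_t<!std::is_same_v<U, any>>>
    explicit any(T&& value)
        : data_(new U(std::forward<T>(value)))
        , manage_(&manage<U>)
    {
    }

    any(any const& other)
        : data_(nullptr)
        , manage_(other.manage_)
    {
        Result result{};
        manage_(Op::Clone, other.data_, &result);
        data_ = result.object;
    }

    any& operator=(any const&) = delete; // left out to keep the sketch short

    ~any()
    {
        manage_(Op::Destroy, data_, nullptr);
    }

    std::type_info const& type() const
    {
        Result result{};
        manage_(Op::GetType, nullptr, &result);
        return *result.type;
    }
};

// Hypothetical demo: a copy has the same type and value as the original.
bool copyKeepsTypeAndValue()
{
    any a(42);
    any b = a;
    return b.type() == typeid(int) && *static_cast<int*>(b.data_) == 42;
}
} // namespace grouped
```

The trade-off is one extra branch per operation in exchange for an any that holds two pointers rather than four.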

Indeed, our any always allocates on the heap. The standard doesn’t impose an allocation method, but encourages implementers to perform a small object optimisation, that is to say to store small objects within the any itself rather than perform a heap allocation.

But the standard sets no size threshold below which this is guaranteed to happen, and doesn’t guarantee that it happens at all. The code of libstdc++ implements this optimisation though, and is interesting to read if you want to go further in your understanding of std::any, which is a good endeavour.
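To give an idea of what such an optimisation involves, here is a sketch of the storage decision alone. Everything in it is an assumption made for illustration: the namespace sbo, the names Storage, fitsInPlace, store and destroy, a buffer the size of one pointer, and the criteria used (size, alignment, and a move constructor that doesn’t throw). A real implementation may choose differently.

```cpp
#include <array>
#include <new>
#include <type_traits>

namespace sbo // hypothetical namespace for this sketch
{
// Either a pointer to a heap-allocated object, or a small buffer holding the object itself.
union Storage
{
    void* heap;
    alignas(void*) unsigned char buffer[sizeof(void*)];
};

// Assumed criteria: the object must fit, be suitably aligned,
// and be movable without throwing.
template<typename T>
constexpr bool fitsInPlace =
    sizeof(T) <= sizeof(Storage) &&
    alignof(T) <= alignof(Storage) &&
    std::is_nothrow_move_constructible_v<T>;

template<typename T>
T* store(Storage& storage, T const& value)
{
    if constexpr (fitsInPlace<T>)
    {
        // Placement new: constructs the object inside the buffer, no allocation.
        return ::new (static_cast<void*>(storage.buffer)) T(value);
    }
    else
    {
        T* object = new T(value);
        storage.heap = object;
        return object;
    }
}

template<typename T>
void destroy(T* object)
{
    if constexpr (fitsInPlace<T>)
    {
        object->~T(); // the memory belongs to the Storage, only run the destructor
    }
    else
    {
        delete object;
    }
}
} // namespace sbo
```

With these criteria an int is stored in place, while a std::array<double, 8> goes to the heap. In a full any, the choice between the two branches would be baked into the type-erased functions, in the same way as the copy and the destruction.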
