Jonathan Boccara's blog

How to Construct C++ Objects Without Making Copies

Published July 17, 2018 - 19 Comments

Today’s guest post is written by Miguel Raggi. Miguel is a Computer Science and Math professor at UNAM, Mexico’s largest university. He loves clean, expressive, performant C++ code (and strives to convince students to write it this way!). Miguel is the author of discreture, an open source C++ library to efficiently generate combinatorial objects, such as combinations, partitions, set partitions, and many more.

C++ references are a powerful but tricky tool: used correctly, they can improve performance with little impact on the clarity of code. But used badly, they can hide performance issues, or even send a peaceful program into the realm of undefined behaviour.

In this post, we will explore how to use the various references of C++ to minimize copies when constructing an object that holds a value, and how in some cases we can even reach zero copies.

This article assumes that you’re familiar with move semantics, lvalue, rvalue and forwarding references. If you’d like to be refreshed on the subject, you can take a look at lvalues, rvalues and their references.

Copying from an lvalue, moving from an rvalue

Let’s imagine we have a TextBox class that holds a string, maybe to edit and display.

class TextBox
{
public:
   // constructors: see below
private:
   std::string text_;
};

We want to be able to construct a TextBox by passing it a std::string, and make a copy only when necessary. That is, when we pass it an lvalue. But when we pass it an rvalue, we would like to only move from that rvalue and into text_.

One way to go about this is to create two constructors:

class TextBox
{
public:
   explicit TextBox(const std::string& text) : text_(text) {}
   explicit TextBox(std::string&& text) : text_(std::move(text)) {}
private:
   std::string text_;
};

The first one takes a const lvalue reference (no copy), and copies the string into text_ (one copy).

The second one takes an rvalue reference (no copy) and moves it into text_ (no copy).

To make this class simpler, we can merge those two constructors into one:

class TextBox
{
public:
   explicit TextBox(std::string text) : text_(std::move(text)) {}
private:
   std::string text_;
};

What’s going on here? If we pass it an lvalue, the copy constructor of std::string gets called to construct the text parameter (one copy), then text is moved into text_ (no copy).

And if we pass it an rvalue, the move constructor of std::string gets called to construct the text parameter (no copy), and then text is moved into text_ (no copy).
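
To see these counts concretely, here is a minimal sketch that swaps std::string for a hypothetical Tracer type counting its own copies and moves. Tracer, Box and by_value_then_move_example are illustration-only names, not part of the original TextBox code. It suggests that the merged constructor costs one extra move compared to the two-overload version, which is usually cheap for std::string.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for std::string that counts its copies and moves.
struct Tracer
{
   static inline int copies = 0;
   static inline int moves = 0;

   Tracer() = default;
   Tracer(const Tracer&) { ++copies; }
   Tracer(Tracer&&) noexcept { ++moves; }
};

// Same shape as the merged TextBox constructor: by value, then std::move.
class Box
{
public:
   explicit Box(Tracer value) : value_(std::move(value)) {}
private:
   Tracer value_;
};

void by_value_then_move_example()
{
   Tracer source;

   Tracer::copies = Tracer::moves = 0;
   Box from_lvalue(source);             // copy into the parameter, move into value_
   assert(Tracer::copies == 1 && Tracer::moves == 1);

   Tracer::copies = Tracer::moves = 0;
   Box from_rvalue(std::move(source));  // move into the parameter, move into value_
   assert(Tracer::copies == 0 && Tracer::moves == 2);
}
```

Calling by_value_then_move_example() from main runs both checks. The copy only ever happens in the lvalue case.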

Referencing an lvalue, moving from an rvalue

But what if we don’t need to modify or own the object that is passed to us? This is often the case with helper or connecting classes.

Then we really just need a reference or pointer to the object, not a full copy. For example, if we have a class called TextDisplayer whose main purpose is to display some text to the window, we would like to do something like this:

class TextDisplayer
{
public:
   explicit TextDisplayer(const std::string& text) : text_(text) {}
private:
   const std::string& text_;
};

And this sometimes works fine. Except that it has an error just waiting to happen.

Consider the following three construction contexts:

std::string txt = "Hello World";
TextDisplayer displayer1(txt); // fine!
TextDisplayer displayer2(get_text_from_file()); // compiles, but dangling reference!
TextDisplayer displayer3("Hello World"); // compiles, but dangling reference!

Oops. Versions two and three have undefined behavior lying in wait: the references that displayer2 and displayer3 hold are dangling, since the temporary strings they were bound to are destroyed right after the constructors finish.

What we really want is for TextDisplayer to hold a reference if we are given an lvalue (that we assume will keep on existing) or alternatively, hold (and own) the full string if given an rvalue (and acquire it by moving from it).

In either case, there is no reason to make a copy, so we would like to avoid it if possible. We will see how to do just that.

Forwarding references

So how do we make a class that holds a reference if given an lvalue, but moves (and owns) when given rvalues?

This is where forwarding references come in. We wish to create a template parameter T which will be deduced as:

  • An lvalue reference if given an lvalue
  • Not a reference if given an rvalue

Fortunately, some really smart people already thought of this and gave us reference collapsing. Here is how we would like to use it to make our wrapper that never makes a copy.

template <class T>
class TextDisplayer
{
public:
   explicit TextDisplayer(T&& text) : text_(std::forward<T>(text)) {}
private:
   T text_;
};

Note: in real code we would choose a more descriptive name for T, such as String. We could also add a static_assert that std::remove_cvref_t<T> (available since C++20) is std::string.

(As pointed out by FlameFire and John Lynch in the comments section, the T&& parameter of the constructor is not a forwarding reference, because T is a template parameter of the class and not of the constructor, contrary to what the first version of this article was suggesting. However, we shall make use of forwarding references below in the deduction guide and helper function.)

The intent is that when we pass an lvalue to the constructor of TextDisplayer, T is std::string&, so text_ is a reference and no copies are made. And when we pass an rvalue, T is std::string, and the argument is moved into text_ (as std::string is movable), so there are no copies made either.
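
As a minimal sketch of these two cases, the class can be instantiated with T spelled out explicitly. The text() accessor and the function explicit_template_argument_example are hypothetical additions, there only so that the example can inspect the private member.

```cpp
#include <cassert>
#include <string>
#include <utility>

template <class T>
class TextDisplayer
{
public:
   explicit TextDisplayer(T&& text) : text_(std::forward<T>(text)) {}

   // Hypothetical accessor, added only so the example can inspect text_.
   const std::string& text() const { return text_; }
private:
   T text_;
};

void explicit_template_argument_example()
{
   std::string txt = "Hello World";

   // T = std::string&: text_ is a reference to the caller's string.
   TextDisplayer<std::string&> by_reference(txt);
   assert(&by_reference.text() == &txt);

   // T = std::string: text_ is a separate string that was moved into.
   TextDisplayer<std::string> by_value(std::move(txt));
   assert(by_value.text() == "Hello World");
   assert(&by_value.text() != &txt);
}
```

With T = std::string&, reference collapsing turns T&& into std::string&, which is why the same constructor accepts an lvalue in that instantiation.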

Making the call site compile

Unfortunately, the following doesn’t compile:

std::string txt = "Hello World";
TextDisplayer displayer(txt); // compile error!

It gives the following error (with clang)

error: no viable constructor or deduction guide for deduction of template arguments of 'TextDisplayer'
   TextDisplayer displayer(txt);
                 ^

Strangely, using the rvalue version does compile and work (in C++17):

TextDisplayer displayer(get_string_from_file()); // Ok!

The problem when passing an lvalue is that constructor type deduction is done in two steps. The first step is to deduce the type for class template parameters (in our case, T) and instantiate the class. The second step is to pick a constructor, after the class has been instantiated. In the first step, the T&& of our constructor is not treated as a forwarding reference, because T is a template parameter of the class rather than of the constructor. So T can only be deduced as std::string, never as std::string&, and a constructor taking a parameter of type std::string&& cannot accept an lvalue. Perhaps surprisingly, the constructor chosen in the second step doesn’t have to be the one used for template parameter deduction.

We would then need to construct it like this:

TextDisplayer<std::string&> displayer1(txt);

which is not very elegant (but nonetheless works).

Let’s see two ways of solving this: The way before C++17 and the C++17 way.

Before C++17, we can create a helper function similar to make_unique or any of the make_* functions, whose main purpose was to overcome the pre-C++17 limitation that the compiler can’t deduce class template arguments from constructor arguments.

template <class T>
auto text_displayer(T&& text)
{
   return TextDisplayer<T>(std::forward<T>(text));
}
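
Here is a minimal sketch of how this helper could be used, written so that it also compiles as C++14. The text() accessor and helper_function_example are hypothetical names added for the illustration. The static_asserts spell out which T the forwarding reference of text_displayer deduces in each case.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

template <class T>
class TextDisplayer
{
public:
   explicit TextDisplayer(T&& text) : text_(std::forward<T>(text)) {}

   // Hypothetical accessor, added only so the example can inspect text_.
   const std::string& text() const { return text_; }
private:
   T text_;
};

template <class T>
auto text_displayer(T&& text)
{
   return TextDisplayer<T>(std::forward<T>(text));
}

void helper_function_example()
{
   std::string txt = "Hello World";

   auto displayer1 = text_displayer(txt);                  // T = std::string&
   auto displayer2 = text_displayer(std::string("Hello")); // T = std::string

   static_assert(std::is_same<decltype(displayer1), TextDisplayer<std::string&>>::value,
                 "an lvalue argument gives a referencing TextDisplayer");
   static_assert(std::is_same<decltype(displayer2), TextDisplayer<std::string>>::value,
                 "an rvalue argument gives an owning TextDisplayer");

   assert(&displayer1.text() == &txt);   // refers to the caller's string
   assert(displayer2.text() == "Hello"); // owns its own string
}
```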

In C++17 we got automatic deduction for class templates using constructors. But we also got something else that comes along with it: deduction guides.

In short, deduction guides are a way to tell the compiler how to deduce class templates when using a constructor, which is why we are allowed to do this:

std::vector v(first, last); // first and last are iterators

and it will deduce the value type of the std::vector from the value type of the iterators.

So we need to provide a deduction guide for our constructor. In our case, it consists of adding the following line:

template<class T> TextDisplayer(T&&) -> TextDisplayer<T>; // deduction guide

This allows us to write the following code:

std::string txt = "Hello World";
TextDisplayer displayer1(txt);
TextDisplayer displayer2(get_string_from_file());

and both cases compile. More importantly, they never, for any reason, make a copy of the string. They either move or reference the original.
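
Put together, a self-contained C++17 sketch of the class with its deduction guide could look like the following. The names make_text (a stand-in for a function returning a temporary string), text() and deduction_guide_example are assumptions made for the example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

template <class T>
class TextDisplayer
{
public:
   explicit TextDisplayer(T&& text) : text_(std::forward<T>(text)) {}

   // Hypothetical accessor, added only so the example can inspect text_.
   const std::string& text() const { return text_; }
private:
   T text_;
};

template <class T> TextDisplayer(T&&) -> TextDisplayer<T>; // deduction guide

// Hypothetical function returning a temporary string.
std::string make_text() { return "Hello from a temporary"; }

void deduction_guide_example()
{
   std::string txt = "Hello World";

   TextDisplayer displayer1(txt);         // T = std::string&
   TextDisplayer displayer2(make_text()); // T = std::string

   static_assert(std::is_same_v<decltype(displayer1), TextDisplayer<std::string&>>);
   static_assert(std::is_same_v<decltype(displayer2), TextDisplayer<std::string>>);

   assert(&displayer1.text() == &txt);                     // references the original
   assert(displayer2.text() == "Hello from a temporary");  // owns the moved-in string
}
```

In the guide, T is a template parameter of the guide itself, so T&& there is a genuine forwarding reference.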

Making it const

One thing that we lost from the original implementation of TextDisplayer, which simply saved a reference, was the constness of the std::string reference. After all, we don’t want to risk modifying the original std::string that the caller trusted us with! We should store a const reference when given an lvalue, not a plain reference.

It would be nice to simply change the declaration of the member variable text_ to something like:

const T text_; // doesn’t work, see below

The const is effective when we are given rvalues, and decltype(text_) will be const std::string. But when given lvalues, decltype(text_) turns out to be std::string&. No const. Bummer.

The reason is that T is a reference, so const applies to the reference itself, not to what it refers to. Which is to say, the const does nothing, since every reference is already constant, in the sense that, unlike pointers, it can’t “point” to different places. This is the phenomenon described in The Formidable Const Reference That Isn’t Const.
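
A short sketch of this effect, using a hypothetical Holder struct whose only member is declared as const T:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical struct, used only to show what `const T` becomes.
template <class T>
struct Holder
{
   const T value;
};

// For a non-reference T, the const sticks.
static_assert(std::is_same_v<decltype(Holder<std::string>::value), const std::string>);

// For a reference T, the const is silently dropped.
static_assert(std::is_same_v<decltype(Holder<std::string&>::value), std::string&>);

void const_on_reference_example()
{
   std::string txt = "Hello";
   Holder<std::string&> holder{txt};

   holder.value += " World";      // compiles: value is a plain std::string&
   assert(txt == "Hello World");  // the caller's string was modified
}
```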

We can work around this issue with a bit of template magic. In order to add const to the underlying type of a reference, we have to remove the reference, then add const to it, and then take a reference again:

using constTref = const std::remove_reference_t<T>&;

Now we have to ask T whether it is a reference or not, and if so, use constTref. If not, use const T.

using constT = std::conditional_t<std::is_lvalue_reference_v<T>, constTref, const T>;

And finally, we can just declare text_ as follows:

constT text_;

The above works in both cases (lvalues and rvalues), but is ugly and not reusable. As this is a blog about expressive code, we should strive to make the above more readable. One way is to add some extra helpers that can be reused: const_reference, which gives a const reference to a type (be it a reference or not), and add_const_to_value, which acts as std::add_const on normal types and as const_reference on references.

template <class T>
struct const_reference
{
   using type = const std::remove_reference_t<T>&;
};

template <class T>
using const_reference_t = typename const_reference<T>::type;

template <class T>
struct add_const_to_value
{
   using type = std::conditional_t<std::is_lvalue_reference_v<T>,
                                   const_reference_t<T>,
                                   const T>;
};

template <class T>
using add_const_to_value_t = typename add_const_to_value<T>::type;

And so our TextDisplayer class can now be declared like this:

template <class T>
class TextDisplayer
{
   // ...
private:
   add_const_to_value_t<T> text_;
};
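
For reference, here is a sketch assembling all the pieces into a single C++17 translation unit. The stored_type alias, the text() accessor, make_text and const_member_example are hypothetical additions that only serve to check the type of text_ from the outside.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

template <class T>
struct const_reference
{
   using type = const std::remove_reference_t<T>&;
};

template <class T>
using const_reference_t = typename const_reference<T>::type;

template <class T>
struct add_const_to_value
{
   using type = std::conditional_t<std::is_lvalue_reference_v<T>,
                                   const_reference_t<T>,
                                   const T>;
};

template <class T>
using add_const_to_value_t = typename add_const_to_value<T>::type;

template <class T>
class TextDisplayer
{
public:
   // Hypothetical alias exposing the member type for the checks below.
   using stored_type = add_const_to_value_t<T>;

   explicit TextDisplayer(T&& text) : text_(std::forward<T>(text)) {}

   // Hypothetical accessor, added only so the example can inspect text_.
   const std::string& text() const { return text_; }
private:
   add_const_to_value_t<T> text_;
};

template <class T> TextDisplayer(T&&) -> TextDisplayer<T>; // deduction guide

// Hypothetical function returning a temporary string.
std::string make_text() { return "Hello from a temporary"; }

void const_member_example()
{
   std::string txt = "Hello World";

   TextDisplayer displayer1(txt);         // stores a const std::string&
   TextDisplayer displayer2(make_text()); // stores a const std::string

   static_assert(std::is_same_v<decltype(displayer1)::stored_type, const std::string&>);
   static_assert(std::is_same_v<decltype(displayer2)::stored_type, const std::string>);

   assert(&displayer1.text() == &txt);
   assert(displayer2.text() == "Hello from a temporary");
}
```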

Isn’t there a risk of invalidating our references?

It’s difficult (but possible) to invalidate our reference to the string. If we hold the string (when given an rvalue), there is no way for it to be invalidated. And when given an lvalue, if both the lvalue and TextDisplayer live in stack memory, we know the lvalue string will outlive the TextDisplayer, since the TextDisplayer was created after the string, which means the TextDisplayer will be destroyed before the string. So we’re good in all those cases.

But some more elaborate ways of handling memory in client code could lead to dangling references. Allocating a TextDisplayer on the heap, for example, as in new TextDisplayer(myLvalue), or getting it from a std::unique_ptr, leaves the possibility of the TextDisplayer outliving the lvalue it is referring to, which would cause undefined behaviour when we try to use it.

One way to work around this risk would be to disable operator new on TextDisplayer, to prevent non-stack allocations. Furthermore, as is always the danger when holding pointers or references, making copies of TextDisplayer could also lead to issues and should also be forbidden or redefined.
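
One possible way to express these two restrictions is sketched below, under the assumption that deleting the class-level operator new and the copy operations is enough for the use case. It only blocks the plain new TextDisplayer(...) forms. Code that explicitly calls the global ::new still bypasses it, so this is a guard rail rather than a guarantee. The text() accessor and stack_only_example are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

template <class T>
class TextDisplayer
{
public:
   explicit TextDisplayer(T&& text) : text_(std::forward<T>(text)) {}

   // Forbid copies (this also suppresses the implicit move operations).
   TextDisplayer(const TextDisplayer&) = delete;
   TextDisplayer& operator=(const TextDisplayer&) = delete;

   // Forbid plain `new TextDisplayer(...)` and `new TextDisplayer[n]`.
   static void* operator new(std::size_t) = delete;
   static void* operator new[](std::size_t) = delete;

   // Hypothetical accessor, added only so the example can inspect text_.
   const std::string& text() const { return text_; }
private:
   T text_;
};

template <class T> TextDisplayer(T&&) -> TextDisplayer<T>; // deduction guide

static_assert(!std::is_copy_constructible_v<TextDisplayer<std::string&>>);

void stack_only_example()
{
   std::string txt = "Hello World";

   TextDisplayer displayer(txt); // automatic storage: fine
   assert(&displayer.text() == &txt);

   // auto* p = new TextDisplayer(txt); // would not compile: operator new is deleted
   // auto copy = displayer;            // would not compile: copy constructor is deleted
}
```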

Finally, I guess we might still manually delete the string before TextDisplayer goes out of scope. It shouldn’t be the common case, but I don’t think there is anything we can do about that. But I’ll be happy to be proven wrong in the comments section. Bonus points if your solution doesn’t involve std::shared_ptr or any other extra free store allocations.
