Jonathan Boccara's blog

Good news: strong types are (mostly) free in C++

Published May 5, 2017

Strong types are a simple and efficient tool for improving code expressiveness, by letting you express your intentions better both to the compiler and to your fellow human companions.

This post is part of the series about strong types, which keeps growing because it is such a rich topic.

A question that comes to mind fairly quickly when reading about strong types is: how much will they cost in terms of performance? Should I stay away from strong types in the parts of the codebase that are really sensitive to performance, thereby forgoing their benefits in terms of code clarity?

The suspicion

The implementation of strong types proposed earlier in this series uses a generic wrapper:

template <typename T, typename Parameter>
class NamedType
{
public:
    explicit NamedType(T const& value) : value_(value) {}
    T& get() { return value_; }
    T const& get() const {return value_; }
private:
    T value_;
};

…that could be declared for a specific type the following way:

using Width = NamedType<double, struct WidthTag>;
using Height = NamedType<double, struct HeightTag>;

and that could be used in an interface this way:

class Rectangle
{
public:
    Rectangle(Width, Height);
    ....
};

and at call site:

Rectangle r(Width(10), Height(12));

An earlier post in the series even showed how units can easily be fitted in there, but for the purpose of studying performance the above example is enough.
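To make the pieces above concrete, here is a minimal self-contained sketch that puts the wrapper, the two aliases and the call site together. The body of Rectangle is an assumption (the post elides it), and callSiteExample is a hypothetical helper added only for illustration. The static_asserts spell out what the compiler now enforces: Width and Height are unrelated types, and neither a bare double nor swapped arguments are accepted.

```cpp
#include <cassert>
#include <type_traits>

// Same generic wrapper as in the post.
template <typename T, typename Parameter>
class NamedType
{
public:
    explicit NamedType(T const& value) : value_(value) {}
    T& get() { return value_; }
    T const& get() const { return value_; }
private:
    T value_;
};

using Width = NamedType<double, struct WidthTag>;
using Height = NamedType<double, struct HeightTag>;

// Assumed body for Rectangle: the post only shows its constructor signature.
class Rectangle
{
public:
    Rectangle(Width width, Height height) : width_(width.get()), height_(height.get()) {}
    double getWidth() const { return width_; }
    double getHeight() const { return height_; }
private:
    double width_;
    double height_;
};

// The tags make Width and Height distinct types with no conversion between them.
static_assert(!std::is_same<Width, Height>::value, "Width and Height are different types");
static_assert(!std::is_convertible<Width, Height>::value, "a Width is not a Height");

// The explicit constructor prevents a bare double from silently becoming a Width.
static_assert(!std::is_convertible<double, Width>::value, "no implicit wrapping");

// Arguments in the right order are accepted; swapped or unwrapped ones are not.
static_assert(std::is_constructible<Rectangle, Width, Height>::value, "correct order");
static_assert(!std::is_constructible<Rectangle, Height, Width>::value, "swapped order");
static_assert(!std::is_constructible<Rectangle, double, double>::value, "raw doubles");

// Hypothetical helper reproducing the call site from the post.
void callSiteExample()
{
    Rectangle r(Width(10), Height(12));
    assert(r.getWidth() == 10.0);
    assert(r.getHeight() == 12.0);
}
```

All the checks except the two runtime asserts happen at compile time, so they add nothing to the generated binary.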

The suspected costs related to the usage of strong types are simple:

  • allocating stack space for the Width object,
  • constructing it from the passed value,
  • calling .get() to retrieve the underlying value, incurring a copy of a reference,
  • destructing the Width object,
  • potentially having several Width objects around during parameter passing,
  • and the same costs for the Height object.
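Some of the suspicions listed above can be narrowed down before looking at any assembly. The sketch below asks the compiler a few questions about the layout of Width through standard type traits. It assumes a mainstream compiler, where a class holding a single double is expected to have the size of a double. These checks only describe the type itself; they say nothing yet about what the optimizer does with the constructor and .get() calls.

```cpp
#include <type_traits>

// Same generic wrapper as in the post.
template <typename T, typename Parameter>
class NamedType
{
public:
    explicit NamedType(T const& value) : value_(value) {}
    T& get() { return value_; }
    T const& get() const { return value_; }
private:
    T value_;
};

using Width = NamedType<double, struct WidthTag>;
using Height = NamedType<double, struct HeightTag>;

// The wrapper is expected to add no storage on top of the double it holds.
static_assert(sizeof(Width) == sizeof(double), "no extra storage");
static_assert(alignof(Width) == alignof(double), "same alignment");

// Copying a Width is a plain copy of its bytes: no user-defined copy logic runs.
static_assert(std::is_trivially_copyable<Width>::value, "trivial copy");

// Destroying a Width requires no code at all.
static_assert(std::is_trivially_destructible<Width>::value, "trivial destruction");
static_assert(std::is_trivially_destructible<Height>::value, "trivial destruction");
```

If one of these assertions failed on a given platform, the comparison of generated assembly would be the place to find out what it actually costs.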

The question is: how much will this cost? What is the price to pay for expressiveness?

Essentially, it’s free

One easy way to measure the performance impact of strong types is to compare the generated assembly with what is obtained using primitive types.

So we’ll compile the following class:

class StrongRectangle
{
public:
    StrongRectangle (Width width, Height height) : width_(width.get()), height_(height.get()) {}
    double getWidth() const {return width_;}
    double getHeight() const {return height_;}
  
private:
    double width_;
    double height_;
};

versus the native version:

class Rectangle
{
public:
    Rectangle (double width, double height) : width_(width), height_(height) {}
    double getWidth() const {return width_;}
    double getHeight() const {return height_;}
  
private:
    double width_;
    double height_;
};

with the following calling code:

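#include <iostream>

int main()
{
  double width;
  std::cin >> width;
  double height;
  std::cin >> height;
  
  // Uncomment exactly one of the two following lines to choose the version to compile:
  //Rectangle r(width, height);
  //StrongRectangle r((Width(width)), (Height((height))));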
  
  std::cout << r.getWidth() << r.getHeight(); 
}

The version to compile is selected by putting in either of the two constructor calls. Note the extra parentheses, which disambiguate the call to the StrongRectangle constructor from a function declaration. They are really annoying, and are just another manifestation of the most vexing parse in C++. This only happens when passing named variables to a constructor that takes strong types. Passing literals like numbers, or calling a function that is not a constructor, doesn’t need such extra parentheses.
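As a side note on the most vexing parse mentioned above, here is a small sketch of two ways to write the same construction from named variables. The first uses the extra parentheses, and the second uses C++11 brace initialization, which cannot be read as a function declaration. makeRectangle is a hypothetical helper that is not part of the original code.

```cpp
#include <cassert>

// Same generic wrapper as in the post.
template <typename T, typename Parameter>
class NamedType
{
public:
    explicit NamedType(T const& value) : value_(value) {}
    T& get() { return value_; }
    T const& get() const { return value_; }
private:
    T value_;
};

using Width = NamedType<double, struct WidthTag>;
using Height = NamedType<double, struct HeightTag>;

class StrongRectangle
{
public:
    StrongRectangle(Width width, Height height) : width_(width.get()), height_(height.get()) {}
    double getWidth() const { return width_; }
    double getHeight() const { return height_; }
private:
    double width_;
    double height_;
};

// Hypothetical helper building a StrongRectangle from two named variables.
StrongRectangle makeRectangle(double width, double height)
{
    // Without extra parentheses, the following line would be parsed as the
    // declaration of a function named r, not as the construction of an object:
    // StrongRectangle r(Width(width), Height(height));

    // Option 1: extra parentheses around the arguments.
    StrongRectangle a((Width(width)), (Height(height)));

    // Option 2: brace initialization, which is never a function declaration.
    StrongRectangle b{Width{width}, Height{height}};

    // Both spellings build the same rectangle.
    assert(a.getWidth() == b.getWidth());
    assert(a.getHeight() == b.getHeight());
    return b;
}
```

The assembly shown in this post was generated with the parenthesized form, so the brace form is only a suggestion for call sites and has not been measured here.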

Here is the assembly generated by clang 3.9.1 with -O2 on the very popular online compiler godbolt.org, for the version using primitive types:

main:                                   # @main
        sub     rsp, 24
        lea     rsi, [rsp + 16]
        mov     edi, std::cin
        call    std::basic_istream<char, std::char_traits<char> >& std::basic_istream<char, std::char_traits<char> >::_M_extract<double>(double&)
        lea     rsi, [rsp + 8]
        mov     edi, std::cin
        call    std::basic_istream<char, std::char_traits<char> >& std::basic_istream<char, std::char_traits<char> >::_M_extract<double>(double&)
        movsd   xmm0, qword ptr [rsp + 16] # xmm0 = mem[0],zero
        movsd   xmm1, qword ptr [rsp + 8] # xmm1 = mem[0],zero
        movsd   qword ptr [rsp], xmm1   # 8-byte Spill
        mov     edi, std::cout
        call    std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
        mov     rdi, rax
        movsd   xmm0, qword ptr [rsp]   # 8-byte Reload
        call    std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
        xor     eax, eax
        add     rsp, 24
        ret

_GLOBAL__sub_I_example.cpp:             # @_GLOBAL__sub_I_example.cpp
        push    rax
        mov     edi, std::__ioinit
        call    std::ios_base::Init::Init()
        mov     edi, std::ios_base::Init::~Init()
        mov     esi, std::__ioinit
        mov     edx, __dso_handle
        pop     rax
        jmp     __cxa_atexit            # TAILCALL

You don’t even need to look at the code in detail: what we want to know is whether or not the strong type example generates more code than the primitive one.

And re-compiling by commenting out the primitive type and putting in the strong type gives… exactly the same generated assembly.

So no cost for the strong type. The holy zero-cost abstraction. The grail of modern C++. All the code related to the wrapping of strong types was simple enough for the compiler to understand that there was nothing to do with it in production code, and that it could be completely optimized away.

Except this was compiled in -O2.

Compiling with -O1 doesn’t give the same result with clang. Showing the exact generated assembly has little interest for the purpose of this post (you can have a look on godbolt if you’re interested), but it was noticeably larger.

Note however that when compiling with gcc, the strong type machinery was optimized away with both -O2 and -O1.
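To repeat this kind of comparison across compilers and optimization levels, a pair of small functions can be easier to read in the assembly view than a whole main. The following is only a sketch: areaRaw, areaStrong and checkAreas are hypothetical names that do not appear in the experiment above. Whether the two functions compile to the same instructions depends on the compiler, its version and its flags, so no output is reproduced here.

```cpp
#include <cassert>

// Same generic wrapper as in the post.
template <typename T, typename Parameter>
class NamedType
{
public:
    explicit NamedType(T const& value) : value_(value) {}
    T& get() { return value_; }
    T const& get() const { return value_; }
private:
    T value_;
};

using Width = NamedType<double, struct WidthTag>;
using Height = NamedType<double, struct HeightTag>;

// Version using primitive types.
double areaRaw(double width, double height)
{
    return width * height;
}

// Version using strong types: same computation, with wrapped parameters.
double areaStrong(Width width, Height height)
{
    return width.get() * height.get();
}

// Sanity check that both versions compute the same value.
void checkAreas()
{
    assert(areaRaw(3.0, 4.0) == areaStrong(Width(3.0), Height(4.0)));
}
```

Pasting the two area functions into an online compiler and switching between -O1 and -O2 is one way to see at which level a given compiler stops distinguishing them.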

What to think of this?

We can draw several conclusions from this experiment.

First, this implementation of strong types is compatible with compiler optimizations. If your optimization level is high enough, then the code related to strong types never makes it to a production binary. This leaves you with all the advantages related to the expressiveness of strong types, for free.

Second, “high enough” depends on the compiler. In this experiment, we saw that gcc did away with the code with -O1, while clang did so only with -O2.

Lastly, even if the code is not optimized away because your binary is not compiled aggressively enough, all hope is not lost. The 80-20 rule (some even say 90-10) means that, in general, the vast majority of a codebase matters little for performance. So when there is a very small likelihood that strong types will be detrimental to performance, but a 100% likelihood that they will benefit the expressiveness and robustness of your code, the decision is quickly made. And it can still be reverted after profiling anyway.

 
