Smart developers use smart pointers (1/7) – Smart pointers basics

Published August 22, 2017

One thing that can rapidly clutter your C++ code and hinder its readability is memory management. Done badly, it can turn simple logic into an inexpressive slalom of mess management, and make the code lose control over memory safety.

The programming task of ensuring that all objects are correctly deleted is very low in terms of levels of abstraction, and since writing good code essentially comes down to respecting levels of abstraction, you want to keep those tasks away from your business logic (or any sort of logic for that matter).

Smart pointers are made to deal with this effectively and relieve your code from the dirty work. This series of posts will show you how to take advantage of them to make your code both more expressive and more correct.

We’re going to go deep into the subject, and since I want everyone to be able to follow the whole series, there are no prerequisites: we start off here with the basics of smart pointers.


The stack and the heap

Like many other languages, C++ has several kinds of memory, which correspond to different regions of the memory a program uses. They are: the static, the stack, and the heap. The static is a topic rich enough to deserve its own moment of glory, so here we focus on the stack and the heap only.

The stack

Allocating on the stack is the default way to store objects in C++:

#include <iostream>
#include <string>

int f(int a)
{
    if (a > 0)
    {
        std::string s = "a positive number";
        std::cout << s << '\n';
    }
    return a;
}

Here a and s are stored on the stack. Technically this means that a and s are stored next to one another in memory, because they have been pushed onto a stack maintained by the compiler. However, these concerns are not so relevant for daily work.

There is one important, crucial, even fundamental thing to know about the stack though. It is at the basis of everything that follows in the rest of this series. And the good news is that it’s very easy:

Objects allocated on the stack are automatically destroyed when they go out of scope.

You can re-read this a couple of times, maybe tattoo it on your forearm if needed, and print a T-shirt for your spouse bearing this statement so that you are reminded of it regularly.

In C++ a scope is defined by a pair of brackets ({ and }) except those used to initialize objects:

std::vector<int> v = {1, 2, 3}; // this is not a scope

if (v.size() > 0)
{ // this is the beginning of a scope
    ...
} // this is the end of a scope

And there are 3 ways for an object to go out of scope:

encountering the next closing bracket (}),
encountering a return statement,
having an exception thrown inside the current scope that is not caught inside the current scope.

So in the first code example, s is destroyed at the closing bracket of the if statement, and a is destroyed at the return statement of the function.
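To see the three ways of leaving a scope in action, here is a minimal sketch. The Tracer class and the function names are made up for this illustration: Tracer simply writes a line into a log when it is constructed and when it is destroyed.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: logs "+name" on construction and "-name" on destruction.
class Tracer
{
public:
    Tracer(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) { log_.push_back("+" + name_); }
    ~Tracer() { log_.push_back("-" + name_); }

private:
    std::string name_;
    std::vector<std::string>& log_;
};

void leaveByReturn(std::vector<std::string>& log)
{
    Tracer t("return", log);
    return; // t is destroyed here
}

void leaveByException(std::vector<std::string>& log)
{
    Tracer t("throw", log);
    throw std::runtime_error("not caught in this scope"); // t is destroyed while the exception propagates
}

std::vector<std::string> scopeExitLog()
{
    std::vector<std::string> log;
    {
        Tracer t("brace", log);
    } // t is destroyed here
    assert(log.size() == 2); // construction and destruction are both logged already

    leaveByReturn(log);
    try { leaveByException(log); } catch (const std::runtime_error&) {}
    return log;
}
```

Calling scopeExitLog() yields a log in which every "+" entry is immediately followed by the matching "-" entry, whichever way the scope was left.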

The heap

The heap is where dynamically allocated objects are stored, that is to say objects that are allocated with a call to new, which returns a pointer:

int * pi = new int(42);

After the above statement, pi points to an int object allocated on the heap.

OK, strictly speaking, the memory allocated by new is called the free store. The heap is the memory allocated by malloc, calloc and realloc, which are vestiges from C that are normally no longer used in new code, and which we are ignoring in this post (but we’ll talk more about them later in the series). But the term ‘heap’ is so ubiquitous in developer jargon to talk about any dynamically allocated memory that I am using it here in that sense.

Anyway, to destroy an object allocated by new, we have to do it manually by calling delete:

delete pi;

Unlike objects on the stack, objects allocated on the heap are not destroyed automatically. This offers the advantage of keeping them alive beyond the end of a scope, without incurring any copy except that of pointers, which is very cheap. Also, pointers allow you to manipulate objects polymorphically: a pointer to a base class can in fact point to objects of any derived class.
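As a quick illustration of the polymorphic part, here is a small sketch. Shape, Circle and describeHeapShape are hypothetical names invented for the example, not something from a library.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes, for illustration only.
class Shape
{
public:
    virtual ~Shape() = default; // required to delete a derived object through a Shape*
    virtual std::string name() const { return "shape"; }
};

class Circle : public Shape
{
public:
    std::string name() const override { return "circle"; }
};

std::string describeHeapShape()
{
    Shape* shape = new Circle;          // base-class pointer to a derived object on the heap
    std::string result = shape->name(); // dispatches to Circle::name
    assert(result == "circle");         // the dynamic type decided which function ran
    delete shape;                       // nothing does this for us
    return result;
}
```

Note the virtual destructor in the base class: without it, deleting a Circle through a Shape* would be undefined behaviour.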

But the price to pay for this flexibility is that it puts you, the developer, in charge of their deletion.

And deleting an object on the heap is no trivial task: delete has to be called once and only once to deallocate a heap-based object. If it is not called, the object is not deallocated and its memory space is not reusable: this is called a memory leak. But on the other hand, a delete called more than once on the same address leads to undefined behaviour.

And this is where the code gets cluttered and loses expressiveness (and sometimes even correctness). Indeed, to make sure that all objects are correctly destroyed, the bookkeeping varies from a simple delete to a complex system of flags in the presence of early returns for example.
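Here is a small sketch of what that bookkeeping can look like with early returns. Report and process are hypothetical names, and the alive counter is only there so that we can check that nothing leaks.

```cpp
#include <cassert>
#include <string>

// Hypothetical type that counts how many of its instances are currently alive.
struct Report
{
    static int alive;
    Report() { ++alive; }
    ~Report() { --alive; }
};
int Report::alive = 0;

// Every exit path has to remember the delete.
bool process(const std::string& input)
{
    Report* report = new Report;
    assert(Report::alive >= 1); // the object now lives on the heap

    if (input.empty())
    {
        delete report; // easy to forget
        return false;
    }
    if (input.size() > 10)
    {
        delete report; // and again
        return false;
    }

    delete report; // and once more for the normal path
    return true;
}
```

Three exit paths, three deletes, and the business logic of process is already hard to spot. Forget one of them and Report::alive no longer goes back to 0.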

Also, some interfaces are ambiguous in terms of memory management. Consider the following example:

House* buildAHouse();

As a caller of this function, should I delete the pointer it returns? If I don’t and no one does then it’s a memory leak. But if I do and someone else does, then it’s undefined behaviour. Between the devil and the deep blue sea.

I think all this has led to a bad reputation of C++ as being a complex language in terms of memory management.

But fortunately, smart pointers will take care of all of this for you.

RAII: the magic four letters

RAII is a very idiomatic concept in C++ that takes advantage of the essential property of the stack (look up on your arm, or at the upper body of your spouse) to simplify the memory management of objects on the heap. In fact, RAII can make the management of any kind of resource easy and safe, not only memory. Oh, and I’m not going to write what these 4 letters mean because it is unimportant and confusing in my opinion. You can take them as the name of someone, like a superhero of C++ for example.

The principle of RAII is simple: wrap a resource (a pointer for instance) into an object, and dispose of the resource in its destructor. And this is exactly what smart pointers do:

template <typename T>
class SmartPointer
{
public:
    explicit SmartPointer(T* p) : p_(p) {}
    ~SmartPointer() { delete p_; }

private:
    T* p_;
};

The point is that you can manipulate smart pointers as objects allocated on the stack. And the compiler will take care of automatically calling the destructor of the smart pointer because… objects allocated on the stack are automatically destroyed when they go out of scope. And this will therefore call delete on the wrapped pointer. Only once. In a nutshell, smart pointers behave like pointers, but when they are destroyed they delete the object they point to.
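To check the "only once" part, here is a sketch that reuses the toy SmartPointer from above. Widget and countDestructionsAfterScope are hypothetical names added for the demonstration: Widget just counts how many times its destructor runs.

```cpp
#include <cassert>

// The toy SmartPointer from the article, unchanged.
template <typename T>
class SmartPointer
{
public:
    explicit SmartPointer(T* p) : p_(p) {}
    ~SmartPointer() { delete p_; }

private:
    T* p_;
};

// Hypothetical type that counts destructor calls.
struct Widget
{
    static int destroyed;
    ~Widget() { ++destroyed; }
};
int Widget::destroyed = 0;

int countDestructionsAfterScope()
{
    Widget::destroyed = 0;
    {
        SmartPointer<Widget> sp(new Widget); // sp lives on the stack, the Widget on the heap
        assert(Widget::destroyed == 0);      // nothing has been destroyed yet
    } // sp goes out of scope: its destructor deletes the Widget
    return Widget::destroyed;
}
```

There is no delete in countDestructionsAfterScope, yet the Widget is destroyed exactly once when the closing bracket is reached.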

The above code example was only made to get a grasp of RAII. But by no means is it a complete interface of a realistic smart pointer.

First, a smart pointer syntactically behaves like a pointer in many ways: it can be dereferenced with operator* or operator->, that is to say you can call *sp or sp->member on it. And it is also convertible to bool, so that it can be used in an if statement like a pointer:

if (sp)
{
    ...
}

which tests the nullity of the underlying pointer. And finally, the underlying pointer itself is accessible with a .get() method.
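One way the toy SmartPointer could be extended with these operations is sketched below. This is my own minimal version for illustration, not the interface of the standard smart pointers, and useLikeAPointer is a hypothetical function showing the syntax.

```cpp
#include <cassert>
#include <string>

template <typename T>
class SmartPointer
{
public:
    explicit SmartPointer(T* p) : p_(p) {}
    ~SmartPointer() { delete p_; }

    T& operator*() const { return *p_; }                     // *sp
    T* operator->() const { return p_; }                     // sp->member
    explicit operator bool() const { return p_ != nullptr; } // if (sp)
    T* get() const { return p_; }                            // underlying pointer

    // Copying is still not handled in this sketch.

private:
    T* p_;
};

std::string useLikeAPointer()
{
    SmartPointer<std::string> sp(new std::string("smart"));
    if (sp) // tests that the underlying pointer is not null
    {
        *sp += " pointer";
    }
    std::string* raw = sp.get();
    assert(raw->size() == sp->size()); // both reach the same string
    return *sp;                        // returns a copy of the string
}
```

The conversion to bool is marked explicit so that it works in an if statement without letting a smart pointer silently convert to an integer elsewhere.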

Second, and maybe more importantly, there is an aspect missing from the above interface: it doesn’t deal with copy! Indeed, as is, copying a SmartPointer also copies the underlying pointer, so the code below has a bug:

{
    SmartPointer<int> sp1(new int(42));
    SmartPointer<int> sp2 = sp1; // now both sp1 and sp2 point to the same object
} // sp1 and sp2 are both destroyed, the pointer is deleted twice!

Indeed, it deletes the underlying object twice, leading to undefined behaviour.

How to deal with copy then? This is a feature on which the various types of smart pointers differ. And it turns out that this lets you express your intentions in code quite precisely. Stay tuned, as this is what we will see in the next episode of this series.
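In the meantime, just to show that the double delete is avoidable, here is a sketch of one possible answer among several: forbidding copy altogether. NonCopyableSmartPointer and readThroughPointer are hypothetical names, and this is not necessarily the choice that any given smart pointer makes.

```cpp
#include <cassert>
#include <type_traits>

template <typename T>
class NonCopyableSmartPointer
{
public:
    explicit NonCopyableSmartPointer(T* p) : p_(p) {}
    ~NonCopyableSmartPointer() { delete p_; }

    // Copy is forbidden, so two objects can never own the same pointer.
    NonCopyableSmartPointer(const NonCopyableSmartPointer&) = delete;
    NonCopyableSmartPointer& operator=(const NonCopyableSmartPointer&) = delete;

    T* get() const { return p_; }

private:
    T* p_;
};

// The compiler now rejects the buggy copy instead of letting it run.
static_assert(!std::is_copy_constructible<NonCopyableSmartPointer<int>>::value,
              "copy construction should be disabled");
static_assert(!std::is_copy_assignable<NonCopyableSmartPointer<int>>::value,
              "copy assignment should be disabled");

int readThroughPointer()
{
    NonCopyableSmartPointer<int> sp(new int(42));
    assert(sp.get() != nullptr);
    return *sp.get();
} // the int is deleted once, here
```

With this version, a line like sp2 = sp1 from the example above would fail to compile rather than lead to undefined behaviour at runtime.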


About Jonathan Boccara
