Jonathan Boccara's blog

Size and Capacity of STL Containers

Published October 13, 2017 - 0 Comments

Daily C++

Size and capacity are concepts that look somewhat similar from afar. But mixing them up can lead to under-optimized or even plain wrong code. This article explains all about size and capacity of standard containers, and how these two concepts differ.

A big thanks to  Stephan T. Lavavej, who kindly provided his feedback on the article which I worked into the post after its initial release.

Definitions of Size and Capacity

The size of a container is the number of elements it contains. Or said differently, the number of elements passed over in an iteration from beginning to end. This is an information which is fundamentally part of the container interface.

The capacity of a container is the maximum number of elements it could contain without having to allocate new memory. Although this is accessible through the interface of some containers, this is more of an implementation issue, and related to time and memory optimization.

Therefore, when so many elements are added to a container that its size would overgrow its capacity, new memory is allocated. In this case, a std::vector for example would transfer its contents to the newly allocated memory location (note that, in C++11, if their move constructor is noexcept then the contents are moved rather than copied – but the memory allocation and deallocation remains here).

Now that we’ve got the definitions done, the rest of this article shows how to read or manipulate the size and capacity of STL containers.

Size

Retrieving information about size

All standard containers provide a size() method that returns the number of elements they actually contain. Note that std::string also has a length() method, that does exactly the same thing as size but with a maybe more natural name.

Standard containers also provide an empty() method returning a boolean to indicate whether they contain any element or not.

The empty() method has a constant execution time for all containers. Before C++11, the size method could have linear or constant execution time for some containers (std::list in particular). For that reason, to check if a container was empty before C++11, the .empty() method was to be preferred over comparing size to 0. After C++11, calling empty or comparing size to 0 can be used indifferently (except stylistically using “empty” expresses the meaning more directly, and vector::empty()produces slightly more efficient code than comparing size with 0 for arcane reasons – thanks Stephan for this info).

All standard containers provide a max_size() method that returns the maximum number of elements that the container can hold. This is determined by the platform you use. Note that, contrary to what one could expect, max_size is not a static method. The reason for this is that individual containers can be instantiated with a custom allocator, on which the maximum size could depend. But as Scott Meyers explains in Effective STL Item 10, STL allocators are – or should – rarely be customised anyway.

If you only have a range materialised with two iterators, and not directly a container, you can know the size of the range by calling std::distance between the beginning and the end of the range.

Modifying the size

Adding or removing elements from containers modifies their size, but some other methods can also have an impact on it. Here they are:

  • the constructor accepting a number of elements. for example the following code creates a vector of size 15:
This means that the vector has initialized 15 elements to their default value.

This constructor has a variant also taking a value:

The container calls its elements’ copy constructor with the value passed. Here the vector contains 42 elements constructed with the letter ‘a’.

  • the resize method, that takes a size parameter and, optionally, a value parameter. For example here are the prototypes for std::vector::resize methods (other containers have similar methods):
resize modifies the size of the container. This means that the container may contain a different number of elements than it did before:

  • if the new size is bigger than the old size, new elements are added at the end of the container. If no value is specified to resize, the new objects are value-initialized, otherwise they are copy constructed from the specified value.
  • if the new size is smaller than the old size, the latest elements are removed.
  • if the new size is the same as the old size, resize has no effect.

Capacity

Retrieving capacity

Contrary to size, capacity does not make sense for all containers. For example, by definition of std::list that represents a linked list, capacity is always equal to its size, so std::list does not have anything related to capacity in its interface. Capacity is defined for vector, deque and string.

In fact, capacity is mainly useful for vectors and strings. Indeed, these containers reallocate their contents when capacity is overgrown by size, and this implies transferring all elements to the new memory location (whereas deques allocate incremental chunks of memory without copying or moving the data previously inserted). And in some cases you may want some control over capacity in order to avoid multiple reallocations and copies.

To know the capacity of a container, simply call its capacity() method (except for deque that doesn’t have this method).

Augmenting capacity

If you know in advance the number of elements that will be stored in the container, you can allow for an adequate capacity upfront, thus avoiding the cost of adjusting capacity along the insertions.
For that call the reserve() method before inserting into the container, and pass it the capacity it should allocate for.

Note though that calling reserve in certain cases could actually make the vector slower and make the push_back have a quadratic complexity. More on this in a future article dedicated to reserve.

Reducing capacity

Now if you have a vector or deque or string that contained many elements, but that was resized down to few or no elements, its size was reduced, but not its capacity. So you may want to trim this excess capacity in order to save the large allocated memory that has become useless, but the reserve method can only augment capacity. The solution depends on whether your compiler is C++11-compliant or not.

In C++11

Simply invoke the shrink_to_fit() method on the container.

Before C++11

You can use the “swap trick“, which consists of swapping the container with a new one that contains only the remaining elements:

Here a temporary vector with all the elements of v and no name is created: std::vector<int>(v.begin(), v.end()). This temporary vector is then swapped with v. The swap method efficiently swaps the vectors’ contents without actually copying the elements around.

Note the use of the vector’s range constructor (the one that takes a begin and an end), and not the copy constructor. Using the range constructor guarantees that only the container’s elements are actually copied, and not the whole capacity. Indeed, we don’t know how the copy constructor is implemented: it could copy the excess capacity (although in practice this shouldn’t happen).

Note that even if you haven’t migrated to C++11 yet, nothing prevents you from wrapping the swap trick into a shrink_to_fit function:

This makes client code arguably clearer than if it were directly using the swap trick.

In all cases

Before or after C++11, note that there is no guarantee that capacity is actually brought down exactly to size. Capacity is really up to your STL implementation. But with these techniques, it will be as small as it can get.

Capacity strategies

The decision of how much memory to allocate when a vector’s size overgrows its capacity is up to your STL implementation. However the standard imposes that the push_back() method of vector should be amortized constant, that is to say that filling a vector with n incremental push_backs should have a O(n) complexity in terms of copying elements.

How can that be achieved ?

Increasing the allocated memory by 1 when size overgrows capacity is not a solution: all elements would be copied every time a new one is added, so for n push_backs the number of copies would be:

1 + 2 + 3 + … + n

Which is n * n² / 2, so O(n²). Not good, because the standard imposes O(n).

Increasing allocated memory by a constant factor C is not a solution either, because elements would be copied

C + 2C + 3C + … + floor(n/C)

times, which is better but still O(n²). So not good enough.

A conformant solution is to double the allocated memory every time size overgrows capacity, and it is actually used by some compilers. In this case, when size reaches capacity, half of the vector has already been copied once, and a quarter of it has been copied twice, and an eighth of it three times, and so on. So the number of copies is:

n/2 + n/4 + n/8 + … = sum(1/2^k) * n

And this is O(n).

Some compilers use this technique, although not always with 2 as multiplication factor, but typically something between 1.5 and 2 in order to spare memory.

So here are the practical things to understand to differentiate size and capacity!

Share this post! Facebooktwittergoogle_pluslinkedin    Don't want to miss out ? Follow:   twitterlinkedinrss

Receive regular updates to make your code more expressive.