Jonathan Boccara's blog

A Recap on string_view

Published February 19, 2021

The string capabilities of C++ evolved little after C++98, until C++17 brought a major evolution: std::string_view.

Let’s look at what string_view is about and what it can bring to your code, by making it more expressive and making it run faster.

std::string_view

As its name suggests, std::string_view is a view on a string. But let’s define view and let’s define string.

A view…

A view is a light object that can be constructed, copied, moved and assigned to in constant time, and that references another object.

We can draw a parallel with C++20’s range views, which model the concept std::ranges::view. This concept requires that views can be copied, moved and assigned in constant time, and views typically reference other ranges.

C++17 didn’t have concepts and ranges, but std::string_view already had the semantics of a view. Note that std::string_view is a read-only view. It cannot modify the characters in the string that it references.

Also, note that you don’t have to wait for C++17 to use string_view. There are some C++11-compliant implementations, such as Abseil’s absl::string_view.

… on a string

A view references something, and here std::string_view references a string. This “string” denomination includes three things:

  • a std::string,
  • a null-terminated char*,
  • a char* and a size.

These are the three inputs you can pass in to build a std::string_view. The first one is defined in the std::string class as an implicit conversion operator, and the last two correspond to std::string_view’s constructors.
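As a minimal sketch of these three inputs (the function and variable names are only illustrative), each of the following lines builds a view without copying any character:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns true if the views built from the three kinds of input agree.
bool threeInputsAgree()
{
    std::string const owner = "Hello, world";   // a std::string
    char const* cString = "Hello, world";       // a null-terminated char*

    std::string_view fromString = owner;              // via std::string's conversion operator
    std::string_view fromCString = cString;           // via the constructor taking a char const*
    std::string_view fromPointerAndSize{cString, 5};  // via the constructor taking a pointer and a size

    // No characters are copied: the view points into the string's own buffer.
    assert(fromString.data() == owner.data());

    return fromString == fromCString && fromPointerAndSize == "Hello";
}
```

Note that the pointer-and-size form is the only one that can reference a sequence of characters that is not null-terminated.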

In summary, std::string_view is a lightweight object that references a C or C++ string. Now let’s see how that can be useful to your code.

A rich API for cheap

Let’s go back to the history of strings in C++.

The roots of std::string

Before C++, in C, there was no string class. C forced us to carry around char* pointers, which has two drawbacks:

  • there is no clear ownership of the array of characters,
  • the API to operate on them is very limited.

As Scott Meyers mentions towards the end of More Effective C++, when building the C++ language, “As Chair of the working group for the C++ standard library, Mike Vilot was told: ‘If there isn’t a standard string type, there will be blood in the streets!’”. And so C++ got the std::string class.

std::string solves the above two problems of char*: it owns its characters and deals with the associated memory, and it has a very rich interface that can do many, many things (it is so big that Herb Sutter describes its “monolith” aspect in the last four items of Exceptional C++ Style).

The price of ownership

Ownership and memory management of the array of characters is a big advantage, one we can’t imagine living without today. But it comes at a price: each time we construct a string, it has to allocate memory on the heap (assuming it has too many characters to fit in the small string optimisation). And each time we destruct it, it has to hand back this heap memory.

These operations go through the memory allocator, may involve the OS, and take time. Most of the time they go unnoticed though, because most code is not critical for performance. But in the code that happens to be performance sensitive (and only your profiler can tell you what code this is), repeatedly building and destructing std::string can be unacceptable for performance.

Consider the following example to illustrate. Imagine we’re building a logging API that uses std::string because it’s the most natural thing to do: it makes the implementation expressive by taking advantage of its rich API. It wouldn’t even cross our minds to use char*:

void log(std::string const& information);

We make sure to take the string by reference to const, so as to avoid copies that would take time.

Now we’re calling our API:

log("The system is currently computing the results...");

Note that we’re passing a const char*, and not a std::string. But log expects a std::string. This code compiles, because const char* is implicitly convertible to std::string… but despite the const&, this code constructs and destructs a std::string!

Indeed, the std::string is a temporary object built for the purpose of the log function, and is destructed at the end of the statement calling the function.
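To make this temporary visible, here is a small sketch that uses a hypothetical TracedString class as a stand-in for std::string. It is not part of any library, and its counters merely stand for the heap allocation and deallocation that the real std::string would perform. Like std::string, it is implicitly constructible from a const char*:

```cpp
#include <cassert>

// Counters standing in for the allocations and deallocations of std::string.
int constructions = 0;
int destructions = 0;

// Hypothetical stand-in for std::string, for illustration only.
struct TracedString
{
    TracedString(char const*) { ++constructions; } // implicit, like std::string's constructor
    ~TracedString() { ++destructions; }
};

// Same shape as log: takes its parameter by reference to const.
void logTraced(TracedString const&) {}

// Calls logTraced with a string literal and returns how many temporaries
// the call constructed.
int temporariesCreatedByOneCall()
{
    int const before = constructions;
    logTraced("The system is currently computing the results...");
    int const created = constructions - before;

    // The temporary was destroyed at the end of the calling statement.
    assert(destructions == constructions);
    return created;
}
```

Under these assumptions, one call with a string literal constructs and destructs exactly one temporary, even though the parameter is a reference to const.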

char* can come from string literals as in the above example, but also from legacy code that doesn’t use std::string.

If this is happening in a performance sensitive part of the codebase, it may be too big of a performance hit.

What to do then? Before string_view, we had to go back to char* and forgo the expressiveness of the implementation of log:

void log(const char* information); // 😢

Using std::string_view

With std::string_view we can get the best of both worlds:

void log(std::string_view information);

This does not construct a std::string, but merely a lightweight view over the const char*. So there is no allocation, and no more performance impact. But we still get the read-only part of std::string’s API in order to write expressive code in the implementation of log.
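As an illustration of what the implementation of log could look like, here is a sketch of a hypothetical helper, splitSeverity, which assumes that messages have the form "severity: message" (both the helper and the message format are assumptions made for this example). It relies on find and substr, which std::string_view offers under the same names as std::string:

```cpp
#include <cassert>
#include <string_view>
#include <utility>

// Hypothetical helper for the implementation of log: splits
// "severity: message" into its two parts without copying any character.
std::pair<std::string_view, std::string_view> splitSeverity(std::string_view information)
{
    auto const separator = information.find(": ");
    if (separator == std::string_view::npos)
    {
        return {std::string_view{}, information}; // no severity prefix
    }
    return {information.substr(0, separator), information.substr(separator + 2)};
}

// Usage: both returned views point into the characters of the caller.
void splitSeverityExample()
{
    auto const [severity, message] = splitSeverity("ERROR: disk full");
    assert(severity == "ERROR");
    assert(message == "disk full");
}
```

Unlike std::string::substr, which builds a new std::string, the substr of std::string_view returns another view, so the whole function performs no allocation.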

Note that we pass string_view by value: it is cheap to copy and already has the semantics of a reference.

Pitfall: memory management

Since a std::string_view references a string and doesn’t own it, we have to make sure that the referenced string outlives the string_view. In the above code it looked OK, but if we’re not careful we could get into memory issues.

For example consider this code, simplified for illustration purposes:

std::string_view getName()
{
    auto const name = std::string{"Arthur"};
    return name;
}

This leads to undefined behaviour as soon as the caller reads through the returned view: the function returns a std::string_view pointing to a std::string that was destroyed at the end of the function.
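Two possible ways out are sketched here, with hypothetical function names: return the std::string by value so that the caller owns the characters, or return a view only on characters that outlive the function, such as a string literal, which has static storage duration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Option 1: hand ownership to the caller by returning the string itself.
std::string getOwnedName()
{
    auto name = std::string{"Arthur"};
    return name; // moved or elided: the caller now owns the characters
}

// Option 2: return a view on characters that live for the whole program.
// A string literal has static storage duration, so this view never dangles.
std::string_view getLiteralName()
{
    return "Arthur";
}

void safeNamesExample()
{
    std::string const owned = getOwnedName(); // keep the owner alive...
    std::string_view const view = owned;      // ...for as long as views on it are used
    assert(view == getLiteralName());
}
```

Which option fits depends on where the characters come from: the first one pays for an allocation (unless the small string optimisation applies), and the second one only works for data known at compile time.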

This issue is neither new nor specific to std::string_view. It exists with pointers, references, and more generally with any object that references another one:

int const& getValue()
{
    int const value = 42;
    return value; // returns a reference to a local variable
} // value is destructed!

More and more views in C++

As mentioned earlier, C++20 introduces the formal concept of view for ranges, and brings a lot more views into the standard. These include transform, filter and the other range adaptors, which are some of the selling points of the ranges library.

Like string_view, they are lightweight objects with a rich interface that let you write expressive code and pay for little more than what you use.
