Jonathan Boccara's blog

The Complete Guide to Building Strings In C++: From “Hello World” Up To Boost Karma

Published December 19, 2017

Daily C++ (this post is daily-able but you’ll need to split its independent parts across several days)

Building strings sounds like one of the most basic things a programmer can do in any language. But in fact there are a lot of ways to go about it in C++, depending on how complex your need is. Here we see a span of alternatives, ranging from the basic "Hello, world." of std::string’s constructor all the way up to Boost Karma, which lets you express complex string building in very concise code.

As this is a relatively long post, here is its outline:

  • Building a string with… a string
  • Building a string out of TWO strings
  • Building a string out of N strings
  • Building a string from a file
  • Throwing everything but the kitchen *string* at it
  • Boost Format: Decoupling formatting from contents
  • Boost Karma, there we are
  • Let’s go out and build strings now

Building a string with… a string

The most basic way to build a string, that you quite certainly already know, is this:

std::string greetings = "Hello, world.";

Structured string code

What is a little less known though, is that long strings can be broken over lines, without any special syntax except quotes:

std::string longGreetings = "Hello, world. How are you doing? I suppose that by now "
                            "you must have your inbox chock-full of greetings like "
                            "this one, in like hundreds of programming languages and "
                            "sent over by thousands or millions of software developers "
                            "taking up the challenge of learning a new language. "
                            "World, you must be the most popular mentor for beginners "
                            "but you'll find this message a little bit different: in "
                            "it you'll hear about Boost Karma, which I hope you'll "
                            "find both unusual and interesting. Keep it up, world.";

Handy, right?

This is useful for example for writing SQL requests in your code, because they can sometimes be more readable if wrapped over several lines. And don’t forget to put a space at the end of each substring if needed, otherwise the first word of a given line will be stuck onto the last one of the previous line.
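
As a minimal sketch of the SQL case just mentioned (the function name, table and columns are made up for illustration), adjacent literals let a query follow its logical structure in the code:

```cpp
#include <string>

// Hypothetical query: the table and column names are illustrative only.
std::string buildUserQuery()
{
    // Adjacent string literals are merged by the compiler into one literal.
    // Note the trailing space at the end of every piece but the last.
    return "SELECT name, email "
           "FROM users "
           "WHERE age >= 18 "
           "ORDER BY name";
}
```

Forgetting one of the trailing spaces would glue two keywords together, for instance "emailFROM".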

This trick also lets you create straight strings but with code indented and spread over several lines. The following string for example:

std::string s = "(field1=value1) or ((field6=value2 or field2=value3 or field3=value4) and (field1=value2))";

can be expanded into a more structured code, but keeping the same exact value:

std::string s = "("
                    "field1=value1"
                ")"
                " or "
                "("
                    "("
                        "field6=value2"
                        " or "
                        "field2=value3"
                        " or "
                        "field3=value4"
                    ")"
                    " and "
                    "("
                        "field1=value2"
                    ")"
                ")";

I found this helpful more than once.

Raw string literals

The end of a string literal in code is delimited by a quote ("). But what if you want your string to actually contain a quote? It needs to be escaped with a backslash (\):

std::string stringInQuote = "This is a \"string\"";

Printing out that string indeed gives:

This is a "string"

Since C++11, raw string literals let you treat every character as part of the string. An R marks a raw string, and its contents are surrounded by parentheses the following way:

std::string stringInQuote = R"(This is a "string")";

This creates the same string as above. Note how the quotes are no longer escaped.

Every character inside a raw string counts as part of the string, and this includes new lines and other blank space. For instance the following raw string literal:

std::string stringInQuote = R"(This is a "string"
                               and a second line)";

looks like this when printed out:

This is a "string"
                               and a second line

The whitespace comes from the fact that the second line inside the raw string literal is away from the left margin of the text editor of the IDE. So you need to be careful with that. If you want several lines of a raw string to be aligned, you need to align them against the left margin in the source code itself:

int main()
{
    std::string stringInQuote = R"(This is a "string"
and a second line
and a third)";
    
    std::cout << stringInQuote << '\n';
}

which can seem a curious form of indentation.
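
If that indentation bothers you, one possible alternative (a sketch, with a made-up function name) is to go back to ordinary adjacent literals and write the line breaks explicitly. The quotes then need escaping again, but the code can be indented freely:

```cpp
#include <string>

// Same three lines as the raw string above, written with ordinary literals.
std::string threeLines()
{
    // Each piece ends with an explicit '\n'; the indentation of the source
    // code does not end up in the string.
    return "This is a \"string\"\n"
           "and a second line\n"
           "and a third";
}
```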

std::string’s constructor

One last thing about std::string’s constructor: you can build a string that consists of the repetition of one character. For example the following code:

std::string s(10, 'a'); // read: 10 times 'a'
std::cout << s << '\n';

outputs:

aaaaaaaaaa

which is the savage sound emitted by a software developer who lost part of their humanity by spending hours chasing a non-reproducible bug caused by an incorrect printf. More on printf later.

Building a string out of TWO strings

The simplest way to concatenate strings in C++ is by using the + (or +=) operator:

std::string s1 = "Hello, ";
std::string s2 = "world.";

std::string s3 = s1 + s2;
s1 += s2;

These operators have several overloads, including one taking a const char* to append string literals:

std::string s1 = "Hello, ";
std::string s2 = s1 + "world.";

or even individual characters:

s2 += '!';

Now you may wonder what the performance cost of these operations is. Is it better to use operator+ or operator+=? I have thrown the comparative cases of building a single string into a Google Benchmark, testing the difference between:

std::string s4;
s4 = s1 + s2 + s3;

and:

std::string s4;
s4 += s1;
s4 += s2;
s4 += s3;

for strings of various sizes. In my tests the difference was not significant for long strings, and operator+= was slightly faster for small strings. I suspect that the Return Value Optimization plays a role here. But this can vary widely between compilers, so if you want to know for sure on your platform you’d still need to run a test, I’m afraid.

Note that you can call the reserve method on the result string before performing the concatenation, to let it know how much data is going to come in and let it allocate. But this can have surprising performance results, and it will be the topic of a dedicated post.
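
For reference, here is a minimal sketch of what calling reserve before the concatenation can look like. The helper name is made up for illustration, and whether this is actually faster on your platform is something to measure rather than assume:

```cpp
#include <string>

// Illustrative helper (not from a library): concatenates three strings,
// reserving the final size up front so that the appends fit in the
// capacity that was already allocated.
std::string concatenate(std::string const& s1, std::string const& s2, std::string const& s3)
{
    std::string result;
    result.reserve(s1.size() + s2.size() + s3.size());
    result += s1;
    result += s2;
    result += s3;
    return result;
}
```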

Building a string out of N strings

Imagine the following scenario: you have a bunch of strings, and you want to concatenate them all into one big string. How to do this in C++?

One way to go about this in a one liner is by a direct use of std::accumulate:

std::string result = std::accumulate(begin(words), end(words), std::string());

Indeed, std::accumulate takes a collection and an initial value, and successively applies operator+ on the value and each element of the collection, each time updating the value with the result of the sum. And, as we saw just above, operator+ concatenates two strings.

Note here that the initial value has to be std::string() and not simply "" because std::accumulate takes a template parameter for the value. And since there is no implicit conversion in template type deduction, the algorithm will consider that it is operating on const char* (which is the type of "") and this conflicts with the outcome of operator+ which is an std::string and can’t be assigned back into the accumulator.
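
Wrapped into a function with its headers, the one-liner could look like the following sketch (the function name is an assumption, chosen for illustration):

```cpp
#include <numeric>
#include <string>
#include <vector>

// Illustrative wrapper around the std::accumulate one-liner.
std::string concatenateWithAccumulate(std::vector<std::string> const& words)
{
    // std::string() makes the accumulator a std::string.
    // Passing "" instead would make it a const char* and fail to compile.
    return std::accumulate(begin(words), end(words), std::string());
}
```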

Although this method is very concise, it’s not the fastest you can get. Indeed, lots of strings are constructed and destroyed during the traversal of the collection. To use the same string all along the traversal of the collection, you can roll out a simple loop:

std::string result;
for (std::string const& word : words)
{
    result += word;
}

I’ve compared the two pieces of code with Google Benchmark, and the second one (without algorithms) came out 4.5x faster than the first one in my test.

And to make the test fairer I haven’t added a reserve with the total size of the concatenated string, but in practice you would probably want to add this before the loop:

const std::size_t length = std::accumulate(begin(words), end(words), std::size_t{0}, [](std::size_t acc, std::string const& word){ return acc + word.length(); });
result.reserve(length);
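
Putting the loop and the reserve together, a self-contained version might look like this sketch. The function name is made up, and the total length is computed with a plain loop here rather than with std::accumulate, purely as a matter of taste:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative helper: concatenates all the words into a single string.
std::string concatenateWithLoop(std::vector<std::string> const& words)
{
    // First pass: compute the total length of the result.
    std::size_t totalLength = 0;
    for (std::string const& word : words)
    {
        totalLength += word.size();
    }

    // Second pass: append every word into the same string.
    std::string result;
    result.reserve(totalLength);
    for (std::string const& word : words)
    {
        result += word;
    }
    return result;
}
```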

So the algorithm version is more concise, but slower. STL algorithms generally lead to better code, but in this case I haven’t found what algorithm would be superior to the for loop on all criteria including performance. If you see how, please leave a comment.

Building a string from a file

Reading all the contents of a file into a string can be achieved the following way:

std::ostringstream fileContentsStream;
fileContentsStream << std::ifstream("MyFile.txt").rdbuf();
std::string fileContents = fileContentsStream.str();

fileContentsStream is an output stream made for building strings (see the following section). ifstream is an input stream that reads from a file and stores its contents into its internal buffer. This internal buffer can be accessed through the rdbuf method, and is read until exhaustion by the operator<< of the output stream.
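
The snippet above does not say what happens when the file can’t be opened. Here is a sketch of a small helper that makes this explicit. The name readFile and the choice of throwing an exception are assumptions made for the example, not the only way to handle the error:

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Illustrative helper: returns the whole contents of the file at `path`,
// or throws if the file cannot be opened.
std::string readFile(std::string const& path)
{
    std::ifstream file(path);
    if (!file)
    {
        throw std::runtime_error("Cannot open file: " + path);
    }

    // Same technique as above: stream the file's buffer into an ostringstream.
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}
```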

Throwing everything but the kitchen *string* at it

So far we’ve covered how to make strings out of other strings. But the need often comes up to push other things, like numbers or even custom types, into a string.

To just convert a numeric value into a string, use the to_string set of overloads:

int i = 42;
std::string s = std::to_string(i);

And it also works for floating point numbers.
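
Since to_string returns a std::string, its result can be combined with the operator+ seen earlier. A minimal sketch, with a made-up function name:

```cpp
#include <string>

// Illustrative helper: builds a message out of string literals and a number.
std::string tomatoMessage(int numberOfTomatoes)
{
    return "Buy " + std::to_string(numberOfTomatoes) + " tomatoes.";
}
```

This is fine for one or two values. When more values or more types are involved, the streams of the next section tend to scale better.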

Note that this method can’t be directly overloaded with custom types, because it lives in the std namespace, and we as C++ developers (and not library implementers) are not allowed to add anything to the std namespace.

There are ways to end up using to_string for your types though, but they require some work, explained in a dedicated article.

std::ostringstream

Now let’s get to the main component that lets you push objects of various types into a string: std::ostringstream.

An ostringstream is an output stream, that is to say it offers an operator<< through which you can send it data. And when its str() method is called, the ostringstream produces the concatenation of all the data it was sent.

What makes it really powerful is that operator<< has overloads on various types. The standard offers overloads on native types, like those used in the following code:

int numberOfTomatoes = 4;
int numberOfLeeks = 2;

std::ostringstream groceryList;
groceryList << "Buy " << numberOfTomatoes << " tomatoes and "
            << numberOfLeeks << " leeks.";

std::cout << groceryList.str() << '\n';

This code outputs:

Buy 4 tomatoes and 2 leeks.

Note that I recommend that you DON’T name your ostringstreams “oss”, because it doesn’t carry any information about what they represent. Naming is an important topic that is crucial for keeping code expressive, so it’s worth making the extra effort to figure out what variables represent.

ostringstream can also be used on custom types, if they overload operator<<:

class Point
{
public:
    Point(int x, int y) : x_(x), y_(y) {}
private:
    int x_;
    int y_;

    friend std::ostream& operator<<(std::ostream& os, Point const& point)
    {
        os << '{' << point.x_ << '-' << point.y_ << '}';
        return os;
    }
};

(in this case I do use os as a name for the stream because here there isn’t much to say about it, apart from the fact that it is an output stream).

Here operator<< is customized on std::ostream and not std::ostringstream but it works because the latter derives from the former, and this way we get an implementation for the other types of output streams (e.g. file output stream) for free.

It can be used like in this code:

Point point(3, 4);

std::ostringstream drawingInfo;
drawingInfo << "Draw at " << point << '.';

std::cout << drawingInfo.str() << '\n';

which outputs

Draw at {3-4}.

Note that the str method outputs a temporary std::string, that is destroyed at the end of the statement it is invoked in (unless it is bound to a const reference, see Herb Sutter’s Most important const). So you can’t hold a reference to something that belongs to this particular string:

const char* c = drawingInfo.str().c_str();
std::cout << c << '\n'; // undefined behaviour
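
A safe alternative is to store the result of str() in a named variable first, so that the string outlives the pointer. Here is a sketch of it (the helper name is made up, and std::strlen only stands in for any API that needs a const char*):

```cpp
#include <cstring>
#include <sstream>
#include <string>

// Illustrative helper: returns the length of the text held by the stream,
// measured through a C-style pointer.
std::size_t lengthViaCString(std::ostringstream const& stream)
{
    // Keep the string alive in a named variable...
    const std::string text = stream.str();

    // ...so that the pointer stays valid for as long as `text` is in scope.
    const char* c = text.c_str();
    return std::strlen(c);
}
```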

std::ostringstream and the STL

std::ostringstream can be handily connected to an output iterator especially designed for pushing into output streams: std::ostream_iterator, which can itself be used in STL algorithms. This is a very symmetric construction to the first one in How to split a string in C++. The following code:

std::vector<int> numbers = {1, 2, 3, 4, 5};
std::ostringstream result;
std::copy(begin(numbers), end(numbers), std::ostream_iterator<int>(result));

creates a string that contains:

12345

std::ostream_iterator offers the possibility to add a delimiting string after each of the values sent to the ostringstream it is connected to:

std::vector<int> numbers = {1, 2, 3, 4, 5};
std::ostringstream result;
std::copy(begin(numbers), end(numbers), std::ostream_iterator<int>(result, ", "));

which creates a string that contains:

1, 2, 3, 4, 5,

Granted, there is a trailing delimiter at the end, but this overload can be very handy to quickly send space delimited values to a human readable display, at least for debugging purposes.

This is an example using std::copy which is extremely simple, but this technique works just as well with all the other algorithms in the STL.
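
When the trailing delimiter is a problem, one simple way around it is a hand-written loop that puts the delimiter before every element except the first. The following is only a sketch: join is a made-up name, not a standard function.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Illustrative helper: sends the numbers to an ostringstream, with the
// delimiter between them and none at the end.
std::string join(std::vector<int> const& numbers, std::string const& delimiter)
{
    std::ostringstream result;
    bool first = true;
    for (int number : numbers)
    {
        if (!first)
        {
            result << delimiter; // only between two elements
        }
        result << number;
        first = false;
    }
    return result.str();
}
```

Boost Karma, covered at the end of this post, offers a more declarative way to get the same output.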

Formatting

Streams are vast. It’s a seldom explored region of the standard library, but it resembles a little world populated by objects, functions and other tags and sentries that interact together. I certainly don’t claim to know it in depth, but I’ve fished out a bunch of its inhabitants for you, that let you do formatting.

These objects can be pushed into an output stream (and in particular into an std::ostringstream) with operator<<. While these operations don’t output characters by themselves, they indicate to the stream how you want the actual characters to be formatted.

std::setw can be used to indicate the amount of space that a piece of data should occupy in the string. If this data is smaller, then the rest of the space is padded. The padding is done after the data when using std::left, and before the data when using std::right:

std::ostringstream table;
table << std::setw(10) << std::left << "First" << '|' << std::setw(10) << std::right << 250 << '\n'
      << std::setw(10) << std::left << "Second" << '|' << std::setw(10) << std::right << 3 << '\n'
      << std::setw(10) << std::left << "Third" << '|' << std::setw(10) << std::right << 40286 << '\n';

leads to a string that contains:

First     |       250
Second    |         3
Third     |     40286

It’s a bit of a mouthful of code to not say that much, but we will take care of making it more expressive at a later time (spoiler alert: I’ll ask you to participate).

By default the padding is done with whitespace, but this can be changed with the std::setfill manipulator. For instance the following code:

std::ostringstream table;
table << std::setfill('_')
      << std::setw(10) << std::left << "First" << std::setw(10) << std::right << 250 << '\n'
      << std::setw(10) << std::left << "Second" << std::setw(10) << std::right << 3 << '\n'
      << std::setw(10) << std::left << "Third" << std::setw(10) << std::right << 40286 << '\n';

produces this string:

First____________250
Second_____________3
Third__________40286

Note that while std::setw only affects the next piece of data coming into the stream (which does not include std::left and such), all the others we’ve seen here maintain their effect until encountering a counter-order further down the stream.
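
To make this difference concrete, here is a small sketch (the function name is made up) where the fill character is set once and the width is set twice:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Illustrative function: shows that std::setfill sticks to the stream
// while std::setw only applies to the next value.
std::string widthAndFillDemo()
{
    std::ostringstream demo;
    demo << std::setfill('*')   // stays in effect for the rest of the stream
         << std::setw(5) << 1   // padded to 5 characters
         << 2                   // not padded: the width was reset after 1
         << std::setw(5) << 3;  // padded again, still with '*'
    return demo.str();          // "****12****3"
}
```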

Finally, std::setprecision sets the maximum number of significant digits used to display a number. Used in conjunction with std::fixed, it instead sets an exact number of decimals (and no longer a number of digits):

std::ostringstream pi1;
pi1 << std::setprecision(3) << 3.14159;
// 3.14

std::ostringstream pi2;
pi2 << std::setprecision(15) << 3.14159;
// 3.14159

std::ostringstream pi3;
pi3 << std::fixed << std::setprecision(3) << 3.14159;
// 3.142

std::ostringstream pi4;
pi4 << std::fixed << std::setprecision(15) << 3.14159;
// 3.141590000000000

To save you some time searching for the right headers, note that those formatting components are included in two headers:

  • It is in <ios> that you will find:
    • std::left
    • std::right
    • std::fixed
  • And in <iomanip> that you will find:
    • std::setw
    • std::setprecision
    • std::setfill

Thanks to Patrice Roy for pointing out this clarification.

Boost Format: Decoupling formatting from contents

Speaking about formatting, this is what Boost Format is made for.

Note: the following Boost libraries can make an intensive use of templates, which can result in slower compilation times.

The point here isn’t to duplicate the library’s official documentation, which is quite clear by the way, but rather to let you know what kind of things this library can let you do.

The approach of Boost Format is to separate the formatting instructions from the actual contents that are to be formatted. You start by first specifying what the whole string should look like, and then fill in the contents (potentially at a later time). This contrasts with std::ostringstream where formatting information alternates with the content to be formatted.

Boost Format takes a string that describes the format that the output should take, and uses operator% to feed in the contents to be formatted. It offers an operator<< that takes a standard output stream (like std::cout or an std::ostringstream) and pushes the formatted content into it.

Here is what a usage of Boost Format looks like:

std::ostringstream result;
result << boost::format("The result is %d.") % 42;

The string then produced will look like this:

The result is 42.

“Huh?” I can hear you wonder. “Isn’t this just like printf?!”.

Boost Format does have in common with printf that formatting is decoupled from filling in the contents, but the comparison pretty much stops there.

In fact, consider the story of the Titanic meeting the Iceberg in these 3 aspects:

  • you can get into real trouble if you were in the Titanic,
  • the iceberg is much stronger,
  • there is a lot more to the iceberg than meets the eye.

Here we have a very similar story between printf and Boost Format. I’ll let you figure out which one plays the role of the Titanic.

The joy of using printf

The advantages of Boost Format over printf include:

  • More safety: while printf can silently cause memory corruption if the contents to be formatted do not correspond to the formatting string, Boost Format will throw exceptions.

  • More formatting features: the formatting possibilities of Boost Format are much richer. For example, amongst many other things, they include the reordering of the contents passed:
    std::ostringstream result;
    result << boost::format("%1% %2% %1%") % '-' % "___";
    // - ___ -

  • More flexibility: you can even pass in your own types as long as they have an operator<<. By using the Point class from above:
    std::ostringstream result;
    result << boost::format("Draw at %1%.") % Point(3,4);
    // Draw at {3-4}.

To start using Boost Format, simply #include <boost/format.hpp>, and off you go.

Boost Karma, there we are

This is the final step of our voyage through string building in C++.

Boost Karma, which is a part of the larger Boost Spirit library, provides more powerful features than the other components seen above, and comes with an arguably less direct syntax. Once again, the purpose here is not to replicate the well-done official documentation, but rather to give you an overview of its concepts.

Essentially, Boost Karma revolves around two types of components: generators and generating functions.

Generating functions are provided by the library. There are not too many of them. They take an input, a generator and an output, and format the input with the generator in order to put the result in the output.

And the library provides basic generators that can be combined into arbitrarily elaborate constructions.

Here is a very simple usage of Boost Karma:

using boost::spirit::karma::int_;
using boost::spirit::karma::generate;

std::string result;

generate(
    std::back_inserter(result), // the output
    int_,                       // the generator
    42                          // the input
);

(Karma’s symbols live in the namespace boost::spirit::karma so I won’t repeat the using directives in the next code examples.)

At the end of this code, result contains the string “42”.

But the generators can be combined into more complex structures, and some generating functions accept a collection of parameters.

Here is how to display the contents of a vector, separated by commas and without a trailing comma at the end:

std::vector<int> numbers = {5, 3, 2};
std::string result;

generate(
    std::back_inserter(result), // the output
    int_ << *(", " << int_),    // the generator
    numbers                     // the input
);

The interesting bit here is the generator. It can be interpreted this way:

  • int_: print the first element (if there is one) with the format of an integer,
  • <<: “and then”: combine with another generator that will take care of the rest of the elements,
  • *: repeat the following as many times as possible. It looks like the * in regex except the C++ syntax forces this to be at the beginning since this is implemented as an overload of unary operator*,
  • ", ": print this string,
  • <<: “and then”,
  • int_: print the next element as an integer.

With the above code, result contains the string “5, 3, 2”.

As a final example, generators can implement logic that depends on the elements passed to the generating function. This example is taken directly from the official documentation. It aims at formatting a complex number with the following logic:

  • if the imaginary part is zero, just print the real part,
  • if not, print the number between brackets, with the real part and the imaginary part separated by a comma.

std::complex<double> c(3, -1);
std::string result;

generate(
    std::back_inserter(result),      // the output
    !double_(0.0) << '(' << double_ << ',' << double_ << ')' //
    |                                                        // the generator
    omit[double_] << double_,                                //
    c.imag(), c.real(), c.imag()     // the input
);

First have a look at the input:

c.imag(), c.real(), c.imag()

This generating function takes a variadic pack of parameters.

Now let’s see what this generator does in detail:

  • !double_(0.0): if the first input parameter (c.imag()) is equal to zero, this part of the generator “fails”. This means that the rest of the generator until the next part (starting after the pipe (|) character) is ignored. A new trial will be done with the next part of the generator,
  • << '(' << double_ << ',' << double_ << ')': and then print the complex number in the expected format, with the second (c.real()) and third (c.imag()) arguments of the input,
  • |: if the previous generator succeeded then ignore the following, otherwise try the following,
  • omit[double_]: disregards the first input argument (c.imag()),
  • << double_: and then print the second input argument (c.real()).

With the above code, result contains (3.0,-1.0).

Let’s go out and build strings now

Now your C++ toolbox is bursting with tools to build strings. You can pick the ones that best fit each of your needs.

Of course, simpler is always better, so the tools at the top of the page are used quite extensively, and those deeper down the page bring the power and complexity that are needed in rarer contexts. Hopefully. But it is still interesting to see various designs to generate arbitrarily complex strings!

I hope this has been helpful to you and, as always, your feedback is more than welcome. In particular if you see something you think should be included in this guide – do let me know!
