Jonathan Boccara's blog

How to *Efficiently* Convert a String to an int in C++

Published July 27, 2018

Today’s guest post is written by jft, as a sequel to his previous article How to Convert a String to an int in C++. In this post, jft presents the performance analyses he conducted to find out which method is fastest at extracting numbers from a string.
Interested in writing on Fluent C++ too? Check out the guest posting area.

This is Part 2 in our series about conversion from characters to integers.

In Part 1 we looked at the different options available and in particular the new std::from_chars() conversion function available in C++17. We discussed their pros and cons and gave examples of their usage in the context of obtaining a vector of unsigned integers from a single string in which numbers were separated by multiple spaces.

In this Part 2, we will examine the performance of these various conversion methods and provide suggestions for performance improvements. The test code can be found in this coliru.

The results

The table below gives details of the performance results obtained, for extracting numbers from a single string in which they are separated by multiple spaces:

Method          10,000,000   10,000,000   50,000,000   50,000,000   50,000,000      50,000,000
                (coliru)     (Laptop1)    (Laptop1)    (Lenovo)     (Laptop1 x64)   (Laptop2)
atol()          616          546          2,994        4,202        3,311           4,068
strtoul()       459          454          2,421        2,560        2,660           2,852
from_chars()    244          136          745          884          1,027           972
>>              1,484        7,299        37,590       47,072       31,351          48,116
stoul()         1,029        798          4,115        4,636        6,328           5,210

Note that all timings are in milliseconds.

Laptop1 is Windows 7 64-bit, 16 GB memory, Intel i7 processor 2.6 GHz and a hard disk. Lenovo is Windows 10 64-bit, 8 GB memory, Intel i5 processor 2.6 GHz and a hard disk. Laptop2 is Windows 7 64-bit, 4 GB memory, Intel i5 processor 2.6 GHz and an SSD. For all except coliru, the compiler used is MS VS2017 15.7.4 with all optimizations enabled and optimized for speed (x86 unless specified).

The first thing that hit me when I initially saw these figures was how slow stream extraction is compared to the other methods – and the second was how fast the new std::from_chars() function is! For coliru it is twice as fast as the next fastest (strtoul()) and for the laptop/Lenovo about three times as fast (although for x64 the ratio is slightly less). So the new conversion std::from_chars() certainly fulfils its promise re performance and is also easy to use.
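To make the test case concrete, here is a minimal sketch of the kind of loop being timed: pulling every unsigned number out of a single space-separated string with std::from_chars(). The helper name extract_numbers() is hypothetical and this is not the as_from_chars() function from Part 1; it assumes the input holds only decimal digits and spaces.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper (not the article's as_from_chars()): appends every
// unsigned number found in `text` to `out` and returns how many were added.
// Assumes the input contains only decimal digits and spaces.
std::size_t extract_numbers(std::string_view text, std::vector<std::size_t>& out)
{
    const char* cur = text.data();
    const char* const end = cur + text.size();
    std::size_t added = 0;

    while (cur != end) {
        // from_chars() does not skip leading white space, so do it by hand.
        if (*cur == ' ') {
            ++cur;
            continue;
        }

        std::size_t n = 0;
        const auto [ptr, ec] = std::from_chars(cur, end, n);
        if (ec != std::errc{})
            break;              // not a number (or out of range): stop

        out.push_back(n);
        ++added;
        cur = ptr;              // continue right after the parsed digits
    }

    return added;
}
```

The pointer returned by std::from_chars() marks where parsing stopped, so the next conversion can start there without extracting a sub-string.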

A simpler use case: extracting the first number from a string

These timings were to extract numbers from a single string in which they were separated by multiple spaces. But what about just extracting the first number from a string? This wouldn’t then require the starting position of the next conversion to be set, or sub-strings to be extracted. So would other conversion methods such as stoul() start to show their true form? Would this show different timings with a different winner – or closer results? Time for another investigation.

The code for this is available on this coliru. This program creates a vector of strings containing 3,000,000 consecutive positive numbers (or the number specified by the const MaxNumV; coliru times out if the program takes too long to execute, so the maximum number used is constrained). These strings are then converted into vectors of unsigned integers using the different approaches, and each approach is timed. Note that no error checking is performed, as it is known that all the characters to be converted are of the correct format and only contain digits. The timings for these various approaches are then displayed.

Not all of the code is shown or discussed here, as it’s really a simpler version of the previous test code, but the vector test code for std::from_chars() is given below to show how easy this conversion function is to use:

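// Converts every string in vs to a number and appends it to nos.
// No error checking: every string is known to contain only digits.
size_t vec_as_from_chars(const vector<string>& vs, vector<size_t>& nos)
{
    size_t n = 0;

    for (const auto& s : vs) {
        // [s.data(), s.data() + s.size()) is the character range to convert
        from_chars(s.data(), s.data() + s.size(), n);
        nos.push_back(n);
    }

    return nos.size();
}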

Here vs is the vector of strings to be converted and nos is the vector of size_t that receives the converted numbers. For each entry in vs, s is the string to be converted, with s.data() giving its starting address and s.data() + s.size() giving its end address, as required by std::from_chars().
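The test code skips error checking because the input is known to be good. Where that cannot be assumed, the value returned by std::from_chars() can be inspected. The following is only a sketch: the helper name to_number_checked() and the decision to reject trailing characters are assumptions made for illustration, not part of the timed code.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: converts the whole of `s` to an unsigned number,
// returning std::nullopt if the conversion fails or leaves characters over.
std::optional<std::size_t> to_number_checked(const std::string& s)
{
    std::size_t n = 0;
    const char* const first = s.data();
    const char* const last = first + s.size();

    const auto res = std::from_chars(first, last, n);

    // res.ec is std::errc::invalid_argument if no digits were found and
    // std::errc::result_out_of_range if the value does not fit in n.
    // res.ptr points at the first character that was not consumed.
    if (res.ec != std::errc{} || res.ptr != last)
        return std::nullopt;

    return n;
}
```

These checks add a couple of comparisons per conversion; their cost was not measured in the tests described here.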

The timings are just as instructive as before, as we can see from the results in the table below:

Method          3,000,000    3,000,000    50,000,000   50,000,000   50,000,000      50,000,000
                (coliru)     (Laptop1)    (Laptop1)    (Lenovo)     (Laptop1 x64)   (Laptop2)
atol()          157          138          2,340        2,665        2,699           2,865
strtoul()       140          135          2,303        2,618        2,724           2,765
from_chars()    20           21           331          388          229             385
>>              1,824        3,399        58,387       75,585       48,496          74,104
stoul()         125          199          3,451        3,817        4,020           4,568

Note that all timings are in milliseconds.

Again, stream extraction is by far the slowest (although, to be fair, in this case every string first has to be converted into a stringstream). But note just how fast std::from_chars() is. It is approximately 7 times faster than the next fastest for x86 (strtoul() on the Windows machines, stoul() on coliru) and 12 times faster for x64 code! The relative speed improvement from std::from_chars() in this situation is even more marked than in the previous test. Wow!
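For readers who want to try a comparison of this kind themselves, a timing harness can be sketched along the following lines. This is not the test program linked above: the names time_ms(), make_inputs() and convert_all() are hypothetical, and the use of std::chrono::steady_clock is an assumption about how one might measure elapsed time.

```cpp
#include <charconv>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical harness: runs `fn` once and returns the elapsed time in
// milliseconds as measured by steady_clock.
template <typename Fn>
long long time_ms(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Builds `count` strings holding the consecutive numbers 1, 2, 3, ...
std::vector<std::string> make_inputs(std::size_t count)
{
    std::vector<std::string> vs;
    vs.reserve(count);
    for (std::size_t i = 1; i <= count; ++i)
        vs.push_back(std::to_string(i));
    return vs;
}

// Converts every string with from_chars() (no error checking, as in the article).
std::size_t convert_all(const std::vector<std::string>& vs, std::vector<std::size_t>& nos)
{
    for (const auto& s : vs) {
        std::size_t n = 0;
        std::from_chars(s.data(), s.data() + s.size(), n);
        nos.push_back(n);
    }
    return nos.size();
}
```

A call such as time_ms([&] { convert_all(vs, nos); }) then gives the elapsed milliseconds for one approach; the other conversion functions can be timed by swapping the body of the lambda. Absolute figures will differ from the tables here depending on compiler, optimization settings and hardware.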

C you all

You may have noticed that there’s one type of string that we haven’t discussed so far – the C null-terminated string. Which of course you yourselves would never use, would you – but which you might come across or have to deal with if you use command-line program arguments. So I extended the vector example from above so that the end parameter for std::from_chars() has to find the end of string. Consider:

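// As above, but treats each string as a null-terminated C string, so the
// end of the range has to be found with strlen() (declared in <cstring>).
size_t vec_as_from_chars_c(const vector<string>& vs, vector<size_t>& nos)
{
    size_t n = 0;

    for (const auto& s : vs) {
        from_chars(s.c_str(), s.c_str() + strlen(s.c_str()), n);
        nos.push_back(n);
    }

    return nos.size();
}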

Here strlen(s.c_str()) is used to obtain the number of characters in the C-style string, which is added to the base address to obtain the end address. Surely, in this case, the overhead of obtaining the end of the string would outweigh the performance advantage of std::from_chars()?

The table below gives details of the performance results obtained. I’ve only included the previous results for strtoul() and from_chars() for comparison purposes.

Method          3,000,000    3,000,000    50,000,000   50,000,000   50,000,000      50,000,000
                (coliru)     (Laptop1)    (Laptop1)    (Lenovo)     (Laptop1 x64)   (Laptop2)
strtoul()       140          135          2,303        2,618        2,724           2,765
from_chars()    20           21           331          388          229             385
from_chars_c()  27           38           642          807          640             756

Note that all timings are in milliseconds.

But no. Again this shows that std::from_chars() is still the fastest – even when the end position has first to be calculated!
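Applied to a command-line argument, the same idea might be sketched as follows. The helper name parse_c_string() is hypothetical, and unlike the timed code it checks the result, since program arguments cannot be assumed to be valid numbers.

```cpp
#include <charconv>
#include <cstddef>
#include <cstring>
#include <system_error>

// Hypothetical helper: converts a null-terminated string such as argv[1].
// Returns true and stores the value in `out` only if every character up to
// the terminator was consumed; otherwise `out` is left unchanged.
bool parse_c_string(const char* str, std::size_t& out)
{
    const char* const end = str + std::strlen(str);   // locate the terminator

    std::size_t n = 0;
    const auto res = std::from_chars(str, end, n);
    if (res.ec != std::errc{} || res.ptr != end)
        return false;

    out = n;
    return true;
}
```

In main() this could be called as parse_c_string(argv[1], value) once argc has been checked.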

The fastest way to read a collection of ints from a file

The results for stream extraction, although much slower than expected, are consistent with previous work on extracting data from text files, where stream extraction was again found to have the worst performance.

The best approach was found to be to read the whole file into memory (where possible), set a std::string_view object to represent this memory and then either create a vector of std::string_view for the required extractions or extract what is required directly from the std::string_view object. See test code here.

This simply creates a file consisting of numbers separated by spaces. The contents of this file are then read and processed in two different ways. The first is probably what would be classed as the “C++” way:

while (ifs >> n)
    nos.push_back(n);

This simply extracts each number from the input file stream and inserts it into the vector. The other method is to read the whole file into memory, set a std::string_view object to represent this memory and then call the as_from_chars() function discussed in Part 1.

Consider the code to read a file into memory (the file is already opened for the ifs object and the stream is assumed to be ‘good’, i.e. not in an error state):

ifs.seekg(0, ifs.end);

const auto fileSize = static_cast<size_t>(ifs.tellg());
const auto buffer = make_unique<char[]>(fileSize);
vector<size_t> nos;

ifs.seekg(0);
ifs.read(buffer.get(),fileSize);

return as_from_chars(string_view(buffer.get(), static_cast<size_t>(ifs.gcount())), nos);

Line 3 finds the size of the file in bytes by obtaining the position of the end-of-file. The code then allocates the required memory (using std::make_unique<>() for heap allocation, as the text buffer can be arbitrarily large) and reads all of the file into this memory, finally creating a std::string_view object to represent this.

Note that the value returned by .gcount() may be less than (but never more than) the value returned by .tellg(). The reason for this is that the file is opened in ‘text mode’ (as opposed to ‘binary mode’) so that \r\n is converted to \n etc.

The number of characters actually placed into the buffer may therefore be less than the number stored in the file, depending upon how many such conversions are undertaken. This means .gcount() can’t be compared to .tellg() to ensure that the read is successful, as .gcount() is likely to be less. Again, no error checking is performed as it is assumed that all numbers to be converted are ‘good’ (i.e. all non-space characters are digits).

I obtained the following timings:

Method                        350,000      350,000      50,000,000   50,000,000   50,000,000      50,000,000
                              (coliru)     (Laptop1)    (Laptop1)    (Lenovo)     (Laptop1 x64)   (Laptop2)
file stream extraction (>>)   49           287          39,904       49,853       34,310          52,567
file memory read              16           18           2,725        2,904        2,758           3,289

Note that all timings are in milliseconds.

This shows that file stream extraction for Windows using MS VS2017 is about 15 times slower than first reading the whole file into memory and then processing this using std::string_view and std::from_chars().
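As a variation on the file-reading code above (and not the version that was timed), the file can be opened in binary mode so that no \r\n translation takes place and the number of bytes read can be checked against the file size. The helper name read_whole_file() and the use of a std::string as the buffer are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: reads the whole of `path` into `contents`.
// Opening in binary mode means no \r\n translation takes place, so the byte
// count from tellg() and the count actually read should agree.
// Returns false if the file cannot be opened or read completely.
bool read_whole_file(const std::string& path, std::string& contents)
{
    std::ifstream ifs(path, std::ios::binary | std::ios::ate);  // start at the end
    if (!ifs)
        return false;

    const std::streamoff size = ifs.tellg();   // -1 on failure
    if (size < 0)
        return false;

    contents.resize(static_cast<std::size_t>(size));
    ifs.seekg(0);
    ifs.read(contents.data(), static_cast<std::streamsize>(size));

    return static_cast<std::size_t>(ifs.gcount()) == contents.size();
}
```

The resulting string converts implicitly to a std::string_view for parsing. In binary mode any \r characters stay in the buffer, so the parsing code has to treat them as separators or stop at them.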

If you are performing read operations on files, we see that the quickest method is to read the whole file (if possible) into memory and then treat this as a std::string_view object. If you need to extract numbers, then use std::from_chars() from this std::string_view object.
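When the individual items are needed rather than just the numbers, the in-memory text can be split into a vector of std::string_view without copying any characters. The sketch below assumes space-separated tokens; the helper name split_on_spaces() is hypothetical.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` on spaces, returning views into the
// original buffer (which must outlive the returned vector).
std::vector<std::string_view> split_on_spaces(std::string_view text)
{
    std::vector<std::string_view> tokens;
    std::size_t pos = 0;

    // Find the start of the next token, skipping any run of spaces.
    while ((pos = text.find_first_not_of(' ', pos)) != std::string_view::npos) {
        const std::size_t stop = text.find(' ', pos);   // npos if last token
        tokens.push_back(text.substr(pos, stop - pos)); // substr clamps the count
        if (stop == std::string_view::npos)
            break;
        pos = stop;
    }

    return tokens;
}
```

Each element is only a pointer and a length into the original buffer, so the buffer has to stay alive for as long as the vector is in use.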

The moral of the story

This has been a very constructive exercise in comparing the features and performance of the various methods available for string to number conversion. Although no timings have been done for floating point (not yet implemented for VS2017 at the time of this writing), there is no reason to suppose that the results for integers won’t be replicated.

To me, the moral of this story is quite simple. Unless otherwise required for some reason, always use std::from_chars() to perform character conversions!
