Jonathan Boccara's blog

How to Convert a String to an int in C++

Published July 24, 2018

Today’s guest post is written by guest author jft. In this article, he presents a thorough comparison of the various ways C++ offers to extract numbers from a string. You will see how they differ from each other in terms of features as well as in terms of ease of use in code.
Interested in writing on Fluent C++ too? Check out the guest posting area.

Since the beginnings of computer programming, there has been a need to convert characters representing numbers into actual binary numbers that the computer understands.

Once computer input moved from data entered via front-panel toggle switches (ah, the fun days…) to input from human-accessible devices like tele-types, entering say 12 meant the separate characters 1 and 2 – and not the number 12. So code was needed to perform this conversion. I can well remember writing such a conversion routine as one of my first assembler programs back in the 1970’s.

This two-article mini-series looks at the existing options available for the C++ programmer, details the new C++17 option (which is supposed to address the perceived inadequacies of the present methods, and with enhanced performance) and discusses performance issues.

In this article we will explore the available options, and in the next one we will compare their performance.

In order to compare and contrast these (how they are used and their performance), we will dissect their usages with the example of obtaining a vector of unsigned integers from a string, with the numbers within the string separated by multiple spaces. Also, we will only discuss ASCII integer characters, not Unicode (or wide characters or variations of) and not floating point (although corresponding alternatives for floating point will be mentioned).

The code discussed can be found here. This first builds a string containing 10,000,000 (or the number specified by the const MaxNum – 10,000,000 is the maximum for coliru because of execution time limits) consecutive positive integers which are then timed using different approaches as they are converted into vectors of unsigned integers (note that no error checking is performed as it is known that all the chars to be converted are of the correct format). The timings for these various approaches are then displayed. But we’ll focus more on performance on various platforms in the next article.
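To make the shape of the input concrete, here is a minimal sketch of how such a test string could be built. The helper name make_test_string, the start value of 1 and the three-space separator are assumptions made for illustration; this is not the code from the linked test program.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds "1   2   3 ..." up to count,
// with the numbers separated by a run of spaces.
std::string make_test_string(std::size_t count, std::size_t spaces = 3)
{
    std::string nums;

    for (std::size_t i = 1; i <= count; ++i) {
        if (i > 1)
            nums.append(spaces, ' ');   // Separator between consecutive numbers

        nums += std::to_string(i);
    }

    return nums;
}
```

Calling make_test_string(3) would give a string holding the characters 1, 2 and 3 with three spaces between each pair.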

The function parameters for these various conversion routines are: const std::string& nums, std::vector<size_t>& nos.

where nums is the string of character numbers to convert (as described above) and nos is the vector of type size_t that contains the converted unsigned numbers. However, for several of these routines an input type of std::string_view instead of const std::string& could be used where possible. In this case the function parameters are: std::string_view nums, std::vector<size_t>& nos.

Note that in this case, nums is passed by value, and not by const reference, as is usual with std::string_view.

Genesis

In the beginning was C, with its run-time library (C Run-Time Library or CRT [Note not CRL!]). As C++ was derived from C, the functionality of the CRT is available within C++. Two of these library functions – atol() and strtoul() – can be used from within C++ code to perform numeric conversions. So let’s look at these first.

atol()

[and its associates atoi(), atoll() and atof()].

This was the original C conversion function. Its usage is very simple:

long atol(const char *str);

It takes one parameter (a pointer to the characters to be converted, which can be preceded by white-space chars) and returns the converted value up to the first non-digit character (which can be the terminating null char). What could be simpler? So let’s look at its usage in the context of the test program:

size_t as_atol(const std::string& nums, std::vector<size_t>& nos)
//or alternatively:
//size_t as_atol(std::string_view nums, std::vector<size_t>& nos)
{
    // Pointer to data end excluding trailing spaces
    const auto end = nums.data() + nums.find_last_not_of(' ') + 1; 

    for (auto d = nums.data(); d < end; ) {
        for (; (d < end) && (*d == ' '); ++d); // Skip leading spaces
        nos.push_back(atol(d));
        for (; (d < end) && isdigit(static_cast<unsigned char>(*d)); ++d); // Skip the numeric characters (cast avoids passing a negative char to isdigit)
    }

    return nos.size();
}

The first point to note (although not demonstrated here) is that there is no way to know if the conversion has been successful! If no conversion can be performed (such as trying to convert “qwerty”), then 0 is returned – which is the same as if the char 0 had been converted successfully. Underflow/overflow cannot be relied upon either: some implementations (such as Microsoft’s CRT) return LONG_MAX/LONG_MIN and set errno [the global CRT error variable] to ERANGE, but the C standard leaves the behaviour undefined when the result cannot be represented.

The second point is that there is no way to tell at which point in the given string the conversion terminates. Conversion of “  123qwe” and “123” both return a value of 123. Hence in the above code, the converted chars have to be skipped over again (they have already been read once by atol()) before atol() is called again for the next conversion. That is the purpose of the second inner for loop. The first one simply skips to the first non-space char because although atol() would skip past these spaces, the code would still need to skip these so that the digits can be skipped. By putting the first loop before atol(), any initial spaces are only skipped over once for performance.
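These two points are easy to see in isolation. The following is a small sketch, with show_atol being a hypothetical helper introduced just for this illustration, that prints what atol() makes of a piece of text:

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical helper: converts text with atol(), prints the result and returns it.
long show_atol(const char* text)
{
    const long value = std::atol(text);

    std::cout << '"' << text << "\" -> " << value << '\n';
    return value;
}
```

With this, show_atol("qwerty") and show_atol("0") both yield 0, and show_atol("  123qwe") and show_atol("123") both yield 123, so the caller cannot tell a failed conversion from a genuine zero, nor where the conversion stopped.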

We also need to determine when there are no more conversions to be performed. Hence we need to find the end of the data to be converted and terminate the conversion when this point is exceeded.

atol() can be useful when a simple single conversion is required and no error checking is needed. Of course, it doesn’t accept std::string or std::string_view directly. However, its limitations should rule it out where multiple conversions are required or where 0 may be a valid converted value.

strtoul()

[and its associates strtof(), strtod(), strtold(), strtol(), strtoll() and strtoull()].

This usage is a bit more complicated than atol() as it’s defined as

unsigned long strtoul(const char *nptr, char **endptr, int base);

It takes three parameters. The first is a pointer to the characters to be converted – which can start with white-space chars. The second is an optional pointer to a char* variable (pass nullptr if it is not required) that will be set to the address of the first character not converted. And the third is the base for the conversion (note that this doesn’t default to 10 and has to be specified!).

It then returns the converted value up to the first non-digit character (which can be the terminating null char). So let’s look at the test example:

size_t as_strtoul(const std::string& nums, std::vector<size_t>& nos)
//or alternatively:
//size_t as_strtoul(std::string_view nums, std::vector<size_t>& nos)
{
    const char *str = nullptr; // Start pointer – gets set to last in the loop
    auto last = nums.data(); // Points to last character not converted

    do
        if (const auto n = strtoul((str = last), const_cast<char**>(&last), 10); last != str)
            nos.push_back(n);

    while (last != str);

    return nos.size();
}

This is simpler, more fluent code than the atol() example. It is also more efficient as it determines the next conversion starting point from the result of the previous conversion – thus eliminating the inner for loops that were needed with atol().

However, strtoul() still returns 0 if no conversion has been performed – although in this case nptr and *endptr (if used) will have the same value, so it is possible to determine whether a conversion has been performed and the position of the terminating char. Overflow can also be detected: the return value is ULONG_MAX and errno is set to ERANGE. So strtoul() corrects the two glaring issues with atol(). However, like atol(), it doesn’t accept std::string and std::string_view directly. For many, this is the ‘go to’ function when a conversion is required.
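As the test example performs no error checking, here is a minimal sketch of how the checks described above might be wrapped up. The names UlStatus and parse_ulong are hypothetical and only serve this illustration.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical status type for this sketch
enum class UlStatus { ok, no_digits, out_of_range };

// Hypothetical wrapper: converts text in base 10, sets rest to the first
// character not converted and only writes value on success.
UlStatus parse_ulong(const char* text, unsigned long& value, const char*& rest)
{
    char* end = nullptr;

    errno = 0;      // strtoul() sets errno on overflow but never clears it
    const unsigned long n = std::strtoul(text, &end, 10);
    rest = end;

    if (end == text)            // Nothing converted: the returned 0 is not a real value
        return UlStatus::no_digits;

    if (errno == ERANGE)        // Value did not fit in unsigned long
        return UlStatus::out_of_range;

    value = n;
    return UlStatus::ok;
}
```

For “  123qwe” this would report ok with a value of 123 and rest pointing at the q, whereas “qwerty” would report no_digits.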

The New Testament

And so it came to pass that C++ was created and we saw that it was good. So what did the original C++ bring to the conversion table?

There were the new std::string (but no conversions) and std::istringstream class with stream extraction (>>) which enabled numbers to be easily extracted from a string stream with the specified type.

The test example using this method gives:

size_t as_stream(const std::string& nums, std::vector<size_t>& nos)
{
    for (auto [iss, n] = std::pair(std::istringstream(nums), 0U); iss >> n; nos.push_back(n));

    return nos.size();
}

Although stream extraction can determine if an error occurred and the character at which this happened, these are not easy to do (and are not demonstrated in the example code). The state of the stream has to be determined and reset if further extractions are required and the ‘bad’ characters have to be skipped before the next extraction.

However, unlike strtoul(), there is no dedicated indication that an overflow/underflow occurred: since C++11 an out-of-range value sets failbit (and stores the largest or smallest value of the type), just as any other failed extraction does. Also note that a string stream can only be constructed from a std::string object – not from a std::string_view object. But as this is a stream extraction, the usual input manipulators can be used (eg dec/hex/oct, ws etc).
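To give an idea of what checking and resetting the stream state involves, here is a minimal sketch that extracts every number it can and skips over anything else. The function name extract_skipping_bad is an assumption for this illustration, and “skip the rest of the offending token” is just one possible recovery policy.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts all unsigned numbers from text,
// skipping any white-space delimited token that is not a number.
std::vector<unsigned> extract_skipping_bad(const std::string& text)
{
    std::istringstream iss(text);
    std::vector<unsigned> numbers;

    for (;;) {
        unsigned n = 0;

        if (iss >> n) {         // Extraction succeeded
            numbers.push_back(n);
            continue;
        }

        if (iss.eof())          // Failed because there is nothing left to read
            break;

        iss.clear();            // Reset failbit so the stream can be used again

        std::string bad;
        iss >> bad;             // Skip the offending characters up to the next white-space
    }

    return numbers;
}
```

Note that an out-of-range number also sets failbit but with its digits already consumed, so this simple loop would then discard the token that follows it; a more careful version would need to treat that case separately.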

As for performance? – wait until the next instalment to determine how efficient this is.

C++11 and stoul()

C++11 brought stoul() [and its variations stoi(), stol(), stoll(), stoull(), stof(), stod(), stold()], which is defined as:

unsigned long stoul(const std::string& str, size_t* idx = 0, int base = 10);

Which in many ways looks like strtoul() with an important difference – you can’t specify the starting position in the string!

stoul() takes three parameters. The first is a const reference to the string object which contains the characters to be converted – and like strtoul(), preceding white-space chars are ignored. The second is an optional (if not specified then 0 [for nullptr] is used) pointer to the address of the variable that will be set to indicate the index of the first character not converted – ie the number of converted characters. The third is the base, which does default to 10 if not specified.

It returns the converted value up to the first non-digit character or the end of the string.
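As a quick illustration of the idx parameter, the following sketch (stoul_with_index is a hypothetical helper, not part of the test program) returns both the converted value and the index that stoul() reports:

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: returns {converted value, index of first character not converted}
std::pair<unsigned long, std::size_t> stoul_with_index(const std::string& text)
{
    std::size_t idx = 0;
    const unsigned long value = std::stoul(text, &idx);   // Base defaults to 10

    return {value, idx};
}
```

For “  42abc” the value is 42 and the index is 4, as the two leading spaces count towards the characters processed.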

So let’s look at the test example:

size_t as_stoul(const std::string& nums, std::vector<size_t>& nos)
{
    constexpr auto numdigs = std::numeric_limits<size_t>::digits10 + 1; // Maximum number of characters for type
    const auto endstr = nums.find_last_not_of(' ') + 1; // End of data excluding trailing spaces

    for (size_t last = 0, strt = 0, fnd = 0; strt < endstr; strt = fnd + last)
        nos.push_back(std::stoul(nums.substr(fnd = nums.find_first_not_of(' ', strt), numdigs), &last));

    return nos.size();
}

Remember that nums is a sequence of consecutive positive numbers separated by multiple spaces. But stoul() only converts from the start of the string (which can seem surprising, since idx could also have been an input parameter if specified).

So the first thing we have to do is extract the number to be converted from the string. But this is not actually as simple as it may sound. In this case a number may be preceded by an unknown number of white-space characters. Whilst stoul() itself ignores these, how do we know how many to extract for the .substr()?

We could, of course, extract all of them to the end of the string as stoul() stops extracting at the end of the digits. However, this would be very costly time-wise as .substr() creates a new string object and if this object is greater than the size of the internal stack-based buffer then dynamic memory allocation would occur – not to mention the overhead of the copying.

Fortunately, we don’t have to do this. std::numeric_limits provides various pieces of information about types and one of these is digits10 which gives the ‘Number of digits (in decimal base) that can be represented without change’ – which upon investigation is one less than the maximum number of characters in an unsigned integer (two less for a signed integer because of the possible leading sign). This is the number to which the variable numdigs is set.
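The relationship between digits10 and the longest possible number can be checked with a small sketch like this one, where max_chars and chars_in_max are hypothetical names used only for this illustration:

```cpp
#include <cstddef>
#include <limits>
#include <string>

// digits10 + 1: the value used for numdigs in the example above
template<typename T>
constexpr std::size_t max_chars()
{
    return std::numeric_limits<T>::digits10 + 1;
}

// Number of characters actually needed to write the largest value of T
template<typename T>
std::size_t chars_in_max()
{
    return std::to_string(std::numeric_limits<T>::max()).size();
}
```

For the unsigned integer types these two agree. For example, a 32-bit unsigned int has a digits10 of 9 while its maximum value 4294967295 takes 10 characters.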

So to create the .substr() for the conversion we simply find the first char that is not a space and extract the maximum number of chars possible for the data type. The start of the next extraction is simply computed by adding the position of the first digit found (fnd) to the index returned by stoul() (last). If this is less than the end position (endstr, the end of the data once trailing spaces have been discarded) then all’s well for the next extraction – otherwise the conversion is complete.

stoul() does detect errors. And in keeping with C++, these are reported using exceptions. So unless you absolutely know that the characters to be converted all represent valid numbers (such as here), then code using stoul() needs to take this into account. Two possible exceptions can be generated.

The first is std::invalid_argument which is thrown when no conversion is performed (ie the first non-white space char is not a digit). The other is std::out_of_range which is thrown when the value read is out of the range of representable values of the type (unsigned long in this case).

Consider as an example:

const std::string num = "   *89"s;
std::vector<size_t> nos;

try {
    as_stoul(num, nos);
}
catch (const std::invalid_argument& ia) {
    return std::cout << ia.what() << std::endl, 1;
}
catch (const std::out_of_range& oor) {
    return std::cout << oor.what() << std::endl, 2;
}

std::cout << "converted " << nos.size() << " numbers" << std::endl;

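Here the output would be, for example with Microsoft’s standard library (the exact text returned by what() varies between implementations):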

invalid stoul argument

This is because *89 cannot be converted: the initial non-white-space char is ‘*’, which is not a valid digit.
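Where exceptions are not wanted at the call site, one option is to contain them in a small wrapper. The following is only a sketch of that idea; try_stoul is a hypothetical name, and collapsing both exceptions into “no value” is a design choice that loses the reason for the failure.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: returns the converted value, or no value if
// stoul() reported an error via an exception.
std::optional<unsigned long> try_stoul(const std::string& text)
{
    try {
        return std::stoul(text);
    }
    catch (const std::invalid_argument&) {  // No conversion could be performed
        return std::nullopt;
    }
    catch (const std::out_of_range&) {      // Value does not fit in unsigned long
        return std::nullopt;
    }
}
```

So try_stoul("   *89") has no value, whereas try_stoul("  89") holds 89.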

Revelation

And behold there came C++17 that went forth to conquer. When the features of C++17 were being discussed, it was recognised that the existing conversion methods had some perceived flaws (see proposal P0067R5). The most serious of which was performance – especially for JSON/XML etc parsers which require high throughput.

Hence the new std::from_chars() conversion functions. This is defined (for integer types) as:

from_chars_result from_chars(const char* first, const char* last, T& value, int base = 10);

Where T can be any integer type (eg int, size_t etc). There are also overloaded conversion functions for float, double and long double for which the accepted format can be specified as scientific, fixed, hex or general (both scientific and fixed).

The first thing to really note here is that the return value is not the converted value – unlike the other conversion functions. The converted value is returned via the reference parameter value. Hence this variable needs to be defined first in the calling code.

The other parameters are as expected. first points to the location of the first character to be converted, last to one past the last character to be considered (ie [first, last) ) and base is the optional conversion base that defaults to 10.

The other interesting fact is that std::from_chars() does not ignore leading white-space chars. first is expected to point to the first digit of the characters to be converted. Hence if you are converting from chars that have leading white-space characters, the caller is responsible for skipping over these.

So what is this return type?

from_chars_result is a struct defined as:

struct from_chars_result
{
    const char * ptr;
    errc ec;
};

Where:

ptr is a pointer to the char that caused the conversion to stop or to last if all specified chars were converted. So in the event of a conversion not being performed, ptr would be set to first – as the conversion would fail on the first character.

ec is the error condition code of type std::errc (an enum class). If no error occurred (ie the conversion was successful) then this is set to std::errc{} (value initialization). If no conversion could be performed, then this is set to std::errc::invalid_argument and if an overflow occurred in the conversion then this is set to std::errc::result_out_of_range. Note that no exceptions are raised – so no try/catch blocks are required around its usage.
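To see these return values in action, here is a minimal sketch that converts to an int and reports what happened. ParseReport and parse_int are hypothetical names introduced for this illustration only.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Hypothetical result type for this sketch
struct ParseReport {
    int value = 0;              // Only meaningful when ec == std::errc{}
    std::size_t consumed = 0;   // Number of chars from first up to the returned ptr
    std::errc ec{};             // Error code reported by from_chars()
};

// Hypothetical helper: runs from_chars() over the whole of text
ParseReport parse_int(std::string_view text)
{
    ParseReport report;
    const char* const first = text.data();
    const char* const last = first + text.size();

    const auto [ptr, ec] = std::from_chars(first, last, report.value);

    report.consumed = static_cast<std::size_t>(ptr - first);
    report.ec = ec;
    return report;
}
```

For “123abc” this gives a value of 123 with 3 chars consumed and no error. Both “abc” and “ 12” give std::errc::invalid_argument (the leading space is not skipped), and “99999999999” gives std::errc::result_out_of_range where int is 32 bits.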

So let’s look at the test example:

size_t as_from_chars(const std::string& nums, std::vector<size_t>& nos)
//or alternatively:
//size_t as_from_chars(std::string_view nums, std::vector<size_t>& nos)
{
    // Pointer to end of characters to be converted excluding trailing spaces
    const auto end = nums.data() + nums.find_last_not_of(' ') + 1;
    const char* st = nullptr;	// Start pointer – set to last in the loop
    auto last = nums.data();	// Position of last character not converted
    size_t n;				// Converted number

    do {
        for (st = last; (st < end) && (*st == ' '); ++st);	// Ignore spaces
        if (last = std::from_chars(st, end, n).ptr; last != st)
            nos.push_back(n);

    } while (last != st);

    return nos.size();
}

First we find the end of the string ignoring trailing spaces. Then within the loop we have to ignore any leading spaces as std::from_chars() doesn’t do this – unlike the other methods. The actual conversion is then straightforward as we have the starting position and nothing is lost specifying the same end position each time as these are just pointers and no copying takes place. Once we have the returned pointer (last) equal to the start pointer (st) we know we either have an error (not in this case) or the end has been reached. Simples!

Whole string conversion

A common situation that arises is to convert characters that should represent just one number – possibly with either or both of leading/trailing spaces such as:

  • “   123 “
  • “34”
  • “   45”

[The “” are there just to show the spaces]

With

  • “12q”
  • “  23 q”

Being considered as errors – as they don’t consist of just a valid number. This conversion is again easy with from_chars() as shown below:

template<typename T = int>
auto getnum(std::string_view str)
{
    const auto fl = str.find_last_not_of(' ');	// Find end of data excluding trailing spaces

    if (fl == std::string_view::npos)	// If end of data not found, return no value
        return std::optional<T> {};

    const auto end = str.data() + fl + 1;	// End of data to be converted
    T num;

    return (std::from_chars(str.data() + str.find_first_not_of(' '), end, num).ptr == end) ? std::optional<T>{num} : std::optional<T> {};
}

First we find the real end of the string (ignoring any trailing spaces) and if there is then no data to convert, the code simply exits and returns no value for optional<T>. The start of the data ignoring leading spaces is then found (there must be a start, otherwise the code would already have exited) which is used as the start of the conversion using std::from_chars() and the returned ptr is compared to end.

If this is the same then a complete conversion has been performed and the converted number is returned as a value for optional<T>. If these are not the same then not all of the data has been converted – which means in this case an error has occurred and again returns no value for optional<T>.

And it could be used like this:

if (auto res = getnum<size_t>("2  "); res)
    std::cout << *res << std::endl;
else
    std::cout << "Bad number" << std::endl;

Here the required type of the returned number is specified as a template parameter to getnum() – which defaults to int if not specified.

If the conversion was successful then the optional return has the converted value and if the conversion was unsuccessful then the optional return doesn’t have a value. Note that getnum() doesn’t check for underflow/overflow.
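If out-of-range values also need to be rejected, the ec member can be tested as well. The following variant is a sketch of that idea; the name getnum_checked is hypothetical, and it otherwise follows the same approach as getnum() above.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical variant of getnum() that also treats overflow/underflow as an error
template<typename T = int>
std::optional<T> getnum_checked(std::string_view str)
{
    const auto fl = str.find_last_not_of(' ');  // Last char that is not a trailing space

    if (fl == std::string_view::npos)           // Empty or all spaces
        return std::nullopt;

    const auto begin = str.data() + str.find_first_not_of(' ');
    const auto end = str.data() + fl + 1;
    T num {};

    const auto [ptr, ec] = std::from_chars(begin, end, num);

    // Fail if from_chars() reported an error or stopped before the end of the data
    if (ec != std::errc {} || ptr != end)
        return std::nullopt;

    return num;
}
```

With this, getnum_checked<unsigned short>("70000") returns no value where unsigned short is 16 bits, while “  123 ” still converts to 123.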

Summary of features

This table summarises the facilities of the considered conversion methods:

| Feature | atol() | strtoul() | stoul() | >> | from_chars() |
|---|---|---|---|---|---|
| Specify starting position | Yes | Yes | No | Use seekg() | Yes |
| Error detection | No | Yes | Yes | Yes | Yes |
| Out of range detection | Yes (not guaranteed by the standard) | Yes | Yes | No | Yes |
| Specify base | No | Yes | Yes | Yes | Yes |
| Ignore leading white-space | Yes | Yes | Yes | Yes | No |
| Determine termination char | No | Yes | Yes | Possible | Yes |
| Accepts std::string | No * | No * | Yes | Yes (for std::istringstream) | No * |
| Accepts std::string_view | No ** | No ** | No | No | No ** |
| Auto-base detection *** | No | Yes (set base = 0) | Yes (set base = 0) | No | No |

* to pass std::string, use .c_str()

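** to pass std::string_view, use .data() (bearing in mind that atol() and strtoul() expect null-terminated characters, which std::string_view does not guarantee), but this cannot be used with stoul() and std::istringstream (and hence stream extraction >>)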
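As an illustration of the auto-base detection row, here is a short sketch of what passing a base of 0 does. The wrapper names strtoul_auto and stoul_auto are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Base 0 lets the function pick the base from the prefix:
// "0x"/"0X" means hexadecimal, a leading "0" means octal, otherwise decimal.
unsigned long strtoul_auto(const char* text)
{
    return std::strtoul(text, nullptr, 0);
}

unsigned long stoul_auto(const std::string& text)
{
    return std::stoul(text, nullptr, 0);
}
```

Both functions convert “0x1A” to 26, “012” to 10 (octal) and “12” to 12.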

To come…

And in the next thrilling instalment, we’ll reveal the possibly surprising performance results and discuss performance issues. Stay tuned!
