std::string size() vs. length()
I was reading the answers to this question and found that std::string actually has a method called length() (I have always used size()). Is there any specific reason for having this method in the std::string class? I read both MSDN and cppreference, and they seem to indicate that there is no difference between size() and length(). If that is so, doesn't it make the class more confusing for its users?
As per the documentation, these are just synonyms. size() is there to be consistent with other STL containers (like vector, map, etc.), and length() is there to be consistent with most people's intuitive notion of character strings. People usually talk about the length of a word, sentence or paragraph, not its size, so length() makes code more readable.
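A minimal sketch of the point above: size() lets a string be used in generic container code, while length() returns the same number under a different name. The helper names element_count, size_matches_length and demo_size_and_length are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Generic code can rely on size(), which every standard container provides.
template <typename Container>
std::size_t element_count(const Container& c) {
    return c.size();
}

// size() and length() report the same value for any std::string.
bool size_matches_length(const std::string& s) {
    return s.size() == s.length();
}

void demo_size_and_length() {
    const std::string word = "hello";
    assert(word.length() == 5);
    assert(size_matches_length(word));
    assert(element_count(word) == 5);                       // string through size()
    assert(element_count(std::vector<int>{1, 2, 3}) == 3);  // vector through size()
}
```

Calling demo_size_and_length() from main exercises both spellings; which one to use in your own code is a matter of taste.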
std::string is a class that is part of the standard library, used to hold a string. It manages the allocation/deallocation of the dynamic memory for the string, and contains data members for the pointer to the memory, its size, and possibly others.
sizeof(std::string) returns the size of the std::string object, not that of the actual string. It depends on the implementation of the standard library and is not fixed by the C++ standard. On that particular implementation it is 24 bytes.
To get the length of the actual string, use str.length(), where str is the std::string object. The actual allocated memory size is always at least 1 byte more, because str.c_str() returns a pointer to the string with a terminating null byte ('\0') at the end of it. The class allocates and deallocates memory as needed when the string is changed, and may internally allocate more memory than is needed by the string's size.
Theoretically, the std::string class could also have been implemented so that it is smaller than 24 bytes (for example, if it contained just a pointer and an integer for the length). One reason it is larger is that the standard library implementation performs small string optimization: if the string is short enough to fit (about 22 characters for a 24-byte object, since the terminator and a flag also need room), it is stored inside the std::string object itself and no dynamic memory is allocated for it. So there also needs to be a flag inside the std::string that indicates which mode is in use.
Also, a string literal in C++ is not of type std::string; it is a raw C-string of type const char[N]. So sizeof("test") returns 5, because the literal is a const char[5] (4 bytes for the characters, plus one additional 0 byte at the end).
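The difference between sizeof and length() can be checked with a short sketch. The function name demo_sizeof_vs_length is illustrative; the actual value of sizeof(std::string) is left unchecked because it depends on the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

void demo_sizeof_vs_length() {
    std::string str = "test";
    // Size of the std::string object itself; implementation-dependent.
    const std::size_t object_size = sizeof(str);

    assert(str.length() == 4);               // characters stored
    assert(std::strlen(str.c_str()) == 4);   // c_str() is null-terminated
    assert(str.capacity() >= str.length());  // may allocate more than needed

    // The literal is a const char[5]: four characters plus the terminator.
    static_assert(sizeof("test") == 5, "literal includes the terminating 0 byte");

    // Growing the contents changes length(), never sizeof.
    str.append(1000, 'x');
    assert(str.length() == 1004);
    assert(sizeof(str) == object_size);
}
```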
I'm having some trouble figuring out the exact semantics of std::string.length(). The documentation explicitly points out that length() returns the number of characters in the string and not the number of bytes. I was wondering in which cases this actually makes a difference.
In particular, is this only relevant to non-char instantiations of std::basic_string or can I also get into trouble when storing UTF-8 strings with multi-byte characters? Does the standard allow for length() to be UTF8-aware?
But std::string is about char characters; there is no such thing as a multi-byte character as far as std::string is concerned, whether you crammed one in at a high level or not. So, std::string.length() is always the number of bytes represented by the string. Note that if you're cramming multibyte "characters" into an std::string, then your definition of "character" suddenly becomes at odds with that of the container and of the standard.
A std::string is std::basic_string<char>, so s.length() * sizeof(char) = byte length. Also, std::string knows nothing of UTF-8, so you're going to get the byte size even if that's not really what you're after.
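To see the gap between bytes and user-visible characters, here is a small sketch that counts UTF-8 code points by hand. The helper utf8_code_points is hypothetical and assumes the string holds valid UTF-8; it is not part of the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes s contains valid UTF-8; no validation is performed.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

void demo_utf8_length() {
    // "café" with the é written as its two UTF-8 bytes, 0xC3 0xA9.
    const std::string cafe = "caf\xC3\xA9";
    assert(cafe.length() == 5);            // bytes (char elements)
    assert(utf8_code_points(cafe) == 4);   // code points
}
```

Code points are still not the same as what a reader perceives as one character (combining marks, for instance), so this only illustrates why length() is best read as "number of char elements".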
The crux is in how the length of the string is computed. In C you count characters until you reach the first null character. This is the same in C++ for types like const char*. For std::string the length of the string is tracked separately and is independent of the contained characters. You can have as many null characters as you like and they do not affect the string length! (Well, they do affect the string length in the sense that every '\0' adds 1 to the length, but it does not indicate the end of the string.)
Surprised? You might be, because a std::string is often initialized from a null-terminated character string and often its value is used as a null-terminated character string, when c_str is called; but nonetheless a std::string is not a null-terminated character string.
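A short sketch of the behaviour described above. The function names are illustrative; the string is built with the (pointer, count) constructor so that the embedded null characters are copied rather than treated as a terminator.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Builds a 7-character string that contains two embedded '\0' characters.
std::string make_with_embedded_nulls() {
    return std::string("ab\0cd\0e", 7);
}

void demo_embedded_nulls() {
    const std::string s = make_with_embedded_nulls();
    assert(s.length() == 7);               // every '\0' counts as a character
    assert(s[2] == '\0');
    assert(std::strlen(s.c_str()) == 2);   // C-style counting stops at the first '\0'

    // Initializing from a plain const char* uses C rules and stops at the first '\0'.
    const std::string truncated = "ab\0cd\0e";
    assert(truncated.length() == 2);
}
```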
Tip: You might see some C++ programs that use the size() function to get the length of a string. This is just an alias of length(). It is completely up to you whether you use length() or size().
The C++ std::string class has length() and size() member functions; either can be used to get the length of a string object. To get the length of a traditional C-style string, we can use the strlen() function from the <cstring> header. Two other approaches are straightforward: counting the characters with a while loop, or with a for loop.
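The approaches listed above can be sketched as follows. The function names length_with_while and length_with_for are illustrative; both expect a null-terminated C-style string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Counts characters with a while loop until the terminating '\0'.
std::size_t length_with_while(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// Same count, written as a for loop over a moving pointer.
std::size_t length_with_for(const char* s) {
    std::size_t n = 0;
    for (const char* p = s; *p != '\0'; ++p) {
        ++n;
    }
    return n;
}

void demo_length_approaches() {
    const char* text = "hello";
    const std::string str = text;
    assert(str.length() == 5);
    assert(str.size() == 5);
    assert(std::strlen(text) == 5);
    assert(length_with_while(text) == 5);
    assert(length_with_for(text) == 5);
}
```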
Another approach is to set the size of the string using resize() and to initialize the data character by character. QString uses 0-based indexes, just like C++ arrays. To access the character at a particular index position, you can use operator[](). On non-const strings, operator[]() returns a reference to a character that can be used on the left side of an assignment. For example, after str.resize(4) you can write str[0] = QChar('U').
The sole purpose of capacity() is to provide a means of fine tuning QString's memory usage. In general, you will rarely ever need to call this function. If you want to know how many characters are in the string, call size().
Warning: reserve() reserves memory but does not change the size of the string. Accessing data beyond the end of the string is undefined behavior. If you need to access memory beyond the current end of the string, use resize().
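The same distinction exists in std::string, so it can be sketched without Qt. This is an analogue using std::string rather than QString, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// reserve() only grows capacity; characters must still be appended.
std::string build_with_reserve(std::size_t n) {
    std::string s;
    s.reserve(n);
    assert(s.size() == 0);        // size is unchanged
    assert(s.capacity() >= n);    // but room for n characters is available
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');         // valid: grows the size one character at a time
    }
    return s;
}

// resize() changes the size, so indexes 0..n-1 may be assigned directly.
std::string build_with_resize(std::size_t n) {
    std::string s;
    s.resize(n);                  // new characters are value-initialized ('\0')
    for (std::size_t i = 0; i < n; ++i) {
        s[i] = 'x';
    }
    return s;
}
```

Writing s[i] after only reserve() would touch memory beyond the end of the string, which is exactly the undefined behavior the warning describes.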
In this example, when we initialized the vector v1, it contained 2 elements. Thus its length was 2, with v1[0] = 5 and v1[1] = 6. v1.resize(5) resized the vector so that it contains 5 elements. Since we did not assign any value to the rest of the elements of the vector, they were assigned the value 0. So, v1[2] = 0, v1[3] = 0 and v1[4] = 0.
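The example being described can be sketched as follows, assuming v1 is a std::vector<int> initialized with 5 and 6. The function name resized_example is illustrative.

```cpp
#include <cassert>
#include <vector>

// Starts with two elements, then grows to five; new elements are value-initialized to 0.
std::vector<int> resized_example() {
    std::vector<int> v1{5, 6};
    assert(v1.size() == 2);
    v1.resize(5);
    return v1;
}
```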
The std::string_view, from the C++17 standard, is a read-only non-owning reference to a char sequence. The motivation behind std::string_view is that it is quite common for functions to require a read-only reference to an std::string-like object where the exact type of the object does not matter. The drawback of using const std::string& in those situations is that it requires creating an std::string object. Here is a simple case in point:
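A minimal sketch of such a case, using a hypothetical count_spaces function (not taken from the original): passing a string literal to a const std::string& parameter compiles, but a temporary std::string has to be constructed first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Read-only access, but the parameter type forces callers to supply a std::string.
std::size_t count_spaces(const std::string& s) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), ' '));
}

std::size_t demo_literal_argument() {
    // "hello world" is a const char[12]; a temporary std::string is built
    // from it just so that the reference has something to bind to.
    return count_spaces("hello world");
}
```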
Constructing an std::string object could be expensive because it usually (but not always) requires dynamic memory allocation. Where the cost of constructing an std::string object is a concern, the readability and ease of usage are frequently compromised by using const char* and length parameters:
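A sketch of that pointer-plus-length style, again with a hypothetical count_spaces function: no std::string is constructed, but every caller has to pass two arguments and keep them consistent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// No allocation, but the length must be supplied separately and correctly.
std::size_t count_spaces(const char* str, std::size_t len) {
    return static_cast<std::size_t>(std::count(str, str + len, ' '));
}

void demo_pointer_and_length() {
    const char literal[] = "hello world";
    assert(count_spaces(literal, sizeof(literal) - 1) == 1);  // exclude the terminator

    const std::string s = "a b c";
    assert(count_spaces(s.data(), s.size()) == 2);            // existing strings need unpacking
}
```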
What makes std::string_view better than const std::string& is that it eliminates the need to have an std::string object in the first place. Usually, an std::string_view is composed of two members: a const char* that points to the start of the char array, and the size. Our simple example with std::string_view:
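The same hypothetical count_spaces function, written with std::string_view as a sketch (requires C++17): literals, std::string objects and pointer/length pairs can all be passed without constructing a std::string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Taken by value: a string_view is typically just a pointer and a size.
std::size_t count_spaces(std::string_view sv) {
    return static_cast<std::size_t>(std::count(sv.begin(), sv.end(), ' '));
}

void demo_string_view_argument() {
    assert(count_spaces("hello world") == 1);          // from a literal, no std::string created

    const std::string s = "a b c";
    assert(count_spaces(s) == 2);                      // from an existing std::string

    const char* raw = "x y z and more";
    assert(count_spaces(std::string_view(raw, 5)) == 2);  // from a pointer and a length
}
```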
Assume a hypothetical trading system application that uses a large number of option contract OSI symbols (e.g., "AAPL  131101C00470000", where the root is padded with spaces to six characters). An OSI symbol is a 21-character identifier that encodes various attributes of an option contract. The application loads a delimited list of all the symbols from a file into a buffer. Then the symbols from the buffer are split and stored in an std::unordered_set of std::string objects:
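A minimal sketch of that loading step. The function name load_symbols and the '|' delimiter used in the demo are assumptions for illustration, not details from the original application.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Splits a delimited buffer and stores each non-empty symbol as its own std::string copy.
std::unordered_set<std::string> load_symbols(const std::string& buffer, char delimiter) {
    std::unordered_set<std::string> symbols;
    std::size_t start = 0;
    while (start <= buffer.size()) {
        std::size_t end = buffer.find(delimiter, start);
        if (end == std::string::npos) {
            end = buffer.size();
        }
        if (end > start) {
            symbols.emplace(buffer, start, end - start);  // copies the characters
        }
        start = end + 1;
    }
    return symbols;
}

void demo_load_symbols() {
    const std::string buffer =
        "AAPL  131101C00470000|AAPL  131101P00470000|AAPL  131101C00470000";
    const std::unordered_set<std::string> symbols = load_symbols(buffer, '|');
    assert(symbols.size() == 2);  // the duplicate is stored once
    assert(symbols.count("AAPL  131101P00470000") == 1);
}
```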
At various places in the application, the symbols are looked up in the symbols collection, copied when necessary, and stored in other STL containers when needed. But the symbols are never modified. It is costly in terms of performance and memory usage to have a vast number of std::string objects, especially when dynamic memory allocation is involved. To minimize dynamic allocation, a typical implementation of std::string is optimized to store a small string within itself in a char array; this is called short/small string optimization (SSO). However, the small-string size for the optimization is implementation-dependent and could very well be below 21 characters.
Changing the element type of the set from std::string to std::string_view is the only change we need for the above example code to work. However, it is likely that more changes would be required in a real-world application. For one, it is important that the buffer holding the symbol list lives for the lifetime of the application; otherwise, all the symbol std::string_views would be invalidated.
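A sketch of the std::string_view variant of the symbol set (requires C++17). The function name split_symbol_views and the '|' delimiter are illustrative assumptions; the views point into the caller's buffer, so the buffer must outlive the returned set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Splits the buffer into views; no characters are copied.
// The caller must keep the underlying buffer alive while the set is in use.
std::unordered_set<std::string_view> split_symbol_views(std::string_view buffer,
                                                        char delimiter) {
    std::unordered_set<std::string_view> symbols;
    std::size_t start = 0;
    while (start <= buffer.size()) {
        std::size_t end = buffer.find(delimiter, start);
        if (end == std::string_view::npos) {
            end = buffer.size();
        }
        if (end > start) {
            symbols.insert(buffer.substr(start, end - start));
        }
        start = end + 1;
    }
    return symbols;
}

void demo_symbol_views() {
    // The buffer is a named object that lives longer than the set of views.
    const std::string buffer =
        "AAPL  131101C00470000|AAPL  131101P00470000|AAPL  131101C00470000";
    const auto symbols = split_symbol_views(buffer, '|');
    assert(symbols.size() == 2);
    const auto it = symbols.find("AAPL  131101P00470000");
    assert(it != symbols.end());
    // The stored view refers to memory inside buffer, not to a copy.
    assert(it->data() >= buffer.data());
    assert(it->data() + it->size() <= buffer.data() + buffer.size());
}
```

Passing a temporary std::string as the buffer would leave every view dangling as soon as the call returns, which is the lifetime hazard mentioned above.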
Another change could come from the fact that std::string_view does not have a c_str() interface to return a null-terminated string. We would have to convert the std::string_view to std::string wherever a null-terminated string is required:
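A sketch of such a conversion (requires C++17), using std::strlen as a stand-in for any C API that expects a null-terminated string. The function name c_api_length is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Copies the view into an owning std::string so that c_str() is null-terminated.
std::size_t c_api_length(std::string_view sv) {
    const std::string owned(sv);           // explicit conversion; may allocate
    return std::strlen(owned.c_str());
}

void demo_view_to_c_string() {
    const std::string_view list = "AAPL,MSFT";
    const std::string_view first = list.substr(0, 4);  // views "AAPL"

    assert(first.size() == 4);
    assert(c_api_length(first) == 4);

    // first.data() is not terminated after "AAPL": reading it as a C string
    // runs on to the end of the underlying literal.
    assert(std::strlen(first.data()) == 9);
}
```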
std::string_view is an excellent utility for good performance and readability where only a std::string-like interface is required. But caution must be exercised to ensure that the std::string_view does not outlive the char sequence it refers to.
All standard containers provide a size() method that returns the number of elements they actually contain. Note that std::string also has a length() method, which does exactly the same thing as size() but has a perhaps more natural name.
sf::String defines the most important functions of the standard std::string class: removing, random access, iterating, appending, comparing, etc. However it is a simple class provided for convenience, and you may have to consider using a more optimized class if your program requires complex string handling. The automatic conversion functions will then take care of converting your string to sf::String whenever SFML requires it.
std::string has two modes: long string and short string. It uses a union to reuse the same bytes for both modes. Short string mode is an optimization which makes it possible to store up to 22 characters without heap allocation.
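One rough way to observe which mode a string is in is to check whether its character buffer lies inside the object itself. This is a heuristic sketch, not something the standard guarantees: is_stored_inline is a hypothetical helper, and the result for short strings depends on the standard library in use.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Heuristic: returns true when the character data lives inside the string object,
// which suggests short string mode on implementations that use SSO.
bool is_stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= object_begin && data < object_end;
}

void demo_inline_storage() {
    // A string far larger than the object itself cannot be stored inline.
    const std::string long_string(1000, 'x');
    assert(!is_stored_inline(long_string));

    // Whether a short string is inline is implementation-dependent,
    // so the result is only computed here, not asserted.
    const std::string short_string = "hi";
    (void)is_stored_inline(short_string);
}
```

On an implementation with SSO, std::string().capacity() usually reports how many characters fit in short string mode.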