Null termination in strings

Null termination in strings - java

Yes, I did check other threads and I have come to a conclusion. I just want you to confirm it so that I don't have any misconceptions.
Java String objects are not null-terminated.
C++ std::string objects are also not null-terminated.
C strings, or C-style strings if you will (arrays of characters), are the only strings that are null-terminated.
Correct or Incorrect?

C-strings are 0-terminated strings. You aren't forced to use them in C, though.
Both C++ std::string and Java strings are counted strings, which means they store their length.
But since C++11, C++ std::strings are also followed by a 0, for better interoperability with APIs that expect 0-terminated strings. This makes them 0-terminated strings as long as (as is often the case) they don't contain any embedded 0.

All of those are in themselves correct, but to be pedantic: C-style strings are not unique to C. Such strings occur in other places too (most commonly in various forms of assembler code, and since C was originally designed to be "slightly above assembler", this is no surprise).
And in C++11, std::string is guaranteed to have a NUL terminator after the last actual string character, although it is valid to store NUL characters inside the string if you wish. This certainly holds when you call c_str(), and in the implementations I've looked at, the terminator is stored there on creation/update.

None of the statements are wrong, but each needs more clarification about the specifics of the language it refers to.

That is correct: the C++ std::string and the Java String both hold private fields indicating the length of the string, so a NUL terminator is not needed.
The std::string method c_str returns the string as a NUL-terminated char array, for use when a terminator is required, e.g. by C string functions such as strlen.

I don't know about the Java part, but in C++11 std::strings are NUL-terminated (besides storing the character count), i.e. &s[0] returns the same string as s.c_str() (which is NUL-terminated, like a raw C-style string).
See this answer for more details.

The question you need to be asking is why C strings have to be null-terminated.
The answer is that the string manipulation functions need to know the exact length of the string. Since strings in C are just arrays of characters, nothing records the size of the array, so something else has to mark where the string ends: the null character standing at the end of it.
In Java, on the other hand, strings are instances of the String class, which keeps track of its length, so there is no need for null termination.
The same applies to std::string in C++.
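The difference can be sketched in Java. cStyleLength below is a hypothetical helper (not a JDK method) that imitates C's strlen by scanning a char[] for the first '\0'; a Java String simply reports the length it stores, so an embedded NUL does not cut it short.

```java
public class CountedVsTerminated {
    // Hypothetical helper mimicking C's strlen: count chars until the first '\0'.
    // Like strlen, it assumes the array really contains a terminator.
    static int cStyleLength(char[] chars) {
        int n = 0;
        while (chars[n] != '\0') {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        char[] cString = {'a', 'b', 'c', '\0'};
        System.out.println(cStyleLength(cString)); // 3: found by scanning

        String javaString = "ab\0cd"; // embedded NUL; the length is stored, not scanned
        System.out.println(javaString.length()); // 5
        // A terminator-based scan stops at the embedded NUL.
        System.out.println(cStyleLength((javaString + "\0").toCharArray())); // 2
    }
}
```

The scan costs time proportional to the string length on every call, which is one practical reason counted strings store their length.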

Almost correct.
C-strings are not just arrays of characters. They are null-terminated arrays of characters.
So if you have an array of characters, it's not a C-string yet; it's just an ordinary array of characters. It needs a terminating null character to be a valid C-style string.
Additionally, a std::string must also be null-terminated (since C++11), though it still has a private variable holding the length of the string.

Related

Can String contain <0x00> along with assigned values in java

If I declare one string, is there any possibility that the string can contain <0x00> along with assigned data ?
For instance :
String s = "Stack";
Can the string result come as :
Stack<0x00><0x00><0x00><0x00><0x00><0x00><0x00><0x00><0x00><0x00><0x00><0x00>

Yes, as:
String s = "Stack\u0000\u0000";
This is in contrast to C/C++, where strings are terminated by a '\0' char.
If a String must be passed as a byte array to native code, Java has a trick available for UTF-8:
a modified UTF-8 that encodes '\u0000' as a two-byte sequence (0xC0 0x80) instead of a single 0 byte, as produced by DataOutputStream.writeUTF(String).
Note that \u0000 (like some other control chars) is not allowed in XML.
By the way, the 0 string terminator has been described as one of the costliest design mistakes in C. It also influenced processor instruction sets.
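Here is a small sketch of the modified UTF-8 trick using the real DataOutputStream.writeUTF method; the class and the encode helper are illustrative names. writeUTF emits a 2-byte big-endian length prefix followed by the encoded characters, and each NUL char comes out as the pair 0xC0 0x80, so the encoded body contains no zero byte (the length prefix itself may).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8Demo {
    // Returns exactly what writeUTF writes: 2-byte length prefix + modified UTF-8 body.
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // writeUTF throws UTFDataFormatException if the encoded form exceeds 65535 bytes
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String s = "Stack\u0000\u0000";
        System.out.println(s.length()); // 7: the two NUL chars are ordinary chars

        byte[] bytes = encode(s);
        StringBuilder hex = new StringBuilder();
        for (int i = 2; i < bytes.length; i++) { // skip the length prefix
            hex.append(String.format("%02X ", bytes[i]));
        }
        System.out.println(hex.toString().trim()); // 53 74 61 63 6B C0 80 C0 80
    }
}
```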

What is the purpose of the char datatype?

I am currently reading a textbook on Java and each chapter involving the String datatype also discusses char. My question is, what purpose does char have in the real world?
The only thing I have found is that because String is immutable, it makes it a poor choice for passwords. Thus, one should choose a character array (char[]) over String. However, Java does have a mutable class for strings called StringBuilder; would that not be just as suitable a replacement for strings as is char[]?

This has already been answered in the comments really :)
A String is a collection of chars. Without chars you would have nothing to build Strings from.
Because char[] and dealing with char[] is so important, it warrants its own class to handle the processing, hence String.
In your code you are unlikely to use the char datatype directly unless you are processing Strings or handling passwords. The only reason char[] is used for passwords is that it's harder to accidentally print them into logs, view them in memory, put them into string caches, etc., and because once you have finished with the array you can explicitly zero its elements so the password never stays in memory longer than needed.
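A minimal sketch of the "zero it when you're done" idea. countDigits is a hypothetical stand-in for whatever actually consumes the password (a real program would verify or hash it); the wipe uses the real Arrays.fill. The same reasoning is why java.io.Console.readPassword() returns a char[] rather than a String.

```java
import java.util.Arrays;

public class PasswordWipe {
    // Hypothetical consumer of the password, e.g. part of a strength check.
    static int countDigits(char[] password) {
        int digits = 0;
        for (char c : password) {
            if (Character.isDigit(c)) digits++;
        }
        return digits;
    }

    // Overwrite every character; an immutable String cannot be wiped like this.
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', '3', 't'}; // illustration only
        try {
            System.out.println(countDigits(password)); // 2
        } finally {
            wipe(password); // runs even if processing throws
        }
        System.out.println(countDigits(password)); // 0: only NUL chars remain
    }
}
```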

difference between Strings in C++ and Java

In C++ I can do something like this...
std::string s = "abc";
char c = s[i]; // works fine...
But in Java, if I try doing the same, it throws an error. Why?
In java, to achieve the above, I have to do :
s.toCharArray();
How is the implementation of Strings in C++ different from that in Java?

In java, to achieve the above, I have to do :
s.toCharArray();
Not really. You can use charAt instead:
char c = s.charAt(i);
Basically, C++ allows user-defined operators - Java doesn't. So the String class doesn't expose any sort of "indexing" operator; that only exists for arrays, and a String isn't an array. (It's usually implemented using an array, but that's a different matter.)
EDIT: As noted in comments, the + operator is special-cased for strings - right in the language specification. The same could have been done for [], but it isn't - and as it's not in the language specification, and Java doesn't support overloaded operators, it can't be performed in library code. (For example, you can't give custom behaviour to + for any other class.)
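A minimal Java sketch of the charAt approach (the class name is illustrative). One behavioural difference worth knowing: charAt is documented to throw IndexOutOfBoundsException for a bad index, whereas C++'s std::string::operator[] performs no bounds check.

```java
public class CharAtDemo {
    public static void main(String[] args) {
        String s = "abc";
        int i = 1;
        char c = s.charAt(i); // Java's counterpart of C++'s s[i]
        System.out.println(c); // b

        // Iterate without first copying the string into an array
        for (int j = 0; j < s.length(); j++) {
            System.out.print(s.charAt(j) + " ");
        }
        System.out.println();

        // An out-of-range index is reported instead of reading past the end
        try {
            s.charAt(3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("index 3 is out of range");
        }
    }
}
```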

The difference is that C++ has operator overloading, and uses it to access the string contents.
Note also that Java's String stores its characters in such a way that you cannot change them (it is immutable), whereas a C++ std::string can be modified in place.

The reason that it is possible to write
string s = "abc";
char c = s[i];
in C++ is that the string class overloads the indexing operator (the [] operator), which lets programmers access the characters of a string object the same way they access the elements of an array, even though a string object is not an array.
Java, on the other hand, does not allow operator overloading of any kind (the only exception is the + operator that is overloaded for strings) and hence, the indexing operator is not and can not be overloaded for String objects. In Java, to access a character of a string, you need to use accessor member methods, such as charAt. You can also invoke the toCharArray method of the String class, which returns to you an array of the characters of the string object and you can use the indexing operator with this returned value:
char c = s.toCharArray()[i];
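One detail worth knowing about this idiom: toCharArray returns a newly allocated copy of the characters, so s.toCharArray()[i] copies the whole string on every call, and changing the array never changes the String. A small sketch (class name illustrative):

```java
public class ToCharArrayCopy {
    public static void main(String[] args) {
        String s = "abc";
        char[] chars = s.toCharArray(); // a fresh copy of the characters
        chars[0] = 'X';                 // modifies the copy only
        System.out.println(new String(chars)); // Xbc
        System.out.println(s);                 // abc: the String is unchanged

        // Same char either way, but charAt reads one char without copying
        System.out.println(s.toCharArray()[2] == s.charAt(2)); // true
    }
}
```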

See the method String#charAt
Returns the char value at the specified index. An index ranges from 0 to length() - 1. The first char value of the sequence is at index 0, the next at index 1, and so on, as for array indexing.
If the char value specified by the index is a surrogate, the surrogate value is returned.
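The surrogate remark matters for characters outside the Basic Multilingual Plane, which UTF-16 stores as two chars. A short sketch using U+1F600 as the example character; codePointAt is the real String method that combines a surrogate pair into one code point.

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1F600 needs a surrogate pair in UTF-16
        String s = "\uD83D\uDE00";
        System.out.println(s.length());                             // 2 chars
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true: charAt returns half the pair
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
        System.out.println(Integer.toHexString(s.codePointAt(0)));  // 1f600
        System.out.println(s.codePointCount(0, s.length()));        // 1 code point
    }
}
```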
public char charAt(int index)

In C++, strings can already be treated as arrays of characters,
but in Java, String is a built-in class,
which is different from an array of characters.

In C++, a C-style string is just an array of (or a pointer to) chars, terminated with a NUL ('\0') character, and std::string overloads [] so it can be indexed the same way. You can process such a string by indexing, as you would process any array.
But in Java, strings are not arrays. Java strings are objects of type java.lang.String, so you cannot process them by indexing; you have to call methods such as charAt.

Char vs String in Java? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I am learning Java this year as part of the AP Computer Science curriculum, and while I was reading about "Char" and "String" I could not understand why one would bother to use "Char" and only be able to store one character rather than just use "String" and be able to store much more than that. In short what's the point of "char" if it can only store a single character?

People are mentioning memory concerns, which are valid, but I don't think that's a very important reason 99% of the time. An important reason is that the Java compiler will tell you if you make a mistake so you don't have to figure it out on your own.
For example, if you only want 1 character for a variable, you can use a char to store the value and now nobody can put anything else in there without it being an error. If you used a String instead, there could be two characters in the String even though you intended that to never be possible. In fact, there could be 0 characters in the String which would be just as bad. Also, all your code that uses the String will have to say "get the first character of the String" where it could simply say, "give me the character".
An analogy (which may not make sense to you yet, unfortunately) would be: "Why would I say a Person has a Name when I could say a Person has a List of Names?" The same reasons apply. If you only want a Person to have one Name, then giving him a list of Names adds a lot of maintenance overhead.
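To illustrate the compile-time argument, here is a sketch with a hypothetical grading example (the class, the method name and the passing grades are made up for illustration). The char overload needs no validation because the type already guarantees one character; the String overload has to check at run time.

```java
public class Grade {
    // A char parameter always holds exactly one character; the compiler enforces it.
    static boolean isPassing(char grade) {
        return grade == 'A' || grade == 'B' || grade == 'C';
    }

    // A String parameter could be empty or too long, so it must be validated at run time.
    static boolean isPassing(String grade) {
        if (grade == null || grade.length() != 1) {
            throw new IllegalArgumentException("expected exactly one character: " + grade);
        }
        return isPassing(grade.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(isPassing('B')); // true
        System.out.println(isPassing("B")); // true
        // isPassing('AB') would not even compile; isPassing("AB") compiles but fails at run time.
        try {
            isPassing("AB");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // expected exactly one character: AB
        }
    }
}
```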

You could consider this analogy:
You need one apple. Would you prefer to have one apple in your hand, or a big box that could contain more apples, but only needs to contain the one?
The char primitive datatype is easier to work with than the String class in situations where you only need one character. It also carries a lot less overhead, because the String class has many extra methods and stores extra information so that it can handle strings with multiple characters efficiently. Using the String class when you only need one character is basically overkill. If you want to read the character from a variable of each type, this is the code that would do that:
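// initialization of the variables (wrapped in a class so the snippet compiles)
public class CharVsString
{
    private final char character = 'a';
    private final String string = "a";

    // returns the character directly
    char getChar()
    {
        return character; // simple!
    }

    // has to extract the first character of the String
    char getCharFromString()
    {
        return string.charAt(0); // more complicated: throws IndexOutOfBoundsException if string is empty
    }
}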
If this code looks complicated, you can ignore it. The conclusion is that using String when you only need one character is overcomplicating things.
Basically, the String class is used when you need more than one character. You could also just create an array of chars, but then you would not have the useful methods of the String class, such as the .equals() and .length() methods.

Strings are objects. Objects are allocated on the heap, so storing a one-character string requires at least a dozen bytes.
chars (not Chars) are primitives. They take fixed amount of space (2 bytes). In situations when you need to process a single character, creating one-character string is a waste of resources. Moreover, when you expect to see a single character, using strings would require validation that the data passed in has exactly one character. This would be unacceptable in situations when you must be extremely fast, such as character-based input and output.
To summarize, you need a char because of
Memory footprint - a char is smaller than a String of one character
Speed of processing - creating objects carries an overhead
Program's maintainability - Knowing the type makes it easier for you and for the readers of your code to know what kind of data is expected to be stored in a char variable.

A char takes up less memory for the times when you really only need one character. There are also many other uses for a single character.
char is a primitive datatype, while String is an object, which comes with greater overhead.
A String is also made up of chars, so there's that too.

Because the char takes up less memory!
Also, a char is stored directly as a value, NOT as a reference, so theoretically it's faster to access (you'll understand that more later).
Note: when I first started programming, I once had the same thought about why to use an int when you can use a long and not have to worry about large numbers. This tells me you're on your way to becoming a great programmer! :)

char is a primitive type while String is a true Object. In some cases where performance is a concern it's conceivable that you would only want to use primitives.
Another case in which you would want to use char is when you're writing Java 1.0 and you're tasked with creating the String class!
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
    // ... remaining fields and methods omitted
}

Everything in Java can be reduced to primitive types; you can write any program with primitive types. So you need some kind of minimalist way of storing text. A char is really just a 16-bit unsigned number (a UTF-16 code unit) that is interpreted as a character.
Also, if you want to loop through all the characters in a string, you would do:
char[] chArr = str.toCharArray();
for (int i = 0; i < chArr.length; i++)
{
    // do something with chArr[i]
}
Doing the same by taking a substring for each character of the String would be much more awkward.

There are a lot of answers here already. While the memory concerns are valid, you have to realize there are times when you want to manipulate characters directly. The word ladder game,
where you try to turn one word into another by changing one character at a time, is an example I had to do in a programming class. Having a char type lets you manipulate a single character at a time. It also lets you assign an int to a char, which is interpreted as a Unicode (UTF-16) character code.
You can do things like char c = 97; and that will print out as a. You can also increment a character from 97 to 122 to print out all the lowercase letters a to z. Sometimes this actually is useful.

How to compare two char* variables

Suppose we have the following function (it is C code):
const char *bitap_search(const char *text, const char *pattern)
My question is: how can I compare text and pattern if they are char pointers? The function is like a substring search, but I am a bit confused. Can I write code in terms of char, such as
if (text[i] == pattern[i])?
I am interested in this algorithm in Java:
http://en.wikipedia.org/wiki/Bitap_algorithm
How do I implement it in Java? In particular,
R = malloc((k+1) * sizeof *R);
please help me translate this line to Java.
So we have two strings, a text
like "i like computer it is very important"
and a pattern string like " computer it is very"?
Can anybody explain what we use in Java instead of char?

I'm not sure what exactly you are asking, but if you mean to find pattern in text, then strstr(text, pattern). Or if you mean to just compare text and pattern, then strcmp(text, pattern) (note that it returns 0 when text and pattern are equal).
Edit based on discussion in comments: If you mean to ask how to implement the indexing of individual characters in Java, then substitute (in Java) text.charAt(i) for the C text[i]. In C the chars in strings can be indexed directly like an array, in Java one needs to call the correct method in String.
Edit 2: The C code const char * can be replaced in Java with String.
In C malloc is used to allocate memory; in this case it allocates room in the array R for m+1 elements. So, the BIT *R can be removed and R = malloc((m+1) * sizeof *R); replaced with boolean[] R = new boolean[m + 1];. When assigning values into the array R substitute true for 1 and false for 0.
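Following these substitutions (String for const char *, charAt(i) for text[i], and boolean[] R = new boolean[m + 1] for the malloc), here is a sketch of the exact-match bitap search from the Wikipedia article translated to Java. Assumptions: it returns the index of the first match, or -1, instead of a pointer, and it covers exact matching only; the k + 1 line quoted in the question appears to come from the article's fuzzy (approximate) variant, which this sketch does not implement.

```java
public class BitapSearch {
    // Exact-match bitap with a boolean[] in place of the C BIT array.
    // Returns the index of the first occurrence of pattern in text, or -1.
    static int bitapSearch(String text, String pattern) {
        int m = pattern.length();
        if (m == 0) return 0; // empty pattern matches at the start
        // R[k] is true when the last k chars read match the first k chars of pattern
        boolean[] R = new boolean[m + 1];
        R[0] = true;
        for (int i = 0; i < text.length(); i++) {
            // update from high to low so R[k - 1] still holds the previous step's value
            for (int k = m; k >= 1; k--) {
                R[k] = R[k - 1] && text.charAt(i) == pattern.charAt(k - 1);
            }
            if (R[m]) {
                return i - m + 1; // match ends at i
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "i like computer it is very important";
        String pattern = " computer it is very";
        System.out.println(bitapSearch(text, pattern)); // 6
    }
}
```

For exact matching, text.indexOf(pattern) gives the same answer; the bit-array form is mainly useful as a stepping stone to the fuzzy version.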

I think you are confused about the difference between char and char *. In C there is no built-in string type. Strings are represented as null-terminated character arrays, meaning that the last character of the string must be \0. So char is a single character, while char * is a pointer to an array of characters, i.e. a string. And that means it is perfectly fine to write if (text[i] == pattern[i]).

You might try these:
Google-diff-match-patch says that it has a Java implementation of Bitap.
Also, it appears that crosswire has an implementation.
Finally, there seems to be a package called StringSearch, whose title is "High-performance pattern matching algorithms in Java".

You probably need strcmp() or strstr().

You should use strncmp(). Its signature is:
int strncmp(const char *str1, const char *str2, size_t count);
It is a safer way of comparing strings, because it never compares more than count characters, but of course you will need to know their lengths, or at least the minimum length of the two.
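Since the question is ultimately about Java, here is a rough mapping of these C functions onto real String methods: indexOf plays the role of strstr (returning -1 instead of NULL), and compareTo/equals play the role of strcmp. Java has no built-in strncmp, so compareFirstN is a hypothetical helper that approximates it.

```java
public class CStringAnalogues {
    // Hypothetical strncmp analogue: compares at most n leading chars, returns <0, 0 or >0.
    static int compareFirstN(String a, String b, int n) {
        String pa = a.substring(0, Math.min(n, a.length()));
        String pb = b.substring(0, Math.min(n, b.length()));
        return pa.compareTo(pb);
    }

    public static void main(String[] args) {
        String text = "i like computer it is very important";
        System.out.println(text.indexOf("computer"));         // 7, like strstr
        System.out.println("abc".compareTo("abd") < 0);       // true, like strcmp(...) < 0
        System.out.println("abc".equals("abc"));              // true, like strcmp(...) == 0
        System.out.println(compareFirstN("abcX", "abcY", 3)); // 0, like strncmp(..., 3)
    }
}
```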

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
