Java String IndexOf behaviour for empty string


For a string say, String str = "abc" both str.indexOf("a") and str.indexOf("") return 0. Is this behaviour valid?

If only there were some place where the behaviour of methods is documented.
indexOf(String str):
Returns the index within this string of the first occurrence of the specified substring.
The returned index is the smallest value k for which:
this.startsWith(str, k)
startsWith(String prefix) returns:
true if the character sequence represented by the argument is a prefix of the character sequence represented by this string; false otherwise. Note also that true will be returned if the argument is an empty string or is equal to this String object as determined by the equals(Object) method.
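The Javadoc definition above can be checked directly. The sketch below (the class and helper names are made up for illustration) searches for the smallest k for which startsWith(str, k) is true and compares that with what indexOf returns:

```java
public class IndexOfDemo {
    // Hypothetical helper: a literal reading of the Javadoc, returning the
    // smallest k for which s.startsWith(str, k) is true, or -1 if none exists.
    static int indexOfByDefinition(String s, String str) {
        for (int k = 0; k <= s.length(); k++) {
            if (s.startsWith(str, k)) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "abc";
        for (String target : new String[] {"a", "", "bc", "x"}) {
            System.out.println("\"" + target + "\": indexOf = " + s.indexOf(target)
                    + ", by definition = " + indexOfByDefinition(s, target));
        }
    }
}
```

For "" the loop stops at k = 0, because startsWith("", 0) is already true, which is why indexOf("") returns 0.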

Yes. The conceptual reason is similar to adding 0 in math: ""+"a"+"bc" = "abc" = ""+"a"+"b"+""+"c".

The return value of String.indexOf("") is 0, the starting index, because the empty string "" is indeed located there.
Think of "abc" as "" + "abc".
Otherwise, please refer to this documentation:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(int)
indexOf(int ch) "Returns: the index of the first occurrence of the character in the character sequence represented by this object, or -1 if the character does not occur."
Therefore, str.indexOf("a") returns 0.

Think of it as "" + "abc" = "abc"; basically, "abc" is "" + "abc".

An empty string does in fact exist in any string that is not null.
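If the empty string really occurs at every position, searching for it from any index should find it right there, and its last occurrence should be at the very end. A small sketch (the class and helper names are hypothetical) that checks this with indexOf(String, int) and lastIndexOf(String):

```java
public class EmptyEverywhere {
    // Hypothetical helper: where indexOf("", from) finds the empty string
    // for every starting index from 0 to s.length().
    static int[] emptyMatchPositions(String s) {
        int[] positions = new int[s.length() + 1];
        for (int from = 0; from <= s.length(); from++) {
            positions[from] = s.indexOf("", from);
        }
        return positions;
    }

    public static void main(String[] args) {
        String s = "abc";
        int[] positions = emptyMatchPositions(s);
        for (int from = 0; from < positions.length; from++) {
            System.out.println("indexOf(\"\", " + from + ") = " + positions[from]);
        }
        // The last occurrence of "" is at the very end: index s.length()
        System.out.println("lastIndexOf(\"\") = " + s.lastIndexOf(""));
    }
}
```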

Related

Java - Why does string split for empty string give me a non empty array?

I want to split a String by a space. When I split an empty string, I expect to get an array of zero strings. Instead, I get an array containing only the empty string. Why?
public class SplitEmpty {
    public static void main(String[] args) {
        String x = "";
        String[] xs = x.split(" ");
        System.out.println("strings :" + xs.length); // prints 1 instead of 0
    }
}

The single element of the array is in fact the empty string. This makes sense: the split on " " finds no match, so you just get back the input you started with. As a general approach, if splitting returns a single element, the split often did not match anything and you are left with the starting input string. This is not guaranteed, though, because trailing empty strings are also discarded: "a ".split(" ") also returns a single element, "a".

An interesting puzzle indeed:
> "".split(" ")
String[1] { "" }
> " ".split(" ")
String[0] { }
The question is, when you split the empty string, why does the result contain the empty string, and when you split a space, why does the result not contain anything? It seems inconsistent, but all is explained in the documentation.
The String.split(String) method "works as if by invoking the two-argument split method with the given expression and a limit argument of zero", so let's read the docs for String.split(String, int). The case of the empty string is answered by this part:
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
The empty string has no part matching a space, so the output is an array containing one element, the input string, exactly as the docs say should happen.
The case of the string " " is answered by these two parts:
A zero-width match at the beginning however never produces such empty leading substring.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
The whole input string " " matches the splitting pattern, so in principle there is an empty string on either side of the match. The leading-substring rule quoted above covers only zero-width matches, and this match is one character wide, so split does produce both empty strings. However, because the limit parameter n = 0, trailing empty strings are discarded, and since both substrings are empty, both count as trailing. Hence neither is included in the resulting array, so it's empty.
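These cases can be reproduced with a short program. The class name is made up, and the expected lengths in the comments follow from the documented rules: no match returns the input itself, a limit of 0 drops trailing empty strings, and a negative limit keeps them.

```java
public class SplitDemo {
    public static void main(String[] args) {
        // "" contains no space: a one-element array holding the input ""
        System.out.println("".split(" ").length);       // 1
        // " " is one match with "" on each side; limit 0 drops both
        System.out.println(" ".split(" ").length);      // 0
        // A negative limit keeps trailing empty strings, exposing both
        System.out.println(" ".split(" ", -1).length);  // 2
        // Trailing empty strings are dropped even after real content
        System.out.println("a ".split(" ").length);     // 1
    }
}
```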

It appears that since the String exists and it cannot be split (there are no spaces), split simply places the entire String into the first array position, so the array has one element. If you were to instead try
String x = " ";
String[] xs = x.split(" ");
System.out.println("strings :" + xs.length); // prints 0
it will give you the zero you are expecting.
See also: Java String split removed empty values

Does the trim() method also remove CRLF characters?

I just noticed that the trim() method also removes CRLF (newline) characters:
public class TrimTest {
    public static void main(String[] args) {
        String s = "str\r\n";
        s = s.trim();
        System.out.println("--");
        System.out.print(s);
        System.out.println("--");
    }
}
Is this intended?

Yes, see the doc:
Otherwise, let k be the index of the first character in the string
whose code is greater than '\u0020', and let m be the index of the
last character in the string whose code is greater than '\u0020'. A
new String object is created, representing the substring of this
string that begins with the character at index k and ends with the
character at index m; that is, the result of this.substring(k, m+1).
CR+LF is CR (U+000D) followed by LF (U+000A); both code points are less than U+0020, so trim() strips them.
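A minimal sketch that re-implements the documented k/m rule by hand and compares it with trim(). manualTrim is a hypothetical helper, not a JDK method. Because the rule only looks at code points up to U+0020, a no-break space (U+00A0) survives trim():

```java
public class TrimDemo {
    // Hypothetical helper following the Javadoc: k is the first index whose
    // char is greater than U+0020, m is the last such index.
    static String manualTrim(String s) {
        int k = 0;
        int m = s.length() - 1;
        while (k <= m && s.charAt(k) <= ' ') {
            k++;
        }
        while (m >= k && s.charAt(m) <= ' ') {
            m--;
        }
        return s.substring(k, m + 1);
    }

    public static void main(String[] args) {
        String s = "str\r\n";
        System.out.println("[" + s.trim() + "]");           // [str]
        System.out.println(s.trim().equals(manualTrim(s))); // true
        // A no-break space is above U+0020, so trim() keeps it
        System.out.println("str\u00A0".trim().length());    // 4
    }
}
```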

Java String split returns length 0

public int lengthOfLastWord(String s) {
s.replaceAll("\\s", "");
String[] splittedS = s.split("\\s+");
if(splittedS.length == 1 && splittedS[0].equals("")) return 0;
return splittedS[splittedS.length - 1].length();
}
I tested it out with the string " ", and it returns that the length of splittedS is 0.
When I trimmed the String I got " " -> "", so when I split this, shouldn't I get an array of length 1 whose first element is ""?

Java Strings are immutable so you have to store the reference to the returned String after replacement because a new String has been returned. You have written,
s.replaceAll("\\s", "");
But write,
s = s.replaceAll("\\s", "");
instead of above.
Whenever you perform an operation on a String, keep using the new reference it returns.

The call to replaceAll has no effect, so s is still " " when you split it on \\s+, and you end up with an empty array.
Recall that one-argument split is the same as two-argument split with zero passed for the second parameter:
String[] splittedS = s.split("\\s+", 0); // note the limit argument of 0
This means the regex pattern is applied as many times as possible, and then trailing empty strings are removed from the array.
This last point is what makes your array empty: applying the \\s+ pattern to " " produces two empty strings, one before the match and one after it. Both are considered trailing by split, so both are removed from the result.
Fixing the call to replaceAll the way that other answers suggest does change the input to "", which contains no match at all; split then returns the input itself as a one-element array [ "" ]. Your length == 1 check already handles that case, so the method still returns 0.
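To see what split returns before and after fixing replaceAll, here is a small sketch (the class name is hypothetical) that prints the array lengths in each case:

```java
public class WhitespaceSplitDemo {
    public static void main(String[] args) {
        String s = " ";
        s.replaceAll("\\s", "");  // result discarded: s is still " "
        System.out.println(s.length());                 // 1
        System.out.println(s.split("\\s+").length);     // 0: both empty strings are trailing
        System.out.println(s.split("\\s+", -1).length); // 2: negative limit keeps them
        s = s.replaceAll("\\s", "");  // reassigned: s is now ""
        System.out.println(s.split("\\s+").length);     // 1: no match, so the result is { "" }
    }
}
```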

You need to reassign the variable:
s = s.replaceAll(...)
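For completeness, a sketch of how the method might look once the result is reassigned. This version trims instead of removing every whitespace character (removing all spaces would merge all the words into one), which is an assumption about the intent of lengthOfLastWord rather than the asker's code:

```java
public class LastWord {
    // Hypothetical corrected version: reassign the trimmed string, then split.
    static int lengthOfLastWord(String s) {
        s = s.trim();                     // String methods return a new String
        if (s.isEmpty()) {
            return 0;                     // blank input has no words
        }
        String[] words = s.split("\\s+"); // no leading/trailing whitespace left
        return words[words.length - 1].length();
    }

    public static void main(String[] args) {
        System.out.println(lengthOfLastWord(" "));                          // 0
        System.out.println(lengthOfLastWord("Hello World"));                // 5
        System.out.println(lengthOfLastWord("  fly me   to   the moon  ")); // 4
    }
}
```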

Empty Strings within a non empty String [duplicate]

This question already has answers here:
Replace with empty string replaces newChar around all the characters in original string
I'm confused by this code:
public class StringReplaceWithEmptyString
{
public static void main(String[] args)
{
String s1 = "asdfgh";
System.out.println(s1);
s1 = s1.replace("", "1");
System.out.println(s1);
}
}
And the output is:
asdfgh
1a1s1d1f1g1h1
So my first thought was that every character in a String has an empty String "" on both sides. But if that were the case, there should be two '1's after 'a' in the second line of output (one for the end of 'a' and one for the start of 's').
Next I checked whether a String is represented as a char[] (in the questions "In Java, is a String an array of chars?" and "String representation in Java"), and the answer was yes.
So I tried to assign an empty character '' to a char variable, but it gives me a compiler error:
Invalid character constant
I get the same compiler error when I try it in a char[]:
char[] c = {'','a','','s'}; // CTE
So I'm confused about three things:
How is an empty String represented as a char[]?
Why am I getting that output for the above code?
How is the String s1 represented as a char[] when it is first initialized?
Sorry if I'm wrong about any part of my question.

Just adding some more explanation to Tim Biegeleisen's answer.
As of Java 8, the code of the replace method in the java.lang.String class is:
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Here you can see that the replacement is done by a regex Pattern and Matcher, and in a regex the empty pattern "" is a zero-length match, which is found at every position around the characters: before the first, between each pair, and after the last.
So, behind the scenes, your code is executed as:
Pattern.compile("".toString(), Pattern.LITERAL).matcher("asdfgh").replaceAll(Matcher.quoteReplacement("1".toString()));
which produces the output
1a1s1d1f1g1h1

Going with Andy Turner's great comment: your call to String#replace() is actually implemented with a regex (a literal Pattern and Matcher#replaceAll()), so a regex replacement is happening here. The empty pattern matches before the first character, between each pair of characters, and after the last character:
^|a|s|d|f|g|h|$
Here ^, each |, and $ mark the seven positions where the empty string "" matches.
The match you are making is a zero-length match. Java's regex engine, which String.replace() uses here, handles it as the diagram above shows: it matches at every inter-character position, as well as before the first character and after the last one.
Here is a reference which discusses zero length matches in more detail: http://www.regexguru.com/2008/04/watch-out-for-zero-length-matches/
A zero-width or zero-length match is a regular expression match that does not match any characters. It matches only a position in the string. E.g. the regex \b matches between the 1 and , in 1,2.
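The positions in the ^|a|s|d|f|g|h|$ picture can be listed with the same literal Pattern that the Java 8 source builds. A sketch (the class and helper names are hypothetical) that records where each zero-length match starts:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthMatches {
    // Hypothetical helper: start index of every match of the empty literal
    // pattern, i.e. every position where "" is found.
    static String emptyMatchStarts(String s) {
        Matcher m = Pattern.compile("", Pattern.LITERAL).matcher(s);
        StringBuilder starts = new StringBuilder();
        while (m.find()) {
            if (starts.length() > 0) {
                starts.append(' ');
            }
            starts.append(m.start());
        }
        return starts.toString();
    }

    public static void main(String[] args) {
        String s = "asdfgh";
        // Seven positions for six characters: one more than the length
        System.out.println(emptyMatchStarts(s)); // 0 1 2 3 4 5 6
        // One "1" is inserted at each of those positions
        System.out.println(s.replace("", "1"));  // 1a1s1d1f1g1h1
    }
}
```

Each position is matched once, which is why there is a single '1' between 'a' and 's' rather than two.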

This is because replace() does a regex match of the target you pass to it (treated literally, via Pattern.LITERAL):
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Replaces each substring of this string that matches the literal target
sequence with the specified literal replacement sequence. The
replacement proceeds from the beginning of the string to the end, for
example, replacing "aa" with "b" in the string "aaa" will result in
"ba" rather than "ab".
Parameters:
target - the sequence of char values to be replaced
replacement - the replacement sequence of char values
Returns: the resulting string
Throws: NullPointerException if target or replacement is null.
Since: 1.5
Please read more at the link below (and browse through the source code):
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java#String.replace%28java.lang.CharSequence%2Cjava.lang.CharSequence%29
A regex such as "" matches every possible empty string in a string. In this case that is every empty position: at the start, between each pair of characters, and at the end.
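Two quick checks related to the question, in a sketch with a made-up class name: the Javadoc's "aaa" example, and the fact that an empty String corresponds to a zero-length char array rather than to an "empty char". A char always holds exactly one value, which is why '' does not compile. toCharArray() shows the logical char view regardless of how the JDK stores the string internally:

```java
public class ReplaceNotes {
    public static void main(String[] args) {
        // Left to right, non-overlapping: "aa" at index 0 is replaced, leaving "a"
        System.out.println("aaa".replace("aa", "b"));           // ba
        // There is no "empty char" in a String: "" is simply zero chars long
        System.out.println("".toCharArray().length);            // 0
        System.out.println(new String(new char[0]).isEmpty());  // true
        // The empty matches are positions, not extra elements of the array
        System.out.println("asdfgh".toCharArray().length);      // 6
    }
}
```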

How can I select the substring represented by the original string except the two initial characters?

I know that in Java I can extract a substring from a String object doing something like:
String string= "Hello World";
String subString = string.substring(5);
And this way the subString variable will contain only the Hello string.
I also know that I can specify two indexes to select a substring, something like:
String subString = string.substring(6, 11);
That will select the World string.
But what can I do if, given a string, I want to select the substring represented by the original string except the two initial characters?
So for example I have:
String value = "12345";
and my substring has to be 345.
How can I do it?

String subString = string.substring(5); doesn't do what you think it does: it returns " World" (everything from index 5 onward, including the space), not "Hello".
Actually string.substring(2) returns a String containing all the characters of the original String except the first two.
When you want a substring starting at the beginning of the input String, use the two-parameter version, for example string.substring(0, 5) for the first 5 characters.
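Since substring(2) throws an exception for strings shorter than two characters, a guarded helper may be useful. A sketch; dropFirst is a hypothetical name, and n is assumed to be non-negative:

```java
public class DropPrefix {
    // Hypothetical helper: everything except the first n chars, or "" when the
    // string is not longer than n. Assumes n >= 0.
    static String dropFirst(String s, int n) {
        return s.length() <= n ? "" : s.substring(n);
    }

    public static void main(String[] args) {
        System.out.println(dropFirst("12345", 2));                   // 345
        // Plain substring(5) keeps the space before "World"
        System.out.println("[" + "Hello World".substring(5) + "]");  // [ World]
        // "1".substring(2) would throw StringIndexOutOfBoundsException
        System.out.println(dropFirst("1", 2).isEmpty());             // true
    }
}
```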

From the Java docs,
Returns a new string that is a substring of this string. The substring
begins with the character at the specified index and extends to the
end of this string.
Examples:
"unhappy".substring(2) returns "happy"
"Harbison".substring(3) returns "bison"
"emptiness".substring(9) returns "" (an empty string)
Parameters: beginIndex the beginning index, inclusive. Returns: the
specified substring. Throws: IndexOutOfBoundsException - if beginIndex
is negative or larger than the length of this String object.
public class SubstringDemo {
    public static void main(String[] args) {
        String sb = "12345";
        String s = sb.substring(2); // skip the first two characters
        System.out.println(s);
    }
}
Output:
345
