Not sure how string split actually works in this case - java

I don't get the following:
In the following String:
String s = "1234;x;;y;";
if I do:
String[] s2 = s.split(";");
I get s2.length to be 4 and
s2[0] = "1234";
s2[1] = "x";
s2[2] = "";
s2[3] = "y";
But in the string: String s = "1234;x;y;;";
I get:
s2.length to be 3 and
s2[0] = "1234";
s2[1] = "x";
s2[2] = "y";
?
What is the difference, and why don't I get 4 in the latter case as well?
UPDATE:
Using -1 is not what I was expecting as behavior.
I mean, the last semicolon is the end of the String, so in the latter example I was also expecting 4 as the length of the array.

From the docs,
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
UPDATE:
You have five substrings separated by ";". In the second case, these are "1234", "x", "y", "" and "". As per the docs, all empty substrings at the end which result from the split operation are eliminated.
For details, look here.
If n is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
The string boo:and:foo, for example, yields the following results with these parameters:
Regex  Limit  Result
:      2      { "boo", "and:foo" }
:      5      { "boo", "and", "foo" }
:      -2     { "boo", "and", "foo" }
o      5      { "b", "", ":and:f", "", "" }
o      -2     { "b", "", ":and:f", "", "" }
o      0      { "b", "", ":and:f" } // all the empty substrings at the end were eliminated

Trailing empty strings are omitted. However, there are ways to include them explicitly, if needed.
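As a small illustration of including them explicitly, the sketch below runs the two strings from the question through the one-argument split and through split with a negative limit. The class name SplitTrailingDemo and the two helper names are made up for this example; only String.split itself comes from the JDK.

```java
import java.util.Arrays;

class SplitTrailingDemo {
    // One-argument split: behaves like limit 0, so trailing empty strings are dropped.
    static String[] dropTrailing(String s) {
        return s.split(";");
    }

    // Negative limit: every substring is kept, including trailing empty ones.
    static String[] keepTrailing(String s) {
        return s.split(";", -1);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1234;x;;y;", "1234;x;y;;"}) {
            System.out.println(s + " -> " + Arrays.toString(dropTrailing(s))
                    + " vs " + Arrays.toString(keepTrailing(s)));
        }
    }
}
```

With the negative limit both inputs yield five elements, because the final semicolon is followed by an empty substring.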

From http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String)
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

Good question. If you check the API documentation for String.split() and look at the example with "boo:and:foo", you can see that the trailing empty strings are omitted.
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
The string "boo:and:foo", for example, yields the following results with these expressions:
Regex Result
: { "boo", "and", "foo" }
o { "b", "", ":and:f" }

That's the default behavior of the split method in Java: it does not return trailing empty tokens.
s.split(";", -1); should return the empty tokens as well.

Why not check what the documentation says first? Here is the link:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split%28java.lang.String%29
And here is your answer:
Trailing empty strings are therefore not included in the resulting
array.

Related

java split with unknown result

for the following code,
Why are there empty strings at positions 2, 3, 5, 6 and 8?
And why do "b", ":and:f" and "1" have no empty string behind them?
String[] splitStrs = "booo:and:fooo1o".split("o", -1);
System.out.println(splitStrs.length);
for (int i = 0; i < splitStrs.length; i++) {
    System.out.println("\"" + splitStrs[i] + "\"");
}
output is:
8
"b"
""
""
":and:f"
""
""
"1"
""
Why are there empty strings at positions 2, 3, 5, 6 and 8?
When splitting on "o", there's nothing between the o's in "ooo", thus empty strings.
And why do "b", ":and:f" and "1" have no empty string behind them?
But there is an empty string at the end of your output, i.e., behind "1".
Per the documentation, a negative 2nd arg specifically means "trailing empty strings not discarded".
Always read the doc.
The split method finds every occurrence of the wanted character (in your example "o"), puts the substring between the current "o" and the next "o" (without the "o" itself) into the array, and continues through the whole string.
When you have, for example, "oo", the piece is "" since there is nothing between those two "o" characters.
Let's take an example. You have the string "Oh, hello Anna! I havent seen you since 2010s!" and split it at every "a" character.
Start from the first character and find the next letter "a", which is the 14th character (index 13). Take the part of the string from the start up to that "a" and add it to the array. The first element of the array will be "Oh, hello Ann" ("A" and "a" are different characters). Then continue from just after that "a" and find the next "a", which is the 20th character (index 19) in our example. Take the part of the string between the first and second "a" and copy it into the array. The procedure goes on until the end of the string.
Result will be:
"Oh, hello Ann"
"! I h"
"vent seen you since 2010s!"
If we split the same string at every "n", using the same logic, we get:
"Oh, hello A"
""
"a! I have"
"t see"
" you si"
"ce 2010s!"
The reason for the empty string in the second position is that in "...Anna..." there is nothing between the two "n" characters.
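The procedure described above can be sketched by hand with indexOf and substring. This is only an illustration for a single literal delimiter character, not how String.split is implemented (the real method works with regular expressions); the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ManualSplitDemo {
    // Splits on one literal character and keeps every piece,
    // comparable to String.split with a negative limit.
    static List<String> splitOnChar(String text, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int hit = text.indexOf(delimiter, start);
        while (hit >= 0) {
            // Empty when two delimiters are adjacent, as with the "nn" in "Anna".
            parts.add(text.substring(start, hit));
            start = hit + 1; // continue right after the delimiter
            hit = text.indexOf(delimiter, start);
        }
        parts.add(text.substring(start)); // whatever follows the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        String text = "Oh, hello Anna! I havent seen you since 2010s!";
        System.out.println(splitOnChar(text, 'a'));
        System.out.println(splitOnChar(text, 'n'));
    }
}
```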
Some examples can be found on: https://www.geeksforgeeks.org/split-string-java-examples/
public String[] split(String regex, int limit)
Parameters:
regex – a delimiting regular expression
limit – the resulting threshold
The limit parameter can have 3 values:
limit > 0 – If this is the case, then the pattern will be applied at most limit-1 times, the resulting array’s length will not be more than limit, and the resulting array’s last entry will contain all input beyond the last matched pattern.
limit < 0 – In this case, the pattern will be applied as many times as possible, and the resulting array can be of any size.
limit = 0 – In this case, the pattern will be applied as many times as possible, the resulting array can be of any size, and trailing empty strings will be discarded.
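To see the three cases side by side, here is a minimal sketch that applies each kind of limit to the "boo:and:foo" string from the Java documentation. The class name SplitLimitDemo and the helper split are assumptions made for this example.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Thin wrapper so the three limit cases read uniformly.
    static String[] split(String regex, int limit) {
        return "boo:and:foo".split(regex, limit);
    }

    public static void main(String[] args) {
        // limit > 0: the pattern is applied at most limit-1 times.
        System.out.println(Arrays.toString(split(":", 2)));
        // limit < 0: applied as often as possible, trailing empty strings kept.
        System.out.println(Arrays.toString(split("o", -1)));
        // limit = 0: applied as often as possible, trailing empty strings dropped.
        System.out.println(Arrays.toString(split("o", 0)));
    }
}
```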
Please visit the GeeksforGeeks site for more information regarding splitting.
"booo:and:fooo1o".split("o", -1);
Why are there empty strings at positions 2, 3, 5, 6 and 8?
Since the limit is -1, the string is split as many times as possible. Each match of the regex "o" yields the text between the previous match and this one; where two "o"s are adjacent there is no text in between, so an empty string is returned.
And why do "b", ":and:f" and "1" have no empty string behind them?
Those entries are not empty because there are characters between the previous "o" match and the next one.
See this example:
class Split {
    public static void main(String[] args) {
        final String s = "ooa";
        // A negative limit keeps every piece, including empty ones.
        for (final String str : s.split("o", -1)) {
            System.out.println("\"" + str + "\"");
        }
    }
}
Output:
$ javac Split.java && java Split
""
""
"a"
Why this output?
The first match happens at index 0, which is the very first character. split returns the string before the match, and since there is nothing before it, that is an empty string.
The second match, the o at index 1, yields the string after the first match and before index 1. Since there are no characters in between, it is again an empty string.
After that, the remainder "a" is returned.

How could the split() for Strings in Java be explained when two equal characters reside adjacently

In the docs for String.split,
The following are the examples. How can the last example be explained?
The string "boo:and:foo", for example, yields the following results with these parameters:
Regex  Limit  Result
:      2      { "boo", "and:foo" }
:      5      { "boo", "and", "foo" }
:      -2     { "boo", "and", "foo" }
o      5      { "b", "", ":and:f", "", "" }
o      -2     { "b", "", ":and:f", "", "" }
o      0      { "b", "", ":and:f" }
Read the docs carefully:
The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string.
Note that the empty string is a substring of any string, including boo:and:foo. If you do "boo:and:foo".substring(2, 2), you will get the empty string. The empty string between the first two o's is followed by (i.e. "is terminated by") the substring "o" (the second o). The substring "o" matches the regex "o", so the empty string fulfils the requirement:
is terminated by another substring that matches the given expression or is terminated by the end of the string
So it gets put into the resulting array.
The empty string after the second-to-last o also fulfils this criterion, and the empty string after the last o "is terminated by the end of the string". They should have been added to the array, and the array would have looked like:
{ "b", "", ":and:f", "", "" }
However, they are discarded from the array, because:
If [limit] is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
"Trailing empty strings" refers to the last two "" elements in the array, which get discarded.

Java String split returns length 0

public int lengthOfLastWord(String s) {
    s.replaceAll("\\s", "");
    String[] splittedS = s.split("\\s+");
    if (splittedS.length == 1 && splittedS[0].equals("")) return 0;
    return splittedS[splittedS.length - 1].length();
}
I tested it with the string " ", and the length of splittedS turns out to be 0.
When I trim the String, I get " " -> "", so when I split this, shouldn't I get an array of length 1 whose first element is ""?
Java Strings are immutable so you have to store the reference to the returned String after replacement because a new String has been returned. You have written,
s.replaceAll("\\s", "");
But write,
s = s.replaceAll("\\s", "");
instead of above.
Whenever you perform an operation on a String, keep working with the newly returned reference.
The call to replaceAll has no effect because its result is discarded, so split still operates on the original string " ", and you end up with an empty array.
Recall that one-argument split is the same as two-argument split with zero passed for the second parameter:
String[] splittedS = s.split("\\s+", 0);
// ^^^
This means that the regex pattern is applied as many times as possible, and then trailing empty strings are removed from the array.
This last point is what makes your array empty: applying the \\s+ pattern to " " produces the array [ "", "" ], one empty string before the whitespace match and one after it. Both are considered trailing by split, so they are removed from the result.
Note that once you fix the call to replaceAll as the other answers suggest, s becomes "", the pattern no longer matches anything, and split returns the input itself as a one-element array [ "" ].
You need to reassign the variable:
s=s.replaceAll(...)
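One way to sidestep the empty-array case altogether is to trim first and return early when nothing is left. The sketch below assumes the method is meant to return the length of the last whitespace-separated word (the name suggests this, but the question does not say so); LastWordDemo is a made-up class name.

```java
class LastWordDemo {
    // Hypothetical rewrite: trim first, then handle the "no words" case
    // before splitting, so the array is never indexed when it is empty.
    static int lengthOfLastWord(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        String[] words = trimmed.split("\\s+");
        return words[words.length - 1].length();
    }

    public static void main(String[] args) {
        // Splitting a lone space with limit 0 leaves nothing behind.
        System.out.println(" ".split("\\s+").length);
        System.out.println(lengthOfLastWord(" "));
        System.out.println(lengthOfLastWord("Hello World "));
    }
}
```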

Split String in Java with [a-z] regular expression

I have two regular expressions:
[a-c] : any character from a-c
[a-z] : any character from a-z
And a test:
public static void main(String[] args) {
    String s = "abcde";
    String[] arr1 = s.split("[a-c]");
    String[] arr2 = s.split("[a-z]");
    System.out.println(arr1.length); // prints 4: "", "", "", "de"
    System.out.println(arr2.length); // prints 0
}
Why does the second split behave like this? I would expect a result with 6 empty strings "".
According to the documentation of the single-argument String.split:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
To keep the trailing strings, you can use the two-argument version, and specify a negative limit:
String s = "abcde";
String[] arr1 = s.split("[a-c]", -1); // ["", "", "", "de"]
String[] arr2 = s.split("[a-z]", -1); // ["", "", "", "", "", ""]
By default, split discards trailing empty strings. In the arr2 case, they were all trailing empty strings, so they were all discarded.
To get 6 empty strings, pass a negative limit as the second parameter to the split method, which will keep all trailing empty strings.
String[] arr2 = s.split("[a-z]", -1);
If n is non-positive then the pattern will be applied as many times as
possible and the array can have any length.
String.split():
Splits this string around matches of the given regular expression.
Around means that the matches themselves are removed. For example, splitting "a,b,c" on commas leaves just a, b and c.
The first split removes the a, b, and c, leaving three empty strings and "de".
The second removes all letters, and thus all characters, from that string, so only empty strings remain and they are all discarded as trailing.
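The discard step can be imitated by hand to see why nothing survives in the second case. The helper dropTrailingEmpties below is an illustrative assumption, not a JDK method, and it only mirrors the limit-0 result when the pattern matches at least once (an input with no match is returned by split as a single unchanged element).

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    // Removes empty strings from the end of the array only;
    // empty strings at the start or in the middle are kept.
    static String[] dropTrailingEmpties(String[] parts) {
        int end = parts.length;
        while (end > 0 && parts[end - 1].isEmpty()) {
            end--;
        }
        return Arrays.copyOf(parts, end);
    }

    public static void main(String[] args) {
        String s = "abcde";
        String[] all = s.split("[a-z]", -1);  // six empty strings
        String[] head = s.split("[a-c]", -1); // "", "", "", "de"
        System.out.println(dropTrailingEmpties(all).length);
        System.out.println(Arrays.toString(dropTrailingEmpties(head)));
    }
}
```

Leading empty strings stay in place because only the tail of the array is trimmed, which is why the [a-c] result keeps its three empty entries.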

Behaviour of String.split in java 1.6?

My code is:
String s = "1;;;; 23;;";
System.out.println(s.split(";").length);
and gives as output 5.
The source code of split is:
public String[] split(String regex) {
    return split(regex, 0);
}
and the documentation says:
This method works as if by invoking the two-argument
split(java.lang.String,int) method with the given expression and a
limit argument of zero. Trailing empty strings are therefore not
included in the resulting array.
The string "boo:and:foo", for example, yields the following results
with these expressions:
Regex Result
: { "boo", "and", "foo" }
o { "b", "", ":and:f" }
If I print the strings I have:
1
23
Shouldn't I get from this 1;;;; 23;; something like {"1", "", "", "", " 23", ""} ?
No, five is correct, as your quoted docs state:
Trailing empty strings are therefore not included in the resulting
array.
Which is why the empty strings at the end of the array are omitted. If you want the empty strings, do as Evgeniy Dorofeev's answer says and specify a limit of -1.
Since limit = 0 trailing empty strings are not included. Try
System.out.println(s.split(";", -1).length);
and you will get 7
It will split the string wherever ';' is present and put the pieces into an array.
