Behaviour of String.split in java 1.6? - java

My code is:
String s = "1;;;; 23;;";
System.out.println(s.split(";").length);
and gives as output 5.
The source code of split is:
public String[] split(String regex) {
return split(regex, 0);
}
and the documentation says:
This method works as if by invoking the two-argument
split(java.lang.String,int) method with the given expression and a
limit argument of zero. Trailing empty strings are therefore not
included in the resulting array.
The string "boo:and:foo", for example, yields the following results
with these expressions:
Regex   Result
:       { "boo", "and", "foo" }
o       { "b", "", ":and:f" }
If I print the strings I have:
1
23
Shouldn't I get from this 1;;;; 23;; something like {"1", "", "", "", " 23", ""} ?

No, five is correct, as your quoted docs state:
Trailing empty strings are therefore not included in the resulting
array.
Which is why the empty strings at the end of the array are omitted. If you want the empty strings, do as Evgeniy Dorofeev's answer says and specify a limit of -1.

Since limit = 0 trailing empty strings are not included. Try
System.out.println(s.split(";", -1).length);
and you will get 7
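A minimal sketch comparing the two limits on the string from the question. The class and method names (TrailingEmptyDemo, splitOnSemicolon) are made up for illustration; only String.split itself comes from the JDK.

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    // Hypothetical helper: splits on ";" with the given limit.
    static String[] splitOnSemicolon(String s, int limit) {
        return s.split(";", limit);
    }

    public static void main(String[] args) {
        String s = "1;;;; 23;;";

        // limit 0 is what the one-argument split(";") uses: trailing empty strings are dropped.
        String[] dropped = splitOnSemicolon(s, 0);
        // A negative limit keeps every piece, including the trailing empty ones.
        String[] kept = splitOnSemicolon(s, -1);

        System.out.println(dropped.length + " pieces: " + Arrays.toString(dropped)); // 5 pieces
        System.out.println(kept.length + " pieces: " + Arrays.toString(kept));       // 7 pieces
    }
}
```

The two extra pieces in the second call are the empty strings after the final two semicolons.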

It splits the string wherever ';' is present and puts the pieces into an array.

Related

java split with unknown result

for the following code,
why is there an empty string in positions 2, 3, 5, 6 and 8?
and why do "b", ":and:f" and "1" have no empty string after them?
String[] splitStrs = "booo:and:fooo1o".split("o", -1);
System.out.println(splitStrs.length);
for (int i=0; i<splitStrs.length; i++) {
System.out.println("\"" + splitStrs[i]+ "\"");
}
output is:
8
"b"
""
""
":and:f"
""
""
"1"
""
Why is there an empty string in positions 2, 3, 5, 6 and 8?
When splitting on "o", there is nothing between the consecutive o's in "ooo", hence the empty strings.
Then why do "b", ":and:f" and "1" have no empty string after them?
But there is an empty string at the end of your output, i.e., behind "1".
Per the documentation, a negative 2nd arg specifically means "trailing empty strings not discarded".
Always read the doc.
The split method finds every occurrence of the wanted character (in your example "o"), puts the (sub)string between the current "o" and the next "o", without the "o" characters themselves, into an array, and continues through the whole string.
When you have, for example, "oo", the piece will be "" since there is nothing between those two "o" characters.
Let's take an example. You have the string "Oh, hello Anna! I havent seen you since 2010s!" and split it at every "a" character.
Start from the first character and find the next letter "a", which is at index 13 (the 14th character). Take the part of the string from the start up to that "a" and add it to an array; the first element will be "Oh, hello Ann" ("A" and "a" are different characters). Then continue from that "a" and find the next "a", which is at index 19 (the 20th character). Take the part of the string between the first and second "a" and add it to the array. The procedure continues until the end of the string.
Result will be:
"Oh, hello Ann"
"! I h"
"vent seen you since 2010s!"
If we split the same string on every "n", using the same logic, we get:
"Oh, hello A"
""
"a! I have"
"t see"
" you si"
"ce 2010s!"
The second element is an empty string because in "...Anna..." there is nothing between the two "n" characters.
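The walk-through above can be checked with a short program. This is only a sketch: the class name AnnaSplit and the helper splitOn are invented for the example, and the sentence is the one used in this answer.

```java
class AnnaSplit {
    static final String SENTENCE = "Oh, hello Anna! I havent seen you since 2010s!";

    // Hypothetical helper: splits the example sentence on the given delimiter.
    static String[] splitOn(String delimiter) {
        return SENTENCE.split(delimiter);
    }

    public static void main(String[] args) {
        for (String delimiter : new String[] {"a", "n"}) {
            System.out.println("split on \"" + delimiter + "\":");
            // Quote each piece so that empty strings are visible in the output.
            for (String piece : splitOn(delimiter)) {
                System.out.println("  \"" + piece + "\"");
            }
        }
    }
}
```

Splitting on "a" gives three pieces; splitting on "n" gives six, the second of which is the empty string between the two n's of "Anna".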
Some examples can be found on: https://www.geeksforgeeks.org/split-string-java-examples/
public String[] split(String regex, int limit)
Parameters:
regex – a delimiting regular expression
limit – the result threshold
The limit parameter can take three kinds of values:
limit > 0 – the pattern will be applied at most limit-1 times, the resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched pattern.
limit < 0 – the pattern will be applied as many times as possible, and the resulting array can be of any size.
limit = 0 – the pattern will be applied as many times as possible, the resulting array can be of any size, and trailing empty strings will be discarded.
Please visit the GeeksforGeeks site for more information about splitting.
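To see the three kinds of limit side by side, here is a small sketch using the "boo:and:foo" string from the Javadoc. LimitDemo and splitWithLimit are illustrative names, not part of any library.

```java
import java.util.Arrays;

class LimitDemo {
    // Hypothetical thin wrapper so that each limit can be tried on the same input.
    static String[] splitWithLimit(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    public static void main(String[] args) {
        String input = "boo:and:foo";
        // 2: positive limit, 0: the default behaviour, -1: keep trailing empty strings.
        for (int limit : new int[] {2, 0, -1}) {
            String[] parts = splitWithLimit(input, "o", limit);
            System.out.println("limit " + limit + " -> " + parts.length
                    + " pieces: " + Arrays.toString(parts));
        }
    }
}
```

With limit 2 the pattern is applied once, giving "b" and "o:and:foo"; limit 0 gives 3 pieces and limit -1 gives 5, the last two being empty.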
"booo:and:fooo1o".split("o", -1);
Why is there an empty string in positions 2, 3, 5, 6 and 8?
Since the limit is -1, the string can be split any number of times. With 'o' as the regex, each match yields the text before it; where there is no text before the match, an empty string is returned.
Then why do "b", ":and:f" and "1" have no empty string after them?
There is no empty string there because there are characters between the previous 'o' match and the next one.
See this example:
class Split{
public static void main(final String ... $){
var out = System.out;
final String s = "ooa";
for(final String str : s.split("o", -1))
out.println("\""+str+"\"");
}
}
Output:
$ javac Split.java && java Split
""
""
"a"
Why this output?
The first match is at index 0, the very first character. split returns the string before the match, and since there is nothing before it, that is an empty string.
The second match, the o at index 1, yields the string after the first match and before index 1. Since there are no characters there, it is again an empty string.
After that it returns a.

Java String split returns length 0

public int lengthOfLastWord(String s) {
s.replaceAll("\\s", "");
String[] splittedS = s.split("\\s+");
if(splittedS.length == 1 && splittedS[0].equals("")) return 0;
return splittedS[splittedS.length - 1].length();
}
I tested it with the string " ", and it reports that the length of splittedS is 0.
Trimming the string gives " " -> "", so when I split this, shouldn't I get an array of length 1 whose first element is ""?
Java Strings are immutable, so you have to store the reference to the String returned by the replacement, because a new String is returned. You have written,
s.replaceAll("\\s", "");
But write,
s = s.replaceAll("\\s", "");
instead.
Whenever you perform operations on a String, keep using the newly returned reference from then on.
The call to replaceAll has no effect, but since you split on \\s+, split method works exactly the same: you end up with an empty array.
Recall that one-argument split is the same as two-argument split with zero passed for the second parameter:
String[] splittedS = s.split("\\s+", 0);
// ^^^
This means that the regex pattern is applied as many times as possible, and then trailing empty strings are removed from the array.
This last point is what makes your array empty: for the input " ", applying the \\s+ pattern produces only empty strings (one before the match and one after it). They all count as trailing for split, so they are removed from the result.
Note that the result does change if you fix the call to replaceAll the way the other answers suggest: s then becomes "", the pattern matches nothing, and split returns a one-element array containing the empty string, which is the case your length == 1 check handles.
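A small sketch to check the two inputs involved, a single space and the empty string, together with one possible rewrite of the method. The rewrite uses trim() and an explicit emptiness check; that is an assumption about what the method is meant to do, not something taken from the question. The class name LastWordDemo is invented.

```java
class LastWordDemo {
    // One possible rewrite (an assumption, not the asker's code): trim first,
    // keep the returned String, and handle the all-whitespace case explicitly.
    static int lengthOfLastWord(String s) {
        String trimmed = s.trim();          // Strings are immutable, so use the result
        if (trimmed.isEmpty()) {
            return 0;                       // nothing but whitespace: no last word
        }
        String[] words = trimmed.split("\\s+");
        return words[words.length - 1].length();
    }

    public static void main(String[] args) {
        // The pattern matches the whole string; only empty pieces remain and all are dropped.
        System.out.println(" ".split("\\s+").length);   // 0
        // The pattern matches nothing, so the array holds just the original (empty) string.
        System.out.println("".split("\\s+").length);    // 1
        System.out.println(lengthOfLastWord(" "));              // 0
        System.out.println(lengthOfLastWord(" Hello World "));  // 5
    }
}
```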
You need to reassign the variable:
s=s.replaceAll(...)

Split String in Java with [a-z] regular expression

I have two regular expressions:
[a-c] : any character from a-c
[a-z] : any character from a-z
And a test:
public static void main(String[] args) {
String s = "abcde";
String[] arr1 = s.split("[a-c]");
String[] arr2 = s.split("[a-z]");
System.out.println(arr1.length); //prints 4 : "", "", "", "de"
System.out.println(arr2.length); //prints 0
}
Why does the second split behave like this? I would expect a result of 6 empty strings "".
According to the documentation of the single-argument String.split:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
To keep the trailing strings, you can use the two-argument version, and specify a negative limit:
String s = "abcde";
String[] arr1 = s.split("[a-c]", -1); // ["", "", "", "de"]
String[] arr2 = s.split("[a-z]", -1); // ["", "", "", "", "", ""]
By default, split discards trailing empty strings. In the arr2 case, they were all trailing empty strings, so they were all discarded.
To get 6 empty strings, pass a negative limit as the second parameter to the split method, which will keep all trailing empty strings.
String[] arr2 = s.split("[a-z]", -1);
If n is non-positive then the pattern will be applied as many times as
possible and the array can have any length.
String.split():
Splits this string around matches of the given regular expression.
Around means that the matches themselves are removed. For example, splitting "a,b,c" on commas leaves just a, b and c.
The first split removes the a, b, and c.
The second removes all letters, and thus all characters of that string.

Java split() a String made out of the String you are splitting with?

When I compile and run this code:
class StringTest {
public static void main(String[] args) {
System.out.println("Begin Test");
String letters = "AAAAAAA";
String[] broken = letters.split("A");
for(int i = 0; i < broken.length; i++)
System.out.println("Item " + i + ": " + broken[i]);
System.out.println("End Test");
}
}
The output to the console is:
Begin Test
End Test
Can anyone explain why split() works like this? I saw some other questions sort of like this on here, but didn't fully understand why there is no output when splitting a string made entirely out of the character that you are using for regex. Why does java handle Strings this way?
String.split discards trailing empty strings. For example, "foo,bar,,".split(",") gets split into {"foo", "bar"}. What you're seeing is a string that consists entirely of the separator, so all the empty splits are "trailing" and get discarded.
You could probably get all those empty strings if you used letters.split("A", -1). Alternately, Guava's Splitter doesn't do things like that unless you ask for it: Splitter.on('A').split(letters).
This is because "A" is used as the delimiter in the split method. Since your string contains no text other than the delimiter "A", you are left with nothing after the split (the empty strings are not returned in the resulting array).
Since every character in your input is a delimiter, every string found is blank. By default, every trailing blank found is ignored, hence what you're seeing.
However, split() comes in two flavours. There is a second version of the split() method that accepts another int parameter limit, which controls the number of times the match is to be applied, but also the behaviour of ignoring trailing blanks.
If the limit parameter is negative, trailing blanks are preserved.
If you executed this code:
String letters = "AAAAAAA";
String[] broken = letters.split("A", -1); // note the -1
System.out.println(Arrays.toString(broken));
You get an array of eight empty strings (seven delimiters produce eight pieces), which Arrays.toString prints as:
[, , , , , , , ]
See the javadoc for more, including examples of how various limit values affect behaviour.

Not sure how string split actually works in this case

I don't get the following:
In the following String:
String s = "1234;x;;y;";
if I do:
String[] s2 = s.split(";");
I get s2.length to be 4 and
s2[0] = "1234";
s2[1] = "x";
s2[2] = "";
s2[3] = "y";
But in the string: String s = "1234;x;y;;";
I get:
s2.length to be 3 and
s2[0] = "1234";
s2[1] = "x";
s2[2] = "y";
?
What is the difference, and why don't I get 4 in the latter case as well?
UPDATE:
Using -1 is not what I was expecting as behavior.
I mean, the last semicolon is the end of the String, so in the latter example I was also expecting 4 as the length of the array.
From the docs,
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
UPDATE:
You have five substrings separated by ";". In the second case, these are "1234", "x", "y", "" and "". As per the docs, all empty substrings at the end that result from the split operation are eliminated.
For details, look here.
If n is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
The string boo:and:foo, for example, yields the following results with these parameters:
Regex   Limit   Result
:       2       { "boo", "and:foo" }
:       5       { "boo", "and", "foo" }
:       -2      { "boo", "and", "foo" }
o       5       { "b", "", ":and:f", "", "" }
o       -2      { "b", "", ":and:f", "", "" }
o       0       { "b", "", ":and:f" }   // all the empty substrings at the end were eliminated
Trailing empty strings are omitted. However, there are ways to include them explicitly, if needed.
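If the final ";" is meant as a terminator rather than a separator, which seems to be what the update to the question expects, one possible approach is to keep all pieces with a negative limit and then drop only the single empty piece after the last ";". This is a sketch under that assumption; TerminatorSplit and splitTerminated are hypothetical names, not a JDK API.

```java
import java.util.Arrays;

class TerminatorSplit {
    // Hypothetical helper: keeps every empty field, then removes only the one
    // empty piece that a terminating ";" leaves at the very end.
    static String[] splitTerminated(String s) {
        String[] parts = s.split(";", -1);
        if (parts.length > 1 && parts[parts.length - 1].isEmpty()) {
            return Arrays.copyOf(parts, parts.length - 1);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitTerminated("1234;x;;y;").length);  // 4
        System.out.println(splitTerminated("1234;x;y;;").length);  // 4
    }
}
```

For "1234;x;y;;" this keeps the empty field between the last two semicolons, so both strings from the question yield four elements.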
From http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String)
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
Good question. If you check the API documentation for String.split() and look at the example with "boo:and:foo", you can see that the trailing empty strings are omitted.
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
The string "boo:and:foo", for example, yields the following results with these expressions:
Regex   Result
:       { "boo", "and", "foo" }
o       { "b", "", ":and:f" }
That's the default behavior of the split method in Java: it does not return trailing empty tokens.
s.split(";", -1); should return the empty tokens as well.
Why not check what the documentation says first? Here is the link:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split%28java.lang.String%29
And here is your answer:
Trailing empty strings are therefore not included in the resulting
array.
