Java String split is not working - java

Java experts,
Please look at the split code below and let me know why the last two empty values are not captured.
String test = "1,O1,,,,0.0000,0.0000,,";
String[] splittest = test.split(",");
System.out.println("length -"+splittest.length);
for (String string : splittest) {
System.out.println("value"+string);
}
The result I am getting:
length -7
value1
valueO1
value
value
value
value0.0000
value0.0000
Surprisingly, the length is 7 whereas it should be 9, and as you can see, the values after 0.0000, i.e. the last two empty values, are missing. Now say I change the string test to "1,O1,,,,0.0000,0.0000,0,0":
String test = "1,O1,,,,0.0000,0.0000,0,0";
String[] splittest = test.split(",");
System.out.println("length -"+splittest.length);
for (String string : splittest) {
System.out.println("value"+string);
}
I get the correct result:
length -9
value1
valueO1
value
value
value
value0.0000
value0.0000
value0
value0
I don't think I am doing anything wrong. Is it a bug? Java version: jdk1.6.0_31

It behaves as specified in the javadoc:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
If you want to keep the trailing blank strings, you can use the 2 argument split method with a negative limit:
String[] splittest = test.split(",", -1);
If the limit is non-positive then the pattern will be applied as many times as possible and the array can have any length.
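To make the difference concrete, here is a minimal sketch that compares the two calls on the string from the question. The class name TrailingEmptyDemo and the helper splitKeepingAll are made up for illustration.

```java
class TrailingEmptyDemo {
    // Hypothetical helper: a negative limit keeps trailing empty strings.
    static String[] splitKeepingAll(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String test = "1,O1,,,,0.0000,0.0000,,";
        // Limit 0 (the one-argument default): trailing empty strings are dropped.
        System.out.println(test.split(",").length);       // 7
        // Negative limit: every field is kept, including the two trailing ones.
        System.out.println(splitKeepingAll(test).length); // 9
    }
}
```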

split silently discards trailing empty strings, as specified in the Javadoc.
In general, the behavior of split is kind of weird. Consider using Guava's Splitter instead, which has somewhat more predictable and customizable behavior. (Disclosure: I contribute to Guava.)
Splitter.on(',').split("1,O1,,,,0.0000,0.0000,,");
// returns [1, O1, , , , 0.0000, 0.0000, , ]
Splitter.on(',').omitEmptyStrings()
.split("1,O1,,,,0.0000,0.0000,,");
// returns [1, O1, 0.0000, 0.0000]

As mentioned above, test.split(","); will ignore trailing blank strings. You could use the two parameter method with a large second argument. However, the API also states
If n is non-positive then the pattern will be applied as many times
as possible and the array can have any length.
where n is the second argument. So if you want all the trailing strings, I would recommend
test.split(",", -1);

Related

String.split() returns an array with an additional empty value

I'm working on a piece of code where I've to split a string into individual parts. The basic logic flow of my code is, the numbers below on the LHS, i.e 1, 2 and 3 are ids of an object. Once I split them, I'd use these ids, get the respective value and replace the ids in the below String with its respective values. The string that I have is as follow -
String str = "(1+2+3)>100";
I've used the following code for splitting the string -
String[] arraySplit = str.split("\\>|\\<|\\=");
String[] finalArray = arraySplit[0].split("\\(|\\)|\\+|\\-|\\*");
Now the arrays that I get are as such -
arraySplit = [(1+2+3), >100];
finalArray = [, 1, 2, 3];
So, after the string is split, I'd replace the string with the values, i.e the string would now be, (20+45+50)>100 where 20, 45 and 50 are the respective values. (this string would then be used in SpEL to evaluate the formula)
I'm almost there, just that I'm getting an empty element at the first position. Is there a way to not get the empty element in the second array, i.e finalArray? Doing some research on this, I'm guessing it is splitting the string (1+2+3) and taking an empty element as a part of the string.
If this is the thing, then is there any other method apart from String.split() that would give me the same result?
Edit -
Here, (1+2+3)>100 is just an example. The round braces are part of a formula, and the string could also be as ((1+2+3)*(5-2))>100.
Edit 2 -
After splitting this String and doing some work on it, I'm going to use this string in SpEL. So if there's a better solution by directly using SpEL then that would also be great.
Also, currently I'm using the syntax of the formula as such - (1+2+3) * 4>100 but if there's a way out by changing the formula syntax a bit then that would also be helpful, e.g. replacing the formula by - ({#1}+{#2}+{#3}) * {#4}>100, in this case I'd get the variable using {# as the variable and get the numbers.
I hope this part is clear.
Edit 3 -
Just in case, SpEL is also there in my project although I don't have much idea on it, so if there's a better solution using SpEL then it's more than welcome. The basic logic of the question is written at the start of the question.
If you take a look at the documentation of split(String regex, int limit) (emphasis mine):
When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array.
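Note that the limit parameter does not change this. A limit of 0, which is what the one-argument split uses, only affects trailing empty strings: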
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
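As an alternative sketch that avoids empty elements altogether, the ids can be matched directly instead of splitting on the operators. This assumes every id is a run of digits; the class IdExtractor and the method extractIds are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdExtractor {
    // Hypothetical helper: collects every run of digits in the formula, in order.
    static List<String> extractIds(String formula) {
        List<String> ids = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(formula);
        while (m.find()) {
            ids.add(m.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Take the left-hand side of the comparison, then pull out the ids.
        String lhs = "((1+2+3)*(5-2))>100".split("[<>=]")[0];
        System.out.println(extractIds(lhs)); // [1, 2, 3, 5, 2]
    }
}
```

Because nothing is split on brackets or operators, nested parentheses do not produce empty elements.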
If you keep things really simple, you may be able to get away with using a combination of regular expressions and string operations like split and replace.
However, it looks to me like you'd be better off writing a simple parser using ANTLR.
Take a look at Parsing an arithmetic expression and building a tree from it in Java and https://theantlrguy.atlassian.net/wiki/display/ANTLR3/Five+minute+introduction+to+ANTLR+3
Edit: I haven't used ANTLR in a while - it's now up to version 4, and there may be some significant differences, so make sure that you check the documentation for that version.

Java Split - wrong length

Why am I receiving a length of 3 instead of 4? How can I fix this to give the proper length?
String s="+9851452;;FERRARI;;";
String split[]=s.split("[;]");
System.out.println(split.length);
You're receiving a length of 3 because for split,
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
If you specify a negative limit, it'll work fine:
String s="+9851452;;FERRARI;;";
String split[]=s.split(";", -1);
System.out.println(Arrays.toString(split));
You'll just need to ignore or remove the 5th item, or remove the trailing ; beforehand. The 5th item shows up because 4 delimiters separate 5 (potentially blank) strings. See the docs for more info.
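A minimal sketch of the "remove the trailing ;" option, assuming the input ends with at most one delimiter that should not count as a field. The helper splitFields is a hypothetical name.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class TrailingDelimiterDemo {
    // Hypothetical helper: drops one trailing delimiter, then keeps all remaining fields.
    static String[] splitFields(String s, char delimiter) {
        if (!s.isEmpty() && s.charAt(s.length() - 1) == delimiter) {
            s = s.substring(0, s.length() - 1);
        }
        // Pattern.quote makes the delimiter literal; -1 keeps trailing empty fields.
        return s.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    public static void main(String[] args) {
        String[] fields = splitFields("+9851452;;FERRARI;;", ';');
        System.out.println(fields.length);           // 4
        System.out.println(Arrays.toString(fields)); // [+9851452, , FERRARI, ]
    }
}
```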
String split[]=s.split("[;]", -1);
The answer for WHY it's not working is in the doc: http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#split%28java.lang.String%29 . "Trailing empty strings are therefore not included in the resulting array.".
You can use StringUtils from apache commons lang.
String s="+9851452;;FERRARI;;";
System.out.println(Arrays.toString(StringUtils.splitPreserveAllTokens(s, ";")));

Problems with running String.split("")

I have an Integer which I'm trying to convert into a String[] array, so that I can access the individual digits it is comprised of. Here's my code:
Integer num = 101;
String[] numArray = num.toString().split("");
Why does System.out.println(numArray.length) return 4 not 3?
Edit: To the people downvoting this thread, if you actually read my post, you would understand that I tried to troubleshoot the issue myself before posting it here. I get that there is a downvote trend here because of exams' week and people seeking easy answers, but I personally wasn't.
Because ...
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
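However, leading empty strings are included. You're splitting on "nothing", which matches before the first character in the string, so you get a leading empty string. (This applies to Java 7 and earlier; since Java 8, a zero-width match at the beginning of the string no longer produces a leading empty substring, so the same code yields 3 elements.)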
The same thing would occur if you had, for example:
String foo = ":a:b:c:";
String[] bar = foo.split(":");
bar[0] would be an empty string and bar.length would be 4
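If the goal is only to get the individual digits, one way to sidestep split("") entirely is to walk the characters. This is a sketch assuming a non-negative number; the helper digits is a hypothetical name.

```java
import java.util.Arrays;

class DigitsDemo {
    // Hypothetical helper: one array entry per character of the decimal form.
    static String[] digits(int num) {
        String s = Integer.toString(num);
        String[] out = new String[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = String.valueOf(s.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(digits(101))); // [1, 0, 1]
        // A leading delimiter still yields a leading empty string.
        System.out.println(":a:b:c:".split(":").length);  // 4
    }
}
```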

Why does "split" on an empty string return a non-empty array?

Split on an empty string returns an array of size 1 :
scala> "".split(',')
res1: Array[String] = Array("")
Consider that this returns an empty array:
scala> ",,,,".split(',')
res2: Array[String] = Array()
Please explain :)
If you split an orange zero times, you have exactly one piece - the orange.
The Java and Scala split methods operate in two steps like this:
First, split the string by the delimiter. The natural consequence is that if the string does not contain the delimiter, a singleton array containing just the input string is returned.
Second, remove all the rightmost empty strings. This is the reason ",,,".split(",") returns an empty array.
According to this, the result of "".split(",") should be an empty array because of the second step, right?
It should. Unfortunately, this is an artificially introduced corner case. And that is bad, but at least it is documented in java.util.regex.Pattern, if you remember to take a look at the documentation:
For n == 0, the result is as for n < 0, except trailing empty strings
will not be returned. (Note that the case where the input is itself an
empty string is special, as described above, and the limit parameter
does not apply there.)
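The two steps and the special case can be sketched in Java as follows. The helper naiveSplit is a hypothetical re-implementation of "split everything, then drop trailing empty strings"; it is not how the JDK is written, it only shows where the real method deviates for an empty input.

```java
import java.util.Arrays;

class TwoStepSplit {
    // Hypothetical helper: step 1 keeps every piece, step 2 drops trailing empty strings.
    static String[] naiveSplit(String input, String regex) {
        String[] parts = input.split(regex, -1);
        int end = parts.length;
        while (end > 0 && parts[end - 1].isEmpty()) {
            end--;
        }
        return Arrays.copyOf(parts, end);
    }

    public static void main(String[] args) {
        // For ordinary inputs the two-step model matches the real method.
        System.out.println(naiveSplit(",,,,", ",").length); // 0
        System.out.println(",,,,".split(",").length);       // 0
        // For the empty string the real method deviates from the model.
        System.out.println(naiveSplit("", ",").length);     // 0
        System.out.println("".split(",").length);           // 1
    }
}
```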
Solution 1: Always pass -1 as the second parameter
So, I advise you to always pass n == -1 as the second parameter (this will skip step two above), unless you specifically know what you want to achieve / you are sure that the empty string is not something that your program would get as an input.
Solution 2: Use Guava Splitter class
If you are already using Guava in your project, you can try the Splitter (documentation) class. It has a very rich API, and makes your code very easy to understand.
Splitter.on(".").split(".a.b.c.") // "", "a", "b", "c", ""
Splitter.on(",").omitEmptyStrings().split("a,,b,,c") // "a", "b", "c"
Splitter.on(CharMatcher.anyOf(",.")).split("a,b.c") // "a", "b", "c"
Splitter.onPattern("=>?").split("a=b=>c") // "a", "b", "c"
Splitter.on(",").limit(2).split("a,b,c") // "a", "b,c"
Splitting an empty string returns the empty string as the first element. If no delimiter is found in the target string, you will get an array of size 1 that is holding the original string, even if it is empty.
For the same reason that
",test" split ','
and
",test," split ','
will return an array of size 2. Everything before the first match is returned as the first element.
"a".split(",") -> "a"
therefore
"".split(",") -> ""
In all programming languages I know a blank string is still a valid String. So doing a split using any delimiter will always return a single element array where that element is the blank String. If it was a null (not blank) String then that would be a different issue.
This split behavior is inherited from Java, for better or worse...
Scala does not override the definition from the String primitive.
Note, that you can use the limit argument to modify the behavior:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
i.e. you can set the limit=-1 to get the behavior of (all?) other languages:
# ",a,,b,,".split(",")
res1: Array[String] = Array("", "a", "", "b")
# ",a,,b,,".split(",", -1) // limit=-1
res2: Array[String] = Array("", "a", "", "b", "", "")
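The same comparison in Java, as a small sketch covering a positive, a zero, and a negative limit. The class LimitDemo and the helper lengthWithLimit are made-up names.

```java
import java.util.Arrays;

class LimitDemo {
    // Hypothetical helper: number of pieces produced for a given limit.
    static int lengthWithLimit(String s, int limit) {
        return s.split(",", limit).length;
    }

    public static void main(String[] args) {
        String s = ",a,,b,,";
        // Positive limit: at most 3 pieces, the last one holds the unsplit rest.
        System.out.println(lengthWithLimit(s, 3));      // 3
        System.out.println(s.split(",", 3)[2]);         // ,b,,
        // Zero: split everywhere, then drop trailing empty strings.
        System.out.println(lengthWithLimit(s, 0));      // 4
        // Negative: split everywhere and keep everything.
        System.out.println(lengthWithLimit(s, -1));     // 6
        System.out.println(Arrays.toString(s.split(",", -1)));
    }
}
```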
It seems to be well known that the Java behavior is quite confusing, but:
The behavior above can be observed from at least Java 5 to Java 8.
There was an attempt to change the behavior to return an empty array when splitting an empty string in JDK-6559590. However, it was soon reverted in JDK-8028321 when it caused regressions in various places. The change never made it into the initial Java 8 release.
Note: The split method wasn't in Java from the beginning (it's not in 1.0.2) but actually is there from at least 1.4 (e.g. see JSR51 circa 2002). I am still investigating...
What's unclear is why Java chose this in the first place (my suspicion is that it was originally an oversight/bug in an "edge case"), but it is now irrevocably baked into the language and so it remains.
Empty strings have no special status while splitting a string. You may use:
Some(str)
.filter(_ != "")
.map(_.split(","))
.getOrElse(Array())
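You can use this function:
// Requires java.util.ArrayList, java.util.Arrays and java.util.Optional.
public static ArrayList<String> split(String body) {
// A null or empty body falls back to ",", which splits into an empty array.
return new ArrayList<>(Arrays.asList(Optional.ofNullable(body).filter(a->!a.isEmpty()).orElse(",").split(",")));
}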

foo.split(',').length != number of ',' found in 'foo'?

Maybe it's because it's end of day on a Friday, and I have already found a work-around, but this is killing me.
I am using Java but am a .NET developer.
I have a string and I need to split it on commas. Let's say it's a row in a CSV file that has 210 columns. line.split(',').length will sometimes be 199, whereas the count of ',' will be 208 or 209. I counted in two different ways just to be sure (using a regex, then, after losing my sanity, manually looping through and checking each character).
What's the super-obvious-hit-face-on-desk thing I'm missing here? Why isn't foo.split(delim).length == CountOfOccurences(foo,delim) all the time, only sometimes?
thanks much
First, there's an obvious difference of one. If there are 200 columns, all with text, there are 199 commas. Second, Java drops trailing empty strings by default. You can change this by passing a negative number as the second argument.
"foo,,bar,baz,,".split(",")
is:
{foo,,bar,baz}
an array of 4 elements. But
"foo,,bar,baz,,".split(",", -1)
is:
{foo,,bar,baz,,}
with all 6.
Note that only trailing empty strings are dropped by default.
Finally, don't forget that the String is compiled into a regex. This is not applicable here, since , is not a special character, but you should keep it in mind.
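A short sketch of why the regex point matters, using "." as an example of a delimiter that is a regex metacharacter. The helper splitLiteral is a hypothetical name; Pattern.quote is the standard way to treat a string as literal text.

```java
import java.util.regex.Pattern;

class RegexDelimiterDemo {
    // Hypothetical helper: treats the delimiter as literal text, not as a regex.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // "." matches any character, so every piece is empty and all are dropped.
        System.out.println("a.b.c".split(".").length);         // 0
        // Quoting the delimiter gives the three expected fields.
        System.out.println(splitLiteral("a.b.c", ".").length); // 3
    }
}
```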
There are a couple things happening. First, if you have three items like a,b,c and split on comma, you'll have three entries, one more than the number of commas.
But what you're dealing with probably comes from consecutive delimiters, e.g.: a,,,,b,c,,,,,
The ones at the end get dropped. Check the java documentation for the split function.
http://download.java.net/jdk7/docs/api/java/lang/String.html
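Both effects can be checked on that example row with a small sketch. The helper countChar is a hypothetical name; it simply counts the delimiter by hand.

```java
class FieldCountDemo {
    // Hypothetical helper: counts occurrences of a single character.
    static int countChar(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String line = "a,,,,b,c,,,,,";
        System.out.println(countChar(line, ','));       // 10
        // Default limit: the five trailing empty fields are dropped.
        System.out.println(line.split(",").length);     // 6
        // Negative limit: fields == delimiters + 1.
        System.out.println(line.split(",", -1).length); // 11
    }
}
```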
As others have pointed out, String.split has some very non-intuitive behaviour.
If you're using Google's Guava open-source Java library, there's a Splitter class which gives a much nicer (in my opinion) API for this, with more flexibility:
String input = "foo, bar,";
Splitter.on(',').split(input);
// returns "foo", " bar", ""
Splitter.on(',').omitEmptyStrings().split(input);
// returns "foo", " bar"
Splitter.on(',').omitEmptyStrings().trimResults().split(input);
// returns "foo", "bar"
Is it omitting blanks?
Do you have something like "a,b,c,,d,e" or trailing delimiters like "a,b,c,,,,"?
Are there extra delimiters in the cell data?
Short example: foo = "1,2" and
foo.split(",").length = 2
count(foo, ",") = 1
You can check both counts directly. Here is an example in Java:
// Requires java.util.regex.Matcher and java.util.regex.Pattern.
String row = "1,2,3,4,,5"; // second example: 1,2,3,5,,
System.out.println(row.split(",").length); // prints 6 for the first example, 4 for the second (trailing empty strings are dropped)
// count how many commas the row contains
Pattern pattern = Pattern.compile(",");
Matcher m = pattern.matcher(row);
int nr = 0;
while(m.find())
{
nr++;
}
System.out.println(nr); // prints 5 for both examples
