Problems with running String.split("")


I have an Integer which I'm trying to convert into a String[] array, so that I can access the individual digits it is comprised of. Here's my code:
Integer num = 101;
String[] numArray = num.toString().split("");
Why does System.out.println(numArray.length) print 4 and not 3?
Edit: To the people downvoting this thread, if you actually read my post, you would understand that I tried to troubleshoot the issue myself before posting it here. I get that there is a downvote trend here because of exams' week and people seeking easy answers, but I personally wasn't.

Because ...
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
However, leading empty strings are included. You're splitting on "nothing" which matches before the first character in the string, so you get a leading empty string.
The same thing would occur if you had, for example:
String foo = ":a:b:c:";
String[] bar = foo.split(":");
bar[0] would be an empty string and bar.length would be 4
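A minimal sketch of the leading versus trailing behaviour described above. The class and method names are made up for illustration. One thing to keep in mind: from Java 8 onward a zero-width match at the very start of the input no longer produces a leading empty string, so the 4 reported in the question is what older JDKs give.

```java
import java.util.Arrays;

class SplitEdges {
    // Splits with the default limit of 0: trailing empty strings are dropped,
    // while a leading empty string from a positive-width match is kept.
    static String[] splitOnColon(String s) {
        return s.split(":");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnColon(":a:b:c:"))); // [, a, b, c]
        // Zero-width delimiter: 3 on Java 8 and later, 4 on Java 7 and earlier.
        System.out.println("101".split("").length);
    }
}
```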

Related

Why is my String array length 3 instead of 2?

I'm trying to understand regex. I wanted to make a String[] using split to show me how many letters are in a given string expression?
import java.util.*;
import java.io.*;
public class Main {
    public static String simpleSymbols(String str) {
        String result = "";
        String[] alpha = str.split("[\\+\\w\\+]");
        int alphaLength = alpha.length;
        // System.out.print(alphaLength);
        String[] charCount = str.split("[a-z]");
        int charCountLength = charCount.length;
        System.out.println(charCountLength);
        return result; // the method is declared to return a String
    }
}
My input string is "+d+=3=+s+". I split the string to count the number of letters in string. The array length should be two but I'm getting three. Also, I'm trying to make a regex to check the pattern +b+, with b being any letter in the alphabet? Is that correct?

So, a few things pop out to me:
First, your regex looks correct. If you're ever worried about how your regex will perform, you can use https://regexr.com/ to check it out. Just put your regex on the top and enter your string in the bottom to see if it is matching correctly
Second, upon close inspection, I see you're using the split function. While it is convenient for quickly splitting strings, you need to be careful as to what you are splitting on. In this case, you're removing all of the strings that you were initially looking at, which would make it impossible to find. If you print it out, you would notice that the following shows (for an input string of +d+=3=+s+):
+
+=3=+
+
Which shows that you accidentally cut out what you were looking to find in the first place. Now, there are several ways of fixing this, depending on what your criteria is.
Now, if what you wanted was just to separate on all +s and it doesn't matter that you find only what is directly bounded by +s, then split works awesome. Just do str.split("\\+") (the + has to be escaped because split takes a regex), and this will return an array with the following (for +d+=3=+s+), preceded by one empty string because the input starts with a +:
d
=3=
s
However, you can see that this poses a few problems. First, it doesn't strip out the =3= that we don't want, and second, it does not truly give us values that are surrounded by a +_+ format, where the underscore represents the string/char you're looking for.
Seeing as you're using \w, you intend to find word characters that are surrounded by +s. However, if you're just looking to find one letter, I would suggest using a more specific class like [a-z] or [a-zA-Z]. If you want to find multiple alphabetical characters, you can add a * (0 or more) or a + (1 or more) after the class to dictate what exactly you're looking for.
I won't give you the answer outright, but I'll give you a clue as to what to move towards. Try using a pattern and a matcher to find the regex that you listed above and then if you find a match, make sure to store it somewhere :)
Also, for future reference, you should always start a function name with a lowercase letter, at least in Java. Only constants and class names should start with a capital :)

I am trying to use split to count the number of letters in that string. The array length should be two, but I'm getting three.
The regex passed to split is used as the delimiter and will not appear in the results. In your case 'str.split("[a-z]")' means using letters as delimiters to separate your input string, which yields three substrings: "(+)|d|(+=3=+)|s|(+)".
If you really want to use "split", 'str.split("[^a-z]+")' splits on runs of non-letters instead, but the result still starts with an empty string whenever the input begins with a delimiter, so the array length is not a reliable letter count. I would recommend using "java.util.regex.Matcher.find()" in order to find all the letters.
Also, I'm trying to make a regex to check the pattern +b+, with b being any letter in the alphabet? Is that correct?
Similarly, check the functions in "java.util.regex.Matcher".
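A rough sketch of the Matcher approach suggested above, assuming "letter" means a lowercase ASCII letter and that the +b+ check means every letter must have a + directly on both sides. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterCount {
    // Counts lowercase ASCII letters by walking over the regex matches.
    static int countLetters(String s) {
        Matcher m = Pattern.compile("[a-z]").matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True if every letter has a '+' immediately before and after it.
    // Lookarounds are used so that neighbours such as "+a+b+" can share a '+'.
    // A string without any letters also returns true.
    static boolean allLettersWrapped(String s) {
        Matcher m = Pattern.compile("(?<=\\+)[a-z](?=\\+)").matcher(s);
        int wrapped = 0;
        while (m.find()) {
            wrapped++;
        }
        return wrapped == countLetters(s);
    }

    public static void main(String[] args) {
        System.out.println(countLetters("+d+=3=+s+"));      // 2
        System.out.println(allLettersWrapped("+d+=3=+s+")); // true
        System.out.println(allLettersWrapped("+d+=3=s+"));  // false
    }
}
```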

String.split() returns an array with an additional empty value

I'm working on a piece of code where I've to split a string into individual parts. The basic logic flow of my code is, the numbers below on the LHS, i.e 1, 2 and 3 are ids of an object. Once I split them, I'd use these ids, get the respective value and replace the ids in the below String with its respective values. The string that I have is as follow -
String str = "(1+2+3)>100";
I've used the following code for splitting the string -
String[] arraySplit = str.split("\\>|\\<|\\=");
String[] finalArray = arraySplit[0].split("\\(|\\)|\\+|\\-|\\*");
Now the arrays that I get are as such -
arraySplit = [(1+2+3), 100];
finalArray = [, 1, 2, 3];
So, after the string is split, I'd replace the string with the values, i.e the string would now be, (20+45+50)>100 where 20, 45 and 50 are the respective values. (this string would then be used in SpEL to evaluate the formula)
I'm almost there, just that I'm getting an empty element at the first position. Is there a way to not get the empty element in the second array, i.e finalArray? Doing some research on this, I'm guessing it is splitting the string (1+2+3) and taking an empty element as a part of the string.
If this is the thing, then is there any other method apart from String.split() that would give me the same result?
Edit -
Here, (1+2+3)>100 is just an example. The round braces are part of a formula, and the string could also be as ((1+2+3)*(5-2))>100.
Edit 2 -
After splitting this String and doing some code over it, I'm going to use this string in SpEL. So if there's a better solution by directly using SpEL then also it would be great.
Also, currently I'm using the syntax of the formula as such - (1+2+3) * 4>100 but if there's a way out by changing the formula syntax a bit then that would also be helpful, e.g replacing the formula by - ({#1}+{#2}+{#3}) * {#4}>100, in this case I'd get the variable using {# as the variable and get the numbers.
I hope this part is clear.
Edit 3 -
Just in case, SpEL is also there in my project although I don't have much idea on it, so if there's a better solution using SpEL then it's more than welcome. The basic logic of the question is written at the starting of the question in bold.

If you take a look at the documentation of split(String regex, int limit) (emphasis mine):
When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array.
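Thus the empty first element comes from the "(" that matches at the very start of "(1+2+3)". Note that the limit parameter does not remove it; a limit of zero (which the one-argument split already uses) only affects trailing empty strings: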
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
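Since no limit value drops a leading empty string, one possible workaround is to filter the empty tokens out after splitting. This is only a sketch; the class and method names are invented here, and the delimiter set is taken from the question's second split call.

```java
import java.util.Arrays;

class FormulaIds {
    // Splits on brackets and arithmetic operators, then discards the empty
    // tokens that appear wherever two delimiters are adjacent or one of them
    // starts the string.
    static String[] operands(String expr) {
        return Arrays.stream(expr.split("[()+\\-*]"))
                .filter(tok -> !tok.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(operands("(1+2+3)")));         // [1, 2, 3]
        System.out.println(Arrays.toString(operands("((1+2+3)*(5-2))"))); // [1, 2, 3, 5, 2]
    }
}
```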

If you keep things really simple, you may be able to get away with using a combination of regular expressions and string operations like split and replace.
However, it looks to me like you'd be better off writing a simple parser using ANTLR.
Take a look at Parsing an arithmetic expression and building a tree from it in Java and https://theantlrguy.atlassian.net/wiki/display/ANTLR3/Five+minute+introduction+to+ANTLR+3
Edit: I haven't used ANTLR in a while - it's now up to version 4, and there may be some significant differences, so make sure that you check the documentation for that version.

Why does "split" on an empty string return a non-empty array?

Split on an empty string returns an array of size 1 :
scala> "".split(',')
res1: Array[String] = Array("")
Consider that this returns empty array:
scala> ",,,,".split(',')
res2: Array[String] = Array()
Please explain :)

If you split an orange zero times, you have exactly one piece - the orange.

The Java and Scala split methods operate in two steps like this:
First, split the string by delimiter. The natural consequence is that if the string does not contain the delimiter, a singleton array containing just the input string is returned,
Second, remove all the rightmost empty strings. This is the reason ",,,".split(",") returns empty array.
According to this, the result of "".split(",") should be an empty array because of the second step, right?
It should. Unfortunately, this is an artificially introduced corner case. And that is bad, but at least it is documented in java.util.regex.Pattern, if you remember to take a look at the documentation:
For n == 0, the result is as for n < 0, except trailing empty strings
will not be returned. (Note that the case where the input is itself an
empty string is special, as described above, and the limit parameter
does not apply there.)
Solution 1: Always pass -1 as the second parameter
So, I advise you to always pass n == -1 as the second parameter (this will skip step two above), unless you specifically know what you want to achieve / you are sure that the empty string is not something that your program would get as an input.
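To make the effect of the limit argument concrete, here is a small Java sketch (the class and method names are just for illustration) comparing limit 0 with limit -1 on the inputs from the question:

```java
class SplitLimit {
    // Number of elements produced when splitting on "," with the given limit.
    static int parts(String s, int limit) {
        return s.split(",", limit).length;
    }

    public static void main(String[] args) {
        System.out.println(parts("", 0));      // 1: the special case, no delimiter found
        System.out.println(parts(",,,,", 0));  // 0: all five empty strings are trailing
        System.out.println(parts(",,,,", -1)); // 5: nothing is discarded
        System.out.println(parts("", -1));     // 1: still the original empty string
    }
}
```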
Solution 2: Use Guava Splitter class
If you are already using Guava in your project, you can try the Splitter (documentation) class. It has a very rich API, and makes your code very easy to understand.
Splitter.on(".").split(".a.b.c.") // "", "a", "b", "c", ""
Splitter.on(",").omitEmptyStrings().split("a,,b,,c") // "a", "b", "c"
Splitter.on(CharMatcher.anyOf(",.")).split("a,b.c") // "a", "b", "c"
Splitter.onPattern("=>?").split("a=b=>c") // "a", "b", "c"
Splitter.on(",").limit(2).split("a,b,c") // "a", "b,c"

Splitting an empty string returns the empty string as the first element. If no delimiter is found in the target string, you will get an array of size 1 that is holding the original string, even if it is empty.

For the same reason that
",test" split ','
and
",test," split ','
will return an array of size 2. Everything before the first match is returned as the first element.

"a".split(",") -> "a"
therefore
"".split(",") -> ""

In all programming languages I know a blank string is still a valid String. So doing a split using any delimiter will always return a single element array where that element is the blank String. If it was a null (not blank) String then that would be a different issue.

This split behavior is inherited from Java, for better or worse...
Scala does not override the definition from the String primitive.
Note, that you can use the limit argument to modify the behavior:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
i.e. you can set the limit=-1 to get the behavior of (all?) other languages:
# ",a,,b,,".split(",")
res1: Array[String] = Array("", "a", "", "b")
# ",a,,b,,".split(",", -1) // limit=-1
res2: Array[String] = Array("", "a", "", "b", "", "")
It seems to be well known that the Java behavior is quite confusing, but:
The behavior above can be observed from at least Java 5 to Java 8.
There was an attempt to change the behavior to return an empty array when splitting an empty string in JDK-6559590. However, it was soon reverted in JDK-8028321 when it caused regressions in various places. The change never made it into the initial Java 8 release.
Note: The split method wasn't in Java from the beginning (it's not in 1.0.2) but actually is there from at least 1.4 (e.g. see JSR51 circa 2002). I am still investigating...
What's unclear is why Java chose this in the first place (my suspicion is that it was originally an oversight/bug in an "edge case"), but now irrevocably baked into the language and so it remains.

Empty strings have no special status while splitting a string. You may use:
Some(str)
.filter(_ != "")
.map(_.split(","))
.getOrElse(Array())

Use this function:
public static ArrayList<String> split(String body) {
return new ArrayList<>(Arrays.asList(Optional.ofNullable(body).filter(a->!a.isEmpty()).orElse(",").split(",")));
}

foo.split(',').length != number of ',' found in 'foo'?

Maybe it's because it's end of day on a Friday, and I have already found a work-around, but this is killing me.
I am using Java but am a .NET developer.
I have a string and I need to split it on comma. Let's say it's a row in a CSV file that has 200 210 columns. line.split(',').length will sometimes be 199, where the count of ',' will be 208 or 209. I even find the count in 2 different ways to be sure (using a regex, then, after losing my sanity, manually looping through and checking each character).
What's the super-obvious-hit-face-on-desk thing I'm missing here? Why isn't foo.split(delim).length == CountOfOccurences(foo,delim) all the time, only sometimes?
thanks much

First, there's an obvious difference of one. If there are 200 columns, all with text, there are 199 commas. Second, Java drops trailing empty strings by default. You can change this by passing a negative number as the second argument.
"foo,,bar,baz,,".split(",")
is:
{foo,,bar,baz}
an array of 4 elements. But
"foo,,bar,baz,,".split(",", -1)
is:
{foo,,bar,baz,,}
with all 6.
Note that only trailing empty strings are dropped by default.
Finally, don't forget that the String is compiled into a regex. This is not applicable here, since , is not a special character, but you should keep it in mind.
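Putting those points together, a sketch like the following keeps the field count equal to the number of delimiters plus one. The helper names are hypothetical; Pattern.quote is used so that a delimiter which is special in a regex (such as |) is treated literally.

```java
import java.util.regex.Pattern;

class DelimCount {
    // Counts how often the delimiter character occurs in the string.
    static int countOccurrences(String s, char delim) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == delim) {
                n++;
            }
        }
        return n;
    }

    // Splits on the literal delimiter and keeps trailing empty fields.
    static int fieldCount(String s, String delim) {
        return s.split(Pattern.quote(delim), -1).length;
    }

    public static void main(String[] args) {
        String row = "foo,,bar,baz,,";
        System.out.println(countOccurrences(row, ',')); // 5
        System.out.println(fieldCount(row, ","));       // 6
        System.out.println(fieldCount("a|b||", "|"));   // 4
    }
}
```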

There are a couple things happening. First, if you have three items like a,b,c and split on comma, you'll have three entries, one more than the number of commas.
But what you're dealing with probably comes from consecutive delimiters, e.g.: a,,,,b,c,,,,,
The ones at the end get dropped. Check the Java documentation for the split function.
http://download.java.net/jdk7/docs/api/java/lang/String.html

As others have pointed out, String.split has some very non-intuitive behaviour.
If you're using Google's Guava open-source Java library, there's a Splitter class which gives a much nicer (in my opinion) API for this, with more flexibility:
String input = "foo, bar,";
Splitter.on(',').split(input);
// returns "foo", " bar", ""
Splitter.on(',').omitEmptyStrings().split(input);
// returns "foo", " bar"
Splitter.on(',').omitEmptyStrings().trimResults().split(input);
// returns "foo", "bar"

Is it omitting blanks?
Do you have something like "a,b,c,,d,e" or trailing delimiters like "a,b,c,,,,"?
Are there extra delimiters in the cell data?

Short example: foo = "1,2" and
foo.split(",").length = 2
count(foo, ",") = 1
Probably you have a mistake in your code. Here is an example in Java code:
String row = "1,2,3,4,,5"; // second example: 1,2,3,5,,
System.out.println(row.split(",").length); // prints 6 for the first example, 4 for the second (trailing empty strings are dropped)
// code to count how many , you have in your row
Pattern pattern = Pattern.compile(",");
Matcher m = pattern.matcher(row);
int nr = 0;
while (m.find())
{
nr++;
}
System.out.println(nr); // prints 5 in both cases

How do I split a concatenated string into multiple floating point values?

I'm a beginner in Java. I have
packet=090209153038020734.0090209153039020734.0
like this I want to split this string and store into an array like two strings:
1) 090209153038020734.0
2) 090209153039020734.0
I have done like this:
String packetArray[] = packets.split(packets,Constants.SF);
Where:
Constants.SF=0x01.
But it won't work.
Please help me.

I'd think twice about using split since those are obviously fixed width fields.
I've seen them before on another question here (several in fact so I'm guessing this may be homework (or a popular data collection device :-)) and it's plain that the protocol is:
STX (0x01).
0x0f.
date (YYMMDD or DDMMYY).
time (HHMMSS).
0x02.
value (XXXXXX.X).
0x03.
0x04.
And, given that they're fixed width, you should probably just use substrings to get the information out.

The JavaDoc of String is helpful here: http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html
You have your String packet;
String.indexOf(String) gives you the position of a given substring. You're interested in the "." sign. So you write
int position = packet.indexOf(".") + 2;
+2 because you want the "." itself and the trailing decimal digit too. It will return something 20-ish and will be the position just past the end of the first number.
Then we use substring
String first = packet.substring(0, position) will give you everything up to and including the ".0"
String second = packet.substring(position) should give you everything after the ".0" up to the end of the string.
Now if you want them explicitly in an array you can just put them there. The code as a whole (it assumes nothing else sits between the two values):
int position = packet.indexOf(".") + 2;
String first = packet.substring(0, position);
String second = packet.substring(position);
String[] packetArray = new String[2];
packetArray[0] = first;
packetArray[1] = second;

String packetArray[] = packets.split("\u0001");
should work. You are using
public String[] split(String regex, int limit)
which is doing something else: It makes sure that split() returns an array with at most limit members (1 in this case, so you get what you ask for).

You need to read the Javadocs for the String.split() methods...you are calling the version of String.split() that takes a regular expression and a limit, but you are passing the string itself as the first parameter, which doesn't really make sense.
As Aaron Digulla mentioned, use the other version.
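A short sketch of the difference between the two overloads, using made-up sample data in which the values are separated by the 0x01 character. The class and method names are hypothetical.

```java
class PacketSplit {
    // One-argument split: the 0x01 character is the delimiter.
    static String[] splitPackets(String packets) {
        return packets.split("\u0001");
    }

    public static void main(String[] args) {
        String packets = "12.0\u000113.0"; // hypothetical sample data
        System.out.println(splitPackets(packets).length);      // 2
        // Two-argument split with limit 1: at most one element, so the
        // input comes back unsplit.
        System.out.println(packets.split("\u0001", 1).length); // 1
    }
}
```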

You don't say how you want to do the split. It could be based on a fixed length (number of characters), or you may want one decimal place.
If the former, you could do packetArray = new String[]{packet.substring(0, 20), packet.substring(20)};
If the latter:
int dotIndex = packet.indexOf('.');
packetArray = new String[]{packet.substring(0, dotIndex+2), packet.substring(dotIndex+2)};
Your solution confuses the regexp with the string.
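If the string may hold more than two values, another option is to pull each value out with a Matcher instead of computing indexes by hand. This is only a sketch: it assumes every value is a run of digits followed by a dot and exactly one decimal digit, as in the question's sample, and the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PacketValues {
    // Collects every "digits, dot, one digit" run found in the input.
    static List<String> values(String packet) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+\\.\\d").matcher(packet);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String packet = "090209153038020734.0090209153039020734.0";
        System.out.println(values(packet)); // [090209153038020734.0, 090209153039020734.0]
    }
}
```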

split uses regular expressions as documented here. Your code seems to be trying to match the whole string Constants.SF = 0x01 times, which doesn't make much sense. If you know what char the boxes are then you can use something like {[^c]+cc} where c is the character of the box (I guess this is 0x01), to match each "packet".
I think you are trying to use it like the .NET String.Split(...) function?
