java split with unknown result - java

For the following code, why are there empty strings at positions 2, 3, 5, 6 and 8?
And why do "b", ":and:f" and "1" have no empty string after them?
String[] splitStrs = "booo:and:fooo1o".split("o", -1);
System.out.println(splitStrs.length);
for (int i = 0; i < splitStrs.length; i++) {
    System.out.println("\"" + splitStrs[i] + "\"");
}
output is:
8
"b"
""
""
":and:f"
""
""
"1"
""

Why is there an empty string at positions 2, 3, 5, 6 and 8?
When splitting on "o", there is nothing between consecutive o's in "ooo", hence the empty strings.
Then why do "b", ":and:f" and "1" have no empty string after them?
But there is an empty string at the end of your output, i.e. after "1".
Per the documentation, a negative second argument means that trailing empty strings are not discarded.
Always read the docs.
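A minimal sketch comparing the default call with the negative-limit call on the string from the question. The class and method names (TrailingEmptyDemo, splitOnO) are only illustrative.

```java
import java.util.Arrays;

final class TrailingEmptyDemo {
    // Splits on the literal "o". With keepTrailing the limit is -1, so trailing
    // empty strings stay in the result; otherwise the default (limit 0) drops them.
    static String[] splitOnO(String input, boolean keepTrailing) {
        return keepTrailing ? input.split("o", -1) : input.split("o");
    }

    public static void main(String[] args) {
        String input = "booo:and:fooo1o";
        String[] all = splitOnO(input, true);
        String[] trimmed = splitOnO(input, false);
        System.out.println(all.length + " " + Arrays.toString(all));         // length is 8
        System.out.println(trimmed.length + " " + Arrays.toString(trimmed)); // length is 7
    }
}
```

Only the empty string after the final "o" differs between the two results; the empty strings in the middle are kept in both cases.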

The split method finds every occurrence of the wanted character ("o" in your example), puts the substring between the current "o" and the next "o" (without the "o" itself) into an array, and continues through the whole string.
When you have, for example, "oo", the result is "" because there is nothing between those two "o" characters.
Let's take an example. You have the string "Oh, hello Anna! I havent seen you since 2010s!" and split it at every "a" character.
Start from the first character and find the next letter "a", which is the 14th character (index 13). Take the part of the string from the start up to that "a" and add it to the array. The first element will be "Oh, hello Ann" ("A" and "a" are different characters). Then continue from that "a" and find the next "a", which is the 20th character (index 19) in our example. Take the part of the string between the first and second "a" and copy it into the array. The procedure continues until the end of the string.
Result will be:
"Oh, hello Ann"
"! I h"
"vent seen you since 2010s!"
If we split the same string on every "n", using the same logic, we get:
"Oh, hello A"
""
"a! I have"
"t see"
" you si"
"ce 2010s!"
The reason for the empty string in the second position is that in "...Anna..." there is nothing between the two "n" characters.
Some examples can be found on: https://www.geeksforgeeks.org/split-string-java-examples/
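The procedure described above can be sketched by hand with indexOf. This is only an illustration of the idea for a single-character delimiter, not how String.split is actually implemented; the names ManualSplit and splitOnChar are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class ManualSplit {
    // Scans for the delimiter and keeps every piece, including empty ones,
    // which mirrors what split(regex, -1) returns for a one-character literal.
    static List<String> splitOnChar(String text, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int hit = text.indexOf(delimiter, start);
        while (hit >= 0) {
            parts.add(text.substring(start, hit)); // text between two delimiters
            start = hit + 1;                        // continue right after the delimiter
            hit = text.indexOf(delimiter, start);
        }
        parts.add(text.substring(start));           // remainder after the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        String text = "Oh, hello Anna! I havent seen you since 2010s!";
        System.out.println(splitOnChar(text, 'a'));
        System.out.println(splitOnChar(text, 'n'));
    }
}
```

When two delimiters are adjacent, start and hit are equal, so substring(start, hit) is the empty string. That is where the "" entries come from.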

public String[] split(String regex, int limit)
Parameters:
regex – a delimiting regular expression
limit – the result threshold
The limit parameter can take three kinds of values:
limit > 0 – The pattern is applied at most limit-1 times, the resulting array's length is at most limit, and the array's last entry contains all input beyond the last matched delimiter.
limit < 0 – The pattern is applied as many times as possible, and the resulting array can be of any size.
limit = 0 – The pattern is applied as many times as possible, the resulting array can be of any size, and trailing empty strings are discarded.
Please visit the GeeksforGeeks site for more information about splitting.
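To see the three cases side by side, here is a small sketch on an input chosen for illustration ("a,b,c,,"); the wrapper splitWithLimit and the class name are hypothetical and only forward to String.split.

```java
import java.util.Arrays;

final class LimitDemo {
    // Thin wrapper so the three limit cases can be compared on the same input.
    static String[] splitWithLimit(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    public static void main(String[] args) {
        String input = "a,b,c,,";
        for (int limit : new int[] {2, 0, -1}) {
            String[] parts = splitWithLimit(input, ",", limit);
            System.out.println("limit " + limit + " -> " + parts.length + " " + Arrays.toString(parts));
        }
    }
}
```

With limit 2 the second element is the unsplit remainder "b,c,,"; with 0 the two trailing empty strings are dropped; with -1 they are kept.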
"booo:and:fooo1o".split("o", -1);
Why is there an empty string at positions 2, 3, 5, 6 and 8?
Since the limit is -1, the string is split as many times as possible. Each match of the regex "o" yields the text between it and the previous match; where two matches are adjacent there is no text in between, so an empty string is returned.
Then why do "b", ":and:f" and "1" have no empty string after them?
There is no empty string there because there are characters between the previous match and the next "o".
See this example:
class Split {
    public static void main(final String... $) {
        var out = System.out;
        final String s = "ooa";
        for (final String str : s.split("o", -1))
            out.println("\"" + str + "\"");
    }
}
Output:
$ javac Split.java && java Split
""
""
"a"
Why this output?
The first match happens at index 0, which is also the very first character. split returns the string before the match, and since there is nothing before it, that is an empty string.
The second match, the o at index 1, yields the string after the first match and before index 1. Since there are no characters there, it is again an empty string.
After that, it returns the remainder, "a".

Related

How could the split() for Strings in Java be explained when two equal characters reside adjacently

In the docs for String.split, the following examples are given. How can the last example be explained?
The string "boo:and:foo", for example, yields the following results with these parameters:
Regex  Limit  Result
:      2      { "boo", "and:foo" }
:      5      { "boo", "and", "foo" }
:      -2     { "boo", "and", "foo" }
o      5      { "b", "", ":and:f", "", "" }
o      -2     { "b", "", ":and:f", "", "" }
o      0      { "b", "", ":and:f" }
Read the docs carefully:
The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string.
Note that the empty string is a substring of any string, including boo:and:foo. If you do "boo:and:foo".substring(2, 2), you will get the empty string. The empty string between the first two o's is followed by (i.e. "is terminated by") the substring "o" (the second o). The substring "o" matches the regex "o", so the empty string fulfils the requirement:
is terminated by another substring that matches the given expression or is terminated by the end of the string
So it gets put into the resulting array.
The empty string after the second-to-last o also fulfils this criterion, and the empty string after the last o "is terminated by the end of the string". They should have been added to the array, and the array would have looked like:
{ "b", "", ":and:f", "", "" }
However, they are discarded from the array, because:
If [limit] is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
"Trailing empty strings" refers to the last two "" elements in the array, which get discarded.

Java - Why does string split for empty string give me a non empty array?

I want to split a String by a space. When I use an empty string, I expect to get an array of zero strings. Instead, I get an array containing a single empty string. Why?
public static void main(String [] args){
String x = "";
String [] xs = x.split(" ");
System.out.println("strings :" + xs.length);//prints 1 instead of 0.
}
The single element of the array is in fact the empty string. This makes sense, because the split on " " finds no match, and hence you just get back the input you started with. As a rule of thumb, if the pattern matches nothing, the result is a single element containing the original string.
An interesting puzzle indeed:
> "".split(" ")
String[1] { "" }
> " ".split(" ")
String[0] { }
The question is, when you split the empty string, why does the result contain the empty string, and when you split a space, why does the result not contain anything? It seems inconsistent, but all is explained in the documentation.
The String.split(String) method "works as if by invoking the two-argument split method with the given expression and a limit argument of zero", so let's read the docs for String.split(String, int). The case of the empty string is answered by this part:
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
The empty string has no part matching a space, so the output is an array containing one element, the input string, exactly as the docs say should happen.
The case of the string " " is answered by these two parts:
When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array. A zero-width match at the beginning however never produces such empty leading substring.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
The whole input string " " matches the splitting pattern, and the match has positive width, so in principle there is an empty string on either side of the match: { "", "" }. But because the limit parameter n = 0, trailing empty strings are discarded, and here every element is a trailing empty string. Hence the empty strings before and after the match are both removed, so the resulting array is empty.
It appears that since the String exists and cannot be split (there are no spaces), split simply places the entire String into the first array position, so the length is one. If you were to instead try
String x = " ";
String[] xs = x.split(" ");
System.out.println("strings :" + xs.length); // prints 0
It will give you the zero you are expecting.
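If a zero-length array is what you want for an empty input, one possible sketch is to special-case it before calling split. The helper name splitOrEmpty is hypothetical, not a JDK method.

```java
final class SplitOrEmpty {
    // Like input.split(regex), except that an empty input gives a zero-length
    // array instead of an array holding one empty string.
    static String[] splitOrEmpty(String input, String regex) {
        return input.isEmpty() ? new String[0] : input.split(regex);
    }

    public static void main(String[] args) {
        System.out.println("".split(" ").length);            // 1
        System.out.println(splitOrEmpty("", " ").length);    // 0
        System.out.println(splitOrEmpty("a b", " ").length); // 2
    }
}
```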
See also: Java String split removed empty values

How to detect if a string input has more than one consecutive space?

For a class I have to make a Morse code program using a binary tree. The user is supposed to enter Morse code and the program will decode it and print out the result. The binary tree only holds A-Z, and I only need to read dashes, dots, and spaces. If there is one space that is the end of the letter. If there is 2 or more spaces in a row that is the end of the word.
How do you detect whether the input string has consecutive spaces? Right now it detects two spaces (and then prints a space), but I don't know how to make it recognize three or more spaces.
This is how I'm reading the input, by the way:
String input = showInputDialog( "Enter Code", null);
character = input.charAt(i);
And this is how I detect a space: if (character == ' ').
Can anyone help?
Well, you could do something like this: if the resulting array has more than one item, you know the input had at least one run of 2+ spaces.
String[] foo = "a b  c   d".split("  +");
The pattern is two spaces followed by +, i.e. two or more spaces, so this splits into "a b", "c", and "d".
You'd probably need more regex checks than just that, though, if you need to count the runs by length (e.g. how many runs of 2 spaces, how many of 3 spaces, etc.).
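One way to get at the individual runs is to search with a Matcher instead of splitting. The sketch below returns the length of every run of two or more spaces; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpaceRuns {
    private static final Pattern TWO_OR_MORE_SPACES = Pattern.compile(" {2,}");

    // Returns the length of each run of two or more consecutive spaces, in order.
    static List<Integer> runLengths(String input) {
        List<Integer> lengths = new ArrayList<>();
        Matcher m = TWO_OR_MORE_SPACES.matcher(input);
        while (m.find()) {
            lengths.add(m.end() - m.start());
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(runLengths("a b  c   d")); // [2, 3]
    }
}
```

An empty list means the input has no consecutive spaces at all.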
Note I have made an assumption that you are retrieving the full morse code message in one go and not one character at a time
Focusing on this point:
"If there is one space that is the end of the letter. If there is 2 or more spaces in a row that is the end of the word."
Personally, I'd use the split() method on the String class. This will split up a String into a String[] and then you can do some checks on the individual Strings in the array. Splitting on a space character like this will give you a couple of behavioural advantages:
Any strings that represent characters will have no trailing or leading spaces on them
Any sequences of multiple spaces will result in empty strings in the returned String[].
For example, calling split(" ") on the string "A B  C" (two spaces between B and C) would give you a String[] containing {"A", "B", "", "C"}.
Using this, I would first check if the empty string appeared at all. If this was the case, it implies that there were at least 2 space characters next to each other in the input morse code message. Then you can just ignore any empty strings that occur after the first one and it will cater for any number of sequential empty strings.
Without wanting to complete your assignment for you, here is some sample code:
public String decode(final String morseCode) {
    final StringBuilder decodedMessage = new StringBuilder();
    final String[] splitMorseCode = morseCode.split(" ");
    for (final String morseCharacter : splitMorseCode) {
        if ("".equals(morseCharacter)) {
            /* We now know we had at least 2 spaces in sequence.
             * So we check to see if we already added a space to separate the
             * resulting decoded words. If not, then we add one. */
            if (!decodedMessage.toString().endsWith(" ")) {
                decodedMessage.append(" ");
            }
            continue;
        }
        // Some code that decodes your morse code character.
    }
    return decodedMessage.toString();
}
I also wrote a quick test. In my example I made "--" convert to "M". Splitting the decodedMessage on the space character was a way of counting the individual words that had been decoded.
@Test
public void thatDecoderCanDecodeMultipleWordsSeparatedByMultipleSpaces() {
final String decodedMessage = this.decoder.decode("-- --  -- --   -- --  -- --   -- --  -- --   -- --");
assertThat(decodedMessage.split(" ").length, is(7));
assertThat(decodedMessage, is("MM MM MM MM MM MM MM"));
}
Of course, if this is still not making sense, then reading the APIs always helps
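A different way to structure the same idea is to split twice: first into words on two or more spaces, then each word into letters on a single space. The sketch below is an alternative to the code above, not a completion of it, and its three-entry lookup table is a stand-in assumption for the assignment's binary tree.

```java
import java.util.Map;

final class MorseSketch {
    // Tiny illustrative table; a real decoder would cover A-Z (e.g. via the binary tree).
    private static final Map<String, Character> TABLE = Map.of("--", 'M', ".-", 'A', "-.", 'N');

    static String decode(String morse) {
        String trimmed = morse.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        // Two or more spaces end a word; a single space ends a letter.
        for (String word : trimmed.split(" {2,}")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            for (String letter : word.split(" ")) {
                Character c = TABLE.get(letter);
                out.append(c != null ? c : '?'); // '?' marks codes missing from the table
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("-- .-   -- .- -.")); // MA MAN
    }
}
```

Because the word separator is the regex " {2,}", any number of spaces from two upward is treated as a single word break, so no empty strings need to be skipped.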
To detect whether a String contains two consecutive spaces:
if (str.matches(".*  .*")) // the pattern contains two consecutive spaces
This will help.
public class StringTester {
    public static void main(String[] args) {
        String s = "Hello  "; // two trailing spaces
        int count = 0;
        // Note: this counts every space in the string, not only consecutive ones.
        for (char c : s.toCharArray()) {
            if (c == ' ') {
                count++;
            }
        }
        if (count >= 2) {
            System.out.println("I got 2 or more spaces");
        }
    }
}

Not sure how string split actually works in this case

I don't get the following:
In the following String:
String s = "1234;x;;y;";
if I do:
String[] s2 = s.split(";");
I get s2.length to be 4 and
s2[0] = "1234";
s2[1] = "x";
s2[2] = "";
s2[3] = "y";
But in the string: String s = "1234;x;y;;";
I get:
s2.length to be 3 and
s2[0] = "1234";
s2[1] = "x";
s2[2] = "y";
?
What is the difference, and why don't I get 4 in the latter case as well?
UPDATE:
Using -1 does not give the behavior I was expecting.
I mean, the last semicolon is the end of the String, so in the latter example I was also expecting 4 as the length of the array.
From the docs,
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
UPDATE:
You have five substrings separated by ;. In the second case, these are 1234, x, y, "" and "". As per the docs, all empty substrings at the end that result from the split operation are eliminated.
For details, look here.
If n is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
The string boo:and:foo, for example, yields the following results with these parameters:
Regex  Limit  Result
:      2      { "boo", "and:foo" }
:      5      { "boo", "and", "foo" }
:      -2     { "boo", "and", "foo" }
o      5      { "b", "", ":and:f", "", "" }
o      -2     { "b", "", ":and:f", "", "" }
o      0      { "b", "", ":and:f" } // all the empty substrings at the end were eliminated
Trailing empty strings are omitted. However, there are ways to include them explicitly, if needed.
From http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String)
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
Good question. If you check the API documentation for String.split() and look at the example with "boo:and:foo", you can see that the trailing empty strings are omitted.
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
The string "boo:and:foo", for example, yields the following results with these expressions:
Regex Result
: { "boo", "and", "foo" }
o { "b", "", ":and:f" }
That's the default behavior of the split method in Java: it does not return trailing empty tokens.
s.split(";", -1); should return the empty tokens as well.
Why not check what the documentation says first? Here is the link:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split%28java.lang.String%29
And here is your answer:
Trailing empty strings are therefore not included in the resulting
array.
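If the behavior you expected is that a final ; merely terminates the last field (so both strings from the question yield 4 elements), neither limit 0 nor limit -1 gives that directly. One possible sketch is to split with -1 and then drop exactly one trailing element when the input ends with the delimiter. The class TerminatedSplit and the method fields are hypothetical names for this illustration.

```java
import java.util.Arrays;

final class TerminatedSplit {
    // Treats ';' as a field terminator: a single ';' at the very end closes the
    // last field and does not start a new empty one.
    static String[] fields(String input) {
        String[] parts = input.split(";", -1);   // keep every empty field
        if (input.endsWith(";")) {
            return Arrays.copyOf(parts, parts.length - 1);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(fields("1234;x;;y;").length); // 4
        System.out.println(fields("1234;x;y;;").length); // 4
    }
}
```

In the second case the fourth field is the empty string between the last two semicolons.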

How can I split a string which contains only delimiters?

I am using the following code:
String sample = "::";
String[] splitTime = sample.split(":");
// extra detail omitted
System.out.println("Value 1 :"+splitTime[0]);
System.out.println("Value 2 :"+splitTime[1]);
System.out.println("Value 3 :"+splitTime[2]);
I am getting an ArrayIndexOutOfBoundsException. How does String.split() handle consecutive or trailing / leading delimiters?
See also:
Doubt in split method
Java split() method strips empty strings at the end?
Alnitak is correct that trailing empty strings will be discarded by default.
If you want to have trailing empty strings, you should use split(String, int) and pass a negative number as the limit parameter.
The limit parameter controls the number of times the
pattern is applied and therefore affects the length of the resulting
array. If the limit n is greater than zero then the pattern
will be applied at most n - 1 times, the array's
length will be no greater than n, and the array's last entry
will contain all input beyond the last matched delimiter. If n
is non-positive then the pattern will be applied as many times as
possible and the array can have any length. If n is zero then
the pattern will be applied as many times as possible, the array can
have any length, and trailing empty strings will be discarded.
Note that split(aString) is a synonym for split(aString, 0):
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
Also, you should use a loop to get the values from the array; this avoids a possible ArrayIndexOutOfBoundsException.
So your corrected code should be (assuming you want the trailing empty strings):
String sample = "::";
String[] splitTime = sample.split(":", -1);
for (int i = 0; i < splitTime.length; i++) {
System.out.println("Value " + i + " : \"" + splitTime[i] + "\"");
}
Output:
Value 0 : ""
Value 1 : ""
Value 2 : ""
From the J2SE API manual:
Trailing empty strings are therefore not included in the resulting array.
So, if you pass in "::" you'll get an empty array because all of the delimiters are trailing.
If you want to make sure that you get no more than three entries you should use:
String[] splitTime = sample.split(":", 3);
With an input of "::" that would indeed give you three empty strings in the output array.
However if the input only happens to have one ":" in it then you'll still only get two elements in your array.
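If the code needs exactly three entries regardless of how many delimiters the input has, one option is to pad the result. The following is a sketch under the assumption that count is positive; splitFixed is a hypothetical helper, not part of the JDK.

```java
import java.util.Arrays;

final class FixedFields {
    // Always returns exactly `count` fields (count must be positive),
    // padding with "" when the input has fewer delimiters than expected.
    static String[] splitFixed(String input, String regex, int count) {
        String[] parts = input.split(regex, count);    // at most `count` pieces, trailing empties kept
        String[] result = Arrays.copyOf(parts, count); // missing slots are null at this point
        for (int i = parts.length; i < count; i++) {
            result[i] = "";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFixed("12:30", ":", 3)));
    }
}
```

With this, indexing splitTime[0] to splitTime[2] is safe for "::", "12:30" and "12" alike. Any extra delimiters stay unsplit in the last field.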
Like this perhaps?
int ndx = 0;
StringTokenizer t = new StringTokenizer(": : ::::",":");
while (t.hasMoreElements())
{
System.out.println(String.format("Value %d : %s", ++ndx,t.nextElement()));
}
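Note that StringTokenizer never returns empty tokens, so it behaves differently from split with a negative limit. A small sketch of the difference (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerDemo {
    // Collects all tokens; consecutive delimiters are skipped as a group,
    // so no empty strings ever appear in the result.
    static List<String> tokens(String input, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer t = new StringTokenizer(input, delimiters);
        while (t.hasMoreTokens()) {
            result.add(t.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("::", ":").size());      // 0
        System.out.println(tokens("a::b", ":").size());    // 2
        System.out.println("a::b".split(":", -1).length);  // 3
    }
}
```

So for an input such as "::" the tokenizer yields nothing at all, which avoids the exception but also loses the empty fields.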
You should check the length of the splitTime array.
Use StringTokenizer, passing the string as the first argument and the delimiter as the second.
Use splitTime.length to find the length of the array.
