Split String in Java with [a-z] regular expression - java

I have two regular expressions:
[a-c] : any character from a-c
[a-z] : any character from a-z
And a test:
public static void main(String[] args) {
String s = "abcde";
String[] arr1 = s.split("[a-c]");
String[] arr2 = s.split("[a-z]");
System.out.println(arr1.length); //prints 4 : "", "", "", "de"
System.out.println(arr2.length); //prints 0
}
Why does the second split behave like this? I would expect a result of 6 empty strings "".

According to the documentation of the single-argument String.split:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
To keep the trailing strings, you can use the two-argument version, and specify a negative limit:
String s = "abcde";
String[] arr1 = s.split("[a-c]", -1); // ["", "", "", "de"]
String[] arr2 = s.split("[a-z]", -1); // ["", "", "", "", "", ""]

By default, split discards trailing empty strings. In the arr2 case, they were all trailing empty strings, so they were all discarded.
To get 6 empty strings, pass a negative limit as the second parameter to the split method, which will keep all trailing empty strings.
String[] arr2 = s.split("[a-z]", -1);
If n is non-positive then the pattern will be applied as many times as
possible and the array can have any length.
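As a minimal sketch of the difference between a limit of zero and a negative limit, the following can be run as-is. The class name SplitLimitDemo and the helper splitWithLimit are illustrative only; the helper just forwards to the two-argument String.split.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Thin wrapper around the two-argument split so both cases read the same way.
    static String[] splitWithLimit(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    public static void main(String[] args) {
        String s = "abcde";
        // Limit 0 (what the single-argument split uses): trailing empty strings are dropped.
        System.out.println(Arrays.toString(splitWithLimit(s, "[a-c]", 0))); // [, , , de]
        System.out.println(splitWithLimit(s, "[a-z]", 0).length);           // 0
        // Negative limit: trailing empty strings are kept.
        System.out.println(splitWithLimit(s, "[a-z]", -1).length);          // 6
    }
}
```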

String.split():
Splits this string around matches of the given regular expression.
Around means that the matches themselves are removed. For example, splitting "a,b,c" on commas yields "a", "b", and "c".
The first split removes a, b, and c, which leaves three empty strings followed by "de".
The second removes all letters, and thus all characters of that string. Only empty strings remain, and since they are all trailing they are discarded.

Related

Java String split returns length 0

public int lengthOfLastWord(String s) {
s.replaceAll("\\s", "");
String[] splittedS = s.split("\\s+");
if(splittedS.length == 1 && splittedS[0].equals("")) return 0;
return splittedS[splittedS.length - 1].length();
}
I tested it with the string " ", and it reports that the length of splittedS is 0.
When I trim the String I get " " -> "", so when I split this, shouldn't I get an array of length 1 whose first element is ""?
Java Strings are immutable, so you have to store the reference to the String returned by the replacement, because a new String is returned. You have written
s.replaceAll("\\s", "");
but you should write
s = s.replaceAll("\\s", "");
instead.
Whenever you perform an operation on a String, keep the returned reference and continue with it.
The call to replaceAll has no effect because its result is discarded, so split operates on the original string " " and you end up with an empty array.
Recall that one-argument split is the same as two-argument split with zero passed for the second parameter:
String[] splittedS = s.split("\\s+", 0); // the second argument is a limit of zero
This means that the regex pattern is applied as many times as possible, and then trailing empty strings are removed from the array.
This last point is what makes your array empty: applying the \\s+ pattern to " " produces only empty strings, one before the matched whitespace and one after it. They are all considered trailing by split, so they are removed from the result.
Fixing the call to replaceAll the way that other answers suggest does change the input, though: s becomes "", the pattern matches nothing, and split then returns the unsplit string as its only element, i.e. [ "" ] with length 1.
You need to reassign the variable:
s = s.replaceAll(...)
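One way to sidestep the empty-array question entirely is to trim first and handle the all-whitespace case explicitly. This is only a sketch under that assumption, not the asker's original approach; the class name LastWord is illustrative.

```java
class LastWord {
    // Returns the length of the last whitespace-separated word, or 0 if there is none.
    static int lengthOfLastWord(String s) {
        String trimmed = s.trim(); // Strings are immutable: keep the returned value
        if (trimmed.isEmpty()) {
            return 0; // the input was empty or contained only whitespace
        }
        // After trimming there is no leading or trailing whitespace,
        // so every element produced here is a non-empty word.
        String[] words = trimmed.split("\\s+");
        return words[words.length - 1].length();
    }

    public static void main(String[] args) {
        System.out.println(lengthOfLastWord(" "));            // 0
        System.out.println(lengthOfLastWord("Hello World"));  // 5
    }
}
```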

String.split(String pattern) Java method is not working as intended

I'm using String.split() to divide some Strings containing IPs, but it's returning an empty array. I fixed my problem using String.substring(), but I'm wondering why split is not working as intended. My code is:
// filtrarIPs("196.168.0.1 127.0.0.1 255.23.44.1 100.168.100.1 90.168.0.1","168");
public static String filtrarIPs(String ips, String filtro) {
String resultado = "";
String[] lista = ips.split(" ");
for (int c = 0; c < lista.length; c++) {
String[] ipCorta = lista[c].split("."); // Returns an empty array
if (ipCorta[1].compareTo(filtro) == 0) {
resultado += lista[c] + " ";
}
}
return resultado.trim();
}
It should return a String[] such as {"196", "168", "0", "1"}.
split works with regular expressions. In regular expression notation, '.' matches any single character. To split on an actual dot you must escape it like this: split("\\.").
Use
String[] ipCorta = lista[c].split("\\.");
In regular expressions, . matches almost any character.
If you want to match a literal dot, you have to escape it: \\.
Your statement
lista[c].split(".")
will split the first String "196.168.0.1" on every character, because String.split takes a regular expression as its argument and . matches any character.
However, the reason you are getting an empty array is that split also removes all trailing empty Strings from the result.
For example, consider the following statement:
String[] tiles = "aaa".split("a");
This will split the String into four empty values, like [ "", "", "", "" ]: one before each of the three matches and one after the last. Because the trailing empty values are removed, the array ends up empty: [].
If you have the following statement:
String[] tiles = "aaab".split("a");
it will split the String into three empty values and one non-empty value b, like [ "", "", "", "b" ].
Since there are no trailing empty values, the result keeps these four values.
To avoid splitting on every character, escape the dot in the regular expression like this:
lista[c].split("\\.")
String.split() takes a regular expression as its parameter, so you have to escape the period (which otherwise matches any character). So use split("\\.") instead.
This may help you:
public static void main(String[] args){
String ips = "196.168.0.1 127.0.0.1 255.23.44.1 100.168.100.1 90.168.0.1";
String[] lista = ips.split(" ");
for(String s: lista){
for(String s2: s.split("\\."))
System.out.println(s2);
}
}
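For completeness, here is a sketch of the original filtrarIPs with the dot treated literally. It uses Pattern.quote(".") as an alternative to writing "\\." by hand, and it adds a length check that the original code did not have. The class name IpFilter is illustrative.

```java
import java.util.regex.Pattern;

class IpFilter {
    // Keeps the IPs whose second octet equals filtro, separated by single spaces.
    static String filtrarIPs(String ips, String filtro) {
        StringBuilder resultado = new StringBuilder();
        for (String ip : ips.split(" ")) {
            // Pattern.quote turns "." into a literal pattern, equivalent to "\\." here.
            String[] ipCorta = ip.split(Pattern.quote("."));
            // Guard against entries with fewer than two parts (an added safety check).
            if (ipCorta.length > 1 && ipCorta[1].equals(filtro)) {
                resultado.append(ip).append(' ');
            }
        }
        return resultado.toString().trim();
    }

    public static void main(String[] args) {
        String ips = "196.168.0.1 127.0.0.1 255.23.44.1 100.168.100.1 90.168.0.1";
        System.out.println(filtrarIPs(ips, "168")); // 196.168.0.1 100.168.100.1 90.168.0.1
    }
}
```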

Behaviour of String.split in java 1.6?

My code is:
String s = "1;;;; 23;;";
System.out.println(s.split(";").length);
and gives as output 5.
The source code of split is:
public String[] split(String regex) {
return split(regex, 0);
}
and the documentation says:
This method works as if by invoking the two-argument
split(java.lang.String,int) method with the given expression and a
limit argument of zero. Trailing empty strings are therefore not
included in the resulting array.
The string "boo:and:foo", for example, yields the following results
with these expressions:
Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }
If I print the strings I have:
1
23
Shouldn't I get from this 1;;;; 23;; something like {"1", "", "", "", " 23", ""} ?
No, five is correct, as your quoted docs state:
Trailing empty strings are therefore not included in the resulting
array.
Which is why the empty strings at the end of the array are omitted. If you want the empty strings, do as Evgeniy Dorofeev's answer says and specify a limit of -1.
Since limit = 0 trailing empty strings are not included. Try
System.out.println(s.split(";", -1).length);
and you will get 7
It splits the string wherever ';' occurs and puts the pieces into an array.
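Printing the elements one per line hides the empty strings, which is part of what makes the output confusing. A small sketch that wraps each element in quotes makes them visible; the class ShowTokens and the helper quoted are illustrative names, not part of any library.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class ShowTokens {
    // Wraps every element in double quotes so that empty strings show up in the output.
    static String quoted(String[] parts) {
        return Arrays.stream(parts)
                .map(p -> "\"" + p + "\"")
                .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        String s = "1;;;; 23;;";
        System.out.println(quoted(s.split(";")));     // {"1", "", "", "", " 23"}
        System.out.println(quoted(s.split(";", -1))); // {"1", "", "", "", " 23", "", ""}
    }
}
```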

Not sure how string split actually works in this case

I don't get the following:
In the following String:
String s = "1234;x;;y;";
if I do:
String[] s2 = s.split(";");
I get s2.length to be 4 and
s2[0] = "1234";
s2[1] = "x";
s2[2] = "";
s2[3] = "y";
But in the string: String s = "1234;x;y;;";
I get:
s2.length to be 3 and
s2[0] = "1234";
s2[1] = "x";
s2[2] = "y";
?
What is the difference, and why don't I get 4 in the latter case as well?
UPDATE:
Using -1 is not what I was expecting as behavior.
I mean, the last semicolon is the end of the String, so in the latter example I was also expecting 4 as the length of the array.
From the docs,
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
UPDATE:
You have five substrings separated by ;. In the second case, these are "1234", "x", "y", "", and "". As per the docs, all empty substrings at the end that result from the split operation are eliminated.
For details, look here.
If n is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
The string boo:and:foo, for example, yields the following results with these parameters:
Regex    Limit    Result
:        2        { "boo", "and:foo" }
:        5        { "boo", "and", "foo" }
:        -2       { "boo", "and", "foo" }
o        5        { "b", "", ":and:f", "", "" }
o        -2       { "b", "", ":and:f", "", "" }
o        0        { "b", "", ":and:f" } // all the empty substrings at the end were eliminated
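The rows of this table can be reproduced with a short program. This is only a sketch for checking the documented behavior; the class name BooAndFoo and the helper split are illustrative.

```java
import java.util.Arrays;

class BooAndFoo {
    // Splits the documentation's sample string with the given regex and limit.
    static String[] split(String regex, int limit) {
        return "boo:and:foo".split(regex, limit);
    }

    public static void main(String[] args) {
        // Positive limit: the pattern is applied at most limit - 1 times.
        System.out.println(Arrays.toString(split(":", 2)));  // [boo, and:foo]
        // Negative limit: no cap, trailing empty strings are kept.
        System.out.println(Arrays.toString(split("o", -2))); // [b, , :and:f, , ]
        // Zero: no cap, trailing empty strings are discarded.
        System.out.println(Arrays.toString(split("o", 0)));  // [b, , :and:f]
    }
}
```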
Trailing empty strings are omitted. However, there are ways to include them explicitly, if needed.
From http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String)
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
Good question. If you check the API documentation for String.split() and look at the example with "boo:and:foo", you can see that the trailing empty strings are omitted.
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
The string "boo:and:foo", for example, yields the following results with these expressions:
Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }
That's the default behavior of the split method in Java: it does not return trailing empty tokens.
s.split(";", -1); should return the empty tokens.
Why not check what the documentation says first? Here is the link:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split%28java.lang.String%29
And here is your answer:
Trailing empty strings are therefore not included in the resulting
array.

How to split a string with whitespace chars at the beginning?

Quick example:
public class Test {
public static void main(String[] args) {
String str = " a b";
String[] arr = str.split("\\s+");
for (String s : arr)
System.out.println(s);
}
}
I want the array arr to contain 2 elements: "a" and "b", but in the result there are 3 elements: "" (empty string), "a" and "b". What should I do to get it right?
Kind of a cheat, but replace:
String str = " a b";
with
String[] arr = " a b".trim().split("\\s+");
The other way to trim it is to use look ahead and look behind to be sure that the whitespace is sandwiched between two non-white-space characters,... something like:
String[] arr = str.split("(?<=\\S)\\s+(?=\\S)");
The problem with this is that it doesn't trim the leading spaces, giving this result:
a
b
but nor should it as String#split(...) is for splitting, not trimming.
The simple solution is to use trim() to remove leading (and trailing) whitespace before the split(...) call.
You can't do this with just split(...). The split regex is matching string separators; i.e. there will necessarily be a substring (possibly empty) before and after each matched separator.
You can deal with the case where the whitespace is at the end by using split(..., 0). This discards any trailing empty strings. However, there is no equivalent form of split for discarding leading empty strings.
Instead of trimming, you could just add an if to check if a string is empty or not.
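The last suggestion could look roughly like the sketch below, which splits as before and skips empty elements with an if. The class SkipEmpty and the method words are illustrative names; returning a List rather than an array is a choice made here for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class SkipEmpty {
    // Splits on runs of whitespace and drops any empty elements.
    static List<String> words(String str) {
        List<String> result = new ArrayList<>();
        for (String s : str.split("\\s+")) {
            if (!s.isEmpty()) { // skips the leading "" caused by leading whitespace
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words(" a b")); // [a, b]
    }
}
```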
