Split method returning blank array in Java

I'm having a problem with the split function returning a blank array. When any index is called from the array, it throws an index out of bounds exception. Here is the code:
class splitNumber {
public static void main(String[] args) {
String number = "3.84";
String[] sep = number.split(".");
System.out.println(sep[0]);
}
}
Is there any fix or workaround for this? I'm using Java SE 7.

As noted elsewhere, String#split takes a regex. One alternate way to construct that regex is to use Pattern#quote:
String number = "3.84";
String[] sep = number.split(Pattern.quote("."));
System.out.println(Arrays.toString(sep));
This saves you from typing a bunch of tedious escape chars.

"." is a special character in regex and String's split method accepts regex, which you need to escape like:
String[] sep = number.split("\\.");

Use "\\."
"Note that this takes a regular expression, so remember to escape special characters if necessary, e.g. if you want to split on period . which means "any character" in regex, use either split("\\.") or split(Pattern.quote("."))."
How to split a string in Java
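To see the difference side by side, here is a minimal sketch comparing the unescaped call from the question with the two fixes suggested above. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnDot {
    // Unescaped: "." is the regex wildcard, so every character is a delimiter.
    static String[] unescaped(String s) {
        return s.split(".");
    }

    // Escaped: the regex \. matches only a literal dot.
    static String[] escaped(String s) {
        return s.split("\\.");
    }

    // Quoted: Pattern.quote builds the literal regex for us.
    static String[] quoted(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String number = "3.84";
        System.out.println(unescaped(number).length);         // 0
        System.out.println(Arrays.toString(escaped(number))); // [3, 84]
        System.out.println(Arrays.toString(quoted(number)));  // [3, 84]
    }
}
```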

Java split with special characters

I have the code below, which splits a string on <div>\\$\\$PZ\\$\\$</div>, but it does not work because of the special characters.
public class HelloWorld{
public static void main(String []args){
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split("<div>\\$\\$PZ\\$\\$</div>", 2);
for (String a : arrOfStr)
System.out.println(a);
}
}
The output is test<div>\$\$PZ\$\$</div>test
It works when I remove the special characters.
Can you please help?
As you already know, the parameter to split(...) is a regular expression, so some characters have special meaning. If you want the parameter to be treated literally, i.e. not as a regex, call the Pattern.quote(String s) method.
Example
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split(Pattern.quote("<div>\\$\\$PZ\\$\\$</div>"), 2);
for (String a : arrOfStr)
System.out.println(a);
Output
test
test
The quote() method simply surrounds the literal text with the regex \Q...\E quotation pattern (see note 1 below), e.g. your <div>\$\$PZ\$\$</div> text becomes:
\Q<div>\$\$PZ\$\$</div>\E
For fixed text you could just do that yourself, i.e. the following 3 versions all create the same regex to split on:
str.split(Pattern.quote("<div>\\$\\$PZ\\$\\$</div>"), 2)
str.split("\\Q<div>\\$\\$PZ\\$\\$</div>\\E", 2)
str.split("<div>\\\\\\$\\\\\\$PZ\\\\\\$\\\\\\$</div>", 2)
To me, the 3rd one, using \ to escape, is the least readable/desirable version.
If there are a lot of special characters to escape, using \Q...\E is easier than \-escaping all the special characters separately, but very few people use it, so it's fairly unknown to most.
The quote() method is especially useful when you need to treat dynamic text literally, e.g. when the text to split on is configurable by the user.
1) quote() will correctly handle literal text containing \E.
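As a small illustration of that note, the sketch below splits on a delimiter that itself contains \E. The delimiter and class name are invented for the example; the only library call is Pattern.quote.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteWithSlashE {
    // Splits text on the delimiter taken literally, whatever characters it contains.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String delimiter = "a\\Eb"; // the four characters: a \ E b
        String text = "x" + delimiter + "y";
        System.out.println(Arrays.toString(splitLiteral(text, delimiter))); // [x, y]
    }
}
```

Wrapping such a delimiter in \Q...\E by hand would end the quoted section early at the embedded \E, which is why quote() is the safer choice for dynamic text.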
This:
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split("<div>\\\\\\$\\\\\\$PZ\\\\\\$\\\\\\$</div>", 2);
for (String a : arrOfStr) {
System.out.println(a);
}
prints:
test
test
EDIT: Why do we need all those backslashes? It's because of how we need to handle String literals representing regex expressions. This page describes the reason with examples. The essence is this:
For a backslash \...
...the pattern to match that would be \\... (to escape the escape)
... but the string literal to create that pattern would have to have one backslash to escape each of the two backslashes: \\\\.
Add to that the original need to also escape the $, that gives us our 6 backslashes in the string representation.
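The backslash counting above can be checked directly. This is just a sketch with made-up names: it prints the regex that the six-backslash literal actually produces and confirms that it matches the two-character text \$.

```java
class BackslashCount {
    // Java literal with six backslashes; the resulting regex is the 4 characters \\\$
    static final String REGEX = "\\\\\\$";

    // True when the whole text is exactly a backslash followed by a dollar sign.
    static boolean isBackslashDollar(String text) {
        return text.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                    // \\\$
        System.out.println(REGEX.length());           // 4
        System.out.println(isBackslashDollar("\\$")); // true
        System.out.println(isBackslashDollar("$"));   // false
    }
}
```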

Java escape characters in strings - string contains \r (need to keep "r")

I have a string "EAD\rgonzalez" which is passed to me.
I need to pull out "rgonzalez" from it.
I am running into problems with the "\" character.
I cannot find the index of it, I cannot replace it, etc.
Any help on pulling the data after the "\" would be appreciated.
The string that i receive is in the format of domain\username; the data can vary.
Another example would be US\ngross where \n would be interpreted as a newline character.
To clarify, I am not adding a '\', i am trying to split a string on a '\'
This string contains '\r' which in itself is a character, a special one.
I need a way to make \r contained within my string two separate characters, a '\' and an 'r'.
You haven't provided any code, but I'm assuming what you're doing is something like this:
String user = request.getParameter("user"); // user = "EAD\rgonzalez"
If you were to declare a static string in your application, you would have to escape the backslash because it is a special character for Java strings:
String user = "EAD\\rgonzalez";
To split that string on the backslash you must escape it twice in the regex that you pass to the split method: once because backslash is a special character in Java string literals, and again because backslash is a special character in regexes. So instead of one backslash you have four: the single backslash is escaped for the regex, giving two, and then each of those is escaped again for the string literal.
String[] parts = user.split("\\\\");
Now you have split the string:
System.out.println(parts[0]); // "EAD"
System.out.println(parts[1]); // "rgonzalez"
The string that i receive is in the format of domain\username... the data can vary
The data shouldn't vary if that is the input your program expects.
where \n would be interpreted as a newline character
I'm not sure how you'd get newlines from a single-line input form. If you do, then your input is invalid because it does not follow the format you've specified and are expecting. If you did accept newlines and other whitespace characters, you would end up treating the whole thing as either the domain or the username, potentially breaking your program logic. You have stated the requirement as domain\username, and I don't think that requires you to handle any other form of input.
I am collecting this string from the header data from the request object in a webapp.
In that case, the raw value does not contain an escape sequence; written as a Java string literal it would be "domain\\username". When you print the value, the escaping backslash isn't shown.
I cannot find the index of it,
With the correct representation, indexOf("\\") will work...
pulling the data after the "\"
Since the value contains a literal backslash, the regex passed to split(String regex) must escape it, and each backslash of that regex must be escaped again in the Java string literal, giving "\\\\".
For example,
public static void main (String[] args) throws java.lang.Exception
{
String in = "EAD\\rgonzalez";
System.out.println(in.indexOf("\\")); // find the index of '\'
String[] parts = in.split("\\\\"); // split on '\\'
System.out.println(Arrays.toString(parts));
}
Again, the string "EAD\rgonzalez" is not in the form of domain\username, as demonstrated here:
System.out.print("EAD\rgonzalez".matches("[A-Z]+\\\\[a-z]+")); // false
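The difference between the two representations is easy to check. A minimal sketch (class and method names are illustrative only) comparing a string that holds a real backslash with one where \r was compiled into a carriage return:

```java
class LiteralVsControlChar {
    // Index of the first backslash character, or -1 if there is none.
    static int backslashIndex(String s) {
        return s.indexOf('\\');
    }

    public static void main(String[] args) {
        String withBackslash = "EAD\\rgonzalez";     // E A D \ r g o n z a l e z
        String withCarriageReturn = "EAD\rgonzalez"; // E A D <CR> g o n z a l e z

        System.out.println(withBackslash.length());             // 13
        System.out.println(withCarriageReturn.length());        // 12
        System.out.println(backslashIndex(withBackslash));      // 3
        System.out.println(backslashIndex(withCarriageReturn)); // -1
    }
}
```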
The magic you need is in org.apache.commons.lang.StringEscapeUtils
Here is a demo:
package ignoreescapeseq2;
import org.apache.commons.lang.StringEscapeUtils;
/*
* @author Charles Knell
*/
public class IgnoreEscapeSeq2 {
public static void main(String[] args) {
String string = "EAD\rgonzalez"; // REQUIRED INPUT STRING
String eString = StringEscapeUtils.escapeJava(string);
String [] sArray = eString.split("\\\\");
System.out.println("domain: " + sArray[0]);
System.out.println("username: " + sArray[1]);
}
}
Here is the output:
domain: EAD
username: rgonzalez
Although this MAY answer the question, there still seems to be a problem if you must define the string in Java. As you said, "EAD\xgonzalez" isn't a valid Java string because \x isn't a valid escape sequence. The solution above only works if the input string never has to be explicitly defined, as in the demo.
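If adding Commons Lang is not an option, the same idea can be approximated with the JDK alone. This is only a sketch: reEscape and splitDomainUser are hypothetical helpers, not part of any library, and they cover just the five control characters listed.

```java
class ControlCharsToEscapes {
    // Hypothetical helper: turns a few control characters back into
    // backslash + letter. A real backslash in the input is left untouched.
    static String reEscape(String s) {
        return s.replace("\r", "\\r")
                .replace("\n", "\\n")
                .replace("\t", "\\t")
                .replace("\b", "\\b")
                .replace("\f", "\\f");
    }

    // Splits at the first backslash only (limit 2).
    static String[] splitDomainUser(String raw) {
        return reEscape(raw).split("\\\\", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitDomainUser("EAD\rgonzalez");
        System.out.println("domain: " + parts[0]);   // domain: EAD
        System.out.println("username: " + parts[1]); // username: rgonzalez
    }
}
```

Unlike escapeJava, this version does not double a backslash that is already present, so a correctly received value such as domain\username passes through unchanged.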

Java Regex Metacharacters returning extra space while splitting

I want to split a string using a regex instead of StringTokenizer. I am using String.split(regex).
The regex contains metacharacters, and when I use \[ the resulting array contains an extra space.
import java.util.Scanner;
public class Solution{
public static void main(String[] args) {
Scanner i= new Scanner(System.in);
String s= i.nextLine();
String[] st=s.split("[!\\[,?\\._'#\\+\\]\\s\\\\]+");
System.out.println(st.length);
for(String z:st)
System.out.println(z);
}
}
When I enter the input [a\m]
it returns an array length of 3 and
a m
There is also a space before the a.
Can anyone please explain why this is happening and how I can correct it? I don't want the extra space in the resulting array.
Since the [ is at the beginning of the string, the first split step produces two elements: the empty string that precedes the [, and the rest of the string. String#split only drops trailing empty elements (it runs with limit=0 by default), so the leading empty string stays in the result.
Remove the delimiter characters from the start of the string first, using .replaceAll("^[!\\[,?._'#+\\]\\s\\\\]+", "") (note the ^ at the beginning of the pattern). Here is some sample code you can build on:
String[] st="[a\\m]".replaceAll("^[!\\[,?._'#+\\]\\s\\\\]+", "")
.split("[!\\[,?._'#+\\]\\s\\\\]+");
System.out.println(st.length);
for(String z:st) {
System.out.println(z);
}
See demo
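For comparison, this sketch runs the question's input through a plain split and through the strip-then-split approach described in this answer. The class, constant and method names are invented for the example.

```java
import java.util.Arrays;

class LeadingDelimiter {
    // The delimiter character class from the question, without redundant escapes.
    static final String DELIMS = "[!\\[,?._'#+\\]\\s\\\\]+";

    // Plain split: a delimiter at position 0 yields a leading empty string.
    static String[] naive(String s) {
        return s.split(DELIMS);
    }

    // Strip leading delimiters first, then split.
    static String[] stripped(String s) {
        return s.replaceAll("^" + DELIMS, "").split(DELIMS);
    }

    public static void main(String[] args) {
        String input = "[a\\m]"; // the five characters [ a \ m ]
        System.out.println(Arrays.toString(naive(input)));    // [, a, m]
        System.out.println(Arrays.toString(stripped(input))); // [a, m]
    }
}
```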
As an addition to Wiktor Stribiżew’s answer, you may do the same without having to specify the pattern twice, by dealing with the java.util.regex package directly. Removing this redundancy may avoid potential errors and may also be more efficient as the pattern doesn’t need to be parsed twice:
Pattern p = Pattern.compile("[!\\[,?\\._'#\\+\\]\\s\\\\]+");
Matcher m = p.matcher(s);
if(m.lookingAt()) s=m.replaceFirst("");
String[] st = p.split(s);
for(String z:st)
System.out.println(z);
To be able to use the same pattern, i.e. without having to use the anchor ^ for removing a leading separator, we first check via lookingAt() whether the pattern really matches at the beginning of the text before removing the first occurrence. Then, we proceed with the split operation, but reusing the already prepared Pattern.
Regarding your issue mentioned in a comment, the split operation will always return at least one element, the input string, when there is no match, even when the string is empty. If you wish to have an empty array then, the only solution is to replace the result explicitly:
if(st.length==1 && s.equals(st[0])) st=new String[0];
or, if you only want to treat an empty string specially, you may check this beforehand:
if(s.isEmpty()) st=new String[0];
else {
// the code as shown above
}
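Put together, the pieces of this answer might look like the following. It is a sketch rather than a definitive implementation: the Tokenizer class and tokens method are hypothetical names, and the empty-string check is the second variant described above.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Tokenizer {
    // Compiled once and reused for both the leading-delimiter check and the split.
    private static final Pattern DELIMS = Pattern.compile("[!\\[,?._'#+\\]\\s\\\\]+");

    static String[] tokens(String s) {
        Matcher m = DELIMS.matcher(s);
        if (m.lookingAt()) {
            s = m.replaceFirst(""); // drop the delimiter run at the very start
        }
        if (s.isEmpty()) {
            return new String[0];   // nothing left: return an empty array
        }
        return DELIMS.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("[a\\m]"))); // [a, m]
        System.out.println(tokens("").length);                 // 0
    }
}
```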

having trouble with arrays and maybe split

String realstring = "&&&.&&&&";
Double value = 555.55555;
String[] arraystring = realstring.split(".");
String stringvalue = String.valueOf(value);
String [] valuearrayed = stringvalue.split(".");
System.out.println(arraystring[0]);
Sorry if it looks bad. Rewrote on my phone. I keep getting ArrayIndexOutOfBoundsException: 0 at the System.out.println. I have looked and can't figure it out. Thanks for the help.
split() takes a regexp as argument, not a literal string. You have to escape the dot:
string.split("\\.");
or
string.split(Pattern.quote("."));
Or you could also simply use indexOf('.') and substring() to get the two parts of your string.
And if the goal is to get the integer part of a double, you could also simply use
long truncated = (long) doubleValue;
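The indexOf/substring route mentioned above avoids regexes entirely. A minimal sketch, assuming you only care about the first dot (the class and method names are illustrative):

```java
import java.util.Arrays;

class SplitWithoutRegex {
    // Returns {before, after} around the first '.', or {s} if there is no dot.
    static String[] splitAtFirstDot(String s) {
        int dot = s.indexOf('.');
        if (dot < 0) {
            return new String[] { s };
        }
        return new String[] { s.substring(0, dot), s.substring(dot + 1) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAtFirstDot("&&&.&&&&"))); // [&&&, &&&&]
        System.out.println(Arrays.toString(splitAtFirstDot("nodot")));    // [nodot]
    }
}
```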
split takes a regex as its parameter, and in regex . means "any character except line separators", so you might expect "a.bc".split(".") to create an array of empty strings like ["","","","",""]. The only reason that doesn't happen is (from the split javadoc):
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
So because all the strings are empty, you get an empty array (which is why you see ArrayIndexOutOfBoundsException).
To turn off this removal mechanism you would have to use the split(regex, limit) version with a negative limit.
To split on a literal . you need to escape it as \. (which in Java needs to be written as "\\." because \ is also a metacharacter in String literals), or use [.] or another regex quoting mechanism.
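The effect of the limit argument can be observed directly. A short sketch (names are illustrative) using the "a.bc" example from this answer:

```java
import java.util.Arrays;

class SplitLimit {
    // Number of elements when splitting on the regex "." with the given limit.
    static int countWithLimit(String s, int limit) {
        return s.split(".", limit).length;
    }

    public static void main(String[] args) {
        // limit 0 (the default): all five empty strings are trailing, so all are removed
        System.out.println(countWithLimit("a.bc", 0));  // 0
        // negative limit: trailing empty strings are kept
        System.out.println(countWithLimit("a.bc", -1)); // 5
        // [.] matches only a literal dot
        System.out.println(Arrays.toString("a.bc".split("[.]"))); // [a, bc]
    }
}
```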
Dot (.) is a special character so you need to escape it.
String realstring = "&&&.&&&&";
String[] partsOfString = realstring.split("\\.");
String part1 = partsOfString[0];
String part2 = partsOfString[1];
System.out.println(part1);
this will print expected result of
&&&
It's also handy to test whether the given string contains this character. You can do this as follows:
if (string.contains(".")) {
// Split it.
} else {
throw new IllegalArgumentException("String " + string + " does not contain .");
}

Splitting string on the basis of keyword

I have string
String x="http://www.allindiaflorist.com/imgs/arrangemen4.jpg***http://storyofpakistan.com/wp-content/uploads/2011/11/Rukn-AlaminMultan.jpg***" ;
I want to split the string on ***, so I should get an array of size 2.
I am doing this:
String[] explode=a.split("//***");
img1=explode[0]; // throws java.util.regex.PatternSyntaxException
I also tried
String[] explode=a.split("***");
img1=explode[0]; // throws java.util.regex.PatternSyntaxException
I am OK with writing my own generic function that searches for ***, but I want to know why split() is not working.
Thanks
Use Pattern#quote:
String[] explode=a.split(Pattern.quote("***"));
Now you don't have to break your head on what special character you need to escape. The method "returns a literal pattern String for the specified String".
(To clarify: you're getting the error because * is a regex quantifier with nothing before it to repeat, so each * has to be escaped.)
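A quick way to confirm which patterns are valid is to compile them and catch the exception. This is a sketch with an invented helper name; Pattern.compile and PatternSyntaxException are the standard java.util.regex classes.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DanglingStar {
    // True if the string is a syntactically valid regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("***"));                // false
        System.out.println(compiles("\\*\\*\\*"));          // true
        System.out.println(compiles(Pattern.quote("***"))); // true
    }
}
```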
Use the regex [*]{3}. Try:
String x="htt.....
String arr[] =x.split("[*]{3}");
String str = "http://www.allindiaflorist.com/imgs/arrangemen4.jpg***http://storyofpakistan.com/wp-content/uploads/2011/11/Rukn-AlaminMultan.jpg***";
String delim = "\\*\\*\\*";
String[] arr= str.split(delim);
System.out.println(arr[0]);
System.out.println(arr[1]);
output
http://www.allindiaflorist.com/imgs/arrangemen4.jpg
http://storyofpakistan.com/wp-content/uploads/2011/11/Rukn-AlaminMultan.jpg
You can try this:
String[] explode=a.split("\\Q***\\E");
\Q Start quoting the regex.
\E End quoting the regex.
Basically, between \Q and \E the metacharacter * will be considered as a plain character (ie *) with no special meaning.
Escape * using \\
String[] arr=x.split("\\*\\*\\*");
Try this code; it will work. Your regex pattern was wrong:
the * should be inside [].
public static void main(String args[])
{
String x="http://www.allindiaflorist.com/imgs/arrangemen4.jpg***http://storyofpakistan.com/wp-content/uploads/2011/11/Rukn-AlaminMultan.jpg***" ;
String[] explode=x.split("[***]");
String img1=explode[0];
System.out.println(img1);
}
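One caveat with the character-class version: [***] is the same as [*] and matches a single *, so the string is split at every star and empty strings appear between the parts. The sketch below (class and method names are illustrative, and the input is shortened) contrasts it with [*]{3}, which matches the whole three-star delimiter.

```java
import java.util.Arrays;

class StarClass {
    // Splits at every single '*'.
    static String[] perStar(String s) {
        return s.split("[***]");
    }

    // Splits at each run of exactly three '*'.
    static String[] wholeDelimiter(String s) {
        return s.split("[*]{3}");
    }

    public static void main(String[] args) {
        String s = "a***b***";
        System.out.println(Arrays.toString(perStar(s)));        // [a, , , b]
        System.out.println(Arrays.toString(wholeDelimiter(s))); // [a, b]
    }
}
```

With the per-star version explode[0] is still the first URL, but explode[1] is an empty string rather than the second URL.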
