How to split String with this regular expression? - java

if (url.contains("|##|")) {
Log.e("url data", "" + url);
final String s[] = url.split("\\|##|");
}
I have a URL that uses "|##|" as the separator.
I tried to split on it but couldn't find a solution.

Use Pattern.quote, it'll do the work for you:
Returns a literal pattern String for the specified String.
final String s[] = url.split(Pattern.quote("|##|"));
Now "|##|" is treated as the string literal "|##|" and not the regex "|##|". The problem in your code is that the second pipe is not escaped, and an unescaped pipe has a special meaning (alternation) in regex.
An alternative solution (as suggested by @kocko) is escaping* the special characters manually:
final String s[] = url.split("\\|##\\|");
* Escaping a special character is done by \, but in Java \ is represented as \\
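A minimal sketch comparing the two approaches above. The class name, method names, and sample URL are hypothetical and only serve the illustration; both variants should produce the same two parts.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplitDemo {
    // Lets Pattern.quote build a regex that matches the separator literally.
    static String[] splitQuoted(String url) {
        return url.split(Pattern.quote("|##|"));
    }

    // Same split, with both pipes escaped by hand.
    static String[] splitEscaped(String url) {
        return url.split("\\|##\\|");
    }

    public static void main(String[] args) {
        // Hypothetical input, not taken from the question.
        String url = "http://example.com/a|##|http://example.com/b";
        System.out.println(Arrays.toString(splitQuoted(url)));
        System.out.println(Arrays.toString(splitEscaped(url)));
        // Both lines print: [http://example.com/a, http://example.com/b]
    }
}
```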

You have to escape the second |, as it is a regex operator:
final String s[] = url.split("\\|##\\|");

You should try to understand the concept as well: String.split(String regex) interprets the parameter as a regular expression. Since the pipe character "|" is a logical OR in a regular expression, the unescaped trailing pipe adds an empty alternative that matches between characters, so the resulting array holds mostly single characters of your string rather than the two parts you expected.
Even if you had used url.split("|"); you would have got a similar result.
Why did String.contains(CharSequence s) accept |##| in the first place? Because it interprets the parameter as a CharSequence and not as a regular expression.
Bottom line: check the API to see how a particular method interprets its input. As we have seen, split() interprets it as a regular expression, while contains() interprets it as a character sequence.
You can check the regular expression constructs over here - http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
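The difference between the two interpretations can be sketched as follows. The class and method names are hypothetical, and the example only compares element counts, because the exact array produced by the unescaped pattern depends on where the empty alternative matches.

```java
import java.util.regex.Pattern;

class LiteralVsRegexDemo {
    // contains() compares characters literally, so the pipes need no escaping.
    static boolean hasSeparator(String s) {
        return s.contains("|##|");
    }

    // split() compiles its argument as a regex. The unescaped trailing pipe
    // adds an empty alternative, so the pattern also matches between characters.
    static int partsWithUnescapedPipe(String s) {
        return s.split("\\|##|").length;
    }

    // Quoting the separator makes split() treat it as plain text.
    static int partsWithQuotedSeparator(String s) {
        return s.split(Pattern.quote("|##|")).length;
    }

    public static void main(String[] args) {
        String s = "ab|##|cd"; // hypothetical input
        System.out.println("contains separator: " + hasSeparator(s));
        System.out.println("parts, unescaped pipe: " + partsWithUnescapedPipe(s));
        System.out.println("parts, quoted separator: " + partsWithQuotedSeparator(s));
    }
}
```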

Related

Why split method does not support $,* etc delimiter to split string

import java.util.StringTokenizer;
class MySplit
{
public static void main(String S[])
{
String settings = "12312$12121";
StringTokenizer splitedArray = new StringTokenizer(settings,"$");
String splitedArray1[] = settings.split("$");
System.out.println(splitedArray1[0]);
while(splitedArray.hasMoreElements())
System.out.println(splitedArray.nextToken().toString());
}
}
In the above example, splitting the string on $ does not work, but splitting on other symbols works fine.
Why is that? If split supports only regular expressions, why does it work for symbols such as :, , and ;?
$ has a special meaning in regex, and since String#split takes a regex as an argument, the $ is not interpreted as the string "$", but as the special meta character $. One sexy solution is:
settings.split(Pattern.quote("$"))
Pattern#quote:
Returns a literal pattern String for the specified String.
... The other solution would be escaping $, by adding \\:
settings.split("\\$")
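Important note: check that the resulting array actually contains elements.
splitedArray1[0] throws ArrayIndexOutOfBoundsException when the array is empty, which happens when the string consists only of delimiters (for example "$"), because trailing empty strings are dropped. If there is no $ at all, the array holds the whole string as its single element. I would add: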
if (splitedArray1.length == 0) {
// return or do whatever you want
// except accessing the array
}
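As a rough sketch of that guard, the helper below returns the first token only when the array is non-empty. The class name, the method name, and the choice of returning null for an empty result are assumptions made for this illustration.

```java
class FirstTokenDemo {
    // Returns the first token, or null when split() produced no elements.
    static String firstToken(String settings) {
        String[] parts = settings.split("\\$");
        return parts.length == 0 ? null : parts[0];
    }

    public static void main(String[] args) {
        System.out.println(firstToken("12312$12121")); // 12312
        System.out.println(firstToken("12312"));       // 12312 (no delimiter: one element)
        System.out.println(firstToken("$"));           // null (only a delimiter: empty array)
    }
}
```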
If you take a look at the Java docs, you can see that the split method takes a regex as its parameter, so the argument is interpreted as a regular expression, not as a plain character.
In regex $ has a specific meaning, so you have to escape it this way:
settings.split("\\$");
The problem is that the split(String str) method expects str to be a valid regular expression. The characters you have mentioned are special characters in regular expression syntax and thus perform a special operation.
To make the regular expression engine take them literally, you would need to escape them like so:
.split("\\$")
Thus given this:
String str = "This is 1st string.$This is the second string";
for(String string : str.split("\\$"))
System.out.println(string);
You end up with this:
This is 1st string.
This is the second string
Dollar symbol $ is a special character in Java regex. You have to escape it so as to get it working like this:
settings.split("\\$");
From the String.split docs:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
On a side note:
Have a look at the Pattern class which will give you an idea as to which all characters you need to escape.
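The trailing-empty-string behaviour quoted from the docs can be seen with the two-argument overload. This is a small sketch with hypothetical class and method names; a negative limit keeps the trailing empty strings that the one-argument form drops.

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    // One-argument split: behaves like limit 0, trailing empty strings are removed.
    static String[] splitDefault(String s) {
        return s.split("\\$");
    }

    // Negative limit: the pattern is applied as often as possible and
    // trailing empty strings are kept.
    static String[] splitKeepTrailing(String s) {
        return s.split("\\$", -1);
    }

    public static void main(String[] args) {
        String s = "a$b$$"; // hypothetical input
        System.out.println(Arrays.toString(splitDefault(s)));      // [a, b]
        System.out.println(Arrays.toString(splitKeepTrailing(s))); // [a, b, , ]
    }
}
```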
Because $ is a special character in regular expressions: it is an anchor that matches the end of the line or input.
You should escape it using the escape sequence \$, and in a Java string literal that is written as \\$.
Hope that helps.
Cheers

String split by dot - Java

I have the following code:
public static void main(String[] args) {
String str = "21.12.2015";
String delim = "\\.";
String[] st = str.split(delim);
System.out.println(st[0]+"."+st[1]+"."+st[2]); // 1
System.out.println(st[0]+delim+st[1]+delim+st[2]); // 2
}
Now, line 1 prints the expected output, 21.12.2015. But why does line 2 not give the same output as line 1? Why does it print 21\.12\.2015?
EDIT:
Actually in my requirement, the delimiter changes dynamically for each string(- or / or .). So I am trying to assign the delimiter to a variable and then split by it and finally print it as a pattern(say dd.mm.yy or dd-mm-yy or etc). For other delimiters it's fine, but for dot it's coming like dd\.mm\.yy. How shall I achieve the expected result?
This handles all delim values:
String str = "21.12.2015";
String delim = "."; // or "-" or "?" or ...
String[] st = str.split(java.util.regex.Pattern.quote(delim));
When you call split, delim is used as a regex pattern, so it is interpreted as a regular expression.
But when you use delim in System.out.println, it is used as a plain string. That is the difference.
When you create the delim variable, you escape the backslash, so the real value of the delim variable is \. (a backslash followed by a dot).
For printing, create the delim variable without the backslash, and pass Pattern.quote(delim) to split, because a bare . as a regex matches any character:
String delim = ".";
Line 2 prints \. because delim is "\\."; the escaped form is required only while splitting.
You are using the split method of the String class, which uses a regular expression to split the string.
Because the dot is a regex metacharacter that matches any character, it has to be escaped as \\. to split the string at every literal dot.
In the second part you are simply printing the string, and in a Java string literal the backslash introduces an escape sequence (like \n for a new line).
The double backslash is the escape sequence for a single literal backslash, just as "\\n" would print \n instead of a new line, and that is why you get the "\." result.
For better understanding, try to delete one of the backslashes in the delim variable: the Java compiler will report an error, since "\." is not a valid escape sequence.
\\. is a regex String to parse . literally. You need it while splitting (since split() expects a regex String).
While printing, you need to use . directly instead of "\\." because println() doesn't take a regex.
The split method uses a regex for splitting, so you need to provide \\. there. That is not the case when printing, where you just use '.' directly.
In Java \\. will be printed as \. as \\ is considered as a single backslash.
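For the dynamic-delimiter requirement in the question, one possible sketch keeps the delimiter as a plain string, quotes it only for split, and reuses the plain form when printing. The class name, the reformat helper, and its parameters are hypothetical; String.join assumes Java 8 or later.

```java
import java.util.regex.Pattern;

class DateDelimiterDemo {
    // Splits on fromDelim taken literally, then joins the parts with toDelim.
    static String reformat(String text, String fromDelim, String toDelim) {
        String[] parts = text.split(Pattern.quote(fromDelim));
        return String.join(toDelim, parts);
    }

    public static void main(String[] args) {
        System.out.println(reformat("21.12.2015", ".", ".")); // 21.12.2015
        System.out.println(reformat("21.12.2015", ".", "-")); // 21-12-2015
        System.out.println(reformat("21/12/2015", "/", ".")); // 21.12.2015
    }
}
```

The same delimiter variable then works for both splitting and printing, because the regex-specific quoting is applied only at the split call.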

Split string and calculate power of the 2 substrings

My input string looks like this: "10^-9". Since Java has no exponentiation operator and "^" stands for bitwise XOR, I tried to do it differently. I split the String into 10 and -9, parsed both into doubles, and then used Math.pow() to get the value. The code looks like this:
String[] relFactorA = vA.getValue().split("^"); //vA.getValue is a String like `"10^-9"` or any other number instead of 9.
Double pow1 = Double.parseDouble(relFactorA[0]);
Double pow2 = Double.parseDouble(relFactorA[1]);
Double relFactor = Math.pow(pow1, pow2);
System.out.println(relFactor);
But that approach results in a java.lang.NumberFormatException. I cannot find an error in the code. Perhaps the compiler treats the "-" (minus) as a "-" (hyphen), but I don't think that's the reason, because both characters look the same and the compiler should handle this.
You probably didn't split on ^, since split takes a regex as its parameter and in regex ^ has a special meaning (it matches the start of a line). To make it literal, use "\\^".
String.split() takes a regex as its parameter. If you want to split on the symbol ^, use \\^ instead.
String[] relFactorA = vA.getValue().split("\\^");
Note that String#split takes a regex and not a String:
public String[] split(String regex)
You should escape the special character ^:
vA.getValue().split("\\^");
Escaping a regex is usually done by \, but in Java, \ is represented as \\.
Instead, you can use Pattern#quote that will treat the ^ as the String ^ and not the meta-character ^:
Returns a literal pattern String for the specified String.
vA.getValue().split(Pattern.quote("^"));
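Putting the fix together, a minimal sketch of the whole computation might look like this. The class and method names and the length check are assumptions added for illustration; the question's vA.getValue() is replaced by a plain String parameter.

```java
class PowerParseDemo {
    // Parses text of the form base^exponent and evaluates it with Math.pow.
    static double parsePower(String expr) {
        String[] parts = expr.split("\\^"); // escaped caret: a literal '^'
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected base^exponent: " + expr);
        }
        double base = Double.parseDouble(parts[0]);
        double exponent = Double.parseDouble(parts[1]);
        return Math.pow(base, exponent);
    }

    public static void main(String[] args) {
        System.out.println(parsePower("2^10"));  // 1024.0
        System.out.println(parsePower("10^-9")); // approximately 1.0E-9
    }
}
```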

how to replace a string in Java

I have a question about using the replaceAll() function.
If a string contains a pair of parentheses, I replace the pair with "":
while(S.contains("()"))
{
S = S.replaceAll("\\(\\)", "");
}
But why does replaceAll("\\(\\)", ""); need \\(\\)?
Because as noted by the javadocs, the argument is a regular expression.
Parentheses in a regular expression are used for grouping. If you're going to match parentheses as part of a regular expression, they must be escaped.
It's because replaceAll expects a regex, and ( and ) have a special meaning in regular expressions and need to be escaped.
An alternative is to use replace, which counter-intuitively does the same thing as replaceAll but takes a string as an input instead of a regex:
S = S.replace("()", "");
First, your code can be replaced with:
S = S.replace("()", "");
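without the while loop, provided a single pass is enough. Nested pairs such as "(())" still need the loop, because one pass leaves "()" behind.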
Second, the first argument to .replaceAll() is a regular expression, and parens are special tokens in regular expressions (they are grouping operators).
And also, .replaceAll() replaces all occurrences, so you didn't even need the while loop here. Starting with Java 6 you could also have written:
S = S.replaceAll("\\Q()\\E", "");
It is left as an exercise to the reader to find out what \Q and \E are: http://regularexpressions.info gives the answer ;)
In S = S.replaceAll("\\(\\)", ""), the first argument is a regular expression.
Because the method's first argument is a regex expression, and () are special characters in regex, so you need to escape them.
Because parentheses are special characters in regexps, so you need to escape them. To get a literal \ in a string in Java you need to escape it like so : \\.
So () => \(\) => \\(\\)
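The variants discussed above can be compared side by side. This is only a sketch with hypothetical class and method names; it also shows that a single pass, whether literal or regex-based, leaves nested pairs partly in place, while the loop from the question removes them completely.

```java
class ParenStripDemo {
    // Literal replacement: no regex involved, so no escaping is needed.
    static String stripOnce(String s) {
        return s.replace("()", "");
    }

    // Regex replacement: the parentheses must be escaped to match literally.
    static String stripOnceRegex(String s) {
        return s.replaceAll("\\(\\)", "");
    }

    // Repeats the literal replacement until no "()" pair is left.
    static String stripNested(String s) {
        while (s.contains("()")) {
            s = s.replace("()", "");
        }
        return s;
    }

    public static void main(String[] args) {
        String s = "a(())b"; // hypothetical input
        System.out.println(stripOnce(s));      // a()b
        System.out.println(stripOnceRegex(s)); // a()b
        System.out.println(stripNested(s));    // ab
    }
}
```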

Replace a special character with other special character within string

I want to replace a special character " with \" in string.
I tried str = str.replaceAll("\"","\\\");
But this doesn't work.
The closing quotes are missing in the 2nd parameter. Change to:
str = str.replaceAll("\"","\\\\\"");
String.replaceAll() API:
Replaces each substring of this string that matches the given regular
expression with the given replacement.
An invocation of this method of the form str.replaceAll(regex, repl)
yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceAll(repl)
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
By the way, this is a duplicate question.
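As the quoted documentation suggests, the backslash counting can be avoided. The sketch below shows two options, with hypothetical class and method names: String.replace, which treats both arguments literally, and replaceAll combined with Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;

class QuoteEscapeDemo {
    // Literal replacement: "\\\"" is the Java literal for backslash + quote.
    static String escapeWithReplace(String s) {
        return s.replace("\"", "\\\"");
    }

    // Regex replacement: quoteReplacement escapes the backslash so that
    // replaceAll inserts the replacement text literally.
    static String escapeWithReplaceAll(String s) {
        return s.replaceAll("\"", Matcher.quoteReplacement("\\\""));
    }

    public static void main(String[] args) {
        String s = "say \"hi\""; // hypothetical input: say "hi"
        System.out.println(escapeWithReplace(s));    // say \"hi\"
        System.out.println(escapeWithReplaceAll(s)); // say \"hi\"
    }
}
```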
You have to escape the \ by doubling it:\\
Code example:
String tt = "\\\\terte\\";
System.out.println(tt);
System.out.println(tt.replaceAll("\\\\", "|"));
This gives the following output:
\\terte\
||terte|
