Why is `\.` not a valid escape sequence in Java regex? [duplicate]

In the GWT tutorial where you build a stock watcher there is this regex expression to check if an input is valid:
if (!symbol.matches("^[0-9A-Z\\.]{1,10}$"))
This allows inputs of 1 to 10 characters that are digits, uppercase letters, or dots.
The part that confuses me is the \\.
I read this as an escaped backslash \\ followed by a ., which stands for any character. I thought the correct expression would be \. to escape the dot, but writing that produces an error in Eclipse: Invalid escape sequence.
Am I missing the obvious here?

This is one of the hassles of regular expressions in Java. That \\ is not an escaped backslash at the regex level, just at the string level.
This string:
"^[0-9A-Z\\.]{1,10}$"
Defines this regular expression:
^[0-9A-Z\.]{1,10}$
...because the escape is consumed by the string literal.

\ is the escape character in a Java string literal. For instance, the newline character is written as \n. To place a literal \ in a Java string, you write \\.
So your Java string literal (the string in the code), "^[0-9A-Z\\.]{1,10}$", produces the actual string used for the regular expression, ^[0-9A-Z\.]{1,10}$ (with a single backslash). So, as you expected, the regular expression contains \.
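The difference between the string literal and the regex it produces can be checked directly. The following is a minimal sketch, not the tutorial's code: the class name LiteralVsRegex and the helper isValidSymbol are made up for illustration, and the sample symbols are arbitrary.

```java
class LiteralVsRegex {
    // Java string literal: the compiler turns \\ into a single backslash.
    static final String REGEX = "^[0-9A-Z\\.]{1,10}$";

    // Hypothetical helper wrapping the check from the question.
    static boolean isValidSymbol(String symbol) {
        return symbol.matches(REGEX);
    }

    public static void main(String[] args) {
        // What the regex engine actually receives.
        System.out.println(REGEX);                  // prints ^[0-9A-Z\.]{1,10}$
        System.out.println(REGEX.length());         // prints 18
        System.out.println(isValidSymbol("GOOG"));  // prints true
        System.out.println(isValidSymbol("BRK.B")); // prints true
        System.out.println(isValidSymbol("goog"));  // prints false (lowercase is not allowed)
    }
}
```

Printing the string shows one backslash, which confirms that the doubled backslash exists only in the source code.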


(Java) To replace with a backslash, why are 4 backslashes required in the replacement argument of String's replaceAll method? [duplicate]

In Java, the string "\\" represents a single backslash, the first backslash being an escape character. Thus System.out.print("\\") prints \. However, if "\\" is given as the replacement argument to replaceAll, as in "aba".replaceAll("b", "\\"), the following exception is thrown: java.lang.IllegalArgumentException: character to be escaped is missing.
Four backslashes do the trick. Thus if one prints "aba".replaceAll("b", "\\\\") the result is a\a. But why are two backslashes incorrect? Isn't the first backslash the escaping one, and the second the character to be escaped, just like in System.out.print("\\")? Notice that a single escaping backslash is sufficient for other replacement strings passed to replaceAll. E.g. printing "aba".replaceAll("b", "\t") results in an a, a tab, and another a.
Note: I'm using Java SE 9.
Edit: Some of the questions suggested as duplicates are not duplicates. Please do not confuse this with the question of why four backslashes are needed in a regex to match a single backslash. This is not the same issue, as the second argument in replaceAll is obviously not a regex. You couldn't specify a replacement String with a regex because ultimately the replacement needs to resolve to a literal String.
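The answer is that replaceAll gives the \ character its own meaning in the replacement string. The replacement is not a regular expression, but \ (escape the next character) and $ (refer to a captured group) are special in it, so the backslash has to be escaped twice: once for the Java string literal and once for the replacement syntax.
In simple words, "aba".replaceAll("b", "\t") works with one backslash because the compiler turns \t into a tab character before replaceAll ever sees the string, so the replacement contains only a tab and no backslash.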
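The behaviour described above can be reproduced with a short sketch. The class and method names are made up for illustration. Matcher.quoteReplacement and String.replace are standard library alternatives that avoid the second level of escaping, and the exception type in the last method is the one reported in the question for Java SE 9.

```java
import java.util.regex.Matcher;

class ReplacementBackslash {
    // The literal "\\\\" is the two-character string \\ ; replaceAll reads it as one literal backslash.
    static String withFourBackslashes() {
        return "aba".replaceAll("b", "\\\\");
    }

    // quoteReplacement escapes \ and $ so the replacement is taken literally.
    static String withQuoteReplacement() {
        return "aba".replaceAll("b", Matcher.quoteReplacement("\\"));
    }

    // String.replace does no regex or replacement-syntax processing at all.
    static String withPlainReplace() {
        return "aba".replace("b", "\\");
    }

    // A lone backslash in the replacement has nothing left to escape.
    static boolean singleBackslashFails() {
        try {
            "aba".replaceAll("b", "\\");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(withFourBackslashes());   // prints a\a
        System.out.println(withQuoteReplacement());  // prints a\a
        System.out.println(withPlainReplace());      // prints a\a
        System.out.println(singleBackslashFails());
    }
}
```

When the replacement text is not under your control, quoteReplacement or plain replace is usually the safer choice, since a stray $ in the replacement is misread in the same way as a stray backslash.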

This regular expression is for which type of strings [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 7 years ago.
How can I tell what kind of input a particular regular expression matches? For example, I want to know about \$\{([\w]+)\}. Which strings will this regular expression match?
Pattern placeholder = Pattern.compile("\\$\\{([\\w]+)\\}");
Matcher mat = placeholder.matcher("input");
while (mat.find()) {
}
It matches EL-style variable references such as:
${somethingHere}
As commented above, you can check that Reference for more info.
This will find one or more word characters enclosed in ${ and }.
The \w metacharacter is used to find a word character.
A word character is a character from a-z, A-Z, 0-9, including the _ (underscore) character.
The other characters are escaped by \: \$ looks for a $, \{ looks for a {, and \} looks for a }.
The + token means the preceding element ([\w]) is repeated one or more times, as many times as possible.
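To see which inputs the pattern matches, the loop from the question can be filled in to collect the captured names. This is a minimal sketch: the class PlaceholderFinder, the method findNames, and the sample input are assumptions for illustration, and only the pattern itself comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderFinder {
    // Same pattern as in the question: ${ followed by one or more word characters and }.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([\\w]+)\\}");

    // Returns the text captured by group 1 for every match in the input.
    static List<String> findNames(String input) {
        List<String> names = new ArrayList<>();
        Matcher mat = PLACEHOLDER.matcher(input);
        while (mat.find()) {
            names.add(mat.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String input = "Hello ${user}, id ${user_id42}; ${} and ${a-b} do not match";
        System.out.println(findNames(input)); // prints [user, user_id42]
    }
}
```

The empty ${} is skipped because + requires at least one word character, and ${a-b} is skipped because - is not a word character.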

Is it possible to use split function with a '.' as a delimiting regular expression? [duplicate]

I want to split the content of a string variable using a dot as the delimiting regular expression, but my code doesn't work.
public class Test {
    public static void main(String[] a) {
        String ch = "r.a.c.h.i.d";
        String[] tab;
        tab = ch.split(".");
        System.out.println(tab.length);
        for (String e : tab) System.out.println(e);
    }
}
Change tab=ch.split("."); to tab=ch.split("\\.");. You need to escape the dot because otherwise it's treated as a special character in the regex passed to split.
tab = ch.split("\\.");
One backslash is the escape character for the regex. But in Java you need a second backslash, because the first one has to be escaped in the string literal.
Yes, it's possible. In a regular expression, . means any character.
Predefined character classes
.       Any character (may or may not match line terminators)
So you must escape it to provide the literal . meaning. Escape it with a backslash character, providing two backslashes, because Java needs the backslash character itself escaped.
Use the regular expression "\\.".
In general, to get the literal characters out of an expression that may contain special-meaning characters, you can use the Pattern.quote method.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
split(Pattern.quote("."))
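The three variants discussed in these answers can be compared side by side. This is a small sketch using the string from the question; the class name SplitOnDot and its method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnDot {
    static final String INPUT = "r.a.c.h.i.d";

    // Unescaped dot: every character is a delimiter, so only empty strings remain
    // and split drops all of them as trailing empty strings.
    static String[] unescaped() {
        return INPUT.split(".");
    }

    // Escaped dot: \\ in the literal becomes \ in the regex, giving \. (a literal dot).
    static String[] escaped() {
        return INPUT.split("\\.");
    }

    // Pattern.quote builds a regex that matches its argument literally.
    static String[] quoted() {
        return INPUT.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        System.out.println(unescaped().length);         // prints 0
        System.out.println(Arrays.toString(escaped())); // prints [r, a, c, h, i, d]
        System.out.println(Arrays.toString(quoted()));  // prints [r, a, c, h, i, d]
    }
}
```

Pattern.quote is the more general option when the delimiter comes from user input or configuration, because it handles any metacharacter without manual escaping.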

String.Split - Unexpected behaviour [duplicate]

I am trying to split a string as follows
String string = "mike|ricki";
If I do the following string.split("|") I would expect an array of 2 elements, "mike" and "ricki". Instead I am getting the following
[, m, i, k, e, |, r, i, c, k, i]
Am I doing something fundamentally wrong here?
Yes. Pipe character | is a special character in regular expressions. You must escape it by using \. The escape string would be \|, but in Java the backslash \ is a special character for escape in literal Strings, so you have to double escape it and use \\|:
String[] names = string.split("\\|");
System.out.println(Arrays.toString(names));
If you read the Java documentation for String.split(), it says that the method takes a regular expression as its argument.
The pipe character | is a special character in regular expressions, so if you want to use it as a literal you have to escape it, which in a Java string literal is written \\|.
So your code has to be:
String[] splitted = string.split("\\|");
String.split takes a regular expression. The pipe character has a special meaning in regex so it's not matching as you were expecting.
Try String.split("\\|") instead.
The backslash tells regex to treat the pipe as a literal character.
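For completeness, here is a short sketch of the ways to split on a literal pipe. The class name SplitOnPipe is made up for illustration. The character class "[|]" and Pattern.quote are not mentioned in the answers above; they are standard alternatives to the backslash escape. No expected output is given for the unescaped call because the exact array it returns differs between Java versions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnPipe {
    static final String INPUT = "mike|ricki";

    // Backslash escape: the regex is \| (a literal pipe).
    static String[] escaped() {
        return INPUT.split("\\|");
    }

    // Inside a character class the pipe has no special meaning.
    static String[] charClass() {
        return INPUT.split("[|]");
    }

    // Pattern.quote makes the whole delimiter literal.
    static String[] quoted() {
        return INPUT.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(escaped()));   // prints [mike, ricki]
        System.out.println(Arrays.toString(charClass())); // prints [mike, ricki]
        System.out.println(Arrays.toString(quoted()));    // prints [mike, ricki]
        // Unescaped: | is an alternation of two empty patterns and matches between all characters.
        System.out.println(Arrays.toString(INPUT.split("|")));
    }
}
```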

get regex to work in java

I have this regular expression pattern,
From: ["<][^>]*>
I need it to work in Java, and the double quote is producing an error. When I try to escape it like so
From: [\"<][^>]*>
it does not produce the correct result. Does anyone know how to handle double quotes in Java regular expressions? Thanks
The \ character in Java string literals is a reserved escape character, so to put a regex escape character into a Java string literal you must escape the escape :)
E.g. \\ in the literal yields a single \ in the regex, so the regex \" will find double quote characters.
EDIT: One thing that I forgot is that the double quote character is also a reserved character in a Java string literal! Because of this, both the \ for the regex and the " character must be escaped.
The actual Java string literal will look like this: String regex = "\\\"";
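The pattern from the question can be tried with both spellings of the quote. This is a sketch under stated assumptions: the class QuoteInRegex, the helper firstMatch, and the sample header line are made up for illustration. It relies on the documented rule that a backslash before a non-alphabetic character in a Java regex simply matches that character, so the regex-level backslash before the quote is allowed but optional.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteInRegex {
    // Escaped only for the Java literal: the regex is  From: ["<][^>]*>
    static final String PLAIN = "From: [\"<][^>]*>";

    // Escaped for the literal and for the regex: the regex is  From: [\"<][^>]*>
    static final String ESCAPED = "From: [\\\"<][^>]*>";

    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "From: \"Alice\" <alice@example.com>";
        System.out.println(firstMatch(PLAIN, input));   // prints From: "Alice" <alice@example.com>
        System.out.println(firstMatch(ESCAPED, input)); // prints From: "Alice" <alice@example.com>
    }
}
```

Both literals compile to patterns that match the same text, so the choice between them is a matter of readability.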
