get regex to work in java - java

I have this regular expression pattern,
From: ["<][^>]*>
I need it to work in Java, but the double quote produces an error. When I try to escape it like so
From: [\"<][^>]*>
it does not produce the correct result. Does anyone know how to handle double quotes in Java regular expressions? Thanks

The \ character is a reserved escape character in Java string literals, so to put a regex escape character into a Java string literal you must escape the escape :)
E.g. \\" will result in a regex of \", which matches a double quote character.
EDIT: One thing I forgot is that the double quote character is also reserved in a Java string literal! Because of this, both the \ for the regex and the " character must be escaped.
The actual Java string literal will look like this: String regex = "\\\"";
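For the pattern in the question, here is a minimal sketch (the class name, helper method and sample input are made up for illustration). It suggests that the Java-level escape \" alone is enough, since " is not a regex metacharacter, and that the doubly escaped form \\\" matches the same text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteInRegexDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "From: \"Alice\" <alice@example.com>";

        // Regex seen by the engine: From: ["<][^>]*>
        String plain = "From: [\"<][^>]*>";
        // Regex seen by the engine: From: [\"<][^>]*>
        String escaped = "From: [\\\"<][^>]*>";

        System.out.println(firstMatch(plain, input));   // prints From: "Alice" <alice@example.com>
        System.out.println(firstMatch(escaped, input)); // prints From: "Alice" <alice@example.com>
    }
}
```

If the plain form still gives the wrong result, the problem is probably in the pattern itself rather than in the quoting.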

Related

Regular Expressions with double backslash in java

I want to understand the concept of regular expression in below code:
private static final String SQL_INSERT = "INSERT INTO ${table}(${keys}) VALUES(${values})";
private static final String TABLE_REGEX = "\\$\\{table\\}";
.
.
.
String query = SQL_INSERT.replaceFirst(TABLE_REGEX, "tableName");
The above code is working fine but I would like to understand how. As far as I know, the $ and { symbols should be escaped in a Java string using a backslash, but in the above string there is no backslash, and if I try to add one, it shows the error: invalid escape sequence.
Also, why does TABLE_REGEX = "\\$\\{table\\}"; contain double backslashes?
The $ and { don't need to be escaped in Java string literals in general but in regular expressions they need to be escaped as they have special meaning in regular expressions. The $ matches the end of a line and { is used for matching characters a certain amount of times. To match any of the regular expression special characters themselves these characters need to be escaped. For example A{5} matches AAAAA but A\{5 matches A{5.
To escape something in a regular expression string you use the \. But the backslash itself needs escaping in a string literal, which is done by another \. That is, the String literal "\\{" actually corresponds to the string \{.
This is why in regular expression string literals you will often encounter multiple backslashes. You might also want to take a look at Pattern.quote(String s), which takes a string and returns a pattern string that matches it literally, so that none of its characters are treated as special by the Java regex engine.
Essentially instead of
private static final String TABLE_REGEX = "\\$\\{table\\}";
you could write
private static final String TABLE_REGEX = Pattern.quote("${table}");
In your example SQL_INSERT.replaceFirst(TABLE_REGEX, "tableName"); matches the first occurrence of ${table} in SQL_INSERT and replaces this occurrence with tableName:
String sql = "INSERT INTO ${table}(${keys}) VALUES(${values})".replaceFirst("\\$\\{table\\}", "tableName");
boolean test = sql.equals("INSERT INTO tableName(${keys}) VALUES(${values})");
System.out.println(test); // will print 'true'
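As a rough sketch of the Pattern.quote route (the class and the fill helper are hypothetical names, not from the question), the placeholder can be passed in unescaped form. Matcher.quoteReplacement is added here as an extra precaution, because replaceFirst also gives $ and \ a special meaning in the replacement string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteTemplateDemo {
    static final String SQL_INSERT = "INSERT INTO ${table}(${keys}) VALUES(${values})";

    // Hypothetical helper: replaces the first literal occurrence of placeholder with value.
    static String fill(String template, String placeholder, String value) {
        return template.replaceFirst(
                Pattern.quote(placeholder),        // match the placeholder literally
                Matcher.quoteReplacement(value));  // insert the value literally
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("${table}"));
        // prints \Q${table}\E
        System.out.println(fill(SQL_INSERT, "${table}", "tableName"));
        // prints INSERT INTO tableName(${keys}) VALUES(${values})
    }
}
```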
$ or "$" is the dollar sign / a string containing it.
\$ is an escaped dollar sign, normally found in a raw regex if you want to match the character $ instead of the end of the line.
"\\$" is a String literal containing an escaped \ followed by a normal $. Since you are not writing a raw regex, but the regex is inside a Java String, you need to escape the \ so that when the regex interpreter comes along it just sees a normal \, which it then treats as escaping the following $.
"\$" is not valid because from a normal String point of view a $ is nothing special and does not need to be (and must not be) escaped.
I would like to understand how.
It is replacing the first match of the regex "\\$\\{table\\}" in the original string "INSERT INTO ${table}(${keys}) VALUES(${values})" with "tableName".
$ and { symbols should be escaped in java string using backslash but in above string there is no backslash and if I try to add it, it shows error: invalid escape sequence.
No, $, { and } do not need to be escaped in a Java string literal; they have no special meaning there.
Also why the TABLE_REGEX = "\\$\\{table\\}"; contains double backslash?
In a Java string literal a single backslash starts an escape sequence (e.g. \n, \t), so a literal backslash has to be written as a double backslash. The regex needs those backslashes to escape the $, { and } symbols, because these have a special meaning in a regex; escaping them tells the Java regex engine to treat them literally.

difference between '\' and '\\' when using them as escape characters

I know that we use escape characters like \n for next line and \t for tab.
But today while working on a few strings I came across \\$.
I had to print "nike$" so to print it I had to modify the string as "nike\\$".
I want to know what is the exact difference between \ and \\.
Inside a string literal, \ is an escape: The next character that follows tells us what it will do, as in your \n example for newline.
This means you can't put \ in a string on its own, since it's half of an escape sequence. Instead, to have a \ actually in a string, you use \\.
I had to print "nike$" so to print it I had to modify the string as "nike\\$"
"nike\\$" will result in a string that outputs (for instance, via System.out.println) as nike\$, not nike$.
Your use of \\$ suggests to me that you were feeding a regular expression pattern into something, e.g.:
p = Pattern.compile("nike\\$");
In that situation, we have two levels of escaping going on: The string literal, and the regular expression. To have a literal $ in a regular expression, it has to be escaped by \ because otherwise it's an end-of-input assertion. To get that \$ actually to the regular expression parser when using a string literal, we have to escape the backslash in the literal so we actually have a backslash in the string for the regular expression engine to see, thus \\$.
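To see the two levels separately, here is a small sketch (the class and helper names are made up for illustration) that first prints the string and then uses it as a pattern:

```java
import java.util.regex.Pattern;

class NikeDollarDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesFully(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String literal = "nike\\$";           // characters in the string: n i k e \ $
        System.out.println(literal);          // prints nike\$
        System.out.println(literal.length()); // prints 6

        // Level two: the regex engine reads \$ as a literal dollar sign.
        System.out.println(matchesFully(literal, "nike$")); // prints true

        // Without the escape, $ is an end-of-input assertion, so the text nike$ does not match.
        System.out.println(matchesFully("nike$", "nike$")); // prints false
        System.out.println(matchesFully("nike$", "nike"));  // prints true
    }
}
```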

java regex escape sequences

I was wondering about regex in Java and stumbled upon the use of backslashes. For instance, if I wanted to look for occurrences of the words "this regex" in a text, I would do something like this:
Pattern.compile("this regex");
Nonetheless, I could also do something like this:
Pattern.compile("this\\sregex");
My question is: what is the difference between the two of them? And why do I have to type the backslash twice, I mean, why isn't \s an escape sequence in Java? Thanks in advance!
\s means any whitespace character, including tab, line feed and carriage return.
Java string literals already use \ to escape special characters. To put the character \ in a string literal, you need to write "\\". However regex patterns also use \ as their escape character, and the way to put that into a string literal is to use two, because it goes through two separate escaping processes. If you read your regex pattern from a plain text file for example, you won't need double escaping.
The reason you need two backslashes is that when you enter a regex string in Java code you are actually dealing with two parsers:
The first is the Java compiler, which is converting your string literal to a Java String.
The second is the regex parser, which interprets your regex after it has been converted to a Java string and passed to Pattern.compile.
So when you input "this\\sregex", it will be converted to the Java string this\sregex by the Java compiler. Then when you call Pattern.compile with that string, the regex parser interprets \s as the whitespace character class.
The difference is that \s denotes a whitespace character, which can be more than just a blank space. It can also be a tab, line feed, or carriage return, to name a few.
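A quick way to check the difference is to run both patterns against text where the two words are separated by a tab instead of a space. This is only an illustrative sketch; the class name and sample strings are invented:

```java
import java.util.regex.Pattern;

class WhitespaceDemo {
    // Hypothetical helper: true if regex matches somewhere in text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String spaced = "use this regex here";
        String tabbed = "use this\tregex here"; // \t here is a Java string escape: a real tab

        System.out.println(found("this regex", spaced));   // prints true
        System.out.println(found("this regex", tabbed));   // prints false
        System.out.println(found("this\\sregex", spaced)); // prints true
        System.out.println(found("this\\sregex", tabbed)); // prints true
    }
}
```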

Can't use Regex in Java because of escape sequence error, how to remove the error

I have this regex :
^(([A-Z]:)|((\\|/){1,2}\w+)\$?)((\\|/)(\w[\w ]*.*))+\.([txt|exe]+)$
but every time I assign it to a string, Eclipse reports invalid escape sequences. I have tried inserting a backslash but it gives me the same error.
How do I assign the above expression to a string in Java?
Replace all "\\" with "\\\\". Java has no language support for regular expressions, so you need "\\" to get a single backslash from the compiler into the String. If the regular expression is to contain an escaped backslash, you need "\\\\".
final String re = "^(([A-Z]:)|((\\\\|/){1,2}\\w+)\\$?)((\\\\|/)(\\w[\\w ]*.*))+\\.([txt|exe]+)$";
Try the following:
String regex = "^(([A-Z]:)|((\\\\|/){1,2}\\w+)\\$?)((\\\\|/)(\\w[\\w ]*.*))+\\.([txt|exe]+)$";
The backslash character itself needs to be escaped as well, so you would end up with four \ characters.
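As a sanity check, the doubled-up literal can be printed and tried against a path or two. The sketch below uses an invented class name and invented sample paths, and does not try to judge whether the regex itself is what the asker needs:

```java
import java.util.regex.Pattern;

class PathRegexDemo {
    // Same literal as above: every backslash of the raw regex is written twice.
    static final String REGEX =
            "^(([A-Z]:)|((\\\\|/){1,2}\\w+)\\$?)((\\\\|/)(\\w[\\w ]*.*))+\\.([txt|exe]+)$";

    // Hypothetical helper: true if the whole path matches REGEX.
    static boolean matchesPath(String path) {
        return Pattern.compile(REGEX).matcher(path).matches();
    }

    public static void main(String[] args) {
        // Printing shows the regex exactly as the regex engine receives it.
        System.out.println(REGEX);

        // The sample paths are string literals too, so their backslashes are doubled as well.
        System.out.println(matchesPath("C:\\folder\\file.txt")); // prints true
        System.out.println(matchesPath("C:\\folder\\file"));     // prints false
    }
}
```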

Java regular expressions and dollar sign

I have Java string:
String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
I also have generated pattern against which I want to match this string:
String pattern = "/feedback/com.school.edu.domain.feedback.Review$0(.)*";
When I call b.matches(pattern) it returns false. Now I know the dollar sign is special in Java regex, but I don't know what my pattern should look like. I am assuming that the $ in the pattern needs to be replaced by some escape characters, but I don't know how many. This $ sign is important to me as it helps me distinguish elements in a list (the numbers after the dollar), and I can't go without it.
Use
String escapedString = java.util.regex.Pattern.quote(myString);
to automatically escape all special regex characters in a given string.
You need to escape $ in the regex with a back-slash (\), but as a back-slash is an escape character in strings you need to escape the back-slash itself.
You will need to escape any special regex char the same way, for example with ".".
String pattern = "/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
In Java regex both . and $ are special. You need to escape them with 2 backslashes, i.e.:
"/feedback/com\\.navtag\\.etc\\.Review\\$0(.*)"
(1 backslash is for the Java string, and 1 is for the regex engine.)
Escape the dollar with \ (written \\ in a Java string literal):
String pattern =
"/feedback/com.navteq.lcms.common.domain.poi.feedback.Review\\$0(.)*";
I advise you to escape . as well, since . represents any character.
String pattern =
"/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
The answer by @Colin Hebert, edited by @theon, is correct. The explanation is as follows. @azec-pdx
It is a regex as a string literal (within double quotes).
period (.) and dollar-sign ($) are special regex characters (metacharacters).
To make the regex engine interpret them as the ordinary characters period (.) and dollar sign ($), you need to prefix each with a single backslash. The backslash (itself a special regex character) quotes the character following it and thus escapes it.
Since the given regex is a string literal, another backslash must be prefixed to each one. Otherwise the compiler reads \. or \$ as a string-literal escape sequence, which does not exist, and reports a compile-time error.
Even a regex construct that coincides with a valid string-literal escape sequence needs the extra backslash. For example, the regex construct \b (word boundary) clashes with the string-literal escape \b (backspace): "\b" compiles, but it puts a backspace character into the pattern. Prefixing another backslash avoids the clash, and "\\b" is then read by the regex engine as a word boundary.
To be always safe, all single backslash escapes (quotes) within string literals are prefixed with another backslash. For example, the string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
The last period, in (.)*, is supposed to be interpreted as a special regex character and thus needs no quoting by a backslash, let alone a second one.
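The points above can be tried out with the strings from the question. This is only a sketch: the class name and the helper methods are invented, and the Pattern.quote variant assumes the literal prefix is known separately from the trailing (.)* part:

```java
import java.util.regex.Pattern;

class DollarInPatternDemo {
    static final String B = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";

    // Hypothetical helper: true if the whole of B matches regex.
    static boolean matchesB(String regex) {
        return B.matches(regex);
    }

    // Hypothetical helper: true if regex matches somewhere in B.
    static boolean foundInB(String regex) {
        return Pattern.compile(regex).matcher(B).find();
    }

    public static void main(String[] args) {
        String unescaped = "/feedback/com.school.edu.domain.feedback.Review$0(.)*";
        String escaped = "/feedback/com\\.school\\.edu\\.domain\\.feedback\\.Review\\$0(.)*";
        String quoted = Pattern.quote("/feedback/com.school.edu.domain.feedback.Review$0") + "(.)*";

        System.out.println(matchesB(unescaped)); // prints false ($ is read as end of input)
        System.out.println(matchesB(escaped));   // prints true
        System.out.println(matchesB(quoted));    // prints true

        // "\\b" reaches the regex engine as \b (word boundary);
        // "\b" is a backspace character inside the pattern.
        System.out.println(foundInB("\\bReview\\b")); // prints true
        System.out.println(foundInB("\bReview\b"));   // prints false
    }
}
```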
