Regular expressions with double backslash in Java

I want to understand the concept of regular expression in below code:
private static final String SQL_INSERT = "INSERT INTO ${table}(${keys}) VALUES(${values})";
private static final String TABLE_REGEX = "\\$\\{table\\}";
.
.
.
String query = SQL_INSERT.replaceFirst(TABLE_REGEX, "tableName");
The above code works fine, but I would like to understand how. As far as I know, the $ and { symbols should be escaped with a backslash in a Java string, but the string above contains no backslash, and if I try to add one, I get the error: invalid escape sequence.
Also, why does TABLE_REGEX = "\\$\\{table\\}"; contain double backslashes?

The $ and { don't need to be escaped in Java string literals in general, but in regular expressions they need to be escaped because they have a special meaning there. The $ matches the end of a line and { starts a quantifier that matches the preceding element a given number of times. To match any of the regular expression special characters themselves, these characters need to be escaped. For example, A{5} matches AAAAA but A\{5 matches A{5.
To escape something in a regular expression you use the \. But the backslash itself needs escaping in a string literal, which is done with another \. That is, the string literal "\\{" actually corresponds to the string \{.
This is why you will often encounter multiple backslashes in regular expression string literals. You might also want to take a look at Pattern.quote(String s), which takes a string and returns a pattern string that matches it literally, so none of its characters are treated as special by Java regular expressions.
Essentially instead of
private static final String TABLE_REGEX = "\\$\\{table\\}";
you could write
private static final String TABLE_REGEX = Pattern.quote("${table}");
In your example SQL_INSERT.replaceFirst(TABLE_REGEX, "tableName"); matches the first occurrence of ${table} in SQL_INSERT and replaces this occurrence with tableName:
String sql = "INSERT INTO ${table}(${keys}) VALUES(${values})".replaceFirst("\\$\\{table\\}", "tableName");
boolean test = sql.equals("INSERT INTO tableName(${keys}) VALUES(${values})");
System.out.println(test); // will print 'true'
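As a minimal sketch of the Pattern.quote approach described above, the hypothetical helper fill (not part of the original code) replaces one ${name} placeholder. It also passes the value through Matcher.quoteReplacement, on the assumption that replacement values may themselves contain $ or \, which replaceFirst would otherwise interpret as group references or escapes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateFill {
    static final String SQL_INSERT = "INSERT INTO ${table}(${keys}) VALUES(${values})";

    // Hypothetical helper: replaces the first literal occurrence of ${name} with value.
    static String fill(String template, String name, String value) {
        // Pattern.quote wraps the text in \Q...\E so that $, { and } are matched literally.
        String regex = Pattern.quote("${" + name + "}");
        // quoteReplacement keeps $ and \ in the value from being read as group references.
        return template.replaceFirst(regex, Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        System.out.println(fill(SQL_INSERT, "table", "tableName"));
    }
}
```

Building the regex from the placeholder name avoids writing a separate hand-escaped constant for each of ${table}, ${keys} and ${values}.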

$ or "$" is the dollar sign / a string containing it.
\$ is an escaped dollar sign, normally found in a raw regex if you want to match the character $ instead of the end of the line.
"\\$" is a String containing an escaped \ followed by a normal $. Since you are not writing a raw regex but a regex inside a Java String, you need to escape the \ so that when the regex interpreter comes along it just sees a normal \, which it then treats as escaping the following $.
"\$" is not valid because, from a normal String point of view, a $ is nothing special and does not need to be (and must not be) escaped.

i would like to understand how.
It is replacing the first match of the regex "\\$\\{table\\}" in the original string "INSERT INTO ${table}(${keys}) VALUES(${values})" with "tableName".
$ and { symbols should be escaped in java string
using backslash but in above string there is no backslash and if I try
to add it, it shows error: invalid escape sequence.
No, $, { and } are not special in a Java string literal, so there is nothing to escape there.
Also why the TABLE_REGEX = "\\$\\{table\\}"; contains double
backslash?
In a Java string literal, a single backslash starts an escape sequence (e.g. \n, \t), so a literal backslash has to be written as a double backslash. The regex escapes the $, { and } symbols because they have a special meaning in a regex; escaping them tells the Java regex engine to treat them literally rather than with their special meaning.
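The two layers of escaping can be checked directly. The sketch below is only an illustration (the class name EscapeLayers and the helper isValidRegex are made up for it): it prints what the String actually contains after compilation, then compares the escaped regex with the unescaped text ${table} used as a regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeLayers {
    // Returns true if the text is accepted by the Java regex compiler.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String tableRegex = "\\$\\{table\\}";
        // The compiler turns each \\ into one backslash, so the String holds \$\{table\}
        System.out.println(tableRegex + " has " + tableRegex.length() + " characters");
        // The regex engine then reads \$ \{ \} as the literal characters $ { }
        System.out.println("${table}".matches(tableRegex));
        // Without the regex-level escapes, {table} is not a valid repetition count.
        System.out.println(isValidRegex("${table}"));
    }
}
```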

Related

Check whether the string contains backslash or not?

In Java, \' denotes a single quotation mark (single quote) character, and \" denotes a double quotation mark (double quote) character.
So, String s = "I\'m a human."; works well.
However, String s = "I'm a human." does not make any compile errors, either.
Likewise, char c = '\"'; works, but char c = '"'; also works.
But I need to detect whether the string contains backslash or not:
"abcd'" does not contain backslash
"abcd\'" contains backslash.
I need to distinguish whether the string contains backslash or not.
You can't. They're called escape sequences for a reason. For example, once \n has been put in a String, there is no literal \ left to match. It's gone. All that's left is a newline.
Remember, \ is used to escape a character. It doesn't itself remain a part of the String.
However, you can check for a literal \ by doing a simple contains like
String s = "abcd\\";
System.out.println(s.contains("\\"));
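A small sketch of the point above, using a hypothetical helper hasBackslash: the escape sequence \' leaves no backslash in the runtime string, while \\ leaves exactly one.

```java
class BackslashCheck {
    // True if the runtime string contains at least one backslash character.
    static boolean hasBackslash(String s) {
        return s.indexOf('\\') >= 0;
    }

    public static void main(String[] args) {
        String escapedQuote = "abcd\'";   // same five characters as "abcd'"
        String realBackslash = "abcd\\";  // five characters: a b c d \
        System.out.println(escapedQuote.equals("abcd'"));
        System.out.println(hasBackslash(escapedQuote));
        System.out.println(hasBackslash(realBackslash));
    }
}
```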
"abcd\" is not a valid string in java.
Here Java treats \" as the escape sequence for the " character, so the string literal is never closed. To put a backslash in a string, you need to escape it with another backslash: \\.
The String "abcd\'" does not contain a backslash character. It contains the escape sequence \', which is just a single quote.
Escape characters (also called escape sequences or escape codes) in
general are used to signal an alternative interpretation of a series
of characters. In Java, a character preceded by a backslash (\) is an
escape sequence and has special meaning to the java compiler.
When an escape sequence is encountered in a print statement, the
compiler interprets it accordingly. For example, if you want to put
quotes within quotes you must use the escape sequence, \", on the
interior quotes. To print the sentence: She said "Hello!" to me. you
should write:
System.out.println("She said \"Hello!\" to me.");
// Java program that finds a character in a string.
public class BackslashIndex {
    public static void main(String[] args) {
        // The string to search; "gee\\k" is the five characters gee\k.
        String str = "gee\\k";
        // indexOf returns the index of the first occurrence, or -1 if absent.
        int firstIndex = str.indexOf('\\');
        System.out.println("First occurrence of char '\\'"
                + " is found at : " + firstIndex);
    }
}
if(string.contains("\\")){
//TODO do your code here
}
\ introduces an escape sequence in Java.
To get a backslash into the string, you have to write it twice in the literal: "abcd\\".
For your example it would be:
boolean containsBs = "abcd\\".contains("\\");
When you are using Strings, you do not need the escape character (backslash) for single quotation marks. Likewise, when using a char you do not need to escape the double quotation mark.
Strings use double quotation marks while chars use single quotation marks. You need the escape character for a double quote in a String and for a single quote in a char.
String ex="I'm an example";
String ex2="My name is \"example\"";
char c='"';
char c2='\'';
If you want to find out whether a String contains a backslash:
String ex="abcd";
String ex2="abcd\\";
ex.contains("\\"); //false
ex2.contains("\\"); //true
In the literal "\\", the first backslash is the escape and the second is the character itself.

string validation over regular expressions in java

How to validate the given string over the regular expression (XSD Pattern):
xsd pattern:'([a-zA-Z0-9.,;:'+-/()?*[]{}\`´~
]|[!"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*'
I need to validate the string with above pattern whether it matches or not.
I have tried the code below, but I get an "unsupported escape characters" error when compiling:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PatternMatching {
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9\.,;:'\+\-/\(\)?\*\[\]\{\}\\`´~ ]|[!"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
public static boolean validateUserName(String userName){
Matcher mtch = usrNamePtrn.matcher(userName);
if(mtch.matches()){
return true;
}
return false;
}
public static void main(String a[]){
System.out.println("Is a valid username?"+validateUserName("stephen & john"));
}
}
How can I do this? In addition, if the string doesn't match the pattern, the characters that don't match need to be displayed. I am using Java 1.6. Any suggestions are appreciated.
First, the regular expression itself has three mistakes.
Mistake 1:
A backslash is a special character which is used to escape whatever character follows it. Therefore, the sequence
\`
is either identical to a single back-quote, or, depending on the regular expression engine, is an illegal escape sequence. Either way, if the intent was to match a backslash along with all the other characters, it should be written as:
\\`
Mistake 2:
Inside the […] character grouping, a ] must be escaped so it doesn’t signify the end of the grouping. In Java, a [ inside a grouping also starts a nested grouping, so it should be escaped as well. So, [] needs to be written as \[\].
Mistake 3:
Inside the […] character grouping, a - indicates a character range, like a-z. The regular expression [+-/] does not mean “plus or hyphen or slash”; it means “any of the characters between plus and slash, inclusive.” Technically, this mistake doesn’t affect the outcome in this particular case, because +-/ is equivalent to those three literal characters plus the comma and period, which both happen to occur earlier in the character grouping anyway. But, in the interest of saying what you mean, the - should be escaped:
+\-/
Second is the matter of turning the regular expression into a Java string.
The backslash and the double-quote are special characters in Java. Obviously, " denotes the start and end of a String literal, so if you want a " inside a String, you must escape it:
\"
This is not related to regular expressions; this just tells the compiler that the String contains a double-quote character. It will be compiled into a single " and that is what the regular expression engine will see.
Finally, there is the matter of backslashes. It just so happens that, while regular expressions use a backslash to escape characters as described above, Java also uses backslashes to escape characters in strings. This means that if you want a literal backslash in a Java String, it must be written in the code as two backslashes:
String s = "\\"; // a String of length 1
Recall from above that we need a regular expression with two consecutive backslash characters followed by a back-quote:
\\`
A Java string containing those three characters would look like this:
String s = "\\\\`"; // a String of length 3
A regular expression allows a backslash almost anywhere; for instance, \% is the same as %. However, Java only allows specific characters to be preceded by a single backslash. \+ is not one of those permitted sequences.
+, (, ), {, and } are not special characters inside a […] grouping, so there is no need to escape them anyway.
So, your code needs to be changed from this:
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9\.,;:'\+\-/\(\)?\*\[\]\{\}\\`´~ ]|[!"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
to this:
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9.,;:'+\\-/()?*\\[\\]{}\\\\`´~ ]|[!\"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
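The question also asks for the characters that fail validation to be displayed. One possible approach, sketched here under stated assumptions, is to test each character on its own against a single character class. The class below is a shortened, ASCII-only variant of the corrected first grouping, used only for illustration; the names InvalidChars and invalidChars are hypothetical. It uses nothing beyond what Java 1.6 provides.

```java
import java.util.regex.Pattern;

class InvalidChars {
    // Shortened, ASCII-only variant of the first character class, for illustration only.
    private static final Pattern ALLOWED_CHAR =
            Pattern.compile("[a-zA-Z0-9.,;:'+\\-/()?*\\[\\]{}\\\\`~ ]");

    // Hypothetical helper: collects every character that the class does not allow.
    static String invalidChars(String input) {
        StringBuilder rejected = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            String ch = String.valueOf(input.charAt(i));
            if (!ALLOWED_CHAR.matcher(ch).matches()) {
                rejected.append(ch);
            }
        }
        return rejected.toString();
    }

    public static void main(String[] args) {
        System.out.println("Rejected: " + invalidChars("stephen & john"));
    }
}
```

For the full pattern, the same loop would work with the three groupings merged into one class; the string is valid exactly when the returned text is empty.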
The compile error occurs because " and \ are special characters in Java string literals.
You'll have to replace " with the escape sequence \" and \ with \\, as follows:
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9.,;:'+-/()?*[]{}\\`´~ ]|[!\"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
Note that in the pattern above " and \ have been replaced by \" and \\.
Also note that this only fixes the compile errors. You still need to re-check the regex itself to see whether it matches what you intend.

Can't use Regex in Java because of escape sequence error, how to remove the error

I have this regex :
^(([A-Z]:)|((\\|/){1,2}\w+)\$?)((\\|/)(\w[\w ]*.*))+\.([txt|exe]+)$
but every time I assign it to a string, Eclipse reports invalid escape sequences. I have inserted a backslash, but it gives me the same error.
How to assign the above expression to string in java?
Replace every \ with \\. Java has no language support for regular expressions, so you need "\\" in the source to get a single backslash from the compiler into the String. If the regular expression itself contains an escaped backslash (\\), you need "\\\\".
final String re = "^(([A-Z]:)|((\\\\|/){1,2}\\w+)\\$?)((\\\\|/)(\\w[\\w ]*.*))+\\.([txt|exe]+)$";
Try the following:
String regex = "^(([A-Z]:)|((\\\\|/){1,2}\\w+)\\$?)((\\\\|/)(\\w[\\w ]*.*))+\\.([txt|exe]+)$";
To match a literal backslash, the regex needs \\, and each of those two backslashes must be escaped again in the Java string literal, so you end up with four \ characters.
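The counting of backslashes can be verified with a short sketch. The sample path C:\dir\file.txt and the class name PathRegex are assumptions made for this illustration; the regex is the one from the question with every backslash doubled.

```java
import java.util.regex.Pattern;

class PathRegex {
    // The regex from the question, with every backslash doubled for the Java literal.
    static final Pattern PATH = Pattern.compile(
            "^(([A-Z]:)|((\\\\|/){1,2}\\w+)\\$?)((\\\\|/)(\\w[\\w ]*.*))+\\.([txt|exe]+)$");

    public static void main(String[] args) {
        // Four backslashes in source = two in the String = one literal backslash for the regex.
        System.out.println("\\\\".length());
        System.out.println("\\".matches("\\\\"));
        // Hypothetical sample path C:\dir\file.txt, written with doubled backslashes.
        System.out.println(PATH.matcher("C:\\dir\\file.txt").matches());
    }
}
```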

get regex to work in java

I have this regular expression pattern,
From: ["<][^>]*>
I need it to work in Java, and the double quote is producing an error. When I try to escape it like so
From: [\"<][^>]*>
it does not produce the correct result. Does anyone know how to handle double quotes in java for regular expressions? Thanks
The \ character is a reserved escape character in Java String literals, so to put a regex escape character into a Java String literal you must escape the escape :)
E.g. \\" is meant to produce the regex \", which will find double quote characters.
EDIT: One thing that I forgot is that the double quote character is also a reserved character in a Java string literal. Because of this, both the \ for the regex and the " character must be escaped.
The actual Java string literal will look like this: String regex = "\\\"";
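To compare the two spellings, here is a small sketch with a made-up input line and a hypothetical helper firstMatch. Since " has no special meaning to the regex engine, both forms find the same match in this example; the only escape the compiler strictly requires is the one on the quote itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FromHeader {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "From: \"John\" <john@example.com>"; // hypothetical sample input
        // String-level escape only: the regex engine receives From: ["<][^>]*>
        System.out.println(firstMatch("From: [\"<][^>]*>", line));
        // String-level and regex-level escape: the engine receives From: [\"<][^>]*>
        System.out.println(firstMatch("From: [\\\"<][^>]*>", line));
    }
}
```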

Java regular expressions and dollar sign

I have Java string:
String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
I also have generated pattern against which I want to match this string:
String pattern = "/feedback/com.school.edu.domain.feedback.Review$0(.)*";
When I call b.matches(pattern) it returns false. I know the dollar sign is part of Java regex syntax, but I don't know what my pattern should look like. I assume that the $ in the pattern needs to be replaced by some escape characters, but I don't know how many. This $ sign is important to me because it helps me distinguish elements in a list (the numbers after the dollar), and I can't do without it.
Use
String escapedString = java.util.regex.Pattern.quote(myString)
to automatically escape all special regex characters in a given string.
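A sketch of how this could be applied to the question, assuming the generated part of the pattern is the literal prefix and only the trailing wildcard should stay a regex (the names QuotedPrefix and prefixRegex are hypothetical):

```java
import java.util.regex.Pattern;

class QuotedPrefix {
    static final String B = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
    static final String PREFIX = "/feedback/com.school.edu.domain.feedback.Review$0";

    // Quotes the literal prefix and appends an unquoted .* for the rest of the input.
    static String prefixRegex(String literalPrefix) {
        return Pattern.quote(literalPrefix) + ".*";
    }

    public static void main(String[] args) {
        // Unescaped pattern from the question: $ is read as "end of input".
        System.out.println(B.matches(PREFIX + "(.)*"));
        // Quoted prefix: . and $ are matched literally.
        System.out.println(B.matches(prefixRegex(PREFIX)));
    }
}
```

Quoting the whole prefix also keeps the periods literal, so a string such as /feedback/comXschool... is not accepted by accident.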
You need to escape $ in the regex with a back-slash (\), but as a back-slash is an escape character in strings you need to escape the back-slash itself.
You will need to escape any special regex char the same way, for example with ".".
String pattern = "/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
In Java regex both . and $ are special. You need to escape each of them with 2 backslashes, i.e.:
"/feedback/com\\.navtag\\.etc\\.Review\\$0(.*)"
(1 backslash is for the Java string, and 1 is for the regex engine.)
Escape the dollar with \ (written as \\ inside the Java string literal):
String pattern =
"/feedback/com.navteq.lcms.common.domain.poi.feedback.Review\\$0(.)*";
I advise you to escape . as well, since an unescaped . matches any character.
String pattern =
"/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
The answer by #Colin Hebert, edited by #theon, is correct. The explanation is as follows. #azec-pdx
It is a regex written as a string literal (within double quotes).
The period (.) and dollar sign ($) are special regex characters (metacharacters).
To make the regex engine interpret them as an ordinary period (.) and dollar sign ($), you need to prefix a single backslash to each. The single backslash (itself a special regex character) quotes the character following it and thus escapes it.
Since the given regex is a string literal, another backslash must be prefixed to each, because the Java compiler interprets backslashes in string literals itself (character, string and Unicode escapes) and would otherwise report a compile error.
Even if you use, within a string literal, a special regex construct that is also defined as a Java escape sequence, it needs to be prefixed with another backslash. For example, the regex construct \b (word boundary) would clash with the Java character escape \b (backspace). Thus another backslash is prefixed to avoid the clash, and \\b is then read by the regex engine as a word boundary.
To always be safe, all single-backslash escapes (quotes) within string literals are prefixed with another backslash. For example, the string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello), the string literal "\\(hello\\)" must be used.
The final (.)* is meant to be interpreted as a special regex construct, so its period needs no quoting by a backslash, let alone a second one.
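The \b clash described above can be seen in a short sketch (class and helper names are made up for illustration). "\b" compiles without error, but it puts a backspace character into the String rather than a word-boundary construct.

```java
import java.util.regex.Pattern;

class WordBoundary {
    // True if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // "\b" is the Java escape for backspace (U+0008), a single character.
        System.out.println("\b".length() + " char, code " + (int) "\b".charAt(0));
        // "\\bcat\\b" reaches the regex engine as \bcat\b: cat between word boundaries.
        System.out.println(found("\\bcat\\b", "a cat sat"));
        // "\bcat\b" asks for literal backspace characters around cat.
        System.out.println(found("\bcat\b", "a cat sat"));
    }
}
```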
