string validation over regular expressions in java - java

How can I validate a given string against a regular expression (an XSD pattern)?
xsd pattern:'([a-zA-Z0-9.,;:'+-/()?*[]{}\`´~
]|[!"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*'
I need to check whether a string matches the above pattern.
I tried the code below, but it fails to compile with an unsupported escape character error:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMatching {
    private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9\.,;:'\+\-/\(\)?\*\[\]\{\}\\`´~ ]|[!"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");

    public static boolean validateUserName(String userName){
        Matcher mtch = usrNamePtrn.matcher(userName);
        if(mtch.matches()){
            return true;
        }
        return false;
    }

    public static void main(String a[]){
        System.out.println("Is a valid username?"+validateUserName("stephen & john"));
    }
}
How can I do this? In addition, if the string doesn't match the pattern, the offending characters need to be displayed. I am using Java 1.6. Any suggestions are appreciated.

First, the regular expression itself has three mistakes.
Mistake 1:
A backslash is a special character which is used to escape whatever character follows it. Therefore, the sequence
\`
is either identical to a single back-quote, or, depending on the regular expression engine, is an illegal escape sequence. Either way, if the intent was to match a backslash along with all the other characters, it should be written as:
\\`
Mistake 2:
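Inside the […] character grouping, a ] must be escaped so it doesn’t signify the end of the grouping. In Java, an unescaped [ inside a grouping also opens a nested grouping, so it should be escaped too. So, [] needs to be written as \[\].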
Mistake 3:
Inside the […] character grouping, a - indicates a character range, like a-z. The regular expression [+-/] does not mean “plus or hyphen or slash”; it means “any of the characters between plus and slash, inclusive.” Technically, this mistake doesn’t affect the outcome in this particular case, because +-/ is equivalent to those three literal characters plus the comma and period, which both happen to occur earlier in the character grouping anyway. But, in the interest of saying what you mean, the - should be escaped:
+\-/
Second is the matter of turning the regular expression into a Java string.
The backslash and the double-quote are special characters in Java. Obviously, " denotes the start and end of a String literal, so if you want a " inside a String, you must escape it:
\"
This is not related to regular expressions; this just tells the compiler that the String contains a double-quote character. It will be compiled into a single " and that is what the regular expression engine will see.
Finally, there is the matter of backslashes. It just so happens that, while regular expressions use a backslash to escape characters as described above, Java also uses backslashes to escape characters in strings. This means that if you want a literal backslash in a Java String, it must be written in the code as two backslashes:
String s = "\\"; // a String of length 1
Recall from above that the regular expression needs two consecutive backslash characters followed by a back-quote:
\\`
A Java string containing those three characters would look like this:
String s = "\\\\`"; // a String of length 3
A regular expression allows a backslash almost anywhere; for instance, \% is the same as %. However, Java only allows specific characters to be preceded by a single backslash. \+ is not one of those permitted sequences.
+, (, ), {, and } are not special characters inside a […] grouping, so there is no need to escape them anyway.
So, your code needs to be changed from this:
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9\.,;:'\+\-/\(\)?\*\[\]\{\}\\`´~ ]|[!"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
to this:
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9.,;:'+\\-/()?*\\[\\]{}\\\\`´~ ]|[!\"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
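The question also asks how to display the characters that do not match. One possible approach, sketched below, is to reuse the same three character classes to test each character on its own. The class name PatternCheck and the helper names isValid and invalidChars are made up for this illustration, and the code sticks to APIs available in Java 1.6.

```java
import java.util.regex.Pattern;

public class PatternCheck {
    // One allowed character, written as the same three alternated
    // character classes used in the corrected pattern.
    private static final String ALLOWED_CHAR =
            "[a-zA-Z0-9.,;:'+\\-/()?*\\[\\]{}\\\\`´~ ]"
            + "|[!\"#%&<>÷=#_$£]"
            + "|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ]";

    // Whole-string check: zero or more allowed characters.
    private static final Pattern WHOLE = Pattern.compile("(" + ALLOWED_CHAR + ")*");

    // Single-character check, used to find the offending characters.
    private static final Pattern SINGLE = Pattern.compile(ALLOWED_CHAR);

    // Hypothetical helper: true if every character of input is allowed.
    public static boolean isValid(String input) {
        return WHOLE.matcher(input).matches();
    }

    // Hypothetical helper: returns the characters of input that are not
    // allowed, in order of appearance (duplicates are kept).
    public static String invalidChars(String input) {
        StringBuilder rejected = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            String c = String.valueOf(input.charAt(i));
            if (!SINGLE.matcher(c).matches()) {
                rejected.append(c);
            }
        }
        return rejected.toString();
    }

    public static void main(String[] args) {
        System.out.println(isValid("stephen & john"));  // true
        System.out.println(isValid("a^b|c"));           // false
        System.out.println(invalidChars("a^b|c"));      // ^|
    }
}
```

The source file has to be saved and compiled with an encoding that preserves the accented characters (UTF-8, for instance).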

The compile error occurs because " and \ are special characters in Java string literals.
You have to write " as the escape sequence \" and \ as \\, as follows:
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9.,;:'+-/()?*[]{}\\`´~ ]|[!\"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
Note the change in the pattern above, where " and \ have been replaced by \" and \\.
Also note that this only fixes the compile errors. You still need to re-check your regex to see whether it matches what you intend.

Related

Regular Expressions with double backslash in java

I want to understand the concept of regular expression in below code:
private static final String SQL_INSERT = "INSERT INTO ${table}(${keys}) VALUES(${values})";
private static final String TABLE_REGEX = "\\$\\{table\\}";
.
.
.
String query = SQL_INSERT.replaceFirst(TABLE_REGEX, "tableName");
The above code works fine, but I would like to understand how. As far as I know, the $ and { symbols should be escaped in a Java string using a backslash, but the SQL_INSERT string has no backslashes, and if I try to add one, I get the error: invalid escape sequence.
Also, why does TABLE_REGEX = "\\$\\{table\\}"; contain double backslashes?
The $ and { don't need to be escaped in Java string literals in general, but in regular expressions they need to be escaped because they have special meaning there. The $ matches the end of a line, and { is used to match a character a certain number of times. To match any of the regular expression special characters themselves, these characters need to be escaped. For example, A{5} matches AAAAA, but A\{5 matches A{5.
To escape something in a regular expression you use \. But a backslash in a string literal itself needs escaping, which is done with another \. That is, the String literal "\\{" actually corresponds to the string \{.
This is why in regular expression string literals you will often encounter multiple backslashes. You might also want to take a look at Pattern.quote(String s) which takes a string and properly escapes all special characters (wrt. Java regular expressions).
Essentially instead of
private static final String TABLE_REGEX = "\\$\\{table\\}";
you could write
private static final String TABLE_REGEX = Pattern.quote("${table}");
In your example SQL_INSERT.replaceFirst(TABLE_REGEX, "tableName"); matches the first occurrence of ${table} in SQL_INSERT and replaces this occurrence with tableName:
String sql = "INSERT INTO ${table}(${keys}) VALUES(${values})".replaceFirst("\\$\\{table\\}", "tableName");
boolean test = sql.equals("INSERT INTO tableName(${keys}) VALUES(${values})");
System.out.println(test); // will print 'true'
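As a sketch of the Pattern.quote approach, the placeholder lookup can be wrapped in a small helper. The class name TemplateFill and the method fill are invented for this example. Matcher.quoteReplacement is used as well, on the assumption that the replacement value might itself contain $ or \, which replaceFirst would otherwise interpret.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateFill {
    // Hypothetical helper: replaces the first ${name} placeholder in
    // template with value, treating both as literal text.
    public static String fill(String template, String name, String value) {
        // Pattern.quote makes $, { and } match literally.
        String regex = Pattern.quote("${" + name + "}");
        // quoteReplacement makes $ and \ in the value literal, too.
        String replacement = Matcher.quoteReplacement(value);
        return template.replaceFirst(regex, replacement);
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO ${table}(${keys}) VALUES(${values})";
        // prints: INSERT INTO tableName(${keys}) VALUES(${values})
        System.out.println(fill(sql, "table", "tableName"));
    }
}
```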
$ or "$" is the dollar sign, or a string containing it.
\$ is an escaped dollar sign, normally found in a raw regex when you want to match the character $ instead of the end of the line.
"\\$" is a String containing an escaped \ followed by a normal $. Since you are not writing a raw regex, but the regex is inside a Java String, you need to escape the \ so that when the regex interpreter comes along it just sees a normal \, which it then treats as escaping the following $.
"\$" is not valid because, from a normal String point of view, a $ is nothing special and must not be escaped.
i would like to understand how.
It is replacing the first match of the regex "\\$\\{table\\}" in the original string "INSERT INTO ${table}(${keys}) VALUES(${values})" with "tableName".
$ and { symbols should be escaped in java string
using backslash but in above string there is no backslash and if I try
to add it, it shows error: invalid escape sequence.
No, $, { and } do not need to be escaped in a Java string; they have no special meaning there.
Also why the TABLE_REGEX = "\\$\\{table\\}"; contains double
backslash?
In a Java string literal a single backslash starts an escape sequence (e.g. \n, \t), so a literal backslash is written as a double backslash. The resulting single backslash escapes the $, { and } symbols, which have a special meaning in a regex; escaping them tells the Java regex engine to treat them as literal characters.

Why split method does not support $,* etc delimiter to split string

import java.util.StringTokenizer;

class MySplit
{
    public static void main(String S[])
    {
        String settings = "12312$12121";
        StringTokenizer splitedArray = new StringTokenizer(settings,"$");
        String splitedArray1[] = settings.split("$");
        System.out.println(splitedArray1[0]);
        while(splitedArray.hasMoreElements())
            System.out.println(splitedArray.nextToken().toString());
    }
}
In the above example, splitting the string on $ does not work, but splitting on other symbols works fine.
Why is that? If split only supports regular expressions, why does it work for symbols such as :, , and ;?
$ has a special meaning in regex, and since String#split takes a regex as an argument, the $ is not interpreted as the string "$", but as the special meta character $. One sexy solution is:
settings.split(Pattern.quote("$"))
Pattern#quote:
Returns a literal pattern String for the specified String.
The other solution is to escape the $ by prefixing it with \\:
settings.split("\\$")
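Important note: Check that the resulting array actually contains elements before indexing into it.
If the input contains no $ at all, split returns a one-element array holding the whole string. But when the input consists only of delimiters (for example "$"), the trailing empty strings are removed and the array is empty, so splitedArray1[0] throws ArrayIndexOutOfBoundsException. I would add: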
if (splitedArray1.length == 0) {
// return or do whatever you want
// except accessing the array
}
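The different outcomes can be compared side by side in a short sketch. The class name SplitDemo and the helper splitOnDollar are hypothetical; the calls themselves are the standard String.split and Pattern.quote.

```java
import java.util.regex.Pattern;

public class SplitDemo {
    // Hypothetical helper: splits on a literal dollar sign.
    public static String[] splitOnDollar(String s) {
        return s.split(Pattern.quote("$"));
    }

    public static void main(String[] args) {
        String settings = "12312$12121";

        // Unescaped: "$" is the end-of-input anchor, so nothing is split off.
        System.out.println(settings.split("$").length);        // 1

        // Quoted: the dollar sign is matched literally.
        System.out.println(splitOnDollar(settings).length);    // 2

        // No delimiter present: one element, the whole string.
        System.out.println(splitOnDollar("12312").length);     // 1

        // Only delimiters: trailing empty strings are dropped, array is empty.
        System.out.println(splitOnDollar("$").length);         // 0
    }
}
```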
If you take a look at the Java docs, you can see that the split method takes a regex as its parameter, so the argument is interpreted as a regular expression, not as a plain character.
In regex, $ has a specific meaning, so you have to escape it this way:
settings.split("\\$");
The problem is that the split(String str) method expects str to be a valid regular expression. The characters you have mentioned are special characters in regular expression syntax and thus perform a special operation.
To make the regular expression engine take them literally, you would need to escape them like so:
.split("\\$")
Thus given this:
String str = "This is 1st string.$This is the second string";
for(String string : str.split("\\$"))
System.out.println(string);
You end up with this:
This is 1st string.
This is the second string
Dollar symbol $ is a special character in Java regex. You have to escape it so as to get it working like this:
settings.split("\\$");
From the String.split docs:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
On a side note:
Have a look at the Pattern class which will give you an idea as to which all characters you need to escape.
Because $ is a special character in regular expressions: it matches the end of a line or of the input.
You should escape it using the sequence \$, which in a Java string literal is written as \\$.
Hope that helps.
Cheers

A simple scenario in Java where replacing a single character with a back slash requires four back slashes

Let's consider the following code snippet in Java.
package escape;
final public class Main
{
public static void main(String[] args)
{
String s = "abc/xyz";
System.out.println(s.replaceAll("/", "\\\\"));
}
}
I just want to replace "/" with "\" in the above String abc/xyz. This works and displays abc\xyz as expected, but I don't understand why it requires four backslashes. It looks like two backslashes should be sufficient. Why is that not the case?
The reason is that String.replaceAll uses regular expressions (it actually calls Matcher.replaceAll, which documents this). In a regex replacement string you have to escape the \, and in string literals you also have to escape the \. Your four backslashes are two backslashes in the Java string, and thereby one escaped backslash in the replacement string.
You need to escape the backslash once (\\) for the Java String and once more (\\\\) for the regex replacement string.
From the JavaDoc:
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired
System.out.println(s.replaceAll("/", Matcher.quoteReplacement("\\")));
I just want to replace "/" with "\"
Then you should not be using a regular expression, which is overkill, and requires backslashes to be escaped (twice). Instead do
string.replace('/', '\\');
(Still need to escape it once)
Referring to the Java documentation: if you use replaceAll, Java treats the first parameter as a regex and gives backslashes and dollar signs a special meaning in the replacement string. For example, $1 refers to the first capturing group of the regex, and a backslash escapes the character that follows it. In this case you need to escape the backslashes once so they are literal backslashes for the String, and then a second time so that replaceAll doesn't treat them as having a special meaning.
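The three variants suggested in the answers can be put next to each other in a minimal sketch. The class name SlashDemo and its method names are invented for illustration; the last method only shows what a group reference in the replacement string looks like.

```java
import java.util.regex.Matcher;

public class SlashDemo {
    // Regex replacement: four backslashes in source = one in the result.
    public static String viaReplaceAll(String s) {
        return s.replaceAll("/", "\\\\");
    }

    // Same, but quoteReplacement does the second level of escaping.
    public static String viaQuoteReplacement(String s) {
        return s.replaceAll("/", Matcher.quoteReplacement("\\"));
    }

    // No regex at all: the char overload only needs Java-level escaping.
    public static String viaReplaceChar(String s) {
        return s.replace('/', '\\');
    }

    // Group references in the replacement string use $1, $2, ...
    public static String swapAroundSlash(String s) {
        return s.replaceAll("(\\w+)/(\\w+)", "$2/$1");
    }

    public static void main(String[] args) {
        String s = "abc/xyz";
        System.out.println(viaReplaceAll(s));        // abc\xyz
        System.out.println(viaQuoteReplacement(s));  // abc\xyz
        System.out.println(viaReplaceChar(s));       // abc\xyz
        System.out.println(swapAroundSlash(s));      // xyz/abc
    }
}
```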

Java regular expressions and dollar sign

I have Java string:
String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
I also have generated pattern against which I want to match this string:
String pattern = "/feedback/com.school.edu.domain.feedback.Review$0(.)*";
When I say b.matches(pattern) it returns false. Now I know dollar sign is part of Java RegEx, but I don't know how should my pattern look like. I am assuming that $ in pattern needs to be replaced by some escape characters, but don't know how many. This $ sign is important to me as it helps me distinguish elements in list (numbers after dollar), and I can't go without it.
Use
String escapedString = java.util.regex.Pattern.quote(myString)
to automatically escape all special regex characters in a given string.
You need to escape $ in the regex with a back-slash (\), but as a back-slash is an escape character in strings you need to escape the back-slash itself.
You will need to escape any special regex char the same way, for example with ".".
String pattern = "/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
In Java regex both . and $ are special. You need to escape each with 2 backslashes, i.e.:
"/feedback/com\\.navtag\\.etc\\.Review\\$0(.*)"
(1 backslash is for the Java string, and 1 is for the regex engine.)
Escape the dollar with \\:
String pattern =
"/feedback/com.navteq.lcms.common.domain.poi.feedback.Review\\$0(.)*";
I advise you to escape . as well, . represent any character.
String pattern =
"/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
The answer by @Colin Hebert, edited by @theon, is correct. The explanation is as follows, @azec-pdx.
It is a regex as a string literal (within double quotes).
period (.) and dollar-sign ($) are special regex characters (metacharacters).
To make the regex engine interpret the period (.) and dollar sign ($) as ordinary characters, you need to prefix each with a single backslash. The backslash (itself a special regex character) quotes the character following it and thus escapes it.
Since the given regex is a string literal, a second backslash must be prefixed to each, so that the compiler does not read the sequence as one of Java's own escape sequences (character, string and Unicode escapes in string literals) and report an error.
Even a regex construct that happens to look like a valid Java escape sequence needs the extra backslash. For example, the regex construct \b (word boundary) would clash with Java's \b (backspace) character escape: "\b" compiles, but hands a backspace character to the regex engine. Writing \\b makes the regex engine see \b and read it as a word boundary.
To be always safe, all single backslash escapes (quotes) within string literals are prefixed with another backslash. For example, the string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
The last period (.)* is supposed to be interpreted as special regex character and thus it needs no quoting by a backslash, let alone prefixing a second one.
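When the literal part of the pattern is generated, as in the question, one option is to quote only that part and append the wildcard afterwards. The sketch below assumes this split is possible; the class name DollarMatch and the helper startsWithLiteral are hypothetical.

```java
import java.util.regex.Pattern;

public class DollarMatch {
    // Hypothetical helper: true if input begins with literalPrefix,
    // followed by anything that contains no line terminator.
    public static boolean startsWithLiteral(String input, String literalPrefix) {
        // Quote the generated part so . and $ lose their special meaning,
        // then append the real regex wildcard unquoted.
        String regex = Pattern.quote(literalPrefix) + ".*";
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
        String prefix = "/feedback/com.school.edu.domain.feedback.Review$0";

        // Unescaped: $ is read as an end-of-input anchor, so this is false.
        System.out.println(b.matches(prefix + "(.)*"));

        // Quoted prefix: true.
        System.out.println(startsWithLiteral(b, prefix));
    }
}
```

If the check really is just a prefix test, String.startsWith avoids regular expressions altogether.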

How to replace a special character with single slash

I have a question about strings in Java. Let's say, I have a string like so:
String str = "The . startup trace ?state is info?";
As the string contains the special character like "?" I need the string to be replaced with "\?" as per my requirement. How do I replace special characters with "\"? I tried the following way.
str.replace("?","\?");
But it gives a compilation error. Then I tried the following:
str.replace("?","\\?");
When I do this, it replaces the special characters with "\\". But when I print the string, it prints with a single slash. I thought it was storing a single slash only, but when I debugged I found that the variable holds "\\".
Can anyone suggest how to replace the special characters with a single slash ("\")?
On escape sequences
A declaration like:
String s = "\\";
defines a string containing a single backslash. That is, s.length() == 1.
This is because \ is a Java escape character for String and char literals. Here are some other examples:
"\n" is a String of length 1 containing the newline character
"\t" is a String of length 1 containing the tab character
"\"" is a String of length 1 containing the double quote character
"\/" contains an invalid escape sequence, and therefore is not a valid String literal; it causes a compilation error
Naturally you can combine escape sequences with normal unescaped characters in a String literal:
System.out.println("\"Hey\\\nHow\tare you?");
The above prints (tab spacing may vary):
"Hey\
How are you?
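To see which characters a literal really contains, it can help to print them in a visible form. The following is only a sketch: the class name EscapeView, the method visible and the bracketed marker names are all invented for this example.

```java
public class EscapeView {
    // Hypothetical helper: rewrites a string so that characters produced
    // by escape sequences show up as readable markers.
    public static String visible(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\n': sb.append("<LF>"); break;
                case '\t': sb.append("<TAB>"); break;
                case '\\': sb.append("<BACKSLASH>"); break;
                case '"':  sb.append("<QUOTE>"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("\\".length());   // 1
        System.out.println("\n".length());   // 1
        // prints: <QUOTE>Hey<BACKSLASH><LF>How<TAB>are you?
        System.out.println(visible("\"Hey\\\nHow\tare you?"));
    }
}
```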
References
JLS 3.10.6 Escape Sequences for Character and String Literals
See also
Is the char literal '\"' the same as '"' ?(backslash-doublequote vs only-doublequote)
Back to the problem
Your problem definition is very vague, but the following snippet works as it should:
System.out.println("How are you? Really??? Awesome!".replace("?", "\\?"));
The above snippet replaces ? with \?, and thus prints:
How are you\? Really\?\?\? Awesome!
If instead you want to replace a char with another char, then there's also an overload for that:
System.out.println("How are you? Really??? Awesome!".replace('?', '\\'));
The above snippet replaces ? with \, and thus prints:
How are you\ Really\\\ Awesome!
String API links
replace(CharSequence target, CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
replace(char oldChar, char newChar)
Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.
On how regex complicates things
If you're using replaceAll or any other regex-based method, then things become somewhat more complicated. They can be greatly simplified if you understand some basic rules.
Regex patterns in Java are given as String values
Metacharacters (such as ? and .) have special meanings, and may need to be escaped by preceding with a backslash to be matched literally
The backslash is also a special character in replacement String values
The above factors can lead to the need for numerous backslashes in patterns and replacement strings in a Java source code.
It doesn't look like you need regex for this problem, but here's a simple example to show what it can do:
System.out.println(
"Who you gonna call? GHOSTBUSTERS!!!"
.replaceAll("[?!]+", "<$0>")
);
The above prints:
Who you gonna call<?> GHOSTBUSTERS<!!!>
The pattern [?!]+ matches one-or-more (+) of any characters in the character class [...] definition (which contains a ? and ! in this case). The replacement string <$0> essentially puts the entire match $0 within angled brackets.
Related questions
Having trouble with Splitting text. - discusses common mistakes like split(".") and split("|")
Regular expressions references
regular-expressions.info
Character class and Repetition with Star and Plus
java.util.regex.Pattern and Matcher
In case you want to replace ? with \?, there are 2 possibilities: replace and replaceAll (for regular expressions):
str.replace("?", "\\?")
str.replaceAll("\\?","\\\\?");
The result is "The . startup trace \?state is info\?"
If you want to replace ? with \, just remove the ? character from the second argument.
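Both possibilities, plus a variant that lets the library do the escaping, can be checked against each other in a small sketch. The class name QuestionMarkEscape and its method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionMarkEscape {
    // Literal replacement: only Java-level escaping is needed.
    public static String viaReplace(String s) {
        return s.replace("?", "\\?");
    }

    // Regex replacement: ? is escaped in the pattern, \ in the replacement.
    public static String viaReplaceAll(String s) {
        return s.replaceAll("\\?", "\\\\?");
    }

    // Regex replacement with quote/quoteReplacement doing the escaping.
    public static String viaQuoted(String s) {
        return s.replaceAll(Pattern.quote("?"), Matcher.quoteReplacement("\\?"));
    }

    public static void main(String[] args) {
        String str = "The . startup trace ?state is info?";
        // Each line prints: The . startup trace \?state is info\?
        System.out.println(viaReplace(str));
        System.out.println(viaReplaceAll(str));
        System.out.println(viaQuoted(str));
    }
}
```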
But when I print the string, it prints
with single slash.
Good. That's exactly what you want, isn't it?
There are two simple rules:
A backslash inside a String literal has to be specified as two to satisfy the compiler, i.e. "\\". Otherwise it is taken as a special-character escape.
A backslash in a regular expression has to be specified as two to satisfy regex, otherwise it is taken as a regex escape. Because of (1) this means you have to write 2x2=4 of them: "\\\\" (and because of the forum software I actually had to write 8!).
String str="\\";
str=str.replace(str,"\\\\");
System.out.println("New String="+str);
Output: New String=\\
In Java source, "\\" denotes a single \. So the above code replaces a single backslash with two backslashes.
