Why does String.matches("[+-*/]") throw a PatternSyntaxException? - java

I have this code:
public static void main(String[] args) {
String et1 = "test";
String et2 = "test";
et1.matches("[-+*/]"); //works fine
et2.matches("[+-*/]"); //java.util.regex.PatternSyntaxException, why?
}
Is it because '-' is an escape character? And why does it work fine if '-' is swapped with '+'?

It is because - is used to define a range of characters in a character class. Since + comes after * in the ASCII table, the range +-* is reversed and makes no sense, so you get an error.
To have a literal - in the middle of a character class, you must escape it. There is no problem if the - is at the beginning or at the end of the class, because there it is unambiguous.
Another situation where you don't need to escape the - is when a character class shortcut comes right before it, for example:
[\\d-abc]
(Other regex engines like PCRE allow the same when the character class shortcut is placed after it, as in [abc-\d], but Java doesn't seem to allow this.)
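A small sketch to check these rules for yourself; the class and method names are made up for illustration. It only asks Pattern.compile whether each pattern is accepted, plus one match showing that the - after \d is literal:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RangeDemo {
    // Returns true if Pattern.compile accepts the regex, false if it throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("[-+*/]"));   // true: '-' first is a literal dash
        System.out.println(compiles("[+-*/]"));   // false: '+-*' is a reversed range
        System.out.println(compiles("[+\\-*/]")); // true: escaped dash is literal
        // After a shorthand class such as \d, '-' cannot start a range
        System.out.println("-".matches("[\\d-abc]")); // true
    }
}
```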

A - inside a character class (the [xxx]) is used to define a range, for example [a-z] for all lowercase characters. If you actually mean a literal dash, it has to be in the first or last position. I generally place it first to avoid any confusion.
Alternatively, you can escape it: [+\\-*/].

Just FYI, the Java regular expression metacharacters are listed as follows:
The metacharacters supported by this API are: <([{\^-=$!|]})?*+.>
As a general rule, to save myself from regex debugging headaches, if I want to use any of these characters as a literal, I precede it with a \ (or \\ inside a Java string literal).
Either:
et2.matches("[\\+\\-\\*/]");
Or:
et2.matches("[\\-\\+\\*/]");
Either one will work regardless of order.
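To confirm that the fully escaped classes accept exactly the same characters whatever their order, here is a sketch that feeds single characters through each class (the EscapeOrderDemo name and the accepted helper are hypothetical):

```java
import java.util.regex.Pattern;

class EscapeOrderDemo {
    // Returns the characters of candidates that the one-character class accepts, in order.
    static String accepted(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String candidates = "+-*/,.";
        System.out.println(accepted("[\\+\\-\\*/]", candidates)); // +-*/
        System.out.println(accepted("[\\-\\+\\*/]", candidates)); // +-*/
    }
}
```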

I think you should use [\-\+\*/] (written as "[\\-\\+\\*/]" in a Java string),
because - defines a range: for example, [a-d] means a, b, c, or d.

Related

string validation over regular expressions in java

How do I validate a given string against the regular expression (XSD pattern) below?
xsd pattern:'([a-zA-Z0-9.,;:'+-/()?*[]{}\`´~ ]|[!"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*'
I need to check whether a string matches the above pattern.
I tried the code below, but I get an illegal escape character error when compiling:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PatternMatching {
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9\.,;:'\+\-/\(\)?\*\[\]\{\}\\`´~ ]|[!"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
public static boolean validateUserName(String userName){
Matcher mtch = usrNamePtrn.matcher(userName);
if(mtch.matches()){
return true;
}
return false;
}
public static void main(String a[]){
System.out.println("Is a valid username?"+validateUserName("stephen & john"));
}
}
How can I do the above task? In addition, if the string doesn't match the pattern, the offending characters need to be displayed. I am using Java 1.6; any suggestions are appreciated.
First, the regular expression itself has three mistakes.
Mistake 1:
A backslash is a special character which is used to escape whatever character follows it. Therefore, the sequence
\`
is either identical to a single back-quote, or, depending on the regular expression engine, is an illegal escape sequence. Either way, if the intent was to match a backslash along with all the other characters, it should be written as:
\\`
Mistake 2:
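Inside the […] character grouping, a ] must be escaped so it doesn’t signify the end of the grouping. So, [] needs to be written as [\]. In Java, an unescaped [ inside a character class starts a nested class, so it should be escaped too: \[\].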
Mistake 3:
Inside the […] character grouping, a - indicates a character range, like a-z. The regular expression [+-/] does not mean “plus or hyphen or slash”; it means “any of the characters between plus and slash, inclusive.” Technically, this mistake doesn’t affect the outcome in this particular case, because +-/ is equivalent to those three literal characters plus the comma and period, which both happen to occur earlier in the character grouping anyway. But, in the interest of saying what you mean, the - should be escaped:
+\-/
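To see what [+-/] actually contains, this sketch lists every printable ASCII character each class accepts (RangeMembers and members are illustrative names, not part of the original code):

```java
import java.util.regex.Pattern;

class RangeMembers {
    // Lists every printable ASCII character (space through '~') that the class matches.
    static String members(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = ' '; c <= '~'; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(members("[+-/]"));   // +,-./  (a range from '+' to '/')
        System.out.println(members("[+\\-/]")); // +-/    (three literal characters)
    }
}
```

The comma and period in the first result are the extra characters that the unescaped range quietly lets in.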
Second is the matter of turning the regular expression into a Java string.
The backslash and the double-quote are special characters in Java. Obviously, " denotes the start and end of a String literal, so if you want a " inside a String, you must escape it:
\"
This is not related to regular expressions; this just tells the compiler that the String contains a double-quote character. It will be compiled into a single " and that is what the regular expression engine will see.
Finally, there is the matter of backslashes. It just so happens that, while regular expressions use a backslash to escape characters as described above, Java also uses backslashes to escape characters in strings. This means that if you want a literal backslash in a Java String, it must be written in the code as two backslashes:
String s = "\\"; // a String of length 1
Recall from above that we need a regular expression with consecutive backslash characters:
\\`
A Java string containing those three characters would look like this:
String s = "\\\\`"; // a String of length 3
A regular expression allows a backslash almost anywhere; for instance, \% is the same as %. However, a Java string literal only allows specific characters to be preceded by a single backslash, and \+ is not one of those permitted sequences.
+, (, ), {, and } are not special characters inside a […] grouping, so there is no need to escape them anyway.
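A quick sketch of the two layers of escaping described above: the Java literal "\\\\`" becomes the 3-character regex \\`, which matches a backslash followed by a back-quote. BackslashDemo is a hypothetical class name:

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // Java source "\\\\`" is the 3-character regex \\` : an escaped backslash, then a back-quote.
    static final String REGEX = "\\\\`";
    static final Pattern BACKSLASH_THEN_BACKQUOTE = Pattern.compile(REGEX);

    static boolean matches(String text) {
        return BACKSLASH_THEN_BACKQUOTE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println("\\".length()); // 1: a single backslash
        System.out.println(REGEX.length()); // 3
        System.out.println(matches("\\`")); // true: backslash followed by back-quote
        System.out.println(matches("`"));   // false: the backslash is required
        // +, (, ), { and } are literal inside [...], so no escaping is needed
        System.out.println(Pattern.matches("[+(){}]+", "+(){}")); // true
    }
}
```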
So, your code needs to be changed from this:
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9\.,;:'\+\-/\(\)?\*\[\]\{\}\\`´~ ]|[!"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
to this:
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9.,;:'+\\-/()?*\\[\\]{}\\\\`´~ ]|[!\"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
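The question also asks how to display the characters that don't match. One possible approach, sketched below, is to compile the single-character alternation on its own and test each character separately. Assumptions: the class and the invalidChars helper are hypothetical, and to keep the source file ASCII-only this sketch drops ´, ÷, £ and the accented-letter class from the corrected pattern, so add those back for real use. It sticks to Java 1.6 features:

```java
import java.util.regex.Pattern;

class UserNameCheck {
    // ASCII-only subset of the corrected pattern (´, ÷, £ and the accented letters omitted).
    static final String ALLOWED_CHAR =
            "[a-zA-Z0-9.,;:'+\\-/()?*\\[\\]{}\\\\`~ ]|[!\"#%&<>=_$]";
    static final Pattern USER_NAME = Pattern.compile("(" + ALLOWED_CHAR + ")*");
    static final Pattern ONE_CHAR = Pattern.compile(ALLOWED_CHAR);

    static boolean validateUserName(String userName) {
        return USER_NAME.matcher(userName).matches();
    }

    // Hypothetical helper: returns every character the one-character pattern rejects.
    static String invalidChars(String userName) {
        StringBuilder bad = new StringBuilder();
        for (int i = 0; i < userName.length(); i++) {
            String c = userName.substring(i, i + 1);
            if (!ONE_CHAR.matcher(c).matches()) {
                bad.append(c);
            }
        }
        return bad.toString();
    }

    public static void main(String[] args) {
        System.out.println("Is a valid username? " + validateUserName("stephen & john"));
        System.out.println("Invalid characters: " + invalidChars("stephen @ john^"));
    }
}
```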
This is because " is a special character in Java.
You'll have to substitute " with an escape character i.e. \" and \ with \\ as follows:
private static Pattern usrNamePtrn = Pattern.compile("([a-zA-Z0-9.,;:'+-/()?*[]{}\\`´~ ]|[!\"#%&<>÷=#_$£]|[àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ])*");
Note the change in the pattern above, where " and \ have been replaced by \" and \\.
Also, note that this will only fix the compile issues. You need to re-check your regex to see whether it works correctly.

Validating a mathematical expression in java

I am trying to validate whether the string expression in the code below is a formula.
String expression = request.getParameter(FORMULA);
if(!Pattern.matches("[a-zA-Z0-9+-*/()]", expression)){return new AjaxMessage(AjaxMessage.ResponseStatusEnum.FAILURE, getJsonString(, "Manager.invalid.formula" , null));
}
Examples of values for expression are a+b/2, (a+b)*2, (john-Max), etc. For context, the variable names in the formula may vary, and the arithmetic expression contains only the special characters [+-/()*]. As you can see, I tried to validate it using a regex (I'm new to regex), but I don't think it's possible, as I don't know the length of the variable names.
Is there a way to achieve a validation using regex or any other library in java?
Thanks in advance.
The reason is that you are using characters with special meaning in regex. You need to escape those characters. I have modified your regex to make it work.
Code:
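import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FormulaCheck {
public static void main(String[] args) {
List<String> expressions = new ArrayList<String>();
expressions.add("a+b/2");
expressions.add("(a+b)*2");
expressions.add("john-Max");
expressions.add("etc[");
for (String expression : expressions) {
// The trailing * lets the character class match the whole string, not just one character
if (!Pattern.matches("[a-zA-Z0-9\\+\\-\\*/\\(\\)]*", expression)) {
System.out.println("NOT match");
} else {
System.out.println("MATCH");
}
}
}
}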
OUTPUT:
MATCH
MATCH
MATCH
NOT match
You're using special characters in your regex; you need to escape them using \.
It should look like [a-zA-Z0-9+\\-*/()]. This only tests one character; you need to add a * at the end to test multiple characters.
Edit (thanks Toto): because [] tests a single character, it's called a character class (nothing to do with a Java class), so only the - is considered special here. For a regex without the square brackets, you would need to escape the other special characters.
Special characters have special meanings in regex and won't be interpreted as the literal characters they are (for example, parentheses are used to make groups, * means zero or more of the previous character, etc.).
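The difference between the bare class and the class followed by * can be checked directly; a sketch with hypothetical names:

```java
import java.util.regex.Pattern;

class CharClassQuantifier {
    // Inside [...] only '-' needs escaping here; (, ), * and + are literal.
    static final String ONE_CHAR = "[a-zA-Z0-9+\\-*/()]";

    static boolean oneChar(String s) {
        return Pattern.matches(ONE_CHAR, s);
    }

    static boolean wholeString(String s) {
        return Pattern.matches(ONE_CHAR + "*", s);
    }

    public static void main(String[] args) {
        System.out.println(oneChar("a+b"));      // false: the class alone matches one character
        System.out.println(wholeString("a+b"));  // true: '*' repeats the class
        System.out.println(wholeString("etc[")); // false: '[' is not in the class
    }
}
```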
About character class: https://docs.oracle.com/javase/tutorial/essential/regex/char_classes.html
More info:
http://www.regular-expressions.info/characters.html and
http://www.regular-expressions.info/refcharacters.html
I use this site to test my regexes (note that regex engines vary!):
https://regex101.com/
As mentioned in the comments, a mathematical expression is more than just a set of allowed characters, so if you want to validate it, you'll have to do more manual checking.
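As one example of that extra checking, here is a small hand-written recursive-descent validator. This is a swapped-in technique rather than a regex: a regex only tokenizes, and the parser checks that operands and operators alternate and that parentheses balance. FormulaValidator and isValid are hypothetical names, and the grammar assumes names/numbers made of [a-zA-Z0-9]+ with no unary minus or decimals:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaValidator {
    // One token after optional spaces: a name/number, an operator, or a parenthesis.
    private static final Pattern TOKEN = Pattern.compile("\\s*([a-zA-Z0-9]+|[-+*/()])");

    private final String input;
    private int pos = 0;
    private String current; // current token, or null once the input is exhausted

    private FormulaValidator(String input) {
        this.input = input;
        advance();
    }

    // True if s is operands joined by + - * / with balanced parentheses.
    static boolean isValid(String s) {
        try {
            FormulaValidator v = new FormulaValidator(s);
            v.expression();
            return v.current == null; // nothing may be left over
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Reads the next token into current, or sets it to null at the end of the input.
    private void advance() {
        Matcher m = TOKEN.matcher(input);
        m.region(pos, input.length());
        if (m.lookingAt()) {
            current = m.group(1);
            pos = m.end();
        } else if (input.substring(pos).trim().length() == 0) {
            current = null;
            pos = input.length();
        } else {
            throw new IllegalArgumentException("Unexpected character at index " + pos);
        }
    }

    // expression := operand (operator operand)*
    private void expression() {
        operand();
        while (current != null && current.matches("[-+*/]")) {
            advance();
            operand();
        }
    }

    // operand := name | number | '(' expression ')'
    private void operand() {
        if (current == null) {
            throw new IllegalArgumentException("Missing operand");
        }
        if (current.equals("(")) {
            advance();
            expression();
            if (!")".equals(current)) {
                throw new IllegalArgumentException("Missing closing parenthesis");
            }
            advance();
        } else if (current.matches("[a-zA-Z0-9]+")) {
            advance();
        } else {
            throw new IllegalArgumentException("Unexpected token " + current);
        }
    }

    public static void main(String[] args) {
        String[] samples = {"a+b/2", "(a+b)*2", "(john-Max)", "etc[", "a+", "(a+b"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Variable names of any length are fine here because the tokenizer, not the grammar, decides where a name ends.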

Using Scanner.useDelimiter() in Java to isolate tokens in an expression

I am trying to isolate the words, brackets and => and <=> from the following input:
(<=>A B) OR (C AND D) AND(A AND C)
So far I've managed to isolate just the words (see Scanner#useDelimiter()):
sc.useDelimiter("[^a-zA-Z]");
Upon using:
sc.useDelimiter("[\\s+a-zA-Z]");
I get just the brackets as output,
which I don't want; I want AND and ) as well.
How do I do that? Using \\s+ gives the same result.
Also, how is a delimiter different from regex? I'm familiar with regex in PHP. Is the notation used the same?
Output I want:
(
<=>
A
(and so on)
You need a delimiting regex that can be zero width (because you have adjacent terms), so look-arounds are the only option. Try this:
sc.useDelimiter("((?<=[()>])\\s*)|(\\s*\\b\\s*)");
This regex says "after a bracket or greater-than or at a word boundary, discarding spaces"
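A runnable sketch that applies that delimiter to the input from the question and collects the tokens; ScannerTokens and tokenize are hypothetical names. Scanner copes with the zero-width delimiter matches by skipping an empty match at the current position and looking for the next one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokens {
    // Splits after '(', ')' or '>' and at word boundaries, swallowing surrounding spaces.
    static List<String> tokenize(String input) {
        Scanner sc = new Scanner(input);
        sc.useDelimiter("((?<=[()>])\\s*)|(\\s*\\b\\s*)");
        List<String> tokens = new ArrayList<String>();
        while (sc.hasNext()) {
            tokens.add(sc.next());
        }
        sc.close();
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("(<=>A B) OR (C AND D) AND(A AND C)")) {
            System.out.println(token);
        }
    }
}
```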
Also note that the character class [\\s+a-zA-Z] includes the + character: most characters lose their special regex meaning inside a character class. It seems you were trying to say "one or more spaces", but a class always matches exactly one character; outside a class, that would be \\s+.
Inside [], a leading ^ means 'not', so the first regex, [^a-zA-Z], says 'give me everything that's not a-z or A-Z'.
The second regex, [\\s+a-zA-Z], says 'give me everything that is space, +, a-z or A-Z'. Note that "+" is a literal plus sign when in a character class.
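The two character classes from the question, checked against single characters (hypothetical class and method names):

```java
import java.util.regex.Pattern;

class ClassSemantics {
    // A leading ^ negates the class: any one character that is not an ASCII letter.
    static boolean notALetter(String s) {
        return Pattern.matches("[^a-zA-Z]", s);
    }

    // One whitespace character, a literal '+', or an ASCII letter.
    static boolean spacePlusOrLetter(String s) {
        return Pattern.matches("[\\s+a-zA-Z]", s);
    }

    // Outside a class, \s+ means "one or more whitespace characters".
    static boolean oneOrMoreSpaces(String s) {
        return Pattern.matches("\\s+", s);
    }

    public static void main(String[] args) {
        System.out.println(notALetter("("));         // true
        System.out.println(notALetter("A"));         // false
        System.out.println(spacePlusOrLetter("+"));  // true: '+' is literal in a class
        System.out.println(spacePlusOrLetter("  ")); // false: a class matches one character
        System.out.println(oneOrMoreSpaces("  "));   // true
    }
}
```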

Java regular expression: how to include '-'

I am using this pattern and matching a string.
String s = "//name:value /name:value";
if (s.matches("(//?\\s*\\w+:\\w+\\s*)+")) {
// it fits
}
This works properly.
But if I want to have a string like "/name-or-address:value/name-or-address:value" which has this '-' in second part, it doesn't work.
I am using \w to match A-Za-z0-9_, but how can I include - in that?
Use [\w-] to combine both \w and -.
Note that - should always be at the beginning or end of a character class; otherwise it will be interpreted as defining a range of characters (for instance, [a-z] is the range of characters from a to z, whereas [az-] is the three characters a, z, and -).
I don't know if it answers your question, but why not replace \w+ with (\w|-)+ or [\w-]+?
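Applying [\w-]+ to the part before the colon in the original pattern, as a sketch (it assumes dashes only appear in the name part, as in the example string; DashInName is a hypothetical class):

```java
class DashInName {
    static final String ORIGINAL = "(//?\\s*\\w+:\\w+\\s*)+";
    // [\w-] is \w plus a literal '-' placed at the end of the class.
    static final String WITH_DASH = "(//?\\s*[\\w-]+:\\w+\\s*)+";

    public static void main(String[] args) {
        String dashed = "/name-or-address:value/name-or-address:value";
        System.out.println(dashed.matches(ORIGINAL));  // false
        System.out.println(dashed.matches(WITH_DASH)); // true
        System.out.println("//name:value /name:value".matches(WITH_DASH)); // true
    }
}
```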
[-\w] (Or in a string, [-\\w].)
How about
if (s.matches("/(/|\\w|-|:\\w)+")) {

Unescaped "." still matches when used in a negation group

I made what I believed to be an error in a regular expression in Java recently, but when I test my code I don't get the error I expect.
The expression I created was meant to replace a password in a string that I received from another source. The pattern I used went along the lines of: "password: [^\\s.]*", the idea being that it would match the word "password" the colon, a space, then any characters except for a space or a full-stop (period). I would then replace the instance with "password: XXXXXX" and therefore mask it.
The obvious error should be that I have forgotten to escape the full-stop. In other words, the proper expression should have been "password: [^\\s\\.]*". The thing is, if I don't escape the full-stop, the code still works!
Here's some sample code:
import java.util.regex.*;
public class SimpleRegexTest {
public static void main(String[] args) {
Pattern simplePattern = Pattern.compile("password: [^\\s.]*");
Matcher simpleMatcher = simplePattern.matcher("password: newpass. Enjoy.");
String maskedString = simpleMatcher.replaceAll("password: XXXXXX");
System.out.println(maskedString);
}
}
When I run the above code I get the following output:
password: XXXXXX. Enjoy.
Is this a special case, or have I completely missed something?
(edit: changed to "escape the full-stop")
Michael Borgwardt: I couldn't think of another term to describe what I was doing apart from "negation group", sorry for the ambiguity.
Aviator: In this case, no, a space won't be in the password. I didn't make the rules ;-).
(edit: doubled up the slashes in the non-code text so it displays properly, added the ^ which was in the code, but not the text :-/)
Sundar: Fixed the double slashes; SO seems to have its own escape characters.
A period ('.' character) does not need to be escaped inside a character class [] in a regular expression.
From the API:
Note that a different set of metacharacters are in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range forming metacharacter.
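A sketch showing that the escaped and unescaped versions behave identically inside the class, while outside a class the period is still the "any character" metacharacter (PeriodInClass and mask are hypothetical names):

```java
import java.util.regex.Pattern;

class PeriodInClass {
    // Replaces "password: " plus everything up to whitespace or '.' with a mask.
    static String mask(String regex, String text) {
        return Pattern.compile(regex).matcher(text).replaceAll("password: XXXXXX");
    }

    public static void main(String[] args) {
        String text = "password: newpass. Enjoy.";
        System.out.println(mask("password: [^\\s.]*", text));   // unescaped '.'
        System.out.println(mask("password: [^\\s\\.]*", text)); // escaped '.'
        // Outside a class, '.' matches any character; inside [.] it is literal
        System.out.println(Pattern.matches("a.c", "abc"));   // true
        System.out.println(Pattern.matches("a[.]c", "abc")); // false
    }
}
```

Both mask calls print "password: XXXXXX. Enjoy.", matching the output reported in the question.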
It looks like you got the negation operator mixed up for regex ranges.
In particular, my understanding is that you used the snippet [\s.]* to mean "any characters except for a space or a full-stop (period)." This would in fact be expressed as [^ .]*, using the caret to negate the characters in the set.
I don't know if this was just a typo in your post or what was actually in your code, but the regex as it stands in your question will match the word "password", a colon, a space, then any sequence of backslash characters, "s" characters or periods.
