Regarding regex whitespace - java

I have a small question about representing whitespace in a Java regular expression.
I want to restrict the display name, and for that I have defined a pattern as
Pattern DISPLAY_NAME_PATTERN = compile("^[a-zA-Z0-9_\\.!~*()=+$,-\s]{3,20}$");
but Eclipse flags it with the error "Invalid escape sequence". It complains about "\s", which according to
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
is a valid predefined character class.
What am I missing? Could anyone help me with it?
Thanks in advance.

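You need to escape the \ in \s one more time: \s is not a valid escape in a Java string literal, so write \\s and the regex engine will receive \s. Also, you don't need to escape the . inside a character class; both . and \\. inside a character class match a literal dot.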
Pattern DISPLAY_NAME_PATTERN = Pattern.compile("^[a-zA-Z0-9_.!~*()=+$,\\s-]{3,20}$");
Also put the - first or last inside the character class, because a - in the middle of a character class may act as a range operator. The regex.PatternSyntaxException: Illegal character range error comes from exactly this: there is no valid range between , and \\s.
If you also want to match a literal backslash, you need four backslashes in the Java string literal (\\\\): the compiler reduces them to \\, which the regex engine reads as one literal backslash.
Pattern DISPLAY_NAME_PATTERN = Pattern.compile("^[a-zA-Z0-9_.\\\\!~*()=+$,\\s-]{3,20}$");
Example:
System.out.println("foo-bar bar8998~*foo".matches("[a-zA-Z0-9_.\\\\!~*()=+$,\\s-]{3,20}")); // true
System.out.println("fo".matches("[a-zA-Z0-9_.\\\\!~*()=+$,\\s-]{3,20}")); // false
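The two layers of escaping (Java string literal first, regex engine second) can be checked with a minimal sketch like the one below. The class name DisplayNameCheck and its helper methods are made up for illustration; only java.util.regex is assumed.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, for illustration only.
final class DisplayNameCheck {

    // "\\s" in Java source reaches the regex engine as \s (whitespace).
    // "\\\\" reaches the regex engine as \\, i.e. one literal backslash.
    // The hyphen comes last, so it is a literal and not a range operator.
    static final Pattern DISPLAY_NAME =
            Pattern.compile("^[a-zA-Z0-9_.\\\\!~*()=+$,\\s-]{3,20}$");

    static boolean isValid(String name) {
        return DISPLAY_NAME.matcher(name).matches();
    }

    // Returns true if the regex compiles, false if it is rejected
    // with a PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("foo-bar bar8998~*foo"));
        System.out.println(isValid("fo"));
        // "a\\b" is the three-character name a\b
        System.out.println(isValid("a\\b"));
        // Hyphen between , and \s: rejected as an illegal range
        System.out.println(compiles("[a-zA-Z0-9_.!~*()=+$,-\\s]{3,20}"));
    }
}
```

The compiles helper is handy when experimenting, since it turns the syntax exception into a plain boolean.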

Related

mongo query that has regex returns null for string that contains special character as ^ [duplicate]

I am trying to find the following text in my string: '***'
The problem is that the C# Regex class doesn't allow me to do the following:
new Regex("***", RegexOptions.CultureInvariant | RegexOptions.Compiled);
because it throws
ArgumentException: "parsing "*" - Quantifier {x,y} following nothing."
Obviously it treats my stars as regular expression operators.
Is there a way to tell Regex to treat the stars as just stars and nothing else?
* in Regex means:
Matches the previous element zero or more times.
So you need to use \* or [*] instead.
Explanation:
\
When followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character. For example, \* is the same as \x2A.
[ character_group ]
Matches any single character in character_group.
You need to escape each star with a backslash: @"\*", so the full pattern for three stars is @"\*\*\*".

Need a regular expression for field which should allow special characters, alphanumeric characters, and spaces

I am using the following regex:
[a-zA-Z0-9-#.()/%&\\s]{0,19}.
The requirement is that the field should allow anything and the field size should be 19.
Let me know if any corrections are needed. Any help is appreciated.
You simply need to escape the special characters. Try:
[a-zA-Z0-9\-#\.\(\)\/%&\s]{0,19}
You can test your regular expressions on http://rubular.com/
Your regex is incorrect in at least one way - if you're considering a hyphen to be a "special character", then you should put it at the beginning or end of the range. So: [a-zA-Z0-9#.()/%&\s-]{0,19}.
Characters that are "special" within the context of the regex itself are often not parsed if they're inside a range. So you're fine with ., ( and ). But check your parser to make sure that it understands what \s means. It might be simpler just to put a space.
Also, if your regex parser tends to delimit the regex with slashes, then you may have to escape the slash in the middle of the range: [a-zA-Z0-9#.()\/%&\s-]{0,19}.
Just escape the dash - or put it at the beginning or at the end of the character class:
[a-zA-Z0-9\\-#.()/%&\\s]{0,19}
or
[-a-zA-Z0-9#.()/%&\\s]{0,19}
or
[a-zA-Z0-9#.()/%&\\s-]{0,19}
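As a quick check that the three spellings above behave the same in Java, here is a small sketch. The class name FieldCheck and the sample inputs are invented for illustration; the regexes are the three variants listed above, written as Java string literals.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class FieldCheck {

    // Escaped hyphen, leading hyphen, trailing hyphen.
    static final String[] VARIANTS = {
        "[a-zA-Z0-9\\-#.()/%&\\s]{0,19}",
        "[-a-zA-Z0-9#.()/%&\\s]{0,19}",
        "[a-zA-Z0-9#.()/%&\\s-]{0,19}"
    };

    // Counts how many of the three variants accept the whole input.
    static int acceptCount(String input) {
        int count = 0;
        for (String regex : VARIANTS) {
            // Pattern.matches requires the entire input to match,
            // so {0,19} acts as a length limit here.
            if (Pattern.matches(regex, input)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(acceptCount("Flat 4-B (rear) #2"));
        System.out.println(acceptCount("a*b"));
        System.out.println(acceptCount("12345678901234567890"));
    }
}
```

Note that the length limit only holds because the whole input is matched; with Matcher.find() the pattern would need ^ and $ anchors.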

regex help in java

I'm trying to compare following strings with regex:
#[xyz="1","2"'"4"] ------- valid
#[xyz] ------------- valid
#[xyz="a5","4r"'"8dsa"] -- valid
#[xyz="asd"] -- invalid
#[xyz"asd"] --- invalid
#[xyz="8s"'"4"] - invalid
The valid pattern should be:
#[xyz then = sign then some chars then , then some chars then ' then some chars and finally ]. This means if there is characters after xyz then they must be in format ="XXX","XXX"'"XXX".
Or only #[xyz]. No character after xyz.
I have tried the following regex, but it did not work:
String regex = "#[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"]";
Here the quotations (in the part after xyz) are optional and the number of characters between the quotes is not fixed. There could also be some characters before and after this pattern, like asdadad #[xyz] adadad.
You can use the regex:
#\[xyz(?:="[a-zA-Z0-9]+","[a-zA-Z0-9]+"'"[a-zA-Z0-9]+")?\]
See it
Expressed as a Java string it'll be:
String regex = "#\\[xyz(?:=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")?\\]";
What was wrong with your regex?
[...] defines a character class. When you want to match literal [ and ] you need to escape it by preceding with a \.
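[a-zA-z][0-9] matches a single letter followed by a single digit, but you want one or more alphanumeric characters, so you need [a-zA-Z0-9]+. Note also that the range A-z (with a lowercase z) covers the punctuation characters [ \ ] ^ _ ` that sit between Z and a in ASCII; A-Z is what was intended.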
Use this:
String regex = "#\\[xyz(=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")?\\]";
When you write [a-zA-z][0-9], it expects a single letter followed by a single digit. You also have to escape the first and last square brackets, because square brackets have a special meaning in regexes.
Explanation:
[a-zA-Z0-9]+ means an alphanumeric character (but not an underscore) one or more times.
(=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")? means that the expression in parentheses can occur once or not at all.
Square brackets have a special meaning in regex (you used them yourself to define character classes), so you need to escape them if you want to match them literally.
String regex = "#\\[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"\\]";
The next problem is that with [a-zA-z][0-9] you define "first a letter, second a digit". You need to join those classes and add a quantifier (and write A-Z rather than A-z, which also admits [ \ ] ^ _ `):
String regex = "#\\[xyz=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\"\\]";
See it here on Regexr
there could also be some characters before and after this pattern like
asdadad #[xyz] adadad.
Regex should be:
String regex = "(.)*#\\[xyz(=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")?\\](.)*";
The first and last (.)* allow any string before and after the pattern, as you mentioned in your edit. As @ademiban said, the group (=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")? occurs once or not at all. The other mistakes are already well explained in the other answers; +1 to all of them.
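The six sample strings from the question can be run against the pattern with the optional group. This is a minimal sketch: the class name XyzTagCheck and its method names are invented for illustration, and it uses [a-zA-Z0-9] for the quoted parts. For text around the tag it relies on Matcher.find() instead of wrapping the pattern in (.)*, which is one possible choice rather than the only one.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class XyzTagCheck {

    // #[xyz] optionally followed by ="..","..'".." before the closing bracket.
    static final Pattern TAG = Pattern.compile(
            "#\\[xyz(?:=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")?\\]");

    // The whole string must be a tag.
    static boolean isValidTag(String s) {
        return TAG.matcher(s).matches();
    }

    // The tag may appear anywhere inside the string.
    static boolean containsTag(String s) {
        return TAG.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "#[xyz=\"1\",\"2\"'\"4\"]",
            "#[xyz]",
            "#[xyz=\"a5\",\"4r\"'\"8dsa\"]",
            "#[xyz=\"asd\"]",
            "#[xyz\"asd\"]",
            "#[xyz=\"8s\"'\"4\"]"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValidTag(s));
        }
        System.out.println(containsTag("asdadad #[xyz] adadad"));
    }
}
```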

Java regexp error: \( is not a valid character

I was using java regexp today and found that you are not allowed to use the following regexp sequence
String pattern = "[a-zA-Z\\s\\.-\\)\\(]*";
if I do use it it will fail and tell me that \( is not a valid character.
But if I change the regexp to
String pattern = "[[a-zA-Z\\s\\.-]|[\\(\\)]]*";
Then it works. Is this a bug in the regexp engine, or am I misunderstanding how to work with the engine?
EDIT: I had an error in my string: there shouldn't be two opening [[, only one. This is now corrected.
Your regex has two problems.
You've not closed the character class.
The - is acting as a range operator with . on the LHS and ) on the RHS. But ) comes before . in Unicode, so this results in an invalid range.
To fix problem 1, close the char class or if you meant to not include [ in the allowed characters delete one of the [.
To fix problem 2, either escape the - as \\- or move the - to the beginning or to the end of the char class.
So you can use:
String pattern = "[a-zA-Z\\s\\.\\-\\)\\(]*";
or
String pattern = "[a-zA-Z\\s\\.\\)\\(-]*";
or
String pattern = "[-a-zA-Z\\s\\.\\)\\(]*";
You should only use the dash - at the end of the character class, since it is normally used to show a range (as in a-z). Rearrange it:
String pattern = "[[a-zA-Z\\s\\.\\)\\(-]*";
Also, I don't think you have to escape (.) characters inside brackets.
Update: As others pointed out, you must also escape the [ in a java regex character class.
The problem here is that \.-\) ("\\.-\\)" in a Java string literal) tries to define a range from . to ). Since the Unicode codepoint of . (U+002E) is higher than that of ) (U+0029) this is an error.
Try using this pattern and you'll see: [z-a].
The correct solution is to either put the dash - at the end of the character group (at which point it will lose its special meaning) or to escape it.
You also need to close the unclosed open square bracket or escape it, if it was not intended for grouping.
Also, escaping the fullstop . is not necessary inside a character group.
You have to escape the dash and close the unmatched square bracket. So you are going to get two errors with this regex:
java.util.regex.PatternSyntaxException: Illegal character range near index 14
because the dash is used to specify a range, and \) is not a valid end for a range that starts at \. (it sorts before it). If you escape the dash, making it [[a-zA-Z\s\.\-\)\(]*, you'll get
java.util.regex.PatternSyntaxException: Unclosed character class near index 19
which means that you have an extra opening square bracket that is used to specify character class. I don't know what you meant by putting an extra bracket here, but either escaping or removing it will make it a valid regex.
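Both failure modes, and the fix, can be reproduced with a short sketch. The class name RangeOrderDemo and the method errorFor are invented for illustration; the patterns are the ones discussed in this thread.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, for illustration only.
final class RangeOrderDemo {

    // Returns the syntax error description, or null if the regex compiles.
    static String errorFor(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // A range must run from a lower to a higher code point.
        System.out.println(errorFor("[z-a]"));
        // Here the dash forms the range . (U+002E) to ) (U+0029).
        System.out.println(errorFor("[a-zA-Z\\s\\.-\\)\\(]*"));
        // Escaped dash, but an extra opening bracket is never closed.
        System.out.println(errorFor("[[a-zA-Z\\s\\.\\-\\)\\(]*"));
        // Dash moved to the end; . ( ) need no escaping inside the class.
        System.out.println(errorFor("[a-zA-Z\\s.()-]*"));
        System.out.println(
                Pattern.matches("[a-zA-Z\\s.()-]*", "John (Jr.) Smith-Jones"));
    }
}
```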

Java regex predefined character class nested inside character class

I need to use a regular expression that contains all the \b characters except the dot, .
Something like [\b&&[^.]]
For example, in the following test string:
"somewhere deep down in some org.argouml.swingext classes and"
I want the org.argouml.swingext string to match, but the org.argouml string not to match (using the Matcher.find() method).
If I use: \b(package_name)>\b they both match, which is not what I want.
If I use: \b(package_name)[\b&&[^\.]] I get a PatternSyntaxException
If I use: \b(package_name)(\b&&[^\.]) nothing matches.
I use this link to test my regexes.
Context: I have a list of package names from a project and I have to search them inside some texts. Obviously if a nested package is found, I don't want the outer package to match as well, as seen from the above example.
I am not using the \s character class at the end because the package may be at the end of line, or it may followed by other nonword characters such as : , ) etc, characters that are contained in the \b class. I just want to subtract the . from the \b class.
If anybody knows how to do this, I would be very grateful :)
Thanks
A negative lookahead would work here:
\borg\.argouml(?!\.)\b
Remember that in Java string literals the backslashes in regular expressions must be escaped, and that the dot inside the package name needs escaping too, or it will match any character:
"\\borg\\.argouml(?!\\.)\\b"
Why not simply use:
\b\w+(\.\w+)+\b
FYI, the PatternSyntaxException pops up because \b matches a position, not a character. A character class always matches 1 character so putting \b (a word boundary) inside a character class will cause the exception to be thrown.
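For a whole list of package names, the negative lookahead can be built per name. The following is a sketch under assumptions: the class name PackageFinder and the method mentions are invented for illustration, and Pattern.quote is used so that the dots in the package name are treated literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class PackageFinder {

    // True if packageName occurs in text as a complete name, i.e. it is
    // not immediately followed by a dot (which would start a sub-package).
    static boolean mentions(String text, String packageName) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(packageName) + "(?!\\.)\\b");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text =
                "somewhere deep down in some org.argouml.swingext classes and";
        System.out.println(mentions(text, "org.argouml.swingext"));
        System.out.println(mentions(text, "org.argouml"));
    }
}
```

One limitation of this lookahead: a package name that ends a sentence and is followed by a full stop is also rejected. If that matters, a variation such as (?!\\.\\w) only rejects a dot that is followed by a word character.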
