java regex pattern unclosed character class - java

I need some help. I'm getting:
Caused by: java.util.regex.PatternSyntaxException: Unclosed character class near index 24
^[a-zA-Z└- 0-9£µ /.'-\]*$
^
at java.util.regex.Pattern.error(Pattern.java:1713)
at java.util.regex.Pattern.clazz(Pattern.java:2254)
at java.util.regex.Pattern.sequence(Pattern.java:1818)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:823)
Here is my code:
String testString = value.toString();
Pattern pattern = Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]*$");
Matcher m = pattern.matcher(testString);
I have to use the unicode value for some because I'm working with xhtml.
Any help would be great!

Assuming that you want to match \ and - and not ]:
Pattern pattern = Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'\\\\-]*$");
You need to double-escape your backslashes, because \ is an escape character in regex as well as in Java string literals. \\] escapes the backslash for Java but not for regex, so the regex engine sees \] (an escaped ]) and the character class is never closed. You need to add another Java-escaped \ in order to regex-escape your second Java-escaped \.
So \\\\ becomes \\ after Java escaping, which the regex engine then reads as a literal \.
Moving - to the end of the class means that it is treated as a literal character instead of a range operator, as pointed out by Pshemo.
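As a minimal sketch of the fix described above, the following compiles the corrected pattern and checks a few inputs. The class and method names (NameCharsDemo, isValid, failsToCompile) are made up for illustration, and the shortened broken pattern in main is an assumption that keeps only the part of the original that triggers the error.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names are not from the original post.
class NameCharsDemo {
    // Corrected pattern from the answer: four backslashes in Java source
    // give one literal backslash in the class, and the trailing hyphen is literal.
    static final Pattern FIXED =
            Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'\\\\-]*$");

    static boolean isValid(String s) {
        return FIXED.matcher(s).matches();
    }

    // Returns true if the given regex source cannot be compiled.
    static boolean failsToCompile(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("O'Neil-Smith 42"));  // true
        System.out.println(isValid("back\\slash"));      // true
        System.out.println(isValid("a]b"));              // false
        // Shortened version of the broken pattern: the class is never closed.
        System.out.println(failsToCompile("^[a-zA-Z0-9 /.'-\\]*$")); // true
    }
}
```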

It is hard to say what you are trying to achieve, but I can see a few strange things in your regex:
you opened a character class but never closed it. Instead you used \\], which makes ] a normal character.
If you want to include ] in your character class, then you need an additional ] at the end, like "^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]]*$"
if you want to include \ in your character class, then you need to use \\\\, because you need to escape its special meaning twice: once for the regex engine and once for the Java string literal
you used - in '-\\], and inside a character class - specifies a range of characters, like a-z or A-Z. To escape its special meaning you need to use \\-
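To make the range point concrete, here is a small sketch comparing an unescaped and an escaped hyphen in a character class. The class name HyphenRangeDemo and the two reduced patterns are assumptions for illustration; they are cut down from the pattern in the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns are reduced versions of the one in the question.
class HyphenRangeDemo {
    // Unescaped hyphen: a range from the apostrophe (U+0027) to the closing bracket (U+005D).
    static final Pattern RANGE = Pattern.compile("['-\\]]");
    // Escaped hyphen: only apostrophe, hyphen and closing bracket.
    static final Pattern LITERALS = Pattern.compile("['\\-\\]]");

    static boolean inRange(String s) {
        return RANGE.matcher(s).matches();
    }

    static boolean isLiteral(String s) {
        return LITERALS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(inRange("+"));    // true: '+' lies inside the range
        System.out.println(isLiteral("+"));  // false
        System.out.println(isLiteral("-"));  // true
    }
}
```

The unescaped form silently accepts everything between the two endpoints, including digits and uppercase letters, which is probably not what was intended.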

Related

Regex Match word that include a Dot

I have a question. I have this sentence, for example:
"HalloAnna daveca.nn dave anna ca. anna"
And I only want to match the standalone "ca.".
My regex looks like this:
(?i)\b(ca\.)\b
But this doesn't work and I don't know why. Any ideas?
//Update
I execute it with:
testSource.replaceAll()
and with
pattern.matcher(testSource).replaceAll().
Neither works.
You must escape the dot and assert a non-word following:
(?i)\bca\.(?=\W)
See live demo.
You should use it like this:
Pattern.compile("(?i)\\b(ca\\.)(?=\\W)").matcher(a).replaceAll("SOME TEXT");
Without the Java escapes, this is the regex (?i)\b(ca\.)(?=\W).
Every \ in a normal regex has to be escaped in a Java string literal as \\.
Also, you have a word boundary (\b) before the word, but \b matches only at a position where a word character (letter, digit or underscore) sits next to a non-word character or the start or end of the input. In your case the match ends with a dot, which is not a word character, and it is followed by a space, which is not one either, so \b cannot match at the end. You can use \W instead, which requires a non-word character after the dot. To keep that character out of the match (so it won't be replaced), put \W inside a lookahead: (?=\W).
Another issue is that an unescaped . matches any character, whereas you want to match a literal dot, so you have to escape it as \., which becomes \\. in a Java string literal.
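The lookahead approach can be tried out with a short sketch like the one below, using the sentence from the question. The class name CaDotDemo, the method names and the replacement text "circa" are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class built around the sample sentence from the question.
class CaDotDemo {
    // Pattern from the answers: literal dot, then a lookahead for a non-word character.
    static final Pattern CA_DOT = Pattern.compile("(?i)\\bca\\.(?=\\W)");
    // Pattern from the question, with a trailing word boundary after the dot.
    static final Pattern ORIGINAL = Pattern.compile("(?i)\\b(ca\\.)\\b");

    static String replaceCa(String text, String replacement) {
        return CA_DOT.matcher(text).replaceAll(replacement);
    }

    static boolean originalFinds(String text) {
        return ORIGINAL.matcher(text).find();
    }

    public static void main(String[] args) {
        String source = "HalloAnna daveca.nn dave anna ca. anna";
        System.out.println(originalFinds(source));       // false
        System.out.println(replaceCa(source, "circa"));  // HalloAnna daveca.nn dave anna circa anna
    }
}
```

Note that (?=\W) needs a character after the dot, so a "ca." at the very end of the input would not match. A negative lookahead such as (?!\w) is one possible alternative if that case matters.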

How do I properly create a regex for String.matches() with escape characters?

I am trying to check if a string of length one is any of the following characters: "[", "\", "^", "_", single back-tick "`", or "]".
Right now I am trying to accomplish this with the following if statement:
if (character.matches("[[\\]^_`]")){
isValid = false;
}
When I run my program I get the following error for the if statement:
java.util.regex.PatternSyntaxException: null (in java.util.regex.Pattern)
What is the correct syntax for a regex with escape characters?
Your list has four characters that need special attention:
^ is the inversion character. It must not be the first character in a character class, or it must be escaped.
\ is the escape character. It must be escaped for direct use.
[ starts a character class, so it must be escaped.
] ends a character class, so it must be escaped.
Here is the "raw" regex:
[\[\]_`\\^]
Since you represent your regex as a Java string literal, all backslashes must be additionally escaped for the Java compiler:
if (character.matches("[\\[\\]_`\\\\^]")){
isValid = false;
}
You need to escape the [, ] and \\ - the [ and ] so that the pattern compiler knows that they are not the special character class delimiters, and \\ because it's already being converted to one backslash because it's in a string literal, so to represent an escaped backslash in a pattern, you need to have no less than four consecutive backslashes.
So the resulting regex should be
"[\\[\\]\\\\^_`]"
(Test it on RegexPlanet - click on the "Java" button to test).
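A quick way to check the escaped class from the answers is to run it against each of the six characters. This is only a sketch; the class name BracketCharsDemo and the method isSpecial are hypothetical.

```java
// Hypothetical demo class; isSpecial wraps the String.matches call from the answer.
class BracketCharsDemo {
    static boolean isSpecial(String character) {
        // Java source "[\\[\\]_`\\\\^]" is the raw regex [\[\]_`\\^]
        return character.matches("[\\[\\]_`\\\\^]");
    }

    public static void main(String[] args) {
        String[] samples = {"[", "\\", "^", "_", "`", "]", "a"};
        for (String s : samples) {
            System.out.println(s + " -> " + isSpecial(s));
        }
    }
}
```

The first six samples print true and "a" prints false.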

Java regular expression to remove all non alphanumeric characters EXCEPT spaces

I'm trying to write a regular expression in Java which removes all non-alphanumeric characters from a paragraph, except the spaces between the words.
This is the code I've written:
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\s]", "");
However, the compiler gave me an error message pointing to the s saying it's an illegal escape character. The program compiled OK before I added the \s to the end of the regular expression, but the problem with that was that the spaces between words in the paragraph were stripped out.
How can I fix this error?
You need to double-escape the \ character: "[^a-zA-Z0-9\\s]"
Java will interpret \s as a Java String escape character, which is indeed an invalid Java escape. By writing \\, you escape the \ character, essentially sending a single \ character to the regex. This \ then becomes part of the regex escape character \s.
You need to escape the \ so that the regular expression recognizes \s :
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");
Generally whenever you see that error, it means you only have a single backslash where you need two:
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");
Victoria, you must write \\s not \s here.
Please take a look at this site; you can test Java regexes online and get well-formatted regex string patterns back:
http://www.regexplanet.com/advanced/java/index.html
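For reference, a minimal runnable version of the corrected call might look like this. The class name StripNonAlnumDemo and the method strip are invented for the example.

```java
// Hypothetical demo class wrapping the corrected replaceAll call.
class StripNonAlnumDemo {
    static String strip(String paragraph) {
        // "\\s" in Java source reaches the regex engine as \s (any whitespace).
        return paragraph.replaceAll("[^a-zA-Z0-9\\s]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello, world! 123"));  // Hello world 123
    }
}
```

Since \s covers tabs and line breaks as well as spaces, those survive too. If only plain spaces should be kept, a literal space inside the class works instead. As far as I know, newer Java versions (15 and later) also accept \s directly in a string literal as an escape for a single space, so the single-backslash version may compile there but would keep only plain spaces.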

problem understanding a string pattern

I'm learning GWT by following this tutorial but there's something I don't quite fully understand in step 4. The following line's checking that a string matches a pattern:
if (!str.matches("^[0-9A-Z\\.]{1,10}$")) {...}
After checking the documentation for the Pattern class I understand that the characters ^ and $ represent the beginning and the end of the line, and that [...]{1,10} means that the part in brackets [...] has to be present at least once but not more than 10 times. What I don't understand is the final characters of the part in brackets. 0-9A-Z means a range of characters from 0 to 9 or from A to Z. But what does \\. mean?
It matches a dot character. Since dot has a special meaning in regexp, it must be escaped with a backslash. And because backslash has a special meaning in Java strings, it must be escaped with another backslash.
It matches a dot (.), which has to be escaped because it is a special character in regexp syntax.
It also has two escapes because \ is a special character in Java strings.
The dot "." in regex means "any character". An escaped dot "\." (written "\\." in a Java string) means the dot character itself (without any special regex behaviour like the unescaped dot).
So, for example, "123.ABC" could be a line that matches the given regex (line breaks etc. not included).
It matches a dot character. A double slash '\\' simply means a single '\' as you have to escape '\'s in java strings. So '\\.' is translated to '\.' which means match just a '.' character. If you just used '.' by itself, without escaping, it would match any character. So you have to escape it, to match a '.' character.
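The behaviour described in these answers can be checked with a few sample inputs. This is a sketch only; the class name SymbolPatternDemo and the method names are assumptions, and the second pattern is added purely for comparison.

```java
// Hypothetical demo class for the pattern from the tutorial line.
class SymbolPatternDemo {
    static boolean isValidSymbol(String str) {
        // Raw regex: ^[0-9A-Z\.]{1,10}$
        return str.matches("^[0-9A-Z\\.]{1,10}$");
    }

    // Same check with an unescaped dot inside the character class.
    static boolean isValidSymbolUnescaped(String str) {
        return str.matches("^[0-9A-Z.]{1,10}$");
    }

    public static void main(String[] args) {
        System.out.println(isValidSymbol("123.ABC"));      // true
        System.out.println(isValidSymbol("abc"));          // false: lowercase
        System.out.println(isValidSymbol("12345678901"));  // false: 11 characters
    }
}
```

Inside a character class the dot is already a literal, so the second method accepts the same inputs; the escape matters when the dot appears outside brackets.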

Java regular expression

I want to replace any one of these chars:
% \ , [ ] # & # ! ^
... with empty string ("").
I used this code:
String line = "[ybi-173]";
Pattern cleanPattern = Pattern.compile("%|\\|,|[|]|#|&|#|!|^");
Matcher matcher = cleanPattern.matcher(line);
line = matcher.replaceAll("");
But it doesn't work.
What do I miss in this regular expression?
Some of the characters are special characters that are being interpreted differently. You can either escape them all with backslashes, or better yet put them in a character class (no need to escape the non-CC characters, eases readability):
Pattern cleanPattern = Pattern.compile("[%\\\\,\\[\\]#&#!^]");
There are several reasons why your solution doesn't work.
Several of the characters you wish to match have special meanings in regular expressions, including ^, [, and ]. These must be escaped with a \ character, but, to make matters worse, the \ itself must be escaped so that the Java compiler will pass the \ through to the regular expression constructor. So, to sum up step one, if you wish to match a ] character, the Java string must look like "\\]".
But, furthermore, this is a case for a character class [], rather than the alternation operator |. If you want to match "any of the characters a, b, c", that looks like [abc]. Your character class would be [%\,[]#&#!^], but, because of the Java string escaping rules and the special meaning of certain characters, your regex as a Java string will be [%\\\\,\\[\\]#&#!\\^].
You'd define your pattern as a character class enclosed in [ and ] and escape special chars, e.g.
String n = "%\\,[]#&#!^".replaceAll("[%\\\\,\\[\\]#&#!^]", "");
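Applied to the input from the question, the character-class version can be sketched as follows. The class name CleanCharsDemo and the method clean are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class using the character class suggested in the answers.
class CleanCharsDemo {
    // Raw regex: [%\\,\[\]#&#!^]  (the caret is literal because it is not first)
    static final Pattern CLEAN = Pattern.compile("[%\\\\,\\[\\]#&#!^]");

    static String clean(String line) {
        return CLEAN.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("[ybi-173]"));  // ybi-173
    }
}
```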
