converting RegEx into my Java function [duplicate] - java

This question already has answers here:
Why does this Java regex cause "illegal escape character" errors?
(7 answers)
Closed 2 years ago.
I'm having problems with Java regex. My regex is "\"730\"\s+{([^}]+)}". It works on a regex-testing website, but I can't get it to work in Java. This is my current code:
String patternString = '\"730\"\s+{([^}]+)}';
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(vdfContentsString);
boolean matches = matcher.matches();
Thanks for advice.
It says "Illegal escape character in character literal".

First, single quotes (') delimit a char literal and double quotes (") delimit a String literal, which is why you get the syntax error Illegal escape character in character literal. Second, regex syntax itself uses the backslash, as in \s for whitespace. What makes this confusing is that Java also uses \ for escaping inside string literals, so you need two backslashes: \\s in Java source becomes \s in the resulting regular expression.
Then you need to take care of special characters in regular expressions: { and } delimit quantifiers (as in a{3}, "repeat a three times"), so if you want them literally, escape them (\\{ and \\} in Java source).
So if you want to match a string like "730" {whatever}, use this regular expression:
"730"\s+\{([^}]+)\}
or in Java:
String patternString = "\"730\"\\s+\\{([^}]+)\\}";
Example:
String str = "\"730\" { \"installdir\" \"C:\\Program Files (x86)\\Steam\\steamapps\\common\\Counter-Strike Global Offensive\" \"HasAllLocalContent\" \"1\" \"UpToDate\" \"1\" }";
String patternString = "\"730\"\\s+\\{([^}]+)\\}";
System.out.println(str.matches(patternString)); // true
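The two escaping levels are easier to see when the string is printed after the Java compiler has processed the literal. The following is a minimal sketch; the class name EscapeLevels and the helper matchesWhole are illustrative names, not part of the question.

```java
import java.util.regex.Pattern;

public class EscapeLevels {
    // Java source literal. After compilation the string holds the 19 characters
    //   "730"\s+\{([^}]+)\}
    // which is exactly what the regex engine receives.
    static final String PATTERN_STRING = "\"730\"\\s+\\{([^}]+)\\}";

    // True only if the whole input matches the pattern (same rule as String#matches).
    static boolean matchesWhole(String input) {
        return Pattern.matches(PATTERN_STRING, input);
    }

    public static void main(String[] args) {
        // Shows the pattern as the regex engine sees it, with one backslash per escape.
        System.out.println(PATTERN_STRING);
        System.out.println(matchesWhole("\"730\" { \"UpToDate\" \"1\" }"));
    }
}
```

Printing the pattern string is a quick way to check whether a backslash was lost or doubled once too often.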

Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition
Escape { and } as well, because they have a special meaning in a Java regex Pattern.
String patternString = "\"730\"\\s+\\{([^\\}]+)\\}";
EDIT
String#matches() only succeeds if the whole string matches the pattern. If you are looking for a substring of a longer string, use Matcher#find() instead and read the result from the groups captured by enclosing parts of the pattern in parentheses (...).
Sample code:
String patternString = "(\"730\"\\s+\\{([^\\}]+)\\})";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(vdfContentsString);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
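To make the difference between matches() and find() concrete, here is a small sketch under the assumption that the "730" block sits somewhere inside a longer input. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    private static final Pattern BLOCK = Pattern.compile("\"730\"\\s+\\{([^}]+)\\}");

    // matches(): the pattern must cover the entire input.
    static boolean wholeInputMatches(String input) {
        return BLOCK.matcher(input).matches();
    }

    // find(): the pattern may occur anywhere in the input.
    static boolean containsMatch(String input) {
        return BLOCK.matcher(input).find();
    }

    // Returns the text between the braces of the first block, or null if there is none.
    static String firstBlock(String input) {
        Matcher matcher = BLOCK.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"440\" {other} \"730\" {abc} trailing";
        System.out.println(wholeInputMatches(input)); // false: text surrounds the block
        System.out.println(containsMatch(input));     // true
        System.out.println(firstBlock(input));        // abc
    }
}
```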

{ and } are metacharacters and need to be escaped with \\, hence \\{ .. \\}.
\ is the escape character in both Java string literals and regex, so predefined character classes such as \s, \w and \d need a doubled backslash in Java source as well, hence \\s+.
Instead of [^\\}]+, I would suggest the lazy (.+?) followed by \\}.
This is working:
String patternString = "\\\"730\\\"\\s+\\{(.+?)\\}";
The above is the required Java string which gets parsed into the following regular expression: \"730\"\s+\{(.+?)\}, and then it can be used to match the input string. Tadan!
two levels of parsing!

Related

Why escaping double quote with single and triple backslashes in a Java regular expression yields identical results

I want to escape " (double quotes) in a regex.
I found out that there is no difference whether I use \\\" or \": both yield the same correct result.
Why is that? How can the first one give the correct result?
To define a " char in a string literal in Java, you need to escape it for the string parsing engine, like "\"".
The " char is not a special regex metacharacter, so you needn't escape this character for the regex engine. However, you may do it:
A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.
A regex escape starts with a literal backslash, which is written as a double backslash in a Java string literal, "\\":
It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the Java bytecode compiler.
So, both "\"" (a literal " string) and "\\\"" (a literal \" string) form a regex pattern that matches a single " char.
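The claim above can be checked directly. This is a small sketch with illustrative names (QuoteEscapes, PLAIN, ESCAPED, matchesQuote) that compares the two literals and the regexes they produce.

```java
import java.util.regex.Pattern;

public class QuoteEscapes {
    // String content: "   (one character; the backslash only serves the Java compiler)
    static final String PLAIN = "\"";
    // String content: \"  (two characters; the regex engine sees an escaped quote)
    static final String ESCAPED = "\\\"";

    // True if the given regex matches a string consisting of a single double quote.
    static boolean matchesQuote(String regex) {
        return Pattern.matches(regex, "\"");
    }

    public static void main(String[] args) {
        System.out.println(PLAIN.length() + " " + matchesQuote(PLAIN));     // 1 true
        System.out.println(ESCAPED.length() + " " + matchesQuote(ESCAPED)); // 2 true
    }
}
```

The strings differ in length, yet both regexes match the same single character, because escaping a non-alphabetic character in a regex is harmless.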
Try to use this:
String regex = "(\"\\w+\")";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher("Some \"test\" string. And \"another\" quoted word.");
while (matcher.find()) {
    System.out.println(matcher.group());
}
Prints:
"test"
"another"

Regex in java: error for str.replace("\s+", " ")

Why does Java (1.7) give me an error for the following line?
String str2 = str.replace("\s+", " ");
Error:
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
As far as I know, "\s+" is a valid regex, isn't it?
String.replace() will only replace literals, that's the first problem.
The second problem is that \s is not a valid escape sequence in a Java string literal, by definition.
Which means what you wanted was probably "\\s+".
But even then, .replace() won't take that as a regex. You have to use .replaceAll() instead:
String str2 = str.replaceAll("\\s+", " ");
BUT there is another problem. You seem to be using it often... Therefore, use a Pattern instead:
private static final Pattern SPACES = Pattern.compile("\\s+");
// In code...
String result = SPACES.matcher(input).replaceAll(" ");
FURTHER NOTES:
If you only want to replace the first occurrence, use .replaceFirst(); String has it, and so does Matcher.
When you call .replace{First,All}() on a String, a new Pattern is compiled on each and every invocation. Use a precompiled Pattern if you have to do repetitive matches!
It's a valid regular expression pattern, but \s is not a valid String literal escape sequence. Escape the \.
String str2 = str.replace("\\s+", " ");
As suggested, String#replace(CharSequence, CharSequence) doesn't consider the arguments you provide as regular expressions. So even if you got the program to compile, it wouldn't do what you seem to want it to do. Check out String#replaceAll(String, String).
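The three variants discussed in these answers behave as follows. This is a sketch only; the class name WhitespaceCollapse and its method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class WhitespaceCollapse {
    // Compiled once and reused, as recommended for repeated use.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // String#replace treats "\\s+" as the literal three characters \s+ .
    static String literalReplace(String s) {
        return s.replace("\\s+", " ");
    }

    // String#replaceAll treats "\\s+" as a regex: one or more whitespace characters.
    static String regexReplace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // Same result as regexReplace, without recompiling the pattern on every call.
    static String precompiledReplace(String s) {
        return SPACES.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "a  \t b";
        System.out.println(literalReplace(input).equals(input)); // true: nothing to replace
        System.out.println(regexReplace(input));                 // a b
        System.out.println(precompiledReplace(input));           // a b
    }
}
```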

detect $character java regular expression

I have to find a word like ${test} in a text file and replace it based on some criteria. In a regular expression, '$' means "match at the end of the line".
What is the regular expression to detect something like ${\w+}?
You can try using this regex:
"\\$\\{\\w+\\}"
and the method String#replaceAll(String regex, String replacement):
String s = "abc ${test}def"; // for example
s = s.replaceAll("\\$\\{\\w+\\}", "STACKOVERFLOW");
[^}]* rather than \w+ ?
You might want to consider using [^}]* rather than \w+. The former matches any chars that are not a closing brace, so it would allow test-123, which the second would reject. Of course that may just be what you want.
Let's assume this is the raw regex (see what matches in the demo):
\$\{[^}]*\}
In Java, we need to further escape the backslashes, yielding \\$\\{[^}]*\\}.
Likewise \$\{\w+\} would have to be used as \\$\\{\\w+\\}
Replacing the Matches in Java
String resultString = subjectString.replaceAll("\\$\\{[^}]*\\}", "Your Replacement");
Iterating through the matches in Java
Pattern regex = Pattern.compile("\\$\\{[^}]*\\}");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // the current match is regexMatcher.group()
}
Explanation
\$ matches the literal $
\{ matches an opening brace
[^}]* matches any chars that are not a closing brace
\} a closing brace
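Since the question mentions replacing each ${...} "based on some criteria", one possible approach is to look up the name between the braces in a map. The sketch below assumes that lookup; the class PlaceholderReplace, the method fill and the map of values are hypothetical and not part of the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderReplace {
    // Group 1 captures the name between ${ and }.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]*)\\}");

    // Replaces each ${name} with values.get(name); unknown names are left as they are.
    static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            String replacement = (value != null) ? value : matcher.group();
            // quoteReplacement keeps $ and \ in the replacement from being
            // interpreted as group references or escapes.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<String, String>();
        values.put("test", "STACKOVERFLOW");
        System.out.println(fill("abc ${test}def ${unknown}", values));
    }
}
```

Note that the replacement string has its own escaping rules: $ and \ are special there too, which is why Matcher.quoteReplacement is used.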

How to escape characters in a regular expression

When I use the following code I've got an error:
Matcher matcher = pattern.matcher("/Date\(\d+\)/");
The error is :
invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
I have also tried changing the value in the brackets to ('/Date\(\d+\)/'); without any success.
How can I avoid this error?
You need to double-escape your \ character, like this: \\.
Otherwise your String is interpreted as if you were trying to escape (.
Same with the other round bracket and the d.
In fact it seems you are trying to initialize a Pattern here, while pattern.matcher references a text you want your Pattern to match.
Finally, note that in a Pattern, escaped characters require a double escape, as such:
\\(\\d+\\)
Also, as Rohit says, Patterns in Java do not need to be surrounded by forward slashes (/).
In fact if you initialize a Pattern like that, it will interpret your Pattern as starting and ending with literal forward slashes.
Here's a small example of what you probably want to do:
// your input text
String myText = "Date(123)";
// your Pattern initialization
Pattern p = Pattern.compile("Date\\(\\d+\\)");
// your matcher initialization
Matcher m = p.matcher(myText);
// printing the output of the match...
System.out.println(m.find());
Output:
true
Your regex is correct by itself, but in Java, the backslash character itself needs to be escaped.
Thus, this regex:
/Date\(\d+\)/
Must turn into this:
/Date\\(\\d+\\)/
One backslash is for escaping the parenthesis or d. The other one is for escaping the backslash itself.
The error message you are getting arises because Java thinks you're trying to use \( as a single escape character, like \n, or any of the other examples. However, \( is not a valid escape sequence, and so Java complains.
In addition, the logic of your code is probably incorrect. The argument to matcher should be the text to search (for example, "/Date(234)/Date(6578)/"), whereas the variable pattern should contain the pattern itself. Try this:
String textToMatch = "/Date(234)/Date(6578)/";
Pattern pattern = Pattern.compile("/Date\\(\\d+\\)/");
Matcher matcher = pattern.matcher(textToMatch);
Finally, the regex character class \d means "one single digit." If you are trying to match the literal text \d, you would have to write \\\\d in Java source, escaping the backslash once for Java and once for the regex engine. However, in that case your regex would be a constant, and you could use textToMatch.indexOf or textToMatch.contains more easily.
To escape regex in java, you can also use Pattern.quote()
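As a sketch of how Pattern.quote() could be applied to the /Date(...)/ case, the literal parts can be quoted and only the digits left as real regex syntax. The class name QuoteLiteral and the method extractDigits are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteLiteral {
    // The literal pieces go through Pattern.quote, so the parentheses in them
    // need no manual escaping; only (\d+) is interpreted as regex syntax.
    private static final Pattern DATE =
            Pattern.compile(Pattern.quote("/Date(") + "(\\d+)" + Pattern.quote(")/"));

    // Returns the digits inside the first /Date(...)/ occurrence, or null if none is found.
    static String extractDigits(String text) {
        Matcher matcher = DATE.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractDigits("/Date(234)/Date(6578)/")); // 234
    }
}
```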

Removing literal character in regex

I have the following string
\Qpipe,name=office1\E
And I am using a simplified regex library that doesn't support the \Q and \E.
I tried removing them
s.replaceAll("\\Q", "").replaceAll("\\E", "")
However, I get the error Caused by: java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 1
\E
^
Any ideas?
\ is the special escape character in both Java string and regex engine. To pass a literal \ to the regex engine you need to have \\\\ in the Java string. So try:
s.replaceAll("\\\\Q", "").replaceAll("\\\\E", "")
Alternatively, a simpler way is to use the replace method, which takes a literal string rather than a regex:
s.replace("\\Q", "").replace("\\E", "")
Use the Pattern.quote() function to escape special characters in a regex, for example:
s.replaceAll(Pattern.quote("\\Q"), "")
replaceAll takes a regular expression string. Instead, just use replace which takes a literal string. So myRegexString.replace("\\Q", "").replace("\\E", "").
But that still leaves you with the problem of quoting special regex characters for your simplified regex library.
String.replaceAll() takes a regular expression as parameter, so you need to escape your backslash twice:
s.replaceAll("\\\\Q", "").replaceAll("\\\\E", "");
You can also use the approach below. I used it because I was matching and replacing wrapped text, and the Q and E would otherwise stay in the pattern. With Pattern.LITERAL they don't.
final int flags = Pattern.LITERAL;
String regex = "My regex";
Pattern pattern = Pattern.compile(regex, flags);
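The approaches from these answers can be compared side by side. This is a sketch; the class StripQuoteMarkers and its method names are made up for illustration, and all three methods are expected to return the same result.

```java
import java.util.regex.Pattern;

public class StripQuoteMarkers {
    // Literal replacement: "\\Q" is the two characters \Q, no regex involved.
    static String viaReplace(String s) {
        return s.replace("\\Q", "").replace("\\E", "");
    }

    // Regex replacement: "\\\\Q" is the regex \\Q, i.e. a literal backslash followed by Q.
    static String viaReplaceAll(String s) {
        return s.replaceAll("\\\\Q", "").replaceAll("\\\\E", "");
    }

    // Pattern.LITERAL makes the engine treat the whole pattern string as plain text.
    static String viaLiteralFlag(String s) {
        Pattern startMarker = Pattern.compile("\\Q", Pattern.LITERAL);
        Pattern endMarker = Pattern.compile("\\E", Pattern.LITERAL);
        String withoutStart = startMarker.matcher(s).replaceAll("");
        return endMarker.matcher(withoutStart).replaceAll("");
    }

    public static void main(String[] args) {
        String quoted = "\\Qpipe,name=office1\\E";
        System.out.println(viaReplace(quoted));     // pipe,name=office1
        System.out.println(viaReplaceAll(quoted));  // pipe,name=office1
        System.out.println(viaLiteralFlag(quoted)); // pipe,name=office1
    }
}
```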
