detect $character java regular expression - java

I have to find words like ${test} in a text file and replace them based on some criteria. In a regular expression, '$' has a special meaning: it anchors the match at the end of the line.
What is the regular expression to detect something like ${\w+}?

You can try using this regex:
"\\$\\{\\w+\\}"
and the method String#replaceAll(String regex, String replacement):
String s = "abc ${test}def"; // for example
s = s.replaceAll("\\$\\{\\w+\\}", "STACKOVERFLOW");
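Since the question mentions replacing "based on some criteria", here is a minimal sketch of one way to do that with Matcher.appendReplacement. The class name, the substitute helper and the lookup map are assumptions made for illustration; they do not come from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    // Regex \$\{(\w+)\} with a capturing group around the placeholder name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Hypothetical helper: replaces each ${name} with values.get(name);
    // placeholders with no entry in the map are left unchanged.
    static String substitute(String text, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("test", "VALUE");
        System.out.println(substitute("abc ${test}def ${other}", values));
        // prints: abc VALUEdef ${other}
    }
}
```

Matcher.quoteReplacement matters here because '$' is also special in replacement strings, where it refers to a captured group.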

[^}]* rather than \w+ ?
You might want to consider using [^}]* rather than \w+. The former matches any chars that are not a closing brace, so it would allow test-123, which the second would reject. Of course that may just be what you want.
Let's assume this is the raw regex (see what matches in the demo):
\$\{[^}]*\}
In Java, we need to further escape the backslashes, yielding \\$\\{[^}]*\\}.
Likewise \$\{\w+\} would have to be used as \\$\\{\\w+\\}
Replacing the Matches in Java
String resultString = subjectString.replaceAll("\\$\\{[^}]*\\}", "Your Replacement");
Iterating through the matches in Java
Pattern regex = Pattern.compile("\\$\\{[^}]*\\}");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// the current match is regexMatcher.group()
}
Explanation
\$ matches the literal $
\{ matches an opening brace
[^}]* matches any chars that are not a closing brace
\} a closing brace
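To make the difference between \w+ and [^}]* concrete, the following sketch runs both patterns over the same input. The class name, the findAll helper and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BraceClassDemo {
    // Hypothetical helper: collects every match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "${test} ${test-123} ${}";
        // \w+ needs at least one word character and stops at '-'.
        System.out.println(findAll("\\$\\{\\w+\\}", text));   // [${test}]
        // [^}]* accepts anything up to the closing brace, even nothing.
        System.out.println(findAll("\\$\\{[^}]*\\}", text));  // [${test}, ${test-123}, ${}]
    }
}
```

Note that [^}]* also accepts the empty placeholder ${}; [^}]+ would reject it.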

Related

Regular expressions in multi-line text code in Java [duplicate]

I am trying to match a multi line text using java. When I use the Pattern class with the Pattern.MULTILINE modifier, I am able to match, but I am not able to do so with (?m).
The same pattern with (?m) and using String.matches does not seem to work.
I am sure I am missing something, but no idea what. Am not very good at regular expressions.
This is what I tried
String test = "User Comments: This is \t a\ta \n test \n\n message \n";
String pattern1 = "User Comments: (\\W)*(\\S)*";
Pattern p = Pattern.compile(pattern1, Pattern.MULTILINE);
System.out.println(p.matcher(test).find()); //true
String pattern2 = "(?m)User Comments: (\\W)*(\\S)*";
System.out.println(test.matches(pattern2)); //false - why?
First, you're using the modifiers under an incorrect assumption.
Pattern.MULTILINE or (?m) tells Java to accept the anchors ^ and $ to match at the start and end of each line (otherwise they only match at the start/end of the entire string).
Pattern.DOTALL or (?s) tells Java to allow the dot to match newline characters, too.
Second, in your case, the regex fails because you're using the matches() method which expects the regex to match the entire string - which of course doesn't work since there are some characters left after (\\W)*(\\S)* have matched.
So if you're simply looking for a string that starts with User Comments:, use the regex
^\s*User Comments:\s*(.*)
with the Pattern.DOTALL option:
Pattern regex = Pattern.compile("^\\s*User Comments:\\s*(.*)", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
ResultString = regexMatcher.group(1);
}
ResultString will then contain the text after User Comments:
This has nothing to do with the MULTILINE flag; what you're seeing is the difference between the find() and matches() methods. find() succeeds if a match can be found anywhere in the target string, while matches() expects the regex to match the entire string.
Pattern p = Pattern.compile("xyz");
Matcher m = p.matcher("123xyzabc");
System.out.println(m.find()); // true
System.out.println(m.matches()); // false
m = p.matcher("xyz");
System.out.println(m.matches()); // true
Furthermore, MULTILINE doesn't mean what you think it does. Many people seem to jump to the conclusion that you have to use that flag if your target string contains newlines--that is, if it contains multiple logical lines. I've seen several answers here on SO to that effect, but in fact, all that flag does is change the behavior of the anchors, ^ and $.
Normally ^ matches the very beginning of the target string, and $ matches the very end (or before a newline at the end, but we'll leave that aside for now). But if the string contains newlines, you can choose for ^ and $ to match at the start and end of any logical line, not just the start and end of the whole string, by setting the MULTILINE flag.
So forget about what MULTILINE means and just remember what it does: changes the behavior of the ^ and $ anchors. DOTALL mode was originally called "single-line" (and still is in some flavors, including Perl and .NET), and it has always caused similar confusion. We're fortunate that the Java devs went with the more descriptive name in that case, but there was no reasonable alternative for "multiline" mode.
In Perl, where all this madness started, they've admitted their mistake and gotten rid of both "multiline" and "single-line" modes in Perl 6 regexes. In another twenty years, maybe the rest of the world will have followed suit.
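What the two flags actually change can be checked with a small sketch like the one below. The class name, the found helper and the sample string are assumptions for illustration only.

```java
import java.util.regex.Pattern;

public class FlagDemo {
    // Hypothetical helper: true if regex, compiled with flags, is found in text.
    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "first\nsecond";
        // Without MULTILINE, ^ only matches at the start of the whole input.
        System.out.println(found("^second$", 0, text));                  // false
        // With MULTILINE, ^ and $ also match at line boundaries.
        System.out.println(found("^second$", Pattern.MULTILINE, text));  // true
        // Without DOTALL, the dot does not match the newline.
        System.out.println(found("first.second", 0, text));              // false
        // With DOTALL, the dot matches any character, including '\n'.
        System.out.println(found("first.second", Pattern.DOTALL, text)); // true
    }
}
```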
str.matches(regex) behaves like Pattern.matches(regex, str) which attempts to match the entire input sequence against the pattern and returns
true if, and only if, the entire input sequence matches this matcher's pattern
Whereas matcher.find() attempts to find the next subsequence of the input sequence that matches the pattern and returns
true if, and only if, a subsequence of the input sequence matches this matcher's pattern
Thus the problem is with the regex. Try the following.
String test = "User Comments: This is \t a\ta \ntest\n\n message \n";
String pattern1 = "User Comments: [\\s\\S]*^test$[\\s\\S]*";
Pattern p = Pattern.compile(pattern1, Pattern.MULTILINE);
System.out.println(p.matcher(test).find()); //true
String pattern2 = "(?m)User Comments: [\\s\\S]*^test$[\\s\\S]*";
System.out.println(test.matches(pattern2)); //true
Thus in short, the (\\W)*(\\S)* portion in your first regex matches an empty string as * means zero or more occurrences and the real matched string is User Comments: and not the whole string as you'd expect. The second one fails as it tries to match the whole string but it can't as \\W matches a non word character, ie [^a-zA-Z0-9_] and the first character is T, a word character.
The multiline flag only makes ^ and $ match at the start and end of each line instead of only at the ends of the entire string; for your purposes a wildcard will suffice.

converting RegEx into my Java function [duplicate]

This question already has answers here:
Why does this Java regex cause "illegal escape character" errors?
I'm having problems with Java regex. My regex is "\"730\"\s+{([^}]+)}"; it works on a regex-checking website, but I have trouble getting it to work in Java. This is my current code:
String patternString = '\"730\"\s+{([^}]+)}';
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(vdfContentsString);
boolean matches = matcher.matches();
Thanks for advice.
It says "Illegal escape character in character literal".
Single quotes (') delimit character literals and double quotes (") delimit strings; that is why you get the syntax error Illegal escape character in character literal. Second, regex syntax itself uses the backslash, as in \s for whitespace. What may be confusing is that Java also uses \ for escaping inside string literals. That's why you need two backslashes (\\s in Java source becomes \s in the resulting regular expression).
Then you need to take care of special characters in regular expressions: { and } delimit quantifiers ("repeat n times"); if you want them literally, escape them (\\{ and \\}).
So if you want to match a string like "730" {whatever}, use this regular expression:
"730"\s+\{([^}]+)\}
or in Java:
String patternString = "\"730\"\\s+\\{([^}]+)\\}";
Example:
String str = "\"730\" { \"installdir\" \"C:\\Program Files (x86)\\Steam\\steamapps\\common\\Counter-Strike Global Offensive\" \"HasAllLocalContent\" \"1\" \"UpToDate\" \"1\" }";
String patternString = "\"730\"\\s+\\{([^}]+)\\}";
System.out.println(str.matches(patternString)); // true
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition
Escape { and } as well, because they have a special meaning in a Java regex Pattern.
String patternString = "\"730\"\\s+\\{([^\\}]+)\\}";
EDIT
The String#matches() method tries to match the whole string. If you are looking for a substring of a longer string, use the Matcher#find() method instead and read the result from the groups, which are captured by enclosing part of the pattern in parentheses (...).
sample code:
String patternString = "(\"730\"\\s+\\{([^\\}]+)\\})";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(vdfContentsString);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
{ and } are metacharacters (see HERE for metacharacters) and need to be escaped with \\, hence \\{ .. \\}.
\ is the escape character in a Java string literal as well as in a regex, so the backslash in \s, \w, \d etc. (see HERE for a list) has to be doubled in Java source, hence \\s+.
Instead of [^\\}], I would suggest (.+?)}
This is working:
String patternString = "\\\"730\\\"\\s+\\{(.+?)\\}";
The above is the required Java string which gets parsed into the following regular expression: \"730\"\s+\{(.+?)\}, and then it can be used to match the input string. Tadan!
two levels of parsing!
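The two levels can be made visible by printing the string that the regex engine actually receives. This is only a sketch: the class name, the extract helper and the sample input are assumptions, and the pattern is the one suggested in the answers above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoLevelsDemo {
    // Level 1: the Java compiler turns \" into " and \\ into a single backslash.
    static final String PATTERN_STRING = "\"730\"\\s+\\{([^}]+)\\}";

    // Level 2: the regex engine interprets \s, \{ and \} in the resulting string.
    // Hypothetical helper: returns the text between the braces, or null.
    static String extract(String input) {
        Matcher m = Pattern.compile(PATTERN_STRING).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(PATTERN_STRING);
        // prints: "730"\s+\{([^}]+)\}
        System.out.println(extract("\"440\" {a} \"730\" {b}"));
        // prints: b
    }
}
```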

How to escape characters in a regular expression

When I use the following code I've got an error:
Matcher matcher = pattern.matcher("/Date\(\d+\)/");
The error is :
invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
I have also tried to change the value in the brackets to('/Date\(\d+\)/'); without any success.
How can i avoid this error?
You need to double-escape your \ character, like this: \\.
Otherwise your String is interpreted as if you were trying to escape (.
Same with the other round bracket and the d.
In fact it seems you are trying to initialize a Pattern here, while pattern.matcher references a text you want your Pattern to match.
Finally, note that in a Pattern, escaped characters require a double escape, as such:
\\(\\d+\\)
Also, as Rohit says, Patterns in Java do not need to be surrounded by forward slashes (/).
In fact if you initialize a Pattern like that, it will interpret your Pattern as starting and ending with literal forward slashes.
Here's a small example of what you probably want to do:
// your input text
String myText = "Date(123)";
// your Pattern initialization
Pattern p = Pattern.compile("Date\\(\\d+\\)");
// your matcher initialization
Matcher m = p.matcher(myText);
// printing the output of the match...
System.out.println(m.find());
Output:
true
Your regex is correct by itself, but in Java, the backslash character itself needs to be escaped.
Thus, this regex:
/Date\(\d+\)/
Must turn into this:
/Date\\(\\d+\\)/
One backslash is for escaping the parenthesis or d. The other one is for escaping the backslash itself.
The error message you are getting arises because Java thinks you're trying to use \( as a single escape character, like \n, or any of the other examples. However, \( is not a valid escape sequence, and so Java complains.
In addition, the logic of your code is probably incorrect. The argument to matcher should be the text to search (for example, "/Date(234)/Date(6578)/"), whereas the variable pattern should contain the pattern itself. Try this:
String textToMatch = "/Date(234)/Date(6578)/";
Pattern pattern = Pattern.compile("/Date\\(\\d+\\)/");
Matcher matcher = pattern.matcher(textToMatch);
Finally, the regex character class \d means "one single digit." If you are trying to match the literal text \d instead, the regex would be \\d, written in a Java string as "\\\\d". However, in that case your regex would be a constant, and you could use textToMatch.indexOf or textToMatch.contains more easily.
To escape regex in java, you can also use Pattern.quote()
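A short sketch of Pattern.quote() follows; the class name, the containsLiteral helper and the sample text are assumptions for illustration. Pattern.quote() treats the whole argument as literal text, so it fits when you want to search for Date(123) exactly, but not when part of the pattern (such as \d+) must stay a regex.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: searches for 'literal' as plain text, not as a regex.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "/Date(123)/";
        // Quoted: the parentheses are matched literally.
        System.out.println(containsLiteral(text, "Date(123)"));                  // true
        // Unquoted: ( ) form a group, so the regex looks for "Date123".
        System.out.println(Pattern.compile("Date(123)").matcher(text).find());   // false
    }
}
```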

How to match a string's end using a regex pattern in Java?

I want a regular expression pattern that will match with the end of a string.
I'm implementing a stemming algorithm that will remove suffixes of a word.
E.g. for a word 'Developers' it should match 's'.
I can do it using following code :
Pattern p = Pattern.compile("s");
Matcher m = p.matcher("Developers");
m.replaceAll(" "); // it will replace all 's' with ' '
I want a regular expression that will match only a string's end something like replaceLast().
You need to match "s", but only if it is the last character of the string. This is achieved with the end-of-input anchor $:
input.replaceAll("s$", " ");
If you enhance the regular expression, you can replace multiple suffixes with one call to replaceAll:
input.replaceAll("(ed|s)$", " ");
Use $:
Pattern p = Pattern.compile("s$");
public static void main(String[] args)
{
String message = "hi this message is a test message";
message = message.replaceAll("message$", "email");
System.out.println(message);
}
Check this,
http://docs.oracle.com/javase/tutorial/essential/regex/bounds.html
When matching a character at the end of string, mind that the $ anchor matches either the very end of string or the position before the final line break char if it is present even when the Pattern.MULTILINE option is not used.
That is why it is safer to use \z as the very end of string anchor in a Java regex.
For example:
Pattern p = Pattern.compile("s\\z");
will match s at the end of string.
See a related Whats the difference between \z and \Z in a regular expression and when and how do I use it? post.
NOTE: Do not use zero-length patterns with \z or $ after them because String.replaceAll(regex) makes the same replacement twice in that case. That is, do not use input.replaceAll("s*\\z", " ");, since you will get two spaces at the end, not one. Either use "s\\z" to replace one s, or use "s+\\z" to replace one or more.
If you still want to use replaceAll with a zero-length pattern anchored at the end of string to replace with a single occurrence of the replacement, you can use a workaround similar to the one in the How to make a regular expression for this seemingly simple case? post (writing "a regular expression that works with String replaceAll() to remove zero or more spaces from the end of a line and replace them with a single period (.)").
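The behaviours described in this answer can be reproduced with a sketch like the following. The class name and the replaceEnd helper are assumptions for illustration; the inputs reuse the word from the question.

```java
public class EndAnchorDemo {
    // Hypothetical helper: replaces every match of regex in input with one space.
    static String replaceEnd(String input, String regex) {
        return input.replaceAll(regex, " ");
    }

    public static void main(String[] args) {
        // s* can match the final "s" and then the empty string at the very end,
        // so the replacement is inserted twice.
        System.out.println("[" + replaceEnd("Developers", "s*\\z") + "]");   // [Developer  ]
        // s+ cannot match the empty string, so there is one replacement.
        System.out.println("[" + replaceEnd("Developers", "s+\\z") + "]");   // [Developer ]
        // $ also matches before a final line break; \z does not.
        System.out.println(replaceEnd("Developers\n", "s$").equals("Developer \n"));    // true
        System.out.println(replaceEnd("Developers\n", "s\\z").equals("Developers\n"));  // true
    }
}
```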

regular expressions using java.util.regex API- java

How can I create a regular expression to search strings with a given pattern? For example I want to search all strings that match pattern '*index.tx?'. Now this should find strings with values index.txt,mainindex.txt and somethingindex.txp.
Pattern pattern = Pattern.compile("*.html");
Matcher m = pattern.matcher("input.html");
This code is obviously not working.
You need to learn regular expression syntax. It is not the same as using wildcards. Try this:
Pattern pattern = Pattern.compile("^.*index\\.tx.$");
There is a lot of information about regular expressions here. You may find the program RegexBuddy useful while you are learning regular expressions.
The code you posted does not work because:
dot . is a special regex character. It means one instance of any character.
* means any number of occurrences of the preceding character.
therefore, .* means any number of occurrences of any character.
so you would need something like
Pattern pattern = Pattern.compile(".*\\.html.*");
The \\ is there because we want a literal dot, even though the dot is a special regex character.
This means: match a string that starts with any number of arbitrary characters, followed by a dot, followed by html, followed by anything.
* matches zero or more occurrences of the preceding token, so if you want to match zero or more of any character, use .* instead (. matches any char).
Modified regex should look something like this:
Pattern pattern = Pattern.compile("^.*\\.html$");
^ matches the start of the string
.* matches zero or more of any char
\\. matches the dot char (if not escaped it would match any char)
$ matches the end of the string
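If the goal is to keep writing wildcard patterns such as '*index.tx?', one option is to translate them into a regex. The sketch below assumes that * means "any number of characters" and ? means "exactly one character"; the class name and the fromWildcard helper are hypothetical, not part of java.util.regex.

```java
import java.util.regex.Pattern;

public class WildcardDemo {
    // Hypothetical helper: '*' becomes ".*", '?' becomes ".",
    // and every other run of characters is quoted so it matches literally.
    static Pattern fromWildcard(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = fromWildcard("*index.tx?");
        String[] names = {"index.txt", "mainindex.txt", "somethingindex.txp", "index.html"};
        for (String name : names) {
            // matches() requires the whole name to fit the pattern.
            System.out.println(name + " -> " + p.matcher(name).matches());
        }
    }
}
```

Because matches() already requires the whole input to match, the translated regex does not need ^ and $.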
