Extract set of repeated pattern from String literal in Java - java

What would be a convenient and reliable way to extract all the "{...}" tags from a given string? (Using Java).
So, to give an example:
Say I have: http://www.something.com/{tag1}/path/{tag2}/else/{tag3}.html
I want to get all the "{}" tags. I was thinking about using Java's .split() function, but I'm not sure what the correct regex would be for this.
Note also: tags can be called anything, not just tagX!

I would use regular expressions to match this. Something like this could work for your expression:
String regex = "\\{.*?\\}";
This will "reluctantly" match any substring that has { and } surrounding it. The .*? makes it match any characters between the { and }, but reluctantly, so it doesn't match the bigger String:
{tag1}/path/{tag2}/else/{tag3}
which would be a "greedy" match. Note that the curly braces in the regex are escaped with double backslashes: curly braces have a special meaning inside a regular expression, so to match a literal curly brace you need to escape it.
e.g.,
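import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractTags {
    public static void main(String[] args) {
        String test = "http://www.something.com/{tag1}/path/{tag2}/else/{tag3}.html";
        // Reluctant match of everything between one { and the next }
        String regex = "\\{.*?\\}";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(test);
        // find() advances to each successive match in the input
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}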
With an output of:
{tag1}
{tag2}
{tag3}
You can read more about regular expressions here:
Oracle Regular Expressions Tutorial
and for greater detail, here:
www.regular-expressions.info/tutorial
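To make the greedy versus reluctant difference concrete, here is a minimal sketch that runs both quantifiers over the example URL and collects only the tag names via a capture group. The class name TagExtractor and the helper extract are made up for illustration; they are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagExtractor {
    // Hypothetical helper: returns capture group 1 of every match of regex in input.
    public static List<String> extract(String regex, String input) {
        List<String> names = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String url = "http://www.something.com/{tag1}/path/{tag2}/else/{tag3}.html";
        // Reluctant .*? stops at the first closing brace after each opening brace.
        System.out.println(extract("\\{(.*?)\\}", url)); // [tag1, tag2, tag3]
        // Greedy .* runs on to the last closing brace in the input.
        System.out.println(extract("\\{(.*)\\}", url));  // [tag1}/path/{tag2}/else/{tag3]
    }
}
```

Wrapping the inner part in parentheses is optional; it is only there so that group(1) yields the name without the braces.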

Related

Java Regular expressions for filename

I want to check the filenames sent to me against two patterns.
The first regular expression is ~*~, which should match names like ~263~. I put this in online regular expression testers and it matches. The code doesn't work, though; it reports no match.
List<FTPFile> ret = new ArrayList<FTPFile>();
Pattern pattern = Pattern.compile("~*~");
Matcher matcher;
for (FTPFile file : files)
{
    matcher = pattern.matcher(file.getName());
    if (matcher.matches())
    {
        ret.add(file);
    }
}
return ret;
Also the second pattern I need is ##* which should match strings like abc#ere#sss
Please tell me the proper patterns in java for this.
You need to define your pattern like,
Pattern pattern = Pattern.compile("~.*~");
In your regex ~*~, the ~* repeats the first ~ zero or more times, so nothing in the pattern can match the number that follows the first ~. Because the matches method tries to match the whole input string, the match fails. Add .* in between to match strings like ~66~ or ~kjk~. To match only strings that have digits between the ~ characters, use ~\d+~ (written "~\\d+~" as a Java string literal).
Try this regex:
\~.*\~
instead of:
~*~
Example:
Pattern pattern = Pattern.compile("\\~.*\\~");
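The difference between the online tester and the Java code most likely comes down to whole-string matching versus searching. As a rough sketch (the class and method names here are invented for the demonstration), the following compares Matcher.matches() and Matcher.find() for the patterns discussed above:

```java
import java.util.regex.Pattern;

public class TildeMatchDemo {
    // True only if the entire input matches the regex.
    public static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if any substring of the input matches the regex.
    public static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("~*~", "~263~"));     // false: the pattern only allows tildes
        System.out.println(partialMatch("~*~", "~263~"));   // true: a single ~ is enough for find()
        System.out.println(wholeMatch("~.*~", "~263~"));    // true
        System.out.println(wholeMatch("~\\d+~", "~263~"));  // true
        System.out.println(wholeMatch("~\\d+~", "~kjk~"));  // false: digits only
    }
}
```

Many online testers behave like find(), which would explain why ~*~ appears to match there.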

converting RegEx into my Java function [duplicate]

I'm having problems with Java RegEx. This is my regex statement "\"730\"\s+{([^}]+)}" and it works on a regex checking website, but I have trouble getting it to work in Java. This is my current code.
String patternString = '\"730\"\s+{([^}]+)}';
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(vdfContentsString);
boolean matches = matcher.matches();
Thanks for advice.
It says "Illegal escape character in character literal".
First, single quotes (') declare characters and double quotes (") declare strings; that's why you get the syntax error Illegal escape character in character literal. Second, regex syntax itself uses the backslash, as in \s for whitespace. What may be confusing is that Java also uses \ for character escaping in string literals. That's why you need two backslashes (\\s in Java source becomes \s in the resulting regular expression).
Then you need to take care of special characters in regular expressions: { and } delimit quantifiers ("repeat n times"), so if you want them literally, escape them (\\{ and \\}).
So if you want to match a string like "730" {whatever}, use this regular expression:
"730"\s+\{([^}]+)\}
or in Java:
String patternString = "\"730\"\\s+\\{([^}]+)\\}";
Example:
String str = "\"730\" { \"installdir\" \"C:\\Program Files (x86)\\Steam\\steamapps\\common\\Counter-Strike Global Offensive\" \"HasAllLocalContent\" \"1\" \"UpToDate\" \"1\" }";
String patternString = "\"730\"\\s+\\{([^}]+)\\}";
System.out.println(str.matches(patternString)); // true
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition
Escape { and } as well, because they have a special meaning in a Java regex Pattern.
String patternString = "\"730\"\\s+\\{([^\\}]+)\\}";
EDIT
String#matches() tries to match the whole string. If you are looking for a substring of a longer string, use Matcher#find() instead and get the result from the groups captured by enclosing the pattern in parentheses (...).
sample code:
String patternString = "(\"730\"\\s+\\{([^\\}]+)\\})";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(vdfContentsString);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
{ and } are metacharacters and need to be escaped with \\ in a Java string, hence \\{ .. \\}.
\ is the escape character in both Java string literals and regular expressions. Predefined character classes such as \s, \w and \d start with a backslash, so in a Java string literal that backslash has to be doubled as well, hence \\s+.
Instead of ([^\\}]+), I would suggest (.+?).
This is working:
String patternString = "\\\"730\\\"\\s+\\{(.+?)\\}";
The above is the Java string literal (in double quotes, since single quotes denote a char), which gets parsed into the following regular expression: \"730\"\s+\{(.+?)\}, and that can then be used to match the input string. Tadan!
two levels of parsing!
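The two levels of parsing can be seen by printing the string that the regex engine actually receives. This is a small sketch with invented names (EscapeLevelsDemo, firstBlock) and a shortened, made-up VDF-like input; it uses find() because the block of interest is only part of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeLevelsDemo {
    // Level 1: the Java compiler turns \" into " and \\ into a single backslash.
    public static final String PATTERN_STRING = "\"730\"\\s+\\{([^}]+)\\}";

    // Level 2: the regex engine parses the resulting text. Returns the trimmed
    // contents of the first "730" block, or null if there is none.
    public static String firstBlock(String input) {
        Matcher matcher = Pattern.compile(PATTERN_STRING).matcher(input);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // What the regex engine sees after the compiler has processed the literal:
        System.out.println(PATTERN_STRING); // "730"\s+\{([^}]+)\}

        // Illustrative input only, not real Steam data.
        String vdf = "\"440\" { \"a\" \"1\" } \"730\" { \"UpToDate\" \"1\" }";
        System.out.println(vdf.matches(PATTERN_STRING)); // false: matches() needs the whole string
        System.out.println(firstBlock(vdf));             // "UpToDate" "1"
    }
}
```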

detect $character java regular expression

I have to find a word like ${test} in a text file and replace it based on some criteria. In regular expressions, '$' means the end of the line.
What is the regular expression to detect something like ${\w+}?
You can try using this regex:
"\\$\\{\\w+\\}"
and the method String#replaceAll(String regex, String replacement):
String s = "abc ${test}def"; // for example
s = s.replaceAll("\\$\\{\\w+\\}", "STACKOVERFLOW");
[^}]* rather than \w+ ?
You might want to consider using [^}]* rather than \w+. The former matches any chars that are not a closing brace, so it would allow test-123, which the second would reject. Of course that may just be what you want.
Let's assume this is the raw regex:
\$\{[^}]*\}
In Java, we need to further escape the backslashes, yielding \\$\\{[^}]*\\}.
Likewise, \$\{\w+\} would have to be written as \\$\\{\\w+\\}.
Replacing the Matches in Java
String resultString = subjectString.replaceAll("\\$\\{[^}]*\\}", "Your Replacement");
Iterating through the matches in Java
Pattern regex = Pattern.compile("\\$\\{[^}]*\\}");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // the current match is regexMatcher.group()
}
Explanation
\$ matches the literal $
\{ matches an opening brace
[^}]* matches any chars that are not a closing brace
\} a closing brace
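Since the question mentions replacing each placeholder based on some criteria, one possible approach is to capture the name and look it up per match. The sketch below assumes the criteria can be expressed as a Map from name to value; PlaceholderReplacer and expand are hypothetical names, and placeholders without an entry are deliberately left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderReplacer {
    // Group 1 captures the name between ${ and }.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]*)\\}");

    // Replaces each ${name} with values.get(name); unknown names stay as they are.
    public static String expand(String text, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String replacement = values.containsKey(name) ? values.get(name) : matcher.group();
            // quoteReplacement keeps $ and \ in the replacement from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("test", "VALUE");
        System.out.println(expand("abc ${test}def ${other}", values)); // abc VALUEdef ${other}
    }
}
```

The same quoting concern applies to replaceAll: a replacement string containing $ needs Matcher.quoteReplacement as well.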

%etd(msg01) regular expression?

I am trying to write a regular expression for String like %etd(msg01).
String string = "My name is %etd(msg01) and %etd(msg02)";
Pattern pattern = Pattern.compile("%etd(.+)");
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group());
}
It prints %etd(msg01) and %etd(msg02). However, I want it to print %etd(msg01) %etd(msg02) separately. I mean I am looking for non-greedy match.
How should the regular expression be changed to make it non greedy in this situation?
You should use this regex:
Pattern pattern = Pattern.compile("%etd\\([^)]+\\)");
Please place a question mark after .* or .+ to make it nongreedy. This should work for you...
Pattern pattern = Pattern.compile("%etd\\(.+?\\)");
Double backslashes are also necessary in front of the opening and closing parentheses, because parentheses carry a special meaning in regular expressions.
Another option, if you are sure that your names don't contain an opening parenthesis after the first one, is:
Pattern pattern = Pattern.compile("%etd\\([^(]+\\)");
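For comparison, here is a short sketch that runs the greedy, reluctant, and negated-character-class variants against the string from the question. The class EtdMatchDemo and its findAll helper are names chosen for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EtdMatchDemo {
    // Collects every full match of regex in input.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "My name is %etd(msg01) and %etd(msg02)";
        // Greedy: .+ runs to the last closing parenthesis, giving one long match.
        System.out.println(findAll("%etd\\(.+\\)", s));    // [%etd(msg01) and %etd(msg02)]
        // Reluctant: .+? stops at the first closing parenthesis.
        System.out.println(findAll("%etd\\(.+?\\)", s));   // [%etd(msg01), %etd(msg02)]
        // Negated class: cannot cross a closing parenthesis at all.
        System.out.println(findAll("%etd\\([^)]+\\)", s)); // [%etd(msg01), %etd(msg02)]
    }
}
```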

Regular expression to replace content between parentheses ()

I tried this code:
string.replaceAll("\\(.*?)","");
But it returns null. What am I missing?
Try:
string.replaceAll("\\(.*?\\)","");
You didn't escape the closing parenthesis. In a Java string literal, each parenthesis needs a double backslash (\\( and \\)) to be matched literally.
First, do you wish to remove the parentheses along with their content? Although the title of the question indicates no, I am assuming that you do wish to remove the parentheses as well.
Secondly, can the content between the parentheses contain nested matching parentheses? This solution assumes yes. Since the Java regex flavor does not support recursive expressions, the solution is to first craft a regex which matches the "innermost" set of parentheses, and then apply this regex in an iterative manner replacing them from the inside-out. Here is a tested Java program which correctly removes (possibly nested) parentheses and their contents:
import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        String s = "stuff1 (foo1(bar1)foo2) stuff2 (bar2) stuff3";
        String re = "\\([^()]*\\)";
        Pattern p = Pattern.compile(re);
        Matcher m = p.matcher(s);
        while (m.find()) {
            s = m.replaceAll("");
            m = p.matcher(s);
        }
        System.out.println(s);
    }
}
Test Input:
"stuff1 (foo1(bar1)foo2) stuff2 (bar2) stuff3"
Test Output:
"stuff1  stuff2  stuff3" (the spaces on both sides of each removed group remain)
Note that the lazy-dot-star solution will not work for nested input, because it fails to match the innermost set of parentheses (i.e. it erroneously matches (foo1(bar1) in the example above). This is a very commonly made regex mistake: never use the dot when there is a more precise expression! In this case, the contents of an "innermost" set of matching parentheses consist of any characters that are not an opening or closing parenthesis (i.e. use [^()]* instead of .*?).
Try string.replaceAll("\\(.*?\\)","").
string.replaceAll("\\([^\\)]*\\)","");
This way you are saying match a bracket, then all non-closing bracket chars, and then a closing bracket. This is usually faster than reluctant or greedy .* matchers.
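To see how the answers differ on nested input, here is a brief sketch that puts the reluctant pattern next to the inside-out approach. ParenStripper and stripParens are illustrative names; the loop simply repeats the innermost replacement until the string stops changing.

```java
import java.util.regex.Pattern;

public class ParenStripper {
    // Matches a pair of parentheses that contains no other parentheses.
    private static final Pattern INNERMOST = Pattern.compile("\\([^()]*\\)");

    // Removes parenthesized groups from the inside out until nothing changes.
    public static String stripParens(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = INNERMOST.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        String s = "a (b(c)d) e";
        // Reluctant .*? stops at the first ')', leaving a stray "d)" behind.
        System.out.println("[" + s.replaceAll("\\(.*?\\)", "") + "]"); // [a d) e]
        // Inside-out removal handles the nesting; both surrounding spaces remain.
        System.out.println("[" + stripParens(s) + "]");                // [a  e]
    }
}
```

For input without nesting, both approaches give the same result.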
