I'm using Java and would like to write code that outputs PRP I when the input is (NP (PRP I)).
My current implementation looks like this:
Pattern pattern = Pattern.compile("\\((.+?)\\)");
Matcher matcher = pattern.matcher(noun_phrase);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
and its output is NP (PRP I.
I know that one possibility would be to count the parentheses, but I'm wondering if there is any way to get just the string inside the nested parentheses using regex.
This should work:
Pattern pattern = Pattern.compile("\\(.*?\\((.*?)\\)\\)");
Matcher matcher = pattern.matcher("(NP (PRP I))");
while (matcher.find()) {
System.out.println(matcher.group(1));
}
You can use the following sites to experiment with regular expressions:
https://regex101.com/r/cE0dM7/1
http://leaverou.github.io/regexplained/
https://www.debuggex.com/r/gfVglXkY1Cw5D6Mb
You need to add another pair of parentheses around the group. You also need to make sure that you don't match parentheses between the fixed ones:
String noun_phrase = "(NP (PRP I))";
Pattern pattern = Pattern.compile("\\([^(]*\\(([^)]*)\\)[^)]*\\)");
Matcher matcher = pattern.matcher(noun_phrase);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
The negated character classes [^(] and [^)] make sure you don't match parentheses too eagerly.
Well, as I don't know how deeply your parentheses can be nested, I will suggest two possible solutions.
Solution 1: Assuming the depth is exactly as in your question.
This regex will work: Pattern pattern = Pattern.compile("\\(([^()]*)\\)").
Solution 2: Assuming the depth is arbitrary (but the innermost string is at least surrounded by parentheses).
In this case, you will have to make some more changes. First, your pattern will look like this: Pattern pattern = Pattern.compile("(\\(.*)*\\(([^)]*)\\)"). See the difference? You now have two groups: the first matches everything before the innermost parenthesized part, and the second is exactly the one you want. That means that in your loop you have to change matcher.group(1) to matcher.group(2). Furthermore, [^)] makes sure you don't have any closing parentheses in your group.
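As far as I can tell, the Solution 1 pattern also picks out the innermost groups at other nesting depths, because [^()]* cannot cross a parenthesis. Here is a minimal sketch to try that out; the class name InnermostGroups, the helper innermost and the second sample input are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answers.
class InnermostGroups {
    // A parenthesized group whose content holds no further parentheses,
    // i.e. an innermost group regardless of how deep it is nested.
    private static final Pattern INNERMOST = Pattern.compile("\\(([^()]*)\\)");

    // Collects the content of every innermost group, in order of appearance.
    static List<String> innermost(String tree) {
        List<String> result = new ArrayList<>();
        Matcher matcher = INNERMOST.matcher(tree);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(innermost("(NP (PRP I))"));                    // [PRP I]
        System.out.println(innermost("(S (NP (PRP I)) (VP (VBP run)))")); // [PRP I, VBP run]
    }
}
```

If you need whole subtrees rather than just the leaves, counting parentheses is probably still the way to go, since java.util.regex has no recursive patterns.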
Can someone tell me the easiest way to extract the number '20' from the following string?
Level I (10/20)
Note: The numbers in the brackets and the number after 'Level' change and can contain more characters than in this example.
It would be awesome if there were a method that uses a regex to extract a specific part of it.
I'm not the best with regex, but here's a working solution for your example:
String s = "Level I (10/20)";
Pattern pattern = Pattern.compile("\\(\\d+/(\\d+)\\)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
20
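How about this one? It also works for multi-line input, provided the multiline flag is set so that ^ matches at the start of each line. Note that [[:blank:]] is POSIX syntax; java.util.regex spells it \p{Blank}.
^Level[[:blank:]].+\([\d]*\/([\d]*)\)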
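A Java translation of the pattern above might look like the following sketch. It assumes Pattern.MULTILINE for the per-line ^ and \p{Blank} in place of the POSIX class; the class name LevelMax, the helper maxima and the second sample line are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answers.
class LevelMax {
    // MULTILINE lets ^ match at the start of every line;
    // \p{Blank} (a space or a tab) is used instead of [[:blank:]].
    private static final Pattern LEVEL =
            Pattern.compile("^Level\\p{Blank}.+\\(\\d*/(\\d*)\\)", Pattern.MULTILINE);

    // Returns the number after the slash for every "Level ..." line.
    static List<String> maxima(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = LEVEL.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(maxima("Level I (10/20)\nLevel II (3/45)")); // [20, 45]
    }
}
```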
I am trying to write a regular expression for String like %etd(msg01).
String string = "My name is %etd(msg01) and %etd(msg02)";
Pattern pattern = Pattern.compile("%etd(.+)");
Matcher matcher = pattern.matcher(string);
while(matcher.find()) {
System.out.println(matcher.group());
}
It prints %etd(msg01) and %etd(msg02) as a single match. However, I want it to print %etd(msg01) and %etd(msg02) separately. In other words, I am looking for a non-greedy match.
How should the regular expression be changed to make it non greedy in this situation?
You should use this regex:
Pattern pattern = Pattern.compile("%etd\\([^)]+\\)");
Please place a question mark after .* or .+ to make it nongreedy. This should work for you...
Pattern pattern = Pattern.compile("%etd\\(.+?\\)");
Double backslashes are also necessary in front of the opening and closing parentheses, because parentheses carry a special meaning in regular expressions.
Another option, if you are sure that your names don't contain an opening parenthesis after the first one, is shown below.
Pattern pattern = Pattern.compile("%etd\\([^(]+\\)");
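To compare the variants side by side, here is a small sketch that runs the greedy pattern from the question next to the reluctant and negated-class versions. The class name EtdTokens and the helper findAll are hypothetical names for this example only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answers.
class EtdTokens {
    // Returns every complete match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "My name is %etd(msg01) and %etd(msg02)";
        // Greedy, unescaped parentheses: one long match.
        System.out.println(findAll("%etd(.+)", s));
        // Reluctant quantifier: two separate matches.
        System.out.println(findAll("%etd\\(.+?\\)", s));
        // Negated character class: the same two matches.
        System.out.println(findAll("%etd\\([^)]+\\)", s));
    }
}
```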
I need to print #OPOK, but in the following code:
String s = "\"MSG1\":\"00\",\"MSG2\":\"#OPOK\",\"MSG3\":\"XXXXXX\"}";
Pattern pattern = Pattern.compile(".*\"MSG2\":\"(.+)\".*");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
} else {
System.out.println("Match not found");
}
I get #OPOK","MSG3":"XXXXXX instead. How do I fix my pattern?
You want to make your .+ part reluctant. By default it's greedy - it'll match as much as it can without preventing the pattern from matching. You want it to match as little as it can, like this:
Pattern pattern = Pattern.compile(".*\"MSG2\":\"(.+?)\".*");
The ? is what makes it reluctant. See the Pattern documentation for more details.
Or of course you could just match against "any character other than a double quote" which is what Brian's approach will do. Both will work equally well as far as I'm aware; there may well be performance differences between them (I'd expect Brian's to perform better to be honest) but if performance is important to you you should test both approaches.
You probably want the following:
Pattern pattern = Pattern.compile("\"MSG2\":\"([^\"]+)\"");
For the capture group you are interested in, this will match any character except a double quote. Since the group is surrounded by double quotes, this should prevent it from going "too far" in the match.
Edited to add: As @bmorris591 suggested in the comments, you can add an extra + (as shown below) to make the quantifier possessive. This may help improve performance in cases where the matcher fails to find a match.
Pattern pattern = Pattern.compile("\"MSG2\":\"([^\"]++)\"");
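If you need more than one field, the same idea can be wrapped in a small helper. This is only a sketch: the class name FieldExtractor and the method extract are made up, Pattern.quote is there so the key is treated literally, and it is not a JSON parser (escaped quotes inside values are not handled):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answers.
class FieldExtractor {
    // Returns the value of "key":"value" in text, or null if the key is absent.
    static String extract(String text, String key) {
        // [^\"]++ is possessive: it never gives back characters once matched.
        Pattern pattern = Pattern.compile("\"" + Pattern.quote(key) + "\":\"([^\"]++)\"");
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "\"MSG1\":\"00\",\"MSG2\":\"#OPOK\",\"MSG3\":\"XXXXXX\"}";
        System.out.println(extract(s, "MSG2")); // #OPOK
        System.out.println(extract(s, "MSG9")); // null
    }
}
```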
I have the following string:
http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true
How can I extract the part after 30/? In this case, it's 32531a5d-b0b1-4a8b-9029-b48f0eb40a34. I have other strings that share the same part up to 30/, and after that each string has a different id, up to the next /, which is what I want.
You can do it like this:
String s = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
int start = s.indexOf("30/") + 3; // first character after "30/"
System.out.println(s.substring(start, s.indexOf('/', start))); // stop at the next slash
The split function of the String class won't help you in this case, because it discards the delimiter, and that's not what we want here. You need a pattern that looks behind. The lookbehind syntax is:
(?<=X)Y
which matches any Y that is preceded by X.
So in your case you need this pattern:
(?<=30/)[^/]+
Here [^/]+ stops at the next slash, so only the id is matched. Compile the pattern, match it against your input, find the match, and print it:
String input = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
Matcher matcher = Pattern.compile("(?<=30/)[^/]+").matcher(input);
if (matcher.find()) {
    System.out.println(matcher.group());
}
Just for this one, or do you want a generic way to do it?
String[] out = mystring.split("/");
return out[out.length - 2];
I think the / is definitely the delimiter you are searching for.
I can't see the problem you are talking about, Alex.
EDIT : Ok, Python got me with indexes.
A regular expression is the answer, I think. However, how the expression is written depends on the format of the URLs you want to process. For example:
Pattern pat = Pattern.compile("/Content/SiteFiles/30/([a-z0-9\\-]+)/.*");
Matcher m = pat.matcher("http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true");
if (m.find()) {
System.out.println(m.group(1));
}
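If the directory number is not always 30, the pattern can be loosened a little. The sketch below assumes the number is any run of digits, which the question does not actually say; the class name SiteFileId and the method idOf are invented for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answers.
class SiteFileId {
    // Assumption: the segment after /Content/SiteFiles/ is numeric,
    // and the id consists of lowercase letters, digits and hyphens.
    private static final Pattern ID =
            Pattern.compile("/Content/SiteFiles/\\d+/([a-z0-9-]+)/");

    // Returns the id segment, or null if the URL has a different shape.
    static String idOf(String url) {
        Matcher matcher = ID.matcher(url);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
        System.out.println(idOf(url)); // 32531a5d-b0b1-4a8b-9029-b48f0eb40a34
    }
}
```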
Note: This is a Java-only question (i.e. no Javascript, sed, Perl, etc.)
I need to filter out all the "reluctant" curly braces ({}) in a long string of text.
(by "reluctant" I mean as in reluctant quantifier).
I have been able to come up with the following regex which correctly finds and lists all such occurrences:
Pattern pattern = Pattern.compile("(\\{)(.*?)(\\})", Pattern.DOTALL);
Matcher matcher = pattern.matcher(originalString);
while (matcher.find()) {
Log.d("WITHIN_BRACES", matcher.group(2));
}
My problem now is how to replace every found matcher.group(0) with the corresponding matcher.group(2).
Intuitively I tried:
while (matcher.find()) {
String noBraces = matcher.replaceAll(matcher.group(2));
}
But that replaced all found matcher.group(0) with only the first matcher.group(2), which is of course not what I want.
Is there an expression or a method in Java's regex to perform this "corresponding replaceAll" that I need?
ANSWER: Thanks to the tip below, I have been able to come up with 2 fixes that did the trick:
if (matcher.find()) {
String noBraces = matcher.replaceAll("$2");
}
Fix #1: Use "$2" instead of matcher.group(2)
Fix #2: Use if instead of while.
Works now like a charm.
You can use the special backreference syntax:
String noBraces = matcher.replaceAll("$2");
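For completeness, here is a self-contained sketch of the whole replacement. It uses a single capturing group, so the backreference becomes $1 instead of $2; no find() loop is needed, because replaceAll resets the matcher and walks the input itself. The class name BraceStripper and the method strip are made up for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answers.
class BraceStripper {
    // DOTALL lets . match line breaks, so a pair of braces may span lines.
    private static final Pattern BRACES = Pattern.compile("\\{(.*?)\\}", Pattern.DOTALL);

    // Replaces every {content} with just content.
    static String strip(String text) {
        return BRACES.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("a {b} c {d} e")); // a b c d e
    }
}
```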