Java fix regex in code

I need to print #OPOK, but in the following code:
String s = "\"MSG1\":\"00\",\"MSG2\":\"#OPOK\",\"MSG3\":\"XXXXXX\"}";
Pattern pattern = Pattern.compile(".*\"MSG2\":\"(.+)\".*");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
} else {
    System.out.println("Match not found");
}
Instead, I get #OPOK","MSG3":"XXXXXX. How do I fix my pattern?

You want to make your .+ part reluctant. By default it's greedy - it'll match as much as it can without preventing the pattern from matching. You want it to match as little as it can, like this:
Pattern pattern = Pattern.compile(".*\"MSG2\":\"(.+?)\".*");
The ? is what makes it reluctant. See the Pattern documentation for more details.
Or, of course, you could just match against "any character other than a double quote", which is what Brian's approach does. Both will work equally well as far as I'm aware; there may well be performance differences between them (to be honest, I'd expect Brian's to perform better), but if performance is important to you, you should test both approaches.
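A minimal, self-contained comparison of the greedy and reluctant versions of the pattern on the question's input. The class and method names are only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Returns group 1 of the first match, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "\"MSG1\":\"00\",\"MSG2\":\"#OPOK\",\"MSG3\":\"XXXXXX\"}";
        // Greedy: (.+) extends to the last quote that still lets the pattern match.
        System.out.println(firstGroup(".*\"MSG2\":\"(.+)\".*", s));
        // Reluctant: (.+?) stops at the first closing quote.
        System.out.println(firstGroup(".*\"MSG2\":\"(.+?)\".*", s));
    }
}
```

The first line printed is #OPOK","MSG3":"XXXXXX (the behavior in the question), the second is #OPOK.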

You probably want the following:
Pattern pattern = Pattern.compile("\"MSG2\":\"([^\"]+)\"");
For the capture group you are interested in, this will match any character except a double quote. Since the group is surrounded by double quotes, this should prevent it from going "too far" in the match.
Edited to add: As @bmorris591 suggested in the comments, you can add an extra + (as shown below) to make the quantifier possessive. This may improve performance in cases where the matcher fails to find a match.
Pattern pattern = Pattern.compile("\"MSG2\":\"([^\"]++)\"");
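A small sketch checking that the negated-class pattern and its possessive variant both extract the same value, and that both simply report no match when the key is missing. The class name and the second input string are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // Applies the regex with find() and returns group 1, or null if nothing matched.
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "\"MSG1\":\"00\",\"MSG2\":\"#OPOK\",\"MSG3\":\"XXXXXX\"}";
        // [^\"]+ cannot cross a double quote, so the group ends at the closing quote.
        System.out.println(extract("\"MSG2\":\"([^\"]+)\"", s));
        // Possessive variant: [^\"]++ never gives back characters it has consumed.
        System.out.println(extract("\"MSG2\":\"([^\"]++)\"", s));
        // Input without an MSG2 key: no match, so null is printed.
        System.out.println(extract("\"MSG2\":\"([^\"]++)\"", "\"MSG1\":\"00\""));
    }
}
```

Note that neither pattern needs the leading and trailing .* because find() searches anywhere in the input.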

Related

How to get just nested bracket in regex

I'm using Java, and I would like to write code whose output is PRP I when the input is (NP (PRP I)).
My current implementation is like the following:
Pattern pattern = Pattern.compile("\\((.*?)\\)");
Matcher matcher = pattern.matcher(noun_phrase);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
and its output is NP (PRP I.
I know that one possibility would be to count the parentheses, but I'm wondering if there is any way to get just the string inside the nested parentheses using regex.
This should work:
Pattern pattern = Pattern.compile("\\(.*?\\((.*?)\\)\\)");
Matcher matcher = pattern.matcher("(NP (PRP I))");
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
You can use the following sites to experiment with regular expressions:
https://regex101.com/r/cE0dM7/1
http://leaverou.github.io/regexplained/
https://www.debuggex.com/r/gfVglXkY1Cw5D6Mb
You need to match another pair of parentheses around the group. Also, you need to make sure that between the fixed parentheses you don't match any parentheses:
String noun_phrase = "(NP (PRP I))";
Pattern pattern = Pattern.compile("\\([^(]*\\(([^)]*)\\)[^)]*\\)");
Matcher matcher = pattern.matcher(noun_phrase);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
The negated character classes [^(] and [^)] make sure you don't match parentheses too eagerly.
Well, as I don't know how deep your parentheses can go, I will suggest two possible solutions.
Solution 1: Assuming the depth is exactly as in your question.
This regex will work: Pattern pattern = Pattern.compile("\\(([^()]*)\\)").
Solution 2: Assuming the depth is arbitrary (but at least the innermost string is surrounded by parentheses).
In this case, you will have to make some more changes. First, your pattern will look like this: Pattern pattern = Pattern.compile("(\\(.*)*\\(([^)]*)\\)"). See the difference? You now have two groups: the first matches everything before the innermost part surrounded by parentheses, and the second group is exactly the one you want. That means that in your loop, you have to change matcher.group(1) to matcher.group(2). Furthermore, [^)] makes sure you don't have any closing parentheses in your group.
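A sketch that runs the patterns from this thread on a two-level and a three-level input, to show which ones depend on the nesting depth. The deeper sample sentence and the helper name are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InnermostParens {
    // Collects the given group from every match found in the input.
    static List<String> all(String regex, int group, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        // Solution 1: a pair of parentheses with no parentheses inside it.
        String innermost = "\\(([^()]*)\\)";
        System.out.println(all(innermost, 1, "(NP (PRP I))"));
        System.out.println(all(innermost, 1, "(S (NP (PRP I)) (VP (VBP run)))"));

        // Solution 2: group 2 holds the innermost content.
        String solution2 = "(\\(.*)*\\(([^)]*)\\)";
        System.out.println(all(solution2, 2, "(S (NP (PRP I)))"));

        // The negated-class answer above assumes exactly two levels of nesting.
        String twoLevels = "\\([^(]*\\(([^)]*)\\)[^)]*\\)";
        System.out.println(all(twoLevels, 1, "(S (NP (PRP I)))"));
    }
}
```

Because [^()]* cannot contain any parenthesis, the Solution 1 pattern only ever matches innermost pairs, so it also returns every innermost phrase ([PRP I, VBP run]) when the input is nested more deeply. The fixed two-level pattern returns NP (PRP I on the three-level input.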

Java regex to parse a particular semicolon delimited param from a URL?

I have a URL I'm expecting like:
www.somewebsite.com/misc-session/;session-id=1FSDSF2132FSADASD13213
I want to parse out
session-id=1FSDSF2132FSADASD13213
Using a regular expression in Java, what would be the best approach for this?
Using a regex testing website, I've experimented with a few different ways, but I'm wondering which approach is the most fail-safe and still works in case the URL is actually formed like:
www.somewebsite.com/misc-session/;session-id=1FSDSF2132FSADASD13213?someExtraParam=false
or
www.somewebsite.com/misc-session/extra-path/;session-id=1FSDSF2132FSADASD13213?someExtraParam=false
I am always just looking for the value of "session-id".
EDIT:
The value of session-id is NOT limited to digits; it is guaranteed to contain a combination of both letters and digits.
What is the best, most fail-safe approach?
Well, I think matching a word boundary on both sides will be enough.
Regex: \bsession-id=\d+\b
Note: use \\d and \\b if the regex flavor you are using needs double escaping (as Java string literals do).
Regex101 Demo
In case session-id contains characters in the range [A-Za-z0-9], use this regex:
Regex: \bsession-id=[A-Za-z0-9]+\b
Regex101 Demo
Ideone Demo
Remember to include
import java.util.regex.Matcher;
import java.util.regex.Pattern;
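A minimal sketch applying the [A-Za-z0-9] variant to the three URL shapes from the question. The capturing group around the value is an addition, so that only the session-id value is returned; the class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SessionIdDemo {
    // Word boundary on both sides; the value itself is captured in group 1.
    private static final Pattern SESSION_ID =
            Pattern.compile("\\bsession-id=([A-Za-z0-9]+)\\b");

    // Returns the session-id value, or null if the URL does not contain one.
    static String sessionId(String url) {
        Matcher m = SESSION_ID.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "www.somewebsite.com/misc-session/;session-id=1FSDSF2132FSADASD13213",
            "www.somewebsite.com/misc-session/;session-id=1FSDSF2132FSADASD13213?someExtraParam=false",
            "www.somewebsite.com/misc-session/extra-path/;session-id=1FSDSF2132FSADASD13213?someExtraParam=false"
        };
        for (String url : urls) {
            System.out.println(sessionId(url));
        }
    }
}
```

The ;, ? and end of input all count as word boundaries here, so the extra query parameter and path segment do not affect the result.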
Try this one:
String str = "www.somewebsite.com/misc-session/;session-id=213213213";
Pattern p = Pattern.compile("(session-id=\\d+)");
Matcher m = p.matcher(str);
if (m.find()) {
    System.out.println(m.group(0));
}
Note that session-id= is always present and you are interested in the number that follows it, which is represented by \d (written \\d in a Java string literal). The + means one or more digits.
For a more detailed breakdown, see the description at Regex101.
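Since the question's EDIT says the value mixes letters and digits, it is worth checking what the digits-only patterns do with such a value. A quick sketch (helper name is illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitsOnlyCheck {
    // Returns the whole match (group 0), or null if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        String url = "www.somewebsite.com/misc-session/;session-id=1FSDSF2132FSADASD13213";
        // \d+ stops at the first letter, so only "session-id=1" is matched.
        System.out.println(firstMatch("(session-id=\\d+)", url));
        // With a trailing \b the digits-only version cannot match at all ("1F" has no boundary).
        System.out.println(firstMatch("\\bsession-id=\\d+\\b", url));
        // Allowing letters as well returns the full value.
        System.out.println(firstMatch("session-id=[A-Za-z0-9]+", url));
    }
}
```

So \d+ is only safe if the value really is purely numeric, as in the 213213213 example above.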

Regex extract string in java

I'm trying to extract a substring from a String using a regex in Java:
Pattern pattern = Pattern.compile("((.|\\n)*).{4}InsurerId>\\S*.{5}InsurerId>((.|\\n)*)");
Matcher matcher = pattern.matcher(abc);
I'm trying to extract the value between
<_1:InsurerId>F2021633_V1</_1:InsurerId>
I'm not sure where I'm going wrong, but I don't get output for
if (matcher.find())
{
    System.out.println(matcher.group(1));
}
You can use:
Pattern pattern = Pattern.compile("<([^:]+:InsurerId)>([^<]*)</\\1>");
Matcher matcher = pattern.matcher(abc);
if (matcher.find()) {
    System.out.println(matcher.group(2));
}
RegEx Demo
You may want to use the totally awesome page http://regex101.com/ to test your regular expressions. As you can see at https://regex101.com/r/rV8uM3/1, you only have empty capturing groups, but let me explain to you what you did. :D
((.|\n)*) This matches any character or a newline, any number of times. It is capturing, so your first group will always be everything before <_1:InsurerId>, or an empty string. Note that in Java, . does not match line terminators by default; if you need it to, compile with Pattern.DOTALL (or start the regex with (?s)) and use .* instead. You can even leave it out, as it isn't actually part of the String you want to match; using anything here will actually be a problem if you have multiple InsurerIds in your file and want to get them all.
.{4}InsurerId> This matches "InsurerId>" with any four characters in front of it and is exactly what you want. As the first character is probably always an opening angle bracket (and you don't want stuff like "<ExampleInsurerId>"), I'd suggest using <.{3}InsurerId> instead. This still could have some problems (<Test id="<" xInsurerId>), so if you know exactly that it's "_<a digit>:", why not use <_\d:InsurerId>?
\S* matches everything except whitespace, which is probably not the best idea, as XML and similar files can be written without any spaces at all. You want everything up to the next tag, so use [^<]*, which matches everything except an opening angle bracket. You also want to get this value later, so you have to use a capturing group: ([^<]*)
.{5}InsurerId> The same thing here: use <\/.{3}InsurerId> or <\/_\d:InsurerId> (forward slashes act as delimiters in some other regex syntaxes, such as JavaScript regex literals, so I suggest escaping them).
((.|\n)*) Again the same thing: just leave it out.
The resulting Regular Expression would then be the following:
<_\d:InsurerId>([^<]*)<\/_\d:InsurerId>
And as you can see at https://regex101.com/r/mU6zZ3/1 - you have exactly one match, and it's even "F2021633_V1" :D
For Java, you have to escape the backslashes, so the resulting code would look like this:
Pattern pattern = Pattern.compile("<_\\d:InsurerId>([^<]*)<\\/_\\d:InsurerId>");
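Dropping the leading and trailing ((.|\n)*) also makes it possible to pull out every InsurerId in a document by looping over find(). A minimal sketch using the pattern above; the multi-line input and the second ID (G7_V2) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InsurerIds {
    private static final Pattern INSURER_ID =
            Pattern.compile("<_\\d:InsurerId>([^<]*)<\\/_\\d:InsurerId>");

    // Collects group 1 of every <_N:InsurerId> element, in document order.
    static List<String> insurerIds(String xml) {
        List<String> ids = new ArrayList<>();
        Matcher m = INSURER_ID.matcher(xml);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical input: two elements on separate lines.
        String xml = "<root>\n"
                + "  <_1:InsurerId>F2021633_V1</_1:InsurerId>\n"
                + "  <_2:InsurerId>G7_V2</_2:InsurerId>\n"
                + "</root>";
        System.out.println(insurerIds(xml));
    }
}
```

Newlines between the elements are not a problem, because nothing in the pattern has to span them.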
If you are using Java 7 or later, you can use named groups to make the regex a little more readable (also note the backreference \k, which forces the closing tag to match the opening tag):
Pattern pattern = Pattern.compile("(?:<(?<InsurancePrefix>.+)InsurerId>)(?<id>[A-Z0-9_]+)</\\k<InsurancePrefix>InsurerId>");
Matcher matcher = pattern.matcher("<_1:InsurerId>F2021633_V1</_1:InsurerId>");
if (matcher.matches()) {
    System.out.println(matcher.group("id"));
}
Because of the backreference, matches() fails on text such as
<_1:InsurerId>F2021633_V1</_2:InsurerId>
which is the correct behavior.
Javadoc has a good explanation: https://docs.oracle.com/javase/8/docs/api/
You might also consider using a different tool (an XML parser) instead of regex, since other people have to maintain your code and complex regexes are usually hard to understand.
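A minimal sketch of the XML-parser route using the JDK's built-in DOM API. It assumes the input is well-formed XML; the <root> wrapper and the fixed tag name _1:InsurerId are taken from the example and are assumptions about the real file. The default DocumentBuilderFactory is not namespace-aware, which is why the prefixed tag name can be looked up as a plain string:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class InsurerIdDom {
    // Parses the XML with a default (namespace-unaware) DocumentBuilder and returns
    // the text content of every element whose tag name is exactly "_1:InsurerId".
    static List<String> insurerIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("_1:InsurerId");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent());
            }
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><_1:InsurerId>F2021633_V1</_1:InsurerId></root>";
        System.out.println(insurerIds(xml));
    }
}
```

If the real document declares the _1 prefix as a namespace, a namespace-aware factory with getElementsByTagNameNS would be the more robust choice.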

Java Regex to check "=number", ex "=5455"?

I want to check a string that matches the format "=number", ex "=5455".
As long as the first char is "=" and the rest consists only of digits in [0-9] (a dot is not allowed), it should pop up a "correct" message.
if (str.matches("^[=][0-9]+")) {
    Window.alert("correct");
}
So, is ^[=][0-9]+ the correct regex?
If it is not correct, can you provide a correct solution?
If it is correct, can you find a better solution?
I'm no big regex expert and more knowledgeable people than me might correct this answer, but:
I don't think there's a point in using [=] rather than simply = - the [...] block is used to declare multiple choices, why declare a multiple choice of one character?
I don't think you need to use ^ (if your input string contains any character before =, it won't match anyway). I'm unsure as to whether its presence makes your regex faster, slower or has no effect.
In conclusion, I'd use =[0-9]+
That should be correct: it looks for an = sign anchored at the beginning, followed by one or more digits 0-9.
Your regex will work, even though it can be simplified:
.matches() is not a search: it tries to match the entire input against the regex, so the beginning-of-input anchor is not needed;
you don't need the character class around the =.
Therefore:
if (str.matches("=[0-9]+")) { ... }
If you only want to check that a string begins with that pattern, you have to use a Pattern, a Matcher and .find():
final Pattern p = Pattern.compile("^=[0-9]+");
final Matcher m = p.matcher(str);
if (m.find()) { ... }
And finally, Matcher also has .lookingAt() which anchors the regex only at the beginning of the input.
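A short sketch contrasting matches() (whole input) with lookingAt() (prefix only) for the simplified pattern =[0-9]+; the sample inputs are made up to cover the interesting cases:

```java
import java.util.regex.Pattern;

public class EqualsNumberCheck {
    private static final Pattern EQ_NUMBER = Pattern.compile("=[0-9]+");

    // Whole input must be "=" followed by digits (same result as s.matches("=[0-9]+")).
    static boolean isExactly(String s) {
        return EQ_NUMBER.matcher(s).matches();
    }

    // Input only has to start with "=" followed by at least one digit.
    static boolean startsWith(String s) {
        return EQ_NUMBER.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"=5455", "=54.55", "=5455abc", "x=5455"}) {
            System.out.println(s + " matches=" + isExactly(s) + " lookingAt=" + startsWith(s));
        }
    }
}
```

Only "=5455" passes matches(); "=54.55" and "=5455abc" pass lookingAt() because their prefixes "=54" and "=5455" fit the pattern, and "x=5455" fails both.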

Java Regex Escape

I've got this bit of code to grab a URL within a textarea. It had been working great until I tried a URL with a '+' in it.
Pattern pattern = Pattern.compile("(.*)(https?[://.0-9-?a-z=_#!A-Z]*)(.*)");
Matcher matcher = pattern.matcher(text);
So I tried putting \\+ and \\\\+ in my code, but it did not work. So I did some googling, and Stack Overflow questions kept mentioning this guy:
Pattern.quote("+");
However, I am not sure how to work that statement into what I currently have, or whether that is even the way to go. I'm assuming I need to do something like this:
String quote = Pattern.quote("+");
Pattern pattern = Pattern.compile("(.*)(https?[://.0-9-?a-z=_#!A-Z]*)(.*)");
Matcher matcher = pattern.matcher(text);
And then add the variable quote somewhere in the pattern? Please help! I just learned this stuff today and I'm brand new to it. Thank you!
Just escape the quote with \, for example:
Pattern pattern = Pattern.compile("(.*)(https?[://.0-9-?a-z=_#!A-Z\"]*)(.*)");
(https?[://.0-9-?a-z=_#!A-Z]*)
Bear in mind that [ and ] denote a character class, so any character listed inside them is accepted. [aegl]+ will match "age", "a", "e", "g", "eagle", and "gaggle". It also means that a character listed twice (like /) is completely redundant.
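Pattern.quote is useful, but it does not escape characters one by one: it wraps the whole string in \Q...\E so that it is matched literally. Pattern.quote("+") returns \Q+\E, and it does not change the fact that your character class simply lacks the + character.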
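A quick way to see what Pattern.quote actually returns and how the quoted string behaves; the helper name is illustrative:

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // True if input is exactly the literal text, with regex metacharacters neutralized.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Pattern.quote wraps the whole string in \Q...\E instead of escaping each character.
        System.out.println(Pattern.quote("+"));
        // The quoted pattern matches the literal "+" and nothing else.
        System.out.println(matchesLiterally("+", "+"));
        System.out.println(matchesLiterally("+", "++"));
    }
}
```

This is handy when a whole user-supplied string must be matched literally, but for allowing + inside a URL the simpler fix is adding it to the character class.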
Because + has no special meaning inside square brackets, you can put it unescaped within the square brackets. You can also write it escaped as \\+ if that makes you feel better.
Pattern pattern = Pattern.compile("(.*)(https?[:/.0-9-?a-z=_#!A-Z+]*)(.*)");
Pattern pattern = Pattern.compile("(.*)(https?[:/.0-9-?a-z=_#!A-Z\\+]*)(.*)");
See it here: http://fiddle.re/0780
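A minimal sketch of the fixed pattern with matches(), using a made-up sample text; for comparison, the asker's original pattern (without + in the class) would stop the URL at http://example.com/a:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlWithPlus {
    // The asker's pattern with + added inside the character class.
    private static final Pattern URL =
            Pattern.compile("(.*)(https?[:/.0-9-?a-z=_#!A-Z+]*)(.*)");

    // Returns the URL part (group 2), or null if the text does not match.
    static String extractUrl(String text) {
        Matcher m = URL.matcher(text);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractUrl("see http://example.com/a+b?x=1 now"));
    }
}
```

Keep in mind that . does not match line terminators by default, so multi-line textarea content would need Pattern.DOTALL or a find()-based pattern without the surrounding (.*) groups.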
