java regex - not able to retrieve contents from first square brackets - java

I am pretty new to regular expressions stuff.. I have this requirement of picking up contents in the first square brackets. For e.g. if I have the string like "PORT-OTEF_RA2/6 [Eh0001/001-06] [ignore, test port]",
I need the result as "Eh0001/001-06".
I am using following regular expression.
Pattern pattern =
Pattern.compile("^PORT.+\\[(.*?)\\]");
Matcher matcher =
pattern.matcher("PORT-OTEF_RA2/6 [Eh0001/001-06] [ignore, test port]");
if(matcher.find()){
System.out.println(matcher.group(1));
}
but I always get the contents of second square brackets.
However, if I give the regular expression as
Pattern.compile("\\[(.*?)\\]");
I get the required answer. But I need to make sure the string starts with "PORT". Can someone light me on where I am going wrong.

Use non-greedy regex after PORT:
^PORT.+?\\[(.*?)\\]
Otherwise .+ will be greedy and match till last [...] is found.
RegEx Demo

Related

Regex extract string in java

I'm trying to extract a string from a String in Regex Java
Pattern pattern = Pattern.compile("((.|\\n)*).{4}InsurerId>\\S*.{5}InsurerId>((.|\\n)*)");
Matcher matcher = pattern.matcher(abc);
I'm trying to extract the value between
<_1:InsurerId>F2021633_V1</_1:InsurerId>
I'm not sure where am I going wrong but I don't get output for
if (matcher.find())
{
System.out.println(matcher.group(1));
}
You can use:
Pattern pattern = Pattern.compile("<([^:]+:InsurerId)>([^<]*)</\\1>");
Matcher matcher = pattern.matcher(abc);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
RegEx Demo
You may want to use the totally awesome page http://regex101.com/ to test your regular expressions. As you can see at https://regex101.com/r/rV8uM3/1, you only have empty capturing groups, but let me explain to you what you did. :D
((.|\n)*) This matches any character, or a new line, unimportant how often. It is capturing, so your first matching group will always be everything before <_1:InsurerId>, or an empty string. You can match any character instead, it will include new lines: .*. You can even leave it away as it isn't actually part of the String you want to match - using anything here will actually be a problem if you have multiple InsurerIds in your file and want to get them all.
.{4}InsurerId> This matches "InsurerId>" with any four characters in front of it and is exactly what you want. As the first character is probably always an opening angle bracket (and you don't want stuff like "<ExampleInsurerId>"), I'd suggest using <.{3}InsurerId> instead. This still could have some problems (<Test id="<" xInsurerId>), so if you know exactly that it's "_<a digit>:", why not use <_\d:InsurerId>?
\S* matches everything except for whitespaces - probably not the best idea as XML and similar files can be written to not contain any space at all. You want to have everything to the next tag, so use [^<]* - this matches everything except for an opening angle bracket. You also want to get this value later, so you have to use a capturing group: ([^<]*)
.{5}InsurerId> The same thing here: use <\/.{3}InsurerId> or <\/_\d:InsurerId> (forward slashes are actually characters interpreted by other RegEx implementations, so I suggest escaping them)
((.|\n)*) Again the same thing, just leave it away
The resulting Regular Expression would then be the following:
<_\d:InsurerId>([^<]*)<\/_\d:InsurerId>
And as you can see at https://regex101.com/r/mU6zZ3/1 - you have exactly one match, and it's even "F2021633_V1" :D
For Java, you have to escape the backslashes, so the resulting code would look like this:
Pattern pattern = Pattern.compile("<_\\d:InsurerId>([^<]*)<\\/_\\d:InsurerId>");
If you are using Java 7 and above, you can use naming groups to make the Regex a little bit more readable (also see the backreference group \k for close tag to match the openning tag):
Pattern pattern = Pattern.compile("(?:<(?<InsurancePrefix>.+)InsurerId>)(?<id>[A-Z0-9_]+)</\\k<InsurancePrefix>InsurerId>");
Matcher matcher = pattern.matcher("<_1:InsurerId>F2021633_V1</_1:InsurerId>");
if (matcher.matches()) {
System.out.println(matcher.group("id"));
}
Using back reference the matches() fails, for example, on this text
<_1:InsurerId>F2021633_V1</_2:InsurerId>
which is correct
Javadoc has a good explanation: https://docs.oracle.com/javase/8/docs/api/
Also you might consider using a different tool (XML parser) instead of Regex, as well, as other people have to support your code, and complex Regex is usually difficult to understand.

Java Regular expressions for filename

I want to check the filenames sent to me against two patterns.
The first regular expression is ~*~, which should match names like ~263~. I put this in online regular expression testers and it matches. The code doesnt work though. Says no match
List<FTPFile> ret = new ArrayList<FTPFile>();
Pattern pattern = Pattern.compile("~*~");
Matcher matcher;
for (FTPFile file : files)
{
matcher = pattern.matcher(file.getName());
if(matcher.matches())
{
ret.add(file);
}
}
return ret;
Also the second pattern I need is ##* which should match strings like abc#ere#sss
Please tell me the proper patterns in java for this.
You need to define your pattern like,
Pattern pattern = Pattern.compile("~.*~");
~* in your regex ~*~ will repeat the first ~ zero or more times. So it won't match the number following the first ~. Because matches method tries to match the whole input string, this regex causes the match to fail. So you need to add .* inbetween to match strings like ~66~ or ~kjk~ . To match the strings which has only numbers present inbetween ~, you need to use ~\d+~
Try Regex:
\~.*\~
Instead:
~*~
Example:
Pattern pattern = Pattern.compile("\\~.*\\~");

%etd(msg01) regular expression?

I am trying to write a regular expression for String like %etd(msg01).
String string = "My name is %etd(msg01) and %etd(msg02)";
Pattern pattern = Pattern.compile("%etd(.+)");
Matcher matcher = pattern.matcher(string);
while(matcher.find()) {
System.out.println(matcher.group());
}
It prints %etd(msg01) and %etd(msg02). However, I want it to print %etd(msg01) %etd(msg02) separately. I mean I am looking for non-greedy match.
How should the regular expression be changed to make it non greedy in this situation?
You should use this regex:
Pattern pattern = Pattern.compile("%etd\\([^)]+\\)");
Please place a question mark after .* or .+ to make it nongreedy. This should work for you...
Pattern pattern = Pattern.compile("%etd\\(.+?\\)");
Double slashes are also necessary in front of open and close parenthesis because they carry a special meaning in regular expression.
Another way of using is as below if you are sure that your names doesn't contain an open paranthesis after the first one.
Pattern pattern = Pattern.compile("%etd\\([^(]+\\)");

Java: regex - how do i get the first quote text

As a beginner with regex i believe im about to ask something too simple but ill ask anyway hope it won't bother you helping me..
Lets say i have a text like "hello 'cool1' word! 'cool2'"
and i want to get the first quote's text (which is 'cool1' without the ')
what should be my pattern? and when using matcher, how do i guarantee it will remain the first quote and not the second?
(please suggest a solution only with regex.. )
Use this regular expression:
'([^']*)'
Use as follows: (ideone)
Pattern pattern = Pattern.compile("'([^']*)'");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Or this if you know that there are no new-line characters in your quoted string:
'(.*?)'
when using matcher, how do i guarantee it will remain the first quote and not the second?
It will find the first quoted string first because it starts seaching from left to right. If you ask it for the next match it will give you the second quoted string.
If you want to find first quote's text without the ' you can/should use Lookahead and Lookbehind mechanism like
(?<=').*?(?=')
for example
System.out.println("hello 'cool1' word! 'cool2'".replaceFirst("(?<=').*?(?=')", "ABC"));
//out -> hello 'ABC' word! 'cool2'
more info
You could just split the string on quotes and get the second piece (which will be between the first and second quotes).
If you insist on regex, try this:
/^.*?'(.*?)'/
Make sure it's set to multiline, unless you know you'll never have newlines in your input. Then, get the subpattern from the result and that will be your string.
To support double quotes too:
/^.*?(['"])(.*?)\1/
Then get subpattern 2.

Whitespace in Java's regular expression

I'm trying to write a regular expression to mach an IRC PRIVMSG string. It is something like:
:nick!name#some.host.com PRIVMSG #channel :message body
So i wrote the following code:
Pattern pattern = Pattern.compile("^:.*\\sPRIVMSG\\s#.*\\s:");
Matcher matcher = pattern.matcher(msg);
if(matcher.matches()) {
System.out.println(msg);
}
It does not work. I got no matches. When I test the regular expression using online javascript testers, I got matches.
I tried to find the reason, why it doesn't work and I found that there's something wrong with the whitespace symbol. The following pattern will give me some matches:
Pattern.compile("^:.*");
But the pattern with \s will not:
Pattern.compile("^:.*\\s");
It's confusing.
The java matches method strikes again! That method only returns true if the entire string matches the input. You didn't include anything that captures the message body after the second colon, so the entire string is not a match. It works in testers because 'normal' regex is a 'match' if any part of the input matches.
Pattern pattern = Pattern.compile("^:.*?\\sPRIVMSG\\s#.*?\\s:.*$");
Should match
If you look at the documentation for matches(), uou will notice that it is trying to match the entire string. You need to fix your regexp or use find() to iterate through the substring matches.

Categories