Correct Regular Expression - java

I'm using Java regex to read a String of this form:
"{\n 'Step 1.supply.vendor1.quantity':\"80"\,\n
'Step 2.supply.vendor2.quantity':\"120"\,\n
'Step 3.supply.vendor3.quantity':\"480"\,\n
'Step 4.supply.vendor4.quantity':\"60"\,\n}"
I have to detect strings of the type
'Step 2.supply.vendor2.quantity':\"120"\,\n.
I'm trying to use Pattern and Matcher, but I can't figure out the correct regular expression for lines like
<Beginning of Line><whitespace><whitespace><'Step><whitespace><Number><.><Any number & any type of characters><,\n><EOL>.
I included <Beginning of Line> and <EOL> only for clarity.
I have tried several patterns:
String regex = "(\\n\\s{2})'Step\\s\\d.*,\n";
String regex = "\\s\\s'Step\\s\\d.*,\n";
I always get IllegalStateException: No match found.
I'm not able to find proper material to read on Java Regex with good examples. Any help would be really great. Thanks.

As the others said in the comments, you should really use a JSON parser.
But if you want to see how it could work with a regex, here is how you can do it:
Take an example of a line you want to capture: Step 1.supply.vendor1.quantity':"80"
Replace digits with \\d* (\\d matches any digit; * means zero or more of them)
Replace dots with \\. (dots need to be escaped)
Add parentheses around the parts that you want to capture
Here is the resulting regex: "Step (\\d*)\\.supply\\.vendor(\\d*)\\.quantity':\"(\\d*)\""
Now, use a Pattern and a Matcher:
String input = "{\n 'Step 1.supply.vendor1.quantity':\"80\"\\,\n";
Pattern pattern = Pattern.compile("Step (\\d*)\\.supply\\.vendor(\\d*)\\.quantity':\"(\\d*)\"");
Matcher matcher = pattern.matcher(input);
while(matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
Output :
1 //(corresponds to "Step (\\d*)")
1 //(corresponds to "vendor(\\d*)")
80 //(corresponds to "quantity':\"(\\d*)")
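The steps above can be sketched over the whole four-line input. This is only an illustration: the class name StepQuantities and the parse helper are hypothetical, the input literal assumes each quantity is written as "80" followed by a comma and a newline (the escaping in the question is ambiguous), and \\d+ is swapped in for \\d* so that an empty number is rejected. Note that group() is only called after find() has returned true; calling it without a successful match is one common cause of IllegalStateException: No match found.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StepQuantities {
    // Same shape as the regex in the answer, but with \\d+ (one or more digits).
    private static final Pattern STEP = Pattern.compile(
            "Step (\\d+)\\.supply\\.vendor(\\d+)\\.quantity':\"(\\d+)\"");

    // Hypothetical helper: maps step number -> quantity, in order of appearance.
    static Map<Integer, Integer> parse(String input) {
        Map<Integer, Integer> quantities = new LinkedHashMap<>();
        Matcher matcher = STEP.matcher(input);
        // group(n) is only valid after find() has returned true.
        while (matcher.find()) {
            quantities.put(Integer.parseInt(matcher.group(1)),
                           Integer.parseInt(matcher.group(3)));
        }
        return quantities;
    }

    public static void main(String[] args) {
        // Assumed input format; adjust the literal to the real data.
        String input = "{\n 'Step 1.supply.vendor1.quantity':\"80\",\n"
                + " 'Step 2.supply.vendor2.quantity':\"120\",\n"
                + " 'Step 3.supply.vendor3.quantity':\"480\",\n"
                + " 'Step 4.supply.vendor4.quantity':\"60\",\n}";
        System.out.println(parse(input)); // {1=80, 2=120, 3=480, 4=60}
    }
}
```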

Related

Regex to extract part of string before regex match

Say I have the following line:
My '123? Text – 56x73: Hello World blablabla
I want to extract everything before " – 56x73 ..."
I already found a regex to match the part which I don't want to extract:
\s–\s\d{1,2}x\d{1,2}:\s.+
How can I get only the other part using Java and Regex?
Use:
String str = ...;
String regex = ...; // your regex
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (matcher.find())
{
matcher.group(n); // n = 0 is the whole match; 1, 2, ... are the capture groups
}
Use () in your regex to delimit groups.
You already have a workaround, but this may be helpful,
assuming 56 and 73 will NOT be constant.
Use the regex: "(.*)(\\s)(.*)(\\s)([\\d]+[x][\\d]+)"
then call group(int number), where the number is 1 in this case.
I intentionally used .* between the two \s to get around the dash character, which I don't know much about; this was also pointed out in one of the comments.
If anybody wants to edit and improve my answer, you are more than welcome.
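Another way to get "everything before the match" is to reuse the asker's own pattern and cut the input at matcher.start(). A minimal sketch, assuming the dash in the input is an en dash (U+2013); the class name PrefixBeforeMatch and the prefix helper are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixBeforeMatch {
    // The asker's pattern for the unwanted tail; the en dash is written as a Unicode escape.
    private static final Pattern TAIL =
            Pattern.compile("\\s\u2013\\s\\d{1,2}x\\d{1,2}:\\s.+");

    // Returns the text before the first match, or the whole line if nothing matches.
    static String prefix(String line) {
        Matcher m = TAIL.matcher(line);
        return m.find() ? line.substring(0, m.start()) : line;
    }

    public static void main(String[] args) {
        String line = "My '123? Text \u2013 56x73: Hello World blablabla";
        System.out.println(prefix(line)); // My '123? Text
    }
}
```

This avoids adding capture groups to the pattern; start() gives the index where the unwanted part begins.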

Regular expression to find substring in text

I have a text file that contains some strings I want to extract with Java regex.
Those strings are in format of:
$numbers,numbers,numbers....,numbers##
(start with $, followed by groups of numbers plus ,, and end with ##)
Here is my pattern.
Pattern pattern = Pattern.compile("$*##");
Matcher matcher = pattern.matcher(text);
if (matcher.find())
{
}
It turns out that nothing matches my pattern.
Can anyone tell me what's wrong with it?
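You need to escape the $ (unescaped, it is the end-of-line anchor) and spell out the digits and commas, because * is a quantifier rather than a wildcard:
Pattern pattern = Pattern.compile("\\$\\d+(,\\d+)*##");
Thanks to @Pshemo for his valuable input in reaching the solution.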
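To pull every such string out of a larger text, find() can be called in a loop. The sketch below assumes the format described in the question (a single $, comma-separated numbers, then ##) and leaves the pattern unanchored so matches can occur anywhere in the text; the class name DollarHashExtractor and the extract helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarHashExtractor {
    // \\$ is a literal dollar sign; (?:,\\d+)* allows further comma-separated numbers.
    private static final Pattern TOKEN = Pattern.compile("\\$\\d+(?:,\\d+)*##");

    // Collects every substring of the form $n,n,...,n## in order of appearance.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "abc $1,22,333## def $7## ghi $x##";
        System.out.println(extract(text)); // [$1,22,333##, $7##]
    }
}
```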

how to get "something" from <em>something</em> use java Regular expressions

in the following, i need to get:
String regex = "Item#: <em>.*</em>";
String content = "xxx Item#: <em>something</em> yyy";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(content);
if( matcher.find() ) {
System.out.println(matcher.group());
}
it will print:
Item#: <em>something</em>
but I just need the value "something".
I know I can use .substring(begin, end) to get the value,
but is there another way which would be more elegant?
It prints the whole string because that is what you asked it to print: matcher.group() returns the complete match. To get a specific part of the matched string, change your regex to capture the content between the tags in a group:
String regex = "Item#: <em>(.*?)</em>";
Also, use the reluctant quantifier (.*?) to match the fewest characters possible before an </em> is encountered.
Then, in the if block, print group(1) instead of group():
if( matcher.find() ) {
System.out.println(matcher.group(1));
}
That said, you should not use regex to parse HTML; regex is not powerful enough for that task. You should probably use an HTML parser such as HTML Cleaner. Also see the link provided in one of the comments on the question; that post is a very good explanation of the problems you can run into.
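Put together, the capturing-group approach might look like the following sketch. The class name EmValue and the extract helper are hypothetical, and returning null when nothing matches is just one possible convention. The main method also contrasts the greedy (.*) with the reluctant (.*?) on an input that contains two <em> elements.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmValue {
    // Reluctant (.*?) stops at the first </em> instead of the last one.
    private static final Pattern ITEM = Pattern.compile("Item#: <em>(.*?)</em>");

    // Returns the captured text, or null if the pattern is not found.
    static String extract(String content) {
        Matcher matcher = ITEM.matcher(content);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("xxx Item#: <em>something</em> yyy")); // something

        String two = "Item#: <em>a</em> and <em>b</em>";
        Matcher greedy = Pattern.compile("Item#: <em>(.*)</em>").matcher(two);
        if (greedy.find()) {
            System.out.println(greedy.group(1)); // a</em> and <em>b
        }
        System.out.println(extract(two)); // a
    }
}
```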

Print out the string that matched my regular expression in java?

Possible duplicate: Print regex matches in java
I am using the Matcher class in Java to match a string against a regular expression, which I converted into a Pattern using the Pattern class. I know my regex works because when I call Matcher.find(), I get true where I expect to. But I want to print out the strings that produce those true values (that is, the strings that match my regex), and I don't see a method in the Matcher class to do that. Please let me know if anyone has encountered this problem before. I apologize, as this question is fairly rudimentary, but I am new to regex and still finding my way around.
Assuming m is your matcher:
m.group() will return the matched string.
[EDIT] Added info regarding matched groups
Also, if your regex has portions inside parentheses, m.group(n) will return the string that matches the nth parenthesized group:
Pattern p = Pattern.compile("mary (.*) bob");
Matcher m = p.matcher("since that day mary loves bob");
m.find(); // a match must be attempted before calling group()
m.group() returns "mary loves bob".
m.group(1) returns "loves".
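As a runnable sketch of the same idea: the class name GroupDemo and the groups helper are hypothetical. The main method also shows that calling group() before any match attempt throws IllegalStateException.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group 0 (the whole match) followed by each capture group,
    // or an empty array if the regex is not found in the input.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                groups("mary (.*) bob", "since that day mary loves bob")));
        // [mary loves bob, loves]

        Matcher unused = Pattern.compile("mary").matcher("mary");
        try {
            unused.group(); // no find()/matches() call yet
        } catch (IllegalStateException e) {
            System.out.println("group() before find() failed");
        }
    }
}
```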

Whitespace in Java's regular expression

I'm trying to write a regular expression to match an IRC PRIVMSG string. It is something like:
:nick!name@some.host.com PRIVMSG #channel :message body
So I wrote the following code:
Pattern pattern = Pattern.compile("^:.*\\sPRIVMSG\\s#.*\\s:");
Matcher matcher = pattern.matcher(msg);
if(matcher.matches()) {
System.out.println(msg);
}
It does not work. I got no matches. When I test the regular expression using online javascript testers, I got matches.
I tried to find the reason, why it doesn't work and I found that there's something wrong with the whitespace symbol. The following pattern will give me some matches:
Pattern.compile("^:.*");
But the pattern with \s will not:
Pattern.compile("^:.*\\s");
It's confusing.
The Java matches() method strikes again! It only returns true if the entire input matches the pattern. Your pattern has nothing that consumes the message body after the second colon, so the string as a whole is not a match. It works in online testers because most regex tools report a match if any part of the input matches.
Pattern pattern = Pattern.compile("^:.*?\\sPRIVMSG\\s#.*?\\s:.*$");
Should match
If you look at the documentation for matches(), you will notice that it tries to match the entire string. You need to fix your regexp or use find() to iterate through the substring matches.
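The difference between matches() and find() can be checked directly. A small sketch, using the sample message from the question; the class name MatchesVsFind and the constant names are hypothetical:

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // The asker's pattern: stops at the second colon.
    static final Pattern PREFIX = Pattern.compile("^:.*\\sPRIVMSG\\s#.*\\s:");
    // The pattern from the answer: also consumes the message body.
    static final Pattern FULL = Pattern.compile("^:.*?\\sPRIVMSG\\s#.*?\\s:.*$");

    static final String MSG = ":nick!name@some.host.com PRIVMSG #channel :message body";

    public static void main(String[] args) {
        // matches() requires the whole input to match the pattern.
        System.out.println(PREFIX.matcher(MSG).matches()); // false
        // find() succeeds if any part of the input matches.
        System.out.println(PREFIX.matcher(MSG).find());    // true
        // With the body consumed, the whole input matches.
        System.out.println(FULL.matcher(MSG).matches());   // true
    }
}
```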
