Regex shows incorrect answer - java

I have a text file that contains information about a person. I have written a regex to extract the person's age, i.e. X years Y months.
String n="Mayur is 18 years 4 months old ";
Pattern p=Pattern.compile("[\\d+\\s+years]+[\\d+\\s+months]+",Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(n);
while (m.find()) {
System.out.println(m.group(0));
}
The output I received is:
r
s 18 years 4 months o
The output contains the characters I wanted, but it also includes unwanted characters around them.
Expected output is:
18 years 4 months
Please note that there are records with only years and some with only months.

The problem with your regex is that [\d+\s+years] is a character class: it matches any single character found in the list, which is why you got r in the result. You don't need the brackets [].
This is the regex you need: (\\d+\\s*years\\s*)*(\\d+\\s*months)*. Use () for a matching group.
I changed \\s+ to \\s* so that it also matches cases where it's written:
Mayur is 18years 4months old
Here's a Live DEMO
EDIT:
The empty-string matches were caused by the * quantifier after the matching groups, which lets the whole pattern match nothing. I fixed it with this new regex:
(\\d+\\s*years\\s*)+|(\\d+\\s*months)+
See the DEMO here

(?:\\d+\\s+(?:years|months)\\s*){1,2}
Use this. [] is not what you think; it's a character class. See the demo.
https://regex101.com/r/uE3cC4/25
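As a minimal sketch of how the pattern above behaves on records that contain only years or only months (which the question mentions), something like the following could be used. The class and method names (AgeExtractor, extractAges) and the extra sample sentences are made up for illustration, and trim() is added because the trailing \\s* in the pattern also consumes the space after the last unit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AgeExtractor {
    // One or two "<number> years" / "<number> months" units in a row.
    private static final Pattern AGE = Pattern.compile(
            "(?:\\d+\\s+(?:years|months)\\s*){1,2}", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every age-like substring found in the text.
    static List<String> extractAges(String text) {
        List<String> ages = new ArrayList<>();
        Matcher m = AGE.matcher(text);
        while (m.find()) {
            // The pattern's trailing \s* may swallow a space, so trim it off.
            ages.add(m.group().trim());
        }
        return ages;
    }

    public static void main(String[] args) {
        System.out.println(extractAges("Mayur is 18 years 4 months old "));
        System.out.println(extractAges("Asha is 7 months old"));
        System.out.println(extractAges("Ravi is 30 years old"));
    }
}
```

Note that {1,2} does not enforce the order of the units, so a string such as "4 months 18 years" would also be accepted as a single match.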

Try this:
String n="Mayur is 18 years 4 months old ";
Pattern p=Pattern.compile("([0-9]+) years ([0-9]+) months",Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(n);
while (m.find()) {
String years = m.group(1);
String months = m.group(2);
System.out.println(m.group(0));
}
Using group 0 you get the whole matched expression; using group 1 or 2 you get the individual values.
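The pattern in this answer requires both units to be present, so it would skip records that have only years or only months. One possible variation, sketched here under the assumption that a missing unit should count as zero, matches each "number + unit" pair on its own and sorts the values by unit. The names AgeParts and parse are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AgeParts {
    // Group 1 is the number, group 2 is the unit without its optional "s".
    private static final Pattern UNIT = Pattern.compile(
            "(\\d+)\\s*(year|month)s?", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns {years, months}; a missing unit stays 0.
    static int[] parse(String text) {
        int years = 0;
        int months = 0;
        Matcher m = UNIT.matcher(text);
        while (m.find()) {
            int value = Integer.parseInt(m.group(1));
            if (m.group(2).equalsIgnoreCase("year")) {
                years = value;
            } else {
                months = value;
            }
        }
        return new int[] {years, months};
    }

    public static void main(String[] args) {
        int[] age = parse("Mayur is 18 years 4 months old ");
        System.out.println(age[0] + " years " + age[1] + " months");
    }
}
```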

Related

how can I select date from text in java? [duplicate]

This question already has answers here:
How to extract a date from a string and put it into a date variable in Java
(5 answers)
Closed 2 years ago.
How can I select a date from text in Java? For example, I have dates in the form 2007-01-12abcd, absc2008-01-31 and I need them in the form 2007-01-12, 2008-01-31 (without the text). I used a Matcher in my code but it is not working.
for (int i=0; i < list.size(); i++) {
Pattern compiledPattern = Pattern.compile("((?:19|20)[0-9][0-9])-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])", Pattern.CASE_INSENSITIVE);
Matcher matcher = compiledPattern.matcher(list.get(i));
if (matcher.find() == true) {
new_list.add(list.get(i));
}
}
I would keep things simple and just search on the following regex pattern:
\d{4}-\d{2}-\d{2}
It is fairly unlikely that anything which is not a date in your text already would match to this pattern.
Sample code:
String input = "2007-01-12abcd, absc2008-01-31";
String pattern = "\\d{4}-\\d{2}-\\d{2}";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
while (m.find()) {
System.out.println(m.group(0));
}
This prints:
2007-01-12
2008-01-31
By the way, your regex pattern can't be completely correct anyway, because it doesn't handle odd edge cases such as leap years, where February has 29 instead of 28 days.
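If calendar validity matters (leap years, days per month), one option is to let the regex find date-shaped text and then let java.time decide whether it is a real date. This is only a sketch: the class and method names (DateExtractor, extractDates) are made up, and it assumes the dates are always written as yyyy-MM-dd, which is the ISO format that LocalDate.parse accepts by default.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateExtractor {
    // Finds anything shaped like a date; validity is checked separately.
    private static final Pattern DATE_SHAPE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Hypothetical helper: returns only the candidates that are real calendar dates.
    static List<LocalDate> extractDates(String text) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher m = DATE_SHAPE.matcher(text);
        while (m.find()) {
            try {
                // LocalDate.parse rejects impossible dates such as 2007-02-30.
                dates.add(LocalDate.parse(m.group()));
            } catch (DateTimeParseException e) {
                // Date-shaped but not a valid date: skip it.
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        System.out.println(extractDates("2007-01-12abcd, absc2008-01-31, x2007-02-30"));
    }
}
```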
I haven't written the code, but I think I can help. First, I assume the dates in the string are already in the right format (the digits are in the right order and the dates are separated by commas). Go through the string character by character with a for-each loop. If the current character is a letter such as a, b or c, don't add it to the output string; otherwise add it. When you reach a comma, add the accumulated string to the list and start a new one. Do the same when you reach the last character. This may not be the best approach, but I'm fairly sure it works.

Regular expression groups with all combinations [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 3 years ago.
I am studying regular expression groups and have a simple question about that. Let's say I have a basic regular expression in java such as :
Pattern pattern = Pattern.compile("[0-9]{16}");
And I have a matcher :
Matcher matcher = pattern.matcher("111111111111111122");
while (matcher.find()) {
System.out.println(matcher.group());
}
When I loop, I want this to be printed:
1111111111111111
1111111111111112
1111111111111122
I want to get every 16-digit substring, including overlapping ones. But only this is printed:
1111111111111111
Can I solve this issue by only modifying the regexp pattern?
To get the result you want, change your code to:
Pattern pattern = Pattern.compile("(?=([0-9]{16}))");
Matcher matcher = pattern.matcher("111111111111111122");
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Notice the call to group(1), not group(), which is the same as group(0).
Output
1111111111111111
1111111111111112
1111111111111122
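The reason this works is that a lookahead is zero-width: the match itself consumes no characters, so the next find() starts one position further on and the capture group inside the lookahead can overlap the previous one. A small sketch of the same technique with the window length as a parameter (the names OverlappingMatches and windows are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlappingMatches {
    // Hypothetical helper: every run of `length` digits, overlaps included.
    static List<String> windows(String input, int length) {
        // The lookahead matches an empty string; group 1 holds the digits it saw.
        Pattern p = Pattern.compile("(?=([0-9]{" + length + "}))");
        Matcher m = p.matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(windows("12345", 3)); // prints [123, 234, 345]
    }
}
```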

How can I get the second matcher in regex in Java? [duplicate]

This question already has answers here:
Match at every second occurrence
(6 answers)
Closed 4 years ago.
I want to extract the second matcher in a regex pattern between - and _ in this string:
VA-123456-124_VRG.tif
I tried this:
Pattern mpattern = Pattern.compile("-.*?_");
But I get 123456-124 for the above regex in Java.
I need only 124.
How can I achieve this?
If you know that's your format, this will return the requested digits.
It captures everything before the underscore that is not a dash (the dash needs no escaping inside the character class here):
Pattern pattern = Pattern.compile("([^-]+)_");
I would use a formal pattern matcher here, to be as specific as possible. I would use this pattern:
^[^-]+-[^-]+-([^_]+).*
and then check the first capture group for the possible match. Here is a working code snippet:
String input = "VA-123456-124_VRG.tif";
String pattern = "^[^-]+-[^-]+-([^_]+).*";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
if (m.find()) {
System.out.println("Found value: " + m.group(1) );
}
This prints:
Found value: 124
Demo
By the way, there is a one liner which would also work here:
System.out.println(input.split("[_-]")[2]);
But, the caveat here is that it is not very specific, and might fail for your other data.
You know you want only digits, so be more specific: Pattern.compile("-([0-9]+)_");
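A minimal sketch of that digits-only pattern in use; the class and method names (SecondField, digitsBeforeUnderscore) are made up for illustration, and returning null when nothing matches is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondField {
    // A dash, then one or more digits (captured), then an underscore.
    private static final Pattern DIGITS = Pattern.compile("-([0-9]+)_");

    // Hypothetical helper: returns the captured digits, or null if there is no match.
    static String digitsBeforeUnderscore(String name) {
        Matcher m = DIGITS.matcher(name);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(digitsBeforeUnderscore("VA-123456-124_VRG.tif")); // prints 124
    }
}
```

The first dash does not produce a match because 123456 is followed by another dash rather than an underscore, so the matcher moves on to the second dash.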
Try using below regex:
.*-(.*?)_
What this does: .* matches all the characters until it finds a -. Because it is greedy, it goes for the last possible -, which is the one just before 124.
Demo: https://regex101.com/r/NWgZoH/1
JShell Output:
jshell> Pattern pattern = Pattern.compile(".*-(.*?)_");
pattern ==> .*-(.*?)_
jshell> Matcher matcher = pattern.matcher("VA-123456-124_VRG.tif");
matcher ==> java.util.regex.Matcher[pattern=.*-(.*?)_ region=0,21 lastmatch=]
jshell> if(matcher.find()){
...> System.out.println(matcher.group(1));
...> }
124
You have given very few test cases, but for the one you posted I think the regex below can help:
-.*-(.*)_
Then extract the first group.
If you just want a simple way to extract it, go with this:
public static void main(String[] args) {
String s = "VA-123456-124_VRG.tif";
System.out.println(s.split("[_-]")[2]);
}

Regex to extract Job Experience from a text

I am trying to create a general regex to extract job experience from a text.
Consider the following examples and their expected outputs.
1)String string1= "My work experience is 2 years"
Output = "2 years"
2) String string2 = "My work experience is 6 months"
Output = "6 months"
I have used the regex /[0-9] years/ but it doesn't seem to work.
Please share if anyone knows a general regex.
You can use alternations:
String str = "My work experience is 2 years\nMy work experience is 6 months";
String rx = "\\d+\\s+(?:months?|years?)";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println(m.group(0));
}
See IDEONE demo
Output:
2 years
6 months
Or, you can also obtain strings like 3 years 6 months like this:
String str = "My work experience is 2 years\nMy work experience is 3 years 6 months and his experience is 4 years and 5 months";
String rx = "\\d+\\s+years?\\s+(?:and\\s*)?\\d+\\s+months?|\\d+\\s+(?:months?|years?)";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println(m.group(0));
}
Output of another demo:
2 years
3 years 6 months
4 years and 5 months
I suggest using this regex:
String regex = "\\d+.*$";

Java Regular expressions issue - Can't match two strings in the same line [duplicate]

This question already has answers here:
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 8 years ago.
I'm experiencing some problems with Java regular expressions.
I have a program that reads through an HTML file and replaces any string inside the #VR# characters, i.e. #VR#Test1 2 3 4#VR#
However my issue is that, if the line contains more than two strings surrounded by #VR#, it does not match them. It would match the leftmost #VR# with the rightmost #VR# in the sentence and thus take whatever is in between.
For example:
#VR#Google#VR#
My code would match
URL-GOES-HERE#VR#" target="_blank" style="color:#f4f3f1; text-decoration:none;" title="ContactUs">#VR#Google
Here is my Java code. Would appreciate if you could help me to solve this:
Pattern p = Pattern.compile("#VR#.*#VR#");
Matcher m;
Scanner scanner = new Scanner(htmlContent);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
m = p.matcher(line);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String match_found = m.group().replaceAll("#VR#", "");
System.out.println("group: " + match_found);
}
}
I tried replacing m.group() with m.group(0) and m.group(1) but nothing. Also m.groupCount() always returns zero, even if there are two matches as in my example above.
Thanks, your help will be very much appreciated.
Your problem is that .* is "greedy"; it will try to match as long a substring as possible while still letting the overall expression match. So, for example, in #VR# 1 #VR# 2 #VR# 3 #VR#, it will match 1 #VR# 2 #VR# 3.
The simplest fix is to make it "non-greedy" (matching as little as possible while still letting the expression match), by changing the * to *?:
Pattern p = Pattern.compile("#VR#.*?#VR#");
You also wrote: "m.groupCount() always returns zero, even if there are two matches as in my example above."
That's because m.groupCount() returns the number of capture groups in the underlying pattern (parenthesized subexpressions, whose matched substrings are retrieved using m.group(1), m.group(2) and so on), not the number of matches. Your pattern has no capture groups, so m.groupCount() returns 0.
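To illustrate the difference between capture groups and matches, here is a small sketch that adds one capture group to the non-greedy pattern, so that group(1) returns the text between the markers without a separate replaceAll call. The class and method names (VrGroups, contents) and the sample line are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VrGroups {
    // One capture group around the lazily matched text between the markers.
    static final Pattern VR = Pattern.compile("#VR#(.*?)#VR#");

    // Hypothetical helper: returns the text between each pair of #VR# markers.
    static List<String> contents(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = VR.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "#VR#a#VR# and #VR#b#VR#";
        // groupCount() describes the pattern (one group), not how many matches exist.
        System.out.println(VR.matcher(line).groupCount()); // prints 1
        System.out.println(contents(line));                // prints [a, b]
    }
}
```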
You can try the regular expression:
#VR#(((?!#VR#).)+)#VR#
Demo:
private static final Pattern REGEX_PATTERN =
Pattern.compile("#VR#(((?!#VR#).)+)#VR#");
public static void main(String[] args) {
String input = "#VR#Google#VR# ";
System.out.println(
REGEX_PATTERN.matcher(input).replaceAll("$1")
); // prints "Google "
}
