Why isn't my regex working for a date? - java

I'm having trouble using a regex to match a date in a string. I actually have several date formats to match, but the first one doesn't work and I don't understand why.
The format is like "September 12, 2013" or "May 6, 2014" or "June 02, 2014"...
My string text contains the following date: "July 4, 2014".
Here's my code:
Pattern p = Pattern.compile("[a-zA-Z]+ [0-3]?[0-9], (1|2)\\d{3}", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
System.out.println(m.group(1));
But it throws this error:
Exception in thread "main" java.lang.IllegalStateException: No match found
I even tried with smaller regex but it still doesn't match anything.
Thank you in advance for the help !

You need to invoke Matcher#find() or Matcher#matches() before invoking Matcher#group.
Otherwise no match is attempted, so neither the whole match (group 0) nor any capturing group is populated.
Both methods return a boolean that tells you whether a match was found, and therefore whether the group you want will contain any text.
A typical idiom would be:
if (matcher.find()) {
// get the group(s)
}
See the java.util.regex.Matcher API documentation.
On the other hand, I would recommend using a DateFormat (for example SimpleDateFormat) rather than regular expressions to handle dates.
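A minimal sketch of how the two suggestions above could be combined, assuming the dates follow the "Month d, yyyy" shape from the question: a loose regex only locates candidates, and a non-lenient SimpleDateFormat decides whether each candidate is a real date. The class and method names (DateExtract, firstDate) and the yyyy-MM-dd output format are illustrative choices, not part of the original answer.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateExtract {
    // Loose pattern: it only locates candidates, validation is left to the date parser.
    private static final Pattern CANDIDATE = Pattern.compile("[A-Za-z]+ \\d{1,2}, \\d{4}");

    /** Returns the first valid date found in the text as yyyy-MM-dd, or null if there is none. */
    static String firstDate(String text) {
        SimpleDateFormat in = new SimpleDateFormat("MMMM d, yyyy", Locale.ENGLISH);
        in.setLenient(false); // reject impossible dates such as "June 31, 2014"
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd", Locale.ENGLISH);

        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) { // find() must succeed before group() is called
            try {
                Date parsed = in.parse(m.group());
                return out.format(parsed);
            } catch (ParseException e) {
                // Looked like a date but is not one; keep scanning.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstDate("Published on July 4, 2014 by the team")); // prints 2014-07-04
    }
}
```

Keeping the regex loose and letting the date parser validate avoids encoding month names and day ranges in the pattern itself.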

You need to check m.find() first, and print m.group(0) instead of m.group(1).
String text = "July 4, 2014";
// The month alternatives are followed directly by the space; an extra \\D here would
// stop full month names such as "May" from matching. All groups are non-capturing.
String pattern = "\\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?) [0-9]{1,2}, [0-9]{4}";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
if (m.find()) {
    System.out.println("Found value: " + m.group(0));
}

You need to check if (m.find()) and print m.group(0). If you print m.group(1), you get only what the capturing group (1|2) matched: since your input contains 2014, m.group(1) prints 2. m.group(0) always denotes the entire match, so it prints the full date. (1|2) is the only capturing group in "[a-zA-Z]+ [0-3]?[0-9], (1|2)\\d{3}", so it is group 1.
Try this code.
String text="July 4, 2014";
Pattern p = Pattern.compile("[a-zA-Z]+ [0-3]?[0-9], (1|2)\\d{3}", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
if (m.find()) {
    System.out.println(m.group(0));
} else {
    System.out.println("No match found");
}
Output
July 4, 2014

Related

Java regular expression with Matching text and special character

Hi, I am new to Java regex. I have the string below:
String s = " KBC_2022-12-20-2004_IDEAL333_MASTER333_2022-12-20-1804_SUCCESS";
I want to print only the "333" that follows MASTER, so the output should be 333.
Can someone help me write the regex for this? Basically, the code should print the value between "MASTER" and the next "_". Here it is 333, but the value can have any number of characters, not just 3.
You can use this regex: (?<=MASTER)[0-9]+(?=\_).
It matches the digits between MASTER and _:
lookbehind: the match must come right after MASTER: (?<=MASTER)
lookahead: the match must come right before _: (?=\_)
Try on regex101.com
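As a rough Java sketch of the lookaround approach above (the class and method names are made up for illustration): because the lookbehind and lookahead consume no characters, group() returns only the digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MasterValue {
    // Digits that are preceded by "MASTER" and followed by "_".
    private static final Pattern VALUE = Pattern.compile("(?<=MASTER)[0-9]+(?=_)");

    /** Returns the digits between "MASTER" and the next "_", or null if there is no match. */
    static String extract(String s) {
        Matcher m = VALUE.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = " KBC_2022-12-20-2004_IDEAL333_MASTER333_2022-12-20-1804_SUCCESS";
        System.out.println(extract(s)); // prints 333
    }
}
```

If the value is not purely numeric, replacing [0-9]+ with [^_]+ would be one option.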
You can also use a capturing group: MASTER(\\d+)_
Pattern p = Pattern.compile("MASTER(\\d+)_");
Matcher m = p.matcher(" KBC_2022-12-20-2004_IDEAL333_MASTER333_2022-12-20-1804_SUCCESS");
if (m.find()) {
System.out.println(m.group(1)); // 333
}
m = p.matcher(" KBC_2022-12-20-2004_IDEAL333_MASTER123_2022-12-20-1804_SUCCESS");
if (m.find()) {
System.out.println(m.group(1)); // 123
}

Matcher doesn't return the matched regex, instead returns whole string in group(0) (Java)

I followed the question "Using Regular Expressions to Extract a Value in Java", but I am unable to extract the correct group from the string:
Pattern p = Pattern.compile("I have .*");
Matcher m = p.matcher("I have apples");
if(m.find()){
System.out.println(m.group(0));
}
What I get:
I have apples
What I want to get:
apples
I've tried asking for m.group(1) as well, but it throws an exception.
How should I go about this?
You have to define a capturing group for m.group(1) to work.
Change your pattern to
Pattern p = Pattern.compile("I have (.*)");
m.group(0) denotes the entire match ("I have apples").
m.group(1) now returns the expected "apples".
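For reference, a small self-contained sketch of the difference between group(0) and group(1). The helper name is hypothetical, and it assumes the pattern passed in declares at least one capturing group (otherwise group(1) throws IndexOutOfBoundsException, which is the exception from the question).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    /**
     * Returns {group(0), group(1)} for the first match, or null when nothing matches.
     * Assumes the regex declares at least one capturing group.
     */
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("I have (.*)", "I have apples");
        System.out.println(g[0]); // prints I have apples
        System.out.println(g[1]); // prints apples
    }
}
```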

Regex not capturing matching in expected groups

I have been working on a requirement for which I need a regex for the following string:
startDate:[2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]
There can be many variations of this string as follows:
startDate:[*;2016-10-12T12:23:23Z]
startDate:[2016-10-12T12:23:23Z;*]
startDate:[*;*]
startDate in the above expression is a key name, which can be anything (endDate, updateDate, etc.), so we can't hardcode it in the expression. The key name can be any word matching [a-zA-Z_0-9]*.
I am using the following compiled pattern:
Pattern.compile("([[a-zA-Z_0-9]*):(\\[[[\\*]|[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[Z]];[[\\*]|[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[Z]]\\]])");
The pattern matches, but the groups created are not what I expect. I want the groups marked by the parentheses below:
(startDate):([*:2016-10-12T12:23:23Z])
group1 = "startDate"
group2 = "[*;2016-10-12T12:23:23Z]"
Could you please help me with the correct expression and groups in Java?
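You are using [ ] rather than ( ) to wrap the alternatives separated by |. Square brackets define a character class, inside which | is just a literal character; alternation needs a group.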
For example, the following code works for me:
String text = "startDate:[2016:*]"; // simplified sample input, year only
Pattern pattern = Pattern.compile("(\\w+):(\\[(\\*|\\d{4}):\\*\\])");
Matcher matcher = pattern.matcher(text);
if (matcher.matches()) {
    for (int i = 0; i < matcher.groupCount() + 1; i++) {
        System.out.println(i + ":" + matcher.group(i));
    }
} else {
    System.out.println("no match");
}
To simplify things I just use the year but I'm sure it'll work with the full timestamp string.
This expression captures more than you need in groups but you can make them 'non-capturing' using the (?: ) construct.
Note that I simplified some of your regexp using the predefined character classes. See http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html for more details.
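One way this could be extended to full timestamps, sketched under the assumption that the two bounds are separated by ";" as in the question's variations. The bounds use the non-capturing (?: ) construct, so only the key and the bracketed range end up in groups 1 and 2. The class, constant, and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RangeParser {
    // A timestamp such as 2016-10-12T12:23:23Z.
    private static final String TIMESTAMP = "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z";
    // A bound is either "*" or a timestamp; non-capturing, so it adds no group.
    private static final String BOUND = "(?:\\*|" + TIMESTAMP + ")";
    // Group 1: key name. Group 2: the whole bracketed range.
    private static final Pattern RANGE =
            Pattern.compile("(\\w+):(\\[" + BOUND + ";" + BOUND + "\\])");

    /** Returns {key, range} if the whole input matches, otherwise null. */
    static String[] parse(String input) {
        Matcher m = RANGE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("startDate:[*;2016-10-12T12:23:23Z]");
        System.out.println("group1 = " + parts[0]); // prints group1 = startDate
        System.out.println("group2 = " + parts[1]); // prints group2 = [*;2016-10-12T12:23:23Z]
    }
}
```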
Here is a solution which uses your original regex, modified so that it actually returns the groups you want:
String content = "startDate:[2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]";
Pattern pattern = Pattern.compile("([a-zA-Z_0-9]*):(\\[(?:\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z|\\*):(?:\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z|\\*)\\])");
Matcher matcher = pattern.matcher(content);
// find() must succeed before the groups can be accessed
if (matcher.find()) {
    System.out.println("group1 = " + matcher.group(1));
    System.out.println("group2 = " + matcher.group(2));
}
Output:
group1 = startDate
group2 = [2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]
This code has been tested on IntelliJ and appears to be working correctly.

Java Matcher. Return several entries from one sequence

For example, I have the following regexp: \d{2} (2 digits). When I use
Matcher matcher = Pattern.compile("\\d{2}").matcher("123");
matcher.find();
String result = matcher.group();
the result variable holds only the first match, i.e. 12. But I want to get ALL possible matches, i.e. 12 and 23.
How can I achieve this?
You'll need the help of a capture group within a positive lookahead:
Matcher m = Pattern.compile("(?=(\\d{2}))").matcher("1234");
while (m.find()) System.out.println(m.group(1));
prints
12
23
34
That's not how regular expression matching works. The matcher starts at the beginning of the string, and each time it finds a match it continues looking from the character following the end of that match - it will not give you overlapping matches.
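To see the two behaviors side by side, here is a small sketch (the helper name findAll is hypothetical) that collects every result of find() for a plain pattern and for the lookahead variant with a capturing group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OverlapDemo {
    /** Collects the given group from every successive find() on the input. */
    static List<String> findAll(String regex, String input, int group) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group(group));
        }
        return results;
    }

    public static void main(String[] args) {
        // Plain pattern: each search resumes after the end of the previous match.
        System.out.println(findAll("\\d{2}", "1234", 0));       // prints [12, 34]
        // Lookahead: the match itself is empty, so the search advances one character at a time.
        System.out.println(findAll("(?=(\\d{2}))", "1234", 1)); // prints [12, 23, 34]
    }
}
```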
If you want to find overlapping matches of an arbitrary regular expression without using lookaheads and capturing groups, you can do so by resetting the matcher's "region" after each match:
Matcher matcher = Pattern.compile(theRegex).matcher(str);
// prevent ^ and $ from matching the beginning/end of the region when this is
// smaller than the whole string
matcher.useAnchoringBounds(false);
// allow lookaheads/behinds to look outside the current region
matcher.useTransparentBounds(true);
while (matcher.find()) {
    System.out.println(matcher.group());
    if (matcher.start() < str.length()) {
        // start looking again from the character after the _start_ of the previous
        // match, instead of the character following the _end_ of the match
        matcher.region(matcher.start() + 1, str.length());
    }
}
Something like this:
^(?=[1-3]{2}$)(?!.*(.).*\1).*$
It matches a whole string made of exactly two different digits from 1 to 3 (12, 13, 21, 23, 31, 32).

Regexp grouping and replaceAll with .* in Java duplicates the replacement

I have a problem using a regexp in Java. The example code prints ABC_012_suffix_suffix, but I was expecting ABC_012_suffix.
Pattern rexexp = Pattern.compile("(.*)");
Matcher matcher = rexexp.matcher("ABC_012");
String result = matcher.replaceAll("$1_suffix");
System.out.println(result);
I understand that replaceAll replaces all matches; the question is why the group (.*) matches twice on my string ABC_012 in Java.
Pattern regexp = Pattern.compile(".*");
Matcher matcher = regexp.matcher("ABC_012");
matcher.matches();
System.out.println(matcher.group(0));
System.out.println(matcher.replaceAll("$0_suffix"));
Same happens here, the output is:
ABC_012
ABC_012_suffix_suffix
The reason is hidden in the replaceAll method: it tries to find all subsequences that match the pattern:
while (matcher.find()) {
System.out.printf("Start: %s, End: %s%n", matcher.start(), matcher.end());
}
This will result in:
Start: 0, End: 7
Start: 7, End: 7
So, to our first surprise, the matcher finds two subsequences, "ABC_012" and another "". And it appends "_suffix" to both of them:
"ABC_012" + "_suffix" + "" + "_suffix"
After .* has consumed the whole input, the matcher tries again at the end of the string, where .* also matches the empty string (zero characters is still a match). Try (.+) or (^.*$) instead. Both work as expected.
At regexinfo star is defined as follows:
*(star) - Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with less matches of the preceding item, up to the point where the preceding item is not matched at all.
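A short sketch comparing the three patterns discussed above on the question's input; the helper name addSuffix is made up for illustration.

```java
import java.util.regex.Pattern;

final class SuffixDemo {
    /** Appends "_suffix" after every match of the regex in the input. */
    static String addSuffix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("$0_suffix");
    }

    public static void main(String[] args) {
        // .* matches "ABC_012" and then the empty string at the end: two replacements.
        System.out.println(addSuffix(".*", "ABC_012"));   // prints ABC_012_suffix_suffix
        // .+ needs at least one character, so there is no second, empty match.
        System.out.println(addSuffix(".+", "ABC_012"));   // prints ABC_012_suffix
        // ^ cannot match at the end of a non-empty input, which also rules out the empty match.
        System.out.println(addSuffix("^.*$", "ABC_012")); // prints ABC_012_suffix
    }
}
```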
If you just want to add "_suffix" to your input, why not simply do this?
String result = "ABC_012" + "_suffix";
