how can I select date from text in java? [duplicate] - java

This question already has answers here:
How to extract a date from a string and put it into a date variable in Java
(5 answers)
Closed 2 years ago.
How can I select a date from text in Java? For example, if I have strings such as 2007-01-12abcd and absc2008-01-31, I need the dates without the surrounding text: 2007-01-12 and 2008-01-31. I used a Matcher in my code, but it is not working.
for (int i=0; i < list.size(); i++) {
Pattern compiledPattern = Pattern.compile("((?:19|20)[0-9][0-9])-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])", Pattern.CASE_INSENSITIVE);
Matcher matcher = compiledPattern.matcher(list.get(i));
if (matcher.find() == true) {
new_list.add(list.get(i));
}
}

I would keep things simple and just search on the following regex pattern:
\d{4}-\d{2}-\d{2}
Anything in your text that matches this pattern is very likely to be a date.
Sample code:
String input = "2007-01-12abcd, absc2008-01-31";
String pattern = "\\d{4}-\\d{2}-\\d{2}";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
while (m.find()) {
System.out.println(m.group(0));
}
This prints:
2007-01-12
2008-01-31
By the way, your regex pattern can't be completely correct anyway, because it does not check month lengths: it accepts impossible dates such as 2007-02-31, and it cannot handle leap years, where February has 29 instead of 28 days.
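If calendar validity matters (month lengths, leap years), one option is to keep the simple regex for finding candidates and let java.time decide whether each candidate is a real date. The following is a minimal sketch of that idea; the class and method names (DateExtractor, extractDates) are made up for illustration, and it assumes Java 8 or later.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateExtractor {
    // Loose shape check only; calendar validity is left to LocalDate.parse.
    private static final Pattern DATE_SHAPE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    /** Returns every yyyy-MM-dd substring of input that is a real calendar date. */
    static List<LocalDate> extractDates(String input) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher m = DATE_SHAPE.matcher(input);
        while (m.find()) {
            try {
                // LocalDate.parse uses the strict ISO format, so 2007-02-30 is rejected.
                dates.add(LocalDate.parse(m.group()));
            } catch (DateTimeParseException e) {
                // The text had the right shape but is not a valid date; skip it.
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        // Prints [2007-01-12, 2008-01-31]; the impossible 2007-02-30 is dropped.
        System.out.println(extractDates("2007-01-12abcd, absc2008-01-31, x2007-02-30"));
    }
}
```

Whether invalid candidates should be skipped silently or reported depends on your data; skipping is only an assumption made here.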

I haven't written any code, but I think I can help. I am assuming the dates in the string are already in the right format (the digits are in the right order and the dates are separated by commas). Go through the string one character at a time with a for-each loop. If the current character is a letter such as a, b or c, don't add it to the current result string; otherwise, do add it. If the character is a comma, add the current result string to the list and start a new one. Do the same when you reach the last character. This might not be the best way to do it, but I am fairly sure it should work.
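A rough sketch of this character-by-character idea might look as follows. The names (LetterStripper, stripLetters) are hypothetical, and skipping whitespace as well as letters is an extra assumption, made so that the space after each comma does not end up in the result.

```java
import java.util.ArrayList;
import java.util.List;

final class LetterStripper {
    /** Drops letters and whitespace, and starts a new list entry at every comma. */
    static List<String> stripLetters(String input) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c == ',') {
                // A comma closes the current entry.
                result.add(current.toString());
                current.setLength(0);
            } else if (!Character.isLetter(c) && !Character.isWhitespace(c)) {
                // Keep digits, hyphens and any other non-letter characters.
                current.append(c);
            }
        }
        // The last entry has no trailing comma, so add it after the loop.
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [2007-01-12, 2008-01-31]
        System.out.println(stripLetters("2007-01-12abcd, absc2008-01-31"));
    }
}
```

Unlike the regex approach, this keeps any digits or punctuation that are not part of a date, so it only works when the surrounding text consists of letters.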

Related

How to capture a regex group for below pattern [duplicate]

This question already has answers here:
Regex: match everything but a specific pattern
(6 answers)
Closed 3 years ago.
I am exploring java regex groups and I am trying to replace a string with some characters.
I have a string str = "abXYabcXYZ"; and I am trying to replace all characters except for the pattern group abc in string.
I tried to use str.replaceAll("(^abc)",""), but it did not work. I understand that (abc) will match a group.
You might find it easier to find the parts you want to keep and build a new string from them. This approach has flaws when matches can overlap, but it will likely be good enough for your use case. However, if your pattern really is as simple as "abc", you may want to consider just counting the total number of matches instead.
String str = "abXYabcXYZ";
Pattern patternToKeep = Pattern.compile("abc");
Matcher matcher = patternToKeep.matcher(str);
StringBuilder sb = new StringBuilder();
// find() advances to each successive match; group() returns the matched text
while (matcher.find()) {
    sb.append(matcher.group());
}
System.out.println(sb.toString());
It is easier to keep the matching parts and concatenate them. In the following example, the matcher iterates over str with find(), advancing to the next match on each call. Inside the loop, the text matched by your "abc" pattern is always available as group(0).
String str = "abXYabcXYZabcxss";
Pattern pattern = Pattern.compile("abc");
StringBuilder sb = new StringBuilder();
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
sb.append(matcher.group(0));
}
System.out.println(sb.toString());
If you want to use replacement only, the nearest you can get is:
((?!abc).)*
The problem is that this leaves only the a of each abc in place; the b and c are removed along with everything else.
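To see the effect of the negative-lookahead pattern concretely, the sketch below runs it through replaceAll and compares it with a different, commonly used idiom that is not from the answers above: match either the wanted text in a capturing group or any single character, and replace every match with the group's content. The class and method names are hypothetical. Note that . does not match line terminators by default, so multi-line input would need the (?s) flag.

```java
final class KeepOnlyPattern {
    /** The lookahead approach: removes runs of characters that do not start "abc". */
    static String withNegativeLookahead(String s) {
        return s.replaceAll("((?!abc).)*", "");
    }

    /** Alternative idiom: keep group 1 when "abc" matched, drop any other character. */
    static String withCapturedAlternative(String s) {
        // When the "." branch matches, group 1 did not participate and $1 expands to nothing.
        return s.replaceAll("(abc)|.", "$1");
    }

    public static void main(String[] args) {
        String str = "abXYabcXYZ";
        System.out.println(withNegativeLookahead(str));   // prints a
        System.out.println(withCapturedAlternative(str)); // prints abc
    }
}
```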

Pattern and Matcher in Java: Matcher only finds one match instead of two [duplicate]

This question already has answers here:
Overlapping matches in Regex
(3 answers)
Closed 5 years ago.
I'm working with Pattern and Matcher in Java. I have the following code:
String searchString = "0,00,0";
String searchInText = "0,00,00,0";
Pattern p = Pattern.compile(searchString);
Matcher m = p.matcher(searchInText);
while (m.find()) {
...
}
My Problem is that the Matcher only finds one match from the first zero to the 4th zero. But there should be another match from the 3rd zero to the last zero.
Can someone help me? Is there a workaround?
Getting overlapping matches with regex is tricky, especially if you're not very familiar with regexes.
If you're not really using regex functionality (like in your example), you could easily do this with an indexOf(String, int) and keep increasing the index from which you're doing the search.
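String text = "0,00,00,0";
String pattern = "0,00,0";
int index = 0;
// Restart the search one character after each hit so overlapping matches are found
while ((index = text.indexOf(pattern, index)) > -1) {
    System.out.println(index + " " + pattern);
    index++;
}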
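If you do need regex functionality, a common way to get overlapping matches is to wrap the pattern in a zero-width lookahead with a capturing group inside it. The lookahead consumes no characters, so the matcher advances by only one position after each hit. The sketch below uses hypothetical names (OverlappingMatches, overlappingStarts) and Pattern.quote, because the search string here is meant literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OverlappingMatches {
    /** Returns the start index of every occurrence of literal in text, overlaps included. */
    static List<Integer> overlappingStarts(String literal, String text) {
        // (?=(...)) matches the empty string wherever the inner group could match.
        Pattern p = Pattern.compile("(?=(" + Pattern.quote(literal) + "))");
        Matcher m = p.matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            // Group 1 holds the text seen by the lookahead.
            starts.add(m.start(1));
        }
        return starts;
    }

    public static void main(String[] args) {
        // Prints [0, 3]
        System.out.println(overlappingStarts("0,00,0", "0,00,00,0"));
    }
}
```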

java split by bracket and keep the delmiter - RegEx [duplicate]

This question already has answers here:
How do I split a string in Java?
(39 answers)
Closed 6 years ago.
I am trying to split a string using a regex with the closing bracket as the delimiter, and I have to keep the bracket.
Input string: (GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)
Needed output:
(GROUP=test1)
(GROUP=test2)
(GROUP=test3)
(GROUP=test4)
I am using the Java regex "\([^)]*?\)" and it throws an error. Below is the code I am using; when I try to get the group, it throws the error.
Pattern splitDelRegex = Pattern.compile("\\([^)]*?\\)");
Matcher regexMatcher = splitDelRegex.matcher("(GROUP=test1)(GROUP=test2) (GROUP=test3)(GROUP=test4)");
List<String> matcherList = new ArrayList<String>();
while(regexMatcher.find()){
String perm = regexMatcher.group(1);
matcherList.add(perm);
}
Any help is appreciated. Thanks.
You simply forgot to put capturing parentheses around the entire regex, so there is no group 1 and nothing is captured. Just change the regex to
Pattern splitDelRegex = Pattern.compile("(\\([^)]*?\\))");
I tested this in Eclipse and got your desired output.
You could use
str.split("\\)")
The closing parenthesis has to be escaped because split takes a regular expression. That returns an array of strings that you know are missing their closing parentheses, so you can add them back in afterwards. That seems much easier and less error-prone to me.
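A variation on the split idea avoids re-adding the bracket: splitting on a zero-width lookbehind cuts the string after each closing parenthesis without consuming it. This is a minimal sketch with hypothetical names (BracketSplitter, splitKeepingBracket), not something taken from the answers here.

```java
import java.util.Arrays;
import java.util.List;

final class BracketSplitter {
    /** Splits after every ')' and keeps the ')' at the end of each piece. */
    static List<String> splitKeepingBracket(String input) {
        // (?<=\)) matches the empty position right after a closing parenthesis.
        return Arrays.asList(input.split("(?<=\\))"));
    }

    public static void main(String[] args) {
        String input = "(GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)";
        for (String part : splitKeepingBracket(input)) {
            System.out.println(part);
        }
    }
}
```

The empty string that the split would produce after the final bracket is dropped, because split discards trailing empty strings by default.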
You could try changing this line :
String perm = regexMatcher.group(1);
To this :
String perm = regexMatcher.group();
Called without an argument, group() returns the entire match (the same as group(0)), so no capturing group is needed.
I'm not sure why you need to split the string at all. You can capture each of the bracketed groups with a regex.
Try the regex (\\([a-zA-Z0-9=]*\\)). It is a capturing group () that matches text starting with a literal \\(, followed by zero or more characters from [a-zA-Z0-9=] (the * quantifier), and ending with a literal \\). This is a pretty loose regex; you could tighten it if the text inside the brackets is predictable.
String input = "(GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)";
String regex = "(\\([a-zA-Z0-9=]*\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()) { // find the next match
System.out.println(matcher.group()); // print the match
}
Output:
(GROUP=test1)
(GROUP=test2)
(GROUP=test3)
(GROUP=test4)

Regex shows incorrect answer

I have a text file that contains information about a person. I have written a regex to extract the person's age, i.e. X years Y months.
String n="Mayur is 18 years 4 months old ";
Pattern p=Pattern.compile("[\\d+\\s+years]+[\\d+\\s+months]+",Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(n);
while (m.find()) {
System.out.println(m.group(0));
}
The output I received is:
r
s 18 years 4 months o
The output includes characters I did not want.
Expected output is:
18 years 4 months
Please note that there are records with only years and some with only months.
The problem with your regex is that [\d+\s+years] is a character class: it matches any single character from the list, which is why you got r in the result. You don't need the brackets [].
This is the regex you need: (\\d+\\s*years\\s*)*(\\d+\\s*months)*. Use () for a matching group.
I changed \\s+ to \\s* so that it also matches cases where it's written:
Mayur is 18years 4months old
EDIT:
The problem of empty strings is due to the * quantifier after the matching groups. I fixed it using this new regex:
(\\d+\\s*years\\s*)+|(\\d+\\s*months)+
(?:\\d+\\s+(?:years|months)\\s*){1,2}
Use this. [] is not what you think; it's a character class. See the demo:
https://regex101.com/r/uE3cC4/25
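As a quick check of how the {1,2} pattern behaves on records that have years, months, or both, here is a small sketch. The names (AgeExtractor, extractAges) and the extra sample sentences are invented for illustration; trimming the match is an added step, because the trailing \\s* can pick up a space after the last unit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AgeExtractor {
    // One or two "<number> years|months" parts in a row.
    private static final Pattern AGE = Pattern.compile(
            "(?:\\d+\\s+(?:years|months)\\s*){1,2}", Pattern.CASE_INSENSITIVE);

    /** Returns each age expression found in text, without surrounding whitespace. */
    static List<String> extractAges(String text) {
        List<String> ages = new ArrayList<>();
        Matcher m = AGE.matcher(text);
        while (m.find()) {
            ages.add(m.group().trim());
        }
        return ages;
    }

    public static void main(String[] args) {
        System.out.println(extractAges("Mayur is 18 years 4 months old ")); // [18 years 4 months]
        System.out.println(extractAges("Anna is 7 months old"));            // [7 months]
        System.out.println(extractAges("Ben is 30 years old"));             // [30 years]
    }
}
```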
Try this:
String n="Mayur is 18 years 4 months old ";
Pattern p=Pattern.compile("([0-9]+) years ([0-9]+) months",Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(n);
while (m.find()) {
String years = m.group(1);
String months = m.group(2);
System.out.println(m.group(0));
}
group(0) returns the whole match; group(1) and group(2) return the captured year and month values.

Extracting dates from string

I have a list with file names that look roughly like this: Gadget1-010912000000-020912235959.csv, i.e. they contain two dates indicating the timespan of their data.
The user enters a date format and a file format:
File Format in this case: *GADGET*-*DATE_FROM*-*DATE_TO*.csv
Date format in this case: ddMMyyHHmmss
What I want to do is extract the three values from the file name using the given file and date formats.
My problem is: since the date format can differ heavily (hours, minutes and seconds can be separated by a colon, dates by a dot, ...), I don't quite know how to create a fitting regular expression.
You can use a regular expression to remove non-digit characters and then parse the value.
DateFormat dateFormat = new SimpleDateFormat("ddMMyyHHmmss");
String[] fileNameDetails = ("Gadget1-010912000000-020912235959").split("-");
/* Remove all non-digit characters; if there are none, the string is unchanged */
String date = fileNameDetails[1].replaceAll("[^0-9]", "");
try {
    // Parse the cleaned string, not the raw file name part
    Date dateFrom = dateFormat.parse(date);
} catch (ParseException e) {
    e.printStackTrace();
}
Hope it helps.
SimpleDateFormat solves your issue. You can define the format with commas, spaces and whatever and simply parse according to the format:
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
So you map your format (e.g ddMMyyHHmmss) to a corresponding SimpleDateFormat.
SimpleDateFormat format = new SimpleDateFormat("ddMMyyHHmmss");
Date x = format.parse("010912000000");
If the format changes, you simply change the SimpleDateFormat pattern.
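Applied to the whole file name, this could look roughly like the sketch below. It assumes the layout from the question (gadget, start date and end date separated by hyphens, followed by an extension) and that the date format itself contains no hyphen; the class name FileNameSpan and its fields are hypothetical. The time zone is pinned to UTC only to keep the result independent of the machine's default zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class FileNameSpan {
    final String gadget;
    final Date from;
    final Date to;

    private FileNameSpan(String gadget, Date from, Date to) {
        this.gadget = gadget;
        this.from = from;
        this.to = to;
    }

    /** Splits "gadget-from-to.ext" and parses both dates with the user-supplied pattern. */
    static FileNameSpan parse(String fileName, String datePattern) {
        // Drop the extension; the last dot is used so dots inside dates survive.
        int dot = fileName.lastIndexOf('.');
        String base = dot < 0 ? fileName : fileName.substring(0, dot);
        String[] parts = base.split("-", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        SimpleDateFormat format = new SimpleDateFormat(datePattern);
        format.setLenient(false); // reject values such as day 32
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return new FileNameSpan(parts[0], format.parse(parts[1]), format.parse(parts[2]));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date in " + fileName, e);
        }
    }

    public static void main(String[] args) {
        FileNameSpan span = parse("Gadget1-010912000000-020912235959.csv", "ddMMyyHHmmss");
        System.out.println(span.gadget + ": " + span.from + " to " + span.to);
    }
}
```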
You can use a series of date-time formats, trying each until one works.
You may need to order the formats to prioritize matches.
For example, with Joda-Time, you can use DateTimeFormat.forPattern() and DateTimeFormatter.getParser() for each of a series of patterns. Try DateTimeParser.parseInto() until one succeeds.
One nice thing about this approach is that it is easy to add and remove patterns.
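The same try-each-format idea can be sketched with the standard java.time API instead of Joda-Time (this is a substitution, not the Joda-Time calls named above). The candidate patterns in the list are hypothetical examples and the class name MultiFormatParser is invented; order the list so that the most specific format comes first.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class MultiFormatParser {
    // Hypothetical candidate formats, tried in this order.
    private static final List<DateTimeFormatter> FORMATS = Arrays.asList(
            DateTimeFormatter.ofPattern("ddMMyyHHmmss"),
            DateTimeFormatter.ofPattern("dd.MM.yy HH:mm:ss"),
            DateTimeFormatter.ofPattern("dd.MM.yy.HH:mm:ss"));

    /** Returns the result of the first format that parses text, or empty if none does. */
    static Optional<LocalDateTime> parse(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(LocalDateTime.parse(text, format));
            } catch (DateTimeParseException e) {
                // This format does not fit; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("020912235959"));      // Optional[2012-09-02T23:59:59]
        System.out.println(parse("01.09.12 00:00:00")); // Optional[2012-09-01T00:00]
        System.out.println(parse("not a date"));        // Optional.empty
    }
}
```

Adding or removing a supported format is then a one-line change to the list.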
Use Pattern and Matcher class.
Look at the example:
String inputDate = "01.09.12.00:00:00";
Pattern pattern = Pattern.compile(
"([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})");
Matcher matcher = pattern.matcher(inputDate);
matcher.find();
StringBuilder cleanStr = new StringBuilder();
for(int i = 1; i <= matcher.groupCount(); i++) {
cleanStr.append(matcher.group(i));
}
SimpleDateFormat format = new SimpleDateFormat("ddMMyyHHmmss");
Date x = format.parse(cleanStr.toString());
System.out.println(x.toString());
The most important part is the line
Pattern pattern = Pattern.compile(
"([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})[:]{0,1}([0-9]
Here you define the regexp and mark groups with parentheses, so ([0-9]{2}) marks a group. After each group comes an expression for a possible delimiter, [\\.]{0,1}, in this case 0 or 1 dot, but you can allow more delimiters by adding them to the character class, for example [\\.\\-]{0,1}.
Then you run matcher.find(), which returns true if the pattern matches. Then, using matcher.group(int), you can read the groups one by one. Note that the index of the first group is 1.
Then I construct a clean date String using StringBuilder and parse the date.
Cheers,
Michal
