I need a regex to extract the date section from the names of several files.
In particular I have these two formats:
ATC0200720140828080610.xls
ATC0200720140901080346_UFF_ACC.xls
I use these two regex to check file name format:
^ATC02007[0-9]{14}.xls$
^ATC02007[0-9]{14}_UFF_ACC.xls$
But I need a regex to extract a specific section:
constant | yyyyMMddHHmmss | constant
^ ^ ^
ATC02007 | 20140901080346 | _UFF_ACC.xls
Both regexes match the entire file name, so I can't use them to extract the middle section. What is the right expression?
You are almost there. Just wrap the digits you want in parentheses to make a capturing group, and escape the dot so that it matches a literal period:
^ATC02007([0-9]{14})(_UFF_ACC)?\.xls$
The digits are captured in group 1 ($1).
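In Java, a minimal sketch of this could look as follows. The class name FileTimestamp and the method timestampOf are made up for illustration, and parsing the captured digits with DateTimeFormatter is an extra step that assumes the yyyyMMddHHmmss layout from the question.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class FileTimestamp {
    // Group 1 holds the 14 digits; the "_UFF_ACC" suffix is optional.
    private static final Pattern NAME =
            Pattern.compile("^ATC02007([0-9]{14})(_UFF_ACC)?\\.xls$");
    // Assumed layout of the digits, taken from the question.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Returns the parsed timestamp, or empty if the name does not match.
    // Throws DateTimeParseException if the digits are not a valid date-time.
    static Optional<LocalDateTime> timestampOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(LocalDateTime.parse(m.group(1), STAMP));
    }

    public static void main(String[] args) {
        System.out.println(timestampOf("ATC0200720140828080610.xls"));
        System.out.println(timestampOf("ATC0200720140901080346_UFF_ACC.xls"));
        System.out.println(timestampOf("other.xls"));
    }
}
```

If you only need the raw digits, m.group(1) on its own is enough and the formatter can be dropped.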
You need to use capturing groups.
^(ATC02007)([0-9]{14})((?:[^.]*)?\.xls)$
Group 1 contains the first constant, group 2 the date and time, and group 3 the third constant.
String s = "ATC0200720140828080610.xls\n" +
"ATC0200720140901080346_UFF_ACC.xls";
Pattern regex = Pattern.compile("(?m)^(ATC02007)([0-9]{14})((?:[^.]*)?\\.xls)$");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
Output:
ATC02007
20140828080610
.xls
ATC02007
20140901080346
_UFF_ACC.xls
This is my string
2007-01-12Jakistxt2008-01-31xxx2008-02-292008-15-102008-19-452009-05-0120999-11-11pppp2001-00-0109-01-012001-01-002009-01-1112009-02-291998-11-11
I am trying to find dates in the format YYYY-MM-DD. I know that this is not directly possible.
I managed to print this result:
2007-01-12
2008-01-31
2008-02-292
2008-19-452
0999-11-11
2001-00-010
2001-01-002
2009-02-291
// wyrazenie holds the input string shown above
String regex4 = "\\d{4}-\\d{2}-\\d{2,3}";
Pattern wzor4 = Pattern.compile(regex4);
Matcher efekt4 = wzor4.matcher(wyrazenie);
List<String> list422 = new ArrayList<>();
while (efekt4.find()) {
    list422.add(efekt4.group());
}
for (int i = 0; i < list422.size(); i++) System.out.println(list422.get(i));
Try this pattern: (?(?<=^)|(?<=\D))\d{4}-\d{2}-\d{2}(?(?=$)|(?=\D)). Note that it relies on conditional syntax, which PCRE and .NET support but java.util.regex does not.
It uses \d{4}-\d{2}-\d{2} to match your string format.
Also, the date cannot be preceded or followed by any digit:
(?(?<=^)|(?<=\D)) - a conditional look-behind: if we are at the beginning of the string, start matching; if not, make sure that the preceding character is not a digit (\D)
(?(?=$)|(?=\D)) - the analogous look-ahead: either we are at the end of the string, or the next character is not a digit.
Alternatively, you could use just \d{4}-\d{2}-\d{2}, which would also match dates that are adjacent to other digits or to each other.
I put it in as String regex4="(?(?<=^)|(?<=\D))\d{4}-\d{2}-\d{2}(?(?=$)|(?=\D))";
and Eclipse said:
Unknown inline modifier near index 2
(?(?<=^)|(?<=\D))\d{4}-\d{2}-\d{2}(?(?=$)|(?=\D))
^
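The error comes from the conditional group, which Java's regex engine does not recognise. A plain negative look-behind and look-ahead express the same "no digit on either side" rule, and they also hold at the start and end of the string. The sketch below assumes that this is the intended rule; the class name IsolatedDates and the method findDates are made up for illustration, and the pattern does not check that the month or day values are valid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class IsolatedDates {
    // (?<!\d) : the match is not preceded by a digit
    // (?!\d)  : the match is not followed by a digit
    private static final Pattern DATE =
            Pattern.compile("(?<!\\d)\\d{4}-\\d{2}-\\d{2}(?!\\d)");

    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String wyrazenie = "2007-01-12Jakistxt2008-01-31xxx2008-02-292008-15-102008-19-45"
                + "2009-05-0120999-11-11pppp2001-00-0109-01-012001-01-002009-01-111"
                + "2009-02-291998-11-11";
        // Only dates with no digit on either side survive:
        System.out.println(findDates(wyrazenie)); // prints [2007-01-12, 2008-01-31]
    }
}
```

Every other date in the sample string touches a digit on at least one side, which is why only two remain under this rule.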
I have a String in following format
Index: /aap/guru/asdte/atsAPI.tcl
===================================================================
RCS file: /autons/atsAPI.tcl,v
retrieving revision 1.41
Index: /aap/guru/asdte/atsAPI1.tcl
===================================================================
RCS file: /autons/atsAPI1.tcl,v
retrieving revision 1.41
What I want is to match each line that starts with Index: and then get the file name from the path.
That is, first find Index: /aap/guru/asdte/atsAPI.tcl and then extract atsAPI.tcl as the final result.
Currently I match twice: first the whole line, then the file name within it.
My question is how to do this with a single regular expression in Java.
My current code is:
String line = "Index: /aap/guru/asdte/atsAPI.tcl\r\n===================================================================\r\nRCS file: /autons/atsAPI.tcl,v\r\nretrieving revision 1.41\r\n\r\nIndex: /aap/guru/asdte/atsAPI1.tcl\r\n===================================================================\r\nRCS file: /autons/atsAPI1.tcl,v\r\nretrieving revision 1.41";
Pattern regex1 = Pattern.compile("Index:.*?\\n", Pattern.DOTALL);
Pattern regex2 = Pattern.compile("[^*/]+$");
Matcher matcher1 = regex1.matcher(line);
while (matcher1.find()) {
String s = matcher1.group(0);
Matcher matcher2 = regex2.matcher(s);
while (matcher2.find()) {
System.out.println(matcher2.group(0));
}
}
how to do it in a single regular expression in java.
Use a capturing group as shown below.
Regular Expression:
^Index:.*\/(.*)
Now the filename can be obtained by using matcher.group(1) and is represented by the last part (.*) in the regex
^ matches starting anchor
Index: matches the literal as-is
.* matches anything (greedy)
\/ matches a slash /
(.*) matches the filename in a capturing group
Make sure (?m) or Pattern.MULTILINE flag is set so that the matching is multi line and matches the starting anchor ^ at the start of every line.
EDIT: Modify your code to use only one regex, like this:
Pattern pattern = Pattern.compile("^Index:.*\\/(.*)", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
// Output:
atsAPI.tcl
atsAPI1.tcl
Try ^Index.+\/([^\.]+\.\w+)$ with the gm flags (in Java, Pattern.MULTILINE plus a find() loop), or Index.+\/([^\.]+\.\w+) without the anchors and the m flag. The only capturing group is for the name of the file.
Try the following regex, the answer is in the first match group:
Index:.*?\/([\w]+\.[\w]*)
I am trying to capture the word and the number in the following text:
stxt:usa,city:14
I can capture usa and 14 using:
stxt:(.*?),city:(\d.*)$
However, when the text is:
stxt:usa
the regex does not match. I tried to add an alternation with |, but it did not work:
stxt:(.*?),|city:(\d.*)$
You may use
(stxt|city):([^,]+)
See the regex demo (note the \n added only for the sake of the demo, you do not need it in real life).
Pattern details:
(stxt|city) - either the substring stxt or the substring city (you may add \b before the ( to match only a whole word) (Group 1)
: - a colon
([^,]+) - 1 or more characters other than a comma (Group 2).
Java demo:
String s = "stxt:usa,city:14";
Pattern pattern = Pattern.compile("(stxt|city):([^,]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Looking at your string, you could also find the word/digits after the colon.
:(\w+)
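As a rough Java sketch of this shorter pattern (the class name ValuesAfterColon and the method values are made up for illustration), each value is read from group 1, and the same code handles the input with only stxt:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class ValuesAfterColon {
    // A colon followed by one or more word characters, captured in group 1.
    private static final Pattern VALUE = Pattern.compile(":(\\w+)");

    static List<String> values(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = VALUE.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(values("stxt:usa,city:14")); // prints [usa, 14]
        System.out.println(values("stxt:usa"));         // prints [usa]
    }
}
```

This version ignores the key names, so it fits only when the order of the fields is known; the (stxt|city) pattern above is the safer choice when it is not.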
Hi, below is my text file:
welcome to java training
program
Name rtrti*&*
John
address india say^%$7
Date of Birth
11/12/1989
I have 100 files like the one above. The text was extracted from image files, so it is not in order. From this I need to get the names and dates of birth. Can you please suggest how to do this? I am new to this task.
Required output
John
11/12/1989
I have tried
Pattern p = Pattern.compile("Name");
Matcher matcher = p.matcher(content);
matcher.find();
But I have no idea how to get the line after the matched pattern. I cannot read this file line by line, because I need to store the entire text in a single string.
I'll give a few hints that will get you on track. Without more details regarding the expected input, it will be difficult to give you a solid solution. First, I trust that you are already familiar with the Pattern and Matcher javadocs. You will need to understand the Groups and capturing section. Finally, you can utilize DOTALL mode which will allow the . character to match newlines.
To get you started, the following should work to find the name:
Pattern p = Pattern.compile(
"(?s)" + // DOTALL
".*" + // Match anything (to consume everything before 'Name')
"Name" + // Match the literal 'Name'
".*?" + // Reluctantly grab everything until...
"\n" + // Newline is reached
"\\s*" + // Consume leading whitespace
"(\\S+)" // Capture at least one non-whitespace character
);
Matcher m = p.matcher(content);
if(m.find()) {
String name = m.group(1); // The first capturing group contains "John"
}
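The date of birth can be picked up along the same lines. The sketch below assumes that the date always follows the literal label "Date of Birth" after some whitespace and always has the shape dd/dd/dddd, as in the sample; the class name OcrFields and the method dateOfBirth are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class OcrFields {
    // The label, any whitespace (including the line break), then the captured date.
    private static final Pattern DOB =
            Pattern.compile("Date of Birth\\s+(\\d{2}/\\d{2}/\\d{4})");

    static Optional<String> dateOfBirth(String content) {
        Matcher m = DOB.matcher(content);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String content = "welcome to java training\nprogram\nName rtrti*&*\nJohn\n"
                + "address india say^%$7\nDate of Birth\n11/12/1989";
        System.out.println(dateOfBirth(content).orElse("not found")); // prints 11/12/1989
    }
}
```

Because \s+ already matches line breaks, DOTALL mode is not needed here. If the OCR output sometimes puts noise between the label and the date, the pattern would have to be loosened accordingly.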
I have the following strings:
<PAUL SAINT-KARL 1997-05-07>
<BOB DEAN 2001-05-07>
<GUY JEDDY 2007-05-07>
I want a Java regex that matches this "name and date" pattern and then extracts the name and date separately.
I am able to match them separately with the following Java regexes:
1) (\d{4}-\d{2}-\d{2})>
2) <([ A-Z&#;0-9-]*+)
What I'm looking for is one regex that would identify the full text pattern as provided, and then extract the subsections, such as the actual name, and the date.
I'm looking to use Matcher.group() to retrieve the complete match from the target string.
Thanks
Try this:
"<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>"
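I changed the possessive *+ to the lazy *?. The character class also matches spaces, digits and hyphens, so a possessive quantifier consumes the date as well and never gives it back, which means the rest of the pattern can never match.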
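A minimal sketch of how this might be used with Matcher follows; the class name NameAndDate and the method parse are made up for illustration. group() returns the complete match, group(1) the name and group(2) the date.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class NameAndDate {
    private static final Pattern ENTRY =
            Pattern.compile("<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>");

    // Returns {whole match, name, date}, or empty if the text does not match.
    static Optional<String[]> parse(String text) {
        Matcher m = ENTRY.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(), m.group(1), m.group(2) });
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<PAUL SAINT-KARL 1997-05-07>",
            "<BOB DEAN 2001-05-07>",
            "<GUY JEDDY 2007-05-07>"
        };
        for (String input : inputs) {
            parse(input).ifPresent(parts ->
                    System.out.println(parts[1] + " | " + parts[2]));
        }
    }
}
```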