Trying to find apt regex expression for my query - java

Pattern p1 = Pattern.compile("(?:^|)'([^']*?)'(?:$|)");
Matcher m = p1.matcher(input);
//Matcher m = p2.matcher(testcases);
while (m.find()) {
output += (m.group().replace("\'", "").trim() + "/");
}
Input
/content/folder[#name='folder query 2']/folder[#name='Share file
Zone']/folder[#name='steve']/folder[#name="steve's Personal
Folder"]/folder[#name='Backup']/folder[#name='20150317']/folder[#name='.Archive']
output should be -
folder query 2,
Share file zone,
steve,
steve's Personal folder,
Backup,
20150317,
.Archive
For some reason my regex seems to match only names wrapped in single quotes, so it handles neither the apostrophe in steve's nor the double quotes around that name. I am trying to format the query, so all I need is the folder names, whether they are wrapped in single or double quotes, with any apostrophe inside a name left intact.

Use the following regex: (['"])(.*?)\1
It matches the opening quote (single or double), capturing that character as capture #1, captures the text as capture #2, and ends with the same kind of quote used at the beginning, by matching the capture #1.
Remember to escape " and \ when writing as Java string literal.
Test
String input = "/content/folder[#name='folder query 2']/folder[#name='Share file Zone']/folder[#name='steve']/folder[#name=\"steve's Personal Folder\"]/folder[#name='Backup']/folder[#name='20150317']/folder[#name='.Archive']";
for (Matcher m = Pattern.compile("(['\"])(.*?)\\1").matcher(input); m.find(); )
System.out.println(m.group(2));
Output
folder query 2
Share file Zone
steve
steve's Personal Folder
Backup
20150317
.Archive

You can capture everything between [#name=['\"] and ['\"]]. Your regex should look like this: \\[#name=['\"](.*?)['\"]]
Pattern p1 = Pattern.compile("\\[#name=['\"](.*?)['\"]]");
Matcher m = p1.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Output
folder query 2
Share file Zone
steve
steve's Personal Folder
Backup
20150317
.Archive

Related

How to capture multiple groups in regex?

I am trying to capture following word, number:
stxt:usa,city:14
I can capture usa and 14 using:
stxt:(.*?),city:(\d.*)$
However, when the text is:
stxt:usa
the regex does not match. I tried an alternation with |, but that did not work either:
stxt:(.*?),|city:(\d.*)$
You may use
(stxt|city):([^,]+)
See the regex demo (note the \n added only for the sake of the demo, you do not need it in real life).
Pattern details:
(stxt|city) - either stxt or city (Group 1); you may add \b before the ( to match only a whole word
: - a colon
([^,]+) - 1 or more characters other than a comma (Group 2).
Java demo:
String s = "stxt:usa,city:14";
Pattern pattern = Pattern.compile("(stxt|city):([^,]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Looking at your string, you could also find the word/digits after the colon.
:(\w+)
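A minimal sketch of that shorter pattern, assuming the values consist only of word characters (letters, digits, underscore). The class and method names are illustrative, not part of the original answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonValues {
    // Hypothetical helper: collects every run of word characters that follows a colon.
    static List<String> valuesAfterColon(String s) {
        List<String> values = new ArrayList<>();
        Matcher matcher = Pattern.compile(":(\\w+)").matcher(s);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(valuesAfterColon("stxt:usa,city:14")); // [usa, 14]
        System.out.println(valuesAfterColon("stxt:usa"));         // [usa]
    }
}
```

Unlike (stxt|city):([^,]+), this does not check which key precedes the colon, so it only fits input where every colon introduces a value you want.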

How to extract date section from filename?

I need to find a regex to extract date section from the name of several files.
In particular I have these two formats:
ATC0200720140828080610.xls
ATC0200720140901080346_UFF_ACC.xls
I use these two regex to check file name format:
^ATC02007[0-9]{14}.xls$
^ATC02007[0-9]{14}_UFF_ACC.xls$
But I need a regex to extract a specific section:
constant | yyyyMMddHHmmss | constant
^ ^ ^
ATC02007 | 20140901080346 | _UFF_ACC.xls
Both regexes I'm using match the entire file name, so I can't use them to extract the middle section. Which is the right expression?
You are almost there. Just put parentheses around the digits you want to capture (and escape the dot so it matches only a literal dot).
^ATC02007([0-9]{14})(_UFF_ACC)?\.xls$
The digits are captured in group 1 ($1).
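In Java, reading that first group could look like the sketch below. It assumes the dot before xls is escaped so that it matches only a literal dot; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileDate {
    // Group 1 is the 14-digit yyyyMMddHHmmss section; the _UFF_ACC suffix is optional.
    private static final Pattern NAME =
            Pattern.compile("^ATC02007([0-9]{14})(_UFF_ACC)?\\.xls$");

    // Hypothetical helper: returns the date section, or null if the name has another format.
    static String extractTimestamp(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractTimestamp("ATC0200720140828080610.xls"));         // 20140828080610
        System.out.println(extractTimestamp("ATC0200720140901080346_UFF_ACC.xls")); // 20140901080346
    }
}
```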
You need to use capturing groups.
^(ATC02007)([0-9]{14})((?:[^.]*)?\\.xls)$
Group 1 contains the first constant, group 2 contains the date and time, and group 3 contains the trailing constant.
String s = "ATC0200720140828080610.xls\n" +
"ATC0200720140901080346_UFF_ACC.xls";
Pattern regex = Pattern.compile("(?m)^(ATC02007)([0-9]{14})((?:[^.]*)?\\.xls)$");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
Output:
ATC02007
20140828080610
.xls
ATC02007
20140901080346
_UFF_ACC.xls

extracting a particular field from url

I want to extract a particular field from the URL of a Facebook page. I am not able to extract it because the link format is not fixed. For example, given the link below as input, it should produce the output shown:
1) https://www.facebook.com/pages/Ice-cream/109301862430120?rf=102173023157556
Output: 109301862430120
What about this type of link?
Can anyone help me?
So in short, you want the name after the last / and before the ? (if there is one).
You can do it with the URI and File classes, like this:
String data = "https://www.facebook.com/pages/Anti-Christian-sentiment/149675731889496?ref=br_tf";
System.out.println(new File(new URI(data).getRawPath()).getName());
Output: 149675731889496
If you need to use regex then you can use
([^/?]+)(\\?|$)
and just read the content of group 1 (the one in the first pair of parentheses).
If you don't want to use groups and want the regex to match only the digit part (without including the ? in the match), you can use a look-around mechanism such as look-ahead (?=...). The regex would then look like
[^/?]+(?=\\?|$)
Code example:
String data = "https://www.facebook.com/pages/Anti-Christian-sentiment/149675731889496?ref=br_tf";
Pattern p = Pattern.compile("([^/?]+)(\\?|$)");
Matcher m = p.matcher(data);
if (m.find()){
System.out.println(m.group(1));
}
Output:
149675731889496
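The look-ahead variant mentioned above can be sketched the same way. Here the whole match is the wanted segment, so no group is needed. The class and method names are illustrative, and the sketch assumes the path segment of interest is the first one followed by ? or by the end of the string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastSegment {
    // The look-ahead (?=\?|$) checks for '?' or end of input without consuming it.
    private static final Pattern SEGMENT = Pattern.compile("[^/?]+(?=\\?|$)");

    // Hypothetical helper: returns the first segment followed by '?' or end of input.
    static String lastPathSegment(String url) {
        Matcher m = SEGMENT.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(lastPathSegment(
                "https://www.facebook.com/pages/Anti-Christian-sentiment/149675731889496?ref=br_tf"));
        // 149675731889496
    }
}
```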

how to get all names and date of births from a specific file using java

Hi below is my text file
welcome to java training
program
Name rtrti*&*
John
address india say^%$7
Date of Birth
11/12/1989
I have 100 files like the one above. The text was extracted from image files, so it is not in order. From this I need to get the names and dates of birth. Can you please suggest how to do this? I am new to this task.
Required output
John
11/12/1989
I have tried
Pattern p = Pattern.compile("Name");
Matcher matcher = p.matcher(content);
matcher.find();
But I have no idea how to get the line that follows the matched pattern. I cannot read the file line by line, because I need to store the entire text in a single string.
I'll give a few hints that will get you on track. Without more details regarding the expected input, it will be difficult to give you a solid solution. First, I trust that you are already familiar with the Pattern and Matcher javadocs. You will need to understand the Groups and capturing section. Finally, you can utilize DOTALL mode which will allow the . character to match newlines.
To get you started, the following should work to find the name:
Pattern p = Pattern.compile(
"(?s)" + // DOTALL
".*" + // Match anything (to consume everything before 'Name')
"Name" + // Match the literal 'Name'
".*?" + // Reluctantly grab everything until...
"\n" + // Newline is reached
"\\s*" + // Consume leading whitespace
"(\\S+)" // Capture at least one non-whitespace character
);
Matcher m = p.matcher(content);
if(m.find()) {
String name = m.group(1); // The first capturing group contains "John"
}
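The date of birth can be approached the same way. The following is only a sketch under the assumption that the date always follows the literal text "Date of Birth" and is written with slashes, as in the sample (11/12/1989). The class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OcrFields {
    // (?s) lets '.' cross line breaks; .*? stops at the first date-like token.
    private static final Pattern DOB =
            Pattern.compile("(?s)Date of Birth.*?(\\d{1,2}/\\d{1,2}/\\d{4})");

    // Hypothetical helper: returns the first slash-separated date after "Date of Birth", or null.
    static String dateOfBirth(String content) {
        Matcher m = DOB.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String content = "welcome to java training\nprogram\nName rtrti*&*\nJohn\n"
                + "address india say^%$7\nDate of Birth\n11/12/1989\n";
        System.out.println(dateOfBirth(content)); // 11/12/1989
    }
}
```

Because the text comes from image extraction, real files may need a looser pattern (other separators, stray characters), which this sketch does not attempt to cover.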

Bug in java.util.regex in sun jdk 6.0.24?

The following code blocks on my system. Why?
System.out.println( Pattern.compile(
"^((?:[^'\"][^'\"]*|\"[^\"]*\"|'[^']*')*)/\\*.*?\\*/(.*)$",
Pattern.MULTILINE | Pattern.DOTALL ).matcher(
"\n\n\n\n\n\nUPDATE \"$SCHEMA\" SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';"
).matches() );
The pattern (designed to detect comments of the form /*...*/ but not within ' or ") should be fast, as it is deterministic...
Why does it take soooo long?
You're running into catastrophic backtracking.
Looking at your regex, it's easy to see how .*? and (.*) can match the same content since both also can match the intervening \*/ part (dot matches all, remember). Plus (and even more problematic), they can also match the same stuff that ((?:[^'"][^'"]*|"[^"]*"|'[^']*')*) matches.
The regex engine gets bogged down in trying all the permutations, especially if the string you're testing against is long.
I've just checked your regex against your string in RegexBuddy. It aborts the match attempt after 1.000.000 steps of the regex engine. Java will keep churning on until it gets through all permutations or until a Stack Overflow occurs...
You can greatly improve the performance of your regex by prohibiting backtracking into stuff that has already been matched. You can use atomic groups for this, changing your regex into
^((?>[^'"]+|"[^"]*"|'[^']*')*)(?>/\*.*?\*/)(.*)$
or, as a Java string:
"^((?>[^'\"]+|\"[^\"]*\"|'[^']*')*)(?>/\\*.*?\\*/)(.*)$"
This reduces the number of steps the regex engine has to go through from > 1 million to 58.
Be advised though that this will only find the first occurrence of a comment, so you'll have to apply the regex repeatedly until it fails.
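A small sketch for trying the atomic-group version against the input from the question. The class and method names are illustrative; the pattern and flags are the ones shown above:

```java
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // Atomic groups (?>...) stop the engine from re-trying text they already consumed.
    private static final Pattern COMMENT = Pattern.compile(
            "^((?>[^'\"]+|\"[^\"]*\"|'[^']*')*)(?>/\\*.*?\\*/)(.*)$",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean matchesWholeInput(String sql) {
        return COMMENT.matcher(sql).matches();
    }

    public static void main(String[] args) {
        String sql = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" SET \"VERSION\" = 12 "
                + "WHERE NAME = 'SOMENAMEVALUE';";
        // Returns promptly instead of hanging; there is no comment, so it prints false.
        System.out.println(matchesWholeInput(sql));
    }
}
```

One caveat worth checking before relying on it: because an atomic group cannot give characters back, [^'"]+ also swallows a /* that follows unquoted text. As written, the pattern therefore appears to reach a comment only when it starts the input or directly follows a quoted string.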
I recommend that you read Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...).
I think it's because of this bit:
(?:[^'\"][^'\"]*|\"[^\"]*\"|'[^']*')*
Removing the second and third alternatives gives you:
(?:[^'\"][^'\"]*)*
or:
(?:[^'\"]+)*
Repeated repeats can take a long time.
For detecting /* ... */ comments I would suggest code like this:
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" /*a comment\n\n*/ SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
Pattern pt = Pattern.compile("\"[^\"]*\"|'[^']*'|(/\\*.*?\\*/)",
Pattern.MULTILINE | Pattern.DOTALL);
Matcher matcher = pt.matcher(str);
boolean found = false;
while (matcher.find()) {
if (matcher.group(1) != null) {
found = true;
break;
}
}
if (found)
System.out.println("Found Comment: [" + matcher.group(1) + ']');
else
System.out.println("Didn't find Comment");
For above string it prints:
Found Comment: [/*a comment
*/]
But if I change input string to:
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" '/*a comment\n\n*/' SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
OR
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" \"/*a comment\n\n*/\" SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
Output is:
Didn't find Comment
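The same alternation can also collect every comment instead of stopping at the first one. A minimal sketch, with hypothetical class and method names, that keeps only matches where group 1 took part:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentFinder {
    // Quoted strings are matched first and skipped; only the third alternative sets group 1.
    private static final Pattern TOKEN = Pattern.compile(
            "\"[^\"]*\"|'[^']*'|(/\\*.*?\\*/)", Pattern.DOTALL);

    // Hypothetical helper: returns all /*...*/ comments that are not inside quotes.
    static List<String> findComments(String sql) {
        List<String> comments = new ArrayList<>();
        Matcher m = TOKEN.matcher(sql);
        while (m.find()) {
            if (m.group(1) != null) {
                comments.add(m.group(1));
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        System.out.println(findComments("a /*one*/ 'x /*no*/' \"/*no*/\" /*two*/"));
        // [/*one*/, /*two*/]
    }
}
```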
