Using regex pattern in Java - java

I created a regex pattern that works perfectly, but I can't get it working in Java:
(\\"|[^" ])+|"(\\"|[^"])*"
applied to
robocopy "C:\test" "C:\test2" /R:0 /MIR /NP
gives (as it should)
[0] => robocopy
[1] => "C:\test"
[2] => "C:\test2"
[3] => /R:0
[4] => /MIR
[5] => /NP
in group 0 according to http://myregextester.com/index.php
Now, how do I get those 6 values in Java?
I tried
Pattern p = Pattern.compile(" (\\\"|[^\" ])+ | \"(\\\"|[^\"])*\" ");
Matcher m = p.matcher(command);
System.out.println(m.matches()); // returns false
but the pattern doesn't even match anything at all?
Update
The original perl regex was:
(\\"|[^" ])+|"(\\"|[^"])*"

Because the Java compiler processes the string literal before the regex engine ever sees it, you need to double every backslash in the expression and put a backslash in front of every double quote.
Pattern p = Pattern.compile("(\\\\\"|[^\" ])+|\"(\\\\\"|[^\"])*\"");

The matches() method matches the whole string against the regex; it returns true only if the entire string matches.
What you are looking for is the find() method, combined with group() to get each matched substring.
This is usually done in a loop:
while (m.find()) {
String token = m.group(); // text of the current match
// post-processing
}
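Putting the escaped pattern and the find() loop together, a minimal sketch could look like the following. The class name CommandTokenizer and the helper tokenize are made up for illustration; the pattern and the sample command come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandTokenizer {
    // Regex as the engine sees it: (\\"|[^" ])+|"(\\"|[^"])*"
    private static final Pattern TOKEN =
            Pattern.compile("(\\\\\"|[^\" ])+|\"(\\\\\"|[^\"])*\"");

    // Hypothetical helper: collects every match of the pattern, in order.
    static List<String> tokenize(String command) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(command);
        while (m.find()) {
            tokens.add(m.group()); // group() is the whole match (group 0)
        }
        return tokens;
    }

    public static void main(String[] args) {
        String command = "robocopy \"C:\\test\" \"C:\\test2\" /R:0 /MIR /NP";
        for (String token : tokenize(command)) {
            System.out.println(token);
        }
    }
}
```

Each call to find() resumes where the previous match ended, so the spaces between tokens are simply skipped.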

matches() tries to match the pattern against the entire string. For your case you should use the find() method of the Matcher object.
So the solution is:
System.out.println(m.find());

Related

How to capture multiple groups in regex?

I am trying to capture following word, number:
stxt:usa,city:14
I can capture usa and 14 using:
stxt:(.*?),city:(\d.*)$
However, when the text is:
stxt:usa
the regex does not work. I tried adding an OR condition with |, but that did not work either.
stxt:(.*?),|city:(\d.*)$
You may use
(stxt|city):([^,]+)
See the regex demo (note the \n added only for the sake of the demo, you do not need it in real life).
Pattern details:
(stxt|city) - either the substring stxt or city (Group 1); you may add \b before the ( to match only whole words
: - a colon
([^,]+) - 1 or more characters other than a comma (Group 2).
Java demo:
String s = "stxt:usa,city:14";
Pattern pattern = Pattern.compile("(stxt|city):([^,]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Looking at your string, you could also find the word/digits after the colon.
:(\w+)
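To show that the same pattern handles both the full input and the shorter stxt:usa case, here is a small sketch that collects the pairs into a map. The class name KeyValueExtractor and the method parse are illustrative assumptions, and the optional \b mentioned above is included.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueExtractor {
    // Group 1 is the key (stxt or city), group 2 is everything up to the next comma.
    private static final Pattern PAIR = Pattern.compile("\\b(stxt|city):([^,]+)");

    // Hypothetical helper: returns the key/value pairs in the order they appear.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("stxt:usa,city:14"));
        System.out.println(parse("stxt:usa"));
    }
}
```

Because each pair is matched on its own, a missing city part simply yields a map with one entry instead of a failed match.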

Regex not filter by delimiters

I want to create a regular expression that matches only when my numbers are separated by a comma.
For example:
1 OK
1,2,3 OK
1\n2,3 OK
1,\n Not OK
1,,2 Not OK
1,\n2 Not Ok
So far I have created this expression
\d+(([,.|\n])+\d+)*
If I change the last * to be at least 1 with +
\d+(([,.|\n])+\d+)+
Then all the previous scenarios work, but not this one
1 Not OK // but it should be OK
I'm using matcher.find()
Matcher matcher = Pattern.compile(pattern).matcher(number);
if (matcher.find()) {
System.out.println("total number:" + matcher.group(0));
}
Any idea what I'm doing wrong in my regex?
You can use this regex:
^\d+(?:(?:,|\n)\d+)*$
Java regex:
Pattern p = Pattern.compile("^\\d+(?:(?:,|\\n)\\d+)*$");
PS: To match a literal \n (a backslash followed by the letter n) instead of a newline character, you will need:
^\d+(?:(?:,|\\n)\d+)*$
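As a quick check against the six cases from the question, here is a minimal sketch. It assumes validation with matches() rather than find(); since matches() already requires the whole input to match, the ^ and $ anchors are left out. The class name NumberListValidator and the method isValid are illustrative.

```java
import java.util.regex.Pattern;

class NumberListValidator {
    // Digits, then zero or more groups of "one comma or one newline, then digits".
    private static final Pattern NUMBER_LIST =
            Pattern.compile("\\d+(?:(?:,|\\n)\\d+)*");

    // Hypothetical helper: true only if the entire input fits the pattern.
    static boolean isValid(String input) {
        return NUMBER_LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1", "1,2,3", "1\n2,3", "1,\n", "1,,2", "1,\n2"};
        for (String s : samples) {
            // Show newlines as \n so every sample stays on one output line.
            System.out.println(s.replace("\n", "\\n") + " -> " + (isValid(s) ? "OK" : "Not OK"));
        }
    }
}
```

The original attempt accepted inputs such as 1,,2 because the delimiter class was itself repeated with +; allowing exactly one delimiter before each number is what rejects them.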

Substring between lines using Regular Expression Java

Hi, I have the following string:
abc test ...
interface
somedata ...
xxx ...
!
sdfff as ##
example
yyy sdd ## .
!
I need to find the content between a line containing the word "interface" or "example" and a line containing "!".
The required output would be something like this:
String[] output= {"somedata ...\nxxx ...\n","yyy sdd ## .\n"} ;
I can do this manually using substring and iteration, but I want to achieve it with a regular expression.
Is it possible?
This is what I have tried:
String sample="abc\ninterface\nsomedata\nxxx\n!\nsdfff\ninterface\nyyy\n!\n";
Pattern pattern = Pattern.compile("(?m)\ninterface(.*?)\n!\n");
Matcher m =pattern.matcher(sample);
while (m.find()) {
System.out.println(m.group());
}
Am I right? Please suggest the right way of doing it.
Edit:
A small change: I want to find the content between a line "interface" or "example" and a line "!".
Can we achieve this using regex too?
You could use the (?s) DOTALL modifier.
String sample="abc\ninterface\nsomedata\nxxx\n!\nsdfff\ninterface\nyyy\n!\n";
Pattern pattern = Pattern.compile("(?s)(?<=\\ninterface\\n).*?(?=\\n!\\n)");
Matcher m =pattern.matcher(sample);
while (m.find()) {
System.out.println(m.group());
}
Output:
somedata
xxx
yyy
Note that the input in your example is different.
(?<=\\ninterface\\n) Asserts that the match must be preceded by the characters which are matched by the pattern present inside the positive lookbehind.
(?=\\n!\\n) Asserts that the match must be followed by the characters which are matched by the pattern present inside the positive lookahead.
Update:
Pattern pattern = Pattern.compile("(?s)(?<=\\n(?:example|interface)\\n).*?(?=\\n!\\n)");
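For completeness, a sketch that applies the updated pattern to input shaped like the one in the question and collects the blocks into a list. The class name BlockExtractor and the method extractBlocks are assumptions for illustration. Note that the lookahead leaves the final newline of each block out of the match, unlike the desired output in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockExtractor {
    // (?s) lets . match newlines; the lookarounds keep the marker lines out of the match.
    private static final Pattern BLOCK =
            Pattern.compile("(?s)(?<=\\n(?:example|interface)\\n).*?(?=\\n!\\n)");

    // Hypothetical helper: returns the text between each marker line and the next "!" line.
    static List<String> extractBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "abc test\ninterface\nsomedata\nxxx\n!\nsdfff as ##\nexample\nyyy sdd ## .\n!\n";
        for (String block : extractBlocks(sample)) {
            System.out.println("[" + block + "]");
        }
    }
}
```

Java requires a lookbehind to have a bounded length, which holds here because both alternatives are fixed strings.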

Get String in between either single quotes or empty space

I would like a regular expression that gives me the name of the class loader when it is enclosed either in single quotes or in whitespace, but not in a mixture of both.
Some examples:
2014-05-21 22:05:13.685 TRACE [Core]
sun.misc.Launcher$AppClassLoader#62c8aeb3 searching for resource
'java/util/LoggerFactory.class'.
expected output sun.misc.Launcher$AppClassLoader#62c8aeb3
2014-05-21 22:05:13.685 TRACE [Core] Class
'org.jboss.weld.metadata.TypeStore' not found in classloader
'org.jboss.modules.ModuleClassLoader#4ebded0b'.
expected output org.jboss.modules.ModuleClassLoader#4ebded0b
2014-05-21 22:04:34.591 INFO [Core] Started plugin
org.zeroturnaround.javarebel.integration.IntegrationPlugin from
/Users/endragor/Downloads/jrebel/jrebel.jar in
sun.misc.Launcher$AppClassLoader#62c8aeb3
expected output sun.misc.Launcher$AppClassLoader#62c8aeb3
Note that in the last example the line ends with a newline character, i.e. nothing follows the class loader name.
This is what I have tried ".*[\\s|'](.*ClassLoader.*[^']*)['|\\s].*". But it doesn't work. For the first example it gives below rather than sun.misc.Launcher$AppClassLoader#62c8aeb3:
sun.misc.Launcher$AppClassLoader#62c8aeb3 searching for resource
'java/util/LoggerFactory.class
Also, my regex does not handle the case where the class loader string is at the end of the line, i.e. example 3 above. What can I do so that either ' or \\s is considered, but not both?
Try this one:
String extractedValue=yourString.replaceAll("(.*)([ '])(.*ClassLoader.*?)(\\2)(.*)", "$3");
Whenever we want to extract a string between a predefined set of delimiters, where the first and last delimiter must be the same, we can use a backreference.
This regex should do it without grouping:
[^\s']*ClassLoader[^\s']*
In Java it should be:
[^\\s']*ClassLoader[^\\s']*
You don't need the pipe | inside [..]; in regex, [abcd] means a or b or c or d.
Update
Adding the Java code:
public static void main(String[] args){
Pattern p = Pattern.compile("[^\\s']*ClassLoader[^\\s']*");
Matcher m = p.matcher("2014-05-21 22:05:13.685 TRACE [Core] sun.misc.Launcher$AppClassLoader#62c8aeb3 searching for resource 'java/util/LoggerFactory.class'.");
if (m.find()) {
System.out.println(m.group());
}
}
Output:
sun.misc.Launcher$AppClassLoader#62c8aeb3
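The same pattern can be checked against all three log entries from the question, including the one where the name is quoted and the one where it ends the line. This is only a sketch: the class name ClassLoaderNameExtractor and the method extract are made up, and each multi-line entry is assumed to be joined with newline characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassLoaderNameExtractor {
    // A run of characters that are neither whitespace nor ', containing "ClassLoader".
    private static final Pattern NAME = Pattern.compile("[^\\s']*ClassLoader[^\\s']*");

    // Hypothetical helper: returns the first match, or null when there is none.
    static String extract(String logEntry) {
        Matcher m = NAME.matcher(logEntry);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] entries = {
            "2014-05-21 22:05:13.685 TRACE [Core]\nsun.misc.Launcher$AppClassLoader#62c8aeb3 searching for resource\n'java/util/LoggerFactory.class'.",
            "2014-05-21 22:05:13.685 TRACE [Core] Class\n'org.jboss.weld.metadata.TypeStore' not found in classloader\n'org.jboss.modules.ModuleClassLoader#4ebded0b'.",
            "2014-05-21 22:04:34.591 INFO [Core] Started plugin\norg.zeroturnaround.javarebel.integration.IntegrationPlugin from\n/Users/endragor/Downloads/jrebel/jrebel.jar in\nsun.misc.Launcher$AppClassLoader#62c8aeb3\n"
        };
        for (String entry : entries) {
            System.out.println(extract(entry));
        }
    }
}
```

The lowercase word classloader in the second entry is skipped because the match is case-sensitive.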
As the OP has changed the original post, here is the updated answer.
Simply use the pattern below to check for the default toString() representation of the Object class.
[\w\.$]+ClassLoader#[a-z0-9]+
Pattern explanation:
\w A word character: [a-zA-Z_0-9]
X+ X, one or more times
[abc] a, b, or c (simple class)

Bug in java.util.regex in sun jdk 6.0.24?

The following code blocks on my system. Why?
System.out.println( Pattern.compile(
"^((?:[^'\"][^'\"]*|\"[^\"]*\"|'[^']*')*)/\\*.*?\\*/(.*)$",
Pattern.MULTILINE | Pattern.DOTALL ).matcher(
"\n\n\n\n\n\nUPDATE \"$SCHEMA\" SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';"
).matches() );
The pattern (designed to detect comments of the form /*...*/ but not within ' or ") should be fast, as it is deterministic...
Why does it take soooo long?
You're running into catastrophic backtracking.
Looking at your regex, it's easy to see how .*? and (.*) can match the same content since both also can match the intervening \*/ part (dot matches all, remember). Plus (and even more problematic), they can also match the same stuff that ((?:[^'"][^'"]*|"[^"]*"|'[^']*')*) matches.
The regex engine gets bogged down in trying all the permutations, especially if the string you're testing against is long.
I've just checked your regex against your string in RegexBuddy. It aborts the match attempt after 1.000.000 steps of the regex engine. Java will keep churning on until it gets through all permutations or until a Stack Overflow occurs...
You can greatly improve the performance of your regex by prohibiting backtracking into stuff that has already been matched. You can use atomic groups for this, changing your regex into
^((?>[^'"]+|"[^"]*"|'[^']*')*)(?>/\*.*?\*/)(.*)$
or, as a Java string:
"^((?>[^'\"]+|\"[^\"]*\"|'[^']*')*)(?>/\\*.*?\\*/)(.*)$"
This reduces the number of steps the regex engine has to go through from > 1 million to 58.
Be advised though that this will only find the first occurrence of a comment, so you'll have to apply the regex repeatedly until it fails.
I recommend that you read Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...).
I think it's because of this bit:
(?:[^'\"][^'\"]*|\"[^\"]*\"|'[^']*')*
Removing the second and third alternatives gives you:
(?:[^'\"][^'\"]*)*
or:
(?:[^'\"]+)*
Repeated repeats can take a long time.
To detect /* ... */ comments, I would suggest code like this:
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" /*a comment\n\n*/ SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
Pattern pt = Pattern.compile("\"[^\"]*\"|'[^']*'|(/\\*.*?\\*/)",
Pattern.MULTILINE | Pattern.DOTALL);
Matcher matcher = pt.matcher(str);
boolean found = false;
while (matcher.find()) {
if (matcher.group(1) != null) {
found = true;
break;
}
}
if (found)
System.out.println("Found Comment: [" + matcher.group(1) + ']');
else
System.out.println("Didn't find Comment");
For the above string it prints:
Found Comment: [/*a comment
*/]
But if I change the input string to:
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" '/*a comment\n\n*/' SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
OR
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" \"/*a comment\n\n*/\" SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
Output is:
Didn't find Comment
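The same idea extends from detecting the first comment to collecting every comment outside quotes. The sketch below is a variation on the code above, not part of the original answer; the class name CommentFinder and the method findComments are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentFinder {
    // Quoted strings are consumed by the first two alternatives and ignored;
    // only the third alternative captures, so group 1 is non-null just for comments.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|'[^']*'|(/\\*.*?\\*/)", Pattern.DOTALL);

    // Hypothetical helper: returns all /* ... */ comments that are not inside quotes.
    static List<String> findComments(String sql) {
        List<String> comments = new ArrayList<>();
        Matcher m = TOKEN.matcher(sql);
        while (m.find()) {
            if (m.group(1) != null) {
                comments.add(m.group(1));
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        String sql = "UPDATE \"$SCHEMA\" /*a*/ SET x = '/*b*/' /*c\n*/;";
        System.out.println(findComments(sql));
    }
}
```

Because each alternative consumes a whole quoted string or a whole comment in one step, there is no nested repetition for the engine to backtrack through.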
