Are regex engines of Java and Groovy the same?

I am writing some regex-based code in Groovy. To create and test my regexes, however, I use books that reference the Java regex engine and the Java-oriented tester at http://www.regexplanet.com/advanced/java/index.html.
That makes me a bit worried: is the Groovy regex engine really the same as the Java one? I know they are very close, but are there any differences nevertheless? If you know the answer, could you kindly point me to a reference on the subject?

From the language documentation:
The pattern operator (~) provides a simple way to create a java.util.regex.Pattern instance.
I can't find wording in the documentation that guarantees this is the regular expression engine used for pattern matching throughout Groovy. I do, however, find it extremely unlikely that Groovy would use two RE engines in its implementation now or switch RE engines in the future.

"Because Groovy is based on Java, you can use Java's regular expression package with Groovy. Simply put import java.util.regex.* at the top of your Groovy source code. Any Java code using regular expressions will then automatically work in your Groovy code too."
Source: regular-expressions.info

Here is a Groovy example that uses a regex match together with findAll:
assert ['abc'] == ['def', 'abc', '123'].findAll { it =~ /abc/ }
You can find more examples here (including the sample above), thanks to Mr. Haki.
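For comparison, the same filter can be sketched in plain Java on top of java.util.regex. This assumes that Groovy's =~ produces a java.util.regex.Matcher and that such a Matcher, used as a boolean, behaves like find(). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: a Java counterpart of the Groovy findAll { it =~ /abc/ } sample.
final class FindAllSketch {

    // Keeps every item in which the regex is found somewhere (find(), not matches()).
    static List<String> findAll(List<String> items, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String item : items) {
            if (pattern.matcher(item).find()) {
                kept.add(item);
            }
        }
        return kept;
    }
}
```

If both snippets select the same elements for your own patterns, that is a reasonable sign that a Java-oriented tester is a fair stand-in for Groovy.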

Related

Different result between Javascript and Java regular expression matches

I am trying to match some patterns in a String containing Elasticsearch's structured bulk requests. Here is an example:
index {[event_20191209][event][null], source[{"haha":"haha","jaja":"jaja"}]}, update {[event_20191209][event][xxx], doc_as_upsert[false], doc[index {[null][_doc][null], source[{"haha":"haha","jaja":"jaja"}]}], scripted_upsert[false], detect_noop[true]}, delete {[event_20191208][_doc][sjdos]}, update {[event_20191209][event][yyy], doc_as_upsert[false], upsert[index {[null][_doc][null], source[{"haha":"haha","jaja":"jaja"}]}], scripted_upsert[false], detect_noop[true]}
My goal is to match every separate request in the bulk request string, i.e. to get strings like:
index {[event_20191209][event][null], source[{"haha":"haha","jaja":"jaja"}]},
update {[event_20191209][event][xxx], doc_as_upsert[false], doc[index {[null][_doc][null], source[{"haha":"haha","jaja":"jaja"}]}], scripted_upsert[false], detect_noop[true]},
delete {[event_20191208][_doc][sjdos]},
update {[event_20191209][event][yyy], doc_as_upsert[false], upsert[index {[null][_doc][null], source[{"haha":"haha","jaja":"jaja"}]}], scripted_upsert[false], detect_noop[true]}
My pattern is [a-z]+\s\{.+?\}[,\w\t\r\n]+? which works fine in a JavaScript-based online regular expression tester.
However, when I copied this pattern into my Java code, the output was not what I expected.
So I realized there are some differences between the JavaScript and Java regular expression engines, but even after a lot of coding and googling I cannot figure out how to update my expression so that it works in Java.
I would be very grateful for any help or hints.

After a short nap, I had an epiphany. I was a fool this morning...
The workaround is easy to implement: Elasticsearch has already overridden toString() well for us.

At first glance, I wouldn't suggest using regex right away. It looks like those lines follow some kind of pattern that you could parse and split up first.
After that, if you still want to use regex, I'd try:
Taking a look at the Java regex format: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Using an online Java regex tool instead of a JavaScript one.
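Two things are worth ruling out when a pattern moves from an online tester into Java source: every backslash has to be doubled inside a Java string literal, and Matcher.find() scans for successive matches while matches() demands that the whole input match. The sketch below applies the asker's pattern that way to a shortened, made-up input. It is not a diagnosis of the original problem, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that applies the pattern from the question in Java.
final class BulkRequestSplitSketch {

    // Same pattern as in the question; only the backslashes are doubled for the string literal.
    private static final Pattern REQUEST =
            Pattern.compile("[a-z]+\\s\\{.+?\\}[,\\w\\t\\r\\n]+?");

    // Collects every successive match using find().
    static List<String> splitRequests(String bulk) {
        List<String> requests = new ArrayList<>();
        Matcher matcher = REQUEST.matcher(bulk);
        while (matcher.find()) {
            requests.add(matcher.group());
        }
        return requests;
    }
}
```

Note that the final character class needs at least one character after the closing brace, so a request sitting at the very end of the input with nothing after it is not matched by this pattern as written.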

how can I simplify a regular expression generated from a DFA

I am trying to implement the Brzozowski algebraic method in Java to generate the regular expression for the language accepted by a given DFA. The generated expression is correct but not simplified.
For example:
E|(E)e(e)|(A|(E)e(A))(A|e|(B)e(A))*(B|(B)e(e))|(B|(E)e(B)|(A|(E)e(A))(A|e|(B)e(A))*(E|(B)e(B)))(B|e|(E)e(B)|(A|(E)e(A))(A|e|(B)e(A))*(E|(B)e(B)))*(E|(E)e(e)|(A|(E)e(A))(A|e|(B)e(A))*(B|(B)e(e)))
(e = epsilon, E = the empty set)
instead of : (A|B)*AB
The "transitive closure method" returns nearly the same result.
One solution is to minimize the automaton first, but I think that is too heavy just to obtain a simplified regular expression.
Also, using Java's regular expression methods to simplify a regular expression is not pretty at all :).
So I would appreciate any help in finding a solution.

Yes, it is possible. Take a look at this Peter Norvig article. Oh! There is a second part. The solution is in Python, but you can easily adapt it.
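Independently of that article, a cheap first step is to apply the basic algebraic identities while the expression is being built, so that most E and e terms never appear in the output. The following is a minimal sketch of that idea, not the article's approach and not a full simplifier. It assumes alphabet symbols are single letters other than E and e, and all names are invented for illustration.

```java
// Hypothetical "smart constructors" that simplify while building an expression.
final class RegexSimplifySketch {

    // A piece of regex text plus its precedence: 0 = union, 1 = concatenation, 2 = atom or star.
    static final class Re {
        final String text;
        final int prec;

        Re(String text, int prec) {
            this.text = text;
            this.prec = prec;
        }

        // Adds parentheses only when this piece binds more loosely than the context requires.
        String at(int minPrec) {
            return prec < minPrec ? "(" + text + ")" : text;
        }

        boolean is(Re other) {
            return text.equals(other.text);
        }
    }

    static final Re EMPTY = new Re("E", 2); // the empty set, as in the question
    static final Re EPS = new Re("e", 2);   // epsilon, as in the question

    static Re sym(char c) {
        return new Re(String.valueOf(c), 2);
    }

    static Re union(Re a, Re b) {
        if (a.is(EMPTY)) return b;          // E|r = r
        if (b.is(EMPTY)) return a;          // r|E = r
        if (a.is(b)) return a;              // r|r = r
        return new Re(a.text + "|" + b.text, 0);
    }

    static Re concat(Re a, Re b) {
        if (a.is(EMPTY) || b.is(EMPTY)) return EMPTY; // Er = rE = E
        if (a.is(EPS)) return b;                      // er = r
        if (b.is(EPS)) return a;                      // re = r
        return new Re(a.at(1) + b.at(1), 1);
    }

    static Re star(Re a) {
        if (a.is(EMPTY) || a.is(EPS)) return EPS;     // E* = e* = e
        return new Re(a.at(2) + "*", 2);
    }
}
```

With these constructors, the fragment E|(E)e(e) from the example collapses to E and A|(E)e(A) collapses to A. Getting all the way down to (A|B)*AB needs stronger rewriting than these identities provide.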

understanding regex if then statements

I'm not sure I understand how regex conditionals work and would like a simple explanation; I probably have it way off. A pure regex solution is required, and I don't know if this is possible. If it is, a solution would be awesome too, but a shove in the right direction would be good for my learning process ^_^
This is how I thought the if/then/else option built into my regex engine was formatted:
?(condition)if regex|else regex
I want it to capture a string from a very specific location only when that string exists within a certain section of JavaScript. Since this is how I thought it worked after a decent amount of research, I tried a few variations of this code, but they all ended up looking something like this:
((?^view_large$)Tables-137(.*?)search.htm)
Also relevant: I'm using a Java-based app that runs regex searches to pull the data I need, so I cannot write an if statement in Java, which would be my preferred method. It's a pain to have to do it this way, but at the moment I have no other choice. I'm pushing hard for them to allow Java code instead of pure regex for more versatile options.
So, to summarize: is there even an if/then option in regex, and if so, how is it formatted for what I'm trying to accomplish?
EDIT: The "if condition" I want is this: if the string view_large exists and is not null, then capture the exact string 500/, which is captured by the catch-all group I used: (.*?)

There are no conditionals in Java regex, but you can simulate them by writing two expressions that include mutually exclusive look-behind constructs, like this:
((?<=if )then)|((?<!if )end)
This expression will match "then" when it is preceded by "if "; it will match "end" when it is not preceded by "if ".
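A small Java sketch of that expression in use, with a hypothetical class and method name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of the look-behind pair from the answer.
final class LookbehindConditionalSketch {

    // "then" only directly after "if ", "end" only when not directly after "if ".
    private static final Pattern PATTERN =
            Pattern.compile("((?<=if )then)|((?<!if )end)");

    // Returns every match found while scanning the input from left to right.
    static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }
}
```

For instance, matches("if then else end") returns both "then" and "end", while matches("if end") returns nothing because the only "end" is preceded by "if ".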

The Javadoc for java.util.regex.Pattern mentions, in its list of "Perl constructs not supported by this class":
The conditional constructs (?(condition)X) and (?(condition)X|Y).
So, no dice. But you should look through the Javadoc to see if you can achieve what you need by using regex features that it does support. (Or, if you post some more detailed examples, we can try to help.)

Try lookaround assertions.
For example, say you want to capture FOOBAR only if there is a 4+ digit number somewhere:
(?=.*\d{4}).*(FOOBAR)
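In Java that lookahead could be exercised roughly as follows. The class and method names are invented for the example, and note that . does not match line terminators unless Pattern.DOTALL is set, so this version only looks within a single line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: capture FOOBAR only if a run of 4 digits appears at or after the match start.
final class LookaheadGateSketch {

    private static final Pattern PATTERN =
            Pattern.compile("(?=.*\\d{4}).*(FOOBAR)");

    // Returns the captured group, or null when the lookahead condition or FOOBAR is missing.
    static String capture(String input) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

Here capture("FOOBAR id 12345") returns "FOOBAR", while capture("FOOBAR id 123") returns null because the lookahead never succeeds.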

Tool to convert regex between different language syntaxes?

Is there a tool to convert a regex from one popular language's syntax to another? For example, a Python-style regex to a Java-style regex?
Or at least, has someone put together a set of rules to do these conversions?
And obviously some constructs won't be able to convert.

Go to this article and follow the link to "Regex info's comparison of Regex flavors"; that led me to a tool called RegexBuddy, which sounds like it might do what you want.

Yes, there is a Windows tool that will do this: RegexBuddy.

Regular Expression to a Java.regex.Pattern

I'm working with the IPv6 address space so that our Java app accepts the IPv6 standard. I've written a regular expression that is tested and working.
((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}((:[0-9A-Fa-f]{1,4}){1,2}|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)|(([0-9A-Fa-f]{1,4}:){3}|(((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|((:[0-9A-Fa-f]{1,4}){1,7}|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9}?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))
Can anyone tell me how to get this working with the proper escapes to compile as a Java pattern?

Use this:
Pattern.compile("((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}((:[0-9A-Fa-f]{1,4}){1,2}|:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:)|(([0-9A-Fa-f]{1,4}:){3}|(((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:)|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|((:[0-9A-Fa-f]{1,4}){1,7}|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:)))")
Summary of changes:
All backslashes are doubled.
There is a typo of [1-9} in your pattern. That is fixed.
Your pattern is also missing a final closing parenthesis.
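The backslash doubling is easier to see on a small piece of the pattern. The sketch below takes only the dotted-quad IPv4 fragment that recurs in the expression and compiles it from a Java string literal; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of escaping, using the IPv4 fragment of the larger pattern.
final class EscapedFragmentSketch {

    // In the raw regex this reads \d and \. ; in a Java string literal each backslash is written twice.
    private static final Pattern DOTTED_QUAD = Pattern.compile(
            "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)"
            + "(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}");

    // matches() requires the entire input to fit the pattern.
    static boolean isDottedQuad(String input) {
        return DOTTED_QUAD.matcher(input).matches();
    }
}
```

The same rule applies to the full pattern: the regex itself is unchanged, only its spelling inside the string literal differs.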

If you are asking how to escape this regular expression so that it works, check the "Escape Sequences" section of this tutorial (Characters) for a better explanation.

Check out the official API docs, specifically the Pattern class (which has a list of escape characters, and other stuff specific to the Java implementation of Regex), and the java.util.regex package documentation in general.
This site has been one of my favorites for regex reference, and has some details on some of the Java-specific regex quirks.
