Is there a tool to convert a regex from one popular language's syntax to another? For example, a Python-style regex to a Java-style regex?
Or at least, has someone put together a set of rules to do these conversions?
Obviously, some constructs can't be converted.
Go to this article and follow the link to "Regex info's comparison of Regex flavors". That led me to a tool called RegexBuddy, which sounds like it might do what you want.
Yes, there is a Windows tool that will do this: RegexBuddy.
I am trying to match some patterns in a String containing Elasticsearch's structured bulk requests. Here is an example:
index {[event_20191209][event][null], source[{"haha":"haha","jaja":"jaja"}]}, update {[event_20191209][event][xxx], doc_as_upsert[false], doc[index {[null][_doc][null], source[{"haha":"haha","jaja":"jaja"}]}], scripted_upsert[false], detect_noop[true]}, delete {[event_20191208][_doc][sjdos]}, update {[event_20191209][event][yyy], doc_as_upsert[false], upsert[index {[null][_doc][null], source[{"haha":"haha","jaja":"jaja"}]}], scripted_upsert[false], detect_noop[true]}
My goal is to match every separate request in the bulk requests string, i.e. to get strings like:
index {[event_20191209][event][null], source[{"haha":"haha","jaja":"jaja"}]},
update {[event_20191209][event][xxx], doc_as_upsert[false], doc[index {[null][_doc][null], source[{"haha":"haha","jaja":"jaja"}]}], scripted_upsert[false], detect_noop[true]},
delete {[event_20191208][_doc][sjdos]},
update {[event_20191209][event][yyy], doc_as_upsert[false], upsert[index {[null][_doc][null], source[{"haha":"haha","jaja":"jaja"}]}], scripted_upsert[false], detect_noop[true]}
And my pattern expression is [a-z]+\s\{.+?\}[,\w\t\r\n]+? which works fine in a JavaScript-based online regular expression tester.
However, when I copied this pattern into my Java code, the output was not what I expected.
So I realized there are some differences between the JavaScript and Java regular expression engines, but after a lot of coding and googling I still cannot figure out how to update my expression so that it works in Java.
I would be very grateful if someone could give me some help or a hint.
After a short nap, I had an epiphany. I was a fool this morning....
The workaround is easy to implement: Elasticsearch already overrides toString() for us.
At first glance, I wouldn't suggest using regex right away. It looks like those lines follow some kind of pattern that you could parse and split up first.
After that, if you're talking about regex, I'd try:
Taking a look at the Java regex format: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
How about using an online Java regex tool instead?
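The "parse and split first" idea from this answer could be sketched roughly as follows. This is only an illustration, not the asker's or Elasticsearch's code: the class name BulkRequestSplitter and its split method are made up for this example, and it assumes that curly braces in the input are always balanced (in particular, that no quoted value contains a stray brace). It cuts the string at every closing brace that returns the nesting depth to zero.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Elasticsearch.
class BulkRequestSplitter {

    // Cuts the input after each '}' that closes a top-level '{'.
    // Assumption: braces are balanced and never appear unbalanced inside quoted values.
    static List<String> split(String bulk) {
        List<String> requests = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < bulk.length(); i++) {
            char c = bulk.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                requests.add(bulk.substring(start, i + 1));
                start = i + 1;
                // Skip the ", " separator in front of the next request.
                while (start < bulk.length()
                        && (bulk.charAt(start) == ',' || Character.isWhitespace(bulk.charAt(start)))) {
                    start++;
                }
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        String bulk = "index {[event_20191209][event][null], source[{\"haha\":\"haha\"}]}, "
                + "delete {[event_20191208][_doc][sjdos]}";
        // Prints one request per line.
        split(bulk).forEach(System.out::println);
    }
}
```

Because nesting is tracked explicitly, an update request that embeds an index request stays in one piece, which is the part a lazy .+? has trouble expressing.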
I am currently writing some regex-based code in Groovy. But to create and test my regexes I use books that refer to the Java regex engine, and the Java-oriented http://www.regexplanet.com/advanced/java/index.html.
And I am a bit worried: is the Groovy regex engine really the same as the Java one? I know that they are very close, but are there any differences nevertheless? If you know the answer, could you kindly give me a reference on the subject?
From the language documentation:
The pattern operator (~) provides a simple way to create a java.util.regex.Pattern instance.
I can't find phrasing where the documentation guarantees this is the regular expression engine used for pattern matching throughout Groovy; I do however find it very, very, very, very unlikely Groovy would use two RE engines in its implementation now or switch the RE engine in the future.
"Because Groovy is based on Java, you can use Java's regular expression package with Groovy. Simply put import java.util.regex.* at the top of your Groovy source code. Any Java code using regular expressions will then automatically work in your Groovy code too."
Source: regular-expressions.info
Here is a Groovy example that matches a regex using findAll:
assert ['abc'] == ['def', 'abc', '123'].findAll { it =~ /abc/ }
You can find more examples here (including the sample above), thanks to Mr. Haki.
I'm using this regex online test site.
Here is the regex I'm using:
\{"ip":"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$","iphone":"admin/ios","dev":\{"action":"CUS","from":"REG","CUSA":"ADVERT"\}\}
And I'm trying to match it against:
{"ip":"192.168.50.5","iphone":"admin/ios","dev":{"action":"CUS","from":"REG","CUSA":"ADVERT"}}
When I run the test, it doesn't match. I need it to match on the site above for validation reasons.
A different perspective: it seems that it is already pretty hard to come up with a regex that initially works for you. What does this tell you about how hard it will be to maintain this regex in the future, and maybe extend it?!
What I am saying is: regexes are a good tool; but sometimes overrated. This looks like a string in JSON format. Wouldn't it be better to just take it as that, and use a garden-variety JSON parser instead of trying to build your own regex?
You see, what will be more robust over time - your self baked regex; or some standard library that millions of people are using?
One place to read about JSON parsers would be this question here.
This will be enough for your context.
"ip":"(\d+)\.(\d+)\.(\d+)\.(\d+)"
Edit:
Regex is not for structured data processing; most of the time you need a solution that just works. When the sample data changes and no longer matches, you update the regex string to match it again.
Since you want to get four numbers inside a quote pair after a key called "ip", this regex will definitely do it.
If you want something else, please provide more context. Thanks!
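For what it's worth, here is roughly how that short pattern could be used from Java. It is just a sketch: the class name IpOctets and the method octets are invented for the example, and the dots are escaped so that they only match literal periods. It uses find() rather than matches(), because the pattern covers only the "ip" part of the JSON string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class IpOctets {

    // Four groups of digits separated by literal dots, inside the quoted "ip" value.
    private static final Pattern IP =
            Pattern.compile("\"ip\":\"(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)\"");

    // Returns the four octets as strings, or null if no "ip" entry is found.
    static String[] octets(String json) {
        Matcher m = IP.matcher(json);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String json = "{\"ip\":\"192.168.50.5\",\"iphone\":\"admin/ios\","
                + "\"dev\":{\"action\":\"CUS\",\"from\":\"REG\",\"CUSA\":\"ADVERT\"}}";
        System.out.println(String.join(" ", octets(json)));
    }
}
```

Note that \d+ does not check that each octet is in the 0-255 range; the longer alternation from the question would be needed for that.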
So I'm not sure I understand how this works, and all I would like is a simple explanation of how they work. I probably have it way off. A pure regex solution is required, and I don't know if this is possible. If it is, a solution would be awesome too, but a shove in the right direction would be good for my learning process ^_^
This is how I thought the if/then/else option built into my regex engines was formatted:
?(condition)if regex|else regex
I want it to capture a string from a very specific location only when this string exists within a certain section of JavaScript. Because this is how I thought it worked after a decent amount of research, I tried out a few variations of this code, but they all ended up something like this:
((?^view_large$)Tables-137(.*?)search.htm)
Also of relevance: I'm using a Java-based app that has regex searches which pull the data I need, so I cannot write an if statement in Java, which would be my preferred method. It's a pain to have to do it this way, but at the moment I have no other choice. I'm trying really hard to get them to allow Java code functionality instead of pure regex for more versatile options.
So to summarize: is there even an if/then option in regex, and if so, how is it formatted for what I'm trying to accomplish?
EDIT: The string that I want to be the "if condition" is like this: if view_large string exists and is not null then capture the exact string 500/ which is captured within the catch all group I used: (.*?)
There are no conditionals in Java regexp, but you can simulate them by writing two expressions that include mutually exclusive look-behind constructs, like this:
((?<=if )then)|((?<!if )end)
This expression will match "then" when it is preceded by an "if "; it will match "end" when it is not preceded by an "if ".
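A quick way to check that behaviour from Java might look like the sketch below. The class name LookbehindAlternation and the method firstMatch are made up for this illustration; the pattern itself is the one from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the look-behind alternation above.
class LookbehindAlternation {

    private static final Pattern P = Pattern.compile("((?<=if )then)|((?<!if )end)");

    // Returns the first matched text, or null when neither branch applies.
    static String firstMatch(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("if then"));  // "then" directly follows "if "
        System.out.println(firstMatch("the end"));  // "end" is not preceded by "if "
        System.out.println(firstMatch("if end"));   // "end" is preceded by "if ", so no match
    }
}
```

Since the two look-behinds exclude each other, at most one branch can succeed at any given position, which is what stands in for the if/else.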
The Javadoc for java.util.regex.Pattern mentions, in its list of "Perl constructs not supported by this class":
The conditional constructs (?(condition)X) and (?(condition)X|Y).
So, no dice. But you should look through the Javadoc to see if you can achieve what you need by using regex features that it does support. (Or, if you post some more detailed examples, we can try to help.)
Try lookaround assertions.
For example, say you want to capture FOOBAR only if there is a 4+ digit number somewhere:
(?=.*\d{4}).*(FOOBAR)
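As a rough Java illustration of that lookahead (the class name LookaheadCapture and the method capture are invented here), the lookahead at the start acts as the "if" part, and the capture group only gets a value when the condition holds:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the lookahead pattern above.
class LookaheadCapture {

    // The lookahead requires four consecutive digits somewhere after the match start.
    private static final Pattern P = Pattern.compile("(?=.*\\d{4}).*(FOOBAR)");

    // Returns the captured FOOBAR, or null if the condition or the word is missing.
    static String capture(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture("id 12345 FOOBAR")); // condition met
        System.out.println(capture("id 12 FOOBAR"));    // too few digits, no match
    }
}
```

Because the lookahead consumes nothing, the digits may appear before or after FOOBAR on the same line.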
Is there any way in Java to use a special delimiter at the start and the end of a String to avoid having to backslash all of the quotes within that String?
i.e. not have to do this:
String s = "Quote marks like this \" are just the best, here are a few more \" \" \""
No, there is no such option. Sorry.
No - there's nothing like C#'s verbatim string literals or Groovy's slashy strings, for example.
On the other hand, it's the kind of feature which may be included in the future. It's not like it would require any fundamental changes in the type system. I'd be hugely surprised for it to make it into Java 7 this late in the day though, and I haven't seen any suggestions that it'll be in Java 8... so you're in for a long wait :(
The only way to achieve this is to put your strings in some other file and read them from Java, for instance from a resource bundle.
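A minimal sketch of that idea, using java.util.Properties as a stand-in for a full resource bundle. The class name ExternalStrings, the key "best" and the file name mentioned in the comment are all hypothetical. In a properties file, double quotes need no escaping (backslashes still do).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only.
class ExternalStrings {

    // Loads key=value pairs from any Reader, e.g. one opened on a .properties file.
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // In a real program the Reader would come from a file such as
        // messages.properties (hypothetical name). The file content is simulated
        // here, so the escapes below exist only because this demo embeds it in Java source.
        String fileContent = "best=Quote marks like this \" are just the best\n";
        Properties props = load(new StringReader(fileContent));
        System.out.println(props.getProperty("best"));
    }
}
```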
It's not possible as of now, and maybe not in the future either.
If you can tell us what you are looking for and why you want this kind of feature, we can definitely suggest some more alternatives.