How to store regex date pattern in JSON - java

My code requires me to store a regex string in JSON. This works fine for most patterns but runs into trouble when a date pattern containing '/' is used.
I tried escaping it with a '\':
(\\d{1,2}\/\\d{1,2}\/\\d{1,2})
This seems to work, as JSONLint does not give any error.
However, when I try to parse the JSON string in a Java program, it gives an error, because it further requires '\' and '/' to be escaped. I have tried multiple options but have not been able to solve it.

I think your proposed regex escapes one backslash too many. Have a look at: https://regex101.com/r/xBFeZG/1
Only the \ needs to be escaped in Java regex strings, so what I believe you want becomes:
(\\d{1,2}\\/\\d{1,2}\\/\\d{1,4})
However, why not simply use a standard date format (like dd/MM/yyyy; see https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) and do something like:
LocalDate.parse(date, DateTimeFormatter.ofPattern(format));
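As a minimal sketch of that suggestion: the class and method names below (DateFormatCheck, isValidDate) are made up for illustration, and the format string is assumed to come from your JSON instead of a regex.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of any library.
final class DateFormatCheck {

    // Returns true if the text can be parsed as a date in the given format.
    static boolean isValidDate(String text, String format) {
        try {
            LocalDate.parse(text, DateTimeFormatter.ofPattern(format));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "dd/MM/yyyy" contains no backslashes, so it needs no extra escaping in JSON.
        System.out.println(isValidDate("25/12/2020", "dd/MM/yyyy"));
        System.out.println(isValidDate("not a date", "dd/MM/yyyy"));
    }
}
```

A format string like this is stored in JSON as-is, which sidesteps the escaping question entirely.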

If you have an expression like
\d{1,2}/\d{1,2}/\d{1,4}
then exporting it as JSON will produce something like this
{ "regex": "\\d{1,2}\/\\d{1,2}\/\\d{1,4}" }
with every "\" being escaped as "\\".
To parse correctly in Java, you really just have to "un-escape" the escaped backslashes, in other words, remove the leading backslash. Something like this should work:
String regex = jsonRegex.replaceAll("\\\\(.)", "$1");
EDIT: Forward slashes don't actually need to be escaped, although escaping them doesn't hurt. So, the expression will most probably be emitted in JSON like
\\d{1,2}/\\d{1,2}/\\d{1,4}
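Here is a rough sketch of the un-escaping step, assuming you are holding the raw text exactly as it appears between the quotes in the JSON file. The class and method names are hypothetical. Note that a JSON parser normally performs this un-escaping for you, so the manual step only applies if you read the raw text yourself.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class JsonRegexUnescape {

    // Removes one level of backslash escaping: "\\" becomes "\" and "\/" becomes "/".
    // This does not handle other JSON escapes such as \n or \u0041.
    static String unescape(String rawJsonValue) {
        return rawJsonValue.replaceAll("\\\\(.)", "$1");
    }

    public static void main(String[] args) {
        // Raw characters as they appear in the JSON text: \\d{1,2}\/\\d{1,2}\/\\d{1,4}
        String raw = "\\\\d{1,2}\\/\\\\d{1,2}\\/\\\\d{1,4}";
        String regex = unescape(raw);
        System.out.println(regex);                                // \d{1,2}/\d{1,2}/\d{1,4}
        System.out.println(Pattern.matches(regex, "12/5/2020"));  // true
    }
}
```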

Related

How to escape \" as absolute, concrete, unique, single value? Java

I'm trying to replace \" (a backslash followed by a quote) with ". Those characters appear together in my text, so I do a replace, but the compiler says that my escaping syntax is incorrect:
String myText = "Animal:{\"name\":\"turkey\"}";
As you can see, \" appears together here, so I want to replace it with just one quote ".
It should look like this:
"Animal:{"name":"turkey"}"
So my replace looks like:
myText.replaceAll("\\\"", "\"");
To escape a backslash we need \\
To escape a quote we need \"
With that logic I have this \\\" but it's not working; the compiler says it's incorrect...
I already tried:
myText.replaceAll("\\*", "");
But this one does not give me what I want in my string.
Any advice?
I think you are framing the problem incorrectly.
If you write a String like this in the source code:
String myText = "Animal:{\"name\":\"turkey\"}";
... the String is already escaped.
That means you can print:
System.out.println(myText);
... and you would get in output:
Animal:{"name":"turkey"}
... without need of doing anything else.
So I can only imagine two things:
1. Your question is: can I write String myText = "Animal:{"name":"turkey"}"; without backslashes in the source code? => The answer is no; otherwise the compiler couldn't tell which quote delimits the text and which is just another character of the text.
2. Your question is missing information: for example, you are receiving this String from a service that responds with JSON, and over the transport the String keeps the backslashes on the quotes.
If that's the case, you should rather use a proper library to parse the message as a JsonNode. Add a dependency on jackson-databind to your project and then use these methods:
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
ObjectMapper mapper = new ObjectMapper(); // <-- create ObjectMapper
JsonNode actualObj = mapper.readTree(jsonString); // <-- parse your Json string into a JsonNode (throws JsonProcessingException on invalid input)
String niceJsonString = actualObj.toPrettyString(); // <-- formats the Json properly
If your question falls in my second guess, then I suggest you have a look at Jackson, it is a pretty powerful library to work with Json (a market standard for Json messaging in Java).
The problem is that the first argument of replaceAll is not just a String, it is a regex string, and regex also uses backslash to escape the next character. So you need another entire level of escaping, so that what you are passing to regex is the string you have, i.e. escape backslash escape quote, so that the regex will search for the sequence backslash quote.
myText.replaceAll("\\\\\\\"", "\"");
I think that's right.
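To double-check the escaping levels, here is a small sketch comparing the regex-based call above with String.replace, which treats both arguments as literal text and so needs one level of escaping less. The class and method names are made up for the example, and the input is assumed to really contain backslash characters (for instance because it arrived that way over the wire).

```java
// Hypothetical helper, for illustration only.
final class BackslashQuote {

    // Regex version: the pattern is the regex \\\" (an escaped backslash, then a quote).
    static String viaRegex(String s) {
        return s.replaceAll("\\\\\\\"", "\"");
    }

    // Literal version: replace() does not interpret its arguments as a regex.
    static String viaLiteral(String s) {
        return s.replace("\\\"", "\"");
    }

    public static void main(String[] args) {
        // The actual characters are: Animal:{\"name\":\"turkey\"}
        String received = "Animal:{\\\"name\\\":\\\"turkey\\\"}";
        System.out.println(viaRegex(received));    // Animal:{"name":"turkey"}
        System.out.println(viaLiteral(received));  // Animal:{"name":"turkey"}
    }
}
```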

How to change special character "\" in a text file with replace

I need to change some things in a big .rtf file. I have done this correctly in other files with other text changes, but this text contains something like "\line", which I want to change to "\par".
I know that '\' is a special character, so I can't use a simple .replace("\line", "\par"). I tried .replace("\\line", "\\par").
Neither worked. Is there a way to do this? I can't use a simple .replace("line", "par") because some words contain "line" without the "\". I only need to change it when "line" has a "\" before it.
Strings are immutable, so assign the result of replace back to the variable:
line = line.replace("\\line", "\\par");
You need to escape the \ in the regex as \\. However, each of these backslashes also needs to be escaped in the Java string literal, so you'll need the full regex:
replaceAll("\\\\line", "\\\\par");
4 backslashes are turned into 2 \ characters in the string during compiler parsing, and \\ is parsed by the regex engine as a single literal backslash.
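A quick sketch putting both suggestions side by side, with hypothetical class and method names. Both return a new string, so the result has to be assigned or returned.

```java
// Hypothetical helper, for illustration only.
final class RtfLineToPar {

    // Literal replacement: only the Java string escaping is needed.
    static String withReplace(String line) {
        return line.replace("\\line", "\\par");
    }

    // Regex replacement: the backslash is escaped once for the regex engine
    // and once more for the Java string literal, in both arguments.
    static String withReplaceAll(String line) {
        return line.replaceAll("\\\\line", "\\\\par");
    }

    public static void main(String[] args) {
        // The actual characters are: first\line outline
        String text = "first\\line outline";
        System.out.println(withReplace(text));     // first\par outline
        System.out.println(withReplaceAll(text));  // first\par outline
    }
}
```

The word "outline" is left alone in both cases because it has no backslash in front of "line".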

Java Regular Expression Escape Sequence

I was trying to match the link in this example:
<p>LinkToPage</p>
With rubular.com I could get something like <a href=\"(.*)?\/index.html\">.*<\/a>.
I'll be using this in Pattern.compile in Java. I know that \ has to be escaped as well, and I've come up with <a href=\\\"(.*)?\\\/index.html\\\">.*<\\\/a> and a few more variations but I'm getting it wrong. I tested on regexplanet. Can anyone help me with this?
Use "<a href=\"(.*)/index.html\">.*</a>" in your Java code.
You only need to escape " because it's a Java string literal.
You don't need to escape /, because you aren't delimiting your regex with slashes (as you would be in Ruby).
Also, (.*)? makes no sense. Just use (.*). * can already match "nothing", so there's no point in having the ?.
Pattern.compile("<a href=\"(.*)?/index.html\">.*</a>");
That should fix your regex. You do not need to escape the forward slashes.
However I am obligated to present you with the standard caution against parsing HTML with regex:
RegEx match open tags except XHTML self-contained tags
You can tell Java what to match and call Pattern.quote(str) to make it escape the correct things for you.
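A rough sketch of the Pattern.quote idea, assuming the link looks like the one in the rubular pattern. The class name, method name and sample href are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class QuotedLinkPattern {

    // The literal parts are quoted so no regex metacharacter in them needs manual escaping;
    // only the two wildcard parts are written as regex.
    private static final Pattern LINK = Pattern.compile(
            Pattern.quote("<a href=\"") + "(.*)"
            + Pattern.quote("/index.html\">") + ".*"
            + Pattern.quote("</a>"));

    // Returns whatever precedes /index.html in the href, or null if there is no match.
    static String extractPrefix(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractPrefix("<a href=\"docs/index.html\">LinkToPage</a>")); // docs
    }
}
```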

Regex - escape or character block?

What is the best approach if, for instance, a question mark is expected in a String?
...[?]...
or
...\?...
Example:
The text bla?bla matches both the pattern bla[?]bla and bla\?bla (but not bla?bla, obviously), but is there any reason to use one over the other?
There is no technical reason to prefer one over the other: they are equivalent expressions. The character class is only used to avoid entering a backslash, so IMHO the escaped version is "cleaner".
However, the reason may be to avoid double-escaping the backslash on input. In languages like Java, the string literal for the escaped version would look like this:
// in java you need to escape a backslash with another backslash :(
String regex = "...\\?...";
It could be that wherever the regexes are coming from has a similar issue and it's easier to read [?] than \\?
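For what it's worth, here is a small sketch showing the three variants from the question as Java literals. The class and field names are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical holder class, for illustration only.
final class QuestionMarkPatterns {

    // Character class: no backslash needed, so the Java literal matches the regex text.
    static final Pattern CLASS_FORM = Pattern.compile("bla[?]bla");

    // Escaped form: the regex is bla\?bla, which needs a doubled backslash in Java.
    static final Pattern ESCAPED_FORM = Pattern.compile("bla\\?bla");

    // Unescaped: here ? is a quantifier that makes the preceding 'a' optional.
    static final Pattern UNESCAPED = Pattern.compile("bla?bla");

    public static void main(String[] args) {
        String text = "bla?bla";
        System.out.println(CLASS_FORM.matcher(text).matches());    // true
        System.out.println(ESCAPED_FORM.matcher(text).matches());  // true
        System.out.println(UNESCAPED.matcher(text).matches());     // false
    }
}
```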

Regex in GWT to match URLs

I implemented the Pattern class as shown here:
http://www.java2s.com/Code/Java/GWT/ImplementjavautilregexPatternwithJavascriptRegExpobject.htm
And I would like to use the following regex to match urls in my String:
(http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?
Unfortunately, the Java compiler of course fails on parsing that string because it doesn't use valid escape sequences (since the above is technically a url pattern for JavaScript, not Java)
At the end of the day, I'm looking for a regex pattern that will both compile in Java and execute in JavaScript correctly.
You will have to use JSNI to do the regex evaluation part in JavaScript. If you write the regex with the escaped backslashes, it will be converted to JavaScript as is and will obviously be invalid. It will work in Hosted or Dev mode, though, as that is still running Java bytecode, but not in the compiled application.
A simple JSNI example to test if a given string is a valid URL:
// Java method
public native boolean isValidUrl(String url) /*-{
var pattern = /(http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/;
return pattern.test(url);
}-*/;
There may be other irregularities between the Java and Javascript regex engines, so it's better to offload it completely to Javascript at least for moderately complex regexes.
The pattern itself looks fine, but I guess it's because of backslash escaping.
Please take a look at this: http://www.regular-expressions.info/java.html
In literal Java strings the backslash is an escape character. The literal string "\\" is a single backslash. In regular expressions, the backslash is also an escape character. The regular expression \\ matches a single backslash. This regular expression as a Java string, becomes "\\\\". That's right: 4 backslashes to match a single one.
So, if you reuse your JavaScript regex in Java, you need to replace \ with \\, and vice versa.
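As a small sketch of the doubling rule on the Java side: the class and field names are hypothetical, and the URL pattern is a shortened fragment, not the full pattern from the question.

```java
import java.util.regex.Pattern;

// Hypothetical holder class, for illustration only.
final class BackslashLevels {

    // The Java literal "\\\\" holds two characters (\\),
    // which the regex engine reads as one literal backslash.
    static final String ONE_BACKSLASH = "\\\\";

    // JavaScript literal /(http|https):\/\/\S+/ rewritten as a Java string:
    // every \ in the regex becomes \\ in the source code.
    static final Pattern URL_PREFIX = Pattern.compile("(http|https):\\/\\/\\S+");

    public static void main(String[] args) {
        System.out.println(ONE_BACKSLASH.length());                           // 2
        System.out.println("a\\b".split(ONE_BACKSLASH).length);               // 2
        System.out.println(URL_PREFIX.matcher("https://example.com").matches()); // true
    }
}
```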
I don't know exactly how this would help but here is the exact function you requested in Javascript. I guess using JSNI like Anurag said will help.
var urlPattern = "(https?|ftp)://(www\\.)?(((([a-zA-Z0-9.-]+\\.){1,}[a-zA-Z]{2,4}|localhost))|((\\d{1,3}\\.){3}(\\d{1,3})))(:(\\d+))?(/([a-zA-Z0-9-._~!$&'()*+,;=:#/]|%[0-9A-F]{2})*)?(\\?([a-zA-Z0-9-._~!$&'()*+,;=:/?#]|%[0-9A-F]{2})*)?(#([a-zA-Z0-9._-]|%[0-9A-F]{2})*)?";
function isValidURL(url) {
// Anchor the pattern locally so the global urlPattern is not modified on every call
var regex = new RegExp("^" + urlPattern + "$");
return regex.test(url);
}
Like what @S.Mark said, I basically took the "Java" way of writing regular expressions and used it in JavaScript.
In Java, you would just do it the following way (see how the expression is the same).
String urlPattern = "(https?|ftp)://(www\\.)?(((([a-zA-Z0-9.-]+\\.){1,}[a-zA-Z]{2,4}|localhost))|((\\d{1,3}\\.){3}(\\d{1,3})))(:(\\d+))?(/([a-zA-Z0-9-._~!$&'()*+,;=:#/]|%[0-9A-F]{2})*)?(\\?([a-zA-Z0-9-._~!$&'()*+,;=:/?#]|%[0-9A-F]{2})*)?(#([a-zA-Z0-9._-]|%[0-9A-F]{2})*)?";
Hope this helps. PS: this regular expression works and even validates sites pointing to localhost:port, where port is any numeric port number.
