regex with replaceAll - java

I have done some searching and would like advice on this problem:
I want to replace "labels":"Webapp" with "labels":["Webapp"]
I found the regex (\"labels\"\:\")+(([a-zA-Z]|\s|\-)+)+(\") with the following substitution "labels":["$2"]
I am using the replaceAll method in the Talend editor.
I wrote output_row.json = output_row.json.replaceAll("(\"labels\"\:\")+(([a-zA-Z]|\s|\-)+)+(\")",""labels":["$2"]"); but it doesn't work.
Detailed message: Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
Then I escaped the characters:
output_row.json = output_row.json.replaceAll("(\\"labels\\"\:\\")+(([a-zA-Z]|\\s|\-)+)+(\\")","\"labels\":[\"$2\"]");
But it still doesn't work.
Could you please help me?
Thanks.

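Issues: don't escape - and : because neither needs it here (: is not a regex metacharacter, and - is only special inside a character class). In a Java string literal, \: and \- are invalid escape sequences, which is what triggers the compiler error.
Escape \s as \\s, and write each " as \" (not \\"), as you did in the replacement of your second example: \"labels\":[\"$2\"]
You can also use a more concise regex by putting the space and - inside a single character class [].
You can use (\"labels\":\")+([a-zA-Z -]+)\"
System.out.println("\"labels\":\"Webapp\""
.replaceAll("(\"labels\":\")+([a-zA-Z -]+)\""
, "\"labels\":[\"$2\"]"));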
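As a self-contained variant of the snippet above, here is a minimal sketch that applies the same idea to a small JSON string. The class name, method name, and sample JSON are made up for illustration, and the sketch drops the outer repeated group, so the label is captured as $1 rather than $2.

```java
public class LabelsToArray {
    // Wraps the string value of every "labels" key in a JSON array.
    // Regex as seen by the engine: "labels":"([a-zA-Z -]+)"
    static String wrapLabels(String json) {
        return json.replaceAll("\"labels\":\"([a-zA-Z -]+)\"", "\"labels\":[\"$1\"]");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the question.
        String json = "{\"id\":1,\"labels\":\"Webapp\"}";
        System.out.println(wrapLabels(json)); // prints {"id":1,"labels":["Webapp"]}
    }
}
```

In Talend the same call would presumably be written as output_row.json = output_row.json.replaceAll(...) with the two string literals above.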

Related

Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ ). Not able to put regex pattern in a string

I am trying to put a regex pattern into a Java string but am not able to do so. This is the regex pattern:
/(?:^(http:\/\/|https:\/\/|\/\/)((?:[\w.:-]+)(?:(?:[\/]+)(?!acl-)|["\'])(?:[^\s"\'}\]]*)))/mi
Please help me. I would be more than thankful to you.
The equivalent Java regex for the pattern above is:
"(?m)(?i)(?:^(http://|https://|//)((?:[\\w.:-]+)(?:/+(?!acl-)|[\"'])(?:[^\\s\"'}\\]]*)))"
(?m) is the multiline modifier and (?i) is the case-insensitive modifier. You don't need to escape the forward slashes or the single quotes, but you must escape the double quotes, because an unescaped " would end the Java string literal. You could also combine the modifiers as (?mi) or (?im), and (http://|https://|//) could be written as ((?:https?:)?//)
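To see the combined form in action, here is a rough sketch that compiles the pattern with (?mi) and the shortened scheme group, then collects the second capture group from each match. The class name, method name, and sample URLs are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlPatternDemo {
    // (?mi) turns on multiline and case-insensitive matching in one flag group.
    // Double quotes and backslashes are escaped for the Java string literal;
    // forward slashes and single quotes are left as they are.
    static final Pattern URL = Pattern.compile(
        "(?mi)^((?:https?:)?//)((?:[\\w.:-]+)(?:/+(?!acl-)|[\"'])(?:[^\\s\"'}\\]]*))");

    // Returns capture group 2 (host plus path) for every line that matches.
    static List<String> hostsAndPaths(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: the second line is rejected by the (?!acl-) lookahead.
        String text = "HTTP://example.com/path\nhttps://cdn.example.org/acl-x";
        System.out.println(hostsAndPaths(text)); // prints [example.com/path]
    }
}
```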

REGEXP - how to read " character?

I'm using hadoop pig with regexp (REGEX_EXTRACT_ALL) - this is Java parsing.
I have a string:
"DYN_USER_ID=32753477; $Path=\"/\"; DYN_USER_CONFIRM=e6d2a0a7b7715cb10d1dca504e3c5e80; $Path=\"/\"" "Nokia6070/2.0 (03.20) Profile/MIDP-2.0 Configuration/CLDC-1.1"
I'm expecting two groups:
First: DYN_USER_ID=32753477; $Path=\"/\"; DYN_USER_CONFIRM=e6d2a0a7b7715cb10d1dca504e3c5e80; $Path=\"/\"
Second: Nokia6070/2.0 (03.20) Profile/MIDP-2.0 Configuration/CLDC-1.1
As you can see, inside the first string there are " characters, but each is preceded by the escape character \.
The simplest solution is:
"(.*)" "(.*)"
But is it the best one?
"(.*)(?<!\\\\)" "(.*)"
This uses negative lookbehind: (?<!☀) where ☀ is some string. Here the backslash character is represented by a regex-escaped and String-escaped backslash, hence the four backslashes.
Ideally, you should be using the negated character class [^"] so that it matches from the opening delimiter " to the next delimiter ", but the problem is that it does not account for escaped " characters. If your strings can contain escaped " and escaped \, it is better to use something like this:
"((?:\\.|[^"\\])+)" "((?:\\.|[^"\\])+)"
The group (?:\\.|[^"\\])+ matches one or more items, each of which is either an escaped character (a backslash followed by any character) or a single character that is neither " nor \.
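Here is a minimal sketch of that pattern in plain Java (not Pig, where the quoting of the pattern string may differ). The class name, method name, and shortened sample line are made up; the point is only how the regex looks once every " and \ is escaped for a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldsDemo {
    // Regex as seen by the engine: "((?:\\.|[^"\\])+)" "((?:\\.|[^"\\])+)"
    // In the Java literal every " becomes \" and every \ becomes \\.
    static final Pattern TWO_FIELDS = Pattern.compile(
        "\"((?:\\\\.|[^\"\\\\])+)\" \"((?:\\\\.|[^\"\\\\])+)\"");

    // Returns the two quoted fields, or null if the line has another shape.
    static String[] splitFields(String line) {
        Matcher m = TWO_FIELDS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Hypothetical shortened line; the first field holds backslash-escaped quotes.
        String line = "\"a=1; $Path=\\\"/\\\"\" \"Nokia6070/2.0\"";
        String[] fields = splitFields(line);
        System.out.println(fields[0]); // prints a=1; $Path=\"/\"
        System.out.println(fields[1]); // prints Nokia6070/2.0
    }
}
```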

Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )

I have a problem with a regex in java.
When I try to use this regex:
^(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)?([0-5]?\d)$
I get the following error:
"Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )"
I don't know how to handle that error.
I already tried to double the backslashes, but it didn't work.
I hope someone can help me with this.
Thanks
This should work: ^(?:(?:([01]?\\d|2[0-3]):)?([0-5]?\\d):)?([0-5]?\\d)$
The reason is that the escape sequences listed in the error message are the only ones a Java string literal accepts, and \d is not one of them. You therefore have to escape the backslash itself, by adding an extra \ in front of it.
Whenever you're writing regular expressions in Java, remember to escape the \ characters used in the string that defines the regular expression. In other words, if your regular expression contains one \, then you HAVE to write two \\. For example, your code should look like this:
^(?:(?:([01]?\\d|2[0-3]):)?([0-5]?\\d):)?([0-5]?\\d)$
Why, you ask? Because in Java strings, \ is the escape character used to denote special characters (for example tabs and newlines), so if a string needs to contain a literal \, that \ must itself be escaped by putting another \ in front of it. Hence, \\.
For the record, here are the valid escape sequences listed in the Java Language Specification and their meanings; notice the last one:
\b backspace
\t horizontal tab
\n linefeed
\f form feed
\r carriage return
\" double quote
\' single quote
\\ backslash
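To make the doubling concrete, here is a small sketch that compiles the time pattern from the question with every \d written as \\d and tries a few inputs. The class name, method name, and sample inputs are chosen only for illustration.

```java
import java.util.regex.Pattern;

public class TimePatternDemo {
    // Regex as seen by the engine: ^(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)?([0-5]?\d)$
    // Each regex backslash is written twice in the Java string literal.
    static final Pattern TIME = Pattern.compile(
        "^(?:(?:([01]?\\d|2[0-3]):)?([0-5]?\\d):)?([0-5]?\\d)$");

    static boolean isTime(String s) {
        return TIME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTime("23:59:59")); // prints true
        System.out.println(isTime("5:07"));     // prints true
        System.out.println(isTime("24:00:00")); // prints false
        System.out.println(isTime("61"));       // prints false
    }
}
```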
You can also paste the regex into Notepad++, then find \ and replace it with \\

Java regular expression to remove all non alphanumeric characters EXCEPT spaces

I'm trying to write a regular expression in Java which removes all non-alphanumeric characters from a paragraph, except the spaces between the words.
This is the code I've written:
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\s]", "");
However, the compiler gave me an error message pointing to the s saying it's an illegal escape character. The program compiled OK before I added the \s to the end of the regular expression, but the problem with that was that the spaces between words in the paragraph were stripped out.
How can I fix this error?
You need to double-escape the \ character: "[^a-zA-Z0-9\\s]"
Java interprets \s as a Java string escape sequence, which is indeed invalid (before Java 15, which added \s as an escape for a plain space). By writing \\, you escape the \ character, so a single \ character is passed to the regex engine. This \ then becomes part of the regex escape sequence \s.
You need to escape the \ so that the regular expression recognizes \s :
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");
Generally whenever you see that error, it means you only have a single backslash where you need two:
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");
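A quick way to convince yourself of this is to run the corrected call on a sample sentence and to check how many characters the literal "\\s" really contains. This is just a sketch; the class name, method name, and sample text are made up.

```java
public class StripPunctuation {
    // Removes everything except ASCII letters, digits and whitespace.
    // The Java literal "\\s" reaches the regex engine as the two characters \s.
    static String strip(String paragraph) {
        return paragraph.replaceAll("[^a-zA-Z0-9\\s]", "");
    }

    public static void main(String[] args) {
        System.out.println("\\s".length());                    // prints 2
        System.out.println(strip("Hello, world! It's 2024.")); // prints Hello world Its 2024
    }
}
```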
Victoria, you must write \\s not \s here.
Please take a look at this site, where you can test Java regexes online and get well-formatted regex string patterns back:
http://www.regexplanet.com/advanced/java/index.html

java eclipse regex cant "\+"

I need to check whether a String matches "\++?", which should match something like +6014456
But I get this error message: invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\). Why?
It's giving you an error because "\++?" isn't a valid Java literal - you need to escape the backslash. Try this:
Pattern pattern = Pattern.compile("\\++?");
However, I don't think that's actually the regular expression you want. Don't you actually mean something like:
Pattern pattern = Pattern.compile("\\+\\d+");
That corresponds to a regular expression of \+\d+, i.e. a plus followed by at least one digit.
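The difference between the two patterns shows up when the whole string has to match. Here is a minimal sketch, with a class name and helper method invented for the example:

```java
import java.util.regex.Pattern;

public class PlusNumberDemo {
    // Regex as seen by the engine: \+\d+ (a literal plus, then one or more digits).
    static final Pattern PLUS_NUMBER = Pattern.compile("\\+\\d+");

    static boolean isPlusNumber(String s) {
        return PLUS_NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPlusNumber("+6014456")); // prints true
        System.out.println(isPlusNumber("6014456"));  // prints false
        System.out.println(isPlusNumber("+"));        // prints false

        // \++? only ever matches plus signs, so it cannot cover the digits.
        Pattern plusOnly = Pattern.compile("\\++?");
        System.out.println(plusOnly.matcher("+6014456").matches()); // prints false
        System.out.println(plusOnly.matcher("+6014456").find());    // prints true
    }
}
```

find() succeeds for \++? because it only needs the leading + somewhere in the input, while matches() requires the pattern to cover the entire string.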
I think you should use two backslashes. One for escaping the second (because it's a java string), the second for escaping the + (because it's a special character for regex).
shouldn't it be more like "\\+?" ?
Pattern pattern = Pattern.compile("\\++?");
System.out.println(pattern.matcher("+9970").find());
works for me
