Pattern Replacing in string with escaped characters failing with replaceAll - java

I have a use case where I want to replace some values in an HTML string, so I need to use replaceAll for that. However, it is not working, although replace works fine. Here is my code:
String str = "<style type=\"text/css\">#include(\"Invoice_Service_Tax.css\")</style>";
String pattern = "#include(\"Invoice_Service_Tax.css\")";
System.out.println(str.replace(pattern, "some-value"));
System.out.println(str.replaceAll(pattern, "some-value"));
Output:
<style type="text/css">some-value</style>
<style type="text/css">#include("Invoice_Service_Tax.css")</style>
For my use case I need to use replaceAll only. I also tried the patterns below, but they did not help:
"#include(\\\"Invoice_Service_Tax.css\\\")"
"#include(Invoice_Service_Tax.css)"

replace doesn't treat any characters as special; it does a literal replacement, while replaceAll uses regexes, in which some characters are special.
The problem with the regex is that ( is a special character for grouping, so you need to escape it.
#include\\(\"Invoice_Service_Tax.css\"\\) should work with your replaceAll. Strictly speaking, the . should be escaped too (\\.), because an unescaped . matches any character.

The key difference between String.replace and String.replaceAll is that the first parameter of String.replace is a literal string, while for String.replaceAll it is a regex. The Javadoc of both methods explains this well. So if there are special characters like \ or $ in the string you want to replace, you'll again see different behaviour, for example:
public static void main(String[] args) {
String str = "<style type=\"text/css\">#include\"Invoice_Service_Tax\\.css\"</style>";
String pattern = "#include\"Invoice_Service_Tax\\.css\"";
System.out.println(str.replace(pattern, "some-value")); // works
System.out.println(str.replaceAll(pattern, "some-value")); // does not match; the pattern would need to be: "#include\"Invoice_Service_Tax\\\\.css\""
}
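If you would rather not escape metacharacters by hand, Pattern.quote makes the whole search string literal, and Matcher.quoteReplacement does the same for the replacement. A sketch under that approach, reusing the backslash example above (QuoteDemo and literalReplaceAll are hypothetical names); combined like this, replaceAll behaves like replace:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {
    static String literalReplaceAll(String input, String target, String replacement) {
        // Pattern.quote: every character of target is matched literally (backslash and dot included)
        // Matcher.quoteReplacement: $ and \ in the replacement are inserted literally
        return input.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String str = "<style type=\"text/css\">#include\"Invoice_Service_Tax\\.css\"</style>";
        String pattern = "#include\"Invoice_Service_Tax\\.css\"";
        // Prints: <style type="text/css">some-value</style>
        System.out.println(literalReplaceAll(str, pattern, "some-value"));
    }
}
```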

Related

ReplaceAll when it is not alpha characters

I need to replace all occurrences of a word in a String with the literal text $0 when the word is surrounded by non-alphabetic characters (digits, blank spaces, etc.) or sits at the beginning or end of the String. However, my regex pattern does not seem to work when I use replaceAll.
I have tried several solutions which I found on the web, like Pattern.quote, but the pattern doesn't seem to work. However, it works perfectly on https://regexr.com/
public static final String REPLACE_PATTERN = "(?<=^|[^A-Za-z])(%s)(?=[^A-Za-z]|$)";
String patternToReplace = String.format(REPLACE_PATTERN, "a");
inputString = inputString.replaceAll(Pattern.quote(patternToReplace), "$0");
For example, with the string and the word "a":
a car4is a5car
I expect the output to be:
$0 car4is $05car
Just change from inputString.replaceAll(Pattern.quote(patternToReplace), "$0"); to inputString.replaceAll(patternToReplace, "\\$0");
I have tested with this code :
public static final String REPLACE_PATTERN = "(?<=^|[^A-Za-z])(%s)(?=[^A-Za-z]|$)";
String patternToReplace = String.format(REPLACE_PATTERN, "a");
inputString = inputString.replaceAll(patternToReplace, "\\$0");
System.out.println(inputString);
Output :
$0 car4is $05car
Hope this helps you :)
When you want to replace the matching parts of the string with "$0", you have to write it as "\\$0". This is because $0 has a special meaning in the replacement: the whole matched string. So with an unescaped "$0" you replace each match by itself.
You are quoting the wrong thing. You should not quote the pattern. You should quote "a" - the part of the pattern that should be treated literally.
String patternToReplace = String.format(REPLACE_PATTERN, Pattern.quote("a"));
If you are never going to put anything other than letters in the second argument of format, then you don't need to quote at all, because letters have no special meaning in a regex.
Additionally, $ has special meaning when used as the replacement, so you need to escape it:
inputString = inputString.replaceAll(patternToReplace, "\\$0");
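Putting both fixes together, a sketch that quotes only the word and lets Matcher.quoteReplacement escape the $ instead of writing "\\$0" by hand. WordReplace and replaceStandaloneWord are illustrative names, not from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordReplace {
    // %s receives the quoted word; lookarounds require a non-letter or a string boundary on both sides
    static final String REPLACE_PATTERN = "(?<=^|[^A-Za-z])(%s)(?=[^A-Za-z]|$)";

    static String replaceStandaloneWord(String input, String word, String replacement) {
        // Quote only the variable part, so the lookarounds keep their regex meaning
        String regex = String.format(REPLACE_PATTERN, Pattern.quote(word));
        // quoteReplacement escapes $ and \ so the replacement is inserted literally
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Prints: $0 car4is $05car
        System.out.println(replaceStandaloneWord("a car4is a5car", "a", "$0"));
    }
}
```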
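Pattern.quote() returns a literal pattern: it wraps the string in \Q...\E, so everything in between is treated as plain text.
You can also use a Matcher directly to replace all occurrences (String.replaceAll is documented to give the same result as Pattern.compile(regex).matcher(str).replaceAll(repl)). Besides that, as @Donat pointed out, $0 is treated as a group reference, so you need to escape it.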
inputString = Pattern.compile(patternToReplace).matcher(inputString).replaceAll("\\$0");

How to replace special character }} in a string with "" using regexp in java

I am trying to replace the special characters }} in a string with "" using a regexp in Java. I tried the two methods below, and neither works. Please let me know what is wrong with these statements.
Note that the string also contains single } characters, which I would like to retain. The goal is to replace only }}.
Method 1:
String buffer = obj.toJSONString() + ",";
String result = buffer.replaceAll(Pattern.quote("(?<![\\w\\d])}}(?![\\w\\d])"), "");
Method 2:
Pattern.compile("(?<![\\w\\d])}}(?![\\w\\d])").matcher(buffer).replaceAll("");
The quote in the following:
String result = buffer.replaceAll(Pattern.quote("(?<![\\w\\d])}}(?![\\w\\d])"), "");
says to treat the regex as a literal string. That's wrong.
If you simply want to remove all }} irrespective of context:
String result = buffer.replaceAll(Pattern.quote("}}"), "");
If you do need to respect the context, don't Pattern.quote(...) the regex!
The other problem is in the way that you attempt to specify the character classes. Since \d is a subset of \w, it is unnecessary to combine them. Just do this instead:
String result = buffer.replaceAll("(?<!\\w)\\}\\}(?!\\w)", "");
I'm not sure if it is strictly necessary to escape the } characters, but it is harmless if it is not necessary.
Don't use Pattern.quote; use a literal regex pattern and escape the brackets:
String buffer = obj.toJSONString() + ",";
String result = buffer.replaceAll("(?<![\\w\\d])\\}\\}(?![\\w\\d])", "");
Using Pattern.quote tells the regex engine to treat the string as literal. This does mean the brackets would not have to be escaped, but it would also render your lookarounds as literal text, probably not what you have in mind.
Method 2 still needs the special } characters escaped:
Pattern.compile("(?<![\\w\\d])\\}\\}(?![\\w\\d])").matcher(buffer).replaceAll("");
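A small sketch contrasting the contextual regex with a plain literal removal (RemoveDoubleBrace and its methods are illustrative names). One thing worth checking against real JSON: \w includes digits, so a }} right after a digit, as at the end of {"b":1}}, is not matched by the lookaround version:

```java
public class RemoveDoubleBrace {
    // "}}" only when neither neighbour is a word character ([A-Za-z0-9_])
    static String removeStandalone(String s) {
        return s.replaceAll("(?<!\\w)\\}\\}(?!\\w)", "");
    }

    // Every "}}", regardless of context (plain literal replacement, no regex)
    static String removeAll(String s) {
        return s.replace("}}", "");
    }

    public static void main(String[] args) {
        String input = "keep } this, drop }} that, but not a}}b";
        System.out.println(removeStandalone(input)); // keep } this, drop  that, but not a}}b
        System.out.println(removeAll(input));        // keep } this, drop  that, but not ab
    }
}
```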
Can you please try the same with Apache StringUtils? It’s faster and should work in your case. See the following links for reference:
apache-stringutils-vs-java-implementation-of-replace
Apache StringUtils 3.6

Why doesn't the split method support $, *, etc. as a delimiter to split a string?

import java.util.StringTokenizer;
class MySplit
{
public static void main(String S[])
{
String settings = "12312$12121";
StringTokenizer splitedArray = new StringTokenizer(settings,"$");
String splitedArray1[] = settings.split("$");
System.out.println(splitedArray1[0]);
while(splitedArray.hasMoreElements())
System.out.println(splitedArray.nextToken().toString());
}
}
In the above example, splitting the string on $ does not work, while splitting on other symbols works fine.
Why is that? If split only supports regular expressions, why does it work fine for symbols such as :, , and ;?
$ has a special meaning in regex, and since String#split takes a regex as an argument, the $ is not interpreted as the string "$", but as the special meta character $. One sexy solution is:
settings.split(Pattern.quote("$"))
Pattern#quote:
Returns a literal pattern String for the specified String.
... The other solution would be escaping $, by adding \\:
settings.split("\\$")
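A quick sketch comparing the three forms on the string from the question (SplitDollar and its helper methods are hypothetical names). The unescaped $ is the end-of-input anchor; it matches only at the very end, so nothing gets split:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDollar {
    static String[] splitUnescaped(String s) { return s.split("$"); }                // $ = end-of-input anchor
    static String[] splitEscaped(String s)   { return s.split("\\$"); }              // regex \$ = literal dollar
    static String[] splitQuoted(String s)    { return s.split(Pattern.quote("$")); } // \Q$\E = literal dollar

    public static void main(String[] args) {
        String settings = "12312$12121";
        System.out.println(Arrays.toString(splitUnescaped(settings))); // [12312$12121]
        System.out.println(Arrays.toString(splitEscaped(settings)));   // [12312, 12121]
        System.out.println(Arrays.toString(splitQuoted(settings)));    // [12312, 12121]
    }
}
```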
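Important note: It's extremely important to check that you actually got element(s) in the resulting array.
When you do splitedArray1[0], you could get an ArrayIndexOutOfBoundsException if the string consists only of $ characters (for example "$"): split then returns an empty array, because trailing empty strings are removed. A string with no $ at all simply yields a one-element array containing the whole string. I would add: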
if (splitedArray1.length == 0) {
// return or do whatever you want
// except accessing the array
}
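A sketch of such a guard, assuming you want null back when there is nothing to read (SplitGuard and firstField are hypothetical names). Only delimiter-only input such as "$" or "$$" produces an empty array; input without any $ gives a one-element array:

```java
public class SplitGuard {
    // Hypothetical helper: first $-separated field, or null if split returned no elements
    static String firstField(String s) {
        String[] parts = s.split("\\$");
        if (parts.length == 0) {
            // Happens for input made only of delimiters, e.g. "$" or "$$":
            // every piece is empty, and trailing empty strings are dropped
            return null;
        }
        return parts[0];
    }

    public static void main(String[] args) {
        System.out.println(firstField("12312$12121")); // 12312
        System.out.println(firstField("12312"));       // 12312 (no $, whole string)
        System.out.println(firstField("$$"));          // null
    }
}
```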
If you take a look at the Java docs, you can see that the split method takes a regex as its parameter, so you have to write a regular expression, not a simple character.
In regex $ has a specific meaning, so you have to escape it this way:
settings.split("\\$");
The problem is that the split(String str) method expects str to be a valid regular expression. The characters you have mentioned are special characters in regular expression syntax and thus perform a special operation.
To make the regular expression engine take them literally, you would need to escape them like so:
.split("\\$")
Thus given this:
String str = "This is 1st string.$This is the second string";
for(String string : str.split("\\$"))
System.out.println(string);
You end up with this:
This is 1st string.
This is the second string
Dollar symbol $ is a special character in Java regex. You have to escape it so as to get it working like this:
settings.split("\\$");
From the String.split docs:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
On a side note:
Have a look at the Pattern class documentation, which will give you an idea of which characters you need to escape.
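As a general sketch, Pattern.quote lets you split on any delimiter literally (LiteralSplit and splitLiteral are hypothetical names). It also answers the original question: :, ; and , have no special meaning in a regex, so they work unescaped, while $, *, ., | and + are metacharacters; an unescaped * even makes split throw a PatternSyntaxException:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralSplit {
    // Splits on delimiter taken literally, whatever characters it contains
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Plain characters and regex metacharacters alike; every line prints [a, b, c]
        for (String d : new String[] {":", ";", "$", "*", ".", "|", "+"}) {
            System.out.println(d + " -> " + Arrays.toString(splitLiteral("a" + d + "b" + d + "c", d)));
        }
    }
}
```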
Because $ is a special character in regular expressions, which indicates the end of a line (or of the input).
You should escape it using the escape sequence \$, which in a Java string literal has to be written as "\\$".
Hope that helps.
Cheers

Java String.replaceAll doesn't replace a quote with escaped quote

Having a hard time replacing a quote with an escaped quote. I have a string that has the value of 'Foo "bar" foobar', and I am trying to replace the quotes around bar with escaped quotes, and it isn't working. I am going crazy.
s=s.replaceAll("\"","\\\"");
I would expect s to have the value of 'Foo \"bar\" foobar', but it doesn't. Any help?
replaceAll uses regular expressions - in which the backslash just escapes the next character, even in the replacement.
Use replace instead and it's fine... or you can double the backslash in the second argument if you want to use the regular expression form:
String after = before.replaceAll("\"", "\\\\\"");
This could be useful if you need to match the input using regular expressions, but I'd strongly recommend using the non-regex form unless you actually need regex behaviour.
Personally I think it was a mistake for methods in String to use regular expressions to start with - things like foo.split(".") which will split on every character rather than on periods, completely unexpectedly to the unwary developer.
Use s = s.replace("\"", "\\\"");, which is the non-regex version.
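A side-by-side sketch of the working variants discussed here (EscapeQuotes and its method names are illustrative). The question's version fails because, in a replaceAll replacement, \" just means a literal ", so each quote is replaced by itself:

```java
import java.util.regex.Matcher;

public class EscapeQuotes {
    // Literal replacement: no regex involved, "\\\"" is simply backslash + quote
    static String withReplace(String s) {
        return s.replace("\"", "\\\"");
    }

    // Regex replacement: backslash is special in the replacement string, so it is doubled
    static String withReplaceAll(String s) {
        return s.replaceAll("\"", "\\\\\"");
    }

    // Same, but Matcher.quoteReplacement does the escaping of the replacement
    static String withQuoteReplacement(String s) {
        return s.replaceAll("\"", Matcher.quoteReplacement("\\\""));
    }

    public static void main(String[] args) {
        String s = "Foo \"bar\" foobar";
        System.out.println(withReplace(s));          // Foo \"bar\" foobar
        System.out.println(withReplaceAll(s));       // Foo \"bar\" foobar
        System.out.println(withQuoteReplacement(s)); // Foo \"bar\" foobar
    }
}
```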
You need to escape both replaceAll arguments: the first is a regex, and in the second (the replacement) backslashes and $ are special!
@Test
public void testReplace() {
assertEquals("foo\\\"bar\\\"",
"foo\"bar\"".replaceAll("\\\"", "\\\\\\\""));
}
So if you want to match " you escape it in the regex -> \" but because you are in Java you need to escape the \ and the " for Java too, so you get \\\" (that is the search parameter).
For the replacement parameter you want \" -> in replacement syntax: \\\" -> in Java: \\\\\\\"
s=s.replaceAll("\"","\\\\\"");
But you should really go with replace(...), as the others here advise.
Use:
s=s.replaceAll("\"","\\\\\"");
String str = "'Foo \"Bar\" Bar'";
System.out.println(str);
System.out.println(str.replaceAll("\"", "\\\\\""));
Follow the program below and use s.replace("\\", "\\\\"):
public class StringBackslash {
/**
* @param args
*/
public static void main(String[] args) {
String s = "\\";              // a single backslash
s = s.replace("\\", "\\\\");  // literal (non-regex) replace: one backslash becomes two
System.out.println(s);        // prints \\
}
}

Regex to find instances of a single \

I am looking to replace \n with \\n but so far my regex attempts are not working (Really it is any \ by itself, \n just happens to be the use case I have in the data).
What I need is something along the lines of:
any-non-\ followed by \ followed by any-non-\
Ultimately I'll be passing the regex to java.lang.String.replaceAll so a regex formatted for that would be great, but I can probably translate another style regex into what I need.
For example, I want this program to print "true"...
public class Main
{
public static void main(String[] args)
{
final String original;
final String altered;
final String expected;
original = "hello\nworld";
expected = "hello\\nworld";
altered = original.replaceAll("([^\\\\])\\\\([^\\\\])", "$1\\\\$2");
System.out.println(altered.equals(expected));
}
}
Using this does work:
altered = original.replaceAll("\\n", "\\\\n");
The string should be
"[^\\\\]\\\\[^\\\\]"
You have to quadruple backslashes in a String constant that's meant for a regex; if you only doubled them, they would be escaped for the String but not for the regex.
So the actual code would be
myString = myString.replaceAll("([^\\\\])\\\\([^\\\\])", "$1\\\\$2");
Note that in the replacement, a quadruple backslash is now interpreted as two backslashes rather than one, since the regex engine is not parsing it. Edit: Actually, the regex engine does parse it since it has to check for the backreferences.
Edit: The above was assuming that there was a literal \n in the input string, which is represented in a string literal as "\\n". Since it apparently has a newline instead (represented as "\n"), the correct substitution would be
myString = myString.replaceAll("\\n", "\\\\n");
This must be repeated for any other special characters (\t, \r, \0, \\, etc.). As above, the replacement string looks exactly like the regex string but isn't.
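A sketch of that idea for a few common control characters, using the non-regex String.replace so only Java-level escaping is needed (EscapeControlChars and escapeControlChars are hypothetical names). Backslashes are doubled first, so the backslashes added by the later steps are not doubled again:

```java
public class EscapeControlChars {
    static String escapeControlChars(String s) {
        return s.replace("\\", "\\\\")  // existing backslashes first: \ -> \\
                .replace("\n", "\\n")   // newline -> backslash + n
                .replace("\t", "\\t")   // tab -> backslash + t
                .replace("\r", "\\r");  // carriage return -> backslash + r
    }

    public static void main(String[] args) {
        String original = "hello\nworld";
        // Prints: true
        System.out.println(escapeControlChars(original).equals("hello\\nworld"));
    }
}
```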
So whenever there is 1 backslash, you want 2, but if there is 2, 3 or 4... in a row, leave them alone?
you want to replace
(?<=[^\\])\\(?!\\+)([^\\])
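with
\\\\$1
(in regex replacement syntax, where each \\ produces one literal backslash; as a Java string literal this is "\\\\\\\\$1")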
That changes the string
hello\nworld and hello\\nworld and hello\\\nworld
into
hello\\nworld and hello\\nworld and hello\\\nworld
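For data that contains a literal backslash (rather than a newline character), a runnable sketch of the same idea. It swaps the [^\\] lookbehind for negative lookarounds, a small variation that also covers a backslash at the very start or end of the string (LoneBackslash and doubleLoneBackslashes are illustrative names):

```java
public class LoneBackslash {
    // Regex (?<!\\)\\(?!\\): a backslash with no backslash directly before or after it
    private static final String LONE_BACKSLASH = "(?<!\\\\)\\\\(?!\\\\)";

    static String doubleLoneBackslashes(String s) {
        // The Java literal "\\\\\\\\" is the replacement text \\\\ (four chars), which yields two literal backslashes
        return s.replaceAll(LONE_BACKSLASH, "\\\\\\\\");
    }

    public static void main(String[] args) {
        String input = "hello\\nworld and hello\\\\nworld and hello\\\\\\nworld";
        // Prints: hello\\nworld and hello\\nworld and hello\\\nworld
        System.out.println(doubleLoneBackslashes(input));
    }
}
```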
I don't know exactly what you need it for, but you could have a look at StringEscapeUtils from Commons Lang. They have plenty of methods doing things like that, and if you don't find exactly what you're searching for, you could have a look at the source to find inspiration :)
What's wrong with using altered = original.replaceAll("\\n", "\\\\n");? That's exactly what I would have done.
