replaceAll a word only when it is surrounded by non-alpha characters - Java

I need to replace every occurrence of a word in a String with the literal text $0, but only when the word is surrounded by non-alphabetic characters (digits, blank spaces, etc.) or sits at the beginning or end of the String. However, my regex pattern does not seem to work when I use replaceAll.
I have tried several solutions I found on the web, such as Pattern.quote, but the pattern still doesn't work, even though it works perfectly on https://regexr.com/
public static final String REPLACE_PATTERN = "(?<=^|[^A-Za-z])(%s)(?=[^A-Za-z]|$)";
String patternToReplace = String.format(REPLACE_PATTERN, "a");
inputString = inputString.replaceAll(Pattern.quote(patternToReplace), "$0");
For example, with the string and the word "a":
a car4is a5car
I expect the output to be:
$0 car4is $05car

Just change inputString.replaceAll(Pattern.quote(patternToReplace), "$0"); to inputString.replaceAll(patternToReplace, "\\$0");
I have tested it with this code:
public static final String REPLACE_PATTERN = "(?<=^|[^A-Za-z])(%s)(?=[^A-Za-z]|$)";
// inside a method:
String inputString = "a car4is a5car";
String patternToReplace = String.format(REPLACE_PATTERN, "a");
inputString = inputString.replaceAll(patternToReplace, "\\$0");
System.out.println(inputString);
Output:
$0 car4is $05car
Hope this helps you :)

If you want to replace the matching parts of the string with the literal text "$0", you have to write the replacement as "\\$0". An unescaped $0 has a special meaning in the replacement: it stands for the whole matched text. So an unescaped "$0" replaces each match with itself.
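The minimal sketch below shows the difference in isolation. The class and method names are made up for illustration. It uses the plain pattern "a" without the lookarounds, so every a is replaced.

```java
public class DollarZeroDemo {
    // In a replacement string, $0 is a back-reference to the whole match,
    // so every "a" is replaced by itself and the input comes back unchanged
    public static String unescaped(String input) {
        return input.replaceAll("a", "$0");
    }

    // "\\$0" in Java source is \$0 for the regex engine: an escaped $ followed by 0,
    // so the two literal characters $0 are inserted for every "a"
    public static String escaped(String input) {
        return input.replaceAll("a", "\\$0");
    }

    public static void main(String[] args) {
        System.out.println(unescaped("a5car")); // a5car
        System.out.println(escaped("a5car"));   // $05c$0r
    }
}
```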

You are quoting the wrong thing. You should not quote the pattern. You should quote "a" - the part of the pattern that should be treated literally.
String patternToReplace = String.format(REPLACE_PATTERN, Pattern.quote("a"));
If you will only ever pass letters as the second argument of format, you don't need to quote at all, because letters have no special meaning in a regex.
Additionally, $ has special meaning when used as the replacement, so you need to escape it:
inputString = inputString.replaceAll(patternToReplace, "\\$0");
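The sketch below combines both fixes. The helper name replaceWord is hypothetical. It also uses Matcher.quoteReplacement instead of escaping the $ by hand. That method escapes every $ and \ in the replacement, so any replacement text is inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordReplacer {
    // The word must be preceded by start-of-input or a non-letter,
    // and followed by a non-letter or end-of-input
    public static final String REPLACE_PATTERN = "(?<=^|[^A-Za-z])(%s)(?=[^A-Za-z]|$)";

    // Hypothetical helper: replaces each standalone occurrence of word with the literal replacement
    public static String replaceWord(String input, String word, String replacement) {
        // Quote only the word, so the lookarounds are still parsed as regex syntax
        String regex = String.format(REPLACE_PATTERN, Pattern.quote(word));
        // quoteReplacement escapes $ and \ so the replacement is taken literally
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceWord("a car4is a5car", "a", "$0")); // $0 car4is $05car
    }
}
```

Because the word itself is quoted, a word that contains regex metacharacters (for example "a.b") is matched literally as well.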

Pattern.quote() turns its argument into a literal pattern. Everything in it is treated as plain text, so your lookarounds and group are no longer interpreted as regex syntax.
You can use a Matcher to replace all occurrences. Also, as @Donat pointed out, $0 in the replacement is treated as a reference to the whole match, so you need to escape it.
inputString = Pattern.compile(patternToReplace).matcher(inputString).replaceAll("\\$0");
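The String.replaceAll Javadoc states that str.replaceAll(regex, repl) yields the same result as Pattern.compile(regex).matcher(str).replaceAll(repl). Using a Matcher explicitly therefore does not change the result. It mainly helps when the same compiled Pattern is reused. The sketch below assumes the word is always "a" and compiles the pattern once into a constant. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PrecompiledReplace {
    // Compiled once and reused for every input string
    private static final Pattern STANDALONE_A =
            Pattern.compile("(?<=^|[^A-Za-z])(a)(?=[^A-Za-z]|$)");

    public static String replaceStandaloneA(String input) {
        // "\\$0" escapes the $ so the literal text $0 is inserted
        return STANDALONE_A.matcher(input).replaceAll("\\$0");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a car4is a5car", "banana a"}) {
            System.out.println(replaceStandaloneA(s));
        }
    }
}
```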

Related

Pattern Replacing in string with escaped characters failing with replaceAll

I want to replace some values in an HTML string, so I need to use replaceAll. It is not working, although replace works fine. Here is my code:
String str = "<style type=\"text/css\">#include(\"Invoice_Service_Tax.css\")</style>";
String pattern = "#include(\"Invoice_Service_Tax.css\")";
System.out.println(str.replace(pattern, "some-value"));
System.out.println(str.replaceAll(pattern, "some-value"));
The output is:
<style type="text/css">some-value</style>
<style type="text/css">#include("Invoice_Service_Tax.css")</style>
For my use case I need to use replaceAll. I also tried the patterns below, but they did not help:
"#include(\\\"Invoice_Service_Tax.css\\\")"
"#include(Invoice_Service_Tax.css)"
replace does not look for special characters; it performs a literal replacement. replaceAll, on the other hand, interprets its first argument as a regex, in which some characters have special meaning.
The problem with the regex is that ( is a special character for grouping so you need to escape it.
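The Java string literal "#include\\(\"Invoice_Service_Tax.css\"\\)" should work with your replaceAll. Strictly speaking, the . should be escaped too (\\.), because an unescaped . matches any character.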
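If the text to find should always be matched literally, you can pass it through Pattern.quote before calling replaceAll. The answers above do not use this approach. The sketch below uses the strings from the question; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class LiteralReplaceAll {
    // Pattern.quote wraps the text in \Q...\E, so ( ) and . lose their regex meaning
    public static String replaceLiterally(String text, String literal, String value) {
        return text.replaceAll(Pattern.quote(literal), value);
    }

    public static void main(String[] args) {
        String str = "<style type=\"text/css\">#include(\"Invoice_Service_Tax.css\")</style>";
        String pattern = "#include(\"Invoice_Service_Tax.css\")";
        // prints <style type="text/css">some-value</style>
        System.out.println(replaceLiterally(str, pattern, "some-value"));
    }
}
```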
The key difference between String.replace and String.replaceAll is the first parameter. For String.replace it is a literal string; for String.replaceAll it is a regex. The Javadoc of both methods explains this well. So if the string you want to replace contains special characters such as \ or $, the two methods will again behave differently, for example:
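public class ReplaceVsReplaceAll {
    public static void main(String[] args) {
        // str contains a literal backslash before ".css"
        String str = "<style type=\"text/css\">#include\"Invoice_Service_Tax\\.css\"</style>";
        String pattern = "#include\"Invoice_Service_Tax\\.css\"";
        System.out.println(str.replace(pattern, "some-value")); // works: literal match
        // Does not match: in a regex, \. means a literal dot, so the backslash in str is never matched.
        // The regex would need to be "#include\"Invoice_Service_Tax\\\\.css\""
        System.out.println(str.replaceAll(pattern, "some-value"));
    }
}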
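The same distinction applies to the replacement argument. replaceAll treats $ and \ in the replacement specially, while replace does not. The sketch below assumes a hypothetical template with a price placeholder. It uses Matcher.quoteReplacement to make the replacement literal.

```java
import java.util.regex.Matcher;

public class ReplacementEscaping {
    // Hypothetical helper: quoteReplacement escapes $ and \ so replaceAll inserts the text as-is
    public static String literalReplacement(String input, String regex, String replacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String template = "Total: PRICE";
        // String.replace treats both arguments literally, so no escaping is needed
        System.out.println(template.replace("PRICE", "$5.00"));
        // With replaceAll, an unquoted "$5.00" would be read as a reference to group 5,
        // which does not exist, so it would throw; quoting avoids that
        System.out.println(literalReplacement(template, "PRICE", "$5.00"));
    }
}
```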

How to remove only letters from a given string in Java?

I am trying to remove only the letters (A-Z and a-z), like this:
String input ="A021001208A 711100609C 01111";
String clean = input.replaceAll("\\D+^\\s+","");
System.out.println(clean.toString());
But the above code also removes the spaces, and I don't want to remove them.
The expected output is:
021001208 711100609 01111
Please help me write the regex so that it removes only letters.
Just replace [a-zA-Z] then:
String clean = input.replaceAll("(?i)[A-Z]+","");
(?i) is the embedded flag expression for case-insensitive matching.
Rather than use a positive character class, use a negated one. The regex you want is:
[^\d\s]
Which means "any character other than a digit or a whitespace".
When coded as java, it looks like:
String clean = input.replaceAll("[^\\d\\s]","");
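Try this; it removes every letter from the given string:
String clean = input.replaceAll("[a-zA-Z]", "");
(Beware of writing [^a-zA-Z] here. The ^ negates the class, so that version would remove everything except the letters.)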
You have to use the [a-zA-Z] regular expression, so your .replaceAll() call will look like this:
String clean = input.replaceAll("[a-zA-Z]","");
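The sketch below compares the working suggestions side by side on the input from the question. The class and method names are hypothetical.

```java
public class RemoveLetters {
    // Positive class: delete every ASCII letter
    public static String viaClass(String input) {
        return input.replaceAll("[a-zA-Z]", "");
    }

    // (?i) makes [A-Z] case-insensitive; + removes a whole run of letters per match
    public static String viaFlag(String input) {
        return input.replaceAll("(?i)[A-Z]+", "");
    }

    // Negated class: delete anything that is neither a digit nor whitespace
    public static String viaNegated(String input) {
        return input.replaceAll("[^\\d\\s]", "");
    }

    public static void main(String[] args) {
        String input = "A021001208A 711100609C 01111";
        System.out.println(viaClass(input));   // 021001208 711100609 01111
        System.out.println(viaFlag(input));    // 021001208 711100609 01111
        System.out.println(viaNegated(input)); // 021001208 711100609 01111
    }
}
```

The approaches differ on other characters. [^\d\s] also deletes punctuation such as - or ., while the letter-based patterns leave it in place.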

Splitting a string with java not working

I have a string in java that looks something like:
holdingco^(218) 333-4444^scott@holdingco.com
I set a string variable equal to it:
String value = "holdingco^(218) 333-4444^scott@holdingco.com";
Then I want to split this string into its components:
String[] components = value.split("^");
However, it does not split the string. I have tried escaping the caret delimiter, to no avail.
Use
String[] components = value.split("\\^");
The unescaped ^ means beginning of a string in a regex, and the unescaped $ means end. You have to use two backslashes for escaping, as the string literal "\\" represents a single backslash, and that's what regex needs.
If you tried escaping with one backslash, it didn't compile, as \^ is not a valid escape sequence in Java.
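The sketch below shows two ways to split on a literal caret. The second one, Pattern.quote, is an alternative not mentioned in the answers. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CaretSplit {
    // "\\^" in Java source is \^ for the regex engine: a literal caret
    public static String[] splitOnCaret(String value) {
        return value.split("\\^");
    }

    // Pattern.quote("^") produces \Q^\E, which also matches a literal caret
    public static String[] splitOnCaretQuoted(String value) {
        return value.split(Pattern.quote("^"));
    }

    public static void main(String[] args) {
        String value = "holdingco^(218) 333-4444^scott@holdingco.com";
        System.out.println(Arrays.toString(splitOnCaret(value)));
        System.out.println(Arrays.toString(splitOnCaretQuoted(value)));
        // An unescaped "^" is a zero-width start-of-input anchor,
        // so the string is never split at the caret characters
        System.out.println(value.split("^").length);
    }
}
```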
Try value.split("\\^"); this should work better.

regex to find instances of a single \

I am looking to replace \n with \\n, but so far my regex attempts are not working. (Really, I mean any \ by itself; \n just happens to be the case I have in the data.)
What I need is something along the lines of:
any-non-\ followed by \ followed by any-non-\
Ultimately I'll be passing the regex to java.lang.String.replaceAll so a regex formatted for that would be great, but I can probably translate another style regex into what I need.
For example, I want this program to print out "true"...
public class Main
{
public static void main(String[] args)
{
final String original;
final String altered;
final String expected;
original = "hello\nworld";
expected = "hello\\nworld";
altered = original.replaceAll("([^\\\\])\\\\([^\\\\])", "$1\\\\$2");
System.out.println(altered.equals(expected));
}
}
Using this does work:
altered = original.replaceAll("\\n", "\\\\n");
The string should be
"[^\\\\]\\\\[^\\\\]"
You have to quadruple backslashes in a String constant that's meant for a regex; if you only doubled them, they would be escaped for the String but not for the regex.
So the actual code would be
myString = myString.replaceAll("([^\\\\])\\\\([^\\\\])", "$1\\\\$2");
Note that the replacement string is parsed too, because the engine has to look for back-references such as $1. A backslash therefore escapes the following character there as well. As a result, the quadruple backslash in "$1\\\\$2" produces only one backslash in the output; inserting two would take eight backslashes in the Java source.
Edit: The above was assuming that there was a literal \n in the input string, which is represented in a string literal as "\\n". Since it apparently has a newline instead (represented as "\n"), the correct substitution would be
myString = myString.replaceAll("\\n", "\\\\n");
This must be repeated for any other special characters (\t, \r, \0, \\, etc.). As above, the replacement string looks exactly like the regex string, but it is interpreted by different rules: Matcher's replacement syntax rather than regex syntax.
So whenever there is 1 backslash, you want 2, but if there is 2, 3 or 4... in a row, leave them alone?
You want to replace
(?<=[^\\])\\(?!\\+)([^\\])
with
\\$1
That changes the string
hello\nworld and hello\\nworld and hello\\\nworld
into
hello\\nworld and hello\\nworld and hello\\\nworld
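In Java source, every backslash in that pattern has to be doubled. The replacement also needs two literal backslashes, and Matcher's replacement syntax requires each of those to be escaped. The sketch below works under those assumptions; the class and method names are hypothetical. It drops the (?!\\+) lookahead, because the following [^\\] group already rules out a backslash. Like the original pattern, it does not touch a lone backslash at the very start or end of the string.

```java
import java.util.regex.Pattern;

public class LoneBackslash {
    // Regex (?<=[^\\])\\([^\\]): a backslash with a non-backslash on each side;
    // the character after it is captured as group 1
    private static final Pattern LONE_BACKSLASH =
            Pattern.compile("(?<=[^\\\\])\\\\([^\\\\])");

    public static String doubleLoneBackslashes(String s) {
        // Java "\\\\\\\\$1" is \\\\$1 for Matcher: two literal backslashes, then group 1
        return LONE_BACKSLASH.matcher(s).replaceAll("\\\\\\\\$1");
    }

    public static void main(String[] args) {
        // Runtime text: hello\nworld and hello\\nworld and hello\\\nworld
        String input = "hello\\nworld and hello\\\\nworld and hello\\\\\\nworld";
        // prints hello\\nworld and hello\\nworld and hello\\\nworld
        System.out.println(doubleLoneBackslashes(input));
    }
}
```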
I don't know exactly what you need it for, but you could have a look at StringEscapeUtils from Commons Lang. They have plenty of methods doing things like that, and if you don't find exactly what you're searching for, you could have a look at the source to find inspiration :)
What's wrong with using altered = original.replaceAll("\\n", "\\\\n");? That's exactly what I would have done.

Problem replacing words using [^a-zA-Z] regex

I just could not get this one, and googling did not help much either.
First, something I do know: given a string and a regex, how do you replace every substring that matches the regex with a replacement string? You use the replaceAll() method of the String class.
Now something that I am unable to do. The regex I have in my code now is [^a-zA-Z] and I know for sure that this regex is definitely going to have a range. Only some more characters might be added to the list. What I need as output in the code below is Worksheet+blah but what I get using replaceAll() is Worksheet++++blah
String homeworkTitle = "Worksheet%#5_blah";
String unwantedCharactersRegex = "[^a-zA-Z]";
String replacementString = "+";
homeworkTitle = homeworkTitle.replaceAll(unwantedCharactersRegex,replacementString);
System.out.println(homeworkTitle);
What is the way to achieve the output that I wish for? Are there any Java methods that I am missing here?
[^a-zA-Z]+
Will do it nicely.
You just need a quantifier so that each match covers as many consecutive non-alphabetical characters as it can. The whole match is then replaced by a single '+'. (The + quantifier is greedy by default.)
Note: [^a-zA-Z]+? would make the + quantifier lazy. It would then give you the same result as [^a-zA-Z], since it would match only one non-alphabetical character at a time.
String unwantedCharactersRegex = "[^a-zA-Z]";
This matches a single non-letter, so each non-letter is replaced by its own +. You need to say "one or more", so try
String unwantedCharactersRegex = "[^a-zA-Z]+";
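The sketch below checks the three variants discussed above on the title from the question. The method names are hypothetical.

```java
public class CollapseNonLetters {
    // One "+" per non-letter character
    public static String perCharacter(String s) {
        return s.replaceAll("[^a-zA-Z]", "+");
    }

    // Greedy +: each run of consecutive non-letters becomes a single "+"
    public static String greedyRun(String s) {
        return s.replaceAll("[^a-zA-Z]+", "+");
    }

    // Lazy +?: matches as little as possible (one character), so it behaves like perCharacter
    public static String lazyRun(String s) {
        return s.replaceAll("[^a-zA-Z]+?", "+");
    }

    public static void main(String[] args) {
        String homeworkTitle = "Worksheet%#5_blah";
        System.out.println(perCharacter(homeworkTitle)); // Worksheet++++blah
        System.out.println(greedyRun(homeworkTitle));    // Worksheet+blah
        System.out.println(lazyRun(homeworkTitle));      // Worksheet++++blah
    }
}
```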
