Regex to find instances of a single \ - Java

I am looking to replace \n with \\n but so far my regex attempts are not working (Really it is any \ by itself, \n just happens to be the use case I have in the data).
What I need is something along the lines of:
any-non-\ followed by \ followed by any-non-\
Ultimately I'll be passing the regex to java.lang.String.replaceAll so a regex formatted for that would be great, but I can probably translate another style regex into what I need.
For example, I want this program to print "true":
public class Main
{
public static void main(String[] args)
{
final String original;
final String altered;
final String expected;
original = "hello\nworld";
expected = "hello\\nworld";
altered = original.replaceAll("([^\\\\])\\\\([^\\\\])", "$1\\\\$2");
System.out.println(altered.equals(expected));
}
}
using this does work:
altered = original.replaceAll("\\n", "\\\\n");

The string should be
"[^\\\\]\\\\[^\\\\]"
You have to quadruple backslashes in a String constant that's meant for a regex; if you only doubled them, they would be escaped for the String but not for the regex.
So the actual code would be
myString = myString.replaceAll("([^\\\\])\\\\([^\\\\])", "$1\\\\\\\\$2");
Note that the replacement string is parsed as well, since the engine has to look for backreferences such as $1. A backslash therefore needs escaping there too: four backslashes in the literal produce one backslash in the output, so inserting two takes eight.
Edit: The above was assuming that there was a literal \n in the input string, which is represented in a string literal as "\\n". Since it apparently has a newline instead (represented as "\n"), the correct substitution would be
myString = myString.replaceAll("\\n", "\\\\n");
This must be repeated for any other special characters (\t, \r, \0, \\, etc.). As above, the replacement string looks exactly like the regex string but isn't.
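To make the newline case concrete, here is a minimal sketch. The class and method names are made up for illustration, and it assumes the input holds a real newline character, as in the question. It shows the regex call next to the plain-text String.replace alternative, which needs fewer backslashes.

```java
class NewlineEscapeDemo {
    // Regex version: the pattern \n matches a newline character. In the
    // replacement, \\ is an escaped backslash, so the output is \ followed by n.
    static String escapeWithReplaceAll(String s) {
        return s.replaceAll("\\n", "\\\\n");
    }

    // Plain-text version: String.replace parses neither a regex nor a replacement.
    static String escapeWithReplace(String s) {
        return s.replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String original = "hello\nworld";
        System.out.println(escapeWithReplaceAll(original));
        System.out.println(escapeWithReplace(original));
    }
}
```

Both methods should return the same string for the input above, so the choice is mostly about readability.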

So whenever there is 1 backslash, you want 2, but if there are 2, 3, 4 or more in a row, you want to leave them alone?
You want to replace
(?<=[^\\])\\(?!\\+)([^\\])
with
\\$1
That changes the string
hello\nworld and hello\\nworld and hello\\\nworld
into
hello\\nworld and hello\\nworld and hello\\\nworld
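Written as Java string literals, the pattern above might look like the sketch below. The class and method names are hypothetical. It assumes the input contains literal backslash characters rather than newlines. Because the replacement string is parsed too, each backslash in the output costs four in the literal.

```java
class SingleBackslashDemo {
    // Doubles a backslash only when it stands alone between two non-backslash
    // characters; runs of two or more backslashes are left untouched.
    static String doubleLoneBackslashes(String s) {
        return s.replaceAll(
                "(?<=[^\\\\])\\\\(?!\\\\+)([^\\\\])", // lone backslash, then one other char
                "\\\\\\\\$1");                        // two backslashes, then that char
    }

    public static void main(String[] args) {
        System.out.println(doubleLoneBackslashes("hello\\nworld"));
    }
}
```

One limitation of this pattern: a backslash at the very start or very end of the string is not matched, since it requires a character on both sides.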

I don't know exactly what you need it for, but you could have a look at StringEscapeUtils from Commons Lang. They have plenty of methods doing things like that, and if you don't find exactly what you're searching for, you could have a look at the source to find inspiration :)

What's wrong with using altered = original.replaceAll("\\n", "\\\\n"); ? That's exactly what I would have done.


ReplaceAll when it is not alpha characters

I need to replace all the occurrences of a word in a String with a literal $0 when the word sits between non-alphabetic characters (digits, blank spaces, etc.) or at the beginning or end of the String. However, my regex pattern does not seem to work when I use replaceAll.
I have tried several solutions which I found on the web, like Pattern.quote, but the pattern doesn't seem to work. However, it works perfectly on https://regexr.com/
public static final String REPLACE_PATTERN = "(?<=^|[^A-Za-z])(%s)(?=[^A-Za-z]|$)";
String patternToReplace = String.format(REPLACE_PATTERN, "a");
inputString = inputString.replaceAll(Pattern.quote(patternToReplace), "$0");
For example, with the string and the word "a":
a car4is a5car
I expect the output to be:
$0 car4is $05car
Just change from inputString.replaceAll(Pattern.quote(patternToReplace), "$0"); to inputString.replaceAll(patternToReplace, "\\$0");
I have tested with this code :
public static final String REPLACE_PATTERN = "(?<=^|[^A-Za-z])(%s)(?=[^A-Za-z]|$)";
String patternToReplace = String.format(REPLACE_PATTERN, "a");
inputString = inputString.replaceAll(patternToReplace, "\\$0");
System.out.println(inputString);
Output :
$0 car4is $05car
Hope this helps you :)
When you want to replace the matching parts of the string with "$0", you have to write it
"\\$0". This is because $0 has a special meaning: The matching string. So you replace the string by itself.
You are quoting the wrong thing. You should not quote the pattern. You should quote "a" - the part of the pattern that should be treated literally.
String patternToReplace = String.format(REPLACE_PATTERN, Pattern.quote("a"));
If you are never going to put anything other than letters in the second argument of format, then you don't need to quote at all, because letters do not have special meaning in regex.
Additionally, $ has special meaning when used as the replacement, so you need to escape it:
inputString = inputString.replaceAll(patternToReplace, "\\$0");
Pattern.quote() returns a literal pattern string, so everything in it, including your lookarounds, is matched as plain text.
You should use Matcher to replace all string occurrences. Besides that, as @Donat pointed out, $0 is treated as a group reference in the replacement, so you need to escape it.
inputString = Pattern.compile(patternToReplace).matcher(inputString).replaceAll("\\$0");
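Putting the two fixes together, a minimal sketch could look like this. The class name and the replaceWord helper are made up for illustration. It quotes only the word with Pattern.quote and lets Matcher.quoteReplacement escape the $ instead of writing "\\$0" by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralDollarDemo {
    static final String REPLACE_PATTERN = "(?<=^|[^A-Za-z])(%s)(?=[^A-Za-z]|$)";

    // Replaces `word` with `replacement` when it is not surrounded by letters.
    // The word is quoted so regex metacharacters in it are matched literally;
    // the replacement is quoted so $ and \ are inserted literally.
    static String replaceWord(String input, String word, String replacement) {
        String regex = String.format(REPLACE_PATTERN, Pattern.quote(word));
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceWord("a car4is a5car", "a", "$0"));
    }
}
```

Quoting both arguments means the helper keeps working if the word or the replacement later contains characters such as +, $ or \.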

Delete all the space at end of each line in Java

I'm trying to delete all the spaces and tabs at the end of all my lines.
I use following methods:
string.replace("\\s+$", "");
Here, "\s+$" is a regular expression.
It looks right to me, but it does not actually delete anything.
The string is:
a) AAAAAA.
b) BBBBB.
c) CCCCCC.
d) DDDDD.
Like this:
string.replaceAll("\\s+$", "");
String#replace() does not take a regex as argument.
You need to double escape \s
EDIT: yes it does work:
public static void main(final String[] args) {
final String a = " AAAAAAAA ";
System.out.println(a.replaceAll("\\s+$", ""));
}
Maybe your confusion is that you think a.replaceAll() will modify String a. Strings in Java are immutable (they cannot change). a.replaceAll() will return the modified String.
EDIT2: If you're using a multiline String, "\\s+[$\n]" regex should do the work.
Use replaceAll(), which uses regex to find its target, rather than replace(), which uses plain text for its target.
str = str.replaceAll("(?m)\\s+$", "");
Here I've added the (?m) switch that makes caret and dollar match before/after newlines, which you'll need to make it work given the multi-line input.
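A small sketch of the multi-line approach follows. The class and method names are hypothetical. As a deliberate variation on the answer above, it uses [ \t] rather than \s, on the assumption that only spaces and tabs should go; \s also matches line terminators, so it can swallow a newline when a blank line follows.

```java
class TrailingWhitespaceDemo {
    // (?m) makes $ match at the end of every line, not just the end of input.
    // [ \t]+ restricts the match to spaces and tabs so newlines are kept.
    static String stripTrailing(String text) {
        return text.replaceAll("(?m)[ \\t]+$", "");
    }

    public static void main(String[] args) {
        String text = "a) AAAAAA.  \nb) BBBBB.\t\nc) CCCCCC. \t ";
        // Strings are immutable, so the result has to be assigned or used.
        String cleaned = stripTrailing(text);
        System.out.println(cleaned);
    }
}
```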
I guess you need String#trim() method
use
string.trim()
Returns a copy of the string, with leading and trailing whitespace omitted.

How to search for a special character in java string?

I am having some problem with searching for a special character "(".
I got a java.util.regex.PatternSyntaxException.
It might have something to do with "(" being treated as special character.
I am not very good with regular expressions. Can someone help me properly search for the special character?
// I need to split the string at the "("
String myString = "Room Temperature (C)";
String splitList[] = myString.split ("("); // i got an exception
// I tried this but got compile error
String splitList[] = myString.split ("\(");
Try one of these:
string.split("\\(");
string.split(Pattern.quote("("));
Since a string split takes a regular expression as an argument, you need to escape characters properly. See Jon Skeet's answer on this here:
The reason you got an exception the first time is because split() takes a regular expression as argument, and ( has a special meaning there, as you suggest. To avoid this, you need to escape it using a \, like you tried.
What you missed, is that you also need to escape your backslashes with an extra \ in Java, meaning you need a total of two:
String splitList[] = myString.split ("\\(");
You need to escape the character via backslashes: string.split("\\(");
( is one of regex special characters. To escape it you can use e.g.
split(Pattern.quote("(")),
split("\\Q(\\E"),
split("\\("),
split("[(]").
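As a quick check, two of the options listed above can be compared side by side. This is only a sketch, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnParenDemo {
    // Escapes the parenthesis by hand: the regex is \( and the literal is "\\(".
    static String[] splitEscaped(String s) {
        return s.split("\\(");
    }

    // Lets Pattern.quote build a literal pattern for the delimiter.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("("));
    }

    public static void main(String[] args) {
        String myString = "Room Temperature (C)";
        System.out.println(Arrays.toString(splitEscaped(myString)));
        System.out.println(Arrays.toString(splitQuoted(myString)));
    }
}
```

Pattern.quote is arguably the safer habit when the delimiter comes from a variable, since it works for any string without manual escaping.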

Problems with replaceAll (I want to remove all occurrences of [*])

I have a text with some words like [1], [2], [3] etc...
For example: houses both permanent[1] collections and temporary[2]
exhibitions of contemporary art and photography.[6]
I want to remove these words, so the string must be like this:
For example: houses both permanent collections and temporary
exhibitions of contemporary art and photography.
I tried using: s = s.replaceAll("[.*]", ""); but it just removes the dots (.) from the text.
Which is the correct way to achieve it?
Thanks
It's because [ and ] are regex markers. This should work:
s = s.replaceAll("\\[\\d+\\]","");
(assuming that you always have numbers within the []).
If it could be any characters:
s = s.replaceAll("\\[.*?\\]","");
(thanks @PeterLawrey).
Use:
s.replaceAll("\\[[^]]+\\]", "")
[ and ] are special in a regular expression: they are the delimiters of a character class, so you need to escape them. Your original regex was a character class looking either for a dot or a star.
Step 1: get a better (safer) pattern. Your current one will probably remove most of your string, even if you do get it working as written. Aim for as specific as possible. This one should do (only match brackets that have digits between them).
[\d+]
Step 2: escape special regex characters. [] has a special meaning in regex syntax (character classes) so they need escaping.
\[\d+\]
Step 3: escape for string literal. \ has a special meaning in string literals (escape character) so they also need escaping.
"\\[\\d+\\]"
And now we should have some nicely working code.
s = s.replaceAll("\\[\\d+\\]", "");
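The difference between the unescaped and the escaped pattern can be seen in a small sketch. The class and method names are hypothetical, and it assumes the references always contain digits only.

```java
class BracketRefDemo {
    // The question's pattern: a character class that matches a single
    // dot or star, so only those characters are removed.
    static String charClass(String s) {
        return s.replaceAll("[.*]", "");
    }

    // Escaped brackets match literal [ and ]; \d+ matches the digits between them.
    static String stripNumericRefs(String s) {
        return s.replaceAll("\\[\\d+\\]", "");
    }

    public static void main(String[] args) {
        String s = "contemporary art and photography.[6]";
        System.out.println(charClass(s));
        System.out.println(stripNumericRefs(s));
    }
}
```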
Try:
public class StringTest {
public static void main(String args[]){
String str = "houses both permanent[1] collections and temporary[2] exhibitions of contemporary art and photography.[6]";
String patten = str.replaceAll("\\[[0-9]*]", "");
System.out.println(patten);
}
}
output:
houses both permanent collections and temporary exhibitions of contemporary art and photography.

Escape quote function in java not working

I need to escape quotes from a string before printing them out.
I've written a function, using char arrays to be as explicit as possible. But all this function seems to do is print out its own input:
public static String escapeQuotes(String myString) {
char[] quote=new char[]{'"'};
char[] escapedQuote=new char[]{'\\' , '"'};
return myString.replaceAll(new String(quote), new String(escapedQuote));
}
public static void main(String[] args) throws Exception {
System.out.println("asd\"");
System.out.println(escapeQuotes("asd\""));
}
I would expect the output of this to be:
asd"
asd\"
However what I get is:
asd"
asd"
Any ideas how to do this properly?
Thanks
I would try replace instead of replaceAll. The second version is designed for regex, AFAIK.
edit
From replaceAll docs
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string
So the backslash in the second argument is the problem.
The direct link to the docs
http://download.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll(java.lang.String, java.lang.String)
Java comes with readily available methods for escaping regexs and replacements:
public static String escapeQuotes(String myString) {
return myString.replaceAll(Pattern.quote("\""), Matcher.quoteReplacement("\\\""));
}
You also need to escape \ into \\. Otherwise any \ in your original string will remain a single \, and the parser, expecting a special character (e.g. ") to follow a \, will get confused.
Every \ or " character you put into a Java string literal needs to be preceded by \.
You want to use replace, not replaceAll, as the former deals with strings and the latter with regexps. Regexps will be slower in your case and will require even more backslashes.
replace, which works with CharSequences (i.e. Strings in this case), exists from Java 1.5 onwards.
myString = myString.replace("\\", "\\\\");
myString = myString.replace("\"", "\\\"");
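Wrapped into the function from the question, the two calls above might be sketched like this; the class name is made up for illustration. The order matters: backslashes are escaped first, so the backslashes added in front of the quotes are not doubled afterwards.

```java
class EscapeQuotesDemo {
    // String.replace treats both arguments as plain text, so only the
    // string-literal level of escaping is needed here.
    static String escapeQuotes(String myString) {
        return myString
                .replace("\\", "\\\\")  // \  becomes \\
                .replace("\"", "\\\""); // "  becomes \"
    }

    public static void main(String[] args) {
        System.out.println("asd\"");
        System.out.println(escapeQuotes("asd\""));
    }
}
```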
