Why does replaceAll fail with "illegal group reference"? - java

I need to replace matches of
\\s+\\$\\$ with $$
I used:
String s = " $$";
s = s.replaceAll("\\s+\\$\\$","$$");
but it throws an exception:
java.lang.IllegalArgumentException: Illegal group reference

From String#replaceAll javadoc:
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
So escaping of an arbitrary replacement string can be done using Matcher#quoteReplacement:
String s = " $$";
s = s.replaceAll("\\s+\\$\\$", Matcher.quoteReplacement("$$"));
The pattern can also be escaped with Pattern#quote:
String s = " $$";
s = s.replaceAll("\\s+" + Pattern.quote("$$"), Matcher.quoteReplacement("$$"));
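As a minimal sketch of the behaviour described above, the following shows the unescaped replacement being rejected and the quoted one succeeding. The class and method names (QuoteReplacementDemo, collapse, unescapedFails) are illustrative assumptions, not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {

    // Hypothetical helper: replaces whitespace followed by "$$" with a literal "$$".
    static String collapse(String input) {
        return input.replaceAll("\\s+" + Pattern.quote("$$"), Matcher.quoteReplacement("$$"));
    }

    // Returns true if the unescaped replacement string "$$" is rejected
    // with an IllegalArgumentException when a match is found.
    static boolean unescapedFails(String input) {
        try {
            input.replaceAll("\\s+\\$\\$", "$$");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unescapedFails("a $$")); // true
        System.out.println(collapse("a $$"));       // a$$
    }
}
```

Note that the exception only appears when the pattern actually matches, because the replacement string is parsed at the point a match is substituted.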

Use "\\$\\$" in the second parameter:
String s=" $$";
s=s.replaceAll("\\s+\\$\\$","\\$\\$");
//or
//s=s.replaceAll("\\s+\\Q$$\\E","\\$\\$");
The $ is the group-reference symbol in the replacement parameter,
so you need to escape it.

The problem here is not the regular expression, but the replacement:
$ is used to refer to () matching groups. So you need to escape it as well with a backslash (and a second backslash to make the java compiler happy):
String s=" $$";
s = s.replaceAll("\\s+\\$\\$", "\\$\\$");
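To illustrate what the $ is normally used for in a replacement string, here is a small sketch contrasting a genuine group reference with an escaped, literal dollar sign. The class and method names are made up for this example.

```java
public class GroupReferenceDemo {

    // "$1" and "$2" in the replacement refer to the capturing groups of the pattern.
    static String swapWords(String input) {
        return input.replaceAll("(\\w+) (\\w+)", "$2 $1");
    }

    // "\\$" in the Java literal is \$ at the replacement level: a literal dollar sign.
    // It is followed here by "$1", a reference to the captured digits.
    static String prefixWithDollar(String input) {
        return input.replaceAll("(\\d+)", "\\$$1");
    }

    public static void main(String[] args) {
        System.out.println(swapWords("hello world"));      // world hello
        System.out.println(prefixWithDollar("cost: 42"));  // cost: $42
    }
}
```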

This is the right way. Replace each literal $ with an escaped \$:
str.replaceAll("\\$", "\\\\\\$")

import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class HelloWorld{
public static void main(String []args){
String msg = "I have %s in my string";
msg = msg.replaceFirst(Pattern.quote("%s"), Matcher.quoteReplacement("$"));
System.out.println(msg);
}
}

I had the same problem, so I ended up implementing replaceAll with split.
It solved the exception for me:
public static String replaceAll(String source, String key, String value){
// A negative limit keeps trailing empty strings, so matches at the end of source
// are replaced too and the array always has at least one element
String[] split = source.split(Pattern.quote(key), -1);
StringBuilder builder = new StringBuilder();
builder.append(split[0]);
for (int i = 1; i < split.length; i++) {
builder.append(value);
builder.append(split[i]);
}
return builder.toString();
}

$ has special meaning in the replacement string as well as in the regex, so you have to escape it there, too:
s=s.replaceAll("\\s+\\$\\$", "\\$\\$");

String s="$$";
s=s.replaceAll("\\s+\\$\\$","$$");

Related

Java regex: pattern.matcher() does not match the source, but I expect it to

When I use pattern.matcher() in Java, the source does not match the regex, but I expect it to match.
String source = "ONE.TWO"
String regex = "^ONE\\.TWO\\..*"
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
test();
}
public static void test() {
Test stringDemo = new Test();
stringDemo.testMatcher();
}
public void testMatcher() {
String source = "ONE.TWO";
String regex = "^ONE\\.TWo\\..*";
// The result is false ("not match"), but the expected result is true ("match")
matcher(source, regex);
}
public void matcher(String source, String regex) {
Pattern pattern = Pattern.compile(regex);
boolean match = pattern.matcher(source).matches();
if (match) {
System.out.println("match");
} else {
System.out.println("not match");
}
}
}
In your code, your regular expression expects the o in TWO to be lower case and expects it to be followed by a literal dot (.).
Try:
String source = "ONE.TWo.";
This will match your regular expression as coded in your question.
The expression \. means match a literal dot (rather than any character). When you code this into a Java String, you have to escape the backslash with another backslash, so it becomes "\\.".
The .* on the end of the expression means "match zero or more of any character (except line-break)".
So this would also match:
String source = "ONE.TWo.blah blah";
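The points above can be checked with a short sketch. It assumes a hypothetical helper fullyMatches that wraps the same Pattern.compile(...).matcher(...).matches() call used in the question.

```java
import java.util.regex.Pattern;

public class MatchesDemo {

    // Same check as in the question: matches() requires the whole input to match.
    static boolean fullyMatches(String source, String regex) {
        return Pattern.compile(regex).matcher(source).matches();
    }

    public static void main(String[] args) {
        String regex = "^ONE\\.TWo\\..*";
        System.out.println(fullyMatches("ONE.TWO", regex));           // false: wrong case, no trailing dot
        System.out.println(fullyMatches("ONE.TWo.", regex));          // true
        System.out.println(fullyMatches("ONE.TWo.blah blah", regex)); // true
        // Dropping the required dot and using an upper-case O matches the original source.
        System.out.println(fullyMatches("ONE.TWO", "^ONE\\.TWO.*"));  // true
    }
}
```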
Well, it doesn't match for two reasons:
Your regex "^ONE\\.TWo\\..*" is case sensitive, so TWo cannot match TWO.
And your regex expects a . character after TWO, while your string "ONE.TWO" doesn't have one.
Use the following regex to match your source string:
String regex = "^ONE\\.TWO\\.*.*";
Pattern matching is case sensitive by default. In your case the source has an uppercase O and the regex a lowercase o.
So you have to add Pattern.CASE_INSENSITIVE or change the case of the o. Either way, the source still needs the dot after TWO that the regex requires.
Pattern pattern = Pattern.compile(regex,Pattern.CASE_INSENSITIVE );
or
String regex = "^ONE\\.TWO\\..*";
Your regex is slightly off. It requires an extra literal dot here:
String regex = "^ONE\\.TWO\\.(extra dot).*"
Try this one, without the dot:
String regex = "^ONE\\.TWO.*"
String regex = "^ONE\\.TWO\\..*"
The double backslash \\ in the Java string literal is an escape sequence that produces a single backslash in the regex, so \\. becomes \. and matches a literal dot in the source string.
The .* at the end matches any character zero or more times, except line breaks.
To match the regex, your source should look like
String source = "ONE.TWO.three blah ##$ etc" or
String source = "ONE.TWO.123##$ etc"
Basically it is any string which starts with ONE.TWO. and contains no line breaks.

How to clean a file, replacing unwanted separators, operators, string literals

I'm working on a concordance problem where I must: "Clean the file. For this, remove all string literals (anything enclosed
in double quotes, the second of which is not preceded by an odd number
of backslashes), remove all // comments, remove all separator characters
(look these up), and operators (look these up). Do not worry about ".class literals" (we will assume they will not appear in the input file)."
I think I know how the replaceAll() method works, but I don't know what's going to be in the file. For starters, how would I go about removing all string literals? Is there a way to replace everything within two double quotes? I.E. String someString = "I want to remove this from a file plz help me, thx";
I've currently put each line of text within an ArrayList of Strings.
Here's what I've got: http://pastebin.com/N84QdLqz
I think I've come up with a solution for your string literal regex. Something like:
inputLine = inputLine.replaceAll("\"([^\\\\\"]*(\\\\\")*)*([\\\\]{2})*(\\\\\")*[^\"]*\"", "");
should do the trick. The regex is actually significantly more readable if you print it out to the console after Java has had a chance to escape all of the characters. So if you call System.out.println() with that String, you'll get:
"([^\\"]*(\\")*)*([\\]{2})*(\\")*[^"]*"
I'll break down the original regex to explain. First there's:
"\"([^\\\\\"]*(\\\\\")*)*
This says to match a quote character (") followed by 0 or more patterns of characters that are neither backslashes (\) nor quote characters (") which are followed by 0 or more escaped quotes (\"). As you can see, since \ is typically used as an escape character in Java, any regexes using them become pretty verbose.
([\\\\]{2})*
This says to next match 0 or more sets of 2 (i.e. even-numbered amounts) of backslashes.
(\\\\\")*
This says to match a single backslash followed by a quote character, and to find 0 or more of those together.
[^\"]*\"
This says to match anything that is not a quote character, 0 or more times, followed by a quote character.
I tested my regex with an example similar to what you were asking for:
string literals (anything enclosed in double quotes, the second of which is not preceded by an odd number of backslashes)
Emphasis mine. So by this statement, if the first quote in a literal has a backslash in front of it, it doesn't matter.
String s = "This is "a test\" + "So is this"
Applying the regex with replaceAll and a replacement of \"\", you'll get:
String s = ""a test\""So is this"
which should be correct. You can completely remove the matching literal's quotes, if you want, by calling replaceAll with a replacement of "":
String s = a test\So is this"
Alternately, using this regex on something much less contrived to cause headaches:
String s = "This is \"a test\\" + "So is this"
will return:
String s = +
You can do something like this:
private static final String REGEX = "(\"[\\w|\\s]*\")";
private static Pattern P;
private static Matcher M;
public static void main(String args[]){
P = Pattern.compile(REGEX);
//.... your code here ....
}
public static ArrayList<String> readStringsFromFile(String fileName) throws FileNotFoundException
{
Scanner scanner = null;
scanner = new Scanner(new File(fileName));
ArrayList<String> list = new ArrayList<>();
String str = new String();
try
{
while(scanner.hasNext())
{
str = scanner.nextLine();
str = cleanLine(str);//clean the line after read
list.add(str);
}
}
catch (InputMismatchException ex)
{
}
return list;
}
public static String cleanLine(String line) {
int index;
//remove comment lines
index = line.indexOf("//");
if (index != -1) {
line = line.substring(0, index);
}
//remove everything within two double quotes
M = P.matcher(line);
String tmp = "";
while(M.find()) {
tmp = line.substring(0,M.start());
tmp += line.substring(M.end());
line = tmp;
M = P.matcher(line);
}
return line;
}
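The two cleaning steps discussed above (string literals and // comments) can also be sketched compactly. This is only a sketch under stated assumptions: the string-literal pattern below is a commonly used alternative and not the regex from either answer, the names LineCleaner, STRING_LITERAL and LINE_COMMENT are invented here, and separators, operators, char literals, and quotes inside comments are not handled.

```java
import java.util.regex.Pattern;

public class LineCleaner {

    // Assumed pattern (not the regex from the answers above): an opening double quote,
    // then any run of characters that are neither a quote nor a backslash, or
    // backslash-escaped characters, then the closing double quote.
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");

    // A line comment runs from the two slashes to the end of the line.
    private static final Pattern LINE_COMMENT = Pattern.compile("//.*");

    static String cleanLine(String line) {
        // Remove string literals first so that "//" inside a literal
        // is not mistaken for the start of a comment.
        String withoutLiterals = STRING_LITERAL.matcher(line).replaceAll("");
        return LINE_COMMENT.matcher(withoutLiterals).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(cleanLine("String url = \"http://example.com\"; // a comment"));
    }
}
```

Removing literals before comments is a deliberate ordering choice here; the reverse order would cut a line at the // inside a URL string.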

String index out of range with replace all

How can I replace mapDir surrounded by <> with a certain string?
String mapDir = "D:\\mapping\\specialists\\ts_gpc\\";
String test = "foo: <mapDir> -bar";
println(test.replaceAll("<mapDir>", mapDir));
The above gives me a StringIndexOutOfBoundsException.
The code below works for me, but I think plain Java should work as well.
static String replaceWord(String original, String find, String replacement) {
int i = original.indexOf(find);
if (i < 0) {
return original; // return original if 'find' is not in it.
}
String partBefore = original.substring(0, i);
String partAfter = original.substring(i + find.length());
return partBefore + replacement + partAfter;
}
You don't need the replaceAll method, as you are not using a regex. Instead you can use the replace API, like below:
String mapDir = "D:\\mapping\\specialists\\ts_gpc\\";
String test = "foo: <mapDir> -bar";
System.out.println(test.replace("<mapDir>", mapDir));
replaceAll in String uses a regex, as specified in the documentation:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.
Thus, you should escape your replacement string like this:
String mapDir = "D:\\mapping\\specialists\\ts_gpc\\";
String test = "foo: <mapDir> -bar";
System.out.println(test.replaceAll("<mapDir>", Matcher.quoteReplacement(mapDir)));
which gives the output:
foo: D:\mapping\specialists\ts_gpc\ -bar
Since replaceAll works with regex, you need to re-escape the backslashes:
String mapDir = "D:\\\\mapping\\\\specialists\\\\ts_gpc\\\\";
String test = "foo: <mapDir> -bar";
System.out.println(test.replaceAll("<mapDir>", mapDir));
Quoting this answer:
by the time the regex compiler sees the pattern you've given it, it
sees only a single backslash (since Java's lexer has turned the double
backwhack into a single one)
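For comparison, here is a small sketch that puts the literal replace and the quoted replaceAll side by side on the path from the question. The class and method names are hypothetical; both approaches are expected to produce the same text.

```java
import java.util.regex.Matcher;

public class PlaceholderDemo {

    // Literal replacement: neither the target nor the replacement is treated as a regex.
    static String withReplace(String template, String dir) {
        return template.replace("<mapDir>", dir);
    }

    // Regex replacement with the replacement string quoted, so its backslashes survive.
    static String withReplaceAll(String template, String dir) {
        return template.replaceAll("<mapDir>", Matcher.quoteReplacement(dir));
    }

    public static void main(String[] args) {
        String mapDir = "D:\\mapping\\specialists\\ts_gpc\\";
        String test = "foo: <mapDir> -bar";
        System.out.println(withReplace(test, mapDir));
        System.out.println(withReplaceAll(test, mapDir));
    }
}
```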

Java - Split string

I have a string which is separated by "." but when I try to split it on the dot, it does not get split.
Here is the exact code I have. Please let me know what could cause the string not to split.
public class TestStringSplit {
public static void main(String[] args) {
String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";
String test[] = testStr.split(".");
for (String string : test) {
System.out.println("test : " + string);
}
System.out.println("Str Length : " + test.length);
}
}
I have to split the above string and get only the last part. In the above case that is CreateRequisitionRO, not CreateRequisitionRO; (with the semicolon). Please help me get this.
You can split this string with StringTokenizer and get each word between the dots:
StringTokenizer tokenizer = new StringTokenizer(string, ".");
String firstToken = tokenizer.nextToken();
String secondToken = tokenizer.nextToken();
Since you are looking for the last word, CreateRequisitionRO, you can also use:
String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";
String yourString = testStr.substring(testStr.lastIndexOf('.')+1, testStr.length()-1);
String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";
String test[] = testStr.split("\\.");
for (String string : test) {
System.out.println("test : " + string);
}
System.out.println("Str Length : " + test.length);
The "." is a regular expression wildcard, so you need to escape it.
Change String test[] = testStr.split("."); to String test[] = testStr.split("\\.");.
Because String.split takes a regex argument, you need to escape the dot character (which is a wildcard in regex).
Note that String.split takes in a regular expression, and . has special meaning in regular expression (which matches any character except for line separator), so you need to escape it:
String test[] = testStr.split("\\.");
Note that you escape the . at the level of regular expression once: \., and to specify \. in a string literal, \ needs to be escaped again. So the string to pass to String.split is "\\.".
Another way is to put it inside a character class, where . loses its special meaning:
String test[] = testStr.split("[.]");
You need to escape the . as it is a special character, a full list of these is available. Your split line needs to be:
String test[] = testStr.split("\\.");
Split takes a regular expression as a parameter. If you want to split by the literal ".", you need to escape the dot because that is a special character in a regular expression. Try putting 2 backslashes before your dot ("\\.") - hopefully that does what you are looking for.
String test[] = testStr.split("\\.");
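The answers above can be combined into one runnable sketch. The names SplitDemo, unescapedPieceCount, splitOnDot and simpleName are assumptions made for this example; Pattern.quote is used as one more way to make the dot literal, alongside "\\." and "[.]".

```java
import java.util.regex.Pattern;

public class SplitDemo {

    // An unescaped "." matches every character, so only empty pieces remain,
    // and split() drops trailing empty pieces, leaving an empty array.
    static int unescapedPieceCount(String input) {
        return input.split(".").length;
    }

    // Pattern.quote builds a regex that matches the delimiter literally.
    static String[] splitOnDot(String input) {
        return input.split(Pattern.quote("."));
    }

    // Last dot-separated piece, minus a trailing ';' if present.
    // Assumes the input contains at least one non-empty piece.
    static String simpleName(String input) {
        String[] parts = splitOnDot(input);
        String last = parts[parts.length - 1];
        return last.endsWith(";") ? last.substring(0, last.length() - 1) : last;
    }

    public static void main(String[] args) {
        String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";
        System.out.println(unescapedPieceCount(testStr)); // 0
        System.out.println(splitOnDot(testStr).length);   // 5
        System.out.println(simpleName(testStr));          // CreateRequisitionRO
    }
}
```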

Escaping double-slashes with regular expressions in Java

I have this unit test:
public void testDeEscapeResponse() {
final String[] inputs = new String[] {"peque\\\\u0f1o", "peque\\u0f1o"};
final String[] expected = new String[] {"peque\\u0f1o", "peque\\u0f1o"};
for (int i = 0; i < inputs.length; i++) {
final String input = inputs[i];
final String actual = QTIResultParser.deEscapeResponse(input);
Assert.assertEquals(
"deEscapeResponse did not work correctly", expected[i], actual);
}
}
I have this method:
static String deEscapeResponse(String str) {
return str.replaceAll("\\\\", "\\");
}
The unit test is failing with this error:
java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(String.java:686)
at java.util.regex.Matcher.appendReplacement(Matcher.java:703)
at java.util.regex.Matcher.replaceAll(Matcher.java:813)
at java.lang.String.replaceAll(String.java:2189)
at com.acme.MyClass.deEscapeResponse
at com.acme.MyClassTest.testDeEscapeResponse
Why?
Use String.replace which does a literal replacement instead of String.replaceAll which uses regular expressions.
Example:
"peque\\\\u0f1o".replace("\\\\", "\\") // gives peque followed by a single backslash and u0f1o
String.replaceAll takes a regular expression thus \\\\ is interpreted as the expression \\ which in turn matches a single \. (The replacement string also has special treatment for \ so there's an error there too.)
To make String.replaceAll work as you expect here, you would need to do
"peque\\\\u0f1o".replaceAll("\\\\\\\\", "\\\\")
I think the problem is that you're using replaceAll() instead of replace(). replaceAll expects a regular expression in the first field and you're just trying to string match.
See javadoc for Matcher:
Note that backslashes (\) and dollar
signs ($) in the replacement string
may cause the results to be different
than if it were being treated as a
literal replacement string. Dollar
signs may be treated as references to
captured subsequences as described
above, and backslashes are used to
escape literal characters in the
replacement string.
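Thus with replaceAll a backslash in the replacement must itself be escaped; a lone "\\" is an incomplete escape, which is what triggers the exception. A workaround for your case that avoids backslashes in the replacement altogether would be str.replaceAll("\\\\(\\\\)", "$1")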
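As a rough sketch of the two working approaches for the method in the question, the following shows a literal replace and a regex replaceAll that both turn a doubled backslash into a single one. The class and method names are invented for this example.

```java
public class DeEscapeDemo {

    // Literal replacement: two consecutive backslashes become one.
    static String deEscapeWithReplace(String str) {
        return str.replace("\\\\", "\\");
    }

    // Regex version: every backslash is escaped once for the regex (or for the
    // replacement syntax) and once more for the Java string literal.
    static String deEscapeWithReplaceAll(String str) {
        return str.replaceAll("\\\\\\\\", "\\\\");
    }

    public static void main(String[] args) {
        String doubled = "peque\\\\u0f1o";
        String single = "peque\\u0f1o";
        System.out.println(deEscapeWithReplace(doubled).equals(single));    // true
        System.out.println(deEscapeWithReplaceAll(doubled).equals(single)); // true
        System.out.println(deEscapeWithReplace(single).equals(single));     // true
    }
}
```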
