Java escape characters in strings - string contains \r (need to keep "r")

I have a string "EAD\rgonzalez" which is passed to me.
I need to pull out "rgonzalez" from it.
I am running into problems with the "\" character.
I cannot find the index of it, I cannot replace it, etc.
Any help on pulling the data after the "\" would be appreciated.
The string that I receive is in the format domain\username; the data can vary.
Another example would be US\ngross, where \n would be interpreted as a newline character.
To clarify, I am not adding a '\'; I am trying to split a string on a '\'.
This string contains '\r', which is itself a single, special character.
I need a way to turn the \r contained in my string into two separate characters, a '\' and an 'r'.

You haven't provided any code, but I'm assuming what you're doing is something like this:
String user = request.getParameter("user"); // user = "EAD\rgonzalez"
If you were to declare a static string in your application, you would have to escape the backslash because it is a special character for Java strings:
String user = "EAD\\rgonzalez";
To split that string on the backslash you must escape it twice in the regex that you pass to the split method: once because backslash is a special character in Java string literals, and again because backslash is a special character in regular expressions. So instead of one backslash you have four. The one is escaped, so you have two, and then both of those are escaped again.
String[] parts = user.split("\\\\");
Now you have split the string:
System.out.println(parts[0]); // "EAD"
System.out.println(parts[1]); // "rgonzalez"
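The same extraction can also be done without a regular expression. The following is a minimal sketch, assuming the value really contains a backslash character at runtime; the class name BackslashSplitDemo and the helper userName are made up for illustration.

```java
import java.util.Arrays;

class BackslashSplitDemo {
    // Returns the text after the first backslash, or the whole input if there is none.
    static String userName(String qualified) {
        int idx = qualified.indexOf('\\'); // char literal for one backslash
        return idx < 0 ? qualified : qualified.substring(idx + 1);
    }

    public static void main(String[] args) {
        // Source literal with an escaped backslash: domain, backslash, user name.
        String user = "EAD\\rgonzalez";

        // Regex variant: the four backslashes in source form a regex matching one backslash.
        System.out.println(Arrays.toString(user.split("\\\\")));

        // Non-regex variant: only one level of escaping is needed.
        System.out.println(userName(user));
    }
}
```

The indexOf/substring variant avoids the double escaping entirely, which may be easier to read when the delimiter is a single fixed character.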

The string that i receive is in the format of domain\username... the data can vary
The data shouldn't vary if that is the input your program expects.
where \n would be interpreted as a newline character
I'm not sure how you'd get newlines from a single-line input form. If you do, then your input is invalid because it does not follow the format you've specified and are expecting. If you did interpret newlines and other whitespace characters, you would end up treating the whole thing as either the domain or the username, potentially breaking your program logic. You have stated the requirement as domain\username, and I don't think that requires you to handle any other form of input.
I am collecting this string from the header data from the request object in a webapp.
In that case, the raw value should not contain an escape character; it is the text that you would write as "domain\\username" in a Java string literal. When you print the value, the escaping backslash isn't shown.
I cannot find the index of it,
With the correct representation, indexOf("\\") will work...
pulling the data after the "\"
Since you would have the value as domain\\username, you need to escape both of the backslashes in the argument to split(String regex), since that argument is a regular expression.
For example,
import java.util.Arrays;

public class SplitDemo { // class name is arbitrary
public static void main(String[] args) {
String in = "EAD\\rgonzalez";
System.out.println(in.indexOf("\\")); // find the index of '\'; prints 3
String[] parts = in.split("\\\\"); // the regex \\ matches a single '\'
System.out.println(Arrays.toString(parts)); // prints [EAD, rgonzalez]
}
}
Again, the string "EAD\rgonzalez" is not in the form of domain\username, as demonstrated here:
System.out.print("EAD\rgonzalez".matches("[A-Z]+\\\\[a-z]+")); // false
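To make the difference between the two literals concrete, here is a small sketch comparing them. The class name LiteralVsRuntime and the helper hasBackslash are made up for illustration.

```java
class LiteralVsRuntime {
    // True if the string contains at least one backslash character.
    static boolean hasBackslash(String s) {
        return s.indexOf('\\') >= 0;
    }

    public static void main(String[] args) {
        String escaped = "EAD\\rgonzalez";  // backslash followed by the letter r
        String carriage = "EAD\rgonzalez";  // a single carriage-return character, no letter r

        System.out.println(escaped.length() + " " + hasBackslash(escaped));
        System.out.println(carriage.length() + " " + hasBackslash(carriage));
        System.out.println(carriage.indexOf('\r')); // position of the carriage return
    }
}
```

The second literal is one character shorter and contains no backslash at all, which is why indexOf and split cannot find one in it.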

The magic you need is in org.apache.commons.lang.StringEscapeUtils
Here is a demo:
package ignoreescapeseq2;
import org.apache.commons.lang.StringEscapeUtils;
/**
* @author Charles Knell
*/
public class IgnoreEscapeSeq2 {
public static void main(String[] args) {
String string = "EAD\rgonzalez"; // REQUIRED INPUT STRING
String eString = StringEscapeUtils.escapeJava(string);
String [] sArray = eString.split("\\\\");
System.out.println("domain: " + sArray[0]);
System.out.println("username: " + sArray[1]);
}
}
Here is the output:
domain: EAD
username: rgonzalez
Although this MAY answer the question, there still seems to be a problem
if you must define the string in Java. As you said, "EAD\xgonzalez" isn't a
valid Java string because \x isn't a valid escape sequence. The solution above only works if the input string never has to be explicitly defined in source code, as it is in the demo.
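If adding Commons Lang is not an option, a similar effect for the common cases can be sketched with the JDK alone. This is only an illustration, assuming that carriage return, newline, tab, backspace and form feed are the only control characters of interest; it is not a replacement for escapeJava, and the names ControlCharReescape and reescape are made up.

```java
class ControlCharReescape {
    // Turns a few control characters back into backslash-plus-letter form.
    static String reescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\r': sb.append("\\r"); break;
                case '\n': sb.append("\\n"); break;
                case '\t': sb.append("\\t"); break;
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The literal below contains a carriage return, as in the demo above.
        String[] parts = reescape("EAD\rgonzalez").split("\\\\", 2);
        System.out.println("domain: " + parts[0]);
        System.out.println("username: " + parts[1]);
    }
}
```

Like the demo above, this only helps when the control character is already in the string; it cannot recover a user name whose first letter does not happen to form an escape sequence that was interpreted.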

Related

Java split with special characters

I have the code below, which splits a string using <div>\\$\\$PZ\\$\\$</div>, and it's not working with the special characters.
public class HelloWorld{
public static void main(String []args){
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split("<div>\\$\\$PZ\\$\\$</div>", 2);
for (String a : arrOfStr)
System.out.println(a);
}
}
The output is test<div>\$\$PZ\$\$</div>test
It works when I remove the special characters.
Can you please help?

As you already know, the parameter to split(...) is a regular expression, so some characters have special meaning. If you want the parameter to be treated literally, i.e. not as a regex, call the Pattern.quote(String s) method.
Example
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split(Pattern.quote("<div>\\$\\$PZ\\$\\$</div>"), 2);
for (String a : arrOfStr)
System.out.println(a);
Output
test
test
The quote() method simply surrounds the literal text with the regex \Q...\E quotation pattern (1), e.g. your <div>\$\$PZ\$\$</div> text becomes:
\Q<div>\$\$PZ\$\$</div>\E
For fixed text you could just do that yourself, i.e. the following 3 versions all create the same regex to split on:
str.split(Pattern.quote("<div>\\$\\$PZ\\$\\$</div>"), 2)
str.split("\\Q<div>\\$\\$PZ\\$\\$</div>\\E", 2)
str.split("<div>\\\\\\$\\\\\\$PZ\\\\\\$\\\\\\$</div>", 2)
To me, the 3rd one, using \ to escape, is the least readable/desirable version.
If there are a lot of special characters to escape, using \Q...\E is easier than \-escaping each special character separately, but very few people use it, so it is fairly unknown.
The quote() method is especially useful when you need to treat dynamic text literally, e.g. when the text to split on is configurable by the user.
1) quote() will correctly handle literal text containing \E.
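For the dynamic case, a helper along these lines could be used. It is a minimal sketch; the class name LiteralSplit and the method splitLiteral are made up, and the limit of -1 is a choice made here so that trailing empty strings are kept.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Splits text on delimiter, treating the delimiter as plain text rather than a regex.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // "$" is a regex metacharacter, but quote() makes it literal.
        System.out.println(Arrays.toString(splitLiteral("a$$b$$c", "$$")));

        // A delimiter that itself contains backslash-E is still handled.
        System.out.println(Arrays.toString(splitLiteral("x\\Ey", "\\E")));
    }
}
```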

This:
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split("<div>\\\\\\$\\\\\\$PZ\\\\\\$\\\\\\$</div>", 2);
for (String a : arrOfStr) {
System.out.println(a);
}
prints:
test
test
EDIT: Why do we need all those backslashes? It's because of how we need to write String literals that represent regular expressions. This page describes the reason with examples. The essence is this:
For a backslash \...
...the pattern to match that would be \\... (to escape the escape)
... but the string literal to create that pattern would have to have one backslash to escape each of the two backslashes: \\\\.
Add to that the original need to also escape the $, that gives us our 6 backslashes in the string representation.
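The layering can be checked directly by printing the regex that a six-backslash literal produces. This is a small sketch; the class name BackslashLayers and the helper isBackslashDollar are made up for illustration.

```java
import java.util.regex.Pattern;

class BackslashLayers {
    // Java source literal with six backslashes; the resulting string holds four
    // characters: three backslashes followed by a dollar sign.
    static final String REGEX = "\\\\\\$";

    // True if text is exactly one backslash followed by one dollar sign.
    static boolean isBackslashDollar(String text) {
        return Pattern.matches(REGEX, text);
    }

    public static void main(String[] args) {
        System.out.println(REGEX);          // the regex as the regex engine sees it
        System.out.println(REGEX.length()); // number of characters in the regex
        System.out.println(isBackslashDollar("\\$")); // text: backslash, dollar
    }
}
```

The compiler halves the six backslashes to three, and the regex engine then reads the first two as an escaped backslash and the third as the escape for the dollar sign.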

Retrieving string in between two characters

I have a string, and if it contains a special character that I have chosen, such as "+", I want to
split the string and take the text after that character until it encounters another special character, such as another "+" or a "-" sign (keep in mind that these are special characters only because I want them to be).
So let's say I have this string:
String strng = "hi+hello-bye/34";
I want whatever method you use to solve this problem to return the strings hello, bye or 34 on their own. And if the string had two "+" signs, I also want it to return the first string after those two "+" characters.
So something like this:
strng.split("\\+");
but instead of returning "hello-bye/34" it should return only "hello".

One option here is to split the input string on the multiple special character delimiters you have in mind, while retaining these delimiters in the actual split results. Then, iterate over the array of parts and return the part which occurs immediately after the special delimiter of interest.
Here is a concise method implementing this logic. I assume that your set of delimiters is +, -, * and /, though you can easily change this to whatever you wish.
public String findMatch(String input, String delimiter) {
String[] parts = input.split("((?<=[+\\-*/])|(?=[+\\-*/]))");
String prev = null;
for (String part : parts) {
if (delimiter.equals(prev)) {
return part;
}
prev = part;
}
return null; // default value if no match found
}
Note: This method will return null if it iterates over the split input string and cannot find the special delimiter, or finds it only as the last element in the array with no string following it. You are free to change the default return value to whatever you wish.
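As a usage sketch, the method can be exercised against the string from the question. The method body is the one shown above, made static so it can be called from main; the wrapper class name FindMatchDemo is made up.

```java
class FindMatchDemo {
    // Same logic as above: split while keeping the delimiters as separate
    // array elements, then return the element right after the wanted delimiter.
    static String findMatch(String input, String delimiter) {
        String[] parts = input.split("((?<=[+\\-*/])|(?=[+\\-*/]))");
        String prev = null;
        for (String part : parts) {
            if (delimiter.equals(prev)) {
                return part;
            }
            prev = part;
        }
        return null; // no match found
    }

    public static void main(String[] args) {
        String strng = "hi+hello-bye/34";
        System.out.println(findMatch(strng, "+")); // text after "+"
        System.out.println(findMatch(strng, "-")); // text after "-"
        System.out.println(findMatch(strng, "/")); // text after "/"
        System.out.println(findMatch(strng, "*")); // delimiter absent
    }
}
```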

Representing ^A (Unicode \u0001) correctly in Java when passed as argument

public class TestU {
public static void main(String[] args) {
String str = "\u0001";
System.out.println("str-->"+str);
System.out.println("arg[0]-->"+args[0]);
}
}
Output :
str-->^A
arg[0]-->\u0001
I am passing args[0] as \u0001.
I executed this code on Linux; the command-line argument is not treated as the Unicode special character.

The argument you pass on the command line is not the Unicode character itself but a six-character string: a backslash followed by u0001. Written as a Java literal, that string is "\\u0001", and that's why it prints \u0001. In the same way, a single \ passed as a command-line argument corresponds to the literal "\\".
The string you declared in main, on the other hand, holds the actual Unicode character.
String escapedstring = "\\u0001";//in args[0]
String unicodeChar = "\u0001";// in str
So now you want \\u0001 to be converted into \u0001, and there are a lot of ways to achieve that. For example, you can use the StringEscapeUtils#unescapeJava utility method, or you can try the following.
String str = "\\u0001";
char unicodeChar = (char) Integer.parseInt(str.substring(2), 16); // the four digits are hexadecimal
System.out.println(unicodeChar);
NOTE: You can find other ways to convert an escaped Unicode string to Unicode characters in the following question (already provided in a comment by Marcinek).
How to convert a string with Unicode encoding to a string of letters
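When the argument may contain ordinary text mixed with several escapes, the conversion can be applied to every match of a pattern. The following is a sketch using only java.util.regex, assuming each escape has exactly four hexadecimal digits; the class name UnicodeUnescape and the method unescape are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescape {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every escape of that form with the character it denotes.
    static String unescape(String s) {
        Matcher m = UNICODE_ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16); // digits are base 16
            // quoteReplacement guards against the character being '$' or a backslash.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "a\\u0041b";
        System.out.println(unescape(input));
    }
}
```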

having trouble with arrays and maybe split

String realstring = "&&&.&&&&";
Double value = 555.55555;
String[] arraystring = realstring.split(".");
String stringvalue = String.valueOf(value);
String [] valuearrayed = stringvalue.split(".");
System.out.println(arraystring[0]);
Sorry if it looks bad. Rewrote on my phone. I keep getting ArrayIndexOutOfBoundsException: 0 at the System.out.println. I have looked and can't figure it out. Thanks for the help.

split() takes a regexp as argument, not a literal string. You have to escape the dot:
string.split("\\.");
or
string.split(Pattern.quote("."));
Or you could also simply use indexOf('.') and substring() to get the two parts of your string.
And if the goal is to get the integer part of a double, you could also simply use
long truncated = (long) doubleValue;

split takes a regex as its parameter, and in regex . means "any character except line separators", so you could expect "a.bc".split(".") to create an array of empty strings like ["","","","",""]. The only reason this does not happen is (from the split javadoc):
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
so because all the strings are empty you get an empty array (and that is why you see ArrayIndexOutOfBoundsException).
To turn off this removal mechanism you would have to use the split(regex, limit) version with a negative limit.
To split on a literal . you need to escape it with \. (which in Java needs to be written as "\\." because \ is also a metacharacter in string literals), or use [.] or another regex quoting mechanism.
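The effect of the limit argument can be observed by counting the resulting elements. This is a small sketch; the class name DotSplitDemo and the helper count are made up for illustration.

```java
import java.util.Arrays;

class DotSplitDemo {
    // Number of elements produced by splitting text on regex with the given limit.
    static int count(String text, String regex, int limit) {
        return text.split(regex, limit).length;
    }

    public static void main(String[] args) {
        System.out.println(count("a.bc", ".", 0));   // unescaped dot, trailing empties removed
        System.out.println(count("a.bc", ".", -1));  // unescaped dot, trailing empties kept
        System.out.println(Arrays.toString("a.bc".split("\\."))); // escaped dot
        System.out.println(Arrays.toString("a.bc".split("[.]"))); // dot inside a character class
    }
}
```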

Dot (.) is a special character so you need to escape it.
String realstring = "&&&.&&&&";
String[] partsOfString = realstring.split("\\.");
String part1 = partsOfString[0];
String part2 = partsOfString[1];
System.out.println(part1);
This prints the expected result:
&&&
It's also handy to test first whether the given string contains this character. You can do so like this:
if (string.contains(".")) {
// Split it.
} else {
throw new IllegalArgumentException("String " + string + " does not contain .");
}

Replacing strings in java question

Say I have a java source file saved into a String variable like so.
String contents = Utils.getTextFromFile(new File(fileName));
And say there is a line of text in the source file like so
String x = "Hello World\n";
Notice the newline character at the end.
In my code, I know of the existence of Hello World, but not Hello World\n
so therefore a call to
String search = "Hello World";
contents = contents.replaceAll(search, "Something Else");
will fail because of that newline character. How can I make it match in the case of one or many newline characters? Could this be a regular expression I add to the end of the search variable?
EDIT:
I am replacing string literals with variables. I know the literals, but I don't know whether they have a newline character or not. Here is an example of the code before my replacement. For the replacement, I know that application running at a time. exists, but not application running at a time.\n\n
int option = JOptionPane.showConfirmDialog(null,"There is another application running. There can only be one application\n" +
"application running at a time.\n\n" +
"Press OK to close the other application\n" +
"Press Cancel to close this application",
"Multiple Instances of weh detected",
JOptionPane.OK_CANCEL_OPTION, JOptionPane.ERROR_MESSAGE);
And here is an example after my replacement
int option = JOptionPane.showConfirmDialog(null,"There is another application running. There can only be one application\n" +
"application running at a time.\n\n" +
"Press OK to close the other application\n" +
"PRESS_CANCEL_TO_CLOSE",
"MULTIPLE_INSTANCES_OF",
JOptionPane.OK_CANCEL_OPTION,
JOptionPane.ERROR_MESSAGE);
Notice that all of the literals without newlines get replaced, e.g. "Multiple Instances of weh detected" is now "MULTIPLE_INSTANCES_OF", but the ones with newlines do not. I am thinking that there is some regular expression I can add to handle one or many newline characters when it does the replaceAll.

Since the first argument is actually a regular expression, you could try something like this as the first argument:
String search = "[Hh]ello [Ww]orld\\n*";
which matches hello world whether or not it is followed by newline characters.
This is a bit redundant and, as stated, it should not matter, but apparently it does.
Try it (it also matches a capital or lowercase first letter in each word, which is probably overdoing it).

If you're only replacing string literals, and you're ok with replacing every occurrence of the literal (not just the first one) then you should use the replace method instead of replaceAll.
Your first example should change to this:
String search = "Hello World";
contents = contents.replace(search, "Something Else");
The replaceAll does a regular expression replacement instead of a string literal replacement. This is generally slower, and is not strictly necessary for your use case.
Note that this answer assumes that the trailing newline characters can be left in the string (which you've said you are ok with in the comments).
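The difference between the two methods shows up as soon as the search text contains a regex metacharacter. The following is a sketch; the class name LiteralReplaceDemo and the helper swap are made up, and the sample contents string stands in for the text read from the source file.

```java
class LiteralReplaceDemo {
    // Replaces every occurrence of literal, treating it as plain text.
    static String swap(String contents, String literal, String replacement) {
        return contents.replace(literal, replacement);
    }

    public static void main(String[] args) {
        // File text as it would appear in a source file: the backslash and
        // the letter n are two ordinary characters here, not a newline.
        String contents = "String x = \"Hello World\\n\";";
        System.out.println(swap(contents, "Hello World", "Something Else"));

        // With a metacharacter as the search text, the two methods diverge.
        System.out.println("a.b".replace(".", "-"));    // only the dot is replaced
        System.out.println("a.b".replaceAll(".", "-")); // every character is replaced
    }
}
```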

String search = "Hello World\n\n\n";
String result = search.replaceAll("Hello World(\\n*)", "Guten Morgen$1");
$1 refers to the first captured group, marked by (...), counting from the opening parenthesis. \n* is \n for newline followed by *, but the backslash needs to be escaped in Java source, leading to two backslashes. * means 0 to n, so it would catch 0, 1, 2, ... newlines.

As the commenters noted, the \n should not mess this up. However, if you are OK with removing the newlines, you can try this:
contents = contents.replaceAll(search + "\\n*", "Something Else");
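If the search text may itself contain regex metacharacters, it can be quoted before the newline pattern is appended. The sketch below assumes the text holds real newline characters; the class name TrailingNewlineReplace and both method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingNewlineReplace {
    // Replaces literal and drops any newline characters directly after it.
    static String replaceDroppingNewlines(String contents, String literal, String replacement) {
        String regex = Pattern.quote(literal) + "\\n*";
        return contents.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    // Replaces literal but keeps the newlines after it, via the captured group.
    static String replaceKeepingNewlines(String contents, String literal, String replacement) {
        String regex = Pattern.quote(literal) + "(\\n*)";
        return contents.replaceAll(regex, Matcher.quoteReplacement(replacement) + "$1");
    }

    public static void main(String[] args) {
        String contents = "Hello World\n\n!";
        System.out.println(replaceDroppingNewlines(contents, "Hello World", "Guten Morgen"));
        System.out.println(replaceKeepingNewlines(contents, "Hello World", "Guten Morgen"));
    }
}
```

Matcher.quoteReplacement is used so that a replacement containing $ or a backslash is inserted as plain text rather than being read as a group reference.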

As per your question, the method you need is:
public String replaceAll(String regex, String replacement)
The two parameters are:
regex – the regular expression to match
replacement – the string to be substituted for every match
Some other related methods are:
replace(char oldChar, char newChar)
replace(CharSequence target, CharSequence replacement)
replaceFirst(String regex, String replacement)
An example:
import java.lang.String;
public class StringReplaceAllExample {
public static void main(String[] args) {
String str = "Introduction 1231 to 124 basic 1243 programming 34563 concepts 5455";
String Str1 = str.replaceAll("[0-9]+", "");
System.out.println(Str1);
Str1 = str.replaceAll("[a-zA-Z]+", "Java");
System.out.println(Str1);
}
}
