I have the code below, which splits a string on <div>\\$\\$PZ\\$\\$</div>, but it does not work because of the special characters.
public class HelloWorld{
public static void main(String []args){
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split("<div>\\$\\$PZ\\$\\$</div>", 2);
for (String a : arrOfStr)
System.out.println(a);
}
}
The output is test<div>\$\$PZ\$\$</div>test
It works when I remove the special characters.
Can you please help?
As you already know, the parameter to split(...) is a regular expression, so some characters have special meaning. If you want the parameter to be treated literally, i.e. not as a regex, call the Pattern.quote(String s) method.
Example
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split(Pattern.quote("<div>\\$\\$PZ\\$\\$</div>"), 2);
for (String a : arrOfStr)
System.out.println(a);
Output
test
test
The quote() method simply surrounds the literal text with the regex \Q...\E quotation pattern (1), e.g. your <div>\$\$PZ\$\$</div> text becomes:
\Q<div>\$\$PZ\$\$</div>\E
For fixed text you could just do that yourself, i.e. the following 3 versions all create the same regex to split on:
str.split(Pattern.quote("<div>\\$\\$PZ\\$\\$</div>"), 2)
str.split("\\Q<div>\\$\\$PZ\\$\\$</div>\\E", 2)
str.split("<div>\\\\\\$\\\\\\$PZ\\\\\\$\\\\\\$</div>", 2)
To me, the 3rd one, using \ to escape, is the least readable/desirable version.
If there are many special characters to escape, using \Q...\E is easier than \-escaping each one separately, but very few people use it, so it is not widely known.
The quote() method is especially useful when you need to treat dynamic text literally, e.g. when the text to split on is configurable by the user.
1) quote() will correctly handle literal text containing \E.
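As a small illustration of footnote 1, the sketch below splits on a delimiter that itself contains \E. The class name, the splitLiteral helper, and the sample strings are made up for this example; only Pattern.quote and String.split come from the answer above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteBackslashEDemo {
    // Hypothetical helper: split on the delimiter taken literally, never as a regex.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // The delimiter is the three characters: backslash, E, dollar sign.
        String delimiter = "\\E$";
        String text = "left\\E$right";
        System.out.println(Arrays.toString(splitLiteral(text, delimiter))); // [left, right]
    }
}
```

Because the delimiter goes through Pattern.quote, the caller does not need to know whether it contains \E, $, or any other regex metacharacter.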
This:
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split("<div>\\\\\\$\\\\\\$PZ\\\\\\$\\\\\\$</div>", 2);
for (String a : arrOfStr) {
System.out.println(a);
}
prints:
test
test
EDIT: Why do we need all those backslashes? It's because of how we need to handle String literals representing regex expressions. This page describes the reason with examples. The essence is this:
For a backslash \...
...the pattern to match that would be \\... (to escape the escape)
... but the string literal to create that pattern would have to have one backslash to escape each of the two backslashes: \\\\.
Add to that the original need to also escape the $, that gives us our 6 backslashes in the string representation.
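To make the counting above concrete, here is a minimal sketch (the class and method names are invented for illustration) that prints the regex as the engine sees it and checks that it matches a backslash followed by a dollar sign.

```java
public class BackslashLayersDemo {
    // Hypothetical helper: true if the text is exactly a backslash followed by a dollar sign.
    static boolean isBackslashDollar(String text) {
        // Six backslashes in the Java literal become three in the regex:
        // an escaped backslash, then an escaped dollar sign.
        return text.matches("\\\\\\$");
    }

    public static void main(String[] args) {
        String regex = "\\\\\\$";
        System.out.println(regex);                     // the regex text handed to the engine
        System.out.println(regex.length());            // 4 characters, written with 6 backslashes
        System.out.println(isBackslashDollar("\\$"));  // true
    }
}
```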
Related
I used a regex to remove special characters from a name. The expression removes every character except English letters and whitespace.
public static void main(String args[]) {
String name = "Özcan Sevim.";
name = name.replaceAll("[^a-zA-Z\\s]", " ").trim();
System.out.println(name);
}
Output:
zcan Sevim
Expected Output:
Özcan Sevim
I get a bad result doing it this way. The right way would be to remove special characters based on their ASCII codes, so that other letters are not removed. Can someone help me with a regex that removes only special characters?
You can use \p{IsLatin} or \p{IsAlphabetic}
name = name.replaceAll("[^\\p{IsLatin}]", " ").trim();
Or, to remove only the punctuation, use \p{Punct} like this:
name = name.replaceAll("\\p{Punct}", " ").trim();
Outputs
Özcan Sevim
Take a look at the full list in the Summary of regular-expression constructs and use whichever construct helps you.
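A minimal sketch of the same idea, assuming you want to keep letters of any script plus whitespace. It swaps in the general Unicode letter category \p{L} instead of \p{IsLatin}; the class and method names are hypothetical.

```java
public class KeepLettersDemo {
    // Hypothetical helper: replace everything that is neither a Unicode letter
    // nor whitespace with a space, then trim the ends.
    static String keepLettersAndSpaces(String name) {
        return name.replaceAll("[^\\p{L}\\s]", " ").trim();
    }

    public static void main(String[] args) {
        // The first letter is written as a Unicode escape so the source file encoding does not matter.
        String name = "\u00D6zcan Sevim.";
        System.out.println(keepLettersAndSpaces(name));
    }
}
```

Unlike [^a-zA-Z\\s], this class does not treat accented letters as special characters, so only the trailing period is replaced.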
Use Guava's CharMatcher for that; it is easier to read and maintain. Note that this removes every non-ASCII character, including letters such as Ö.
name = CharMatcher.ASCII.negate().removeFrom(name);
Use \W or [^a-zA-Z0-9] as the regex to match any special character, and use String.replaceAll(regex, String) to replace each special character with an empty string. Remember that the first argument of String.replaceAll is a regex, so a special character has to be escaped with a backslash to be treated as a literal character.
String string= "hjdg$h&jk8^i0ssh6";
Pattern pt = Pattern.compile("[^a-zA-Z0-9]");
Matcher match= pt.matcher(string);
while(match.find())
{
String s= match.group();
string=string.replaceAll("\\"+s, "");
}
System.out.println(string);
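Since the character class already matches every unwanted character, the find-and-replace loop can also be written as a single replaceAll call. The sketch below uses the same input string; the class and method names are made up.

```java
public class StripSpecialDemo {
    // Hypothetical helper: drop every character that is not an ASCII letter or digit.
    static String stripNonAlphanumeric(String input) {
        return input.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNonAlphanumeric("hjdg$h&jk8^i0ssh6")); // hjdghjk8i0ssh6
    }
}
```

No per-character escaping is needed here, because the matched characters never become part of a new regex.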
I have a string "EAD\rgonzalez" which is passed to me.
I need to pull out "rgonzalez" from it.
I am running into problems with the "\" character.
I cannot find the index of it, I cannot replace it, etc.
Any help on pulling the data after the "\" would be appreciated.
The string that I receive is in the format of domain\username; the data can vary.
Another example would be US\ngross where \n would be interpreted as a newline character.
To clarify, I am not adding a '\', I am trying to split a string on a '\'.
This string contains '\r' which in itself is a character, a special one.
I need a way to make \r contained within my string two separate characters, a '\' and an 'r'.
You haven't provided any code, but I'm assuming what you're doing is something like this:
String user = request.getParameter("user"); // user = "EAD\rgonzalez"
If you were to declare a static string in your application, you would have to escape the backslash because it is a special character for Java strings:
String user = "EAD\\rgonzalez";
To split that string on the backslash you must escape it twice in the regex that you pass to the split method: once because backslash is a special character for Java strings, and again because backslash is a special character for regex strings. So instead of one backslash you have four: the one is escaped so you have two, and then both of them are escaped again.
String[] parts = user.split("\\\\");
Now you have split the string:
System.out.println(parts[0]); // "EAD"
System.out.println(parts[1]); // "rgonzalez"
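If only the part after the backslash is needed, a regex-free variant is possible too. This is a sketch under the assumption that the value really contains a literal backslash; the class and method names are hypothetical.

```java
public class DomainUserDemo {
    // Hypothetical helper: return the text after the first backslash,
    // or the whole input if there is no backslash at all.
    static String usernameOf(String account) {
        int slash = account.indexOf('\\'); // char literal: a single backslash
        return slash < 0 ? account : account.substring(slash + 1);
    }

    public static void main(String[] args) {
        System.out.println(usernameOf("EAD\\rgonzalez")); // rgonzalez
        System.out.println(usernameOf("US\\ngross"));     // ngross
    }
}
```

indexOf takes a plain character, so only the Java-level escape is required and the four-backslash regex is avoided.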
The string that I receive is in the format of domain\username... the data can vary
The data shouldn't vary if that is the input your program expects.
where \n would be interpreted as a newline character
I'm not sure how you'd get newlines from a single-line input form. If you do, then your input is invalid, because it does not follow the format you've specified and are expecting. If you did interpret newlines and other whitespace characters, you would end up treating the whole thing as either the domain or the username, potentially breaking your program logic. You have stated the requirement as domain\username, and I don't think that requires you to handle any other form of input.
I am collecting this string from the header data from the request object in a webapp.
In that case, the raw value should not contain an escape character; written as a Java string literal it is "domain\\username". When you print the value, the escape characters aren't shown.
I cannot find the index of it,
With the correct representation, indexOf("\\") will work...
pulling the data after the "\"
Since the value is domain\\username when written as a Java literal, you need to escape the backslash a second time in the argument to split(String regex), because that argument is a regular expression.
For example,
public static void main (String[] args) throws java.lang.Exception
{
String in = "EAD\\rgonzalez";
System.out.println(in.indexOf("\\")); // find the index of '\'
String[] parts = in.split("\\\\"); // split on '\\'
System.out.println(Arrays.toString(parts));
}
Again, the string "EAD\rgonzalez" is not in the form of domain\username, as demonstrated here
System.out.print("EAD\rgonzalez".matches("[A-Z]+\\\\[a-z]+")); // false
The magic you need is in org.apache.commons.lang.StringEscapeUtils
Here is a demo:
package ignoreescapeseq2;
import org.apache.commons.lang.StringEscapeUtils;
/*
* @author Charles Knell
*/
public class IgnoreEscapeSeq2 {
public static void main(String[] args) {
String string = "EAD\rgonzalez"; // REQUIRED INPUT STRING
String eString = StringEscapeUtils.escapeJava(string);
String [] sArray = eString.split("\\\\");
System.out.println("domain: " + sArray[0]);
System.out.println("username: " + sArray[1]);
}
}
Here is the output:
domain: EAD
username: rgonzalez
Although this MAY answer the question, there still seems to be a problem if you must define the string in Java. As you said, "EAD\xgonzalez" isn't a valid Java string because \x isn't a valid escape sequence. The solution above only works if the input string never has to be explicitly defined, as in the demo.
I have a string which I want to first split by space, and then separate the words from the special characters.
For Example, let's say the input is:
Hi, How are you???
I already wrote the logic to split by space here:
String input = "Hi, How are you???";
String[] words = input.split("\\s+");
Now, I want to separate each word from the special character.
For example: "Hi," to {"Hi", ","} and "you???" to {"you", "???"}
If the string does not end with any special characters, just ignore it.
Can you please help me with the regular expression and code for this in Java?
Following regex should help you out:
(\s+|[^A-Za-z0-9]+)
This is a plain regex, not a Java string literal, so you need to double each backslash in Java code.
It matches on whitespace (\s+) and on runs of characters other than A-Za-z0-9. This is a workaround, since there isn't (or at least I do not know of) a regex class for special characters.
You can test this regex here.
If you use this regex with the split function, it will return the words, not the special characters and whitespace it matched on.
UPDATE
According to this answer here on SO, Java has \P{Alpha}+, which matches one or more non-alphabetic characters. So you could try:
(\s|\P{Alpha})+
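Splitting drops the special characters, whereas the question asks to keep them as separate tokens. One way to get both, sketched here with Matcher.find rather than split, is to match either a run of alphanumerics or a run of non-whitespace special characters. The class name, the TOKEN pattern, and the tokenize method are illustrative assumptions, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordPunctuationDemo {
    // Either a run of letters/digits, or a run of characters that are
    // neither letters/digits nor whitespace.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9]+|[^A-Za-z0-9\\s]+");

    // Hypothetical helper: collect every token in order of appearance.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Tokens: Hi / , / How / are / you / ???
        System.out.println(tokenize("Hi, How are you???"));
    }
}
```

Whitespace matches neither alternative, so it is skipped without a separate split step.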
I want to separate each word from the special character.
For example: "Hi," to {"Hi", ","} and "you???" to {"you", "???"}
Here is a regex that achieves the above behavior:
String stringToSearch ="Hi, you???";
Pattern p1 = Pattern.compile("[a-z]{0}\\b");
String[] str = p1.split(stringToSearch);
System.out.println(Arrays.asList(str));
output:
[Hi, , , you, ???]
@mike is right: we need to split the sentence on special characters, keeping only the words. Here is the code:
public static void main(String[] args) {
    String match = "Hi, How are you???";
    String[] words = match.split("\\P{Alpha}+");
    for (String word : words) {
        System.out.print(word + " ");
    }
}
I want to split a string on the following characters:
~!@$%^&*()_+=<>,.?/:;"'{}|[]\, \n, \t, space
I tried to use the \\s regex as the delimiter, but I don't want # to be included as a split character, so that a string like this is #funny results in the values this, is and #funny.
I have tried the following, but it doesn't work:
"this is #funny".split("\\s")
Any ideas?
Just specify the characters you want in square bracket, which means any of. Single escape Java characters (like \") and double escape Regex special characters (like \\[):
@Test
public void testName() throws Exception
{
    String[] split = "this is #funny".split("[~!@$%^&*()_+=<>,.?/:;\"'{}|\\[\\]\\\\ \\n\\t]");
    for (String string : split)
    {
        logger.debug(string);
    }
}
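A self-contained version of the same approach, as a sketch that assumes # must never act as a delimiter (as the question asks). The class name and the DELIMITERS constant are made up, and a trailing + is added so that consecutive delimiters do not produce empty strings.

```java
import java.util.Arrays;

public class DelimiterClassDemo {
    // Every special character listed in the question except '#'.
    // The double quote is escaped for Java; [ ] and \ are escaped for the regex as well.
    static final String DELIMITERS = "[~!@$%^&*()_+=<>,.?/:;\"'{}|\\[\\]\\\\ \\n\\t]+";

    // Hypothetical helper: split on one or more of the listed delimiters.
    static String[] splitKeepingHash(String text) {
        return text.split(DELIMITERS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingHash("this is #funny"))); // [this, is, #funny]
    }
}
```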
Use the replaceAll(String regex, String replacement) method of String.
String result = "this is #funny".replaceAll("[~!@$%^&*()_+=<>,.?/:;\"'{}|\\[\\]\\,\\n\\t]", "");
System.out.println(result);
You can try to implement this:
String[] split = "this&is%a#funny^string".split("[^#\\p{Alnum}]|\\s+");
for (String string : split){
System.out.println(string);
}
Also check the Java API (Patterns) for more information on how to process strings.
It looks like this will work for you:
String[] split = str.split("[^a-zA-Z&&[^#]]+");
This uses a character class subtraction to split on non-letter chars, except the hash.
Here's some test code:
String str = "this is #funny";
String[] split = str.split("[^a-zA-Z&&[^#]]+");
System.out.println(Arrays.toString(split));
Output:
[this, is, #funny]
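The same intent can also be written without an intersection, by listing # inside the negated class. This is an alternative sketch, not the answerer's pattern; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class NegatedClassDemo {
    // Hypothetical helper: split on runs of characters that are neither ASCII letters nor '#'.
    static String[] splitOnNonLettersExceptHash(String text) {
        return text.split("[^a-zA-Z#]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnNonLettersExceptHash("this is #funny"))); // [this, is, #funny]
    }
}
```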
I'm having a problem with the split function returning a blank array. When any index is called from the array, it throws an index out of bounds exception. Here is the code:
class splitNumber {
public static void main(String[] args) {
String number = "3.84";
String[] sep = number.split(".");
System.out.println(sep[0]);
}
}
Is there any fix or workaround for this? I'm using Java SE 7.
As noted elsewhere, String#split takes a regex. One alternate way to construct that regex is to use Pattern#quote:
String number = "3.84";
String[] sep = number.split(Pattern.quote("."));
System.out.println(Arrays.toString(sep));
This saves you from typing a bunch of tedious escape chars.
"." is a special character in regex, and String's split method accepts a regex, so you need to escape it like this:
String[] sep = number.split("\\.");
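To see why the original code throws, it helps to print the array length. In the sketch below (class and method names invented for illustration) the unescaped dot treats every character as a delimiter, so only empty strings remain, and split discards trailing empty strings.

```java
import java.util.Arrays;

public class DotSplitDemo {
    // Hypothetical helper: split on a literal period.
    static String[] splitOnDot(String number) {
        return number.split("\\.");
    }

    public static void main(String[] args) {
        // Unescaped: "." matches any character, leaving an empty array.
        System.out.println("3.84".split(".").length);            // 0
        // Escaped: only the literal period is a delimiter.
        System.out.println(Arrays.toString(splitOnDot("3.84"))); // [3, 84]
    }
}
```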
Use "\\."
"Note that this takes a regular expression, so remember to escape special characters if necessary, e.g. if you want to split on period . which means "any character" in regex, use either split("\\.") or split(Pattern.quote("."))."
How to split a string in Java