Eliminating Unicode Characters and Escape Characters from String - java

I want to remove all Unicode characters and escape characters such as \n and \t. In short, I want only the alphanumeric string.
For example:
\u2029My Actual String\u2029
\nMy Actual String\n
I want to fetch just 'My Actual String'. Is there any way to do so, either with a built-in string method or with a regular expression?

Try
String stg = "\u2029My Actual String\u2029 \nMy Actual String";
Pattern pat = Pattern.compile("(?!(\\\\(u|U)\\w{4}|\\s))(\\w)+");
Matcher mat = pat.matcher(stg);
String out = "";
while(mat.find()){
out+=mat.group()+" ";
}
System.out.println(out);
The regex matches each run of word characters, skipping Unicode escape sequences and whitespace, and the loop joins the matches with a single space.
Output:
My Actual String My Actual String

Try this:
anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.", "");
to remove escape sequences that appear as literal text (a backslash followed by u and four hex digits, or a backslash followed by any other character). If you also want to remove all other special characters, use this one:
anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.|[^a-zA-Z0-9\\s]", "");
(I assume you want to keep whitespace; if not, remove \\s from the pattern above.)
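The patterns above target escape sequences written out as literal text. If the string instead holds the actual characters (a real paragraph separator, newline or tab), a whitelist is one option. The following is only a minimal sketch, assuming that ASCII letters, digits and spaces are all that should survive; the class and method names are hypothetical.

```java
public class AlnumCleaner {
    // Hypothetical helper: keeps ASCII letters, digits and spaces, then trims the ends.
    static String keepAlphanumeric(String input) {
        return input.replaceAll("[^A-Za-z0-9 ]", "").trim();
    }

    public static void main(String[] args) {
        // The literals below contain a real U+2029, newline and tab, not backslash text.
        System.out.println(keepAlphanumeric("\u2029My Actual String\u2029"));
        System.out.println(keepAlphanumeric("\nMy Actual String\n\t"));
    }
}
```

Both calls should print My Actual String. If accented letters must be kept, the character class would need to be widened, for example with \p{L}.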

Related

Regex to remove only special characters and not other language letters

I used a regular expression to remove special characters from a name. The expression removes every character except English letters and whitespace.
public static void main(String args[]) {
String name = "Özcan Sevim.";
name = name.replaceAll("[^a-zA-Z\\s]", " ").trim();
System.out.println(name);
}
Output:
zcan Sevim
Expected Output:
Özcan Sevim
I get a bad result this way. The right way would be to remove special characters based on their ASCII codes so that other letters are not removed. Can someone help me with a regex that removes only special characters?
You can use \p{IsLatin} or \p{IsAlphabetic}:
name = name.replaceAll("[^\\p{IsLatin}]", " ").trim();
Or, to remove only the punctuation, use \p{Punct} like this:
name = name.replaceAll("\\p{Punct}", " ").trim();
Output:
Özcan Sevim
Take a look at the "Summary of regular-expression constructs" in the java.util.regex.Pattern documentation and use the construct that fits your case.
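As a further illustration of the Unicode property classes mentioned above, here is a small sketch that keeps letters from any script plus whitespace. It assumes that digits and punctuation should all be dropped; the class and method names are made up for the example.

```java
public class NameCleaner {
    // Hypothetical helper: drops everything that is neither a Unicode letter nor whitespace.
    static String keepLetters(String name) {
        return name.replaceAll("[^\\p{L}\\s]", "").trim();
    }

    public static void main(String[] args) {
        // The first letter is written as an escape (U+00D6) so the example
        // does not depend on the source file encoding.
        System.out.println(keepLetters("\u00D6zcan Sevim."));
    }
}
```

Unlike [^a-zA-Z\\s], the \p{L} class treats Ö as a letter, so only the trailing full stop is removed.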
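You can also use Guava's CharMatcher, which is easier to read and maintain. Note that this removes every non-ASCII character, so it would strip the Ö as well:
name = CharMatcher.ascii().negate().removeFrom(name); // CharMatcher.ASCII in older Guava versions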
Use [\W] or "[^a-zA-Z0-9]" as the regex to match any special character, and use String.replaceAll(regex, replacement) to replace each match with an empty string. Remember that the first argument of String.replaceAll is a regex, so a metacharacter must be escaped with a backslash to be treated as a literal character.
String string = "hjdg$h&jk8^i0ssh6";
// Remove every character that is not an ASCII letter or digit
string = string.replaceAll("[^a-zA-Z0-9]", "");
System.out.println(string); // hjdghjk8i0ssh6

How do I make multi replace() java

I have a string with \r\n, \r, \n or \" characters in it. How can I replace them faster?
What I already have is:
String s = "Kerner\\r\\n kyky\\r hihi\\n \\\"";
System.out.println(s.replace("\\r\\n", "\n").replace("\\r", "").replace("\\n", "").replace("\\", ""));
But my code does not look beautiful enough.
I found on the Internet something like:
replace("\\r\\n|\\r|\\n|\\", "")
I tried that, but it didn't work.
You can wrap it in a method: put \r\n, \n and \r in a list, iterate over the list, replace every occurrence, and return the modified string.
public String replaceMultipleSubstrings(String original, List<String> mylist){
String tmp = original;
for(String str: mylist){
tmp = tmp.replace(str, "");
}
return tmp;
}
Test:
List<String> mylist = new ArrayList<>();
String s = "Kerner\\r\\n kyky\\r hihi\\n \\\"";
mylist.add("\\r");
mylist.add("\\r\\n");
mylist.add("\\n");
mylist.add("\\"); // add backslash
System.out.println("original:" + s);
String x = new Main().replaceMultipleSubstrings(s, mylist);
System.out.println("modified:" + x);
Output:
original:Kerner\r\n kyky\r hihi\n \"
modified:Kerner kyky hihi "
I don't know whether your current replacement logic is correct, but it appears that the literal sequences \r\n, \r and \n should be replaced with an empty string, and that backslashes should be removed too. If so, you can try the following regex with replaceAll():
String s = "Kerner\\r\\n kyky\\r hihi\\n \\\"";
System.out.println(s.replaceAll("\\\\r\\\\n|\\\\r|\\\\n|\\\\", ""));
The problem with your attempt is that replace() treats its first argument as literal text, not as a regex, so a pattern containing | never matches. Note that replace() still replaces every occurrence of that literal text; use replaceAll() when you need a regular expression.
String.replaceAll() can be used. In your question you tried String.replace(), which does not interpret regular expressions, only plain target strings.
You also need to escape each backslash again for the regex, i.e. write \\\\ instead of \\:
String s = "Kerner\\r\\n kyky\\r hihi\\n \\\"";
System.out.println(s.replaceAll("\\\\r|\\\\n|\\\\\"", ""));
Output
Kerner kyky hihi
Note the differences between String.replaceAll() and String.replace()
String.replaceAll()
Replaces each substring of this string that matches the given regular
expression with the given replacement.
String.replace()
Replaces each substring of this string that matches the literal target
sequence with the specified literal replacement sequence.
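A short sketch of the difference quoted above, using a dot because it is a regex metacharacter. The sample string is an arbitrary choice for illustration.

```java
public class ReplaceVsReplaceAll {
    public static void main(String[] args) {
        String s = "a.b.c";
        // replace(): the target is literal text, and every occurrence is replaced
        System.out.println(s.replace(".", "-"));        // a-b-c
        // replaceAll(): the target is a regex, and an unescaped dot matches any character
        System.out.println(s.replaceAll(".", "-"));     // -----
        // replaceAll() with the dot escaped behaves like replace()
        System.out.println(s.replaceAll("\\.", "-"));   // a-b-c
        // replaceFirst() is the method that stops after one match
        System.out.println(s.replaceFirst("\\.", "-")); // a-b.c
    }
}
```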
Use a regular expression if you want to do all the replaces in one go.
http://www.javamex.com/tutorials/regular_expressions/search_replace.shtml
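When each target needs its own replacement (the question maps the literal \r\n to a newline but removes the others), one option is a single pass with Matcher.appendReplacement. This is only a sketch under that assumption: MultiReplace and replaceEach are hypothetical names, and the map must list longer targets first because the alternation tries them in insertion order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiReplace {
    // Replaces every literal key of the map with its value in one pass over the input.
    static String replaceEach(String input, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return input;
        }
        // Build an alternation of quoted literals, e.g. \Qfoo\E|\Qbar\E
        StringBuilder alternation = new StringBuilder();
        for (String target : replacements.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(target));
        }
        Matcher matcher = Pattern.compile(alternation.toString()).matcher(input);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = replacements.get(matcher.group());
            // quoteReplacement keeps $ and \ in the replacement from being interpreted
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("\\r\\n", "\n"); // longest target first
        replacements.put("\\r", "");
        replacements.put("\\n", "");
        replacements.put("\\", "");
        String s = "Kerner\\r\\n kyky\\r hihi\\n \\\"";
        System.out.println(replaceEach(s, replacements));
    }
}
```

Because the input is scanned once, text produced by one replacement is never rewritten by another, which is a difference from chaining several replace() calls.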

Parse signed number from string

I have string like:
"-------5548481818fgh7hf8ghf----fgh54f4578"
I don't want to parse it using Pattern and Matcher. I have this code:
string.replaceAll("regex", "");
What regex removes every symbol except a leading "-", so that I get a string like:
-554848181878544578
You can use this negative lookahead regex:
String s = "-------5548481818fgh7hf8ghf----fgh54f4578";
String r = s.replaceAll("(?!^[-+])\\D+", "");
//=> -554848181878544578
(?!^[-+])\D+ replaces each run of non-digits, except for a hyphen or plus sign at the very start of the string.
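This will work:
String str = "-------5548481818fgh7hf8ghf----fgh54f4578-";
// Collapse each run of signs to one sign and drop other non-digits, then drop any sign that follows a digit
String tmp = str.replaceAll("([-+])+|([^\\d])", "$1").replaceAll("(?<=\\d)[+-]", "");
System.out.println(tmp); // -554848181878544578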
Alternative: extract the characters you want instead of removing the ones you don't. Choosing to remove unwanted characters rather than to grab the wanted ones seems arbitrary. Example in JavaScript:
s = "-------5548481818fgh7hf8ghf----fgh54f4578"
s = '-' + s.match(/[0-9]+/g).join('')
// "-554848181878544578"
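The same idea can be sketched in Java without using Pattern and Matcher directly. This assumes the sign is only meaningful when it is the first character of the input; SignedDigits and extract are hypothetical names.

```java
public class SignedDigits {
    // Keeps all digits, plus one leading minus sign if the input starts with one.
    static String extract(String s) {
        String digits = s.replaceAll("\\D", "");
        return s.startsWith("-") ? "-" + digits : digits;
    }

    public static void main(String[] args) {
        String r = extract("-------5548481818fgh7hf8ghf----fgh54f4578");
        System.out.println(r);                 // -554848181878544578
        System.out.println(Long.parseLong(r)); // this value still fits in a long
    }
}
```

For inputs with more digits than a long can hold, java.math.BigInteger would be the safer parse target.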

How to split the string using '^' this special character in java?

I want to split the string "Good^Evening". I used split(), but it does not split the value. Please help me.
This is what I've been trying:
String Val = "Good^Evening";
String[] valArray = Val.Split("^");
I'm assuming you did something like:
String[] parts = str.split("^");
That doesn't work because the argument to split is actually a regular expression, where ^ has a special meaning. Try this instead:
String[] parts = str.split("\\^");
The \\ is really equivalent to a single \ (the first \ is required as a Java escape sequence in string literals). It is then a special character in regular expressions which means "use the next character literally, don't interpret its special meaning".
The regex you should use is "\^" which you write as "\\^" as a Java String literal; i.e.
String[] parts = "Good^Evening".split("\\^");
The regex needs a '\' escape because the caret character ('^') is a meta-character in the regex language. The 2nd '\' escape is needed because '\' is an escape in a String literal.
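Alternatively, put the caret in a character class. It must be escaped there, because a leading ^ negates the class and "[^]+" fails with a PatternSyntaxException:
String str = "Good^Evening";
String[] parts = str.split("[\\^]");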
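If the delimiter is not known in advance, Pattern.quote can do the escaping. A minimal sketch follows; the helper name splitLiteral is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnCaret {
    // Splits on the delimiter taken as literal text, whatever regex metacharacters it contains.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("Good^Evening", "^"))); // [Good, Evening]
    }
}
```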

Java Regexp to Match ASCII Characters

What regex would match any ASCII character in java?
I've already tried:
^[\\p{ASCII}]*$
but found that it didn't match lots of things that I wanted (like spaces, parentheses, etc.). I'm hoping to avoid explicitly listing all 128 ASCII characters in a format like:
^[a-zA-Z0-9!@#$%^*(),.<>~`[]{}\\/+=-\\s]*$
The first try was almost correct
"^\\p{ASCII}*$"
I have never used \\p{ASCII} but I have used ^[\\u0000-\\u007F]*$
If you only want the printable ASCII characters you can use ^[ -~]*$ - i.e. all characters between space and tilde.
https://en.wikipedia.org/wiki/ASCII#ASCII_printable_code_chart
For JavaScript it'll be /^[\x00-\x7F]*$/.test('blah')
I think the question is about extracting the ASCII characters from a raw string that contains both ASCII and special characters:
public String getOnlyASCII(String raw) {
Pattern asciiPattern = Pattern.compile("\\p{ASCII}*$");
Matcher matcher = asciiPattern.matcher(raw);
String asciiString = null;
if (matcher.find()) {
asciiString = matcher.group();
}
return asciiString;
}
The method above returns the trailing run of ASCII characters, dropping the non-ASCII characters in front of it. Thanks to @Oleg Pavliv for the pattern.
For example:
raw = ��+919986774157
asciiString = +919986774157
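The patterns from the answers above can be collected into a small sketch. The class and method names are hypothetical. String.matches() already anchors the pattern to the whole input, so ^ and $ are left out.

```java
public class AsciiCheck {
    // True if every character is in the range U+0000 to U+007F.
    static boolean isAscii(String s) {
        return s.matches("\\p{ASCII}*");
    }

    // True if every character is between space and tilde (printable ASCII).
    static boolean isPrintableAscii(String s) {
        return s.matches("[ -~]*");
    }

    // Removes non-ASCII characters wherever they occur, not only at the start.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        System.out.println(isAscii("hello (world) 123"));                   // true
        System.out.println(isPrintableAscii("tab\there"));                  // false
        System.out.println(stripNonAscii("\uFFFD\uFFFD+919986774157"));     // +919986774157
    }
}
```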
