Replacing a character in a string with another character from another string - java

I am ultimately trying to replace a sentence with another set of characters, but I hit a roadblock while trying to replace a char in a String with a character from another String.
Here's what I have so far.
String letters = "abcdefghijklmnopqrstuvwxyz";
String encode = "kngcadsxbvfhjtiumylzqropwe";
// the sentence that I want to encode
String sentence = "hello, nice to meet you!";
//swapping each char of 'sentence' with the chars in 'encode'
for (int i = 0; i < sentence.length(); i++) {
int indexForEncode = letters.indexOf(sentence.charAt(i));
sentence.replace(sentence.charAt(i), encode.charAt(indexForEncode));
}
System.out.println(sentence);
This way of replacing characters doesn't work. Can someone help me?

The reason
sentence.replace(sentence.charAt(i), encode.charAt(indexForEncode));
doesn't work is that Strings are immutable (i.e., they never change).
So, sentence.replace(...) doesn't actually change sentence; rather, it returns a new String. You would need to write sentence = sentence.replace(...) to capture that result back in sentence.
OK, Strings 101: class dismissed (;->).
Now, with all that said, you really don't want to keep reassigning your partially encoded sentence back to itself, because you will almost certainly end up re-encoding characters of sentence that you already encoded. It's best to leave sentence in its original form and build up the encoded string one character at a time, like this:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < sentence.length(); i++){
int indexForEncode = letters.indexOf(sentence.charAt(i));
sb.append(indexForEncode != -1
? encode.charAt(indexForEncode)
: sentence.charAt(i)
);
}
sentence = sb.toString();
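To see the re-encoding problem concretely, here is a minimal, self-contained comparison. The class and method names (CipherDemo, encodeByReassigning, encodeWithBuilder) are hypothetical: encodeByReassigning is the "sentence = sentence.replace(...)" approach with an added guard for characters that are not in letters, and encodeWithBuilder is the StringBuilder approach above.

```java
public class CipherDemo {
    static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";
    static final String ENCODE = "kngcadsxbvfhjtiumylzqropwe";

    // Hypothetical helper: reassigns the result of replace() back to sentence.
    // Each pass replaces *every* occurrence of the current char, so a letter
    // produced by an earlier pass can be encoded again by a later pass.
    static String encodeByReassigning(String sentence) {
        for (int i = 0; i < sentence.length(); i++) {
            int idx = LETTERS.indexOf(sentence.charAt(i));
            if (idx >= 0) { // skip ',', ' ', '!' etc. (charAt(-1) would throw)
                sentence = sentence.replace(sentence.charAt(i), ENCODE.charAt(idx));
            }
        }
        return sentence;
    }

    // Hypothetical helper: reads the untouched input and writes to a separate builder.
    static String encodeWithBuilder(String sentence) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sentence.length(); i++) {
            int idx = LETTERS.indexOf(sentence.charAt(i));
            sb.append(idx != -1 ? ENCODE.charAt(idx) : sentence.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeByReassigning("hello")); // xaxxi (wrong)
        System.out.println(encodeWithBuilder("hello"));   // xahhi
    }
}
```

On "hello", the l-to-h pass produces "xahho"; the next pass then reads one of those new 'h' characters and turns both of them into 'x'.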

I would use a character array as follows. Make the changes to a character array and then use String.valueOf to get the new version of the string.
String letters = "abcdefghijklmnopqrstuvwxyz";
String encode = "kngcadsxbvfhjtiumylzqropwe";
// the sentence that I want to encode
String sentence = "hello, nice to meet you!";
char[] chars = sentence.toCharArray();
for (int i = 0; i < chars.length; i++){
int indexForEncode = letters.indexOf(sentence.charAt(i));
// if index is < 0, use original character, otherwise, encode.
chars[i] = indexForEncode < 0 ? chars[i] : encode.charAt(indexForEncode);
}
System.out.println(String.valueOf(chars));
Prints
xahhi, tbga zi jaaz wiq!

You can use the codePoints method to iterate over the characters of this string and replace each one with the corresponding character from another string, if there is one.
public static void main(String[] args) {
String letters = "abcdefghijklmnopqrstuvwxyz";
String encode = "kngcadsxbvfhjtiumylzqropwe";
String sentence = "hello, nice to meet you!";
String encoded = replaceCharacters(sentence, letters, encode);
String decoded = replaceCharacters(encoded, encode, letters);
System.out.println(encoded); // xahhi, tbga zi jaaz wiq!
System.out.println(decoded); // hello, nice to meet you!
}
public static String replaceCharacters(String text, String from, String to) {
// wrong cipher, return unencrypted string
if (from.length() != to.length()) return text;
// IntStream over the codepoints of this text string
return text.codePoints()
// Stream<Character>
.mapToObj(ch -> (char) ch)
// encrypt characters
.map(ch -> {
// index of this character
int i = from.indexOf(ch);
// if not present, then leave it as it is,
// otherwise replace this character
return i < 0 ? ch : to.charAt(i);
}) // Stream<String>
.map(String::valueOf)
// concatenate into a single string
.collect(Collectors.joining());
}
See also: Implementation of the Caesar cipher

Related

Replace special characters (non ASCII) in String by corresponding unicode [duplicate]

I have strings "A função", "Ãugent" in which I need to replace characters like ç, ã, and à with empty strings.
How can I remove those non-ASCII characters from my string?
I have attempted to implement this using the following function, but it is not working properly. One problem is that the unwanted characters are getting replaced by the space character.
public static String matchAndReplaceNonEnglishChar(String tmpsrcdta) {
String newsrcdta = null;
char array[] = Arrays.stringToCharArray(tmpsrcdta);
if (array == null)
return newsrcdta;
for (int i = 0; i < array.length; i++) {
int nVal = (int) array[i];
boolean bISO =
// Is character ISO control
Character.isISOControl(array[i]);
boolean bIgnorable =
// Is Ignorable identifier
Character.isIdentifierIgnorable(array[i]);
// Remove tab and other unwanted characters..
if (nVal == 9 || bISO || bIgnorable)
array[i] = ' ';
else if (nVal > 255)
array[i] = ' ';
}
newsrcdta = Arrays.charArrayToString(array);
return newsrcdta;
}
This will find and remove all non-ASCII characters:
String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");
FailedDev's answer is good, but can be improved. If you want to preserve the ASCII equivalents, you need to normalize first:
String subjectString = "öäü";
subjectString = Normalizer.normalize(subjectString, Normalizer.Form.NFD);
String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");
=> will produce "oau"
That way, characters like "öäü" will be mapped to "oau", which at least preserves some information. Without normalization, the resulting String will be blank.
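Applied to the strings from the question, a self-contained version of this approach might look like the sketch below (the class and method names are hypothetical; the non-ASCII test strings are written as \u escapes so the source compiles regardless of file encoding).

```java
import java.text.Normalizer;

public class NormalizeThenStrip {
    // Decompose accented letters (e.g. c-cedilla -> 'c' + U+0327 combining cedilla),
    // then drop everything outside U+0000..U+007F, which removes the combining marks.
    static String toAsciiKeepingBaseLetters(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // "A funcao, Augent" with the original accents
        System.out.println(toAsciiKeepingBaseLetters("A fun\u00e7\u00e3o, \u00c3ugent")); // A funcao, Augent
        System.out.println(toAsciiKeepingBaseLetters("\u00f6\u00e4\u00fc"));            // oau
    }
}
```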
This would be the Unicode solution:
String s = "A função, Ãugent";
String r = s.replaceAll("\\P{InBasic_Latin}", "");
\p{InBasic_Latin} is the Unicode block that contains all characters in the range U+0000..U+007F (see regular-expression.info).
\P{InBasic_Latin} is the negation of \p{InBasic_Latin}.
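Since the Basic Latin block covers exactly the ASCII range, and the java.util.regex.Pattern documentation defines \p{ASCII} as [\x00-\x7F], the block-based pattern should give the same result as the other regexes in this thread. A small sketch to check that on the question's input (class and method names are hypothetical):

```java
public class NonAsciiRegexes {
    // Three spellings of "any character outside U+0000..U+007F"
    static String byBlock(String s) { return s.replaceAll("\\P{InBasic_Latin}", ""); }
    static String byPosixClass(String s) { return s.replaceAll("\\P{ASCII}", ""); }
    static String byRange(String s) { return s.replaceAll("[^\\x00-\\x7F]", ""); }

    public static void main(String[] args) {
        String s = "A fun\u00e7\u00e3o, \u00c3ugent"; // the question's "A funcao, Augent" with accents
        System.out.println(byBlock(s));      // A funo, ugent
        System.out.println(byPosixClass(s)); // A funo, ugent
        System.out.println(byRange(s));      // A funo, ugent
    }
}
```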
You can try something like this. In Latin-1, the accented letters start at code 192 (U+00C0, 'À'), so keeping only characters below 192 leaves them out of the result:
String name = "A função";
StringBuilder result = new StringBuilder();
for(char val : name.toCharArray()) {
if(val < 192) result.append(val);
}
System.out.println("Result "+result.toString());
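Note that this filter is not the same as "remove non-ASCII": non-ASCII symbols in the 128..191 range, such as the copyright sign U+00A9 (169), are below 192 and survive. A minimal demonstration, wrapping the snippet in a hypothetical keepBelow192 method:

```java
public class Below192Filter {
    // The answer's approach: keep only chars below 192 (U+00C0).
    static String keepBelow192(String s) {
        StringBuilder result = new StringBuilder();
        for (char val : s.toCharArray()) {
            if (val < 192) result.append(val);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Accented letters (c-cedilla = 231, a-tilde = 227) are removed
        System.out.println(keepBelow192("A fun\u00e7\u00e3o"));            // A funo
        // U+00A9 is non-ASCII but below 192, so it is kept
        System.out.println(keepBelow192("\u00a9 2024").equals("\u00a9 2024")); // true
    }
}
```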
[Updated solution]
Normalizer with canonical decomposition (Form.NFD) can be combined with replaceAll to strip the diacritical marks, replacing accented letters with their base letters:
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;
public final class NormalizeUtils {
public static String normalizeASCII(final String string) {
final String normalize = Normalizer.normalize(string, Form.NFD);
return Pattern.compile("\\p{InCombiningDiacriticalMarks}+")
.matcher(normalize)
.replaceAll("");
} ...
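Because the snippet above is cut off, here is a self-contained sketch of the same technique (the class name DiacriticStripper is hypothetical, and the Pattern is precompiled rather than compiled on every call). It also shows a limitation: only characters that decompose into a base letter plus combining marks are affected, so a letter such as the German sharp s (U+00DF), which has no canonical decomposition, stays non-ASCII.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

public final class DiacriticStripper {
    // One or more characters from the Combining Diacritical Marks block (U+0300..U+036F)
    private static final Pattern DIACRITICS = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Form.NFD);
        return DIACRITICS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripDiacritics("A fun\u00e7\u00e3o")); // A funcao
        // Sharp s has no decomposition, so it is NOT removed:
        System.out.println(stripDiacritics("Stra\u00dfe").equals("Stra\u00dfe")); // true
    }
}
```

If the result must be strictly ASCII, a final pass that removes \P{ASCII} is still needed after stripping the marks.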
Or you can use the function below to remove non-ASCII characters from the string.
It also shows how the filtering works internally.
private static String removeNonASCIIChar(String str) {
    // StringBuilder is the unsynchronized (and usually preferred) replacement for StringBuffer
    StringBuilder buff = new StringBuilder();
    for (char c : str.toCharArray()) {
        // Keeps codes 1..126 only; note this also drops NUL (0) and DEL (127),
        // which are technically ASCII
        if (0 < c && c < 127) {
            buff.append(c);
        }
    }
    return buff.toString();
}
The ASCII table contains 128 codes, with a total of 95 printable characters, of which only 52 characters are letters:
[0-127] ASCII codes
[32-126] printable characters
[48-57] digits [0-9]
[65-90] uppercase letters [A-Z]
[97-122] lowercase letters [a-z]
You can use the String.codePoints method to get a stream over the int values of the characters of this string and filter out non-ASCII characters (Character.toString(int) requires Java 11 or later):
String str1 = "A função, Ãugent";
String str2 = str1.codePoints()
.filter(ch -> ch < 128)
.mapToObj(Character::toString)
.collect(Collectors.joining());
System.out.println(str2); // A funo, ugent
Or you can explicitly specify character ranges. For example, filter out everything except letters:
String str3 = str1.codePoints()
.filter(ch -> ch >= 'A' && ch <= 'Z'
|| ch >= 'a' && ch <= 'z')
.mapToObj(Character::toString)
.collect(Collectors.joining());
System.out.println(str3); // Afunougent
See also: How do I not take Special Characters in my Password Validation (without Regex)?
String s = "A função";
String stripped = s.replaceAll("\\P{ASCII}", "");
System.out.println(stripped); // Prints "A funo"
or
private static final Pattern NON_ASCII_PATTERN = Pattern.compile("\\P{ASCII}");
public static String matchAndReplaceNonEnglishChar(String tmpsrcdta) {
return NON_ASCII_PATTERN.matcher(tmpsrcdta).replaceAll("");
}
public static void main(String[] args) {
System.out.println(matchAndReplaceNonEnglishChar("A função")); // Prints "A funo"
}
Explanation
The method String.replaceAll(String regex, String replacement) replaces all instances of a given regular expression (regex) with a given replacement string. Its Javadoc reads: "Replaces each substring of this string that matches the given regular expression with the given replacement."
Java has the "\p{ASCII}" regular expression construct which matches any ASCII character, and its inverse, "\P{ASCII}", which matches any non-ASCII character. The matched characters can then be replaced with the empty string, effectively removing them from the resulting string.
String s = "A função";
String stripped = s.replaceAll("\\P{ASCII}", "");
System.out.println(stripped); // Prints "A funo"
The full list of valid regex constructs is documented in the Pattern class.
Note: If you are going to be calling this pattern multiple times within a run, it will be more efficient to use a compiled Pattern directly, rather than String.replaceAll. This way the pattern is compiled only once and reused, rather than each time replaceAll is called:
import java.util.regex.Pattern;
public class AsciiStripper {
private static final Pattern NON_ASCII_PATTERN = Pattern.compile("\\P{ASCII}");
public static String stripNonAscii(String s) {
return NON_ASCII_PATTERN.matcher(s).replaceAll("");
}
}
An easily readable stream solution that keeps only printable ASCII characters:
String result = str.chars()
.filter(c -> isAsciiPrintable((char) c))
.mapToObj(c -> String.valueOf((char) c))
.collect(Collectors.joining());
private static boolean isAsciiPrintable(char ch) {
return ch >= 32 && ch < 127;
}
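To convert non-printable characters to "_" instead of dropping them, replace the filter with .map(c -> isAsciiPrintable((char) c) ? c : '_')
Keeping 32 to 126 is equivalent to removing every match of the regex [^\\x20-\\x7E] (from a comment on the regex solution).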
Source for isAsciiPrintable: http://www.java2s.com/Code/Java/Data-Type/ChecksifthestringcontainsonlyASCIIprintablecharacters.htm
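The "_" variant above is only a fragment. A self-contained sketch (class and method names are hypothetical) that maps instead of filtering, then collects the resulting IntStream back into a String with StringBuilder::appendCodePoint:

```java
public class PrintableAscii {
    private static boolean isAsciiPrintable(char ch) {
        return ch >= 32 && ch < 127;
    }

    // Replace (instead of drop) every char outside 32..126 with '_'
    static String replaceNonPrintable(String str) {
        return str.chars()
                .map(c -> isAsciiPrintable((char) c) ? c : '_')
                // collect the int stream back into a String
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // accented letters and the tab character all become '_'
        System.out.println(replaceNonPrintable("A fun\u00e7\u00e3o\t!")); // A fun__o_!
    }
}
```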
CharMatcher.retainFrom can be used, if you're using the Google Guava library:
String s = "A função";
String stripped = CharMatcher.ascii().retainFrom(s);
System.out.println(stripped); // Prints "A funo"

Replace specific character and getting index in Java

I have tried writing code to replace only a specific character. The string contains three identical characters, and I want to replace only the second or the third one. For example:
String line = "this.a[i] = i";
This string contains three 'i' characters. I want to replace only the second or the third one, so that the string becomes:
String line = "this.a[i] = newChar";
This is my code to read the string and replace the character with another string:
String EQ_VAR;
EQ_VAR = getequals(line);
int length = EQ_VAR.length();
if(length == 1){
int gindex = EQ_VAR.indexOf(EQ_VAR);
StringBuilder nsb = new StringBuilder(line);
nsb.replace(gindex, gindex, "New String");
}
The method to get the character:
String getequals(String str){
int startIdx = str.indexOf("=");
int endIdx = str.indexOf(";");
String content = str.substring(startIdx + 1, endIdx);
return content;
}
I assumed that using an index is the best way to replace a specific character. I tried using String.replace, but then all 'i' characters are replaced and the resulting string looks like this:
String line = "th'newChar's.a[newChar] = newChar";
Here's one way to replace all occurrences except the first few:
String str = "This is a text containing many i many iiii = i";
String replacement = "hua";
String toReplace = str.substring(str.indexOf("=")+1, str.length()).trim(); // Yup, gets stuff after "=".
int charsToNotReplace = 1; // Number of leading occurrences to leave unreplaced
// First replace all the parts
str = str.replaceAll(toReplace, replacement);
// Then replace "charsToNotReplace" number of occurrences back with original chars.
for(int i = 0; i < charsToNotReplace; i++)
str = str.replaceFirst(replacement, toReplace);
// Just trim from "="
str = str.substring(0, str.indexOf("=")-1);
System.out.println(str);
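Result: This huas a text contahuanhuang many hua many huahuahuahua
Set charsToNotReplace to the number of leading occurrences you want to leave untouched. For example, setting it to 2 skips the first two occurrences (technically, it replaces everything and then restores the first two). Note that replaceAll and replaceFirst treat their first argument as a regular expression, and the restore step assumes the replacement text does not already occur earlier in the original string.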
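Since the question leans toward an index-based approach, here is a minimal sketch that targets exactly "the n-th occurrence": String.indexOf(int ch, int fromIndex) locates it, and StringBuilder.replace(start, end, str) swaps just that character. The method name replaceNth is hypothetical.

```java
public class ReplaceNth {
    // Hypothetical helper: replaces only the n-th (1-based) occurrence of target
    // in s with replacement; returns s unchanged if there is no n-th occurrence.
    static String replaceNth(String s, char target, int n, String replacement) {
        if (n < 1) return s;
        int index = -1;
        for (int found = 0; found < n; found++) {
            index = s.indexOf(target, index + 1); // search after the previous hit
            if (index < 0) return s;             // fewer than n occurrences
        }
        return new StringBuilder(s)
                .replace(index, index + 1, replacement) // end index is exclusive
                .toString();
    }

    public static void main(String[] args) {
        String line = "this.a[i] = i";
        System.out.println(replaceNth(line, 'i', 3, "newChar")); // this.a[i] = newChar
        System.out.println(replaceNth(line, 'i', 2, "newChar")); // this.a[newChar] = i
    }
}
```

Because the end index of StringBuilder.replace is exclusive, the question's nsb.replace(gindex, gindex, "New String") inserts text without removing anything; replacing a single character needs end = start + 1.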

How can I remove characters from a StringBuffer?

I am trying to make an encryption program. Part of it displays a passcode at the beginning of the encrypted text, followed by the alphabet with the letters contained in the passcode removed. I am trying to remove the passcode's characters from my alphabet StringBuffer, but there seems to be no easy way to do this. StringBuffer has no method that automatically finds all occurrences of a character, although String does. However, that String method replaces a character with another character, and I want to replace a character with nothing (essentially delete it). This is my code; any help would be appreciated.
StringBuffer alphabet = new StringBuffer("abcdefghijklmnopqrstuvwxyz");
for(int i = 0; i < pass.length(); i++)
{
char replacedletter = pass.charAt(i);
alphabet.replace(replacedletter,"");
}
System.out.println(pass + alphabet);
This might work for you:
StringBuffer s=...
for(char c: passcode.toCharArray()){
int index=-1;
while((index=s.indexOf(c))!=-1){
s.deleteCharAt(index);
}
}
You didn't indicate which version of Java you are using, but StringBuffer does have a replace method (it is not new in Java 7; it has been available since Java 1.2):
replace(int start, int end, String str)
Replaces the characters in a substring of this sequence with characters in the specified String.
Combine this with the indexOf method to replace all occurrences (replacedletter is the variable from the question's loop):
int ndx = alphabet.indexOf(String.valueOf(replacedletter), 0);
while (ndx > -1) {
alphabet.replace(ndx, ndx + 1, "");
ndx = alphabet.indexOf(String.valueOf(replacedletter), ndx);
}
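Putting the pieces together with the question's StringBuffer, a runnable sketch might look like this (the buildKey method and the sample passcode "secret" are hypothetical). It uses deleteCharAt, which does the same job as replace(ndx, ndx + 1, "").

```java
public class PasscodeAlphabet {
    // Removes every letter of pass from the alphabet, then prefixes the passcode.
    static String buildKey(String pass) {
        StringBuffer alphabet = new StringBuffer("abcdefghijklmnopqrstuvwxyz");
        for (int i = 0; i < pass.length(); i++) {
            String letter = String.valueOf(pass.charAt(i));
            int ndx;
            // delete all occurrences (a no-op once the letter is already gone)
            while ((ndx = alphabet.indexOf(letter)) != -1) {
                alphabet.deleteCharAt(ndx);
            }
        }
        return pass + alphabet;
    }

    public static void main(String[] args) {
        System.out.println(buildKey("secret")); // secretabdfghijklmnopquvwxyz
    }
}
```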

How to distinguish two strings in a String? (How to prevent plain text injection)

Say I have two randomly generated Strings.
What can I do to make a single String with the two Strings generated, while being able to split them to get the original two Strings for later use?
For example, I have "[aweiroj\3aoierjvg0_3409" and " 4093 w_/e9 ". How can I attach those two words into one variable while being able to split them to original two Strings?
My problem is that I can't find a regex for .split() because those two strings can contain any characters (letters, digits, \, /, spaces...).
EDIT
I just thought of a real-life case where this could be used. Sometimes sending plain text over the network (HTTP) is better than XML or JSON: with a slow server and fast broadband, use XML or JSON; with a fast server and slow broadband, use plain text. The answers below could prevent plain text injection. However, these methods are not benchmarked or tested, so I would test them before actually using them.
The short answer is: Don't do that. Use an array, or a class with two data members, but combining the strings together into one string is probably a bad idea.
But if you have some truly obscure use case, you can:
Create a sufficiently unique delimiter, like "<<Jee Seok Yoon's Unique Delimiter>>".
final static String DELIM = "<<Jee Seok Yoon's Unique Delimiter>>";
String a = /*...*/;
String b = /*...*/;
String combined = a + DELIM + b;
int breakAt = combined.indexOf(DELIM);
String a1 = combined.substring(0, breakAt);
String b1 = combined.substring(breakAt + DELIM.length());
Have a simpler delimiter that you escape if present in the string.
Remember the length of the first string and store it in your unified string followed by an "end of length" delimiter.
String a = /*...*/;
String b = /*...*/;
String combined = String.valueOf(a.length()) + "|" + a + b;
int breakAt = combined.indexOf("|");
int len = Integer.parseInt(combined.substring(0, breakAt), 10);
String a1 = combined.substring(breakAt + 1, breakAt + 1 + len);
String b1 = combined.substring(breakAt + 1 + len);
(Both code examples are completely off-the-cuff and untested.)
I would create a class that holds both Strings and can print them separately and combined.
This one simply extends ArrayList so you don't need to reimplement add, get and so on:
import java.util.ArrayList;
public class ConcatedString extends ArrayList<String>
{
public String concated() {
StringBuilder b = new StringBuilder();
for (String string : this)
{
b.append(string);
}
return b.toString();
}
}
If this is a matter of serialization of some (obscure) kind, then there is at least one obvious way to do this.
Encode the strings using some encoding (HTML encoding is an easy and readable choice). Pick a character that the encoded strings cannot possibly contain, use that as a separator and concatenate them all.
Then, to retrieve, separate the strings by that character and decode the substrings using your initial method in reverse.
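The answer suggests HTML encoding; as a swapped-in alternative that needs only the JDK, here is a minimal sketch using Base64 (java.util.Base64, Java 8+) as the encoding and ':' as the separator. The class and method names are hypothetical, and the sketch assumes the strings are well-formed Unicode text (they are round-tripped through UTF-8 bytes).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Join {
    // Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so ':' can never
    // appear inside an encoded part and is safe to use as the separator.
    private static final String SEPARATOR = ":";

    static String join(String a, String b) {
        Base64.Encoder enc = Base64.getEncoder();
        return enc.encodeToString(a.getBytes(StandardCharsets.UTF_8))
                + SEPARATOR
                + enc.encodeToString(b.getBytes(StandardCharsets.UTF_8));
    }

    static String[] split(String combined) {
        String[] parts = combined.split(SEPARATOR, -1); // -1 keeps empty parts
        Base64.Decoder dec = Base64.getDecoder();
        return new String[] {
            new String(dec.decode(parts[0]), StandardCharsets.UTF_8),
            new String(dec.decode(parts[1]), StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        String a = "[aweiroj\\3aoierjvg0_3409";
        String b = " 4093 w_/e9 ";
        String[] back = split(join(a, b));
        System.out.println(back[0].equals(a) && back[1].equals(b)); // true
    }
}
```

The trade-off is readability: unlike HTML encoding, the combined string is no longer human-readable, and Base64 grows the payload by about a third.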
If you want it to work in every case, you need to define 2 special characters:
A delimiter character
An escape character.
1. Encoding: when you concatenate the 2 Strings:
In both Strings,
replace every character that equals the escape character with 2 escape characters
replace every character that equals the delimiter character with escape + delimiter
then concatenate both Strings with the delimiter character between them.
2. Decoding: when you decode the String:
If the current character is an escape character and the next one is also an escape character, replace them with a single escape character and skip 1 character.
If the current character is an escape character and the next one is a delimiter character, replace them with a single delimiter character and skip 1 character.
If the current character is the delimiter character, then you are between the 2 original Strings.
Here is a working example:
// Deliberately bad choice of escape/delimiter characters, to show that escaping works
private static final char DELIMITER = '1';
private static final char ESCAPE = '2';
public static String encode(String s1, String s2){
StringBuilder sb = new StringBuilder();
subEncode(s1, sb);
sb.append(DELIMITER);
subEncode(s2, sb);
return sb.toString();
}
private static void subEncode(String s, StringBuilder sb) {
for(char c : s.toCharArray()) {
if(c == ESCAPE) {
sb.append(ESCAPE);
sb.append(ESCAPE);
}else if(c == DELIMITER) {
sb.append(ESCAPE);
sb.append(DELIMITER);
}else {
sb.append(c);
}
}
}
public static String[] decode(String encoded) {
StringBuilder sb1 = new StringBuilder();
StringBuilder sb2 = new StringBuilder();
StringBuilder currentSb = sb1;
char[] chars = encoded.toCharArray();
for(int i = 0; i< chars.length ; i++) {
if(chars[i] == ESCAPE) {
if(chars.length < i+2) {
throw new IllegalArgumentException("Malformed encoded String");
}
if(chars[i+1] == ESCAPE) {
currentSb.append(ESCAPE);
}else if(chars[i+1] == DELIMITER) {
currentSb.append(DELIMITER);
}
i++;
}else if(chars[i] == DELIMITER) {
currentSb=sb2;
}else {
currentSb.append(chars[i]);
}
}
return new String[]{sb1.toString(), sb2.toString()};
}
Test :
public static void main(String[] args) {
//Nominal case :
{
String s1 = "aaa";
String s2 = "bbb";
System.out.println("Encoded : " + encode(s1, s2));
System.out.println("Decoded" + Arrays.asList(decode(encode(s1,s2))));
}
//with bad characters :
{
String s1 = "111";
String s2 = "222";
System.out.println("Encoded : " + encode(s1, s2));
System.out.println("Decoded" + Arrays.asList(decode(encode(s1,s2))));
}
//with random characters :
{
String s1 = "a11a1";
String s2 = "1112bb22";
System.out.println("Encoded : " + encode(s1, s2));
System.out.println("Decoded" + Arrays.asList(decode(encode(s1,s2))));
}
}
Output :
Encoded : aaa1bbb
Decoded[aaa, bbb]
Encoded : 2121211222222
Decoded[111, 222]
Encoded : a2121a21121212122bb2222
Decoded[a11a1, 1112bb22]
Another way to do this is to format the encoded String as follows:
size_of_str1:str1|size_of_str2:str2
Example: if string1 is 'aa' and string2 is 'bbbb', the encoded String is '2:aa|4:bbbb'.
You decode it with String#substring(). The "hard" part is parsing the string until you have finished reading the size of the next String.
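A minimal sketch of this format (the class LengthPrefixed and its methods are hypothetical): the decoder reads digits up to the ':' and then takes exactly that many characters, so the content itself is never scanned for separators and may contain ':' or '|'. Malformed input, such as a missing ':', throws an exception rather than being reported gracefully.

```java
import java.util.ArrayList;
import java.util.List;

public class LengthPrefixed {
    // Encodes each string as <length>:<content>, joined with '|'.
    static String encode(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append('|');
            sb.append(parts[i].length()).append(':').append(parts[i]);
        }
        return sb.toString();
    }

    static List<String> decode(String encoded) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < encoded.length()) {
            int colon = encoded.indexOf(':', pos); // end of the length field
            int len = Integer.parseInt(encoded.substring(pos, colon));
            int start = colon + 1;
            // content is taken by length, never scanned for separators
            result.add(encoded.substring(start, start + len));
            pos = start + len + 1; // skip the '|' separator
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode("aa", "bbbb"));        // 2:aa|4:bbbb
        System.out.println(decode("2:aa|4:bbbb"));      // [aa, bbbb]
        System.out.println(decode(encode("a|b:c", ""))); // [a|b:c, ]
    }
}
```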
