I would like to replace a certain set of characters in a string with corresponding replacement characters in an efficient way.
For example:
String sourceCharacters = "šđćčŠĐĆČžŽ";
String targetCharacters = "sdccSDCCzZ";
String result = replaceChars("Gračišće", sourceCharacters , targetCharacters );
assert result.equals("Gracisce");
Is there a more efficient way than using the replaceAll method of the String class?
My first idea was:
final String s = "Gračišće";
String sourceCharacters = "šđćčŠĐĆČžŽ";
String targetCharacters = "sdccSDCCzZ";
// preparation
final char[] sourceString = s.toCharArray();
final char result[] = new char[sourceString.length];
final char[] targetCharactersArray = targetCharacters.toCharArray();
// main work
for(int i=0,l=sourceString.length;i<l;++i)
{
final int pos = sourceCharacters.indexOf(sourceString[i]);
result[i] = pos!=-1 ? targetCharactersArray[pos] : sourceString[i];
}
// result
String resultString = new String(result);
Any ideas?
Btw, the UTF-8 characters are causing the trouble, with US_ASCII it works fine.
You can make use of java.text.Normalizer and a bit of regex to get rid of the diacritics, of which there are many more than you have collected so far.
Here's an SSCCE, copy'n'paste'n'run it on Java 6:
package com.stackoverflow.q2653739;
import java.text.Normalizer;
import java.text.Normalizer.Form;
public class Test {
public static void main(String... args) {
System.out.println(removeDiacriticalMarks("Gračišće"));
}
public static String removeDiacriticalMarks(String string) {
return Normalizer.normalize(string, Form.NFD)
.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
}
This should yield
Gracisce
At least, it does here at Eclipse with console character encoding set to UTF-8 (Window > Preferences > General > Workspace > Text File Encoding). Ensure that the same is set in your environment as well.
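One caveat worth illustrating: NFD normalization only helps for letters that actually decompose into a base letter plus a combining mark. The letters đ (U+0111) and Đ (U+0110) from the question have no canonical decomposition, so they pass through the Normalizer unchanged. The following is a minimal sketch, not a definitive implementation; the class name DiacriticStripper and the method strip are made up for illustration, and the manual fallback covers only these two letters.

```java
import java.text.Normalizer;

// Hypothetical helper combining NFD normalization with a manual fallback
// for letters that have no canonical decomposition.
final class DiacriticStripper {

    static String strip(String input) {
        // Decompose accented letters into base letter + combining mark, then drop the marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // U+0111 / U+0110 (d with stroke) do not decompose, so map them by hand.
        // Written as escapes so the source file encoding does not matter.
        return decomposed.replace('\u0111', 'd').replace('\u0110', 'D');
    }
}
```

With this sketch, DiacriticStripper.strip("Gračišće") gives "Gracisce" and DiacriticStripper.strip("Đorđe") gives "Dorde". Other letters without a decomposition would need their own entries.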
As an alternative, maintain a Map<Character, Character>:
Map<Character, Character> charReplacementMap = new HashMap<Character, Character>();
charReplacementMap.put('š', 's');
charReplacementMap.put('đ', 'd');
// Put more here.
String originalString = "Gračišće";
StringBuilder builder = new StringBuilder();
for (char currentChar : originalString.toCharArray()) {
Character replacementChar = charReplacementMap.get(currentChar);
builder.append(replacementChar != null ? replacementChar : currentChar);
}
String newString = builder.toString();
I'd use the replace method in a simple loop.
String sourceCharacters = "šđćčŠĐĆČžŽ";
String targetCharacters = "sdccSDCCzZ";
String s = "Gračišće";
for (int i=0 ; i<sourceCharacters.length() ; i++)
s = s.replace(sourceCharacters.charAt(i), targetCharacters.charAt(i));
System.out.println(s);
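The index-based idea from the question and the Map idea from the answers can be combined into the replaceChars method the question asks for. The following is only a sketch under that assumption: the class name CharReplacer is made up, and replaceChars is the name from the question's example call, not an existing library method. It builds the lookup table once per call and then makes a single pass over the input.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: replaces each character found in `source` with the
// character at the same index in `target`; all other characters are copied as-is.
final class CharReplacer {

    static String replaceChars(String input, String source, String target) {
        if (source.length() != target.length()) {
            throw new IllegalArgumentException("source and target must have the same length");
        }
        // Build the lookup table: a hash lookup per character instead of indexOf's linear scan.
        Map<Character, Character> table = new HashMap<>();
        for (int i = 0; i < source.length(); i++) {
            table.put(source.charAt(i), target.charAt(i));
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            Character replacement = table.get(c);
            out.append(replacement != null ? replacement.charValue() : c);
        }
        return out.toString();
    }
}
```

Called as CharReplacer.replaceChars("Gračišće", "šđćčŠĐĆČžŽ", "sdccSDCCzZ"), this returns "Gracisce". If the same character set is used for many strings, the table could be built once and cached instead of per call.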
Related
I have a string in Java which contains hex values alongside normal characters. It looks something like this:
String s = "Hello\\xF6\\xE4\\xFC\\xD6\\xC4\\xDC\\xDF";
What I want is to convert the hex values to the characters they represent, so it will look like this:
"HelloöäüÖÄÜß"
Is there a way to replace all hex values with the actual character they represent?
I can achieve what I want with this, but I need one line for every character and it does not cover unexpected characters:
indexRequest = indexRequest.replace("\\xF6", "ö");
indexRequest = indexRequest.replace("\\xE4", "ä");
indexRequest = indexRequest.replace("\\xFC", "ü");
indexRequest = indexRequest.replace("\\xD6", "Ö");
indexRequest = indexRequest.replace("\\xC4", "Ä");
indexRequest = indexRequest.replace("\\xDC", "Ü");
indexRequest = indexRequest.replace("\\xDF", "ß");
// requires java.util.regex.Matcher and java.util.regex.Pattern
public static void main(String[] args) {
String s = "Hello\\xF6\\xE4\\xFC\\xD6\\xC4\\xDC\\xDF\\xFF ";
StringBuffer sb = new StringBuffer();
// a literal backslash, an 'x', then exactly two hex digits (captured as group 1)
Pattern p = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");
Matcher m = p.matcher(s);
while(m.find()){
int num = Integer.parseInt(m.group(1), 16); // parse the hex digits to int
char c = (char)num; // cast int to char
// quoteReplacement keeps '$' and '\' from being read as replacement syntax
m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
}
m.appendTail(sb);
System.out.println(sb.toString());
}
I would loop through every character to find the '\', then skip one char and pass the next two chars to a method.
Then just use the code by Michael Berry here:
Convert a String of Hex into ASCII in Java
You can use the regex [xX][0-9a-fA-F]+ to identify all the hex codes in your string, convert them to their corresponding characters using Integer.parseInt(matcher.group().substring(1), 16), and replace them in the string. Below is sample code for it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HexToCharacter {
public static void main(String[] args) {
String s = "HelloxF6xE4xFCxD6xC4xDCxDF";
StringBuilder sb = new StringBuilder(s);
Pattern pattern = Pattern.compile("[xX][0-9a-fA-F]+");
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
int indexOfHexCode = sb.indexOf(matcher.group());
sb.replace(indexOfHexCode, indexOfHexCode+matcher.group().length(), Character.toString((char)Integer.parseInt(matcher.group().substring(1), 16)));
}
System.out.println(sb.toString());
}
}
I have tested this regex pattern using your string. If there are other test-cases that you have in mind, then you might need to change regex accordingly
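On Java 9 or newer, the find/replace loop from the answers above can be collapsed into a single call, because Matcher.replaceAll accepts a function. The sketch below is one possible way to do that, not a definitive implementation. It assumes the input really contains a literal backslash followed by x and exactly two hex digits, and that each value is meant as the Unicode code point of the same number (which matches ISO-8859-1 for 00 to FF). The class name HexEscapeDecoder and the method decode are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes literal backslash-x-HH sequences into the
// character with that code (equivalent to ISO-8859-1 for values 00-FF).
final class HexEscapeDecoder {

    // A literal backslash, an 'x', then exactly two hex digits (captured).
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    static String decode(String input) {
        // Matcher.replaceAll(Function) requires Java 9 or newer.
        return HEX_ESCAPE.matcher(input).replaceAll(match -> {
            char decoded = (char) Integer.parseInt(match.group(1), 16);
            // quoteReplacement guards against '$' and '\' being read as replacement syntax.
            return Matcher.quoteReplacement(String.valueOf(decoded));
        });
    }
}
```

Limiting the pattern to exactly two digits avoids swallowing ordinary text that happens to start with a hex letter right after an escape, which an unbounded + quantifier would do.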
I have to display a string with visible control characters like \n, \t etc.
I have tried quotations like here, also I have tried to do something like
Pattern pattern = Pattern.compile("\\p{Cntrl}");
Matcher matcher = pattern.matcher(str);
String controlChar = matcher.group();
String replace = "\\" + controlChar;
result = result.replace(controlChar, replace);
but I have failed
Alternative: Use visible characters instead of escape sequences.
To make control characters "visible", use the characters from the Unicode Control Pictures Block, i.e. map \u0000-\u001F to \u2400-\u241F, and \u007F to \u2421.
Note that this requires output to be Unicode, e.g. UTF-8, not a single-byte code page like ISO-8859-1.
private static String showControlChars(String input) {
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("[\u0000-\u001F\u007F]").matcher(input);
while (m.find()) {
char c = m.group().charAt(0);
m.appendReplacement(buf, Character.toString(c == '\u007F' ? '\u2421' : (char) (c + 0x2400)));
if (c == '\n') // Let's preserve newlines
buf.append(System.lineSeparator());
}
return m.appendTail(buf).toString();
}
Output using method above as input text:
␉private static String showControlChars(String input) {␍␊
␉␉StringBuffer buf = new StringBuffer();␍␊
␉␉Matcher m = Pattern.compile("[\u0000-\u001F\u007F]").matcher(input);␍␊
␉␉while (m.find()) {␍␊
␉␉␉char c = m.group().charAt(0);␍␊
␉␉␉m.appendReplacement(buf, Character.toString(c == '\u007F' ? '\u2421' : (char) (c + 0x2400)));␍␊
␉␉␉if (c == '\n')␍␊
␉␉␉␉buf.append(System.lineSeparator());␍␊
␉␉}␍␊
␉␉return m.appendTail(buf).toString();␍␊
␉}␍␊
Simply replace occurrences of '\n' with the escaped version (i.e. '\\n'), like this:
final String result = str.replace("\n", "\\n");
For example:
public static void main(final String args[]) {
final String str = "line1\nline2";
System.out.println(str);
final String result = str.replace("\n", "\\n");
System.out.println(result);
}
Will yield the output:
line1
line2
line1\nline2
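Note that just doing
result = result.replace("\\", "\\\\");
only doubles backslashes that are already in the string. A newline or tab character contains no backslash, so it still needs its own replacement as shown above.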
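The single-character replacements above can be generalized into one pass over the string. The following is a minimal sketch under a few assumptions that are not from the question: newline, tab and carriage return get their usual short escapes, every other control character is shown as a four-digit hex escape, and existing backslashes are doubled so the output stays unambiguous. The class name ControlCharEscaper and the method escape are hypothetical.

```java
// Hypothetical helper: renders control characters as visible escape sequences.
final class ControlCharEscaper {

    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\n': out.append("\\n"); break;
                case '\t': out.append("\\t"); break;
                case '\r': out.append("\\r"); break;
                case '\\': out.append("\\\\"); break; // keep the output unambiguous
                default:
                    if (Character.isISOControl(c)) {
                        // Remaining control characters become a four-digit hex escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

For example, ControlCharEscaper.escape("line1\nline2") returns the twelve-character string line1\nline2 with a visible backslash and n instead of a line break.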
Hello I have following method to display a promotion line when I comment a shoutbox:
public String getShoutboxUnderline(){
StringBuilder builder = new StringBuilder();
builder.append("watch");
builder.append("on");
builder.append("youtube");
builder.append(":");
builder.append("Mickey");
builder.append("en");
builder.append("de");
builder.append("stomende");
builder.append("drol");
return builder.toString();
}
But when I call it, I get watchonyoutube:Mickeyendestomendedrol, which has no spaces. How do I get spaces into my StringBuilder output?
As of JDK 1.8, you can use a StringJoiner, which is more convenient in your case:
StringJoiner is used to construct a sequence of characters separated
by a delimiter and optionally starting with a supplied prefix and
ending with a supplied suffix.
StringJoiner joiner = new StringJoiner(" "); // Use 'space' as the delimiter
joiner.add("watch") // watch
.add("on") // watch on
.add("youtube") // watch on youtube
.add(":") // etc...
.add("Mickey")
.add("en")
.add("de")
.add("stomende")
.add("drol");
return joiner.toString();
This way, you will not need to add those spaces "manually".
Just invoke builder.append(" ") at the location of your preference.
E.g.
builder
.append("watch")
.append(" ")
.append("on")
...etc.
NB:
Using the fluent builder syntax here for convenience
You can also just append a space after each literal instead (save for the last one)
Cleaner way of doing it.
Create a class variable:
private static final String BLANK_SPACE=" ";
Now in your StringBuilder code, append it where required:
StringBuilder builder = new StringBuilder();
builder.append("watch");
builder.append(BLANK_SPACE);
builder.append("on");
builder.append(BLANK_SPACE);
builder.append("youtube");
builder.append(":");
builder.append(BLANK_SPACE);
builder.append("Mickey");
builder.append(BLANK_SPACE);
builder.append("en");
builder.append(BLANK_SPACE);
builder.append("de");
builder.append(BLANK_SPACE);
builder.append("stomende");
builder.append(BLANK_SPACE);
builder.append("drol");
System.out.println(builder.toString()); // watch on youtube: Mickey en de stomende drol
A space is only a string containing the single character space.
So you can append it exactly as appending any other string.
StringBuilder builder = new StringBuilder();
builder.append("watch");
builder.append(" ");
builder.append("on");
builder.append(" ");
// and so on
Remember also that the append method returns the StringBuilder, so it is possible to chain appends one after the other as follows:
StringBuilder builder = new StringBuilder();
builder.append("watch").append(" ");
builder.append("on").append(" ");
// and so on
You can use this, it's equivalent to using StringBuilder or StringJoiner, but smaller
public class StringUnifier {
String separator = "";
List<String> dataList = new ArrayList<>();
private String response = "";
StringUnifier(String separator) {
this.separator = separator;
}
StringUnifier add(String data) {
if (!data.isEmpty()) {
this.dataList.add(data);
}
return this;
}
@Override
public String toString() {
// String.join puts the separator only between elements and copes with an empty list
return String.join(this.separator, this.dataList);
}
}
MAIN
public class Main_Test {
public static void main(String[] args) {
StringUnifier stringUnifier = new StringUnifier(" ");
stringUnifier.add("columna1").add("columna2").add("columna3");
System.out.println(stringUnifier.toString());
}
}
RUN
output:
columna1 columna2 columna3
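When all the parts are known up front, as in the question, String.join (available since Java 8) may be the shortest option, since it inserts the delimiter between the parts without any builder bookkeeping. This is a minimal sketch rather than the asker's actual code: the class name ShoutboxLine is made up, and attaching the colon directly to "youtube" is an assumption about the desired output.

```java
// Minimal sketch: String.join (Java 8+) puts the delimiter between the parts.
final class ShoutboxLine {

    static String getShoutboxUnderline() {
        return String.join(" ",
                "watch", "on", "youtube:", "Mickey", "en", "de", "stomende", "drol");
    }
}
```

Calling ShoutboxLine.getShoutboxUnderline() returns "watch on youtube: Mickey en de stomende drol".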
I have a string which contains normal text and Unicode in between, for example "abc\ue415abc".
I want to replace all occurrences of \\u with \u. How can I achieve this?
I used the following code but it's not working properly.
String s = "aaa\\u2022bbb\\u2014ccc";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
while (m.find()) {
try {
int cp = Integer.parseInt(m.group(1), 16);
m.appendReplacement(buf, "");
buf.appendCodePoint(cp);
} catch (NumberFormatException e) {
}
}
m.appendTail(buf);
s = buf.toString();
Please help. Thanks in advance.
From API reference: http://developer.android.com/reference/java/lang/String.html#replace(java.lang.CharSequence, java.lang.CharSequence)
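You can use
public String replace (CharSequence target, CharSequence replacement)
Note that "\u" on its own does not compile in Java (a backslash followed by u must start a Unicode escape), so the single backslash has to be escaped as well:
string = string.replace("\\\\u", "\\u");
or
String replacedString = string.replace("\\\\u", "\\u");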
Your initial string doesn't, in fact, have any double backslashes.
String s = "aaa\\u2022bbb\\u2014ccc";
yields a string that contains aaa\u2022bbb\u2014ccc, as \\ is just java string-literal escaping for \.
If you want unicode characters: (StackOverflow21028089.java)
import java.util.regex.*;
class StackOverflow21028089 {
public static void main(String[] args) {
String s = "aaa\\u2022bbb\\u2014ccc";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
while (m.find()) {
try {
// see example:
// http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#appendReplacement%28java.lang.StringBuffer,%20java.lang.String%29
int cp = Integer.parseInt(m.group(1), 16);
char[] chars = Character.toChars(cp);
String rep = new String(chars);
System.err.printf("Found %d which means '%s'\n", cp, rep);
m.appendReplacement(buf, Matcher.quoteReplacement(rep)); // quote in case rep is '$' or '\'
} catch (NumberFormatException e) {
System.err.println("Confused: " + e);
}
}
m.appendTail(buf);
s = buf.toString();
System.out.println(s);
}
}
=>
Found 8226 which means '•'
Found 8212 which means '—'
aaa•bbb—ccc
If you want aaa\u2022bbb\u2014ccc, that's what you started with. If you meant to start with a string literal with aaa\\u2022bbb\\u2014ccc, that's this:
String s = "aaa\\\\u2022bbb\\\\u2014ccc";
and converting it to the one with single slashes can be as simple as @Overv's code:
s = s.replaceAll("\\\\u", "\\u");
though since backslash has a special meaning in regex patterns and replacements (see Matcher's docs) (in addition to java parsing), this should probably be:
s = s.replaceAll("\\\\\\\\u", "\\\\u");
=>
aaa\u2022bbb\u2014ccc
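The difference between what is written in the source file and what the string actually contains is easy to check by counting characters. The following sketch is only an illustration of the point made above; the class name BackslashDemo and its members are made up. It uses the literal (non-regex) String.replace, so only Java's string-literal escaping is involved, not regex escaping.

```java
// Minimal sketch showing how many characters each literal really contains.
final class BackslashDemo {

    // Two escaped backslashes in the source literal = two real backslashes in the string.
    static final String DOUBLE = "aaa\\\\u2022bbb";

    // One escaped backslash in the source literal = one real backslash in the string.
    static final String SINGLE = "aaa\\u2022bbb";

    // Literal (non-regex) replacement: two backslashes + 'u' become one backslash + 'u'.
    static String collapse(String s) {
        return s.replace("\\\\u", "\\u");
    }

    public static void main(String[] args) {
        System.out.println(DOUBLE.length());                   // 13
        System.out.println(SINGLE.length());                   // 12
        System.out.println(collapse(DOUBLE).equals(SINGLE));   // true
    }
}
```

Because String.replace treats both arguments literally, it needs only half as many backslashes as the replaceAll version.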
Try this:
s = s.replace("\\\\u", "\\u");
There is a contains method and a replace method in String. Note that replace returns a new string, so the result has to be assigned:
String hello = "hgjgu\\udfgyud\\\\ushddsjn\\hsdfds\\\\ubjn";
if(hello.contains("\\\\u"))
hello = hello.replace("\\\\u","\\u");
System.out.println(hello);
It will print: hgjgu\udfgyud\ushddsjn\hsdfds\ubjn
I have a string array variable whose values change continuously. Random arrays are generated from it. This is what I have:
String trans = Utility.GetColumnValue(testdata[k], "suggest_text_2");
The trans value changes continuously. How can I concatenate it with the previous values? How can I print every value of trans as it changes? Do I need to use a buffer?
If you need the intermediate results, you will probably need something like this:
String yourOldString = "";
String freshString = "";
// other code which updates freshString
yourOldString = yourOldString + " " + freshString;
However, if you need to capture all updates but only print the final result, use a StringBuilder:
private static final String WHITESPACE = " ";
String yourOldString = "";
String freshString = "";
StringBuilder builder = new StringBuilder();
builder.append(yourOldString);
// other code which updates freshString
builder.append(WHITESPACE);
builder.append(freshString);
// once everything is done:
String resultString = builder.toString();
String a = "foo";
String space = " ";
String b = "bar";
String c = a+space+b;
It's often best to use StringBuilder to concatenate strings:
String[] array = { "fee", "fie", "foe", "fum" };
boolean firstTime = true;
StringBuilder sb = new StringBuilder(50);
for (String word : array) {
if (firstTime) {
firstTime = false;
} else {
sb.append(' ');
}
sb.append(word);
}
String finalResult = sb.toString();
System.out.println("string1 "+"string2");
Simply,
String trans = Utility.GetColumnValue(testdata[k], "suggest_text_2");
StringBuffer concat = new StringBuffer();
concat.append(trans).append(" ");
Or,
StringBuffer concat = new StringBuffer();
concat.append(Utility.GetColumnValue(testdata[k], "suggest_text_2")).append(" ");
To concatenate/combine set of data.
for(String s: trans){
concat.append(s).append(" ");
}
String concatenation (like trans + " ") in a loop is slower than StringBuffer's append(), because each concatenation creates a temporary buffer on the fly and then converts it to a String with toString(). I strongly suggest using StringBuilder.
Here is a nice blog post about the performance of these two.
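To both print each value as it changes and keep the concatenation of everything seen so far, the two concerns can be wrapped in one small object. The sketch below is an assumption about what the asker needs, not code from any of the answers: the class ValueHistory and its methods record and all are hypothetical, and record would be called with each new result of Utility.GetColumnValue.

```java
import java.util.StringJoiner;

// Hypothetical helper: prints every new value as it arrives and keeps a
// space-separated history of all values seen so far.
final class ValueHistory {

    private final StringJoiner history = new StringJoiner(" ");

    // Call this each time the value changes (e.g. inside the loop over testdata).
    String record(String value) {
        System.out.println(value);   // print the current value
        history.add(value);          // append it to the running concatenation
        return history.toString();
    }

    // Returns everything recorded so far, separated by single spaces.
    String all() {
        return history.toString();
    }
}
```

A StringJoiner is used here so that no trailing space is left after the last value, which the append(s).append(" ") pattern would produce.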