How to make control characters visible? - java

I have to display a string with its control characters, such as \n and \t, made visible.
I have tried quoting the string, and I have also tried something like:
Pattern pattern = Pattern.compile("\\p{Cntrl}");
Matcher matcher = pattern.matcher(str);
String controlChar = matcher.group();
String replace = "\\" + controlChar;
result = result.replace(controlChar, replace);
but I have failed

Alternative: Use visible characters instead of escape sequences.
To make control characters "visible", use the characters from the Unicode Control Pictures Block, i.e. map \u0000-\u001F to \u2400-\u241F, and \u007F to \u2421.
Note that this requires output to be Unicode, e.g. UTF-8, not a single-byte code page like ISO-8859-1.
private static String showControlChars(String input) {
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("[\u0000-\u001F\u007F]").matcher(input);
while (m.find()) {
char c = m.group().charAt(0);
m.appendReplacement(buf, Character.toString(c == '\u007F' ? '\u2421' : (char) (c + 0x2400)));
if (c == '\n') // Let's preserve newlines
buf.append(System.lineSeparator());
}
return m.appendTail(buf).toString();
}
Output using method above as input text:
␉private static String showControlChars(String input) {␍␊
␉␉StringBuffer buf = new StringBuffer();␍␊
␉␉Matcher m = Pattern.compile("[\u0000-\u001F\u007F]").matcher(input);␍␊
␉␉while (m.find()) {␍␊
␉␉␉char c = m.group().charAt(0);␍␊
␉␉␉m.appendReplacement(buf, Character.toString(c == '\u007F' ? '\u2421' : (char) (c + 0x2400)));␍␊
␉␉␉if (c == '\n')␍␊
␉␉␉␉buf.append(System.lineSeparator());␍␊
␉␉}␍␊
␉␉return m.appendTail(buf).toString();␍␊
␉}␍␊

Simply replace occurrences of '\n' with the escaped version (i.e. '\\n'), like this:
final String result = str.replace("\n", "\\n");
For example:
public static void main(final String args[]) {
final String str = "line1\nline2";
System.out.println(str);
final String result = str.replace("\n", "\\n");
System.out.println(result);
}
Will yield the output:
line1
line2
line1\nline2
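The replace call above handles only '\n'. A minimal sketch that extends the same idea to every control character could look like the following. The class and method names are hypothetical, and the choice to print the remaining control characters as four-digit hex escapes is an assumption, not something the answer specifies.

```java
public class EscapeControlChars {
    // Hypothetical helper (not part of the answer above): turns each control
    // character into a visible, Java-style escape sequence.
    static String escapeControlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\n': out.append("\\n"); break;
                case '\t': out.append("\\t"); break;
                case '\r': out.append("\\r"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                default:
                    if (Character.isISOControl(c)) {
                        // Assumption: other control characters become a
                        // backslash, the letter u and four hex digits
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The tab and the newline are printed as two visible characters each
        System.out.println(escapeControlChars("line1\tx\nline2"));
    }
}
```

Unlike the single replace call, this walks the string once, so adding another escape is a matter of adding another case.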

just doing
result = result.replace("\\", "\\\\");
will work!!

Related

Decode and replace hex values in a string in Java

I have a string in Java which contains hex values alongside normal characters. It looks something like this:
String s = "Hello\xF6\xE4\xFC\xD6\xC4\xDC\xDF";
What I want is to convert the hex values to the characters they represent, so it will look like this:
"HelloöäüÖÄÜß"
Is there a way to replace all hex values with the actual character they represent?
I can achieve what I want with this, but I need one line for every character, and it does not cover unexpected characters:
indexRequest = indexRequest.replace("\\xF6", "ö");
indexRequest = indexRequest.replace("\\xE4", "ä");
indexRequest = indexRequest.replace("\\xFC", "ü");
indexRequest = indexRequest.replace("\\xD6", "Ö");
indexRequest = indexRequest.replace("\\xC4", "Ä");
indexRequest = indexRequest.replace("\\xDC", "Ü");
indexRequest = indexRequest.replace("\\xDF", "ß");
public static void main(String[] args) {
String s = "Hello\\xF6\\xE4\\xFC\\xD6\\xC4\\xDC\\xDF\\xFF ";
StringBuffer sb = new StringBuffer();
// Exactly two hex digits, so a following letter such as "A" is not swallowed
Pattern p = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");
Matcher m = p.matcher(s);
while (m.find()) {
int num = Integer.parseInt(m.group(1), 16); // parse the two hex digits
char decoded = (char) num; // values 0x00-0xFF map directly to a char
// quoteReplacement keeps '$' and '\' in the result from being misread
m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(decoded)));
}
m.appendTail(sb);
System.out.println(sb.toString());
}
I would loop through every character to find the '\', then skip one character and call a method with the next two characters.
Then just use the code by Michael Berry
here:
Convert a String of Hex into ASCII in Java
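The character-by-character idea described above might look like the sketch below. The class and method names are hypothetical, and it assumes every escape is a backslash, the letter x and exactly two hex digits; anything else is copied through unchanged.

```java
public class HexEscapeDecoder {
    // Hypothetical helper: decodes each backslash-x-HH sequence into the
    // character with that code (0x00-0xFF), leaving all other text as it is.
    static String decodeHexEscapes(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean isEscape = c == '\\'
                    && i + 3 < input.length()
                    && input.charAt(i + 1) == 'x'
                    && Character.digit(input.charAt(i + 2), 16) >= 0
                    && Character.digit(input.charAt(i + 3), 16) >= 0;
            if (isEscape) {
                int hi = Character.digit(input.charAt(i + 2), 16);
                int lo = Character.digit(input.charAt(i + 3), 16);
                out.append((char) (hi * 16 + lo));
                i += 4; // skip the backslash, the x and both digits
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // What appears on screen depends on the console encoding
        System.out.println(decodeHexEscapes("Hello\\xF6\\xE4\\xFC\\xD6\\xC4\\xDC\\xDF"));
    }
}
```

Because no regex is involved, there is no replacement string that needs escaping; the cost is a little more index bookkeeping.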
You can use the regex [xX][0-9a-fA-F]+ to identify all the hex codes in your string, convert them to their corresponding characters using Integer.parseInt(matcher.group().substring(1), 16), and replace them in the string. Below is sample code for it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HexToCharacter {
public static void main(String[] args) {
String s = "HelloxF6xE4xFCxD6xC4xDCxDF";
StringBuilder sb = new StringBuilder(s);
Pattern pattern = Pattern.compile("[xX][0-9a-fA-F]+");
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
int indexOfHexCode = sb.indexOf(matcher.group());
sb.replace(indexOfHexCode, indexOfHexCode+matcher.group().length(), Character.toString((char)Integer.parseInt(matcher.group().substring(1), 16)));
}
System.out.println(sb.toString());
}
}
I have tested this regex pattern using your string. If you have other test cases in mind, you might need to change the regex accordingly.

Replacing \\u by \u in java string

I have a string which contains normal text and Unicode in between, for example "abc\ue415abc".
I want to replace all occurrences of \\u with \u. How can I achieve this?
I used the following code but it's not working properly.
String s = "aaa\\u2022bbb\\u2014ccc";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
while (m.find()) {
try {
int cp = Integer.parseInt(m.group(1), 16);
m.appendReplacement(buf, "");
buf.appendCodePoint(cp);
} catch (NumberFormatException e) {
}
}
m.appendTail(buf);
s = buf.toString();
Please help. Thanks in advance.
From the API reference: http://developer.android.com/reference/java/lang/String.html#replace(java.lang.CharSequence, java.lang.CharSequence)
You can use
public String replace (CharSequence target, CharSequence replacement)
Every backslash has to be doubled in Java source, and a lone "\u" does not compile, so the call is:
string = string.replace("\\\\u", "\\u");
or
String replacedString = string.replace("\\\\u", "\\u");
Your initial string doesn't, in fact, have any double backslashes.
String s = "aaa\\u2022bbb\\u2014ccc";
yields a string that contains aaa\u2022bbb\u2014ccc, as \\ is just java string-literal escaping for \.
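A quick way to check this on your own machine is to count the backslashes that actually end up in the string at runtime. This is only an illustrative sketch; the class and helper names are made up for the example.

```java
public class BackslashDemo {
    // Hypothetical helper: counts the backslash characters present at runtime.
    static int countBackslashes(String s) {
        int count = 0;
        for (char c : s.toCharArray()) {
            if (c == '\\') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two backslashes per escape in the source, one each at runtime
        String single = "aaa\\u2022bbb\\u2014ccc";
        // Four backslashes per escape in the source, two each at runtime
        String doubled = "aaa\\\\u2022bbb\\\\u2014ccc";
        System.out.println(countBackslashes(single));  // prints 2
        System.out.println(countBackslashes(doubled)); // prints 4
    }
}
```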
If you want unicode characters: (StackOverflow21028089.java)
import java.util.regex.*;
class StackOverflow21028089 {
public static void main(String[] args) {
String s = "aaa\\u2022bbb\\u2014ccc";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
while (m.find()) {
try {
// see example:
// http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#appendReplacement%28java.lang.StringBuffer,%20java.lang.String%29
int cp = Integer.parseInt(m.group(1), 16);
char[] chars = Character.toChars(cp);
String rep = new String(chars);
System.err.printf("Found %d which means '%s'\n", cp, rep);
m.appendReplacement(buf, rep);
} catch (NumberFormatException e) {
System.err.println("Confused: " + e);
}
}
m.appendTail(buf);
s = buf.toString();
System.out.println(s);
}
}
=>
Found 8226 which means '•'
Found 8212 which means '—'
aaa•bbb—ccc
If you want aaa\u2022bbb\u2014ccc, that's what you started with. If you meant to start with a string literal with aaa\\u2022bbb\\u2014ccc, that's this:
String s = "aaa\\\\u2022bbb\\\\u2014ccc";
and converting it to the one with single slashes can be as simple as @Overv's code:
s = s.replaceAll("\\\\u", "\\u");
though since backslash has a special meaning in regex patterns and replacements (see Matcher's docs) (in addition to java parsing), this should probably be:
s = s.replaceAll("\\\\\\\\u", "\\\\u");
=>
aaa\u2022bbb\u2014ccc
Try this:
s = s.replace("\\\\u", "\\u");
There is a contains method and a replace method in String. Note that replace returns a new string, so the result has to be assigned:
String hello = "hgjgu\\udfgyud\\\\ushddsjn\\hsdfds\\\\ubjn";
if (hello.contains("\\\\u"))
hello = hello.replace("\\\\u", "\\u");
System.out.println(hello);
It will print: hgjgu\udfgyud\ushddsjn\hsdfds\ubjn

Java Regex BBCode not replacing but

I am going to use the following method to replace special BB Codes for html links
public String replace(String text , String bbcode , String imageLocation ){
StringBuffer imageBuffer = new StringBuffer ("");
Pattern pattern = Pattern.compile("\\"+bbcode );
Matcher matcher = pattern.matcher(text);
StringBuilder builder = new StringBuilder();
int i = 0;
while (matcher.find()) {
//String orginal = replacements.get(matcher.group(1));
imageBuffer.append("<img src=\"" + imageLocation + "\" />");
String replacement = imageBuffer.toString();
builder.append(text.substring(i, matcher.start()));
if (replacement == null) {
builder.append(matcher.group(0));
} else {
builder.append(replacement);
}
i = matcher.end();
}
builder.append(text.substring(i, text.length()));
return builder.toString();
}
but when it comes to replacing the following bbcodes ,
:D
O:-)
:-[
:o)
:~(
:xx(
:-]
:-(
^3^
#_#
:O
:)
:P
;-)
???
?_?
Z_Z
The result is an error about an unclosed bracket, and ":" is not recognized.
How should I change the regex code so that the list of icons above is replaced with HTML image links?
I am currently using this string array, but
it produces the following error:
error: Error: No resource type specified (at '^index_6' with value '#_#').
<string-array name="hkgicon_array">
<item>[369]</item>
<item>#adore#</item>
<item>#yup#</item>
<item>#ass#</item>
<item>:-(</item>
<item>^3^</item>
<item>#_#</item>
</string-array>
USE QUOTE
You can use Pattern pattern = Pattern.compile(Pattern.quote(bbcode ));
in your code instead of Pattern.compile("\\"+bbcode );.
Try this code:
public static String replace(String text, String bbcode, String imageLocation) {
// Build the tag once; appending to a shared buffer inside the loop would
// make every later match emit one more copy of the tag
String replacement = "<img src=\"" + imageLocation + "\" />";
// Pattern.quote treats the BB code as a literal, so ( [ ? ^ need no escaping
Pattern pattern = Pattern.compile(Pattern.quote(bbcode));
Matcher matcher = pattern.matcher(text);
StringBuilder builder = new StringBuilder();
int i = 0;
while (matcher.find()) {
builder.append(text, i, matcher.start()); // text before the match
builder.append(replacement);
i = matcher.end();
}
builder.append(text.substring(i)); // remaining text after the last match
return builder.toString();
}
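Since the question lists many codes, it may be convenient to handle them all in one pass. The sketch below is one possible way to do that, not the answerer's method: it quotes each code with Pattern.quote, joins them into one alternation and looks up the image for whichever code matched. The class name, method name and image file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmileyReplacer {
    // Hypothetical helper: replaces every key of codeToImage found in text
    // with an <img> tag pointing at the mapped image location.
    static String replaceCodes(String text, Map<String, String> codeToImage) {
        if (codeToImage.isEmpty()) {
            return text;
        }
        List<String> codes = new ArrayList<>(codeToImage.keySet());
        // Longest first, so a long code wins over a shorter one it starts with
        codes.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder alternation = new StringBuilder();
        for (String code : codes) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(code)); // literal match, no regex meaning
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String img = "<img src=\"" + codeToImage.get(m.group()) + "\" />";
            // quoteReplacement protects '$' and '\' in the image path
            m.appendReplacement(out, Matcher.quoteReplacement(img));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> icons = new LinkedHashMap<>();
        icons.put(":-[", "angry.png"); // assumed file names, for illustration only
        icons.put(":)", "smile.png");
        System.out.println(replaceCodes("hi :-[ and :)", icons));
    }
}
```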

How to replace characters in a java String?

I would like to replace a certain set of characters in a string with corresponding replacement characters in an efficient way.
For example:
String sourceCharacters = "šđćčŠĐĆČžŽ";
String targetCharacters = "sdccSDCCzZ";
String result = replaceChars("Gračišće", sourceCharacters , targetCharacters );
Assert.equals(result,"Gracisce") == true;
Is there a more efficient way than using the replaceAll method of the String class?
My first idea was:
final String s = "Gračišće";
String sourceCharacters = "šđćčŠĐĆČžŽ";
String targetCharacters = "sdccSDCCzZ";
// preparation
final char[] sourceString = s.toCharArray();
final char result[] = new char[sourceString.length];
final char[] targetCharactersArray = targetCharacters.toCharArray();
// main work
for(int i=0,l=sourceString.length;i<l;++i)
{
final int pos = sourceCharacters.indexOf(sourceString[i]);
result[i] = pos!=-1 ? targetCharactersArray[pos] : sourceString[i];
}
// result
String resultString = new String(result);
Any ideas?
Btw, the UTF-8 characters are causing the trouble, with US_ASCII it works fine.
You can make use of java.text.Normalizer and a bit of regex to get rid of the diacritics, of which there are many more than you have collected so far.
Here's an SSCCE, copy'n'paste'n'run it on Java 6:
package com.stackoverflow.q2653739;
import java.text.Normalizer;
import java.text.Normalizer.Form;
public class Test {
public static void main(String... args) {
System.out.println(removeDiacriticalMarks("Gračišće"));
}
public static String removeDiacriticalMarks(String string) {
return Normalizer.normalize(string, Form.NFD)
.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
}
This should yield
Gracisce
At least, it does here at Eclipse with console character encoding set to UTF-8 (Window > Preferences > General > Workspace > Text File Encoding). Ensure that the same is set in your environment as well.
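One caveat that may be worth checking against your data: đ and Đ (U+0111 and U+0110) have no canonical decomposition, so NFD normalization leaves them as they are. A minimal sketch that combines the Normalizer approach with an explicit replacement for those two letters could look like this; the class and method names are hypothetical.

```java
import java.text.Normalizer;

public class AsciiFolding {
    // Hypothetical helper: strips combining marks after NFD normalization,
    // then maps the two stroked letters by hand because NFD does not split them.
    static String foldToAscii(String input) {
        String stripped = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return stripped
                .replace('\u0111', 'd')  // small d with stroke
                .replace('\u0110', 'D'); // capital D with stroke
    }

    public static void main(String[] args) {
        // The word from the question, written with escapes so the source
        // file encoding does not matter
        System.out.println(foldToAscii("Gra\u010di\u0161\u0107e")); // Gracisce
    }
}
```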
As an alternative, maintain a Map<Character, Character>:
Map<Character, Character> charReplacementMap = new HashMap<Character, Character>();
charReplacementMap.put('š', 's');
charReplacementMap.put('đ', 'd');
// Put more here.
String originalString = "Gračišće";
StringBuilder builder = new StringBuilder();
for (char currentChar : originalString.toCharArray()) {
Character replacementChar = charReplacementMap.get(currentChar);
builder.append(replacementChar != null ? replacementChar : currentChar);
}
String newString = builder.toString();
I'd use the replace method in a simple loop.
String sourceCharacters = "šđćčŠĐĆČžŽ";
String targetCharacters = "sdccSDCCzZ";
String s = "Gračišće";
for (int i = 0; i < sourceCharacters.length(); i++)
s = s.replace(sourceCharacters.charAt(i), targetCharacters.charAt(i));
System.out.println(s);

Regarding Java String Manipulation

After the split command, the string "MO""RET" is stored in items[1]. I then call replaceAll on this string, and it removes all the double quotes.
But I want it to be stored as MO"RET. How do I do it? In the CSV file that I process with the split command, double quotes within the contents of a text field are repeated (example: "This account is a ""large"" one"). So I want to keep one of the two quotes in the middle of the string when they are repeated, and drop the enclosing quotes if present. How can I do it?
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
items[1] has "MO""RET"
String recordType = items[1].replaceAll("\"","");
After this recordType has MORET I want it to have MO"RET
Don't use regex to split a CSV line. This is asking for trouble ;) Just parse it character-by-character. Here's an example:
public static List<List<String>> parseCsv(InputStream input, char separator) throws IOException {
BufferedReader reader = null;
List<List<String>> csv = new ArrayList<List<String>>();
try {
reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
for (String record; (record = reader.readLine()) != null;) {
boolean quoted = false;
StringBuilder fieldBuilder = new StringBuilder();
List<String> fields = new ArrayList<String>();
for (int i = 0; i < record.length(); i++) {
char c = record.charAt(i);
fieldBuilder.append(c);
if (c == '"') {
quoted = !quoted;
}
if ((!quoted && c == separator) || i + 1 == record.length()) {
fields.add(fieldBuilder.toString().replaceAll(separator + "$", "")
.replaceAll("^\"|\"$", "").replace("\"\"", "\"").trim());
fieldBuilder = new StringBuilder();
}
if (c == separator && i + 1 == record.length()) {
fields.add("");
}
}
csv.add(fields);
}
} finally {
if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
}
return csv;
}
Yes, there's a little regex involved, but it only trims off the trailing separator and the surrounding quotes of a single field.
You can however also grab any 3rd party Java CSV API.
How about:
String recordType = items[1].replaceAll( "\"\"", "\"" );
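Collapsing the doubled quotes alone would still leave the enclosing quotes in place. A small sketch that handles both steps for one field, assuming the field has already been split off, might look like this; the class and method names are hypothetical.

```java
public class CsvFieldUnquoter {
    // Hypothetical helper: removes one pair of enclosing quotes if present,
    // then turns each doubled quote inside the field into a single quote.
    static String unquote(String field) {
        String s = field.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        return s.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"MO\"\"RET\"")); // MO"RET
    }
}
```

It uses replace rather than replaceAll, so the quote characters are matched literally and no regex escaping is needed.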
I suggest using replace instead of replaceAll.
replaceAll treats its first argument as a regex.
The requirement is to replace two consecutive quotes with one quote:
String recordType = items[1].replace( "\"\"", "\"" );
To see the difference between replace and replaceAll, execute the code below:
recordType = items[1].replace( "$$", "$" );
recordType = items[1].replaceAll( "$$", "$" );
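The same difference can be seen with a character that is less awkward than "$". In the sketch below, which uses made-up class and method names, "." is a literal dot for replace but means "any character" for replaceAll. The last method shows one way to keep replaceAll literal by quoting both arguments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceVsReplaceAll {
    // replace: both arguments are plain text
    static String literalDot(String s) {
        return s.replace(".", "-");
    }

    // replaceAll: the first argument is a regex, so "." matches any character
    static String regexDot(String s) {
        return s.replaceAll(".", "-");
    }

    // replaceAll made literal by quoting the pattern and the replacement
    static String quotedDollar(String s) {
        return s.replaceAll(Pattern.quote("$$"), Matcher.quoteReplacement("$"));
    }

    public static void main(String[] args) {
        System.out.println(literalDot("a.b"));    // a-b
        System.out.println(regexDot("a.b"));      // ---
        System.out.println(quotedDollar("a$$b")); // a$b
    }
}
```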
Here you can use the regular expression.
recordType = items[1].replaceAll( "\\B\"", "" );
recordType = recordType.replaceAll( "\"\\B", "" );
The first statement replaces the quote at the beginning of the word with an empty string.
The second statement replaces the quote at the end of the word with an empty string.
