How to make control characters visible?


I have to display a string with its control characters (\n, \t, etc.) made visible.
I have tried quoting like here, and I have also tried something like this:
Pattern pattern = Pattern.compile("\\p{Cntrl}");
Matcher matcher = pattern.matcher(str);
String controlChar = matcher.group();
String replace = "\\" + controlChar;
result = result.replace(controlChar, replace);
but it did not work.

Alternative: Use visible characters instead of escape sequences.
To make control characters "visible", use the characters from the Unicode Control Pictures Block, i.e. map \u0000-\u001F to \u2400-\u241F, and \u007F to \u2421.
Note that this requires output to be Unicode, e.g. UTF-8, not a single-byte code page like ISO-8859-1.
private static String showControlChars(String input) {
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("[\u0000-\u001F\u007F]").matcher(input);
while (m.find()) {
char c = m.group().charAt(0);
m.appendReplacement(buf, Character.toString(c == '\u007F' ? '\u2421' : (char) (c + 0x2400)));
if (c == '\n') // Let's preserve newlines
buf.append(System.lineSeparator());
}
return m.appendTail(buf).toString();
}
Output using method above as input text:
␉private static String showControlChars(String input) {␍␊
␉␉StringBuffer buf = new StringBuffer();␍␊
␉␉Matcher m = Pattern.compile("[\u0000-\u001F\u007F]").matcher(input);␍␊
␉␉while (m.find()) {␍␊
␉␉␉char c = m.group().charAt(0);␍␊
␉␉␉m.appendReplacement(buf, Character.toString(c == '\u007F' ? '\u2421' : (char) (c + 0x2400)));␍␊
␉␉␉if (c == '\n')␍␊
␉␉␉␉buf.append(System.lineSeparator());␍␊
␉␉}␍␊
␉␉return m.appendTail(buf).toString();␍␊
␉}␍␊

Simply replace occurrences of '\n' with the escaped version (i.e. '\\n'), like this:
final String result = str.replace("\n", "\\n");
For example:
public static void main(final String args[]) {
final String str = "line1\nline2";
System.out.println(str);
final String result = str.replace("\n", "\\n");
System.out.println(result);
}
This yields the output:
line1
line2
line1\nline2
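The single replace above only handles '\n'. A minimal sketch that extends the same idea to every control character could look like the following. The class and method names are made up for illustration, and the choice to double literal backslashes and to print other control characters as four-digit hex escapes is an assumption, not something the answer above prescribes.

```java
class ControlCharEscaper {
    // Hypothetical helper: turns each control character into a visible escape sequence.
    static String escapeControlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\n': out.append("\\n"); break;
                case '\t': out.append("\\t"); break;
                case '\r': out.append("\\r"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                case '\\': out.append("\\\\"); break; // keeps the output unambiguous
                default:
                    if (Character.isISOControl(c)) {
                        // Any other control character becomes a four-digit hex escape
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeControlChars("line1\nline2\tend"));
    }
}
```

Walking the string once avoids having to order several replace calls correctly (the backslash replacement would otherwise have to run first).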

just doing
result = result.replace("\\", "\\\\");
will work!!

Related

Decode and replace hex values in a string in Java

I have a string in Java which contains hex values alongside normal characters. It looks something like this:
String s = "Hello\xF6\xE4\xFC\xD6\xC4\xDC\xDF"
What I want is to convert the hex values to the characters they represent, so it will look like this:
"HelloöäüÖÄÜß"
Is there a way to replace all hex values with the actual character they represent?
I can achieve what I want with this, but I need one line for every character and it does not cover unexpected characters:
indexRequest = indexRequest.replace("\\xF6", "ö");
indexRequest = indexRequest.replace("\\xE4", "ä");
indexRequest = indexRequest.replace("\\xFC", "ü");
indexRequest = indexRequest.replace("\\xD6", "Ö");
indexRequest = indexRequest.replace("\\xC4", "Ä");
indexRequest = indexRequest.replace("\\xDC", "Ü");
indexRequest = indexRequest.replace("\\xDF", "ß");

// Requires java.util.regex.Matcher and java.util.regex.Pattern
public static void main(String[] args) {
    String s = "Hello\\xF6\\xE4\\xFC\\xD6\\xC4\\xDC\\xDF\\xFF ";
    StringBuffer sb = new StringBuffer();
    // Match exactly two hex digits so that text following an escape is not swallowed
    Pattern p = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");
    Matcher m = p.matcher(s);
    while (m.find()) {
        int num = Integer.parseInt(m.group(1), 16); // parse the hex digits to int
        char decoded = (char) num; // cast int to char
        // quoteReplacement keeps a decoded '$' or backslash from being read as a group reference
        m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(decoded)));
    }
    m.appendTail(sb);
    System.out.println(sb.toString());
}

I would loop through every character to find the '\', then skip one char (the x) and pass the next two chars to a method.
Then just use the code by Michael Berry
here:
Convert a String of Hex into ASCII in Java
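The loop described above can be sketched roughly as follows. This is not Michael Berry's code; the class and method names are hypothetical, and it assumes every escape is a backslash, an x and exactly two hex digits whose value is used directly as the character code (which matches ISO-8859-1 for the examples in the question).

```java
class HexEscapeDecoder {
    // Hypothetical helper: decodes each backslash-x-HH sequence into the character with that code.
    static String decodeHexEscapes(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // A complete escape needs a backslash, an 'x' and two hex digits
            if (c == '\\' && i + 3 < input.length() && input.charAt(i + 1) == 'x'
                    && isHex(input.charAt(i + 2)) && isHex(input.charAt(i + 3))) {
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 4), 16));
                i += 4; // skip the whole escape
            } else {
                out.append(c); // not an escape, copy unchanged
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex(char c) {
        return Character.digit(c, 16) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(decodeHexEscapes("Hello\\xF6\\xE4\\xFC\\xD6\\xC4\\xDC\\xDF"));
    }
}
```

Incomplete or malformed escapes are copied through unchanged rather than raising an exception, which is a design choice you may want to reverse if bad input should be reported.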

You can use the regex [xX][0-9a-fA-F]+ to identify all the hex codes in your string, convert them to their corresponding characters using Integer.parseInt(matcher.group().substring(1), 16), and replace them in the string. Below is sample code for it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HexToCharacter {
public static void main(String[] args) {
String s = "HelloxF6xE4xFCxD6xC4xDCxDF";
StringBuilder sb = new StringBuilder(s);
Pattern pattern = Pattern.compile("[xX][0-9a-fA-F]+");
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
int indexOfHexCode = sb.indexOf(matcher.group());
sb.replace(indexOfHexCode, indexOfHexCode+matcher.group().length(), Character.toString((char)Integer.parseInt(matcher.group().substring(1), 16)));
}
System.out.println(sb.toString());
}
}
I have tested this regex pattern using your string. If you have other test cases in mind, you might need to change the regex accordingly.

Replacing \\u by \u in java string

I have a string which contains normal text and Unicode in between, for example "abc\ue415abc".
I want to replace all occurrences of \\u with \u. How can I achieve this?
I used the following code but it's not working properly.
String s = "aaa\\u2022bbb\\u2014ccc";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
while (m.find()) {
try {
int cp = Integer.parseInt(m.group(1), 16);
m.appendReplacement(buf, "");
buf.appendCodePoint(cp);
} catch (NumberFormatException e) {
}
}
m.appendTail(buf);
s = buf.toString();
Please help. Thanks in advance.

From the API reference: http://developer.android.com/reference/java/lang/String.html#replace(java.lang.CharSequence, java.lang.CharSequence)
You can use
public String replace (CharSequence target, CharSequence replacement)
Note that "\u" on its own is not a valid Java string literal, so every backslash has to be doubled in the source:
string = string.replace("\\\\u", "\\u");
or
String replacedString = string.replace("\\\\u", "\\u");

Your initial string doesn't, in fact, have any double backslashes.
String s = "aaa\\u2022bbb\\u2014ccc";
yields a string that contains aaa\u2022bbb\u2014ccc, as \\ is just java string-literal escaping for \.
If you want unicode characters: (StackOverflow21028089.java)
import java.util.regex.*;
class StackOverflow21028089 {
public static void main(String[] args) {
String s = "aaa\\u2022bbb\\u2014ccc";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
while (m.find()) {
try {
// see example:
// http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#appendReplacement%28java.lang.StringBuffer,%20java.lang.String%29
int cp = Integer.parseInt(m.group(1), 16);
char[] chars = Character.toChars(cp);
String rep = new String(chars);
System.err.printf("Found %d which means '%s'\n", cp, rep);
m.appendReplacement(buf, rep);
} catch (NumberFormatException e) {
System.err.println("Confused: " + e);
}
}
m.appendTail(buf);
s = buf.toString();
System.out.println(s);
}
}
=>
Found 8226 which means '•'
Found 8212 which means '—'
aaa•bbb—ccc
If you want aaa\u2022bbb\u2014ccc, that's what you started with. If you meant to start with a string literal with aaa\\u2022bbb\\u2014ccc, that's this:
String s = "aaa\\\\u2022bbb\\\\u2014ccc";
and converting it to the one with single slashes can be as simple as @Overv's code:
s = s.replaceAll("\\\\u", "\\u");
though since backslash has a special meaning in regex patterns and replacements (see Matcher's docs) (in addition to java parsing), this should probably be:
s = s.replaceAll("\\\\\\\\u", "\\\\u");
=>
aaa\u2022bbb\u2014ccc
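If no regex is needed at all, plain String.replace sidesteps the second layer of escaping, since it treats both arguments as literal text. A small sketch, with a made-up class and method name:

```java
class BackslashUReplace {
    // Hypothetical helper: turns each doubled backslash before 'u' into a single one.
    static String collapseDoubledBackslashU(String s) {
        // Four backslashes in the source are two backslash characters at run time;
        // replace() does no regex processing, so no further escaping is required.
        return s.replace("\\\\u", "\\u");
    }

    public static void main(String[] args) {
        String s = "aaa\\\\u2022bbb\\\\u2014ccc";
        System.out.println(s);                            // input, with doubled backslashes
        System.out.println(collapseDoubledBackslashU(s)); // same text with single backslashes
    }
}
```

Only Java's own string-literal escaping applies here, which is why this version needs half as many backslashes as the replaceAll call.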

Try this:
s = s.replace("\\\\u", "\\u");

String has a contains method and a replace method. That being said:
String hello = "hgjgu\\udfgyud\\\\ushddsjn\\hsdfds\\\\ubjn";
if (hello.contains("\\\\u"))
    hello = hello.replace("\\\\u", "\\u"); // strings are immutable, so keep the returned value
System.out.println(hello);
It will print: hgjgu\udfgyud\ushddsjn\hsdfds\ubjn

Java Regex BBCode not replacing but

I am going to use the following method to replace special BB Codes for html links
public String replace(String text , String bbcode , String imageLocation ){
StringBuffer imageBuffer = new StringBuffer ("");
Pattern pattern = Pattern.compile("\\"+bbcode );
Matcher matcher = pattern.matcher(text);
StringBuilder builder = new StringBuilder();
int i = 0;
while (matcher.find()) {
//String orginal = replacements.get(matcher.group(1));
imageBuffer.append("<img src=\"" + imageLocation + "\" />");
String replacement = imageBuffer.toString();
builder.append(text.substring(i, matcher.start()));
if (replacement == null) {
builder.append(matcher.group(0));
} else {
builder.append(replacement);
}
i = matcher.end();
}
builder.append(text.substring(i, text.length()));
return builder.toString();
}
But when it comes to replacing the following BB codes,
:D
O:-)
:-[
:o)
:~(
:xx(
:-]
:-(
^3^
#_#
:O
:)
:P
;-)
???
?_?
Z_Z
It turns out that the brackets are reported as not closed and the : is not recognized.
How should I change the regex code so that the icons listed above are replaced with HTML image links?
I am currently using this string array, but it produces the following error:
error: Error: No resource type specified (at '^index_6' with value '#_#').
<string-array name="hkgicon_array">
<item>[369]</item>
<item>#adore#</item>
<item>#yup#</item>
<item>#ass#</item>
<item>:-(</item>
<item>^3^</item>
<item>#_#</item>
</string-array>

USE QUOTE
You can use Pattern pattern = Pattern.compile(Pattern.quote(bbcode ));
in your code instead of Pattern.compile("\\"+bbcode );.
Try this code:
public static String replace(String text, String bbcode, String imageLocation) {
    // Pattern.quote treats bbcode as literal text, so codes like :-[ or ^3^ need no manual escaping
    Pattern pattern = Pattern.compile(Pattern.quote(bbcode));
    Matcher matcher = pattern.matcher(text);
    // Build the tag once; appending to a shared buffer inside the loop would repeat it on every match
    String replacement = "<img src=\"" + imageLocation + "\" />";
    StringBuilder builder = new StringBuilder();
    int i = 0;
    while (matcher.find()) {
        builder.append(text.substring(i, matcher.start())); // text before the match
        builder.append(replacement);
        i = matcher.end();
    }
    builder.append(text.substring(i)); // remaining text after the last match
    return builder.toString();
}
Refer to Stack Overflow for more details.
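To handle the whole list of codes in one pass, the same Pattern.quote idea can be applied to each code and the results joined into one alternation. The following is only a sketch: the class name, the method name and the image paths are hypothetical, and it assumes the map is ordered so that longer codes come before shorter ones that share a prefix.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmoticonReplacer {
    // Hypothetical helper: replaces every code found in the map with an <img> tag.
    static String replaceIcons(String text, Map<String, String> icons) {
        if (icons.isEmpty()) {
            return text; // an empty alternation would match everywhere
        }
        StringBuilder alternation = new StringBuilder();
        for (String code : icons.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(code)); // each code is matched literally
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = "<img src=\"" + icons.get(m.group()) + "\" />";
            // quoteReplacement protects any '$' or backslash in the image path
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> icons = new LinkedHashMap<>();
        icons.put(":-(", "sad.png");   // placeholder paths
        icons.put("^3^", "kiss.png");
        icons.put(":)", "smile.png");
        System.out.println(replaceIcons("hello :) bye ^3^", icons));
    }
}
```

Scanning once also means that text inserted by an earlier replacement is never scanned again for further codes.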

How to replace characters in a java String?

I would like to replace a certain set of characters in a string with corresponding replacement characters in an efficient way.
For example:
String sourceCharacters = "šđćčŠĐĆČžŽ";
String targetCharacters = "sdccSDCCzZ";
String result = replaceChars("Gračišće", sourceCharacters , targetCharacters );
Assert.equals(result,"Gracisce") == true;
Is there a more efficient way than using the replaceAll method of the String class?
My first idea was:
final String s = "Gračišće";
String sourceCharacters = "šđćčŠĐĆČžŽ";
String targetCharacters = "sdccSDCCzZ";
// preparation
final char[] sourceString = s.toCharArray();
final char result[] = new char[sourceString.length];
final char[] targetCharactersArray = targetCharacters.toCharArray();
// main work
for(int i=0,l=sourceString.length;i<l;++i)
{
final int pos = sourceCharacters.indexOf(sourceString[i]);
result[i] = pos!=-1 ? targetCharactersArray[pos] : sourceString[i];
}
// result
String resultString = new String(result);
Any ideas?
Btw, the UTF-8 characters are causing the trouble, with US_ASCII it works fine.

You can make use of java.text.Normalizer and a shot of regex to get rid of the diacritics, of which there are many more than you have collected so far.
Here's an SSCCE, copy'n'paste'n'run it on Java 6:
package com.stackoverflow.q2653739;
import java.text.Normalizer;
import java.text.Normalizer.Form;
public class Test {
public static void main(String... args) {
System.out.println(removeDiacriticalMarks("Gračišće"));
}
public static String removeDiacriticalMarks(String string) {
return Normalizer.normalize(string, Form.NFD)
.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
}
This should yield
Gracisce
At least, it does here at Eclipse with console character encoding set to UTF-8 (Window > Preferences > General > Workspace > Text File Encoding). Ensure that the same is set in your environment as well.
As an alternative, maintain a Map<Character, Character>:
Map<Character, Character> charReplacementMap = new HashMap<Character, Character>();
charReplacementMap.put('š', 's');
charReplacementMap.put('đ', 'd');
// Put more here.
String originalString = "Gračišće";
StringBuilder builder = new StringBuilder();
for (char currentChar : originalString.toCharArray()) {
Character replacementChar = charReplacementMap.get(currentChar);
builder.append(replacementChar != null ? replacementChar : currentChar);
}
String newString = builder.toString();
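The two approaches can also be combined. As far as I know, đ and Đ have no canonical decomposition, so NFD normalization leaves them untouched and they need an explicit mapping. The sketch below uses hypothetical names and drops every non-spacing mark, which is slightly broader than the InCombiningDiacriticalMarks block used above. The non-ASCII characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

class AsciiFolder {
    // Characters that NFD does not split into base letter plus mark (assumed list, extend as needed)
    private static final Map<Character, Character> EXTRA = new HashMap<>();
    static {
        EXTRA.put('\u0111', 'd'); // small d with stroke
        EXTRA.put('\u0110', 'D'); // capital D with stroke
    }

    // Hypothetical helper: strips diacritics, then applies the explicit map.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue; // drop the combining mark, keep the base letter
            }
            Character mapped = EXTRA.get(c);
            out.append(mapped != null ? mapped : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The word from the question, spelled with escapes
        System.out.println(fold("Gra\u010Di\u0161\u0107e"));
    }
}
```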

I'd use the replace method in a simple loop.
String sourceCharacters = "šđćčŠĐĆČžŽ";
String targetCharacters = "sdccSDCCzZ";
String s = "Gračišće";
for (int i=0 ; i<sourceCharacters.length() ; i++)
    s = s.replace(sourceCharacters.charAt(i), targetCharacters.charAt(i));
System.out.println(s);

Regarding Java String Manipulation

The string "MO""RET" gets stored in items[1] after the split command. After it gets stored, I do a replaceAll on this string and it removes all the double quotes.
But I want it to be stored as MO"RET. How do I do that? In the CSV file that I process with the split command, double quotes within the contents of a text field are repeated (example: "This account is a ""large"" one"). So I want to retain one of the two quotes in the middle of the string when it is repeated, and drop the enclosing quotes if present. How can I do it?
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
items[1] has "MO""RET"
String recordType = items[1].replaceAll("\"","");
After this, recordType is MORET; I want it to be MO"RET.

Don't use regex to split a CSV line. This is asking for trouble ;) Just parse it character-by-character. Here's an example:
public static List<List<String>> parseCsv(InputStream input, char separator) throws IOException {
BufferedReader reader = null;
List<List<String>> csv = new ArrayList<List<String>>();
try {
reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
for (String record; (record = reader.readLine()) != null;) {
boolean quoted = false;
StringBuilder fieldBuilder = new StringBuilder();
List<String> fields = new ArrayList<String>();
for (int i = 0; i < record.length(); i++) {
char c = record.charAt(i);
fieldBuilder.append(c);
if (c == '"') {
quoted = !quoted;
}
if ((!quoted && c == separator) || i + 1 == record.length()) {
fields.add(fieldBuilder.toString().replaceAll(separator + "$", "")
.replaceAll("^\"|\"$", "").replace("\"\"", "\"").trim());
fieldBuilder = new StringBuilder();
}
if (c == separator && i + 1 == record.length()) {
fields.add("");
}
}
csv.add(fields);
}
} finally {
if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
}
return csv;
}
Yes, there's a little regex involved, but it only trims off the trailing separator and the surrounding quotes of a single field.
You can however also grab any 3rd party Java CSV API.

How about:
String recordType = items[1].replaceAll( "\"\"", "\"" );

I'd suggest using replace instead of replaceAll.
replaceAll treats its first argument as a regex.
The requirement is to replace two consecutive quotes with one quote:
String recordType = items[1].replace( "\"\"", "\"" );
To see the difference between replace and replaceAll, execute the code below:
recordType = items[1].replace( "$$", "$" );
recordType = items[1].replaceAll( "$$", "$" );
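A less surprising way to see the same difference is a target containing a dot, which is a literal for replace but a wildcard for replaceAll. The class and method names below are made up for the example.

```java
class ReplaceVsReplaceAll {
    // Hypothetical helper: collapses each doubled quote into one, using a literal match.
    static String collapseQuotes(String s) {
        return s.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(collapseQuotes("MO\"\"RET"));  // prints MO"RET
        System.out.println("a.b".replace(".", "-"));      // literal dot: prints a-b
        System.out.println("a.b".replaceAll(".", "-"));   // regex dot matches every char: prints ---
    }
}
```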

Here you can use the regular expression.
recordType = items[1].replaceAll( "\\B\"", "" );
recordType = recordType.replaceAll( "\"\\B", "" );
The first statement replaces a quote at the beginning of a word with the empty string.
The second statement replaces a quote at the end of a word with the empty string.
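The same result can be had without any regex by handling the two cases explicitly. This is a sketch with a hypothetical helper name; it assumes a field is either fully enclosed in one pair of quotes or not quoted at all.

```java
class CsvFieldUnquoter {
    // Hypothetical helper: strips one pair of enclosing quotes, then collapses doubled quotes.
    static String unquote(String field) {
        String s = field;
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1); // drop the enclosing quotes
        }
        return s.replace("\"\"", "\""); // each doubled quote becomes a single quote
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"MO\"\"RET\"")); // prints MO"RET
    }
}
```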
