Is there a Java library to convert special characters into their decimal equivalents?
example:
input: "©™®"
output: "& #169; & #8482; & #174;"(space after & is only for question purpose, if typed without a space decimal equivalent is converted to special character)
Thank you !
This can be achieved with String.format(). Each representation is simply the character's value in decimal, zero-padded to 4 digits and wrapped in &# and ; (so © becomes &#0169;; use %d instead of %04d if you do not want the padding).
The only tricky part is deciding which characters are "special". Here I've assumed not digit, not whitespace and not alpha...
StringBuilder output = new StringBuilder();
String input = "Foo bar ©™® baz";
for (char each : input.toCharArray()) {
if (Character.isAlphabetic(each) || Character.isDigit(each) || Character.isWhitespace(each)) {
output.append(each);
} else {
output.append(String.format("&#%04d;", (int) each));
}
}
System.out.println(output.toString());
You just need to fetch the integer value of the character as mentioned in How do I get the decimal value of a unicode character in Java?.
As per the Oracle Java documentation:
char: The char data type is a single 16-bit Unicode character. It has
a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or
65,535 inclusive).
Assuming your characters fall within the character range, you can just get the decimal equivalent of each character from your string.
String text = "©™®";
char[] cArr = text.toCharArray();
for (char c : cArr)
{
int value = c; // get the decimal equivalent of the character
String result = "& #" + value; // append to some format string
System.out.println(result);
}
Output:
& #169
& #8482
& #174
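If some characters may lie outside the 16-bit char range (emoji, for instance), iterating over code points instead of chars avoids splitting surrogate pairs. The following is only a sketch, assuming that "special" means any code point above 127; the class name EntityEncoder and the method toDecimalEntities are made up for illustration.

```java
public class EntityEncoder {
    // Hypothetical helper: replaces every non-ASCII code point with a decimal
    // numeric character reference such as &#169;
    static String toDecimalEntities(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);                 // plain ASCII is kept as-is
            } else {
                out.append("&#").append(cp).append(';'); // decimal value of the code point
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00A9 = copyright sign, \u2122 = trade mark sign, \u00AE = registered sign
        System.out.println(toDecimalEntities("Foo \u00A9\u2122\u00AE baz"));
        // prints: Foo &#169;&#8482;&#174; baz
    }
}
```

Unicode escapes are used in the string literal so that the result does not depend on the source file encoding.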
I have a String containing ASCII representation of a character i.e.
String test = "0x07";
Is there a way I can somehow parse it to its character value.
I want something like
char c = 0x07;
But what the character exactly is, will be known only by reading the value in the string.
You have to add one step:
String test = "0x07";
int decimal = Integer.decode(test);
char c = (char) decimal;
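Integer.decode accepts decimal input, hexadecimal input prefixed with 0x, 0X or #, and octal input with a leading 0. Since the cast to char silently truncates larger values, a range check may be worth adding. A minimal sketch; the class CharFromNumber and the method parseChar are hypothetical names:

```java
public class CharFromNumber {
    // Hypothetical helper: parses a numeric string and narrows it to a char,
    // rejecting values outside the 16-bit char range.
    static char parseChar(String text) {
        int value = Integer.decode(text); // throws NumberFormatException on malformed input
        if (value < Character.MIN_VALUE || value > Character.MAX_VALUE) {
            throw new IllegalArgumentException("Not a valid char value: " + text);
        }
        return (char) value;
    }

    public static void main(String[] args) {
        System.out.println((int) parseChar("0x07")); // 7
        System.out.println(parseChar("0x41"));       // A
        System.out.println(parseChar("#41"));        // A
        System.out.println(parseChar("65"));         // A
    }
}
```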
I want to convert only the special characters to their UTF-8 equivalent character.
For example given a String: Abcds23#$_ss, it should get converted to Abcds23353695ss.
The following is how I did the above conversion:
The utf-8 in hexadecimal for # is 23 and in decimal is 35. The utf-8 in hexadecimal for $ is 24 and in decimal is 36. The utf-8 in hexadecimal for _ is 5f and in decimal is 95.
I know we have the String.replaceAll(String regex, String replacement) method. But I want to replace specific character with their specific UTF-8 equivalent.
How do I do the same in java?
I don't know how you define "special characters", but this function should give you an idea:
public static String convert(String str)
{
StringBuilder buf = new StringBuilder();
for (int index = 0; index < str.length(); index++)
{
char ch = str.charAt(index);
if (Character.isLetterOrDigit(ch))
buf.append(ch);
else
buf.append(str.codePointAt(index));
}
return buf.toString();
}
@Test
public void test()
{
Assert.assertEquals("Abcds23353695ss", convert("Abcds23#$_ss"));
}
The following requires Java 8 or later. It keeps each Unicode code point (symbol) that is a pure-ASCII (< 128) letter or digit, and otherwise outputs the code point's numeric value as a string.
static String convert(String str) {
int[] cps = str.codePoints()
.flatMap((cp) ->
Character.isLetterOrDigit(cp) && cp < 128
? IntStream.of(cp)
: String.valueOf(cp).codePoints())
.toArray();
return new String(cps, 0, cps.length);
}
String.codePoints() yields an IntStream, flatMap adds IntStreams in a single flattened stream, and toArray collects it in an array. So we can construct a new String from those code points. Entirely Unicode safe.
Conversion is not undoable without delimiters.
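To see why the conversion cannot be reversed, note that two different inputs can produce the same output. The sketch below applies the same rule (keep ASCII letters and digits, replace everything else with the decimal code point) in a small stand-alone class; AmbiguityDemo is a made-up name.

```java
public class AmbiguityDemo {
    // Keeps ASCII letters and digits, replaces every other code point
    // with its decimal value.
    static String convert(String str) {
        StringBuilder sb = new StringBuilder();
        str.codePoints().forEach(cp -> {
            if (cp < 128 && Character.isLetterOrDigit(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("#"));  // 35
        System.out.println(convert("35")); // 35, indistinguishable from the line above
    }
}
```

Wrapping each converted value in delimiters that cannot occur in the kept text would remove the ambiguity.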
On Unicode:
Unicode numbers symbols, called code points, from 0 upwards, into the 3 byte range.
To be encoded (formatted) as bytes there exist UTF-8 (1 to 4 bytes per code point), UTF-16LE and UTF-16BE (2-byte units, one or two per code point) and UTF-32 (code points as-is, more or less).
Java string constants in a .class file are stored in a modified form of UTF-8. A String is composed of UTF-16 chars. And String can give code points as above. So Java by design uses Unicode for text.
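The difference between code points, chars and encoded bytes can be checked directly. A small sketch using only the standard library (the class name EncodingSizes is arbitrary):

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    public static void main(String[] args) {
        String euro = "\u20AC";        // EURO SIGN, U+20AC
        String emoji = "\uD83D\uDE00"; // U+1F600, outside the 16-bit range

        System.out.println(euro.getBytes(StandardCharsets.UTF_8).length);     // 3
        System.out.println(euro.getBytes(StandardCharsets.UTF_16BE).length);  // 2
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);    // 4
        System.out.println(emoji.getBytes(StandardCharsets.UTF_16BE).length); // 4
        System.out.println(emoji.length());                                   // 2 chars (a surrogate pair)
        System.out.println(emoji.codePointCount(0, emoji.length()));          // 1 code point
    }
}
```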
I want to know how to recognize and print the next character in the ASCII sequence if the input is a non-alphabetic value such as a space or "!".
I know that for a letter we can get its ASCII value by using
char character = 'a';
int ascii = (int) character;
Then, by adding 1 to it and converting it back to char, we can get the next value in the sequence.
You can use:
char character = 'a';
char next = (char) (character + 1); // character + 1 is an int, so cast it back to char
It should work, but I haven't tested it.
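The same arithmetic works for any char, not only letters, because a space or "!" is just another code. A minimal sketch (NextChar and next are illustrative names):

```java
public class NextChar {
    // Returns the character whose code is one higher than c's.
    // Note: the cast wraps around, so '\uffff' maps to '\u0000'.
    static char next(char c) {
        return (char) (c + 1); // c + 1 is an int, so cast back to char
    }

    public static void main(String[] args) {
        System.out.println(next('a')); // b
        System.out.println(next(' ')); // !
        System.out.println(next('!')); // "
    }
}
```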
I want to represent an empty character in Java as "" in String...
Like that char ch = an empty character;
Actually I want to replace a character without leaving space.
I think it might be sufficient to understand what this means: no character not even space.
You may assign '\u0000' (or 0).
For this purpose, use Character.MIN_VALUE.
Character ch = Character.MIN_VALUE;
char means exactly one character. You can't assign zero characters to this type.
That means that there is no char value for which String.replace(char, char) would return a string with a different length.
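A quick check of this point: the char overload of replace always swaps one char for another, while the CharSequence overload can shrink the string. The class name ReplaceLength is arbitrary.

```java
public class ReplaceLength {
    public static void main(String[] args) {
        String s = "abcdefg";

        String swapped = s.replace('d', '\0'); // char overload: one char in, one char out
        String removed = s.replace("d", "");   // CharSequence overload: can shorten the string

        System.out.println(swapped.length()); // 7
        System.out.println(removed.length()); // 6
        System.out.println(removed);          // abcefg
    }
}
```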
As Character is a class deriving from Object, you can assign null as "instance":
Character myChar = null;
Problem solved ;)
An empty String is a wrapper on a char[] with no elements. You can have an empty char[]. But you cannot have an "empty" char. Like other primitives, a char has to have a value.
You say you want to "replace a character without leaving a space".
If you are dealing with a char[], then you would create a new char[] with that element removed.
If you are dealing with a String, then you would create a new String (String is immutable) with the character removed.
Here are some samples of how you could remove a char:
public static void main(String[] args) throws Exception {
String s = "abcdefg";
int index = s.indexOf('d');
// delete a char from a char[]
char[] array = s.toCharArray();
char[] tmp = new char[array.length-1];
System.arraycopy(array, 0, tmp, 0, index);
System.arraycopy(array, index+1, tmp, index, tmp.length-index);
System.err.println(new String(tmp));
// delete a char from a String using replace
String s1 = s.replace("d", "");
System.err.println(s1);
// delete a char from a String using StringBuilder
StringBuilder sb = new StringBuilder(s);
sb.deleteCharAt(index);
s1 = sb.toString();
System.err.println(s1);
}
As chars can be represented as Integers (ASCII-Codes), you can simply write:
char c = 0;
Code 0 in ASCII is the NUL character.
If you want to replace a character in a String without leaving any empty space, you can achieve this by using StringBuilder. String is an immutable object in Java; you cannot modify it.
String str = "Hello";
StringBuilder sb = new StringBuilder(str);
sb.deleteCharAt(1); // removes the 'e' at index 1, leaving "Hllo"
I was looking for this. Simply set the char c = 0; and it works perfectly. Try it.
For example, if you are trying to remove duplicate characters from a String, one way would be to convert the string to a char array and store the chars in a HashSet of Characters, which automatically prevents duplicates.
Another way is to convert the string to a char array, use two for-loops to compare each character with the rest of the array (an O(N^2) operation), and set each duplicate found to 0.
Then use new String(char[]) to convert the resulting char array back to a string and print it with System.out.println. On many consoles the chars set to zero are not displayed, so the duplicates appear to be gone; note, however, that the NUL chars are still part of the string and its length is unchanged.
So yes, set char c = 0; or, for a char array, set cArray[i] = 0 for that specific duplicate character and it will no longer show up in the output.
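A sketch of the set-based variant, assuming the first occurrence of each character should be kept in its original order (hence LinkedHashSet rather than HashSet). It also shows that a char set to 0 still counts towards the string's length, even if it is not visible when printed. The class and method names are made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class Dedup {
    // Keeps the first occurrence of each char, in insertion order.
    static String removeDuplicates(String input) {
        Set<Character> seen = new LinkedHashSet<>();
        for (char c : input.toCharArray()) {
            seen.add(c); // add() ignores chars that are already present
        }
        StringBuilder sb = new StringBuilder();
        for (char c : seen) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("banana")); // ban

        // Zeroing a char does not shorten the string: the NUL char is still there.
        char[] marked = {'a', 0, 'b'};
        System.out.println(new String(marked).length()); // 3
    }
}
```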
You can't. "" is the literal for a string, which contains no characters. It does not contain the "empty character" (whatever you mean by that).
In Java there is no such thing as an empty character literal; in other words, '' has no meaning, unlike "", which is an empty String literal.
The closest you can get to representing an empty character literal is a zero-length char[], something like:
char[] cArr = {}; // cArr is a zero length array
char[] cArr = new char[0]; // this does the same
If you look at the String class, its default constructor creates an empty character sequence using new char[0].
Also, using Character.MIN_VALUE is not correct, because it is not really an empty character but rather the smallest value of type char.
I also don't like Character c = null; as a solution, mainly because the JVM will throw an NPE if it tries to unbox it. Secondly, null is basically a reference to nothing for a reference type, and here we are dealing with a primitive type, which does not accept null as a possible value.
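The unboxing problem is easy to reproduce. A minimal sketch (the class name UnboxNull is arbitrary):

```java
public class UnboxNull {
    public static void main(String[] args) {
        Character boxed = null;
        try {
            char c = boxed; // auto-unboxing calls boxed.charValue() on a null reference
            System.out.println(c);
        } catch (NullPointerException e) {
            System.out.println("Unboxing null threw NullPointerException");
        }
    }
}
```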
Assuming that in the string, say str, OP wants to replace all occurrences of a character, say 'x', with empty character '', then try using:
str.replace("x", "");
char ch = Character.MIN_VALUE;
The code above will initialize the variable ch with the minimum value that a char can have (i.e. \u0000).
This is how I do it:
char[] myEmptyCharArray = "".toCharArray();
You can do something like this:
mystring.replace(""+ch, "");
String before = EMPTY_SPACE + TAB + "word" + TAB + EMPTY_SPACE;
where
EMPTY_SPACE = " " (this is a String)
TAB = '\t' (this is a char)
String after = before.replaceAll(" ", "").replace("\t", ""); // replace('\t', '\0') would leave NUL chars in the string
gives
after = "word"
You can only re-use an existing character, e.g. \0. If you put this in a String, you will have a String with one character in it.
Say you want a char such that when you do
String s =
char ch = ?
String s2 = s + ch; // there is no char which does this.
assert s.equals(s2);
what you have to do instead is
String s =
char ch = MY_NULL_CHAR;
String s2 = ch == MY_NULL_CHAR ? s : s + ch;
assert s.equals(s2);
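A runnable sketch of this sentinel idea, assuming '\0' never occurs as real data and can therefore stand for "no character". MY_NULL_CHAR follows the snippet above; the class SentinelAppend and the method append are hypothetical.

```java
public class SentinelAppend {
    // Assumption: '\0' is never valid data, so it can mean "no character".
    static final char MY_NULL_CHAR = '\0';

    // Appends ch to s unless ch is the sentinel.
    static String append(String s, char ch) {
        return ch == MY_NULL_CHAR ? s : s + ch;
    }

    public static void main(String[] args) {
        System.out.println(append("abc", MY_NULL_CHAR)); // abc
        System.out.println(append("abc", 'd'));          // abcd
    }
}
```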
Use the \b operator (the backspace escape operator) in the second parameter
String test= "Anna Banana";
System.out.println(test); // prints Anna Banana
System.out.println(test.replaceAll(" ","\b")); //returns AnnaBanana removing all the spaces in the string
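Note that this inserts a backspace character rather than removing the space, so the string keeps its length, and how it is displayed depends on the console. The following check compares it with replacing by an empty string (the class name BackspaceCheck is arbitrary):

```java
public class BackspaceCheck {
    public static void main(String[] args) {
        String test = "Anna Banana";
        String withBackspace = test.replaceAll(" ", "\b"); // space becomes U+0008
        String withoutSpace = test.replace(" ", "");       // space is really removed

        System.out.println(withBackspace.length());      // 11
        System.out.println(withBackspace.indexOf('\b')); // 4
        System.out.println(withoutSpace);                // AnnaBanana
        System.out.println(withoutSpace.length());       // 10
    }
}
```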