How do I convert a unicode-string to a readable string - java

I have this string "\U05d0\U05d5\U05d2\U05e0\U05d3\U05d4","\U05d0\U05d5\U05d6\U05d1\U05e7\U05d9\U05e1\U05d8\U05df","\U05d0\U05d5\U05e1\U05d8\U05e8\U05d9\U05d4"
How do I convert it to a readable string? (Note: this is supposed to be Hebrew.)
I tried this method, but it didn't work:
byte[] bytes = s.getBytes();
String decoded = new String(bytes);
System.out.println(decoded);

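All U should be lowercase u, because Java only treats \u followed by four hex digits as a Unicode escape:
String s = "\u05d0\u05d5\u05d2\u05e0\u05d3\u05d4";
try {
// Name the charset in both directions; only these overloads declare UnsupportedEncodingException.
byte[] bytes = s.getBytes("UTF-8");
String decoded = new String(bytes, "UTF-8");
System.out.println(decoded);
} catch (UnsupportedEncodingException e) {
// ...
}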
See Byte Encodings and Strings.
Output:
אוגנדה
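If the backslash-U sequences arrive as data (for example read from a file or a server response) rather than being typed into a Java literal, editing the source will not help, and the text has to be parsed at run time. Here is a minimal sketch of that idea, assuming every escape is a backslash, a u or U, and exactly four hex digits. The class name UnicodeEscapes and the method decode are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeEscapes {
    // Regex for: one literal backslash, 'u' or 'U', then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\[uU]([0-9a-fA-F]{4})");

    // Replaces each escape sequence found in the input with the character it names.
    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' in the replacement from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes make this a string that contains real backslash characters.
        String raw = "\\U05d0\\U05d5\\U05d2\\U05e0\\U05d3\\U05d4";
        System.out.println(decode(raw));
    }
}
```

Whether the Hebrew text then shows up correctly still depends on the console's encoding and font.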

Related

Java: Converting byte string to byte array

I want to convert my byte[] to a String, and then convert that String to a byte[].
So,
byte[] b = myFunction();
String bstring = b.toString();
/* Here goes the method that converts bstring back to a byte[]; call the result ser */
String deser = new String(ser);
bstring gives me [B@74e752bb.
And then I want to convert the String back to a byte[]. I'm not using it in exactly this order, but this is an example.
How do I do this in Java?
When converting a byte[] to a String, use this:
new String(b, "UTF-8");
instead of:
b.toString();
Calling toString() on an array returns only its type and identity hash (such as [B@74e752bb), not its contents.
When you convert a byte array to a String, always specify a character encoding, and use the same encoding when converting the String back to a byte array. UTF-8 is usually the best choice: it is compact and can represent every Unicode character (over a million code points). If you don't specify an encoding, the platform's default encoding is used, and it may not be able to represent all characters properly.
Written that way, your method would look something like this:
public static void main(String args[]) throws Exception {
byte[] b = myFunction();
// String bstring = b.toString(); // don't do this
String bstring = new String(b, "UTF-8");
byte[] ser = bstring.getBytes("UTF-8");
/* ser now holds bstring converted back to a byte[] */
String deser = new String(ser, "UTF-8");
}
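To illustrate the point about using the same encoding in both directions, here is a small sketch that uses the StandardCharsets constants (Java 7+), which avoid the checked UnsupportedEncodingException. The class name RoundTrip and its helper methods are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTrip {
    // Decode bytes as UTF-8 text.
    static String toText(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode text as UTF-8 bytes.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = toBytes("h\u00e9llo"); // 'e' with acute accent takes two bytes in UTF-8
        String text = toText(original);
        byte[] back = toBytes(text);
        System.out.println(Arrays.equals(original, back)); // true: same charset both ways

        // Decoding the same bytes with a different charset yields different text.
        String wrong = new String(original, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(text)); // false
    }
}
```

This round trip is only guaranteed when the bytes are valid UTF-8 text. For arbitrary binary data, invalid sequences are replaced with U+FFFD during decoding, so the original bytes cannot be recovered.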
I am no expert, but you could try the methods provided by the Byte class, with loops if necessary. Note that they work on one numeric value at a time: byte b = Byte.parseByte(String s) parses a decimal string such as "12" into a byte, and String s = Byte.toString(byte b) turns a byte into its decimal string. Hope this helps :).
You can do it like this,
String string = "Your String";
byte[] bytesFromString = string.getBytes(); // get bytes from a String
String StringFromByteArray = new String(bytesFromString); // get the String from a byte array

Convert String to byte[] array

I want to convert a String to a byte array but the array must have 256 positions, I mean, like this:
public byte[] temporal1 = new byte[256];
public byte[] temporal2 = new byte[256];
So, when I do:
String send = "SEND_MESSAGE";
String sendAck = "SEND_MESSAGE_ACK";
temporal1 = send.getBytes();
temporal2 = sendAck.getBytes();
I get this error: "./th.java:24: error: <identifier> expected". I know that if I do public byte[] temporal1 = send.getBytes(); it works, but I need the array to have that size so I can compare it with another byte array, byte by byte.
Can you please show the exact exception or error that appears in the console? It works completely fine for me:
byte b1[] = new byte[256];
String s = "hello there";
b1 = s.getBytes();
System.out.println(b1);
To have the byte array temporal1 padded up to 256 bytes, you might do:
public byte[] temporal1 = new byte[256];
String send = "SEND_MESSAGE";
byte[] sendB = send.getBytes(StandardCharsets.UTF_8);
// Copy at most 256 bytes; the rest of temporal1 stays 0.
System.arraycopy(sendB, 0, temporal1, 0, Math.min(256, sendB.length));
If you want a C-like terminating 0 byte, copy at most 255 bytes from sendB: Math.min(255, sendB.length).
Better:
String send = "SEND_MESSAGE";
byte[] sendB = send.getBytes(StandardCharsets.UTF_8);
byte[] temporal1 = Arrays.copyOf(sendB, 256); // Pads or truncates.
temporal1[255] = (byte) 0; // Maybe
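To get a byte[] of a defined size from a String:
public static byte[] toBytes(String data, int length) {
byte[] bytes = data.getBytes();
byte[] result = new byte[length];
// Right-align the data. Use the byte count, which can differ from data.length() for non-ASCII text.
System.arraycopy(bytes, 0, result, length - bytes.length, bytes.length);
return result;
}
Example:
byte[] sample = toBytes("SEND_MESSAGE", 256);
sample will have size 256, with the message bytes at the end and zeros in front. The method throws if the data needs more than length bytes.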
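As for the "<identifier> expected" error in the question: javac typically reports it when a plain assignment statement sits directly in the class body, so the assignments most likely need to move into a method, a constructor, or the field initializer. Here is a minimal sketch of a fixed-size buffer with a byte-by-byte comparison, assuming zero padding on the right and UTF-8 as the encoding. The class name FixedSizeMessage and the method encode are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FixedSizeMessage {
    static final int SIZE = 256;

    // Encodes text as UTF-8, then zero-pads (or truncates) on the right to SIZE bytes.
    static byte[] encode(String text) {
        return Arrays.copyOf(text.getBytes(StandardCharsets.UTF_8), SIZE);
    }

    public static void main(String[] args) {
        // Assignments like these must live inside a method or a field initializer.
        byte[] temporal1 = encode("SEND_MESSAGE");
        byte[] temporal2 = encode("SEND_MESSAGE_ACK");
        byte[] received = encode("SEND_MESSAGE");

        System.out.println(temporal1.length);                    // 256
        System.out.println(Arrays.equals(temporal1, received));  // true
        System.out.println(Arrays.equals(temporal1, temporal2)); // false
    }
}
```

Arrays.equals compares lengths and then each element, which is the byte-to-byte comparison the question asks for.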

convert Hindi text to UTF-16 format

I want to convert my Hindi input to UTF-16, so I convert my string to a byte array using the character set "UTF-16".
But my string comes out replaced with ?????.
Here is the code:
String original = "गुणवत्ता";
byte[] bytearr = original.getBytes("UTF-16");
String test= new String(bytearr,"UTF-16");
Try to encode the converted string as follows:
String original = "गुणवत्ता";
byte[] bytearr = original.getBytes("UTF-16");
String test= new String(bytearr,"UTF-16");
String encodedString = MimeUtility.encodeText(test, "utf-16", "B");
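It may also be worth checking whether the conversion is really what fails. The question does not show how the result is printed or how the source file is compiled, so this is only a guess, but ????? often comes from the console or the source-file encoding rather than from the UTF-16 round trip. The sketch below writes the Hindi word with Unicode escapes to rule out the source-file encoding, and prints code points instead of glyphs to rule out the console. The class name Utf16Check and its methods are made up for this example.

```java
import java.nio.charset.StandardCharsets;

final class Utf16Check {
    // Encodes to UTF-16 bytes and decodes them again.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16);
        return new String(bytes, StandardCharsets.UTF_16);
    }

    // Renders each code point as U+XXXX so the result is visible on any console.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The Hindi word from the question, written with escapes.
        String original = "\u0917\u0941\u0923\u0935\u0924\u094d\u0924\u093e";
        String test = roundTrip(original);
        System.out.println(original.equals(test)); // true if the conversion itself is lossless
        System.out.println(codePoints(test));
    }
}
```

If the comparison prints true but the text itself still prints as question marks, the conversion is fine and the output side needs fixing.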

Java: Bits -> Bytes -> String Encoding

In Java, I have a String of bits e.g. "01100111000111...". Next, I want to do the following:
First, convert the string to a byte array, which I have done successfully using:
byte[] bytes = new BigInteger(bits, 2).toByteArray();
Next, I want to convert bytes to String which I tried to do using:
String byteString = new String(bytes, "UTF-8");
but the results are not correct (garbage characters, etc.).
I think "UTF-8" is not the proper encoding.
Please tell me the proper encoding, or another way to get the string from such bytes.
Edited after your comment:
String string = "01100111000111";
byte[] bytes = new BigInteger(string, 2).toByteArray();
String out = "";
for(byte b: bytes)
out+= String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0');
System.out.println(out);
output:
0001100111000111
Hope this can help.
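One thing to watch with BigInteger: toByteArray() returns the minimal two's-complement form, so leading zero bytes are dropped and a value like "11111111" comes back as two bytes (a 0x00 sign byte is added). If the bit string is meant to be read as raw bytes, parsing it 8 bits at a time avoids both effects. Here is a minimal sketch, assuming the length is a multiple of 8 and the bits are most significant first. The class name BitString and the method toBytes are made up for this example.

```java
import java.nio.charset.StandardCharsets;

final class BitString {
    // Parses a string of '0'/'1' characters, 8 bits per byte, most significant bit first.
    static byte[] toBytes(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("bit count must be a multiple of 8");
        }
        byte[] out = new byte[bits.length() / 8];
        for (int i = 0; i < out.length; i++) {
            // parseInt yields 0..255; the cast keeps the low 8 bits.
            out[i] = (byte) Integer.parseInt(bits.substring(i * 8, i * 8 + 8), 2);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes("0110011101101001"); // 0x67 0x69
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // gi
    }
}
```

Which charset to decode with depends on how the bits were produced in the first place. US-ASCII here is just for the demo.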

Convert byte[] to String using binary encoding

I want to translate each byte from a byte[] into a char, then put those chars on a String. This is the so-called "binary" encoding of some databases. So far, the best I could find is this huge boilerplate:
byte[] bytes = ...;
char[] chars = new char[bytes.length];
for (int i = 0; i < bytes.length; ++i) {
chars[i] = (char) (bytes[i] & 0xFF);
}
String s = new String(chars);
Is there another option from Java SE or perhaps from Apache Commons? I wish I could have something like this:
final Charset BINARY_CS = Charset.forName("BINARY");
String s = new String(bytes, BINARY_CS);
But I'm not willing to write a Charset and their codecs (yet). Is there such a ready binary Charset in JRE or in Apache Commons?
You could use the ASCII encoding for 7-bit characters:
String s = "Hello World!";
byte[] b = s.getBytes("ASCII");
System.out.println(new String(b, "ASCII"));
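or ISO-8859-1 (Latin-1) for 8-bit characters. It is not ASCII, but it maps each byte value 0-255 to the char with the same code: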
String s = "Hello World! \u00ff";
byte[] b = s.getBytes("ISO-8859-1");
System.out.println(new String(b, "ISO-8859-1"));
EDIT
System.out.println("ASCII => " + Charset.forName("ASCII"));
System.out.println("US-ASCII => " + Charset.forName("US-ASCII"));
System.out.println("ISO-8859-1 => " + Charset.forName("ISO-8859-1"));
prints
ASCII => US-ASCII
US-ASCII => US-ASCII
ISO-8859-1 => ISO-8859-1
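To check that ISO-8859-1 really behaves like the "binary" encoding from the question, here is a small sketch that runs all 256 byte values through it and compares the result with the manual (char) (b & 0xFF) loop. The class name BinaryString and its methods are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryString {
    // Each byte 0x00..0xFF becomes the char with the same numeric value.
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Inverse mapping; only lossless for chars in the range U+0000..U+00FF.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String s = fromBytes(all);

        boolean sameAsLoop = true;
        for (int i = 0; i < 256; i++) {
            sameAsLoop &= s.charAt(i) == (char) (all[i] & 0xFF);
        }
        System.out.println(sameAsLoop);                     // true
        System.out.println(Arrays.equals(all, toBytes(s))); // true
    }
}
```

Using the StandardCharsets constant also avoids the checked exception that the String-name overloads declare.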
You could skip the intermediate char array and append to a StringBuilder instead (or a StringBuffer if you are worried about multi-threading). My example shows StringBuilder:
byte[] bytes = ...;
StringBuilder sb = new StringBuilder(bytes.length);
for (int i = 0; i < bytes.length; i++) {
sb.append((char) (bytes[i] & 0xFF));
}
return sb.toString();
I know it doesn't answer your other question. Just seeking to help with simplifying the "boilerplate" code.
There is a String constructor that takes an array of bytes and a string naming the charset of the bytes:
String s = new String(bytes, "UTF-8"); // if the charset is UTF-8
String s = new String(bytes, "ASCII"); // if the charset is ASCII
You can use Base64 encoding. Apache Commons Codec provides an implementation:
http://commons.apache.org/codec/
Base64 class documentation:
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html
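If adding a dependency is not an option, Java 8 and later ship java.util.Base64 in the standard library. The sketch below uses that class rather than the Apache Commons Codec one linked above. The class name Base64RoundTrip and its methods are made up for this example.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64RoundTrip {
    // Turns arbitrary bytes into ASCII-only text.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Recovers the original bytes from Base64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = {0, (byte) 0xFF, 0x10, (byte) 0x80}; // not valid text in most charsets
        String text = encode(data);
        System.out.println(text);
        System.out.println(Arrays.equals(data, decode(text))); // true
    }
}
```

Unlike the one-char-per-byte approach, Base64 output is about a third longer than the input, but it survives any channel that only handles plain ASCII.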
