How do I convert from ASCII to String - java

I am trying to parse an ASCII list into a string. The problem is that I have trouble with some special characters. If I try to parse this:
115 097 116 195 168 108 194 183 108 105 116
, the result should be "satèl·lit". The code I am using to parse it is:
ASCIIList.add(Character.toString((char) Integer.parseInt(asciiValue)));
But the result is "satÃ¨lÂ·lit". I saw that, for example, "è" -> "195 168". I do not know how to parse it correctly.

Assuming you have already split the input into an array of strings, the code could look like this:
String convertToString(String[] numberArray) {
byte[] utf8Bytes = new byte[numberArray.length];
for (int i = 0; i < numberArray.length; i++) {
utf8Bytes[i] = (byte) Integer.parseInt(numberArray[i]);
}
return new String(utf8Bytes, StandardCharsets.UTF_8);
}
So each number becomes a byte. The entire byte array is then converted into a string using the UTF-8 charset.
UTF-8 uses multiple bytes to represent characters outside the ASCII range. In your example it affects "è" and "·".
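For completeness, here is a minimal self-contained sketch of the same approach that also does the splitting. The class name AsciiListDecoder and the method name decode are made up for this illustration, and it assumes the numbers are separated by whitespace.

```java
import java.nio.charset.StandardCharsets;

final class AsciiListDecoder {
    /** Decodes a whitespace-separated list of decimal byte values as UTF-8 text. */
    static String decode(String numbers) {
        String[] parts = numbers.trim().split("\\s+");
        byte[] utf8Bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // parseInt reads "097" as decimal 97; the cast keeps the low 8 bits (195 -> 0xC3).
            utf8Bytes[i] = (byte) Integer.parseInt(parts[i]);
        }
        // Decode all bytes at once so multi-byte sequences such as C3 A8 become one character.
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("115 097 116 195 168 108 194 183 108 105 116"));
    }
}
```

The important point is that decoding happens once over the whole byte array; converting each number to a char individually is what splits "è" into two wrong characters.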

Related

Convert USB keyboard data from byte array to String USB4Java

I am reading a USB Keyboard (QR Code scanner) input using usb4java.
My code snippet looks like this:
byte[] data = new byte[16];
UsbPipe usbPipe = usbEndpoint.getUsbPipe();
if (usbPipe != null) {
if (!usbPipe.isOpen()) {
usbPipe.open();
}
if (usbPipe.isOpen()) {
UsbIrp usbIrp = usbPipe.createUsbIrp();
usbIrp.setData(data);
I have two questions:
1] On pressing A, the byte array data is 2,0,0,0,0,0,0,0,2,0,4,0,0,0,0,0
On pressing AB, the byte array data is 2,0,0,0,0,0,0,0,2,0,4,0,0,0,0,0,2,0,5,0,0,0,0,0
How do I convert it into characters in Java, i.e. get A or AB after conversion?
2] Currently, I am passing a fixed-size byte array in the code snippet above. For example, if I am expecting 1 character, I pass 16 as the size of the byte array, for 2 characters 24, and so on. Is there a more elegant solution for making it dynamic?
PS: My byte array converter snippet:
StringBuffer sb = new StringBuffer();
for (byte b : data) {
sb.append(b);
sb.append(",");
}
String byteString = sb.toString();
return byteString;
Thanks for any help
EDIT 1: Full source code here: http://tpcg.io/zt3WfM
Based on the documentation, the format should be:
22 00 04 00 00 00 00 00
Offset  Size  Description
0       Byte  Modifier keys status.
1       Byte  Reserved field.
2       Byte  Keypress #1.
3       Byte  Keypress #2.
4       Byte  Keypress #3.
5       Byte  Keypress #4.
6       Byte  Keypress #5.
7       Byte  Keypress #6.
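Based on the ASCII codes:
// 'a' is 0x61 in ASCII; the key codes for letters start at 0x04 (the A key)
byte codeA = 0x04; // The code for the A key
char a = (char) (0x61 + (codeA - 0x04)); // a == 'a'
byte codeX = 0x1B; // The code for the X key
char x = (char) (0x61 + (codeX - 0x04)); // x == 'x'
System.out.println(a);
System.out.println(x);
Or you can use a Map from key code to character, e.g. 0x04 -> 'A'.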
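Putting the report layout and the key-code arithmetic together, a rough sketch of a decoder could look like the following. The class name HidReportDecoder is hypothetical, and the sketch rests on several assumptions: the data consists of consecutive 8-byte reports in the layout described above, each key press shows up in exactly one report, bits 0x02 and 0x20 of the modifier byte are the left and right shift keys, and only the letter keys (codes 0x04 to 0x1D) matter.

```java
final class HidReportDecoder {
    private static final int REPORT_SIZE = 8;   // modifier, reserved, six key slots
    private static final int SHIFT_MASK = 0x22; // left shift (0x02) | right shift (0x20), assumed

    /** Decodes consecutive 8-byte keyboard reports; only letter keys are handled. */
    static String decode(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int start = 0; start + REPORT_SIZE <= data.length; start += REPORT_SIZE) {
            boolean shift = (data[start] & SHIFT_MASK) != 0;
            // Bytes 2..7 of each report hold up to six pressed key codes.
            for (int slot = 2; slot < REPORT_SIZE; slot++) {
                int code = data[start + slot] & 0xFF;
                if (code >= 0x04 && code <= 0x1D) { // key codes for a..z
                    char c = (char) ('a' + (code - 0x04));
                    out.append(shift ? Character.toUpperCase(c) : c);
                }
            }
        }
        return out.toString();
    }
}
```

With the arrays from the question, the first report carries only the shift modifier and produces nothing, and each following report contributes one upper-case letter. Digits, punctuation and keys held across several reports would need extra handling.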

Character strings to binary string - why are some characters multi-byte?

This code is supposed to convert character strings to binary ones, but for a few strings it returns a String with 16 binary digits, not the 8 I expected.
public class aaa {
public static void main(String argv[]){
String nux="ª";
String nux2="Ø";
String nux3="(";
byte []bites = nux.getBytes();
byte []bites2 = nux2.getBytes();
byte []bites3 = nux3.getBytes();
System.out.println(AsciiToBinary(nux));
System.out.println(AsciiToBinary(nux2));
System.out.println(AsciiToBinary(nux3));
System.out.println("number of bytes :"+bites.length);
System.out.println("number of bytes :"+bites2.length);
System.out.println("number of bytes :"+bites3.length);
}
public static String AsciiToBinary(String asciiString){
byte[] bytes = asciiString.getBytes();
StringBuilder binary = new StringBuilder();
for (byte b : bytes)
{
int val = b;
for (int i = 0; i < 8; i++)
{
binary.append((val & 128) == 0 ? 0 : 1);
val <<= 1;
}
binary.append(' ');
}
return binary.toString();
}
}
In the first two strings, I don't understand why they return 2 bytes, since they are single-character strings.
Compiled here: https://ideone.com/AbxBZ9
This returns:
11000010 10101010
11000011 10011000
00101000
number of bytes :2
number of bytes :2
number of bytes :1
I am using this code: Convert A String (like testing123) To Binary In Java
NetBeans IDE 8.1
A character is not always 1-byte long. Think about it - many languages, such as Chinese or Japanese, have thousands of characters, how would you map those characters to bytes?
You are using UTF-8 (one of the many, many ways of mapping characters to bytes) - looking up a character table for UTF-8, and searching for the sequence 11000010 10101010, I arrive at
U+00AA ª 11000010 10101010
Which is the UTF-8 encoding for ª. UTF-8 is often the default character encoding (charset) for Java, but you cannot rely on this. That is why you should always specify a charset when converting strings to bytes or vice versa.
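As a small illustration of why the charset matters, the sketch below encodes the same strings with two explicit charsets. The class and method names are invented for the example; the characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

final class CharsetByteCounts {
    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes the string occupies when encoded as ISO-8859-1 (Latin-1). */
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String[] samples = {"\u00AA", "\u00D8", "("}; // ª, Ø, (
        for (String s : samples) {
            System.out.println(s + ": UTF-8 = " + utf8Length(s)
                    + " byte(s), ISO-8859-1 = " + latin1Length(s) + " byte(s)");
        }
    }
}
```

Both ª and Ø need two bytes in UTF-8 but only one in ISO-8859-1, while "(" is a single byte in either charset because it lies in the ASCII range.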
You can see why some characters take two bytes by running this simple code:
// integer - binary
System.out.println(Byte.MIN_VALUE);
// -128 - 0b11111111111111111111111110000000
System.out.println(Byte.MAX_VALUE);
// 127 - 0b1111111
System.out.println((int) Character.MIN_VALUE);
// 0 - 0b0
System.out.println((int) Character.MAX_VALUE);
// 65535 - 0b1111111111111111
As you can see, Byte.MAX_VALUE can be written with just 7 bits, so it fits in 1 byte (01111111).
If you cast Character.MIN_VALUE to an integer, it is 0, which needs a single bit and also fits in 1 byte (00000000).
But what about Character.MAX_VALUE? In binary it is 1111111111111111, which is 65535 in decimal and needs 2 bytes (11111111 11111111).
So a char, whose numeric value lies between 0 and 65535, needs 1 or 2 bytes. Note that this is the size of a Java char (a UTF-16 code unit); how many bytes getBytes() returns for it depends on the charset used.
Hope that helps.

Why new String with UTF-8 contains more bytes

byte bytes[] = new byte[16];
random.nextBytes(bytes);
try {
return new String(bytes, "UTF-8");
} catch (UnsupportedEncodingException e) {
log.warn("Hash generation failed", e);
}
When I generate a String with the given method and call string.getBytes().length, it returns a different value; the maximum was 32. Why does a 16-byte array end up producing a byte string of a different size?
But if I call string.length(), it returns 16.
This is because your bytes are first converted to a Unicode string, which attempts to build a UTF-8 character sequence from these bytes. If a byte can neither be treated as an ASCII character nor combined with the following byte(s) to form a legal Unicode character, it is replaced by "�". That character is encoded as 3 bytes when calling String#getBytes(), thus adding 2 extra bytes to the resulting output.
If you're lucky enough to generate only ASCII characters, String#getBytes() will return a 16-byte array; if not, the resulting array may be longer. For example, the following code snippet:
byte[] b = new byte[16];
Arrays.fill(b, (byte) 190);
b = new String(b, "UTF-8").getBytes();
returns an array 48(!) bytes long.
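The same effect can be reproduced without relying on the platform default charset. This is only a sketch with an invented class name (ReplacementCharDemo); it pins both the decoding and the encoding step to UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ReplacementCharDemo {
    /** Decodes the bytes as UTF-8 and encodes the resulting String as UTF-8 again. */
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] invalid = new byte[16];
        Arrays.fill(invalid, (byte) 190); // 0xBE is a lone continuation byte, never valid on its own
        String decoded = new String(invalid, StandardCharsets.UTF_8);
        System.out.println("chars after decoding:  " + decoded.length());
        System.out.println("bytes after re-encode: " + roundTrip(invalid).length);
    }
}
```

Each invalid byte becomes one U+FFFD character, so the String still has 16 chars, but every U+FFFD is written back as the three bytes EF BF BD.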
This is a classic mistake born from misunderstanding the relationship between bytes and chars, so here we go again.
There is no 1-to-1 mapping between byte and char; it all depends on the character coding you use (in Java, that is a Charset).
Worse: a given byte sequence may or may not be decodable to a char sequence.
Try this for instance:
final byte[] buf = new byte[16];
new Random().nextBytes(buf);
final Charset utf8 = StandardCharsets.UTF_8;
final CharsetDecoder decoder = utf8.newDecoder()
.onMalformedInput(CodingErrorAction.REPORT);
decoder.decode(ByteBuffer.wrap(buf));
This is very likely to throw a MalformedInputException.
I know this is not exactly an answer, but then you didn't clearly explain your problem; and the example above already shows that you misunderstand what a byte is and what a char is.
The generated bytes might contain valid multibyte characters.
Take this as an example. The string contains only one character, but its UTF-8 byte representation takes three bytes.
String s = "\u2126"; // Ω (U+2126 OHM SIGN)
System.out.println("length = " + s.length());
System.out.println("bytes = " + Arrays.toString(s.getBytes("UTF-8")));
String.length() returns the length of the string in chars. Ω is a single char, whereas it is 3 bytes long in UTF-8.
If you change your code like this
Random random = new Random();
byte bytes[] = new byte[16];
random.nextBytes(bytes);
System.out.println("string = " + new String(bytes, "UTF-8").length());
System.out.println("string = " + new String(bytes, "ISO-8859-1").length());
the same bytes are interpreted with a different charset. According to the Javadoc of String(byte[] b, String charset):
The length of the new String is a function of the charset, and hence may
not be equal to the length of the byte array.
If you look at the string you're producing, most of the random bytes you're generating do not form valid UTF-8 characters. The String constructor therefore replaces them with the Unicode 'REPLACEMENT CHARACTER' � (U+FFFD), which takes up 3 bytes in UTF-8 (EF BF BD).
As an example:
public static void main(String[] args) throws UnsupportedEncodingException
{
Random random = new Random();
byte bytes[] = new byte[16];
random.nextBytes(bytes);
printBytes(bytes);
final String s = new String(bytes, "UTF-8");
System.out.println(s);
printCharacters(s);
}
private static void printBytes(byte[] bytes)
{
for (byte aByte : bytes)
{
System.out.print(
Integer.toHexString(Byte.toUnsignedInt(aByte)) + " ");
}
System.out.println();
}
private static void printCharacters(String s)
{
s.codePoints().forEach(i -> System.out.println(Character.getName(i)));
}
On a given run, I got this output:
30 41 9b ff 32 f5 38 ec ef 16 23 4a 54 26 cd 8c
0A��2�8��#JT&͌
DIGIT ZERO
LATIN CAPITAL LETTER A
REPLACEMENT CHARACTER
REPLACEMENT CHARACTER
DIGIT TWO
REPLACEMENT CHARACTER
DIGIT EIGHT
REPLACEMENT CHARACTER
REPLACEMENT CHARACTER
SYNCHRONOUS IDLE
NUMBER SIGN
LATIN CAPITAL LETTER J
LATIN CAPITAL LETTER T
AMPERSAND
COMBINING ALMOST EQUAL TO ABOVE
String.getBytes().length is likely to be longer, as it counts bytes needed to represent the string, while length() counts 2-byte code units.
This will try to create a String assuming the bytes are in UTF-8.
new String(bytes, "UTF-8");
In general this will go horribly wrong for arbitrary bytes, because not every byte sequence is valid UTF-8.
Like:
String s = new String(new byte[] { -128 }, StandardCharsets.UTF_8);
The second step:
byte[] bytes = s.getBytes();
will use the platform encoding (System.getProperty("file.encoding")). Better specify it.
byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
One should realize that, internally, a String holds Unicode text as a sequence of 16-bit chars in UTF-16.
One should entirely abstain from using String to hold byte[] data. It will always involve a conversion, cost double the memory and be error-prone.
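If the goal is a printable token made from 16 random bytes, one option (not taken from the answers above, so treat it as a suggestion) is to encode the bytes with Base64 instead of decoding them as text. The sketch below uses java.util.Base64 from the JDK; the class name RandomToken and its methods are invented for the example.

```java
import java.security.SecureRandom;
import java.util.Base64;

final class RandomToken {
    /** Encodes arbitrary bytes as URL-safe Base64 text without padding. */
    static String encode(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Recovers the original bytes from a token produced by encode. */
    static byte[] decode(String token) {
        return Base64.getUrlDecoder().decode(token);
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[16];
        new SecureRandom().nextBytes(bytes);
        String token = encode(bytes);
        System.out.println(token + " (" + token.length() + " chars)");
    }
}
```

Unlike new String(bytes, "UTF-8"), this conversion is lossless: decode(encode(bytes)) returns exactly the original 16 bytes, and the token always has the same length for the same input size.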

Why am i getting 3 bytes instead 1 byte after hexadecimal/string/byte conversion in java?

I have this program:
String hexadecimal = "AF";
byte decimal[] = new byte[hexadecimal.length()/2];
int j = 0;
for ( int i = 0; i < decimal.length; i++)
{
decimal[i] = (byte) Integer.parseInt(hexadecimal.substring(j,j+2),16); //Maybe the problem is this statement
j = j + 2;
}
String s = new String(decimal);
System.out.println("TOTAL LEN: " + s.length());
byte aux[] = s.getBytes();
System.out.println("TOTAL LEN: " + aux.length);
The first total is "1" and the second one is "3"; I thought I would get "1" in the second total. Why is this happening? My intention is to generate another hexadecimal string with the same value as the original string (AF), but I am having this issue.
Regards!
P.D. Sorry for my english, let me know if i explained myself well.
I don't know exactly what you are trying to achieve, but here is what your code does:
Integer.parseInt(hexadecimal.substring(j, j + 2), 16) returns 175
(byte) 175 is -81
new String(decimal) tries to create a String from this byte array using your default character set (probably UTF-8)
As the byte array does not contain a valid UTF-8 byte sequence, the created String contains the "REPLACEMENT CHARACTER", Unicode code point U+FFFD. The UTF-8 byte representation of this code point is EF BF BD (or -17 -65 -67). That's why the second length is three.
Have a look at the Wikipedia article on UTF-8. Any character with a code point <= 7F can be represented by a single byte. For all other characters the first byte must have bits 7 and 6 set (11......). That is not the case for the value -81, which is 10101111. Therefore it is not a valid UTF-8 sequence and it is replaced with the "REPLACEMENT CHARACTER".
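To get back a hexadecimal string with the same value, the bytes do not need to pass through a String at all. A minimal sketch, with hypothetical helper names hexToBytes and bytesToHex, assuming the input has an even number of valid hex digits:

```java
final class HexRoundTrip {
    /** Parses pairs of hex digits into bytes; assumes an even-length, valid hex string. */
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Formats each byte as two upper-case hex digits. */
    static String bytesToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask with 0xFF so that -81 is formatted as AF rather than FFFFFFAF.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] decimal = hexToBytes("AF");
        System.out.println("TOTAL LEN: " + decimal.length);
        System.out.println(bytesToHex(decimal));
    }
}
```

Because no charset decoding is involved, the single byte -81 stays a single byte and is printed as AF again.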

Printing the address of hex string instead of hex string value

I converted a byte array into string by doing
String s = encryptedBytes1.toString();
String gh = convertStringToHex(s);
Then I printed gh, which should be the hex form, on screen. It returned this:
gh:[B#5985910
This is the convert function:
public static String convertStringToHex(String str){
char[] chars = str.toCharArray();
StringBuffer hex = new StringBuffer();
for(int i = 0; i < chars.length; i++){
hex.append(Integer.toHexString((int)chars[i]));
}
return hex.toString();
}
Can anyone help me print the hex form of the string?
In general you can convert between strings and hex values (numbers) with the following functions:
String hexString = "0x20";
Integer integer = Integer.decode(hexString); // is 32
String hexString1 = Integer.toHexString(integer); // is "20"
Now you need to iterate over your byte array/String.
EDIT: As you specified your question, please see this answer on SO. I guess it is the same problem: Converting A String To Hexadecimal In Java
encryptedBytes1.toString() gives you a string representation of the object, because all arrays are objects in Java; it does not convert the byte array into a String.
I think that you are not converting your byte array to a String properly. This works for me:
byte encryptedBytes1[] = "ABCDEFGHIK".getBytes();
String aux = new String(encryptedBytes1);
System.out.println(convertStringToHex(aux));
41 42 43 44 45 46 47 48 49 4b
Keep in mind that you may need to specify a charset, and that the primitive type byte takes 1 byte while char (which is meant to hold a Unicode character) takes 2.
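For encrypted data, which is usually not valid text in any charset, it may be safer to format the bytes directly rather than going through new String(...). The following sketch uses an invented class name (ByteArrayHexPrinter) and produces space-separated output.

```java
import java.nio.charset.StandardCharsets;

final class ByteArrayHexPrinter {
    /** Formats every byte as two lower-case hex digits, separated by spaces. */
    static String toSpacedHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                hex.append(' ');
            }
            int value = bytes[i] & 0xFF; // treat the byte as unsigned
            hex.append(Character.forDigit(value >>> 4, 16));
            hex.append(Character.forDigit(value & 0x0F, 16));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] encryptedBytes1 = "ABCDEFGHIK".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encryptedBytes1.toString()); // type and hash code, not the content
        System.out.println(toSpacedHex(encryptedBytes1));
    }
}
```

The first println shows why the original output started with "[B": calling toString() on an array prints its type and identity hash, while the helper prints the actual byte values.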
