Problem converting C/C++ unsigned char to JAVA - java

The problem is with unsigned char.
I am reading a PPM image file whose data is in ASCII/extended ASCII.
For a character, e.g. '†':
In Java, after reading it as a char and casting it to int, its value is 8224.
In C/C++, after reading it as an unsigned char and casting it to int, its value is 160.
How should I read it in Java so as to get the value 160?
The following C++:
unsigned char ch1 ='†';
char ch2 = '†';
cout << (int) ch1 << "\n"; // prints 160
cout << (int) ch2 << "\n"; // prints -96
In Java,
char ch1 = '^';
char ch2 = '†';
System.out.println (" value : " + (int) ch1); // prints 94
System.out.println (" value :" + (byte) ch1); // prints 94
System.out.println (" value : " + (int) ch2); // prints 8224
System.out.println (" value :" + (byte) ch2); // prints 32
Following are some exceptions
8224 †
8226 •
8800 ≠
8482 ™
8710 ∆
8211 –
8221 ”
8216 ‘
9674 ◊
8260 ⁄
8249 ‹
8249 ‹
8734 ∞
8747 ∫
8364 €
8730 √
8804 ≤
Following are some good ones
94 ^
102 f
112 p
119 w
126 ~
196 Ä
122 z
197 Å
197 Å
Any help is appreciated

In C++ you are using "narrow" characters in some specific encoding that happens to define character '†' as 160. In other encodings 160 may mean something else, and character '†' may be missing altogether.
In Java, you are always dealing with Unicode: 8224 = 0x2020 = U+2020 "DAGGER".
To get "160", you need to convert your string to the same encoding you are using with C++. See String.getBytes(charset).
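A minimal sketch of that suggestion follows. The class and method names are made up for illustration, and the guess that the C++ side used Mac OS Roman (where '†' is the byte 0xA0) is an assumption, not something the question states. The charset name "x-MacRoman" exists only on JVMs that ship the extended charsets, so the code checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDemo {
    // Encodes s with the given charset and returns each byte as an unsigned value (0..255).
    static int[] unsignedBytes(String s, Charset cs) {
        byte[] raw = s.getBytes(cs);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // mask off the sign extension
        }
        return out;
    }

    public static void main(String[] args) {
        // U+00C4 is 'Ä'; ISO-8859-1 maps it to the single byte 196.
        System.out.println(unsignedBytes("\u00C4", StandardCharsets.ISO_8859_1)[0]); // 196

        // Assumption: the C++ program's narrow encoding was Mac OS Roman.
        String name = "x-MacRoman";
        if (Charset.isSupported(name)) {
            // U+2020 is the dagger from the question.
            System.out.println(unsignedBytes("\u2020", Charset.forName(name))[0]);
        } else {
            System.out.println(name + " is not available on this JVM");
        }
    }
}
```

The key point is the `& 0xFF` mask: Java's byte is signed, so without it a value such as 160 would print as -96, just like the plain char in the C++ example.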

IIRC Java uses a 16-bit representation for chars (UNICODE?) and C++ normally doesn't unless you use wchars.
I think you'd be better off trying to get C++ to use the UNICODE characters that Java uses rather than the other way around.

If you write out the unsigned char 160 in C++ as a single byte, and use InputStream.read() you will get 160. Which character this means depends on the assumed encoding but the value 160 is unchanged.
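As a small illustration of this point (the class and method names are hypothetical), the sketch below stores 160 in a Java byte and reads it back through an InputStream. A ByteArrayInputStream stands in for the file here.

```java
import java.io.ByteArrayInputStream;

class ReadUnsignedDemo {
    // Reads the first byte of data through a stream; read() yields 0..255, or -1 at end of stream.
    static int readFirst(byte[] data) {
        return new ByteArrayInputStream(data).read();
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 160 };            // Java stores this as the signed byte -96
        System.out.println(data[0]);             // -96
        System.out.println(data[0] & 0xFF);      // 160
        System.out.println(readFirst(data));     // 160
    }
}
```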

Related

Why reading a character that has no ASCII representation with System.in doesn't give the character in two bytes?

import java.io.IOException;
public class Main {
public static void main(String[] args) throws IOException {
char ch = '诶';
System.out.println((int)ch);
int c;
while ((c = System.in.read()) != -1)
{
System.out.println(c);
}
}
}
Output:
35830
Here, the value that represents the char 诶 in Unicode is 35830. In binary, that is 10001011 11110110.
When I enter that character in the terminal, I expected to get two bytes, 10001011 and 11110110, which I could then combine to obtain the original char.
But what I actually get is:
232
175
182
10
I can see that 10 represents the newline character. But what do the first 3 numbers mean?
UTF-8 is a multi-byte variable-length encoding.
In order for something reading a stream of bytes to know that there are more bytes to be read in order to finish the current codepoint, there are some values that just cannot occur in a valid UTF-8 byte stream. Basically, certain patterns indicate "hang on, I'm not done".
There's a table which explains it here. A codepoint in the range U+0800 to U+FFFF needs up to 16 bits to represent it; its byte representation consists of 3 bytes:
1st byte 2nd byte 3rd byte
1110xxxx 10xxxxxx 10xxxxxx
You are seeing 232 175 182 because those are the bytes of the UTF-8 encoding.
byte[] bytes = "诶".getBytes(StandardCharsets.UTF_8);
for (byte b : bytes) {
System.out.println((0xFF & b) + " " + Integer.toString(0xFF & b, 2));
}
Output:
232 11101000
175 10101111
182 10110110
So the 3 bytes follow the pattern described above.
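To go the other way, the payload bits (the x positions in the pattern) can be stitched back together. This is only a sketch with a hypothetical helper name, and it does no validation: it assumes it is handed a well-formed 3-byte sequence.

```java
class Utf8ThreeByteDemo {
    // Rebuilds a code point from a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
    // No validation is done; the caller must pass a well-formed 3-byte sequence.
    static int decode3(int b0, int b1, int b2) {
        return ((b0 & 0x0F) << 12)   // low 4 bits of the lead byte
             | ((b1 & 0x3F) << 6)    // low 6 bits of the first continuation byte
             |  (b2 & 0x3F);         // low 6 bits of the second continuation byte
    }

    public static void main(String[] args) {
        int cp = decode3(232, 175, 182);
        System.out.println(cp);          // 35830
        System.out.println((char) cp);   // the original character, if the console can display it
    }
}
```

In real code, `new String(bytes, StandardCharsets.UTF_8)` does the same job and also handles the other sequence lengths and malformed input.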

JAVA Byte Manipulation

I want to read a binary file and do some manipulation on each byte. I want to test that I am manipulating the bytes correctly. I want to set a byte variable1 to "00000000" and then another byte variable2 set at "00001111" and OR them newvariable = variable1|variable2, shift the newvariable << 4 bits and then print out the int value.
byte a = 00000000;
//Convert first oneByte to 4 bits and then xor with a;
byte b = 00001111;
byte c = (byte)(a|b);
c = c << 4;
System.out.println("byte= " + c + "\n");
I am not sure why I keep getting "incompatible types: possible lossy conversion from int to byte"
You need to put a '0b' in front of those numbers to express binary constants. The number 00001111 is interpreted as an octal literal, which is 585 in decimal, while the maximum byte value is 127 (since byte is signed). Try 0b00001111 instead.
As literals, those will still be int, and so is the result of c << 4, so depending on where you do the assignment, you may also need to cast explicitly down to byte, e.g. c = (byte) (c << 4).
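Put together, the test from the question might look like the sketch below (the class and method names are made up for illustration). It prints the result both as a signed byte and masked to its unsigned value, since the two differ once the top bit is set.

```java
class ByteOrShiftDemo {
    // ORs two bytes, shifts left by 4, and narrows the int result back to a byte.
    static byte orThenShift(byte a, byte b) {
        byte c = (byte) (a | b);   // a | b is an int, so a cast is required
        return (byte) (c << 4);    // c << 4 is an int as well
    }

    public static void main(String[] args) {
        byte a = 0b00000000;
        byte b = 0b00001111;
        byte c = orThenShift(a, b);
        System.out.println(c);          // -16 (bit pattern 11110000, read as signed)
        System.out.println(c & 0xFF);   // 240 (same bits, read as unsigned)
    }
}
```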

Why am i getting 3 bytes instead 1 byte after hexadecimal/string/byte conversion in java?

I have this program:
String hexadecimal = "AF";
byte decimal[] = new byte[hexadecimal.length()/2];
int j = 0;
for ( int i = 0; i < decimal.length; i++)
{
decimal[i] = (byte) Integer.parseInt(hexadecimal.substring(j,j+2),16); //Maybe the problem is this statement
j = j + 2;
}
String s = new String(decimal);
System.out.println("TOTAL LEN: " + s.length());
byte aux[] = s.getBytes();
System.out.println("TOTAL LEN: " + aux.length);
The first total is "1" and the second one is "3"; I thought I would get "1" for the second total as well. Why does this happen? My intention is to generate another hexadecimal string with the same value as the original string (AF), but I am having this issue.
Regards!
P.D. Sorry for my english, let me know if i explained myself well.
I don't know exactly what you are trying to achieve, but here is what your code does.
Integer.parseInt(hexadecimal.substring(j, j + 2), 16) returns 175
(byte) 175 is -81
new String(decimal) tries to create a String from this byte array using your default character set (probably UTF-8)
As the byte array does not contain a valid UTF-8 byte sequence, the created String contains the "REPLACEMENT CHARACTER", Unicode codepoint U+FFFD. The UTF-8 byte representation of this codepoint is EF BF BD (or -17 -65 -67). That's why the second length is three.
Have a look at the Wikipedia article on UTF-8. Any character with a codepoint <= 7F can be represented by a single byte. For all other characters, the first byte must have bits 7 and 6 set (11......), which is not the case for the value -81, which is 10101111. Therefore it is not a valid start of a UTF-8 sequence, and it is replaced with the "REPLACEMENT CHARACTER".
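If the goal is only to get back a hexadecimal string with the same value, one option is to skip the String round trip entirely and format the bytes directly. The following is a sketch with hypothetical helper names, assuming the input always has an even number of hex digits.

```java
class HexRoundTripDemo {
    // Parses a hex string with an even number of digits into bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Formats bytes back into upper-case hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF)); // mask so -81 prints as AF
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] decimal = fromHex("AF");
        System.out.println(decimal.length);   // 1
        System.out.println(toHex(decimal));   // AF
    }
}
```

If a String really is needed in between, decoding and encoding with ISO-8859-1 on both sides keeps one byte per char, because that charset maps every byte value to a character.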

How to print extended value ascii for a string in java?

I have a requirement where I have to print the ASCII value for a string. When I try printing the values, it prints unexpected values. My program looks like below:
int s=1161;
String hex=Integer.toHexString(1161);
hex="0"+hex;
char firstByte = (char) (Integer.parseInt(hex.substring(0,2),16));
char secondByte = (char) (Integer.parseInt(hex.substring(2,4),16));
and the output of the program is
first byte-- some rectangle shape
second byte--?
whereas I'm expecting the ASCII characters
first byte-- EOT
second byte--‰
Can someone help me achieve this?
You intend to do the following, in a somewhat convoluted way:
String hex = String.format("%04x", s); // delivering 0489
The first byte is 0x04 = 4, an ASCII control char, Ctrl-D, or EOT.
The second byte is 0x89 = 137, which is actually outside the 7-bit ASCII range. Depending on the encoding it might be the per mille sign (‰), but in Unicode U+0089 is the control character "CHARACTER TABULATION WITH JUSTIFICATION".
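The two bytes can also be taken with shifts and masks instead of going through a hex string. In the sketch below, the names are hypothetical, and the idea that the expected '‰' comes from windows-1252 (where 0x89 is the per mille sign) is an assumption about the asker's environment, so the decoding step is guarded.

```java
import java.nio.charset.Charset;

class SplitBytesDemo {
    // Returns the high and low byte of a 16-bit value as unsigned ints.
    static int[] split(int value) {
        return new int[] { (value >> 8) & 0xFF, value & 0xFF };
    }

    public static void main(String[] args) {
        int[] parts = split(1161);        // 1161 = 0x0489
        System.out.println(parts[0]);     // 4   (EOT)
        System.out.println(parts[1]);     // 137 (0x89)

        // Assumption: the expected character comes from windows-1252.
        String name = "windows-1252";
        if (Charset.isSupported(name)) {
            String s = new String(new byte[] { (byte) parts[1] }, Charset.forName(name));
            System.out.println(s);        // readable only if the console can display it
        }
    }
}
```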
You should try following code...
int s = 1161;
String hex = Integer.toHexString(s);
// hex="0"+hex;
char firstByte = (char) (Integer.parseInt(hex.substring(0, 2), 16));
char secondByte = (char) (Integer.parseInt(hex.substring(2, 3), 16));
System.out.println("First = " + firstByte + ", Second = " + secondByte + ", Hex " + hex);
output
First = H, Second = , Hex 489
Test your functions with more reliable input. Control characters like EOT may be represented by squares or any other kind of placeholder. Anything above 127 is not uniquely defined in ascii, so it might just show up as "?". Seems to me your function works correctly.
See also http://en.wikipedia.org/wiki/Ascii for all well defined ascii symbols.

how to get the binary values of the bytes stored in byte array

I am working on a project that reads the data from a file into a byte array and appends "0" to that byte array until its length is 224 bits. I was able to add zeros, but I am unable to confirm how many zeros are sufficient. So I want to print the file data in the byte array in binary format. Can anyone help me?
For each byte:
cast to int (happens in the next step via automatic widening of byte to int)
bitwise-AND with mask 255 to zero all but the last 8 bits
bitwise-OR with 256 to set the 9th bit to one, making all values exactly 9 bits long
invoke Integer.toBinaryString() to produce a 9-bit String
invoke String#substring(1) to "delete" the leading "1", leaving exactly 8 binary characters (with leading zeroes, if any, intact)
Which as code is:
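// ISO-8859-1 gives one byte per char here; a UTF-8 default charset would turn \377 and \317 into two bytes each
byte[] bytes = "\377\0\317\tabc".getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
for (byte b : bytes) {
System.out.println(Integer.toBinaryString(b & 255 | 256).substring(1));
}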
Output of above code (always 8-bits wide):
11111111
00000000
11001111
00001001
01100001
01100010
01100011
Try Integer.toString(bytevalue, 2)
Okay, where'd toBinaryString come from? Might as well use that.
You can work with BigInteger as in the example below, especially if you have 256 bits or more.
Put your array into a string and start from there; see the sample below:
String string = "10000010";
BigInteger biStr = new BigInteger(string, 2);
System.out.println("binary: " + biStr.toString(2));
System.out.println("hex: " + biStr.toString(16));
System.out.println("dec: " + biStr.toString(10));
Another example which accepts bytes:
String string = "The girl on the red dress.";
byte[] byteString = string.getBytes(Charset.forName("UTF-8"));
System.out.println("[Input String]: " + string);
System.out.println("[Encoded String UTF-8]: " + byteString);
BigInteger biStr = new BigInteger(byteString);
System.out.println("binary: " + biStr.toString(2)); // binary
System.out.println("hex: " + biStr.toString(16)); // hex or base 16
System.out.println("dec: " + biStr.toString(10)); // this is base 10
Result:
[Input String]: The girl on the red dress.
[Encoded String UTF-8]: [B@70dea4e
binary: 101010001101000011001010010000001100111011010010111001001101100001000000110111101101110001000000111010001101000011001010010000001110010011001010110010000100000011001000111001001100101011100110111001100101110
hex: 546865206769726c206f6e20746865207265642064726573732e
You can also work to convert Binary to Byte format
try {
System.out.println("binary to byte: " + biStr.toString(2).getBytes("UTF-8"));
} catch (UnsupportedEncodingException e) {e.printStackTrace();}
Note:
To pad your binary string to a fixed width, you can use the sample below
String.format("%256s", biStr.toString(2)).replace(' ', '0'); // left-pads with zeros to 256 binary digits
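First create the byte array; Java initializes every element to 0:
byte[] b = new byte[28]; // 224 bits = 28 bytes
Arrays.fill(b, (byte) 0); // redundant for a new array; the cast is needed because 0 is an int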
Now just fill the array with your data. Any left over bytes will be 0.
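The padding and the check can be combined in a few lines. This is a sketch, not the asker's actual code: the class and method names are made up, and a two-byte array stands in for the file contents. It relies on Arrays.copyOf, which zero-fills when the new length is larger (and truncates when it is smaller).

```java
import java.util.Arrays;

class PadAndPrintDemo {
    // Copies data into a new array of targetBytes length; extra positions are zero.
    // Note that Arrays.copyOf truncates if data is longer than targetBytes.
    static byte[] pad(byte[] data, int targetBytes) {
        return Arrays.copyOf(data, targetBytes);
    }

    // Renders one byte as exactly 8 binary digits.
    static String toBits(byte b) {
        return Integer.toBinaryString((b & 0xFF) | 0x100).substring(1);
    }

    public static void main(String[] args) {
        byte[] fileData = { 0x41, (byte) 0xCF };   // stand-in for the file contents
        byte[] padded = pad(fileData, 28);         // 224 bits = 28 bytes
        for (byte b : padded) {
            System.out.println(toBits(b));
        }
    }
}
```

Counting the printed lines (28) and checking that everything after the data is 00000000 confirms the padding.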
