I need to make use of an array of unsigned bytes. I need to send certain characters over the network to a server, and some of these characters are greater than 127.
I have a simplified version of the code below to try and understand the concept:
int i= 160;
byte j = (byte) i;
System.out.println((byte)i);
System.out.println(j);
and this gives an output of:
-96
-96
I need to print 160, because the server is expecting a byte of 160 and does not accept the value if it receives -96. The reason I used an int is that, when I was reading about how to get around the problem, I often came across the suggestion to just use an int. I don't quite understand that, as I need my array to be of type byte.
This is the part of my code where I send the array:
public boolean send(byte[] data) {
try {
out.write(data); // Write the data to the outStream
out.flush();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
return false; // Return false if the TCP Transmit failed
}
return true; // The data was written and flushed successfully
}
I would really appreciate it if someone could help me.
There is no distinction in Java between signed and unsigned bytes. Both of the following will assign the same value to a byte:
byte i = (byte)160;
byte j = (byte)-96;
It's up to you as a developer to treat them as signed or unsigned when you print them out. The default is to print them signed, but you can force them to print unsigned by converting them to an integer in an unsigned manner.
System.out.println(i); // -96
System.out.println(0xff&i); // 160
If you want to know how bytes can represent both negative and positive numbers at the same time, read this article on two’s complement arithmetic in Java
Sending -96 is the correct behavior. Signed and unsigned bytes are different when they're printed out, but not in the bit representation, and they should not make a difference when they are received by another server.
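As a minimal sketch of the point above (the class and method names are made up for illustration), the following shows that both casts store the same eight bits and that only the interpretation at print time differs:

```java
final class SameBitsDemo {
    // Interprets the 8 bits of b as an unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte fromUnsigned = (byte) 160;
        byte fromSigned = (byte) -96;

        // Both casts store the same bit pattern.
        System.out.println(fromUnsigned == fromSigned);                      // true
        System.out.println(Integer.toBinaryString(toUnsigned(fromSigned))); // 10100000
        System.out.println(toUnsigned(fromSigned));                         // 160
    }
}
```

On Java 8 and later, Byte.toUnsignedInt(b) should give the same result as the mask.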
There is no System.out.println(byte). The closest is System.out.println(int). This means your byte values are converted to ints. The conversion extends the high bit, resulting in a negative number.
The following code demonstrates my point:
byte[] data =
{
(byte)0x01,
(byte)0x02,
(byte)0x7F,
(byte)0x80
};
for (byte current : data)
{
String output = String.format("0x%x, 0x%x", current, (int)current);
System.out.println(output);
}
If you want to use System.out.println to see your byte values, mask off the top three bytes of the integer value, something like this:
System.out.println(0x000000FF & (int) current);
Bytes are not signed or unsigned by themselves. They are interpreted as such when some operations are applied (say, compare to 0 to determine sign). The operations can be signed and unsigned, and in Java, only signed byte operations are available. So the code you cited is useless for the question - it sends bytes but does not do any operation. Better show the code which receives bytes. One correct and handy method is to use java.io.InputStream.read() method which returns byte as an integer in the range 0...255.
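A rough sketch of the receiving side might look like the following. It assumes an in-memory ByteArrayInputStream as a stand-in for the real socket stream, and the class and method names are hypothetical. Each call to read() hands back the byte as an int in the range 0 to 255, regardless of how the sender wrote the literal:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

final class ReceiveDemo {
    // Writes the bytes to an in-memory stream and reads them back one at a
    // time, the way a receiver would with read().
    static int[] roundTrip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(data, 0, data.length);

        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        int[] received = new int[data.length];
        for (int i = 0; i < received.length; i++) {
            received[i] = in.read(); // 0..255, or -1 at end of stream
        }
        return received;
    }

    public static void main(String[] args) {
        int[] received = roundTrip(new byte[] {(byte) 160, (byte) -96, 127});
        for (int value : received) {
            System.out.println(value); // 160, 160, 127
        }
    }
}
```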
Related
I have int numbers with values between 0-65535. I need to store each number as a byte array of 2 bytes length, whether the number could fit on 1 byte as well or not. Once the numbers are stored in byte arrays, I need to be able to convert them back to int. Right now I don't know how to store a number that is not between -32,768 and 32,767 on 2 bytes and be able to properly convert it back to its original int value.
You can store values from 0-65535 in a char-value and convert a char to byte[] (with a length of 2) using the following method:
public static byte[] toBytes(char c) {
return ByteBuffer.allocate(Character.BYTES).putChar(c).array();
}
See here
EDIT:
It works backwards using ByteBuffer too:
public static char charFromBytes(byte[] bytes) {
return ByteBuffer.wrap(bytes).getChar();
}
Storing the first byte as (byte) (myIntNumber >> 8) and the second as (byte) myIntNumber seems to work just fine for the int -> byte array conversion. I'm still curious about how to get the int back properly from a byte array.
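One way to get the int back, sketched here with hypothetical names and assuming the high byte is stored first, is to mask each byte with 0xFF before shifting so that sign extension does not leak into the result:

```java
final class TwoByteCodec {
    // Big-endian layout: high byte first, low byte second.
    static byte[] toBytes(int value) {
        return new byte[] {(byte) (value >> 8), (byte) value};
    }

    // Masks each byte to 0..255 before combining, so the result is 0..65535.
    static int fromBytes(byte[] bytes) {
        return ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
    }

    public static void main(String[] args) {
        int original = 40000;
        byte[] encoded = toBytes(original);
        System.out.println(fromBytes(encoded)); // 40000
    }
}
```

Without the masks, a byte such as 0x9C would be widened to a negative int and the OR would corrupt the upper bits.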
I've got a huge string of bits (with some \n in it too) that I pass as a parameter to a method, which should isolate the bits 8 by 8, and convert them all to bytes using parseInt().
Thing is, every time the substring of 8 bits starts with a 1, the resulting byte is a negative number. For example, the first substring is '10001101', and the resulting byte is -115. I can't seem to figure out why, can someone help? It works fine with other substrings.
Here's my code, if needed :
static String bitsToBytes(String geneString) {
String geneString_temp = "", sub;
for(int i = 0; i < geneString.length(); i = i+8) {
sub = geneString.substring(i, i+8);
if (sub.indexOf("\n") != -1) {
if (sub.indexOf("\n") != geneString.length())
sub = sub.substring(0, sub.indexOf("\n")) + sub.substring(sub.indexOf("\n")+1, sub.length()) + geneString.charAt(i+9);
}
byte octet = (byte) Integer.parseInt(sub, 2);
System.out.println(octet);
geneString_temp = geneString_temp + octet;
}
geneString = geneString_temp + "\n";
return geneString;
}
In Java, byte is a signed type, meaning that when the most significant bit is set to 1, the number is interpreted as negative.
This is precisely what happens when you print your byte here:
System.out.println(octet);
Since PrintStream does not have an overload of println that takes a single byte, the overload that takes an int gets called. Since octet's most significant bit is set to 1, the number gets sign-extended by replicating its sign bit into bits 9..32, resulting in printout of a negative number.
byte is a signed two's complement integer. So this is a normal behavior: the two's complement representation of a negative number has a 1 in the most-significant bit. You could think of it like a sign bit.
If you don't like this, you can use the following idiom:
System.out.println( octet & 0xFF );
This will pass the byte as an int while preventing sign extension. You'll get an output as if it were unsigned.
Java has no unsigned byte type (char is its only unsigned integral primitive), so the only other thing you could do is store the numbers in a wider representation, e.g. short.
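To illustrate the masking idiom with the substring from the question, here is a small sketch (the class and method names are invented for the example). It shows the value before the cast to byte, after the cast, and after masking:

```java
final class BitStringDemo {
    // Parses an 8-character string of '0' and '1' into a value in 0..255.
    static int parseOctet(String bits) {
        return Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        int value = parseOctet("10001101");
        byte octet = (byte) value;

        System.out.println(value);        // 141
        System.out.println(octet);        // -115
        System.out.println(octet & 0xFF); // 141
    }
}
```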
In Java, all integers are signed, and the most significant bit is the sign bit.
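Note that Integer.parseInt(sub, 2) itself returns a positive int (141 for "10001101"); the value only becomes negative when it is cast to byte. Integer.parseUnsignedInt would return the same 141 here, so keep the result in an int, or mask the byte with & 0xFF, if you want the unsigned value.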
The question is about the correct way of creating a hash in Java:
Let's assume I have a positive BigInteger value that I would like to create a hash from. Let's assume that the messageDigest instance below is a valid instance of SHA-256.
public static final BigInteger B = new BigInteger("BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58", 16);
byte[] byteArrayBBigInt = B.toByteArray();
this.printArray(byteArrayBBigInt);
messageDigest.reset();
messageDigest.update(byteArrayBBigInt);
byte[] outputBBigInt = messageDigest.digest();
Now I only assume that the code below is correct, as according to the test the hashes I produce match with the one produced by:
http://www.fileformat.info/tool/hash.htm?hex=BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58
However, I am not sure why we are doing the step below. Because the byte array returned by the digest() call is signed, and in this case it is negative, I suspect that we need to convert it to a positive number, i.e. we can use a function like this:
public static String byteArrayToHexString(byte[] b) {
String result = "";
for (int i=0; i < b.length; i++) {
result += Integer.toString((b[i] & 0xff) + 0x100, 16).substring(1);
}
return result;
}
thus:
String hex = byteArrayToHexString(outputBBigInt);
BigInteger unsignedBigInteger = new BigInteger(hex, 16);
When I construct a BigInteger from the new hex string and convert it back to byte array then I see that the sign bit, that is most significant bit i.e. the leftmost bit, is set to 0 which means that the number is positive, moreover the whole byte is constructed from zeros ( 00000000 ).
My question is: Is there any RFC that describes why do we need to convert the hash always to a "positive" unsigned byte array. I mean even if the number produced after the digest call is negative it is still a valid hash, right? thus why do we need that additional procedure. Basically, I am looking for a paper: standard or rfc describing that we need to do so.
A hash consists of an octet string (called a byte array in Java). How you convert it to or from a large number (a BigInteger in Java) is completely out of the scope for cryptographic hash algorithms. So no, there is no RFC to describe it as there is (usually) no reason to treat a hash as a number. In that sense a cryptographic hash is rather different from Object.hashCode().
That you can only treat hexadecimals as unsigned is a bit of an issue, but if you really want to then you can first convert it back to a byte array, and then perform new BigInteger(result). That constructor does treat the encoding within result as signed. Note that in protocols it is often not needed to convert back and forth to hexadecimals; hexadecimals are mainly for human consumption, a computer is fine with bytes.
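As a hedged illustration of the two interpretations, the sketch below hashes the string "abc" (chosen only because its SHA-256 value is a well-known test vector; the class and method names are hypothetical). It builds a BigInteger from the digest both ways: new BigInteger(bytes) reads the bytes as two's complement, while new BigInteger(1, bytes) reads them as a non-negative magnitude without a detour through a hex string:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestAsNumber {
    static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] hash = sha256("abc".getBytes(StandardCharsets.UTF_8));

        BigInteger signed = new BigInteger(hash);      // two's complement view
        BigInteger unsigned = new BigInteger(1, hash); // magnitude view, never negative

        System.out.println(hash.length);           // 32
        System.out.println(signed.signum());       // -1, the first byte is 0xba
        System.out.println(unsigned.signum());     // 1
        System.out.println(unsigned.toString(16)); // hex form of the same 32 bytes
    }
}
```

Either way the 32 bytes of the hash are identical; only the numeric reading of them changes.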
I apologize if this question is a bit simplistic, but I'm somewhat puzzled as to why my professor has made the following statement:
Notice that read() returns an integer value. Using an int as a return type allows read() to use -1 to indicate that it has reached the end of the stream. You will recall from your introduction to Java that an int is equal to a char which makes the use of the -1 convenient.
The professor was referencing the following sample code:
public class CopyBytes {
public static void main(String[] args) throws IOException {
FileInputStream in = null;
FileOutputStream out = null;
try {
in = new FileInputStream("Independence.txt");
out = new FileOutputStream("Independence.txt");
int c;
while ((c = in.read()) != -1) {
out.write(c);
}
} finally {
if (in != null) {
in.close();
}
if (out != null) {
out.close();
}
}
}
}
This is an advanced Java course, so obviously I've taken a few introductory courses prior to this one. Maybe I'm just having a "blonde moment" of sorts, but I'm not understanding in what context an integer could be equal to a character when making comparisons. The instance method read() returns an integer value when it comes to EOF. That I understand perfectly.
Can anyone shed light on the statement in bold?
In Java, a char is a more specific kind of int. I can write:
char c = 65;
The following code prints "A". The cast is needed so that Java knows I want the character representation and not the integer one.
public static void main(String... str) {
System.out.println((char) 65);
}
You can look up the int to character mapping in an ASCII table.
And per your teacher, int allows for more values. Since -1 isn't a character value, it can serve as a flag value.
To a computer a character is just a number (that may at some point be mapped to a picture of a letter for display to the user). Languages usually have a special character type to distinguish between "just a number" and "a number that refers to a character", but inside, it's still just some sort of integer.
The reason why read() returns an int is to have "one extra value" to represent EOF. All the values of char are already defined to mean something else, so it uses a larger type to get more values.
It means your professor has been spending too much time programming in C. The definition of read for InputStream (and FileInputStream) is:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned.
(See http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read())
A char in Java, on the other hand, represents a Unicode character, and is treated as an integer in the range 0 to 65535. (In C, a char is an 8-bit integral value, either 0 to 255 or -128 to 127.)
Please note that in Java, a byte is actually an integer in the range -128 to 127; but the definition of read has been specified to avoid the problem, by decreeing that it will return 0 to 255 anyway. The javadoc is using "byte" in a loose sense here.
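A short sketch of why the wider return type matters, assuming an in-memory stream and made-up names: the byte 0xFF comes back from read() as 255, which stays distinguishable from the end-of-stream marker -1. If read() returned a byte, both would be -1.

```java
import java.io.ByteArrayInputStream;

final class EndOfStreamDemo {
    // Counts bytes until read() signals end of stream with -1.
    static int countBytes(ByteArrayInputStream in) {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, 0x41};
        ByteArrayInputStream in = new ByteArrayInputStream(data);

        System.out.println(in.read()); // 255, a real data byte
        System.out.println(in.read()); // 65
        System.out.println(in.read()); // -1, end of stream
        System.out.println(countBytes(new ByteArrayInputStream(data))); // 2
    }
}
```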
The char data type in Java is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
The int data type in Java is a 32-bit signed two's complement integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive).
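Since a char cannot be negative (it is a number between 0 and 65,535) and an int can be negative, the possible values returned from a character-based read(), such as Reader.read(), range from -1 (to signify nothing left) to 65,535 (the max value of a char). The byte-based InputStream.read() in the question returns -1 or a value from 0 to 255.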
What your professor is referring to is the fact that characters are just integers used in a special context. If we ignore Unicode and other encoding types and focus on the old days of ASCII, there was an ASCII table (http://www.asciitable.com/). A string of characters is really just a sequence of integers; for example, TUV would be 84 followed by 85 followed by 86.
The 'char' type is an integer internally in the JVM and is more or less a hint that this integer should only be used in a character context.
You can even cast between them.
char a = (char) 65;
int i = (int) 'A';
Those two variables hold the same data in memory, but the compiler and JVM treat them slightly differently.
Because of this, read() returns an integer instead of char so as to allow a -1, which is not a valid character code. Values other than -1 can be cast to a char, while -1 indicates EOF.
Of course, Unicode changes all of this with multi-byte characters and code points. I'll leave that as an exercise to you.
I am not sure what the professor means, but what it all comes down to is that computers only understand 1s and 0s. We don't understand 1s and 0s all that well, so we use a code system: first Morse code, then ASCII, now UTF-16. How large a number (int) can be varies from computer to computer; in the real world integers are infinite, they just keep counting. A char also has a size: in UTF-16, let's just say it's 16 bits (I will let you read up on that). So if char and int both took 16 bits, as the professor seems to say, they would be the same size, and reading one char would be the same as reading one int. By the way, to be strictly correct, the set of characters is unbounded as well: Chinese characters, French characters, and the character I just made up but can't post because it's not supported. So think of the code system for int and char: an int of -1 means EOF (end of file). Good luck, I hope this helped. What I don't understand is reading from and writing to the same file?
I have a file on disk which I'm reading which has been written by c/c++ code. I know I have two 64-bit unsigned integers to read, but Java doesn't support unsigned integers, so the value I get when I do DataInputStream.readLong() is incorrect. (Ignore byte-order for now I'm actually using a derivative of DIS called LEDataInputStream which I downloaded from the web)
A lot of posts on here talk about using BigInteger, but the javadoc for reading a byte array only talks about loading a byte array representation, and the questions seem centered on the fact that some people are going outside the positive bounds of the Java long type, which I will be nowhere near with the data I'm reading.
I have a MATLAB/Octave script which reads these long long values as two 32-bit integers each, then does some multiplying and adding to get the answer it wants.
I suppose the question is: how do I read a 64-bit unsigned integer, either using BigInteger or using [LE]DataInputStream.XXX?
Thanks in advance
I would suggest using a ByteBuffer and then using code such as this to get what you want.
You can use a long as a 64-bit value to store unsigned data. Here is a module showing that most unsigned operations can be performed using the standard long type. Whether this is a problem or not really depends on what you want to do with the value.
EDIT: A common approach to handling unsigned numbers is to widen the data type. This is simpler in many cases but not a requirement (and for long, using BigInteger doesn't make things any simpler IMHO).
EDIT2: What is wrong with the following code?
long max_unsigned = 0xFFFFFFFFFFFFFFFFl;
long min_unsigned = 0;
System.out.println(Unsigned.asString(max_unsigned) + " > "
+ Unsigned.asString(min_unsigned) + " is "
+ Unsigned.gt(max_unsigned, min_unsigned));
prints
18446744073709551615 > 0 is true
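As an alternative to a hand-written helper module, the sketch below assumes Java 8 or later, where Long gained unsigned helpers such as Long.toUnsignedString and Long.compareUnsigned. It also assumes the file stores the value in little-endian order; the class and method names are hypothetical. The long holds the raw 64 bits, and only printing and comparison need the unsigned variants:

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnsignedLongDemo {
    // Reads 8 little-endian bytes as the raw 64-bit pattern.
    static long readLittleEndian(byte[] eightBytes) {
        return ByteBuffer.wrap(eightBytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // Widens to BigInteger when arbitrary arithmetic on the unsigned value is needed.
    static BigInteger toBigInteger(long bits) {
        return new BigInteger(Long.toUnsignedString(bits));
    }

    public static void main(String[] args) {
        // 0xFFFFFFFFFFFFFFFE stored least significant byte first.
        byte[] raw = {(byte) 0xFE, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,
                      (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        long bits = readLittleEndian(raw);

        System.out.println(bits);                               // -2
        System.out.println(Long.toUnsignedString(bits));        // 18446744073709551614
        System.out.println(Long.compareUnsigned(bits, 1L) > 0); // true
        System.out.println(toBigInteger(bits));                 // 18446744073709551614
    }
}
```

For values that stay below Long.MAX_VALUE, as described in the question, the signed and unsigned readings agree and the plain long can be used directly.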
First, check out this question.
Also see this.
Now, using the BigInteger class:
// Get a byte array
byte[] bytes = new byte[]{(byte)0x12, (byte)0x0F, (byte)0xF0};
// Create a BigInteger using the byte array
BigInteger bi = new BigInteger(bytes);
// Format to binary
String s = bi.toString(2); // 100100000111111110000
// Format to octal
s = bi.toString(8); // 4407760
// Format to decimal
s = bi.toString(); // 1183728
// Format to hexadecimal
s = bi.toString(16); // 120ff0
if (s.length() % 2 != 0) {
// Pad with 0
s = "0"+s;
}
// Parse binary string
bi = new BigInteger("100100000111111110000", 2);
// Parse octal string
bi = new BigInteger("4407760", 8);
// Parse decimal string
bi = new BigInteger("1183728");
// Parse hexadecimal string
bi = new BigInteger("120ff0", 16);
// Get byte array
bytes = bi.toByteArray();