I'm trying to read a 31-bit integer from an InputStream in Java and I can't figure out a way to do it. I receive four bytes from the InputStream; the first bit of the first byte is a reserved bit which is always unset (0x0), and the rest is a 31-bit integer. Here is a visualization of what I described:
+-+-------------+---------------+-------------------------------+
|R|                     31 bit long Integer                     |
+-+-------------------------------------------------------------+
I would appreciate it if you could help me come up with a solution. Thanks!
I'm trying to read a 31 bit long Integer from an InputStream
That is impossible.
The smallest unit you can read from a stream is a byte, which is 8 bits; everything you read is a multiple of 8 bits.
and the first bit of the first byte is a reversed byte which is always onset (0x0)
This sentence doesn't make any sense. The first 'bit of a byte' cannot be a 'reversed byte'. Given that bits are a 1-dimensional concept, there is no such thing as a 'reversed bit', and 'onset', if it means anything, means '1' and not '0', and bits are not as a rule communicated in '0x' syntax, which is hexadecimal.
I conclude you must be confused about the API.
However, to be a bit more helpful: If you have 4 bytes of data that contains a 31-bit-length integer, then:
You need to know if it is 'big endian' or 'little endian'. It will be someplace in the docs; usually protocols are big endian.
That first bit can trivially be stripped away or isolated, which should help.
Assuming big endian:
try (InputStream raw = socket.getInputStream();
        DataInputStream data = new DataInputStream(raw)) {
    int v = data.readInt();
    boolean isolatedBit = (v >>> 31) != 0;
    v = v & 0x7FFFFFFF;
}
DataInputStream has the readInt() call that takes care of business.
isolatedBit will be false if that 'R' bit is unset, and true if it is set.
Even if this R thing is set, that last line will ensure that the value of v has that bit unset. As a consequence, the number will be between 0 and 2^31-1 (thus, always positive).
NB: After some corrections to the original question, this is much simpler:
Given that the reserved bit is always unset, you can just call int v = data.readInt(), that's the only thing in the try block that would then be required. Had the 'reserved bit' always been a 1 - you would need that & 0x7FFFFFFF to get rid of it.
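For illustration, the same read can also be done without DataInputStream by assembling the four bytes by hand. This is only a sketch: the class name Int31Reader and the sample bytes are made up for the example, and big-endian byte order is assumed, as in the answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of any library.
class Int31Reader {
    /** Reads four big-endian bytes and returns the low 31 bits as a non-negative int. */
    static int readInt31(InputStream in) {
        try {
            int value = 0;
            for (int i = 0; i < 4; i++) {
                int b = in.read();          // 0..255, or -1 at end of stream
                if (b < 0) {
                    throw new EOFException("stream ended after " + i + " bytes");
                }
                value = (value << 8) | b;   // earlier bytes are more significant
            }
            return value & 0x7FFFFFFF;      // drop the reserved top bit
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The reserved bit is set on purpose here to show that the mask removes it.
        byte[] sample = {(byte) 0x80, 0x00, 0x01, 0x02};
        System.out.println(readInt31(new ByteArrayInputStream(sample))); // prints 258
    }
}
```

If the reserved bit really is always 0, the mask is a no-op and only guards against a misbehaving sender.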
Related
I have String hash in hex form ("e6fb06210fafc02fd7479ddbed2d042cc3a5155e") and I would like to compare it to crypt.digest().
One way, which works fine, is to convert crypt.digest() to hex, but I would like to avoid multiple conversions and rather convert hash from hex form (above) to byte array.
What I tried was:
byte[] hashBytes = new BigInteger(hash, 16).toByteArray();
but it does not match with crypt.digest(). When I convert hashBytes back to hex I get "00e6fb06210fafc02fd7479ddbed2d042cc3a5155e".
The leading zeros seem to be the reason why I fail to match byte arrays. Why do they occur? How can I get the same result using crypt.digest() and toByteArray?
The reason for the extra 00 is that e6 has its high (sign) bit set.
BigInteger.toByteArray() returns a two's-complement representation, so it adds a redundant 00 byte to keep the value positive.
String hash = "e6fb06210fafc02fd7479ddbed2d042cc3a5155e";
byte[] hashBytes = new BigInteger(hash, 16).toByteArray();
hashBytes = hashBytes.length > 1 && hashBytes[0] == 0
? Arrays.copyOfRange(hashBytes, 1, hashBytes.length) : hashBytes;
System.out.println(Arrays.toString(hashBytes));
The question arises, what if the hash actually starts with a 00?
Then you need the hash length, or do a lenient comparison.
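If the expected length is known, one way to handle both cases (an extra 00 added by BigInteger, or leading 00 bytes that BigInteger dropped) is to copy the bytes right-aligned into an array of that length. The sketch below assumes a non-negative value that fits in the requested length; FixedLengthBytes and toFixedLength are names invented for the example.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration; not part of any library.
class FixedLengthBytes {
    /** Returns the magnitude of a non-negative value as exactly `length` big-endian bytes. */
    static byte[] toFixedLength(BigInteger value, int length) {
        byte[] raw = value.toByteArray();       // may carry one extra leading 00 sign byte
        byte[] out = new byte[length];          // zero-filled, so missing leading bytes stay 00
        int copy = Math.min(raw.length, length);
        // Copy the least significant bytes, right-aligned in the output.
        System.arraycopy(raw, raw.length - copy, out, length - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        String hash = "e6fb06210fafc02fd7479ddbed2d042cc3a5155e";
        byte[] bytes = toFixedLength(new BigInteger(hash, 16), hash.length() / 2);
        System.out.println(bytes.length); // prints 20
    }
}
```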
The answer can be found in the following answer from a thread about the highly related question Convert a string representation of a hex dump to a byte array using Java?:
The issue with BigInteger is that there must be a "sign bit". If the leading byte has the high bit set then the resulting byte array has an extra 0 in the 1st position. But still +1.
– Gray Oct 28 '11 at 16:20
Since the first bit has a special meaning (indicating the sign, 0 for positive, 1 for negative), BigInteger will prefix the data with an additional 0 in case your data started with a 1 on the high bit. Otherwise it would be interpreted as negative although it was not negative to begin with.
I.e. data like
101110
is turned into
0101110
You could easily undo this manually by using Arrays.copyOfRange(data, 1, data.length) if it happens.
However, instead of fixing that code, I would suggest using one of the other solutions posted in the linked thread. They are cleaner and easier to read and maintain.
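As a rough sketch of what a BigInteger-free conversion can look like, the following decodes two hex digits per byte, so leading zeros and high bits need no special handling. HexDecoder and fromHex are names chosen for this example; this is not claimed to be the exact code from the linked thread.

```java
// Hypothetical helper for illustration; not part of any library.
class HexDecoder {
    /** Converts a hex string (two digits per byte) into a byte array. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);      // -1 if not a hex digit
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] hashBytes = fromHex("e6fb06210fafc02fd7479ddbed2d042cc3a5155e");
        System.out.println(hashBytes.length); // prints 20
    }
}
```

The result can then be compared against the digest with Arrays.equals or MessageDigest.isEqual.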
I am trying to port my C++ project to Java.
Down the line I have to read some bytes from a serial port and combine two bytes into a short. Things are working elegantly with the code below.
ByteBuffer bb = ByteBuffer.allocate(2);
bb.order(ByteOrder.LITTLE_ENDIAN);
bb.put(b1);
bb.put(b2);
result = bb.getShort(0);
In the C++ project they use unsigned short instead of short (Java has no unsigned short).
So the result of my logic above does not match the C++ result in the cases below:
b1 = 106, b2 = -1: C++ result = 150 and Java = -150
b1 = -6, b2 = -1: C++ (VC++) result = 506 and Java = -6
However, if only the first byte is negative, then my results are the same:
b1 = -12, b2 = 1: C++ result = 500 and Java = 500
I want to align my result with C++. Any suggestions and help would be highly appreciated.
Don't use getShort(), use getInt() instead.
Your problem is that the most significant bit is being interpreted as the sign bit due to the fact that this is the case for a signed short. If you extend the data type so that bit is no longer considered the sign bit, you should get the behaviour you want.
You may need to pad out your buffer to do this:
ByteBuffer bb = ByteBuffer.allocate(4);
bb.order(ByteOrder.LITTLE_ENDIAN);
bb.put(b1);
bb.put(b2);
bb.put((byte) 0);
bb.put((byte) 0);
result = bb.getInt(0);
EDIT: This would work if only the C++ code was doing the right thing, but as pointed out in the comments above, it isn't (i.e. it is NOT simply taking the two bytes and combining them into a short. It performs some other transform on them as well).
Java's -6 (0xFFFA) is correct for b1 = -6 (0xFA) and b2 = -1 (0xFF); C++'s 506 (0x01FA) is not. It seems you mistyped b2: 506 corresponds to b2 = 1, not -1.
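For reference, the unsigned 16-bit value can also be computed directly with masks and a shift, without a ByteBuffer. This is a minimal sketch assuming little-endian order (b1 is the low byte), as in the question; UnsignedShortDemo and toUnsignedShort are made-up names.

```java
// Hypothetical helper for illustration; not part of any library.
class UnsignedShortDemo {
    /** Combines two bytes (low byte first) into an unsigned 16-bit value in 0..65535. */
    static int toUnsignedShort(byte low, byte high) {
        // & 0xFF undoes the sign extension that happens when a byte is widened to int.
        return (low & 0xFF) | ((high & 0xFF) << 8);
    }

    public static void main(String[] args) {
        System.out.println(toUnsignedShort((byte) -12, (byte) 1));         // prints 500
        System.out.println(toUnsignedShort((byte) -6, (byte) -1));         // prints 65530
        System.out.println((short) toUnsignedShort((byte) -6, (byte) -1)); // prints -6
    }
}
```

With the original two-byte buffer, Short.toUnsignedInt(bb.getShort(0)) (Java 8 and later) gives the same unsigned value.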
This is an excerpt of code from a music tuner application. A byte[] array is created, audio data is read into the buffer arrays, and then the for loop iterates through buffer and combines the values at indices n,n+1, to create an array of 16-bit numbers that is half the length.
byte[] buffer = new byte[2*1200];
targetDataLine.read(buffer, 0, buffer.length);
for ( int i = 0; i < n; i+=2 ) {
    int value = (short)((buffer[i]&0xFF) | ((buffer[i+1]&0xFF) << 8)); //**Don't understand**
    a[i >> 1] = value;
}
So far, what I have is this:
From a different SO post, I learned that every byte being stored in a larger type must be & with 0xFF, due to its conversion to a 32-bit number. I guess the leading 24 bits are filled with 1s (though I don't know why it isn't filled with zeros... wouldn't leading with 1s change the value of the number? 000000000010 (2) is different from 111111110010 (-14), after all.), so the purpose of 0xff is to only grab the last 8 bits (which is the whole byte).
When buffer[i+1] is shifted left by 8 bits, this makes it so that, when ORing, the eight bits from buffer[i+1] are in the most significant positions, and the eight bits from buffer[i] are in the least significant eight bits. We wind up with a 16-bit number that is of the form buffer[i+1] + buffer[i]. (I'm using + but I understand it's closer to concatenation.)
First, why are we ORing buffer[i] | buffer[i+1] << 8? This seems to destroy the original sound information unless we pull it back out in the same way; while I understand that OR will combine them into one value, I don't see how that value can be useful or used in calculations later. And the only way this data is accessed later is as its literal values:
diff += Math.abs(a[j]-a[i+j]);
If I have 101 and 111, added together I should get 12, or 1100. Yet 101 | 111 << 3 gives 111101, which is equal to 61. The closest I got to understanding was that 101 (5) | 111000 (56) is the same as adding 5+56=61. But the order matters -- doing the reverse 101 <<3 | 111 is completely different. I really don't understand how the data can remain useful, when it is OR'd in this way.
The other problem I'm having is that, because Java uses signed bytes, the eighth position doesn't indicate the value, but the sign. If I'm ORing two binary signed numbers, then in the resulting 16-bit number, the bit at 2⁷ is now acting as a value instead of a placeholder. If I had a negative byte before running the OR, then in my final value post-operation, it would now erroneously be acting as though the original number had a positive 2⁷ in it. 0xff doesn't get rid of this, because it preserves the eighth (sign) bit, so shouldn't this be a problem?
For example, 1111 (-1) and 0101, when OR'd, might give 01011111. But 1111 wasn't representing POSITIVE 1111, it was representing the signed version; yet in the final answer, it now is acting as a positive 2³.
UPDATE: I marked the accepted answer, but it took that + a little extra work to figure out where I went wrong. For anyone who may read this in the future:
As far as the signing goes, the code I have uses signed bytes. My only guess as to why this doesn't mess anything up is because all of the values received might be of positive sign. Except that this doesn't make sense, given that a waveform varies in amplitude over [-1,1]. I'm going to play around with this to try and figure it out. If there are negative signs, the implementation of the code here doesn't seem to remove the 1 when ORing, so I suspect that it doesn't affect the computation too much (given that we're dealing with really large values: diff += means diff will be really large, so a few extra 1s shouldn't hurt the outcome given the code and the comparisons it relies on). So this was all wrong. I gave it some more thought and it's really simple, actually -- the only reason this was such a problem is because I didn't know about big-endian, and then once I read about it, I misunderstood exactly how it is implemented. Endianness is explained in the next bullet point.
Regarding the order in which the bits are placed, destroying the sound, etc. The code I'm using sets bigEndian=false, meaning that the byte order goes from least significant byte to most significant byte. For this reason, combining the two indices of buffer requires taking the second index, placing its bits first, and placing the first index as second (so we are now in big-endian byte order). One of the problems I had was the impression that "endianness" determines the bit order. I thought 10010101 big-endian would become 10101001 little-endian. Turns out this is not the case -- the bits in each byte remain in their original order; the difference is that the bytes are ordered "backward". So 10110101 11100001 big-endian becomes 11100001 10110101 little-endian -- same bit order within each byte; however, different byte order.
Finally, I'm not sure why, but the accepted answer is correct: targetDataLine.read() may place the bits into a byte array only (not just in my code, but in all Java code using targetDataLine -- read() only accepts arguments where the destination var is a byte array), but the data is in fact one short split into two bytes. It is for this reason that every two indices must be combined together.
Coming back to the signing, it should be obvious by now why this isn't an issue. These are the comments that I now have in the code, which explain more coherently what it took all of the above to explain:
/* The Javadoc explains that the targetDataLine will only read to a byte-typed array.
However, because the sample size is 16-bit, it is actually storing 16-bit numbers
there (shorts), auto-parsing them every eight bits. Additionally, because it is storing
them in little-endian, bits [2^0,2^7] are stored in index[i] in normal order (powers 76543210)
while bits [2^8,2^15] are stored in index[i+1]. So, together they currently read as [7-6-5-4-3-2-1-0 15-14-13-12-11-10-9-8],
which is a problem. In the next for loop, we take care of this and re-organize the bytes by swapping every pair (remember the bits are ok, but the bytes are out of order).
Also, although the array is signed, this will not matter when we combine bytes, because the sign-bit (2^15) will be placed
back at the beginning like it normally is; although 2^7 currently exists as the most significant bit in its byte,
it is not a sign-indicating bit,
because it is really the middle of the short which was split. */
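To make the point about the OR not destroying information concrete, here is a small round-trip sketch: a 16-bit sample is split into two little-endian bytes and then rebuilt with the same expression used in the loop above. SampleRoundTrip, split and join are names invented for the example, and the sample values are arbitrary.

```java
// Hypothetical helper for illustration; not part of any library.
class SampleRoundTrip {
    /** Splits a 16-bit sample into {low byte, high byte}, i.e. little-endian order. */
    static byte[] split(short sample) {
        return new byte[] {(byte) sample, (byte) (sample >> 8)};
    }

    /** Rebuilds the sample; the sign of the result comes only from bit 15 (top bit of the high byte). */
    static short join(byte low, byte high) {
        return (short) ((low & 0xFF) | ((high & 0xFF) << 8));
    }

    public static void main(String[] args) {
        short[] samples = {-1234, 0, 1, 32767, -32768};
        for (short s : samples) {
            byte[] b = split(s);
            // Each line shows the original sample, its two bytes, and the rebuilt value.
            System.out.println(s + " -> [" + b[0] + ", " + b[1] + "] -> " + join(b[0], b[1]));
        }
    }
}
```

An equivalent way to read the pair is ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getShort(0).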
This is combining the byte stream from input in low bytes first byte order to a stream of shorts in internal byte order.
With sign extension it is more a question of the sign encoding of the original byte stream. If the original byte stream is unsigned (coding values from 0 to 255), then the & 0xFF overcomes the unwanted effects of Java treating the values as signed. So an educated guess is that the external byte stream encodes unsigned bytes.
Judging whether the code is plausible needs information on what external encoding is being read and what internal encoding is used. E.g. (wild guess, could be totally wrong!): the two byte chunks read could belong to 2 channels of a stereo sound encoding and are put into a single short for ease of internal processing. You should look at the encoding being read and the use of the converted data within the application.
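The effect of the & 0xFF mask can be seen in a few lines. This sketch uses an arbitrary byte pair (0xF2, 0x01) and made-up method names; it shows how sign extension fills the upper bits with 1s when a negative byte is widened, and how that wipes out the high byte if the mask is left off.

```java
// Hypothetical helper for illustration; not part of any library.
class SignExtensionDemo {
    /** Plain widening: a negative byte becomes a negative int (upper 24 bits set to 1). */
    static int widenSigned(byte b) {
        return b;
    }

    /** Masked widening: keeps only the low 8 bits, giving 0..255. */
    static int widenUnsigned(byte b) {
        return b & 0xFF;
    }

    static short joinWithoutMask(byte low, byte high) {
        return (short) (low | (high << 8));   // sign bits of `low` overwrite the high byte
    }

    static short joinWithMask(byte low, byte high) {
        return (short) ((low & 0xFF) | ((high & 0xFF) << 8));
    }

    public static void main(String[] args) {
        byte low = (byte) 0xF2, high = 0x01;
        System.out.println(widenSigned(low));            // prints -14
        System.out.println(widenUnsigned(low));          // prints 242
        System.out.println(joinWithoutMask(low, high));  // prints -14 (high byte lost)
        System.out.println(joinWithMask(low, high));     // prints 498 (0x01F2)
    }
}
```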
I am receiving some numerical data from a Java client via a socket connection on a C++ server. When I receive 4-byte int data, all I need is to call ntohl() or reverse the byte order to convert it to a C++ int. However, I'm having trouble converting the long data type from Java. No matter what I tried, I could not recover the correct value. I used LONG64, ULONG64 and int64_t as well, and none of them worked.
For example, when I send long s = 1 from Java, on C++ side I did:
int64_t size;
recv(client, (char *)&size, sizeof(int64_t), 0);
if I do
size = ntohl(size)
Then size will become 0 whatever the original long value is in Java !
If I don't do ntohl() conversion, then size = 72057594037927936 for s = 1
I have hardly found any useful information on this topic and I would appreciate any suggestion.
The value 72057594037927936 is 0x0100000000000000 in Hex. As you may have guessed, that's simply backwards byte ordering, the 1 is in front instead of back.
ntohl() is 32-bit, so it is throwing out those top four bytes (the first 8 hex digits), giving you zero. You could possibly use htonll instead, but that isn't quite right. The best thing is to reverse the order of the bytes yourself.
int64_t size;
recv(client, (char *)&size, sizeof(int64_t), 0);
char *start = (char *)&size, *end = start + sizeof(size);
std::reverse(start, end);
There are a ton of ways of reversing the bytes, and a ton of ways of dealing with little/big endian problems in general.
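The arithmetic can be checked from the Java side. The sketch below assumes the client writes the long in big-endian order (which is what DataOutputStream.writeLong and a default ByteBuffer do; how the client in the question actually writes it is not stated). It shows that byte-reversing 1 gives exactly the value seen on the server, and how the sender could emit little-endian bytes instead. LongByteOrderDemo and its methods are made-up names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of any library.
class LongByteOrderDemo {
    /** Encodes v as 8 bytes, most significant byte first (ByteBuffer's default order). */
    static byte[] bigEndian(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    /** Encodes v as 8 bytes, least significant byte first. */
    static byte[] littleEndian(long v) {
        return ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array();
    }

    public static void main(String[] args) {
        System.out.println(Long.reverseBytes(1L));               // prints 72057594037927936
        System.out.println(Arrays.toString(bigEndian(1L)));      // prints [0, 0, 0, 0, 0, 0, 0, 1]
        System.out.println(Arrays.toString(littleEndian(1L)));   // prints [1, 0, 0, 0, 0, 0, 0, 0]
    }
}
```

Sending little-endian bytes only helps if the receiving machine is known to be little-endian; reversing on the receiver, as above, keeps the wire format in the usual network order.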
I have a Huffman coding project in which the first step is to obtain the code of each character from the Huffman tree. For example: a = 01, b = 101, c = 111. These codes are Strings, and I want to save them in binary in a file with a .cmp extension. For example, for the text abc the encoding is 01101111. How can I write this to a .cmp file as binary values, and afterwards read it back and decode it?
Hopefully you know that bytes and integers consist of bits, so you just need to build a little queue of bits that is a single integer containing the bits and another integer that tracks the number of bits in the first integer, accumulating bits using the shift and or operators. Once you have accumulated a byte, write it out and shift it out of your queue. E.g. to put n bits in: buf |= val << bits; bits += n;, and then to pull bits out if you have enough: while (bits >= 8) { write_out(buf & 0xff); buf >>= 8; bits -= 8; }. Make sure that your integer is large enough to handle the largest value of n you will have. I.e., buf needs to be able to hold maxn+7 bits, since the while loop will never leave more than 7 bits in the buffer.
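The bit queue described above could look roughly like this in Java. It is a sketch under some assumptions: the class BitQueue and its methods are invented for the example, output goes to an in-memory ByteArrayOutputStream rather than a .cmp file, bits are packed least significant bit first (as the shifts above imply), and the reader is told the total number of bits so it can ignore the zero padding in the last byte.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration; not part of any library.
class BitQueue {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private long buf;   // pending bits; bit 0 is the oldest
    private int bits;   // number of pending bits (always below 8 between calls)

    /** Appends the low n bits of val (n up to 32). */
    void put(int val, int n) {
        buf |= (val & ((1L << n) - 1)) << bits;
        bits += n;
        while (bits >= 8) {             // write out every completed byte
            out.write((int) (buf & 0xFF));
            buf >>>= 8;
            bits -= 8;
        }
    }

    /** Writes any leftover bits, zero-padded to a full byte, and returns all bytes. */
    byte[] finish() {
        if (bits > 0) {
            out.write((int) (buf & 0xFF));
            buf = 0;
            bits = 0;
        }
        return out.toByteArray();
    }

    /** Packs a string of '0'/'1' characters, one bit per character. */
    static byte[] pack(String bitString) {
        BitQueue q = new BitQueue();
        for (char c : bitString.toCharArray()) {
            q.put(c == '1' ? 1 : 0, 1);
        }
        return q.finish();
    }

    /** Reads bitCount bits back in the order they were written. */
    static String unpack(byte[] data, int bitCount) {
        StringBuilder sb = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            sb.append((data[i / 8] >> (i % 8)) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("01101111");                            // "abc" with a=01, b=101, c=111
        System.out.println(String.format("%02x", packed[0] & 0xFF)); // prints f6
        System.out.println(unpack(packed, 8));                       // prints 01101111
    }
}
```

To store this in a file, the bit count (or the original text length) would have to be written alongside the packed bytes, since the padding bits are otherwise indistinguishable from data.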
If you want to work with bit streams, it is easier to use an existing framework, for instance JBBP (Java Binary Block Parser), which has a JBBPBitOutputStream class providing bit write operations (there is also a JBBPBitInputStream class to read bits from streams).