I have the following code to convert a hex string to bytes in Java:
String s = "longhex";
int len = s.length();
byte[] data = new byte[len / 2];
for (int i = 0; i < len; i += 2) {
    data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
                        + Character.digit(s.charAt(i + 1), 16));
}
Is this a correct way to reproduce it in Ruby?
s = "longhex"
bytes = []
(0..s.length / 2 - 1).step(2).each do |i|
  bytes[i / 2] = s[i].ord << 4 + s[i + 1].ord
end
No, it is not correct. << has lower operator precedence than +, so s[i].ord << 4 + s[i + 1].ord parses as s[i].ord << (4 + s[i + 1].ord). Note that even the Java code has parentheses around the shift. Also, it's not really Ruby; it's C written with an almost-Ruby syntax.
str.codepoints.
  each_slice(2).
  map { |f, l| (f << 4) + l }
would probably do what you want, but without seeing an expected outcome it’s hard to say.
The correct version, building on Ilya's answer, is:
str.scan(/./).
  each_slice(2).
  map { |f, l| (Integer(f, 16) << 4) + Integer(l, 16) }
I am working on a side project at work where I would like to read/write SAS Transport files. The challenge is that the numbers are encoded as 64-bit IBM floating-point numbers. While I have been able to find plenty of great resources for reading a byte array (containing an IBM float) into IEEE 32-bit and 64-bit floats, I'm struggling to find code to convert floats/doubles back to IBM floats.
I recently found some code for writing a 32-bit IEEE float back out to a byte array (containing an IBM float). It seems to be working, so I've been trying to translate it into a 64-bit version. I've reverse-engineered where most of the magic numbers come from, but I've been stumped for over a week now.
I have also tried to translate the functions listed at the end of the SAS Transport documentation to Java, but I've run into a lot of issues related to endianness, Java's lack of unsigned types, and so on. Can anyone provide code to convert doubles to IBM floating-point format?
Just to show the progress I've made, here are some shortened versions of the code I've written so far:
This grabs a 32-bit IBM float from a byte array and generates an IEEE float:
public static double fromIBMFloat(byte[] data, int offset) {
    int temp = readIntFromBuffer(data, offset);
    int mantissa = temp & 0x00FFFFFF;
    int exponent = ((temp >> 24) & 0x7F) - 64;
    boolean isNegative = (temp & 0x80000000) != 0;
    // value = 0.mantissa * 16^exponent = mantissa * 2^(4 * exponent - 24)
    double result = mantissa * Math.pow(2, 4 * exponent - 24);
    if (isNegative) {
        result = -result;
    }
    return result;
}
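As a quick sanity check (a hypothetical test of mine, assuming readIntFromBuffer reads the four bytes big-endian): the IBM single-precision pattern 0x41100000 has exponent byte 0x41 (65, i.e. 16^1) and fraction 0x100000 (2^20), giving 2^20 * 2^(4*1 - 24) = 1.0:

byte[] one = { 0x41, 0x10, 0x00, 0x00 }; // IBM single-precision 1.0
double d = fromIBMFloat(one, 0);         // expected: 1.0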
This is the same thing for 64-bit:
public static double fromIBMDouble(byte[] data, int offset) {
    long temp = readLongFromBuffer(data, offset);
    long mantissa = temp & 0x00FFFFFFFFFFFFFFL;
    long exponent = ((temp >> 56) & 0x7F) - 64;
    boolean isNegative = (temp & 0x8000000000000000L) != 0;
    // the 64-bit format has a 56-bit fraction, so the scale factor is -56, not -24
    double result = mantissa * Math.pow(2, 4 * exponent - 56);
    if (isNegative) {
        result = -result;
    }
    return result;
}
Great! These work for going to IEEE floats, but now I need to go the other way. This simple implementation seems to be working for 32-bit floats:
public static void toIBMFloat(double value, byte[] xport, int offset) {
    if (value == 0.0 || Double.isNaN(value) || Double.isInfinite(value)) {
        writeIntToBuffer(xport, offset, 0);
        return;
    }
    int fconv = Float.floatToIntBits((float) value);
    int fmant = (fconv & 0x007FFFFF) | 0x00800000; // mantissa with the implicit leading 1 made explicit
    int temp = (fconv & 0x7F800000) >> 23;         // biased IEEE exponent
    int t = (temp & 0xFF) - 126;                   // chosen so that value == fmant * 2^(t - 24)
    while ((t & 0x3) != 0) {                       // IBM exponents are powers of 16, so t must be a multiple of 4
        ++t;
        fmant >>= 1;
    }
    fconv = (fconv & 0x80000000) | (((t >> 2) + 64) << 24) | fmant;
    writeIntToBuffer(xport, offset, fconv);
}
Now, the only thing left is to translate that to work with 64-bit IBM floats. A lot of the magic numbers relate to the sizes of the IEEE 32-bit float's exponent (8 bits) and mantissa (23 bits). So for 64-bit, I just need to switch those to use the 11-bit exponent and 52-bit mantissa. But where does that 126 come from? And what is the point of the 0x3 in the while loop?
Any help breaking down the 32-bit version so I can implement a 64-bit version would be greatly appreciated.
I circled back and took another swing at the C implementations provided at the end of the SAS transport documentation. It turns out the issue wasn't with my implementation; it was an issue with my tests.
TL;DR
These are my 64-bit implementations:
public static void writeIBMDouble(double value, byte[] data, int offset) {
    long ieee8 = Double.doubleToLongBits(value);
    long ieee1 = (ieee8 >>> 32) & 0xFFFFFFFFL;
    long ieee2 = ieee8 & 0xFFFFFFFFL;
    writeLong(0L, data, offset);
    long xport1 = ieee1 & 0x000FFFFFL;
    long xport2 = ieee2;
    int ieee_exp = 0;
    if (xport2 != 0 || ieee1 != 0) {
        ieee_exp = (int) (((ieee1 >>> 16) & 0x7FF0) >>> 4) - 1023;
        int shift = ieee_exp & 0x3;
        xport1 |= 0x00100000L;
        if (shift != 0) {
            xport1 <<= shift;
            xport1 |= ((byte) (((ieee2 >>> 24) & 0xE0) >>> (5 + (3 - shift))));
            xport2 <<= shift;
        }
        xport1 |= (((ieee_exp >>> 2) + 65) | ((ieee1 >>> 24) & 0x80)) << 24;
    }
    if (-260 <= ieee_exp && ieee_exp <= 248) {
        long temp = ((xport1 & 0xFFFFFFFFL) << 32) | (xport2 & 0xFFFFFFFFL);
        writeLong(temp, data, offset);
        return;
    }
    writeLong(0xFFFFFFFFFFFFFFFFL, data, offset);
    if (ieee_exp > 248) {
        data[offset] = 0x7F;
    }
}
public static void writeLong(long value, byte[] buffer, int offset) {
    buffer[offset] = (byte) (value >>> 56);
    buffer[offset + 1] = (byte) (value >>> 48);
    buffer[offset + 2] = (byte) (value >>> 40);
    buffer[offset + 3] = (byte) (value >>> 32);
    buffer[offset + 4] = (byte) (value >>> 24);
    buffer[offset + 5] = (byte) (value >>> 16);
    buffer[offset + 6] = (byte) (value >>> 8);
    buffer[offset + 7] = (byte) value;
}
And:
public static double readIBMDouble(byte[] data, int offset) {
    long temp = readLong(data, offset);
    long ieee = 0L;
    long xport1 = temp >>> 32;
    long xport2 = temp & 0x00000000FFFFFFFFL;
    long ieee1 = xport1 & 0x00ffffff;
    long ieee2 = xport2;
    if (ieee2 == 0L && xport1 == 0L) {
        return Double.longBitsToDouble(ieee);
    }
    int shift = 0;
    int nib = (int) xport1;
    if ((nib & 0x00800000) != 0) {
        shift = 3;
    } else if ((nib & 0x00400000) != 0) {
        shift = 2;
    } else if ((nib & 0x00200000) != 0) {
        shift = 1;
    }
    if (shift != 0) {
        ieee1 >>>= shift;
        ieee2 = (xport2 >>> shift) | ((xport1 & 0x00000007) << (29 + (3 - shift)));
    }
    ieee1 &= 0xffefffff;
    ieee1 |= (((((long) (data[offset] & 0x7f) - 65) << 2) + shift + 1023) << 20) | (xport1 & 0x80000000);
    ieee = ieee1 << 32 | ieee2;
    return Double.longBitsToDouble(ieee);
}
public static long readLong(byte[] buffer, int offset) {
    long result = unsignedByteToLong(buffer[offset]) << 56;
    result |= unsignedByteToLong(buffer[offset + 1]) << 48;
    result |= unsignedByteToLong(buffer[offset + 2]) << 40;
    result |= unsignedByteToLong(buffer[offset + 3]) << 32;
    result |= unsignedByteToLong(buffer[offset + 4]) << 24;
    result |= unsignedByteToLong(buffer[offset + 5]) << 16;
    result |= unsignedByteToLong(buffer[offset + 6]) << 8;
    result |= unsignedByteToLong(buffer[offset + 7]);
    return result;
}

private static long unsignedByteToLong(byte value) {
    return (long) value & 0xFF;
}
These are basically a one-to-one translation of what's in the document, except that I convert the byte[] into a long up front and do bit-twiddling instead of working directly with bytes.
I also realized the code in the documentation had some special cases for "missing" values that are specific to the SAS transport standard and have nothing to do with IBM hexadecimal floating-point numbers. In fact, the Double.longBitsToDouble method detects the invalid bit sequence and just sets the value to NaN, so I moved this code out since it wasn't going to work anyway.
The good thing is that as part of this exercise I learned a lot of bit-manipulation tricks in Java. For instance, a lot of the sign-related issues I ran into were resolved by using the >>> operator instead of >>. Other than that, you just need to be careful when upcasting to mask with 0xFF, 0xFFFF, etc. so that the sign is ignored.
I also learned about ByteBuffer, which can facilitate converting back and forth between byte[] and primitives/strings, though with some minor overhead; it would also handle any endianness issues. It turns out endianness wasn't even a concern here, since the code above reads and writes the bytes one at a time in a fixed order regardless of the host architecture.
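For example, a sketch of a ByteBuffer-based equivalent of the readLong helper above (the method name is my own; ByteBuffer defaults to big-endian, which matches readLong):

import java.nio.ByteBuffer;

public static long readLongNio(byte[] buffer, int offset) {
    // ByteBuffer is big-endian by default, matching readLong above
    return ByteBuffer.wrap(buffer).getLong(offset);
}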
It seems reading/writing SAS transport files is a pretty common need, especially in the clinical trials arena, so hopefully anyone working in Java/C# won't have to go through the trouble I did.
What is the big O of bitCount? I'm not sure how the method works, but I would assume it runs in O(log n).
Specifically with this code (where x = 4, y = 1):
return Integer.bitCount(x^y);
Given its implementation, the method consists of six O(1) statements performed in sequence, so it's O(1).
public static int bitCount(int i) {
    // HD, Figure 5-2
    i = i - ((i >>> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
    i = (i + (i >>> 4)) & 0x0f0f0f0f;
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    return i & 0x3f;
}
It is O(1); here is its implementation for JDK 1.5+:
public static int bitCount(int i) {
    i = i - ((i >>> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
    i = (i + (i >>> 4)) & 0x0f0f0f0f;
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    return i & 0x3f;
}
Any algorithm that works on an input of bounded size has complexity O(1).
bitCount works on an input of bounded size (a 32-bit int).
Therefore bitCount has complexity O(1).
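To make the question's snippet concrete (my own worked example): x ^ y sets exactly the bits where x and y differ, and bitCount counts them, i.e. it computes the Hamming distance:

int x = 4;                              // 0b100
int y = 1;                              // 0b001
int distance = Integer.bitCount(x ^ y); // 4 ^ 1 == 0b101, so distance == 2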
Working with BitSets I have a failing test:
BitSet bitSet = new BitSet();
bitSet.set(1);
bitSet.set(100);
logger.info("BitSet: " + BitSetHelper.toString(bitSet));
BitSet fromByteArray = BitSetHelper.fromByteArray(bitSet.toByteArray());
logger.info("fromByteArray: " + BitSetHelper.toString(bitSet));
Assert.assertEquals(2, fromByteArray.cardinality());
Assert.assertTrue(fromByteArray.get(1)); <--Assertion fail!!!
Assert.assertTrue(fromByteArray.get(100)); <--Assertion fail!!!
Even weirder, I can see the String representation of both BitSets:
17:34:39.194 [main] INFO c.i.uniques.helper.BitSetHelperTest - BitSet: 00000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000
17:34:39.220 [main] INFO c.i.uniques.helper.BitSetHelperTest - fromByteArray: 00000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000
They are equal! What's happening here?
The methods used in this example are:
public static BitSet fromByteArray(byte[] bytes) {
    BitSet bits = new BitSet();
    for (int i = 0; i < bytes.length * 8; i++) {
        if ((bytes[bytes.length - i / 8 - 1] & (1 << (i % 8))) > 0) {
            bits.set(i);
        }
    }
    return bits;
}
And the method used to get the String representation:
public static String toString(BitSet bitSet) {
    StringBuffer buffer = new StringBuffer();
    for (byte b : bitSet.toByteArray()) {
        buffer.append(String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0'));
    }
    return buffer.toString();
}
Could someone explain what's going on here?
Note that BitSet has a valueOf(byte[]) that already does this for you.
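For example (BitSet.valueOf(byte[]) has been available since Java 7):

BitSet fromByteArray = BitSet.valueOf(bitSet.toByteArray()); // preserves bit order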
Inside your fromByteArray method
for (int i = 0; i < bytes.length * 8; i++) {
    if ((bytes[bytes.length - i / 8 - 1] & (1 << (i % 8))) > 0) {
        bits.set(i);
    }
}
you're traversing your byte[] in reverse. On the first iteration,
bytes.length - i / 8 - 1
will evaluate to
8 - (0 / 8) - 1
which is 7, which will access the most significant byte. This is the one containing the 100th bit from your original BitSet. Viewed from the reverse side, this is the fourth bit. And if you check the bits set in your generated BitSet, you'll notice the 5th and 98th (there might be an off-by-one bug here) bits are set.
But the byte[] returned by toByteArray() contains "a little-endian representation of all the bits in this bit set", so you need to read the byte[] in the appropriate order:
for (int i = 0; i < bytes.length * 8; i++) {
    if ((bytes[i / 8] & (1 << (i % 8))) > 0) {
        bits.set(i);
    }
}
I'm trying to send Java's signed integers over TCP to a C client.
On the Java side, I write the integers to the output stream like so:
static ByteBuffer wrapped = ByteBuffer.allocateDirect(4); // big-endian by default

public static void putInt(OutputStream out, int nr) throws IOException {
    wrapped.rewind();
    wrapped.putInt(nr);
    wrapped.rewind();
    for (int i = 0; i < 4; i++)
        out.write(wrapped.get());
}
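As an aside, a sketch of an alternative for the Java side: DataOutputStream.writeInt is specified to write the four bytes high byte first (big-endian, i.e. network order), so a variant of putInt can do the same thing without a ByteBuffer:

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public static void putInt(OutputStream out, int nr) throws IOException {
    // writeInt emits the int high byte first (network byte order)
    new DataOutputStream(out).writeInt(nr);
}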
On the C side, I read the integers like so:
int cnt = 0;
char buf[1];
char sizebuf[4];

while (cnt < 4) {
    iResult = recv(ConnectSocket, buf, 1, 0);
    if (iResult <= 0) continue;
    sizebuf[cnt] = buf[0];
    cnt++;
}
However, how do I convert the char array to an integer in C?
Edit
I have tried the following (and the reverse):
int charsToInt(char* array) {
    return (array[3] << 24) | (array[2] << 16) | (array[1] << 8) | array[0];
}
Data
For example, here is what currently happens.
I receive:
char 0
char 0
char 12
char -64
and the int becomes 2448 when I use this function to create the int from the char array:
int charsToInt(char* array) {
    return ntohl(*((int*) array));
}
I expect the signed integer: 3264
Update
I will investigate more after some sleep.
Update
I have a Java client which interprets the integers correctly and receives the exact same bytes:
0
0
12
-64
That depends on endianness, but you want either:
int x = sizebuf[0] +
        (sizebuf[1] << 8) +
        (sizebuf[2] << 16) +
        (sizebuf[3] << 24);
or:
int x = sizebuf[3] +
        (sizebuf[2] << 8) +
        (sizebuf[1] << 16) +
        (sizebuf[0] << 24);
Note that sizebuf needs to have an unsigned type for this to work correctly. Otherwise you need to mask off any sign-extended values you don't want:
int x = (sizebuf[3] & 0x000000ff) +
        ((sizebuf[2] << 8) & 0x0000ff00) +
        ((sizebuf[1] << 16) & 0x00ff0000) +
        ((sizebuf[0] << 24) & 0xff000000);
The classic C networking library already has the function you want, and it is independent of the machine endianness: ntohl!
// buf is a char * / uint8_t *
uint32_t from_network = *((uint32_t *) buf);
uint32_t ret = ntohl(from_network);
This, and htonl for the reverse etc., expect that the "network order" is big-endian.
(The code above presupposes that buf has at least 4 bytes; the return and argument types of ntohl and htonl are uint32_t; and the JLS defines an int as 4 bytes, so you are guaranteed the result fits.)
To convert your char array, one possibility is to cast it to int* and store the result:
int result = *((int*) sizebuf);
This is valid and one line. The other possibility is to compute the integer from the chars:
for (i = 0; i < 4; i++)
    result = result << sizeof(char) + buf[0];
Choose the one that you prefer.
Alexis.
Edit:
sizeof(char) is 1 because sizeof returns a size in bytes. Also, + binds tighter than << in C, and each byte should be indexed and treated as unsigned, so the right line is:
result = (result << (sizeof(char) * 8)) + (unsigned char) sizebuf[i];
I have a spec which says the next two bytes are a signed int.
When I read that in Java using the following code, I get a value of 65449.
Logic for the unsigned calculation:
int a = (bytes[1] & 0xff) << 8;
int b = (bytes[0] & 0xff) << 0;
int c = a + b;
I believe this is wrong, because ANDing with 0xff gives me the unsigned equivalent, so I removed the & 0xff, with the logic as given below:
int a = bytes[1] << 8;
int b = bytes[0] << 0;
int c = a + b;
which gives me the value -343, where bytes[1] = -1 and bytes[0] = -87.
I tried to offset these values to match the way the spec reads, but this looks wrong, since the resulting offset doesn't fall within the structure. What is the right way to do the signed int calculation in Java?
Here is how the spec goes:
somespec() {
    xtype   8   uint8
    xStyle  16  int16
}
xStyle: A signed integer that represents an offset (in bytes) from the start of this Widget() structure to the start of an xStyle() structure that expresses inherited styles defined by the page widget, as well as styles that apply specifically to this widget.
If your value is a signed 16-bit integer, you want a short; an int is 32-bit, which can also hold the same values, but not as naturally.
It appears you want a signed little-endian 16-bit value.
byte[] bytes =
short s = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getShort();
or
short s = (short) ((bytes[0] & 0xff) | (bytes[1] << 8));
BTW: You can use an int, but it's not so simple.
// to get a sign extension.
int i = ((bytes[0] & 0xff) | (bytes[1] << 8)) << 16 >> 16;
or
int i = (bytes[0] & 0xff) | ((short) (bytes[1] << 8));
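Plugging in the asker's bytes to check the arithmetic (my own example): bytes[0] = -87 (0xA9) and bytes[1] = -1 (0xFF) form the little-endian value 0xFFA9, which is 65449 unsigned and -87 signed:

byte[] bytes = { -87, -1 };                                // 0xA9, 0xFF
short s = (short) ((bytes[0] & 0xff) | (bytes[1] << 8));   // -87
int i = ((bytes[0] & 0xff) | (bytes[1] << 8)) << 16 >> 16; // -87 after sign extension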
Assuming that bytes[1] is the MSB and bytes[0] is the LSB, and that you want the answer as a 16-bit signed integer:
short res16 = (short) ((bytes[1] << 8) | (bytes[0] & 0xFF));
Then to get a 32-bit signed integer:
int res32 = res16; // sign extends.
By the way, the specification should say which of the two bytes is the MSB and which is the LSB. If it doesn't, and if there aren't any examples, you can't implement it!
Somewhere in the spec it will say how an "int16" is represented. Paste THAT part. Or paste a link to the spec so that we can read it ourselves.
Take a look at DataInputStream.readInt(). You can either steal the code from there or just use DataInputStream: wrap your input stream with it and then read typed data easily.
For your convenience, this is the code:
public final int readInt() throws IOException {
    int ch1 = in.read();
    int ch2 = in.read();
    int ch3 = in.read();
    int ch4 = in.read();
    if ((ch1 | ch2 | ch3 | ch4) < 0)
        throw new EOFException();
    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}
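(For the two-byte case in this question there is also DataInputStream.readShort(), which reads a signed big-endian 16-bit value; since the value here appears to be little-endian, the two bytes would have to be swapped first. A sketch, with a hypothetical helper name and assuming big-endian input:)

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

static short readSigned16(byte[] bytes) throws IOException {
    // reads two bytes, high byte first, as a signed 16-bit value
    return new DataInputStream(new ByteArrayInputStream(bytes)).readShort();
}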
I can't compile it right now, but I would do the following (assuming byte1 and byte0 are really of byte type):
int result = byte1;
result = result << 8;
result = result | (byte0 & 0xFF); // binary OR; mask so byte0 isn't sign-extended
if ((result & 0x8000) == 0x8000) { // sign extension
    result = result | 0xFFFF0000;
}
If byte1 and byte0 are ints, you will need the & 0xFF masking on both.
UPDATE: added parentheses around the &, because Java forces the expression of an if to be a boolean.
Do you have a way of finding a correct output for a given input?
Technically, an int is 4 bytes, so with just 2 bytes you can't reach the sign bit.
I ran across this same problem reading a MIDI file. A MIDI file has signed 16-bit as well as signed 32-bit integers. In a MIDI file, the most significant bytes come first (big-endian).
Here's what I did. It might be crude, but it maintains the sign. If the least significant bytes come first (little-endian), reverse the order of the indexes.
pos is the position in the byte array where the number starts.
length is the length of the integer, either 2 or 4. Yes, a 2-byte integer is a short, but we all work with ints.
private int convertBytes(byte[] number, int pos, int length) {
    int output = 0;
    if (length == 2) {
        output = ((int) number[pos]) << 24;
        output |= convertByte(number[pos + 1]) << 16;
        output >>= 16; // arithmetic shift back down preserves the sign
    } else if (length == 4) {
        output = ((int) number[pos]) << 24;
        output |= convertByte(number[pos + 1]) << 16;
        output |= convertByte(number[pos + 2]) << 8;
        output |= convertByte(number[pos + 3]);
    }
    return output;
}
private int convertByte(byte number) {
    return (int) number & 0xff;
}