Base64 and binary streams between Java and C# [closed] - java

I feel like the answer is obvious but suppose I have the following in C#
using (MemoryStream ms = new MemoryStream())
{
using (BinaryWriter bw = new BinaryWriter(ms))
{
// Write some floats, bytes, and uints
// Convert.ToBase64String this stuff from ms.ToArray
}
}
and the following in Java (ok it's Scala but using Java libraries):
val byteStream = new ByteArrayOutputStream()
val outStream = new DataOutputStream(byteStream)
// Write some floats, bytes, and longs where the uints were using
// writeFloat, writeByte, and writeLong. .NET has an overloaded
// function that takes whatever.
// Base64.getEncoder.encodeToString byteStream.toByteArray
I get completely different Base64 strings. What are these two doing differently? I need the Java output to match the .NET output. I assume it's some sort of byte-ordering issue, but I haven't had any luck using ByteBuffer to correct it.
Java:
PczMzT3MzM0/gAAAPczMzQAAAAAAAAAAAAAAAD3MzM0/gAAAAQAAAABRn8XzAAAAAAAAAAEAAAAAAAAAAQ==
C# (the trailing = padding signs are unknown because we chop them off for unrelated reasons):
zczMPc3MzD0AAIA/zczMPQAAAAAAAAAAAAAAAM3MzD0AAIA/AfPFn1EBAAAAAQAAAA
I really feel as though it is byte ordering, which is why I tried ByteBuffer's order method in the Java code to change the ordering, but I did not have success.
For further clarity, the Java code is running on x86_64 CentOS with Java 7, and the .NET code is on x86_64 Windows Server 2008 with .NET 4. These values are coming from Protobuf objects, so they should be fairly cross-platform. Numerically the data is identical and consistent regardless of what I put in, at least for these three data types. The only significant difference is the lack of an unsigned type in Java, and perhaps there is a binary representation difference, which is what I was initially trying to resolve, but I have not been able to figure it out.
As I have said, using another format is not an option; I need the binary data written from Java and then Base64-encoded to produce the same result as .NET. Serialization choices are not an option either. I need a resource that explains the real differences between the data types, or how to do the necessary binary manipulation of the byte data, and since I have searched extensively and not found one, I decided to ask here.

The main problem is that C#'s BinaryWriter writes each data type's low-order bytes first (little-endian), whereas Java's DataOutputStream writes the high-order bytes first (big-endian).
Also, writing a .NET unsigned integer produces 4 bytes, but writing a Java long produces 8 bytes, so that's another difference right there.
But fixing them to match is actually not that hard once you understand the differences. Here are 2 code snippets, one in C#, and the other in Java that encode the same information and output the same Base64-encoded string. In my case, I chose to override how Java writes the floats and longs.
.NET code sample
static void Main(string[] args)
{
using (MemoryStream ms = new MemoryStream())
{
using (BinaryWriter bw = new BinaryWriter(ms))
{
// floats
bw.Write(-456.678f);
bw.Write(0f);
bw.Write(float.MaxValue);
// bytes
bw.Write((byte)0);
bw.Write((byte)120);
bw.Write((byte)255);
// uints
bw.Write(0U);
bw.Write(65000U);
bw.Write(4294967295U);
}
var base64String = Convert.ToBase64String(ms.ToArray());
Console.WriteLine(base64String);
}
}
Java code sample
public static void main(String[] args) throws Exception {
try (ByteArrayOutputStream byteStream = new ByteArrayOutputStream()) {
try (DataOutputStream outStream = new DataOutputStream(byteStream)) {
// floats
writeFloat(-456.678f, outStream);
writeFloat(0f, outStream);
writeFloat(Float.MAX_VALUE, outStream);
// bytes
outStream.writeByte(0);
outStream.writeByte(120);
outStream.writeByte(255);
// longs (uints)
writeUint(0L, outStream);
writeUint(65000L, outStream);
writeUint(4294967295L, outStream);
}
String base64String = Base64.getEncoder().encodeToString(byteStream.toByteArray());
System.out.println(base64String);
}
}
// Writes the float's IEEE 754 bit pattern in little-endian byte order,
// matching what .NET's BinaryWriter produces for a float.
private static void writeFloat(float f, DataOutputStream stream) throws Exception {
int val = Float.floatToIntBits(f);
stream.writeByte(val & 0xFF);
stream.writeByte((val >>> 8) & 0xFF);
stream.writeByte((val >>> 16) & 0xFF);
stream.writeByte((val >>> 24) & 0xFF);
}
// Writes the low 32 bits of the long in little-endian byte order,
// matching what .NET's BinaryWriter produces for a uint.
private static void writeUint(long val, DataOutputStream stream) throws Exception {
stream.writeByte((int) (val & 0xFF));
stream.writeByte((int) ((val >>> 8) & 0xFF));
stream.writeByte((int) ((val >>> 16) & 0xFF));
stream.writeByte((int) ((val >>> 24) & 0xFF));
}
Output for both samples
yVbkwwAAAAD//39/AHj/AAAAAOj9AAD/////
Make sure you test edge cases with the float type and make adjustments where necessary. If it matters to you, I expect funny values like NaN to cause differences, but maybe you don't care about that. Otherwise, I expect that it will work fine.
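As an aside (my addition, not part of the answer above): if you would rather not hand-roll the byte shuffling, java.nio.ByteBuffer can be set to little-endian order directly. A minimal sketch using the same values; for ordinary (non-NaN) floats it should produce the same byte layout:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;
public class LittleEndianSketch {
    public static void main(String[] args) {
        // 3 floats + 3 bytes + 3 "uints" = 27 bytes
        ByteBuffer buf = ByteBuffer.allocate(27).order(ByteOrder.LITTLE_ENDIAN);
        buf.putFloat(-456.678f).putFloat(0f).putFloat(Float.MAX_VALUE);
        buf.put((byte) 0).put((byte) 120).put((byte) 255);
        // unsigned ints are written as the low 32 bits of a long
        buf.putInt((int) 0L).putInt((int) 65000L).putInt((int) 4294967295L);
        System.out.println(Base64.getEncoder().encodeToString(buf.array()));
    }
}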

How to implement cross-platform binary communication:
define exact byte format
implement in each platform
Frequently you can simplify both steps by using an off-the-shelf protocol that is close to your needs (like https://en.wikipedia.org/wiki/BSON) and is supported on all the platforms you are interested in.
Note that the basic binary serialization types in a given language/framework generally target only that language/framework (and often a particular version of it), since that gives speed/size benefits and there is no well-accepted standard for "binary object representation".
An alternative approach is to use well-defined text formats like JSON/XML, as suggested in another answer.
Some possible technical differences between binary formats:
serialization of integer types can differ in byte order or use an alternative representation altogether (like compressed ints in .NET)
size of boolean and enumeration types could be different
arrays/strings can use different types to represent length
padding may be added by some binary representations
strings can be UTF-8, UTF-16, or some other specified or unspecified encoding, with or without a trailing 0.
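To make the "define the exact byte format, then implement it on each platform" steps concrete, here is a minimal sketch of one side (my illustration; the format and class name are hypothetical): a record laid out as a little-endian 4-byte id, a little-endian unsigned 2-byte length, then the UTF-8 bytes of a name.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
public class RecordCodec {
    // Hypothetical wire format: [int32 id, LE] [uint16 name length, LE] [name bytes, UTF-8, no terminator]
    public static byte[] encode(int id, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 2 + utf8.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(id);
        buf.putShort((short) utf8.length); // stored as an unsigned 16-bit value
        buf.put(utf8);
        return buf.array();
    }
}
The C# side would then implement exactly the same layout (BinaryWriter is already little-endian), and both ends stay in sync because the byte format, not a language's default serializer, is the contract.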

Different platforms have different binary representations. If you want matching Base64 strings you should use JSON or XML serialization; JSON and XML are cross-platform.
Edited: Don't misunderstand me: Base64 is a standard encoding algorithm and gives the same output for the same data. I mean that the byte arrays themselves might be different.

Related

Implement fread (readInt) in java

I am attempting to convert a program that reads a binary file from C++ to Java. The file is little-endian.
fread(n, sizeof (unsigned), 1, inputFile);
The C++ snippet above reads one integer into the integer variable 'n'.
I am currently using this method to accomplish the same thing:
public static int readInt(RandomAccessFile inputStream) throws IOException {
int retVal;
byte[] buffer = new byte[4];
inputStream.readFully(buffer);
ByteBuffer wrapped = ByteBuffer.wrap(buffer);
wrapped.order(ByteOrder.LITTLE_ENDIAN);
retVal = wrapped.getInt();
return retVal;
}
but this method sometimes differs in its result from the C++ example. I haven't been able to determine which parts of the file cause this method to fail, but I know it does. For example, when reading one part of the file my readInt method returns 543974774 but the C++ version returns 1.
Is there a better way to read little-endian values in Java? Or is there some obvious flaw in my implementation? Any help understanding where I could be going wrong, or how I could read these values more effectively, would be very appreciated.
Update:
I am using RandomAccessFile because I frequently require fseek-like functionality, which RandomAccessFile provides in Java.
543974774 is, in hex, 206C6576.
There is no endianness on the planet that turns 206C6576 into 1. The problem is therefore that you aren't reading what you think you're reading: if the C code is reading 4 bytes (or even a variable, unknown number of bytes) and turns that into 1, then your Java code wasn't reading the same bytes. Your C code and Java code are out of sync: at some point your C code read, for example, 2 bytes while your Java code read 4 bytes, or vice versa.
The problem isn't in your readInt method - that does the job properly every time.
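One way to track down where the two readers diverge (my suggestion, not part of the answer above) is to log the file offset before each read on the Java side and compare it against ftell() output from the C++ side:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
public class LittleEndianReader {
    public static int readIntLE(RandomAccessFile in) throws IOException {
        // Log the offset so it can be compared with ftell() in the C++ program.
        System.out.println("reading int at offset " + in.getFilePointer());
        byte[] buffer = new byte[4];
        in.readFully(buffer);
        return ByteBuffer.wrap(buffer).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }
}
The first offset at which the two logs disagree is where the reads went out of sync.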

Java: fastest way to serialize to a byte buffer

I'm required to work on a serialization library in Java which must be as fast as possible. The idea is to create various methods which serialize a specified value together with its associated key and put them into a byte buffer. Several objects wrapping such a buffer must be created, since there are potentially a lot of objects to serialize.
Considerations:
I know the Unsafe class may not be implemented in every JVM, but it's not a problem.
Premature optimization: this library has to be fast and this serialization is the only thing it has to do.
The serialized objects are typically small (less than 10k), but there are a lot of them, and they can be up to 2 GB big.
The underlying buffer can be expanded / reduced but I'll skip implementation details, the method is similar to the one used in the ArrayList implementation.
To clarify my situation: I have various methods like
public void putByte(short key, byte value);
public void putInt(short key, int value);
public void putFloat(short key, float value);
... and so on...
these methods append the key and the value to a byte stream, so if I call putInt(-1, 1234567890) my buffer would look like this (the stream is big-endian):
key = [0xFF, 0xFF], integer value = [0x49, 0x96, 0x02, 0xD2]
In the end a method like toBytes() must be called to return a byte array which is a trimmed (if needed) version of the underlying buffer.
Now, my question is: what is the fastest way to do this in Java?
I googled and stumbled upon various pages (some of them on SO) and I also did some benchmarks (but I'm not really experienced with benchmarks, which is one of the reasons I'm asking more experienced programmers for help on this topic).
I came up with the following solutions:
1- The most immediate: a byte array
If I have to serialize an int it would look like this:
public void putInt(short key, int value)
{
array[index] = (byte)(key >> 8);
array[index+1] = (byte) key;
array[index+2] = (byte)(value >> 24);
array[index+3] = (byte)(value >> 16);
array[index+4] = (byte)(value >> 8);
array[index+5] = (byte) value;
}
2- A ByteBuffer (be it direct or a byte array wrapper)
The putInt method would look like the following
public void putInt(short key, int value)
{
byteBuff.putShort(key).putInt(value); // ByteBuffer's put(byte) would truncate, so use the typed putters
}
3- Allocation on native memory through Unsafe
Using the Unsafe class I would allocate the buffer on native memory and so the putInt would look like:
public void putInt(short key, int value)
{
Unsafe.putShort(address, key);
Unsafe.putInt(address+2, value);
}
4- allocation through new byte[], access through Unsafe
I saw this method in the lz4 compression library written in Java. Basically, once a byte array is instantiated, I write bytes the following way:
public void putInt(short key, int value)
{
Unsafe.putShort(byteArray, BYTE_ARRAY_OFFSET + 0, key);
Unsafe.putInt(byteArray, BYTE_ARRAY_OFFSET + 2, value);
}
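For context (my addition, not from the question): the BYTE_ARRAY_OFFSET constant used above is normally obtained from Unsafe itself, and the Unsafe instance is usually fetched reflectively, roughly like this:
import java.lang.reflect.Field;
import sun.misc.Unsafe;
public class UnsafeAccess {
    static final Unsafe UNSAFE;
    static final long BYTE_ARRAY_OFFSET;
    static {
        try {
            // Unsafe.getUnsafe() rejects application class loaders,
            // so grab the singleton field reflectively instead.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            BYTE_ARRAY_OFFSET = UNSAFE.arrayBaseOffset(byte[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}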
The methods here are simplified, but the basic idea is the one shown; I also have to implement the getter methods. Now, since I started working on this I have learnt the following things:
1- The JVM can remove array boundary checks if it's safe to do so (in a for loop, for example, where the counter has to be less than the length of the array)
2- Crossing the JVM memory boundaries (reading/writing from/to native memory) has a cost.
3- Calling a native method may have a cost.
4- Unsafe putters and getters don't make boundary checks in native memory, nor on a regular array.
5- ByteBuffers wrap a byte array (non direct) or a plain native memory area (direct) so case 2 internally would look like case 1 or 3.
I ran some benchmarks (but as I said, I would like the opinion/experience of other developers) and it seems that case 4 is slightly faster than (almost equal to) case 1 in reading and about 3 times faster in writing. It also seems that a for loop with Unsafe reads and writes (case 4), copying one array to another 8 bytes at a time, is faster than System.arraycopy.
Long story made short (sorry for the long post):
case 1 seems to be fast, but that way I have to write a single byte at a time plus masking operations, which makes me think that maybe Unsafe, even if it involves a call to native code, may be faster.
case 2 is similar to case 1 and 3, so I could skip it (correct me if I'm missing something)
case 3 seems to be the slowest (at least from my benchmarks); also, I would need to copy from native memory to a byte array because that has to be the output. But here this programmer claims it's the fastest way by far. If I understood correctly, what am I missing?
case 4 (as supported here) seems to be the fastest.
The number of choices and some contradictory information confuse me a bit, so can anyone clear up these doubts for me?
I hope I wrote every needed information, otherwise just ask for clarifications.
Thanks in advance.
Case 5: DataOutputStream writing to a ByteArrayOutputStream.
Pro: it's already done; it's as fast as anything else you've mentioned here; all primitives are already implemented. The converse is DataInputStream reading from a ByteArrayInputStream.
Con: nothing I can think of.
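A minimal sketch of what this "case 5" could look like for the putInt example from the question (my illustration, keeping the same big-endian key/value layout):
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
public class KeyValueSerializer {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    // DataOutputStream writes big-endian, matching the layout in the question.
    public void putInt(short key, int value) {
        try {
            out.writeShort(key);
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // ByteArrayOutputStream never actually throws
        }
    }

    public byte[] toBytes() {
        return bytes.toByteArray(); // already trimmed to the written length
    }
}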

String to Byte[] and Byte to String

Given the following example:
String f="FF00000000000000";
byte[] bytes = DatatypeConverter.parseHexBinary(f);
String f2= new String (bytes);
I want the output to be FF00000000000000 but it's not working with this method.
You're currently trying to interpret the bytes as if they were text encoded using the platform default encoding (UTF-8, ISO-8859-1 or whatever). That's not what you actually want to do at all - you want to convert it back to hex.
For that, just look at the converter you're using for the parsing step, and look for similar methods which work in the opposite direction. In this case, you want printHexBinary:
String f2 = DatatypeConverter.printHexBinary(bytes);
The approach of "look for reverse operations near the original operation" is a useful one in general... but be aware that sometimes you need to look at a parallel type, e.g. DataInputStream / DataOutputStream. When you find yourself using completely different types for inverse operations, that's usually a bit of a warning sign. (It's not always wrong, it's just worth investigating other options.)

A C structure accessed in Java

I have a C structure that is sent over some intermediate networks and received over a serial link by Java code. The Java code gives me a byte array that I now want to repackage as the original structure. If the receiving code were in C, this would be simple. Is there any simple way to repackage a byte[] in Java as a C struct? I have minimal experience in Java, but this doesn't appear to be a common problem or one solved in any FAQ I could find.
FYI the C struct is
struct data {
uint8_t moteID;
uint8_t status; //block or not
uint16_t tc_1;
uint16_t tc_2;
uint16_t panelTemp; //board temp
uint16_t epoch#;
uint16_t count; //pkt seq since the start of epoch
uint16_t TEG_v;
int16_t TEG_c;
}data;
I would recommend that you send the numbers across the wire in network byte order all the time. This eliminates the problems of:
Compiler specific word boundary generation for your structure.
Byte order specific to your hardware (both sending and receiving).
Also, Java reads and writes numbers in network byte order no matter the platform you run Java on (DataInput/DataOutput and the class file format are defined as big-endian).
A very good class for extracting bits from a stream is java.nio.ByteBuffer, which can wrap arbitrary byte arrays, not just those coming from an I/O class in java.nio. You really should not hand code your own extraction of primitive values if at all possible (i.e. bit shifting and so forth) since it is easy to get this wrong, the code is the same for every instance of the same type, and there are plenty of standard classes that provide this for you.
For example:
public class Data {
private byte moteId;
private byte status;
private short tc_1;
private short tc_2;
//...etc...
private int tc_2_as_int;
private Data() {
// empty
}
public static Data createFromBytes(byte[] bytes) throws IOException {
final Data data = new Data();
final ByteBuffer buf = ByteBuffer.wrap(bytes);
// If needed...
//buf.order(ByteOrder.LITTLE_ENDIAN);
data.moteId = buf.get();
data.status = buf.get();
data.tc_1 = buf.getShort();
data.tc_2 = buf.getShort();
// ...extract other fields here
// Example to convert unsigned short to a positive int
data.tc_2_as_int = data.tc_2 & 0xffff; // reuse the value read above rather than consuming another short
return data;
}
}
Now, to create one, just call Data.createFromBytes(byteArray).
Note that Java does not have unsigned integer variables, but these will be retrieved with the exact same bit pattern. So anything where the high-order bit is not set will be exactly the same when used. You will need to deal with the high-order bit if you expected that in your unsigned numbers. Sometimes this means storing the value in the next larger integer type (byte -> short; short -> int; int -> long).
Edit: Updated the example to show how to convert a short (16-bit signed) to an int (32-bit signed) with the unsigned value with tc_2_as_int.
Note also that if you cannot change the byte-order and it is not in network order, then java.nio.ByteBuffer can still serve you here with buf.order(ByteOrder.LITTLE_ENDIAN); before retrieving the values.
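For reference (my summary of the widening point above), the usual masks when promoting to the next larger type are:
byte someByte = (byte) 0xF0;
short someShort = (short) 0xF000;
int someInt = 0xF0000000;
int unsignedByte = someByte & 0xFF;        // byte  -> int,  0..255
int unsignedShort = someShort & 0xFFFF;    // short -> int,  0..65535
long unsignedInt = someInt & 0xFFFFFFFFL;  // int   -> long, 0..4294967295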
This can be difficult to do when sending from C to C.
If you have a data struct, cast it so that you end up with an array of bytes/chars, and then just blindly send it, you can sometimes end up with big problems decoding it on the other end.
This is because sometimes the compiler has decided to optimize the way that the data is packed in the struct, so in raw bytes it may not look exactly how you expect it would look based on how you code it.
It really depends on the compiler!
There are compiler pragmas you can use to make the packing unoptimized. See C/C++ Preprocessor Reference - pack
The other problem is the 32/64-bit problem: if you just use "int" and "long" without specifying the number of bytes... but you have done that :-)
Unfortunately, Java doesn't really have structs... but it represents the same information in classes.
What I recommend is that you make a class that consists of your variables, and just make a custom unpacking function that will pull the bytes out from the received packet (after you have checked its correctness after transfer) and then load them in to the class.
e.g. You have a data class like
class Data
{
public int moteID;
public int status; //block or not
public int tc_1;
public int tc_2;
}
Then when you receive a byte array, you can do something like this
Data convertBytesToData(byte[] dataToConvert)
{
Data d = new Data();
// mask with 0xFF so the unsigned 8-bit values are not sign-extended
d.moteId = dataToConvert[0] & 0xFF;
d.status = dataToConvert[1] & 0xFF;
d.tc_1 = ((dataToConvert[2] & 0xFF) << 8) + (dataToConvert[3] & 0xFF); // unpacking 16 bits
d.tc_2 = ((dataToConvert[4] & 0xFF) << 8) + (dataToConvert[5] & 0xFF); // unpacking 16 bits
return d;
}
I might have the 16-bit unpacking the wrong way around; it depends on the endianness of your C system, but you'll be able to play around and see whether it's right or not.
I haven't played with Java for some time, but hopefully there are byte[]-to-int functions built in these days.
I know there are for C# anyway.
With all this in mind, if you are not doing high data rate transfers, definitely look at JSON and Protocol Buffers!
Assuming you have control over both ends of the link, rather than sending raw data you might be better off going for an encoding that C and Java can both use. Look at either JSON or Protocol Buffers.
What you are trying to do is problematic for a couple of reasons:
Different C implementations will represent uint16_t (and int16_t) values in different ways. In some cases, the most significant byte will be first when the struct is laid out in memory. In other cases, the least significant byte will.
Different C compilers may pack the fields of the struct differently. So it is possible (for example) that the fields have been reordered or padding may have been added.
So what this all means is that you have to figure out exactly how the struct is laid out ... and just hope that this doesn't change if/when you change C compilers or the C target platform.
Having said that, I could not find a Java library for decoding arbitrary binary data streams that allows you to select "endian-ness". The DataInputStream and DataOutputStream classes may be the answer, but they are explicitly defined to send/expect the high order byte first. If your data comes the other way around you will need to do some Java bit bashing to fix it.
EDIT : actually (as #Kevin Brock points out) java.nio.ByteBuffer allows you to specify the endian-ness when fetching various data types from a binary buffer.

How to get data out of network packet data in Java

In C if you have a certain type of packet, what you generally do is define some struct and cast the char * into a pointer to the struct. After this you have direct programmatic access to all data fields in the network packet. Like so:
struct rdp_header {
int version;
char serverId[20];
};
When you get a network packet you can do the following quickly:
char * packet;
// receive packet
rdp_header * pckt = (rdp_header *) packet;
printf("Servername : %20.20s\n", pckt->serverId);
This technique works really great for UDP-based protocols, and allows for very quick and very efficient packet parsing and sending using very little code, with trivial error handling (just check the length of the packet). Is there an equivalent, just as quick, way in Java to do the same? Or are you forced to use stream-based techniques?
Read your packet into a byte array, and then extract the bits and bytes you want from that.
Here's a sample, sans exception handling:
DatagramSocket s = new DatagramSocket(port);
DatagramPacket p;
byte buffer[] = new byte[4096];
while (true) {
p = new DatagramPacket(buffer, buffer.length);
s.receive(p);
// your packet is now in buffer[];
int version = ((buffer[0] & 0xFF) << 24) | ((buffer[1] & 0xFF) << 16) | ((buffer[2] & 0xFF) << 8) | (buffer[3] & 0xFF);
byte[] serverId = new byte[20];
System.arraycopy(buffer, 4, serverId, 0, 20);
// and process the rest
}
In practice you'll probably end up with helper functions to extract data fields in network order from the byte array, or, as Tom points out in the comments, you can use a ByteArrayInputStream, from which you can construct a DataInputStream, which has methods to read structured data from the stream:
...
while (true) {
p = new DatagramPacket(buffer, buffer.length);
s.receive(p);
ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
DataInput di = new DataInputStream(bais);
int version = di.readInt();
byte[] serverId = new byte[20];
di.readFully(serverId);
...
}
I don't believe this technique can be done in Java, short of using JNI and actually writing the protocol handler in C. The other way to do the technique you describe is variant records and unions, which Java doesn't have either.
If you had control of the protocol (it's your server and client) you could use serialized objects (including XML) to get automagic (but not so runtime-efficient) parsing of the data, but that's about it.
Otherwise you're stuck with parsing Streams or byte arrays (which can be treated as Streams).
Mind you, the technique you describe is tremendously error-prone and a source of security vulnerabilities for any protocol that is reasonably interesting, so it's not that great a loss.
I wrote something to simplify this kind of work. Like most tasks, it was much easier to write a tool than to try to do everything by hand.
It consisted of two classes. Here's an example of how it was used:
// Resulting byte array is 9 bytes long.
byte[] ba = new ByteArrayBuilder()
.writeInt(0xaaaa5555) // 4 bytes
.writeByte(0x55) // 1 byte
.writeShort(0x5A5A) // 2 bytes
.write( (new BitBuilder()) // 2 bytes---0xBA12
.write(3, 5) // 101 (3 bits value of 5)
.write(2, 3) // 11 (2 bits value of 3)
.write(3, 2) // 010 (...)
.write(2, 0) // 00
.write(2, 1) // 01
.write(4, 2) // 0010 (4 bits value of 2)
).getBytes();
I wrote the ByteArrayBuilder to simply accumulate bits. I used a method-chaining pattern (just returning "this" from all methods) to make it easier to write a bunch of statements together.
All the methods in the ByteArrayBuilder were trivial, just 1 or 2 lines of code (I just wrote everything to a data output stream).
This is to build a packet, but tearing one apart shouldn't be any harder.
The only interesting method in BitBuilder is this one:
public BitBuilder write(int bitCount, int value) {
int bitMask=0xffffffff;
bitMask <<= bitCount; // If bitCount is 8, bitMask is now ffffff00
bitMask = ~bitMask; // and now it's 000000ff, a great mask
bitRegister <<= bitCount; // make room
bitRegister |= (value & bitMask); // or in the value (masked for safety)
bitsWritten += bitCount;
return this;
}
Again, the logic could be inverted very easily to read a packet instead of build one.
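A rough sketch of that inverse (my illustration only; the bitRegister and bitsRemaining fields are hypothetical reader-side state mirroring the builder's):
// bitRegister holds the packed bits; bitsRemaining counts how many are still unread.
public int read(int bitCount) {
    int bitMask = ~(0xffffffff << bitCount);  // low bitCount bits set
    bitsRemaining -= bitCount;                // consume from the most significant end first
    return (bitRegister >>> bitsRemaining) & bitMask;
}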
Edit: I had proposed a different approach in this answer; I'm going to post it as a separate answer because it's completely different.
Look at the Javolution library and its struct classes, they will do just what you are asking for. In fact, the author has this exact example, using the Javolution Struct classes to manipulate UDP packets.
This is an alternate proposal for an answer I left above. I suggest you consider implementing it because it would act pretty much the same as a C solution where you could pick fields out of a packet by name.
You might start it out with an external text file something like this:
OneByte, 1
OneBit, .1
TenBits, .10
AlsoTenBits, 1.2
SignedInt, +4
It could specify the entire structure of a packet, including fields that may repeat. The language could be as simple or complicated as you need--
You'd create an object like this:
PacketReader packetReader = new PacketReader("PacketStructure.txt", packet);
Your constructor would iterate over the PacketStructure.txt file and store each string as the key of a hashtable, and the exact location of its data (both bit offset and size) as the data.
Once you created an object, passing in the bit structure and a packet, you could randomly access the data with statements as straightforward as:
int x=packetReader.getInt("AlsoTenBits");
Also note, this stuff would be much less efficient than a C struct, but not as much as you might think--it's still probably many times more efficient than you'll need. If done right, the specification file would only be parsed once, so you would only take the minor hit of a single hash lookup and a few binary operations for each value you read from the packet--not bad at all.
The exception is if you are parsing packets from a high-speed continuous stream, and even then I doubt a fast network could flood even a slowish CPU.
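A bare-bones sketch of the idea (my illustration; the PacketReader class, its spec handling, and the field layout are hypothetical):
import java.util.HashMap;
import java.util.Map;
public class PacketReader {
    // Field location taken from the spec file: offset and length, in bits.
    private static final class Field {
        final int bitOffset, bitLength;
        Field(int bitOffset, int bitLength) { this.bitOffset = bitOffset; this.bitLength = bitLength; }
    }

    private final Map<String, Field> fields = new HashMap<>();
    private final byte[] packet;

    public PacketReader(Map<String, Field> spec, byte[] packet) {
        this.fields.putAll(spec); // the real thing would parse PacketStructure.txt here instead
        this.packet = packet;
    }

    // Read the named field as an unsigned, MSB-first integer.
    public int getInt(String name) {
        Field f = fields.get(name);
        int value = 0;
        for (int i = 0; i < f.bitLength; i++) {
            int bit = f.bitOffset + i;
            int b = packet[bit / 8] & 0xFF;
            value = (value << 1) | ((b >> (7 - bit % 8)) & 1);
        }
        return value;
    }
}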
Short answer, no you can't do it that easily.
Longer answer: if you can use Serializable objects, you can hook your InputStream up to an ObjectInputStream and use that to deserialize your objects. However, this requires that you have some control over the protocol. It also works more easily if you use a TCP Socket. If you use a UDP DatagramSocket, you will need to get the data from the packet and then feed that into a ByteArrayInputStream.
If you don't have control over the protocol, you may be able to still use the above deserialization method, but you're probably going to have to implement the readObject() and writeObject() methods rather than using the default implementation given to you. If you need to use someone else's protocol (say because you need to interop with a native program), this is likely the easiest solution you are going to find.
Also, remember that Java uses UTF-16 internally for strings, but I'm not certain that it serializes them that way. Either way, you need to be very careful when passing strings back and forth to non-Java programs.
