First off, I saw the question "Java equivalent of Python's struct.pack?"; this is a clarification.
I am new to Java and trying to mirror some of the techniques that I have used in Python. I am trying to send data over the network, and want to ensure I know what it looks like. In Python, I would use struct.pack. For example:
data = struct.pack('i', 10)
data += "Some string"
data += struct.pack('i', 500)
print(data)
That would print the packed portions in byte order with the string in plaintext in the middle.
I tried to replicate that with ByteBuffer:
String somestring = "Some string";
ByteBuffer buffer = ByteBuffer.allocate(100);
buffer.putInt(10);
buffer.put(somestring.getBytes());
buffer.putInt(500);
System.out.println(buffer.array());
What part am I not understanding?
That sounds more complicated than you really need.
I suggest using DataOutputStream and BufferedOutputStream:
DataOutputStream dos = new DataOutputStream(
new BufferedOutputStream(socket.getOutputStream()));
dos.writeInt(50);
dos.writeUTF("some string"); // this includes a 16-bit unsigned length
dos.writeInt(500);
This avoids creating more objects than needed by writing directly to the stream.
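To see what those calls actually put on the wire, the same writes can be pointed at an in-memory stream and the resulting bytes printed with Arrays.toString. Printing a byte[] directly (as in System.out.println(buffer.array())) only shows the array's identity, not its contents. The following is a minimal sketch; the class name PackDemo and the method pack are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class PackDemo {
    // Hypothetical helper: writes int, length-prefixed string, int into memory
    // so that the packed bytes can be inspected before they go to a socket.
    public static byte[] pack(int first, String text, int second) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            dos.writeInt(first);   // 4 bytes, big-endian
            dos.writeUTF(text);    // 2-byte length, then modified UTF-8 bytes
            dos.writeInt(second);  // 4 bytes, big-endian
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = pack(10, "Some string", 500);
        // Arrays.toString prints the contents, unlike println(data).
        System.out.println(Arrays.toString(data));
    }
}
```

Note that DataOutputStream always writes big-endian, whereas Python's struct.pack('i', ...) uses the machine's native byte order unless a prefix such as '>' is given.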
If you use https://github.com/raydac/java-binary-block-parser, the code becomes much simpler:
JBBPOut.BeginBin().Int(10).Utf8("Some string").Int(500).End().toByteArray();
I'm working on a string compressor for a school assignment.
There's one bug that I can't seem to work out. The compressed data, represented by a byte array, is being written to a file using a FileWriter. The compression algorithm returns an input stream, so the data flows as follows:
piped input stream
-> input stream reader
-> data stored in char buffer
-> data written to file with file writer.
Now, the bug is, that with some very specific strings, the second to last byte in the byte array is written wrong. and it's always the same bit values "11111100".
It's these bit values every time, and always the second-to-last byte.
Here are some samples from the code:
InputStream compress(InputStream){
//...
//...
PipedInputStream pin = new PipedInputStream();
PipedOutputStream pout = new PipedOutputStream(pin);
ObjectOutputStream oos = new ObjectOutputStream(pout);
oos.writeObject(someobject);
oos.flush();
DataOutputStream dos = new DataOutputStream(pout);
dos.writeFloat(//);
dos.writeShort(//);
dos.write(SomeBytes); // ---Here
dos.flush();
dos.close();
return pin;
}
void write(char[] cbuf, int off, int len){
//....
//....
InputStreamReader s = new InputStreamReader(
c.compress(new ByteArrayInputStream(str.getBytes())));
s.read(charbuffer);
out.write(charbuffer);
}
A string which triggers it is "hello and good evenin" for example.
I have tried iterating over the byte array and writing the bytes one by one, but it didn't help.
It's also worth noting that when I tried to write to a file using the output stream in the algorithm itself, it worked fine. This design was not my choice, by the way.
So I'm not really sure what I'm doing wrong here.
Considering that you're saying:
Now, the bug is, that with some very specific strings, the second to
last byte in the byte array is written wrong. and it's always the same
bit values "11111100".
You are taking a
binary stream (the compressed data)
-> reading it as chars
-> then writing it as chars.
And you are converting bytes to chars without clearly defining the encoding.
I'd say that the problem is that your InputStreamReader is translating some byte sequences in a way that you're not expecting.
Remember that in encodings like UTF-8, two or three bytes may become one single char.
It can't be a coincidence that the very byte pattern you pointed out (11111100) is one of the UTF-8 lead-byte patterns (1111110x). Check the UTF-8 table on Wikipedia and you'll see that decoding as UTF-8 is destructive here: if a byte starts with 1111110x, the next one must start with 10xxxxxx.
Meaning that if using utf-8 to convert
bytes1[] -> chars[] -> bytes2[]
in some cases bytes2 will be different from bytes1.
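I recommend changing your code to remove those readers. Or specify a single-byte encoding explicitly to see if that prevents the translations. Note that ASCII only covers bytes up to 0x7F, so a charset such as ISO-8859-1, which maps every byte value to exactly one char, is the safer choice.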
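The bytes -> chars -> bytes round trip described above can be reproduced in a few lines. This is only a sketch: the class and method names are invented, and the three input bytes are an arbitrary example containing the 11111100 (0xFC) pattern from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // Decodes bytes to chars and encodes them back with the same charset.
    public static byte[] roundTrip(byte[] input, Charset charset) {
        String asChars = new String(input, charset);
        return asChars.getBytes(charset);
    }

    public static void main(String[] args) {
        // 0xFC is not valid on its own in UTF-8, so the decoder replaces it.
        byte[] raw = {0x41, (byte) 0xFC, 0x42};

        byte[] viaUtf8 = roundTrip(raw, StandardCharsets.UTF_8);
        byte[] viaLatin1 = roundTrip(raw, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 preserved bytes:      " + Arrays.equals(raw, viaUtf8));
        System.out.println("ISO-8859-1 preserved bytes: " + Arrays.equals(raw, viaLatin1));
    }
}
```

The safest fix is still to keep binary data in InputStream/OutputStream and byte[] from end to end, so that no charset is involved at all.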
I solved this by encoding and decoding the bytes with Base64.
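For reference, a Base64 round trip with the standard java.util.Base64 class might look like the sketch below. The wrapper class and method names are hypothetical, and the original poster's actual code is not shown in the thread.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // Turns arbitrary bytes into text that survives any Reader/Writer.
    public static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recovers the original bytes from the Base64 text.
    public static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = {0x41, (byte) 0xFC, 0x42};
        String text = encode(raw);
        System.out.println(text);
        System.out.println(Arrays.equals(raw, decode(text)));
    }
}
```

The trade-off is size: Base64 output is roughly a third larger than the input, which matters for a compressor.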
I have a Java client connected via socket to a C++ server.
The C++ server sends back to the client serialized objects.
However serialization works differently for Java and C++, so I cannot read the objects in that way:
objectInputStream.readObject();
This forces me to read each single value of the object manually:
byte[] buffer = read(FOUR_BYTES);
int flag = convertBufferToInt(buffer);
buffer = read(FOUR_BYTES);
float price = convertBufferToFloat(buffer);
// More stuff
myObject.setFlag(flag);
myObject.setPrice(price);
// More stuff
That's very hard to maintain. Isn't there an easier way to fill in my object with data?
To solve this in general you would need to write a C++ parser for objects serialized in Java. This is no small task.
Rather, I would recommend that you find some serialization format that is easy to parse and share between your Java and C++ programs, preferably one for which both Java and C++ libraries for serialization/deserialization exist. JSON or Google Protocol Buffers are obvious candidates.
Yes there is (are). You have 2 options using only the standard library:
Using the DataInputStream class
Check out the DataInputStream class. It has methods to read values of primitive types, like readByte(), readInt(), readLong(), readFloat() and readChar(), as well as readUTF() (which reads a String written in Java's modified UTF-8 format with a 2-byte length prefix).
So your code becomes as simple as:
// Obtain InputStream from Socket:
InputStream is = ...;
// Create DataInputStream:
DataInputStream dis = new DataInputStream(is);
myObject.setFlag(dis.readInt());
myObject.setPrice(dis.readFloat());
Using the ByteBuffer class
For this you first have to read all of the data into a byte array. Once you've done that, you can create a ByteBuffer using the ByteBuffer.wrap(byte[] array) method. The ByteBuffer class also supports reading primitive types just like the DataInputStream class.
The good thing about ByteBuffer is that it supports changing the byte order (the order in which the low and high bytes of a multi-byte value like int are read/written): ByteBuffer.order(ByteOrder bo). This is very useful if you're communicating with systems which use a different byte order (which might apply in your case).
Example using ByteBuffer:
// Read all your input data:
byte[] data = ...;
// Create ByteBuffer:
ByteBuffer bb = ByteBuffer.wrap(data);
myObject.setFlag(bb.getInt());
myObject.setPrice(bb.getFloat());
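If the C++ server runs on a little-endian machine and writes its values in native order (an assumption, since the question does not say), the byte order has to be set before reading. A minimal sketch, with a made-up Quote class standing in for myObject:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianRead {
    // Hypothetical value holder for the two fields from the question.
    public static final class Quote {
        public final int flag;
        public final float price;

        Quote(int flag, float price) {
            this.flag = flag;
            this.price = price;
        }
    }

    // Reads a 4-byte int followed by a 4-byte float, both little-endian.
    public static Quote parse(byte[] data) {
        ByteBuffer bb = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int flag = bb.getInt();
        float price = bb.getFloat();
        return new Quote(flag, price);
    }

    public static void main(String[] args) {
        // 01 00 00 00 is the int 1; 00 00 80 3F is the float 1.0 (little-endian).
        byte[] data = {1, 0, 0, 0, 0, 0, (byte) 0x80, 0x3F};
        Quote q = parse(data);
        System.out.println(q.flag + " " + q.price);
    }
}
```

DataInputStream has no such switch: it always reads big-endian, so for little-endian peers ByteBuffer is usually the more convenient of the two options.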
I have a data structure java.nio.HeapByteBuffer[pos=71098 lim=71102 cap=94870], which I need to convert into an Int (in Scala). The conversion might look simple, but whichever approach I try, I don't get the right result. Could you please help me?
Here is my code snippet:
val v : ByteBuffer= map.get("company").get
val utf_str = new String(v, java.nio.charset.StandardCharsets.UTF_8)
println (utf_str)
the output is just "R" ??
I can't see how you can even get that to compile. String has constructors that accept another string or possibly an array, but not a ByteBuffer or any of its parents.
To work with the nio buffer API, you first write to a buffer, then do a flip before you read from the buffer. There are lots of good resources online about that. This one for example: http://tutorials.jenkov.com/java-nio/buffers.html
How to read that as a string depends entirely on how the characters are encoded inside the buffer. If they are two bytes per character (as strings are in Java/the JVM), you can convert your buffer to a character buffer by using asCharBuffer.
So, for example:
val byteBuffer = ByteBuffer.allocate(7).order(ByteOrder.BIG_ENDIAN);
byteBuffer.putChar('H').putChar('i').putChar('!')
byteBuffer.flip()
val charBuffer = byteBuffer.asCharBuffer
assert(charBuffer.toString == "Hi!")
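For the buffer in the question, where four bytes remain between position and limit, the same java.nio API can read those bytes as an Int directly. The sketch below is in Java, but the calls are identical from Scala. The helper names are invented, and it assumes the value was stored as a big-endian 4-byte int. It also shows that decoding such bytes as UTF-8 yields NUL characters followed by one visible character, which may be where a lone "R" comes from (82 is the code for 'R'), though that is a guess about the asker's data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RemainingBytes {
    // Reads the next 4 bytes as a big-endian int. duplicate() shares the
    // content but has its own position, so the caller's buffer is not moved.
    public static int remainingAsInt(ByteBuffer buf) {
        return buf.duplicate().getInt();
    }

    // Decodes everything between position and limit as UTF-8 text.
    public static String remainingAsUtf8(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putInt(0).putInt(82);
        buf.position(4); // leave exactly 4 bytes remaining, as in the question

        System.out.println(remainingAsInt(buf));
        System.out.println(remainingAsUtf8(buf).length());
    }
}
```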
Java newbie here. Are there any helper functions to serialize data in and out of byte arrays? I am writing a Java package that implements a network protocol, so I have to write some typical variables like a version (1 byte), a sequence number (long) and binary data (bytes) in a loop. How do I do this in Java? Coming from C, I am thinking of creating a byte array of the required size and then, since there is no memcpy(), converting the long into a temporary byte array and copying it into the actual byte array. It seems so inefficient and also really error-prone. Is there a class I could use to marshal and unmarshal parameters to a byte array?
Also, why do all the Socket classes only deal with char[] and not byte[]? A socket by definition has to deal with binary data too. How is this done in Java?
I am sure what I am missing is the Java mindset. I'd appreciate it if someone could point it out to me.
EDIT: I did look at DataOutputStream and DataInputStream but I cannot convert the bytes to a String not to a byte[] which means the information might be lost in the conversion to write to a socket.
Pav
Have a look at DataInputStream, DataOutputStream, ObjectInputStream and ObjectOutputStream. Check first if the layout of the data is acceptable to you. Also, Serialization.
Sockets neither deal with char[] nor with byte[] but with InputStream and OutputStream which are used to read and write bytes.
If you are sending the data over a socket, then you don't need a temporary byte array at all; you can wrap the socket's OutputStream with DataOutputStream or ObjectOutputStream and just write what you want to write.
There might be an aspect I've missed that means you do actually need temporary byte arrays. If so, look at ByteArrayOutputStream. Also, there's no memcpy(), sure, but there is System.arraycopy.
As above, DataInputStream and DataOutputStream are exactly what you are looking for. Re your comment about String, if you're planning to use Java Strings over the wire, you're not designing a network protocol, you're designing a Java protocol. There are readUTF() and writeUTF() if you're sure the other end is Java or if you can code the other end to understand these formats. Or you can send as bytes along with the appropriate charset, or predefine the charset for the entire protocol if that makes sense.
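The fields mentioned in the question (a 1-byte version, a long sequence number and a block of binary data) can be marshalled without any manual byte shuffling. The sketch below is one possible layout, not a standard one: the 4-byte length prefix before the payload and all class and method names are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class MessageCodec {
    // Hypothetical message type with the three fields from the question.
    public static final class Message {
        public final byte version;
        public final long sequence;
        public final byte[] payload;

        Message(byte version, long sequence, byte[] payload) {
            this.version = version;
            this.sequence = sequence;
            this.payload = payload;
        }
    }

    // Layout: 1 byte version, 8 bytes sequence, 4 bytes payload length, payload.
    public static byte[] encode(byte version, long sequence, byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(version);
            out.writeLong(sequence);
            out.writeInt(payload.length);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static Message decode(byte[] message) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            byte version = in.readByte();
            long sequence = in.readLong();
            int length = in.readInt();
            if (length < 0 || length > in.available()) {
                throw new IllegalArgumentException("bad payload length: " + length);
            }
            byte[] payload = new byte[length];
            in.readFully(payload); // raw bytes, never converted to chars
            return new Message(version, sequence, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

When writing to a socket, the ByteArrayOutputStream can be dropped and the DataOutputStream wrapped directly around socket.getOutputStream().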
In C if you have a certain type of packet, what you generally do is define some struct and cast the char * into a pointer to the struct. After this you have direct programmatic access to all data fields in the network packet. Like so :
struct rdp_header {
int version;
char serverId[20];
};
When you get a network packet you can do the following quickly:
char * packet;
// receive packet
struct rdp_header * pckt = (struct rdp_header *) packet;
printf("Servername : %20.20s\n", pckt->serverId);
This technique works really well for UDP-based protocols, and allows for very quick and very efficient packet parsing and sending using very little code, and trivial error handling (just check the length of the packet). Is there an equivalent, just as quick way to do the same in Java? Or are you forced to use stream-based techniques?
Read your packet into a byte array, and then extract the bits and bytes you want from that.
Here's a sample, sans exception handling:
DatagramSocket s = new DatagramSocket(port);
DatagramPacket p;
byte buffer[] = new byte[4096];
while (true) {
p = new DatagramPacket(buffer, buffer.length);
s.receive(p);
// your packet is now in buffer[];
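// mask each byte (Java bytes are signed) and parenthesize the shifts
int version = ((buffer[0] & 0xFF) << 24) | ((buffer[1] & 0xFF) << 16) | ((buffer[2] & 0xFF) << 8) | (buffer[3] & 0xFF);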
byte[] serverId = new byte[20];
System.arraycopy(buffer, 4, serverId, 0, 20);
// and process the rest
}
In practice you'll probably end up with helper functions to extract data fields in network order from the byte array, or as Tom points out in the comments, you can use a ByteArrayInputStream(), from which you can construct a DataInputStream() which has methods to read structured data from the stream:
...
while (true) {
p = new DatagramPacket(buffer, buffer.length);
s.receive(p);
ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
DataInput di = new DataInputStream(bais);
int version = di.readInt();
byte[] serverId = new byte[20];
di.readFully(serverId);
...
}
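A third variant wraps the received bytes in a ByteBuffer, which gives positional reads without any stream objects. This is a sketch under stated assumptions: the version field is sent in network (big-endian) order, serverId is ASCII padded with NUL bytes, and the names RdpHeaderParser and RdpHeader are invented to mirror the C struct above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RdpHeaderParser {
    // Hypothetical Java counterpart of the C struct rdp_header.
    public static final class RdpHeader {
        public final int version;
        public final String serverId;

        RdpHeader(int version, String serverId) {
            this.version = version;
            this.serverId = serverId;
        }
    }

    private static final int HEADER_SIZE = 4 + 20;

    // buffer and length would come from DatagramPacket.getData()/getLength().
    public static RdpHeader parse(byte[] buffer, int length) {
        if (length < HEADER_SIZE) {
            throw new IllegalArgumentException("packet too short: " + length);
        }
        ByteBuffer bb = ByteBuffer.wrap(buffer, 0, length); // big-endian by default
        int version = bb.getInt();
        byte[] id = new byte[20];
        bb.get(id);
        // trim() drops the trailing NUL padding of the fixed-width field
        String serverId = new String(id, StandardCharsets.US_ASCII).trim();
        return new RdpHeader(version, serverId);
    }
}
```

The explicit length check plays the same role as checking the packet length before the cast in C.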
I don't believe this technique can be done in Java, short of using JNI and actually writing the protocol handler in C. The other way to do the technique you describe is variant records and unions, which Java doesn't have either.
If you had control of the protocol (it's your server and client) you could use serialized objects (inc. xml), to get the automagic (but not so runtime efficient) parsing of the data, but that's about it.
Otherwise you're stuck with parsing Streams or byte arrays (which can be treated as Streams).
Mind you the technique you describe is tremendously error prone and a source of security vulnerabilities for any protocol that is reasonably interesting, so it's not that great a loss.
I wrote something to simplify this kind of work. Like most tasks, it was much easier to write a tool than to try to do everything by hand.
It consisted of two classes. Here's an example of how it was used:
// Resulting byte array is 9 bytes long.
byte[] ba = new ByteArrayBuilder()
.writeInt(0xaaaa5555) // 4 bytes
.writeByte(0x55) // 1 byte
.writeShort(0x5A5A) // 2 bytes
.write( (new BitBuilder()) // 2 bytes---0xBA12
.write(3, 5) // 101 (3 bits value of 5)
.write(2, 3) // 11 (2 bits value of 3)
.write(3, 2) // 010 (...)
.write(2, 0) // 00
.write(2, 1) // 01
.write(4, 2) // 0010
).getBytes();
I wrote the ByteArrayBuilder to simply accumulate bits. I used a method chaining pattern (Just returning "this" from all methods) to make it easier to write a bunch of statements together.
All the methods in the ByteArrayBuilder were trivial, just like 1 or 2 lines of code (I just wrote everything to a data output stream)
This is to build a packet, but tearing one apart shouldn't be any harder.
The only interesting method in BitBuilder is this one:
public BitBuilder write(int bitCount, int value) {
int bitMask=0xffffffff;
bitMask <<= bitCount; // If bitCount is 4, bitMask is now fffffff0
bitMask = ~bitMask; // and now it's 0000000f, a great mask
bitRegister <<= bitCount; // make room
bitRegister |= (value & bitMask); // or in the value (masked for safety)
bitsWritten += bitCount;
return this;
}
Again, the logic could be inverted very easily to read a packet instead of build one.
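A reader that inverts this logic could look roughly like the following. It is not the author's code: BitReader is a hypothetical class, and it assumes bits are packed most significant bit first, which matches the 0xBA12 result in the example above.

```java
public class BitReader {
    private final byte[] data;
    private int bitPosition = 0; // index of the next unread bit

    public BitReader(byte[] data) {
        this.data = data;
    }

    // Reads bitCount bits (1..32), most significant bit first,
    // and returns them in the low bits of the result.
    public int read(int bitCount) {
        if (bitCount < 1 || bitCount > 32) {
            throw new IllegalArgumentException("bitCount must be 1..32");
        }
        if (bitPosition + bitCount > data.length * 8) {
            throw new IllegalStateException("not enough bits left");
        }
        int value = 0;
        for (int i = 0; i < bitCount; i++) {
            int currentByte = data[bitPosition >> 3] & 0xFF;
            int bit = (currentByte >> (7 - (bitPosition & 7))) & 1;
            value = (value << 1) | bit;
            bitPosition++;
        }
        return value;
    }

    public static void main(String[] args) {
        // The two bytes produced by the BitBuilder calls in the example above.
        BitReader reader = new BitReader(new byte[] {(byte) 0xBA, 0x12});
        int[] widths = {3, 2, 3, 2, 2, 4};
        for (int width : widths) {
            System.out.println(width + " bits: " + reader.read(width));
        }
    }
}
```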
edit: I had proposed a different approach in this answer, I'm going to post it as a separate answer because it's completely different.
Look at the Javolution library and its struct classes, they will do just what you are asking for. In fact, the author has this exact example, using the Javolution Struct classes to manipulate UDP packets.
This is an alternate proposal for an answer I left above. I suggest you consider implementing it because it would act pretty much the same as a C solution where you could pick fields out of a packet by name.
You might start it out with an external text file something like this:
OneByte, 1
OneBit, .1
TenBits, .10
AlsoTenBits, 1.2
SignedInt, +4
It could specify the entire structure of a packet, including fields that may repeat. The language could be as simple or as complicated as you need.
You'd create an object like this:
PacketReader packetReader = new PacketReader("PacketStructure.txt", packet);
Your constructor would iterate over the PacketStructure.txt file and store each string as the key of a hashtable, and the exact location of its data (both bit offset and size) as the data.
Once you created an object, passing in the bitStructure and a packet, you could randomly access the data with statements as straight-forward as:
int x=packetReader.getInt("AlsoTenBits");
Also note, this stuff would be much less efficient than a C struct, but not as much as you might think--it's still probably many times more efficient than you'll need. If done right, the specification file would only be parsed once, so you would only take the minor hit of a single hash lookup and a few binary operations for each value you read from the packet--not bad at all.
The exception is if you are parsing packets from a high-speed continuous stream, and even then I doubt a fast network could flood even a slowish CPU.
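To make the idea concrete, here is a much reduced sketch of the lookup part. It deliberately skips parsing a specification file, since the file syntax above is only outlined, and instead declares fields in code. PacketLayout, field and getInt are hypothetical names, widths are given in bits, fields are assumed to follow each other without gaps, and bits are read most significant bit first.

```java
import java.util.HashMap;
import java.util.Map;

public class PacketLayout {
    // Where a named field lives inside the packet.
    private static final class Field {
        final int bitOffset;
        final int bitLength;

        Field(int bitOffset, int bitLength) {
            this.bitOffset = bitOffset;
            this.bitLength = bitLength;
        }
    }

    private final Map<String, Field> fields = new HashMap<>();
    private int nextBitOffset = 0;

    // Declares a field directly after the previously declared one.
    public PacketLayout field(String name, int bitLength) {
        if (bitLength < 1 || bitLength > 32) {
            throw new IllegalArgumentException("bitLength must be 1..32");
        }
        fields.put(name, new Field(nextBitOffset, bitLength));
        nextBitOffset += bitLength;
        return this;
    }

    // One hash lookup plus a few bit operations per read, as described above.
    public int getInt(byte[] packet, String name) {
        Field f = fields.get(name);
        if (f == null) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        if (f.bitOffset + f.bitLength > packet.length * 8) {
            throw new IllegalArgumentException("packet too short for " + name);
        }
        int value = 0;
        for (int bit = f.bitOffset; bit < f.bitOffset + f.bitLength; bit++) {
            int currentByte = packet[bit >> 3] & 0xFF;
            value = (value << 1) | ((currentByte >> (7 - (bit & 7))) & 1);
        }
        return value;
    }
}
```

A layout would be built once, for example new PacketLayout().field("OneByte", 8).field("OneBit", 1).field("TenBits", 10), and then reused for every packet, so the per-packet cost stays small.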
Short answer, no you can't do it that easily.
Longer answer: if you can use Serializable objects, you can hook your InputStream up to an ObjectInputStream and use that to deserialize your objects. However, this requires that you have some control over the protocol. It is also easier if you use a TCP Socket. If you use a UDP DatagramSocket, you will need to get the data from the packet and then feed that into a ByteArrayInputStream.
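For the UDP case, feeding a packet into an ObjectInputStream might look like the sketch below. The class and method names are made up, and it assumes each datagram carries exactly one complete serialized object written by a Java sender. Deserializing data from untrusted peers is a known security risk, so this only makes sense when both ends are under your control.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;

public class DatagramObjects {
    // Sender side: turn one object into the bytes of one datagram.
    public static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Receiver side: read the object back from a received packet.
    public static Object deserialize(DatagramPacket packet) {
        ByteArrayInputStream bytes = new ByteArrayInputStream(
                packet.getData(), packet.getOffset(), packet.getLength());
        try (ObjectInputStream in = new ObjectInputStream(bytes)) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```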
If you don't have control over the protocol, you may be able to still use the above deserialization method, but you're probably going to have to implement the readObject() and writeObject() methods rather than using the default implementation given to you. If you need to use someone else's protocol (say because you need to interop with a native program), this is likely the easiest solution you are going to find.
Also, remember that Java uses UTF-16 internally for strings, but I'm not certain that it serializes them that way. Either way, you need to be very careful when passing strings back and forth to non-Java programs.