Unserialize an array of bytes taking account of its useful length - java

I have a byte array of length XXX. It contains a serialized object that I want to deserialize (i.e., I want to rebuild a copy of the object from the stored bytes).
But there is a constraint: the useful length of the byte array. I want deserialization to take it into account (i.e., the serialized object can be shorter than the array).
The two small methods below should make this clearer (the first serializes, the second deserializes):
byte[] toBytes() throws IOException {
    byte[] array_bytes;
    ByteArrayOutputStream byte_array_output_stream = new ByteArrayOutputStream();
    ObjectOutput object_output = new ObjectOutputStream(byte_array_output_stream);
    object_output.writeObject(this);
    object_output.close();
    array_bytes = byte_array_output_stream.toByteArray();
    return array_bytes;
}
And the current deserialization method (which is "wrong" for now because it ignores the useful length):
static Message fromBytes(byte[] bytes, int length) throws IOException, ClassNotFoundException, ClassCastException {
    Message message;
    ByteArrayInputStream byte_array_input_stream = new ByteArrayInputStream(bytes);
    ObjectInput object_input = new ObjectInputStream(byte_array_input_stream);
    message = (Message) object_input.readObject();
    object_input.close();
    return message;
}
As you can see, readObject doesn't take a length, but I need one: that's a problem, and perhaps I should NOT use this method.
So my question is: with or without readObject, how can I take the useful length (i.e., the "payload"?) of my byte array into account?

I assume that your Message class implements Serializable.
In this case, when you write your message, it is serialized automatically by the Java runtime, as explained in the documentation of the Serializable interface.
I cannot be sure how or why part of the generated byte array would not be useful, since all of it belongs to the serialized instance.
However, I would suggest going the Externalizable route:
Have your Message class implement Externalizable. You then control exactly how the class is serialized and deserialized in the writeExternal(ObjectOutput out) and readExternal(ObjectInput in) methods respectively, where you can write the length you want to the stream, read it back, and/or keep only the required number of bytes.
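If the useful length is already known on the reading side, a lighter alternative is to bound the input stream itself. The sketch below is only an illustration: the Message class here is a hypothetical stand-in with a single text field, and it relies on the ByteArrayInputStream(byte[], int, int) constructor so that ObjectInputStream can never read past the first length bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class UsefulLengthDemo {
    // Hypothetical stand-in for the Message class from the question.
    static class Message implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;

        Message(String text) {
            this.text = text;
        }

        byte[] toBytes() throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(this);
            }
            return buffer.toByteArray();
        }

        // Only bytes[0 .. length) are visible to the object stream.
        static Message fromBytes(byte[] bytes, int length) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes, 0, length))) {
                return (Message) in.readObject();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = new Message("hello").toBytes();
        // Simulate a receive buffer that is larger than the payload.
        byte[] padded = Arrays.copyOf(payload, payload.length + 64);
        Message copy = Message.fromBytes(padded, payload.length);
        System.out.println(copy.text);
    }
}
```

If length is shorter than the serialized form, readObject fails with an exception instead of silently reading garbage from the rest of the buffer.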

Related

Java write a byte array with given ObjectOutputStream

I have a serializable class with custom writeObject() and readObject() methods.
When an object serializes, it needs to write two byte arrays, one after another. When something deserializes it, it needs to read those two arrays.
This is my code:
private void writeObject (final ObjectOutputStream out) throws IOException {
    ..
    out.writeByte(this.signature.getV()); //one byte
    out.writeObject(this.signature.getR()); //an array of bytes
    out.writeObject(this.signature.getS()); //an array of bytes
    out.close();
}
private void readObject (final ObjectInputStream in) throws IOException, ClassNotFoundException {
    ..
    v = in.readByte();
    r = (byte[])in.readObject();
    s = (byte[])in.readObject();
    // creating a new object because Sign.SignatureData is not serializable
    this.signature = new Sign.SignatureData(v, r, s);
    in.close();
}
When the object is being deserialized (readObject method) it throws an EOFException and all three variables are null/undefined.
Regarding the question title: I saw a class called ByteArrayOutputStream, but to use it, it has to be wrapped in an ObjectOutputStream, which I cannot do, as I am given an OutputStream and must write with it.
1. How does one properly write a byte array using ObjectOutputStream and properly read it using ObjectInputStream?
2. Why does the code above throw an EOFException without reading even one variable?
EDIT: I need to clarify: readObject() and writeObject() are called by the JVM itself while deserializing and serializing the object.
The second thing is, the SignatureData is a subclass to Sign, that comes from a third-party library - and that's why it's not serializable.
The third thing is, the problem probably lies in the reading and writing byte arrays by ObjectInput/ObjectOutput streams, not in the Sign.SignatureData class.
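One thing that stands out in the code above is that both custom methods close a stream they were handed by the serialization machinery; that stream belongs to the caller and is normally left open. Whether this is what triggers the EOFException is an assumption, since the thread does not confirm it. The sketch below uses a hypothetical SignedRecord class whose three transient fields stand in for the third-party signature data, writes each array with an explicit length prefix, and never closes the stream. Writing the arrays with writeObject also works, since byte[] is serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class; v, r and s stand in for the non-serializable signature data.
public class SignedRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient byte v;
    private transient byte[] r;
    private transient byte[] s;

    SignedRecord(byte v, byte[] r, byte[] s) {
        this.v = v;
        this.r = r;
        this.s = s;
    }

    byte getV() { return v; }
    byte[] getR() { return r; }
    byte[] getS() { return s; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeByte(v);
        out.writeInt(r.length); // length prefix, then the raw bytes
        out.write(r);
        out.writeInt(s.length);
        out.write(s);
        // No out.close() here: the stream is owned by whoever started serialization.
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        v = in.readByte();
        r = new byte[in.readInt()];
        in.readFully(r); // readFully blocks until the whole array is filled
        s = new byte[in.readInt()];
        in.readFully(s);
        // No in.close() here either.
    }

    // Helper that serializes and deserializes in memory.
    static SignedRecord roundTrip(SignedRecord record) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(record);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (SignedRecord) in.readObject();
        }
    }
}
```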

Google Protobuf ByteString vs. Byte[]

I am working with google protobuf in Java.
I see that it is possible to serialize a protobuf message to String, byte[], ByteString, etc.
(Source: https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite)
I don't know what a ByteString is. I got the following definition from the protobuf API documentation (source: https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/ByteString):
"Immutable sequence of bytes. Substring is supported by sharing the reference to the immutable underlying bytes, as with String."
It is not clear to me how a ByteString is different from a String or byte[].
Can somebody please explain?
Thanks.
You can think of ByteString as an immutable byte array. That's pretty much it. It's a byte[] which you can use in a protobuf. Protobuf does not let you use Java arrays because they're mutable.
ByteString exists because String is not suitable for representing arbitrary sequences of bytes. String is specifically for character data.
The protobuf MessageLite Interface provides toByteArray() and toByteString() methods. If ByteString is an immutable byte[], would the byte representation of a message represented by both ByteString and byte[] be the same?
Sort of. If you call toByteArray() you'll get the same value as if you were to call toByteString().toByteArray(). Compare the implementation of the two methods, in AbstractMessageLite:
public ByteString toByteString() {
    try {
        final ByteString.CodedBuilder out =
            ByteString.newCodedBuilder(getSerializedSize());
        writeTo(out.getCodedOutput());
        return out.build();
    } catch (IOException e) {
        throw new RuntimeException(
            "Serializing to a ByteString threw an IOException (should " +
            "never happen).", e);
    }
}
public byte[] toByteArray() {
    try {
        final byte[] result = new byte[getSerializedSize()];
        final CodedOutputStream output = CodedOutputStream.newInstance(result);
        writeTo(output);
        output.checkNoSpaceLeft();
        return result;
    } catch (IOException e) {
        throw new RuntimeException(
            "Serializing to a byte array threw an IOException " +
            "(should never happen).", e);
    }
}
A ByteString gives you the ability to perform more operations on the underlying data without having to copy the data into a new structure. For instance, if you wanted to pass a subset of the bytes in a byte[] to another method, you would need to supply a start index and an end index alongside the array.
With a ByteString, however, you can hand the method a substring of the data without the method knowing anything about the underlying storage, just like a substring of a normal String. You can also concatenate ByteStrings without having to create a new data structure and manually copy the data.
A String is for representing text and is not a good way to store binary data (as not all binary data has a textual equivalent unless you encode it in a manner that does: e.g. hex or Base64).
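The last point can be checked with the standard library alone. The sketch below (class and method names are made up for the illustration) pushes four arbitrary bytes through a UTF-8 String and through Base64: the String route replaces the invalid sequences and so loses data, while the Base64 route returns the original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BinaryAsTextDemo {
    // Decodes the bytes as UTF-8 text and encodes them again.
    static byte[] viaString(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes the bytes as Base64 text and decodes them again.
    static byte[] viaBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE are never valid in UTF-8, so they cannot survive as text.
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        System.out.println("String round trip intact: " + Arrays.equals(raw, viaString(raw)));
        System.out.println("Base64 round trip intact: " + Arrays.equals(raw, viaBase64(raw)));
    }
}
```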

Serialize multiple objects into a single file using Kryo

As far as I know, Kryo serialization/deserialization happens per object. Is it possible to serialize multiple objects into a single file? One workaround suggested in a similar SO question was to use an array of objects. Given the huge amount of data that needs to be serialized, I feel it would not be as efficient as it should be. Is that a correct assumption?
Does the Kryo API take an OutputStream? If so, just feed it the same OutputStream to serialize multiple objects. Do the same with the InputStream when reading. A good serialization format will have length encodings or termination symbols and will not rely on EOF for anything.
The array approach would also work with minimal overhead as long as all of these objects are already in memory. You are talking about adding just a few bytes per object to create an array to hold them. If they aren't all in memory, you would have to load them all into memory first to create an array around them. That could definitely become a problem given a large enough data set.
As Kryo supports streaming, there is nothing to stop you from writing or reading more than one object "at the top level". For example, the following program writes two unrelated objects to a file and then deserializes them again:
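import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.Collection;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

public class TestClass {
    public static void main(String[] args) throws FileNotFoundException {
        serialize();
        deSerialize();
    }

    public static void serialize() throws FileNotFoundException {
        Collection<String> collection = new ArrayList<>();
        int otherData = 12;
        collection.add("This is a serialized collection of strings");
        // Note: newer Kryo versions may require classes to be registered first
        // (or kryo.setRegistrationRequired(false)) before this will run.
        Kryo kryo = new Kryo();
        Output output = new Output(new FileOutputStream("testfile"));
        kryo.writeClassAndObject(output, collection);
        kryo.writeClassAndObject(output, otherData); // we could add as many of these as we like
        output.close();
    }

    @SuppressWarnings("unchecked")
    public static void deSerialize() throws FileNotFoundException {
        Collection<String> collection;
        int otherData;
        Kryo kryo = new Kryo();
        Input input = new Input(new FileInputStream("testfile"));
        // Objects come back in the same order they were written.
        collection = (Collection<String>) kryo.readClassAndObject(input);
        otherData = (Integer) kryo.readClassAndObject(input);
        input.close();
        for (String string : collection) {
            System.out.println(string);
        }
        System.out.println("There are other things too! like; " + otherData);
    }
}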

Reading/writing binary structures: how to simplify this code?

I'm writing a network app, which sends and receives a lot of different kinds of binary packets, and I'm trying to make adding new kinds of packets to my app as easy as possible.
For now, I created a Packet class, and I create subclasses of it for each different kind of packet. However, it isn't as clean as it seems; I've ended up with code like this:
static class ItemDesc extends Packet {
    public final int item_id;
    public final int desc_type;
    public final String filename;
    public final String buf;
    public ItemDesc(Type t, int item_id, int desc_type, String filename, String buf) {
        super(t); // sets type for use in packet header
        this.item_id = item_id;
        this.desc_type = desc_type;
        this.filename = filename;
        this.buf = buf;
    }
    public ItemDesc(InputStream i) throws IOException {
        super(i); // reads packet header and sets this.input
        item_id = input.readInt();
        desc_type = input.readByte();
        filename = input.readStringWithLength();
        buf = input.readStringWithLength();
    }
    public void writeTo(OutputStream o) throws IOException {
        MyOutputStream dataOutput = new MyOutputStream();
        dataOutput.writeInt(item_id);
        dataOutput.writeByte(desc_type);
        dataOutput.writeStringWithLength(filename);
        dataOutput.writeStringWithLength(buf);
        super.write(dataOutput.toByteArray(), o);
    }
}
What bothers me about this approach is the code repetition - I'm repeating the packet structure four times. I'd be glad to avoid this, but I can't see a reasonable way to simplify it.
If I was writing in Python I would create a dictionary of all possible field types, and then define new packet types like this:
ItemDesc = [('item_id', 'int'), ('desc_type', 'byte'), ...]
I suppose that I could do something similar in any functional language. However, I can't see a way to take this approach to Java.
(Maybe I'm just too pedantic, or I got used to functional programming and writing code that writes code, so I could avoid any repetition :))
Thank you in advance for any suggestions.
I agree with @silky that your current code is a good solution. A bit of repetitious (though not duplicated) code is not a bad thing, IMO.
If you wanted a more python-like solution, you could:
Replace the member attributes of ItemDesc with some kind of order-preserving map structure, do the serialization using a common writeTo method that iterates over the map. You also need to add getters for each attribute, and replace all uses of the existing fields.
Replace the member attributes with a Properties object and use Properties serialization instead of binary writes.
Write a common writeTo method that uses Java reflection to access the member attributes and their types and serialize them.
But in all 3 cases, the code will be slower, more complicated and potentially more fragile than the current "ugly" code. I wouldn't do this.
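For illustration only, the first option (an order-preserving map plus one shared write routine) might look roughly like the sketch below. The class name, the helper itemDesc, and the length-prefixed UTF-8 string encoding are assumptions standing in for the question's Packet and MyOutputStream, and only the write side is shown; reading back would additionally need a description of the expected field types.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapPacketDemo {
    // Writes every field in insertion order, picking the encoding from the runtime type.
    static byte[] write(Map<String, Object> fields) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Integer) {
                out.writeInt((Integer) value);
            } else if (value instanceof Byte) {
                out.writeByte((Byte) value);
            } else if (value instanceof String) {
                // Assumed string format: 4-byte length followed by UTF-8 bytes.
                byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);
                out.write(utf8);
            } else {
                throw new IllegalArgumentException("Unsupported field type for " + entry.getKey());
            }
        }
        out.flush();
        return buffer.toByteArray();
    }

    // The packet layout is declared once, as the order of the put calls.
    static Map<String, Object> itemDesc(int itemId, byte descType, String filename, String buf) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("item_id", itemId);
        fields.put("desc_type", descType);
        fields.put("filename", filename);
        fields.put("buf", buf);
        return fields;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = write(itemDesc(7, (byte) 1, "a.txt", "hi"));
        System.out.println(body.length + " bytes");
    }
}
```

The trade-off described above is visible here: the field values lose their static types, and a typo in a key or a value of the wrong type is only caught at run time.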
Seems okay to me. You may just want to abstract some of the 'general' parts of the packet up the inheritance chain, so you don't need to read them, but it makes sense to be repeating the format like you are, because you've got a case for constructing from raw values, reading from a stream, and writing. I see nothing wrong with it.
I am not sure you can do this in Java, but maybe you could reuse one of the constructors:
public ItemDesc(InputStream i) throws IOException {
    super(i); // reads packet header and sets this.input
    this(input.readInt(), input.readByte(), input.readStringWithLength(), input.readStringWithLength());
}
Where 'this' means a call to this class's constructor, whatever the syntax might be.
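Java does not accept that form as written: a constructor may start with either super(...) or this(...), but not both. A common way to get the same reuse is a static factory method that reads the fields and then calls the one real constructor. The sketch below is a simplified, hypothetical ItemDesc without the Packet header, with DataInputStream.readUTF standing in for readStringWithLength.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ItemDescFactoryDemo {
    // Simplified stand-in for the question's ItemDesc (no Packet superclass or header).
    static final class ItemDesc {
        final int itemId;
        final int descType;
        final String filename;
        final String buf;

        ItemDesc(int itemId, int descType, String filename, String buf) {
            this.itemId = itemId;
            this.descType = descType;
            this.filename = filename;
            this.buf = buf;
        }

        // Factory method: reads the fields in wire order, then delegates to the constructor.
        static ItemDesc readFrom(InputStream raw) throws IOException {
            DataInputStream in = new DataInputStream(raw);
            int itemId = in.readInt();
            int descType = in.readByte();
            String filename = in.readUTF();
            String buf = in.readUTF();
            return new ItemDesc(itemId, descType, filename, buf);
        }

        void writeTo(OutputStream raw) throws IOException {
            DataOutputStream out = new DataOutputStream(raw);
            out.writeInt(itemId);
            out.writeByte(descType);
            out.writeUTF(filename);
            out.writeUTF(buf);
            out.flush();
        }
    }

    static ItemDesc roundTrip(ItemDesc original) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.writeTo(buffer);
        return ItemDesc.readFrom(new ByteArrayInputStream(buffer.toByteArray()));
    }
}
```

This removes the duplicated field assignments, although the field order still appears once for reading and once for writing.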

Java: Serializing unknown Arraysize

If I save an array and reload it, is it possible to get its size if it is unknown?
Thanks
What do you mean by "unknown"? You can get the length of any java array with the length field.
int[] myArray = deserializeSomeArray();
int size = myArray.length;
It sounds like you're serializing and storing the individual objects in the array (after much reading between the lines). Use the ObjectOutputStream to store the array itself. If the objects stored in the array are serializable, they'll be stored too. When you deserialize you'll get the entire array back intact.
I think you need to supply some more information. How are you saving the array? Using an ObjectOutputStream?
No, it cannot be unknown: the length of a Java array is fixed when the array is allocated and stays with the array, so you will always have a proper length (which could be 0).
If you use ObjectInputStream.readObject() to read the saved array, it will be reconstituted with the proper length and you can just read the size with array.length.
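A minimal sketch of that round trip, using an int[] and in-memory streams (the class and method names are chosen for the example only):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class ArrayRoundTripDemo {
    // Serializes the array as a single object; its length is stored with it.
    static byte[] save(int[] values) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(values);
        }
        return buffer.toByteArray();
    }

    // Reads the array back; no size needs to be known in advance.
    static int[] load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (int[]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] restored = load(save(new int[] {3, 1, 4}));
        System.out.println(restored.length);
    }
}
```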
Attempting to read between the lines...
If you are actually reading an array, then (unlike in C) all arrays know their length. Java is a safe language, so the length is necessary for bounds checking.
MyType[] things = (MyType[])in.readObject();
int len = things.length;
Perhaps your difficulty is that you are doing custom (de)serialisation and are writing out individual elements of the array (hint: don't - use an array). In that case you need to catch OptionalDataException to detect the end of the enclosing object's custom data:
private static final MyType[] NOTHING = new MyType[0];
private transient MyType[] things = NOTHING;
private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject(); // Do not forget this call!
    for (MyType thing : things) {
        out.writeObject(thing);
    }
}
private void readObject(
    ObjectInputStream in
) throws IOException, ClassNotFoundException {
    in.defaultReadObject(); // Do not forget this call!
    List<MyType> things = new ArrayList<MyType>();
    try {
        for (;;) {
            things.add((MyType)in.readObject());
        }
    } catch (OptionalDataException exc) {
        // Okay - end of custom data.
    }
    this.things = things.toArray(NOTHING);
}
If you are going to do that sort of thing, it's much better to write out the number of objects you are going to read as an int before the actual data.
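A sketch of that count-first variant follows, with String standing in for MyType and a hypothetical CountedThings class wrapping the array. The reader knows exactly how many elements to expect, so no exception is needed to detect the end of the data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical holder class; String stands in for MyType.
public class CountedThings implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient String[] things;

    CountedThings(String... things) {
        this.things = things;
    }

    String[] things() {
        return things;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(things.length); // element count first
        for (String thing : things) {
            out.writeObject(thing);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt(); // read the count back, then exactly that many elements
        things = new String[count];
        for (int i = 0; i < count; i++) {
            things[i] = (String) in.readObject();
        }
    }

    // Helper that serializes and deserializes in memory.
    static CountedThings roundTrip(CountedThings original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (CountedThings) in.readObject();
        }
    }
}
```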
