How to binary (de)serialize an object into/from a String? - java

I need to serialize objects into a String and deserialize them back.
I read suggestions on Stack Overflow and wrote this code:
import java.io.*;
class Data implements Serializable {
int x = 5;
int y = 3;
}
public class Test {
public static void main(String[] args) {
Data data = new Data();
String out;
try {
// write
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeObject(data);
out = new String(baos.toByteArray());
System.out.println(out);
// read ==========================================
ByteArrayInputStream bais = new ByteArrayInputStream(out.getBytes());
ObjectInputStream ois = new ObjectInputStream(bais);
Data d = (Data) ois.readObject();
System.out.println("d.x = " + d.x);
System.out.println("d.y = " + d.y);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
}
}
but I get this error:
java.io.StreamCorruptedException: invalid stream header: EFBFBDEF
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298)
at p.Test.main(Test.java:37)
Why?
I expected:
d.x = 5
d.y = 3
How do I do this the right way?
Note: I don't want to write this object to a file. I need to have it in String form.

Use
ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray()); instead of
ByteArrayInputStream bais = new ByteArrayInputStream(out.getBytes());, since the String conversion corrupts the data (because of the encoding).
If you really need to store the result in a String, you need a safe way to store arbitrary bytes in a String. One way of doing that is to use Base64 encoding.
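A minimal sketch of the Base64 approach, assuming Java 8+ (java.util.Base64). The class name Base64Serializer and its method names are hypothetical, not from the answer. The serialized bytes are never decoded as text; they are Base64-encoded, so the resulting String contains only A-Z, a-z, 0-9, +, / and =.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

class Base64Serializer {
    // Serialize with standard Java serialization, then Base64-encode the raw bytes
    static String toBase64(Serializable obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(baos.toByteArray());
    }

    // Base64-decode back to the exact original bytes, then deserialize
    static Object fromBase64(String text) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(text);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }
}
```

With the question's Data class this would be used as `String out = Base64Serializer.toBase64(data);` followed by `Data d = (Data) Base64Serializer.fromBase64(out);`.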
A totally different approach would have been to not use the standard Java serialization for this class, but to create your own Data to/from String converter.
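A sketch of such a hand-written converter, assuming a hypothetical "x,y" text format (for example "5,3") and hypothetical method names toText/fromText. Nothing here uses Java serialization, so the class no longer needs to be Serializable and the text stays readable.

```java
class Data {
    int x = 5;
    int y = 3;

    // Hypothetical text format: "x,y"
    String toText() {
        return x + "," + y;
    }

    // Parse the "x,y" format back into a Data instance
    static Data fromText(String s) {
        String[] parts = s.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected \"x,y\" but got: " + s);
        }
        Data d = new Data();
        d.x = Integer.parseInt(parts[0].trim());
        d.y = Integer.parseInt(parts[1].trim());
        return d;
    }
}
```

The trade-off is that every field has to be handled by hand, but the format no longer depends on the class's binary serialization layout.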

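It is not entirely true to say that conversion to a String corrupts the data. Conversion using "UTF-8" does, because it is not bijective: not every byte sequence is valid UTF-8, and invalid bytes are replaced by the replacement character U+FFFD (which encodes back as EF BF BD, exactly the "invalid stream header: EFBFBDEF" in the question). "ISO-8859-1", on the other hand, is bijective: one char of the String corresponds to exactly one byte, and vice versa.
Base64 encoding is less space-efficient than this, since it produces 4 characters for every 3 bytes.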
This is why I would recommend:
import java.io.*;

/**
* Serialize any object
* @param obj the object to serialize (must be Serializable)
* @return the serialized bytes as an ISO-8859-1 String, or null on failure
*/
public static String serialize(Object obj) {
try {
ByteArrayOutputStream bo = new ByteArrayOutputStream();
ObjectOutputStream so = new ObjectOutputStream(bo);
so.writeObject(obj);
so.flush();
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
return bo.toString("ISO-8859-1");
} catch (Exception e) {
e.printStackTrace();
return null;
}
}
/**
* Deserialize any object
* @param str a String produced by serialize(Object)
* @param cls the expected class of the object
* @return the deserialized object, or null on failure
*/
public static <T> T deserialize(String str, Class<T> cls) {
// deserialize the object
try {
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
byte[] b = str.getBytes("ISO-8859-1");
ByteArrayInputStream bi = new ByteArrayInputStream(b);
ObjectInputStream si = new ObjectInputStream(bi);
return cls.cast(si.readObject());
} catch (Exception e) {
e.printStackTrace();
return null;
}
}
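To see the bijection argument concretely, here is a small check (class and method names are hypothetical) that decodes the first four bytes of a serialization stream (magic 0xACED, version 5) into a String and encodes them back, once with ISO-8859-1 and once with UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    // Decode bytes into a String with the given charset, then encode the String back to bytes
    static byte[] roundTrip(byte[] bytes, Charset cs) {
        return new String(bytes, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // The first four bytes of every Java serialization stream
        byte[] header = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05};
        System.out.println(Arrays.equals(header, roundTrip(header, StandardCharsets.ISO_8859_1))); // true
        System.out.println(Arrays.equals(header, roundTrip(header, StandardCharsets.UTF_8)));      // false
        // Print the UTF-8 round-trip result in hex to see the replacement characters
        for (byte b : roundTrip(header, StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

The hex line starts with EF BF BD, the UTF-8 encoding of U+FFFD, which matches the corrupted header reported in the question.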

Related

Json to avro conversion

I'm converting JSON to Avro. I have JSON data in a JSONArray, and I'm running into a problem while converting it into a byte array.
Below is my code:
static byte [] fromJsonToAvro(JSONArray json, String schemastr) throws Exception {
ExcelToJson ejj = new ExcelToJson();
List<String> list = new ArrayList<String>();
if (json != null) {
int len = json.length();
for (int i=0;i<len;i++){
list.add(json.get(i).toString());
}
}
InputStream input = new ByteArrayInputStream(list.getBytes()); //json.toString().getBytes()
DataInputStream din = new DataInputStream(input);
.
.
.//rest of the logic
So how can I do it? How do I convert a JSONArray object to bytes (i.e., how do I use the getBytes() method for JSONArray objects)? The code above gives an error at list.getBytes(), saying that getBytes() is undefined for List.
Avro works at the record level, bound to a schema. I don't think there's such a concept as "convert this JSON fragment to bytes for an Avro field independent of any schema or record".
Assuming the array is part of a larger JSON record, and you're starting with a string of the record, you could do:
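import java.io.*;
import java.nio.charset.StandardCharsets;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
InputStream input = null;
DataFileWriter<GenericRecord> writer = null;
ByteArrayOutputStream output = null;
try {
Schema schema = new Schema.Parser().parse(schemaStr);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
// Encode the JSON text explicitly as UTF-8 instead of the platform default
input = new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8));
output = new ByteArrayOutputStream();
DataInputStream din = new DataInputStream(input);
writer = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>());
writer.create(schema, output);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
GenericRecord datum;
while (true) {
try {
datum = reader.read(null, decoder);
} catch (EOFException eofe) {
// End of the JSON input: no more records
break;
}
writer.append(datum);
}
writer.flush();
return output.toByteArray();
} finally {
if (writer != null) { try { writer.close(); } catch (IOException e) { } }
if (input != null) { try { input.close(); } catch (IOException e) { } }
}
}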
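The question's code ends up with a List&lt;String&gt;, and List has no getBytes(). A minimal sketch, assuming each element (e.g. json.get(i).toString()) is a JSON object matching the record schema, that joins the elements into one newline-separated String, which is the kind of input the read loop in jsonToAvro consumes record by record. The class JsonRecords and its methods are hypothetical helpers.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

class JsonRecords {
    // Join per-element JSON strings into newline-separated text, one record per line
    static String joinRecords(List<String> records) {
        return String.join("\n", records);
    }

    // List has no getBytes(); encode the joined String explicitly as UTF-8
    static byte[] toUtf8Bytes(List<String> records) {
        return joinRecords(records).getBytes(StandardCharsets.UTF_8);
    }
}
```

Under those assumptions, the joined String could be passed as `jsonToAvro(JsonRecords.joinRecords(list), schemastr)`.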
For an online JSON-to-Avro converter, check the following URL:
http://avro4s-ui.landoop.com
It uses the avro4s library, which offers many conversions, including JSON => Avro.
This discussion is likely useful:
http://mail-archives.apache.org/mod_mbox/avro-user/201209.mbox/%3CCALEq1Z8s1sfaAVB7YE2rpZ=v3q1V_h7Vm39h0HsOzxJ+qfQRSg#mail.gmail.com%3E
The gist is that there is a special JSON schema, and you can use JsonReader/Writer to convert to and from it. The JSON schema you should use is defined here:
https://github.com/apache/avro/blob/trunk/share/schemas/org/apache/avro/data/Json.avsc

Java - Error deserializing HashTable containing primitive type

I have serialized a HashTable<String,Object> object using an ObjectOutputStream. When serializing the object, I get no exception, but upon deserialization, the following exception occurs:
Exception in thread "main" java.io.InvalidClassException: java.lang.Long; local class incompatible: stream classdesc serialVersionUID = 4290774032661291999, local class serialVersionUID = 4290774380558885855
I no longer get the error when I remove all of the keys in the HashTable that have a value that is not a String (all of the key / value pairs I removed had a primitive type as their value).
What could be causing this error?
UPDATE - Here's the code
public static String serialize(Quiz quiz) throws IOException{
HashMap<String,Object> quizData = new HashMap<String,Object>();
quizData.put("version", 0); //int
quizData.put("name", quiz.getName()); //String
quizData.put("desc", quiz.getDesc()); //String
quizData.put("timelimitType", quiz.getTimelimitType()); //String
quizData.put("timelimit", quiz.getTimelimit()); //long
ArrayList<String> serializedQuestionsData = new ArrayList<String>();
for (Question question : quiz.getQuestions())
serializedQuestionsData.add(Question.serialize(question));
quizData.put("questions", serializedQuestionsData.toArray(new String[0])); //String[]
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos;
try { oos = new ObjectOutputStream(baos); } catch (IOException error){ throw error; }
try { oos.writeObject(quizData); } catch (IOException error){ throw error; }
return baos.toString();
}
@SuppressWarnings("unchecked")
public static Quiz deserialize(String serializedQuizData) throws IOException, ClassNotFoundException{
ByteArrayInputStream bais = new ByteArrayInputStream(serializedQuizData.getBytes());
ObjectInputStream ois;
try { ois = new ObjectInputStream(bais); } catch (IOException error){ throw error; }
HashMap<String,Object> quizData;
// Exception occurs on the following line!!
try { quizData = (HashMap<String,Object>) ois.readObject(); } catch (ClassNotFoundException error){ throw error; }
Quiz quiz;
if ((int) quizData.get("version") == 0){
quiz = new Quiz((String) quizData.get("name"),
(String) quizData.get("desc"),
(String) quizData.get("timelimitType"),
(long) quizData.get("timelimit"));
for (String serializedQuestionData : (String[]) quizData.get("questions"))
quiz.addQuestion(Question.deserialize(serializedQuestionData));
} else {
throw new UnsupportedOperationException("Unsupported version: \"" + quizData.get("version") + "\"");
}
return quiz;
}
The problem is that you're transforming a byte array output stream to a String using toString(). The toString() method simply uses the platform default encoding to transform the bytes (which do not represent characters at all but are purely binary data) into a String. This is a lossy operation, because your platform default encoding doesn't map every possible byte sequence to a valid character.
You shouldn't use a String to hold binary data. A String contains characters. If you really need a String, then encode the byte array using a hexadecimal or Base64 encoder. Otherwise, simply use a byte array to hold your binary data:
public static byte[] serialize(Quiz quiz) throws IOException{
...
ByteArrayOutputStream baos = new ByteArrayOutputStream();
...
return baos.toByteArray();
}
@SuppressWarnings("unchecked")
public static Quiz deserialize(byte[] serializedQuizData) throws IOException, ClassNotFoundException{
ByteArrayInputStream bais = new ByteArrayInputStream(serializedQuizData);
...
return quiz;
}
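If a String really is required, the hexadecimal option mentioned above can be sketched as follows. The Hex class is a hypothetical helper, not a library API; each byte becomes two characters, so the result is twice as long as the input but contains only 0-9 and a-f.

```java
class Hex {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Encode each byte as two lowercase hex digits
    static String encode(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;
            out[2 * i] = DIGITS[v >>> 4];
            out[2 * i + 1] = DIGITS[v & 0x0F];
        }
        return new String(out);
    }

    // Decode pairs of hex digits (either case) back into bytes
    static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid hex digit at position " + 2 * i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

Combined with the byte[] versions above, this would be `String s = Hex.encode(serialize(quiz));` and `Quiz q = deserialize(Hex.decode(s));`. On Java 17+, java.util.HexFormat (formatHex/parseHex) does the same job.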
The only explanation I can think of is that something is corrupting your object stream between writing it and reading it. The serialVersionUID of the local class (4290774380558885855) is standard across all Java implementations that try to be compatible with Java (tm). The source code for java.lang.Long says that this serialVersionUID has not changed since Java 1.0.2.
If you need further help, you will need to provide an SSCCE that covers both creation and reading of the serialized object.
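One way to confirm the "local class" side of that error message is to ask the running JVM which serialVersionUID it uses for java.lang.Long. A minimal sketch using java.io.ObjectStreamClass (the wrapper class name is hypothetical):

```java
import java.io.ObjectStreamClass;

class SerialUidCheck {
    // Return the serialVersionUID the local JVM uses for a Serializable class
    static long uidOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        if (desc == null) {
            throw new IllegalArgumentException(cls + " is not Serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Long.class)); // 4290774380558885855
    }
}
```

Since the local value is fixed, a different value in the stream points to the stream bytes themselves having been altered.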

What's the difference between getBytes and serialize with String?

As the title says, I can't tell the difference between getBytes() and the serialization mechanism for a String. Below is a test comparing getBytes() with serialization:
public void testUTF() {
byte[] data = SerializeUtil.serUTFString(str);
System.out.println(data.length);
System.out.println(str.getBytes().length);
}
Here is SerializeUtil:
public static byte[] serUTFString(String data) {
byte[] result = null;
ObjectOutputStream oos = null;
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
try {
oos = new ObjectOutputStream(byteArray);
try {
oos.writeUTF(data);
oos.flush();
result = byteArray.toByteArray();
} finally {
oos.close();
}
} catch (Exception e) {
e.printStackTrace();
}
return result;
}
When I store str in Redis, both work correctly, but getBytes() seems more efficient. Since both produce a byte array from a String, what's the difference? Is serialization necessary?
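String.getBytes() returns a byte array representing the string's characters in the platform default encoding. ObjectOutputStream.writeUTF writes the string's length (as 2 bytes) followed by the characters in modified UTF-8 format; see the java.io.DataOutput API. On top of that, ObjectOutputStream itself adds a stream header and block-data framing, so its output is always longer than the plain bytes.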
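A small comparison that makes the overhead visible, assuming a short ASCII string and using UTF-8 explicitly for getBytes() (the question relies on the platform default). The class and method names are hypothetical; viaWriteUtf mirrors the question's serUTFString.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

class UtfComparison {
    // Same idea as serUTFString: ObjectOutputStream.writeUTF into a byte array
    static byte[] viaWriteUtf(String s) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeUTF(s);
        }
        return baos.toByteArray();
    }

    // Reading it back requires the matching ObjectInputStream.readUTF
    static String readBack(byte[] data) throws IOException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String s = "hello";
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 5
        System.out.println(viaWriteUtf(s).length);                     // 13
        System.out.println(readBack(viaWriteUtf(s)));                  // hello
    }
}
```

For this short string the 8 extra bytes are the 4-byte stream header (0xACED, version 5), a 2-byte block-data header, and writeUTF's 2-byte length prefix. If the data only ever needs to be read back as a String, getBytes() with an explicit charset and new String(bytes, charset) is enough; serialization is not required.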

Java Serialization Pains (java.io.StreamCorruptedException)

I'm trying to serialize an object to a byte array for storage in a String. I cannot for the life of me figure out where I'm going wrong here.
String store = null;
// Writing
try {
String hi = "Hi there world!";
ByteArrayOutputStream out = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(out);
oos.writeObject(hi);
oos.close();
store = out.toString("UTF-8");
} catch(Exception e) {
System.out.println(e);
}
// Reading
try {
ByteArrayInputStream in = new ByteArrayInputStream(store.getBytes("UTF-8"));
ObjectInputStream ois = new ObjectInputStream(in);
String data = (String) ois.readObject();
} catch(Exception e) {
System.out.println(e);
}
I keep getting java.io.StreamCorruptedException and I don't know why :(
store = out.toString("UTF-8");
The data in out is not UTF-8 formatted; in fact, it's not a String at all. It's a serialized instance of a String. You can call toString("UTF-8") on it, but that just decodes arbitrary binary data as if it were UTF-8 text, which loses information.
You'd want to do:
byte[] data = out.toByteArray();
and then pass data into the ByteArrayInputStream constructor.
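Applied to the question's code, that fix looks roughly like this; a sketch with hypothetical method names write/read, where the serialized form stays a byte[] from end to end:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class ByteRoundTrip {
    // Writing: keep the serialized form as a byte[] instead of converting it to a String
    static byte[] write(Object obj) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(obj);
        }
        return out.toByteArray();
    }

    // Reading: pass those same bytes straight to ByteArrayInputStream
    static Object read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String result = (String) read(write("Hi there world!"));
        System.out.println(result); // Hi there world!
    }
}
```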
Unfortunately, Java strings aren't arrays of bytes (as in C), but rather sequences of chars (16-bit values). Also, all strings are Unicode in Java.
My best advice is: use Base64 encoding/decoding if you need to store binary data in strings. Apache Commons Codec has some great classes for this task, and you can find more info at:
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html
If you want to save the byte array to a string, you need to convert it to a Base64 string, not to a UTF-8 string. For that purpose you can use org.apache.commons.codec.binary.Base64 (for example, its encodeBase64String and decodeBase64 methods).
I would recommend the following code:
Note that the "ISO-8859-1" encoding preserves a byte array, while "UTF-8" does not (some byte arrays are not valid UTF-8, and their invalid bytes are replaced when decoded into a String).
import java.io.*;

/**
* Serialize any object
* @param obj the object to serialize (must be Serializable)
* @return the serialized bytes as an ISO-8859-1 String, or null on failure
*/
public static String serialize(Object obj) {
try {
ByteArrayOutputStream bo = new ByteArrayOutputStream();
ObjectOutputStream so = new ObjectOutputStream(bo);
so.writeObject(obj);
so.flush();
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
return bo.toString("ISO-8859-1");
} catch (Exception e) {
e.printStackTrace();
return null;
}
}
/**
* Deserialize any object
* @param str a String produced by serialize(Object)
* @param cls the expected class of the object
* @return the deserialized object, or null on failure
*/
public static <T> T deserialize(String str, Class<T> cls) {
// deserialize the object
try {
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
byte[] b = str.getBytes("ISO-8859-1");
ByteArrayInputStream bi = new ByteArrayInputStream(b);
ObjectInputStream si = new ObjectInputStream(bi);
return cls.cast(si.readObject());
} catch (Exception e) {
e.printStackTrace();
return null;
}
}
The problem is that serializing the initial string produces the serialized form of a String object, not just its characters. That's not the same as chopping the string into an array of its constituent characters.

Blob object not working properly even though the class is serialized

I have a class which is serialized and converts a very large data object to a blob to save it to the database. In the same class there is a decode method to convert the blob back to the actual object. Following is the code for encoding and decoding the object:
private byte[] encode(ScheduledReport schedSTDReport)
{
byte[] bytes = null;
try
{
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos);
oos.writeObject(schedSTDReport);
oos.flush();
oos.close();
bos.close();
//byte [] data = bos.toByteArray();
//ByteArrayOutputStream baos = new ByteArrayOutputStream();
//GZIPOutputStream out = new GZIPOutputStream(baos);
//XMLEncoder encoder = new XMLEncoder(out);
//encoder.writeObject(schedSTDReport);
//encoder.close();
bytes = bos.toByteArray();
//GZIPOutputStream out = new GZIPOutputStream(bos);
//out.write(bytes);
//bytes = bos.toByteArray();
}
catch (Exception e)
{
_log.error("Exception caught while encoding/zipping Scheduled STDReport", e);
}
decode(bytes);
return bytes;
}
/*
* Decode the report definition blob back to the
* ScheduledReport object.
*/
private ScheduledReport decode(byte[] bytes)
{
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
ScheduledReport sSTDR = null;
try
{
ObjectInputStream ois = new ObjectInputStream(bais);
//GZIPInputStream in = new GZIPInputStream(bais);
//XMLDecoder decoder = new XMLDecoder(in);
sSTDR = (ScheduledReport)ois.readObject();//decoder.readObject();
//decoder.close();
}
catch (Exception e)
{
_log.error("IOException caught while decoding/unzipping Scheduled STDReport", e);
}
return sSTDR;
}
The problem here is that whenever I change something else in this class, i.e. any other method, a new class version is created, and so the new version of the class is unable to decode the originally encoded blob object. The object which I am passing to encode is also a serialized object, but this problem exists. Any ideas? Thanks.
Yup, Java binary serialization is pretty brittle :(
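You can add a static serialVersionUID field (private static final long serialVersionUID) to the class so that you control the version number yourself; this should prevent problems due to adding methods. You can still run into issues when fields change, though: added fields simply get default values when old data is read, but changes such as altering a field's type are incompatible. See the Javadoc for Serializable for more details.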
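A minimal sketch of pinning the version, using a hypothetical stand-in for ScheduledReport (its fields and method are made up for illustration). With the field declared, the value no longer comes from a hash of the class structure, so adding methods leaves it unchanged. For blobs that were already written without an explicit UID, the assumption is that you would declare the value the old class had, which the JDK's serialver tool can print.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical stand-in for ScheduledReport
class ScheduledReportExample implements Serializable {
    // Explicit version: stays the same when methods are added or changed
    private static final long serialVersionUID = 1L;

    private String name;
    private int intervalMinutes;

    // Adding or editing methods like this one no longer changes the serialVersionUID
    String describe() {
        return name + " every " + intervalMinutes + " min";
    }

    public static void main(String[] args) {
        // Prints the declared value rather than a computed hash
        System.out.println(ObjectStreamClass.lookup(ScheduledReportExample.class).getSerialVersionUID()); // 1
    }
}
```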
You might want to consider using another serialization format such as Protocol Buffers to give you more control though.
You can implement java.io.Externalizable so that you control exactly what is written during serialization and what is expected during deserialization.
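A sketch of what that could look like, with a hypothetical ReportDefinition class and a hypothetical format-version number written first so that readExternal can reject (or, in a later version, branch on) older layouts. Note that Externalizable classes still have their serialVersionUID checked, so the sketch declares one as well, and Externalizable requires a public no-arg constructor.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Hypothetical example class, not the question's ScheduledReport
class ReportDefinition implements Externalizable {
    private static final long serialVersionUID = 1L;
    // Hypothetical format version written at the start of the data
    private static final int FORMAT_VERSION = 1;

    String name;
    int intervalMinutes;

    // Externalizable requires a public no-arg constructor
    public ReportDefinition() {}

    ReportDefinition(String name, int intervalMinutes) {
        this.name = name;
        this.intervalMinutes = intervalMinutes;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(FORMAT_VERSION);
        out.writeUTF(name);
        out.writeInt(intervalMinutes);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int version = in.readInt();
        if (version != FORMAT_VERSION) {
            throw new InvalidObjectException("Unsupported format version: " + version);
        }
        name = in.readUTF();
        intervalMinutes = in.readInt();
    }
}
```

The object is still written and read with ObjectOutputStream.writeObject / ObjectInputStream.readObject, as in the question's encode and decode methods; only the per-field layout is now under your control.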
