Java Serialization Pains (java.io.StreamCorruptedException) - java

I'm trying to Serialize an object to a Byte array, for storage in a String. I cannot for the life of me figure out where I'm going wrong here.
String store = null;
// Writing
try {
String hi = "Hi there world!";
ByteArrayOutputStream out = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(out);
oos.writeObject(hi);
oos.close();
store = out.toString("UTF-8");
} catch(Exception e) {
System.out.println(e);
}
// Reading
try {
ByteArrayInputStream in = new ByteArrayInputStream(store.getBytes("UTF-8"));
ObjectInputStream ois = new ObjectInputStream(in);
String data = (String) ois.readObject();
} catch(Exception e) {
System.out.println(e);
}
I keep getting java.io.StreamCorruptedException and I don't know why :(

store = out.toString("UTF-8");
the data in out is not UTF-8 formatted, in fact it's not a String at all. It's a serialized instance of a String. You can call toString on it, just because you can call toString on any object.
you'd want to to
byte[] data = out.toByteArray();
and then pass data into the ByteArrayInputStream constructor

Unfortunatelly, Java strings aren't an array of bytes (as in C), but rather an array of chars (16-bit values). Also, all strings are unicode in Java.
My best advice is: use Base64 encoding/decoding if you need to store binary data into strings. Apache Commons has some great classes for this task, and you can find more info at:
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html

If you want to save the byte array to a string, you need to convert it to a Base64 string, not to a UTF-8 string. For that purpose you can use org.apache.commons.codec.binary.Base64

I would recommend the following code:
Note that the "ISO-8859-1" encoding preserves a byte array, while "UTF-8" does not (some bytes array lead to invalid Strings in this encoding).
/**
* Serialize any object
* #param obj
* #return
*/
public static String serialize(Object obj) {
try {
ByteArrayOutputStream bo = new ByteArrayOutputStream();
ObjectOutputStream so = new ObjectOutputStream(bo);
so.writeObject(obj);
so.flush();
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
return bo.toString("ISO-8859-1");
} catch (Exception e) {
e.printStackTrace();
}
}
/**
* Deserialize any object
* #param str
* #param cls
* #return
*/
public static <T> T deserialize(String str, Class<T> cls) {
// deserialize the object
try {
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
byte b[] = str.getBytes("ISO-8859-1");
ByteArrayInputStream bi = new ByteArrayInputStream(b);
ObjectInputStream si = new ObjectInputStream(bi);
return cls.cast(si.readObject());
} catch (Exception e) {
e.printStackTrace();
}
}

The problem is that the initial string when serialized is a serialized String. That's not the same as chopping the string into an array of its constituent characters.

Related

What's the difference between getBytes and serialize with String?

Just as the title says, I can't differ getBytes[] from serialization mechanism with String. Below is a test between getBytes[] and serialization mechanism:
public void testUTF() {
byte[] data = SerializeUtil.serUTFString(str);
System.out.println(data.length);
System.out.println(str.getBytes().length);
}
Here is SerializeUtil:
public static byte[] serUTFString(String data) {
byte[] result = null;
ObjectOutputStream oos = null;
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
try {
oos = new ObjectOutputStream(byteArray);
try {
oos.writeUTF(data);
oos.flush();
result = byteArray.toByteArray();
} finally {
oos.close();
}
} catch (Exception e) {
e.printStackTrace();
}
return result;
}
When I set str to Redis, both can work correctly, but getBytes[] seems more efficient. Since they all return a byte array from String, whats's the difference, is serialization necessary?
String.getBytes() returns a byte array repersenting the string characters in the default encoding. ObjectOutputStream.writeUTF writes the string length then bytes in modified UTF-8 format, see java.io.DataOutput API.

How to binary (de)serialize object into/form string?

I need serialize objects into String and deserialize.
I readed sugestion on stackoverflow and make this code:
class Data implements Serializable {
int x = 5;
int y = 3;
}
public class Test {
public static void main(String[] args) {
Data data = new Data();
String out;
try {
// zapis
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeObject(data);
out = new String(baos.toByteArray());
System.out.println(out);
// odczyt.==========================================
ByteArrayInputStream bais = new ByteArrayInputStream(out.getBytes());
ObjectInputStream ois = new ObjectInputStream(bais);
Data d = (Data) ois.readObject();
System.out.println("d.x = " + d.x);
System.out.println("d.y = " + d.y);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
}
}
but I get error:
java.io.StreamCorruptedException: invalid stream header: EFBFBDEF
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298)
at p.Test.main(Test.java:37)
Why?
I expected:
d.x = 5
d.y = 3
how to do in good way?
Ah. I don't want to write this object in file. I have to have it in string format.
Use
ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray()); instead of
ByteArrayInputStream bais = new ByteArrayInputStream(out.getBytes());, since the String conversion corrupts the data (because of the encoding).
If you really need to store the result in a String, you need a safe way to store arbitrary bytes in a String. One way of doing that is to us Base64-encoding.
A totally different approach would have been to not use the standard Java serialization for this class, but create your own Data to/from String converter.
It is not entirely true to say that conversion to string corrupts the data. Conversion to "UTF-8" does because it is not bijective (some characters are 2 bytes but not all 2 bytes sequences are allowed as character sequences), while "ISO-8859-1" is bijective (1 character of a String is a byte and vice-versa).
Base64 encoding is not very space-efficient compared to this.
This is why I would recommend:
/**
* Serialize any object
* #param obj
* #return
*/
public static String serialize(Object obj) {
try {
ByteArrayOutputStream bo = new ByteArrayOutputStream();
ObjectOutputStream so = new ObjectOutputStream(bo);
so.writeObject(obj);
so.flush();
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
return bo.toString("ISO-8859-1");
} catch (Exception e) {
e.printStackTrace();
}
}
/**
* Deserialize any object
* #param str
* #param cls
* #return
*/
public static <T> T deserialize(String str, Class<T> cls) {
// deserialize the object
try {
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
byte b[] = str.getBytes("ISO-8859-1");
ByteArrayInputStream bi = new ByteArrayInputStream(b);
ObjectInputStream si = new ObjectInputStream(bi);
return cls.cast(si.readObject());
} catch (Exception e) {
e.printStackTrace();
}
}

GZIP decompress string and byte conversion

I have a problem in code:
private static String compress(String str)
{
String str1 = null;
ByteArrayOutputStream bos = null;
try
{
bos = new ByteArrayOutputStream();
BufferedOutputStream dest = null;
byte b[] = str.getBytes();
GZIPOutputStream gz = new GZIPOutputStream(bos,b.length);
gz.write(b,0,b.length);
bos.close();
gz.close();
}
catch(Exception e) {
System.out.println(e);
e.printStackTrace();
}
byte b1[] = bos.toByteArray();
return new String(b1);
}
private static String deCompress(String str)
{
String s1 = null;
try
{
byte b[] = str.getBytes();
InputStream bais = new ByteArrayInputStream(b);
GZIPInputStream gs = new GZIPInputStream(bais);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int numBytesRead = 0;
byte [] tempBytes = new byte[6000];
try
{
while ((numBytesRead = gs.read(tempBytes, 0, tempBytes.length)) != -1)
{
baos.write(tempBytes, 0, numBytesRead);
}
s1 = new String(baos.toByteArray());
s1= baos.toString();
}
catch(ZipException e)
{
e.printStackTrace();
}
}
catch(Exception e) {
e.printStackTrace();
}
return s1;
}
public String test() throws Exception
{
String str = "teststring";
String cmpr = compress(str);
String dcmpr = deCompress(cmpr);
}
This code throw java.io.IOException: unknown format (magic number ef1f)
GZIPInputStream gs = new GZIPInputStream(bais);
It turns out that when converting byte new String (b1) and the byte b [] = str.getBytes () bytes are "spoiled." At the output of the line we have already more bytes. If you avoid the conversion to a string and work on the line with bytes - everything works. Sorry for my English.
public String unZip(String zipped) throws DataFormatException, IOException {
byte[] bytes = zipped.getBytes("WINDOWS-1251");
Inflater decompressed = new Inflater();
decompressed.setInput(bytes);
byte[] result = new byte[100];
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
while (decompressed.inflate(result) != 0)
buffer.write(result);
decompressed.end();
return new String(buffer.toByteArray(), charset);
}
I'm use this function to decompress server responce. Thanks for help.
You have two problems:
You're using the default character encoding to convert the original string into bytes. That will vary by platform. It's better to specify an encoding - UTF-8 is usually a good idea.
You're trying to represent the opaque binary data of the result of the compression as a string by just calling the String(byte[]) constructor. That constructor is only meant for data which is encoded text... which this isn't. You should use base64 for this. There's a public domain base64 library which makes this easy. (Alternatively, don't convert the compressed data to text at all - just return a byte array.)
Fundamentally, you need to understand how different text and binary data are - when you want to convert between the two, you should do so carefully. If you want to represent "non text" binary data (i.e. bytes which aren't the direct result of encoding text) in a string you should use something like base64 or hex. When you want to encode a string as binary data (e.g. to write some text to disk) you should carefully consider which encoding to use. If another program is going to read your data, you need to work out what encoding it expects - if you have full control over it yourself, I'd usually go for UTF-8.
Additionally, the exception handling in your code is poor:
You should almost never catch Exception; catch more specific exceptions
You shouldn't just catch an exception and continue as if it had never happened. If you can't really handle the exception and still complete your method successfully, you should let the exception bubble up the stack (or possibly catch it and wrap it in a more appropriate exception type for your abstraction)
When you GZIP compress data, you always get binary data. This data cannot be converted into string as it is no valid character data (in any encoding).
So your compress method should return a byte array and your decompress method should take a byte array as its parameter.
Futhermore, I recommend you use an explicit encoding when you convert the string into a byte array before compression and when you turn the decompressed data into a string again.
When you GZIP compress data, you always get binary data. This data
cannot be converted into string as it is no valid character data (in
any encoding).
Codo is right, thanks a lot for enlightening me. I was trying to decompress a string (converted from the binary data). What I amended was using InflaterInputStream directly on the input stream returned by my http connection. (My app was retrieving a large JSON of strings)

why object to byte[] returns jumbled characters in java?

public static byte[] objectToByteArray(Object obj) throws IOException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
ObjectOutputStream objOut = null;
try {
objOut = new ObjectOutputStream(out);
objOut.writeObject(obj);
objOut.flush();
} finally {
objOut.close();
out.close();
}
return out.toByteArray();
}
Main Method :
public static void main(String[] args) throws IOException {
//
// System.out.println(getFileName("/home/local/ZOHOCORP/bharathi-1397/logs/bharathi-1397.csez.zohocorpin.com_2012_05_24.log",0));
try {
throw new IOException("Error in main method");
} catch (IOException io) {
System.out.println(new String(objectToByteArray(io.getMessage()),
"UTF-8"));
}
//
}
Output :
��
I want to convert Object to byte[] but why it returns ctrl characters like this . I didn't understand can you please help me .
Serialization converts an object into binary data. It's fundamentally not text data - just like an image file isn't text data.
Your code tries to interpret this opaque binary data as if it were UTF-8-encoded text data. It's not, so no wonder you're seeing garbage. If you opened up an image file in a text editor, you'd see similar non-useful text. You can try to interpret the data as text (as you are doing) but you won't get anything useful out.
If you want to represent opaque binary data as text in a printable, reversible manner you should use base64 or hex. There are lots of libraries available to convert to base64, including this public domain one.
Java store the object in its own binary format when you use ObjectOutputStream. You have to use ObjectInputStream to decode data into proper Object before use.
java api doc:
public final void writeObject(Object obj) throws IOException;
Write the specified object to the ObjectOutputStream. The class of the object, the signature of the class, and the values of the non-transient and non-static fields of the class and all of its supertypes are written. Default serialization for a class can be overridden using the writeObject and the readObject methods. Objects referenced by this object are written transitively so that a complete equivalent graph of objects can be reconstructed by an ObjectInputStream.
http://docs.oracle.com/javase/1.4.2/docs/api/java/io/ObjectOutputStream.html#writeObject(java.lang.Object)
You can find the below approach for writing and reading the byte values from and to the file.
Convert to Hex String and then into Base64 format
To Read the file
Base64 Format
byte [] encDataByteArr = hexToBytes(encData);
byte [] dataBack = Base64.decodeBase64(encDataByteArr);
To Write in the file
byte [] data = Base64.encodeBase64(encDataByteArr);
returnValue = bytesToHex(data);
Some Utility Methods for that is as below
public static synchronized String bytesToHex(byte [] buf){
StringBuffer strbuf = new StringBuffer(buf.length * 2);
int i;
for (i = 0; i < buf.length; i++) {
if (((int) buf[i] & 0xff) < 0x10){
strbuf.append("0");
}
strbuf.append(Long.toString((int) buf[i] & 0xff, 16));
}
return strbuf.toString();
}
public synchronized static byte[] hexToBytes(String hexString) {
byte[] b = new BigInteger(hexString,16).toByteArray();
return b;
}

Converting part of a ByteBuffer to a String

I have a ByteBuffer containing bytes that were derived by String.getBytes(charsetName), where "containing" means that the string comprises the entire sequence of bytes between the ByteBuffer's position() and limit().
What's the best way for me to get the string back? (assuming I know the encoding charset) Is there anything better than the following (which seems a little clunky)
byte[] ba = new byte[bbuf.remaining()];
bbuf.get(ba);
try {
String s = new String(ba, charsetName);
}
catch (UnsupportedEncodingException e) {
/* take appropriate action */
}
String s = Charset.forName(charsetName).decode(bbuf).toString();

Categories