Just as the title says, I can't tell the difference between getBytes() and the serialization mechanism for a String. Below is a test comparing the two:
public void testUTF() {
byte[] data = SerializeUtil.serUTFString(str);
System.out.println(data.length);
System.out.println(str.getBytes().length);
}
Here is SerializeUtil:
public static byte[] serUTFString(String data) {
byte[] result = null;
ObjectOutputStream oos = null;
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
try {
oos = new ObjectOutputStream(byteArray);
try {
oos.writeUTF(data);
oos.flush();
result = byteArray.toByteArray();
} finally {
oos.close();
}
} catch (Exception e) {
e.printStackTrace();
}
return result;
}
When I store str in Redis, both work correctly, but getBytes() seems more efficient. Since both return a byte array from a String, what's the difference? Is serialization necessary?
String.getBytes() returns a byte array representing the string's characters in the platform default encoding. ObjectOutputStream.writeUTF writes a two-byte length followed by the bytes in modified UTF-8 format; see the java.io.DataOutput API.
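To make the size difference concrete, here is a minimal sketch comparing the three outputs. The class and method names are made up for illustration, and it assumes an ASCII-only input, where modified UTF-8 and standard UTF-8 produce identical bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class WriteUtfVsGetBytes {
    // Raw UTF-8 bytes of the string and nothing else.
    static byte[] plain(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // DataOutput.writeUTF: a 2-byte length prefix followed by modified UTF-8.
    static byte[] dataUtf(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // ObjectOutputStream.writeUTF: the same payload plus the
    // serialization stream header and block-data framing.
    static byte[] objectUtf(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(plain(s).length);     // 5
        System.out.println(dataUtf(s).length);   // 7 (2-byte length + 5 bytes)
        System.out.println(objectUtf(s).length); // larger still, because of the stream framing
    }
}
```

If the value stored in Redis only ever needs to be read back as a String, the plain getBytes(StandardCharsets.UTF_8) form is probably enough; the framing matters only when the reader is an ObjectInputStream.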
I have a String that is encoded in Base64. I need to decode this string and create a truststore file from it, but when I do that, the final file is not valid. Here is my code:
public static void buildFile() {
String exampleofencoded = "asdfasdfasdfadfa";
File file = new File("folder/file.jks");
try (FileOutputStream fos = new FileOutputStream(file);
BufferedOutputStream bos = new BufferedOutputStream(fos);
DataOutputStream dos = new DataOutputStream(bos))
{
Base64.Decoder decoder = Base64.getDecoder();
String decodedString = new String(decoder.decode(exampleofencoded));
dos.writeBytes(decodedString);
}
catch (IOException e) {
System.out.println("Error creating file");
}
catch(NullPointerException e) {
System.out.println(e.getMessage());
}
}
The problem is two-fold.
You're converting a byte[] array to String, which is a lossy operation for actual binary data for most character sets (except maybe iso-8859-1).
You're using DataOutputStream, which is not a generic output stream, but intended for a specific serialization format of primitive types. And specifically its writeBytes method comes with an important caveat ("Each character in the string is written out, in sequence, by discarding its high eight bits."), which is one more reason why only using iso-8859-1 will likely work.
Instead, write the byte array directly to the file:
public static void buildFile() {
String exampleofencoded = "asdfasdfasdfadfa";
File file = new File("folder/file.jks");
try (OutputStream fos = Files.newOutputStream(file.toPath())) {
Base64.Decoder decoder = Base64.getDecoder();
byte[] decodedbytes = decoder.decode(exampleofencoded);
fos.write(decodedbytes);
} catch (IOException e) {
System.out.println("Error creating file");
}
}
As an aside, you shouldn't catch NullPointerException in your code; it is almost always a problem that can be prevented by careful programming and/or validation of inputs. I would usually also advise against catching the IOException here and only printing a message. It is probably better to propagate that exception as well, and let the caller handle it.
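A minimal sketch of that advice, with the IOException propagated to the caller, might look like the following. The class name, method names, and sample bytes are hypothetical, and it writes to a temporary file rather than the asker's folder/file.jks path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

class DecodeToFile {
    // Decodes Base64 text straight to bytes; no String is created from the binary data.
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    // Writes the decoded bytes to the target path and lets the caller handle IOException.
    static void writeDecoded(String base64, Path target) throws IOException {
        Files.write(target, decode(base64));
    }

    public static void main(String[] args) throws IOException {
        // Sample binary payload (hypothetical), including bytes that are not valid text.
        byte[] binary = {(byte) 0xFE, (byte) 0xED, (byte) 0xFE, (byte) 0xED, 0, 0, 0, 2};
        String base64 = Base64.getEncoder().encodeToString(binary);

        Path target = Files.createTempFile("truststore", ".jks");
        writeDecoded(base64, target);

        // The file holds exactly the original bytes.
        System.out.println(Arrays.equals(binary, Files.readAllBytes(target))); // true
        Files.delete(target);
    }
}
```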
I have the following code to zip and unzip the String:
public static void main(String[] args) {
// TODO code application logic here
String Source = "hello world";
byte[] a = ZIP(Source);
System.out.format("answer:");
System.out.format(a.toString());
System.out.format("\n");
byte[] Source2 = a.toString().getBytes();
System.out.println("\nsource 2:" + Source2.toString() + "\n");
String b = unZIP(Source2);
System.out.println("\nunzip answer:");
System.out.format(b);
System.out.format("\n");
}
public static byte[] ZIP(String source) {
ByteArrayOutputStream bos= new ByteArrayOutputStream(source.length()* 4);
try {
GZIPOutputStream outZip= new GZIPOutputStream(bos);
outZip.write(source.getBytes());
outZip.flush();
outZip.close();
} catch (Exception Ex) {
}
return bos.toByteArray();
}
public static String unZIP(byte[] Source) {
ByteArrayInputStream bins= new ByteArrayInputStream(Source);
byte[] buf= new byte[2048];
StringBuffer rString= new StringBuffer("");
int len;
try {
GZIPInputStream zipit= new GZIPInputStream(bins);
while ((len = zipit.read(buf)) > 0) {
rString.append(new String(buf).substring(0, len));
}
return rString.toString();
} catch (Exception Ex) {
return "";
}
}
When "hello world" has been zipped, it becomes [B@7bdecdec as a byte[], which I convert into a String and display on the screen. However, if I try to convert the string back into a byte[] with the following code:
byte[] Source2 = a.toString().getBytes();
the value becomes [B@60a1807c instead of [B@7bdecdec. Does anyone know how I can convert the String (a byte[] value that has been converted into a String) back into a byte[] in Java?
Why do byte[] Source2 = a.toString().getBytes(); ?
It looks like a double conversion: you convert a byte[] to a String and then back to a byte[].
The real conversion of a byte[] to a String is new String(byte[]), assuming the same charset on both sides.
Source2 should be an exact copy of a, hence you should just do byte[] Source2 = a;
Your unZIP is wrong because it converts the bytes back to a string that might be in some other encoding (let's say UTF-8):
public static String unZIP(byte[] source) throws IOException {
ByteArrayOutputStream bos = new ByteArrayOutputStream(source.length*2);
try (ByteArrayInputStream in = new ByteArrayInputStream(source);
GZIPInputStream zis = new GZIPInputStream(in)) {
byte[] buffer = new byte[4096];
// read() returns -1 at end of stream
for (int n; (n = zis.read(buffer)) != -1; ) {
bos.write(buffer, 0, n);
}
}
return new String(bos.toByteArray(), StandardCharsets.UTF_8);
}
This one, not tested, will:
Store the bytes from the gzip stream in a ByteArrayOutputStream
Close the gzip stream and the ByteArrayInputStream using try-with-resources
Convert the whole result into a String using UTF-8 (you should always specify an encoding, and except in rare cases UTF-8 is the way to go).
You must not use StringBuffer here, for two reasons:
The most important one: decoding each buffer chunk separately will not behave well with multi-byte encodings such as UTF-8 or UTF-16, because a character can be split across two reads.
And second, StringBuffer is synchronized: you should use StringBuilder whenever possible and whenever it is appropriate (eg: not here!). StringBuffer should be reserved for cases where you share the StringBuffer between several threads; otherwise the synchronization is useless.
With those changes, you will also need to change ZIP, as per David Conrad's comment and because unZIP uses UTF-8:
public static byte[] ZIP(String source) throws IOException {
ByteArrayOutputStream bos = new ByteArrayOutputStream(source.length()* 4);
try (GZIPOutputStream zip = new GZIPOutputStream(bos)) {
zip.write(source.getBytes(StandardCharsets.UTF_8));
}
return bos.toByteArray();
}
As for main, printing a byte[] results in the default toString() output: the type and identity hash code (for example [B@7bdecdec), not the array contents.
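Putting the pieces together, a minimal round-trip sketch could look like this. The class name is hypothetical, the helpers wrap IOException in UncheckedIOException purely to keep the example short, and Arrays.toString is used to show the actual array contents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
    // Compresses the UTF-8 bytes of the string with gzip.
    static byte[] zip(String source) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(source.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompresses all bytes first, then decodes them as UTF-8 in one step.
    static String unzip(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                bos.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] a = zip("hello world");
        // Prints the byte values rather than the default [B@... identity string.
        System.out.println(Arrays.toString(a));
        // Pass the byte[] itself to unzip; no String conversion in between.
        System.out.println(unzip(a)); // hello world
    }
}
```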
I need to serialize objects into a String and deserialize them again.
I read a suggestion on Stack Overflow and wrote this code:
class Data implements Serializable {
int x = 5;
int y = 3;
}
public class Test {
public static void main(String[] args) {
Data data = new Data();
String out;
try {
// write
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeObject(data);
out = new String(baos.toByteArray());
System.out.println(out);
// read ==========================================
ByteArrayInputStream bais = new ByteArrayInputStream(out.getBytes());
ObjectInputStream ois = new ObjectInputStream(bais);
Data d = (Data) ois.readObject();
System.out.println("d.x = " + d.x);
System.out.println("d.y = " + d.y);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
}
}
but I get this error:
java.io.StreamCorruptedException: invalid stream header: EFBFBDEF
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298)
at p.Test.main(Test.java:37)
Why?
I expected:
d.x = 5
d.y = 3
What is the right way to do this?
Note that I don't want to write this object to a file. I have to have it in String format.
Use
ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray()); instead of
ByteArrayInputStream bais = new ByteArrayInputStream(out.getBytes());, since the String conversion corrupts the data (because of the encoding).
If you really need to store the result in a String, you need a safe way to store arbitrary bytes in a String. One way of doing that is to use Base64 encoding.
A totally different approach would have been to not use the standard Java serialization for this class, but create your own Data to/from String converter.
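As a rough sketch of the Base64 route, assuming Java 8 or later for java.util.Base64, the helpers below wrap standard serialization. The class and method names are illustrative only, and the nested Data class mirrors the one from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

class Base64Serialization {
    // Mirrors the Data class from the question.
    static class Data implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 5;
        int y = 3;
    }

    // Serializes the object and encodes the raw bytes as Base64 text.
    static String serialize(Serializable obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Decodes the Base64 text back to bytes and deserializes the object.
    static Object deserialize(String text) {
        byte[] bytes = Base64.getDecoder().decode(text);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String out = serialize(new Data());
        Data d = (Data) deserialize(out);
        System.out.println("d.x = " + d.x); // d.x = 5
        System.out.println("d.y = " + d.y); // d.y = 3
    }
}
```

The resulting String contains only ASCII characters, so it survives being stored or transmitted in any common text encoding, at the cost of roughly a third more space than the raw bytes.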
It is not entirely true to say that conversion to a String corrupts the data. Decoding as UTF-8 does, because it is not bijective (some characters take 2 or more bytes, but not every byte sequence is a valid character sequence), while ISO-8859-1 is bijective (each char of the String corresponds to exactly one byte and vice versa).
Base64 encoding is not very space-efficient compared to this.
This is why I would recommend:
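/**
* Serialize any object
* @param obj the object to serialize (must implement Serializable)
* @return the serialized bytes as an ISO-8859-1 string
*/
public static String serialize(Object obj) {
try {
ByteArrayOutputStream bo = new ByteArrayOutputStream();
ObjectOutputStream so = new ObjectOutputStream(bo);
so.writeObject(obj);
so.flush();
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
return bo.toString("ISO-8859-1");
} catch (Exception e) {
// Rethrow so the method never falls off the end without returning a value
throw new IllegalStateException("Serialization failed", e);
}
}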
/**
* Deserialize any object
* @param str the ISO-8859-1 string produced by serialize
* @param cls the expected class of the result
* @return the deserialized object
*/
public static <T> T deserialize(String str, Class<T> cls) {
// deserialize the object
try {
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
byte[] b = str.getBytes("ISO-8859-1");
ByteArrayInputStream bi = new ByteArrayInputStream(b);
ObjectInputStream si = new ObjectInputStream(bi);
return cls.cast(si.readObject());
} catch (Exception e) {
// Rethrow so the method never falls off the end without returning a value
throw new IllegalStateException("Deserialization failed", e);
}
}
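The bijection claim is easy to check with a small sketch: push every possible byte value through a decode/encode round trip in each charset. The class and method names here are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    // Returns true if decoding the bytes to a String and encoding them
    // again with the same charset reproduces the original bytes.
    static boolean roundTrips(byte[] data, Charset cs) {
        return Arrays.equals(data, new String(data, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        // Every possible byte value, 0x00 through 0xFF.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        System.out.println(roundTrips(all, StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTrips(all, StandardCharsets.UTF_8));      // false
    }
}
```

The ISO-8859-1 trick only holds while the String stays in memory or is always written and read as ISO-8859-1; if it is later stored or sent in another encoding, the bytes can still be altered.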
I have some strings that are roughly 10K characters each. There is plenty of repetition in them. They are serialized JSON objects. I'd like to easily compress them into a byte array, and uncompress them from a byte array.
How can I most easily do this? I'm looking for methods so I can do the following:
String original = "....long string here with 10K characters...";
byte[] compressed = StringCompressor.compress(original);
String decompressed = StringCompressor.decompress(compressed);
assert original.equals(decompressed);
You can try
enum StringCompressor {
;
public static byte[] compress(String text) {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
OutputStream out = new DeflaterOutputStream(baos);
out.write(text.getBytes("UTF-8"));
out.close();
} catch (IOException e) {
throw new AssertionError(e);
}
return baos.toByteArray();
}
public static String decompress(byte[] bytes) {
InputStream in = new InflaterInputStream(new ByteArrayInputStream(bytes));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
byte[] buffer = new byte[8192];
int len;
while((len = in.read(buffer))>0)
baos.write(buffer, 0, len);
return new String(baos.toByteArray(), "UTF-8");
} catch (IOException e) {
throw new AssertionError(e);
}
}
}
Peter Lawrey's answer can be improved a bit using this less complex code for the decompress function
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
OutputStream out = new InflaterOutputStream(baos);
out.write(bytes);
out.close();
return new String(baos.toByteArray(), "UTF-8");
} catch (IOException e) {
throw new AssertionError(e);
}
I made a library to solve the problem of compressing generic Strings (especially short ones).
It tries to compress the String using various algorithms (plain UTF-8, 5-bit encoding for Latin letters, Huffman encoding, gzip for long Strings) and chooses the one with the shortest result (in the worst case, it will choose the UTF-8 encoding, so you never risk losing space).
I hope it may be useful. Here's the link:
https://github.com/lithedream/lithestring
EDIT: I realized that your Strings are always "long". My library defaults to gzip for those sizes, so I fear I cannot do better for you.
I'm trying to Serialize an object to a Byte array, for storage in a String. I cannot for the life of me figure out where I'm going wrong here.
String store = null;
// Writing
try {
String hi = "Hi there world!";
ByteArrayOutputStream out = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(out);
oos.writeObject(hi);
oos.close();
store = out.toString("UTF-8");
} catch(Exception e) {
System.out.println(e);
}
// Reading
try {
ByteArrayInputStream in = new ByteArrayInputStream(store.getBytes("UTF-8"));
ObjectInputStream ois = new ObjectInputStream(in);
String data = (String) ois.readObject();
} catch(Exception e) {
System.out.println(e);
}
I keep getting java.io.StreamCorruptedException and I don't know why :(
store = out.toString("UTF-8");
The data in out is not UTF-8 formatted; in fact, it's not a String at all. It's a serialized instance of a String. You can call toString on it, just because you can call toString on any object.
You'd want to do
byte[] data = out.toByteArray();
and then pass data into the ByteArrayInputStream constructor.
Unfortunately, Java strings aren't an array of bytes (as in C), but rather a sequence of chars (16-bit values). Also, all strings are Unicode in Java.
My best advice is: use Base64 encoding/decoding if you need to store binary data in strings. Apache Commons has some great classes for this task, and you can find more info at:
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html
If you want to save the byte array to a string, you need to convert it to a Base64 string, not to a UTF-8 string. For that purpose you can use org.apache.commons.codec.binary.Base64
Note that the "ISO-8859-1" encoding preserves a byte array, while "UTF-8" does not (some byte arrays lead to invalid Strings in this encoding).
I would recommend the following code:
/**
* Serialize any object
* @param obj the object to serialize (must implement Serializable)
* @return the serialized bytes as an ISO-8859-1 string
*/
public static String serialize(Object obj) {
try {
ByteArrayOutputStream bo = new ByteArrayOutputStream();
ObjectOutputStream so = new ObjectOutputStream(bo);
so.writeObject(obj);
so.flush();
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
return bo.toString("ISO-8859-1");
} catch (Exception e) {
// Rethrow so the method never falls off the end without returning a value
throw new IllegalStateException("Serialization failed", e);
}
}
/**
* Deserialize any object
* @param str the ISO-8859-1 string produced by serialize
* @param cls the expected class of the result
* @return the deserialized object
*/
public static <T> T deserialize(String str, Class<T> cls) {
// deserialize the object
try {
// This encoding induces a bijection between byte[] and String (unlike UTF-8)
byte[] b = str.getBytes("ISO-8859-1");
ByteArrayInputStream bi = new ByteArrayInputStream(b);
ObjectInputStream si = new ObjectInputStream(bi);
return cls.cast(si.readObject());
} catch (Exception e) {
// Rethrow so the method never falls off the end without returning a value
throw new IllegalStateException("Deserialization failed", e);
}
}
The problem is that the initial string when serialized is a serialized String. That's not the same as chopping the string into an array of its constituent characters.