I have some data as bytes that I want to store in Redis, but as I understand it Redis only accepts binary-safe strings, and my data contains bytes that are not binary safe. How can I convert these bytes into a binary-safe string so that I can save them to Redis?
Base64 works for me, but it makes the data larger. Any better ideas?
UPDATE: I want to store a serialized protobuf object in Redis. The serialized data contains '\x00', so when I read the data back from Redis I cannot deserialize it into an object. I then tried Base64, which works fine but increases the size.
So I want to figure out how to store binary data (a serialized protobuf object) in Redis safely and at a smaller size.
You could try ISO-8859-1 encoding. This uses a one to one mapping between bytes and chars.
This could still result in corruption, depending on why Redis needs this "binary safe" string. You may have to use Base64.
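To illustrate the one-to-one mapping mentioned above, here is a minimal sketch (the class and method names are made up for this example). It only shows that the byte-to-char conversion itself is lossless. It does not show what a particular Redis client does with the resulting String: if the client writes it out as UTF-8, every byte above 0x7F takes two bytes on the wire.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of any Redis client.
final class Latin1RoundTrip {
    // Each byte becomes the char with the same numeric value (0..255).
    static String toText(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Each char in the range 0..255 becomes the byte with the same value.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value, including 0x00.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String text = toText(all);
        System.out.println(text.length());                     // 256
        System.out.println(Arrays.equals(all, toBytes(text))); // true
    }
}
```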
A safe way to store a binary object (such as a serialized protobuf object) as text is to Base64-encode it. Base64 has an overhead of about 33% but gives you the ability to safely convert from arbitrary binary data to text (such as for use in an XML file) and back.
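For what it's worth, the overhead is easy to check. This is a small sketch assuming Java 8 or later (for java.util.Base64); the class name and the sample bytes are invented for the example and are not real protobuf output.

```java
import java.util.Base64;

// Hypothetical helper showing the Base64 round trip and its size cost.
final class Base64Overhead {
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // 300 arbitrary bytes, including zero bytes.
        byte[] payload = new byte[300];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) (i * 7);
        }
        String text = encode(payload);
        // Every 3 input bytes become 4 output characters.
        System.out.println(payload.length + " bytes -> " + text.length() + " chars");
    }
}
```

With padding, n bytes always encode to 4 * ceil(n / 3) characters, which is where the roughly one-third growth comes from.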
I'm using Jackson streaming API to deserialise a quite large JSON (on the order of megabytes) into POJO. It's working fine, but I'd like to optimize it (both memory and processing wise, code runs on Android).
The main problem I'd like to optimize away is converting a large number of strings from UTF-8 to ISO-8859-1. Currently I use:
String result = new String(parser.getText().getBytes("ISO-8859-1"));
As I understand it, the parser first copies the token content into a String (getText()), then creates a byte array from it (getBytes()), which is then used to create the final String in the desired encoding. That is far too much allocation and copying.
The ideal solution would be for getText() to accept an encoding parameter and just give me the final string, but that's not the case.
Any other ideas, or flaws in my thinking?
You can use:
parser.getBinaryValue() (present in version 2.4 of Jackson)
or you can implement an ObjectCodec (with a readValue(...) method that knows how to convert bytes to a String in ISO-8859-1) and set it using parser.setCodec().
If you have control over the JSON generation, avoid using a charset other than UTF-8.
I'm planning to use serialization to save beans modified by the user, in order to store a history record. But ByteArrayOutputStream gives me a byte array (byte[]). If I convert it to a String and then convert it back, it can no longer be deserialized. How can this be explained?
Storing a byte array in Oracle is complicated. Is there any way to produce a String that can be deserialized? Thank you!
I'm Chinese, so please forgive my bad English. :)
Use ObjectOutputStream to serialize and ObjectInputStream to deserialize objects. The API documentation of those classes has examples that show how to use them to serialize and deserialize objects to and from a file.
Don't try to force a byte[] into a String. (Why would you want to put it in a String?). Serialized objects are binary data, not text characters that you would store in a String.
Brief Answer: encode the byte array as a Base64 string.
Base64 is a way of ensuring that binary data can be stored and transmitted as text - a reasonable explanation can be found on Wikipedia; if you don't encode the byte array, data can easily become "corrupted" by the use of different codepages etc. One thing to be aware of - base64 encoding will take more space than the byte array (so a byte array of 20 bytes may take around 30 characters to be stored)
There are many libraries that can encode/decode Base64; Apache Commons Codec is just one. See this question for more discussion on which library to use (there is a "private" one in the JDK, but use of it may be considered questionable by some developers).
So, in summary, to serialize an object into a string, use an ObjectOutputStream and a ByteArrayOutputStream to convert it to a byte array, and then use a Base64 encoder to translate that into a string.
To deserialize, use a Base64 decoder to convert the string back into a byte array, and then use a ByteArrayInputStream and an ObjectInputStream to read it back.
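The two steps in that summary could look roughly like this. It is only a sketch: it assumes Java 8 or later and uses java.util.Base64 rather than Apache Commons Codec, and the class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Hypothetical helper: object -> bytes -> Base64 text, and back.
final class SerializedText {
    static String toBase64(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into 'bytes'.
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    static Object fromBase64(String text) throws IOException, ClassNotFoundException {
        byte[] data = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

The resulting String contains only the characters A-Z, a-z, 0-9, '+', '/' and '=', so it can go into an ordinary text column.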
In byte[] all possible byte values can be used i.e. -128 to 127. However, in text these values and combination of values can be invalid and not convert to text as expected.
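A small sketch of what goes wrong, assuming the String conversion uses UTF-8 (other charsets fail in other ways). The class name is made up, and the sample bytes are the four-byte header that every Java serialization stream starts with.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: pushing raw bytes through a UTF-8 String and back.
final class LossyStringDemo {
    static byte[] throughUtf8String(byte[] data) {
        // Byte sequences that are not valid UTF-8 are replaced while decoding,
        // so the original values cannot be recovered afterwards.
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stream header written by ObjectOutputStream: magic 0xACED, version 5.
        byte[] header = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05};
        byte[] back = throughUtf8String(header);
        System.out.println(Arrays.equals(header, back)); // false
    }
}
```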
I suggest you consider a text-based serialization format like XML or JSON. These can be read and written as text safely. Text-based serialization can be read by a human and possibly edited as text if you want to correct a value.
EDIT: I would look at using XMLEncoder, which is crude but built in, or XStream for XML and JSON, which is more flexible and efficient (but requires a couple of extra libraries).
static final String SQL_SERIALIZE_OBJECT="insert into serialized_java_objects(serialized_id,object_name,serialized_object) values (ser_id_seq.nextval,?,?)";
// Serialize the object into an in-memory byte array
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeObject(objectToBeSerilize);
oos.flush(); // push any buffered data into baos before reading it
byte[] serializeBytes = baos.toByteArray();
// Store the class name and the raw bytes; setBytes avoids any String conversion
prepStatement = connection.prepareStatement(SQL_SERIALIZE_OBJECT);
prepStatement.setString(1, objectToBeSerilize.getClass().getName());
prepStatement.setBytes(2, serializeBytes);
prepStatement.executeUpdate();
Is it possible to attach file to JSONObject in Java (and to JSON at all)?
For example, can I attach bitmap to 'image' field?
{'user_id':'5', 'auth_token':'abc', 'image': ???}
You can convert the bitmap (or any binary data) to text using Base64, which makes it a String. I wouldn't use one of the Base64 classes in the JVM unless you are fully aware that they are for internal use (they may not be available on all JDKs and could change in future versions).
You could copy java.util.prefs.Base64 if you don't have one in a library already.
Refer to the answer by BalusC.
A bitmap is binary data. JSON is to be represented as character data. So you need to convert binary data to character data and vice versa without loss of information.
Yes, you can send an image by converting it into character data.
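As a rough sketch of the Base64 approach, using the field names from the question. It assumes a runtime that has java.util.Base64 (Java 8 or later; older Android versions would need android.util.Base64 or a library instead). The class name is invented, and the JSON is built by hand only to keep the example free of dependencies.

```java
import java.util.Base64;

// Hypothetical helper: embeds image bytes in a JSON string field.
final class ImageJsonSketch {
    static String buildPayload(String userId, String authToken, byte[] imageBytes) {
        // Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
        // none of which needs escaping inside a JSON string.
        String image = Base64.getEncoder().encodeToString(imageBytes);
        // Assumption: userId and authToken contain no characters that need
        // JSON escaping; a real JSON library would handle that properly.
        return "{\"user_id\":\"" + userId + "\","
                + "\"auth_token\":\"" + authToken + "\","
                + "\"image\":\"" + image + "\"}";
    }

    // The receiver reads the "image" field as a String and decodes it.
    static byte[] decodeImage(String base64Field) {
        return Base64.getDecoder().decode(base64Field);
    }
}
```

For example, buildPayload("5", "abc", bytes) with the four bytes 0x89 0x50 0x4E 0x47 gives {"user_id":"5","auth_token":"abc","image":"iVBORw=="}.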
I want to transfer some database data through a TCP socket. The data is formatted to JSON.
Since the database size might grow, I'm afraid that the String object maximum size will not be enough to store the entire data with JSON formatting.
I already had a problem transferring the data using the DataOutput method writeUTF().
What should I do? Maybe convert the database rows to CSV and transfer it through the Internet line by line? Or do I not need to worry about String limits and solve the writeUTF() problem by getting the bytes of the String, transferring them through the socket and rebuilding the String from the bytes at the destination?
Java strings can be extremely long, so you're unlikely to run into problems with the String type itself. If you convert the string to binary first, then use writeInt to write the number of bytes, followed by the bytes themselves, that should be fine. The problem with writeUTF is that it writes the length with writeShort, so it only handles up to 64K (65,535 bytes) of encoded data.
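The writeInt approach could be sketched like this. The class and method names are made up, and UTF-8 is an assumption; any charset works as long as both ends agree on it.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a 4-byte length prefix followed by the encoded text.
final class LengthPrefixedText {
    static void write(DataOutput out, String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length); // number of bytes, not number of chars
        out.write(bytes);
    }

    static String read(DataInput in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes); // blocks until exactly 'length' bytes have arrived
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

On a socket you would wrap the streams, for example new DataOutputStream(socket.getOutputStream()) on the sender and new DataInputStream(socket.getInputStream()) on the receiver.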
I read the following passage from Oracle:
Can I execute methods on compressed versions of my objects, for example isempty(zip(serial(x)))?
This is not really viable for arbitrary objects because of the encoding of objects. For a particular object (such as String) you can compare the resulting bit streams. The encoding is stable, in that every time the same object is encoded it is encoded to the same set of bits.
So I got this idea: say I have a char array about 4M long. Is it possible to compress it to a few hundred bytes using GZIPOutputStream, map the whole file into memory, and then do a random search on it by comparing bits? For example, if I am looking for the char sequence "abcd", could I somehow get the bit sequence of the compressed version of "abcd" and then just search the file for it? Thanks.
You cannot use GZIP or similar to do this, because the encoding of each byte changes as the stream is processed, i.e. the only way to determine what a byte means is to read all the previous bytes.
If you want to access the data randomly, you can break the String into smaller sections. That way you only need to decompress a relatively short section of data.
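One way to sketch the "smaller sections" idea is below. The class name and the fixed chunk size are made up, and it assumes the text has no surrogate pairs that straddle a chunk boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compresses text in independent fixed-size chunks.
final class ChunkedText {
    private final List<byte[]> compressedChunks = new ArrayList<>();
    private final int chunkSize;

    ChunkedText(String text, int chunkSize) throws IOException {
        this.chunkSize = chunkSize;
        for (int start = 0; start < text.length(); start += chunkSize) {
            int end = Math.min(text.length(), start + chunkSize);
            compressedChunks.add(gzip(text.substring(start, end)));
        }
    }

    // Decompresses only the chunk that contains the requested index.
    char charAt(int index) throws IOException {
        String chunk = gunzip(compressedChunks.get(index / chunkSize));
        return chunk.charAt(index % chunkSize);
    }

    private static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    private static String gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] block = new byte[4096];
            int n;
            while ((n = in.read(block)) != -1) {
                buffer.write(block, 0, n);
            }
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Smaller chunks mean less work per lookup but a worse compression ratio, since each chunk carries its own GZIP header and starts with no history. A search for something like "abcd" would also have to handle matches that span two chunks.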