Redis value as byte[] vs plain string - java

I am using Redis as a centralized cache for a distributed system. Currently I use Jedis to connect to a Redis cluster, and I store values as byte[] instead of String. My question is: does storing a plain string versus a byte[] have any impact on reading the data? In my application I serialize my Java POJO, convert it to byte[], and then store it. Alternatively, I could convert it to JSON and store that, so that when I read it back from Redis I can use the object directly instead of deserializing it. I have tried both, but the only difference I can see is the extra deserialization step.

In Redis, everything is a byte[]. What Redis calls strings are actually byte[] in programming languages.
When you store JSON, you still need to serialize it to byte[] before saving it to Redis, and do the reverse when you read it back. This is no different from serializing a Java object. In other words, you always have to pay the cost of serialization and deserialization.
That said, different libraries have different serialization costs. Java serialization is known to be slow and inefficient. JSON is likely to be better than Java serialization, but it wastes memory in Redis because it is text-based. You can choose a better serialization library.
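To make the size difference concrete, here is a minimal sketch that turns the same object into bytes both ways and compares the lengths. The `User` class and the hand-built JSON are assumptions made for illustration only (a real application would use a JSON library), and the exact byte counts depend on the class and field names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public final class PayloadSizeDemo {

    // Hypothetical POJO, used only for this illustration.
    static final class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Bytes produced by built-in Java serialization.
    static byte[] javaSerialize(User user) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(user);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hand-built JSON to keep the sketch dependency-free (no escaping is done).
    // Either way, what reaches Redis is the UTF-8 byte[] of the text.
    static byte[] jsonSerialize(User user) {
        String json = "{\"name\":\"" + user.name + "\",\"age\":" + user.age + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        User user = new User("Ann", 30);
        System.out.println("java serialization: " + javaSerialize(user).length + " bytes");
        System.out.println("json:               " + jsonSerialize(user).length + " bytes");
    }
}
```

For small objects the Java-serialized form is usually the larger one, because it carries the class name and field descriptors along with the data.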
Kryo is a faster replacement for the Java serializer. MessagePack is like JSON but faster. Protocol Buffers / FlatBuffers are even better, but require you to declare a schema upfront. There are other serialization formats as well, each with its own tradeoffs.
The general recommendation - try to use the hash datatype. It is efficient, and lets you request specific fields instead of the whole object. Only if hash does not work for you, pick something else based on your needs.
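As a rough sketch of the hash approach: the object is flattened into a field/value map, which is the shape the Redis HSET command expects, and individual fields can later be fetched with HGET. The `Person` class and the field names below are hypothetical, and the actual client call is left out because it depends on the library you use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class PersonHashMapper {

    // Hypothetical POJO, used only for this illustration.
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Flatten the object into hash fields; every value becomes a string.
    static Map<String, String> toHash(Person person) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", person.name);
        fields.put("age", Integer.toString(person.age));
        return fields;
    }

    // Rebuild the object from the fields read back from the hash.
    static Person fromHash(Map<String, String> fields) {
        return new Person(fields.get("name"), Integer.parseInt(fields.get("age")));
    }

    public static void main(String[] args) {
        Map<String, String> fields = toHash(new Person("Ann", 30));
        // This map is what would be handed to the client's hash-set call.
        System.out.println(fields);
    }
}
```

The tradeoff is that nested objects do not map onto a flat hash directly, which is when one of the serialization formats above becomes the fallback.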
P.S. If you are into benchmarks, this website has several - https://github.com/eishay/jvm-serializers/wiki

Related

Java objects to Hbase

I'm currently using the Kite API + Avro to store Java objects in HBase, but due to various problems I'm looking for an alternative.
I've been reading about:
Phoenix
The native HBase API
Are there any other alternatives?
The idea is to save Java objects to HBase, load them back, and use them in a Java application.
If you're storing your objects in the Value portion of the KeyValue pair, then it's really just an array / sequence of bytes (i.e. in the code for KeyValue class there is a getValue method which returns a byte array).
At this point, you're down to object serialization and there are a host of libraries you can use with various ease of use, performance characteristics, and details of implementation. Avro is one type of serialization library which stores the schema with each record, but you could in theory use:
Standard Java serialization (implement Serializable)
Kryo
Protobuf
Just to name a few. You may want to investigate the various strengths of each library & its tradeoffs and balance that against the type of objects you plan to store (i.e. are they all effectively the same type of object or do they vary widely in type? Are they going to be long lived i.e. years and have the expectation of schema evolution & backwards compatibility etc.)
Phoenix is a JDBC API for HBase. It handles most SQL types (except intervals), and you can store arbitrary Java objects using the binary data type. But if you are only storing binary data, you could just as easily stick with HBase. If you can coerce your data into standard SQL types, Phoenix may be a good option.
If you want to stick with the Hadoop/HBase code you can have your complex class implement org.apache.hadoop.io.Writable.
import org.apache.hadoop.io.WritableUtils;

// Some complex Java object
// that implements org.apache.hadoop.io.Writable
SomeObject myObject = new SomeObject();
// Write the object to a byte array
// for storage in HBase
byte[] byteArr = WritableUtils.toByteArray(myObject);

How to convert thrift objects to readable string and convert it back?

Sometimes we need to create Thrift objects in unit tests. We can do it by creating the object manually in Java code, like:
MyObj myObj = new MyObj();
myObj.setName("???");
myObj.setAge(111);
But that is not convenient. I'm looking for a way to create objects from some readable text.
We can convert Thrift objects to JSON with TSimpleJSONProtocol and get a very readable JSON string, like:
{ "name": "???", "age": 111 }
But the problem is that TSimpleJSONProtocol is write-only: Thrift can't read the output back to construct an instance of MyObj.
There is also TJSONProtocol, which supports both serialization and deserialization, but the generated JSON is not readable. It uses a very simplified JSON format in which most of the field names are missing, so it is not convenient to write by hand in tests.
Is there any way to convert Thrift objects to a readable string and also convert it back? If TSimpleJSONProtocol supported reading, it would be exactly what I'm looking for.
The main goal of Thrift is to provide efficient serialization and RPC mechanisms. What you want is something that is - at least partially - contrary to that. Human-readable data structures and machine processing efficiency are to a good extent conflicting goals, and Thrift favors the latter over the former.
You already found out about TSimpleJSONProtocol and TJSONProtocol and about their pros and cons, so there is not much to add. The only thing left to say is this: the protocol/transport stack of Thrift is simple enough.
This simplicity makes it possible to add another protocol based on your specific needs without much or overly complicated work. One could probably even write an XML protocol (if anyone really wants such bloatware) in short time.
The only caveat, especially vis-à-vis your specific case, is the fact that Thrift needs the field ID to deserialize the data. So you either need to store them in the data, or you need some other mechanism which is able to retrieve that field ID based on the field and structure names.

Java serialization to string

I have the following declaration of the static type Object:
Integer typeId;
//Obtaining typeId
Object containerObject = ContainerObjectFactory.create(typeId);
The factory can produce different types of container objects, e.g. Date, Integer, BigDecimal and so forth.
Now, after creating the containerObject, I need to serialize it to a String and store it in a database with Hibernate. I'm not going to show the object-relational mapping because it doesn't relate to the question directly.
What I want to do is serialize the containerObject depending on its runtime type and deserialize it later to the type it was serialized from. Is that possible at all? Could I use XML serialization for this purpose?
There are numerous alternatives, and your question is quite broad. You could:
use the native Java serialisation, which is binary, and then Base64 encode it
use an XML serialisation library, such as XStream
use a JSON serialisation library, such as Gson
One key feature you mention is that the object type needs to be embedded in the serialised data. Native Java serialisation embeds the type in the data so this is a good candidate. This is a double-edged sword however, as this makes the data brittle - if at some time in the future you changed the fully qualified class name then you'd no longer be able to deserialise the object.
Gson, on the other hand, doesn't embed the type information, and so you'd have to store both the JSON and the object type in order to deserialise the object.
XML and JSON have advantages that they're a textual format, so even without deserialising it, you can use your human eyes to see what it is. Base64 encoded Java serialisation however, is an unintelligible blob of characters.
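A minimal sketch of the first option (native Java serialization followed by Base64), using only the JDK. The class and method names are made up for illustration. Because the stream embeds the runtime type, `decode` returns an object of the original class without being told what it is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

public final class Base64ObjectCodec {

    // Serialize the object and encode the bytes as Base64 text for a String column.
    static String encode(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Decode the Base64 text and rebuild the object with its original runtime type.
    // Only use this on data you wrote yourself: deserializing untrusted input is unsafe.
    static Object decode(String text) {
        byte[] bytes = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Cannot deserialize value", e);
        }
    }

    public static void main(String[] args) {
        String stored = encode(Integer.valueOf(12345));
        Object restored = decode(stored);
        System.out.println(restored.getClass().getName() + " -> " + restored);
    }
}
```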
There are multiple ways, but you need a custom serialization scheme, e.g.:
D|25.01.2015
I|12345
BD|123456.123452436
where the first part of the String represents the type and the second part represents the data. You can even use some binary serialization scheme for this.
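A sketch of how such a scheme could look for the three types above. The tags D, I and BD and the dd.MM.yyyy date pattern are taken from the example; the class and method names are assumptions. Note that with this pattern a Date keeps only its day, not its time of day.

```java
import java.math.BigDecimal;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public final class TaggedStringCodec {

    private static final String DATE_PATTERN = "dd.MM.yyyy";

    // Prefix the data with a short tag that identifies the runtime type.
    static String encode(Object value) {
        if (value instanceof Date) {
            return "D|" + new SimpleDateFormat(DATE_PATTERN).format((Date) value);
        }
        if (value instanceof Integer) {
            return "I|" + value;
        }
        if (value instanceof BigDecimal) {
            return "BD|" + value;
        }
        throw new IllegalArgumentException("Unsupported type: " + value.getClass());
    }

    // Read the tag, then parse the rest of the string as that type.
    static Object decode(String text) {
        int separator = text.indexOf('|');
        if (separator < 0) {
            throw new IllegalArgumentException("Missing type tag: " + text);
        }
        String tag = text.substring(0, separator);
        String data = text.substring(separator + 1);
        switch (tag) {
            case "D":
                try {
                    return new SimpleDateFormat(DATE_PATTERN).parse(data);
                } catch (ParseException e) {
                    throw new IllegalArgumentException("Bad date: " + data, e);
                }
            case "I":
                return Integer.valueOf(data);
            case "BD":
                return new BigDecimal(data);
            default:
                throw new IllegalArgumentException("Unknown type tag: " + tag);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode(decode("BD|123456.123452436")));
    }
}
```

Supporting a new container type means adding one branch to each method, and the stored tag stays valid even if the Java class is later renamed.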

Redis - Approach for Storing Data Into Redis :: JSON String OR Serialized pojo

I have a class like below:
public class Person
{
public String name;
public String age;
}
I am a bit confused about the approach for saving a Map of Persons into Redis:
Should I go for the Java serialized/deserialized object approach, or should I convert to JSON before storing and back again when reading?
Any thoughts on the points below:
Cost of serialization and deserialization vs. cost of mapping to Java and to JSON
Memory requirement in Redis for JSON vs. a serialized object
Compression: stream vs. data
Which compression should we go for?
Data compression seems a bit difficult (and not very beneficial), though, as we are using a Redis hash
Some of the assumptions are:
The POJO contains many instance variables
We will be using a Redis hash to store the object
You should consider using MessagePack, as it is fully compatible with Redis and Lua and is much more compact than JSON: http://msgpack.org/
It requires some Lua code to pack and unpack, but the cost should be small. Here is an example: http://gists.fritzy.io/2013/11/06/store-json-as-msgpack
There is a small benchmark which lacks data: https://gist.github.com/muga/1119814
Still, it should be a great option for you, as you can use it from different languages, it is fully supported in Redis, and it is based on JSON.
The answer is that you should measure it for your use cases and environment. I would try JSON first, as it's more versatile and less problematic, i.e. easier to debug and to restore corrupted data.
Performance. JSON serialization is fast, so in many scenarios it won't be your bottleneck. Most probably the bottleneck is disk or network IO: java serialization benchmarking. Avoid the default Java serialization, as it is slow. Kryo is an option for binary output. If you need a binary format that works across multiple platforms, consider the DB's internal format or e.g. Google Protocol Buffers.
Compression. Google uses Snappy for less CPU-demanding compression. Snappy is also used in Cassandra, Hadoop and Hypertable. Some benchmarks for JVM compressors: Compression test using Calgary corpus data set.

Storing Serializable Objects in the Database

I'm writing an application which needs to write an object into database.
For simplicity, I want to serialize the object.
But ObjectOutputStream, which is needed for this, has only one constructor, and it takes any subclass of OutputStream as its parameter.
What parameter should be passed to it?
You can pass a ByteArrayOutputStream and then store the resulting stream.toByteArray() in the database as a BLOB.
Make sure you specify a serialVersionUID for the class, because otherwise you'll have a hard time when you add or remove a field.
Also consider the xml version for object serialization - XMLEncoder, if you need a bit more human-readable data.
And ultimately, you may want to translate your object model to the relational model via an ORM framework. JPA (Hibernate/EclipseLink/OpenJPA) provide object-relational mapping so that you work with objects, but their fields and relations are persisted in a RDBMS.
Using ByteArrayOutputStream should be a simple enough way to convert to a byte[] (call toByteArray after you've flushed). Alternatively there is Blob.setBinaryStream (which actually returns an OutputStream).
You might also want to reconsider using the database as a database...
E.g. create a ByteArrayOutputStream and pass it to the ObjectOutputStream constructor.
One thing to add to this: Java serialization is a good general-purpose tool. However, it can be a bit verbose. You might want to try gzipping the serialized data. You can do this by putting a GZIP stream between the object stream and the byte stream. This will use a small amount of extra CPU, but that is often a worthwhile tradeoff against shipping the extra bytes over the network and storing them in a DB.
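A sketch of that stream stacking, using only JDK classes. The class and method names are assumptions. The resulting byte[] is what you would store in the BLOB column, for example via PreparedStatement.setBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class GzipObjectCodec {

    // Object stream -> GZIP stream -> byte stream.
    static byte[] toCompressedBytes(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the object stream also finishes the GZIP stream,
        // so the buffer now holds a complete compressed payload.
        return bytes.toByteArray();
    }

    // Reverse order when reading: byte stream -> GZIP stream -> object stream.
    static Object fromCompressedBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Cannot deserialize value", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> names = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            names.add("customer-" + i);
        }
        byte[] blob = toCompressedBytes(names);
        System.out.println("compressed size: " + blob.length + " bytes");
        System.out.println("round trip ok:   " + names.equals(fromCompressedBytes(blob)));
    }
}
```

Whether the compression pays off depends on the data, so it is worth measuring the sizes with and without the GZIP layer for your own objects.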
