Second-order unsafe deserialization - Java

I have a requirement to serialize a map to a string and store this string in the DB. Whenever I need this data, I read it from the DB and deserialize it. Seeker points out that this could lead to second-order unsafe deserialization. I am using the XStream third-party library for serializing and deserializing. I have already added a whitelisting mechanism (using allowTypesByWildcard). I know we could either do a checksum validation or encrypt the serialized data before inserting it into the DB, but that would fail if the DB already has data that is not encrypted. Is there a better way to avoid this vulnerability that also handles the already existing data in the DB?
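For reference, a minimal sketch of the whitelist setup described above (the com.example.model.** pattern and the map contents are placeholders, not the actual types involved):

import com.thoughtworks.xstream.XStream;
import java.util.HashMap;
import java.util.Map;

XStream xstream = new XStream();
// Allow only application types; recent XStream releases deny everything else
// by default, while older ones need XStream.setupDefaultSecurity(xstream) first.
xstream.allowTypesByWildcard(new String[]{ "com.example.model.**", "java.util.**" });

Map<String, String> map = new HashMap<>();
map.put("key", "value");

String xml = xstream.toXML(map);        // the string stored in the DB column
Object restored = xstream.fromXML(xml); // the read path Seeker flags

Whatever integrity scheme (checksum, encryption) is layered on top, this whitelist is what actually limits which classes can be instantiated on the read path, including for rows that predate the scheme.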

Related

Redis value as byte[] vs plain string

I am using Redis as a centralized cache for a distributed system. Currently I am using Jedis to connect to the Redis cluster, and I store values as byte[] instead of String. My question is: does storing a plain string vs. a byte[] have any impact on getting the data? In my application I serialize my Java POJO, convert it to a byte[], and then store it, whereas I could convert it to JSON and store that, so that when I get it from Redis I can use the object directly instead of deserializing it. I have tried both, but the only difference I can see is the extra deserialization step.
In Redis, everything is a byte[]. What Redis calls a string is actually a byte[] in programming languages.
When you store JSON, you still need to serialize it to byte[] before saving to Redis, and do the reverse when you read it back. This is no different from serializing a Java object. In other words, you always have to pay the cost of serialization and deserialization.
That said, different libraries have different serialization costs. Java serialization is known to be slow and inefficient. JSON is likely to be better than Java serialization, but it wastes memory in Redis because it is text-based. You can choose a better serialization library.
Kryo is a faster replacement for the Java serializer. MessagePack is like JSON but faster. Protocol Buffers / FlatBuffers are even better, but require you to declare a schema upfront. There are other serialization formats as well, each with their own tradeoffs.
The general recommendation: try to use the hash datatype. It is efficient, and lets you request specific fields instead of the whole object (see the sketch below). Only if hash does not work for you should you pick something else based on your needs.
P.S. If you are into benchmarks, this website has several - https://github.com/eishay/jvm-serializers/wiki
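To illustrate the hash recommendation above, a minimal Jedis sketch (the key and field names are invented for the example):

import redis.clients.jedis.Jedis;

try (Jedis jedis = new Jedis("localhost", 6379)) {
    // Store each POJO field as a separate hash field instead of one opaque blob.
    jedis.hset("user:1234", "name", "alice");
    jedis.hset("user:1234", "age", "30");

    // Later, fetch a single field without deserializing the whole object.
    String name = jedis.hget("user:1234", "name");
}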

Parsing a cypher query result (JSON) to a Java object

I use the Jersey/Jackson stack to address a Neo4j database via the REST API, but I have some issues with how to interpret the result.
If I read the node by its ID (/db/data/node/xxx), the result can be mapped to my DTO very easily by calling readEntity(MyDto.class) on the response. However, usage of internal IDs is not recommended, and various use cases require querying by custom properties. Here cypher comes into play (/db/data/cypher).
Assuming a node exists with a property "myid" and a value of "1234", I can fetch it with the cypher query "MATCH (n {myid: 1234}) RETURN n". The result is a JSON string with a bunch of resources and, eventually, the "data" I want to unmarshal to a Java object. Unmarshalling it directly fails with a ProcessingException (error reading entity from input stream). I see no API allowing me to iterate over the result's data.
My idea is to define some kind of generic wrapper class with an attribute "data", give this to the unmarshaller, and unwrap my DTO afterwards. I wonder if there is a more elegant way to do this, like using "RETURN n.data" (which does not work) or something similar. Is there?
You should look into Neo4j 2.0, where RETURN n just returns the property map.
I usually tend to deserialize the result into a nested list/map structure (i.e. have ObjectMapper read to Object.class or Map.class) and grab the data map directly out of that.
There's probably also a way to tell Jackson to ignore all the information around that data field.
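As a rough sketch of that approach, assuming the classic /db/data/cypher response shape (rows nested under "data", with each node resource nesting its properties under another "data" field); MyDto stands in for the asker's DTO:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

ObjectMapper mapper = new ObjectMapper();
JsonNode root = mapper.readTree(json); // the raw cypher REST response body
// First row, first column, then the node's property map.
JsonNode properties = root.path("data").path(0).path(0).path("data");
MyDto dto = mapper.treeToValue(properties, MyDto.class);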
If you want a nicer presentation, you can also check out my cypher-rs project, which returns only the data in question, nothing more.

Difference between Serialization and saving an object via JDBC to JAVA_OBJECT

I am aware of what serialization is, but I have not found any real practical example describing the latter (saving an object in a database taking advantage of the JAVA_OBJECT mapping).
Do I first have to serialize the object and then save it to the database?
In the case of MySQL, you don't have to serialize the object first; the driver will do it for you. Just use the PreparedStatement.setObject method.
For example, first in MySQL create the table:
create table blobs (b blob);
Then in a Java program create a prepared statement, set the parameters, and execute:
PreparedStatement preps = connection.prepareStatement("insert into blobs (b) values (?)");
preps.setObject(1, new CustomObject());
preps.execute();
Don't forget that the class of the object that you want to store has to implement the Serializable interface.
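For the read side, a hedged sketch that deserializes the blob manually rather than relying on driver-specific auto-deserialization (the table and class names match the example above):

import java.io.ObjectInputStream;
import java.sql.ResultSet;

ResultSet rs = connection.createStatement().executeQuery("select b from blobs");
while (rs.next()) {
    // Read the raw bytes back and deserialize them explicitly.
    try (ObjectInputStream in = new ObjectInputStream(rs.getBinaryStream(1))) {
        CustomObject obj = (CustomObject) in.readObject();
    }
}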
Serialization is used to save the state of an object, marshal it to a stream, and share it with a remote process. The other process just needs to have the same class version to deserialize the stream back into an object.
The problem with the database approach is that you would need to expose the database even to the remote process. This is generally not done for various reasons, mainly security.

How do I put Serializable Objects in a JSON body? In what format should I store in my database?

I am building a logging/journal service for an OSGi framework, with the intention of using my journal entries to restore the system from a backup after a system failure.
But I came across a problem: to make sure I have enough data to be able to restore the system correctly, I need to record the function calls that were used along with their arguments.
I pass the function name as a String to my journal service and the arguments as an array of Serializable objects. I require the arguments to be Serializable because I need to persist them to an external database.
I contact my database via a REST/JSON framework, so I just want to POST my journal entries to my database. My problem, however, is this: how can I put the Serializable objects (the args) into my JSON body? And in what format do I need to store them in my database?
I would serialize the objects to a byte array/stream, and base64-encode the array/stream to get a printable String.
At the database level, you can store the base64 string as a CLOB, or decode it to a byte array and store it as a BLOB.
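A minimal sketch of the serialize-then-encode step (the helper name toBase64 is made up for the example):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

static String toBase64(Serializable arg) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(arg); // standard Java serialization
    }
    // Printable string, safe to embed in a JSON field or a CLOB column.
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
}

On the way back, Base64.getDecoder().decode(...) restores the byte[] for an ObjectInputStream.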

Storing Serializable Objects in the Database

I'm writing an application which needs to write an object into a database.
For simplicity, I want to serialize the object.
But ObjectOutputStream, which is needed for this purpose, has only one constructor, which takes any subclass of OutputStream as a parameter.
What parameter should be passed to it?
You can pass a ByteArrayOutputStream and then store the resulting stream.toByteArray() in the database as a BLOB.
Make sure you specify a serialVersionUID for the class, because otherwise you'll have a hard time when you add or remove a field.
Also consider the XML alternative for object serialization, XMLEncoder, if you need somewhat more human-readable data.
And ultimately, you may want to translate your object model to the relational model via an ORM framework. JPA implementations (Hibernate/EclipseLink/OpenJPA) provide object-relational mapping so that you work with objects, while their fields and relations are persisted in an RDBMS.
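Putting the first two points together, a small sketch (Settings is an invented example class):

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Settings implements Serializable {
    // Pin the serialized form so adding/removing fields later stays manageable.
    private static final long serialVersionUID = 1L;
    String theme = "dark";
}

ByteArrayOutputStream bytes = new ByteArrayOutputStream();
try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
    out.writeObject(new Settings());
}
byte[] blob = bytes.toByteArray(); // e.g. preparedStatement.setBytes(1, blob)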
Using ByteArrayOutputStream should be a simple enough way to convert to a byte[] (call toByteArray after you've flushed). Alternatively there is Blob.setBinaryStream (which actually returns an OutputStream).
You might also want to reconsider using the database as a database...
E.g. create a ByteArrayOutputStream and pass it to the ObjectOutputStream constructor.
One thing to add to this: Java serialization is a good, general-purpose tool. However, it can be a bit verbose. You might want to try gzipping the serialized data. You can do this by putting a GZIP stream between the object stream and the byte stream. This uses a small amount of extra CPU, but that is often a worthy tradeoff to avoid shipping the extra bytes over the network and shoving them into a DB.
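A sketch of that stream sandwich (myObject stands for whatever you are persisting):

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.util.zip.GZIPOutputStream;

ByteArrayOutputStream bytes = new ByteArrayOutputStream();
// The GZIP stream sits between the object stream and the byte stream;
// closing the object stream also finishes the gzip trailer.
try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
    out.writeObject(myObject);
}
byte[] compressed = bytes.toByteArray(); // store this as the BLOB

Reading back mirrors it: wrap the blob in a ByteArrayInputStream, then a GZIPInputStream, then an ObjectInputStream.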
