Java write object to MySQL - each field or serialized byte array?

I'm trying to save an instance of a Serializable Java class to a MySQL database.
As far as I know I have two options:
Create a table that contains all fields as columns and re-create the object from the field-data saved in the database.
Serialize the instance -> get the byte array -> save the byte array to the database -> re-create the instance from the byte array.
My question: which way is faster, and which needs less space?
And my second question: how would I easily write and read the byte array from the MySQL database using JDBC?

Saving the serialized byte array might save some space, since you avoid the metadata associated with table columns. That said, I doubt there would be any noticeable difference in speed or storage between saving the individual fields as columns and saving one column of object bytes. If you're going to serialize it anyway, you might as well write it to a file and skip the database entirely. Worse, as your objects and model change, loading older versions becomes problematic, and maintainability turns into a nightmare.
Personally, I'd never save a serialized byte array of an object in a database unless there was a very specific business case for it. I'd create the table and columns and persist the object that way, using JDBC or your favorite persistence framework (like Hibernate). Saving it as a serialized byte array only limits what you can do with the data. If you don't want to create the database, tables, and columns, then consider serializing to a file instead: that would save some space and time, as you wouldn't have to maintain a database server. Granted, the more objects you have, the more files you get, and the harder it becomes to search and query that data.
TL;DR: I'd just create the database tables for the data you're trying to save. I don't see any noticeable benefit in saving it to a database as a serialized byte array.
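To the second part of the question: here is a minimal JDBC sketch for writing and reading the serialized bytes, assuming a made-up table obj_store (id BIGINT AUTO_INCREMENT PRIMARY KEY, data BLOB):
import java.io.*;
import java.sql.*;

public class BlobDao {

    // Serialize the object to a byte array and store it in the BLOB column.
    static void save(Connection con, Serializable obj) throws SQLException, IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        try (PreparedStatement ps =
                con.prepareStatement("INSERT INTO obj_store (data) VALUES (?)")) {
            ps.setBytes(1, buf.toByteArray());
            ps.executeUpdate();
        }
    }

    // Read the byte array back and deserialize it into an object.
    static Object load(Connection con, long id)
            throws SQLException, IOException, ClassNotFoundException {
        try (PreparedStatement ps =
                con.prepareStatement("SELECT data FROM obj_store WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return null;
                try (ObjectInputStream in =
                        new ObjectInputStream(new ByteArrayInputStream(rs.getBytes(1)))) {
                    return in.readObject();
                }
            }
        }
    }
}
For very large objects you could stream through setBinaryStream/getBinaryStream instead of materializing the whole array in memory.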

Related

Single data column vs multiple columns in Cassandra

I'm working on a project with an existing Cassandra database.
The schema looks like this:
partition key (big int)   clustering key1 (timestamp)   data (text)
1                         2021-03-10 11:54:00.000       {a:"somedata", b:2, ...}
My question is: is there any advantage to storing the data in a JSON string?
Will it save some space?
So far I have only found disadvantages:
You cannot (easily) add or drop columns at runtime, since the application could overwrite the JSON string column.
Parsing the JSON string is currently the performance bottleneck.
No, there is no real advantage to storing JSON as a string in Cassandra unless the underlying data is genuinely schema-less. It will also not save space; in fact it uses more, because each item has to store a key + value instead of just the value.
If you can, I would recommend mapping the keys to CQL columns so you can store the values natively and access the data more flexibly. Cheers!
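For illustration, a sketch of that mapping with the DataStax Java driver; the keyspace and table names are invented, and the columns a and b are taken from the example row above:
import com.datastax.oss.driver.api.core.CqlSession;

public class MapJsonKeysToColumns {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Instead of one 'data text' column holding a JSON string,
            // give each JSON key its own natively typed CQL column.
            session.execute("CREATE TABLE IF NOT EXISTS my_ks.events ("
                    + " partition_key bigint,"
                    + " created timestamp,"
                    + " a text,"
                    + " b int,"
                    + " PRIMARY KEY (partition_key, created))");

            // Values are stored natively and can be read or updated individually.
            session.execute("INSERT INTO my_ks.events (partition_key, created, a, b)"
                    + " VALUES (1, toTimestamp(now()), 'somedata', 2)");
        }
    }
}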
Erick is spot-on with his answer.
The only thing I'd add is that storing JSON blobs in a single column makes updates (even more) problematic. If you update a single JSON property, the whole column gets rewritten, and the original JSON blob is still there, just "obsoleted" until compaction runs. The only time storing a JSON blob in a single column makes any sense is when the properties never change.
And I agree, mapping the keys to CQL columns is a much better option.
I don't disagree with the excellent and already accepted answer by @erick-ramirez.
However, there is often a good case for using frozen UDTs instead of separate columns for related data that is only ever set and retrieved together and will never be specifically filtered on in your queries.
The "frozen" part is important, as it means less work for Cassandra, but it does mean that you rewrite the whole value on each update.
This can give a large performance boost over a large number of columns. The nice ScyllaDB people have a great post on that:
If You Care About Performance, Employ User Defined Types
(I know ScyllaDB is not exactly Cassandra, but I've seen multiple articles saying the same thing about Cassandra.)
One downside is that you add work to the application layer, and mapping complex UDTs to your Java types can sometimes be interesting.
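A rough sketch of the frozen-UDT variant (keyspace, type, and table names are made up); the related fields live in one UDT value instead of many columns:
import com.datastax.oss.driver.api.core.CqlSession;

public class FrozenUdtSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().withKeyspace("my_ks").build()) {
            // Group the related fields into a single user defined type.
            session.execute("CREATE TYPE IF NOT EXISTS payload (a text, b int)");
            // 'frozen' means the whole value is written and rewritten as one unit.
            session.execute("CREATE TABLE IF NOT EXISTS events_udt ("
                    + " partition_key bigint,"
                    + " created timestamp,"
                    + " data frozen<payload>,"
                    + " PRIMARY KEY (partition_key, created))");
        }
    }
}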

Most efficient way to store unused chunk of data in PostgreSQL

A table has a few regular columns plus data for 100+ more fields that only needs to be stored for later export to other systems.
This data (besides the first few columns mentioned) doesn't need to be indexed, filtered, or manipulated in any way. No query will ever inspect it.
The only thing the application layer does is retrieve the whole row (including this otherwise unused payload) and deserialize it for conversion into an external format.
There was an idea to serialize the whole class into this field, but we later realized that it carries a tremendous size overhead (because of the additional Java class metadata).
So it's simple key-value data (the key set is static, as the relational model suggests).
What is the right way, and the right data type, to store this additional unused data in PostgreSQL in terms of DB performance (50+ TB of storage)? Perhaps it's worth omitting the keys and storing only the values as an array (since the keys are static), then retrieving values by index after deserialization in the application layer (since DB performance comes first)?
a_horse_with_no_name, thanks a lot, but jsonb is a really tricky data type.
When counting the bytes required for a single tuple containing jsonb, one must always keep in mind the size of the key names in the JSON document.
So if someone wants to reinvent the wheel and store long key names as single-byte indexes, it will decrease the overall tuple size,
but it isn't better than storing all the data as ordinary relational table fields, because the TOAST algorithm applies in both cases.
Another option is to use the EXTERNAL storage method for the single jsonb field.
In that case PostgreSQL can keep more tuples in its cache, since it doesn't need to hold the whole jsonb data in memory.
Anyway, I ended up with a combination of protobuf + zlib in a bytea field (since there is no need to query the data in the bytea field in our system):
Uber research for protobuf + zlib
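For what it's worth, a minimal JDBC sketch of that final setup; the export_rows table and its bytea column extra are invented, and the payload stands in for protobuf's toByteArray() output:
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class ByteaSketch {
    public static void main(String[] args) throws Exception {
        // In the real system this would be message.toByteArray() from protobuf.
        byte[] payload = "serialized protobuf message".getBytes(StandardCharsets.UTF_8);

        // zlib-compress the payload before storing it.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream z =
                new DeflaterOutputStream(buf, new Deflater(Deflater.BEST_COMPRESSION))) {
            z.write(payload);
        }

        // Assumed schema: CREATE TABLE export_rows (id bigserial PRIMARY KEY, extra bytea)
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/db", "user", "secret");
             PreparedStatement ps =
                 con.prepareStatement("INSERT INTO export_rows (extra) VALUES (?)")) {
            ps.setBytes(1, buf.toByteArray()); // setBytes maps to bytea in the PG driver
            ps.executeUpdate();
        }
    }
}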

Mixed list of new/updated objects: how to efficiently store them to the DB?

OK, so let's say I have a list that contains the following types of objects:
Objects that are already stored in the database (have the same PK) and are identical to the stored ones, i.e. not modified
Objects that are already stored in the database (have the same PK) but are modified with respect to the stored ones, so they need to be updated
Objects that don't yet exist in the database, and are about to be saved
Such a list of objects is sent as JSON to the web service, and the web service now has to talk to the database and decide which objects to insert, update, or ignore.
My question is: how to do this efficiently?
One idea is to iterate over the list and, for every object's PK, query the database to check whether the object is non-existent, identical, or modified, and then choose the action based on that.
What bothers me about that approach is the sheer number of queries just to save a few objects. What if only 1 in 100 really needs to be saved? That is terribly inefficient.
Is there any better way to do that?
You can send the whole list to the DB (MySQL) and do an upsert:
INSERT ... ON DUPLICATE KEY UPDATE
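A minimal JDBC sketch of that idea, batching the whole list into one statement; the items table and the Item record are invented for illustration (records need Java 16+):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class UpsertSketch {
    record Item(long id, String name) {} // stand-in for the real payload

    // One round trip: MySQL inserts unknown PKs and updates known ones.
    static void saveAll(Connection con, List<Item> items) throws SQLException {
        String sql = "INSERT INTO items (id, name) VALUES (?, ?)"
                + " ON DUPLICATE KEY UPDATE name = VALUES(name)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (Item it : items) {
                ps.setLong(1, it.id());
                ps.setString(2, it.name());
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
Rows whose values are unchanged are effectively no-ops on the server side, so the unmodified objects cost very little.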

jBPM PROCESSINSTANCEINFO and WorkitemInfo tables in H2 db

I am new to jBPM and would like to know whether the preconfigured H2 DB stores the objects (DataItems) associated with the process and work item somewhere.
I can see there is a byte array present in both tables, and I am not sure what exactly that byte array stores and how to unmarshal it.
Any sort of information would be really helpful.
Thanks.
The *Info objects do store all the relevant data the engine needs, in a binary format. However, this is not meant for querying. If you want access to variable values, either use the audit logs or use pluggable variable persistence to store them separately (for example, by making them a JPA entity they will be stored in a separate table).
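As a rough sketch of that last option, a process variable modeled as its own JPA entity (the class and fields are invented), so it ends up in a regular, queryable table rather than inside the marshalled blob:
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class OrderData {

    @Id
    @GeneratedValue
    private Long id; // stored in its own table, queryable with plain SQL

    private String customer;
    private int amount;

    // getters and setters omitted for brevity
}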

LinkedList with Serialization in Java

I'm getting introduced to serialization and ran into some problems when pairing it with LinkedList.
Consider that I have the following table:
CREATE TABLE JAVA_OBJECTS (
ID BIGINT NOT NULL UNIQUE AUTO_INCREMENT,
OBJ_NAME VARCHAR(50),
OBJ_VALUE BLOB
);
And I'm planning to store 3 object types, so the table may look like this:
ID   OBJ_NAME   OBJ_VALUE
==========================
1    Class1     BLOB
2    Class2     BLOB
3    Class1     BLOB
4    Class3     BLOB
5    Class3     BLOB
And I'll use 3 different LinkedLists to manage these objects.
I've been able to implement LoadFromTable() and StoreIntoTable(Class1 obj1).
My question is: if I change an attribute of a Class2 object in LinkedList<Class2>, how do I effect the change in the DB for that individual item? Also take into account that the order of the elements in the LinkedList may change.
Thanks : )
EDIT:
Yes, I understand that I'll have to delete/update a row in my DB table. But how do I keep track of WHICH row to update? I'm only storing the objects in the list, not their respective IDs in the table.
You'll have to store their IDs in the objects you are storing. However, I would suggest not trying to roll your own ORM system, and instead use something like Hibernate.
If you change an attribute of an object, or the order of the items, you will have to delete that row and insert the updated list again.
How do I effect the change in the DB for this individual item?
I hope I get you right. The SQL update and delete statements allow you to add a WHERE clause in which you chose the ID of the row to update.
e.g.
UPDATE JAVA_OBJECTS SET OBJ_NAME = 'new name' WHERE ID = 2
EDIT:
To prevent problems with your IDs, you could wrap your objects:
class Wrapper {
    long dbId;   // primary key of the corresponding row in JAVA_OBJECTS
    Object obj;  // the wrapped domain object
}
and add them, instead of the "naked" objects, to your LinkedList.
You can use the AUTO_INCREMENT attribute for your table and then retrieve the ID assigned to each inserted row through JDBC's generated-keys mechanism (pass Statement.RETURN_GENERATED_KEYS to prepareStatement and call getGeneratedKeys()), which is the JDBC equivalent of MySQL's mysql_insert_id(). Along with this, maintain a map (e.g. a HashMap) from the Java object to its ID. Using this map you can keep track of which row to delete/update, as sketched below.
Edit: See the answer to this question as well.
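A sketch of that bookkeeping with plain JDBC, using the JAVA_OBJECTS table from the question (the IdTracker name is made up):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

public class IdTracker {

    private final Map<Object, Long> ids = new HashMap<>(); // object -> row ID

    // Insert the serialized object and remember the auto-generated key.
    void insert(Connection con, String name, byte[] blob, Object obj) throws SQLException {
        String sql = "INSERT INTO JAVA_OBJECTS (OBJ_NAME, OBJ_VALUE) VALUES (?, ?)";
        try (PreparedStatement ps =
                con.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, name);
            ps.setBytes(2, blob);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                if (keys.next()) {
                    ids.put(obj, keys.getLong(1));
                }
            }
        }
    }

    // Later, look up which row a modified object belongs to.
    Long idFor(Object obj) {
        return ids.get(obj);
    }
}
Note that if your classes override equals/hashCode based on mutable fields, an IdentityHashMap is the safer choice for this map.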
I think the real problem here is that you mix different levels of abstraction. By storing serialized Java objects in a relational database as BLOBs, you have to accept several drawbacks:
You lose interoperability. Applications written in languages other than Java cannot read the data back. Even other Java applications must have the class files of the serialized classes on their classpath.
Changing the class definitions of the stored classes ends in maintenance nightmares.
You give up the advantages of a relational database. Serialization hides the actual data from the database, which only sees a black box. You are unable to execute any meaningful query against the real data. All you have is an ID and a block of bytes.
You have to implement low-level data handling yourself. The database is actually made to handle your data efficiently, but serialization hinders it from doing its job. So you are on your own, and you are running into that problem right now.
So in most cases you benefit from separation of concerns and from using the right tool for the job.
Here are some suggestions:
Separate the internal data handling inside your application from the persistent storage. Design your database schema so that the built-in database features can handle the data efficiently. For a relational database like MySQL you can choose among different technologies, such as plain JDBC, object-relational mappers like JPA, or simple mappers like MyBatis. Separation here means avoiding contaminating the database with implementation-specific concerns.
Suppose, for example, that your Java application holds a List of Person instances, each consisting of a name and an age. You would represent that list in a relational database as a table with a VARCHAR column for the name, a numeric column for the age, and perhaps a third column for a unique key. The database can then do what it does best: manage large amounts of data. A sketch of this mapping follows below.
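A minimal sketch of that Person example with plain JDBC; the person table and the PersonDao class are invented for illustration (records need Java 16+):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class PersonDao {
    record Person(String name, int age) {}

    // Assumed schema: CREATE TABLE person (id BIGINT AUTO_INCREMENT PRIMARY KEY,
    //                                      name VARCHAR(100), age INT)
    static void saveAll(Connection con, List<Person> people) throws SQLException {
        try (PreparedStatement ps =
                con.prepareStatement("INSERT INTO person (name, age) VALUES (?, ?)")) {
            for (Person p : people) {
                ps.setString(1, p.name());
                ps.setInt(2, p.age());
                ps.addBatch();
            }
            // The database now sees real columns it can index, query, and filter.
            ps.executeBatch();
        }
    }
}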
Inside your application you typically separate the persistence layer, containing the code that talks to the database, from the rest of your program.
In some use cases a relational database may not be the appropriate tool. In a single-user desktop application with a small data set, it may be best to simply serialize your Person list to a plain file and read it back at the next start-up.
But there are other alternatives for persisting your data. Maybe some kind of object-oriented database is the right tool. I personally have experience with Fast Objects. As a simplification, it is serialization on steroids: there is no need for a layer like JPA or JDBC between your application and your database, and you can store class instances directly in the database. But unlike the relational database with its BLOB column, the OODB knows your classes and the actual data, and can benefit from that.
Other alternatives are JDBM and Berkeley DB.
So separation of concerns and choosing the right persistence strategy (and using it the right way) are key to the success of your project. But doing it right is hard, even for experienced developers.
