Hibernate: adding an object if not already present - java

Here's the case: I am creating a batch script that runs daily, parsing logfiles and exporting the data to a database. The format of this file is basically
std_prop1;std_prop2;std_prop3;[opt_prop1;[opt_prop2;[opt_prop3;[..]]]]
The standard properties map to a table with a column for each property, where each line in the logfile basically maps to a corresponding row. It might look like LOGDATA(id,timestamp,systemId,methodName,callLenght). Since we should be able to log as many optional properties as we like, we cannot map them to the same table, since that would mean adding a column to the table every time a new property was introduced. Not to mention the number of NULL values ...
So the additional properties go in another table, say EXTRA_PROPS(logdata_foreign_key,propname,value). In reality, most of the optional properties are the same (e.g. os version, app container, etc), making it somewhat wasteful to log for instance 4 rows in EXTRA_PROPS for each row in LOGDATA (in the case that one on average had 4 extra properties). So what I would like my batch job to do is
for each additionalProperty in logRow:
    see if additionalProperty already exists
    if exists:
        create a reference to it in a reference table
    if not:
        add the property to the extra properties table
        create a reference to it in a reference table
I would then probably have three slightly different tables:
LOGDATA(id,timestamp,systemId,methodName,callLenght)
EXTRA_PROPS(id,propname,value)
LOGDATA_HAS_EXTRA_PROPS(logid,extra_prop_id)
I am not 100% sure this is a better way of doing it: I would still create N rows in the LOGDATA_HAS_EXTRA_PROPS table for N properties, but at least I would not add new rows to EXTRA_PROPS for properties that are already there.
Even if this might not be the best way (what is?), I am still wondering about the technical side: How would I implement this using Hibernate? It does not have to be superfast, but it would need to chew through 100K+ rows.

Firstly, I would not recommend using Hibernate for this type of logic. Hibernate is a great product, but this kind of high-load data operation may not be its strongest point.
From a data modeling standpoint, it appears to me that (propname,value) is actually the primary key of EXTRA_PROPS. Basically, you want to express the logic that, for example, the combination hostname + foo.bar.com will appear only once in the table. Am I right? That would be the PK, so you will need to use it in LOGDATA_HAS_EXTRA_PROPS. Using the name alone will not be sufficient for the reference.
In Hibernate (if you choose to use it), that can be expressed as a composite key using @EmbeddedId or @Embeddable on the object mapped to EXTRA_PROPS. You can then have a many-to-many relationship that uses LOGDATA_HAS_EXTRA_PROPS as the association table.
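The find-or-create step from the question can be sketched in plain Java, independent of Hibernate. This is only an illustration under stated assumptions: the class ExtraPropsStore and its methods are hypothetical, the maps stand in for the EXTRA_PROPS and LOGDATA_HAS_EXTRA_PROPS tables, and a surrogate id is kept next to the natural key (propname, value) as in the question's proposed schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for EXTRA_PROPS and LOGDATA_HAS_EXTRA_PROPS.
final class ExtraPropsStore {
    // The natural key: one EXTRA_PROPS row per distinct (propname, value).
    record PropKey(String name, String value) {}

    private final Map<PropKey, Long> idsByKey = new HashMap<>(); // EXTRA_PROPS
    private final List<long[]> links = new ArrayList<>();        // LOGDATA_HAS_EXTRA_PROPS
    private long nextId = 1;

    // Returns the id of the existing property row, or assigns a new id.
    long findOrCreate(String name, String value) {
        return idsByKey.computeIfAbsent(new PropKey(name, value), k -> nextId++);
    }

    // Always adds a reference row; adds a property row only when needed.
    void link(long logId, String name, String value) {
        links.add(new long[] {logId, findOrCreate(name, value)});
    }

    int propertyRows() { return idsByKey.size(); }

    int linkRows() { return links.size(); }
}
```

In a real batch job, a map like idsByKey could be filled once from the existing EXTRA_PROPS rows, so that most lookups for the 100K+ log rows never reach the database. Whether that fits depends on how many distinct properties there are.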

Related

Add Column to Cassandra db without losing data

I am using Cassandra database integrated into a spring boot application.
My Question is around the schema actions. If I need to make structural changes to the DB, say add a column to a table, the database needs to be recreated, however this means all the existing data gets deleted:
schema-action: CREATE_IF_NOT_EXISTS
The only way I have managed to solve this is by using the RECREATE schema action, but as mentioned earlier, this results in data loss.
What would be the best approach to handle this? How can I make structural changes, such as adding a column, without having to recreate the database and lose all existing data?
Thanks
Cassandra does allow you to modify the schema of an existing table without recreating it from scratch, using the ALTER TABLE statement via cqlsh. However, as the documentation for that statement explains, there are some important limitations on the kind of changes you can make: you cannot modify the primary key of the table at all, you can only add or drop regular columns, and you cannot change the type of a column to an incompatible one.
The reason for most of these limitations is how Cassandra needs to deal with the old data that already exists in the table. For example, it doesn't make sense to say that a column A that until now contained strings - will now contain integers - how are we supposed to handle all the old values in column A which weren't integers?
As Aaron rightly said in a comment, it is unlikely you'll want to do these schema changes as part of your application. These are usually rare operations which are done manually, or via some management application - not your usual application.

Hibernate create table at runtime

My database (PostgreSQL) has a table Person which contains a name and a password. Is it possible to dynamically (at Spring runtime) create a new table using Hibernate? For example, I want to create something like a single infoTable for every Person (when a Person object is created). This table should have a name like Person_Id+"infoTable". Is there any way to do that?
Hibernate does offer the mode called EntityMode.MAP that is excellent for unstructured data.
In order to use EntityMode.MAP, you want to create a Map<String, Object> structure where the map keys represent the column names and the object represents the value for the associated map key column. To save such data, you'd use:
session.save( "the_name_of_my_table", theEntityMap );
While this exists, I don't believe this is ideal for your use case though.
As others have suggested, you'd be better off creating an entity that contains a foreign key back to your Person entity and merely manage the multiple rows in a single table. There are numerous database features that can help easily deal with large volumes of data without having to resort to using EntityMode.MAP.
You can set hibernate.hbm2ddl.auto=update in the Hibernate configuration, but think twice before doing so.

Hibernate three tables many to many

I have a database with 3 tables. The main table is Contract, and it is joined with pairs of keys from two tables: Languages and Regions.
Each pair is unique, but it is possible that one contract will have the following pair ids:
{ (1,1), (1,2), (2,1), (2,2) }
Today, the three tables are linked via a connecting entity called ContractLanguages. It contains a sequence id, and triplets of ids from the three tables.
However, in large enough contracts this causes a serious performance issue, as the hibernate environment creates a staggering amount of objects.
Therefore, we would like to remove this connecting entity, so that Contract will hold some collection of these pairs.
Our proposed solution: create an @Embeddable class containing the Language and Region ids, and store them in the Contract entity.
The idea behind this is that there is a relatively small number of languages and regions.
We are assuming that hibernate manages a list of such pairs and does not create duplicates, therefore substantially reducing the amount of objects created.
However, we have the following questions:
Will this solution work? Will hibernate know to create the correct object?
Assuming the solution works (the link is created correctly), will hibernate optimize the object creation to stop creating duplicate objects?
If this solution does not work, how do we solve the problem mentioned above without a connecting entity?
From your post and comments I assume the following situation, please correct me if I'm wrong:
You have a limited set of Languages + Regions combinations (currently modelled as ContractLanguages entities)
You have a huge amount of Contract entities
Each contract can reference multiple Languages and Regions
You have problems loading all the contract languages because currently the combination consists of contract + language + region
Based on those assumptions, several possible optimizations come to my mind:
You could create a LanguageRegion entity which has a unique id and each contract references a set of those. That way you'd get one more table but Hibernate would just create one entity per LanguageRegion and load it once per session, even if multiple contracts would reference it. For that to work correctly you should employ lazy loading and maybe load those LanguageRegion entities into the first level cache before loading the contracts.
Alternatively you could just load columns that are needed, i.e. just load parts of an entity. You'd employ lazy loading as well but wouldn't access the contract languages directly but load them in a separate query, e.g. (names are guessed)
SELECT c.id, lang.id, lang.name, region.id, region.name FROM Contract c
JOIN c.contractlangues cl
JOIN cl.language lang
JOIN cl.region region
WHERE c.id in (:contractIds)
Then you load the contracts, get their ids, and load the language and region details using that query (it returns a List<Object[]>, with each object array containing the column values as selected). You put those into an appropriate data structure and access them as needed. That way you'd bypass entity creation and just get the data that is needed.
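One possible "appropriate data structure" for those rows is a map from contract id to its language/region pairs. The sketch below assumes the rows arrive as a List<Object[]> in the SELECT order of the query above; the class ContractLanguageRows and the record LanguageRegion are hypothetical names, not part of Hibernate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ContractLanguageRows {
    // Hypothetical value type holding one (language, region) pair.
    record LanguageRegion(long languageId, String languageName,
                          long regionId, String regionName) {}

    // Each row is expected in the SELECT order:
    // contract id, language id, language name, region id, region name.
    static Map<Long, List<LanguageRegion>> groupByContract(List<Object[]> rows) {
        Map<Long, List<LanguageRegion>> byContract = new LinkedHashMap<>();
        for (Object[] row : rows) {
            // Numeric columns may come back as Long, Integer or BigDecimal
            // depending on the driver, so go through Number.
            long contractId = ((Number) row[0]).longValue();
            LanguageRegion pair = new LanguageRegion(
                    ((Number) row[1]).longValue(), (String) row[2],
                    ((Number) row[3]).longValue(), (String) row[4]);
            byContract.computeIfAbsent(contractId, id -> new ArrayList<>()).add(pair);
        }
        return byContract;
    }
}
```

The result can then be consulted per contract, for example with groupByContract(rows).get(contract.getId()), where getId() is assumed to exist on the Contract entity.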

Select and Create_If_Not_Available versus Create always on a table with Unique Constraint

Please bear with this basic question.
I have a table T with unique constraint on composite columns "A" and "B".
While creating entry into T, I have two options:
1. Select and Create_If_Not_Available
a. Select * from T where A='a' and B='b'.
b. If no entry available, create a new entry (with handling of ConstraintViolation because of race condition), else move to next step.
2. Create always
a. Create entry into DB (with handling of ConstraintViolation, relying on the database's unique constraint).
As I see it, the 2nd option fires fewer queries at the DB and relies on the database (outside the application code).
The first makes a lot of unnecessary calls but has very little dependency on the database's unique constraint.
Given high scale, which one should I prefer and why?
I am using Java, Hibernate and Oracle.
Thanks,
The work done by the database in option 2 has to be done in option 1 anyway, so from a pure efficiency standpoint, option 2 is better. Why do you care about "dependency on unique constraints"? From an integrity perspective, you need them.
If your application is attempting to insert a lot of records for existing keys, it seems to me you have other problems.
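The difference in round trips between the two options can be illustrated with a small simulation. This is a sketch only: UniqueTable is a hypothetical in-memory stand-in for table T, DuplicateKeyException plays the role of the constraint violation, and roundTrips merely counts simulated database calls.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for table T with a unique constraint on (A, B).
final class UniqueTable {
    static final class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(String key) { super("duplicate key: " + key); }
    }

    private final Set<String> keys = new HashSet<>();
    int roundTrips = 0; // counts simulated database calls

    private static String key(String a, String b) { return a + "\u0000" + b; }

    boolean exists(String a, String b) {
        roundTrips++;
        return keys.contains(key(a, b));
    }

    void insert(String a, String b) {
        roundTrips++;
        if (!keys.add(key(a, b))) {
            throw new DuplicateKeyException(a + "," + b);
        }
    }

    // Option 1: SELECT first, INSERT only when the row is missing.
    boolean selectThenCreate(String a, String b) {
        if (exists(a, b)) return false;
        try {
            insert(a, b);
            return true;
        } catch (DuplicateKeyException e) {
            return false; // lost a race with a concurrent writer
        }
    }

    // Option 2: INSERT unconditionally and let the constraint reject duplicates.
    boolean createAlways(String a, String b) {
        try {
            insert(a, b);
            return true;
        } catch (DuplicateKeyException e) {
            return false;
        }
    }
}
```

For a new key, option 1 costs two simulated calls and option 2 costs one; for an existing key, both cost one. The simulation says nothing about how expensive a rejected insert is in a real database, which is why the share of duplicate keys matters.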

LinkedList with Serialization in Java

I'm getting introduced to serialization and ran into some problems when pairing it with LinkedList.
Consider I have the following table:
CREATE TABLE JAVA_OBJECTS (
ID BIGINT NOT NULL UNIQUE AUTO_INCREMENT,
OBJ_NAME VARCHAR(50),
OBJ_VALUE BLOB
);
And I'm planning to store 3 object types, so the table may look like this:
ID  OBJ_NAME  OBJ_VALUE
=======================
1   Class1    BLOB
2   Class2    BLOB
3   Class1    BLOB
4   Class3    BLOB
5   Class3    BLOB
And I'll use 3 different LinkedLists to manage these objects.
I've been able to implement LoadFromTable() and StoreIntoTable(Class1 obj1).
My question is: if I change an attribute of a Class2 object in LinkedList<Class2>, how do I effect the change in the DB for this individual item? Also take into account that the order of the elements in the LinkedList may change.
Thanks : )
* EDIT
Yes, I understand that I'll have to delete/update a row in my DB table. But how do I keep track of WHICH row to update? I'm only storing the objects in the List, not their respective IDs in the table.
You'll have to store their IDs in the objects you are storing. However, I would suggest not trying to roll your own ORM system, and instead use something like Hibernate.
If you change an attribute of an object or the order of the items, you will have to delete that row and insert the updated list again.
How do i effect the change in the DB for this individual item?
I hope I get you right. The SQL UPDATE and DELETE statements allow you to add a WHERE clause in which you choose the ID of the row to update.
e.g.
UPDATE JAVA_OBJECTS SET OBJ_NAME ="new name" WHERE ID = 2
EDIT:
To prevent problems with your IDs, you could wrap your object:
class Wrapper {
    long dbId;   // ID of the row in JAVA_OBJECTS (the column is BIGINT)
    Object obj;  // the deserialized object
}
and add the wrappers, instead of the 'naked' objects, to your LinkedList.
You can use AUTO_INCREMENT attribute for your table and then use the mysql_insert_id() function to retrieve the id assigned to the row added/updated by the last INSERT/UPDATE statement. Along with this maintain a map (eg a HashMap) from the java object to the Id. Using this map you can keep track of which row to delete/update.
Edit: See the answer to this question as well.
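The object-to-ID map mentioned above could look roughly like this. It is a sketch with hypothetical names (RowIdRegistry, register, rowIdOf), and it swaps the suggested HashMap for an IdentityHashMap, so that changing an object's attributes cannot change the key it is stored under. In JDBC the generated ID would typically come from Statement.getGeneratedKeys() after the INSERT; here it is simply passed in.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper that remembers which table row each in-memory object came from.
final class RowIdRegistry {
    // Identity-based lookup: mutating an object's fields does not affect its key.
    private final Map<Object, Long> idsByObject = new IdentityHashMap<>();

    // Call after loading or inserting a row, with the row's ID column value.
    void register(Object obj, long rowId) {
        idsByObject.put(obj, rowId);
    }

    // Returns the row ID, or null if the object was never stored.
    Long rowIdOf(Object obj) {
        return idsByObject.get(obj);
    }

    // Call after deleting the row; returns the ID that was registered, if any.
    Long forget(Object obj) {
        return idsByObject.remove(obj);
    }
}
```

Because the lookup goes through the object itself, the order of the elements in the LinkedList no longer matters: rowIdOf(element) gives the ID to use in the WHERE clause of the UPDATE or DELETE.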
I think the real problem here is, that you mix and match different levels of abstraction. By storing serialized Java objects into a relational database as BLOBs you have to consider several drawbacks:
You lose interoperability. Applications written in languages other than Java are not able to read the data back. Even other Java applications must have the class files of the serialized classes on their classpath.
Changing the class definitions of the stored classes will end up in a maintenance nightmare.
You give up the advantages of a relational database. Serialization hides the actual data from the database, so the database is presented only with a black box. You are unable to execute any meaningful query against the real data. All you have is the ID and a block of bytes.
You have to implement low-level data handling yourself. The database is made to handle your data effectively, but serialization hinders it from doing its job. So you are on your own, and you are running into that problem right now.
So in most cases you benefit from separation of concerns and from using the right tool for the job.
Here are some suggestions:
Separate the internal data handling inside your application from persistent storage. Design your database schema so that the built-in database features can handle the data efficiently. With a relational database like MySQL you can choose from different technologies: plain JDBC, object-relational mappers like JPA, or simple mappers like MyBatis. Separation here means not contaminating the database with implementation-specific concerns.
If, for example, your Java application has a List of Person instances and each Person consists of a name and an age, then you would represent that list in a relational database as a table with a VARCHAR field for the name, a numeric field for the age, and maybe a third field for a unique key. Then the database is able to do what it does best: managing large amounts of data.
Inside your application you typically separate the persistence layer, which contains the code to communicate with the database, from the rest of your program.
In some use cases a relational database may not be the appropriate tool. In a single-user desktop application with a small set of data, it may be best to simply serialize your Person list into a plain file and read it back at the next startup.
But there are other alternatives for persisting your data. Maybe some kind of object-oriented database is the right tool. In particular, I have experience with Fast Objects. As a simplification, it is serialization on steroids. There is no need for a layer like JPA or JDBC between your application and your database. You are able to store the class instances directly in the database. But unlike the relational database with its BLOB field, the OODB knows your classes and the actual data and can benefit from that.
Another alternative may be JDBM or Berkeley DB.
So separation of concerns and choosing the right persistence strategy (and using it the right way) is a key concern for the success of your project. But doing it right is hard even for experienced developers.
