Not able to clear hibernate cache - java

I am using the Broadleaf demo application, which has Hibernate configured with Ehcache. I also have an external application that interacts with the same database directly.
When I update the database using the external application, my Broadleaf application is unaware of those changes and throws a duplicate primary key error while creating new entities. I am trying to resolve this by clearing the Hibernate cache periodically, so that Hibernate rebuilds the cache from scratch and everything syncs up.
I am using the following code to clear the second-level cache:
Cache cache = sessionFactory.getCache();
String entityName = "someName";
cache.evictEntityRegion(entityName);
But this doesn't seem to work.
I even tried to clear the cache manually through JMX using a tool like VisualVM, but this doesn't work either. I am still getting old primary key values in my APIs. Is this because only the second-level cache is being cleared, leaving the first-level cache untouched? I am stuck here. Can anyone please help with this issue?
Update:
Let's say I have applications A and B. A uses Broadleaf and B uses raw SQL queries to insert into the database. I create a few orders using application A, then insert a few orders directly into the database using application B, and also update the SEQUENCE_GENERATOR table with max(order_id) + 1. Afterward, when I try to create an order using application A, it throws a duplicate primary key exception. While debugging, I found that IdOverrideTableGenerator is still handing out old primary keys. This made me curious about the second-level cache. Doesn't Broadleaf use SEQUENCE_GENERATOR as the starting reference for primary key generation and keep the current state in cache? In my case, even updating SEQUENCE_GENERATOR doesn't ensure a fresh, unique primary key.

You're correct in that you need L2 cache invalidation for your external imports if you want your implementation to recognize your new entities at runtime. Otherwise, you would have to wait for the configured TTL on your cache region to expire for your application to see the new records.
However, L2 cache doesn't have any direct correlation to how Hibernate determines primary keys in the case of Broadleaf. Broadleaf utilizes a table generator strategy for grabbing a batch of ids in a performant and cluster-safe way. You probably notice a table entitled SEQUENCE_GENERATOR in your schema. This table contains various id ranges that have been acquired for different domain classes. Whenever Hibernate needs to grab a new batch of ids for insertions, it will interact with this table to register a new range of ids to check out. This should guarantee that no node in the cluster will try to insert an entity with a colliding id.
In your case, you need to guarantee that an external process can perform insertions in a non-colliding manner. To do so, I believe you need to create an API for the external process to call that will perform this same "id checkout" operation on behalf of that calling process. Then, your import code (presumably housed elsewhere) will have a range of ids it can safely use. The code backing the API you create should perform the same operation that Hibernate would normally perform to acquire a batch of ids for entity insertions. You can review org.hibernate.id.enhanced.TableGenerator for an example of what this looks like and create something similar for your own purposes.
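The "id checkout" idea can be illustrated with a minimal sketch. It is not Broadleaf's or Hibernate's actual code: SequenceStore, InMemorySequenceStore, IdRange and IdBlockAllocator are hypothetical names, and the in-memory map only stands in for the SEQUENCE_GENERATOR table so the example runs without a database. The point is the read-then-conditional-update loop, which only hands out a range once the stored value has been advanced past it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the SEQUENCE_GENERATOR table: one "next value" per sequence name.
interface SequenceStore {
    long read(String name);

    // Plays the role of "UPDATE ... SET value = next WHERE name = ? AND value = expected";
    // returns true only if the row still held the expected value and was updated.
    boolean compareAndSet(String name, long expected, long next);
}

// In-memory store, used only to make the sketch runnable without a database.
class InMemorySequenceStore implements SequenceStore {
    private final Map<String, Long> rows = new ConcurrentHashMap<>();

    InMemorySequenceStore(String name, long initial) {
        rows.put(name, initial);
    }

    public long read(String name) {
        return rows.get(name);
    }

    public boolean compareAndSet(String name, long expected, long next) {
        return rows.replace(name, expected, next);
    }
}

// An inclusive range of ids reserved for exactly one caller.
final class IdRange {
    final long first;
    final long last;

    IdRange(long first, long last) {
        this.first = first;
        this.last = last;
    }
}

class IdBlockAllocator {
    private final SequenceStore store;

    IdBlockAllocator(SequenceStore store) {
        this.store = store;
    }

    // Reserves blockSize consecutive ids; retries if another caller advanced the value first.
    IdRange checkout(String sequenceName, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        while (true) {
            long current = store.read(sequenceName);
            long next = current + blockSize;
            if (store.compareAndSet(sequenceName, current, next)) {
                return new IdRange(current, next - 1);
            }
        }
    }
}
```

An external importer would call checkout once per batch and use only ids inside the returned range. Against a real database, the store would be backed by the table row and run inside its own transaction.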

Related

SQL equivalent of Javax Cache 'put' (INSERT or UPDATE)

I am using javax cache along with a database. I use the cache's APIs to get/put/delete entities, and the database sits behind this cache. For this, I am using CacheLoader and CacheWriter.
So, following are SQL's construct equivalent to cache API
SELECT -> get
INSERT -> put
DELETE -> delete
If an entry is already present in the cache and I update it, the value still only arrives through the 'write' method. But since the value is already present in the database, I need to use an UPDATE query.
How can I identify which database operation to perform in the cache's 'put' operation?
Note : UPSERT is not good option from performance point of view.
If you put the value in the cache you can first check if the key is already there, in that case you need an UPDATE. If the key was not present, you need an INSERT. It sounds like you could benefit from an ORM with an L2 cache, such as Hibernate, which handles all these scenarios (and many more) for you.
There are several ways I can think of. Basically these are variations of:
Metadata in the database
Within an entity I have typically additional fields which are timestamps for insert and update and a modification counter which are handled by the object to relational mapper (ORM). That is very useful for debugging. The CacheWriter can check whether the insert timestamp is set, if yes, it is an update, if no it is an insert.
It does not matter whether the value gets evicted meanwhile, if your application is reading the latest contents through the cache and writes a modified version of it.
If your application does not read the data before modifying it, or this happens very often, I suggest caching a flag such as insertedAlready. That leads to three-way logic: inserted, not inserted, and not in the cache (don't know yet). In the latter case you need to do a read before the update or insert in the cache writer.
Metadata in the cache only
The cached object stores additional data whether the object was read from the database before. Like:
class CachedDbValue<V> {
boolean insertedAlready;
V databaseContent;
}
The code facing your application needs to wrap the database data into the cached value.
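As a rough sketch of how a writer could use that flag, assuming the wrapper gets a constructor and immutable fields: WriteThroughSketch is a hypothetical stand-in for the database-facing part of a CacheWriter, and it only records which statement it would run instead of executing SQL.

```java
import java.util.ArrayList;
import java.util.List;

// Wrapper from the answer above, with a constructor added for convenience.
class CachedDbValue<V> {
    final boolean insertedAlready;
    final V databaseContent;

    CachedDbValue(boolean insertedAlready, V databaseContent) {
        this.insertedAlready = insertedAlready;
        this.databaseContent = databaseContent;
    }
}

// Hypothetical stand-in for the database-facing part of a CacheWriter.
class WriteThroughSketch<K, V> {
    // Records the statement that would be executed, in order.
    final List<String> executed = new ArrayList<>();

    // Picks INSERT or UPDATE from the flag and returns the value to keep in the cache.
    CachedDbValue<V> write(K key, CachedDbValue<V> value) {
        if (value.insertedAlready) {
            executed.add("UPDATE " + key);
        } else {
            executed.add("INSERT " + key);
        }
        // After a successful write the row exists, so the cached copy is marked as inserted.
        return new CachedDbValue<>(true, value.databaseContent);
    }
}
```

The returned copy, not the original instance, is what should end up in the cache, so the next write for the same key is treated as an update.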
Side note 1: Don't read the object from the cache and modify the instance directly, always make a copy. Modifying the object directly may have different unwanted effects with different JCache implementations. Also check my explanation here: javax.cache store by reference vs. store by value
Side note 2: You are building a caching ORM layer by yourself. Maybe use an existing one.

Add Column to Cassandra db without losing data

I am using Cassandra database integrated into a spring boot application.
My Question is around the schema actions. If I need to make structural changes to the DB, say add a column to a table, the database needs to be recreated, however this means all the existing data gets deleted:
schema-action: CREATE_IF_NOT_EXISTS
The only way I have managed to solve this is by using the RECREATE schema action, but as mentioned earlier, this results in data loss.
What would be the best approach to handle this? How can I make structural changes, such as adding a column, without having to recreate the database and lose all existing data?
Thanks
Cassandra does allow you to modify the schema of an existing table without recreating it from scratch, using the ALTER TABLE statement via cqlsh. However, as explained in that link, there are some important limitations on the kind of changes you can make. You cannot modify the primary key of the table at all; you can add or drop regular columns, but you can't change the type of a column to an incompatible one.
The reason for most of these limitations is how Cassandra needs to deal with the old data that already exists in the table. For example, it doesn't make sense to say that a column A that until now contained strings - will now contain integers - how are we supposed to handle all the old values in column A which weren't integers?
As Aaron rightly said in a comment, it is unlikely you'll want to do these schema changes as part of your application. These are usually rare operations which are done manually, or via some management application - not your usual application.

How to optimize one big insert with hibernate

For my website, I'm creating a book database. I have a catalog with a root node; each node has subnodes, each subnode has documents, each document has versions, and each version is made of several paragraphs.
To create this database as fast as possible, I first build the entire tree model in memory and then call session.save(rootNode).
This single save populates my entire database (at the end, a mysqldump of the database weighs 1 GB).
The save costs a lot (more than an hour), and since the database grows with new books and new versions of existing books, it costs more and more. I would like to optimize this save.
I've tried increasing the batch_size, but it changes nothing since it's a single save. When I mysqldump a script and load it back into MySQL, the operation takes 2 minutes or less.
And when I run htop on the Ubuntu machine, I can see that MySQL is only using 2 or 3% CPU, which suggests that Hibernate is the slow part.
If someone could suggest techniques or leads to try, it would be great. I already know some of the reasons why it takes time. If someone wants to discuss it with me, thanks for the help.
Here are some of my problems (I think): for example, I have self-assigned ids for most of my entities. Because of that, Hibernate checks each time whether the row exists before saving it. I don't need this because the batch is executed only once, when I create the database from scratch. The best would be to tell Hibernate to ignore the primary key rules (like mysqldump does) and re-enable key checking once the database has been created. It's just a one-shot batch to initialize my database.
The second problem is about the foreign keys. Hibernate inserts rows with null values, then issues an update to make the foreign keys work.
About using another technology: I would like to make this batch work with Hibernate because the rest of my website works very well with Hibernate, and if Hibernate creates the database, I'm sure the naming rules and all foreign keys will be created correctly.
Finally, it's a read-only database. (I have a user database using InnoDB, where I do updates and inserts while my website is running, but the document database is read-only and MyISAM.)
Here is an example of what I'm doing:
TreeNode rootNode = new TreeNode();
recursiveLoadSubNodes(rootNode); // This method creates my big tree, in memory only.
hibernateSession.beginTransaction();
hibernateSession.save(rootNode); // takes more than an hour to save 1 GB of data: hundreds of sub treeNodes, thousands of documents, tens of thousands of paragraphs.
hibernateSession.getTransaction().commit();
It's a little hard to guess what could be the problem here but I could think of 3 things:
Increasing batch_size alone might not help because, depending on your model, inserts might be interleaved (i.e. A B A B ...). You can allow Hibernate to reorder inserts and updates so that they can be batched (i.e. A A ... B B ...). Depending on your model this might not work because the inserts might not be batchable. The necessary properties would be hibernate.order_inserts and hibernate.order_updates, and a blog post that describes the situation can be found here: https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/
If the entities don't already exist (which seems to be the case) then the problem might be the first level cache. This cache will cause Hibernate to get slower and slower because each time it wants to flush changes it will check all entries in the cache by iterating over them and calling equals() (or something similar). As you can see, that will take longer with each new entity that's created. To fix that you could either try to disable the first level cache (I'd have to look up whether that's possible for write operations and how this is done - or you do that :) ) or try to keep the cache small, e.g. by inserting the books yourself and evicting each book from the first level cache after the insert (you could also go deeper and do that on the document or paragraph level).
It might not actually be Hibernate (or at least not alone) but your DB as well. Note that restoring dumps often removes/disables constraint checks and indices along with other optimizations so comparing that with Hibernate isn't that useful. What you'd need to do is create a bunch of insert statements and then just execute those - ideally via a JDBC batch - on an empty database but with all constraints and indices enabled. That would provide a more accurate benchmark.
Assuming that comparison shows that the plain SQL insert isn't that much faster then you could decide to either keep what you have so far or refactor your batch insert to temporarily disable (or remove and re-create) constraints and indices.
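For the first suggestion (ordered, batched inserts), the settings could be collected like this. The property keys are Hibernate's own; the BatchSettings class, the helper name and whatever batch size you pass are assumptions for illustration, and how the Properties object reaches your Hibernate configuration depends on your setup.

```java
import java.util.Properties;

class BatchSettings {
    // Builds the settings that enable JDBC batching and statement reordering.
    static Properties batchingProperties(int batchSize) {
        Properties props = new Properties();
        // Number of statements Hibernate groups into one JDBC batch.
        props.setProperty("hibernate.jdbc.batch_size", String.valueOf(batchSize));
        // Let Hibernate group inserts and updates by entity type so they can be batched.
        props.setProperty("hibernate.order_inserts", "true");
        props.setProperty("hibernate.order_updates", "true");
        return props;
    }
}
```

The same three keys can equally be set in hibernate.cfg.xml or a properties file; the Java form is only one way to express them.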
Alternatively you could try not to use Hibernate at all or change your model - if that's possible given your requirements which I don't know. That means you could try to generate and execute the SQL queries yourself, use a NoSQL database or NoSQL storage in a SQL database that supports it - like Postgres.
We're doing something similar, i.e. we have Hibernate entities that contain some complex data which is stored in a JSONB column. Hibernate can read and write that column via a custom usertype but it can't filter (Postgres would support that but we didn't manage to enable the necessary syntax in Hibernate).
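For the second suggestion (keeping the first level cache small), a common pattern is to flush and clear the session every N saved entities. The sketch below assumes a hypothetical PersistenceSession interface that mirrors the save, flush and clear methods of org.hibernate.Session, so it can run without Hibernate on the classpath; ChunkedSaver is likewise an illustrative name.

```java
import java.util.List;

// Hypothetical stand-in for the three org.hibernate.Session methods used here.
interface PersistenceSession {
    void save(Object entity);
    void flush();
    void clear();
}

class ChunkedSaver {
    // Saves entities one by one and empties the session every chunkSize saves.
    // Returns how many flush/clear cycles were performed.
    static int saveInChunks(PersistenceSession session, List<?> entities, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int flushes = 0;
        int pending = 0;
        for (Object entity : entities) {
            session.save(entity);
            pending++;
            if (pending == chunkSize) {
                session.flush(); // push the pending inserts to the database
                session.clear(); // drop the saved entities from the first level cache
                pending = 0;
                flushes++;
            }
        }
        if (pending > 0) {
            // Flush whatever is left over from the last, partial chunk.
            session.flush();
            session.clear();
            flushes++;
        }
        return flushes;
    }
}
```

With the tree model from the question, this would mean saving at a level such as one book or one document per call rather than the whole root node, so that clearing the session does not detach objects that are still needed.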

How to synchronize a code which is deployed on a cluster?

We have a stateless EJB which persists some data in an object-oriented database. Unfortunately, our persistence object (PO) currently has no unique key, for reasons unknown, and altering the PO is not possible right now.
So we decided to synchronize the code: we check whether an object with the same name (which we consider should be unique) is already persisted, and then decide whether to persist or not.
Later we realized that the code is deployed on a cluster which has three JBoss instances.
Can anyone please suggest an approach that prevents objects with the same name from being persisted?
If you have a single database behind the JBoss cluster, you can just apply a unique constraint to the column, for example (I am assuming it's an SQL database):
ALTER TABLE your_table ADD CONSTRAINT unique_name UNIQUE (column_name);
Then in the application code you may want to catch the SQL exception and let the user know they need to try again or whatever.
Update:
If you cannot alter the DB schema, then you can achieve the same result by performing a SELECT query before the insert to check for duplicate entries. If you are worried about 2 inserts happening at the same time, you can look at applying a WRITE_LOCK to the row in question.
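For the unique-constraint route, catching the exception in application code could look roughly like the sketch below. SqlAction and UniqueInsert are hypothetical names, and the insert itself is passed in as a callback so the example does not depend on a particular driver. It treats both SQLIntegrityConstraintViolationException and any SQLState in class 23 (integrity constraint violation) as a duplicate, since not every driver throws the dedicated subclass.

```java
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

// Hypothetical callback wrapping the actual INSERT statement.
@FunctionalInterface
interface SqlAction {
    void run() throws SQLException;
}

class UniqueInsert {
    // Returns true if the insert went through, false if the unique constraint rejected it.
    // Any other SQL error is rethrown unchanged.
    static boolean tryInsert(SqlAction insert) throws SQLException {
        try {
            insert.run();
            return true;
        } catch (SQLException e) {
            if (isConstraintViolation(e)) {
                return false;
            }
            throw e;
        }
    }

    // SQLState class "23" is the standard class for integrity constraint violations.
    static boolean isConstraintViolation(SQLException e) {
        String state = e.getSQLState();
        return e instanceof SQLIntegrityConstraintViolationException
                || (state != null && state.startsWith("23"));
    }
}
```

Because the database enforces the constraint, this works the same no matter which of the three JBoss instances performs the insert; a false return is the point where the user would be asked to try again.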

keeping the history of table in java

I need a sample program in Java for keeping the history of a table when a user inserts, updates, or deletes rows in that table. Can anybody help with this?
Thanks in advance.
If you are working with Hibernate you can use Envers to solve this problem.
You have two options for this:
Let the database handle this automatically using triggers. I don't know what database you're using but all of them support triggers that you can use for this.
Write code in your program that does something similar when inserting, updating and deleting a user.
Personally, I prefer the first option. It probably requires less maintenance. There may be multiple places where you update a user, all those places need the code to update the other table. Besides, in the database you have more options for specifying required values and integrity constraints.
Well, we normally have our own history tables which (mostly) look like the original table. Since most of our tables already have the creation date, modification date and the respective users, all we need to do is copy the dataset from the live table to the history table with a creation date of now().
We're using Hibernate so this could be done in an interceptor, but there may be other options as well, e.g. some database trigger executing a script, etc.
How is this a Java question?
This should be moved in Database section.
You need to create a history table. Then create database triggers on the original table for "create or replace trigger before insert or update or delete on table for each row ...."
I think this can be achieved by creating a trigger in the sql-server.
you can create the TRIGGER as follows:
Syntax:
CREATE TRIGGER trigger_name
{BEFORE | AFTER } {INSERT | UPDATE |
DELETE } ON table_name FOR EACH ROW
triggered_statement
You'll have to create 2 triggers: one for before the operation is performed and another for after the operation is performed.
Otherwise it can also be achieved through code, but handling it in code would be a bit tedious for batch processes.
You should try using triggers. You can have a separate table (exact replica of your table of which you need to maintain history) .
This table will then be updated by trigger after every insert/update/delete on your main table.
Then you can write your java code to get these changes from the second history table.
I think you can use the redo log of your underlying database to keep track of the operation performed. Is there any particular reason to go for the program?
You could try creating say a List of the objects from the table (Assuming you have objects for the data). Which will allow you to loop through the list and compare to the current data in the table? You will then be able to see if any changes occurred.
You can even create another list with a object that contains an enumerator that gives you the action (DELETE, UPDATE, CREATE) along with the new data.
Haven't done this before, just a idea.
Like @Ashish mentioned, triggers can be used to insert into a separate table; this is commonly referred to as an audit trail table or audit log table.
Below are the columns generally defined in such an audit trail table: 'Action' (insert, update, delete), tablename (the table that was inserted into/deleted from/updated), key (the primary key of that table, as needed), timestamp (the time at which this action was done)
It is better to write the audit log after the entire transaction is through. If not, when an exception is passed back to the code side, a separate call to update the audit tables will be needed. Hope this helps.
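If the audit trail is written from Java rather than by a trigger, the columns listed above map onto a small record type. This is only a sketch: AuditAction, AuditEntry and AuditTrail are hypothetical names, and the in-memory list stands in for the audit table so the example runs on its own.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// The three actions recorded in the 'Action' column.
enum AuditAction { INSERT, UPDATE, DELETE }

// One row of the audit trail: action, table name, primary key and timestamp.
final class AuditEntry {
    final AuditAction action;
    final String tableName;
    final String key;
    final Instant timestamp;

    AuditEntry(AuditAction action, String tableName, String key, Instant timestamp) {
        this.action = action;
        this.tableName = tableName;
        this.key = key;
        this.timestamp = timestamp;
    }
}

// In-memory stand-in for the audit table; a real version would insert into the database.
class AuditTrail {
    private final List<AuditEntry> entries = new ArrayList<>();

    // Call this once the business transaction has completed successfully.
    void record(AuditAction action, String tableName, String key) {
        entries.add(new AuditEntry(action, tableName, key, Instant.now()));
    }

    // Returns a copy so callers cannot modify the trail.
    List<AuditEntry> entries() {
        return new ArrayList<>(entries);
    }
}
```

Calling record from one central place, such as a DAO base class or an interceptor, avoids scattering audit calls across every code path that touches the table.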
If you are talking about db tables you may use either triggers in the db or add some extra code within your application, probably using aspects. If you are using JPA you may use entity listeners, or perform some extra logic by adding an aspect to your DAO object and applying that aspect to all DAOs which perform CRUD on entities that need to keep historical data. If your DAO object is a stateless bean you may use an Interceptor to achieve that; otherwise use Java proxy functionality, cglib, or another library that provides aspect functionality. If you are using Spring instead of EJB you may advise your DAOs within the application context config file.
I wouldn't suggest triggers if the audit data is stored in a file rather than in the database. My suggestion is to create an "AUDIT" table and write Java code (for example, in servlets) that stores the audit data in a file, in the same database, or in another database.
