Is it reliable to use Ehcache as a datasource instead of a database?
My business functionality will be to periodically collect information from a running application, store it in an Ehcache cache, and then retrieve and display statistics about the collected information by querying the cache with the Ehcache Search API. The cache will only need to keep the last 30-45 days of data.
What do you think about this approach?
Ehcache could be an acceptable solution, assuming TTI, TTL and other parameters are set according to your business needs. There shouldn't be any reliability issue per se. A SQL database affords options for transactional commits, complex queries and relational support, which of course aren't provided by Ehcache by itself.
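For illustration, here is a minimal sketch of that setup with the Ehcache 2.x programmatic API: a cache with a 45-day TTL and one searchable attribute queried through the Search API. The cache name, the size limit and the `Sample` value class are made up for the example.

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.SearchAttribute;
import net.sf.ehcache.config.Searchable;
import net.sf.ehcache.search.Attribute;
import net.sf.ehcache.search.Results;

public class StatsCacheDemo {

    // Hypothetical value class holding one collected measurement.
    public static class Sample implements java.io.Serializable {
        private final long timestamp;
        public Sample(long timestamp) { this.timestamp = timestamp; }
        public long getTimestamp() { return timestamp; }
    }

    public static void main(String[] args) {
        // TTL of 45 days so old entries age out on their own.
        CacheConfiguration config = new CacheConfiguration("stats", 100000)
                .timeToLiveSeconds(45L * 24 * 60 * 60)
                .searchable(new Searchable().searchAttribute(
                        new SearchAttribute().name("timestamp")
                                             .expression("value.getTimestamp()")));

        CacheManager manager = CacheManager.create();
        Cache cache = new Cache(config);
        manager.addCache(cache);

        cache.put(new Element("sample-1", new Sample(System.currentTimeMillis())));

        // Search API: count the samples collected in the last 30 days.
        Attribute<Long> timestamp = cache.getSearchAttribute("timestamp");
        long cutoff = System.currentTimeMillis() - 30L * 24 * 60 * 60 * 1000;
        Results results = cache.createQuery()
                               .addCriteria(timestamp.ge(cutoff))
                               .includeValues()
                               .execute();
        System.out.println("Samples in the last 30 days: " + results.size());

        manager.shutdown();
    }
}
```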
Related
I am developing Java-based jobs with some business logic; each would run in its own JVM. I would also like a separate cache containing frequently accessed data from the database, running in its own JVM. The jobs need to access this cache instead of hitting the database.
Which cache can I use? Ehcache, Hazelcast or Coherence?
How will the jobs access this cache? Basically, how will I expose cache operations (mostly fetch operations) to the jobs?
I only have some experience with Ehcache, which served as a cache layer for Hibernate (or any other ORM) and is transparent to fetch operations, meaning that you don't have to explicitly activate it every time you run a DB query. Ehcache inspects each query you run, and if it sees that the same query with the same parameters was run previously, it will hit the cache, unless invalidated. However, in the default configuration Ehcache runs on the same JVM as your Java application.
There is another solution, since you wish to run the cache on a separate machine. Redis (https://redis.io/) is a great tool for building fast caches. It is an in-memory NoSQL key-value store that runs as a standalone server your job JVMs can connect to. I highly recommend you try it out (I am not affiliated with Redis in any way :) )
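To give an idea of how the jobs could talk to such a remote cache, here is a minimal read-through sketch using the Jedis client; the host name, key scheme and `loadFromDatabase` helper are assumptions for the example.

```java
import redis.clients.jedis.Jedis;

public class JobCacheLookup {

    public static void main(String[] args) {
        // Connect to the standalone Redis server (host/port are placeholders).
        try (Jedis jedis = new Jedis("cache-host", 6379)) {
            System.out.println(fetch(jedis, "customer:42"));
        }
    }

    // Read-through: try the cache first, fall back to the database and cache the result.
    static String fetch(Jedis jedis, String key) {
        String cached = jedis.get(key);
        if (cached == null) {
            cached = loadFromDatabase(key);
            jedis.setex(key, 300, cached); // expire after 5 minutes
        }
        return cached;
    }

    // Stand-in for the real database query.
    static String loadFromDatabase(String key) {
        return "value-for-" + key;
    }
}
```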
I have a webapp and several batch jobs editing the same database.
The webapp is deployed in Tomcat using the built-in Tomcat DataSource. It uses the Hibernate second-level cache, configured with its own ehcache.xml file.
The batch jobs run to update the same database and use their own ehcache.xml configuration.
So the webapp and the batch jobs don't share the same cache regions.
My problem is that when a batch updates the database, my webapp view is not updated. This behaviour is expected, because the cached entities are never evicted on the webapp side. The view updates after a refresh.
My Question:
What is the best practice for this concurrency situation?
Thx
My web application is deployed in a Tomcat container, and it has its own ehcache.xml to configure its cache regions.
For my batch, there is a crontab-based trigger that runs it every night, outside Tomcat. It also has its own ehcache.xml to configure its own cache regions.
There is no best-practice solution. You have a couple of options, depending on how long you can tolerate a cold cache and how resilient you can be to stale cache entries.
Another criterion is what kind of data you are updating. Is it transactional data (money?), or is it something you update once a day where correctness is not critical? I presume your batch application is deployed together with your web application. This presumption is important because you need to be able to connect from the batch to your Hibernate second-level cache in order to send invalidations.
1. Perform the batch and send asynchronous invalidation messages for the updated entries. There will be a very short window in which your cache will contain stale entries, but it should be fine as long as you don't have extremely high consistency requirements such as money handling. The overall performance should be fine.
2. Synchronous invalidation of the affected cache entries, with a transaction spanning both the cache invalidation and the database update. Performance-wise this is suicide :) The transaction is important here: if persisting the data to the DB fails, you could otherwise end up with wrong data in your cache.
3. Shut the caching off for all transactional data that may be affected. The performance of your online part will suffer.
3A) You can add some warm-up logic for the cache once the batch is complete.
3B) Run the batch in a maintenance window. Your online application will go down for that period. You will probably spare development time :)
4. Use your cache as the primary data storage and then place a background process synchronizing it with the DB. You may need some amount of replication to ensure no data loss. Some caches provide disk persistence as well.
What happens if your batch and your web application are running separately? That problem can be solved in two ways:
You use a cache server like Hazelcast or Infinispan. Ehcache alone is not a server; you need Terracotta on top to make it one.
Alternatively, you keep Ehcache; then you need to build an interface in your web app for cache invalidation. For example, you can use a queue to send the invalidation messages and then consume and process them, as in the sketch below.
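A rough sketch of that second option, under a few assumptions: a JMS 2.0 broker, messages of the hypothetical form `entityClass:id`, and eviction through Hibernate's `SessionFactory.getCache()`.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.hibernate.SessionFactory;

// Batch side: publish the id of each entity updated by the batch.
class InvalidationSender {
    private final ConnectionFactory factory; // your broker's factory, e.g. ActiveMQ
    private final Queue queue;               // e.g. a "cache.invalidation" queue

    InvalidationSender(ConnectionFactory factory, Queue queue) {
        this.factory = factory;
        this.queue = queue;
    }

    void send(Class<?> entityClass, long id) throws Exception {
        // JMS 2.0: Connection is AutoCloseable.
        try (Connection connection = factory.createConnection()) {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            session.createProducer(queue)
                   .send(session.createTextMessage(entityClass.getName() + ":" + id));
        }
    }
}

// Webapp side: consume the messages and evict stale entries from the 2nd-level cache.
class InvalidationListener implements MessageListener {
    private final SessionFactory sessionFactory;

    InvalidationListener(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    @Override
    public void onMessage(Message message) {
        try {
            String[] parts = ((TextMessage) message).getText().split(":");
            // The next read misses the cache and reloads the entity from the database.
            sessionFactory.getCache().evictEntity(Class.forName(parts[0]), Long.valueOf(parts[1]));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```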
My website serves live info to users. This information can change dynamically (think of stock prices). Each query to get this information from the DB takes about 3-5 seconds, and fetching everything takes about 3 minutes in total. I serve this information to 6000 users. I am using a HashMap to store the information and serve it to users: I fetch everything from the DB every 5 minutes and store it in the HashMap. Everything is OK, but I want to use a more advanced cache system. What do you suggest? Can I use HSQLDB for that? INFO: I am using Spring MVC + Hibernate, so I don't want to use non-Java solutions such as Redis.
You may use Ehcache as a second-level cache for Hibernate or as a "self-managed" cache. The Guava library also offers efficient cache capabilities.
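As an illustration of the "self-managed" route, here is a minimal Guava LoadingCache sketch mirroring the 5-minute refresh cycle from the question; the key/value types and the DB query method are placeholders.

```java
import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class PriceCache {

    // Entries become eligible for refresh 5 minutes after being written; the
    // reload happens on the first access after that, and readers keep getting
    // the old value until the reload completes.
    private final LoadingCache<String, Double> prices = CacheBuilder.newBuilder()
            .refreshAfterWrite(5, TimeUnit.MINUTES)
            .build(new CacheLoader<String, Double>() {
                @Override
                public Double load(String symbol) {
                    return queryPriceFromDb(symbol);
                }
            });

    public double getPrice(String symbol) {
        return prices.getUnchecked(symbol);
    }

    // Stand-in for the existing 3-5 second Hibernate query.
    private double queryPriceFromDb(String symbol) {
        return 42.0;
    }
}
```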
I need to develop some services and expose an API to some third parties.
In those services I may need to fetch/insert/update/delete data with some complex calculations involved (not just simple CRUD). I am planning to use Spring and MyBatis.
But the real challenge is that there will be multiple DB nodes with the same data (some external setup will take care of keeping them in sync). When I get a request for some data, I need to pick one DB node at random, query it and return the results. If the selected DB is unreachable, has network issues or some other unknown problem, I need to try to connect to another DB node.
I am aware of Spring's AbstractRoutingDataSource. But where do I inject the DB connection retry logic? Will Spring handle transactions properly if I switch the DataSource dynamically?
Or should I avoid the out-of-the-box Spring & MyBatis integration and do transaction management myself using MyBatis?
What do you guys suggest?
I propose using a NoSQL database like MongoDB. It is easy to cluster. You can, for example, configure 10 servers and replicate the data 3 times.
That means that if 2 of your 10 servers fail, your data is still safe.
NoSQL databases are different from RDBMSs, but they can give high performance in a cluster.
Also, there is no transaction support in NoSQL; you have to handle it manually in the case of financial operations.
You actually have to think differently when developing with NoSQL.
Yes, it will work. Take AbstractRoutingDataSource and code your own subclass. The only thing you cannot do is change the target database while a transaction is running.
So what you have to do is put the DB retry code in getConnection(). If the connection becomes invalid during a transaction, you should let it fail.
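A minimal sketch of what that could look like, assuming the target DataSources have been registered via setTargetDataSources() under keys like "node1"/"node2" (the key names and the random pick are illustrative):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class FailoverRoutingDataSource extends AbstractRoutingDataSource {

    // Lookup keys of the targets registered via setTargetDataSources(),
    // e.g. "node1", "node2", "node3".
    private final List<String> nodeKeys;

    // Node chosen for the current thread; Spring then binds the connection to
    // the transaction, so the node stays fixed until the transaction ends.
    private static final ThreadLocal<String> currentNode = new ThreadLocal<>();

    public FailoverRoutingDataSource(List<String> nodeKeys) {
        this.nodeKeys = nodeKeys;
    }

    @Override
    protected Object determineCurrentLookupKey() {
        return currentNode.get();
    }

    @Override
    public Connection getConnection() throws SQLException {
        SQLException last = null;
        // Start from a random node, then fail over to the remaining ones.
        int start = ThreadLocalRandom.current().nextInt(nodeKeys.size());
        for (int i = 0; i < nodeKeys.size(); i++) {
            currentNode.set(nodeKeys.get((start + i) % nodeKeys.size()));
            try {
                return super.getConnection(); // routes via determineCurrentLookupKey()
            } catch (SQLException e) {
                last = e; // node unreachable; try the next one
            }
        }
        throw last;
    }
}
```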
I'm trying to figure out which cache concurrency strategy I should use for my application (for entity updates, in particular). The application is a web service developed using Hibernate, is deployed on an Amazon EC2 cluster and runs on Tomcat, so there is no application server.
I know that there are nonstrict-read-write / read-write and transactional cache concurrency strategies for data that can be updated, and there are mature, popular, production-ready 2L cache providers for Hibernate: Infinispan, Ehcache and Hazelcast.
But I don't completely understand the difference between the transactional and read-write caches from the Hibernate documentation. I thought that the transactional cache is the only choice for a cluster application, but now (after reading some topics), I'm not so sure about that.
So my question is about the read-write cache. Is it cluster-safe? Does it guarantee synchronization between the database and the cache, as well as synchronization between all the connected servers? Or is it only suitable for single-server applications, meaning I should always prefer the transactional cache?
For example, if a database transaction that is updating an entity field (first name, etc.) fails and is rolled back, will the read-write cache discard the changes, or will it propagate the bad data (the updated first name) to all the other nodes?
Does it require a JTA transaction for this?
The Concurrency strategy configuration for JBoss TreeCache as 2nd level Hibernate cache topic says:
`READ_WRITE` is an interesting combination. In this mode Hibernate itself works as a lightweight XA-coordinator, so it doesn't require a full-blown external XA. Short description of how it works:

In this mode Hibernate manages the transactions itself. All DB actions must be inside a transaction; autocommit mode won't work.

During the flush() (which might happen multiple times during the transaction's lifetime, but usually happens just before the commit) Hibernate goes through the session and searches for updated/inserted/deleted objects. These objects are then first saved to the database, and then locked and updated in the cache, so concurrent transactions can neither update nor read them.

If the transaction is then rolled back (explicitly or because of some error), the locked objects are simply released and evicted from the cache, so other transactions can read/update them.

If the transaction is committed successfully, then the locked objects are simply released and other threads can read/write them.
Is there any documentation on how this works in a cluster environment?
It seems that the transactional cache works correctly for this, but it requires a JTA environment with a standalone transaction manager (such as JBossTM, Atomikos or Bitronix), an XA datasource, and a lot of configuration changes and testing. I managed to deploy this, but I still have some issues with my frameworks. For instance, Google Guice IoC does not support JTA transactions, so I would have to replace it with Spring or move the service to an application server and use EJB.
So which way is better?
Thanks in advance!
Summary of differences
NonStrict R/w and R/w are both asynchronous strategies, meaning they are updated after the transaction is completed. Transactional is obviously synchronous and is updated within the transaction.

NonStrict R/w never locks an entity, so there's always the chance of a dirty read. Read-write always soft-locks an entity, so any simultaneous access is sent to the database. However, there is a remote chance that R/w might not produce repeatable-read isolation.

The best way to understand the differences between these strategies is to see how they behave during the course of insert, update or delete operations. You can check out my post here, which describes the differences in further detail.
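For reference, the strategy is picked per entity (or collection) in the mapping; a minimal Hibernate annotation example:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // or NONSTRICT_READ_WRITE / TRANSACTIONAL
public class Person {
    @Id
    private Long id;
    private String firstName;
    // getters and setters omitted
}
```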
Feel free to comment.
So far I've only seen clustered 2LC working with transactional cache modes. That's precisely what Infinispan does, and in fact Infinispan has so far stayed away from implementing the other cache concurrency modes. To lighten the transactional burden, Infinispan integrates with Hibernate via transaction synchronizations as opposed to XA.