We are building a web application, with multiple dashboards and reports, that uses multiple datasources to read its data. We are planning to use a Redis cache to improve the application's performance. Which of the following would be the better solution?
1) Load data periodically into the Redis cache and serve it from the cache.
This seems very good at first, but it has a problem: for dashboards and reports that are not frequently used, we would still be re-pulling the data after a certain time, creating unnecessary load on the application and the cache.
2) Don't preload the data into the cache; then the first load of a dashboard or report will take minutes.
What would be the ideal approach in a system where the number of users is very likely to grow to tens of thousands within a year of release?
We are using Java, Spring, Hibernate, and the Redis cache on the server side and Angular 2 on the client side.
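For reference, a minimal sketch of a cache-aside variant with Spring's caching abstraction over Redis, where a dashboard is computed only when first requested and then expires after a TTL, so rarely used dashboards are never refreshed on a schedule. The names (`DashboardService`, `loadDashboard`) and the 15-minute TTL are illustrative assumptions, not from the question:

```java
import java.time.Duration;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.stereotype.Service;

@Configuration
@EnableCaching
class CacheConfig {

    // Entries expire on their own, so rarely used dashboards simply
    // fall out of Redis instead of being reloaded periodically.
    @Bean
    RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        RedisCacheConfiguration config = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(15)); // TTL is an illustrative choice
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(config)
                .build();
    }
}

@Service
class DashboardService { // hypothetical service name

    // First call per dashboardId runs the expensive queries and stores the
    // result in Redis; calls within the TTL are served from the cache.
    @Cacheable(cacheNames = "dashboards", key = "#dashboardId")
    public DashboardData loadDashboard(String dashboardId) {
        return runExpensiveQueries(dashboardId);
    }

    private DashboardData runExpensiveQueries(String dashboardId) {
        // ... aggregate data from the underlying datasources ...
        return new DashboardData();
    }
}

class DashboardData { } // placeholder payload type
```

The trade-off is that the first visitor after an expiry still pays the full load time; if that is unacceptable for a handful of hot dashboards, those few could additionally be refreshed proactively.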
I am developing Java-based jobs containing some business logic. Each job would run in its own JVM, and I would like to have a separate cache of frequently accessed database data, also running in its own JVM. The jobs need to access this cache instead of hitting the database.
Which cache can I use? Ehcache, Hazelcast or Coherence?
How will the jobs access this cache? Basically, how will I expose cache operations (mostly fetch operations) to the jobs?
I only have some experience with EhCache, which served as a cache layer for Hibernate (or any other ORM), and it is transparent to fetch operations, meaning that you don't have to explicitly activate it every time you run a DB query. EhCache inspects each query you run, and if it sees that the same query with the same parameters was run previously, it will hit the cache, unless the entry has been invalidated. However, with the default configuration EhCache runs on the same JVM as your Java application.
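To illustrate how transparent this is, here is a minimal sketch of enabling the Hibernate second-level cache for one entity; the `Product` entity and the region name are made-up examples:

```java
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Once the second-level cache is enabled in the Hibernate configuration,
// marking the entity is all that is needed; lookups by id are then cached
// transparently, with no changes to the query code.
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE, region = "products") // example region
public class Product {

    @Id
    private Long id;

    private String name;

    // getters and setters omitted for brevity
}
```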
There is another solution, since you wish to run the cache on a separate machine. Redis (https://redis.io/) is a great tool for building fast caches. It is an in-memory key-value NoSQL store that runs as a standalone server, which your cache JVM can talk to over the network. I highly recommend you try it out (I am not affiliated with Redis in any way :) )
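As one possible shape for exposing fetch operations to the jobs, a minimal sketch using the Jedis client against a standalone Redis server; the host, the key scheme and the `CustomerCache` wrapper are assumptions for illustration:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

// Thin read-mostly wrapper the jobs could call instead of hitting the database.
public class CustomerCache { // hypothetical wrapper class

    private final JedisPool pool;

    public CustomerCache(String redisHost) {
        this.pool = new JedisPool(redisHost, 6379);
    }

    // Fetch a cached value; returns null on a cache miss so the caller
    // can decide whether to fall back to the database.
    public String getCustomerJson(long customerId) {
        try (Jedis jedis = pool.getResource()) {
            return jedis.get("customer:" + customerId); // example key scheme
        }
    }

    // A loader process (or the job itself on a miss) populates entries
    // with a TTL so stale data eventually expires.
    public void putCustomerJson(long customerId, String json) {
        try (Jedis jedis = pool.getResource()) {
            jedis.setex("customer:" + customerId, 3600, json); // 1h TTL, illustrative
        }
    }
}
```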
I have a webapp and several batch jobs editing the same database.
The webapp is deployed in Tomcat using Tomcat's built-in datasource. This datasource uses the Hibernate second-level cache, configured with its own ehcache.xml file.
The batch jobs update the same database and use their own Ehcache configuration (their own ehcache.xml).
So the webapp and the batch jobs do not share the same cache region.
My problem is that when a batch job updates the database, my webapp view is not updated. This behaviour is normal, because the cached entity is never expired on the webapp side. The view is updated after a refresh.
My Question:
What is the best practice for this concurrency situation?
Thx
My web application is deployed in a Tomcat container, and it has its own ehcache.xml to configure its cache regions.
For my batch, there is a crontab-based trigger that runs the batch every night, but it runs outside Tomcat. It also has its own ehcache.xml configuring its own cache regions.
There is no single best-practice solution. You have a couple of variants, depending on how long you can tolerate the cache being cold and how resilient you can be to invalid cache entries.
Another criterion is what kind of data you are updating. Is it transactional data (money?), or is it something you update once a day where correctness is not critical? I presume your batch application is deployed together with your web application. This presumption is important, because you need to be able to connect from the batch to your Hibernate second-level cache in order to send invalidations.
1) Perform the batch and send asynchronous invalidation messages for the updated entries. There will be a very short window in which your cache contains stale entries, but it should be fine as long as you don't have extremely high consistency requirements such as money handling. The overall performance should be fine.
2) Synchronous invalidation of the affected cache entries, with a transaction spanning both the cache invalidation and the database update. Performance-wise this is suicide :) The transaction is important here: if persisting the data to the DB fails, you may otherwise end up with wrong data in your cache.
3) Shut the caching off for all transactional data that may be affected. The performance of your online part will suffer.
3A) You can add some warm-up logic for the cache once the batch is complete.
3B) Run the batch in a maintenance window. Your online application will go down for that period, but you will probably save development time :)
4) Use your cache as the primary data storage and have a background process synchronize it with the DB. You may need some amount of replication to ensure no data loss. Some caches provide disk persistence as well.
What happens if your batch and your web application run separately? That is a problem that can be solved in two ways:
You use a cache server like Hazelcast or Infinispan. Ehcache alone is not a server; you need Terracotta on top to make it one.
Alternatively, you keep Ehcache, but then you need to build an interface in your web app for cache invalidation. For example, you can use a queue to send the invalidation messages and then consume and process them, as in the sketch below.
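A minimal sketch of that second variant, using Redis pub/sub as the queue and the standard JPA cache API for eviction; the channel name, message format and `Product` entity are assumptions for illustration:

```java
import javax.persistence.EntityManagerFactory;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class InvalidationSketch {

    // Batch side: after committing an update, publish the id of the
    // changed entity on an invalidation channel.
    public static void publishInvalidation(long productId) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.publish("cache-invalidation", "Product:" + productId); // example message format
        }
    }

    // Webapp side: a background thread subscribes to the channel and
    // evicts the entity from the Hibernate second-level cache through
    // the standard JPA Cache API.
    public static void listenForInvalidations(EntityManagerFactory emf) {
        new Thread(() -> {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        String[] parts = message.split(":");
                        if ("Product".equals(parts[0])) { // hypothetical entity name
                            emf.getCache().evict(Product.class, Long.parseLong(parts[1]));
                        }
                    }
                }, "cache-invalidation"); // blocks, hence the dedicated thread
            }
        }, "cache-invalidation-listener").start();
    }
}

class Product { } // placeholder for the real mapped entity
```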
I have a medium-sized, medium-traffic ecommerce website, with around 200-300 visitors at a time. The webapp's setup is:
Built in Java, Spring used for MVC
Using Ehcache to cache several kinds of data requested from the database
Pure JDBC used for connecting to the database (using Tomcat's connection pool)
Deployed on Tomcat on an AWS EC2 instance
Using RDS as the database server
Around 100 database connections assigned to the webapp
I am using Ehcache extensively to cache most of the catalog data, as it is requested by all traffic coming to the website. But when I deploy a new version on Tomcat, the database server almost always gets stalled by the excessive queries fired. Ehcache is not able to help here, because at that point nothing is cached yet. In the best case the website remains extremely slow for around 45 minutes, until Ehcache manages to cache the important data. In the worst case the website crashes and the application stops running.
On the development environment it works very smoothly, as there is no traffic. To quickly find a way around this problem, we did a quick fix.
The fix was: in a ServletContextListener we made dummy hits to the most crucial catalog-related services, the ones that were eating up the database server with excessive queries. Due to this change, as soon as the application is deployed we fetch all catalog-related data into memory and Ehcache caches it all. Only then does the application become usable to the public. This change adds a lag of around 30 seconds at startup when we deploy the app, but it got us away from 45 minutes of a slow website.
This fix indeed solved our problem, but it doesn't feel like a good solution, because everything related to the catalog and other crucial data is in memory whether it is going to be used or not. It is around 3.5 GB of data. Moreover, it has made working in the development environment a nightmare, because of the low memory of the development machines.
Please suggest a good way to handle this problem.
Filling the cache at startup feels like a good idea. That's what I would do. If it fits in memory, I wouldn't mind loading too much stuff.
The alternative would be to have an expiry policy and to periodically ping the cache to remove expired entries. But it sounds more like a waste of time.
Distributed caching could also solve the problem but it means adding a layer of complexity to your architecture. I would do that only if necessary. And I don't think it is.
Then, to prevent loading in dev, just use a Spring profile that causes the loading to be active only in production (and staging ideally).
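For example, a minimal sketch of a profile-gated warm-up, assuming Spring Boot's CommandLineRunner (with plain Spring, an ApplicationListener<ContextRefreshedEvent> works the same way); the `CatalogService` name and its warm-up method are illustrative assumptions:

```java
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;

// Runs once at startup, but only when the "production" or "staging"
// profile is active, so development machines skip the 3.5 GB warm-up.
@Component
@Profile({"production", "staging"})
class CacheWarmer implements CommandLineRunner {

    private final CatalogService catalogService; // hypothetical service

    CacheWarmer(CatalogService catalogService) {
        this.catalogService = catalogService;
    }

    @Override
    public void run(String... args) {
        // Same effect as the ServletContextListener dummy hits: touch the
        // crucial catalog queries so Ehcache is warm before traffic arrives.
        catalogService.warmUpCatalogCaches();
    }
}

interface CatalogService { // placeholder for the real service
    void warmUpCatalogCaches();
}
```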
I want to optimize my Minecraft game servers. I have 150k users in the database, and around 15k of them join my servers daily.
I have read about Redis, and I also read that Redis is faster than MySQL. I know that I can't give up MySQL, because my websites use the same database.
But what if I load all the MySQL data into Redis every 15 minutes, let all my server plugins work on that data, and then have Redis export the data back to MySQL after the next 15 minutes? I load the same data onto 4 servers, with 3 plugins on every server, so maybe loading it all into one Redis server would be faster than sending requests to MySQL from 4 servers * 3 plugins?
Thanks for the help.
Redis is an effective way to cache data from a MySQL database. Even though Redis has persistence options, many will still favor a MySQL database for durable storage. As Redis operates in memory, it will be much faster than a MySQL database, which (for the most part) does not operate in memory. Often, people will store cache data in in-process HashMaps, but since you have multiple servers, Redis is a much better option: this way, you don't have to maintain several near-identical caches, one per server.
Hi, as far as I understand, you have 4 game servers with 3 plugins on each.
Redis is extremely fast, no doubt, but its use case is different from MySQL's. My advice is to load into Redis only the data you use very frequently; that will be much faster than MySQL. But to make it fast you have to design your keys intelligently, so that Redis can look them up quickly. You can refresh the keys and values after a certain interval, and your system's performance will definitely improve.
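As an illustration of that key design, a minimal sketch using one Redis hash per player via Jedis; the `player:{uuid}` key scheme and the wrapper class are made-up examples:

```java
import java.util.Map;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class PlayerStatsCache { // hypothetical cache wrapper

    private final JedisPool pool = new JedisPool("localhost", 6379);

    // One hash per player keeps all frequently read stats under a single,
    // directly addressable key, so no scanning is ever needed.
    public void saveStats(String playerUuid, Map<String, String> stats) {
        try (Jedis jedis = pool.getResource()) {
            jedis.hmset("player:" + playerUuid, stats); // example key scheme
        }
    }

    // All 4 servers and their plugins read the same shared entry instead
    // of each querying MySQL separately.
    public Map<String, String> loadStats(String playerUuid) {
        try (Jedis jedis = pool.getResource()) {
            return jedis.hgetAll("player:" + playerUuid);
        }
    }
}
```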
I have seen applications use a clustered web tier (like 10 to 20 servers) for scalability, where they can distribute the load among the web servers. But I have always seen all the web servers using a single DB.
Now consider any ecommerce or railway web application where millions of users are hitting the application at any point in time.
To scale on the web server side we can use server clustering, but how do we scale the DB? We can't simply have multiple DBs the way we have multiple web servers, as one DB would end up with a different state than the others :)
UPDATE:
Is scaling the DB not possible in a relational DBMS, but only in a NoSQL DB like MongoDB etc.?
There are two different kinds of scalability on the database side. One is read-scalability and the other is write-scalability. You can achieve both, up to a point, by scaling vertically, meaning adding more CPU and RAM. But if you need to scale to very large data, beyond the limits of a single machine, you should use read replicas for read-scalability and sharding for write-scalability.
Sharding does not work by putting some entities (shoes) on one server and the others (t-shirts) on other servers. It works by putting some of the shoes and some of the t-shirts on one machine, and doing the same for the rest of the entities (see the sketch below).
Another solution for high-volume data management is using microservices, which is closer to your example: having one service for shoes and another service for t-shirts. With microservices you divide your code and data into different projects and different application and database servers, so you can deal with the scalability of different parts of your data differently.
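To make the sharding idea concrete, a minimal sketch of hash-based shard routing, where the database for a row is picked by hashing its key; the two JDBC URLs and the user-id keying are assumptions for illustration:

```java
// Minimal hash-based shard router: every user id deterministically maps
// to exactly one shard, so one user's shoes and t-shirts live together.
public class ShardRouter {

    // Example shard connection strings; a real setup would use pooled DataSources.
    private final String[] shardJdbcUrls = {
            "jdbc:mysql://db-shard-0.example.com/app",
            "jdbc:mysql://db-shard-1.example.com/app"
    };

    // floorMod keeps the index non-negative even for negative hash codes.
    public String shardFor(long userId) {
        int index = Math.floorMod(Long.hashCode(userId), shardJdbcUrls.length);
        return shardJdbcUrls[index];
    }
}
```

Writes for different users then land on different machines, which is what provides the write-scalability; the trade-off is that cross-shard queries and rebalancing when adding shards become harder.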