How to reboot application without losing the TreeMap kept in memory? - java

In a Spring Boot application, I keep a TreeMap in memory. I'm doing around 10,000 operations per second, and that may increase. To improve performance I keep the data in memory, but I want the app to start from the same state after a restart.
These are the approaches I've found so far.
Keeping the data in Hazelcast.
In this case I don't risk losing data unless Hazelcast itself dies, but if it does, I can't restore the data. I also don't think it makes sense to sync that volume of operations to Hazelcast.
Synchronizing events to a database.
Here the risk of data loss is very low, but I'd need to execute a query after each operation, which may hurt performance, and I'd have to handle exceptions on every database update.
Synchronizing data in batches
The only ready-made solution I could find here is MapDB. I'm planning to try it but haven't yet. If there is a more reliable, better-optimized sink that uses a database instead of a file, I'd prefer that.
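For reference, the batch-sync idea can also be hand-rolled with just the JDK. This is only a sketch under stated assumptions (all names are illustrative, and the map here holds String/Long entries purely as an example): a periodic task serializes the map to a temporary file and atomically renames it, so a crash mid-write never corrupts the last good snapshot.

```java
import java.io.*;
import java.nio.file.*;
import java.util.TreeMap;

public class SnapshotStore {
    private final Path file;

    public SnapshotStore(Path file) { this.file = file; }

    // Write the map to a temp file, then atomically move it into place,
    // so a reader (or a restart) never observes a half-written snapshot.
    public synchronized void save(TreeMap<String, Long> map) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            out.writeObject(map);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }

    @SuppressWarnings("unchecked")
    public synchronized TreeMap<String, Long> load()
            throws IOException, ClassNotFoundException {
        if (!Files.exists(file)) return new TreeMap<>();
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(in(file))))) {
            return (TreeMap<String, Long>) in.readObject();
        }
    }

    private static Path in(Path p) { return p; } // no-op, keeps the stream chain readable
}
```

A ScheduledExecutorService calling save() every few seconds, plus load() at startup, bounds the loss window to one snapshot interval; whether that is acceptable depends on the durability you need.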
Any recommendation to solve this question?

Do you need a Map or a TreeMap?
Is the collating sequence relevant for storage, for access, or neither?
For Hazelcast, the chance of data loss is configurable. You set up a cluster with the level of resilience you want. It's the same as with disks: if you have one disk and it fails, you lose data; if you have two and one goes offline, you don't. You allocate hardware for the level of resilience you need. Three nodes is the recommended minimum.
(10,000 operations per second isn't worrying either; 1,000,000,000 per second has been done. Sync to an external store can be immediate or in batches.)
Disclaimer: I work for Hazelcast, but I think your question is more fundamental: how do you keep your store available?
Simply, don't restart.
Clustered solutions are the answer here. If you have multiple nodes, the service as a whole stays running even if a few nodes go offline.
Do rolling bounces.
If you must restart everything at once, what matters is how quickly your service can bring all the data back, and what it does while the restore is only 50% complete (is 50% of the data visible?). Immediate replication elsewhere is only really necessary if you have a clustered solution that hasn't been configured for resilience. Saving intermittently is fine once you have solved resilience.
So configure your storage so that it doesn't go offline; that makes all the backup/restore options easier.

Related

How to tweak cache intensive app in java?

Does anyone know what the proper configuration/development approach is when writing an application that only uses a cache as its store?
To give some background: the application doesn't need to store any information (it actually stores a timestamp, but I'll explain that later) because it only reads what another app writes. We have a stored procedure that reads from that application's database and returns the information as of that point. From the moment the application starts, every update is delivered through a topic, so the database is no longer needed (until the next restart).
Once everything is loaded, every record in the cache has to be read when certain messages are consumed, to loop through them and process them individually. The application keeps a Map of Lock objects, one for each record in the cache, to avoid race conditions. If a record meets certain criteria, a timestamp is written to the cache and to a database using write-behind of up to 5,000 records.
The application is already developed but I think we have some problems with GCs. We keep getting spikes and I would like to know if there is any recommendation on what to do to reduce them.
These are the things we've done so far:
There is a collection of Strings that are repeated over and over for each record. I'm interning these (we are using Java 8).
The cache we are using is Ehcache. To avoid recreating objects, elements from the cache are used directly.
Every variable is a long or a String, except for an enum value and a LocalDateTime that is required to do some date checks.
There are two caches. This is because, once a criteria is met, a timestamp has to be replicated to another instance of the app. For this, we are using JMS replication from EhCache that uses topics for these updates.
The timestamp updates don't happen very often, so their impact should be minimal.
The number of records is currently 350,000, each with a bunch of Strings and longs alongside the enum and LocalDateTime mentioned before.
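The interning step mentioned above can be sketched as follows (a minimal illustration, not the app's actual code): intern() returns one canonical instance per distinct value, so 350,000 records repeating the same handful of field values share the same String objects instead of holding duplicates.

```java
// Repeated field values (e.g. status codes, currency names) often arrive as
// distinct String instances; intern() makes equal values share one object.
public class StringDedup {
    public static String canonical(String s) {
        return s == null ? null : s.intern();
    }

    public static void main(String[] args) {
        String a = new String("ACTIVE"); // deliberately a fresh instance
        String b = new String("ACTIVE");
        System.out.println(a == b);                         // false: distinct objects
        System.out.println(canonical(a) == canonical(b));   // true: one shared instance
    }
}
```

On Java 8u20+ with G1, the JVM flag -XX:+UseStringDeduplication achieves a similar effect at the GC level without code changes, which may be worth comparing.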
A sporadic problem we have is that the JVM sometimes throws "GC overhead limit exceeded". Normally the application's memory usage drops back down after a few GCs, but it seems that sometimes it cannot handle the load.
The box has 3GB of memory for this and the application after a major GC uses around 500MB for the cache.
Apart from this, I don't know how the JVM is configured or which garbage collector it uses. Any ideas, or any blogs or documents someone could suggest I start reading?
Thanks!
As you are running Java 8, you could change the garbage collector. The so-called "Garbage First" (G1) collector has been available as an option since early versions of Java 7. Problems from its infancy have been resolved, and it is often recommended for interactive applications that need fast response times.
It can be enabled with -XX:+UseG1GC, and it became the default in Java 9.
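For example, a Java 8 launch line enabling G1 and basic GC logging might look like this (the heap size and jar name are illustrative, loosely matching the 3 GB box mentioned above):

```
java -XX:+UseG1GC -Xms2g -Xmx2g \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log \
     -jar app.jar
```

The GC log is worth capturing either way: it shows whether the spikes are young collections, full GCs, or the precursor to "GC overhead limit exceeded".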
Read more about it at http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html

how to reduce usage of HttpSession

In my application there are many objects in the session, so the HttpSession is large for each user who accesses my application (it is an employee information management system). I tried to reduce the number of objects in the session and sometimes used the HttpServletRequest instead. However, the session still carries many objects, so JVM memory usage is high and the server is slow. Therefore I need to reduce the session size.
Any solution for this? Should I add another server and balance the load, or upgrade the machine's RAM or the JVM memory? (And if I upgrade, what happens in the future when even more users than now use the system; do I have to upgrade the memory again?)
My proposed solution is to add those objects to a caching server running on a separate machine, instead of keeping them in the session.
Please let me know your ideas.
Should you add another server? Maybe. Should you add RAM and give more memory to the JVM? You could. Will it solve the problem? If the problem is in the design, then only up to a certain point. My first attempt would be to reduce the objects in the session. If that's not possible, have a flat model of those objects: create a thinned-down version of each object that holds the bare minimum it needs to be meaningful.
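A minimal sketch of the thinned-down approach (the class and field names are invented for illustration): instead of putting a full entity graph in the session, store an immutable, serializable view with only the fields the UI actually renders, and reload the full entity by id when needed.

```java
import java.io.Serializable;

// Slim, immutable view of an employee for session storage.
// Serializable so the container can passivate or replicate sessions.
public class EmployeeSessionView implements Serializable {
    private static final long serialVersionUID = 1L;

    private final long id;            // enough to reload the full entity on demand
    private final String displayName;
    private final String department;

    public EmployeeSessionView(long id, String displayName, String department) {
        this.id = id;
        this.displayName = displayName;
        this.department = department;
    }

    public long getId() { return id; }
    public String getDisplayName() { return displayName; }
    public String getDepartment() { return department; }
}
```

Typical usage would be session.setAttribute("employee", new EmployeeSessionView(...)) after login, with the heavyweight entity fetched fresh from the database (or a cache) only on the pages that need it.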
If all that is done and no significant improvement is noticed, then, as Marc suggested, you could try scaling out or scaling up depending on your needs and constraints. Alternatively, you could use Memcached, or again as Marc suggested, Redis or Mongo, if you're okay with persisting sessions and willing to take a small performance hit. I say small because these things are really quick.

Write back strategy for Memcache on GAE

My App Engine (Java) application will operate on a data structure that needs frequent updates to many items. The amount of data should not exceed 1,000 records per client, but the number of clients is unlimited, so I'm not willing to do 1,000 reads and 1,000 writes every second just to update some counters.
Naturally I'm thinking about using memcache. Ideally the data should be in memory all the time so I can read and update it frequently. It should only be written to the datastore if the cache is full or the VM is being shut down (my biggest concern). Can I implement some sort of write-back strategy where the cache is only written to storage when it needs to be?
In particular my two questions are:
How do I know when an item is deleted from the cache?
How do I know when the VM is being shut down, so I can persist the content of the cache?
Short answer: No.
Longer answer: Memcache offers no guarantees.
Useful answer: look at https://developers.google.com/appengine/articles/scaling/memcache#transient. If losing data is acceptable, you can rely on memcache (but sometimes some things may be lost).
Don't worry about the VM being shut down though: Memcache runs outside of the instance VM, and is shared between all the app instance VMs.
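The "memcache is transient" advice boils down to a cache-aside pattern: treat every read as a possible miss and fall back to the durable store. A JDK-only sketch of the idea (the map and loader here are stand-ins, not the App Engine API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>(); // stand-in for memcache
    private final Function<K, V> loader;                       // stand-in for a datastore read

    public CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    // A miss (eviction, flush, instance restart) is always recoverable:
    // the value is simply reloaded from the durable store.
    public V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    // After a durable write, drop the cached copy so the next read reloads it.
    public void invalidate(K key) {
        cache.remove(key);
    }
}
```

Counters are the one case this doesn't cover cleanly; for those, App Engine's documented approach at the time was sharded counters in the datastore, with memcache only as a read accelerator.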

Downloading A Large SQLite Database From Server in Binary vs. Creating It On The Device

I have an application that requires the creation and download of a significantly large SQLite database. Depending on the user's data, creation of the db and the syncing of data from the server can take upwards of 20 to 25 minutes (some customers have a LOT of data). The data is downloaded as JSON and processed with Android's built in JSON classes.
To work around OutOfMemoryError issues I was having on some devices, I had to limit each download from the server to 500 records at a time. With that in place, all of the above works, although slowly.
Recently my team has talked about creating the complete SQLite db on the server side and downloading it to the device as a binary file, in an effort to speed things up. I've never done this before. Is this a viable option, or should I instead look into speeding up the JSON processing with a third-party library like Gson or Jackson?
Thanks in advance for your input.
From my experience with mobile devices, reinventing synchronization is overkill most of the time. It obviously depends on the hardware, software, and amounts of data you're working with, but most of the time long execution times on mobile devices are caused by faulty design, careless coding, or the specifics of embedded systems not being taken into consideration.
Unfortunately, I can only give you some hints to consider, given the pretty vague description of the issues you're facing. I mean, "a LOT" doesn't mean much to me: I've seen mobile apps with DBs containing millions of records running smoothly, and ones with around 1K records running horribly slowly and freezing the UI. You also didn't mention which OS version and device (or at least its capabilities) you're using, what the server configuration is, what software is installed, or which libraries/frameworks are used and in what modes. It all matters when you want to really speed things up.
Apart from the transfer encoding being gzip (which I believe you left at its default, on), give these ideas a try:
Streaming! Make sure both the client and the server use a streaming JSON API with buffered streams; if either doesn't, replace it with a library that does. Jackson has one of the fastest streaming APIs. Sure, it's more cumbersome to write a (de)serializer by hand, but it pays off. When done properly, neither side has to create a buffer large enough to (de)serialize all the data, fill it with the whole payload, and then parse or write it. Instead, a much smaller buffer is allocated and filled gradually as successive fields are serialized; whenever it fills up, its contents are immediately sent to the other end of the channel, where they can be deserialized right away. The process continues until all the data has been transmitted in small chunks. This makes the data interchange much more fluent and less resource-intensive.
For large batch inserts or updates, use prepared statements. It also sometimes helps to insert your data without constraints and create them afterwards; that way, for example, an index can be computed in one pass instead of on every insert. Either don't use transactions at all (they require maintaining extra database logs) or commit every ~300 rows to minimize the overhead. If you're updating an existing database and atomic modification is necessary, load the new data into a temporary database and, if everything is OK, swap in the new database on the fly.
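The commit-every-N-rows idea can be factored out into a small buffer; this is only a sketch (class and parameter names invented here), with the flush action standing in for "addBatch()/executeBatch()/commit() on a PreparedStatement":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushAction; // e.g. executeBatch() + commit()
    private final List<T> buffer = new ArrayList<>();

    public BatchWriter(int batchSize, Consumer<List<T>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    public void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) flush();
    }

    // Call once after the last add() so a partial final batch is not lost.
    public void flush() {
        if (buffer.isEmpty()) return;
        flushAction.accept(new ArrayList<>(buffer)); // hand over a copy
        buffer.clear();
    }
}
```

With batchSize around 300, as suggested above, 650 rows would arrive at the database as three commits of 300, 300, and 50 rows rather than 650 individual ones.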
Almost always, some data can be precomputed and stored on an SD card, for example. Or it can be preloaded onto the SD card as a ready-made SQLite DB before the device is shipped. If a task requires so much data that importing it takes more than 10 minutes, you probably shouldn't be doing that task on a mobile device in the first place.

Should I be backing up a webapp's data to another host continuously?

I have a webapp in development. I need to plan for what happens if the host goes down.
I will lose some very recent session status (which I can live with) and everything else should be persistently stored in the database.
If I am starting up again after an outage, can I expect a good host to reconstruct the database to within minutes or seconds of where I was up to, or should I build in a background process to continually mirror the database elsewhere?
What is normal/sensible?
Obviously a good host will have RAID and other redundancy, so the likelihood of total loss should be low. If they take periodic backups, I should lose only very recent data, but those backups are presumably designed with mostly static web content in mind, while my site is transactional, with new data filed continuously (and a customer expectation that I never lose any of it).
Any suggestions/advice?
Are there off-the-shelf frameworks for doing this? (I'm primarily working in Java.)
Should I just plan to save the data or should I plan to have an alternative usable host implementation ready to launch in case the host doesn't come back up in a suitable timeframe?
You need a replication strategy which of course depends on your database engine.
It's usually done by configuration.
http://en.wikipedia.org/wiki/Replication_%28computer_science%29
I have experience with Informix, where you can set up data replication to keep a standby system available, or do a full backup of the data and replay the logical logs (which contain basically every SQL statement), though the latter needs more time to recover from a crash.
Having redundant storage is also a good idea in case of a disk crash. This topic is probably better discussed on serverfault.com.
