Database caching strategies for range pagination - Java

I need advice on caching and paging. The scenario goes like this:
The user supplies two variable ranges (say variable X from x1 to x2 and Y from y1 to y2). I fetch the matching data from the database, some logic then orders the result, and the first page is returned to the user.
These ranges (X & Y) are different for every user.
The problem starts when the user asks for the second page: I have to fire the query and order the result again just to return the second page.
This has to be done for every user request.
Can you suggest any caching strategies for this?
(Java + MySQL)
If I am not clear, do let me know.

If you use Hibernate for DB access you can enable second-level caching so that EhCache caches your results automatically. Read more on this here.
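As a rough sketch of what enabling that might look like (the property keys are the standard Hibernate/EhCache integration settings, but the exact region factory class name depends on your Hibernate version, so treat this as an assumption to verify against your setup):

```java
import java.util.Properties;

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateCacheSetup {
    public static SessionFactory buildSessionFactory() {
        Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml

        Properties props = new Properties();
        // Turn on the second-level cache and the query cache
        props.put("hibernate.cache.use_second_level_cache", "true");
        props.put("hibernate.cache.use_query_cache", "true");
        // Use EhCache as the cache provider (class name varies across Hibernate versions)
        props.put("hibernate.cache.region.factory_class",
                  "org.hibernate.cache.ehcache.EhCacheRegionFactory");
        cfg.addProperties(props);

        return cfg.buildSessionFactory();
    }
}
```

Entities still need to be marked cacheable (e.g. with Hibernate's @Cache annotation), and queries must call setCacheable(true) to participate in the query cache.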

I assume that a user can make a request for e.g. x1 to x100 and get 100 results, and you want to display the results paginated with e.g. 10 results at a time.
I have three suggestions, each with various merits.
Rely on MySQL to do the caching.
MySQL is pretty good at caching. If you repeat the same query a number of times, MySQL will try to cache the results for you, making subsequent queries very fast (they come from memory and don't touch disk). If you query x1 to x100 and just display x1 to x10, then when you want to display page 2 and issue the x1 to x100 query again, MySQL can answer it from its own internal cache. MySQL caching obviously uses RAM, so you need to ensure your DB server has enough RAM to cache effectively. You will have to estimate how much data you expect to be caching and see whether this is feasible.
This is an easy solution and hardware (RAM) is fairly cheap*.
Cache internally
Save the whole x1 to x100 result set in your application (e.g. in the user session) and serve the relevant slice for the page requested. This is similar to letting MySQL cache, but you are moving the cache closer to where it is needed. If you do this you have to think about cache management yourself (e.g. when to expire entries, managing memory use). You can use an existing caching tool like Ehcache for this.
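A minimal sketch of that idea, assuming a servlet session is available; the `Row`, `Range`, and `fetchAndOrder` names below are placeholders for your own types and query/ordering logic:

```java
import java.util.List;
import javax.servlet.http.HttpSession;

public class ResultPageCache {
    private static final String CACHE_KEY = "orderedResults";

    // Return one page, querying the DB only if the session has no cached result yet
    public List<Row> getPage(HttpSession session, Range x, Range y, int page, int pageSize) {
        @SuppressWarnings("unchecked")
        List<Row> results = (List<Row>) session.getAttribute(CACHE_KEY);
        if (results == null) {
            results = fetchAndOrder(x, y);            // your existing query + ordering logic
            session.setAttribute(CACHE_KEY, results); // cached for later page requests
        }
        int from = Math.min(page * pageSize, results.size());
        int to = Math.min(from + pageSize, results.size());
        return results.subList(from, to);
    }

    private List<Row> fetchAndOrder(Range x, Range y) {
        // placeholder: run the X/Y range query and apply the ordering
        throw new UnsupportedOperationException("wire up your DAO here");
    }

    // placeholder types for illustration
    public static class Row { }
    public static class Range {
        public final int from, to;
        public Range(int from, int to) { this.from = from; this.to = to; }
    }
}
```

Remember to drop the session attribute when the user submits a new X/Y range, otherwise page 2 could come from a stale result set.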
Prefetch instead of caching
Caching requires a lot of memory. If caching isn't feasible then you might want to consider a prefetch strategy. With no caching you might experience slow response times, but the problem is somewhat alleviated by a prefetch. This is where, after a user has viewed one page, you do an asynchronous lookup of other pages the user is likely to view (usually the next and previous ones). The lookup may be slow, but you are doing it before the user has asked for it, so it's ready immediately when they need it.
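On the server side, one hedged way to sketch this (the cache layout, the `queryAndOrder` method, and the thread-pool size are assumptions, not something from the question) is to compute the next page on a background thread as soon as the current page is served:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PagePrefetcher {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    // key = userId + ":" + pageNumber, value = a page that is ready or still being computed
    private final ConcurrentHashMap<String, Future<List<String>>> pages = new ConcurrentHashMap<>();

    public List<String> getPage(String userId, int page) throws Exception {
        Future<List<String>> current = submitIfAbsent(userId, page);
        submitIfAbsent(userId, page + 1); // warm the next page in the background
        return current.get();             // returns immediately if it was prefetched earlier
    }

    private Future<List<String>> submitIfAbsent(String userId, int page) {
        return pages.computeIfAbsent(userId + ":" + page,
                key -> pool.submit(() -> queryAndOrder(userId, page)));
    }

    private List<String> queryAndOrder(String userId, int page) {
        // placeholder: run the ranged query, order the result, return just this page
        return Collections.emptyList();
    }
}
```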
Prefetching is a fairly common technique in web applications: when a user views a page, the application can use Ajax to load the previous and next pages into the DOM behind the scenes while the user is viewing the current page. When the user clicks the 'next' link, the application modifies the DOM to show the next page without having to touch the server at all. To the user the response looks instant.
*I was once on a MySQL administration course, and on the topic of performance the tutor said the first thing anyone should do is fit as much RAM into the server as it supports. Most servers can be filled with RAM (we're talking hundreds of GB) for less than the cost of a 3-day performance tuning course.

Related

Web Application Database or Maps for performance

I want to know whether it is useful to use ConcurrentHashMaps for user data. I have the user data saved in a MySQL database and retrieve it when a user logs in (or someone edits the user). Every time the user goes to another page, this user data is refreshed. Should I use a map and save changes from my application there, with the database in the background, or should I fetch the data directly from the DB each time? I want to make the application as performant as possible.
What you are describing is a cache. Suppose the calls to the database are expensive because there is a lot of data to load, or the query used to extract the data is complex and takes a lot of time. This is where the cache data structure comes into play. It is basically in-memory storage, which is much faster than querying the database because the data is already loaded in memory.
The process of filling the cache takes about the same time as querying the DB for the data (generally a bit more, but of the same order), so it only makes sense to use a cache if it saves time overall. There is a trade-off, though: speed vs. freshness of data. Depending on your use case you must find the right compromise between the two, and then check whether it is really worthwhile.
As you describe it, i.e. user updates that need to be saved and displayed, using a cache seems a bit of an overkill IMO, unless you have a lot of registered users and many of them use the system simultaneously. If you decide to use one, keep in mind the concurrency issues that may arise. ConcurrentHashMap saves you from many hazards, but at some performance cost.
If performance is the priority, I think you should keep the logged-in users in memory.
That way, read requests would be fast because you would not need to query the database. However, you would need to update the map whenever any of the logged-in users is edited.
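A minimal sketch of that approach; the `UserDao` and `User` types here are placeholders standing in for whatever you already have:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LoggedInUserCache {
    private final ConcurrentMap<Long, User> usersById = new ConcurrentHashMap<>();
    private final UserDao userDao; // your existing DB access layer

    public LoggedInUserCache(UserDao userDao) {
        this.userDao = userDao;
    }

    public void onLogin(long userId) {
        usersById.put(userId, userDao.loadUser(userId)); // one DB read per login
    }

    public User getUser(long userId) {
        return usersById.get(userId); // page requests hit memory only
    }

    public void onUserEdited(User updated) {
        userDao.saveUser(updated);                   // the DB stays the source of truth
        usersById.replace(updated.getId(), updated); // refresh the cached copy if logged in
    }

    public void onLogout(long userId) {
        usersById.remove(userId);
    }

    // placeholder types for illustration
    public interface UserDao {
        User loadUser(long id);
        void saveUser(User user);
    }

    public static class User {
        private final long id;
        public User(long id) { this.id = id; }
        public long getId() { return id; }
    }
}
```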
A human cannot tell the difference between a 1ms delay and a 50ms delay. So it is overkill to optimize beyond "good enough".
MySQL already does a flavor of caching; your addition of another cache may actually slow down the response time.

Best design for handling a very high volume of data in Java - JDBC

I am calling a web service which can return a very high volume of data (>100K records = 200 MB). I have to insert this data into SQL Server too. I have the following questions.
1. I know it depends on the server resources, but is there any ballpark advice on how much data I should hold in a Java structure (a Collection whose items have 4-5 String members, each of length < 255) at run time? I am already fetching 50,000 records per call (I am not sure how much memory that takes).
2. I then upload this data to the database in batches of 1,000 using JDBC. Is this the correct approach? Would there be any benefit to using JPA for this instead of JDBC?
3. Also, is there any standard design to handle this? I can think of breaking the web service calls into pages of limited size and then using Java threads to handle them. Is this the right direction?
Thanks
First of all, "a web service which can return a very high volume of data" is not enough information. Knowing whether the web service returns this volume ALWAYS, ONCE IN A WHILE, X% of the time, etc. helps in designing a better system.
It is not advisable to use web services to exchange such a large quantity of data, because it puts a strain on the physical network infrastructure too, but I guess that service is not part of your system.
Your application will be very unreliable with that amount of data per hit, and you will also need a very fast network to move that amount of data.
Now, coming to your points:
1. You have guessed it right; it all depends on server resources. There are applications which might be comfortable with a million records in a collection, while in other places a few thousand might be too much. You have to keep heap space and limits imposed by the OS in mind. All in all, this is very specific to an application.
The purpose of the collection plays a role too: is it for lookup or just temporary storage to pass data around? How frequently does it get cleaned up? Is it on the stack or an object field? Is it loaded once and cleared before the next load, or does it keep growing?
2. A JDBC batch is the correct approach here, not JPA (a sketch follows after this list).
3. If reading data from the web service and storing it in the DB is the main flow of the job, the Spring Batch API might fit better into your design.
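For reference, here is a hedged sketch of a batched JDBC insert; the table name, column names, and connection URL are invented for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInserter {
    private static final int BATCH_SIZE = 1000;

    public void insert(List<String[]> records, String jdbcUrl) throws SQLException {
        String sql = "INSERT INTO records (col1, col2, col3, col4) VALUES (?, ?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            conn.setAutoCommit(false); // commit per batch instead of per row
            int count = 0;
            for (String[] record : records) {
                for (int i = 0; i < 4; i++) {
                    ps.setString(i + 1, record[i]);
                }
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch(); // one round trip for 1,000 rows
                    conn.commit();
                }
            }
            ps.executeBatch(); // flush the remaining rows
            conn.commit();
        }
    }
}
```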
Hope it helps !!

AJAX/JavaScript search performance better than Java/Oracle

I work with a very large enterprise application written in Java which queries an Oracle SQL database. We use JavaScript on the front end, and we are always looking for ways to improve the performance of the application as usage increases.
The issue we're having right now is that we are sending a query, via Java, that returns 39,000 records. This is putting a significant load on the server and causes the browser to hang. I should mention that the data is relatively static (it only changes about once a year), and we could use an XML map or something similar (a flat file), since we know the exact results that will be returned each time.
The query, however, is still taking 1.5-2 minutes to load, which is unacceptable. I wanted to see if there are any suggestions as to how this scenario can be optimized, especially whether it can be done any quicker with JavaScript (or jQuery), using AJAX for the DB connection. Or are we going about this problem all wrong?
You want to determine if the slowness is due to:
the query executing in the database
the network is slow returning 39k records
the javascript working with the 39k records after the ajax is complete
If you can run the query in SQL*Plus or Toad, this will eliminate the web tier and the network altogether. If this is slow, then tune the query by checking indexes.
If after adding the appropriate indexes, the query is still slow, then you could prebuild the query's results and store the results in a table or you could create a materialized view.
Once you have the query performing well from sqlplus, then add the network back into the equation. Run it from your web browser and see what overhead is being added.
If it is still slow, then you need to determine if the problem is the act of ajaxing the data or if the slowness occurs after the page does something with the data (ie. populating a data grid via javascript).
If the slowness is because the browser is waiting for the data, then you want to make sure it is only ever fetched once. You can do this by setting the cache headers on the AJAX response so the result is cached for a year, or you can store the results in localStorage.
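On the Java side, a minimal sketch of serving that near-static data with long-lived cache headers; the servlet path, payload, and `loadReportJson` helper are placeholders:

```java
import java.io.IOException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/static-report")
public class StaticReportServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // The data changes roughly once a year, so let the browser cache it for a year
        resp.setHeader("Cache-Control", "public, max-age=31536000");
        resp.setContentType("application/json");
        resp.getWriter().write(loadReportJson()); // placeholder for the 39k-row payload
    }

    private String loadReportJson() {
        // placeholder: load the pre-built result (flat file, materialized view, etc.)
        return "[]";
    }
}
```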
If the slowness is due to the browser working with the 39k rows (ie. moving the data into a data grid), then you have a few options.
find a better approach or library
use pagination
You may find performance issues in each of these areas. Most likely the query just needs to be tuned, and adding indexes or pre-querying the data and storing the results will solve the problem.
Another thing to consider is whether you really need 39k rows at one time. If you can, paginate at the DB level so you're returning, say, 100 rows per page.
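As a hedged illustration of DB-level pagination through JDBC; the table and column names are invented, and the OFFSET/FETCH syntax assumes Oracle 12c or later (older versions need a ROWNUM subquery instead):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class PagedQuery {
    // Fetch one page of rows instead of all 39,000 at once
    public List<String> fetchPage(Connection conn, int page, int pageSize) throws SQLException {
        String sql = "SELECT name FROM report_rows ORDER BY name "
                   + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";
        List<String> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, page * pageSize);
            ps.setInt(2, pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getString("name"));
                }
            }
        }
        return rows;
    }
}
```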

Pagination of highly dynamic and frequently changing data in Java

I am a Java developer and my application runs on iOS and Android. I have created a web service for it using the Restlet framework, with JDBC for DB connectivity.
My problem is that I have three types of data, together called an intersection: current + past + future. This intersection contains a list of users as its data, and there is a single web service that returns all the users in his/her intersection to the device. I have implemented pagination, but the server still has to process all of his/her intersections and then return only the (start-end) slice to the device. I did it this way because there is a chance that a past user may also become current. That is the whole logic.
But as the intersections in a profile grow, the server has to process every user, so it becomes slow, which is obvious. The device also calls this web service every 5 minutes.
Please suggest a better way to handle this scenario.
Thanks in advance.
Ketul Rathod
It's a little hard to follow your logic, but it sounds like you can probably benefit from caching your results on the server.
If it makes sense, every time you process the user's data on the server, save the results (to a file, to a database table, whatever). Then, 5 minutes later, if there are no changes, simply return the same results. If there were changes, retrieve the cached results (optionally invalidating the cache in the process), append the changes to what was cached, re-save the combined results in the cache, and return them.
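A rough sketch of that idea; the `computeIntersections` and `hasChangedSince` methods are placeholders for your own processing and change-detection logic:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IntersectionCache {
    private static class Entry {
        final List<String> users;    // the processed intersection result
        final long computedAtMillis; // when it was last computed
        Entry(List<String> users, long computedAtMillis) {
            this.users = users;
            this.computedAtMillis = computedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    public List<String> getIntersections(String userId) {
        Entry entry = cache.get(userId);
        // Recompute only if nothing is cached or the user's data changed since last time
        if (entry == null || hasChangedSince(userId, entry.computedAtMillis)) {
            List<String> fresh = computeIntersections(userId); // your existing heavy processing
            entry = new Entry(fresh, System.currentTimeMillis());
            cache.put(userId, entry);
        }
        return entry.users; // the 5-minute polling call usually returns this directly
    }

    private boolean hasChangedSince(String userId, long sinceMillis) {
        // placeholder: e.g. compare against a last-modified timestamp in the DB
        return false;
    }

    private List<String> computeIntersections(String userId) {
        // placeholder: process current + past + future users for this profile
        return Collections.emptyList();
    }
}
```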
If this is applicable to your workflow, your server-side processing time will be significantly less.

How to manage a big number of users on the server side?

I built a social Android application in which users can see other users around them by GPS location. In the beginning things went well, as I had a low number of users, but now that I have an increasing number of users (about 1500, +100 every day) I have discovered a major problem in my design.
In my Google App Engine servlet I have a static HashMap holding all the user profile objects, currently 1500, and this number will increase as more users register.
Why I'm doing it
For every user who requests the users around him, his GPS position is compared with the other users' to check whether they are within a 10 km radius; this happens every 5 minutes on average.
That is why I can't fetch the users from the DB every time, because the GAE read/write operation quota would tear me apart.
The problem with this design is
As the number of users has increased, the HashMap turns to null every 4-6 hours. I think this interval is getting shorter, but I'm not sure.
I'm fixing this by reloading the users from the DB every time I detect that it has become null, but this causes a 30-second DoS for my users, so I'm looking for a better solution.
I'm guessing that it happens because of the size of the HashMap. Am I right?
I would like to know how to manage ALL user profiles with maximum availability.
Thanks.
I would not store this data in a HashMap, as it does not really scale if you run on multiple instances, and it also uses a lot of memory.
Why not use a different kind of storage such as MongoDB, which is also available 'in the cloud' (e.g. www.mongohq.com)?
If you want to scale, you need to separate the data from the processors: e.g. have x servers running your servlet (or let Google App Engine scale this by itself) and keep the data in a different place (e.g. in MongoDB or PostgreSQL).
You need to rethink your whole design. Storing all users in one huge HashMap won't scale (sooner or later you'll have to cluster your application). Also the complexity of your algorithm is quite high - you need to traverse the whole map for each user.
A much more scalable solution would be to use a spatial database. All major relational databases and some NoSQL products offer geospatial indexing. Basically, the database query engine is optimized for queries like: give me all the records near this given point.
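As a hedged illustration of that kind of query over JDBC (the table and column names are invented, and ST_Distance_Sphere is MySQL's spherical-distance function; other databases have their own equivalents), finding users within a 10 km radius could look like:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class NearbyUsersQuery {
    // Ask the database for users within radiusMeters of (lat, lon)
    // instead of scanning an in-memory map of every user
    public List<Long> findNearby(Connection conn, double lat, double lon, double radiusMeters)
            throws SQLException {
        String sql = "SELECT user_id FROM user_locations "
                   + "WHERE ST_Distance_Sphere(location, POINT(?, ?)) <= ?";
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDouble(1, lon); // MySQL point coordinates are (longitude, latitude)
            ps.setDouble(2, lat);
            ps.setDouble(3, radiusMeters);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("user_id"));
                }
            }
        }
        return ids;
    }
}
```

With a spatial index and a bounding-box pre-filter, this kind of query scales far better than comparing every pair of users in application code.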
If your application is really successful, even an in-memory map will be slower than an enterprise-grade geospatial index.
