AJAX/JavaScript search performance better than Java/Oracle

I work with a very large enterprise application written in Java which queries an Oracle SQL database. We use JavaScript on the front end, and we are always looking for ways to improve the application's performance as usage grows.
The issue we're having right now is that we are sending a query, via Java, that returns 39,000 records. This puts a significant load on the server and causes the browser to hang. I should mention that the data is relatively static (it only changes about once a year), so we could use an XML map or something similar (a flat file), since we know the exact results that will be returned each time.
The query, however, still takes 1.5 to 2 minutes to load, which is unacceptable. I wanted to see if there were any suggestions as to how this scenario can be optimized, especially whether it can be done more quickly with JavaScript (or jQuery), using AJAX for the DB connection. Or are we going about this problem all wrong?

You want to determine whether the slowness is due to:
the query executing in the database
the network returning 39k records slowly
the JavaScript working with the 39k records after the AJAX call completes
If you can run the query in SQL*Plus or Toad, this will eliminate the web tier and network altogether. If this is slow, then tune the query by checking indexes.
If, after adding the appropriate indexes, the query is still slow, then you could precompute the query's results and store them in a table, or you could create a materialized view.
Once you have the query performing well from SQL*Plus, add the network back into the equation. Run it from your web browser and see what overhead is being added.
If it is still slow, then you need to determine whether the problem is the act of fetching the data via AJAX or whether the slowness occurs after the page does something with the data (i.e. populating a data grid via JavaScript).
If the slowness is because the browser is waiting for the data, then you want to make sure it's only ever fetched once. You can do this by setting cache headers on the response so the browser caches the result for one year, or you can store the results in localStorage.
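For example, if the data is served by a Java servlet, a minimal sketch of the response-caching approach could look like the following (the servlet and its helper method are made up for illustration):

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical servlet returning the (mostly static) 39k-record result.
    public class StaticDataServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Allow the browser to cache this response for one year
            // (31,536,000 seconds), so the data is only ever fetched once.
            resp.setHeader("Cache-Control", "public, max-age=31536000");
            resp.setContentType("application/json");
            resp.getWriter().write(loadDataAsJson()); // assumed helper
        }

        private String loadDataAsJson() {
            // ... query the database or read the prebuilt flat file ...
            return "[]";
        }
    }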
If the slowness is due to the browser working with the 39k rows (i.e. moving the data into a data grid), then you have a few options.
find a better approach or library
use pagination
You may find performance issues in each of these areas. Most likely the query just needs to be tuned; adding indexes, or pre-querying the data and storing it, will solve the problem.
Another thing to consider is whether you really need 39k rows at one time. If you can, paginate at the DB level so you're returning, say, 100 rows per page.
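A minimal JDBC sketch of DB-level paging, assuming Oracle 12c or later (which supports OFFSET/FETCH; older versions need the nested rownum pattern) and made-up table and column names:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Fetch one page of 100 rows; 'page' is zero-based.
    public static void fetchPage(Connection conn, int page) throws Exception {
        String sql = "SELECT id, name FROM my_table ORDER BY id "
                   + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, page * 100);
            ps.setInt(2, 100);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // ... hand the row to the response / data grid ...
                }
            }
        }
    }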

JDBC Pagination: vendor specific sql versus result set fetchSize

There are a lot of different tutorials across the internet about pagination with JDBC and iterating over a huge result set.
So, basically, these are the approaches I've found so far:
Vendor-specific SQL
Scrollable result set (?)
Holding a plain result set in memory and mapping the rows only when necessary (using fetchSize)
The result set fetch size, either set explicitly, or by default equal to the statement fetch size that was passed to it, determines the number of rows that are retrieved in any subsequent trips to the database for that result set. This includes any trips that are still required to complete the original query, as well as any refetching of data into the result set. Data can be refetched, either explicitly or implicitly, to update a scroll-sensitive or scroll-insensitive/updatable result set.
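For reference, the fetch size is set on the Statement (or ResultSet); a minimal sketch, given an open Connection conn and an illustrative table name:

    // Ask the driver to pull 500 rows per round trip instead of the
    // Oracle default of 10. This only tunes the number of network
    // trips; the query itself is unchanged.
    try (PreparedStatement ps = conn.prepareStatement("SELECT * FROM big_table")) {
        ps.setFetchSize(500);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // ... map the row ...
            }
        }
    }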
Cursor (?)
Custom seek-method paging, as implemented by jOOQ
Sorry for mixing all of these together, but I need someone to clear this up for me.
I have a simple task where a service consumer asks for results with a pageNumber and pageSize. It looks like I have two options:
Use vendor-specific SQL
Hold the connection/statement/result set in memory and rely on the JDBC fetchSize
In the latter case I use rxjava-jdbc, and if you look at the producer implementation, it holds the result set; then all you do is call request(long n) and another n rows are processed. Of course, everything is hidden under the Observable sugar of rxJava. What I don't like about this approach is that you have to hold the resultSet between different service calls, and have to clear that resultSet if the client forgets to exhaust or close it. (Note: resultSet here is the Java ResultSet class, not the actual data.)
So, what is the recommended way of doing pagination? Is vendor-specific SQL considered slow compared to holding the connection?
I am using Oracle. A scrollable ResultSet is not recommended for huge result sets, as it caches the whole result set data on the client side. proof
Keeping resources open for an indefinite time is a bad thing in general. The database will, for example, create a cursor for you to obtain the fetched rows. That cursor and other resources will be kept open until you close the result set. The more queries you run in parallel, the more resources are occupied, and at some point the database will reject further requests due to an exhausted resource pool (e.g. there is a limited number of cursors that can be opened at a time).
Hibernate, for example, uses vendor specific SQL to fetch a "page" and I would do it just like that.
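With Hibernate that boils down to setFirstResult/setMaxResults, which Hibernate translates into the vendor-specific SQL of the configured dialect. A minimal sketch against the Hibernate 5 query API, with a made-up entity:

    import java.util.List;
    import org.hibernate.Session;
    import org.hibernate.query.Query;

    // Page numbers are zero-based; for Oracle, Hibernate emits
    // rownum- or OFFSET/FETCH-based SQL depending on the dialect.
    public static List<MyEntity> fetchPage(Session session, int pageNumber, int pageSize) {
        Query<MyEntity> query = session.createQuery(
                "from MyEntity e order by e.id", MyEntity.class);
        query.setFirstResult(pageNumber * pageSize);
        query.setMaxResults(pageSize);
        return query.list();
    }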
There are many approaches because there are many different use cases.
Do you actually expect users to fetch every page of the result set? Or are they more likely to fetch the first page or two and try something else if the data they're interested in isn't there. If you are Google, for example, you can be pretty confident that people will look at results from the first page, a small number will look at results from the second page, and a tiny fraction of results will come from the third page. It makes perfect sense in that case to use vendor-specific code to request a page of data and only run that for the next page when the user asks for it. If you expect the user to fetch the last page of the result, on the other hand, running a separate query for each page is going to be more expensive than running a single query and doing multiple fetches.
How long do users need to keep the queries open? How many concurrent users? If you're building an internal application that dozens of users will have access to and you expect users to keep cursors open for a few minutes, that might be reasonable. If you are trying to build an application that will have thousands of users that will be paging through a result over a span of hours, keeping resources allocated is a bad idea. If your users are really machines that are going to fetch data and process it in a loop as quickly as possible, a single ResultSet with multiple fetches makes far more sense.
How important is it that no row is missed, every row is seen exactly once, and the results across pages are consistent? Multiple fetches from a single cursor guarantee that every row in the result is seen exactly once. Separate paginated queries might not: new data could have been added or removed between queries being executed, your sort might not be fully deterministic, etc.
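One way to page without holding a cursor and without re-reading skipped rows is the seek method mentioned in the question: remember the sort key of the last row of the previous page and start the next query after it. A plain-JDBC sketch with invented column names (written as an OR-expression because Oracle does not support row-value comparisons here; FETCH FIRST requires Oracle 12c or later):

    // Fetch the page that follows the row identified by (lastCreated, lastId).
    String sql = "SELECT id, created, payload FROM events "
               + "WHERE created > ? OR (created = ? AND id > ?) "
               + "ORDER BY created, id FETCH FIRST 100 ROWS ONLY";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setTimestamp(1, lastCreated);
        ps.setTimestamp(2, lastCreated);
        ps.setLong(3, lastId);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // ... remember the last (created, id) for the next page ...
            }
        }
    }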
A scrollable ResultSet caches results on the client side, and this requires memory resources. But PostgreSQL, for example, does it by default and nobody complains. Some databases simply use the client's memory to hold the whole result set. In most cases, the database would have to process much more data to re-evaluate the query.
Also, you usually have many more clients than database instances.
Also note that query re-execution using rownum, as implemented by Hibernate, does not guarantee correct (consistent) results if the data is modified between executions and the default isolation level is used.
It really depends on the use case. Changing Oracle's init parameters for the maximum number of connections and open cursors requires a database restart.
So scrollable ResultSets and cursors can be used only when you can predict the number of (concurrent) users.

Performance Optimization in Java

In Java code I am trying to fetch 3,500 rows from the DB (Oracle). It takes almost 15 seconds to load the data. I have also tried storing the result in a cache and retrieving it from there. I am using a simple SELECT statement and displaying 8 columns from a single table (no joins used), using a List to hold the data from the DB and using it as the source for a data table. I have also looked at the hardware side, such as RAM capacity, storage, network speed, etc.; it exceeds the minimum requirements comfortably. Can you help me do it more quickly (it shouldn't take more than 3 seconds)?
Have you implemented proper indexing on your tables? I don't like to ask this, since it is a very basic way of optimizing tables for queries, and you mention that you have already tried several things. One workaround that works for me, when the purpose of the query is to display the results, is to design the code so that it displays the initial data immediately while the rest is still loading. This implies implementing a separate thread for loading and a separate thread for displaying, as sketched below.
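A rough sketch of that two-thread idea, assuming a Swing client (SwingWorker publishes rows to the UI thread as they arrive; a web front end would use an initial page plus background AJAX instead):

    import java.util.List;
    import javax.swing.SwingWorker;

    // Streams rows to the table as they are fetched, so the first rows
    // are visible long before all 3500 have arrived.
    SwingWorker<Void, Object[]> worker = new SwingWorker<Void, Object[]>() {
        @Override
        protected Void doInBackground() throws Exception {
            // ... run the query; for each fetched row:
            //     publish(rowValues);
            return null;
        }

        @Override
        protected void process(List<Object[]> chunk) {
            // Runs on the Event Dispatch Thread: append the new rows
            // to the table model so they show up immediately.
        }
    };
    worker.execute();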
It is most likely that the core problem is that you have one or more of the following:
a poorly designed schema,
a poorly designed query,
a badly overloaded database, and / or
a badly overloaded / underprovisioned network connection between the database and your client.
No amount of changing the client side (Java) code is likely to make a significant difference (i.e. a 5-fold increase) ... unless you are doing something crazy in the way you are building the list, or the bottleneck is in the display code not the retrieval.
You need to use some client-side and server-side performance tools to figure out whether the real bottleneck is the client, the server or the network. Then use those results to decide where to focus your attention.

Caching large data sets with hibernate/jpa/memcache?

I'm new to open source stacks and have been playing with Hibernate/JPA/JDBC and memcached. I have a large data set per JDBC query, and will possibly have a number of these large data sets, which I eventually bind to a chart.
However, I'm very focused on performance, and want to avoid hitting the database on every page load just to display the data on my web page's chart.
Are there some examples of how (memcached, Redis, local or distributed) and where to cache this data (JSON or raw result data) so it can be loaded from memory? I also need to figure out how to refresh the cache, unless it uses time-based eviction (e.g. a 30-minute expiry, after which new data is fetched from the database query instead of the cache), or perhaps an automated feed of data into the cache every x hours/minutes.
Thanks!
This is a typical problem, and the solution is not straightforward. There are many factors that determine your design. Here is what we did some time ago.
Since our queries to extract the data were a bit complex (they took around a minute to execute) and the data set was large, we populated memcached from a batch job which pulled data from the database every hour and pushed it to memcached. By keeping the cache expiry longer than the batch interval, we made sure that there would always be data in the cache.
There was another use case for dynamic caching, wherein on receiving a request for data, we first checked memcached; if the data was not found, we queried the database, fetched the data, pushed it to memcached, and returned the results. But I would advise this approach only when your database queries are simple and fast enough not to hurt the overall response time.
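A minimal cache-aside sketch of that second approach, using the spymemcached client (the key, TTL, and helper method are illustrative; cached values must be Serializable):

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    MemcachedClient cache = new MemcachedClient(
            new InetSocketAddress("localhost", 11211));

    String key = "chart-data-v1";
    Object data = cache.get(key);     // 1. check the cache first
    if (data == null) {
        data = queryDatabase();       // 2. cache miss: hit the database (assumed helper)
        cache.set(key, 3600, data);   // 3. populate the cache with a 1-hour TTL
    }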
You can also use Hibernate's second-level cache. Whether this feature can be used efficiently depends on your database schema, queries, etc.
Hibernate has built-in support for 2nd level caching. Take a look at EhCache for example.
Also see: http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/performance.html#performance-cache
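A rough sketch of what enabling the second-level cache looks like; the entity is made up, and the region factory class name differs between Hibernate versions, so check the documentation for yours:

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    // Configuration (hibernate.cfg.xml / persistence.xml):
    //   hibernate.cache.use_second_level_cache = true
    //   hibernate.cache.region.factory_class =
    //       org.hibernate.cache.ehcache.EhCacheRegionFactory   (Hibernate 4.x name)

    // Each entity must opt in to the second-level cache.
    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_ONLY) // data set is read-mostly
    public class ChartPoint {
        @Id
        private Long id;
        // ... other fields, getters, setters ...
    }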

how to cache the objects for display tags in jsp JSTL

I am using the displaytag for the pagination purpose.
Now, I have millions of records in the DB, and moving from one page to the other takes quite a long time.
Is there a way to cache the objects that need to be shown, so that traversing between the pages is faster?
Requirement: we are querying and displaying the number of files in directories under a Linux environment. Each folder has thousands of files.
How are you reading from the DB? It would be good to see some more of your implementation.
As a general guideline:
If you read all your data into a list from the DB and only display a page, you will be wasting resources (processing and memory). This can kill your app. Try an approach that fetches just the page you need.
If you are using a framework like Hibernate, you can implement caching and paging without much trouble.
If you are using direct JDBC, you will have to limit the records in your query. Here the proper technique may depend on the database engine you're using, so please provide this information.
Be aware that your problem might be the amount of read information rather than a caching problem (just depends on the implementation).
As a sample, in Oracle you would need to know the page and the page size. With both, you could limit the query with "where rownum < pagesize * page" (or something similar, depending on how you index) and navigate to the first record you need with the absolute(int) method of ResultSet. On other engines it might be more efficient.
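Note that a bare "rownum < pagesize * page" only caps the total number of rows returned; to cut out one specific page in Oracle you need the classic nested form. A sketch with placeholder table and column names, where page is one-based:

    // Rows are numbered only after the inner ORDER BY, so the pattern
    // is: order, cap with rownum, then cut away the lower bound.
    String sql =
        "SELECT * FROM ("
      + "  SELECT t.*, rownum rn FROM ("
      + "    SELECT id, name FROM my_table ORDER BY id"
      + "  ) t WHERE rownum <= ?"
      + ") WHERE rn > ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, page * pageSize);        // upper bound
        ps.setInt(2, (page - 1) * pageSize);  // lower bound
        // ... execute and read the page ...
    }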
Now, if you're paginating with some framework, it will normally support some "DataProvider" implementation so you can control how results are fetched for each page.

Is it a good idea to process a large amount of data directly in the database?

I have a database with a lot of web pages stored.
I will need to process all the data I have, so I have two options: retrieve the data into the program, or process it directly in the database with some functions I will create.
What I want to know is:
Is doing some processing in the database, rather than in the application, a good idea?
When is this recommended, and when not?
Are there pros and cons?
Is it possible to extend the language with new features (external APIs/libraries)?
I tried retrieving the content into the application (it worked), but it was too slow and messy. My worry is that I can't do in the database what I can do in Java, but I don't know if this is true.
Just one example: I have a table called Token. At the moment it has 180,000 rows, but this will increase to over 10 million rows. I need to do some processing to know whether a word between two tokens classified as "Proper Name" is part of a name or not.
I will need to process all the data. In this case, is doing it directly in the database better than retrieving it into the application?
My worry is that I can't do in the database what I can do in Java, but I don't know if this is true.
No, that is not a correct assumption. There are valid circumstances for using the database to process data. For example, if the work involves a lot of disparate SQL statements that can be combined into a stored procedure, then you should do the processing in the stored procedure and call the stored proc from your Java application. This way you avoid making several network trips to the database server.
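A minimal sketch of calling such a stored procedure from Java (the procedure name and parameter are made up):

    import java.sql.CallableStatement;
    import java.sql.Connection;

    // One network round trip: the procedure runs all of its SQL server-side.
    try (CallableStatement cs = conn.prepareCall("{call process_tokens(?)}")) {
        cs.setInt(1, batchSize); // hypothetical parameter
        cs.execute();
    }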
I do not know what you are processing, though. Are you parsing XML data stored in your database? Then perhaps you should use XQuery; many modern databases support it.
Just one example: I have a table called Token. At the moment it has 180,000 rows, but this will increase to over 10 million rows. I need to do some processing to know whether a word between two tokens classified as "Proper Name" is part of a name or not.
Is there some indicator in the data that tells you it's a proper name? Fetching 10 million rows (highly susceptible to an OutOfMemoryError) and then going through them is not a good idea. If there are parameters of the data that can be put in a WHERE clause to limit the amount of data being fetched, that is the way to go in my opinion. You will surely need to run explain plans on your SQL and check that the correct indices are in place, along with the index clustering ratio and the type of index; all of that will make a difference. Now, if you can't fully eliminate all "improper names" with SQL, then you should get rid of as many as you can and process the rest in your application. I am assuming this is a batch application, right? If it is a web application, then you definitely want to create a batch application to stage the data before the web application queries it.
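A sketch of that staged approach: filter as much as possible in the WHERE clause, then stream the remainder through the application with a modest fetch size instead of materializing millions of rows in memory (the table and column names are invented):

    String sql = "SELECT id, word FROM token "
               + "WHERE class = 'PROPER_NAME'"; // let SQL discard the rest
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setFetchSize(1000); // stream in chunks; never hold all rows at once
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // ... apply the name-detection logic row by row ...
            }
        }
    }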
I hope my explanation makes sense. Please let me know if you have questions.
Directly interacting with the DB for every single thing is tedious and hurts performance. There are several ways to get around this: you can use indexing, caching, or tools such as Hibernate, which caches data in memory so that you don't need to query the DB for every operation. There are also tools such as Lucene indexers, which are very popular and could solve your problem of hitting the DB every time.
