In Java I am trying to fetch 3,500 rows from an Oracle database, and it takes almost 15 seconds to load the data. I have also tried storing the result in a cache and retrieving it from there. I am using a simple SELECT statement displaying 8 columns from a single table (no joins), storing the rows in a List and using that as the source for a DataTable. I have also considered the hardware side (RAM capacity, storage, network speed, etc.); it comfortably exceeds the minimum requirements. Can you help me make this quicker? It shouldn't take more than 3 seconds.
Have you implemented proper indexing on your tables? I hesitate to ask, since this is a very basic way of optimizing tables for queries and you mention that you have already tried several approaches. One workaround that works for me, when the purpose of the query is to display results, is to design the code so that it displays the initial data immediately while the rest is still loading. That means one thread for loading and a separate thread for displaying.
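A rough sketch of that idea, assuming a Swing desktop UI and a DefaultTableModel (assumptions on my part; the class name, connection details, and query are invented):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.List;
import javax.swing.SwingWorker;
import javax.swing.table.DefaultTableModel;

public class IncrementalLoader {
    // Sketch: stream rows into the table model as they are fetched, so
    // the user sees the first rows while the rest are still loading.
    public static void load(final DefaultTableModel model) {
        new SwingWorker<Void, Object[]>() {
            @Override
            protected Void doInBackground() throws Exception {
                try (Connection conn = DriverManager.getConnection(
                             "jdbc:oracle:thin:@//dbhost:1521/svc", "user", "pw");
                     Statement stmt = conn.createStatement();
                     ResultSet rs = stmt.executeQuery(
                             "SELECT col1, col2 FROM my_table")) {
                    while (rs.next()) {
                        publish(new Object[] { rs.getObject(1), rs.getObject(2) });
                    }
                }
                return null;
            }

            @Override
            protected void process(List<Object[]> chunks) {
                for (Object[] row : chunks) {
                    model.addRow(row); // runs on the Event Dispatch Thread
                }
            }
        }.execute();
    }
}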
It is most likely that the core problem is that you have one or more of the following:
a poorly designed schema,
a poorly designed query,
a badly overloaded database, and/or
a badly overloaded / underprovisioned network connection between the database and your client.
No amount of changing the client-side (Java) code is likely to make a significant difference (i.e. a 5-fold speed-up) ... unless you are doing something crazy in the way you build the list, or the bottleneck is in the display code rather than the retrieval.
You need to use some client-side and server-side performance tools to figure out whether the real bottleneck is the client, the server, or the network, and then use those results to decide where to focus your attention.
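On the Java side, a crude first cut is to time the query execution separately from the result-set iteration, which roughly splits database time from network/fetch time (a sketch; the query is a placeholder):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Crude timing sketch: separates statement execution from row fetching.
static void timeQuery(Connection conn) throws SQLException {
    long t0 = System.nanoTime();
    try (PreparedStatement ps = conn.prepareStatement(
                 "SELECT col1 FROM my_table");       // your real query here
         ResultSet rs = ps.executeQuery()) {
        long t1 = System.nanoTime();
        int rows = 0;
        while (rs.next()) {
            rows++;                                  // your real row mapping here
        }
        long t2 = System.nanoTime();
        System.out.printf("execute: %d ms, fetch %d rows: %d ms%n",
                (t1 - t0) / 1_000_000, rows, (t2 - t1) / 1_000_000);
    }
}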
Related
My question is: what is the best design of table structure and code for server-side UI processing?
As I understand it, if I choose client-side processing the data should be small, maybe a few thousand records; that's fine.
But when the data is huge we need server-side processing, and then the problem arises: most of the data should be kept in normalized form for a better storage structure, but the UI needs a denormalized form for better filtering and sorting.
If I keep the data normalized I can change values easily, but then reads need joins, and sorting/filtering has to run over those joins.
What is the best way to store the data to support server-side processing?
Consider a scenario where we have a table code_table(code, value), and each record in another table data_table is, after some edit, associated with a code from code_table.
Now we need to show the code values in the UI and sort/filter on them, and we have 10–15 such tables.
Here are a few things I would suggest; maybe they will help you make a decision:
Joins: Having a few joins to get your end results is usually not a big deal, even when row counts run into the millions; you can optimise them by having the right indexes. Unless there are specific reasons you absolutely don't want joins, I strongly suggest you go with a normalised structure and joins. This gives you the most flexibility. Do some benchmarking to see how far you can scale (see the sketch after this list).
De-normalization: This should be an option if you find your join queries cannot scale with your load. Denormalized data means Insert/Update/Delete inconsistencies are possible. That does not mean it cannot (or should not) be done, but you will have to take extra care in your server-side code to ensure those inconsistencies do not creep in; other than that you should be fine.
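For the code_table/data_table example above, benchmarking the normalised approach might look like this sketch (the index name and the id column are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of the normalised approach for the code_table/data_table example.
static void benchmarkJoin(Connection conn) throws SQLException {
    try (Statement stmt = conn.createStatement()) {
        // Index the code column so the join and the sort/filter can use it
        stmt.execute("CREATE INDEX idx_data_code ON data_table (code)");
    }
    String sql = "SELECT d.id, c.value"
            + " FROM data_table d JOIN code_table c ON c.code = d.code"
            + " ORDER BY c.value"; // sort on the joined, human-readable value
    try (PreparedStatement ps = conn.prepareStatement(sql);
         ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // feed rows to the UI's server-side paging/sorting
        }
    }
}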
I work with a very large enterprise application written in Java which queries an Oracle database. We use JavaScript on the front end, and we are always looking for ways to improve the application's performance as usage increases.
The issue we're having right now is that we are sending a query, via Java, that returns 39,000 records. This puts a significant load on the server and causes the browser to hang. I should mention that the data is relatively static (it changes only about once a year), and we could use an XML map or something similar (a flat file), since we know the exact results that will be returned each time.
The query, however, is still taking 1.5–2 minutes to load, which is unacceptable. I wanted to see if there are any suggestions for optimizing this scenario, especially whether it can be done any quicker with JavaScript (or jQuery) and AJAX for the DB connection. Or are we going about this problem all wrong?
You want to determine if the slowness is due to:
the query executing in the database
the network returning the 39k records
the JavaScript working with the 39k records after the AJAX call completes
If you can run the query in SQL*Plus or Toad, this eliminates the web tier and the network altogether. If it is slow there, tune the query, starting with a check of the indexes.
If the query is still slow after adding the appropriate indexes, you could prebuild the query's results and store them in a table, or you could create a materialized view.
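For instance, a materialized view can pre-build and store the results on the Oracle side (a sketch; the view name, query, and refresh policy are illustrative):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: pre-compute the slow query's results as a materialized view.
// Refresh it on whatever schedule matches how often the data changes.
static void buildMaterializedView(Connection conn) throws SQLException {
    try (Statement stmt = conn.createStatement()) {
        stmt.execute("CREATE MATERIALIZED VIEW mv_report"
                + " BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND AS"
                + " SELECT id, name, category FROM big_table");
    }
}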
Once you have the query performing well from SQL*Plus, add the network back into the equation. Run it from your web browser and see what overhead is being added.
If it is still slow, then you need to determine whether the problem is the act of AJAXing the data, or whether the slowness occurs after the page does something with the data (i.e. populating a data grid via JavaScript).
If the slowness is because the browser is waiting for the data, then you want to make sure it is only ever fetched once. You can do this by setting cache headers on the AJAX response so the result is cached for a year, or by storing the results in localStorage.
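Assuming the Java side serves the data through a servlet (an assumption; adapt to your stack, and the class name is invented), the one-year cache header might look like this sketch:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: mark the nearly-static response as cacheable for one year so
// the browser only fetches the 39k rows once.
public class ReportServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("application/json");
        resp.setHeader("Cache-Control", "public, max-age=31536000"); // 1 year
        resp.getWriter().write(loadReportJson());
    }

    private String loadReportJson() {
        return "[]"; // placeholder for your existing query + serialisation
    }
}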
If the slowness is due to the browser working with the 39k rows (i.e. moving the data into a data grid), then you have a few options:
find a better approach or library
use pagination
You may find performance issues in each of these areas. Most likely the query just needs tuning; adding indexes, or pre-querying the data and storing the results, will solve the problem.
Another thing to consider is whether you really need 39k rows at one time. If you can, paginate at the DB level so you return, say, 100 rows per page.
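On Oracle 12c and later, the paging can be pushed into the query itself; a sketch, with placeholder table and column names (older Oracle versions need a ROWNUM subquery instead):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch: fetch one 100-row page at a time (Oracle 12c+ syntax).
static void fetchPage(Connection conn, int page) throws SQLException {
    String sql = "SELECT id, name FROM big_table ORDER BY id"
            + " OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, page * 100); // rows to skip
        ps.setInt(2, 100);        // page size
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // render this page only
            }
        }
    }
}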
What's the fastest way to get a large volume of data from an Oracle database into Java objects?
Are there any Oracle tricks as to the way the data should be organised?
I was thinking of using plain JDBC rather than any Hibernate-style libraries.
Would it be better to have Oracle produce a file and then read from the file, although this would have to be done programmatically?
All thoughts appreciated.
I am not a Java or JDBC expert, but if you plan on pulling a lot of rows down from a database, you will likely benefit from increasing the row prefetch on the connection.
import java.sql.*;
import oracle.jdbc.OracleConnection;

// Connection details here are placeholders for your own
Connection conn = DriverManager.getConnection(
        "jdbc:oracle:thin:@//dbhost:1521/service", "user", "password");
// Set the default row prefetch setting for this connection
((OracleConnection) conn).setDefaultRowPrefetch(100);
The Oracle JDBC driver's default is to prefetch only 10 rows per round trip, so with a large result set you pay for many round trips to the database. Setting the prefetch to a larger number fetches more rows per round trip. The speed increase can be dramatic, depending on the number of rows and the performance of your network.
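The same effect is available per statement through the standard JDBC API, which avoids the Oracle-specific cast; a sketch (query and fetch size are illustrative, to be tuned against your network):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Standard JDBC alternative: hint a fetch size on the statement itself.
static ResultSet fetchWithPrefetch(Connection conn) throws SQLException {
    PreparedStatement ps = conn.prepareStatement(
            "SELECT col1, col2 FROM my_table");
    ps.setFetchSize(500); // rows per round trip
    return ps.executeQuery();
}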
Depending on how far you want to take this, I'd imagine that dropping JDBC and writing a custom application residing on the same machine as the DB, using the Oracle Call Interface (OCI) and JNI, would be the fastest...
It's probably much simpler to just use a plain prepared statement via JDBC, and then, if that's not enough (depending on where the bottleneck is), try a stored procedure. The caching done by ORMs like Hibernate should not be discounted, though, so I guess you'd have to do some benchmarks. Also, if the bottleneck is the database and you write a stored procedure that improves read performance, you could still use Hibernate to marshal the data into Java objects. See Using stored procedures for querying
Whatever you wind up doing, design for and implement "lazy initialization". (This really only applies to complex object hierarchies/networks; you said Java objects, plural, so I'm imagining something more than a single table mapping to a single object.) Basically, you only read in the objects that are needed at the time; when a getter method runs, it makes further DB calls for just that data.
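A minimal sketch of that pattern (the class, table, and column names are invented for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch of lazy initialization: the child rows are only queried the
// first time the getter is called, not when the parent is loaded.
public class Order {
    private final long id;
    private final Connection conn;
    private List<String> lines; // null until first requested

    public Order(long id, Connection conn) {
        this.id = id;
        this.conn = conn;
    }

    public List<String> getLines() throws SQLException {
        if (lines == null) {               // first access: hit the DB now
            lines = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT description FROM order_line WHERE order_id = ?")) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lines.add(rs.getString(1));
                    }
                }
            }
        }
        return lines;
    }
}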
Another trick sometimes overlooked in the Java world: if you have complex SQL coming from the code, you can instead create a view on the Oracle side, embedding that complexity there, and then map your object to the view. If you can flatten your object like the view, you're in business.
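For instance (all names illustrative), the join complexity can live in the view while the Java side stays a flat one-row-per-object read:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: push the join into an Oracle view, then map the flat view
// to a flat Java object.
static void readThroughView(Connection conn) throws SQLException {
    try (Statement stmt = conn.createStatement()) {
        stmt.execute("CREATE OR REPLACE VIEW v_order_flat AS"
                + " SELECT o.id, o.created, c.name AS customer_name"
                + " FROM orders o JOIN customers c ON c.id = o.customer_id");
    }
    try (PreparedStatement ps = conn.prepareStatement(
                 "SELECT id, created, customer_name FROM v_order_flat");
         ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // one row maps to one flat object: no join logic in Java
        }
    }
}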
I have a database with a lot of web pages stored.
I will need to process all the data I have, so I have two options: pull the data into the program, or process it directly in the database with some functions I will create.
What I want to know is:
is doing some processing in the database, rather than in the application, a good idea?
when is this recommended, and when not?
what are the pros and cons?
is it possible to extend the language with new features (external APIs/libraries)?
I tried retrieving the content into the application (it worked), but it was too slow and messy. My worry is that I can't do in the database what I can do in Java, but I don't know if this is true.
Just one example: I have a table called Token. At the moment it has 180,000 rows, but this will grow to over 10 million. I need to do some processing to determine whether a word between two tokens classified as 'Proper Name' is part of a name or not.
I will need to process all the data. In this case, is doing it directly in the database better than pulling it into the application?
My worry is that I can't do in the database what I can do in Java, but I don't know if this is true.
No, that is not a correct assumption. There are valid circumstances for processing data in the database. For example, if the work involves a lot of disparate SQL statements that can be combined into a stored procedure, then you should do the processing in the stored procedure and call that procedure from your Java application. This way you avoid making several network trips to the database server.
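Calling such a procedure from Java takes a single round trip (a sketch; the procedure name and parameter are made up):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch: one round trip runs all the combined SQL server-side.
static void runServerSide(Connection conn, int batchSize) throws SQLException {
    try (CallableStatement cs = conn.prepareCall("{call process_tokens(?)}")) {
        cs.setInt(1, batchSize);
        cs.execute();
    }
}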
I don't know what you are processing, though. Are you parsing XML data stored in your database? If so, perhaps you should use XQuery; many modern databases support it.
Just one example: I have a table called Token. At the moment it has 180,000 rows, but this will grow to over 10 million. I need to do some processing to determine whether a word between two tokens classified as 'Proper Name' is part of a name or not.
Is there some indicator in the data that tells you it is a proper name? Fetching 10 million rows (highly susceptible to an OutOfMemoryError) and then iterating over them is not a good idea. If there are parameters of the data that can be put in a WHERE clause to limit how much is fetched, that is the way to go in my opinion. You will certainly need to run explain plans on your SQL and check that the correct indices are in place; the index clustering factor and the type of index all make a difference. If you can't fully eliminate all the "improper names" in SQL, get rid of as many as you can there and process the rest in your application. I am assuming this is a batch application, right? If it is a web application, then you definitely want a batch application to stage the data before the web application queries it.
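If some rows really do have to come over to Java, filtering in SQL first and streaming the remainder with a modest fetch size keeps memory bounded; a sketch with an invented filter column:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch: eliminate as much as possible in the WHERE clause, then stream
// the remainder instead of loading millions of rows into memory at once.
static void classifyCandidates(Connection conn) throws SQLException {
    String sql = "SELECT id, word FROM token"
            + " WHERE token_class = 'Proper Name'"; // illustrative filter
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setFetchSize(1000); // rows per round trip; keeps heap usage flat
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // classify this candidate in Java, one row at a time
            }
        }
    }
}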
I hope my explanation makes sense. Please let me know if you have questions.
Directly interacting with the DB for every single operation is tedious and hurts performance. There are several ways around this: you can use indexing, caching, or tools such as Hibernate, whose caching can keep data in memory so that you don't need to query the DB for every operation. There are also tools such as Lucene's indexer, which are very popular and could solve your problem of hitting the DB every time.
I have a Java application that needs to display large amounts of data (on the order of 1 million data points). The data doesn't all need to be displayed at the same time but rather only when requested by a user. The app is a desktop app that is not running with an app server or hitting any centralized database.
My thought was to run a database on the machine and load the data into it. The DB will be read-only most of the time, so I should be able to add indexes to optimize queries. Since I'm running on a local system, I'm not sure whether I should also implement some caching (I don't yet know how fast the queries will run; I'm currently working on them).
Is this a logical way to approach the problem, or would there be better approaches?
Thanks,
Jeff
Display and data are two different things.
You don't give any details about either, but it could be possible to generate the display in the background, bringing in the data one slice at a time, and then displaying when it's ready. Lots of anything could cause memory issues, so you'll need to be careful. The database will help persist things, but it won't help you get ten pounds of data into your five pound memory bag.
UPDATE: If individuals are only reading a few points at a time, and display isn't an issue, then I'd say that any database will be able to handle it if you index the table appropriately. One million rows isn't a lot for a capable database.
An embedded DB seems reasonable. Check out JavaDB/Derby, H2, or HSQLDB.
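With H2, for instance, "running a database on the machine" amounts to one jar and a file URL; a minimal sketch (the file name and schema are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: an embedded H2 database is just a file, no server process.
public class EmbeddedDbDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:h2:~/appdata", "sa", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS points("
                    + "id BIGINT PRIMARY KEY, val DOUBLE)");
            // Index the column your queries filter on
            stmt.execute("CREATE INDEX IF NOT EXISTS idx_val ON points(val)");
        }
    }
}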
SQLite with a Java wrapper is fine too.
It really depends on your data. Do multiple instances request the data? If not, it is definitely worth looking at a simple SQLite database for storage. It is just a single file on your file system; no need to set up a server.
Well, it depends on the data size. A million integers, for example, isn't that much, but a million data structures/classes of, let's say, 1,000 bytes each is a lot.
For small data: keep it in memory
For large data: I think using the DB would be good.
Just my opinion :)
edit:
Of course it also depends on the speed you want to achieve. If you really need high speed and the data is big, you could cache some of it in memory and leave the rest in the DB.
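One simple way to do that hybrid is a LinkedHashMap in access order as a small LRU cache (a sketch; the class name, value type, and capacity are arbitrary):

import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: keep the most recently used entries in memory and fall back
// to the database on a miss.
public class PointCache extends LinkedHashMap<Long, double[]> {
    private static final int MAX_ENTRIES = 10_000;

    public PointCache() {
        super(16, 0.75f, true); // accessOrder = true makes this an LRU map
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, double[]> eldest) {
        return size() > MAX_ENTRIES; // evict the least recently used entry
    }
}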