I have a Java program that is used to insert a large number (750,000) of records into an Oracle database. I am using the OJDBC6 library with the OCI client. The table being written to contains 330 columns, of which 8 appear in one or more indexes.
After having tried two approaches, I'm still struggling with some performance issues.
Creating a prepared statement once, filling the parameters for each record and thereafter executing the statement takes 1h29.
Creating a prepared statement once, filling the parameters for each record, adding them to a batch and executing the batch every 500/1000/5000 (I tried several options) processed records takes 0h27.
However, when the same data is mapped to the same tables using an ETL tool like Informatica PowerCenter, it only takes a couple of minutes. I understand that it might be wishful thinking to match those timings, but I find it hard to believe that no further performance can be gained.
Does anyone have an idea of what reasonable timings are for this operation, and how they can be achieved? Any help is appreciated; many thanks in advance!
(A related question: I will have to update a lot of records, too. What would be the most efficient approach: keeping track of the columns that were changed and building a record-specific prepared statement containing only those columns, or always updating all columns so that the same prepared statement can be reused?)
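For reference, the batched approach described above looks roughly like this. It is only a minimal sketch: the table name, columns, and row representation are placeholders (the real statement would bind all 330 columns).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class BatchInsert {

    // Placeholder SQL: the real statement would list all 330 columns.
    private static final String SQL = "INSERT INTO my_table (id, name) VALUES (?, ?)";

    public static void insert(Connection conn, String[][] rows, int batchSize) throws Exception {
        conn.setAutoCommit(false);                 // commit per batch, not per row
        PreparedStatement ps = conn.prepareStatement(SQL);
        try {
            int count = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++count % batchSize == 0) {    // e.g. every 1000 records
                    ps.executeBatch();             // one round-trip for the whole batch
                    conn.commit();
                }
            }
            ps.executeBatch();                     // flush the remaining rows
            conn.commit();
        } finally {
            ps.close();
        }
    }
}
```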
Another thing to try would be dropping the indexes, inserting the data, then reloading the indexes. Not as easy from Java, but simple enough.
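A hedged sketch of what that could look like over plain JDBC, assuming the index names are known up front (the names here are placeholders). Note that an unusable unique index still blocks inserts, so this mainly helps with non-unique indexes:

```java
import java.sql.Connection;
import java.sql.Statement;

public class IndexMaintenance {

    // Index names are placeholders; in practice you could query USER_INDEXES for the table.
    private static final String[] INDEXES = { "MY_TABLE_IDX1", "MY_TABLE_IDX2" };

    public static void disableIndexes(Connection conn) throws Exception {
        Statement stmt = conn.createStatement();
        try {
            // Let DML skip unusable indexes instead of failing.
            stmt.execute("ALTER SESSION SET skip_unusable_indexes = TRUE");
            for (String idx : INDEXES) {
                // Mark the index unusable so inserts no longer have to maintain it.
                stmt.execute("ALTER INDEX " + idx + " UNUSABLE");
            }
        } finally {
            stmt.close();
        }
    }

    public static void rebuildIndexes(Connection conn) throws Exception {
        Statement stmt = conn.createStatement();
        try {
            for (String idx : INDEXES) {
                stmt.execute("ALTER INDEX " + idx + " REBUILD");
            }
        } finally {
            stmt.close();
        }
    }
}
```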
You can also parallelize the inserts across multiple threads in Java; ThreadGroup is one way to manage them:
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/ThreadGroup.html
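If you go down that road, a more common pattern is an ExecutorService rather than a raw ThreadGroup. A rough sketch, assuming each worker gets its own connection and its own slice of the data, and reusing the hypothetical BatchInsert helper sketched under the question above:

```java
import java.sql.Connection;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

public class ParallelLoader {

    public static void load(final DataSource ds, List<List<String[]>> slices) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(slices.size());
        for (final List<String[]> slice : slices) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        Connection conn = ds.getConnection();  // one connection per worker
                        try {
                            // Reuse the hypothetical batched-insert helper from above.
                            BatchInsert.insert(conn, slice.toArray(new String[0][]), 1000);
                        } finally {
                            conn.close();
                        }
                    } catch (Exception e) {
                        e.printStackTrace();                   // real code should handle/report this
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);              // wait for all workers to finish
    }
}
```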
Related
I wish to read all rows of a large table from PostgreSQL in Java. I am processing the rows one by one in the Java software.
By default the JDBC PostgreSQL driver reads all rows into memory, meaning my program runs out of memory.
The documentation talks of "Getting results based on a cursor" using st.setFetchSize(50); I have implemented that and it works well.
Is there any disadvantage to this approach? If not, I would like to enable it for all our queries, big and small; or is that a bad idea?
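For context, cursor-based fetching as described in the PostgreSQL JDBC documentation looks roughly like this (connection URL, query, and per-row processing are placeholders); the driver only uses a cursor when autocommit is off and the statement is forward-only:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CursorRead {

    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret");

        // Cursor-based fetching only kicks in with autocommit off and a forward-only statement.
        conn.setAutoCommit(false);

        Statement st = conn.createStatement();
        st.setFetchSize(50);                                        // 50 rows per round-trip

        ResultSet rs = st.executeQuery("SELECT * FROM big_table");  // placeholder query
        while (rs.next()) {
            process(rs.getString(1));                               // handle one row at a time
        }
        rs.close();
        st.close();
        conn.close();
    }

    private static void process(String value) {
        // Placeholder for the real per-row processing.
    }
}
```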
Well, if you have a fetch size of 50 and you get 1000 results, it will take 20 round-trips to the database. So no, it's not a good idea to enable it blindly without thinking about the actual queries being run.
A bigger question is why your ResultSets are so big that you run out of memory. Are you only loading data you're going to use and you just don't have a lot of memory, or are there perhaps poorly designed queries that return excessive results?
I have a situation here. I have a huge database with >10 columns and millions of rows. I am using a matching algorithm which matches each input record against the values in the database.
The database operation takes a lot of time when there are millions of records to match. I am thinking of using a multimap or some ResultSet alternative so that I can keep the whole table in memory and avoid hitting the database repeatedly...
Can anybody tell me what I should do?
I don't think this is the right way to go. You are trying to do the database's work manually in Java. I'm not saying that you are not capable of doing this, but most databases have been developed for many years and are quite good at doing exactly what you want.
However, databases need to be configured correctly for a given type of query to be executed fast. So my suggestion is that you first check whether you can tweak the database configuration to improve the performance of the query. The most common thing is to add the right indexes to your table. Read How MySQL Uses Indexes or the corresponding part of the manual of your particular database for more information.
The other thing is, with so much data, storing everything in main memory is probably not faster and might even be infeasible; not to mention that you would have to transfer all the data first.
In any case, try to use a profiler to identify the bottleneck of the program first. Maybe the problem is not even on the database side.
Hello, I would like to know which method is faster on Android.
I have a loop that processes thousands of rows, and I think performance is being affected by the SQLiteDatabase.insert() method.
With the insert() method I pass a ContentValues parameter, and I think that behind the scenes the method has to inspect every value and build the query.
With the execSQL() method, on the other hand, I pass the whole query as a parameter, and I think the method doesn't have to do as much work to execute it.
I don't know if I am right; I think execSQL() is better than insert() for a considerable quantity of data, but I am not sure...
Well, I tried both and there is only a small difference between them... execSQL() is faster, but it's hard to notice.
However, my real problem was that I wasn't using transactions. Because SQLite has no server process controlling access to the data, every insert that is not wrapped in an explicit transaction gets its own implicit transaction, each of which has to be flushed to the physical database file. So if you run a thousand inserts in a loop with no surrounding transaction, you end up with a thousand transactions. In my test, a thousand inserts took 11 seconds without a transaction, and just 1 second with one.
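For reference, the transaction-wrapped version looks roughly like this on Android (table and column names are placeholders):

```java
import android.content.ContentValues;
import android.database.sqlite.SQLiteDatabase;

public class BulkInsertHelper {

    // Table and column names are placeholders.
    public static void insertAll(SQLiteDatabase db, String[] names) {
        db.beginTransaction();
        try {
            for (String name : names) {
                ContentValues values = new ContentValues();
                values.put("name", name);
                db.insert("my_table", null, values);  // all inserts share one transaction
            }
            db.setTransactionSuccessful();            // mark the transaction for commit
        } finally {
            db.endTransaction();                      // commits, or rolls back on error
        }
    }
}
```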
Most probably you won't see any difference. Most of the time is spent actually writing the data into the database, and query generation is a rather simple step. I recommend timing the inserts both ways if you want to see the difference yourself.
If I want to fetch a million rows in Hibernate, how would that work? Will Hibernate crash? How can I optimize it?
Typically you wouldn't use Hibernate for this. If you need to do a batch operation, use SQL or the Hibernate wrappers for batch operations. There is no way loading millions of records into memory is going to end well for your application: your app will thrash as the GC runs, or possibly crash. There has to be another option.
If you read one row and write one row at a time, it will probably work fine. Are you sure this is the way you want to read 1,000,000 rows? It will likely take a while.
If you want all the objects to be in memory at the same time, you might well be challenged.
The best optimization is probably to find a different way entirely. For example, you can dump data from the database using database tools much more quickly than by reading it with Hibernate.
You can select sums, maxes, and counts in the database without returning a million rows over the network.
What are you trying to accomplish, exactly?
For this you would be better off using Spring's JDBC tools with a row handler. They will run the query and then perform some action one row at a time.
Fetch only the columns you need, and try it out in a test environment.
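A rough sketch of that approach with Spring's JdbcTemplate and a RowCallbackHandler (query, columns, and fetch size are placeholders):

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;

public class RowByRowReader {

    public static void readAll(DataSource dataSource) {
        JdbcTemplate template = new JdbcTemplate(dataSource);
        template.setFetchSize(500);                   // stream in chunks instead of all at once

        // Query and column names are placeholders.
        template.query("SELECT id, name FROM big_table", new RowCallbackHandler() {
            public void processRow(ResultSet rs) throws SQLException {
                // Called once per row; only the current row needs to be in memory.
                long id = rs.getLong("id");
                String name = rs.getString("name");
                // ... process the row ...
            }
        });
    }
}
```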
You should try looking at the StatelessSession interface; an example of its use can be found here:
http://mrmcgeek.blogspot.com/2010/09/bulk-data-loading-with-hibernate.html
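For context, a minimal sketch of reading a large result set with a StatelessSession and a forward-only scroll ("MyEntity" and the per-row processing are placeholders):

```java
import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;

public class BulkReader {

    // "MyEntity" is a placeholder for a mapped entity.
    public static void readAll(SessionFactory sessionFactory) {
        StatelessSession session = sessionFactory.openStatelessSession();
        try {
            ScrollableResults results = session
                    .createQuery("from MyEntity")
                    .scroll(ScrollMode.FORWARD_ONLY);
            while (results.next()) {
                Object row = results.get(0);   // one entity at a time, no persistence context
                // ... process the row ...
            }
            results.close();
        } finally {
            session.close();
        }
    }
}
```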
I am connecting to an Oracle database through a Java program. The problem is that I am getting an OutOfMemory error because the SQL query returns 3 million records. I cannot increase the JVM heap size for some reason.
What is the best way to solve this?
Is the only option to run the SQL with LIMIT?
If your program needs to return 3 million records at once, you're doing something wrong. What do you need to do that requires processing 3 million records at once?
You can either split the query into smaller ones using LIMIT, or rethink what you need to do to reduce the amount of data you need to process.
In my opinion it is pointless to have queries that return 3 million records. What would you do with them? There is no point in presenting them to the user, and if you want to do some calculations it is better to run several queries that each return considerably fewer records.
Using LIMIT is one solution, but a better one would be to restructure your database and application so that you can have "smarter" queries that do not return everything in one go. For example, you could return records based on a date column; that way you could fetch only the most recent ones.
Application scaling is always an issue. One solution here is to do whatever you are trying to do in Java as a stored procedure in Oracle PL/SQL. Let Oracle process the data and use its internal query planner, limiting the amount of data flowing in and out and potentially causing major latency.
You can even write the stored procedure in Java.
A second solution is indeed to run limited queries, process them from several Java nodes, and collate the results. Look up map-reduce.
If each record is around 1 kilobyte, that means 3 GB of data; do you have that much memory available for your application?
It would be better if you explained the "real" problem, since OutOfMemory is not your actual problem.
Try restricting the query with a WHERE clause:
http://w3schools.com/sql/sql_where.asp
There are three possible solutions:
1. If retrieving 3 million records at once is not necessary, use LIMIT.
2. Consider using a meaningful WHERE clause.
3. Export the database entries into TXT, CSV, or Excel format with the tools that Oracle provides, and work from that file.
Cheers :-)
Reconsider your WHERE clause; see if you can make it more restrictive.
and/or
Use LIMIT.
Just for reference: in Oracle queries, the equivalent of LIMIT is ROWNUM.
E.g., ... WHERE ROWNUM <= 1000
If you get that large a response then take care to process the result set row by row so the full result does not need to be in memory. If you do that properly you can process enormous data sets without problems.
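A rough sketch of that kind of row-by-row processing over plain JDBC (query, columns, and fetch size are placeholders); the Oracle driver fetches rows in chunks of the configured fetch size rather than materializing the whole result:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingRead {

    public static void process(Connection conn) throws Exception {
        Statement st = conn.createStatement();
        st.setFetchSize(1000);                 // rows fetched per round-trip, not total rows in memory

        ResultSet rs = st.executeQuery("SELECT id, name FROM big_table");  // placeholder query
        try {
            while (rs.next()) {
                handle(rs.getLong("id"), rs.getString("name"));  // keep only the current row
            }
        } finally {
            rs.close();
            st.close();
        }
    }

    private static void handle(long id, String name) {
        // Placeholder for the real per-row work.
    }
}
```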