Hello, I would like to know which method is faster on Android.
I have a loop that processes thousands of rows, and I think performance is being hurt by the SQLiteDatabase.insert() method.
With the insert method I pass a ContentValues parameter, and I assume that behind the scenes the method has to inspect every value and build the query itself.
With the execSQL method, on the other hand, I pass the whole query as a parameter, so I assume it doesn't have as much work to do before executing it.
I don't know if I am right; I suspect execSQL is better than insert for a considerable quantity of data, but I am not sure...
Well, I tried that and there is only a small difference between them... execSQL is faster, but it's hard to notice.
However, my real problem was that I wasn't using transactions. Because SQLite doesn't have a server controlling access to the data, it has to close and reopen the physical file for every insert you do. So if you have a thousand inserts in a loop and there's no transaction around them, you end up doing a thousand transactions. In my test, a thousand inserts took 11 seconds without a transaction and just 1 second with one.
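A minimal sketch of the transaction-wrapped loop, assuming an already-open SQLiteDatabase and a placeholder table name my_table:

import android.content.ContentValues;
import android.database.sqlite.SQLiteDatabase;
import java.util.List;

public final class BulkInsert {

    // Wraps all inserts in a single transaction so SQLite commits once
    // instead of once per row.
    public static void insertAll(SQLiteDatabase db, List<ContentValues> rows) {
        db.beginTransaction();
        try {
            for (ContentValues cv : rows) {
                db.insert("my_table", null, cv); // "my_table" is a placeholder
            }
            db.setTransactionSuccessful(); // mark the transaction as committable
        } finally {
            db.endTransaction(); // commits if successful, rolls back otherwise
        }
    }
}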
Most probably you won't see any difference. Most of the time is spent actually writing the data into the database; generating the query is a rather simple step. I recommend timing the inserts both ways if you want to see the difference for yourself.
Related
I have a CSV file with more than 1 million records. I want to do some processing on these records and persist all of them in the DB.
I tried a few options:
Save all entities in one go
jpaRepository.save(entities);
This method takes forever and never completes. It works fine for a smaller number of records.
Save all entities one by one
entities.forEach(jpaRepository::save);
This method completes, but it takes a very long time and memory usage goes through the roof.
Here is what I would recommend, based just on your question:
Create a service that reads the file, say FileReaderService.
Create a service that writes a set number of records, say 1000 at a time; let us call it StorageService. Inject this into FileReaderService.
Put the @Transactional annotation on the save_N_records method.
Repeatedly call StorageService.save_N_records from FileReaderService. Each time you call it, make sure you write a log entry to monitor progress.
If it is at all possible, I would disable indexing on the table, so inserts are faster, then turn it back on when I am done inserting. Of course, this is never possible on an on-line system, only on off-line reporting systems. Hope this helps!
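A minimal sketch of that layout, assuming Spring Data JPA; MyEntity, MyEntityRepository, and the batch size of 1000 are placeholders, and save_N_records is rendered here as saveNRecords:

import java.util.ArrayList;
import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class StorageService {

    private final MyEntityRepository repository; // hypothetical Spring Data repository

    public StorageService(MyEntityRepository repository) {
        this.repository = repository;
    }

    // Each call runs in its own transaction, so only one batch of rows
    // sits in the persistence context at a time.
    @Transactional
    public void saveNRecords(List<MyEntity> batch) {
        repository.saveAll(batch);
    }
}

@Service
class FileReaderService {

    private static final int BATCH_SIZE = 1000;

    private final StorageService storageService;

    FileReaderService(StorageService storageService) {
        this.storageService = storageService;
    }

    public void importRecords(Iterable<MyEntity> parsedRecords) {
        List<MyEntity> batch = new ArrayList<>(BATCH_SIZE);
        long total = 0;
        for (MyEntity record : parsedRecords) {
            batch.add(record);
            if (batch.size() == BATCH_SIZE) {
                storageService.saveNRecords(batch);
                total += batch.size();
                System.out.println("Saved " + total + " records"); // progress log
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            storageService.saveNRecords(batch); // flush the final partial batch
        }
    }
}

Because StorageService is a separate bean, each saveNRecords call goes through the transactional proxy, so every batch gets its own transaction.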
I have a Java program that is used to insert a lot (750,000) of records into an Oracle database. I am using the OJDBC6 library with the OCI client. The table being written to contains 330 columns, 8 of which appear in one or more indexes.
After having tried two approaches, I'm still struggling with some performance issues.
Creating a prepared statement once, filling in the parameters for each record, and then executing the statement takes 1h29.
Creating a prepared statement once, filling in the parameters for each record, adding them to a batch, and executing the batch every 500/1000/5000 processed records (I tried several options) takes 0h27.
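For reference, a minimal sketch of that batched variant; the table and column names are placeholders (the real statement would list all 330 columns):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public final class OracleBatchInsert {

    private static final int BATCH_SIZE = 1000; // the value being tuned (500/1000/5000)

    public static void insertAll(Connection conn, List<String[]> records) throws SQLException {
        // Placeholder statement; the real one has 330 parameter markers.
        String sql = "INSERT INTO my_table (col1, col2) VALUES (?, ?)";
        conn.setAutoCommit(false); // avoid one commit per row
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int count = 0;
            for (String[] r : records) {
                ps.setString(1, r[0]);
                ps.setString(2, r[1]);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch(); // one round trip for the whole batch
                }
            }
            ps.executeBatch(); // flush any remaining rows
            conn.commit();
        }
    }
}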
However, when the same data is mapped to the same tables using an ETL tool like Informatica PowerCenter, it only takes a couple of minutes. I understand that it might be wishful thinking to reach those timings, but I find it hard to believe that no further performance can be gained.
Does anyone have an idea of what reasonable timings for this action would be, and how they can be achieved? Any help is appreciated; many thanks in advance!
(A related question: I will also have to update a lot of records. What would be the most efficient approach: keeping track of the columns that were changed and creating a record-dependent prepared statement containing only those columns, or always updating all columns, thereby reusing the same prepared statement?)
Another thing to try would be dropping the indexes, inserting the data, then reloading the indexes. Not as easy from Java, but simple enough.
You can use a ThreadGroup in Java:
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/ThreadGroup.html
;)
I'm parsing large log files (5+ GB) and extracting ad-hoc profiling lines (call name and execution time) that are placed in them. I want to insert those lines into a MySQL database.
My question is: should I execute the insert statement every time I get a line while parsing, or is there some best practice to speed everything up?
If there is any way that you could do a bulk insert, that would help a lot (or at least send your data to the database in batches, instead of making separate calls each time).
Edit
LOAD DATA INFILE sounds even faster ;o)
https://web.archive.org/web/20150413042140/http://jeffrick.com/2010/03/23/bulk-insert-into-a-mysql-database
There are better options.
See http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
In your case, I think writing the relevant records to a file and then using LOAD DATA INFILE is the best approach.
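A minimal sketch of that approach from JDBC, assuming MySQL Connector/J with local-infile enabled on both the client and the server; the connection details, the CSV path, and the profile_log table are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public final class LoadDataImport {

    public static void main(String[] args) throws SQLException {
        // allowLoadLocalInfile must be enabled on the connection (and
        // local_infile on the server) for LOAD DATA LOCAL to work.
        String url = "jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            // The extracted profiling lines are first written to a CSV file,
            // then loaded in a single bulk operation.
            stmt.execute(
                "LOAD DATA LOCAL INFILE '/tmp/profiling.csv' " +
                "INTO TABLE profile_log " +
                "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' " +
                "(call_name, exec_time)");
        }
    }
}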
For small updates, the number of transactions is critical for performance. So if you can perform a number of inserts in the same transaction, it will go much faster. I would try 100 inserts per transaction first.
If you don't want to follow the recommendations in Galz's link (which is excellent, BTW), then try opening the connection and preparing the statement once, then looping over your log files carrying out the inserts (using the prepared statement), and finally closing the statement and connection once at the end. It's not the fastest way of doing the inserts, but it's the fastest way that sticks to a "normal" JDBC approach.
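Putting those two suggestions together, a minimal sketch that reuses one prepared statement and commits every 100 inserts; the connection details and the profile_log table/columns are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public final class LogLineInserter {

    public static void insertAll(List<String[]> lines) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/mydb"; // placeholder connection details
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO profile_log (call_name, exec_time) VALUES (?, ?)")) {
            conn.setAutoCommit(false); // group inserts into explicit transactions
            int count = 0;
            for (String[] line : lines) {
                ps.setString(1, line[0]);               // call name
                ps.setLong(2, Long.parseLong(line[1])); // execution time
                ps.executeUpdate();
                if (++count % 100 == 0) {
                    conn.commit(); // one transaction per 100 inserts
                }
            }
            conn.commit(); // commit the remainder
        }
    }
}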
From Java:
JDBC batch insert
Example:
You do this with every insert: http://code.google.com/p/hunglish-webapp/source/browse/trunk/src/main/java/hu/mokk/hunglish/lucene/Indexer.java#232
You do this with every batch: http://code.google.com/p/hunglish-webapp/source/browse/trunk/src/main/java/hu/mokk/hunglish/lucene/Indexer.java#371
The size of the batch can be determined by the available memory.
Aside from insert speed, the other problem you may run into is memory. Whatever approach you use, you will still need to consider your memory usage as the records are loaded from the file. Unless you have a hard requirement on processing speed, it may be better to use an approach with a predictable memory footprint.
Recently one of my colleagues made a comment that I should not use
LIKE '%'||?||'%'
rather use
LIKE ?
in the SQL, and then replace the LIKE ? marker with LIKE '%'||?||'%' before executing the SQL. He made the point that with a single parameter marker, DB2 will always cache the statement and thus cut down on the SQL prepare time.
However, I am not sure whether that is accurate. To me it seems it should be the other way around, since we are doing more processing by doing a string replace on the SQL every time the query is executed.
Does anyone know if a single marker really speeds up execution? Just FYI: I am using the Spring 2.5 JDBC framework, and the DB2 version is 9.2.
My question is: does DB2 treat "LIKE ?" differently from "LIKE '%'||?||'%'" as far as caching and preparation go?
'LIKE ?' uses a PreparedStatement parameter marker. Prepared statements are an optimization at the JDBC driver level. The thinking is that databases analyze queries to decide how to process them most efficiently. The DB can then cache the resulting query plan, keyed on the full statement text. Reusing an identical statement reuses the query plan. So basically, if you are running the same query multiple times with different comparison strings, and if the query plan stays cached, then yes, using 'LIKE ?' will be faster.
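As an illustration, here are two equivalent ways to keep the statement text constant so the plan can be reused; the customers table and name column are placeholders:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public final class LikeSearch {

    // Wildcards concatenated inside the SQL; only the raw term is bound.
    public static int countMatchesSqlConcat(Connection conn, String term) throws SQLException {
        String sql = "SELECT COUNT(*) FROM customers WHERE name LIKE '%' || ? || '%'";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, term);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }

    // Wildcards added in Java; the bound value carries the '%' characters.
    public static int countMatchesJavaConcat(Connection conn, String term) throws SQLException {
        String sql = "SELECT COUNT(*) FROM customers WHERE name LIKE ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "%" + term + "%");
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }
}

In both variants the statement text never changes between executions, which is what allows the plan to be reused; what defeats caching is splicing the search term itself into the SQL string.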
Some useful (though somewhat dated) info on PreparedStatements:
Prepared Statements
More Prepared Statements
I haven't done too much DB2, not since the '90s, and I'm not really sure I'm understanding what your underlying question is. Way back then I got a phone call from the head of the DBA team: "What are you doing differently from every other programmer we've got!?" Mind you, this was early in my career, so tentatively I answered, "Nothing...", imagine it in kind of a whiny voice. "Well then, why do your queries take 50% of the CPU resources of any of the other guys'?" I took a quick poll of all the other guys and found I was the only one using prepared statements. Now, under the covers Spring automatically makes prepared statements, and statement caching in the database has improved a lot over the years, but if you make use of them properly, you can get the speedup there, AND it'll make the statement cache swap things out less often. It really depends on your use case: if you're only going to hit the query once, there would be no difference; if it's a few thousand times, obviously it would make a much greater difference.
in the SQL, and then replace the LIKE ? marker with LIKE '%'||?||'%' before executing the SQL. He made the point that with a single parameter marker, DB2 will always cache the statement and thus cut down on the SQL prepare time.
Unless DB2 is some sort of weird alien SQL database, or its driver does some crazy things, the database server will never see your prepared statement until you actually execute it. So you can swap clauses in and out of the PreparedStatement text all day long, and it will have no effect until you actually send it to the server by executing it.
Currently working on the deployment of an OFBiz-based ERP, we've come up against the following problem: some of the framework code calls resultSet.last() to find out the total number of rows in the result set. With the Oracle JDBC driver (v11 and v10), this tries to cache all of the rows in client memory, crashing the JVM because it doesn't have enough heap space.
After some research, the problem seems to be that the Oracle JDBC driver implements scrollable cursors on the client side, using a cache, instead of on the server. Using the DataDirect driver, that issue is solved, but then the call to resultSet.last() takes too long to complete, so the application server aborts the transaction.
Is there any way to implement scrollable cursors via JDBC in Oracle without resorting to the DataDirect driver?
And what is the fastest way to know the length of a given ResultSet?
Thanks in advance
Ismael
"what is the fastest way to know the length of a given resultSet"
The ONLY way to really know is to count them all. You want to know how many 'SMITH's are in the phone book? You count them.
If it is a small result set, and quickly arrived at, it is not a problem. E.g. there won't be many Gandalfs in the phone book, and you probably want to get them all anyway.
If it is a large result set, you might be able to do an estimate, though that's not generally something that SQL is well designed for.
To avoid caching the entire result set on the client, you can try:
select id, count(1) over () n from junk;
Then each row will have an extra column (in this case n) with the count of rows in the result set. But it will still take the same amount of time to arrive at the count, so there's still a strong chance of a timeout.
A compromise is to get the first hundred (or thousand) rows and not worry about pagination beyond that.
Your proposed "workaround" with count basically doubles the work done by the DB server. It must first walk through everything to count the number of results, and then do the same again and return the results. Much better is the method mentioned by Gary (count(*) over (), an analytic function). But even there, the whole result set must be built before the first row is returned to the client, so it is potentially slow and memory-consuming for large outputs.
The best way, in my opinion, is to select only the page you want on screen (+1 row to determine that a next one exists), e.g. rows 21 to 41, and have another button (use case) to count them all in the (rare) case someone needs it.
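A minimal sketch of that page-at-a-time approach on Oracle (pre-12c ROWNUM style), reusing the junk table from the example above; the ordering column and page bounds are placeholders:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public final class Pager {

    // Fetches rows offset .. offset + pageSize (1-based, inclusive), i.e.
    // one row more than a full page; if that extra row comes back, a next
    // page exists.
    public static List<Long> fetchPage(Connection conn, int offset, int pageSize) throws SQLException {
        String sql =
            "SELECT id FROM (" +
            "  SELECT t.id, ROWNUM rn FROM (" +
            "    SELECT id FROM junk ORDER BY id" + // placeholder ordering
            "  ) t WHERE ROWNUM <= ?" +
            ") WHERE rn >= ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, offset + pageSize); // upper bound includes the +1 probe row
            ps.setInt(2, offset);
            List<Long> ids = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong(1));
                }
            }
            return ids;
        }
    }
}

For example, fetchPage(conn, 21, 20) returns rows 21 to 41; if 21 rows come back, there is at least one more page.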