How do I update multiple rows efficiently?
One statement
Multiple statements
Can a single statement string become too large for SQL to handle (10000+ entries/rows)?
I have one single variable to modify, which is status:
| id | status |
My data is stored in a List (ArrayList).
It could be more efficient to use the LOAD DATA command in MySQL, provided you can structure your input as CSV. Use the REPLACE and/or IGNORE keywords as appropriate. This will be much faster than sending thousands of individual statements to MySQL.
If you want to use JDBC and do it efficiently, you should definitely check out this blog post about batch insert performance (it applies to updates too).
Generally speaking you need to add rewriteBatchedStatements=true to your connection string, for example:
Connection con = DriverManager.getConnection("jdbc:mysql://127.0.0.1:3306/database_name?rewriteBatchedStatements=true","login", "password");
This allows the driver to take batched prepared statements and rewrite them into a more efficient form.
See prepared statements: http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
Create your prepared statement outside the loop; then, inside the loop, just execute the prepared statement with updated parameters.
If your status is limited to a small set of values, I would split the list into subsets by status and then update each subset with a single statement. Even if there are 10K rows for a specific status, you can update them all in a single call using the IN operator. This reduces the number of round trips your application has to make for the update.
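A minimal sketch of that grouping idea, assuming the new statuses arrive as a map from id to status and the table is called id_status_table (both are assumptions for illustration, as are the class and method names). It only groups the ids and builds the parameterized SQL; binding the values and executing through JDBC is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StatusGroupingSketch {
    /** Groups row ids by their new status, keeping first-seen order. */
    static Map<String, List<Integer>> groupIdsByStatus(Map<Integer, String> statusById) {
        Map<String, List<Integer>> idsByStatus = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : statusById.entrySet()) {
            idsByStatus.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                       .add(entry.getKey());
        }
        return idsByStatus;
    }

    /** Builds one UPDATE with a status placeholder and one placeholder per id. */
    static String buildUpdateSql(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        // Table and column names are assumed, not taken from a real schema.
        StringBuilder sql =
            new StringBuilder("UPDATE id_status_table SET status = ? WHERE id IN (");
        for (int i = 0; i < idCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }
}
```

For each group you would prepare buildUpdateSql(ids.size()), bind the status as parameter 1 and the ids after it, and execute once per status.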
Also you may try batch updates like this, though I'm not sure whether they are efficient:
update `id_status_table` `row`
set `status` = (
select case `row`.`id` when 1 then 'one'
when 2 then 'two'
else 'three or more' end
);
While the query string for 10000 rows may get too big, you can apply such a query to chunks of 1000 rows each.
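Here is a rough sketch of the chunking, assuming the ids sit in a List and each chunk is turned into one parameterized CASE update. Note that this is a variant of the query above, not the same query: it drops the ELSE branch and adds a WHERE id IN (...) clause so that only the listed rows are touched. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkedCaseUpdateSketch {
    /** Splits a list into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    /** Builds one CASE-based UPDATE with placeholders for rowCount rows. */
    static String buildCaseUpdateSql(int rowCount) {
        if (rowCount < 1) {
            throw new IllegalArgumentException("need at least one row");
        }
        StringBuilder sql =
            new StringBuilder("UPDATE id_status_table SET status = CASE id");
        for (int i = 0; i < rowCount; i++) {
            sql.append(" WHEN ? THEN ?"); // bind as (id, status) pairs
        }
        sql.append(" END WHERE id IN (");
        for (int i = 0; i < rowCount; i++) {
            sql.append(i == 0 ? "?" : ", ?"); // bind the same ids again
        }
        return sql.append(")").toString();
    }
}
```

The parameters are bound in order: first rowCount (id, status) pairs, then the rowCount ids once more for the IN list.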
Related
I have a MySQL database where I need to do 1k or so updates, and I am contemplating whether it would be more appropriate to use executeBatch or executeUpdate. The PreparedStatement is to be built on an ArrayList of 1k or more ids (which are PKs of the table to be updated). For each update to the table I need to check whether it was updated or not (it's possible that the id is not in the table). If the id doesn't exist, I need to add that id to a separate ArrayList which will be used to do batch inserts.
Given the above, is it more appropriate to do:
Several separate executeUpdate() calls, storing the id whenever a row is not updated, or
Simply create a batch and use executeBatch(), which will return an array of either a 0 or 1 for each separate statement/id.
In case two, the overhead would be an additional array to hold all the 0 or 1 return values. In case one, the overhead would be due to executing each UPDATE separately.
Definitely executeBatch(), and make sure that you add "rewriteBatchedStatements=true" to your jdbc connection string.
The increase in throughput is hard to exaggerate. Your 1K updates will likely take barely longer than a single update, assuming that you have proper indexes and a WHERE clause that makes use of them.
Without the extra setting on the connection string, the time to do the batch update is going to be about the same as to do each update individually.
I'd go with batch since network latency is something to consider unless you are somehow running it on the same box.
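To pick out the ids that still need an insert, something along these lines could work. It is only a sketch: it assumes the ids were added to the batch in list order and that the driver reports usable per-statement counts, which is not guaranteed. JDBC allows a driver to return Statement.SUCCESS_NO_INFO instead of a count, so the sketch fails loudly in that case rather than guessing. The class and method names are hypothetical.

```java
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class BatchResultSketch {
    /**
     * Returns the ids whose UPDATE matched no row, given the array that
     * executeBatch() returned for a batch built from ids in the same order.
     */
    static List<Integer> idsMissingFromTable(List<Integer> ids, int[] updateCounts) {
        if (ids.size() != updateCounts.length) {
            throw new IllegalArgumentException("one update count per id expected");
        }
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < updateCounts.length; i++) {
            int count = updateCounts[i];
            if (count == Statement.SUCCESS_NO_INFO || count == Statement.EXECUTE_FAILED) {
                // The driver gave no row count, so "not in the table" is unknowable here.
                throw new IllegalStateException("no usable row count for id " + ids.get(i));
            }
            if (count == 0) {
                missing.add(ids.get(i));
            }
        }
        return missing;
    }
}
```

It is worth checking on your own setup what the counts look like once rewriteBatchedStatements=true is on, before relying on a 0 meaning "id not present".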
I want to insert data into Teradata with JDBC, but it is slow. How can I make it faster?
I wrote this code:
connection_tera= DriverManager.getConnection
(
"jdbc:teradata://192.168.x.xx/database=DBC,tmode=ANSI,charset=UTF8","dbc","dbc"
);
stmt_tera = connection_tera.prepareStatement("insert into a.b values(?)");
//some code here to start while loop
stmt_tera.setObject(i,reset.getobject(i));
stmt_tera.addBatch();
if(addedBatchNumber%100==0)
stmt_tera.executeBatch();
connection_tera.commit();
stmt_tera.clearBatch();
//some code here and finish while loop
Should I add a parameter like TYPE=FASTLOAD to the connection string, or something else?
If you are loading to an empty table I would consider using JDBC FastLoad. For more details on the performance of JDBC to insert data into a Teradata table please refer to the following article on the Teradata Developer Exchange: Speed up your JDBC/ODBC Applications
If your table is not empty, it may make sense to load the data to a staging (intermediate) table that is empty first. Then use the ANSI MERGE operation to apply the INSERT/UPDATE logic to the target table. The MERGE operation will perform faster than the traditional INSERT and UPDATE statements because the operation works at the block level instead of row level. In some instances you can even avoid spooling the source data before the data is applied to the target table.
Here is a collection of sample Teradata JDBC Driver programs. Programs 205 through 209 are examples of using FastLoad.
Additionally, you can look at it from another angle: insert multiple rows with a single query.
insert into table1 (First,Last) values ('Fred','Smith'),
('John','Smith'),
('Michael','Smith'),
('Robert','Smith');
The benefits are
Connecting to and interacting with the database is an expensive operation. Say you have to insert 100 rows: with your code the application fires 100 queries (100 DB interactions). Instead, build your SQL query as shown above, try the insert, and check the performance.
You avoid n separate database interactions.
The insert is considerably faster this way. This is a widely adopted technique for restoring/importing databases.
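If you go this route from Java, the statement can be generated with placeholders instead of inlined values. A small sketch, assuming the target database accepts multi-row VALUES lists at all (MySQL does, but not every database does, so check yours first). The helper name is made up, and the table and column names must be trusted constants because they are concatenated into the SQL.

```java
import java.util.Collections;

class MultiRowInsertSketch {
    /** Builds "INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ..." for rowCount rows. */
    static String buildInsertSql(String table, String[] columns, int rowCount) {
        if (columns.length < 1 || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One "(?, ?, ...)" group per row, one "?" per column.
        String rowPlaceholders =
            "(" + String.join(", ", Collections.nCopies(columns.length, "?")) + ")";
        return "INSERT INTO " + table
            + " (" + String.join(", ", columns) + ") VALUES "
            + String.join(", ", Collections.nCopies(rowCount, rowPlaceholders));
    }
}
```

The values are then bound row by row on a PreparedStatement, so nothing from the data ends up in the SQL text itself.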
Hope this will be helpful..
Cheers!
If I'm reading this correctly, you are executing and committing a batch that has only one insert statement in it. I don't think that is your intention (or, if it is, I think you are misunderstanding how batches are expected to be used).
It seems you need an inner loop that adds a number of statements to the batch, which you then submit via executeBatch().
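For what it's worth, here is roughly how I would expect the loop to look. In the posted code the if has no braces, so commit() and clearBatch() run on every iteration and the batch never holds more than one row. This sketch assumes auto-commit is already off, that the values are plain strings, and it reuses the insert statement from the question; the class and method names are mine.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchedInsertSketch {
    /** Inserts values in batches of batchSize; returns how many batches were sent. */
    static int insertInBatches(Connection con, List<String> values, int batchSize)
            throws SQLException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        try (PreparedStatement stmt = con.prepareStatement("insert into a.b values(?)")) {
            int pending = 0;
            for (String value : values) {
                stmt.setString(1, value);
                stmt.addBatch();
                pending++;
                if (pending == batchSize) { // braces keep all three steps together
                    stmt.executeBatch();
                    con.commit();           // assumes auto-commit was turned off
                    pending = 0;
                    flushes++;
                }
            }
            if (pending > 0) {              // send the last, partially filled batch
                stmt.executeBatch();
                con.commit();
                flushes++;
            }
        }
        return flushes;
    }
}
```

The final flush after the loop matters too: without it, any rows left over after the last full batch are never sent.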
My question is very simple and in the title. Google and stack overflow are giving me nothing so I figured it was time to ask a question.
I am currently in the process of making an sql query for when users register to my site. I have ALWAYS only used prepared statements b/c the extra coding in callable statements, and the performance hit of regular statements are both turn offs. However this query is causing me to think of possible alternatives to my previous one size fits all (prepared statements) ways.
This query has a total of 4 round trips to the database. The steps are
Insert a user into the database, get back the generated key (their user id) within a result set.
Take the user id and insert a row into the album table. Get back a generated key (album id)
Take the album id and insert a row into the images table. Get back a generated key (image id)
Take the image id and update the user table's current default column with the image id
Aside: For anyone interested in the way I am getting the keys back after my inserts it is with Statement.RETURN_GENERATED_KEYS and you can read a great article about this here - IBM Article
So anyway I'd like to know if the use of 4 round trip (but cacheable) prepared statements is okay or if I should go with batched (but not cacheable) statements?
JDBC batch statements let you reduce the number of round trips, on the condition that there is no data dependency among the rows you are inserting or updating. Your scenario fails this condition, because the changes depend on each other's data: each of statements 2 through 4 must pick up the ID generated by the statement before it.
On the other hand, four round-trips is definitely suboptimal. That is why scenarios like yours call for stored procedures: you can put all this logic into a create_user_proc, and return the user ID back to the caller. All insertions from 1 to 4 would happen inside your SQL code, letting you manage ID dependencies in SQL. You would be able to call this stored procedure in a single roundtrip, which is definitely faster, especially if you process multiple user registrations per minute.
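On the Java side, the call could look something like the sketch below. The procedure body is not shown, and its signature here (one IN parameter for the user name, one OUT parameter for the new user ID) is purely an assumption for illustration; your create_user_proc would take whatever your four inserts need.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

class RegisterUserSketch {
    // Hypothetical signature: create_user_proc(IN user_name, OUT user_id)
    static final String CALL_SQL = "{call create_user_proc(?, ?)}";

    /** Runs the whole registration in one round trip and returns the new user ID. */
    static long registerUser(Connection con, String userName) throws SQLException {
        try (CallableStatement call = con.prepareCall(CALL_SQL)) {
            call.setString(1, userName);
            call.registerOutParameter(2, Types.BIGINT);
            call.execute();
            return call.getLong(2);
        }
    }
}
```

The album, image, and default-image steps would all live inside the procedure, so the application never has to shuttle the intermediate keys back and forth.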
I would advise writing one stored procedure that does all four operations, passing all the required parameters from the application to it at once. Inside the stored procedure you can get the generated keys for each step.
To increase performance and reduce database round trips, I agree with dasblinkenlight and ajduke - stored procedures will achieve this.
But is this really a performance bottleneck in your application?
How often do users register on your site?
Compare this to how often information is read from these tables (once per page access?)
If the information in these tables is read thousands of times more often than it is written via new registrations, then it might not be worth going for the stored procedure approach.
Why you might not want to use stored procedures and stick to prepared statements:
not as portable as using prepared statements (a different syntax/language for each database, some simpler databases don't even support them)
will not work with ORM solutions such as JPA* - you mentioned using PreparedStatements directly so this probably does not apply to you, at least not now but it might limit you later on if you wanted to use ORM in the future
*JPA 2.1 might actually support stored procedures, but as of writing it has not yet been released.
If we use the Limit clause in a query which also has ORDER BY clause and execute the query in JDBC, will there be any effect in performance? (using MySQL database)
Example:
SELECT modelName from Cars ORDER BY manuDate DESC Limit 1
I read in one of the threads in this forum that, by default a set size is fetched at a time. How can I find the default fetch size?
I want only one record. Originally, I was using as follows:
SQL Query:
SELECT modelName from Cars ORDER BY manuDate DESC
In the JAVA code, I was extracting as follows:
if(resultSett.next()){
//do something here.
}
Definitely, the LIMIT 1 will have a positive effect on performance. Instead of the entire data set of matches (well, depending on the default fetch size) being returned from the DB server to the Java code, only one row will be returned. This saves a lot of network bandwidth and Java memory usage.
Always delegate constraints like LIMIT, ORDER BY, WHERE, etc. to SQL as much as possible instead of applying them on the Java side. The DB will do it much better than your Java code ever can (if the table is properly indexed, of course). Try to write the SQL query so that it returns exactly the information you need.
The only disadvantage of writing DB-specific SQL queries is that the SQL language is not entirely portable across DB servers, so you would have to change the queries every time you switch DB server. But in the real world it's very rare to switch to a completely different DB make anyway. Externalizing SQL strings to XML or properties files should help a lot.
There are two ways the LIMIT could speed things up:
by producing less data, which means less data gets sent over the wire and processed by the JDBC client
by potentially having MySQL itself look at fewer rows
The second one of those depends on how MySQL can produce the ordering. If you don't have an index on manuDate, MySQL will have to fetch all the rows from Cars, then order them, then give you the first one. But if there's an index on manuDate, MySQL can just look at the first entry in that index, fetch the appropriate row, and that's it. (If the index also contains modelName, MySQL doesn't even need to fetch the row after it looks at the index -- it's a covering index.)
With all that said, watch out! If manuDate isn't unique, the ordering is only partially deterministic (the order for all rows with the same manuDate is undefined), and your LIMIT 1 therefore doesn't have a single correct answer. For instance, if you switch storage engines, you might start getting different results.
I have some queries that run for quite a long time (20-30 minutes). If a lot of queries are started simultaneously, the connection pool is drained quickly.
Is it possible to wrap the long-running query into a statement (procedure) that will store the result of a generic query in a temp table, terminate the connection, and fetch (poll) the results later on demand?
EDIT: the queries and data structures are optimized, and tips like 'check your indices and execution plan' don't work for me. I'm looking for a way to store [maybe a] byte representation of a generic result set, for later retrieval.
First of all, 20-30 minutes is an extremely long time for a query - are you sure you aren't missing any indexes for the query? Do check your execution plan - you could get a huge performance gain from a well-placed index.
In MySQL, you could do
INSERT INTO `cached_result_table` (
SELECT your_query_here
)
(of course, cached_result_table needs to have the exact same column structure as your SELECT returns, otherwise you'll get an error).
Then, you could query these cached results (instead of the original tables), and only run the above query from time to time - to update the cached_result_table.
Of course, the query will need to run at least once initially, which will take the 20-30 minutes you mentioned. I suggest pre-populating the cached table before the data is requested, and keeping some locking mechanism to prevent the update query from running several times simultaneously. Pseudocode:
init:
    insert select your_big_query
work:
    if your_big_query cached table is empty or nearing expiration:
        refresh in the background:
            check flag to see if there's another "refresh" process running
            if yes
                end // don't run two your_big_queries at the same time
            else
                set flag
                re-run your_big_query, save to cached table
                clear flag
    serve data to clients always from cached table
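In Java the flag logic could be sketched like this. It is a simplification under stated assumptions: the cache lives in memory rather than in a cached table, the big query is passed in as a Supplier, time is passed in explicitly, and the refresh runs on whichever thread calls refreshIfStale (you would hand that call to a background executor yourself). All names are made up for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

class CachedResultSketch<T> {
    private final Supplier<T> bigQuery;      // stands in for your_big_query
    private final long maxAgeMillis;
    private final AtomicBoolean refreshRunning = new AtomicBoolean(false);
    private volatile T cached;
    private volatile long loadedAtMillis;

    /** init: run the big query once so the cache is never empty. */
    CachedResultSketch(Supplier<T> bigQuery, long maxAgeMillis, long nowMillis) {
        this.bigQuery = bigQuery;
        this.maxAgeMillis = maxAgeMillis;
        this.cached = bigQuery.get();
        this.loadedAtMillis = nowMillis;
    }

    /** Re-runs the query only if the cache is stale and no other refresh is running. */
    boolean refreshIfStale(long nowMillis) {
        if (nowMillis - loadedAtMillis < maxAgeMillis) {
            return false;                    // still fresh
        }
        if (!refreshRunning.compareAndSet(false, true)) {
            return false;                    // another refresh holds the flag
        }
        try {
            cached = bigQuery.get();
            loadedAtMillis = nowMillis;
            return true;
        } finally {
            refreshRunning.set(false);       // clear flag even if the query throws
        }
    }

    /** Clients are always served from the cache, never from the big query. */
    T get() {
        return cached;
    }
}
```

compareAndSet does the "check flag, set flag" step atomically, which is the part that is easy to get wrong with a plain boolean.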
An easy way to do that in Oracle is "CREATE TABLE sometempname AS SELECT...". That will create a new table using the result columns from the select.
Not quite sure what you are requesting.
Currently you have 50 database sessions. Say you get 40 running long-running queries, that leaves 10 to service the rest.
What you seem to be asking for is that those 40 queries run asynchronously (in the background) without clogging up the connection pool of 50. The question is, do you want those 40 running concurrently with (potentially) another 50 queries from the connection pool, or do you want them queued up in some way?
Queuing can be done (look into DBMS_SCHEDULER and DBMS_JOB). But you will need to deliver those results into some other table and know how to deliver that result set. The old fashioned way is simply to generate reports on request that get delivered to a directory on a shared drive or by email. Could be PDF or CSV or Excel.
If you want the 40 running concurrently alongside the 50 'connection pool' settings, then you may be best off setting up a separate connection pool for the long-running queries.
You can look into Resource Manager for terminating calls that take too long or too many resources. That way the quickie pool can't get bogged down in long running requests.
The most generic approach in Oracle I can think of is creating a stored procedure that will convert a result set into XML, and store it as CLOB XMLType in a table with the results of your long-running queries.
You can find more on generating XML from generic result sets here.
select dbms_xmlgen.getxml('select employee_id, first_name,
last_name, phone_number from employees where rownum < 6') xml
from dual