I want to insert data into Teradata with JDBC, but it is slow. How can I make it faster?
I wrote this code:
connection_tera = DriverManager.getConnection(
    "jdbc:teradata://192.168.x.xx/database=DBC,tmode=ANSI,charset=UTF8",
    "dbc", "dbc");
stmt_tera = connection_tera.prepareStatement("insert into a.b values(?)");
//some code here to start while loop
stmt_tera.setObject(i, reset.getObject(i));
stmt_tera.addBatch();
if (addedBatchNumber % 100 == 0)
    stmt_tera.executeBatch();
connection_tera.commit();
stmt_tera.clearBatch();
//some code here and finish while loop
Should I add a parameter like TYPE=FASTLOAD to the connection string, or something else?
If you are loading into an empty table, I would consider using JDBC FastLoad. For more details on the performance of JDBC when inserting data into a Teradata table, please refer to the following article on the Teradata Developer Exchange: Speed up your JDBC/ODBC Applications
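For reference, FastLoad is requested through the connection URL. A minimal sketch reusing the connection details from the question (FastLoad only engages for batched PreparedStatement inserts into an empty table, so the batching code itself stays the same):

// TYPE=FASTLOAD asks the Teradata JDBC Driver to use FastLoad where eligible.
Connection connection_tera = DriverManager.getConnection(
    "jdbc:teradata://192.168.x.xx/database=DBC,tmode=ANSI,charset=UTF8,TYPE=FASTLOAD",
    "dbc", "dbc");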
If your table is not empty, it may make sense to first load the data into an empty staging (intermediate) table, then use the ANSI MERGE statement to apply the INSERT/UPDATE logic to the target table. MERGE performs faster than separate INSERT and UPDATE statements because it works at the block level rather than the row level. In some cases you can even avoid spooling the source data before it is applied to the target table.
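As an illustration only (the staging table, key, and column names below are invented), the apply step after batch-loading the staging table could look like:

// Hypothetical tables: a.b is the target, a.b_stg the freshly loaded staging
// table, with key column id and one payload column col1.
try (Statement merge = connection_tera.createStatement()) {
    merge.executeUpdate(
        "MERGE INTO a.b AS tgt " +
        "USING a.b_stg AS src ON (tgt.id = src.id) " +
        "WHEN MATCHED THEN UPDATE SET col1 = src.col1 " +
        "WHEN NOT MATCHED THEN INSERT (id, col1) VALUES (src.id, src.col1)");
    connection_tera.commit();
}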
Here is a collection of sample Teradata JDBC Driver programs. Programs 205 through 209 are examples of using FastLoad.
Additionally, you can consider the other side of the coin: performing a multi-row insert with a single query.
insert into table1 (First,Last) values ('Fred','Smith'),
('John','Smith'),
('Michael','Smith'),
('Robert','Smith');
The benefits are:
Connecting to and interacting with the database is an expensive operation. Say you have to insert 100 rows: written row-at-a-time, your application fires 100 queries (100 database round trips). Instead, build the SQL query as shown above, try the insert, and compare the performance.
You avoid n database round trips.
Inserts are noticeably faster this way; it is a widely adopted technique for restoring/importing databases. A sketch of building such a statement from Java follows.
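A sketch of building one multi-row INSERT with bind parameters (so values are not concatenated into the SQL text); conn is an open Connection and table1 is the example table from above:

static void insertNames(Connection conn, List<String[]> rows) throws SQLException {
    StringBuilder sql = new StringBuilder("insert into table1 (First,Last) values ");
    for (int i = 0; i < rows.size(); i++) {
        sql.append(i == 0 ? "(?,?)" : ",(?,?)");
    }
    try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
        int p = 1;
        for (String[] row : rows) {
            ps.setString(p++, row[0]);  // First
            ps.setString(p++, row[1]);  // Last
        }
        ps.executeUpdate();
    }
}

Binding parameters rather than literal values also keeps the statement safe from SQL injection.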
Hope this helps.
Cheers!
If I'm reading this correctly, you are executing and committing a batch that has only one insert statement in it. I don't think that is your intention (or, if it is, I think you are misunderstanding how batches are expected to be used).
It seems like you need an inner loop that adds an arbitrary number of statements to the batch, which you then submit via executeBatch().
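A minimal sketch of that shape, borrowing the names from the question (the source ResultSet rs and its columnCount are assumptions):

// Fill the batch inside the loop and flush every batchSize rows.
final int batchSize = 100;
int count = 0;
while (rs.next()) {
    for (int i = 1; i <= columnCount; i++) {
        stmt_tera.setObject(i, rs.getObject(i));
    }
    stmt_tera.addBatch();
    if (++count % batchSize == 0) {
        stmt_tera.executeBatch();
        connection_tera.commit();
    }
}
stmt_tera.executeBatch();   // flush the final, partially filled batch
connection_tera.commit();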
Related
I have an Oracle table with ~10 million records that are not dependent on each other. An existing Java application executes the query and iterates through the returned Iterator, batching the records for further processing. The fetchSize is set to 250.
Is there any way to parallelize getting the data from the Oracle DB? One thing that comes to mind is to break down the query into chunks using "rowid" and then pass these chunks to separate threads.
I am wondering if there is some kind of standard approach in solving this issue.
A few approaches to achieve it:
Execute alter session force parallel QUERY parallel 32; at the DB level in your PL/SQL code just before the SELECT statement. You can adjust the value 32 depending on the number of nodes (RAC setup).
The ROWID-based approach you describe can also work, but the difficult part is returning the chunked SELECT results to Java and combining them, so it is a bit more involved; one way to slice the work is sketched below.
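One way to get disjoint chunks without computing ROWID ranges by hand is to hash the ROWID. A sketch (dataSource, big_table, and process(...) are placeholders; note each slice still scans the table, which DBMS_PARALLEL_EXECUTE's ROWID chunking would avoid at the cost of more setup):

int slices = 4;
ExecutorService pool = Executors.newFixedThreadPool(slices);
for (int slice = 0; slice < slices; slice++) {
    final int s = slice;
    pool.submit(() -> {
        try (Connection c = dataSource.getConnection();
             PreparedStatement ps = c.prepareStatement(
                 "SELECT id, payload FROM big_table WHERE MOD(ORA_HASH(ROWID), ?) = ?")) {
            ps.setInt(1, slices);
            ps.setInt(2, s);
            ps.setFetchSize(250);               // matches the question's fetchSize
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    process(rs.getLong(1), rs.getString(2));  // your handler
                }
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    });
}
pool.shutdown();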
I have to insert ~40K records into 2 tables (say table1 and table2) in the database.
The insert in table2 is conditional. A record should be inserted in table2 if and only if a record is inserted in table1 successfully.
Can this be done in a batch? I'm using the JDBC driver with Oracle 10g XE.
What is the best approach to do this? Should I go for connection pooling with multi-threading?
The executeUpdate method returns the number of rows affected by your statement. You could use this to check that the first insert executed successfully.
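A sketch of that check (the two PreparedStatements are assumed already prepared, auto-commit off):

int inserted = insertIntoTable1.executeUpdate();   // rows affected in table1
if (inserted == 1) {
    insertIntoTable2.executeUpdate();              // only runs after success
}
conn.commit();   // commit the pair together; roll back on error instead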
My suggestion is to perform the business logic for the operation as close to the data as possible. This means having a PL/SQL procedure act as an API for the functionality you wish to perform.
This will make your code trivial: a simple call to the database procedure, which returns something giving you the result.
All the logic applied to the data is then performed by code designed almost exclusively to manipulate data; Java can manipulate data too, but not as well as PL/SQL. Incidentally, it is also likely to be much faster. (This presentation on YouTube is very informative, if a little long: https://www.youtube.com/watch?v=8jiJDflpw4Y)
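From the Java side the call is then a one-liner. A sketch, assuming a hypothetical procedure insert_pair(p_payload IN VARCHAR2, p_rows_done OUT NUMBER) that performs both inserts and the conditional logic in one transaction:

try (CallableStatement cs = conn.prepareCall("{call insert_pair(?, ?)}")) {
    cs.setString(1, payload);
    cs.registerOutParameter(2, Types.NUMERIC);
    cs.execute();
    int rowsDone = cs.getInt(2);   // result reported back by the procedure
}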
I am executing a custom-built DML statement using the
namedParameterJdbcTemplate.update(sql, valueMap);
call, where the sql is built based on the values in the map. Here my map could get very large, and thus the sql might also get very lengthy. I understand that in Oracle there is no fixed limit on how long a query can be, and many factors, including the database configuration, may affect this value, but I would like to limit the query length to a fixed number.
What is the best way to limit the query length? Would the spring-batch API be any useful here?
Thanks in advance for any pointers.
I would choose one of these approaches:
Temporary table - insert the data in a batch into the temporary table and then use a MERGE INTO statement with that table.
Create a SQL type for your rows and bind just that one parameter (google for OraData - it is a bit tricky, but it works).
Both will let you keep a small, static query and therefore avoid potential problems with an overly large query (its parsing, polluting the library cache, etc.). A sketch of the first option follows.
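A minimal sketch of the temporary-table route (all table and column names here are invented):

// Batch the rows into a global temporary table with one small static
// statement, then apply a single MERGE.
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO tmp_rows (id, val) VALUES (?, ?)")) {
    for (Object[] row : rows) {            // rows: the data from the value map
        ps.setLong(1, (Long) row[0]);
        ps.setString(2, (String) row[1]);
        ps.addBatch();
    }
    ps.executeBatch();
}
try (Statement st = conn.createStatement()) {
    st.executeUpdate(
        "MERGE INTO target_table t USING tmp_rows s ON (t.id = s.id) " +
        "WHEN MATCHED THEN UPDATE SET t.val = s.val " +
        "WHEN NOT MATCHED THEN INSERT (t.id, t.val) VALUES (s.id, s.val)");
}
conn.commit();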
I have an application that logs a lot of data to a MySQL database. The in-production version already runs insert statements in batches to improve performance. We're changing the db schema a bit so that some of the extraneous data is sent to a different table that we can join on lookup.
However, I'm trying to properly design the queries to work with our batch system. I wanted to use MySQL's LAST_INSERT_ID() so I wouldn't have to worry about getting the generated keys and matching them up (which seems like a very difficult task).
However, I can't seem to find a way to add different insert statements to one batch, so how can I resolve this? I assume I need to build a second batch and add all the detail queries to that, but then LAST_INSERT_ID() loses its meaning.
s = conn.prepareStatement("INSERT INTO mytable (stuff) VALUES (?)");
while (!queue.isEmpty()) {
    s.setLong(1, System.currentTimeMillis() / 1000L);
    // ... set other data
    s.addBatch();
    // Add insert query for extra data if needed
    if (a.getData() != null && !a.getData().isEmpty()) {
        s = conn.prepareStatement(
            "INSERT INTO mytable_details (stuff_id,morestuff) VALUES (LAST_INSERT_ID(),?)");
        s.setString(1, a.getData());
        s.addBatch();
    }
}
This is not how batching works. Batching only works within one Statement, and for a PreparedStatement that means you can only add batches of parameters for one and the same statement. Your code also neglects to execute the statements.
For what you want to do, you should use setAutoCommit(false), execute both statements, and then commit() (or roll back if an error occurred).
Also, I'd suggest you look into the JDBC standard method of retrieving generated keys, as that will make your code less MySQL-specific. See also Retrieving AUTO_INCREMENT Column Values through JDBC.
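A sketch of that shape, reusing the tables from the question (auto-commit assumed off):

// Ask the driver for the generated key and bind it explicitly,
// instead of relying on LAST_INSERT_ID().
try (PreparedStatement master = conn.prepareStatement(
         "INSERT INTO mytable (stuff) VALUES (?)",
         Statement.RETURN_GENERATED_KEYS);
     PreparedStatement detail = conn.prepareStatement(
         "INSERT INTO mytable_details (stuff_id, morestuff) VALUES (?, ?)")) {
    master.setLong(1, System.currentTimeMillis() / 1000L);
    master.executeUpdate();
    try (ResultSet keys = master.getGeneratedKeys()) {
        if (keys.next() && a.getData() != null && !a.getData().isEmpty()) {
            detail.setLong(1, keys.getLong(1));
            detail.setString(2, a.getData());
            detail.executeUpdate();
        }
    }
    conn.commit();
}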
I've fixed it for now, though I wish there were a better way. I built an ArrayList of extra data values that I can associate with the generatedKeys returned from the batch inserts. After the first query batch executes, I build a second batch with the right ids/data.
How do I update multiple rows efficiently?
One statement
Multiple statements
Can a single statement string become too large for SQL to handle (10000+ entries/rows)?
I have a single column to modify, status:
| id | status |
My data is stored in a List (ArrayList).
It could be more efficient to use the LOAD DATA command in MySQL, provided you can structure your input in CSV format. Use the REPLACE and/or IGNORE keywords as appropriate. This will be much faster than thousands of individual statements to MySQL.
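Issued through JDBC it might look like the following sketch (the file path and column list are placeholders, and both the server and the driver must permit LOCAL INFILE):

try (Statement st = conn.createStatement()) {
    st.execute(
        "LOAD DATA LOCAL INFILE '/tmp/status.csv' " +
        "REPLACE INTO TABLE id_status_table " +
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' " +
        "(id, status)");
}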
If you want to use JDBC and do it efficiently, you should definitely check out this blog post about batch insert performance (it applies to updates too).
Generally speaking you need to add rewriteBatchedStatements=true to your connection string, for example:
Connection con = DriverManager.getConnection("jdbc:mysql://127.0.0.1:3306/database_name?rewriteBatchedStatements=true","login", "password");
This allows the driver to take the batched prepared statements and rewrite them into a more efficient form.
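For example, a batched status update over such a connection might look like this sketch (the statusById map is a placeholder for the data from the question):

try (PreparedStatement ps = con.prepareStatement(
        "UPDATE id_status_table SET status = ? WHERE id = ?")) {
    for (Map.Entry<Long, String> e : statusById.entrySet()) {
        ps.setString(1, e.getValue());
        ps.setLong(2, e.getKey());
        ps.addBatch();
    }
    ps.executeBatch();   // the driver can collapse this into far fewer round trips
}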
See prepared statements: http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
Create your prepared statement outside the loop; then, inside the loop, just execute it with updated parameters.
If your status is a limited set of values, then I would break the list into subsets based on status and update the set of rows in a single statement per subset. Even if there are 10K rows for a specific status, you can update multiple rows (using the IN operator) in a single call, as sketched below. This decreases the round trips your application needs to make for the updates.
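A sketch of that grouping (table and map names are placeholders):

// One UPDATE per distinct status value, with the matching ids bound
// into an IN list of placeholders.
static void updateByStatus(Connection conn, Map<String, List<Long>> idsByStatus)
        throws SQLException {
    for (Map.Entry<String, List<Long>> e : idsByStatus.entrySet()) {
        String in = String.join(",", Collections.nCopies(e.getValue().size(), "?"));
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE id_status_table SET status = ? WHERE id IN (" + in + ")")) {
            ps.setString(1, e.getKey());
            int p = 2;
            for (Long id : e.getValue()) {
                ps.setLong(p++, id);
            }
            ps.executeUpdate();
        }
    }
}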
You may also try batch updates like this, though I'm not sure whether they are efficient:
update `id_status_table` `row`
set `status` = (
select case `row`.`id` when 1 then 'one'
when 2 then 'two'
else 'three or more' end
);
While the query string for 10000 rows may get too big, you can apply such a query to each chunk of 1000 rows.