What I mean is: I have a program that executes inserts in batches of 100k. Each of these inserts is assigned a new ID from a sequence on insert. I want to keep the batch process for obvious reasons, but I also need to pull out each ID as it is created and do things with it before I move on to the next insert. Is there a way to do this?
Things work differently in PostgreSQL than in MySQL. First you have to write your insert as:
INSERT INTO foo (...) VALUES (...)
RETURNING id;
The RETURNING id clause is important, as it tells the insert statement to return something. You should then be able to pull the id back just as you would from a select statement.
I am not quite sure how the JDBC driver for PostgreSQL handles this with batch processing, though. If you have to, you could probably modify this to store the id in a temporary table or something that you could query afterwards.
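For a single insert, a minimal JDBC sketch of this might look as follows; the table foo, its id column, and the name column are placeholders rather than anything from the question, and with the PostgreSQL JDBC driver the RETURNING statement should be executable via executeQuery():

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch only: "foo", "name", and "id" are placeholder names.
static long insertAndGetId(Connection conn, String name) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO foo (name) VALUES (?) RETURNING id")) {
        ps.setString(1, name);
        try (ResultSet rs = ps.executeQuery()) { // RETURNING makes the INSERT return a row
            rs.next();
            return rs.getLong("id");             // the freshly generated id
        }
    }
}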
I am using Hibernate with MS SQL Server, writing software that integrates with an existing database. There is an INSTEAD OF INSERT trigger on the table that I need to insert into, and it messes up @@IDENTITY, which means that on Hibernate's save I can't get the id of the inserted row. I can't control the trigger (can't modify it). I saw this question, but it involves procedures, which my trigger does not have, so I thought my question is different enough. I can't post the whole trigger, but hopefully I can post enough to get the point across:
CREATE TRIGGER TrigName ON TableName
INSTEAD OF INSERT
AS
SET XACT_ABORT ON
BEGIN TRANSACTION
-- several DECLARE, SET statements
-- a couple of inserts into other tables for business logic
-- plain T-SQL statements without procedures or functions
...
-- this is the actual insert that I need to perform
-- to be honest, I don't quite understand how the INSERTED table
-- was filled with all necessary columns by this point, but for now
-- I accept it as is (I am no SQL pro...)
INSERT INTO ClientTable (<columns>)
SELECT <same columns> from INSERTED
-- a couple of UPDATE queries to unrelated tables
...
COMMIT TRANSACTION;
I was wondering if there is a reliable way to get the id of the row being inserted? One solution I thought of and tried to build is to install an ON INSERT trigger on the same table that writes the newly inserted row into a new table I added to the db. I'd use that table as a queue. After the transaction commits in Hibernate, I could go into that table and run a select with the info I just inserted (I still have access to it from the same method scope), get the id, and finally remove that row. This is a bulky solution, but it's the best I can come up with so far.
Would really appreciate some help. I can't modify existing triggers and procedures, but I can add something to the db if it absolutely does not affect existing logic (like that new table and an ON INSERT trigger).
To sum up: I need to find a way to get the ID of the row I just inserted with Hibernate's save call. Because of that INSTEAD OF INSERT trigger, Hibernate always returns identity=0. I need that ID because I have to do inserts into a few other tables during one transaction.
I think I found an answer to my question. To reply to @SeanLange's comment: I can't actually edit the insert code - it's done by another application, and a request to change that will take too long (or won't happen - it's a legacy application). What I did is add another ON INSERT trigger on the same table. Since I know the order of operations in the existing INSTEAD OF INSERT trigger, I can see that the last insert operation will be into the table I want, which means my ON INSERT trigger will fire right after that. In the scope of that trigger I have access to the inserted table, from which I pull out the id.
CREATE TRIGGER Client_OnInsert ON myClientTable
FOR INSERT
AS
BEGIN
    DECLARE @ID int;
    SET @ID = (SELECT ClientID FROM inserted);

    INSERT INTO ModClient (modClientId)
    OUTPUT @ID
    VALUES (@ID);
END
GO
Then in Hibernate (since I can't use save() anymore), I use a NativeQuery to do this insert. I set the parameters and run the list() method of the NativeQuery, which returns a List whose first and only element is the id I want.
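Roughly, the call looks like the sketch below; ClientTable, the Name column, clientName, and the session variable are placeholders, and the real insert naturally has more columns. The value only comes back because the Client_OnInsert trigger above emits it through its OUTPUT clause.

// Rough sketch of the NativeQuery insert described above (placeholder names).
List<?> result = session.createNativeQuery(
        "INSERT INTO ClientTable (Name) VALUES (:name)")
    .setParameter("name", clientName)
    .list();                                   // returns the value emitted by OUTPUT
int newId = ((Number) result.get(0)).intValue();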
This is a bulky way, I know. If anything really bad stands out to people, please let me know - I would really appreciate some feedback. I wanted to post this as a potential answer that has worked so far, but that does not mean it's very good. For this solution to work I did have to create another small table, ModClient, which I use as temporary id storage for this exact purpose.
I have a ton of raw HTML files that I'm parsing and inserting into a MySQL database via a connection in Java.
I'm using "REPLACE INTO" statements and this method:
public void migrate(SomeThread thread) throws Exception {
    PreparedStatement threadStatement = SQL.prepareStatement(threadQuery);
    thread.prepareThreadStatement(threadStatement);
    threadStatement.executeUpdate();
    threadStatement.close();

    for (SomeThread.Post P : thread.threadPosts) {
        PreparedStatement postStatement = SQL.prepareStatement(postQuery);
        P.preparePostStatement(postStatement);
        postStatement.executeUpdate();
        postStatement.close();
    }
}
I am running 3 separate instances of my program, each in its own command prompt and each with its own directory of HTML files to parse and commit.
I'm using HeidiSQL to monitor the database, and a funny thing is happening: I'll see that I have 500,000 rows in a table at one point, for example, then I'll close HeidiSQL and check back later to find that I now have 440,000 rows. The same thing occurs for both of the tables I'm using.
Both of my tables use a primary key called "id". Each table's IDs have their own domain, but is it possible their values overlap and are overwriting each other? I'm not sure this could be an issue, because I'd think SQL would differentiate between each table's "local" id values.
Otherwise I was thinking that, since I'm running 3 separate instances that each have their own connection to the DB, some kind of magic is happening where, right as one row is being committed, execution swaps to another commit statement, displaces the table, then goes back to the first commit, and some further magic causes the database to roll back the number of rows collected.
I'm pretty new to SQL, so I'm not too sure where to start. If somebody has an idea about what the heck is going on and could point me in the right direction, I'd really appreciate it.
Thanks
You might want to use INSERT INTO instead of REPLACE INTO: REPLACE INTO deletes any existing row with the same primary or unique key before inserting the new one, so colliding rows overwrite each other instead of adding to the count.
Other than that, data doesn't just disappear.
Here are some tips:
Do you have another thread running that actually deletes entries?
Do other people have access to the database?
Not sure what HeidiSQL may do. To exclude that possibility maybe use MySQL Workbench instead.
Yeah, now that I run a COUNT(*) query against my tables I see that all my rows are in fact there.
Most likely the HeidiSQL summary page is just a very rough estimate (for InnoDB tables the row counts it shows come from table statistics, which are approximate).
Thanks for the suggestion to use Workbench, Pete - I will try it and see if it is better than Heidi, as Heidi keeps freezing up on me on a regular basis.
I have 2 DBs, Database A and Database B.
What I want to achieve:
1. Build records from Database A and insert them into Database B
2. Process those records in my Java app
What I'm currently doing:
I use two separate queries:
For (1) I use INSERT INTO ... SELECT ...
For (2) I perform another SELECT.
My solution works but it isn't optimal since I'm getting the records from Database A twice (instead of just one time).
Is there a way to execute the INSERT INTO ... SELECT ... and get the inner select result as a ResultSet?
I know I can perform only a SELECT and then insert the records in a batch, but that's a bit cumbersome and I want to find out if there's a cleaner solution.
Your "cleaner" solution looks more cumbersome than a simple read and write operation.
Since you have to process the data that goes into Database B anyway, you can simply do this:
1. Read the data from Database A into your app
2. Process the data
3. Write the data to Database B from your app
Then you have a single read and a single write, and it stays simple.
You cannot get the result of an INSERT INTO ... SELECT as a ResultSet, because it is an INSERT statement.
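A rough JDBC sketch of that flow; connA and connB are assumed open connections, and source_table, target_table, the column names, and process() are placeholders for your own schema and logic.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: single read from A, process in the app, single batched write to B.
static void copyAndProcess(Connection connA, Connection connB) throws SQLException {
    try (Statement read = connA.createStatement();
         ResultSet rs = read.executeQuery("SELECT id, payload FROM source_table");
         PreparedStatement write = connB.prepareStatement(
                 "INSERT INTO target_table (source_id, processed_payload) VALUES (?, ?)")) {
        while (rs.next()) {
            String processed = process(rs.getString("payload")); // placeholder app logic
            write.setLong(1, rs.getLong("id"));
            write.setString(2, processed);
            write.addBatch();
        }
        write.executeBatch(); // one round trip for the whole write
    }
}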
Sadly, I do not think that this is possible. What you are trying to achieve are two distinct operations, i.e. an INSERT and a SELECT. However you cut it, you are still going to have to do at least one INSERT and one SELECT.
Use this for the two databases (table_name is a placeholder for the actual table in each database):
INSERT INTO Database2.table_name (field1, field2, field3)
SELECT field1, field2, field3 FROM Database1.table_name;
This assumes the tables in both databases have the same field names.
I have an application that logs a lot of data to a MySQL database. The in-production version already runs insert statements in batches to improve performance. We're changing the db schema a bit so that some of the extraneous data is sent to a different table that we can join on lookup.
However, I'm trying to properly design the queries to work with our batch system. I wanted to use MySQL's LAST_INSERT_ID() so I wouldn't have to worry about getting the generated keys and matching them up (which seems like a very difficult task).
However, I can't seem to find a way to add different insert statements to a batch, so how can I resolve this? I assume I need to build a second batch and add all the detail queries to that, but that means LAST_INSERT_ID() loses its meaning.
s = conn.prepareStatement("INSERT INTO mytable (stuff) VALUES (?)");
while (!queue.isEmpty()) {
    s.setLong(1, System.currentTimeMillis() / 1000L);
    // ... set other data
    s.addBatch();

    // Add insert query for extra data if needed
    if (a.getData() != null && !a.getData().isEmpty()) {
        s = conn.prepareStatement("INSERT INTO mytable_details (stuff_id, morestuff) "
                + "VALUES (LAST_INSERT_ID(), ?)");
        s.setString(1, a.getData());
        s.addBatch();
    }
}
This is not how batching works. Batching only works within one Statement, and for a PreparedStatement that means that you can only add batches of parameters for one and the same statement. Your code also neglects to execute the statements.
For what you want to do, you should use setAutoCommit(false), execute both statements, and then commit() (or rollback() if an error occurred).
Also, I'd suggest you look into the JDBC standard method of retrieving generated keys, as that will make your code less MySQL-specific. See also Retrieving AUTO_INCREMENT Column Values through JDBC.
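A hedged sketch of that combination, reusing the question's table names; Row and rows are placeholders for whatever structure holds the queued data.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Sketch: explicit transaction, per-row getGeneratedKeys() for the parent
// insert, and a batched insert for the detail rows.
static void insertWithDetails(Connection conn, List<Row> rows) throws SQLException {
    conn.setAutoCommit(false);
    try (PreparedStatement parent = conn.prepareStatement(
                 "INSERT INTO mytable (stuff) VALUES (?)", Statement.RETURN_GENERATED_KEYS);
         PreparedStatement detail = conn.prepareStatement(
                 "INSERT INTO mytable_details (stuff_id, morestuff) VALUES (?, ?)")) {
        for (Row row : rows) {                        // Row is a placeholder type
            parent.setLong(1, row.stuff);
            parent.executeUpdate();
            try (ResultSet keys = parent.getGeneratedKeys()) {
                if (keys.next() && row.data != null && !row.data.isEmpty()) {
                    detail.setLong(1, keys.getLong(1));   // the generated stuff_id
                    detail.setString(2, row.data);
                    detail.addBatch();
                }
            }
        }
        detail.executeBatch();
        conn.commit();
    } catch (SQLException e) {
        conn.rollback();
        throw e;
    }
}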
I've fixed it for now, though I wish there were a better way. I built an ArrayList of extra data values that I can associate with the generatedKeys returned from the batch inserts. After the first query batch executes, I build a second batch with the right ids/data.
So I have a database where a lot of data is being inserted from a Java application. Usually I insert into table1 and get the last id, then insert into table2 and get the last id from there, and finally insert into table3 and get that id as well to work with it within the application. I insert around 1000-2000 rows of data every 10-15 minutes.
Using a lot of small inserts and selects on a production web server is not really good, because it sometimes bogs down the server.
My question is: is there a way to insert multiple rows into table1, table2, and table3 without such a huge amount of selects and inserts? Is there some SQL-fu technique I'm missing?
Since you're probably relying on auto_increment primary keys, you have to do the inserts one at a time, at least for table1 and table2, because MySQL won't give you more than the very last generated key.
You should never have to select. You can get the last inserted id from the Statement using the getGeneratedKeys() method. See an example showing this in the MySQL manual for the Connector/J:
http://dev.mysql.com/doc/refman/5.1/en/connector-j-usagenotes-basic.html#connector-j-examples-autoincrement-getgeneratedkeys
Other recommendations:
Use multi-row INSERT syntax for table3 (see the sketch below).
Use ALTER TABLE DISABLE KEYS while you're importing, and re-enable them when you're finished.
Use explicit transactions. I.e. begin a transaction before your data-loading routine, and commit at the end. I'd probably also commit after every 1000 rows of table1.
Use prepared statements.
Unfortunately, you can't use the fastest method for bulk load of data, LOAD DATA INFILE, because that doesn't allow you to get the generated id values per row.
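As a sketch of the multi-row INSERT recommendation above: build one statement that inserts all the table3 rows in a single round trip. The column names (parent_id, value) are placeholders for whatever table3 actually stores.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch: one multi-row INSERT for table3 instead of many single-row inserts.
static void insertTable3Rows(Connection conn, long parentId, List<String> values)
        throws SQLException {
    if (values.isEmpty()) return;
    StringBuilder sql = new StringBuilder("INSERT INTO table3 (parent_id, value) VALUES ");
    for (int i = 0; i < values.size(); i++) {
        sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
    }
    try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
        int p = 1;
        for (String v : values) {
            ps.setLong(p++, parentId);
            ps.setString(p++, v);
        }
        ps.executeUpdate(); // all rows inserted in one statement
    }
}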
There's a lot to talk about here:
It's likely that network latency is killing you if each of those INSERTs is another network roundtrip. Try batching your requests so they only require a single roundtrip for the entire transaction.
Speaking of transactions, you don't mention them. If all three of those INSERTs need to be a single unit of work you'd better be handling transactions properly. If you don't know how, better research them.
Try caching requests if they're reused a lot. The fastest roundtrip is the one you don't make.
You could redesign your database such that the primary key is not a database-generated, auto-incremented value, but rather a client-generated UUID. Then you could generate all the keys for every record up front and batch the inserts however you like.
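A small sketch of that idea; the table and column names are made up for illustration.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.UUID;

// Sketch: generate primary keys client-side so every row (and its child rows)
// can be batched without round-tripping for generated ids.
static void batchInsert(Connection conn, List<String> payloads) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO table1 (id, payload) VALUES (?, ?)")) {
        for (String payload : payloads) {
            String id = UUID.randomUUID().toString(); // key is known before the insert
            ps.setString(1, id);
            ps.setString(2, payload);
            ps.addBatch();
            // the same id can now be reused for table2/table3 rows in their own batches
        }
        ps.executeBatch();
    }
}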