I would like to read from MySQL in real time.
The idea is simple: I use the binary log to trigger a SELECT statement.
On every change, I'd like to read only the new rows.
For now I only consider inserts.
So when someone does
insert into sometable(uid,somecolumn) values(uid,something)
My code will be triggered and do
select * from sometable where uid=uid
Of course, I have already recorded which columns form the primary key, because the binlog does not seem to provide that information.
I cannot find a tool to parse MySQL INSERT statements, so I use a regex to find out which column equals which value, and then extract the primary keys.
But the real problem is what happens if I do:
Insert into `table` (`col`) values (select 0 as `col` from `dummy`);
How can I find out that col=0?
Is it impossible to write a SELECT statement, triggered by the INSERT statement, that selects only the newly changed rows?
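The regex approach described in the question can be sketched like this. InsertParser is a hypothetical helper, and it assumes a single-row INSERT ... VALUES whose values are plain numbers, quoted strings or NULL, with no commas inside them. It also shows the limitation being asked about: for INSERT ... VALUES (SELECT ...) there is no literal to extract, because the value only exists once the server has run the SELECT, so the helper gives up and returns an empty map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InsertParser {
    // INSERT INTO <table> (<columns>) VALUES (<values>): one row, no nested parentheses.
    private static final Pattern SIMPLE_INSERT = Pattern.compile(
        "(?i)\\s*insert\\s+into\\s+`?\\w+`?\\s*\\(([^)]*)\\)\\s*values\\s*\\(([^()]*)\\)\\s*;?\\s*");
    // A value readable without executing anything: number, 'string' or NULL.
    private static final Pattern LITERAL = Pattern.compile("(?i)-?\\d+(\\.\\d+)?|'[^']*'|null");

    /** Returns column -> value, or an empty map if the values are not plain literals. */
    static Map<String, String> parse(String sql) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = SIMPLE_INSERT.matcher(sql);
        if (!m.matches()) {
            return result;
        }
        String[] cols = m.group(1).split(",");
        String[] vals = m.group(2).split(",");
        if (cols.length != vals.length) {
            return result;
        }
        for (int i = 0; i < cols.length; i++) {
            String val = vals[i].trim();
            if (!LITERAL.matcher(val).matches()) {
                return new LinkedHashMap<>(); // e.g. a sub-SELECT: unknown until executed
            }
            if (val.startsWith("'")) {
                val = val.substring(1, val.length() - 1); // strip the quotes
            }
            result.put(cols[i].trim().replace("`", ""), val);
        }
        return result;
    }
}
```

For example, parse("insert into sometable(uid,somecolumn) values(42,'something')") yields uid=42 and somecolumn=something, while the INSERT ... SELECT statement from the question yields nothing usable.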
In a TRIGGER you have access to the NEW values (and, in an UPDATE trigger, also to the OLD values). With them you can write code in the trigger to log, for example, just the changes. Something like:
IF NEW.col1 != OLD.col1 THEN INSERT INTO LOG ...; END IF;
IF NEW.col2 != OLD.col2 THEN INSERT INTO LOG ...; END IF;
I am implementing an application-specific data import feature from one database to another.
I have a CSV file containing, say, 10,000 rows. These rows need to be inserted or updated in the database.
I am using a MySQL database and inserting from Java.
Some of the rows may already be present in the database, in which case they need to be updated; rows that are not present need to be inserted.
One possible solution is to read the file line by line, check each entry in the database, and build insert/update queries accordingly. But creating and executing all those update/insert queries may take a long time. Sometimes my CSV file may have millions of records.
Is there any other faster way to achieve this feature?
I don't know how you determine "is already present", but if it's any kind of database-level constraint (probably a primary key?) you can use the REPLACE INTO statement: it inserts the row, and if the new row has the same value as an existing row for a PRIMARY KEY or UNIQUE index, the existing row is deleted first and the new one is inserted in its place.
It works just like INSERT basically:
REPLACE INTO table ( id, field1, field2 )
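VALUES ( 1, 'value1', 'value2' )
If a row with ID 1 already exists, it is deleted and a new row with these values is inserted in its place; otherwise the row is simply created. Because this is a delete plus an insert, any columns you don't list get their default values rather than keeping their old ones.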
Given that you're using MySQL, you could use the INSERT ... ON DUPLICATE KEY UPDATE ... statement, which functions similarly to the SQL-standard MERGE statement (see the MySQL documentation for INSERT ... ON DUPLICATE KEY UPDATE and the Wikipedia article on MERGE). The statement would look something like this:
INSERT INTO MY_TABLE
(PRIMARY_KEY_COL, COL2, COL3, COL4)
VALUES
(1, 2, 3, 4)
ON DUPLICATE KEY
UPDATE COL2 = 2,
COL3 = 3,
COL4 = 4
In this example I'm assuming that PRIMARY_KEY_COL is a primary or unique key on MY_TABLE. If the INSERT would fail because of a duplicate value in the primary or unique key, the UPDATE clause is executed instead. Also note (on the MySQL documentation page) that there are some gotchas associated with auto-increment columns on InnoDB tables.
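Applied to the CSV import in the question, this statement can be combined with JDBC batching so that rows go to the server in chunks instead of one check-then-write round trip per line. A minimal sketch, assuming a hypothetical target table whose first listed column is the primary key, naive comma splitting (no quoted fields), and the VALUES(col) function in the UPDATE clause (deprecated since MySQL 8.0.20 in favour of a row alias, but still accepted). CsvUpsert, buildUpsertSql and importCsv are illustrative names, not a library API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.stream.Collectors;

final class CsvUpsert {
    /** Builds INSERT ... ON DUPLICATE KEY UPDATE; the first column is assumed to be the key. */
    static String buildUpsertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String params = columns.stream().map(c -> "?").collect(Collectors.joining(", "));
        String updates = columns.subList(1, columns.size()).stream()
                .map(c -> c + " = VALUES(" + c + ")")
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }

    /** Streams CSV lines into batched upserts and commits once at the end. */
    static void importCsv(Connection conn, BufferedReader csv, String table,
                          List<String> columns, int batchSize) throws IOException, SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(buildUpsertSql(table, columns))) {
            int pending = 0;
            String line;
            while ((line = csv.readLine()) != null) {
                String[] fields = line.split(",", -1); // naive: no quoted commas
                if (fields.length < columns.size()) {
                    throw new IOException("Too few fields: " + line);
                }
                for (int i = 0; i < columns.size(); i++) {
                    ps.setString(i + 1, fields[i]); // MySQL converts strings to the column type
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();
            }
            conn.commit();
        } catch (SQLException | IOException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

With MySQL Connector/J, adding rewriteBatchedStatements=true to the JDBC URL lets the driver rewrite such a batch into multi-row INSERT statements, which can reduce round trips further.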
Share and enjoy.
Do you need to do this often or just once in a while?
I need to load CSV files into a database for analysis from time to time, so I created an SSIS solution with a Data Flow task that loads the CSV file into a table on SQL Server.
For more information, see this blog post:
http://blog.sqlauthority.com/2011/05/12/sql-server-import-csv-file-into-database-table-using-ssis/
Add a stored procedure for inserting. In the stored procedure, use a TRY...CATCH block to do the insert; if the insert fails, do an update instead. Then you can simply call this procedure from your program.
Alternatively:
UPDATE Table1 SET (...) WHERE Column1='SomeValue'
IF @@ROWCOUNT = 0
INSERT INTO Table1 VALUES (...)
I have an application that logs a lot of data to a MySQL database. The in-production version already runs insert statements in batches to improve performance. We're changing the db schema a bit so that some of the extraneous data is sent to a different table that we can join on lookup.
However, I'm trying to design the queries properly so that they work with our batch system. I wanted to use MySQL's LAST_INSERT_ID() so I wouldn't have to worry about retrieving the generated keys and matching them up (which seems like a very difficult task).
However, I can't find a way to add different INSERT statements to one batch, so how can I resolve this? I assume I need to build a second batch and add all the detail queries to it, but then LAST_INSERT_ID() loses its meaning.
s = conn.prepareStatement("INSERT INTO mytable (stuff) VALUES (?)");
while (!queue.isEmpty()){
    s.setLong(1, System.currentTimeMillis() / 1000L);
    // ... set other data
    s.addBatch();
    // Add insert query for extra data if needed
    if( a.getData() != null && !a.getData().isEmpty() ){
        s = conn.prepareStatement("INSERT INTO mytable_details (stuff_id,morestuff) "
                + "VALUES (LAST_INSERT_ID(),?)");
        s.setString(1, a.getData());
        s.addBatch();
    }
}
This is not how batching works. Batching only works within one Statement, and for a PreparedStatement that means that you can only add batches of parameters for one and the same statement. Your code also neglects to execute the statements.
For what you want to do, you should call setAutoCommit(false), execute both statements, and then commit() (or rollback() if an error occurred).
Also I'd suggest you look into the JDBC standard method of retrieving generated keys, as that will make your code less MySQL specific. See also Retrieving AUTO_INCREMENT Column Values through JDBC.
I've fixed it for now, though I wish there were a better way. I built an ArrayList of extra data values that I can associate with the generated keys returned from the batch inserts. After the first batch executes, I build a second batch with the right IDs and data.
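That workaround, combined with the transaction advice in the answer, might look roughly like the sketch below: one batch for the parent rows, then a second batch for the details keyed by the generated IDs, committed together. Table and column names come from the question; Item and Detail are hypothetical records standing in for the queued objects. It relies on getGeneratedKeys() returning one key per batched row, in batch order, after executeBatch(); MySQL Connector/J does this, but the JDBC specification leaves it implementation-defined, so other drivers may differ.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class BatchWithDetails {
    /** Hypothetical queued item: the "stuff" value plus optional extra data. */
    record Item(long stuff, String data) {}

    /** One row for mytable_details. */
    record Detail(long stuffId, String data) {}

    /** Pairs generated keys (in batch order) with the items that actually have extra data. */
    static List<Detail> detailRows(List<Item> items, List<Long> keys) {
        if (items.size() != keys.size()) {
            throw new IllegalStateException("expected one generated key per parent row");
        }
        List<Detail> rows = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            String data = items.get(i).data();
            if (data != null && !data.isEmpty()) {
                rows.add(new Detail(keys.get(i), data));
            }
        }
        return rows;
    }

    /** Two batches, one transaction: parents first, then details keyed by the generated IDs. */
    static void save(Connection conn, List<Item> items) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement parent = conn.prepareStatement(
                 "INSERT INTO mytable (stuff) VALUES (?)", Statement.RETURN_GENERATED_KEYS);
             PreparedStatement detail = conn.prepareStatement(
                 "INSERT INTO mytable_details (stuff_id, morestuff) VALUES (?, ?)")) {
            for (Item item : items) {
                parent.setLong(1, item.stuff());
                parent.addBatch();
            }
            parent.executeBatch();
            List<Long> keys = new ArrayList<>();
            try (ResultSet rs = parent.getGeneratedKeys()) { // driver-dependent after a batch
                while (rs.next()) {
                    keys.add(rs.getLong(1));
                }
            }
            for (Detail d : detailRows(items, keys)) {
                detail.setLong(1, d.stuffId());
                detail.setString(2, d.data());
                detail.addBatch();
            }
            detail.executeBatch();
            conn.commit();
        } catch (SQLException | RuntimeException e) {
            conn.rollback();
            throw e;
        }
    }
}
```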
I have a MySQL table with around 300,000 records. One column is a VARCHAR that contains a link (let's say http://www.mysite.com/123012993).
Using Java, every time I create a new record I need to know whether it already exists, i.e. whether exactly the same link is already in the database. If it's new, I insert it; otherwise I do nothing.
So I have the following:
String selectString = "Select count(link) from records_table where link = ?";
PreparedStatement ps = conn.prepareStatement(selectString);
ps.setString(1, "http://www.mysite.com/123012993");
ResultSet rsFinding = ps.executeQuery();
rsFinding.next();
if (t != 0) return false;
else { /* do normal insert */ }
However, the query to search the Text is very slow, we are talking around 1 minute. The insert itself is very fast. Everything runs on localhost.
Is this the right way to search for the text? Or should I index the database?
I was thinking of implementing a hash key to narrow the results, but I believe a query on 300,000 records shouldn't be too heavy.
Thanks
A couple of things:
PreparedStatement should not be prepared each time again and again. Prepare and reuse.
Your t is defined nowhere.
Let the DB do the work: most databases have a way to handle duplicates. For MySQL there's INSERT ... ON DUPLICATE KEY UPDATE ...
So use this command
INSERT INTO records_table (link) VALUES (?) ON DUPLICATE KEY UPDATE link = link
The part link = link is a no-op that is only there to satisfy the MySQL syntax.
There's also INSERT IGNORE, which is a bit easier to use (no need for the no-op), but it ignores more problems (other errors are downgraded to warnings too), which is bad.
I forgot to mention that you need a unique key constraint on link (a primary key is a special case of a unique key and thus fine too). The unique key also gives you an index on link, so the duplicate check no longer has to scan the whole table.
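Putting those points together, a sketch that prepares the statement once, reuses it, and lets the unique key do the duplicate check. records_table and link come from the question; the class name LinkStore and the index name uk_link are hypothetical. Depending on the column's length and character set, MySQL may refuse a unique index on the full VARCHAR because of index key length limits, in which case the hash-column idea from the question (a unique key on a hash of the link) becomes useful.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

final class LinkStore implements AutoCloseable {
    // One-time schema change: enforces uniqueness and indexes the lookup.
    static final String ADD_UNIQUE_KEY =
        "ALTER TABLE records_table ADD UNIQUE KEY uk_link (link)";
    // The duplicate check happens inside MySQL; "link = link" is a no-op update.
    static final String UPSERT_SQL =
        "INSERT INTO records_table (link) VALUES (?) ON DUPLICATE KEY UPDATE link = link";

    private final PreparedStatement insert;

    /** Prepares the statement once; reuse this object for every link. */
    LinkStore(Connection conn) throws SQLException {
        this.insert = conn.prepareStatement(UPSERT_SQL);
    }

    static void addUniqueKey(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(ADD_UNIQUE_KEY);
        }
    }

    /** Inserts the link, or does nothing if it is already present. */
    void save(String link) throws SQLException {
        insert.setString(1, link);
        insert.executeUpdate();
    }

    @Override
    public void close() throws SQLException {
        insert.close();
    }
}
```

Create one LinkStore per connection, call save(...) for each link, and close it when done. Adding the unique key fails if the table already contains duplicate links, so those would have to be cleaned up first.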
Sorry if my question is not specific or if it has been answered before. I tried looking for it and for a better way to ask but this is the most accurate way.
I have developed a program in Java in which I insert a new row into my database in the following way:
INSERT INTO table_name VALUES (?,?,?)
The thing is that I have this query in many parts of the program, and now I've decided to add a fourth column to my table. Do I have to update EVERY SINGLE query in the program with a new question mark? If I don't, it crashes.
What is the best way to proceed in these cases?
YES.
You need to add an extra ? (parameter placeholder) because you are using an implicit INSERT statement: you didn't specify the names of the columns into which the values will be inserted.
INSERT INTO table_name VALUES (?,?,?)
// the server assumes that you are inserting values for all
// columns in your table
// if you fail to supply a value for one of the columns, an exception will be thrown
The next time you write an INSERT statement, make sure you specify the column names, so that when you alter the table by adding an extra column (one that is nullable or has a default), you won't have to update all your placeholders.
INSERT INTO table_name (Col1, col2, col3) VALUES (?,?,?)
// the server knows that you are inserting values for a specific column
Do I have to update EVERY SINGLE query with a new question mark in the program?
Probably. What you should do, while you're updating every single one of those queries, is to encapsulate them into an object, probably using a Data Source pattern such as a Table Data Gateway or a Row Data Gateway. That way you Don't Repeat Yourself and the next time you update the table, you only have one place to update the query.
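A minimal Table Data Gateway in that spirit: all SQL for the table lives in one class, the column list is defined once, and the INSERT names its columns explicitly. The class name TableNameGateway and the columns col1, col2 and col3 are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class TableNameGateway {
    // The only place that knows the column list; edit here when the schema changes.
    static final List<String> COLUMNS = List.of("col1", "col2", "col3");

    // Explicit column names plus one placeholder per column, derived from COLUMNS.
    static final String INSERT_SQL = "INSERT INTO table_name (" + String.join(", ", COLUMNS)
            + ") VALUES (" + String.join(", ", Collections.nCopies(COLUMNS.size(), "?")) + ")";

    private final Connection conn;

    TableNameGateway(Connection conn) {
        this.conn = conn;
    }

    /** Inserts one row; values must be given in COLUMNS order. */
    void insert(Object... values) throws SQLException {
        if (values.length != COLUMNS.size()) {
            throw new IllegalArgumentException("expected " + COLUMNS.size() + " values");
        }
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (int i = 0; i < values.length; i++) {
                ps.setObject(i + 1, values[i]);
            }
            ps.executeUpdate();
        }
    }
}
```

Adding a fourth column then means editing COLUMNS (and the callers that supply the new value) instead of hunting down every INSERT in the program.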
Because of the syntax you've used, you might run into some issues. I'm referring to the lack of column names: your INSERT queries will start failing as soon as you change your table structure.
If you had used the following syntax:
INSERT INTO table_name (C1, C2, C3) VALUES (?,?,?)
then, assuming your new column has a proper default value, it would have worked fine.
I have a web service in Java that receives a list of information to be inserted or updated in a database. I don't know which entries need to be inserted and which need to be updated.
Which approach gives better performance?
1. Iterate over the list (an object list with the table's PK in it) and try to insert each entry into the database. If the insert fails, run an update.
2. Try to load the entry from the database. If a result is retrieved, update it; if not, insert the entry.
3. Another option? Tell me about it :)
In the first calls, I believe most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.
I'm talking about a DB table that could reach over 100 million entries in a mature form.
What will be your approach? Performance is my most important goal.
If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194
If performance is your goal, then first get rid of the word "iterate" from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first. Otherwise it is easy to find yourself updating the record you just inserted by accident. If you are doing this it helps to have an identifier you can look at to see if the record exists. If the identifier exists, then do the update otherwise do the insert.
The important thing is to understand the ratio between the number of inserts and the number of updates in the list you receive. IMHO you should implement an abstract strategy that says "persist this to the database". Then create concrete strategies that (for example):
check for the primary key: if zero records are found, do the insert, otherwise do the update
do the update and, if it fails, do the insert
others
Then pull the strategy to use (its fully qualified class name, for example) from a configuration file. This way you can easily switch from one strategy to another. If feasible (it may depend on your domain), you can also add a heuristic that selects the best strategy based on the entities in the input set.
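The strategy idea above might look like this in Java. To keep it runnable without a database, the strategies talk to a hypothetical Store interface, implemented here by an in-memory map; in real code its methods would wrap the JDBC SELECT, UPDATE and INSERT (PreparedStatement.executeUpdate() returns the affected-row count used by the update-first strategy). Beware that MySQL can report 0 affected rows for an UPDATE that matched a row without changing it, depending on the driver's found-rows setting. The strategy's class name would come from the configuration file; here it is just a string.

```java
import java.util.HashMap;
import java.util.Map;

final class Persistence {
    /** Hypothetical data-access abstraction; a JDBC version would run real SQL. */
    interface Store {
        boolean exists(long id);
        int update(long id, String value); // number of rows affected
        void insert(long id, String value);
    }

    /** The abstract strategy: "persist this to the database". */
    interface PersistStrategy {
        void persist(Store store, long id, String value);
    }

    /** Check for the primary key; insert if no record is found, otherwise update. */
    public static final class CheckThenWrite implements PersistStrategy {
        @Override
        public void persist(Store store, long id, String value) {
            if (store.exists(id)) {
                store.update(id, value);
            } else {
                store.insert(id, value);
            }
        }
    }

    /** Do the update; if it affected no rows, do the insert. */
    public static final class UpdateThenInsert implements PersistStrategy {
        @Override
        public void persist(Store store, long id, String value) {
            if (store.update(id, value) == 0) {
                store.insert(id, value);
            }
        }
    }

    /** Instantiates a strategy from its class name, e.g. one read from a config file. */
    static PersistStrategy load(String className) {
        try {
            return (PersistStrategy) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load strategy " + className, e);
        }
    }

    /** In-memory Store so the strategies can be exercised without a database. */
    static final class MapStore implements Store {
        final Map<Long, String> rows = new HashMap<>();
        int inserts;
        int updates;

        @Override
        public boolean exists(long id) {
            return rows.containsKey(id);
        }

        @Override
        public int update(long id, String value) {
            if (!rows.containsKey(id)) {
                return 0;
            }
            rows.put(id, value);
            updates++;
            return 1;
        }

        @Override
        public void insert(long id, String value) {
            rows.put(id, value);
            inserts++;
        }
    }
}
```

A heuristic selector would simply return one of these classes (or its name) based on, say, how many incoming IDs already exist.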
MySQL supports this:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
    INSERT INTO my_table (id, col1)
    SELECT _id, _col1;
EXCEPTION WHEN unique_violation THEN
    UPDATE my_table
    SET col1 = _col1
    WHERE id = _id;
END;
$$
LANGUAGE plpgsql;
You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
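A sketch of those four steps for MySQL over JDBC. The names master_table, tmp_incoming, id and val are hypothetical; CREATE TEMPORARY TABLE ... LIKE copies the master table's structure, and a MySQL temporary table is visible only to the connection that created it. The update runs before the insert, so freshly inserted rows are not updated again. Other databases will need different syntax (for example MERGE where it exists).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;
import java.util.Map;

final class SetBasedUpsert {
    static final String CREATE_TMP = "CREATE TEMPORARY TABLE tmp_incoming LIKE master_table";
    static final String LOAD_TMP = "INSERT INTO tmp_incoming (id, val) VALUES (?, ?)";
    static final String UPDATE_EXISTING =
        "UPDATE master_table m JOIN tmp_incoming t ON m.id = t.id SET m.val = t.val";
    static final String INSERT_NEW =
        "INSERT INTO master_table (id, val) SELECT t.id, t.val FROM tmp_incoming t"
        + " LEFT JOIN master_table m ON m.id = t.id WHERE m.id IS NULL";
    static final String DROP_TMP = "DROP TEMPORARY TABLE IF EXISTS tmp_incoming";

    /** Steps 2 and 3, in order: update rows that exist, then insert the rest. */
    static List<String> applySteps() {
        return List.of(UPDATE_EXISTING, INSERT_NEW);
    }

    static void upsertAll(Connection conn, Map<Long, String> incoming) throws SQLException {
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            try {
                st.execute(CREATE_TMP); // 1. temp table, visible only to this connection
                try (PreparedStatement ps = conn.prepareStatement(LOAD_TMP)) {
                    for (Map.Entry<Long, String> row : incoming.entrySet()) {
                        ps.setLong(1, row.getKey());
                        ps.setString(2, row.getValue());
                        ps.addBatch();
                    }
                    ps.executeBatch(); // bulk-load everything the web service received
                }
                for (String sql : applySteps()) {
                    st.executeUpdate(sql); // 2. update existing rows, 3. insert new ones
                }
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            } finally {
                st.execute(DROP_TMP); // 4. dispose of the temp table
            }
        }
    }
}
```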