I have a table in MySQL with around 300,000 records. One column is a VARCHAR and it contains a link (let's say, http://www.mysite.com/123012993)
Using Java, every time I create a new record I need to know whether it already exists, i.e. whether the exact link is already in the database. If it's new, proceed to insert; otherwise, do nothing.
So I have the following:
String selectString = "Select count(link) from records_table where link = ?";
PreparedStatement ps = conn.prepareStatement(selectString);
ps.setString(1, "http://www.mysite.com/123012993");
ResultSet rsFinding = ps.executeQuery();
rsFinding.next();
if (t != 0) return false;
else { // do normal insert }
However, the query to search for the text is very slow; we are talking around 1 minute. The insert itself is very fast. Everything runs on localhost.
Is this the right way to search for the text? Or should I index the database?
I was thinking of implementing a hash key to narrow the results, but I believe a query over 300,000 records shouldn't be too heavy.
Thanks
A couple of things:
A PreparedStatement should not be prepared again and again each time. Prepare it once and reuse it.
Your t is not defined anywhere.
Let the DB do the work: most databases have a way to handle duplicates. For MySQL there's INSERT ... ON DUPLICATE KEY UPDATE ...
So use this command
INSERT INTO records_table (link) VALUES (?) ON DUPLICATE KEY UPDATE link = link
The part link = link is a no-op to make the syntax acceptable to the MySQL parser.
There's also INSERT IGNORE, which is a bit easier to use (no need for the no-op), but it silently ignores more kinds of problems, which is bad.
I forgot to mention that you need a unique key constraint on link (a primary key is a special case of a unique key and thus fine too).
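Putting both points together, a minimal JDBC sketch could look like this (the table and column names come from the question; the index name, JDBC URL, and credentials are made up, and the index creation is a one-time step, not something to run per insert):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class LinkInserter {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "password")) {
            // One-time setup: a unique index on link is what makes both the
            // duplicate check and the upsert fast (index name is made up).
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE UNIQUE INDEX idx_link ON records_table (link)");
            }
            // Prepare once and reuse for every insert.
            String sql = "INSERT INTO records_table (link) VALUES (?) "
                    + "ON DUPLICATE KEY UPDATE link = link";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "http://www.mysite.com/123012993");
                ps.executeUpdate(); // inserts if new, does nothing if it exists
            }
        }
    }
}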
Related
I would like to do real-time reading from MySQL.
The idea is simple. I use the binary log to trigger the select statement.
Meanwhile I'd like to read only the new rows on every change.
And currently I only consider inserts.
So when someone does
insert into sometable(uid,somecolumn) values(uid,something)
My code will be triggered and do
select * from sometable where uid=uid
Of course I have already written down which columns form the primary key, because the binlog doesn't seem to carry that information.
I cannot find a tool to analyze MySQL insert statements, so I use a regex to find out which column equals which value, then extract the primary keys.
But the real problem is: what happens if I do
INSERT INTO `table` (`col`) SELECT 0 AS `col` FROM `dummy`;
How can I find out that col=0?
Is it possible to make a select statement that selects the newly changed rows, triggered by the insert statement?
In a TRIGGER, you have access to the OLD and NEW values (OLD is available in UPDATE and DELETE triggers; an INSERT trigger only sees NEW). With them, you can write code (in the TRIGGER) to log, for example, just the changes. Something like...
IF NEW.col1 != OLD.col1 THEN INSERT INTO LOG ...; END IF;
IF NEW.col2 != OLD.col2 THEN INSERT INTO LOG ...; END IF;
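Applied to the insert-only case in the question, a minimal sketch might look like this (the change_log table and trigger name are made up; sometable and its columns come from the question). MySQL's DELIMITER trick is a client-side feature, so the CREATE TRIGGER can be sent through JDBC as a single statement:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TriggerSetup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "password");
             Statement st = conn.createStatement()) {
            // An AFTER INSERT trigger only sees NEW values; change_log is a
            // hypothetical table capturing the newly inserted rows.
            st.execute("CREATE TRIGGER log_inserts AFTER INSERT ON sometable "
                    + "FOR EACH ROW "
                    + "INSERT INTO change_log (uid, somecolumn) "
                    + "VALUES (NEW.uid, NEW.somecolumn)");
        }
    }
}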
I have an application that logs a lot of data to a MySQL database. The in-production version already runs insert statements in batches to improve performance. We're changing the db schema a bit so that some of the extraneous data is sent to a different table that we can join on lookup.
However, I'm trying to design the queries properly to work with our batch system. I wanted to use MySQL's LAST_INSERT_ID() so I wouldn't have to worry about fetching the generated keys and matching them up (which seems like a very difficult task).
However, I can't seem to find a way to add different insert statements to a batch, so how can I resolve this? I assume I need to build a second batch and add all the detail queries to that, but then LAST_INSERT_ID() loses its meaning.
s = conn.prepareStatement("INSERT INTO mytable (stuff) VALUES (?)");
while (!queue.isEmpty()){
s.setLong(1, System.currentTimeMillis() / 1000L);
// ... set other data
s.addBatch();
// Add insert query for extra data if needed
if( a.getData() != null && !a.getData().isEmpty() ){
s = conn.prepareStatement("INSERT INTO mytable_details (stuff_id,morestuff)
VALUES (LAST_INSERT_ID(),?)");
s.setString(1, a.getData());
s.addBatch();
}
}
This is not how batching works. Batching only works within one Statement, and for a PreparedStatement that means that you can only add batches of parameters for one and the same statement. Your code also neglects to execute the statements.
For what you want to do, you should use setAutoCommit(false), execute both statements, and then commit() (or rollback() if an error occurred).
Also, I'd suggest you look into the JDBC standard method of retrieving generated keys, as that will make your code less MySQL-specific. See also Retrieving AUTO_INCREMENT Column Values through JDBC.
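A sketch of that approach, using standard JDBC generated keys instead of LAST_INSERT_ID() (table and column names come from your snippet; the queue is assumed to hold the detail strings your a.getData() would return, which simplifies your loop):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Queue;

public class BatchLogger {
    // Master/detail inserts in one transaction, committed or rolled back
    // together, with the generated key fetched through plain JDBC.
    public static void insertAll(Connection conn, Queue<String> queue)
            throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement master = conn.prepareStatement(
                     "INSERT INTO mytable (stuff) VALUES (?)",
                     Statement.RETURN_GENERATED_KEYS);
             PreparedStatement detail = conn.prepareStatement(
                     "INSERT INTO mytable_details (stuff_id, morestuff) VALUES (?, ?)")) {
            while (!queue.isEmpty()) {
                String data = queue.poll();
                master.setLong(1, System.currentTimeMillis() / 1000L);
                master.executeUpdate();
                try (ResultSet keys = master.getGeneratedKeys()) {
                    if (keys.next() && data != null && !data.isEmpty()) {
                        detail.setLong(1, keys.getLong(1)); // generated stuff_id
                        detail.setString(2, data);
                        detail.executeUpdate();
                    }
                }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}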
I've fixed it for now, though I wish there were a better way. I built an ArrayList of extra data values that I can associate with the generatedKeys returned from the batch inserts. After the first query batch executes, I build a second batch with the right ids/data.
When I try to delete the last row in a table using a PreparedStatement (I'm using a MySQL database), the row isn't deleted. I tried a DELETE FROM ... command and a TRUNCATE command (using executeUpdate()), but neither deleted that last row. What should I do in order to be able to delete it?
this is the command i have written:
String sqlDelete = "DELETE FROM free_time WHERE therapist_id=? AND from=? AND to=? AND date=?";
And I have checked that the parameters I'm passing to the PreparedStatement are correct, but still, the row isn't being deleted.
Thanks in advance.
"Last row" doesn't mean anything in a relational database. You don't need to know how the rows are stored.
You should be using a WHERE clause to identify what you need to DELETE.
I wouldn't recommend TRUNCATE.
You are using reserved words (from and to) as column names, and obviously suppressing exceptions ;).
String sqlDelete = "DELETE FROM free_time WHERE therapist_id=? AND `from`=? AND `to`=? AND `date`=?";
My question is a little vague at the moment, since I'm not sure whether I'm supposed to post any company code online. But here goes.
Suppose I need to update a specific field in a MySQL database. In order to do this from my Java client program, I have to use multiple SELECT statements to check that the field should be updated, and then update it appropriately using the retrieved information.
e.g.
//created a Connection called con already...
PreparedStatement selectStatement = con.prepareStatement("SELECT * FROM myTable" /*+ etc*/); //example query only! I'm not actually going to use "SELECT * FROM myTable"!
//more selectStatements follow
PreparedStatement updateStatement = con.prepareStatement("UPDATE myTable SET field1 = ? WHERE id = ?");
ResultSet rs = selectStatement.executeQuery();
//more ResultSets from the other selectStatements
//process the ResultSets and retrieve information that indicates whether an update must take place
if(conditionOccurred) { //Assuming we need to update
updateStatement.setString(...);
updateStatement.executeUpdate();
}
(I haven't included try-catch blocks in the code (sorry, I'm a bit lazy, and this is just a contrived example), but I'd have to catch the potential SQLExceptions as well, I guess...)
What I'm wondering is: given that I now need multiple SELECT statements to check whether an update should occur, will it still be more "expensive", or costly in terms of speed, to delete the row and then insert a new row containing all the updated information? (Memory is not a big issue at the moment, though if something I've done has a massive flaw in this regard, I'd love to hear it!)
TL;DR: If I use multiple SELECT statements and then an UPDATE to some field(s), would it be more efficient to simply DELETE and then INSERT a new row?
Extra details: the table I'm working with at the moment has an auto-incremented ID, a VARCHAR field (the one to be updated, has a uniqueness constraint), 2 date fields and a CHAR(64) field. Not sure if it helps in answering the question, but I'll provide it anyway.
Please let me know if there are more details you'd need, and thank you in advance to anyone who might provide some insight.
To fully answer your question we would need to see your SELECT statements; however, if your UPDATE does not alter the primary key values, I would assume the UPDATE is more efficient. The reasoning behind this is that the index values would not have to be adjusted, whereas in the case of the DELETE & INSERT they would be.
As in most cases, the only surefire way to test this is to use both methods and benchmark the elapsed time.
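A rough micro-benchmark along those lines might look like the following sketch (conn, myTable, field1, and id come from the question; the row data and iteration count are made up, this runs inside a method that declares SQLException, and single runs are only indicative since caching and I/O dominate):
long start = System.nanoTime();
try (PreparedStatement update = conn.prepareStatement(
        "UPDATE myTable SET field1 = ? WHERE id = ?")) {
    for (int i = 0; i < 10_000; i++) {
        update.setString(1, "value" + i);
        update.setInt(2, i);
        update.executeUpdate();
    }
}
long updateNanos = System.nanoTime() - start;

start = System.nanoTime();
try (PreparedStatement delete = conn.prepareStatement(
             "DELETE FROM myTable WHERE id = ?");
     PreparedStatement insert = conn.prepareStatement(
             "INSERT INTO myTable (id, field1) VALUES (?, ?)")) {
    for (int i = 0; i < 10_000; i++) {
        delete.setInt(1, i);
        delete.executeUpdate();
        insert.setInt(1, i);
        insert.setString(2, "value" + i);
        insert.executeUpdate();
    }
}
long deleteInsertNanos = System.nanoTime() - start;

System.out.printf("UPDATE: %d ms, DELETE+INSERT: %d ms%n",
        updateNanos / 1_000_000, deleteInsertNanos / 1_000_000);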
I'm answering your question based on the knowledge I acquired in my advanced database management course. I would say it is quite subjective, since your concern here is speed rather than memory usage.
When retrievals are done via your SELECT statements, the data is cached, and when an update is required you edit the fields directly in the cache. This saves a read and a write trip compared with performing the DELETE-and-INSERT alternative.
In my understanding this saves you processing time on the order of milliseconds for a single transaction, and looking at the big picture it will save you a lot when many transactions are performed. However, if your SELECT statements involve too many queries over a large amount of data, the DELETE-and-INSERT method might turn out to be more efficient.
I believe that with your SQL statements as additional input, we would be able to give you better and more accurate advice. :) I hope it helps.
I have a web service in Java that receives a list of information to be inserted or updated in a database. I don't know which entries are inserts and which are updates.
Which approach gives the better performance:
Iterate over the list (an object list, with the table PK in it) and try to insert each entry into the database. If the insert fails, run an update.
Try to load the entry from the database. If it is found, update it; if not, insert it.
Another option? Tell me about it :)
On the first calls, I believe most of the entries will be new DB entries, but there will be a saturation point after which most of them will be updates.
I'm talking about a DB table that could reach over 100 million entries in a mature form.
What will be your approach? Performance is my most important goal.
If your database supports MERGE, I would have thought that would be the most efficient option (it treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194
If performance is your goal, then first get rid of the word "iterate" from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first; otherwise it is easy to find yourself accidentally updating the record you just inserted. If you are doing this, it helps to have an identifier you can check to see whether the record exists: if the identifier exists, do the update, otherwise do the insert.
The important thing is to understand the balance, or ratio, between the number of inserts and the number of updates in the list you receive. IMHO you should define an abstract strategy that says "persist this in the database", then create concrete strategies that (for example, as sketched after this list):
check for the primary key and, if zero records are found, do the insert, otherwise update;
do the update and, if it fails, do the insert;
others.
Then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If feasible (it may be, depending on your domain), you can add a heuristic that selects the best strategy based on the input entities in the set.
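A minimal sketch of that strategy idea (Entity, Dao, and the class names here are all made up for illustration; Dao is assumed to wrap the JDBC calls, with update returning the affected-row count from executeUpdate()):
import java.sql.SQLException;

interface Entity { int getId(); }
interface Dao {
    boolean existsById(int id) throws SQLException;
    void insert(Entity e) throws SQLException;
    int update(Entity e) throws SQLException; // rows affected
}

// The abstract "persist this in the database" contract.
interface PersistStrategy {
    void persist(Entity e, Dao dao) throws SQLException;
}

// Strategy 1: check by primary key, then insert or update.
class SelectFirstStrategy implements PersistStrategy {
    @Override
    public void persist(Entity e, Dao dao) throws SQLException {
        if (dao.existsById(e.getId())) { // SELECT by primary key
            dao.update(e);
        } else {
            dao.insert(e);
        }
    }
}

// Strategy 2: try the update, fall back to insert if no row matched.
class UpdateFirstStrategy implements PersistStrategy {
    @Override
    public void persist(Entity e, Dao dao) throws SQLException {
        if (dao.update(e) == 0) {        // no row matched: insert instead
            dao.insert(e);
        }
    }
}
The concrete strategy can then be picked at startup from configuration, e.g. by instantiating the configured fully qualified class name via Class.forName.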
MySQL also supports a single-statement upsert that combines both approaches:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
Option 2 is not going to be the most efficient. The database already makes this check for you when you do the actual insert or update, in order to enforce the primary key. By making the check yourself you incur the overhead of a table lookup twice, as well as an extra round trip from your Java code. Choose whichever case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example, in PostgreSQL syntax, assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
INSERT INTO
my_table (id, col1)
SELECT
_id, _col1;
EXCEPTION WHEN unique_violation THEN
UPDATE
my_table
SET
col1 = _col1
WHERE
id = _id;
END;
$$
LANGUAGE plpgsql;
You could also make the update the normal case and then check the number of rows affected by the update statement to determine whether the row is actually new and an insert is needed.
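In JDBC terms, that update-first variant might look like this sketch (my_table, id, and col1 are reused from the example above; conn and the values are assumed to be in scope):
// Update first; executeUpdate() returns the affected-row count, so 0 means
// the row doesn't exist yet and we insert instead. Under heavy concurrency
// this can still race with another insert, so be prepared to catch a
// duplicate-key error.
try (PreparedStatement update = conn.prepareStatement(
             "UPDATE my_table SET col1 = ? WHERE id = ?");
     PreparedStatement insert = conn.prepareStatement(
             "INSERT INTO my_table (id, col1) VALUES (?, ?)")) {
    update.setInt(1, col1);
    update.setInt(2, id);
    if (update.executeUpdate() == 0) {
        insert.setInt(1, id);
        insert.setInt(2, col1);
        insert.executeUpdate();
    }
}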
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
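For MySQL, a sketch of that flow over JDBC could look like the following (the staging table, its columns, and the hypothetical Row type holding the incoming data are all illustrative; conn and rows are assumed to be in scope):
try (Statement st = conn.createStatement()) {
    // 1. The temp table is private to this connection and vanishes with it.
    st.execute("CREATE TEMPORARY TABLE staging LIKE master_table");
}
// 2. Bulk insert all incoming rows into the temp table.
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO staging (id, col1) VALUES (?, ?)")) {
    for (Row r : rows) {          // Row is a stand-in for your data class
        ps.setInt(1, r.id);
        ps.setString(2, r.col1);
        ps.addBatch();
    }
    ps.executeBatch();
}
try (Statement st = conn.createStatement()) {
    // 3. Update existing rows in the master table from the temp table.
    st.execute("UPDATE master_table m JOIN staging s ON m.id = s.id "
            + "SET m.col1 = s.col1");
    // 4. Insert the rows that don't yet exist in the master table.
    st.execute("INSERT INTO master_table (id, col1) "
            + "SELECT s.id, s.col1 FROM staging s "
            + "LEFT JOIN master_table m ON m.id = s.id WHERE m.id IS NULL");
    // 5. Dispose of the temp table.
    st.execute("DROP TEMPORARY TABLE staging");
}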