Locking Tables with postgres in JDBC - java

Just a quick question about locking tables in a Postgres database using JDBC. I have a table to which I want to add new records. For the primary key, I use an increasing integer value.
I want to be able to retrieve the max value of this column in Java and store it as a variable to be used as a new primary key when adding a new row.
This gives me a small problem: since this is going to be a multi-user system, what happens when two clients request the same max value? That would of course lead to two inserts with the same primary key.
I realise that I should be using an EXCLUSIVE lock on the table to prevent reading or writing while getting the key and adding a new row. However, I can't seem to find any way to deal with table locking in JDBC, just standard transactions.
Pseudocode as such:
primaryKey = "SELECT MAX(id) FROM table1;";
primaryKey++;
// id may be retrieved again by a second client here
"INSERT INTO table1 VALUES (primaryKey, value1, value2);"

You're absolutely right: if two locations make the request at around the same time, you'll run into a race condition.
The way to handle this is to create a sequence in postgres and select the nextval as the primary key.
I don't know exactly what direction you're heading in or how you handle your data, but you could also declare the column as serial and leave it out of your insert query entirely; the column will then auto-increment automatically.
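For example, a minimal JDBC sketch (table and column names are assumed from the question, and conn is an open java.sql.Connection) that lets PostgreSQL assign the key and reads it back:

// Assumes: CREATE TABLE table1 (id SERIAL PRIMARY KEY, value1 TEXT, value2 TEXT);
String sql = "INSERT INTO table1 (value1, value2) VALUES (?, ?)";
try (PreparedStatement ps = conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
    ps.setString(1, "value 1");
    ps.setString(2, "value 2");
    ps.executeUpdate();
    try (ResultSet keys = ps.getGeneratedKeys()) {
        keys.next();
        int newId = keys.getInt("id"); // the key the database chose; no race condition
    }
}

No table lock is needed, because the sequence behind the serial column hands out each number exactly once.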

Related

Reading and writing CSV File into database

I am implementing application specific data import feature from one database to another.
I have a CSV file containing say 10000 rows. These rows need to be inserted/updated into database.
I am using mysql database and inserting from Java.
There might be a case where a couple of rows are already present in the database, which means those need to be updated; if not present, they need to be inserted.
One possible solution is to read the file line by line, check for the entry in the database, and build insert/update queries accordingly. But this process of creating the queries one by one and executing them may take a lot of time. Sometimes my CSV file may have millions of records.
Is there any other faster way to achieve this feature?
I don't know how you determine "is already present", but if it's any kind of database-level constraint (probably on a primary key?) you can make use of the REPLACE INTO statement, which inserts a record, or, if a duplicate-key error occurs, deletes the conflicting row and inserts the new one in its place.
It works just like INSERT basically:
REPLACE INTO table ( id, field1, field2 )
VALUES ( 1, 'value1', 'value2' )
If a row with ID 1 exists, it's replaced with these values; otherwise it's created.
Given that you're using MySQL, you could use the INSERT ... ON DUPLICATE KEY UPDATE ... statement, which functions similarly to the SQL-standard MERGE statement (see the MySQL documentation on the statement and Wikipedia's overview of SQL MERGE functionality). The statement would look something like
INSERT INTO MY_TABLE
(PRIMARY_KEY_COL, COL2, COL3, COL4)
VALUES
(1, 2, 3, 4)
ON DUPLICATE KEY
UPDATE COL2 = 2,
COL3 = 3,
COL4 = 4
In this example I'm assuming that PRIMARY_KEY_COL is a primary or unique key on MY_TABLE. If the INSERT statement would fail due to a duplicate value on the primary or unique key, then the UPDATE clause is executed. Also note (in the MySQL docs) that there are some gotchas associated with auto-increment columns on InnoDB tables.
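When driving this from JDBC for the CSV case, batching the upserts avoids a round trip per row. A sketch, with illustrative table/column names mirroring the example above, and csvRows standing in for your parsed CSV lines:

String sql = "INSERT INTO my_table (primary_key_col, col2, col3, col4) "
        + "VALUES (?, ?, ?, ?) "
        + "ON DUPLICATE KEY UPDATE col2 = VALUES(col2), col3 = VALUES(col3), col4 = VALUES(col4)";
conn.setAutoCommit(false);
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    for (String[] row : csvRows) {
        ps.setInt(1, Integer.parseInt(row[0]));
        ps.setString(2, row[1]);
        ps.setString(3, row[2]);
        ps.setString(4, row[3]);
        ps.addBatch();
    }
    ps.executeBatch(); // one round trip for the whole batch
    conn.commit();
}

With the MySQL driver, adding rewriteBatchedStatements=true to the JDBC URL lets the driver collapse the batch into multi-row statements, which helps considerably at this scale.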
Do you need to do this often or just once in a while?
I need to load CSV files from time to time into a database for analysis, and I created an SSIS solution with a Data Flow task which loads the CSV file into a table on SQL Server.
For more info, see this blog post:
http://blog.sqlauthority.com/2011/05/12/sql-server-import-csv-file-into-database-table-using-ssis/
Add a stored procedure in SQL for inserting. In the stored procedure use a try catch block to do the insert. If the insert fails do an update. Then you can simply call this method from your program.
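Calling such a procedure from JDBC might look like this (the procedure name upsert_entry and its parameters are illustrative; id and name come from your data):

try (CallableStatement cs = conn.prepareCall("{call upsert_entry(?, ?)}")) {
    cs.setInt(1, id);
    cs.setString(2, name);
    cs.execute(); // the procedure tries the INSERT and falls back to UPDATE
}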
Alternatively:
UPDATE Table1 SET (...) WHERE Column1='SomeValue'
IF @@ROWCOUNT=0
INSERT INTO Table1 VALUES (...)

Hibernate query to fetch records taking too much time

I am trying to retrieve a set of records from a table. The query I am using is:
select * from EmployeeUpdates eu where eu.updateid>0 and eu.department = 'EEE'
The table EmployeeUpdates has around 20 million records. 'updateid' is the primary key and there are currently no records in the table with the department 'EEE'. But the query is taking a long time, and the web-service call times out.
Currently we have index only on the column 'updateid'. 'department' is a new column added for which we are expecting 'EEE' records.
What changes can I make to retrieve the results faster?
First off, make sure your SQL is valid; as originally posted it was missing an 'and' between the two conditions.
I'm guessing that all the update IDs are positive, and since updateid is the primary key, they're unique, so I suspect eu.updateid>0 matches every row. This means it's not technically a tablespace scan but an index-based scan, although if that scan matches all 20 million rows, you might as well have a tablespace scan. The only thing you can really do is add an index on the department field. Depending on what this data is, you could also keep departments in a separate table with a numeric primary key and store that as a foreign key on the EmployeeUpdates table. That way you would scan through the departments, then fetch the updates associated with them, rather than searching every single update for a specific department.
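A sketch of adding that index from JDBC (the index name is illustrative; normally you would do this once, in a migration script):

try (Statement st = conn.createStatement()) {
    st.execute("CREATE INDEX idx_employeeupdates_dept ON EmployeeUpdates (department)");
}

With the index in place, the 'EEE' lookup becomes an index seek that returns immediately when no matching rows exist.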
I think you should look into using a table-per-subclass mapping (more here: http://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html/inheritance.html#inheritance-tablepersubclass-discriminator). You can make the department the discriminator, and then you'd have EEEEmployeeUpdates and ECEmployeeUpdates classes. Your query could then just query EEEEmployeeUpdates.

Get a unique max id for each instance using mysql

I am using a field with a prefix + auto-increment id. For each instance I take MAX(id)+1 and append it to the prefix. Can anyone suggest a way to make this unique?
You can try this:
Insert into table1 (id, user_id)
SELECT MAX(id)+1, CONCAT('a',CAST(MAX(id)+1 AS char))
FROM table1;
The problem with using max(id)+1 is that there may be multiple threads making the same call, and so the result would not be unique. There are several ways to solve this problem. The first is to use a sequence, where the database server will increment the number every time a new id is requested. You can use a table, with a number in it, but you have to lock the table when you update the number. Or you can allow the database to create the key for you when the table is inserted and retrieve the key after the insert. All are valid.
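As a sketch of the last option, letting MySQL's auto-increment create the key and retrieving it after the insert (table and column names follow the question; the placeholder value is illustrative):

try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO table1 (user_id) VALUES ('pending')",
        Statement.RETURN_GENERATED_KEYS)) {
    ps.executeUpdate();
    try (ResultSet keys = ps.getGeneratedKeys()) {
        keys.next();
        long id = keys.getLong(1); // unique, assigned by the database
        try (PreparedStatement upd = conn.prepareStatement(
                "UPDATE table1 SET user_id = ? WHERE id = ?")) {
            upd.setString(1, "a" + id); // prefix + id is unique because id is
            upd.setLong(2, id);
            upd.executeUpdate();
        }
    }
}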
I prefer to use Hibernate and make it determine how to implement the ID for the database I am currently using.
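A minimal sketch of what that mapping might look like with JPA annotations (entity and column names are illustrative):

@Entity
public class Table1Row {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY) // the database assigns the key
    private Long id;

    @Column(name = "user_id")
    private String userId; // e.g. set to "a" + id once the id is known

    // getters and setters omitted
}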

How to enable multiple threads/connections to modify the same MySQL table?

I have a program that has 2 threads running, and each thread has its own database JDBC connection, and they will access/modify the same database table A like below. Table A only has 2 columns (id, name), and the primary key is the combination of id and name.
Statement stmt = conn.createStatement();
// first delete the record if it already exists in the table
stmt.addBatch("delete from A where id='arg_id' and name='arg_name';");
// then insert it into the table
stmt.addBatch("insert into A values ('arg_id', 'arg_name');");
stmt.executeBatch();
The two threads may insert the same data into the table, and I got the following exception:
java.sql.BatchUpdateException: Duplicate entry '0001-joey' for key 1
at com.mysql.jdbc.Statement.executeBatch(Statement.java:708)
at com.mchange.v2.c3p0.impl.NewProxyStatement.executeBatch(NewProxyStatement.java:743)
at proc.Worker.norD(NW.java:450)
Do you have any idea how I can fix this issue? Thank you.
Regards,
Joey
Why not introduce a simple optimistic locking mechanism on the database?
Add a version column and track the version number when performing delete or update transactions.
Your table would look like
create table test(
id int not null primary key,
name varchar(255),
rowversion int default 0);
Every time you retrieve a row, you should also retrieve the row version, so that you can do:
update test set name='new name', rowversion=rowversion+1 where id=:id and rowversion=:retrievedRowVersion;
The same with delete:
delete from test where id=:id and rowversion=:retrievedRowVersion;
This is a simple mechanism that exploits the DBMS's concurrency-management features. Check this link for more information on optimistic locking: http://en.wikipedia.org/wiki/Optimistic_concurrency_control#Examples
This is obviously only a very simple implementation of concurrency management, but your design has to take these issues into account.
Also, for the double insert, the fact that your transaction is rejected is a good thing: it means no duplicate keys were inserted. You should just handle the exception and notify the user of what happened.
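A sketch of the optimistic update from Java, using the table and column names from above (conn is an open Connection; newName, id, and retrievedRowVersion are values you read earlier):

String sql = "UPDATE test SET name = ?, rowversion = rowversion + 1 "
        + "WHERE id = ? AND rowversion = ?";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setString(1, newName);
    ps.setInt(2, id);
    ps.setInt(3, retrievedRowVersion);
    if (ps.executeUpdate() == 0) {
        // zero rows matched: another thread changed the row since we read it
        throw new ConcurrentModificationException("row " + id + " was modified concurrently");
    }
}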
Wrap both statements in a transaction:
BEGIN;
DELETE FROM a WHERE ...;
INSERT INTO a VALUES (...);
COMMIT;
Note that since the table consists of only the primary key columns, the delete-and-insert leaves a conflicting row effectively unchanged; I presume you want to add more columns, in which case you should use the UPDATE ... WHERE syntax to change values.
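From JDBC the same wrapping looks like this (conn is the thread's own Connection; the values are the ones from the exception above):

conn.setAutoCommit(false); // BEGIN
try (Statement stmt = conn.createStatement()) {
    stmt.executeUpdate("DELETE FROM A WHERE id = '0001' AND name = 'joey'");
    stmt.executeUpdate("INSERT INTO A VALUES ('0001', 'joey')");
    conn.commit();
} catch (SQLException e) {
    conn.rollback(); // undo both statements on any failure
    throw e;
}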
Are you using any kind of synchronization? First you will need to wrap the code that modifies the table in:
synchronized(obj)
{
// code
}
where obj is an object that both threads can access.
I don't know the exact semantics of your table modifications, but if they both insert ids, you will also need to hold a "global" id and atomically increment it in each thread, such that they don't both get the same value.
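For that shared id, java.util.concurrent.atomic.AtomicLong gives you the atomic increment without an explicit lock; a sketch:

import java.util.concurrent.atomic.AtomicLong;

// shared between both worker threads
static final AtomicLong NEXT_ID = new AtomicLong();

// inside each thread:
long id = NEXT_ID.incrementAndGet(); // each call returns a distinct value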

Insert fail then update OR Load and then decide if insert or update

I have a web service in Java that receives a list of information to be inserted or updated in a database. I don't know in advance which entries need an insert and which need an update.
Which approach obtains better performance:
Iterate over the list (an object list, with the table PK in it) and try to insert each entry into the database; if the insert fails, run an update.
Try to load each entry from the database; if it is found, update it, otherwise insert it.
Another option? Tell me about it :)
In the first calls, I believe most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.
I'm talking about a DB table that could reach over 100 million entries once mature.
What will be your approach? Performance is my most important goal.
If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194
If performance is your goal, then first get rid of the word "iterate" from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first; otherwise it is easy to find yourself accidentally updating the record you just inserted. It helps to have an identifier you can look at to see whether the record exists: if the identifier exists, do the update, otherwise do the insert.
The important thing is to understand the balance, or ratio, between the number of inserts and the number of updates in the list you receive. IMHO you should implement an abstract strategy that says "persist this to the database". Then create concrete strategies that (for example):
Check for the primary key; if zero records are found, do the insert, else update.
Do the update and, if it fails, do the insert.
Others.
Then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If it is feasible (it may depend on your domain), you can add a heuristic that selects the best strategy based on the input set.
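A minimal sketch of that strategy interface and the update-first variant (all names are illustrative; Entry stands in for your row object):

public interface PersistStrategy {
    void persist(Entry entry) throws SQLException;
}

public class UpdateFirstStrategy implements PersistStrategy {
    private final Connection conn;

    public UpdateFirstStrategy(Connection conn) { this.conn = conn; }

    @Override
    public void persist(Entry entry) throws SQLException {
        try (PreparedStatement upd = conn.prepareStatement(
                "UPDATE my_table SET value = ? WHERE id = ?")) {
            upd.setString(1, entry.getValue());
            upd.setLong(2, entry.getId());
            if (upd.executeUpdate() == 0) { // nothing updated: the row is new
                try (PreparedStatement ins = conn.prepareStatement(
                        "INSERT INTO my_table (id, value) VALUES (?, ?)")) {
                    ins.setLong(1, entry.getId());
                    ins.setString(2, entry.getValue());
                    ins.executeUpdate();
                }
            }
        }
    }
}

The configured class name can then be instantiated with Class.forName(...).getDeclaredConstructor(Connection.class).newInstance(conn).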
MySQL supports this:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
    INSERT INTO my_table (id, col1)
    VALUES (_id, _col1);
EXCEPTION WHEN unique_violation THEN
    UPDATE my_table
    SET col1 = _col1
    WHERE id = _id;
END;
$$
LANGUAGE plpgsql;
You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
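With MySQL (assumed here purely for concrete syntax), the steps might look like this; all table and column names are illustrative:

try (Statement st = conn.createStatement()) {
    // 1. staging table with the same shape as the master table
    st.execute("CREATE TEMPORARY TABLE staging LIKE master_table");
    // 2. bulk insert the web service payload into staging
    //    (batched inserts, as sketched in the CSV answer above)
    // 3. update existing master rows from staging
    st.execute("UPDATE master_table m JOIN staging s ON m.id = s.id SET m.value = s.value");
    // 4. insert the genuinely new rows
    st.execute("INSERT INTO master_table SELECT s.* FROM staging s "
            + "LEFT JOIN master_table m ON m.id = s.id WHERE m.id IS NULL");
    // the temporary table is dropped automatically when the connection closes
}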
