Validate existing values in a Postgres table before inserting a row - java

I am using JPA and Hibernate to connect to a table and insert data into it. I have a table, say User, which has three columns: ID, Name and Address. I have an entity class for it, and to insert data I simply use an EntityManager and persist the entity, which works like a charm for me.
Now I have a scenario where I want to check whether the values I am persisting already exist; if they do, I have to log an error. Currently I do this by manually loading the rows from the table and checking whether the same values exist, which is fairly simple for the example table (User) that has only three columns. But what if I have a table with 30 columns?
Do I manually load the data based on one condition and check the other columns, or is there a better, shorter way to do this?

30 columns, is that your primary key as well? If the data you are checking for duplication is the primary key, or a unique constraint, then you can use Hibernate to fetch the object before saving and report back if it exists. If the 30 columns are not part of the key, then I would use the equals method, and as such fetch all rows. However, if there are many rows and this would be slow, then I would probably write a dedicated SQL query to check whether an object exists, i.e. on a UserDao:
public boolean rowExists(User user) { ... }
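A minimal sketch of such a DAO method using a JPA count query, assuming the User entity from the question with name and address fields (all names here are illustrative):

import javax.persistence.EntityManager;

public class UserDao {

    private final EntityManager em;

    public UserDao(EntityManager em) {
        this.em = em;
    }

    public boolean rowExists(User user) {
        // Count matching rows instead of loading them; add one predicate
        // per column that defines a "duplicate" (30 predicates if need be).
        Long count = em.createQuery(
                "SELECT COUNT(u) FROM User u WHERE u.name = :name AND u.address = :address",
                Long.class)
            .setParameter("name", user.getName())
            .setParameter("address", user.getAddress())
            .getSingleResult();
        return count > 0;
    }
}

The database does the comparison, so you never load 30-column rows into memory just to call equals on them.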

Related

Reading and writing CSV File into database

I am implementing an application-specific data import feature from one database to another.
I have a CSV file containing, say, 10000 rows. These rows need to be inserted into or updated in the database.
I am using a MySQL database and inserting from Java.
There may be cases where some rows are already present in the database, which means those need to be updated; rows not present in the database need to be inserted.
One possible solution is to read the file line by line, check for the entry in the database, and build insert/update queries accordingly. But this process may take a lot of time to create and execute the queries, and sometimes my CSV file may contain millions of records.
Is there any faster way to achieve this?
I don't know how you determine "is already present", but if it's any kind of database-level constraint (probably on a primary key?) you can make use of the REPLACE INTO statement: it inserts the record, and if a row with the same primary or unique key already exists, it deletes that row first and inserts the new one in its place.
It works just like INSERT, basically:
REPLACE INTO table ( id, field1, field2 )
VALUES ( 1, 'value1', 'value2' )
If a row with ID 1 exists, it's replaced with these values; otherwise the row is created.
Given that you're using MySQL, you could use the INSERT ... ON DUPLICATE KEY UPDATE ... statement, which functions similarly to the SQL standard MERGE statement. MySQL doc reference here and a general Wikipedia reference to SQL MERGE functionality here. The statement would look something like:
INSERT INTO MY_TABLE
(PRIMARY_KEY_COL, COL2, COL3, COL4)
VALUES
(1, 2, 3, 4)
ON DUPLICATE KEY
UPDATE COL2 = 2,
COL3 = 3,
COL4 = 4
In this example I'm assuming that PRIMARY_KEY_COL is a primary or unique key on MY_TABLE. If the INSERT would fail due to a duplicate value on the primary or unique key, the UPDATE clause is executed instead. Also note (on the MySQL doc page) that there are some gotchas associated with auto-increment columns on InnoDB tables.
Share and enjoy.
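For large CSV files the win usually comes from combining this statement with JDBC batching, rather than issuing one round trip per row. A hedged sketch, reusing the hypothetical MY_TABLE columns from the answer above; the same pattern works with REPLACE INTO:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CsvUpsert {

    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password")) {
            conn.setAutoCommit(false); // commit in chunks, not per row
            String sql = "INSERT INTO MY_TABLE (PRIMARY_KEY_COL, COL2, COL3, COL4) "
                       + "VALUES (?, ?, ?, ?) "
                       + "ON DUPLICATE KEY UPDATE COL2 = VALUES(COL2), "
                       + "COL3 = VALUES(COL3), COL4 = VALUES(COL4)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                int batchSize = 1000;
                int count = 0;
                for (String[] row : readCsv()) { // CSV parsing not shown
                    ps.setLong(1, Long.parseLong(row[0]));
                    ps.setString(2, row[1]);
                    ps.setString(3, row[2]);
                    ps.setString(4, row[3]);
                    ps.addBatch();
                    if (++count % batchSize == 0) {
                        ps.executeBatch(); // one round trip per chunk
                        conn.commit();
                    }
                }
                ps.executeBatch(); // flush the final partial batch
                conn.commit();
            }
        }
    }

    // Placeholder for whatever CSV reader you use (OpenCSV, manual split, ...).
    private static Iterable<String[]> readCsv() {
        return java.util.List.of();
    }
}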
Do you need to do this often or just once in a while?
I need to load CSV files into a database for analysis from time to time, and I created an SSIS data solution with a Data Flow task which loads the CSV file into a table on the SQL Server.
For more info, have a look at this blog post:
http://blog.sqlauthority.com/2011/05/12/sql-server-import-csv-file-into-database-table-using-ssis/
Add a stored procedure in SQL for the insert. In the stored procedure, use a try/catch block to do the insert; if the insert fails, do an update instead. Then you can simply call this procedure from your program.
Alternatively:
UPDATE Table1 SET (...) WHERE Column1='SomeValue'
IF @@ROWCOUNT=0
    INSERT INTO Table1 VALUES (...)
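A rough sketch of such a procedure, written in SQL Server syntax (which the @@ROWCOUNT alternative above also assumes); the procedure, table, and column names are all invented:

CREATE PROCEDURE UpsertRow
    @Column1 VARCHAR(50),
    @Column2 VARCHAR(50)
AS
BEGIN
    BEGIN TRY
        -- Try the insert first; a duplicate key violation will throw.
        INSERT INTO Table1 (Column1, Column2) VALUES (@Column1, @Column2);
    END TRY
    BEGIN CATCH
        -- The insert failed, so the row presumably exists: update it instead.
        UPDATE Table1 SET Column2 = @Column2 WHERE Column1 = @Column1;
    END CATCH
END

From Java this is then a single CallableStatement call per row.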

Hibernate query to fetch records taking much time

I am trying to retrieve a set of records from a table. The query I am using is:
select * from EmployeeUpdates eu where eu.updateid>0 and eu.department = 'EEE'
The table EmployeeUpdates has around 20 million records. 'updateid' is the primary key, and there are currently no records in the table with the department 'EEE'. But the query is taking a long time, causing the web-service call to time out.
Currently we have an index only on the column 'updateid'. 'department' is a newly added column for which we are expecting 'EEE' records.
What changes can I make to retrieve the results faster?
First off, your SQL isn't valid; it looks like you're missing an 'and' between the two conditions.
I'm guessing that all the update IDs are positive, and since updateid is the primary key they're unique, so I suspect eu.updateid > 0 matches every row. That makes it technically an index scan rather than a tablespace scan, but if the scan still matches all 20 million rows you might as well scan the table. The only thing you can really do is add an index on the department field. Depending on what this data is, you could also keep departments in a separate table with a numeric primary key and store that as a foreign key on the eu table. You would then scan through the departments and fetch the updates associated with the one you want, rather than searching every single update for a specific department.
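For example, the index could be created like this (the index name is invented):

CREATE INDEX idx_employeeupdates_department ON EmployeeUpdates (department);

With that index in place, the 'no rows with department EEE' case becomes a cheap index lookup instead of a 20-million-row scan.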
I think you should look into using a table-per-subclass mapping (more here: http://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html/inheritance.html#inheritance-tablepersubclass-discriminator). You can make the department the discriminator, and then you'd have EEEEmployeeUpdates and ECEmployeeUpdates classes. Your query could then just query EEEEmployeeUpdates.
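A loose sketch of that mapping with JPA annotations (the linked doc describes the XML form); the class and column names are guesses based on the question, and both classes are shown in one file for brevity:

import javax.persistence.DiscriminatorColumn;
import javax.persistence.DiscriminatorValue;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;

@Entity
@Inheritance(strategy = InheritanceType.JOINED)
@DiscriminatorColumn(name = "department")
public class EmployeeUpdates {
    @Id
    private Long updateid;
    // ... other shared columns
}

@Entity
@DiscriminatorValue("EEE")
class EEEEmployeeUpdates extends EmployeeUpdates {
    // EEE-specific columns, if any
}

Querying the subclass then lets Hibernate add the discriminator filter for you (em is an EntityManager):

List<EEEEmployeeUpdates> updates =
    em.createQuery("FROM EEEEmployeeUpdates", EEEEmployeeUpdates.class)
      .getResultList();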

Locking Tables with Postgres in JDBC

Just a quick question about locking tables in a Postgres database using JDBC. I have a table that I want to add a new record to; for the primary key, I use an increasing integer value.
I want to be able to retrieve the max value of this column in Java and store it as a variable to be used as the new primary key when adding the next row.
This gives me a small problem: as this is going to be modelled as a multi-user system, what happens when two sessions request the same max value? They would then both try to insert the same primary key.
I realise that I should be using an EXCLUSIVE lock on the table to prevent reading or writing while getting the key and adding the new row. However, I can't seem to find any way to deal with table locking in JDBC, just standard transactions.
Pseudocode, as such:
primaryKey = query("SELECT MAX(id) FROM table1;");
primaryKey++;
// another session may retrieve the same max id here
execute("INSERT INTO table1 VALUES (primaryKey, value1, value2);");
You're absolutely right, if two locations request at around the same time, you'll run into a race condition.
The way to handle this is to create a sequence in postgres and select the nextval as the primary key.
I don't know exactly what direction you're heading in or how you handle your data, but you could also define the column as serial and not include the column in your insert query at all. The column will then auto-increment automatically.
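A minimal JDBC sketch of both suggestions, assuming a table defined roughly as table1 (id serial PRIMARY KEY, value1 text, value2 text); the connection details and names are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class InsertWithGeneratedKey {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {

            // Option 1: let the serial column assign the id and read it back.
            String insert = "INSERT INTO table1 (value1, value2) VALUES (?, ?) RETURNING id";
            try (PreparedStatement ps = conn.prepareStatement(insert)) {
                ps.setString(1, "foo");
                ps.setString(2, "bar");
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        System.out.println("Inserted row with id " + rs.getLong(1));
                    }
                }
            }

            // Option 2: reserve the next value from the sequence explicitly.
            // A serial column's sequence is named table1_id_seq by default.
            try (PreparedStatement ps = conn.prepareStatement("SELECT nextval('table1_id_seq')");
                 ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    long nextId = rs.getLong(1); // no other session will get this value
                }
            }
        }
    }
}

Either way, no table lock is needed: the sequence hands out each value at most once, even under concurrency.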

Metadata of Check constraints SQL Server

I have a SQL Server database that holds a table where a varchar column has a check constraint on it to make sure only a few specific words can be entered as a value (names).
Like this:
CONSTRAINT chk_Names CHECK (name IN ('John', 'Eva', 'Carl', 'Fred'))
What I want to do is populate a combobox in Java with these names, and I don't want to enter them manually since they might change in the database. I want to populate it from metadata.
But I haven't been able to find a way to get this information from the database, either with INFORMATION_SCHEMA or sys.objects (or from DatabaseMetaData in Java, for that matter).
I'm quite new to SQL Server but is it possible to get that information somehow?
Regards
/Fred
It sounds like you should move the list of names to a table. Your Java form could select the data from that table.
And, because the data can change, it will be better to update the table than to change the check constraint. You can then replace the check constraint with a foreign key constraint, too.
You can also find the check-constraint definitions in INFORMATION_SCHEMA.CHECK_CONSTRAINTS. The expression is in the CHECK_CLAUSE column, and you'll have to extract the values from the expression yourself.
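A rough sketch of that extraction over JDBC, assuming the constraint is named chk_Names as in the question and that every single-quoted string in the clause is one of the names; the connection details are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckConstraintNames {

    public static void main(String[] args) throws Exception {
        List<String> names = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=mydb", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT CHECK_CLAUSE FROM INFORMATION_SCHEMA.CHECK_CONSTRAINTS "
               + "WHERE CONSTRAINT_NAME = ?")) {
            ps.setString(1, "chk_Names");
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    // CHECK_CLAUSE looks like: ([name]='Fred' OR [name]='Carl' OR ...)
                    Matcher m = Pattern.compile("'([^']*)'").matcher(rs.getString(1));
                    while (m.find()) {
                        names.add(m.group(1));
                    }
                }
            }
        }
        System.out.println(names); // feed these into the combobox model
    }
}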

How to track pagination information for multi database connection?

I have the following setup:
MySQL server instance1
MySQL server instance2
In both of these instances I have a single table, records, which is partitioned. I have to retrieve the data from each instance and show it in JQGrid.
Here are the considerations to be made:
1) From each database instance, only 1000 records should be fetched.
2) Merge these records and sort them in ascending order by a default column.
3) From the merged records, again take only 1000 records to be shown in the grid.
4) For the next 1000 records, none of the previously shown records may appear again.
The major problem I am having is how to uniquely identify the last row shown from the fetched records.
I thought about doing it this way:
1) Get the rowid for each record from every connection. But the rowids from the two instances could be the same; how would I then identify which record is from which database?
2) Check the combination of rowid and primary key. But if the client sets the primary key's auto-increment values the same on all instances, we would not get a unique combination.
Am I missing something or is there any other way to do it?
I am using JDBC connection to connect the database.
[SOLVED IT]
Solved the problem by writing a small function which calculates and creates a map of the number of records to be fetched from each connection on each iteration.
Sorry, I can't add the code here as it is the client's IP.
You usually uniquely identify a row with a POJO class in which you override the hashCode() and equals() methods based on your class fields. You can put your last shown row into a HashMap and check against it.
A row number for the retrieved records can be generated using RANK() with PARTITION BY. Create a temporary or tmp_ table with your merged results, then drop it.
Hope this helps a bit.
If the two MySQL instances can see each other, how about creating a view that UNIONs the two query results? Then, in the Java world, you could do your query/pagination against this view.
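A loose sketch of that idea in SQL, assuming instance2's records table has been made visible from instance1 (for example via the FEDERATED storage engine); all names and values here are invented. Tagging each row with its source instance also solves the question's problem of identical rowids coming from both databases:

CREATE VIEW merged_records AS
    SELECT 'db1' AS src, r.id, r.sort_col FROM records r
    UNION ALL
    SELECT 'db2' AS src, f.id, f.sort_col FROM records_from_instance2 f;

-- (sort_col, src, id) gives a stable total order, so the next page can
-- resume strictly after the last row that was shown:
SELECT *
FROM merged_records
WHERE (sort_col, src, id) > ('last_shown_sort_value', 'db1', 42)
ORDER BY sort_col, src, id
LIMIT 1000;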
