remove duplicate values while insertion

Hi, I am trying to insert values from an Excel sheet into a SQL database using Java. The database already has some rows that were inserted by other means. Now I need to insert new rows from the Excel sheet while eliminating duplicates, both against rows that already exist in the database and within the Excel sheet itself. This is what I have so far.
First I insert the records from the Excel sheet into the database with an insert query:
statement.executeUpdate("INSERT INTO dbo.Company(CName, DateTimeCreated) VALUES ('" + Cname + "', '" + ts + "')");
Later I delete the duplicate values with a delete query:
String comprows="delete from dbo.Company where Id not in"
+ "(select min(Id) from dbo.Company "
+ "group by CName having count(*)>=1)";
statement3.executeUpdate(comprows);
where Id is an auto-incremented integer.
But inserting first and then deleting is not a good approach.
How can I tell whether a value already exists? And if it does, how do I skip it during insertion?

You can simply fire a SELECT for the CName first. If a record is found, update it; otherwise insert a new record.
Edited to add code snippet:
ResultSet rs = statement.executeQuery("SELECT Id FROM dbo.Company WHERE CName = '" + Cname + "'");
if(rs.next()) {
// retrieve ID from rs
// fire an update for this ID
} else {
// insert a new record.
}
Alternatively, if you think that there are already duplicates in your table and you want to remove them as well:
ResultSet rs = statement.executeQuery("SELECT Id FROM dbo.Company WHERE CName = '" + Cname + "'");
List<String> idList = new ArrayList<>();
while (rs.next()) {
    // collect IDs from rs in a collection, say idList
    idList.add(rs.getString("Id"));
}
if (!idList.isEmpty()) {
    // convert the list to a comma-separated string, say idsStr
    String idsStr = String.join(",", idList);
    statement.executeUpdate("DELETE FROM dbo.Company WHERE Id IN (" + idsStr + ")");
}
// insert a new record
statement.executeUpdate("INSERT INTO dbo.Company(CName, DateTimeCreated) VALUES ('" + Cname + "', '" + ts + "')");
Of course, good practice is to use PreparedStatement, as it would improve performance and also protect against SQL injection.
PS: Excuse me for any syntax errors.
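For what it's worth, the check-then-insert idea above could look roughly like this with PreparedStatement. This is only a sketch: the class and method names are made up for illustration, the "update if found" branch is left out (existing rows are simply skipped), and the helper that drops repeats inside the sheet itself is an extra step the answer does not spell out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper; names are illustrative, not from the original post.
final class CompanyImport {

    // Drops blank entries and repeated names inside the sheet, keeping first-seen order.
    static List<String> distinctNames(List<String> sheetNames) {
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (String name : sheetNames) {
            if (name != null && !name.trim().isEmpty()) {
                seen.add(name.trim());
            }
        }
        return new ArrayList<>(seen);
    }

    // Inserts only the names that dbo.Company does not contain yet.
    // Returns the number of rows actually inserted.
    static int insertMissing(Connection conn, List<String> sheetNames, Timestamp ts)
            throws SQLException {
        String selectSql = "SELECT Id FROM dbo.Company WHERE CName = ?";
        String insertSql = "INSERT INTO dbo.Company (CName, DateTimeCreated) VALUES (?, ?)";
        int inserted = 0;
        try (PreparedStatement select = conn.prepareStatement(selectSql);
             PreparedStatement insert = conn.prepareStatement(insertSql)) {
            for (String name : distinctNames(sheetNames)) {
                select.setString(1, name);
                boolean exists;
                try (ResultSet rs = select.executeQuery()) {
                    exists = rs.next();
                }
                if (!exists) {
                    insert.setString(1, name);
                    insert.setTimestamp(2, ts);
                    inserted += insert.executeUpdate();
                }
            }
        }
        return inserted;
    }
}
```

Note that distinctNames compares names case-sensitively, while the database collation may not, so the two notions of "duplicate" can differ slightly.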

One option would be to create a temp table and dump your Excel data there. Then you can write an insert that joins the temp table with the dbo.Company table and only insert the records that aren't already there.
You could do a lookup on each record you want to insert, but if you are dealing with large volumes that is not very efficient, since you will have to do a select and an insert for each record in your Excel spreadsheet.
Merge statements are pretty effective in these types of situations as well. I don't think all databases support them (I know Oracle does for sure). A merge statement is basically a combo insert and update so you can do the look up to the final table and insert if not found and update if found. The nice thing about this is you get the efficiency of doing all of this as a set rather than one record at a time.
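A rough sketch of the staging-table variant, in case it helps. CompanyStaging is a hypothetical table with a single CName column (it could equally be a temp table), the class and method names are invented, and using CURRENT_TIMESTAMP for DateTimeCreated is an assumption. It uses a plain INSERT ... SELECT with NOT EXISTS rather than a MERGE statement.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch only: CompanyStaging is a hypothetical staging table with one CName column.
final class StagingImport {

    static final String LOAD_STAGING =
        "INSERT INTO CompanyStaging (CName) VALUES (?)";

    // Set-based insert: copies only the names dbo.Company does not have yet.
    // DISTINCT also collapses repeats that came from the sheet itself.
    static final String COPY_NEW_ROWS =
        "INSERT INTO dbo.Company (CName, DateTimeCreated) "
      + "SELECT DISTINCT s.CName, CURRENT_TIMESTAMP FROM CompanyStaging s "
      + "WHERE NOT EXISTS (SELECT 1 FROM dbo.Company c WHERE c.CName = s.CName)";

    // Loads the sheet into staging as one batch, then copies the new rows in one statement.
    // Returns the number of rows added to dbo.Company.
    static int importNames(Connection conn, List<String> sheetNames) throws SQLException {
        try (PreparedStatement load = conn.prepareStatement(LOAD_STAGING)) {
            for (String name : sheetNames) {
                load.setString(1, name);
                load.addBatch();
            }
            load.executeBatch();
        }
        try (PreparedStatement copy = conn.prepareStatement(COPY_NEW_ROWS)) {
            return copy.executeUpdate();
        }
    }
}
```

The staging table would need to be emptied (or dropped) between imports, which is not shown here.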

If you can control the DB schema, you might consider putting a unique constraint on the relevant column(s) to prevent duplicates. When you do your inserts, the database will throw when it tries to add the duplicate data. Catch the exception before it tosses you all the way out.
It's usually good to enforce constraints like this on the DB itself; that means no one querying the database has to worry about invalid duplicates. Also, optimistically trying the insert first (without doing a separate select first) might be faster.
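Something along these lines could work for the "try the insert and catch" approach. It is a sketch that assumes a unique constraint already exists on dbo.Company(CName); the class and method names are invented. Not every driver throws SQLIntegrityConstraintViolationException, so the check also falls back to the SQLState class "23", which covers integrity constraint violations in general (not only unique ones).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.sql.Timestamp;

// Sketch only; assumes a unique constraint already exists on dbo.Company(CName).
final class OptimisticInsert {

    // SQLState class "23" is the standard class for integrity constraint violations.
    static boolean isConstraintViolation(SQLException e) {
        if (e instanceof SQLIntegrityConstraintViolationException) {
            return true;
        }
        String state = e.getSQLState();
        return state != null && state.startsWith("23");
    }

    // Returns true if the row was inserted, false if a constraint rejected it.
    static boolean tryInsert(Connection conn, String cName, Timestamp ts) throws SQLException {
        String sql = "INSERT INTO dbo.Company (CName, DateTimeCreated) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, cName);
            ps.setTimestamp(2, ts);
            ps.executeUpdate();
            return true;
        } catch (SQLException e) {
            if (isConstraintViolation(e)) {
                return false; // rejected by a constraint, most likely the unique one; skip the row
            }
            throw e; // anything else is a real error
        }
    }
}
```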

Related

How to delete multiple rows from multiple tables using Where clause?

Using an Oracle DB, I need to select all the IDs from a table where a condition exists, then delete the rows from multiple tables where that ID exists. The pseudocode would be something like:
SELECT ID FROM TABLE1 WHERE AGE > ?
DELETE FROM TABLE1 WHERE ID = <all IDs received from SELECT>
DELETE FROM TABLE2 WHERE ID = <all IDs received from SELECT>
DELETE FROM TABLE3 WHERE ID = <all IDs received from SELECT>
What is the best and most efficient way to do this?
I was thinking something like the following, but wanted to know if there was a better way.
PreparedStatement selectStmt = conn.prepareStatement("SELECT ID FROM TABLE1 WHERE AGE > ?");
selectStmt.setInt(1, age);
ResultSet rs = selectStmt.executeQuery();
PreparedStatement delStmt1 = conn.prepareStatement("DELETE FROM TABLE1 WHERE ID = ?");
PreparedStatement delStmt2 = conn.prepareStatement("DELETE FROM TABLE2 WHERE ID = ?");
PreparedStatement delStmt3 = conn.prepareStatement("DELETE FROM TABLE3 WHERE ID = ?");
while(rs.next())
{
String id = rs.getString("ID");
delStmt1.setString(1, id);
delStmt1.addBatch();
delStmt2.setString(1, id);
delStmt2.addBatch();
delStmt3.setString(1, id);
delStmt3.addBatch();
}
delStmt1.executeBatch();
delStmt2.executeBatch();
delStmt3.executeBatch();
Is there a better/more efficient way?

You could do it with one DELETE statement if two of your three tables (for example "table2" and "table3") are child tables of the parent table (for example "table1") and are defined with an "ON DELETE CASCADE" option.
This means that the two child tables have a column (for example column "id" of "table2" and "table3") with a foreign key constraint that has the "ON DELETE CASCADE" option and references the primary key column of the parent table (for example column "id" of "table1"). This way, deleting from the parent table alone automatically deletes the associated rows in the child tables.
See this for more detail: http://www.techonthenet.com/oracle/foreign_keys/foreign_delete.php

If you delete only a few records from a large table, ensure that an index is defined on the ID column.
To delete the records from TABLE2 and TABLE3, the best strategy is to use the cascade delete as proposed by @ivanzg. If this is not possible, see below.
To delete from TABLE1, a far superior option to a row-by-row batch delete is a single delete using the age-based predicate:
PreparedStatement stmt = con.prepareStatement("DELETE FROM TABLE1 WHERE age > ?");
stmt.setInt(1, 60);
int rowCount = stmt.executeUpdate();
If you can't cascade delete, use the same concept for TABLE2 and TABLE3, but with the following statement:
DELETE FROM TABLE2/*or 3*/ WHERE ID in (SELECT ID FROM TABLE1 WHERE age > ?)
Run these before the delete on TABLE1, because the subquery needs the TABLE1 rows to still be there.
General best practice: minimum logic in the client, all of the logic in the database server. The database should be able to come up with a reasonable execution plan (see the index note above).

A DELETE statement operates on one table per statement. However, the main implementations support triggers or other mechanisms that perform subordinate modifications, for example Oracle's CREATE TRIGGER.
However, developers might end up having to figure out what the database is doing behind their backs. (When/Why to use Cascading in SQL Server?)
Alternatively, if you need to use an intermediate result in your delete statements, you might use a temporary table in your batch (as proposed here).
As a side note, I see no transaction control (setAutoCommit(false) ... commit()) in your example code. I guess that might be for the sake of simplicity.
Also you are executing 3 different delete batches (one for each table) instead of one. That might negate the benefit of using PreparedStatement.
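To illustrate the transaction-control point, here is a sketch that runs set-based deletes (the subquery form suggested in the previous answer) inside one transaction. The class and method names are invented; the table and column names follow the question's pseudocode. The child tables go first, since their subquery depends on the TABLE1 rows still being present.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

// Sketch only; table and column names follow the question's pseudocode.
final class AgeBasedDelete {

    // Child tables first: their subquery needs the TABLE1 rows to still exist.
    static List<String> deleteStatements() {
        return Arrays.asList(
            "DELETE FROM TABLE2 WHERE ID IN (SELECT ID FROM TABLE1 WHERE AGE > ?)",
            "DELETE FROM TABLE3 WHERE ID IN (SELECT ID FROM TABLE1 WHERE AGE > ?)",
            "DELETE FROM TABLE1 WHERE AGE > ?");
    }

    // Runs all three deletes in one transaction; returns the total number of rows deleted.
    static int deleteOlderThan(Connection conn, int age) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            int total = 0;
            for (String sql : deleteStatements()) {
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setInt(1, age);
                    total += ps.executeUpdate();
                }
            }
            conn.commit();
            return total;
        } catch (SQLException e) {
            conn.rollback(); // undo any deletes that already ran
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```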

Get current sequence Id to store in other tables

We have multiple tables, all related through the first table's primary key (example: id). The id is populated from a sequence, and while inserting data into the first table we use sequence.nextval in the insert query.
Now, while inserting data into the other tables, how do we get the current sequence value or current id?
We have tried the options below:
1. sequence.currval, directly in the insert statement
2. select sequence.currval from dual
Both options throw an error when used with getJdbcTemplate().update().
Could anyone please suggest how to get the current sequence value to pass to the other tables after inserting data into the first table?

If you want to insert the same id (which comes from a sequence) into different tables, simply get it from the first insert and use it in the other inserts.
// Request the ID column by name: with the Oracle driver, Statement.RETURN_GENERATED_KEYS
// returns the ROWID rather than the sequence value.
PreparedStatement stmt1 = conn.prepareStatement("INSERT INTO TABLE1 (id) VALUES(yoursequence.nextval)", new String[] {"ID"});
stmt1.executeUpdate();
ResultSet rs = stmt1.getGeneratedKeys();
rs.next();
long id = rs.getLong(1);
PreparedStatement stmt2 = conn.prepareStatement("INSERT INTO TABLE2 (id) VALUES(?)");
stmt2.setLong(1, id);
stmt2.executeUpdate();

java looping through multiple sql queries

I'm trying to loop through multiple sql queries that are executed. I want to first get all the question information for a certain task and then get the keywords for that question. I have three records in my Questions table, but when the while loop at the end of list.add(keyword); is done, it jumps to the SELECT Questions.Question loop (as it should) and then just jumps out and gives me only one record and not the other 2.
What am I doing wrong? Can someone maybe help me fix my code? I've thought of doing batch sql executes (maybe that is the solution), but within each while loop, I need information from the previous sql statement, so I can't just do it all at the end of the batch.
SQL Code:
String TaskTopic = eElement.getElementsByTagName("TaskTopic").item(0).getTextContent();
// perform query on database and retrieve results
String sql = "SELECT Tasks.TaskNo FROM Tasks WHERE Tasks.TaskTopic = '" + TaskTopic + "';";
System.out.println(" Performing query, sql = " + sql);
result = stmt.executeQuery(sql);
Document doc2 = x.createDoc();
Element feedback = doc2.createElement("Results");
while (result.next())
{
String TaskNo = result.getString("TaskNo");
// perform query on database and retrieve results
String sqlquery = "SELECT Questions.Question, Questions.Answer, Questions.AverageRating, Questions.AverageRating\n" +
"FROM Questions\n" +
"INNER JOIN TaskQuestions ON TaskQuestions.QuestionID = Questions.QuestionID \n" +
"INNER JOIN Tasks ON Tasks.TaskNo = '" + TaskNo + "';";
result = stmt.executeQuery(sqlquery);
while (result.next())
{
String Question = result.getString("Question");
String Answer = result.getString("Answer");
String AverageRating = result.getString("AverageRating");
String sqlID = "SELECT QuestionID FROM Questions WHERE Question = '" + Question + "';";
result = stmt.executeQuery(sqlID);
while (result.next())
{
String ID = result.getString("QuestionID");
String sqlKeywords = "SELECT Keyword FROM LinkedTo WHERE QuestionID = '" + ID + "';";
result = stmt.executeQuery(sqlKeywords);
while (result.next())
{
String keyword = result.getString("Keyword");
list.add(keyword);
}
}
feedback.appendChild(x.CreateQuestionKeyword(doc2, Question, Answer, AverageRating, list));
}
}

Why this should be done in SQL
Looping in application code and firing one query per row is usually far less efficient than writing a single SQL query. SQL is built to pull back this type of data, and the database can plan out how it is going to get the data (this is called an execution plan).
Letting SQL do its job and determine the best way to pull back the data, instead of explicitly deciding which tables to use first and then calling them one at a time, is better in terms of the resources you will use, the time it takes to get the results, code readability, and future maintainability.
What information you are looking for
In the pseudocode you provided, you are using the Keyword, Question, Answer, and AverageRating values. Finding these values should be the focus of the SQL query. Based on the code you have written, Question, Answer, and AverageRating come from the Questions table and Keyword comes from the LinkedTo table, so both of these tables need to be in the query.
You can note at this point that we have essentially just mapped out what the Select and From portions of your query should look like.
It also looks like you have a parameter called TaskTopic so we need to include the table Tasks to make sure the correct data is returned. Lastly, the TaskQuestions table is the link between the tasks and the questions. Now that we know what the query should look like, let's see what the results are using sql syntax.
The Code
You did not include the declaration of stmt, but I assume that it is a PreparedStatement. You can add parameters to a prepared statement. Notice the ? in the SQL code? The parameters you provide will be substituted for the ?. To do this, you should use stmt.setString(1, TaskTopic);. Note that if there were more than one parameter, you would need to set them in the order in which they appear in the SQL query (using 1, 2, ...).
SELECT l.Keyword,
       q.Question,
       q.Answer,
       q.AverageRating
FROM LinkedTo l
INNER JOIN Questions q
        ON l.QuestionID = q.QuestionID
WHERE EXISTS (SELECT 1
              FROM TaskQuestions tq
              INNER JOIN Tasks t
                      ON tq.TaskNo = t.TaskNo
              WHERE t.TaskTopic = ?
                AND tq.QuestionID = q.QuestionID)
This is one way that you can write the query to return the same results. There are other ways to write this to get what you are looking for.
What's Going On?
There are a few things in this query you may not be familiar with. First are table aliases. Instead of writing the table name over and over again, you can alias your tables. I used the letter q to represent the Questions table. Any time you see q. you should recognize that I am referring to a column from Questions. The q after Questions is what gives the table its alias.
Second is EXISTS. Instead of doing a bunch of inner joins with tables that you are not selecting information from, you can use EXISTS to check whether what you are looking for is in those tables. You can continue to do inner joins if you need data from the tables, but if you don't, EXISTS is more efficient.
I suspect you had issues with the query before (and probably the one you provided) because you did not provide any information to join TaskQuestions and Tasks together. That most likely resulted in the duplicates. I joined on TaskNo but this may not be the correct column depending on how the tables are set up.
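On the Java side, the query above returns one row per keyword, so the rows still have to be grouped back into a keyword list per question. A possible sketch follows; the class and method names are invented, and Answer and AverageRating are left out of the grouping for brevity. It assumes the join columns are right, which, as noted, depends on how the tables are set up.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only; runs the single query once and groups keywords by question text.
final class QuestionKeywords {

    static final String SQL =
        "SELECT l.Keyword, q.Question, q.Answer, q.AverageRating "
      + "FROM LinkedTo l INNER JOIN Questions q ON l.QuestionID = q.QuestionID "
      + "WHERE EXISTS (SELECT 1 FROM TaskQuestions tq "
      + "INNER JOIN Tasks t ON tq.TaskNo = t.TaskNo "
      + "WHERE t.TaskTopic = ? AND tq.QuestionID = q.QuestionID)";

    // Adds one keyword to the list kept for its question, creating the list on first use.
    static void addKeyword(Map<String, List<String>> byQuestion, String question, String keyword) {
        byQuestion.computeIfAbsent(question, k -> new ArrayList<>()).add(keyword);
    }

    // One round trip to the database; each result row is one (question, keyword) pair.
    static Map<String, List<String>> keywordsByQuestion(Connection conn, String taskTopic)
            throws SQLException {
        Map<String, List<String>> byQuestion = new LinkedHashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, taskTopic);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    addKeyword(byQuestion, rs.getString("Question"), rs.getString("Keyword"));
                }
            }
        }
        return byQuestion;
    }
}
```

Because of the inner join on LinkedTo, questions that have no keywords at all would not show up in the result.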

Complex INSERT query

I'm pretty new to MySQL. I have two related tables, quite common case: Klients(KID, name, surname) and Visits(VID, VKID, dateOfVisit) - VKID is the Klient ID. I have a problem with suitable INSERT query, this is what I want to do:
1. Check if a Klient with a specific name and surname exists (let's assume that there are no people with the same surname)
2. If yes, get the ID and do the INSERT into the Visits table
3. If no, INSERT a new Klient, get the ID and INSERT into Visits.
Is it possible to do in one query?

You would need to use IF EXISTS / NOT EXISTS with a subquery to check the table. See the reference below:
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
HTH

The INSERT statement allows only one single target table.
So the query you're looking for is just impossible unless you use triggers or stored procedures.
But this kind of problem is commonly solved using the following small algorithm:
1) insert a record in table [Visits] assuming the parent record does exist in table [Klients]
INSERT INTO Visits (VKID, dateOfVisit)
SELECT KID, NOW()
FROM Klients
WHERE (name=@name) AND (surname=@surname)
2) check the number of inserted records after query (1)
3) if no record has been inserted, then add a new record table [Klients], and then run (1) again.
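From Java, the three steps above might be sketched like this. The class and method names are invented, the placeholders are bound with a PreparedStatement instead of SQL variables, and it assumes KID is filled in automatically when a Klient is inserted.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch only; assumes Klients.KID is generated automatically on insert.
final class VisitRecorder {

    static final String INSERT_VISIT =
        "INSERT INTO Visits (VKID, dateOfVisit) "
      + "SELECT KID, NOW() FROM Klients WHERE name = ? AND surname = ?";

    static final String INSERT_KLIENT =
        "INSERT INTO Klients (name, surname) VALUES (?, ?)";

    // Runs one of the two statements and returns the number of affected rows.
    private static int run(Connection conn, String sql, String name, String surname)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, name);
            ps.setString(2, surname);
            return ps.executeUpdate();
        }
    }

    static void recordVisit(Connection conn, String name, String surname) throws SQLException {
        // Steps 1 and 2: try the visit insert and look at the affected row count.
        if (run(conn, INSERT_VISIT, name, surname) == 0) {
            // Step 3: no such client yet, so create it and repeat step 1.
            run(conn, INSERT_KLIENT, name, surname);
            run(conn, INSERT_VISIT, name, surname);
        }
    }
}
```

It would probably be sensible to call recordVisit inside a transaction, so that a new Klient is never left without its visit.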

Try something like this:
IF (SELECT * FROM `sometable` WHERE name = 'somename' AND surname = 'somesurname') IS NULL THEN
INSERT INTO Table1(name,surname) VALUES ('somename', 'somesurname');
ELSE INSERT INTO visits(kid,name,surname)
SELECT kid, name, surname FROM Table1 WHERE name = 'somename' AND surname = 'somesurname';
END IF;
There is no need to specify VALUES on the second insert.
I have not tested it, but this is the general idea of what you are trying to accomplish.

These should be two queries in a transaction:
INSERT INTO Klients (name, surname)
VALUES ('John', 'Doe')
ON DUPLICATE KEY UPDATE
KID = LAST_INSERT_ID(KID);
INSERT INTO Visits (VKID, dateOfVisit)
VALUES (LAST_INSERT_ID(), NOW());
The first statement is an upsert in which the update part uses a little-known feature of LAST_INSERT_ID() that is intended for exactly this purpose: a value passed to it explicitly is stored, so that a later call to LAST_INSERT_ID() returns it.
UPD: I forgot to mention that you would need to add a unique constraint on (surname, name).

Oracle & java dynamic 'Order by' clause

I am trying to build a dynamic SQL query in Java (shown below):
sqlStr = "Select * " +
    "from " + tableName;
if (tableName != null) {
    if (tableName.equals("Table1")) {
        sqlStr = sqlStr.concat("order by city desc");
    } else if (tableName.equals("Table2")) {
        sqlStr = sqlStr.concat("order by country desc");
    } else if (tableName.equals("Table3")) {
        sqlStr = sqlStr.concat("order by price desc");
    }
}
Now what I would like to do is add a final 'else' branch that orders the query by custID whenever the table contains a column with that name. There will be several tables with that column, so I want to sort the ones that have it by custID (rather than having hundreds of additional if statements, one for each table that has that column). Is this possible? I have seen people using the 'decode' function, but I can't work out how to use it here.

Use DatabaseMetaData to get the table information.
You can use its getTables() and getColumns() methods to get the table and column information.
Connection conn = DriverManager.getConnection(.....);
DatabaseMetaData dbmd = conn.getMetaData();
dbmd.getxxxx();
Note: you forgot space in your code before ORDER BY clause.
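As a sketch of the metadata lookup: the class and method names below are invented, and it assumes an Oracle-style catalog where unquoted identifiers are stored in upper case. Passing null for the schema searches all schemas, so a same-named table elsewhere could give a false positive.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;

// Sketch only; assumes unquoted identifiers are stored in upper case (as in Oracle).
final class CustIdLookup {

    // True if the metadata lists a column with this name for the given table.
    static boolean hasColumn(Connection conn, String tableName, String columnName)
            throws SQLException {
        DatabaseMetaData dbmd = conn.getMetaData();
        try (ResultSet rs = dbmd.getColumns(null, null,
                tableName.toUpperCase(Locale.ROOT),
                columnName.toUpperCase(Locale.ROOT))) {
            return rs.next();
        }
    }

    // Note the leading space, so the clause can be appended directly after the table name.
    static String orderByClause(boolean hasCustId) {
        return hasCustId ? " order by custID" : "";
    }
}
```

The final else branch could then be something like sqlStr = sqlStr.concat(CustIdLookup.orderByClause(CustIdLookup.hasColumn(conn, tableName, "custID")));, ideally with the lookup result cached per table.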

If you are happy with hardcoding things, a way to avoid multiple conditionals would be to store a list of all the tables that include custID.
private final static String tablesWithCustID = "/TableX/TableY/TableZ/";
...
if (tablesWithCustID.contains("/" + tableName + "/")) {
    sqlStr = sqlStr.concat(" order by custID");
}
You could use a List instead of a simple delimited string if you like.
Perhaps better, you could store a map with table names as the key, and the sort string as the value. Load it up once, then you don't need any conditional at all.
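The map idea might look something like this. It is only a sketch: the class name is invented, and TableX and TableY stand in for whichever tables actually have a custID column. Tables that are not in the map simply get no ORDER BY, which matches what the code in the question does for unknown tables.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only; TableX and TableY are placeholders for the real custID tables.
final class SortClauses {

    // Table name -> ORDER BY clause, loaded once.
    private static final Map<String, String> ORDER_BY = new HashMap<>();
    static {
        ORDER_BY.put("Table1", " order by city desc");
        ORDER_BY.put("Table2", " order by country desc");
        ORDER_BY.put("Table3", " order by price desc");
        ORDER_BY.put("TableX", " order by custID");
        ORDER_BY.put("TableY", " order by custID");
    }

    // Builds the query with a single lookup instead of a chain of conditionals.
    static String buildQuery(String tableName) {
        return "Select * from " + tableName + ORDER_BY.getOrDefault(tableName, "");
    }
}
```

For example, SortClauses.buildQuery("Table2") gives "Select * from Table2 order by country desc".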

The most straight-forward way to do it is to read the column definitions from USER_TAB_COLUMNS or ALL_TAB_COLUMNS and check for the existence of a custID column. Without crazy PL/SQL tricks, you won't be able to solve this in SQL alone.
BTW, there is a " " missing between tableName and the order by clauses.

I understand that you're looking for a solution that can do this in one query, i.e. without running a separate metadata query beforehand.
Unfortunately, this won't be possible. The decode function can do some dynamic things with column values, but not with column names, and you are looking for a solution that dynamically derives the column name.
An alternative might be to just add ORDER BY 1, 2. This is an old syntax that means order by the first and then by the second column. It might be a good solution if the custID column is the first column. Otherwise it at least gives you some sorting.

How about ArrayList.contains()?
You can create a list of tables which have that column, and just check for tables.contains(tablename) in the final if condition.
