synchronizing current database entries with new data to insert - java

I have to sync the data in my database table with new data from an SQL query: I need to delete the current entries that are not in the new data and insert the new entries that are not in the current data. I was able to do this in Java using this pseudocode:
// 1) get all data in database and store it in list (currentList)
// 2) get new data obtained through sql query and store it in list (newList)
// 3) sync both list
for (entry : currentList) {
    if (newList.contains(entry))
        finalList.add(entry)
}
for (entry : newList) {
    if (!finalList.contains(entry))
        finalList.add(entry)
}
// 4) delete all data from DB
// 5) insert finalList data to DB
It works fine; however, I think it will have performance issues with large data sets, because I'm deleting everything and reinserting the whole list instead of just inserting the new entries and deleting the entries not found in the new data.
Can you suggest a better way of doing this? Is it possible to write an SQL query that takes care of synchronizing the data?

Take a look at MERGE.
The construct will allow you to specify conditions under which to either update existing records, or to add new ones.
It basically looks like:
MERGE
INTO target_table
USING source_table
ON (some condition)
WHEN MATCHED THEN
    UPDATE SET some_update_assignments
WHEN NOT MATCHED THEN
    INSERT some_insert_clause

You can also do these operations in a stored procedure so as not to increase DB traffic unnecessarily.
However, this won't work for a huge number of entries (say, millions) or if each entry is very large. In that case, try to use MERGE operations as Ryan pointed out above.
Consider the following PL/SQL pseudocode (you may want to change it depending on your use case):
PROCEDURE replace_with_list (newList) AS
BEGIN
    /* Delete all the entries that are not present in newList (e.g. by ID) */
    /* Insert all the entries that are present in newList and not present in your table */
END;
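
On the Java side, the delta the question asks for (delete what is no longer in the new data, insert what is not stored yet) can be computed with two set differences before touching the database. This is a minimal sketch, assuming each entry can be reduced to a key with proper equals/hashCode (an ID string, for example); `SyncDelta`, `toDelete`, and `toInsert` are hypothetical names, not part of any library.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: works out which keys to delete and which to insert.
class SyncDelta {
    // Keys that are in the database but not in the new data.
    static <T> Set<T> toDelete(Collection<T> current, Collection<T> incoming) {
        Set<T> result = new LinkedHashSet<>(current);
        result.removeAll(new HashSet<>(incoming));
        return result;
    }

    // Keys that are in the new data but not in the database yet.
    static <T> Set<T> toInsert(Collection<T> current, Collection<T> incoming) {
        Set<T> result = new LinkedHashSet<>(incoming);
        result.removeAll(new HashSet<>(current));
        return result;
    }
}
```

Only the two resulting sets then need DELETE and INSERT statements; entries present on both sides are left untouched.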


DynamoDBMapper : How to get all the rows for multiple id's(array) in a single query or scan of DynamoDBMapper

My DB table consists of multiple rows whose IDs are unique.
API (endpoint) -> get the rows for the given IDs
I am passing an array of inputs (id1, id2, id3, id4).
Question: Using DynamoDBMapper, how do I write a single query that fetches all the rows for the IDs passed in?
We can use either scan or query.
Appreciate your help.
Thanks in Advance.
Scan or Query is not suitable for this transaction.
You should iterate your list and use GetItem to retrieve each item individually, which is the fastest and cheapest way to get the items. You can also use BatchGetItem, which fetches up to 100 items per request.
A Scan would be slow and expensive as it would evaluate every single item in your table. However if you insist on using it, simply scan your table and provide a ScanFilter to return your items.
If you used a Query, it would operate in exactly the same way as GetItem anyway. You would have to iterate your list of IDs. i.e. a Query is not at all suitable in this case.
I achieved this in a single query call (dynamoDBMapper.scan). Example as follows:
private List<Entity> getByIds(List<UUID> ids) {
    // Convert each ID to a string AttributeValue for the IN condition
    List<AttributeValue> attList = ids.stream()
            .map(x -> new AttributeValue(x.toString()))
            .collect(Collectors.toList());
    DynamoDBScanExpression dynamoDBScanExpression = new DynamoDBScanExpression()
            .withFilterConditionEntry("id", new Condition()
                    .withComparisonOperator(ComparisonOperator.IN)
                    .withAttributeValueList(attList));
    // The scan still reads every item in the table; the filter is applied afterwards
    return dynamoDBMapper.scan(Entity.class, dynamoDBScanExpression);
}

How to fetch a list and then modify a bin of type List in a single transaction from the Aerospike Java client

I want to fetch a bin of type list from the Aerospike database, update the list (add and remove some elements), and then write the bin back to Aerospike in a single transaction for a given key. I have multiple threads that can fetch and update the same key from multiple places, so I want to do the above operation in a single transaction.
Aerospike Java client version: 3.2.0
Java version: 1.8
If you are doing simple list manipulations, you can use ListOperation:
https://www.aerospike.com/apidocs/java/com/aerospike/client/cdt/ListOperation.html
If ListOperation doesn't have the operations you need to make, or you need to make multiple operations in one atomic transaction, using UDFs is your best choice.
During UDF execution, the record is locked. Whether it is a full lock or a write lock, I'm not sure, but either case should serve your atomicity needs just fine. Do all your list operations, then persist the changes to database in one aerospike:create(rec) or aerospike:update(rec) call.
Example: most_recent_10_list.lua
function append(rec, value)
    -- Get list from database record. Create one if it doesn't exist.
    local my_list = rec["my_list_bin"]
    if my_list == nil then
        my_list = list()
    end
    -- Add value to list
    list.append(my_list, value)
    -- Keep only the 10 most-recent values in the list
    local new_list_size = list.size(my_list)
    if new_list_size > 10 then
        my_list = list.drop(my_list, new_list_size - 10)
    end
    -- Save changes to database
    rec["my_list_bin"] = my_list
    if not aerospike:exists(rec) then
        aerospike:create(rec)
    else
        aerospike:update(rec)
    end
end

Efficient way to check if record (from a large set of data) is existing in the Database (JPA/Hibernate)

We have a large set of data (bulk data), and we need to check whether each record already exists in the database.
We are using SQL Server 2012/JPA/Hibernate/Spring.
What would be an efficient or recommended way to check if a record exists in the database?
Our entity ProductCodes has the following fields:
private Integer productCodeId // this is the PK
private Integer refCode1 // ref code 1-5 has a unique constraint
private Integer refCode2
private Integer refCode3
private Integer refCode4
private Integer refCode5
... other fields
The service that we are creating will be given a file where each line is a combination of refCode1-5.
The task of the service is to check and report all lines in the file that are already existing in the database.
We are looking at approaching this in two ways.
Approach1: Usual approach.
Loop through each line and call the DAO to query the refCode1-5 if existing in the db.
// pseudocode
for each line in the file
    call dao. pass the refCode1-5 to query
    (select * from ProductCodes where refCode1=? and refCode2=? and refCode3=? and refCode4=? and refCode5=?)
Given a large list of lines to check, this might be inefficient since we will be invoking the DAO once per line. If the file consists of, say, 1000 lines to check, this will be 1000 connections to the DB.
Approach2: Query all records in the DB approach
We will query all records in the DB
Create a hash map with concatenated refCode1-5 as keys
Loop through each line in the file, validating against the hash map
We think this is more efficient in terms of DB connections since it will not create 1000 connections to the DB. However, if the DB table has, for example, 5000 records, then Hibernate/JPA will create 5000 entities in memory and probably crash the application.
We are thinking of going for the first approach since refCode1-5 has a unique constraint and will benefit from the implicit index.
But is there a better way of approaching this problem aside from the first approach?
Try something like a batch select statement for, say, 100 refCodes instead of doing a single select for each refCode.
Construct a query like
select <whatever you want> from <table> where ref_code in (.....)
Construct the select projection so that it gives you not just what you want but also the ref_code itself. Then, in code, you can do a count or a multi-threaded scan of the result set if the DB returned fewer refCodes than the number of codes you put in the query.
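
The batching idea can be sketched as follows: split the codes into fixed-size chunks and generate one parameterized IN list per chunk, to be bound through a PreparedStatement. `InBatches`, `partition`, and `placeholders` are hypothetical helper names, and a batch size of 100 is simply the figure suggested above, not a tuned value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helpers for running one IN query per batch of codes.
class InBatches {
    // Splits items into consecutive batches of at most `size` elements.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Builds "(?, ?, ?)" with n placeholders for use after IN.
    static String placeholders(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return "(" + String.join(", ", Collections.nCopies(n, "?")) + ")";
    }
}
```

Each batch then becomes one statement such as `... where ref_code in ` followed by `placeholders(batch.size())`, with the batch values bound as parameters.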
You can try to use the concat operator.
select <your cols> from <your table> where concat(refCode1, refCode2, refCode3, refCode4, refCode5) IN (<set of concatenation from your file>);
I think this will be quite efficient, and it may be worth trying to see whether pre-sorting the lines and varying the number of concatenated values sent each time brings any benefit.
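
One caveat with concatenated keys, whether in SQL or in the hash map from Approach 2: without a separator, different combinations can produce the same string, for example (1, 23, ...) and (12, 3, ...). For the in-memory comparison, here is a sketch that sidesteps this by using the list of five codes itself as the key. `RefCodeKeys` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical helper for comparing file lines against existing refCode combinations.
class RefCodeKeys {
    // Naive key: plain concatenation, shown only to illustrate the collision.
    static String naiveConcat(Integer... refCodes) {
        StringBuilder sb = new StringBuilder();
        for (Integer code : refCodes) {
            sb.append(code);
        }
        return sb.toString();
    }

    // Safer key: the list of codes itself; List equality compares element by element.
    static List<Integer> key(Integer... refCodes) {
        return Arrays.asList(refCodes);
    }

    // Returns the file lines whose key is already present in the existing set.
    static List<List<Integer>> findExisting(List<List<Integer>> fileLines,
                                            Set<List<Integer>> existing) {
        List<List<Integer>> found = new ArrayList<>();
        for (List<Integer> line : fileLines) {
            if (existing.contains(line)) {
                found.add(line);
            }
        }
        return found;
    }
}
```

If a string key is required (as in the SQL version), inserting a separator that cannot occur in the values between the parts avoids the same collision.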
I would suggest you create a temp table where all records from the file are first stored with a batch save, and then run a query joining the temp table and the ProductCodes table to filter however you like. This way you are not locking the ProductCodes table many times to check individual rows, as SQL Server takes locks on rows for select statements as well.

rewrite Hibernate query without huge list parameter

In my database I have a zip table with a code column. The user can upload a list of Zip codes and I need to figure out which ones are already in the database. Currently, I do this using the following Hibernate query (HQL):
select zip.code from Zip zip
where zip.code in (:zipCodes)
The value of the :zipCodes parameter is the list of codes uploaded by the user. However, in the version of Hibernate I'm using there's a bug which limits the size of such list parameters and on occasions we're exceeding this limit.
So I need to find another way to figure out which of the (potentially very long) list of Zip codes are already in the database. Here are a few options I've considered
Option A
Rewrite the query using SQL instead of HQL. While this will avoid the Hibernate bug, I suspect the performance will be terrible if there are 30,000 Zip codes that need to be checked.
Option B
Split the list of Zip codes into a series of sub-lists and execute a separate query for each sub-list. Again, this will avoid the Hibernate bug, but performance will likely still be terrible
Option C
Use a temporary table, i.e. insert the Zip codes to be checked into a temporary table, then join that to the zip table. It seems the querying part of this solution should perform reasonably well, but the creation of the temporary table and insertion of up to 30,000 rows will not. But perhaps I'm not going about it the right way. Here's what I had in mind, in pseudo-Java code:
/**
 * Indicates which of the Zip codes are already in the database
 *
 * @param zipCodes the zip codes to check
 * @return the codes that already exist in the database
 * @throws IllegalArgumentException if the list is null or empty
 */
List<Zip> validateZipCodes(List<String> zipCodes) {
    try {
        // start transaction
        // execute the following SQL
        CREATE TEMPORARY TABLE zip_tmp
        (code VARCHAR(255) NOT NULL)
        ON COMMIT DELETE ROWS;
        // create SQL string that will insert data into zip_tmp
        StringBuilder insertSql = new StringBuilder();
        for (String code : zipCodes) {
            // code is a VARCHAR, so the literal must be quoted
            insertSql.append("INSERT INTO zip_tmp (code) VALUES ('" + code + "');");
        }
        // execute insertSql to insert data into zip_tmp
        // now run the following query and return the result
        SELECT z.*
        FROM zip z
        JOIN zip_tmp zt ON z.code = zt.code
    } finally {
        // rollback transaction so that temporary table is removed to ensure
        // that concurrent invocations of this method do not interfere
        // with each other
    }
}
Is there a more efficient way to implement this than in the pseudo-code above, or is there another solution that I haven't thought of? I'm using a Postgres database.
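Load all the Zip codes in the database into a List, then call retainAll(databaseList) on the user-supplied list of Zip codes to keep only the codes that already exist (removeAll(databaseList) would leave the ones that are missing instead).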
Problem solved!
Suppose you "validate" 1000 codes against a table of 100000 records in which the code is the primary key and has a clustered index.
Option A is not an improvement: Hibernate is going to build the same SELECT ... IN ... you could write on your own.
Option B, as well as your current query, might fail to use the index.
Option D might be good if you are sure the zipcodes don't change at arbitrary times, which is unlikely, or if you can recover from trying to process existing codes.
Option C (Creating a temp table, issuing 1000 INSERT statements and joining 1000 rows against 100000 in a single SELECT) isn't competitive with just issuing 1000 simple and index-friendly queries for a single new code each:
SELECT COUNT(*) FROM Zip WHERE Zip.code = :newCode
Option D:
Load all existing zip codes from the database (with pagination?) and do the comparison in your application.
Regarding your Option A:
I remember a limitation on SQL query length, but that was on DB2; I don't know if there is a limit on PostgreSQL.
There are around 45,000 Zip codes in the US and they seem to be updated annually. If this is an annual job, don't write it in Java. Create an SQL script which loads the zip codes into a new table and write an insert statement like
insert into zip (code) select code from ziptemp where code not in (select code from zip)
Have your operations guys run this two-line SQL script once a year and don't busy yourself with this in the Java code. Plus, if you keep this out of Java, you can basically take any approach, because no one cares if this runs for thirty minutes in off-peak times.
divide et impera
Have you tried using an IN subquery?
http://docs.jboss.org/hibernate/orm/3.5/api/org/hibernate/criterion/Subqueries.html
It would be something like this:
DetachedCriteria dc = DetachedCriteria.forClass(Zip.class, "zz");
// add restrictions (and a projection on the code) to the previous dc
Criteria c = session.createCriteria(Zip.class, "z");
// propertyIn compares a property with the subquery; Subqueries.in would treat "z.code" as a literal value
c.add(Subqueries.propertyIn("z.code", dc));
Sorry if I got the code wrong; it's been a while since I used Hibernate.

Getting grouped data from database and process further in java

I have a table, say A, which has an attribute ID (a string) and Time (a DateTime).
Different entries in the table can have the same ID; they have to be grouped together and then refined further.
I am using Java, and I run the SQL query
Select * from A order by ID;
Now I get this data as a huge list in Java. What I do is:
Set_ID = NULL;
for (each element in List)
{
    if (Set_ID equals the element's `ID` from the table)
        add the element to the current list
    else
        create a new list and add the element to it; change Set_ID to the current `ID`
}
This way I get all the entries with the same ID in separate lists and I can process them further.
But is this an efficient way to do it, comparing strings for each element?
Is there any change I can make to improve it? Thanks.
Instead of reading all the data into a list & then processing it into sub lists, I'd process them directly into sub lists as you pull them from the database
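
A sketch of that idea: put each row into its bucket as it is read, so there is no second pass and no comparison against a running ID. A map keyed by ID also works when the rows do not arrive sorted. `EntryRow` and `RowGrouper` are hypothetical stand-ins for the table's row type and the reading loop; in real code the loop would iterate over a `ResultSet` instead of an `Iterable`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row type for table A; Time is kept as text to stay self-contained.
class EntryRow {
    final String id;
    final String time;

    EntryRow(String id, String time) {
        this.id = id;
        this.time = time;
    }
}

class RowGrouper {
    // Groups rows by ID in a single pass, preserving first-seen ID order.
    static Map<String, List<EntryRow>> groupById(Iterable<EntryRow> rows) {
        Map<String, List<EntryRow>> groups = new LinkedHashMap<>();
        for (EntryRow row : rows) {
            groups.computeIfAbsent(row.id, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }
}
```

Each map value is then one of the per-ID sub lists, ready for further processing.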
