How to Implement Transactions in a Non-Transactional Database - java

How can transactions be implemented on top of a non-transactional database?
1) Please explain how this can be done on the Java side.
Note: I will share the effort I have put into finding an answer.
Suppose a single transaction contains two inserts and two updates. Four threads would each execute one statement, and one more thread would monitor them all. If any one of the threads fails, the monitoring thread cancels everything.

Each thread that participates in the transaction is given a transaction ID. You need to create a structure that the threads can write to, which keeps track of the data (or keys) needed to back out changes.
As in a real database, when you do an update, both the data before the change and the data after the change need to be recorded. You need the after-change data because you may need it to find the record when backing out.
Inserts are a little easier: to back one out, just delete the record.
Deletes need to store the data as it was before the delete as well.
So any structure you create needs a transaction ID, a table name, and, say, a list of column data (which can be a map of String to Object holding the column name and the column data).
That should be a pretty good start...
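A minimal sketch of the undo structure described above, assuming an in-memory map as a stand-in for the non-transactional store. The class and method names (UndoLogSketch, UndoEntry, rollback, and so on) are hypothetical and chosen only for illustration. Each change records the row's prior state under its transaction ID, and a rollback replays those entries in reverse order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a store without transactions: table -> (key -> row). */
class UndoLogSketch {
    enum Op { INSERT, UPDATE, DELETE }

    /** One undo entry: transaction ID, table name, row key, and the column data before the change. */
    static final class UndoEntry {
        final String txId;
        final String table;
        final String key;
        final Op op;
        final Map<String, Object> before; // null for inserts: there was no row before

        UndoEntry(String txId, String table, String key, Op op, Map<String, Object> before) {
            this.txId = txId;
            this.table = table;
            this.key = key;
            this.op = op;
            this.before = before;
        }
    }

    private final Map<String, Map<String, Map<String, Object>>> tables = new HashMap<>();
    private final Map<String, Deque<UndoEntry>> undoLog = new HashMap<>();

    private Map<String, Map<String, Object>> table(String name) {
        return tables.computeIfAbsent(name, n -> new HashMap<>());
    }

    private void log(String txId, String table, String key, Op op, Map<String, Object> before) {
        // push() puts the newest entry first, so rollback sees changes in reverse order.
        undoLog.computeIfAbsent(txId, t -> new ArrayDeque<>())
               .push(new UndoEntry(txId, table, key, op, before));
    }

    void insert(String txId, String table, String key, Map<String, Object> row) {
        log(txId, table, key, Op.INSERT, null);
        table(table).put(key, new HashMap<>(row));
    }

    void update(String txId, String table, String key, Map<String, Object> row) {
        Map<String, Object> before = table(table).get(key);
        if (before == null) {
            throw new IllegalStateException("No row " + key + " in " + table);
        }
        log(txId, table, key, Op.UPDATE, new HashMap<>(before));
        table(table).put(key, new HashMap<>(row));
    }

    void delete(String txId, String table, String key) {
        Map<String, Object> before = table(table).remove(key);
        if (before != null) {
            log(txId, table, key, Op.DELETE, before);
        }
    }

    /** Forget the undo entries; the changes stay in place. */
    void commit(String txId) {
        undoLog.remove(txId);
    }

    /** Back out every change of the transaction, newest first. */
    void rollback(String txId) {
        Deque<UndoEntry> entries = undoLog.remove(txId);
        if (entries == null) {
            return;
        }
        for (UndoEntry e : entries) {
            if (e.op == Op.INSERT) {
                table(e.table).remove(e.key);         // undo an insert: delete the record
            } else {
                table(e.table).put(e.key, e.before);  // undo an update or delete: restore the old row
            }
        }
    }

    Map<String, Object> get(String table, String key) {
        return table(table).get(key);
    }
}
```

This sketch is single-threaded and keeps the log in memory; with the worker threads from the question, access to the log would need to be synchronized, and the log would have to survive a crash to be of any use for recovery.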

Related

Upsert in Spring Data

There is a table
columns: id(pk), name, attribute
unique constraint on (name, attribute).
There are a bunch of threads which insert into the table if a record is not there. Spring Data is used for that, and it's done in a transaction which could take some time. The records could be the same, meaning the same (name, attribute), simultaneously in a couple of threads. From time to time a race condition happens: thread A tries to commit a new record, whereas thread B committed the same record before thread A read it.
Are there any approaches on how to do upsert in this kind of situations?
Perhaps, there are other suggestions to resolve this issue, would be happy to hear them.
Either do it the JPA way:
Try to find the entity, if it is not there, save it.
If it is there, there is nothing to do, but of course you could update it by manipulating the found entity.
Alternatively, go with SQL and write an actual upsert/merge statement, which many database dialects support.
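The find-then-save approach still leaves a window between the find and the save, which is the race described in the question. A minimal sketch of one way to close it, assuming the unique constraint is what rejects the losing insert: try the insert, and if it fails on the constraint, read the row the other thread committed. The Table class and DuplicateKeyException below are hypothetical in-memory stand-ins, not Spring Data or JPA types.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

class FindOrInsertSketch {
    /** Hypothetical stand-in for the error a database raises on a unique-constraint violation. */
    static class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(String message) {
            super(message);
        }
    }

    /** Hypothetical in-memory table; the map key models the unique constraint on (name, attribute). */
    static class Table {
        private final ConcurrentMap<List<String>, Long> idsByUniqueKey = new ConcurrentHashMap<>();
        private final AtomicLong nextId = new AtomicLong(1);

        Optional<Long> find(String name, String attribute) {
            return Optional.ofNullable(idsByUniqueKey.get(List.of(name, attribute)));
        }

        long insert(String name, String attribute) {
            long id = nextId.getAndIncrement();
            // putIfAbsent is atomic, like the constraint check done by the database.
            if (idsByUniqueKey.putIfAbsent(List.of(name, attribute), id) != null) {
                throw new DuplicateKeyException("duplicate (" + name + ", " + attribute + ")");
            }
            return id;
        }
    }

    /** Returns the id of the existing row, or inserts one; tolerates losing the race. */
    static long findOrInsert(Table table, String name, String attribute) {
        Optional<Long> existing = table.find(name, attribute);
        if (existing.isPresent()) {
            return existing.get();
        }
        try {
            return table.insert(name, attribute);
        } catch (DuplicateKeyException e) {
            // Another thread inserted the same key between our find and our insert.
            return table.find(name, attribute).orElseThrow(() -> e);
        }
    }
}
```

With a real database, the fallback read usually has to run in a fresh transaction, since the failed insert may have left the current one unusable.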

How to force commit Spring - hibernate transaction safely

We are using Spring and Hibernate for a web application.
The application has a shopping cart in which users can place items. To keep the items visible across different logins, the item values in the shopping cart are stored in tables. When the shopping cart is submitted, the items are saved into a different table, for which we need to generate the order number.
When we insert the values into the table to get the order number, we take the max order number and add 1 to it. We are using the Spring transaction manager and Hibernate; in the code flow we get the order number and update the Hibernate object to hold the order number value. When I debugged, I noticed that the order number entity bean is inserted only when the complete transaction is committed.
The issue is that when two requests are submitted to the server at the same time, the same order number is used, and only one request's data gets inserted. The other request's values, which are otherwise unique, cannot be inserted.
The order number in the table is unique.
When debugging, I noticed that the persistence layer does not insert into the database even after issuing a session flush:
session.flush()
It just updates memory and inserts the data into the database only at the end of the Spring transaction. I tried explicitly committing the transaction:
session.getTransaction().commit();
This inserted the values into the database immediately, but further along the code flow a message was displayed saying that the transaction could not be started.
Any help is highly appreciated.
Added:
I use an Oracle database.
There is a sequence number which is unique for that table and also the order number maps to it.
Follow these steps:
1) Create a service method with propagation REQUIRES_NEW in a different service class.
2) Move your code (whatever code you want flushed to the database) into this new method.
3) Call this method from the existing API. (Because of Spring's proxies, the new service method has to be called from a different class; otherwise REQUIRES_NEW will not take effect. This makes sure your data is flushed.)
I would set the order number with a trigger, which will run in the same transaction as the shopping cart insert.
After you save the shopping cart, to see the updated order count, you'll have to call:
session.refresh(cart);
The count shouldn't be managed by Hibernate (insertable/updatable = false or @Transient).
Your first problem is serializing access to the number generation when multiple threads execute the same logic. If you could use Oracle sequences, this would be taken care of automatically at the database level, as sequences are guaranteed to return unique values no matter how many times they are called. However, since this now needs to be managed on the server side, you would need a synchronization mechanism around your number generation logic (select max and increment by one) that spans the transaction boundary. You can make the service method synchronized (your service class would be a Spring-managed singleton) and declare the transaction boundary around it. However, please note that this has performance implications and is usually bad for scalability.
Another option is a variation of this: store the ID to be allocated in a separate table with one column, "currentVal", and use a pessimistic lock to get the next number. This way, the main table would not have any big lock; instead, a lock is held on the sequence generator row until the main entity creation transaction completes. The main idea behind these techniques is to serialize access to the sequence generator and hold the lock until the main entity transaction commits. Also, delay the number generation until as late as possible.
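The locking idea can be sketched in plain Java, assuming a ReentrantLock as a stand-in for the pessimistic row lock on the "currentVal" table (with a real database this would typically be a SELECT ... FOR UPDATE, or LockModeType.PESSIMISTIC_WRITE in JPA). The class and method names are hypothetical. The point illustrated is that the lock is taken when the number is read and released only when the surrounding transaction commits or rolls back.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical in-memory model of a one-row counter table guarded by a pessimistic lock. */
class OrderNumberAllocatorSketch {
    private final ReentrantLock rowLock = new ReentrantLock(); // models the row lock
    private long currentVal;                                   // models the "currentVal" column

    OrderNumberAllocatorSketch(long initialValue) {
        this.currentVal = initialValue;
    }

    /**
     * Locks the counter row and returns the next order number.
     * The lock stays held until commit() or rollback() is called by the same thread.
     */
    long lockAndNext() {
        rowLock.lock();
        return currentVal + 1;
    }

    /** The main entity was saved: persist the new counter value and release the lock. */
    void commit() {
        currentVal++;
        rowLock.unlock();
    }

    /** The main entity was not saved: leave the counter unchanged and release the lock. */
    void rollback() {
        rowLock.unlock();
    }
}
```

A second thread calling lockAndNext() blocks until the first one commits or rolls back, so two concurrent requests can never see the same number. That is also why the number should be requested as late as possible: the lock is held for the rest of the transaction.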
The solution suggested by @Vlad is a good one if using triggers is fine in your design.
Regarding your question about the flush behaviour: the SQL is sent to the database at the flush call, but the data is not committed until the transaction is committed, either declaratively or by a manual commit. The transaction itself can see the data it proposes to change, but other transactions cannot, depending on the transaction isolation level.

With an ItemWriter which persists all of my value objects, is it possible to perform a separate db insert only once?

I have an implementation of an ItemWriter which persists all of my value objects nicely. When the first value object (for the batch job) is passed to the ItemWriter, can I perform a separate db insert, and guarantee that this insert will not occur for subsequent value objects coming into the ItemWriter?
Apologies it sounds wordy. In simpler terms I want to get a record to a status table to show that the batch job has started writing and not have this inserted n times.
You can use JobExplorer to query the Spring Batch metadata tables and check whether the step has started.
Another way: you can use a listener such as ItemWriteListener.afterWrite() and store your flag in the audit table (and also in the execution context, to prevent multiple writes).

Synchronizing table data across databases

I have one table that records its row insert/update timestamps on a field.
I want to synchronize data in this table with another table on another db server. The two db servers are not connected, and synchronization is one way (master/slave). Using table triggers is not suitable.
My workflow:
1) I use a global last_sync_date parameter and query table Master for the changed/inserted records.
2) Output the resulting rows to XML.
3) Parse the XML and update table Slave using updates and inserts.
The complexity of the problem rises when dealing with deleted records of Master table. To catch the deleted records I think I have to maintain a log table for the previously inserted records and use sql "NOT IN". This becomes a performance problem when dealing with large datasets.
What would be an alternative workflow dealing with this scenario?
It sounds like you need a transactional message queue.
How this works is simple. When you update the master db, you can send a message to the message broker (describing whatever the update was), which can go to any number of queues. Each slave db can have its own queue, and because queues preserve order, the process should eventually synchronize correctly (ironically, this is sort of how most RDBMSs do replication internally).
Think of the message queue as a sort of SCM change-list or patch-list database. That is, for the most part the same (or roughly the same) SQL statements sent to the master should eventually be replicated to the other databases. Don't worry about losing messages, as most message queues support durability and transactions.
I recommend you look at spring-amqp and/or spring-integration especially since you tagged this question with spring-batch.
Based on your comments:
See Spring Integration: http://static.springsource.org/spring-integration/reference/htmlsingle/ .
Google SEDA. Whether you go this route or not you should know about Message queues as it goes hand-in-hand with batch processing.
RabbitMQ has a good diagram of how messaging works.
The contents of your message might be the entire row and whether it is a CREATE, UPDATE, or DELETE. You can use whatever format you like (e.g. JSON; see Spring Integration for recommendations).
You could even send the direct SQL statements as a message!
BTW, your concern about NOT IN being a performance problem is not a very good one, as there are a plethora of workarounds, but given that you don't want to do DB-specific things (like triggers and replication), I still feel a message queue is your best option.
EDIT - Non MQ route
Since I gave you a tough time about asking this question, I will continue to try to help.
Besides the message queue, you can use some sort of XML file as you were trying before. The critical feature you need in the schema is a CREATE TIMESTAMP column on your master database, so that you can do the batch processing while the system is up and running (otherwise you will have to stop the system). If you go this route, you will want to SELECT * WHERE CREATE_TIME < ?, where the parameter is the current time. Basically, you are only getting the rows as of a snapshot.
Now, on your other database, for the deletes you are going to remove rows by inner joining on an ID table but with != (that is, you can use JOINs instead of a slow NOT IN). Luckily, you only need all the ids for the deletes and not the other columns. For the other columns you can use a delta based on the update timestamp column (for update, and for create, aka insert).
I am not sure about the solution. But I hope these links may help you.
http://knowledgebase.apexsql.com/2007/09/how-to-synchronize-data-between.htm
http://www.codeproject.com/Tips/348386/Copy-Synchronize-Table-Data-between-databases
Have a look at Oracle GoldenGate:
Oracle GoldenGate is a comprehensive software package for enabling the
replication of data in heterogeneous data environments. The product
set enables high availability solutions, real-time data integration,
transactional change data capture, data replication, transformations,
and verification between operational and analytical enterprise
systems.
SymmetricDS:
SymmetricDS is open source software for multi-master database
replication, filtered synchronization, or transformation across the
network in a heterogeneous environment. It supports multiple
subscribers with one direction or bi-directional asynchronous data
replication.
Daffodil Replicator:
Daffodil Replicator is a Java tool for data synchronization, data
migration, and data backup between various database servers.
Why don't you just add a TIMESTAMP column that indicates the last update/insert/delete time? Then add a deleted column, i.e. mark the row as deleted instead of actually deleting it immediately. Delete it after the delete action has been exported.
In case you cannot alter schema usage in an existing app:
Can't you use triggers at all? How about a second ("hidden") table that gets populated with every insert/update/delete and which would constitute the content of the next to be generated xml export file? That is a common concept: a history (or "log") table: it would have its own progressing id column which can be used as an export marker.
Very interesting question.
In my case I had enough RAM to load all ids from the master and slave tables and diff them.
If the ids in the master table are sequential, you may try to maintain a set of fully filled ranges in the master table (ranges with all ids used, without gaps, like 100,101,102,103).
To find removed ids without loading all of them into memory, you may execute a SQL query that counts the records with id >= full_region.start and id <= full_region.end for each fully filled region. If the result of the query == (full_region.end - full_region.start) + 1, no record in the region has been deleted. Otherwise, split the region into 2 parts and do the same check for both of them (in a lot of cases only one side contains removed records).
Below some range length (about 5000, I think) it will be faster to load all present ids and check for the absent ones using a Set.
It also makes sense to load all ids into memory for a batch of small (10-20 record) regions.
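The count-and-split search described above can be sketched as follows. This is an illustration under assumptions, not the answerer's code: IdSource is a hypothetical interface standing in for the two SQL queries (a COUNT over an id range and a SELECT of the ids in a range), and the in-memory implementation exists only so the sketch can run without a database.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DeletedIdFinderSketch {
    /** Hypothetical view of the master table; a real one would issue the two SQL queries. */
    interface IdSource {
        long countBetween(long start, long end);     // SELECT COUNT(*) ... WHERE id >= start AND id <= end
        Set<Long> idsBetween(long start, long end);  // SELECT id ... WHERE id >= start AND id <= end
    }

    /** In-memory IdSource used only to make the sketch runnable. */
    static IdSource inMemory(Collection<Long> ids) {
        TreeSet<Long> sorted = new TreeSet<>(ids);
        return new IdSource() {
            @Override
            public long countBetween(long start, long end) {
                return sorted.subSet(start, true, end, true).size();
            }

            @Override
            public Set<Long> idsBetween(long start, long end) {
                return sorted.subSet(start, true, end, true);
            }
        };
    }

    /**
     * Returns the ids missing from the range [start, end], which was once fully filled.
     * Ranges of at most loadThreshold ids are loaded and diffed with a Set instead of being split.
     */
    static List<Long> findDeleted(IdSource source, long start, long end, long loadThreshold) {
        List<Long> deleted = new ArrayList<>();
        collect(source, start, end, loadThreshold, deleted);
        return deleted;
    }

    private static void collect(IdSource source, long start, long end,
                                long loadThreshold, List<Long> out) {
        long expected = (end - start) + 1;
        if (source.countBetween(start, end) == expected) {
            return; // nothing in this range was deleted
        }
        if (expected <= loadThreshold) {
            Set<Long> present = source.idsBetween(start, end);
            for (long id = start; id <= end; id++) {
                if (!present.contains(id)) {
                    out.add(id);
                }
            }
            return;
        }
        long mid = start + (end - start) / 2;
        collect(source, start, mid, loadThreshold, out);   // left half
        collect(source, mid + 1, end, loadThreshold, out); // right half
    }
}
```

Each range that still has its full count is dismissed with a single COUNT query, so the number of queries grows with the number of deleted ids rather than with the size of the table.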
Make a history table for the table that needs to be synchronized (basically a duplicate of that table, with a few extra fields perhaps) and insert the entire row every time something is inserted/updated/deleted in the active table.
Write a Spring batch job to sync the data to Slave machine based on the history table's extra fields
hope this helps..
A potential option for allowing deletes within your current workflow:
In the case that the trigger restriction is limited to triggers with references across databases, a possible solution within your current workflow would be to create a helper table in your Master database to store only the unique identifiers of the deleted rows (or whatever unique key would enable you to most efficiently delete your deleted rows).
Those ids would need to be inserted by a trigger on your master table on delete.
Using the same mechanism as your insert/updates, create a task following your inserts and updates. You could export your helper table to xml, as you noted in your current workflow.
This task would simply delete the rows out of the slave table, then delete all data from your helper table following completion of the task. Log any errors from the task so that you can troubleshoot this since there is no audit trail.
If your database has a transaction dump log, just ship that one.
It is possible with MySQL and should be possible with PostgreSQL.
I would agree with another comment: this requires the usage of triggers. I think another table should hold the history of your SQL statements. See this answer about using 2008 extended events... Then you can get the entire SQL and store the resulting query in the history table. It's up to you whether you want to store it as a MySQL query or an MSSQL query.
Here's my take. Do you really need to deal with this? I assume that the slave is for reporting purposes. So the question I would ask is how up to date should it be? Is it ok if the data is one day old? Do you plan a nightly refresh?
If so, forget about this online sync process: download the full tables, ship them to the MySQL server, and batch load them. Processing time might be a lot quicker than you think.

Keeping search result consistent across multiple transactions

I have to implement a requirement for a Java CRUD application where users want to keep their search results intact even if they do actions which affects the criteria by which the returned rows are matched.
Confused? OK, let me give you a familiar example. In Gmail, if you do an advanced search on unread emails, you are presented with a list of matching results. Click on an entry and then go back to the search list. You have just read that entry, but it hasn't disappeared from the original result set; only that line has changed from bold to normal.
I need to implement the exact same behaviour but the application is designed in such a way that any transaction is persisted first and then the UI requeries the db to keep in sync. The complexity of the application and the size of the database prevents me from doing just a simple in memory caching of the matching rows and making the changes both in db and in memory.
I'm thinking of solving the problem on the database level by creating an intermediate table in the Oracle database holding pointers to matching records and requerying only those records to keep the UI in sync with data. Any Ideas?
In Oracle, if you open a cursor, the results of that cursor are static, regardless if another transaction inserts a row that would appear in your cursor, or updates or deletes a row that does exist in your cursor.
The challenge then is to not close the cursor if you want results consistent from when the cursor was opened.
If the UI maintains a single session on the database, one solution is to use Global Temporary Tables in Oracle. When you execute a search, insert the unique IDs into the GTT, then the UI just queries the GTT.
If the UI doesn't keep the session open, you could do the same thing but with an ordinary table. Then, of course, you'd just have to add some cleanup code to remove old search results from the table.
You can use a flashback query to read data from the past. For example: select * from employee as of timestamp to_timestamp('01-MAY-2011 070000', 'DD-MON-YYYY HH24MISS');
Oracle only stores this historical information for a limited period of time. You'll need to look into your retention settings: the UNDO_RETENTION parameter, the UNDO tablespace retention guarantee and proper sizing. LOBs also have their own retention setting.
Create two connections to the database.
Set the first one to READ ONLY (using SET TRANSACTION READ ONLY) and do your searching from that connection, but make sure you never end that transaction by issuing a commit or rollback.
As a read only transaction only sees the data as it was at the time the transaction started, the first connection will never see any changes to the database - not even committed ones.
Then you can do your updates in the second connection without affecting the results in the first connection.
If you cannot use two connections, you could implement the updates through stored procedures that use autonomous transactions, then you can keep the read only transaction open in the single connection you have.
