I have a Java batch process that publishes messages to MQ. An MDB associated with the queue processes each message. Each message contains 10 records. I need to update a database table to keep track of the records processed, both successes and failures. There will be only one row in the table for each batch run. The problem is that multiple MDB instances try to update that row at the same time, so we are facing concurrency issues. We tried row-level locking as well, but the issue still exists.
I am looking for a solution where I can keep track of the counter on the Java side and then do a single update after reaching a certain threshold. Let's say 500 messages were published and each message carries 10 records. The MDB should update this counter after processing all records within a message. The counter will then spawn a thread (if the threshold is met) that updates the database.
Please let me know what options are available to me.
App Server - WAS 5.6, DB2 9.1 on Z/OS. Access to DB2 is through SP.
Thanks!
Have you tried doing the update entirely on the DB server? For example:
UPDATE COUNT_TABLE SET COUNTER = COUNTER + 1 WHERE ...
The DB server should be able to manage concurrent update statements like this.
The simplest solution would be to have only a single MDB instance running and your concurrency problem disappears.
This would be a little slower, since you will be doing DB updates of 10 at a time instead of your proposed 500, but unless this is a problem, I would just keep it simple.
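If a single MDB instance turns out to be too slow, the in-memory counter described in the question could be sketched roughly as follows. This is only a sketch under assumptions: `BatchCounter`, `recordsProcessed` and the `flusher` callback are hypothetical names, the callback stands in for the stored procedure call that adds a delta to the row (in the spirit of `COUNTER = COUNTER + ?`), the flush runs on the calling thread rather than a spawned one, and it needs Java 8 or later. It only coordinates MDB instances inside one JVM.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/** Accumulates processed-record counts in memory and flushes them in batches. */
public class BatchCounter {
    private final AtomicLong pending = new AtomicLong();
    private final long threshold;
    private final LongConsumer flusher; // hypothetical: would call the stored procedure

    public BatchCounter(long threshold, LongConsumer flusher) {
        this.threshold = threshold;
        this.flusher = flusher;
    }

    /** Called by an MDB after it has processed all records of one message. */
    public void recordsProcessed(long count) {
        if (pending.addAndGet(count) >= threshold) {
            flush();
        }
    }

    /** Hands the accumulated delta to the flusher; also call once at the end of the batch. */
    public void flush() {
        // getAndSet(0) makes sure each counted record is handed over exactly once,
        // even if several threads cross the threshold at the same moment.
        long delta = pending.getAndSet(0);
        if (delta > 0) {
            flusher.accept(delta);
        }
    }
}
```

With a threshold of 100 and messages of 10 records, the database would be touched once per 10 messages; a final `flush()` at the end of the batch run writes whatever is left below the threshold.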
I have a web application that sends an email on certain action. So, for that, it saves the request in database, and then one email worker thread picks up the request and sends the email. Email worker watches the database for changes.
Now I want to have the same web application running behind load balancer sharing the same database. Now the problem is when I'll create an email request in database, there is a possibility that the email worker running inside the similar web application on different machine behind load balancer might see the database entry of email at the same time and this will result in same email being sent multiple times.
So, is there any way to prevent this situation other than explicitly locking a table?
I've read the question "Distributing java threads over multiple servers?" but don't know whether the solutions provided there will meet my needs. Terracotta seems to be a solution, but I think it would need explicit synchronization to be added to the code; I'm not sure.
Any knowledge on this will be helpful.
A simple, low-tech solution would be to have the email sending worker running just once.
This can be achieved either by extracting the worker to a separate application, which is triggered for example by cron, or by making it configurable whether an instance of your web application has the email worker activated. In the latter case it is up to you to make sure that only one of the load-balanced instances has the email worker active.
The downside would be that E-Mail sending would not be redundant, i.e. if the host on which email sending is active is down, no one is sending any emails.
If you need redundancy you'd have to rely on some distributed middleware, as in the question you linked above, or implement a synchronization mechanism yourself.
For your own implementation you could look at optimistic locking using a version number field/column on the email request, as supported by Hibernate/JPA (see https://stackoverflow.com/a/19456821/981913). The algorithm would be something along the following lines:
1. Worker wakes up and finds an email request in the DB with version column = 0.
2. Worker updates the request with version = 1. The update only succeeds if the version was still 0, meaning no other worker updated it to 1 since the request was read at step 1.
3. If the update from step 2 was successful, the worker can safely assume no other worker will process this request, and go ahead and send the email.
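The claim step can be illustrated with a small in-memory stand-in. This is a sketch, not the Hibernate/JPA mechanism itself: `EmailClaimSketch` and the table name `email_request` in the comment are hypothetical, and the map's conditional `replace` plays the role of a conditional `UPDATE` whose affected-row count tells the worker whether it won.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class EmailClaimSketch {
    // request id -> version; stands in for the version column of the request table
    private final ConcurrentMap<Long, Integer> versions = new ConcurrentHashMap<>();

    /** A new email request starts with version 0. */
    public void addRequest(long id) {
        versions.put(id, 0);
    }

    /**
     * Roughly analogous to running
     *   UPDATE email_request SET version = 1 WHERE id = ? AND version = 0
     * and checking that exactly one row was updated.
     */
    public boolean tryClaim(long id) {
        // Succeeds only if the stored version is still 0; atomic per key.
        return versions.replace(id, 0, 1);
    }
}
```

A worker would send the email only when `tryClaim` returns `true`; every other worker that read the same request gets `false` and moves on.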
Use SQL transactions to lock a row for processing.
Your email row should have 2 columns added, both of TIMESTAMP type and defaulting to NULL: processed_time and send_time.
1. Begin transaction.
2. Select 1 item where processed_time is NULL and send_time is NULL, using the highest level of serialization for that database (see something like In SQL Server, how can I lock a single row in a way similar to Oracle's "SELECT FOR UPDATE WAIT"?). Every DB/ORM has a way of doing this; Postgres uses SELECT FOR UPDATE, etc.
3. Update the row and set processed_time to NOW().
4. Commit (but don't close the transaction; if the email fails you may need to handle that or roll back).
4a. At this point you have claimed that row for yourself, so another thread would not get that row at step 2. This needs to be done as soon as possible.
5. Send the email.
6. Update the row and set send_time to NOW().
7. End transaction.
You may need to fine tune this process based on DB/ORM you are using but the general idea holds.
Good time guys!
We have a pretty straightforward application adapter: once every 30 seconds it reads records from the database of one system (it can't write to it), converts each of these records into an internal format, performs filtering, enrichment, ..., and, finally, transforms the resulting entities into an XML format and sends them via JMS to another system. Nothing new.
Let's add some spice here: records in the database are sequential (that is, their identifiers are generated by a sequence), and when it is time to read a new bunch of records, we get the last-processed sequence number, which is stored in our internal database and updated each time the next record is processed (sent to JMS), and start reading from that record (+1).
The problem is that our customers gave us an NFR: processing of a bunch of read records must not last longer than 30 seconds. Since there are a lot of steps in the workflow (some of them pretty long-running), the number of records can be pretty big, and we process them one by one, it can take more than 30 seconds.
Because of all the above I want to ask 2 questions:
1) Is there an approach to parallel processing of sequential data, maybe with one or several intermediate storages, or the Disruptor pattern, or something CQRS-like, or notification-based, or ... that makes it possible to work in such a system?
2) A general one. I need to store a last-processed number and send an entity to JMS. If I save the number to the database and then some problem arises with JMS, on application restart my adapter will think that it successfully sent the entity, which is not true, and it will never be received. If I send the entity and after that try to save the number to the database and get an exception, on application restart a reprocessing will be performed, which will lead to duplicates in JMS. I'm not sure whether XA transactions will help here, or some kind of last resource gambit...
Could somebody, please, share experience or ideas?
Thanks in advance!
1) 30 seconds is a long time and you can do a lot in that time esp with more than one CPU. Without specifics I can only say it is likely you can make it faster if you profile it and use more CPUs.
2) You can update the database before you send and listen to the JMS queue yourself to see it was received by the broker.
Dimitry - I don't know the details of your problem, so I'm just going to make a set of assumptions. I hope it will at least trigger an idea that leads to the solution.
Here goes:
Grab your list of items to process.
Store the last id (and maybe the starting id)
Process each item on a different thread (suggest using Tasks).
Record any failed item in a local failed queue.
When you grab the next bunch, ensure you process the failed queue first.
Have a way of determining a max number of retries and a way of moving/marking it as permanently failed.
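The steps above can be sketched in Java roughly as follows. Everything here is an assumption made for illustration: the class name `RetryingBatchProcessor`, the fixed pool of 4 threads, and the `LongPredicate` that stands in for the real per-item processing and returns `true` on success.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.LongPredicate;

public class RetryingBatchProcessor {
    private final int maxAttempts;
    private final LongPredicate processor; // hypothetical per-item work, true = success
    private final Map<Long, Integer> attempts = new ConcurrentHashMap<>();
    private final Queue<Long> failedQueue = new ConcurrentLinkedQueue<>();
    private final Set<Long> permanentlyFailed = ConcurrentHashMap.newKeySet();

    public RetryingBatchProcessor(int maxAttempts, LongPredicate processor) {
        this.maxAttempts = maxAttempts;
        this.processor = processor;
    }

    /** Processes previously failed items first, then the new ones, each as its own task. */
    public void processBunch(List<Long> newIds) {
        List<Long> work = new ArrayList<>();
        Long retry;
        while ((retry = failedQueue.poll()) != null) {
            work.add(retry);
        }
        work.addAll(newIds);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (Long id : work) {
                tasks.add(() -> { handle(id); return null; });
            }
            pool.invokeAll(tasks); // blocks until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
    }

    private void handle(long id) {
        boolean ok;
        try {
            ok = processor.test(id);
        } catch (RuntimeException e) {
            ok = false;
        }
        if (ok) {
            attempts.remove(id);
            return;
        }
        int tries = attempts.merge(id, 1, Integer::sum);
        if (tries >= maxAttempts) {
            permanentlyFailed.add(id); // give up on this item
        } else {
            failedQueue.add(id);       // retried at the start of the next bunch
        }
    }

    public List<Long> failedQueueSnapshot() { return new ArrayList<>(failedQueue); }

    public Set<Long> permanentlyFailed() { return permanentlyFailed; }
}
```

The failed queue here lives only in memory, so it would be lost on restart; a persistent queue or table would be needed if that matters.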
Not sure if that was what you were after. NServiceBus has a retry process where the gap between each retry gets longer up to a point, then it is marked as failed.
Folks, finally we ended up with the following solution. We implemented a kind of the Actor Model. The idea is the following.
There are two main (internal) database tables in our application. Let's call them READ_DATA_INFO, which contains the last-read-record number of the 'source' external system, and DUMPED_DATA, which stores metadata about each record read from the source system. This is how it all works: every n seconds (a configurable property) a service bus reads the last processed identifier of the source system and sends a request to the source system to get new records from it. If there are new records, they are wrapped in a DumpRecordBunchMessage message and sent to a DumpActor class. This class begins a transaction which comprises two operations: update the last-read-record number (the READ_DATA_INFO table) and save metadata about each record (the DUMPED_DATA table). Each dumped record gets the 'NEW' status. When a record is successfully processed, it gets the 'COMPLETED' status; otherwise it gets the 'FAILED' status. If the transaction commits successfully, each of those records is wrapped in a RecordMessage message class and sent to the next processing actor; otherwise those records are just skipped and will be reread after the next n seconds.
There are three interesting points:
Disaster recovery of the application. What if our application is stopped somehow in the middle of processing? No problem: at application startup (in a @PostConstruct-marked method) we find all the records with the 'NEW' status in the DUMPED_DATA table and, with the help of the stored metadata, restore them from the source system.
Parallel processing. After all records are successfully dumped, they become independent, which means that they can be processed in parallel. We introduced several mechanisms for parallelism and load balancing. The simplest one is a round-robin approach. Each processing actor consists of a parent actor (load balancer) and a configurable set of its child actors (workers). When a new message arrives in the parent actor's queue, it dispatches it to the next worker.
Duplicate record prevention. This is the most interesting one. Let's assume that we read data every 5 seconds. If there is an actor with a long-running operation, it is possible to have several attempts to read from the source system's database starting from the same last-read-record number. Thus there would potentially be a lot of duplicate records dumped and processed. In order to prevent this we added a CAS-like check of DumpActor's messages: if the last-read-record number from a message is equal to the one from the DUMPED_DATA table, this message should be processed (no messages were processed before it); otherwise this message is rejected. Rather simple, but powerful.
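The CAS-like check can be pictured with an `AtomicLong` standing in for the stored last-read-record number. This is only an in-memory illustration with hypothetical names (`DumpGate`, `tryAccept`); in the application described above the comparison and the update happen inside the DumpActor's database transaction.

```java
import java.util.concurrent.atomic.AtomicLong;

public class DumpGate {
    // Stands in for the persisted last-read-record number.
    private final AtomicLong lastReadRecord;

    public DumpGate(long initialLastReadRecord) {
        this.lastReadRecord = new AtomicLong(initialLastReadRecord);
    }

    /**
     * @param expectedLast the last-read-record number the message was built from
     * @param newLast      the number of the last record contained in the message
     * @return true if the message is accepted, false if it is a stale duplicate
     */
    public boolean tryAccept(long expectedLast, long newLast) {
        // Accept only if nobody has advanced the number since the message was created.
        return lastReadRecord.compareAndSet(expectedLast, newLast);
    }

    public long lastReadRecord() {
        return lastReadRecord.get();
    }
}
```

Two messages built from the same starting number compete for the same transition; the first one advances the number, and the second one no longer matches and is rejected.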
I hope this overview will help somebody. Have a good time!
Can you help me with the following problems:
A. We have a table on which read and write operations happen simultaneously. Writes happen very frequently, so reads are very slow; sometimes my web application does not come up because of the heavy write load on this table. How could I handle such a scenario? Writes come from a different Java application while reads come from our web application, so the web application becomes very slow. Any ideas?
B. Writes to this table come from 200 threads. These threads take connections from a connection pool and write into the table, and this application runs 24/7. Could thread priority be the issue that is stopping read operations from the web application?
C. Can we have master-master replication for that table only, so that writes happen in one table and reads happen in the other, and every two minutes data migrates from one table to the other?
Please advise.
Thanks in advance.
Check connection pool size - maybe it's too small and your threads waste time waiting for connection from pool.
Check your database settings; if you are just running it with out-of-the-box parameters there may be plenty of room for improvement.
You probably need some kind of event-driven system: when a vehicle sends data, the DB is not updated directly, but a message is added to some queue (e.g. JMS). Your app then caches data on startup, and updates both the cache and the database upon receiving this message. The key thing is that the only component that interacts with the DB is your app, and data changes only when you receive an event, so you don't need to query the DB to read the data, plus you may do updates in the background using only a few threads, etc. There are quite good open-source messaging systems (e.g. Apache ActiveMQ) and caching libraries (e.g. Ehcache), so you can build a reasonably performant and fault-tolerant system without too much effort.
I guess introducing messaging would be a serious reengineering, so to solve your immediate problem replication might be the best solution: merge data from the updatable table into another one every 2 minutes, and have the tracker read that other table. This obviously works well if you only read the data in the web app and do not update it; otherwise you need to put a lot of effort into keeping the 2 tables in sync. A variation of that is batching: data from vehicles is inserted into an intermediate table, and then every 2 minutes transferred into the main table from which the reader queries it; the intermediate table is cleaned after the transfer.
The one true way to solve this is to use a queue of write events and to stop the writing periodically so that the reader has a chance.
Create a queue for incoming write updates
Create an AtomicXXX (see java.util.concurrent.atomic) to use as a lock
Create a thread pool to read from the queue and execute the updates when the lock is unset
Use a timer (for example java.util.Timer; javax.swing.Timer also works but fires on the Swing event dispatch thread) to periodically set the lock and read the table data.
Before trying anything too complicated try this perhaps:
1) Don't use Thread priorities, they are rarely what you want.
2) Set up your own priority scheme, perhaps simply by having a (priority) queue for both reads and writes where reads are prioritized. That is: add read and write requests to a single queue and have them block or be notified of the result.
3) check your database features to optimize write heavy tables
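Point 2 could be sketched with a `PriorityBlockingQueue` in which reads sort ahead of writes. The names `PrioritizedDbQueue`, `Kind` and `Request` are hypothetical, and the sketch only shows the ordering; the worker threads that take requests from the queue and run them against the database are left out.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class PrioritizedDbQueue {
    /** READ is declared first, so it compares lower and is served first. */
    public enum Kind { READ, WRITE }

    public static final class Request implements Comparable<Request> {
        public final Kind kind;
        public final String description;
        private final long sequence; // keeps FIFO order within the same kind

        Request(Kind kind, String description, long sequence) {
            this.kind = kind;
            this.description = description;
            this.sequence = sequence;
        }

        @Override
        public int compareTo(Request other) {
            int byKind = kind.compareTo(other.kind);
            return byKind != 0 ? byKind : Long.compare(sequence, other.sequence);
        }
    }

    private final AtomicLong counter = new AtomicLong();
    private final PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();

    public void submit(Kind kind, String description) {
        queue.add(new Request(kind, description, counter.getAndIncrement()));
    }

    /** Returns the next request to execute, or null if the queue is empty. */
    public Request next() {
        return queue.poll();
    }
}
```

One design caveat: with strict prioritization a steady stream of reads can starve the writes, so some form of aging or a cap on consecutive reads may be needed.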
I am using Informix DB. This question may not be tied to one specific database. But I want to know how I can in Java, continuously probe into a Database and check if a certain row has been added to a table in the DB. Basically, the flow is:
My Java application should use JDBC to check if a certain table is populated.
If no, it should wait until a row has been inserted.
My question is how I can make Java aware of a row insertion. I am not expecting to add any triggers or anything, but want to be able to check in pure Java that the row has been added.
Some thoughts that come to my mind are continuously call DB for the row, or periodically (every half-hour or so) call DB and check if the row is available. But what I am looking for is something like a Listener which can do this.
There is no facility in the Informix DBMS to signal when a particular row arrives in a table.
Well, I say that, but there is the DB-Cron facility which can periodically execute tasks (inside the server), and you could conceivably schedule a task to poll for the data to see if it has arrived and to send a message (somehow) to indicate that it has. It would be non-trivial, especially the part that indicates that it has arrived.
The JDBC protocol (and SQL protocols generally) are essentially synchronous; the client sends a request and waits for an answer from the DBMS.
So, pragmatically, if your delay period is half an hour, you can either create an admin task to handle the processing (you could write a Java UDR to be executed in the server by the server if that's crucial to you), or you can arrange for the Java (client-side) program to poll periodically to find out whether the information you need is there. A half-hour delay is not going to stress anything, even with a moderate number of processes polling for separate values (or even the same value). On the other hand, you normally try to avoid polling when you can. You'll need to strike a balance between responsiveness to the special data arriving and general system responsiveness. On the whole, general system responsiveness is more important, so keep the polling interval as large as you can.
If your polling interval needed to be sub-second, then the balance would be different - the job would be a lot harder.
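Client-side polling can be as small as the loop below. It is a sketch with assumed names: `RowPoller` and `waitForRow` are hypothetical, and the `BooleanSupplier` stands in for a JDBC query that checks whether the row exists, so the loop itself stays independent of the database.

```java
import java.util.function.BooleanSupplier;

public class RowPoller {
    /**
     * Calls rowExists until it returns true or maxAttempts is reached.
     *
     * @return the attempt number on which the row was found, or -1 if it never appeared
     *         (or the thread was interrupted while waiting)
     */
    public static int waitForRow(BooleanSupplier rowExists, long intervalMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (rowExists.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis); // keep this as large as responsiveness allows
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

For the half-hour interval discussed above, `intervalMillis` would be 1,800,000; the supplier would typically run a cheap existence query over JDBC and return whether it produced a row.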
We have a JMS queue of job statuses, and two identical processes pulling from the queue to persist the statuses via JDBC. When a job status is pulled from the queue, the database is checked to see if there is already a row for the job. If so, the existing row is updated with new status. If not, a row is created for this initial status.
What we are seeing is that a small percentage of new jobs are being added to the database twice. We are pretty sure this is because the job's initial status is quickly followed by a status update - one process gets one, another process the other. Both processes check to see if the job is new, and since it has not been recorded yet, both create a record for it.
So, my question is, how would you go about preventing this in a vendor-neutral way? Can it be done without locking the entire table?
EDIT: For those saying the "architecture" is unsound - I agree, but am not at liberty to change it.
Create a unique constraint on JOB_ID, and retry to persist the status in the event of a constraint violation exception.
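A rough sketch of that retry logic is shown below. The `JobStore` interface and the class names are hypothetical; a real implementation would wrap JDBC `UPDATE` and `INSERT` statements, and the in-memory store is included only to make the example self-contained. It also assumes the driver reports the duplicate key as `SQLIntegrityConstraintViolationException`, which not every driver does (some only set a vendor code or SQLState on a plain `SQLException`).

```java
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class JobStatusWriter {
    /** Hypothetical persistence interface; a real one would issue JDBC statements. */
    public interface JobStore {
        /** @return number of rows updated (0 if the job has no row yet) */
        int update(String jobId, String status) throws SQLException;

        /** Fails with a constraint violation if a row for jobId already exists. */
        void insert(String jobId, String status) throws SQLException;
    }

    public static void persist(JobStore store, String jobId, String status) throws SQLException {
        if (store.update(jobId, status) > 0) {
            return; // row already existed
        }
        try {
            store.insert(jobId, status);
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            // Another consumer inserted the row between our update and insert: retry as update.
            store.update(jobId, status);
        }
    }

    /** In-memory stand-in for a table with a unique constraint on JOB_ID. */
    public static final class InMemoryJobStore implements JobStore {
        public final Map<String, String> rows = new ConcurrentHashMap<>();

        @Override
        public int update(String jobId, String status) {
            return rows.replace(jobId, status) != null ? 1 : 0;
        }

        @Override
        public void insert(String jobId, String status) throws SQLException {
            if (rows.putIfAbsent(jobId, status) != null) {
                throw new SQLIntegrityConstraintViolationException("duplicate JOB_ID " + jobId);
            }
        }
    }
}
```

This removes the duplicate rows but does not by itself address the ordering of status messages between the two consumers.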
That being said, I think your architecture is unsound: If two processes are pulling messages from the queue, it is not guaranteed they will write them to the database in queue order: one consumer might be a bit slower, a packet might be dropped, ..., causing the other consumer to persist the later messages first, causing them to be overridden with the earlier state.
One way to guard against that is to include sequence numbers in the messages, update the row only if the sequence number is as expected, and delay the update otherwise (this is vulnerable to lost messages, though ...).
Of course, the easiest way would be to have only one consumer ...
JDBC connections are not thread safe, so there's nothing to be done about that.
"...two identical processes pulling from the queue to persist the statuses via JDBC..."
I don't understand this at all. Why two identical processes? Wouldn't it be better to have a pool of message queue listeners, each of which would handle messages landing on the queue? Each listener would have its own thread; each one would be its own transaction. A Java EE app server allows you to configure the size of the message listener pool to match the load.
I think a design that duplicates a process like this is asking for trouble.
You could also change the isolation level on the JDBC connection. If you make it SERIALIZABLE you'll ensure ACID at the price of slower performance.
Since it's an asynchronous process, performance will only be an issue if you find that the listeners can't keep up with the messages landing on the queue. If that's the case, you can try increasing the size of the listener pool until you have adequate capacity to process the incoming messages.