Spring Batch Multithreading stuck without any exception output - java

I am implementing Spring Batch multithreading for a daily process. The item reader, item processor and item writer are all singleton beans. I am also using Hibernate and Spring Data JPA for DB access, and a ThreadPoolTaskExecutor for the threads.
The threads hang and I don't know the root cause. In the log, Hibernate gets stuck at either a select or an insert statement and just hangs there forever. For transactions I use REQUIRES_NEW propagation and READ_COMMITTED isolation. Everything else is the Spring Boot default.
I am processing about 20k JSON documents; the sizes differ, and some are big. I can't share the whole code because I don't know where the problem is. Basically, in the item processor I have a few synchronized blocks to handle the business logic.
For this question, I just want to know what the possible reasons could be, because Java doesn't give any information.

Without code it is hard to say what the problem might be. You need to dig down to exactly where it is getting stuck. Possible reasons:
- The select query in the reader is taking a long time (maybe due to improper indexing, or the table might be locked). Add more logs in the reader to check exactly which record(s) cause this.
- If you are doing any DB operations in the ItemProcessor, add logs there and check (see the sketch below for one way to do this).
- In the writer, the insert might be taking a long time to complete.
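One way to add that per-item logging is an ItemProcessListener registered on the step. A minimal sketch, assuming placeholder item types MyInput/MyOutput with a getId() method (taking a thread dump with jstack is also worthwhile, since it shows which lock or query each pool thread is waiting on):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.batch.core.ItemProcessListener;

    public class LoggingItemListener implements ItemProcessListener<MyInput, MyOutput> {

        private static final Logger log = LoggerFactory.getLogger(LoggingItemListener.class);

        @Override
        public void beforeProcess(MyInput item) {
            // Log the item and the current thread before processing starts; the last
            // "start" entry without a matching "finished" entry points at the hang.
            log.info("[{}] start processing {}", Thread.currentThread().getName(), item.getId());
        }

        @Override
        public void afterProcess(MyInput item, MyOutput result) {
            log.info("[{}] finished {}", Thread.currentThread().getName(), item.getId());
        }

        @Override
        public void onProcessError(MyInput item, Exception e) {
            log.error("[{}] failed {}", Thread.currentThread().getName(), item.getId(), e);
        }
    }

Register it on the step with .listener(new LoggingItemListener()) so every item logs before and after processing.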

Related

How does Spring JPA with hibernate manage concurrent updates

I am trying to write an app that can run active-active, with both instances using the same DB. I was worried that when updating an entity (calling repo.findById(), entity.setX(), repo.save(entity)) there was a potential race condition where updates could overwrite each other if there was a large enough delay between find and save.
To test it I made a method that loads the entity, waits 10 seconds, then adds something to a list attribute and saves it. Calling that twice, I expected the second save to overwrite the first one, but surprisingly both updates were persisted.
Whilst this is what I wanted, I was wondering if anyone knew how/why Spring did this, because I want to make sure it will do this every time. I understand it uses optimistic locking by default, but you need the @Version annotation (which I don't have). Will this also work if the updates come from separate apps, or did it only work because both of them were from the same application?
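For context, JPA/Hibernate optimistic locking only applies when the entity declares a version attribute. A minimal sketch of what that looks like (the entity and its fields are invented for illustration):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;

    @Entity
    public class Account {

        @Id
        private Long id;

        private String name;

        // With @Version present, Hibernate adds "where version = ?" to updates and
        // throws an OptimisticLockException if another transaction changed the row in between.
        @Version
        private Long version;

        // getters/setters omitted
    }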

How to use Spring Batch to connect to server?

I need help from experienced Spring Batch programmers for a particular problem I am having.
I am new to Spring Batch. It has detailed documentation for each class or interface, but next to nothing on how it all works together.
The Scenario:
I have a database hosted in the cloud, so I have to make REST calls to get some data from it and to save related data to it.
- Data retrieval will return a JSON response with data I queried.
- Data save will return a JSON response on how many rows were added etc.
- All responses will have a valid HTTP status code.
- A transaction is complete when the save call succeeds with an HTTP code of 200 and the response shows how many records were inserted.
The connection may not always be available. In that case the program must keep retrying every 5 minutes until the whole task is complete.
What I chose not to do
I could do some dirty Java tricks (which were surprisingly recommended by many on Stack Overflow):
- Threads and sleep (too crude)
- Spring's @Scheduled (the scheduler keeps running even after job completion)
What I tried
So I decided to use Spring Batch since it seemed to be a framework made for this.
- I have no file tasks, so I used a Tasklet instead of readers and writers.
- The Tasklet interface can only return a FINISHED status; there is no code for FAILURE.
- So, inside the tasklet, I set a custom value in the StepContext, retrieved that value in a StepExecutionListener, and set the ExitStatus of the step to FAILED accordingly (a condensed sketch of this workaround follows below).
- To handle this workaround I had to configure a JobExecutionListener to make the job fail accordingly.
Apart from all these work-arounds:
- Spring Batch does not do any scheduling of its own, so I end up having to use another scheduler.
- Spring Batch's retry within a step is valid only for ItemReader, ItemWriter etc., not for tasklets.
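A condensed sketch of the workaround described above (here the tasklet doubles as its own StepExecutionListener and keeps a flag in a field instead of going through the StepContext; class and method names are made up):

    import org.springframework.batch.core.ExitStatus;
    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.StepExecutionListener;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;

    public class RestCallTasklet implements Tasklet, StepExecutionListener {

        private boolean callSucceeded;

        @Override
        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
            // Make the REST call and remember the outcome; RepeatStatus itself
            // only offers FINISHED / CONTINUABLE, so the failure is signalled later.
            callSucceeded = doRestCall();
            return RepeatStatus.FINISHED;
        }

        @Override
        public void beforeStep(StepExecution stepExecution) {
            // nothing to prepare
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            // Translate the recorded outcome into the step's ExitStatus so a
            // JobExecutionListener (or the flow definition) can fail the job on it.
            return callSucceeded ? ExitStatus.COMPLETED : ExitStatus.FAILED;
        }

        private boolean doRestCall() {
            // placeholder for the actual HTTP call and status-code check
            return true;
        }
    }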
The Question
Is Spring Batch right for this situation?
Is my design correct? It seems very hacky.
I need help with the most efficient way to handle my scenario.
I was using Spring Batch for a similar case: as an execution engine to process large files, which resulted in lots of REST requests to other systems.
What Spring Batch brought me:
- An execution engine/model for large dependent operations. In other words, I could keep my input as one single entry point and have a 'huge' transaction on top of the other small operations.
- The possibility to see execution results and monitor them.
- Retriability/restartability of batch operations. This is one of the best things in Spring Batch: it allows you to design your operation so that if something goes wrong in the middle of execution, you can simply restart it and continue from the failing point. But you need to invest some effort to maintain this (a rough configuration example follows below).
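As a rough, non-authoritative example of what that can look like with the Java config builders (the bean names, the String item type and the retried exception are assumptions, not taken from the question):

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.web.client.ResourceAccessException;

    @Configuration
    public class SyncStepConfig {

        // A chunk-oriented step: progress is committed per chunk, so a failed job
        // can be restarted and will continue from the last committed chunk.
        @Bean
        public Step syncStep(StepBuilderFactory steps,
                             ItemReader<String> reader,            // item type is a placeholder
                             ItemProcessor<String, String> processor,
                             ItemWriter<String> writer) {
            return steps.get("syncStep")
                    .<String, String>chunk(100)
                    .reader(reader)
                    .processor(processor)
                    .writer(writer)
                    .faultTolerant()
                    .retry(ResourceAccessException.class)          // e.g. the remote endpoint is briefly unreachable
                    .retryLimit(3)
                    .build();
        }
    }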
More on business cases here: https://docs.spring.io/spring-batch/trunk/reference/html/spring-batch-intro.html#springBatchUsageScenarios
So check those business cases carefully and ask yourself whether you really need them.
From what you have described so far, I really don't see a benefit of Spring Batch for you.

Under what conditions will a LockAcquisitionException with SQLCODE=-911, SQLSTATE=40001, SQLERRMC=68 occur?

My understanding is that a LockAcquisitionException happens when a thread tries to update a row that is locked by another thread. (Please correct me if I am wrong.)
So I tried to simulate it as follows:
I locked a row using DbVisualizer, then used my application to run an update query on the same record. In the end I just hit a global transaction timeout instead of a LockAcquisitionException with reason code 68.
So I am thinking that my understanding is wrong and that a LockAcquisitionException does not happen this way. Can you advise, or give a simple example that creates a LockAcquisitionException?
You will get LockAcquisitionException (SQLCODE=-911 SQLERRMC=68) as a result of a lock timeout.
It may be unhelpful to compare the actions of DbVisualizer with Hibernate, because they may use different classes/methods and settings at the JDBC level, which can influence the exception details. What matters is that at the Db2 level both experienced SQLCODE=-911 with SQLERRMC=68, regardless of the exception name they report for the lock timeout.
You can get a lock-timeout on statements like UPDATE or DELETE or INSERT or SELECT (and others including DDL and commands), depending on many factors.
All lock-timeouts have one thing in common: one transaction waited too long and got rolled-back because another transaction did not commit quickly enough.
Lock-timeout diagnosis and Lock-Timeout avoidance are different topics.
The length of time to wait for a lock can be set at the database level, the connection level, or the statement level, depending on the chosen design (these can also be mixed). You can also adjust how Db2 behaves for locking through database parameters such as CUR_COMMIT and LOCKTIMEOUT, and by changing the isolation level at the statement or connection level.
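As a small illustration of the connection/statement-level option (not part of the original answer; the connection details are placeholders), the CURRENT LOCK TIMEOUT special register can be set from JDBC:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class LockTimeoutDemo {
        public static void main(String[] args) throws SQLException {
            // Connection URL and credentials are placeholders.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:db2://dbhost:50000/SAMPLE", "user", "password");
                 Statement stmt = conn.createStatement()) {
                // Wait at most 10 seconds for a lock on this connection instead of the
                // database-wide default; on expiry Db2 raises -911 with reason code 68.
                stmt.execute("SET CURRENT LOCK TIMEOUT 10");
                // ... run the UPDATE / SELECT that might contend for locks ...
            }
        }
    }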
It's wise to ensure accurate diagnosis before thinking about avoidance.
As you are running Db2-LUW v10.5.0.9, consider careful study of this page and all related links:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.trb.doc/doc/t0055072.html
There are many situations that can lead to a lock timeout, so it's better to know exactly which situation is relevant for your case(s).
Avoiding lock-conflicts is a matter of both configuration and transaction design so that is a bigger topic. The configuration can be at Db2 level or at application layer or both.
Sometimes bugs cause lock-timeouts, for example when app-server threads have a database-connection that is hung and has not committed and is not being cleaned up correctly by the application.
You should diagnose the participants in the lock timeout. There are different ways to do lock-conflict diagnosis on Db2-LUW so choose the one that works for you.
One simple diagnosis tool that still works on V10.5.0.9 is the Db2 registry variable DB2_CAPTURE_LOCKTIMEOUT=ON, even though the method is deprecated. You can set and unset this variable on the fly without needing any service outage. So if you have a recreatable scenario that results in SQLCODE=-911 SQLERRMC=68 (lock timeout), you can switch the variable on, repeat the test, then switch it off. While the variable is on, each lock timeout makes Db2 write a new text file containing information about the participants in the locking situation, showing details that help you understand what is happening and that let you consider ways to resolve the issue once you have enough facts. You don't want to keep this variable permanently set, because it can impact performance and fill up the Db2 diagnostics file system if you get a lot of lock timeouts, so be careful. Read about this variable in the Knowledge Center at this page:
https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.regvars.doc/doc/r0005657.html
You diagnose the lock-timeout by careful study of the contents of these files, although of course it's necessary to understand the details also. This is a regular DBA activity.
Another method is to use db2pdcfg -catch with a custom db2cos script, to decide what to do after Db2 throws the -911. This needs scripting skills and it lets you decide exactly what diagnostics to collect after the -911 and where to store those diagnostics.
Another method which involves much more work but potentially pays more dividends is to use an event monitor for locking. The documentation is at:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0054074.html
Be sure to study the "Related concepts" and "Related tasks" pages also.

Atomic read and delete in mongo

I am fairly new to mongo, so what I'm trying to achieve here might not be possible. My research so far is inconclusive...
My scenario is the following: I have an application which may have multiple instances running. These instances are processing some data, and when that processing fails, they write the ID of the failed item in a mongo collection ("error").
From time to time I want to retry processing those items. So, at fixed intervals, the application reads all the IDs from the collection, after which it deletes all the records. Now, this is an obvious race condition. Two instances may read the very same data, which would double the work to be done. Some IDs may also be missed like this.
My question would be the following: is there any way I can read and delete those records, in a distributed-atomic way? I was thinking about locking the collection, but for this I found no support so far in the java driver's documentation. I also tried to look for a findAndDrop() like method, but no luck so far.
I am aware of techniques like leader election, which most probably would solve this problem, but I wanted to see if it can be done in an easier way.
You could use a BlockingQueue with a multiple-producer / single-consumer approach: the multiple producers add the failed IDs and a single consumer takes and removes them.
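A minimal in-process sketch of that idea (note it only helps inside a single JVM; it does not coordinate separate application instances, and the class/method names are invented):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class FailedIdQueue {

        private final BlockingQueue<String> failedIds = new LinkedBlockingQueue<>();

        // Called by any producer thread when processing an item fails.
        public void reportFailure(String id) {
            failedIds.offer(id);
        }

        // Single consumer thread: take() blocks until an id is available,
        // so each id is handed to exactly one retry attempt.
        public void consumeLoop() throws InterruptedException {
            while (!Thread.currentThread().isInterrupted()) {
                String id = failedIds.take();
                retryProcessing(id);
            }
        }

        private void retryProcessing(String id) {
            // placeholder for the actual retry logic
        }
    }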
In the end, I found no way to implement this with Mongo.
However, since this is a Heroku app, I stored the IDs in a Redis collection instead. A library I found implements a distributed Redis lock for Jedis, so this workaround solved my problem.

Parallel updates to different entity properties

I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel, and I'm unsure how to go about solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
User creates entity
Task is started
User notices an error in his initial entry and quickly updates the entity
Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue for all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong)
Is using the low-level setProperty method the only way to update a single field of an entity and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking (should the user enable it) for transactions, which would prevent/reduce the chance of such things. Optimistic concurrency is equally applicable with relational datastores, so you likely know what it does.
Google's JDO plugin uses the low-level API setProperty() method obviously. The log even tells you what low level calls are made (in terms of PUT and GET). Moving to some other API will not on its own solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions; no queries without ancestors, only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (ie, lookup keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get a throughput of at least one completed transaction per second (maybe a bit less if you have XG transactions).
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact() (a rough sketch is below). Just make sure your work is idempotent.
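A rough sketch of such a unit of work with Objectify 4, assuming an invented MyEntity type with a calculated field (this is an illustration, not code from the answer):

    import static com.googlecode.objectify.ObjectifyService.ofy;

    import com.googlecode.objectify.Work;

    public class RecalculateService {

        // Entity type, field names and computeFrom() are made up for illustration.
        public MyEntity recalculate(final long entityId) {
            return ofy().transact(new Work<MyEntity>() {
                @Override
                public MyEntity run() {
                    // Load inside the transaction so the calculation sees current values;
                    // Objectify retries this block if a ConcurrentModificationException occurs.
                    MyEntity e = ofy().load().type(MyEntity.class).id(entityId).now();
                    e.setCalculated(computeFrom(e));
                    ofy().save().entity(e).now();
                    return e;
                }
            });
        }

        private String computeFrom(MyEntity e) {
            return "...";  // placeholder for the real calculation
        }
    }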
The way I see it, you can either prevent the first task from updating the object because certain values have changed since the task was first launched, or you can embed the object's values within the task request so that the second calc task restores the object state with consistent value and calculated members.
