Spring Batch Job hangs - concurrent steps and each step using multiple threads - java

I am using Spring Batch to process records from database tables in the following scenario:
Processing data from 5 tables concurrently using 5 parallel steps
Each parallel step uses a further 5 threads to process records from a single table
Here is a simplified summary of the job configuration: TestJob -> parallel Step1 & Step2 -> Step1 using 2 threads, Step2 using 2 threads
For the Spring Batch metadata tables I tried a SQL Server database and an HSQL in-memory database, but Spring Batch gets stuck when selecting from BATCH_STEP_EXECUTION_SEQ.
Spring Batch is trying to INSERT into the BATCH_STEP_EXECUTION table, so it first fetches an ID from BATCH_STEP_EXECUTION_SEQ, and that is where it hangs.
I am using Spring Boot 2.2.2.RELEASE. I tried overriding the JobRepository configuration with different isolation levels for create, but the problem always persists.
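For reference, here is a minimal sketch of the kind of configuration involved; the job/step names, placeholder item type, reader/writer beans, chunk size and thread counts are illustrative rather than my exact code:

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.core.job.builder.FlowBuilder;
    import org.springframework.batch.core.job.flow.Flow;
    import org.springframework.batch.core.job.flow.support.SimpleFlow;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.task.SimpleAsyncTaskExecutor;
    import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

    @Configuration
    @EnableBatchProcessing
    public class TestJobConfig {

        // Placeholder item type standing in for the rows being processed.
        public static class TableRow { }

        // Two steps run in parallel via a split flow; each step is itself multi-threaded.
        @Bean
        public Job testJob(JobBuilderFactory jobs, Step step1, Step step2) {
            Flow flow1 = new FlowBuilder<SimpleFlow>("flow1").start(step1).build();
            Flow flow2 = new FlowBuilder<SimpleFlow>("flow2").start(step2).build();

            Flow split = new FlowBuilder<SimpleFlow>("split")
                    .split(new SimpleAsyncTaskExecutor("split-"))
                    .add(flow1, flow2)
                    .build();

            return jobs.get("TestJob").start(split).end().build();
        }

        // One multi-threaded step; step2 is configured the same way against another table.
        @Bean
        public Step step1(StepBuilderFactory steps,
                          ItemReader<TableRow> reader,   // placeholder reader/writer beans
                          ItemWriter<TableRow> writer) {
            ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
            executor.setCorePoolSize(2);
            executor.initialize();

            return steps.get("step1")
                    .<TableRow, TableRow>chunk(100)
                    .reader(reader)
                    .writer(writer)
                    .taskExecutor(executor)
                    .throttleLimit(2)
                    .build();
        }
    }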
NOTE:
Everything is working as expected when:
Processing multiple tables concurrently, with each table processed by a single thread
Processing a single table at a time, with that table processed by multiple threads
Any help/pointer to fix the problem is highly appreciated.
Thanks,
Har Krishan

Just for the sake of others who may be facing the same issue: the problem seems to be with the configuration of the database-specific tables and sequences. I tried SQL Server and the issue persisted with the default provided database scripts. Then I tried the HSQL in-memory database and the issue still persisted. Then I tried the H2 in-memory database and it worked. It also works with MapJobRepositoryFactoryBean.
So you may need to tweak the DDL for your particular database.
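For example, a minimal sketch of pointing the Batch metadata at an embedded H2 database (assuming the H2 driver is on the classpath; the schema script path is the one shipped with spring-batch-core, and the bean name is just an example):

    import javax.sql.DataSource;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
    import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;

    @Configuration
    public class BatchMetadataConfig {

        // In-memory H2 database holding only the Spring Batch metadata tables.
        @Bean
        public DataSource batchMetadataDataSource() {
            return new EmbeddedDatabaseBuilder()
                    .setType(EmbeddedDatabaseType.H2)
                    // DDL shipped with spring-batch-core
                    .addScript("classpath:org/springframework/batch/core/schema-h2.sql")
                    .build();
        }
    }

Depending on your setup you may still need extra wiring (for example a BatchConfigurer) so the JobRepository actually uses this DataSource rather than the application's primary one.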
Thanks!

Related

parallel updates to oracle table from spring mvc application

We have an application running in 2 containers/pods. The application reads requests from ActiveMQ, and to process each request it needs to update 10 tables. There are hundreds of async requests in ActiveMQ that need to be processed per second, and each request tries to update the same 10 tables in one Oracle database. Because of these concurrent updates to the 10 tables, some of the requests frequently fail with this error: "Error updating database. Cause: java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource"
Is there a better way to handle this kind of scenario, such as a better architecture?
Is there a better way to handle this kind of scenario using the Spring framework?
Your premise is wrong; deadlocks are not caused simply by a large amount of activity on many tables. A deadlock occurs when different sessions try to lock the same rows in a different order, so each session ends up waiting on a lock the other holds. When that happens, Oracle breaks the deadlock by rolling back one of the waiting statements, which is the ORA-00060 error you are seeing.
The first step in investigating deadlocks is to look in the alert log and find the trace file generated by the deadlock. It will show the statements and objects involved. If you're lucky, the problem is caused by a missing foreign key index or a bitmap index on a transactional table, which can be resolved with a simple DDL change.
If you're unlucky, the application is changing tables in different orders. In that case, you'll need to change the application to always process changes the same way. I don't think I can offer any specific advice for that; it depends entirely on your application.
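The usual way to "process changes the same way" is to have every session acquire its row locks in one deterministic order. A rough sketch of the idea (the table, column and key names are made up):

    import java.math.BigDecimal;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import javax.sql.DataSource;

    public class OrderedUpdater {

        private final DataSource dataSource;

        public OrderedUpdater(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        // Apply a set of per-row deltas inside one transaction, always locking rows
        // in ascending key order so two sessions never wait on each other's locks
        // in opposite order.
        public void applyDeltas(Map<Long, BigDecimal> deltasById) throws Exception {
            List<Long> ids = new ArrayList<>(deltasById.keySet());
            Collections.sort(ids); // every session sorts the same way

            try (Connection con = dataSource.getConnection()) {
                con.setAutoCommit(false);
                try (PreparedStatement ps = con.prepareStatement(
                        "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                    for (Long id : ids) {
                        ps.setBigDecimal(1, deltasById.get(id));
                        ps.setLong(2, id);
                        ps.executeUpdate();
                    }
                    con.commit();
                } catch (Exception e) {
                    con.rollback();
                    throw e;
                }
            }
        }
    }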

MySQL only shows some data that is inserted by Hibernate

I have a Java application that loads data from a very large text file into MySQL via Hibernate. So far, the application seems to work OK - the data gets loaded.
Then I wanted to monitor the progress of the application while it was inserting data, so I wrote a small command-line utility that basically queries select count(*) from my_table;.
The first time this query is run (from either the CLI or MySQL Workbench), I get the correct number of records, as expected. All subsequent executions, however, return the exact same number, even though the data-loading application is still running!
If I stop and start the MySQL process, querying for the number of records shows the correct number, as the data-loading application would report it.
I've never seen anything like this before. It looks like there is some strange MySQL caching issue going on here, and I'm concerned it may cause problems for other non-Hibernate applications that may want to access this database.
Does anyone know how to tweak MySQL (or possibly Hibernate) so that MySQL always shows what's being added to the database?
Technical details:
MySQL: 5.7.26, in a docker container, using InnoDB storage
Hibernate version: 5.4.2.Final
Spring version: 5.1.7.RELEASE
Calling FLUSH TABLES seems to resolve this, in the sense that after I flush the tables, I can see how many records have been added by the application.
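For reference, the monitoring utility boils down to something like the following (the connection details and table name are placeholders); the comment notes the snapshot behaviour that seems relevant here, namely that an InnoDB connection holding a transaction open at the default REPEATABLE READ level keeps reading the same snapshot:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class RowCountMonitor {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/mydb", "user", "password")) {
                // With auto-commit on, each count runs in its own transaction,
                // so it is not pinned to an old REPEATABLE READ snapshot.
                con.setAutoCommit(true);
                while (true) {
                    try (Statement st = con.createStatement();
                         ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM my_table")) {
                        rs.next();
                        System.out.println("rows: " + rs.getLong(1));
                    }
                    Thread.sleep(5000);
                }
            }
        }
    }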

Difference between Spring JDBCTemplate ResultSetExtractor and Spring Batch ItemReader

I have a large MySQL database table (more than 1 million records). I need to read all the data and do some processing on it in Java.
I want to make sure that the Java process does not consume too much memory by loading the entire result set into memory.
While looking at cursor-based implementations, I found these options:
Using Spring JdbcTemplate with a ResultSetExtractor or RowCallbackHandler and reading rows sequentially.
Using Spring Batch's JdbcCursorItemReader/JdbcPagingItemReader.
Can someone explain the difference between these two options?
Option 1 seems better, with some internal batching on your application side if you require any batching.
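A rough sketch of option 1 (the fetch-size value is the MySQL Connector/J convention for asking the driver to stream rows instead of buffering the whole result set; the table and column names are placeholders):

    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.core.RowCallbackHandler;

    public class LargeTableProcessor {

        private final JdbcTemplate jdbcTemplate;

        public LargeTableProcessor(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
            // With MySQL Connector/J, Integer.MIN_VALUE asks the driver to stream
            // the result set row by row instead of holding it all in memory.
            this.jdbcTemplate.setFetchSize(Integer.MIN_VALUE);
        }

        public void processAll() {
            jdbcTemplate.query(
                    "SELECT id, payload FROM big_table",
                    (RowCallbackHandler) rs -> {
                        // Called once per row; nothing is accumulated in memory.
                        long id = rs.getLong("id");
                        String payload = rs.getString("payload");
                        process(id, payload);
                    });
        }

        private void process(long id, String payload) {
            // placeholder for the actual per-row processing
        }
    }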
JdbcCursorItemReader opens a new connection and hence will not participate in your application transaction. See the API at http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/item/database/AbstractCursorItemReader.html. This API is part of Spring Batch, so if you are writing a batch-processing application it will be well suited. See Spring Batch.

spring batch MapJobRepositoryFactoryBean

We get the below error when using Spring Batch.
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=8827 with wrong version (1), where current version is 2
What I gathered from different forums is that we are using org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean, which is not thread-safe and is not advisable for production use.
We do not want to persist the job metadata or use an in-memory database. Is there any other alternative to MapJobRepositoryFactoryBean?
Thanks
Lives
According to this post on the Spring forums, the MapJobRepositoryFactoryBean is not generally intended for production use. I would ask why you wouldn't want the metadata persisted to a database? It provides tremendous value, not to mention giving you the ability to use the Spring Batch Admin console.
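For comparison, the usual alternative to the Map-based repository is the database-backed JobRepositoryFactoryBean pointed at a DataSource holding the metadata tables. A rough sketch (it assumes the schema scripts shipped with spring-batch-core have already been run against that DataSource):

    import javax.sql.DataSource;
    import org.springframework.batch.core.repository.JobRepository;
    import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.transaction.PlatformTransactionManager;

    @Configuration
    public class JobRepositoryConfig {

        // Database-backed JobRepository: thread-safe and suitable for production,
        // unlike the Map-based variant.
        @Bean
        public JobRepository jobRepository(DataSource dataSource,
                                           PlatformTransactionManager transactionManager) throws Exception {
            JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
            factory.setDataSource(dataSource);
            factory.setTransactionManager(transactionManager);
            factory.afterPropertiesSet();
            return factory.getObject();
        }
    }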

Handling transactions spanning across database servers

I have a scenario where the unit of work is defined as:
Update table T1 in database server S1
Update table T2 in database server S2
I want the above unit of work to happen either completely or not at all (as is the case with any database transaction). How can I do this? I searched extensively and found this post, which is close to what I expect, but it seems to be very specific to Hibernate.
I am using Spring, iBatis and Tomcat (6.x) as the container.
It really depends on how robust a solution you need. The minimum level of reliability for something like this is XA transactions. To use them, you need a database and JDBC driver that support XA for starters; then you can configure Spring to use them (here is an outline).
If XA isn't robust enough for you (XA has failure scenarios, for example if something like a hardware failure happens during the second phase of the commit), then what you really need to do is put all the data in one database and have a separate process propagate it. The data may be temporarily inconsistent, but it is recoverable.
Edit: What I mean is to put the whole of the data into one database, either the first database or a separate database used just for this purpose. That database essentially becomes a queue from which the final data view is fed. The write to that database (assuming a decent database product) will either complete or fail completely. Then a separate thread polls that database and distributes any missing data to the other databases. If the process fails, when that thread starts up again it continues the distribution process. The data may not exist in every place you want it right away, but nothing gets lost.
You want a distributed transaction manager. I like using Atomikos, which can be run within the JVM.
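A minimal sketch of what the XA approach looks like from application code; the table names are placeholders, and both DataSources are assumed to be XA-capable and enlisted with a JTA transaction manager such as Atomikos:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import javax.sql.DataSource;
    import javax.transaction.UserTransaction;

    public class CrossServerUpdate {

        private final UserTransaction utx; // e.g. provided by Atomikos
        private final DataSource s1;       // XA DataSource for server S1
        private final DataSource s2;       // XA DataSource for server S2

        public CrossServerUpdate(UserTransaction utx, DataSource s1, DataSource s2) {
            this.utx = utx;
            this.s1 = s1;
            this.s2 = s2;
        }

        public void updateBoth(long id) throws Exception {
            utx.begin();
            try {
                try (Connection c1 = s1.getConnection();
                     PreparedStatement ps1 = c1.prepareStatement(
                             "UPDATE T1 SET status = 'DONE' WHERE id = ?")) {
                    ps1.setLong(1, id);
                    ps1.executeUpdate();
                }
                try (Connection c2 = s2.getConnection();
                     PreparedStatement ps2 = c2.prepareStatement(
                             "UPDATE T2 SET status = 'DONE' WHERE id = ?")) {
                    ps2.setLong(1, id);
                    ps2.executeUpdate();
                }
                utx.commit();   // two-phase commit across both servers
            } catch (Exception e) {
                utx.rollback(); // both updates are undone together
                throw e;
            }
        }
    }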
