We get the following error when using Spring Batch:
org.springframework.dao.OptimisticLockingFailureException: Attempt to update step execution id=8827 with wrong version (1), where current version is 2
From what I read on several forums, the cause is that we are using org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean, which is not thread-safe and is not advisable for production use.
We do not want to persist the job metadata or use an in-memory database. Is there any alternative to MapJobRepositoryFactoryBean?
Thanks
Lives
According to this post on the Spring forums, MapJobRepositoryFactoryBean is not generally intended for production use. Why wouldn't you want the metadata persisted to a database? It provides tremendous value, not to mention the ability to use the Spring Batch Admin console.
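The error message in the question is what an optimistic version check reports when two writers update the same record. Below is a minimal plain-Java sketch of that kind of check, not Spring Batch's actual code: `VersionedStore` and `StaleVersionException` are hypothetical names, and an in-memory map stands in for the repository.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Toy stand-in for a store that guards each record with a version number. */
final class VersionedStore {

    /** Hypothetical exception type; Spring's own is OptimisticLockingFailureException. */
    static final class StaleVersionException extends RuntimeException {
        StaleVersionException(String message) {
            super(message);
        }
    }

    // record id -> current version
    private final ConcurrentMap<Long, Integer> versions = new ConcurrentHashMap<>();

    /** Registers a record and returns its initial version (0 is an arbitrary choice). */
    int create(long id) {
        versions.put(id, 0);
        return 0;
    }

    /** Bumps the version only if the caller still holds the current one. */
    int update(long id, int expectedVersion) {
        if (!versions.containsKey(id)) {
            throw new IllegalArgumentException("Unknown id " + id);
        }
        // Atomic compare-and-set: succeeds only if the stored version is still expectedVersion.
        if (!versions.replace(id, expectedVersion, expectedVersion + 1)) {
            throw new StaleVersionException("Attempt to update id=" + id
                    + " with wrong version (" + expectedVersion
                    + "), where current version is " + versions.get(id));
        }
        return expectedVersion + 1;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        int v0 = store.create(8827L);
        int seenByA = store.update(8827L, v0);   // writer A now holds version 1
        store.update(8827L, seenByA);            // writer B moves the record to version 2
        try {
            store.update(8827L, seenByA);        // A retries with its stale version 1
        } catch (StaleVersionException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the stale writer is rejected instead of silently overwriting the newer state, which is the situation the exception in the question describes.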
Context
We have a Spring boot application (an API used by an angular frontend).
It is running on a docker container.
It is using a single instance of a PostgreSQL database.
Our application had some load problems, so we were asked to scale it.
We were told to run our API in several Docker containers for that.
We have several questions / problems dealing with code synchronization over multiple docker instances executing our code.
Problem 1
We have some @Scheduled jobs integrated and deployed with our API code.
We don't want these scheduled jobs to be executed by all container instances, but only one.
I think we can simply handle this by disabling jobs on other containers through environment variables with the "-" value to disable the Spring scheduled cron.
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/scheduling/annotation/Scheduled.html#CRON_DISABLED
Does that sound right?
Problem 2
The other problem is that we use Spring's @Lock annotation on some repository methods.
public interface IncrementRepository extends JpaRepository<IncrementEntity, UUID> {
    @Lock(LockModeType.PESSIMISTIC_FORCE_INCREMENT)
    Optional<IncrementEntity> findByAnnee(String pAnneeAA);

    @Lock(LockModeType.PESSIMISTIC_WRITE)
    IncrementEntity save(IncrementEntity pIncrementEntity);
}
This is critical for us to have a lock on that as we get / compute an increment used to act as a unique identifier of some of our data.
If I understood this locking mechanism correctly:
- if a process executes this code, the Spring @Transactional transaction acquires a lock on the IncrementEntity (locks the database table).
- when another process tries to do the same thing before the first transaction has released the lock, it should get a PessimisticLockException and the second transaction will roll back.
- this is managed by Spring at application level, NOT directly at database level (right??)
So what will happen if we run our code in several containers?
- the app running in container 1 sets a lock
- the app running in container 2 executes the same code and tries to set the same lock while the first one has not been released yet
- each Spring application running in a different container will probably acquire the lock without problems, as they don't share the same information?
Please tell me if I correctly understood how it works, and if we will effectively have a problem running such code on several docker containers.
I guess that solution would be to set a lock directly on the database table, as we have only one instance of it?
Is there a way to easily set / release the lock at database level using Spring JPA code ?
Or perhaps I misunderstood and setting a lock using Spring's @Lock annotation sets a real database lock?
In that case, perhaps we don't have any problem at all, as the lock is correctly set on the database itself, shared by all containers instances??
Problem 3
To avoid throwing too many exceptions and rejecting requests that try to acquire the lock at the same time, we also added a synchronized block around the above code.
String numIncrement;
synchronized (this.mutex) {
    try {
        numIncrement = this.incrementService.getIncrement(var);
    } catch (Exception e) {
        // rethrow custom technical exception
    }
}
This way concurrent requests should be delayed and queued, which is better for our users' experience.
I guess we will also have problems here, as Docker instances don't share the same JVM, so synchronization can only work within the scope of a single container... right?
Conclusion
For all these problems, please tell me if you have some solutions to workaround / adapt our code so it can be compatible with app scaling.
Following a set of tests I can confirm these points about my original question
Problem 1
A Spring cron schedule can be disabled with the "-" value:
@Scheduled(cron = "-")
Problem 2
Spring Data JPA's @Lock annotation sets a lock in the database itself; the lock is not managed by the Spring application.
So when containers are duplicated, if the Spring app in the first container holds the lock, the second app in another container gets a PessimisticLockException when it tries to read the same data.
Problem 3
Code synchronized with the Java synchronized keyword is managed by the JVM, so there is no mutual exclusion between containers.
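The scope of synchronized can be illustrated inside a single JVM, under the assumption that two separate monitor objects stand in for the mutexes of two containers. This is only a sketch with hypothetical names (`SynchronizedScopeDemo`, `overlaps`); real containers do not even share a heap.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class SynchronizedScopeDemo {

    /**
     * Starts two threads that each enter a synchronized block on the given monitor
     * and reports whether both were inside their block at the same time.
     */
    static boolean overlaps(Object mutexA, Object mutexB) {
        CountDownLatch aInside = new CountDownLatch(1);
        CountDownLatch bInside = new CountDownLatch(1);
        boolean[] sawOther = new boolean[2];

        Thread a = new Thread(() -> {
            synchronized (mutexA) {
                aInside.countDown();
                sawOther[0] = await(bInside); // true only if B entered while A was still inside
            }
        });
        Thread b = new Thread(() -> {
            synchronized (mutexB) {
                bInside.countDown();
                sawOther[1] = await(aInside); // true only if A entered before B left
            }
        });
        a.start();
        b.start();
        join(a);
        join(b);
        return sawOther[0] && sawOther[1];
    }

    private static boolean await(CountDownLatch latch) {
        try {
            return latch.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Object shared = new Object();
        System.out.println("same monitor overlaps: " + overlaps(shared, shared));
        System.out.println("separate monitors overlap: " + overlaps(new Object(), new Object()));
    }
}
```

With one shared monitor the two blocks never run together; with separate monitors they do, which is the situation of two containers that each hold their own this.mutex.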
From this article we can learn that Spring-Batch holds the Job's status in some SQL repository.
And from this article we can learn that the location of the JobRepository can be configured - can be in-memory and can be remote DB.
So if we need to scale a batch job, should we run several different Spring-batch JARs, all configured to use the same shared DB in order to keep them synchronized?
Is this the right pattern / architecture?
Yes, this is the way to go. The problem that can occur when you launch the same job from different physical nodes is that the same job instance gets created twice. In that case, Spring Batch will not know which instance to pick up when restarting a failed execution. A shared job repository acts as a safeguard against this kind of concurrency issue.
The job repository achieves this synchronization thanks to the transactional capabilities of the underlying database. The IsolationLevelForCreate can be set to an aggressive value (SERIALIZABLE is the default) in order to avoid the aforementioned issue.
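The safeguard can be sketched in plain Java, assuming an in-memory map stands in for the shared database and that a job instance is identified by the job name plus its identifying parameters. `SharedJobRegistry` and its methods are hypothetical names, not Spring Batch API; the atomic `putIfAbsent` plays the role that the database transaction plays in the real job repository.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Toy stand-in for a job repository shared by several nodes. */
final class SharedJobRegistry {

    // key: job name plus sorted identifying parameters; value: node that created the instance
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    private static String key(String jobName, Map<String, String> identifyingParams) {
        // TreeMap sorts the parameters so that their order does not change the key
        return jobName + new TreeMap<String, String>(identifyingParams);
    }

    /** Returns true if this node created the instance, false if it already existed. */
    boolean tryCreate(String jobName, Map<String, String> identifyingParams, String nodeId) {
        // putIfAbsent is atomic, so only one caller can win for a given key
        return owners.putIfAbsent(key(jobName, identifyingParams), nodeId) == null;
    }

    /** Returns the node that created the instance, or null if there is none. */
    String ownerOf(String jobName, Map<String, String> identifyingParams) {
        return owners.get(key(jobName, identifyingParams));
    }

    public static void main(String[] args) {
        SharedJobRegistry registry = new SharedJobRegistry();
        Map<String, String> params = new TreeMap<>();
        params.put("date", "2024-01-01");
        System.out.println(registry.tryCreate("importJob", params, "node-1"));
        System.out.println(registry.tryCreate("importJob", params, "node-2"));
        System.out.println(registry.ownerOf("importJob", params));
    }
}
```

If each node kept its own registry instead, both calls would succeed and the same job instance would exist twice, which is the problem described above.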
I have a large MySQL database table (more than 1 million records). I need to read all the data and do some processing on it in Java.
I want to make sure that the Java process does not consume excessive memory by loading the entire result set into memory.
While looking at cursor-based implementations, I found some options:
Using Spring's JdbcTemplate with a ResultSetExtractor or RowCallbackHandler implementation and reading the rows sequentially.
Using Spring Batch's JdbcCursorItemReader / JdbcPagingItemReader.
Can someone explain what is the difference between these two options ?
Option 1 seems better; if you require any batching, you can add it on the application side.
JdbcCursorItemReader opens a new connection and hence will not participate in your application transaction. See the API at http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/item/database/AbstractCursorItemReader.html This API is part of Spring Batch. If you are writing a batch processing application, it will be well suited. See Spring Batch.
I am trying to get a very minimal JPA + SDN (Spring Data Neo4j) cross store project running and am trying to demonstrate that saving a partial entity using a JPA repository call will create a corresponding node in Neo4j.
I have followed the instructions / advice that I have been able to find on SO, Google and Spring's site but am currently still having trouble standing things up. I currently have a minimal test project created at:
https://github.com/simon-lam/sdn-cross-store-poc
The project uses Spring Boot and has a simple domain containing a graph entity, GraphNodeEntity.java, and a partial entity, PartialEntity.java. I have written a very basic test, PartialEntityRepositoryTest.java, to do a save on the partial entity and am seeing:
The wrong transaction manager seems to be used because the CrossStoreNeo4jConfiguration class does not properly autowire entityManagerFactory, it is null
As a result of the above ^, no ID is assigned to my entity
I do not see any SDN activity in the logs at all
Am I doing something glaringly wrong?
More generally, I was hoping to confirm some assumptions and better understand cross store persistence support in general:
To enable it, do I need to enable advanced mapping?
As part of enabling advanced mapping, I need to set up AspectJ; does this include enabling load time weaving? If so is this accomplished through using the @EnableLoadTimeWeaving config?
Assuming that all my configuration is eventually fixed, should I expect to see partial nodes persist in Neo4j when I persist them using a JPA repository? This should be handled by the cross store support which is driven by aspects right?
Thank you for any help that can be offered!
I sent a message to the Neo4j Google Group and got some feedback from Michael Hunger so I'm going to share here:
Turns out the cross store lib has been dormant for a while
JPA repos are not supported, only the EntityManager operations are
The cross store setup was not meant for a remote server and was not tested
So in summary my core understanding / assumptions were off!
Source: https://groups.google.com/forum/#!topic/neo4j/FGI8692AVJQ
I am developing a Spring Boot application that uses Spring Data JPA and will need to connect to many different databases, e.g. PostgreSQL, MySQL, MS-SQL, MongoDB.
I need to create all datasources at runtime, i.e. the user chooses the following values in the GUI of the running application:
-driver (one from a list),
-source,
-port,
-username,
-password.
After that, the user writes native SQL against the chosen database and gets the results.
I have read a lot about this on Stack Overflow and the Spring forums (e.g. AbstractRoutingDataSource), but all of these tutorials show how to create datasources from XML configuration or from a static definition in a Java bean. Is it possible to create many datasources at runtime? How do I manage transactions and create many session factories? Is it possible to use the @Transactional annotation? What is the best method to do this? Can someone explain how to do this step by step?
Hope it's not too late for an answer ;)
I developed a module which can be easily integrated in any spring project. It uses a meta-datasource to hold the tenant-datasource connection details.
For the tenant-datasource an AbstractRoutingDataSource is used.
Here you find my core implementation using the AbstractRoutingDataSource.
https://github.com/Dactabird/multitenancy
Here is an example to show how to integrate it. https://github.com/Dactabird/multitenancy-sample
In this example I'm using H2 embedded db. But of course you can use whatever you want.
Feel free to modify it for your purposes or to ask if questions are left!
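The routing idea behind AbstractRoutingDataSource can be sketched in plain Java. This is not the code from the linked module: `RoutingRegistry` and its methods are hypothetical, and strings stand in for real DataSource objects. In Spring itself, the per-call lookup key is supplied by overriding `determineCurrentLookupKey()`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Routes each thread to one of several targets that can be registered at runtime. */
final class RoutingRegistry<T> {

    // lookup key -> target (a DataSource in a real application)
    private final ConcurrentMap<String, T> targets = new ConcurrentHashMap<>();

    // key selected for the current thread, e.g. at the start of a request
    private final ThreadLocal<String> currentKey = new ThreadLocal<>();

    /** Adds or replaces a target while the application is running. */
    void register(String key, T target) {
        targets.put(key, target);
    }

    /** Selects the target that this thread should use from now on. */
    void select(String key) {
        currentKey.set(key);
    }

    /** Clears the selection, e.g. at the end of a request. */
    void clear() {
        currentKey.remove();
    }

    /** Resolves the target for the current thread. */
    T current() {
        String key = currentKey.get();
        if (key == null) {
            throw new IllegalStateException("No target selected for this thread");
        }
        T target = targets.get(key);
        if (target == null) {
            throw new IllegalStateException("No target registered under '" + key + "'");
        }
        return target;
    }

    public static void main(String[] args) {
        RoutingRegistry<String> registry = new RoutingRegistry<>();
        registry.register("tenantA", "jdbc:h2:mem:tenantA"); // placeholder URLs
        registry.register("tenantB", "jdbc:h2:mem:tenantB");
        registry.select("tenantB");
        System.out.println(registry.current());
        registry.clear();
    }
}
```

Keeping the selected key in a ThreadLocal means that concurrent requests can each talk to a different database without passing the key through every method call.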