Jpa reader Spring Batch - java

I would like to know whether this way of implementing a Spring Batch reader with JPA is recommended, or whether it is better to look for another solution. If this way is not recommended, where can I find information on a better option?
public class CreditCardItemReader implements ItemReader<CreditCard> {

    @Autowired
    private CreditCardRepository repository;

    private Iterator<CreditCard> usersIterator;

    @BeforeStep
    public void before(StepExecution stepExecution) {
        usersIterator = repository.someQuery().iterator();
    }

    @Override
    public CreditCard read() {
        if (usersIterator != null && usersIterator.hasNext()) {
            return usersIterator.next();
        } else {
            return null;
        }
    }
}

This implementation is acceptable only for small datasets, because the data is read by one batch query and the whole result list is kept in memory. Also, it is not thread-safe.
When loading large volumes:
in an environment with limited memory it can lead to an out-of-memory error
it can lead to performance problems, because we wait until thousands of records are loaded from the DB by one call
Solution 1, org.springframework.batch.item.database.JpaCursorItemReader
A similar implementation is provided out of the box in Spring Batch: JpaCursorItemReader
The main difference is that this implementation works with a specific JPQL query instead of a repository, and uses JPA's Query.getResultStream() method to get the query results.
Implementation of JpaCursorItemReader:
protected void doOpen() throws Exception {
    ...
    Query query = createQuery();
    if (this.parameterValues != null) {
        this.parameterValues.forEach(query::setParameter);
    }
    this.iterator = query.getResultStream().iterator();
}
Hibernate, for example, introduced the Query.getResultStream() method in version 5.2.
It uses Hibernate's ScrollableResults implementation to move through the result set and fetch the records in batches. That prevents you from loading all records of the result set at once and allows you to process them more efficiently.
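For illustration, a minimal plain-JPA sketch of getResultStream() (assuming a configured EntityManager em and a mapped Foo entity; process(...) is a placeholder):

import java.util.stream.Stream;

// With Hibernate the stream is backed by ScrollableResults, so records are
// fetched from the cursor incrementally instead of materialized in one list.
try (Stream<Foo> foos = em.createQuery("from Foo", Foo.class).getResultStream()) {
    foos.forEach(foo -> process(foo)); // process(...) is a placeholder
}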
Example of creation:
protected ItemReader<Foo> getItemReader() throws Exception {
LocalContainerEntityManagerFactoryBean factoryBean = new LocalContainerEntityManagerFactoryBean();
String jpqlQuery = "from Foo";
JpaCursorItemReader<Foo> itemReader = new JpaCursorItemReader<>();
itemReader.setQueryString(jpqlQuery);
itemReader.setEntityManagerFactory(factoryBean.getObject());
itemReader.afterPropertiesSet();
itemReader.setSaveState(true);
return itemReader;
}
Solution 2, org.springframework.batch.item.database.JpaPagingItemReader
It is a more flexible solution for JPQL queries than JpaCursorItemReader. This ItemReader loads and stores data page by page, and it is thread-safe.
According to documentation:
ItemReader for reading database records built on top of JPA.
It executes the JPQL setQueryString(String) to retrieve requested
data. The query is executed using paged requests of a size specified
in AbstractPagingItemReader.setPageSize(int). Additional pages are
requested when needed as
AbstractItemCountingItemStreamItemReader.read() method is called,
returning an object corresponding to current position.
The performance of the paging depends on the JPA implementation and
its use of database specific features to limit the number of returned
rows.
Setting a fairly large page size and using a commit interval that
matches the page size should provide better performance.
In order to reduce the memory usage for large results the persistence
context is flushed and cleared after each page is read. This causes
any entities read to be detached. If you make changes to the entities
and want the changes persisted then you must explicitly merge the
entities.
The implementation is thread-safe in between calls to
AbstractItemCountingItemStreamItemReader.open(ExecutionContext), but
remember to use saveState=false if used in a multi-threaded client (no
restart available).
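A creation sketch using the builder API (assuming Spring Batch 4.x, where JpaPagingItemReaderBuilder is available, and an EntityManagerFactory configured elsewhere):

import javax.persistence.EntityManagerFactory;
import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.batch.item.database.builder.JpaPagingItemReaderBuilder;

protected JpaPagingItemReader<Foo> getPagingItemReader(EntityManagerFactory emf) {
    return new JpaPagingItemReaderBuilder<Foo>()
            .name("fooPagingReader")   // required because saveState defaults to true
            .entityManagerFactory(emf) // assumed to be configured elsewhere
            .queryString("from Foo")
            .pageSize(100)             // match this to the chunk/commit interval
            .build();
}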
Solution 3, org.springframework.batch.item.data.RepositoryItemReader
It is a more efficient solution. It works with a repository, loads and stores data in chunks, and it is thread-safe.
According to documentation:
A ItemReader that reads records utilizing a
PagingAndSortingRepository.
Performance of the reader is dependent on the repository
implementation, however setting a reasonably large page size and
matching that to the commit interval should yield better performance.
The reader must be configured with a PagingAndSortingRepository, a
Sort, and a pageSize greater than 0.
This implementation is thread-safe between calls to
AbstractItemCountingItemStreamItemReader.open(ExecutionContext), but
remember to use saveState=false if used in a multi-threaded client (no
restart available).
Example of creation:
PagingAndSortingRepository<Foo, Long> repository = ...; // typically injected; Spring Data provides the implementation
RepositoryItemReader<Foo> reader = new RepositoryItemReader<>();
reader.setRepository(repository); // the PagingAndSortingRepository used to read input from
reader.setMethodName("findByName"); // the method on the repository to call
reader.setArguments(arguments); // arguments to be passed to the data-providing method
reader.setSort(Collections.singletonMap("id", Sort.Direction.ASC)); // a Sort is required
Creation via builder:
PagingAndSortingRepository<Foo, Long> repository = ...; // typically injected; repositories are interfaces
RepositoryItemReader<Foo> reader = new RepositoryItemReaderBuilder<Foo>()
        .repository(repository)
        .methodName("findByName")
        .arguments(new ArrayList<>())
        .sorts(Collections.singletonMap("id", Sort.Direction.ASC)) // required by the builder
        .name("fooReader") // required because saveState defaults to true
        .build();
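Both snippets assume a FooRepository; since Spring Data repositories are interfaces, a plausible shape would be (names are illustrative):

import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.repository.PagingAndSortingRepository;

public interface FooRepository extends PagingAndSortingRepository<Foo, Long> {

    // RepositoryItemReader invokes this method reflectively with the configured
    // arguments plus a PageRequest, so the last parameter must be a Pageable.
    Page<Foo> findByName(String name, Pageable pageable);
}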
More examples of usage: RepositoryItemReaderTests and RepositoryItemReaderIntegrationTests
To summarise:
Your implementation is good only for simple use cases.
I recommend using the out-of-the-box solutions.


How to check special conditions before saving data with Hibernate

Sample Scenario
I have a limit that controls the total value of a column. If I make a save that exceeds this limit, I want it to throw an exception. For example:
Suppose I have already added the following data, with LIMIT = 20:
id | code | value
---+------+------
 1 | A    | 15
 2 | A    | 5
 3 | B    | 12
 4 | B    | 3
If I insert (A, 2), it exceeds the limit and I want to get an exception.
If I insert (B, 4), the transaction should be successful, since it doesn't exceed the limit.
code and value are interrelated.
What can I do?
I can check this scenario with the required queries. For example, I can write a method for it and check it in the save method. That's it.
However, I'm looking for a more useful solution than this.
For example, is there any annotation for this when designing the entity?
Can I do this without calling the method that provides this control every time?
What examples can I give?
@UniqueConstraint checking if it adds the same values
Using a transaction
The most common and long-accepted way is simply to abstract, in a suitable form (a class, a library, a service, ...), the business rules that govern the behavior you describe, and run them within a transaction:
@Transactional(propagation = Propagation.REQUIRED)
public RetType operation(ReqType args) {
    ...
    perform operations;
    ...
    if (fail post conditions)
        throw ...;
    ...
}
In this case, if there is already an open transaction when the method is called, that transaction is used (and there are no deadlocks); if no transaction exists, a new one is created, so that both the operations and the postcondition check run within the same transaction.
Note that with this strategy both the operations and the invariant checks can combine multiple transactional resources managed by the TransactionManager (e.g. Redis, MySQL, MQS, ... simultaneously and in a coordinated manner).
Using only the database
It has fallen out of use (in favor of the first approach), but TRIGGERS were the canonical option some decades ago for checking postconditions. This solution is usually coupled to the specific database engine (e.g. PostgreSQL or MySQL).
It could be useful where the client making the modifications is unable or unwilling (not safe) to check postconditions within a transaction (e.g. bash processes). But nowadays that is infrequent.
The use of TRIGGERS may also be preferable in certain scenarios where efficiency is required, as there are certain optimization options within the database scripts.
Neither Hibernate nor Spring Data JPA has anything built in for this scenario. You have to program the transaction logic in your repository yourself:
@PersistenceContext
private EntityManager em;

public void addValue(String code, int value) {
    // JPQL SUM over an integer column returns a Long (null when there are no rows yet)
    var checkQuery = em.createQuery(
            "SELECT SUM(e.value) FROM Entity e WHERE e.code = :code", Long.class);
    checkQuery.setParameter("code", code);
    Long sum = checkQuery.getSingleResult();
    if ((sum == null ? 0L : sum) + value > 20) {
        throw new LimitExceededException("attempted to exceed limit for " + code);
    }
    var newEntity = new Entity();
    newEntity.setCode(code);
    newEntity.setValue(value);
    em.persist(newEntity);
}
Then (it's important!) you have to define the SERIALIZABLE isolation level on the @Transactional annotations for the methods that work with this table.
Read more about the serializable isolation level here; they have an oddly similar example.
Note that you have to consider retrying the failed transaction. No idea how to do this with Spring though.
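One option (an assumption on my part, not from the original answer) is Spring Retry, which re-invokes the method when the serialization failure surfaces as an exception. A sketch, requiring the spring-retry dependency and @EnableRetry on a configuration class:

import org.springframework.dao.CannotSerializeTransactionException;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

@Service
public class ValueService {

    // hypothetical bean wrapping the SERIALIZABLE @Transactional addValue shown above;
    // it lives in another bean so each retry starts a fresh transaction
    private final ValueRepository repository;

    public ValueService(ValueRepository repository) {
        this.repository = repository;
    }

    // Retry when the database aborts the transaction with a serialization
    // failure; the exact exception type depends on the driver and on
    // Spring's exception translation.
    @Retryable(value = CannotSerializeTransactionException.class,
               maxAttempts = 3, backoff = @Backoff(delay = 100))
    public void addValue(String code, int value) {
        repository.addValue(code, value);
    }
}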
You should use a singleton (javax.ejb.Singleton):
@Singleton
public class Register {

    @Lock(LockType.WRITE)
    public void register(String code, int value) {
        if (i_can_insert_modify(code, value)) {
            // use entityManager or some dao
        } else {
            // do something
        }
    }
}

HibernateTransactionManager @Transactional(propagation=REQUIRES_NEW) cannot open 2 sessions

There is one batch job that looks like this:
@Transactional
public void myBatchJob() {
    // retrieves thousands of entries and locks them
    // to prevent other jobs from touching this dataset
    entries = getEntriesToProcessWithLock();
    additional = doPrepWork(); // interacts with DB
    processor = applicationContext.getBean(getClass());
    while (!entries.isEmpty()) {
        result = doActualProcessing(entries, additional); // takes as many entries as it needs; removes them from the collection afterwards
        resultDao.save(result);
    }
}
However, I occasionally get the error below if the entries collection is big enough:
ORA-01000: maximum open cursors exceeded
I decided to blame the doActualProcessing() and save() methods, as they could end up creating hundreds of blobs in one transaction.
The obvious way out seems to be splitting the processing into multiple transactions: one for getting and locking the entries, and multiple others for processing and persisting. Like this:
@Transactional
public void myBatchJob() {
    // retrieves thousands of entries and locks them
    // to prevent other jobs from touching this dataset
    entries = getEntriesToProcessWithLock();
    additional = doPrepWork(); // interacts with DB
    processor = applicationContext.getBean(getClass());
    while (!entries.isEmpty()) {
        processor.doProcess(entries, additional);
    }
}

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void doProcess(entries, additional) {
    result = doActualProcessing(entries, additional); // takes as many entries as it needs; removes them from the collection afterwards
    resultDao.save(result);
}
And now, whenever doProcess is called, I get:
Caused by: org.hibernate.HibernateException: illegally attempted to associate a proxy with two open Sessions
How do I make HibernateTransactionManager do what the REQUIRES_NEW javadoc suggests: suspend the current transaction and start a new one?
In my opinion, the problem lies in the fact that you have retrieved the entities in the top transaction, and while they are still associated with that transaction you try to pass them (proxies) to a method that is processed in a separate transaction.
I think you could try two options:
1) Detach the entities before invoking processor.doProcess(entries, additional):
session.evict(entity); // loop through the list and do this
then, inside the inner transaction, try to merge:
session.merge(entity);
2) The second option would be to retrieve ids instead of entities in getEntriesToProcessWithLock. You would then pass plain primitive fields, which won't cause proxy problems, and retrieve the proper entities inside the inner transaction, as sketched below.
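A sketch of the second option, with hypothetical names (getEntryIdsToProcessWithLock, takeNextBatch, entryDao, Additional are all assumptions; findAllById is the standard Spring Data repository method):

import java.util.List;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

// Outer transaction: only collects and locks ids, never entity proxies.
@Transactional
public void myBatchJob() {
    List<Long> entryIds = getEntryIdsToProcessWithLock(); // hypothetical: returns ids, not entities
    var additional = doPrepWork();
    var processor = applicationContext.getBean(getClass());
    while (!entryIds.isEmpty()) {
        // hypothetical helper that removes a slice of ids from the list
        processor.doProcess(takeNextBatch(entryIds), additional);
    }
}

// Inner transaction: re-loads the entities by id, so the proxies belong
// to this session only.
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void doProcess(List<Long> ids, Additional additional) {
    List<Entry> entries = entryDao.findAllById(ids); // loaded in THIS session
    var result = doActualProcessing(entries, additional);
    resultDao.save(result);
}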

How to Hibernate Batch Insert with real time data? Use @Transactional or not?

I am trying to perform batch inserts with data that is currently being inserted into the DB one statement per transaction. The transaction code looks similar to the snippet below. Currently, addHolding() is called for each quote that comes in from an external feed, and these quote updates happen about 150 times per second.
public class HoldingServiceImpl {

    @Autowired
    private HoldingDAO holdingDao;

    @Transactional(propagation = Propagation.REQUIRES_NEW, rollbackFor = Exception.class)
    public void addHolding(Quote quote) {
        Holding holding = transformQuote(quote);
        holdingDao.addHolding(holding);
    }
}
And the DAO gets the current session from the Hibernate SessionFactory and calls save on the object.
public class HoldingDAOImpl {

    @Autowired
    private SessionFactory sessionFactory;

    public void addHolding(Holding holding) {
        sessionFactory.getCurrentSession().save(holding);
    }
}
I have looked at the Hibernate batching documentation, but it is not clear from the document how I would organize the code for batch inserting in this case, since I don't have the full list of data at hand, but rather am waiting for it to stream in.
Does merely setting the Hibernate batching properties in the properties file (e.g. hibernate.jdbc.batch_size=20) "magically" batch-insert these? Or will I need to, say, capture each quote update in a synchronized list, and then insert the list and clear it when the batch size limit is reached?
Also, the whole purpose of implementing batching is to see if performance improves. If there is a better way to handle inserts in this scenario, let me know.
Setting the property hibernate.jdbc.batch_size=20 tells Hibernate to group the pending inserts into JDBC batches of 20 statements. The batched statements are only sent to the database when the session is flushed.
When you call session.save(), the insert is only recorded in the in-memory Hibernate session; only once a flush happens does Hibernate synchronize these changes with the database. So setting the Hibernate batch size is enough to get batch inserts; fine-tune the batch size according to your needs.
Also make sure your transactions are handled properly: committing a transaction also forces Hibernate to flush the session.
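For the "capture and flush in batches" idea from the question, a rough sketch (the buffer handling is a simplification and the names follow the question's code; flushing and clearing per batch is the pattern from the Hibernate batch-processing docs; note that JDBC batching is silently disabled for entities with an IDENTITY id generator):

import java.util.ArrayList;
import java.util.List;
import org.hibernate.Session;

// Buffer incoming quotes and persist each batch in one transaction, so one
// flush maps to one JDBC batch. BATCH_SIZE should match hibernate.jdbc.batch_size.
private static final int BATCH_SIZE = 20;
private final List<Holding> buffer = new ArrayList<>();

public synchronized void onQuote(Quote quote) {
    buffer.add(transformQuote(quote));
    if (buffer.size() >= BATCH_SIZE) {
        saveBatch(new ArrayList<>(buffer));
        buffer.clear();
    }
}

// In real code this would live on another bean so the @Transactional proxy
// applies (self-invocation bypasses it).
@Transactional
public void saveBatch(List<Holding> batch) {
    Session session = sessionFactory.getCurrentSession();
    for (Holding h : batch) {
        session.save(h);
    }
    session.flush(); // issues the JDBC batch
    session.clear(); // detaches the saved entities to keep memory flat
}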

Spring @Transactional DAO calls return same object

We are using Spring and iBATIS, and I have discovered something interesting in the way a service method annotated with @Transactional handles multiple DAO calls that return the same record. Here is an example of a method that does not work.
@Transactional
public void processIndividualTrans(IndvTrans trans) {
    Individual individual = individualDAO.selectByPrimaryKey(trans.getPartyId());
    individual.setFirstName(trans.getFirstName());
    individual.setMiddleName(trans.getMiddleName());
    individual.setLastName(trans.getLastName());
    Individual oldIndvRecord = individualDAO.selectByPrimaryKey(trans.getPartyId());
    individualHistoryDAO.insert(oldIndvRecord);
    individualDAO.updateByPrimaryKey(individual);
}
The problem with the above method is that the second execution of the line
individualDAO.selectByPrimaryKey(trans.getPartyId())
returns the exact same object returned by the first call.
This means that oldIndvRecord and individual are the same object, and the line
individualHistoryDAO.insert(oldIndvRecord);
adds a row to the history table that contains the changes (which we do not want).
In order for it to work, it must look like this:
@Transactional
public void processIndividualTrans(IndvTrans trans) {
    Individual individual = individualDAO.selectByPrimaryKey(trans.getPartyId());
    individualHistoryDAO.insert(individual);
    individual.setFirstName(trans.getFirstName());
    individual.setMiddleName(trans.getMiddleName());
    individual.setLastName(trans.getLastName());
    individualDAO.updateByPrimaryKey(individual);
}
We wanted to write a service method called updateIndividual that we could use for all updates of this table, and that would store a row in the IndividualHistory table before performing the update:
@Transactional
public void updateIndividual(Individual individual) {
    Individual oldIndvRecord = individualDAO.selectByPrimaryKey(individual.getPartyId());
    individualHistoryDAO.insert(oldIndvRecord);
    individualDAO.updateByPrimaryKey(individual);
}
But it does not store the row as it was before the object changed. Even if we explicitly instantiate different objects before the DAO calls, the second one becomes the same object as the first.
I have looked through the Spring documentation and cannot determine why this is happening.
Can anyone explain this?
Is there a setting that allows the second DAO call to return the database contents rather than the previously returned object?
With Hibernate as the ORM, this behavior is perfectly described in the Hibernate documentation, in the Transaction chapter:
Through Session, which is also a transaction-scoped cache, Hibernate provides repeatable reads for lookup by identifier and entity queries and not reporting queries that return scalar values.
The same goes for iBATIS/MyBatis:
MyBatis uses two caches: a local cache and a second level cache. Each
time a new session is created MyBatis creates a local cache and
attaches it to the session. Any query executed within the session will
be stored in the local cache so further executions of the same query
with the same input parameters will not hit the database. The local
cache is cleared upon update, commit, rollback and close.

Hibernate: Accessing created entity from different transaction

I have quite complex methods that create different entities during their execution and use them. For instance, I create some images and then add them to an article:
@Transactional
public void createArticle() {
    List<Image> images = ...
    for (int i = 0; i < 10; i++) {
        // creating some new images, method annotated @Transactional
        images.add(repository.createImage(...));
    }
    Article article = getArticle();
    article.addImages(images);
    em.merge(article);
}
This works correctly: the images get their IDs and are then added to the article. The problem is that during this execution the database is locked and nothing can be modified. This is very inconvenient, because the images might be processed by some graphics processor, which might take some time.
So we might try to remove the @Transactional from the main method. This could be good.
What happens is that the images are correctly created and have their IDs. But once I try to add them to the article and call merge, I get javax.persistence.EntityNotFoundException for Image with ID XXXX. The entity manager can't see that the image was created and has its ID. So the database is not locked, but we can't do anything either.
So what can I do? I don't want the database locked during the whole execution, and I want to be able to access the created entities!
I am using current versions of Spring and Hibernate, everything defined by annotations. I don't use a session factory; I access everything via javax.persistence.EntityManager.
Consider leveraging Hibernate's cascading functionality to persist an object tree in one go, with minimal database locking:
@Entity
public class Article {

    @OneToMany(cascade = CascadeType.MERGE)
    private List<Image> images;
}

@Transactional
public void createArticle() {
    // images created as Java objects in memory, no DAOs called yet
    List<Image> images = ...
    Article article = getArticle();
    article.addImages(images);
    // cascading will save the article AND the images
    em.merge(article);
}
This way the article AND its images are persisted at the end of the method, in a single transaction with a minimal lifetime. Up until then, no locking occurs on the database.
Alternatively, split createArticle into two @Transactional business methods, one createImages and the other addImagesToArticle, and call them one after the other from a third method in another bean (a sketch of the two service methods follows the snippet below):
@Service
public class OtherBean {

    @Autowired
    private YourService yourService;

    // note that no transactional annotation is used; this is intentional
    public void otherMethod() {
        yourService.createImages(); // first transaction - images are committed
        yourService.addImagesToArticle(); // second transaction - images are added to the article
    }
}
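One possible shape for the two service methods (the signatures and the addImage helper are my assumptions; the no-arg variants called above would wrap something like this):

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class YourService {

    @PersistenceContext
    private EntityManager em;

    // First transaction: the images are committed when this method returns,
    // so they are visible to the next transaction.
    @Transactional
    public void createImages(List<Image> images) {
        images.forEach(em::persist);
    }

    // Second transaction: re-load the now-committed images and link them.
    @Transactional
    public void addImagesToArticle(Long articleId, List<Long> imageIds) {
        Article article = em.find(Article.class, articleId);
        for (Long id : imageIds) {
            article.addImage(em.find(Image.class, id)); // addImage assumed on Article
        }
        em.merge(article);
    }
}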
You could try setting the transaction isolation on your datasource to READ_UNCOMMITTED, though that can lead to inconsistencies so it is generally not a recommended thing to do.
My best guess is that your transaction isolation level is SERIALIZABLE. That's why the DB locks affected tables for the whole duration of a transaction.
If that's the case change the level to READ_COMMITTED. Hibernate (or any JPA provider) works nicely with this one.
It won't lock anything unless you explicitly call entityManager.lock(someEntity, LockModeType.SomeLockType).
Also, when you choose transaction boundaries, first think in terms of atomicity. If createArticle() is an atomic unit of work, it just has to be made transactional; breaking it into smaller transactions for the sake of 'optimization' is wrong.
