We are doing a data migration from one database to another using Hibernate and Spring Batch. The example below is slightly disguised. We are using the standard processing pipeline:
return jobBuilderFactory.get("migrateAll")
        .incrementer(new RunIdIncrementer())
        .listener(listener)
        .flow(DConfiguration.migrateD())
        .end()
        .build();
and migrateD consists of three steps:
@Bean(name = "migrateDsStep")
public Step migrateDs() {
    return stepBuilderFactory.get("migrateDs")
            .<org.h2.D, org.mssql.D>chunk(100)
            .reader(dReader())
            .processor(dItemProcessor)
            .writer(dWriter())
            .listener(chunkLogger)
            .build();
}
Now assume that this table has a many-to-many relationship to another table. How can I persist that? I basically have a JPA entity class for each of my entities and fill those in the processor, which does the actual migration from the old database objects to the new ones.
@Component
@Import({mssqldConfiguration.class, H2dConfiguration.class})
public class ClassificationItemProcessor implements ItemProcessor<org.h2.D, org.mssql.D> {

    public ClassificationItemProcessor() {
        super();
    }

    public org.mssql.D process(org.h2.D a) throws Exception {
        org.mssql.D di = new org.mssql.D();
        di.setA(a.getA());
        di.setB(a.getB());
        // Asking for the related objects would e.g. be possible via the
        // following, but this does not work:
        // Set<E> es = eRepository.findById(a.getEs());
        // di.setEs(es);
        ...
        // How to model an m:n?
        return di;
    }
}
So I could basically ask for the related object via another database call (a repository) and add it to d. But when I do that, I run into LazyInitializationExceptions, or, when it does succeed, the data in the intermediate (join) tables sometimes has not been filled in.
What is the best practice to model this?
This is not a Spring Batch issue, it is rather a Hibernate mapping issue. As far as Spring Batch is concerned, your input items are of type org.h2.D and your output items are of type org.mssql.D. It is up to you to define what an item is and how to "enrich" it in your item processor.
You need to make sure that the items received by the writer are completely "filled in", meaning that you have already set any other entities on them (be it a single entity or a set of entities such as di.setEs(es) in your example). If this leads to lazy initialization exceptions, you need to change your model to be eagerly initialized instead, because Spring Batch cannot help at that level.
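For illustration, a minimal sketch of such an enrichment step; the repository (ERepository) and its fetch method are assumptions, not part of the question:
@Component
public class ClassificationItemProcessor implements ItemProcessor<org.h2.D, org.mssql.D> {

    @Autowired
    private ERepository eRepository; // hypothetical repository for the related entity

    @Override
    public org.mssql.D process(org.h2.D a) throws Exception {
        org.mssql.D di = new org.mssql.D();
        di.setA(a.getA());
        di.setB(a.getB());

        // Resolve the m:n association inside the processor, e.g. with a
        // query that fetches the related entities eagerly, so the writer
        // receives a fully initialized object graph and no lazy proxy
        // escapes the Hibernate session.
        Set<E> es = eRepository.findAllByIdIn(a.getEIds());
        di.setEs(es);

        return di;
    }
}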
Related
So I'm trying, for the first time and in a not-so-complex project, to implement Domain Driven Design by separating all my code into application, domain, infrastructure and interfaces packages.
I also went with separating the JPA entities from the domain models that hold my business logic as rich models, and I used the Builder pattern to instantiate them. This approach has given me a headache, and I can't figure out if I'm doing it all wrong when using JPA + ORM and Spring Data with DDD.
Process explanation
The application is a REST API consumer (without any user interaction) that processes, through daily Scheduler tasks, a fairly large amount of data resources and stores or updates them in MySQL. I'm using RestTemplate to fetch and convert the JSON responses into domain objects, and from there I apply any business logic within the domain itself, e.g. validation, events, etc.
From what I have read, the aggregate root object should keep one identity over its whole lifecycle and should be unique. I have used the id of the REST API object, because it is already something I use to identify and track objects in my business domain. I have also created a property for the technical id, so that when I convert entities to domain objects they hold a reference for the update process.
When I need to persist the domain objects to the data source (MySQL) for the first time, I convert them into entity objects and persist them using the save() method. So far so good.
Now, when I need to update those records in the data source, I first fetch them from the data source as a list of Employees and convert the entity objects to domain objects, and then I fetch the list of Employees from the REST API as domain models. Up until now I have two lists of the same domain type, List<Employee>. I iterate over them using streams, and whenever two objects are not equal() I add the Employee that needs updating to a third list. At this point I have already passed the technical id to the domain objects in the third list, so Hibernate can identify and update the records that already exist. A sketch of this diff step follows.
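Roughly, the comparison looks like this sketch (the id accessor names are assumptions):
// Collect the employees whose API state differs from the stored state.
List<Employee> toUpdate = apiEmployees.stream()
        .filter(api -> dbEmployees.stream()
                .anyMatch(db -> db.getEmployeeId().equals(api.getEmployeeId())
                        && !db.equals(api)))
        .collect(Collectors.toList());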
Up to here it is all fairly simple stuff, until I use the saveAll() method to update the records.
Questions
I always see Hibernate using INSERT instead of updating the list of records. If I'm right, the Hibernate session is not recognising the objects that I'm throwing at it because I detached them when I converted them to domain objects?
Does anyone have a better idea how I can implement this differently, or fix this problem?
Or should I stop using this approach of two different objects and continue using them as rich entity models?
Simple classes to explain it with code
EmployeeDO.java
@Entity
@Table(name = "employees")
public class EmployeeDO implements Serializable {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    public EmployeeDO() {}

    ...omitted getters/setters
}
Employee.java
public class Employee {

    private Long persistId;
    private Long employeeId;
    private String name;

    private Employee() {}

    ...omitted getters and Builder
}
EmployeeConverter.java
public class EmployeeConverter {

    public static EmployeeDO serialize(Employee employee) {
        EmployeeDO target = new EmployeeDO();
        if (employee.getPersistId() != null) {
            target.setId(employee.getPersistId());
        }
        target.setName(employee.getName());
        return target;
    }

    public static Employee deserialize(EmployeeDO employee) {
        return new Employee.Builder(employee.getEmployeeId())
                .withPersistId(employee.getId()) // <-- technical id setter
                .withName(employee.getName())
                .build();
    }
}
EmployeeRepository.java
@Component
public class EmployeeRepositoryImpl implements EmployeeRepository {

    @Autowired
    EmployeeJpaRepository db;

    @Override
    public List<Employee> findAll() {
        return db.findAll().stream()
                .map(EmployeeConverter::deserialize)
                .collect(Collectors.toList());
    }

    @Override
    public void saveAll(List<Employee> employees) {
        db.saveAll(employees.stream()
                .map(EmployeeConverter::serialize)
                .collect(Collectors.toList()));
    }
}
EmployeeJpaRepository.java
@Repository
public interface EmployeeJpaRepository extends JpaRepository<EmployeeDO, Long> {
}
I use the same approach on my project: two different models for the domain and the persistence.
First, I would suggest that you don't use the converter approach but the Memento pattern instead. Your domain entity exports a memento object and can be restored from that same object. Yes, the domain gains two functions that aren't related to the domain itself (they exist just to satisfy a non-functional requirement), but on the other side you avoid exposing functions, getters and constructors that the domain business logic never uses.
For the persistence part, I don't use JPA exactly for this reason: you have to write a lot of code to reload, update and persist the entities correctly. I write the SQL directly: I can write and test it fast, and once it works I'm sure it does what I want. With the memento object I have directly what I will use in the insert/update query, and I spare myself a lot of the headaches JPA gives when handling complex table structures.
Anyway, if you want to use JPA, the only solution is to:
load the persistence entities and transform them into domain entities
update the domain entities according to the changes that you have to do in your domain
save the domain entities, which means:
reload the persistence entities
change them according to the updated domain entities, or create new ones where needed
save the persistence entities
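For illustration, a minimal sketch of that round trip with Spring Data, reusing the names from the question (the accessor names are assumptions):
@Transactional
public void updateEmployees(List<Employee> changedEmployees) {
    for (Employee domain : changedEmployees) {
        // Reload the persistence entity (or create one for new employees)
        EmployeeDO entity = domain.getPersistId() != null
                ? db.findById(domain.getPersistId()).orElseThrow(IllegalStateException::new)
                : new EmployeeDO();

        // Copy the changes from the updated domain entity
        entity.setName(domain.getName());

        // Saving a reloaded, managed entity results in an UPDATE, not an INSERT
        db.save(entity);
    }
}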
I've tried a mixed solution, where the domain entities are extended by the persistence ones (a bit complex to do). A lot of care should be taken to avoid the domain model having to adapt to the restrictions that JPA imposes through the persistence model.
Here is an interesting read about splitting the two models.
Finally, my suggestion is to think about how complex the domain is and use the simplest solution for the problem:
is it big and with a lot of complex behaviours? Is it expected to grow into a big one? Use two models, domain and persistence, and manage the persistence directly with SQL. It avoids a lot of chaos in the read/update/save phase.
is it simple? Then, first, should I use the DDD approach at all? If the answer is really yes, I would let the JPA annotations slip inside the domain. Yes, it's not pure DDD, but we live in the real world, and the time needed to do something simple in the pure way should not be orders of magnitude bigger than the time needed to do it with some compromises. And, on the other side, I can write all this stuff in an XML file in the infrastructure layer, avoiding cluttering the domain with it, as it's done in the Spring DDD sample here.
When you want to update an existing object, you first have to load it through entityManager.find() and apply the changes on that object, or use entityManager.merge(), since you are working with detached entities.
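For illustration, a minimal sketch of both options, reusing the EmployeeDO entity from the question (assuming the usual getters/setters):
// Option 1: find() returns a managed instance; changes are flushed at commit
EmployeeDO managed = entityManager.find(EmployeeDO.class, persistId);
managed.setName("New name");

// Option 2: merge() copies the state of a detached instance onto a managed one
EmployeeDO detached = new EmployeeDO();
detached.setId(persistId); // existing technical id -> UPDATE instead of INSERT
detached.setName("New name");
EmployeeDO merged = entityManager.merge(detached);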
Anyway, modelling rich domain models based on JPA is the perfect use case for Blaze-Persistence Entity Views.
Blaze-Persistence is a query builder on top of JPA which supports many of the advanced DBMS features on top of the JPA model. I created Entity Views on top of it to allow easy mapping between JPA models and custom interface-defined models, something like Spring Data Projections on steroids. The idea is that you define your target structure the way you like and map attributes (getters) via JPQL expressions to the entity model. Since the attribute name is used as the default mapping, you mostly don't need explicit mappings, as 80% of the use cases are DTOs that are a subset of the entity model.
The interesting point here is that entity views can also be updatable and support automatic translation back to the entity/DB model.
A mapping for your model could look as simple as the following
@EntityView(EmployeeDO.class)
@UpdatableEntityView
interface Employee {
    @IdMapping("persistId")
    Long getId();

    Long getEmployeeId();

    String getName();
    void setName(String name);
}
Querying is a matter of applying the entity view to a query, the simplest being just a query by id.
Employee dto = entityViewManager.find(entityManager, Employee.class, id);
The Spring Data integration allows you to use it almost like Spring Data Projections: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-features and it can also be saved back. Here is a sample repository:
@Repository
interface EmployeeRepository {
    Employee findOne(Long id);
    void save(Employee e);
}
It will only fetch the mappings that you tell it to fetch and also only update the state that you make updatable through setters.
With the Jackson integration you can deserialize your payload onto a loaded entity view, or you can avoid loading altogether and use the Spring MVC integration to capture just the state that was transferred and flush that. It could look like the following:
@RequestMapping(path = "/employee/{id}", method = RequestMethod.PUT, consumes = MediaType.APPLICATION_JSON_VALUE)
public ResponseEntity<String> updateEmp(@EntityViewId("id") @RequestBody Employee emp) {
    employeeRepository.save(emp);
    return ResponseEntity.ok(emp.getId().toString());
}
Here you can see an example project: https://github.com/Blazebit/blaze-persistence/tree/master/examples/spring-data-webmvc
Request
Within Spring Boot I'd like to run a completely custom query, for example a query to get the firstname and lastname columns on the table person, BUT where the existence of the table person, and the fact that it may have columns firstname and lastname, was not known at compile time.
Why I want to do this
I want to do this because within our application we have a concept of custom fields and custom entities. These have views automatically built over the top of them. At run time I will know what views are available and what columns they have, but I will not know that when the application starts (and they may change while the application is running).
What I don't want
A query annotation on a CrudRepository, because that still needs to target a particular object and so can't have dynamic fields or return an arbitrary object (unless someone knows how to make a CrudRepository do that).
If you have knowledge of the entity and the fields you need at run time, you can use Spring's JdbcTemplate to construct SQL queries. Below is an example of fetching a Todo item. You can do something similar, passing in the entity name and a collection of the fields you want. Here is a very basic example:
@Autowired
private DataSource dataSource; // configure this in a class annotated with @Configuration

public Todo fetchWithToDoId(long id) {
    return new JdbcTemplate(dataSource).queryForObject(
            "SELECT * FROM PUBLIC.TODO WHERE todo_id = ?", new Object[]{id}, getRowMapper());
}
private RowMapper<Todo> getRowMapper() {
    return (resultSet, i) -> {
        Todo d = new Todo();
        d.setUserId(resultSet.getInt("todo_user_id"));
        d.setId(resultSet.getInt("todo_id"));
        d.setTitle(resultSet.getString("todo_title"));
        d.setCompleted(resultSet.getBoolean("todo_completed"));
        d.setCreated(resultSet.getTimestamp("todo_created"));
        return d;
    };
}
If the tables and the columns do not exist, an exception will be thrown, and it's up to you to handle it on the server side and present the appropriate view to the client. You would probably expand this to take as arguments an entity name and a key/value data structure mapping fields to values. When you construct your query, it will then contain all the fields and their target values.
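A hedged sketch of that expansion; since table and column names cannot be bound as ? parameters, they are checked here against the view metadata known at run time (knownViews and columnsOf are hypothetical helpers):
public List<Map<String, Object>> fetchDynamic(String view, List<String> columns) {
    // Validate identifiers against run-time metadata to prevent SQL injection;
    // identifiers, unlike values, cannot be passed as bind parameters.
    if (!knownViews().contains(view) || !columnsOf(view).containsAll(columns)) {
        throw new IllegalArgumentException("Unknown view or column");
    }
    String sql = "SELECT " + String.join(", ", columns) + " FROM " + view;
    // queryForList returns one Map per row keyed by column name, which suits
    // a result shape that is only known at run time.
    return new JdbcTemplate(dataSource).queryForList(sql);
}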
I have quite complex methods which create different entities during their execution and use them. For instance, I create some images and then I add them to an article:
@Transactional
public void createArticle() {
    List<Image> images = ...
    for (int i = 0; i < 10; i++) {
        // creating some new images, method annotated @Transactional
        images.add(repository.createImage(...));
    }
    Article article = getArticle();
    article.addImages(images);
    em.merge(article);
}
This works correctly: the images get their IDs and are then added to the article. The problem is that during this execution the database is locked and nothing can be modified. This is very inconvenient, because the images might be processed by some graphics processor and that might take some time.
So we might try to remove the @Transactional from the main method. This could be good.
What happens is that the images are correctly created and have their IDs. But once I try to add them to the article and call merge, I get a javax.persistence.EntityNotFoundException for the Image with ID XXXX. The entity manager can't see that the image was created and has its ID. So the database is not locked, but we can't do anything either.
So what can I do? I don't want to have the database locked during the whole execution and I want to be able to access the created entities!
I am using the current versions of Spring and Hibernate, with everything defined by annotations. I don't use a session factory; I access everything via javax.persistence.EntityManager.
Consider leveraging the Hibernate cascading functionality for persisting object trees in one go with minimal database locking:
@Entity
public class Article {
    @OneToMany(cascade = CascadeType.MERGE)
    private List<Image> images;
}

@Transactional
public void createArticle() {
    // images created as Java objects in memory, no DAOs called yet
    List<Image> images = ...
    Article article = getArticle();
    article.addImages(images);
    // cascading will save the article AND the images
    em.merge(article);
}
This way the article AND its images get persisted together at the end, in a single transaction with a minimal lifetime. Until then, no locking occurs on the database.
Alternatively, split createArticle into two @Transactional business methods, one createImages and the other addImagesToArticle, and call them one after the other from a third method in another bean:
@Service
public class OtherBean {

    @Autowired
    private YourService yourService;

    // note that no transactional annotation is used here; this is intentional
    public void otherMethod() {
        yourService.createImages();       // first transaction - images are committed
        yourService.addImagesToArticle(); // second transaction - images are added to the article
    }
}
You could try setting the transaction isolation on your datasource to READ_UNCOMMITTED, though that can lead to inconsistencies so it is generally not a recommended thing to do.
My best guess is that your transaction isolation level is SERIALIZABLE. That's why the DB locks affected tables for the whole duration of a transaction.
If that's the case change the level to READ_COMMITTED. Hibernate (or any JPA provider) works nicely with this one.
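If the transactions are driven by Spring, a hedged sketch of pinning the isolation level on the method itself:
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Transactional(isolation = Isolation.READ_COMMITTED)
public void createArticle() {
    // ... same logic as before; with READ_COMMITTED the provider only holds
    // short-lived locks, so other sessions can keep reading and writing.
}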
It won't lock anything unless you explicitly call entityManager.lock(someEntity, LockModeType.SomeLockType).
Also, when you choose transaction boundaries, think first in terms of atomicity. If createArticle() is an atomic unit of work, it simply has to be transactional; breaking it into smaller transactions for the sake of 'optimization' is wrong.
The Spring MVC binding mechanism is powerful, but I'm now confronted with a trivial issue that I wonder how to resolve:
User JPA entity, that is used for the binding and validation as well (i.e. throughout all layers)
"Edit profile" page, that is not supposed to change the password or some other entity properties
Two ways that I can think of:
Using the same object
use @InitBinder to configure a list of disallowed properties
obtain the target user (by id)
then use a reflection utility (BeanUtils) to copy the submitted object to the target object, but ignore null values - i.e. fields that are not submitted
Introduce a new object that has the needed subset of fields, and use BeanUtils.copyProperties(..) to merge it to the entity.
Alternatives?
I've found that as soon as your web model starts to diverge from your business layer in function, it's best to use a view-layer object (a model object) to collect or display the data.
the entity:
// com.myapp.domain
public class UserEntity {
}
the model object:
// com.myapp.somesite.web
public class SomeSiteUserModel {

    public static SomeSiteUserModel from(UserEntity userEntity) {
        ... initialize model ...
    }

    public UserEntity getModelObject() {
        ... get entity back ...
    }
}
Now all view-based operations can hand off processing to the internal model object if that makes sense; otherwise the model can handle them itself. Of course, the problem with this is that you have to re-write all the getters and setters you want from the entity (an issue I've had to deal with, and it is annoying); unfortunately that is a bit of a Java language issue.
I just checked up with two of the last Spring projects I have worked on and in both places the following approach is taken:
In the JSP page for the form, the change-password field has a name that does not match the name of the password field in the User bean, so it doesn't get mapped to the bean. Then in the onSubmit method there is a separate check whether a new password has been submitted, and if so, the change is applied explicitly.
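Those projects used the old onSubmit-style controllers; a hedged sketch of the same idea with an annotation-based controller (userService and passwordEncoder are assumed collaborators, and the field names are hypothetical):
@PostMapping("/profile")
public String updateProfile(@ModelAttribute("user") User form,
                            @RequestParam(value = "newPassword", required = false) String newPassword) {
    // The request field is called "newPassword", so binding never
    // touches the persistent User.password property.
    User target = userService.findById(form.getId());
    target.setEmail(form.getEmail()); // copy only the editable fields

    if (newPassword != null && !newPassword.isEmpty()) {
        // Explicit, opt-in password change
        target.setPassword(passwordEncoder.encode(newPassword));
    }
    userService.save(target);
    return "redirect:/profile";
}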
Regards,
Vassil
You can read the object from the database first and then bind the request. You can find an example in the FuWeSta-Sample.
It uses a helper bean which must be initialized by Spring.
What would be the easiest way to detach a specific JPA Entity Bean that was acquired through an EntityManager. Alternatively, could I have a query return detached objects in the first place so they would essentially act as 'read only'?
The reason why I want to do this is because I want to modify the data within the bean, within my application only, but never have it persisted to the database. In my program I eventually have to call flush() on the EntityManager, which would persist all changes from attached entities to the underlying database, but I want to exclude specific objects.
(may be too late to answer, but can be useful for others)
I'm developing my first system with JPA right now. Unfortunately I'm faced with this problem when this system is almost complete.
Simply put: use Hibernate, or wait for JPA 2.0.
In Hibernate, you can use session.evict(object) to remove one object from the session. In JPA 2.0, currently in draft, there is the EntityManager.detach(object) method to detach one object from the persistence context.
No matter which JPA implementation you use, just use entityManager.detach(object); it's now in JPA 2.0 and part of Java EE 6.
If you need to detach an object from the EntityManager and you are using Hibernate as your underlying ORM layer you can get access to the Hibernate Session object and use the Session.evict(Object) method that Mauricio Kanada mentioned above.
public void detach(Object entity) {
    org.hibernate.Session session = (org.hibernate.Session) entityManager.getDelegate();
    session.evict(entity);
}
Of course this would break if you switched to another ORM provider, but I think this is preferable to trying to make a deep copy.
Unfortunately, there's no way to disconnect one object from the entity manager in the current JPA implementation, AFAIR.
EntityManager.clear() will disconnect all the JPA objects, so that might not be an appropriate solution in all the cases, if you have other objects you do plan to keep connected.
So your best bet would be to clone the objects and pass the clones to the code that changes the objects. Since primitive and immutable object fields are taken care of by the default cloning mechanism in a proper way, you won't have to write a lot of plumbing code (apart from deep cloning any aggregated structures you might have).
As far as I know, the only direct ways to do it are:
Commit the txn - Probably not a reasonable option
Clear the Persistence Context - EntityManager.clear() - This is brutal, but would clear it out
Copy the object - Most of the time your JPA objects are serializable, so this should be easy (if not particularly efficient).
If using EclipseLink you also have these options:
Use the query hint "eclipselink.maintain-cache"="false" - all returned objects will be detached.
Use the EclipseLink JpaEntityManager copy() API to copy the object to the desired depth.
If there aren't too many properties in the bean, you might just create a new instance and set all of its properties manually from the persisted bean.
This could be implemented as a copy constructor, for example:
public Thing(Thing oldBean) {
this.setPropertyOne(oldBean.getPropertyOne());
// and so on
}
Then:
Thing newBean = new Thing(oldBean);
This is quick and dirty, but you can also serialize and deserialize the object.
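A minimal sketch of that trick, assuming the entity graph implements Serializable:
import java.io.*;

@SuppressWarnings("unchecked")
public static <T extends Serializable> T deepCopy(T entity) throws IOException, ClassNotFoundException {
    // Serialize the object graph into an in-memory buffer...
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
        out.writeObject(entity);
    }
    // ...and deserialize it into a detached, fully independent copy.
    try (ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(buffer.toByteArray()))) {
        return (T) in.readObject();
    }
}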
Since I am using SEAM and JPA 1.0, and my system has functionality that needs to log all field changes, I have created a value object (or data transfer object) with the same fields as the entity that needs to be logged. The constructor of the new POJO is:
public DocumentoAntigoDTO(Documento documentoAtual) {
    Method[] metodosDocumento = Documento.class.getMethods();
    for (Method metodo : metodosDocumento) {
        if (metodo.getName().startsWith("get")) {
            try {
                Object resultadoInvoke = metodo.invoke(documentoAtual);
                // find the matching setter on the DTO and copy the value over
                String metodSetName = "set" + metodo.getName().substring(3);
                for (Method metodoAntigo : DocumentoAntigoDTO.class.getMethods()) {
                    if (metodoAntigo.getName().equals(metodSetName)) {
                        metodoAntigo.invoke(this, resultadoInvoke);
                    }
                }
            } catch (IllegalArgumentException e) {
                e.printStackTrace();
            } catch (IllegalAccessException e) {
                e.printStackTrace();
            } catch (InvocationTargetException e) {
                e.printStackTrace();
            }
        }
    }
}
In JPA 1.0 (tested using EclipseLink) you could retrieve the entity outside of a transaction. For example, with container managed transactions you could do:
public MyEntity myMethod(long id) {
    final MyEntity myEntity = retrieve(id);
    // myEntity is detached here
    return myEntity;
}

@TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
public MyEntity retrieve(long id) {
    return entityManager.find(MyEntity.class, id);
}
To deal with a similar case I have created a DTO object that extends the persistent entity object, as follows:
class MyEntity {
    public static class MyEntityDO extends MyEntity {}
}
Finally, a scalar query will retrieve the desired non-managed attributes:
(Hibernate) select p.id, p.name from MyEntity p
(JPA) select new MyEntity(p.id, p.name) from MyEntity p
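For reference, a hedged sketch of running the JPA constructor expression (assuming a top-level DTO class com.example.MyEntityDO with a matching constructor):
// The provider instantiates the DTO via "new"; the results are never
// managed, so edits to them cannot be flushed back to the database.
List<MyEntityDO> rows = entityManager.createQuery(
        "select new com.example.MyEntityDO(p.id, p.name) from MyEntity p",
        MyEntityDO.class)
    .getResultList();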
If you got here because you actually want to pass an entity across a remote boundary, then you just put in some code to fool Hibernate.
for (RssItem i : result.getChannel().getItem()) {
    // touching each element initializes the lazy collection
}
Cloneable won't work, because it actually copies the PersistentBag across.
And forget about using Serializable with byte-array streams and piped streams: creating threads to avoid deadlocks kills the entire concept.
I think there is a way to evict a single entity from the second-level cache by calling this:
EntityManagerFactory emf;
emf.getCache().evict(Entity.class); // or evict(Entity.class, primaryKey) for one instance
This will remove the particular entity from the shared cache (note that this affects the second-level cache, not the persistence context).
I'm using entityManager.detach(returnObject); and it worked for me.
I think you can also use the method EntityManager.refresh(Object o) if the primary key of the entity has not been changed. This method will restore the original state of the entity.
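A tiny sketch of that, with a hypothetical Thing entity:
Thing thing = entityManager.find(Thing.class, id);
thing.setName("scratch value");   // experiment in memory only
entityManager.refresh(thing);     // re-reads the row and discards the change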