I am currently working on a client (Java/Swing) and server (Tomcat/Hibernate/JAX-WS) application that requires many database operations and should be able to execute long-running background tasks on the server side. I chose this setup mainly for better code reuse.
However, there are some issues that, probably, many others have also faced and found solutions for:
One of the biggest issues was lazy-loading vs. JAX-WS. There were some viable solutions, like overriding the JAX-WS accessors (JAX-WS + Hibernate + JAXB: How to avoid LazyInitializationException during marshalling), that solved this issue by replacing Hibernate's proxies with null.
Now I'm facing new problems described by this example:
An entity "customer" is located within a "country", thus: n:1 relationship.
The "country" within the "customer" is marked as lazy-loaded to avoid unnecessary database traffic. When the client UI wants to list customers (the country is not needed here), the country-proxy is replaced by null within the jax-ws accessor and everything is fine.
When editing a customer, however, (I think) I must join the country, even when not viewing/changing it. Otherwise its proxy would be replaced with null when sent to the client via JAX-WS, then sent back to the server and committed (with null) to the database. After that, my customer->country association is lost.
Maybe there are several solutions like:
marking the country as "optional=false", which triggers an exception when I forget to join the country beforehand and then try to save the customer. With this approach I must always join all references, even when they are not part of the editing process. References marked "optional=true" would pass silently, and coding mistakes might destroy the database.
not replacing the proxy with null within the JAX-WS accessor, but with some dummy class that, when sent back from the client to the server, is replaced by the original proxy. I'm not sure whether this is feasible at all, though.
using Hibernate within the client and connecting directly to the database, using JAX-WS only for non-database interaction
writing some code to allow lazy-loading within the client (when necessary) by sending corresponding JAX-WS requests (I couldn't find the Stack Overflow link anymore where someone asked for something like this). This totally feels like reinventing Hibernate...
Are there any other solutions, recommendations, best-practices, better setups for this kind of application?
Thanks in advance!
You apparently store data differently compared to how you transfer it. So it might make sense NOT to use the same object instances for transfer and for storage.
One of the solutions is to use different classes for that: DTOs and entities. You store entities but transfer DTOs. This takes additional effort to implement the DTOs and the DTO<->entity mapping, but it gives you a clear separation of layers and may even be more efficient (from the effort point of view) in the long run. You can use Dozer and the like for mapping between DTOs and entities.
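A minimal sketch of the idea (all names are illustrative): the entity stays on the persistence side, while a flat DTO is what actually travels over the wire:

import javax.persistence.*;

// persistence-side entity, managed by Hibernate
@Entity
public class Customer {
    @Id
    private Long id;
    private String name;
    @ManyToOne(fetch = FetchType.LAZY)
    private Country country;
    // getters/setters omitted
}

// transfer-side DTO: plain data, no proxies, only what the client needs
public class CustomerDTO {
    private Long id;
    private String name;
    private String countryCode; // flattened from the association

    public static CustomerDTO fromEntity(Customer c, boolean withCountry) {
        CustomerDTO dto = new CustomerDTO();
        dto.id = c.getId();
        dto.name = c.getName();
        if (withCountry) {
            // the caller must have joined/fetched the country beforehand
            dto.countryCode = c.getCountry().getCode();
        }
        return dto;
    }
    // getters omitted
}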
Another approach is to use not different classes, but different instances of objects for transfer and storage. This is probably similar to @VinZ's answer. You "merge" data from the source object into the target object. I wrote a JAXB plugin to generate such merge methods some time ago and found the approach very useful in different use cases.
With this approach you save significant amount of effort compared to DTOs, but don't have a layer separation on the class level.
Personally, I would go with a well-developed and polished structure of entities plus extra DTOs optimized for transfer. I'd also try to autogenerate the merge/copy methods somehow, to avoid having to write them manually. A JAXB plugin, maybe. I love writing JAXB plugins, so if something can be solved with a JAXB plugin, I'd solve it with a JAXB plugin.
Hope this helps.
The problem you describe is not restricted to your "JAX-WS to Hibernate" scenario. You will face this "null value" problem in other scenarios as well.
One solution is the "DIY merge pattern":
Send the entity from client to server.
On the server invoke "EntityManager.find" with the received ID to find the existing entity.
Now copy over the state yourself. If EntityManager.find returns null, the entity is new -> just persist the received object.
Example:
Customer serverCustomer = dao.findById(receivedCustomer.getId());
if (serverCustomer == null) {
    // not in the database yet: persist the received entity as new
    dao.persist(receivedCustomer);
} else {
    serverCustomer.setName(receivedCustomer.getName());
    serverCustomer.setDate(receivedCustomer.getDate());
    // ... all other fields, except "country"
    if (receivedCustomer.getCountry() != null) {
        // the country keeps its server-side state if the client sent no new data
        serverCustomer.setCountry(receivedCustomer.getCountry());
    }
}
Related
REST API - DTOs or not?
I would like to re-ask this question in a microservices context. Here is the quote from the original question:
I am currently creating a REST-API for a project and have been reading article upon article about best practices. Many seem to be against DTOs and simply just expose the domain model, while others seem to think DTOs (or User Models or whatever you want to call it) are bad practice. Personally, I thought that this article made a lot of sense. However, I also understand the drawbacks of DTOs with all the extra mapping code, domain models that might be 100% identical to their DTO-counterpart and so on.
Now, my question:
I am more aligned towards using one object through all the layers of my application (in other words, just exposing the domain object rather than creating a DTO and manually copying over each field). The differences between my REST contract and the domain object can be addressed using Jackson annotations like @JsonIgnore, @JsonProperty(access = Access.WRITE_ONLY), or @JsonView. And if there are one or two fields that need a transformation which cannot be done with a Jackson annotation, I will write custom logic to handle just that (trust me, I haven't come across this scenario even once in my 5+ years in REST services).
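To make this concrete, here is a minimal sketch of a domain object shaped for the REST contract with Jackson annotations alone (field names are made up for illustration):

import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.annotation.JsonProperty;

public class Customer {

    private Long id;
    private String name;

    // never appears in requests or responses
    @JsonIgnore
    private String internalNotes;

    // accepted on input (e.g. POST/PUT), never written to responses
    @JsonProperty(access = JsonProperty.Access.WRITE_ONLY)
    private String password;

    // getters/setters omitted
}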
I would like to know if I am missing any really bad effects of not copying the domain object to a DTO.
I would vote for using DTOs and here is why:
Different requests (events) vs. your DB entities. It often happens that your requests and responses differ from what you have in the domain model. This especially makes sense in a microservice architecture, where you have a lot of events coming from other microservices. For instance, you have an Order entity, but the event you get from another microservice is OrderItemAdded. Even if half of the events (or requests) are the same as the entities, it still makes sense to have DTOs for all of them in order to avoid a mess.
Coupling between the DB schema and the API you expose. When using entities you basically expose how you model your DB in a particular microservice. In MySQL you would probably want your entities to have relations, so they will be pretty massive in terms of composition. In other types of DBs you would have flat entities without lots of inner objects. This means that if you use entities to expose your API and want to change your DB from, let's say, MySQL to Cassandra, you'll need to change your API as well, which is obviously a bad thing.
Consumer-driven contracts. This is probably related to the previous bullet, but DTOs make it easier to ensure that communication between microservices does not break as they evolve. Because the contracts and the DB are not coupled, this is simply easier to test.
Aggregation. Sometimes you need to return more than you have in one single DB entity. In this case, your DTO will be just an aggregator.
Performance. Microservices imply a lot of data being transferred over the network, which can cause performance issues. If the clients of your microservice need less data than you store in the DB, you should provide them less data. Again: just make a DTO and your network load will be decreased.
Forget about LazyInitializationException. DTOs don't have any lazy loading or proxying, as opposed to domain entities managed by your ORM.
A DTO layer is not that hard to maintain with the right tools. The usual problem when mapping entities to DTOs and back is that you need to set the right fields manually for each conversion, and it's easy to forget to update the mapping when adding new fields to the entity and the DTO. Fortunately, there are a lot of tools that can do this task for you. For instance, we used MapStruct on our project - it generates the conversions for you automatically, at compile time.
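For instance, a MapStruct mapper is little more than an interface (the entity and DTO names here are assumed):

import org.mapstruct.Mapper;
import org.mapstruct.factory.Mappers;

@Mapper
public interface CustomerMapper {

    CustomerMapper INSTANCE = Mappers.getMapper(CustomerMapper.class);

    // implementations are generated at compile time, so a missing or
    // misspelled field can be caught at build time instead of at runtime
    CustomerDto toDto(Customer entity);

    Customer toEntity(CustomerDto dto);
}

Calling CustomerMapper.INSTANCE.toDto(customer) then runs generated, reflection-free mapping code.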
The Pros of Just exposing Domain Objects
The less code you write, the less bugs you produce.
Despite having extensive (arguably) test cases in our code base, I have come across bugs due to missed or wrong copying of fields from the domain object to the DTO, or vice versa.
Maintainability - less boilerplate code.
If I have to add a new attribute, I don't have to add it to the domain object, the DTO, the mapper, and the test cases, of course. Don't tell me that this can be achieved using a reflection-based bean-copy util; that defeats the whole purpose.
Lombok, Groovy, Kotlin - I know, but they will only save me the getter/setter headache.
DRY
Performance
I know this falls under the category of "premature optimization is the root of all evil". But still, this saves some CPU cycles by not having to create (and later garbage collect) at least one more object per request.
Cons
DTOs will give you more flexibility in the long run
If only I ever needed that flexibility. At least, whatever I have come across so far are CRUD operations over HTTP, which I can manage with a couple of @JsonIgnores. And if there are one or two fields that need a transformation that cannot be done with a Jackson annotation, as I said earlier, I can write custom logic to handle just that.
Domain Objects getting bloated with Annotations.
This is a valid concern. If I use JPA or MyBatis as my persistence framework, the domain object might have those annotations, and then there will be Jackson annotations too. In my case this is not very applicable, though: I am using Spring Boot and I can get away with application-wide properties like mybatis.configuration.map-underscore-to-camel-case: true and spring.jackson.property-naming-strategy: SNAKE_CASE.
Short story: at least in my case, the cons don't outweigh the pros, so it doesn't make any sense to repeat myself by having a new POJO as a DTO. Less code, fewer chances of bugs. So I am going ahead with exposing the domain object and not having a separate "view" object.
Disclaimer: This may or may not be applicable in your use case. This observation is specific to my use case (basically a CRUD API with 15-ish endpoints).
The decision is a much simpler one in case you use CQRS because:
for the write side you use Commands that are already DTOs; Aggregates - the rich behavior objects in your domain layer - are not exposed/queried so there is no problem there.
for the read side, because you use a thin layer, the objects fetched from persistence should already be DTOs. There should be no mapping problem because you can have a read model for every use case. In the worst case you can use something like GraphQL to select only the fields you need.
If you do not split the read from write then the decision is harder because there are tradeoffs in both solutions.
RESTful resources do not always have a one-to-one mapping with your JPA entities. As I see it, there are a few problems that I am trying to figure out how to handle:
When a resource has information that is populated and saved by more than one entity.
When an entity has more information in it than you want to send down as a resource. I could just use Jackson's @JsonIgnore, but I would still have issues 1, 3, and 4.
When an entity (like an aggregate root) has nested entities and you want to include some of its nested entities, but only to a certain level of nesting, in your resource.
When you want to exclude one piece of an entity when it's part of one parent entity, but exclude a different piece when it's part of a different parent entity.
Blasted circular references (I got this mostly working with JSOG, using Jackson's @JsonIdentityInfo).
Possible solutions:
The only way I could think of that would handle all of these issues would be to create a whole bunch of "resource" classes with constructors that take the needed entities to construct the resource, plus the necessary getters and setters (a rough sketch follows after this list). Is that overkill?
To solve 2, 3, 4, and 5 I could just do some pre- and post-processing on the actual entity before handing it to Jackson to serialize or deserialize my POJO to/from JSON, but that doesn't address issue 1.
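As a rough sketch of option 1, a "resource" class assembled from more than one entity might look like this (all names invented for illustration):

import java.math.BigDecimal;

// a resource populated from two entities (addresses issue 1)
public class OrderResource {

    private final Long id;
    private final String customerName;
    private final BigDecimal shippingCost;

    public OrderResource(Order order, ShippingQuote quote) {
        this.id = order.getId();
        this.customerName = order.getCustomer().getName();
        this.shippingCost = quote.getCost();
    }

    // getters only; Jackson serializes this instead of the entities
    public Long getId() { return id; }
    public String getCustomerName() { return customerName; }
    public BigDecimal getShippingCost() { return shippingCost; }
}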
These are all problems I would think others have come across, and I am curious what solutions other people have come up with. (I am currently using JPA 2, Spring MVC, Jackson, and Spring Data, but am open to other technologies.)
With a combination of JAX-RS 1.1 and Jackson/GSON you can expose JPA entities directly as REST resources, but you will run into a myriad of problems.
DTOs i.e. projections onto the JPA entities are the way to go. It would allow you to separate the resource representation concerns of REST from the transactional concerns of JPA. You get to explicitly define the nature of the representations. You can control the amount of data that appears in the representation, including the depth of the object graph to be traversed, if you design your DTOs/projections carefully. You may need to create multiple DTOs/projections for the same JPA entity for the different resources in which the entity may need to be represented differently.
Besides, in my experience, using annotations like @JsonIgnore and @JsonIdentityInfo on JPA entities doesn't exactly make for more usable resource representations. You may eventually run into trouble when merging the objects back into the persistence context (because of the ignored properties), or your clients may be unable to consume the resource representations, since object references as a scheme may not be understood. Most JavaScript clients will have trouble consuming the object references produced by the @JsonIdentityInfo annotation, due to the lack of standardization here.
There are other aspects that become possible through DTOs/projections. JPA @EmbeddedIds do not fit naturally into REST resource representations. Some advocate using the JAX-RS @MatrixParam annotation to identify the resource uniquely in resource URIs, but this does not work out of the box for most clients. Matrix parameters are, after all, only a design note and not a standard (yet). With a DTO/projection, you can serve the resource representation against a computed Id (which could be a combination of the constituent keys).
Note: I currently work on the JBoss Forge plugin for REST where some or all of these issues exist and would be fixed in some future release via the generation of DTOs.
I agree with the other answers that DTOs are the way to go. They solve many problems:
Separation of layers and clean code. One day you may need to expose the data model using a different format (e.g. XML) or interface (e.g. non-web-service based). Keeping all the configuration (such as @JsonIgnore, @JsonIdentityInfo) for each interface and format in the domain model would make it really messy. DTOs separate the concerns: they can contain all the configuration required by your external interface (web service) without involving changes in the domain model, which can stay web-service- and format-agnostic.
Security - you easily control what is exposed to the client and what the client is allowed to modify.
Performance - you easily control what is sent to the client.
Issues such as (circular) entity references and lazily loaded collections are also resolved explicitly and knowingly by you when converting to DTOs.
Given your constraints, there seems to be no solution other than Data Transfer Objects - yes, this comes up frequently enough that people have named the pattern...
If your application is completely CRUDish, then the way to go is definitely Spring Data REST, in which case you absolutely do not need DTOs. If it's more complicated than that, you will be safer with DTOs securing the application layer. But do not attempt to encapsulate DTOs inside the controller layer: they belong to the service layer, because the mapping is also part of the logic (what you let into the application and what you let out of it). This way the application layer stays hermetic. Of course, in most cases it can be a mix of the two.
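For the completely CRUDish case, a Spring Data REST repository is roughly all the code you need (the entity name is assumed):

import org.springframework.data.repository.CrudRepository;
import org.springframework.data.rest.core.annotation.RepositoryRestResource;

// exposes /customers with full CRUD semantics - no DTOs, no controller code
@RepositoryRestResource(path = "customers")
public interface CustomerRepository extends CrudRepository<Customer, Long> {
}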
My question is this: Is there ever a role for JPA merge in a stateless web application?
There is a lot of discussion on SO about the merge operation in JPA. There is also a great article on the subject which contrasts JPA merge with a more manual do-it-yourself (DIY) process (where you find the entity via the entity manager and make your changes).
My application has a rich domain model (à la domain-driven design) that uses the @Version annotation in order to make use of optimistic locking. We have also created DTOs to send over the wire as part of our RESTful web services. The creation of this DTO layer also allows us to send the client everything it needs and nothing it doesn't.
So far, I understand this is a fairly typical architecture. My question is about the service methods that need to UPDATE (i.e. HTTP PUT) existing objects. In this case we have these two approaches: 1) JPA merge, and 2) DIY.
What I don't understand is how JPA merge can even be considered an option for handling updates. Here's my thinking and I am wondering if there is something I don't understand:
1) In order to properly create a detached JPA entity from a wire DTO, the version number must be set correctly...else an OptimisticLockException is thrown. But the JPA spec says:
An entity may access the state of its version field or property or export a method for use by the application to access the version, but must not modify the version value [30]. Only the persistence provider is permitted to set or update the value of the version attribute in the object.
2) Merge doesn't handle bi-directional relationships ... the back-pointing fields always end up as null.
3) If any fields or data are missing from the DTO (due to a partial update), then JPA merge will delete those relationships or null out those fields. Hibernate can handle partial updates, but JPA merge cannot. DIY can handle partial updates.
4) The first thing the merge method will do is query the database for the entity ID, so there is no performance benefit over DIY to be had.
5) In a DIY update, we load the entity and make the changes according to the DTO - there is no call to merge, or to persist for that matter, because the JPA persistence context implements the unit-of-work pattern out of the box.
Do I have this straight?
Edit:
6) Merge behavior with regards to lazy loaded relationships can differ amongst providers.
Using merge does require you to either send and receive a complete representation of the entity or maintain server-side state. For trivial CRUD-y operations, it is easy and convenient. I have used it plenty in stateless web apps where there is no meaningful security hazard in letting the client see the entire entity.
However, if you've already reduced operations to only passing the immediately relevant information, then you need to also manually write the corresponding services.
Just remember that when doing your "DIY" update you still need to pass a version number around on the DTO and manually compare it to the one that comes out of the database. Otherwise you don't get the optimistic locking that spans "user think time", which you would get with the simpler merge approach.
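A minimal sketch of that manual check (assuming the entity carries a @Version field, the DTO carries the version the client originally read, and em is an injected EntityManager; all names are illustrative):

import javax.persistence.EntityManager;
import javax.persistence.EntityNotFoundException;
import javax.persistence.OptimisticLockException;

public void updateCustomer(EntityManager em, CustomerDto dto) {
    Customer entity = em.find(Customer.class, dto.getId());
    if (entity == null) {
        throw new EntityNotFoundException("No customer with id " + dto.getId());
    }
    // compare the version the client read with the current one; otherwise
    // a concurrent edit made during "user think time" is silently overwritten
    if (!entity.getVersion().equals(dto.getVersion())) {
        throw new OptimisticLockException("Customer was modified concurrently");
    }
    entity.setName(dto.getName());
    // ... copy the remaining editable fields; the persistence context
    // flushes the changes at commit, no merge or persist call needed
}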
You can't change the version on an entity created by the provider, but when you have made your own instance of the entity class with the new keyword it is fine and expected to set the version on it.
It will make the persistent representation match the in-memory representation you provide; this can include making things null. Remember that when an object is merged, that object is supposed to be discarded and replaced with the one returned by merge. You are not supposed to merge an object and then continue using it - its state is not defined by the spec.
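In code, that contract looks like this:

// keep working with the returned, managed instance; the state of the
// detached argument after the merge is undefined by the spec
Customer managed = em.merge(detachedCustomer);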
True.
Most likely, as long as your DIY solution is also using the entity ID and not an arbitrary query. (There are other benefits to using the 'find' method over a query.)
True.
I would add:
7) Merge translates to an insert or an update depending on whether the record exists in the DB, hence it does not deal correctly with update-vs-delete optimistic concurrency. That is, if another user concurrently deletes the record and you update it, merge must (1) throw a concurrency exception... but it does not; it just inserts the record as a new one.
(1) At least, in most cases, in my opinion, it should. I can imagine some cases where I would want this use case to trigger a new insert, but they are far from usual. At least, I would like the developer to think twice about it, not just accept that "merge() == updateWithConcurrencyControl()", because it is not.
I am developing an application in Flex, using Blaze DS to communicate with a Java back-end, which provides persistence via JPA (Eclipse Link).
I am encountering issues when passing JPA entities to Flex via Blaze DS. Blaze DS uses reflection to convert the JPA entity into an ObjectProxy (effectively a HashMap) by calling all getter methods on the entity. This includes any lazy-initialised one/many-to-many relationships.
You can probably see where I am going with this. If I pass back a single entity, this will call all of its one/many-to-many getters. For each returned object, if it has one/many-to-many relationships, they will be called too. As such, by passing back a single JPA entity I actually end up making multiple database calls and passing all related entries back inside a single ObjectProxy instance!
My solution to date is to create a translator to convert each entity to an ObjectProxy and vice versa. This is clearly cumbersome, and there must be a better way.
Thoughts please?
As an alternative, you could consider using GraniteDS instead of BlazeDS: GraniteDS has a much more powerful data management stack than BlazeDS (it competes more with LCDS) and fully supports lazy loading for all major JPA engines: Hibernate, EclipseLink, OpenJPA, etc.
Moreover, GraniteDS has a great client-side transparent lazy loading feature and even a so-called reverse lazy-loading mechanism.
And you don't need any kind of intermediate DTOs: it serializes JPA entities as is and uses code-generated ActionScript beans on the client-side to keep their initialization states.
Unfortunately, lazy-loading is not easy to accomplish with Flash clients. There are some working solutions, like dpHibernate, but so far all the different solutions I have tested fall short of what you would expect in terms of performance and ease of use.
In my experience, the best and most reliable solution is to always use DTOs, which adds the benefit of cleanly separating the database and view layers. This requires, though, that you implement either eager loading or a second server round trip to resolve your many-to-many relations, as well as a good deal more boilerplate code to copy the entity and DTO field values.
Which one to choose depends on your use case: sometimes getting only the main object's fields is enough, in which case you can simply omit the list of related objects from your DTO (transfer only the values you need). Sometimes you actually need the entire list of related entities, and then you can get it via eager loading, or by setting up a second remote object to fetch only the list.
EclipseLink also provides a copyObject() API that lets you pass a copy group specifying exactly which attributes you want. You can then use the copy to avoid carrying along the relationships that you do not want.
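A hedged sketch of what that can look like with the EclipseLink 2.x copy API (attribute names assumed):

import org.eclipse.persistence.jpa.JpaEntityManager;
import org.eclipse.persistence.sessions.CopyGroup;

CopyGroup group = new CopyGroup();
group.addAttribute("id");
group.addAttribute("name"); // relationships not listed are left out of the copy

Customer copy = (Customer) em.unwrap(JpaEntityManager.class).copy(customer, group);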
If you have a detached object, you could just null out the fields that you do not want as well, or use a DTO.
I'm asking this question given my chosen development frameworks of JPA (Hibernate implementation of), Spring, and <insert MVC framework here - Struts 1, Struts 2, Spring MVC, Stripes...>.
I've been thinking a bit about relationships in my entity layer - for example, I have an order entity that has many order lines. I've set up my app so that it eagerly loads the order lines for every order. Do you think this is a lazy way to get around the lazy initialization problems that I would come across if I were to set the fetch type to lazy?
The way I see it, I have the following alternatives when retrieving entities and their associations:
Use the Open Session In View pattern to create the session on each request and commit the transaction before returning the response.
Implement a DTO (Data Transfer Object) layer such that every DAO query I execute returns the correctly initialized DTO for my purposes. I don't really like this option much because in my experience I've found that it creates a lot of boilerplate copying code and becomes messy to maintain.
Don't map any associations in JPA, so that every query I execute returns only the entities I'm interested in - this will probably require me to have DTOs anyway, will be a pain to maintain, and I think defeats the purpose of using an ORM in the first place.
Eagerly fetch all (or most) associations - in the example above, always fetch all order lines when I retrieve an order.
So my question is, when and under what circumstances would you use which of these options? Do you always stick with one way of doing it?
I would ask a colleague but I think that if I even mentioned the term 'Open Session in View' I would be greeted with blank stares :( What I'm really looking for here is some advice from a senior or very experienced developer.
Thanks guys!
Open Session in View has some problems.
For example, if the transaction fails, you might learn of it too late, at commit time, when you are nearly done rendering your page (possibly with the response already committed, so you can't change the page!). If you had known about the error earlier, you would have followed a different flow and ended up rendering a different page.
Another example: reading data on demand can turn into many "N+1 select" problems that kill your performance.
Many projects use the following path:
Maintain transactions at the business layer; at that point, load everything you are supposed to need.
The presentation layer runs the risk of LazyInitializationExceptions: each one is considered a programming error, caught during tests, and corrected by loading more data in the business layer (where you have the opportunity to do it efficiently, avoiding "N+1 select" problems).
To avoid creating extra classes for DTOs, you can load the data into the entity objects themselves. This is the whole point of the POJO approach (used by modern data-access layers, and even integration technologies like Spring).
I've successfully solved all my lazy initialization problems with the Open Session In View pattern (i.e. the Spring implementation). The technologies I used were exactly the same as yours.
Using this pattern allows me to fully map the entity relationships and not worry about fetching child entities in the DAO. Mostly. In 90% of cases the pattern covers the lazy initialization needs of the view. In some cases you'll have to "manually" initialize relationships; in my experience these cases were rare and always involved very complex mappings.
When using the Open EntityManager In View pattern it's important to define the entity relationships, and especially the propagation and transactional settings, correctly. If these are not configured properly, you will get errors about closed sessions whenever an entity is lazily initialized in the view after the session has already been closed.
I would definitely go with option 1. Option 2 might be needed sometimes, but I see absolutely no reason to use option 3. Option 4 is also a no-no: eagerly fetching everything kills the performance of any view that needs to list just a few properties of some parent entities (orders in this case).
N+1 Selects
During development there will be N+1 selects as a result of initializing some relationships in the view, but this is not a reason to discard the pattern. Just fix these problems as they arise, before delivering the code to production. It's as easy to fix them with the OEMIV pattern as with any other: add the proper DAO or service methods, change the controller to call a different finder method, maybe add a view to the database, etc.
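For example, an N+1 caused by iterating order lines in the view can be fixed with a dedicated finder that fetch-joins the collection in a single query (a sketch in JPQL; entity and field names assumed):

public Order findByIdWithLines(Long id) {
    return em.createQuery(
            "select distinct o from Order o join fetch o.orderLines where o.id = :id",
            Order.class)
        .setParameter("id", id)
        .getSingleResult();
}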
I have successfully used the Open-Session-in-View pattern on a project. However, I recently read in "Spring In Practice" of an interesting potential problem with non-repeatable reads if you manage your transactions at a lower layer while keeping the Hibernate session open in the view layer.
We managed most of our transactions in the service layer, but kept the hibernate session open in the view layer. This meant that lazy reads in the view were resulting in separate read transactions.
We managed our transactions in our service layer to minimize transaction duration. For instance, some of our service calls resulted in both a database transaction and a web service call to an external service. We did not want our transaction to be open while waiting for a web service call to respond.
As our system never went into production, I am not sure if there were any real problems with it, but I suspect that there was the potential for the view to attempt to lazily load an object that had been deleted by someone else.
There are some benefits to the DTO approach, though. You have to think beforehand about what information you need. In some cases this will prevent you from generating N+1 select statements. It also helps you see where to use eager fetching and/or optimized views.
I'll also throw my weight behind the Open-Session-in-View pattern, having been in the exact same boat before.
I work with Stripes without Spring, and have created a manual filter before that tends to work well. Coding transaction logic on the back end turns messy really quickly, as you've mentioned. Eagerly fetching everything becomes TERRIBLE as you map more and more objects to each other.
One thing I want to add, which you may not have come across, is Stripersist and Stripernate - Stripersist being the more JPA-flavored - auto-hydration filters that take a lot of the work off your shoulders.
With Stripersist you can say things like /appContextRoot/actions/view/3 and it will auto-hydrate the JPA Entity on the ActionBean with id of 3 before the event is executed.
Stripersist is in the stripes-stuff package on sourceforge. I now use this for all new projects, as it's clean and easily supports multiple datasources if necessary.
Do the Order and Order Lines comprise a high volume of data? Do they take part in online processes where real-time response is required? If so, you might consider not using eager fetching - it makes a huge difference in performance. If the amount of data is small, there is no problem with eager fetching.
As for using DTOs, they might be a viable implementation.
If your business layer is used internally by your own application (i.e. a small web app and its business logic), it's probably best to use your own entities in your view with the Open Session In View pattern, since it's simpler.
If your entities are used by many applications (i.e. a backend application providing a service in your corporation), it'd be interesting to use DTOs, since you would not expose your model to your clients. Exposing it could mean a harder time refactoring your model, because it could mean breaking contracts with your clients. A DTO makes that easier since you have another layer of abstraction. This can be a bit strange, since EJB3 would theoretically eliminate the need for DTOs.