JPA, complex object graphs and Serialisation

JPA, complex object graphs and Serialisation - java

I have a "Project" entity/class that includes a number of "complex" fields, eg referenced as interfaces with many various possible implementations. To give an example: an interface Property, with T virtually of any type (as many types as I have implemented).
I use JPA. For those fields I have had no choice but to actually serialize them to store them. Although I have no need to use those objects in my queries, this is obviously leading to some issues, eg maintenance/updates to start with.
I have two questions:
1) is there a "trick" I could consider to keep my database up to date in case I have a "breaking" change in my serialised class (most of the time serialisation changes are handled well)?
2) will moving to JDO help at all? I very little experience with JDO but my understanding is that with JDO, having serialised objects in the tables will never happen (how are changes handled though?).
In support to 2) I must also add that the object graphs I have can be quite complex, possibly involving 10s of tables just to retrieve a full "Project" for instance.

JDO obviously supports persistence of interface fields (but then DataNucleus JPA also allows their persistence, but as vendor extension). Having some interface field being one of any possible type presents problems to RDBMS rather than to JDO as such. The underlying datastore is more your problem (in not being able to adequately mirror your model), and one of the many many other datastores could help you with that. For example DataNucleus JDO/JPA supports GAE/Datastore, Neo4j, MongoDB, HBase, ODF, Excel, etc and simply persists the "id" of the related object in a "column" (or equivalent) in the owning object representation ... so such breaking changes would be much much less than what you have now

Related

Using same Entity classes in different spring data repositories

I'm trying to put together a project in which I have to persist some entity classes using different spring data repositories (gemfire, jpa, mongodb etc). As the data is more or less the same that needs to go into these repositories, I was wondering if I can use the same entity class for all of them to save me from converting from one object to another?
I got it working for gemfire and jpa but the entity class is already starting to looking a bit wired.
#Id // spring-data-gemfire
#javax.persistence.Id // jpa
#GeneratedValue
private Long id;
So far I can see following options:
Create an interface based separate Entity (domain) classes - Trying to re-use same class looks like a bit of premature optimization.
Externalize xml based mapping for JPA, not sure if gemfire and mongodb mapping can be externalized.
Use different concrete entity classes and use some copy constructor/converter for the conversion.
Been literally hitting my head against the wall to find the best approach - Any response is much appreciated. Thanks

If by weird, you mean your application domain objects/entity classes are starting to accumulate many different, but separate (mapping) annotations (some semantically the same even, e.g. SD Common's o.s.data.annotation.Id and JPA's #javax.persistence.Id) for the different data stores in which those entities will be persisted, then I suppose that is understandable.
The annotation pollution only increases too as the number of representations for your entities increases. For example, think Jackson annotations for JSON mapping or JAXB for XML, etc. Pretty soon, you have more meta-data then actual data, :-)
However, it is more a matter of preference, convenience, simplicity, really.
Some developers are purists and like to externalize everything. Others like to keep information (meta-data) close to the code using it. Even certain patterns have emerged to address these type of concerns... DTOs, Bounded Contexts (see Fowler's BoundedContext, which has a strong correlation to DDD and Microservices).
Personally, I use the following rules when designing and applying architectural principals/decisions in my code, especially when introducing something new:
Simplicity
Consistency
DRY
Test
Refactor
(along with a few others as well... good OOD, SoC, SOLID, Design Patterns, etc).
In that order too. If something starts getting too complex, refactor and simplify it. Be consistent in what you do by following/using patterns, conventions; familiarity is 1 key to consistency. But, don't keep repeating yourself either.
At the end of the day, it is really about maintaining the application. Will someone else who picks up where you left off be able to understand the organization and logic quickly, and be able to maintain it... simplicity is king. It does not mean it is so simple it is not viable or valuable. Even complex things can be simple if organized properly. However, breaking things apart and introducing abstractions can have hidden costs (see closing thoughts).
To more concretely answer (a few of) your questions...
I am not certain about MongoDB, but (Spring Data) GemFire does not have an external mapping. Minimally, #Region (on the entity class) and #Id are required, along with #PersistenceConstructor if your entity class has more than 1 constructor. For example.
This sounds sneakingly like to DTOs. Personally, I think BoundContexts are a better, more natural model of the application's data since the domain model should not be unduly tied to any persistent store or external representation (e.g. JSON, XML, etc). The application domain model is the 1 true state of the application and it should model the concept that is represents in a natural way, not superficially to satisfy some representation or persistent store (hence the mapping/conversion).
Anyway, try not to beat yourself up too much. It is all about managing complexity. Try to let yourself just do and use testing and other feedback loops to find an answer that is right for your application. You'll know.
Hope this helps.

JPA performance optimization or alternatives

We are currently in a project with a high demand on performance when it comes to reads from the database.
We are currently using JPA (EclipseLink implementation), currently just because it provides convenient database access and column mapping.
For our queries we are using highly specific SQL queries. We are also using one database (SAP HANA, in-memory), so a language abstraction is not required. The database access is pretty fast, our current bottleneck really is the application server, especially the persistence layer.
The result sets often also do not contain entities because entities are made up of the context. For us, there is no point in using an #Id field like the following, because we don't have fields that are unique (only combinations, but defining an IdClass is too much overhead).
#Entity
public class Item {
#Id
public myField;
// other fields...
}
This seems to be enforced by JPA if I want to run a typed native query. Is that assumption true? Currently we haven't found a way around the ID mapping.
Are these findings valid?
If not, how can we make our use of JPA more performant (there is significant latency compared to plain JDBC), also without defining an #Id (because it is useless in our case) for result types?
If yes, is there another Java library that just provides a minimum layer on top of JDBC without too much latency that provides a more convenient use than plain JDBC (with column mapping and all that good stuff).
Thanks!
Usecase: We would like to stream historic GPS sensor data from the database. Besides just transforming this to JSON, we also do some transformations/validations. That's why we actually need to build objects. So what we basically looking for is a convenient way of mapping the fields of select statements to attributes. I hope that makes sense.

There are many articles and blogs about improving EclipseLink/JPA performance that you might look into, such as EclipseLink Performance, JPA Performance Tuning and Optimizing the EclipseLink Application
In the end though it all depends very much on your specific use case and any future use cases you may want. JPA is designed to make reading and writing overtop of JDBC easier and more maintainable and adds many performance benefits such as caching. If all you are using it for is to read raw data though, the extra layer might be extra overhead that isn't adding any value. There isn't much point to having JPA build you entities from the resultsets, maintain the cache and watch for changes only for your application to ignore it all and grab the raw data.
I do not understand why you would have an Item table with a single myField. How is it used by the application and how does it relate to other tables and potential entities?
Such a construct is not the normal use case for relational databases and ORMs, but there are still ways around it in JPA. The data could be used in element collections by other entities, or even just not mapped, and native SQL queries used which are passed straight through the JDBC layer. EclipseLink itself has many mapping types and options above and beyond JPA that might be used depending on your use cases.

Java object references in a database [duplicate]

I don't know if this is the right title for this question.
Anyway, recently I have heard about that you could make life easier when creating database. By in which you use object based database. It will make migration to other type of database also easier e.g. from MySQL to SQLlite or something else.
Anyway the main way I do a webpage with database access now is that I manually write down the Query to fetch what I need from a database. However it can be done in some other way also which does not involve I have to write query. I want to know how this other method work. How to search it in Google.

Object DB
High performance
Faster as no joins required
Inherent versioning mechanism
Navigational interface for operations (like graph traversal)
Object Query Language retrieve objects declaratively
complex data types
object identity ie. equals() in which object identity is independent of value and updates
facilitates object sharing
classes and hierarchies (inheritance and encapsulation)
support for relationships
integrated with a persistence language like ODL
support for atomicity
support for nested relationships
semantic modelling
Cons
No mathematical foundation as RDB (refer Codd)
cons of object orientation
persistence difficult for complex structures, some data must be transient
Object-Relational databases (You might have seen UDTs!)
support for complex data types like collection, multisets etc
object oriented data modelling
extended SQL and rich types
support for UDT inhertance
powerful query language
Different approaches (OO, Relational DB or OODB) may be necessary for different applications
References
OODMS manifesto
ODMG
The Object-Oriented Database System Manifesto
Object Oriented Database Systems
Object Relational Databases in DBMS
Completeness Criteria for Object-Relational Database Systems
Comparisons
http://en.wikipedia.org/wiki/Comparison_of_object_database_management_systems
http://en.wikipedia.org/wiki/Comparison_of_object-relational_database_management_systems

It sounds like you are talking about JPA. You simply annotate your objects, and the database is setup according to the objects for you. The most used JPA implementation is Hibernate, and is very quick way of writing database enabled Java applications.
If you want more control over the database structure, you can do that via the annotations.
For more information on hibernate, check out http://www.hibernate.org/.

If you are using an object oriented database, you are not using a relational database like MySQL or SQLite.
Instead, the database directly stores your application objects, and you usually can query these with some query language or API.
I have only experience with db4o, there you simply do
database.store(object);
and your object is stored.

Flex Blaze DS and JPA - lazy-loading issues

I am developing an application in Flex, using Blaze DS to communicate with a Java back-end, which provides persistence via JPA (Eclipse Link).
I am encountering issues when passing JPA entities to Flex via Blaze DS. Blaze DS uses reflection to convert the JPA entity into an ObjectProxy (effectively a HashMap) by calling all getter methods on the entity. This includes any lazy-initialised one/many-to-many relationships.
You can probably see where I am going. If I pass a single object through JPA this will call all one/many-to-many methods on this object. For each returned object if they have one/many-to-many relationships they will be called too. As such, by passing back a single JPA entity I actually end up doing multiple database calls and passing all related entries back as a single ObjectProxy instance!
My solution to date is to create a translator to convert each entity to an ObjectProxy and vice-versa. This is clearly cumbersome and there must be a better way.
Thoughts please?

As an alternative, you could consider using GraniteDS instead of BlazeDS: GraniteDS has a much more powerful data management stack than BlazeDS (it competes more with LCDS) and fully support lazy-loading for all major JPA engines: Hibernate, EclipseLink, OpenJPA, etc.
Moreover, GraniteDS has a great client-side transparent lazy loading feature and even a so-called reverse lazy-loading mechanism.
And you don't need any kind of intermediate DTOs: it serializes JPA entities as is and uses code-generated ActionScript beans on the client-side to keep their initialization states.

Unfortunately, lazy-loading is not easy to accomplish with Flash clients. There are some working solutions, like dpHibernate, but so far all the different solutions I have tested fall short of what you would expect in terms of performance and ease of use.
So in my experience, it is the best and most reliable solution to always use DTOs, which adds the benefit of cleanly separating the database and view layers. This necessitates, though, that you implement either eager loading, or a second server round trip to resolve your many-to-many relations, as well as a good deal more boilerplate code to copy the DAO and DTO field values.
Which one to choose depends on your use case: Sometimes getting only the main object's fields might be enough, then you could simply omit the List of related objects from your DTO (transfer only those values you need for your query). Sometimes you may actually need the entire list of related entities, and then you could get it via eager loading, or by setting up a second remote object to find only the list.

EclipseLink also provides a copyObject() API that allows you to give a copy group of exactly what attribute you want. You could then use this copy to avoid having the relationships that you do not want.
If you have a detached object, you could just null out the fields that you do not want as well, or use a DTO.

Disadvantages of Object Relational Mapping

I am a fan of ORM - Object Relational Mapping and I have been using it with Rails for the past year and a half. Prior that, I use to write raw queries using JDBC and make Database do the heavy lifting via Stored Procedures. With ORM, I was initially happy to do stuff like coach.manager and manager.coaches which were very simple and easy to read.
But as time went by there were in-numerous associations creeping up and I ended up doing a.b.c.d which were firing queries in all directions, behind the scenes. With rails and ruby, the garbage collector went nuts and took insane time to load a very complex page which involves relatively lesser data. I had to replace this ORM style code by a simple Stored procedure and the result I saw was enormous. A page that took 50 seconds to load now takes only 2 seconds.
With this huge difference, should I continue using ORM? It is very clear it has severe overheads compared to a raw query.
In general, what are the general pitfalls of using an ORM framework like Hibernate, ActiveRecord?

An ORM is only a tool. If you don't use it correctly, you'll have bad results.
Nothing stops you from using dedicated HQL/criteria queries, with fetch joins or projections, to return the information that your page must display in as few queries as possible. This will take more or less the same time as dedicated SQL queries.
But of course, if you just get everything by ID and navigate through your objects without realizing how many queries it generates, it will lead to long loading times. The key is to know exactly what the ORM does behind the scene, and decide if it's appropriate or if another strategy must be adopted.

I think you've already identified the major tradeoff associated with ORM software. Every time you add a new layer of abstraction that tries to provide a generalized implementation of something that you used to do by hand there is going to be some loss of performance/efficiency.
As you noted, traversing multiple relationships such as a.b.c.d can be inefficient, because most ORM software will be doing an independent database query for each . along the way. But I'm not sure that means you should eliminate ORM altogether. Most ORM solutions (or at least, certainly Hibernate) allow you to specify custom queries where you can bring back exactly what you want in a single database operation. This should be about as fast as your dedicated SQL.
Really the issue is about understanding how the ORM layer is working behind the scenes, and realizing that while something like a.b.c.d is simple to write, what it causes the ORM layer to do as it is evaluated is not. As a general rule I always go with the simplest possible approach to begin, and then write optimized queries in areas where it makes sense/where it is obvious that the simple approach will not scale.

I'd say, one should use the appropriate tool for different tasks.
E.g., for CRUD operations, ORM frameworks like Hibernate can speed up development and it will perform well enough. Sometimes you need to do some necessary tweaks to achieve acceptable performance. I'm not sure, your task (what took 50 sec with Hibernate) could not be done properly with Hibernate, because you did not provide us with the details.
On the other hand, for example bulk operations involving hundreds of thousands of records is not the type of task you'd expect Hibernate will do without significant performance penalty.

As it was mentioned already, ORM is only a tool and you can use it eiter good or bad.
One of the most typical performance problems in ORMs is 1+N queries problem. It is caused by loading additional objects for each of objects from the list. This is caused by eager fetch of 1-to-n-relation entities for each element on list, the dealing is using HQL queries, specifying fields in projection or marking fetching 1-to-n relations to lazy.
Any time, you must exactly know what the ORM is doing in order to achieve good performance. Not understanding what operations are done in background is a way to disaster (slow, buggy and hard to analyze code because of unnecessary and wrongly written work-arounds).

I'm with Petar from your comments regarding the lazy fetching. Say you have an html table filled fields from object a.b.c.d. You could find your framework round-tripping the database thousands of times(possibly many more) . The disadvantage of ORM in this case is you have to read the documentation thoroughly. Most frameworks support disabling lazy fetching and many even support adding your own processing logic to bind the data set.
The net out is that almost any ORM is almost undoubtedly better than anything you are going to write yourself. You will find yourself saddled with maintaining huge libraries of boilerplate or worse writing the same code over and over again.

We are currently investigating to switch from our own data store layer with clean separation of transfer objects and data access objects to JPA. We used a generator to create the TOs, the DAOs and the SQL DDL as well from some documentation in docbook format. By this all of our stuff from documentation, the database structure and the generated Java classes where always in sync with a good documentation of the database itself.
What we discovered so far by using JPA:
Foreign key references cannot be used for imports, some special
queries and so on because they must not be placed in a managed
entity. JPA only allows the target class there.
Access to some user session scope is difficult upto impossible. We
still have no clue how to get the users id into the column
'userWhoLastMadeAnUpdate' in some PrePersist method.
Something expected to be quite easy with an ORM, namely "class
mapping" does not work at all. We are using HalDateTime
(http://sourceforge.net/projects/haldatetime/) internally.
Especially in the client. Mapping it with JPA directly is not
possible although HalDateTime supports it. Due to JPA restrictions
we have to use two fields in the entity.
JPA uses either one XML file to describe the mapping. So you have to
look at least into two files to even understand the relationship
between the Java class and the database. And the XML file becomes
huge for large applications.
Alternatively ORMs provide annotations in the Java class itself. So
its easier to learn and understand the relationship. But it forces
you to see all that database stuff in the client layer (which
completely breaks a proper layering).
You will have to restrict yourself to stay as close to a clean
database structure as anyhow possible. Otherwise you will for sure
end up with a mess of queries and statements by the ORM.
Use an ORM which provides a query language which is close to SQL
itself (JPA seems quite acceptable here). An ORM induced language
makes supporting a large application really expensive.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.