Serialization vs DB Storage - Java

I am currently studying Java Serialization. However, I am still confused about its practicality. Generally speaking, when are we supposed to use serialization, as opposed to storing data directly in database columns?

I think you are confusing Serialization with reading and writing an Object to a database.
As explained in the SO answer "Do Hibernate table classes need to be Serializable?", JPA entities (which Hibernate entities are) should implement Serializable so that detached entities can be sent to other layers of your application, possibly via RMI.
This has nothing to do with how Hibernate reads and writes data to a database. As long as you don't use detached objects, you can get away with not having your entities implement Serializable, and Hibernate will still work just fine.
Hibernate reads and writes to a database via JDBC, just as you would if you were writing the SQL queries yourself. If you want to learn more about how Hibernate converts your object fields to JDBC calls, you can start by looking at Hibernate's UserType. Hibernate comes with a lot of built-in UserTypes that can convert ResultSet columns of types such as String, Date and int to and from the database. If you need to write your own UserType, which happens on occasion, you just have to provide your own implementation, which is pretty simple.
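For illustration, a custom UserType might look roughly like the sketch below. It targets the Hibernate 5.x UserType interface (the method signatures differ across versions, and Hibernate 6 reworked the interface entirely) and maps a java.time.YearMonth field to a single VARCHAR column; the class name and mapping are invented for the example.

import java.io.Serializable;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;
import java.time.YearMonth;
import java.util.Objects;

import org.hibernate.HibernateException;
import org.hibernate.engine.spi.SharedSessionContractImplementor;
import org.hibernate.usertype.UserType;

// Stores a YearMonth as a "yyyy-MM" string in a single VARCHAR column.
public class YearMonthStringType implements UserType {

    @Override
    public int[] sqlTypes() {
        return new int[] { Types.VARCHAR };
    }

    @Override
    public Class<?> returnedClass() {
        return YearMonth.class;
    }

    @Override
    public boolean equals(Object x, Object y) throws HibernateException {
        return Objects.equals(x, y);
    }

    @Override
    public int hashCode(Object x) throws HibernateException {
        return Objects.hashCode(x);
    }

    @Override
    public Object nullSafeGet(ResultSet rs, String[] names,
            SharedSessionContractImplementor session, Object owner)
            throws HibernateException, SQLException {
        String value = rs.getString(names[0]);
        return value == null ? null : YearMonth.parse(value);
    }

    @Override
    public void nullSafeSet(PreparedStatement st, Object value, int index,
            SharedSessionContractImplementor session)
            throws HibernateException, SQLException {
        if (value == null) {
            st.setNull(index, Types.VARCHAR);
        } else {
            st.setString(index, value.toString()); // YearMonth.toString() yields "yyyy-MM"
        }
    }

    @Override
    public Object deepCopy(Object value) throws HibernateException {
        return value; // YearMonth is immutable
    }

    @Override
    public boolean isMutable() {
        return false;
    }

    @Override
    public Serializable disassemble(Object value) throws HibernateException {
        return (Serializable) value;
    }

    @Override
    public Object assemble(Serializable cached, Object owner) throws HibernateException {
        return cached;
    }

    @Override
    public Object replace(Object original, Object target, Object owner) throws HibernateException {
        return original;
    }
}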

Hibernate-enhanced classes can actually be Serializable. However, you should think about all the consequences before you use them that way. You may run into these problems:
- if your class has collections of related DB entities, extra queries will be made to load them
- if you have a bidirectional relation between classes, you can run into a stack overflow
- to prevent this behavior, you will need to control the serialization somehow (e.g. using some of the @Json* annotations if you serialize to JSON)
- if your class only contains IDs, you're fine (but you are losing a lot of Hibernate's goodness)
To understand those problems, you need to know how Hibernate actually works.
The enhancement allows an entity to be partially loaded. E.g., you want to get all books for a given user. You get a collection of books, but what you really have is a collection of Hibernate-enhanced objects that, at this point, only wrap IDs (provided you use the default lazy loading).
Unless you really need something other than the IDs, the data is never loaded. Upon a getter call, background queries are made to fetch the extra information if it is needed.
Now, imagine a user has a collection of all his books as a field. When the user is lazily loaded from the DB, that collection may not be populated at all. However, if you serialize the user, all getters are called, so you get all the books and, transitively, every book's author, should the books have a relation to their authors ...
If the relation is bidirectional in the classes, you can even create a cycle that causes a StackOverflowError, where you look up the book's owner and then fetch all of his books again.
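If such an entity does get serialized to JSON, one common way to break the cycle is Jackson's @JsonManagedReference/@JsonBackReference pair (or simply @JsonIgnore on one side). A minimal sketch with made-up User/Book classes:

import java.util.List;

import com.fasterxml.jackson.annotation.JsonBackReference;
import com.fasterxml.jackson.annotation.JsonManagedReference;

public class User {
    private Long id;

    // Forward side of the relation: serialized normally.
    @JsonManagedReference
    private List<Book> books;

    // getters/setters omitted
}

class Book {
    private Long id;

    // Back side of the relation: skipped during serialization,
    // so User -> books -> owner -> books ... cannot recurse forever.
    @JsonBackReference
    private User owner;

    // getters/setters omitted
}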

Related

Spring Data JPA method query findWith

Lately, I have come across this Spring Data JPA repository method findWithBooksById.
The two classes involved are very basic: Library one-to-many Books, and the method is querying for a library and its books.
I looked at https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#repositories.query-methods.details , but there is no reference to this method pattern (findWith...).
Looking at the generated queries, it queries the library table and then queries books immediately after. So two queries are issued consecutively, as if I had called getBooks right after a findById query (lazily initialized books in this case).
Does anyone know how findWith... works in Spring Data JPA?
Does anyone know how "findWith..." works in Spring Data JPA?
It doesn't.
The pattern used is that of find...By....
The second select is probably standard behavior of the JPA implementation used.
It might be that Books get eagerly loaded but can't get loaded in the initial query or that something accesses them and thereby triggers lazy loading.
It's impossible to tell without knowing the JPA implementation and the model classes involved.
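For what it's worth: in Spring Data's derived queries, the text between the introducing keyword (find, read, get, ...) and By is treated as descriptive and ignored, so findWithBooksById resolves to a plain "find by id". If the books are meant to be fetched in the same go, that usually has to be made explicit, for example with @EntityGraph. A hedged sketch with invented Library/Book entities:

import java.util.List;
import java.util.Optional;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

import org.springframework.data.jpa.repository.EntityGraph;
import org.springframework.data.jpa.repository.JpaRepository;

@Entity
class Library {
    @Id
    @GeneratedValue
    Long id;

    // To-many associations are lazy by default.
    @OneToMany(mappedBy = "library")
    List<Book> books;
}

@Entity
class Book {
    @Id
    @GeneratedValue
    Long id;

    @ManyToOne
    Library library;
}

interface LibraryRepository extends JpaRepository<Library, Long> {

    // "WithBooks" between the introducing keyword and "By" is only descriptive text,
    // so the derived query is still a plain "find by id".
    // The @EntityGraph is what actually asks the provider to load the books
    // together with the library rather than lazily.
    @EntityGraph(attributePaths = "books")
    Optional<Library> findWithBooksById(Long id);
}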

JPA performance optimization or alternatives

We are currently in a project with a high demand on performance when it comes to reads from the database.
We are currently using JPA (the EclipseLink implementation), mainly because it provides convenient database access and column mapping.
For our queries we are using highly specific SQL queries. We are also using one database (SAP HANA, in-memory), so a language abstraction is not required. The database access is pretty fast, our current bottleneck really is the application server, especially the persistence layer.
The result sets often do not correspond to entities either, because what counts as an entity depends on the query context. For us, there is no point in using an @Id field like the following, because we don't have fields that are unique on their own (only combinations of fields are, and defining an IdClass is too much overhead).
@Entity
public class Item {
    @Id
    public String myField; // the field type was missing in the original snippet; String is just a placeholder
    // other fields...
}
This seems to be enforced by JPA if I want to run a typed native query. Is that assumption true? Currently we haven't found a way around the ID mapping.
Are these findings valid?
If not, how can we make our use of JPA more performant (there is significant latency compared to plain JDBC), also without defining an @Id (because it is useless in our case) for result types?
If yes, is there another Java library that provides just a minimal layer on top of JDBC, without much latency, and is more convenient to use than plain JDBC (with column mapping and all that good stuff)?
Thanks!
Use case: We would like to stream historic GPS sensor data from the database. Besides just transforming this data to JSON, we also do some transformations/validations. That's why we actually need to build objects. So what we are basically looking for is a convenient way to map the fields of a select statement to object attributes. I hope that makes sense.
There are many articles and blogs about improving EclipseLink/JPA performance that you might look into, such as EclipseLink Performance, JPA Performance Tuning and Optimizing the EclipseLink Application.
In the end, though, it all depends very much on your specific use case and any future use cases you may have. JPA is designed to make reading and writing on top of JDBC easier and more maintainable, and it adds performance benefits such as caching. If all you are using it for is to read raw data, though, the extra layer might be overhead that isn't adding any value. There isn't much point in having JPA build entities from the result sets, maintain the cache and watch for changes, only for your application to ignore it all and grab the raw data.
I do not understand why you would have an Item table with a single myField. How is it used by the application and how does it relate to other tables and potential entities?
Such a construct is not the normal use case for relational databases and ORMs, but there are still ways around it in JPA. The data could be used in element collections by other entities, or simply not be mapped at all, with native SQL queries passed straight through to the JDBC layer. EclipseLink itself has many mapping types and options above and beyond JPA that might be used depending on your use cases.
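One concrete option along those lines, sketched here with invented names (GpsReading, gps_data, device_id/lat/lon): standard JPA can map a native query onto a plain, non-entity result class via @SqlResultSetMapping and @ConstructorResult, which sidesteps the @Id requirement entirely.

import java.util.List;

import javax.persistence.ColumnResult;
import javax.persistence.ConstructorResult;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.SqlResultSetMapping;

// Plain result holder: not an entity, so it needs no @Id.
class GpsReading {
    final String deviceId;
    final Double latitude;
    final Double longitude;

    GpsReading(String deviceId, Double latitude, Double longitude) {
        this.deviceId = deviceId;
        this.latitude = latitude;
        this.longitude = longitude;
    }
}

// The mapping must be declared on an entity class so the provider picks it up;
// in a real project you would put it on one of your existing entities.
@Entity
@SqlResultSetMapping(
        name = "GpsReadingMapping",
        classes = @ConstructorResult(
                targetClass = GpsReading.class,
                columns = {
                        @ColumnResult(name = "device_id", type = String.class),
                        @ColumnResult(name = "lat", type = Double.class),
                        @ColumnResult(name = "lon", type = Double.class) }))
class AnyExistingEntity {
    @Id
    Long id;
}

class GpsReadingDao {
    @SuppressWarnings("unchecked")
    List<GpsReading> loadAll(EntityManager em) {
        // The second argument names the result-set mapping declared above;
        // the rows come back as GpsReading instances, with no entity state and no cache involved.
        return em.createNativeQuery(
                "SELECT device_id, lat, lon FROM gps_data", "GpsReadingMapping")
                .getResultList();
    }
}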

JPA - What exactly does it mean for an entity object to persist? What is the definition of persistence?

I'm fairly new to java web applications and I am undertaking the task of learning JPA. However, it is not explicitly clear what it means for an entity object to persist. I think I have an idea, but I would rather not assume its meaning.
I am referencing the Oracle JPA Doc, but they continue to use the words like "persist" or "persistence" when describing persistent fields/properties. Can someone shed some light on this idea of persistence? And maybe define what it means for an instance of an entity to be persistent?
And if you could not use the word "persistent" (or any form of the word) in your definition that would be much appreciated. A simple answer would be great, but more in-depth explanations are definitely welcome! Thanks so much!
Persistence simply means to store permanently.
In Java we work with objects and try to store an object's values in a database (an RDBMS, mostly).
JPA provides an implementation of object-relational mapping (ORM), so that we can store an object directly in the database as a new tuple (row).
In JPA, an object is mapped as an entity so it can be mapped to a table in the database.
So persisting an entity means permanently storing the object (entity) in the database.
Hope this helps!
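A minimal sketch of what persisting looks like in code, assuming a plain Java SE setup with a made-up persistence unit named "example-pu" and a trivial Book entity:

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Persistence;

@Entity
class Book {
    @Id
    @GeneratedValue
    Long id;

    String title;
}

class PersistExample {
    public static void main(String[] args) {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("example-pu");
        EntityManager em = emf.createEntityManager();

        Book book = new Book();
        book.title = "JPA basics";

        // Up to this point the Book exists only in memory; if the JVM stops, it is gone.
        em.getTransaction().begin();
        em.persist(book);             // the entity becomes managed ...
        em.getTransaction().commit(); // ... and is written to the database, i.e. to durable storage

        em.close();
        emf.close();
    }
}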
"Persist" means "lives on after the application is shut down". The object is not just in volatile memory; it's in more permanent storage on disk. If the application is shut down, or the user ends their session and begins a new one, the old data is still available from permanent storage on disk.
Databases store information on disks, unless they are in-memory versions that give you the advantage of using SQL but little else. If you use a relational SQL database, you get a query language that makes it easy to Create/Read/Update/Delete information without having to worry about how it's stored on the disk.
SQL databases store relations on disk using various data structures (e.g. B-trees). Relations are defined in terms of tables and columns, and each record in a table is a tuple of row values. To work with this data as objects, tables and columns have to be mapped to classes and attributes via object-relational mapping. JPA generalizes this idea and builds it into Java EE, following the example of implementations like TopLink and Hibernate.
NoSQL databases, like MongoDB, also store information on disk as documents rather than relations.
Object databases serialize an object and all its children using formats like Java serialization, XML, JSON, or custom formats (e.g. Google protocol buffers).
Graph databases, like Neo4J, can be thought of as more general cases of object databases.

JPA, complex object graphs and Serialisation

I have a "Project" entity/class that includes a number of "complex" fields, eg referenced as interfaces with many various possible implementations. To give an example: an interface Property, with T virtually of any type (as many types as I have implemented).
I use JPA. For those fields I have had no choice but to actually serialize them to store them. Although I have no need to use those objects in my queries, this is obviously leading to some issues, eg maintenance/updates to start with.
I have two questions:
1) is there a "trick" I could consider to keep my database up to date in case I have a "breaking" change in my serialised class (most of the time serialisation changes are handled well)?
2) will moving to JDO help at all? I have very little experience with JDO, but my understanding is that with JDO, having serialized objects in the tables will never happen (how are changes handled, though?).
In support to 2) I must also add that the object graphs I have can be quite complex, possibly involving 10s of tables just to retrieve a full "Project" for instance.
JDO obviously supports persistence of interface fields (DataNucleus JPA also allows their persistence, but as a vendor extension). Having an interface field that can be of any possible type presents problems to an RDBMS rather than to JDO as such. The underlying datastore is more your problem (in not being able to adequately mirror your model), and one of the many other datastores could help you with that. For example, DataNucleus JDO/JPA supports GAE/Datastore, Neo4j, MongoDB, HBase, ODF, Excel, etc., and simply persists the "id" of the related object in a "column" (or equivalent) of the owning object's representation, so such breaking changes would be much less of a problem than what you have now.
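For context, the serialize-into-a-column approach the question describes typically looks something like the following in JPA (a hedged sketch with invented names). The whole object graph behind the field ends up as one opaque blob, which is exactly why breaking changes to the serialized classes become painful:

import java.io.Serializable;

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Lob;

interface Property<T> extends Serializable {
    T getValue();
}

@Entity
class Project {
    @Id
    Long id;

    // Stored as a serialized byte stream in a BLOB column, since Property
    // has no table mapping of its own. Any incompatible change to a Property
    // implementation can break deserialization of existing rows.
    @Lob
    Property<?> mainProperty;
}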

Flex Blaze DS and JPA - lazy-loading issues

I am developing an application in Flex, using Blaze DS to communicate with a Java back-end, which provides persistence via JPA (Eclipse Link).
I am encountering issues when passing JPA entities to Flex via Blaze DS. Blaze DS uses reflection to convert the JPA entity into an ObjectProxy (effectively a HashMap) by calling all getter methods on the entity. This includes any lazy-initialised one/many-to-many relationships.
You can probably see where I am going. If I pass back a single entity, this will call all one-to-many/many-to-many getters on that object. For each returned object, if it has one-to-many/many-to-many relationships, those will be called too. As such, by passing back a single JPA entity I actually end up making multiple database calls and passing all related entries back inside a single ObjectProxy instance!
My solution to date is to create a translator to convert each entity to an ObjectProxy and vice-versa. This is clearly cumbersome and there must be a better way.
Thoughts please?
As an alternative, you could consider using GraniteDS instead of BlazeDS: GraniteDS has a much more powerful data management stack than BlazeDS (it competes more with LCDS) and fully supports lazy-loading for all major JPA engines: Hibernate, EclipseLink, OpenJPA, etc.
Moreover, GraniteDS has a great client-side transparent lazy loading feature and even a so-called reverse lazy-loading mechanism.
And you don't need any kind of intermediate DTOs: it serializes JPA entities as is and uses code-generated ActionScript beans on the client-side to keep their initialization states.
Unfortunately, lazy-loading is not easy to accomplish with Flash clients. There are some working solutions, like dpHibernate, but so far all the different solutions I have tested fall short of what you would expect in terms of performance and ease of use.
So in my experience, it is the best and most reliable solution to always use DTOs, which adds the benefit of cleanly separating the database and view layers. This necessitates, though, that you implement either eager loading, or a second server round trip to resolve your many-to-many relations, as well as a good deal more boilerplate code to copy the DAO and DTO field values.
Which one to choose depends on your use case: sometimes getting only the main object's fields might be enough; in that case you could simply omit the List of related objects from your DTO (transfer only those values you need for your query). Sometimes you may actually need the entire list of related entities, and then you could get it via eager loading, or by setting up a second remote object to find only the list.
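A minimal sketch of the DTO approach, with invented Author/Book classes: only the scalar fields are copied, so Blaze DS serialization never touches the lazy collection.

import java.util.List;

// JPA entity (simplified): the books collection is lazy by default.
class Author {
    Long id;
    String name;
    List<Book> books; // @OneToMany(mappedBy = "author") in the real mapping
}

class Book {
    Long id;
    String title;
    Author author;
}

// DTO sent over Blaze DS: only the fields the client actually needs.
class AuthorDTO {
    Long id;
    String name;

    static AuthorDTO from(Author author) {
        AuthorDTO dto = new AuthorDTO();
        dto.id = author.id;
        dto.name = author.name;
        // books is deliberately not copied, so no lazy loading is triggered
        // and no huge object graph is serialized to the Flex client.
        return dto;
    }
}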
EclipseLink also provides a copyObject() API that lets you specify a copy group with exactly the attributes you want. You could then use this copy to avoid carrying along the relationships that you do not want.
If you have a detached object, you could just null out the fields that you do not want as well, or use a DTO.
