Correctly modeling historical records in a database - Java

In my application I have a set of objects that stay alive during the whole application lifecycle, and I need to create a historical database of them.
These objects are instances of a hierarchy of Java / Scala classes annotated with Hibernate annotations, which I use in my application to load them at startup. Luckily, all the classes already contain a timestamp, which means I do not need to change the object model to be able to create historical records.
What is the most suitable approach:
Use Hibernate without annotations, providing external XML mappings that are identical to the annotated ones except for the primary key (which becomes a composite key consisting of the previous primary key plus the timestamp).
Use different classes for the historical records, still with Hibernate (this sounds very complicated: I have a hierarchy of classes rather than a single class, and I would have to subclass my HistoricalRecordClass for every type of record, since I want to be able to rebuild the objects).
Use a completely different approach (please note I do not like ORMs; using one is just a matter of convenience).
Some considerations:
The goal of storing historical records is that the user, through a single GUI, can access either the real-time values of certain data or the historical values, just by specifying a date.

How do you intend to use the historical records? The easiest solution would be to serialize them as JSON and log them to a file.
I've never combined Hibernate XML mappings with Hibernate annotations, but if it works, that sounds more attractive than carrying two parallel object models.
If you need to be able to recreate the application state at any point in time, then you're more or less stuck with writing the records to a database (because of the fast random access). You could cheat and have a single "history" table with a composite key of id + timestamp + type, plus a "json" column where you just marshal the object down and save it whole. That would help by a) carrying one history table instead of a bunch of clone tables, and b) giving you some flexibility if the schema changes (i.e. leveraging the open-schema nature of JSON).
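For illustration, here is a minimal sketch of that single history table as a JPA/Hibernate entity. The entity, table, and field names are my own invention, not taken from the question:

import java.io.Serializable;
import java.util.Date;
import java.util.Objects;
import javax.persistence.*;

@Entity
@Table(name = "history")
@IdClass(HistoryRecord.Key.class)
public class HistoryRecord {

    @Id private Long recordId;                  // the live object's primary key
    @Id @Temporal(TemporalType.TIMESTAMP)
    private Date timestamp;                     // when this version was captured
    @Id private String type;                    // which class in the hierarchy

    @Lob private String json;                   // the whole object, marshalled down

    // Composite key class required by @IdClass.
    public static class Key implements Serializable {
        Long recordId;
        Date timestamp;
        String type;

        @Override public boolean equals(Object o) {
            return o instanceof Key k
                && Objects.equals(recordId, k.recordId)
                && Objects.equals(timestamp, k.timestamp)
                && Objects.equals(type, k.type);
        }
        @Override public int hashCode() {
            return Objects.hash(recordId, timestamp, type);
        }
    }
    // constructors and getters/setters omitted for brevity
}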
But since it's archive data with a different usage pattern (you're just reading and writing the records whole), I'd think about some other means of storing it than the same strict schema as the live data.
It's a nice application of the "write once" paradigm... do you have Hadoop available? ;)

Related

Pragmatic Programmer: avoid data source duplication by using map?

In The Pragmatic Programmer, in the chapter "Data source duplication", the authors state:
Many Data sources allow you to introspect on their data schema. This can be used to remove much of the duplication between them and your code. Rather than manually creating the code to contain this stored data, you can generate the containers directly from the schema. Many persistence frameworks will do this heavy lifting for you.
So far so good. We can achieve this easily by connecting our IDE to the DB and letting it create our entities for us.
Then it continues:
There’s another option, and one we often prefer. Rather than writing code that represents external data in a fixed structure (an instance of a struct or class, for example), just stick it into a key/value data structure (your language might call it a map, hash, dictionary, or even object). On its own this is risky… we recommend adding a second layer to this solution: a simple table-driven validation suite that verifies that the map you’ve created contains at least the data you need. Your API documentation tool might be able to generate this.
The idea, if I got it right, is to avoid having an entity to represent the table in the DB (and so avoid duplication of knowledge) and instead use a map, so that if we add a new column to the schema we don’t need to also update our representation of that schema (i.e. the entity) in our application.
Then comes the part that is not clear to me: the authors talk about an autogenerated “table-driven validation suite that verifies that the map you’ve created contains at least the data you need”.
Does anyone know what an implementation of this concept would look like?
The closest thing I could find on Google about this topic is this question on StackOverflow, but the answers skipped the second part.
I think it really depends on the language you’re using and on the data you need to read. For Java, if you’re mapping the raw data to a Map, what you could do is use validators (e.g. Hibernate Validator or Spring validators) to define your own custom annotations and enforce that the schema’s constraints are respected when creating the in-memory representation (e.g. if you’re reading a user table with id as primary key, the map must then contain the id key with a valid value).
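A minimal sketch of what such a table-driven validation suite could look like in plain Java, without any framework; the rules "table" and the field names are made up for illustration:

import java.util.List;
import java.util.Map;

public class MapValidator {

    // One row of the validation "table": key name, expected type, required or not.
    record Rule(String key, Class<?> type, boolean required) {}

    // Declarative schema for a hypothetical user table; this list could just as
    // well be generated from database introspection or API documentation.
    static final List<Rule> USER_RULES = List.of(
            new Rule("id", Long.class, true),
            new Rule("email", String.class, true),
            new Rule("nickname", String.class, false));

    static void validate(Map<String, Object> row, List<Rule> rules) {
        for (Rule r : rules) {
            Object value = row.get(r.key());
            if (value == null) {
                if (r.required())
                    throw new IllegalArgumentException("missing required key: " + r.key());
                continue;
            }
            if (!r.type().isInstance(value))
                throw new IllegalArgumentException(
                        r.key() + " should be a " + r.type().getSimpleName());
        }
    }

    public static void main(String[] args) {
        validate(Map.of("id", 42L, "email", "a@b.c"), USER_RULES); // passes
        validate(Map.of("email", "a@b.c"), USER_RULES);            // throws: no id
    }
}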

JPA - What exactly does it mean for an entity object to persist? What is the definition of persistence?

I'm fairly new to Java web applications and I am undertaking the task of learning JPA. However, it is not explicitly clear what it means for an entity object to persist. I think I have an idea, but I would rather not assume its meaning.
I am referencing the Oracle JPA Doc, but they continue to use the words like "persist" or "persistence" when describing persistent fields/properties. Can someone shed some light on this idea of persistence? And maybe define what it means for an instance of an entity to be persistent?
And if you could not use the word "persistent" (or any form of the word) in your definition that would be much appreciated. A simple answer would be great, but more in-depth explanations are definitely welcome! Thanks so much!
Persistence simply means to store something permanently.
In Java we work with objects and try to store the objects' values in a database (an RDBMS, mostly).
JPA provides an implementation of object-relational mapping (ORM), so that we can store an object directly in the database as a new tuple (i.e. row).
In JPA, an object is turned into an entity in order to map it to a table in the database.
So persisting an entity means permanently storing an object (entity) in the database.
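For example, here is a minimal sketch of persisting an object with plain JPA; the entity and the persistence-unit name "demo-pu" are illustrative:

import javax.persistence.*;

@Entity
class Customer {
    @Id @GeneratedValue Long id;
    String name;
}

public class PersistExample {
    public static void main(String[] args) {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("demo-pu");
        EntityManager em = emf.createEntityManager();

        em.getTransaction().begin();
        Customer c = new Customer();
        c.name = "Alice";
        em.persist(c);                 // the object becomes a managed entity;
                                       // an INSERT is scheduled for it
        em.getTransaction().commit();  // the row is now durable in the database

        em.close();
        emf.close();
    }
}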
Hope this helps!
"Persist" means "lives on after the application is shut down". The object is not just in volatile memory; it's in more permanent storage on disk. If the application is shut down, or the user ends their session and begins a new one, the old data is still available from permanent storage on disk.
Databases store information on disks, unless they are in-memory versions that give you the advantage of using SQL but little else. If you use a relational SQL database, you get a query language that makes it easy to Create/Read/Update/Delete information without having to worry about how it's stored on the disk.
SQL databases store relations on disk using various data structures (e.g. B-trees). Relations are defined in terms of tables and columns, and each record in a table consists of a tuple of row values. To work with objects, tables and columns have to be mapped to classes and attributes using object-relational mapping. JPA generalizes this idea and builds it into Java EE, following the example of implementations like TopLink and Hibernate.
NoSQL databases, like MongoDB, also store information on disk, but as documents rather than relations.
Object databases serialize an object and all its children using formats like Java serialization, XML, JSON, or custom formats (e.g. Google protocol buffers).
Graph databases, like Neo4J, can be thought of as more general cases of object databases.

JPA, complex object graphs and Serialisation

I have a "Project" entity/class that includes a number of "complex" fields, e.g. fields referenced as interfaces with many possible implementations. To give an example: an interface Property<T>, with T of virtually any type (as many types as I have implemented).
I use JPA. For those fields I have had no choice but to serialize them in order to store them. Although I have no need to use those objects in my queries, this is obviously leading to some issues, e.g. maintenance and updates, to start with.
I have two questions:
1) Is there a "trick" I could consider to keep my database up to date in case I have a "breaking" change in my serialised class (most of the time, serialisation changes are handled well)?
2) Will moving to JDO help at all? I have very little experience with JDO, but my understanding is that with JDO, having serialised objects in the tables never happens (how are changes handled, though?).
In support of 2) I must also add that the object graphs I have can be quite complex, possibly involving tens of tables just to retrieve a full "Project", for instance.
JDO obviously supports persistence of interface fields (DataNucleus JPA also allows their persistence, but as a vendor extension). Having an interface field that can be of any possible type presents problems to an RDBMS rather than to JDO as such. The underlying datastore is more your problem (in not being able to adequately mirror your model), and one of the many other datastores could help you with that. For example, DataNucleus JDO/JPA supports GAE/Datastore, Neo4j, MongoDB, HBase, ODF, Excel, etc., and simply persists the "id" of the related object in a "column" (or equivalent) of the owning object's representation, so such breaking changes would be much less frequent than what you have now.
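Regarding question 1, one common mitigation (not mentioned in the answer above) is to stop relying on native Java serialization and store such fields as JSON via a JPA 2.1 AttributeConverter, so that class evolution is handled by the JSON mapper's more tolerant reading rules. A sketch, assuming Jackson and the question's Property interface; note that Jackson would additionally need polymorphic type information (e.g. @JsonTypeInfo on Property) to recreate the concrete implementation:

import com.fasterxml.jackson.databind.ObjectMapper;
import javax.persistence.AttributeConverter;
import javax.persistence.Converter;

@Converter // apply with @Convert(converter = PropertyJsonConverter.class) on the field
public class PropertyJsonConverter implements AttributeConverter<Property<?>, String> {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public String convertToDatabaseColumn(Property<?> attribute) {
        try {
            return attribute == null ? null : MAPPER.writeValueAsString(attribute);
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize property", e);
        }
    }

    @Override
    public Property<?> convertToEntityAttribute(String dbData) {
        try {
            return dbData == null ? null : MAPPER.readValue(dbData, Property.class);
        } catch (Exception e) {
            throw new IllegalStateException("could not deserialize property", e);
        }
    }
}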

Mapping POJO to Entities

In our project we have the constraint of not having the luxury of altering the table structure already in place. The tables are highly denormalized in nature.
We have come up with good POJOs for the application, and we have entity beans generated from the existing tables. Now we have to map the POJOs to the entities so that we can persist.
Ultimately, we are combining a good POJO with a bad table. Any thoughts on options/alternatives/suggestions for this approach?
Hibernate/JPA (2) has a rich set of features for manipulating the mapping (so that your objects can differ from the tables), so that many (not all) old tables can be mapped to normal objects. Maybe you should have a look at this first, and use your POJO/table "solution" only if this mapping is not powerful enough.
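To give an idea of what that mapping flexibility looks like, here is a sketch of a denormalized legacy table mapped to a more normal object model using standard JPA annotations; all table and column names are invented:

import javax.persistence.*;

@Entity
@Table(name = "CUST_ORDERS_FLAT")   // one wide, denormalized legacy table
public class Customer {

    @Id
    @Column(name = "CUST_NO")
    private Long id;

    @Column(name = "CUST_NM")
    private String name;

    // Group flat legacy columns into a reusable value object.
    @Embedded
    @AttributeOverrides({
        @AttributeOverride(name = "street", column = @Column(name = "ADDR_STREET")),
        @AttributeOverride(name = "city",   column = @Column(name = "ADDR_CITY"))
    })
    private Address address;
}

@Embeddable
class Address {
    String street;
    String city;
}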
If you have a read-only application, you can think about using views to make your tables/views more like your objects. This may reduce the amount of strange mapping.
I don't know your mapping, the size of the application, or the use case, but have you considered not using Hibernate? I ask because I can imagine (as I said, I don't know your application) that in an architecture like this no Hibernate feature is really used, and so Hibernate would only add unneeded complexity.
If you are using Hibernate you should be able to map your POJOs to the table structure using only XML files, without creating new Java beans. This would allow you to easily change the mapping if all of a sudden you can change the table structures, and it spares you the intermediary beans. That's the best you can do.

XML vs. object trees

In my current project (an order management system built from scratch), we are handling orders in the form of XML documents, which are saved in a relational database.
I would outline the requirements like this:
Selecting various details from anywhere in the order
Updating / enriching data (e.g. from the CRM system)
Keeping a record of the changes (invalidating old data, inserting new values)
Details of orders should be easily selectable via SQL queries (for second-level support)
What we did:
The serialization is done with proprietary code, disassembling the order into tables like customer, address, phone_number, order_position, etc.
Whenever an order is processed a bit further (e.g. due to an incoming event), it is read completely from the database and assembled back into an XML document.
Selection of data is done via XPath (scattered throughout the code).
Most updates are done directly in the database (the order will then be reloaded for the next step).
The problems we face:
The order structure (XSD) evolves with every release. Therefore the XPaths and the custom persistence often break and produce bugs.
We ended up having a mixture of working with the document and working directly with the database (because the persistence layer cannot persist the changes in the document).
Performance is not really an issue (yet), since it is an offline system and orders are often intentionally delayed by days.
I do not expect free consultancy here, but I am a little confused about how the approach could be improved (next time, basically).
What would you think is a good solution for handling these requirements?
Would working with an object graph (something like JXPath or OGNL) and an OR mapper be a better approach? Or using the XML support of, for example, the Oracle database?
If your schema changes often, I would advise against using any kind of object-mapping. You'd keep changing boilerplate code just for the heck of it.
Instead, use the declarative schema definition to validate data changes and access.
Consider an order as a single datum, expressed as an XML document.
Use a document-oriented store like MongoDB, Cassandra or one of the many XML databases to manipulate the document directly. Don't bother with cutting it into pieces to store it in a relational db.
Making the data accessible via reporting tools in a relational database might be considered secondary. A simple map-reduce job on a MongoDB, for example, could populate the required order details into a relational database whenever required, separating the two use cases quite naturally.
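As a rough sketch of the first suggestion, using the MongoDB Java driver (the database, collection, and field names are invented for illustration):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import java.util.List;
import org.bson.Document;

public class OrderStore {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders =
                    client.getDatabase("oms").getCollection("orders");

            // Store the whole order as one document; no disassembly into tables.
            orders.insertOne(new Document("orderId", 4711)
                    .append("customer", new Document("name", "ACME").append("crmId", "C-42"))
                    .append("positions", List.of(
                            new Document("sku", "X-1").append("qty", 3))));

            // Second-level support can still query details directly.
            Document found = orders.find(new Document("orderId", 4711)).first();
            System.out.println(found.toJson());
        }
    }
}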
The standard Java EE approach is to represent your data as POJOs and use JPA for the database access and JAXB to convert the objects to/from XML.
JPA
Object-to-Relational standard
Supported by all the application server vendors.
Multiple implementations available: EclipseLink, Hibernate, etc.
Powerful query language, JPQL (very similar to SQL)
Handles query optimization for you.
JAXB
Object-to-XML standard
Supported by all the application server vendors.
Multiple implementations available: EclipseLink MOXy, Metro, Apache JaxMe, etc.
Example
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-15.html
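As a quick illustration of the dual-annotation idea, a minimal sketch of one class serving both JPA and JAXB (class and field names are invented):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;

@Entity                              // JPA: maps the class to a table
@XmlRootElement(name = "order")      // JAXB: maps the class to an <order> element
@XmlAccessorType(XmlAccessType.FIELD)
public class Order {

    @Id @GeneratedValue
    @XmlAttribute                    // appears as an XML attribute, not an element
    private Long id;

    private String customerName;     // becomes both a column and an XML element

    public Order() {}                // no-arg constructor required by JPA and JAXB
}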
