XML vs. object trees - java

In my current project (an order management system built from scratch), we are handling orders in the form of XML documents which are saved in a relational database.
I would outline the requirements like this:
Selecting various details from anywhere in the order
Updating / enriching data (e.g. from the CRM system)
Keeping a record of the changes (invalidating old data, inserting new values)
Details of orders should be easily selectable by SQL queries (for 2nd level support)
What we did:
The serialization is done with proprietary code, disassembling the order into tables like customer, address, phone_number, order_position etc.
Whenever an order is processed a bit further (e.g. due to an incoming event), it is read completely from the database and assembled back into an XML document.
Selection of data is done by XPath expressions (scattered all over the code; see the sketch after this list).
Most updates are done directly in the database (the order will then be reloaded for the next step).
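To illustrate the kind of scattered selection code (element names invented for the example):

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

Document order = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse("order-4711.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
// every release that moves an element silently breaks expressions like this
String customerName = xpath.evaluate("/order/customer/name", order);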
The problems we face:
The order structure (XSD) evolves with every release, so the XPath expressions and the custom persistence code often break and produce bugs.
We ended up with a mixture of working with the document and working directly with the database (because the persistence layer cannot persist the changes made in the document).
Performance is not really an issue (yet), since it is an offline system and orders are often intentionally delayed by days.
I do not expect free consultancy here, but I am a little confused about how the approach could be improved (next time, basically).
What do you think would be a good solution for handling these requirements?
Would working with an object graph (something like JXPath or OGNL, plus an O/R mapper) be a better approach? Or using the XML support of, e.g., the Oracle database?

If your schema changes often, I would advise against using any kind of object-mapping. You'd keep changing boilerplate code just for the heck of it.
Instead, use the declarative schema definition to validate data changes and access.
Consider an order as a single datum, expressed as an XML document.
Use a document-oriented store like MongoDB, Cassandra or one of the many XML databases to manipulate the document directly. Don't bother with cutting it into pieces to store it in a relational db.
Making the data accessible to reporting tools through a relational database can be treated as a secondary concern. A simple map-reduce job on MongoDB, for example, could populate the required order details into a relational database whenever needed, separating the two use cases quite naturally. A sketch of that separation follows.
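As a rough sketch of the idea (a plain export loop rather than an actual map-reduce job; the database, collection and column names are all assumptions):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

try (MongoClient mongo = MongoClients.create("mongodb://localhost");
     Connection jdbc = DriverManager.getConnection("jdbc:h2:~/reporting")) {
    MongoCollection<Document> orders = mongo.getDatabase("oms").getCollection("orders");
    PreparedStatement insert = jdbc.prepareStatement(
            "INSERT INTO order_report (order_id, customer, total) VALUES (?, ?, ?)");
    // flatten each order document into one reporting row
    for (Document order : orders.find()) {
        insert.setString(1, order.getString("orderId"));
        insert.setString(2, order.getString("customerName"));
        insert.setDouble(3, order.getDouble("total"));
        insert.addBatch();
    }
    insert.executeBatch();
}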

The standard Java EE approach is to represent your data as POJOs and use JPA for the database access and JAXB to convert the objects to/from XML.
JPA
Object-to-Relational standard
Supported by all the application server vendors.
Multiple available implementations EclipseLink, Hibernate, etc.
Powerful query language JPQL (that is very similar to SQL)
Handles query optimization for you.
JAXB
Object-to-XML standard
Supported by all the application server vendors.
Multiple implementations available: EclipseLink MOXy, Metro, Apache JaxMe, etc.
Example
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-15.html
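A minimal sketch of such a dual-annotated POJO (class and field names are invented for the example):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.xml.bind.annotation.XmlRootElement;

@Entity
@XmlRootElement
public class Customer {

    @Id
    @GeneratedValue
    private Long id;
    private String name;

    // getters and setters omitted; both JPA and JAXB work off them
}

The same class can then be persisted through an EntityManager and marshalled to XML through a JAXBContext, so one object model serves both sides.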

Related

Does a relational database schema have to be normalized to 3rd normal form in order for it to be ORM-mappable?

jOOQ claims that there is no impedance mismatch when it comes to relational database schemas and object-oriented modelling of the data.
So, given a database schema that is asking to be wrapped in an application layer, does the DB schema have to be normalized to 3rd normal form in order for there to be optimal mapping between the DB schema, the ORM layer and the application?
You're probably referring to this jOOQ blog post here, which is a bit academic, not necessarily practical. It essentially says that what people call "impedance mismatch" may be caused by a lack of ORM features, not by the concept of ORMs per se.
This discussion has nothing to do with normalisation. As far as mapping is concerned, you can always map any table model to any object model, provided you correctly apply the mapping rules and manually handle all the disadvantages of denormalisation (e.g. preventing inconsistencies in duplicated data). Having said that: the advantages of normalisation will make your life easier on all layers.
Note: if your schema is not normalised, chances are that it was designed for an analytic workload rather than a transactional one, in which case using an ORM might be overkill. Using a SQL-based API like JDBC, jOOQ, etc. might be the better choice; see the sketch below.
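For illustration only, a query through jOOQ's plain-string API might look like this (table and column names are invented, connection is an existing java.sql.Connection, and in practice you would use jOOQ's generated classes instead of string names):

import org.jooq.DSLContext;
import org.jooq.Record;
import org.jooq.Result;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;
import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.table;

// type-safe SQL construction without any object mapping
DSLContext ctx = DSL.using(connection, SQLDialect.POSTGRES);
Result<Record> result = ctx.select(field("id"), field("amount"))
                           .from(table("orders"))
                           .where(field("status").eq("OPEN"))
                           .fetch();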

JPA performance optimization or alternatives

We are currently in a project with a high demand on performance when it comes to reads from the database.
We are currently using JPA (the EclipseLink implementation), mainly because it provides convenient database access and column mapping.
For our queries we are using highly specific SQL queries. We are also using one database (SAP HANA, in-memory), so a language abstraction is not required. The database access is pretty fast, our current bottleneck really is the application server, especially the persistence layer.
The result sets often do not contain entities, because what counts as an entity depends on the context. For us, there is no point in using an @Id field like the following, because we don't have single fields that are unique (only combinations are, and defining an IdClass is too much overhead).
@Entity
public class Item {
    @Id
    public String myField; // a field type is required; String assumed here
    // other fields...
}
This seems to be enforced by JPA if I want to run a typed native query. Is that assumption true? So far we haven't found a way around the ID mapping.
Are these findings valid?
If not, how can we make our use of JPA more performant (there is significant latency compared to plain JDBC), ideally without defining an @Id (which is useless in our case) for result types?
If yes, is there another Java library that provides just a minimal layer on top of JDBC without much latency, but is more convenient to use than plain JDBC (with column mapping and all that good stuff)?
Thanks!
Use case: we would like to stream historic GPS sensor data from the database. Besides transforming it to JSON, we also do some transformations/validations, which is why we actually need to build objects. So what we are basically looking for is a convenient way of mapping the fields of select statements to attributes. I hope that makes sense.
There are many articles and blog posts about improving EclipseLink/JPA performance that you might look into, such as EclipseLink Performance, JPA Performance Tuning, and Optimizing the EclipseLink Application.
In the end, though, it all depends very much on your specific use case and any future use cases you may have. JPA is designed to make reading and writing on top of JDBC easier and more maintainable, and it adds performance features such as caching. If all you are using it for is reading raw data, the extra layer might be overhead that isn't adding any value. There isn't much point in having JPA build entities from the result sets, maintain the cache and watch for changes, only for your application to ignore it all and grab the raw data.
I do not understand why you would have an Item table with a single myField column. How is it used by the application, and how does it relate to other tables and potential entities?
Such a construct is not the normal use case for relational databases and ORMs, but there are still ways around it in JPA. The data could be used in element collections by other entities, or simply left unmapped, with native SQL queries passed straight through to the JDBC layer; a sketch of that JDBC route follows. EclipseLink itself has many mapping types and options above and beyond JPA that might be used, depending on your use cases.
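As a sketch of such a thin JDBC route for the GPS use case from the question (the gps_data table, its columns, and the connection and deviceId variables are all assumptions):

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

// simple immutable DTO; no @Id needed anywhere
public final class GpsReading {
    final double lat;
    final double lon;
    final Timestamp recordedAt;
    GpsReading(double lat, double lon, Timestamp recordedAt) {
        this.lat = lat; this.lon = lon; this.recordedAt = recordedAt;
    }
}

List<GpsReading> readings = new ArrayList<>();
try (PreparedStatement ps = connection.prepareStatement(
        "SELECT lat, lon, recorded_at FROM gps_data WHERE device_id = ?")) {
    ps.setString(1, deviceId);
    try (ResultSet rs = ps.executeQuery()) {
        // map each row straight onto the DTO, no entity lifecycle involved
        while (rs.next()) {
            readings.add(new GpsReading(
                    rs.getDouble("lat"),
                    rs.getDouble("lon"),
                    rs.getTimestamp("recorded_at")));
        }
    }
}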

Fast and safe way to save and retrieve data in java [closed]

In order to add data persistence to a billing application, I wonder what the best way to save and retrieve data is in my situation.
I work with a JavaFX TableView populated by custom objects (with many strings, ints, booleans, ...), each one representing one bill. The user must be able to add, read and edit data on the fly. Everything is stored locally; there is no need for a cloud or anything like that.
I usually use serialization to write my objects, but is that a safe and fast way to store around 10,000 custom objects?
Should I use XML, serialization, a local database (with JavaDB?) or something else?
By fast, I mean that the user can write and edit data without delay. I have no problem with a small loading time when the app is launched.
By safe, I don't mean encrypted; I mean safe in the "data won't get lost or corrupted" sense.
Finally, if there are multiple solutions, why choose one over another?
Any persistence mechanism (flat file, relational DB, NoSQL) can be safe if used as designed, or unsafe if abused or misunderstood. Your question is very open-ended and can get very involved, or stay very light; it all depends.
Typically the choices come down to:
flat file (say binary serialisation, CSV, JSON or XML). A very simple mechanism, but it takes effort to scale it to large files, and care must be taken when making changes to the code base, as changes could prevent older files from being readable. One also has to bear in mind when the data is written, relative to changes coming in from the user and the possibility of a machine crash; that is, there are no transactions, so data can get corrupted (not a simple topic in its own right). As for which format is best, many a religious war has been fought over that, but typically a textual format (JSON, XML or CSV) has the advantage of being human readable, which helps with debugging and maintenance. XML and JSON support nested structures, which is an advantage over CSV. As for performance, text parsing is typically around 10x slower than binary parsing; however, there are fast implementations and slow ones, and for 10k objects you are unlikely to notice the difference.
relational database. Very useful for apps that benefit from relational queries (SQL), and a lot of effort has gone into making them transactional and robust against machine crashes. They are generally the persistence mechanism of choice for large businesses, and they require some knowledge to set up and maintain. H2 is a very simple, low-cost entry point, and Oracle is at the other extreme end of the spectrum. Relational databases suffer from a domain mismatch: object design and SQL design do not map onto each other without some effort from the developer. They also typically suffer from scaling problems, as they are not usually clusterable; not a problem for 10k rows, though.
NoSQL databases (e.g. Redis, Cassandra, Mongo, Couch, Neo4j). Generally not transactional, but often faster than relational DBs, and they offer clustering from the get-go, making them very robust. They also support different data modelling paradigms such as graph, list and document, which makes the NoSQL landscape much richer than the relational SQL one.
I assume that you are not working on a professional project and lack a mentor, so I will wrap up by suggesting that you focus on flat files first and then pick a DB product of some kind to experiment with (H2 is very good for learning relational products; Mongo or Redis are easy entry points into NoSQL).
You can use the H2 database (http://www.h2database.com/). It's a really convenient way to store data, and you can use it as an embedded database, which looks like this:
import java.sql.Connection;
import java.sql.DriverManager;

Class.forName("org.h2.Driver");  // load the H2 JDBC driver
try (Connection conn = DriverManager.getConnection("jdbc:h2:~/test", "sa", "")) {
    // add application code here
}
H2 then creates the database file in your home directory (named test.mv.db in recent versions of H2, test.h2.db in older ones).
Object serialization is safe. It's not particularly fast, and you have to be very careful about how you change the class to ensure that you can still deserialize old data consistently; in my opinion this is the biggest disadvantage of object serialization. A sketch of the usual precaution follows.
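A minimal sketch, with the Bill class and its fields invented for the example:

import java.io.Serializable;

public class Bill implements Serializable {
    // pin the stream version explicitly; otherwise the JVM derives it from
    // the class shape and even harmless refactorings break deserialization
    private static final long serialVersionUID = 1L;

    private String customer;
    private int amountCents;
    // adding fields later is a compatible change; removing or retyping them is not
}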
XML (or JSON) isn't bad either. There are binding technologies like JAXB, Jackson or Gson which allow you to seamlessly map objects to XML or JSON. Permissive binding makes these formats easier to evolve than object serialization, with the additional benefit of being human readable and editable, but at the cost of being more verbose (consider file compression). If your storage format is one giant XML file, you can also search for records using XPath.
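To make the binding idea concrete, here is a hedged Jackson sketch (it assumes the Bill class above exposes getters/setters and that bills is the list backing the TableView):

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.util.List;

ObjectMapper mapper = new ObjectMapper();
// write the whole table content out after each edit (or on shutdown)
mapper.writeValue(new File("bills.json"), bills);
// ...and read it back on startup
List<Bill> loaded = mapper.readValue(new File("bills.json"),
        new TypeReference<List<Bill>>() {});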
JavaDB (or H2 or SQLite) is good in that it implements a relational data store, so you can perform SQL queries on the data. Managing lots of records is much more straightforward with a proper database. You could probably save on disk space, too. I would recommend this approach.
Will there be multiple clients reading these files? In that case, for safety, you would have to implement some kind of file locking scheme to prevent data corruption. You can get around this with some kind of out-of-process data management, using a lightweight server such as H2 or one of the NoSQL datastores like Mongo or Cassandra.

Concerns with NoSQL/MongoDB

I'm starting to build a new Spring-based multi-user document management application and I would like to venture into the world of NoSQL/MongoDB. Coming from a RDBMS background, I have several concerns with MongoDB, primarily:
Lack of transactions
More focused on performance/scalability than data integrity
Lack of a JPA standard
To start with, I do not expect high loads, especially at the beginning, and I suspect reads to writes will be about 10 to 1.
1) From what I can tell, there is no easy way to do multi-collection transactions. Where in an RDBMS I can easily maintain a per-user document ID counter in a separate table, there does not seem to be a reliable way to do this in MongoDB, given that the counter would live in a separate collection/document. Consequently, I'm not sure if or how this problem can be resolved.
2) Additionally, from what I have read, NoSQL is great where data integrity isn't the primary concern (e.g. blog comments). However, I'm not sure how this translates to being the primary data store for an application. Does this mean that one can update a document and have the update fail? I ran across an older, uncredited rant which discusses failed commits and the like, which further fuels these concerns.
3) The apparent lack of a JPA-like standard for NoSQL implies that I have to choose my DB and stick with it. Unlike JPA, where I can easily swap one JPQL/SQL-compliant vendor for another, I would have to code with MongoDB in mind and redesign my structures/queries if I ever wanted to switch to another NoSQL DB. I've seen Hibernate OGM, but it seems to be very much in its infancy and to provide only rudimentary support; definitely not something that would avoid MongoDB-specific queries.
Are these concerns easily mitigated? Being new to the NoSQL world, I'm still having trouble understanding when the business case is right for NoSQL.
These are good questions. Here's my 2 cents about MongoDB and some references to help you learn more. I won't speak about other NoSQL systems, as there are a lot of them out there and no real unifying principle to NoSQL other than "it doesn't use SQL" (and sometimes people make even that work with SQL, so, yeah).
MongoDB does not do joins. Period. MongoDB does not have transactions, whether within one collection or across multiple collections; the unit of atomicity is the document. How does this work in an application? Via schema design and some techniques for recovering parts of ACID semantics where necessary, for example using two-phase commits. In relational databases, schema design is straightforward and based on the structure of the data, not its use case; joins and transactions fill the gap between the abstract, normalized data representation and the concrete ways the data will be used. The data modeling intro already linked explains the situation for MongoDB, for contrast:
The key challenge in data modeling is balancing the needs of the application, the performance characteristics of the database engine, and the data retrieval patterns. When designing data models, always consider the application usage of the data (i.e. queries, updates, and processing of the data) as well as the inherent structure of the data itself.
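To make the per-user counter from point 1 concrete: the usual MongoDB pattern keeps a counters collection and leans on the fact that $inc on a single document is atomic. A sketch (the counters collection and the userId variable are assumptions):

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import org.bson.Document;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.inc;

// one counter document per user; the single-document update is atomic,
// so this yields a reliable sequence without multi-document transactions
Document counter = counters.findOneAndUpdate(
        eq("_id", userId),
        inc("seq", 1L),
        new FindOneAndUpdateOptions()
                .upsert(true)
                .returnDocument(ReturnDocument.AFTER));
long nextDocumentId = counter.getLong("seq");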
That specific "rant" is clearly very old as it talks about writes being unacknowledged by default. This isn't the case anymore. Given any distributed computer system operating over a network, it's pretty easy to come up with a way for it to behave poorly . The MongoDB blog covered a lot of this stuff in a series on consistency. I'd suggest touring the docs about journaling, replication, and write concern and see if that makes you feel better about MongoDB as a primary data store.
Yup, this comes with the NoSQL territory. What doesn't exist yet are common data access languages or standards, because everything is new and trying to be different. Check back in 30 years.

ORM Technologies vs JDBC?

My question is about ORM versus JDBC: on what criteria would you decide to go for an ORM technology rather than JDBC, and the other way round?
Thanks.
JDBC
With JDBC, the developer has to write code to map an object model's data representation to a relational data model and its corresponding database schema.
With JDBC, the mapping between Java objects and database tables (and the conversion back) has to be taken care of manually by the developer, line by line.
JDBC supports only the native Structured Query Language (SQL). The developer has to work out an efficient way to access the database, i.e. to select the most effective query among several that perform the same task.
An application that uses JDBC to handle persistent data (database tables) ends up with a large amount of database-specific code. The code written to map table data to application objects really maps table fields to object properties; when a table or the database changes, both the object structure and the mapping code have to change.
With JDBC, it is the developer's responsibility to handle the JDBC result set and convert it to Java objects in code before the persistent data can be used in the application. So with JDBC, the mapping between Java objects and database tables is done manually.
With JDBC, caching has to be hand-coded.
JDBC has no built-in check that every user always works with up-to-date data; such a check has to be added by the developer.
Hibernate
Hibernate is a flexible and powerful ORM solution for mapping Java classes to database tables. Hibernate itself takes care of this mapping, configured via XML files, so the developer does not need to write mapping code.
Hibernate provides transparent persistence: the developer does not need to write code explicitly to map database table rows to application objects when interacting with the RDBMS.
Hibernate provides a powerful query language, Hibernate Query Language (HQL), which is independent of the type of database, is expressed in a familiar SQL-like syntax and includes full support for polymorphic queries. Hibernate also supports native SQL statements, and it selects an effective way to perform each database manipulation task for the application.
Hibernate provides the mapping itself. The actual mapping between tables and application objects is defined in XML files; if the database or a table changes, only the XML mapping files need to be updated.
Hibernate reduces lines of code by maintaining the object-table mapping itself and returning results to the application as Java objects. It relieves the programmer from manually handling persistent data, reducing development time and maintenance cost.
With transparent persistence, Hibernate keeps a cache in the application's workspace; relational rows are moved into this cache as the result of a query. This improves performance if the client application repeatedly reads the same data between writes, and it lets the developer concentrate on business logic rather than on persistence plumbing.
Hibernate lets the developer define a version field in the application class; Hibernate then updates the version column of the database table every time the corresponding row is updated. So if two users retrieve the same row and both modify it, the first save bumps the version automatically; when the second user tries to save, Hibernate rejects the update because that user no longer has up-to-date data.
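In JPA/Hibernate terms this mechanism is optimistic locking via a version field; a minimal sketch (entity name invented):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Invoice {

    @Id
    private Long id;

    // Hibernate increments this column on every update; saving a stale
    // copy then fails with an OptimisticLockException
    @Version
    private long version;
}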
Complexity.
ORM: if your application is domain-driven and the relationships among objects are complex, or if you need these objects to define what the app does.
JDBC/SQL: if your application is simple enough to just present data straight from the database, or the relationships between entities are simple enough.
The book "Patterns of Enterprise Application Architecture" by Martin Fowler explains the differences between these two approaches much better.
See: Domain Model and Transaction Script
I think you forgot to look at "Functional Relational Mapping" (FRM).
I would sum up by saying:
If you want to focus on the data structures, use an ORM like JPA/Hibernate.
If you want to shed light on the processing itself, take a look at FRM libraries: QueryDSL or jOOQ.
If you need to tune your SQL queries for specific databases, use JDBC and native SQL queries.
The strength of the various "relational mapping" technologies is portability: you ensure your application will run on most ACID databases.
Otherwise, you will have to cope with the differences between SQL dialects when writing SQL queries by hand.
Of course you can restrict yourself to the SQL-92 standard (and then do some functional programming), or you can reuse some functional programming concepts with ORM frameworks.
The ORM's strengths are built on a session object, which can become a bottleneck:
it manages the lifecycle of the objects as long as the underlying database transaction is running.
it maintains a one-to-one mapping between your Java objects and your database rows (and uses an internal cache to avoid duplicate objects).
it automatically detects association updates and orphan objects to delete.
it handles concurrency issues with optimistic or pessimistic locking.
Nevertheless, its strengths are also its weaknesses:
The session must be able to compare objects, so you need to implement equals/hashCode methods.
But object equality must be rooted in "business keys" and not the database ID (new transient objects have no database ID!).
However, some reified concepts have no business equality (an operation, for instance).
A common workaround relies on GUIDs, which tend to upset database administrators.
The session must watch for relationship changes, but its mapping rules push you toward collections that are unsuitable for your business algorithms.
Sometimes you would like to use a HashMap, but the ORM requires the key to be another "rich domain object" rather than a lightweight one...
Then you have to implement object equality on the rich domain object acting as the key...
But you can't, because this object has no counterpart in the business world.
So you fall back to a simple list that you have to iterate over (with the resulting performance issues).
The ORM APIs are sometimes unsuitable for real-world use.
For instance, real-world web applications try to enforce session isolation by adding "WHERE" clauses when fetching data...
Then "Session.get(id)" no longer suffices and you need to turn to a more complex DSL (HQL, Criteria API) or go back to native SQL.
The database objects conflict with other objects dedicated to other frameworks (such as OXM frameworks, i.e. Object/XML Mapping).
For instance, suppose your REST services use the Jackson library to serialize a business object,
and that Jackson-mapped class corresponds exactly to a Hibernate one.
Then either you merge both, and a strong coupling between your API and your database appears,
or you must implement a translation, and all the code the ORM saved you is lost there...
On the other side, FRM is a trade-off between object relational mapping (ORM) and native SQL queries (with JDBC).
The best way to explain the differences between FRM and ORM is to adopt a DDD approach.
Object relational mapping encourages the use of "rich domain objects", Java classes whose state is mutable during the database transaction.
Functional relational mapping relies on "poor domain objects", which are immutable (so much so that you have to clone a new one each time you want to alter its content).
FRM lifts the constraints placed on the ORM session and most of the time relies on a DSL over SQL (so portability is not an issue),
but on the other hand you have to look into the transaction details and concurrency issues yourself. A typical FRM query (QueryDSL here) looks like this:
List<Person> persons = queryFactory.selectFrom(person)
        .where(
            person.firstName.eq("John"),
            person.lastName.eq("Doe"))
        .fetch();
It also depends on the learning curve.
Ebean ORM has a pretty low learning curve (simple API, simple query language) if you are happy enough with JPA annotations for mapping (@Entity, @Table, @OneToMany etc.).
