Set foreign keys if using an ORM framework

Do you set foreign keys in your tables even though you use an ORM framework like Hibernate or Doctrine? In my opinion the advantage is that you can navigate more easily in the SQL admin view, but I think data integrity is no longer an argument, since you can configure cascade settings in the ORM framework's annotations / XML / ...? What do you think?
In my case I will always use foreign keys because they give me a clean and stable database definition, but I am interested in others' opinions.

No existing ORM (even the most powerful) can be compared to an enterprise-ready relational DBMS like MSSQL, PostgreSQL, MySQL etc. in maintaining data integrity and consistency (feature-wise and performance-wise). A database managed by an RDBMS is also a free, extra entry and extensibility point (with many available APIs) for your data. Hence it should keep the data consistent during manipulations performed by various clients. In many cases you can think of your non-DB app as a replaceable front end. A good side effect of being data-centric: you can use your DB for low-level synchronisation of your app's (or app parts') operations, and change your app's behavior easily (without client recompilation or redeployment) by changing DB programmability, the schema (including relations and constraints) etc.

Related

Does a relational database schema have to be normalized to 3rd normal form in order for it to be ORM-mappable?

jOOQ claims that there is no impedance mismatch when it comes to relational database schemas and object-oriented modelling of the data.
So, given a database schema that is asking to be wrapped in an application layer, does the DB schema have to be normalized to 3rd normal form in order for there to be optimal mapping between the DB schema, the ORM layer and the application?

You're probably referring to this jOOQ blog post here, which is a bit academic, not necessarily practical. It essentially says that what people call "impedance mismatch" may be caused by a lack of ORM features, not by the concept of ORMs per se.
This discussion has nothing to do with normalisation. As far as mapping is concerned, you can always map any table model to any object model if you correctly apply mapping rules and if you manually handle all the disadvantages of denormalisation (e.g. preventing inconsistencies in duplicate data). Having said so: the advantages of normalisation will make your life easier on all layers.
Note: if your schema is not normalised, chances are that it was designed for an analytic workload rather than a transactional one, in which case using an ORM might be overkill. Using a SQL-based API like JDBC, jOOQ, etc. might be the better choice.

JPA performance optimization or alternatives

We are currently in a project with a high demand on performance when it comes to reads from the database.
We are currently using JPA (EclipseLink implementation), currently just because it provides convenient database access and column mapping.
For our queries we are using highly specific SQL queries. We are also using one database (SAP HANA, in-memory), so a language abstraction is not required. The database access is pretty fast, our current bottleneck really is the application server, especially the persistence layer.
The result sets often also do not contain entities because entities are made up of the context. For us, there is no point in using an @Id field like the following, because we don't have fields that are unique (only combinations, but defining an IdClass is too much overhead).
@Entity
public class Item {
    @Id
    public myField;
    // other fields...
}
This seems to be enforced by JPA if I want to run a typed native query. Is that assumption true? Currently we haven't found a way around the ID mapping.
Are these findings valid?
If not, how can we make our use of JPA more performant (there is significant latency compared to plain JDBC), also without defining an @Id (because it is useless in our case) for result types?
If yes, is there another Java library that just provides a minimum layer on top of JDBC without too much latency that provides a more convenient use than plain JDBC (with column mapping and all that good stuff).
Thanks!
Use case: We would like to stream historic GPS sensor data from the database. Besides just transforming this to JSON, we also do some transformations/validations. That's why we actually need to build objects. So what we are basically looking for is a convenient way of mapping the fields of select statements to attributes. I hope that makes sense.

There are many articles and blogs about improving EclipseLink/JPA performance that you might look into, such as EclipseLink Performance, JPA Performance Tuning and Optimizing the EclipseLink Application
In the end, though, it all depends very much on your specific use case and any future use cases you may want. JPA is designed to make reading and writing on top of JDBC easier and more maintainable, and adds many performance benefits such as caching. If all you are using it for is to read raw data, though, the extra layer might be overhead that isn't adding any value. There isn't much point in having JPA build entities from the result sets, maintain the cache and watch for changes, only for your application to ignore it all and grab the raw data.
I do not understand why you would have an Item table with a single myField. How is it used by the application and how does it relate to other tables and potential entities?
Such a construct is not the normal use case for relational databases and ORMs, but there are still ways around it in JPA. The data could be used in element collections by other entities, or even just not mapped, and native SQL queries used which are passed straight through the JDBC layer. EclipseLink itself has many mapping types and options above and beyond JPA that might be used depending on your use cases.
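For the use case in the question (mapping the columns of a SELECT to object attributes, with no entity identity), a thin helper over plain JDBC may already be enough. What follows is only a minimal sketch under assumptions: RowMapper, JdbcRows and GpsFix are hypothetical names, not JPA or EclipseLink types, and the column names ts, lat and lon are invented for illustration.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Maps the current row of a ResultSet to an object (hypothetical helper type). */
@FunctionalInterface
interface RowMapper<T> {
    T map(ResultSet rs) throws SQLException;
}

/** Plain value object; no @Entity or @Id is needed because JPA is not involved. */
final class GpsFix {
    final long timestamp;
    final double lat;
    final double lon;

    GpsFix(long timestamp, double lat, double lon) {
        this.timestamp = timestamp;
        this.lat = lat;
        this.lon = lon;
    }
}

final class JdbcRows {
    private JdbcRows() {}

    /** Mapper for the assumed columns ts, lat and lon. */
    static final RowMapper<GpsFix> GPS_FIX =
        rs -> new GpsFix(rs.getLong("ts"), rs.getDouble("lat"), rs.getDouble("lon"));

    /** Reads all remaining rows of rs and maps each one with the given mapper. */
    static <T> List<T> mapAll(ResultSet rs, RowMapper<T> mapper) throws SQLException {
        List<T> out = new ArrayList<>();
        while (rs.next()) {
            out.add(mapper.map(rs));
        }
        return out;
    }
}
```

With a real connection, the ResultSet would come from a PreparedStatement opened in a try-with-resources block and be passed to JdbcRows.mapAll(rs, JdbcRows.GPS_FIX). For true streaming, the mapper could be applied row by row instead of collecting into a list.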

XML vs. object trees

In my current project (an order management system built from scratch), we are handling orders in the form of XML objects which are saved in a relational database.
I would outline the requirements like this:
Selecting various details from anywhere in the order
Updating / enriching data (e.g. from the CRM system)
Keeping a record of the changes (invalidating old data, inserting new values)
Details of orders should be easily selectable by SQL queries (for 2nd level support)
What we did:
The serialization is done with proprietary code, disassembling the order into tables like customer, address, phone_number, order_position etc.
Whenever an order is processed a bit further (e.g. due to an incoming event), it is read completely from the database and assembled back into an XML document.
Selection of data is done by XPath (scattered over code).
Most updates are done directly in the database (the order will then be reloaded for the next step).
The problems we face:
The order structure (XSD) evolves with every release. Therefore the XPaths and the custom persistence often break and produce bugs.
We ended up with a mixture of working with the document and the database (because the persistence layer cannot persist the changes in the document).
Performance is not really an issue (yet), since it is an offline system and orders are often intentionally delayed by days.
I do not expect free consultancy here, but I am a little confused on how the approach could be improved (next time, basically).
What would you think is a good solution for handling these requirements?
Would working with an object graph, something like JXPath and OGNL and an OR mapper be a better approach? Or using XML support of e.g. the Oracle database?

If your schema changes often, I would advise against using any kind of object-mapping. You'd keep changing boilerplate code just for the heck of it.
Instead, use the declarative schema definition to validate data changes and access.
Consider an order as a single datum, expressed as an XML document.
Use a document-oriented store like MongoDB, Cassandra or one of the many XML databases to manipulate the document directly. Don't bother with cutting it into pieces to store it in a relational db.
Making the data accessible via reporting tools in a relational database might be considered secondary. A simple map-reduce job on a MongoDB, for example, could populate the required order details into a relational database whenever required, separating the two use cases quite naturally.
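If the order stays an XML document as suggested here, the breakage from XPaths scattered over the code (described in the question) can at least be contained by keeping every expression in one place. A minimal sketch using only the JDK's javax.xml.xpath API; the enum OrderField, the helper Orders and all element names are hypothetical, since the real XSD is not shown:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Single home for the XPath expressions used against an order (element names are assumed). */
enum OrderField {
    CUSTOMER_NAME("/order/customer/name"),
    FIRST_POSITION_SKU("/order/positions/position[1]/sku");

    private final String expression;

    OrderField(String expression) {
        this.expression = expression;
    }

    /** Evaluates this field's expression and returns the text value ("" if nothing matches). */
    String readFrom(Document order) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, order);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath: " + expression, e);
        }
    }
}

final class Orders {
    private Orders() {}

    /** Parses an order document from a string. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable order document", e);
        }
    }
}
```

When the XSD changes with a release, only the enum constants need to be revisited, and a test per constant against a sample document would flag broken paths early.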

The standard Java EE approach is to represent your data as POJOs and use JPA for the database access and JAXB to convert the objects to/from XML.
JPA
Object-to-Relational standard
Supported by all the application server vendors.
Multiple available implementations: EclipseLink, Hibernate, etc.
Powerful query language JPQL (that is very similar to SQL)
Handles query optimization for you.
JAXB
Object-to-XML standard
Supported by all the application server vendors.
Multiple implementations available: EclipseLink MOXy, Metro, Apache JaxMe, etc.
Example
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-15.html

Migrating from hand-written persistence layer to ORM

We are currently evaluating options for migrating from hand-written persistence layer to ORM.
We have a bunch of legacy persistent objects (~200), that implement simple interface like this:
interface JDBC {
public long getId();
public void setId(long id);
public void retrieve();
public void setDataSource(DataSource ds);
}
When retrieve() is called, the object populates itself by issuing handwritten SQL queries to the connection provided, using the ID it received in the setter (this usually is the only parameter to the query). It manages its statements, result sets, etc. itself. Some of the objects have special flavors of the retrieve() method, like retrieveByName(); in this case a different SQL statement is issued.
Queries could be quite complex, we often join several tables to populate the sets representing relations to other objects, sometimes join queries are issued on-demand in the specific getter (lazy loading). So basically, we have implemented most of the ORM's functionality manually.
The reason for that was performance. We have very strict requirements for speed, and back in 2005 (when this code was written) performance tests showed that none of the mainstream ORMs was as fast as hand-written SQL.
The problems we are facing now that make us think of ORM are:
Most of the paths in this code are well-tested and are stable. However, some rarely-used code is prone to result set and connection leaks that are very hard to detect
We are currently squeezing some additional performance by adding caching to our persistence layer and it's a huge pain to maintain the cached objects manually in this setup
Support of this code when DB schema changes is a big problem.
I am looking for advice on what could be the best alternative for us. As far as I know, ORMs have advanced in the last 5 years, so there might now be one that offers acceptable performance. As I see it, we need to address these points:
Find some way to reuse at least some of the written SQL to express mappings
Have the possibility to issue native SQL queries without the necessity to manually decompose their results (i.e. avoid manual rs.getInt(42) as they are very sensitive to schema changes)
Add a non-intrusive caching layer
Keep the performance figures.
Is there any ORM framework you could recommend with regards to that?
UPDATE To give a feeling of what kind of performance figures we are talking about:
The backend database is TimesTen, an in-memory database that runs on the same machine as the JVM
We found out that changing rs.getInt("column1") to rs.getInt(42) brings the performance increase we consider significant.
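A possible middle ground between the two calls, sketched here under assumptions, is to resolve each column name to its index once per result set with ResultSet.findColumn and then use the index inside the row loop. The code then still survives column reordering. ColumnSum is a hypothetical name, and whether this recovers the full gain on TimesTen would have to be measured.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

final class ColumnSum {
    private ColumnSum() {}

    /** Sums an int column, resolving the column label to an index only once. */
    static long sum(ResultSet rs, String columnLabel) throws SQLException {
        int idx = rs.findColumn(columnLabel); // single name lookup per result set
        long total = 0;
        while (rs.next()) {
            total += rs.getInt(idx);          // index-based access inside the loop
        }
        return total;
    }
}
```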

If you want a standard persistence layer that lets you issue native SQL queries, consider using iBATIS. It's a fairly thin mapping between your objects and SQL. http://ibatis.apache.org/
For caching and lazy joins, Hibernate might be a better choice. I haven't used iBATIS for these purposes.
Hibernate provides a lot of flexibility in allowing you to specify certain defaults for lazy loading as you traverse your object graph, yet also pre-fetch data with SQL or HQL queries to your heart's content when you need better-known load times. However, the conversion effort will be complicated for you as it has a fairly high bar to entry in terms of learning and configuration. Annotations made this easier for me.
Two benefits you didn't mention about switching to a standard framework:
(1) running down bugs becomes easier when you have a wealth of sites and forums out there to support you.
(2) new hires are cheaper, easier and faster.
Good luck in addressing your performance and usability issues. The tradeoffs you point out are very common. Sorry if I evangelized.

For the bulk of your queries, I'd go with Hibernate. It's widely used, well documented, and generally performant. You can drop down to hand-written SQL if Hibernate isn't producing efficient enough queries. Hibernate gives you a lot of control in specifying the table names and columns that the domain objects map to, and in most cases you can retrofit it to an existing schema.
Find some way to reuse at least some of the written SQL to express mappings
The mappings are expressed in JPA using annotations. You can use the existing SQL as a guide when creating JPQL queries.
Add a non-intrusive caching layer
Caching in hibernate is automatic and transparent, unless you specifically choose to get involved. You can mark entities as read only, or evict from the cache, control when changes are flushed to the database (inside a transaction of course - automatic use of batching improves performance when network latency is a concern.)
Have the possibility to issue native SQL queries without the necessity to manually decompose their results (i.e. avoid manual rs.getInt(42) as they are very sensitive to schema changes)
Hibernate allows you to write SQL and have it mapped to your entities. You don't deal with the ResultSet directly; Hibernate takes care of the deconstruction into your entity. See chapter 16, Native SQL, in the Hibernate manual.
Support of this code when DB schema changes is a big problem.
Managing schema changes can still be a pain, since you now effectively have two schemata: the database schema and the JPA mapping (an object schema). If you choose to let Hibernate generate the DB schema and move your data to that, you are no longer directly responsible for what goes into the database, and so you are then faced with managing automatic changes to a machine-generated schema. There are tools that can assist, such as dbmigrate and Liquibase, but it's no walk in the park. Conversely, if you are managing the DB schema by hand, then you will have to carefully recraft your entities, JPA annotations and queries to accommodate the schema changes. Adding columns and new entities is relatively trivial, but more complex changes, such as changing a single property to a collection of properties or restructuring an object hierarchy, will involve considerably more extensive changes. There is no easy way out of this: either the DB or Hibernate is the "master" that decides the schema, and when one changes, the other must follow. The code changes aren't so bad; in my experience, it's migrating the data that's difficult. But this is a basic issue with databases, and will be present in any solution you choose.
So, to sum up, I'd go with hibernate, and use the JPA interface.

I've recently drilled through a bunch of Java ORMs and didn't come up with anything much better than Hibernate. Hibernate's performance may get you there and satisfy your performance goals.
Lots of people think that moving to Hibernate will make everything so awesome, but it's really just moving a set of problems from JDBC queries into Hibernate tuning. Read a bunch of books or (better) hire a "Hibernate guy" to come in and help.
During your refactor, I'd recommend using JPA so you can un-plug and re-plug a new persistence provider when the Next Big Thing comes along (or you move to Oracle)

Do you really need to migrate? What's forcing you to move? Is there some REAL need here or someone just inventing work (an 'Astronaut architect')?
I agree with the above answers though - if you HAVE to move - Hibernate or iBatis are good choices. iBatis especially if you want to stay 'closer' to the SQL.

If you need more performance: drop the database (for on-line work) and handle the persistence directly. Adding caching is not going to help you with a TimesTen DB; it just adds an extra copy (slowing you down).
You might want to take a look at GemFire.

There is a lot of good advice already in here that I won't repeat. The only thing I didn't see suggested that might work for you is caching reference data in memory.
I have done quite a bit of this in the past and it does save a lot of time. If you have a large number of fairly static reference tables, load them all into memory at startup time and refresh them every couple minutes. That way you're not hitting the DB over and over again for data that never changes.
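A minimal sketch of this idea in plain Java, assuming the reference table fits in memory and can be loaded by a single call. ReferenceCache is a hypothetical name, and the loader stands in for whatever query reads the table:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Holds a read-only snapshot of a reference table and swaps it on every refresh. */
final class ReferenceCache<K, V> implements AutoCloseable {
    private final Supplier<Map<K, V>> loader;
    private final ScheduledExecutorService scheduler;
    private volatile Map<K, V> snapshot;

    ReferenceCache(Supplier<Map<K, V>> loader, long refreshPeriod, TimeUnit unit) {
        this.loader = loader;
        refresh(); // initial load at startup
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "reference-cache-refresh");
            t.setDaemon(true); // do not keep the JVM alive just for refreshing
            return t;
        });
        scheduler.scheduleAtFixedRate(this::refreshQuietly, refreshPeriod, refreshPeriod, unit);
    }

    /** Lookup against the in-memory snapshot; never touches the database. */
    V get(K key) {
        return snapshot.get(key);
    }

    /** Reloads the whole table and replaces the snapshot in one step. */
    void refresh() {
        snapshot = Collections.unmodifiableMap(new HashMap<>(loader.get()));
    }

    private void refreshQuietly() {
        try {
            refresh();
        } catch (RuntimeException e) {
            // Keep serving the previous snapshot; an escaping exception
            // would cancel all further scheduled refreshes.
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Readers only ever see a complete snapshot because the map reference is replaced atomically. The refresh period is the staleness you accept for the reference data.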

ORM Technologies vs JDBC?

My question is about ORM and JDBC technologies: on what criteria would you decide to go for an ORM technology rather than JDBC, and the other way round?
Thanks.

JDBC
With JDBC, the developer has to write code to map an object model's data representation to a relational data model and its corresponding database schema.
With JDBC, the mapping of Java objects to database tables and back has to be written manually by the developer.
JDBC supports only native Structured Query Language (SQL). The developer has to find an efficient way to access the database, i.e. to select the most effective of several queries that perform the same task.
An application using JDBC to handle persistent data (database tables) contains a large amount of database-specific code. The code written to map table data to application objects and vice versa actually maps table fields to object properties. When a table or the database changes, the object structure and the table-to-object/object-to-table mapping code have to be changed as well.
With JDBC, it is the developer's responsibility to handle the JDBC result set and convert it to Java objects in code in order to use this persistent data in the application. So with JDBC, mapping between Java objects and database tables is done manually.
With JDBC, caching has to be hand-coded.
JDBC does not check that every user works on up-to-date data. This check has to be added by the developer.
Hibernate
Hibernate is a flexible and powerful ORM solution for mapping Java classes to database tables. Hibernate itself takes care of this mapping using XML files, so the developer does not need to write code for it.
Hibernate provides transparent persistence, and the developer does not need to write code explicitly to map database table tuples to application objects during interaction with the RDBMS.
Hibernate provides a powerful query language, Hibernate Query Language (independent of the type of database), that is expressed in a familiar SQL-like syntax and includes full support for polymorphic queries. Hibernate also supports native SQL statements. It also selects an effective way to perform a database manipulation task for an application.
Hibernate provides this mapping itself. The actual mapping between tables and application objects is done in XML files. If the database or any table changes, only the XML file properties need to change.
Hibernate reduces the lines of code by maintaining the object-table mapping itself and returning results to the application in the form of Java objects. It relieves the programmer from manual handling of persistent data, hence reducing development time and maintenance cost.
With Hibernate's transparent persistence, a cache is set up in the application work space. Relational tuples are moved to this cache as the result of a query. It improves performance if the client application reads the same data many times for the same write. Automatic transparent persistence allows the developer to concentrate more on business logic rather than on this application code.
Hibernate enables the developer to define a version field in the application; thanks to this field, Hibernate updates the version column of the database table every time a relational tuple is updated in the form of a Java class object. So if two users retrieve the same tuple and modify it, and one user saves the modified tuple to the database, the version is automatically updated for this tuple by Hibernate. When the other user tries to save the updated tuple, Hibernate does not allow it because this user does not have up-to-date data.
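The version check described in the last paragraph can be illustrated without any ORM. The following is a plain-Java sketch of the idea only, not Hibernate's implementation: VersionedStore is a hypothetical in-memory stand-in for a table, and its update method mimics an UPDATE statement whose WHERE clause includes the version the caller originally read.

```java
import java.util.HashMap;
import java.util.Map;

/** A row value together with its version counter. */
final class VersionedRow {
    final String value;
    final int version;

    VersionedRow(String value, int version) {
        this.value = value;
        this.version = version;
    }
}

/** In-memory stand-in for a table; it illustrates the version check only. */
final class VersionedStore {
    private final Map<Long, VersionedRow> rows = new HashMap<>();

    synchronized void insert(long id, String value) {
        rows.put(id, new VersionedRow(value, 0));
    }

    synchronized VersionedRow read(long id) {
        return rows.get(id);
    }

    /**
     * Similar in spirit to:
     *   UPDATE t SET value = ?, version = version + 1 WHERE id = ? AND version = ?
     * Returns false when the caller's copy is stale (no row matched).
     */
    synchronized boolean update(long id, String newValue, int expectedVersion) {
        VersionedRow current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        rows.put(id, new VersionedRow(newValue, expectedVersion + 1));
        return true;
    }
}
```

If two users read the row at version 0, the first update succeeds and bumps the version to 1, while the second update, still carrying version 0, is rejected and has to re-read before retrying.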

Complexity.
ORM: if your application is domain-driven and the relationships among objects are complex, or you need the objects to define what the app does.
JDBC/SQL: if your application is simple enough to just present data directly from the database, or the relationships among the data are simple enough.
The book "Patterns of Enterprise Application Architecture" by Martin Fowler explains the differences between these two types much better:
See: Domain Model and Transaction Script

I think you forgot to look at "Functional Relational Mapping"
I would sum up by saying:
If you want to focus on the data-structures, use an ORM like JPA/Hibernate
If you want to focus on the processing, take a look at FRM libraries: QueryDSL or jOOQ
If you need to tune your SQL requests to specific databases, use JDBC and native SQL requests
The strength of the various "Relational Mapping" technologies is portability: you ensure your application will run on most ACID databases.
Otherwise, you will have to cope with differences between the various SQL dialects when you write the SQL requests manually.
Of course you can restrict yourself to the SQL92 standard (and then do some functional programming), or you can reuse some concepts of functional programming with ORM frameworks.
The ORM's strengths are built on a session object, which can act as a bottleneck:
it manages the lifecycle of the objects as long as the underlying database transaction is running.
it maintains a one-to-one mapping between your Java objects and your database rows (and uses an internal cache to avoid duplicate objects).
it automatically detects association updates and the orphan objects to delete.
it handles concurrency issues with optimistic or pessimistic locking.
Nevertheless, its strengths are also its weaknesses:
The session must be able to compare objects, so you need to implement the equals/hashCode methods.
But object equality must be rooted in "business keys" and not the database ID (new transient objects have no database ID!).
However, some reified concepts have no business equality (an operation for instance).
A common workaround relies on GUIDs, which tend to upset database administrators.
The session must track relationship changes, but its mapping rules push the use of collections unsuitable for the business algorithms.
Sometimes you would like to use a HashMap, but the ORM will require the key to be another "Rich Domain Object" instead of a light one...
Then you have to implement object equality on the rich domain object acting as a key...
But you can't, because this object has no counterpart in the business world.
So you fall back to a simple list that you have to iterate over (and performance issues result).
The ORM APIs are sometimes unsuitable for real-world use.
For instance, real-world web applications try to enforce session isolation by adding some "WHERE" clauses when you fetch data...
Then "Session.get(id)" doesn't suffice and you need to turn to a more complex DSL (HQL, Criteria API) or go back to native SQL.
The database objects conflict with other objects dedicated to other frameworks (like OXM frameworks = Object/XML Mapping).
For instance, suppose your REST services use the Jackson library to serialize a business object,
but this Jackson object maps exactly to a Hibernate one.
Then either you merge both, and a strong coupling between your API and your database appears,
or you must implement a translation, and all the code you saved with the ORM is lost there...
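The first weakness above, equality rooted in a business key rather than the database ID, can be sketched as follows. Customer and the choice of email as the business key are assumptions for illustration; no ORM annotations are shown.

```java
import java.util.Objects;

class Customer {
    private Long id;            // database ID, null until the object is persisted
    private final String email; // assumed business key: unique and never changes

    Customer(String email) {
        this.email = Objects.requireNonNull(email, "email");
    }

    Long getId() {
        return id;
    }

    void setId(Long id) {
        this.id = id;
    }

    /** Equality ignores the database ID, so it is stable before and after persisting. */
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Customer)) {
            return false;
        }
        return email.equals(((Customer) o).email);
    }

    @Override
    public int hashCode() {
        return email.hashCode();
    }
}
```

Because the hash code does not change when the ID is assigned, an object placed in a HashSet while transient can still be found there after it has been saved.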
On the other hand, FRM is a trade-off between "Object Relational Mapping" (ORM) and native SQL queries (with JDBC).
The best way to explain the differences between FRM and ORM is to adopt a DDD approach.
Object Relational Mapping empowers the use of "Rich Domain Objects", which are Java classes whose states are mutable during the database transaction.
Functional Relational Mapping relies on "Poor Domain Objects", which are immutable (so much so that you have to clone a new one each time you want to alter its content).
It releases the constraints put on the ORM session and relies most of the time on a DSL over SQL (so portability doesn't matter).
But on the other hand, you have to look into the transaction details and the concurrency issues yourself.
List<Person> persons = queryFactory.selectFrom(person)
.where(
person.firstName.eq("John"),
person.lastName.eq("Doe"))
.fetch();
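The immutable "Poor Domain Object" mentioned above could look like the following in plain Java. This is a sketch of the concept only, not an API of QueryDSL or jOOQ, and PersonName is a hypothetical class:

```java
/** Immutable value object: every "change" produces a new instance. */
final class PersonName {
    private final String firstName;
    private final String lastName;

    PersonName(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String firstName() {
        return firstName;
    }

    String lastName() {
        return lastName;
    }

    /** Returns a copy with a different last name; the original is left untouched. */
    PersonName withLastName(String newLastName) {
        return new PersonName(firstName, newLastName);
    }
}
```

Since no instance ever changes, there is nothing for a session to track; writing the new state back to the database is an explicit step in the calling code.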

It also depends on the learning curve.
Ebean ORM has a pretty low learning curve (simple API, simple query language) if you are happy enough with JPA annotations for mapping (@Entity, @Table, @OneToMany etc).
