I was wondering about a conceptual question. When accessing data in Spring Boot I see two approaches: writing SQL queries, or using a relatively unspecific JPA/CRUD repository query and then filtering the result in Java by accessing related entities via the entity object and lazy loading.
An example:
List<Car> unfilteredCars = carRepository.findAll();
List<Car> filteredByVendor = unfilteredCars.stream().filter( car -> car.getVendor().getId() == 3 ).toList();
I expect that an SQL query might be much more efficient, but SQL queries are much more difficult to test and harder to maintain as well. I have queries with up to 5 joins, which makes any change to the tables very painful. I am happy to sacrifice some efficiency for cleaner code and better maintainability.
What I am asking specifically is: is the decrease in computing speed significant enough that I should stick with SQL statements, or is it safe to use the Java filtering approach for medium-sized applications?
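To make the trade-off concrete, here is a plain-Java simulation (no real JPA or database involved). A hypothetical FakeCarRepository counts how many rows leave the "database" when the vendor filter runs in Java after findAll() versus inside the query itself; the vendor association is flattened into a vendorId field for brevity. In Spring Data JPA the second variant should not need handwritten SQL: a derived query method such as List<Car> findByVendorId(Long vendorId) declared on the repository interface puts the filter into the generated query.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simulation (not real JPA): compares filtering after findAll() with letting the
 * data store apply the filter, by counting how many rows are shipped to the application.
 */
final class FilterPlacementDemo {
    record Car(long id, long vendorId) {}

    /** Hypothetical stand-in for a Spring Data repository; counts the rows it returns. */
    static final class FakeCarRepository {
        private final List<Car> table = new ArrayList<>();
        long rowsTransferred = 0;

        FakeCarRepository(int carCount, int vendorCount) {
            for (int i = 0; i < carCount; i++) {
                table.add(new Car(i, i % vendorCount)); // cars spread evenly across vendors
            }
        }

        /** Like findAll(): every row is shipped to the application. */
        List<Car> findAll() {
            rowsTransferred += table.size();
            return List.copyOf(table);
        }

        /** Like a derived query findByVendorId(id): only matching rows are shipped. */
        List<Car> findByVendorId(long vendorId) {
            List<Car> matches = table.stream().filter(c -> c.vendorId() == vendorId).toList();
            rowsTransferred += matches.size();
            return matches;
        }
    }

    public static void main(String[] args) {
        FakeCarRepository inMemory = new FakeCarRepository(10_000, 100);
        List<Car> viaJava = inMemory.findAll().stream().filter(c -> c.vendorId() == 3).toList();

        FakeCarRepository pushedDown = new FakeCarRepository(10_000, 100);
        List<Car> viaQuery = pushedDown.findByVendorId(3);

        System.out.println("filter in Java:  " + viaJava.size() + " cars, "
                + inMemory.rowsTransferred + " rows transferred");
        System.out.println("filter in query: " + viaQuery.size() + " cars, "
                + pushedDown.rowsTransferred + " rows transferred");
    }
}
```

Both variants return the same 100 cars; the difference is that the first one moves all 10,000 rows (and, with lazy loading, possibly extra queries per row) into the JVM first. How much that matters depends on table size, not on the size of the filtered result.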
I am working on a project that uses JPA ORM, and the framework provides two kinds of methods to create queries.
entityManager.createQuery(query1);
entityManager.createNativeQuery(query2);
I understand what kind of query string has to be passed to each of them, but I don't know exactly why we would need to create a native query. Is it because we don't want to use the ORM capabilities there?
You do not need to create a native query unless you want to. JPQL is eventually translated into SQL by the framework, but the framework also lets you run native queries. Why would you want to do that?
Low-level access: you can optimize and handle the mapping yourself. With SQL you actually access the database tables, while with JPQL you access the entity objects.
Maybe you do not want to learn JPQL if you already know SQL.
You already have the queries written in SQL and do not have the resources or time to port them to JPQL.
createQuery uses JPA's own query language: you select from class names instead of table names. This is not SQL, it is just similar, and it is later translated into real SQL. Mapping to Java classes is done automatically, and actual class instances are returned as the result.
createNativeQuery uses real SQL and cannot use JPA query features. It is generally used when you need to do something really odd that JPA does not support. Unless you pass a result class (createNativeQuery(sql, SomeEntity.class)) or a result-set mapping, a list of Object[] rows is returned, and mapping to Java objects has to be done manually. In other words, it's just like working with a DB before JPA came along, only slightly more convenient since connection handling is done automatically.
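The manual mapping mentioned above can be sketched in plain Java. Rows are simulated as the List<Object[]> a native query without a result class or result-set mapping would return; the CarRow DTO and the SELECT list it assumes are hypothetical. Numeric columns go through Number because, depending on the JDBC driver and column type, the same column may arrive as Long, BigInteger or BigDecimal.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.List;

/**
 * Sketch of mapping native-query rows by hand: each row is an Object[] whose
 * positions follow the SELECT list of a hypothetical "SELECT id, model, price FROM car".
 */
final class NativeRowMapping {
    record CarRow(long id, String model, double price) {}

    /** Maps one row; positions must match the SELECT list exactly. */
    static CarRow toCarRow(Object[] row) {
        long id = ((Number) row[0]).longValue();        // Long, BigInteger, ... depending on driver
        String model = (String) row[1];
        double price = ((Number) row[2]).doubleValue(); // BigDecimal for NUMERIC columns, for example
        return new CarRow(id, model, price);
    }

    static List<CarRow> mapAll(List<Object[]> rows) {
        return rows.stream().map(NativeRowMapping::toCarRow).toList();
    }

    public static void main(String[] args) {
        // Stand-in for query.getResultList(); the values are made up.
        List<Object[]> rows = List.of(
                new Object[] {BigInteger.valueOf(1), "Model A", new BigDecimal("19999.50")},
                new Object[] {2L, "Model B", 24500.0});
        mapAll(rows).forEach(System.out::println);
    }
}
```

Every column added to or reordered in the SELECT list means revisiting toCarRow, which is the maintenance cost the answer refers to.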
I have used it for optimization purposes. Using native queries means that the ORM mapping is not in place, and instead of JPQL you use the DB's native syntax. So, as @RasmusFranke also pointed out, native queries are the way out if you need something that JPA does not support (like DB-vendor-specific extensions, which is conceptually a bad idea since JPA is all about being DB-agnostic, but happens nevertheless. I know...).
The other effect of this is that by using native queries, only the supplied query is run. No eager fetching of other entities, or other unwanted stuff. This way, if you deal with huge amounts of objects, you can save some heap space.
I am a fan of ORM (Object Relational Mapping) and have been using it with Rails for the past year and a half. Prior to that, I used to write raw queries using JDBC and make the database do the heavy lifting via stored procedures. With ORM, I was initially happy to do things like coach.manager and manager.coaches, which were very simple and easy to read.
But as time went by, numerous associations crept in and I ended up writing a.b.c.d chains that were firing queries in all directions behind the scenes. With Rails and Ruby, the garbage collector went nuts and took an insane amount of time to load a very complex page that involves relatively little data. I had to replace this ORM-style code with a simple stored procedure, and the improvement was enormous: a page that took 50 seconds to load now takes only 2 seconds.
With such a huge difference, should I continue using ORM? It clearly has severe overhead compared to a raw query.
In general, what are the pitfalls of using an ORM framework like Hibernate or ActiveRecord?
An ORM is only a tool. If you don't use it correctly, you'll have bad results.
Nothing stops you from using dedicated HQL/criteria queries, with fetch joins or projections, to return the information that your page must display in as few queries as possible. This will take more or less the same time as dedicated SQL queries.
But of course, if you just get everything by ID and navigate through your objects without realizing how many queries it generates, it will lead to long loading times. The key is to know exactly what the ORM does behind the scenes, and decide whether that is appropriate or another strategy must be adopted.
I think you've already identified the major tradeoff associated with ORM software. Every time you add a new layer of abstraction that tries to provide a generalized implementation of something that you used to do by hand there is going to be some loss of performance/efficiency.
As you noted, traversing multiple relationships such as a.b.c.d can be inefficient, because most ORM software will be doing an independent database query for each . along the way. But I'm not sure that means you should eliminate ORM altogether. Most ORM solutions (or at least, certainly Hibernate) allow you to specify custom queries where you can bring back exactly what you want in a single database operation. This should be about as fast as your dedicated SQL.
Really the issue is about understanding how the ORM layer is working behind the scenes, and realizing that while something like a.b.c.d is simple to write, what it causes the ORM layer to do as it is evaluated is not. As a general rule I always go with the simplest possible approach to begin, and then write optimized queries in areas where it makes sense/where it is obvious that the simple approach will not scale.
I'd say, one should use the appropriate tool for different tasks.
For example, for CRUD operations, ORM frameworks like Hibernate can speed up development and will perform well enough. Sometimes you need some tweaks to achieve acceptable performance. I'm not convinced that your task (the one that took 50 seconds with your ORM) could not be done properly with an ORM, but you did not give us the details.
On the other hand, bulk operations involving hundreds of thousands of records, for example, are not the kind of task you'd expect Hibernate to handle without a significant performance penalty.
As was already mentioned, an ORM is only a tool and you can use it either well or badly.
One of the most typical performance problems in ORMs is the 1+N queries problem. It is caused by loading additional objects separately for each object in a list, typically through fetching the 1-to-n related entities of every element one by one. The remedies are HQL queries (for example with fetch joins), specifying the needed fields in a projection, or marking 1-to-n relations as lazily fetched.
You must always know exactly what the ORM is doing in order to achieve good performance. Not understanding what operations happen in the background is a path to disaster (slow, buggy and hard-to-analyze code full of unnecessary and badly written workarounds).
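The 1+N pattern can be illustrated without any ORM by counting round trips against a fake data store. The sketch below is a simulation with hypothetical Author/Book records: the first method navigates author to books one author at a time, the second fetches everything in one joined query, which is roughly what a JPQL fetch join such as SELECT DISTINCT a FROM Author a JOIN FETCH a.books (entity names hypothetical) asks the provider to do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java simulation of the 1+N problem (no real ORM involved): a fake database counts
 * round trips for (a) loading N authors and then each author's books separately versus
 * (b) loading authors and books with a single joined query.
 */
final class NPlusOneDemo {
    record Author(long id) {}
    record Book(long authorId, String title) {}

    /** Hypothetical data store that counts how many queries it answers. */
    static final class FakeDatabase {
        private final List<Author> authors = new ArrayList<>();
        private final List<Book> books = new ArrayList<>();
        int queryCount = 0;

        FakeDatabase(int authorCount, int booksPerAuthor) {
            for (long a = 0; a < authorCount; a++) {
                authors.add(new Author(a));
                for (int b = 0; b < booksPerAuthor; b++) books.add(new Book(a, "book-" + b));
            }
        }

        List<Author> selectAuthors() {
            queryCount++;
            return List.copyOf(authors);
        }

        List<Book> selectBooksOf(long authorId) {
            queryCount++;
            return books.stream().filter(book -> book.authorId() == authorId).toList();
        }

        /** One query that returns every author's books, grouped by author id (the JOIN). */
        Map<Long, List<Book>> selectAuthorsJoinBooks() {
            queryCount++;
            Map<Long, List<Book>> byAuthor = new HashMap<>();
            for (Author author : authors) byAuthor.put(author.id(), new ArrayList<>());
            for (Book book : books) byAuthor.get(book.authorId()).add(book);
            return byAuthor;
        }
    }

    /** Navigating author -> books one author at a time: 1 + N queries. */
    static int countBooksOneByOne(FakeDatabase db) {
        int total = 0;
        for (Author author : db.selectAuthors()) total += db.selectBooksOf(author.id()).size();
        return total;
    }

    /** Fetching the whole graph up front: 1 query. */
    static int countBooksJoined(FakeDatabase db) {
        return db.selectAuthorsJoinBooks().values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        FakeDatabase oneByOne = new FakeDatabase(50, 3);
        FakeDatabase joined = new FakeDatabase(50, 3);
        System.out.println("1+N:    " + countBooksOneByOne(oneByOne) + " books, " + oneByOne.queryCount + " queries");
        System.out.println("joined: " + countBooksJoined(joined) + " books, " + joined.queryCount + " queries");
    }
}
```

Both strategies find the same 150 books, but the first needs 51 round trips and the second needs one. Against a remote database, each extra round trip adds network latency, which is why a.b.c.d navigation over many rows gets slow.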
I'm with Petar from your comments regarding lazy fetching. Say you have an HTML table filled with fields from a.b.c.d. You could find your framework round-tripping to the database thousands of times (possibly many more). The disadvantage of ORM in this case is that you have to read the documentation thoroughly. Most frameworks support disabling lazy fetching, and many even support adding your own processing logic to bind the data set.
The net result is that almost any ORM is undoubtedly better than anything you are going to write yourself. Otherwise you will find yourself saddled with maintaining huge libraries of boilerplate or, worse, writing the same code over and over again.
We are currently investigating a switch from our own data store layer, with a clean separation of transfer objects and data access objects, to JPA. We used a generator to create the TOs, the DAOs and the SQL DDL from documentation in DocBook format. This way our documentation, the database structure and the generated Java classes were always in sync, with good documentation of the database itself.
What we discovered so far by using JPA:
Foreign key references cannot be used for imports, some special queries and so on, because they must not be placed in a managed entity. JPA only allows the target class there.
Access to some user session scope is difficult up to impossible. We still have no clue how to get the user's id into the column 'userWhoLastMadeAnUpdate' in a PrePersist method.
Something expected to be quite easy with an ORM, namely "class mapping", does not work at all. We are using HalDateTime (http://sourceforge.net/projects/haldatetime/) internally, especially in the client. Mapping it with JPA directly is not possible although HalDateTime supports it. Due to JPA restrictions we have to use two fields in the entity.
JPA can use an XML file to describe the mapping. Then you have to look into at least two files to even understand the relationship between the Java class and the database, and the XML file becomes huge for large applications.
Alternatively, ORMs provide annotations in the Java class itself. This makes the relationship easier to learn and understand, but it forces you to see all that database stuff in the client layer (which completely breaks proper layering).
You will have to restrict yourself to staying as close to a clean database structure as possible. Otherwise you will certainly end up with a mess of queries and statements generated by the ORM.
Use an ORM that provides a query language close to SQL itself (JPA seems quite acceptable here). An ORM-specific query language makes supporting a large application really expensive.
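For the 'userWhoLastMadeAnUpdate' problem from the list above, one common workaround is a ThreadLocal holder that is filled per request and read from the lifecycle callback. The sketch below assumes each request is handled on a single thread; CurrentUser, Invoice, asUser and the "system" fallback are hypothetical names, and the JPA annotations are left as a comment so the code runs without a persistence provider.

```java
import java.util.function.Supplier;

/**
 * Sketch: make the current user visible to an entity lifecycle callback through a ThreadLocal.
 * In a real JPA entity, prePersist() would be annotated @PrePersist (and @PreUpdate).
 */
final class AuditUserDemo {
    /** Hypothetical holder, filled at the start of each request and cleared at the end. */
    static final class CurrentUser {
        private static final ThreadLocal<String> USER = new ThreadLocal<>();

        static void set(String userId) { USER.set(userId); }
        static String get() { return USER.get(); }
        static void clear() { USER.remove(); } // avoid leaking a user to the next task on a pooled thread
    }

    /** Hypothetical entity with the audit column from the question. */
    static final class Invoice {
        String userWhoLastMadeAnUpdate;

        // @PrePersist @PreUpdate  (omitted so the sketch runs without a JPA provider)
        void prePersist() {
            String user = CurrentUser.get();
            userWhoLastMadeAnUpdate = (user != null) ? user : "system"; // hypothetical fallback
        }
    }

    /** Runs work on behalf of a user and always cleans up the ThreadLocal afterwards. */
    static <T> T asUser(String userId, Supplier<T> work) {
        CurrentUser.set(userId);
        try {
            return work.get();
        } finally {
            CurrentUser.clear();
        }
    }

    public static void main(String[] args) {
        Invoice invoice = new Invoice();
        asUser("alice", () -> { invoice.prePersist(); return invoice; });
        System.out.println(invoice.userWhoLastMadeAnUpdate);
    }
}
```

In a web application, the set/clear pair would typically live in a servlet filter or interceptor rather than in an explicit asUser call; the design choice is simply that the entity reads ambient context instead of receiving the user as a parameter.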
We are currently evaluating options for migrating from a hand-written persistence layer to an ORM.
We have a bunch of legacy persistent objects (~200) that implement a simple interface like this:
interface JDBC {
public long getId();
public void setId(long id);
public void retrieve();
public void setDataSource(DataSource ds);
}
When retrieve() is called, the object populates itself by issuing handwritten SQL queries to the connection provided, using the ID it received in the setter (this is usually the only parameter to the query). It manages its statements, result sets, etc. itself. Some of the objects have special flavors of the retrieve() method, like retrieveByName(); in this case a different SQL query is issued.
Queries can be quite complex: we often join several tables to populate the sets representing relations to other objects, and sometimes join queries are issued on demand in a specific getter (lazy loading). So basically, we have implemented most of an ORM's functionality manually.
The reason for that was performance. We have very strong speed requirements, and back in 2005 (when this code was written) performance tests showed that none of the mainstream ORMs was as fast as hand-written SQL.
The problems we are facing now that make us think of ORM are:
Most of the paths in this code are well-tested and are stable. However, some rarely-used code is prone to result set and connection leaks that are very hard to detect
We are currently squeezing some additional performance by adding caching to our persistence layer and it's a huge pain to maintain the cached objects manually in this setup
Support of this code when DB schema changes is a big problem.
I am looking for advice on what could be the best alternative for us. As far as I know, ORMs have advanced in the last 5 years, so there may now be one that offers acceptable performance. As I see it, we need to address these points:
Find some way to reuse at least some of the written SQL to express mappings
Have the possibility to issue native SQL queries without the necessity to manually decompose their results (i.e. avoid manual rs.getInt(42) as they are very sensitive to schema changes)
Add a non-intrusive caching layer
Keep the performance figures.
Is there any ORM framework you could recommend with regards to that?
UPDATE: To give a feeling for the kind of performance figures we are talking about:
The backend database is TimesTen, an in-memory database that runs on the same machine as the JVM.
We found that changing rs.getInt("column1") to rs.getInt(42) brings a performance increase we consider significant.
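One way to keep the speed of positional access while avoiding hard-coded positions like 42 is to resolve each column name once per result set and then read every row by index. With real JDBC that lookup would be rs.findColumn("column1") called before the while (rs.next()) loop, followed by rs.getInt(idx) inside it. The sketch below simulates a result set as a list of column labels plus Object[] rows, so it runs without a database; note that JDBC indexes are 1-based, while the simulated array positions are 0-based.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Sketch: look column positions up by name once per result set, then use positional
 * access in the row loop. Simulated rows stand in for a JDBC ResultSet.
 */
final class ColumnIndexCache {
    private final Map<String, Integer> positions = new HashMap<>();

    /** Builds the name -> position map; case-insensitive, as JDBC column labels are. */
    ColumnIndexCache(List<String> columnLabels) {
        for (int i = 0; i < columnLabels.size(); i++) {
            positions.put(columnLabels.get(i).toLowerCase(Locale.ROOT), i);
        }
    }

    /** Fails fast at setup time if a column was renamed or dropped, instead of reading the wrong one. */
    int indexOf(String column) {
        Integer idx = positions.get(column.toLowerCase(Locale.ROOT));
        if (idx == null) throw new IllegalArgumentException("No such column: " + column);
        return idx;
    }

    public static void main(String[] args) {
        List<String> header = List.of("ID", "NAME", "COLUMN1");
        List<Object[]> rows = List.of(new Object[] {1, "a", 10}, new Object[] {2, "b", 32});

        ColumnIndexCache columns = new ColumnIndexCache(header);
        int column1 = columns.indexOf("column1"); // resolved once, outside the row loop

        int sum = 0;
        for (Object[] row : rows) sum += (Integer) row[column1]; // positional access in the hot loop
        System.out.println(sum);
    }
}
```

Whether this recovers all of the measured gain on TimesTen is something only a benchmark on that setup can show; the point is that a schema change then breaks loudly in indexOf rather than silently shifting positions.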
If you want a standard persistence layer that lets you issue native SQL queries, consider using iBATIS. It's a fairly thin mapping between your objects and SQL. http://ibatis.apache.org/
For caching and lazy joins, Hibernate might be a better choice. I haven't used iBATIS for these purposes.
Hibernate provides a lot of flexibility in allowing you to specify certain defaults for lazy loading as you traverse your object graph, yet also pre-fetch data with SQL or HQL queries to your heart's content when you need better-known load times. However, the conversion effort will be complicated for you as it has a fairly high bar to entry in terms of learning and configuration. Annotations made this easier for me.
Two benefits you didn't mention about switching to a standard framework:
(1) running down bugs becomes easier when you have a wealth of sites and forums out there to support you.
(2) new hires are cheaper, easier and faster.
Good luck in addressing your performance and usability issues. The tradeoffs you point out are very common. Sorry if I evangelized.
For the bulk of your queries, I'd go with Hibernate. It's widely used, well documented, and generally performant. You can drop down to hand-written SQL if Hibernate isn't producing efficient enough queries. Hibernate gives you a lot of control over the table and column names that the domain objects map to, and in most cases you can retrofit it to an existing schema.
Find some way to reuse at least some of the written SQL to express mappings
The mappings are expressed in JPA using annotations. You can use the existing SQL as a guide when creating JPQL queries.
Add a non-intrusive caching layer
Caching in hibernate is automatic and transparent, unless you specifically choose to get involved. You can mark entities as read only, or evict from the cache, control when changes are flushed to the database (inside a transaction of course - automatic use of batching improves performance when network latency is a concern.)
Have the possibility to issue native SQL queries without the necessity to manually decompose their results (i.e. avoid manual rs.getInt(42) as they are very sensitive to schema changes)
Hibernate allows you to write SQL and have it mapped to your entities. You don't deal with the ResultSet directly; Hibernate takes care of deconstructing it into your entity. See Chapter 16, Native SQL, in the Hibernate manual.
Support of this code when DB schema changes is a big problem.
Managing schema changes can still be a pain, since you now effectively have two schemata: the database schema and the JPA mapping (an object schema). If you choose to let Hibernate generate the DB schema and move your data to that, you are no longer directly responsible for what goes into the database, and you are then faced with managing automatic changes to a machine-generated schema. There are tools that can assist, such as dbmigrate and Liquibase, but it's no walk in the park. Conversely, if you manage the DB schema by hand, you will have to carefully recraft your entities, JPA annotations and queries to accommodate the schema changes. Adding columns and new entities is relatively trivial, but more complex changes, such as changing a single property to a collection of properties or restructuring an object hierarchy, will involve considerably more extensive changes. There is no easy way out of this: either the DB or Hibernate is the "master" that decides the schema, and when one changes, the other must follow. The code changes aren't so bad; in my experience, it's migrating the data that's difficult. But this is a basic issue with databases and will be present in any solution you choose.
So, to sum up, I'd go with Hibernate and use the JPA interface.
I've recently drilled through a bunch of Java ORMs and didn't come up with anything much better than Hibernate. Hibernate's performance may get you there and satisfy your performance goals.
Lots of people think that moving to Hibernate will make everything so awesome, but it's really just moving a set of problems from JDBC queries into Hibernate tuning. Read a bunch of books or (better) hire a "Hibernate guy" to come in and help.
During your refactor, I'd recommend using JPA so you can unplug and re-plug a new persistence provider when the Next Big Thing comes along (or you move to Oracle).
Do you really need to migrate? What's forcing you to move? Is there some REAL need here or someone just inventing work (an 'Astronaut architect')?
I agree with the above answers though - if you HAVE to move - Hibernate or iBatis are good choices. iBatis especially if you want to stay 'closer' to the SQL.
If you need more performance: drop the database (for on-line work) and handle the persistence direct. Adding caching is not going to help you with a TimesTen DB, it just adds an extra copy (slowing you down).
You might want to take a look at GemFire.
There is a lot of good advice already in here that I won't repeat. The only thing I didn't see suggested that might work for you is caching reference data in memory.
I have done quite a bit of this in the past and it does save a lot of time. If you have a large number of fairly static reference tables, load them all into memory at startup time and refresh them every couple minutes. That way you're not hitting the DB over and over again for data that never changes.
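The "load at startup, refresh every couple of minutes" approach can be sketched with the JDK alone. In this sketch the loader is a hypothetical Supplier standing in for something like SELECT code, name FROM country; readers always get an immutable snapshot, and a daemon scheduler swaps in a fresh copy at a fixed rate.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Sketch of an in-memory cache for small, rarely changing reference tables:
 * everything is loaded at startup, readers never touch the database, and a
 * background task swaps in a fresh copy at a fixed interval.
 */
final class ReferenceDataCache<K, V> implements AutoCloseable {
    private final Supplier<Map<K, V>> loader; // hypothetical: would run e.g. SELECT code, name FROM country
    private final AtomicReference<Map<K, V>> snapshot = new AtomicReference<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(task -> {
        Thread t = new Thread(task, "reference-data-refresh");
        t.setDaemon(true); // refreshes alone should not keep the JVM alive
        return t;
    });

    ReferenceDataCache(Supplier<Map<K, V>> loader, long refreshPeriod, TimeUnit unit) {
        this.loader = loader;
        snapshot.set(Map.copyOf(loader.get())); // initial load at startup
        scheduler.scheduleAtFixedRate(this::refresh, refreshPeriod, refreshPeriod, unit);
    }

    /** Reloads the table and replaces the snapshot atomically: readers see the old or the new copy, never a mix. */
    void refresh() {
        try {
            snapshot.set(Map.copyOf(loader.get()));
        } catch (RuntimeException e) {
            // Keep serving the previous snapshot; an exception escaping here would also
            // cancel all further scheduled runs of this task.
        }
    }

    V get(K key) {
        return snapshot.get().get(key);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        Map<String, String> countryTable = new ConcurrentHashMap<>(Map.of("DE", "Germany"));
        try (ReferenceDataCache<String, String> countries =
                new ReferenceDataCache<>(() -> countryTable, 2, TimeUnit.MINUTES)) {
            System.out.println(countries.get("DE"));
            countryTable.put("FR", "France");
            countries.refresh(); // normally triggered by the scheduler
            System.out.println(countries.get("FR"));
        }
    }
}
```

Replacing the whole map instead of updating entries in place keeps reads lock-free and consistent; the cost is a full reload per refresh, which is fine for small reference tables but not for large ones.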
It's a pretty open ended question. I'll be starting out a new project and am looking at different ORMs to integrate with database access.
Do you have any favorites?
Are there any you would advise staying clear of?
I have stopped using ORMs.
The reason is not any great flaw in the concept. Hibernate works well. Instead, I have found that queries have low overhead and I can fit lots of complex logic into large SQL queries, and shift a lot of my processing into the database.
So consider just using the JDBC package.
None, because having an ORM takes too much control away with small benefits. The time savings gained are easily blown away when you have to debug abnormalities resulting from the use of the ORM. Furthermore, ORMs discourage developers from learning SQL and how relational databases work and using this for their benefit.
Many ORMs are great, but you need to know why you want to add abstraction on top of JDBC. I can recommend http://www.jooq.org to you (disclaimer: I'm the creator of jOOQ, so this answer is biased). jOOQ embraces the following paradigm:
SQL is a good thing. Many things can be expressed quite nicely in SQL. There is no need for complete abstraction of SQL.
The relational data model is a good thing. It has proven the best data model for the last 40 years. There is no need for XML databases or truly object oriented data models. Instead, your company runs several instances of Oracle, MySQL, MSSQL, DB2 or any other RDBMS.
SQL has a structure and syntax. It should not be expressed using "low-level" String concatenation in JDBC - or "high-level" String concatenation in HQL - both of which are prone to hold syntax errors.
Variable binding tends to be very complex when dealing with major queries. THAT is something that should be abstracted.
POJOs are great when writing Java code manipulating database data.
POJOs are a pain to write and maintain manually. Code generation is the way to go. You will have compile-safe queries including datatype safety.
The database comes first. While the application on top of your database may change over time, the database itself is probably going to last longer.
Yes, you do have stored procedures and user-defined types (UDTs) in your legacy database. Your database tool should support that.
There are many other good ORMs. Especially Hibernate or iBATIS have a great community. But if you're looking for an intuitive, simple one, I'll say give jOOQ a try. You'll love it! :-)
Check out this example SQL:
-- Select authors with books that are sold out
SELECT *
FROM T_AUTHOR a
WHERE EXISTS (SELECT 1
FROM T_BOOK
WHERE T_BOOK.STATUS = 'SOLD OUT'
AND T_BOOK.AUTHOR_ID = a.ID);
And how it can be expressed in jOOQ:
// Alias the author table
TAuthor a = T_AUTHOR.as("a");
// Use the aliased table in the select statement
create.selectFrom(a)
.whereExists(create.selectOne()
.from(T_BOOK)
.where(T_BOOK.STATUS.equal(TBookStatus.SOLD_OUT)
.and(T_BOOK.AUTHOR_ID.equal(a.ID))));
Hibernate, because it's basically the de facto standard in Java and was one of the driving forces in the creation of JPA. It has excellent support in Spring, and almost every Java framework supports it. Finally, GORM is a really cool wrapper around it that provides dynamic finders and so on using Groovy.
It's even been ported to .NET (NHibernate) so you can use it there too.
Hibernate, because it:
is stable - being around for so many years, it lacks any major problems
dictates the standards in the ORM field
implements the standard (JPA), in addition to dictating it.
has tons of information about it on the Internet. There are many tutorials, common problem solutions, etc
is powerful - you can translate a very complex object model into a relational model.
supports every major and medium-sized RDBMS
is easy to work with, once you learn it well
A few points on why (and when) to use ORM:
you work with objects in your system (if your system has been designed well). Even if you use JDBC, you will end up writing some translation layer to transfer your data into your objects. But my bet is that Hibernate is better at translation than any custom-made solution.
it doesn't deprive you of control. You can control things in very small details, and if the API doesn't have some remote feature - execute a native query and you have it.
any medium-sized or bigger system can't afford having one ton of queries (be it at one place or scattered across), if it aims to be maintainable
if performance isn't critical. Hibernate adds performance overhead, which in some cases can't be ignored.
I would recommend using MyBatis. It is a thin layer on top of JDBC, it is very easy to map objects to tables and still use plain SQL, everything is under your control.
I had a really good experience with Avaje Ebean when I was writing a medium sized JavaSE application.
It uses standard JPA annotations to define entities, but exposes a much simpler API (no EntityManager or any of that attached/detached entities crap). It also lets you easily use SQL queries or even plain JDBC calls when necessary.
It also has a very nice fluent and type-safe API for queries. You can write things like:
List<Person> boys = Ebean.find(Person.class)
.where()
.eq("gender", "M")
.le("age", 18)
.orderBy("firstName")
.findList();
SimpleORM, because it is straight-forward and no-magic. It defines all meta data structures in Java code and is very flexible.
SimpleORM provides similar functionality to Hibernate by mapping data in a relational database to Java objects in memory. Queries can be specified in terms of Java objects, object identity is aligned with database keys, relationships between objects are maintained and modified objects are automatically flushed to the database with optimistic locks.
But unlike Hibernate, SimpleORM uses a very simple object structure and architecture that avoids the need for complex parsing, byte code processing etc. SimpleORM is small and transparent, packaged in two jars of just 79K and 52K in size, with only one small and optional dependency (Slf4j). (Hibernate is over 2400K plus about 2000K of dependent Jars.) This makes SimpleORM easy to understand and so greatly reduces technical risk.
Eclipse Link, for many reasons, but notably I feel like it has less bloat than other main stream solutions (at least less in-your-face bloat).
Oh and Eclipse Link has been chosen to be the reference implementation for JPA 2.0
While I share the concerns regarding Java replacements for free-form SQL queries, I really do think people criticizing ORM are doing so because of a generally poor application design.
True OOD is driven by classes and relationships, and ORM gives you consistent mapping of different relationship types and objects.
If you use an ORM tool and end up coding query expressions in whatever query language the ORM framework supports (including, but not limited to, Java expression trees, query methods, OQL etc.), you are definitely doing something wrong, i.e. your class model most likely doesn't support your requirements in the way it should. A clean application design doesn't really need queries on the application level. I've refactored many projects that people started out using an ORM framework in the same way they were used to embedding SQL string constants in their code, and in the end everyone was surprised at how simple and maintainable the whole application becomes once you match up your class model with the usage model. Granted, for things like search functionality you need a query language, but even then queries are so constrained that creating even a complex VIEW and mapping it to a read-only persistent class is much nicer to maintain and look at than building expressions in some query language in your application code. The VIEW approach also leverages database capabilities and, via materialization, can perform much better than any hand-written SQL in your Java source.
So, I don't see any reason for a non-trivial application NOT to use ORM.