JPA for querying in-memory data - java

During an import procedure, I read tabular data from Excel sheets.
I want to perform some operations on this data: sort, search by key, filter, etc.
Is it possible to reliably perform these kinds of operations via JPQL?
Thanks.

I would use Space4J and simple Collections for something like this.
Personally, I think all these ORM abstractions and layers are a distraction in small-scale problem domains, and are inflexible, added complexity at the other extreme.

You could use an in-memory database like HSQLDB, fill it, then use JPA to query the data. But wouldn't using some Maps and Lists be sufficient?
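As a rough illustration of the "Maps and Lists" route, here is a minimal sketch using only the Java standard library. The SheetRow type and its fields (key, category, amount) are hypothetical stand-ins for one imported spreadsheet line, not something from the question.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical row type standing in for one line of the imported sheet.
record SheetRow(String key, String category, double amount) {}

class InMemoryRows {

    // Sort rows by amount, ascending, without modifying the input list.
    static List<SheetRow> sortByAmount(List<SheetRow> rows) {
        return rows.stream()
                .sorted(Comparator.comparingDouble(SheetRow::amount))
                .collect(Collectors.toList());
    }

    // Keep only the rows that belong to one category.
    static List<SheetRow> filterByCategory(List<SheetRow> rows, String category) {
        return rows.stream()
                .filter(r -> r.category().equals(category))
                .collect(Collectors.toList());
    }

    // Build an index for lookup by key.
    // Assumes keys are unique; toMap throws IllegalStateException on duplicates.
    static Map<String, SheetRow> indexByKey(List<SheetRow> rows) {
        return rows.stream()
                .collect(Collectors.toMap(SheetRow::key, Function.identity()));
    }

    public static void main(String[] args) {
        List<SheetRow> rows = List.of(
                new SheetRow("A1", "fruit", 3.5),
                new SheetRow("B2", "tools", 12.0),
                new SheetRow("C3", "fruit", 1.25));
        System.out.println(sortByAmount(rows));
        System.out.println(filterByCategory(rows, "fruit"));
        System.out.println(indexByKey(rows).get("B2"));
    }
}
```

For a one-off import this is often enough. An in-memory database starts to pay off mainly when the queries become ad hoc or need joins across several sheets.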

DataNucleus has its own in-memory query evaluator for JDOQL and JPQL syntax. I've never used it outside of a full JDO/JPA persistence environment, but there's no real reason why it couldn't be made to work with a little coding.

If you are using EclipseLink, you can execute most queries in memory against the cache.
You just need to read everything into the cache, then execute a JPQL or Criteria query with the "CheckCacheOnly" option.
See:
http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Caching/Query_Options

Related

Is it really needed to use Spring Data JPA Named Queries?

After searching the web, I think that Spring Data JPA Named Queries need some extra implementation or definitions compared to the derived or dynamic queries in Spring Data JPA. Given that, I am wondering: do we really need to use Spring Data JPA Named Queries?
Spring Data derived queries are intended (and useful) only for very simple queries: those where you look at the name you would naturally give such a method and immediately know how to implement it in SQL or JPQL.
As soon as a query gets a little more complex, we shouldn't use derived queries anymore, and often we couldn't even if we wanted to. For example, query derivation has no way to control the precedence between AND and OR.
For all other queries we need to code the query explicitly one way or another. And if you don't want your queries mixed with your repository, a named query is a very viable alternative.

Hibernate - join tables without NamedNativeQuery

While joining multiple tables in my project using Hibernate JPA/Spring (annotation driven), I had to use the NamedNativeQuery annotation to extract a result set spanning multiple tables. This may be a question of merely academic merit, but given that I am starting out with Hibernate: is there another way to achieve table joins without having to fall back on queries native to the database dialect?
Yes. I believe this is exactly what you need: https://docs.jboss.org/hibernate/entitymanager/3.5/reference/en/html/querycriteria.html#querycriteria-tuple
Criteria queries are a way of building a complete query using only the API. If I were you, I'd give it a try.
By the way, according to your question, the only reason for using native queries is to retrieve a specific set of columns. If that is the case, you can write it in HQL as well. The query doesn't necessarily need to be native.

How many ways does Hibernate provide to access the database?

How many ways does Hibernate provide to access the database?
For example, I want to perform CRUD operations on an object in the database. I found these:
Using session from SessionFactory:
session.save(object);
...
Using Hibernate Query Language.
Using Hibernate Criteria Queries.
Using Native SQL.
But I don't know which I should use. Please list your practices for accessing the database in decreasing order of priority, and the reasons for them.
Thank you.
If you have an ID and want the associated entity, then use Session.get(). It's efficient, and makes use of the first-level cache to avoid re-executing the query again and again.
If you need to get entities via other criteria (like all the users with a given first name, for example), then use JPQL queries. They are simple to write, very readable, and have fewer limitations than criteria queries.
If you need to take various optional criteria (like for a complex search form), the criteria API is the tool for the job. But it can't do everything a JPQL query does. There are other APIs available, and you can relatively easily write an API that generates dynamic JPQL queries if needed.
If you have a really complex query that can't be expressed using JPQL, then use SQL.
To write things to the database, queries should generally not be used, except in very specific circumstances where many entities must be modified the same way. Instead, get the entities to modify, and modify them. Hibernate will save their new state automatically.
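The idea of generating dynamic JPQL for a search form with optional criteria can be sketched roughly as follows. This is only an illustration in plain Java: the class UserSearchQuery, the entity name User, and the fields firstName and city are all hypothetical, and the builder only produces the query string and its named parameters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds a JPQL string plus named parameters from optional search criteria.
// "User", "firstName" and "city" are hypothetical entity and field names.
class UserSearchQuery {
    private final List<String> conditions = new ArrayList<>();
    private final Map<String, Object> parameters = new LinkedHashMap<>();

    UserSearchQuery firstName(String value) {
        return add("u.firstName = :firstName", "firstName", value);
    }

    UserSearchQuery city(String value) {
        return add("u.city = :city", "city", value);
    }

    // A null value means "criterion not filled in", so it is skipped.
    private UserSearchQuery add(String condition, String name, Object value) {
        if (value != null) {
            conditions.add(condition);
            parameters.put(name, value);
        }
        return this;
    }

    // Joins the collected conditions with "and"; no where clause if none were set.
    String jpql() {
        String base = "select u from User u";
        return conditions.isEmpty()
                ? base
                : base + " where " + String.join(" and ", conditions);
    }

    Map<String, Object> parameters() {
        return parameters;
    }
}
```

With a real EntityManager, the result of jpql() would be passed to createQuery, and each entry of parameters() would be bound with setParameter(name, value). Values are always bound as parameters and never concatenated into the string, which keeps the approach safe against injection.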

Why do we need to create native query?

I am working on a project which uses JPA as its ORM, and the framework provides two kinds of methods to create queries:
entityManager.createQuery(query1);
entityManager.createNativeQuery(query2);
I understand what kind of query string has to be passed to each of them, but I don't know exactly why we would need to create a native query. Probably because we don't want to use the ORM capabilities there?
You do not need to create a native query unless you want to. JPQL is eventually translated into SQL by the framework, but the framework also lets you run a native query. Why would you want to do that?
Low level access, which means that you can optimize and handle the mapping by yourself; with SQL you actually access the database table while with JPQL you access the entity objects;
Maybe you do not want to learn JPQL if you already know SQL;
You already have the queries written in SQL, and do not have resources/time to port them to JPQL
createQuery uses JPA's own query language, in which you select from class names instead of table names. This is not SQL; it is just similar, and is later transformed to real SQL. Mapping to Java classes is done automatically, and actual class instances are returned as the result.
createNativeQuery uses real SQL, and will not be able to use JPA features. This method is generally used if you need to do something really odd that is not supported by JPA. A list of Object[] will be returned, and mapping to Java objects will have to be done manually. In other words, it's just like working with a DB before JPA came along, only slightly more convenient since connection handling is done automatically.
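The manual mapping step mentioned above might look roughly like this. It is a sketch under assumptions: CoachSummary and NativeRowMapper are hypothetical names, and the rows are assumed to come from a two-column select (id, name) whose result list has been obtained elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical target type for the rows of a two-column native query.
record CoachSummary(long id, String name) {}

class NativeRowMapper {

    // Each Object[] holds one row; positions follow the SELECT column order.
    static List<CoachSummary> map(List<Object[]> rows) {
        List<CoachSummary> result = new ArrayList<>();
        for (Object[] row : rows) {
            // Numeric columns may arrive as Integer, Long, BigInteger or
            // BigDecimal depending on the driver, so go through Number.
            long id = ((Number) row[0]).longValue();
            String name = (String) row[1];
            result.add(new CoachSummary(id, name));
        }
        return result;
    }
}
```

The casts are exactly the part JPQL would otherwise handle for you, which is why this kind of mapping code tends to be brittle when the select list changes.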
I have used it for optimization purposes. Using native queries means that the ORM mapping is not in place, and instead of JPQL you use the DB's native syntax. So, as @RasmusFranke also pointed out, it helps if you need something that is not supported by JPA, like DB vendor-specific extensions (which is conceptually a bad idea, since JPA is all about being DB agnostic, but happens nevertheless. I know...).
The other effect of this is that by using native queries, only the supplied query is run. No eager fetching of other entities, or other unwanted stuff. This way, if you deal with huge amounts of objects, you can save some heap space.

Disadvantages of Object Relational Mapping

I am a fan of ORM (Object Relational Mapping) and have been using it with Rails for the past year and a half. Prior to that, I used to write raw queries using JDBC and made the database do the heavy lifting via stored procedures. With ORM, I was initially happy to write things like coach.manager and manager.coaches, which were very simple and easy to read.
But as time went by, innumerable associations crept in and I ended up writing a.b.c.d chains that fired queries in all directions behind the scenes. With Rails and Ruby, the garbage collector went nuts, and a very complex page involving relatively little data took an insane time to load. I had to replace this ORM-style code with a simple stored procedure, and the improvement was enormous. A page that took 50 seconds to load now takes only 2 seconds.
With this huge difference, should I continue using ORM? It clearly has severe overhead compared to a raw query.
In general, what are the pitfalls of using an ORM framework like Hibernate or ActiveRecord?
An ORM is only a tool. If you don't use it correctly, you'll have bad results.
Nothing stops you from using dedicated HQL/criteria queries, with fetch joins or projections, to return the information that your page must display in as few queries as possible. This will take more or less the same time as dedicated SQL queries.
But of course, if you just get everything by ID and navigate through your objects without realizing how many queries that generates, it will lead to long loading times. The key is to know exactly what the ORM does behind the scenes, and decide if it's appropriate or if another strategy must be adopted.
I think you've already identified the major tradeoff associated with ORM software. Every time you add a new layer of abstraction that tries to provide a generalized implementation of something that you used to do by hand there is going to be some loss of performance/efficiency.
As you noted, traversing multiple relationships such as a.b.c.d can be inefficient, because most ORM software will be doing an independent database query for each . along the way. But I'm not sure that means you should eliminate ORM altogether. Most ORM solutions (or at least, certainly Hibernate) allow you to specify custom queries where you can bring back exactly what you want in a single database operation. This should be about as fast as your dedicated SQL.
Really the issue is about understanding how the ORM layer is working behind the scenes, and realizing that while something like a.b.c.d is simple to write, what it causes the ORM layer to do as it is evaluated is not. As a general rule I always go with the simplest possible approach to begin, and then write optimized queries in areas where it makes sense/where it is obvious that the simple approach will not scale.
I'd say, one should use the appropriate tool for different tasks.
E.g., for CRUD operations, ORM frameworks like Hibernate can speed up development and will perform well enough. Sometimes you need to make some tweaks to achieve acceptable performance. I'm not sure that your task (the one that took 50 seconds) could not be done properly with Hibernate, because you did not provide us with the details.
On the other hand, bulk operations involving hundreds of thousands of records, for example, are not the type of task you'd expect Hibernate to do without a significant performance penalty.
As was mentioned already, an ORM is only a tool, and you can use it either well or badly.
One of the most typical performance problems in ORMs is the 1+N queries problem. It is caused by loading additional objects for each of the objects in a list, typically through an eager fetch of the 1-to-n related entities for each element of the list. The remedy is to use HQL queries, to specify the needed fields in a projection, or to mark the 1-to-n relations as lazily fetched.
At all times you must know exactly what the ORM is doing in order to achieve good performance. Not understanding which operations are done in the background is a path to disaster (slow, buggy and hard-to-analyze code because of unnecessary and wrongly written workarounds).
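To make the 1+N effect concrete, here is a small simulation in plain Java. No ORM or database is involved: FakeStore is a hypothetical stand-in that merely counts how many "queries" each access pattern issues, with hard-coded sample data.

```java
import java.util.List;
import java.util.Map;

// Simulated data store that counts "queries"; no real ORM or database is used.
class FakeStore {
    private final Map<Integer, List<String>> itemsByOrder = Map.of(
            1, List.of("pen"),
            2, List.of("ink", "paper"),
            3, List.of("stapler"));
    int queryCount = 0;

    // One query for the parent list.
    List<Integer> findOrderIds() {
        queryCount++;
        return List.of(1, 2, 3);
    }

    // One query per parent: this is what per-element loading amounts to.
    List<String> findItems(int orderId) {
        queryCount++;
        return itemsByOrder.get(orderId);
    }

    // One query for everything: the analogue of a fetch join.
    Map<Integer, List<String>> findOrdersWithItems() {
        queryCount++;
        return itemsByOrder;
    }
}

class OnePlusNDemo {

    // Loads the list, then loads the children of each element separately.
    static int navigatePerElement(FakeStore store) {
        for (int id : store.findOrderIds()) {
            store.findItems(id);
        }
        return store.queryCount;
    }

    // Loads parents and children together in a single round trip.
    static int fetchTogether(FakeStore store) {
        store.findOrdersWithItems();
        return store.queryCount;
    }

    public static void main(String[] args) {
        System.out.println("per element: " + navigatePerElement(new FakeStore()));
        System.out.println("together:    " + fetchTogether(new FakeStore()));
    }
}
```

With three orders, the per-element pattern issues 1 + 3 queries and the combined fetch issues one. The gap grows linearly with the size of the list, which is why it goes unnoticed on small test data.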
I'm with Petar from your comments regarding lazy fetching. Say you have an HTML table filled with fields from object a.b.c.d. You could find your framework round-tripping to the database thousands of times (possibly many more). The disadvantage of ORM in this case is that you have to read the documentation thoroughly. Most frameworks support disabling lazy fetching, and many even support adding your own processing logic to bind the data set.
The upshot is that almost any ORM is almost undoubtedly better than anything you are going to write yourself. You will find yourself saddled with maintaining huge libraries of boilerplate or, worse, writing the same code over and over again.
We are currently investigating a switch to JPA from our own data store layer, which has a clean separation of transfer objects and data access objects. We used a generator to create the TOs, the DAOs and the SQL DDL from documentation in DocBook format. This way everything, from the documentation to the database structure and the generated Java classes, was always in sync, with good documentation of the database itself.
What we discovered so far by using JPA:
Foreign key references cannot be used for imports, some special queries and so on, because they must not be placed in a managed entity. JPA only allows the target class there.
Access to some user session scope is difficult to impossible. We still have no clue how to get the user's id into the column 'userWhoLastMadeAnUpdate' in some PrePersist method.
Something expected to be quite easy with an ORM, namely "class mapping", does not work at all. We are using HalDateTime (http://sourceforge.net/projects/haldatetime/) internally, especially in the client. Mapping it with JPA directly is not possible although HalDateTime supports it. Due to JPA restrictions we have to use two fields in the entity.
JPA either uses an XML file to describe the mapping, in which case you have to look into at least two files to even understand the relationship between the Java class and the database, and the XML file becomes huge for large applications.
Or it provides annotations in the Java class itself, which makes it easier to learn and understand the relationship, but forces you to see all that database stuff in the client layer (which completely breaks proper layering).
You will have to restrict yourself to staying as close to a clean database structure as at all possible. Otherwise you will surely end up with a mess of queries and statements generated by the ORM.
Use an ORM which provides a query language that is close to SQL itself (JPA seems quite acceptable here). An ORM-specific language makes supporting a large application really expensive.
