Is Hibernate Search a clean abstraction for Lucene?

I have used Hibernate in the past and I share many people's frustration with using an ORM. Since traditional databases are relational, any ORM is a leaky abstraction. Much of my time ends up being spent understanding the details of the abstraction so I can achieve good performance.
Hibernate Search, however, works on top of Lucene. Since a Lucene index holds a collection of documents of the same type, it might not have the same problems as Hibernate with a relational database. Does Hibernate Search provide a clean abstraction, or is it fraught with the same problems as Hibernate+MySQL?
I'm considering moving from an existing implementation in raw Lucene to Hibernate Search.

A good abstraction is one that solves a problem not addressed by the underlying library, with little or no compromise on the feature set. In the case of Lucene, these problems could be:
index distribution,
synchronization with another persisted data source,
...
Then, the best abstraction depends on the problem you need to solve:
If you just want to build a reverse (inverted) index on a single server and query it, then stick to plain Lucene. Lucene is already the best abstraction available. Any other abstraction would add overhead while probably preventing you from using some features and not making things much easier.
If you want to go distributed, then Solr or Elasticsearch would help you a lot.
If you want to integrate full-text functionality with another persisted data set, then Hibernate Search or Compass could be interesting candidates.
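For readers unsure what the "reverse (inverted) index" in the first case refers to, here is a minimal plain-Java sketch of the idea. It is not Lucene's API or implementation; the class and method names are made up for illustration, and real Lucene adds analysis, scoring, and on-disk segments on top of this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy class: maps each term to the ids of the documents containing it.
class TinyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Tokenize on non-word characters and record the document id under every term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return ids of documents that contain every query term (AND semantics).
    SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            SortedSet<Integer> docs = postings.get(term);
            if (docs == null) {
                return new TreeSet<>(); // one missing term means no document matches
            }
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Hibernate Search wraps Lucene");
        index.add(2, "Lucene is a search library");
        index.add(3, "Solr is a server");
        System.out.println(index.search("lucene search"));
    }
}
```

Anything layered on top (Hibernate Search, Solr, Elasticsearch) ultimately reads and writes a structure of this kind, which is why the extra layer only pays off when it solves a problem the index itself does not.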

Related

Java Large number of transaction object caching

I am looking for the best solution for caching a large number of simple transactional POJOs in memory. Transactions happen in an Oracle database on 3-4 tables and are made by an external application. Another application is of the Business Intelligence type: based on the transactions in the database, it evaluates the updated POJOs (mapped to tables) and applies various business rules.
The Hibernate solution relies on transactions happening on the same server, whereas in our case transactions happen somewhere else, and I am not sure the cached objects can be queried.
Questions:
1) Is there an Oracle JDBC API that would trigger an update event on the Java side?
2) Which caching solution would support #1?
3) Can cached objects be queried?

Oracle databases support Java triggers, so in theory you could implement something like this yourself, see this guide. In theory, your Java trigger could invoke the client library of whichever distributed caching solution you are using, to update or evict stale entries.
Oracle also have a caching solution of their own, known as Coherence. It might have integration like this built in, or at least it might be worth checking it out. Search for "java distributed cache" for some alternatives.
As far as I know Hibernate does not support queries on objects stored in its cache.
However if you cache an entire collection of objects separately, then there are some libraries which will allow you to perform SQL-like queries on those collections:
LambdaJ - supports advanced queries, not as fast
CQEngine - supports typical queries, extremely fast
BTW I am the author of CQEngine. I like both of those libraries. But please excuse my slight bias for my own one :)
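To make the "query a cached collection" idea concrete, here is a rough plain-Java sketch. It does not use the LambdaJ or CQEngine APIs; the Trade and TradeCache names and fields are invented for illustration. It keeps a secondary index by account so that a query only scans one bucket, and exposes an evict method of the kind a database-side trigger could call to drop stale entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory cache of transaction POJOs with one secondary index.
class TradeCache {
    static final class Trade {
        final long id;
        final String account;
        final double amount;

        Trade(long id, String account, double amount) {
            this.id = id;
            this.account = account;
            this.amount = amount;
        }
    }

    private final Map<Long, Trade> byId = new HashMap<>();
    private final Map<String, List<Trade>> byAccount = new HashMap<>();

    // Insert or replace an entry, keeping the secondary index consistent.
    void put(Trade trade) {
        evict(trade.id);
        byId.put(trade.id, trade);
        byAccount.computeIfAbsent(trade.account, a -> new ArrayList<>()).add(trade);
    }

    // Remove a stale entry, e.g. when the database reports a change.
    void evict(long id) {
        Trade old = byId.remove(id);
        if (old != null) {
            byAccount.get(old.account).remove(old);
        }
    }

    // Roughly "WHERE account = ? AND amount >= ?": index lookup, then filter.
    List<Trade> findByAccountAndMinAmount(String account, double minAmount) {
        return byAccount.getOrDefault(account, Collections.emptyList()).stream()
                .filter(t -> t.amount >= minAmount)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TradeCache cache = new TradeCache();
        cache.put(new Trade(1, "ACC-1", 50.0));
        cache.put(new Trade(2, "ACC-1", 500.0));
        cache.put(new Trade(3, "ACC-2", 900.0));
        System.out.println(cache.findByAccountAndMinAmount("ACC-1", 100.0).size());
    }
}
```

A dedicated library earns its place once you need many indexes, range queries, or concurrent updates; this sketch is single-threaded and only meant to show the shape of the approach.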

Using Solr/Lucene as persistence technology

Solr/Lucene's reverse index and query support a subset of RDBMS functionality, i.e. filtering, sorting, group-by, and paging. In this sense it is very close to a NoSQL database, as it also does not support transactions and joins.
With a framework like Hibernate Search, it is possible to map even complex objects to the index and perform basic CRUD operations, while supporting full-text search.
Considerations:
1) Write throughput
From my past experience, a Lucene index's write throughput is much lower than an RDBMS's.
2) Query Speed
Query speed for a Lucene index should be comparable, if not faster, due to the reverse index.
3) Scalability
Could be resolved using replication or SolrCloud.
4) Ability to handle large data sets
I have used a Lucene index with 15M+ documents on a single JVM without any performance issues.
Background:
I am currently using MongoDB with Solr and it is working well enough. However, it is not as "simple" as I would like it to be due to:
Keeping Mongo and the Solr index in sync (not a trivial task)
Transformation between Java object <-> Mongo <-> Solr (Spring Data and SolrJ help, but it is still not great).
Why use two "persistence" technologies if one will do?
From the small-scale tests I have done so far, I haven't found any technical roadblock that would prevent me from using Solr/Lucene as persistence. However, I also don't want to commit to such a drastic refactoring without more information. I am also aware of projects like Solandra that attempt to bring NoSQL and Solr together, but they don't seem to be mature enough.
Question
So for applications where full-text search is a major (but not the only) requirement, is it feasible to forgo traditional (RDBMS) and contemporary (NoSQL) data stores?
Great Reference Thanks to raticulin
Atlassian (Jira) - Lucene Generic Data Indexing

I think I remember watching a presentation from Atlassian where they explained that for Jira they were using just Lucene nowadays: they had dropped their previous DB (whatever it was) and were using Lucene as storage too. They were happy.
It would be cool if someone could confirm it was them.
Edit:
http://blogs.atlassian.com/rebelutionary/downloads/tssjs2007-lucene-generic-data-indexing.pdf

Lucene - Full Text Search/Information Retrieval Library.
Solr - Enterprise Search Server built on top of Lucene.
Lucene/Solr should not be used in place of a persistence layer. They will not be able to replace an RDBMS, nor is it a good idea to compare them to one: you are comparing apples and oranges.
As for Lucene's indexing throughput, comparing it directly with an RDBMS is neither helpful nor fair; a number of factors affect Lucene throughput, depending on your search schema configuration.
Lucene has one of the best-known and best data structures for information retrieval. The query speed you get depends on a number of factors, from configuration to hardware.
For scalability, replication or SolrCloud is obviously the way to go.
Handling 15M+ documents on a single JVM is great, but that figure does not say much without knowing the document size, feature set used, JVM memory, CPU cores, etc.
Now, if your problem is that the RDBMS is a real scalability bottleneck, you could pick a NoSQL datastore based on your persistence needs, which you could then integrate with Solr/Lucene to provide full-text search capability. Since NoSQL is fairly new and rapidly evolving, you might not find stable adapters to integrate Solr/Lucene with NoSQL.
Edit:
Now that the question is updated: this is already well debated in the question "NoSQL (MongoDB) vs Lucene (or Solr) as your database". Having too many moving parts can be a pain, and Lucene/Solr could very well replace MongoDB, depending on the app. But you have to consider that NoSQL data stores are built from the ground up to be fully distributed, so you don't lose functionality or have it limited by scaling, while Solr was not built with distributed computing in mind, so there are Distributed Search limitations when it comes to horizontal scaling. SolrCloud may be the answer to that.

Performance difference of Native SQL(using MySQL) vs using Hibernate ORM?

I am using Spring MVC for an application that involves a multilevel back end for management and a customer/member front end. The project was initially started with no framework and simple native JDBC calls for database access.
As the project has grown significantly (as they always do), I have made more significant database calls, sometimes querying large result sets.
I am doing what I can to make my DB calls closely follow Object Relational Mapping best practices, but I am still just using JDBC. I have been contemplating whether or not I should make the transition to Hibernate, but was unsure if it would be worth it. I would be willing to do it if it brought a performance gain.
Is there any performance gain from using Hibernate( or even just Object Relational Mapping) over native SQL using JDBC?

Is there any performance gain from using Hibernate (or even just Object Relational Mapping) over native SQL using JDBC?
Using an ORM, a data mapper, etc. won't make the same SQL queries run faster. However, when using Hibernate you can benefit from things like lazy loading, second-level caching, and query caching, and these features might help to improve performance. I'm not saying Hibernate is perfect for every use case (for the special cases Hibernate can't handle well, you can always fall back to native SQL), but it does a very decent job and definitely reduces development time (even after adding the time spent on optimization).
But the best way to convince yourself would be to measure things. In your case, I would probably create a Hibernate prototype covering some representative scenarios and benchmark it.
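To illustrate why caching can help even though the SQL itself does not get faster, here is a small read-through cache sketch in plain Java. It is not how Hibernate's second-level cache is implemented; ReadThroughCache and its loader function are hypothetical stand-ins, with the loader playing the role of the database query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through cache: the loader is only invoked on a cache miss.
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader; // stands in for a SQL query
    private int loads = 0;

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            loads++; // count how often we had to go to the "database"
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    // Drop an entry so the next read reloads it (needed when the row changes).
    void evict(K key) {
        entries.remove(key);
    }

    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        ReadThroughCache<Long, String> cache =
                new ReadThroughCache<>(id -> "customer-" + id);
        for (int i = 0; i < 1000; i++) {
            cache.get(42L);
        }
        System.out.println("loader calls: " + cache.loads());
    }
}
```

The gain comes entirely from queries that are never sent, which is also why a benchmark has to use representative access patterns: a workload that rarely re-reads the same rows will see little benefit.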

An ORM lets you stay inside the OOP world, but this comes at the cost of performance, especially with many-to-many relations in our case. We used Hibernate by default and did performance optimization with JDBC where required.

Hibernate will make the development and maintenance of your app easier, but it won't necessarily make DB access quicker.
If your native JDBC calls use inefficient SQL, then you might see some performance improvement, as Hibernate tends to generate good SQL.

What's the "low risk" choice between JDO or JPA?

Do not close this as a duplicate of other questions because I'm not asking the same thing. They're also about a year old. That said, see these links:
http://db.apache.org/jdo/jdo_v_jpa.html
http://www.datanucleus.org/products/accessplatform/jdo_jpa_faq.html
http://www.datanucleus.org/products/accessplatform/persistence_api.html
It seems JPA is the "popular" choice backed by the big vendors (who love to screw you over if they can).
It seems JDO is the more mature, seemingly superior choice which should enjoy more OSS community backing. (But does it?)
So what's a low-risk tolerance organization supposed to do? Is the difficulty of going from one to the other about the same? Has one started to emerge above the other at this point? Also, only because we currently use it, does Hibernate limit you to JPA-only? If so what is the most popular JDO implementation?

@Crusader - what makes you think that anyone on SO has a better crystal ball than you do?
So what's a low-risk tolerance organization supposed to do?
Pick the alternative that it determines to be the low risk solution. How it determines what solution has the least risk is ... unclear ... but I don't think that asking SO is a valid risk assessment procedure.
The other point is that choosing JDO when JPA is the "winner" (or vice versa) probably won't kill your project in the short or long term. The consequences of making the wrong choice are most likely limited to greater staff training costs, and being stuck with a base ORM platform where development has stagnated and support is increasingly expensive. [I'd protect myself against the latter by picking an open source ORM platform ... either way.]
Is the difficulty of going from one to the other about the same?
Probably yes. Especially when you consider data migration issues.
Has one started to emerge above the other at this point?
JPA seems to dominate these days. The JDO folks would say that their way is technically superior, but that's not the point.
Also, only because we currently use it, does Hibernate limit you to JPA-only?
JPA plus Hibernate-specific extensions. Certainly Hibernate does not support JDO and it probably never could.
If so what is the most popular JDO implementation?
Pass.

If you use a lightweight dependency injection and ORM wrapping framework like the open source exPOJO it allows you to bypass that question altogether. Your main code base remains completely agnostic of the underlying persistence interface/technology (JDO, Hibernate implementations currently supported, JPA on the way - like to lend a hand?).
All persistence technology specific code is encapsulated within Repository and Service classes as per Chris Richardson's excellent book "POJOs in Action" and exposed using the "exposed domain model" pattern he discusses in the book - which turns out to be pretty awesome and the most productive approach I've ever used.
Using exPOJO 99% of your code remains gloriously and instantly portable between JDO, JPA, Hibernate plus you get extremely lightweight and very simple dependency injection (no annotations or XML required) as an added bonus.
It comes with its own extremely lightweight and easy to use servlet filter that can provide "open session/persistence manager/entity manager in view" without the XML hell. Each HTTP request is automatically attached to a ModelExposer object that provides convenient access to the repository and service components that allow generic access to your objects.
exPOJO is at http://www.expojo.com - yeah ok, I wrote it so I am biased slightly =]
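For anyone unfamiliar with the repository approach described above, here is a rough sketch of the general pattern in plain Java. It is not exPOJO's API; every class and interface name below is made up for illustration. The point is only that domain and service code depend on an interface, so a JDO-, JPA-, or Hibernate-backed implementation could be swapped in behind it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical example of hiding the persistence technology behind a repository.
class RepositorySketch {
    static final class Customer {
        final long id;
        final String name;

        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // The only persistence-facing type the rest of the code sees.
    interface CustomerRepository {
        void save(Customer customer);

        Optional<Customer> findById(long id);
    }

    // In-memory stand-in; an ORM-backed class would implement the same interface.
    static final class InMemoryCustomerRepository implements CustomerRepository {
        private final Map<Long, Customer> store = new HashMap<>();

        @Override
        public void save(Customer customer) {
            store.put(customer.id, customer);
        }

        @Override
        public Optional<Customer> findById(long id) {
            return Optional.ofNullable(store.get(id));
        }
    }

    // Service code knows nothing about JDO, JPA, or Hibernate.
    static final class GreetingService {
        private final CustomerRepository customers;

        GreetingService(CustomerRepository customers) {
            this.customers = customers;
        }

        String greet(long id) {
            return customers.findById(id)
                    .map(c -> "Hello, " + c.name)
                    .orElse("Unknown customer");
        }
    }

    public static void main(String[] args) {
        CustomerRepository repository = new InMemoryCustomerRepository();
        repository.save(new Customer(1, "Ada"));
        System.out.println(new GreetingService(repository).greet(1));
    }
}
```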

JDO vs JPA for Java on Google App Engine

I want to develop my project on Google App Engine with Struts2. For the database I have two options, JPA and JDO. Could you please advise me on which to choose? Both are new to me and I need to learn them, so I will focus on one after your replies.
Thanks.

The GAE/J google group has several posts about this very thing. I'd do a search on there and look at people's opinions. You will get a very different message to the opinions expressed above. Also focus on the fact that BigTable is not an RDBMS. Use the right tool for the job

JPA is Sun's standard for persistence, JDO is IMHO dying (actually, it's dead but still moving). In other words, JPA seems to be a better investment on the long term. So I guess I'd choose JPA if both were new to me.

Just saw this comparison between JPA and JDO by DataNucleus themselves:-
http://www.datanucleus.org/products/accessplatform_2_1/jdo_jpa_faq.html
An eye-opener.

I'm a happy user of JDO. Keep up the good work guys.

The claim that JDO is dead is not without merit. Here is what I read in the book Pro EJB 3 Java Persistence API: "Shortly thereafter Sun announced that JDO would be reduced to specification maintenance mode and that the Java Persistence API would draw from both JDO and the other persistence vendors and become the single supported standard going forward." The author, Mike Keith, is the co-specification lead on EJB3. Of course he is a big supporter of JPA, but I doubt he is biased enough to lie.
It is true that when the book was published, most major vendors were united behind JPA rather than JDO, even though JDO does have more advanced technical features than JPA. That is not surprising, because big players in the EE world such as IBM and Oracle are also big RDBMS vendors, and more customers use an RDBMS than a non-RDBMS in their projects. JDO was dying until GAE gave it a big boost. That makes sense, because the GAE datastore is not a relational database. Some JPA features do not work with Bigtable, such as aggregation queries, join queries, and owned many-to-many relationships. BTW, GAE supports JDO 2.3 but only JPA 1.0. I would recommend JDO if GAE is your target cloud platform.

For the record, it is Google App Engine (GAE), so we play with the Google rules not with the Oracle/Sun rules.
Under those rules, JPA is not suitable for GAE: it is unstable and does not work as expected. Nor is Google willing to support it beyond the bare minimum.
On the other hand, JDO is quite stable on GAE and it is (to some extent) well documented by Google.
However, Google does not recommend either of them.
http://code.google.com/appengine/docs/java/datastore/overview.html
Low-level API will give the best performance and GAE is all about performance.
http://gaejava.appspot.com/
For example, adding 10 entities:
Python: 68 ms
JDO: 378 ms
Java Native: 30 ms

In race between JDO vs JPA I can only agree with the datanucleus posters.
First of all, and most importantly, the DataNucleus posters know what they are doing. They are, after all, developing a persistence library and are familiar with data models other than the relational one, e.g. Bigtable. I am sure that if a Hibernate developer were here, he would have said: "all our assumptions when building our core libraries are tightly coupled to relational model, hibernate is not optimized for GAE".
Secondly, JPA is unquestionably in more widespread use, being a part of the official Java EE stack helps a bit, but that does not necessarily mean that it is better.
In fact, JDO, if you read about it, corresponds to a higher level of abstraction than JPA. JPA is tightly coupled to the RDBMS data model.
From a programming stand point, using the JDO APIs is a much better option, because you are conceptually compromising a lot less. You can switch, theoretically to any data model of your desire, provided the provider you use supports the underlying database.
(In practice you rarely achieve such a high level of transparency, because you will find yourself setting your primary keys on GAE's objects and tying yourself to a specific database provider, e.g. Google.) It will still be easier to migrate, though.
Thirdly, you can use Hibernate, Eclipse Link, and even spring with GAE. Google seems to have made a big effort to allow you to use the frameworks you are used to building your applications on. But what people realize when they build their GAE applications as if they were running on RDBMS is that they are slow. Spring on GAE is SLOW. You can google Google IO videos on this topic to see that it is true.
Also, adhering to standards is a good sensible thing to do, in principle I applaud. On the other hand, JPA being part of the Java EE stack makes people, at times, lose their notion of options.
Realize, if you will, that Java Server Faces is also part of the Java EE stack. And it is an unbelievably tidy solution for web GUI development. But in the end, why do people, the smarter people if I may say so, deviate from this standard and use GWT instead?
In all of this, I have to state that there is one very significant thing going for JPA: Guice and its convenient support for JPA. It seems that Google was not as smart as usual on this point and is content, for now, with not supporting JDO. I still think that they can afford it, and eventually Guice will engulf JDO as well... or maybe not.

Go JDO. Even if you don't have experience in it, it is not hard to pick up, and you will have a new skill under your belt!

What I think is terrible about using JDO at the time of writing is that the only implementation vendor is DataNucleus, and the drawback of that is the lack of competition, which leads to numerous issues like:
The documentation is not very detailed on some aspects, such as extensions
You usually get sarcastic responses from the authors (such as "Have you checked the logs? Maybe there is a reason for having them") and other annoying responses like that
You don't get an answer to your question in a helpful amount of time; if you get an answer in less than 7 days you should consider yourself lucky, even here on StackOverflow
I'm always hoping for someone else to start implementing the JDO specification. Maybe then they'll offer something more, and hopefully more free attention to the community, without always worrying about being paid for support. I'm not saying that the DataNucleus authors only care about commercial support, but I'm just saying.
I personally consider that the DataNucleus authors have no obligation whatsoever to DataNucleus itself or its community. They can drop the whole project at any time and no one can judge them for it; it's their effort and their own property. But you should know what you are getting into. You see, when one of us developers looks for a framework to use, you cannot punish or command the framework's author, but on the other hand, you need your work done! If you had time to write that framework, why would you look for one in the first place?!
On the other hand, JDO itself has some complications, like the object life cycle, which isn't very intuitive or common (I think).
Edit: Now I know that JPA also enforces an object life cycle mechanism, so it looks like it's inevitable to deal with persisted entities' life cycle states if you wish to use a standard ORM API (i.e. JPA or JDO)
What I like most about JDO is the ability to work with ANY database management system without considerable effort.

GAE/J is slated to add MySQL before the end of the year.

JPA is the way to go, as it seems to be pushed as a standardized API and has recently gained momentum with EJB 3.0. JDO seems to have lost steam.

Neither!
Use Objectify, because it is cheaper (uses fewer resources) and faster.
FYI: http://paulonjava.blogspot.mx/2010/12/tuning-google-appengine.html
Objectify is a Java data access API specifically designed for the
Google App Engine datastore. It occupies a "middle ground"; easier to
use and more transparent than JDO or JPA, but significantly more
convenient than the Low-Level API. Objectify is designed to make
novices immediately productive yet also expose the full power of the
GAE datastore.
Objectify lets you persist, retrieve, delete, and query your own typed objects.
@Entity
class Car {
    @Id String vin; // Can be Long, long, or String
    String color;
}
ofy().save().entity(new Car("123123", "red")).now();
Car c = ofy().load().type(Car.class).id("123123").now();
ofy().delete().entity(c);
