Given: There are multiple document-store NoSQL systems, such as CouchDB/Couchbase, MongoDB and AWS DynamoDB (which can act as both a document store and a key-value store). Cross-check research: a list of document-store NoSQL systems can be found here: http://nosql-database.org/
When: Different NoSQL types (key-value, document store, etc.) provide different benefits depending on what you are trying to implement. In this case, document storage of JSON content is desired, and the goal is to evaluate a way to implement a good document-storage solution and test it against different NoSQL databases.
Question: Are there JVM-based frameworks (Java, Scala, Groovy, whatever) that provide a database-agnostic layer, similar to JPA and other ORMs, so that code can be implemented once and run against different NoSQL databases without rewriting it (only changing the configuration)?
Note: http://hibernate.org/ogm/ is one such example, but it lists only one supported document-store NoSQL database.
Spring Data currently has modules for Redis, MongoDB, Couchbase, DynamoDB and several other NoSQL databases.
The Hibernate OGM project currently supports MongoDB and Neo4j.
It looks like there is also JPA support for Couchbase, MongoDB and Google Cloud Datastore.
Hibernate seems to have what you are looking for.
Related
I am researching how to build a general application or microservice to enable building workflow-centric applications. I have done some research about frameworks (see below), and the most promising candidates share a hard reliance on RDBMSes to store workflow and process state, combined with JPA-annotated entities. In my opinion, this damages the possibility of designing a general, data-driven workflow microservice. It seems that a truly general workflow system could be built on NoSQL solutions like MongoDB or Cassandra by storing data objects and rules in JSON or XML. This would allow the executing code to enforce types or schemas while using one or two simple Java objects to retrieve and save entities. As I see it, this could enable a single application to be deployed as a Controller for different domains' Model-View pairs without modification (admittedly given a very clever interface).
I have tried to find a workflow engine/BPM framework that supports NoSQL backends. The closest I have found is Activiti-Neo4J, which appears to be an abandoned project providing a connector between Activiti and Neo4j.
Is there a Java workflow engine/BPM framework that supports NoSQL backends and generalizes data objects without requiring specific POJO entities?
If I were to give up on my ideal, magically general solution, I would probably choose a framework like jBPM or Activiti, since they have rich feature sets and are mature. In trying to find other candidates, I found a veritable graveyard of abandoned projects like this one on Java-Source.net.
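A minimal sketch of the "one or two simple Java objects" idea, assuming a plain in-memory map in place of MongoDB or Cassandra: a schema-less GenericDocument (a type tag plus named values) and a DocumentStore interface that works for any document type. Both names are hypothetical and not taken from any framework; a real adapter would serialize the values to JSON, and type or schema checks could run inside save() before anything is written.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class GenericDocumentDemo {
    public static void main(String[] args) {
        DocumentStore store = new InMemoryDocumentStore();
        GenericDocument invoice = new GenericDocument("invoice")
                .with("amount", 120.5)
                .with("status", "OPEN");
        String id = store.save(invoice);
        System.out.println(store.findById(id).map(d -> d.get("status")).orElse("missing"));
        System.out.println(store.findByType("invoice").size());
    }
}

/** A schema-less entity: a type tag plus arbitrary named values (hypothetical class). */
final class GenericDocument {
    private final String type;
    private final Map<String, Object> values = new HashMap<>();
    private String id;

    GenericDocument(String type) { this.type = type; }

    GenericDocument with(String name, Object value) {
        values.put(name, value);
        return this;
    }

    Object get(String name) { return values.get(name); }
    String type() { return type; }
    String id() { return id; }
    void assignId(String id) { this.id = id; }
    Map<String, Object> values() { return Collections.unmodifiableMap(values); }
}

/** Store-agnostic contract; a MongoDB or Cassandra adapter would implement the same interface. */
interface DocumentStore {
    String save(GenericDocument doc);
    Optional<GenericDocument> findById(String id);
    List<GenericDocument> findByType(String type);
}

/** In-memory stand-in for a real document store, useful for tests and prototypes. */
final class InMemoryDocumentStore implements DocumentStore {
    private final Map<String, GenericDocument> docs = new ConcurrentHashMap<>();

    @Override
    public String save(GenericDocument doc) {
        if (doc.id() == null) {
            doc.assignId(UUID.randomUUID().toString()); // new documents get a generated key
        }
        docs.put(doc.id(), doc);
        return doc.id();
    }

    @Override
    public Optional<GenericDocument> findById(String id) {
        return Optional.ofNullable(docs.get(id));
    }

    @Override
    public List<GenericDocument> findByType(String type) {
        List<GenericDocument> result = new ArrayList<>();
        for (GenericDocument d : docs.values()) {
            if (d.type().equals(type)) result.add(d);
        }
        return result;
    }
}
```

The application only ever sees GenericDocument and DocumentStore, so a different domain's model means different type tags and value names, not new entity classes.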
Yes. Temporal Workflow has pluggable persistence and runs on Cassandra as well as on SQL databases. It has been tested with up to 100 Cassandra nodes and could support tens of thousands of events per second and hundreds of millions of open workflows.
It allows you to model your workflow logic as plain old Java classes and ensures that the code is fully fault-tolerant and durable across all sorts of failures, including its local variables and threads.
See this presentation that goes into more details about the programming model.
I think the reason why workflow engines are often based on an RDBMS is not the database schema but rather the need for a transaction-safe data store.
Transactional robustness is an important factor for workflow engines, especially for long-running or nested transactions which are typical for complex workflows.
So maybe this is one reason why most engines (like Activiti) did not focus on a data-driven approach. (I am not talking about data replication here, which most NoSQL databases cover.)
If you take a look at the Imixs-Workflow project, you will find a different approach based on Java Enterprise. This engine uses a generic data object that can hold any kind of serializable data value. Data retrieval is solved with Lucene search technology: each object is translated into a virtual document with a name/value pair for each item. This makes it easy to search through the processed business data as well as to query structured workflow data such as status information or process owners. So this is one possible solution.
Apart from that, you always have the option to store your business data in a NoSQL database. This is independent of the workflow data of a running process instance, as long as you link both objects together.
Going back to transactional robustness, it's a good idea to store the reference to your NoSQL data in the process instance, which is transaction-aware. Also take a look here.
So the only problem you can run into is that it's very hard to synchronize a transaction context from EJB/JPA with an 'external' NoSQL database. For example: what will you do when your data was successfully saved to your NoSQL data store (e.g. Cassandra), but the workflow engine's transaction fails and a rollback is triggered?
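To illustrate the name/value idea only, the sketch below swaps Lucene for a tiny in-memory inverted index (a hypothetical NameValueIndex, not the Imixs-Workflow API). Each generic object is flattened into name:value terms, so business data and workflow data such as status or owner become searchable in the same way.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class NameValueIndexDemo {
    public static void main(String[] args) {
        NameValueIndex index = new NameValueIndex();

        Map<String, Object> order = new HashMap<>();
        order.put("status", "approved");
        order.put("owner", "alice");
        index.add("wi-1", order);

        Map<String, Object> ticket = new HashMap<>();
        ticket.put("status", "open");
        ticket.put("owner", "alice");
        index.add("wi-2", ticket);

        System.out.println(index.search("owner", "alice"));      // [wi-1, wi-2]
        System.out.println(index.search("status", "approved"));  // [wi-1]
    }
}

/** Flattens generic objects into name:value terms and indexes them (in-memory stand-in for Lucene). */
final class NameValueIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // One "virtual document" field per item; values are compared case-insensitively.
    private static String term(String name, Object value) {
        return name + ":" + String.valueOf(value).toLowerCase();
    }

    void add(String id, Map<String, Object> item) {
        for (Map.Entry<String, Object> e : item.entrySet()) {
            postings.computeIfAbsent(term(e.getKey(), e.getValue()), k -> new HashSet<>()).add(id);
        }
    }

    /** Returns a sorted copy of the ids whose item `name` equals `value`. */
    Set<String> search(String name, Object value) {
        return new TreeSet<>(postings.getOrDefault(term(name, value), Collections.emptySet()));
    }

    /** Ids matching both name/value conditions (logical AND). */
    Set<String> searchAll(String n1, Object v1, String n2, Object v2) {
        Set<String> result = search(n1, v1);
        result.retainAll(search(n2, v2));
        return result;
    }
}
```

With Lucene, each term would become a field of the virtual document, which also brings ranges, wildcards and full-text matching that this sketch does not attempt.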
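One common mitigation, which the answer itself does not prescribe, is a compensating action: write the business data to the NoSQL store first, keep only its key in the transactional process instance, and delete the orphaned document if the engine's transaction rolls back. The sketch uses hypothetical ExternalStore and ProcessService classes and simulates the engine failure with a flag.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class CompensationDemo {
    public static void main(String[] args) {
        ExternalStore cassandraLike = new ExternalStore();
        ProcessService service = new ProcessService(cassandraLike);
        try {
            service.startProcess(Map.of("customer", "acme"), true); // simulate an engine rollback
        } catch (IllegalStateException e) {
            System.out.println("rolled back; orphaned documents: " + cassandraLike.size());
        }
        String ref = service.startProcess(Map.of("customer", "acme"), false);
        System.out.println("committed with reference " + ref);
    }
}

/** Non-transactional key/value store standing in for Cassandra or MongoDB. */
final class ExternalStore {
    private final Map<String, Map<String, Object>> data = new HashMap<>();

    String put(Map<String, Object> doc) {
        String key = UUID.randomUUID().toString();
        data.put(key, doc);
        return key;
    }

    void delete(String key) { data.remove(key); }
    boolean contains(String key) { return data.containsKey(key); }
    int size() { return data.size(); }
}

/** Saves business data externally and keeps only its key in the (transactional) process instance. */
final class ProcessService {
    private final ExternalStore store;

    ProcessService(ExternalStore store) { this.store = store; }

    String startProcess(Map<String, Object> businessData, boolean failEngineTx) {
        String ref = store.put(businessData);          // 1. written outside the engine transaction
        try {
            commitProcessInstance(ref, failEngineTx);  // 2. transactional part: stores only the reference
            return ref;
        } catch (RuntimeException e) {
            store.delete(ref);                         // 3. compensate: remove the now-orphaned document
            throw e;
        }
    }

    private void commitProcessInstance(String dataRef, boolean fail) {
        if (fail) throw new IllegalStateException("workflow transaction rolled back");
        // A real engine would persist the process instance, including dataRef, here.
    }
}
```

This is not a distributed transaction: if the process crashes between the failed commit and the delete, the orphan remains, so a periodic cleanup of unreferenced documents is usually still needed.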
The designers of the Activiti project have also been aware of the problem you have stated, but knew it would be quite a rewrite to implement such flexibility which, arguably, should have been designed into the project from the beginning. As you'll see in the link provided below, the problem has been a lack of interfaces against which to code implementations other than a relational database. With version 6 they ripped off the band-aid and refactored the framework with a set of interfaces for which different implementations (think Neo4j, MongoDB or whatever other persistence technology you fancy) can be written and plugged in.
In the linked article below, they provide some code examples for a simple in-memory implementation of these interfaces. It looks pretty cool and may be precisely what you're looking for.
https://www.javacodegeeks.com/2015/09/pluggable-persistence-in-activiti-6.html
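The general pattern behind that refactoring, an engine coded against persistence interfaces with an in-memory implementation plugged in, can be sketched as follows. ProcessInstanceDataManager, ProcessInstanceRecord and WorkflowEngine are hypothetical names chosen for illustration; the actual Activiti 6 interfaces are the ones shown in the linked article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PluggablePersistenceDemo {
    public static void main(String[] args) {
        // A MongoDB or Neo4j implementation could be passed here without touching WorkflowEngine.
        WorkflowEngine engine = new WorkflowEngine(new InMemoryProcessInstanceDataManager());
        String id = engine.start("approval");
        engine.moveTo(id, "approved");
        System.out.println(engine.state(id));
    }
}

/** Persisted state of one process instance (hypothetical). */
final class ProcessInstanceRecord {
    final String id;
    final String definitionKey;
    String currentActivity;

    ProcessInstanceRecord(String id, String definitionKey, String currentActivity) {
        this.id = id;
        this.definitionKey = definitionKey;
        this.currentActivity = currentActivity;
    }
}

/** Persistence contract the engine codes against; each backend supplies its own implementation. */
interface ProcessInstanceDataManager {
    void insert(ProcessInstanceRecord record);
    void update(ProcessInstanceRecord record);
    ProcessInstanceRecord findById(String id);
    List<ProcessInstanceRecord> findByDefinitionKey(String key);
}

/** Simple in-memory implementation, comparable in spirit to the article's example. */
final class InMemoryProcessInstanceDataManager implements ProcessInstanceDataManager {
    private final Map<String, ProcessInstanceRecord> rows = new HashMap<>();

    public void insert(ProcessInstanceRecord r) { rows.put(r.id, r); }
    public void update(ProcessInstanceRecord r) { rows.put(r.id, r); }
    public ProcessInstanceRecord findById(String id) { return rows.get(id); }

    public List<ProcessInstanceRecord> findByDefinitionKey(String key) {
        List<ProcessInstanceRecord> out = new ArrayList<>();
        for (ProcessInstanceRecord r : rows.values()) {
            if (r.definitionKey.equals(key)) out.add(r);
        }
        return out;
    }
}

/** Engine logic depends only on the interface, never on a concrete database. */
final class WorkflowEngine {
    private final ProcessInstanceDataManager data;
    private int counter = 0;

    WorkflowEngine(ProcessInstanceDataManager data) { this.data = data; }

    String start(String definitionKey) {
        String id = definitionKey + "-" + (++counter);
        data.insert(new ProcessInstanceRecord(id, definitionKey, "start"));
        return id;
    }

    void moveTo(String id, String activity) {
        ProcessInstanceRecord r = data.findById(id);
        if (r == null) throw new IllegalArgumentException("unknown process instance " + id);
        r.currentActivity = activity;
        data.update(r);
    }

    String state(String id) { return data.findById(id).currentActivity; }
}
```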
I wanted to ask whether you have any experience showing that Hibernate OGM works well enough with MongoDB to be used in an enterprise solution without any worries. In other words, does this combination work as well as, for example, Hibernate ORM with MySQL, and is it also that easy to set up? Is it worth using, i.e. is the effort needed to set it up justified by the improvement in working with the database? Would you prefer another OGM framework, or none at all? I read about it some time ago, but the project was in its early stages then and didn't work too well yet. Thanks for any advice and experiences.
(Disclaimer: I'm one of the Hibernate OGM authors)
In other words, does this combination work as well as, for example, Hibernate ORM with MySQL?
The 4.1 release is the first final release we consider ready for production use. The general user experience should not be much different from using classic Hibernate ORM (which is still what you use under the hood with Hibernate OGM). Also, the MongoDB dialect is probably the one we have put the most effort into, so it is in good shape.
But as Hibernate OGM is a fairly young project, of course there may be bugs and glitches which need to be ironed out. Feature-wise, there are some things not supported yet (e.g. secondary tables, criteria API, more complex JPA queries), but you either shouldn't really need those in most kinds of applications or there are work-arounds (e.g. native queries).
And is it also that easy to set up?
Yes, absolutely. The set-up is not different from using Hibernate ORM / JPA with an RDBMS. You only use another JPA provider class (HibernateOgmPersistence) and need to set some OGM-specific options (which NoSQL store to use, host name etc.). Check out this blog post which walks you through the set-up. For store-specific settings (e.g. how to store associations in document stores) there is an easy-to-use option system based on annotations and/or a fluent API.
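As a rough illustration of "another provider class plus some OGM-specific options", the sketch assembles those options as a plain map. The keys hibernate.ogm.datastore.provider, hibernate.ogm.datastore.host and hibernate.ogm.datastore.database are Hibernate OGM configuration properties, but check them against the documentation of the OGM version you use; the host and database values and the persistence unit name ogm-unit are placeholders. The JPA call is left as a comment so the example runs without any dependencies.

```java
import java.util.HashMap;
import java.util.Map;

public class OgmSettingsDemo {
    public static void main(String[] args) {
        Map<String, String> props = OgmSettings.forStore("mongodb", "localhost", "inventory");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
        // In a JPA application whose persistence unit "ogm-unit" (placeholder name) declares
        // org.hibernate.ogm.jpa.HibernateOgmPersistence as its provider, one could then call:
        // EntityManagerFactory emf = Persistence.createEntityManagerFactory("ogm-unit", props);
    }
}

/** Builds the OGM-specific options; everything else matches a regular JPA setup. */
final class OgmSettings {
    static final String PROVIDER = "hibernate.ogm.datastore.provider";
    static final String HOST = "hibernate.ogm.datastore.host";
    static final String DATABASE = "hibernate.ogm.datastore.database";

    private OgmSettings() { }

    static Map<String, String> forStore(String provider, String host, String database) {
        Map<String, String> props = new HashMap<>();
        props.put(PROVIDER, provider);  // which NoSQL store to use, e.g. "mongodb"
        props.put(HOST, host);          // host name of the datastore (placeholder here)
        props.put(DATABASE, database);  // database name (placeholder here)
        return props;
    }
}
```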
[Is it worth the effort] to set it up compared to the level of improvement of the work with the database?
I don't think there is a general answer to that. In many cases object mappers like Hibernate ORM/OGM are great; in other cases working with plain SQL or NoSQL APIs might be a better option. It depends on your use case and its specific requirements. In general, OxMs work well if there is a defined domain model which you want to persist, navigate its associations, etc.
Would you prefer another OGM framework
I'm obviously biased, but let me say that using Hibernate OGM allows you to
benefit from the eco-system existing around JPA/Hibernate, be it integration with other libraries such as Hibernate Validator or Hibernate Search (or your in-house developed Hibernate-based API) or tooling such as modelling tools which emit JPA entities.
work with different NoSQL backends using the same API. So if there is a chance you will need to integrate another NoSQL store (e.g. Neo4j to run graph queries) or an RDBMS, Hibernate OGM will allow you to do so easily.
I read about it some time ago, but it was in the early stages of this project
Much work has been put into Hibernate OGM over the last year, so my recommendation definitely is to try it out and see in a prototype or spike how it works for your requirements.
If you have any feature requests or questions, please let us know and we'll see what we can do for you.
I have an application that uses SQL Server. I want to use a NoSQL store, and I decided on a graph database since my data is highly connected. Neo4j is an option.
Ideally, I want to be able to switch databases without touching the application layer, e.g. by just modifying some XML configuration files.
I've looked at some public examples on the web, and I've seen that ORM and OGM don't configure applications the same way: each one's config file has its own name and, more importantly, its own structure. Looking at the code of each also revealed that they initialize the session differently, which doesn't sound good for what I have in mind.
My question is: is it possible, or feasible without great overhead, to switch between the two databases without touching the existing application code? I may add things, but not change what already exists. It would be great to establish true polyglot persistence between SQL and NoSQL databases, for example using Hibernate.
I want to hear from you before digging deeper. Are any of the Hibernate developers here on SO?
The goal of Hibernate OGM is to offer a unified abstraction for various NoSQL data stores. The project is still young as of this writing, so I am not sure you can adopt it right out of the box.
There is also the problem of transactions. If your application was designed to use SQL transactions, things will change radically when you switch to a NoSQL solution.
Using an abstraction layer is good for portability but doesn't offer the full power of native querying. JPQL has the same problem: it only covers SQL-92 and lacks support for window functions and CTEs.
Polyglot persistence is a great feature, but try using separate repositories, like Spring Data offers. I find that much more flexible from an architectural point of view.
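A minimal sketch of the separate-repositories approach, assuming in-memory stand-ins: customer records go through a repository that would be backed by the relational database, relationships through one that would be backed by a graph store such as Neo4j, and a service combines the two. The interface and class names are hypothetical and are not Spring Data's API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

public class PolyglotDemo {
    public static void main(String[] args) {
        CustomerRepository customers = new InMemoryCustomerRepository();
        ConnectionRepository graph = new InMemoryConnectionRepository();
        customers.save("c1", "Alice");
        customers.save("c2", "Bob");
        customers.save("c3", "Carol");
        graph.connect("c1", "c2");
        graph.connect("c2", "c3");
        RecommendationService service = new RecommendationService(customers, graph);
        System.out.println(service.namesWithinHops("c1", 2));
    }
}

/** Would be backed by the relational database (e.g. SQL Server via JPA). */
interface CustomerRepository {
    void save(String id, String name);
    Optional<String> findName(String id);
}

/** Would be backed by a graph database such as Neo4j. */
interface ConnectionRepository {
    void connect(String a, String b);
    Set<String> reachableWithin(String start, int maxHops);
}

final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<String, String> names = new HashMap<>();
    public void save(String id, String name) { names.put(id, name); }
    public Optional<String> findName(String id) { return Optional.ofNullable(names.get(id)); }
}

final class InMemoryConnectionRepository implements ConnectionRepository {
    private final Map<String, Set<String>> edges = new HashMap<>();

    public void connect(String a, String b) {  // undirected relationship
        edges.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        edges.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Breadth-first traversal up to maxHops; the start node itself is excluded. */
    public Set<String> reachableWithin(String start, int maxHops) {
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            int d = depth.get(node);
            if (d == maxHops) continue;
            for (String next : edges.getOrDefault(node, Set.of())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        depth.remove(start);
        return new HashSet<>(depth.keySet());
    }
}

/** Application code uses both repositories; neither store's model leaks into the other. */
final class RecommendationService {
    private final CustomerRepository customers;
    private final ConnectionRepository connections;

    RecommendationService(CustomerRepository customers, ConnectionRepository connections) {
        this.customers = customers;
        this.connections = connections;
    }

    Set<String> namesWithinHops(String customerId, int hops) {
        Set<String> names = new TreeSet<>();
        for (String id : connections.reachableWithin(customerId, hops)) {
            customers.findName(id).ifPresent(names::add);
        }
        return names;
    }
}
```

Because each repository exposes operations that suit its store (multi-hop traversal only on the graph side), neither query model has to be squeezed into a common abstraction.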
Background:
I've been using JPA lately, and I am very impressed by how easily I was able to produce a persistence layer for a reasonably large relational database project.
We use a lot of NoSQL databases at my company, specifically column-oriented ones. I have some questions about potentially using JPA for those databases:
Questions
Can JPA be used with NoSQL databases? It stands to reason that if the framework can generate a query for a SQL database and map the results, it could probably be tailored fairly easily to generate a different kind of query and a different mapping, say, for querying Hadoop?
If it's possible, are there any existing implementations of JPA that use things besides SQL?
Are there any good resources on implementing/extending JPA? I realize T-SQL, PL/SQL, etc. must all be specifically addressed in JPA, so there must be an extensibility mechanism we can hook into.
There are various JPA implementations that support the (badly termed) "NoSQL" set of datastores. The most complete we've found is DataNucleus, which also provides the more suitable JDO API. It supports MongoDB, Cassandra, HBase, AppEngine, LDAP, spreadsheets, Neo4j, and some others.
Based on your question, I came across Hibernate OGM (Hibernate Object/Grid Mapper), which provides Java Persistence API (JPA) support for NoSQL solutions.
Hibernate OGM has the following capabilities:
persists entities into a NoSQL datastore
datastore-specific native queries
full-text queries, using Hibernate Search as the indexing engine
I haven't explored this framework much yet, but it looks like a very promising solution for your questions.
You can refer to the following URL to get more idea about the Hibernate OGM
http://hibernate.org/ogm/
I'm trying to find a good way to implement a generic search API in Java that will allow my users to search the backend repository without needing to know what that backend technology is, and so that if in the future we switch vendors I can reimplement the underlying logic without needing to recode the API. The repository underneath could be a relational database or a document store like SOLR, CouchDB, MongoDB, etc... It would need to support all the typical search requirements such as wildcards, ranges, bitwise operators, and so on.
Are there any standard ways of approaching this problem?
Would JPA be my best bet? Would it do everything I need it for, including non-relational databases?
Thanks in advance!
What you need is an ORM framework like Hibernate; if you go with plain JPA, you will need to reinvent a lot of wheels.
Using Hibernate, you can write the business logic for searching the backend database or repository without a vendor-specific implementation, and if you later need to change the backend, you can do so without affecting your existing business code.
I would advise you to check the Hibernate documentation for further reference.
The Spring Data umbrella of projects provides a nice DAO abstraction named CrudRepository. I believe most of the sub-projects (JPA, MongoDB, etc.) provide some implementations of it.
JPA would be one of a number of implementations you would use to map your relational database to objects. It would not protect you from database changes.
I think you're looking for the DAO Pattern. What I'm doing is as follows:
Create an interface for each DAO
Create a higher level DAO implementation that simply calls my actual database specific implementation
Wire the higher level DAO implementation to the database specific implementation with Spring.
This way, no code anywhere touches the database-specific implementation. The connections are made only in XML.
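The three steps above, sketched in plain Java under one assumption: the Spring XML wiring is replaced by a java.util.Properties lookup so the example runs without a container. CustomerDao, DelegatingCustomerDao, DaoFactory and the property key customer.dao.backend are hypothetical, and the two store-specific DAOs are in-memory stand-ins for a JDBC and a document-store implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.function.Supplier;

public class DaoWiringDemo {
    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        config.load(new StringReader("customer.dao.backend=document\n"));
        CustomerDao dao = DaoFactory.customerDao(config);
        dao.save(new Customer("42", "Acme"));
        System.out.println(dao.findById("42").map(c -> c.name).orElse("not found"));
    }
}

final class Customer {
    final String id;
    final String name;
    Customer(String id, String name) { this.id = id; this.name = name; }
}

/** Step 1: the interface that application code depends on. */
interface CustomerDao {
    void save(Customer c);
    Optional<Customer> findById(String id);
}

/** Step 2: higher-level implementation that only delegates to a store-specific DAO. */
final class DelegatingCustomerDao implements CustomerDao {
    private final CustomerDao target;
    DelegatingCustomerDao(CustomerDao target) { this.target = target; }
    public void save(Customer c) { target.save(c); }
    public Optional<Customer> findById(String id) { return target.findById(id); }
}

/** Stand-in for a JDBC/JPA-based DAO: rows keyed by id. */
final class RelationalCustomerDao implements CustomerDao {
    private final Map<String, Customer> table = new HashMap<>();
    public void save(Customer c) { table.put(c.id, c); }
    public Optional<Customer> findById(String id) { return Optional.ofNullable(table.get(id)); }
}

/** Stand-in for a document-store DAO: each customer becomes a small document. */
final class DocumentCustomerDao implements CustomerDao {
    private final Map<String, Map<String, String>> collection = new HashMap<>();
    public void save(Customer c) { collection.put(c.id, Map.of("_id", c.id, "name", c.name)); }
    public Optional<Customer> findById(String id) {
        Map<String, String> doc = collection.get(id);
        return doc == null ? Optional.empty() : Optional.of(new Customer(doc.get("_id"), doc.get("name")));
    }
}

/** Step 3: wiring from configuration (replacing the Spring XML used in the answer). */
final class DaoFactory {
    private static final Map<String, Supplier<CustomerDao>> BACKENDS = Map.of(
            "relational", RelationalCustomerDao::new,
            "document", DocumentCustomerDao::new);

    static CustomerDao customerDao(Properties config) {
        String backend = config.getProperty("customer.dao.backend", "relational");
        Supplier<CustomerDao> supplier = BACKENDS.get(backend);
        if (supplier == null) throw new IllegalArgumentException("unknown backend: " + backend);
        return new DelegatingCustomerDao(supplier.get());
    }
}
```

Switching backends then means changing customer.dao.backend in the configuration; code that depends only on CustomerDao stays untouched. With Spring, the same effect comes from pointing the delegating DAO's bean definition at a different target bean.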
JPA is designed around RDBMSs ... only. Using it for other types of datastore makes little sense, since things like its query language leak SQL syntax. JDO is designed to be datastore-agnostic and provides persistence to many datastores through implementations such as DataNucleus, though not all of the ones you mention.
JPA is designed around RDBMSs, and Hibernate is also designed for RDBMSs. Only a few JPA implementations support NoSQL, and similar projects are built around Hibernate to support NoSQL databases. However, the API itself is tuned for RDBMSs.
Implementing the DAO pattern would require you to write your own query API, and then extend the implementation whenever your data store changes.
JDO and DataNucleus are designed from the ground up for heterogeneous data stores; DataNucleus already supports a dozen stores, plus RDBMSs. The beauty is that the query API remains the same across stores. JDO lets you work with your domain model and leave the storage details to implementations like DataNucleus.
Hence I suggest the JDO API with DataNucleus.
The link below lists the data stores and features already available in DataNucleus:
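What such a home-grown query API could look like, as a minimal sketch: a small, backend-neutral criteria model (equality, ranges and * wildcards) that each store adapter translates into its native form, here a parameterised SQL WHERE clause and a MongoDB-style filter built from $and, $gte, $lte and $regex, represented as nested maps. All type names are hypothetical. The SQL translator trusts field names (in real code they should come from a whitelist) and does not escape % or _ in patterns; bitwise operators are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SearchCriteriaDemo {
    public static void main(String[] args) {
        Criterion query = Criterion.and(
                Criterion.eq("status", "OPEN"),
                Criterion.between("amount", 100, 500),
                Criterion.wildcard("customer", "Ac*"));
        SqlTranslator.Result sql = SqlTranslator.translate(query);
        System.out.println(sql.where + " " + sql.params);
        System.out.println(MongoFilterTranslator.translate(query));
    }
}

/** Backend-neutral search criterion (hypothetical model). */
final class Criterion {
    enum Kind { EQ, RANGE, WILDCARD, AND }
    final Kind kind;
    final String field;
    final Object value;  // EQ value, RANGE lower bound, or WILDCARD pattern ('*' = any sequence)
    final Object upper;  // RANGE upper bound, otherwise null
    final List<Criterion> parts;

    private Criterion(Kind kind, String field, Object value, Object upper, List<Criterion> parts) {
        this.kind = kind; this.field = field; this.value = value; this.upper = upper; this.parts = parts;
    }

    static Criterion eq(String f, Object v) { return new Criterion(Kind.EQ, f, v, null, List.of()); }
    static Criterion between(String f, Object lo, Object hi) { return new Criterion(Kind.RANGE, f, lo, hi, List.of()); }
    static Criterion wildcard(String f, String p) { return new Criterion(Kind.WILDCARD, f, p, null, List.of()); }
    static Criterion and(Criterion... ps) { return new Criterion(Kind.AND, null, null, null, List.of(ps)); }
}

/** Renders criteria as a parameterised SQL WHERE clause. */
final class SqlTranslator {
    static final class Result {
        final String where;
        final List<Object> params;
        Result(String where, List<Object> params) { this.where = where; this.params = params; }
    }

    static Result translate(Criterion c) {
        List<Object> params = new ArrayList<>();
        String where = render(c, params);
        return new Result(where, params);
    }

    private static String render(Criterion c, List<Object> params) {
        switch (c.kind) {
            case EQ: params.add(c.value); return c.field + " = ?";
            case RANGE: params.add(c.value); params.add(c.upper); return c.field + " BETWEEN ? AND ?";
            case WILDCARD: params.add(((String) c.value).replace('*', '%')); return c.field + " LIKE ?";
            default:
                List<String> parts = new ArrayList<>();
                for (Criterion p : c.parts) parts.add("(" + render(p, params) + ")");
                return String.join(" AND ", parts);
        }
    }
}

/** Renders criteria as a MongoDB-style filter document, represented as nested maps. */
final class MongoFilterTranslator {
    static Map<String, Object> translate(Criterion c) {
        Map<String, Object> filter = new LinkedHashMap<>();
        switch (c.kind) {
            case EQ: filter.put(c.field, c.value); break;
            case RANGE:
                Map<String, Object> ops = new LinkedHashMap<>();
                ops.put("$gte", c.value);
                ops.put("$lte", c.upper);
                filter.put(c.field, ops);
                break;
            case WILDCARD: filter.put(c.field, Map.of("$regex", toRegex((String) c.value))); break;
            default:
                List<Map<String, Object>> all = new ArrayList<>();
                for (Criterion p : c.parts) all.add(translate(p));
                filter.put("$and", all);
        }
        return filter;
    }

    /** Anchored regex: '*' becomes ".*" and regex metacharacters are escaped. */
    static String toRegex(String pattern) {
        StringBuilder sb = new StringBuilder("^");
        for (char ch : pattern.toCharArray()) {
            if (ch == '*') sb.append(".*");
            else if ("\\.^$|?+()[]{}".indexOf(ch) >= 0) sb.append('\\').append(ch);
            else sb.append(ch);
        }
        return sb.append('$').toString();
    }
}
```

Callers only build Criterion objects, so replacing the backend means writing one more translator rather than changing the search API.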
http://www.datanucleus.org/products/accessplatform_3_0/datastore_features.html