Integrating Pentaho/Talend/etc. with an OR Mapper - java

We have a Java application with its own OR mapper. Within this system we have something comparable to Hibernate's interceptors (we call them triggers): they perform specific actions just before data is saved to the database, after it is deleted, and so on. The underlying database is MySQL.
Now we would like to use tools such as Pentaho Data Integration or Talend to convert data and load it into our system. It's no problem to do that directly at the SQL level, but by doing so we lose the built-in power of our triggers.
Is there a way to somehow integrate any of the data integration solutions into our existing application? It would be great if there were a way to write to instances of our classes instead of writing to the database directly.
Any hints welcome :-)

I'd prefer Talend, which is a Java code generation tool. (You can see my blog post at http://www.robertomarchetto.com/www/talend_studio_vs_kettle_pentao_pdi_comparison)
You could use a tJavaRow component, which lets you write Java code for each processed row. In tJavaRow you can call Hibernate code, for example through a custom class defined in a new routine, as in the sketch below.
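A minimal sketch of such a routine (the com.example.app classes are hypothetical stand-ins for your application's own OR mapper API):

```java
// Talend routine (Code > Routines), callable from a tJavaRow component.
// com.example.app.* is a hypothetical stand-in for your own OR mapper's API.
package routines;

public class OrMapperBridge {

    // Called once per row from a tJavaRow, e.g.:
    //   routines.OrMapperBridge.saveCustomer(input_row.name, input_row.email);
    public static void saveCustomer(String name, String email) {
        com.example.app.Session session = com.example.app.Session.current();
        com.example.app.Customer customer = new com.example.app.Customer();
        customer.setName(name);
        customer.setEmail(email);
        // saving through the mapper means your before/after-save triggers fire
        session.save(customer);
    }
}
```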

Two ways with Pentaho Data Integration I can think of straight off:
Create a plugin which adds/deletes data - you could copy the existing Salesforce insert/update plugins, which would be a good start: rip out all the Salesforce code and replace it with yours.
Perhaps harder, but maybe more satisfying: write a JDBC driver which uses your code!

Related

Integrating ETL with Java using IntelliJ

I want to start with ETL in Java. I am using IntelliJ. I wanted to know how the integration can be done, or which tool is compatible with IntelliJ.
Also, are there any tutorials on the basics of ETL with Java?
What exactly will I need if I want to do data transformation?
It can be basic, like just taking random input from a file and transforming
the data based on particular logic.
ETL means creating code to extract (query different sources like databases, XML, web services, etc.), to transform (make everything compatible, remove duplicates, create dimensions and facts), and to load the results into targets (databases and more)...
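To make those three steps concrete, here is a minimal hand-rolled sketch in plain Java (the file name, column layout, and embedded H2 target are assumptions for illustration):

```java
// Minimal hand-rolled ETL: extract rows from a CSV file, transform them
// (trim, de-duplicate by email), and load them into a JDBC target.
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MiniEtl {
    public static void main(String[] args) throws Exception {
        // Extract: read the source file (assumed layout: name,email per line)
        List<String> lines = Files.readAllLines(Paths.get("customers.csv"));

        // Transform: split, trim, de-duplicate by normalized email
        Map<String, String[]> byEmail = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.split(",");
            String email = cols[1].trim().toLowerCase();
            byEmail.putIfAbsent(email, new String[] { cols[0].trim(), email });
        }

        // Load: batch-insert into the target table (embedded H2 for the demo)
        try (Connection con = DriverManager.getConnection("jdbc:h2:./etl-target")) {
            con.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS customer (name VARCHAR(255), email VARCHAR(255))");
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO customer (name, email) VALUES (?, ?)")) {
                for (String[] row : byEmail.values()) {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }
    }
}
```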
All this is not new. Java is great, but creating ETLs with it means creating a non-standard app... It is going to become legacy, and then you need to build a scheduler to run the loads and to integrate with several components.
So I strongly recommend that instead of creating a Java app, you take a look at products like Informatica PowerCenter and/or Oracle Data Integrator.
These solutions are the industry standard for ETL worldwide; they provide objects and methods that spare you hard-to-maintain apps, and they sit on top of any application... They are also used for integration, migration, B2B, BI... you name it.
Good luck!
You would be reinventing the wheel if you tried to create a Java-based ETL product.
Talend is a Java-based open-source ETL tool which gives you the features of an ETL tool and lets you write Java code to integrate...
Pentaho is another Java-based ETL tool.
Both of them are popular and have good UIs...

Neo4J POJO approach

I have a quick question about Neo4j: is it possible to migrate from MySQL to Neo4j? Based on what I have read, it seems that you can, but so far all the tutorials are aimed at web services. I was wondering if there is a way (POJO) to do this kind of process. Currently I have over 300k records about to be exported to CSV, and I plan to load them into Neo4j using Spring. Can I just read them with JDBC and create new nodes in Neo4j? Thanks!
It's possible to migrate a MySQL database to Neo4j, but it depends on how you want to do it and what results you expect.
You can use CSV export/import. It's simple to use, but has some limitations. For a one-time operation it should be good enough.
The second option is to write your own script or program which transforms data from the RDBMS to the graph (see the sketch after this list). It can be more powerful, and you can do cleaning and transformation easily. You can also use Spring Data Neo4j to create persisted entities.
The next option is to use the GraphAware Neo4j Importer. It's a "framework" for importing data from an RDBMS to Neo4j with a lot of powerful features, but the learning curve is steep.
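To illustrate the second option, here is a minimal sketch that streams rows out of MySQL with plain JDBC and creates one node per row with the Neo4j Java driver (URLs, credentials, table, and label are illustrative assumptions):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class MySqlToNeo4j {
    public static void main(String[] args) throws Exception {
        try (Connection mysql = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/app", "user", "secret");
             Driver neo4j = GraphDatabase.driver("bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "password"));
             Session session = neo4j.session();
             Statement st = mysql.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, name FROM person")) {

            while (rs.next()) {
                // one CREATE per row keeps the sketch simple; for 300k rows,
                // collect batches and send one UNWIND query per batch instead
                session.run("CREATE (:Person {id: $id, name: $name})",
                        Values.parameters("id", rs.getLong("id"),
                                "name", rs.getString("name")));
            }
        }
    }
}
```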

Spark Model to use in Java Application

I want to use a Spark model in a Java application for analysis.
I know we can use the save function and load the model in a Spark application, but that works only inside Spark applications (Java, Scala, Python).
We can also use PMML to export the model to other types of application.
Is there any way to use a Spark model in a plain Java application?
I am one of the creators of MLeap. Check us out, it is meant for exactly your use case. If there is a transformer you need that is not currently supported, get in touch with me and we will get it in there.
Our serialization format is solely JSON/Protobuf right now, so it is very portable and supports large models like RandomForest. You can serialize your model to a zip file then load it up wherever.
Take a look at our demo to get a use case:
https://github.com/TrueCar/mleap-demo
Currently no; your options are to use PMML for those models that support it, or write your own framework for using models outside of Spark.
There is movement towards enabling this (see this issue). You could also check out MLeap.
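To make the PMML route concrete, here is a minimal sketch that scores an exported model from a plain Java application, assuming the JPMML evaluator library (org.jpmml:pmml-evaluator); the file name and feature values are illustrative:

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

import org.dmg.pmml.FieldName;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.FieldValue;
import org.jpmml.evaluator.InputField;
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder;

public class PmmlScoring {
    public static void main(String[] args) throws Exception {
        // model.pmml previously exported from Spark (e.g. MLlib's model.toPMML)
        Evaluator evaluator = new LoadingModelEvaluatorBuilder()
                .load(new File("model.pmml"))
                .build();
        evaluator.verify();

        // raw feature values, keyed by the model's input field names
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("x1", 1.5);
        raw.put("x2", 0.25);

        // let the evaluator parse and validate each input field
        Map<FieldName, FieldValue> arguments = new LinkedHashMap<>();
        for (InputField field : evaluator.getInputFields()) {
            FieldName name = field.getName();
            arguments.put(name, field.prepare(raw.get(name.getValue())));
        }

        Map<FieldName, ?> results = evaluator.evaluate(arguments);
        System.out.println(results);
    }
}
```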

Using App Fuse to generate CRUD operations directly from database schema

I was told that one can generate CRUD operations directly from the database schema using AppFuse, but I was unsuccessful in doing that, and it took very long without giving any clue.
One possibility is to create the models, annotate them, create the CRUDs, create the database tables, and populate them with dummy data. But is it possible to do it the other way round?
I have been following this tutorial provided by AppFuse. Am I doing it wrong, or is it possible at all?
Thanks
I would try using appfuse:gen-model:
http://static.appfuse.org/plugins/appfuse-maven-plugin/gen-model-mojo.html
Note that AppFuse isn't great at creating relationships between classes, so you might have to do some work after it generates the code. You might also try searching the user mailing list archives:
http://appfuse.547863.n4.nabble.com/AppFuse-User-f547864.html

Is it possible to save persistent objects to the file system

I'd like to save persistent objects to the file system using Hibernate without the need for a SQL database.
Is this possible?
Hibernate works on top of JDBC, so all you need is a JDBC driver and a matching Hibernate dialect.
However, JDBC is basically an abstraction of SQL, so whatever you use is going to look, walk and quack like an SQL database - you might as well use one and spare yourself a lot of headaches. Besides, any such solution is going to be comparable in size and complexity to lightweight Java DBs like Derby.
Of course if you don't insist absolutely on using Hibernate, there are many other options.
It appears that it might technically be possible if you use a JDBC plaintext driver; however, I haven't seen any open-source ones which provide write access; the one I found on SourceForge is read-only.
You already have an entity model; I suppose you do not want to lose it, nor the relationships contained within it. An entity model is meant to be translated into a relational database.
Hibernate and any other JPA provider (e.g. EclipseLink) translate this entity model into SQL. They use a JDBC driver to provide a connection to an SQL database. This you need to keep as well.
The correct question to ask is: does anybody know an embedded Java SQL database, one that you can start from within Java? There are plenty of those, mentioned in this topic:
HyperSQL: stores the result in an SQL clear-text file, readily imported into any other database
H2: uses binary files, low JAR file size
Derby: uses binary files
Ashpool: stores data in an XML-structured file
I have used HyperSQL on one project for small data, and Apache Derby for a project with huge databases (2Gb and more). Apache Derby performs better on these huge databases.
I don't know exactly what you need, but maybe it's one of the options below:
1 - If you just want to get away from SQL, you can use a NoSQL database.
Hibernate supports this through Hibernate OGM ( http://www.hibernate.org/subprojects/ogm ).
There are some DBs like Cassandra, MongoDB, CouchDB, Hadoop... You can find some suggestions here.
2 - If you do not want to run a database server (with a service process always running), you can use Apache Derby. It's a DB just like any other SQL database, but it needs no server. It keeps its data in local files, so you can easily ship the whole database with your program.
Take a look: http://db.apache.org/derby/
3 - If you really want a plain-text file, you can do as Michael Borgwardt said. But I don't know if Hibernate would be a good idea in this case.
Both H2 and HyperSQL support embedded mode (running inside your JVM instead of in a separate server) and saving to local files; these are still SQL databases, but with Hibernate there aren't many other options.
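For example, a minimal sketch of pointing Hibernate at a file-backed, embedded H2 database (the entity, properties, and file path are illustrative):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class FileBackedHibernate {

    @Entity
    public static class Note {
        @Id @GeneratedValue Long id;
        String text;
    }

    public static void main(String[] args) {
        Configuration cfg = new Configuration()
                // embedded H2: data lands in local files under ./data, no server
                .setProperty("hibernate.connection.driver_class", "org.h2.Driver")
                .setProperty("hibernate.connection.url", "jdbc:h2:file:./data/app")
                .setProperty("hibernate.dialect", "org.hibernate.dialect.H2Dialect")
                .setProperty("hibernate.hbm2ddl.auto", "update")
                .addAnnotatedClass(Note.class);

        SessionFactory factory = cfg.buildSessionFactory();
        Session session = factory.openSession();
        session.beginTransaction();

        Note note = new Note();
        note.text = "persisted to a local file, no SQL server running";
        session.save(note);

        session.getTransaction().commit();
        session.close();
        factory.close();
    }
}
```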
Well, since the question is still open and the OP said he's open to new approaches/suggestions, here's mine (a little late, but OK).
Do you know Prevayler? It's a Java prevalence implementation which keeps all of your business objects in RAM and maintains snapshots/changelogs on the file system. This makes it extremely fast and reliable: if there's any crash, it will restore the last snapshot and reapply every change to it.
Also, it's really easy to set up and run in your app.
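A minimal sketch of the idea, assuming the Prevayler 2.x API (the prevalent system and transaction classes are illustrative):

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;

import org.prevayler.Prevayler;
import org.prevayler.PrevaylerFactory;
import org.prevayler.Transaction;

public class PrevaylerSketch {

    // the prevalent system: all business state lives here, in RAM
    public static class Notes implements Serializable {
        public final ArrayList<String> entries = new ArrayList<>();
    }

    // every mutation is a serializable Transaction, journaled to disk
    public static class AddNote implements Transaction<Notes> {
        private final String text;
        public AddNote(String text) { this.text = text; }
        public void executeOn(Notes notes, Date when) { notes.entries.add(text); }
    }

    public static void main(String[] args) throws Exception {
        // journal and snapshots go under ./prevalence; state is rebuilt on restart
        Prevayler<Notes> prevayler =
                PrevaylerFactory.createPrevayler(new Notes(), "prevalence");
        prevayler.execute(new AddNote("hello"));
        System.out.println(prevayler.prevalentSystem().entries);
        prevayler.close();
    }
}
```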
Of course this is possible. You can simply use the file I/O features of Java; the following steps are required:
1. Create a File object.
2. Create a FileOutputStream for it (use a FileInputStream to read it back).
3. Wrap this stream in a BufferedOutputStream and then in an ObjectOutputStream.
4. Use the write methods of the object created in the previous step.
Note that your class must implement the Serializable interface.
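A minimal sketch of those steps, serializing an object graph straight to a file and reading it back (class and file names are illustrative):

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializeDemo {

    // the persisted class must implement Serializable
    public static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String name;
        public Customer(String name) { this.name = name; }
    }

    public static void main(String[] args) throws Exception {
        // write: FileOutputStream -> BufferedOutputStream -> ObjectOutputStream
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream("customer.ser")))) {
            out.writeObject(new Customer("Ada"));
        }

        // read it back with the mirrored input-stream chain
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream("customer.ser")))) {
            Customer restored = (Customer) in.readObject();
            System.out.println(restored.name);
        }
    }
}
```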
