We are designing a fairly large brownfield application and have run into a bit of an issue.
We have a fairly large amount of information in a DB2 database from a legacy application that is still loading data. We also have information in an Oracle database that we control.
We have to do a 'JOIN' type of operation on the tables. Right now I am thinking of pulling the information out of the DB2 table into a List<> and then feeding those keys into a SQL statement on the Oracle database, such as:
select * from accounts where accountnum in (...)
Is there any easier way to interact between the databases, or at least, what is the best practice for this sort of action?
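For reference, the IN-list approach described above can at least be parameterized and batched; Oracle caps IN lists at 1000 entries, so the keys pulled from DB2 have to be split. A minimal sketch (the table and column names are taken from the example query; everything else is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class InClauseBatcher {

    // Oracle limits IN lists to 1000 entries, so split the keys into batches.
    static <T> List<List<T>> batches(List<T> keys, int batchSize) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            result.add(keys.subList(i, Math.min(i + batchSize, keys.size())));
        }
        return result;
    }

    // Build "select * from accounts where accountnum in (?,?,?)" with one
    // placeholder per key, ready for a PreparedStatement.
    static String inQuery(String table, String column, int n) {
        StringBuilder sb = new StringBuilder("select * from " + table
                + " where " + column + " in (");
        for (int i = 0; i < n; i++) {
            sb.append(i == 0 ? "?" : ",?");
        }
        return sb.append(")").toString();
    }
}
```

Each batch's query is then run through a PreparedStatement with that batch's keys bound as parameters, rather than concatenating the values into the SQL string.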
I've done this two ways.
With two Sybase databases on different boxes, I set up stored procedures and called them like functions to send data back and forth. This additionally allowed the sprocs to audit/log, to convince the customer that no data was being lost in the process.
On a one-way Oracle-to-Sybase transfer, I used a view to marshal the data, with each vendor's C library called from a C++ program that gave the C APIs a common interface.
On a MySQL and DB2 setup where, like your situation, the DB2 was "legacy but live", I employed a setup similar to what you're describing: pulling the data out into a (Java) client program.
If the join is always one-to-one, and each box's resultset has the same key, you can pull them both with the same ordering and trivially connect them in the client. Even if they're one-to-many, stitching them together is just a one-way iteration of both of your lists.
If it gets to be many-to-many, then I might fall back to processing one item at a time (though you could use HashSet lookup).
Basically, though, your choices are sprocs (for which you'd still need to add a client layer) or just doing it in the client.
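The one-way stitching described above might look roughly like this for the one-to-one case (the row types here are made up for illustration; in practice each list would be filled from its database's ResultSet, ordered by the shared key):

```java
import java.util.ArrayList;
import java.util.List;

public class ClientSideJoin {

    // Hypothetical row types standing in for the two result sets.
    record Db2Row(String accountNum, String legacyStatus) {}
    record OracleRow(String accountNum, String owner) {}
    record Joined(String accountNum, String legacyStatus, String owner) {}

    // One-way iteration over two lists sorted by the same key; rows without
    // a partner on the other side are dropped (inner-join semantics).
    static List<Joined> mergeJoin(List<Db2Row> left, List<OracleRow> right) {
        List<Joined> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).accountNum().compareTo(right.get(j).accountNum());
            if (cmp == 0) {
                out.add(new Joined(left.get(i).accountNum(),
                        left.get(i).legacyStatus(), right.get(j).owner()));
                i++; j++;
            } else if (cmp < 0) {
                i++;   // key only present on the DB2 side
            } else {
                j++;   // key only present on the Oracle side
            }
        }
        return out;
    }
}
```

For one-to-many you advance only the "one" side's index when keys match; for many-to-many this breaks down, which is where the per-item or hash-lookup processing mentioned above comes in.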
You can export the data from DB2 in flat-file format and use the flat file as an external table, or load it with SQL*Loader; this is a batch process.
There is also something called heterogeneous connectivity, where you create a database link from Oracle to DB2. This makes it possible to query your DB2 database in real time, and you can join an Oracle table with a DB2 table.
You can also use this database link in combination with materialized views.
There are different kinds of heterogeneous connectivity so read the documentation carefully.
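As a rough sketch of that setup (all names here are hypothetical, and the gateway configuration itself is vendor-specific):

```sql
-- One-time setup, typically done by a DBA, pointing at the DB2 gateway:
CREATE DATABASE LINK db2link
  CONNECT TO db2user IDENTIFIED BY db2pass
  USING 'DB2_GATEWAY_ALIAS';

-- DB2 tables can then be joined with Oracle tables in real time:
SELECT o.accountnum, o.balance, d.legacy_status
FROM   accounts o
JOIN   legacy_accounts@db2link d ON d.accountnum = o.accountnum;
```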
Does it have to be real-time data? If so, there are products available for heterogeneous connectivity, notably DB2 Relational Connect, which is part of federated server. If some lag is acceptable, you can set up scripts to replicate the data into Oracle, after which you can do a native join.
You will get poor performance when pulling the data into a client application. If this is the only option, try to create a DB2 stored procedure to return the data, which will make the performance slightly better.
If it is possible to copy the data from the legacy database into the database you control, you can consider a data extraction job that copies the new records from the legacy DB to the Oracle DB once per day (or as often as possible). It might not be so simple if you can't identify which records the legacy database has produced since the last load.
Then, you can do the joins in your Oracle instance.
If you ask the vendors, probably the best practice would be to buy another product.
From the IBM side, there is IBM Federation Server, which can "Combine data from disparate sources such as DB2, Oracle, and SQL Server into a single virtual view." I imagine there is also one from Oracle but I'm less familiar with their products.
Oracle Transparent Gateway for DRDA http://www.oracle.com/technetwork/database/gateways/index.html
IBM Infosphere Federation Server
http://www-03.ibm.com/software/products/en/ibminfofedeserv/
Note if you have DB2 Advanced Enterprise Server Edition (AESE), Infosphere Federation Server is included.
Both products would allow you to use a single join query sent to one DB that returns data from both DBs. The Oracle product is really nice in that it allows Oracle to see the DB2 database as another Oracle DB and for DB2 to see the Oracle database as another DB2 database. (Thanks to IBM publishing the specs for both the client and server side of the DRDA protocol DB2 uses. Too bad no other vendor is willing to do so, though they have no trouble taking advantage of the fact IBM did so.)
Neither product is what I would call cheap.
For cheap, you could take advantage of Oracle Database Gateway for ODBC
http://docs.oracle.com/cd/E16655_01/gateways.121/e17936/toc.htm
Related
I need to develop an application that can get data from multiple data sources (Oracle, Excel, Microsoft SQL Server, and so on) using one SQL query. For example:
SELECT o.employeeId, count(o.orderId)
FROM employees#excel e, customers#microsoftsql c, orders#oracle o
WHERE o.employeeId = e.employeeId and o.customerId = c.customerId
GROUP BY o.employeeId;
The SQL and the data sources must be changeable dynamically by a Java program. My customers want to write and run SQL-like queries against different databases and storage at the same time, with GROUP BY, HAVING, COUNT, SUM and so on, in the web interface of my application. The other requirements are performance and a light-weight solution.
I have found the following ways to do it (with the drawbacks I see; please correct me if I'm wrong):
Apache Spark (drawbacks: a heavy solution, better suited for Big Data; slow if you need up-to-date information without caching it in Spark),
Distributed queries in the SQL servers themselves (Oracle database links, Microsoft SQL Server linked servers, Excel Power Query) - drawbacks: changing data sources dynamically from a Java program is a problem, as is working with Excel,
Prestodb (drawbacks: a heavy solution, better suited for Big Data),
Apache Drill (drawbacks: quite a young solution, with some problems around non-current ODBC drivers and some bugs),
Apache Calcite (a light framework used by Apache Drill; drawback: still quite a young solution),
Doing the join over the data sources manually (drawbacks: a lot of work to develop a correct join, do "group by" over result sets, find the best execution plan, and so on).
Do you know any other way (using free open-source solutions), or can you give me advice from your experience about the options above? Any help would be greatly appreciated.
UnityJDBC is a commercial JDBC driver that wraps multiple data sources and allows you to treat them as if they were all part of the same database. It works as follows:
You define a "schema file" to describe each of your databases. A schema file looks something like:
...
<TABLE>
<semanticTableName>Database1.MY_TABLE</semanticTableName>
<tableName>MY_TABLE</tableName>
<numTuples>2000</numTuples>
<FIELD>
<semanticFieldName>MY_TABLE.MY_ID</semanticFieldName>
<fieldName>MY_ID</fieldName>
<dataType>3</dataType>
<dataTypeName>DECIMAL</dataTypeName>
...
You also have a central "sources file" that references all of your schema files and gives connection information, and it looks like this:
<SOURCES>
<DATABASE>
<URL>jdbc:oracle:thin:@localhost:1521:xe</URL>
<USER>scott</USER>
<PASSWORD>tiger</PASSWORD>
<DRIVER>oracle.jdbc.driver.OracleDriver</DRIVER>
<SCHEMA>MyOracleSchema.xml</SCHEMA>
</DATABASE>
<DATABASE>
<URL>jdbc:sqlserver://localhost:1433</URL>
<USER>sa</USER>
<PASSWORD>Password123</PASSWORD>
<DRIVER>com.microsoft.sqlserver.jdbc.SQLServerDriver</DRIVER>
<SCHEMA>MySQLServerSchema.xml</SCHEMA>
</DATABASE>
</SOURCES>
You can then use unity.jdbc.UnityDriver to allow your Java code to run SQL that joins across databases, like so:
String sql = "SELECT *\n" +
"FROM MyOracleDB.Whatever, MySQLServerDB.Something\n" +
"WHERE MyOracleDB.Whatever.whatever_id = MySQLServerDB.Something.whatever_id";
stmt.execute(sql);
So it looks like UnityJDBC provides the functionality you need. However, I have to say that any solution that lets users execute arbitrary SQL joining tables across different databases sounds like a recipe for bringing your databases to their knees. What I would actually recommend for your kind of requirements is an ETL process from all of your data sources into a single data warehouse that your users then query; how to define those processes and that data warehouse is definitely too broad for a Stack Overflow question.
One appropriate solution is the DataNucleus platform, which has JDO, JPA and REST APIs. It supports almost every RDBMS (PostgreSQL, MySQL, SQL Server, Oracle, DB2, etc.), NoSQL datastores (map-based, graph-based, document-based, etc.), database web services, LDAP, and documents such as XLS, ODF and XML.
Alternatively you can use EclipseLink, which also has support for RDBMS, NoSQL, database web services and XML.
By using JDOQL, which is part of the JDO API, the requirement of having one query access multiple datastores is met. Both solutions are open source, relatively lightweight and performant.
Why did I suggest this solution?
From your requirements it's understood that the datastore will be your customer's choice and that you are not looking for a Big Data solution.
You prefer open-source solutions that are lightweight and performant.
Considering your use case, you might require a data-management platform with polyglot persistence behaviour, i.e. the ability to leverage multiple datastores based on your customers' use cases.
To read more about polyglot persistence
https://dzone.com/articles/polyglot-persistence-future
https://www.mapr.com/products/polyglot-persistence
SQL is tied to the database management system: SQL Server requires different SQL statements than an Oracle server.
My suggestion is to use JPA. It is completely independent of your database management system and makes development in Java much more efficient.
The downside is that you cannot combine several database systems with JPA out of the box (such as a 1:1 relation between a SQL Server table and an Oracle table). You could, however, create several EntityManagerFactories (one for each database) and link the results together in your code.
Pros for JPA in this scenario:
write database management system independent JPQL queries
reduces the required Java code
Cons for JPA:
you cannot relate entities from different databases (like in a 1:1 relationship)
you cannot query several databases with one query (combining tables from different databases in a group by or similar)
More information:
Wikipedia
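A minimal sketch of the two persistence units such a setup might use in persistence.xml (the unit names and connection details are assumptions, not from the question):

```xml
<persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" version="2.1">
  <persistence-unit name="oraclePU">
    <properties>
      <property name="javax.persistence.jdbc.driver" value="oracle.jdbc.driver.OracleDriver"/>
      <property name="javax.persistence.jdbc.url" value="jdbc:oracle:thin:@localhost:1521:xe"/>
    </properties>
  </persistence-unit>
  <persistence-unit name="sqlserverPU">
    <properties>
      <property name="javax.persistence.jdbc.driver" value="com.microsoft.sqlserver.jdbc.SQLServerDriver"/>
      <property name="javax.persistence.jdbc.url" value="jdbc:sqlserver://localhost:1433"/>
    </properties>
  </persistence-unit>
</persistence>
```

Persistence.createEntityManagerFactory("oraclePU") and Persistence.createEntityManagerFactory("sqlserverPU") then give you two independent EntityManager sources, and the linking between their entities happens in your Java code, not in JPQL.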
I would recommend Presto and Calcite.
Performance and lightweight don't always go hand in hand.
Presto: quite a lot of proven usage and, as you said, "Big Data". It performs well and scales well. I don't know exactly what lightweight means to you, but if requiring fewer machines is part of it, you can definitely scale it down to your needs.
Calcite: embedded in a lot of data-analytics libraries such as Drill, Kylin and Phoenix. It does what you need ("connecting to multiple DBs") and, most importantly, it is lightweight.
Having experience with some of the candidates (Apache Spark, Prestodb, Apache Drill) makes me choose Prestodb. Even though it is mostly used for Big Data, I think it is easy to set up and it supports (almost) everything you are asking for. There are plenty of resources available online (including running it in Docker), it has excellent documentation and an active community, and it is backed by two companies (Facebook & Netflix).
Multiple Databases on Multiple Servers from Different Vendors
The most challenging case is when the databases are on different servers and some of the servers run different database software. For example, the customers database may be hosted on machine X on Oracle, and the orders database may be hosted on machine Y with Microsoft SQL Server. Even if both databases are hosted on machine X but one is on Oracle and the other on Microsoft SQL Server, the problem is the same: somehow the information in these databases must be shared across the different platforms. Many commercial databases support this feature using some form of federation, integration components, or table linking (e.g. IBM, Oracle, Microsoft), but support in the open-source databases (HSQL, MySQL, PostgreSQL) is limited.
There are various techniques for handling this problem:
Table linking and federation - link tables from one source into another for querying
Custom code - write code and multiple queries to manually combine the data
Data warehousing/ETL - extract, transform, and load the data into another source
Mediation software - write one query that a mediator translates to extract the required data
Maybe this is a vague idea: try Apache Solr. Take your different data sources and import the data into Apache Solr. Once the data is indexed you can write all kinds of queries against it.
It is an open-source search platform that makes sure your searches are fast.
That's what the Hibernate framework is for. Hibernate has its own query language, HQL, which is mostly identical to SQL; Hibernate acts as middleware, converting HQL queries into database-specific queries.
I have one simple table in one Oracle database that needs to be joined with a group of tables in another Oracle database. They reside on the same server (different ports). I am using JDBC and want to keep it simple. I could connect to both DBs and join the result sets in Java. But I am wondering if there is a better/easier way.
I cannot easily use new tools or frameworks, since I work in a rigid corporate environment, so I want to know whether this can be accomplished with just JDBC.
There is no way to do it in pure JDBC as far as I am aware, but you could use Oracle's database link facility. It makes the tables of one database available in another, allowing you to carry out joins etc. as if they were in the same database. JDBC works nicely with tables that are reached through these links.
Setting them up is an administrative function, so you'll need some DBA involvement.
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_5005.htm
Other than that, if the "one" table isn't too big, you may have to read it into a Map and then perform the join part of the query in your code (which is not ideal).
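That Map-based fallback could be sketched roughly like this (the row shape is simplified to Map<String, Object>; in practice both lists would be read from JDBC ResultSets):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapJoin {

    // Index the small table by its key once, then do O(1) lookups while
    // iterating the big result set (inner-join semantics: unmatched rows
    // from the big side are dropped).
    static List<Map<String, Object>> join(List<Map<String, Object>> small,
                                          List<Map<String, Object>> big,
                                          String key) {
        Map<Object, Map<String, Object>> index = new HashMap<>();
        for (Map<String, Object> row : small) {
            index.put(row.get(key), row);
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : big) {
            Map<String, Object> match = index.get(row.get(key));
            if (match != null) {
                Map<String, Object> combined = new HashMap<>(row);
                combined.putAll(match);   // columns from the small table win on name clash
                out.add(combined);
            }
        }
        return out;
    }
}
```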
I see two options.
Get the data from the two data sources manually and then combine it (which you have already thought about).
Create a database link from one database to the other, create a synonym in your schema for the remote object, and execute the SQL statement with its joins directly from Java.
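A sketch of what the second option might look like on the SQL side (the link, synonym and table names are hypothetical); the Java side then runs the final SELECT through an ordinary PreparedStatement:

```sql
-- One-time DBA setup on the local database:
CREATE DATABASE LINK other_db
  CONNECT TO remote_user IDENTIFIED BY remote_pass
  USING 'OTHER_TNS_ALIAS';

CREATE SYNONYM remote_accounts FOR accounts@other_db;

-- Executed from Java like any local query:
SELECT l.order_id, r.owner
FROM   orders l
JOIN   remote_accounts r ON r.accountnum = l.accountnum;
```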
I want or need to (without using another database) set up entities (database tables) in memory that have relationships, like one-to-many or many-to-many.
I saw something related here on this forum:
Map SQL (not JPQL) to a collection of simple Java objects?
I need to query these entities that have relationships and get the result sets from them, in order to push the resulting data into an Access database. I am using Jackcess, and it's not a JDBC driver.
So far I have looked at MetaModel and jOOQ.
Is there anything else out there? I have a little bit of exposure to ORMs; do they query the in-memory collections, or just pass the SQL query on to the database?
Any help or suggestions is greatly appreciated.
Apparently, you're looking for something like .NET's LINQ-to-Objects in the Java ecosystem. There's nothing as sophisticated as LINQ-to-Objects, but there are a couple of ways to "query" collections in Java as well. You might be interested in any of these libraries:
Quaere: http://quaere.codehaus.org
Coollection: https://github.com/wagnerandrade/coollection
Lambdaj: https://code.google.com/p/lambdaj
JXPath: http://commons.apache.org/proper/commons-jxpath
JoSQL: http://josql.sourceforge.net
All of the above projects are open source and may no longer be very actively maintained, as Java 8 will introduce a much better collections API along with language-supported lambda expressions, which renders these non-SQL-focused LINQesque Java APIs obsolete.
Note, you were asking specifically about MetaModel and jOOQ. These provide you with a querying API for querying databases. I think that will not help you much for your use-cases.
Hibernate will query the object cache, but only if you query using Criteria or HQL. If you query straight SQL, it'll get run directly against the database.
Your problem description sounds like more than Jackcess can handle natively. But what if, at program startup, you read the full Access DB into an in-memory database (one that has a JDBC driver), run your Hibernate queries against that in-memory database, and at program exit flush all Hibernate changes and write the in-memory database's contents back into the Access database? You get all the complicated querying capability of Hibernate, and all you have to write is the Jackcess-to-JDBC code that loads the Access DB into an equivalent schema in the in-memory database, plus the inverse code to copy it back, which is far easier than writing a full JDBC driver for Jackcess.
We need to retrieve data that is spread across 3 databases (DB2, SQL Server and AS/400). Is there any Java library that would seamlessly, and with good performance, allow us to "join" across tables in the different databases?
What I mean by "join" is that it would be intelligent enough to query one DB for just the keys that will be used to query the other DB, and then assemble all the data together. I don't want to retrieve thousands of useless rows and iterate through them to find out what I should query in the other databases (as our current implementation does). Maybe there's something that integrates easily with Hibernate.
I'd like to save persistent objects to the file system using Hibernate without the need for a SQL database.
Is this possible?
Hibernate works on top of JDBC, so all you need is a JDBC driver and a matching Hibernate dialect.
However, JDBC is basically an abstraction of SQL, so whatever you use is going to look, walk and quack like an SQL database; you might as well use one and spare yourself a lot of headaches. Besides, any such solution is going to be comparable in size and complexity to lightweight Java DBs like Derby.
Of course if you don't insist absolutely on using Hibernate, there are many other options.
It appears that it might technically be possible if you use a JDBC plain-text driver; however, I haven't seen any open-source ones that provide write access; the one I found on SourceForge is read-only.
You already have an entity model, and I suppose you do not want to lose it or the relationships contained within it. An entity model is meant to be translated to a relational database.
Hibernate and any other JPA provider (e.g. EclipseLink) translate this entity model to SQL and use a JDBC driver to provide a connection to an SQL database. This you need to keep as well.
The correct question to ask is: does anybody know an embedded Java SQL database, one that you can start from within Java? There are plenty of those, mentioned in this topic:
HyperSQL: stores the result in an SQL clear-text file, readily imported into any other database
H2: uses binary files, low JAR file size
Derby: uses binary files
Ashpool: stores data in an XML-structured file
I have used HyperSQL on one project for small data, and Apache Derby for a project with huge databases (2 GB and more). Apache Derby performs better on these huge databases.
I don't know exactly what you need, but maybe it's one of the options below:
1 - If your need is just to get away from SQL, you can use a NoSQL database.
Hibernate supports these through Hibernate OGM ( http://www.hibernate.org/subprojects/ogm ).
There are some DBs like Cassandra, MongoDB, CouchDB, Hadoop...
2 - If you do not want to run a database server (a service process that is always running), you can use Apache Derby. It's a DB just like any other SQL database, but it needs no server: it keeps its data in local files, so you can easily ship the whole database with your program.
Take a look: http://db.apache.org/derby/
3 - If you really want a plain text file, you can do as Michael Borgwardt said, but I don't know whether Hibernate would be a good idea in that case.
Both H2 and HyperSQL support embedded mode (running inside your JVM instead of in a separate server) and saving to local file(s); these are still SQL databases, but with Hibernate there aren't many other options.
Well, since the question is still open and the OP said he's open to new approaches/suggestions, here's mine (a little late, but OK).
Do you know Prevayler? It's a Java implementation of the Prevalence pattern, which keeps all of your business objects in RAM and maintains snapshots/changelogs on the file system. This makes it extremely fast and reliable: if there's a crash, it restores the last snapshot and reapplies every change to it.
Also, it's really easy to set up and run in your app.
Of course this is possible; you can simply use the serialization features of Java's file I/O. The following steps are required:
Create a File object.
Create a FileOutputStream for it (a FileInputStream when reading back).
Wrap the stream in an ObjectOutputStream (an ObjectInputStream when reading).
Use the writeObject/readObject methods of the stream created in the previous step.
Note that your objects must implement the Serializable interface.
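A minimal sketch of those steps with plain Java serialization (the Account record is a made-up example type):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class FileStore {

    // The persisted type must implement Serializable.
    record Account(String accountNum, double balance) implements Serializable {}

    static void save(File file, List<Account> accounts) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            // ArrayList is Serializable, so copy into one before writing.
            out.writeObject(new ArrayList<>(accounts));
        }
    }

    @SuppressWarnings("unchecked")
    static List<Account> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (List<Account>) in.readObject();
        }
    }
}
```

This gives you persistence without any database, but none of the querying or relationship management the question also asks about; those would still have to be done over the deserialized collections in code.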