I need to develop an application that can get data from multiple data sources (Oracle, Excel, Microsoft SQL Server, and so on) using one SQL query. For example:
SELECT o.employeeId, count(o.orderId)
FROM employees#excel e, customers#microsoftsql c, orders#oracle o
WHERE o.employeeId = e.employeeId and o.customerId = c.customerId
GROUP BY o.employeeId;
Both the SQL and the data sources must be changeable dynamically by the Java program. My customers want to write and run SQL-like queries against different databases and storage systems at the same time, with GROUP BY, HAVING, COUNT, SUM and so on, in the web interface of my application. Other requirements are performance and a lightweight solution.
I have found the following ways to do it (with the drawbacks I see; please correct me if I'm wrong):
Apache Spark (drawbacks: heavy solution, better suited for Big Data, slow if you need up-to-date information without caching it in Spark),
Distributed queries in the database server itself (Oracle database links, Microsoft SQL Server linked servers, Excel Power Query) - drawbacks: hard to change data sources dynamically from a Java program, and problems working with Excel,
Prestodb (drawbacks: heavy solution, better suited for Big Data),
Apache Drill (drawbacks: quite a young solution, some problems with outdated ODBC drivers and some bugs in operation),
Apache Calcite (a light framework that is used by Apache Drill; drawbacks: still quite a young solution),
Joining the data sources manually (drawbacks: a lot of work to implement correct joins, GROUP BY over result sets, finding the best execution plan, and so on).
Do you know any other way (using free open-source solutions), or can you give me any advice from your experience about the options above? Any help would be greatly appreciated.
UnityJDBC is a commercial JDBC driver that wraps multiple data sources and allows you to treat them as if they were all part of the same database. It works as follows:
You define a "schema file" to describe each of your databases. The schema file resembles something like:
...
<TABLE>
  <semanticTableName>Database1.MY_TABLE</semanticTableName>
  <tableName>MY_TABLE</tableName>
  <numTuples>2000</numTuples>
  <FIELD>
    <semanticFieldName>MY_TABLE.MY_ID</semanticFieldName>
    <fieldName>MY_ID</fieldName>
    <dataType>3</dataType>
    <dataTypeName>DECIMAL</dataTypeName>
...
You also have a central "sources file" that references all of your schema files and gives connection information, and it looks like this:
<SOURCES>
  <DATABASE>
    <URL>jdbc:oracle:thin:@localhost:1521:xe</URL>
    <USER>scott</USER>
    <PASSWORD>tiger</PASSWORD>
    <DRIVER>oracle.jdbc.driver.OracleDriver</DRIVER>
    <SCHEMA>MyOracleSchema.xml</SCHEMA>
  </DATABASE>
  <DATABASE>
    <URL>jdbc:sqlserver://localhost:1433</URL>
    <USER>sa</USER>
    <PASSWORD>Password123</PASSWORD>
    <DRIVER>com.microsoft.sqlserver.jdbc.SQLServerDriver</DRIVER>
    <SCHEMA>MySQLServerSchema.xml</SCHEMA>
  </DATABASE>
</SOURCES>
You can then use unity.jdbc.UnityDriver to allow your Java code to run SQL that joins across databases, like so:
String sql = "SELECT *\n" +
"FROM MyOracleDB.Whatever, MySQLServerDB.Something\n" +
"WHERE MyOracleDB.Whatever.whatever_id = MySQLServerDB.Something.whatever_id";
stmt.execute(sql);
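Continuing the snippet above, the connection itself would be obtained roughly like this; the sources-file name and the exact URL format are assumptions, so check them against the driver's documentation:

// Register the UnityJDBC driver and open a virtual connection over the
// sources file that lists all the underlying databases (path is hypothetical)
Class.forName("unity.jdbc.UnityDriver");
Connection conn = DriverManager.getConnection("jdbc:unity://sources.xml");
Statement stmt = conn.createStatement();
stmt.execute(sql);   // the cross-database SQL built above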
So it looks like UnityJDBC provides the functionality that you need. However, I have to say that any solution that allows users to execute arbitrary SQL joining tables across different databases sounds like a recipe for bringing your databases to their knees. The solution that I would actually recommend for your type of requirements is to run ETL processes from all of your data sources into a single data warehouse and let your users query that; how to define those processes and your data warehouse is definitely too broad for a Stack Overflow question.
One appropriate solution is the DataNucleus platform, which has JDO, JPA and REST APIs. It supports almost every RDBMS (PostgreSQL, MySQL, SQL Server, Oracle, DB2, etc.) and NoSQL datastores (map-based, graph-based, document-based, etc.), as well as database web services, LDAP, and documents like XLS, ODF and XML.
Alternatively you can use EclipseLink, which also has support for RDBMS, NoSQL, database web services and XML.
By using JDOQL, which is part of the JDO API, the requirement of having one query language to access multiple datastores is met. Both solutions are open-source, relatively lightweight and performant.
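As an illustration, here is a minimal JDOQL sketch; the persistence unit name "MyUnit" and the Order class are hypothetical, and the actual datastore behind them is chosen in the DataNucleus metadata, not in the query:

import java.util.List;
import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;
import javax.jdo.Query;

public class JdoqlSketch {
    public static void main(String[] args) {
        // "MyUnit" is a hypothetical persistence unit configured for the chosen datastore
        PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory("MyUnit");
        PersistenceManager pm = pmf.getPersistenceManager();
        try {
            // The same JDOQL works whether Order is mapped to an RDBMS, a NoSQL store, or a spreadsheet
            Query q = pm.newQuery(Order.class, "amount > 100");
            @SuppressWarnings("unchecked")
            List<Order> results = (List<Order>) q.execute();
            System.out.println(results.size() + " orders found");
        } finally {
            pm.close();
        }
    }
}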
Why did I suggest this solution?
From your requirements it is understood that the datastore will be your customer's choice, and you are not looking for a Big Data solution.
You prefer open-source solutions that are lightweight and performant.
Considering your use case, you might require a data management platform with polyglot persistence behaviour, i.e. the ability to use multiple datastores based on your or your customer's use cases.
To read more about polyglot persistence
https://dzone.com/articles/polyglot-persistence-future
https://www.mapr.com/products/polyglot-persistence
SQL dialects are tied to the database management system: SQL Server requires different SQL statements than an Oracle server.
My suggestion is to use JPA. It is completely independent of your database management system and makes development in Java much more efficient.
The downside is that you cannot combine several database systems with JPA out of the box (for example, a 1:1 relation between a SQL Server table and an Oracle table). You could, however, create several EntityManagerFactories (one per database) and link them together in your code (see the sketch below).
Pros for JPA in this scenario:
write database-management-system-independent JPQL queries
reduces the amount of required Java code
Cons for JPA:
you cannot relate entities from different databases (like in a 1:1 relationship)
you cannot query several databases with one query (combining tables from different databases in a group by or similar)
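A minimal sketch of the workaround mentioned above: two separate persistence units queried independently and combined in Java. The unit names and the entity classes (Order, Customer) are hypothetical and would be defined in your persistence.xml and entity model:

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class TwoDatabasesSketch {
    public static void main(String[] args) {
        // One persistence unit per database, both defined in persistence.xml
        EntityManagerFactory oracleEmf = Persistence.createEntityManagerFactory("oraclePU");
        EntityManagerFactory sqlServerEmf = Persistence.createEntityManagerFactory("sqlserverPU");
        EntityManager oracleEm = oracleEmf.createEntityManager();
        EntityManager sqlServerEm = sqlServerEmf.createEntityManager();

        // Query each database separately with JPQL ...
        List<Order> orders = oracleEm
                .createQuery("SELECT o FROM Order o WHERE o.customerId = :id", Order.class)
                .setParameter("id", 42L)
                .getResultList();
        List<Customer> customers = sqlServerEm
                .createQuery("SELECT c FROM Customer c WHERE c.id = :id", Customer.class)
                .setParameter("id", 42L)
                .getResultList();

        // ... and combine the results in Java, because a single JPQL query
        // cannot join entities that live in different persistence units.
        oracleEm.close();
        sqlServerEm.close();
    }
}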
More information:
Wikipedia
I would recommend Presto and Calcite.
Performance and light weight don't always go hand in hand.
Presto: quite a lot of proven usage, as you have said, in "big data". It performs well and scales well. I don't know exactly what "lightweight" means to you; if requiring fewer machines is part of it, you can definitely scale it down according to your needs.
Calcite: embedded in a lot of data analytics projects such as Drill, Kylin and Phoenix. It does what you need ("connecting to multiple DBs") and, most importantly, it is lightweight.
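To illustrate how Calcite can be embedded, here is a minimal sketch using its JDBC driver with a model file; the model.json path and the schema/table names are assumptions, and the model file is where the Oracle, SQL Server and CSV/Excel adapters would be declared:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CalciteSketch {
    public static void main(String[] args) throws Exception {
        // model.json (hypothetical path) describes the federated schemas and their adapters
        Connection conn = DriverManager.getConnection("jdbc:calcite:model=src/main/resources/model.json");
        Statement stmt = conn.createStatement();
        // "oracleSchema" and "mssqlSchema" are hypothetical schema names from model.json;
        // Calcite plans the join across whatever adapters back them
        ResultSet rs = stmt.executeQuery(
                "SELECT o.\"employeeId\", COUNT(*) AS cnt "
                + "FROM \"oracleSchema\".\"orders\" o "
                + "JOIN \"mssqlSchema\".\"customers\" c ON o.\"customerId\" = c.\"customerId\" "
                + "GROUP BY o.\"employeeId\"");
        while (rs.next()) {
            System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
        }
        conn.close();
    }
}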
Having experience with some of the candidates (Apache Spark, Prestodb, Apache Drill) makes me choose Prestodb. Even though it is mostly used for big data, I think it is easy to set up and it has support for (almost) everything you are asking for. There are plenty of resources available online (including running it in Docker), and it also has excellent documentation, an active community, and backing from two companies (Facebook & Netflix).
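For a rough idea, this is a sketch of a cross-catalog query through the Presto JDBC driver; the host, catalog, schema and table names are assumptions, and each catalog would be configured on the Presto server via its connectors (e.g. Oracle and SQL Server):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PrestoSketch {
    public static void main(String[] args) throws Exception {
        // Connect to a (hypothetical) Presto coordinator; catalogs are defined server-side
        Connection conn = DriverManager.getConnection(
                "jdbc:presto://localhost:8080/oracle/hr", "myuser", null);
        Statement stmt = conn.createStatement();
        // One SQL statement joining tables from two different catalogs, i.e. two databases
        ResultSet rs = stmt.executeQuery(
                "SELECT o.employeeid, count(o.orderid) "
                + "FROM oracle.hr.orders o JOIN sqlserver.dbo.customers c "
                + "ON o.customerid = c.customerid GROUP BY o.employeeid");
        while (rs.next()) {
            System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
        }
        conn.close();
    }
}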
Multiple Databases on Multiple Servers from Different Vendors
The most challenging case is when the databases are on different servers and some of the servers run different database software. For example, the customers database may be hosted on machine X on Oracle, and the orders database may be hosted on machine Y with Microsoft SQL Server. Even if both databases are hosted on machine X but one is on Oracle and the other on Microsoft SQL Server, the problem is the same: somehow the information in these databases must be shared across the different platforms. Many commercial databases support this feature using some form of federation, integration components, or table linking (e.g. IBM, Oracle, Microsoft), but support in the open-source databases (HSQL, MySQL, PostgreSQL) is limited.
There are various techniques for handling this problem:
Table Linking and Federation - link tables from one source into another for querying
Custom Code - write code and multiple queries to manually combine the data
Data Warehousing/ETL - extract, transform, and load the data into another source
Mediation Software - write one query that is translated by a mediator to extract the data required
Maybe a vague idea, but try Apache Solr. Take your different data sources and import the data into Apache Solr. Once the data is indexed, you can write different queries against it.
It is an open-source search platform that makes your searches fast.
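If you go this route, querying the indexed data from Java would look roughly like this with SolrJ; the Solr URL, collection name and field names are assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical collection that data from the various sources was imported into
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/orders").build();
        SolrQuery query = new SolrQuery("customerId:42");
        QueryResponse response = solr.query(query);
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("orderId"));
        }
        solr.close();
    }
}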
That's what the Hibernate framework is for: Hibernate has its own query language, HQL, which is mostly identical to SQL. Hibernate acts as middleware that converts HQL queries into database-specific queries.
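For example, a minimal HQL sketch; the SessionFactory configuration and the Employee entity are hypothetical, and Hibernate generates Oracle or SQL Server SQL from the same HQL depending on the configured dialect:

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HqlSketch {
    public static void main(String[] args) {
        // hibernate.cfg.xml decides which database (and SQL dialect) is used
        SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
        Session session = sessionFactory.openSession();
        // The HQL below is written against the Employee entity, not a specific database
        List<Employee> result = session
                .createQuery("from Employee e where e.salary > :min", Employee.class)
                .setParameter("min", 50000)
                .getResultList();
        System.out.println(result.size() + " employees found");
        session.close();
        sessionFactory.close();
    }
}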
I have two data models with different numbers of tables in MySQL, but both are designed for the same purpose.
I need a mechanism that will migrate data from model #1 to model #2. It can be a stored procedure, a set of SQL scripts, or Java code. It would be best to be able to create the mappings visually (e.g. drag from Table1M1.field1 to Table1M2.field5). Does any tool for this exist?
MySQL Workbench has a Database Migration module. Check it out.
I'm currently working on a simple Java application that calculates and graphs the different types of profit for a company. A company can have many branches, and each branch can have many years, and each year can have up to 12 months.
The hierarchy looks as follows:
-company
+branch
-branch
+year
-year
+month
-month
My intention was to keep the data storage as simple as possible for the user. The structure I had in mind was an XML file that stores everything to do with a single company, either as a single XML file or as multiple XML files linked together with unique IDs.
Both of these options would also allow the user to easily transport the data, as opposed to using a database.
The problem with a database that is stopping me right now is that the user would have to set up the database by him/herself, which would be very difficult if they aren't the technical type.
What do you think I should go for: an XML file, a database, or something else?
It will be more complicated to use XML; XML is more of an interchange format, not a substitute for a DB.
You can use an embeddable database such as H2 or Apache Derby / JavaDB; in this case the user won't have to set up a database. The data will be stored only locally though, so consider it only if that is OK for your application.
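A minimal sketch of the embedded approach with H2; the file name and table are assumptions. The database is just a file next to the application, so nothing has to be installed:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EmbeddedH2Sketch {
    public static void main(String[] args) throws Exception {
        // Creates (or opens) the companydata database file in the working directory
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./companydata", "sa", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS branch (id INT PRIMARY KEY, name VARCHAR(100))");
            stmt.execute("MERGE INTO branch VALUES (1, 'Head office')");
        }
    }
}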
I would definitely go for the DB:
you have relational data, something DBs are very good at
you can query your relational data much more easily than you could in XML
the CRUD operations (create, read, update, delete) are much easier in a DB than in XML
You can avoid the need for the user to install a DB engine by embedding SQLite with your app for example.
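A small sketch of that idea using the xerial sqlite-jdbc driver; the file and table names are assumptions. The DB file travels with the application just like an XML file would:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EmbeddedSqliteSketch {
    public static void main(String[] args) throws Exception {
        // company.db is created automatically if it does not exist
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:company.db");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS month (id INTEGER PRIMARY KEY, branch TEXT, profit REAL)");
        }
    }
}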
If it's a single-user application and the amount of data is unlikely to exceed a couple of megabytes, then using an XML file for the persistent storage might well make sense in that it reduces the complexity of the package and its installation process. But you're limiting the scalability: is that wise?
Is there any free tool which can generate data entry and listing screens for a database table (Oracle) based on its metadata? Desired features:
Drop-downs for reference data
Ability to customise label names, show/hide specific columns of the table, change column ordering, etc.
Operations on the listing screen (modify, delete, activate, deactivate, etc.)
Import of data from a CSV file
Ability to add custom validation before saving/modifying data in the DB
Pre-delete validations
Option to choose the technology stack, e.g. (Struts 2, Spring, Hibernate) or (Struts 1, EJB, DAO pattern), etc.
For the Oracle database there is a tool called APEX that can generate views and edit pages starting from the DB structure.
This is the pointer: http://www.oracle.com/technetwork/developer-tools/apex/overview/index.html
But APEX has its own technology stack and it does not generate code; it's an Access-like framework for handling Oracle DB data.
As you are interested in Java technology, I urge you to check out JDeveloper. It gets overlooked by Java people because it's not Eclipse, but it has lots of cool features. In particular, its ADF BC wizards can generate quite sophisticated data-driven components. Find out more.
Oh, and the tool is free as in free beer, although there are licences payable to deploy ADF and TopLink components. Thanks to carpenteri for pointing out the relevant documentation.
These exist (www.enterprise-elements.com) but are certainly not free.
I'm doing a Java software project at my university that is mainly about storing data sets (management of software tests).
The first thing I thought of was a simple SQL DB; however, the necessary DB schema is not available for now (let's say the project is stupid, but there's no choice).
Is a persistence framework like Hibernate able to store data internally (for example, in XML) and to convert this XML into a proper SQL database later?
My intention is to use the additional abstraction layer of a framework like Hibernate to save work, because it might have conversion functions. I know that Hibernate can generate classes from SQL, but I'm not sure whether it needs a DB at every point during development. Using an XML schema for now and converting it into SQL later may be an idea :)
You can persist XML with Hibernate into a relational DB, but you cannot use XML directly as a storage engine. Why not simply store your data in a relational DB from the start? You'll create some schema yourself and adapt it to the actual one when you receive it.
I would recommend using a lightweight DB such as HSQLDB for this.
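A minimal sketch of that suggestion with an in-process HSQLDB database; the file path and table are assumptions, and the provisional schema can later be replaced by the real one:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HsqldbSketch {
    public static void main(String[] args) throws Exception {
        // File-based HSQLDB database, created on first use; no server installation required
        try (Connection conn = DriverManager.getConnection("jdbc:hsqldb:file:devdb", "SA", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE test_run (id INT PRIMARY KEY, name VARCHAR(100))");
            stmt.execute("INSERT INTO test_run VALUES (1, 'smoke test')");
        }
    }
}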