I am trying to develop a Java application which merges data from multiple data source basically RDBMS. The scenario is some thing like this.
I have creates a connection to two data sources, lets say a MSSQL database and other Oracle. Also on each connection a user can create a DataObject( a Java object) which contains a SQL query and a connection. The query is executed on the connection and result are displayed.
Now what I want is that my user can join and filter result obtained from multiple DataObject.
Currently I am looking on the following solution:
JDO/Hibernate - I will create a object from the ResultSet obtained from the query execution and will use the multiple objects with filter and joining condition.
Java RowSet - I will create a RowSet object over result sets and user JoinRowSet and FilteredRowSet to join multiple result set.
Please advice me on my choice. Also please can other solution be looked into.
I would suggest the former. To me its as simple as getting the list of entities, and add those in a single list, based on some filter.
Oracle comes with a generic ODBC gateway that allows you to link the oracle database with another database, so you can join tables from both databases etc. with SQL, as if both tables were on Oracle. See this link for details. By doing that, you don't have to replicate database features in your java program.
Related
I want to design a system. There are different customers using this system. I need to create the duplicated tables for every customer. For example, I have a table Order, then all of order records for customerA are in table Order_A, as well as customerB data are in table Order_B. I can distinct different customers from session, but how can I let Spring JPA to reflect the RDS table data to Java object?
I know 2 solutions, but both are not satisfied.
Consider to use Mybatis because it supports load SQL from xml file and parameters inside SQL;
Consider to use org.hibernate.EmptyInterceptor. This is my current implement in my project. For every entity, I must define a subclass of it. It can update the SQL before Hibernate's execution.
However, both are not graceful. I prefer the better solution.
I'm trying to access data in multiple databases on a single running instance. the table structures of these databases are all the same; As far as I know, create a new connnection using jdbc is very expensive. But the connection string of jdbc require format like this
jdbc:mysql://hostname/ databaseName, which needs to specify a specific database.
So I'm wondering is there any way to query data in multiple databases using one connection?
The MySQL documentation is badly written on this topic.
The SELECT Syntax page refers to the JOIN Syntax page for how a table name can be written, even if you don't use JOIN clauses. The JOIN Syntax page simply says tbl_name, without further defining what that is. There is even a comment at the bottom calling this out:
This page needs to make it explicit that a table reference can be of the form schema_name.tbl_name, and that joins between databases are therefore posible.
The Schema Object Names page says nothing about qualifying names, but does have a sub-page called Identifier Qualifiers, which says that a table column can be referred to using the syntax db_name.tbl_name.col_name. The page says nothing about the ability to refer to tables using db_name.tbl_name.
But, if you can refer to a column using db_name.tbl_name.col_name, it would only make sense if you can also refer to a table using db_name.tbl_name, which means that you can access all your databases using a single Connection, if you're ok with having to qualify the table names in the SQL statements.
As mentioned by #MarkRotteveel in a comment, you can also switch database using the Connection.setCatalog(String catalog) method.
This is documented in the MySQL Connector/J 5.1 Developer Guide:
Initial Database for Connection
If the database is not specified, the connection is made with no default database. In this case, either call the setCatalog() method on the Connection instance, or fully specify table names using the database name (that is, SELECT dbname.tablename.colname FROM dbname.tablename...) in your SQL. Opening a connection without specifying the database to use is generally only useful when building tools that work with multiple databases, such as GUI database managers.
Note: Always use the Connection.setCatalog() method to specify the desired database in JDBC applications, rather than the USE database statement.
I tried to make this as simple as possible with a short example.
We have two databases, one in MSSQLServer and other in Progress.
We have the user DTO as it follows that we shown in a UI table within a web application.
User
int, id
String, name
String, accountNumber
String, street
String, city
String, country
Now this DTO(Entity) is not stored only in one database, some information (fields) for the same user are stored in one database and some in the other database.
MSsql
Table user
int, id
String, name
String, accountNumber
Table userModel
int, id
String, street
String, city
String, country
As you can see the key is the only piece that link two tables in both databases, as I said before they are not in the same database and not using same database vendor.
We have a requirement for sorting the UI table for each column. Obviously we need to create user dto with the information coming from both databases.
Our proposal at this moment is if user want to apply sorting using street field, we run a query in the Progress database and obtain a page (using pagination) using this resultset and go directly to the MSSQLServer User table with those keys and run another query to extract the missing information and save it to our DTO and transfer it to the UI. With implies run a query in one database then other query based on the returned keys in the second database.
The order of the database could change depending in which column(field) the user wants to apply sorting.
Technically we will create a jparepository that acts as a facade and depending on the field make the process in the correct database.
My question is:
There is some kind of pattern that is commonly used in this scenarios, we are using spring, so probably spring have some out of the box features to support this requirement, will be great if this is possible using jparepositories (I have several doubts about it as we will use two different entitymanagers, one for each database).
Note: Move data from one database to another is not an option.
For this, you need to have separate DataSource/EntityManagerFactory/JpaRepository.
There is no out-of-the-box support for this architecture in the Spring framework, but you can easily hide the dual DataSource pair behind a Service layer. You can even configure JTA DataSources for ACID operations.
As you will always need to fetch data from both databases, why not populate local java User objects then sort these objects (using a comparator with the appropriate fields you want to sort on).
The advantage of sorting locally vs doing the sort in the database query is that you won't have to send requests to the database every time you change the sorting field.
So, to summarize:
1- Issue two sql queries for the two databases to get your users
2- Build your User objects using the retrieved values
3- Use Java comparators to sort the users on any field without having to issue new queries to the database.
My advice would be to find a way to link 2 databases together so that you can utilize database driver features without your code being affected.
Essentially if Progress database can be linked to SQL Server, you will be able to query both databases using a single SQL query with a join on id column and you will get a merged, sorted and paginated result set for your application to display.
I am not an expert in Progress database but it seems there is an ODBC driver for it so you might try to link it to SQL Server.
There are some JDBC Rowsets like CachedRowSet, WebRowSet, FilteredRowSet and JoinRowSet. Does any bode know where they are used?
Ok, may be CachedRowSet is good where I do not want open and connection, may be WebRowSet good when I need insert some XML data ("may be" but I do not sure). But what about others?
Obviously, It is better for performance to write join in SQL query instead of create 2 JoinRowSet, take all data from they and join fields in java. The same about FilteredRowSet - it is more efficient to add where clause to the SQL query instead of grub a lot of the data and filtered it by java.
But somebody "invented" CachedRowSet, WebRowSet, FilteredRowSet and JoinRowSet why? Does anybody has some good experience about they usage?
The CachedRowSet interface defines the basic capabilities available to all disconnected RowSet objects. The other three are extensions of the CachedRowSet interface, which provide more specialized capabilities. The following information shows how they are related:
A CachedRowSet object has all the capabilities of a JdbcRowSet object plus it can also do the following:
Obtain a connection to a data source and execute a query.
Read the data from the resulting ResultSet object and populate itself with
that data.
Manipulate data and make changes to data while it is
disconnected.
Reconnect to the data source to write changes back to
it.
Check for conflicts with the data source and resolve those
conflicts
A WebRowSet object has all the capabilities of a CachedRowSet object plus it can also do the following:
Write itself as an XML document
Read an XML document that describes a WebRowSet object
A JoinRowSet object has all the capabilities of a WebRowSet object (and therefore also those of a CachedRowSet object) plus it can also do the following:
Form the equivalent of a SQL JOIN without having to connect to a data source
A FilteredRowSet object likewise has all the capabilities of a WebRowSet object (and therefore also a CachedRowSet object) plus it can also do the following:
Apply filtering criteria so that only selected data is visible. This is equivalent to executing a query on a RowSet object without having to use a query language or connect to a data source.
RowSet interface, the rows are retrieved from a JDBC data source, but a rowset may be customized so that its data can also be from a spreadsheet, a flat file, or any other data source with a tabular format.
Disconnected (not connected to the data source except when reading data from it or writing data to it)
CachedRowSet
JoinRowSet
FilteredRowSet
WebRowSet
Cached Rowset - being disconnected and able to operate without a driver, is designed to work especially well with a thin client for passing data in a distributed application or for making a result set scrollable and updatable
WebRowSet - the ability to read and write a rowset in XML format.
FilteredRowSet- used to the filtered subset of data from a rowset.
JoinRowSet -used to combine data from two different RowSet objects. This can be especially valuable when related data is stored in different data sources
Documentation
Not sure but this is what I think about FilteredRowSet. One can get the data from database by making connection in one shot. For example data for city, state and country. Later one can further subset the data with out going back to database in Java. Such as all records related to city or state or country or combination of them.
I'm working on a web based application that belongs to an automobil manufacturer, developed in Spring-Hibernate with MS SQL Server 2005 database.
There are three kind of use cases:
1) Through this application, end users can request for creating a Car, Bus, Truck etc through web based interfaces. When a user logs in, a HTML form gets displayed for capturing technical specification of vehicle, for ex, if someone wanted to request for Car, he can speify the Engine Make/Model, Tire, Chassis details etc and submit the form. I'm using Hibernate here for persistence, i.e. I've a Car Entity that gets saved in DB for each such request.
2) This part of the application deals with generation of reports. These reports mainly dela with number of requests received in a day and the summary. Some of the reports calculate Turnaround time for individual Create vehicle requests.
I'm using plain JDBC calls with Preparedstatement (if report can be generated with SQLs), Callablestatement (if report is complex enough and needs a DB procedure/Function to fetch all details) and HibernateCallback to execute the SQLs/Procedures and display information on screen.
3) Search: This part of application allows ensd users to search for various requests data, i.e. how many vehicle have been requested in a Year etc. I'm using DB procedure with CallableStatement..Once again executing these procedures within HibernateCallback, populating and returning search result on GUI in a POJO.
I'm using native SQL in (2) and (3) above, because for the reporting/search purpose the report data structure to display on screen is not matching with any of my Entity. For ex: Car entity has got more than 100 attributes in itself, but for reporting purpose I don't need more than 10 of them.. so i just though loading all 100 attributes does not make any sense, so why not use plain SQL and retrieve just the data needed for displaying on screen.
Similarly for Search, I had to write procedures/Functions because search algorithm is not straight forward and Hibernate has no way to write a stored procedure kind of thing.
This is working fine for proto type, however I would like to know
a. If my approach for using native SQLs and DB procedures are fine for case 2 and 3 based on my judgement.
b. Also whether executing SQLs in HibernateCallback is correct approach?
Need expert's help.
I would like to know (...) if my approach for using native SQLs and DB procedures are fine for case 2 and 3 based on my judgment
Nothing forces your to use a stored procedure for case 2, you could use HQL and projections as already pointed out:
select f.id, f.firstName from Foo f where ...
Which would return an Object[] or a List<Object[]> depending on the where condition.
And if you want type safe results, you could use a SELECT NEW expression (assuming you're providing the appropriate constructor):
select new Foo(f.id, f.firstName) from Foo f
And you can even return non entities
select new com.acme.LigthFoo(f.id, f.firstName) from Foo f
For case 3, the situation seems different. Just in case, note that the Criteria API is more appropriate than HQL to build dynamic queries. But it looks like this won't help here.
I would like to know (...) whether executing SQLs in HibernateCallback is correct approach?
First of all, there are several restrictions when using stored procedures and I prefer to avoid them when possible. Secondly, if you want to return entities, it isn't the only way and simplest solution as we saw. So for case 2, I would consider using HQL.
For case 3, since you aren't returning entities at all, I would consider not using Hibernate API but the JDBC support from Spring which offers IMHO a cleaner API than Session#connection() and the HibernateCallback.
More interesting readings:
References
Hibernate Core reference guide
14.6. The select clause (about the select new)
16.1.5. Returning non-managed entities (about ResultTransformer)
16.2.2. Using stored procedures for querying
Resources
Hibernate 3.2: Transformers for HQL and SQL
Related questions
hibernate SQLquery extract variable
hibernate query language or using criteria
You should strive to use as much HQL as possible, unless you have a good argument (like performance, but do a benchmark first). If the use of native queries becomes to excessive, you should consider whether Hibernate has been a good choice.
Note a few things:
you can have native queries and stored procedures that result in Hibernate entities. You just have to map the query / storproc call to a class and call it by session.createSQLQuery(queryName)
If you really need to construct native queries at runtime, the newest version of hibernate have a doWork(..) method, by which you can do JDBC work.
You say
For ex: Car entity has got more than 100 attributes in itself, but for reporting purpose I don't need more than 10 of them.. so i just though loading all 100 attributes does not make any sense
but HQL in hibernate allows you to do a projection (select only a subset of the columns back). You don't have to pull the entire entity if you don't want to.
Then you get all the benefits of HQL (typing of results, HQL join syntax) but you can pretty much write SQLish code.
See here for the HQL docs and here for the select syntax. If you're used to SQL it's pretty easy.
So to answer you directly
a - No, I think you should be using HQL
b - Becomes irrelevant if you go with my suggestion for a.