Connections vs DataSources

Connections vs DataSources - java

I am reading about Connections vs DataSources in Java and I have some questions. Is a DataSource really just a manager and an abstraction of a Connection (or multiple connections)?

From docs:
A factory for connections to the physical data source that this DataSource object represents. An alternative to the DriverManager facility, a DataSource object is the preferred means of getting a connection.
Actually, a DataSource is a provider of Connections and it has a variety of implementations which operate in different manners. Such as:
Basic implementation -- produces a standard Connection object
Connection pooling implementation -- produces a Connection object that will automatically participate in connection pooling. This
implementation works with a middle-tier connection pooling manager.
Distributed transaction implementation -- produces a Connection object that may be used for distributed transactions and almost always
participates in connection pooling. This implementation works with a
middle-tier transaction manager and almost always with a connection
pooling manager.

Connection is the connection :) DataSource is a manager of connections (pool of connections).

Data source is the source of your data, the connection is the driver.

DataSource is ill concept in its existing form as if it was meant to abstract interactions it should not return some SQL connection to interact with in the first place. Also it is overkill and XA is fantasy concept that some paid dearly for lack of reliable implementation in the world (I mean all enterprise commercial implementers simply fail and expose business... someone got hurt because of it in finance, but I will not mention names). In general, regardless if Sun or Oracle recommends it, it results in over-engineering and some technical dirt in code (dealing with contexts, extra steps to get to data, some external configurations... and in the end it is still specific to vendor implementations anyway). Some developed coroporate solutions that deal with pooling, connection recovery e.t.c much better based on plain connections and DriverManager than DataSource implemtations provided by enterprise DBMS vendors.
For the record I worked with both and I base it on facts encountered in different places. And if you doubt it then ask why you can see plain JDBC URL in Hibernate configuration all over the place in businesses. Many simply dump the heavyweigtt idea of J2EE and go lightweight... also using plain connections based on DriverManager.
You want to get your career on line then go ahead with XA DataSources and recovery of failed transactions where poor XA implemetation failed badly.

Related

Why use DataSource instead of XADataSource?

As far as I understand there are two types of DataSource connections, javax.sql.DataSource and javax.sql.XADataSource, this tutorial explains that javax.sql.DataSource give the connection the ability to be pooled and javax.sql.XADataSource give the connection distributed transactional behavior.
I understand most XADataSource will implement connection pooling as well as distributed transactions so I don't see the point to use a DataSource when you could use a XADataSource and have both.
Are there any tradeoff when choosing a XADataSource over a DataSource?
I understand is not mandatory for a XADataSource to use pooled connections, is it there a way to find out if a XADataSource uses pooled connections or only relying on the XADataSource provider's documentation?
EDIT:
I am refering to javax.sql.DataSource and javax.sql.XADataSource because those are the types Tomcat 8 factory gives you:
Type should always be javax.sql.DataSource or javax.sql.XADataSource
Depending on the type a org.apache.tomcat.jdbc.pool.DataSource or a
org.apache.tomcat.jdbc.pool.XADataSource will be created.
I do understand that in the end I would be using a DataSource on my code as an API, abstracting the underlying implementation... my question is more related on the decision making process I have to go through when I am configuring Tomcat 8 (or any other server as well).
I want to have pooled connections and there are many XADataSource implementations that will give transactional and pooled connections, so why not always use XADataSource if I will get more? (this of course doesn't applies for a XADataSource that doesn't implements pooled connections)

When to configure XADataSource
As explained in the second section, your code will always use the DataSource interface (which might use a XADataSource). If the question is when should you use a XADataSource (eg configure it in your application server), then the answer is simple:
You use an XADataSource if you need to have distributed transactions: that is ensure a transaction succeeds or fails across multiple resources (eg different databases).
If you don't need distributed transactions, then you can still configure an XADataSource, but this might have some overhead in terms of memory and processing, for example extra objects (eg XAResource) that go unused, and maybe in terms of the 'bookkeeping' done by the data source. This overhead is probably negligible though.
Some data source (eg the Tomcat pool as mentioned in your question), can either use a DataSource or an XADataSource as a factory to create connections (according to the JDBC spec a ConnectionPoolDataSource should also be available as a factory, but it looks like Tomcat ignores that option). This doesn't change the way you decide what to use:
Don't need distributed transactions:
Program --uses--> Tomcat connection pool DataSource --uses--> JDBC driver DataSource
Need distributed transactions:
Program --uses--> Tomcat connection pool DataSource --uses--> JDBC driver XADataSource
In both cases the connection pool is provided by the Tomcat connection pool DataSource, not by the JDBC driver (XA)DataSource. A correct* implementation of XADataSource will not implement connection pooling: that would be the (optional) responsibility of the DataSource implementation that uses the XADataSource as its factory. So that is not a reason to choose (or not choose) an XADataSource.
Your question might stem from the confusing terminology that a XADataSource creates XAConnection which extends PooledConnection. The name PooledConnection doesn't mean it comes from a connection pool, it means that after creation these can be held in a connection pool (which would be inside the DataSource that called XADataSource.getXAConnection).
Responsibilities of DataSource and XADataSource
In JDBC the responsibility of a DataSource is to create connections that can be used by your application. This means that it can be a very basic implementation that does nothing more than go directly to DriverManager, but also an implementation that provides connection pooling, and support for distributed transactions.
The idea is that you can swap one implementation for the other, while your code would be untouched.
So, the code consuming connections should always use a javax.sql.DataSource implementation. The javax.sql.XADataSource (and javax.sql.ConnectionPoolDataSource for that matter) are intended to be used by javax.sql.DataSource implementations that provided advanced features like connection pooling and/or distributed transactions. They should not be used directly in your own program. As the tutorial you link says:
Similarly, when the DataSource implementation is implemented to work with an XADataSource class, all of the connections it produces will automatically be connections that can be used in a distributed transaction.
In other words DataSource is the API you use to obtain a connection, and a XADataSource is used by a data source library that provides distributed transaction support. It obtains the XAConnection, registers it with a distributed transaction manager, and then gives you the logical connection obtained from XAConnection.getConnection().
This is also a described in the JDBC 4.2 specification, section 12.1:
Distributed transactions require an infrastructure that provides these
roles:
Transaction manager — controls transaction boundaries and manages the two-phase commit protocol. This typically will be an
implementation of JTA.
JDBC drivers that implement the XADataSource, XAConnection, and XAResource interfaces. These are described in the next section.
An application-visible implementation of DataSource to “sit on top of” each XADataSource object and interact with the transaction
manager. The DataSource implementation is typically provided by an
application server.
Resource manager(s) to manage the underlying data. In the context of the JDBC API, a resource manager is a DBMS server. The term
“resource manager” is borrowed from JTA to emphasize the point that
distributed transactions using the JDBC API follow the architecture
specified in that document.
TL;DR: You --use--> DataSource --(potentially) uses--> XADataSource
*: Historically there has been some confusion in various JDBC implementations regarding responsibilities, and in some cases connection pools have implemented all three interfaces at the same time.

Connection pool for dynamic database connections

The problem setup is based on a webservice (Spring/Java, Tomcat7 and MySql) where every user gets their own database, hence each request needs their own connection. As all databases are created dynamically during runtime, configuring them statically before startup is not an option.
To optimise database connection usage, an implementation of a database connection pool would be great, right?
With Java/Spring: How would I create a connection pool for dynamic databases? I am a bit struck by the lack of clean options here!
Problem: Tomcat's Connection Pool (and as far as i understand C3P0 as well) treats each new DataSource instance as a whole new connection pool -> stack-reference
Is it a good idea to create a static datasource with a generic MySql connection (without specifing the database on connection) and use a connection pool with this datasource together with adapted SQL statements?
stack-reference
What about developing a custom persistent database based datasource pool? Any experience with performance here? Any advice? Any libraries that do that?
Or would it be feasable to workaround Tomcat's DataSource problem by creating Tomcat JNDI Datasources dynamically by manipulating it's context.xml dynamically from Java?
I can't believe that there aren't more plain/simple solutions for this. Grails/Hibernate struggles with this, Java/JDBC struggles with this, ... is it such a rare use-case to separate userdata on a user basis by creating user specific databases dynamically? If so, what would be a better setup?
EDIT
Another option is the suggestion from #M.Deinum to use a single configured datasource and dynamically hotswap it for the right connection ->M.Deinum Blog and stack-reference. How does that perform with a connection pool like the ones above?

I believe that HikariCP works without having to specify a single database.

Once the databases are created in runtime, you have to create the pools also in runtime. I am afraid the spring infrastructure is not giving you any help here, as it is tuned for the usual static use case.
I'd have a map of pools:
have a Map < connectionUrlString,List< c3poPool > > map
when requesting a connection, get the corresponding c3po pool from the map
and you can get the best of both worlds, since the real connection pool for each dynamically created database is handled by a c3po instance, but you can create new instances in runtime
This works as a low-level solution. If you want to go further, you can wrap this logic into a db connection provider, and register that as a "driver". This way any part of your application requests a new connection, you can just return one connection from the existing pools (and if a totally new connection is requested, create a new pool for that).

First than all, sorry for my english, i'm improving every day.
In my experience, I had a similar situation and it was resolve with spring framework. Let me explain you how you'd solve that question.
Make a spring config file with these characteristics:
a) A resource loader: This one is the responsible of load properties from configurations files or from database, those properties will be the appropriates to establish the database connection.
b) A pool database configuration parameterized with the properties that you'll load.
Create a locator class: In this class you'll need a HashMap
Use the multi context feature of spring: The idea is assign a code to every one connection that you establish and later load that connection like an application context with spring, then in the locator class, put in the map that context and use it as frequent as you need.
I think is you follow these steps, you can create dynamic pool or database connection as you want.

Does hibernate use PreparedStatement by default

"Hibernate always uses PreparedStatement for calls to the database" Quoted here.
If so then where does hibernate cache compiled queries, does DB driver cache them.
I read about c3p0. If hibernate caches PreparedStatement by default then what is the use of hibernate.c3p0.max_statements in c3p0.
If hibernate does not do it by default then is connection pooling mandatory for caching prepared statements.
Could somebody please clarify these.

Caching prepared statements only makes sense in the scope of a specific JDBC connection. So, you will only gain something out of caching prepared statements when a sort of connection pooling is available to the ORM layer. Otherwise, you get a new "physical" JDBC connection every time you create a Hibernate Session (which is not very efficient normally). Without any connection pooling caching prepared statements is only useful in the scope of a single JDBC connection / Hibernate Session. This happens because without any connection pooling the "physical" connection is actually closed and won't be reused - instead always a new connection will be created using the database driver, whenever it is required.
One other thing that you need to take into account, is that the number of open prepared statements on a single JDBC connection is limited (the limitation is vendor dependent and varies between driver implementations as far as I know). So, in a pooled connections scenario, the pooling implementation will likely need to know how many open prepared statements may be maintained on each of the pool's "physical" underlying JDBC connections. Likely, a "least used prepared statements gets closed first" policy is implemented, but this is pure speculation on my part.
I hope this makes some sense. Whenever I mention a "physical" JDBC connection I mean an actually new TCP/IP connection to the database. Connections obtained by a connection pool will typically decorate/wrap a "physical" one.
Edits to answer your questions more directly:
Hibernate most likely uses and caches PreparedStatements (this is very basic JDBC optimization). The question is does this caching happen on the statements created by a "physical" or a pool provided JDBC connection. Without a pool caching the PreparedStatements only optimizes the part of the application execution that uses a specific PreparedStatement twice in the scope of a specific Hibernate Session. With a pool the same PreparedStatement will (effectively) be used across many Hibernate Session instances that will happen to use the same underlying "physical" connection.
The property hibernate.c3p0.max_statements of your hibernate configuration will most likely configure the C3PO pool instance (which I am pretty sure is created automatically for you) and this configuration has something to do with the fact about the number of open prepared statements being limited in a "physical" JDBC connection.

Making datasource in Glassfish

I am creating a JDBC connection pool resource for GlassFish, using the server's Admin Console.
One of the fields on the page to create the pool is labeled 'Resource Type'. This field has four possible values: javax.sql.DataSource, javax.sql.XADataSource, javax.sql.ConnectionPoolDataSource and javax.sql.Driver, but the help text for the Create JDBC connection pool 'wizard' does not have much info about the advantages and disadvantages of these choices.
When prompted to pick a resource type which should I choose?
I am going to connect to a local MySQL server. It would be nice to get an explanation of the differences between the choices in the drop-down as well.

Below are the scenarios where you would need each of the resource types listed. Hope this helps.
DataSource
DataSource A DataSource object is a factory for Connection objects. When using simple DataSource, appserver uses its own pooling instead of native.
ConnectionPoolDataSource
A ConnectionPoolDataSource object is a factory for PooledConnection objects. ConnectionPoolDataSource is used to give access to PooledConnection which implements native pooling by JDBC driver. In this case application server can implement connections pooling using this native interface. Please refer to Java API to know what a PooledConnection is...A ConnectionPoolDataSource can use a third party implementation for pooling - as far as I know for Tomcat, for instance, DBCP connection pooling is used.
XADataSource
You need an XADataSource if you want to execute a Distributed Transaction. You should use XADataSource instead of DataSource if the application
Uses the Java Transaction API (JTA)
Includes multiple database updates within a single transaction
Accesses multiple resources, such as a database and the Java Messaging Service (JMS), during a transaction

Context specific usernames for connections on a Java EE Datasource (JBoss 5.1)

We have an application that needs to access a database that is owned by a different team.
That database has security inside the database (triggers, table permissions, etc) and so we need to establish a connection to the database using the same username/password that connected to our EJB.
We're running on JBoss 5.1. Standard Java EE solutions are preferred, but JBoss specific answers will do.
At the moment our solution is
Create a datasource in JBoss with no user-id password
Require the client to pass their username/password into the EJB (the EJB is a stateful session bean (SFSB), and remembers the username/password)
The session bean creates a new connection using DataSource.getConnection(String, String)
The connection is "created" from the datasource at the start of each request (The datasource implementation might reuse an existing connection)
The main problem we have is connection pooling.
The JBoss connection pool doesn't manage separate pools for each username - they're all thrown into 1 big pool, and the username is checked after the object is retrieved from the pool (inside InternalManagedConnectionPool).
If the usernames don't match, then the connection is removed from the pool & destroyed.
This means that as soon as we have 2 users, there's a 50% chance that any connection that is put into the pool will be destroyed when it is next accessed. And as we increase the number of users, those odds get a lot worse.
We can't simply create 1 connection in the SFSB and retain it because JBoss is too smart for us, and it detects that we've left a connection open and automatically returns it to the pool for us, so the next request to the SFSB will fail with a "not associated" connection.
It would also be nice if we could simply get JBoss to create a connection as "the currently logged in user", but the solution we have is bearable.
My googling has failed to find any recommended patterns for doing this sort of thing. Everyone seems to assume that you want your datasource to use a single user for all connections (which is nice when it's possible, but I can't do that in this case)
The only solutions I can some up with are
Don't use a container provided datasource. Put the JDBC URL into a configuration value somewhere and create connections myself (possibly with the help of spring)
Bind a different Datasource implementation (possibly a custom one) into JNDI
Has anyone got any better solutions? Or pointers to recommended practices in this area?
(The database is Sybase ASE 15, but I doubt that makes any difference to the solution)

A bit more reading of the documentation has led me to what appears to be the solution.
Adding
<application-managed-security/>
into my datasource file seems to have fixed the problem of pooling the connections.
I'm not sure how I missed that the first time around.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.