From what I have read, the preferred way of obtaining a connection is through a DataSource. There are different interfaces, such as DataSource and ConnectionPoolDataSource. Let's say we use the PostgreSQL driver and want to use connection pooling on a Glassfish server.
In your application code you invoke getConnection() on a property of type DataSource. How is this possible? Hasn't Glassfish created a data source of type ConnectionPoolDataSource (or, more correctly, an implementing class) and bound it to a JNDI name, so that when you look up the data source by its JNDI name you get a ConnectionPoolDataSource object rather than a DataSource? ConnectionPoolDataSource does not have a getConnection() method. I don't understand this server magic.
Could someone explain how all this fits together?
The DataSource, Driver or ConnectionPoolDataSource that you can select in the Glassfish config is not exposed to your application directly. Instead, the application server has its own DataSource that maintains a connection pool. This data source uses the configured DataSource, Driver or ConnectionPoolDataSource as the factory for the connections it keeps in its pool.
So when you configure Glassfish with a ConnectionPoolDataSource, it uses the ConnectionPoolDataSource to create the physical connections (PooledConnection objects) for a connection pool. This connection pool is kept by the application server's DataSource implementation. Your application then accesses that connection pool through this DataSource, which hands out logical Connection objects from the pool.
The exact inner workings of a logical and physical connection are implementation dependent, but these logical connections are usually some kind of proxy or wrapper around the physical connection. When you obtain the logical connection, the physical connection is checked out from the connection pool. When you close the logical connection, the connection pool is notified that the physical connection is available again and puts it back among its idle connections.
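The check-out and return cycle described above can be sketched with the standard javax.sql interfaces. This is a minimal illustration, not how Glassfish actually implements its pool: the class name SimplePool and its idleCount() helper are made up for this example, and a real pool would also handle size limits, validation and timeouts.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.sql.ConnectionEvent;
import javax.sql.ConnectionEventListener;
import javax.sql.ConnectionPoolDataSource;
import javax.sql.PooledConnection;

// Hypothetical, heavily simplified pool. Not an application server's real implementation.
final class SimplePool implements ConnectionEventListener {
    private final ConnectionPoolDataSource factory;   // the driver class configured in the server
    private final Deque<PooledConnection> idle = new ArrayDeque<>();

    SimplePool(ConnectionPoolDataSource factory) {
        this.factory = factory;
    }

    // An application-facing DataSource.getConnection() would delegate to this method.
    synchronized Connection getConnection() throws SQLException {
        PooledConnection physical = idle.poll();       // reuse an idle physical connection
        if (physical == null) {
            physical = factory.getPooledConnection();  // or open a new one
            physical.addConnectionEventListener(this);
        }
        return physical.getConnection();               // logical handle for the application
    }

    // Fired by the driver when the application calls close() on the logical handle.
    @Override
    public synchronized void connectionClosed(ConnectionEvent event) {
        idle.push((PooledConnection) event.getSource());
    }

    // Fired on a fatal error: discard the physical connection instead of reusing it.
    @Override
    public synchronized void connectionErrorOccurred(ConnectionEvent event) {
        PooledConnection broken = (PooledConnection) event.getSource();
        idle.remove(broken);
        try {
            broken.close();
        } catch (SQLException ignored) {
            // nothing useful to do for a connection that is already broken
        }
    }

    // Illustration-only helper: number of physical connections waiting for reuse.
    synchronized int idleCount() {
        return idle.size();
    }
}
```

The application only ever sees the Connection returned by getConnection(). Closing it does not close the physical connection; it triggers connectionClosed, which is the "signal" the pool receives.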
Related
As far as I understand, there are two types of DataSource connections, javax.sql.DataSource and javax.sql.XADataSource. This tutorial explains that javax.sql.DataSource gives the connection the ability to be pooled and javax.sql.XADataSource gives the connection distributed transactional behavior.
I understand that most XADataSource implementations provide connection pooling as well as distributed transactions, so I don't see the point of using a DataSource when you could use an XADataSource and have both.
Are there any tradeoffs when choosing an XADataSource over a DataSource?
I understand that it is not mandatory for an XADataSource to use pooled connections. Is there a way to find out whether an XADataSource uses pooled connections, other than relying on the XADataSource provider's documentation?
EDIT:
I am referring to javax.sql.DataSource and javax.sql.XADataSource because those are the types the Tomcat 8 factory gives you:
Type should always be javax.sql.DataSource or javax.sql.XADataSource
Depending on the type a org.apache.tomcat.jdbc.pool.DataSource or a
org.apache.tomcat.jdbc.pool.XADataSource will be created.
I do understand that in the end I would be using a DataSource in my code as an API, abstracting the underlying implementation. My question is more about the decision-making process I have to go through when configuring Tomcat 8 (or any other server).
I want pooled connections, and there are many XADataSource implementations that provide both transactional and pooled connections, so why not always use XADataSource if I get more? (This of course doesn't apply to an XADataSource that doesn't implement pooled connections.)
When to configure XADataSource
As explained in the second section, your code will always use the DataSource interface (which might use an XADataSource). If the question is when you should use an XADataSource (e.g. configure it in your application server), then the answer is simple:
You use an XADataSource if you need distributed transactions, that is, if a transaction must succeed or fail as a whole across multiple resources (e.g. different databases).
If you don't need distributed transactions, then you can still configure an XADataSource, but this might have some overhead in terms of memory and processing, for example extra objects (e.g. XAResource) that go unused, and maybe in terms of the 'bookkeeping' done by the data source. This overhead is probably negligible though.
Some data sources (e.g. the Tomcat pool mentioned in your question) can use either a DataSource or an XADataSource as a factory to create connections (according to the JDBC spec a ConnectionPoolDataSource should also be available as a factory, but it looks like Tomcat ignores that option). This doesn't change the way you decide what to use:
Don't need distributed transactions:
Program --uses--> Tomcat connection pool DataSource --uses--> JDBC driver DataSource
Need distributed transactions:
Program --uses--> Tomcat connection pool DataSource --uses--> JDBC driver XADataSource
In both cases the connection pool is provided by the Tomcat connection pool DataSource, not by the JDBC driver (XA)DataSource. A correct* implementation of XADataSource will not implement connection pooling: that would be the (optional) responsibility of the DataSource implementation that uses the XADataSource as its factory. So that is not a reason to choose (or not choose) an XADataSource.
Your question might stem from the confusing terminology: an XADataSource creates XAConnection objects, and XAConnection extends PooledConnection. The name PooledConnection doesn't mean the connection comes from a connection pool; it means that after creation it can be held in a connection pool (which would be inside the DataSource that called XADataSource.getXAConnection).
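The type relationships behind this terminology can be checked directly against the javax.sql interfaces that ship with the JDK. The class name InterfaceRelations below is made up for this illustration.

```java
import javax.sql.ConnectionPoolDataSource;
import javax.sql.DataSource;
import javax.sql.PooledConnection;
import javax.sql.XAConnection;
import javax.sql.XADataSource;

// Illustration-only helper class.
final class InterfaceRelations {
    // Every XAConnection is also a PooledConnection, so a pool can hold it.
    static boolean xaConnectionIsPoolable() {
        return PooledConnection.class.isAssignableFrom(XAConnection.class);
    }

    // An XADataSource is not itself a DataSource, so application code cannot use it directly.
    static boolean xaDataSourceIsDataSource() {
        return DataSource.class.isAssignableFrom(XADataSource.class);
    }

    // The same holds for ConnectionPoolDataSource.
    static boolean poolDataSourceIsDataSource() {
        return DataSource.class.isAssignableFrom(ConnectionPoolDataSource.class);
    }

    public static void main(String[] args) {
        System.out.println(xaConnectionIsPoolable());     // prints "true"
        System.out.println(xaDataSourceIsDataSource());   // prints "false"
        System.out.println(poolDataSourceIsDataSource()); // prints "false"
    }
}
```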
Responsibilities of DataSource and XADataSource
In JDBC the responsibility of a DataSource is to create connections that can be used by your application. This means that it can be a very basic implementation that does nothing more than go directly to DriverManager, but also an implementation that provides connection pooling, and support for distributed transactions.
The idea is that you can swap one implementation for the other, while your code would be untouched.
So, the code consuming connections should always use a javax.sql.DataSource implementation. javax.sql.XADataSource (and javax.sql.ConnectionPoolDataSource, for that matter) are intended to be used by javax.sql.DataSource implementations that provide advanced features like connection pooling and/or distributed transactions. They should not be used directly in your own program. As the tutorial you link says:
Similarly, when the DataSource implementation is implemented to work with an XADataSource class, all of the connections it produces will automatically be connections that can be used in a distributed transaction.
In other words, DataSource is the API you use to obtain a connection, and an XADataSource is used by a data source library that provides distributed transaction support. That library obtains the XAConnection, registers it with a distributed transaction manager, and then gives you the logical connection obtained from XAConnection.getConnection().
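A rough sketch of that sequence is shown here, using only interfaces from the JDK. The names XaBackedConnections and ResourceEnlister are hypothetical: ResourceEnlister stands in for the transaction manager side, a role that in JTA is played by Transaction.enlistResource(XAResource). A real application server also pools the XAConnection objects and delists resources when the transaction ends, which is omitted here.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.XAConnection;
import javax.sql.XADataSource;
import javax.transaction.xa.XAResource;

// Hypothetical stand-in for the transaction manager (assumption, not a JDK or JTA type).
interface ResourceEnlister {
    void enlist(XAResource resource) throws SQLException;
}

// Hypothetical sketch of the layer that sits on top of a driver's XADataSource.
final class XaBackedConnections {
    private final XADataSource driverXaDataSource;
    private final ResourceEnlister currentTransaction;

    XaBackedConnections(XADataSource driverXaDataSource, ResourceEnlister currentTransaction) {
        this.driverXaDataSource = driverXaDataSource;
        this.currentTransaction = currentTransaction;
    }

    // What an application-visible DataSource.getConnection() would do internally.
    Connection getConnection() throws SQLException {
        XAConnection physical = driverXaDataSource.getXAConnection();
        // Register the connection's XAResource so the transaction manager can drive two-phase commit.
        currentTransaction.enlist(physical.getXAResource());
        // Hand the application an ordinary logical Connection.
        return physical.getConnection();
    }
}
```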
This is also described in the JDBC 4.2 specification, section 12.1:
Distributed transactions require an infrastructure that provides these
roles:
Transaction manager — controls transaction boundaries and manages the two-phase commit protocol. This typically will be an
implementation of JTA.
JDBC drivers that implement the XADataSource, XAConnection, and XAResource interfaces. These are described in the next section.
An application-visible implementation of DataSource to “sit on top of” each XADataSource object and interact with the transaction
manager. The DataSource implementation is typically provided by an
application server.
Resource manager(s) to manage the underlying data. In the context of the JDBC API, a resource manager is a DBMS server. The term
“resource manager” is borrowed from JTA to emphasize the point that
distributed transactions using the JDBC API follow the architecture
specified in that document.
TL;DR: You --use--> DataSource --(potentially) uses--> XADataSource
*: Historically there has been some confusion in various JDBC implementations regarding responsibilities, and in some cases connection pools have implemented all three interfaces at the same time.
What is the difference between Spring DriverManagerDataSource and apache BasicDataSource?
Which of them is preferable and in which situations?
Thank you.
As per the Spring documentation
This class is not an actual connection pool; it does not actually pool Connections. It just serves as simple replacement for a full-blown connection pool, implementing the same standard interface, but creating new Connections on every call.
If you need a "real" connection pool outside of a J2EE container, consider Apache's Jakarta Commons DBCP or C3P0. Commons DBCP's BasicDataSource and C3P0's ComboPooledDataSource are full connection pool beans, supporting the same basic properties as this class plus specific settings (such as minimal/maximal pool size etc).
Also read Controlling database connections
When using Spring's JDBC layer, you obtain a data source from JNDI or you configure your own with a connection pool implementation provided by a third party. Popular implementations are Apache Jakarta Commons DBCP and C3P0. Implementations in the Spring distribution are meant only for testing purposes and do not provide pooling.
From Spring DriverManagerDataSource API:
This class is not an actual connection pool; it does not actually
pool Connections. It just serves as simple replacement for a full-blown
connection pool, implementing the same standard interface, but creating new
Connections on every call.
In other words, it may be OK for tests, but in a real application use a pooling implementation such as Apache DBCP.
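To make "creating new Connections on every call" concrete, here is a minimal sketch of a DataSource that does nothing but delegate to DriverManager. It is not Spring's source code; the class name NonPoolingDataSource is made up, and the housekeeping methods are reduced to stubs.

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.logging.Logger;
import javax.sql.DataSource;

// Hypothetical minimal class: same interface as a pool, but nothing is ever reused.
final class NonPoolingDataSource implements DataSource {
    private final String url;
    private final String user;
    private final String password;

    NonPoolingDataSource(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    @Override
    public Connection getConnection() throws SQLException {
        // Every call opens a brand-new physical connection.
        return DriverManager.getConnection(url, user, password);
    }

    @Override
    public Connection getConnection(String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    @Override public PrintWriter getLogWriter() { return null; }
    @Override public void setLogWriter(PrintWriter out) { }
    @Override public void setLoginTimeout(int seconds) { }
    @Override public int getLoginTimeout() { return 0; }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("java.util.logging is not used here");
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) {
            return iface.cast(this);
        }
        throw new SQLException("Not a wrapper for " + iface.getName());
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(this);
    }
}
```

Because calling code depends only on the DataSource interface, a class like this can later be replaced by a pooling implementation without touching that code. The cost of opening a physical connection per call is what a real pool avoids.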
I'm not certain how to get a DataSource object. I was able to use the DriverManager method to obtain a connection to a SQL database running on localhost, but every time I try to use the DataSource method to do so I wind up getting exceptions (mostly for naming).
What I was wondering is:
Is it possible to get a DataSource object for local hosted databases?
Does the DataSource class need to be published, or is it like DriverManager where you just get a connection with no new class creation?
Could you show an example?
A DataSource lets you obtain a JDBC connection, usually from a pool of connections. A DataSource object represents a particular DBMS or some other data source, such as a file. If a company uses more than one data source, it will deploy a separate DataSource object for each of them. The DataSource interface is implemented by a driver vendor. You externalize the DB connection properties and fetch the object using JNDI. When using a DataSource you only need to know its JNDI name; the application server takes care of the details.
It can be implemented in three different ways:
A basic DataSource implementation produces standard Connection objects that are not pooled or used in a distributed transaction.
A DataSource implementation that supports connection pooling produces Connection objects that participate in connection pooling, that is, connections that can be recycled.
A DataSource implementation that supports distributed transactions produces Connection objects that can be used in a distributed transaction, that is, a transaction that accesses two or more DBMS servers.
For example, in Spring you can configure the data source in an XML file and then either (1) inject it into your bean or (2) get it from the ApplicationContext:
DataSource ds = (DataSource) ApplicationContextProvider.
getApplicationContext().getBean("myDataSource");
Connection c = ds.getConnection();
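Without Spring, the JNDI route mentioned above might look like the following sketch. The helper class DataSourceLocator is made up, and the JNDI name in the usage note is an assumption; the actual name is whatever your application server binds the resource to.

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// Hypothetical helper class for looking up a container-managed DataSource.
final class DataSourceLocator {
    static DataSource lookup(String jndiName) throws NamingException {
        InitialContext ctx = new InitialContext();
        try {
            // The container returns its own DataSource implementation (often a pool).
            return (DataSource) ctx.lookup(jndiName);
        } finally {
            ctx.close();
        }
    }
}
```

A call such as DataSourceLocator.lookup("java:comp/env/jdbc/myDataSource") only works where a JNDI provider is available, typically inside an application server. In a plain standalone program with no provider configured, the lookup fails with a NamingException, which may be the kind of naming exception described in the question.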
Suggested Reading:
Connecting with DataSource Objects
Why do we use a DataSource instead of a DriverManager?
Data access with JDBC
How to retrieve DB connection using DataSource without JNDI?
Best way to manage DB connections without JNDI
I am creating a JDBC connection pool resource for GlassFish, using the server's Admin Console.
One of the fields on the page to create the pool is labeled 'Resource Type'. This field has four possible values: javax.sql.DataSource, javax.sql.XADataSource, javax.sql.ConnectionPoolDataSource and javax.sql.Driver, but the help text for the Create JDBC connection pool 'wizard' does not have much info about the advantages and disadvantages of these choices.
When prompted to pick a resource type which should I choose?
I am going to connect to a local MySQL server. It would be nice to get an explanation of the differences between the choices in the drop-down as well.
Below are the scenarios where you would need each of the resource types listed. Hope this helps.
DataSource
A DataSource object is a factory for Connection objects. With a plain DataSource, the application server uses its own pooling instead of the driver's native pooling.
ConnectionPoolDataSource
A ConnectionPoolDataSource object is a factory for PooledConnection objects. It gives access to PooledConnection, the JDBC driver's native hook for pooling, and the application server can implement connection pooling on top of this interface. Refer to the Java API documentation for what a PooledConnection is. A third-party implementation can also be used for pooling; as far as I know, Tomcat, for instance, uses DBCP connection pooling.
XADataSource
You need an XADataSource if you want to execute a distributed transaction. You should use XADataSource instead of DataSource if the application:
Uses the Java Transaction API (JTA)
Includes multiple database updates within a single transaction
Accesses multiple resources, such as a database and the Java Messaging Service (JMS), during a transaction
When using Tomcat with MySQL, what is the relationship between poolPreparedStatements setting in Tomcat DataSource configuration (I believe coming from DBCP) and Connector/J cachePrepStmts setting? What's the optimal configuration?
poolPreparedStatements is a setting of the connection pool behind the Tomcat DataSource (it comes from Commons DBCP), and cachePrepStmts is a Connector/J setting that makes the driver cache prepared statements. Two completely different things. cachePrepStmts is a per-connection setting, and Connector/J doesn't concern itself with whether the connection belongs to a pool or was opened directly, yet cachePrepStmts works at its best with long-lived connections (e.g. those kept in a connection pool). Using cachePrepStmts together with a connection pool is the optimal configuration. Using poolPreparedStatements in Tomcat opens a can of memory-management worms (check out the Tomcat docs for this setting and you'll see). Really, it's best to let Connector/J cache the prepared statements and let Tomcat pool the connections, and not try to have one do the other's job.
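As a sketch of the driver-side option, cachePrepStmts can be switched on through the Connector/J connection URL. The helper class MySqlUrl is made up, and the cache sizes are illustrative values rather than recommendations from the answer; prepStmtCacheSize and prepStmtCacheSqlLimit are the companion Connector/J properties that bound the cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that appends statement-cache properties to a Connector/J URL.
final class MySqlUrl {
    static String withStatementCache(String baseUrl) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("cachePrepStmts", "true");        // enable the per-connection cache
        props.put("prepStmtCacheSize", "250");      // illustrative: statements cached per connection
        props.put("prepStmtCacheSqlLimit", "2048"); // illustrative: longest SQL text that is cached
        String query = props.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return baseUrl + "?" + query;
    }
}
```

The resulting string would go into the url attribute of the Tomcat DataSource configuration, with poolPreparedStatements left at its default.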