What is a Type 4 XA driver?

In our application, when we create the DataSource, we select:
Database Name: DB2
Driver: BEA Type 4 XA DB2
But as far as I know, there are only 4 types of drivers. So what is a Type 4 XA driver?

From this blog entry.
An XA transaction, in the most general
terms, is a "global transaction" that
may span multiple resources.
That is, a transaction running across (say) 2 databases. So, for example, insertions can be managed across those 2 databases and committed/rolled back atomically.
The "type 4" refers to a native Java JDBC driver converting directly into the database protocol. See here for more details

Type 4: All Native Java
XA: stands for eXtended Architecture and mostly refers to a two-phase-commit protocol - see Wikipedia. In short: a standard protocol for coordinating a global transaction between one transaction coordinator (sometimes also called a transaction monitor) and several resource managers. It's pretty slow, so you should avoid it if you don't really need it. But well, at our customer we mostly need it :(

The major advantage of XA is that it can access multiple databases in one global transaction.
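To make this concrete, here is a rough sketch of what an XA (global) transaction looks like from application code in a Java EE / WebLogic-style container. The JNDI names, tables, and SQL are hypothetical; the transaction manager and the XA drivers perform the actual two-phase commit behind the scenes.

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import javax.transaction.UserTransaction;

public class XaTransferExample {
    // Sketch only: assumes two XA-enabled data sources bound in JNDI (names are made up)
    public void insertIntoBothDatabases(long orderId) throws Exception {
        InitialContext ctx = new InitialContext();
        UserTransaction tx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
        DataSource db2 = (DataSource) ctx.lookup("jdbc/Db2XaDataSource");
        DataSource other = (DataSource) ctx.lookup("jdbc/OtherXaDataSource");

        tx.begin();
        try (Connection c1 = db2.getConnection();
             Connection c2 = other.getConnection();
             PreparedStatement ps1 = c1.prepareStatement("INSERT INTO orders (id) VALUES (?)");
             PreparedStatement ps2 = c2.prepareStatement("INSERT INTO order_audit (order_id) VALUES (?)")) {

            ps1.setLong(1, orderId);
            ps1.executeUpdate();
            ps2.setLong(1, orderId);
            ps2.executeUpdate();

            tx.commit();   // the transaction manager runs two-phase commit across both databases
        } catch (Exception e) {
            tx.rollback(); // both inserts are undone atomically
            throw e;
        }
    }
}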


With Oracle JDBC, what problems will I face if I do not COMMIT select-only transactions

In order to connect to my Oracle DB, I use the thin Oracle JDBC driver and the tomcat JDBC connection pool.
The default setting of the Tomcat pool is not to touch the connection when it is returned to the pool (any associated transaction is neither committed nor rolled back, and the physical connection is not closed).
The default transaction isolation level of the Oracle driver is READ_COMMITTED, and auto-commit is set to false.
I have several business methods that use the same connection pool to connect to the database. In some business methods, I explicitly create a new transaction, issue some updates and commit that transaction. So far so good.
Some other business methods are read-only. They are read-only because I only issue SELECT queries, not because I've explicitly set READ ONLY mode on the connection (it stays off).
My understanding is that under these conditions, Oracle creates a new implicit transaction the first time it encounters a SELECT statement, and keeps that transaction open until I explicitly COMMIT it.
However, given that I only issue SELECT queries, I don't feel that it is necessary to commit the transaction.
I understand that not committing those read-only transactions before returning their connections to the pool means that then next time the connection is borrowed from the pool, the next business method will execute within a transaction that was already started earlier.
My questions are:
Does it matter?
Given the transaction isolation is set to READ_COMMITTED and auto-commit is false, is there any potential extra data race condition that wouldn't have existed if I committed my read-only transaction?
Does not committing the transaction incur performance costs on the DB server side (the transaction could potentially stay uncommitted for hours)?
Do the answers to the above questions change if we are connecting to RAC instead of a single Oracle DB instance?
What about if I'm selecting from a DB Link?
From all the resources I've read, my conclusion so far is that what I'm doing is technically not correct, but that as long as I don't pick a higher transaction isolation level it also doesn't matter. READ_COMMITTED will read whatever was most recently committed by other transactions regardless of whether the current transaction has been committed, and the overhead on the Oracle server for keeping track of a transaction that has no pending modifications is trivial. So fixing this issue falls into the "should be fixed, but not an emergency" category.
Does that make sense or did I misunderstand something important?
As Alex Poole commented, it's true that the database engine does not necessarily need to create a transaction for certain read operations in order to work correctly. I'm not familiar with Oracle internals, but I'll trust Alex that running this query after a SELECT will show that no actual TX was created.
However, the database engine is the lowest level and knows everything. On top of that we have DB client programs, one of which is the JDBC driver. JDBC has its own standard, which includes things like setAutoCommit(), something databases don't know about and which is used for transaction handling in JDBC code. Note that setAutoCommit(false) is an abstraction: it does not mean "start a transaction", it just means that operations will be grouped into a transaction in some way.
This is where the JDBC vs. DB engine differences start. I'm basing this on PostgreSQL driver code (as it is more readily available), but I expect the Oracle driver to behave in a similar way.
Since the driver cannot easily know whether a TX is needed or not, it cannot do heavy optimization like the actual DB engine. The PostgreSQL driver has a SUPPRESS_BEGIN flag that can be used to indicate that a transaction isn't needed even if one would be started otherwise, but based on a quick look it's used only when fetching metadata, an operation known to be safe from transaction issues. Otherwise the driver will issue a BEGIN to start a transaction even for a SELECT, when autocommit=false.
So even if the DB engine doesn't need a TX, the driver is an abstraction on top of the DB, working with JDBC specs. Since you've observed open transactions, it's safe to assume that the Oracle driver also creates explicit transactions for operations where the Oracle DB engine would not create implicit transactions.
So, does it matter?
As discussed in the comments, it probably won't make a difference when you have some simple SELECTs against local tables, as the transactions aren't holding up significant resources. But throw in DB links and changes to queries over time, and there's a non-zero amount of risk.
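If you do decide to fix it, the change is small. A rough sketch (the pool, query, and table are placeholders) that ends the read-only transaction before the connection goes back to the pool:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class ReadOnlyQueryExample {
    private final DataSource dataSource; // e.g. the Tomcat JDBC pool from the question

    public ReadOnlyQueryExample(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public int countOrders() throws Exception {
        // Assumes the pool hands out connections with autoCommit=false, as in the question
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement("SELECT COUNT(*) FROM orders");
             ResultSet rs = ps.executeQuery()) {
            rs.next(); // COUNT(*) always returns exactly one row
            int count = rs.getInt(1);
            // End the transaction the driver started for us, so the connection
            // goes back to the pool with no open TX; rollback() would do equally well here.
            conn.commit();
            return count;
        }
    }
}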

What is the standard way in JDBC to manage a lost connection?

My application can sometimes lose its connection to the MySQL database.
I think a good solution would be to schedule a timer that tries to reconnect after some time.
How can this best be done? Maybe a separate thread that tries to connect to the DB? Or are there standard practices?
Thanks.
JDBC is a great way to start building a Java database application, but managing object mappings and connections/transactions can very rapidly lead to a lot of boilerplate and to rewriting logic which has already been written many times by many programmers.
Normally it is expected that you close (and lose) a connection when you are done with it, unless you have a high-throughput application, in which case you might keep several connections alive (this is known as connection pooling).
There are essentially 3 "high-level" approaches to maintaining efficient connections and transactions:
1) The simplest solution is to check, when you are reusing a connection, that it is still valid, or to reopen it every time (see the sketch after this list).
2) A more sophisticated solution is to use a connection pooling mechanism, such as the Apache Commons DBCP library (http://commons.apache.org/dbcp/).
3) Finally, in my opinion, the most maintainable solution is to use a JDBC framework like iBATIS/Hibernate, which provides you a simple, declarative interface for managing object-relational mapping / transactions / database state, while also transparently maintaining the connection logic for you.
ALSO: if object-relational mapping is not your thing, then you can use a framework such as Apache DbUtils to manage querying and connections, without the heavyweight data-mapping stuff getting in the way.
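For option 1, a rough sketch (the class and field names are made up) of validating a cached connection before reuse with JDBC 4's Connection.isValid(), and reopening it when it has gone stale:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class SimpleConnectionHolder {
    private final String url;      // e.g. "jdbc:mysql://localhost:3306/mydb" (placeholder)
    private final String user;
    private final String password;
    private Connection connection;

    public SimpleConnectionHolder(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    // Returns the cached connection if it is still alive, otherwise reopens it.
    public synchronized Connection getConnection() throws SQLException {
        if (connection == null || connection.isClosed() || !connection.isValid(2 /* seconds */)) {
            connection = DriverManager.getConnection(url, user, password);
        }
        return connection;
    }
}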
JDBC is a simple API that abstracts the operations of different database systems. It makes things uniform, such as mapping the different native types to Java types.
However, lost connections are another big issue. Using a connection pool library is better than writing a new one yourself: there are too many details to implement a connection pool from scratch without bugs.
Consider using a mature library:
Commons DBCP
BoneCP
etc.
Commons DBCP is based on Commons Pool; you should understand the configurable options of both of them.
BoneCP is another, newer connection pool; being lock-free is its advantage.
A validation SQL string is important for checking whether a connection is dead or alive; lost-connection checking is enabled by setting the validation string.
Here is the dbcp configuration page:
http://commons.apache.org/dbcp/configuration.html
It says:
NOTE - for a true value to have any effect, the validationQuery
parameter must be set to a non-null string.
For example:
// query used to test whether a connection is still alive (Oracle needs "from dual")
dataSource.setValidationQuery(isDBOracle() ? "select 1 from dual" : "select 1");
// validate connections while they sit idle in the pool and when they are returned
dataSource.setTestWhileIdle(true);
dataSource.setTestOnReturn(true);
// reclaim connections that a caller has held for too long
dataSource.setRemoveAbandoned(true);
dataSource.setRemoveAbandonedTimeout(60 * 3 /* 3 mins */);
// pool sizing: max idle connections and max time to wait for a free connection
dataSource.setMaxIdle(30);
dataSource.setMaxWait(1000 * 20 /* 20 secs */);
Reminder: if you use Commons DBCP in WebLogic, don't forget about the older Commons library that ships with the server; it can make your application use a different version. The prefer-web-inf-classes setting will help you.

Is JPA an appropriate ORM for this scenario?

We have a central database for accounts. It contains login information, and a field called database profile. The database profile indicates what database connection should be used for the account. For example, we would have Profile1, Profile2, Profile3... ProfileN
If a user is indicated to have Profile1, they would be using a different database than a user who is indicated to be a part of Profile2.
My understanding of JPA is that you would need a new EntityManagerFactory for each Profile (Persistence Unit), even though the databases all have the same schema, just different connection information. So if we ended up having 100 profiles we would have 100 Entity Manager Factories, which doesn't seem ideal.
I've looked into the EntityManagerFactory, and it doesn't seem to let you change the database connection options.
Is my only option to have N EntityManagerFactories, and if so, would there be any major consequences to this (such as bad performance)?
Thanks for any advice.
The kinds of things you're talking about are getting out of the scope of the JPA abstraction. Things like specific connection management are generally more provider specific. For example, with a Hibernate SessionFactory you can certainly create new sessions from an arbitrary JDBC connection. There would be pitfalls to consider, such as ID generation schemes (you'll probably have to use sequences generated in the DB), and you're basically hooped for L2 caching, but with careful programming it could be made to work.
Just use javax.persistence.Persistence#createEntityManagerFactory(String, Map), and provide the connection parameters in the Map. Cache the EMFs, use the connections judiciously, and don't mix and match objects from different EMFs.
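A rough sketch of that approach (the persistence-unit name, connection values, and registry class are hypothetical), caching one EntityManagerFactory per profile, all built from the same persistence unit but pointed at different databases:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class ProfileEmfRegistry {
    private final Map<String, EntityManagerFactory> cache = new ConcurrentHashMap<>();

    // One EMF per profile, all sharing the same mappings/schema,
    // each with its own connection properties.
    public EntityManagerFactory forProfile(String profile, String jdbcUrl, String user, String password) {
        return cache.computeIfAbsent(profile, p -> {
            Map<String, String> props = new HashMap<>();
            props.put("javax.persistence.jdbc.url", jdbcUrl);
            props.put("javax.persistence.jdbc.user", user);
            props.put("javax.persistence.jdbc.password", password);
            return Persistence.createEntityManagerFactory("accounts-unit", props);
        });
    }
}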
If you are using spring then I know there is a way to dynamically switch the DataSource. Find more information here

Cassandra - transaction support

I am going through Apache Cassandra and working on sample data insertion, retrieval, etc.
The documentation is very limited.
I am interested in knowing
Can we completely replace a relational DB like MySQL/Oracle with Cassandra?
Does Cassandra support rollback/commit?
Do Cassandra clients (Thrift/Hector) support fetching associated objects (objects where we save one super column's key in another super column family)?
This will help me a lot to proceed further.
Thank you in advance.
Short answer: No.
By design, Cassandra values availability and partition tolerance over consistency. Basically, it's not possible to get acceptable latency while maintaining all three of these qualities: one has to be sacrificed. This is called the CAP theorem.
The amount of consistency is configurable in Cassandra using consistency levels, but no rollback semantics exist. There's no guarantee that you'll be able to roll back your changes even if the first write succeeds.
If you want to build an application with transactions or locks on top of Cassandra, you probably want to look at ZooKeeper, which can be used to provide distributed synchronization.
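For example, a rough sketch of a distributed lock using Apache Curator on top of ZooKeeper (the connection string and lock path are placeholders):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkLockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        InterProcessMutex lock = new InterProcessMutex(client, "/locks/account-42");
        lock.acquire();            // blocks until this process holds the lock
        try {
            // read-modify-write against Cassandra goes here, guarded by the lock
        } finally {
            lock.release();
        }
        client.close();
    }
}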
You might've already guessed this, but Cassandra doesn't have foreign keys or anything like that. This has to be handled manually. I'm not that familiar with Hector, but a higher-level client could be able to do this semi-automatically.
Whether or not you can use Cassandra to easily replace a RDBMS depends on your specific use case. In your use case (based on your questions), it might be hard to do so.
In version 2.x you can combine CQL statements in a logged batch, which is atomic: either all of the statements succeed or none do. You can also read about lightweight transactions.
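For illustration, a rough sketch with the DataStax Java driver (keyspace, tables, and columns are made up) of a logged batch whose writes either all apply or none do:

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class LoggedBatchExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            PreparedStatement byId = session.prepare(
                    "INSERT INTO users (id, name) VALUES (?, ?)");
            PreparedStatement byName = session.prepare(
                    "INSERT INTO users_by_name (name, id) VALUES (?, ?)");

            // LOGGED batches are atomic (all-or-nothing), but not isolated
            BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
            batch.add(byId.bind(42L, "alice"));
            batch.add(byName.bind("alice", 42L));
            session.execute(batch);
        }
    }
}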
More than that, there are several persistence managers for Cassandra. You can achieve foreign-key-like behavior at the client level with them. For example, Achilles and Kundera.
If ZooKeeper is able to handle transactions with Oracle-like quality, then it's a done deal. Relations and relational integrity are no problem to implement on top of ANY database; a foreign key is just another data field. ACID/transactions are the key issue.
Instead of commit and rollback, you must use a batch.
A batch works atomically; this means all records across multiple tables are submitted, or none are.
for example :
// C# DataStax driver: prepare the statement, bind values, add it to the batch, execute
var batch = new BatchStatement();
var prepared = session.Prepare(stringCommand);
batch.Add(prepared.Bind(/* values for the statement's bind markers */));
var result = session.ExecuteAsync(batch);
Of course you can, but it completely depends on your use case. If you don't pick the right DB for your use case, then you need to worry about lots of things on your own. For example, an RDBMS doesn't provide geographic distribution, so you need to find a way to do it yourself. In Cassandra, you lack some ACID properties under some conditions and need to handle those properties on the application side.
Yes, but limited to certain use cases. You can use batches; they support rollback, but you lack isolation. I am not sure this feature exists in OSS Cassandra. For more info look
I don't understand what you mean by super column. If you're asking whether you can find an id in another table's columns, yes, you can do it, why not. But I definitely don't understand what you mean by super column.
Overall, Cassandra is not ACID compliant, but there are some features, like batches and lightweight transactions, that help you get closer to ACID behavior under some conditions.

When can/should you go whole hog with the ORM approach?

It seems to me that introducing an ORM tool is supposed to make your architecture cleaner, but for efficiency I've found myself bypassing it and iterating over a JDBC Result Set on occasion. This leads to an uncoordinated tangle of artifacts instead of a cleaner architecture.
Is this because I'm applying the tool in an invalid Context, or is it deeper than that?
When can/should you go whole hog with the ORM approach?
Any insight would be greatly appreciated.
A little of background:
In my environment I have about 50 client computers and 1 reasonably powerful SQL Server.
I have a desktop application in which all 50 clients are accessing the data at all times.
The project's Data Model has gone through a number of reorganizations for various reasons including clarity, efficiency, etc.
My Data Model's history
JDBC calls directly
DAO + POJOs without relations between the POJOs (basically wrapping the JDBC calls).
Added Relations between POJOs implementing Lazy Loading, but just hiding the inter-DAO calls
Jumped onto the Hibernate bandwagon after seeing how "simple" it made data access (it made inter POJO relations trivial) and because it could decrease the number of round trips to the database when working with many related entities.
Since it was a desktop application, keeping Sessions open long-term was a nightmare, so it ended up causing a whole lot of issues.
Stepped back to a partial DAO/Hibernate approach that allows me to make direct JDBC calls behind the DAO curtain while at the same time using Hibernate.
Hibernate makes more sense when your application works on object graphs, which are persisted in the RDBMS. If, instead, your application logic works on a 2-D matrix of data, fetching it via direct JDBC works better. Although Hibernate is written on top of JDBC, it has capabilities which might be non-trivial to implement in JDBC. For example:
Say, the user views a row in the UI and changes some of the values and you want to fire an update query for only those columns that did indeed change.
To avoid getting into deadlocks you need to maintain a global order for SQLs in a transaction. Getting this right in JDBC might not be easy.
Easily setting up optimistic locking (see the sketch below). When you use JDBC, you need to remember to have this in every update query.
Batch updates, lazy materialization of collections etc might also be non-trivial to implement in JDBC.
(I say "might be non-trivial", because it of course can be done - and you might be a super hacker:)
Hibernate lets you fire your own SQL queries also, in case you need to.
Hope this helps you to decide.
PS: Keeping the Session open on a remote desktop client and running into trouble is really not Hibernate's problem - you would run into the same issue if you keep the Connection to the DB open for long.
