I currently have a Java EE application in which I have implemented my own connection pool class. Each method performs a simple query using a Statement and a ResultSet. In the finally block of every method that uses JDBC/my pool, I first close the ResultSet and then close the Statement, as many, many resources in books and online indicate should be done. Finally, I return the connection back to the pool.
While watching the memory of the JVM, I notice that the memory never really releases after I make a call which uses JDBC through my connection pool, or it takes a very long time to do so. I have checked my Garbage Collection settings and I am using gencon (IBM WebSphere), which many resources online have indicated is very good as well. I am using Spring Framework in my application too.
The connection pool class I wrote is very simple. Upon initialization, it creates a certain number of connections to the database and adds them to a Queue (I tried another implementation with just a simple Vector, but got the same results with the memory). When you request a connection, it checks that one is available and, if so, hands it to the caller. When you're done, you return the connection and it is put back into the Queue/Vector.
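Roughly, the idea looks like this (a minimal sketch, not my actual class; the SimplePool name and the connection details are placeholders, and a BlockingQueue stands in for the Queue/Vector):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class SimplePool {
        private final BlockingQueue<Connection> available;

        public SimplePool(String url, String user, String pass, int size) throws SQLException {
            available = new LinkedBlockingQueue<>(size);
            // Create a fixed number of connections up front.
            for (int i = 0; i < size; i++) {
                available.add(DriverManager.getConnection(url, user, pass));
            }
        }

        // Hands out a connection if one is available, blocking otherwise.
        public Connection acquire() throws InterruptedException {
            return available.take();
        }

        // Callers return the connection here instead of closing it.
        public void release(Connection conn) {
            available.offer(conn);
        }
    }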
I am wondering if there's anything else that could be done about this. Should I let the Spring Framework handle my connection pool instead, or is there something else that handles memory better? It does make sense to me, but I'm not too familiar with implementing connection pooling. All the resources say to do what I am doing, but I'm assuming they might be using some built-in pooling implementation. I do know that actually closing the connections works, but since this is a custom pooled solution, I cannot really do that.
Thanks!
Is your application running inside the WebSphere application server? Is it getting its datasource from the app server, rather than creating it itself? If so, then the connections and statements are already being pooled, and you don't need to pool them yourself.
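If so, the code reduces to a JNDI lookup of the container-managed DataSource (a sketch; the JNDI name "jdbc/MyDataSource" is a placeholder for whatever you configured in WebSphere):

    import java.sql.Connection;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    DataSource ds = (DataSource) new InitialContext().lookup("jdbc/MyDataSource");
    try (Connection conn = ds.getConnection()) {
        // use the connection; close() just returns it to the container's pool
    }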
OK, first things first: unless you have a very good reason, implementing your own connection pooling mechanism is an exercise in unnecessary risk. There are plenty of connection pooling mechanisms out there to pick from - Spring would be one of them, Jakarta's DBCP being another. If you have the option of using a third-party utility for connection pooling - do it. If you're running inside a container, let the container do the work for you (WebSphere already does this, through its various "helper classes").
Regarding the symptoms you're experiencing: the fact that the memory isn't released doesn't mean that there's a memory leak. It is up to the JVM to decide whether and when to actually free up unreferenced object instances. Are you encountering OutOfMemory errors as well?
Start by taking some heap dumps (right after the JDBC resources are released) to see what's going on in there.
About the code you wrote - without seeing the code, it's very hard to guess whether you have hidden bugs in there; but the flow you're describing appears correct.
In my early Java web applications, I would close the Connection at the end of each request. I would not close the ResultSet or PreparedStatement because they are closed automatically when the Connection is closed.
However, some of my applications had to create many PreparedStatements in a loop, and if I did not close them individually, I would get an OutOfMemory error before the request could finish. It seems that each Connection keeps references to all of its PreparedStatements.
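The pattern that avoided the error looked roughly like this (a sketch; conn is an open Connection, names is a list of values, and the table is made up):

    // Closing each PreparedStatement per iteration, so the Connection
    // does not accumulate references to thousands of open statements.
    for (String name : names) {
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO users (name) VALUES (?)");
        try {
            ps.setString(1, name);
            ps.executeUpdate();
        } finally {
            ps.close();   // without this, memory grows until OutOfMemory
        }
    }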
I'm used to Perl automatically cleaning up the statement ($sth) when the variable becomes inaccessible (i.e. the subroutine returns). If $sth is still accessible when I call $dbh->disconnect, then I have to call $sth->finish, but most of the time it is no longer accessible at the end of my scripts, so that is not necessary.
I would think that Java could give the developer similar convenience using the finalize method and possibly using WeakReference. The finalize method is called when all references are gone and the object is inaccessible (except through weak references).
Would this be a reasonable solution to automatically closing PreparedStatements? If so, I am planning to create wrapper functions to achieve such automation in my application. If that's not a reasonable solution, please explain.
What would be the likely side-effects that prevented the designers of the MySQL-Java Connector (a JDBC driver) from including this automation in their original design?
The only side-effect I can think of would be memory overhead in the MySQL server itself. However, I don't know whether the MySQL server even has any overhead when a PreparedStatement remains open. Does MySQL keep any such overhead in memory?
The finalize method runs when the JVM decides it needs to - it may or may not run when the object goes out of scope. It is not the same as a C++ destructor for example. You should not need to create PreparedStatements in a loop - the whole point is that they are "prepared" and can be used many times. That doesn't mean it should live forever but it should be reused if, in your example, you need to loop over something.
In short - never count on finalize. If you open something, then close it. If at all possible, use the AutoCloseable interface in new code so that you don't have to worry about it.
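Putting both points together, the preferred shape is roughly this (a sketch; conn, names, and the SQL are illustrative):

    // Prepare once, bind and execute many times, and let
    // try-with-resources close the statement deterministically.
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO users (name) VALUES (?)")) {
        for (String name : names) {
            ps.setString(1, name);
            ps.executeUpdate();
        }
    }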
I'm used to Perl automatically cleaning up the statement
I can imagine two reasons:
The Connection is probably strongly referenced by the Driver. This could be replaced by a WeakReference, but it wouldn't do what you need (see below).
Perl has tons of built-in special features making programming more convenient and this may be one of them: "When a variable can be freed when leaving a sub, it will be freed."
I would think that Java could give the developer similar convenience using the finalize method and possibly using WeakReference.
No way. GC is only a tool for reclaiming memory, misusing it for anything else does not work (see below).
Would this be a reasonable solution to automatically closing PreparedStatements?
No. All it's practically good for is helping you find out whether and where you're leaking resources, so you can fix it. The problem is that other resources may get exhausted much faster, leaving you with tons of memory but, e.g., not a single free file descriptor. The GC won't trigger because of this. You could call System.gc manually, but it'd waste a lot of time.
What would be the likely side-effects that prevented the designers of the MySQL-Java Connector ... from including this automation in their original design?
It just can't work. If it were possible, it would be implemented in JDBC itself, so that driver authors wouldn't need to bother.
The only side-effect I can think of would be memory overhead in the MySQL server itself.
I'd bet, when the MySQLConnection gets closed, it tells the server to release associated resources, too. There are such resources.
I'd strongly recommend forgetting about using the GC for this. There's no perfect solution in Java, but there are many good ones:
Since Java 7, you can use try-with-resources.
You can use @Cleanup from Project Lombok.
You can write a class doing all the work including the cleanup and converting the ResultSet to a List<Object[]> if you have enough memory.
You can use the Template method pattern with an abstract method specifying how to process a result row (see the sketch after this list).
You can use one of the thousands of tools out there....
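For instance, the template-method option could look something like this (a hedged sketch; QueryTemplate and mapRow are made-up names):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.List;

    // The abstract class owns the whole resource lifecycle; a subclass
    // only says how to turn one row into an object.
    public abstract class QueryTemplate<T> {
        protected abstract T mapRow(ResultSet rs) throws SQLException;

        public List<T> run(Connection conn, String sql) throws SQLException {
            List<T> results = new ArrayList<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    results.add(mapRow(rs));   // subclass decides the mapping
                }
            }
            return results;
        }
    }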
I am making an app that uses RMI and a database, but I want to follow good resource-utilization practice for my connection to the database. I want only a single instance to be created when the server starts, and the other classes need to get that connection, do their createStatement() and executeQuery(), and return results to the client, which will later be bound to a combobox or table.
The proper way is to use a JDBC connection pool library such as BoneCP. Do not attempt a do-it-yourself implementation of the connection pool: it is riddled with difficulties, whereas the ready-made alternatives just work, thanks to the simple contract they implement.
I should also mention that plain RMI is an obsolete communication technique, mainly associated with the Java applets of the early 2000s. Today you would be much better served by a REST/JSON combination if you are doing something lightweight. Even at the enterprise level, REST/XML is gaining popularity, while interest in techniques based on Java serialization is waning. These newer technologies are preferred (among many other reasons) for the greater transparency of their wire protocol, which helps diagnostics, debugging, and the general predictability of the system as a whole.
I have a very simple app that is using ActiveMQ. The use case that it solves will involve sending small atomic Topic messages.
My first pass at this functionality built one connection to the broker and reused it as needed. However, in reading some of the docs, it seems like hanging onto a connection for reuse potentially hogs resources in the JVM.
So my dilemma is: do I incur the overhead of building up and tearing down a connection for every message, or do I incur the cost of hanging onto resources that for the most part sit idle?
I know there is no one definitive answer and the real answer is "it depends", but I would really like some insight and opinions from others.
I believe you need to weigh both of the criteria you mentioned. The solution is to use a connection pool. That way you share the connections and most of the time do not create new ones, and the pool is usually limited to a specific number of connections (this is my assumption of how I would implement it), so it doesn't take all the resources in the JVM.
Take a look at the documentation section on PooledConnectionFactory; a sketch of its use follows.
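A minimal sketch of that approach (the broker URL and pool size are assumptions for illustration):

    import javax.jms.Connection;
    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.apache.activemq.pool.PooledConnectionFactory;

    PooledConnectionFactory pooled = new PooledConnectionFactory(
            new ActiveMQConnectionFactory("tcp://localhost:61616"));
    pooled.setMaxConnections(8);   // cap the resources held in the JVM

    Connection conn = pooled.createConnection();
    conn.start();
    // ... create a session/producer and send the topic message ...
    conn.close();   // returns the connection to the pool, not to the broker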
Also, the decision to keep connections alive or to recreate them depends entirely on your usage scenario. If you plan to send messages regularly, sharing the connection is the right thing to do, since connection and session creation are quite expensive operations (I would recommend sharing sessions too, if possible, under high traffic). However, if your messages will be sent infrequently (a few times per hour? :) ), it makes sense not to keep an idle connection alive.
I am writing an ETL project in Java. I will connect to the source database, get the data only once, do some transformations, and load the data into a target database.
The point is that I am not connecting to the source or the target database repeatedly. I just connect once (using JDBC), get the data I need, and close the connection.
Should I still use the connection pooling?
Thank you for your views!
Connection pooling is used to get around the fact that many database drivers take a long time to create a connection. If you only need a connection briefly before discarding it, the overhead of creating one might be substantial (both in time and CPU) if you need many connections. It is simply faster to reuse a connection than to create a new one.
If you do not have that need, there is no reason to set up a connection pool. If you happen to have one already, then just use it.
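In code, the one-shot case is just the plain JDBC idiom (a sketch; the URL, credentials, and query are placeholders):

    try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://source-host/sourcedb", "user", "pass");
         Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery("SELECT * FROM source_table")) {
        while (rs.next()) {
            // transform each row and write it to the target
        }
    }   // everything is closed here; no pool needed for a single pass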
My guess is that in some circumstances, using several threads and concurrent connections could improve the overall throughput of your software, allowing you, for example, to use all the CPUs of your RDBMS server or of the ETL client. It could also exploit the fact that several tables may sit physically on different hardware and thus can be accessed in parallel.
The real impact depends on the computers you use and the architecture of the database.
Be careful: ETL jobs typically have ordering constraints, and doing several things at the same time must not violate those constraints.
Edit: an example of this. You can configure Oracle to execute each request using several cores or not (depending on configuration and license, if I understand correctly). So if one request is allowed to use only one core, using several connections at the same time allows several requests to run simultaneously and makes better use of the server's CPU resources.
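A sketch of that parallel idea, assuming one connection per worker and made-up table names (url, user, and pass stand for your real settings):

    ExecutorService workers = Executors.newFixedThreadPool(4);
    for (String table : Arrays.asList("orders", "customers", "products")) {
        workers.submit(() -> {
            // Each worker gets its own connection and extracts its own table.
            try (Connection conn = DriverManager.getConnection(url, user, pass);
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT * FROM " + table)) {
                // transform and load this table independently
            }
            return null;   // Callable, so the checked SQLException is allowed
        });
    }
    workers.shutdown();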
In a desktop application with an embedded Derby database, what should I keep alive (as opposed to recreating each time when talking with the database) for the whole lifetime of the application?
Connection and Statement, using the same statement throughout the lifetime of the program?
Connection, recreating statement repeatedly?
Neither of these. That is, recreating connection and statement repeatedly?
From a database amateur's viewpoint it would seem reasonable to avoid recreating anything that doesn't need to be recreated, but is option 1 (or 2) against standard practices or are there some obvious cons? Is (re)creating connections and statements expensive or not?
In an embedded Derby application, both Connection and Statement objects are quite cheap and I think you should not worry about creating them as you need them. In the Derby unit test suites, we create tens of thousands of connections and hundreds of thousands of statements, without problems.
It is also fine to keep your Connection and Statement objects around as long as you wish. Embedded Derby has no time limit, and will not drop the connection or statement objects unless you tell it to (by closing them), or unless you leak them away, in which case the Garbage Collector will clean them up (eventually).
Although it is fine to keep the connection around, you should commit() the transaction when it is complete (unless you run in autocommit mode of course).
And, if you are keeping a result set around, be aware that committing the transaction will usually also close the result set, unless you specifically construct the special result sets that are held open across commits.
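For the case where you do want the result set to survive the commit, the holdability flag is passed when creating the statement (a sketch; the database and table names are placeholders):

    Connection conn = DriverManager.getConnection("jdbc:derby:myDB");
    conn.setAutoCommit(false);

    Statement st = conn.createStatement(
            ResultSet.TYPE_FORWARD_ONLY,
            ResultSet.CONCUR_READ_ONLY,
            ResultSet.HOLD_CURSORS_OVER_COMMIT);
    ResultSet rs = st.executeQuery("SELECT * FROM my_table");

    conn.commit();          // rs stays open thanks to the holdability flag
    while (rs.next()) {
        // still readable after the commit
    }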
Connecting is indeed expensive (it may cost a few hundred milliseconds). The connection, however, has a limited lifetime, and the statement and result set depend on it. The average DB will time out and drop a connection that has been idle for more than 30 minutes. You can add a timeout checker to your code so that it will re-acquire the connection "automatically", but that's tedious work and very prone to bugs if you don't know how it ought to work under the hood. Rather, use an existing, thoroughly developed and robust connection pool like C3P0, and write the JDBC code the usual way (acquire and close all the resources in the shortest possible scope). That should be it.
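A minimal C3P0 setup sketch (the driver class, URL, and pool size are placeholders for whatever your database needs):

    import com.mchange.v2.c3p0.ComboPooledDataSource;

    ComboPooledDataSource pool = new ComboPooledDataSource();
    pool.setDriverClass("org.apache.derby.jdbc.ClientDriver");  // placeholder
    pool.setJdbcUrl("jdbc:derby://localhost:1527/myDB");        // placeholder
    pool.setMaxPoolSize(10);

    // The usual JDBC idiom: shortest possible scope; close() just
    // returns the connection to the pool.
    try (Connection conn = pool.getConnection();
         Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery("VALUES 1")) {
        // use the result set
    }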
Although in theory (and apparently also in practice) connecting to an embedded database is less expensive and a connection can survive forever, I would strongly advise against treating embedded databases differently in JDBC code. It would make your JDBC code semantically flawed and completely unportable. You would have to rewrite/reimplement everything whenever you wanted to distribute the application and/or move to a real RDBMS server with more power.