Why are MySQL PreparedStatements not automatically closed in Java upon GC?

In my early Java web applications, I would close the Connection at the end of each request. I would not close the ResultSet or PreparedStatement objects because they are closed automatically when the Connection is closed.
However, some of my applications had to create many PreparedStatements in a loop. If I did not close them individually, I would get an OutOfMemoryError before the request could finish. It seems that each Connection keeps references to all of its PreparedStatements.
I'm used to Perl automatically cleaning up the statement handle ($sth) when the variable becomes inaccessible (i.e. when the subroutine returns). If $sth is still accessible when I call $dbh->disconnect, then I have to call $sth->finish, but most of the time it is not accessible at the end of my scripts, so that is unnecessary.
I would think that Java could give the developer similar convenience using the finalize method, possibly combined with WeakReference. The finalize method is called once all (non-weak) references are gone and the object is unreachable.
Would this be a reasonable solution to automatically closing PreparedStatements? If so, I am planning to create wrapper functions to achieve such automation in my application. If that's not a reasonable solution, please explain.
What would be the likely side-effects that prevented the designers of the MySQL-Java Connector (a JDBC driver) from including this automation in their original design?
The only side-effect I can think of would be memory overhead in the MySQL server itself. However, I don't know whether the MySQL server even incurs any overhead while a PreparedStatement remains open. Does MySQL keep any memory overhead for open statements?

The finalize method runs when the JVM decides it needs to - it may or may not run when the object goes out of scope. It is not the same as a C++ destructor, for example. You should not need to create PreparedStatements in a loop - the whole point is that they are "prepared" and can be used many times. That doesn't mean a statement should live forever, but it should be reused if, as in your example, you need to loop over something.
In short - never count on finalize. If you open something, then close it. If at all possible, use the AutoCloseable interface (try-with-resources) in new code so that you don't have to worry about it.
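For illustration, a minimal sketch of that reuse with try-with-resources (the JDBC URL, credentials, and users table below are placeholders, not anything from the question):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {
    // Reuses one PreparedStatement for every row instead of
    // preparing a new statement on each loop iteration.
    static void insertAll(List<String[]> rows) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost/test", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO users (name, email) VALUES (?, ?)")) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.executeUpdate();
            }
        } // both the statement and the connection are closed here, even on exception
    }
}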

I'm used to Perl automatically cleaning up the statement
I can imagine two reasons:
The Connection is probably strongly referenced by the Driver. That reference could be replaced by a WeakReference, but it wouldn't do what you need (see below).
Perl has tons of built-in special features making programming more convenient and this may be one of them: "When a variable can be freed when leaving a sub, it will be freed."
I would think that Java could give the developer similar convenience using the finalize method and possibly using WeakReference.
No way. GC is only a tool for reclaiming memory, misusing it for anything else does not work (see below).
Would this be a reasonable solution to automatically closing PreparedStatements?
No. All it's practically good for is helping you find out whether and where you're leaking resources, so you can fix the leak. The problem is that other resources may get exhausted much faster than memory, leaving you with tons of free heap but, say, not a single free file descriptor. The GC won't trigger because of that. You could call System.gc manually, but it would waste a lot of time.
What would be the likely side-effects that prevented the designers of the MySQL-Java Connector ... from including this automation in their original design?
It just can't work. If it were possible, it would be implemented in JDBC itself, so that driver authors wouldn't need to bother.
The only side-effect I can think of would be memory overhead in the MySQL server itself.
I'd bet that when the MySQL Connection gets closed, it tells the server to release the associated resources, too - and there are such server-side resources.
I'd strongly recommend forgetting about using the GC for this. There's no perfect solution in Java, but there are many good ones:
Since Java 7, you can use try-with-resources.
You can use @Cleanup from Project Lombok.
You can write a class doing all the work including the cleanup and converting the ResultSet to a List<Object[]> if you have enough memory.
You can use the Template method pattern, with an abstract method specifying how to process a result row (see the sketch after this list).
You can use one of the thousands of tools out there....
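To make the template-method option concrete, a hedged sketch (the class and method names here are mine, not from any library):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// The base class owns the whole resource lifecycle; a subclass
// only decides how to turn one row into an object.
public abstract class QueryTemplate<T> {
    protected abstract T mapRow(ResultSet rs) throws SQLException;

    public List<T> run(Connection conn, String sql) throws SQLException {
        List<T> result = new ArrayList<T>();
        try (PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                result.add(mapRow(rs));
            }
        } // ResultSet and PreparedStatement are closed here in all cases
        return result;
    }
}

A call site can then never forget the cleanup, because it never sees the Statement or ResultSet at all.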

Related

Java Force Garbage collection without using JVMTI?

The only way I know to force garbage collection is to use ForceGarbageCollection() from JVMTI. Is there any cross-platform way to force GC (so that I don't need to build a JVMTI library for each platform)?
I think that the answer is No.
But I also think that you shouldn't need to do this anyway.
The way to request the garbage collector to run is to call System.gc(). But as the javadoc explains, this can be ignored.
The normal reason that System.gc() is ignored is that the JVM has been launched with the option -XX:+DisableExplicitGC. This is NOT the default. It only happens if the person or script or whatever launching the JVM wants this behavior.
So you are really asking for a way for an application to override the user's or administrator's explicit instruction to ignore System.gc() calls. You should not be doing that. It is not the application's or the application writer's prerogative to override the user's wishes.
If your Java application really needs to run the GC explicitly, include in the installation instructions that it should NOT be run with the -XX:+DisableExplicitGC option. Then System.gc() should work.
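For completeness, the call in question is nothing more than a hint:

// Requests (does not force) a collection; the JVM may run one now,
// later, or never. With -XX:+DisableExplicitGC it is a no-op.
System.gc();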
So why did they provide a way to disable gc() calls?
Basically because explicitly running the GC is bad practice (see Why is it bad practice to call System.gc()?) and (nearly always1) unnecessary in a properly written application2. If your application relies on the GC running at specific times to function, then you have made a mistake in the application design.
1 - A couple of exceptions are test cases for code that uses Reference types and similar, and interactive games where you want to (say) clean up between levels to avoid a GC pause during normal play.
2 - It is not uncommon for a Java programmer to start out as a C or C++ programmer. It can be difficult for such people to realize that they don't need to take a hand in Java memory management. The JVM (nearly always) has better understanding of when to run the GC. People also come across Object.finalize and dream up "interesting" ways to use it ... without realizing that it is an expensive and (ultimately) unreliable mechanism.

JDBC Connection Pool Memory Issues (Java EE Application)

I currently have a Java EE application in which I have implemented my own connection pool class. Each method I use does a simple query (Statement and ResultSet). In the finally block of each method that uses JDBC/my pool, I first close the ResultSet and then the Statement, as many, many books and online resources indicate should be done. Finally, I return the connection to the pool.
While watching the memory of the JVM, I notice that the memory is never really released after I make a call that uses JDBC through my connection pool, or it takes a very long time to be. I have checked my Garbage Collection settings and I am using gencon (IBM WebSphere), which many resources online indicate is very good as well. I am using the Spring Framework in my application too.
The connection pool class I wrote is very simple. Upon initialization, it creates a certain number of connections to the database and adds them to a Queue (I tried another implementation with just a simple Vector, but got the same memory results). When you request a connection, it checks to make sure there's an available connection, and if so it gives one to the caller. At the end, you return it and it puts it back into the Queue/Vector.
I am wondering if there's anything else that could be done about this. Should I let the Spring Framework handle my connection pool instead, or is there something else that handles memory better? To me, what I'm doing makes sense, but I'm not too familiar with implementing connection pooling. All the resources say to do what I am doing, but I assume they might be using some built-in pooling implementation. I do know that closing the connection releases these resources, but since this is a custom pooled solution, I cannot really close the connections.
Thanks!
Is your application running inside the WebSphere application server? Is it getting its datasource from the app server, rather than creating it itself? If so, then the connections and statements are already being pooled, and you don't need to pool them yourself.
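For reference, a hedged sketch of what using the container-managed pool looks like (the JNDI name jdbc/MyDataSource is a placeholder for whatever you configured in WebSphere):

import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class ContainerPoolExample {
    // Looks up the DataSource the app server manages; calling close()
    // on the returned Connection hands it back to the container's pool.
    public static Connection getPooledConnection()
            throws NamingException, SQLException {
        DataSource ds = (DataSource) new InitialContext()
                .lookup("jdbc/MyDataSource");
        return ds.getConnection();
    }
}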
OK, first things first: unless you have a very good reason, implementing your own connection pooling mechanism is an exercise in unnecessary risk. There are simply many connection pooling mechanisms out there to pick from - Spring would be one of them, Jakarta's DBCP another. If you have the option of using a third-party utility for connection pooling - do it. If you're running inside a container, let the container do the work for you (WebSphere already does this, through its various "helper classes").
Regarding the symptoms you're experiencing: the fact that the memory isn't released doesn't mean that there's a memory leak. It is up to the JVM to decide whether and when to actually free unreferenced object instances. Are you encountering OutOfMemory errors as well?
Start by taking some heap dumps (right after the JDBC resources are released) to see what's going on in there.
About the code you wrote - without the code posted, it's very hard to guess whether you have hidden bugs there; but the flow you're describing appears correct.

Is it worth cleaning ThreadLocals in Filter to solve thread pool-related issues?

In short - Tomcat uses a thread pool, so threads are reused. Some libraries use ThreadLocal variables but don't clean them up (using .remove()), so in effect they return "dirty" threads to the pool.
Tomcat has new features for detecting these things on shutdown and cleaning up the thread locals. But that means the threads stay "dirty" during the whole execution.
What I can do is implement a Filter and, right after the request completes (before the thread is returned to the pool), clean all ThreadLocals, using the code from Tomcat (the method there is called checkThreadLocalsForLeaks).
The question is, is it worth it? Two pros:
preventing memory leaks
preventing non-deterministic behaviour by the libraries, which assume the thread is "fresh"
One con:
The solution uses reflection, so it's potentially slow. All the reflection data (the Field objects) will be cached, of course, but still.
Another option is to report the issue to the libraries that don't clean their thread locals.
I would go through the route of reporting the issue to the library developers for 2 reasons:
It will help other people who want to use the same library but lack the skills / time to find such a HORRIBLE memory leak.
It will help the developers of the library build a better product.
Honestly, I've never seen this type of error before, and I think it's an exception rather than something common enough that we should guard against it. Could you share which library you've seen this behaviour in?
As a side note, I wouldn't mind enabling that filter in the development / test environment and logging a critical error if a ThreadLocal variable is still attached.
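A minimal sketch of such a filter, assuming you only touch ThreadLocals your own code knows about (MY_CONTEXT is a placeholder); detecting foreign ones, as Tomcat does, needs the reflection approach discussed above:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class ThreadLocalCleanupFilter implements Filter {
    static final ThreadLocal<Object> MY_CONTEXT = new ThreadLocal<Object>();

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        try {
            chain.doFilter(req, res);
        } finally {
            // Runs on the pooled thread before it goes back to the pool.
            MY_CONTEXT.remove();
        }
    }

    public void init(FilterConfig config) {}
    public void destroy() {}
}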
In theory, this seems like a good idea. However, I could see some situations where you might not want to do this. For instance, some of the XML-related technologies have non-trivial setup costs (like setting up DocumentBuilders and Transformers). If you are doing a lot of that in your webapp, it may make sense to cache these instances in ThreadLocals (as the utilities are generally not thread-safe). In this case, you probably don't want to clean them between requests.
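As a concrete sketch of that caching pattern (ThreadLocal.withInitial needs Java 8; DocumentBuilder is not thread-safe but is expensive to create):

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public final class ThreadLocalParsers {
    private static final ThreadLocal<DocumentBuilder> BUILDER =
            ThreadLocal.withInitial(() -> {
                try {
                    return DocumentBuilderFactory.newInstance().newDocumentBuilder();
                } catch (ParserConfigurationException e) {
                    throw new IllegalStateException(e);
                }
            });

    // One builder per thread; reset() clears state from the previous use.
    public static DocumentBuilder builder() {
        DocumentBuilder b = BUILDER.get();
        b.reset();
        return b;
    }
}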
If you think there's a chance that the dirtiness of the threads will actually cause problems, then this is a sensible thing to do. Problems are to be avoided where possible.
The use of threadlocals may be bad behaviour by the library, and you should certainly report it to the authors, but sadly, right now, it's down to you to deal with it.
I wouldn't worry too much about performance. The slow part of reflection is the metadata lookup; once you have a Field object, using it is fairly quick, and it gets quicker over time - AIUI, it starts out by making a native call into the JVM, but after some number of uses, it generates bytecode for the access, which can then be compiled into native code, optimised, inlined, etc., so it shouldn't be much slower than a direct field access. I don't think the Tomcat code reuses the Field objects across requests, though, so if you want to take advantage of that, you'd have to write your own cleaning code. In any case, the performance cost will be far smaller than the cost of the I/O associated with the request.
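If you do want to cache the Field lookup yourself, a hedged sketch of the idea (this pokes at a HotSpot-internal field, so it is fragile, and on modern JDKs it needs --add-opens java.base/java.lang=ALL-UNNAMED):

import java.lang.reflect.Field;

public final class ThreadLocalCleaner {
    private static final Field THREAD_LOCALS;
    static {
        try {
            // Package-private field holding the current thread's ThreadLocalMap.
            THREAD_LOCALS = Thread.class.getDeclaredField("threadLocals");
            THREAD_LOCALS.setAccessible(true);
        } catch (NoSuchFieldException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Nulling the map drops every ThreadLocal at once; the JDK
    // lazily recreates the map the next time one is used.
    public static void clearCurrentThread() {
        try {
            THREAD_LOCALS.set(Thread.currentThread(), null);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}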

How to force hibernate to release memory once the session is closed?

We've just recently started using Hibernate and are still getting used to the way it works.
One of the things we've seen is that even after all sessions are closed and the references have gone out of scope, Hibernate still seems to keep the previously loaded database values in its cache.
We have code that reads from a set of tables in multiple passes. Because the memory is freed very sparingly, the later passes slow down to a crawl.
Is there any way to force Hibernate to clear its cache?
An explicit call to System.gc() doesn't help. (Yes, I know it is only a suggestion.)
Additional Info: We've explicitly disabled the second-level cache.
You could try calling Session.clear to force a clear of the first-level cache. Be sure to call Session.flush first to write any pending changes to the database. If that "fixes" the problem, then I suspect something is still holding a reference to the session, preventing the objects in the cache from being garbage-collected. You may need to obtain a heap dump of your program to track down the leak.
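If the session has to stay open across a long pass, a minimal sketch of the flush-then-clear pattern (the batch size of 50 is arbitrary, and the per-entity work is elided):

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class BatchProcessor {
    static void processAll(SessionFactory factory, List<?> entities) {
        Session session = factory.openSession();
        try {
            Transaction tx = session.beginTransaction();
            int i = 0;
            for (Object entity : entities) {
                // ... per-entity work here ...
                if (++i % 50 == 0) {
                    session.flush(); // write pending changes first
                    session.clear(); // then evict everything from the first-level cache
                }
            }
            tx.commit();
        } finally {
            session.close();
        }
    }
}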
Hibernate also has an optional second level cache which could be at play. I agree with Rob, the easiest way to find out is to see what is still in memory after the session ends.
My current favorite tool for this is YourKit, which is commercial and not exactly cheap. They used to offer (and may still offer) a personal license option which was very inexpensive ($99, IIRC). I have used YourKit for precisely this task when troubleshooting heap usage problems with Alfresco ECM. There are other tools available (e.g. CodeGear JGears) which I understand also work very well.
You might consider using a product in evaluation mode - if this finds your problem it might earn its keep ;)

Why doesn't the JVM cache JIT compiled code?

The canonical JVM implementation from Sun applies some pretty sophisticated optimization to bytecode to obtain near-native execution speeds after the code has been run a few times.
The question is, why isn't this compiled code cached to disk for use during subsequent uses of the same function/class?
As it stands, every time a program is executed, the JIT compiler kicks in afresh, rather than using a pre-compiled version of the code. Wouldn't adding this feature add a significant boost to the initial run time of the program, when the bytecode is essentially being interpreted?
Without resorting to cut'n'paste of the link that @MYYN posted, I suspect this is because the optimisations that the JVM performs are not static but dynamic, based on the data patterns as well as the code patterns. It's likely that these data patterns will change during the application's lifetime, rendering the cached optimisations less than optimal.
So you'd need a mechanism to establish whether the saved optimisations were still optimal, at which point you might as well just re-optimise on the fly.
Oracle's JVM is indeed documented to do so -- quoting Oracle,
the compiler can take advantage of Oracle JVM's class resolution model to optionally persist compiled Java methods across database calls, sessions, or instances. Such persistence avoids the overhead of unnecessary recompilations across sessions or instances, when it is known that semantically the Java code has not changed.
I don't know why all sophisticated VM implementations don't offer similar options.
An update to the existing answers - Java 8 has a JEP dedicated to solving this:
JEP 145: Cache Compiled Code.
At a very high level, its stated goal is:
Save and reuse compiled native code from previous runs in order to improve the startup time of large Java applications.
Hope this helps.
Excelsior JET has had a caching JIT compiler since version 2.0, released back in 2001. Moreover, its AOT compiler can recompile the cache into a single DLL/shared object using all optimizations.
I do not know the actual reasons, not being in any way involved in the JVM implementation, but I can think of some plausible ones:
The idea of Java is to be a write-once-run-anywhere language, and putting precompiled stuff into the class file kind of violates that (only "kind of" because, of course, the actual byte code would still be there)
It would increase the class file sizes because you would have the same code there multiple times, especially if you happen to run the same program under multiple different JVMs (which is not really uncommon, when you consider different versions to be different JVMs, which you really have to do)
The class files themselves might not be writable (though it would be pretty easy to check for that)
The JVM optimizations are partially based on run-time information and on other runs they might not be as applicable (though they should still provide some benefit)
But I really am guessing, and as you can see, I don't really think any of my reasons are actual show-stoppers. I figure Sun just doesn't consider this support a priority, and maybe my first reason is close to the truth, as doing this habitually might also lead people into thinking that Java class files really need a separate version for each VM instead of being cross-platform.
My preferred way would actually be to have a separate bytecode-to-native translator that you could use to do something like this explicitly beforehand, creating class files that are explicitly built for a specific VM, with possibly the original bytecode in them so that you can run with different VMs too. But that probably comes from my experience: I've been mostly doing Java ME, where it really hurts that the Java compiler isn't smarter about compilation.
