Memory/Heap Status after closing ResultSet in JDBC - java

A ResultSet fetches records from the database. After using the ResultSet object, we finally close it.
The question is: once rs.close() is called, will it free the fetched records from memory?
Or, when the JVM is facing a shortage of space, will the garbage collector be invoked to delete the ResultSet?
If the JVM invokes GC when it faces a memory shortage, is it good practice to call the garbage collector manually in the Java program to free up the space?

Result Sets are often implemented by using a database cursor. Calling resultSet.close() will release that cursor, so it will immediately free resources in the database.
The data read by a Result Set is often received in blocks of records. Calling resultSet.close() might "release" the last block, making it eligible for GC, but that would happen anyway once the resultSet itself goes out of scope and becomes eligible for GC, and that likely happens right after calling close(), so it really doesn't matter if calling close() releases Java memory early.
Java memory is only freed by a GC run. You don't control when that happens (calling System.gc() is only a hint, you don't have control).
You're considering the wrong things. What you should focus on is:
Making sure resources1 are always closed as soon as possible to free up database and system resources.
This is best done using try-with-resources (see the sketch after this list).
Making sure you don't keep too much data, e.g. don't create objects for every row retrieved if you can process the data as you get it.
This is usually where memory leaks occur, not inside the JDBC driver.
1) E.g. ResultSet, Statement, Connection, InputStream, OutputStream, Reader, Writer, etc.
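
For example, here is a minimal try-with-resources sketch; the DataSource and the table and column names are made up for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

static void printUserNames(DataSource dataSource) throws SQLException {
    // try-with-resources closes rs, ps and con in reverse order of creation,
    // even if an exception is thrown mid-iteration.
    try (Connection con = dataSource.getConnection();
         PreparedStatement ps = con.prepareStatement("SELECT name FROM users");
         ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            System.out.println(rs.getString("name")); // process each row as you read it
        }
    }
}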

ResultSet.close() will immediately release all resources, except Blob, Clob and NClob objects. "Release" means the resources will be freed whenever the garbage collector decides to run. Usually we don't have to worry about it.
However, some memory used by JDBC may remain in use.
Suppose that the driver has some sort of cache built in, and that cache is connection-scoped. To release that memory, you'd have to close the JDBC Connection.
E.g. MySQL JDBC has a default fetch size of 0, meaning it loads the entire result set into memory and keeps it there for each of your statements. What's the scope of this in-memory buffer? ;)
Anyway, if you suspect memory issues, have a look at your JDBC driver's specifics.
As a rule of thumb, explicit GC is never a good idea. But for a quick check of whether ResultSet.close()/Connection.close() releases any resources, give it a try: inspect used/free memory, call close(), call gc(), and inspect memory again. Without an explicit GC you will hardly see any changes.
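
A minimal sketch of that experiment, assuming resultSet and connection come from the surrounding code:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

static void measureCloseEffect(ResultSet resultSet, Connection connection) throws SQLException {
    Runtime rt = Runtime.getRuntime();
    long before = rt.totalMemory() - rt.freeMemory();

    resultSet.close();
    connection.close();

    System.gc(); // only a hint, and only for this quick check; never in production code
    long after = rt.totalMemory() - rt.freeMemory();
    System.out.printf("used before close: %,d bytes, after: %,d bytes%n", before, after);
}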

Explicit GC is a burden on the JVM, as it has to check memory usage frequently and decide when to trigger a collection. Choosing a garbage collector appropriate for your application's requirements is sufficient to handle the scenarios above.
ResultSet.close() marks the resources for garbage collection, i.e. it frees up the references so the memory blocks become unreachable. Also, the JDBC connection needs to be closed so that the memory holding the connection's cache can be marked for GC as well.

Related

Thread with streams

Suppose a thread in my program is reading a file from disk, and it encounters an Error (OutOfMemoryError) and the thread is killed without a chance to execute the stream-closing code in its finally block. Will the stream stay open even after that thread dies?
The finally block will still be executed. However, if the JVM is out of memory, there's a chance that there will be a problem closing the stream, resulting in another out of memory error thrown from within the finally block. If that happens, the stream will likely not be closed until the JVM exits.
In most cases it should be closed. But it mostly depends on the memory left when hitting the close method, and on the reference retention you have on the stream.
An OOM is raised when trying to use more heap memory than the JVM is allowed to. But it doesn't mean you have no memory available at all. After an OOM is raised, a lot of memory can be available, for many reasons: the process may have just tried to allocate a BIG array that doesn't fit into memory, many intermediate allocated objects may have been discarded due to the raised exception, the GC may have run a deeper collection than the usual incremental ones, stack memory can be used to process the stream closing, etc.
Also, most streams are closed when garbage collected. Generally, you open and close a stream within the scope of a method; when the method exits, there's no more reference to the stream. Thus it becomes eligible for garbage collection and may be closed automatically (however, you have to wait for the GC to collect it).
Most good software practices are based on "best effort". Don't try to think/do too much. Make the "best effort" to clean up and let it crash.
What are you supposed to do about a non-closed stream while your entire JVM is going away?
In your case ("stream handling"), "best effort" is done through usage of the try-with-resources statement.
If you are worried about the overhead of non-closed streams, you just have to use the try-with-resources statement (a "best effort" application, sketched below) and MUST focus on reference retention, which is the real cause of "memory leaks" in Java (as most streams are closed when garbage collected).
The real problem with "non-closed streams" is the limit the OS applies to the number of "file descriptors/handles" a process can hold at a given time.
Threads aren't supposed to be "killed", and if they are, you may quickly run into trouble, as monitors aren't freed (which will cause more damage throughout your VM).
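
A minimal sketch of that "best effort" with try-with-resources; the file path and the line counting are just for illustration:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

static long countLines(String path) throws IOException {
    // The compiler turns try-with-resources into an implicit finally, so the
    // reader is closed even if an exception (or an Error such as
    // OutOfMemoryError) propagates out of the loop below.
    try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
        long lines = 0;
        while (reader.readLine() != null) {
            lines++;
        }
        return lines;
    }
}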

Using MapDB efficiently (confused about commits)

I'm using MapDB in a project that deals with billions of Objects that need to be mapped/queued. I don't need any kind of persistence after the program finishes (the MapDB databases are all temporary). I want the program to run as fast as possible, but I'm confused about MapDB's commit() function (which I assume is relevant to performance), even after reading the docs. My questions:
What exactly does commit do? My working understanding is that it serializes Objects from the heap to disk, thus freeing heap space. Is this accurate?
What happens to the references to Objects that were just committed? Do they get cleaned up by GC, or do they somehow 'reference' an Object on disk (with MapDB making this transparent?)
Ultimately I want to know how to use MapDB as efficiently as I can, but I can't do that without knowing what commit() is for. I'd appreciate any other advice that you might have for using MapDB efficiently.
The commit operation is an operation on transactions, just as you would find in a database system. MapDB implements transactions, so commit effectively means "make the changes I've made to this DB permanent and visible to other users of it". The complementary operation is rollback, which discards all of the changes you've made within the current transaction. Commit doesn't (directly) affect what is in memory and what is not. You might want to look at compact() instead, if you're trying to reclaim heap space.
For your second question, if you're holding a strong reference to an object then you continue holding that strong reference. MapDB isn't going to delete it for you. You should think of MapDB as a normal Java Map, most of the time. When you call get, MapDB hides whether it's in memory or on disk from you and just returns you a usable reference to the retrieved object. That retrieved object will hang around in memory until it becomes garbage, just like anything else.
It is a good idea not to commit after every single change you make to a map, but instead to do it on some sort of schedule, like:
Every N changes
Every M seconds
After some sort of logical checkpoint in your code
Doing too many commits will make your application very slow; a batching sketch follows below.
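
A sketch of the "commit every N changes" schedule, assuming the MapDB 3.x API; the batch size and the data are made up:

import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.HTreeMap;
import org.mapdb.Serializer;

public class BatchedCommits {
    private static final int COMMIT_EVERY = 100_000; // tune to your workload

    public static void main(String[] args) {
        // Temporary file-backed DB; no persistence needed after the program ends.
        DB db = DBMaker.tempFileDB().transactionEnable().make();
        HTreeMap<Long, String> map = db
                .hashMap("data", Serializer.LONG, Serializer.STRING)
                .createOrOpen();

        for (long i = 1; i <= 1_000_000; i++) {
            map.put(i, "value-" + i);
            if (i % COMMIT_EVERY == 0) {
                db.commit(); // make the batch permanent, then keep going
            }
        }
        db.commit(); // final commit for the remainder
        db.close();
    }
}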

Finding a memory leak with JProfiler

My question is different from this one. I've profiled my application because it is too slow.
After completion of one process, I see some live objects in the heap walker.
We are caching some data from the database in a HashMap, but the heap walker shows me live objects from ResultSet.getString and Statement.getString, which should not be there.
HashMap.put() takes much less memory than the two methods above.
Have I done everything correctly, and is this analysis right? Or am I missing something, and the memory is actually occupied by the HashMap itself, with the heap walker just showing me the JDBC methods (getString and executeQuery) that allocated it?
Since you're talking about methods, I guess you're looking at the "Allocations" view of the heap walker. That view shows where objects have been created, not where objects are referenced. There's a screen cast that explains allocation recording in JProfiler.
HashMap.put will not allocate a lot of memory; it just creates small "Entry" objects that are used to store key-value pairs. The objects that take a lot of memory are created before you put them into the hash map.
ResultSet.getString and Statement.getString create the String objects that you read from your database. So it's reasonable to assume that some of these objects are longer-lived.
To find out why objects are still on the heap, you should go to the reference view, select incoming references and search for the "path to GC root". The "biggest objects" view is also very helpful in tracking down excessive memory usage.
What you may be seeing is cached data held by the connection (perhaps its buffer cache) or the statement or result set.
This could be through not closing the connection, statement or result set or it could be due to connection pooling. If you look at the memory profile, you may be able to see the "path to GC root" (the path to the object root) and this would indicate what is holding your ResultSet strings. You should see if it's in your code, cached within something you retain or if it's in a pool.
N.B. I've not used JProfiler but this is how I would track it with YourKit.

What is a nonmemory resource?

I am reading "Effective Java".
In the discussion about finalize, he says
C++ destructors are also used to reclaim other nonmemory resources.
In Java, the try finally block is generally used for this purpose.
What are nonmemory resources?
Is a database connection a nonmemory resource? Doesn't the object for holding the database connection occupy some memory?
Database connections, network connections, file handles, mutexes, etc. Something which needs to be released (not just garbage-collected) when you're finished with it.
Yes, these objects typically occupy some memory, but the critical point is that they also have (possibly exclusive) access to some resource in addition to memory.
Is a database connection a non memory resource?
Yes, that's one of the most common examples. Others are file handles, native GUI objects (e.g. Swing or AWT windows) and sockets.
Doesn't the Object for holding the database connection occupy some memory?
Yes, but the point is that the non-memory part of the resource needs to be released as well, and it is typically much scarcer than the comparatively small amount of memory the object uses. Typically, such objects have a finalize() method that releases the non-memory resource, but the problem is that these finalizers will only run when the objects are garbage collected.
Since the objects are small, there may be plenty of available heap memory so that the garbage collector runs rarely. And in between runs of the garbage collector, the non-memory resources are not released and you may run out of them.
This may even cause problems with only a single object: for example, if you want to move a file between filesystems by opening it, opening the target file, copying the data and then deleting the original file, the delete will fail if the file is still open. And it almost certainly will be if you only set the reference to the input stream to null and don't call close() explicitly, because it's very unlikely that the garbage collector would have run at exactly the right point between the object becoming eligible for garbage collection and the call to delete().
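
To make the file-move example concrete, here is a minimal sketch where close() is guaranteed before the delete; the method name and buffer size are arbitrary:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

static void move(File src, File dst) throws IOException {
    try (InputStream in = new FileInputStream(src);
         OutputStream out = new FileOutputStream(dst)) {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    } // both streams are closed here, even on exception
    // The delete only succeeds because the input stream was really closed;
    // waiting for a finalizer to close it would make this fail most of the time.
    if (!src.delete()) {
        throw new IOException("Could not delete " + src);
    }
}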
Another important piece is on Java Automatic Memory Management, which touches on some of the essentials.
The question is better answered the other way around, in my view: "why don't I need to release memory manually?"
This raises the question, 'why do I need to release any resources?'
Fundamentally, your running program uses many forms of resources to execute and do work (CPU cycles, memory locations, disk access, etc.). Almost all of these suffer from "scarcity": there is a fixed pool of any such resource available, and if all of it is allocated, the OS can't satisfy requests; generally your program can't continue and dies very ungracefully, possibly making the whole system unstable. The only one that comes to mind that isn't scarce is CPU cycles: you can issue as many of these as you like, you're only limited by the rate at which you can issue them, and they aren't consumed in the same sense that memory or file handles are.
So, any resource you use (memory, file handles, database connexions, network sockets, etc.) comes from a fixed amount of such resource (avoiding the word 'pool') and as your program (and, bear-in-mind other programs, not to mention the OS itself) allocates these resources, the amount available decreases.
If a program requests and is allocated resources and never releases them to be used elsewhere, eventually (often soon) the system will run out of such resources. At which point, either the system halts, or sometimes the offending program can be killed abruptly.
Pre-90s, resource management (at least in mainstream development) was a problem that every programmer had to deal with explicitly. Some resource allocation management isn't too hard, mostly because the allocation is already abstracted (e.g. file handles or network sockets) and one can obtain the resource, use it and explicitly release it when it's no longer wanted.
However, managing memory is very hard, particularly as memory allocation cannot (in non-trivial situations) be calculated at design time, whereas, say, database connexions can feasibly be managed this way. (There's no way of knowing how much memory you will use, and it's very difficult or impossible to know when an allocation of memory is no longer in use.) Also, memory allocations tend to hang around for a while, whereas most other resource allocations are limited to a narrow scope: often within a single try-block or method, at most usually a class. Therefore, vendors developed methods of abstracting memory allocation and bringing it under a single management system, handled by the executing environment, not the program.
This is the difference between managed environments (e.g. Java, .NET) and unmanaged ones (e.g. C, C++ run directly through the OS). In C/C++ memory allocation is done explicitly (with malloc()/new and associated reallocation), which leads to all sorts of problems- how much do I need? How do I calculate when I need more/less? How do I release memory? How do I make sure I'm not using memory that's already been released? How do I detect and manage situations where a memory allocation request fails? How do I avoid writing over memory (perhaps not even my own memory)? All this is extremely difficult and leads to memory leaks, core dumps and all sorts of semi-random, unreproducible errors.
So, Java implements automatic memory management. The programmer simply allocates a new object and is neither interested, nor should be, in what or where memory is allocated (this is also why there isn't much in the way of pointers in managed environments):
Object thing = new Object();
and that's all that needs to be done. The JVM will keep track of what memory is available, when it needs allocating, when it can be released (as it's no longer in use), providing ways of dealing with out of memory situations as gracefully as possible (and limiting any problems to the executing thread/ JVM and not bringing down the entire OS).
Automatic memory management is the standard with most programming now, as memory management is by far the most difficult resource to manage (mainly as others are abstracted away to some extent already, database connection pools, socket abstractions etc).
So, to answer the question, yes, you need to manage all resources, but in Java you don't need to (and can't) explicitly manage memory yourself (though it's worth considering in some situations, e.g. designing a cache). This leaves all other resources that you do need to explicitly manage (and these are the non-memory resources, i.e. everything except object instantiation/destruction).
All these other resources are wrapped in a memory resource, clearly, but that's not the issue here. For instance, there is a finite number of database connexions you are allowed to open and a finite number of file handles you may create. You need to manage the allocation of these. The use of the finally block allows you to ensure resources are deallocated, even when an exception occurs.
e.g.
public void server()
{
    ServerSocket serverSocket = null;
    try
    {
        serverSocket = new ServerSocket(25);
        // ... use the socket ...
    }
    catch (IOException exception)
    {
        // Something went wrong.
    }
    finally
    {
        // Clear up and deallocate the unmanaged resource serverSocket here.
        // The close method internally ensures that the network socket is
        // actually flushed, closed and its network resources released.
        if (serverSocket != null)
        {
            try
            {
                serverSocket.close();
            }
            catch (IOException ignored)
            {
                // Nothing more can be done if closing fails.
            }
        }
        // The memory used by serverSocket will be released automatically by
        // the JVM once the object is no longer reachable; only the non-memory
        // resource (the listening socket) needed explicit release.
    }
}
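
For comparison, here is a sketch of the same method using try-with-resources (Java 7+), which closes the socket automatically:

public void server()
{
    try (ServerSocket serverSocket = new ServerSocket(25))
    {
        // ... accept and handle connections here ...
    }
    catch (IOException exception)
    {
        // Something went wrong; serverSocket has already been closed.
    }
}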

Java MySQL JDBC Memory Leak

Ok, so I have this program with many (~300) threads, each of which communicates with a central database. I create a global connection to the DB, and then each thread goes about its business creating statements and executing them.
Somewhere along the way, I have a massive memory leak. After analyzing the heap dump, I see that the com.mysql.jdbc.JDBC4Connection object is 70 MB, because it has 800,000 items in "openStatements" (a hash map). Somewhere it's not properly closing the statements that I create, but I cannot for the life of me figure out where (every single time I open one, I close it as well). Any ideas why this might be occurring?
I had exactly the same problem. I needed to keep 1 connection active for 3 threads, and at the same time every thread had to execute a lot of statements (on the order of 100k). I was very careful and closed every statement and every result set using a try...finally pattern. This way, even if the code failed in some way, the statement and the result set were always closed. After running the code for 8 hours I was surprised to find that the required memory went from the initial 35MB to 500MB. I generated a dump of the memory and analyzed it with the MAT analyzer from Eclipse. It turned out that one com.mysql.jdbc.JDBC4Connection object was taking 445MB of memory, keeping alive some openStatements objects which in turn kept alive around 135k hash map entries, probably from all the result sets. So it seems that even if you close all your statements and result sets, if you do not close the connection, it keeps references to them and the garbage collector can't free the resources.
My solution: after a long search I found this statement from the guys at MySQL:
"A quick test is to add "dontTrackOpenResources=true" to your JDBC URL. If the memory leak goes away, some code path in your application isn't closing statements and result sets."
Here is the link: http://bugs.mysql.com/bug.php?id=5022. So I tried that, and guess what? After 8 hours I was at around 40MB of required memory, for the same database operations.
Maybe a connection pool would be advisable, but if that's not an option, this is the next best thing I came across.
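
For reference, a sketch of where that property goes; the host, schema and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

static Connection openConnection() throws SQLException {
    // The dontTrackOpenResources property comes from the MySQL bug report above.
    String url = "jdbc:mysql://localhost:3306/mydb?dontTrackOpenResources=true";
    return DriverManager.getConnection(url, "user", "password");
}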
Note that unless MySQL says otherwise, JDBC Connections are NOT thread-safe. You CANNOT share them across threads unless you use a connection pool. In addition, as pointed out, you should use try/finally to guarantee that all statements, result sets, and connections are closed.
Once upon a time, whenever my code saw "server went away," it opened a new DB connection. If the error happened in the right (wrong!) place, I was left with some non-free()d orphan memory hanging around. Could something like this account for what you are seeing? How are you handling errors?
Without seeing your code (which I'm sure is massive), you should really consider some sort of more formal connection pooling mechanism, such as the Apache Commons Pool framework, Spring's JDBC framework, or others. IMHO, this is a much simpler approach, since someone else has already figured out how to effectively manage these types of situations. A pooled sketch follows below.
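
As an illustration only (the answer names Apache Commons Pool and Spring; this sketch uses HikariCP instead, with placeholder connection details), each worker borrows its own pooled connection rather than sharing one global Connection:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PooledWorkers {
    public static void main(String[] args) throws InterruptedException {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder
        config.setUsername("user");                            // placeholder
        config.setPassword("password");                        // placeholder
        config.setMaximumPoolSize(30); // far fewer physical connections than tasks

        HikariDataSource pool = new HikariDataSource(config);
        ExecutorService threads = Executors.newFixedThreadPool(300);
        for (int i = 0; i < 300; i++) {
            threads.submit(() -> {
                // Each task borrows a connection and returns it on close().
                try (Connection con = pool.getConnection();
                     PreparedStatement ps = con.prepareStatement("SELECT 1");
                     ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) { /* consume */ }
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            });
        }
        threads.shutdown();
        threads.awaitTermination(1, TimeUnit.MINUTES);
        pool.close();
    }
}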
