What is a nonmemory resource? - java

I am reading "Effective Java".
In the discussion about finalize, the author says:
C++ destructors are also used to reclaim other nonmemory resources.
In Java, the try-finally block is generally used for this purpose.
What are nonmemory resources?
Is a database connection a nonmemory resource? Doesn't the object for holding the database connection occupy some memory?

Database connections, network connections, file handles, mutexes, etc. Something which needs to be released (not just garbage-collected) when you're finished with it.
Yes, these objects typically occupy some memory, but the critical point is that they also have (possibly exclusive) access to some resource in addition to memory.

Is a database connection a non memory resource?
Yes, that's one of the most common examples. Others are file handles, native GUI objects (e.g. Swing or AWT windows) and sockets.
Doesn't the Object for holding the database connection occupy some memory?
Yes, but the point is that the non-memory part of the resource needs to be released as well, and it is typically much scarcer than the comparatively small amount of memory the object uses. Typically, such objects have a finalize() method that releases the non-memory resource, but the problem is that these finalizers only run when the objects are garbage collected.
Since the objects are small, there may be plenty of available heap memory so that the garbage collector runs rarely. And in between runs of the garbage collector, the non-memory resources are not released and you may run out of them.
This may even cause problems with only a single object. For example, suppose you want to move a file between filesystems by opening it, opening the target file, copying the data, and then deleting the original file. The delete will fail if the file is still open - and it is almost certain to be if you only set the reference to the input stream to null and don't call close() explicitly, because it's very unlikely that the garbage collector would have run at exactly the right point between the object becoming eligible for garbage collection and the call to delete().
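A minimal sketch of doing this correctly (source.dat and target.dat are hypothetical file names; try-with-resources is just compiler-generated try-finally): both streams are closed deterministically before the delete, so the delete can succeed.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MoveFile
{
    public static void main(String[] args) throws IOException
    {
        Path source = Paths.get("source.dat"); // hypothetical paths
        Path target = Paths.get("target.dat");

        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target))
        {
            in.transferTo(out); // copy the data (Java 9+)
        } // both file handles are released here, deterministically

        // Safe now: the source file is no longer open, so the delete succeeds.
        Files.delete(source);
    }
}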

Another important piece on Java Automatic Memory Management, which touches on some of the essentials.

The question is better answered the other way around, in my view: 'why don't I need to release memory manually?'
This raises the question, 'why do I need to release any resources?'
Fundamentally, your running program uses many forms of resources to execute and do work (CPU cycles, memory locations, disk access, etc.). Almost all of these suffer from 'scarcity': there is a fixed pool of any such resource available, and if all of it is allocated, the OS can't satisfy requests; generally your program can't continue and dies very ungracefully, possibly making the whole system unstable. The only one that comes to mind that isn't scarce is CPU cycles: you can issue as many of these as you like, you're only limited by the rate at which you can issue them; they aren't consumed in the same sense that memory or file handles are.
So, any resource you use (memory, file handles, database connexions, network sockets, etc.) comes from a fixed amount of such resource (avoiding the word 'pool'), and as your program (and, bear in mind, other programs, not to mention the OS itself) allocates these resources, the amount available decreases.
If a program requests and is allocated resources and never releases them to be used elsewhere, eventually (often soon) the system will run out of such resources. At which point, either the system halts, or sometimes the offending program can be killed abruptly.
Pre-90s, resource management (at least in mainstream development) was a problem that every programmer had to deal with explicitly. Some resource allocation management isn't too hard, mostly because the allocation is already abstracted (e.g. file handles or network sockets) and one can obtain the resource, use it and explicitly release it when it's no longer wanted.
However, managing memory is very hard, particularly as memory allocation cannot (in non-trivial situations) be calculated at design-time, whereas, say, database connexions can feasibly be managed this way. (There's no way of knowing how much memory you will use, and it's very difficult or impossible to know when an allocation of memory is no longer in use.) Also, memory allocations tend to hang around for a while, whereas most other resource allocations are limited to a narrow scope: often within a single try-block or method, at most usually a class. Therefore, vendors developed methods of abstracting memory allocation and bringing it under a single management system, handled by the executing environment, not the program.
This is the difference between managed environments (e.g. Java, .NET) and unmanaged ones (e.g. C, C++ run directly through the OS). In C/C++ memory allocation is done explicitly (with malloc()/new and associated reallocation), which leads to all sorts of problems- how much do I need? How do I calculate when I need more/less? How do I release memory? How do I make sure I'm not using memory that's already been released? How do I detect and manage situations where a memory allocation request fails? How do I avoid writing over memory (perhaps not even my own memory)? All this is extremely difficult and leads to memory leaks, core dumps and all sorts of semi-random, unreproducible errors.
So, Java implements automatic memory management. The programmer simply allocates a new object and neither is, nor should be, interested in what or where memory is allocated (this is also why there isn't much in the way of pointers in managed environments):
Object thing = new Object();
and that's all that needs to be done. The JVM will keep track of what memory is available, when it needs allocating, and when it can be released (as it's no longer in use), providing ways of dealing with out-of-memory situations as gracefully as possible (and limiting any problems to the executing thread/JVM rather than bringing down the entire OS).
Automatic memory management is the standard with most programming now, as memory management is by far the most difficult resource to manage (mainly as others are abstracted away to some extent already, database connection pools, socket abstractions etc).
So, to answer the question, yes, you need to manage all resources, but in Java you don't need to (and can't) explicitly manage memory yourself (though it's worth considering in some situations, e.g. designing a cache). This leaves all other resources that you do need to explicitly manage (and these are the non-memory resources, i.e. everything except object instantiation/destruction).
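As a sketch of the cache aside above (a hypothetical SoftCache class, not part of the original answer): wrapping cached values in SoftReference lets the GC reclaim entries under memory pressure, instead of the cache pinning them in memory forever.
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// A memory-sensitive cache sketch: the GC may clear SoftReferences when
// memory runs low, so this cache cannot by itself exhaust the heap.
// (A production version would also purge stale map entries.)
class SoftCache<K, V>
{
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    void put(K key, V value)
    {
        map.put(key, new SoftReference<>(value));
    }

    V get(K key)
    {
        SoftReference<V> ref = map.get(key);
        return (ref == null) ? null : ref.get(); // null if the entry was collected
    }
}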
All these other resources (i.e. the non-memory ones) are wrapped in a memory resource, clearly, but that's not the issue here. For instance, there is a finite number of database connexions you are allowed to open and a finite number of file handles you may create. You need to manage the allocation of these. The use of the finally block allows you to ensure resources are deallocated, even when an exception occurs.
e.g.
public void server()
{
    // Requires: import java.io.IOException; import java.net.ServerSocket;
    // Declared outside the try block so that the finally block can see it.
    ServerSocket serverSocket = null;
    try
    {
        serverSocket = new ServerSocket(25);
        // ... use the socket ...
    }
    catch (IOException exception)
    {
        // Something went wrong.
    }
    finally
    {
        // Clear up and deallocate the unmanaged resource serverSocket here.
        // The close method will internally ensure that the network socket is
        // actually flushed, closed and network resources released.
        if (serverSocket != null)
        {
            try
            {
                serverSocket.close();
            }
            catch (IOException ignored)
            {
                // Nothing useful can be done if close() itself fails here.
            }
        }
        // The memory used by serverSocket will be released automatically by the
        // JVM once the object becomes unreachable; only the native socket needed
        // the explicit release above.
    }
}

Related

Out of memory errors in Java API that uses finalizers in order to free memory allocated by C calls

We have a Java API that is a wrapper around a C API.
As such, we end up with several Java classes that are wrappers around C++ classes.
These classes implement the finalize method in order to free the memory that has been allocated for them.
Generally, this works fine. However, in high-load scenarios we get out-of-memory exceptions.
Memory dumps indicate that virtually all the memory (around 6 GB in this case) is filled with the finalizer queue and the objects waiting to be finalized.
For comparison, the C API on its own never goes over around 150 MB of memory usage.
Under low load, the Java implementation can run indefinitely, so this doesn't seem to be a memory leak as such. It just seems to be that, under high load, new objects that require finalizing are generated faster than finalizers get executed.
Obviously, the 'correct' fix is to reduce the number of objects being created. However, that's a significant undertaking and will take a while. In the meantime, is there a mechanism that might help alleviate this issue? For example, by giving the GC more resources.
Java was designed around the idea that finalizers could be used as the primary cleanup mechanism for objects that go out of scope. Such an approach may have been almost workable when the total number of objects was small enough that the overhead of an "always scan everything" garbage collector would have been acceptable, but there are relatively few cases where finalization would be an appropriate cleanup measure in a system with a generational garbage collector (which nearly all JVM implementations are going to have, because it offers a huge speed boost compared to always scanning everything).
Using Closeable along with the try-with-resources construct is a vastly superior approach whenever it's workable. There is no guarantee that finalize methods will get called with any degree of timeliness, and there are many situations where patterns of interrelated objects may prevent them from getting called at all. While finalize can be useful for some purposes, such as identifying objects which got improperly abandoned while holding resources, there are relatively few purposes for which it would be the proper tool.
If you do need to use finalizers, you should understand an important principle: contrary to popular belief, finalizers do not trigger when an object is actually garbage collected - they fire when an object would have been garbage collected but for the existence of a finalizer somewhere (including, but not limited to, the object's own finalizer). No object can actually be garbage collected while any reference to it exists in any local variable, in any other object to which any reference exists, or in any object with a finalizer that hasn't run to completion. Further, to avoid having to examine all objects on every garbage-collection cycle, objects which have been alive for a while will be given a "free pass" on most GC cycles. Thus, if an object with a finalizer is alive for a while before it is abandoned, it may take quite a while for its finalizer to run, and it will keep objects to which it holds references around long enough that they're likely to also earn a "free pass".
I would thus suggest that, to the extent possible, even when it's necessary to use finalizers, you should limit their use to privately-held objects which in turn avoid holding strong references to anything which isn't explicitly needed for their cleanup task.
Phantom references are an alternative to finalizers available in Java.
Phantom references allow you to better control the resource reclamation process:
you can combine explicit resource disposal (e.g. the try-with-resources construct) with GC-based disposal
you can employ multiple threads for postmortem housekeeping
Using phantom references is complicated, though. In this article you can find a minimal example of phantom-reference-based resource housekeeping.
In modern Java there is also the Cleaner class, which is based on phantom references too, but provides the infrastructure (reference queue, worker threads, etc.) for ease of use.
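A minimal sketch of the Cleaner approach (the long handle and the freeNative call are hypothetical stand-ins for a real native resource): close() disposes deterministically, and the Cleaner acts only as a GC-driven safety net.
import java.lang.ref.Cleaner;

public class NativeResource implements AutoCloseable
{
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup state must not reference the NativeResource itself,
    // otherwise the resource would never become phantom reachable and
    // the cleaning action would never run.
    private static final class State implements Runnable
    {
        private final long handle; // hypothetical native handle

        State(long handle) { this.handle = handle; }

        @Override
        public void run()
        {
            freeNative(handle); // hypothetical native release call
        }
    }

    private final Cleaner.Cleanable cleanable;

    public NativeResource(long handle)
    {
        this.cleanable = CLEANER.register(this, new State(handle));
    }

    @Override
    public void close()
    {
        cleanable.clean(); // runs State.run() at most once, right now
    }

    private static void freeNative(long handle)
    {
        // Placeholder for the real native free.
    }
}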

Why is closing resources important when you have finalize()

Alright, so I know that you should always close your streams and other native resources, but I am currently not sure why.
I see one reason why: the amount of resources available is limited, and you want to release them as soon as you are done with them.
But let's assume that your application doesn't use that many resources; then there shouldn't be a need to close the resource, right?
Especially since you have the finalize() block that should close all native resources when the GC gets to it.
The assumption that “you have the finalize() block that should close all native resources when the GC gets to it” is wrong in the first place. There is no guarantee that every object representing a native resource has such a finalize() method.
Second, a resource isn’t necessarily a native resource.
When you have, e.g., a BufferedOutputStream, an ObjectOutputStream, or a ZipOutputStream wrapping a FileOutputStream, the latter likely has a finalize() method to free the underlying native resource (that’s implementation-dependent), but that finalize() doesn’t write any pending data of the wrapper stream, which is needed for the output to be correct. Closing the wrapper stream is mandatory to ensure that the written output is complete.
Naively adding a finalize() method to these wrapper stream classes to close them would not work. When the stream object gets collected, it implies that there is no reference to it; in other words, there is no directed application→wrapper stream→native stream graph anymore. Since object graphs can be cyclic, there is no guarantee that an expensive search for an order among dead objects would succeed, and that’s why the JVM doesn’t even try.
Or, as the specification puts it:
The Java programming language imposes no ordering on finalize method calls. Finalizers may be called in any order, or even concurrently.
Therefore, a finalize() method of a wrapper would not be guaranteed to be called before the finalize() method of the underlying native stream; thus, the underlying native stream might have been finalized and closed before the wrapper stream, making it impossible to write the pending data.
And the rabbit hole goes even deeper. Object instances are maintained by the JVM as needed by the code, but the code can get optimized to use encapsulated data directly. If the wrapper stream class had a finalize() method, you might find out that the wrapper instance can be freed even earlier than expected, as discussed in finalize() called on strongly reachable object in Java 8.
Or, in short, explicit closing is the only way to ensure that it happens exactly at the right point of time.
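To make the wrapper-stream point concrete, a minimal sketch (data.bin is a throwaway file name): try-with-resources closes the outermost wrapper, which flushes its buffered data into the FileOutputStream before the native file handle is released.
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class ExplicitClose
{
    public static void main(String[] args) throws IOException
    {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream("data.bin"))))
        {
            out.writeObject("some payload");
        } // out.close() runs here, flushing all buffers, even if writeObject throws
    }
}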
Simple: you always strive to do the right thing.
Building an application on assumptions such as "it doesn't use many resources" is a wrong approach. Instead: you focus on getting your logic right.
You see, the thing with real-world applications is: when they are helpful, they will be used. And as soon as you have users, you will be dealing with additional requirements. That results in you enhancing and maintaining your code. And as a result of that, any "compromise" that you made early on ("it's just a small thing, so who cares") has the potential to make such activities much harder than they ought to be.
In other words: you should strive to build applications that are "correct". You don't write sloppy code because it "doesn't matter". You very often cannot predict whether your code will become "important" at some point - and then your sins will come back to bite you.
And no, finalize isn't the answer to anything (see here). You care about all resources that your code is using, and you carefully make sure that resources have a well-defined life cycle.
Firstly, finalize is never guaranteed to be called by the GC, so you cannot rely on it.
Secondly, in a real world application the amount of resources needed by your application changes, hence you shouldn't rely on assumptions and you should free up as many resources as you can to provide the best possible performance.
In my opinion the key words are performance, availability, scalability.
As a Java developer, you have little control over when, or even if, finalizers are invoked. If your resources are in limited supply (database connections, for example), or create additional threads that have to be serviced, or hold locks, or use substantial memory, then you need to exercise control over when they are allocated and freed. It isn't always obvious what the implications are for keeping a resource allocated, and it's always safer to minimize the scope and duration of your resource usage, so far as the logic of the application allows.

How to get the statistics of the garbage collected objects?

Is it possible to see the Java objects (and their class type) that were made null and which are
not yet garbage collected/cleaned
garbage collected/cleaned?
This statistic would help to show how many objects are created repeatedly (by wrong logic) instead of being created once.
I think that it is theoretically possible, though frankly you would be crazy to try it.
The route to finding unreachable objects is to use the JVM Tool Interface (JVMTI) to iterate over all objects in the heap (reachable or unreachable) in order to find the ones you are looking for. Then you extract their state via JVMTI and (somehow) reify it so that you can display it.
Normally you would do this in a separate JVM; e.g. the one running your debugger or profiling tool. But it is possible for an application to attach an agent to itself, and use it to dig around in the JVM. However, this is not the intended usage for JVMTI, and I would anticipate that there could be "hazards" in doing this.
You can read more here:
Creating a Debugging and Profiling Agent with JVMTI
Own your heap: Iterate class instances with JVMTI
But please don't blame me if you go crazy trying to get this working.
UPDATE: I concur with Marko's note that you are unlikely to learn anything significant by looking at unreachable objects.
to display the unwanted or null java objects that are not cleaned by the java garbage process
This is not a well-defined concept; at least there are no useful definitions which would give you anything of relevance.
A piece of memory where an object was allocated can be considered free for all practical purposes as soon as that object has become unreachable. The amount of memory that the block represents is available to the JVM allocator in the sense that no out-of-memory event will happen due to that block being "overlooked" in some sense.
Further note that many "garbage collection" algorithms do the exact opposite: they find live objects and relocate them so that they occupy a contiguous block of memory. The algorithms are simply oblivious to "garbage" objects and treat them as just empty space.
So, even if you manage to write up some low-level Java Agent-based module which will enumerate all the objects on the heap, you will not gain any interesting insight: the unreachable objects which you encounter will just happen to linger on because the JVM has not yet felt the need to reuse their memory.

Dynamic Memory Handling Java vs C++

I am a C++ programmer currently trying to work in Java. Working in C++, I have a habit of keeping track of dynamic memory allocations and employing techniques like RAII to avoid memory leaks. Java, as we know, provides a garbage collector (GC) to take care of memory leaks. So while programming in Java, should one just let go of all worries about heap memory and leave it to the GC, or should one take an approach similar to programming in languages without GC: take care of the memory you allocate and just let the GC take care of what you might miss? What should the approach be? What are the downsides of either?
I'm not sure what you mean by trying to take care of the memory you allocate in presence of a GC, but I'll try some mind reading.
Basically, you shouldn't "worry" about your memory being collected. If you don't reference objects anymore, they will be picked up. Logical memory leaks are still possible if you create a situation where objects are referenced for the rest of your program (example: register listeners and never un-register them, example: implementing a vector-like collection that doesn't set items to null when removing items off the end).
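As a sketch of the second example (a hypothetical LeakyStack, not from the original answer): a home-grown stack that never nulls out popped slots keeps every popped object strongly reachable through its backing array.
import java.util.Arrays;

// A vector-like collection with a logical memory leak.
class LeakyStack
{
    private Object[] elements = new Object[16];
    private int size;

    void push(Object e)
    {
        if (size == elements.length)
        {
            elements = Arrays.copyOf(elements, 2 * size);
        }
        elements[size++] = e;
    }

    Object pop()
    {
        Object result = elements[--size];
        // Leak: elements[size] still references the popped object, so the
        // GC cannot reclaim it. The fix is one line: elements[size] = null;
        return result;
    }
}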
However, if you have a strong RAII background, you'll be disappointed to learn that there is no direct equivalent in Java. The GC is a first-class tool for dealing with memory, but there is no guarantee of when (or even if) finalizers are called. This means that the first-class treatment applied to memory is not applied to any other resource, such as: windows, database connections, sockets, files, synchronization primitives, etc.
With Java and .NET (and, I imagine, most other GC'ed languages), you don't need to worry much about heap memory at all. What you do need to worry about is native resources like file handles, sockets, and GUI primitives like fonts and such. Those generally need to be "disposed", which releases the native resources. (They often dispose of themselves on finalization anyway, but it's kinda iffy to fall back on that. Dispose of stuff yourself.)
With a GC, you have to:
still take care to properly release non-memory resources like file handles and DB connections.
make sure you don't keep references to objects you don't need anymore (like keeping them in collections). Because if you do that, you have a memory leak.
Apart from that, you can't really "take care of the memory you allocate", and trying to do so would be a waste of time.
Technically you don't need to worry about cleaning up after memory allocations, since all objects are tracked by the GC, which will take care of everything. In practice, an overactive GC will negatively impact performance. So while Java does not have a delete operator, you will do well to reuse objects as much as possible.
Also, Java does not have destructors, since objects exist until the GC gets to them. Java therefore has the finally construct, which you should use to ensure that all non-memory resources (files, sockets, etc.) are closed when you are finished with them. Do not rely on the finalize method to do this for you.
In Java the GC takes care of allocating memory and freeing unused memory. This does not mean you can disregard the issue altogether.
The Java GC frees objects that are not reachable from the roots. This means that Java can still have memory leaks if you are not careful to remove references from global contexts, like caches in global HashMaps, etc.
If a cluster of objects that reference each other is not referenced from the roots, the Java GC will free all of them; i.e. it does not work with reference counts, so you do not need to null all object references (although some coding styles do prefer clearing references as soon as they are not needed anymore), as the sketch below shows.
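A small sketch of that tracing behaviour (hypothetical classes; note that System.gc() is only a hint):
class Node
{
    Node partner;
}

public class CycleDemo
{
    public static void main(String[] args)
    {
        Node a = new Node();
        Node b = new Node();
        a.partner = b;
        b.partner = a;

        a = null;
        b = null;
        // The a<->b cycle is now unreachable from any root and is eligible
        // for collection, even though each object still references the other.
        // A reference-counting scheme could never reclaim this cycle.
        System.gc(); // merely a hint; timing is entirely up to the JVM
    }
}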

What does flushing thread local memory to global memory mean?

I am aware that the purpose of volatile variables in Java is that writes to such variables are immediately visible to other threads. I am also aware that one of the effects of a synchronized block is to flush thread-local memory to global memory.
I have never fully understood the references to 'thread-local' memory in this context. I understand that data which only exists on the stack is thread-local, but when talking about objects on the heap my understanding becomes hazy.
I was hoping to get comments on the following points:
1. When executing on a machine with multiple processors, does flushing thread-local memory simply refer to the flushing of the CPU cache into RAM?
2. When executing on a uniprocessor machine, does this mean anything at all?
3. If it is possible for the heap to have the same variable at two different memory locations (each accessed by a different thread), under what circumstances would this arise? What implications does this have for garbage collection? How aggressively do VMs do this kind of thing?
4. (EDIT: adding question 4) What data is flushed when exiting a synchronized block? Is it everything that the thread has locally? Is it only writes that were made inside the synchronized block?
Object x = goGetXFromHeap(); // x.f is 1 here
Object y = goGetYFromHeap(); // y.f is 11 here
Object z = goGetZFromHeap(); // z.f is 111 here
y.f = 12;
synchronized (x)
{
    x.f = 2;
    z.f = 112;
}
// will only x be flushed on exit of the block?
// will the update to y get flushed?
// will the update to z get flushed?
Overall, I think I am trying to understand whether thread-local means memory that is physically accessible by only one CPU, or whether there is logical thread-local heap partitioning done by the VM.
Any links to presentations or documentation would be immensely helpful. I have spent time researching this, and although I have found lots of nice literature, I haven't been able to satisfy my curiosity regarding the different situations & definitions of thread-local memory.
Thanks very much.
The flush you are talking about is known as a "memory barrier". It means that the CPU makes sure that what it sees of the RAM is also viewable by other CPUs/cores. It implies two things:
The JIT compiler flushes the CPU registers. Normally, the code may keep a copy of some globally visible data (e.g. instance field contents) in CPU registers. Registers cannot be seen by other threads. Thus, half the work of synchronized is to make sure that no such cache is maintained.
The synchronized implementation also performs a memory barrier to make sure that all the changes to RAM from the current core are propagated to main RAM (or that at least all other cores are aware that this core has the latest values -- cache coherency protocols can be quite complex).
The second job is trivial on uniprocessor systems (I mean, systems with a single CPU which has a single core), but uniprocessor systems tend to become rarer nowadays.
As for thread-local heaps, this can theoretically be done, but it is usually not worth the effort because nothing indicates which parts of the memory are to be flushed with a synchronized. This is a limitation of the threads-with-shared-memory model: all memory is supposed to be shared. At the first encountered synchronized, the JVM should then flush all its "thread-local heap objects" to the main RAM.
Yet recent JVMs from Sun can perform "escape analysis", in which a JVM succeeds in proving that some instances never become visible to other threads. This is typical of, for instance, the StringBuilder instances created by javac to handle concatenation of strings. If the instance is never passed as a parameter to other methods, then it does not become "globally visible". This makes it eligible for thread-local heap allocation, or even, under the right circumstances, for stack-based allocation. Note that in this situation there is no duplication; the instance is not in "two places at the same time". It is only that the JVM can keep the instance in a private place which does not incur the cost of a memory barrier.
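A tiny sketch of code that qualifies (a hypothetical method, purely illustrative): sb never escapes the method, so the JIT may allocate it thread-locally or scalar-replace it, and no memory barrier is needed.
static String greet(String name)
{
    StringBuilder sb = new StringBuilder(); // never visible to another thread
    sb.append("Hello, ");
    sb.append(name);
    return sb.toString(); // only the resulting String escapes the method
}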
It is really an implementation detail whether the current content of the memory of an object that is not synchronized is visible to another thread.
Certainly, there are limits, in that all memory is not kept in duplicate, and not all instructions are reordered, but the point is that the underlying JVM has the option if it finds it to be a more optimized way to do that.
The thing is that the heap is really "properly" stored in main memory, but accessing main memory is slow compared to accessing the CPU's cache or keeping the value in a register inside the CPU. By requiring that the value be written out to memory (which is what synchronization does, at least when the lock is released), it forces the write to main memory. If the JVM is free to ignore that, it can gain performance.
In terms of what will happen on a one-CPU system: multiple threads could still keep values in a cache or register, even while another thread executes. There is no scenario in which a value is guaranteed to be visible to another thread without synchronization, although it is obviously more likely to be seen. Outside of mobile devices, of course, the single CPU is going the way of floppy disks, so this is not going to be a very relevant consideration for long.
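As a minimal sketch of that visibility hazard (a hypothetical demo class): remove the volatile and the worker may spin forever on some JVMs, because it can keep running cached in a register.
public class VisibilityDemo
{
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException
    {
        Thread worker = new Thread(() -> {
            while (running)
            {
                // busy-wait; each volatile read forces a fresh load
            }
            System.out.println("Worker observed the update and stopped.");
        });
        worker.start();

        Thread.sleep(100);
        running = false; // volatile write: guaranteed visible to the worker
        worker.join();
    }
}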
For more reading, I recommend Java Concurrency in Practice. It is really a great practical book on the subject.
It's not as simple as CPU-Cache-RAM. That's all wrapped up in the JVM and the JIT and they add their own behaviors.
Take a look at The "Double-Checked Locking is Broken" Declaration. It's a treatise on why double-checked locking doesn't work, but it also explains some of the nuances of Java's memory model.
One excellent document for highlighting the kinds of problems involved, is the PDF from the JavaOne 2009 Technical Session
This Is Not Your Father's Von Neumann Machine: How Modern Architecture Impacts Your Java Apps
By Cliff Click, Azul Systems; Brian Goetz, Sun Microsystems, Inc.
