Ok so given the following steps taken from Cpp
Use jni to make a dictionary
Make the jobject of the dictionary returned into a globalref
Use jni to call a Java method that returns an object (we will call this object *A*)
Add *A* to the dictionary WITHOUT making the ref of *A* global
What is the lifespan of the *A*?
My expectations are as follows. The dictionary itself is global and so is protected from garbage collection, when I call the 'Add' method from jni *A* is passed 'back into java' and then the dictionary will hold a new reference to it, protecting it too from garbage collection. So I expect *A* to last as long as the dictionary (ignoring outside meddling).
Am I on the right track here? Thanks.
when I call the 'Add' method from jni A is passed 'back into java' and then the dictionary will hold a new reference to it, protecting it too from garbage collection.
No. You are conflating two properties of local and global references. A local reference is only valid for the duration of the JNI call it is created in. After that, it's invalid. If you want to use it again in a subsequent JNI call, make it a GlobalRef.
So I expect A to last as long as the dictionary (ignoring outside meddling).
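Yes, the object lasts as long as the dictionary holds it, but the local reference itself becomes invalid once the native method returns, so you can't use that reference afterwards anyway.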
.NET has a function called GC.KeepAlive(Object). Its sole purpose is to ensure the lifetime of the referenced object lasts until code flow reaches the call.
This is normally not necessary unless one is interoperating with native code.
I have a situation where I've got a graph of C++ objects accessed via JNI, where certain root objects need to be kept alive to keep the children alive. Both the root objects and the child objects have mirrors in JVM land. If the root object is collected and freed on the C++ side (by way of a SWIG-generated finalizer), however, the child objects will become invalid, since their C++ backing object will have been freed.
This can be solved by ensuring the local variables that root the object graph have a lifetime that exceeds the last use of a child object. So I need an idiomatic function that does nothing to an object, yet won't be optimized away or moved (e.g. hoisted out of a loop). That's what GC.KeepAlive(Object) does in .NET.
What is the approximate equivalent in Java?
PS: some possibly illustrative code:
class Parent {
long ptr;
void finalize() { free(ptr); }
Child getChild() { return new Child(expensive_operation(ptr)); }
}
class Child {
long ptr;
void doStuff() { do_stuff(ptr); }
}
// BAD CODE with potential for SIGSEGV
for (Parent p : getParents()) {
p.getChild().doStuff();
}
The trouble is that the GC freeing Parent p will free the memory allocated for Child while doStuff is executing. GC has been observed to do this in practice. A potential fix if GC.KeepAlive was available:
// Hypothetical fix, if something like GC.KeepAlive existed in Java
for (Parent p : getParents()) {
p.getChild().doStuff();
GC.KeepAlive(p);
}
I could e.g. call toString on p, but I won't be doing anything with its output. I could poke p into an array temporarily, but how do I know the JVM won't discard the store? Etc.
I guess you could use JMH Blackhole for this. It was designed for ensuring that the reference doesn't get eliminated in benchmarks so it should work.
Basically it just compares the given object reference against a stored volatile reference and reassigns the latter with some small and decreasing probability (storing is expensive so it gets minimized).
The garbage collector may be aggressive enough to reclaim an object while one of its native methods is still being invoked, and few people in the Java world seem to care, so either the problem rarely shows up or there is a lot of buggy code around. Either way, this other SO answer seems to provide a reasonable alternative to GC.KeepAlive(Object): use non-static native JNI methods, which should reasonably prevent garbage collection of the instance the methods are invoked on.
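For what it's worth, Java 9 and later also provide java.lang.ref.Reference.reachabilityFence(Object), which is documented as keeping the referenced object strongly reachable at least until the call. Below is a minimal sketch of how the loop from the question might use it. The Parent and Child classes here are hypothetical stand-ins: an AtomicBoolean plays the role of the C++ allocation, and no real JNI is involved, so this only illustrates where the fence goes.

```java
import java.lang.ref.Reference;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class KeepAliveSketch {
    // Hypothetical wrapper that owns "native" memory.
    // The AtomicBoolean simulates the C++ allocation being freed.
    static final class Parent {
        final AtomicBoolean nativeFreed = new AtomicBoolean(false);
        Child getChild() { return new Child(nativeFreed); }
    }

    // Hypothetical child that borrows the parent's memory without
    // holding a Java reference to the parent itself.
    static final class Child {
        private final AtomicBoolean parentMemoryFreed;
        Child(AtomicBoolean parentMemoryFreed) {
            this.parentMemoryFreed = parentMemoryFreed;
        }
        void doStuff() {
            if (parentMemoryFreed.get()) {
                throw new IllegalStateException("use after free");
            }
        }
    }

    // Calls doStuff() on each parent's child and returns how many calls completed.
    static int processAll(List<Parent> parents) {
        int done = 0;
        for (Parent p : parents) {
            try {
                p.getChild().doStuff();
                done++;
            } finally {
                // Keeps p strongly reachable until this point, so a finalizer
                // or Cleaner registered for p cannot run during doStuff().
                Reference.reachabilityFence(p);
            }
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of(new Parent(), new Parent())));
    }
}
```

The try/finally shape follows the pattern suggested in the method's own documentation, so that the fence is reached even if doStuff() throws.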
OK, let me see if I can explain.
I have some code that wraps a Java iterator (from Hadoop, as it happens) in a Scala Stream, so that it potentially can be read more than once, by client code that I have no direct control over. The last thing that gets done with this Stream is a reduce() operation. Stream remembers all the items that it's already seen. Unfortunately, in some circumstances the iterator will be extremely large, so that storing all the items in it will lead to out-of-memory errors. However, in general, the situations where the client code needs the multiple-iteration facility are not the same ones with the memory-busting Iterators, and if such cases do exist, that's not my problem.
What I want to ensure is that I can provide the memoizing capability for code that needs it, but not for code that doesn't need it (in particular, for code that never looks at the Stream at all).
The code for reduce() in Stream says that it's written in a way to allow for GC of the already-visited parts of the Stream to happen while reducing. So if I can make sure this actually happens, I'll be fine. But in practice how can I make sure that this happens? In particular, if function A creates and passes the stream to function B, and function B passes the stream to function C, and function C then calls reduce(), then what about the references to the stream still in functions A, B and C? In all these cases, there will be no further use of the stream in any of the three functions, although the calls aren't necessarily tail-recursive. Is the JVM smart enough to ensure that its reference count is 0 from functions A, B and C at the time that reduce() is called, so that the GC can happen? Essentially this means that the JVM notices in function A that the last thing it does with the item is call function B, so it eliminates its own handle at the same time it calls B, and likewise for B to C, and C to reduce().
If this works properly, does it also work if A, B or C has a local variable holding onto the item? (Which, again, won't be used, afterwards.) That's because it's rather more tricky to code this properly without using local vars.
A variable which is in scope but which will never be read from is dead. A JVM is free to ignore dead variables for the purposes of garbage collection; an object which is only pointed to by dead variables is unreachable, and may be collected. The relevant bit of the JLS is, obscurely enough, §12.6.1 Implementing Finalization, which says:
A reachable object is any object that can be accessed in any potential continuing computation from any live thread.
And explains that:
Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.
Another example of this occurs if the values in an object's fields are stored in registers. The program may then access the registers instead of the object, and never access the object again. This would imply that the object is garbage. Note that this sort of optimization is only allowed if references are on the stack, not stored in the heap.
If your method A has only dead variables referring to the stream, then it won't obstruct its collection.
Note, however, that this applies only to local variables: if you have fields which refer to the stream (including closed-over local variables from a method enclosing a nested class), then it doesn't apply; I don't think the JVM is allowed to treat these as dead. In other words, here:
public Callable<String> foo(final Object o) {
return new Callable<String>() {
public String call() throws InterruptedException {
String s = o.toString();
Thread.sleep(1000000);
return s;
}
};
}
The object o cannot be collected until the anonymous Callable is collected, even though it is never used after the toString call, because there is a synthetic field referring to it in the Callable.
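If only a value derived from o is needed later, one possible way around this is to compute that value before creating the Callable, so that the anonymous class captures the String rather than o. This is only a sketch: it changes the behavior slightly (toString runs eagerly rather than inside call(), and the sleep is dropped), and the class name and the callNow helper are made up for illustration.

```java
import java.util.concurrent.Callable;

class CaptureSketch {
    // Captures only the String s, so the Callable gets no synthetic field
    // pointing at o; o is then no longer kept alive by the Callable.
    static Callable<String> foo(final Object o) {
        final String s = o.toString(); // evaluated eagerly, unlike the original
        return new Callable<String>() {
            public String call() {
                return s;
            }
        };
    }

    // Hypothetical helper that invokes the Callable and wraps checked exceptions.
    static String callNow(Object o) {
        try {
            return foo(o).call();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callNow(Integer.valueOf(42)));
    }
}
```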
Global Reference in JNI is said to be a reference that has to be manually freed by the programmer. It has nothing to do with the c context. so a code like :
{
jclass clsStr = (*env)->NewGlobalRef(env,cls);
}
return clsStr;
will give an error saying that clsStr is undefined / undeclared. I understand this. But what I don't understand is the use of this type of reference.
What is global about clsStr in the above code ? How this variable can be useful in future or after the call returns ? I also read that "Basically, the global reference is useful for ensuring that it (and its contents) will survive to the next JNI invocation" but i don't understand this.
It means that you're allowed to hold on to the reference you get from NewGlobalRef() across multiple calls to the native method. The reference will remain valid until you explicitly call DeleteGlobalRef().
This is in contrast to local references:
A local reference is valid only within the dynamic context of the native method that creates it, and only within that one invocation of the native method. All local references created during the execution of a native method will be freed once the native method returns.
If you store a global reference in a variable that's allowed to go out of scope before you call DeleteGlobalRef(), you leak memory. The following is an example of that:
{
jclass clsStr = (*env)->NewGlobalRef(env,cls);
}
Global Reference in JNI is said to be a reference that has to be
manually freed by the programmer. It has nothing to do with the c
context.
No it isn't. That is a terrible misquote from the JNI Specification. Here's what it really says:
The JNI divides object references used by the native code into two
categories: local and global references. Local references are valid
for the duration of a native method call, and are automatically freed
after the native method returns. Global references remain valid until
they are explicitly freed.
Nothing in JNI can alter the semantics of the C programming language.
There is no Automatic Garbage Collection in C/C++.
Assume that I wrote a simple program in C/C++ and created a single object.
Assume that there are exactly 10 or extremely limited number of addresses for allocation.
I have a for loop running for 100 times inside which this single object is created every time the loop is run.
In Java, because there is Automatic Garbage collection, after a single loop is executed the address of the object is removed each time automatically.
Rough Example:
for(int i = 0; i < 100; i++)
{
Object o = new Object();
}
In C/C++ we have to manually remove the Object address inside the for loop. Do we have to reboot each time as well to properly remove the Object reference in C++?
For those who say that there are no problems deleting an object more than once in C++, from Wikipedia:
When an object is deleted more than
once, or when the programmer attempts
to release a pointer to an object not
allocated from the free store,
catastrophic failure of the dynamic
memory management system can result.
The result of such actions can include
heap corruption, premature destruction
of a different (and newly created)
object which happens to occupy the
same location in memory as the
multiply deleted object, and other
forms of undefined behavior.
The link: Manual Memory Management
So there IS a risk?
C++ doesn't have (built-in) garbage collection, but that doesn't mean you should let your dynamically allocated memory stay allocated forever. You can and should free it. The most basic way to do that would be to free your object reference (or pointer, since this is C++) manually when you don't need it anymore:
for(int i = 0; i < 100; i++)
{
// Dynamically allocate an object
Object* o = new Object();
// do something with object
// Release object memory (and call object destructor, if there is one)
delete o;
}
If you allocated your object on the stack however, it always gets automatically released when it goes out of scope, and you don't have to wait for garbage collection to happen - it's always released immediately:
for(int i = 0; i < 100; i++)
{
// Create a new object on the stack
Object o = Object(); // Note we're not using the new keyword here.
// Do something with object
// Object gets automatically deallocated, or more accurately, in this specific
// case (a loop), the compiler will optimize things, so object's destructor
// will get called, but the object's stack memory will be reused.
}
This behavior of C++ stack values (automatic destruction when they go out of scope), which is awkwardly termed RAII (Resource Acquisition Is Initialization) allows for very nice things that Java just can't do. One of them is smart pointers, that allow dynamically allocated objects to be freed automatically just like their stack counterparts, and you can use sophisticated smart pointers to implement your own version of garbage collection if you want.
Another advantage of RAII is that finally-blocks are rarely necessary in C++: local variables that refer to a resource that should be freed immediately are usually allocated on the stack, and therefore get released automatically. No need for finally blocks here.
Practically speaking, when programming in C++ you'd usually put local variables on the stack (and get all the advantages of RAII without lifting a finger), unless you need to keep them alive for longer than that (e.g. you need to create an object and store a reference to it that stays alive when you leave the function that created the object). In that case, you can use pointers directly, but if you don't want to deal with manually deleting pointers and all the problems it can lead to, you'd usually use a smart pointer. A smart pointer is an object allocated on the stack, that wraps a 'dumb' (i.e. regular) pointer, and deletes the pointer when its destructor gets called. More advanced versions of smart pointers can implement reference counting (so the pointed object will be released only when all smart pointers referencing it go out of scope) and even garbage collection. The standard library comes with smart pointers such as auto_ptr (deprecated in C++11 in favor of unique_ptr) and shared_ptr (the latter is reference-counting, but some old compilers may not support it).
Java doesn't automatically remove the object at the end of each loop. Instead it waits until there is a lot of garbage and then goes through and collects it. Java makes no guarantees about how long it will be before the object is collected.
In C++, once the program has exited all resources are returned. You don't need to reboot (assuming a reasonable operating system) in order to make sure that resources are returned. Your operating system will take care of making sure that any resources your program didn't release get released when it shuts down.
If you delete the object in C++, then it's gone right then. You don't have to do anything else to recover the memory.
C++ certainly does have automatic storage duration; it just has other types of storage duration as well, and the ability to choose what is most appropriate.
In something like your example, you would use automatic storage:
for (int i = 0; i < 100; ++i) {
Object o;
// The object is automatically destroyed after each iteration
}
If the object is required to outlive the loop, perhaps because it is passed to another object to manage, then you would use smart pointers (the closest C++ equivalent to Java's references). Here are examples using unique_ptr (the replacement for auto_ptr, which was deprecated in C++11 and removed in C++17) and shared_ptr:
#include <memory>
#include <utility>
// some hypothetical functions that take ownership of an object
// (register is a reserved word in C++, so it cannot be used as a function name)
void register_owned(std::unique_ptr<Object> o);
void register_shared(std::shared_ptr<Object> const & o);
for (int i = 0; i < 100; ++i) {
    std::unique_ptr<Object> o(new Object);
    register_owned(std::move(o)); // ownership is transferred; our pointer is now null
    // The pointer is automatically destroyed after each iteration,
    // deleting the object if it still owns it.
}
for (int i = 0; i < 100; ++i) {
    std::shared_ptr<Object> o(new Object);
    register_shared(o); // ownership is shared; our pointer is still valid
    // The pointer is automatically destroyed after each iteration,
    // deleting the object if there are no other shared pointers to it.
}
In none of these cases do you need to manually delete the object; that is only necessary when dealing with raw object pointers which, in my opinion, should only be done when absolutely necessary.
C++ also has an advantage over Java here: destruction is always deterministic (that is, you know exactly when it happens). In Java, once an object is discarded, you do not know exactly when (or even if) the garbage collector will remove it. This means that, if the object manages a resource (such as a lock, or a database connection) that needs to be released after use, then it is up to the user of the object to manually release it. In C++, this can be done automatically in the destructor, making the class easier and less error-prone to use.
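On the Java side, the usual way to do this manual release is a try-with-resources statement over an AutoCloseable, which at least makes the release point deterministic even though reclaiming the object's memory is still left to the garbage collector. A small sketch follows; the Connection class is a hypothetical resource that just logs when it is opened and closed.

```java
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesSketch {
    // Hypothetical resource that records when it is acquired and released.
    static final class Connection implements AutoCloseable {
        private final int id;
        private final List<String> log;

        Connection(int id, List<String> log) {
            this.id = id;
            this.log = log;
            log.add("open " + id);
        }

        @Override
        public void close() {
            log.add("close " + id);
        }
    }

    // Opens, uses and closes one connection per iteration; returns the event log.
    static List<String> run(int iterations) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            try (Connection c = new Connection(i, log)) {
                log.add("use " + c.id);
            } // c.close() runs here, at the end of every iteration
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(2));
        // prints [open 0, use 0, close 0, open 1, use 1, close 1]
    }
}
```

Unlike a C++ destructor, this only works if every user of the class remembers to write the try block, which is the extra burden described above.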
There is no Automatic Garbage Collection in C/C++.
False. C and C++ don't mandate automatic garbage collection, but it's available anyway (e.g., see the Boehm/Demers/Weiser collector).
In C/C++ we have to manually remove the Object address inside the for loop.
Again, false. In C or C++, we'd define the object using the automatic storage class, which would deterministically destroy the object:
for(int i = 0; i < 100; i++)
{
Object o;
}
Just for example, let's do a quick test, by defining Object something like this:
struct Object {
Object() { std::cout << "Created an Object\n"; }
~Object() { std::cout << "Destroyed an Object\n"; }
};
For the loop above, this generates:
Created an Object
Destroyed an Object
Created an Object
Destroyed an Object
Created an Object
Destroyed an Object
[97 more repetitions of the same pattern removed ]
Do we have to reboot each time as well to properly remove the Object reference in C++?
No, of course not. The fundamental difference between C++ and Java in this respect is that in C++ the object destruction is deterministic, but in Java it's not. Just for example, in C++, the code above must follow exactly the prescribed pattern -- the body of the loop is a block, and the object must be created on entry to the block, and destroyed on exit from the block. In Java, you get no such assurance. It might only destroy the first object after it has allocated 10 or 20. The objects may be destroyed in any order, and there's not really a guarantee that any particular object will be destroyed at all.
That difference isn't always important, but certainly can be. In C++, it's used to support RAII (aka SBRM -- scope-bound resource management). This is used to ensure not only that memory is freed when no longer needed (i.e., the same thing Java's automatic garbage collector handles) but also that other resources (anything from files to widgets to network or database connections) are handled automatically as well.
If you have short-lived objects and performance is critical (most of the time it is not) you can create a mutable object which is reused on each iteration. This moves the allocation work from once per iteration to once per loop.
List list = new ArrayList(); // mutable object.
for(int i = 0; i < 100; i++) {
list.clear();
// do something with the list.
}
// one list is freed later.
You can make the list a member field (meaning the mutable object might never be freed), which is fine, provided your class doesn't need to be thread safe.
Assuming you are on a typical operating system (Windows/Linux) - no you do not need to reboot. The OS will protect you via the Process/Virtual Memory structure.
Only your process will run out of memory. The OS will clean up after you when your process ends.
Running on many small embedded systems without an OS - yes you would crash or lockup the processor and require a reboot.
Adding to the answers already given, you can always write signal handlers that clean up the memory used by your process when a signal is received.
Normally the OS deallocates your program's memory when you quit it. But some OSes might not (I've seen that on a http://www.beck-ipc.com/en/products/sc1x/sc13.asp : the RTOS didn't deallocate on exit, so I ran out of memory after a couple of launches when I didn't deallocate all my objects, so yes, I had to reboot).
But most OSes reclaim the memory of programs that have exited (Linux and Windows do), so you shouldn't have to worry.