C++: would universal use of shared_ptr<> be equivalent to a gc? - java

This is just an academic question (I would never do this in real code):
If I were to use shared_ptr<> universally in my code, would the behavior be equivalent to a gc-collected language like Java?
If not, how would the behavior be different from a gc-embedded language? Which C++ construct would yield equivalent behavior compared to a gc-embedded language?
Note: In real code, I strongly prefer RAII and strict ownership over the use of any smart pointers. I also know that less generic pointers such as unique_ptr<> would be more efficient. This question is just an inquiry into smart-pointer equivalence.

No, there'd be a couple of important differences:
You would get a memory leak any time you have a cyclic reference. A garbage collector can handle cycles, ref-counting can't.
You would avoid any stalls or pauses because no garbage collection ever occurs. On the other hand, you'd likely spend more total CPU time cleaning up resources, because the amortized cost of an occasional garbage collection is pretty low, and ref-counting can be relatively expensive if you do it on everything.
Obviously the first point is the killer. If you did this, many of your resources wouldn't get freed, and you'd leak memory and your app just wouldn't behave very well.
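To make the cycle problem concrete, here is a minimal sketch of naive reference counting. It is written in Java purely as a toy model; RcNode, retain, release and CycleDemo are hypothetical names invented for this illustration, and this is not how shared_ptr is actually implemented. It only shows why two nodes that own each other never reach a count of zero.

```java
import java.util.Arrays;

// Toy reference counting. All names here are illustrative assumptions.
final class RcNode {
    final String name;
    int refCount = 0;      // number of owners currently holding this node
    boolean freed = false; // set once the count drops to zero
    RcNode next;           // an owning ("strong") link to another node

    RcNode(String name) { this.name = name; }

    static RcNode retain(RcNode n) {
        if (n != null) n.refCount++;
        return n;
    }

    static void release(RcNode n) {
        if (n == null) return;
        if (--n.refCount == 0) {
            n.freed = true;
            RcNode child = n.next;
            n.next = null;
            release(child); // freeing a node also drops the link it owns
        }
    }

    void setNext(RcNode target) {
        RcNode old = next;
        next = retain(target);
        release(old);
    }
}

final class CycleDemo {
    // Returns {a.freed, b.freed} after both external owners have let go.
    static boolean[] run(boolean makeCycle) {
        RcNode a = RcNode.retain(new RcNode("a")); // external owner of a
        RcNode b = RcNode.retain(new RcNode("b")); // external owner of b
        a.setNext(b);                // a owns b
        if (makeCycle) b.setNext(a); // b owns a, closing the cycle
        RcNode.release(a);
        RcNode.release(b);
        return new boolean[] { a.freed, b.freed };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(false))); // prints [true, true]
        System.out.println(Arrays.toString(run(true)));  // prints [false, false]
    }
}
```

In the cyclic case each node still has a count of 1 from its partner, so neither is ever freed. A tracing collector would find both unreachable from any root and reclaim them.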
Which C++ construct would yield equivalent behavior compared to a gc-embedded language?
None. C++ doesn't have a garbage collector because there's no way to implement a correct, reliable one. (Yes, I'm aware of Boehm's GC, and it's a good approximation, but it's conservative: it treats anything that looks like a pointer as a reference, so it can keep dead objects alive, and it can miss references that the program has disguised. There is no way, in a general C++ program, to implement a garbage collector that Just Works(tm).)

@jalf says this in his answer:
You would avoid any stalls or pauses because no garbage collection ever occurs.
While smart pointers (or any reference counting scheme) have no pause while garbage collection occurs, you can get a pause if you null the last external pointer to a large data structure, and trigger a cascade of reference count adjustments and finalizations for each node in the data structure. While a smart smart-pointer implementation could ameliorate this, you'd be sacrificing immediate reclamation ... which some people claim is an advantage of smart pointers.
Also, there is an overhead of a few instructions each time you assign to a smart-pointer-typed variable, and the overhead of allocating an object is greater.

Garbage collection happens whenever the GC decides that it should. shared_ptrs are not collected. An object managed by a shared_ptr will only ever be destroyed in the destructor of a shared_ptr. And therefore, you know exactly when memory can and can not be freed.
You still have control over when memory goes away with shared_ptr. You don't have that with a garbage collector (outside of coarse-grained commands like turning it on/off or modifying its behavior a bit).

The main difference is that reference counting alone can't free circular data structures.
Many cases of such structures can nevertheless be handled by using weak_ptr appropriately, and some cases can be handled by delegating cleanup responsibility to a collection object.
However, the most frivolous spaghetti structures, if you want them (e.g. for math), can't have automated cleanup implemented by reference counting alone, because there will be circular sub-structures.

It's worth noting that a shared_ptr is much larger than a Java reference. Generally this won't matter, but in some situations it might.
In Java 6, 64-bit JVMs can still use 32-bit references to access up to 32 GB of heap (this works because objects are aligned on 8-byte boundaries). A shared_ptr, however, uses two pointers (8 bytes each in a 64-bit application), the second of which references an object that contains the counter. On libgcc, any malloc/new allocation takes a minimum of 32 bytes. In total, the shared pointer could be using 48 bytes (16 for the two pointers plus 32 for the counter block), which is much larger than 4 bytes. A difference of 44 bytes is not going to matter for one pointer, but it could if you have lots of them.

Related

Dynamic memory allocation across programming languages

I have a question regarding dynamic memory allocation.
When it comes to C, memory is allocated using the functions malloc(), calloc() and realloc() and de-allocated using free().
However, in object-oriented languages like C++, C# and Java, memory is dynamically allocated using the new keyword (operator) and, in the case of C++, deallocated using delete.
My question is, why do these object-oriented languages use operators instead of functions for dynamic memory allocation? Even with new, a pointer or reference to the object is ultimately returned, just as from a function.
Is this done only to simplify the syntax? Or is there a more profound reason?
In C, the memory allocation functions are just that. They allocate memory. Nothing else. And you have to remember to release that memory when done.
In the OO languages (C++, C#, Java, ...), a new operator will allocate memory, but it will also call the object constructor, which is a special method for initializing the object.
As you can see, that is semantically a totally different thing. The new operator is not just simpler syntax, it's actually different from plain memory allocation.
In C++, you still have to remember to release that memory when done.
In C# and Java, that will be handled for you by the Garbage Collector.
I believe it's done solely to simplify the syntax as you've said.
Operators are simply another way to call methods (or functions).
using "12 + 13" is no different than using Add(12, 13).
A way to see this is via the operator overrides in C# for example:
// Sample from - https://msdn.microsoft.com/en-us/library/8edha89s.aspx
public static Complex operator +(Complex c1, Complex c2)
{
    return new Complex(c1.real + c2.real, c1.imaginary + c2.imaginary);
}
It's a regular method but allows the usage of operators over complex classes.
I'm using the Add operator as an example since I see it as no different than the memory allocation operators such as "new".
The whole point of Object Oriented design/programming is to provide meaningful abstractions.
When you are doing good OO design, you do not think (immediately) about areas in memory. You think about objects that carry state and provide behavior.
Even when writing code in C++, in most cases, I don't have to worry about subtleties like "why will my bits be aligned" or "how much memory does one of my objects require at runtime". Of course, these questions are relevant in certain situations; but within OO design, the true value comes from creating useful abstractions that help to solve "whatever domain" problems as precisely, simply, and maintainably as possible.
For the "keyword" versus "function" thing: just have a look at Java. The fathers of the language simply didn't want Java programmers to start thinking about "memory pointers". You only deal with objects and references to objects. Thus, the concept of "allocating" memory and getting back a "pointer" simply does not exist here. So how would you then provide this functionality as a library method? You can't.
Finally, to a certain degree, this is a matter of "taste/style" by the people designing the language. Sometimes people prefer a small language core; and do everything in libraries; and other people prefer to have "more" things built-in.
The new keyword is indeed there to simplify the syntax, which is pretty suggestive, and it also does more than memory allocation: it invokes the constructor(s) as well.
One thing you have said:
C++,C# and Java, memory is dynamically allocated and de-allocated using the new and delete keywords (operators)
For Java and C# there is only the new keyword; there is no delete. I know that in C# you can use using blocks to ensure that a resource is released when the object is no longer used, but this does not necessarily involve memory deallocation, since it only calls the Dispose method.
One more thing that needs to be pointed out is that the goal of an object-oriented programming language, as GhostCat just said, is to free the programmer from thinking about how memory is allocated in most cases and, more importantly, how objects are released. This is why the garbage collector was introduced.
The main principle is that the higher-level a programming language is, the more it has to abstract away things such as memory management and provide easy ways to solve the actual business problems at hand. Of course, this should be considered when a programming language is chosen for a specific task.
C: malloc and calloc are basically the only ways to allocate memory in C.
malloc: allocates memory of the requested size without initializing it to any value.
calloc: almost the same as malloc, but it also initializes the memory to zero (0).
In both cases, a few things are required:
The requested size must be given at allocation time; it can be increased later with realloc.
The allocated memory must be released with free. Forgetting to free it leaks memory, which can eventually lead to an out-of-memory (OOM) error, although free is quite handy when you are doing a lot of memory-intensive work.
NOTE: A size must be passed to malloc and calloc. In C++ the returned void* must also be cast to the target type; in C the cast is optional.
C++: C++ also has malloc and calloc (and free and realloc too) alongside new and delete. new and delete can be thought of as the modern way to allocate and free memory, but not all OOP languages have both; e.g. Java doesn't have delete.
new uses constructors to initialize the object, so it's pretty useful when working with objects in scenarios where the initial value is set by a parameterized, default or copy constructor.
NOTE: With new you don't have to do the casting required with malloc and calloc, and there is no need to give a memory size for the allocation. One less thing to worry about, right?
delete is used to release the memory. Calling delete on an object also calls its destructor, which is the last point in the object's life cycle where you can do farewell tasks such as saving the current state, and then the memory is released.
Note: In C# and Java, deallocation is handled by the garbage collector, which manages memory for you. It uses various algorithms, such as mark-sweep, to release memory once no reference variable points to it any more, for example because the variable has been set to null.
This can still lead to a memory leak if a reference variable keeps pointing to an object that is no longer required.
The downside of GC is that it adds runtime overhead, which can make things slower.

Can the JVM GC move objects in the middle of a reference comparison, causing a comparison to fail even when both sides refer to the same object?

It's well known that GCs will sometimes move objects around in memory. And it's my understanding that as long as all references are updated when the object is moved (before any user code is called), this should be perfectly safe.
However, I saw someone mention that reference comparison could be unsafe because the object might be moved by the GC in the middle of the comparison, such that the comparison could fail even when both references should be referring to the same object.
I.e., is there any situation under which the following code would not print "true"?
Foo foo = new Foo();
Foo bar = foo;
if(foo == bar) {
System.out.println("true");
}
I tried googling this and the lack of reliable results leads me to believe that the person who stated this was wrong, but I did find an assortment of forum posts (like this one) that seemed to indicate that he was correct. But that thread also has people saying that it shouldn't be the case.
Java Bytecode instructions are always atomic in relation to the GC (i.e. no cycle can happen while a single instruction is being executed).
The only time the GC will run is between two Bytecode instructions.
Looking at the bytecode that javac generates for the if instruction in your code we can simply check to see if a GC would have any effect:
// a GC here wouldn't change anything
ALOAD 1
// a GC cycle here would update all references accordingly, even the one on the stack
ALOAD 2
// same here. A GC cycle will update all references to the object on the stack
IF_ACMPNE L3
// this is the comparison of the two references. no cycle can happen while this comparison
// "is running" so there won't be any problems with this either
Additionally, even if the GC were able to run during the execution of a bytecode instruction, the references of the object would not change. It's still the same object before and after the cycle.
So, in short the answer to your question is no, it will always output true.
Source:
https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.21.3
The short answer is, looking at the java 8 specification: No.
The == operator will always perform object equality check (given that neither reference is null). Even if the object is moved, the object is still the same object.
If you see such an effect, you have just found a JVM bug. Go submit it.
It could, of course, be that some obscure implementation of the JVM does not enforce this for whatever strange performance reason. If that is the case, it would be wise to simply move on from that JVM...
TL;DR
You should not think about that kind of stuff whatsoever; it's a dark place.
Java has clearly stated its specifications, and you should not doubt them, ever.
2.7. Representation of Objects
The Java Virtual Machine does not mandate any particular internal structure for objects.
Source: JVMS SE8.
I doubt it! If you doubt this very basic operator, you may find yourself doubting everything else. Getting frustrated and paranoid, with trust issues, is not a place you want to be.
What if it happens to me? Such a bug should not exist. The Oracle discussion you linked reports a bug that happened years ago; the discussion's OP somehow decided to bring it up again for no reason, without any reliable documentation that such a bug exists nowadays. However, if this bug or any other has occurred to you, please submit it here.
To let your worries go away: Java has adjusted the pointer-to-pointer approach into the JVM pointer table; you can read more about its efficiency here.
GCs only happen at points in the program where the state is well-defined and the JVM has exact knowledge where everything is in registers/the stack/on the heap so all references can be fixed up when an object gets moved.
I.e. they cannot occur between execution of arbitrary assembly instructions. Conceptually you can think of them occurring between bytecode instructions of the JVM, with the GC adjusting all references that have been generated by previous instructions.
You are asking a question with a wrong premise. Since the == operator does not compare memory locations, it isn’t sensitive to changes of memory location per se. The == operator, applied to references, compares the identity of the referred objects, regardless of how the JVM implements it.
To name an example that counteracts the usual understanding, a distributed JVM may have objects held in the RAM of different computers, including the possibility of local copies. So simply comparing addresses won’t work. Of course, it’s up to the JVM implementation to ensure that the semantics, as defined in the Java Language Specification, do not change.
If a particular JVM implementation implements a reference comparison by directly comparing memory locations of objects and has a garbage collector that can change memory locations, of course, it’s up to the JVM to ensure that these two features can’t interfere with each other in an incompatible way.
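As a small illustration of the identity semantics described above, the following sketch contrasts == with equals. IdentityDemo, Foo and sameObject are hypothetical names made up for this example; nothing here depends on where the JVM places the objects in memory.

```java
final class IdentityDemo {
    // A small value-like class whose equals() compares content, not identity.
    static final class Foo {
        final int id;
        Foo(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Foo && ((Foo) o).id == id;
        }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    // == on references asks "is this the same object?", nothing more.
    static boolean sameObject(Object x, Object y) {
        return x == y;
    }

    public static void main(String[] args) {
        Foo foo = new Foo(1);
        Foo bar = foo;          // second reference to the same object
        Foo other = new Foo(1); // distinct object with equal content
        System.out.println(sameObject(foo, bar));   // prints true
        System.out.println(sameObject(foo, other)); // prints false
        System.out.println(foo.equals(other));      // prints true
    }
}
```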
If you are curious about how this can work, e.g. inside optimized, JIT-compiled code, the granularity isn’t as fine as you might think. Any straight-line code, including forward branches, can be considered to run fast enough that garbage collection can be delayed until it completes. So garbage collection can’t happen at arbitrary times inside optimized code, but must be allowed at certain points, e.g.
backward branches (note that due to loop unrolling, not every loop iteration implies a backward branch)
memory allocations
thread synchronization actions
invoking a method that hasn’t been inlined/analyzed
maybe something special, I forgot
So the JVM emits code containing certain “safe points” at which it is known which references are currently held and how to replace them if necessary, so that changing locations has no impact on correctness. Between these points, the code can run without having to care about the possibility of changing memory locations, whereas the garbage collector will wait for the code to reach a safe point when necessary, which is guaranteed to happen in finite, rather short time.
But, as said, these are implementation details. On the formal level, things like changing memory locations do not exist, so there is no need to explicitly specify that they are not allowed to change the semantics of Java code. No implementation detail is allowed to do that.
I understand you are asking this question after someone says it behaves that way, but really asking if it does behave that way isn't the right approach to evaluating what they said.
What you should really be asking (primarily yourself, others only if you can't decide on an answer) is whether it makes sense for the GC to be allowed to cause a comparison to fail that logically should succeed (basically any comparison that doesn't include a weak reference).
The answer to that is obviously "no", as it would break pretty much anything beyond "hello, world" and probably even that.
So, if allowed, it is a bug -- either in the spec or the implementation. Now since both the spec and the implementation were written by humans, it is possible such a bug exists. If so, it will be reported and almost certainly fixed.
No, because that would be flagrantly ridiculous and a patent bug.
The GC takes a great deal of care behind the scenes to avoid catastrophically breaking everything. In particular, it will only move objects when threads are paused at safepoints, which are specific places in the running code generated by the JVM for threads to be paused at. A thread at a safepoint is in a known state, where the positions of all the possible object references in registers and memory are known, so the GC can update them to point to the object's new address. Garbage collection won't break your comparison operations.
A Java variable holds a reference to the "object", not to the memory space where the object is stored.
Java does this because it allows the JVM to manage memory usage on its own (e.g. with the garbage collector) and to improve overall usage without directly impacting the client program.
As an example of such an improvement, the first X int values (I don't remember how many) are always allocated in memory so that for loops execute faster (e.g. for (int i = 0; i < 10; i++)).
And as an example of an object reference, just create an array and try to print it:
int[] i = {1,2,3};
System.out.println(i);
You will see that Java prints something starting with [I@. This says that it is an "array of int at", followed by the object's hash code in hexadecimal. Not the memory zone!
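For what it's worth, the printed text can be rebuilt by hand from what Object.toString() is documented to produce: the class name, an @ sign, and the hash code in hexadecimal. A minimal sketch, where ArrayToStringDemo and describe are hypothetical names for illustration:

```java
import java.util.Arrays;

final class ArrayToStringDemo {
    // Rebuilds the default Object.toString() output for an int array.
    // Arrays do not override hashCode(), so the identity hash code is what gets printed.
    static String describe(int[] a) {
        return a.getClass().getName() + "@"
                + Integer.toHexString(System.identityHashCode(a));
    }

    public static void main(String[] args) {
        int[] i = {1, 2, 3};
        System.out.println(i);                  // "[I@" followed by a hex hash code
        System.out.println(describe(i));        // the same text, built by hand
        System.out.println(Arrays.toString(i)); // prints [1, 2, 3]
    }
}
```

The hash code is an opaque identifier chosen by the JVM, so the exact digits vary from run to run and should not be read as an address.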

What are good Java coding practices to help Java GC? [duplicate]

Possible Duplicate:
How can I avoid garbage collection delays in Java games? (Best Practices)
Java's GC pause is a killer. Often the application doesn't even have a memory leak, yet at some point it may pause for ~1 second for every 1 GB of memory allocated.
What are good Java coding practices to help Java GC?
One example: since an object with no remaining references becomes eligible for garbage collection, it is a good idea to explicitly set a reference to null, e.g. object = null.
In general, to help the GC you should avoid unreasonable memory usage. Some simple advice:
1) Do not create new objects where they are not needed. For example, do not use constructions like String test = new String("blabla");. If possible, reuse existing objects (this applies mainly to immutable objects).
2) Do not declare fields in classes when they are used only inside a method; make them local variables instead.
3) Avoid using wrapper objects for primitive types, i.e. use int instead of Integer and boolean instead of Boolean unless you really need to store null values in them. Also, where possible, to save memory, use plain Java arrays of primitive types (int[], not Integer[]) instead of ArrayList.
The single best thing that you can do to minimize GC pauses is to properly size your heap.
If you know that your program never uses more than 1Gb of live objects, then it's pointless to pass -Xms4096m. That will actually increase your GC pauses, because the JVM will leave garbage around until it absolutely has to clear it.
Similarly, if you know that you have very few long-lived objects, you can usually benefit by increasing the size of the young generation relative to the tenured generation (for Sun JVMs).
The only thing that you can really do, coding-wise, is to move large objects off-heap. But that's unlikely to be useful for most applications.
Objects that are allocated and not immediately released get moved into the tenured heap space. Tenured memory is the most expensive to collect. Avoiding churn of long-lived objects should help with GC pauses.

Can there be memory leak in Java

I get this question asked many times. What is a good way to answer
Can there be memory leak in Java?
The answer is that it depends on what kind of memory leak you are talking about.
Classic C / C++ memory leaks occur when an application neglects to free or dispose an object when they are done with it, and it leaks. Cyclic references are a sub-case of this where the application has difficulty knowing when to free / dispose, and neglects to do it as a result. Related problems are where the application uses an object after it has been freed, or attempts to free it twice. (You could call the latter problems memory leaks, or just bugs. Either way ... )
Java and other (fully [1]) managed languages mostly don't suffer from these problems because the GC takes care of freeing objects that are no longer reachable. (Certainly, dangling pointer and double-free problems don't exist, and cycles are not problematic as they are for C / C++ "smart pointers" and other reference count schemes.)
But in some cases GC in Java will miss objects that (from the perspective of the programmer) should be garbage collected. This happens when the GC cannot figure out that an object cannot be reached:
The logic / state of the program might be such that the execution paths that would use some variable cannot occur. The developer can see this as obvious, but the GC cannot be sure, and errs on the side of caution (as it is required to).
The programmer could be wrong about it, and the GC is avoiding what might otherwise result in a dangling reference.
(Note that the causes of memory leaks in Java can be simple, or quite subtle; see @jonathan.cone's answer for some subtle ones. The last one potentially involves external resources that you shouldn't rely on the GC to deal with anyway.)
Either way, you can have a situation where unwanted objects cannot be garbage collected, and hang around tying up memory ... a memory leak.
Then there is the problem that a Java application or library can allocate off-heap objects via native code that need to be managed manually. If the application / library is buggy or is used incorrectly, you can get a native memory leak. (For example: Android Bitmap memory leak ... noting that this problem is fixed in later versions of Android.)
1 - I'm alluding to a couple of things. Some managed languages allow you to write unmanaged code where you can create classic storage leaks. Some other managed languages (or more precisely language implementations) use reference counting rather than proper garbage collecting. A reference count-based storage manager needs something (i.e. the application) to break cycles ... or else storage leaks will ensue.
Yes. Memory leaks can still occur even when you have a GC. For example, you might hold on to resources such as database result sets which you must close manually.
Well, considering that java uses a garbage collector to collect unused objects, you can't have a dangling pointer. However, you could keep an object in scope for longer than it needs to be, which could be considered a memory leak. More on this here: http://web.archive.org/web/20120722095536/http://www.ibm.com:80/developerworks/rational/library/05/0816_GuptaPalanki/
Are you taking a test on this or something? Because that's at least an A+ right there.
The answer is a resounding yes, but this is generally a result of the programming model rather than an indication of some defect in the JVM. This is common when frameworks have lifecycles different from that of the running JVM. Some examples are:
Reloading a context
Failing to deregister observers (listeners)
Forgetting to clean up resources after you're finished using them *
* - Billions of consulting dollars have been made resolving the last one
Yes, in the sense that your Java application can accumulate memory over time that the garbage collector is unable to free.
If you keep references to unneeded/unwanted objects, they will never fall out of scope and their memory will not be reclaimed.
Yes, if you don't drop your references to objects they will never be garbage-collected and memory usage will increase. However, because of how Java is designed, this is difficult to achieve, whereas in some other languages it is sometimes difficult not to achieve.
edit: read Amokrane's link. it's good.
Yes it is possible.
In Effective Java there is an example involving a stack implemented using arrays. If your pop operations simply decrement the index value it is possible to have a memory leak. Why? Because your array still has a reference to the popped value and you still have a reference to the stack object. So the correct thing to do for this stack implementation would be to clear the reference to the popped value using an explicit null assignment at the popped array index.
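A minimal sketch of that fix, assuming a simple array-backed stack. ArrayStack and rawSlot are hypothetical names for this illustration, not the book's code; rawSlot exists only so the cleared slot can be observed.

```java
import java.util.Arrays;
import java.util.EmptyStackException;

// Array-backed stack that clears obsolete references on pop.
final class ArrayStack {
    private Object[] elements = new Object[4];
    private int size = 0;

    void push(Object e) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size); // grow when full
        }
        elements[size++] = e;
    }

    Object pop() {
        if (size == 0) throw new EmptyStackException();
        Object result = elements[--size];
        elements[size] = null; // without this line the array keeps the popped object reachable
        return result;
    }

    // Demonstration helper only: exposes the raw slot so the effect can be observed.
    Object rawSlot(int index) {
        return elements[index];
    }

    public static void main(String[] args) {
        ArrayStack s = new ArrayStack();
        s.push("a");
        s.push("b");
        System.out.println(s.pop());      // prints b
        System.out.println(s.rawSlot(1)); // prints null
    }
}
```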
The short answer:
A competent JVM has no memory leaks, but more memory can be used than is needed, because not all unused objects have been garbage collected yet. Also, Java apps themselves can hold references to objects they no longer need, and this can result in a memory leak.
The book Effective Java gives two more reasons for "memory leaks":
Putting an object reference in a cache and forgetting that it's there. The reference remains in the cache long after it has become irrelevant. The solution is to represent the cache as a WeakHashMap.
An API where clients register callbacks and don't deregister them explicitly. The solution is to store only weak references to them.
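As a rough sketch of the cache case, assuming the cache entry is only useful while something outside the cache still references its key: WeakCacheDemo and its methods are hypothetical names, while WeakHashMap is the real java.util class.

```java
import java.util.Map;
import java.util.WeakHashMap;

final class WeakCacheDemo {
    // Keys are held weakly: an entry may be dropped once its key is no longer
    // strongly reachable from anywhere else.
    private final Map<Object, String> cache = new WeakHashMap<>();

    void put(Object key, String value) {
        cache.put(key, value);
    }

    String get(Object key) {
        return cache.get(key);
    }

    public static void main(String[] args) {
        WeakCacheDemo demo = new WeakCacheDemo();
        Object key = new Object(); // the caller holds the only strong reference
        demo.put(key, "expensive result");
        System.out.println(demo.get(key)); // prints expensive result
        key = null; // the entry is now eligible for removal at some later GC;
                    // when (or whether) that happens is up to the collector
    }
}
```

Note that this only helps when the keys are not kept alive elsewhere. String literals, for instance, are interned and stay strongly reachable, so an entry keyed by a literal would not be cleared.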
Yes, there can be, in a context where a program mistakenly holds a reference to an object that will never be used again, so that it is never cleaned up by the GC.
An example of this would be forgetting to close an opened stream:
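import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class MemoryLeak {
    private void startLeaking() throws IOException {
        StringBuilder input = new StringBuilder();
        // The URL needs a scheme, otherwise the constructor throws MalformedURLException.
        URLConnection conn = new URL("http://www.example.com/file.txt").openConnection();
        BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
        String line;
        // Read each line once; calling readLine() twice per iteration would skip lines.
        while ((line = br.readLine()) != null) {
            input.append(line);
        }
        // Leak: br is never closed, so the underlying stream and connection stay open.
    }

    public static void main(String[] args) throws IOException {
        MemoryLeak ml = new MemoryLeak();
        ml.startLeaking();
    }
}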
One simple answer is: the JVM will take care of the memory management of your POJOs [plain old Java objects] as long as you are not working with JNI. With JNI, if you have allocated memory in native code, you have to take care of that memory yourself.
Yes. A memory leak is unused memory not released to the memory manager by the app.
Many times I've seen Java code which stores items in a data structure but never removes them, filling the memory until an OutOfMemoryError occurs:
void f() {
List<Integer> w = new ArrayList<Integer>();
while (true) {
w.add(new Integer(42));
}
}
While this example is too obvious, Java memory errors tend to be more subtle. For example, using Dependency Injection storing a huge object on a component with SESSION scope, without releasing it when the object is no longer used.
On a 64-bit VM this tends to get worse, since the swap space starts to fill up until the system crawls under too many I/O operations.

Is there something like malloc/free in java?

I've never seen such statements, though. Do they exist in the Java world at all?
Java's version of malloc is new -- it creates a new object of a specified type.
In Java, memory is managed for you, so you cannot explicitly delete or free an object.
Java has a garbage collector. That's why you never see such statements in your code (which is nice if you ask me).
In computer science, garbage collection (GC) is a form of automatic memory management. It is a special case of resource management, in which the limited resource being managed is memory. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program. Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.
new instead of malloc, garbage collector instead of free.
No direct equivalents exist in Java:
C malloc creates an untyped heap node and returns you a pointer to it that allows you to access the memory however you want.
Java does not have the concept of an untyped object, and does not allow you to access memory directly. The closest that you can get in Java to malloc would be new byte[size], but that returns you a strongly typed object that you can ONLY use as a byte array.
C free releases a heap node.
Java does not allow you to explicitly release objects. Object deallocation in Java is totally in the hands of the garbage collector. In some cases you can influence the behaviour of the GC; e.g. by assigning null to a reference variable and calling System.gc(). However, this does not force the object to be deallocated ... and is a very expensive way to proceed.
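To illustrate the new byte[size] point above, here is a small sketch that treats a byte array as a raw buffer through java.nio.ByteBuffer. ByteArrayDemo and roundTrip are hypothetical names for this example. Unlike malloc, the array is typed, bounds-checked and zero-initialized, and there is no call to free it.

```java
import java.nio.ByteBuffer;

final class ByteArrayDemo {
    // Stores an int into a freshly allocated byte array and reads it back.
    static int roundTrip(int value) {
        byte[] raw = new byte[16];              // closest analogue to malloc(16); zero-filled
        ByteBuffer view = ByteBuffer.wrap(raw); // typed view over the same bytes
        view.putInt(0, value);                  // absolute write at byte offset 0
        return view.getInt(0);                  // absolute read from the same offset
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // prints 42
        // No free(): the array becomes garbage once it is unreachable.
    }
}
```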
If you are up to no good (tm) I suppose you can get access to raw memory through the JNI interface. This is where you can call C programs from Java programs. Of course you have to be running in an environment where your program has the privileges to do so (a browser won't normally allow this unless it is suicidal), but you can access objects via C pointers that way.
I sort of wonder where the original question is coming from. At one point long ago I was totally skeptical of the notion that C-style memory management and C-style pointers were not needed, but at this point I am a true believer.
