Once a String object is created, we can't modify it, but if we do any operation on it, the JVM creates a new object. By creating these new objects the JVM consumes more memory. Doesn't that cause a memory issue?
You are correct. It is definitely worth being aware of this issue, even if it doesn't affect you every time.
As you say, Strings cannot change after creation - they're immutable and they don't expose many ways to change them.
However, operations such as split() will generate additional string objects in the background, and each of those strings has a memory overhead if you are holding onto references to them.
As the other posters note, the objects will be small and garbage collection will usually clean up the old ones after they have gone out of scope, so you generally won't have to worry about this.
However, if you're doing something specific and holding onto large amounts of string references then this could bite you.
Look at String interning depending on your use case, noting the warnings on the linked page.
Two things to note:
1) Hard coded String literals will be automatically interned by Java, reducing the impact of this.
2) The + operator is more efficient in this regard: it uses StringBuilder underneath, giving performance and memory benefits.
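To illustrate the interning point, here is a small sketch (the class and variable names are my own, not from the question) showing literal interning and an explicit intern() call:
public class InternDemo {
    public static void main(String[] args) {
        // Hard-coded literals with the same value share one interned instance.
        String a = "hello";
        String b = "hello";
        System.out.println(a == b);          // true: same pooled object

        // A string constructed at runtime is a distinct object...
        String c = new String("hello");
        System.out.println(a == c);          // false
        // ...unless it is interned explicitly.
        System.out.println(a == c.intern()); // true
    }
}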
No, it does not. If you do not hold strong references to String instances, they will eventually be collected by the garbage collector.
For example:
while (true) {
new String("that is a string");
}
in this snippet you continuously create new object instances, yet you will never get an OutOfMemoryError, because the created instances immediately become garbage (there are obviously no strong references to them).
It consumes more memory for the new objects, that's right. But that fact in itself does not create an issue, because the garbage collector promptly reclaims all unreachable memory. Of course, you can turn it into an issue by keeping references to the newly created strings, but that would be an issue with your program, not with the JVM.
The biggest memory issue to know about (on JVMs before Java 7 update 6) is taking a small substring of a huge string. On those versions the substring shares the original string's char array, and even if the original string gets GC'd, the huge char array is still referenced by the substring. The workaround is to use new String(hugeString.substring(i)), which copies only the characters you need. (Since Java 7u6, substring() makes a copy, so this no longer applies.)
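For example (a sketch relevant to those older JVMs; the size and indices are arbitrary):
public class SubstringCopy {
    public static void main(String[] args) {
        // Simulate a large string of one million characters.
        String huge = new String(new char[1_000_000]).replace('\0', 'x');

        // Before Java 7u6 this substring shared huge's million-char backing array.
        String small = huge.substring(0, 8);

        // Forcing a copy detaches the small string, so the big array can be GC'd
        // once huge itself becomes unreachable.
        String detached = new String(small);
        System.out.println(detached.length()); // 8
    }
}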
The issue is simply that garbage is generated. The virtual machine resolves it by running the garbage collector, which frees the memory used by that garbage.
As soon as the old object is no longer used, it can be removed by the garbage collector (which will happen long before any memory issue arises).
If you want to prevent the copying of the data, use a StringBuilder.
Unused objects are collected by the GC, and immutability has many benefits in Java.
Achieving as much immutability as possible is good practice in Java; immutable objects can also be used safely in the collections framework.
As far as I know, StringBuilder (or StringBuffer for thread safety) is the way to manage a String you want to treat as mutable.
Manipulating a few characters in a huge String then doesn't 'eat' many extra bytes of memory.
It is also much faster for concatenation.
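A minimal sketch of that idea (the loop and sizes are arbitrary):
public class BuilderDemo {
    public static void main(String[] args) {
        // Each += on a String would allocate a new String per iteration;
        // a single StringBuilder mutates one internal buffer instead.
        StringBuilder sb = new StringBuilder(1024); // pre-size to avoid internal reallocation
        for (int i = 0; i < 100; i++) {
            sb.append(i).append(',');
        }
        String csv = sb.toString(); // only one final String is created here
        System.out.println(csv.length());
    }
}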
Since a String instance is immutable it can be reused by the JVM: the String class follows the Flyweight design pattern (via the string pool), which helps avoid memory issues.
Guava contains utilities for splitting and joining Strings, but it requires the instantiation of a Splitter/Joiner object to do so. These are small objects that typically only contain the character(s) on which to split/join. Is it a good idea to maintain references to these objects in order to reuse them, or is it preferable to just create them whenever you need them and let them be garbage collected?
For example, I could implement this method in the following two ways:
String joinLines(List<String> lines) {
return Joiner.on("\n").join(lines);
}
OR
static final Joiner LINE_JOINER = Joiner.on("\n");
String joinLines(List<String> lines) {
return LINE_JOINER.join(lines);
}
I find the first way more readable, but it seems wasteful to create a new Joiner each time the method is called.
To be honest, this sounds like premature optimization to me. I agree with @Andy Turner: write whatever is easiest to understand and maintain.
If you plan to use Joiner.on("\n") in a few places, make it a well named constant; go with option two.
If you only plan to use it in your joinLines method, a constant seems overly verbose; go with option one.
It depends greatly on how often you expect the code to be called and what tradeoffs you want to make between CPU time, memory consumption and readability. Since Joiner is such a small thing, it's not going to make a huge difference either way: if you make it a constant, you save the (fairly minimal) costs of allocating it and GCing it for each call, while adding the (fairly minimal) memory consumption overhead to the program.
It also depends in part on what platform you're running the code on: if you're running on the server, typically you'll have plenty of memory so keeping a constant won't be an issue. On the other hand, if you're running on Android you're more memory constrained, but you also want to avoid unnecessary allocations since garbage collection is going to be worse and more impactful to your performance.
Personally, I tend to allocate a constant unless I know it's only going to be used some fixed number of times as opposed to repeatedly throughout the program.
Java programs can be very memory hungry. For example, a Double object has 24 bytes: 8 bytes of data and 16 bytes of JVM-imposed overhead. In general, the objects that represent the primitive types are very expensive.
The same happens for any collection in the Java Standard Library. There are even some counterintuitive facts such as a HashSet being more memory hungry than a HashMap, since a HashSet contains a HashMap inside (http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html).
Could you come up with some advice when modeling data and delegation of objects in high performance settings so that these "weaknesses" of Java are mitigated?
Some techniques I use to reduce memory:
Make your own IntArrayList (etc.) class that prevents boxing (a minimal sketch follows this list)
Make your own IntHashMap (etc) class where keys are primitives
Use nio's ByteBuffer to store large arrays of data efficiently (and in native memory, outside heap). It's like a byte array but contains methods to store/retrieve all primitive types from the buffer at any arbitrary offset (trade memory for speed)
Don't use pooling because pools keep unused instances explicitly alive.
Use threads sparingly; they're very memory hungry (each thread's stack lives in native memory, outside the heap)
When making substrings of big strings and discarding the original (on JVMs before Java 7u6), the substrings still refer to the original's char array, so use new String to let the old big string be collected.
A linear array is smaller than a multidimensional array, and if the size of all but the last dimension is a power of two, calculating indices is fastest: array[x|y<<4] for a 16xN array.
Initialize collections and StringBuilder with an initial capacity chosen such that it prevents internal reallocation in a typical circumstance.
Use StringBuilder instead of string concatenation, because the compiled class files use new StringBuilder() without initial capacity to concatenate strings.
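Here is the minimal sketch promised for the first point: a primitive-backed list that avoids boxing. The class name and growth policy are my own; libraries such as Trove or HPPC provide full-featured versions.
import java.util.Arrays;

public class IntArrayList {
    private int[] data = new int[16];
    private int size;

    public void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow without ever boxing an int
        }
        data[size++] = value;
    }

    public int get(int index) {
        return data[index];
    }

    public int size() {
        return size;
    }
}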
Depends on the application, but generally speaking
Lay out data structures in (parallel) arrays of primitives (see the sketch at the end of this answer)
Try to make big "flat" objects, inlining otherwise sensible sub-structures
Specialize collections of primitives
Reuse objects, use object pools, ThreadLocals
Go off-heap
I cannot say these practices are "best", because unfortunately they make you suffer: they work against the reasons you chose Java in the first place and reduce the flexibility, supportability, reliability, testability and other "good" properties of the codebase.
But they certainly allow you to lower the memory footprint and GC pressure.
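As a sketch of the first practice (the Points class and its fields are purely illustrative):
// Instead of a List<Point> of small objects, keep coordinates in parallel
// primitive arrays: no per-element object header, no references to chase.
public class Points {
    private final double[] xs;
    private final double[] ys;

    public Points(int capacity) {
        xs = new double[capacity];
        ys = new double[capacity];
    }

    public void set(int i, double x, double y) {
        xs[i] = x;
        ys[i] = y;
    }

    public double x(int i) { return xs[i]; }
    public double y(int i) { return ys[i]; }
}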
One of the memory problems that are easy to overlook in Java is memory leakage. Nicholas Greene already pointed you to memory profiling.
Many people assume that Java's garbage collection prevents memory leaks, but that is not actually true - all it takes is one forgotten reference somewhere to keep an object around in perpetuity. Paradoxically, trying to optimize your program may introduce more opportunities for memory leaks because you end up with more complex data structures.
One example of a memory leak, if you are implementing, for instance, a stack:
Integer stack[];
stack = new Integer[10];
int stackPtr = 0;
// a few push operations on our stack:
stack[stackPtr++] = new Integer(5);
stack[stackPtr++] = new Integer(3);
// and pop from the stack again
--stackPtr;
--stackPtr;
// at this point, the stack is logically empty, but
// the Integer objects are still referenced by the array,
// and are basically leaked.
The correct solution would have been:
stack[--stackPtr] = null;
If you have high performance constraints and need to use collections for simple types, you might take a look on some implementations of Primitive Collections for Java.
Some are:
HPPC
GNU Trove
Apache Commons Primitives
Also, as a reference take a look at this question: Why can Java Collections not directly store Primitives types?
Luís Bianchin already gave you a few libraries which implement optimal collections in Java.
Nevertheless, it seems that you are especially concerned about Java collections' memory allocation. In that case, there are a few alternatives which are quite straightforward.
Cache
You could use a cache to limit the memory the collection (the cache) can allocate. By doing that, you only load the most frequently used entries into main memory and you don't need to load the whole data set from disk/network/whatever. I highly recommend Guava Cache as it's very well documented and pretty mature.
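A minimal sketch, assuming Guava is on the classpath; loadFromDisk is a hypothetical stand-in for whatever backing store you use:
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class BoundedCacheDemo {
    // Keep at most 10,000 entries in memory; entries are evicted (roughly
    // least-recently-used first) once the bound is reached.
    private final LoadingCache<String, byte[]> cache = CacheBuilder.newBuilder()
            .maximumSize(10_000)
            .build(new CacheLoader<String, byte[]>() {
                @Override
                public byte[] load(String key) {
                    return loadFromDisk(key); // hypothetical backing-store lookup
                }
            });

    public byte[] get(String key) {
        return cache.getUnchecked(key);
    }

    private byte[] loadFromDisk(String key) {
        return new byte[0]; // stand-in for a real disk/network read
    }
}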
Persistent Collections
Sometimes a cache is not a solution for your problem. For example, in an ETL solution, you might know you will only load each entry once. For this scenario I recommend going with persistent collections. These are disk-stored collections that are way faster than traditional databases but have nice Java APIs.
MapDB and PCollections are for me the best libraries.
Profile memory usage
On top of that, if you really want to know the actual state of your program's memory allocation I highly recommend you to use a profiler. This way you will not only know how much memory you collections occupy, but also how the GC behaves over time.
In fact, you should only try an alternative to Java's collections and data structures if there is an actual memory problem, and that is something a profiler can tell you.
The JDK has a profiler called VisualVM which does a great job. Nevertheless, I recommend you to use a commercial profiler if you can afford it. The commercial profilers usually have a low impact in the application's performance when compared to VisualVM.
Memory-optimal data over the network
Finally, this is not strictly related to your question, but it is closely connected. In case you want to serialize your Java objects into an optimal binary representation, I recommend Google Protocol Buffers in Java. Protocol buffers are ideal for transferring data structures through the network using the least bandwidth possible, with really fast encoding/decoding.
Well, there are a lot of things you can do.
Here are a few problems and solutions:
When you change the value of a string in Java, the string is not actually overwritten. Instead, a new string is created to replace the old one, and the old string still exists. This can be a problem when using RAM efficiently is a concern. Here are some solutions to this problem:
When using a string to specify something like the "state" of an object, or anything else that can only have a specific set of possible values, don't use a string. Use an enum instead (a short example follows below).
If you are using a string as a variable whose value will change at some point in the program, don't build it the way you usually would. Instead, use the StringBuilder class from the java.lang package. StringBuilder maintains a single mutable character buffer: when you change its contents, it doesn't create a duplicate string with a different value to replace the old one, it modifies the buffer in place. Since you aren't creating duplicate strings, this saves RAM; only the final toString() call produces a String.
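Here is the enum example promised above (Job and its states are hypothetical names):
public class Job {
    // A fixed set of possible states, instead of state strings like "RUNNING".
    public enum State { PENDING, RUNNING, STOPPED }

    private State state = State.PENDING;

    public void start() {
        state = State.RUNNING;
    }

    public State getState() {
        return state;
    }
}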
Writer and reader objects such as FileWriter and FileReader also take up RAM, and they hold on to underlying system resources. If you have a lot of them, this can also cause problems. Here is a solution:
All reader and writer objects have a method called close(). As you can probably guess, it closes the reader or writer and releases the underlying resources (file handles, buffers, connections). Whenever you reach the point in your code where you know you will never use a reader or writer again, call close() on it (or, better, open it in a try-with-resources block) and drop the reference so the object itself can be garbage collected.
Every object in Java takes up memory, so there is no point keeping an object around once you won't use it anymore.
Note, however, that unlike close(), the Object method finalize() is not something you should call to get rid of an object: it is invoked by the garbage collector itself (and has been deprecated since Java 9). The way to free the memory is simply to drop every reference to the object (let it go out of scope or set the references to null) so the GC can reclaim it.
Beware of early optimisation.
See When is optimisation premature?
While not knowing the exact requirements of your application or runtime environment, in my experience Java was able to handle anything I threw at it. Doing some profiling on your demo/proof-of-concept app might be time well spent if performance or garbage collection (you tagged memory-leaks) is an issue.
Java strings are immutable, and multiple Strings created from the same literal value refer to the same object. (Is there a term for this? "Pooling" seems to fit, but that already refers to caching instances to save time by doing fewer instantiations.)
Does Java also do this (the thing without a term) with other (user-defined) classes that are immutable? Can Java even detect that a class is immutable, or is this something unique to the string class?
With regard to Strings, the word you're looking for is interning.
Java won't do this for your own immutable objects. It does have cached versions of boxed primitives, though. See this article on wrapper class caching for more info.
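A small demonstration of that boxed-primitive caching (the cached range for Integer is -128 to 127 by default):
public class BoxedCacheDemo {
    public static void main(String[] args) {
        // Small values are served from a cache of boxed instances...
        Integer a = Integer.valueOf(127);
        Integer b = Integer.valueOf(127);
        System.out.println(a == b);   // true: same cached object

        // ...while values outside the cached range are freshly allocated.
        Integer c = Integer.valueOf(128);
        Integer d = Integer.valueOf(128);
        System.out.println(c == d);   // false: two distinct objects
    }
}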
As others here have said this process with Strings is known as interning.
It's worth mentioning that where interned Strings live changed in Java 7, even though literals with the same value still resolve to the same object. From JDK 7 onwards:
In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
Take a look at Java SE 7 RFE for the full details on this.
With regard to your own immutable objects, Java doesn't do anything special with them; it doesn't know that they're immutable. It may inline methods a little more than otherwise if it can detect that it's worthwhile/possible, but as far as the compiler and JVM are concerned they're just another object.
The term you are looking for is interning. Java interns string literals "automatically" at compile time and gives the developer the possibility to intern strings at runtime via String.intern(). (The details of what is optimized and when depend on the JVM version.)
As for immutable objects in general: I do not think Java supports any mechanism that resolves equal instances to the same object, and your own types are no exception.
The reason is that you have to use the new operator to create an instance, and new always gives you a distinct object; if you use new to create a String instance, you will likewise always get two different objects.
Interning is built in only for the String type, but the concept is free: you can add a similar factory method to your own immutable class that keeps a pool and returns a canonical instance.
String interning. Wikipedia: String Interning
String interning is unique to the String class. As far as I know, the JVM does not apply these rules to user-defined classes.
Several languages have chosen to make strings immutable, such as C#, Java, and Python. If the intent is to save memory or gain efficiency for operations like comparison, what effect does it have on concatenation and other modifying operations?
Immutable types are a good thing generally:
They work better for concurrency (you don't need to lock something that can't change!)
They reduce errors: mutable objects are vulnerable to being changed when you don't expect it which can introduce all kinds of strange bugs ("action at a distance")
They can be safely shared (i.e. multiple references to the same object) which can reduce memory consumption and improve cache utilisation.
Sharing also makes copying a very cheap O(1) operation when it would be O(n) if you have to take a defensive copy of a mutable object. This is a big deal because copying is an incredibly common operation (e.g. whenever you want to pass parameters around....)
As a result, it's a pretty reasonable language design choice to make strings immutable.
Some languages (particularly functional languages like Haskell and Clojure) go even further and make pretty much everything immutable. This enlightening video is very much worth a look if you are interested in the benefits of immutability.
There are a couple of minor downsides for immutable types:
Operations that create a changed string like concatenation are more expensive because you need to construct new objects. Typically the cost is O(n+m) for concatenating two immutable Strings, though it can go as low as O(log (m+n)) if you use a tree-based string data structure like a Rope. Plus you can always use special tools like Java's StringBuilder if you really need to concatenate Strings efficiently.
A small change on a large string can result in the need to construct a completely new copy of the large String, which obviously increases memory consumption. Note however that this isn't usually a big issue in garbage-collected languages since the old copy will get garbage collected pretty quickly if you don't keep a reference to it.
Overall though, the advantages of immutability vastly outweigh the minor disadvantages. Even if you are only interested in performance, the concurrency advantages and cheapness of copying will in general make immutable strings much more performant than mutable ones with locking and defensive copying.
It's mainly intended to prevent programming errors. For example, Strings are frequently used as keys in hashtables; if they could change, the hashtable would become corrupted. And that's just one example where having a piece of data change while you're using it causes problems. Security is another: if you're checking whether a user is allowed to access a file at a given path before executing the operation they requested, the string containing the path had better not be mutable...
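To see the hashtable problem concretely, here is a sketch with a deliberately mutable key type (MutableKey is a made-up class for illustration; a String key could never change like this):
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    static class MutableKey {
        int id;
        MutableKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }
        @Override public int hashCode() { return 31 + id; }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");

        key.id = 2; // mutate the key after insertion: its hash code changes

        // The entry now sits in the bucket for the old hash code and is
        // effectively lost to normal lookups.
        System.out.println(map.get(new MutableKey(1))); // null
        System.out.println(map.get(new MutableKey(2))); // typically null as well
    }
}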
It becomes even more important when you're doing multithreading. Immutable data can be safely passed around between threads while mutable data causes endless headaches.
Basically, immutable data makes the code that works on it easier to reason about. Which is why purely functional languages try to keep everything immutable.
In Java, not only String but all the primitive wrapper classes (Integer, Double, Character, etc.) are immutable. I am not sure of the exact reason, but I think these are the basic data types on which everything else is built, and if they could change, things could go wild. To be more specific, I'll use an example: say you have opened a socket connection to a remote host. The host name would be a String and the port an Integer. What if these values could be modified after the connection is established?
As far as performance is concerned, Java keeps String literals in a dedicated string pool (part of the heap; prior to Java 7 it lived in the permanent generation) rather than allocating a fresh object for each use. The pool is indexed, and if you use the literal "String" twice, both references point to the same pooled object.
Having strings immutable also makes obtaining new string references cheap, as the same string will be readily available from the pool of previously created Strings, thereby reducing the cost of new object creation.
I get asked this question many times. What is a good way to answer it?
Can there be a memory leak in Java?
The answer is that it depends on what kind of memory leak you are talking about.
Classic C / C++ memory leaks occur when an application neglects to free or dispose an object when they are done with it, and it leaks. Cyclic references are a sub-case of this where the application has difficulty knowing when to free / dispose, and neglects to do it as a result. Related problems are where the application uses an object after it has been freed, or attempts to free it twice. (You could call the latter problems memory leaks, or just bugs. Either way ... )
Java and other (fully1) managed languages mostly don't suffer from these problems because the GC takes care of freeing objects that are no longer reachable. (Certainly, dangling pointer and double-free problems don't exist, and cycles are not problematic as they are for C / C++ "smart pointers" and other reference count schemes.)
But in some cases GC in Java will miss objects that (from the perspective of the programmer) should be garbage collected. This happens when the GC cannot figure out that an object cannot be reached:
The logic / state of the program might be such that the execution paths that would use some variable cannot occur. The developer can see this as obvious, but the GC cannot be sure, and errs on the side of caution (as it is required to).
The programmer could be wrong about it, and the GC is avoiding what might otherwise result in a dangling reference.
(Note that the causes of memory leaks in Java can be simple, or quite subtle; see @jonathan.cone's answer for some subtle ones. The last one potentially involves external resources that you shouldn't rely on the GC to deal with anyway.)
Either way, you can have a situation where unwanted objects cannot be garbage collected, and hang around tying up memory ... a memory leak.
Then there is the problem that a Java application or library can allocate off-heap objects via native code that need to be managed manually. If the application / library is buggy or is used incorrectly, you can get a native memory leak. (For example: Android Bitmap memory leak ... noting that this problem is fixed in later versions of Android.)
1 - I'm alluding to a couple of things. Some managed languages allow you to write unmanaged code where you can create classic storage leaks. Some other managed languages (or more precisely language implementations) use reference counting rather than proper garbage collecting. A reference count-based storage manager needs something (i.e. the application) to break cycles ... or else storage leaks will ensue.
Yes. Memory leaks can still occur even when you have a GC. For example, you might hold on to resources such as database result sets which you must close manually.
Well, considering that java uses a garbage collector to collect unused objects, you can't have a dangling pointer. However, you could keep an object in scope for longer than it needs to be, which could be considered a memory leak. More on this here: http://web.archive.org/web/20120722095536/http://www.ibm.com:80/developerworks/rational/library/05/0816_GuptaPalanki/
Are you taking a test on this or something? Because that's at least an A+ right there.
The answer is a resounding yes, but this is generally a result of the programming model rather than an indication of some defect in the JVM. This is common when frameworks have lifecycles different of that than a running JVM. Some examples are:
Reloading a context
Failing to unregister observers (listeners); see the sketch after this list
Forgetting to clean up resources after you're finished using them *
* - Billions of consulting dollars have been made resolving the last one
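As a sketch of the listener case (EventBus is a made-up name standing in for any long-lived registry):
import java.util.ArrayList;
import java.util.List;

// A long-lived object that keeps strong references to everything registered with it.
public class EventBus {
    private final List<Runnable> listeners = new ArrayList<>();

    public void register(Runnable listener) {
        listeners.add(listener);
    }

    // If callers forget to invoke this, every registered listener (plus whatever
    // state it captures) stays reachable for the lifetime of the bus: a leak.
    public void unregister(Runnable listener) {
        listeners.remove(listener);
    }

    public void fire() {
        listeners.forEach(Runnable::run);
    }
}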
Yes, in the sense that your Java application can accumulate memory over time that the garbage collector is unable to free.
By maintaining references to unneeded/unwanted objects, they will never fall out of scope and their memory will never be claimed back.
Yes, if you don't de-reference objects they will never be garbage collected and memory usage will increase. However, because of how Java is designed, this is difficult to achieve, whereas in some other languages it is sometimes difficult not to achieve.
Edit: read Amokrane's link, it's good.
Yes it is possible.
In Effective Java there is an example involving a stack implemented using arrays. If your pop operations simply decrement the index value it is possible to have a memory leak. Why? Because your array still has a reference to the popped value and you still have a reference to the stack object. So the correct thing to do for this stack implementation would be to clear the reference to the popped value using an explicit null assignment at the popped array index.
The short answer:
A competent JVM has no memory leaks, but more memory can be used than is needed, because not all unused objects have been garbage collected yet. Also, Java apps themselves can hold references to objects they no longer need, and this can result in a memory leak.
The book Effective Java gives two more reasons for "memory leaks":
Once you put an object reference in a cache and forget that it's there, the reference remains in the cache long after it has become irrelevant. The solution is to represent the cache as a WeakHashMap (sketched below).
In an API where clients register callbacks and don't deregister them explicitly. The solution is to store only weak references to them.
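A minimal sketch of the WeakHashMap-backed cache from the first point (the class and method names are my own):
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCache<K, V> {
    // Keys are held via weak references: once a key is no longer strongly
    // referenced elsewhere, its entry becomes eligible for garbage collection.
    private final Map<K, V> cache = new WeakHashMap<>();

    public void put(K key, V value) {
        cache.put(key, value);
    }

    public V get(K key) {
        return cache.get(key);
    }
}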
Yes, it can happen, in the sense that a program can mistakenly hold a reference to an object that will never be used again, so it is never cleaned up by the GC.
An example would be forgetting to close an opened stream:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class MemoryLeak {
    private void startLeaking() throws IOException {
        StringBuilder input = new StringBuilder();
        URLConnection conn = new URL("http://www.example.com/file.txt").openConnection();
        // br is never closed, so the socket and buffers it holds are only
        // released if and when the GC eventually finalizes it.
        BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
        String line;
        while ((line = br.readLine()) != null) {
            input.append(line);
        }
    }

    public static void main(String[] args) throws IOException {
        MemoryLeak ml = new MemoryLeak();
        ml.startLeaking();
    }
}
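One way to avoid this is try-with-resources (available since Java 7), which closes the reader even if an exception is thrown; this is a sketch of the same download without the leak:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class NoLeak {
    String readAll() throws IOException {
        StringBuilder input = new StringBuilder();
        // The reader (and the stream beneath it) is closed automatically
        // when the try block exits, normally or exceptionally.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new URL("http://www.example.com/file.txt").openConnection().getInputStream(),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                input.append(line);
            }
        }
        return input.toString();
    }
}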
One simple answer: the JVM takes care of all your plain old Java objects (POJOs) as long as you are not working with JNI. With JNI, if you allocate memory in native code, you have to manage that memory yourself.
Yes. A memory leak is unused memory not released to the memory manager by the app.
I've seen Java code many times which stores items in a data structure but never removes them, filling memory until an OutOfMemoryError:
void f() {
List<Integer> w = new ArrayList<Integer>();
while (true) {
w.add(new Integer(42));
}
}
While this example is too obvious, Java memory errors tend to be more subtle. For example, using Dependency Injection and storing a huge object in a component with SESSION scope, without releasing it when it is no longer used.
On a 64-bit VM this tends to get worse, since swap space starts to fill up until the system grinds to a halt from too many I/O operations.