I'm using an external library which uses String.intern() for performance reasons. That's fine, but I'm invoking that library a lot in a given run and so I run into the dreaded
java.lang.OutOfMemoryError: PermGen space
Obviously I can use the JVM command-line -XX:MaxPermSize modifier, but that solution isn't very scalable. Instead, is there any way to periodically (between two "batches" of library calls) "flush" the interned string pool, i.e. empty the static table of strings held by the String class?
No. Just size permgen appropriately. It's no different to having to size the heap appropriately. Don't be afraid!
Investigating further, I found this article, which seems to demonstrate that interned strings are still garbage collected. I guess that means that my problem here is a deeper one - the library I use must still hold a living reference to these strings :(
Related
We know that String create object in heap and scp based on situation but what if String use only scp for every situation, so that we can save some memory space
Firstly, despite what you have heard or read, the Oracle documentation does not mention a thing called the "string constant pool". (Or "scp".) In fact, there are two distinct things:
The Constant Pool, which is part of the ".class" file format and represents many kinds of constants emitted by the compiler.
The String Pool is a runtime data structure that is primarily used to implement certain properties about String objects that originated as the result of compile time constant expressions.
But while the latter holds "constants", it can also hold String objects that were placed there by calling String.intern. So in that respect it is not a string "constant" pool. Alternatively, all String objects are immutable (constant), so from that perspective the "constant" in string constant pool is redundant.
In addition you say:
We know that String create object in heap and scp based on situation.
In a modern JVM (Java 7 or later), the strings in the string pool are actually in the regular heap.
The only situations where the JVM puts a string into the string pool are:
when creating a String object corresponding to a String-valued constant-expression in a .class file, or
when application code calls the String.intern method.
No other string constructors or methods do it, and (AFAIK) none of the standard Java SE libraries ever use intern().
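The two cases above can be checked with reference comparisons. This is a minimal sketch; the class and method names are made up for illustration, and == is used here deliberately to compare object identity, not content.

```java
public class StringPoolDemo {
    // Two identical literals resolve to the same pooled object.
    static boolean literalsShareInstance() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // A compile-time constant expression is folded by the compiler and pooled as well.
    static boolean constantExpressionIsPooled() {
        return ("hel" + "lo") == "hello";
    }

    // A constructor call always creates a fresh object that is not the pooled one.
    static boolean constructorCreatesNewObject() {
        String pooled = "hello";
        String copy = new String("hello");
        return copy != pooled && copy.equals(pooled);
    }

    // intern() returns the canonical pooled instance for equal content.
    static boolean internReturnsPooledInstance() {
        String copy = new String("hello");
        return copy.intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());
        System.out.println(constantExpressionIsPooled());
        System.out.println(constructorCreatesNewObject());
        System.out.println(internReturnsPooledInstance());
    }
}
```

In application code you would still compare strings with equals(); identity comparison is only used here to make the pooling visible.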
So to answer your question:
Why String doesn't use scp only?
Because when the String is not a duplicate, putting it into the pool doesn't save memory. Rather it uses more memory.
Because a String in the string pool tends to live longer beyond the point where it becomes unreachable. (This is certainly true for Java versions where the string pool was in the PermGen heap.) So you may end up using the memory for a pooled String for longer than if it hadn't been pooled. That can also mean that more memory is used overall.
Because searching the string pool each time you created a new string would be (relatively) expensive.
Because the string pool creates more work for the garbage collector. The pool is a native hash table data structure that contains references that are akin to Reference types. Searching or scanning the table to remove strings that are no longer reachable costs extra GC time.
Because ... frankly ... the percentage of memory used by (aka "wasted" on) duplicate strings is not significant in most applications.
Java tends to be memory hungry by virtue of being a garbage collected language. But it is memory hungry irrespective of string pooling. So if your application requirements include running in a minimal memory footprint, Java string pooling is not the solution. You should probably be using a different programming language.
In fact, since Java 8 update 20 there is a better way to save memory used by duplicate strings: enable the G1 collector's string deduplication feature (-XX:+UseStringDeduplication). It is more efficient than interning because the deduping is only done on strings that have survived a number of new-space garbage collections. This reduces the wasted effort on deduping strings that turn out to be short lived.
Java programs can be very memory hungry. For example, a Double object has 24 bytes: 8 bytes of data and 16 bytes of JVM-imposed overhead. In general, the objects that represent the primitive types are very expensive.
The same happens for any collection in the Java Standard Library. There are even some counterintuitive facts such as a HashSet being more memory hungry than a HashMap, since a HashSet contains a HashMap inside (http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html).
Could you come up with some advice when modeling data and delegation of objects in high performance settings so that these "weaknesses" of Java are mitigated?
Some techniques I use to reduce memory:
Make your own IntArrayList (etc) class that prevents boxing
Make your own IntHashMap (etc) class where keys are primitives
Use nio's ByteBuffer to store large arrays of data efficiently (and in native memory, outside heap). It's like a byte array but contains methods to store/retrieve all primitive types from the buffer at any arbitrary offset (trade memory for speed)
Don't use pooling because pools keep unused instances explicitly alive.
Use threads sparingly, they're super memory hungry (in native memory, outside heap)
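When making substrings of big strings and discarding the original, the substrings may still refer to the original's character array (this was the case before Java 7 update 6; later versions copy the characters). On those older versions, use new String so that the old big string can be collected.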
A linear array is smaller than a multidimensional array, and if the size of all but the last dimension is a power of two, calculating indices is fastest: array[x|y<<4] for a 16xN array.
Initialize collections and StringBuilder with an initial capacity chosen such that it prevents internal reallocation in a typical circumstance.
Use StringBuilder instead of string concatenation, because the compiled class files use new StringBuilder() without initial capacity to concatenate strings.
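As an illustration of the first technique in the list, here is a minimal sketch of a growable list of int values that never boxes. The class name IntArrayList comes from the list above; the method set and the growth policy (doubling) are assumptions chosen for brevity, not a reference implementation.

```java
import java.util.Arrays;

public class IntArrayList {
    private int[] data;
    private int size;

    // The initial capacity lets the caller avoid reallocation in the typical case.
    public IntArrayList(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("negative capacity: " + initialCapacity);
        }
        data = new int[initialCapacity];
    }

    // Appends a value, doubling the backing array when it is full.
    public void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, Math.max(4, data.length * 2));
        }
        data[size++] = value;
    }

    // Returns the value at the given position without creating an Integer.
    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    public int size() {
        return size;
    }
}
```

Each element costs 4 bytes in the backing array, instead of a reference plus a separate Integer object per element in an ArrayList<Integer>.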
Depends on the application, but generally speaking
Layout data structures in (parallel) arrays of primitives
Try to make big "flat" objects, inlining otherwise sensible sub-structures
Specialize collections of primitives
Reuse objects, use object pools, ThreadLocals
Go off-heap
I cannot say these practices are "best", because they, unfortunately, make you suffer, losing the point why you are using Java, reduce flexibility, supportability, reliability, testability and other "good" properties of the codebase.
But, they certainly allow to lower memory footprint and GC pressure.
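To make the "parallel arrays of primitives" idea concrete, here is a rough sketch that stores 2D points in two double arrays instead of one object per point. ParticleStore and its methods are hypothetical names, and the fixed capacity is a simplification.

```java
public class ParticleStore {
    // One array per field instead of one object per particle.
    private final double[] xs;
    private final double[] ys;
    private int count;

    public ParticleStore(int capacity) {
        xs = new double[capacity];
        ys = new double[capacity];
    }

    // Stores a particle and returns its index, which acts as its "handle".
    public int add(double x, double y) {
        xs[count] = x;
        ys[count] = y;
        return count++;
    }

    public double distanceFromOrigin(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index + ", count " + count);
        }
        return Math.sqrt(xs[index] * xs[index] + ys[index] * ys[index]);
    }

    public int count() {
        return count;
    }
}
```

This avoids the per-object header and the reference in a Point[] array, at the cost of the readability and encapsulation that a real Point class would give you.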
One of the memory problems that are easy to overlook in Java is memory leakage. Nicholas Greene already pointed you to memory profiling.
Many people assume that Java's garbage collection prevents memory leaks, but that is not actually true - all it takes is one forgotten reference somewhere to keep an object around in perpetuity. Paradoxically, trying to optimize your program may introduce more opportunities for memory leaks because you end up with more complex data structures.
One example of a memory leak, if you are implementing, for instance, a stack:
Integer stack[];
stack = new Integer[10];
int stackPtr = 0;
// a few push operation on our stack.
stack[stackPtr++] = new Integer(5);
stack[stackPtr++] = new Integer(3);
// and pop from the stack again
--stackPtr;
--stackPtr;
// at this point, the stack is logically empty, but
// the Integer objects are still referenced by the array,
// and are basically leaked.
The correct solution would have been:
stack[--stackPtr] = null;
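Put together, the fix might look like the following small stack class. This is a sketch under the same assumptions as the snippet above (an array-backed stack of Integer); the class name and the slotIsCleared helper are invented for the example.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

public class IntegerStack {
    private Integer[] elements = new Integer[10];
    private int size;

    public void push(Integer value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
    }

    public Integer pop() {
        if (size == 0) {
            throw new NoSuchElementException("stack is empty");
        }
        Integer top = elements[--size];
        // Clear the slot so the array no longer keeps the popped object reachable.
        elements[size] = null;
        return top;
    }

    public int size() {
        return size;
    }

    // Exposed only so the example can show that a slot was cleared.
    public boolean slotIsCleared(int index) {
        return elements[index] == null;
    }
}
```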
If you have high performance constraints and need to use collections for simple types, you might take a look at some implementations of Primitive Collections for Java.
Some are:
HPPC
GNU Trove
Apache Commons Primitives
Also, as a reference take a look at this question: Why can Java Collections not directly store Primitives types?
Luís Bianchin already gave you a few libraries which implement optimal collections in Java.
Nevertheless, it seems that you are especially concerned about Java collections' memory allocation. In that case, there are a few alternatives which are quite straightforward.
Cache
You could use a cache to limit the memory the collection (the cache) can allocate. By doing that, you only load into main memory the most frequently used entries and you don't need to load the whole data set from disk/network/whatever. I highly recommend Guava Cache as it's very well documented and pretty mature.
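To show the bounded-cache idea without pulling in a dependency, here is a minimal sketch built on the JDK's LinkedHashMap. Note that this is not Guava Cache: it is a simple least-recently-used map swapped in for illustration, it is not thread-safe, and the class name LruCache is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        // The third argument (accessOrder = true) orders entries by most recent access.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was used more recently.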
Persistent Collections
Sometimes a cache is not a solution for your problem. For example, in an ETL solution, you might know you will only load each entry once. For this scenario I recommend to go for persistent collections. These are disk stored collections that are way faster than traditional databases but have nice Java APIs.
MapDB and PCollections are for me the best libraries.
Profile memory usage
On top of that, if you really want to know the actual state of your program's memory allocation, I highly recommend using a profiler. This way you will not only know how much memory your collections occupy, but also how the GC behaves over time.
In fact, you should only try an alternative to Java's collections and data structures if there is an actual memory problem, and that is something a profiler can tell you.
The JDK has a profiler called VisualVM (bundled with JDK 6 through 8, and a separate download since) which does a great job. Nevertheless, I recommend using a commercial profiler if you can afford it. The commercial profilers usually have a lower impact on the application's performance when compared to VisualVM.
Memory optimal data is nice with the network.
Finally, although it's not strictly related to your question, it's closely connected. In case you want to serialize your Java objects into an optimal binary representation, I recommend Google Protocol Buffers in Java. Protocol buffers are ideal for transferring data structures through the network using the least bandwidth possible, with really fast encoding/decoding.
Well, there are a lot of things you can do.
Here are a few problems and solutions:
When you change the value of a string in java, the string is not actually overwritten. Instead, a new string is created to replace the old one. However, the old string still exists. This can be a problem when using RAM efficiently is a concern. Here are some solutions to this problem:
When using a string to specify something like the "state" of an object or anything else that can only have a specific set of possible values, don't use a string. Instead use an enum. If you don't know what an enum is or how to use one yet, here's a link to a tutorial on what enums are and how to use them!
If you are using a string as a variable whose value will change at some point in the program, don't define a string how you usually would. Instead, use the StringBuilder class from the java.lang package. StringBuilder is a class which is used to build strings and change their contents. It handles text differently from String: when you modify it, StringBuilder doesn't create a new string to replace the old one, it changes the characters in its own internal buffer (growing the buffer only when it runs out of room). Therefore, since you aren't creating a new string for every change, this saves RAM. Here is a link to the StringBuilder class in the Java API.
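A small sketch of the difference, with made-up class and method names: both methods produce the same text, but the first creates a new String on every loop iteration while the second appends into one buffer.

```java
public class JoinDemo {
    // Each += builds a brand-new String and discards the previous one.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part + ",";
        }
        return result;
    }

    // One StringBuilder, sized up front so its buffer never has to grow.
    static String joinWithBuilder(String[] parts) {
        int capacity = 0;
        for (String part : parts) {
            capacity += part.length() + 1;
        }
        StringBuilder builder = new StringBuilder(capacity);
        for (String part : parts) {
            builder.append(part).append(',');
        }
        return builder.toString();
    }
}
```

For a handful of short strings the difference is negligible; it starts to matter when the loop runs many times or the strings are large.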
Writer and reader objects such as fileWriters and fileReaders also take up RAM. If you have a lot of them, this can also cause problems. Here are some solutions:
All reader and writer objects have a method called close(). As you can probably guess, it closes the writer or reader object. It does not delete the object itself; it releases the underlying resources the object holds, such as file handles and buffers. Whenever you reach the point in your code when you know you will never use a reader or writer again, call this method (or declare the object in a try-with-resources statement, which calls it for you). The object itself is reclaimed by the garbage collector once nothing references it.
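One way to make sure close() is always called is the try-with-resources statement (Java 7 and later). The following is a minimal sketch; LineCounter and countLines are hypothetical names, and wrapping the IOException in an UncheckedIOException is just a choice made to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

public class LineCounter {
    // Counts the lines in the source and closes it, even if reading fails.
    static int countLines(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // reader.close() has run by this point, which also closes the wrapped source.
    }
}
```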
Every object in java takes up memory. When you have an object that you won't use anymore, it's not very convenient to keep it around.
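The Object class has a method called finalize(), but calling it does not free anything: it is a hook that the garbage collector may invoke before reclaiming an object, and it has been deprecated since Java 9. To let an unused object be reclaimed, drop every reference to it (for example, let the variable go out of scope, or set a long-lived field to null) and the garbage collector will free the memory.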
Beware of early optimisation.
See When is optimisation premature?
While I don't know the exact requirements of your application or runtime environment, in my experience Java was able to handle anything I threw at it. Doing some profiling on your demo/proof-of-concept app might be time well spent if performance or garbage collection (you tagged memory leaks) is an issue.
Java strings are immutable, and instantiating multiple Strings with the same values returns the same object pointer. (Is there a term for this? "pooling" seems to fit, but that already refers to doing caching to save time by doing fewer instantiations.)
Does Java also do this (the thing without a term) with other (user-defined) classes that are immutable? Can Java even detect that a class is immutable, or is this something unique to the string class?
Wrt. Strings, the word you're looking for is interning.
Java won't do this for your own immutable objects. It does have cached versions of boxed primitives, though. See this article on wrapper class caching for more info.
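The boxed-primitive caching can be observed directly. This is a small sketch with invented names; it only relies on the range -128 to 127, which the language specification guarantees to be cached for Integer. Values outside that range may or may not share an instance, so they should always be compared with equals().

```java
public class BoxedCacheDemo {
    // Integer.valueOf returns the cached instance for values in -128..127.
    static boolean smallValuesShareInstance() {
        Integer a = Integer.valueOf(127);
        Integer b = Integer.valueOf(127);
        return a == b;
    }

    // Autoboxing goes through the same cache as Integer.valueOf.
    static boolean autoboxingUsesCache() {
        Integer a = 100;
        Integer b = 100;
        return a == b;
    }

    // Outside the guaranteed range, only equals() is reliable.
    static boolean largeValuesAreEqual() {
        return Integer.valueOf(1000).equals(Integer.valueOf(1000));
    }
}
```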
As others here have said this process with Strings is known as interning.
It's worth mentioning that the behaviour of Strings with the same literal values being the same object may or may not be true in Java 7. From 7 onwards:
In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
Take a look at Java SE 7 RFE for the full details on this.
With regard to your own immutable objects, Java doesn't do anything special with them: it doesn't know that they're immutable. It may inline methods a little more than otherwise if it can detect that it's worthwhile/possible, but as far as the compiler and JVM are concerned they're just another object.
The term you are looking for is interning. Java optimizes strings "automatically" during compilation and gives the developer the possibility to do it at runtime. (The details about what is optimized when depend on the JVM version.)
As far as immutable objects go, I do not think that Java supports any mechanism that will resolve equal objects to the same instance. The String type is no exception to this rule.
The reason is that you have to use the new operator to create an instance. If you use new to create a string instance twice, you will always get two different objects.
Interning is available only for the String type. But the concept is free: you can add such a method to your own immutable class and write code that does the same thing.
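A do-it-yourself version of interning for an immutable class could look like the sketch below. The class InternedPoint and its factory method of() are hypothetical; the pool holds strong references, so, unlike the JVM's string pool, instances placed in it are never garbage collected.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class InternedPoint {
    // Maps each distinct value to its single canonical instance.
    private static final ConcurrentMap<InternedPoint, InternedPoint> POOL = new ConcurrentHashMap<>();

    private final int x;
    private final int y;

    // Private constructor: callers must go through of(), so the pool cannot be bypassed.
    private InternedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Returns the canonical instance for these coordinates, creating it if needed.
    public static InternedPoint of(int x, int y) {
        InternedPoint candidate = new InternedPoint(x, y);
        InternedPoint existing = POOL.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof InternedPoint)) {
            return false;
        }
        InternedPoint that = (InternedPoint) other;
        return x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}
```

Two calls to InternedPoint.of(1, 2) return the same object, which is the same guarantee String.intern() gives for strings with equal content.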
String interning. Wikipedia: String Interning
String Interning is unique to String class only. I suppose that JVM does not apply these rules for a user defined classes.
Once a String object is created, we can't modify it. But if we perform any operation on it, the JVM creates a new object. By creating new objects the JVM consumes more memory. So I think this causes a memory issue, right?
You are correct. It is definitely worth being aware of this issue, even if it doesn't affect you every time.
As you say, Strings cannot change after creation - they're immutable and they don't expose many ways to change them.
However, operations such as split() will generate additional string objects in the background, and each of those strings has a memory overhead if you are holding onto references to them.
As the other posters note, the objects will be small and garbage collection will usually clean up the old ones after they have gone out of scope, so you generally won't have to worry about this.
However, if you're doing something specific and holding onto large amounts of string references then this could bite you.
Look at String interning depending on your use case, noting the warnings on the linked page.
Two things to note:
1) Hard coded String literals will be automatically interned by Java, reducing the impact of this.
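2) A single expression using the + operator is reasonably efficient in this regard: the compiler turns it into StringBuilder calls (or an equivalent optimized strategy on newer JDKs), so it does not create an intermediate String per operand.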
No, that does not. If you do not hold strong links to String instances they eventually will be collected by a garbage collector.
For example:
while (true) {
new String("that is a string");
}
In this snippet you continuously create new object instances; however, you will never get an OutOfMemoryError, as the created instances become garbage (there are obviously no strong references to them).
It consumes more memory for new objects, that's right. But that fact in itself does not create an issue, because garbage collector promptly reclaims all inaccessible memory. Of course you can turn it into an issue by creating links to the newly created strings, but that would be an issue of your program, not of JVM.
The biggest memory issue you have to know about is taking a small substring of a huge string. Before Java 7 update 6, that substring shares the original string's char array, so even if the original string gets gc'd, the huge char array is still referenced by the substring. The workaround is to use new String(hugeString.substring(i)). From 7u6 onwards substring copies the characters, so the workaround is no longer needed.
The issue that is generated is the fact that garbage is generated. This issue is resolved by the virtual machine by calling the garbage collector which frees the memory used by that garbage.
As soon as the old object is not used anymore, it can be removed by the garbage collector. (Which will be done far before any memory issue arises).
If you want to prevent the copying of the data, use a StringBuilder.
Unused objects are collected by GC.
Also, immutability has many benefits in Java.
In Java, achieving as much immutability as possible is a good practice.
Immutable objects can also be used safely in the Collections framework.
Check this
As far as I know, StringBuilder (or StringBuffer if you need thread safety) is useful for manipulating strings as mutable data.
Changing a few characters in a huge string this way does not consume many extra bytes of memory.
It is also faster for concatenation.
Since a string instance is immutable it can be reused by the jvm. The String class is implemented with Flyweight Design Pattern that is used to avoid memory issues.
In Java, string literals in the String Constant Pool are not garbage collected,
since they are referenced from a table of references which is created by the runtime in order to optimize space.
Since each string in the string literal pool is referenced, it will not be eligible for GC.
If the size of the string literal pool exceeds its limit, how is that handled by the JVM?
There is a long discussion with real code examples at JavaRanch.
The general output is the following:
If a string is added to constant pool at RUNTIME using String.intern(), it can be garbage collected after it is no longer in use. Most probably, the string pool keeps only soft references to added strings, thus allowing to garbage collect them (can't be sure, because String.intern() is a native method).
If a string is a COMPILE time constant, it is added to the constant pool of the corresponding class. Therefore, it can be garbage collected only after the class is unloaded.
The answer to your question is: the only way you can get an OutOfMemoryError because of String constants is to load lots of classes with many string literals computed at compile time. Then you can eventually exceed the maximum size of PermGen space. But this will happen at the time you load classes into memory (e.g., start your application, deploy a project to a web server, dynamically load a new library, etc.)
String literals can be collected when they are no longer needed. Usually this is not a problem if they appear in classes because there are other limits you are likely to reach if you attempt to load lots of classes e.g. Maximum Perm Gen.
Generally speaking, developers are smart enough not to overuse the string literal pool and instead use databases or files to load the bulk of their data if it's of a non-trivial size.
You can introduce a problem if you use String.intern() a lot in an attempt to optimise the space of your system. String.intern() is not free and becomes increasingly expensive if you add a large number (millions) of string into it. If this is a performance problem, it should be reasonably obvious to the developer when it is.