Java strings are immutable, and instantiating multiple Strings with the same values returns the same object pointer. (Is there a term for this? "pooling" seems to fit, but that already refers to doing caching to save time by doing fewer instantiations.)
Does Java also do this (the thing without a term) with other (user-defined) classes that are immutable? Can Java even detect that a class is immutable, or is this something unique to the string class?
With regard to Strings, the word you're looking for is interning.
Java won't do this for your own immutable objects. It does have cached versions of boxed primitives, though. See this article on wrapper class caching for more info.
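A minimal sketch of what interning and the boxed-primitive cache look like in practice. The class and method names are made up for illustration. Only the -128 to 127 range of Integer.valueOf is checked here, since that is the range the specification guarantees; behaviour outside it can vary.

```java
class InterningDemo {
    // Literals with the same contents refer to one pooled String object.
    static boolean literalsShared() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // new String(...) always allocates a distinct object;
    // intern() maps it back to the pooled instance.
    static boolean newStringIsDistinct() {
        String pooled = "hello";
        String fresh = new String("hello");
        return fresh != pooled && fresh.intern() == pooled;
    }

    // Integer.valueOf is specified to cache values in the range -128..127.
    static boolean smallIntegersCached() {
        return Integer.valueOf(127) == Integer.valueOf(127);
    }

    public static void main(String[] args) {
        System.out.println(literalsShared());       // true
        System.out.println(newStringIsDistinct());  // true
        System.out.println(smallIntegersCached());  // true
    }
}
```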
As others here have said, this process with Strings is known as interning.
It's worth mentioning that where interned Strings are stored changed in Java 7, although equal String literals still refer to the same object. From 7 onwards:
In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
Take a look at Java SE 7 RFE for the full details on this.
With regard to your own immutable objects, Java doesn't do anything special with them; it doesn't know that they're immutable. It may inline methods a little more than otherwise if it can detect that it's worthwhile and possible, but as far as the compiler and JVM are concerned they're just another object.
The term you are looking for is interning. Java optimizes strings "automatically" during compilation and gives the developer the possibility to do it at runtime (via String.intern()). (The details of what is optimized when depend on the JVM version.)
As far as immutable objects in general go, I do not think that Java supports any mechanism that resolves equal values to the same instance. The String type is no exception to this rule.
The reason is that you have to use the new operator to create an instance. If you use new to create two String instances, you will always get two different objects.
Interning is available only for the String type. But the concept is free: you can add such a method to your own immutable class and write code that does the same thing.
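A rough sketch of hand-rolled interning for your own immutable class. InternedPoint and its of() factory are hypothetical names, and the pool here holds strong references, so pooled instances are never garbage collected; a real implementation might want weak references instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class InternedPoint {
    // Canonical instances, keyed by value equality.
    private static final Map<InternedPoint, InternedPoint> POOL = new ConcurrentHashMap<>();

    private final int x;
    private final int y;

    // Private constructor: callers must go through of().
    private InternedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Returns the one shared instance for each (x, y) pair.
    static InternedPoint of(int x, int y) {
        InternedPoint candidate = new InternedPoint(x, y);
        InternedPoint existing = POOL.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof InternedPoint)) return false;
        InternedPoint p = (InternedPoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        System.out.println(of(1, 2) == of(1, 2)); // true: same canonical object
        System.out.println(of(1, 2) == of(2, 1)); // false: different values
    }
}
```

Because the constructor is private, == comparison is safe for every instance a caller can obtain, which is the same property String.intern() gives you for strings.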
String interning. Wikipedia: String Interning
String interning is unique to the String class. I suppose that the JVM does not apply these rules to user-defined classes.
I have read the Oracle documentation, but it says nothing about the method area and the string constant pool. Where do the method area and the string constant pool reside in memory in JDK 8 and later?
The Java Language Specification does not specify where this lives.
It also doesn't matter. These objects end up being created, and there is no way to access their memory directly, so their location makes no difference to your code.
That's sort of how Java works: the spec says what you can and cannot rely on, and this gives JVM implementations room to do whatever they want, so long as they fulfill the contract. "Where in memory..." is a question that doesn't matter in Java, because you can't manipulate memory directly at all.
Go back to why you think you need to know and find another way; any answer to this question would be specific to some implementation of the JVM, and therefore your code wouldn't be portable. That is, any version update to the JVM, or some alternative JVM implementation such as OpenJ9 rolls along and your code just breaks, probably with a raw core dump. That doesn't sound like a good idea.
In Java 8 and later:
the method area is in metaspace
the string pool is in the regular heap.
This is an implementation detail for Oracle and OpenJDK JVMs. Other implementations may be different. But it really doesn't matter where strings and code are stored. Your application doesn't need to know.
By the way, it is called the "string pool", not the "string constant pool".
All strings are constant in the sense that they are immutable.
String variables that are declared as static final (and are constant in that sense) are not necessarily in the string pool.
Not all strings in the string pool are static final.
Not all strings in the string pool are string literals or other compile-time constant values.
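The points above can be checked with reference comparisons. This is only a sketch with made-up names: RUNTIME is static final but is built with new, so it is not the pooled object; CONSTANT is a compile-time constant expression, so it is; and a string built at run time only reaches the pool once intern() is called on it.

```java
class StringPoolDemo {
    // static final, but created at run time with new: a separate object,
    // not the pooled "ab".
    static final String RUNTIME = new String("ab");

    // A compile-time constant expression: folded to "ab" and interned.
    static final String CONSTANT = "a" + "b";

    static boolean constantIsPooled() {
        return CONSTANT == "ab";
    }

    static boolean runtimeFieldIsNotPooled() {
        return RUNTIME != "ab";
    }

    // A string that never appeared as a literal still gets one canonical
    // pooled instance once intern() is called.
    static boolean runtimeStringCanBePooled() {
        String built = new StringBuilder("made-at-").append(System.nanoTime()).toString();
        String copy = new String(built);
        return built != copy && built.intern() == copy.intern();
    }

    public static void main(String[] args) {
        System.out.println(constantIsPooled());         // true
        System.out.println(runtimeFieldIsNotPooled());  // true
        System.out.println(runtimeStringCanBePooled()); // true
    }
}
```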
What contributes to the size of a single object in memory?
I know that primitives and references would, but is there anything else?
Would the number of methods and the length of them matter?
This is completely implementation-dependent, but there are a few factors that influence object size in Java.
First, the number and types of the fields in the Java object definitely influence space usage, since you need to have at least as much storage space as is necessary to hold all of the object's fields. However, due to padding, alignment, and pointer compression optimizations, there is no direct formula you can use to compute precisely how much space is being used this way.
As for methods, typically speaking the number of methods in an object has no impact on its size. Methods are often implemented using a feature called virtual function tables (or "vtables") that make it possible to invoke methods through a base class reference in constant time. These tables are usually stored by having a single instance of the vtable shared across multiple objects, then having each object store a single pointer to the vtable.
Interface methods complicate this picture a bit, because there are several different implementations possible. One implementation adds a new vtable pointer for each interface, so the number of interfaces implemented may affect object size, while others do not. Again, it's implementation dependent how things are actually put together in memory, so you can't know for certain whether or not this will have a memory cost.
To the best of my knowledge there are no implementations of the JVM in existence today in which the length of a method influences the size of an object. Typically, only one copy of each method is stored in memory, and the code is then shared across all instances of a particular object. Having longer methods might require more total memory, but should not impact the per-object memory for instances of a class. That said, the JVM spec makes no promises that this must be the case, but I can't think of a reasonable implementation that would expend extra space per object for method code.
In addition to fields and methods, many other factors could contribute to the size of an object. Here's a few:
Depending on what type of garbage collector (or collectors) that the JVM is using, each object might have extra storage space to hold information about whether the object is live, dead, reachable, etc. This can increase storage space, but it's out of your control. In some cases, the JVM might optimize object sizes by trying to store the object on the stack instead of the heap. In this case, the overhead may not even be present for some types of objects.
If you use synchronization, the object might have extra space allocated for it so that it can be synchronized on. Some implementations of the JVM don't create a monitor for an object until it's necessary, so you may end up having smaller objects if you don't use synchronization, but you cannot guarantee that this will be the case.
Additionally, to support operators like instanceof and typecasting, each object may have some space reserved to hold type information. Typically, this is bundled with the object's vtable, but there's no guarantee that this will be true.
If you use assertions, some implementations will create a field in your class that records whether or not assertions are enabled. This is then used to disable or enable assertions at runtime. Again, this is implementation-specific (javac, for example, emits a synthetic static field named $assertionsDisabled, which belongs to the class rather than to each instance), but it's good to keep in mind.
If your class is a nonstatic inner class, it may need to hold a reference to the class that contains it so that it can access its fields. However, the JVM might optimize this away if you never end up using this.
If you use an anonymous inner class, the class may need to have extra space reserved to hold the final variables that are visible in its enclosing scope so that they can be referenced inside the class. It's implementation-specific whether this information is copied over into the class fields or just stored locally on the stack, but it can increase object size.
Some implementations of Object.hashCode() or System.identityHashCode(Object) may require extra information to be stored in each object that contains the value of that hash code if it can't compute it any other way (for example, if the object can be relocated in memory). This might increase the size of each object.
To add a bit of (admittedly vague) data to @templatetypedef's excellent answer. These numbers are for typical recent 32-bit JVMs, but they are implementation specific:
The header overhead for each object is typically 2 words for a regular object and 3 words for an array. The header includes GC related flags, and some kind of pointer to the object's actual class. For an array, an extra word is needed to hold the array size.
If you've called (directly or indirectly) System.identityHashCode() on an object, and it has survived a GC cycle, then add an extra word to store the hashcode value. (Modern JVMs use a clever trick to avoid reserving a hashcode header field for all objects ...)
The storage allocation granularity may be a multiple of words; e.g. 2.
Fields of an object are typically word aligned; i.e. they are not packed.
Elements of an array of a primitive type are packed, but booleans are typically represented by a byte in packed form.
References occupy 4 bytes both as fields and as array elements.
Things are a bit more complicated for 64-bit JVMs because of pointer compression (compressed OOPs) in some JVMs. Also, I'm not sure if fields are 32 or 64 bit aligned.
(Note: the above are based on what I've heard / read in various places from various "knowledgeable people". There is no definitive source for this kind of information apart from Oracle / Sun, and (AFAIK) they haven't published anything.)
Check out java.sizeOf in sourceforge here: http://sizeof.sourceforge.net/
AFAIK, the HBase source code contains some calculation of object size based on commonly known rules about how much space different fields occupy. The result differs between 32-bit and 64-bit systems, as the posters above all know. I didn't look into the details of why they do that, but they really do it in the source code.
Besides, the java.lang.instrument.Instrumentation class can also do it with getObjectSize(). I guess the open source project is also based on it.
This link has details of how to use it:
In Java, what is the best way to determine the size of an object?
As a comment: I am also interested in what the most meaningful use case would be for doing this in source code.
I was looking into this and just wondering does Java provide any construct to find out the size of an object?
Unfortunately not. It's relatively complex. e.g. if I create a String object, I have to consider:
the size of the fields of the objects. For primitives etc. that's simple
the size of objects referred to. Each member object is a reference, and not actually contained exclusively within the object under question. e.g. String contains a reference to a char array, but that char array can be shared across multiple Strings (see the source of substring() in older JDKs to understand how; this is known as the flyweight pattern)
the size of any native implementation details in the JVM
No. It goes against the concept of the language. In a real Object Oriented Programming language (not a hacked-together OOP support like C++), Objects are abstract concepts, not bits on the computer. Until and unless you serialize the object, it's treated like an actual object and not a sequence of bits.
Actually I think you can get the size of an object with the help of the Instrumentation class, but the process is a little more complex (you have to name the premain class in the manifest file, define an Instrumentation agent etc.) than in C++, where you have the sizeof operator at your disposal.
One disadvantage is that you get the size of the object itself but not the size of the objects it refers to (if it has any).
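A bare-bones sketch of such an agent, under some assumptions: SizeAgent is a hypothetical name, the jar's manifest would need a Premain-Class entry naming it, and the JVM would be started with -javaagent pointing at that jar. In a real agent jar the class would normally be declared public in its own file; it is left package-private here only to keep the snippet compact.

```java
import java.lang.instrument.Instrumentation;

class SizeAgent {
    // Set by the JVM through premain(); stays null if no agent was loaded.
    private static volatile Instrumentation instrumentation;

    // Invoked by the JVM before main() when the agent jar is on -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Implementation-specific estimate of the object's own size in bytes.
    // It is shallow: objects referenced from obj are not included.
    static long sizeOf(Object obj) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("SizeAgent was not loaded as a Java agent");
        }
        return inst.getObjectSize(obj);
    }
}
```

Application code would then call SizeAgent.sizeOf(someObject). Without the agent the call fails fast instead of returning a misleading number.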
Another, more rudimentary, way would be to serialise your object under test, write it to a file, and take the size of the file (use an ObjectOutputStream and get its size). For further documentation on this subject and beyond, read a little about Java agents in general and Java probing. Probing the JVM is a very helpful technique, especially if you want to analyse performance (object sizes, running threads, memory leaks etc.).
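The serialisation approach can be sketched without touching the file system by writing into a ByteArrayOutputStream. SerializedSize is a made-up helper name. Keep in mind that the result is the size of the serialised form (including stream headers and referenced objects), which is only a loose proxy for the in-memory size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializedSize {
    // Number of bytes Java serialization produces for obj.
    static int of(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(of("short"));
        System.out.println(of("a considerably longer string than the first one"));
    }
}
```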
Once a String object is created we can't modify it, but if we perform any operation on it the JVM creates a new object. Creating these new objects makes the JVM consume more memory, so I think this causes a memory issue, right?
You are correct. It is definitely worth being aware of this issue, even if it doesn't affect you every time.
As you say, Strings cannot change after creation - they're immutable and they don't expose many ways to change them.
However, operations such as split() will generate additional string objects in the background, and each of those strings has a memory overhead if you are holding onto references to them.
As the other posters note, the objects will be small and garbage collection will usually clean up the old ones after they have gone out of scope, so you generally won't have to worry about this.
However, if you're doing something specific and holding onto large amounts of string references then this could bite you.
Look at String interning depending on your use case, noting the warnings on the linked page.
Two things to note:
1) Hard coded String literals will be automatically interned by Java, reducing the impact of this.
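2) Within a single expression the + operator is reasonably efficient: the compiler turns it into StringBuilder-style code underneath. Repeated concatenation in a loop still creates a new String on every pass, so use an explicit StringBuilder there.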
No, it does not. If you do not hold strong references to String instances, they will eventually be collected by the garbage collector.
For example:
    while (true) {
        // allocated and immediately unreachable, so it is eligible for GC
        new String("that is a string");
    }
In this snippet you continuously create new object instances, but you will never get an OutOfMemoryError because the created instances immediately become garbage (there are obviously no strong references to them).
It consumes more memory for new objects, that's right. But that fact in itself does not create an issue, because the garbage collector promptly reclaims all inaccessible memory. Of course you can turn it into an issue by keeping references to the newly created strings, but that would be an issue of your program, not of the JVM.
The biggest memory issue you have to know about is taking a small substring of a huge string. In older JVMs (before Java 7 update 6), that substring shares the original string's char array, and even if the original string gets gc'd, the huge char array will still be referenced by the substring. The workaround is to use new String(hugeString.substring(i)). Newer JVMs copy the characters in substring() itself, so the problem no longer arises there.
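The workaround as a tiny helper, for illustration only. SubstringCopy.detach is a hypothetical name. The explicit copy only mattered on older JVMs where substring() shared the parent's char array; on current ones it is harmless but redundant.

```java
class SubstringCopy {
    // Returns the tail of huge starting at index from, wrapped in new String(...)
    // so that, on old JVMs, the result did not pin the parent's backing array.
    static String detach(String huge, int from) {
        return new String(huge.substring(from));
    }

    public static void main(String[] args) {
        String huge = "header:payload";
        System.out.println(detach(huge, 7)); // payload
    }
}
```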
The issue that is generated is the fact that garbage is generated. This issue is resolved by the virtual machine by calling the garbage collector which frees the memory used by that garbage.
As soon as the old object is not used anymore, it can be removed by the garbage collector. (Which will be done far before any memory issue arises).
If you want to prevent the copying of the data, use a StringBuilder.
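To make the difference concrete, here is a small sketch with invented names. Both methods produce the same text; the first creates a fresh String on every loop pass, while the second appends into one buffer and creates a single String at the end.

```java
class JoinDemo {
    // Each += builds a new String and copies all characters so far.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // One StringBuilder grows in place; toString() makes the only String.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(5));    // 01234
        System.out.println(withBuilder(5)); // 01234
    }
}
```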
Unused objects are collected by the GC.
Immutability also has many benefits in Java.
In Java, achieving as much immutability as possible is good practice.
Immutable objects can also be used safely in the Collections framework.
Check this
As far as I know, StringBuilder (or StringBuffer if you need thread safety) is useful for managing String content mutably.
Manipulating some characters in a huge String this way does not 'eat' many bytes in memory.
It is also faster for concatenation.
Since a string instance is immutable it can be reused by the jvm. The String class is implemented with Flyweight Design Pattern that is used to avoid memory issues.
Several languages, such as C#, Java, and Python, have chosen to make strings immutable. If this is intended to save memory or gain efficiency for operations like compare, what effect does it have on concatenation and other modifying operations?
Immutable types are a good thing generally:
They work better for concurrency (you don't need to lock something that can't change!)
They reduce errors: mutable objects are vulnerable to being changed when you don't expect it which can introduce all kinds of strange bugs ("action at a distance")
They can be safely shared (i.e. multiple references to the same object) which can reduce memory consumption and improve cache utilisation.
Sharing also makes copying a very cheap O(1) operation when it would be O(n) if you have to take a defensive copy of a mutable object. This is a big deal because copying is an incredibly common operation (e.g. whenever you want to pass parameters around....)
As a result, it's a pretty reasonable language design choice to make strings immutable.
Some languages (particularly functional languages like Haskell and Clojure) go even further and make pretty much everything immutable. This enlightening video is very much worth a look if you are interested in the benefits of immutability.
There are a couple of minor downsides for immutable types:
Operations that create a changed string like concatenation are more expensive because you need to construct new objects. Typically the cost is O(n+m) for concatenating two immutable Strings, though it can go as low as O(log (m+n)) if you use a tree-based string data structure like a Rope. Plus you can always use special tools like Java's StringBuilder if you really need to concatenate Strings efficiently.
A small change on a large string can result in the need to construct a completely new copy of the large String, which obviously increases memory consumption. Note however that this isn't usually a big issue in garbage-collected languages since the old copy will get garbage collected pretty quickly if you don't keep a reference to it.
Overall though, the advantages of immutability vastly outweigh the minor disadvantages. Even if you are only interested in performance, the concurrency advantages and cheapness of copying will in general make immutable strings much more performant than mutable ones with locking and defensive copying.
It's mainly intended to prevent programming errors. For example, Strings are frequently used as keys in hashtables. If they could change, the hashtable would become corrupted. And that's just one example where having a piece of data change while you're using it causes problems. Security is another: if you are checking whether a user is allowed to access a file at a given path before executing the operation they requested, the string containing the path had better not be mutable...
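The hashtable problem is easy to reproduce with a deliberately mutable key. This is just a sketch; MutableKeyDemo and its Key class are hypothetical stand-ins for what a mutable String would do to a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // A key whose hash code depends on a field that can change.
    static final class Key {
        String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    // True if the entry becomes unreachable once its key is mutated.
    static boolean lostAfterMutation() {
        Map<Key, String> map = new HashMap<>();
        Key key = new Key("alice");
        map.put(key, "value");

        key.name = "bob"; // the entry was stored under the hash of "alice"

        // Neither the mutated key nor a fresh "alice" key finds the entry,
        // even though the map still contains it.
        return map.size() == 1
                && !map.containsKey(key)
                && !map.containsKey(new Key("alice"));
    }

    public static void main(String[] args) {
        System.out.println(lostAfterMutation()); // true
    }
}
```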
It becomes even more important when you're doing multithreading. Immutable data can be safely passed around between threads while mutable data causes endless headaches.
Basically, immutable data makes the code that works on it easier to reason about. Which is why purely functional languages try to keep everything immutable.
In Java, not only String but all primitive wrapper classes (Integer, Double, Character etc.) are immutable. I am not sure of the exact reason, but I think these are the basic data types on which all the programming schemes work. If they change, things could go wild. To be more specific, I'll use an example: say you have opened a socket connection to a remote host. The host name would be a String and the port would be an Integer. What if these values were modified after the connection was established?
As far as performance is concerned, Java keeps String literals in a string pool (sometimes called the literal pool). The pool is indexed, so if you use the literal "String" twice, both uses point to the same object in the pool. Where the pool lives is an implementation detail; in HotSpot it has been part of the regular heap since Java 7.
Making strings immutable also makes new string references cheap, as equal strings can be served from the pool of Strings previously created, thereby reducing the cost of new object creation.