How does String s = "sometext" work?


In object-oriented languages, a new object is created with the new keyword (since memory allocation in Java is done dynamically).
String is a class, so how is its object created without the new keyword?
I know it uses string pooling, but I still don't understand it clearly.
Is it possible to create a user-defined class whose variables can be initialized directly, like String?

The mechanism enabling you to create String objects with string literals is built into the compiler and JVM. It is not available for use with objects of user-defined types.
When you write for the first time
String s = "sometext";
the compiler emits two things:
A constant pool entry with "sometext" in it, and
An instruction that sets s to reference the entry in the constant table.
If you write
String t = "sometext";
in the same class, the compiler will reuse an existing constant for "sometext", rather than creating a new one.
At runtime, the JVM creates a new String object for each entry from the constant table, and gives your program access to them. Essentially, the JVM invokes new on your program's behalf, and hands it a ready-to-use object.
A similar system is in play when you create instances of primitive wrappers with autoboxing. The common thing, however, is that it requires support from the compiler, and is not available for user-defined types.
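To make the effect of the shared constant visible, here is a minimal sketch. The class and method names are made up for illustration. It only shows the observable behaviour (identical literals yield the same object, an explicit new does not), not how the JVM implements it.

```java
class StringLiteralDemo {
    // Two identical literals in the same class resolve to one String object.
    static boolean literalsShareInstance() {
        String s = "sometext";
        String t = "sometext";
        return s == t; // reference comparison on purpose
    }

    // An explicit `new` always allocates a distinct object with equal contents.
    static boolean newCreatesDistinctInstance() {
        String s = "sometext";
        String u = new String("sometext");
        return s != u && s.equals(u);
    }

    public static void main(String[] args) {
        System.out.println("literals share one instance: " + literalsShareInstance());
        System.out.println("new String is a separate instance: " + newCreatesDistinctInstance());
    }
}
```

In normal code you would still compare strings with equals(); the == comparisons here are only used to observe object identity.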

In Java, Strings are immutable and optionally pooled ("interned").
"sometext" is just an instance of a String that comes from the static pool.
You cannot create a user-defined subclass of String because java.lang.String is a final class, precisely to guarantee immutability (which is what lets duplicates safely share a single instance).


How does the JVM look up the String in the String constant pool?

I want to understand the string pool more deeply. Please help me find the source file containing its implementation in Java.
The question is really about finding the source code or implementation of the string pool, so that I can dig deeper into the concept and learn about its less obvious aspects. That way we could use strings more efficiently, or think of some other way to implement our own garbage collection in case we have an application creating a great many literals and String objects.

I am sorry to disappoint you, but the Java string pool is not an actual Java class; it is implemented inside the JVM, i.e. it is written as C++ code.
If you look at the source code of the String class (pretty much all the way down) you see that the intern() method is native.
You will have to go through some JVM code to get more information.
Edit:
Some implementation can be found here (C++ header, C++ implementation). Search for StringTable.
Edit2: As Holger pointed out in the comments, this is not a hard requirement of the JVM implementation. So it is possible to have a JVM that implements the string pool differently, e.g. using an actual Java class. All commonly used JVMs I am aware of, though, implement it in the JVM's C++ code.

You can go through this article: Strings, Literally
When a .java file is compiled into a .class file, any String literals
are noted in a special way, just as all constants are. When a class is
loaded (note that loading happens prior to initialization), the JVM
goes through the code for the class and looks for String literals.
When it finds one, it checks to see if an equivalent String is already
referenced from the heap. If not, it creates a String instance on the
heap and stores a reference to that object in the constant table. Once
a reference is made to that String object, any references to that
String literal throughout your program are simply replaced with the
reference to the object referenced from the String Literal Pool.
So, in the example shown above, there would be only one entry in the
String Literal Pool, which would refer to a String object that
contained the word "someString". Both of the local variables, one and
two, would be assigned a reference to that single String object. You
can see that this is true by looking at the output of the above
program. While the equals() method checks to see if the String objects
contain the same data ("someString"), the == operator, when used on
objects, checks for referential equality - that means that it will
return true if and only if the two reference variables refer to the
exact same object. In such a case, the references are equal. From the
above output, you can see that the local variables, one and two, not
only refer to Strings that contain the same data, they refer to the
same object.

Compile-Time By-Reference Parameters on the JVM

I am currently developing a custom programming language for the JVM, and I would like the language to support by-reference parameters in methods. How would I go about doing that? So far, I have come up with three different ways to accomplish this.
Wrapper Objects
The idea behind this is to create a wrapper object that is created containing the current value of the field, passed to the by-ref method call, and then unboxed after the call. This is a fairly straight-forward way to do this, but requires a lot of 'garbage' objects that are created and immediately discarded.
Arrays
Simply create an array of the type with 1 element, put the field value in the array, call the method passing the array, and finally assign the field from the array. The nice thing about this is that it ensures runtime type safety, unlike a generic wrapper class, which would require additional casts.
Unsafe
This one is slightly more advanced: Use sun.misc.Unsafe to allocate some native memory space, store the field value on that memory, call the method and pass the address, re-assign the field from the native memory address, and free it up again.
Bonus: Is it possible to implement pointers and pointer arithmetic using the Unsafe class?
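For reference, the array approach could look roughly like the sketch below once the custom compiler has desugared a by-ref call. The class, field and method names are hypothetical; this is only one way the generated code might be shaped.

```java
class ByRefArrayDemo {
    private int field = 41;

    // Callee: the "by-reference" parameter is lowered to a one-element array.
    static void increment(int[] ref) {
        ref[0]++;
    }

    // Call site as a compiler might emit it: copy in, call, copy out.
    int callWithArrayHolder() {
        int[] holder = { field };   // copy the current field value into the holder
        increment(holder);          // callee mutates the holder
        field = holder[0];          // copy the result back into the field
        return field;
    }

    public static void main(String[] args) {
        System.out.println(new ByRefArrayDemo().callWithArrayHolder());
    }
}
```

Note that the field itself only changes after the call returns, so this is copy-in/copy-out rather than a true alias.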

Wrapper Objects
[...] but requires a lot of 'garbage' objects that are created and immediately discarded.
If the lifetime of such a wrapper is limited to a callsite (+ inlined callee) then the compiler may be able to prove that through escape analysis and avoid the allocation by decomposing the wrapper object into its primitive members and use them directly in the generated code.
That essentially requires that those reference-wrappers are never stored to fields and are only passed as method arguments.
Unsafe
Use sun.misc.Unsafe to allocate some native memory space, store the field value on that memory
You cannot store object-references in native memory. The garbage collector would not know about it and thus could change the memory address under your feet or GC the object if that is your only reference.
But since you're creating your own language you could simply desugar field references into object references + an offset. I.e. pass two parameters (object ref + long offset) instead of one. If you know the offset you can use Unsafe to manipulate the field.
Obviously this will only work for object fields. Local references cannot be changed this way.
Bonus: Is it possible to implement pointers and pointer arithmetic using the Unsafe class?
Yes for unmanaged memory.
For memory within the managed heap you are only allowed to point to objects themselves and do pointer arithmetic relative to the object header.
And you always must store object references in Object-typed fields. Storing them in a long would lead to GC-implementations (precise ones at least) missing the reference.
Edit: You might also be interested in ongoing work in the JDK regarding VarHandles.
It's something you probably want to keep in mind when developing your language.

It seems you have missed an important point about the pass-by-reference concept: whenever a write into the reference happens, the referenced variable will be updated. This is different from any concept like yours that will actually pass a copy in a holder and update the original variable upon method return.
You can notice the difference even in single-threaded use case:
foo(myField, () -> {
    // if myField is pass-by-reference, whenever foo() modifies
    // it and calls this Runnable, it should see the new value:
    System.out.println(myField);
});
Of course, you could make both references accessing the same wrapper, but for an environment allowing (almost) arbitrary code, it would imply that you would have to replace every reference to the field (in the end, change the contents of the field) to the wrapper.
So if you want to implement a clean, real pass-by-reference mechanism within the JVM, it must be able to modify the referenced artifact, i.e. field or array slot. For local variables, there is no way to do it, so there's no way around replacing a local variable with a holder object once a reference to it has been created.
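The difference can be made observable with a small runnable sketch. It assumes a copy-in/copy-out lowering with a one-element array as the holder; all names here are made up for illustration.

```java
class CopyInCopyOutDemo {
    static int myField;

    // Stand-in for a method taking a "by-ref" parameter: it writes through the
    // holder and then runs the callback while the call is still in progress.
    static void foo(int[] ref, Runnable observer) {
        ref[0] = 42;
        observer.run();
    }

    // Returns the value of myField that the callback saw during the call.
    static int observedDuringCall() {
        myField = 0;
        int[] seen = new int[1];
        int[] holder = { myField };            // copy in
        foo(holder, () -> seen[0] = myField);  // callback reads the real field
        myField = holder[0];                   // copy out, only after foo() returns
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("seen during call: " + observedDuringCall()); // still the old value
        System.out.println("after call: " + myField);
    }
}
```

With true pass-by-reference the callback would have seen 42; with the holder it sees the stale 0.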
So the kind of options is already known, you can pass a java.lang.reflect.Field (does not work with array slots), a pair of java.lang.invoke.MethodHandle or an arbitrary typed object (of a generated type) offering read and write access.
When implementing this reference accessor type, you can resort to Unsafe to create an anonymous class just like Java's lambda expression facility does. In fact, you can take a lot of inspiration from the lambda expression mechanism:
Put an invokedynamic instruction at the place where a reference has to be created, pointing to your factory method and providing a handle to the field or array slot
Let the factory analyze the handle and dynamically create the accessor implementation, the main difference being that your type will have two operations, read and write
Use Unsafe to create that class (which might access the field, even if it's private)
If the field is static, create an instance and return a CallSite with a handle returning that instance
Otherwise return a CallSite with a handle pointing to the constructor of the accessor class accepting an object instance or an array
This way you will only have an overhead at the first-time usage, while subsequent uses will either use a singleton in the case of static fields or construct an accessor on the fly for instance fields and array slots. The creation of these accessor instances can be elided by HotSpot's escape analysis if used frequently, just like with ordinary objects.
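As a much simpler illustration of the "pair of MethodHandles" option (without invokedynamic or generated classes), here is a hedged sketch. IntRef, refToCounter and the counter field are hypothetical names; only the java.lang.invoke calls are real API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

class FieldRefDemo {
    int counter = 1;

    /** Hypothetical accessor type: reads and writes go straight to the field. */
    static final class IntRef {
        private final MethodHandle getter; // type ()int after binding the receiver
        private final MethodHandle setter; // type (int)void after binding the receiver

        IntRef(MethodHandle getter, MethodHandle setter) {
            this.getter = getter;
            this.setter = setter;
        }

        int get() {
            try {
                return (int) getter.invoke();
            } catch (Throwable t) {
                throw new IllegalStateException(t);
            }
        }

        void set(int value) {
            try {
                setter.invoke(value);
            } catch (Throwable t) {
                throw new IllegalStateException(t);
            }
        }
    }

    // Builds a reference to target.counter from a getter/setter handle pair.
    static IntRef refToCounter(FieldRefDemo target) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle getter =
                lookup.findGetter(FieldRefDemo.class, "counter", int.class).bindTo(target);
            MethodHandle setter =
                lookup.findSetter(FieldRefDemo.class, "counter", int.class).bindTo(target);
            return new IntRef(getter, setter);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static int demo() {
        FieldRefDemo obj = new FieldRefDemo();
        IntRef ref = refToCounter(obj);
        ref.set(ref.get() + 1); // the write lands in obj.counter immediately
        return obj.counter;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Unlike a copy-in/copy-out holder, every write through the reference is visible in the field right away.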

use native integer data types to represent instruction and data words, rather than using dynamically allocated class-typed objects or strings

Can someone explain what the following passage means in Java?
"In particular, you must use native integer data types to represent instruction and data
words, rather than using dynamically allocated class-typed objects or strings. Likewise, you should not attempt to represent
memory using a large array of words. Instead, consider a representation that allocates blocks of memory on
demand (that is, on the first read or write to an address within a block)."

Like the others, I'm not entirely sure what it means. But working with Java primitive data types allows you to store whatever you use to
represent instruction
words in stack memory. Maybe that is why the advice mentions String right next to
should not attempt to represent memory using a large array of words.
since String is implemented as an array of chars and overrides equals() to compare the Strings' contents rather than their references.
As for
rather than using dynamically allocated class-typed objects
there are some complications with reference types that need to be considered first:
When assigned, the object is not copied; it is shared (reference variables are aliases).
When compared with ==, the contents of the objects are not compared.
When passed as parameters, the object is not copied; it is shared (the actual parameter and the formal parameter are aliases).
When returned, a locally created object can survive if it is returned or stored in a data member.
Here is a visualization of the memory: (image not preserved)
Maybe this is a rather simple explanation, but it is all I can make of the quoted text you've provided.

I think it refers to declaring things like int instruction = 2; instead of Instruction instruction = new Instruction4OpenFile().
But you need to give us more context.
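Without more context this is only a guess, but one reading of the quoted requirement is a simulated memory whose words are plain ints and whose storage is allocated block by block on first access. A minimal sketch under that assumption follows; the class name and the block size are arbitrary choices.

```java
import java.util.HashMap;
import java.util.Map;

class SparseMemory {
    private static final int BLOCK_BITS = 12;              // 4096 words per block (arbitrary)
    private static final int BLOCK_SIZE = 1 << BLOCK_BITS;
    private static final int OFFSET_MASK = BLOCK_SIZE - 1;

    // Block number -> block contents. Words themselves are primitive ints.
    private final Map<Integer, int[]> blocks = new HashMap<>();

    // Returns the block containing the address, allocating it on first use.
    private int[] blockFor(int address) {
        return blocks.computeIfAbsent(address >>> BLOCK_BITS, k -> new int[BLOCK_SIZE]);
    }

    int read(int address) {
        return blockFor(address)[address & OFFSET_MASK];
    }

    void write(int address, int word) {
        blockFor(address)[address & OFFSET_MASK] = word;
    }

    int allocatedBlocks() {
        return blocks.size();
    }

    public static void main(String[] args) {
        SparseMemory mem = new SparseMemory();
        mem.write(0x7FFF0000, 0x12345678);
        System.out.println(Integer.toHexString(mem.read(0x7FFF0000)));
        System.out.println("blocks allocated: " + mem.allocatedBlocks());
    }
}
```

Only the blocks that are actually touched consume memory, which is the point of not using one large array for the whole address space.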

When are Java Strings interned?

Inspired by the comments on this question, I'm pretty sure that Java Strings are interned at runtime rather than at compile time. Surely the mere fact that classes can be compiled at different times, yet still point to the same reference at runtime, implies this.
I can't seem to find any evidence to back this up. Can anyone justify this?

The optimization happens (or at least can happen) in both places:
If two references to the same string constant appear in the same class, I'd expect the class file to only contain one constant pool entry. This isn't strictly required in order to ensure that there's only one String object created in the JVM, but it's an obvious optimization to make. This isn't actually interning as such - just constant optimization.
When classes are loaded, the string pool for the class is added to the intern pool. This is "real" interning.
(I have a vague recollection that one of the bits of work for Java 7 around "small jar files" included a single string pool for the whole jar file... but I could be very wrong.)
EDIT: Section 5.1 of the JVM spec, "The Runtime Constant Pool" goes into details of this:
To derive a string literal, the Java
virtual machine examines the sequence
of characters given by the
CONSTANT_String_info structure.
If the method String.intern has
previously been called on an instance
of class String containing a sequence
of Unicode characters identical to
that given by the CONSTANT_String_info
structure, then the result of string
literal derivation is a reference to
that same instance of class String.
Otherwise, a new instance of class
String is created containing the
sequence of Unicode characters given
by the CONSTANT_String_info structure;
that class instance is the result of
string literal derivation. Finally,
the intern method of the new String
instance is invoked.
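The observable consequence of this rule can be checked with a small sketch. The nested classes A and B are made up; they stand in for classes that could have been compiled separately, each with its own constant pool entry for the same literal.

```java
class InternDemo {
    // Two separate classes, each with its own constant pool entry.
    static class A { static String value() { return "shared-literal"; } }
    static class B { static String value() { return "shared-literal"; } }

    // Literals from different classes resolve to the same String instance at runtime.
    static boolean sameAcrossClasses() {
        return A.value() == B.value();
    }

    // A String built at runtime is a different object until it is interned.
    static boolean internReturnsLiteralInstance() {
        String built = new StringBuilder("shared-").append("literal").toString();
        return built != A.value() && built.intern() == A.value();
    }

    public static void main(String[] args) {
        System.out.println("same across classes: " + sameAcrossClasses());
        System.out.println("intern() yields the literal's instance: " + internReturnsLiteralInstance());
    }
}
```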

Runtime.
JLS and JVM specifications specify javac compilation to class files which contain constant declarations (in the Constant Pool) and constant usage in code (which javac can inline as primitive / object reference values). For compile-time String constants, the compiler generates code to construct String instances and to call String.intern() for them, so that the JVM interns String constants automatically. This is a behavioural requirement from JLS:
http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.28
Compile-time constant expressions of type String are always "interned" so as to share unique instances, using the method String.intern.
But these specs have neither the concept nor the definition of any particular String intern pool structures/references/handles whether compile time or runtime. (Of course, in general, the JVM spec does not mandate any particular internal structure for objects: http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.7)
The reason that no intern pool structures are mentioned is because they're handled entirely with the String class. The intern pool is a private static/class-level structure of the String class (unspecified by JLS & JVM specs & javadoc).
Objects are added to the intern pool when String.intern() is called at runtime. The intern pool is leveraged privately by the String class: when code creates new String instances and calls String.intern(), the String class determines whether to reuse existing internal data. Optimisation can be carried out by the JIT compiler, at runtime.
There's no compile-time contribution here, bar the vanilla inlining of constant values.

Java: Why are wrapper classes needed?

At a very high level, I know that we need to "wrap" the primitive data types, such as int and char, by using their respective wrapper classes in order to use them within Java collections. I would like to understand how Java collections work at the low level by asking: "why do we need to wrap primitive data types as objects to be able to use them in collections?" Thank you in advance for your help.

Because Java collections can only store Object References (so you need to box primitives to store them in collections).
Read this short article on Autoboxing for more info.
If you want the nitty gritty details, it pretty much boils down to the following:
Local Primitives are stored on the Stack. Collections store their values via a reference to an Object's memory location in the Heap. To get that reference for a local primitive, you have to box (take the value on the Stack and wrap it for storage on the Heap) the value.

At the virtual machine level, it's because primitive types are represented very differently in memory compared to reference types like java.lang.Object and its derived types. Primitive int in Java for example is just 4 bytes in memory, whereas an Object takes up at minimum 8 bytes by itself, plus another 4 bytes for referencing it. Such design is a simple reflection of the fact that CPUs can treat primitive types much more efficiently.
So one answer to your question "why are wrapper types needed" is: because of the performance improvement that this distinction enables.
But for programmers, such a distinction adds some undesirable cognitive overhead (e.g., you can't use int and float in collections). In fact, it's quite possible to design a language that hides that distinction: many scripting languages do this, and the CLR does that. Starting with 1.5, Java does that, too. This is achieved by letting the compiler silently insert the necessary conversions between the primitive representation and the Object representation (which is commonly referred to as boxing/unboxing).
So another answer to your question is "no, we don't need it", because the compiler does that automatically for you, and to a certain extent you can forget what's going on behind the scenes.
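A small sketch of what the compiler inserts on your behalf; the class name is made up, and the comments describe the conversions javac generates.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {
    static int sumBoxed() {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(5);                   // autoboxing: compiled as Integer.valueOf(5)
        numbers.add(Integer.valueOf(7));  // the same conversion written by hand
        int total = 0;
        for (int n : numbers) {           // unboxing: compiled as element.intValue()
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed());
    }
}
```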

I read all of the answers, but none of them really explains it simply in layman's terms.
A wrapper class wraps (encloses) a primitive data type (such as int, char, byte, long) and makes it an object.
Here are a few reasons why wrapper classes are needed:
Allows null values.
Can be used in collections such as List, Map, etc.
Can be used in methods which accept arguments of Object type.
Can be created like Objects using new ClassName() like other objects:
Integer wrapperInt = new Integer("10");
Makes available all the functions that Object class has such as clone(), equals(), hashCode(), toString() etc.
Wrapper classes can be created in two ways:
Using constructor:
Integer i = new Integer("1"); //new object is created
Using valueOf() static method:
Integer i = Integer.valueOf("100"); // i refers to an Integer holding 100
The second way is recommended because valueOf() may return a cached instance instead of creating a new object (values from -128 to 127 are always cached), which saves memory.
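The memory saving comes from caching: Integer.valueOf(int) is documented to always cache values from -128 to 127. A minimal sketch of what that means in practice follows (class and method names are made up); outside that range you should not rely on identity at all.

```java
class ValueOfDemo {
    // Values in -128..127 are always served from the cache, so both calls
    // return the very same object.
    static boolean smallValuesAreCached() {
        Integer a = Integer.valueOf(100);
        Integer b = Integer.valueOf(100);
        return a == b;
    }

    // For larger values identity is unspecified; compare with equals() instead.
    static boolean largeValuesCompareByEquals() {
        Integer a = Integer.valueOf(100_000);
        Integer b = Integer.valueOf(100_000);
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println("cached: " + smallValuesAreCached());
        System.out.println("equal: " + largeValuesCompareByEquals());
    }
}
```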

To store primitive type values in a Collection, we require wrapper classes.

Primitive data types can't be referenced as memory addresses. That's why we need wrappers which serve as placeholders for primitive values. These values then can be mutated and accessed, reorganized, sorted or randomized.

Collections use generics as their basis. The Collections Framework is designed to collect, store and manipulate the data of any class, so it uses generic types. By using generics it is capable of storing the data of ANY CLASS whose name you specify in its declaration.
Now there are various scenarios in which we want to store primitive data in the same manner in which collections work. We have no way to store primitive data using collection classes like ArrayList, HashSet etc., because collection classes can store objects only. So for storing primitive types in collections we are provided with wrapper classes.
Edit:
Another benefit of having wrapper classes is that absence of an object can be treated as "no data". In case of primitive, you will always have a value.
Say we have method signature as
public void foo(String aString, int aNumber)
you can't make aNumber optional in the above method signature.
But if you make signature like:
public void foo(String aString, Integer aNumber)
you have now made aNumber optional, since the caller can pass null as a value.
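A runnable sketch of that idea: the method body is made up, and it returns a String instead of void only so the result is easy to inspect.

```java
class OptionalArgDemo {
    // aNumber may be null, which this method treats as "no number supplied".
    static String foo(String aString, Integer aNumber) {
        if (aNumber == null) {
            return aString + " (no number)";
        }
        return aString + " " + aNumber;
    }

    public static void main(String[] args) {
        System.out.println(foo("item", null));
        System.out.println(foo("item", 3));   // the int 3 is autoboxed to Integer
    }
}
```

The callee has to check for null before unboxing, otherwise an expression such as aNumber + 1 would throw a NullPointerException.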

See Boxing and unboxing: when does it come up?
It's for C#, but the same concept applies to Java. And Jon Skeet wrote the answer.

Well, the reason is that Java collections don't differentiate between primitives and Objects. They process everything as Object and therefore need a wrapper. You can easily build your own collection class that doesn't need wrappers, but in the end you will have to build one for each type (char, int, float, double, etc.) multiplied by the kinds of collections (Set, Map, List, plus their implementations).
Can you imagine how boring that is?
And the fact is, the performance gained by avoiding wrappers is negligible for most applications. Yet if you need very high performance, some libraries for primitive collections are also available (e.g. http://www.joda.org/joda-primitives/)

Wrapper classes provide useful methods related to corresponding data types which you can make use of in certain cases.
One simple example. Consider this,
Integer x = new Integer(10);
// to get the byte value of 10
x.byteValue();
// but you can't do this:
int y = 10;
y.byteValue(); // Wrong! Compile error: int cannot be dereferenced
Do you see the point?

If a variable is known to either hold a specific bit pattern representing null or else information which can be used to locate a Java Virtual Machine object header, and if the method for reading an object header given a reference will inherently trap if given the bit pattern associated with null, then the JVM can access the object identified by the variable on the assumption that there is one. If a variable could hold something which wasn't a valid reference but wasn't the specific null bit pattern, any code which tried to use that variable would have to first check whether it identified an object. That would greatly slow down the JVM.
If Object derived from Anything, and class objects derived from Object, but primitives inherited from a different class derived from Anything, then in a 64-bit implementation it might be practical to say that about 3/4 of the possible bit patterns would represent double values below 2^512, 1/8 of them to represent long values in the range +/- 1,152,921,504,606,846,975, a few billion to represent any possible value of any other primitive, and 1/256 to identify objects. Many kinds of operations on things of type Anything would be slower than with type Object, but such operations would not be terribly frequent; most code would end up casting Anything to some more specific type before trying to work with it; the actual type stored in the Anything would need to be checked before the cast, but not after the cast was performed. Absent a distinction between a variable holding a reference to a heap type, however, versus one holding "anything", there would be no way to avoid having the overhead extend considerably further than it otherwise would or should.

Much like the String class, wrappers provide added functionality and enable the programmer to do a bit more with the process of data storage. So in the same way people use the String class like this:
String uglyString = "fUbAr";
String myStr = uglyString.toLowerCase();
so too can they with a wrapper. Similar idea.
This is in addition to the typing issue of collections/generics mentioned above by Bharat.

Because int does not belong to any class,
we convert the data type (int) to an object (Integer).
