Compile-Time By-Reference Parameters on the JVM

I am currently developing a custom programming language on the JVM, and I would like it to support by-reference parameters in methods. How would I go about doing that? So far, I have come up with three different ways to accomplish this.
Wrapper Objects
The idea is to create a wrapper object containing the current value of the field, pass it to the by-ref method call, and unbox it after the call. This is a fairly straightforward approach, but it requires a lot of 'garbage' objects that are created and immediately discarded.
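As a rough sketch of this copy-in/copy-out scheme, a compiler could lower a by-ref call roughly as follows. The names `IntRef`, `increment` and `callSite` are hypothetical, and the `ref` syntax in the comments is an assumed source-language feature, not Java.

```java
// Hypothetical holder type a compiler could emit for a by-reference int parameter.
final class IntRef {
    int value;

    IntRef(int value) {
        this.value = value;
    }
}

final class WrapperByRefDemo {
    static int counter = 41; // the field passed "by reference"

    // Desugared form of a hypothetical source method:
    //   void increment(ref int x) { x++; }
    static void increment(IntRef x) {
        x.value++;
    }

    // Desugared call site for: increment(ref counter);
    static int callSite() {
        IntRef box = new IntRef(counter); // copy in
        increment(box);
        counter = box.value;              // copy out
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(callSite());
    }
}
```

The `IntRef` instance never leaves `callSite`, which is the situation in which the JIT has a chance of removing the allocation.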
Arrays
Simply create a one-element array of the type, put the field value in the array, call the method passing the array, and finally assign the field from the array. The nice thing about this is that it ensures runtime type safety, unlike a generic wrapper class, which would require additional casts.
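A minimal sketch of the array variant, again with hypothetical names (`rename`, `callSite`) and an assumed `ref` keyword in the source language. The second method illustrates the runtime type check that arrays perform on stores.

```java
final class ArrayByRefDemo {
    static String name = "old"; // the field passed "by reference"

    // Desugared form of a hypothetical source method:
    //   void rename(ref String s) { s = s + "-new"; }
    static void rename(String[] s) {
        s[0] = s[0] + "-new";
    }

    // Desugared call site for: rename(ref name);
    static String callSite() {
        String[] slot = { name }; // copy in
        rename(slot);
        name = slot[0];           // copy out, no cast needed
        return name;
    }

    // Arrays check element types at run time: storing an Integer into a
    // String[] through an Object[] alias throws ArrayStoreException.
    static boolean rejectsWrongType() {
        Object[] alias = new String[1];
        try {
            alias[0] = Integer.valueOf(1);
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callSite());
        System.out.println(rejectsWrongType());
    }
}
```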
Unsafe
This one is slightly more advanced: use sun.misc.Unsafe to allocate some native memory, store the field value in that memory, call the method and pass the address, re-assign the field from the native memory address, and free the memory again.
Bonus: Is it possible to implement pointers and pointer arithmetic using the Unsafe class?

Wrapper Objects
[...] but requires a lot of 'garbage' objects that are created and immediately discarded.
If the lifetime of such a wrapper is limited to a call site (plus an inlined callee), the JIT compiler may be able to prove that through escape analysis and avoid the allocation by decomposing the wrapper object into its primitive members and using them directly in the generated code.
That essentially requires that those reference wrappers are never stored to fields and are only passed as method arguments.
Unsafe
Use sun.misc.Unsafe to allocate some native memory space, store the field value on that memory
You cannot store object-references in native memory. The garbage collector would not know about it and thus could change the memory address under your feet or GC the object if that is your only reference.
But since you're creating your own language you could simply desugar field references into object references + an offset. I.e. pass two parameters (object ref + long offset) instead of one. If you know the offset you can use Unsafe to manipulate the field.
Obviously this will only work for object fields. Local references cannot be changed this way.
Bonus: Is it possible to implement pointers and pointer arithmetic using the Unsafe class?
Yes for unmanaged memory.
For memory within the managed heap you are only allowed to point to objects themselves and do pointer arithmetic relative to the object header.
And you always must store object references in Object-typed fields. Storing them in a long would lead to GC-implementations (precise ones at least) missing the reference.
Edit: You might also be interested in ongoing work in the JDK regarding VarHandles.
It's something you probably want to keep in mind when developing your language.
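VarHandles have since shipped in java.lang.invoke (Java 9). As a hedged sketch, the "object reference plus field locator" idea from above could be expressed with a VarHandle in place of a raw offset. The `Counter` class and the `increment` method are hypothetical, and the `ref` syntax in the comment is an assumed source-language feature.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class VarHandleByRefDemo {
    static final class Counter {
        int value = 1;
    }

    // Handle to Counter.value, looked up once.
    private static final VarHandle VALUE;

    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(Counter.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Desugared form of a hypothetical source method:
    //   void increment(ref int x) { x = x + 1; }
    // The reference is passed as two arguments: owner object + field handle.
    static void increment(Object owner, VarHandle field) {
        int current = (int) field.get(owner);
        field.set(owner, current + 1);
    }

    static int run() {
        Counter counter = new Counter();
        increment(counter, VALUE); // desugared: increment(ref counter.value)
        return counter.value;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Unlike the copy-in/copy-out variants, every write goes straight to the field, so the callee and any other reader of the field see the same state at all times.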

It seems you have missed an important point about the pass-by-reference concept: whenever a write through the reference happens, the referenced variable is updated immediately. This is different from a scheme like yours, which passes a copy in a holder and updates the original variable only upon method return.
You can notice the difference even in a single-threaded use case:
foo(myField, ()-> {
// if myField is pass-by-reference, whenever foo() modifies
// it and calls this Runnable, it should see the new value:
System.out.println(myField);
});
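To make the difference concrete, here is a small sketch of what the copy-in/copy-out lowering does with that call. The shape of `foo` is an assumption (it writes 1 and then invokes the callback), and a one-element array stands in for the holder.

```java
final class StaleReadDemo {
    static int myField = 0;
    static int seenByCallback = -1;

    // Assumed callee: writes through the "reference", then calls back.
    static void foo(int[] ref, Runnable callback) {
        ref[0] = 1;
        callback.run();
    }

    static void run() {
        myField = 0;
        int[] holder = { myField };                  // copy in
        foo(holder, () -> seenByCallback = myField); // callback reads the field itself
        myField = holder[0];                         // copy out, after the callback ran
    }

    public static void main(String[] args) {
        run();
        System.out.println("callback saw " + seenByCallback + ", field is now " + myField);
    }
}
```

The callback still sees the old value 0, because the field is only updated after `foo` returns. With true by-reference semantics it would have seen 1.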
Of course, you could make both references access the same wrapper, but for an environment allowing (almost) arbitrary code, it would imply that you would have to replace every reference to the field (in the end, change the contents of the field) with the wrapper.
So if you want to implement a clean, real pass-by-reference mechanism within the JVM, it must be able to modify the referenced artifact, i.e. the field or array slot. For local variables, there is no way to do it, so there's no way around replacing a local variable with a holder object once a reference to it has been created.
So the kinds of options are already known: you can pass a java.lang.reflect.Field (does not work with array slots), a pair of java.lang.invoke.MethodHandle objects, or an object of an arbitrary (generated) type offering read and write access.
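As an illustration of the MethodHandle option, the sketch below passes a getter/setter pair bound to one object. `Point` and `addFive` are hypothetical names, and the `ref` syntax in the comment is an assumed source-language feature.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

final class MethodHandlePairDemo {
    static final class Point {
        int x = 10;
    }

    // Desugared form of a hypothetical source method:
    //   void addFive(ref int v) { v = v + 5; }
    static void addFive(MethodHandle getter, MethodHandle setter) throws Throwable {
        int current = (int) getter.invoke();
        setter.invoke(current + 5);
    }

    static int run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            Point p = new Point();
            // Bind both handles to the receiver so the callee needs no owner argument.
            MethodHandle getter = lookup.findGetter(Point.class, "x", int.class).bindTo(p);
            MethodHandle setter = lookup.findSetter(Point.class, "x", int.class).bindTo(p);
            addFive(getter, setter);
            return p.x;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

For array slots, `MethodHandles.arrayElementGetter` and `MethodHandles.arrayElementSetter` provide the corresponding handles.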
When implementing this reference accessor type, you can resort to Unsafe to create an anonymous class just like Java's lambda expression facility does (on current JDKs, MethodHandles.Lookup.defineHiddenClass serves this purpose). In fact, you can take a lot of inspiration from the lambda expression mechanism:
Put an invokedynamic instruction at the place where a reference has to be created, pointing to your factory method and providing a handle to the field or array slot
Let the factory analyze the handle and dynamically create the accessor implementation, the main difference being that your type will have two operations, read and write
Use Unsafe to create that class (which might access the field, even if it's private)
If the field is static, create an instance and return a CallSite with a handle returning that instance
Otherwise return a CallSite with a handle pointing to the constructor of the accessor class accepting an object instance or an array
This way you will only have an overhead at first use, while subsequent uses will either use a singleton in the case of static fields or construct an accessor on the fly for instance fields and array slots. The creation of these accessor instances can be elided by HotSpot's escape analysis if they are used frequently, just like with ordinary objects.
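The sketch below shows only the shape of such accessor objects, written by hand instead of being generated through invokedynamic. `IntReference`, `MyFieldRef` and `IntArraySlotRef` are hypothetical names standing in for what a factory might produce.

```java
// Hypothetical accessor interface with the two operations, read and write.
interface IntReference {
    int get();

    void set(int value);
}

final class AccessorDemo {
    int myField; // instance field that will be passed by reference

    // Stand-in for the accessor class a factory would generate for myField.
    static final class MyFieldRef implements IntReference {
        private final AccessorDemo owner;

        MyFieldRef(AccessorDemo owner) {
            this.owner = owner;
        }

        @Override public int get() { return owner.myField; }

        @Override public void set(int value) { owner.myField = value; }
    }

    // Stand-in for a generated accessor of one int[] slot.
    static final class IntArraySlotRef implements IntReference {
        private final int[] array;
        private final int index;

        IntArraySlotRef(int[] array, int index) {
            this.array = array;
            this.index = index;
        }

        @Override public int get() { return array[index]; }

        @Override public void set(int value) { array[index] = value; }
    }

    // Assumed callee: writes through the reference, then calls back.
    static void foo(IntReference x, Runnable callback) {
        x.set(1);
        callback.run();
    }

    public static void main(String[] args) {
        AccessorDemo obj = new AccessorDemo();
        // The callback observes the write immediately, not after foo returns.
        foo(new MyFieldRef(obj), () -> System.out.println(obj.myField));

        int[] data = { 0, 0 };
        foo(new IntArraySlotRef(data, 1), () -> System.out.println(data[1]));
    }
}
```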

Related

use native integer data types to represent instruction and data words, rather than using dynamically allocated class-typed objects or strings

Can anyone explain to me what the following sentence means in Java?
"In particular, you must use native integer data types to represent instruction and data
words, rather than using dynamically allocated class-typed objects or strings. Likewise, you should not attempt to represent
memory using a large array of words. Instead, consider a representation that allocates blocks of memory on
demand (that is, on the first read or write to an address within a block)."
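One possible reading of this advice, sketched under assumptions: words are plain `int` values, and memory is a map from block number to an `int[]` block that is allocated on first access. The class name `SparseMemory` and the block size of 4096 words are illustrative choices, not taken from the quoted text.

```java
import java.util.HashMap;
import java.util.Map;

final class SparseMemory {
    private static final int BLOCK_BITS = 12;              // assumed: 4096 words per block
    private static final int BLOCK_SIZE = 1 << BLOCK_BITS;
    private static final int OFFSET_MASK = BLOCK_SIZE - 1;

    // Block number -> block contents; blocks appear only when first touched.
    private final Map<Integer, int[]> blocks = new HashMap<>();

    private int[] blockFor(int address) {
        return blocks.computeIfAbsent(address >>> BLOCK_BITS, k -> new int[BLOCK_SIZE]);
    }

    int read(int address) {
        return blockFor(address)[address & OFFSET_MASK];
    }

    void write(int address, int word) {
        blockFor(address)[address & OFFSET_MASK] = word;
    }

    int allocatedBlocks() {
        return blocks.size();
    }

    public static void main(String[] args) {
        SparseMemory memory = new SparseMemory();
        memory.write(0x7FFF0000, 42);
        System.out.println(memory.read(0x7FFF0000));
        System.out.println(memory.allocatedBlocks());
    }
}
```

Only the blocks that were actually touched consume memory, which is the point of not using one large array for the whole address space.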

Like the others, I'm not very sure what it means. But working with Java primitive data types allows you to keep your "represent instruction" values in stack memory. Maybe that is why this advice mentions String in
"should not attempt to represent memory using a large array of words",
since String is implemented as an array of chars and overrides equals() to compare the contents of the strings rather than their references.
As for
"rather than using dynamically allocated class-typed objects":
For the reference types there are some complications that need to be considered first:
when assigning references, the object is not copied, it is shared (reference variables are aliases)
when comparing references, the contents of the objects are not compared
when passing parameters, the object is not copied, it is shared (i.e., actual parameter and formal parameter are aliases)
when returning values, a locally created object can survive if it is returned or if it is stored in a data member
Maybe this is a rather simple explanation, but it is all I can make of the quoted text you've provided.

I think it refers to declaring things like int instruction = 2; instead of Instruction instruction = new Instruction4OpenFile().
But you need to give us more context.

Create a new object every time or keep a single one

We use primitive types without thinking about constructors and destructors, perhaps because most of them are stored on the stack. We also use structs like float3 much like primitive types. We could do the same for classes: create a new instance at the beginning of a function, use it, and release the memory at the end of the function.
If, instead of using a local variable, we declare an instance variable at the class level, the variable exists until the object that holds it is released. This increases the steady memory usage. Further, some update methods have to be forwarded to this instance. For example, changing the container size may affect the content, so the new size has to be forwarded to it.
How should a class keep a reference to a variable to avoid creating the variable numerous times?
I know it is related to the number of times its constructor (or destructor) is called, but I am looking for a general rule, such as: if the class contains only primitives like x, y, z and they are immutable, you should always construct a new one.
That is, a way of deciding which approach to choose: making float3 immutable, or making its x, y, and z modifiable.

If you use primitive types, there is likely no difference.
If you use objects of some more "complicated" types, you will probably have to reset it to a known state before reusing it. This might take at least as much code as creating a new object. It also complicates your code, which is never an advantage.
Unless you notice a particular bottleneck in your code, you should try to keep it simple and easy to read. Don't complicate things until you absolutely have to.

Typically you'll want to minimize the scope of a variable to improve performance. Also, in Java, always prefer primitives to their wrapper class equivalents.

Java: Why are wrapper classes needed?

At a very high level, I know that we need to "wrap" the primitive data types, such as int and char, by using their respective wrapper classes to use them within Java collections. I would like to understand how Java collections work at the low level by asking: "why do we need to wrap primitive data types as objects to be able to use them in collections?" I thank you in advance for your help.

Because Java collections can only store Object References (so you need to box primitives to store them in collections).
Read this short article on Autoboxing for more info.
If you want the nitty gritty details, it pretty much boils down to the following:
Local Primitives are stored on the Stack. Collections store their values via a reference to an Object's memory location in the Heap. To get that reference for a local primitive, you have to box (take the value on the Stack and wrap it for storage on the Heap) the value.

At the virtual machine level, it's because primitive types are represented very differently in memory compared to reference types like java.lang.Object and its derived types. Primitive int in Java for example is just 4 bytes in memory, whereas an Object takes up at minimum 8 bytes by itself, plus another 4 bytes for referencing it. Such design is a simple reflection of the fact that CPUs can treat primitive types much more efficiently.
So one answer to your question "why wrapper types are needed" is because of performance improvement that it enables.
But for programmers, such a distinction adds some undesirable cognitive overhead (e.g., you can't use int and float in collections). In fact, it's quite possible to design a language that hides that distinction; many scripting languages do this, and the CLR does that. Starting with 1.5, Java does that, too. This is achieved by letting the compiler silently insert the necessary conversions between the primitive representation and the Object representation (which is commonly referred to as boxing/unboxing).
So another answer to your question is, "no, we don't need it", because the compiler does that automatically for you, and to certain extent you can forget what's going on behind the scene.
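A small sketch of what the compiler inserts on your behalf. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

final class AutoboxingDemo {
    static int sum() {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(5);                  // compiler inserts Integer.valueOf(5)
        numbers.add(Integer.valueOf(7)); // the same conversion written by hand
        int total = 0;
        for (int n : numbers) {          // compiler inserts a call to intValue()
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum());
    }
}
```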

I read all of the answers, but none of them really explains it simply in layman's terms.
A wrapper class wraps (encloses) a value of a primitive data type (such as int, char, byte, long) and makes it an object.
Here are a few reasons why wrapper classes are needed:
Allows null values.
Can be used in collections such as List, Map, etc.
Can be used in methods which accept arguments of Object type.
Can be created like objects using new ClassName() like other objects:
Integer wrapperInt = new Integer("10");
Makes available the methods that the Object class has, such as equals(), hashCode(), toString() etc.
Wrapper classes can be created in two ways:
Using constructor:
Integer i = new Integer("1"); //new object is created
Using valueOf() static method:
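Integer i = Integer.valueOf("100"); // may return a cached Integer instance
The second way is advised because valueOf() can return a cached instance (values from -128 to 127 are always cached) instead of creating a new object every time. The constructors have also been deprecated since Java 9.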
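The caching behaviour can be observed with a short sketch like the one below. It relies only on the documented guarantee that `Integer.valueOf(int)` caches values from -128 to 127; the class and method names are illustrative.

```java
final class ValueOfDemo {
    static boolean sameCachedInstance() {
        // Values in -128..127 are guaranteed to come from the cache,
        // so both calls return the very same object.
        return Integer.valueOf(100) == Integer.valueOf(100);
    }

    static boolean equalByValue() {
        // Outside the guaranteed range, compare with equals(), never with ==.
        return Integer.valueOf(1000).equals(Integer.valueOf(1000));
    }

    public static void main(String[] args) {
        System.out.println(sameCachedInstance());
        System.out.println(equalByValue());
    }
}
```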

To store primitive type values in a Collection, we require wrapper classes.

Primitive data types can't be referenced as memory addresses. That's why we need wrappers which serve as placeholders for primitive values. These values then can be mutated and accessed, reorganized, sorted or randomized.

Collections use generics as their basis. The Collections Framework is designed to collect, store and manipulate the data of any class, so it uses generic types. By using generics it is capable of storing the data of ANY CLASS whose name you specify in its declaration.
Now there are various scenarios in which we want to store primitive data in the same manner. We have no way to store primitive data directly in Collection classes like ArrayList, HashSet etc., because Collection classes can store objects only. So for storing primitive types in a Collection we are provided with wrapper classes.
Edit:
Another benefit of having wrapper classes is that the absence of an object can be treated as "no data". With a primitive, you will always have a value.
Say we have a method signature such as
public void foo(String aString, int aNumber)
You can't make aNumber optional in the above method signature.
But if you write the signature like this:
public void foo(String aString, Integer aNumber)
you have now made aNumber optional, since the caller can pass null as a value.

See Boxing and unboxing: when does it come up?
It's for C#, but the same concept applies to Java. And Jon Skeet wrote the answer.

Well, the reason is that Java collections don't differentiate between primitives and Objects. They process everything as Object, and therefore need a wrapper. You can easily build your own collection class that doesn't need a wrapper, but in the end, you will have to build one for each type (char, int, float, double, etc.) multiplied by the kinds of collections (Set, Map, List, plus their implementations).
Can you imagine how boring that is?
And the fact is, the performance gained by avoiding wrappers is almost negligible for most applications. Yet if you need very high performance, some libraries for primitive collections are also available (e.g. http://www.joda.org/joda-primitives/)

Wrapper classes provide useful methods related to corresponding data types which you can make use of in certain cases.
One simple example. Consider this,
Integer x = Integer.valueOf(10);
// to get the byte value of 10
x.byteValue();
// but you can't do this:
int y = 10;
y.byteValue(); // Wrong! Compile error: int cannot be dereferenced
Do you get the point?

If a variable is known to either hold a specific bit pattern representing null or else information which can be used to locate a Java Virtual Machine object header, and if the method for reading an object header given a reference will inherently trap if given the bit pattern associated with null, then the JVM can access the object identified by the variable on the assumption that there is one. If a variable could hold something which wasn't a valid reference but wasn't the specific null bit pattern, any code which tried to use that variable would have to first check whether it identified an object. That would greatly slow down the JVM.
If Object derived from Anything, and class objects derived from Object, but primitives inherited from a different class derived from Anything, then in a 64-bit implementation it might be practical to say that about 3/4 of the possible bit patterns would represent double values below 2^512, 1/8 of them would represent long values in the range +/- 1,152,921,504,606,846,975, a few billion would represent any possible value of any other primitive, and 1/256 would identify objects. Many kinds of operations on things of type Anything would be slower than with type Object, but such operations would not be terribly frequent; most code would end up casting Anything to some more specific type before trying to work with it; the actual type stored in the Anything would need to be checked before the cast, but not after the cast was performed. Absent a distinction between a variable holding a reference to a heap type, however, versus one holding "anything", there would be no way to avoid having the overhead extend considerably further than it otherwise would or should.

Much like the String class, Wrappers provide added functionality and enable the programmer to do a bit more with the process of data storage. So in the same way people use the String class like....
String uglyString = "fUbAr";
String myStr = uglyString.toLowerCase();
so too, they can with the Wrapper. Similar idea.
This is in addition to the typing issue of collections/generics mentioned above by Bharat.

Because int does not belong to any class.
We convert the data type (int) to an object (Integer).

Why do some languages need Boxing and Unboxing?

This is not a question of what boxing and unboxing are;
it is rather why languages like Java and C# need them.
I am very familiar with C++, STL and Boost.
In C++ I could write something like this very easily,
std::vector<double> dummy;
I have some experience with Java, but I was really surprised because I had to write something like this,
ArrayList<Double> dummy = new ArrayList<Double>();
My question, why should it be an Object, what is so hard technically to include primitive types when talking about Generics ?

what is so hard technically to include primitive types when talking about Generics ?
In Java's case, it's because of the way generics work. In Java, generics are a compile-time trick, that prevents you from putting an Image object into an ArrayList<String>. However, Java's generics are implemented with type erasure: the generic type information is lost during run-time. This was for compatibility reasons, because generics were added fairly late in Java's life. This means that, run-time, an ArrayList<String> is effectively an ArrayList<Object> (or better: just ArrayList that expects and returns Object in all of its methods) that automatically casts to String when you retrieve a value.
But since int doesn't derive from Object, you can't put it in an ArrayList that expects (at runtime) Object and you can't cast an Object to int either. This means that the primitive int must be wrapped into a type that does inherit from Object, like Integer.
C#, for example, works differently. Generics in C# are also enforced at runtime, and no boxing is required with a List<int>. Boxing in C# only happens when you try to store a value type like int in a reference type variable like object. Since int in C# inherits from Object, writing object obj = 2 is perfectly valid; however, the int will be boxed, which is done automatically by the compiler (no Integer reference type is exposed to the user or anything).
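The effect of type erasure on the Java side can be observed directly. In this sketch (illustrative names), two differently parameterized lists share one runtime class, and a raw view lets an Integer into a list declared as List<String> without any runtime check.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    // Both lists have the same runtime class: the type argument is erased.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    // At run time the list only stores Object references, so nothing
    // stops an Integer from being added through a raw view.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Object smuggle() {
        List<String> strings = new ArrayList<>();
        List raw = strings;
        raw.add(42);       // boxed to Integer, accepted without a check
        return raw.get(0);
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(smuggle().getClass().getSimpleName());
    }
}
```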

Boxing and unboxing are a necessity born out of the way that languages (like C# and Java) implement their memory allocation strategies.
Certain types are allocated on the stack and others on the heap. In order to treat a stack-allocated type as a heap-allocated type, boxing is required to move the stack-allocated value onto the heap. Unboxing is the reverse process.
In C# stack-allocated types are called value types (e.g. System.Int32 and System.DateTime) and heap-allocated types are called reference types (e.g. System.Stream and System.String).
In some cases it is advantageous to be able to treat a value type like a reference type (reflection is one example) but in most cases, boxing and unboxing are best avoided.

I believe this is also because primitives do not inherit from Object. Suppose you have a method that wants to be able to accept anything at all as the parameter, eg.
class Printer {
public void print(Object o) {
...
}
}
You may need to pass a simple primitive value to that method, like:
printer.print(5);
You would not be able to do that without boxing/unboxing, because 5 is a primitive and is not an Object. You could overload the print method for each primitive type to enable such functionality, but it's a pain.
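A short sketch of what happens when a primitive is passed to a method taking Object. The class `BoxingArgumentDemo` and the method `describe` are hypothetical stand-ins for the Printer example above.

```java
final class BoxingArgumentDemo {
    // Accepts any reference type; a primitive argument is boxed by the compiler.
    static String describe(Object o) {
        return o.getClass().getSimpleName() + ":" + o;
    }

    public static void main(String[] args) {
        System.out.println(describe(5));   // the int is boxed to an Integer
        System.out.println(describe(2.5)); // the double is boxed to a Double
    }
}
```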

I can only tell you for Java why it doesn't support primitive types in generics.
First, there was the problem that raising this question every time brought on the discussion of whether Java should have primitive types at all, which of course hindered the discussion of the actual question.
Second, the main reason not to include it was that they wanted binary backward compatibility, so that code would run unmodified on a VM not aware of generics. This backward/migration compatibility requirement is also why the Collections API now supports generics yet stayed the same, and there isn't (as in C# when they introduced generics) a completely new set of generics-aware collection APIs.
The compatibility was achieved using erasure (generic type parameter info removed at compile time), which is also the reason you get so many unchecked cast warnings in Java.
You could still add reified generics, but it's not that easy. Just adding the type info at runtime instead of removing it won't work, as it breaks source and binary compatibility (you can't continue to use raw types, and you can't call existing compiled code because it doesn't have the corresponding methods).
The other approach is the one C# chose: see above
And automatic boxing/unboxing wasn't supported for this use case because autoboxing costs too much.
Java theory and practice: Generics gotchas

Every non-array non-string object stored on the heap contains an 8- or 16-byte header (sizes for 32/64-bit systems), followed by the contents of that object's public and private fields. Arrays and strings have the above header, plus some more bytes defining the length of the array and size of each element (and possibly the number of dimensions, length of each extra dimension, etc.), followed by all of the fields of the first element, then all the fields of the second, etc. Given a reference to an object, the system can easily examine the header and determine what type it is.
Reference-type storage locations hold a four- or eight-byte value which uniquely identifies an object stored on the heap. In present implementations, that value is a pointer, but it's easier (and semantically equivalent) to think of it as an "object ID".
Value-type storage locations hold the contents of the value type's fields, but do not have any associated header. If code declares a variable of type Int32, there's no need to store information with that Int32 saying what it is. The fact that that location holds an Int32 is effectively stored as part of the program, and so it doesn't have to be stored in the location itself. This can represent a big savings if, e.g., one has a million objects each of which has a field of type Int32. Each of the objects holding the Int32 has a header which identifies the class that can operate on it. Since one copy of that class code can operate on any of the million instances, having the fact that the field is an Int32 be part of the code is much more efficient than having the storage for every one of those fields include information about what it is.
Boxing is necessary when a request is made to pass the contents of a value-type storage location to code which doesn't know to expect that particular value type. Code which expects objects of unknown type can accept a reference to an object stored on the heap. Since every object stored on the heap has a header identifying what type of object it is, code can use that header whenever it's necessary to use an object in a way which would require knowing its type.
Note that in .NET, it is possible to declare what are called generic classes and methods. Each such declaration automatically generates a family of classes or methods which are identical except for the type of object upon which they expect to act. If one passes an Int32 to a routine DoSomething<T>(T param), that will automatically generate a version of the routine in which every instance of type T is effectively replaced with Int32. That version of the routine will know that every storage location declared as type T holds an Int32, so just as in the case where a routine was hard-coded to use an Int32 storage location, it will not be necessary to store type information with those locations themselves.

In Java and C# (unlike C++) everything extends Object, so collection classes like ArrayList can hold Object or any of its descendants (basically anything).
For performance reasons, however, primitives in Java, or value types in C#, were given a special status. They are not objects. You cannot do something like (in Java):
7.toString()
Even though toString is a method on Object. In order to bridge this nod to performance, equivalent objects were created. AutoBoxing removes the boilerplate code of having to put a primitive in its wrapper class and take it out again, making the code more readable.
The difference between value types and objects in C# is more grey. See here about how they are different.

General Question: Java has the heap and local stack. Can you access any object from the heap?

I was really looking at the differences between pass by value and how Java allocates objects and what java does to put objects on the stack.
Is there any way to access objects allocated on the heap? What mechanisms does Java enforce to guarantee that the right method can access the right data on the heap?
It seems like if you were crafty and maybe even manipulate the java bytecode during runtime, that you might be able to manipulate data off the heap when you aren't supposed to?

There is no instruction in the JVM instruction set that gives arbitrary access to the heap. Hence, bytecode manipulation will not help you here.
The JVM also has a verifier. It checks the code of every method (as a class is being loaded) to verify that the method does not try to pop more values off the execution stack than what it had pushed onto it. This ensures that a method cannot "see" the objects pointed by its calling method.
Finally, local variables are stored in a per-method array (known as the "local variables array"). Again, the verifier makes sure that every read/write instruction from-/to- that array specifies an index that is less than the size of the array. Note that these JVM instructions can only specify a constant index. They cannot take a computed value and use it as an index.
So to recap, the answer is No.

All objects in Java are located on the heap. I'm not quite sure what you mean by "access objects from the heap". The only things stored on the stack are the list of functions which called into the current context and their local variables and parameters. All local variables and parameters are either primitive types or references.
If you allocate an object using new (which is the only way to allocate non-primitive types; yes this includes array types), then the object is allocated on the heap, and a reference to that object is stored on either the stack or the heap, depending on if the reference is stored in a local variable/parameter or as a member of another object.
When passed as parameters to functions, all objects are passed by reference: if the function modifies the object the parameter refers to, the caller sees the modified object. Equivalently, one could say that object references are passed by value: if you change a parameter to refer to a new object, it will continue to refer to that object for the duration of the function, but the caller's variable which was passed in will still refer to whatever it referred to before. Primitive types are also passed by value.
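Both halves of that statement can be seen in a short sketch (method names are illustrative): mutating the object through the parameter is visible to the caller, while reassigning the parameter is not.

```java
final class PassByValueDemo {
    // Mutates the object the parameter refers to: visible to the caller.
    static void mutate(StringBuilder sb) {
        sb.append("!");
    }

    // Rebinds only the local copy of the reference: invisible to the caller.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("other");
    }

    static String run() {
        StringBuilder text = new StringBuilder("hi");
        mutate(text);
        reassign(text);
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The result is "hi!": the appended character survives, while the reassignment inside `reassign` has no effect on the caller's variable.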

Regarding objects on the stack, it is only the new Java 6 VM from SUN (and perhaps some others) that will try to optimize byte code by putting objects on the stack. Typically, all objects will go into the heap. For reference, check out: http://www.ibm.com/developerworks/java/library/j-jtp09275.html
Also the JVM spec is at http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html#6348. The JVM protects its heap by simply not giving you instructions needed to corrupt it. Flaws in JVM implementations may cause your mileage to vary.
