Making generic calls with Java JNI and C++ - java

I am working with JNI and I have to pass some generic types to the C++ side. I am stuck on how to approach this in C++:
HashMap<String, Double[]> data1 ;
ArrayList<ArrayList<String>> disc ;
I am new to JNI and have looked around but could not find much help. Can someone show me how to write the JNI code for this, please? Any reference to material on the net would be very helpful too.

Short answer: You cannot.
Long answer: Type Erasure : http://download.oracle.com/javase/tutorial/java/generics/erasure.html
Consider a parametrized instance of ArrayList<Integer>. At compile time, the compiler checks that you are not putting anything but things compatible to Integer in the array list instance.
However, also at compile time (and after syntactic checking), the compiler strips the type parameter, rendering ArrayList<Integer> into ArrayList<?>, which is equivalent to ArrayList<Object> or simply ArrayList (as in pre-JDK 5 times).
The latter form is what JNI expects (for historical reasons as well as because of the way generics are implemented in Java... again, type erasure).
Remember, an ArrayList<Integer> is-a ArrayList. So you can pass an ArrayList<Integer> to JNI wherever it expects an ArrayList. The opposite is not necessarily true as you might get something out of JNI that is not upwards compatible with your nicely parametrized generics.
At this point, you are crossing a barrier between a typed, parametrized domain (your generics) and an untyped one (JNI). You have to encapsulate that barrier pretty nicely, and you have to add glue code and error checking/error handling code to detect when/if things don't convert well.

The runtime signature is just plain HashMap and ArrayList - Generics are a compile-time thing.
You can use javah to generate a C header file with correct signatures for native functions.
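As a concrete, hedged sketch (the class, method, and library names below are made up, not from the question), this is roughly what the boundary looks like: the Java declaration keeps the generics, while the header javah generates sees only raw jobject parameters.
import java.util.ArrayList;
import java.util.HashMap;

public class NativeBridge {
    static {
        System.loadLibrary("bridge"); // hypothetical native library name
    }

    // Generic types are fine in the declaration; they erase to the raw classes.
    public native void process(HashMap<String, Double[]> data1,
                               ArrayList<ArrayList<String>> disc);

    // javah generates roughly:
    //   JNIEXPORT void JNICALL Java_NativeBridge_process
    //     (JNIEnv *, jobject, jobject, jobject);
    // Both arguments arrive as plain jobject; the C++ side must look up
    // java/util/HashMap and java/util/ArrayList methods with GetMethodID
    // and invoke them via CallObjectMethod, doing its own type checking.
}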

It depends on what you're trying to map to and whether those types are yours to change.
Here are a few directions I'd try (if I were you, that is :) ):
Using SWIG templates (related SO question) or TypeMaps.
Doing some reflection magic to be used against your-own-custom-generic-data-passing native API (I haven't figured the details out, but if you want to follow up on it, say what you've got on the C++ side).
This has been asked before and you might want to resort to Luis' arrays solution.

Related

Compiling Java Generics with Wildcards to C++ Templates

I am trying to build a Java to C++ trans-compiler (i.e. Java code goes in, semantically "equivalent" (more or less) C++ code comes out).
Not considering garbage collection, the languages are quite similar, so the overall process works quite well already. One issue, however, is generics, which do not exist in C++. Of course, the easiest way would be to perform erasure as done by the Java compiler. However, the resulting C++ code should be nice to handle, so it would be good if I did not lose generic type information, i.e., if the C++ code still worked with List<X> instead of List. Otherwise, the C++ code would need explicit casting everywhere such generics are used. This is bug-prone and inconvenient.
So, I am trying to find a way to somehow get a better representation for generics. Of course, templates seem to be a good candidate. Although they are something completely different (metaprogramming vs. compile-time only type enhancement), they could still be useful. As long as no wildcards are used, just compiling a generic class to a template works reasonably well. However, as soon as wildcards come into play, things get really messy.
For example, consider the following java constructor of a list:
class List<T> {
    List(Collection<? extends T> c) {
        this.addAll(c);
    }
}
//Usage
Collection<String> c = ...;
List<Object> l = new List<Object>(c);
How should this be compiled? I had the idea of using a chainsaw reinterpret cast between templates. Then, the example above could be compiled like this:
template<class T>
class List {
    List(Collection<T*> c) {
        this->addAll(c);
    }
};
//Usage
Collection<String*> c = ...;
List<Object*> l = new List<Object*>(reinterpret_cast<Collection<Object*>>(c));
However, the question is whether this reinterpret cast produces the expected behaviour. Of course, it is dirty. But will it work? Usually, List<Object*> and List<String*> should have the same memory layout, as their template parameter is only a pointer. But is this guaranteed?
Another solution I thought of would be replacing methods that use wildcards with template methods that instantiate each wildcard parameter, i.e., compiling the constructor to
template<class T>
class List {
    template<class S>
    List(Collection<S*> c) {
        this->addAll(c);
    }
};
Of course, all other methods involving wildcards, like addAll, would then also need template parameters. Another problem with this approach would be handling wildcards in class fields, for example. I cannot use templates here.
A third approach would be a hybrid one: A generic class is compiled to a template class (call it T<X>) and an erased class (call it E). The template class T<X> inherits from the erased class E so it is always possible to drop genericity by upcasting to E. Then, all methods containing wildcards would be compiled using the erased type while others could retain the full template type.
What do you think about these methods? Where do you see the dis-/advantages of them?
Do you have any other thoughts of how wildcards could be implemented as clean as possible while keeping as much generic information in the code as possible?
Not considering garbage collection, the languages are quite similar, so the overall process works quite well already.
No. While the two languages actually look rather similar, they are significantly different as to "how things are done". Such 1:1 trans-compilations as you are attempting will result in terrible, underperforming, and most likely faulty C++ code, especially if you are looking not at a stand-alone application, but at something that might interface with "normal", manually-written C++.
C++ requires a completely different programming style from Java. This begins with not having all types derive from Object, touches on avoiding new unless absolutely necessary (and then restricting it to constructors as much as possible, with the corresponding delete in the destructor - or better yet, follow Potatoswatter's advice below), and doesn't end at "patterns" like making your containers STL-compliant and passing begin- and end-iterators to another container's constructor instead of the whole container. I also didn't see const-correctness or pass-by-reference semantics in your code.
Note how many of the early Java "benchmarks" claimed that Java was faster than C++, because Java evangelists took Java code and translated it to C++ 1:1, just like you are planning to do. There is nothing to be won by such transcompilation.
An approach you haven't discussed is to handle generic wildcards with a wrapper class template. So, when you see Collection<? extends T>, you replace it with an instantiation of your template that exposes a read-only[*] interface like Collection<T> but wraps an instance of Collection<?>. Then you do your type erasure in this wrapper (and others like it), which means the resulting C++ is reasonably nice to handle.
Your chainsaw reinterpret_cast is not guaranteed to work. For instance if there's multiple inheritance in String, then it's not even possible in general to type-pun a String* as an Object*, because the conversion from String* to Object* might involve applying an offset to the address (more than that, with virtual base classes)[**]. I expect you'll use multiple inheritance in your C++-from-Java code, for interfaces. OK, so they'll have no data members, but they will have virtual functions, and C++ makes no special allowance for what you want. I think with standard-layout classes you could probably reinterpret the pointers themselves, but (a) that's too strong a condition for you, and (b) it still doesn't mean you can reinterpret the collection.
[*] Or whatever. I forget the details of how the wildcards work in Java, but whatever's supposed to happen when you try to add a T to a List<? extends T>, and the T turns out not to be an instance of ?, do that :-) The tricky part is auto-generating the wrapper for any given generic class or interface.
[**] And because strict aliasing forbids it.
If the goal is to represent Java semantics in C++, then do so in the most direct way. Do not use reinterpret_cast as its purpose is to defeat the native semantics of C++. (And doing so between high-level types almost always results in a program that is allowed to crash.)
You should be using reference counting, or a similar mechanism such as a custom garbage collector (although that sounds unlikely under the circumstances). So these objects will all go to the heap anyway.
Put the generic List object on the heap, and use a separate class to access that as a List<String> or whatever. This way, the persistent object has the generic type that can handle any ill-formed means of accessing it that Java can express. The accessor class contains just a pointer, which you already have for reference counting (i.e. it subclasses the "native" reference, not an Object for the heap), and exposes the appropriately downcasted interface. You might even be able to generate the template for the accessor using the generics source code. If you really want to try.

Are Java generics really this clumsy? Why?

Bear with me for a while. I know this sounds subjective and argumentative for a while, but I swear there is a question mark at the end, and that the question can actually be answered in an objective way...
Coming from a .NET and C# background, I have during recent years been spoiled with the syntactic sugar that generics combined with extension methods provide in many .NET solutions to common problems. One of the key features that make C# generics so extremely powerful is the fact that if there is enough information elsewhere, the compiler can infer the type arguments, so I almost never have to write them out. You don't have to write many lines of code before you realize how many keystrokes you save on that. For example, I can write
var someStrings = new List<string>();
// fill the list with a couple of strings...
var asArray = someStrings.ToArray();
and C# will just know that I mean the first var to be List<string>, the second one to be string[] and that .ToArray() really is .ToArray<string>().
Then I come to Java.
I have understood enough about Java generics to know that they are fundamentally different, above all in the fact that the compiler doesn't actually compile to generic code - it strips the type arguments and makes it work anyway, in some (quite complicated) way (that I haven't really understood yet). But even though I know generics in Java are fundamentally different, I can't understand why constructs like these are necessary:
ArrayList<String> someStrings = new ArrayList<String>();
// fill the list with a couple of strings...
String[] asArray = someStrings.toArray(new String[0]); // <-- HERE!
Why on earth must I instantiate a new String[], with no elements in it, that won't be used for anything, for the Java compiler to know that it is String[] and not any other type of array I want?
I realize that this is the way the overload looks, and that toArray() currently returns an Object[] instead. But why was this decision made when this part of Java was invented? Why is this design better than, say, skipping the .toArray() overload that returns Object[] entirely and just having a toArray() that returns T[]? Is this a limitation of the compiler, or of the imagination of the designers of this part of the framework, or something else?
As you can probably tell from my extremely keen interest in things of the utmost unimportance, I haven't slept in a while...
No, most of these reasons are wrong. It has nothing to do with "backward compatibility" or anything like that. It's not because there's a method with a return type of Object[] (many signatures were changed for generics where appropriate). Nor is it because taking an array will save it from reallocating an array. They didn't "leave it out by mistake" or make a bad design decision. They didn't include a T[] toArray() because it can't be written, given the way arrays work and the way type erasure works in generics.
It is entirely legal to declare a method of List<T> to have the signature T[] toArray(). However, there is no way to correctly implement such a method. (Why don't you give it a try as an exercise?)
Keep in mind that:
Arrays know at runtime the component type they were created with. Insertions into the array are checked at runtime. And casts from more general array types to more specific array types are checked at runtime. To create an array, you must know the component type at runtime (either using new Foo[x] or using Array.newInstance()).
Objects of generic (parameterized) types don't know the type parameters they were created with. The type parameters are erased to their erasure (the upper bound), and only that is checked at runtime.
Therefore you can't create an array whose component type is a type parameter, i.e. new T[...].
In fact, if Lists had a method T[] toArray(), then generic array creation (new T[n]), which is not possible currently, would be possible:
List<T> temp = new ArrayList<T>();
for (int i = 0; i < n; i++)
    temp.add(null);
T[] result = temp.toArray();
// equivalent to: T[] result = new T[n];
Generics are just compile-time syntactic sugar. Generics can be added or removed by changing a few declarations and adding casts, without affecting the actual implementation logic of the code. Let's compare the 1.4 API and the 1.5 API:
1.4 API:
Object[] toArray();
Object[] toArray(Object[] a);
Here, we just have a List object. The first method has a declared return type of Object[], and it creates an object of runtime class Object[]. (Remember that compile-time (static) types of variables and runtime (dynamic) types of objects are different things.)
In the second method, suppose we create a String[] object (i.e. new String[0]) and pass that to it. Arrays have a subtyping relationship based on the subtyping of their component types, so String[] is a subtype of Object[], and this is fine. What is most important to note here is that it returns an object of runtime class String[], even though its declared return type is Object[]. (Again, String[] is a subtype of Object[], so this is not unusual.)
However, if you try to cast the result of the first method to type String[], you will get a class cast exception, because as noted before, its actual runtime type is Object[]. If you cast the result of the second method (assuming you passed in a String[]) to String[], it will succeed.
So even though you may not notice it (both methods seem to return Object[]), there was already a big fundamental difference, pre-generics, in the actual object returned by these two methods.
1.5 API:
Object[] toArray();
T[] toArray(T[] a);
The exact same thing happens here. Generics adds some nice stuff like checking the argument type of the second method at compile time. But the fundamentals are still the same: The first method creates an object whose real runtime type is Object[]; and the second method creates an object whose real runtime type is the same as the array you passed in.
In fact, if you try to pass in an array whose class is actually a subtype of T[], say U[], even though we have a List<T>, guess what it does? It will try to put all the elements into a U[] array (which might succeed, if all the elements happen to be of type U, or fail, if not) and return an object whose actual type is U[].
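A short, hedged illustration of that subtype case (the class name and contents are made up):
import java.util.ArrayList;
import java.util.List;

class SubtypeArrayDemo {
    public static void main(String[] args) {
        List<Object> objects = new ArrayList<>();
        objects.add("a");
        objects.add("b");
        // We have a List<Object> but pass a String[]; both elements happen
        // to be Strings, so this succeeds and the runtime type of the
        // returned array is String[], not Object[].
        Object[] result = objects.toArray(new String[0]);
        System.out.println(result.getClass().getSimpleName()); // prints "String[]"

        objects.add(Integer.valueOf(1));
        // With a non-String element in the list, the same call would now
        // fail with an ArrayStoreException:
        // objects.toArray(new String[0]);
    }
}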
So back to my point earlier. Why can't you make a method T[] toArray()? Because you don't know the type of array you want to create (either using new or Array.newInstance()).
T[] toArray() {
// what would you put here?
}
Why can't you just create a new Object[n] and then cast it to T[]? It wouldn't crash immediately (since T is erased inside this method), but it would fail when you try to return it to the outside: assuming the outside code requested a specific array type, e.g. String[] strings = myStringList.toArray();, it would throw a ClassCastException, because the compiler inserts an implicit cast there for the generics.
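To make that failure concrete, here is a minimal sketch (the class and method names are made up for illustration):
import java.util.ArrayList;

class BrokenList<T> extends ArrayList<T> {
    @SuppressWarnings("unchecked")
    T[] toArrayBroken() {
        // Compiles with only an unchecked warning; T is erased here, so
        // nothing fails yet. The object's real runtime type is Object[].
        return (T[]) new Object[size()];
    }

    public static void main(String[] args) {
        BrokenList<String> list = new BrokenList<>();
        list.add("hello");
        // The compiler inserts a checked cast to String[] at this assignment,
        // and the returned object really is an Object[], so this line throws
        // ClassCastException at runtime.
        String[] strings = list.toArrayBroken();
        System.out.println(strings.length);
    }
}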
People can try all sorts of hacks, like looking at the first element of the list to try to determine the component type, but that doesn't work, because (1) elements can be null, and (2) elements can be a subtype of the actual component type, and creating an array of that type might fail later on when you try to put other elements in, etc. Basically, there is no good way around this.
The toArray(String[]) part is there because the two toArray methods existed before generics were introduced in Java 1.5. Back then, there was no way to infer type arguments because they simply didn't exist.
Java is pretty big on backward compatibility, so that's why that particular piece of the API is clumsy.
The whole type-erasure thing is also there to preserve backward compatibility. Code compiled for 1.4 can happily interact with newer code that contains generics.
Yes, it's clumsy, but at least it didn't break the enormous Java code base that existed when generics were introduced.
EDIT: So, for reference, the 1.4 API is this:
Object[] toArray();
Object[] toArray(Object[] a);
and the 1.5 API is this:
Object[] toArray();
T[] toArray(T[] a);
I'm not sure why it was OK to change the signature of the 1-arg version but not the 0-arg version. That seems like it would be a logical change, but maybe there's some complexity in there that I'm missing. Or maybe they just forgot.
EDIT2: To my mind, in cases like this Java should use inferred type arguments where available, and an explicit type where the inferred type is not available. But I suspect that would be tricky to actually include in the language.
Others have answered the "why" question (I prefer Cameron Skinner's answer); I will just add that you don't have to instantiate a new array each time, and it does not have to be empty. If the array is large enough to hold the collection, it will be used as the return value. Thus:
String[] asArray = someStrings.toArray(new String[someStrings.size()]);
will only allocate a single array of the correct size and populate it with the elements from the Collection.
Furthermore, some of the Java collections utility libraries include statically defined empty arrays which can be safely used for this purpose. See, for example, Apache Commons ArrayUtils.
Edit:
In the above code, the instantiated array is effectively garbage when the Collection is empty. Since arrays cannot be resized in Java, empty arrays can be singletons. Thus, using a constant empty array from a library is probably slightly more efficient.
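A hedged sketch of that idiom; the constant is declared locally here, analogous to the ones Apache Commons ArrayUtils provides:
import java.util.ArrayList;
import java.util.List;

class EmptyArrayIdiom {
    // A shared zero-length array can be reused for every toArray call.
    private static final String[] EMPTY_STRING_ARRAY = new String[0];

    public static void main(String[] args) {
        List<String> someStrings = new ArrayList<>();
        someStrings.add("a");
        // The empty array acts only as a type token: toArray sees it is too
        // small and allocates a correctly sized String[] itself.
        String[] asArray = someStrings.toArray(EMPTY_STRING_ARRAY);
        System.out.println(asArray.length); // prints 1
    }
}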
That's because of type erasure. See Problems with type erasure in the Wikipedia article about Java generics: the generic type information is only available at compile time, it is completely stripped by the compiler and is absent at runtime.
So toArray needs another way to figure out what array type to return.
The example provided in that Wikipedia article is quite illustrative:
ArrayList<Integer> li = new ArrayList<Integer>();
ArrayList<Float> lf = new ArrayList<Float>();
if (li.getClass() == lf.getClass()) // evaluates to true <<==
    System.out.println("Equal");
According to Neal Gafter, Sun simply did not have the kind of resources that MS did.
Sun pushed out Java 5 with type erasure because making generic type info available at runtime would have meant a lot more work. They couldn't delay any longer.
Note: type erasure is not required for backward compatibility - that part is handled by raw types, which is a different concept from reifiable types. Java still has the chance to remove type erasure, i.e. to make all types reifiable; however it's not considered urgent enough to be on any foreseeable agenda.
Java generics are really something that doesn't exist. They are only syntactic sugar (or rather a hamburger?) handled by the compiler alone. In reality they are just a shortcut for class casting, so if you look into the bytecode you may at first be a bit surprised... Type erasure, as noted in the post above.
This shortcut to class casting seemed like a good idea for operating on collections. You can at least declare what type of element you're storing, and a programmer-independent mechanism (the compiler) will check it. However, using reflection you can still put whatever you want into such a collection :)
But when you write a generic strategy that works on a generic bean, and it is put into a generic service parameterized by both the bean and the strategy, taking a generic listener for the generic bean, etc., you're heading straight into generic hell. I once ended up with four (!) generic types specified in a declaration, and when I realized I needed more, I decided to de-genericize the whole code because I had run into problems with generic type compliance.
And as for the diamond operator... You can skip the diamond and have much the same effect; the compiler will do the same checks and generate the same code. I doubt this will change in future versions of Java, because of the backward compatibility that is needed... So it is another thing that gives almost nothing, while Java has much bigger problems to deal with, e.g. the extreme inconvenience of working with date-time types...
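As a small illustration of the diamond point (modulo the unchecked warnings the raw form produces), all three declarations below compile to the same bytecode:
import java.util.ArrayList;
import java.util.List;

class DiamondDemo {
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static void main(String[] args) {
        List<String> a = new ArrayList<String>(); // explicit type argument
        List<String> b = new ArrayList<>();       // diamond: argument inferred
        List<String> c = new ArrayList();         // raw type: unchecked warning
        a.add("x");
        b.add("y");
        c.add("z");
        // At runtime all three are the same erased class.
        System.out.println(a.getClass() == b.getClass()); // prints true
    }
}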
I can't speak to the JDK team's design decisions, but a lot of the clumsy nature of Java generics comes from the fact that generics were a part of Java 5 (1.5 - whatever). There are plenty of methods in the JDK which suffer from the attempt to preserve backwards compatibility with the APIs which pre-dated 1.5.
As for the cumbersome List<String> strings = new ArrayList<String>() syntax, I wager it is to preserve the pedantic nature of the language. Declaration, not inference, is the Java marching order.
strings.toArray(new String[0])? The type cannot be inferred because the array is itself considered a generic class (Array<String>, if it were exposed).
All in all, Java generics mostly exist to protect you from typing errors at compile-time. You can still shoot yourself in the foot at runtime with a bad declaration, and the syntax is cumbersome. As happens a lot, best practices for generics use are the best you can do.

Why do Java generics have to erase the type information?

Java's generics erase the type information after the source code is compiled. And I guess the erasure is necessary because Java only keeps one copy of a class no matter what the generic type is, so List<String> and List<Number> are simply one List. Then I wonder whether it is possible, while still keeping only one copy of a class, for an instance of the class to store the generic type information at the time it is created.
For instance:
when we write:
List<String> list = new List<String>();
the compiler creates an object of List along with a Class object for String (meaning the object String.class) associated with the List, so that the generic object list can check the type information at runtime using the Class object. Is it possible or practicable?
I'm not entirely sure what you're asking specifically, but the big reason Java has to use erasure for generics is backwards compatibility. If the behaviour of
List list = new ArrayList(); // You can't do new List(); it's an interface
...was altered between versions, then when you upgraded from, say, Java 1.4 to Java 5, you'd have all sorts of weird things going on, potentially causing bugs where the code didn't behave the same way as before. That's definitely a bad thing if it happens!
If they didn't have to preserve backwards compatibility then yes, they could've done what they liked - we could've had nice reified generics and done a whole bunch of other stuff we couldn't do now. There was a proposal (by Gafter I think) that would've allowed reified generics in Java in a backwards compatible way, but it would've involved creating new versions of all the classes that should have been generic. That would've caused a load of mess with the API, so (for better or worse) they chose not to go down that route.
We have been using List<String> l = new ArrayList<String>(); since Java 5. Before that it was not like this.
It was List l = new ArrayList(); and the user could add anything to it: an Integer, a String, or any user-defined Object. The Java designers did not want to break that old code, so they left it up to the compiler, which can check this at compile time.
To preserve binary backward compatibility with pre-Java-5 code.

Why doesn't Java allow for the creation of generic arrays?

There are plenty of questions on stackoverflow from people who have attempted to create an array of generics like so:
ArrayList<Foo>[] poo = new ArrayList<Foo>[5];
And the answer of course is that the Java specification doesn't allow you to declare an array of generics.
My question, however, is why? What is the technical reason underlying this restriction in the Java language or the JVM? It's a technical curiosity I've always wondered about.
Arrays are reified - they retain type information at runtime.
Generics are a compile-time construct - the type information is lost at runtime. This was a deliberate decision to allow backward compatibility with pre-generics Java bytecode. The consequence is that you cannot create an array of generic type, because by the time the VM wants to create the array, it won't know what type to use.
See Effective Java, Item 25.
Here is an old blog post I wrote where I explain the problem: Java generics quirks
See How do I generically create objects and arrays? from Angelika Langer's Java Generics FAQ for a workaround (you can do it using reflection). That FAQ contains everything you ever want to know about Java generics.
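One hedged sketch of that reflection workaround (names are illustrative): the caller supplies a Class token so the component type is known at runtime.
import java.lang.reflect.Array;

class GenericArrayFactory {
    @SuppressWarnings("unchecked")
    static <T> T[] newArray(Class<T> componentType, int length) {
        // Array.newInstance builds an array whose runtime component type
        // really is componentType, so the unchecked cast is safe
        // (for non-primitive component types).
        return (T[]) Array.newInstance(componentType, length);
    }

    public static void main(String[] args) {
        String[] strings = newArray(String.class, 5);
        System.out.println(strings.length); // prints 5
    }
}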
You are forced to use
ArrayList<Foo>[] poo = new ArrayList[5];
which will give you an unchecked warning. The reason is that there is a potential type safety issue with generics and the runtime type-checking behavior of Java arrays, and they want to make sure you are aware of this when you program. When you write new ArrayList[...], you are creating something which will check, at runtime, everything that gets put into it to make sure that it is an instance of ArrayList. Following this scheme, when you do new ArrayList<Foo>[...], then you expect to create something that checks at runtime everything that gets put into it to make sure it is an instance of ArrayList<Foo>. But this is impossible to do at runtime, because there is no generics info at runtime.
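A short, hedged sketch of the hole that check is guarding against (the class name is made up); with the raw-array workaround and a suppressed warning it compiles, and only the final read fails:
import java.util.ArrayList;
import java.util.List;

class WhyNoGenericArrays {
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static void main(String[] args) {
        List<String>[] strings = new ArrayList[1]; // unchecked warning here

        Object[] objects = strings; // fine: arrays are covariant
        List<Integer> ints = new ArrayList<>();
        ints.add(42);
        // No ArrayStoreException: the array's runtime type is just ArrayList[];
        // erasure has removed the difference between <String> and <Integer>.
        objects[0] = ints;

        // The compiler inserts a cast to String here, and the element is an
        // Integer, so this line throws ClassCastException at runtime.
        String s = strings[0].get(0);
        System.out.println(s);
    }
}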

Are Java generics mainly a way of forcing static type on elements of a collection?

private ArrayList<String> colors = new ArrayList<String>();
Looking at the example above, it seems the main point of generics is to enforce type on a collection. So, instead of having an array of "Objects", which need to be cast to a String at the programmer's discretion, I enforce the type "String" on the collection in the ArrayList. This is new to me but I just want to check that I'm understanding it correctly. Is this interpretation correct?
That's by far not the only use of generics, but it's definitely the most visible one.
Generics can be (and are) used in many different places to ensure static type safety, not just with collections.
I'd just like to mention that because you'll come across places where generics could be useful, but if you're stuck with the generics/collections association, you might overlook that fact.
Yes, your understanding is correct. The collection is strongly-typed to whatever type is specified, which has various advantages - including no more run-time casting.
Yeah, that's basically it. Before generics, one had to create an ArrayList of Objects. This meant that one could add any type of Object to the list - even if you only meant for the ArrayList to contain Strings.
All generics do is add type safety. That is, the compiler will now make sure that any object in the list is a String, and prevent you from adding a non-String object to the list. Even better: this check is done at compile time.
Yes. To maintain type safety and remove runtime casts is the correct answer.
You may want to check out the tutorial on the Java site. It gives a good explanation in the introduction.
Without Generics:
List myIntList = new LinkedList(); // 1
myIntList.add(new Integer(0)); // 2
Integer x = (Integer) myIntList.iterator().next(); // 3
With Generics
List<Integer> myIntList = new LinkedList<Integer>(); // 1'
myIntList.add(new Integer(0)); // 2'
Integer x = myIntList.iterator().next(); // 3'
I think of it as type safety and also as saving the casting. Read more about autoboxing.
You can add runtime checks with the Collections utility class.
http://java.sun.com/javase/6/docs/api/java/util/Collections.html#checkedCollection(java.util.Collection,%20java.lang.Class)
Also see checkedSet, checkedList, checkedSortedSet, checkedMap, checkedSortedMap
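A minimal sketch of how those checked wrappers behave (the names here are illustrative):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CheckedListDemo {
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static void main(String[] args) {
        List<String> strings =
                Collections.checkedList(new ArrayList<String>(), String.class);
        strings.add("ok");

        List raw = strings; // simulate legacy code holding a raw reference
        try {
            // An unchecked list would accept this and fail much later, when
            // someone reads the element as a String; the checked wrapper
            // fails fast, right at the insertion.
            raw.add(Integer.valueOf(42));
        } catch (ClassCastException expected) {
            System.out.println("Rejected at insertion: " + expected.getMessage());
        }
    }
}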
Yes, you are correct. Generics adds compile-time type safety to your program, which means the compiler can detect if you are putting the wrong type of objects into i.e. your ArrayList.
One thing I would like to point out is that although it removes the visible run-time casting and un-clutters the source code, the JVM still does the casting in the background.
The way generics are implemented in Java, the casts are merely hidden and the compiler still produces non-generic bytecode. An ArrayList<String> is still an ArrayList of Objects in the bytecode. The good thing about this is that it keeps the bytecode compatible with earlier versions. The bad thing is that it misses a huge optimization opportunity.
You can use generics anywhere you need a type parameter, i.e. a type that should be the same across some piece of code but is left more or less unspecified.
For example, one of my toy projects is to write algorithms for computer algebra in a generic way in Java. This is interesting for the sake of the mathematical algorithms, but also to put Java generics through a stress test.
In this project I've got various interfaces for algebraic structures such as rings and fields and their respective elements, and concrete classes, e.g. for integers or for polynomials over a ring, where the ring is a type parameter. It works, but it becomes somewhat tedious in places. The record so far is a type in front of a variable that spans two complete lines of 80 characters, in an algorithm for testing irreducibility of polynomials. The main culprit is that you can't give a complicated type its own name.
