The context
I'm working on a project that depends heavily on generic types. One of its key components is the so-called TypeToken, which provides a way of representing generic types at runtime and applying utility functions to them. To work around Java's type erasure, I'm using the curly-braces notation ({}) to create an automatically generated subclass, since this makes the type argument reifiable.
What TypeToken basically does
This is a strongly simplified version of TypeToken, far more lenient than the original implementation. I'm using this stripped-down approach so I can make sure the real problem doesn't lie in one of those utility functions.
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class TypeToken<T> {

    private final Type type;
    private final Class<T> rawType;
    private final int hashCode;

    /* ==== Constructor ==== */

    @SuppressWarnings("unchecked")
    protected TypeToken() {
        ParameterizedType paramType = (ParameterizedType) this.getClass().getGenericSuperclass();
        this.type = paramType.getActualTypeArguments()[0];
        // ...
    }
    // ...
}
When it works
Basically, this implementation works perfectly in almost every situation.
It has no problem with handling most types. The following examples work perfectly:
TypeToken<List<String>> token = new TypeToken<List<String>>() {};
TypeToken<List<? extends CharSequence>> token = new TypeToken<List<? extends CharSequence>>() {};
As it doesn't check the types, the implementation above allows every type that the compiler permits, including TypeVariables.
<T> void test() {
TypeToken<T[]> token = new TypeToken<T[]>() {};
}
In this case, type is a GenericArrayType holding a TypeVariable as its component type. This is perfectly fine.
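To see this concretely, here is a self-contained sketch (with a minimal stand-in for the TypeToken above; the class and method names are mine) showing that the array token's component type resolves to the method's type variable:

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

public class ArrayTokenDemo {

    // Minimal TypeToken stand-in: the anonymous subclass makes the
    // type argument recoverable via getGenericSuperclass().
    static class TypeToken<T> {
        final Type type;
        protected TypeToken() {
            ParameterizedType superType =
                    (ParameterizedType) getClass().getGenericSuperclass();
            this.type = superType.getActualTypeArguments()[0];
        }
    }

    // The anonymous class is enclosed by this method, which declares T,
    // so the reflection API can resolve the type variable.
    public static <T> Type componentOfArrayToken() {
        TypeToken<T[]> token = new TypeToken<T[]>() {};
        GenericArrayType arrayType = (GenericArrayType) token.type;
        return arrayType.getGenericComponentType();
    }

    public static void main(String[] args) {
        Type component = componentOfArrayToken();
        // Outside a lambda this is a TypeVariable named "T", not null.
        System.out.println(component instanceof TypeVariable);
        System.out.println(((TypeVariable<?>) component).getName());
    }
}
```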
The weird situation when using lambdas
However, when you initialize a TypeToken inside a lambda expression, things start to change. (The type variable comes from the test function above)
Supplier<TypeToken<T[]>> sup = () -> new TypeToken<T[]>() {};
In this case, type is still a GenericArrayType, but it holds null as its component type.
But if you're creating an anonymous inner class, things start to change again:
Supplier<TypeToken<T[]>> sup = new Supplier<TypeToken<T[]>>() {
@Override
public TypeToken<T[]> get() {
return new TypeToken<T[]>() {};
}
};
In this case, the component type again holds the correct value (a TypeVariable).
The resulting questions
What happens to the TypeVariable in the lambda-example? Why does the type inference not respect the generic type?
What is the difference between the explicitly-declared and the implicitly-declared example? Is type inference the only difference?
How can I fix this without using the boilerplate explicit declaration? This becomes especially important in unit testing since I want to check whether the constructor throws exceptions or not.
To clarify it a bit: This is not a problem that's "relevant" for the program since I do NOT allow non-resolvable types at all, but it's still an interesting phenomenon I'd like to understand.
My research
Update 1
Meanwhile, I've done some research on this topic. In the Java Language Specification §15.12.2.2 I found an expression that might have something to do with it - "pertinent to applicability" - which mentions "implicitly typed lambda expression" as an exception. Admittedly, that's not quite the right chapter, but the term is used in other places, including the chapter about type inference.
But to be honest: I haven't yet figured out what all of those operators like := or Fi0 mean, which makes it really hard to understand the details. I'd be glad if someone could clarify this a bit and say whether it might explain the weird behavior.
Update 2
I've thought about that approach again and concluded that even if the compiler removed the type because it's not "pertinent to applicability", that wouldn't justify setting the component type to null instead of the most general type, Object. I can't think of a single reason why the language designers would have decided to do so.
Update 3
I've just retested the same code with the latest version of Java (I used 8u191 before). To my regret, this hasn't changed anything, although Java's type inference has been improved...
Update 4
A few days ago I requested an entry in the official Java Bug Database/Tracker, and it has just been accepted. Since the developers who reviewed my report assigned it priority P4, it might take a while until it's fixed. You can find the report here.
A huge shout-out to Tom Hawtin - tackline for mentioning that this might be an essential bug in Java SE itself. A report by Mike Strobel would probably have been far more detailed than mine, given his impressive background knowledge, but his answer wasn't yet available when I wrote my report.
tl;dr:
There is a bug in javac that records the wrong enclosing method for lambda-embedded inner classes. As a result, type variables on the actual enclosing method cannot be resolved by those inner classes.
There are arguably two sets of bugs in the java.lang.reflect API implementation:
Some methods are documented as throwing exceptions when nonexistent types are encountered, but they never do. Instead, they allow null references to propagate.
The various Type::toString() overrides currently throw or propagate a NullPointerException when a type cannot be resolved.
The answer has to do with the generic signatures that usually get emitted in class files that make use of generics.
Typically, when you write a class that has one or more generic supertypes, the Java compiler will emit a Signature attribute containing the fully parameterized generic signature(s) of the class's supertype(s). I've written about these before, but the short explanation is this: without them, it would not be possible to consume generic types as generic types unless you happened to have the source code. Due to type erasure, information about type variables gets lost at compilation time. If that information were not included as extra metadata, neither the IDE nor your compiler would know that a type was generic, and you could not use it as such. Nor could the compiler emit the necessary runtime checks to enforce type safety.
javac will emit generic signature metadata for any type or method whose signature contains type variables or a parameterized type, which is why you are able to obtain the original generic supertype information for your anonymous types. For example, the anonymous type created here:
TypeToken<?> token = new TypeToken<List<? extends CharSequence>>() {};
...contains this Signature:
LTypeToken<Ljava/util/List<+Ljava/lang/CharSequence;>;>;
From this, the java.lang.reflect APIs can parse the generic supertype information for your (anonymous) class.
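You can watch this parsing happen through the public API; a minimal sketch (class and method names are mine):

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;

public class SignatureDemo {

    // The anonymous subclass carries a Signature attribute for its
    // generic supertype, which getGenericSuperclass() parses at runtime.
    public static Type supertypeArgument() {
        ArrayList<String> anonymous = new ArrayList<String>() {};
        ParameterizedType superType =
                (ParameterizedType) anonymous.getClass().getGenericSuperclass();
        return superType.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        // Prints "class java.lang.String": the type argument survived.
        System.out.println(supertypeArgument());
    }
}
```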
But we already know that this works just fine when the TypeToken is parameterized with concrete types. Let's look at a more relevant example, where its type parameter includes a type variable:
static <F> void test() {
TypeToken<F[]> token = new TypeToken<F[]>() {};
}
Here, we get the following signature:
LTypeToken<[TF;>;
Makes sense, right? Now, let's look at how the java.lang.reflect APIs are able to extract generic supertype information from these signatures. If we peer into Class::getGenericSuperclass(), we see that the first thing it does is call getGenericInfo(). If we haven't called into this method before, a ClassRepository gets instantiated:
private ClassRepository getGenericInfo() {
ClassRepository genericInfo = this.genericInfo;
if (genericInfo == null) {
String signature = getGenericSignature0();
if (signature == null) {
genericInfo = ClassRepository.NONE;
} else {
// !!! RELEVANT LINE HERE: !!!
genericInfo = ClassRepository.make(signature, getFactory());
}
this.genericInfo = genericInfo;
}
return (genericInfo != ClassRepository.NONE) ? genericInfo : null;
}
The critical piece here is the call to getFactory(), which expands to:
CoreReflectionFactory.make(this, ClassScope.make(this))
ClassScope is the bit we care about: this provides a resolution scope for type variables. Given a type variable name, the scope gets searched for a matching type variable. If one is not found, the 'outer' or enclosing scope is searched:
public TypeVariable<?> lookup(String name) {
TypeVariable<?>[] tas = getRecvr().getTypeParameters();
for (TypeVariable<?> tv : tas) {
if (tv.getName().equals(name)) {return tv;}
}
return getEnclosingScope().lookup(name);
}
And, finally, the key to it all (from ClassScope):
protected Scope computeEnclosingScope() {
Class<?> receiver = getRecvr();
Method m = receiver.getEnclosingMethod();
if (m != null)
// Receiver is a local or anonymous class enclosed in a method.
return MethodScope.make(m);
// ...
}
If a type variable (e.g., F) is not found on the class itself (e.g., the anonymous TypeToken<F[]>), then the next step is to search the enclosing method. If we look at the disassembled anonymous class, we see this attribute:
EnclosingMethod: LambdaTest.test()V
The presence of this attribute means that computeEnclosingScope will produce a MethodScope for the generic method static <F> void test(). Since test declares the type variable F, we find it when we search the enclosing scope.
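You can observe this part of the lookup chain from user code: Class::getEnclosingMethod reports the method a local or anonymous class was declared in. A tiny sketch (names are mine):

```java
import java.lang.reflect.Method;

public class EnclosingDemo {

    // For an anonymous class declared directly inside a method, the
    // EnclosingMethod attribute points at that method.
    public static String enclosingOfAnonymous() {
        Object o = new Object() {};
        Method m = o.getClass().getEnclosingMethod();
        return m.getName();
    }

    public static void main(String[] args) {
        // Prints "enclosingOfAnonymous".
        System.out.println(enclosingOfAnonymous());
    }
}
```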
So, why doesn't it work inside a lambda?
To answer this, we must understand how lambdas get compiled. The body of the lambda gets moved into a synthetic static method. At the point where we declare our lambda, an invokedynamic instruction gets emitted, which causes a Supplier implementation class to be generated the first time we hit that instruction.
In this example, the static method generated for the lambda body would look something like this (if decompiled):
private static /* synthetic */ Object lambda$test$0() {
return new LambdaTest$1();
}
...where LambdaTest$1 is your anonymous class. Let's disassemble that and inspect its attributes:
Signature: LTypeToken<[TF;>;
EnclosingMethod: LambdaTest.lambda$test$0()Ljava/lang/Object;
Just like the case where we instantiated an anonymous type outside of a lambda, the signature contains the type variable F. But EnclosingMethod refers to the synthetic method.
The synthetic method lambda$test$0() does not declare the type variable F. Moreover, lambda$test$0() is not enclosed by test(), so the declaration of F is not visible inside it. Your anonymous class has a supertype containing a type variable that the class doesn't know about, because it's out of scope.
When we call getGenericSuperclass(), the scope hierarchy for LambdaTest$1 does not contain F, so the parser cannot resolve it. Due to how the code is written, this unresolved type variable results in null getting placed in the type parameters of the generic supertype.
Note that, had your lambda instantiated a type that did not refer to any type variables (e.g., TypeToken<String>), you would not run into this problem.
Conclusions
(i) There is a bug in javac. The Java Virtual Machine Specification §4.7.7 ("The EnclosingMethod Attribute") states:
It is the responsibility of a Java compiler to ensure that the method identified via the method_index is indeed the closest lexically enclosing method of the class that contains this EnclosingMethod attribute. (emphasis mine)
Currently, javac seems to determine the enclosing method after the lambda rewriter runs its course, and as a result, the EnclosingMethod attribute refers to a method that never even existed in the lexical scope. If EnclosingMethod reported the actual lexically enclosing method, the type variables on that method could be resolved by the lambda-embedded classes, and your code would produce the expected results.
It is arguably also a bug that the signature parser/reifier silently allows a null type argument to be propagated into a ParameterizedType (which, as @tom-hawtin-tackline points out, has ancillary effects like toString() throwing an NPE).
My bug report for the EnclosingMethod issue is now online.
(ii) There are arguably multiple bugs in java.lang.reflect and its supporting APIs.
The method ParameterizedType::getActualTypeArguments() is documented as throwing a TypeNotPresentException when "any of the actual type arguments refers to a non-existent type declaration". That description arguably covers the case where a type variable is not in scope. GenericArrayType::getGenericComponentType() should throw a similar exception when "the underlying array type's type refers to a non-existent type declaration". Currently, neither appears to throw a TypeNotPresentException under any circumstances.
I would also argue that the various Type::toString overrides should merely fill in the canonical name of any unresolved types rather than throwing a NPE or any other exception.
I have submitted a bug report for these reflection-related issues, and I will post the link once it is publicly visible.
Workarounds?
If you need to be able to reference a type variable declared by the enclosing method, then you can't do that with a lambda; you'll have to fall back to the longer anonymous type syntax. However, the lambda version should work in most other cases. You should even be able to reference type variables declared by the enclosing class. For example, these should always work:
class Test<X> {
void test() {
Supplier<TypeToken<X>> s1 = () -> new TypeToken<X>() {};
Supplier<TypeToken<String>> s2 = () -> new TypeToken<String>() {};
Supplier<TypeToken<List<String>>> s3 = () -> new TypeToken<List<String>>() {};
}
}
Unfortunately, given that this bug has apparently existed since lambdas were first introduced, and it has not been fixed in the most recent LTS release, you may have to assume the bug remains in your clients’ JDKs long after it gets fixed, assuming it gets fixed at all.
As a workaround, you can move the creation of the TypeToken out of the lambda into a separate method, and still use a lambda instead of a fully declared anonymous class:
static <T> TypeToken<T[]> createTypeToken() {
return new TypeToken<T[]>() {};
}
Supplier<TypeToken<T[]>> sup = () -> createTypeToken();
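Putting the workaround together as a runnable sketch (again with a minimal TypeToken stand-in; names are mine): the anonymous class now lives in createTypeToken, which really does declare T, so its EnclosingMethod attribute, and therefore the type-variable lookup, is correct:

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.function.Supplier;

public class WorkaroundDemo {

    static class TypeToken<T> {
        final Type type;
        protected TypeToken() {
            type = ((ParameterizedType) getClass().getGenericSuperclass())
                    .getActualTypeArguments()[0];
        }
    }

    // The anonymous class is created here, in a real method that
    // declares T, not in a synthetic lambda method.
    public static <T> TypeToken<T[]> createTypeToken() {
        return new TypeToken<T[]>() {};
    }

    public static <T> Type componentViaLambda() {
        Supplier<TypeToken<T[]>> sup = () -> createTypeToken();
        GenericArrayType arrayType = (GenericArrayType) sup.get().type;
        return arrayType.getGenericComponentType(); // a TypeVariable, not null
    }

    public static void main(String[] args) {
        System.out.println(componentViaLambda() instanceof TypeVariable);
    }
}
```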
I've not found the relevant part of the spec, but here's a partial answer.
There's certainly a bug with the component type being null. To be clear, this is TypeToken.type from above cast to GenericArrayType (yuck!) with the method getGenericComponentType invoked. The API docs do not explicitly mention whether the null returned is valid or not. However, the toString method throws NullPointerException, so there is definitely a bug (at least in the random version of Java I am using).
I don't have a bugs.java.com account, so can't report this. Someone should.
Let's have a look at the class files generated.
javap -private YourClass
This should produce a listing containing something like:
static <T> void test();
private static TypeToken lambda$test$0();
Notice that our explicit test method has its type parameter, but the synthetic lambda method does not. You might expect something like:
static <T> void test();
private static <T> TypeToken<T[]> lambda$test$0(); /*** DOES NOT HAPPEN ***/
// ^ name copied from `test`
// ^^^ `Object[]` would not make sense
Why doesn't this happen? Presumably because this would be a method type parameter in a context where a class type parameter is required, and those are surprisingly different things. There is also a restriction that lambdas may not declare method type parameters, apparently because there is no explicit notation for them (some people may suggest this seems like a poor excuse).
Conclusion: There is at least one unreported JDK bug here. The reflect API and this lambda+generics part of the language is not to my taste.
I'm trying to understand the benefit of a programming language being statically typed, and through that I'm wondering why we need to include the type in a declaration. Does it serve any purpose other than making the type explicit? If that's all, I don't see the point. I understand that static typing allows for type checking at compile time, but if we leave out the explicit type declaration, can't Java still infer the type at compile time?
For example, let's say we have in Java:
myClass test = new myClass();
Isn't the type declaration unnecessary here? If I'm not mistaken, this is static binding, and Java should know that test is of type myClass without an explicit type declaration, even at compile time.
Response to possible duplicate: this is not a question regarding static vs. dynamic type, but rather about type inference in statically typed languages, as explained in the accepted answer.
There are statically typed languages that allow you to omit the type declaration. This is called type inference. The downsides are that it's tougher to design (for the language designers), tougher to implement (for the compiler writers), and can be tougher to understand when something goes wrong (for programmers). The problem with the last one of those is that if many (or all) of your types are inferred, the compiler can't really tell you much more than "the types aren't all consistent" — often via a cryptic message.
In a trivial case like the one you cite, yes, it's easy. But as you get farther from the trivial case, the system quickly grows in complexity.
Java does actually do a bit of type inference, in very limited forms. For instance, in this snippet:
List<String> emptyStrings = Collections.emptyList();
... the compiler has inferred that the method call emptyList returns a List<String>, and not just a List<T> where the type T is unspecified. The non-inferred version of that line (which is also valid Java) is:
List<String> emptyStrings = Collections.<String>emptyList();
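Java's inference has since grown beyond this: from Java 10 onward, var lets you omit the declared type of a local variable entirely, which is exactly what the question asks about. A small sketch (assumes Java 10+; names are mine):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InferenceDemo {

    // `var` (Java 10+) infers ArrayList<String> from the initializer.
    public static List<String> makeList() {
        var list = new ArrayList<String>();
        list.add("hello");
        return list;
    }

    public static void main(String[] args) {
        System.out.println(makeList().get(0));

        // Inference in the other direction: the declared target type
        // drives inference of emptyList()'s type argument.
        List<String> empty = Collections.emptyList();
        System.out.println(empty.isEmpty());
    }
}
```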
It is necessary. With inheritance, the declared type and the runtime type can differ, so the declaration carries real information.
For example:
Building build1 = new House();
Building build2 = new SkyScraper();
It is the same in polymorphism.
You can then collect all Buildings into an array, for example. Without the common declared type, you couldn't put one House and one SkyScraper into the same array.
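A compilable sketch of that point, using the hypothetical Building/House/SkyScraper classes from the example:

```java
public class BuildingDemo {

    static class Building { }
    static class House extends Building { }
    static class SkyScraper extends Building { }

    // Both subclasses fit in one array because of the shared declared type.
    public static Building[] collect() {
        Building b1 = new House();
        Building b2 = new SkyScraper();
        return new Building[] { b1, b2 };
    }

    public static void main(String[] args) {
        System.out.println(collect().length);
    }
}
```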
I am wondering how I can get the runtime type that the programmer wrote when using generics. For example, if I have class Main<T extends List<String>> and the programmer writes something like
Main<ArrayList<String>> main = new Main<>();
how can I find out, using reflection, which class extending List<String> is used?
I'm just curious how I can achieve that. With
main.getClass().getTypeParameters()[0].getBounds()
I can only discover the bounding class (not the runtime class).
As the comments above point out, due to type erasure you can't do this. But in the comments, the follow up question was:
I know that the generics are removed after compilation, but then I am wondering: how is a ClassCastException thrown at runtime? Sorry if this is a stupid question, but how does it know to throw this exception if there isn't any information about the classes?
The answer is that, although the type parameter is erased from the type, it still remains in the bytecode.
Essentially, the compiler transforms this:
List<String> list = new ArrayList<>();
list.add("foo");
String value = list.get(0);
into this:
List list = new ArrayList();
list.add("foo");
String value = (String) list.get(0); // note the cast!
This means that the type String is no longer associated with the type ArrayList in the bytecode, but it still appears (in the form of a class cast instruction). If at runtime the type is different you'll get a ClassCastException.
This also explains why you can get away with things like this:
// The next line should raise a warning about raw types
// if compiled with Java 1.5 or newer
List rawList = new ArrayList();
// Since this is a raw List, it can hold any object.
// Let's stick a number in there.
rawList.add(new Integer(42));
// This is an unchecked conversion. Not always wrong, but always risky.
List<String> stringList = rawList;
// You'd think this would be an error. But it isn't!
Object value = stringList.get(0);
And indeed if you try it, you'll find that you can safely pull the 42 value back out as an Object and not have any errors at all. The reason for this is that the compiler doesn't insert the cast to String here -- it just inserts a cast to Object (since the left-hand side type is just Object) and the cast from Integer to Object succeeds, as it should.
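The sequence above can be condensed into a runnable sketch (class and method names are mine): the failure happens only where the compiler actually inserts a cast, i.e. where a String is demanded.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {

    @SuppressWarnings({"rawtypes", "unchecked"})
    public static List<String> polluted() {
        List rawList = new ArrayList();
        rawList.add(42);                 // raw list accepts anything
        return (List<String>) rawList;   // unchecked conversion, no check here
    }

    public static boolean failsOnlyWhenStringDemanded() {
        List<String> stringList = polluted();
        Object ok = stringList.get(0);   // no cast to String inserted: succeeds
        try {
            String bad = stringList.get(0); // compiler inserts (String): throws
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnlyWhenStringDemanded());
    }
}
```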
Anyhow, this is just a bit of a long-winded way of explaining that type erasure doesn't erase all references to the given type, only the type parameter itself.
And in fact, as a now-deleted answer here mentioned, you can exploit this "vestigial" type information, through a technique called Gafter's Gadget, which you can access using the getActualTypeArguments() method on ParameterizedType.
The way the gadget works is by creating an empty subclass of a parameterized type, e.g. new TypeToken<String>() {}. Since the anonymous class here is a subclass of a concrete type (there is no type parameter T here; it's been replaced by a real type, String), methods on the type have to be able to return the real type (in this case String). And using reflection you can discover that type: in this case, getActualTypeArguments()[0] would return String.class.
Gafter's Gadget can be extended to arbitrarily complex parameterized types, and is actually often used by frameworks that do a lot of work with reflection and generics. For example, the Google Guice dependency injection framework has a type called TypeLiteral that serves exactly this purpose.
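A minimal sketch of the gadget itself (this TypeToken is a stand-in written for illustration, not Guice's TypeLiteral):

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class GadgetDemo {

    // Empty subclasses of this class preserve their type argument
    // in the class file's Signature attribute.
    static abstract class TypeToken<T> {
        Type captured() {
            ParameterizedType superType =
                    (ParameterizedType) getClass().getGenericSuperclass();
            return superType.getActualTypeArguments()[0];
        }
    }

    public static Type stringType() {
        // The anonymous subclass pins T to the concrete type String.
        return new TypeToken<String>() {}.captured();
    }

    public static void main(String[] args) {
        // Prints "class java.lang.String".
        System.out.println(stringType());
    }
}
```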
I was surprised today when this code compiled:
class GenericClass<T> {
public void emptyMethod(T instance) {
// ..
}
public void print(T instance) {
System.out.println(instance);
}
}
public class Main {
public static void main(String[] args) {
GenericClass first = new GenericClass();
System.out.println("Wow");
first.emptyMethod(10);
first.print(16);
}
}
The compiler emits a warning (Type safety: The method emptyMethod(Object) belongs to the raw type GenericClass. References to generic type GenericClass<T> should be parameterized), but it does not cause a compile error, and it runs 'fine' (at least the provided print method does). As I understand it, the compiler is using Object as the type argument, but I find this counter-intuitive. Why would the compiler do such a thing? Why doesn't it require me to specify the type parameter?
You're using a raw class, basically.
Think back to when generics were first introduced in Java: there was a load of code which already used List, ArrayList etc. In order to avoid breaking all of that code, but still reusing the existing classes, raw types were introduced - it's basically using a generic type as if it weren't one.
As you can see, you get a warning - so it's worth avoiding - but that's the primary reason for it being allowed at all.
See section 4.8 of the JLS for more information, which includes:
Raw types are closely related to wildcards. Both are based on existential types. Raw types can be thought of as wildcards whose type rules are deliberately unsound, to accommodate interaction with legacy code. Historically, raw types preceded wildcards; they were first introduced in GJ, and described in the paper Making the future safe for the past: Adding Genericity to the Java Programming Language by Gilad Bracha, Martin Odersky, David Stoutamire, and Philip Wadler, in Proceedings of the ACM Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA 98), October 1998.
You have to know how generics are implemented in Java. They are far from perfect.
You have to remember that at run time the generic type arguments are gone: every object only knows its raw class.
Generics were added for extra compile-time safety where you need it, but if you don't want to use them, you can ignore the warnings and use unparametrized (raw) instances.
However, if you'd like the Java compiler to help you with type safety, then you parametrize your generic class instances. Once you create a GenericClass<String>, for example, the compiler will not allow you to use it with an integer argument (first.emptyMethod(10) will not compile). You can still make it work with an integer argument if you do an explicit cast, though.
So consider it a good practice for added security, which only works if you follow the rules.
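A self-contained sketch of that contrast (I've given print a String return value so the result is observable; otherwise the classes mirror the question's):

```java
public class RawVsParameterized {

    static class GenericClass<T> {
        public String print(T instance) {
            return String.valueOf(instance);
        }
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    public static String rawCall() {
        GenericClass first = new GenericClass(); // raw: T is effectively Object
        return first.print(16);                  // any argument is accepted
    }

    public static String typedCall() {
        GenericClass<String> typed = new GenericClass<>();
        return typed.print("16");
        // typed.print(16);  // would not compile: int is not a String
    }

    public static void main(String[] args) {
        System.out.println(rawCall());
        System.out.println(typedCall());
    }
}
```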
This question already has answers here:
When should I use the java 5 method cast of Class?
(5 answers)
Java Class.cast() vs. cast operator
(8 answers)
Closed 4 years ago.
I recently stumbled upon a piece of code that went like this:
Object o = .. ;
Foo foo = Foo.class.cast(o);
I was actually not even aware that java.lang.Class had a cast method, so I looked into the docs, and from what I gather this does simply do a cast to the class that the Class object represents. So the code above would be roughly equivalent to
Object o = ..;
Foo foo = (Foo)o;
So I wondered, why I would want to use the cast method instead of simply doing a cast "the old way". Has anyone a good example where the usage of the cast method is beneficial over doing the simple cast?
I don't think it's often used exactly as you have shown. Most common use I have seen is where folks using generics are trying to do the equivalent of this:
public static <T extends Number> T castToNumber(Object o) {
return (T)o;
}
Which doesn't really do anything useful because of type erasure.
Whereas this works, and is type safe (modulo ClassCastExceptions):
public static <T extends Number> T castToNumber(Object o, Class<T> clazz) {
return clazz.cast(o);
}
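A usage sketch (the helper is repeated so the example is self-contained): with the Class-carrying variant, a mismatch fails right at the cast site rather than somewhere downstream.

```java
public class CastDemo {

    // The Class token performs a checked cast at runtime.
    public static <T extends Number> T castToNumber(Object o, Class<T> clazz) {
        return clazz.cast(o);
    }

    public static void main(String[] args) {
        Integer i = castToNumber(42, Integer.class);  // succeeds
        System.out.println(i);
        try {
            Double d = castToNumber(42, Double.class); // Integer is not a Double
        } catch (ClassCastException expected) {
            System.out.println("caught: " + expected);
        }
    }
}
```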
EDIT: Couple of examples of use from google guava:
MutableClassToInstanceMap
Cute use in Throwables#propagateIfInstanceOf, for a type-safe generic throws spec
In Java there is often more than one way to skin a cat. Such functionality may be useful in framework code. Imagine a method which accepts a Class instance and an Object instance and returns the Object cast to that class:
public static void doSomething(Class<? extends SomeBaseClass> whatToCastAs, Object o)
{
SomeBaseClass castObj = whatToCastAs.cast(o);
castObj.doSomething();
}
In general, use the simpler casting, unless it does not suffice.
In some cases, you only know the type to cast an object to during runtime, and that's when you have to use the cast method.
There is absolutely no reason to write Foo.class.cast(o), it is equivalent to (Foo)o.
In general, if X is a reifiable type, and Class<X> clazz, then clazz.cast(o) is same as (X)o.
If all types are reifiable, method Class.cast() is therefore redundant and useless.
Unfortunately, due to erasure in the current version of Java, not all types are reifiable. For example, type variables are not reifiable.
If T is a type variable, the cast (T)o is unchecked, because the exact type of T is unknown to the JVM at runtime; the JVM cannot test whether o is really of type T. The cast may be allowed erroneously, which may trigger problems later.
It is not a huge problem; usually when the programmer does (T)o, he has already reasoned that the cast is safe, and won't cause any problem at runtime. The cast is checked by app logic.
Suppose a Class<T> clazz is available at the point of cast, then we do know what T is at runtime; we can add extra runtime check to make sure o is indeed a T.
if (!clazz.isInstance(o)) throw new ClassCastException();
return (T) o;
And this is essentially what Class.cast() does.
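In sketch form (castLike is my name for illustration; the real method lives on java.lang.Class, and note that, like Class.cast(), it lets null through):

```java
public class CastSketch {

    // Roughly what Class.cast() does: an isInstance check guarding
    // an unchecked cast.
    @SuppressWarnings("unchecked")
    public static <T> T castLike(Class<T> clazz, Object obj) {
        if (obj != null && !clazz.isInstance(obj)) {
            throw new ClassCastException("Cannot cast "
                    + obj.getClass().getName() + " to " + clazz.getName());
        }
        return (T) obj; // unchecked, but guarded by the check above
    }

    public static void main(String[] args) {
        System.out.println(castLike(String.class, (Object) "hello"));
        System.out.println(castLike(String.class, null)); // null is allowed
    }
}
```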
We would never expect the cast to fail in any case, therefore in a correctly implemented app the check clazz.isInstance(o) must always succeed anyway, therefore clazz.cast(o) is equivalent to (T)o - once again, under the assumption that the code is correct.
If one can prove that the code is correct and the cast is safe, one could prefer (T)o to clazz.cast(o) for performance reasons. In the example of MutableClassToInstanceMap raised in another answer, we can see that the cast is obviously safe, so a simple (T)o would have sufficed.
Class.cast is designed for generic types.
When you construct a class with a generic parameter T, you can pass in a Class<T>. You can then do the cast with both static and dynamic checking, which (T) does not give you. It also doesn't produce unchecked warnings, because it is checked (at that point).
A common example is when you retrieve from a persistence layer a collection of entities referenced by a Class object and some conditions. The returned collection could contain unchecked objects, so if you just cast it, as G_H pointed out, the ClassCastException is thrown at that point, and not later when the values are accessed.
One example is when you retrieve a collection from a DAO that returns an unchecked collection and you iterate over it in your service; this situation can lead to a ClassCastException.
One way to solve it, since you have the desired class and the unchecked collection, is to iterate over the collection and cast each element inside the DAO, transforming it into a checked collection before returning it.
Because you might have something like this:
Number fooMethod(Class<? extends Number> clazz) {
return clazz.cast(var);
}
A "cast" in Java, e.g. (Number)var, where the thing inside the parentheses is a reference type, really consists of two parts:
Compile time: the result of the cast expression has the type of the type you cast to
Run time: it inserts a check operation, which basically says: if the object is not an instance of that class, then throw a ClassCastException (if the thing you're casting to is a type variable, the class actually checked is the erasure of the type variable, i.e. its leftmost upper bound)
To use the syntax, you need to know the class at the time you write the code. Suppose you don't know at compile-time what class you want to cast to; you only know it at runtime.
Now you would ask, then what is the point of casting? Isn't the point of casting to turn the expression into the desired type at compile time? So if you don't know the type at compile time, then there is no benefit at compile-time, right? True, but that is just the first item above. You're forgetting the runtime component of a cast (second item above): it checks the object against the class.
Therefore, the purpose of a runtime cast (i.e. Class.cast()) is to check that the object is an instance of the class, and if not, throw an exception. It is roughly equivalent to this but shorter:
if (!clazz.isInstance(var))
throw new ClassCastException();
Some people have mentioned that Class.cast() also has a nice return type that is based on the type parameter of the class passed in, but that is just a compile-time feature that is provided by a compile-time cast anyway. So for that purpose there is no point in using Class.cast().