Flagging calls to mutator methods on immutable collections in IntelliJ

It is well known that Java's immutable collections provide mutator methods, which owe their existence to the (unfortunate but well documented) fact that both mutable and immutable collections of a particular type implement the same interface. The immutable implementations' mutators throw an UnsupportedOperationException at runtime, and that, typically, is where the story ends -- and where my question begins. The following code is clearly highlighted as problematic by IntelliJ:
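(The original snippet did not survive extraction; the following is a reconstruction inferred from the answer below, so the variable name and the added title are assumptions.)

import java.util.List;

public class ImmutableDemo {
    public static void main(String[] args) {
        var films = List.of("Citizen Kane"); // List.of() returns an immutable list
        films.add("Casablanca");             // flagged by IntelliJ; javac accepts it,
                                             // but it throws UnsupportedOperationException at runtime
    }
}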
What does IntelliJ know that javac doesn't? (This is JDK 17, in case it matters.)

IntelliJ knows that a List created by List.of(...) is always immutable, so it doesn't make sense to add another element to that list.
However, the return type of List.of(...) is simply the List interface. So
var films = List.of("Citizen Kane")
is equivalent to
List<String> films = List.of("Citizen Kane")
as far as javac is concerned. As you pointed out, the List interface has an add() method, so compiling this code is perfectly fine.
Or in other words: javac only checks that the code is syntactically and type-wise valid, while IntelliJ also applies rudimentary semantic checks on top.
Edit: to add the technical term, IntelliJ uses static analysis to give those warnings and suggestions. You can read about it on their blog.

Related

Tracking method implementation changes in class bytecode

I have the bytecode of some abstract project (let's call it The Project) inside some Kotlin code, with every class's bytecode stored as a ByteArray; the task is to tell which specific methods in each class have been modified from build to build of The Project. In other words, there are two ByteArrays of the same class of The Project, but they belong to different versions of it, and I need to compare them accurately. A simple example. Let's assume we have a trivial class:
class Rst {
    fun getjson(): String {
        abc("""ss""");
        return "jsonValid"
    }

    public fun abc(s: String) {
        println(s)
    }
}
Its bytecode is stored in oldByteCode. Now some changes happened to the class:
class Rst {
    fun getjson(): String {
        abc("""ss""");
        return "someOtherValue"
    }

    public fun newMethod(s: String) {
        println("it's not abc anymore!")
    }
}
Its bytecode is stored in newByteCode.
That's the main goal: compare oldByteCode to newByteCode.
Here we have the following changes:
the getjson() method was changed;
the abc() method was removed;
newMethod() was created.
So a method counts as changed if its signature remains the same but its body differs. If the signature differs, it's already some different method.
Now back to the actual problem. I have to know every method's exact status from its bytecode. What I have at the moment is the JaCoCo analyzer, which parses class bytecode into "bundles". In these bundles I have a hierarchy of packages, classes, and methods, but only with their signatures, so I can't tell whether a method's body has any changes. I can only track signature differences.
Are there any tools or libraries to split class bytecode into its methods' bytecode? With those I could, for example, calculate hashes and compare them. Maybe the ASM library can help with that?
Any ideas are welcome.
TL;DR: your approach of just comparing bytecode or even hashes won’t lead to a reliable solution; in fact, there is no solution with reasonable effort to this kind of problem at all.
I don’t know how much of it applies to the Kotlin compiler, but as elaborated in Is the creation of Java class files deterministic?, Java compilers are not required to produce identical bytecode even if the same version is used to compile exactly the same source code. While they may have an implementation that tries to be as deterministic as possible, things change when looking at different versions or alternative implementations, as explained in Do different Java Compilers (where the vendor is different) produce different bytecode.
Even when we assume that the Kotlin compiler is outstandingly deterministic, even across versions, it can’t ignore the JVM evolution. E.g. the removal of the jsr/ret instructions could not be ignored by any compiler, even when trying to be conservative. But it’s rather likely that it will incorporate other improvements as well, even when not being forced¹.
So in short, even when the entire source code did not change, it’s not a safe bet to assume that the compiled form has to stay the same. Even with an explicitly deterministic compiler we would have to be prepared for changes when recompiling with newer versions.
Even worse, if one method changes, it may have an impact on the compiled form of others, as instructions refer to items of a constant pool whenever constants or linkage information are needed and these indices may change, depending on how the other methods use the constant pool. There’s also an optimized form for certain instructions when accessing one of the first 255 pool indices, so changes in the numbering may require changing the form of the instruction. This in turn may have an impact on other instructions, e.g. switch instructions have padding bytes, depending on their byte code position.
On the other hand, a simple change of a constant value used in only one method may have no impact on the method’s bytecode at all, if the new constant happens to end up at the same place in the pool as the old constant.
So, to determine whether the code of two methods actually does the same thing, there is no way around parsing the instructions and understanding their meaning to some degree. Comparing just bytes or hashes won’t work.
¹ to name some non-mandatory changes, the compilation of class literals changed, likewise string concatenation changed from using StringBuffer to use StringBuilder and changed again to use StringConcatFactory, the use of getClass() for intrinsic null checks changed to requireNonNull(…), etc. A compiler for a different language doesn’t have to follow, but no-one wants to be left behind…
There are also bugs to fix, like obsolete instructions, which no compiler would keep just to stay deterministic.
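For what it’s worth, the per-method splitting the question asks about is something ASM’s tree API can do. A minimal sketch (assuming the asm, asm-tree and asm-util libraries on the classpath): it renders each method’s instructions as text, which absorbs the constant-pool renumbering described above because the printed form contains resolved constants rather than pool indices. The compiler-nondeterminism caveats still apply, so treat it as a starting point, not a reliable differ.

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.tree.ClassNode;
import org.objectweb.asm.tree.MethodNode;
import org.objectweb.asm.util.Textifier;
import org.objectweb.asm.util.TraceMethodVisitor;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

public class MethodSplitter {

    // Maps "name + descriptor" to a textual disassembly of that method's body.
    static Map<String, String> disassemble(byte[] classBytes) {
        ClassNode cn = new ClassNode();
        new ClassReader(classBytes).accept(cn, ClassReader.SKIP_DEBUG);
        Map<String, String> methods = new LinkedHashMap<>();
        for (MethodNode mn : cn.methods) {
            Textifier text = new Textifier();
            mn.accept(new TraceMethodVisitor(text));
            StringWriter out = new StringWriter();
            text.print(new PrintWriter(out));
            methods.put(mn.name + mn.desc, out.toString());
        }
        return methods;
    }
}

Comparing oldByteCode to newByteCode then reduces to diffing two such maps: a key present in both with different text is a changed method, a key only in the old map is a removed one, and a key only in the new map is a created one.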

Ways to signal that API returns an unmodifiable/immutable collection

Other than documenting it (obviously it should also be documented), using a special return type (I'm wary of limiting myself to an ImmutableX) or having the user find out at runtime, is there any other way of telling the users of an API that the collection they receive from said API is unmodifiable/immutable?
Are there any naming conventions or marker annotations that universally signal the same thing?
Edit: Unmodifiable and immutable do not mean the same thing, but for the purposes of this question they are similar enough. The question basically boils down to letting the user know that the returned object does not fully honour its contract (i.e. some common operations will throw a runtime exception).
Not a general naming convention, but you might be interested in using this @Immutable annotation: http://aspects.jcabi.com/annotation-immutable.html
Besides the documentation purpose, this mechanism will also validate that your object is really immutable (during its instantiation) and throw a runtime exception if it is not.
A good and verbose solution would be to make your own UnmodifiableCollection wrapper class, and return it:

public UnmodifiableCollection<String> giveMeSomeUnmodifiableCollection() {
    return new UnmodifiableCollection<>(new LinkedList<String>());
}

The name of the return type alone would be enough to make a verbose statement about the unmodifiability of the collection.
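For contrast, the stock JDK wrappers give you the runtime behavior but not the signal: Collections.unmodifiableList deliberately returns the plain List interface. A small sketch (class and method names are illustrative) of what callers see:

import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class FilmApi {
    // Runtime protection only: the return type is just List,
    // so nothing warns the caller that add() will throw.
    public List<String> getFilms() {
        return Collections.unmodifiableList(new LinkedList<String>());
    }
}

A custom wrapper type like the one above puts that missing information back into the signature.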
Document it indeed
Provide an API for checking whether a given object is an immutable collection
Return the collection in a wrapper that records whether the collection inside it is mutable or not - my favorite solution (see the sketch after this list)
If possible, don't use mutable and immutable collections side by side, but pick one of them. Results can always be immutable, as they are results - why change them? If there were such a need, it is a matter of a single line to copy the collection into a new, mutable one and modify it (e.g. for chain processing)
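A minimal sketch of the wrapper idea from the third point (all names here are hypothetical):

import java.util.Collection;

public final class TaggedCollection<E> {
    private final Collection<E> delegate;
    private final boolean mutable;

    public TaggedCollection(Collection<E> delegate, boolean mutable) {
        this.delegate = delegate;
        this.mutable = mutable;
    }

    // Callers can check before attempting to modify the collection.
    public boolean isMutable() { return mutable; }

    public Collection<E> unwrap() { return delegate; }
}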
Writing an @Immutable annotation on the return type of a method is the best approach. It has multiple benefits:
the annotation documents the meaning for users
a tool can verify that client code respects the annotation (that is, that client code does not have bugs)
a tool can verify that the library code respects the annotation (that is, that library code does not have bugs)
What's more, the verification can occur at compile time, before you ever run your code.
If you want verification at compile time, you can use the IGJ Immutability Checker. It distinguishes between
@Immutable references whose abstract value never changes, and
@ReadOnly references upon which side effects cannot be performed.
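If you don't want to adopt the IGJ annotations wholesale, a home-grown marker annotation could look roughly like this (a sketch; the name and retention choice are assumptions, and without a checker nothing verifies the claim):

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Documented
@Retention(RetentionPolicy.CLASS) // visible to bytecode-level tools, absent at runtime
@Target(ElementType.METHOD)
public @interface Immutable {}

// Usage: the signature itself now documents the guarantee.
// @Immutable
// public List<String> getFilms() { ... }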

How do you test the type-safetiness of your genericized API?

You can use e.g. JUnit to test the functionality of your library, but how do you test its type-safetiness with regards to generics and wildcards?
Only testing against code that compiles is "happy path" testing; shouldn't you also test your API against non-type-safe usage and confirm that such code does NOT compile?
// how do you write and verify these kinds of "tests"?
List<Number> numbers = new ArrayList<Number>();
List<Object> objects = new ArrayList<Object>();
objects.addAll(numbers); // expect: this compiles
numbers.addAll(objects); // expect: this does not compile
So how do you verify that your genericized API raises the proper errors at compile time? Do you just build a suite of non-compiling code to test your library against, and consider a compilation error a test success and vice versa? (Of course you have to confirm that the errors are generics-related.)
Are there frameworks that facilitate such testing?
Since this is not testing in the traditional sense (that is - you can't "run" the test), and I don't think such a tool exists, here's what I can suggest:
Make a regular unit-test
Generate code in it - both the right code and the wrong code
Use the Java compiler API to try to compile it and inspect the result
You can make an easy-to-use wrapper for that functionality and contribute it for anyone with your requirements.
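A minimal sketch of those three steps with the javax.tools compiler API (requires running on a JDK; class and method names here are illustrative):

import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

import java.net.URI;
import java.util.Arrays;

public class CompileChecker {

    // Wraps a source string so the compiler can read it from memory.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Returns true if the snippet compiles. A type-safety "test" asserts
    // true for the legal snippet and false for the illegal one.
    static boolean compiles(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, diagnostics, null, null,
                Arrays.asList(new StringSource(className, source)));
        // diagnostics.getDiagnostics() can be inspected to confirm that a
        // failure is generics-related rather than, say, a typo.
        return task.call();
    }
}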
It sounds like you are trying to test the Java compiler to make sure it would raise the right compilation errors if you assign the wrong types (as opposed to testing your own api).
If that is the case, why aren't you also concerned about the compiler not failing when you assign Integers to String fields, and when you call methods on objects that have not been initialized, and the million other things compilers are supposed to check when they compile code?!
I guess your question isn't limited to generics. We could raise the same question about non-generic code. If the tool you described existed, I'd be terrified. There are lots of people very happy to test their getters and setters (and try to enforce that on others). Now they'd be happier still, writing new tests to make sure that accesses to their private fields don't compile! Oh the humanity!
But then, generics are way more complicated, so your question isn't moot. Most programmers are happy if they can finally get their damn generics code to compile. If a piece of generics code doesn't compile, which is the norm during development, they aren't really sure whom to blame.
"How do you test the type-safetiness of your genericized API?" IMHO, the short answer to your question should be:
Don't use any @SuppressWarnings
Make sure you compile without warnings (or errors)
The longer answer is that "type safety" is not a property of an API; it is a property of the programming language and its type system. Java 5 generics are type safe in the sense that they give you the guarantee that you will not have a type error (ClassCastException) at runtime unless it originates from a user-level cast operation (and if you program with generics, you rarely need such casts anymore). The only backdoor is the use of raw types for interoperability with pre-Java 5 code, but for these cases the compiler will issue warnings, such as the infamous "unchecked cast" warning, to indicate that type safety may be compromised. However, short of such warnings, Java will guarantee your type safety.
So unless you are a compiler writer (or you do not trust the compiler), it seems strange to want to test "type safety". In the code example that you give, if you are the implementor of ArrayList<T>, you should only care to give addAll the most flexible type signature that allows you to write a functionally correct implementation. For example, you could type the argument as Collection<T>, but also as Collection<? extends T>, where the latter is preferred because it is more flexible. While you can over-constrain your types, the programming language and the compiler will make sure that you cannot write something that is not type-safe: for example, you simply cannot write a correct implementation for addAll where the argument has type Collection<?> or Collection<? super T>.
The only exception I can think of, is where you are writing a facade for some unsafe part of the system, and want to use generics to enforce some kind of guarantees on the use of this part through the facade. For example, although Java's reflection is not controlled as such by the type system, Java uses generics in things such as Class<T>, to allow that some reflective operations, such as clazz.newInstance(), to integrate with the type system.
Maybe you can use Collections.checkedList() in your unit test. The following example will compile but will throw a ClassCastException. The example below is copied from @Simon G.
List<String> stringList = new ArrayList<String>();
List<Number> numberList = Collections.checkedList(new ArrayList<Number>(), Number.class);
stringList.add("a string");
List list = stringList;  // raw type: the element type is erased
numberList.addAll(list); // compiles (with an unchecked warning), throws ClassCastException at runtime
System.out.println("Number list is " + numberList);
Testing for compilation failures sounds like barking up the wrong tree, then using a screwdriver to strip the bark off again. Use the right tool for the right job.
I would think you want one or more of:
code reviews (maybe supported by a code review tool like JRT).
static analysis tools (FindBugs/CheckStyle)
switch language to C++, with an implementation that supports concepts (may require also switching universe to one in which such an implementation exists).
If you really needed to do this as a 'test', you could use reflection to enforce any desired rule, say 'any method starting with add must have an argument that is a generic type'. That's not very different from a custom Checkstyle rule, just clumsier and less reusable.
Well, in C++ they tried to do this with concepts but that got booted from the standard.
Using Eclipse, I get pretty fast turnaround when something in Java doesn't compile, and the error messages are pretty straightforward. For example, if you expect a type to have a certain method and it doesn't, the compiler tells you what you need to know. Same with type mismatches.
Good luck building compile time concepts into java :P

Keeping track of what's in a Collection in pre-generics Java?

For a bunch of reasons that (believe it or not) are not as unsound as you may think, we are still (sigh) using Java 1.4 to build and run our code (though we plan to finally move to Java 7 by the end of the year).
Our existing code that uses Collection classes doesn't do a very good job of making it clear what is expected to be in each Collection. Obviously, you can read the code, see what downcasts end up being done, and infer from that, but you can't just look at a method declaration and know what the Collection object that is a method argument or method return value actually holds.
In new code that I'm writing and when I am in older code that uses Collections, I've been adding in-line comments to Collections declarations to show what would have been declared if generics were being used. For example:
Map/*<String, Set<Integer>>*/ theMap = new HashMap/*<String, Set<Integer>>*/();
or
List/*<Actions>*/ someMethod(List/*<Job>*/ jobs);
In keeping with the frowning at subjectivity here at SO, rather than asking what you think of this (though admittedly I'd like to know -- I do find it a bit ugly but still like having the type info there), I'd instead just ask what, if anything, you do to make it clear what is being held by pre-generics Collection objects.
What we recommended back in the old days -- and I was a Java Architect at Sun when Java 1.1 was the New Thing -- was to write a class around the structure (I don't think 1.1 even had Collection as a base class) so that the typecasts happened in code you control instead of in user code. So, for example, something like
public class ArrayOfFoo {
    Object[] ary; // ctor left as exercise

    public void set(int index, Foo value) {
        ary[index] = (Object) value; // cast strictly not needed, any Foo is an Object
    }

    public Foo get(int index) {
        return (Foo) ary[index]; // cast needed, not every Object is a Foo
    }
}
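Usage then looks like ordinary typed code; the casts live inside the wrapper instead of in client code (the constructor is assumed here, since it was left as an exercise above):

ArrayOfFoo foos = new ArrayOfFoo(10); // assumed ctor taking a capacity
foos.set(0, new Foo());
Foo aFoo = foos.get(0);               // no cast in client code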
Sounds like the code base you have isn't built to this convention; if you're writing new code, there's no reason you can't start. Failing that, your convention isn't bad, but it's easy to forget the cast and then have to search to find out why you're getting a bad cast exception. It's mildly better to resort to some variant of Hungarian notation, or the Smalltalk 'aVariable' convention, encoding the type in the names, so that you use
Object[] fooAry = new Object[aZillion];
fooAry[42] = new Foo();
Foo aFoo = (Foo) fooAry[42];
Use clear variable identifiers such as jobList, actionList, or dictionaryMap. If you're concerned with the type of objects they contain, you could even make it a convention to always let the identifier of a Collection hint about which type of objects it holds.
The inlined comments aren't a bad idea, actually. When I ported a 1.5 project back to 1.4, I did just that (instead of removing the type parameters). It worked out quite well.
I'd recommend writing tests. For various reasons:
You should be writing tests anyway!
You can assert the type of a collection member very easily to ensure that all your code paths are adding the right types to the collection, as sketched below
You can use the test to write code that serves as an "example" of how to use the collection correctly
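For example, a pre-generics, JUnit 3-style test asserting the element type of a collection returned by some service (JobService, findAllJobs and Job are hypothetical names):

import junit.framework.TestCase;
import java.util.Iterator;
import java.util.List;

public class JobListTest extends TestCase {
    public void testFindAllJobsReturnsOnlyJobs() {
        List jobs = JobService.findAllJobs(); // hypothetical API under test
        for (Iterator it = jobs.iterator(); it.hasNext(); ) {
            assertTrue(it.next() instanceof Job); // every element must be a Job
        }
    }
}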
If you just need binary compatibility with 1.4, you could consider using a tool to downgrade the class files back to 1.4 and thus start to develop in 1.6 or 1.7 right now. You would of course need to avoid any API that wasn't there in 1.4 (unfortunately, you can't compile code with generics against the 1.4 jars directly, as they don't declare any generic types). The bytecode is otherwise the same (at least with 1.6; I don't know for sure about 1.7). One free tool that can do the trick is ProGuard. It can do much more sophisticated things and can also remove all traces of generics in the class files; just turn off the obfuscation and optimization if you don't need them. It will also warn you if some missing API was used in the processed code, if you feed it the 1.4 libraries.
I'm aware that this is considered a hack by many, but we had a similar requirement where we needed some code to still run on a Personal Java VM (essentially Java 1.1) and several other exotic VMs, and this approach worked quite well. We started with ProGuard and then made our own tool for the task, to be able to implement a few workarounds for some bugs in the diverse VMs.

Explicit typing in Groovy: sometimes or never?

[Later: Still can't figure out if Groovy has static typing (seems that it does not) or if the bytecode generated using explicit typing is different (seems that it is). Anyway, on to the question]
One of the main differences between Groovy and other dynamic languages -- or at least Ruby -- is that you can statically explicitly type variables when you want to.
That said, when should you use static typing in Groovy? Here are some possible answers I can think of:
Only when there's a performance problem. Statically typed variables are faster in Groovy. (or are they? some questions about this link)
On public interfaces (methods, fields) for classes, so you get autocomplete. Is this possible/true/totally wrong?
Never, it just clutters up code and defeats the purpose of using Groovy.
Yes when your classes will be inherited or used
I'm not just interested in what YOU do but more importantly what you've seen around in projects coded in Groovy. What's the norm?
Note: If this question is somehow wrong or misses some categories of static-dynamic, let me know and I'll fix it.
In my experience, there is no norm. Some use types a lot, some never use them. Personally, I always try to use types in my method signatures (for params and return values). For example I always write a method like this
Boolean doLogin(User user) {
    // implementation omitted
}
Even though I could write it like this
def doLogin(user) {
    // implementation omitted
}
I do this for these reasons:
Documentation: other developers (and myself) know what types will be provided and returned by the method without reading the implementation
Type Safety: although there is no compile-time checking in Groovy, if I call the statically typed version of doLogin with a non-User parameter it will fail immediately, so the problem is likely to be easy to fix. If I call the dynamically typed version, it will fail some time after the method is invoked, and the cause of the failure may not be immediately obvious.
Code Completion: this is particularly useful when using a good IDE (e.g. IntelliJ), as it can even provide completion for dynamically added methods, such as domain classes' dynamic finders
I also use types quite a bit within the implementation of my methods for the same reasons. In fact the only times I don't use types are:
I really want to support a wide range of types. For example, a method that converts a string to a number could also convert a collection or array of strings to numbers
Laziness! If the scope of a variable is very short, I already know which methods I want to call, and I don't already have the class imported, then declaring the type seems like more trouble than it's worth.
BTW, I wouldn't put too much faith in that blog post you've linked to claiming that typed Groovy is much faster than untyped Groovy. I've never heard that before, and I didn't find the evidence very convincing.
I worked on several Groovy projects and we stuck to these conventions:
All types in public methods must be specified.
public int getAgeOfUser(String userName){
...
}
All private variables are declared using the def keyword.
These conventions allow you to achieve many things.
First of all, if you use joint compilation, your Java code will be able to interact with your Groovy code easily. Secondly, such explicit declarations make code in large projects more readable and maintainable. And of course, auto-completion is an important benefit too.
On the other hand, the scope of a method is usually small enough that you don't need to declare types explicitly. By the way, modern IDEs can auto-complete your local variables even if you use defs.
I have seen type information used primarily in service classes for public methods. Depending on how complex the parameter list is, even here I usually see just the return type typed. For example:
class WorkflowService {
    ....
    WorkItem getWorkItem(processNbr) throws WorkflowException {
        ...
        ...
    }
}
I think this is useful because it explicitly tells the user of the service what type they will be dealing with, and it does help with code assist in IDEs.
Groovy does not support static typing. See it for yourself:

class Foo {}
class Bar {}

Foo func(Bar bar) {
    return bar // declared to return a Foo, but actually returns a Bar
}

println("no static typing")

Save that file, compile it, and run it: it prints "no static typing" without complaint. The declared types are only enforced at runtime, when func is actually called (at which point Groovy would throw a GroovyCastException).
