Detecting potential errors when converting String constants to enum - java

I'm replacing a group of String constants with an enum, but the constants weren't used everywhere they should have been. So we're replacing a lot of someValue.equals(FOO_CONST) with someValue == MyEnum.FOO. It's easy to fix all the places where they were used--just delete the constants and the compiler tells you where the problems are. However, there are also bits like "foo".equals(someValue), which the compiler can't identify as an error after the change is made.
Is there any way I can detect potential bugs caused by any of these inline literals that get missed during the conversion? (I'm using Eclipse.)

FindBugs reports bugs for calls to equals(Object) when the two objects are not of the same type, which handles this problem nicely.
They will show up in the Bug Explorer under:
Scariest
High confidence
Call to equals() comparing different types
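A minimal sketch of the kind of leftover comparison this detector is aimed at. The enum MyEnum mirrors the name used in the question; the class and method names are hypothetical. The point is only that the old-style check still compiles after the conversion, because equals() accepts any Object, yet it can never be true.

```java
// Hypothetical enum standing in for the former String constants.
enum MyEnum { FOO, BAR }

class EqualsMismatchDemo {
    // Leftover from the String days: compiles because equals() takes Object,
    // but a String is never equal to an enum constant, so this is always false.
    static boolean leftoverCheck(MyEnum someValue) {
        return "foo".equals(someValue);
    }

    // The intended check after the conversion.
    static boolean convertedCheck(MyEnum someValue) {
        return someValue == MyEnum.FOO;
    }

    public static void main(String[] args) {
        System.out.println(leftoverCheck(MyEnum.FOO));  // prints false
        System.out.println(convertedCheck(MyEnum.FOO)); // prints true
    }
}
```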

Related

Tracking method implementation changes in class bytecode

I have the bytecode of some abstract project (let's call it The Project), one ByteArray per class, available inside some Kotlin code. The task is to tell which specific methods in each class were modified from one build of The Project to the next. In other words, there are two ByteArrays for the same class, taken from different versions of The Project, and I need to compare them accurately. A simple example: let's assume we have a trivial class:
class Rst {
    fun getjson(): String {
        abc("""ss""");
        return "jsonValid"
    }
    public fun abc(s: String) {
        println(s)
    }
}
Its bytecode is stored in oldByteCode. Now some changes happen to the class:
class Rst {
    fun getjson(): String {
        abc("""ss""");
        return "someOtherValue"
    }
    public fun newMethod(s: String) {
        println("it's not abc anymore!")
    }
}
Its bytecode is stored in newByteCode.
The main goal is to compare oldByteCode to newByteCode.
Here we have the following changes:
the getjson() method has been changed;
the abc() method has been removed;
newMethod() has been created.
So a method counts as changed if its signature stays the same but its body differs. If the signature differs, it is treated as a different method.
Now back to the actual problem. I have to determine every method's exact status from its bytecode. What I have at the moment is the JaCoCo analyzer, which parses class bytecode into "bundles". These bundles give me a hierarchy of packages, classes, and methods, but only with their signatures, so I can't tell whether a method's body has changed. I can only track signature differences.
Are there any tools or libraries that split a class's bytecode into the bytecode of its individual methods? With those I could, for example, calculate hashes and compare them. Maybe the ASM library can help with that?
Any ideas are welcome.
TL;DR: your approach of just comparing bytecode, or even hashes, won’t lead to a reliable solution. In fact, there is no solution to this kind of problem with reasonable effort at all.
I don’t know how much of this applies to the Kotlin compiler, but as elaborated in "Is the creation of Java class files deterministic?", Java compilers are not required to produce identical bytecode even if the same version is used to compile exactly the same source code. While an implementation may try to be as deterministic as possible, things change when looking at different versions or alternative implementations, as explained in "Do different Java Compilers (where the vendor is different) produce different bytecode".
Even when we assume that the Kotlin compiler is outstandingly deterministic, even across versions, it can’t ignore the JVM evolution. E.g. the removal of the jsr/ret instructions could not be ignored by any compiler, even when trying to be conservative. But it’s rather likely that it will incorporate other improvements as well, even when not being forced¹.
So in short, even when the entire source code did not change, it’s not a safe bet to assume that the compiled form has to stay the same. Even with an explicitly deterministic compiler we would have to be prepared for changes when recompiling with newer versions.
Even worse, if one method changes, it may have an impact on the compiled form of others, as instructions refer to items of a constant pool whenever constants or linkage information are needed and these indices may change, depending on how the other methods use the constant pool. There’s also an optimized form for certain instructions when accessing one of the first 255 pool indices, so changes in the numbering may require changing the form of the instruction. This in turn may have an impact on other instructions, e.g. switch instructions have padding bytes, depending on their byte code position.
On the other hand, a simple change of a constant value used in only one method may have no impact on the method’s bytecode at all, if the new constant happens to end up at the same place in the pool as the old constant.
So, to determine whether the code of two methods actually does the same thing, there is no way around parsing the instructions and understanding their meaning to some degree. Comparing just bytes or hashes won’t work.
¹ To name some non-mandatory changes: the compilation of class literals changed; string concatenation changed from using StringBuffer to StringBuilder and changed again to use StringConcatFactory; the use of getClass() for intrinsic null checks changed to requireNonNull(…); etc. A compiler for a different language doesn’t have to follow, but no one wants to be left behind…
There are also bugs to fix, like obsolete instructions, which no compiler would keep just to stay deterministic.

Adding immutable programming rules to the Java language within a program

I'm writing a program in Java. I find that reading and debugging code is easiest when the techniques used are consistent, allowing me to very quickly guess where and what a problem is.
Doing this has, as you might guess, made my programming much faster, and so I want to find a way to enforce these rules.
For example, let's say I have a method that changes the state of an object and returns a value. If the method is called from outside the class, I never want to see it evaluated inside an argument list, like this:
somefunction(param1, param2, object.change_and_return());
Instead, I want it to be done like this:
int relevant_variable_name = object.change_and_return();
somefunction(param1, param2, relevant_variable_name);
As another example, I want to create a base class that includes certain print methods, and I want all user-defined classes to be derived from that base class, much as Java itself does with Object.
Within my objects, is there a way I can force myself (and anyone else) to adhere to these rules? I.e., if you try to run code that breaks the rules, it terminates and returns a custom error report. Also, if you write code that breaks the rules, can the IDE (I use Eclipse) recognize it as an error, underline it, and show the appropriate Javadoc?
For the "check and underline violations" part:
You can use PMD, a static code analyzer.
It has a default ruleset, and you can write custom rules matching what you need.
However, your checks seem quite complex to express in the "PMD language".
PMD is available in the Eclipse Marketplace.
For the "crash if not conforming" part:
There seems to be no easy way to do it.
Hard/complex ways could be:
Write a rule within PMD, run the analysis at compile time, parse the report (still at compile time), and return an error if your rule is violated.
Write a Java agent that does the rule check and crashes the VM if the rule is violated (not sure this is really feasible; agents are meant for instrumentation).
Use reflection in your code to load classes, analyze each loaded class against your rules, and crash the VM if a rule is violated (seriously, don't do this: the code would be ugly and the rule easily bypassed).
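For illustration only, here is a rough sketch of what the reflection variant could look like for the "every class must derive from my base class" rule. All names (PrintableBase, BaseClassRule, and the two sample classes) are hypothetical, and the check only runs where you remember to call it, which is exactly why it is so easy to bypass.

```java
// Hypothetical base class that every user-defined class is supposed to extend.
abstract class PrintableBase {
    void printSelf() {
        System.out.println(getClass().getSimpleName());
    }
}

class ConformingThing extends PrintableBase { }

class NonConformingThing { }

class BaseClassRule {
    // True if the candidate class follows the (hypothetical) rule.
    static boolean conforms(Class<?> candidate) {
        return PrintableBase.class.isAssignableFrom(candidate);
    }

    // Fails fast with a custom error message when the rule is broken.
    static void enforce(Class<?> candidate) {
        if (!conforms(candidate)) {
            throw new IllegalStateException(
                candidate.getName() + " must extend " + PrintableBase.class.getName());
        }
    }

    public static void main(String[] args) {
        enforce(ConformingThing.class);    // passes silently
        enforce(NonConformingThing.class); // throws IllegalStateException
    }
}
```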

Comparison of two Java classes

I have two java classes that are very similar in semantics but differ in syntax. The differences are minor, like -
Changes in variable names,
Changes in position of some statements (with no dependent lines in between),
Extra imports, etc.
I need to compare these two classes to prove that they are indeed semantically identical. The same needs to be done for a large number of java file pairs.
The first approach of reading from the two files and comparing the lines, with logic to deal with the differences mentioned above seems inefficient. Is there some other way that I can achieve this task? Any helpful APIs out there?
Compile both of the classes without debug information and then decompile them back to source files. The decompiled files should be a lot more similar than the original source files.
You can improve this further by running some optimizations on the compiled files. For example, you can use ProGuard with just shrinking enabled to remove unused code.
Changes in position of some statements can be hard to detect though.
If you want to examine the changes in the code try Araxis Merge or WinMerge.
But if you want logical differences, I am afraid you might have to do it manually.
I would advise to use one of these tools to look for textual changes and then look for logical differences.
There are a lot of similarity checkers out there, and so far none of them is perfect. Each has its own advantages and disadvantages. The approaches generally fall into two categories: token-based and tree-based.
Token-based similarity checking is usually done with regular expressions, but other approaches are possible. In one of my university projects, we developed a checker that used an alignment strategy from the field of bioinformatics. The main disadvantage of this technique shows up when the two sources are not more or less equal in size.
The tree-based approach works more like a compiler, so it is normally possible (well, more or less) to check for this using some compilation techniques. Its disadvantage is that the comparison is exponential in complexity.
Comparing line by line won't work. I think you may need to use a parser. I would suggest taking a look at ANTLR. It should have a Java grammar in which you can embed actions that do the comparison.
As far as I know, there's no way to compare the semantics of two Java classes. Take for example the following two methods:
public String m1(String a, int b) { ... }
and
public String m2(String x, int y) { ... }
Apart from changes in variable and method names, their signatures are the same: same return type and same input types. However, this is no guarantee that the two methods are semantically equivalent. For example, m1 could return a string consisting of the first b characters of a, while m2 could return a string consisting of y repetitions of x. As you can see, although only the variable and method names change, the semantics of the two methods are totally different.
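To make this concrete, here is one possible pair of method bodies matching that description. The bodies and the class name SameSignatureDemo are illustrative assumptions, since the original only shows the signatures.

```java
class SameSignatureDemo {
    // Returns the first b characters of a.
    public String m1(String a, int b) {
        return a.substring(0, b);
    }

    // Returns y repetitions of x.
    public String m2(String x, int y) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < y; i++) {
            result.append(x);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        SameSignatureDemo demo = new SameSignatureDemo();
        System.out.println(demo.m1("abc", 2)); // prints ab
        System.out.println(demo.m2("abc", 2)); // prints abcabc
    }
}
```

The same inputs produce different results, so matching signatures alone say nothing about equivalence.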
I don't see an easy way out of your problem. You can perhaps make some assumptions and try the following approach:
assume that the method names in the two classes are the same
write test cases (for example with JUnit) for all the methods in the first class
run the test cases on the second class
ensure that the second class does not have other (untested) methods (for example using reflection)
This approach gives you an idea about equivalent semantics, but it makes strong assumptions.
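The last step of that approach, checking via reflection that the second class has no methods the first one lacks, might be sketched as follows. The names MethodSetCheck, FirstImpl and SecondImpl are hypothetical, and methods are matched by name and parameter types only, which is itself an assumption.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

class MethodSetCheck {
    // Builds a key per declared method, e.g. "m(java.lang.String,int)".
    static Set<String> signatures(Class<?> type) {
        Set<String> result = new TreeSet<>();
        for (Method method : type.getDeclaredMethods()) {
            if (method.isSynthetic()) {
                continue; // skip compiler-generated methods
            }
            StringBuilder key = new StringBuilder(method.getName()).append('(');
            Class<?>[] params = method.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    key.append(',');
                }
                key.append(params[i].getName());
            }
            result.add(key.append(')').toString());
        }
        return result;
    }

    // Methods declared in 'second' that have no counterpart in 'first',
    // i.e. methods the tests written against 'first' cannot cover.
    static Set<String> untested(Class<?> first, Class<?> second) {
        Set<String> extra = signatures(second);
        extra.removeAll(signatures(first));
        return extra;
    }
}

// Hypothetical pair of classes to compare.
class FirstImpl {
    public String m(String a, int b) { return a; }
}

class SecondImpl {
    public String m(String x, int y) { return x; }
    public void extra() { }
}
```

Calling MethodSetCheck.untested(FirstImpl.class, SecondImpl.class) would report extra() as a method the shared test cases do not exercise.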
As a final remark, let me add that specifying the semantics of programs is an interesting and open research topic. Some interesting developments in this area include research on Semantic Web Services. A widely adopted approach to giving machine-processable semantics to programs is to specify their IOPE: Input and Output types (as in the Java methods above), and their Preconditions and Effects. Preconditions are essentially logical conditions that must hold true for the program to be invoked successfully, and Effects are formal descriptions of the changes (in the state of the world) caused by the successful execution of the program. Even with IOPE there are a lot of problems ... which I skip in this short description.

How to get string.format to complain at compile time

The compiler has access to the format string AND the required types and parameters. So I assume there would be some way to indicate missing parameters for the varargs ... even if only for a subset of cases. Is there some way for Eclipse or another IDE to indicate that the varargs passed might cause a problem at runtime?
It looks as if FindBugs can solve your problem. There are some warning categories related to format strings.
http://www.google.com/search?q=%2Bjava+%2Bprintf+%2Bfindbugs
http://findbugs.sourceforge.net/bugDescriptions.html#VA_FORMAT_STRING_MISSING_ARGUMENT
The Java compiler doesn't have any built-in semantic knowledge of String.format parameters, so it can't check them at compile time. For all it knows, String is just another class, String.format is just another method, and the given format string is just another string like any other.
But yeah, I feel your pain, having come across these same problems in the past couple days. What they ought to have done is make it 'less careful' about the number of parameters, and just leave trailing %s markers un-replaced.
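A small sketch of the behavior being discussed: a call with too few arguments compiles without complaint and only fails when it runs. The class and method names here are hypothetical.

```java
import java.util.MissingFormatArgumentException;

class FormatCheckDemo {
    // Returns true if formatting fails at runtime because an argument is missing.
    static boolean failsAtRuntime(String format, Object... args) {
        try {
            String.format(format, args);
            return false;
        } catch (MissingFormatArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Both calls compile; only the first one throws when executed.
        System.out.println(failsAtRuntime("%s and %s", "one"));        // prints true
        System.out.println(failsAtRuntime("%s and %s", "one", "two")); // prints false
    }
}
```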

What are the differences between PMD and FindBugs?

There was a question comparing PMD and CheckStyle. However, I can't find a nice breakdown on the differences/similarities between PMD and FindBugs. I believe a key difference is that PMD works on source code, while FindBugs works on compiled bytecode files. But in terms of capabilities, should it be an either/or choice or do they complement each other?
I'm using both. I think they complement each other.
As you said, PMD works on source code and therefore finds problems like: violation of naming conventions, lack of curly braces, misplaced null check, long parameter list, unnecessary constructor, missing break in switch, etc. PMD also tells you about the Cyclomatic complexity of your code which I find very helpful (FindBugs doesn't tell you about the Cyclomatic complexity).
FindBugs works on bytecode. Here are some problems FindBugs finds which PMD doesn't: equals() method fails on subtypes, clone method may return null, reference comparison of Boolean values, impossible cast, 32bit int shifted by an amount not in the range of 0-31, a collection which contains itself, equals method always returns true, an infinite loop, etc.
Usually each of them finds a different set of problems. Use both. These tools taught me a lot about how to write good Java code.
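To make one item from the FindBugs list concrete, here is a sketch of why shifting a 32-bit int by an amount outside 0-31 is worth flagging: Java uses only the low five bits of the shift distance for int operands, so the result is rarely what the author intended. The class and method names are hypothetical.

```java
class ShiftDemo {
    // For int operands, Java masks the shift distance to its low 5 bits,
    // so a distance of 32 behaves like 0 and 33 behaves like 1.
    static int shiftLeft(int value, int distance) {
        return value << distance;
    }

    public static void main(String[] args) {
        System.out.println(shiftLeft(1, 31)); // prints -2147483648
        System.out.println(shiftLeft(1, 32)); // prints 1, not 0
        System.out.println(shiftLeft(1, 33)); // prints 2
    }
}
```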
The best feature of PMD is its XPath rules, bundled with a Rule Designer that lets you easily construct new rules from code samples (similar to RegEx and XPath GUI builders). FindBugs is stronger out of the box, but being able to construct project-specific rules and patterns is very important.
For example, I encountered a performance problem involving 2 nested for loops, resulting in an O(n^2) running time, which could easily be avoided. I used PMD to construct an ad-hoc query to find other instances of nested for loops: //ForStatement/Statement//ForStatement. This pointed out 2 more instances of the problem. This is not a generic rule whatsoever.
PMD:
is well known
is widely used in industry
lets you add your own rules in XML
gives you a detailed analysis at error and warning levels
can also scan your code for copy-and-pasted lines (duplicate code), which gives a good idea of where Java OOP techniques could be applied.
