What does Java do with my "equals" implementations here?


Today, I've stumbled over the following:
Consider two classes, NewClass and NewClass1, which have the following "equals" methods:
NewClass:
@Override
public boolean equals(Object obj) {
return false;
}
public boolean equals(NewClass obj) {
return value == obj.getValue();
}
NewClass1:
@Override
public boolean equals(Object obj) {
if(!(obj instanceof NewClass1)) {
return false;
}
return equals((NewClass1) obj);
}
public boolean equals(NewClass1 obj) {
return value == obj.getValue();
}
What I find weird is that the equals in NewClass1 seems to be dramatically slower than the one in NewClass (for 10.000.000 calls, 14 ms versus 3000 ms). At first, I thought this was related to the "instanceof" check, but if I replace "return equals((NewClass1) obj);" with "return false;" in NewClass1, it suddenly runs about as fast. I don't really understand what is happening here, because in my opinion the return statement in equals(Object) should never actually be reached. What am I getting wrong here?
The following is my "benchmarking code", in case I made some mistake there:
public static void main(String[] args) {
// TODO code application logic here
NewClass i1 = new NewClass(1);
NewClass i2 = new NewClass(1);
NewClass i3 = new NewClass(5);
NewClass1 j1 = new NewClass1(1);
NewClass1 j2 = new NewClass1(1);
NewClass1 j3 = new NewClass1(5);
Object o1 = new Object();
Object o2 = new Object();
assert(i1.equals(i1));
assert(i1.equals(i2));
assert(i1.equals(i3) == false);
assert(i1.equals(o1) == false);
assert(j1.equals(j1));
assert(j1.equals(j2));
assert(j1.equals(j3) == false);
assert(j1.equals(o1) == false);
long start = System.currentTimeMillis();
for(int i=0; i<1000000000; i++) {
i1.equals(i1);
i1.equals(i2);
i1.equals(o1);
i1.equals(o2);
}
long end = System.currentTimeMillis();
System.out.println("Execution time was "+(end-start)+" ms.");
start = System.currentTimeMillis();
for(int i=0; i<1000000000; i++) {
j1.equals(j1);
j1.equals(j2);
j1.equals(o1);
j1.equals(o2);
}
end = System.currentTimeMillis();
System.out.println("Execution time was "+(end-start)+" ms.");
}

I would guess that it is the instanceof test that is consuming the time. When you change the final return in that method to always return false, the compiler probably eliminates the conditional, since the result will be the same (return false) regardless of its evaluation. This would also explain why changing the final return has any effect at all, since as you say it should never actually be reached in the code path.
To put it more generally, a code change can impact performance even if it is not on the executed code path, by changing how the compiler optimizes the code.

In the first example equals(NewClass) would ordinarily never be called. equals(Object) can be inlined by HotSpot (or similar), and the body of your test can be reduced to effectively nothing.
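To see which equals runs where, here is a minimal sketch (the class name Point and everything in it are hypothetical, not taken from the question). Overload resolution happens at compile time from the static type of the argument, so an extra equals(Point) overload is only picked when the argument is declared as Point; code that only sees an Object, such as HashSet, always calls equals(Object).

```java
import java.util.HashSet;
import java.util.Set;

public class OverloadDemo {
    // Hypothetical class mirroring NewClass: one override plus one overload of equals.
    static final class Point {
        final int value;

        Point(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object obj) {
            return false; // same behaviour as NewClass.equals(Object)
        }

        // An overload, not an override: chosen only when the argument's static type is Point.
        public boolean equals(Point other) {
            return value == other.value;
        }

        @Override
        public int hashCode() {
            return value;
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1);
        Point b = new Point(1);
        Object bAsObject = b;

        System.out.println(a.equals(b));         // static type Point  -> equals(Point)  -> true
        System.out.println(a.equals(bAsObject)); // static type Object -> equals(Object) -> false

        Set<Point> set = new HashSet<>();
        set.add(a);
        System.out.println(set.contains(b));     // HashSet calls equals(Object) -> false
    }
}
```

In the question's benchmark, i1.equals(i1) and i1.equals(i2) therefore go to the overload, while i1.equals(o1) and i1.equals(o2) go to the override.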
Back-of-the-envelope calculations can be informative. "10.000.000 calls 8ms" is 1,250,000,000 iterations a second. Assuming a 4 GHz processor, that's about three cycles per iteration, a bit fast to be doing anything worthwhile. In fact, the code says 1,000,000,000, not 10,000,000.
In fact, in the actual code the entire loop body could be eliminated. So it doesn't really matter what you are measuring; it won't be a reliable indication of anything useful. There are many other pitfalls in microbenchmarking, which you can read about in many other places.
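A minimal sketch of a loop whose results cannot simply be thrown away, assuming a hypothetical Value class that mirrors NewClass1 reduced to a single equals(Object). Feeding each result into a counter that is returned and printed makes it much harder for the JIT to eliminate the loop body. It is still only a rough measurement, not a rigorous benchmark; a harness such as JMH deals with warm-up and dead-code elimination more carefully.

```java
public class ConsumedResultsLoop {
    // Hypothetical stand-in for NewClass1, reduced to a single equals(Object) override.
    static final class Value {
        private final int value;

        Value(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object obj) {
            // instanceof is false for null and for unrelated types such as a plain Object
            return obj instanceof Value && value == ((Value) obj).value;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(value);
        }
    }

    // Counts how many comparisons return true. Because the count is returned and
    // printed, the comparisons are not trivially dead code for the JIT.
    static long countMatches(Value target, Object[] others, int iterations) {
        long matches = 0;
        for (int i = 0; i < iterations; i++) {
            for (Object other : others) {
                if (target.equals(other)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Value j1 = new Value(1);
        Object[] others = {j1, new Value(1), new Object(), new Object()};
        long start = System.nanoTime();
        long matches = countMatches(j1, others, 10_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(matches + " matches in " + elapsedMs + " ms");
    }
}
```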

In the first example, you always return false. This is very fast. In the second example, you have a much longer comparison algorithm.

Well, the first example does nearly nothing. You can decrease the iteration number to 100000 and you still get the same result, 5 or 6 ms. It means that the JVM optimizes that part of your code aggressively.

Related

Having trouble understanding return type placement( Big Java Ex 6.8)

I'm currently on the chapter in my book about loops and for loops. I sometimes come across an issue where the method needs me to return something. For example, consider my code below. Basically, the exercise is to get all the factors in ascending order. Now here's the issue:
As you can see, I need a return statement outside of the for loop. I guess my book didn't explain this properly, or I didn't understand the concept of return in Java properly, but does our return statement always have to be in the most outer indentation if you will?
The thing is, I don't really want to return anything outside of the for loop. I just want to return i upon that condition. Why doesn't Java let me do this?
What's a good workaround?
Ever since I started learning loops and for loops, I have been having trouble understanding this. I guess I could just System.out.println(i) instead of returning it? But then what should I return? I could also make it a void method and then make another method to print it, I guess?
class factors{
private int num;
public factors(int num)
{
this.num = num;
}
public int getFactors()
{
for(int i = 1 ; i<num ; i++)
{
if (num % i == 0)
{
return i;
}
}
// I NEED TO PUT A RETURN STATEMENT HERE
}
}
public class test{
public static void main(String [] args)
{
factors fact = new factors(20);
System.out.println(fact.getFactors());
}
}
IT WORKS NOW (I don't particularly like my solution):
class factors{
private int num;
public factors(int num)
{
this.num = num;
}
public void getFactors()
{
for(int i = 1 ; i<num ; i++)
{
if (num % i == 0)
{
System.out.println(i);
}
}
}
}
public class test{
public static void main(String [] args)
{
factors fact = new factors(20);
fact.getFactors();
}
}

The thing is, I don't really want to return anything outside of the for loop. I just want to return i upon that condition. Why doesn't Java let me do this?
Java lets you do that. There is nothing wrong with returning inside the loop upon reaching the condition.
Java allows you to have multiple return statements, so adding another return 0; after the loop is allowed.
Java returns as soon as it hits the first return statement; other return statements are not executed, because the method stops executing (except for some rare edge cases involving try/catch/finally and return, but that's another story entirely).
But why is it required?
Java requires that every possible path ends in a return with the proper type. Even if you yourself can prove mathematically that the path Java complains about is never taken, the compiler might not be able to prove that the path is impossible at runtime. So you simply need to add a return there with a dummy value.
In your concrete example, there is a condition under which the loop is never executed. If num <= 1, the loop condition i < num is never satisfied and the entire loop body is skipped. Without the return, the method is invalid, because you can't return nothing from a method with return type int.
So, in your example, the compiler is actually smarter than you, and prevents you from making a mistake, because it found the path you thought wouldn't occur.
new factors(-1).getFactors(); // you don't check the passed value at all ;)
From your comments, it seems that you want to return all factors. In Java, a method call returns once, and only once. This means you have to aggregate the results and return a List or array of values:
import java.util.ArrayList;
import java.util.List;
// in class factors (uses the num field set by the constructor):
public List<Integer> getFactors() {
List<Integer> factors = new ArrayList<>();
for (int i = 1 ; i<num ; i++)
{
if (num % i == 0)
{
factors.add(i);
}
}
return factors;
}
// in class test:
public static void main(String[] args) {
System.out.println(new factors(20).getFactors());
// prints [1, 2, 4, 5, 10]
}

does our return statement always have to be in the most outer indentation if you will?
No.
However, all potential code paths must return something. Consider this structure:
for(int i = 1 ; i<num ; i++)
{
if (num % i == 0)
{
return i;
}
}
What happens if num is a value where the loop itself is never entered? Or what happens if the if condition is never satisfied? No return statement would ever be encountered, which is invalid.
The compiler has to guarantee that the method will return something, under any and all potential runtime conditions. So while it's perfectly valid to return from within the loop, you also must provide logic for what to return if that return statement is never reached.

Java doesn't let you do that, because what happens if if (num % i == 0) is never true?
The method's return type is int, so it has to return an int. Since the if condition could be false on every iteration, not every path ends in a return statement.
So, if you wanted to, you could return something like -1, or another invalid value. Then you know that the function didn't find what it was looking for.
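A minimal sketch of the sentinel approach, keeping the question's loop unchanged (the class name FirstFactor is made up for illustration). Because the loop starts at 1, any num >= 2 returns 1 immediately; the return after the loop is what covers num <= 1, where the loop body never runs.

```java
public class FirstFactor {
    // Same loop as in the question. The trailing "return -1" covers every path on
    // which the loop finishes without returning, e.g. num <= 1.
    static int firstFactor(int num) {
        for (int i = 1; i < num; i++) {
            if (num % i == 0) {
                return i;
            }
        }
        return -1; // sentinel meaning "no factor found"
    }

    public static void main(String[] args) {
        System.out.println(firstFactor(20)); // 1
        System.out.println(firstFactor(1));  // -1
    }
}
```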

Why does the equals() implementation generated by Eclipse check for null before type checking (instanceof)?

I regularly used Eclipse's code generation tools (Source / Generate hashCode() and equals()...) to create the equals() implementation for simple POJO classes. If I choose to "Use instanceof to compare types" this produces an equals() implementation similar to this:
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (!(obj instanceof MyClass)) {
return false;
}
MyClass other = (MyClass) obj;
// check the relevant fields for equality
}
Today a colleague pointed out that the second if statement is not necessary at all, since the instanceof type check will return false whenever obj is null. (See question 3328138.)
Now, I guess that the folks writing the code templates for Eclipse JDT are worth their salt, too. So I figure there must be some reason for that null check, but I'm not really sure what it is.
(Also question 7570764 might give a hint: if we use getClass() comparison for type checking instead of instanceof, obj.getClass() is not null-safe. Maybe the code template is just not clever enough to omit the null check if we use instanceof.)
EDIT: Dragan noticed in his answer that the instanceof type check is not the default setting in Eclipse, so I edited that out of the question. But that does not change anything.
Also, please do not suggest that I should use getClass() or (even better!) a different IDE. That's not the point; it does not answer the question. I didn't ask for advice on how to write an equals() implementation, whether to use instanceof or getClass(), etc.
The question roughly is: is this a minor bug in Eclipse? And if it's not, then why does it qualify as a feature?

It is unnecessary, because instanceof has a built-in null check.
But instanceof is a lot more than a simple foo == null. It is a full bytecode instruction that prepares a class check, doing unnecessary work before the null check is done. (See http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.instanceof for more details.)
So a separate null check could be a performance improvement.
I did a quick measurement and, no surprise, foo == null is faster than a null check with instanceof.
But usually you do not have a ton of nulls in an equals(), which leaves you with a duplicate, unnecessary null check most of the time, and that will likely eat up any improvement made during null comparisons.
My conclusion: it is unnecessary.
Code used for testing, for completeness (remember to use -Djava.compiler=NONE, otherwise you will only measure the power of Java's JIT):
public class InstanceOfTest {
public static void main(String[] args) {
Object nullObject = null;
long start = System.nanoTime();
for(int i = Integer.MAX_VALUE; i > 0; i--) {
if (nullObject instanceof InstanceOfTest) {}
}
long timeused = System.nanoTime() - start;
long start2 = System.nanoTime();
for(int i = Integer.MAX_VALUE; i > 0; i--) {
if (nullObject == null) {}
}
long timeused2 = System.nanoTime() - start2;
System.out.println("instanceof");
System.out.println(timeused);
System.out.println("nullcheck");
System.out.println(timeused2);
}
}
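For comparison, a sketch of an instanceof-based equals() without the separate null check, for a hypothetical Person class (the class and its fields are assumptions, not Eclipse output). It relies on the language rule that null instanceof T is always false.

```java
import java.util.Objects;

public class Person {
    private final String name;
    private final int age;

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        // "null instanceof Person" evaluates to false, so this also rejects null.
        if (!(obj instanceof Person)) {
            return false;
        }
        Person other = (Person) obj;
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }
}
```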

Indeed, it is unnecessary, and it is a mistake by the authors of the Eclipse template. And it is not the only one; I found more small errors there. For example, the generation of the toString() method when I want to omit null values:
public class A {
private Integer a;
private Integer b;
@Override
public String toString() {
StringBuilder builder = new StringBuilder();
builder.append("A [");
if (a != null)
builder.append("a=").append(a).append(", ");
if (b != null)
builder.append("b=").append(b);
builder.append("]");
return builder.toString();
}
}
If a is not null and b is, there will be an extra comma before the closing ].
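A sketch of one way to avoid the dangling separator, assuming Java 8+ for java.util.StringJoiner (this is not what the Eclipse template generates, and the constructor is added here only to make the example usable). StringJoiner inserts the delimiter only between elements that were actually added.

```java
import java.util.StringJoiner;

public class A {
    private Integer a;
    private Integer b;

    // Hypothetical constructor, added for the example.
    public A(Integer a, Integer b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public String toString() {
        // Prefix "A [", suffix "]", delimiter ", " placed only between added parts.
        StringJoiner joiner = new StringJoiner(", ", "A [", "]");
        if (a != null) {
            joiner.add("a=" + a);
        }
        if (b != null) {
            joiner.add("b=" + b);
        }
        return joiner.toString();
    }
}
```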
So, regarding your statement: "Now, I guess that the folks writing the code templates for Eclipse JDT are worth their salt, too.", I assume they are, but it would not hurt them to pay more attention to these tiny inconsistencies. :)

In Java what is the quickest way to check if list contains items from another list, both list are of same type?

Say I have class called MyClass as follow:
public class MyClass
{
//Identifier is alphanumeric. If the identifier starts with 'ZZ'
//it is a special identifier.
private String identifier = null;
//Date string format YYYY-MM-DD
private String dateString = null;
//Just a flag (not important for this scenario)
private boolean isCoolCat = false;
//Default Constructor and getters/setters implemented
//Overrides the standard Java equals() method.
//This way, when ArrayList calls contains() for MyClass objects
//it will only check the Date (for ZZ identifier)
//and identifier values against each other instead of
//also comparing the isCoolCat indicator value.
@Override
public boolean equals(Object obj)
{
if(this == obj)
{
return true;
}
if(obj == null)
{
return false;
}
if(getClass() != obj.getClass())
{
return false;
}
MyClass other = (MyClass) obj;
if(this.identifier == null)
{
if(other.identifier != null)
{
return false;
}
} else if(!this.identifier.equals(other.identifier)) {
return false;
}
if(other.identifier.startsWith("ZZ"))
{
if(!this.dateString.equals(other.dateString))
{
return false;
}
}
return true;
}
}
In another class I have two List of MyClass type, each contain 100,000 objects. I need to check if items in one list are in the other list and I currently accomplish this as follow:
List<MyClass> inList = new ArrayList<MyClass>();
List<MyClass> outList = new ArrayList<MyClass>();
inList = someMethodForIn();
outList = someMethodForOut();
//For loop iterates through inList and check if outList contains
//MyClass object from inList if it doesn't then it adds it.
for(MyClass inObj : inList)
{
if(!outList.contains(inObj))
{
outList.add(inObj);
}
}
My question is: Is this the fastest way to accomplish this? If not can you please show me a better implementation that will give me a performance boost? The list size is not always going to be 100,000. Currently on my platform it takes about 2 minutes for 100,000 size. Say it can vary from 1 to 1,000,000.

You want to use a Set for this. A HashSet has a contains method which can determine whether an object is in the set in expected O(1) time.
A couple things to watch out for when converting from List<MyClass> to Set<MyClass>:
You will lose the ordering of the elements
You will lose the duplicate elements
Your MyClass needs to implement hashCode() and equals(), and they should be consistent.
To convert your List to Set you can just use:
Set<MyClass> s1 = new HashSet<>(inList);
Set<MyClass> s2 = new HashSet<>(outList);
This Java doc explains how to find the union, intersection, and difference of two sets. In particular, it seems like you're interested in the Union:
// transforms s2 into the union of s1 and s2. (The union of two sets
// is the set containing all of the elements contained in either set.)
s2.addAll(s1);
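The set-based approach only works if hashCode() agrees with the custom equals(). A minimal sketch, assuming a trimmed-down, hypothetical version of MyClass (isCoolCat omitted, since equals() ignores it, and with null-safe comparisons) plus a hypothetical merge helper that uses LinkedHashSet so the original order is kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class MergeSketch {
    // Hypothetical, trimmed-down MyClass whose hashCode() is consistent with equals().
    static final class MyClass {
        final String identifier;
        final String dateString;

        MyClass(String identifier, String dateString) {
            this.identifier = identifier;
            this.dateString = dateString;
        }

        // Identifiers starting with "ZZ" also compare their dates, as in the question.
        private boolean isSpecial() {
            return identifier != null && identifier.startsWith("ZZ");
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (obj == null || getClass() != obj.getClass()) {
                return false;
            }
            MyClass other = (MyClass) obj;
            if (!Objects.equals(identifier, other.identifier)) {
                return false;
            }
            // Equal identifiers mean both objects are special or neither is.
            return !isSpecial() || Objects.equals(dateString, other.dateString);
        }

        @Override
        public int hashCode() {
            // Hash exactly the fields equals() compares, so equal objects get equal hashes.
            return isSpecial() ? Objects.hash(identifier, dateString) : Objects.hashCode(identifier);
        }
    }

    // Returns outList followed by every element of inList that was not already present.
    // Note: LinkedHashSet keeps insertion order but also drops duplicates inside outList.
    static List<MyClass> merge(List<MyClass> inList, List<MyClass> outList) {
        Set<MyClass> merged = new LinkedHashSet<>(outList);
        merged.addAll(inList);
        return new ArrayList<>(merged);
    }
}
```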

Hashing! Hashing is always the answer!
The current complexity of this code is O(nm), where n is the size of inList and m is the size of outList.
You can use a HashSet to reduce the complexity to O(n + m), because contains and add on a HashSet take expected O(1) time.
This can be done like this:
HashSet<MyClass> outSet = new HashSet<>(outList);
for(MyClass inObj : inList)
{
// add() returns false if an equal element is already in the set, so duplicates
// inside inList are skipped too, just as outList.contains() did
if(outSet.add(inObj))
{
outList.add(inObj);
}
}
Credits and Sources.
returning difference between two lists in java
Time complexity of contains(Object o), in an ArrayList of Objects
HashSet.contains performance

Two minutes to compare two very large lists: you're probably not going to get much time savings here. So, depending on your application, can you set a flag so that things dependent on this cannot run until it has finished, push this into its own thread, and let the user do something else (while also telling them this is ongoing)? Or at least put up a progress bar. For something that takes only a few minutes in a complex computation like this, letting the user know the app is busy and telling them roughly how long it will take is OK, and probably better than just shaving a few seconds off the time. Users are quite tolerant of delays if they know how long they will be and you tell them there is time to go get a coffee.

How is the return statement working in the following java method?

Is it possible by any means in the following method that the print statement gets executed after the if statement returns true in the for loop?
public boolean contains(Object o) {
if(o == null){
throw new IllegalArgumentException();
}
for(int i = 0; i < size(); i++){
if(o.equals(getNodeAt(i).data)){
System.out.println("contains passed here: "+o+" "+getNodeAt(i)+" "+i);
return true;
}
System.out.println(getNodeAt(1));
}
System.out.println("cointain failed here "+o);
return false;
}

Of course: call the method again. More effectively, efficiently, and specifically, call it with an Object such that o.equals(getNodeAt(i).data) is false. The truth is...
"[B]y any means" is a pretty loose constraint; you say...
is it possible by any means in the following method that the print statement get[s] executed after the if statement returns true in the for loop?
I'm saying that YES, that's possible by any means when the means are recalling the method. In fact, it's perpetually true as long as you're using whatever container.
Proof:
Assume that it is impossible by any means in the following method that the second return statement gets executed after the if statement returns true in the for loop.
static String proof(Object o) {
for(int i = 0; i < 1; ++i) {
if (o == null) {
return "I'm returning from the for loop!!!";
}
}
return "I'm now called after the for's return statement (by any means)!! - QED";
}
But given...
public static void main(String...args) {
System.out.println(proof(null));
System.out.println(proof(new String("Hello Proof!")));
}// end main method
the output is...
I'm returning from the for loop!!!
I'm now called after the for's return statement (by any means)!! - QED
Therefore our assumption is wrong and it is possible by some means for the second return statement to get executed after the if statement returns true in the for loop.
;)
A "better" way to phrase that so it's clear what you're asking would be, perhaps, - "Is it possible for the code in a method body to continue to execute after a return statement?"
That answer is no and can be tested in any good IDE as follows.
static String proof(Object o) {
for(;;)
if(true)
return "Donkey Butts";
return "Poops";
}
This basically says that, forever, I will return "Donkey Butts". Any IDE worth using (and javac itself) will report an "unreachable statement" error for the second return. The compiler can determine this from your code: the for(;;) loop has no condition and no break, so the only way out of it is the return inside it, and the code below the loop can never execute.

No, it is definitely not possible.

No, but it is possible that System.out isn't flushed until after the return statement.

Yes, if you enclose the loop body in a try/finally block:
public boolean contains(Object o) {
if(o == null){
throw new IllegalArgumentException();
}
for(int i = 0; i < size(); i++){
try {
if(o.equals(getNodeAt(i).data)){
System.out.println("contains passed here: "+o+" "+getNodeAt(i)+" "+i);
return true;
}
} finally {
System.out.println(getNodeAt(1));
}
}
System.out.println("cointain failed here "+o);
return false;
}
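A small sketch that records the order of events (the FinallyOrder class, the find method and the log list are hypothetical), showing that the finally block runs after return true has been reached but before control actually goes back to the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class FinallyOrder {
    // Records the order in which things happen so it can be inspected afterwards.
    static final List<String> log = new ArrayList<>();

    static boolean find(int[] values, int target) {
        for (int i = 0; i < values.length; i++) {
            try {
                if (values[i] == target) {
                    log.add("return at index " + i);
                    return true;
                }
            } finally {
                // Runs on every iteration, including the one that executes "return true".
                log.add("finally at index " + i);
            }
        }
        log.add("not found");
        return false;
    }

    public static void main(String[] args) {
        System.out.println(find(new int[] {4, 7, 9}, 7)); // true
        System.out.println(log); // [finally at index 0, return at index 1, finally at index 1]
    }
}
```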

Nothing inside a method can be executed after the return statement.
But when you deal with output operations, things can happen quite differently from what you might expect. In fact, writes to an output file/device are often buffered, i.e. written to an internal array. When the array is full, it is sent to the file/device. This happens for efficiency reasons, because writing a few big chunks of data is faster than writing lots of small ones.
This means that these operations sometimes seem to happen long after the place where they appear in the code.
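A minimal sketch of that buffering, using a ByteArrayOutputStream as a hypothetical stand-in for the file or console: bytes written through a BufferedOutputStream are not visible at the destination until the buffer is flushed (or fills up, or the stream is closed).

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BufferingDemo {
    // Returns {bytes visible at the destination before flush, bytes visible after flush}.
    static int[] visibleBytes(String message) throws IOException {
        ByteArrayOutputStream device = new ByteArrayOutputStream(); // stands in for a file/console
        try (BufferedOutputStream buffered = new BufferedOutputStream(device)) {
            buffered.write(message.getBytes(StandardCharsets.UTF_8));
            int before = device.size(); // a short message is still sitting in the buffer
            buffered.flush();
            int after = device.size();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = visibleBytes("contains passed here");
        System.out.println("before flush: " + counts[0] + ", after flush: " + counts[1]);
    }
}
```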

Is there any appreciable difference between if and if-else?

Given the following code snippets, is there any appreciable difference?
public boolean foo(int input) {
if(input > 10) {
doStuff();
return true;
}
if(input == 0) {
doOtherStuff();
return true;
}
return false;
}
vs.
public boolean foo(int input) {
if(input > 10) {
doStuff();
return true;
} else if(input == 0) {
doOtherStuff();
return true;
} else {
return false;
}
}
Or would the single exit principle be better here with this piece of code...
public boolean foo(int input) {
boolean toBeReturned = false;
if(input > 10) {
doStuff();
toBeReturned = true;
} else if(input == 0) {
doOtherStuff();
toBeReturned = true;
}
return toBeReturned;
}
Is there any perceptible performance difference? Do you feel one is more or less maintainable/readable than the others?

With the second example you state very clearly that both conditions are mutually exclusive.
With the first one, it is not so clear, and in the (unlikely) event that an assignment to input is added between both ifs, the logic would change.
Suppose someone in the future adds input = 0 before the second if.
Of course this is unlikely to happen, but if we are talking about maintainability here, if-else says clearly that the conditions are mutually exclusive, while a bunch of ifs doesn't, and separate ifs are not as tightly tied to each other as if-else blocks are.
Edit: Now that I look again, in this particular example the return statements force the mutual exclusivity, but again, we're talking about maintainability and readability.
Anyway, regarding performance: if this is coded in Java, you shouldn't worry about the performance of a couple of if blocks. If it were embedded C on really slow hardware, maybe, but certainly not with Java.

Use whatever form best describes your intent.
Do not follow the single exit principle if things are this simple, though; it just makes the code more confusing.

In the first:
Eventually somebody, for some strange reason and when you're not looking, will add a statement in between that makes this method fail under certain strange conditions, and everybody (or worse, one single person) will spend 4 hours reading the source code and debugging the application, only to finally find that there was something in the middle.
The second is definitely better: not only does it prevent this scenario, it also clearly states "either this or that, nothing more".
If all the code we wrote within an if were 10 lines long at most, this wouldn't really matter, but unfortunately that's not the case; there are other programmers who for some reason think that an if body should be more than 200 lines long... anyway.
I don't like the third; it forces me to look for the return variable, and it's easier to find the return keyword.
As for speed, they are (almost) identical. Don't worry about that.

In your last example, don't do this:
public boolean foo(int input) {
boolean toBeReturned = false;
if(input > 10) {
doStuff();
toBeReturned = true;
} else if(input == 0) {
doOtherStuff();
toBeReturned = true;
}
return toBeReturned;
}
but this (notice the use of Java's final):
public boolean foo(int input) {
final boolean toBeReturned; // no init here
if(input > 10) {
doStuff();
toBeReturned = true;
} else if(input == 0) {
doOtherStuff();
toBeReturned = true;
} else {
toBeReturned = false;
}
return toBeReturned;
}
By doing so, you make your intent clear, and this is a godsend for IDEs supporting "programming by intention" (there's no need to "compile" to see potential errors: even on a partial AST, a good IDE can examine incomplete source in real time and give you instant warnings).
This way, you are sure not to forget to initialize your return value, because the compiler rejects any path that reaches the return with the final variable unassigned. This is great if later on you decide that after all you need another condition.
I do this all the time, even more so since I started using IntelliJ IDEA (version 4 or so, a long time ago), and this has saved me from so many silly distraction mistakes...
Some people will argue that this is too much code for such a simple case but that's entirely missing the point: the point is to make the intent clear so that the code reads easily and can be easily extended later on, without accidentally forgetting to assign toBeReturned and without accidentally forgetting to return from a later clause you may add.
Otherwise, if "conciseness" was the name of the game, then I'd write:
public boolean foo(int a) {
return a > 10 ? doStuff() : a == 0 ? doOtherStuff() : false;
}
Where both doStuff and doOtherStuff would return true.

Semantically, no. Performance-wise, this depends on the compiler, i.e. whether it can spot that both conditions cannot be true at once. I'd bet the standard Sun compiler can. Whether to use the single exit principle is a matter of taste. I personally hate it.

Version #1 and #2 may be faster than #3, but I suppose the performance difference is minimal. I would rather focus on readability.
Personally, I would never use version #2. Between #1 and #3, I would choose the one that yields the most readable code for the case in question. I don't like many exit points in my methods, because it makes the code hard to analyze. However, there are cases where the flow becomes clearer when we exit immediately for some special cases, and continue with the main cases.

Think of this case, where the two examples won't be equivalent:
public boolean foo(int input) {
if (input > 10) {
// doStuff();
return true;
}
System.out.println("do some other intermediary stuff");
if (input == 0) {
// doOtherStuff();
return true;
}
return false;
}
vs.
public boolean foo(int input) {
if (input > 10) {
// doStuff();
return true;
}
//System.out.println("doing some intermediary stuff... doesn't work");
else if (input == 0) {
// doOtherStuff();
return true;
} else {
return false;
}
}
The first approach is probably more flexible, but both formulas have their use in different circumstances.
Regarding performance, I think the differences are too small to be taken into consideration for any regular Java application coded by sane programmers :).

In your case, the second if is only reached if the first if failed, so it's less important here. But if your first if did something and didn't return, the second if (which would then always be false) would still be tested unless it was an else-if.
In other words, there are cases where the difference between if-else-if and if-if matters, but this isn't one of them.
Example: Try this and then try it after removing the else. You get two different outputs:
int someNumber = 1;
if(someNumber < 5)
{
someNumber += 5;
System.out.println("First call.");
}
else if(someNumber >= 5)
{
System.out.println("Second call.");
}

Between the first and second snippets, there's really no difference. However, the third snippet is quite inefficient. Because you wait until the last line of code in the method to return control to the caller, you waste processing power/memory, whereas the first two snippets return control as soon as one of the conditions is determined to be true.
