indexOf() Strange Java.util.List behaviour with duplicate Strings


I've just come across some odd behaviour I wouldn't expect from an ArrayList<String> in Java. This is coming, for sure, from my poor understanding of references in Java.
Let me show you this piece of code:
List<String> myList = new ArrayList<>();
myList.add("One");
myList.add("Two");
myList.add("Two");
myList.add("Three");
for (String s : myList){
System.out.println(myList.indexOf(s));
}
This piece of code provides the following output:
0
1
1
3
How come? I've added on purpose two Strings containing the same characters ("Two"), but the object itself shouldn't be the same. What am I misunderstanding here? I was expecting this other output:
0
1
2
3

ArrayList.indexOf() doesn't use reference equality to find the object. It uses the equals() method. Notice what the documentation says (emphasis mine):
returns the lowest index i such that (o==null ? get(i)==null : o.equals(get(i))), or -1 if there is no such index.
Thus, it will match on the first string that is logically equal.
EDIT:
Andremoniy's comment is absolutely right. In the case of string literals, because they are interned, they will also happen to have the same reference. So your two "Two" strings are actually the same reference in this case.
System.out.println("Two" == "Two"); // prints true because they are the same reference.
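To make the interning point concrete, here is a minimal sketch. The class name InternDemo and the helper sameReference are made up for illustration and do not come from the answers above. It contrasts a literal, a String created with new, and the result of intern().

```java
public class InternDemo {
    // Hypothetical helper: true only when both variables point at the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Two";
        String copy = new String("Two"); // a distinct object with the same characters

        System.out.println(sameReference(literal, "Two"));         // true: both literals are interned
        System.out.println(sameReference(literal, copy));          // false: different objects
        System.out.println(sameReference(literal, copy.intern())); // true: intern() returns the pooled instance
        System.out.println(literal.equals(copy));                  // true: same characters
    }
}
```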

It's simply because indexOf returns the first occurrence of the item in the list that is equal to the given string. See its documentation:
Returns the index of the first occurrence of the specified element in this list, or -1 if this list does not contain the element. More formally, returns the lowest index i such that (o==null ? get(i)==null : o.equals(get(i))), or -1 if there is no such index.

You'll have to note two points:
most probably you are using the same String instance, since the constant "Two" gets interned; that is, all occurrences of this literal will refer to the same instance.
List.indexOf() doesn't compare items by == (that is object-identity) but using equals() - that is some class-defined way to compare two objects for equality (which makes perfect sense as otherwise you wouldn't be able to find something in the list unless you already have a reference to it). So even two different String-objects (e.g. created by new String("Two")) would still produce the same output.
For completeness, the quote from the javadoc of indexOf (as already mentioned in the other answers):
returns the lowest index i such that (o==null ? get(i)==null : o.equals(get(i))), or -1 if there is no such index.
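The claim that two different String objects still produce the same output can be checked with a small sketch. The class IndexOfDemo and the method indexesViaIndexOf are illustrative names, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfDemo {
    // Collects list.indexOf(s) for every element, mirroring the loop in the question.
    static List<Integer> indexesViaIndexOf(List<String> list) {
        List<Integer> result = new ArrayList<>();
        for (String s : list) {
            result.add(list.indexOf(s));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("One");
        list.add(new String("Two")); // two separate objects this time
        list.add(new String("Two"));
        list.add("Three");

        System.out.println(list.get(1) == list.get(2)); // false: not the same reference
        System.out.println(indexesViaIndexOf(list));    // [0, 1, 1, 3]: equals() still matches index 1 first
    }
}
```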

Java doesn't allow you to make a distinction between these two, but you have stumbled across the difference between (and disparity between) a method and a function.
Simply put a method may change the state of an object. A function will not. So calling your method add(String) will change the state of the List. Specifically, it adds the String to the list. indexOf(String) however is not a method, it is a function. Now sure, Java calls them methods because... that's what they call them. And it's conceivable that the implementation --could-- change the state. But we know that it doesn't, by contract.
A function, given the same inputs (the current state of the underlying object counts as one of those inputs), will always return the same result. Always. That's what's great about a function. You can call a function (a true function) as many times as you want and always get the same result as long as your inputs and the underlying data haven't changed.
Some folks at MIT did research into the analysis of functions in Java (which to avoid confusion, they call "pure methods"). It would be nice if there was a framework that allowed you to specify that a particular method was indeed a function (or as they call it, was pure) and then have an analyzer make sure you didn't accidentally introduce a mutation into code protected by that annotation.

Related

Stream different data types

I'm getting my head around Streams API.
What is happening with the 2 in the first line? What data type is it treated as? Why doesn't this print true?
System.out.println(Stream.of("hi", "there",2).anyMatch(i->i=="2"));
The second part of this question is why doesn't the below code compile (2 is not in quotes)?
System.out.println(Stream.of("hi", "there",2).anyMatch(i->i==2));

In the first snippet, you are creating a Stream of Objects. The 2 element is an Integer, so comparing it to the String "2" returns false.
In the second snippet, you can't compare an arbitrary Object to the int 2 with ==, since there is no conversion from Object to int.
For the first snippet to return true, you have to change the last element of the Stream to a String (and also use equals instead of == in order not to rely on the String pool):
System.out.println(Stream.of("hi", "there", "2").anyMatch(i->i.equals("2")));
The second snippet can be fixed by using equals instead of ==, since equals exists for any Object:
System.out.println(Stream.of("hi", "there",2).anyMatch(i->i.equals(2)));
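As a rough illustration of the points above, the sketch below wraps the stream in a helper. The class StreamMixedTypes, the method contains, and the explicit <Object> type argument are additions made here so that the element type is visible; the original snippets leave the type to inference.

```java
import java.util.stream.Stream;

public class StreamMixedTypes {
    // Checks whether the mixed stream contains an element equal to target.
    // equals() works for any Object, so both Strings and Integers can be looked up.
    static boolean contains(Object target) {
        return Stream.<Object>of("hi", "there", 2).anyMatch(i -> i.equals(target));
    }

    public static void main(String[] args) {
        System.out.println(contains("2"));  // false: the element is the Integer 2, not the String "2"
        System.out.println(contains(2));    // true: 2 is boxed to an Integer and equals() matches
        System.out.println(contains("hi")); // true
    }
}
```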

You should instead make use of:
System.out.println(Stream.of("hi", "there",2).anyMatch(i->i.equals(2)));
The reason is that the comparison inside anyMatch is applied to i, which is an Object (from the stream) and is incompatible with an int.
Also note that the first snippet compiles because both operands of == are object references there (the Integer element and the String "2"); the comparison simply returns false.

about java recursion to create combination of string

The question asks me to return a set containing all the possible combinations of strings made up of "cc" and "ddd" for a given length n.
So, for example, if the given length is 5, the set would include "ccddd" and "dddcc",
length 6 would return a set containing "cccccc","dddddd",
length 7 would return a set containing "ccdddcc","dddcccc","ccccddd",
and length 12 would return 12 different combinations, and so on.
However, the set returned is empty.
Can you please help?
Please excuse the extremely poor coding style.
public static Set<String> set = new HashSet<String>();
public static Set<String> generateset(int n) {
String s = strings(n,n,"");
return set; // change this
}
public static String strings(int n,int size, String s){
if(n == 3){
s = s + ("cc");
return "";}
if(n == 2){
s = s + ("ddd");
return "";}
if(s.length() == size)
set.add(s);
return strings(n-3,size,s) + strings(n-2,size,s);
}

I think you'll need to rethink your approach. This is not an easy problem, so if you're extremely new to Java (and not extremely familiar with other programming languages), you may want to try some easier problems involving sets, lists, or other collections, before you tackle something like this.
Assuming you want to try it anyway: recursive problems like this require very clear thinking about how you want to accomplish the task. I think you have a general idea, but it needs to be much clearer. Here's how I would approach the problem:
(1) You want a method that returns a list (or set) of strings of length N. Your recursive method returns a single String, and as far as I can tell, you don't have a clear definition of what the resulting string is. (Clear definitions are very important in programming, but probably even more so when solving a complex recursive problem.)
(2) The strings will either begin with "cc" or "ddd". Thus, to form your resulting list, you need to:
(2a) Find all strings of length N-2. This is where you need a recursive call to get all strings of that length. Go through all strings in that list, and add "cc" to the front of each string.
(2b) Similarly, find all strings of length N-3 with a recursive call; go through all the strings in that list, and add "ddd" to the front.
(2c) The resulting list will be all the strings from steps (2a) and (2b).
(3) You need base cases. If N is 0 or 1, the resulting list will be empty. If N==2, it will have just one string, "cc"; if N==3, it will have just one string, "ddd".
You can use a Set instead of a list if you want, since the order won't matter.
Note that it's a bad idea to use a global list or set to hold the results. When a method is calling itself recursively, and every invocation of the method touches the same list or set, you will go insane trying to get everything to work. It's much easier if you let each recursive invocation hold its own local list with the results. Edit: This needs to be clarified. Using a global (i.e. instance field that is shared by all recursive invocations) collection to hold the final results is OK. But the approach I've outlined above involves a lot of intermediate results--i.e. if you want to find all strings whose length is 8, you will also be finding strings whose length is 6, 5, 4, ...; using a global to hold all of those would be painful.
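The steps (1) to (3) above can be sketched roughly as follows. This is one possible reading of the outline, not the answerer's own code; the class name CcDddStrings and the method name generate are invented here, and a TreeSet is used only so that printed output has a predictable order.

```java
import java.util.Set;
import java.util.TreeSet;

public class CcDddStrings {
    // Returns every string of exactly length n that is built from the blocks "cc" and "ddd".
    static Set<String> generate(int n) {
        Set<String> result = new TreeSet<>();
        if (n < 2) {
            return result;            // base case: nothing fits into length 0 or 1
        }
        if (n == 2) {
            result.add("cc");         // base case: only "cc" has length 2
            return result;
        }
        if (n == 3) {
            result.add("ddd");        // base case: only "ddd" has length 3
            return result;
        }
        for (String tail : generate(n - 2)) {
            result.add("cc" + tail);  // step (2a): prefix "cc" to every string of length n-2
        }
        for (String tail : generate(n - 3)) {
            result.add("ddd" + tail); // step (2b): prefix "ddd" to every string of length n-3
        }
        return result;                // step (2c): the union of both groups
    }

    public static void main(String[] args) {
        System.out.println(generate(5));         // [ccddd, dddcc]
        System.out.println(generate(7));         // [ccccddd, ccdddcc, dddcccc]
        System.out.println(generate(12).size()); // 12
    }
}
```

Each call builds and returns its own local set, so no shared field is needed for the intermediate results.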

To see why the set comes back empty, simply follow the logic. Say you execute generateset(5); which will execute strings(5,5,"");:
First call strings(5,5,""); : (s.length() == size) is false, hence nothing is added to set
Second call strings(2,5,""); : (n == 2) is true, so the method returns "" before reaching set.add, hence nothing is added to set
Third call strings(3,5,""); : (n == 3) is true, so the method again returns "" early, hence nothing is added to set
So set remains unchanged.

Comparing elements in loop. How to best avoid comparing to self?

I have been given some code to optimise. One of the bits contains some code which takes a set with elements and for all elements in the set compares them to all other elements. The comparison isn't symmetric so no shortcut there. The code looks as follows:
for(String string : initialSet)
{
Set<String> copiedSet = new HashSet<>(initialSet);
copiedSet.remove(string);
for(String innerString : copiedSet)
{
/**
* Magic, unicorns, and elves! Compare the distance of the two strings by
* some very fancy method! No need to detail it here, just believe me it
* works, it isn't the subject of the question!
*/
}
}
To my understanding, the complexity would look as follows: the initial loop has a complexity of O(n) where n is the size of the initial set. Creating a set via the copy constructor would, in my understanding induce equals tests on all elements as the set would need to ensure the contract of the set, that is, no duplicate elements. This would mean that for n insertions, the complexity would increase by the sum from 0 to n-1. The removal would again need to check, in the worst case, n elements. The inner for loop then loops on n-1 elements.
The method I have used this far is simply:
for(String string : set)
{
for(String innerString : set)
{
if(! string.equals(innerString))
{
/**
* Magic, unicorns, and elves! Compare the distance of the two strings by
* some very fancy method! No need to detail it here, just believe me it
* works, it isn't the subject of the question!
*/
}
}
}
In my understanding, this would induce a complexity of roughly O(n^2) abstracting the complexity of the code in the if clause.
Therefore, the second piece of code would be better by at least one order plus the sum I outlined above. However, I am working with a dangerous assumption: I am assuming how the copy constructor of HashSet works. Simple benchmarks showed that the results were indeed better for the second snippet by about a factor of n. I would like to tap into your knowledge to confirm these findings and, if possible, gain more insight into the workings of the copy constructor. Ideally I would also like a resource listing functions by time complexity, but I guess that last one will remain wishful thinking!

The source code for the copy constructor is widely available, so you can study that as well as clone() and see if one of them suits you.
But truly, if all you are trying to do is avoid comparing an element with itself then I think your second idea involving magic, unicorns, and Elvis elves, is probably the best idea of all. Comparing every element in a Set with every other element in it is inherently an O(n^2) problem, and you're not going to get much better than that.
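A minimal sketch of that second idea, with the comparison itself replaced by a counter so the number of visited pairs can be checked. The class PairwiseDemo and the method countOrderedPairs are hypothetical names; the real distance function from the question is not shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PairwiseDemo {
    // Visits every ordered pair (a, b) with a different from b, without copying the set.
    static int countOrderedPairs(Set<String> set) {
        int pairs = 0;
        for (String a : set) {
            for (String b : set) {
                if (!a.equals(b)) {
                    pairs++; // the asymmetric comparison of a and b would go here
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("x", "y", "z"));
        System.out.println(countOrderedPairs(set)); // 6, i.e. n * (n - 1) for n = 3
    }
}
```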

There's no reason to compare the elements in a Set. By definition, they are all different to one another.
From the javadoc:
A collection that contains no duplicate elements.
More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.
As implied by its name, this interface models the mathematical set abstraction.
If you have a different type of collection, though, and want to skip comparing an element with itself, you can iterate with index variables (i and j) and skip the steps in which they are equal. For example:
for (int i = 0; i < collection.size(); i++) {
for (int j = 0; j < collection.size(); j++) {
if (i != j) {
//compare
}
}
}

I'm not sure exactly what you are doing in your "comparison" but if it really is just finding matching elements then the Set Interface at http://docs.oracle.com/javase/tutorial/collections/interfaces/set.html has some useful methods.
For example:
s1.retainAll(s2) — transforms s1 into the intersection of s1 and s2. (The intersection of two sets is the set containing only the elements common to both sets.)
s1.removeAll(s2) — transforms s1 into the (asymmetric) set difference of s1 and s2. (For example, the set difference of s1 minus s2 is the set containing all of the elements found in s1 but not in s2.)
s1.addAll(s2) — transforms s1 into the union of s1 and s2. (The union of two sets is the set containing all of the elements contained in either set.)
This lets you easily get intersections, combinations, etc for Java Sets.
In general the Java collections classes use very efficient algorithms so you are unlikely to improve upon them without a lot of work.
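The three bulk operations listed above can be tried out with a short sketch. Because they modify the receiver, each helper works on a copy; the class SetOpsDemo and its method names are made up for this example, and TreeSet is used only to get a predictable print order.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class SetOpsDemo {
    // Each helper copies s1 first, because retainAll/removeAll/addAll change the set they are called on.
    static Set<String> intersection(Set<String> s1, Set<String> s2) {
        Set<String> result = new TreeSet<>(s1);
        result.retainAll(s2);
        return result;
    }

    static Set<String> difference(Set<String> s1, Set<String> s2) {
        Set<String> result = new TreeSet<>(s1);
        result.removeAll(s2);
        return result;
    }

    static Set<String> union(Set<String> s1, Set<String> s2) {
        Set<String> result = new TreeSet<>(s1);
        result.addAll(s2);
        return result;
    }

    public static void main(String[] args) {
        Set<String> s1 = new TreeSet<>(Arrays.asList("a", "b", "c"));
        Set<String> s2 = new TreeSet<>(Arrays.asList("b", "c", "d"));
        System.out.println(intersection(s1, s2)); // [b, c]
        System.out.println(difference(s1, s2));   // [a]
        System.out.println(union(s1, s2));        // [a, b, c, d]
    }
}
```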

Can compareTo return a different result in different situations?

I have a question. Say I have a list of names, each with a value, and the value can be the same for different names. When I sort with compareTo, the list comes back ordered by the number. My question is: if I run the app on another computer with the same list, will the same list be returned?
Does compareTo depend on the computer where it is executed?
I know that the solution is to implement a compareTo that uses an additional field to determine the order, but I would like to know whether, with the same list, the returned list could ever be different.

I'm not sure if I understand your question.
In the case where two items have the same value, compareTo will return 0 (because this "number" is the same). compareTo only defines the order in which the items should be sorted; it is up to the sorting algorithm to decide which of two equal items comes first. If you use the same algorithm with exactly the same list, the output should be the same, at least for the most common algorithms, because they are not random. But if you change the list or the algorithm, the result can be completely different.
So, to avoid that, you can reimplement the compareTo function to compare a second value when the first one is the same. For example, if you have two numbers n1 and n2 that belong to two strings s1 and s2, you can do:
int comparison = n1.compareTo(n2);
if (comparison == 0) {
return s1.compareTo(s2);
}
return comparison;
Of course, you still have the problem when both n1==n2 and s1==s2. But then maybe this is not a problem because if all the data is the same, the objects are equivalent, right? ;)
I hope it helps
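The same tie-breaking idea can also be written with a Comparator. The sketch below is only an illustration: the Entry class, its fields, and the names TieBreakDemo and sortedNames are assumptions, since the question does not show its data model. List.sort is documented as a stable sort, so even without the tie-breaker, equal elements keep their input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TieBreakDemo {
    // Hypothetical data model: a name with a numeric value.
    static final class Entry {
        final String name;
        final int value;

        Entry(String name, int value) {
            this.name = name;
            this.value = value;
        }
    }

    // Order by value first; when two values are equal, fall back to the name.
    static final Comparator<Entry> BY_VALUE_THEN_NAME =
            Comparator.comparingInt((Entry e) -> e.value)
                      .thenComparing((Entry e) -> e.name);

    // Sorts a copy of the input and returns just the names, in sorted order.
    static List<String> sortedNames(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(BY_VALUE_THEN_NAME);
        List<String> names = new ArrayList<>();
        for (Entry e : copy) {
            names.add(e.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("bob", 2), new Entry("alice", 2), new Entry("carol", 1));
        System.out.println(sortedNames(entries)); // [carol, alice, bob]
    }
}
```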

UnitTesting-SearchMethod

I have a method that searches for an element using binary search. I have created two test methods to test it. The first one asserts that the index returned by the method is the same as the index I give during the test. And the second test method asserts that the method returns -1 when the element is not in the array.
Are those two methods sufficient?
I am using Java

How about these cases?
The element occurs twice in the array.
If the array stores objects (rather than primitives), then consider the case when the array has an element that is equal to the object that you're searching for, but is not the same object.
Searching for null.
When the array has no elements.
Searching for an element that's greater than everything in the array.
Searching for an element that's less than everything in the array.
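A few of these cases could be exercised along the following lines. Since the question does not show its search method, this sketch uses java.util.Arrays.binarySearch as a stand-in and wraps it in a hypothetical search method that maps "not found" to -1, the convention described in the question.

```java
import java.util.Arrays;

public class BinarySearchCases {
    // Stand-in for the method under test: Arrays.binarySearch returns a negative
    // value when the key is absent, which is normalised to -1 here.
    static int search(int[] sorted, int key) {
        int index = Arrays.binarySearch(sorted, key);
        return index >= 0 ? index : -1;
    }

    public static void main(String[] args) {
        int[] values = {1, 3, 5};
        System.out.println(search(new int[0], 5)); // -1: empty array
        System.out.println(search(values, 0));     // -1: smaller than every element
        System.out.println(search(values, 9));     // -1: greater than every element
        System.out.println(search(values, 3));     // 1: present exactly once

        // With duplicates, either matching index is acceptable, so test for membership.
        int hit = search(new int[] {1, 3, 3, 5}, 3);
        System.out.println(hit == 1 || hit == 2);  // true
    }
}
```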

Your test results should coincide with the business and functional requirements of your application. It sounds like you've covered the "happy path" scenario. Now you'll want to focus on edge cases which may include something like having empty strings to search against or when searching for a "-1".
