Comparing elements in loop. How to best avoid comparing to self? - java

I have been given some code to optimise. One of the bits contains some code which takes a set with elements and for all elements in the set compares them to all other elements. The comparison isn't symmetric so no shortcut there. The code looks as follows:
for(String string : initialSet)
{
Set<String> copiedSet = new HashSet<>(initialSet);
copiedSet.remove(string);
for(String innerString : copiedSet)
{
/**
* Magic, unicorns, and elves! Compare the distance of the two strings by
* some very fancy method! No need to detail it here, just believe me it
* works, it isn't the subject of the question!
*/
}
}
To my understanding, the complexity would look as follows: the initial loop has a complexity of O(n) where n is the size of the initial set. Creating a set via the copy constructor would, in my understanding induce equals tests on all elements as the set would need to ensure the contract of the set, that is, no duplicate elements. This would mean that for n insertions, the complexity would increase by the sum from 0 to n-1. The removal would again need to check, in the worst case, n elements. The inner for loop then loops on n-1 elements.
The method I have used this far is simply:
for(String string : set)
{
for(String innerString : copiedSet)
{
if(! string.equals(innerString)
{
/**
* Magic, unicorns, and elves! Compare the distance of the two strings by
* some very fancy method! No need to detail it here, just believe me it
* works, it isn't the subject of the question!
*/
}
}
}
In my understanding, this would induce a complexity of roughly O(n^2) abstracting the complexity of the code in the if clause.
Therefore, the second piece of code would be better by at least one order plus the sum I outlined above. However, I am working with a dangerous assumption, and that is that I assume how the copy constructor of the HashSet works. Simple benchmarks showed that the results were indeed better for the second snipped by about a factor of n. I would like to tap into your knowledge to confirm these findings and gain more insight into the workings of the copy constructor if possible. Also, the ideal case would be to find a resource listing functions by time complexity but I guess that last one will remain wishful thinking!

The source code for the copy constructor is widely available, so you can study that as well as clone() and see if one of them suits you.
But truly, if all you are trying to do is avoid comparing an element with itself then I think your second idea involving magic, unicorns, and Elvis elves, is probably the best idea of all. Comparing every element in a Set with every other element in it is inherently an O(n2) problem, and you're not going to get much better than that.

There's no reason to compare the elements in a Set. By definition, they are all different to one another.
From the javadoc:
A collection that contains no duplicate elements.
More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.
As implied by its name, this interface models the mathematical set abstraction.
If you have different type of collection, though, and want to skip the comparing with self, you can't iterate with a step variable(s) (i and j) and skip the steps in which they are equal. For example:
for (int i = 0; i < collection.size(); i++) {
for (int j = 0; j < collection.size(); j++) {
if (i != j) {
//compare
}
}
}

I'm not sure exactly what you are doing in your "comparison" but if it really is just finding matching elements then the Set Interface at http://docs.oracle.com/javase/tutorial/collections/interfaces/set.html has some useful methods.
For example:
s1.retainAll(s2) — transforms s1 into the intersection of s1 and s2. (The intersection of two sets is the set containing only the elements common to both sets.)
s1.removeAll(s2) — transforms s1 into the (asymmetric) set difference of s1 and s2. (For example, the set difference of s1 minus s2 is the set containing all of the elements found in s1 but not in s2.)
s1.addAll(s2) — transforms s1 into the union of s1 and s2. (The union of two sets is the set containing all of the elements contained in either set.)
This lets you easily get intersections, combinations, etc for Java Sets.
In general the Java collections classes use very efficient algorithms so you are unlikely to improve upon them without a lot of work.

Related

about java recursion to create combination of string

The question was asking me to return set containing all the possible combination of strings made up of "cc" and "ddd" for given length n.
so for example if the length given was 5 then set would include "ccddd" and "dddcc".
and length 6 would return set containing "cccccc","dddddd"
and length 7 would return set contating "ccdddcc","dddcccc","ccccddd"
and length 12 will return 12 different combination and so on
However, set returned is empty.
Can you please help?
"Please understand extremeply poor coding style"
public static Set<String> set = new HashSet<String>();
public static Set<String> generateset(int n) {
String s = strings(n,n,"");
return set; // change this
}
public static String strings(int n,int size, String s){
if(n == 3){
s = s + ("cc");
return "";}
if(n == 2){
s = s + ("ddd");
return "";}
if(s.length() == size)
set.add(s);
return strings(n-3,size,s) + strings(n-2,size,s);
}
I think you'll need to rethink your approach. This is not an easy problem, so if you're extremely new to Java (and not extremely familiar with other programming languages), you may want to try some easier problems involving sets, lists, or other collections, before you tackle something like this.
Assuming you want to try it anyway: recursive problems like this require very clear thinking about how you want to accomplish the task. I think you have a general idea, but it needs to be much clearer. Here's how I would approach the problem:
(1) You want a method that returns a list (or set) of strings of length N. Your recursive method returns a single String, and as far as I can tell, you don't have a clear definition of what the resulting string is. (Clear definitions are very important in programming, but probably even more so when solving a complex recursive problem.)
(2) The strings will either begin with "cc" or "ddd". Thus, to form your resulting list, you need to:
(2a) Find all strings of length N-2. This is where you need a recursive call to get all strings of that length. Go through all strings in that list, and add "cc" to the front of each string.
(2b) Similarly, find all strings of length N-3 with a recursive call; go through all the strings in that list, and add "ddd" to the front.
(2c) The resulting list will be all the strings from steps (2a) and (2b).
(3) You need base cases. If N is 0 or 1, the resulting list will be empty. If N==2, it will have just one string, "cc"; if N==3, it will have just one string, "ddd".
You can use a Set instead of a list if you want, since the order won't matter.
Note that it's a bad idea to use a global list or set to hold the results. When a method is calling itself recursively, and every invocation of the method touches the same list or set, you will go insane trying to get everything to work. It's much easier if you let each recursive invocation hold its own local list with the results. Edit: This needs to be clarified. Using a global (i.e. instance field that is shared by all recursive invocations) collection to hold the final results is OK. But the approach I've outlined above involves a lot of intermediate results--i.e. if you want to find all strings whose length is 8, you will also be finding strings whose length is 6, 5, 4, ...; using a global to hold all of those would be painful.
The answer to why set is returned empty is simply follow the logic. Say you execute generateset(5); which will execute strings(5,5,"");:
First iteration strings(5,5,""); : (s.length() == size) is false hence nothing added to set
Second iteration strings(2,5,""); : (n == 2) is true, hence nothing added to set
Third iteration strings(3,5,""); : (n == 3) is true, hence nothing added
to set
So set remains un changed.

Best way to remove one arraylist elements from another arraylist

What is the best performance method in Java (7,8) to eliminate integer elements of one Arraylist from another. All the elements are unique in the first and second lists.
At the moment I know the API method removeall and use it this way:
tempList.removeAll(tempList2);
The problem appears when I operate with arraylists have more than 10000 elements. For example when I remove 65000 elements, the delay appears to be about 2 seconds. But I need to opperate with even more large lists with more than 1000000 elements.
What is the strategy for this issue?
Maybe something with new Stream API should solve it?
tl;dr:
Keep it simple. Use
list.removeAll(new HashSet<T>(listOfElementsToRemove));
instead.
As Eran already mentioned in his answer: The low performance stems from the fact that the pseudocode of a generic removeAll implementation is
public boolean removeAll(Collection<?> c) {
for (each element e of this) {
if (c.contains(e)) {
this.remove(e);
}
}
}
So the contains call that is done on the list of elements to remove will cause the O(n*k) performance (where n is the number of elements to remove, and k is the number of elements in the list that the method is called on).
Naively, one could imagine that the this.remove(e) call on a List might also have O(k), and this implementation would also have quadratic complexity. But this is not the case: You mentioned that the lists are specifically ArrayList instances. And the ArrayList#removeAll method is implemented to delegate to a method called batchRemove that directly operates on the underlying array, and does not remove the elements individually.
So all you have to do is to make sure that the lookup in the collection that contains the elements to remove is fast - preferably O(1). This can be achieved by putting these elements into a Set. In the end, it can just be written as
list.removeAll(new HashSet<T>(listOfElementsToRemove));
Side notes:
The answer by Eran has IMHO two major drawbacks: First of all, it requires sorting the lists, which is O(n*logn) - and it's simply not necessary. But more importantly (and obviously) : Sorting will likely change the order of the elements! What if this is simply not desired?
Remotely related: There are some other subtleties involved in the removeAll implementations. For example, HashSet removeAll method is surprisingly slow in some cases. Although this also boils down to the O(n*n) when the elements to be removed are stored in a list, the exact behavior may indeed be surprising in this particular case.
Well, since removeAll checks for each element of tempList whether it appears in tempList2, the running time is proportional to the size of the first list multiplied by the size of the second list, which means O(N^2) unless one of the two lists is very small and can be considered as "constant size".
If, on the other hand, you pre-sort the lists, and then iterate over both lists with a single iteration (similar to the merge step in merge sort), the sorting will take O(NlogN) and the iteration O(N), giving you a total running time of O(NlogN). Here N is the size of the larger of the two lists.
If you can replace the lists by a sorted structure (perhaps a TreeSet, since you said the elements are unique), you can implement removeAll in linear time, since you won't have to do any sorting.
I haven't tested it, but something like this can work (assuming both tempList and tempList2 are sorted) :
Iterator<Integer> iter1 = tempList.iterator();
Iterator<Integer> iter2 = tempList2.iterator();
Integer current = null;
Integer current2 = null;
boolean advance = true;
while (iter1.hasNext() && iter2.hasNext()) {
if (advance) {
current = iter1.next();
advance = false;
}
if (current2 == null || current > current2) {
current2 = iter2.next();
}
if (current <= current2) {
advance = true;
if (current == current2)
iter1.remove();
}
}
I suspect removing from an ArrayList, is a perfromance hit since the list may either be divided when an element in the middle is removed, or if the list must be compacted after an element is removed. It may be faster to do this:
Create 'Set' of the elements to be removed
Create a new result ArrayList that you need, call it R. You can give it enough size at construction.
Iterate thru the original list you need elements from it removed, if the element is found in the Set, don't add it to R, otherwise add it.
This should have O(N); if creating the Set and a lookup in it is assumed constant.

Extract first k elements from a Set efficiently

Problem
I'm writing a simple Java program in which I have a TreeSet which contains Comparable elements (it's a class that I've written myself). In a specific moment I need to take only the first k elements from it.
What I've done
Currently I've found two different solution for my problem:
Using a simple method written by me; It copies the first k elements from the initial TreeSet;
Use Google Guava greatestOf method.
For the second option you need to call the method in this way:
Ordering.natural().greatestOf(mySet, 80))
But I think that it's useless to use this kind of invocation because the elements are already sorted. Am I wrong?
Question
I want to ask here which is a correct and, at the same time, efficient method to obtain a Collection derived class which contains the first k elements of a TreeSet?
Additional information
Java version: >= 7
You could use Guava's Iterables#limit:
ImmutableList.copyOf(Iterables.limit(yourSet, 7))
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Iterables.html#limit(java.lang.Iterable, int)
I would suggest you to use a TreeSet<YourComparableClass> collection, it seems to be the solution you are looking for.
A TreeSet can return you an iterator, and you can simply iterates K times, by storing the objects the iterator returns you: the elements will be returned you in order.
Moreover a TreeSet keep your elements always sorted: at any time, when you add or remove elements, they are inserted and removed so that the structure remains ordered.
Here a possible example:
public static ArrayList<YourComparableClass> getFirstK(TreeSet<YourComparableClass> set, int k) {
Iterator<YourComparableClass> iterator = set.iterator();
ArrayList<YourComparableClass> result = new ArrayList<>(k); //to store first K items
for (int i=0;i<k;i++) result.add(iterator.next()); //iterator returns items in order
//you should also check iterator.hasNext(); if you are not sure to have always a K<set.size()
return result;
}
The descendingIterator() method of java.util.TreeSet yields elements from greatest to least, so you can just step it however many times, inserting the elements into a collection. The running time is O(log n + k) where k is the number of elements returned, which is surely fast enough.
If you're using a HashSet, on the other hand, then the elements in fact are not sorted, so you need to use the linear-time selection method that you indicated.

How to merge two big Lists in one sorted List in Java?

I had an interview today, and they gave me:
List A has:
f
google
gfk
fat
...
List B has:
hgt
google
koko
fat
ffta
...
They asked me to merge these two list in one sorted List C.
What I said:
I added List B to List A, then I created a Set from List A, then create a List from the Set. The interviewer told me the lists are big, and this method will not be good for performance, he said it will be a nlog(n).
What would be a better approach to this problem?
Well your method would require O(3N) additional space (the concatenated List, the Set and the result List), which is its main inefficiency.
I would sort ListA and ListB with whatever sorting algorithm you choose (QuickSort is in-place requiring O(1) space; I believe Java's default sort strategy is MergeSort which typically requires O(N) additional space), then use a MergeSort-like algorithm to examine the "current" index of ListA to the current index of ListB, insert the element that should come first into ListC, and increment that list's "current" index count. Still NlogN but you avoid multiple rounds of converting from collection to collection; this strategy only uses O(N) additional space (for ListC; along the way you'll need N/2 space if you MergeSort the source lists).
IMO the lower bound for an algorithm to do what the interviewer wanted would be O(NlogN). While the best solution would have less additional space and be more efficient within that growth model, you simply can't sort two unsorted lists of strings in less than NlogN time.
EDIT: Java's not my forte (I'm a SeeSharper by trade), but the code would probably look something like:
Collections.sort(listA);
Collections.sort(listB);
ListIterator<String> aIter = listA.listIterator();
ListIterator<String> bIter = listB.listIterator();
List<String> listC = new List<String>();
while(aIter.hasNext() || bIter.hasNext())
{
if(!bIter.hasNext())
listC.add(aIter.next());
else if(!aIter.hasNext())
listC.add(bIter.next());
else
{
//kinda smells from a C# background to mix the List and its Iterator,
//but this avoids "backtracking" the Iterators when their value isn't selected.
String a = listA[aIter.nextIndex()];
String b = listB[bIter.nextIndex()];
if(a==b)
{
listC.add(aIter.next());
listC.add(bIter.next());
}
else if(a.CompareTo(b) < 0)
listC.add(aIter.next());
else
listC.add(bIter.next());
}
}

What is the fastest way to find an array within another array in Java?

Is there any equivalent of String.indexOf() for arrays? If not, is there any faster way to find an array within another other than a linear search?
Regardless of the elements of your arrays, I believe this is not much different than the string search problem.
This article provides a general intro to the various known algorithms.
Rabin-Karp and KMP might be your best options.
You should be able to find Java implementations of these algorithms and adapt them to your problem.
List<Object> list = Arrays.asList(myArray);
Collections.sort(list);
int index = Collections.binarySearch(list, find);
OR
public static int indexOf(Object[][] array, Object[] find){
for (int i = 0; i < array.length(); i ++){
if (Arrays.equals(array[i], find)){
return i;
}
}
return -1;
}
OR
public static int indexOf(Object[] array, Object find){
for (int i = 0; i < array.length(); i ++){
if (array[i].equals(find)){
return i;
}
}
return -1;
}
OR
Object[] array = ...
int index = Arrays.asList(array).indexOf(find);
As far as I know, there is NO way to find an array within another without a linear search. String.indexOf uses a linear search, just inside a library.
You should write a little library called indexOf that takes two arrays, then you will have code that looks just like indexOf.
But no matter how you do it, it's a linear search under the covers.
edit:
After looking at #ahmadabolkader's answer I kind of take this back. Although it's still a linear search, it's not as simple as just "implement it" unless you are restricted to fairly small test sets/results.
The problem comes when you want to see if ...aaaaaaaaaaaaaaaaaab fits into a string of (x1000000)...aaaaaaaaab (in other words, strings that tend to match most places in the search string).
My thought was that as soon as you found a first character match you'd just check all subsequent characters one-on-one, but that performance would degrade terrifyingly when most of the characters matched most of the time. There was a rolling hash method in #a12r's answer that sounded much better if this is a real-world problem and not just an assignment.
I'm just going to vote for #a12r's answer because of those awesome Wikipedia references.
The short answer is no - there is no faster way to find an array within an array by using some existing construct in Java. Based on what you described, consider creating a HashSet of arrays instead of an array of arrays.
Normally the way you find things in collections in java is
put them in a hashmap (dictionary) and look them up by their hash.
loop through each object and test its equality
(1) won't work for you because an array object's hash won't tell you that the contents are the same. You could write some sort of wrapper that would create a hashcode based on the contents (you'd also have to make sure equals returned values consistent with that).
(2) also will require a bit of work because object equality for arrays will only test that the objects are the same. You'd need to wrap the arrays with a test of the contents.
So basically, not unless you write it yourself.
You mean you have an array which elements also are array elements? If that is the case and the elements are sorted you might be able to use binarysearch from java.util.Arrays

Categories