Let's say I have two ArrayLists of Points:
(0,2)->(0,3)->(0,4)
(0,2)->(0,3)->(0,6)
And I want to obtain a new list: (0,2)->(0,3)
How do I do that?
Current solution
I am using two foreach loops to compare the two lists element by element, which I think is very inefficient. Are there any other ways?
You can use the List#retainAll(Collection<?> c) method, which:
Retains only the elements in this list that are contained in the specified collection (optional operation). In other words, removes from this list all of its elements that are not contained in the specified collection.
List<Point> first = ...
List<Point> second = ...
first.retainAll(second);
For large lists, add the elements of one list to a HashSet, then iterate over the other list, collecting each element that the HashSet contains:
List<Point> list1 = new ArrayList<>(Arrays.asList(new Point(0, 2), new Point(0, 3), new Point(0, 4)));
List<Point> list2 = new ArrayList<>(Arrays.asList(new Point(0, 2), new Point(0, 3), new Point(0, 6)));
Set<Point> setList1 = new HashSet<>(list1);
List<Point> intersection = list2.stream().filter(setList1::contains).collect(Collectors.toList());
Time complexity: adding to the set is O(n), iterating the other list is O(k), and each HashSet lookup is O(1), so overall it is roughly O(n + k).
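The approach above can be packaged as a small generic helper; this is a sketch (the method name `intersect` is made up for illustration), and it relies on the element type overriding equals and hashCode, which java.awt.Point does:

```java
import java.awt.Point;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class IntersectDemo {

    // Keeps the elements of `second` that also occur in `first`,
    // using a HashSet so each lookup is O(1).
    public static <T> List<T> intersect(List<T> first, List<T> second) {
        Set<T> lookup = new HashSet<>(first);
        return second.stream()
                .filter(lookup::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Point> list1 = Arrays.asList(new Point(0, 2), new Point(0, 3), new Point(0, 4));
        List<Point> list2 = Arrays.asList(new Point(0, 2), new Point(0, 3), new Point(0, 6));
        // Prints the two common points, (0,2) and (0,3)
        System.out.println(intersect(list1, list2));
    }
}
```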
Related
I want to iterate over two collections, each containing roughly 600 records, and compare each element of collection one with every element of collection two. If I choose LinkedHashSet as the collection, I have to call iterator() on each collection and use two nested while loops (inner and outer).
With ArrayList, I would instead use two nested for loops to read the data from each collection.
I originally chose LinkedHashSet because I read that it has better performance, and I also preferred a set in order to remove duplicates. However, after seeing it run very slowly, taking around two hours to finish, I thought it might be better to copy each set into an ArrayList and iterate over the ArrayList instead.
I was wondering which choice would speed up the runtime.
public ArrayList<ArrayList<Object>> processDataSourcesV2(LinkedHashMap<RecordId, LinkedHashSet<String>> ppmsFinalResult,
        LinkedHashMap<RecordId, LinkedHashSet<String>> productDBFinalResult) {
    // each parameter is a map containing a key (id) and a value (set of unique parameters)
    ArrayList<ArrayList<Object>> result = new ArrayList<ArrayList<Object>>();
    Iterator<Entry<RecordId, LinkedHashSet<String>>> ppmsIterator = ppmsFinalResult.entrySet().iterator();
    Iterator<Entry<RecordId, LinkedHashSet<String>>> productIdIterator = null;
    // pair of ids, one from each list
    ArrayList<Pair> listOfIdPair = new ArrayList<Pair>();
    while (ppmsIterator.hasNext()) {
        // a RecordId object contains the id and which list the id belongs to
        Entry<RecordId, LinkedHashSet<String>> currentPpmsPair = ppmsIterator.next();
        RecordId currentPpmsIDObj = currentPpmsPair.getKey();
        // set of unique strings
        LinkedHashSet<String> currentPpmsCleanedTerms = currentPpmsPair.getValue();
        productIdIterator = productDBFinalResult.entrySet().iterator();
        while (productIdIterator.hasNext()) {
            Entry<RecordId, LinkedHashSet<String>> currentProductDBPair = productIdIterator.next();
            RecordId currentProductIDObj = currentProductDBPair.getKey();
            LinkedHashSet<String> currentProductCleanedTerms = currentProductDBPair.getValue();
            ArrayList<Object> listOfRowByRowProcess = new ArrayList<Object>();
            Pair currentIDPair = new Pair(currentPpmsIDObj.getIdValue(), currentProductIDObj.getIdValue());
            // check for duplicates
            if (currentPpmsIDObj.getIdValue().equals(currentProductIDObj.getIdValue())
                    || listOfIdPair.contains(currentIDPair.reverse())) {
                continue;
            } else {
                LinkedHashSet<String> commonTerms = getCommonTerms(currentPpmsCleanedTerms, currentProductCleanedTerms);
                listOfIdPair.add(currentIDPair.reverse());
                if (commonTerms.size() > 0) {
                    listOfRowByRowProcess.add(currentPpmsIDObj);
                    listOfRowByRowProcess.add(currentProductIDObj);
                    listOfRowByRowProcess.add(commonTerms);
                    result.add(listOfRowByRowProcess);
                }
            }
        }
    }
    return result;
}
public LinkedHashSet<String> getCommonTerms(LinkedHashSet<String> setOne, LinkedHashSet<String> setTwo) {
    // make a hard copy so setOne is not modified
    LinkedHashSet<String> setOfCommon = new LinkedHashSet<String>(setOne);
    setOfCommon.retainAll(setTwo);
    return setOfCommon;
}
Arrays are faster than any other structure when it comes to iteration (all elements are stored sequentially in memory); on the other hand, they are slower for insertion and deletion, because the sequential storage has to be maintained. Iterating over a linked list is slower because the scattered nodes can cause cache misses and page faults. So it's up to you which one to choose.
If you want to find which elements are in both collections, make one a Set and get its intersection with the other collection:
Collection<T> collection1, collection2; // given these
Set<T> intersection = new HashSet<T>(collection1);
intersection.retainAll(collection2);
This will execute in roughly O(n) time, where n is the combined size of the two collections, because looking up an element in a HashSet takes constant time.
My guess is you are checking every element of collection1 against every element of collection2, which has O(n²) time complexity.
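A runnable sketch of this pattern (the element type and sample values are made up for illustration):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class RetainAllDemo {
    public static void main(String[] args) {
        Collection<String> collection1 = Arrays.asList("a", "b", "c", "d");
        Collection<String> collection2 = Arrays.asList("b", "d", "e");

        // Copy collection1 into a HashSet, then keep only the
        // elements that also appear in collection2.
        Set<String> intersection = new HashSet<>(collection1);
        intersection.retainAll(collection2);

        // Prints the intersection, containing b and d
        System.out.println(intersection);
    }
}
```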
I have two big arrays of strings. I want to remove the elements from the first array that do not exist in the second array.
First I create two arrays:
Array to modify:
String[] sarr = fdata.split(System.getProperty("line.separator"));
List<String> items = new ArrayList<>(Arrays.asList(sarr));
Filter array:
List<String> filter = Arrays.asList(voc.split(System.getProperty("line.separator")));
Then I create an Iterator over the items list and check whether each item exists in the filter list; if it does not, I remove it from items:
Iterator<String> it = items.iterator();
while (it.hasNext()) {
String s = it.next();
if (!filter.contains(s)) {
it.remove();
}
}
The items list contains 286,568 strings and filter contains 100,000 strings. The operation takes far too long, so I am clearly not doing it efficiently.
Is there a faster way?
Just use different collection types. For the filter, use a HashSet for O(1) (instead of O(n) for ArrayList) lookup complexity, and for the items, use a LinkedList instead of an ArrayList, which will be more efficient for the remove operations.
I didn't test this code, but...
String[] sarr = fdata.split(System.getProperty("line.separator"));
List<String> items = new LinkedList<>(Arrays.asList(sarr));
Set<String> filter = new HashSet<>(Arrays.asList(voc.split(System.getProperty("line.separator"))));
items.retainAll(filter);
When you call collection.contains(element) often for a large collection, you should not use an ArrayList, but rather a HashSet.
Set<String> filter = new HashSet<>();
Collections.addAll(filter, voc.split(System.getProperty("line.separator")));
A HashSet is an optimized data structure for looking up things.
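Putting it together as a self-contained sketch: the variables fdata and voc are from the question, but are stubbed here with sample data, and removeIf (Java 8+) replaces the explicit iterator loop:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterDemo {
    public static void main(String[] args) {
        String sep = System.getProperty("line.separator");
        // Stand-ins for the question's fdata and voc strings.
        String fdata = String.join(sep, "apple", "banana", "cherry", "durian");
        String voc = String.join(sep, "banana", "durian", "elderberry");

        List<String> items = new ArrayList<>(Arrays.asList(fdata.split(sep)));
        Set<String> filter = new HashSet<>();
        Collections.addAll(filter, voc.split(sep));

        // Remove every item not present in the filter set; with a
        // HashSet lookup each test is O(1), so the pass is O(n) overall.
        items.removeIf(s -> !filter.contains(s));
        System.out.println(items); // [banana, durian]
    }
}
```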
I have two ArrayLists, both containing Integer values. My objective is to find the identical/common/duplicate values between these two lists. In other words (in SQL parlance), I need the INTERSECT result of the two lists: the values that appear in both.
Example:
ArrayList<Integer> list1 = new ArrayList<Integer>();
list1.add(100);
list1.add(200);
list1.add(300);
list1.add(400);
list1.add(500);
ArrayList<Integer> list2 = new ArrayList<Integer>();
list2.add(300);
list2.add(600);
One kind of implementation/solution I could think of is looping over one of the lists, something like:
ArrayList<Integer> intersectList = new ArrayList<Integer>();
for (Integer intValue : list1)
{
if(list2.contains(intValue))
intersectList.add(intValue);
}
In this case, intersectList would contain only one Integer, 300, which appears in both lists.
My question is: is there a better/faster/more efficient way of implementing this logic? Are there any options in the Apache Commons library? Any other ideas/suggestions/comments are appreciated.
NOTE: For illustration purposes, I've only shown 5 and 2 items being added to the lists. In my real implementation there will be more than 1000 elements in each list, so performance is also a key factor to be considered.
If you're okay with overwriting list1 with the result:
list1.retainAll(list2);
otherwise clone/copy list1 first.
Not sure about the performance, though.
list1.retainAll(list2); // for intersection
Use ListUtils from org.apache.commons.collections if you do not want to modify the existing list:
ListUtils.intersection(list1, list2)
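If you'd rather not add a dependency, a non-destructive intersection can be sketched with the standard library alone (the helper name is made up; note that, unlike ListUtils.intersection, duplicates in list1 are all kept when they appear in list2):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class IntersectionUtil {

    // Non-destructive intersection: neither input list is modified.
    // Copying list2 into a HashSet makes retainAll's lookups O(1).
    public static <T> List<T> intersection(List<T> list1, List<T> list2) {
        List<T> result = new ArrayList<>(list1);
        result.retainAll(new HashSet<>(list2));
        return result;
    }

    public static void main(String[] args) {
        List<Integer> list1 = Arrays.asList(100, 200, 300, 400, 500);
        List<Integer> list2 = Arrays.asList(300, 600);
        System.out.println(intersection(list1, list2)); // [300]
    }
}
```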
I am racking my brain to find a solution to the following problem.
I have four different ArrayLists that get their values from a database.
Each list can have any size from 0 upwards.
The lists may also differ from each other in size and values.
What I am trying to do, effectively, is:
compare all the non-empty lists and check whether they have some common integers, and what those values are.
Any ideas?
Thank you!
If you need a collection of the integers common to all lists, excluding empty ones:
List<List<Integer>> lists = ...
// assumes the first list is non-empty; otherwise seed from the first non-empty one
Collection<Integer> common = new HashSet<Integer>(lists.get(0));
for (int i = 1; i < lists.size(); i++) {
    if (!lists.get(i).isEmpty())
        common.retainAll(lists.get(i));
}
At the end, common will contain the integers that are common to all of them.
You can use set intersection operations with your ArrayList objects.
Something like this:
List<Integer> l1 = new ArrayList<Integer>();
l1.add(1);
l1.add(2);
l1.add(3);
List<Integer> l2 = new ArrayList<Integer>();
l2.add(4);
l2.add(2);
l2.add(3);
List<Integer> l3 = new ArrayList<Integer>(l2);
l3.retainAll(l1);
Now, l3 should have only common elements between l1 and l2.
You might want to use Apache Commons CollectionUtils.intersection() to get the intersection of two collections.
Iteratively generate the intersection; if it is not empty when you are done, you have common elements, and they are in the resulting collection.
Regarding empty lists: just check whether size() is 0 and, if it is, skip that list.
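The iterate-and-intersect idea, skipping empty lists, can be sketched with the plain JDK (class and method names are made up for illustration):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CommonValues {

    // Intersects all non-empty lists; returns an empty set if every
    // list is empty.
    public static Set<Integer> commonToAll(List<List<Integer>> lists) {
        Set<Integer> common = null;
        for (List<Integer> list : lists) {
            if (list.isEmpty()) {
                continue; // skip empty lists entirely
            }
            if (common == null) {
                common = new HashSet<>(list); // first non-empty list seeds the result
            } else {
                common.retainAll(list);
            }
        }
        return common == null ? new HashSet<>() : common;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(
                Collections.<Integer>emptyList(), // skipped
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 2, 3),
                Arrays.asList(2, 3, 5));
        // Prints the common values, 2 and 3
        System.out.println(commonToAll(lists));
    }
}
```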
You can do this. If you have multiple elements to search for, put the lookup in a loop.
List<Integer> aList = new ArrayList<Integer>();
aList.add(1);
if (aList != null && !aList.isEmpty()) {
    if (aList.contains(1)) {
        System.out.println("got it");
    }
}
What is a good solution (and why?) to get a List from a Set, sorted against a given Comparator?
Set<Object> set = new HashSet<Object>();
// add stuff
List<Object> list = new ArrayList<Object>(set);
Collections.sort(list, new MyComparator());
Just construct it. ArrayList has a constructor that takes another Collection.
Set<Foo> set = new TreeSet<Foo>(new FooComparator<Foo>());
// Fill it.
List<Foo> list = new ArrayList<Foo>(set);
// Here's your list with items in the same order as the original set.
Either:
Set<X> sortedSet = new TreeSet<X>(comparator); ...
List<X> list = new ArrayList<X>(sortedSet);
or:
Set<X> unsortedSet = new HashSet<X>(); ...
List<X> list = new ArrayList<X>(unsortedSet);
Collections.sort(list, comparator);
Assuming that you start with an unsorted set, or a set sorted in a different order, the following is probably the most efficient approach if you require a modifiable List.
Set<T> unsortedSet = ...
List<T> list = new ArrayList<T>(unsortedSet);
Collections.sort(list, comparator);
If an unmodifiable List is acceptable, then the following is a bit faster:
Set<T> unsortedSet = ...
@SuppressWarnings("unchecked")
T[] array = (T[]) unsortedSet.toArray(); // generic array creation (new T[n]) is not allowed
Arrays.sort(array, comparator);
List<T> list = Arrays.asList(array);
In the first version, Collections.sort(...) copies the list contents to an array, sorts the array, and copies the sorted elements back to the list. The second version is faster because it doesn't need to copy the sorted elements.
But to be honest the performance difference is probably not significant. Indeed, as the input set sizes get larger, the performance will be dominated by the O(NlogN) time to do the sorting. The copying steps are O(N) and will reduce in importance as N grows.
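Both variants, side by side, with a concrete Comparator (the String sample data is made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetToSortedList {
    public static void main(String[] args) {
        Set<String> unsortedSet = new HashSet<>(Arrays.asList("pear", "apple", "plum"));
        Comparator<String> comparator = Comparator.naturalOrder();

        // Variant 1: copy into a modifiable ArrayList, then sort it.
        List<String> modifiable = new ArrayList<>(unsortedSet);
        Collections.sort(modifiable, comparator);

        // Variant 2: sort an array directly; Arrays.asList wraps it in a
        // fixed-size list, skipping the copy-back that Collections.sort does.
        String[] array = unsortedSet.toArray(new String[0]);
        Arrays.sort(array, comparator);
        List<String> fixedSize = Arrays.asList(array);

        System.out.println(modifiable); // [apple, pear, plum]
        System.out.println(fixedSize);  // [apple, pear, plum]
    }
}
```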
This is how you get a List when you have a Set:
List list = new ArrayList(set);
Not sure what you expect to do with the Comparator. If the Set is sorted, the list will contain the elements in sorted order.