Finding duplications in a List of Lists - java

I've got a list of lists of an object having properties (let's say a name and a value).
I'd like to find and remove the duplicated lists, i.e. lists that are part of another list - the order of elements does matter.
Example in pseudocode (I'll present it as a list of strings to make it easier):
List<List<String>> = [
    ["1", "2", "3", "4", "5", "6", "7", "8"],
    ["3", "4", "5", "6", "7", "8"],
    ["5", "6", "7", "8"],
    ["7", "8"]
]
In this case, I'd like to remove the shorter lists because they are contained in the longest list.
My classes can be described as follows:
public static class MyObjectBig {
    private String startElem;
    private String endElem;
    private List<MyObjectOne> list;
    // constructors, getters, etc.
}

public static class MyObjectOne {
    private String name;
    private String value;
    // constructors, getters, etc.
}
The large lists are massive - around 21,000 elements - and the small lists are at most 20 elements, usually ~10.
I've got a couple of ideas, like creating a Map with the first item of each list as the key and the whole list as the value, then iterating over all items, checking whether each item exists as a key, and if it does, comparing the following items. But that's very slow.
I'll appreciate any hints or ideas.

Assuming that the equals/hashCode contract is properly implemented in MyObjectOne, you can define an auxiliary wrapper class which holds a reference to the original MyObjectBig instance and maintains a HashMap containing the frequency of each MyObjectOne element from the original list. That way, if there are duplicate elements within a list, they are taken into account during comparison.
This is how such a wrapper class might look:
public static class BigObjectWrapper {
    private MyObjectBig bigObject;
    private Map<MyObjectOne, Long> frequencies;
    private int listSize;
    private int mapSize;
    private boolean isDuplicate;

    public BigObjectWrapper(MyObjectBig bigObject) {
        this.bigObject = bigObject;
        this.frequencies = bigObject.getList().stream()
            .collect(Collectors.groupingBy(
                Function.identity(),
                Collectors.counting()
            ));
        this.listSize = bigObject.getList().size();
        this.mapSize = frequencies.size();
    }

    public boolean contains(BigObjectWrapper other) {
        if (listSize < other.listSize || mapSize < other.mapSize) return false;
        return containsAll(other.frequencies);
    }

    private boolean containsAll(Map<MyObjectOne, Long> otherFrequencies) {
        // the frequency of each element in the other map must be less than
        // or equal to the frequency of the same element in this map
        return otherFrequencies.entrySet().stream()
            .allMatch(entry -> frequencies.getOrDefault(entry.getKey(), 0L) >= entry.getValue());
    }

    public boolean isDuplicate() {
        return isDuplicate;
    }

    public void setDuplicate() {
        isDuplicate = true;
    }

    // getters, equals/hashCode implemented based on the size and set properties
}
Now, the algorithm can be implemented in the following steps:
Create a list of wrapped objects.
Compare each wrapper against the others. If a wrapper has already been proven to be a duplicate, it is skipped. Also, when a wrapper holding a smaller set is compared against a wrapper holding a greater set, the call terminates immediately, returning false (therefore there's no point in sorting the data).
Generate a new list of MyObjectBig from wrapper objects that were not proven to be duplicates.
This is how the implementation might look:
List<MyObjectBig> source = List.of(); // replace with your actual data

List<BigObjectWrapper> wrappers = source.stream()
    .map(BigObjectWrapper::new)
    .toList();

for (BigObjectWrapper wrapper : wrappers) {
    if (wrapper.isDuplicate()) continue;
    for (BigObjectWrapper next : wrappers) {
        // skip the wrapper itself (every set contains itself) and known duplicates
        if (next == wrapper || next.isDuplicate()) continue;
        if (wrapper.contains(next)) next.setDuplicate();
    }
}

List<MyObjectBig> result = wrappers.stream()
    .filter(w -> !w.isDuplicate())
    .map(BigObjectWrapper::getBigObject)
    .toList();
Note: if you don't need to consider the case when one larger list can be part of another larger list, then you can split the data into two parts. Then, as the first step, check only the smaller lists against the larger ones, and as the second step, check the remaining non-duplicated smaller lists against each other, as sketched below.
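A minimal sketch of that two-phase variant, assuming a getListSize() getter on the wrapper and an illustrative size threshold (the question mentions small lists of at most 20 elements):

int smallListMax = 20; // hypothetical threshold separating "small" from "large"

Map<Boolean, List<BigObjectWrapper>> bySize = wrappers.stream()
    .collect(Collectors.partitioningBy(w -> w.getListSize() <= smallListMax));
List<BigObjectWrapper> small = bySize.get(true);
List<BigObjectWrapper> large = bySize.get(false);

// Phase 1: mark small lists that are contained in any large list
for (BigObjectWrapper s : small) {
    for (BigObjectWrapper l : large) {
        if (l.contains(s)) {
            s.setDuplicate();
            break;
        }
    }
}

// Phase 2: check the surviving small lists against each other
for (BigObjectWrapper a : small) {
    if (a.isDuplicate()) continue;
    for (BigObjectWrapper b : small) {
        if (b == a || b.isDuplicate()) continue;
        if (a.contains(b)) b.setDuplicate();
    }
}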

More of a partial answer... This is the dumbest solution I could think of. I believe it's O(N² * M) where N is the length of the parent list, and M is the length of the sublist.
I was interested in how slow this would be. It's often a mistake to simply assume the stupid solution is going to be "too slow" without any proof.
I assumed the values in each sublist were unique - making them more like ordered sets - and played around with a pool size of possible values. Fewer possible values in the sublist = more sublists removed = faster.
On my modest CPU, it takes 100 ms for 20 possible values, 3 seconds for 25 possible values, and between 8 and 10 seconds for 40 possible values.
Not posting this because I believe it's a good solution, but it is at least a working benchmark (as far as I know!) that a more complex solution should be surpassing.
public static void main(String[] args) {
    List<List<String>> listOfList = randomData();
    long start = System.currentTimeMillis();
    Set<Integer> idxToRemove = new HashSet<>();
    for (int i = 0; i < listOfList.size() - 1; ++i) {
        if (idxToRemove.contains(i)) continue;
        for (int j = i + 1; j < listOfList.size(); ++j) {
            if (idxToRemove.contains(j)) continue;
            if (listOfList.get(i).containsAll(listOfList.get(j))) {
                idxToRemove.add(j);
            }
        }
    }
    idxToRemove.stream()
        .sorted(Comparator.reverseOrder())
        .mapToInt(i -> i)
        .forEach(listOfList::remove);
    long duration = System.currentTimeMillis() - start;
    System.out.println(idxToRemove.size());
    System.out.println("Took " + duration + "ms");
}
private static List<List<String>> randomData() {
    Random random = new Random();
    int mainLength = 21_000;
    int possibleValues = 40;
    List<List<String>> listOfList = new ArrayList<>(mainLength);
    for (int i = 0; i < mainLength; ++i) {
        int subListSize = random.nextInt(5, 20);
        List<Integer> subList = new ArrayList<>(subListSize);
        while (subList.size() < subListSize) {
            int value = random.nextInt(possibleValues);
            if (!subList.contains(value)) {
                subList.add(value);
            }
        }
        listOfList.add(
            subList.stream().sorted().map(String::valueOf).collect(Collectors.toList())
        );
    }
    return listOfList;
}

I would work with an algorithm like this:
Order the lists by their length. You can do this by assigning the inner lists to a new outer list that you sort with Collections.sort(List<T> newOuter, Comparator<? super T> c), with T being your inner List<String> and a custom Comparator that compares the lengths of the lists. (String is used here only because you used it in your example; the type doesn't actually matter.)
Start with the shortest list: search for it first in the longest list, then in the second longest, etc., until you can either remove the searched list or have searched all other lists. If it was not found anywhere, you don't have to touch that shortest list; it will be kept. Repeat with the second shortest list, etc.
To search a list in another list:
Search the index of the first element of the shorter list in the longer list.
When you have found it, and the rest of the longer list is long enough to contain the shorter list, grab a .subList from the longer list, starting at the found index and spanning the length of the shorter list.
You can then compare these lists with .equals() to find if the shorter list is contained within the longer one. If your inner list elements don't support .equals() sufficiently for this comparison, you can write your own Comparator and use that to compare the two lists.
If no match was found in step 3, you can continue searching for the first element of the shorter list in the longer list. As List.indexOf doesn't provide a parameter specifying at which index to start the search, it's probably better to use a classic for loop with an index for this. (That would also allow you to use a custom comparison if .equals doesn't work for you.)
To remove a list that was found within another one, you have to take care to remove it from the original outer list, not from your sorted version. A sketch of the search routine described above follows.
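A minimal sketch of steps 1-4 for plain List<String> elements (names are illustrative; swap the equals checks for a custom comparison if needed):

static boolean isContainedIn(List<String> shorter, List<String> longer) {
    String first = shorter.get(0);
    // classic index loop, so the search can resume after a failed match
    for (int i = 0; i <= longer.size() - shorter.size(); i++) {
        if (longer.get(i).equals(first)
                && longer.subList(i, i + shorter.size()).equals(shorter)) {
            return true;
        }
    }
    return false;
}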
After thinking a bit about it, I'm actually not so sure whether starting the search in the longest list will save you time on average, due to the length of the searched list. It probably depends a bit on the distribution of lengths.
Also, I would like to warn you about premature optimization: first implement a working version, then use a profiler to find the places in your code where optimization will have the biggest impact.

Related

Java: See if ArrayList contains ArrayList with duplicate values

I'm currently trying to create a method that determines whether an ArrayList (a2) contains another ArrayList (a1), given that both lists contain duplicate values. (containsAll wouldn't work: if an ArrayList contains a value at all, it returns true regardless of the quantity of that value.)
This is what I have (I believe it would work; however, I cannot use .remove within the for loop):
public boolean isSubset(ArrayList<Integer> a1, ArrayList<Integer> a2) {
    Integer a1Size = a1.size();
    for (Integer integer2 : a2) {
        for (Integer integer1 : a1) {
            if (integer1 == integer2) {
                a1.remove(integer1);
                a2.remove(integer2);
                if (a1Size == 0) {
                    return true;
                }
            }
        }
    }
    return false;
}
Thanks for the help.
Updated
I think the clearest statement of your question is in one of your comments:
Yes, the example "Example: [dog,cat,cat,bird] is a match for containing [cat,dog] is false but containing [cat,cat,dog] is true?" is exactly what I am trying to achieve.
So really, you are not looking for a "subset", because these are not sets. They can contain duplicate elements. What you are really saying is you want to see whether a1 contains all the elements of a2, in the same amounts.
One way to get to that is to count all the elements in both lists. We can get such a count using this method:
private Map<Integer, Integer> getCounter(List<Integer> list) {
    Map<Integer, Integer> counter = new HashMap<>();
    for (Integer item : list) {
        counter.put(item, counter.containsKey(item) ? counter.get(item) + 1 : 1);
    }
    return counter;
}
We'll rename your method to be called containsAllWithCounts(), and it will use getCounter() as a helper. Your method will also accept List objects as its parameters, rather than ArrayList objects: it's a good practice to specify parameters as interfaces rather than implementations, so you are not tied to using ArrayList types.
With that in mind, we simply scan the counts of the items in a2 and see that they are the same in a1:
public boolean containsAllWithCounts(List<Integer> a1, List<Integer> a2) {
    Map<Integer, Integer> counterA1 = getCounter(a1);
    Map<Integer, Integer> counterA2 = getCounter(a2);
    boolean containsAll = true;
    for (Map.Entry<Integer, Integer> entry : counterA2.entrySet()) {
        Integer key = entry.getKey();
        Integer count = entry.getValue();
        containsAll &= counterA1.containsKey(key) && counterA1.get(key).equals(count);
        if (!containsAll) break;
    }
    return containsAll;
}
If you like, I can rewrite this code to handle arbitrary types, not just Integer objects, using Java generics. Also, all the code can be shortened using Java 8 streams (which I originally used - see comments below). Just let me know in comments.
If you want to remove elements from a list while iterating, you have two choices:
iterate over a copy
use a concurrent list implementation (a sketch follows further below)
see also:
http://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#synchronizedList-java.util.List-
By the way, why don't you override the contains method? Here you use a simple object like "Integer"; what about when you will be using List<SomeComplexClass>?
Example of removing while iterating over a copy:
List<Integer> list1 = new ArrayList<Integer>();
List<Integer> list2 = new ArrayList<Integer>();

List<Integer> listCopy = new ArrayList<>(list1);
Iterator<Integer> iterator1 = listCopy.iterator();
while (iterator1.hasNext()) {
    Integer next1 = iterator1.next();
    Iterator<Integer> iterator2 = list2.iterator();
    while (iterator2.hasNext()) {
        Integer next2 = iterator2.next();
        if (next1.equals(next2)) list1.remove(next1);
    }
}
see also this answer about iterator:
Concurrent Modification exception
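For the second choice (a concurrent list implementation), a minimal sketch using CopyOnWriteArrayList might look like this - an assumption on my part, since any concurrent list would do. Its iterators work on a snapshot of the list, so removing during iteration doesn't throw a ConcurrentModificationException:

List<Integer> list = new CopyOnWriteArrayList<>(List.of(1, 2, 3, 2));
for (Integer value : list) {    // iterates over a snapshot
    if (value.equals(2)) {
        list.remove(value);     // removes the first occurrence; safe here
    }
}
// list is now [1, 3]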
Also, don't use the == operator to compare objects :) instead use the equals method.
About the use of removeAll() and other similar methods: keep in mind that many classes implementing the List interface don't override all methods from the interface, so you can end up with an UnsupportedOperationException; thus I prefer "low level" binary/linear/mixed search in this case.
And for comparison of complex class objects you will need to override the equals and hashCode methods.
If you want to remove the duplicate values, simply put the arraylist(s) into a HashSet. It will remove the duplicates based on equals() of your object.
- Olga
In Java, HashMap works by using hashCode to locate a bucket. Each bucket is a list of items residing in that bucket. The items are scanned, using equals for comparison. When adding items, the HashMap is resized once a certain load percentage is reached.
So, sometimes it will have to compare against a few items, but generally it's much closer to O(1) than O(n).
In short: there is no need to use more resources (memory) and "harness" unnecessary classes, as the HashMap get method gets very expensive as the count of items grows.
hashCode -> put into bucket [if many items in bucket] -> get = linear scan
So what counts when removing items? The complexity of equals and hashCode, and the use of a proper algorithm to iterate.
I know this is maybe amateurish, but...
There is no need to remove the items from both lists, so just take them from one list:
public boolean isSubset(ArrayList<Integer> a1, ArrayList<Integer> a2) {
    for (Integer a1Int : a1) {
        for (int i = 0; i < a2.size(); i++) {
            if (a2.get(i).equals(a1Int)) {
                a2.remove(i);
                break;
            }
        }
        if (a2.size() == 0) {
            return true;
        }
    }
    return false;
}

Set vs List when need both unique elements and access by index

I need to keep a unique list of elements seen and I also need to pick random one from them from time to time. There are two simple ways for me to do this.
Keep elements seen in a Set - that gives me uniqueness of elements. When there is a need to pick a random one, do the following:
elementsSeen.toArray()[random.nextInt(elementsSeen.size())]
Keep elements seen in a List - this way there is no need to convert to an array, as there is the get() function for when I need to ask for a random one. But here I would need to do this when adding:
if (elementsSeen.indexOf(element)==-1) {elementsSeen.add(element);}
So my question is which way would be more efficient? Is converting to array more consuming or is indexOf worse? What if attempting to add an element is done 10 or 100 or 1000 times more often?
I am interested in how to combine functionality of a list (access by index) with that of a set (unique adding) in the most performance effective way.
If using more memory is not a problem then you can get the best of both by using both list and set inside a wrapper:
public class MyContainer<T> {
    private final Set<T> set = new HashSet<>();
    private final List<T> list = new ArrayList<>();

    public void add(T e) {
        if (set.add(e)) {
            list.add(e);
        }
    }

    public T getRandomElement() {
        return list.get(ThreadLocalRandom.current().nextInt(list.size()));
    }

    // other methods as needed ...
}
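A brief hypothetical usage example:

MyContainer<String> container = new MyContainer<>();
container.add("a");
container.add("b");
container.add("a"); // ignored: already present in the set
String randomElement = container.getRandomElement(); // "a" or "b"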
HashSet and TreeSet both extend AbstractCollection, which includes the toArray() implementation as shown below:
public Object[] toArray() {
    // Estimate size of array; be prepared to see more or fewer elements
    Object[] r = new Object[size()];
    Iterator<E> it = iterator();
    for (int i = 0; i < r.length; i++) {
        if (! it.hasNext()) // fewer elements than expected
            return Arrays.copyOf(r, i);
        r[i] = it.next();
    }
    return it.hasNext() ? finishToArray(r, it) : r;
}
As you can see, it's responsible for allocating the space for an array, as well as creating an Iterator object for copying. So, for a Set, adding is O(1), but retrieving a random element will be O(N) because of the element copy operation.
A List, on the other hand, allows you quick access to a specific index in the backing array, but doesn't guarantee uniqueness. You would have to re-implement the add, remove and associated methods to guarantee uniqueness on insert. Adding a unique element will be O(N), but retrieval will be O(1).
So, it really depends on which area is your potential high usage point. Are the add/remove methods going to be heavily used, with random access used sparingly? Or is this going to be a container for which retrieval is most important, since few elements will be added or removed over the lifetime of the program?
If the former, I'd suggest using the Set with toArray(). If the latter, it may be beneficial for you to implement a unique List to take advantage of the fast retrieval. The significant downside is that add involves many edge cases which the standard Java library takes great care to handle efficiently. Will your implementation be up to the same standard?
Write some test code and put in some realistic values for your use case. Neither of the methods are so complex that it's not worth the effort, if performance is a real issue for you.
I tried that quickly, based on the exact two methods you described, and it appears that the Set implementation will be quicker if you are adding considerably more than you are retrieving, due to the slowness of the indexOf method. But I really recommend that you do the tests yourself - you're the only person who knows what the details are likely to be.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class SetVsListTest<E> {
    private static Random random = new Random();
    private Set<E> elementSet;
    private List<E> elementList;

    public SetVsListTest() {
        elementSet = new HashSet<>();
        elementList = new ArrayList<>();
    }

    private void listAdd(E element) {
        if (elementList.indexOf(element) == -1) {
            elementList.add(element);
        }
    }

    private void setAdd(E element) {
        elementSet.add(element);
    }

    private E listGetRandom() {
        return elementList.get(random.nextInt(elementList.size()));
    }

    @SuppressWarnings("unchecked")
    private E setGetRandom() {
        return (E) elementSet.toArray()[random.nextInt(elementSet.size())];
    }

    public static void main(String[] args) {
        SetVsListTest<Integer> test;
        List<Integer> testData = new ArrayList<>();
        int testDataSize = 100_000;
        int[] addToRetrieveRatios = new int[] { 10, 100, 1000, 10000 };

        for (int i = 0; i < testDataSize; i++) {
            /*
             * Add 1/5 of the total possible number of elements so that we will
             * have (on average) 5 duplicates of each number. Adjust this to
             * whatever is most realistic
             */
            testData.add(random.nextInt(testDataSize / 5));
        }

        for (int addToRetrieveRatio : addToRetrieveRatios) {
            /*
             * Test the list method
             */
            test = new SetVsListTest<>();
            long t1 = System.nanoTime();
            for (int i = 0; i < testDataSize; i++) {
                // Use == 1 here because we don't want to get from an empty collection
                if (i % addToRetrieveRatio == 1) {
                    test.listGetRandom();
                } else {
                    test.listAdd(testData.get(i));
                }
            }
            long t2 = System.nanoTime();
            System.out.println(((t2 - t1) / 1000000L) + " ms for list method with add/retrieve ratio " + addToRetrieveRatio);

            /*
             * Test the set method
             */
            test = new SetVsListTest<>();
            t1 = System.nanoTime();
            for (int i = 0; i < testDataSize; i++) {
                // Use == 1 here because we don't want to get from an empty collection
                if (i % addToRetrieveRatio == 1) {
                    test.setGetRandom();
                } else {
                    test.setAdd(testData.get(i));
                }
            }
            t2 = System.nanoTime();
            System.out.println(((t2 - t1) / 1000000L) + " ms for set method with add/retrieve ratio " + addToRetrieveRatio);
        }
    }
}
Output on my machine was:
819 ms for list method with add/retrieve ratio 10
1204 ms for set method with add/retrieve ratio 10
1547 ms for list method with add/retrieve ratio 100
133 ms for set method with add/retrieve ratio 100
1571 ms for list method with add/retrieve ratio 1000
23 ms for set method with add/retrieve ratio 1000
1542 ms for list method with add/retrieve ratio 10000
5 ms for set method with add/retrieve ratio 10000
You could extend HashSet and track the changes to it, maintaining a current array of all entries.
Here I keep a copy of the array and adjust it every time the set changes. For a more robust (but more costly) solution you could use toArray in your pick method.
class PickableSet<T> extends HashSet<T> {
    private T[] asArray = (T[]) this.toArray();

    private void dirty() {
        asArray = (T[]) this.toArray();
    }

    public T pick(int which) {
        return asArray[which];
    }

    @Override
    public boolean add(T t) {
        boolean added = super.add(t);
        dirty();
        return added;
    }

    @Override
    public boolean remove(Object o) {
        boolean removed = super.remove(o);
        dirty();
        return removed;
    }
}
Note that this will not recognise changes to the set if removed by an Iterator - you will need to handle that some other way.
So my question is which way would be more efficient?
Quite a difficult question to answer; it depends on what one does more: insert, or select at random?
We need to look at the Big O for each of the operations. In this case (best cases):
Set: Insert O(1)
Set: toArray O(n) (I'd assume)
Array: Access O(1)
vs
List: Contains O(n)
List: Insert O(1)
List: Access O(1)
So:
Set: Insert: O(1), Access O(n)
List: Insert: O(n), Access O(1)
So in the best case they are much of a muchness with Set winning if you insert more than you select, and List if the reverse is true.
Now the evil answer: select one (the one that best represents the problem, so Set IMO), wrap it well and run with it. If it is too slow then deal with it later, and when you do deal with it, look at the problem space. Does your data change often? If not, cache the array.
It depends what you value more.
List implementations in Java normally make use of an array or a linked list. That means inserting and accessing by index is fast, but searching for a specific element requires looping through the list and comparing each element until the element is found.
Set implementations in Java mainly make use of an array, the hashCode method and the equals method. So a set is more taxing when you want to insert, but trumps a list when it comes to looking for an element. As a set doesn't guarantee the order of the elements in the structure, you will not be able to get an element by index. You can use an ordered set, but this brings with it latency on insert due to the sorting.
If you are going to be working with indexes directly, then you may have to use a List, because the order in which elements are placed into Set.toArray() changes as you add elements to the Set.
Hope this helps :)

Insert Objects in a Constant Length List - Java

I am looking for a good optimal strategy to write a code for the following problem.
I have a List of Objects.
The Objects have a String "valuation" field among other fields. The valuation field may or may not be unique.
The List is of CONSTANT length which is calculated within the program. The length would usually be between 100 and 500.
The Objects are all sorted within the list based on String field - valuation
As new objects are found or created, the String field valuation is compared with those of the existing members of the list.
If the comparison fails, e.g. with the bottom member of the list, then the Object is NOT added to the list.
If the comparison succeeds, the new Object is added to the list at the right position within the sort criteria, and the bottom member is ousted from the list to keep the length of the list constant.
One strategy which I am thinking:
Keep adding members to the list - till it reaches maxLength
Sort - (e.g Collections.sort with a comparator) the list
When a new member is created - compare it with the bottom member of the list.
If success - replace the bottom member else continue
Re-Sort the List - if success
and continue.
The program loops through a million or more iterations, thus optimized comparison and insertion have become an issue.
Any guidance on a good strategy to address this within the Java domain? Which lists will be the most effective, e.g. LinkedList, ArrayList, Sets, etc.? Which sort/insert (standard package) will be the most effective?
Consider this example based on a TreeSet, comparing elements by a String valuation. As you can see, after enough iterations, only elements with very large keys are left in the set. On my quite old laptop, I had 10,000 items processed in less than 50 ms - so roughly 5 s per million list operations.
public class Valuation {
    public static class Element implements Comparable<Element> {
        String valuation;
        String data;

        Element(String v, String d) {
            valuation = v;
            data = d;
        }

        @Override
        public int compareTo(Element e) {
            return valuation.compareTo(e.valuation);
        }
    }

    private TreeSet<Element> ts = new TreeSet<Element>();
    private final static int LISTLENGTH = 500;

    public static void main(String[] args) {
        NumberFormat nf = new DecimalFormat("00000");
        Random r = new Random();
        Valuation v = new Valuation();
        for (long l = 1; l < 150; ++l) {
            long start = System.currentTimeMillis();
            for (int j = 0; j < 10000; ++j) {
                v.pushNew(new Element(nf.format(r.nextInt(50000)),
                        UUID.randomUUID().toString()));
            }
            System.out.println("10.000 finished in " + (System.currentTimeMillis() - start)
                    + "ms. Set contains: " + v.ts.size());
        }
        for (Element e : v.ts) {
            System.out.println("-> " + e.valuation);
        }
    }

    private void pushNew(Element hexString) {
        if (ts.size() < LISTLENGTH) {
            ts.add(hexString);
        } else {
            if (ts.first().compareTo(hexString) < 0) {
                ts.add(hexString);
                if (ts.size() > LISTLENGTH) {
                    ts.remove(ts.first());
                }
            }
        }
    }
}
Any guidance on a good strategy to address this within the Java domain.
My advice would be: there is no need to do any sorting. You can ensure your data stays sorted by doing binary insertion as you add more objects into your collection (see the sketch below).
This way, as you add more items, the collection itself is already in a sorted state.
After the 500th item, if you want to add another one, you just perform another binary insertion. The search for the insertion position always remains at O(log(n)) and there is no need to perform any sorting.
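A minimal sketch of such a bounded binary insertion, reusing the Element class from the TreeSet example above; the insertBounded name and the MAX_LENGTH constant are illustrative, and the list is kept sorted ascending so that index 0 is the "bottom" member:

static final int MAX_LENGTH = 500; // the constant list length from the question

static void insertBounded(List<Element> list, Element e) {
    // full, and not better than the bottom member: reject immediately
    if (list.size() == MAX_LENGTH && e.compareTo(list.get(0)) <= 0) {
        return;
    }
    int idx = Collections.binarySearch(list, e);
    if (idx < 0) idx = -idx - 1;   // negative means "not found": convert to insertion point
    list.add(idx, e);              // O(log n) search; note the ArrayList shift itself is O(n)
    if (list.size() > MAX_LENGTH) {
        list.remove(0);            // oust the bottom member to keep the length constant
    }
}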
Comparing with your algorithm
Your algorithm works fine for steps 1-4. But step 5 will likely be the bottleneck of your algorithm:
5. Re-Sort the List - if success
This is because even though your list will only have a maximum of 500 items, there can be an infinite number of insertions performed on this list after the 500th item has been added.
Imagine having another 1 million insertions where (in the worst-case scenario) all 1 million items "succeed" and can be inserted into the list; that implies your algorithm will need to perform 1 million more sorts!
That will be 1 million * O(n log(n)) for sorting.
Compare that with binary insertion, which in the worst case will be 1 million * O(log(n)) for insertion (no sorting).
What lists will be the most effective e.g. LinkedList or ArrayLists or Sets etc.
If you use an ArrayList, insertion won't be as efficient as with a linked list, since an ArrayList is backed by an array. However, accessing elements is O(1) for an ArrayList, compared to O(n) for a linked list. So there isn't a data structure which is efficient for all scenarios. You will have to plan your algorithm first and see which one fits your strategy best.
Which sort/insert (standard package) will be the most effective?
As far as I know, there are Arrays.sort() and Collections.sort() available, which will give you a good performance of O(n log(n)): they use a dual-pivot quicksort for primitives and a merge sort (TimSort) for objects, which will be more effective than a simple insertion/bubble/selection sort created by yourself.

Efficient search for not empty intersection (Java)

I have a method that returns an integer value or an integer range (initial..final), and I want to know whether all the values are disjoint.
Is there a more efficient solution than the following one:
ArrayList<Integer> list = new ArrayList<Integer>();

// For a single value
int value;
if (!list.contains(value))
    list.add(value);
else
    error("", null);

// For a range
int initialValue, finalValue;
for (int i = initialValue; i <= finalValue; i++) {
    if (!list.contains(i))
        list.add(i);
    else
        error("", null);
}
Finding a value (contains) in a HashSet is a constant-time operation (O(1)) on average, which is better than in a List, where contains is linear (O(n)). So, if your lists are large enough, it may be worthwhile to replace your first line with:
HashSet<Integer> list = new HashSet<Integer>();
The reason for this is that to find a value in an (unsorted) list, you need to check every index in the list until you find the one you want or run out of indexes to check. On average you'll check half the list before finding a value if the value is in the list, or the whole list if it's not. For a hash table, you generate an index from the value you want to find, then you check that one index (it's possible you need to check more than one, but it should be uncommon in a well-designed hash table).
Also, if you use a Set, you get a guarantee that each value is unique, so if you try to add a value that already exists, add will return false. You can use that to slightly simplify the code (note: This will not work if you use a List, because add always returns true on a List):
HashSet<Integer> list = new HashSet<Integer>();

int value;
if (!list.add(value))
    error("", null);
Problems involving ranges often lend themselves to the use of a tree. Here's a way to do that using TreeSet:
public class DisjointChecker {
    private final NavigableSet<Integer> integers = new TreeSet<Integer>();

    public boolean check(int value) {
        return integers.add(value);
    }

    public boolean check(int from, int to) {
        NavigableSet<Integer> range = integers.subSet(from, true, to, true);
        if (range.isEmpty()) {
            addRange(from, to);
            return true;
        } else {
            return false;
        }
    }

    private void addRange(int from, int to) {
        for (int i = from; i <= to; ++i) {
            integers.add(i);
        }
    }
}
Here, rather than calling an error handler, the check methods return a boolean indicating whether the arguments were disjoint from all previous arguments. The semantics of the range version differ from the original code: if the range is not disjoint, none of its elements are added, whereas in the original, everything below the first non-disjoint element is added.
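A brief hypothetical usage example:

DisjointChecker checker = new DisjointChecker();
checker.check(5);     // true: nothing seen yet
checker.check(3, 7);  // false: overlaps the previously added 5; nothing is added
checker.check(8, 10); // true: 8..10 is disjoint and gets added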
A few points may deserve elaboration:
Set::add returns a boolean indicating whether the addition modified the set; we can use that as the return value from the method.
NavigableSet is an obscure but standard subinterface of SortedSet which is sadly neglected. Although you could actually use a plain SortedSet here with only minor modifications.
The NavigableSet::subSet method (like SortedSet::subSet) returns a lightweight view on the underlying set which is restricted to a given range. This provides a very efficient way to query the tree for any overlap with the whole range in one operation.
The addRange method here is very simple, and runs in O(m log n) when adding m items to a checker which has seen n items previously. It would be possible to make a version which runs in O(m) by writing an implementation of SortedSet which describes a range of integers and then using Set::addAll, because TreeSet's implementation of this contains a special case for adding other SortedSets in linear time. The code for that special set implementation is very simple, but involves a lot of boilerplate, so I leave it as an exercise for the reader!

Count the occurrences of items in ArrayList

I have a java.util.ArrayList<Item> and an Item object.
Now, I want to obtain the number of times the Item is stored in the ArrayList.
I know that I can do an arrayList.contains() check, but it returns true irrespective of whether it contains one or more Items.
Q1. How can I find the number of times the Item is stored in the list?
Q2. Also, if the list contains more than one Item, how can I determine the index of the other Items, given that arrayList.indexOf(item) always returns the index of only the first Item?
You can use Collections class:
public static int frequency(Collection<?> c, Object o)
Returns the number of elements in the specified collection equal to the specified object. More formally, returns the number of elements e in the collection such that (o == null ? e == null : o.equals(e)).
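For example, a hypothetical snippet:

List<String> list = Arrays.asList("a", "b", "a", "c", "a");
int count = Collections.frequency(list, "a"); // count == 3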
If you need to count occurrences in a long list many times, I suggest you use a HashMap to store the counters and update them while you insert new items into the list. This avoids recalculating the counters from scratch... but of course you won't have indices.
HashMap<Item, Integer> counters = new HashMap<Item, Integer>(5000);
ArrayList<Item> items = new ArrayList<Item>(5000);

void insert(Item newEl)
{
    if (counters.containsKey(newEl))
        counters.put(newEl, counters.get(newEl) + 1);
    else
        counters.put(newEl, 1);
    items.add(newEl);
}
A final hint: you can use other collections frameworks (like Apache Commons Collections) and use a Bag data structure, which is described as:
Defines a collection that counts the number of times an object appears in the collection.
So, exactly what you need.
This is easy to do by hand.
public int countNumberEqual(ArrayList<Item> itemList, Item itemToCheck) {
    int count = 0;
    for (Item i : itemList) {
        if (i.equals(itemToCheck)) {
            count++;
        }
    }
    return count;
}
Keep in mind that if you don't override equals in your Item class, this method will use object identity (as this is the implementation of Object.equals()).
Edit: Regarding your second question (please try to limit posts to one question apiece), you can do this by hand as well.
public List<Integer> indices(ArrayList<Item> items, Item itemToCheck) {
    ArrayList<Integer> ret = new ArrayList<Integer>();
    for (int i = 0; i < items.size(); i++) {
        if (items.get(i).equals(itemToCheck)) {
            ret.add(i);
        }
    }
    return ret;
}
As the other respondents have already said, if you're firmly committed to storing your items in an unordered ArrayList, then counting items will take O(n) time, where n is the number of items in the list. Here at SO, we give advice but we don't do magic!
As I just hinted, if the list gets searched a lot more than it's modified, it might make sense to keep it sorted. If your list is sorted, then you can find your item in O(log n) time, which is a lot quicker; and if your comparator is consistent with your equals, all the identical items will be right next to each other.
Another possibility would be to create and maintain two data structures in parallel. You could use a HashMap containing your items as keys and their count as values. You'd be obligated to update this second structure any time your list changes, but item count lookups would be o(1).
I could be wrong, but it seems to me like the data structure you actually want might be a Multiset (from google-collections/guava) rather than a List. It allows multiples, unlike Set, but doesn't actually care about the order. Given that, it has a int count(Object element) method that does exactly what you want. And since it isn't a list and has implementations backed by a HashMap, getting the count is considerably more efficient.
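A minimal sketch using Guava's HashMultiset (assuming Guava is on the classpath and that Item implements equals/hashCode):

Multiset<Item> bag = HashMultiset.create();
bag.add(item);
bag.add(item);
int count = bag.count(item); // 2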
Thanks for all your nice suggestions. But this code below is really very useful, as we don't have any search method on List that gives the number of occurrences.
void insert(Item newEl)
{
    if (counters.containsKey(newEl))
        counters.put(newEl, counters.get(newEl) + 1);
    else
        counters.put(newEl, 1);
    items.add(newEl);
}
Thanks to Jack. Good posting.
I know this is an old post, but since I did not see a hash map solution, I decided to add pseudocode on hash maps for anyone that needs it in the future. Assuming an ArrayList and Float data types.
Map<Float, Float> hm = new HashMap<>();

for (float k : Arralistentry) {
    Float j = hm.get(k);
    hm.put(k, (j == null ? 1 : j + 1));
}

for (Map.Entry<Float, Float> value : hm.entrySet()) {
    System.out.println("\n" + value.getKey() + " occurs : " + value.getValue() + " times");
}
