Performance (runtime) of pulling arbitrary element from HashSet - java

I am using HashSets in an implementation to have fast adding, removing and element testing (amortized constant time).
However, I'd also like a method to obtain an arbitrary element from that set. The only way I am aware of is
Object arbitraryElement = set.iterator().next();
My question is - how fast (asymptotically speaking) is this? Does it work in (non-amortized) constant time in the size of the set, or does the iterator().next() method do some operations that are slower? I ask because, as experiments show, I seem to lose a log factor in my implementation, and this is one of the few lines affected.
Thank you very much!

HashSet.iterator().next() linearly scans the table to find the next contained item.
For the default load factor of .75, you would have three full slots for every empty one.
There is, of course, no guarantee what the distribution of the objects in the backing array will be, and the set will never actually be that full, so scans can take longer.
I think you'd get amortized constant time.
Edit: The iterator does not create a deep copy of anything in the set. It only references the array in the HashSet. Your example creates a few objects, but nothing more, and no big copies.
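A minimal sketch of the kind of scan this implies (simplified; the real JDK iterator also walks the chains within each bucket):
class SimpleHashSetIterator<E> {
    private final E[] table; // backing array; null entries are empty buckets
    private int index = 0;

    SimpleHashSetIterator(E[] table) {
        this.table = table;
    }

    boolean hasNext() {
        while (index < table.length && table[index] == null) {
            index++; // linear scan over empty buckets
        }
        return index < table.length;
    }

    E next() {
        hasNext(); // advance to the next occupied bucket
        return table[index++];
    }
}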

I wouldn't expect this to be a logarithmic factor, on average, but it might be slow in some rare cases. If you care about this, use LinkedHashSet, which will guarantee constant time.

I would maintain an ArrayList of your keys, and when you need a random object, just generate an index, grab the key, and pull it out of the set. O(1) baby...

Getting the first element out of a HashSet using the iterator is pretty fast: I think it's amortised O(1) in most cases. This assumes the HashSet is reasonably well-populated for its given capacity - if the capacity is very large compared to the number of elements then it will be more like O(capacity/n), which is the average number of buckets the iterator needs to scan before finding a value.
Even scanning an entire HashSet with an iterator is only O(n+capacity), which is effectively O(n) if your capacity is appropriately scaled. So it's still not particularly expensive (unless your HashSet is very large).
If you want better than that, you'll need a different data structure.
If you really need fast access to arbitrary elements by index then I'd personally just put the objects in an ArrayList, which will give you very fast O(1) access by index. You can then generate the index as a random number if you want to select an arbitrary element with equal probability.
Alternatively, if you want to get an arbitrary element but don't care about indexed access then a LinkedHashSet may be a good alternative.

This is from the JDK 7 JavaDoc for HashSet:
Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
I looked at the JDK 7 implementation of HashSet and LinkedHashSet. For the former, the next operation is a linked-list traversal within a hash bucket, and between buckets an array traversal, where the size of the array is given by capacity(). The latter is strictly a linked-list traversal.

If you need an arbitrary element in the probabilistic sense, you could use the following approach.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

class MySet<A> {
    ArrayList<A> contents = new ArrayList<A>();
    HashMap<A, Integer> indices = new HashMap<A, Integer>();
    Random r = new Random();

    // selects a random element in constant O(1) time
    A randomKey() {
        return contents.get(r.nextInt(contents.size()));
    }

    // adds a new element in constant O(1) time (duplicates are ignored)
    void add(A a) {
        if (indices.containsKey(a)) return;
        indices.put(a, contents.size());
        contents.add(a);
    }

    // removes an element in constant O(1) time by swapping the last
    // element into the removed slot
    void remove(A a) {
        int index = indices.remove(a);
        A last = contents.remove(contents.size() - 1);
        if (index < contents.size()) {
            contents.set(index, last);
            indices.put(last, index);
        }
    }

    // all other operations (contains(), ...) are those from indices.keySet()
}
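For example (an illustrative use of the sketch above):
MySet<String> s = new MySet<String>();
s.add("a");
s.add("b");
s.add("c");
s.remove("b");
String any = s.randomKey(); // "a" or "c", each with probability 1/2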

If you are repeatedly choosing an arbitrary set element using an iterator and often removing that element, this can lead to a situation where the internal representation becomes unbalanced and finding the first element degrades to linear time complexity.
This is actually a pretty common occurrence when implementing algorithms involving graph traversal.
Use a LinkedHashSet to avoid this problem.
Demonstration:
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;
public class SetPeek {
    private static final Random rng = new Random();

    private static <T> T peek(final Iterable<T> i) {
        return i.iterator().next();
    }

    private static long testPeek(Set<Integer> items) {
        final long t0 = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            peek(items);
        }
        final long t1 = System.currentTimeMillis();
        return t1 - t0;
    }

    private static <S extends Set<Integer>> S createSet(Supplier<S> factory) {
        final S set = rng.ints(100000).boxed()
            .collect(Collectors.toCollection(factory));

        // Remove first half of elements according to internal iteration
        // order. With the default load factor of 0.75 this will not trigger
        // a rebalancing.
        final Iterator<Integer> it = set.iterator();
        for (int k = 0; k < 50000; k++) {
            it.next();
            it.remove();
        }
        return set;
    }

    public static void main(String[] args) {
        final long hs = testPeek(createSet(HashSet::new));
        System.err.println("HashSet: " + hs + " ms");

        final long lhs = testPeek(createSet(LinkedHashSet::new));
        System.err.println("LinkedHashSet: " + lhs + " ms");
    }
}
Results:
HashSet: 6893 ms
LinkedHashSet: 8 ms

Related

Set vs List when need both unique elements and access by index

I need to keep a unique list of elements seen and I also need to pick a random one from them from time to time. There are two simple ways for me to do this.
Keep elements seen in a Set - that gives me uniqueness of elements. When there is a need to pick a random one, do the following:
elementsSeen.toArray()[random.nextInt(elementsSeen.size())]
Keep elements seen in a List - this way there is no need to convert to an array, as there is the get() function for when I need to ask for a random one. But then I would need to do the following when adding:
if (elementsSeen.indexOf(element)==-1) {elementsSeen.add(element);}
So my question is: which way would be more efficient? Is converting to an array more costly, or is indexOf worse? What if attempting to add an element is done 10 or 100 or 1000 times more often than retrieving?
I am interested in how to combine the functionality of a list (access by index) with that of a set (unique adding) in the most performance-effective way.
If using more memory is not a problem then you can get the best of both worlds by keeping both a list and a set inside a wrapper:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

public class MyContainer<T> {
    private final Set<T> set = new HashSet<>();
    private final List<T> list = new ArrayList<>();

    public void add(T e) {
        if (set.add(e)) {
            list.add(e);
        }
    }

    public T getRandomElement() {
        return list.get(ThreadLocalRandom.current().nextInt(list.size()));
    }

    // other methods as needed ...
}
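Illustrative usage (a duplicate add is silently ignored):
MyContainer<String> c = new MyContainer<>();
c.add("a");
c.add("b");
c.add("a"); // duplicate: rejected by the set, so not added to the list
String r = c.getRandomElement(); // "a" or "b", uniformly at random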
HashSet and TreeSet both extend AbstractCollection (via AbstractSet), which includes the toArray() implementation shown below:
public Object[] toArray() {
    // Estimate size of array; be prepared to see more or fewer elements
    Object[] r = new Object[size()];
    Iterator<E> it = iterator();
    for (int i = 0; i < r.length; i++) {
        if (! it.hasNext()) // fewer elements than expected
            return Arrays.copyOf(r, i);
        r[i] = it.next();
    }
    return it.hasNext() ? finishToArray(r, it) : r;
}
As you can see, it's responsible for allocating the space for an array, as well as creating an Iterator object for copying. So, for a Set, adding is O(1), but retrieving a random element will be O(N) because of the element copy operation.
A List, on the other hand, gives you quick access to a specific index in the backing array, but doesn't guarantee uniqueness. You would have to re-implement add, remove and the associated methods to guarantee uniqueness on insert. Adding a unique element will be O(N), but retrieval will be O(1).
So, it really depends on which area is your potential high usage point. Are the add/remove methods going to be heavily used, with random access used sparingly? Or is this going to be a container for which retrieval is most important, since few elements will be added or removed over the lifetime of the program?
If the former, I'd suggest using the Set with toArray(). If the latter, it may be beneficial for you to implement a unique List to take advantage of the fast retrieval. The significant downside is that add involves many edge cases which the standard Java library takes great care to handle efficiently. Will your implementation be up to the same standards?
Write some test code and put in some realistic values for your use case. Neither of the methods are so complex that it's not worth the effort, if performance is a real issue for you.
I tried that quickly, based on the exact two methods you described, and it appears that the Set implementation will be quicker if you are adding considerably more than you are retrieving, due to the slowness of the indexOf method. But I really recommend that you do the tests yourself - you're the only person who knows what the details are likely to be.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
public class SetVsListTest<E> {
    private static Random random = new Random();

    private Set<E> elementSet;
    private List<E> elementList;

    public SetVsListTest() {
        elementSet = new HashSet<>();
        elementList = new ArrayList<>();
    }

    private void listAdd(E element) {
        if (elementList.indexOf(element) == -1) {
            elementList.add(element);
        }
    }

    private void setAdd(E element) {
        elementSet.add(element);
    }

    private E listGetRandom() {
        return elementList.get(random.nextInt(elementList.size()));
    }

    @SuppressWarnings("unchecked")
    private E setGetRandom() {
        return (E) elementSet.toArray()[random.nextInt(elementSet.size())];
    }

    public static void main(String[] args) {
        SetVsListTest<Integer> test;
        List<Integer> testData = new ArrayList<>();
        int testDataSize = 100_000;
        int[] addToRetrieveRatios = new int[] { 10, 100, 1000, 10000 };

        for (int i = 0; i < testDataSize; i++) {
            /*
             * Add 1/5 of the total possible number of elements so that we will
             * have (on average) 5 duplicates of each number. Adjust this to
             * whatever is most realistic
             */
            testData.add(random.nextInt(testDataSize / 5));
        }

        for (int addToRetrieveRatio : addToRetrieveRatios) {
            /*
             * Test the list method
             */
            test = new SetVsListTest<>();
            long t1 = System.nanoTime();
            for (int i = 0; i < testDataSize; i++) {
                // Use == 1 here because we don't want to get from an empty collection
                if (i % addToRetrieveRatio == 1) {
                    test.listGetRandom();
                } else {
                    test.listAdd(testData.get(i));
                }
            }
            long t2 = System.nanoTime();
            System.out.println(((t2 - t1) / 1000000L)
                + " ms for list method with add/retrieve ratio " + addToRetrieveRatio);

            /*
             * Test the set method
             */
            test = new SetVsListTest<>();
            t1 = System.nanoTime();
            for (int i = 0; i < testDataSize; i++) {
                // Use == 1 here because we don't want to get from an empty collection
                if (i % addToRetrieveRatio == 1) {
                    test.setGetRandom();
                } else {
                    test.setAdd(testData.get(i));
                }
            }
            t2 = System.nanoTime();
            System.out.println(((t2 - t1) / 1000000L)
                + " ms for set method with add/retrieve ratio " + addToRetrieveRatio);
        }
    }
}
Output on my machine was:
819 ms for list method with add/retrieve ratio 10
1204 ms for set method with add/retrieve ratio 10
1547 ms for list method with add/retrieve ratio 100
133 ms for set method with add/retrieve ratio 100
1571 ms for list method with add/retrieve ratio 1000
23 ms for set method with add/retrieve ratio 1000
1542 ms for list method with add/retrieve ratio 10000
5 ms for set method with add/retrieve ratio 10000
You could extend HashSet and track the changes to it, maintaining a current array of all entries.
Here I keep a copy of the array and adjust it every time the set changes. For a more robust (but more costly) solution you could use toArray in your pick method.
import java.util.HashSet;

class PickableSet<T> extends HashSet<T> {

    @SuppressWarnings("unchecked")
    private T[] asArray = (T[]) this.toArray();

    @SuppressWarnings("unchecked")
    private void dirty() {
        asArray = (T[]) this.toArray();
    }

    public T pick(int which) {
        return asArray[which];
    }

    @Override
    public boolean add(T t) {
        boolean added = super.add(t);
        dirty();
        return added;
    }

    @Override
    public boolean remove(Object o) {
        boolean removed = super.remove(o);
        dirty();
        return removed;
    }
}
Note that this will not recognise changes to the set if removed by an Iterator - you will need to handle that some other way.
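Illustrative usage of the sketch above:
PickableSet<String> ps = new PickableSet<>();
ps.add("x");
ps.add("y");
Random rnd = new Random();
String chosen = ps.pick(rnd.nextInt(ps.size())); // uniform over current entries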
So my question is which way would be more efficient?
Quite a difficult question to answer - it depends on which you do more: insert, or select at random.
We need to look at the Big O for each of the operations. In this case (best cases):
Set: Insert O(1)
Set: toArray O(n) (I'd assume)
Array: Access O(1)
vs
List: Contains O(n)
List: Insert O(1)
List: Access O(1)
So:
Set: Insert: O(1), Access O(n)
List: Insert: O(n), Access O(1)
So in the best case they are much of a muchness with Set winning if you insert more than you select, and List if the reverse is true.
Now the evil answer - select one (the one that best represents the problem, so Set IMO), wrap it well and run with it. If it is too slow then deal with it later, and when you do deal with it, look at the problem space. Does your data change often? If not, cache the array.
It depends what you value more.
List implementations in Java normally make use of an array or a linked list. That means inserting and looking up an index is fast, but searching for a specific element requires looping through the list and comparing each element until the element is found.
Set implementations in Java mainly make use of an array, the hashCode method and the equals method. So a set is more taxing when you want to insert, but trumps a list when it comes to looking for an element. As a set doesn't guarantee the order of the elements in the structure, you will not be able to get an element by index. You can use an ordered set, but this brings with it latency on insert due to the sorting.
If you are going to be working with indexes directly, then you may have to use a List, because the order in which elements appear in Set.toArray() changes as you add elements to the Set.
Hope this helps :)

Insert Objects in a Constant Length List - Java

I am looking for a good optimal strategy to write a code for the following problem.
I have a List of Objects.
The Objects have a String "valuation" field among other fields. The valuation field may or may not be unique.
The List is of CONSTANT length which is calculated within the program. The length would usually be between 100 and 500.
The Objects are all sorted within the list based on String field - valuation
As new objects are found or created: The String field valuation is compared with the existing members of the list.
If the comparison fails e.g. with the bottom member of the list, then the Object is NOT added to the list.
If the comparison succeeds, the new Object is added to the list in the right position according to the sort criteria, and the bottom member is ousted from the list to keep the length of the list constant.
One strategy which I am thinking:
Keep adding members to the list - till it reaches maxLength
Sort - (e.g Collections.sort with a comparator) the list
When a new member is created - compare it with the bottom member of the list.
If success - replace the bottom member else continue
Re-Sort the List - if success
and continue.
The program loops through a million or more iterations, so optimized comparison and insertion have become an issue.
Any guidance on a good strategy to address this within the Java domain would be appreciated. Which lists will be the most effective, e.g. LinkedList, ArrayList, Sets, etc.? Which sort/insert (standard package) will be the most effective?
Consider this example based on TreeSet, comparing elements by their String valuation. As you can see, after enough iterations only elements with very large keys are left in the set. On my quite old laptop, 10,000 items took less than 50ms - so roughly 5s per million list operations.
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Random;
import java.util.TreeSet;
import java.util.UUID;

public class Valuation {

    public static class Element implements Comparable<Element> {
        String valuation;
        String data;

        Element(String v, String d) {
            valuation = v;
            data = d;
        }

        @Override
        public int compareTo(Element e) {
            return valuation.compareTo(e.valuation);
        }
    }

    private TreeSet<Element> ts = new TreeSet<Element>();

    private final static int LISTLENGTH = 500;

    public static void main(String[] args) {
        NumberFormat nf = new DecimalFormat("00000");
        Random r = new Random();
        Valuation v = new Valuation();

        for (long l = 1; l < 150; ++l) {
            long start = System.currentTimeMillis();
            for (int j = 0; j < 10000; ++j) {
                v.pushNew(new Element(nf.format(r.nextInt(50000)),
                        UUID.randomUUID().toString()));
            }
            System.out.println("10.000 finished in "
                    + (System.currentTimeMillis() - start)
                    + "ms. Set contains: " + v.ts.size());
        }
        for (Element e : v.ts) {
            System.out.println("-> " + e.valuation);
        }
    }

    private void pushNew(Element hexString) {
        if (ts.size() < LISTLENGTH) {
            ts.add(hexString);
        } else {
            if (ts.first().compareTo(hexString) < 0) {
                ts.add(hexString);
                if (ts.size() > LISTLENGTH) {
                    ts.remove(ts.first());
                }
            }
        }
    }
}
Any guidance on a good strategy to address this within the Java domain.
My advice would be - there is no need to do any sorting. You can ensure your data is sorted by doing binary insertion as you add more objects into your collection.
This way, as you add more items, the collection itself is already in a sorted state.
After the 500th item, if you want to add another one, we just perform another binary insertion. The insertion performance always remains at O(log(n)) and there is no need to perform any sorting.
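A minimal sketch of such a bounded binary insertion, assuming the elements are Comparable and the list is kept in ascending order (so index 0 is the "bottom" member); note that an ArrayList insert still shifts elements internally, but no full re-sort is ever needed:
static <T extends Comparable<T>> void insertBounded(List<T> sorted, T item, int maxLen) {
    if (sorted.size() >= maxLen && item.compareTo(sorted.get(0)) <= 0) {
        return; // fails the comparison with the bottom member: not added
    }
    int pos = Collections.binarySearch(sorted, item);
    if (pos < 0) pos = -pos - 1; // binarySearch returns -(insertionPoint) - 1 on a miss
    sorted.add(pos, item);
    if (sorted.size() > maxLen) {
        sorted.remove(0); // oust the bottom member to keep the length constant
    }
}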
Comparing with your algorithm
Your algorithm works fine from 1 - 4. But step 5 will likely be the bottleneck of your algorithm:
5. Re-Sort the List - if success
This is because even though your list will only have a maximum of 500 items, there can be an infinite number of insertions performed on this list after the 500th item has been added.
Imagine having another 1 million insertions where (in the worst-case scenario) all 1 million items "succeed" and can be inserted into the list; that implies your algorithm will need to perform 1 million more sorts!
That will be 1 million * n log(n) for sorting.
Compare with binary insertion: in the worst case it will be 1 million * log(n) for insertion (no sorting).
What lists will be the most effective e.g. LinkedList or ArrayLists or Sets etc.
If you use an ArrayList, insertion in the middle won't be as efficient as with a linked list, since an ArrayList is backed by an array and must shift the trailing elements. However, accessing an element by index is O(1) for an ArrayList, compared to O(n) for a linked list. So there isn't one data structure that is efficient for all scenarios. You will have to plan your algorithm first and see which one fits your strategy best.
Which sort/insert (standard package) will be the most effective?
As far as I know, there are Arrays.sort() and Collections.sort() available, which will give you a good O(n log(n)) performance - they use a dual-pivot quicksort (for primitive arrays) or a merge-sort variant (for objects), either of which will be more effective than a simple insertion/bubble/selection sort created by yourself.

Find the highest N numbers in an infinite list

I was asked this question in a recent Java interview.
Given a List containing millions of items, maintain a list of the highest n items. Sorting the list in descending order then taking the first n items is definitely not efficient due to the list size.
Below is what I did. I'd appreciate it if anyone could provide a more efficient or elegant solution, as I believe this could also be solved using a PriorityQueue:
public TreeSet<Integer> findTopNNumbersInLargeList(final List<Integer> largeNumbersList,
        final int highestValCount) {
    TreeSet<Integer> highestNNumbers = new TreeSet<Integer>();
    for (int number : largeNumbersList) {
        if (highestNNumbers.size() < highestValCount) {
            highestNNumbers.add(number);
        } else {
            for (int i : highestNNumbers) {
                if (i < number) {
                    highestNNumbers.remove(i);
                    highestNNumbers.add(number);
                    break;
                }
            }
        }
    }
    return highestNNumbers;
}
The for loop at the bottom is unnecessary, because you can tell right away if the number should be kept or not.
TreeSet lets you find the smallest element in O(log N)*. Compare that smallest element to number. If the number is greater, add it to the set, and remove the smallest element. Otherwise, keep walking to the next element of largeNumbersList.
The worst case is when the original list is sorted in ascending order, because you would have to replace an element in the TreeSet at each step. In this case the algorithm takes O(K log N), where K is the number of items in the original list - an improvement over the O(K log K) cost of sorting the entire array.
Note: If your list consists of Integers, you could use a linear sorting algorithm that is not based on comparisons to get the overall asymptotic complexity down to O(K). This does not mean that the linear solution would necessarily be faster than the original for any fixed number of elements.
* You can maintain the value of the smallest element as you go to make it O(1).
You don't need nested loops, just keep inserting and remove the smallest number when the set is too large:
public Set<Integer> findTopNNumbersInLargeList(final List<Integer> largeNumbersList,
        final int highestValCount) {
    TreeSet<Integer> highestNNumbers = new TreeSet<Integer>();
    for (int number : largeNumbersList) {
        highestNNumbers.add(number);
        if (highestNNumbers.size() > highestValCount) {
            highestNNumbers.pollFirst();
        }
    }
    return highestNNumbers;
}
The same code should work with a PriorityQueue, too. The runtime should be O(n log highestValCount) in any case.
P.S. As pointed out in the other answer, you can optimize this some more (at the cost of readability) by keeping track of the lowest number, avoiding unnecessary inserts.
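A minimal sketch of that optimization (assuming highestValCount >= 1; the add call is kept inside the condition because a TreeSet rejects duplicates):
TreeSet<Integer> top = new TreeSet<>();
for (int number : largeNumbersList) {
    if (top.size() < highestValCount) {
        top.add(number);
    } else if (number > top.first() && top.add(number)) {
        top.pollFirst(); // the set grew past N, so drop the new smallest
    }
}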
It's possible to support amortized O(1) processing of new elements and O(n) querying of the current top elements as follows:
Maintain a buffer of size 2n, and whenever you see a new element, add it to the buffer. When the buffer gets full, use quickselect or another linear-time selection algorithm to keep the current top n elements and discard the rest. This is an O(n) operation, but you only need to perform it every n elements, which balances out to O(1) amortized time per element.
This is the algorithm Guava uses for Ordering.leastOf, which extracts the top n elements from an Iterator or Iterable. It is fast enough in practice to be quite competitive with a PriorityQueue based approach, and it is much more resistant to worst case input.
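A rough sketch of that buffering idea for Integers - this uses a sort as a stand-in for linear-time selection, so it is O(log n) amortized per element here rather than the O(1) that quickselect achieves, but the structure is the same:
static List<Integer> topN(Iterable<Integer> items, int n) {
    List<Integer> buf = new ArrayList<>(2 * n);
    for (int x : items) {
        buf.add(x);
        if (buf.size() >= 2 * n) {
            buf.sort(Collections.reverseOrder()); // stand-in for quickselect
            buf.subList(n, buf.size()).clear();   // discard all but the current top n
        }
    }
    buf.sort(Collections.reverseOrder());
    if (buf.size() > n) {
        buf.subList(n, buf.size()).clear();
    }
    return buf;
}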
I would start by saying that your question, as stated, is impossible. There is no way to find the highest n items in a List without fully traversing it. And there is no way to fully traverse an infinite List.
That said, the text of your question differs from the title. There is a massive difference between very large and infinite. Please bear that in mind.
To answer the feasible question, I would begin by implementing a buffer class to encapsulate the behaviour of keeping the top N; let's call it TopNBuffer:
import java.util.Collections;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

class TopNBuffer<T extends Comparable<T>> {

    private final NavigableSet<T> backingSet = new TreeSet<>();
    private final int limit;

    public TopNBuffer(int limit) {
        this.limit = limit;
    }

    public void add(final T t) {
        if (backingSet.add(t) && backingSet.size() > limit) {
            backingSet.pollFirst();
        }
    }

    public SortedSet<T> highest() {
        return Collections.unmodifiableSortedSet(backingSet);
    }
}
All we do here is, on add: if the element was actually added (i.e. it is not a duplicate) and adding it pushed the Set over its limit, we simply remove the lowest element from the Set.
The method highest gives an unmodifiable view of the current highest elements. So, in Java 8 syntax, all you need to do is:
final TopNBuffer<Integer> topN = new TopNBuffer<>(n);
largeNumbersList.forEach(topN::add);
final Set<Integer> highestN = topN.highest();
I think in an interview environment, it's not enough to simply whack lots of code into a method. Demonstrating an understanding of OO programming and separation of concerns is also important.

Java - Which collection suits this situation best in terms of performance?

I'm writing a class that needs to read strings from a file and store them in some data structure. What should I use given the following:
The file will contain up to several hundreds of strings (they need to be stored in a structure, can't stream).
The entries need to be stored in a specific order.
Once sorted the collection will not be modified (It doesn't have to be immutable, but I know it won't be modified).
I will need to iterate through the collection several times.
If there are duplicate entries in the set, only one of them should be stored.
The following answer (and others) say that an ArrayList is better if I only need to sort once since it reads faster, but if I use an ArrayList then I will have to make sure they are unique manually.
You can use a TreeSet. It is a set, so it will not store duplicate entries. It sorts the entries directly on insertion. The basic operations require log(n) time. Thus, the overall time requirement is similar to inserting into a list first and then using an n log(n) sorting algorithm.
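A minimal sketch of that approach, assuming the strings have already been read into a List<String> called lines:
Set<String> sorted = new TreeSet<>(lines); // duplicates dropped, entries kept sorted
for (String s : sorted) {
    // iterate as often as needed; the order is stable
}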
I did a little benchmark of TreeSet vs ArrayList insertion/iteration performance. Obviously ArrayList performs better but, with ten million unique records, a full iteration time of 279ms is not that bad.
If in your case that time is negligible I'd stick with the TreeSet. Otherwise you'll be forced to reinvent the wheel and manually check for duplicates before inserting each element into the ArrayList.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.TreeSet;
public class TestTreeSetVsArrayList {
    public static int ENTRIES = 10000000;

    public static void main(String[] args) {
        TreeSet<String> treeSet = new TreeSet<String>();
        ArrayList<String> arrayList = new ArrayList<String>(10000);

        long l = System.currentTimeMillis();
        for (int i = 0; i < TestTreeSetVsArrayList.ENTRIES; i++) {
            treeSet.add("String" + i);
        }
        System.out.println("treeset insertion time: " + (System.currentTimeMillis() - l));

        l = System.currentTimeMillis();
        for (int i = 0; i < TestTreeSetVsArrayList.ENTRIES; i++) {
            arrayList.add("String" + i);
        }
        System.out.println("arraylist insertion time: " + (System.currentTimeMillis() - l));

        Iterator<String> iter;

        iter = treeSet.iterator();
        l = System.currentTimeMillis();
        while (iter.hasNext()) {
            iter.next();
        }
        System.out.println("treeset iteration time: " + (System.currentTimeMillis() - l));

        iter = arrayList.iterator();
        l = System.currentTimeMillis();
        while (iter.hasNext()) {
            iter.next();
        }
        System.out.println("arraylist iteration time: " + (System.currentTimeMillis() - l));
    }
}
The results on my pc are:
treeset insertion time: 11350
arraylist insertion time: 3583
treeset iteration time: 279
arraylist iteration time: 0
If you can sort the elements while inserting, consider a TreeSet (if necessary with a self-defined Comparator).
If not, it seems you might need two structures:
An ArrayList for initial filling and sorting.
Afterwards, a LinkedHashSet in order to ensure uniqueness while preserving order.
You probably want to use a LinkedHashSet, which is a:
Hash table and linked list implementation of the Set interface, with predictable iteration order
...
This implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet, without incurring the increased cost associated with TreeSet.
If you can sort at any time: Insert the strings to a Set (preferably HashSet, I presume), and then spill them into an ArrayList and sort.
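A minimal sketch of that last suggestion, again assuming the strings are in a List<String> called lines:
List<String> result = new ArrayList<>(new HashSet<>(lines)); // drop duplicates
Collections.sort(result); // sort once; the collection is not modified afterwards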

Complexity of Set operations

This is what I am doing:
String one = "some string";
String two = "some string";
I want to know all the characters that appear in both string one and string two, in the order they appear in string one.
I wrote a Java program which, using Collections, performs set operations on both collections.
What I would like to know is the complexity of performing set operations: is it polynomial time or linear time?
My program is here
package careercup.google;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * @author learner
 */
public class CharaterStringsIntersection {

    private static final String one = "abcdefgabcfmnx";
    private static final String two = "xbcg";

    public static void main(String args[]) {
        List<Character> l_one = new ArrayList<Character>();
        List<Character> l_two = new ArrayList<Character>();

        for (int i = 0; i < one.length(); i++) {
            l_one.add(one.charAt(i));
        }
        for (int j = 0; j < two.length(); j++) {
            l_two.add(two.charAt(j));
        }

        l_one.retainAll(l_two);

        Iterator<Character> iter = l_one.iterator();
        while (iter.hasNext()) {
            System.out.println(" > " + iter.next());
        }
    }
}
Output :
run:
> b
> c
> g
> b
> c
> x
Its complexity is O(N * (N + M)), where N = one.length(), M = two.length().
It works in the following way: for each character of l_one (there are N of them) it scans l_two (since l_two is an ArrayList, the scan is linear, taking O(M) steps, see [1]) and removes the item from l_one if necessary (removing from an ArrayList takes O(N), see [1]), so you get O(N * (N + M)).
You can lower the complexity to O(N * M) by using a LinkedList for l_one (since removal via a LinkedList's iterator is O(1), see [2]), and further lower it to O(N * log(M)) by using a TreeSet for l_two (since searching in a TreeSet takes O(log(M)), see [3]).
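A sketch of an even simpler variant with linear expected time, using a HashSet for the membership tests. It preserves the order of string one, as the question requires; for the example strings it prints bcgbcx, matching the output above:
Set<Character> inTwo = new HashSet<>();
for (char c : two.toCharArray()) {
    inTwo.add(c);
}
StringBuilder sb = new StringBuilder();
for (char c : one.toCharArray()) {
    if (inTwo.contains(c)) {
        sb.append(c); // keeps string-one order, duplicates included
    }
}
System.out.println(sb);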
References:
All of the other operations run in linear time (ArrayList javadoc)
All of the operations perform as could be expected for a doubly-linked list, (LinkedList javadoc)
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains) (TreeSet javadoc)
I think the more important question you are trying to ask is: what is the complexity of the retainAll() method? Assuming there are no hashes or quick lookups, I would assume the time complexity is O(n^2): one loop to iterate the list and one loop to compare each item.
You are using the wrong collection for your purposes.
Since you are doing it with plain ArrayLists, the complexity won't be good. I don't know the inner implementation, but I would guess it's quadratic (since it has to scan both lists), unless Java internally converts the two lists to sets before doing the intersection.
You should start from the Set interface (here) and choose whichever collection is most suitable for your needs. I think that a HashSet (constant time) or a TreeSet (logarithmic, but able to keep natural ordering) would be the best solutions.
Briefly:
ArrayList.add() is O(1).
ArrayList.retainAll(other) is O(N1 * N2) where N1 is the size of this and N2 is the size of other. (The actual cost depends on the ratio of elements retained to elements removed.)
iter.hasNext() and iter.next() are O(1) for an ArrayList iterator.
This is all as you would expect, based on mentally modelling an ArrayList as a Java array.
