I want to create a large (~300,000 entries) List of self defined objects of the class Drug.
Every Drug has an ID and I want to be able to search the Drugs in logarithmic time via that ID.
What kind of List do I have to use?
How do I declare that it should be searchable via the ID?
The various implementations of the Map interface should do what you want.
Just remember to override the hashCode() method of your Drug class if you plan to use a HashMap.
public class Drug implements Comparable<Drug> {
public int compareTo(Drug o) {
return this.id.compareTo(o.getId());
}
}
Then in your List you can use binarySearch
List<Drug> drugList; <--- List of all drugs
Drug drugToSearchFor; <---- The drug that you want to search for, containing the id
// Sort before search
Collections.sort(drugList);
int index = Collections.binarySearch(drugList, drugToSearchFor);
if (index >= 0) {
return true;
} else {
return false;
}
Wouldn't you use TreeMap instead of List using the ID as your Key?
If searching by a key is important for you, then you probably need to use a Map and not a List. From the Java Collections Trail:
The three general-purpose Map
implementations are HashMap, TreeMap
and LinkedHashMap. If you need
SortedMap operations or key-ordered
Collection-view iteration, use
TreeMap; if you want maximum speed and
don't care about iteration order, use
HashMap; if you want near-HashMap
performance and insertion-order
iteration, use LinkedHashMap.
Due to the high number of entries you might consider to use a database instead of holding everything in memory.
If you still want to keep it in memory you might have a look at b-trees.
You could use any list, and as long as it is sorted you can use a binary search.
But I would use a Map which searches in O(1).
I know I am pretty redundant with this statement, but as everybody said isnt this exactly the case for a Map ?
Related
I have implemented a graph.
I want to sort a given subset of vertices with respect to their degrees.
Therefore, I've written a custom comparator named DegreeComparator.
private class DegreeComparator implements Comparator<Integer>
{
#Override
public int compare(Integer arg0, Integer arg1)
{
if(adj[arg1].size() == adj[arg0].size()) return arg1 - arg0;
else return adj[arg1].size() - adj[arg0].size());
}
}
So, which one of the below is more efficient?
Using TreeSet
public Collection<Integer> sort(Collection<Integer> unsorted)
{
Set<Integer> sorted = new TreeSet<Integer>(new DegreeComparator());
sorted.addAll(unsorted);
return sorted;
}
Using ArrayList
Collections.sort(unsorted, new DegreeComparator());
Notice that the second approach is not a function, but a one-line code.
Intuitively, I'd rather choose the second one. But I'm not sure if it is more efficient.
Java API contains numerous Collection and Map implementations so it might be confusing to figure out which one to use. Here is a quick flowchart that might help with choosing from the most common implementations
A TreeSet is a Set. It removes duplicates (elements with the same degree). So both aren't equivalent.
Anyway, if what you want naturally is a sorted list, then sort the list. This will work whether the collection has duplicates or not, and even if it has the same complexity (O(n*log(n)) as populating a TreeSet, it is probably faster (because it just has to move elements in an array, instead of having to create lots of tree nodes).
If you only sort once, then the ArrayList is an obvious winner. The TreeSet is better if you add or remove items often as sorting a list again and again would be slow.
Note also that all tree structures need more memory and memory access indirection which makes them slower.
If case of medium sized lists, which change rather frequently by a single element, the fastest solution might be using ArrayList and inserting into the proper position (obviously assuming the arrays get sorted initially).
You'd need to determine the insert position via Arrays.binarySearch and insert or remove. Actually, I would't do it, unless the performance were really critical and a benchmark would show it helps. It gets slow when the list get really big and the gain is limited as Java uses TimSort, which is optimized for such a case.
As pointed in a comment, assuring that the Comparator returns different values is sometimes non-trivial. Fortunately, there's Guava's Ordering#arbitrary, which solves the problem if you don't need to be compatible with equals. In case you do, a similar method can be written (I'm sure I could find it somewhere if requested).
I have implemented a graph.
I want to sort a given subset of vertices with respect to their degrees.
Therefore, I've written a custom comparator named DegreeComparator.
private class DegreeComparator implements Comparator<Integer>
{
#Override
public int compare(Integer arg0, Integer arg1)
{
if(adj[arg1].size() == adj[arg0].size()) return arg1 - arg0;
else return adj[arg1].size() - adj[arg0].size());
}
}
So, which one of the below is more efficient?
Using TreeSet
public Collection<Integer> sort(Collection<Integer> unsorted)
{
Set<Integer> sorted = new TreeSet<Integer>(new DegreeComparator());
sorted.addAll(unsorted);
return sorted;
}
Using ArrayList
Collections.sort(unsorted, new DegreeComparator());
Notice that the second approach is not a function, but a one-line code.
Intuitively, I'd rather choose the second one. But I'm not sure if it is more efficient.
Java API contains numerous Collection and Map implementations so it might be confusing to figure out which one to use. Here is a quick flowchart that might help with choosing from the most common implementations
A TreeSet is a Set. It removes duplicates (elements with the same degree). So both aren't equivalent.
Anyway, if what you want naturally is a sorted list, then sort the list. This will work whether the collection has duplicates or not, and even if it has the same complexity (O(n*log(n)) as populating a TreeSet, it is probably faster (because it just has to move elements in an array, instead of having to create lots of tree nodes).
If you only sort once, then the ArrayList is an obvious winner. The TreeSet is better if you add or remove items often as sorting a list again and again would be slow.
Note also that all tree structures need more memory and memory access indirection which makes them slower.
If case of medium sized lists, which change rather frequently by a single element, the fastest solution might be using ArrayList and inserting into the proper position (obviously assuming the arrays get sorted initially).
You'd need to determine the insert position via Arrays.binarySearch and insert or remove. Actually, I would't do it, unless the performance were really critical and a benchmark would show it helps. It gets slow when the list get really big and the gain is limited as Java uses TimSort, which is optimized for such a case.
As pointed in a comment, assuring that the Comparator returns different values is sometimes non-trivial. Fortunately, there's Guava's Ordering#arbitrary, which solves the problem if you don't need to be compatible with equals. In case you do, a similar method can be written (I'm sure I could find it somewhere if requested).
Given a sorted array of objects, while the order is based on some object attribute. (Sorting is done via a List using Collections.sort() with a custom Comparator and then calling toArray()).
Duplicate instances of SomeObject are not allowed ("duplicates" in this regard depends on multiple attribute value in SomeObject), but it's possible that multiple instances of SomeObject have the same value for attribute1, which is used for sorting.
public SomeObject {
public attribute1;
public attribute2;
}
List<SomeObject> list = ...
Collections.sort(list, new Comparator<SomeObject>() {
#Override
public int compare(SomeObject v1, SomeObject v2) {
if (v1.attribute1 > v2.attribute1) {
return 1;
} else if (v1.attribute1 < v2.attribute1) {
return -1;
} else
return 0;
}
});
SomeObject[] array = list.toArray(new SomeObject[0]);
How to efficiently check whether a certain object based on some attribute is in that array while also being able to "mark" objects already found in some previous look up (e.g. simply by removing them from the array; already found objects don't need to be accessed at later time).
Without the later requirement, one could do a Arrays.binarySearch() with custom Comparator. But obviously it's not working when one want to remove objects already found.
Use a TreeSet (or TreeMultiset).
You can initialize it with your comparator; it sorts itself; look-up and removal are in logarithmic time.
You can also check for existence and remove in one step, because remove returns a boolean.
Building on Arian's answer, you can also use TreeBag from Apache Commons' Collections library. This is backed by a TreeMap, and maintains a count for repeated elements.
If you want you can put all the elements into some sort of linked list whose nodes are also connected in a heap form when you sort them. That way, finding an element would be log n and you can still delete the nodes in place.
I need a collection class which has both: quick index and hash access.
Now I have ArrayList. It has good index acces, but his contains method is not performant. HashSet has good contains implementation but no indexed acces. Which collection has both? Probably something from Apache?
Or should I create my own collection class which has both: ArrayList for indexed acces and HashSet for contains check?
Just for clarification: i need both get(int index) and contains(Object o)
If indexed access performance is not a problem the closest match is LinkedHashSet whose API says that it is
Hash table and linked list implementation of the Set interface, with predictable iteration order.
at least I dont think that the performance will be be worse than that of LinkedListPerformance. Otherwise I cannot see no alternative but your ArrayList + HashTable solution
If you are traversing the index from start to finish, I think this might satisfy your needs: LinkedHashSet
If you need to randomly access via the index, as well as hash access, if no-one else has a better suggestion I guess you can make your own collection which does both.
Do like this; use combination of Hash technique as well as list to get best of both worlds :)
class DataStructure<Integer>{
Hash<Integer,Integer> hash = new HashMap<Integer, Integer>();
List<Integer> list = new ArrayList<Integer>();
public void add(Integer i){
hash.add(i,i);
list.add(i);
}
public Integer get(int index){
return list.get(index);
}
...
} //used Integers to make it simpler
So object; you keep in HashMap/HashSet as well as ArrayList.
So if you want to use
contains method : call hashed contains method.
get an object with index: use array to return the value
Just make sure that you have both these collections in sync. And take care of updation/deletion in both data structures.
I don't know the exact lookup times, but maybe you could use some implementation of the Map interface. You could store you objects with map.put(objectHash, obj).
Then you could verify that you have a certain object with:
boolean contained = map.containsValue(obj);
And you can use the hash to lookup an object in the map:
MyObject object = map.get(objectHash);
Though, the only downfall is that you would need to know your hashes on this lookup call which may not be probably in your implementation.
I have a ArrayList containing Attributes
class Attribute{
private int id;
public string getID(){
return this.id;
}
private string value;
public string getValue(){
return this.value;
}
//... more properties here...
}
Well I filled the ArrayList with like hundreds of those attributes. And I want to find the Attribute with a defined ID. I want to do something like this:
ArrayList<Attribute> arr = new ArrayList<Attribute>();
fillList(arr); //Method that puts a lot of these Attributes in the list
arr.find(234); //Find the attribute with the ID 234;
Is looping over the ArrayList the only solution.
Well something's going to have to loop over the array list, yes. There are various ways of doing this, different libraries etc.
If you fill the array in an ordered way (e.g. so that low IDs always come before high IDs) then you can perform a binary search in O(log N) time. Otherwise, it'll be O(N).
If you're going to search by IDs a lot, however, why not create a Map<Integer, Attribute> to start with - e.g. a HashMap, or a LinkedHashMap if you want to preserve ordering?
If you're only going to search for a single ID (or a few), however, this almost certainly won't be worth it - there's a cost involved in hashing, after all; filling the map will be more expensive than filling the list, and the difference is likely to be greater than the time saved looking up a few IDs.
Have you already established that this is a performance bottleneck? If so, this is an easy place to improve by using a map (or just a sorted list with a binary search). If not, I wouldn't disturb your code if it more naturally uses a list than a map - but you should certainly check whether it's a bottleneck or not.
You want to use a Map
If you wish to access elements of a collection using element attributes, and this attribute is guaranteed to be unique per element, then you really should use a Map. Try a Map with the Attribute.id as the key.