I need a Java data structure that can store a list of all countries. Which one would you recommend?
You can use a Set implementation such as HashSet, which rejects duplicates, if you only need the names. If you want to keep both the country code and the name, a Map is a better fit.
Are you going to iterate over the collection?
If so, java.util.ArrayList.
Are you going to use it to do some kind of look up? Like a 'does this exist' scenario?
If so, java.util.HashSet
Do you need to attach additional information to each country?
If so, java.util.HashMap
Do you need a lookup and an ordered iteration?
If so, java.util.TreeSet.
There's also concurrency to be concerned about, but I didn't see any mention of it, so I'll leave off those guys.
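The questions above map onto the four collections roughly as in the sketch below. The class name CountryCollections, the helper methods, and the three sample countries are made up for illustration; a real application would load the full list from somewhere else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CountryCollections {
    // Sample data only (assumption); not a complete country list.
    static final String[] NAMES = {"Norway", "Brazil", "Japan"};

    // Plain iteration in insertion order.
    static List<String> asList() {
        return new ArrayList<>(Arrays.asList(NAMES));
    }

    // "Does this exist?" lookups.
    static Set<String> asSet() {
        return new HashSet<>(Arrays.asList(NAMES));
    }

    // Extra information per country, here keyed by a two-letter code.
    static Map<String, String> byCode() {
        Map<String, String> map = new HashMap<>();
        map.put("NO", "Norway");
        map.put("BR", "Brazil");
        map.put("JP", "Japan");
        return map;
    }

    // Lookup plus iteration in sorted (natural String) order.
    static TreeSet<String> sorted() {
        return new TreeSet<>(Arrays.asList(NAMES));
    }

    public static void main(String[] args) {
        for (String name : asList()) {
            System.out.println(name);
        }
        System.out.println(asSet().contains("Japan")); // true
        System.out.println(byCode().get("BR"));        // Brazil
        System.out.println(sorted());                  // [Brazil, Japan, Norway]
    }
}
```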
You should use a HashSet, which provides both the uniqueness of elements in a set and constant-time lookup.
Try a Map<K,V>, which is Java's equivalent of a Dictionary<K,V> (the java.util.Dictionary class itself is obsolete). I recommend it.
I think a HashSet should be suitable so long as you don't expect to have duplicates. HashSets provide constant-time lookup, which will speed up searches. You can consider thread-safe variants such as Collections.synchronizedSet or CopyOnWriteArrayList if you expect the data structure to be accessed and modified by multiple threads. You'll need to provide more details on the use case to narrow down your options.
I did something very similar recently, as I have a list of languages in my app. I just use an Enum for that. The list of countries is fairly stable, so you shouldn't need to recompile this class too often ;)
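A minimal sketch of the enum approach might look like this. The Country enum, its three constants, and the fromCode helper are hypothetical; a real enum would list every country.

```java
import java.util.EnumSet;
import java.util.Set;

public class CountryEnumDemo {
    // Hypothetical, deliberately tiny enum for illustration.
    enum Country {
        BRAZIL("BR", "Brazil"),
        JAPAN("JP", "Japan"),
        NORWAY("NO", "Norway");

        private final String code;
        private final String displayName;

        Country(String code, String displayName) {
            this.code = code;
            this.displayName = displayName;
        }

        String code() {
            return code;
        }

        String displayName() {
            return displayName;
        }

        // Linear scan over values(); acceptable for a few hundred constants.
        static Country fromCode(String code) {
            for (Country c : values()) {
                if (c.code.equals(code)) {
                    return c;
                }
            }
            throw new IllegalArgumentException("Unknown country code: " + code);
        }
    }

    public static void main(String[] args) {
        // EnumSet is a compact Set implementation for enum constants.
        Set<Country> visited = EnumSet.of(Country.JAPAN, Country.NORWAY);
        System.out.println(visited.contains(Country.BRAZIL));     // false
        System.out.println(Country.fromCode("JP").displayName()); // Japan
    }
}
```

The trade-off is the one the answer mentions: adding or renaming a country means changing and recompiling the enum.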
Related
Java has tons of different Collections designed for concurrency and thread safety, and I'm at a loss as to which one to choose for my situation.
Multiple threads may be calling .add() and .remove(), and I will be copying this list frequently with something like List<T> newList = new ArrayList<T>(concurrentList). I will never be looping over the concurrent list.
I thought about something like CopyOnWriteArrayList, but I've read that it can be very inefficient because it copies itself every time it's modified. I'm hoping to find a good compromise between safety and efficiency.
What is the best list (or set) for this situation?
As @SpiderPig said, the best case scenario with a List would be an immutable, singly-linked list.
However, looking at what's being done here, a List is unnecessary (@bhspencer's comment). A ConcurrentSkipListSet will work most efficiently (@augray).
This Related Thread's accepted answer offers more insight on the pros and cons of different concurrent collections.
You might want to look into whether a ctrie would be appropriate for your use case - it has thread-safe add and remove operations, and "copying" (in actuality, taking a snapshot of) the data structure runs in O(1). I'm aware of two JVM implementations of the data structure: implementation one, implementation two.
Collections.newSetFromMap(new ConcurrentHashMap<...>())
This is typically how a normal Set is built (HashSet is really a thin wrapper over HashMap). It offers the performance and concurrency advantages of ConcurrentHashMap without the extra features of ConcurrentSkipListSet (ordering), COW lists (copying on every modification), or concurrent queues (FIFO/LIFO ordering).
Edit: I didn't see @bhspencer's comment on the original post, apologies for stealing the spotlight.
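Here is a rough sketch of that idea applied to the question's add/remove/copy pattern. The wrapper class ConcurrentSetDemo and its method names are made up for illustration; only Collections.newSetFromMap and ConcurrentHashMap come from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetDemo {
    // A thread-safe Set view backed by a ConcurrentHashMap.
    private final Set<String> items =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    public boolean add(String s) {
        return items.add(s);
    }

    public boolean remove(String s) {
        return items.remove(s);
    }

    // Copies the current contents. The copy is weakly consistent: elements
    // added or removed while the copy is being made may or may not appear in it.
    public List<String> snapshot() {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentSetDemo demo = new ConcurrentSetDemo();
        Thread a = new Thread(() -> {
            for (int i = 0; i < 1000; i++) demo.add("a" + i);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 1000; i++) demo.add("b" + i);
        });
        a.start();
        b.start();
        List<String> during = demo.snapshot(); // size depends on timing
        a.join();
        b.join();
        System.out.println(demo.snapshot().size()); // 2000
    }
}
```

If the copy must reflect one exact point in time, this approach is not enough on its own; the snapshot here is only guaranteed to be safe, not atomic.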
HashSet, being hash-based, would be better than a List for lookups.
Adding at the end and removing from the front work well with LinkedList.
Access by index is fast in ArrayList because it is backed by an array; searching for a value is still a linear scan.
Given a set of attributes and a comparator, I'd like to generate an order-preserving hash code that provides O(1) access. Is there a Java library for this sort of thing, or would I have to design the hash function myself?
Try:
java.util.LinkedHashMap()
There is no single collection that will do this. Depending on the detailed requirements, there are several options to choose from.
For simplicity, I would just use a HashMap for lookups and when I need the sorted data, I'd make a copy of the values and sort it:
// V is the map's value type; comparator is a Comparator<? super V> that you supply
List<V> sorted = new ArrayList<>(hashMap.values());
sorted.sort(comparator);
This suffices for most real world use cases.
You could also write your own super-container that internally holds the elements in two collections, one HashMap and maybe a TreeSet. You can then easily provide access methods that make use of the collection better for the purpose of the method. Just make sure you make additions and removals affect both the contained collections.
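A minimal sketch of such a two-collection container is shown below. The class name IndexedSortedStore and its methods are invented for illustration. It assumes each value is stored under a single key and that the comparator never returns 0 for two distinct values, because a TreeSet treats comparator-equal elements as duplicates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class IndexedSortedStore<K, V> {
    private final Map<K, V> byKey = new HashMap<>(); // O(1) expected lookup
    private final TreeSet<V> sorted;                 // ordered iteration

    public IndexedSortedStore(Comparator<? super V> order) {
        this.sorted = new TreeSet<>(order);
    }

    // Every mutation updates both collections so they stay in sync.
    public void put(K key, V value) {
        V old = byKey.put(key, value);
        if (old != null) {
            sorted.remove(old);
        }
        sorted.add(value);
    }

    public V remove(K key) {
        V old = byKey.remove(key);
        if (old != null) {
            sorted.remove(old);
        }
        return old;
    }

    public V get(K key) {
        return byKey.get(key);
    }

    // Returns a copy so callers cannot disturb the internal TreeSet.
    public List<V> sortedValues() {
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        IndexedSortedStore<Integer, String> store =
                new IndexedSortedStore<>(Comparator.<String>naturalOrder());
        store.put(1, "pear");
        store.put(2, "apple");
        store.put(3, "fig");
        System.out.println(store.get(2));         // apple
        System.out.println(store.sortedValues()); // [apple, fig, pear]
    }
}
```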
This question relates to using most efficient data structure for a part of a uni-project.
I have to store several instruction objects in a data structure. Each instruction has a unique int ID called Stage. Is HashMap the best choice for finding the instruction I need quickly? I haven't used it before, but from the description it seems that using the int ID as the key would make this run efficiently. If you can, please suggest a more efficient way to do it. Thanks
If you only want to look up entries, and not add, delete, move, sort or do anything else,
then an array is the fastest data structure for this.
Yes. Some kind of Map seems to be the data structure of choice in your scenario.
Note that a HashMap does not maintain the order of its elements. If order is important to you, I suggest you use LinkedHashMap (or perhaps even some List structure) instead.
I think that's the best way because that way you can access the table in O(1).
It also depends on the type of your IDs; an array may be enough (and even more efficient), but generally a hash table is more flexible for these purposes.
If you know the domain of the keys, an ArrayList or a plain array may be even more efficient. But there are reasons not to use plain arrays too much.
If the IDs are simply integers of the form 0, 1, 2, ..., n, then an array would be the best choice.
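To make the array-versus-map trade-off concrete, here is a small sketch. The Instruction class and both helper methods are hypothetical stand-ins for the question's types. The array version assumes the stage IDs are exactly 0..n-1 with no gaps; the map version makes no such assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class InstructionLookup {
    // Hypothetical stand-in for the question's instruction class.
    static final class Instruction {
        final int stage;
        final String text;

        Instruction(int stage, String text) {
            this.stage = stage;
            this.text = text;
        }
    }

    // Dense IDs 0..n-1 (assumption): use the stage directly as the array index.
    static Instruction[] indexByArray(Instruction... input) {
        Instruction[] table = new Instruction[input.length];
        for (Instruction instruction : input) {
            table[instruction.stage] = instruction;
        }
        return table;
    }

    // Sparse or arbitrary IDs: a HashMap keyed by the boxed stage.
    static Map<Integer, Instruction> indexByMap(Instruction... input) {
        Map<Integer, Instruction> map = new HashMap<>();
        for (Instruction instruction : input) {
            map.put(instruction.stage, instruction);
        }
        return map;
    }

    public static void main(String[] args) {
        Instruction[] input = {
            new Instruction(2, "store"),
            new Instruction(0, "load"),
            new Instruction(1, "add")
        };
        System.out.println(indexByArray(input)[0].text);   // load
        System.out.println(indexByMap(input).get(1).text); // add
    }
}
```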
Let's say I want to put words in a data structure and I want to have constant time lookups to see if the word is in this data structure. All I want to do is to see if the word exists. Would I use a HashMap (containsKey()) for this? HashMaps use key->value pairings, but in my case I don't have a value. Of course I could use null for the value, but even null takes space. It seems like there ought to be a better data structure for this application.
The collection could potentially be used by multiple threads, but since the objects contained by the collection would not change, I do not think I have a synchronization/concurrency requirement.
Can anyone help me out?
Use HashSet instead. It's a hash implementation of Set, which is used primarily for exactly what you describe (an unordered set of items).
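You'd generally use an implementation of Set, and most usually HashSet. If you did need concurrent access, the JDK has no ConcurrentHashSet class, but a set backed by ConcurrentHashMap, created with ConcurrentHashMap.newKeySet() (Java 8+) or Collections.newSetFromMap(new ConcurrentHashMap<>()), is a drop-in replacement that provides safe, concurrent access, including safe iteration over the set.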
I'd recommend in any case referring to it as simply a Set throughout your code, except in the one place where you construct it; that way, it's easier to drop in one implementation for the other if you later require it.
Even if the set is read-only, if it's used by a thread other than the one that creates it, you do need to think about safe publication (that is, making sure that any other thread sees the set in a consistent state: remember that memory writes, even in constructors, aren't guaranteed to be made available to other threads when or in the order you expect, unless you take steps to ensure this). This can be done by doing both of the following:
making sure the only reference(s) to the set are in final fields;
making sure that it really is true that no thread modifies the set.
You can help to ensure the latter by using the Collections.unmodifiableSet() wrapper. This gives you an unmodifiable view of the given set-- so provided no other "normal" reference to the set escapes, you're safe.
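The two conditions and the unmodifiable wrapper can be combined roughly as follows. The class name WordList and its methods are made up for illustration; the sketch assumes the constructor does not let `this` escape, which is what makes the final field safe to read from other threads.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public final class WordList {
    // final field: any thread that sees a WordList sees the fully built set.
    private final Set<String> words;

    public WordList(String... words) {
        // Private copy wrapped as unmodifiable, so no "normal" reference escapes.
        this.words = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(words)));
    }

    public boolean contains(String word) {
        return words.contains(word);
    }

    // Safe to hand out: callers get a read-only view.
    public Set<String> asSet() {
        return words;
    }

    public static void main(String[] args) {
        WordList list = new WordList("alpha", "beta");
        System.out.println(list.contains("alpha")); // true
        try {
            list.asSet().add("gamma");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");        // read-only
        }
    }
}
```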
You probably want to use a java.util.Set. Implementations include java.util.HashSet, which is the Set equivalent of HashMap.
Even if the objects contained in the collection do not change, you may need to do synchronization. Do new objects need to be added to the Set after the Set is passed to a different thread? If so, you can use Collections.synchronizedSet() to make the Set thread-safe.
If you have a Map with values, and you have some code that just wants to treat the Map as a Set, you can use Map.keySet() (though keep in mind that keySet() returns a Set view of the keys in the Map; if the Map is mutable, removing an element from that set also removes the mapping from the Map).
You want to use a Collection implementing the Set interface, probably HashSet to get the performance you stated. See http://java.sun.com/javase/6/docs/api/java/util/Set.html
Other than Sets, in some circumstances you might want to convert a Map into a Set with Collections.newSetFromMap(Map<E,Boolean>) (some Maps disallow null values, hence the Boolean).
As everyone said, HashSet is probably the simplest solution, but lookups in a HashSet are constant time only on average (entries that collide are chained in the same bucket), and it stores a dummy object (always the same one) as the value for every entry...
For information, here is a list of data structures; maybe you'll find one that better fits your needs.
Is there any practical difference between a Set and Collection in Java, besides the fact that a Collection can include the same element twice? They have the same methods.
(For example, does Set give me more options to use libraries which accept Sets but not Collections?)
edit: I can think of at least 5 different situations to judge this question. Can anyone else come up with more? I want to make sure I understand the subtleties here.
1. designing a method which accepts an argument of Set or Collection. Collection is more general and accepts more possibilities of input. (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Collection.)
2. designing a method which returns a Set or Collection. Set offers more guarantees than Collection (even if it's just the guarantee not to include one element twice). (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Set.)
3. designing a class that implements the interface Set or Collection. Similar issues as #2. Users of my class/interface get more guarantees, subclassers/implementers have more responsibility.
4. designing an interface that extends the interface Set or Collection. Very similar to #3.
5. writing code that uses a Set or Collection. Here I might as well use Set; the only reasons for me to use Collection are if I get back a Collection from someone else's code, or if I have to handle a collection that contains duplicates.
Collection is also the supertype of List, Queue, Deque, and others, so it gives you more options. For example, I try to use Collection as a parameter to library methods that shouldn't explicitly depend on a certain type of collection.
Generally, you should use the right tool for the job. If you don't want duplicates, use Set (or SortedSet if you want ordering, or LinkedHashSet if you want to maintain insertion order). If you want to allow duplicates, use List, and so on.
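A quick sketch of how those choices differ on the same input (the class name OrderingDemo and the sample words are invented for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

public class OrderingDemo {
    static final List<String> INPUT = Arrays.asList("pear", "apple", "fig", "apple");

    // LinkedHashSet: no duplicates, iteration in insertion order.
    static List<String> insertionOrderUnique() {
        return new ArrayList<>(new LinkedHashSet<>(INPUT));
    }

    // TreeSet (a SortedSet): no duplicates, iteration in sorted order.
    static List<String> sortedUnique() {
        return new ArrayList<>(new TreeSet<>(INPUT));
    }

    // List: duplicates kept, insertion order kept.
    static List<String> withDuplicates() {
        return new ArrayList<>(INPUT);
    }

    public static void main(String[] args) {
        System.out.println(insertionOrderUnique()); // [pear, apple, fig]
        System.out.println(sortedUnique());         // [apple, fig, pear]
        System.out.println(withDuplicates());       // [pear, apple, fig, apple]
    }
}
```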
I think you already have it figured out- use a Set when you want to specifically exclude duplicates. Collection is generally the lowest common denominator, and it's useful to specify APIs that accept/return this, which leaves you room to change details later on if needed. However if the details of your application require unique entries, use Set to enforce this.
Also worth considering is whether order is important to you; if it is, use List, or LinkedHashSet if you care about order and uniqueness.
See Java's Collection tutorial for a good walk-through of Collection usage. In particular, check out the class hierarchy.
As @mmyers states, Collection includes Set, as well as List.
When you declare something as a Set, rather than a Collection, you are saying that the variable cannot be a List or a Map. It will always be a Collection, though. So, any function that accepts a Collection will accept a Set, but a function that accepts a Set cannot take a Collection (unless you cast it to a Set).
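As a small illustration of that asymmetry, the sketch below declares one method that takes a Collection and one that returns a Set. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CollectionVsSet {
    // Accepts any Collection: a List, a Set, a Queue, and so on.
    static int totalLength(Collection<String> items) {
        int total = 0;
        for (String item : items) {
            total += item.length();
        }
        return total;
    }

    // Returns a Set, which promises callers that there are no duplicates.
    static Set<String> distinct(Collection<String> items) {
        return new HashSet<>(items);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "bb", "a");
        Set<String> set = distinct(list);

        System.out.println(totalLength(list)); // 4
        System.out.println(totalLength(set));  // 3, the duplicate "a" is gone
        // A method declared to take Set<String> would reject `list`
        // at compile time unless it were converted or cast first.
    }
}
```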
One other thing to consider... Sets have extra overhead in time, memory, and coding in order to guarantee that there are no duplicates. (Time and memory because sets are usually backed by a HashMap or a Tree, which adds overhead over a list or an array. Coding because you have to implement the hashCode() and equals() methods.)
I usually use sets when I need a fast implementation of contains() and use Collection or List otherwise, even if the collection shouldn't have duplicates.
You should use a Set when that is what you want.
For example, when what you need is effectively a List with no particular order and no duplicates. Methods like contains are quite useful.
A collection is much more generic. I believe that what mmyers wrote on their usage says it all.
The practical difference is that Set enforces set logic, i.e. no duplicates and no guaranteed order, while Collection does not. So if you need a Collection and you have no particular requirement for avoiding duplicates, then use a Collection. If you have the requirement for a Set, then use Set. Generally, use the most general interface that meets your needs.
As Collection is a supertype of Set and SortedSet, these can be passed to a method which expects a Collection. Collection just means it may or may not be sorted, ordered, or allow duplicates.