I have a set of elements with two properties, a name (string, not unique) and id (integer, unique). All elements with the same name are stored together, sorted according to some criteria.
Insertion is done just once as all elements are known beforehand, so it can be done easily. Deletion is done according to either the order (the first one) or eventually the id. Reading the values will be the most common (and relevant) operation.
Performance is the top requirement for the data structure. I thought a multikey, linked data structure or a mixed hashmap/stack would be ideal, but I know of none. Some options I considered are:
- Guava tables (multiple keys), but they don't have push/pop behaviors.
- LinkedHashMaps, but they just have one key.
Of course I can use LinkedHashMaps and iterate for deletion in those cases where I have to delete an element based on the id. I just want to know if there is something already implemented out there with high performance.
Any suggestions?
Thanks to everyone
Use a Map<String, TreeSet<Integer>>. This will allow you to store several items under the same key, and keep the integer values sorted.
The idea is that you have a single key map to a data structure that can hold multiple values. To insert a name, value pair, you could do something like:
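import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

private final Map<String, TreeSet<Integer>> map = new HashMap<>();

// Create the per-name set on first use, then add the value to it.
public void insert(String name, int value) {
    map.computeIfAbsent(name, k -> new TreeSet<>()).add(value);
}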
A LinkedHashMap just keeps track of insertion order of the keys, and iterates in that order--it wouldn't improve performance over using a HashMap.
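The two deletion cases from the question (remove the first element of a group, or remove a specific id) could be sketched on the same structure roughly as follows. The class name NameIndex and its method names are made up for illustration, and the sketch assumes that the "sorting criteria" is simply the natural order of the ids and that the name is known when deleting by id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: groups ids by name and keeps each group sorted.
class NameIndex {
    private final Map<String, TreeSet<Integer>> map = new HashMap<>();

    void insert(String name, int id) {
        map.computeIfAbsent(name, k -> new TreeSet<>()).add(id);
    }

    // Removes and returns the first (smallest) id for a name, or null if none.
    Integer removeFirst(String name) {
        TreeSet<Integer> ids = map.get(name);
        if (ids == null) {
            return null;
        }
        Integer first = ids.pollFirst();
        if (ids.isEmpty()) {
            map.remove(name);
        }
        return first;
    }

    // Removes a specific id from a name's group; O(log n) in the group size.
    boolean removeById(String name, int id) {
        TreeSet<Integer> ids = map.get(name);
        if (ids == null || !ids.remove(id)) {
            return false;
        }
        if (ids.isEmpty()) {
            map.remove(name);
        }
        return true;
    }

    // Reads the first id without removing it, or null if the name is unknown.
    Integer peekFirst(String name) {
        TreeSet<Integer> ids = map.get(name);
        return ids == null ? null : ids.first();
    }
}
```

If ids have to be deleted without knowing the name, a second HashMap from id to name would be needed alongside this one.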
Related
I'm trying to remove duplicates from key-value pairs, and sorting the data first seems like the best way to do this. I have tuples (both values are Integers), so the code doesn't necessarily have to work for different objects, and if it can be optimised for Integers that would be great. I would like to sort all my pairs first by value, and then by key (note that I need both operations while maintaining the key-value relationship).
I'm new to Java, and I was wondering if there are sorting methods in a Map (or any other data structure I can use) that would do this for me. Since the dataset I'm using is huge (>50GB), I have to save time wherever possible. I have tried simply adding all the pairs into a Set (as a concatenated string of both integers) and then taking them out, but it takes too long. I'm open to switching to external-sort algorithms if needed (I'm using a PC with 64 GB of memory, so anything that takes more than O(n) space will be problematic).
Well, you can both sort and eliminate duplicates by storing the data in a TreeMap. TreeMap is an implementation of Map whose keys are sorted by their natural order. For a custom key type, you can implement Comparable<Data_Type> and override public int compareTo(T t) to define the sorting order.
As this is not a multikey map, each key can exist only once in the Map, so an entry with a duplicate key automatically overwrites the earlier one.
Have a look at this link: Sort a HashMap in Java
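Since the question asks for something optimised for integer pairs, here is a different technique from the TreeMap suggestion, offered only as a rough sketch: pack each (key, value) pair into one long, sort the primitive array, and drop adjacent duplicates. The class name PairSorter and its methods are invented for this example. A single Java array cannot hold a 50GB dataset, so this would at best handle one in-memory chunk of an external sort.

```java
import java.util.Arrays;

// Hypothetical helper for (key, value) pairs of ints, sorted by value, then key.
class PairSorter {
    // Packs value into the high 32 bits and key into the low 32 bits.
    // Flipping the key's sign bit makes the low bits order like a signed int.
    static long pack(int key, int value) {
        return ((long) value << 32) | ((key ^ Integer.MIN_VALUE) & 0xFFFFFFFFL);
    }

    static int key(long packed) {
        return ((int) packed) ^ Integer.MIN_VALUE;
    }

    static int value(long packed) {
        return (int) (packed >> 32);
    }

    // Sorts the array in place, then returns a copy without duplicates.
    static long[] sortAndDedupe(long[] packed) {
        Arrays.sort(packed);
        int n = 0;
        for (int i = 0; i < packed.length; i++) {
            if (i == 0 || packed[i] != packed[i - 1]) {
                packed[n++] = packed[i];
            }
        }
        return Arrays.copyOf(packed, n);
    }
}
```

Each pair costs 8 bytes and there is no per-object overhead, which is the main reason to prefer this over a Set of strings when memory is tight.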
Redis has a data structure called a sorted set.
The interface is roughly that of a SortedMap, but sorted by value rather than key. I could almost make do with a SortedSet, but they seem to assume static sort values.
Is there a canonical Java implementation of a similar concept?
My immediate use case is to build a set with a TTL on each element. The value of the map would be the expiration time, and I'd periodically prune expired elements. I'd also be able to bump the expiration time periodically.
So... several things.
First, decide which kind of access you'll be doing more of. If you'll be doing more HashMap actions (get, put) than accessing a sorted list, then you're better off just using a HashMap and sorting the values when you want to prune the collection.
As for pruning the collection, it sounds like you want to just remove values that have a time less than some timestamp rather than removing the earliest n items. If that's the case then you're better off just filtering the HashMap based on whether the value meets a condition. That's probably faster than trying to sort the list first and then remove old entries.
Since you need two separate conditions, one on the keys and the other one on the values, it is likely that the best performance on very large amounts of data will require two data structures. You could rely on a regular Set and, separately, insert the same objects in a PriorityQueue ordered by TTL. Bumping the TTL could be done by writing to a field of the object that holds an additional TTL; then, when you remove the next object, you check whether there is an additional TTL, and if so, you put the object back with this new TTL and reset the additional TTL to 0 [I suggest this because removing an arbitrary element from a PriorityQueue costs O(n)]. This would yield O(log n) time for insertion and for removal of the next object (plus the cost due to bumped TTLs, which depends on how often bumps happen), and O(1) or O(log n) time for bumping a TTL, depending on the implementation of Set that you choose.
Of course, the cleanest approach would be to design a new class encapsulating all this.
Also, all of this is overkill if your data set is not very large.
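A minimal sketch of the Set plus PriorityQueue idea with lazy re-queuing might look like this. It is not a definitive implementation: the class name TtlSet is made up, a HashMap from key to current expiry stands in for both the Set and the "additional TTL" field, timestamps are plain long values supplied by the caller, and bumps are assumed to only ever extend the expiry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical TTL set: a map for membership plus a queue ordered by expiry.
class TtlSet {
    // One queue entry per live key, holding the expiry it was scheduled with.
    private static final class Entry {
        final String key;
        final long expiresAt;

        Entry(String key, long expiresAt) {
            this.key = key;
            this.expiresAt = expiresAt;
        }
    }

    // Current (possibly bumped) expiry time for each live key.
    private final Map<String, Long> expiry = new HashMap<>();
    private final PriorityQueue<Entry> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.expiresAt, b.expiresAt));

    // Adds a key, or bumps an existing key's expiry without touching the queue.
    void put(String key, long expiresAt) {
        if (expiry.put(key, expiresAt) == null) {
            queue.add(new Entry(key, expiresAt));
        }
    }

    boolean contains(String key) {
        return expiry.containsKey(key);
    }

    // Removes every key whose current expiry is <= now; returns how many.
    int prune(long now) {
        int removed = 0;
        while (!queue.isEmpty() && queue.peek().expiresAt <= now) {
            Entry e = queue.poll();
            long current = expiry.get(e.key);
            if (current > now) {
                // The TTL was bumped since this entry was queued: reschedule.
                queue.add(new Entry(e.key, current));
            } else {
                expiry.remove(e.key);
                removed++;
            }
        }
        return removed;
    }
}
```

The queue never needs an arbitrary removal here: a bumped key is simply re-queued the next time its stale entry reaches the head.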
You can implement it using a combination of two data structures.
A sorted mapping of keys to scores. And a sorted reverse mapping of scores to keys.
In Java, typically these would be implemented with TreeMap (if we are sticking to the standard Collections Framework).
Redis uses skip lists for maintaining the ordering, but skip lists and balanced binary search trees (such as the red-black tree behind TreeMap) both serve the purpose of providing O(log(N)) access here (expected for skip lists, guaranteed for TreeMap).
A sorted set can be implemented as an independent class as follows:
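import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

class SortedSet {
    private final TreeMap<String, Integer> keyToScore = new TreeMap<>();
    private final TreeMap<Integer, Set<String>> scoreToKey = new TreeMap<>();

    void addItem(String key, int score) {
        Integer oldScore = keyToScore.put(key, score);
        if (oldScore != null) {
            // Remove the key from the bucket of its old score
            Set<String> oldKeys = scoreToKey.get(oldScore);
            oldKeys.remove(key);
            if (oldKeys.isEmpty()) scoreToKey.remove(oldScore);
        }
        // Add the key to the bucket of its new score
        scoreToKey.computeIfAbsent(score, s -> new HashSet<>()).add(key);
    }

    List<String> getKeysInRange(int startScore, int endScore) {
        // Collect all keys whose score lies in [startScore, endScore]
        List<String> keys = new ArrayList<>();
        for (Set<String> bucket : scoreToKey.subMap(startScore, true, endScore, true).values()) {
            keys.addAll(bucket);
        }
        return keys;
    }
    // ....
}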
I'd like to maintain a set of data that has two main attributes: 1. I can quickly look up the existence of an object by a numerical ID, and 2. I want to sort the data, but avoid needlessly sorting it since it can be slow. For a more concrete example, I have a set of user data, where each user has a unique ID (an int) and a unique username (a String). I'll be adding and removing users, and occasionally I want to generate a human-readable, alphabetically-sorted list to the user, but as my number of users increases, so does the time needed to sort the data.
How would you structure this? The only reasonable approach I can think of involves creating two separate data structures, and redundantly adding/removing items in BOTH structures at the same time. As my data grows, it will be using more memory than a single structure would. I might also introduce more bugs this way, as I have to remind myself to duplicate the operations on both structures when I come back to add to the code later. In other words, I could have:
TreeMap<String,Integer> nameSortedMap = new TreeMap<String,Integer>(String.CASE_INSENSITIVE_ORDER);
and
Map<Integer,String> idMap = new HashMap<Integer,String>();
Whenever I add or remove data, I do it on both maps. If I am retrieving a username by ID, I'd call idMap.get(id) or idMap.containsKey(id) (to see if a user exists). On the other hand, if I need to display a sorted list, I would use nameSortedMap.keySet(), which I gather should already be in name order, avoiding the need for additional work each time a sorted list is needed.
How's my thought process? Is there a better or simpler way to accomplish this? Thank you!
There are two ways I can think of:
- Use a database and index both columns. Databases are fast, and can be very small (see: SQLite), but they're probably overkill if you don't need to save the data, or if this is the only thing you would use it for.
- Create a class containing both of your maps above, which handles all inserting and deleting. That way, you only have one place where you have to remember to do operations on both. This is one of the major selling points for object oriented programming.
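The "one class containing both maps" option could be sketched along these lines. The class name UserDirectory and its method names are hypothetical, and the sketch assumes usernames are unique ignoring case, since the TreeMap in the question uses String.CASE_INSENSITIVE_ORDER.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wrapper that keeps an id lookup and a name-sorted view in sync.
class UserDirectory {
    private final Map<Integer, String> idToName = new HashMap<>();
    private final TreeMap<String, Integer> nameToId =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Adds a user; returns false if the id or the username is already taken.
    boolean add(int id, String name) {
        if (idToName.containsKey(id) || nameToId.containsKey(name)) {
            return false;
        }
        idToName.put(id, name);
        nameToId.put(name, id);
        return true;
    }

    // Removes a user by id from both maps; returns the removed name or null.
    String remove(int id) {
        String name = idToName.remove(id);
        if (name != null) {
            nameToId.remove(name);
        }
        return name;
    }

    boolean exists(int id) {
        return idToName.containsKey(id);
    }

    String nameOf(int id) {
        return idToName.get(id);
    }

    // Snapshot of usernames in case-insensitive alphabetical order.
    List<String> sortedNames() {
        return new ArrayList<>(nameToId.keySet());
    }
}
```

Because neither map is exposed, callers cannot update one without the other, which addresses the worry about forgetting to duplicate operations.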
If you don't need the ordered list very often, I don't think it is necessary to keep two maps. But if you prefer to do that, I would suggest creating one class that extends HashMap and implements the methods to handle both maps. Example:
import java.util.HashMap;
import java.util.Set;
import java.util.TreeMap;

public class UserMap extends HashMap<Integer, String> {
    private final TreeMap<String, Integer> nameSortedMap = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    @Override
    public String put(Integer key, String value) {
        String previous = super.put(key, value);
        if (previous != null) {
            // Drop the stale name that this id used to map to
            nameSortedMap.remove(previous);
        }
        nameSortedMap.put(value, key);
        return previous;
    }

    @Override
    public String remove(Object key) {
        // Must call super.remove: calling remove(key) here would recurse forever
        String removed = super.remove(key);
        if (removed != null) {
            nameSortedMap.remove(removed);
        }
        return removed;
    }

    public Set<String> getSortedNames() {
        return nameSortedMap.keySet();
    }
}
I didn't get the sense of Maps in Java. When is it recommended to use a Map instead of a List?
Say you have a bunch of students with names and student IDs. If you put them in a List, the only way to find the student with student_id = 300 is to look at each element of the list, one at a time, until you find the right student.
With a Map, you associate each student's ID and the student instance. Now you can say, "get me student 300" and get that student back instantly.
Use a Map when you need to pick specific members from a collection. Use a List when it makes no sense to do so.
Say you had exactly the same student instances but your task was to produce a report of all students' names. You'd put them in a List since there would be no need to pick and choose individual students and thus no need for a Map.
Java map: An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.
Java list: An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list.
The difference is that they are different. A Map is a mapping of keys to values; a List is a list of items.
I think it's largely a question of how you want to access your data. With a map you can "directly" access your items with a known key; in a list you would have to search for it, even if it's sorted.
Compare:
List<MyObject> list = new ArrayList<MyObject>();
// Fill up the list
// Want to get object "peter"
for (MyObject m : list) {
    if ("peter".equals(m.getName())) {
        // found it
    }
}
In a map you can just type
Map<String, MyObject> map = new HashMap<String, MyObject>();
// Fill map
MyObject getIt = map.get("peter");
If you have data to process and need to do it with all objects anyway, a list is what you want. If you want to process single objects with a well-known key, a map is better.
It's not the full answer (just my 2...) but I hope it might help you.
A map is used as an association of a key and a value. With a list you have basically only values.
The indexes in List are always int, whereas in Map you can have another Object as a key.
Resources :
sun.com - Introduction to the Collections Framework, Map
Depends on your performance concerns. A Map, more specifically a HashMap, gives expected O(1) inserts and removes (assuming a reasonable hash function). A List has at worst O(n) to find an item. So if you would be so kind as to elaborate on what your scenario is, we may be able to help more.
It's probably a good idea to review random access vs. sequential access data structures. They have different run-time complexities and are suitable for different contexts.
When you want to map instead of list. The names of those interfaces have meaning, and you shouldn't ignore it.
Use a map when you want your data structure to represent a mapping for keys to values. Use a list when you want your data to be stored in an arbitrary, ordered format.
Map and List serve different purposes.
List holds a collection of items. It is ordered (you can get an item by index).
Map holds a key -> value mapping, e.g. mapping a person to a position: "JBeg" -> "programmer". It is generally unordered: you can get a value by key, but not by index.
Maps store data objects under unique keys and therefore provide fast access to stored objects. You may use ConcurrentHashMap for thread-safe access in multi-threaded environments.
Lists, on the other hand, may store duplicate data, and you have to iterate over the elements to find a particular one, so access to stored objects is slower.
You may choose any data structure depending upon your requirement.
In Java, ArrayList and HashMap are used as collections. But I couldn't understand in which situations we should use ArrayList and which times to use HashMap. What is the major difference between both of them?
You are asking specifically about ArrayList and HashMap, but I think to fully understand what is going on you have to understand the Collections framework. So an ArrayList implements the List interface and a HashMap implements the Map interface. So the real question is when do you want to use a List and when do you want to use a Map. This is where the Java API documentation helps a lot.
List:
An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list.
Map:
An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.
So as other answers have discussed, the List interface (ArrayList) is an ordered collection of objects that you access using an index, much like an array (in the case of ArrayList, as the name suggests, it is just an array in the background, but a lot of the details of dealing with the array are handled for you). You would use an ArrayList when you want to keep things in a defined order (the order they are added, or the position within the list that you specify when you add the object).
A Map on the other hand takes one object and uses that as a key (index) to another object (the value). So let's say you have objects which have unique IDs, and you know you are going to want to access these objects by ID at some point; the Map will make this very easy on you (and quicker/more efficient). The HashMap implementation uses the hash value of the key object to locate where it is stored, so there is no guarantee of the order of the values anymore. There are however other classes in the Java API that can provide this, e.g. LinkedHashMap, which as well as using a hash table to store the key/value pairs, also maintains a doubly-linked list running through its entries in the order they were added, so you can always iterate over the items in insertion order (if needed).
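To make the ordering difference concrete, here is a small sketch that fills different Map implementations with the same entries and reports the iteration order of their keys. The class name MapOrderDemo and the sample keys are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapOrderDemo {
    // Inserts the same three keys into the given map, in a fixed order,
    // and returns the order in which the map iterates over its keys.
    static List<String> iterationOrder(Map<String, Integer> map) {
        map.put("pear", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return new ArrayList<>(map.keySet());
    }
}
```

Passing a LinkedHashMap gives the keys back in insertion order (pear, apple, mango), a TreeMap gives them sorted (apple, mango, pear), and a HashMap gives an order that depends on the keys' hash codes and should not be relied on.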
If you use an ArrayList, you have to access the elements with an index (int type). With a HashMap, you can access them by an index of another type (for example, a String)
HashMap<String, Book> books = new HashMap<String, Book>();
// String is the type of the index (the key)
// and Book is the type of the elements (the values)
// Like with an arraylist: ArrayList<Book> books = ...;
// Now you have to store the elements with a string key:
books.put("Harry Potter III", new Book("J.K. Rowling", 456, "Harry Potter"));
// Now you can access the elements by using a String index
Book book = books.get("Harry Potter III");
This is impossible (or much more difficult) with an ArrayList. The only good way to access elements in an ArrayList is by getting the elements by their index-number.
So, this means that with a HashMap you can use any object type as the key (as long as it has consistent equals() and hashCode() implementations).
Another helpful example is in a game: you have a set of images, and you want to flip them. So, you write an image-flip method, and then store the flipped results:
HashMap<BufferedImage, BufferedImage> flipped = new HashMap<BufferedImage, BufferedImage>();
BufferedImage player = ...; // On this image the player walks to the left.
BufferedImage flippedPlayer = flip(player); // On this image the player walks to the right.
flipped.put(player, flippedPlayer);
// Now you can access the flipped instance by doing this:
flipped.get(player);
You flip the player image once and then store the result. Here a BufferedImage serves as the key for looking up another BufferedImage in the HashMap.
I hope you understand my second example.
Not really a Java specific question. It seems you need a "primer" on data structures. Try googling "What data structure should you use"
Try this link http://www.devx.com/tips/Tip/14639
From the link :
Following are some tips for matching the most commonly used data structures with particular needs.
When to use a Hashtable?
A hashtable, or similar data structures, are good candidates if the stored data is to be accessed in the form of key-value pairs. For instance, if you were fetching the name of an employee, the result can be returned in the form of a hashtable as a (name, value) pair. However, if you were to return names of multiple employees, returning a hashtable directly would not be a good idea. Remember that the keys have to be unique or your previous value(s) will get overwritten.
When to use a List or Vector?
This is a good option when you desire sequential or even random access. Also, if data size is unknown initially, and/or is going to grow dynamically, it would be appropriate to use a List or Vector. For instance, to store the results of a JDBC ResultSet, you can use the java.util.LinkedList. Whereas, if you are looking for a resizable array, use the java.util.ArrayList class.
When to use Arrays?
Never underestimate arrays. Most of the time, when we have to use a list of objects, we tend to think about using vectors or lists. However, if the size of collection is already known and is not going to change, an array can be considered as the potential data structure. It's faster to access elements of an array than a vector or a list. That's obvious, because all you need is an index. There's no overhead of an additional get method call.
Combinations
Sometimes, it may be best to use a combination of the above approaches. For example, you could use a list of hashtables to suit a particular need.
Set Classes
And from JDK 1.2 onwards, you also have set classes like java.util.TreeSet, which is useful for sorted sets that do not have duplicates. One of the best things about these classes is they all abide by certain interface so that you don't really have to worry about the specifics. For e.g., take a look at the following code.
// ...
List list = new ArrayList();
list.add(
Use a list for an ordered collection of just values. For example, you might have a list of files to process.
Use a map for a (usually unordered) mapping from key to value. For example, you might have a map from a user ID to the details of that user, so you can efficiently find the details given just the ID. (You could implement the Map interface by just storing a list of keys and a list of values, but generally there'll be a more efficient implementation - HashMap uses a hash table internally to get amortised O(1) key lookup, for example.)
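The parenthetical about storing a list of keys and a list of values can be sketched as below. ListBackedMap is a hypothetical name, and for brevity it does not implement the full java.util.Map interface; it only shows why such a design makes every lookup a linear scan, which is what HashMap avoids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical list-backed map: correct, but every lookup is a linear scan.
class ListBackedMap<K, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();

    // Replaces the value if the key exists, otherwise appends a new pair.
    // Returns the previous value, or null if the key was not present.
    V put(K key, V value) {
        int i = indexOf(key);
        if (i >= 0) {
            return values.set(i, value);
        }
        keys.add(key);
        values.add(value);
        return null;
    }

    V get(K key) {
        int i = indexOf(key);
        return i >= 0 ? values.get(i) : null;
    }

    int size() {
        return keys.size();
    }

    // O(n): compares the key against each stored key in turn.
    private int indexOf(K key) {
        for (int i = 0; i < keys.size(); i++) {
            if (Objects.equals(keys.get(i), key)) {
                return i;
            }
        }
        return -1;
    }
}
```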
A Map vs a List.
In a Map, you have key/value pairs. To access a value you need to know the key. There is a relationship between the key and the value that persists and is not arbitrary; they are related somehow. Example: a person's DNA is unique (the key) and maps to the person's name (the value), or a person's SSN (the key) maps to the person's name (the value); there is a strong relationship.
In a List, all you have are values (a person's name), and to access one you need to know its position in the list (its index). But there is no permanent relationship between the value and its index; it is arbitrary.