Is there a data structure that does the following:
Returns the value given the index
Returns the index given the value
Returns all values sorted by index as List<>
As far as I am aware, a HashMap supports property 2 but doesn't support properties 1 and 3.
An ArrayList supports 1 and 3 but not 2.
Is there something that fits my needs?
(1) and (2) describe a bi-directional map; the Guava library provides several implementations of this data structure.
Unfortunately, there isn't a SortedBiMap class at present. However, depending on your specific constraints, you may be able to address (3) in different ways.
For example, the simplest thing to do would be to create a new wrapping type that contains a BiMap<Integer, V> and a List<V> and ensures the two data structures are kept in sync. This may be inefficient for some use-cases (e.g. removals are O(n) due to the backing list) but may well be all you need.
Alternatively you could try to loosen constraint (3) if you don't really need a List, but just need to be able to iterate in a fixed order, in which case you could probably use Guava's ImmutableBiMap, which is guaranteed to iterate in insertion-order.
Otherwise, you could probably create your own SortedBiMap type modeled after HashBiMap but using TreeMap instead of HashMap. This would allow you to iterate over the keys in order (e.g. 0->n) regardless of their insertion order.
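As a rough illustration of that last idea, here is a minimal sketch that uses only java.util rather than Guava. The class name IndexedBiMap and its methods are made up for this example; it pairs a TreeMap (index to value, sorted by index) with a HashMap (value to index) and keeps the two in sync on every put.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class, not part of Guava or the JDK.
class IndexedBiMap<V> {
    private final TreeMap<Integer, V> byIndex = new TreeMap<>();
    private final Map<V, Integer> byValue = new HashMap<>();

    // Stores value under index; rejects a value that already lives at another index.
    void put(int index, V value) {
        Integer existing = byValue.get(value);
        if (existing != null && existing != index) {
            throw new IllegalArgumentException("value already present at index " + existing);
        }
        V old = byIndex.put(index, value);
        if (old != null) {
            byValue.remove(old); // drop the inverse entry of the replaced value
        }
        byValue.put(value, index);
    }

    // (1) value for an index, or null if the index is unused.
    V get(int index) {
        return byIndex.get(index);
    }

    // (2) index for a value, or null if the value is absent.
    Integer indexOf(V value) {
        return byValue.get(value);
    }

    // (3) all values, sorted by index, copied into a List.
    List<V> valuesByIndex() {
        return new ArrayList<>(byIndex.values());
    }
}
```

Removal is left out to keep the sketch short; it would have to update both maps as well.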
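List (any List, including ArrayList) supports all 3 of your requirements. 1 and 3 you already know about; for #2, see the indexOf() method, which scans the list and returns the first matching index (or -1 if the value is absent). See also the related lastIndexOf() method.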
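For reference, a small example of those List calls (the class name ListLookupDemo and the sample data are only for illustration). Note that indexOf() and lastIndexOf() are linear scans on ArrayList, so this is convenient rather than fast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListLookupDemo {
    // Sample data with a duplicate so indexOf and lastIndexOf differ.
    static List<String> sample() {
        return new ArrayList<>(Arrays.asList("a", "b", "c", "b"));
    }

    public static void main(String[] args) {
        List<String> values = sample();
        System.out.println(values.get(1));           // value at index 1
        System.out.println(values.indexOf("b"));     // first index holding "b"
        System.out.println(values.lastIndexOf("b")); // last index holding "b"
        System.out.println(values.indexOf("z"));     // -1 when the value is absent
    }
}
```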
Related
There is a Team object that contains a list of players, List<Players>. All teams need to be stored in a Teams collection.
Conditions:
If a new player needs to be added to a particular team, that team is retrieved from Teams and the player is added to that team's Players list.
Each Team object in the Teams collection needs to be unique based on the team name.
Team objects in the collection need to be sorted by team name.
Considerations:
In this scenario, when I use List<Team>, I can achieve 1 and 3, but uniqueness cannot be satisfied.
If I use TreeSet<Team>, 2 and 3 can be achieved. But as there is no get method on TreeSet, a particular team cannot be selected.
So I ended up using TreeMap<teamName, Team>. This makes 1, 2 and 3 all possible, but I don't think it's a good way to do it.
Which data structure is ideal for this use case? Preferably from the Java collections.
You can use your TreeSet if you wish. Since the Set interface has no get, you can use remove(Object o) instead: remove the object, make your modifications, then add it back into the set.
I think extending (i.e. creating a subclass from) ArrayList or LinkedList and overriding the set(), add(), addAll(), remove(), and removeRange() methods in such way that they ensure the uniqueness and sortedness conditions (invariants) would be a very clean design. You can also implement a binary search method in your class to quickly find a team with a given name.
ArrayList is the better class to base yours on if you aren't going to add or remove teams too frequently. ArrayList gives you O(n) insertion and removal (elements have to be shifted), but if you use binary search, finding a team by name and checking uniqueness cost only O(log n), where n is the number of elements in the list. Access by index is O(1).
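A minimal sketch of the sorted-and-unique invariant, assuming teams are identified by a plain String name. To keep it short it wraps an ArrayList instead of subclassing it, which is a simplification of the design described above; the class name SortedUniqueNames is invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: keeps names sorted and unique inside an ArrayList.
class SortedUniqueNames {
    private final List<String> names = new ArrayList<>();

    // Returns false if the name is already present; otherwise inserts it at its sorted position.
    boolean add(String name) {
        int pos = Collections.binarySearch(names, name); // O(log n) search
        if (pos >= 0) {
            return false;                                // duplicate, keep the list unchanged
        }
        names.add(-pos - 1, name);                       // O(n) insert: later elements shift right
        return true;
    }

    // O(log n) membership test via binary search.
    boolean contains(String name) {
        return Collections.binarySearch(names, name) >= 0;
    }

    // Read-only view, always in sorted order.
    List<String> asList() {
        return Collections.unmodifiableList(names);
    }
}
```

Collections.binarySearch returns -(insertion point) - 1 when the key is missing, which is why -pos - 1 is the right place to insert.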
See the generics tutorial for subclassing generics.
How about using Guava's Multimap? More precisely, a SetMultimap. Specifically, a SortedSetMultimap. Even more specifically, its TreeMultimap implementation (1).
Explanations:
In a Multimap, a key points not to a single value, but rather to a Collection of values.
This means you can bind to a single Team key a collection of several Player values, so that's Req1 solved.
In a SetMultimap, the values stored under each key form a Set, so the same key-value pair cannot appear twice; the keys themselves are distinct, as in any Multimap.
This gets your Req2 solved.
In a SortedSetMultimap, the values are also sorted.
While you don't specifically care for this, it's nice to have.
In a TreeMultimap, the key set and each key's collection of values are sorted.
This gets your Req3 sorted (See what I did there?)
Usage:
// TreeMultimap has no public constructor; create() uses the natural ordering of keys and values,
// so Team and Player must both implement Comparable.
TreeMultimap<Team, Player> ownership = TreeMultimap.create();
ownership.put(team1, playerA);
ownership.put(team1, playerB);
ownership.put(team2, playerC);
Collection<Player> playersOfTeam1 = ownership.get(team1); // contains playerA, playerB
SortedSet<Team> allTeams = ownership.keySet(); // contains team1, team2
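If pulling in Guava is not an option, a rough standard-library approximation of the same idea is a TreeMap whose values are TreeSets. This is only a sketch: the TeamRoster class and its method names are made up for illustration, and teams and players are reduced to plain strings.

```java
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for a TreeMultimap<String, String>, built from java.util only.
class TeamRoster {
    private final TreeMap<String, SortedSet<String>> playersByTeam = new TreeMap<>();

    // Creates the team on first use, then adds the player to its sorted set.
    void addPlayer(String team, String player) {
        playersByTeam.computeIfAbsent(team, k -> new TreeSet<>()).add(player);
    }

    // Players of one team in sorted order; empty if the team is unknown.
    SortedSet<String> playersOf(String team) {
        SortedSet<String> players = playersByTeam.get(team);
        if (players == null) {
            return Collections.emptySortedSet();
        }
        return Collections.unmodifiableSortedSet(players);
    }

    // All team names, unique and sorted.
    SortedSet<String> teams() {
        return Collections.unmodifiableSortedSet(playersByTeam.navigableKeySet());
    }
}
```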
Gotchas:
Remember that a TreeMultimap orders and de-duplicates its keys by comparison, so Team must implement Comparable (comparing by name), with equals and hashCode kept consistent with that ordering.
Alternatively, you could use the static create(Comparator<? super K> keyComparator, Comparator<? super V> valueComparator) factory method, which lets you supply a purpose-built comparison if you do not wish to change the natural ordering of Team (use Ordering.natural() as the Player comparator to keep its natural ordering; another nice Guava thing). In any case, make sure it is compatible with equals!
Multimaps are not Maps: putting a new value under a key does not remove the previously held value (that's the whole point), so make sure you understand this. (In a SetMultimap it still holds that you cannot store the same key-value pair twice.)
(1): I am unsure whether SortedSetMultimap is sufficient. Its Javadoc states that the values are sorted, but says nothing about the keys. Does anyone know better?
(2) I assure you, I'm not affiliated to Guava in any way. I just find it awesome!
I need a data structure to do a get / find in Log N time and iterate starting from the object that was returned by the get operation. The iterator should iterate in the same order in which elements are inserted into the data structure.
Can I achieve this using TreeSet ? Or any other data structure?
Thanks!
This answer assumes that you want to get / find items by value, as opposed to access by insertion sequence number. I assume that this value is completely unrelated to the order in which items are inserted.
The closest you can get with standard Java foundation classes is a LinkedHashSet. This allows fast searching and iteration in insertion order. But it does not give you an iterator starting at a given position, so you'll have to implement that yourself. Either based on the LinkedHashSet, or using your own set implementation. I guess the easiest way would be using a HashSet and implementing the linking yourself. That way, you could use the set methods to look up the starting element, and use that to construct an iterator following the links. You could hide the links inside a wrapper object, so you won't have to expose them in your API.
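A minimal sketch of the "HashMap plus your own links" idea. The class LinkedLookupSet and the method iteratorFrom are hypothetical names; lookup is a hash lookup (expected constant time, which satisfies the O(log n) requirement), and removal is left out for brevity.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical set that remembers insertion order and can iterate from any element.
class LinkedLookupSet<E> {
    // Wrapper that hides the link to the next inserted element.
    private static final class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    private final Map<E, Node<E>> nodes = new HashMap<>();
    private Node<E> tail;

    // Appends the value if it is new; returns false for duplicates.
    boolean add(E value) {
        if (nodes.containsKey(value)) {
            return false;
        }
        Node<E> node = new Node<>(value);
        if (tail != null) {
            tail.next = node;
        }
        tail = node;
        nodes.put(value, node);
        return true;
    }

    // Iterates from start (inclusive) in insertion order; empty if start is absent.
    Iterator<E> iteratorFrom(E start) {
        return new Iterator<E>() {
            private Node<E> current = nodes.get(start);

            @Override
            public boolean hasNext() {
                return current != null;
            }

            @Override
            public E next() {
                if (current == null) {
                    throw new NoSuchElementException();
                }
                E value = current.value;
                current = current.next;
                return value;
            }
        };
    }
}
```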
If you start with a SortedMap<Integer, Object> and use keys for keeping track of the insertion order, you'll be able to use the fast tailMap operation for your needs.
If you need to find the position of an object by a key (or maybe by the object itself), then introduce another WeakHashMap<Object, Integer> that will map from your key to the position of the object. You'll then use the retrieved sequence number as the key into the former map.
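Sketched out, the two-map approach could look roughly like this. The class SequencedStore is a made-up name, and the sketch uses a plain HashMap for the reverse lookup instead of the WeakHashMap suggested above, purely to keep the behaviour easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical store: insertion sequence numbers as TreeMap keys, plus a reverse lookup.
class SequencedStore<E> {
    private final TreeMap<Integer, E> bySequence = new TreeMap<>();
    private final Map<E, Integer> sequenceOf = new HashMap<>();
    private int nextSequence = 0;

    // Records the element under the next sequence number; duplicates are ignored.
    void add(E element) {
        if (sequenceOf.containsKey(element)) {
            return;
        }
        sequenceOf.put(element, nextSequence);
        bySequence.put(nextSequence, element);
        nextSequence++;
    }

    // Elements from start (inclusive) onward, in insertion order; empty if start is absent.
    List<E> from(E start) {
        Integer sequence = sequenceOf.get(start);
        if (sequence == null) {
            return new ArrayList<>();
        }
        // tailMap is a view of all entries whose key is >= sequence.
        return new ArrayList<>(bySequence.tailMap(sequence, true).values());
    }
}
```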
Provided you are using Java 6, you can have a look at ConcurrentSkipListSet and ConcurrentSkipListMap.
I'd like to have a Map that is also a Collection. Or more specifically, I'd like to be able to iterate over the entries in a Map, including the case where there are multiple entries for a particular key.
The specific problem I'm trying to solve is providing an object that can be used in JSTL both to iterate over using c:forEach and in an expression like ${a.b.c}. In this example, I'd want ${a.b.c} to evaluate to the first value of c (or null if there are none), but also be able to iterate over all cs with <c:forEach items="${a.b.c}"> and have the loop body see each individual value of c in turn, although they have the same key in the Map.
Looking at things from a method point of view, this should be straightforward, just provide a Map implementation whose entrySet() method returns a set with multiple Entries with the same key. But since this seems to violate the contract of a Map, will things break in subtle yet disastrous ways? Has anyone else done this sort of thing?
(If you guessed I'm trying to present xml, you'd be correct)
EDIT
Please note that this is for use in jstl, so whatever interface I present must meet 2 conditions:
for use with the [] and . operators, it must be a Map, List, array or JavaBeans object (and of those it can't be a List or array because the indexes will not be numbers)
for use with forEach it must be an array, Collection, Iterator, Enumeration, Map, or String.
So I guess the real question is, can I count on jstl only calling .containsKey(), .get(), and .entrySet() and not caring about invariants being violated, and not internally making a copy of the Map which would not preserve the special iteration behavior.
What you are looking for is a Multimap. Guava provides an implementation of it and specifically you are looking for ArrayListMultimap.
I barely remember JSTL, but what you're describing sounds somewhat contradictory:
In forEach, ${a.b.c} should point to some container of values, which we then iterate over.
On the other hand, you say ${a.b.c} "should evaluate to the first value of c" (or null...).
That is an ambiguous definition.
If you feel like Multimap is not what you want, you can provide your own collection implementation (probably internally based on Multimap).
Just as an idea: you can always treat a single element as a list (one that happens to contain exactly one element). This way you would resolve the ambiguity, I guess.
I hope this helps
Having a Map with multiple entries for the same key irreparably breaks the Map contract. If Multimap doesn't work for you, then there's no way to do this without breaking a lot of things.
Specifically, if you pass your monstrosity to something that's specified to take a Map, it'll almost certainly break...and it sounds like that's what you want to do with it, so yeah.
How about using a Map with Collections as values? Then you can have different values for the same key, and you can iterate over them with a nested for-each loop.
You can also easily write a wrapper around an existing Map implementation that gives you a single iterator over all values, if you need it that way.
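To make that concrete, here is a small sketch with a Map<String, List<String>>. The class MultiValueDemo and its helper methods are invented for the example; it says nothing about how JSTL itself would treat such a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiValueDemo {
    // Builds a map where key "c" has two values and key "d" has one.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> map = new LinkedHashMap<>();
        map.computeIfAbsent("c", k -> new ArrayList<>()).add("first");
        map.computeIfAbsent("c", k -> new ArrayList<>()).add("second");
        map.computeIfAbsent("d", k -> new ArrayList<>()).add("only");
        return map;
    }

    // Flattens all values into one list using a nested loop over keys and their values.
    static List<String> allValues(Map<String, List<String>> map) {
        List<String> flat = new ArrayList<>();
        for (List<String> values : map.values()) {
            for (String value : values) {
                flat.add(value);
            }
        }
        return flat;
    }

    // First value for a key, or null if the key has no values.
    static String first(Map<String, List<String>> map, String key) {
        List<String> values = map.get(key);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }
}
```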
I've been using HashMaps since I started programming again in Java without really understanding this Collections thing.
Honestly I am not really sure if using HashMaps all the way would be best for me or for production code. Up until now it didn't matter to me as long as I was able to get the data I need the way I called them in PHP (yes, I admit whatever negative thing you are thinking right now) where $this_is_array['this_is_a_string_index'] provides so much convenience to recall an array of variables.
So now I have been working with Java for more than 3 months, came across the interfaces I specified above, and wondered: why are there so many of these things (not to mention Vector, AbstractList {oh well, the list goes on...})?
I mean how are they different from each other?
And more importantly, what is the best Interface to use in my case?
The API is pretty clear about the differences and/or relations between them:
Collection
The root interface in the collection hierarchy. A collection represents a group of objects, known as its elements. Some collections allow duplicate elements and others do not. Some are ordered and others unordered.
http://download.oracle.com/javase/6/docs/api/java/util/Collection.html
List
An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list.
http://download.oracle.com/javase/6/docs/api/java/util/List.html
Set
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
http://download.oracle.com/javase/6/docs/api/java/util/Set.html
Map
An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.
http://download.oracle.com/javase/6/docs/api/java/util/Map.html
Is there anything in particular you find confusing about the above? If so, please edit your original question. Thanks.
A short summary of common java collections:
'Map': A 'Map' is a container that stores key=>value pairs. This enables fast lookups that use the key to get to its associated value. The two most commonly used implementations in the java.util package are 'HashMap' and 'TreeMap'. The former is implemented as a hash table, while the latter is implemented as a balanced binary search tree (thus also keeping the keys sorted).
'Set': A 'Set' is a container that holds only unique elements. Inserting the same value multiple times will still result in the 'Set' only holding one instance of it. It also provides fast operations to search, remove, add, merge and compute the intersection of two sets. Like 'Map', it has two commonly used implementations, 'HashSet' and 'TreeSet'.
'List': The 'List' interface is implemented by the 'Vector', 'ArrayList' and 'LinkedList' classes. A 'List' is basically a collection of elements that preserve their relative order. You can add/remove elements to it and access individual elements at any given position. Unlike a 'Map', 'List' items are indexed by an int that is their position in the 'List' (the first element being at position 0 and the last at 'List.size()'-1). 'Vector' and 'ArrayList' are implemented using an array while 'LinkedList', as the name implies, uses a linked list. One thing to note is that, unlike PHP's associative arrays (which are more like a 'Map'), an array in Java and many other languages actually represents a contiguous block of memory. The elements in an array are basically laid out side by side in adjacent "slots", so to speak. This gives very fast lookup and write times, much faster than associative arrays, which are implemented using more complex data structures. But arrays can't be indexed by anything other than the numeric positions within the array, unlike associative arrays.
To get a really good idea of what each collection is good for and their performance characteristics I would recommend getting a good idea about data structures like arrays, linked lists, binary search trees, hashtables, as well as stacks and queues. There is really no substitute to learning this if you want to be an effective programmer in any language.
You can also read the Java Collections trail to get you started.
In Brief (and only looking at interfaces):
List - a list of values, something like a "resizable array"
Set - a container that does not allow duplicates
Map - a collection of key/value pairs
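A tiny example of the three interfaces side by side may help; the class name CollectionBasics and the sample data are just for illustration. The Map lookup by string key is the closest match to the PHP associative-array habit mentioned in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CollectionBasics {
    // Counts how often each word occurs, keyed by the word itself.
    static Map<String, Integer> countOccurrences(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "a")); // keeps duplicates and positions
        Set<String> set = new HashSet<>(list);                             // duplicates collapse
        Map<String, Integer> counts = countOccurrences(list);              // lookup by key, like $arr['a']

        System.out.println(list.get(2));     // access by position
        System.out.println(set.size());      // number of distinct elements
        System.out.println(counts.get("a")); // access by key
    }
}
```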
A Map vs a List.
In a Map, you have key/value pairs. To access a value you need to know the key. There is a relationship between the key and the value that persists and is not arbitrary; they are related somehow. Example: a person's DNA is unique (the key) and maps to the person's name (the value), or a person's SSN (the key) maps to the person's name (the value). There is a strong relationship.
In a List, all you have are values (a person's name), and to access one you need to know its position in the list (its index). But there is no permanent relationship between a value and its index in the list; it is arbitrary.
■ List — An ordered collection of elements that allows duplicate entries
Concrete Classes:
ArrayList — Standard resizable list.
LinkedList — Can easily add/remove from beginning or end.
Vector — Older thread-safe version of ArrayList.
Stack — Older last-in, first-out class.
■ Set — Does not allow duplicates
Concrete Classes:
HashSet — Uses hashCode() to find unordered elements.
TreeSet — Sorted and navigable. Does not allow null values.
■ Queue — Orders elements for processing
Concrete Classes:
LinkedList — Can easily add/remove from beginning or end.
ArrayDeque — First-in, first-out or last-in, first-out. Does not allow null values.
■ Map — Maps unique keys to values
Concrete Classes:
HashMap — Uses hashCode() to find keys.
TreeMap — Sorted map. Does not allow null keys.
Hashtable — Older, synchronized version of HashMap. Does not allow null keys or values.
That is a question that ultimately has a very complex answer--there are entire college classes dedicated to data structures. The short answer is that they all have trade-offs in memory usage and the speed of various operations.
What would be really healthy is some time with a nice book on data structures--I can almost guarantee that your code will improve significantly if you get a nice understanding of data structures.
That said, I can give you some quick, temporary advice from my experience with Java. For most simple internal things, ArrayList is generally preferred. For passing collections of data about, simple arrays are generally used. HashMap is only really used for cases when there is some logical reason to have special keys corresponding to values--I haven't seen anyone use them as a general data structure for everything. Other structures are more complicated and tend to be used in special cases.
As you already know, they are containers for objects. Reading their respective APIs will help you understand their differences.
Since others have described what are their differences about their usage, I will point you to this link which describes complexity of various data structures.
This list is programming language agnostic, and, as always, real world implementations will vary.
It is useful to understand complexity of various operations for each of these structures, since in the real world, it will matter if you're constantly searching for an object in your 1,000,000 element linked list that's not sorted. Performance will not be optimal.
List Vs Set Vs Map
1) Duplicates: List allows duplicate elements. Any number of duplicate elements can be inserted into the list without affecting the existing equal values or their indexes.
Set doesn't allow duplicates. Set and all of the classes that implement the Set interface hold only unique elements.
Map stores its elements as key-value pairs. Map doesn't allow duplicate keys, but it allows duplicate values.
2) Null values: List allows any number of null values.
Set allows at most a single null value.
Map can have at most a single null key and any number of null values.
3) Order: List and all of its implementation classes maintain insertion order.
Set doesn't maintain any order in general, though a few of its classes do: LinkedHashSet, for example, maintains its elements in insertion order.
Similar to Set, Map doesn't store its elements in any order in general, though a few of its classes do. For example, TreeMap sorts the map in ascending order of keys, and LinkedHashMap keeps the elements in insertion order, i.e. the order in which they were added to the LinkedHashMap.
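The ordering difference is easy to see by inserting the same keys into different Map implementations. The helper below (OrderingDemo.keysOf, a name made up for this example) fills whatever map it is given and returns the keys in iteration order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OrderingDemo {
    // Inserts the same three keys into the given map and returns them in iteration order.
    static List<String> keysOf(Map<String, Integer> map) {
        for (String key : new String[] {"pear", "apple", "mango"}) {
            map.put(key, key.length());
        }
        return new ArrayList<>(map.keySet());
    }
}
```

With a LinkedHashMap the keys come back as inserted (pear, apple, mango); with a TreeMap they come back sorted (apple, mango, pear). A plain HashMap makes no promise either way.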
Difference between Set, List and Map in Java -
Set, List and Map are three important interfaces of the Java collections framework, and the difference between Set, List and Map is one of the most frequently asked Java collections interview questions. Sometimes it is phrased as "when to use List, Set and Map in Java". Clearly, the interviewer wants to know whether you are familiar with the fundamentals of the collections framework. To decide when to use List, Set or Map, you need to know what these interfaces are and what functionality they provide. List provides an ordered, indexed collection which may contain duplicates. Set provides an unordered collection of unique objects, i.e. Set doesn't allow duplicates, while Map provides a data structure based on key-value pairs. All three are interfaces, and many concrete implementations of them are available in the Collections API. ArrayList and LinkedList are the two most popular List implementations, while LinkedHashSet, TreeSet and HashSet are frequently used Set implementations. In this article we will see the differences between Map, Set and List in Java and learn when to use each.
Set vs List vs Map in Java
As I said, Set, List and Map are interfaces which define a core contract; e.g. the Set contract says that it cannot contain duplicates. Based on our knowledge of List, Set and Map, let's compare them on different metrics.
Duplicate Objects
The main difference between the List and Set interfaces in Java is that List allows duplicates while Set doesn't. All implementations of Set honor this contract. Map holds two objects per entry, a key and a value; it may contain duplicate values, but keys are always unique.
Order
Another key difference between List and Set is that List is an ordered collection: List's contract maintains the insertion order of elements. Set is an unordered collection; you get no guarantee about the order in which elements will be stored. However, some Set implementations, e.g. LinkedHashSet, maintain insertion order. SortedSet and SortedMap implementations, e.g. TreeSet and TreeMap, maintain a sort order imposed by a Comparator or by Comparable.
Null elements
List allows null elements, and you can have many null objects in a List because it also allows duplicates. Set allows just one null element, since duplicates are not permitted, while in a Map you can have null values and at most one null key. It is worth noting that Hashtable doesn't allow null keys or values, but HashMap allows null values and one null key. This is also one of the main differences between these two popular implementations of the Map interface, HashMap and Hashtable.
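The null-key rule can be checked directly. In this sketch, NullKeyDemo.acceptsNullKey is a hypothetical helper that simply tries to put a null key and reports whether the map accepted it.

```java
import java.util.Map;

class NullKeyDemo {
    // Returns true if the map stores a null key, false if it rejects it with a NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }
}
```

A HashMap accepts the null key, while a Hashtable rejects it. A TreeMap using natural ordering also rejects it on current Java versions, because the key has to be compared.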
Popular implementation
The most popular implementations of the List interface are ArrayList, LinkedList and Vector. ArrayList is more general purpose and provides random access by index, while LinkedList is more suitable for frequently adding and removing elements from the list. Vector is the synchronized counterpart of ArrayList. The most popular implementations of the Set interface are HashSet, LinkedHashSet and TreeSet. The first is a general-purpose Set backed by a HashMap; it doesn't provide any ordering guarantee, whereas LinkedHashSet provides insertion ordering along with the uniqueness offered by the Set interface. The third, TreeSet, is also an implementation of the SortedSet interface, hence it keeps elements in a sorted order specified by the compare() or compareTo() method. Finally, the most popular implementations of the Map interface are HashMap, LinkedHashMap, Hashtable and TreeMap. HashMap is the non-synchronized general-purpose Map implementation and Hashtable is its synchronized counterpart; neither provides any ordering guarantee, which is what LinkedHashMap adds. Just like TreeSet, TreeMap is a sorted data structure and keeps its keys in sorted order.
Often, I have a list of objects. Each object has properties. I want to extract a subset of the list where a specific property has a predefined value.
Example:
I have a list of User objects. A User has a homeTown. I want to extract all users from my list with "Springfield" as their homeTown.
I normally see this accomplished as follows:
List<User> users = getTheUsers();
List<User> returnList = new ArrayList<User>();
for (User user : users) {
    // Keep every user whose home town matches, ignoring case.
    if ("springfield".equalsIgnoreCase(user.getHomeTown())) {
        returnList.add(user);
    }
}
I am not particularly satisfied with this solution. Yes, it works, but it seems so slow. There must be a solution that is faster than a linear scan.
Suggestions?
Well, this operation is linear in nature unless you do something extreme like index the collection based on properties you expect to examine in this way. Short of that, you're just going to have to look at each object in the collection.
But there may be some things you can do to improve readability. For example, Groovy provides an each() method for collections. It would allow you to do something like this...
def returnList = new ArrayList()
users.each {
    // 'it' is Groovy's implicit name for the current element
    if ("springfield".equalsIgnoreCase(it.getHomeTown())) {
        returnList.add(it)
    }
}
You will need a custom solution for this. Create a custom collection that implements the List interface and add all elements from the original list to it.
Internally, this custom List class needs to maintain a collection of Maps, one per attribute, to help you look up values as needed. To populate these Maps you will have to use introspection to find all fields and their values.
This custom object will have to implement methods such as List findAllBy(String propertyName, String propertyValue); which use the hash maps above to look up those values.
This is not an easy, straightforward solution. Furthermore, you will need to consider nested attributes like "user.address.city". Making this custom List immutable will help a lot.
However, even if you are iterating over thousands of objects, plain iteration will still be fast, so you are probably better off simply iterating over the List for what you need.
As I have found out, if you are using a list, you have to iterate. Whether it's a for-each, a lambda, or a FindAll, it is still being iterated. No matter how you dress up a duck, it's still a duck. As far as I know there are HashTables, Dictionaries, and DataTables that do not require iteration to find a value. I am not sure what the Java equivalent implementations are, but maybe this will give you some other ideas.
If you are really interested in performance here, I would also suggest a custom solution. My suggestion would be to create a Tree of Lists in which you can sort the elements.
If you are not interested in the ordering of the elements inside your list (and most people usually are not), you could also use a TreeMap (or HashMap) with the homeTown as key and a List of all matching entries as value. When you add a new element, just look up the corresponding list in the Map and append to it (if it is the first element for that key, you of course need to create the list first). If you want to delete an element, simply do the same.
When you want a list of all users with a given homeTown, you just need to look up that list in the Map and return it (no copying of elements needed). I am not 100% sure about the Map implementations in Java, but the whole method should run in constant time (worst case logarithmic, depending on the Map implementation).
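A minimal sketch of such an index, assuming a simple User type with a non-null homeTown. The UserIndex class, its nested User class and the method names are all invented for the example. Keys are lower-cased so that the lookup ignores case, like the loop in the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical index from home town to the users living there.
class UserIndex {
    static final class User {
        final String name;
        final String homeTown;

        User(String name, String homeTown) {
            this.name = name;
            this.homeTown = homeTown;
        }
    }

    private final Map<String, List<User>> byHomeTown = new HashMap<>();

    // Adds the user to the list for its home town, creating the list on first use.
    void add(User user) {
        byHomeTown.computeIfAbsent(key(user.homeTown), k -> new ArrayList<>()).add(user);
    }

    // One hash lookup instead of a scan over all users; returns a read-only view.
    List<User> livingIn(String homeTown) {
        List<User> found = byHomeTown.get(key(homeTown));
        if (found == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(found);
    }

    // Normalizes the town name so lookups are case-insensitive.
    private static String key(String homeTown) {
        return homeTown.toLowerCase(Locale.ROOT);
    }
}
```

The cost moves to insertion time and memory: every add has to maintain the index, which only pays off if the same kind of lookup is repeated often.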
I ended up using Predicates. Its readability looks similar to Drew's suggestion.
As far as performance is concerned, I found negligible speed improvements for small (< 100 items) lists. For larger lists (5k-10k), I found 20-30% improvements. Medium lists had benefits, but not quite as large as bigger lists. I did not test very large lists, but my testing made it seem that the larger the list, the better the results in comparison to the for-each process.
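The answer above does not say which predicate library was used. As an assumption, the sketch below shows the same style with java.util.function.Predicate from the standard library (Java 8 and later); the class PredicateFilter and its filter method are made-up names. It still visits every element once, so it illustrates the readability gain only and makes no claim about speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class PredicateFilter {
    // Returns a new list holding the items for which the predicate is true, in original order.
    static <T> List<T> filter(List<T> items, Predicate<? super T> keep) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            if (keep.test(item)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

With a list of User objects, a call could then read filter(users, u -> "springfield".equalsIgnoreCase(u.getHomeTown())).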