Efficient EnumSet + List

Efficient EnumSet + List - java

Someone knows a nice solution for EnumSet + List
I mean I need to store enum values and I also need to preserve the order , and to be able to access its index of the enum value in the collection in O(1) time.

The closest thing I can come to think of, present in the API is the LinkedHashSet:
From http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashSet.html:
Hash table and linked list implementation of the Set interface, with predictable iteration order.
I doubt it's possible to do what you want. Basically, you want to look up indexes in constant time, even after modifying the order of the list. Unless you allow remove / reorder operations to take O(n) time, I believe you can't get away with lower than O(log n) (which can be achieved by a heap structure).

The only way I can see to satisfy ordering and O(1) access is to duplicate the data in a List and an array of indexes (wrapped in a nice little OrderedEnumSet, of course).

Related

Java: What collection type should I use for this case?

What I need:
Fastest put/remove, this is used alot.
Iteration, also used frequently.
Holds an object, e.g. Player. remove should be o(1) so maybe hashmap?
No duplicate keys
direct get() is never used, mainly iterating to retrieve data.`
I don't worry about memory, I just want the fastest speed possible even if it's at the cost of memory.

For iteration, nothing is faster than a plain old array. Entries are saved sequentially in memory, so the JVM can get to the next entry simply by adding the length of one entry to the its address.
Arrays are typically a bit of a hassle to deal with compared to maps or lists (e.g: no dictionary-style lookups, fixed length). However, in your case I think it makes sense to go with a one or two dimensional array since the length of the array will not change and dictionary-style lookups are not needed.

So if I understand you correctly you want to have a two-dimensional grid that holds information of which, if any, player is in specific tiles? To me it doesn't sound like you should be removing, or adding things to the grid. I would simply use a two-dimensional array that holds type Player or something similar. Then if no player is in a tile you can set that position to null, or some static value like Player.none() or Tile.empty() or however you'd want to implement it. Either way, a simple two-dimensional array should work fine. :)

The best Collection for your case is a LinkedList. Linked lists will allow for fast iteration, and fast removal and addition at any place in the linked list. For example, if you use an ArrayList, and you can to insert something at index i, then you have to move all the elements from i to the end one entry to the right. The same would happen if you want to remove. In a linked list you can add and remove in constant time.
Since you need two dimensions, you can use linked lists inside of linked lists:
List<List<Tile> players = new LinkedList<List<Tile>>(20);
for (int i = 0; i < 20; ++i){
List<Tile> tiles = new LinkedList<Tile>(20);
for (int j = 0; j < 20; ++j){
tiles.add(new Tile());
}
players.add(tiles);
}

use a map of sets guarantee O(1) for vertices lookup and amortized O(1) complexity edge insertion and deletions.
HashMap<VertexT, HashSet<EdgeT>> incidenceMap;

There is no simple one-size-fits-all solution to this.
For example, if you only want to append, iterate and use Iterator.remove(), there are two obvious options: ArrayList and LinkedList
ArrayList uses less memory, but Iterator.remove() is O(N)
LinkedList uses more memory, but Iterator.remove() is O(1)
If you also want to do fast lookup; (e.g. Collection.contains tests), or removal using Collection.remove, then HashSet is going to be better ... if the collections are likely to be large. A HashSet won't allow you to put an object into the collection multiple times, but that could be an advantage. It also uses more memory than either ArrayList or LinkedList.
If you were more specific on the properties required, and what you are optimizing for (speed, memory use, both?) then we could give you better advice.

The requirement of not allowing duplicates is effectively adding a requirement for efficient get().
Your options are either hash-based, or O(Log(N)). Most likely, hashcode will be faster, unless for whatever reason, calling hashCode() + equals() once is much slower than calling compareTo() Log(N) times. This could be, for instance, if you're dealing with very long strings. Log(N) is not very much, by the way: Log(1,000,000,000) ~= 30.
If you want to use a hash-based data structure, then HashSet is your friend. Make sure that Player has a good fast implementation of hashCode(). If you know the number of entries ahead of time, specify the HashSet size. ( ceil(N/load_factor)+1. The default load factor is 0.75).
If you want to use a sort-based structure, implement an efficient Player.compareTo(). Your choices are TreeSet, or Skip List. They're pretty comparable in terms of characteristics. TreeSet is nice in that it's available out of the box in the JDK, whereas only a concurrent SkipList is available. Both need to be rebalanced as you add data, which may take time, and I don't know how to predict which will be better.

List vs. Map: Which takes less space and more efficient?

I have two classes Foo and Bar.
class Foo
{
Set<Integer> bars; // Foo objects have collection of bars.
Set<Integer> adjacents; // Adjacency list of Foos.
}
class Bar
{
int foo; // ID of foo of which this object belongs to
Ipsum ipsum; // This an arbitrary class. But it must be present
Map<Integer, Float> adjacents; // Adjacency list of Bars
}
Number of Bars are predefined (up to 1000). Hence, I may use an array.
But number of Foos are undefined (at most #ofBars/4).
When you consider addition, deletion and get(), I need the one which is faster and takes less space (because I'm going to use serialization).
Here are my options (as far as I have thought)
Option 1: Don't define a class for Foo. Instead, use List<Set<Integer>> foo; and another map for Map> fooAdjacencies;
Option 2: Use Map<Integer, Set<Integer> foo if I want to get bars of i, I simply write foo.get(i).
Option 3: Dont define classes. Instead, use option 2 and for Bar class:
Map<Integer, Ipsum> bar;
Map<Integer, Map<Integer, Floar>> barAdjacencies;
Which option should I choose in terms of space and time efficiency?

This sounds like it'd be very helpful for you (specifically the Data Structures section): http://bigocheatsheet.com/
You say
I need my structure to be efficient while adding, removing and finding elements. No other behavior.
The problem is that Lists and Maps are usually used in totally different cases. Their names describe their use cases fairly well -- you use a List if you need to list something (probably in some sequential order), while a Map would be used if you need to map an input to an output. You can use a Map as a List by mapping Integers to your elements, but that's overcomplicating things a bit. However, even within List and Map you can have different implementations that differ wildly in asymptotic performance.
With few exceptions, data structures will take O(n) space, which makes sense. If memory serves, anything other than an ArrayList (or other collections backed only by a primitive array) will have a decent amount of space overhead as they use other objects (e.g. Nodes for LinkedLists and Entry objects for Maps) to organize the underlying structure. I wouldn't worry too much about this overhead though unless space really is at a premium.
For best-performance addition, deletion, and search, you want to look at how the data structure is implemented.
LinkedList-style implementation will net you O(1) addition and deletion (and with a good constant factor, too!), but will have a pretty expensive get() with O(n) time, because the list will have to be traversed every time you want to get something. Java's LinkedList implementation, though, removes in O(n) time; while the actual act of deletion is O(1), that's only if you have a reference to the actual node that you're removing. Because you don't, removals in Java's LinkedList are O(n) -- O(n) for searching for the node to remove, and O(1) for removal.
Data structures backed with a plain array will have O(1) get() because it's an array, but takes O(n) to add, and delete, because any addition/deletion other than at the last element requires all other elements to be shuffled (in Java's implementation at least). Searching for something using an object instead of an index is done in O(n) time because you have to iterate over the array to find the object.
The following two structures are usually Maps, and so usually require you to implement equals() (and hashCode() for HashMaps):
Data structures backed by a tree (e.g. TreeMap) will have amortized (I think) O(lg n) add/remove, as a good implementation should be self-balancing, making worst-case addition/deletions only have to go through the height of the tree at most. get() operations are O(lg n). Using a tree requires that your elements be sortable/comparable in some way, which could be a bonus or hinderance, depending on your usage.
Hash-based data structures have amortized (average) O(1) everything, albeit with a slightly higher constant factor due to the overhead of hashing (and following any chains if the hash spread is poor). HashMaps could start sucking if you write a bad hashCode() function, though, so you want to be careful with that, although the implementers of Java's HashMap did do some magic behind the scenes to try to at least partially negate the effect of bad hashCode() implementations.
Hope that rundown helped. If you clear up how your program is structured, I might be able to give a recommendation. Until then, the best I can do is show you the options and let you pick.

I find this problem description a little hard to follow, but I think you're just looking for general collections/data structures advice.
A list (say, an array list) easily allows you to add and iterate over elements. When it is expanded beyond the size of the underlying array, a one-off costly resize operation is executed to add more space; but that is fine because it happens rarely and the amortized time is not bad. Searching for a specific element in a list is slow because you need to traverse it in order; there is no implied ordering in most lists. Deleting elements depends on the underlying list implementation. An array list could be slow in this regard; but I'm guessing that they optimized it just by marking the underlying element as deleted and skipping it during iteration. When using lists you also have to consider where you are adding elements. Linked lists are slower to iterate but can easily add and remove elements at any position. Array lists cannot easily add an element anywhere but the end.
Per your requirements, if you are required to execute a "get" or find on an element, then you need some kind of searching functionality to speed it up. This would make a map better as you can locate elements in log(n) time instead of linear time as when searching an unordered list. Adding and removing elements in a list is also relatively fast, so that's probably your best option.
Most importantly, implement it more than one way and profile it yourself to learn more :) Lists are rarely a good choice when searching is required though.

Java HashSet vs Array Performance

I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are (and the number won't change), and was wondering whether Array would have a notable performance advantage over HashSet for storing/retrieving said elements.
On paper, Array guarantees constant time insertion (since I know the size ahead of time) and retrieval, but the code for HashSet looks much cleaner and adds some flexibility, so I'm wondering if I'm losing anything performance-wise using it, at least, theoretically.

Depends on your data;
HashSet gives you an O(1) contains() method but doesn't preserve order.
ArrayList contains() is O(n) but you can control the order of the entries.
Array if you need to insert anything in between, worst case can be O(n), since you will have to move the data down and make room for the insertion. In Set, you can directly use SortedSet which too has O(n) too but with flexible operations.
I believe Set is more flexible.

The choice greatly depends on what do you want to do with it.
If it is what mentioned in your question:
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are
If this is what you need to do, the you need neither of them. There is a size() method in Collection for which you can get the size of it, which mean how many of them there are in the collection.
If what you mean for "collection of object" is not really a collection, and you need to choose a type of collection to store your objects for further processing, then you need to know, for different kind of collections, there are different capabilities and characteristic.
First, I believe to have a fair comparison, you should consider using ArrayList instead Array, for which you don't need to deal with the reallocation.
Then it become the choice of ArrayList vs HashSet, which is quite straight-forward:
Do you need a List or Set? They are for different purpose: Lists provide you indexed access, and iteration is in order of index. While Sets are mainly for you to keep a distinct set of data, and given its nature, you won't have indexed access.
After you made your decision of List or Set to use, then it is a choice of List/Set implementation, normally for Lists, you choose from ArrayList and LinkedList, while for Sets, you choose between HashSet and TreeSet.
All the choice depends on what you would want to do with that collection of data. They performs differently on different action.
For example, an indexed access in ArrayList is O(1), in HashSet (though not meaningful) is O(n), (just for your interest, in LinkedList is O(n), in TreeSet is O(nlogn) )
For adding new element, both ArrayList and HashSet is O(1) operation. Inserting in the middle is O(n) for ArrayList, while it doesn't make sense in HashSet. Both will suffer from reallocation, and both of them need O(n) for the reallocation (HashSet is normally slower in reallocation, because it involve calculation of hash for each element again).
To find if certain element exists in the collection, ArrayList is O(n) and HashSet is O(1).
There are still lots of operations you can do, so it is quite meaningless to discuss for performance without knowing what you want to do.

theoretically, and as SCJP6 Study guide says :D
arrays are faster than collections, and as said, most of the collections depend mainly on arrays (Maps are not considered Collection, but they are included in the Collections framework)
if you guarantee that the size of your elements wont change, why get stuck in Objects built on Objects (Collections built on Arrays) while you can use the root objects directly (arrays)

It looks like you will want an HashMap that maps id's to counts. Particularly,
HashMap<Integer,Integer> counts=new HashMap<Integer,Integer>();
counts.put(uniqueID,counts.get(uniqueID)+1);
This way, you get amortized O(1) adds, contains and retrievals. Essentially, an array with unique id's associated with each object IS a HashMap. By using the HashMap, you get the added bonus of not having to manage the size of the array, not having to map the keys to an array index yourself AND constant access time.

Searching LinkedHashMap, faster method than sequential?

I am wondering if there is a more efficient method for getting objects out of my LinkedHashMap with timestamps greater than a specified time. I.e. something better than the following:
Iterator<Foo> it = foo_map.values().iterator();
Foo foo;
while(it.hasNext()){
foo = it.next();
if(foo.get_timestamp() < minStamp) continue;
break;
}
In my implementation, each of my objects has essentially three values: an "id," "timestamp," and "data." The objects are insterted in order of their timestamps, so when I call an iterator over the set, I get ordered results (as required by the linked hashmap contract). The map is keyed to the object's id, so I can quickly lookup them up by id.
When I look them up by a timestamp condition, however, I get an iterator with sorted results. This is an improvement over a generic hashmap, but I still need to iterate sequentially over much of the range until I find the next entry with a higher timestamp than the specified one.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection to), that can search it faster than sequential? If I went with a treemap as an alternative, would it offer overall speed advantages, or is it doing essentially the same thing in the background? Since the collection is sorted by insertion order already, I'm thinking tree map has a lot more overhead I don't need?

There is no faster way ... if you just use a LinkedHashMap.
If you want faster access, you need to use a different data structure. For example, a TreeSet with an appropriate comparator might be a better solution for this aspect of your problem. For example if your TreeSet is ordered by date, then calling tailSet with an appropriate dummy value can give you all elements greater or equal to a given date.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection to), that can search it faster than sequential?
Not for a LinkedHashMap.
However, if the ordered list was an ArrayList instead, then you could use "binary search" on the list ... provided that you could lock it to prevent concurrent modifications while you are searching. (Actually, concurrency is a potential issue to consider no matter how you implement this ... including your current linear search.)
If you want to keep the ability to do id lookups, then you need two data structures; e.g. a TreeSet and a HashMap which share their element objects. A TreeSet will probably be more efficient than trying to maintain an ArrayList in order assuming that there are random insertions and/or random deletions.

Most Lightweight Java Collection

If I am going to create a Java Collection, and only want to fill it with elements, and then iterate through it (without knowing the necessary size beforehand), i.e. all I need is Collection<E>.add(E) and Collection<E>.iterator(), which concrete class should I choose? Is there any advantage to using a Set rather than a List, for example? Which one would have the least overhead?

which concrete class should I choose?
I would probably just go with an ArrayList or a LinkedList. Both support the add and iterator methods, and neighter of them have any considerable overhead.
Is there any advantage to using a Set rather than a List, for example?
No, I wouldn't say so. (Unless you rely on the order of the elements, in which case you must use a List, or want to disallow duplicates, in which case you should use a Set.)
(I don't see how any Set implementation could beat a list implementation for add / iterator methods, so I'd probably go with a List even if I don't care about order.)
Which one would have the least overhead?
Sounds like micro benchmarking here, but if I'd be forced to guess, I'd say ArrayList (or perhaps LinkedList in coner cases where ArrayLists need to reallocate memory often :-)

Do not go with a Set. Sets and Lists differ according to their purpose, that you should always consider when choosing the right Collection
a List is there for maintaining elements in the order you added them; and if you insert the same element twice it will be kept twice
a Set is there for holding one specific element exactly once (uniqueness); order is only relevant for specific implementations (like TreeSet), but still elements that are 'the same' would not be added twice

Set is only meaningful if you want to sort your objects and to make sure no duplicate element is 'registered'. Else, an ArrayList is just fine.
However, if you want to add elements while iterating too, an ArrayBlockingQueue is better.

Here are some key points which can help you to choose your collection according to your requirement -
List(ArrayList or LinkedList)
Allowed duplicate values.
Insertion order preserved.
Set
Not allowed duplicate values.
Insertion order is not preserved.
So according to your requirement List seems to be a suitable choice.
Now Between ArrayList and LinkedList -
ArrayList is a random access list. Use if your frequent operation is the retrieval of elements.
LinkedList is the best option if you want to add or remove elements from the list.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.