I need some structure in which to store N enums, some of them repeated, and be able to easily extract them. So far I've tried to use an EnumSet like this:
cards = EnumSet.of(
BEST_OF_THREE,
BEST_OF_THREE,
SIMPLE_QUESTION,
SIMPLE_QUESTION,
STAR);
But now I see it can only hold one of each. Conceptually, which structure would be best for this problem?
Regards
jose
You can use a Map from your enum type to Integer, where the integer indicates how many of each there are. The Google Guava Multiset does this for you, and handles the edge cases of adding an enum when there is no entry for it yet and removing an entry when its count drops to zero.
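For example, a minimal sketch of the Guava route, assuming the enum is called CardEnum (as in the array example further down):
import com.google.common.collect.EnumMultiset;
import com.google.common.collect.Multiset;

Multiset<CardEnum> cards = EnumMultiset.create(CardEnum.class);
cards.add(CardEnum.BEST_OF_THREE, 2);        // add two copies in one call
cards.add(CardEnum.SIMPLE_QUESTION);
cards.add(CardEnum.SIMPLE_QUESTION);
cards.add(CardEnum.STAR);

int n = cards.count(CardEnum.BEST_OF_THREE); // 2
cards.remove(CardEnum.BEST_OF_THREE);        // count drops to 1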
Another strategy is to use the enum's ordinal index. Because this index is unique, you can use it to index into an int array sized to the number of enum constants, where the count in each slot indicates how many of each value you have. Like this:
// initialize array for counting each enumeration type
// (a new int[] is zero-initialized by Java, so every count starts at 0)
int[] cardCount = new int[CardEnum.values().length];
...
// incrementing the count for an enumeration (when we add)
cardCount[BEST_OF_THREE.ordinal()]++;
...
// decrementing the count for an enumeration (when we remove)
cardCount[BEST_OF_THREE.ordinal()]--;
assert cardCount[BEST_OF_THREE.ordinal()] >= 0; // sanity check: the count should never go negative
...
// getting the count for an enumeration
int count = cardCount[BEST_OF_THREE.ordinal()];
Edit (some time later):
Having read the clarifying comments under the original post, it is clear that you're best off with a linear structure with an entry per element; I hadn't realised you don't actually need the count of each value. Storing them in a Multiset or an equivalent counting structure makes it hard to pick an element at random, because you have to map an index drawn at random from [0, size) back to a particular bucket, which takes log time.
Sets don't allow duplicates, so if you want repeats you'll need either a List or a Map.
If you just need the number of duplicates, an EnumMap with Integer values is probably your best bet.
If the order is important, and you need quick access to the number of each type, you'll probably need to roll your own data structure.
If the order is important (but the count of each is not), then a List is the way to go, which implementation depends on how you will use it.
LinkedList - Best when there will be many inserts/removals from the beginning of the List. Indexing into a LinkedList is very expensive, and should be avoided whenever possible. If a List is built by shifting data onto the front of the list, but any later additions are at the end, conversion to an ArrayList once the initial List is built is a good idea - especially if indexing into the List is anticipated at any point.
ArrayList - When in doubt, this is a good place to start. Inserting or removing items requires shifting, so if this is a common operation look elsewhere.
TreeList - This is a good all-around option, and insertions and removals anywhere in the List are inexpensive. It does require the Apache Commons Collections library, and uses a bit more memory than the others.
Benchmarks, and the code used to generate them, can be found in this gist.
Related
I need a structure (ArrayList, LinkedList, etc.) that is very fast for this case:
While the structure is not empty, I search it for elements that satisfy a condition, let's say k, remove the elements that satisfy k, and start over with another condition, let's say k+1.
e.g.:
for (int i = 1; i <= 1000000; i++) {
    structure.add(i);
}
int d = 2;
while (!structure.isEmpty()) {
    // iterate over a copy so elements can be removed from the original
    for (int boom : new ArrayList<>(structure)) {
        if (boom % d == 2) {
            structure.remove(Integer.valueOf(boom)); // remove by value, not by index
        }
    }
    d++; // move on to the next condition once the pass is done
}
If the elements are primitives, then the fastest structure will most probably be a specialized primitive collection (e.g., trove). Following references for boxed primitives is a nearly sure cache miss and this probably dominates the costs.
I wouldn't suggest a LinkedList for the same reason: It's dead slow due to cache misses.
If the order is unimportant, then an ArrayList is perfect. Instead of removing an element, replace it with the last one and remove the last array element. This is an O(1) operation and doesn't suffer from bad spatial locality.
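A minimal sketch of that swap-with-last removal, assuming a List<Integer> as in the question and an illustrative condition:
List<Integer> structure = new ArrayList<>(List.of(5, 12, 7, 9, 20, 3));
int d = 3; // illustrative condition: remove multiples of d
for (int i = 0; i < structure.size(); ) {
    if (structure.get(i) % d == 0) {
        int last = structure.size() - 1;
        structure.set(i, structure.get(last)); // overwrite slot i with the last element
        structure.remove(last);                // removing the last element is O(1)
        // do not advance i: the swapped-in element still has to be checked
    } else {
        i++;
    }
}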
If the order is important, you can build your own ArrayList-like structure. Instead of removing an element, you mark it for removal, e.g. in a BitSet or in a boolean[]. Finally you perform the removal in one sweep by moving all surviving elements to their right position and adjusting the length. The optimized loop will most probably look similar to the CharMatcher.removeFrom loop.
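A rough sketch of the mark-and-sweep idea on a plain ArrayList (not the hand-rolled structure described above, but the same compaction loop), again with an illustrative condition:
List<Integer> structure = new ArrayList<>(List.of(5, 12, 7, 9, 20, 3));
int d = 3; // illustrative condition
BitSet toRemove = new BitSet(structure.size());
// mark phase: record which indices satisfy the condition
for (int i = 0; i < structure.size(); i++) {
    if (structure.get(i) % d == 0) {
        toRemove.set(i);
    }
}
// sweep phase: move survivors left in one pass, then trim the tail
int write = 0;
for (int read = 0; read < structure.size(); read++) {
    if (!toRemove.get(read)) {
        structure.set(write++, structure.get(read));
    }
}
structure.subList(write, structure.size()).clear();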
A simpler solution would be to use an ArrayList and copy all surviving elements to another one. I'd bet it'd beat the LinkedList hands down. As a minor GC-friendly optimization you can work with two lists.
LinkedList should be fastest for this case. Use the iterator explicitly (structure.iterator()) and call the remove method of the iterator instead of calling structure.remove(element)!
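For example, a sketch of the iterator-based removal with an illustrative condition:
List<Integer> structure = new LinkedList<>();
for (int i = 1; i <= 1000000; i++) {
    structure.add(i);
}
int d = 3; // illustrative condition
Iterator<Integer> it = structure.iterator();
while (it.hasNext()) {
    if (it.next() % d == 0) {
        it.remove(); // O(1) unlink, and no ConcurrentModificationException
    }
}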
I don't know your exact use case, but here's one note.
If you have your predicates P1..PN pre-compiled and available, if you are not otherwise modifying the contents of the collection, and if the predicates are not dependent on each other, you might want to create a composite predicate: bundle up the N predicates in some logical order and perform the filtering in a single iteration over the collection.
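A sketch of that idea using java.util.function.Predicate (Java 8+); the individual predicates here are just placeholders:
Collection<Integer> structure = new ArrayList<>(List.of(4, 9, 1200, 17, 38));
// placeholder predicates P1..P3; an element is dropped if it matches any of them
List<Predicate<Integer>> predicates = List.of(
        x -> x % 2 == 0,
        x -> x > 1000,
        x -> x % 7 == 3);
// bundle them into one composite predicate
Predicate<Integer> composite = predicates.stream().reduce(x -> false, Predicate::or);
// a single pass over the collection does all the filtering
structure.removeIf(composite);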
As for the data structure, I'd think of it like this:
If my filtering predicates will be totally arbitrary, then a list should be OK to use.
In some more specific cases with very limited and strict value sets, you might consider a tree-like or graph-like structure, where you could have master nodes denoting that property "property1" has value "value1". If you wanted to drop all items where "property1" is "value1", you could tell that master node to remove all its children (and have them detach themselves from any other parent master nodes they might have).
Sorted List data structure
If you construct the lists yourself you can consider using a sorted data structure. It will give you the best search performance (log n complexity, so it is very fast).
Linked List data structure
A LinkedList gives you constant-time element removal, but random access is not constant time (it is slow).
You will have to benchmark if a LinkedList or a sorted list would be faster for your scenario.
If your elements are ints, I suppose bit set would be the fastest data structure for this task. Iteration would be slightly slower than through array list (even not standard java.util.ArrayList, only primitive specialization), but remove ops cost nearly nothing, while removes from any array list are quite expensive.
Note, you can gain much by working directly with a long[] as a bit set and performing the bitwise operations by hand, because java.util.BitSet is not very performance-focused. But, of course, start with BitSet.
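A sketch of one pass of the question's loop done with a BitSet, assuming the elements are the ints 1..n themselves:
int n = 1000000;
BitSet structure = new BitSet(n + 1);
structure.set(1, n + 1);              // "add" the ints 1..n (bit i set = i is present)
int d = 3;                            // illustrative condition: remove i where i % d == 2
for (int i = structure.nextSetBit(0); i >= 0; i = structure.nextSetBit(i + 1)) {
    if (i % d == 2) {
        structure.clear(i);           // removal is just clearing a bit
    }
}
int remaining = structure.cardinality(); // how many elements survived the pass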
I have n objects, each with an identifying number. I get them unsorted, but the range of indexes (0, n-1) is used to identify them. I want to access them as fast as possible. I suppose an ArrayList would be the best option; I'd add the object with identifier n at the position of the ArrayList with index n by:
list.add(identifier, object);
The problem is that when I am adding the objects I get an IndexOutOfBoundsException, because I'm adding them unsorted and size() is smaller, although I know that the earlier positions will also be filled.
Another option is to use a HashMap but I suppose that this will decrease performance.
Do you know a collection that has the behavior described above?
Do you know a collection that has the behavior described above?
It sounds like you need a plain old Java array. And if you need it as a collection, then use "Arrays.asList(...)" to create a List wrapper for it.
Now this won't work if you need to add or remove elements from the array / collection, but it sounds like you don't need to, going by your problem description.
If you do need to add / remove elements (as distinct from using set to update the element at a given position), then Peter Lawrey's approach is best.
By contrast, a HashMap<Integer, Object> would be an expensive alternative. At a rough estimate, I'd say that "indexing" operations would be at least 10 times slower, and the data structure would take 10 times the space compared to an equivalent array or ArrayList type. A hash table based solution is only really a viable alternative (from a performance perspective) if the array is large and sparse.
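For concreteness, a minimal sketch of the plain-array route described above (the size and identifiers here are illustrative):
int n = 100;                                   // number of objects, known up front
Object[] objects = new Object[n];
// as each object arrives (in any order), drop it into its slot
objects[42] = "object with identifier 42";
// lookup by identifier is a plain array access
Object o = objects[42];
// if a List view is needed, wrap the array: set() works, add()/remove() throw
List<Object> view = Arrays.asList(objects);
view.set(42, "updated object");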
Sometimes you get the indexes out of order. This requires you to add dummy entries which may be filled later.
int indexToAdd = ...
E elementToAdd = ...
while(list.size() <= indexToAdd) list.add(null);
list.set(indexToAdd, elementToAdd);
This will allow you to add entries beyond the current end of the list.
The Javadoc for List.add(int, E) states
IndexOutOfBoundsException - if the index is out of range (index < 0 || index > size())
and List.set(int, E) is the same, except that index == size() is also out of range (index >= size()).
If you attempt to add entries beyond the end
List list = new ArrayList();
list.add(1, 1);
you get
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
at java.util.ArrayList.rangeCheckForAdd(ArrayList.java:612)
at java.util.ArrayList.add(ArrayList.java:426)
at Main.main(Main.java:28)
I'm not sure how much more expensive a HashMap<Integer, T> would be. Integer.hashCode() is quite efficient, though the only really expensive operation might be copying the data into a new larger array as the number of items increases. However, if you know your n, you could use a normal array.
As an alternative, you could implement your own Map<Integer, T> that does not use the hash code but the Integer itself. Before you do this, make sure that a plain array really isn't sufficient and that a HashMap really isn't efficient enough!
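If you do go down that road, a very rough sketch of an array-backed store keyed directly by the int identifier (illustrative only, not a real Map implementation):
// requires java.util.Arrays
class IntKeyedStore<T> {
    private Object[] slots;
    IntKeyedStore(int expectedSize) {
        slots = new Object[expectedSize];
    }
    void put(int key, T value) {
        if (key >= slots.length) {
            // grow to fit the key, roughly doubling like ArrayList does
            slots = Arrays.copyOf(slots, Math.max(key + 1, slots.length * 2));
        }
        slots[key] = value;
    }
    @SuppressWarnings("unchecked")
    T get(int key) {
        return key < slots.length ? (T) slots[key] : null;
    }
}
Usage would just be store.put(identifier, object) and store.get(identifier), both plain array operations.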
I think you have at least two good options.
The first is to use a straight Java array initialized to the appropriate length, then add objects with the syntax:
theArray[i] = object;
The second would be to use a HashMap, and add objects with the syntax:
theMap.put(i, object);
I'm not sure what performance issues you're worried about, but adding elements within the range, clearing (out of an array) or removing (out of a HashMap), and finding elements from a given index (or key, for a HashMap) are all O(1) for both structures. I would also suggest taking a look at Wikipedia's list of data structures if neither of these seem good.
Currently I'm using an ArrayList to store a list of elements, and I will need to insert new elements at specific positions. There is a need for me to enter elements at a position larger than the current size. For example:
ArrayList<String> arr = new ArrayList<String>();
arr.add(3,"hi");
Now I already know there will be an IndexOutOfBoundsException. Is there another way, or another object, where I can do this while still keeping the order? This is because I have methods that find elements based on their index. For example:
ArrayList<String> arr = new ArrayList<String>();
arr.add("hi");
arr.add(0,"hello");
I would expect to find "hi" at index 1 instead of index 0 now.
So in summary, short of manually inserting null elements in between, is there any way to satisfy these two requirements:
Insert elements into position larger than current size
Push existing elements to the right when I insert elements in the middle of the list
I've looked at Java ArrayList add item outside current size, as well as HashMap, but HashMap doesn't satisfy my second criterion. Any help would be greatly appreciated.
P.S. Performance is not really an issue right now.
UPDATE: There have been some questions on why I have these particular requirements, it is because I'm working on operational transformation, where I'm inserting a set of operations into, say, my list (a math formula). Each operation contains a string. As I insert/delete strings into my list, I will dynamically update the unapplied operations (if necessary) through the tracking of each operation that has already been applied. My current solution now is to use a subclass of ArrayList and override some of the methods. I would certainly like to know if there is a more elegant way of doing so though.
Your requirements are contradictory:
... I will need to insert new elements at specific positions.
There is a need for me to enter elements at a position larger than the current size.
These imply that positions are stable; i.e. that an element at a given position remains at that position.
I would expect to find "hi" at index 1 instead of index 0 now.
This states that positions are not stable under some circumstances.
You really need to make up your mind which alternative you need.
If you must have stable positions, use a TreeMap or HashMap. (A TreeMap allows you to iterate the keys in order, but at the cost of more expensive insertion and lookup ... for a large collection.) If necessary, use a "position" key type that allows you to "always" generate a new key that goes between any existing pair of keys.
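For the stable-position variant, a minimal sketch with a TreeMap keyed by position (plain Integer keys here; a fractional or string key type would let you generate a key between any two existing ones):
TreeMap<Integer, String> ops = new TreeMap<>();
ops.put(3, "hi");       // position 3 exists even though 0..2 are unoccupied
ops.put(0, "hello");    // does not shift "hi"; it stays at position 3
String atThree = ops.get(3); // "hi"
// iterate the entries in position order
for (Map.Entry<Integer, String> e : ops.entrySet()) {
    System.out.println(e.getKey() + " -> " + e.getValue());
}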
If you don't have to have stable positions, use an ArrayList, and deal with the case where you have to insert beyond the end position using append.
I fail to see how it is sensible for positions to be stable if you insert beyond the end, and allow instability if you insert in the middle. (Besides, the latter is going to make the former unstable eventually ...)
You can even use a TreeMap to maintain the order of the keys.
First and foremost, I would say use a Map instead of a List. I guess your problem can be solved in a better way if you use a Map. But in any case, if you really want to do this with an ArrayList:
ArrayList<String> a = new ArrayList<String>();  // create an empty list
a.addAll(Arrays.asList(new String[100]));       // pad with n nulls; pick n to suit your requirement (here n is 100)
a.add(7, "hello");
a.add(2, "hi");
a.add(1, "hi2");
Use the Vector class to solve this issue.
Vector vector = new Vector();
vector.setSize(100);
vector.set(98, "a");
When "setSize" is set to 100 then all 100 elements gets initialized with null values.
For those who are still dealing with this, you may do it like this.
Object[] array = new Object[10];
array[0] = "1";
array[3] = "3";
array[2] = "2";
array[7] = "7";
List<Object> list = Arrays.asList(array);
But the thing is you need to know the total size first.
I have a class along the lines of:
public class Observation {
private String time;
private double x;
private double y;
//Constructors + Setters + Getters
}
I can choose to store these objects in any type of collection (Standard class or 3rd party like Guava). I have stored some example data in an ArrayList below, but like I said I am open to any other type of collection that will do the trick. So, some example data:
ArrayList<Observation> ol = new ArrayList<Observation>();
ol.add(new Observation("08:01:23",2.87,3.23));
ol.add(new Observation("08:01:27",2.96,3.17));
ol.add(new Observation("08:01:27",2.93,3.20));
ol.add(new Observation("08:01:28",2.93,3.21));
ol.add(new Observation("08:01:30",2.91,3.23));
The example assumes a matching constructor in Observation. The timestamps are stored as String objects as I receive them as such from an external source but I am happy to convert them into something else. I receive the observations in chronological order so I can create and rely on a sorted collection of observations. The timestamps are NOT unique (as can be seen in the example data) so I cannot create a unique key based on time.
Now to the problem. I frequently need to find one (1) observation with a time equal or nearest to a certain time; e.g. if my time was 08:01:29 I would like to fetch the 4th observation in the example data, and if the time is 08:01:27 I want the 3rd observation.
I can obviously iterate through the collection until I find the time that I am looking for, but I need to do this frequently and at the end of the day I may have millions of observations so I need to find a solution where I can locate the relevant observations in an efficient manner.
I have looked at various collection-types including ones where I can filter the collections with Predicates but I have failed to find a solution that would return one value, as opposed to a subset of the collection that fulfills the "<="-condition. I am essentially looking for the SQL equivalent of SELECT * FROM ol WHERE time <= t LIMIT 1.
I am sure there is a smart and easy way to solve my problem so I am hoping to be enlightened. Thank you in advance.
Try a TreeSet, providing a comparator that compares the time. It maintains an ordered set, and you can call TreeSet.floor(E) to find the greatest element less than or equal to the one you are looking for (you would pass a dummy Observation carrying the time you want). You also have headSet and tailSet for ordered subsets.
It has O(log n) time for adding and retrieving. I think it is very suitable for your needs.
If you prefer a Map you can use a TreeMap with similar methods.
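For example, a sketch of the TreeMap variant, keying on the parsed time (java.time.LocalTime, Java 8+) and assuming a getTime() getter; because timestamps repeat, each key here keeps only the last observation for that second (use a list value if all duplicates must be kept):
TreeMap<LocalTime, Observation> byTime = new TreeMap<>();
for (Observation o : ol) {
    byTime.put(LocalTime.parse(o.getTime()), o); // later duplicates overwrite earlier ones
}
LocalTime t = LocalTime.parse("08:01:29");
Map.Entry<LocalTime, Observation> floor = byTime.floorEntry(t);     // greatest key <= t
Map.Entry<LocalTime, Observation> ceiling = byTime.ceilingEntry(t); // smallest key >= t
// pick whichever of the two is closer to t; either may be null at the ends of the data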
Sort your collection (an ArrayList will probably work best here) and use binarySearch, which returns an integer index of either a match or the "closest" possible match, i.e. it returns an...
index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size(),
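A sketch of that binarySearch route, assuming the list stays sorted by time and that Observation exposes a getTime() getter (the dummy Observation below only carries the probe time):
Comparator<Observation> byTime = Comparator.comparing(Observation::getTime);
// ol must already be sorted with this comparator
Observation probe = new Observation("08:01:29", 0, 0);
int idx = Collections.binarySearch(ol, probe, byTime);
if (idx < 0) {
    int insertionPoint = -idx - 1;           // index of the first element greater than the probe
    idx = Math.max(insertionPoint - 1, 0);   // step back to the closest element <= the probe
}
Observation nearest = ol.get(idx);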
Have the Observation class implement Comparable and use a TreeSet to store the objects, which will keep the elements sorted. TreeSet implements SortedSet, so you can use headSet or tailSet to get a view of the set before or after the element you're searching for. Use the first or last method on the returned set to get the element you're seeking.
If you are stuck with ArrayList, but can keep the elements sorted yourself, use Collections.binarySearch to search for the element. It returns a positive number if the exact element is found, or a negative number that can be used to determine the closest element. http://download.oracle.com/javase/1.4.2/docs/api/java/util/Collections.html#binarySearch(java.util.List,%20java.lang.Object)
If you are lucky enough to be using Java 6, and the performance overhead of keeping a SortedSet is not a big deal for you, take a look at the TreeSet ceiling, floor, higher and lower methods.
I have a question regarding the LinkedList class in Java.
I have a scenario wherein I need to add or set an element at an index, based on whether that index already exists in the LinkedList or not. Pseudo-code of what I want to achieve:
if index a exists within the linkedlist ll
ll.set(a,"arbit")
else
ll.add(a,"arbit")
I did go through the Javadocs for the LinkedList class but did not come across anything relevant.
Any ideas ?
Thanks
p1ng
What about using a Map for this:
Map<Integer, String> map = new HashMap<Integer, String>();
// ...
int a = 5;
map.put(a, "arbit");
Even if a already exists, put will just replace the old String.
Searching in a linked list is not very efficient (O(n)). Have you considered using a different data structure, e.g. a HashMap, which would give you O(1) access time?
If you need sequential access as well as keyed access you might want to try a LinkedHashMap, available since Java 1.4:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/LinkedHashMap.html
Map<Integer, String> is definitely a good (the best?) way to go here.
Here's an option for keeping with LinkedList if that's for some bizarre reason a requirement. It has horrible runtime performance and disallows null, since null now becomes an indicator that an index isn't occupied.
LinkedList<String> ll = new LinkedList<String>(); // the existing list
String toInsert = "arbit";
int a = 5;
// grow the list so that index a exists
while (a >= ll.size()) {
    ll.add(null);
}
// set index a to the new value
ll.set(a, toInsert);
If you're going to take this gross road, you might be better off with an ArrayList.
Why is it so bad? Say you had only one element at index 100,000. This implementation would require 100,000 entries in the list pointing to null. This results in horrible runtime performance and memory usage.
A LinkedList cannot have holes in it, so you can't have the list [1,2,3,4] and then call ll.add(10, 10); I think there's something wrong with your example. Use either a Map or look for some kind of sparse-array structure.
It looks like you're trying to use a as a key, and you don't state whether you have items at indexes i < a. If you run your code when ll.size() <= a then you'll end up with an IndexOutOfBoundsException.
And if you add an item at index a the previous item at a will now be at a+1.
In this case it would be best to remove the item at a first (if it exists), then add "arbit" at a. Of course, the condition above re: ll.size() <= a still applies here.
If the order of the results is important, a different approach could be to use a HashMap<Integer, String> to build your dataset, extract the keys with keySet(), sort them in their natural order (they're numeric after all), and then extract the values from the map while iterating over the sorted keys. Nasty, but does what you want... Or create your own OrderedMap class that does the same.
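A rough sketch of that approach (the only extra work is extracting and sorting the keys):
Map<Integer, String> map = new HashMap<>();
// ... fill the map ...
List<Integer> keys = new ArrayList<>(map.keySet());
Collections.sort(keys);   // natural (numeric) order
for (Integer k : keys) {
    String value = map.get(k);
    // use value in key order
}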
Could you expand on why you need to use a LinkedList? Is ordering of the results important?